nips nips2010 nips2010-42 knowledge-graph by maker-knowledge-mining

42 nips-2010-Boosting Classifier Cascades


Source: pdf

Author: Nuno Vasconcelos, Mohammad J. Saberian

Abstract: The problem of optimal and automatic design of a detector cascade is considered. A novel mathematical model is introduced for a cascaded detector. This model is analytically tractable, leads to recursive computation, and accounts for both classification and complexity. A boosting algorithm, FCBoost, is proposed for fully automated cascade design. It exploits the new cascade model, minimizes a Lagrangian cost that accounts for both classification risk and complexity. It searches the space of cascade configurations to automatically determine the optimal number of stages and their predictors, and is compatible with bootstrapping of negative examples and cost sensitive learning. Experiments show that the resulting cascades have state-of-the-art performance in various computer vision problems. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract The problem of optimal and automatic design of a detector cascade is considered. [sent-4, score-0.671]

2 A boosting algorithm, FCBoost, is proposed for fully automated cascade design. [sent-7, score-0.638]

3 It exploits the new cascade model, minimizes a Lagrangian cost that accounts for both classification risk and complexity. [sent-8, score-0.597]

4 It searches the space of cascade configurations to automatically determine the optimal number of stages and their predictors, and is compatible with bootstrapping of negative examples and cost sensitive learning. [sent-9, score-0.696]

5 This problem has been the focus of substantial attention since the introduction of the detector cascade architecture by Viola and Jones (VJ) in [13]. [sent-14, score-0.612]

6 This architecture was used to design the first real time face detector with state-of-the-art classification accuracy. [sent-15, score-0.223]

7 The detector has since been deployed in many practical applications of broad interest, e.g. [sent-16, score-0.118]

8 While the resulting detector is fast and accurate, the process of designing a cascade is not. [sent-20, score-0.624]

9 In particular, VJ did not address the problem of how to automatically determine the optimal cascade configuration, e.g. [sent-21, score-0.507]

10 the numbers of cascade stages and weak learners per stage, or even how to design individual stages so as to guarantee optimality of the cascade as a whole. [sent-23, score-1.305]

11 As a result, extensive manual supervision is required to design cascades with a good speed/accuracy trade-off. [sent-24, score-0.177]

12 This includes trial-and-error tuning of the false positive/detection rate of each stage, and of the cascade configuration. [sent-25, score-0.523]

13 In practice, the design of a good cascade can take up to several weeks. [sent-26, score-0.54]

14 This has motivated a number of enhancements to the VJ training procedure, which can be organized into three main areas: 1) enhancement of the boosting algorithms used in cascade design, e.g. [sent-27, score-0.657]

15 cost-sensitive variations of boosting [12, 4, 8], FloatBoost [5] or KLBoost [6], 2) post-processing of a learned cascade, by adjusting stage thresholds, to improve performance [7], and 3) specialized cascade architectures which simplify the learning process, e.g. [sent-29, score-0.765]

16 the embedded cascade (ChainBoost) of [15], where each stage contains all weak learners of previous stages. [sent-31, score-0.764]

17 These enhancements do not address the fundamental limitations of the VJ design, namely how to guarantee overall cascade optimality. [sent-32, score-0.513]

18 However, the proposed algorithms still rely on sequential learning of cascade stages, which is suboptimal, sometimes require manual supervision, do not search over cascade configurations, and frequently lack a precise mathematical model for the cascade. [sent-43, score-1.001]

19 The first is a mathematical model for a detector cascade, which is analytically tractable, accounts for both classification and complexity, and is amenable to recursive computation. [sent-45, score-0.152]

20 The second is a boosting algorithm, FCBoost, that exploits this model to solve the cascade learning problem. [sent-46, score-0.638]

21 FCBoost solves a Lagrangian optimization problem, where the classification risk is minimized under complexity constraints. [sent-47, score-0.108]

22 The risk is that of the entire cascade, which is learned holistically, rather than through sequential stage design, and FCBoost determines the optimal cascade configuration automatically. [sent-48, score-0.706]

23 It is also compatible with bootstrapping and cost sensitive boosting extensions, enabling efficient sampling of negative examples and explicit control of the false positive/detection rate trade-off. [sent-49, score-0.303]

24 St = {(x1 , y1 ), . . . , (xn , yn )} is a set of training examples, yi ∈ {1, −1} the class label of example xi , and L[y, f (x)] a loss function. [sent-55, score-0.12]

25 This is achieved by defining a computational risk RC (f ) = EX,Y {LC [y, C(f (x))]} ≃ (1/|St |) Σi LC [yi , C(f (xi ))] (2), where C(f (x)) is the complexity of evaluating f (x), and LC [y, C(f (x))] a loss function that encodes the cost of this operation. [sent-59, score-0.136]
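
A minimal sketch of ours (not the paper's code) of the empirical quantities involved: the classification risk RL under the exponential loss used later in Section 4.1, the computational risk RC of (2) with an identity complexity loss, and a Lagrangian of the form L = RL + η RC, which is how we read the cost referred to as (4) in the extracted sentences; the specific loss choices here are assumptions of the sketch.

```python
import numpy as np

# Sketch (not the paper's code): empirical classification risk R_L, computational
# risk R_C as in (2), and a Lagrangian of the form L = R_L + eta * R_C.
# The exponential loss and the identity complexity loss are assumptions of this sketch.

def classification_risk(y, f_x):
    """R_L ~ mean_i exp(-y_i f(x_i))."""
    return np.mean(np.exp(-y * f_x))

def computational_risk(cost):
    """R_C ~ mean_i L_C[C(f(x_i))], with L_C taken to be the identity here."""
    return np.mean(cost)

def lagrangian(y, f_x, cost, eta):
    return classification_risk(y, f_x) + eta * computational_risk(cost)

y = np.array([1, 1, -1, -1])
f_x = np.array([0.8, 0.1, -0.5, 0.3])        # predictor outputs f(x_i)
cost = np.array([3.0, 7.0, 2.0, 9.0])        # e.g. weak learners evaluated per example
print(lagrangian(y, f_x, cost, eta=0.02))
```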

26 For example, in boosting, where the decision rule is a combination of weak rules, a finer approximation of the classification boundary (smaller error) requires more weak learners and computation. [sent-65, score-0.197]

27 Figure 1 illustrates this trade-off, by plotting the evolution of RL and L as a function of the boosting iteration, for the AdaBoost algorithm [2]. [sent-68, score-0.144]

28 While the risk always decreases with the addition of weak learners, this is not true for the Lagrangian. [sent-69, score-0.146]

29 The design of classifiers under complexity constraints has been addressed through the introduction of detector cascades. [sent-71, score-0.2]

30 A detector cascade H(x) implements a sequence of binary decisions hi (x), i = 1, . . . , m. [sent-72, score-0.647]

31 An example x is declared a target (y = 1) if and only if it is declared a target by all stages of H, i.e. [sent-76, score-0.114]

32 For applications where the majority of examples can be rejected after a small number of cascade stages, the average classification time is very small. [sent-80, score-0.531]

33 However, the problem of designing an optimal detector cascade is still poorly understood. [sent-81, score-0.637]

34 A popular approach, known as ChainBoost or embedded cascade [15], is to 1) use standard boosting algorithms to design a detector, and 2) insert a rejection point after each weak learner. [sent-82, score-0.793]

35 This is simple to implement, and creates a cascade with as many stages as weak learners. [sent-83, score-0.642]

36 However, the introduction of the intermediate rejection points after the detector has been designed sacrifices the risk-optimality of the detector. [sent-84, score-0.133]

37 the addition of weak learners no longer carries a large complexity penalty. [sent-88, score-0.159]

38 This is due to the fact that most negative examples are rejected in the earliest cascade stages. [sent-89, score-0.531]

39 3 Classifier cascades In this work, we seek the design of cascades that are provably optimal under (4). [sent-92, score-0.229]

40 We start by introducing a mathematical model for a detector cascade. [sent-93, score-0.118]

41 Let H = {h1 (x), . . . , hm (x)} be a cascade of m detectors hi (x) = sgn[fi (x)]. [sent-98, score-0.534]

42 The cascade implements the decision rule H(F)(x) = sgn[F(x)] (5), where F(x) = F(f1 , f2 )(x) equals f1 (x) if f1 (x) < 0 and f2 (x) if f1 (x) ≥ 0 (6), i.e. F = f1 u(−f1 ) + u(f1 )f2 (7), is denoted the cascade predictor, and u(·) is the unit step function. [sent-100, score-1.002]

43 This equation can be extended to a cascade of m stages, by replacing the predictor of the second stage, when m = 2, with the predictor of the remaining cascade, when m is larger. [sent-102, score-0.674]

44 Let Fj = F(fj , . . . , fm ) be the cascade predictor for the cascade composed of stages j to m, so that F = F1 = f1 u(−f1 ) + u(f1 )F2 (8). [sent-106, score-1.276]

45 More generally, the following recursion holds: Fk = fk u(−fk ) + u(fk )Fk+1 (9), with initial condition Fm = fm . [sent-107, score-0.772]
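
As an illustration of (5)-(9), here is a small numpy sketch of ours (not the authors' code) that evaluates the cascade predictor by the recursion Fk = fk u(−fk) + u(fk) Fk+1; u(·) is the step function and the stage predictors are passed in as callables.

```python
import numpy as np

def u(x):
    """Step function used in (6)-(9)."""
    return (x >= 0).astype(float)

def cascade_predictor(fs, x):
    """F(f1,...,fm)(x) via Fk = fk u(-fk) + u(fk) Fk+1, with Fm = fm."""
    F = fs[-1](x)                        # Fm = fm
    for f in reversed(fs[:-1]):          # k = m-1, ..., 1
        fk = f(x)
        F = fk * u(-fk) + u(fk) * F
    return F

def cascade_decision(fs, x):
    """H(F)(x) = sgn[F(x)] as in (5)."""
    return np.sign(cascade_predictor(fs, x))

# toy usage: three linear stages applied to scalar inputs
fs = [lambda x: x - 0.2, lambda x: 0.5 * x, lambda x: x + 0.1]
x = np.array([-1.0, 0.1, 0.5])
print(cascade_predictor(fs, x), cascade_decision(fs, x))
```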

46 In Appendix A, it is shown that combining (8) and (9) recursively leads to F = T1,m + T2,m fm (10) and F = T1,k + T2,k fk u(−fk ) + T2,k Fk+1 u(fk ), k < m (11) [sent-108, score-0.784]

47 with initial conditions T1,0 = 0, T2,0 = 1 and T1,k+1 = T1,k + fk u(−fk ) T2,k , T2,k+1 = T2,k u(fk ) (12). [sent-109, score-0.648]

48 Since T1,k , T2,k , and Fk+1 do not depend on fk , (10) and (11) make explicit the dependence of the cascade predictor, F, on the predictor of the k-th stage. [sent-110, score-1.263]
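
The bookkeeping of (10)-(12) is easy to sanity-check against the direct recursion (9); the following is our own check (not from the paper), with 0-based stage indices.

```python
import numpy as np

# Sanity check (ours): the bookkeeping T1,k, T2,k of (12) reproduces, through (10)-(11),
# the cascade predictor defined by the recursion (9).  Stage indices are 0-based here.
u = lambda v: (v >= 0).astype(float)

def predictor_recursive(f_vals):
    """F from (9); f_vals is an (m, n) array holding fk(xi)."""
    F = f_vals[-1]
    for fk in f_vals[-2::-1]:
        F = fk * u(-fk) + u(fk) * F
    return F

def predictor_via_T(f_vals, k):
    """F written as in (10)/(11) for stage k, using the recursions (12)."""
    T1, T2 = 0.0, 1.0
    for fj in f_vals[:k]:                         # accumulate the stages before k
        T1 = T1 + fj * u(-fj) * T2
        T2 = T2 * u(fj)
    fk = f_vals[k]
    if k == len(f_vals) - 1:                      # last stage: F = T1,m + T2,m fm   (10)
        return T1 + T2 * fk
    Fk1 = predictor_recursive(f_vals[k + 1:])
    return T1 + T2 * fk * u(-fk) + T2 * Fk1 * u(fk)

f_vals = np.array([[-1.2, -0.1, 0.3], [0.7, 0.2, -0.4], [0.5, -0.3, 0.9]])
for k in range(len(f_vals)):
    assert np.allclose(predictor_via_T(f_vals, k), predictor_recursive(f_vals))
```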

49 Given F = F(f1 , . . . , fm ), the design of boosting algorithms requires the evaluation of both F(fk + ǫg) and the functional derivative of F with respect to each fk along any direction g, < δF(fk ), g > = d/dǫ F(fk + ǫg)|ǫ=0 . [sent-116, score-0.987]

50 These are straightforward for the last stage since, from (10), F(fm + ǫg) = am + ǫbm g and < δF(fm ), g > = bm g (13), where am = T1,m + T2,m fm = F(fm ) and bm = T2,m . [sent-117, score-0.297]

51 Using this approximation in (11), F = F(fk ) = T1,k + T2,k fk (1 − u(fk )) + T2,k Fk+1 u(fk ) (15) ≈ T1,k + T2,k fk + (1/2) T2,k [Fk+1 − fk ][tanh(σfk ) + 1] (16). [sent-121, score-1.944]

52 It follows that < δF(fk ), g > = bk g (17) with bk = (1/2) T2,k {[1 − tanh(σfk )] + σ[Fk+1 − fk ][1 − tanh²(σfk )]} (18). [sent-122, score-0.703]

53 F(fk + ǫg) can also be simplified by resorting to a first order Taylor series expansion around fk , F(fk + ǫg) ≈ ak + ǫbk g (19) with ak = F(fk ) = T1,k + T2,k fk + (1/2) T2,k [Fk+1 − fk ][tanh(σfk ) + 1]. [sent-123, score-2.069]
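
A quick numerical check of ours that (18) is indeed the derivative of the smoothed predictor (16) with respect to fk, using the sigmoidal approximation u(x) ≈ (tanh(σx) + 1)/2 implied by (15)-(16); the values of T1,k, T2,k, Fk+1 and σ below are arbitrary.

```python
import numpy as np

# Check (ours): bk in (18) is d/dfk of the smoothed predictor (16), where u(x) is
# approximated by (tanh(sigma*x) + 1)/2.  T1, T2, Fk1 and sigma are arbitrary here.
sigma = 5.0

def F_smooth(fk, T1, T2, Fk1):
    """Eq. (16)."""
    return T1 + T2 * fk + 0.5 * T2 * (Fk1 - fk) * (np.tanh(sigma * fk) + 1.0)

def b_k(fk, T2, Fk1):
    """Eq. (18)."""
    t = np.tanh(sigma * fk)
    return 0.5 * T2 * ((1.0 - t) + sigma * (Fk1 - fk) * (1.0 - t ** 2))

T1, T2, Fk1 = 0.3, 1.0, -0.8
for fk in np.linspace(-1.0, 1.0, 9):
    eps = 1e-6
    numeric = (F_smooth(fk + eps, T1, T2, Fk1) - F_smooth(fk - eps, T1, T2, Fk1)) / (2 * eps)
    assert abs(numeric - b_k(fk, T2, Fk1)) < 1e-5
```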

54 Denoting by C(fk ) the complexity of evaluating fk , it is shown that C(F) = P1,k + P2,k C(fk ) + P2,k u(fk )C(Fk+1 ). [sent-126, score-0.684]

55 This makes explicit the dependence of the cascade complexity on the complexity of the k-th stage. [sent-128, score-0.597]

56 In practice, fk = Σl cl gl for gl ∈ U, where U is a set of functions of approximately identical complexity. [sent-129, score-0.708]
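
To make the complexity term concrete: with fk = Σl cl gl and all gl of roughly equal cost, the cost of classifying an example is essentially the number of weak learners evaluated before it is rejected. The sketch below is ours (it simulates the quantity, rather than implementing the recursion (22)) and charges only the new learners of each stage, in the spirit of the three cases discussed in the following sentences.

```python
import numpy as np

# Sketch (ours): per-example evaluation cost of a cascade whose stage predictors are
# weighted sums of equal-cost weak learners.  Only the *new* learners of each stage are
# charged, so learners reused from earlier stages (the set O(fk)) come for free.

def cascade_cost(stage_outputs, new_learners_per_stage):
    """stage_outputs: list of arrays fk(xi); new_learners_per_stage: ints."""
    cost = np.zeros(stage_outputs[0].shape[0])
    alive = np.ones_like(cost, dtype=bool)          # examples not yet rejected
    for f_k, n_k in zip(stage_outputs, new_learners_per_stage):
        cost[alive] += n_k                          # pay only for examples reaching stage k
        alive = alive & (f_k >= 0)                  # rejected examples stop here
    return cost

f1 = np.array([-0.5, 0.2, 0.7, -0.1])
f2 = np.array([0.3, -0.4, 0.6, 0.9])
print(cascade_cost([f1, f2], [5, 20]))              # -> [ 5. 25. 25.  5.]
```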

57 The first is a predictor that is also used in a previous cascade stage, e.g. [sent-132, score-0.584]

58 fk (x) = fk−1 (x) + cg(x) for an embedded cascade. [sent-134, score-0.668]

59 In this case, fk−1 (x) has already been evaluated in stage k − 1 and is available with no computational cost. [sent-135, score-0.127]

60 The second is the set O(fk ) of features that have been used in some stage j ≤ k. [sent-136, score-0.139]

61 The third is the set N (fk ) of features that have not been used in any stage j ≤ k. [sent-138, score-0.139]

62 This implies that repeating the last stage of a cascade does not change its decision rule. [sent-152, score-0.621]

63 For this reason n(x) = fm (x) is referred to as the neutral predictor of a cascade of m stages. [sent-153, score-0.751]
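
The neutrality claim is easy to verify numerically; a small check of ours:

```python
import numpy as np

# Check (ours): appending a copy of the last stage as stage m+1 leaves the cascade
# predictor of (9) unchanged, which is why n(x) = fm(x) acts as a neutral predictor.
u = lambda v: (v >= 0).astype(float)

def F(f_vals):
    out = f_vals[-1]
    for fk in f_vals[-2::-1]:
        out = fk * u(-fk) + u(fk) * out
    return out

f_vals = [np.array([-0.6, 0.2, 0.9]), np.array([0.4, -0.1, 0.3])]
assert np.allclose(F(f_vals), F(f_vals + [f_vals[-1]]))
```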

64 4 Boosting classifier cascades In this section, we introduce a boosting algorithm for cascade design. [sent-154, score-0.723]

65 1 Boosting Boosting algorithms combine weak learners to produce a complex decision boundary. [sent-156, score-0.123]

66 Boosting iterations are gradient descent steps towards the predictor f (x) of minimum risk for the loss L[y, f (x)] = e−yf (x) [3]. [sent-157, score-0.203]

67 Given a set U of weak learners, the functional derivative of RL along the direction of weak learner g is < δRL (f ), g > = (1/|St |) Σi d/dǫ e^{−yi (f (xi )+ǫg(xi ))} |ǫ=0 = −(1/|St |) Σi yi wi g(xi ) (28), where wi = e^{−yi f (xi )} is the weight of xi . [sent-158, score-0.357]

68 The optimal step size is c∗ = (1/2) log [ Σi wi I(yi = g∗ (xi )) / Σi wi I(yi ≠ g∗ (xi )) ] (30). The predictor is updated into f (x) = f (x) + c∗ g∗ (x) and the procedure iterated. [sent-161, score-0.129]
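
Putting (28)-(30) together, one boosting iteration can be sketched as follows (our illustration, not the authors' code); the weak learner set U is represented extensionally as ±1 prediction vectors on St.

```python
import numpy as np

# Sketch (ours) of one boosting iteration following (28)-(30).  U is given
# extensionally: each weak learner g is a vector of +/-1 predictions on S_t.

def boost_step(y, f_x, U):
    w = np.exp(-y * f_x)                            # w_i = exp(-y_i f(x_i))
    scores = [np.sum(y * w * g) for g in U]         # <-(delta R_L), g>, up to 1/|S_t|
    g_star = U[int(np.argmax(scores))]
    correct = np.sum(w * (y == g_star))
    wrong = np.sum(w * (y != g_star))               # assumed nonzero in this sketch
    c_star = 0.5 * np.log(correct / wrong)          # optimal step size (30)
    return f_x + c_star * g_star, g_star, c_star

y = np.array([1, 1, -1, -1, 1])
f_x = np.zeros(5)
U = [np.array([1, 1, -1, 1, 1]), np.array([1, -1, -1, -1, -1])]
f_x, g_star, c_star = boost_step(y, f_x, U)
print(c_star, np.mean(np.exp(-y * f_x)))            # the risk R_L decreases
```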

69 2 Cascade risk minimization To derive a boosting algorithm for the design of detector cascades, we adopt the loss L[y, F(f1 , . . . , fm )(x)] [sent-163, score-0.394]

70 and minimize the cascade risk RL (F) = EX,Y {e^{−yF (f1 ,...,fm )(x)} }. [sent-169, score-0.203]

71 Using (13) and (19), < δRL (F(fk )), g > = (1/|St |) Σi d/dǫ e^{−yi [ak (xi )+ǫbk (xi )g(xi )]} |ǫ=0 = −(1/|St |) Σi yi wi^k bi^k g(xi ) (31), where wi^k = e^{−yi ak (xi )} , bi^k = bk (xi ), and ak , bk are given by (14), (18), and (20). [sent-176, score-0.283]

72 The optimal descent direction and step size for the k-th stage are then gk∗ = arg max g∈U < −δRL (F(fk )), g > and ck∗ = arg min c∈R Σi wi^k e^{−yi bi^k c gk∗ (xi )} . [sent-177, score-0.306]

73 Given the optimal c∗ , g∗ for all stages, the impact of each update on the overall cascade risk, RL , is evaluated and the stage of largest impact is updated. [sent-182, score-0.664]
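
A sketch of ours of this stage-selection step. It assumes the per-example quantities ak(xi) and bk(xi) of (14)/(18)/(20) have already been computed for every stage, represents U as ±1 prediction vectors, and finds the step size by a crude grid search rather than the paper's procedure; since ak ≈ F for every k, the post-update risks below are directly comparable across stages.

```python
import numpy as np

# Sketch (ours) of the stage selection described above.  a_list[k] and b_list[k] hold
# ak(xi) and bk(xi) from (14)/(18)/(20) (assumed precomputed); U is a list of +/-1
# prediction vectors; step sizes come from a crude grid search.

def best_update_for_stage(y, a_k, b_k, U, c_grid=np.linspace(-2, 2, 401)):
    w = np.exp(-y * a_k)                                     # w_i^k
    scores = [np.sum(y * w * b_k * g) for g in U]            # <-(delta R_L(F(fk))), g>
    g = U[int(np.argmax(scores))]
    risks = [np.mean(w * np.exp(-y * b_k * c * g)) for c in c_grid]
    best = int(np.argmin(risks))
    return g, float(c_grid[best]), risks[best]

def select_stage(y, a_list, b_list, U):
    """Pick the (stage, g*, c*) whose update most reduces the cascade risk."""
    candidates = [(k,) + best_update_for_stage(y, a_k, b_k, U)
                  for k, (a_k, b_k) in enumerate(zip(a_list, b_list))]
    return min(candidates, key=lambda t: t[3])[:3]

y = np.array([1, -1, 1, -1])
a = [np.array([0.2, -0.3, 0.1, 0.4])] * 2                    # ak(xi) ~ F(xi), same for all k
b = [np.ones(4), np.array([1.0, 0.2, 1.0, 0.2])]             # sensitivity of F to each stage
U = [np.array([1, -1, 1, 1]), np.array([1, 1, -1, -1])]
print(select_stage(y, a, b, U))
```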

74 3 Adding a new stage Searching for the optimal cascade configuration requires support for the addition of new stages, whenever necessary. [sent-184, score-0.634]

75 This is accomplished by including a neutral predictor as the last stage of the cascade. [sent-185, score-0.26]

76 If adding a weak learner to the neutral stage reduces the risk more than the corresponding addition to any other stage, a new stage (containing the neutral predictor plus the weak learner) is created. [sent-186, score-0.65]

77 Since this new stage includes the last stage of the previous cascade, the process mimics the design of an embedded cascade. [sent-187, score-0.32]

78 However, there is no restriction that a new stage should be added at each boosting iteration, or consist of a single weak learner. [sent-188, score-0.345]

79 This requires the computation of the functional derivatives < δRC (F(fk )), g > = (1/|St |) Σi yi^s d/dǫ LC [C(F(fk + ǫg))(xi )] |ǫ=0 (34), where yi^s = I(yi = −1). [sent-191, score-0.12]

80 The derivative of (4) with respect to the k-th stage predictor is then < δL(F(fk )), g > = < δRL (F(fk )), g > + η < δRC (F(fk )), g > (38) = Σi [ −yi wi^k bi^k / |St | + η yi^s ψi^k βi^k / |St | ] g(xi ) (39), with wi^k = e^{−yi ak (xi )} and ak and bk given by (14), (18), and (20). [sent-195, score-0.543]

81 Given a set of weak learners U, the optimal descent direction and step size for the k-th stage are then gk∗ = arg max g∈U < −δL(F(fk )), g > (40) and ck∗ = arg min c∈R (1/|St |) Σi wi^k e^{−yi bi^k c gk∗ (xi )} + (η/|St |) Σi yi^s ψi^k βi^k e^{c gk∗ (xi )} . [sent-196, score-0.489]

82 The one that most reduces (4) is selected as the best update for the k-th stage and the stage with the largest impact is updated. [sent-198, score-0.3]

83 For example, the risk of CS-AdaBoost: RL (f ) = EX,Y {y^c e^{−yf (x)} } [12] or Asym-AdaBoost: RL (f ) = EX,Y {e^{−y^c yf (x)} } [8], where y^c = C I(y = −1) + (1 − C) I(y = 1) and C is a cost factor. [sent-203, score-0.166]
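
For reference, a direct transcription of these two risks into code (our sketch, with toy inputs):

```python
import numpy as np

# Sketch (ours) of the two cost-sensitive risks quoted above ([12], [8]).
def y_cost(y, C):
    """y^c = C*I(y = -1) + (1 - C)*I(y = 1)."""
    return np.where(y == -1, C, 1.0 - C)

def cs_adaboost_risk(y, f_x, C):
    """CS-AdaBoost: E{ y^c exp(-y f(x)) }."""
    return np.mean(y_cost(y, C) * np.exp(-y * f_x))

def asym_adaboost_risk(y, f_x, C):
    """Asym-AdaBoost: E{ exp(-y^c y f(x)) }."""
    return np.mean(np.exp(-y_cost(y, C) * y * f_x))

y, f_x = np.array([1, 1, -1, -1]), np.array([0.4, -0.2, -0.6, 0.1])
print(cs_adaboost_risk(y, f_x, 0.8), asym_adaboost_risk(y, f_x, 0.8))
```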

84 [Table: train set sizes (pos/neg) 9,000/9,000, 1,000/10,000, 1,000/10,000; test set sizes (pos/neg) 832/832, 100/2,000, 200/2,000.] [sent-205, score-0.094]

85 Right: Trade-off between the error (RL ) and complexity (RC ) components of the risk as η changes in (4). [sent-209, score-0.108]

86 Since FCBoost learns all cascade stages simultaneously, and any stage can change after bootstrapping, this condition is violated. [sent-240, score-0.695]

87 This method is used to update the training set whenever the false positive rate of the cascade being learned reaches 50%. [sent-243, score-0.523]
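
A sketch of ours of that bootstrapping schedule; the names negative_pool and cascade_scores are placeholders (not from the paper), and in practice the freshly drawn negatives would themselves be filtered through the current cascade.

```python
import numpy as np

# Sketch (ours) of the bootstrapping rule described above.  `cascade_scores` stands for
# the current cascade predictor F evaluated on the negative training examples, and
# `negative_pool` for the large pool they are resampled from; both names are placeholders.

def maybe_bootstrap(negatives, cascade_scores, negative_pool, rng):
    """Once the false positive rate drops to 50%, replace rejected negatives with fresh ones."""
    false_pos = cascade_scores >= 0                  # negatives the current cascade accepts
    if np.mean(false_pos) > 0.5:
        return negatives                             # not time to bootstrap yet
    kept = negatives[false_pos]                      # keep the hard (still accepted) ones
    fresh = rng.choice(len(negative_pool), size=len(negatives) - len(kept), replace=False)
    return np.concatenate([kept, negative_pool[fresh]])

rng = np.random.default_rng(0)
negatives = rng.normal(size=(100, 2))
pool = rng.normal(size=(10000, 2))
scores = rng.normal(size=100) - 0.5                  # stand-in for F on the negatives
print(maybe_bootstrap(negatives, scores, pool, rng).shape)
```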

88 Figure 2 plots the accuracy component of the risk, RL , as a function of the complexity component, RC , on the face data set, for cascades trained with different η. [sent-248, score-0.18]

89 As expected, as η decreases the cascade has lower error and higher complexity. [sent-251, score-0.494]

90 Cascade comparison: Figure 3 (a) repeats the plots of the Lagrangian of the risk shown in Figure 1, for classifiers trained with 50 boosting iterations, on the face data. [sent-254, score-0.275]

91 This reflects the fact that FCBoost (η = 0) does produce a cascade, but this cascade has worse accuracy/complexity trade-off than that of ChainBoost. [sent-258, score-0.494]

92 On the other hand, ChainBoost cascades are always the fastest, at the cost of the highest classification error. [sent-263, score-0.099]

93 FCBoost (η = 0.02) achieves the best accuracy/complexity trade-off: its cascade has the lowest risk Lagrangian L. [sent-265, score-0.566]

94 Figure 3: a) Lagrangian of the risk (plotted against boosting iterations) for classifiers trained with various boosting algorithms. [sent-275, score-0.216]

95 b) ROC of various detector cascades on the MIT-CMU data set. [sent-276, score-0.203]

96 2 Face detection: We finish with a face detector designed with FCBoost (η = 0. [sent-283, score-0.177]

97 To make the detector cost-sensitive, we used CS-AdaBoost with C = 0. [sent-285, score-0.118]

98 Table 2 presents a similar comparison for the detector speed (average number of features evaluated per patch). [sent-288, score-0.13]

99 Note the superior performance of the FCBoost cascade in terms of both accuracy and speed. [sent-289, score-0.494]

100 To the best of our knowledge, this is the fastest face detector reported to date. [sent-290, score-0.192]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('fk', 0.648), ('cascade', 0.494), ('fcboost', 0.273), ('chainboost', 0.169), ('boosting', 0.144), ('stage', 0.127), ('fm', 0.124), ('detector', 0.118), ('rl', 0.118), ('lc', 0.118), ('adaboost', 0.099), ('st', 0.093), ('predictor', 0.09), ('cascades', 0.085), ('rc', 0.085), ('yf', 0.08), ('weak', 0.074), ('stages', 0.074), ('risk', 0.072), ('yi', 0.06), ('face', 0.059), ('bootstrapping', 0.058), ('haar', 0.057), ('bk', 0.055), ('learners', 0.049), ('lagrangian', 0.049), ('classi', 0.046), ('design', 0.046), ('xi', 0.046), ('pedestrian', 0.044), ('neutral', 0.043), ('wi', 0.039), ('vj', 0.038), ('complexity', 0.036), ('ak', 0.035), ('floatboost', 0.034), ('waldboost', 0.034), ('tanh', 0.033), ('er', 0.032), ('th', 0.031), ('car', 0.03), ('false', 0.029), ('ers', 0.027), ('cgk', 0.026), ('neg', 0.026), ('saberian', 0.026), ('rejected', 0.025), ('detection', 0.024), ('bm', 0.023), ('efk', 0.023), ('gl', 0.022), ('positives', 0.022), ('nuno', 0.021), ('pos', 0.021), ('hi', 0.021), ('embedded', 0.02), ('declared', 0.02), ('negatives', 0.02), ('fj', 0.02), ('detectors', 0.019), ('enhancements', 0.019), ('jolla', 0.019), ('cg', 0.019), ('viola', 0.019), ('supervision', 0.018), ('jones', 0.017), ('accounts', 0.017), ('sensitive', 0.017), ('recursive', 0.017), ('cl', 0.016), ('letting', 0.016), ('fastest', 0.015), ('rejection', 0.015), ('iterations', 0.015), ('impact', 0.015), ('arg', 0.015), ('sgn', 0.015), ('trade', 0.015), ('fi', 0.015), ('gk', 0.015), ('cost', 0.014), ('implements', 0.014), ('guration', 0.014), ('diego', 0.014), ('loss', 0.014), ('compatible', 0.014), ('direction', 0.013), ('predictors', 0.013), ('manual', 0.013), ('optimal', 0.013), ('designing', 0.012), ('roc', 0.012), ('derivative', 0.012), ('examples', 0.012), ('la', 0.012), ('features', 0.012), ('descent', 0.012), ('recursively', 0.012), ('oat', 0.011), ('outstanding', 0.011)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 42 nips-2010-Boosting Classifier Cascades

Author: Nuno Vasconcelos, Mohammad J. Saberian

Abstract: The problem of optimal and automatic design of a detector cascade is considered. A novel mathematical model is introduced for a cascaded detector. This model is analytically tractable, leads to recursive computation, and accounts for both classification and complexity. A boosting algorithm, FCBoost, is proposed for fully automated cascade design. It exploits the new cascade model, minimizes a Lagrangian cost that accounts for both classification risk and complexity. It searches the space of cascade configurations to automatically determine the optimal number of stages and their predictors, and is compatible with bootstrapping of negative examples and cost sensitive learning. Experiments show that the resulting cascades have state-of-the-art performance in various computer vision problems. 1

2 0.47915101 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers

Author: Leonidas Lefakis, Francois Fleuret

Abstract: The standard strategy for efficient object detection consists of building a cascade composed of several binary classifiers. The detection process takes the form of a lazy evaluation of the conjunction of the responses of these classifiers, and concentrates the computation on difficult parts of the image which cannot be trivially rejected. We introduce a novel algorithm to construct jointly the classifiers of such a cascade, which interprets the response of a classifier as the probability of a positive prediction, and the overall response of the cascade as the probability that all the predictions are positive. From this noisy-AND model, we derive a consistent loss and a Boosting procedure to optimize that global probability on the training set. Such a joint learning allows the individual predictors to focus on a more restricted modeling problem, and improves the performance compared to a standard cascade. We demonstrate the efficiency of this approach on face and pedestrian detection with standard data-sets and comparisons with reference baselines. 1

3 0.21120144 239 nips-2010-Sidestepping Intractable Inference with Structured Ensemble Cascades

Author: David Weiss, Benjamin Sapp, Ben Taskar

Abstract: For many structured prediction problems, complex models often require adopting approximate inference techniques such as variational methods or sampling, which generally provide no satisfactory accuracy guarantees. In this work, we propose sidestepping intractable inference altogether by learning ensembles of tractable sub-models as part of a structured prediction cascade. We focus in particular on problems with high-treewidth and large state-spaces, which occur in many computer vision tasks. Unlike other variational methods, our ensembles do not enforce agreement between sub-models, but filter the space of possible outputs by simply adding and thresholding the max-marginals of each constituent model. Our framework jointly estimates parameters for all models in the ensemble for each level of the cascade by minimizing a novel, convex loss function, yet requires only a linear increase in computation over learning or inference in a single tractable sub-model. We provide a generalization bound on the filtering loss of the ensemble as a theoretical justification of our approach, and we evaluate our method on both synthetic data and the task of estimating articulated human pose from challenging videos. We find that our approach significantly outperforms loopy belief propagation on the synthetic data and a state-of-the-art model on the pose estimation/tracking problem. 1

4 0.1171642 15 nips-2010-A Theory of Multiclass Boosting

Author: Indraneel Mukherjee, Robert E. Schapire

Abstract: Boosting combines weak classifiers to form highly accurate predictors. Although the case of binary classification is well understood, in the multiclass setting, the “correct” requirements on the weak classifier, or the notion of the most efficient boosting algorithms are missing. In this paper, we create a broad and general framework, within which we make precise and identify the optimal requirements on the weak-classifier, as well as design the most effective, in a certain sense, boosting algorithms that assume such requirements. 1

5 0.1121522 282 nips-2010-Variable margin losses for classifier design

Author: Hamed Masnadi-shirazi, Nuno Vasconcelos

Abstract: The problem of controlling the margin of a classifier is studied. A detailed analytical study is presented on how properties of the classification risk, such as its optimal link and minimum risk functions, are related to the shape of the loss, and its margin enforcing properties. It is shown that for a class of risks, denoted canonical risks, asymptotic Bayes consistency is compatible with simple analytical relationships between these functions. These enable a precise characterization of the loss for a popular class of link functions. It is shown that, when the risk is in canonical form and the link is inverse sigmoidal, the margin properties of the loss are determined by a single parameter. Novel families of Bayes consistent loss functions, of variable margin, are derived. These families are then used to design boosting style algorithms with explicit control of the classification margin. The new algorithms generalize well established approaches, such as LogitBoost. Experimental results show that the proposed variable margin losses outperform the fixed margin counterparts used by existing algorithms. Finally, it is shown that best performance can be achieved by cross-validating the margin parameter. 1

6 0.082674451 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models

7 0.076805636 190 nips-2010-On the Convexity of Latent Social Network Inference

8 0.055018805 243 nips-2010-Smoothness, Low Noise and Fast Rates

9 0.053203776 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision

10 0.052091345 184 nips-2010-Nonparametric Bayesian Policy Priors for Reinforcement Learning

11 0.051546775 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability

12 0.048873536 52 nips-2010-Convex Multiple-Instance Learning by Estimating Likelihood Ratio

13 0.047474392 174 nips-2010-Multi-label Multiple Kernel Learning by Stochastic Approximation: Application to Visual Object Recognition

14 0.046500362 75 nips-2010-Empirical Risk Minimization with Approximations of Probabilistic Grammars

15 0.046498325 192 nips-2010-Online Classification with Specificity Constraints

16 0.045021493 143 nips-2010-Learning Convolutional Feature Hierarchies for Visual Recognition

17 0.044940546 66 nips-2010-Double Q-learning

18 0.044838306 83 nips-2010-Evidence-Specific Structures for Rich Tractable CRFs

19 0.044565555 28 nips-2010-An Alternative to Low-level-Sychrony-Based Methods for Speech Detection

20 0.041945744 22 nips-2010-Active Estimation of F-Measures


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.119), (1, 0.022), (2, 0.009), (3, -0.109), (4, 0.037), (5, 0.051), (6, -0.164), (7, -0.023), (8, 0.034), (9, 0.028), (10, -0.157), (11, 0.047), (12, 0.054), (13, 0.146), (14, -0.05), (15, 0.07), (16, 0.179), (17, 0.3), (18, -0.463), (19, 0.185), (20, -0.027), (21, -0.027), (22, -0.1), (23, 0.017), (24, -0.081), (25, 0.013), (26, -0.103), (27, -0.076), (28, -0.109), (29, -0.016), (30, -0.078), (31, -0.01), (32, 0.026), (33, 0.06), (34, 0.005), (35, -0.038), (36, 0.004), (37, -0.025), (38, 0.029), (39, 0.008), (40, 0.051), (41, -0.014), (42, 0.021), (43, 0.003), (44, -0.009), (45, 0.045), (46, -0.012), (47, 0.033), (48, 0.034), (49, 0.075)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96783412 42 nips-2010-Boosting Classifier Cascades

Author: Nuno Vasconcelos, Mohammad J. Saberian

Abstract: The problem of optimal and automatic design of a detector cascade is considered. A novel mathematical model is introduced for a cascaded detector. This model is analytically tractable, leads to recursive computation, and accounts for both classification and complexity. A boosting algorithm, FCBoost, is proposed for fully automated cascade design. It exploits the new cascade model, minimizes a Lagrangian cost that accounts for both classification risk and complexity. It searches the space of cascade configurations to automatically determine the optimal number of stages and their predictors, and is compatible with bootstrapping of negative examples and cost sensitive learning. Experiments show that the resulting cascades have state-of-the-art performance in various computer vision problems. 1

2 0.84840059 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers

Author: Leonidas Lefakis, Francois Fleuret

Abstract: The standard strategy for efficient object detection consists of building a cascade composed of several binary classifiers. The detection process takes the form of a lazy evaluation of the conjunction of the responses of these classifiers, and concentrates the computation on difficult parts of the image which cannot be trivially rejected. We introduce a novel algorithm to construct jointly the classifiers of such a cascade, which interprets the response of a classifier as the probability of a positive prediction, and the overall response of the cascade as the probability that all the predictions are positive. From this noisy-AND model, we derive a consistent loss and a Boosting procedure to optimize that global probability on the training set. Such a joint learning allows the individual predictors to focus on a more restricted modeling problem, and improves the performance compared to a standard cascade. We demonstrate the efficiency of this approach on face and pedestrian detection with standard data-sets and comparisons with reference baselines. 1

3 0.5784632 239 nips-2010-Sidestepping Intractable Inference with Structured Ensemble Cascades

Author: David Weiss, Benjamin Sapp, Ben Taskar

Abstract: For many structured prediction problems, complex models often require adopting approximate inference techniques such as variational methods or sampling, which generally provide no satisfactory accuracy guarantees. In this work, we propose sidestepping intractable inference altogether by learning ensembles of tractable sub-models as part of a structured prediction cascade. We focus in particular on problems with high-treewidth and large state-spaces, which occur in many computer vision tasks. Unlike other variational methods, our ensembles do not enforce agreement between sub-models, but filter the space of possible outputs by simply adding and thresholding the max-marginals of each constituent model. Our framework jointly estimates parameters for all models in the ensemble for each level of the cascade by minimizing a novel, convex loss function, yet requires only a linear increase in computation over learning or inference in a single tractable sub-model. We provide a generalization bound on the filtering loss of the ensemble as a theoretical justification of our approach, and we evaluate our method on both synthetic data and the task of estimating articulated human pose from challenging videos. We find that our approach significantly outperforms loopy belief propagation on the synthetic data and a state-of-the-art model on the pose estimation/tracking problem. 1

4 0.49670812 15 nips-2010-A Theory of Multiclass Boosting

Author: Indraneel Mukherjee, Robert E. Schapire

Abstract: Boosting combines weak classifiers to form highly accurate predictors. Although the case of binary classification is well understood, in the multiclass setting, the “correct” requirements on the weak classifier, or the notion of the most efficient boosting algorithms are missing. In this paper, we create a broad and general framework, within which we make precise and identify the optimal requirements on the weak-classifier, as well as design the most effective, in a certain sense, boosting algorithms that assume such requirements. 1

5 0.36022621 282 nips-2010-Variable margin losses for classifier design

Author: Hamed Masnadi-shirazi, Nuno Vasconcelos

Abstract: The problem of controlling the margin of a classifier is studied. A detailed analytical study is presented on how properties of the classification risk, such as its optimal link and minimum risk functions, are related to the shape of the loss, and its margin enforcing properties. It is shown that for a class of risks, denoted canonical risks, asymptotic Bayes consistency is compatible with simple analytical relationships between these functions. These enable a precise characterization of the loss for a popular class of link functions. It is shown that, when the risk is in canonical form and the link is inverse sigmoidal, the margin properties of the loss are determined by a single parameter. Novel families of Bayes consistent loss functions, of variable margin, are derived. These families are then used to design boosting style algorithms with explicit control of the classification margin. The new algorithms generalize well established approaches, such as LogitBoost. Experimental results show that the proposed variable margin losses outperform the fixed margin counterparts used by existing algorithms. Finally, it is shown that best performance can be achieved by cross-validating the margin parameter. 1

6 0.34669098 190 nips-2010-On the Convexity of Latent Social Network Inference

7 0.3335847 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models

8 0.27436876 192 nips-2010-Online Classification with Specificity Constraints

9 0.25267243 175 nips-2010-Multiparty Differential Privacy via Aggregation of Locally Trained Classifiers

10 0.22894678 228 nips-2010-Reverse Multi-Label Learning

11 0.22875622 24 nips-2010-Active Learning Applied to Patient-Adaptive Heartbeat Classification

12 0.19768627 28 nips-2010-An Alternative to Low-level-Sychrony-Based Methods for Speech Detection

13 0.18416503 52 nips-2010-Convex Multiple-Instance Learning by Estimating Likelihood Ratio

14 0.17859223 121 nips-2010-Improving Human Judgments by Decontaminating Sequential Dependencies

15 0.17704621 19 nips-2010-A rational decision making framework for inhibitory control

16 0.17134729 22 nips-2010-Active Estimation of F-Measures

17 0.17052409 278 nips-2010-Universal Consistency of Multi-Class Support Vector Classification

18 0.16262197 173 nips-2010-Multi-View Active Learning in the Non-Realizable Case

19 0.16106969 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision

20 0.16021632 115 nips-2010-Identifying Dendritic Processing


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.026), (17, 0.016), (27, 0.037), (30, 0.034), (45, 0.145), (50, 0.455), (52, 0.023), (60, 0.025), (77, 0.05), (78, 0.031), (90, 0.033)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.90032387 126 nips-2010-Inference with Multivariate Heavy-Tails in Linear Models

Author: Danny Bickson, Carlos Guestrin

Abstract: Heavy-tailed distributions naturally occur in many real life problems. Unfortunately, it is typically not possible to compute inference in closed-form in graphical models which involve such heavy-tailed distributions. In this work, we propose a novel simple linear graphical model for independent latent random variables, called linear characteristic model (LCM), defined in the characteristic function domain. Using stable distributions, a heavy-tailed family of distributions which is a generalization of Cauchy, L´ vy and Gaussian distrie butions, we show for the first time, how to compute both exact and approximate inference in such a linear multivariate graphical model. LCMs are not limited to stable distributions, in fact LCMs are always defined for any random variables (discrete, continuous or a mixture of both). We provide a realistic problem from the field of computer networks to demonstrate the applicability of our construction. Other potential application is iterative decoding of linear channels with non-Gaussian noise. 1

2 0.89935237 120 nips-2010-Improvements to the Sequence Memoizer

Author: Jan Gasthaus, Yee W. Teh

Abstract: The sequence memoizer is a model for sequence data with state-of-the-art performance on language modeling and compression. We propose a number of improvements to the model and inference algorithm, including an enlarged range of hyperparameters, a memory-efficient representation, and inference algorithms operating on the new representation. Our derivations are based on precise definitions of the various processes that will also allow us to provide an elementary proof of the “mysterious” coagulation and fragmentation properties used in the original paper on the sequence memoizer by Wood et al. (2009). We present some experimental results supporting our improvements. 1

3 0.87769932 101 nips-2010-Gaussian sampling by local perturbations

Author: George Papandreou, Alan L. Yuille

Abstract: We present a technique for exact simulation of Gaussian Markov random fields (GMRFs), which can be interpreted as locally injecting noise to each Gaussian factor independently, followed by computing the mean/mode of the perturbed GMRF. Coupled with standard iterative techniques for the solution of symmetric positive definite systems, this yields a very efficient sampling algorithm with essentially linear complexity in terms of speed and memory requirements, well suited to extremely large scale probabilistic models. Apart from synthesizing data under a Gaussian model, the proposed technique directly leads to an efficient unbiased estimator of marginal variances. Beyond Gaussian models, the proposed algorithm is also very useful for handling highly non-Gaussian continuously-valued MRFs such as those arising in statistical image modeling or in the first layer of deep belief networks describing real-valued data, where the non-quadratic potentials coupling different sites can be represented as finite or infinite mixtures of Gaussians with the help of local or distributed latent mixture assignment variables. The Bayesian treatment of such models most naturally involves a block Gibbs sampler which alternately draws samples of the conditionally independent latent mixture assignments and the conditionally multivariate Gaussian continuous vector and we show that it can directly benefit from the proposed methods. 1

same-paper 4 0.8634581 42 nips-2010-Boosting Classifier Cascades

Author: Nuno Vasconcelos, Mohammad J. Saberian

Abstract: The problem of optimal and automatic design of a detector cascade is considered. A novel mathematical model is introduced for a cascaded detector. This model is analytically tractable, leads to recursive computation, and accounts for both classification and complexity. A boosting algorithm, FCBoost, is proposed for fully automated cascade design. It exploits the new cascade model, minimizes a Lagrangian cost that accounts for both classification risk and complexity. It searches the space of cascade configurations to automatically determine the optimal number of stages and their predictors, and is compatible with bootstrapping of negative examples and cost sensitive learning. Experiments show that the resulting cascades have state-of-the-art performance in various computer vision problems. 1

5 0.81100029 147 nips-2010-Learning Multiple Tasks with a Sparse Matrix-Normal Penalty

Author: Yi Zhang, Jeff G. Schneider

Abstract: In this paper, we propose a matrix-variate normal penalty with sparse inverse covariances to couple multiple tasks. Learning multiple (parametric) models can be viewed as estimating a matrix of parameters, where rows and columns of the matrix correspond to tasks and features, respectively. Following the matrix-variate normal density, we design a penalty that decomposes the full covariance of matrix elements into the Kronecker product of row covariance and column covariance, which characterizes both task relatedness and feature representation. Several recently proposed methods are variants of the special cases of this formulation. To address the overfitting issue and select meaningful task and feature structures, we include sparse covariance selection into our matrix-normal regularization via ℓ1 penalties on task and feature inverse covariances. We empirically study the proposed method and compare with related models in two real-world problems: detecting landmines in multiple fields and recognizing faces between different subjects. Experimental results show that the proposed framework provides an effective and flexible way to model various different structures of multiple tasks.

6 0.75512469 33 nips-2010-Approximate inference in continuous time Gaussian-Jump processes

7 0.60592067 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata

8 0.60335416 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes

9 0.60314918 54 nips-2010-Copula Processes

10 0.59111786 113 nips-2010-Heavy-Tailed Process Priors for Selective Shrinkage

11 0.57558924 242 nips-2010-Slice sampling covariance hyperparameters of latent Gaussian models

12 0.57171249 96 nips-2010-Fractionally Predictive Spiking Neurons

13 0.56760412 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing

14 0.55870092 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning

15 0.54751503 154 nips-2010-Learning sparse dynamic linear systems using stable spline kernels and exponential hyperpriors

16 0.54293734 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers

17 0.54133254 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior

18 0.53670275 122 nips-2010-Improving the Asymptotic Performance of Markov Chain Monte-Carlo by Inserting Vortices

19 0.53635556 158 nips-2010-Learning via Gaussian Herding

20 0.53585702 217 nips-2010-Probabilistic Multi-Task Feature Selection