nips nips2007 nips2007-137 knowledge-graph by maker-knowledge-mining

137 nips-2007-Multiple-Instance Pruning For Learning Efficient Cascade Detectors


Source: pdf

Author: Cha Zhang, Paul A. Viola

Abstract: Cascade detectors have been shown to operate extremely rapidly, with high accuracy, and have important applications such as face detection. Driven by this success, cascade learning has been an area of active research in recent years. Nevertheless, there are still challenging technical problems during the training process of cascade detectors. In particular, determining the optimal target detection rate for each stage of the cascade remains an unsolved issue. In this paper, we propose the multiple instance pruning (MIP) algorithm for soft cascades. This algorithm computes a set of thresholds which aggressively terminate computation with no reduction in detection rate or increase in false positive rate on the training dataset. The algorithm is based on two key insights: i) examples that are destined to be rejected by the complete classifier can be safely pruned early; ii) face detection is a multiple instance learning problem. The MIP process is fully automatic and requires no assumptions of probability distributions, statistical independence, or ad hoc intermediate rejection targets. Experimental results on the MIT+CMU dataset demonstrate significant performance advantages. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 In particular, determining the optimal target detection rate for each stage of the cascade remains an unsolved issue. [sent-5, score-0.592]

2 In this paper, we propose the multiple instance pruning (MIP) algorithm for soft cascades. [sent-6, score-0.299]

3 This algorithm computes a set of thresholds which aggressively terminate computation with no reduction in detection rate or increase in false positive rate on the training dataset. [sent-7, score-0.906]

4 The algorithm is based on two key insights: i) examples that are destined to be rejected by the complete classifier can be safely pruned early; ii) face detection is a multiple instance learning problem. [sent-8, score-0.834]

5 The MIP process is fully automatic and requires no assumptions of probability distributions, statistical independence, or ad hoc intermediate rejection targets. [sent-9, score-0.463]

6 1 Introduction The state of the art in real-time face detection has progressed rapidly in recent years. [sent-11, score-0.408]

7 There are many required parameters: the number and shapes of rectangle filters, the number of stages, the number of weak classifiers in each stage, and the target detection rate for each cascade stage. [sent-14, score-0.691]

8 In each paper, the original cascade structure of distinct and separate stages is relaxed so that earlier computation of weak classifier scores can be combined with later weak classifiers. [sent-19, score-0.465]

9 The score assigned to a detection window by the soft cascade is simply a weighted sum of the weak classifiers: s_k(T) = Σ_{j ≤ T} α_j h_j(x_k), where T is the total number of weak classifiers; h_j(x_k) is the j-th feature computed on example x_k; α_j is the vote on weak classifier j. [sent-21, score-1.057]

10 Computation of the sum is terminated early whenever the partial sum falls below a rejection threshold: s_k(t) < θ(t). [sent-22, score-0.438]
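
To make the early-termination mechanism concrete, here is a minimal Python sketch of soft-cascade evaluation; the function and variable names are our own illustration, not code from the paper.

```python
def soft_cascade_score(x, weak_classifiers, alphas, thetas):
    """Evaluate a soft cascade on a single window x (illustrative sketch).

    weak_classifiers: list of T callables h_j(x) returning a real-valued response.
    alphas: list of T vote weights alpha_j.
    thetas: list of T rejection thresholds theta(t), the last being the final
            threshold theta(T).
    Returns (score, t_rejected): the partial sum s_k(t) and the stage at which
    the window was rejected, or (s_k(T), None) if it survives the whole cascade.
    """
    s = 0.0
    for t, (h, alpha) in enumerate(zip(weak_classifiers, alphas)):
        s += alpha * h(x)          # partial sum s_k(t) = sum_{j <= t} alpha_j * h_j(x)
        if s < thetas[t]:          # early termination: s_k(t) < theta(t)
            return s, t
    return s, None
```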

11 Note the soft cascade is similar to, but simpler than, both the boosting chain approach of Xiao, Zhu, and Zhang and the WaldBoost approach of Sochman and Matas. [sent-23, score-0.366]

12 The rejection thresholds θ(t), t ∈ {1, · · · , T − 1} are critical to the performance and speed of the complete classifier. [sent-24, score-0.698]

13 One possibility is to set the rejection thresholds so that no positive example is lost; this leads to very conservative thresholds and a very slow detector. [sent-26, score-0.965]

14 Since the complete classifier will not achieve 100% detection (given practical considerations, the final threshold of the complete classifier is set to reject some positive examples because they are difficult to detect), [sent-27, score-0.798]

15 it seems justified to reject positive examples early in return for fast detection speed. [sent-29, score-0.622]

16 The main question is which positive examples can be rejected and when. [sent-30, score-0.303]

17 A key criticism of all previous cascade learning approaches is that none has a scheme to determine which examples are best to reject. [sent-31, score-0.306]

18 Viola-Jones attempted to reject zero positive examples until this became impossible and then reluctantly gave up on one positive example at a time. [sent-32, score-0.406]

19 Bourdev and Brandt proposed a method for setting rejection thresholds based on an ad hoc detection rate target called a “rejection distribution vector”, which is a parameterized exponential curve. [sent-33, score-1.036]

20 Sochman-Matas used a ratio test to determine the rejection thresholds. [sent-36, score-0.354]

21 This is a particular problem for the first few rejection thresholds, and can lead to low detection rates on test data. [sent-38, score-0.614]

22 This paper proposes a new mechanism for setting the rejection thresholds of any soft-cascade which is conceptually simple, has no tunable parameters beyond the final detection rate target, yet yields a cascade which is both highly accurate and very fast. [sent-39, score-1.145]

23 Training data is used to set all reject thresholds after the final classifier is learned. [sent-40, score-0.384]

24 There are no assumptions about probability distributions, statistical independence, or ad hoc intermediate targets for detection rate (or false positive rate). [sent-41, score-0.624]

25 We propose a novel algorithm, multiple instance pruning (MIP), to set the reject thresholds automatically, which results in a very efficient cascade detector with superior performance. [sent-43, score-0.956]

26 Our key insight is quite simple: the reject thresholds are set so that they give up on precisely those positive examples which are rejected by the complete classifier. [sent-49, score-0.735]

27 The full classifier rejects a positive example if its final score s_k(T) falls below the final threshold θ(T). [sent-51, score-0.351]

28 In the simplest version of our threshold setting algorithm, all trajectories from positive windows which fall below the final threshold are removed. [sent-52, score-0.692]

29 Each rejection threshold is then simply: θ(t) = min_{k : s_k(T) > θ(T), y_k = 1} s_k(t), where {x_k, y_k} is the training set in which y_k = 1 indicates positive windows and y_k = −1 indicates negative windows. [sent-53, score-1.134]
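
A direct implementation of this rule (what the paper later refers to as direct backward pruning, DBP) is straightforward once all partial scores have been recorded; the array layout below is our own assumption for illustration.

```python
import numpy as np

def dbp_thresholds(scores, labels, final_threshold):
    """Direct backward pruning (DBP) thresholds (illustrative sketch).

    scores: (K, T) array; scores[k, t] is the partial score of window k after
            the first t+1 weak classifiers.
    labels: (K,) array of window labels, +1 for positive and -1 for negative.
    final_threshold: the final threshold theta(T) of the complete classifier.
    Returns a (T,) array of rejection thresholds: at each stage, the minimum
    partial score over the positive windows that pass the final threshold.
    """
    retained = (labels == 1) & (scores[:, -1] > final_threshold)
    return scores[retained].min(axis=0)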

30 These thresholds produce a reasonably fast classifier which is guaranteed to produce no more errors than the complete classifier (on the training dataset). [sent-54, score-0.415]

31 One might question whether the minimum of all retained trajectories is robust to mislabeled or noisy examples in the training set. [sent-56, score-0.366]

32 Note that the final threshold of the complete classifier will often reject mislabeled or noisy examples (though they will be considered false negatives). [sent-57, score-0.525]

33 Such mislabeled or noisy examples play no role in setting the rejection thresholds. [sent-60, score-0.439]

34 In past approaches, thresholds are set to reject the largest number of negative examples and only a small percentage of positive examples. [sent-62, score-0.61]

35 In the new approach, the final threshold of the complete soft-cascade is set to achieve the required detection rate. [sent-64, score-0.441]

36 Rejection thresholds are then set to reject the largest number of negative examples and retain all positive examples which are retained by the complete classifier. [sent-65, score-0.889]

37 The important difference is that the particular positive examples which are rejected are those which are destined to be rejected by the final classifier. [sent-66, score-0.458]

38 Note, some negative examples that eventually pass the complete classifier threshold may be pruned by earlier rejection thresholds. [sent-69, score-0.785]

39 In contrast, although the detection rate on the training set can also be guaranteed in Bourdev-Brandt’s algorithm, there is no guarantee that false positive rate will not increase. [sent-71, score-0.65]

40 Bourdev-Brandt propose reordering the weak classifiers based on the separation between the mean score of the positive examples and the mean score of the negative examples. [sent-72, score-0.428]
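
The reordering heuristic can be sketched as follows; this is our reading of the idea (ordering by the separation of class-conditional mean responses), with assumed variable names, not Bourdev-Brandt's actual code.

```python
import numpy as np

def separation_order(h_pos, h_neg, alphas):
    """Order weak classifiers by separation between class-conditional means.

    h_pos: (P, T) weak-classifier responses h_j(x) on positive examples.
    h_neg: (N, T) responses on negative examples.
    alphas: (T,) vote weights.
    Returns indices sorting weak classifiers by decreasing separation, so the
    most discriminative ones are evaluated first.
    """
    separation = alphas * (h_pos.mean(axis=0) - h_neg.mean(axis=0))
    return np.argsort(-separation)
```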

41 While the rejection thresholds are learned using a large set of training examples, this one image demonstrates the basic concepts. [sent-75, score-0.665]

42 The single physical face is consistent with a set of positive detection windows that are within an acceptable range of positions and scales. [sent-77, score-0.809]

43 Typically there are tens of acceptable windows for each face. [sent-78, score-0.304]

44 The blue and magenta trajectories correspond to acceptable windows which fall above the final detection threshold. [sent-79, score-0.73]

45 The cyan trajectories are potentially positive windows which fall below the final threshold. [sent-80, score-0.477]

46 Since the cyan trajectories are rejected by the final classifier, rejection thresholds need only retain the blue and magenta trajectories. [sent-81, score-0.947]

47 Compared with existing approaches, which set the reject thresholds in a heuristic manner, our approach is data-driven and hence more principled. [sent-86, score-0.384]

48 3 Multiple Instance Pruning The notion of an “acceptable detection window” plays a critical role in an improved process for setting rejection thresholds. [sent-87, score-0.614]

49 Recall that the detection process scans the image generating a large, but finite, collection of overlapping windows at various scales and locations. [sent-94, score-0.459]

50 Even in the absence of ambiguity, some slop is required to ensure that at least one of the generated windows is considered a successful detection for each face. [sent-95, score-0.459]

51 Using typical scanning parameters this can lead to tens of windows which are all equally valid positive detections. [sent-97, score-0.296]

52 If any of these windows is classified positive then this face is considered detected. [sent-98, score-0.444]

53 Even though all face detection algorithms must address the “multiple window” issue, few papers have discussed it. [sent-99, score-0.448]

54 In this paper, we do not directly address soft-cascade learning, though we will incorporate the “multiple window” observation into the determination of the rejection thresholds. [sent-103, score-0.354]

55 One need only retain one “acceptable” window for each face which is detected by the final classifier. [sent-104, score-0.318]

56 A more aggressive threshold is defined as: θ(t) = min_{i ∈ P} max_{k ∈ F_i ∩ R_i, y_k = 1} s_k(t), where i is the index of ground-truth faces; F_i is the set of acceptable windows associated with ground-truth face i and R_i is the set of windows which are “retained” (see below). [sent-105, score-0.891]

57 We call this pruning method multiple instance pruning (MIP). [sent-108, score-0.347]

58 Initially the trajectories from the positive bags which fall above the final threshold are retained. [sent-113, score-0.414]

59 The set of retained examples is further reduced as the earlier thresholds are set. [sent-114, score-0.485]

60 This is in contrast to the simpler DBP algorithm where the thresholds are set to preserve all retained positive examples. [sent-115, score-0.463]

61 It guarantees the same face detection rate on the training dataset as the complete classifier. [sent-119, score-0.563]

62 Note that the algorithm is greedy, setting earlier thresholds first so that all positive bags are retained and the fewest number of negative examples pass. [sent-120, score-0.68]

63 Theoretically it is possible that delaying the rejection of a particular example may result in a better threshold at a later stage. [sent-121, score-0.487]

64 The MIP algorithm is however guaranteed to generate a soft-cascade that is at least as fast as DBP, since the criterion for setting the thresholds is less restrictive. [sent-123, score-0.313]

65 • A large training set (the whole training set used to learn the cascade detector can be reused here). [sent-126, score-0.508]

66 Collect all windows that are above the final threshold θ(T ). [sent-128, score-0.332]

67 Record all intermediate scores as s(i, j, t), where i = 1, · · · , N is the face index; j = 1, · · · , M_i is the index of windows that match face i; t = 1, · · · , T is the index of the feature node. [sent-129, score-0.597]

68 Set θ(t) = min_i s(i, t) − ε as the rejection threshold of node t, with ε = 10^−6. [sent-134, score-0.487]
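
Putting these steps together, a compact Python sketch of the MIP pruning loop might look like the following; the data layout (arrays indexed by window and stage) and all names are our assumptions for illustration, not the authors' code.

```python
import numpy as np

def mip_thresholds(scores, window_face, acceptable, passes_final, eps=1e-6):
    """Multiple instance pruning (MIP) rejection thresholds (illustrative sketch).

    scores: (K, T) array of partial scores s_k(t) for every scanned window k.
    window_face: (K,) ground-truth face index for each window, -1 for negatives.
    acceptable: (K,) boolean mask of windows in the acceptable range of
                position/scale for their face.
    passes_final: (K,) boolean mask of windows above the final threshold theta(T).
    Returns a (T,) array of rejection thresholds theta(t).
    """
    K, T = scores.shape
    thetas = np.empty(T)
    # Initially retain every acceptable positive window that passes the final threshold.
    retained = acceptable & passes_final & (window_face >= 0)
    faces = np.unique(window_face[retained])
    for t in range(T):
        # Bag score of face i: max partial score over its retained windows;
        # the threshold is the minimum bag score (minus a small epsilon).
        bag = np.array([scores[retained & (window_face == i), t].max() for i in faces])
        thetas[t] = bag.min() - eps
        # Greedy update: drop retained windows that fall below the new threshold;
        # each face keeps at least one window, so the detection rate is preserved.
        retained &= scores[:, t] >= thetas[t]
    return thetas
```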

69 (b) ROC curves of the detector after MIP pruning using the original training set. [sent-140, score-0.388]

70 A soft cascade classifier is learned through a new framework based on weight trimming and bootstrapping (see Appendix). [sent-144, score-0.499]

71 A detected rectangle is considered to be a true detection if it has less than 50% variation in shift and scale from the ground-truth. [sent-149, score-0.35]
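
As a concrete (assumed) interpretation of this matching criterion, a detected window can be checked against a ground-truth face as below; the (center, size) window representation is our own choice for illustration.

```python
def is_true_detection(window, face, max_rel=0.5):
    """Return True if a detected window matches a ground-truth face, i.e. its
    shift and scale differ from the ground truth by less than 50%.
    window, face: (cx, cy, size) tuples -- an assumed representation.
    """
    wx, wy, ws = window
    fx, fy, fs = face
    shift_ok = abs(wx - fx) < max_rel * fs and abs(wy - fy) < max_rel * fs
    scale_ok = abs(ws - fs) < max_rel * fs
    return shift_ok and scale_ok
```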

72 The original data set for learning the soft cascade is reused for pruning the detector. [sent-193, score-0.496]

73 For example, the two curves of the Bourdev-Brandt soft cascade in Figure 3(a). [sent-202, score-0.355]

74 One explanation is that in order to achieve higher detection rates one must retain windows which are “ambiguous” and may contain faces. [sent-204, score-0.496]

75 The proposed MIP-based detector yields a much lower false positive rate than the 25-feature Bourdev-Brandt soft cascade and nearly 35% improvement on detection speed. [sent-205, score-0.988]

76 One would expect that if the required detection rate is higher, it would be more difficult to prune. [sent-213, score-0.313]

77 In MIP, when the detection rate increases, there are two conflicting factors involved. [sent-214, score-0.313]

78 Second, for each face the number of retained and acceptable windows increases. [sent-216, score-0.561]

79 Both algorithms prune the strong classifier for a target detection rate of 97. [sent-223, score-0.368]

80 It can be seen that the MIP pruned detector has the best detection performance. [sent-228, score-0.499]

81 , α = 4), the B-B pruned detector is still worse than the MIP pruned detector, and its speed is 5 times slower (56. [sent-231, score-0.365]

82 Note, adjusting α leads to changes both in detection time and false positive rate. [sent-238, score-0.462]

83 MIP is fully automated and guarantees detection rate with no increase in false positive rate on the training set. [sent-240, score-0.622]

84 On the other hand, if speed is the dominant factor, one can specify a target detection rate and target execution time and use B-B to find a solution. [sent-242, score-0.414]

85 While with our detector the performance of B-B is acceptable even when α = −16, the performance of the detector in [1] drops significantly from 37 features to 25 features, as shown in Fig. [sent-262, score-0.409]

86 5 Conclusions We have presented a simple yet effective way to set the rejection thresholds of a given soft-cascade, called multiple instance pruning (MIP). [sent-265, score-0.81]

87 MIP then adds a set of rejection thresholds to construct a cascade detector. [sent-267, score-0.832]

88 The rejection thresholds are determined so that every face which was detected by the original strong classifier is guaranteed to be detected by the soft cascade. [sent-268, score-0.995]

89 The algorithm also guarantees that the false positive rate on the training set will not increase. [sent-269, score-0.309]

90 There is only one parameter used throughout the cascade training process, the target detection rate for the final system. [sent-270, score-0.619]

91 Moreover, there are no required assumptions about probability distributions, statistical independence, or ad hoc intermediate targets for detection rate or false positive rate. [sent-271, score-0.624]

92 These preliminary rejection thresholds are extremely conservative, retaining all positive examples in the training set. [sent-283, score-0.875]

93 We set the preliminary thresholds only to moderately speed up the computation of ROC curves before MIP. [sent-286, score-0.358]

94 Initialize • Take all positive examples and randomly sample negative examples to form a subset of Q examples. [sent-290, score-0.311]

95 • Take all positive examples and randomly sample negative examples from the trimmed training set to form a new subset of Q examples. [sent-303, score-0.365]

96 Set the preliminary rejection threshold θ(t) of Σ_{j=1}^{t} α_j h_j as the minimum score of all positive examples at stage t. [sent-305, score-0.786]

97 Output: weak classifiers h_t, t = 1, · · · , T, the associated confidences α_t, and preliminary rejection thresholds θ(t). [sent-306, score-0.667]
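
The preliminary thresholds in this training procedure can be computed in one step once the positive examples' partial scores are stored; again, the array layout is our assumption.

```python
import numpy as np

def preliminary_thresholds(pos_scores):
    """Preliminary rejection thresholds used during soft-cascade training.

    pos_scores: (P, T) array; pos_scores[p, t] is the partial score
    sum_{j<=t} alpha_j * h_j(x_p) of positive example p after stage t.
    Each theta(t) is the minimum positive-example score at stage t, so these
    conservative thresholds reject no positive training example; MIP later
    tightens them.
    """
    return pos_scores.min(axis=0)
```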

98 Rapid object detection using a boosted cascade of simple features. [sent-382, score-0.481]

99 Fast rotation invariant multi-view face detection based on real adaboost. [sent-398, score-0.408]

100 Learning a rare event detection cascade by direct feature selection. [sent-407, score-0.481]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mip', 0.447), ('rejection', 0.354), ('detection', 0.26), ('thresholds', 0.257), ('cascade', 0.221), ('windows', 0.199), ('detector', 0.152), ('face', 0.148), ('pruning', 0.148), ('dbp', 0.137), ('threshold', 0.133), ('er', 0.129), ('classi', 0.128), ('reject', 0.127), ('rejected', 0.121), ('trimming', 0.119), ('retained', 0.109), ('acceptable', 0.105), ('false', 0.105), ('soft', 0.1), ('nal', 0.099), ('positive', 0.097), ('trajectories', 0.091), ('window', 0.091), ('pruned', 0.087), ('examples', 0.085), ('cmu', 0.08), ('weak', 0.078), ('bourdev', 0.068), ('waldboost', 0.068), ('viola', 0.068), ('alpha', 0.063), ('score', 0.062), ('sk', 0.059), ('ers', 0.058), ('bags', 0.054), ('training', 0.054), ('detectors', 0.054), ('rate', 0.053), ('brandt', 0.051), ('cascaded', 0.051), ('sochman', 0.051), ('cyan', 0.051), ('complete', 0.048), ('rectangle', 0.048), ('zhang', 0.047), ('hoc', 0.046), ('boosting', 0.045), ('final', 0.045), ('yk', 0.045), ('negative', 0.044), ('detected', 0.042), ('frontal', 0.04), ('papers', 0.04), ('speed', 0.039), ('faces', 0.039), ('fall', 0.039), ('retain', 0.037), ('wu', 0.037), ('magenta', 0.036), ('weight', 0.035), ('ad', 0.035), ('destined', 0.034), ('floatboost', 0.034), ('indows', 0.034), ('xiao', 0.034), ('zhu', 0.034), ('curves', 0.034), ('earlier', 0.034), ('cvpr', 0.033), ('roc', 0.032), ('visited', 0.031), ('target', 0.031), ('bag', 0.03), ('xk', 0.03), ('dences', 0.03), ('nowlan', 0.03), ('adaboost', 0.029), ('preliminary', 0.028), ('stages', 0.028), ('guaranteed', 0.028), ('intermediate', 0.028), ('fast', 0.028), ('ht', 0.028), ('eccv', 0.027), ('reused', 0.027), ('negatives', 0.027), ('mislabeled', 0.027), ('aggressively', 0.027), ('instance', 0.027), ('stage', 0.027), ('scores', 0.026), ('early', 0.025), ('cult', 0.025), ('multiple', 0.024), ('platt', 0.024), ('bootstrapping', 0.024), ('aggressive', 0.024), ('index', 0.024), ('strong', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 137 nips-2007-Multiple-Instance Pruning For Learning Efficient Cascade Detectors

Author: Cha Zhang, Paul A. Viola

Abstract: Cascade detectors have been shown to operate extremely rapidly, with high accuracy, and have important applications such as face detection. Driven by this success, cascade learning has been an area of active research in recent years. Nevertheless, there are still challenging technical problems during the training process of cascade detectors. In particular, determining the optimal target detection rate for each stage of the cascade remains an unsolved issue. In this paper, we propose the multiple instance pruning (MIP) algorithm for soft cascades. This algorithm computes a set of thresholds which aggressively terminate computation with no reduction in detection rate or increase in false positive rate on the training dataset. The algorithm is based on two key insights: i) examples that are destined to be rejected by the complete classifier can be safely pruned early; ii) face detection is a multiple instance learning problem. The MIP process is fully automatic and requires no assumptions of probability distributions, statistical independence, or ad hoc intermediate rejection targets. Experimental results on the MIT+CMU dataset demonstrate significant performance advantages. 1

2 0.14844388 56 nips-2007-Configuration Estimates Improve Pedestrian Finding

Author: Duan Tran, David A. Forsyth

Abstract: Fair discriminative pedestrian finders are now available. In fact, these pedestrian finders make most errors on pedestrians in configurations that are uncommon in the training data, for example, mounting a bicycle. This is undesirable. However, the human configuration can itself be estimated discriminatively using structure learning. We demonstrate a pedestrian finder which first finds the most likely human pose in the window using a discriminative procedure trained with structure learning on a small dataset. We then present features (local histogram of oriented gradient and local PCA of gradient) based on that configuration to an SVM classifier. We show, using the INRIA Person dataset, that estimates of configuration significantly improve the accuracy of a discriminative pedestrian finder. 1

3 0.13455682 155 nips-2007-Predicting human gaze using low-level saliency combined with face detection

Author: Moran Cerf, Jonathan Harel, Wolfgang Einhaeuser, Christof Koch

Abstract: Under natural viewing conditions, human observers shift their gaze to allocate processing resources to subsets of the visual input. Many computational models try to predict such voluntary eye and attentional shifts. Although the important role of high level stimulus properties (e.g., semantic information) in search stands undisputed, most models are based on low-level image properties. We here demonstrate that a combined model of face detection and low-level saliency significantly outperforms a low-level model in predicting locations humans fixate on, based on eye-movement recordings of humans observing photographs of natural scenes, most of which contained at least one person. Observers, even when not instructed to look for anything particular, fixate on a face with a probability of over 80% within their first two fixations; furthermore, they exhibit more similar scanpaths when faces are present. Remarkably, our model’s predictive performance in images that do not contain faces is not impaired, and is even improved in some cases by spurious face detector responses. 1

4 0.10681415 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data

Author: Michael Ross, Andrew Cohen

Abstract: This paper describes a new model for human visual classification that enables the recovery of image features that explain human subjects’ performance on different visual classification tasks. Unlike previous methods, this algorithm does not model their performance with a single linear classifier operating on raw image pixels. Instead, it represents classification as the combination of multiple feature detectors. This approach extracts more information about human visual classification than previous methods and provides a foundation for further exploration. 1

5 0.10673878 147 nips-2007-One-Pass Boosting

Author: Zafer Barutcuoglu, Phil Long, Rocco Servedio

Abstract: This paper studies boosting algorithms that make a single pass over a set of base classifiers. We first analyze a one-pass algorithm in the setting of boosting with diverse base classifiers. Our guarantee is the same as the best proved for any boosting algorithm, but our one-pass algorithm is much faster than previous approaches. We next exhibit a random source of examples for which a “picky” variant of AdaBoost that skips poor base classifiers can outperform the standard AdaBoost algorithm, which uses every base classifier, by an exponential factor. Experiments with Reuters and synthetic data show that one-pass boosting can substantially improve on the accuracy of Naive Bayes, and that picky boosting can sometimes lead to a further improvement in accuracy.

6 0.10454079 39 nips-2007-Boosting the Area under the ROC Curve

7 0.097914606 149 nips-2007-Optimal ROC Curve for a Combination of Classifiers

8 0.080674686 143 nips-2007-Object Recognition by Scene Alignment

9 0.069858484 136 nips-2007-Multiple-Instance Active Learning

10 0.067666441 74 nips-2007-EEG-Based Brain-Computer Interaction: Improved Accuracy by Automatic Single-Trial Error Detection

11 0.060945369 109 nips-2007-Kernels on Attributed Pointsets with Applications

12 0.059244394 166 nips-2007-Regularized Boost for Semi-Supervised Learning

13 0.058144066 188 nips-2007-Subspace-Based Face Recognition in Analog VLSI

14 0.053070456 175 nips-2007-Semi-Supervised Multitask Learning

15 0.052297954 139 nips-2007-Nearest-Neighbor-Based Active Learning for Rare Category Detection

16 0.051213071 57 nips-2007-Congruence between model and human attention reveals unique signatures of critical visual events

17 0.051144842 11 nips-2007-A Risk Minimization Principle for a Class of Parzen Estimators

18 0.050480932 90 nips-2007-FilterBoost: Regression and Classification on Large Datasets

19 0.048789505 110 nips-2007-Learning Bounds for Domain Adaptation

20 0.048495043 69 nips-2007-Discriminative Batch Mode Active Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.157), (1, 0.041), (2, -0.047), (3, 0.023), (4, 0.054), (5, 0.206), (6, 0.108), (7, 0.054), (8, 0.079), (9, -0.062), (10, -0.014), (11, -0.023), (12, -0.096), (13, -0.039), (14, -0.114), (15, 0.027), (16, 0.0), (17, 0.045), (18, -0.1), (19, -0.023), (20, -0.004), (21, 0.092), (22, -0.073), (23, 0.172), (24, -0.012), (25, -0.044), (26, 0.006), (27, 0.092), (28, 0.013), (29, 0.095), (30, -0.101), (31, -0.05), (32, 0.045), (33, 0.082), (34, 0.081), (35, 0.031), (36, -0.084), (37, 0.016), (38, 0.114), (39, 0.057), (40, 0.016), (41, -0.009), (42, 0.101), (43, 0.03), (44, 0.084), (45, -0.148), (46, 0.066), (47, -0.125), (48, 0.114), (49, -0.044)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96359646 137 nips-2007-Multiple-Instance Pruning For Learning Efficient Cascade Detectors

Author: Cha Zhang, Paul A. Viola

Abstract: Cascade detectors have been shown to operate extremely rapidly, with high accuracy, and have important applications such as face detection. Driven by this success, cascade learning has been an area of active research in recent years. Nevertheless, there are still challenging technical problems during the training process of cascade detectors. In particular, determining the optimal target detection rate for each stage of the cascade remains an unsolved issue. In this paper, we propose the multiple instance pruning (MIP) algorithm for soft cascades. This algorithm computes a set of thresholds which aggressively terminate computation with no reduction in detection rate or increase in false positive rate on the training dataset. The algorithm is based on two key insights: i) examples that are destined to be rejected by the complete classifier can be safely pruned early; ii) face detection is a multiple instance learning problem. The MIP process is fully automatic and requires no assumptions of probability distributions, statistical independence, or ad hoc intermediate rejection targets. Experimental results on the MIT+CMU dataset demonstrate significant performance advantages. 1

2 0.69260395 56 nips-2007-Configuration Estimates Improve Pedestrian Finding

Author: Duan Tran, David A. Forsyth

Abstract: Fair discriminative pedestrian finders are now available. In fact, these pedestrian finders make most errors on pedestrians in configurations that are uncommon in the training data, for example, mounting a bicycle. This is undesirable. However, the human configuration can itself be estimated discriminatively using structure learning. We demonstrate a pedestrian finder which first finds the most likely human pose in the window using a discriminative procedure trained with structure learning on a small dataset. We then present features (local histogram of oriented gradient and local PCA of gradient) based on that configuration to an SVM classifier. We show, using the INRIA Person dataset, that estimates of configuration significantly improve the accuracy of a discriminative pedestrian finder. 1

3 0.69241858 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data

Author: Michael Ross, Andrew Cohen

Abstract: This paper describes a new model for human visual classification that enables the recovery of image features that explain human subjects’ performance on different visual classification tasks. Unlike previous methods, this algorithm does not model their performance with a single linear classifier operating on raw image pixels. Instead, it represents classification as the combination of multiple feature detectors. This approach extracts more information about human visual classification than previous methods and provides a foundation for further exploration. 1

4 0.61962622 149 nips-2007-Optimal ROC Curve for a Combination of Classifiers

Author: Marco Barreno, Alvaro Cardenas, J. D. Tygar

Abstract: We present a new analysis for the combination of binary classifiers. Our analysis makes use of the Neyman-Pearson lemma as a theoretical basis to analyze combinations of classifiers. We give a method for finding the optimal decision rule for a combination of classifiers and prove that it has the optimal ROC curve. We show how our method generalizes and improves previous work on combining classifiers and generating ROC curves. 1

5 0.58795065 39 nips-2007-Boosting the Area under the ROC Curve

Author: Phil Long, Rocco Servedio

Abstract: We show that any weak ranker that can achieve an area under the ROC curve slightly better than 1/2 (which can be achieved by random guessing) can be efficiently boosted to achieve an area under the ROC curve arbitrarily close to 1. We further show that this boosting can be performed even in the presence of independent misclassification noise, given access to a noise-tolerant weak ranker.

6 0.5369913 147 nips-2007-One-Pass Boosting

7 0.43241739 155 nips-2007-Predicting human gaze using low-level saliency combined with face detection

8 0.42174506 188 nips-2007-Subspace-Based Face Recognition in Analog VLSI

9 0.39034998 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)

10 0.38412136 143 nips-2007-Object Recognition by Scene Alignment

11 0.3819322 113 nips-2007-Learning Visual Attributes

12 0.37278086 175 nips-2007-Semi-Supervised Multitask Learning

13 0.36522594 74 nips-2007-EEG-Based Brain-Computer Interaction: Improved Accuracy by Automatic Single-Trial Error Detection

14 0.3573007 57 nips-2007-Congruence between model and human attention reveals unique signatures of critical visual events

15 0.35529473 18 nips-2007-A probabilistic model for generating realistic lip movements from speech

16 0.35497448 90 nips-2007-FilterBoost: Regression and Classification on Large Datasets

17 0.34132126 109 nips-2007-Kernels on Attributed Pointsets with Applications

18 0.32764715 174 nips-2007-Selecting Observations against Adversarial Objectives

19 0.32717136 171 nips-2007-Scan Strategies for Meteorological Radars

20 0.31559333 126 nips-2007-McRank: Learning to Rank Using Multiple Classification and Gradient Boosting


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(4, 0.01), (5, 0.034), (13, 0.028), (16, 0.034), (18, 0.012), (19, 0.016), (21, 0.067), (31, 0.027), (34, 0.015), (35, 0.041), (47, 0.081), (76, 0.333), (83, 0.157), (85, 0.012), (87, 0.014), (90, 0.039)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.77277094 23 nips-2007-An Analysis of Convex Relaxations for MAP Estimation

Author: Pawan Mudigonda, Vladimir Kolmogorov, Philip Torr

Abstract: The problem of obtaining the maximum a posteriori estimate of a general discrete random field (i.e. a random field defined using a finite and discrete set of labels) is known to be NP-hard. However, due to its central importance in many applications, several approximate algorithms have been proposed in the literature. In this paper, we present an analysis of three such algorithms based on convex relaxations: (i) LP - S: the linear programming (LP) relaxation proposed by Schlesinger [20] for a special case and independently in [4, 12, 23] for the general case; (ii) QP - RL: the quadratic programming (QP) relaxation by Ravikumar and Lafferty [18]; and (iii) SOCP - MS: the second order cone programming (SOCP) relaxation first proposed by Muramatsu and Suzuki [16] for two label problems and later extended in [14] for a general label set. We show that the SOCP - MS and the QP - RL relaxations are equivalent. Furthermore, we prove that despite the flexibility in the form of the constraints/objective function offered by QP and SOCP, the LP - S relaxation strictly dominates (i.e. provides a better approximation than) QP - RL and SOCP - MS. We generalize these results by defining a large class of SOCP (and equivalent QP) relaxations which is dominated by the LP - S relaxation. Based on these results we propose some novel SOCP relaxations which strictly dominate the previous approaches.

same-paper 2 0.75463426 137 nips-2007-Multiple-Instance Pruning For Learning Efficient Cascade Detectors

Author: Cha Zhang, Paul A. Viola

Abstract: Cascade detectors have been shown to operate extremely rapidly, with high accuracy, and have important applications such as face detection. Driven by this success, cascade learning has been an area of active research in recent years. Nevertheless, there are still challenging technical problems during the training process of cascade detectors. In particular, determining the optimal target detection rate for each stage of the cascade remains an unsolved issue. In this paper, we propose the multiple instance pruning (MIP) algorithm for soft cascades. This algorithm computes a set of thresholds which aggressively terminate computation with no reduction in detection rate or increase in false positive rate on the training dataset. The algorithm is based on two key insights: i) examples that are destined to be rejected by the complete classifier can be safely pruned early; ii) face detection is a multiple instance learning problem. The MIP process is fully automatic and requires no assumptions of probability distributions, statistical independence, or ad hoc intermediate rejection targets. Experimental results on the MIT+CMU dataset demonstrate significant performance advantages. 1

3 0.68344301 66 nips-2007-Density Estimation under Independent Similarly Distributed Sampling Assumptions

Author: Tony Jebara, Yingbo Song, Kapil Thadani

Abstract: A method is proposed for semiparametric estimation where parametric and nonparametric criteria are exploited in density estimation and unsupervised learning. This is accomplished by making sampling assumptions on a dataset that smoothly interpolate between the extreme of independently distributed (or id) sample data (as in nonparametric kernel density estimators) to the extreme of independent identically distributed (or iid) sample data. This article makes independent similarly distributed (or isd) sampling assumptions and interpolates between these two using a scalar parameter. The parameter controls a Bhattacharyya affinity penalty between pairs of distributions on samples. Surprisingly, the isd method maintains certain consistency and unimodality properties akin to maximum likelihood estimation. The proposed isd scheme is an alternative for handling nonstationarity in data without making drastic hidden variable assumptions which often make estimation difficult and laden with local optima. Experiments in density estimation on a variety of datasets confirm the value of isd over iid estimation, id estimation and mixture modeling.

4 0.52217031 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning

Author: Kai Yu, Wei Chu

Abstract: This paper aims to model relational data on edges of networks. We describe appropriate Gaussian Processes (GPs) for directed, undirected, and bipartite networks. The inter-dependencies of edges can be effectively modeled by adapting the GP hyper-parameters. The framework suggests an intimate connection between link prediction and transfer learning, which were traditionally two separate research topics. We develop an efficient learning algorithm that can handle a large number of observations. The experimental results on several real-world data sets verify superior learning capacity. 1

5 0.51759744 209 nips-2007-Ultrafast Monte Carlo for Statistical Summations

Author: Charles L. Isbell, Michael P. Holmes, Alexander G. Gray

Abstract: Machine learning contains many computational bottlenecks in the form of nested summations over datasets. Kernel estimators and other methods are burdened by these expensive computations. Exact evaluation is typically O(n2 ) or higher, which severely limits application to large datasets. We present a multi-stage stratified Monte Carlo method for approximating such summations with probabilistic relative error control. The essential idea is fast approximation by sampling in trees. This method differs from many previous scalability techniques (such as standard multi-tree methods) in that its error is stochastic, but we derive conditions for error control and demonstrate that they work. Further, we give a theoretical sample complexity for the method that is independent of dataset size, and show that this appears to hold in experiments, where speedups reach as high as 1014 , many orders of magnitude beyond the previous state of the art. 1

6 0.51638246 63 nips-2007-Convex Relaxations of Latent Variable Training

7 0.51615191 180 nips-2007-Sparse Feature Learning for Deep Belief Networks

8 0.51603431 186 nips-2007-Statistical Analysis of Semi-Supervised Regression

9 0.5145247 116 nips-2007-Learning the structure of manifolds using random projections

10 0.51342207 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine

11 0.51341009 177 nips-2007-Simplified Rules and Theoretical Analysis for Information Bottleneck Optimization and PCA with Spiking Neurons

12 0.51330423 212 nips-2007-Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes

13 0.513165 18 nips-2007-A probabilistic model for generating realistic lip movements from speech

14 0.51297754 65 nips-2007-DIFFRAC: a discriminative and flexible framework for clustering

15 0.51268935 115 nips-2007-Learning the 2-D Topology of Images

16 0.51266307 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)

17 0.51221889 201 nips-2007-The Value of Labeled and Unlabeled Examples when the Model is Imperfect

18 0.51221776 16 nips-2007-A learning framework for nearest neighbor search

19 0.51200986 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images

20 0.51020616 187 nips-2007-Structured Learning with Approximate Inference