nips nips2007 nips2007-137 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Cha Zhang, Paul A. Viola
Abstract: Cascade detectors have been shown to operate extremely rapidly, with high accuracy, and have important applications such as face detection. Driven by this success, cascade learning has been an area of active research in recent years. Nevertheless, there are still challenging technical problems during the training process of cascade detectors. In particular, determining the optimal target detection rate for each stage of the cascade remains an unsolved issue. In this paper, we propose the multiple instance pruning (MIP) algorithm for soft cascades. This algorithm computes a set of thresholds which aggressively terminate computation with no reduction in detection rate or increase in false positive rate on the training dataset. The algorithm is based on two key insights: i) examples that are destined to be rejected by the complete classifier can be safely pruned early; ii) face detection is a multiple instance learning problem. The MIP process is fully automatic and requires no assumptions of probability distributions, statistical independence, or ad hoc intermediate rejection targets. Experimental results on the MIT+CMU dataset demonstrate significant performance advantages. 1
Reference: text
sentIndex sentText sentNum sentScore
1 In particular, determining the optimal target detection rate for each stage of the cascade remains an unsolved issue. [sent-5, score-0.592]
2 In this paper, we propose the multiple instance pruning (MIP) algorithm for soft cascades. [sent-6, score-0.299]
3 This algorithm computes a set of thresholds which aggressively terminate computation with no reduction in detection rate or increase in false positive rate on the training dataset. [sent-7, score-0.906]
4 The algorithm is based on two key insights: i) examples that are destined to be rejected by the complete classifier can be safely pruned early; ii) face detection is a multiple instance learning problem. [sent-8, score-0.834]
5 The MIP process is fully automatic and requires no assumptions of probability distributions, statistical independence, or ad hoc intermediate rejection targets. [sent-9, score-0.463]
6 1 Introduction The state of the art in real-time face detection has progressed rapidly in recent years. [sent-11, score-0.408]
7 There are many required parameters: the number and shapes of rectangle filters, the number of stages, the number of weak classifiers in each stage, and the target detection rate for each cascade stage. [sent-14, score-0.691]
8 In each paper, the original cascade structure of distinct and separate stages is relaxed so that earlier computation of weak classifier scores can be combined with later weak classifiers. [sent-19, score-0.465]
9 The score assigned to a detection window by the soft cascade is simply a weighted sum of the weak classifiers: s_k(T) = Σ_{j≤T} α_j h_j(x_k), where T is the total number of weak classifiers; h_j(x_k) is the j-th feature computed on example x_k; α_j is the vote on weak classifier j. [sent-21, score-1.057]
10 Computation of the sum is terminated early whenever the partial sum falls below a rejection threshold: s_k(t) < θ(t). [sent-22, score-0.438]
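For concreteness, a minimal sketch of this early-termination rule, assuming the weak classifiers and rejection thresholds are available as Python callables and a list of floats (names here are illustrative, not the authors' implementation):

```python
def evaluate_soft_cascade(x, weak_classifiers, thetas):
    """Evaluate a soft cascade on window x with early rejection.

    weak_classifiers: list of (h_j, alpha_j) pairs; thetas: rejection
    thresholds theta(1), ..., theta(T-1) followed by the final threshold theta(T).
    Returns (score, number of features evaluated, accepted)."""
    s = 0.0
    for t, (h, alpha) in enumerate(weak_classifiers, start=1):
        s += alpha * h(x)       # partial sum s_k(t) = sum_{j<=t} alpha_j h_j(x_k)
        if s < thetas[t - 1]:   # early termination: s_k(t) < theta(t)
            return s, t, False  # window rejected after evaluating t features
    return s, len(weak_classifiers), True
```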
11 Note the soft cascade is similar to, but simpler than, both the boosting chain approach of Xiao, Zhu, and Zhang and the WaldBoost approach of Sochman and Matas. [sent-23, score-0.366]
12 The rejection thresholds θ(t), t ∈ {1, · · · , T − 1} are critical to the performance and speed of the complete classifier. [sent-24, score-0.698]
13 One possibility is to set the rejection thresholds so that no positive example is lost; this leads to very conservative thresholds and a very slow detector. [sent-26, score-0.965]
14 Since the complete classifier will not achieve 100% detection (note that, for practical reasons, the final threshold of the complete classifier is set to reject some positive examples because they are difficult to detect), [sent-27, score-0.798]
15 it seems justified to reject positive examples early in return for fast detection speed. [sent-29, score-0.622]
16 The main question is which positive examples can be rejected and when. [sent-30, score-0.303]
17 A key criticism of all previous cascade learning approaches is that none has a scheme to determine which examples are best to reject. [sent-31, score-0.306]
18 Viola-Jones attempted to reject zero positive examples until this became impossible and then reluctantly gave up on one positive example at a time. [sent-32, score-0.406]
19 Bourdev and Brandt proposed a method for setting rejection thresholds based on an ad hoc detection rate target called a “rejection distribution vector”, which is a parameterized exponential curve. [sent-33, score-1.036]
20 Sochman and Matas used a ratio test to determine the rejection thresholds. [sent-36, score-0.354]
21 This is a particular problem for the first few rejection thresholds, and can lead to low detection rates on test data. [sent-38, score-0.614]
22 This paper proposes a new mechanism for setting the rejection thresholds of any soft-cascade which is conceptually simple, has no tunable parameters beyond the final detection rate target, yet yields a cascade which is both highly accurate and very fast. [sent-39, score-1.145]
23 Training data is used to set all reject thresholds after the final classifier is learned. [sent-40, score-0.384]
24 There are no assumptions about probability distributions, statistical independence, or ad hoc intermediate targets for detection rate (or false positive rate). [sent-41, score-0.624]
25 We propose a novel algorithm, multiple instance pruning (MIP), to set the reject thresholds automatically, which results in a very efficient cascade detector with superior performance. [sent-43, score-0.956]
26 Our key insight is quite simple: the reject thresholds are set so that they give up on precisely those positive examples which are rejected by the complete classifier. [sent-49, score-0.735]
27 The full classifier rejects a positive example if its final score s_k(T) falls below the final threshold θ(T). [sent-51, score-0.351]
28 In the simplest version of our threshold setting algorithm, all trajectories from positive windows which fall below the final threshold are removed. [sent-52, score-0.692]
29 Each rejection threshold is then simply: θ(t) = min_{k: s_k(T) > θ(T), y_k = 1} s_k(t), where {x_k, y_k} is the training set in which y_k = 1 indicates positive windows and y_k = −1 indicates negative windows. [sent-53, score-1.134]
30 These thresholds produce a reasonably fast classifier which is guaranteed to produce no more errors than the complete classifier (on the training dataset). [sent-54, score-0.415]
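As a sketch (not the authors' code), the rule above can be computed directly from a precomputed matrix of partial scores; here `scores[k, t-1]` is assumed to hold s_k(t) for every training window:

```python
import numpy as np

def dbp_thresholds(scores, labels, final_threshold):
    """Direct backward pruning: theta(t) = min over retained positives of s_k(t).

    scores: (K, T) array of partial sums s_k(t); labels: (K,) array in {+1, -1}.
    Only positive windows whose final score exceeds theta(T) are retained."""
    retained = (labels == 1) & (scores[:, -1] > final_threshold)
    return scores[retained].min(axis=0)  # one threshold per weak classifier
```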
31 One might question whether the minimum of all retained trajectories is robust to mislabeled or noisy examples in the training set. [sent-56, score-0.366]
32 Note that the final threshold of the complete classifier will often reject mislabeled or noisy examples (though they will be considered false negatives). [sent-57, score-0.525]
33 Such examples play no role in setting the rejection thresholds. [sent-60, score-0.439]
34 In past approaches, thresholds are set to reject the largest number of negative examples and only a small percentage of positive examples. [sent-62, score-0.61]
35 In the new approach, the final threshold of the complete soft-cascade is set to achieve the required detection rate. [sent-64, score-0.441]
36 Rejection thresholds are then set to reject the largest number of negative examples and retain all positive examples which are retained by the complete classifier. [sent-65, score-0.889]
37 The important difference is that the particular positive examples which are rejected are those which are destined to be rejected by the final classifier. [sent-66, score-0.458]
38 Note, some negative examples that eventually pass the complete classifier threshold may be pruned by earlier rejection thresholds. [sent-69, score-0.785]
39 In contrast, although the detection rate on the training set can also be guaranteed in Bourdev-Brandt’s algorithm, there is no guarantee that the false positive rate will not increase. [sent-71, score-0.65]
40 Bourdev-Brandt propose reordering the weak classifiers based on the separation between the mean score of the positive examples and the mean score of the negative examples. [sent-72, score-0.428]
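A heavily hedged sketch of that reordering heuristic, assuming per-weak-classifier voted responses for positive and negative examples are available as arrays (the exact separation measure used by Bourdev-Brandt may differ):

```python
import numpy as np

def bb_reorder(pos_responses, neg_responses):
    """Order weak classifiers by the gap between mean positive and mean
    negative responses, most separating first.

    pos_responses, neg_responses: arrays of shape (num_examples, T) holding
    alpha_j * h_j(x) for each example and weak classifier."""
    separation = pos_responses.mean(axis=0) - neg_responses.mean(axis=0)
    return np.argsort(-separation)  # indices of weak classifiers, reordered
```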
41 While the rejection thresholds are learned using a large set of training examples, this one image demonstrates the basic concepts. [sent-75, score-0.665]
42 The single physical face is consistent with a set of positive detection windows that are within an acceptable range of positions and scales. [sent-77, score-0.809]
43 Typically there are tens of acceptable windows for each face. [sent-78, score-0.304]
44 The blue and magenta trajectories correspond to acceptable windows which fall above the final detection threshold. [sent-79, score-0.73]
45 The cyan trajectories are potentially positive windows which fall below the final threshold. [sent-80, score-0.477]
46 Since the cyan trajectories are rejected by the final classifier, rejection thresholds need only retain the blue and magenta trajectories. [sent-81, score-0.947]
47 Compared with existing approaches, which set the reject thresholds in a heuristic manner, our approach is data-driven and hence more principled. [sent-86, score-0.384]
48 3 Multiple Instance Pruning The notion of an “acceptable detection window” plays a critical role in an improved process for setting rejection thresholds. [sent-87, score-0.614]
49 Recall that the detection process scans the image generating a large, but finite, collection of overlapping windows at various scales and locations. [sent-94, score-0.459]
50 Even in the absence of ambiguity, some slop is required to ensure that at least one of the generated windows is considered a successful detection for each face. [sent-95, score-0.459]
51 Using typical scanning parameters this can lead to tens of windows which are all equally valid positive detections. [sent-97, score-0.296]
52 If any of these windows is classified positive then this face is considered detected. [sent-98, score-0.444]
53 Even though all face detection algorithms must address the “multiple window” issue, few papers have discussed it. [sent-99, score-0.448]
54 In this paper, we do not directly address soft-cascade learning, though we will incorporate the “multiple window” observation into the determination of the rejection thresholds. [sent-103, score-0.354]
55 One need only retain one “acceptable” window for each face which is detected by the final classifier. [sent-104, score-0.318]
56 A more aggressive threshold is defined as: θ(t) = min_{i∈P} max_{k ∈ F_i ∩ R_i, y_k = 1} s_k(t), where i is the index of ground-truth faces; F_i is the set of acceptable windows associated with ground-truth face i and R_i is the set of windows which are “retained” (see below). [sent-105, score-0.891]
57 We call this pruning method multiple instance pruning (MIP). [sent-108, score-0.347]
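A small sketch of the expression above for a single stage t, assuming the partial scores of all acceptable positive windows, their ground-truth face indices, and a boolean "retained" mask are given (illustrative names only):

```python
import numpy as np

def mip_threshold_at_stage(scores_t, face_index, retained):
    """theta(t) = min over faces i of max over retained acceptable windows of s_k(t).

    scores_t: (K,) partial scores at stage t for the acceptable positive windows;
    face_index: (K,) ground-truth face id for each window;
    retained: (K,) boolean mask marking windows still retained (the sets R_i)."""
    per_face_best = []
    for i in np.unique(face_index):
        mask = (face_index == i) & retained
        if mask.any():  # face i still has at least one retained window
            per_face_best.append(scores_t[mask].max())
    return min(per_face_best)
```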
58 Initially the trajectories from the positive bags which fall above the final threshold are retained. [sent-113, score-0.414]
59 The set of retained examples is further reduced as the earlier thresholds are set. [sent-114, score-0.485]
60 This is in contrast to the simpler DBP algorithm where the thresholds are set to preserve all retained positive examples. [sent-115, score-0.463]
61 It guarantees the same face detection rate on the training dataset as the complete classifier. [sent-119, score-0.563]
62 Note that the algorithm is greedy, setting earlier thresholds first so that all positive bags are retained and the fewest number of negative examples pass. [sent-120, score-0.68]
63 Theoretically it is possible that delaying the rejection of a particular example may result in a better threshold at a later stage. [sent-121, score-0.487]
64 The MIP algorithm is however guaranteed to generate a soft-cascade that is at least as fast as DBP, since the criterion for setting the thresholds is less restrictive. [sent-123, score-0.313]
65 • A large training set (the whole training set to learn the cascade detector can be reused here). [sent-126, score-0.508]
66 Collect all windows that are above the final threshold θ(T ). [sent-128, score-0.332]
67 Record all intermediate scores as s(i, j, t), where i = 1, · · · , N is the face index; j = 1, · · · , M_i is the index of windows that match with face i; t = 1, · · · , T is the index of the feature node. [sent-129, score-0.597]
68 Set θ(t) = min_i s(i, t) − ε as the rejection threshold of node t, with ε = 10^−6. [sent-134, score-0.487]
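Putting the steps above together, a sketch of the full greedy pruning loop, assuming the recorded scores s(i, j, t) are stored per face as an (M_i, T) array (a simplified reading of the algorithm, not the authors' code):

```python
import numpy as np

def mip_prune(scores_by_face, eps=1e-6):
    """Compute MIP rejection thresholds theta(1), ..., theta(T-1).

    scores_by_face: dict mapping face index i to an (M_i, T) array of
    intermediate scores s(i, j, t) for its acceptable windows that pass
    the final threshold theta(T)."""
    T = next(iter(scores_by_face.values())).shape[1]
    retained = {i: np.ones(m.shape[0], dtype=bool) for i, m in scores_by_face.items()}
    thetas = []
    for t in range(T - 1):
        # Best retained window per face; the weakest face defines theta(t).
        per_face_best = [scores_by_face[i][retained[i], t].max()
                         for i in scores_by_face]
        theta_t = min(per_face_best) - eps
        thetas.append(theta_t)
        # Windows whose score falls below theta(t) are pruned from the bags;
        # each face keeps at least its best window, so no face is lost.
        for i in scores_by_face:
            retained[i] &= scores_by_face[i][:, t] >= theta_t
    return thetas
```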
69 (b) ROC curves of the detector after MIP pruning using the original training set. [sent-140, score-0.388]
70 A soft cascade classifier is learned through a new framework based on weight trimming and bootstrapping (see Appendix). [sent-144, score-0.499]
71 A detected rectangle is considered to be a true detection if it has less than 50% variation in shift and scale from the ground-truth. [sent-149, score-0.35]
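One possible reading of that matching criterion, with rectangles as (x, y, width, height) tuples and variation measured relative to the ground-truth size (the exact normalization is an assumption here):

```python
def is_true_detection(det, gt, tol=0.5):
    """Return True if a detected rectangle matches a ground-truth rectangle,
    i.e. shift and scale each differ by less than 50%."""
    x, y, w, h = det
    gx, gy, gw, gh = gt
    shift = max(abs(x - gx), abs(y - gy)) / gw       # relative position offset
    scale = max(abs(w - gw) / gw, abs(h - gh) / gh)  # relative size difference
    return shift < tol and scale < tol
```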
72 The original data set for learning the soft cascade is reused for pruning the detector. [sent-193, score-0.496]
73 (e.g., the two curves of the Bourdev-Brandt soft cascade in Figure 3(a)). [sent-202, score-0.355]
74 One explanation is that in order to achieve higher detection rates one must retain windows which are “ambiguous” and may contain faces. [sent-204, score-0.496]
75 The proposed MIP-based detector yields a much lower false positive rate than the 25-feature Bourdev-Brandt soft cascade and a nearly 35% improvement in detection speed. [sent-205, score-0.988]
76 One would expect that if the required detection rate is higher, it would be more difficult to prune. [sent-213, score-0.313]
77 In MIP, when the detection rate increases, there are two conflicting factors involved. [sent-214, score-0.313]
78 Second, for each face the number of retained and acceptable windows increases. [sent-216, score-0.561]
79 Both algorithms prune the strong classifier for a target detection rate of 97. [sent-223, score-0.368]
80 It can be seen that the MIP pruned detector has the best detection performance. [sent-228, score-0.499]
81 , α = 4), the B-B pruned detector is still worse than the MIP pruned detector, and its speed is 5 times slower (56. [sent-231, score-0.365]
82 Note, adjusting α leads to changes both in detection time and false positive rate. [sent-238, score-0.462]
83 MIP is fully automated and guarantees detection rate with no increase in false positive rate on the training set. [sent-240, score-0.622]
84 On the other hand, if speed is the dominant factor, one can specify a target detection rate and target execution time and use B-B to find a solution. [sent-242, score-0.414]
85 While with our detector the performance of B-B is acceptable even when α = −16, the performance of the detector in [1] drops significantly from 37 features to 25 features, as shown in Fig. [sent-262, score-0.409]
86 5 Conclusions We have presented a simple yet effective way to set the rejection thresholds of a given soft-cascade, called multiple instance pruning (MIP). [sent-265, score-0.81]
87 MIP then adds a set of rejection thresholds to construct a cascade detector. [sent-267, score-0.832]
88 The rejection thresholds are determined so that every face which was detected by the original strong classifier is guaranteed to be detected by the soft cascade. [sent-268, score-0.995]
89 The algorithm also guarantees that the false positive rate on the training set will not increase. [sent-269, score-0.309]
90 There is only one parameter used throughout the cascade training process, the target detection rate for the final system. [sent-270, score-0.619]
91 Moreover, there are no required assumptions about probability distributions, statistical independence, or ad hoc intermediate targets for detection rate or false positive rate. [sent-271, score-0.624]
92 These preliminary rejection thresholds are extremely conservative, retaining all positive examples in the training set. [sent-283, score-0.875]
93 We set the preliminary thresholds only to moderately speed up the computation of ROC curves before MIP. [sent-286, score-0.358]
94 Initialize • Take all positive examples and randomly sample negative examples to form a subset of Q examples. [sent-290, score-0.311]
95 • Take all positive examples and randomly sample negative examples from the trimmed training set to form a new subset of Q examples. [sent-303, score-0.365]
96 Set the preliminary rejection threshold θ(t) of Σ_{j=1}^{t} α_j h_j as the minimum score of all positive examples at stage t. [sent-305, score-0.786]
97 Output: weak classifiers h_t, t = 1, · · · , T, the associated confidences α_t, and preliminary rejection thresholds θ(t). [sent-306, score-0.667]
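A compact sketch of the preliminary-threshold step above, assuming the partial sums of the first t weak classifiers for every positive training example are stored in a matrix (illustrative only; the full training loop with weight trimming and bootstrapping is omitted):

```python
import numpy as np

def preliminary_thresholds(pos_partial_scores):
    """pos_partial_scores: (P, T) array whose (k, t-1) entry is
    sum_{j=1}^{t} alpha_j h_j(x_k) for positive training example k.

    The preliminary rejection threshold theta(t) is the minimum positive
    score at stage t, so no positive training example is ever rejected."""
    return pos_partial_scores.min(axis=0)
```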
98 Rapid object detection using a boosted cascade of simple features. [sent-382, score-0.481]
99 Fast rotation invariant multi-view face detection based on real adaboost. [sent-398, score-0.408]
100 Learning a rare event detection cascade by direct feature selection. [sent-407, score-0.481]
wordName wordTfidf (topN-words)
[('mip', 0.447), ('rejection', 0.354), ('detection', 0.26), ('thresholds', 0.257), ('cascade', 0.221), ('windows', 0.199), ('detector', 0.152), ('face', 0.148), ('pruning', 0.148), ('dbp', 0.137), ('threshold', 0.133), ('er', 0.129), ('classi', 0.128), ('reject', 0.127), ('rejected', 0.121), ('trimming', 0.119), ('retained', 0.109), ('acceptable', 0.105), ('false', 0.105), ('soft', 0.1), ('nal', 0.099), ('positive', 0.097), ('trajectories', 0.091), ('window', 0.091), ('pruned', 0.087), ('examples', 0.085), ('cmu', 0.08), ('weak', 0.078), ('bourdev', 0.068), ('waldboost', 0.068), ('viola', 0.068), ('alpha', 0.063), ('score', 0.062), ('sk', 0.059), ('ers', 0.058), ('bags', 0.054), ('training', 0.054), ('detectors', 0.054), ('rate', 0.053), ('brandt', 0.051), ('cascaded', 0.051), ('sochman', 0.051), ('cyan', 0.051), ('complete', 0.048), ('rectangle', 0.048), ('zhang', 0.047), ('hoc', 0.046), ('boosting', 0.045), ('final', 0.045), ('yk', 0.045), ('negative', 0.044), ('detected', 0.042), ('frontal', 0.04), ('papers', 0.04), ('speed', 0.039), ('faces', 0.039), ('fall', 0.039), ('retain', 0.037), ('wu', 0.037), ('magenta', 0.036), ('weight', 0.035), ('ad', 0.035), ('destined', 0.034), ('floatboost', 0.034), ('indows', 0.034), ('xiao', 0.034), ('zhu', 0.034), ('curves', 0.034), ('earlier', 0.034), ('cvpr', 0.033), ('roc', 0.032), ('visited', 0.031), ('target', 0.031), ('bag', 0.03), ('xk', 0.03), ('dences', 0.03), ('nowlan', 0.03), ('adaboost', 0.029), ('preliminary', 0.028), ('stages', 0.028), ('guaranteed', 0.028), ('intermediate', 0.028), ('fast', 0.028), ('ht', 0.028), ('eccv', 0.027), ('reused', 0.027), ('negatives', 0.027), ('mislabeled', 0.027), ('aggressively', 0.027), ('instance', 0.027), ('stage', 0.027), ('scores', 0.026), ('early', 0.025), ('cult', 0.025), ('multiple', 0.024), ('platt', 0.024), ('bootstrapping', 0.024), ('aggressive', 0.024), ('index', 0.024), ('strong', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 137 nips-2007-Multiple-Instance Pruning For Learning Efficient Cascade Detectors
Author: Cha Zhang, Paul A. Viola
Abstract: Cascade detectors have been shown to operate extremely rapidly, with high accuracy, and have important applications such as face detection. Driven by this success, cascade learning has been an area of active research in recent years. Nevertheless, there are still challenging technical problems during the training process of cascade detectors. In particular, determining the optimal target detection rate for each stage of the cascade remains an unsolved issue. In this paper, we propose the multiple instance pruning (MIP) algorithm for soft cascades. This algorithm computes a set of thresholds which aggressively terminate computation with no reduction in detection rate or increase in false positive rate on the training dataset. The algorithm is based on two key insights: i) examples that are destined to be rejected by the complete classifier can be safely pruned early; ii) face detection is a multiple instance learning problem. The MIP process is fully automatic and requires no assumptions of probability distributions, statistical independence, or ad hoc intermediate rejection targets. Experimental results on the MIT+CMU dataset demonstrate significant performance advantages. 1
2 0.14844388 56 nips-2007-Configuration Estimates Improve Pedestrian Finding
Author: Duan Tran, David A. Forsyth
Abstract: Fair discriminative pedestrian finders are now available. In fact, these pedestrian finders make most errors on pedestrians in configurations that are uncommon in the training data, for example, mounting a bicycle. This is undesirable. However, the human configuration can itself be estimated discriminatively using structure learning. We demonstrate a pedestrian finder which first finds the most likely human pose in the window using a discriminative procedure trained with structure learning on a small dataset. We then present features (local histogram of oriented gradient and local PCA of gradient) based on that configuration to an SVM classifier. We show, using the INRIA Person dataset, that estimates of configuration significantly improve the accuracy of a discriminative pedestrian finder. 1
3 0.13455682 155 nips-2007-Predicting human gaze using low-level saliency combined with face detection
Author: Moran Cerf, Jonathan Harel, Wolfgang Einhaeuser, Christof Koch
Abstract: Under natural viewing conditions, human observers shift their gaze to allocate processing resources to subsets of the visual input. Many computational models try to predict such voluntary eye and attentional shifts. Although the important role of high level stimulus properties (e.g., semantic information) in search stands undisputed, most models are based on low-level image properties. We here demonstrate that a combined model of face detection and low-level saliency significantly outperforms a low-level model in predicting locations humans fixate on, based on eye-movement recordings of humans observing photographs of natural scenes, most of which contained at least one person. Observers, even when not instructed to look for anything particular, fixate on a face with a probability of over 80% within their first two fixations; furthermore, they exhibit more similar scanpaths when faces are present. Remarkably, our model’s predictive performance in images that do not contain faces is not impaired, and is even improved in some cases by spurious face detector responses. 1
4 0.10681415 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data
Author: Michael Ross, Andrew Cohen
Abstract: This paper describes a new model for human visual classification that enables the recovery of image features that explain human subjects’ performance on different visual classification tasks. Unlike previous methods, this algorithm does not model their performance with a single linear classifier operating on raw image pixels. Instead, it represents classification as the combination of multiple feature detectors. This approach extracts more information about human visual classification than previous methods and provides a foundation for further exploration. 1
5 0.10673878 147 nips-2007-One-Pass Boosting
Author: Zafer Barutcuoglu, Phil Long, Rocco Servedio
Abstract: This paper studies boosting algorithms that make a single pass over a set of base classifiers. We first analyze a one-pass algorithm in the setting of boosting with diverse base classifiers. Our guarantee is the same as the best proved for any boosting algorithm, but our one-pass algorithm is much faster than previous approaches. We next exhibit a random source of examples for which a “picky” variant of AdaBoost that skips poor base classifiers can outperform the standard AdaBoost algorithm, which uses every base classifier, by an exponential factor. Experiments with Reuters and synthetic data show that one-pass boosting can substantially improve on the accuracy of Naive Bayes, and that picky boosting can sometimes lead to a further improvement in accuracy.
6 0.10454079 39 nips-2007-Boosting the Area under the ROC Curve
7 0.097914606 149 nips-2007-Optimal ROC Curve for a Combination of Classifiers
8 0.080674686 143 nips-2007-Object Recognition by Scene Alignment
9 0.069858484 136 nips-2007-Multiple-Instance Active Learning
10 0.067666441 74 nips-2007-EEG-Based Brain-Computer Interaction: Improved Accuracy by Automatic Single-Trial Error Detection
11 0.060945369 109 nips-2007-Kernels on Attributed Pointsets with Applications
12 0.059244394 166 nips-2007-Regularized Boost for Semi-Supervised Learning
13 0.058144066 188 nips-2007-Subspace-Based Face Recognition in Analog VLSI
14 0.053070456 175 nips-2007-Semi-Supervised Multitask Learning
15 0.052297954 139 nips-2007-Nearest-Neighbor-Based Active Learning for Rare Category Detection
16 0.051213071 57 nips-2007-Congruence between model and human attention reveals unique signatures of critical visual events
17 0.051144842 11 nips-2007-A Risk Minimization Principle for a Class of Parzen Estimators
18 0.050480932 90 nips-2007-FilterBoost: Regression and Classification on Large Datasets
19 0.048789505 110 nips-2007-Learning Bounds for Domain Adaptation
20 0.048495043 69 nips-2007-Discriminative Batch Mode Active Learning
topicId topicWeight
[(0, -0.157), (1, 0.041), (2, -0.047), (3, 0.023), (4, 0.054), (5, 0.206), (6, 0.108), (7, 0.054), (8, 0.079), (9, -0.062), (10, -0.014), (11, -0.023), (12, -0.096), (13, -0.039), (14, -0.114), (15, 0.027), (16, 0.0), (17, 0.045), (18, -0.1), (19, -0.023), (20, -0.004), (21, 0.092), (22, -0.073), (23, 0.172), (24, -0.012), (25, -0.044), (26, 0.006), (27, 0.092), (28, 0.013), (29, 0.095), (30, -0.101), (31, -0.05), (32, 0.045), (33, 0.082), (34, 0.081), (35, 0.031), (36, -0.084), (37, 0.016), (38, 0.114), (39, 0.057), (40, 0.016), (41, -0.009), (42, 0.101), (43, 0.03), (44, 0.084), (45, -0.148), (46, 0.066), (47, -0.125), (48, 0.114), (49, -0.044)]
simIndex simValue paperId paperTitle
same-paper 1 0.96359646 137 nips-2007-Multiple-Instance Pruning For Learning Efficient Cascade Detectors
Author: Cha Zhang, Paul A. Viola
Abstract: Cascade detectors have been shown to operate extremely rapidly, with high accuracy, and have important applications such as face detection. Driven by this success, cascade learning has been an area of active research in recent years. Nevertheless, there are still challenging technical problems during the training process of cascade detectors. In particular, determining the optimal target detection rate for each stage of the cascade remains an unsolved issue. In this paper, we propose the multiple instance pruning (MIP) algorithm for soft cascades. This algorithm computes a set of thresholds which aggressively terminate computation with no reduction in detection rate or increase in false positive rate on the training dataset. The algorithm is based on two key insights: i) examples that are destined to be rejected by the complete classifier can be safely pruned early; ii) face detection is a multiple instance learning problem. The MIP process is fully automatic and requires no assumptions of probability distributions, statistical independence, or ad hoc intermediate rejection targets. Experimental results on the MIT+CMU dataset demonstrate significant performance advantages. 1
2 0.69260395 56 nips-2007-Configuration Estimates Improve Pedestrian Finding
Author: Duan Tran, David A. Forsyth
Abstract: Fair discriminative pedestrian finders are now available. In fact, these pedestrian finders make most errors on pedestrians in configurations that are uncommon in the training data, for example, mounting a bicycle. This is undesirable. However, the human configuration can itself be estimated discriminatively using structure learning. We demonstrate a pedestrian finder which first finds the most likely human pose in the window using a discriminative procedure trained with structure learning on a small dataset. We then present features (local histogram of oriented gradient and local PCA of gradient) based on that configuration to an SVM classifier. We show, using the INRIA Person dataset, that estimates of configuration significantly improve the accuracy of a discriminative pedestrian finder. 1
3 0.69241858 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data
Author: Michael Ross, Andrew Cohen
Abstract: This paper describes a new model for human visual classification that enables the recovery of image features that explain human subjects’ performance on different visual classification tasks. Unlike previous methods, this algorithm does not model their performance with a single linear classifier operating on raw image pixels. Instead, it represents classification as the combination of multiple feature detectors. This approach extracts more information about human visual classification than previous methods and provides a foundation for further exploration. 1
4 0.61962622 149 nips-2007-Optimal ROC Curve for a Combination of Classifiers
Author: Marco Barreno, Alvaro Cardenas, J. D. Tygar
Abstract: We present a new analysis for the combination of binary classifiers. Our analysis makes use of the Neyman-Pearson lemma as a theoretical basis to analyze combinations of classifiers. We give a method for finding the optimal decision rule for a combination of classifiers and prove that it has the optimal ROC curve. We show how our method generalizes and improves previous work on combining classifiers and generating ROC curves. 1
5 0.58795065 39 nips-2007-Boosting the Area under the ROC Curve
Author: Phil Long, Rocco Servedio
Abstract: We show that any weak ranker that can achieve an area under the ROC curve slightly better than 1/2 (which can be achieved by random guessing) can be efficiently boosted to achieve an area under the ROC curve arbitrarily close to 1. We further show that this boosting can be performed even in the presence of independent misclassification noise, given access to a noise-tolerant weak ranker.
6 0.5369913 147 nips-2007-One-Pass Boosting
7 0.43241739 155 nips-2007-Predicting human gaze using low-level saliency combined with face detection
8 0.42174506 188 nips-2007-Subspace-Based Face Recognition in Analog VLSI
9 0.39034998 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)
10 0.38412136 143 nips-2007-Object Recognition by Scene Alignment
11 0.3819322 113 nips-2007-Learning Visual Attributes
12 0.37278086 175 nips-2007-Semi-Supervised Multitask Learning
13 0.36522594 74 nips-2007-EEG-Based Brain-Computer Interaction: Improved Accuracy by Automatic Single-Trial Error Detection
14 0.3573007 57 nips-2007-Congruence between model and human attention reveals unique signatures of critical visual events
15 0.35529473 18 nips-2007-A probabilistic model for generating realistic lip movements from speech
16 0.35497448 90 nips-2007-FilterBoost: Regression and Classification on Large Datasets
17 0.34132126 109 nips-2007-Kernels on Attributed Pointsets with Applications
18 0.32764715 174 nips-2007-Selecting Observations against Adversarial Objectives
19 0.32717136 171 nips-2007-Scan Strategies for Meteorological Radars
20 0.31559333 126 nips-2007-McRank: Learning to Rank Using Multiple Classification and Gradient Boosting
topicId topicWeight
[(4, 0.01), (5, 0.034), (13, 0.028), (16, 0.034), (18, 0.012), (19, 0.016), (21, 0.067), (31, 0.027), (34, 0.015), (35, 0.041), (47, 0.081), (76, 0.333), (83, 0.157), (85, 0.012), (87, 0.014), (90, 0.039)]
simIndex simValue paperId paperTitle
1 0.77277094 23 nips-2007-An Analysis of Convex Relaxations for MAP Estimation
Author: Pawan Mudigonda, Vladimir Kolmogorov, Philip Torr
Abstract: The problem of obtaining the maximum a posteriori estimate of a general discrete random field (i.e. a random field defined using a finite and discrete set of labels) is known to be NP-hard. However, due to its central importance in many applications, several approximate algorithms have been proposed in the literature. In this paper, we present an analysis of three such algorithms based on convex relaxations: (i) LP - S: the linear programming (LP) relaxation proposed by Schlesinger [20] for a special case and independently in [4, 12, 23] for the general case; (ii) QP - RL: the quadratic programming (QP) relaxation by Ravikumar and Lafferty [18]; and (iii) SOCP - MS: the second order cone programming (SOCP) relaxation first proposed by Muramatsu and Suzuki [16] for two label problems and later extended in [14] for a general label set. We show that the SOCP - MS and the QP - RL relaxations are equivalent. Furthermore, we prove that despite the flexibility in the form of the constraints/objective function offered by QP and SOCP, the LP - S relaxation strictly dominates (i.e. provides a better approximation than) QP - RL and SOCP - MS. We generalize these results by defining a large class of SOCP (and equivalent QP) relaxations which is dominated by the LP - S relaxation. Based on these results we propose some novel SOCP relaxations which strictly dominate the previous approaches.
same-paper 2 0.75463426 137 nips-2007-Multiple-Instance Pruning For Learning Efficient Cascade Detectors
Author: Cha Zhang, Paul A. Viola
Abstract: Cascade detectors have been shown to operate extremely rapidly, with high accuracy, and have important applications such as face detection. Driven by this success, cascade learning has been an area of active research in recent years. Nevertheless, there are still challenging technical problems during the training process of cascade detectors. In particular, determining the optimal target detection rate for each stage of the cascade remains an unsolved issue. In this paper, we propose the multiple instance pruning (MIP) algorithm for soft cascades. This algorithm computes a set of thresholds which aggressively terminate computation with no reduction in detection rate or increase in false positive rate on the training dataset. The algorithm is based on two key insights: i) examples that are destined to be rejected by the complete classifier can be safely pruned early; ii) face detection is a multiple instance learning problem. The MIP process is fully automatic and requires no assumptions of probability distributions, statistical independence, or ad hoc intermediate rejection targets. Experimental results on the MIT+CMU dataset demonstrate significant performance advantages. 1
3 0.68344301 66 nips-2007-Density Estimation under Independent Similarly Distributed Sampling Assumptions
Author: Tony Jebara, Yingbo Song, Kapil Thadani
Abstract: A method is proposed for semiparametric estimation where parametric and nonparametric criteria are exploited in density estimation and unsupervised learning. This is accomplished by making sampling assumptions on a dataset that smoothly interpolate between the extreme of independently distributed (or id) sample data (as in nonparametric kernel density estimators) to the extreme of independent identically distributed (or iid) sample data. This article makes independent similarly distributed (or isd) sampling assumptions and interpolates between these two using a scalar parameter. The parameter controls a Bhattacharyya affinity penalty between pairs of distributions on samples. Surprisingly, the isd method maintains certain consistency and unimodality properties akin to maximum likelihood estimation. The proposed isd scheme is an alternative for handling nonstationarity in data without making drastic hidden variable assumptions which often make estimation difficult and laden with local optima. Experiments in density estimation on a variety of datasets confirm the value of isd over iid estimation, id estimation and mixture modeling.
4 0.52217031 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning
Author: Kai Yu, Wei Chu
Abstract: This paper aims to model relational data on edges of networks. We describe appropriate Gaussian Processes (GPs) for directed, undirected, and bipartite networks. The inter-dependencies of edges can be effectively modeled by adapting the GP hyper-parameters. The framework suggests an intimate connection between link prediction and transfer learning, which were traditionally two separate research topics. We develop an efficient learning algorithm that can handle a large number of observations. The experimental results on several real-world data sets verify superior learning capacity. 1
5 0.51759744 209 nips-2007-Ultrafast Monte Carlo for Statistical Summations
Author: Charles L. Isbell, Michael P. Holmes, Alexander G. Gray
Abstract: Machine learning contains many computational bottlenecks in the form of nested summations over datasets. Kernel estimators and other methods are burdened by these expensive computations. Exact evaluation is typically O(n2 ) or higher, which severely limits application to large datasets. We present a multi-stage stratified Monte Carlo method for approximating such summations with probabilistic relative error control. The essential idea is fast approximation by sampling in trees. This method differs from many previous scalability techniques (such as standard multi-tree methods) in that its error is stochastic, but we derive conditions for error control and demonstrate that they work. Further, we give a theoretical sample complexity for the method that is independent of dataset size, and show that this appears to hold in experiments, where speedups reach as high as 1014 , many orders of magnitude beyond the previous state of the art. 1
6 0.51638246 63 nips-2007-Convex Relaxations of Latent Variable Training
7 0.51615191 180 nips-2007-Sparse Feature Learning for Deep Belief Networks
8 0.51603431 186 nips-2007-Statistical Analysis of Semi-Supervised Regression
9 0.5145247 116 nips-2007-Learning the structure of manifolds using random projections
10 0.51342207 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine
11 0.51341009 177 nips-2007-Simplified Rules and Theoretical Analysis for Information Bottleneck Optimization and PCA with Spiking Neurons
12 0.51330423 212 nips-2007-Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes
13 0.513165 18 nips-2007-A probabilistic model for generating realistic lip movements from speech
14 0.51297754 65 nips-2007-DIFFRAC: a discriminative and flexible framework for clustering
15 0.51268935 115 nips-2007-Learning the 2-D Topology of Images
16 0.51266307 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)
17 0.51221889 201 nips-2007-The Value of Labeled and Unlabeled Examples when the Model is Imperfect
18 0.51221776 16 nips-2007-A learning framework for nearest neighbor search
19 0.51200986 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
20 0.51020616 187 nips-2007-Structured Learning with Approximate Inference