nips nips2002 nips2002-92 knowledge-graph by maker-knowledge-mining

92 nips-2002-FloatBoost Learning for Classification


Source: pdf

Author: Stan Z. Li, Zhenqiu Zhang, Heung-yeung Shum, Hongjiang Zhang

Abstract: AdaBoost [3] minimizes an upper error bound which is an exponential function of the margin on the training set [14]. However, the ultimate goal in applications of pattern classification is always minimum error rate. On the other hand, AdaBoost needs an effective procedure for learning weak classifiers, which by itself is difficult especially for high dimensional data. In this paper, we present a novel procedure, called FloatBoost, for learning a better boosted classifier. FloatBoost uses a backtrack mechanism after each iteration of AdaBoost to remove weak classifiers which cause higher error rates. The resulting float-boosted classifier consists of fewer weak classifiers yet achieves lower error rates than AdaBoost in both training and test. We also propose a statistical model for learning weak classifiers, based on a stagewise approximation of the posterior using an overcomplete set of scalar features. Experimental comparisons of FloatBoost and AdaBoost are provided through a difficult classification problem, face detection, where the goal is to learn from training examples a highly nonlinear classifier to differentiate between face and nonface patterns in a high dimensional space. The results clearly demonstrate the promises made by FloatBoost over AdaBoost.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 However, the ultimate goal in applications of pattern classification is always minimum error rate. [sent-3, score-0.083]

2 On the other hand, AdaBoost needs an effective procedure for learning weak classifiers, which by itself is difficult especially for high dimensional data. [sent-4, score-0.453]

3 In this paper, we present a novel procedure, called FloatBoost, for learning a better boosted classifier. [sent-5, score-0.077]

4 FloatBoost uses a backtrack mechanism after each iteration of AdaBoost to remove weak classifiers which cause higher error rates. [sent-6, score-0.469]

5 The resulting float-boosted classifier consists of fewer weak classifiers yet achieves lower error rates than AdaBoost in both training and test. [sent-7, score-0.561]

6 We also propose a statistical model for learning weak classifiers, based on a stagewise approximation of the posterior using an overcomplete set of scalar features. [sent-8, score-0.521]

7 Experimental comparisons of FloatBoost and AdaBoost are provided through a difficult classification problem, face detection, where the goal is to learn from training examples a highly nonlinear classifier to differentiate between face and nonface patterns in a high dimensional space. [sent-9, score-0.583]

8 While designing such a classifier is difficult, the AdaBoost learning method, introduced by Freund and Schapire [3], provides an effective stagewise approach: it learns a sequence of more easily learnable “weak classifiers”, and boosts them into a single strong classifier by a linear combination of them. [sent-12, score-0.19]

9 It is shown that the AdaBoost learning minimizes an upper error bound which is an exponential function of the margin on the training set [14]. [sent-13, score-0.205]

10 Boosting learning originated from the PAC (probably approximately correct) learning theory [17, 6]. [sent-14, score-0.042]

11 Given that weak classifiers can perform slightly better than random guessing [sent-15, score-0.348]

12 on every distribution over the training set, AdaBoost can provably achieve arbitrarily good bounds on its training and generalization errors [3, 15]. [sent-18, score-0.13]

13 It is shown that such simple weak classifiers, when boosted, can capture complex decision boundaries [1]. [sent-19, score-0.348]

14 A number of gradient boosting algorithms are proposed [4, 8, 21]. [sent-21, score-0.137]

15 Friedman et al. [5] show that the AdaBoost algorithms minimize an exponential loss function which is closely related to the Bernoulli likelihood. [sent-23, score-0.029]

16 AdaBoost minimizes an exponential (or some other form of) function of the margin over the training set. [sent-25, score-0.126]

17 However, the ultimate goal in applications is always minimum error rate. [sent-27, score-0.083]

18 A strong classifier learned by AdaBoost may not necessarily be best in this criterion. [sent-28, score-0.059]

19 This problem has been noted, e.g. by [2], but no solutions have been found in the literature. [sent-29, score-0.03]

20 An effective and tractable algorithm for learning weak classifiers is needed. [sent-31, score-0.397]

21 Learning the optimal weak classifier, such as the log posterior ratio given in [15, 5], requires estimation of densities in the input data space. [sent-32, score-0.428]

22 FloatBoost incorporates into AdaBoost the idea of Floating Search originally proposed in [11] for feature selection. [sent-35, score-0.048]

23 A backtrack mechanism therein allows deletion of those weak classifiers that are non-effective or unfavorable in terms of the error rate. [sent-36, score-0.511]

24 This leads to a strong classifier consisting of fewer weak classifiers. [sent-37, score-0.438]

25 Because deletions in the backtrack step are performed according to the error rate, an improvement in classification error is also obtained. [sent-38, score-0.179]

26 To solve the second problem above, we provide a statistical model (Section 4) for learning weak classifiers and effective feature selection in high dimensional feature space. [sent-39, score-0.549]

27 A base set of weak classifiers, each defined as a log posterior ratio, is derived from an overcomplete set of scalar features. [sent-40, score-0.449]

28 Experimental results are presented in Section 5 on a difficult classification problem, face detection. [sent-41, score-0.154]

29 Comparisons are made between FloatBoost and AdaBoost in terms of the error rate and the complexity of the boosted classifier. [sent-42, score-0.167]

30 Results clearly show that FloatBoost yields a strong classifier consisting of fewer weak classifiers yet achieves lower error rates. [sent-43, score-0.546]

31 For two-class problems, a set of labelled training examples is given as (x_1, y_1), ..., (x_N, y_N), where y_i ∈ {+1, -1} is the class label associated with example x_i. [sent-45, score-0.126]

32 A stronger classifier is a linear combination of weak classifiers, H_M(x) = sum_{m=1}^{M} h_m(x). (1) [sent-46, score-0.348]

33 In this real version of AdaBoost, the weak classifiers h_m(x) can take a real value and have absorbed the coefficients needed in the discrete version (there, h_m(x) ∈ {-1, +1}). [sent-47, score-0.348]

34 During the learning process, the example weights are updated dynamically in such a way that more emphasis is placed on hard examples which were erroneously classified previously. [sent-50, score-0.132]

35 (Input) (1) Training examples {(x_i, y_i)}, i = 1, ..., N, where N = a + b; of which a examples have y_i = +1 and b examples have y_i = -1; (2) The maximum number M_max of weak classifiers to be combined. [sent-54, score-0.57]

36 (Initialization) w_i = 1/(2a) for those examples with y_i = +1 or w_i = 1/(2b) for those examples with y_i = -1. [sent-55, score-0.148]

37 The “margin” of an example achieved by the strong classifier on the training set is defined as the product of its label and the classifier output, y H(x); an error occurs when this margin is negative. [sent-64, score-0.208]

38 The upper bound on classification error achieved by H_M can be derived as the following exponential loss function [14]; AdaBoost constructs the weak classifiers by stage-wise minimization of this loss. [sent-66, score-0.111]
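
The extracted text dropped the formulas referred to in sentences 37-38; the standard margin and exponential bound they describe can be written as follows (the symbols H_M, h_m are this reconstruction's notation, not necessarily the paper's):

```latex
% Margin of example (x_i, y_i) under the strong classifier H_M:
%   y_i H_M(x_i), and an error occurs exactly when y_i H_M(x_i) < 0.
% Exponential upper bound on the training error minimized by AdaBoost (cf. [14]):
\sum_{i=1}^{N} \mathbf{1}\!\left[ y_i H_M(x_i) < 0 \right]
  \;\le\; J(H_M) = \sum_{i=1}^{N} \exp\!\left( -\, y_i H_M(x_i) \right),
\qquad H_M(x) = \sum_{m=1}^{M} h_m(x).
```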

39 The decision threshold can be adjusted to balance between detection rate and false alarm (the ROC curve). [sent-70, score-0.358]
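
To make the real-valued boosting loop described in sentences 32-38 concrete, here is a minimal Python sketch. The weak-learner interface `fit_weak` and the data arrays are placeholders, and the uniform weight initialization follows the generic scheme rather than the paper's exact algorithm box.

```python
import numpy as np

def real_adaboost(X, y, fit_weak, M):
    """Minimal sketch of confidence-rated (real-valued) AdaBoost.

    X : (N, d) array of examples; y : (N,) labels in {-1, +1}.
    fit_weak(X, y, w) is a placeholder that returns a real-valued weak
    classifier h (a callable) trained on the weighted sample.
    """
    N = len(y)
    w = np.full(N, 1.0 / N)          # uniform initial weights (generic choice)
    ensemble = []
    for _ in range(M):
        h = fit_weak(X, y, w)        # learn the next weak classifier
        ensemble.append(h)
        # re-weight: more emphasis on examples the new classifier gets wrong
        w *= np.exp(-y * h(X))
        w /= w.sum()                 # normalize back to a distribution

    def H(X_new):                    # decide by the sign of the sum in Eq. (1)
        return np.sign(sum(h(X_new) for h in ensemble))
    return H, ensemble
```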

40 3 FloatBoost Learning: FloatBoost backtracks after the newest weak classifier is added and deletes unfavorable weak classifiers from the ensemble (1), following the idea of Floating Search [11]. [sent-74, score-0.822]

41 Floating Search [11] was originally aimed at dealing with the non-monotonicity of straight sequential feature selection, non-monotonicity meaning that adding an additional feature may lead to a drop in performance. [sent-75, score-0.12]

42 When a new feature is added, backtracks are performed to delete those features that cause performance drops. [sent-76, score-0.166]

43 The limitations of sequential feature selection are thus amended, with the improvement gained at the cost of increased computation due to the extended search. [sent-77, score-0.08]

44 (Input) (1) Training examples {(x_i, y_i)}, i = 1, ..., N, where N = a + b; of which a examples have y_i = +1 and b examples have y_i = -1; (2) The maximum number M_max of weak classifiers; (3) The target error rate and the acceptance threshold. [sent-79, score-0.681]

45 (Initialization) (1) w_i = 1/(2a) for those examples with y_i = +1 or w_i = 1/(2b) for those examples with y_i = -1; (2) the minimum-error records are set to a max-value (for m = 1, ..., M_max), M = 0, and the ensemble is empty. [sent-81, score-0.148]

46 (Conditional Exclusion) (1) Find the least significant weak classifier in the current ensemble; (2) If removing it gives an error lower than the best achieved so far with one fewer weak classifier, then (a) remove it, update the error record, and decrement M; (b) recompute the affected weights and weak classifiers; (c) goto 3. [sent-85, score-0.05]

47 (1); (3) else (a) if the maximum number of weak classifiers is reached or the error is below the acceptance threshold, then goto 4; (b) goto 2. [sent-86, score-0.1]

48 Let H_M be the so-far-best set of M weak classifiers; let ε(H_M) be the error rate achieved by H_M (or a weighted sum of missing rate and false alarm rate, which is usually the criterion in a one-class detection problem); and let ε_m^min be the minimum error rate achieved so far with an ensemble of m weak classifiers. [sent-96, score-1.377]

49 In Step 2 (forward inclusion), given the weak classifiers already selected, the best weak classifier is added one at a time, which is the same as in AdaBoost. [sent-97, score-0.37]

50 In Step 3 (conditional exclusion), FloatBoost removes the least significant weak classifier from the current ensemble, subject to the condition that the removal leads to a lower error rate. [sent-98, score-0.481]

51 The procedure terminates when the risk on the training set is below the acceptance threshold or the maximum number of weak classifiers is reached. [sent-100, score-0.052]

52 Incorporating the conditional exclusion, FloatBoost renders both effective feature selection and classifier learning. [sent-101, score-0.133]
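
To make the forward-inclusion / conditional-exclusion interplay concrete, here is a minimal Python sketch. `train_weak`, `ensemble_error`, and the termination constants are placeholders standing in for the weighted training, cost function, and thresholds of the actual algorithm; in particular, the weight recalculation after a deletion is abstracted into `train_weak`.

```python
import numpy as np

def floatboost(X, y, train_weak, ensemble_error, M_max, target_error):
    """Sketch of FloatBoost: AdaBoost-style forward inclusion plus a
    backtracking (conditional exclusion) step after each addition.

    train_weak(X, y, ensemble) returns the next weak classifier given the
    current ensemble; ensemble_error(ensemble, X, y) returns its error rate.
    Both are placeholders, not the paper's implementation.
    """
    ensemble = []
    best_error = {0: np.inf}          # best error seen for each ensemble size
    while len(ensemble) < M_max:
        # Step 2: forward inclusion (same as AdaBoost)
        ensemble.append(train_weak(X, y, ensemble))
        M = len(ensemble)
        best_error[M] = min(best_error.get(M, np.inf),
                            ensemble_error(ensemble, X, y))

        # Step 3: conditional exclusion -- drop the least significant weak
        # classifier as long as doing so beats the best (M-1)-sized ensemble.
        while len(ensemble) > 1:
            candidates = [ensemble[:k] + ensemble[k + 1:]
                          for k in range(len(ensemble))]
            errors = [ensemble_error(c, X, y) for c in candidates]
            k_best = int(np.argmin(errors))
            if errors[k_best] < best_error.get(len(ensemble) - 1, np.inf):
                ensemble = candidates[k_best]
                best_error[len(ensemble)] = errors[k_best]
            else:
                break

        if ensemble_error(ensemble, X, y) <= target_error:
            break
    return ensemble
```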

53 It usually needs fewer weak classifiers than AdaBoost to achieve the same error rate. [sent-102, score-0.57]

54 Since deriving a weak classifier in high dimensional space is a non-trivial task, here we provide a statistical model for stagewise learning of weak classifiers based on some scalar features. [sent-105, score-0.868]

55 A scalar feature of an example is computed by a transform from the n-dimensional data space to the real line. [sent-106, score-0.074]

56 A feature can be the coefficient of, say, a wavelet transform in signal and image processing. [sent-107, score-0.101]

57 A dictionary of candidate scalar features can be created by varying the parameters of the transform. [sent-109, score-0.119]

58 In the following, we use z_m to denote the feature selected in the m-th stage, while z_k(x) is the feature computed from x using the k-th transform. [sent-110, score-0.096]

59 Therefore, we have the factorizations of Eqs. (10) and (11); on the right-hand side of the above equation, all conditional densities are fixed except the last one. [sent-114, score-0.052]
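
Eqs. (10)-(11) themselves were lost in extraction; the prose is, however, consistent with the usual chain-rule factorization of the class-conditional density of the selected features, which in assumed notation reads:

```latex
% Chain-rule factorization of the class-conditional density of the selected
% scalar features z_1, ..., z_M (assumed notation for the paper's Eqs. (10)-(11)):
p(z_1, \ldots, z_M \mid y) \;=\; \prod_{m=1}^{M} p\!\left( z_m \mid y,\, z_1, \ldots, z_{m-1} \right),
\qquad
P(y \mid z_1, \ldots, z_M) \;\propto\; p(z_1, \ldots, z_M \mid y)\, P(y).
% All factors but the last are fixed once z_1, ..., z_{M-1} have been chosen.
```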

60 Learning the best weak classifier at stage m is to choose the best feature such that the cost is minimized according to Eq.

61 The conditional probability densities for the positive class and the negative class can be estimated using histograms computed from the weighted voting of the training examples using the current weights. [sent-117, score-0.24]

62 We can derive the set of candidate weak classifiers, and the best one among all candidates for the new strong classifier is given by Eq. [sent-119, score-0.089]

63 (3) among all candidates, for which the optimal weak classifier has been derived as in (7). [sent-120, score-0.348]

64 An alternative selection scheme is simply to choose the feature so that the error rate (or some risk), computed from the two histograms, is minimized. [sent-122, score-0.198]
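
A minimal sketch of the histogram-based weak learner described in sentences 61-64, assuming a fixed number of bins and a small smoothing constant (both are choices of this sketch, not taken from the paper):

```python
import numpy as np

def histogram_llr_weak_classifier(z, y, w, n_bins=32, eps=1e-6):
    """Build a weak classifier from one scalar feature z.

    z : (N,) feature values; y : (N,) labels in {-1, +1}; w : (N,) weights.
    Returns a callable h(z_new) approximating 0.5 * log(p(z|+1) / p(z|-1)),
    with each class-conditional density estimated by a weighted histogram.
    """
    edges = np.histogram_bin_edges(z, bins=n_bins)
    p_pos, _ = np.histogram(z[y == +1], bins=edges, weights=w[y == +1])
    p_neg, _ = np.histogram(z[y == -1], bins=edges, weights=w[y == -1])
    p_pos = (p_pos + eps) / (p_pos.sum() + eps * n_bins)   # smoothed, normalized
    p_neg = (p_neg + eps) / (p_neg.sum() + eps * n_bins)
    llr = 0.5 * np.log(p_pos / p_neg)                      # one value per bin

    def h(z_new):
        idx = np.clip(np.digitize(z_new, edges) - 1, 0, n_bins - 1)
        return llr[idx]
    return h
```

Selecting, at each stage, the feature whose histogram classifier minimizes the weighted error (or the exponential loss) then implements the stagewise choice described above.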

65 5 Experimental Results: Face Detection. The face detection problem here is to classify an image of standard size (e.g. 20x20 pixels) into either face or nonface (imposter). [sent-123, score-0.548]

66 This is essentially a one-class problem in that everything not a face is a nonface. [sent-124, score-0.154]

67 Learning-based methods have been the main approach to solving the problem, e.g. [13, 16, 9, 12]. [sent-126, score-0.03]

68 There, AdaBoost is used for learning face detection; it performs two important tasks: feature selection from a large collection of features, and constructing classifiers using the selected features. [sent-128, score-0.255]

69 Data Sets: A set of 5000 face images is collected from various sources. [sent-129, score-0.154]

70 Another set of 5000 nonface examples of the same size is collected from images containing no faces. [sent-131, score-0.199]

71 The 5000 examples in each set are divided into a training set of 4000 examples and a test set of 1000 examples. [sent-132, score-0.2]

72 See Fig. 3 for a random sample of 10 face and 10 nonface examples. [sent-134, score-0.279]

73 Figure 3: Face (top) and nonface (bottom) examples. [sent-135, score-0.125]

74 Scalar Features: Three basic types of scalar features are derived from each example, as shown in Fig. 4. [sent-136, score-0.088]

75 Each candidate weak classifier is constructed as the log likelihood ratio (12) computed from the two histograms of a scalar feature for the face (y = +1) and nonface (y = -1) examples (cf. [sent-140, score-0.348]

76 Figure 4: The three types of simple Haar-wavelet-like features defined on a sub-window. [sent-142, score-0.061]

77 Each feature takes a value computed as the (weighted) sum of the pixels in the rectangles. [sent-144, score-0.048]
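
For illustration, a sketch of evaluating a rectangle ("Haar-wavelet-like") feature in constant time with an integral image; the particular two-rectangle layout here is an assumed example, not the paper's exact feature definitions.

```python
import numpy as np

def integral_image(img):
    """Cumulative sums so that any rectangle sum costs four lookups."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, height, width):
    """Sum of pixels in the rectangle [top, top+height) x [left, left+width)."""
    p = np.pad(ii, ((1, 0), (1, 0)))            # pad so indexing is uniform
    return (p[top + height, left + width] - p[top, left + width]
            - p[top + height, left] + p[top, left])

def two_rect_feature(ii, top, left, height, width):
    """Illustrative two-rectangle feature: left half minus right half."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, width - half))

# Usage on a 20x20 window (the standard size mentioned above):
window = np.random.rand(20, 20)
ii = integral_image(window)
value = two_rect_feature(ii, top=4, left=2, height=8, width=10)
```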

78 The performance is measured by false alarm error rate given the detection rate fixed at 99. [sent-146, score-0.469]

79 While a cascade of strong classifiers is needed to achieve a very low false alarm rate [19, 7], here we present the learning curves for the first strong classifier, composed of up to one thousand weak classifiers. [sent-148, score-0.663]

80 The interested reader is referred to [7] for a complete system which achieved a very low false alarm rate with a detection rate of 95%. [sent-150, score-0.382]

81 (A live demo of the multi-view face detection system, the first real-time system of its kind, is being submitted to the conference.) [sent-151, score-0.269]

82 Figure 5: Error rates of FloatBoost vs. AdaBoost for frontal face detection (x-axis: number of weak classifiers, from 100 to 1000). [sent-162, score-0.154]

83 The training and testing error curves for FloatBoost and AdaBoost are shown in Fig. 5. [sent-163, score-0.133]

84 The following conclusions can be made from these curves: (1) Given the same number of learned features or weak classifiers, FloatBoost always achieves lower training error and lower test error than AdaBoost. [sent-166, score-0.622]

85 For example, on the test set, by combining 1000 weak classifiers, the false alarm of FloatBoost is 0. [sent-167, score-0.538]

86 (2) FloatBoost needs many fewer weak classifiers than AdaBoost in order to achieve the same false alarm rate. [sent-170, score-0.538]

87 For example, the lowest test error for AdaBoost is 0. [sent-171, score-0.058]

88 481 with 800 weak classifiers, whereas FloatBoost needs only 230 weak classifiers to achieve the same performance. [sent-172, score-0.754]

89 This clearly demonstrates the strength of FloatBoost in learning to achieve a lower error rate. [sent-173, score-0.127]

90 It needs fewer weak classifiers than AdaBoost to achieve a similar error rate, or achieves a lower error rate with the same number of weak classifiers. [sent-175, score-0.699]

91 Such a performance improvement is achieved at the cost of longer training time, about 5 times longer for the experiments reported in this paper. [sent-176, score-0.076]

92 Several methods can be used to make the training more efficient with little drop in the training performance. [sent-178, score-0.128]

93 Noticing that only examples with large weight values are influential, Friedman et al. [sent-179, score-0.074]

94 [5] propose to select examples with large weights, i.e. [sent-180, score-0.074]

95 those which in the past have been wrongly classified by the learned weak classifiers, for training the weak classifier in the next round. [sent-182, score-0.748]

96 The top examples within a given fraction of the total weight mass are used. [sent-183, score-0.074]
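
A minimal sketch of this weight-trimming heuristic; the 99% mass fraction used here is an assumption for illustration, not the value used in the paper.

```python
import numpy as np

def trim_by_weight_mass(w, keep_fraction=0.99):
    """Return indices of the highest-weight examples whose weights together
    account for at least `keep_fraction` of the total weight mass."""
    order = np.argsort(w)[::-1]                  # heaviest examples first
    cumulative = np.cumsum(w[order]) / w.sum()
    n_keep = int(np.searchsorted(cumulative, keep_fraction)) + 1
    return order[:n_keep]

# Example: train the next weak classifier only on the influential examples.
w = np.random.dirichlet(np.ones(1000))           # stand-in for boosting weights
active = trim_by_weight_mass(w, keep_fraction=0.99)
```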

97 “Invited discussion on ‘Additive logistic regression: a statistical view of boosting (Friedman, Hastie and Tibshirani)’ ”. [sent-191, score-0.131]

98 “Training support vector machines: An application to face detection”. [sent-244, score-0.154]

99 “Boosting the margin: A new explanation for the effectiveness of voting methods”. [sent-278, score-0.028]

100 “Rapid object detection using a boosted cascade of simple features”. [sent-304, score-0.236]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('floatboost', 0.565), ('adaboost', 0.368), ('weak', 0.348), ('classi', 0.297), ('ers', 0.212), ('face', 0.154), ('er', 0.128), ('alarm', 0.125), ('nonface', 0.125), ('detection', 0.115), ('boosting', 0.11), ('floating', 0.105), ('beijing', 0.084), ('china', 0.084), ('examples', 0.074), ('stagewise', 0.073), ('false', 0.065), ('asia', 0.063), ('backtrack', 0.063), ('viola', 0.062), ('error', 0.058), ('boosted', 0.056), ('exclusion', 0.055), ('scalar', 0.054), ('rate', 0.053), ('fewer', 0.053), ('training', 0.052), ('microsoft', 0.051), ('goto', 0.05), ('freund', 0.049), ('feature', 0.048), ('zhang', 0.047), ('margin', 0.045), ('annals', 0.044), ('cascade', 0.044), ('ga', 0.044), ('backtracks', 0.042), ('delete', 0.042), ('feda', 0.042), ('fxs', 0.042), ('realboost', 0.042), ('rqpig', 0.042), ('unfavorable', 0.042), ('inclusion', 0.042), ('friedman', 0.041), ('fd', 0.037), ('strong', 0.037), ('fedb', 0.036), ('histograms', 0.034), ('features', 0.034), ('selection', 0.032), ('needs', 0.032), ('schapire', 0.032), ('ratio', 0.031), ('learnable', 0.031), ('candidate', 0.031), ('eg', 0.03), ('exponential', 0.029), ('effective', 0.028), ('achieves', 0.028), ('voting', 0.028), ('gradient', 0.027), ('densities', 0.027), ('vancouver', 0.027), ('april', 0.027), ('normalize', 0.027), ('october', 0.027), ('wavelet', 0.027), ('transform', 0.026), ('achieve', 0.026), ('overcomplete', 0.025), ('ultimate', 0.025), ('bartlett', 0.025), ('search', 0.025), ('conditional', 0.025), ('stage', 0.025), ('achieved', 0.024), ('vision', 0.024), ('drop', 0.024), ('dimensional', 0.024), ('curves', 0.023), ('ay', 0.023), ('december', 0.023), ('ih', 0.023), ('proceedings', 0.023), ('cult', 0.023), ('detector', 0.022), ('log', 0.022), ('lower', 0.022), ('ca', 0.022), ('best', 0.022), ('forward', 0.021), ('weaker', 0.021), ('choose', 0.021), ('dif', 0.021), ('canada', 0.021), ('hastie', 0.021), ('ph', 0.021), ('object', 0.021), ('learning', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 92 nips-2002-FloatBoost Learning for Classification

Author: Stan Z. Li, Zhenqiu Zhang, Heung-yeung Shum, Hongjiang Zhang

Abstract: AdaBoost [3] minimizes an upper error bound which is an exponential function of the margin on the training set [14]. However, the ultimate goal in applications of pattern classification is always minimum error rate. On the other hand, AdaBoost needs an effective procedure for learning weak classifiers, which by itself is difficult especially for high dimensional data. In this paper, we present a novel procedure, called FloatBoost, for learning a better boosted classifier. FloatBoost uses a backtrack mechanism after each iteration of AdaBoost to remove weak classifiers which cause higher error rates. The resulting float-boosted classifier consists of fewer weak classifiers yet achieves lower error rates than AdaBoost in both training and test. We also propose a statistical model for learning weak classifiers, based on a stagewise approximation of the posterior using an overcomplete set of scalar features. Experimental comparisons of FloatBoost and AdaBoost are provided through a difficult classification problem, face detection, where the goal is to learn from training examples a highly nonlinear classifier to differentiate between face and nonface patterns in a high dimensional space. The results clearly demonstrate the promises made by FloatBoost over AdaBoost.

2 0.28788644 46 nips-2002-Boosting Density Estimation

Author: Saharon Rosset, Eran Segal

Abstract: Several authors have suggested viewing boosting as a gradient descent search for a good fit in function space. We apply gradient-based boosting methodology to the unsupervised learning problem of density estimation. We show convergence properties of the algorithm and prove that a strength of weak learnability property applies to this problem as well. We illustrate the potential of this approach through experiments with boosting Bayesian networks to learn density models.

3 0.22066589 45 nips-2002-Boosted Dyadic Kernel Discriminants

Author: Baback Moghaddam, Gregory Shakhnarovich

Abstract: We introduce a novel learning algorithm for binary classification with hyperplane discriminants based on pairs of training points from opposite classes (dyadic hypercuts). This algorithm is further extended to nonlinear discriminants using kernel functions satisfying Mercer’s conditions. An ensemble of simple dyadic hypercuts is learned incrementally by means of a confidence-rated version of AdaBoost, which provides a sound strategy for searching through the finite set of hypercut hypotheses. In experiments with real-world datasets from the UCI repository, the generalization performance of the hypercut classifiers was found to be comparable to that of SVMs and k-NN classifiers. Furthermore, the computational cost of classification (at run time) was found to be similar to, or better than, that of SVM. Similarly to SVMs, boosted dyadic kernel discriminants tend to maximize the margin (via AdaBoost). In contrast to SVMs, however, we offer an on-line and incremental learning machine for building kernel discriminants whose complexity (number of kernel evaluations) can be directly controlled (traded off for accuracy). 1

4 0.17530936 88 nips-2002-Feature Selection and Classification on Matrix Data: From Large Margins to Small Covering Numbers

Author: Sepp Hochreiter, Klaus Obermayer

Abstract: We investigate the problem of learning a classification task for datasets which are described by matrices. Rows and columns of these matrices correspond to objects, where row and column objects may belong to different sets, and the entries in the matrix express the relationships between them. We interpret the matrix elements as being produced by an unknown kernel which operates on object pairs and we show that - under mild assumptions - these kernels correspond to dot products in some (unknown) feature space. Minimizing a bound for the generalization error of a linear classifier which has been obtained using covering numbers we derive an objective function for model selection according to the principle of structural risk minimization. The new objective function has the advantage that it allows the analysis of matrices which are not positive definite, and not even symmetric or square. We then consider the case that row objects are interpreted as features. We suggest an additional constraint, which imposes sparseness on the row objects and show, that the method can then be used for feature selection. Finally, we apply this method to data obtained from DNA microarrays, where “column” objects correspond to samples, “row” objects correspond to genes and matrix elements correspond to expression levels. Benchmarks are conducted using standard one-gene classification and support vector machines and K-nearest neighbors after standard feature selection. Our new method extracts a sparse set of genes and provides superior classification results. 1

5 0.16336094 59 nips-2002-Constraint Classification for Multiclass Classification and Ranking

Author: Sariel Har-Peled, Dan Roth, Dav Zimak

Abstract: The constraint classification framework captures many flavors of multiclass classification including winner-take-all multiclass classification, multilabel classification and ranking. We present a meta-algorithm for learning in this framework that learns via a single linear classifier in high dimension. We discuss distribution independent as well as margin-based generalization bounds and present empirical and theoretical evidence showing that constraint classification benefits over existing methods of multiclass classification.

6 0.16313949 181 nips-2002-Self Supervised Boosting

7 0.13782567 24 nips-2002-Adaptive Scaling for Feature Selection in SVMs

8 0.1199066 132 nips-2002-Learning to Detect Natural Image Boundaries Using Brightness and Texture

9 0.11080026 19 nips-2002-Adapting Codes and Embeddings for Polychotomies

10 0.1091973 68 nips-2002-Discriminative Densities from Maximum Contrast Estimation

11 0.1069838 55 nips-2002-Combining Features for BCI

12 0.10668721 72 nips-2002-Dyadic Classification Trees via Structural Risk Minimization

13 0.1062542 173 nips-2002-Recovering Intrinsic Images from a Single Image

14 0.10546837 196 nips-2002-The RA Scanner: Prediction of Rheumatoid Joint Inflammation Based on Laser Imaging

15 0.1019488 140 nips-2002-Margin Analysis of the LVQ Algorithm

16 0.098169781 120 nips-2002-Kernel Design Using Boosting

17 0.092713863 86 nips-2002-Fast Sparse Gaussian Process Methods: The Informative Vector Machine

18 0.091211818 108 nips-2002-Improving Transfer Rates in Brain Computer Interfacing: A Case Study

19 0.089847088 69 nips-2002-Discriminative Learning for Label Sequences via Boosting

20 0.087420307 149 nips-2002-Multiclass Learning by Probabilistic Embeddings


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.22), (1, -0.136), (2, 0.1), (3, 0.006), (4, 0.349), (5, -0.027), (6, -0.024), (7, -0.083), (8, 0.006), (9, -0.09), (10, -0.138), (11, 0.082), (12, -0.012), (13, 0.164), (14, -0.023), (15, 0.12), (16, -0.138), (17, -0.053), (18, 0.005), (19, -0.008), (20, -0.037), (21, 0.105), (22, -0.013), (23, 0.01), (24, -0.076), (25, 0.15), (26, -0.002), (27, -0.043), (28, 0.001), (29, -0.035), (30, 0.014), (31, 0.016), (32, -0.056), (33, 0.068), (34, 0.1), (35, -0.049), (36, 0.013), (37, 0.005), (38, 0.004), (39, -0.045), (40, 0.011), (41, -0.022), (42, 0.006), (43, -0.02), (44, -0.026), (45, 0.013), (46, -0.02), (47, 0.017), (48, 0.057), (49, -0.034)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95511115 92 nips-2002-FloatBoost Learning for Classification

Author: Stan Z. Li, Zhenqiu Zhang, Heung-yeung Shum, Hongjiang Zhang

Abstract: AdaBoost [3] minimizes an upper error bound which is an exponential function of the margin on the training set [14]. However, the ultimate goal in applications of pattern classification is always minimum error rate. On the other hand, AdaBoost needs an effective procedure for learning weak classifiers, which by itself is difficult especially for high dimensional data. In this paper, we present a novel procedure, called FloatBoost, for learning a better boosted classifier. FloatBoost uses a backtrack mechanism after each iteration of AdaBoost to remove weak classifiers which cause higher error rates. The resulting float-boosted classifier consists of fewer weak classifiers yet achieves lower error rates than AdaBoost in both training and test. We also propose a statistical model for learning weak classifiers, based on a stagewise approximation of the posterior using an overcomplete set of scalar features. Experimental comparisons of FloatBoost and AdaBoost are provided through a difficult classification problem, face detection, where the goal is to learn from training examples a highly nonlinear classifier to differentiate between face and nonface patterns in a high dimensional space. The results clearly demonstrate the promises made by FloatBoost over AdaBoost.

2 0.74000061 46 nips-2002-Boosting Density Estimation

Author: Saharon Rosset, Eran Segal

Abstract: Several authors have suggested viewing boosting as a gradient descent search for a good fit in function space. We apply gradient-based boosting methodology to the unsupervised learning problem of density estimation. We show convergence properties of the algorithm and prove that a strength of weak learnability property applies to this problem as well. We illustrate the potential of this approach through experiments with boosting Bayesian networks to learn density models.

3 0.7174083 45 nips-2002-Boosted Dyadic Kernel Discriminants

Author: Baback Moghaddam, Gregory Shakhnarovich

Abstract: We introduce a novel learning algorithm for binary classification with hyperplane discriminants based on pairs of training points from opposite classes (dyadic hypercuts). This algorithm is further extended to nonlinear discriminants using kernel functions satisfying Mercer’s conditions. An ensemble of simple dyadic hypercuts is learned incrementally by means of a confidence-rated version of AdaBoost, which provides a sound strategy for searching through the finite set of hypercut hypotheses. In experiments with real-world datasets from the UCI repository, the generalization performance of the hypercut classifiers was found to be comparable to that of SVMs and k-NN classifiers. Furthermore, the computational cost of classification (at run time) was found to be similar to, or better than, that of SVM. Similarly to SVMs, boosted dyadic kernel discriminants tend to maximize the margin (via AdaBoost). In contrast to SVMs, however, we offer an on-line and incremental learning machine for building kernel discriminants whose complexity (number of kernel evaluations) can be directly controlled (traded off for accuracy). 1

4 0.67041159 181 nips-2002-Self Supervised Boosting

Author: Max Welling, Richard S. Zemel, Geoffrey E. Hinton

Abstract: Boosting algorithms and successful applications thereof abound for classification and regression learning problems, but not for unsupervised learning. We propose a sequential approach to adding features to a random field model by training them to improve classification performance between the data and an equal-sized sample of “negative examples” generated from the model’s current estimate of the data density. Training in each boosting round proceeds in three stages: first we sample negative examples from the model’s current Boltzmann distribution. Next, a feature is trained to improve classification performance between data and negative examples. Finally, a coefficient is learned which determines the importance of this feature relative to ones already in the pool. Negative examples only need to be generated once to learn each new feature. The validity of the approach is demonstrated on binary digits and continuous synthetic data.

5 0.63257796 59 nips-2002-Constraint Classification for Multiclass Classification and Ranking

Author: Sariel Har-Peled, Dan Roth, Dav Zimak

Abstract: The constraint classification framework captures many flavors of multiclass classification including winner-take-all multiclass classification, multilabel classification and ranking. We present a meta-algorithm for learning in this framework that learns via a single linear classifier in high dimension. We discuss distribution independent as well as margin-based generalization bounds and present empirical and theoretical evidence showing that constraint classification benefits over existing methods of multiclass classification.

6 0.62142622 55 nips-2002-Combining Features for BCI

7 0.61611432 196 nips-2002-The RA Scanner: Prediction of Rheumatoid Joint Inflammation Based on Laser Imaging

8 0.59389114 88 nips-2002-Feature Selection and Classification on Matrix Data: From Large Margins to Small Covering Numbers

9 0.57882863 72 nips-2002-Dyadic Classification Trees via Structural Risk Minimization

10 0.56961107 108 nips-2002-Improving Transfer Rates in Brain Computer Interfacing: A Case Study

11 0.54348713 132 nips-2002-Learning to Detect Natural Image Boundaries Using Brightness and Texture

12 0.50811249 68 nips-2002-Discriminative Densities from Maximum Contrast Estimation

13 0.47183925 140 nips-2002-Margin Analysis of the LVQ Algorithm

14 0.46740094 24 nips-2002-Adaptive Scaling for Feature Selection in SVMs

15 0.45712435 58 nips-2002-Conditional Models on the Ranking Poset

16 0.4537203 109 nips-2002-Improving a Page Classifier with Anchor Extraction and Link Analysis

17 0.44847989 62 nips-2002-Coulomb Classifiers: Generalizing Support Vector Machines via an Analogy to Electrostatic Systems

18 0.4333078 69 nips-2002-Discriminative Learning for Label Sequences via Boosting

19 0.4217518 173 nips-2002-Recovering Intrinsic Images from a Single Image

20 0.41892523 89 nips-2002-Feature Selection by Maximum Marginal Diversity


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(23, 0.014), (42, 0.071), (54, 0.081), (55, 0.031), (67, 0.012), (68, 0.011), (74, 0.071), (92, 0.032), (98, 0.569)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.98874176 129 nips-2002-Learning in Spiking Neural Assemblies

Author: David Barber

Abstract: We consider a statistical framework for learning in a class of networks of spiking neurons. Our aim is to show how optimal local learning rules can be readily derived once the neural dynamics and desired functionality of the neural assembly have been specified, in contrast to other models which assume (sub-optimal) learning rules. Within this framework we derive local rules for learning temporal sequences in a model of spiking neurons and demonstrate its superior performance to correlation (Hebbian) based approaches. We further show how to include mechanisms such as synaptic depression and outline how the framework is readily extensible to learning in networks of highly complex spiking neurons. A stochastic quantal vesicle release mechanism is considered and implications on the complexity of learning discussed. 1

2 0.9861694 103 nips-2002-How Linear are Auditory Cortical Responses?

Author: Maneesh Sahani, Jennifer F. Linden

Abstract: By comparison to some other sensory cortices, the functional properties of cells in the primary auditory cortex are not yet well understood. Recent attempts to obtain a generalized description of auditory cortical responses have often relied upon characterization of the spectrotemporal receptive field (STRF), which amounts to a model of the stimulusresponse function (SRF) that is linear in the spectrogram of the stimulus. How well can such a model account for neural responses at the very first stages of auditory cortical processing? To answer this question, we develop a novel methodology for evaluating the fraction of stimulus-related response power in a population that can be captured by a given type of SRF model. We use this technique to show that, in the thalamo-recipient layers of primary auditory cortex, STRF models account for no more than 40% of the stimulus-related power in neural responses.

same-paper 3 0.97620589 92 nips-2002-FloatBoost Learning for Classification

Author: Stan Z. Li, Zhenqiu Zhang, Heung-yeung Shum, Hongjiang Zhang

Abstract: AdaBoost [3] minimizes an upper error bound which is an exponential function of the margin on the training set [14]. However, the ultimate goal in applications of pattern classification is always minimum error rate. On the other hand, AdaBoost needs an effective procedure for learning weak classifiers, which by itself is difficult especially for high dimensional data. In this paper, we present a novel procedure, called FloatBoost, for learning a better boosted classifier. FloatBoost uses a backtrack mechanism after each iteration of AdaBoost to remove weak classifiers which cause higher error rates. The resulting float-boosted classifier consists of fewer weak classifiers yet achieves lower error rates than AdaBoost in both training and test. We also propose a statistical model for learning weak classifiers, based on a stagewise approximation of the posterior using an overcomplete set of scalar features. Experimental comparisons of FloatBoost and AdaBoost are provided through a difficult classification problem, face detection, where the goal is to learn from training examples a highly nonlinear classifier to differentiate between face and nonface patterns in a high dimensional space. The results clearly demonstrate the promises made by FloatBoost over AdaBoost.

4 0.97579378 86 nips-2002-Fast Sparse Gaussian Process Methods: The Informative Vector Machine

Author: Ralf Herbrich, Neil D. Lawrence, Matthias Seeger

Abstract: We present a framework for sparse Gaussian process (GP) methods which uses forward selection with criteria based on informationtheoretic principles, previously suggested for active learning. Our goal is not only to learn d–sparse predictors (which can be evaluated in O(d) rather than O(n), d n, n the number of training points), but also to perform training under strong restrictions on time and memory requirements. The scaling of our method is at most O(n · d2 ), and in large real-world classification experiments we show that it can match prediction performance of the popular support vector machine (SVM), yet can be significantly faster in training. In contrast to the SVM, our approximation produces estimates of predictive probabilities (‘error bars’), allows for Bayesian model selection and is less complex in implementation. 1

5 0.96654987 56 nips-2002-Concentration Inequalities for the Missing Mass and for Histogram Rule Error

Author: Luis E. Ortiz, David A. McAllester

Abstract: This paper gives distribution-free concentration inequalities for the missing mass and the error rate of histogram rules. Negative association methods can be used to reduce these concentration problems to concentration questions about independent sums. Although the sums are independent, they are highly heterogeneous. Such highly heterogeneous independent sums cannot be analyzed using standard concentration inequalities such as Hoeffding’s inequality, the Angluin-Valiant bound, Bernstein’s inequality, Bennett’s inequality, or McDiarmid’s theorem.

6 0.92440933 59 nips-2002-Constraint Classification for Multiclass Classification and Ranking

7 0.87310386 79 nips-2002-Evidence Optimization Techniques for Estimating Stimulus-Response Functions

8 0.84131259 50 nips-2002-Circuit Model of Short-Term Synaptic Dynamics

9 0.82995921 184 nips-2002-Spectro-Temporal Receptive Fields of Subthreshold Responses in Auditory Cortex

10 0.82564211 43 nips-2002-Binary Coding in Auditory Cortex

11 0.82537985 110 nips-2002-Incremental Gaussian Processes

12 0.81277442 12 nips-2002-A Neural Edge-Detection Model for Enhanced Auditory Sensitivity in Modulated Noise

13 0.80425799 102 nips-2002-Hidden Markov Model of Cortical Synaptic Plasticity: Derivation of the Learning Rule

14 0.78789574 41 nips-2002-Bayesian Monte Carlo

15 0.78390211 116 nips-2002-Interpreting Neural Response Variability as Monte Carlo Sampling of the Posterior

16 0.78033644 180 nips-2002-Selectivity and Metaplasticity in a Unified Calcium-Dependent Model

17 0.77387083 199 nips-2002-Timing and Partial Observability in the Dopamine System

18 0.7667287 81 nips-2002-Expected and Unexpected Uncertainty: ACh and NE in the Neocortex

19 0.75770217 7 nips-2002-A Hierarchical Bayesian Markovian Model for Motifs in Biopolymer Sequences

20 0.74610412 148 nips-2002-Morton-Style Factorial Coding of Color in Primary Visual Cortex