nips nips2002 nips2002-92 knowledge-graph by maker-knowledge-mining

92 nips-2002-FloatBoost Learning for Classification


Source: pdf

Author: Stan Z. Li, Zhenqiu Zhang, Heung-yeung Shum, Hongjiang Zhang

Abstract: AdaBoost [3] minimizes an upper error bound which is an exponential function of the margin on the training set [14]. However, the ultimate goal in applications of pattern classification is always minimum error rate. On the other hand, AdaBoost needs an effective procedure for learning weak classifiers, which by itself is difficult especially for high dimensional data. In this paper, we present a novel procedure, called FloatBoost, for learning a better boosted classifier. FloatBoost uses a backtrack mechanism after each iteration of AdaBoost to remove weak classifiers which cause higher error rates. The resulting float-boosted classifier consists of fewer weak classifiers yet achieves lower error rates than AdaBoost in both training and test. We also propose a statistical model for learning weak classifiers, based on a stagewise approximation of the posterior using an overcomplete set of scalar features. Experimental comparisons of FloatBoost and AdaBoost are provided through a difficult classification problem, face detection, where the goal is to learn from training examples a highly nonlinear classifier to differentiate between face and nonface patterns in a high dimensional space. The results clearly demonstrate the promises made by FloatBoost over AdaBoost.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 However, the ultimate goal in applications of pattern classification is always minimum error rate. [sent-3, score-0.083]

2 On the other hand, AdaBoost needs an effective procedure for learning weak classifiers, which by itself is difficult especially for high dimensional data. [sent-4, score-0.453]

3 In this paper, we present a novel procedure, called FloatBoost, for learning a better boosted classifier. [sent-5, score-0.077]

4 FloatBoost uses a backtrack mechanism after each iteration of AdaBoost to remove weak classifiers which cause higher error rates. [sent-6, score-0.469]

5 The resulting float-boosted classifier consists of fewer weak classifiers yet achieves lower error rates than AdaBoost in both training and test. [sent-7, score-0.561]

6 We also propose a statistical model for learning weak classifiers, based on a stagewise approximation of the posterior using an overcomplete set of scalar features. [sent-8, score-0.521]

7 Experimental comparisons of FloatBoost and AdaBoost are provided through a difficult classification problem, face detection, where the goal is to learn from training examples a highly nonlinear classifier to differentiate between face and nonface patterns in a high dimensional space. [sent-9, score-0.583]

8 While designing such a classifier is difficult, the AdaBoost learning method, introduced by Freund and Schapire [3], provides an effective stagewise approach: it learns a sequence of more easily learnable “weak classifiers”, and boosts them into a single strong classifier by a linear combination of them. [sent-12, score-0.19]

9 It is shown that the AdaBoost learning minimizes an upper error bound which is an exponential function of the margin on the training set [14]. [sent-13, score-0.205]

10 Boosting learning originated from the PAC (probably approximately correct) learning theory [17, 6]. [sent-14, score-0.042]

11 Given that weak classifiers can perform slightly better than random guessing [sent-15, score-0.348]

12 on every distribution over the training set, AdaBoost can provably achieve arbitrarily good bounds on its training and generalization errors [3, 15]. [sent-18, score-0.13]

13 It is shown that such simple weak classifiers, when boosted, can capture complex decision boundaries [1]. [sent-19, score-0.348]

14 A number of gradient boosting algorithms are proposed [4, 8, 21]. [sent-21, score-0.137]

15 Friedman et al. [5] show that the AdaBoost algorithms minimize an exponential loss function which is closely related to the Bernoulli likelihood. [sent-23, score-0.029]

16 AdaBoost minimizes an exponential (or some other form of) function of the margin over the training set. [sent-25, score-0.126]

17 However, the ultimate goal in applications is always minimum error rate. [sent-27, score-0.083]

18 A strong classifier learned by AdaBoost may not necessarily be best in this criterion. [sent-28, score-0.059]

19 This problem has been noted, e.g. by [2], but no solutions have been found in the literature. [sent-29, score-0.03]

20 An effective and tractable algorithm for learning weak classifiers is needed. [sent-31, score-0.397]

21 Learning the optimal weak classifier, such as the log posterior ratio given in [15, 5], requires estimation of densities in the input data space. [sent-32, score-0.428]

22 FloatBoost incorporates into AdaBoost the idea of Floating Search originally proposed in [11] for feature selection. [sent-35, score-0.048]

23 A backtrack mechanism therein allows deletion of those weak classifiers that are non-effective or unfavorable in terms of the error rate. [sent-36, score-0.511]

24 This leads to a strong classifier consisting of fewer weak classifiers. [sent-37, score-0.438]

25 Because deletions in the backtrack step are performed according to the error rate, an improvement in classification error is also obtained. [sent-38, score-0.179]

26 To solve the second problem above, we provide a statistical model (Section 4) for learning weak classifiers and effective feature selection in high dimensional feature space. [sent-39, score-0.549]

27 A base set of weak classifiers, each defined as a log posterior ratio, is derived from an overcomplete set of scalar features. [sent-40, score-0.449]

28 Experimental results are presented in Section 5 on a difficult classification problem, face detection. [sent-41, score-0.154]

29 Comparisons are made between FloatBoost and AdaBoost in terms of the error rate and the complexity of the boosted classifier. [sent-42, score-0.167]

30 Results clearly show that FloatBoost yields a strong classifier consisting of fewer weak classifiers yet achieves lower error rates. [sent-43, score-0.546]

31 For two-class problems, a set of labelled training examples is given as (x_1, y_1), ..., (x_N, y_N), where y_i ∈ {+1, -1} is the class label associated with example x_i. [sent-45, score-0.126]

32 A stronger classifier is a linear combination of weak classifiers, H_M(x) = sum_{m=1}^{M} h_m(x). (1) [sent-46, score-0.348]

33 In this real version of AdaBoost, the weak classifiers h_m(x) can take a real value and have absorbed the coefficients needed in the discrete version (there, h_m(x) ∈ {-1, +1}). [sent-47, score-0.348]

34 During the learning process, the example weights are updated dynamically in such a way that more emphasis is placed on hard examples which were erroneously classified previously. [sent-50, score-0.132]

35 (Input) (1) Training examples {(x_i, y_i)}, i = 1, ..., N, where N = a + b; of which a examples have y_i = +1 and b examples have y_i = -1; (2) The maximum number M_max of weak classifiers to be combined. [sent-54, score-0.57]

36 (Initialization) w_i = 1/(2a) for those examples with y_i = +1 or w_i = 1/(2b) for those examples with y_i = -1. [sent-55, score-0.148]

37 The “margin” of an example achieved by the strong classifier on the training set is defined as the product of its label and the classifier output, y H(x); an error occurs when this margin is negative. [sent-64, score-0.208]

38 The upper bound on classification error achieved by H_M can be derived as the following exponential loss function [14]; AdaBoost constructs the weak classifiers by stage-wise minimization of this loss. [sent-66, score-0.111]
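
The extracted text dropped the formulas referred to in sentences 37-38; the standard margin and exponential bound they describe can be written as follows (the symbols H_M, h_m are this reconstruction's notation, not necessarily the paper's):

```latex
% Margin of example (x_i, y_i) under the strong classifier H_M:
%   y_i H_M(x_i), and an error occurs exactly when y_i H_M(x_i) < 0.
% Exponential upper bound on the training error minimized by AdaBoost (cf. [14]):
\sum_{i=1}^{N} \mathbf{1}\!\left[ y_i H_M(x_i) < 0 \right]
  \;\le\; J(H_M) = \sum_{i=1}^{N} \exp\!\left( -\, y_i H_M(x_i) \right),
\qquad H_M(x) = \sum_{m=1}^{M} h_m(x).
```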

39 The decision threshold can be adjusted to balance between detection rate and false alarm (the ROC curve). [sent-70, score-0.358]
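
To make the real-valued boosting loop described in sentences 32-38 concrete, here is a minimal Python sketch. The weak-learner interface `fit_weak` and the data arrays are placeholders, and the uniform weight initialization follows the generic scheme rather than the paper's exact algorithm box.

```python
import numpy as np

def real_adaboost(X, y, fit_weak, M):
    """Minimal sketch of confidence-rated (real-valued) AdaBoost.

    X : (N, d) array of examples; y : (N,) labels in {-1, +1}.
    fit_weak(X, y, w) is a placeholder that returns a real-valued weak
    classifier h (a callable) trained on the weighted sample.
    """
    N = len(y)
    w = np.full(N, 1.0 / N)          # uniform initial weights (generic choice)
    ensemble = []
    for _ in range(M):
        h = fit_weak(X, y, w)        # learn the next weak classifier
        ensemble.append(h)
        # re-weight: more emphasis on examples the new classifier gets wrong
        w *= np.exp(-y * h(X))
        w /= w.sum()                 # normalize back to a distribution

    def H(X_new):                    # decide by the sign of the sum in Eq. (1)
        return np.sign(sum(h(X_new) for h in ensemble))
    return H, ensemble
```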

40 3 FloatBoost Learning: FloatBoost backtracks after the newest weak classifier is added and deletes unfavorable weak classifiers from the ensemble (1), following the idea of Floating Search [11]. [sent-74, score-0.822]

41 Floating Search [11] was originally aimed at dealing with the non-monotonicity of straight sequential feature selection, non-monotonicity meaning that adding an additional feature may lead to a drop in performance. [sent-75, score-0.12]

42 When a new feature is added, backtracks are performed to delete those features that cause performance drops. [sent-76, score-0.166]

43 The limitations of sequential feature selection are thus amended, with the improvement gained at the cost of increased computation due to the extended search. [sent-77, score-0.08]

44 (Input) (1) Training examples {(x_i, y_i)}, i = 1, ..., N, where N = a + b; of which a examples have y_i = +1 and b examples have y_i = -1; (2) The maximum number M_max of weak classifiers; (3) The target error rate and the acceptance threshold. [sent-79, score-0.681]

45 (Initialization) (1) w_i = 1/(2a) for those examples with y_i = +1 or w_i = 1/(2b) for those examples with y_i = -1; (2) the minimum-error records are set to a max-value (for m = 1, ..., M_max), M = 0, and the ensemble is empty. [sent-81, score-0.148]

46 (Conditional Exclusion) (1) Find the least significant weak classifier in the current ensemble; (2) If removing it gives an error lower than the best achieved so far with one fewer weak classifier, then (a) remove it, update the error record, and decrement M; (b) recompute the affected weights and weak classifiers; (c) goto 3. [sent-85, score-0.05]

47 (1); (3) else (a) if the maximum number of weak classifiers is reached or the error is below the acceptance threshold, then goto 4; (b) goto 2. [sent-86, score-0.1]

48 Let H_M be the so-far-best set of M weak classifiers; let ε(H_M) be the error rate achieved by H_M (or a weighted sum of missing rate and false alarm rate, which is usually the criterion in a one-class detection problem); and let ε_m^min be the minimum error rate achieved so far with an ensemble of m weak classifiers. [sent-96, score-1.377]

49 In Step 2 (forward inclusion), given the weak classifiers already selected, the best weak classifier is added one at a time, which is the same as in AdaBoost. [sent-97, score-0.37]

50 In Step 3 (conditional exclusion), FloatBoost removes the least significant weak classifier from the current ensemble, subject to the condition that the removal leads to a lower error rate. [sent-98, score-0.481]

51 The procedure terminates when the risk on the training set is below the acceptance threshold or the maximum number of weak classifiers is reached. [sent-100, score-0.052]

52 Incorporating the conditional exclusion, FloatBoost renders both effective feature selection and classifier learning. [sent-101, score-0.133]
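
To make the forward-inclusion / conditional-exclusion interplay concrete, here is a minimal Python sketch. `train_weak`, `ensemble_error`, and the termination constants are placeholders standing in for the weighted training, cost function, and thresholds of the actual algorithm; in particular, the weight recalculation after a deletion is abstracted into `train_weak`.

```python
import numpy as np

def floatboost(X, y, train_weak, ensemble_error, M_max, target_error):
    """Sketch of FloatBoost: AdaBoost-style forward inclusion plus a
    backtracking (conditional exclusion) step after each addition.

    train_weak(X, y, ensemble) returns the next weak classifier given the
    current ensemble; ensemble_error(ensemble, X, y) returns its error rate.
    Both are placeholders, not the paper's implementation.
    """
    ensemble = []
    best_error = {0: np.inf}          # best error seen for each ensemble size
    while len(ensemble) < M_max:
        # Step 2: forward inclusion (same as AdaBoost)
        ensemble.append(train_weak(X, y, ensemble))
        M = len(ensemble)
        best_error[M] = min(best_error.get(M, np.inf),
                            ensemble_error(ensemble, X, y))

        # Step 3: conditional exclusion -- drop the least significant weak
        # classifier as long as doing so beats the best (M-1)-sized ensemble.
        while len(ensemble) > 1:
            candidates = [ensemble[:k] + ensemble[k + 1:]
                          for k in range(len(ensemble))]
            errors = [ensemble_error(c, X, y) for c in candidates]
            k_best = int(np.argmin(errors))
            if errors[k_best] < best_error.get(len(ensemble) - 1, np.inf):
                ensemble = candidates[k_best]
                best_error[len(ensemble)] = errors[k_best]
            else:
                break

        if ensemble_error(ensemble, X, y) <= target_error:
            break
    return ensemble
```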

53 It usually needs fewer weak classifiers than AdaBoost to achieve the same error rate. [sent-102, score-0.57]

54 Since deriving a weak classifier in high dimensional space is a non-trivial task, here we provide a statistical model for stagewise learning of weak classifiers based on some scalar features. [sent-105, score-0.868]

55 A scalar feature of an example is computed by a transform from the n-dimensional data space to the real line. [sent-106, score-0.074]

56 A feature can be the coefficient of, say, a wavelet transform in signal and image processing. [sent-107, score-0.101]

57 A dictionary of candidate scalar features can be created by varying the parameters of the transform. [sent-109, score-0.119]

58 In the following, we use z_m to denote the feature selected in the m-th stage, while z_k(x) is the feature computed from x using the k-th transform. [sent-110, score-0.096]

59 Therefore, we have the factorizations of Eqs. (10) and (11); on the right-hand side of the above equation, all conditional densities are fixed except the last one. [sent-114, score-0.052]
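
Eqs. (10)-(11) themselves were lost in extraction; the prose is, however, consistent with the usual chain-rule factorization of the class-conditional density of the selected features, which in assumed notation reads:

```latex
% Chain-rule factorization of the class-conditional density of the selected
% scalar features z_1, ..., z_M (assumed notation for the paper's Eqs. (10)-(11)):
p(z_1, \ldots, z_M \mid y) \;=\; \prod_{m=1}^{M} p\!\left( z_m \mid y,\, z_1, \ldots, z_{m-1} \right),
\qquad
P(y \mid z_1, \ldots, z_M) \;\propto\; p(z_1, \ldots, z_M \mid y)\, P(y).
% All factors but the last are fixed once z_1, ..., z_{M-1} have been chosen.
```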

60 Learning the best weak classifier at stage m is to choose the best feature such that the cost is minimized according to Eq.

61 The conditional probability densities for the positive class and the negative class can be estimated using histograms computed from the weighted voting of the training examples using the current weights. [sent-117, score-0.24]

62 We can derive the set of candidate weak classifiers, and the best one among all candidates for the new strong classifier is given by Eq. [sent-119, score-0.089]

63 (3) among all candidates, for which the optimal weak classifier has been derived as in (7). [sent-120, score-0.348]

64 An alternative selection scheme is simply to choose the feature so that the error rate (or some risk), computed from the two histograms, is minimized. [sent-122, score-0.198]
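
A minimal sketch of the histogram-based weak learner described in sentences 61-64, assuming a fixed number of bins and a small smoothing constant (both are choices of this sketch, not taken from the paper):

```python
import numpy as np

def histogram_llr_weak_classifier(z, y, w, n_bins=32, eps=1e-6):
    """Build a weak classifier from one scalar feature z.

    z : (N,) feature values; y : (N,) labels in {-1, +1}; w : (N,) weights.
    Returns a callable h(z_new) approximating 0.5 * log(p(z|+1) / p(z|-1)),
    with each class-conditional density estimated by a weighted histogram.
    """
    edges = np.histogram_bin_edges(z, bins=n_bins)
    p_pos, _ = np.histogram(z[y == +1], bins=edges, weights=w[y == +1])
    p_neg, _ = np.histogram(z[y == -1], bins=edges, weights=w[y == -1])
    p_pos = (p_pos + eps) / (p_pos.sum() + eps * n_bins)   # smoothed, normalized
    p_neg = (p_neg + eps) / (p_neg.sum() + eps * n_bins)
    llr = 0.5 * np.log(p_pos / p_neg)                      # one value per bin

    def h(z_new):
        idx = np.clip(np.digitize(z_new, edges) - 1, 0, n_bins - 1)
        return llr[idx]
    return h
```

Selecting, at each stage, the feature whose histogram classifier minimizes the weighted error (or the exponential loss) then implements the stagewise choice described above.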

65 5 Experimental Results: Face Detection. The face detection problem here is to classify an image of standard size (e.g. 20x20 pixels) into either face or nonface (imposter). [sent-123, score-0.548]

66 This is essentially a one-class problem in that everything not a face is a nonface. [sent-124, score-0.154]

67 Learning-based methods have been the main approach to solving the problem, e.g. [13, 16, 9, 12]. [sent-126, score-0.03]

68 There, AdaBoost is used for learning face detection; it performs two important tasks: feature selection from a large collection of features, and constructing classifiers using the selected features. [sent-128, score-0.255]

69 Data Sets: A set of 5000 face images is collected from various sources. [sent-129, score-0.154]

70 Another set of 5000 nonface examples of the same size is collected from images containing no faces. [sent-131, score-0.199]

71 The 5000 examples in each set are divided into a training set of 4000 examples and a test set of 1000 examples. [sent-132, score-0.2]

72 See Fig. 3 for a random sample of 10 face and 10 nonface examples. [sent-134, score-0.279]

73 Figure 3: Face (top) and nonface (bottom) examples. [sent-135, score-0.125]

74 Scalar Features: Three basic types of scalar features are derived from each example, as shown in Fig. 4. [sent-136, score-0.088]

75 Each candidate weak classifier is constructed as the log likelihood ratio (12) computed from the two histograms of a scalar feature for the face (y = +1) and nonface (y = -1) examples (cf. [sent-140, score-0.348]

76 Figure 4: The three types of simple Haar-wavelet-like features defined on a sub-window. [sent-142, score-0.061]

77 Each feature takes a value computed as the (weighted) sum of the pixels in the rectangles. [sent-144, score-0.048]
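
For illustration, a sketch of evaluating a rectangle ("Haar-wavelet-like") feature in constant time with an integral image; the particular two-rectangle layout here is an assumed example, not the paper's exact feature definitions.

```python
import numpy as np

def integral_image(img):
    """Cumulative sums so that any rectangle sum costs four lookups."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, height, width):
    """Sum of pixels in the rectangle [top, top+height) x [left, left+width)."""
    p = np.pad(ii, ((1, 0), (1, 0)))            # pad so indexing is uniform
    return (p[top + height, left + width] - p[top, left + width]
            - p[top + height, left] + p[top, left])

def two_rect_feature(ii, top, left, height, width):
    """Illustrative two-rectangle feature: left half minus right half."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, width - half))

# Usage on a 20x20 window (the standard size mentioned above):
window = np.random.rand(20, 20)
ii = integral_image(window)
value = two_rect_feature(ii, top=4, left=2, height=8, width=10)
```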

78 The performance is measured by false alarm error rate given the detection rate fixed at 99. [sent-146, score-0.469]

79 While a cascade of strong classifiers is needed to achieve a very low false alarm rate [19, 7], here we present the learning curves for the first strong classifier, composed of up to one thousand weak classifiers. [sent-148, score-0.663]

80 The interested reader is referred to [7] for a complete system which achieved a very low false alarm rate with a detection rate of 95%. [sent-150, score-0.382]

81 (A live demo of the multi-view face detection system, the first real-time system of its kind, is being submitted to the conference.) [sent-151, score-0.269]

82 Figure 5: Error rates of FloatBoost vs. AdaBoost for frontal face detection (x-axis: number of weak classifiers, from 100 to 1000). [sent-162, score-0.154]

83 The training and testing error curves for FloatBoost and AdaBoost are shown in Fig. 5. [sent-163, score-0.133]

84 The following conclusions can be made from these curves: (1) Given the same number of learned features or weak classifiers, FloatBoost always achieves lower training error and lower test error than AdaBoost. [sent-166, score-0.622]

85 For example, on the test set, by combining 1000 weak classifiers, the false alarm of FloatBoost is 0. [sent-167, score-0.538]

86 (2) FloatBoost needs many fewer weak classifiers than AdaBoost in order to achieve the same false alarm rate. [sent-170, score-0.538]

87 For example, the lowest test error for AdaBoost is 0. [sent-171, score-0.058]

88 481 with 800 weak classifiers, whereas FloatBoost needs only 230 weak classifiers to achieve the same performance. [sent-172, score-0.754]

89 This clearly demonstrates the strength of FloatBoost in learning to achieve a lower error rate. [sent-173, score-0.127]

90 It needs fewer weak classifiers than AdaBoost to achieve a similar error rate, or achieves a lower error rate with the same number of weak classifiers. [sent-175, score-0.699]

91 Such a performance improvement is achieved at the cost of longer training time, about 5 times longer for the experiments reported in this paper. [sent-176, score-0.076]

92 Several methods can be used to make the training more efficient with little drop in the training performance. [sent-178, score-0.128]

93 Noticing that only examples with large weight values are influential, Friedman et al. [sent-179, score-0.074]

94 [5] propose to select examples with large weights, i.e. [sent-180, score-0.074]

95 those which in the past have been wrongly classified by the learned weak classifiers, for training the weak classifier in the next round. [sent-182, score-0.748]

96 The top examples within a given fraction of the total weight mass are used. [sent-183, score-0.074]
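
A minimal sketch of this weight-trimming heuristic; the 99% mass fraction used here is an assumption for illustration, not the value used in the paper.

```python
import numpy as np

def trim_by_weight_mass(w, keep_fraction=0.99):
    """Return indices of the highest-weight examples whose weights together
    account for at least `keep_fraction` of the total weight mass."""
    order = np.argsort(w)[::-1]                  # heaviest examples first
    cumulative = np.cumsum(w[order]) / w.sum()
    n_keep = int(np.searchsorted(cumulative, keep_fraction)) + 1
    return order[:n_keep]

# Example: train the next weak classifier only on the influential examples.
w = np.random.dirichlet(np.ones(1000))           # stand-in for boosting weights
active = trim_by_weight_mass(w, keep_fraction=0.99)
```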

97 “Invited discussion on ‘Additive logistic regression: a statistical view of boosting (Friedman, Hastie and Tibshirani)’ ”. [sent-191, score-0.131]

98 “Training support vector machines: An application to face detection”. [sent-244, score-0.154]

99 “Boosting the margin: A new explanation for the effectiveness of voting methods”. [sent-278, score-0.028]

100 “Rapid object detection using a boosted cascade of simple features”. [sent-304, score-0.236]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('floatboost', 0.565), ('adaboost', 0.368), ('weak', 0.348), ('classi', 0.297), ('ers', 0.212), ('face', 0.154), ('er', 0.128), ('alarm', 0.125), ('nonface', 0.125), ('detection', 0.115), ('boosting', 0.11), ('floating', 0.105), ('beijing', 0.084), ('china', 0.084), ('examples', 0.074), ('stagewise', 0.073), ('false', 0.065), ('asia', 0.063), ('backtrack', 0.063), ('viola', 0.062), ('error', 0.058), ('boosted', 0.056), ('exclusion', 0.055), ('scalar', 0.054), ('rate', 0.053), ('fewer', 0.053), ('training', 0.052), ('microsoft', 0.051), ('goto', 0.05), ('freund', 0.049), ('feature', 0.048), ('zhang', 0.047), ('margin', 0.045), ('annals', 0.044), ('cascade', 0.044), ('ga', 0.044), ('backtracks', 0.042), ('delete', 0.042), ('feda', 0.042), ('fxs', 0.042), ('realboost', 0.042), ('rqpig', 0.042), ('unfavorable', 0.042), ('inclusion', 0.042), ('friedman', 0.041), ('fd', 0.037), ('strong', 0.037), ('fedb', 0.036), ('histograms', 0.034), ('features', 0.034), ('selection', 0.032), ('needs', 0.032), ('schapire', 0.032), ('ratio', 0.031), ('learnable', 0.031), ('candidate', 0.031), ('eg', 0.03), ('exponential', 0.029), ('effective', 0.028), ('achieves', 0.028), ('voting', 0.028), ('gradient', 0.027), ('densities', 0.027), ('vancouver', 0.027), ('april', 0.027), ('normalize', 0.027), ('october', 0.027), ('wavelet', 0.027), ('transform', 0.026), ('achieve', 0.026), ('overcomplete', 0.025), ('ultimate', 0.025), ('bartlett', 0.025), ('search', 0.025), ('conditional', 0.025), ('stage', 0.025), ('achieved', 0.024), ('vision', 0.024), ('drop', 0.024), ('dimensional', 0.024), ('curves', 0.023), ('ay', 0.023), ('december', 0.023), ('ih', 0.023), ('proceedings', 0.023), ('cult', 0.023), ('detector', 0.022), ('log', 0.022), ('lower', 0.022), ('ca', 0.022), ('best', 0.022), ('forward', 0.021), ('weaker', 0.021), ('choose', 0.021), ('dif', 0.021), ('canada', 0.021), ('hastie', 0.021), ('ph', 0.021), ('object', 0.021), ('learning', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 92 nips-2002-FloatBoost Learning for Classification

Author: Stan Z. Li, Zhenqiu Zhang, Heung-yeung Shum, Hongjiang Zhang

Abstract: AdaBoost [3] minimizes an upper error bound which is an exponential function of the margin on the training set [14]. However, the ultimate goal in applications of pattern classification is always minimum error rate. On the other hand, AdaBoost needs an effective procedure for learning weak classifiers, which by itself is difficult especially for high dimensional data. In this paper, we present a novel procedure, called FloatBoost, for learning a better boosted classifier. FloatBoost uses a backtrack mechanism after each iteration of AdaBoost to remove weak classifiers which cause higher error rates. The resulting float-boosted classifier consists of fewer weak classifiers yet achieves lower error rates than AdaBoost in both training and test. We also propose a statistical model for learning weak classifiers, based on a stagewise approximation of the posterior using an overcomplete set of scalar features. Experimental comparisons of FloatBoost and AdaBoost are provided through a difficult classification problem, face detection, where the goal is to learn from training examples a highly nonlinear classifier to differentiate between face and nonface patterns in a high dimensional space. The results clearly demonstrate the promises made by FloatBoost over AdaBoost.

2 0.28788644 46 nips-2002-Boosting Density Estimation

Author: Saharon Rosset, Eran Segal

Abstract: Several authors have suggested viewing boosting as a gradient descent search for a good fit in function space. We apply gradient-based boosting methodology to the unsupervised learning problem of density estimation. We show convergence properties of the algorithm and prove that a strength of weak learnability property applies to this problem as well. We illustrate the potential of this approach through experiments with boosting Bayesian networks to learn density models.

3 0.22066589 45 nips-2002-Boosted Dyadic Kernel Discriminants

Author: Baback Moghaddam, Gregory Shakhnarovich

Abstract: We introduce a novel learning algorithm for binary classification with hyperplane discriminants based on pairs of training points from opposite classes (dyadic hypercuts). This algorithm is further extended to nonlinear discriminants using kernel functions satisfying Mercer’s conditions. An ensemble of simple dyadic hypercuts is learned incrementally by means of a confidence-rated version of AdaBoost, which provides a sound strategy for searching through the finite set of hypercut hypotheses. In experiments with real-world datasets from the UCI repository, the generalization performance of the hypercut classifiers was found to be comparable to that of SVMs and k-NN classifiers. Furthermore, the computational cost of classification (at run time) was found to be similar to, or better than, that of SVM. Similarly to SVMs, boosted dyadic kernel discriminants tend to maximize the margin (via AdaBoost). In contrast to SVMs, however, we offer an on-line and incremental learning machine for building kernel discriminants whose complexity (number of kernel evaluations) can be directly controlled (traded off for accuracy). 1

4 0.17530936 88 nips-2002-Feature Selection and Classification on Matrix Data: From Large Margins to Small Covering Numbers

Author: Sepp Hochreiter, Klaus Obermayer

Abstract: We investigate the problem of learning a classification task for datasets which are described by matrices. Rows and columns of these matrices correspond to objects, where row and column objects may belong to different sets, and the entries in the matrix express the relationships between them. We interpret the matrix elements as being produced by an unknown kernel which operates on object pairs and we show that - under mild assumptions - these kernels correspond to dot products in some (unknown) feature space. Minimizing a bound for the generalization error of a linear classifier which has been obtained using covering numbers we derive an objective function for model selection according to the principle of structural risk minimization. The new objective function has the advantage that it allows the analysis of matrices which are not positive definite, and not even symmetric or square. We then consider the case that row objects are interpreted as features. We suggest an additional constraint, which imposes sparseness on the row objects and show, that the method can then be used for feature selection. Finally, we apply this method to data obtained from DNA microarrays, where “column” objects correspond to samples, “row” objects correspond to genes and matrix elements correspond to expression levels. Benchmarks are conducted using standard one-gene classification and support vector machines and K-nearest neighbors after standard feature selection. Our new method extracts a sparse set of genes and provides superior classification results. 1

5 0.16336094 59 nips-2002-Constraint Classification for Multiclass Classification and Ranking

Author: Sariel Har-Peled, Dan Roth, Dav Zimak

Abstract: The constraint classification framework captures many flavors of multiclass classification including winner-take-all multiclass classification, multilabel classification and ranking. We present a meta-algorithm for learning in this framework that learns via a single linear classifier in high dimension. We discuss distribution independent as well as margin-based generalization bounds and present empirical and theoretical evidence showing that constraint classification benefits over existing methods of multiclass classification.

6 0.16313949 181 nips-2002-Self Supervised Boosting

7 0.13782567 24 nips-2002-Adaptive Scaling for Feature Selection in SVMs

8 0.1199066 132 nips-2002-Learning to Detect Natural Image Boundaries Using Brightness and Texture

9 0.11080026 19 nips-2002-Adapting Codes and Embeddings for Polychotomies

10 0.1091973 68 nips-2002-Discriminative Densities from Maximum Contrast Estimation

11 0.1069838 55 nips-2002-Combining Features for BCI

12 0.10668721 72 nips-2002-Dyadic Classification Trees via Structural Risk Minimization

13 0.1062542 173 nips-2002-Recovering Intrinsic Images from a Single Image

14 0.10546837 196 nips-2002-The RA Scanner: Prediction of Rheumatoid Joint Inflammation Based on Laser Imaging

15 0.1019488 140 nips-2002-Margin Analysis of the LVQ Algorithm

16 0.098169781 120 nips-2002-Kernel Design Using Boosting

17 0.092713863 86 nips-2002-Fast Sparse Gaussian Process Methods: The Informative Vector Machine

18 0.091211818 108 nips-2002-Improving Transfer Rates in Brain Computer Interfacing: A Case Study

19 0.089847088 69 nips-2002-Discriminative Learning for Label Sequences via Boosting

20 0.087420307 149 nips-2002-Multiclass Learning by Probabilistic Embeddings


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.22), (1, -0.136), (2, 0.1), (3, 0.006), (4, 0.349), (5, -0.027), (6, -0.024), (7, -0.083), (8, 0.006), (9, -0.09), (10, -0.138), (11, 0.082), (12, -0.012), (13, 0.164), (14, -0.023), (15, 0.12), (16, -0.138), (17, -0.053), (18, 0.005), (19, -0.008), (20, -0.037), (21, 0.105), (22, -0.013), (23, 0.01), (24, -0.076), (25, 0.15), (26, -0.002), (27, -0.043), (28, 0.001), (29, -0.035), (30, 0.014), (31, 0.016), (32, -0.056), (33, 0.068), (34, 0.1), (35, -0.049), (36, 0.013), (37, 0.005), (38, 0.004), (39, -0.045), (40, 0.011), (41, -0.022), (42, 0.006), (43, -0.02), (44, -0.026), (45, 0.013), (46, -0.02), (47, 0.017), (48, 0.057), (49, -0.034)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95511115 92 nips-2002-FloatBoost Learning for Classification

Author: Stan Z. Li, Zhenqiu Zhang, Heung-yeung Shum, Hongjiang Zhang

Abstract: AdaBoost [3] minimizes an upper error bound which is an exponential function of the margin on the training set [14]. However, the ultimate goal in applications of pattern classification is always minimum error rate. On the other hand, AdaBoost needs an effective procedure for learning weak classifiers, which by itself is difficult especially for high dimensional data. In this paper, we present a novel procedure, called FloatBoost, for learning a better boosted classifier. FloatBoost uses a backtrack mechanism after each iteration of AdaBoost to remove weak classifiers which cause higher error rates. The resulting float-boosted classifier consists of fewer weak classifiers yet achieves lower error rates than AdaBoost in both training and test. We also propose a statistical model for learning weak classifiers, based on a stagewise approximation of the posterior using an overcomplete set of scalar features. Experimental comparisons of FloatBoost and AdaBoost are provided through a difficult classification problem, face detection, where the goal is to learn from training examples a highly nonlinear classifier to differentiate between face and nonface patterns in a high dimensional space. The results clearly demonstrate the promises made by FloatBoost over AdaBoost.

2 0.74000061 46 nips-2002-Boosting Density Estimation

Author: Saharon Rosset, Eran Segal

Abstract: Several authors have suggested viewing boosting as a gradient descent search for a good fit in function space. We apply gradient-based boosting methodology to the unsupervised learning problem of density estimation. We show convergence properties of the algorithm and prove that a strength of weak learnability property applies to this problem as well. We illustrate the potential of this approach through experiments with boosting Bayesian networks to learn density models.

3 0.7174083 45 nips-2002-Boosted Dyadic Kernel Discriminants

Author: Baback Moghaddam, Gregory Shakhnarovich

Abstract: We introduce a novel learning algorithm for binary classification with hyperplane discriminants based on pairs of training points from opposite classes (dyadic hypercuts). This algorithm is further extended to nonlinear discriminants using kernel functions satisfying Mercer’s conditions. An ensemble of simple dyadic hypercuts is learned incrementally by means of a confidence-rated version of AdaBoost, which provides a sound strategy for searching through the finite set of hypercut hypotheses. In experiments with real-world datasets from the UCI repository, the generalization performance of the hypercut classifiers was found to be comparable to that of SVMs and k-NN classifiers. Furthermore, the computational cost of classification (at run time) was found to be similar to, or better than, that of SVM. Similarly to SVMs, boosted dyadic kernel discriminants tend to maximize the margin (via AdaBoost). In contrast to SVMs, however, we offer an on-line and incremental learning machine for building kernel discriminants whose complexity (number of kernel evaluations) can be directly controlled (traded off for accuracy). 1

4 0.67041159 181 nips-2002-Self Supervised Boosting

Author: Max Welling, Richard S. Zemel, Geoffrey E. Hinton

Abstract: Boosting algorithms and successful applications thereof abound for classification and regression learning problems, but not for unsupervised learning. We propose a sequential approach to adding features to a random field model by training them to improve classification performance between the data and an equal-sized sample of “negative examples” generated from the model’s current estimate of the data density. Training in each boosting round proceeds in three stages: first we sample negative examples from the model’s current Boltzmann distribution. Next, a feature is trained to improve classification performance between data and negative examples. Finally, a coefficient is learned which determines the importance of this feature relative to ones already in the pool. Negative examples only need to be generated once to learn each new feature. The validity of the approach is demonstrated on binary digits and continuous synthetic data.

5 0.63257796 59 nips-2002-Constraint Classification for Multiclass Classification and Ranking

Author: Sariel Har-Peled, Dan Roth, Dav Zimak

Abstract: The constraint classification framework captures many flavors of multiclass classification including winner-take-all multiclass classification, multilabel classification and ranking. We present a meta-algorithm for learning in this framework that learns via a single linear classifier in high dimension. We discuss distribution independent as well as margin-based generalization bounds and present empirical and theoretical evidence showing that constraint classification benefits over existing methods of multiclass classification.

6 0.62142622 55 nips-2002-Combining Features for BCI

7 0.61611432 196 nips-2002-The RA Scanner: Prediction of Rheumatoid Joint Inflammation Based on Laser Imaging

8 0.59389114 88 nips-2002-Feature Selection and Classification on Matrix Data: From Large Margins to Small Covering Numbers

9 0.57882863 72 nips-2002-Dyadic Classification Trees via Structural Risk Minimization

10 0.56961107 108 nips-2002-Improving Transfer Rates in Brain Computer Interfacing: A Case Study

11 0.54348713 132 nips-2002-Learning to Detect Natural Image Boundaries Using Brightness and Texture

12 0.50811249 68 nips-2002-Discriminative Densities from Maximum Contrast Estimation

13 0.47183925 140 nips-2002-Margin Analysis of the LVQ Algorithm

14 0.46740094 24 nips-2002-Adaptive Scaling for Feature Selection in SVMs

15 0.45712435 58 nips-2002-Conditional Models on the Ranking Poset

16 0.4537203 109 nips-2002-Improving a Page Classifier with Anchor Extraction and Link Analysis

17 0.44847989 62 nips-2002-Coulomb Classifiers: Generalizing Support Vector Machines via an Analogy to Electrostatic Systems

18 0.4333078 69 nips-2002-Discriminative Learning for Label Sequences via Boosting

19 0.4217518 173 nips-2002-Recovering Intrinsic Images from a Single Image

20 0.41892523 89 nips-2002-Feature Selection by Maximum Marginal Diversity


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(23, 0.014), (42, 0.071), (54, 0.081), (55, 0.031), (67, 0.012), (68, 0.011), (74, 0.071), (92, 0.032), (98, 0.569)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.98874176 129 nips-2002-Learning in Spiking Neural Assemblies

Author: David Barber

Abstract: We consider a statistical framework for learning in a class of networks of spiking neurons. Our aim is to show how optimal local learning rules can be readily derived once the neural dynamics and desired functionality of the neural assembly have been specified, in contrast to other models which assume (sub-optimal) learning rules. Within this framework we derive local rules for learning temporal sequences in a model of spiking neurons and demonstrate its superior performance to correlation (Hebbian) based approaches. We further show how to include mechanisms such as synaptic depression and outline how the framework is readily extensible to learning in networks of highly complex spiking neurons. A stochastic quantal vesicle release mechanism is considered and implications on the complexity of learning discussed. 1

2 0.9861694 103 nips-2002-How Linear are Auditory Cortical Responses?

Author: Maneesh Sahani, Jennifer F. Linden

Abstract: By comparison to some other sensory cortices, the functional properties of cells in the primary auditory cortex are not yet well understood. Recent attempts to obtain a generalized description of auditory cortical responses have often relied upon characterization of the spectrotemporal receptive field (STRF), which amounts to a model of the stimulusresponse function (SRF) that is linear in the spectrogram of the stimulus. How well can such a model account for neural responses at the very first stages of auditory cortical processing? To answer this question, we develop a novel methodology for evaluating the fraction of stimulus-related response power in a population that can be captured by a given type of SRF model. We use this technique to show that, in the thalamo-recipient layers of primary auditory cortex, STRF models account for no more than 40% of the stimulus-related power in neural responses.

same-paper 3 0.97620589 92 nips-2002-FloatBoost Learning for Classification

Author: Stan Z. Li, Zhenqiu Zhang, Heung-yeung Shum, Hongjiang Zhang

Abstract: AdaBoost [3] minimizes an upper error bound which is an exponential function of the margin on the training set [14]. However, the ultimate goal in applications of pattern classification is always minimum error rate. On the other hand, AdaBoost needs an effective procedure for learning weak classifiers, which by itself is difficult especially for high dimensional data. In this paper, we present a novel procedure, called FloatBoost, for learning a better boosted classifier. FloatBoost uses a backtrack mechanism after each iteration of AdaBoost to remove weak classifiers which cause higher error rates. The resulting float-boosted classifier consists of fewer weak classifiers yet achieves lower error rates than AdaBoost in both training and test. We also propose a statistical model for learning weak classifiers, based on a stagewise approximation of the posterior using an overcomplete set of scalar features. Experimental comparisons of FloatBoost and AdaBoost are provided through a difficult classification problem, face detection, where the goal is to learn from training examples a highly nonlinear classifier to differentiate between face and nonface patterns in a high dimensional space. The results clearly demonstrate the promises made by FloatBoost over AdaBoost.

4 0.97579378 86 nips-2002-Fast Sparse Gaussian Process Methods: The Informative Vector Machine

Author: Ralf Herbrich, Neil D. Lawrence, Matthias Seeger

Abstract: We present a framework for sparse Gaussian process (GP) methods which uses forward selection with criteria based on informationtheoretic principles, previously suggested for active learning. Our goal is not only to learn d–sparse predictors (which can be evaluated in O(d) rather than O(n), d n, n the number of training points), but also to perform training under strong restrictions on time and memory requirements. The scaling of our method is at most O(n · d2 ), and in large real-world classification experiments we show that it can match prediction performance of the popular support vector machine (SVM), yet can be significantly faster in training. In contrast to the SVM, our approximation produces estimates of predictive probabilities (‘error bars’), allows for Bayesian model selection and is less complex in implementation. 1

5 0.96654987 56 nips-2002-Concentration Inequalities for the Missing Mass and for Histogram Rule Error

Author: Luis E. Ortiz, David A. McAllester

Abstract: This paper gives distribution-free concentration inequalities for the missing mass and the error rate of histogram rules. Negative association methods can be used to reduce these concentration problems to concentration questions about independent sums. Although the sums are independent, they are highly heterogeneous. Such highly heterogeneous independent sums cannot be analyzed using standard concentration inequalities such as Hoeffding’s inequality, the Angluin-Valiant bound, Bernstein’s inequality, Bennett’s inequality, or McDiarmid’s theorem.

6 0.92440933 59 nips-2002-Constraint Classification for Multiclass Classification and Ranking

7 0.87310386 79 nips-2002-Evidence Optimization Techniques for Estimating Stimulus-Response Functions

8 0.84131259 50 nips-2002-Circuit Model of Short-Term Synaptic Dynamics

9 0.82995921 184 nips-2002-Spectro-Temporal Receptive Fields of Subthreshold Responses in Auditory Cortex

10 0.82564211 43 nips-2002-Binary Coding in Auditory Cortex

11 0.82537985 110 nips-2002-Incremental Gaussian Processes

12 0.81277442 12 nips-2002-A Neural Edge-Detection Model for Enhanced Auditory Sensitivity in Modulated Noise

13 0.80425799 102 nips-2002-Hidden Markov Model of Cortical Synaptic Plasticity: Derivation of the Learning Rule

14 0.78789574 41 nips-2002-Bayesian Monte Carlo

15 0.78390211 116 nips-2002-Interpreting Neural Response Variability as Monte Carlo Sampling of the Posterior

16 0.78033644 180 nips-2002-Selectivity and Metaplasticity in a Unified Calcium-Dependent Model

17 0.77387083 199 nips-2002-Timing and Partial Observability in the Dopamine System

18 0.7667287 81 nips-2002-Expected and Unexpected Uncertainty: ACh and NE in the Neocortex

19 0.75770217 7 nips-2002-A Hierarchical Bayesian Markovian Model for Motifs in Biopolymer Sequences

20 0.74610412 148 nips-2002-Morton-Style Factorial Coding of Color in Primary Visual Cortex