nips nips2007 nips2007-166 knowledge-graph by maker-knowledge-mining

166 nips-2007-Regularized Boost for Semi-Supervised Learning


Source: pdf

Author: Ke Chen, Shihai Wang

Abstract: Semi-supervised inductive learning concerns how to learn a decision rule from a data set containing both labeled and unlabeled data. Several boosting algorithms have been extended to semi-supervised learning with various strategies. To our knowledge, however, none of them takes local smoothness constraints among data into account during ensemble learning. In this paper, we introduce a local smoothness regularizer to semi-supervised boosting algorithms based on the universal optimization framework of margin cost functionals. Our regularizer is applicable to existing semi-supervised boosting algorithms to improve their generalization and speed up their training. Comparative results on synthetic, benchmark and real world tasks demonstrate the effectiveness of our local smoothness regularizer. We discuss relevant issues and relate our regularizer to previous work. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: Semi-supervised inductive learning concerns how to learn a decision rule from a data set containing both labeled and unlabeled data. [sent-4, score-0.418]

2 Several boosting algorithms have been extended to semi-supervised learning with various strategies. [sent-5, score-0.394]

3 To our knowledge, however, none of them takes local smoothness constraints among data into account during ensemble learning. [sent-6, score-0.304]

4 In this paper, we introduce a local smoothness regularizer to semi-supervised boosting algorithms based on the universal optimization framework of margin cost functionals. [sent-7, score-1.141]

5 Our regularizer is applicable to existing semi-supervised boosting algorithms to improve their generalization and speed up their training. [sent-8, score-0.862]

6 Comparative results on synthetic, benchmark and real world tasks demonstrate the effectiveness of our local smoothness regularizer. [sent-9, score-0.304]

7 We discuss relevant issues and relate our regularizer to previous work. [sent-10, score-0.363]

8 In semi-supervised learning, the ultimate goal is to find a classification function that not only minimizes classification errors on the labeled training data but is also compatible with the input distribution, as assessed by its values on unlabeled data. [sent-15, score-0.406]

9 To work towards this goal, unlabeled data can be exploited to discover how the data are distributed in the input space, and the information acquired from the unlabeled data is then used to find a good classifier. [sent-16, score-0.515]

10 As a generic framework, regularization has been used in semi-supervised learning to exploit unlabeled data by working on well known semi-supervised learning assumptions, i. [sent-17, score-0.316]

11 , the smoothness, the cluster, and the manifold assumptions [1], which leads to a number of regularizers applicable to various semi-supervised learning paradigms, e. [sent-19, score-0.11]

12 As a generic ensemble learning framework [9], boosting works by sequentially constructing a linear combination of base learners that concentrate on difficult examples, which has led to great success in supervised learning. [sent-22, score-0.896]

13 Recently boosting has been extended to semi-supervised learning with different strategies. [sent-23, score-0.394]

14 Within the universal optimization framework of margin cost functionals [9], semi-supervised MarginBoost [10] and ASSEMBLE [11] were proposed by introducing “pseudo-classes” for unlabeled data to characterize difficult unlabeled examples. [sent-24, score-0.722]

15 In essence, such extensions work in a self-training way: the unlabeled data are assigned pseudo-class labels based on the ensemble learner constructed so far, and in turn the obtained pseudo-class labels are used to find a new proper base learner to be added to the ensemble. [sent-25, score-0.796]
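
The sketch below illustrates this self-training loop in its simplest form. It is not the authors' exact ASSEMBLE/MarginBoost procedure: the decision-stump base learner, the exponential margin cost, and all function names are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def self_training_boost(X_l, y_l, X_u, n_rounds=10):
    """Pseudo-class semi-supervised boosting in its simplest form (labels in {-1, +1})."""
    X = np.vstack([X_l, X_u])
    n_l = len(X_l)
    F = np.zeros(len(X))                      # current ensemble output F(x) on all data
    ensemble = []
    for _ in range(n_rounds):
        y_pseudo = np.sign(F[n_l:])           # pseudo-classes y = sign[F(x)] for unlabeled data
        y_pseudo[y_pseudo == 0] = 1.0         # break ties arbitrarily
        y_all = np.concatenate([y_l, y_pseudo])
        w = np.exp(-y_all * F)                # margin-cost-style example weights
        w /= w.sum()
        h = DecisionTreeClassifier(max_depth=1).fit(X, y_all, sample_weight=w)
        pred = h.predict(X)
        eps = w[pred != y_all].sum()
        if eps >= 0.5:                        # base learner no better than chance: stop
            break
        alpha = 0.5 * np.log((1.0 - eps) / max(eps, 1e-12))
        F += alpha * pred
        ensemble.append((alpha, h))
    return ensemble
```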

16 More recently, the Agreement Boost algorithm [13] has been developed with a theoretical justification of the benefits of using multiple boosting learners within the co-training framework. [sent-29, score-0.578]

17 To our knowledge, however, none of the aforementioned semi-supervised boosting algorithms has taken the local smoothness constraints into account. [sent-30, score-0.582]

18 In this paper, we exploit the local smoothness constraints among data by introducing a regularizer to semi-supervised boosting. [sent-31, score-0.555]

19 Based on the universal optimization framework of margin cost functional for boosting [9], our regularizer is applicable to existing semi-supervised boosting algorithms [10]-[13]. [sent-32, score-1.496]

20 Experimental results on synthetic, benchmark and real world classification tasks demonstrate the effectiveness of our regularizer in semi-supervised boosting learning. [sent-33, score-0.898]

21 Sect. 2 briefly reviews semi-supervised boosting learning and presents our regularizer. [sent-35, score-0.394]

22 Sect. 3 reports experimental results and the behaviors of regularized semi-supervised boosting algorithms. [sent-37, score-0.603]

23 2 Semi-supervised boosting learning and regularization. In this section, we first briefly review the basic idea behind existing semi-supervised boosting algorithms within the universal optimization framework of margin cost functionals [9], to make the paper self-contained. [sent-40, score-1.157]

24 Since no label information is available for unlabeled data, the critical idea underlying semi-supervised boosting is to introduce a pseudo-class [11] or pseudo-margin [10] concept for unlabeled data within the universal optimization framework [9]. [sent-44, score-1.064]

25 The pseudo-class of an unlabeled example, x, is typically defined as $y = \mathrm{sign}[F(x)]$ [11], and its corresponding pseudo margin is $yF(x) = |F(x)|$ [10],[11]. [sent-51, score-0.291]
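
In code, these two definitions amount to a sign and an absolute value; the function names below are ours, not the paper's.

```python
def pseudo_class(F_x):
    """Pseudo-class y = sign[F(x)] for an unlabeled example (ties broken towards +1)."""
    return 1.0 if F_x >= 0 else -1.0

def pseudo_margin(F_x):
    """Pseudo margin yF(x) = |F(x)|, which is always non-negative."""
    return abs(F_x)
```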

26 In the universal optimization framework [9], constructing an ensemble learner requires choosing a base learner, f(x), that maximizes the inner product $-\langle \nabla C(F), f \rangle$. [sent-53, score-0.517]

27 For unlabeled data, a subgradient of C(F) in (1) has been introduced to tackle the non-differentiability problem [11], and unlabeled data with pseudo-class labels can then be treated in the same way as labeled data in the optimization problem. [sent-54, score-0.638]

28 As a result, finding a proper f(x) amounts to maximizing $-\langle \nabla C(F), f \rangle = \sum_{i: f(x_i)=y_i} \alpha_i C'[y_i F(x_i)] - \sum_{i: f(x_i)\neq y_i} \alpha_i C'[y_i F(x_i)]$ (2), where $y_i$ is the true class label if $x_i$ is a labeled example or a pseudo-class label otherwise. [sent-55, score-0.536]

29 From (3), a proper base learner, f(x), can be found by minimizing the weighted errors $\sum_{i: f(x_i)\neq y_i} D(i)$, where $D(i)$ is the weight $\alpha_i C'[y_i F(x_i)]$ normalized by $\sum_{i \in S} \alpha_i C'[y_i F(x_i)]$. [sent-57, score-0.247]
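
A minimal sketch of this reweighting, assuming the exponential cost C(z) = exp(-z) so that each example's weight is proportional to alpha_i * exp(-y_i F(x_i)); the specific cost and the function names are our assumptions, not the paper's prescriptions.

```python
import numpy as np

def example_weights(F, y, alpha):
    """Weights D(i) proportional to alpha_i * |C'(y_i F(x_i))|, here with C(z) = exp(-z)."""
    w = alpha * np.exp(-y * F)     # |C'(z)| = exp(-z) for the exponential cost
    return w / w.sum()

def weighted_error(D, f_pred, y):
    """Weighted misclassification error: sum of D(i) over examples with f(x_i) != y_i."""
    return D[f_pred != y].sum()
```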

30 Thus, any boosting algorithm designed for supervised learning [9] is now applicable to semi-supervised learning with the aforementioned treatment. [sent-58, score-0.497]

31 For co-training based semi-supervised boosting algorithms [12],[13], the above semi-supervised boosting procedure is applied to each view of data to build up a component ensemble learner. [sent-59, score-0.929]

32 Instead of self-training, the pseudo-class label of an unlabeled example for a specific view is determined by ensemble learners trained on other views of this example. [sent-60, score-0.567]

33 For example, the Agreement Boost [13] defines the co-training cost functional as $C(F^1, \cdots, F^J) = \sum_{j=1}^{J} \sum_{x_i \in L} C[y_i F^j(x_i)] + \eta \sum_{x_i \in U} C[-V(x_i)]$ (4). [sent-61, score-0.226]

34 Here J views of the data are used to train J ensemble learners, $F^1, \cdots, F^J$, respectively. [sent-62, score-0.266]
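
A sketch of evaluating the co-training cost (4) as reconstructed above, again assuming an exponential cost; the array shapes and names are our own conventions.

```python
import numpy as np

def agreement_cost(F_L, y_L, F_U, eta, cost=lambda z: np.exp(-z)):
    """Co-training cost (4): sum_j sum_{x in L} C[y F^j(x)] + eta * sum_{x in U} C[-V(x)].

    F_L, F_U: arrays of shape (J, n_L) and (J, n_U) holding the view-wise outputs F^j(x).
    """
    labeled_term = cost(y_L[None, :] * F_L).sum()
    V = (F_U ** 2).mean(axis=0) - F_U.mean(axis=0) ** 2   # disagreement V(x) of the J learners
    return labeled_term + eta * cost(-V).sum()
```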

35 The disagreement of the J ensemble learners for an unlabeled example, $x_i \in U$, is $V(x_i) = \frac{1}{J}\sum_{j=1}^{J}[F^j(x_i)]^2 - \left(\frac{1}{J}\sum_{j=1}^{J}F^j(x_i)\right)^2$, and the weight $\eta \in \mathbb{R}^+$. [sent-63, score-0.656]

36 In light of view j, the pseudo-class label of an unlabeled example, $x_i$, is determined by $y_i = \mathrm{sign}\!\left[\frac{1}{J}\sum_{j'=1}^{J}F^{j'}(x_i) - F^{j}(x_i)\right]$. [sent-64, score-0.309]
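
In code, this pseudo-labeling rule for view j might look as follows; the tie-breaking convention and naming are assumptions for illustration.

```python
import numpy as np

def pseudo_label_for_view(F_views_x, j):
    """Pseudo-class label of an example x for view j: the sign of the average output
    over all views minus the view-j output, with ties broken towards +1."""
    diff = np.mean(F_views_x) - F_views_x[j]
    return 1.0 if diff >= 0 else -1.0
```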

37 Thus, the minimization of (3) with such pseudo-class labels leads to a proper base learner $f^j(x)$ to be added to $F^j(x)$. [sent-65, score-0.405]

38 4, associated with each training example, and the local smoothness around an example, $x_i$, is measured by $\tilde{R}(i) = \sum_{j: x_j \in S,\, j \neq i} W_{ij}\, C(-I_{ij})$ (6). [sent-68, score-0.313]

39 Here, $I_{ij}$ is a class label compatibility function for two different examples $x_i, x_j \in S$, defined as $I_{ij} = |y_i - y_j|$, where $y_i$ and $y_j$ are the true labels of $x_i$ and $x_j$ for labeled data or their pseudo-class labels otherwise. [sent-69, score-0.66]
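
A sketch of the smoothness penalty in (6): pairwise label incompatibilities weighted by an affinity matrix W. The exponential cost and the vectorized form are our illustrative choices.

```python
import numpy as np

def smoothness_penalty(W, labels, cost=lambda z: np.exp(-z)):
    """R(i) = sum_{j != i} W_ij * C(-I_ij), with I_ij = |y_i - y_j|.

    W: (n, n) affinity matrix; labels: true or pseudo-class labels in {-1, +1}.
    """
    labels = np.asarray(labels, dtype=float)
    I = np.abs(labels[:, None] - labels[None, :])   # 0 for same class, 2 for different classes
    P = W * cost(-I)                                # pairwise incompatibility costs W_ij * C(-I_ij)
    np.fill_diagonal(P, 0.0)                        # exclude the j = i term
    return P.sum(axis=1)                            # R(i) for every example x_i in S
```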

40 To find a proper base learner, f(x), we now need to maximize T(F, f) in (5) so as to minimize not only the misclassification errors as before (see Sect. 2.1)... [sent-72, score-0.247]

41 ...but also the local class label incompatibility cost for smoothness. [sent-74, score-0.231]

42 In order to use the objective function in (5) for boosting learning, we need to have the new empirical data distribution and the termination condition. [sent-75, score-0.457]

43 In (9), the first term refers to misclassification errors, while the second term corresponds to the class label incompatibility of a data point with its nearby data points even though this data point itself fits well. [sent-83, score-0.249]

44 In contrast to (3), finding a proper base learner, f (x), now needs to minimize not only the misclassification errors but also the local class label incompatibility in our Regularized Boost. [sent-84, score-0.421]

45 where $\epsilon$ combines the weighted misclassification errors, $\sum_{i: f(x_i)\neq y_i} D(i)$, with a local-incompatibility term over the correctly classified points, normalized by $\sum_{k: x_k \in S}\left(\alpha_k C'[y_k F(x_k)] - \beta_k R(k)\right)$. Once an optimal base learner, $f_{t+1}(x)$, is found at step t+1, we need to choose a proper weight, $w_{t+1}$, to form a new ensemble, $F_{t+1}(x) = F_t(x) + w_{t+1} f_{t+1}(x)$. [sent-86, score-0.216]

46 In our Regularized Boost, we choose $w_{t+1} = \frac{1}{2}\log\frac{1-\epsilon}{\epsilon}$ by simply treating the pseudo-class labels of unlabeled data in the same way as the true labels of labeled data, as suggested in [11]. [sent-87, score-0.44]
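
A minimal sketch of this weight choice and ensemble update, with a numerical guard added by us; names are illustrative.

```python
import numpy as np

def ensemble_update(F, f_pred, eps):
    """Choose w_{t+1} = 0.5 * log((1 - eps) / eps) and form F_{t+1} = F_t + w_{t+1} * f_{t+1}."""
    eps = min(max(eps, 1e-12), 1.0 - 1e-12)   # guard against degenerate error values
    w = 0.5 * np.log((1.0 - eps) / eps)
    return F + w * f_pred, w
```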

47 3 Experiments. In this section, we report experimental results on synthetic, benchmark and real data sets. [sent-88, score-0.112]

48 Although our regularizer is applicable to existing semi-supervised boosting [10]-[13], we mainly apply it to the ASSEMBLE [11], a winning algorithm from the NIPS 2001 Unlabeled Data Competition, on a variety of classification tasks. [sent-89, score-0.837]

49 In addition, our regularizer is also used to train component ensemble learners of the Agreement Boost [13] for binary classification benchmark tasks since the algorithm [13] in its original form can cope with binary classification only. [sent-90, score-0.767]

50 For synthetic and benchmark data sets, we always randomly select 20% of the examples as testing data, except when a benchmark data set has a pre-defined training/test split. [sent-92, score-0.289]

51 , labeled data (L) and unlabeled data (U), and the ratio between labeled and unlabeled data is 1:4 in our experiments. [sent-95, score-0.729]
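
One way to realize this split in code; the class-balancing and per-benchmark handling mentioned in the paper are omitted here, so this is only a sketch under those simplifications.

```python
import numpy as np

def split_data(X, y, test_frac=0.2, labeled_parts=1, unlabeled_parts=4, seed=0):
    """Hold out test_frac of the data for testing; split the rest into labeled:unlabeled = 1:4."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(test_frac * len(X)))
    test_idx, rest = idx[:n_test], idx[n_test:]
    n_lab = len(rest) * labeled_parts // (labeled_parts + unlabeled_parts)
    lab_idx, unlab_idx = rest[:n_lab], rest[n_lab:]
    return (X[lab_idx], y[lab_idx]), X[unlab_idx], (X[test_idx], y[test_idx])
```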

52 To test the effectiveness of our Regularized Boost across different base learners, we perform all experiments with a K-nearest-neighbors (KNN) classifier, a local classifier, and a multi-layer perceptron (MLP), a global classifier, where 3NN and a single-hidden-layer MLP are used in our experiments. [sent-97, score-0.258]
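
For reference, comparable base learners could be instantiated as follows; the paper likely used its own implementations, and the hidden-layer size and training settings here are illustrative guesses.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# 3NN as the local base learner and a single-hidden-layer MLP as the global one.
knn_base = KNeighborsClassifier(n_neighbors=3)
mlp_base = MLPClassifier(hidden_layer_sizes=(20,), max_iter=1000, random_state=0)
```

Note that KNeighborsClassifier's fit method takes no sample_weight argument, so a reweighting-based booster would typically resample the training set according to the weights D(i) instead of passing them directly.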

53 For comparison, we report results of a semi-supervised boosting algorithm (i. [sent-98, score-0.394]

54 , ASSEMBLE [11] or Agreement Boost [13]) and its regularized version (i. [sent-100, score-0.168]

55 In addition, we also provide results of a variant of AdaBoost [14] trained on the labeled data only, for reference. [sent-103, score-0.136]

56 The above experimental method conforms to that used in semi-supervised boosting methods [10]-[13] as well as other empirical studies of semi-supervised learning methods, e. [sent-104, score-0.394]

57 We wish to test our regularizer on this intuitive multi-class classification task with a high optimal Bayes error. [sent-109, score-0.363]

58 Our further observation via visualization against the ground truth indicates that the use of our regularizer leads to smoother decision boundaries than the original ASSEMBLE, which yields better generalization performance. [sent-119, score-0.434]

59 3.2 Benchmark data sets. To assess the performance of our regularizer for semi-supervised boosting algorithms, we perform a series of experiments on benchmark data sets from the UCI machine learning repository [16] without any data transformation. [sent-121, score-0.927]

60 In our experiments, we use the same initialization conditions for all boosting algorithms. [sent-122, score-0.394]

61 Our empirical work suggests that a maximum of 100 boosting steps is sufficient to achieve reasonable performance on those benchmark tasks. [sent-123, score-0.477]

62 Hence, we stop all boosting algorithms at this maximum number of boosting steps for a sensible comparison. [sent-124, score-0.788]

63 Table 1 caption: error rates (%) of AdaBoost, ASSEMBLE and Regularized Boost (RegBoost) with different base learners on five UCI classification data sets. [sent-127, score-0.366]

64 Table 1 tabulates the results of different boosting learning algorithms. [sent-188, score-0.431]

65 In contrast, the ASSEMBLE [11] on 3NN equipped with our regularizer yields an error rate of 2.7% on average. [sent-199, score-0.412]

66 This is despite the fact that our Regularized Boost algorithm uses only 765 labeled examples. [sent-200, score-0.107]

67 We further apply our regularizer to the Agreement Boost [13]. [sent-243, score-0.363]

68 As required by the Agreement Boost [13], the KNN and the MLP classifiers are used as base learners to construct two component ensemble learners, without and with our regularizer, corresponding to the original and the regularized versions of the Agreement Boost. [sent-248, score-1.164]

69 Table 2 tabulates results produced by different boosting algorithms. [sent-249, score-0.431]

70 Figure 2, panels (a)–(c): average generalization performance plotted against the number of base learners (10 to 100). [sent-263, score-0.337]

71 Figure 2: Behaviors of semi-supervised boosting algorithms: the original version vs. the regularized version. [sent-264, score-1.068]

72 We investigate behaviors of regularized semi-supervised boosting algorithms on two largest data sets, OPTDIGITS and VR-vs-VP. [sent-269, score-0.632]

73 Figure 2 shows the average generalization performance achieved by stopping a boosting algorithm at different boosting steps. [sent-270, score-0.843]

74 From Figure 2, the use of our regularizer in the ASSEMBLE, regardless of the base learner adopted, and in the Agreement Boost always yields faster training. [sent-271, score-0.7]

75 As illustrated in Figures 2(a) and 2(b), the regularized version of the ASSEMBLE with KNN and MLP takes only 22 and 46 boosting steps on average to reach the performance of the original ASSEMBLE after 100 boosting steps, respectively. [sent-272, score-0.956]

76 Similarly, Figure 2(c) shows that the regularized Agreement Boost takes only 12 steps on average to achieve the performance of its original version after 100 boosting steps. [sent-273, score-0.562]

77 3.3 Facial expression recognition. Facial expression recognition is a typical semi-supervised learning task, since labeling facial expressions is an extremely expensive process and very prone to errors due to ambiguities. [sent-275, score-0.214]

78 (a) Exemplar pictures corresponding to seven universal facial expressions. [sent-281, score-0.234]

79 In our experiments, we first randomly choose 20% of the images (balanced across the seven classes) as testing data, and the remaining images constitute a training set (S) that is randomly split into labeled (L) and unlabeled (U) data of equal size in each trial. [sent-283, score-0.435]

80 We set a maximum of 1000 boosting rounds to stop the algorithms if their termination conditions are not met, while the same initialization is used for all boosting algorithms. [sent-286, score-0.822]

81 From Figure 3(b), it is evident that the ASSEMBLE with our regularizer yields a 5.82% error reduction on average. [sent-288, score-0.396]

82 The corresponding average error rate is 26. [sent-289, score-0.105]

83 , [18], where around 70% of the images were used to train a convolutional neural network and an average error rate of 31. [sent-292, score-0.105]

84 4 Discussions. In this section, we discuss issues concerning our regularizer and relate it to previous work in the context of regularization in semi-supervised learning. [sent-294, score-0.436]

85 As defined in (5), our regularizer has a parameter, $\beta_i$, associated with each training point, which can be used to encode information about the marginal (input) distribution, P(x), by setting $\beta_i = \lambda P(x_i)$, where $\lambda$ is a tradeoff or regularization parameter. [sent-295, score-0.482]
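
One way to realize this setting in practice is to plug in a density estimate for P; the kernel density estimator and bandwidth below are our illustrative choices, not the paper's.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def beta_weights(X, lam=1.0, bandwidth=1.0):
    """Set beta_i = lambda * P(x_i) using a kernel density estimate of the input distribution."""
    kde = KernelDensity(bandwidth=bandwidth).fit(X)
    densities = np.exp(kde.score_samples(X))   # estimated P(x_i) at each training point
    return lam * densities
```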

86 Our local smoothness regularizer plays an important role in re-sampling all training data including labeled and unlabeled data for boosting learning. [sent-300, score-1.324]

87 For unlabeled data, such an effect works on the smoothness and the cluster assumptions [1], as is done by existing regularization techniques [3]-[8]. [sent-302, score-0.438]

88 For labeled data, it has the effect that labeled data points located in a “non-smoothing” region are more likely to be retained in the next round of boosting learning. [sent-303, score-0.663]

89 As exemplified in Figure 1, such points are often located around boundaries between different classes and are therefore more informative in determining a decision boundary, which would be another reason why our regularizer improves the generalization of semi-supervised boosting algorithms. [sent-304, score-0.876]

90 In essence, the objective of using manifold smoothness in our Regularized Boost is identical to that in [19], but we accomplish it in a different way. [sent-306, score-0.181]

91 By comparison with existing regularization techniques used in semi-supervised learning, our Regularized Boost is closely related to graph-based semi-supervised learning methods, e. [sent-309, score-0.104]

92 In general, a graph-based method seeks a function that simultaneously satisfies two conditions [2]: a) it should be close to the given labels on the labeled nodes, and b) it should be smooth on the whole graph. [sent-312, score-0.152]

93 In particular, the work in [8] develops a regularization framework to carry out the above idea by defining the global and local consistency terms in their cost function. [sent-313, score-0.197]

94 Similarly, our cost function in (9) has two terms explicitly corresponding to global and local consistency, which resembles theirs [8], though the true labels of labeled data never change during our boosting learning. [sent-314, score-0.675]

95 , [5],[6], our regularization takes effect on both labeled and unlabeled data, while theirs is based on unlabeled data only. [sent-322, score-0.666]

96 5 Conclusions. We have proposed a local smoothness regularizer for semi-supervised boosting learning and demonstrated its effectiveness on different types of data sets. [sent-323, score-0.982]

97 In our ongoing work, we are working toward a formal analysis to justify the advantage of our regularizer and explain the behaviors of Regularized Boost, e. [sent-324, score-0.404]

98 (2005) The value of agreement, a new boosting algorithm. [sent-405, score-0.394]

99 (2000) Using EM to classify text from labeled and unlabeled documents. [sent-420, score-0.321]

100 (2004) Boosting on manifolds: adaptive regularization of base classifier. [sent-446, score-0.226]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('boosting', 0.394), ('regularizer', 0.363), ('assemble', 0.339), ('boost', 0.274), ('unlabeled', 0.214), ('learners', 0.184), ('regularized', 0.168), ('mlp', 0.167), ('adaboost', 0.159), ('base', 0.153), ('learner', 0.144), ('knn', 0.129), ('yi', 0.127), ('xi', 0.125), ('smoothness', 0.12), ('ensemble', 0.112), ('labeled', 0.107), ('agreement', 0.104), ('facial', 0.095), ('bupa', 0.093), ('regboost', 0.093), ('regularizedboost', 0.093), ('universal', 0.084), ('benchmark', 0.083), ('wdbc', 0.081), ('classi', 0.075), ('incompatibility', 0.074), ('optdigits', 0.074), ('regularization', 0.073), ('proper', 0.063), ('misclassi', 0.062), ('manifold', 0.061), ('ft', 0.06), ('cost', 0.057), ('label', 0.057), ('margin', 0.056), ('xk', 0.052), ('applicable', 0.049), ('uci', 0.049), ('iij', 0.048), ('inductive', 0.047), ('labels', 0.045), ('functional', 0.044), ('local', 0.043), ('behaviors', 0.041), ('cation', 0.039), ('yk', 0.039), ('agreementboost', 0.037), ('bswd', 0.037), ('jaffe', 0.037), ('tabulates', 0.037), ('wt', 0.036), ('synthetic', 0.036), ('ma', 0.034), ('termination', 0.034), ('effectiveness', 0.033), ('evident', 0.033), ('australian', 0.032), ('errors', 0.031), ('seven', 0.031), ('car', 0.031), ('existing', 0.031), ('averaging', 0.03), ('wij', 0.03), ('layered', 0.029), ('data', 0.029), ('supervised', 0.029), ('exempli', 0.027), ('grandvalet', 0.027), ('ve', 0.027), ('manchester', 0.026), ('convolutional', 0.026), ('located', 0.026), ('error', 0.026), ('generalization', 0.025), ('boundaries', 0.025), ('tasks', 0.025), ('female', 0.025), ('aforementioned', 0.025), ('reliability', 0.025), ('wisconsin', 0.025), ('training', 0.025), ('framework', 0.024), ('pictures', 0.024), ('cambridge', 0.023), ('rate', 0.023), ('king', 0.023), ('recognition', 0.022), ('expression', 0.022), ('transductive', 0.022), ('essence', 0.022), ('vote', 0.022), ('improves', 0.022), ('monotonically', 0.021), ('marginal', 0.021), ('decision', 0.021), ('pseudo', 0.021), ('advances', 0.021), ('mit', 0.021), ('weight', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999946 166 nips-2007-Regularized Boost for Semi-Supervised Learning

Author: Ke Chen, Shihai Wang

Abstract: Semi-supervised inductive learning concerns how to learn a decision rule from a data set containing both labeled and unlabeled data. Several boosting algorithms have been extended to semi-supervised learning with various strategies. To our knowledge, however, none of them takes local smoothness constraints among data into account during ensemble learning. In this paper, we introduce a local smoothness regularizer to semi-supervised boosting algorithms based on the universal optimization framework of margin cost functionals. Our regularizer is applicable to existing semi-supervised boosting algorithms to improve their generalization and speed up their training. Comparative results on synthetic, benchmark and real world tasks demonstrate the effectiveness of our local smoothness regularizer. We discuss relevant issues and relate our regularizer to previous work. 1

2 0.24659067 147 nips-2007-One-Pass Boosting

Author: Zafer Barutcuoglu, Phil Long, Rocco Servedio

Abstract: This paper studies boosting algorithms that make a single pass over a set of base classifiers. We first analyze a one-pass algorithm in the setting of boosting with diverse base classifiers. Our guarantee is the same as the best proved for any boosting algorithm, but our one-pass algorithm is much faster than previous approaches. We next exhibit a random source of examples for which a “picky” variant of AdaBoost that skips poor base classifiers can outperform the standard AdaBoost algorithm, which uses every base classifier, by an exponential factor. Experiments with Reuters and synthetic data show that one-pass boosting can substantially improve on the accuracy of Naive Bayes, and that picky boosting can sometimes lead to a further improvement in accuracy.

3 0.16546449 186 nips-2007-Statistical Analysis of Semi-Supervised Regression

Author: Larry Wasserman, John D. Lafferty

Abstract: Semi-supervised methods use unlabeled data in addition to labeled data to construct predictors. While existing semi-supervised methods have shown some promising empirical performance, their development has been based largely based on heuristics. In this paper we study semi-supervised learning from the viewpoint of minimax theory. Our first result shows that some common methods based on regularization using graph Laplacians do not lead to faster minimax rates of convergence. Thus, the estimators that use the unlabeled data do not have smaller risk than the estimators that use only labeled data. We then develop several new approaches that provably lead to improved performance. The statistical tools of minimax analysis are thus used to offer some new perspective on the problem of semi-supervised learning. 1

4 0.14983004 69 nips-2007-Discriminative Batch Mode Active Learning

Author: Yuhong Guo, Dale Schuurmans

Abstract: Active learning sequentially selects unlabeled instances to label with the goal of reducing the effort needed to learn a good classifier. Most previous studies in active learning have focused on selecting one unlabeled instance to label at a time while retraining in each iteration. Recently a few batch mode active learning approaches have been proposed that select a set of most informative unlabeled instances in each iteration under the guidance of heuristic scores. In this paper, we propose a discriminative batch mode active learning approach that formulates the instance selection task as a continuous optimization problem over auxiliary instance selection variables. The optimization is formulated to maximize the discriminative classification performance of the target classifier, while also taking the unlabeled data into account. Although the objective is not convex, we can manipulate a quasi-Newton method to obtain a good local solution. Our empirical studies on UCI datasets show that the proposed active learning is more effective than current state-of-the art batch mode active learning algorithms. 1

5 0.14561634 88 nips-2007-Fast and Scalable Training of Semi-Supervised CRFs with Application to Activity Recognition

Author: Maryam Mahdaviani, Tanzeem Choudhury

Abstract: We present a new and efficient semi-supervised training method for parameter estimation and feature selection in conditional random fields (CRFs). In real-world applications such as activity recognition, unlabeled sensor traces are relatively easy to obtain whereas labeled examples are expensive and tedious to collect. Furthermore, the ability to automatically select a small subset of discriminatory features from a large pool can be advantageous in terms of computational speed as well as accuracy. In this paper, we introduce the semi-supervised virtual evidence boosting (sVEB) algorithm for training CRFs – a semi-supervised extension to the recently developed virtual evidence boosting (VEB) method for feature selection and parameter learning. The objective function of sVEB combines the unlabeled conditional entropy with labeled conditional pseudo-likelihood. It reduces the overall system cost as well as the human labeling cost required during training, which are both important considerations in building real-world inference systems. Experiments on synthetic data and real activity traces collected from wearable sensors, illustrate that sVEB benefits from both the use of unlabeled data and automatic feature selection, and outperforms other semi-supervised approaches. 1

6 0.14326903 39 nips-2007-Boosting the Area under the ROC Curve

7 0.12669556 38 nips-2007-Boosting Algorithms for Maximizing the Soft Margin

8 0.12520835 32 nips-2007-Bayesian Co-Training

9 0.12299958 175 nips-2007-Semi-Supervised Multitask Learning

10 0.11944968 201 nips-2007-The Value of Labeled and Unlabeled Examples when the Model is Imperfect

11 0.11809663 90 nips-2007-FilterBoost: Regression and Classification on Large Datasets

12 0.11618648 6 nips-2007-A General Boosting Method and its Application to Learning Ranking Functions for Web Search

13 0.10574103 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine

14 0.10181615 126 nips-2007-McRank: Learning to Rank Using Multiple Classification and Gradient Boosting

15 0.10084089 212 nips-2007-Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes

16 0.089754574 11 nips-2007-A Risk Minimization Principle for a Class of Parzen Estimators

17 0.08289215 136 nips-2007-Multiple-Instance Active Learning

18 0.074784569 62 nips-2007-Convex Learning with Invariances

19 0.073591068 149 nips-2007-Optimal ROC Curve for a Combination of Classifiers

20 0.071601465 40 nips-2007-Bundle Methods for Machine Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.214), (1, 0.052), (2, -0.173), (3, 0.197), (4, 0.072), (5, 0.156), (6, 0.155), (7, -0.161), (8, 0.203), (9, -0.005), (10, 0.052), (11, 0.051), (12, -0.193), (13, -0.073), (14, 0.122), (15, 0.086), (16, 0.018), (17, 0.091), (18, 0.012), (19, 0.051), (20, 0.017), (21, 0.042), (22, -0.102), (23, 0.034), (24, -0.006), (25, 0.046), (26, 0.028), (27, -0.042), (28, -0.029), (29, 0.005), (30, 0.076), (31, 0.01), (32, -0.046), (33, -0.069), (34, -0.069), (35, -0.047), (36, 0.01), (37, 0.012), (38, 0.02), (39, 0.053), (40, 0.028), (41, 0.02), (42, -0.03), (43, 0.022), (44, -0.041), (45, 0.054), (46, 0.05), (47, 0.044), (48, -0.116), (49, -0.044)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96299493 166 nips-2007-Regularized Boost for Semi-Supervised Learning

Author: Ke Chen, Shihai Wang

Abstract: Semi-supervised inductive learning concerns how to learn a decision rule from a data set containing both labeled and unlabeled data. Several boosting algorithms have been extended to semi-supervised learning with various strategies. To our knowledge, however, none of them takes local smoothness constraints among data into account during ensemble learning. In this paper, we introduce a local smoothness regularizer to semi-supervised boosting algorithms based on the universal optimization framework of margin cost functionals. Our regularizer is applicable to existing semi-supervised boosting algorithms to improve their generalization and speed up their training. Comparative results on synthetic, benchmark and real world tasks demonstrate the effectiveness of our local smoothness regularizer. We discuss relevant issues and relate our regularizer to previous work. 1

2 0.74468738 88 nips-2007-Fast and Scalable Training of Semi-Supervised CRFs with Application to Activity Recognition

Author: Maryam Mahdaviani, Tanzeem Choudhury

Abstract: We present a new and efficient semi-supervised training method for parameter estimation and feature selection in conditional random fields (CRFs). In real-world applications such as activity recognition, unlabeled sensor traces are relatively easy to obtain whereas labeled examples are expensive and tedious to collect. Furthermore, the ability to automatically select a small subset of discriminatory features from a large pool can be advantageous in terms of computational speed as well as accuracy. In this paper, we introduce the semi-supervised virtual evidence boosting (sVEB) algorithm for training CRFs – a semi-supervised extension to the recently developed virtual evidence boosting (VEB) method for feature selection and parameter learning. The objective function of sVEB combines the unlabeled conditional entropy with labeled conditional pseudo-likelihood. It reduces the overall system cost as well as the human labeling cost required during training, which are both important considerations in building real-world inference systems. Experiments on synthetic data and real activity traces collected from wearable sensors, illustrate that sVEB benefits from both the use of unlabeled data and automatic feature selection, and outperforms other semi-supervised approaches. 1

3 0.7275368 147 nips-2007-One-Pass Boosting

Author: Zafer Barutcuoglu, Phil Long, Rocco Servedio

Abstract: This paper studies boosting algorithms that make a single pass over a set of base classifiers. We first analyze a one-pass algorithm in the setting of boosting with diverse base classifiers. Our guarantee is the same as the best proved for any boosting algorithm, but our one-pass algorithm is much faster than previous approaches. We next exhibit a random source of examples for which a “picky” variant of AdaBoost that skips poor base classifiers can outperform the standard AdaBoost algorithm, which uses every base classifier, by an exponential factor. Experiments with Reuters and synthetic data show that one-pass boosting can substantially improve on the accuracy of Naive Bayes, and that picky boosting can sometimes lead to a further improvement in accuracy.

4 0.67689711 90 nips-2007-FilterBoost: Regression and Classification on Large Datasets

Author: Joseph K. Bradley, Robert E. Schapire

Abstract: We study boosting in the filtering setting, where the booster draws examples from an oracle instead of using a fixed training set and so may train efficiently on very large datasets. Our algorithm, which is based on a logistic regression technique proposed by Collins, Schapire, & Singer, requires fewer assumptions to achieve bounds equivalent to or better than previous work. Moreover, we give the first proof that the algorithm of Collins et al. is a strong PAC learner, albeit within the filtering setting. Our proofs demonstrate the algorithm’s strong theoretical properties for both classification and conditional probability estimation, and we validate these results through extensive experiments. Empirically, our algorithm proves more robust to noise and overfitting than batch boosters in conditional probability estimation and proves competitive in classification. 1

5 0.6368832 201 nips-2007-The Value of Labeled and Unlabeled Examples when the Model is Imperfect

Author: Kaushik Sinha, Mikhail Belkin

Abstract: Semi-supervised learning, i.e. learning from both labeled and unlabeled data has received significant attention in the machine learning literature in recent years. Still our understanding of the theoretical foundations of the usefulness of unlabeled data remains somewhat limited. The simplest and the best understood situation is when the data is described by an identifiable mixture model, and where each class comes from a pure component. This natural setup and its implications ware analyzed in [11, 5]. One important result was that in certain regimes, labeled data becomes exponentially more valuable than unlabeled data. However, in most realistic situations, one would not expect that the data comes from a parametric mixture distribution with identifiable components. There have been recent efforts to analyze the non-parametric situation, for example, “cluster” and “manifold” assumptions have been suggested as a basis for analysis. Still, a satisfactory and fairly complete theoretical understanding of the nonparametric problem, similar to that in [11, 5] has not yet been developed. In this paper we investigate an intermediate situation, when the data comes from a probability distribution, which can be modeled, but not perfectly, by an identifiable mixture distribution. This seems applicable to many situation, when, for example, a mixture of Gaussians is used to model the data. the contribution of this paper is an analysis of the role of labeled and unlabeled data depending on the amount of imperfection in the model.

6 0.6256873 39 nips-2007-Boosting the Area under the ROC Curve

7 0.61427557 38 nips-2007-Boosting Algorithms for Maximizing the Soft Margin

8 0.59893298 69 nips-2007-Discriminative Batch Mode Active Learning

9 0.56592643 175 nips-2007-Semi-Supervised Multitask Learning

10 0.54398799 186 nips-2007-Statistical Analysis of Semi-Supervised Regression

11 0.54370129 32 nips-2007-Bayesian Co-Training

12 0.50842285 6 nips-2007-A General Boosting Method and its Application to Learning Ranking Functions for Web Search

13 0.4880147 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine

14 0.48679572 149 nips-2007-Optimal ROC Curve for a Combination of Classifiers

15 0.4747031 172 nips-2007-Scene Segmentation with CRFs Learned from Partially Labeled Images

16 0.44243476 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)

17 0.43275127 126 nips-2007-McRank: Learning to Rank Using Multiple Classification and Gradient Boosting

18 0.42312711 139 nips-2007-Nearest-Neighbor-Based Active Learning for Rare Category Detection

19 0.403813 136 nips-2007-Multiple-Instance Active Learning

20 0.40269262 11 nips-2007-A Risk Minimization Principle for a Class of Parzen Estimators


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.044), (13, 0.058), (16, 0.025), (19, 0.024), (21, 0.089), (31, 0.037), (34, 0.012), (35, 0.024), (47, 0.083), (49, 0.022), (83, 0.116), (85, 0.018), (87, 0.019), (88, 0.287), (90, 0.044)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.77771544 166 nips-2007-Regularized Boost for Semi-Supervised Learning

Author: Ke Chen, Shihai Wang

Abstract: Semi-supervised inductive learning concerns how to learn a decision rule from a data set containing both labeled and unlabeled data. Several boosting algorithms have been extended to semi-supervised learning with various strategies. To our knowledge, however, none of them takes local smoothness constraints among data into account during ensemble learning. In this paper, we introduce a local smoothness regularizer to semi-supervised boosting algorithms based on the universal optimization framework of margin cost functionals. Our regularizer is applicable to existing semi-supervised boosting algorithms to improve their generalization and speed up their training. Comparative results on synthetic, benchmark and real world tasks demonstrate the effectiveness of our local smoothness regularizer. We discuss relevant issues and relate our regularizer to previous work. 1

2 0.7371605 41 nips-2007-COFI RANK - Maximum Margin Matrix Factorization for Collaborative Ranking

Author: Markus Weimer, Alexandros Karatzoglou, Quoc V. Le, Alex J. Smola

Abstract: In this paper, we consider collaborative filtering as a ranking problem. We present a method which uses Maximum Margin Matrix Factorization and optimizes ranking instead of rating. We employ structured output prediction to optimize directly for ranking scores. Experimental results show that our method gives very good ranking scores and scales well on collaborative filtering tasks. 1

3 0.6668632 215 nips-2007-What makes some POMDP problems easy to approximate?

Author: Wee S. Lee, Nan Rong, Daniel J. Hsu

Abstract: Point-based algorithms have been surprisingly successful in computing approximately optimal solutions for partially observable Markov decision processes (POMDPs) in high dimensional belief spaces. In this work, we seek to understand the belief-space properties that allow some POMDP problems to be approximated efficiently and thus help to explain the point-based algorithms’ success often observed in the experiments. We show that an approximately optimal POMDP solution can be computed in time polynomial in the covering number of a reachable belief space, which is the subset of the belief space reachable from a given belief point. We also show that under the weaker condition of having a small covering number for an optimal reachable space, which is the subset of the belief space reachable under an optimal policy, computing an approximately optimal solution is NP-hard. However, given a suitable set of points that “cover” an optimal reachable space well, an approximate solution can be computed in polynomial time. The covering number highlights several interesting properties that reduce the complexity of POMDP planning in practice, e.g., fully observed state variables, beliefs with sparse support, smooth beliefs, and circulant state-transition matrices. 1

4 0.57196862 158 nips-2007-Probabilistic Matrix Factorization

Author: Andriy Mnih, Ruslan Salakhutdinov

Abstract: Many existing approaches to collaborative filtering can neither handle very large datasets nor easily deal with users who have very few ratings. In this paper we present the Probabilistic Matrix Factorization (PMF) model which scales linearly with the number of observations and, more importantly, performs well on the large, sparse, and very imbalanced Netflix dataset. We further extend the PMF model to include an adaptive prior on the model parameters and show how the model capacity can be controlled automatically. Finally, we introduce a constrained version of the PMF model that is based on the assumption that users who have rated similar sets of movies are likely to have similar preferences. The resulting model is able to generalize considerably better for users with very few ratings. When the predictions of multiple PMF models are linearly combined with the predictions of Restricted Boltzmann Machines models, we achieve an error rate of 0.8861, that is nearly 7% better than the score of Netflix’s own system.

5 0.56132567 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning

Author: Kai Yu, Wei Chu

Abstract: This paper aims to model relational data on edges of networks. We describe appropriate Gaussian Processes (GPs) for directed, undirected, and bipartite networks. The inter-dependencies of edges can be effectively modeled by adapting the GP hyper-parameters. The framework suggests an intimate connection between link prediction and transfer learning, which were traditionally two separate research topics. We develop an efficient learning algorithm that can handle a large number of observations. The experimental results on several real-world data sets verify superior learning capacity. 1

6 0.55846077 88 nips-2007-Fast and Scalable Training of Semi-Supervised CRFs with Application to Activity Recognition

7 0.54674608 156 nips-2007-Predictive Matrix-Variate t Models

8 0.53978604 69 nips-2007-Discriminative Batch Mode Active Learning

9 0.53514099 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images

10 0.53487521 18 nips-2007-A probabilistic model for generating realistic lip movements from speech

11 0.532839 86 nips-2007-Exponential Family Predictive Representations of State

12 0.53241426 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data

13 0.53185821 177 nips-2007-Simplified Rules and Theoretical Analysis for Information Bottleneck Optimization and PCA with Spiking Neurons

14 0.53078377 24 nips-2007-An Analysis of Inference with the Universum

15 0.52872157 63 nips-2007-Convex Relaxations of Latent Variable Training

16 0.5274455 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine

17 0.52670652 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)

18 0.52602601 79 nips-2007-Efficient multiple hyperparameter learning for log-linear models

19 0.52546871 34 nips-2007-Bayesian Policy Learning with Trans-Dimensional MCMC

20 0.5253678 84 nips-2007-Expectation Maximization and Posterior Constraints