jmlr jmlr2006 jmlr2006-12 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Hema Raghavan, Omid Madani, Rosie Jones
Abstract: We extend the traditional active learning framework to include feedback on features in addition to labeling instances, and we execute a careful study of the effects of feature selection and human feedback on features in the setting of text categorization. Our experiments on a variety of categorization tasks indicate that there is significant potential in improving classifier performance by feature re-weighting, beyond that achieved via membership queries alone (traditional active learning) if we have access to an oracle that can point to the important (most predictive) features. Our experiments on human subjects indicate that human feedback on feature relevance can identify a sufficient proportion of the most relevant features (over 50% in our experiments). We find that on average, labeling a feature takes much less time than labeling a document. We devise an algorithm that interleaves labeling features and documents which significantly accelerates standard active learning in our simulation experiments. Feature feedback can complement traditional active learning in applications such as news filtering, e-mail classification, and personalization, where the human teacher can have significant knowledge on the relevance of features. Keywords: active learning, feature selection, relevance feedback, term feedback, text classification
Reference: text
sentIndex sentText sentNum sentScore
1 Our experiments on human subjects indicate that human feedback on feature relevance can identify a sufficient proportion of the most relevant features (over 50% in our experiments). [sent-8, score-0.876]
2 We devise an algorithm that interleaves labeling features and documents which significantly accelerates standard active learning in our simulation experiments. [sent-10, score-0.916]
3 Feature feedback can complement traditional active learning in applications such as news filtering, e-mail classification, and personalization, where the human teacher can have significant knowledge on the relevance of features. [sent-11, score-1.107]
4 In the standard active learning paradigm, learning proceeds sequentially, with the learning algorithm actively asking for the labels (categories) of some instances from a teacher (also referred to as membership queries). [sent-28, score-0.699]
5 The objective is to ask the teacher to label the most informative instances in order to reduce labeling costs and accelerate the learning. [sent-29, score-0.64]
6 Still, in text categorization applications in particular, active learning might be perceived to be too slow, especially since the teacher may have much prior knowledge on relevance of features for the task. [sent-30, score-0.825]
7 In experiments in this paper we study the benefits and costs of feature feedback via humans on active learning. [sent-39, score-0.8]
8 We try to find a marriage between approaches to incorporating user feedback from machine learning and information retrieval and show that active learning should be a twofold process – at the term-level and at the document-level. [sent-40, score-0.833]
9 This has applications in e-mail classification and news filtering where the user has knowledge of the relevance of features and a willingness to label some (as few as possible) documents in order to build a system that suits her needs. [sent-43, score-0.853]
10 We state the active learning problems that we address and present our approach to use feedback on both features and instances to solve the problems in Section 2. [sent-46, score-0.85]
11 2 we show that the human-chosen features significantly accelerate learning in experiments that simulate human feedback in an active learning loop. [sent-63, score-0.877]
12 Standard Active Learning. Input: T (total number of feedback iterations), U (pool of unlabeled instances), init_size (number of random feedback iterations). Output: M^T (the model). Initialization: t = 1; U^0 = U; M^0 = NULL; 1. [sent-65, score-0.772]
13 Figure 1: Algorithm and block diagram for traditional active learning, where the system asks for feedback on instances only (System 1). [sent-81, score-0.757]
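The following is a minimal sketch of the System 1 loop in Figure 1, assuming a scikit-learn-style linear classifier and a pluggable instance-selection rule (the first init_size queries are random, as in the algorithm above); names such as system1_active_learning, teacher_label, and select_instance are illustrative, not taken from the paper.

```python
import numpy as np
from sklearn.svm import LinearSVC

def system1_active_learning(X_pool, teacher_label, select_instance, T, init_size, seed=0):
    """Traditional active learning (System 1): one membership query per iteration."""
    rng = np.random.default_rng(seed)
    unlabeled = list(range(X_pool.shape[0]))
    labeled_idx, labels = [], []
    model = None
    for t in range(1, T + 1):
        if t <= init_size or model is None:
            idx = int(rng.choice(unlabeled))                  # random seed queries
        else:
            idx = select_instance(model, X_pool, unlabeled)   # informative query
        unlabeled.remove(idx)
        labeled_idx.append(idx)
        labels.append(teacher_label(idx))                     # membership query to the teacher
        if len(set(labels)) > 1:                              # need both classes before training
            model = LinearSVC().fit(X_pool[labeled_idx], labels)
    return model
```

An uncertainty-based selection rule that can be passed in as select_instance is sketched later, where uncertainty sampling is defined.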
14 The teacher may also be called the user, especially when the teacher training the model is the same as the person using it, for example a user who is training a personalized news filtering system. [sent-87, score-0.811]
15 Traditionally in active learning the teacher is asked membership queries which are questions on the class labels or categories of selected instances (documents in our case). [sent-88, score-0.851]
16 We will also use the term oracle to refer to a source that gives feedback on instances and/or features, but in this paper we make a distinction between teacher and oracle. [sent-90, score-0.98]
17 We will reserve the term teacher or user to refer to a real human, whose feedback may not be perfect, and we use the term oracle to refer to a source whose feedback is (close to) perfect for speeding active learning. [sent-91, score-1.661]
18 Active Learning Augmented with Feature Feedback. Input: T (total number of feedback iterations), U (pool of unlabeled instances), init_size (number of random feedback iterations). Output: M^T (the model). Initialization: t = 1; U^0 = U; M^0 = NULL; 1. [sent-107, score-0.772]
19 Figure 2: An active learning system where feedback on features is also requested (System 2). [sent-139, score-0.784]
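Below is a sketch of the extra feature-feedback step of System 2 (step 2 in Figure 2), under the assumption that features judged relevant are incorporated by up-weighting their columns; the paper's exact re-weighting scheme may differ, and the names query_features and reweight are illustrative.

```python
import numpy as np

def query_features(candidate_ranking, already_shown, teacher_is_relevant, f=5):
    """Show the teacher the top f not-yet-judged candidate features and
    collect her (possibly imperfect) relevance judgments."""
    shown, accepted = [], []
    for feat in candidate_ranking:
        if feat in already_shown:
            continue
        shown.append(feat)
        if teacher_is_relevant(feat):
            accepted.append(feat)
        if len(shown) == f:
            break
    already_shown.update(shown)
    return accepted

def reweight(X, relevant_features, boost=10.0):
    """One simple way to act on feature feedback: scale up the columns of
    accepted features (X is a dense document-term matrix here)."""
    X = np.array(X, dtype=float, copy=True)
    X[:, list(relevant_features)] *= boost
    return X
```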
20 1 Our Proposal: Feature Feedback and Instance Feedback in Tandem In this paper we propose to extend the traditional active learning framework to engage the teacher in providing feedback on features in addition to instances. [sent-158, score-1.061]
21 A realization of this idea is system 2 shown in Figure 2, where the active learner not only queries the teacher on an informative document, but also presents a list of f features for the teacher to judge (step 2. [sent-159, score-1.153]
22 In this paper we aim to determine whether a human teacher can answer the latter type of question sufficiently effectively so that active learning is accelerated significantly. [sent-167, score-0.654]
23 Therefore, it is not clear whether a human teacher can considerably accelerate the training of a statistical classifier, beyond simple active learning, by providing feedback on features. [sent-170, score-1.003]
24 Before we address that issue, we determine whether feature feedback can accelerate active learning in an idealized setting. [sent-171, score-0.764]
25 For example, the word ct is a highly relevant feature for classifying Reuters news articles on the earnings category, and our oracle would be able to determine that this feature is relevant when asked. [sent-177, score-0.934]
26 Therefore, the oracle and teacher may differ in their answers to questions about features, that is, questions of type (2) above. [sent-180, score-0.655]
27 We assume that the oracle and the teacher always agree on the labels of documents, that is, questions of type (1) above. [sent-181, score-0.945]
28 After showing the usefulness of oracle feature selection, we will then show that humans can emulate the oracle for feature feedback to an extent that results in significant improvements over traditional active learning. [sent-182, score-1.605]
29 Use of Feature Feedback Before Active Learning. Input: T (total number of feedback iterations), U (pool of unlabeled instances), init_size (number of random feedback iterations). Output: M^T (the model). Initialization: t = 1; U^0 = U; M^0 = NULL; 1. [sent-193, score-0.772]
30 Figure 3: An active learning system where feature selection is done before instance selection (System 3). [sent-224, score-0.632]
31 The second type of experiment is a slight variation designed to isolate the effect of oracle feature selection on example selection versus model selection during active learning. [sent-228, score-0.918]
32 In these experiments, active learning proceeds normally with all the features available, but after all the instances are picked (after T iterations), the best set of k features that improve the resulting trained classifier the most are picked and the resulting performance is reported. [sent-229, score-0.843]
33 In short, the trajectory of the active learning process, that is, the instances labeled and classifiers learned, can be different in the two regimes, which may lead to substantially different active learning performance. [sent-235, score-0.771]
34 1) because it also allows the system to benefit from the increase in the knowledge of the teacher or the process may help remind the teacher about the usefulness of features as she reads the documents. [sent-241, score-0.746]
35 For example, the teacher who did not know that ct stood for cents may realize that the word is indeed relevant upon reading documents containing the term. [sent-242, score-0.703]
36 Use of Feature Feedback After Active Learning. Input: T (total number of feedback iterations), U (pool of unlabeled instances), init_size (number of random feedback iterations). Output: M^T (the model). Initialization: t = 1; U^0 = U; M^0 = NULL; 1. [sent-244, score-0.772]
37 Figure 4: An active learning system where feature selection is done after instance selection (System 4). [sent-275, score-0.632]
38 Active Learning: Uncertainty Sampling. Uncertainty sampling (Lewis and Catlett, 1994) is a type of active learning in which the instance that the teacher is queried on is the unlabeled instance that the classifier is most uncertain about. [sent-295, score-0.826]
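For a margin-based classifier, "most uncertain" can be taken as the smallest absolute decision value; a minimal sketch of this selection rule, which can be plugged into the System 1 loop sketched earlier as select_instance:

```python
import numpy as np

def uncertainty_select(model, X_pool, unlabeled):
    """Uncertainty sampling: query the unlabeled document closest to the
    current decision boundary (smallest absolute decision value)."""
    scores = model.decision_function(X_pool[unlabeled])
    return unlabeled[int(np.argmin(np.abs(scores)))]
```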
39 Construction of the Approximate Feature Oracle. The (feature) oracle in our experiments has access to the labels of all documents in the data set (hence the name oracle) and uses this information to return a ranked list of features sorted in decreasing order of importance. [sent-311, score-0.775]
40 In our oracle experiments, we cut off the ranked list (therefore obtaining a feature subset) at the point that yields the highest average active learning performance. [sent-317, score-0.702]
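A sketch of such an approximate oracle follows, assuming binary term-presence features, 0/1 class labels, and information gain as the importance measure; choosing the cut-off point along the ranking (by active-learning performance) is omitted here.

```python
import numpy as np

def information_gain_ranking(X, y):
    """Rank features by the information gain between a term's presence/absence
    and the class label, computed on the fully labeled corpus.
    X: dense document-term matrix (counts or 0/1); y: array of 0/1 labels."""
    present = (np.asarray(X) > 0)
    y = np.asarray(y)

    def entropy(counts):
        p = counts / counts.sum()
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    h_y = entropy(np.bincount(y))
    gains = np.empty(present.shape[1])
    for j in range(present.shape[1]):
        gain = h_y
        for value in (False, True):
            mask = present[:, j] == value
            if mask.any():
                gain -= mask.mean() * entropy(np.bincount(y[mask]))
        gains[j] = gain
    return np.argsort(-gains)   # feature indices, most informative first
```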
41 For example, the words storm and damage occur, respectively, in 50% and 27% of the documents on hurricane Mitch and in 75% and 54% of the documents on hurricane George. [sent-344, score-1.014]
42 42% of the documents on hurricane Mitch and in 0 documents on hurricane George. [sent-352, score-1.014]
43 05% documents on the topic of hurricane Mitch and in 88% of the documents on hurricane George. [sent-354, score-1.047]
44 We randomly sampled documents such that at least five documents belonged to each class. [sent-375, score-0.668]
45 We ask the user to first label the features and then the documents, so that the feature labeling process receives no benefit from the user having already viewed relevant documents. [sent-380, score-0.93]
46 In the learning process we have proposed, though, the user would be labeling documents and features simultaneously, so the user would indeed be influenced by the documents she reads. [sent-381, score-1.329]
47 The metric compares this difference when t documents have been actively picked to the corresponding difference when t documents have been randomly picked, for an increasing number of training documents t. [sent-411, score-1.156]
48 Results: Experiments with an Oracle. In this section we seek the answer to the following questions: • Can feature feedback significantly boost active learning performance? [sent-452, score-0.73]
49 • Should we use feature feedback during the entire active learning process (both instance selection, and model selection) or only for model selection? [sent-453, score-0.771]
50 To measure how much gain we can get from feature feedback we can measure the impact of the oracle (which has knowledge of the best set of features) on active learning. [sent-454, score-1.05]
51 This gives us an upper bound on how useful feature feedback is for active learning. [sent-455, score-0.73]
52 1 on average), while with more documents labeled (22 documents labeled for F1@22) the optimal number of features is larger (11851. [sent-517, score-0.929]
53 Also, for 31 of 42 cases m ≤ p (that is, the number of features optimal for seven labeled instances, m, is less than the number of features optimal for 22 labeled instances, p), meaning that as the number of labeled instances (t) increases, the complexity of the classifier also needs to increase. [sent-552, score-0.727]
54 2 the difference between systems 3 and 4 is that feature selection precedes active learning in the former, and the best feature subset is picked in a retrospective manner, while it follows active learning in the latter. [sent-557, score-0.979]
55 The two systems when used with oracle feature selection will help us understand the extent to which oracle feedback aids different aspects of the active learning process. [sent-558, score-1.376]
56 Figure 9: F1@7, F1@22, and efficiency E@42 (panel (c)) for the Reuters corpus, plotted per category, when feature selection is done before active learning (System 3) and when feature selection is done after active learning (System 4). [sent-684, score-1.033]
57 If the feature labels people provide are imperfect, is the feedback still beneficial to active learning? [sent-702, score-0.73]
58 We may expect this to be more efficient, since documents are often long and may contain redundant or irrelevant content, and results from our oracle experiments indicate great potential in doing feature selection. [sent-705, score-0.753]
59 We employed this approach rather than one where an actual user labels features and documents in tandem because our approach allows us to run many repeated trials of our experiments, enabling us to do significance testing. [sent-707, score-0.669]
60 We evaluated user feature labeling by calculating the users' average precision and recall at identifying the top 20 features as ranked by an oracle using information gain on the entire labeled set. [sent-711, score-1.088]
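The evaluation described here amounts to comparing the set of features a user marks as relevant against the oracle's top 20; a small sketch (function and argument names are illustrative):

```python
def user_feature_quality(user_marked, oracle_ranking, k=20):
    """Precision and recall of user-marked features with respect to the
    top-k features of the information-gain oracle."""
    oracle_top = set(oracle_ranking[:k])
    marked = set(user_marked)
    hits = len(marked & oracle_top)
    precision = hits / len(marked) if marked else 0.0
    recall = hits / k
    return precision, recall
```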
61 ) and an active learner which has seen 50 documents (@50). [sent-763, score-0.668]
62 Note that precision and recall denote the ability of the user to recognize the oracle features and are not measures of classification accuracy. [sent-764, score-0.671]
63 Average labeling times for features and documents are also shown. [sent-765, score-0.633]
64 We now proceed to see how to use features labeled as relevant by our naive users in active learning. [sent-812, score-0.731]
65 We now describe our approach of applying term and document level feedback simultaneously in active learning. [sent-817, score-0.714]
66 Using the unlabeled data for term level feedback is very common in information retrieval and is called pseudo-relevance feedback (Salton, 1968). [sent-828, score-0.756]
67 If a user has labeled a feature in a previous iteration, we don’t show the user that feature again (the top f are picked from the unlabeled features). [sent-832, score-0.882]
68 As before, we report efficiency (E@42), the F1 score with 7 labeled documents (F1@7), and the F1 score with 22 labeled documents (F1@22) for each of uncertainty sampling (Unc), oracle feature selection with uncertainty sampling (Ora), and the Human-in-the-Loop (HIL) algorithm. [sent-875, score-1.565]
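A sketch of how the F1 summary numbers could be collected in such a simulation, assuming the active-learning run exposes the model trained after each newly labeled document; averaging over repeated trials and the efficiency computation are omitted.

```python
from sklearn.metrics import f1_score

def f1_checkpoints(models_by_t, X_test, y_test, checkpoints=(7, 22)):
    """Record test-set F1 after a given number of labeled documents
    (for example F1@7 and F1@22); models_by_t maps the number of labeled
    documents to the classifier trained at that point."""
    return {t: f1_score(y_test, models_by_t[t].predict(X_test))
            for t in checkpoints
            if t in models_by_t and models_by_t[t] is not None}
```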
69 As a baseline we also report results for the case when the top 20 features as obtained by the information gain oracle are input to the simulated HIL experiments (this represents what a user with 100% precision and recall would obtain by our method). [sent-876, score-0.704]
70 The difference between no feature feedback (Unc) and human-labeled features (HIL) is greatest with few documents labeled, but persists up to 42 documents labeled. [sent-953, score-1.269]
71 When to stop asking for labels on both features and documents and switch entirely to documents remains an area for future work. [sent-954, score-0.86]
72 Consider that we ask for both document and feature feedback up to j iterations and after that we only ask for document feedback. [sent-956, score-0.759]
73 For the auto vs motorcycles problem, the user has been asked to label 75% of the oracle features (averaged over multiple iterations and multiple users) at some point or another. [sent-963, score-0.807]
74 Most relevant features are asked about within 10 iterations, which leads us to believe that we can often stop feature-level feedback after around 10 iterations. [sent-967, score-0.735]
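The switch discussed here is a one-line condition in the interleaved loop; a sketch, with the default j = 10 reflecting the observation above.

```python
def ask_for_feature_feedback(t, j=10):
    """Request feature-level judgments only during the first j iterations;
    afterwards, fall back to document feedback alone."""
    return t <= j

# Inside the System 2 loop sketched earlier (illustrative):
#   if ask_for_feature_feedback(t, j):
#       accepted = query_features(candidate_ranking, already_shown, teacher_is_relevant)
#       X_pool = reweight(X_pool, accepted)
```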
75 The legend indicates the number of iterations (j) for which there was both feature and document feedback, after which only document feedback was asked for. [sent-978, score-0.759]
76 The line at the bottom, labeled j = 0, corresponds to regular uncertainty sampling, or the case when feature feedback was asked for during 0 iterations. [sent-979, score-0.73]
77 Related Work. Our work is related to a number of areas, including query learning, active learning, use of (prior) knowledge and feature selection in machine learning, term-relevance feedback in information retrieval, and human-computer interaction. [sent-986, score-0.836]
78 Many of the effective forms are composed of lists of terms, and the user is asked to mark terms as relevant or not; some have found that term-level feedback is more effective than document-level feedback (Diaz and Allan, 2005). [sent-990, score-1.097]
79 Feature feedback may be viewed as the teacher providing evidence or an explanation for the learner on the reasoning behind the labeling. [sent-1001, score-0.646]
80 Jones (2005) also used single feature-set labeling in the context of active learning: the user was queried on a feature rather than the whole instance. [sent-1025, score-0.77]
81 The labeled feature was taken as a proxy for the label of any instance containing that feature, so a single feature labeling potentially labeled many documents (similar to the soft labeling technique discussed next). [sent-1026, score-1.186]
82 The instances in this work consisted of only two features (a noun-phrase and a context), so labeling one feature is equivalent to labeling half an instance. [sent-1028, score-0.674]
83 Our work differs in that our instances (documents) contain many features (words) and we combine both feature labeling and document labeling. [sent-1029, score-0.645]
84 Our work also differs in that we use the labeled features for feature selection and feature re-weighting, rather than as proxies for document labels. [sent-1030, score-0.713]
85 However, in our scheme the user is labeling documents and features in an interactive and interleaved fashion. [sent-1035, score-0.842]
86 We expect that our proposed interactive mode has an advantage over requesting prior knowledge from the outset, as it may be easier for the user to identify or recall relevant features while labeling documents in the collection and being presented with candidate features. [sent-1036, score-0.926]
87 Furthermore, we simplify the user’s task in that our technique does not require the user to specify whether the feature is positively or negatively correlated with the category, just whether the user thinks the feature is relevant or predictive. [sent-1038, score-0.68]
88 Their proposed method is attractive in that it treats features as single term documents that can be labeled by humans, but they also study labeling features before documents (and only in an “oracle” setting, without using actual human annotators). [sent-1043, score-1.319]
89 We conducted a user study to see how well naive users performed as compared to a feature oracle in the domain of text categorization. [sent-1051, score-0.772]
90 We found that even naive users can provide effective feedback on the most relevant features (about 60% accuracy of the oracle in our experiments). [sent-1057, score-0.943]
91 We measured the manual costs of relevance feedback on features versus labeling documents: we found that feature feedback takes about one fifth of the time taken by document labeling on average. [sent-1059, score-1.361]
92 The idea of using as few documents as possible for training classifiers has been studied in semi-supervised learning and active learning. [sent-1066, score-0.617]
93 In this paper we extended the traditional active learning setting, which concerns the issue of minimal feedback, and proposed an approach in which the user provides feedback on features as well as documents. [sent-1067, score-1.277]
94 For example, whether to ask for feedback on a document or on a feature, or even whether to stop asking questions altogether (ask nothing), which is appropriate for a scenario where no additional feedback is likely to improve performance significantly. [sent-1072, score-0.868]
95 Furthermore, there are alternative kinds of feedback that are potentially useful – for example, feedback on clusters of features. [sent-1074, score-0.784]
96 The second question involves human-computer interaction issues and seeks to explore how to translate what the learner needs to know into a question, or a user interface, that the human teacher can easily understand. [sent-1075, score-0.694]
97 In our case, the learner asked the teacher for labels on word features and documents, both of which required little effort on the part of the teacher to understand what was being asked. [sent-1076, score-0.96]
98 An attractive alternative or complementary method of soliciting feature feedback is asking users to highlight some relevant or predictive terms as they read a document. [sent-1078, score-0.672]
99 Related to the above is better understanding and quantifying the potential of active learning enhanced with feature feedback as a function of various aspects of the learning problem, such as measures of the difficulty of the category that one seeks to learn. [sent-1082, score-0.801]
100 If it is not relevant, or you don’t know whether the feature is relevant, mark DONT KNOW correspondingly. A feature is relevant if it helps discriminate between documents in Class 1 versus documents in Class 2. [sent-1167, score-1.13]
wordName wordTfidf (topN-words)
[('documents', 0.334), ('feedback', 0.315), ('oracle', 0.287), ('active', 0.283), ('teacher', 0.28), ('user', 0.181), ('hurricane', 0.173), ('features', 0.154), ('labeling', 0.145), ('users', 0.133), ('adani', 0.132), ('aghavan', 0.132), ('mitch', 0.132), ('feature', 0.132), ('nstances', 0.124), ('document', 0.116), ('hil', 0.107), ('labeled', 0.107), ('eatures', 0.105), ('eedback', 0.105), ('earnings', 0.099), ('instances', 0.098), ('tdt', 0.091), ('human', 0.091), ('asked', 0.08), ('reuters', 0.08), ('picked', 0.077), ('selection', 0.072), ('unlabeled', 0.072), ('category', 0.071), ('humans', 0.07), ('news', 0.07), ('init', 0.07), ('judgments', 0.07), ('rand', 0.069), ('baseball', 0.066), ('newsgroups', 0.066), ('corpus', 0.059), ('ciency', 0.059), ('retrieval', 0.054), ('relevant', 0.054), ('classi', 0.054), ('act', 0.053), ('learner', 0.051), ('precision', 0.049), ('sampling', 0.049), ('er', 0.048), ('uncertainty', 0.047), ('questions', 0.044), ('label', 0.043), ('topics', 0.043), ('pool', 0.043), ('brank', 0.041), ('croft', 0.041), ('sebastiani', 0.041), ('unc', 0.041), ('usenet', 0.041), ('instance', 0.041), ('ask', 0.04), ('text', 0.039), ('relevance', 0.039), ('asking', 0.038), ('ff', 0.038), ('judge', 0.038), ('kappa', 0.038), ('improvements', 0.037), ('earning', 0.037), ('xt', 0.036), ('mark', 0.036), ('queries', 0.035), ('word', 0.035), ('instructions', 0.035), ('madani', 0.035), ('accelerate', 0.034), ('query', 0.034), ('auto', 0.033), ('das', 0.033), ('emulate', 0.033), ('godbole', 0.033), ('hockey', 0.033), ('landis', 0.033), ('ora', 0.033), ('raghavan', 0.033), ('submit', 0.033), ('trec', 0.033), ('topic', 0.033), ('gain', 0.033), ('system', 0.032), ('categories', 0.031), ('uncertain', 0.031), ('prior', 0.03), ('vs', 0.029), ('traditional', 0.029), ('queried', 0.029), ('train', 0.028), ('agreement', 0.028), ('fs', 0.028), ('interactive', 0.028), ('annotating', 0.028), ('button', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000013 12 jmlr-2006-Active Learning with Feedback on Features and Instances
Author: Hema Raghavan, Omid Madani, Rosie Jones
Abstract: We extend the traditional active learning framework to include feedback on features in addition to labeling instances, and we execute a careful study of the effects of feature selection and human feedback on features in the setting of text categorization. Our experiments on a variety of categorization tasks indicate that there is significant potential in improving classifier performance by feature re-weighting, beyond that achieved via membership queries alone (traditional active learning) if we have access to an oracle that can point to the important (most predictive) features. Our experiments on human subjects indicate that human feedback on feature relevance can identify a sufficient proportion of the most relevant features (over 50% in our experiments). We find that on average, labeling a feature takes much less time than labeling a document. We devise an algorithm that interleaves labeling features and documents which significantly accelerates standard active learning in our simulation experiments. Feature feedback can complement traditional active learning in applications such as news filtering, e-mail classification, and personalization, where the human teacher can have significant knowledge on the relevance of features. Keywords: active learning, feature selection, relevance feedback, term feedback, text classification
2 0.096897945 91 jmlr-2006-The Interplay of Optimization and Machine Learning Research (Special Topic on Machine Learning and Optimization)
Author: Kristin P. Bennett, Emilio Parrado-Hernández
Abstract: The fields of machine learning and mathematical programming are increasingly intertwined. Optimization problems lie at the heart of most machine learning approaches. The Special Topic on Machine Learning and Large Scale Optimization examines this interplay. Machine learning researchers have embraced the advances in mathematical programming allowing new types of models to be pursued. The special topic includes models using quadratic, linear, second-order cone, semidefinite, and semi-infinite programs. We observe that the qualities of good optimization algorithms from the machine learning and optimization perspectives can be quite different. Mathematical programming puts a premium on accuracy, speed, and robustness. Since generalization is the bottom line in machine learning and training is normally done off-line, accuracy and small speed improvements are of little concern in machine learning. Machine learning prefers simpler algorithms that work in reasonable computational time for specific classes of problems. Reducing machine learning problems to well-explored mathematical programming classes with robust general purpose optimization codes allows machine learning researchers to rapidly develop new techniques. In turn, machine learning presents new challenges to mathematical programming. The special issue include papers from two primary themes: novel machine learning models and novel optimization approaches for existing models. Many papers blend both themes, making small changes in the underlying core mathematical program that enable the develop of effective new algorithms. Keywords: machine learning, mathematical programming, convex optimization
Author: Juho Rousu, Craig Saunders, Sandor Szedmak, John Shawe-Taylor
Abstract: We present a kernel-based algorithm for hierarchical text classification where the documents are allowed to belong to more than one category at a time. The classification model is a variant of the Maximum Margin Markov Network framework, where the classification hierarchy is represented as a Markov tree equipped with an exponential family defined on the edges. We present an efficient optimization algorithm based on incremental conditional gradient ascent in single-example subspaces spanned by the marginal dual variables. The optimization is facilitated with a dynamic programming based algorithm that computes best update directions in the feasible set. Experiments show that the algorithm can feasibly optimize training sets of thousands of examples and classification hierarchies consisting of hundreds of nodes. Training of the full hierarchical model is as efficient as training independent SVM-light classifiers for each node. The algorithm’s predictive accuracy was found to be competitive with other recently introduced hierarchical multicategory or multilabel classification learning algorithms. Keywords: kernel methods, hierarchical classification, text categorization, convex optimization, structured outputs
4 0.078009777 60 jmlr-2006-Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples
Author: Mikhail Belkin, Partha Niyogi, Vikas Sindhwani
Abstract: We propose a family of learning algorithms based on a new form of regularization that allows us to exploit the geometry of the marginal distribution. We focus on a semi-supervised framework that incorporates labeled and unlabeled data in a general-purpose learner. Some transductive graph learning algorithms and standard methods including support vector machines and regularized least squares can be obtained as special cases. We use properties of reproducing kernel Hilbert spaces to prove new Representer theorems that provide theoretical basis for the algorithms. As a result (in contrast to purely graph-based approaches) we obtain a natural out-of-sample extension to novel examples and so are able to handle both transductive and truly semi-supervised settings. We present experimental evidence suggesting that our semi-supervised algorithms are able to use unlabeled data effectively. Finally we have a brief discussion of unsupervised and fully supervised learning within our general framework. Keywords: semi-supervised learning, graph transduction, regularization, kernel methods, manifold learning, spectral graph theory, unlabeled data, support vector machines
5 0.074884325 11 jmlr-2006-Active Learning in Approximately Linear Regression Based on Conditional Expectation of Generalization Error
Author: Masashi Sugiyama
Abstract: The goal of active learning is to determine the locations of training input points so that the generalization error is minimized. We discuss the problem of active learning in linear regression scenarios. Traditional active learning methods using least-squares learning often assume that the model used for learning is correctly specified. In many practical situations, however, this assumption may not be fulfilled. Recently, active learning methods using “importance”-weighted least-squares learning have been proposed, which are shown to be robust against misspecification of models. In this paper, we propose a new active learning method also using the weighted least-squares learning, which we call ALICE (Active Learning using the Importance-weighted least-squares learning based on Conditional Expectation of the generalization error). An important difference from existing methods is that we predict the conditional expectation of the generalization error given training input points, while existing methods predict the full expectation of the generalization error. Due to this difference, the training input design can be fine-tuned depending on the realization of training input points. Theoretically, we prove that the proposed active learning criterion is a more accurate predictor of the single-trial generalization error than the existing criterion. Numerical studies with toy and benchmark data sets show that the proposed method compares favorably to existing methods. Keywords: Active Learning, Conditional Expectation of Generalization Error, Misspecification of Models, Importance-Weighted Least-Squares Learning, Covariate Shift.
7 0.067318663 88 jmlr-2006-Streamwise Feature Selection
8 0.062979624 37 jmlr-2006-Incremental Algorithms for Hierarchical Classification
10 0.051739488 44 jmlr-2006-Large Scale Transductive SVMs
11 0.050548404 70 jmlr-2006-Online Passive-Aggressive Algorithms
12 0.047223996 86 jmlr-2006-Step Size Adaptation in Reproducing Kernel Hilbert Space
14 0.045097224 64 jmlr-2006-Noisy-OR Component Analysis and its Application to Link Analysis
15 0.043942593 96 jmlr-2006-Worst-Case Analysis of Selective Sampling for Linear Classification
16 0.041578505 81 jmlr-2006-Some Discriminant-Based PAC Algorithms
17 0.038348384 85 jmlr-2006-Statistical Comparisons of Classifiers over Multiple Data Sets
18 0.035158642 48 jmlr-2006-Learning Minimum Volume Sets
19 0.034839805 53 jmlr-2006-Learning a Hidden Hypergraph
20 0.033677194 3 jmlr-2006-A Hierarchy of Support Vector Machines for Pattern Detection
topicId topicWeight
[(0, 0.211), (1, -0.039), (2, 0.112), (3, 0.11), (4, 0.009), (5, 0.004), (6, -0.022), (7, -0.144), (8, -0.04), (9, 0.021), (10, -0.076), (11, 0.18), (12, -0.04), (13, 0.031), (14, -0.082), (15, 0.056), (16, -0.078), (17, 0.067), (18, 0.196), (19, -0.048), (20, -0.085), (21, -0.209), (22, 0.11), (23, 0.044), (24, -0.218), (25, -0.149), (26, 0.265), (27, 0.079), (28, -0.012), (29, -0.082), (30, 0.044), (31, 0.047), (32, -0.073), (33, 0.046), (34, -0.096), (35, -0.037), (36, 0.065), (37, 0.109), (38, 0.056), (39, -0.055), (40, 0.081), (41, -0.091), (42, 0.05), (43, 0.049), (44, 0.068), (45, -0.072), (46, 0.028), (47, 0.055), (48, 0.005), (49, -0.152)]
simIndex simValue paperId paperTitle
same-paper 1 0.97437 12 jmlr-2006-Active Learning with Feedback on Features and Instances
Author: Hema Raghavan, Omid Madani, Rosie Jones
Abstract: We extend the traditional active learning framework to include feedback on features in addition to labeling instances, and we execute a careful study of the effects of feature selection and human feedback on features in the setting of text categorization. Our experiments on a variety of categorization tasks indicate that there is significant potential in improving classifier performance by feature re-weighting, beyond that achieved via membership queries alone (traditional active learning) if we have access to an oracle that can point to the important (most predictive) features. Our experiments on human subjects indicate that human feedback on feature relevance can identify a sufficient proportion of the most relevant features (over 50% in our experiments). We find that on average, labeling a feature takes much less time than labeling a document. We devise an algorithm that interleaves labeling features and documents which significantly accelerates standard active learning in our simulation experiments. Feature feedback can complement traditional active learning in applications such as news filtering, e-mail classification, and personalization, where the human teacher can have significant knowledge on the relevance of features. Keywords: active learning, feature selection, relevance feedback, term feedback, text classification
2 0.62728328 88 jmlr-2006-Streamwise Feature Selection
Author: Jing Zhou, Dean P. Foster, Robert A. Stine, Lyle H. Ungar
Abstract: In streamwise feature selection, new features are sequentially considered for addition to a predictive model. When the space of potential features is large, streamwise feature selection offers many advantages over traditional feature selection methods, which assume that all features are known in advance. Features can be generated dynamically, focusing the search for new features on promising subspaces, and overfitting can be controlled by dynamically adjusting the threshold for adding features to the model. In contrast to traditional forward feature selection algorithms such as stepwise regression in which at each step all possible features are evaluated and the best one is selected, streamwise feature selection only evaluates each feature once when it is generated. We describe information-investing and α-investing, two adaptive complexity penalty methods for streamwise feature selection which dynamically adjust the threshold on the error reduction required for adding a new feature. These two methods give false discovery rate style guarantees against overfitting. They differ from standard penalty methods such as AIC, BIC and RIC, which always drastically over- or under-fit in the limit of infinite numbers of non-predictive features. Empirical results show that streamwise regression is competitive with (on small data sets) and superior to (on large data sets) much more compute-intensive feature selection methods such as stepwise regression, and allows feature selection on problems with millions of potential features. Keywords: classification, stepwise regression, multiple regression, feature selection, false discovery rate
Author: Masashi Sugiyama
Abstract: The goal of active learning is to determine the locations of training input points so that the generalization error is minimized. We discuss the problem of active learning in linear regression scenarios. Traditional active learning methods using least-squares learning often assume that the model used for learning is correctly specified. In many practical situations, however, this assumption may not be fulfilled. Recently, active learning methods using “importance”-weighted least-squares learning have been proposed, which are shown to be robust against misspecification of models. In this paper, we propose a new active learning method also using the weighted least-squares learning, which we call ALICE (Active Learning using the Importance-weighted least-squares learning based on Conditional Expectation of the generalization error). An important difference from existing methods is that we predict the conditional expectation of the generalization error given training input points, while existing methods predict the full expectation of the generalization error. Due to this difference, the training input design can be fine-tuned depending on the realization of training input points. Theoretically, we prove that the proposed active learning criterion is a more accurate predictor of the single-trial generalization error than the existing criterion. Numerical studies with toy and benchmark data sets show that the proposed method compares favorably to existing methods. Keywords: Active Learning, Conditional Expectation of Generalization Error, Misspecification of Models, Importance-Weighted Least-Squares Learning, Covariate Shift.
Author: Katya Scheinberg
Abstract: We propose an active set algorithm to solve the convex quadratic programming (QP) problem which is the core of the support vector machine (SVM) training. The underlying method is not new and is based on the extensive practice of the Simplex method and its variants for convex quadratic problems. However, its application to large-scale SVM problems is new. Until recently the traditional active set methods were considered impractical for large SVM problems. By adapting the methods to the special structure of SVM problems we were able to produce an efficient implementation. We conduct an extensive study of the behavior of our method and its variations on SVM problems. We present computational results comparing our method with Joachims’ SVM light (see Joachims, 1999). The results show that our method has overall better performance on many SVM problems. It seems to have a particularly strong advantage on more difficult problems. In addition this algorithm has better theoretical properties and it naturally extends to the incremental mode. Since the proposed method solves the standard SVM formulation, as does SVMlight , the generalization properties of these two approaches are identical and we do not discuss them in the paper. Keywords: active set methods, support vector machines, quadratic programming
Author: Juho Rousu, Craig Saunders, Sandor Szedmak, John Shawe-Taylor
Abstract: We present a kernel-based algorithm for hierarchical text classification where the documents are allowed to belong to more than one category at a time. The classification model is a variant of the Maximum Margin Markov Network framework, where the classification hierarchy is represented as a Markov tree equipped with an exponential family defined on the edges. We present an efficient optimization algorithm based on incremental conditional gradient ascent in single-example subspaces spanned by the marginal dual variables. The optimization is facilitated with a dynamic programming based algorithm that computes best update directions in the feasible set. Experiments show that the algorithm can feasibly optimize training sets of thousands of examples and classification hierarchies consisting of hundreds of nodes. Training of the full hierarchical model is as efficient as training independent SVM-light classifiers for each node. The algorithm’s predictive accuracy was found to be competitive with other recently introduced hierarchical multicategory or multilabel classification learning algorithms. Keywords: kernel methods, hierarchical classification, text categorization, convex optimization, structured outputs
8 0.27789068 37 jmlr-2006-Incremental Algorithms for Hierarchical Classification
9 0.27575299 60 jmlr-2006-Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples
10 0.25272086 44 jmlr-2006-Large Scale Transductive SVMs
11 0.24579354 64 jmlr-2006-Noisy-OR Component Analysis and its Application to Link Analysis
12 0.24507259 48 jmlr-2006-Learning Minimum Volume Sets
13 0.20981196 86 jmlr-2006-Step Size Adaptation in Reproducing Kernel Hilbert Space
14 0.1947366 85 jmlr-2006-Statistical Comparisons of Classifiers over Multiple Data Sets
15 0.18904515 81 jmlr-2006-Some Discriminant-Based PAC Algorithms
16 0.18495718 42 jmlr-2006-Kernels on Prolog Proof Trees: Statistical Learning in the ILP Setting (Special Topic on Inductive Programming)
17 0.17645502 94 jmlr-2006-Using Machine Learning to Guide Architecture Simulation
18 0.17645235 59 jmlr-2006-Machine Learning for Computer Security (Special Topic on Machine Learning for Computer Security)
19 0.17609571 22 jmlr-2006-Considering Cost Asymmetry in Learning Classifiers
20 0.17120036 70 jmlr-2006-Online Passive-Aggressive Algorithms
topicId topicWeight
[(36, 0.061), (50, 0.021), (63, 0.028), (68, 0.015), (70, 0.011), (76, 0.015), (78, 0.014), (81, 0.018), (84, 0.012), (90, 0.019), (91, 0.022), (96, 0.661)]
simIndex simValue paperId paperTitle
same-paper 1 0.98580158 12 jmlr-2006-Active Learning with Feedback on Features and Instances
Author: Hema Raghavan, Omid Madani, Rosie Jones
Abstract: We extend the traditional active learning framework to include feedback on features in addition to labeling instances, and we execute a careful study of the effects of feature selection and human feedback on features in the setting of text categorization. Our experiments on a variety of categorization tasks indicate that there is significant potential in improving classifier performance by feature re-weighting, beyond that achieved via membership queries alone (traditional active learning) if we have access to an oracle that can point to the important (most predictive) features. Our experiments on human subjects indicate that human feedback on feature relevance can identify a sufficient proportion of the most relevant features (over 50% in our experiments). We find that on average, labeling a feature takes much less time than labeling a document. We devise an algorithm that interleaves labeling features and documents which significantly accelerates standard active learning in our simulation experiments. Feature feedback can complement traditional active learning in applications such as news filtering, e-mail classification, and personalization, where the human teacher can have significant knowledge on the relevance of features. Keywords: active learning, feature selection, relevance feedback, term feedback, text classification
Author: Kristin P. Bennett, Emilio Parrado-Hernández
Abstract: The fields of machine learning and mathematical programming are increasingly intertwined. Optimization problems lie at the heart of most machine learning approaches. The Special Topic on Machine Learning and Large Scale Optimization examines this interplay. Machine learning researchers have embraced the advances in mathematical programming allowing new types of models to be pursued. The special topic includes models using quadratic, linear, second-order cone, semidefinite, and semi-infinite programs. We observe that the qualities of good optimization algorithms from the machine learning and optimization perspectives can be quite different. Mathematical programming puts a premium on accuracy, speed, and robustness. Since generalization is the bottom line in machine learning and training is normally done off-line, accuracy and small speed improvements are of little concern in machine learning. Machine learning prefers simpler algorithms that work in reasonable computational time for specific classes of problems. Reducing machine learning problems to well-explored mathematical programming classes with robust general purpose optimization codes allows machine learning researchers to rapidly develop new techniques. In turn, machine learning presents new challenges to mathematical programming. The special issue include papers from two primary themes: novel machine learning models and novel optimization approaches for existing models. Many papers blend both themes, making small changes in the underlying core mathematical program that enable the develop of effective new algorithms. Keywords: machine learning, mathematical programming, convex optimization
3 0.96108186 13 jmlr-2006-Adaptive Prototype Learning Algorithms: Theoretical and Experimental Studies
Author: Fu Chang, Chin-Chin Lin, Chi-Jen Lu
Abstract: In this paper, we propose a number of adaptive prototype learning (APL) algorithms. They employ the same algorithmic scheme to determine the number and location of prototypes, but differ in the use of samples or the weighted averages of samples as prototypes, and also in the assumption of distance measures. To understand these algorithms from a theoretical viewpoint, we address their convergence properties, as well as their consistency under certain conditions. We also present a soft version of APL, in which a non-zero training error is allowed in order to enhance the generalization power of the resultant classifier. Applying the proposed algorithms to twelve UCI benchmark data sets, we demonstrate that they outperform many instance-based learning algorithms, the k-nearest neighbor rule, and support vector machines in terms of average test accuracy. Keywords: adaptive prototype learning, cluster-based prototypes, consistency, instance-based prototype, pattern classification 1
4 0.70359892 78 jmlr-2006-Rearrangement Clustering: Pitfalls, Remedies, and Applications
Author: Sharlee Climer, Weixiong Zhang
Abstract: Given a matrix of values in which the rows correspond to objects and the columns correspond to features of the objects, rearrangement clustering is the problem of rearranging the rows of the matrix such that the sum of the similarities between adjacent rows is maximized. Referred to by various names and reinvented several times, this clustering technique has been extensively used in many fields over the last three decades. In this paper, we point out two critical pitfalls that have been previously overlooked. The first pitfall is deleterious when rearrangement clustering is applied to objects that form natural clusters. The second concerns a similarity metric that is commonly used. We present an algorithm that overcomes these pitfalls. This algorithm is based on a variation of the Traveling Salesman Problem. It offers an extra benefit as it automatically determines cluster boundaries. Using this algorithm, we optimally solve four benchmark problems and a 2,467-gene expression data clustering problem. As expected, our new algorithm identifies better clusters than those found by previous approaches in all five cases. Overall, our results demonstrate the benefits of rectifying the pitfalls and exemplify the usefulness of this clustering technique. Our code is available at our websites. Keywords: clustering, visualization of patterns in data, bond energy algorithm, traveling salesman problem, asymmetric clustering
5 0.70349592 44 jmlr-2006-Large Scale Transductive SVMs
Author: Ronan Collobert, Fabian Sinz, Jason Weston, Léon Bottou
Abstract: We show how the concave-convex procedure can be applied to transductive SVMs, which traditionally require solving a combinatorial search problem. This provides for the first time a highly scalable algorithm in the nonlinear case. Detailed experiments verify the utility of our approach. Software is available at http://www.kyb.tuebingen.mpg.de/bs/people/fabee/transduction. html. Keywords: transduction, transductive SVMs, semi-supervised learning, CCCP
6 0.70050263 88 jmlr-2006-Streamwise Feature Selection
8 0.69863307 27 jmlr-2006-Ensemble Pruning Via Semi-definite Programming (Special Topic on Machine Learning and Optimization)
9 0.69261199 94 jmlr-2006-Using Machine Learning to Guide Architecture Simulation
10 0.68858194 14 jmlr-2006-An Efficient Implementation of an Active Set Method for SVMs (Special Topic on Machine Learning and Optimization)
14 0.62678623 11 jmlr-2006-Active Learning in Approximately Linear Regression Based on Conditional Expectation of Generalization Error
15 0.6229701 61 jmlr-2006-Maximum-Gain Working Set Selection for SVMs (Special Topic on Machine Learning and Optimization)
16 0.6154381 30 jmlr-2006-Evolutionary Function Approximation for Reinforcement Learning
17 0.6140272 52 jmlr-2006-Learning Spectral Clustering, With Application To Speech Separation
18 0.61105329 1 jmlr-2006-A Direct Method for Building Sparse Kernel Learning Algorithms
20 0.59909976 60 jmlr-2006-Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples