jmlr jmlr2007 jmlr2007-29 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: J. Zico Kolter, Marcus A. Maloof
Abstract: We present an ensemble method for concept drift that dynamically creates and removes weighted experts in response to changes in performance. The method, dynamic weighted majority (DWM), uses four mechanisms to cope with concept drift: It trains online learners of the ensemble, it weights those learners based on their performance, it removes them, also based on their performance, and it adds new experts based on the global performance of the ensemble. After an extensive evaluation— consisting of five experiments, eight learners, and thirty data sets that varied in type of target concept, size, presence of noise, and the like—we concluded that DWM outperformed other learners that only incrementally learn concept descriptions, that maintain and use previously encountered examples, and that employ an unweighted, fixed-size ensemble of experts. Keywords: concept learning, online learning, ensemble methods, concept drift
Reference: text
sentIndex sentText sentNum sentScore
1 Department of Computer Science, Georgetown University, Washington, DC 20057-1232, USA. Editor: Dale Schuurmans. Abstract: We present an ensemble method for concept drift that dynamically creates and removes weighted experts in response to changes in performance. [sent-7, score-0.663]
2 Keywords: concept learning, online learning, ensemble methods, concept drift. [sent-10, score-0.582]
3 Introduction. In this paper, we describe an ensemble method designed expressly for tracking concept drift (Kolter and Maloof, 2003). [sent-11, score-0.491]
4 Ours is an extension of the weighted majority algorithm (Littlestone and Warmuth, 1994), which also tracks drifting concepts (Blum, 1997), but our algorithm, dynamic weighted majority (DWM), adds and removes base learners or experts in response to global and local performance. [sent-12, score-0.62]
5 Informally, concept drift occurs when a set of examples has legitimate class labels at one time and has different legitimate labels at another time. [sent-14, score-0.318]
6 Based on “Dynamic Weighted Majority: A new ensemble method for tracking concept drift”, by Jeremy Z. Kolter and Marcus A. Maloof. [sent-17, score-0.324]
7 Indeed, tracking concept drift is important for any application involving models of human behavior. [sent-27, score-0.351]
8 Overall, results suggest that a weighted ensemble of incremental learners tracks drifting concepts better than learners that simply modify concept descriptions, that store and learn from examples encountered previously, and that use an unweighted ensemble of experts. [sent-36, score-0.801]
9 In the next section, we survey work on the problem of concept drift, on ensemble methods, and at the intersection: work on ensemble methods for concept drift. [sent-42, score-0.528]
10 In Section 3, we describe dynamic weighted majority, an ensemble method for tracking concept drift. [sent-43, score-0.404]
11 Background and Related Work. Dynamic weighted majority is an ensemble method designed expressly for concept drift. [sent-47, score-0.342]
12 For many years, research on ensemble methods and research on methods for concept drift have intermingled little. [sent-48, score-0.431]
13 However, within the last few years, researchers have proposed several ensemble methods for tracking concept drift. [sent-49, score-0.324]
14 In the next three sections, we survey relevant work on the problem of tracking concept drift, on ensemble methods, and on ensemble methods for tracking concept drift. [sent-50, score-0.648]
15 2.1 Concept Drift. Concept drift (Schlimmer and Granger, 1986; Maloof, 2005) is an online learning task in which concepts change or drift over time. [sent-52, score-0.454]
16 In concrete terms, concept drift occurs when the class labels of a set of examples change over time. [sent-55, score-0.318]
17 It uses a frame representation and copes with such drift by either adjusting current concept descriptions or creating a new version of these descriptions. [sent-82, score-0.34]
18 Although not expressly designed for concept drift, the MetaL(B) and MetaL(IB) systems (Widmer, 1997) use meta-learning with naive Bayes and instance-based learning, respectively, to cope with reoccurring contexts. [sent-97, score-0.264]
19 The base learner then uses the predictive features of the contextually relevant examples in the current window to form new concept descriptions. [sent-103, score-0.343]
20 By adding a contextual variable to the STAGGER concepts (Schlimmer and Granger, 1986), experimental results using predictive accuracy suggest that the meta-learners were able to use contextual features to acquire target concepts better and more quickly than did the base learners alone. [sent-104, score-0.456]
21 When compared to FLORA4 (Widmer and Kubat, 1996), which retrieves previously learned concept descriptions when contexts reoccur, MetaL(B) often performed better in terms of slope and asymptote, even though it relearned new concept descriptions rather than retrieving and modifying old ones. [sent-105, score-0.368]
22 Lane and Brodley (1998) investigated methods for concept drift in one-class learning problems (i. [sent-106, score-0.291]
23 For static concepts, the time stamp is an irrelevant attribute, but if drift occurs, then the time stamp will appear in induced trees. [sent-118, score-0.241]
24 Moreover, one can use its position in the tree as an indicator of how much drift may have occurred—the more relevant the time stamp, the higher it will appear in the tree, and the more drift that has occurred. [sent-119, score-0.366]
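A minimal sketch of this time-stamp heuristic (not the CD3 algorithm itself), assuming a scikit-learn decision tree and a synthetic stream whose concept changes halfway through: append a time stamp to each example, induce a tree, and report how close to the root the time stamp is used.

```python
# Sketch: add a time-stamp attribute, induce a tree, and inspect where the
# time stamp is used. A shallow split on it suggests drift; its absence (or a
# deep position) suggests a static concept. The data below are synthetic.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 2000
X = rng.random((n, 3))                                  # three ordinary attributes
t = (np.arange(n) / n).reshape(-1, 1)                   # normalized time stamp
y = (X[:, 0] > 0.5).astype(int)
y[n // 2:] = (X[n // 2:, 1] > 0.5).astype(int)          # concept changes halfway through

clf = DecisionTreeClassifier(max_depth=4, random_state=0)
clf.fit(np.hstack([X, t]), y)

TIME_STAMP = 3                                          # column index of the time stamp
depths = {0: 0}                                         # node id -> depth (root is node 0)
for node in range(clf.tree_.node_count):
    left, right = clf.tree_.children_left[node], clf.tree_.children_right[node]
    if left != -1:                                      # internal node: record child depths
        depths[left] = depths[right] = depths[node] + 1

ts_depths = [depths[i] for i in range(clf.tree_.node_count)
             if clf.tree_.feature[i] == TIME_STAMP]
print("shallowest split on the time stamp:", min(ts_depths) if ts_depths else "not used")
```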
25 In subsequent work, Black and Hickey (2002) applied CD3 to twenty-seven months of customer-call data and were able to detect concept drift based on the number and height of time stamps in the induced decision trees. [sent-128, score-0.312]
26 The drops in predictive accuracy could have been due to sampling, and in general, it may be difficult to determine if a learner is coping with concept drift or is simply acquiring more information about a static concept. [sent-137, score-0.516]
27 This type of concept drift has been called virtual concept drift, as opposed to real concept drift (Klinkenberg and Joachims, 2000). [sent-138, score-0.706]
28 (1999) define population drift as being “related to” concept drift, noting that the term concept drift has no single meaning. [sent-143, score-0.582]
29 The term concept drift has been applied to different phenomena, such as drops and recoveries in performance during online learning (Syed et al. [sent-146, score-0.318]
30 However, in concrete terms, the term has historically meant that the class labels of instances change over time, a meaning established in the first paper on concept drift (Schlimmer and Granger, 1986). [sent-148, score-0.291]
31 However, the prior distribution may not change, and changes in the conditional distribution will not necessarily mean that concept drift has occurred. [sent-150, score-0.291]
32 Popular ensemble methods, such as bagging and boosting, are off-line algorithms, but researchers have developed online ensemble methods. [sent-214, score-0.328]
33 2.3 Ensemble Methods for Concept Drift. Ensemble methods for concept drift share many similarities with the online and off-line methods discussed in the previous section. [sent-226, score-0.318]
34 However, methods for concept drift must take into account the temporal nature of the data stream, for a set of examples may have certain class labels at time t and others at a later time t′. [sent-227, score-0.318]
35 Clearly, ensemble methods for concept drift must process a stream of data. [sent-228, score-0.431]
36 To achieve these effects, one ensemble method for concept drift builds two ensembles and selects the best performing one for subsequent processing (Scholz and Klinkenberg, 2006). [sent-233, score-0.431]
37 Blum’s (1997) implementation of the weighted majority algorithm (Littlestone and Warmuth, 1994) uses as experts pairs of features coupled with a history of the most recent class labels from the training set appearing with those features. [sent-236, score-0.237]
38 The global algorithm predicts based on a weighted-majority vote of the expert predictions and decreases the weight of any expert that predicts incorrectly. [sent-238, score-0.253]
39 Blum (1997) also investigated triples of features as experts and a variant of winnow (Littlestone, 1988) that lets experts abstain if their features are not present in an instance. [sent-240, score-0.318]
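For reference, a minimal sketch of the weighted-majority prediction and update underlying these variants; the abstention case corresponds to the winnow-style variant just mentioned, and the penalty factor beta = 0.5 and the feature-pair expert names in the usage snippet are illustrative assumptions, not values taken from the text.

```python
# Minimal weighted-majority sketch (after Littlestone and Warmuth, 1994):
# predict by a weighted vote, then multiplicatively penalize mistaken experts.
def wm_predict(predictions, weights):
    """predictions: dict expert_id -> 0/1 label, or None if the expert abstains."""
    votes = {0: 0.0, 1: 0.0}
    for e, label in predictions.items():
        if label is not None:
            votes[label] += weights[e]
    return max(votes, key=votes.get)

def wm_update(predictions, weights, true_label, beta=0.5):
    """Halve (by default) the weight of every expert that voted and was wrong."""
    for e, label in predictions.items():
        if label is not None and label != true_label:
            weights[e] *= beta
    return weights

# Hypothetical feature-pair experts, as in Blum's construction:
weights = {"x1=1&x2=0": 1.0, "x1=0&x3=1": 1.0}
preds = {"x1=1&x2=0": 1, "x1=0&x3=1": None}     # the second expert abstains
y_hat = wm_predict(preds, weights)
weights = wm_update(preds, weights, true_label=0)
```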
40 Herbster and Warmuth (1998) consider a setting in which an online learner is trained over several concepts and has access to n experts (which are fixed prediction strategies). [sent-253, score-0.346]
41 They also describe two pruning methods for limiting the number of experts maintained; one is useful in practice, the other gives formal guarantees on the number of experts that AddExp will create. [sent-268, score-0.318]
42 DWM: An Ensemble Method for Concept Drift. Dynamic weighted majority maintains a weighted pool of experts, or base learners. [sent-270, score-0.331]
43 DWM then trains the experts in the ensemble on the new example (line 23). [sent-298, score-0.317]
44 As mentioned previously, if the global prediction is incorrect (line 16), DWM adds a new expert to the ensemble with a weight of one (lines 17–19). [sent-345, score-0.237]
45 Finally, after using the new example to train each learner in the ensemble (lines 22 and 23), DWM outputs the global prediction, which is the weighted vote of the expert predictions (line 24). [sent-346, score-0.338]
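To make the procedure concrete, the sketch below implements the update loop as just described: each expert predicts, a mistaken expert has its weight multiplied by beta (every p steps), the global prediction is the weighted vote, low-weight experts are removed and a new expert is added after a global mistake, and every expert is then trained on the example. The default parameter values and the toy MajorityClassExpert base learner are illustrative placeholders; the paper's variants use incremental naive Bayes (DWM-NB) or ITI (DWM-ITI) as experts.

```python
import numpy as np

class MajorityClassExpert:
    """Trivial online base learner: predicts the class it has seen most often."""
    def __init__(self, n_classes):
        self.counts = np.zeros(n_classes)
    def predict(self, x):
        return int(np.argmax(self.counts))      # defaults to class 0 when untrained
    def train(self, x, y):
        self.counts[y] += 1

class DWM:
    def __init__(self, n_classes, beta=0.5, theta=0.01, p=1):
        self.n_classes, self.beta, self.theta, self.p = n_classes, beta, theta, p
        self.experts = [MajorityClassExpert(n_classes)]
        self.weights = [1.0]
        self.t = 0

    def update(self, x, y):
        """Process one labeled example; return the global prediction made before training."""
        self.t += 1
        votes = np.zeros(self.n_classes)
        for i, e in enumerate(self.experts):
            pred = e.predict(x)
            if pred != y and self.t % self.p == 0:
                self.weights[i] *= self.beta                # penalize the mistaken expert
            votes[pred] += self.weights[i]
        global_pred = int(np.argmax(votes))                 # weighted-majority vote

        if self.t % self.p == 0:
            top = max(self.weights)
            self.weights = [w / top for w in self.weights]  # normalize weights
            kept = [i for i, w in enumerate(self.weights) if w >= self.theta]
            self.experts = [self.experts[i] for i in kept]  # drop low-weight experts
            self.weights = [self.weights[i] for i in kept]
            if global_pred != y:                            # global mistake: add an expert
                self.experts.append(MajorityClassExpert(self.n_classes))
                self.weights.append(1.0)

        for e in self.experts:                              # train every expert online
            e.train(x, y)
        return global_pred
```

Substituting an incremental naive Bayes or ITI learner for the placeholder expert recovers the DWM-NB and DWM-ITI variants evaluated in the paper.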
46 The first, most extensive evaluation involved the STAGGER concepts (Schlimmer and Granger, 1986), a standard benchmark for evaluating how learners cope with drifting concepts. [sent-382, score-0.226]
47 In an effort to determine how our method scales to larger problems involving concept drift, our second evaluation consisted of testing DWM-NB using the SEA concepts (Street and Kim, 2001), a problem recently proposed in the data mining community. [sent-389, score-0.234]
48 For the sake of comparison, in addition to these algorithms, we also evaluated naive Bayes, ITI, naive Bayes with perfect forgetting, and ITI with perfect forgetting. [sent-438, score-0.28]
49 The “standard” or “traditional” implementations of naive Bayes and ITI provided a worst-case evaluation, since these systems have not been designed to cope with concept drift and learn from all examples in the stream regardless of changes to the target concept. [sent-439, score-0.489]
50 The implementations with perfect forgetting, which is the same as training the methods on each target concept individually, provided a best-case evaluation, since the systems were never burdened with examples or concept descriptions from previous target concepts. [sent-440, score-0.406]
51 As expected, naive Bayes with perfect forgetting performed the best on all three concepts, while naive Bayes without forgetting performed the worst. [sent-442, score-0.34]
52 Figure 4 (y-axis: expert count; x-axis: time step t): Number of experts maintained with 95% confidence intervals for DWM-NB and DWM-ITI on the STAGGER concepts (Kolter and Maloof, 2003). [sent-449, score-0.28]
53 DWM-ITI converged more quickly than did DWM-NB to the second and third target concepts, but if we compare the plots for naive Bayes and ITI with perfect forgetting, we see that ITI converged more quickly to these target concepts than did naive Bayes. [sent-454, score-0.415]
54 On average, DWM-ITI maintained fewer experts than did DWM-NB, and we attribute this to the fact that ITI performed better on the individual concepts than did naive Bayes. [sent-457, score-0.427]
55 AQ11, although not designed to cope with concept drift, outperformed DWM-ITI in terms of asymptote on the first concept and in terms of slope on the third, but on the second concept, performed significantly worse than did DWM-ITI. [sent-470, score-0.342]
56 With respect to complexity of the experts themselves, the STAGGER concepts consist of three attributes, each taking one of three possible values. [sent-477, score-0.252]
57 Therefore, this implementation of weighted majority maintained 27 experts throughout the presentation of examples, as compared to the maximum of six that DWM-NB maintained. [sent-478, score-0.265]
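One way to arrive at this count of 27, assuming each expert pairs a value of one attribute with a value of a different attribute:

```latex
\underbrace{\binom{3}{2}}_{\text{pairs of attributes}}
\times
\underbrace{3 \times 3}_{\text{value combinations per pair}}
= 3 \times 9
= 27 \text{ experts.}
```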
58 Researchers have built several systems for coping with concept drift and have evaluated many of them on the STAGGER concepts. [sent-482, score-0.316]
59 In this experiment, when the concept changed from the first to the second, the examples of the first concept remaining in this window prevented AQ-PM from adjusting quickly to the second. [sent-487, score-0.321]
60 The only difference between these two learners is that AQ11-PM maintains a fixed window of thirty examples that have appeared on the boundaries of concept descriptions. [sent-501, score-0.295]
61 Generally, acquiring the second concept after learning the first is the hardest task, as the second concept is almost a reversal of the first; the two concepts share only one positive example. [sent-511, score-0.37]
62 Acquiring the third concept after acquiring the second is easier because the two concepts share a greater number of positive examples. [sent-512, score-0.246]
63 A learning method, such as STAGGER, that only refines concept descriptions will have more difficulty responding to concept drift, as compared to an ensemble method, such as DWM, that both refines existing concept descriptions and creates new ones. [sent-514, score-0.628]
64 Recall that Blum’s weighted majority uses as experts pairs of features with a brief history of past predictions. [sent-517, score-0.237]
65 Assume a learner incrementally modifies its concept descriptions as new examples arrive. [sent-528, score-0.247]
66 Unfortunately, target concepts are not always disjoint, it is difficult to determine precisely when concepts change, and it is challenging to identify which concept descriptions (or parts of concept descriptions) apply to new target concepts. [sent-531, score-0.545]
67 On this problem, we evaluated DWM-NB, naive Bayes, and naive Bayes with perfect forgetting. [sent-558, score-0.26]
68 In the left graph of Figure 7, we see the predictive accuracies for DWM-NB, naive Bayes, and naive Bayes with perfect forgetting on the SEA concepts with 10% class noise. [sent-570, score-0.48]
69 As with the STAGGER concepts, naive Bayes performed the worst, since it had no direct method of removing outdated concept descriptions. [sent-571, score-0.244]
70 The number of experts was capped at five and at ten, and these capped versions were compared to DWM-NB with no limit on the number of experts created. [sent-576, score-0.318]
71 A large number of experts obviously impacts memory utilization, and, depending on the complexity of the base learners, could also affect learning and performance time, since DWM trains and queries each expert in the ensemble when a new example arrives. [sent-581, score-0.428]
72 That is, after reaching the point where there are k experts in the ensemble, when DWM adds a new expert, it removes the expert with the lowest weight. [sent-584, score-0.256]
73 In this version of DWM, we set the threshold for removing experts to zero, which guaranteed that experts were removed only if they were the weakest member of the ensemble of k experts. [sent-585, score-0.458]
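A short sketch of this eviction rule, written against the expert and weight lists from the DWM sketch above; the helper name is hypothetical.

```python
def add_expert_with_cap(experts, weights, new_expert, k):
    """Add new_expert with weight 1.0; if the ensemble already holds k experts,
    first evict the one with the lowest weight (the removal threshold theta is
    assumed to be zero, so eviction is the only removal mechanism)."""
    if len(experts) >= k:
        worst = min(range(len(weights)), key=weights.__getitem__)
        del experts[worst]
        del weights[worst]
    experts.append(new_expert)
    weights.append(1.0)
```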
74 Indeed, DWM-NB with five experts performed almost as well as the original algorithm that placed no limit on the number of experts created. [sent-589, score-0.318]
75 If the new classifier improves the global performance of the ensemble, then it is added, provided the ensemble does not contain a maximum number of classifiers; otherwise, SEA replaces a poorly performing classifier in the ensemble with the new classifier. [sent-595, score-0.28]
76 However, if every classifier in the ensemble has been trained on a given target concept, and the concept changes to one that is disjoint, then SEA must replace at least half of the classifiers in the ensemble before accuracy on the new target concept will surpass that on the old. [sent-596, score-0.621]
77 For instance, if the ensemble consists of 20 classifiers, and each learns from a fixed set of 500 examples, then it would take at least 5,000 additional training examples before the ensemble contained a majority of classifiers trained on the new concept. [sent-597, score-0.35]
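Spelling out the arithmetic behind this lower bound (assuming, as stated, that each replacement classifier needs its full batch of 500 examples):

```latex
\underbrace{\tfrac{20}{2}}_{\text{classifiers to replace}}
\times
\underbrace{500}_{\text{examples per classifier}}
= 10 \times 500
= 5{,}000 \text{ additional training examples.}
```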
78 Assume p = 500, the ensemble consists of 20 fully trained classifiers, all with a weight of one, and the new concept is disjoint from the previous one. [sent-599, score-0.264]
79 The original 20 experts will again misclassify the example, and the new expert will predict correctly. [sent-604, score-0.236]
80 Generally, ensemble methods with weighting mechanisms, like those present in DWM, will converge more quickly to target concepts (i. [sent-620, score-0.264]
81 Regarding the number of experts that DWM maintained, we used a simple heuristic that added a new expert whenever the global prediction was incorrect, which, intuitively, should be problematic for noisy domains. [sent-623, score-0.256]
82 However, on the SEA concepts, while DWM-NB maintained as many as 40 experts at, say, time step 37,500, it maintained only 22 experts on average over the 10 runs, which is similar to the 20–25 that SEA reportedly stored (Street and Kim, 2001). [sent-624, score-0.374]
83 If the number of experts were to reach impractical levels, then DWM could simply stop creating experts after obtaining acceptable accuracy; training would continue. [sent-625, score-0.318]
84 Since Blum’s implementation forms concept descriptions consisting of weighted pairs of attribute values, we reasoned that conditionally dependent attributes would have less effect on its performance than they would on naive Bayes’. [sent-695, score-0.379]
85 As a data set derived from a real-world phenomenon, we cannot know definitively if or when concept drift occurred. [sent-726, score-0.291]
86 Concluding Remarks. Tracking concept drift is important for many applications. [sent-755, score-0.291]
87 In this paper, we presented an ensemble method based on the weighted majority algorithm (Littlestone and Warmuth, 1994). [sent-764, score-0.218]
88 Our method, dynamic weighted majority, creates and removes base algorithms in response to changes in performance, which makes it well suited for problems involving concept drift. [sent-765, score-0.276]
89 On the problems we considered, a weighted ensemble of learners with mechanisms to add and remove experts in response to changes in performance provided a better response to concept drift than did other learners, especially those that relied on only incremental learning (i. [sent-768, score-0.76]
90 , FLORA2 and the AQ-PM systems), or on an ensemble of unweighted learners (i. [sent-772, score-0.235]
91 We anticipate that these investigations will lead to general, robust, and scalable ensemble methods for tracking concept drift. [sent-790, score-0.324]
92 Classification of customer call data in the presence of concept drift and noise. [sent-1091, score-0.291]
93 A case-based technique for tracking concept drift in spam filtering. [sent-1139, score-0.351]
94 Detecting concept drift in financial time series prediction using symbolic machine learning. [sent-1217, score-0.311]
95 Refined time stamps for concept drift detection during mining for classification rules. [sent-1250, score-0.329]
96 Concept versioning: A methodology for tracking evolutionary concept drift in dynamic concept systems. [sent-1288, score-0.52]
97 Dynamic weighted majority: A new ensemble method for tracking concept drift. [sent-1307, score-0.359]
98 Using additive expert ensembles to cope with concept drift. [sent-1315, score-0.221]
99 Approaches to online learning and concept drift for user identification in computer security. [sent-1330, score-0.318]
100 Learning in the presence of concept drift and hidden contexts. [sent-1539, score-0.291]
wordName wordTfidf (topN-words)
[('dwm', 0.727), ('aq', 0.227), ('iti', 0.188), ('nb', 0.187), ('kolter', 0.175), ('drift', 0.167), ('stagger', 0.167), ('maloof', 0.166), ('experts', 0.159), ('ensemble', 0.14), ('concept', 0.124), ('naive', 0.12), ('sea', 0.094), ('concepts', 0.093), ('michalski', 0.083), ('aloof', 0.077), ('expert', 0.077), ('learners', 0.073), ('ajority', 0.073), ('pm', 0.066), ('predictive', 0.065), ('schlimmer', 0.064), ('tracking', 0.06), ('blum', 0.058), ('bayes', 0.058), ('descriptions', 0.049), ('eighted', 0.047), ('learner', 0.047), ('window', 0.046), ('dynamic', 0.045), ('electricity', 0.043), ('flora', 0.043), ('granger', 0.043), ('majority', 0.043), ('drifting', 0.04), ('forgetting', 0.04), ('harries', 0.04), ('calendar', 0.036), ('widmer', 0.036), ('weighted', 0.035), ('wah', 0.034), ('incremental', 0.034), ('base', 0.034), ('klinkenberg', 0.032), ('tree', 0.032), ('accuracy', 0.031), ('target', 0.031), ('predicts', 0.03), ('acquiring', 0.029), ('cap', 0.029), ('hulten', 0.029), ('static', 0.028), ('mechanisms', 0.028), ('maintained', 0.028), ('examples', 0.027), ('outperformed', 0.027), ('attribute', 0.027), ('online', 0.027), ('gama', 0.026), ('kubat', 0.026), ('maintains', 0.025), ('vfdt', 0.025), ('asymptote', 0.025), ('coping', 0.025), ('scheduling', 0.024), ('littlestone', 0.024), ('batch', 0.024), ('attributes', 0.024), ('warmuth', 0.023), ('stamp', 0.023), ('scholz', 0.023), ('accuracies', 0.022), ('slope', 0.022), ('asuncion', 0.022), ('hickey', 0.022), ('unweighted', 0.022), ('addexp', 0.021), ('maclin', 0.021), ('stamps', 0.021), ('vote', 0.021), ('period', 0.021), ('bagging', 0.021), ('day', 0.021), ('removes', 0.02), ('perfect', 0.02), ('prediction', 0.02), ('cope', 0.02), ('leaf', 0.02), ('arrives', 0.019), ('pricing', 0.019), ('street', 0.019), ('batches', 0.018), ('rules', 0.018), ('contextual', 0.018), ('trains', 0.018), ('predictions', 0.018), ('creates', 0.018), ('georgetown', 0.017), ('mining', 0.017), ('dence', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999952 29 jmlr-2007-Dynamic Weighted Majority: An Ensemble Method for Drifting Concepts
Author: J. Zico Kolter, Marcus A. Maloof
Abstract: We present an ensemble method for concept drift that dynamically creates and removes weighted experts in response to changes in performance. The method, dynamic weighted majority (DWM), uses four mechanisms to cope with concept drift: It trains online learners of the ensemble, it weights those learners based on their performance, it removes them, also based on their performance, and it adds new experts based on the global performance of the ensemble. After an extensive evaluation— consisting of five experiments, eight learners, and thirty data sets that varied in type of target concept, size, presence of noise, and the like—we concluded that DWM outperformed other learners that only incrementally learn concept descriptions, that maintain and use previously encountered examples, and that employ an unweighted, fixed-size ensemble of experts. Keywords: concept learning, online learning, ensemble methods, concept drift
2 0.18050787 48 jmlr-2007-Learning in Environments with Unknown Dynamics: Towards more Robust Concept Learners
Author: Marlon Núñez, Raúl Fidalgo, Rafael Morales
Abstract: In the process of concept learning, target concepts may have portions with short-term changes, other portions may support long-term changes, and yet others may not change at all. For this reason several local windows need to be handled. We suggest facing this problem, which naturally exists in the field of concept learning, by allocating windows which can adapt their size to portions of the target concept. We propose an incremental decision tree that is updated with incoming examples. Each leaf of the decision tree holds a time window and a local performance measure as the main parameter to be controlled. When the performance of a leaf decreases, the size of its local window is reduced. This learning algorithm, called OnlineTree2, automatically adjusts its internal parameters in order to face the current dynamics of the data stream. Results show that it is comparable to other batch algorithms when facing problems with no concept change, and it is better than evaluated methods in its ability to deal with concept drift when dealing with problems in which: concept change occurs at different speeds, noise may be present and, examples may arrive from different areas of the problem domain (virtual drift). Keywords: incremental algorithms, online learning, concept drift, decision trees, robust learners
3 0.064376481 22 jmlr-2007-Compression-Based Averaging of Selective Naive Bayes Classifiers (Special Topic on Model Selection)
Author: Marc Boullé
Abstract: The naive Bayes classifier has proved to be very effective on many real data applications. Its performance usually benefits from an accurate estimation of univariate conditional probabilities and from variable selection. However, although variable selection is a desirable feature, it is prone to overfitting. In this paper, we introduce a Bayesian regularization technique to select the most probable subset of variables compliant with the naive Bayes assumption. We also study the limits of Bayesian model averaging in the case of the naive Bayes assumption and introduce a new weighting scheme based on the ability of the models to conditionally compress the class labels. The weighting scheme on the models reduces to a weighting scheme on the variables, and finally results in a naive Bayes classifier with “soft variable selection”. Extensive experiments show that the compressionbased averaged classifier outperforms the Bayesian model averaging scheme. Keywords: naive Bayes, Bayesian, model selection, model averaging
4 0.063326374 59 jmlr-2007-Nonlinear Boosting Projections for Ensemble Construction
Author: Nicolás García-Pedrajas, César García-Osorio, Colin Fyfe
Abstract: In this paper we propose a novel approach for ensemble construction based on the use of nonlinear projections to achieve both accuracy and diversity of individual classifiers. The proposed approach combines the philosophy of boosting, putting more effort on difficult instances, with the basis of the random subspace method. Our main contribution is that instead of using a random subspace, we construct a projection taking into account the instances which have posed most difficulties to previous classifiers. In this way, consecutive nonlinear projections are created by a neural network trained using only incorrectly classified instances. The feature subspace induced by the hidden layer of this network is used as the input space to a new classifier. The method is compared with bagging and boosting techniques, showing an improved performance on a large set of 44 problems from the UCI Machine Learning Repository. An additional study showed that the proposed approach is less sensitive to noise in the data than boosting methods. Keywords: classifier ensembles, boosting, neural networks, nonlinear projections
5 0.045024756 11 jmlr-2007-Anytime Learning of Decision Trees
Author: Saher Esmeir, Shaul Markovitch
Abstract: The majority of existing algorithms for learning decision trees are greedy—a tree is induced topdown, making locally optimal decisions at each node. In most cases, however, the constructed tree is not globally optimal. Even the few non-greedy learners cannot learn good trees when the concept is difficult. Furthermore, they require a fixed amount of time and are not able to generate a better tree if additional time is available. We introduce a framework for anytime induction of decision trees that overcomes these problems by trading computation speed for better tree quality. Our proposed family of algorithms employs a novel strategy for evaluating candidate splits. A biased sampling of the space of consistent trees rooted at an attribute is used to estimate the size of the minimal tree under that attribute, and an attribute with the smallest expected tree is selected. We present two types of anytime induction algorithms: a contract algorithm that determines the sample size on the basis of a pre-given allocation of time, and an interruptible algorithm that starts with a greedy tree and continuously improves subtrees by additional sampling. Experimental results indicate that, for several hard concepts, our proposed approach exhibits good anytime behavior and yields significantly better decision trees when more time is available. Keywords: anytime algorithms, decision tree induction, lookahead, hard concepts, resource-bounded reasoning
6 0.041520823 88 jmlr-2007-Unlabeled Compression Schemes for Maximum Classes
7 0.036782585 79 jmlr-2007-Structure and Majority Classes in Decision Tree Learning
8 0.033763677 40 jmlr-2007-Harnessing the Expertise of 70,000 Human Editors: Knowledge-Based Feature Generation for Text Categorization
9 0.030604016 19 jmlr-2007-Classification in Networked Data: A Toolkit and a Univariate Case Study
10 0.027140401 4 jmlr-2007-A New Probabilistic Approach in Rank Regression with Optimal Bayesian Partitioning (Special Topic on Model Selection)
11 0.026223041 83 jmlr-2007-The On-Line Shortest Path Problem Under Partial Monitoring
12 0.025351135 23 jmlr-2007-Concave Learners for Rankboost
13 0.024166847 16 jmlr-2007-Boosted Classification Trees and Class Probability Quantile Estimation
14 0.024069835 14 jmlr-2007-Behavioral Shaping for Geometric Concepts
15 0.022661276 34 jmlr-2007-From External to Internal Regret (Special Topic on the Conference on Learning Theory 2005)
16 0.021632161 64 jmlr-2007-Online Learning of Multiple Tasks with a Shared Loss
17 0.021268953 65 jmlr-2007-PAC-Bayes Risk Bounds for Stochastic Averages and Majority Votes of Sample-Compressed Classifiers
18 0.021175731 52 jmlr-2007-Margin Trees for High-dimensional Classification
19 0.020860069 43 jmlr-2007-Integrating Naïve Bayes and FOIL
topicId topicWeight
[(0, 0.126), (1, 0.16), (2, -0.109), (3, 0.041), (4, 0.099), (5, 0.242), (6, 0.133), (7, 0.08), (8, -0.177), (9, 0.094), (10, -0.079), (11, 0.014), (12, -0.176), (13, -0.109), (14, -0.108), (15, 0.187), (16, -0.035), (17, 0.172), (18, -0.049), (19, 0.149), (20, -0.069), (21, 0.177), (22, -0.078), (23, 0.098), (24, -0.12), (25, 0.029), (26, -0.105), (27, 0.081), (28, 0.21), (29, 0.253), (30, -0.062), (31, 0.078), (32, 0.035), (33, -0.067), (34, -0.0), (35, 0.008), (36, -0.045), (37, 0.025), (38, -0.014), (39, -0.011), (40, -0.083), (41, 0.009), (42, 0.195), (43, -0.061), (44, -0.012), (45, 0.048), (46, -0.023), (47, 0.019), (48, 0.095), (49, -0.205)]
simIndex simValue paperId paperTitle
same-paper 1 0.96608192 29 jmlr-2007-Dynamic Weighted Majority: An Ensemble Method for Drifting Concepts
Author: J. Zico Kolter, Marcus A. Maloof
Abstract: We present an ensemble method for concept drift that dynamically creates and removes weighted experts in response to changes in performance. The method, dynamic weighted majority (DWM), uses four mechanisms to cope with concept drift: It trains online learners of the ensemble, it weights those learners based on their performance, it removes them, also based on their performance, and it adds new experts based on the global performance of the ensemble. After an extensive evaluation— consisting of five experiments, eight learners, and thirty data sets that varied in type of target concept, size, presence of noise, and the like—we concluded that DWM outperformed other learners that only incrementally learn concept descriptions, that maintain and use previously encountered examples, and that employ an unweighted, fixed-size ensemble of experts. Keywords: concept learning, online learning, ensemble methods, concept drift
2 0.7347427 48 jmlr-2007-Learning in Environments with Unknown Dynamics: Towards more Robust Concept Learners
Author: Marlon Núñez, Raúl Fidalgo, Rafael Morales
Abstract: In the process of concept learning, target concepts may have portions with short-term changes, other portions may support long-term changes, and yet others may not change at all. For this reason several local windows need to be handled. We suggest facing this problem, which naturally exists in the field of concept learning, by allocating windows which can adapt their size to portions of the target concept. We propose an incremental decision tree that is updated with incoming examples. Each leaf of the decision tree holds a time window and a local performance measure as the main parameter to be controlled. When the performance of a leaf decreases, the size of its local window is reduced. This learning algorithm, called OnlineTree2, automatically adjusts its internal parameters in order to face the current dynamics of the data stream. Results show that it is comparable to other batch algorithms when facing problems with no concept change, and it is better than evaluated methods in its ability to deal with concept drift when dealing with problems in which: concept change occurs at different speeds, noise may be present and, examples may arrive from different areas of the problem domain (virtual drift). Keywords: incremental algorithms, online learning, concept drift, decision trees, robust learners
3 0.33025026 59 jmlr-2007-Nonlinear Boosting Projections for Ensemble Construction
Author: Nicolás García-Pedrajas, César García-Osorio, Colin Fyfe
Abstract: In this paper we propose a novel approach for ensemble construction based on the use of nonlinear projections to achieve both accuracy and diversity of individual classifiers. The proposed approach combines the philosophy of boosting, putting more effort on difficult instances, with the basis of the random subspace method. Our main contribution is that instead of using a random subspace, we construct a projection taking into account the instances which have posed most difficulties to previous classifiers. In this way, consecutive nonlinear projections are created by a neural network trained using only incorrectly classified instances. The feature subspace induced by the hidden layer of this network is used as the input space to a new classifier. The method is compared with bagging and boosting techniques, showing an improved performance on a large set of 44 problems from the UCI Machine Learning Repository. An additional study showed that the proposed approach is less sensitive to noise in the data than boosting methods. Keywords: classifier ensembles, boosting, neural networks, nonlinear projections
4 0.28395569 22 jmlr-2007-Compression-Based Averaging of Selective Naive Bayes Classifiers (Special Topic on Model Selection)
Author: Marc Boullé
Abstract: The naive Bayes classifier has proved to be very effective on many real data applications. Its performance usually benefits from an accurate estimation of univariate conditional probabilities and from variable selection. However, although variable selection is a desirable feature, it is prone to overfitting. In this paper, we introduce a Bayesian regularization technique to select the most probable subset of variables compliant with the naive Bayes assumption. We also study the limits of Bayesian model averaging in the case of the naive Bayes assumption and introduce a new weighting scheme based on the ability of the models to conditionally compress the class labels. The weighting scheme on the models reduces to a weighting scheme on the variables, and finally results in a naive Bayes classifier with “soft variable selection”. Extensive experiments show that the compressionbased averaged classifier outperforms the Bayesian model averaging scheme. Keywords: naive Bayes, Bayesian, model selection, model averaging
5 0.20121023 88 jmlr-2007-Unlabeled Compression Schemes for Maximum Classes
Author: Dima Kuzmin, Manfred K. Warmuth
Abstract: n Maximum concept classes of VC dimension d over n domain points have size ≤d , and this is an upper bound on the size of any concept class of VC dimension d over n points. We give a compression scheme for any maximum class that represents each concept by a subset of up to d unlabeled domain points and has the property that for any sample of a concept in the class, the representative of exactly one of the concepts consistent with the sample is a subset of the domain of the sample. This allows us to compress any sample of a concept in the class to a subset of up to d unlabeled sample points such that this subset represents a concept consistent with the entire original sample. Unlike the previously known compression scheme for maximum classes (Floyd and Warmuth, 1995) which compresses to labeled subsets of the sample of size equal d, our new scheme is tight in the sense that the number of possible unlabeled compression sets of size at most d equals the number of concepts in the class. Keywords: compression schemes, VC dimension, maximum classes, one-inclusion graph
6 0.1640133 79 jmlr-2007-Structure and Majority Classes in Decision Tree Learning
8 0.13277724 19 jmlr-2007-Classification in Networked Data: A Toolkit and a Univariate Case Study
10 0.11124307 14 jmlr-2007-Behavioral Shaping for Geometric Concepts
11 0.11041359 43 jmlr-2007-Integrating Naïve Bayes and FOIL
12 0.10592815 34 jmlr-2007-From External to Internal Regret (Special Topic on the Conference on Learning Theory 2005)
14 0.10360622 58 jmlr-2007-Noise Tolerant Variants of the Perceptron Algorithm
15 0.10240054 66 jmlr-2007-Penalized Model-Based Clustering with Application to Variable Selection
16 0.10014398 64 jmlr-2007-Online Learning of Multiple Tasks with a Shared Loss
17 0.097356997 23 jmlr-2007-Concave Learners for Rankboost
18 0.096554674 36 jmlr-2007-Generalization Error Bounds in Semi-supervised Classification Under the Cluster Assumption
19 0.095541164 5 jmlr-2007-A Nonparametric Statistical Approach to Clustering via Mode Identification
20 0.095160067 17 jmlr-2007-Building Blocks for Variational Bayesian Learning of Latent Variable Models
topicId topicWeight
[(3, 0.381), (4, 0.013), (8, 0.014), (10, 0.096), (12, 0.022), (15, 0.034), (22, 0.012), (28, 0.041), (40, 0.022), (45, 0.031), (48, 0.031), (60, 0.031), (63, 0.017), (80, 0.034), (85, 0.069), (98, 0.071)]
simIndex simValue paperId paperTitle
same-paper 1 0.74999928 29 jmlr-2007-Dynamic Weighted Majority: An Ensemble Method for Drifting Concepts
Author: J. Zico Kolter, Marcus A. Maloof
Abstract: We present an ensemble method for concept drift that dynamically creates and removes weighted experts in response to changes in performance. The method, dynamic weighted majority (DWM), uses four mechanisms to cope with concept drift: It trains online learners of the ensemble, it weights those learners based on their performance, it removes them, also based on their performance, and it adds new experts based on the global performance of the ensemble. After an extensive evaluation— consisting of five experiments, eight learners, and thirty data sets that varied in type of target concept, size, presence of noise, and the like—we concluded that DWM outperformed other learners that only incrementally learn concept descriptions, that maintain and use previously encountered examples, and that employ an unweighted, fixed-size ensemble of experts. Keywords: concept learning, online learning, ensemble methods, concept drift
2 0.73568839 25 jmlr-2007-Covariate Shift Adaptation by Importance Weighted Cross Validation
Author: Masashi Sugiyama, Matthias Krauledat, Klaus-Robert Müller
Abstract: A common assumption in supervised learning is that the input points in the training set follow the same probability distribution as the input points that will be given in the future test phase. However, this assumption is not satisfied, for example, when the outside of the training region is extrapolated. The situation where the training input points and test input points follow different distributions while the conditional distribution of output values given input points is unchanged is called the covariate shift. Under the covariate shift, standard model selection techniques such as cross validation do not work as desired since its unbiasedness is no longer maintained. In this paper, we propose a new method called importance weighted cross validation (IWCV), for which we prove its unbiasedness even under the covariate shift. The IWCV procedure is the only one that can be applied for unbiased classification under covariate shift, whereas alternatives to IWCV exist for regression. The usefulness of our proposed method is illustrated by simulations, and furthermore demonstrated in the brain-computer interface, where strong non-stationarity effects can be seen between training and test sessions. Keywords: covariate shift, cross validation, importance sampling, extrapolation, brain-computer interface
3 0.38034278 48 jmlr-2007-Learning in Environments with Unknown Dynamics: Towards more Robust Concept Learners
Author: Marlon Núñez, Raúl Fidalgo, Rafael Morales
Abstract: In the process of concept learning, target concepts may have portions with short-term changes, other portions may support long-term changes, and yet others may not change at all. For this reason several local windows need to be handled. We suggest facing this problem, which naturally exists in the field of concept learning, by allocating windows which can adapt their size to portions of the target concept. We propose an incremental decision tree that is updated with incoming examples. Each leaf of the decision tree holds a time window and a local performance measure as the main parameter to be controlled. When the performance of a leaf decreases, the size of its local window is reduced. This learning algorithm, called OnlineTree2, automatically adjusts its internal parameters in order to face the current dynamics of the data stream. Results show that it is comparable to other batch algorithms when facing problems with no concept change, and it is better than evaluated methods in its ability to deal with concept drift when dealing with problems in which: concept change occurs at different speeds, noise may be present and, examples may arrive from different areas of the problem domain (virtual drift). Keywords: incremental algorithms, online learning, concept drift, decision trees, robust learners
4 0.35334483 8 jmlr-2007-A Unified Continuous Optimization Framework for Center-Based Clustering Methods
Author: Marc Teboulle
Abstract: Center-based partitioning clustering algorithms rely on minimizing an appropriately formulated objective function, and different formulations suggest different possible algorithms. In this paper, we start with the standard nonconvex and nonsmooth formulation of the partitioning clustering problem. We demonstrate that within this elementary formulation, convex analysis tools and optimization theory provide a unifying language and framework to design, analyze and extend hard and soft center-based clustering algorithms, through a generic algorithm which retains the computational simplicity of the popular k-means scheme. We show that several well known and more recent center-based clustering algorithms, which have been derived either heuristically, or/and have emerged from intuitive analogies in physics, statistical techniques and information theoretic perspectives can be recovered as special cases of the proposed analysis and we streamline their relationships. Keywords: clustering, k-means algorithm, convex analysis, support and asymptotic functions, distance-like functions, Bregman and Csiszar divergences, nonlinear means, nonsmooth optimization, smoothing algorithms, fixed point methods, deterministic annealing, expectation maximization, information theory and entropy methods
5 0.31756592 66 jmlr-2007-Penalized Model-Based Clustering with Application to Variable Selection
Author: Wei Pan, Xiaotong Shen
Abstract: Variable selection in clustering analysis is both challenging and important. In the context of modelbased clustering analysis with a common diagonal covariance matrix, which is especially suitable for “high dimension, low sample size” settings, we propose a penalized likelihood approach with an L1 penalty function, automatically realizing variable selection via thresholding and delivering a sparse solution. We derive an EM algorithm to fit our proposed model, and propose a modified BIC as a model selection criterion to choose the number of components and the penalization parameter. A simulation study and an application to gene function prediction with gene expression profiles demonstrate the utility of our method. Keywords: BIC, EM, mixture model, penalized likelihood, soft-thresholding, shrinkage
6 0.31353909 81 jmlr-2007-The Locally Weighted Bag of Words Framework for Document Representation
7 0.31030601 5 jmlr-2007-A Nonparametric Statistical Approach to Clustering via Mode Identification
8 0.30891645 32 jmlr-2007-Euclidean Embedding of Co-occurrence Data
9 0.30682555 17 jmlr-2007-Building Blocks for Variational Bayesian Learning of Latent Variable Models
10 0.30311656 7 jmlr-2007-A Stochastic Algorithm for Feature Selection in Pattern Recognition
11 0.29896265 39 jmlr-2007-Handling Missing Values when Applying Classification Models
12 0.29766929 19 jmlr-2007-Classification in Networked Data: A Toolkit and a Univariate Case Study
13 0.29523388 62 jmlr-2007-On the Effectiveness of Laplacian Normalization for Graph Semi-supervised Learning
14 0.29512423 33 jmlr-2007-Fast Iterative Kernel Principal Component Analysis
15 0.29456013 58 jmlr-2007-Noise Tolerant Variants of the Perceptron Algorithm
16 0.29435736 28 jmlr-2007-Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data
17 0.29335582 59 jmlr-2007-Nonlinear Boosting Projections for Ensemble Construction
18 0.29186991 49 jmlr-2007-Learning to Classify Ordinal Data: The Data Replication Method
20 0.29013917 88 jmlr-2007-Unlabeled Compression Schemes for Maximum Classes