jmlr jmlr2007 jmlr2007-29 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: J. Zico Kolter, Marcus A. Maloof
Abstract: We present an ensemble method for concept drift that dynamically creates and removes weighted experts in response to changes in performance. The method, dynamic weighted majority (DWM), uses four mechanisms to cope with concept drift: It trains online learners of the ensemble, it weights those learners based on their performance, it removes them, also based on their performance, and it adds new experts based on the global performance of the ensemble. After an extensive evaluation— consisting of five experiments, eight learners, and thirty data sets that varied in type of target concept, size, presence of noise, and the like—we concluded that DWM outperformed other learners that only incrementally learn concept descriptions, that maintain and use previously encountered examples, and that employ an unweighted, fixed-size ensemble of experts. Keywords: concept learning, online learning, ensemble methods, concept drift
Reference: text
sentIndex sentText sentNum sentScore
1 Department of Computer Science, Georgetown University, Washington, DC 20057-1232, USA. Editor: Dale Schuurmans. Abstract: We present an ensemble method for concept drift that dynamically creates and removes weighted experts in response to changes in performance. [sent-7, score-0.663]
2 Keywords: concept learning, online learning, ensemble methods, concept drift. [sent-10, score-0.582]
3 Introduction. In this paper, we describe an ensemble method designed expressly for tracking concept drift (Kolter and Maloof, 2003). [sent-11, score-0.491]
4 Ours is an extension of the weighted majority algorithm (Littlestone and Warmuth, 1994), which also tracks drifting concepts (Blum, 1997), but our algorithm, dynamic weighted majority (DWM), adds and removes base learners or experts in response to global and local performance. [sent-12, score-0.62]
5 Informally, concept drift occurs when a set of examples has legitimate class labels at one time and has different legitimate labels at another time. [sent-14, score-0.318]
6 Based on “Dynamic Weighted Majority: A new ensemble method for tracking concept drift”, by Jeremy Z. Kolter and Marcus A. Maloof. [sent-17, score-0.324]
7 Indeed, tracking concept drift is important for any application involving models of human behavior. [sent-27, score-0.351]
8 Overall, results suggest that a weighted ensemble of incremental learners tracks drifting concepts better than learners that simply modify concept descriptions, that store and learn from examples encountered previously, and that use an unweighted ensemble of experts. [sent-36, score-0.801]
9 In the next section, we survey work on the problem of concept drift, on ensemble methods, and at the intersection: work on ensemble methods for concept drift. [sent-42, score-0.528]
10 In Section 3, we describe dynamic weighted majority, an ensemble method for tracking concept drift. [sent-43, score-0.404]
11 Background and Related Work. Dynamic weighted majority is an ensemble method designed expressly for concept drift. [sent-47, score-0.342]
12 For many years, research on ensemble methods and research on methods for concept drift have intermingled little. [sent-48, score-0.431]
13 However, within the last few years, researchers have proposed several ensemble methods for tracking concept drift. [sent-49, score-0.324]
14 In the next three sections, we survey relevant work on the problem of tracking concept drift, on ensemble methods, and on ensemble methods for tracking concept drift. [sent-50, score-0.648]
15 2.1 Concept Drift. Concept drift (Schlimmer and Granger, 1986; Maloof, 2005) is an online learning task in which concepts change or drift over time. [sent-52, score-0.454]
16 In concrete terms, concept drift occurs when the class labels of a set of examples change over time. [sent-55, score-0.318]
17 It uses a frame representation and copes with such drift by either adjusting current concept descriptions or creating a new version of these descriptions. [sent-82, score-0.34]
18 Although not expressly designed for concept drift, the MetaL(B) and MetaL(IB) systems (Widmer, 1997) use meta-learning with naive Bayes and instance-based learning, respectively, to cope with reoccurring contexts. [sent-97, score-0.264]
19 The base learner then uses the predictive features of the contextually relevant examples in the current window to form new concept descriptions. [sent-103, score-0.343]
20 By adding a contextual variable to the STAGGER concepts (Schlimmer and Granger, 1986), experimental results using predictive accuracy suggest that the meta-learners were able to use contextual features to acquire target concepts better and more quickly than did the base learners alone. [sent-104, score-0.456]
21 When compared to FLORA4 (Widmer and Kubat, 1996), which retrieves previously learned concept descriptions when contexts reoccur, MetaL(B) often performed better in terms of slope and asymptote, even though it relearned new concept descriptions rather than retrieving and modifying old ones. [sent-105, score-0.368]
22 Lane and Brodley (1998) investigated methods for concept drift in one-class learning problems (i. [sent-106, score-0.291]
23 For static concepts, the time stamp is an irrelevant attribute, but if drift occurs, then the time stamp will appear in induced trees. [sent-118, score-0.241]
24 Moreover, one can use its position in the tree as an indicator of how much drift may have occurred—the more relevant the time stamp, the higher it will appear in the tree, and the more drift that has occurred. [sent-119, score-0.366]
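A minimal sketch of this time-stamp heuristic (not the CD3 algorithm itself), assuming a scikit-learn decision tree and a synthetic stream whose concept changes halfway through: append a time stamp to each example, induce a tree, and report how close to the root the time stamp is used.

```python
# Sketch: add a time-stamp attribute, induce a tree, and inspect where the
# time stamp is used. A shallow split on it suggests drift; its absence (or a
# deep position) suggests a static concept. The data below are synthetic.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 2000
X = rng.random((n, 3))                                  # three ordinary attributes
t = (np.arange(n) / n).reshape(-1, 1)                   # normalized time stamp
y = (X[:, 0] > 0.5).astype(int)
y[n // 2:] = (X[n // 2:, 1] > 0.5).astype(int)          # concept changes halfway through

clf = DecisionTreeClassifier(max_depth=4, random_state=0)
clf.fit(np.hstack([X, t]), y)

TIME_STAMP = 3                                          # column index of the time stamp
depths = {0: 0}                                         # node id -> depth (root is node 0)
for node in range(clf.tree_.node_count):
    left, right = clf.tree_.children_left[node], clf.tree_.children_right[node]
    if left != -1:                                      # internal node: record child depths
        depths[left] = depths[right] = depths[node] + 1

ts_depths = [depths[i] for i in range(clf.tree_.node_count)
             if clf.tree_.feature[i] == TIME_STAMP]
print("shallowest split on the time stamp:", min(ts_depths) if ts_depths else "not used")
```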
25 In subsequent work, Black and Hickey (2002) applied CD3 to twenty-seven months of customer-call data and were able to detect concept drift based on the number and height of time stamps in the induced decision trees. [sent-128, score-0.312]
26 The drops in predictive accuracy could have been due to sampling, and in general, it may be difficult to determine if a learner is coping with concept drift or is simply acquiring more information about a static concept. [sent-137, score-0.516]
27 This type of concept drift has been called virtual concept drift, as opposed to real concept drift (Klinkenberg and Joachims, 2000). [sent-138, score-0.706]
28 (1999) define population drift as being “related to” concept drift, noting that the term concept drift has no single meaning. [sent-143, score-0.582]
29 The term concept drift has been applied to different phenomena, such as drops and recoveries in performance during online learning (Syed et al. [sent-146, score-0.318]
30 However, in concrete terms, the term has historically meant that the class labels of instances change over time, a meaning established in the first paper on concept drift (Schlimmer and Granger, 1986). [sent-148, score-0.291]
31 However, the prior distribution may not change, and changes in the conditional distribution will not necessarily mean that concept drift has occurred. [sent-150, score-0.291]
32 Popular ensemble methods, such as bagging and boosting, are off-line algorithms, but researchers have developed online ensemble methods. [sent-214, score-0.328]
33 2.3 Ensemble Methods for Concept Drift. Ensemble methods for concept drift share many similarities with the online and off-line methods discussed in the previous section. [sent-226, score-0.318]
34 However, methods for concept drift must take into account the temporal nature of the data stream, for a set of examples may have certain class labels at time t and others at a later time t′. [sent-227, score-0.318]
35 Clearly, ensemble methods for concept drift must process a stream of data. [sent-228, score-0.431]
36 To achieve these effects, one ensemble method for concept drift builds two ensembles and selects the best performing one for subsequent processing (Scholz and Klinkenberg, 2006). [sent-233, score-0.431]
37 Blum’s (1997) implementation of the weighted majority algorithm (Littlestone and Warmuth, 1994) uses as experts pairs of features coupled with a history of the most recent class labels from the training set appearing with those features. [sent-236, score-0.237]
38 The global algorithm predicts based on a weighted-majority vote of the expert predictions and decreases the weight of any expert that predicts incorrectly. [sent-238, score-0.253]
39 Blum (1997) also investigated triples of features as experts and a variant of winnow (Littlestone, 1988) that lets experts abstain if their features are not present in an instance. [sent-240, score-0.318]
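For reference, a minimal sketch of the weighted-majority prediction and update underlying these variants; the abstention case corresponds to the winnow-style variant just mentioned, and the penalty factor beta = 0.5 and the feature-pair expert names in the usage snippet are illustrative assumptions, not values taken from the text.

```python
# Minimal weighted-majority sketch (after Littlestone and Warmuth, 1994):
# predict by a weighted vote, then multiplicatively penalize mistaken experts.
def wm_predict(predictions, weights):
    """predictions: dict expert_id -> 0/1 label, or None if the expert abstains."""
    votes = {0: 0.0, 1: 0.0}
    for e, label in predictions.items():
        if label is not None:
            votes[label] += weights[e]
    return max(votes, key=votes.get)

def wm_update(predictions, weights, true_label, beta=0.5):
    """Halve (by default) the weight of every expert that voted and was wrong."""
    for e, label in predictions.items():
        if label is not None and label != true_label:
            weights[e] *= beta
    return weights

# Hypothetical feature-pair experts, as in Blum's construction:
weights = {"x1=1&x2=0": 1.0, "x1=0&x3=1": 1.0}
preds = {"x1=1&x2=0": 1, "x1=0&x3=1": None}     # the second expert abstains
y_hat = wm_predict(preds, weights)
weights = wm_update(preds, weights, true_label=0)
```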
40 Herbster and Warmuth (1998) consider a setting in which an online learner is trained over several concepts and has access to n experts (which are fixed prediction strategies). [sent-253, score-0.346]
41 They also describe two pruning methods for limiting the number of experts maintained; one is useful in practice, the other gives formal guarantees on the number of experts that AddExp will create. [sent-268, score-0.318]
42 DWM: An Ensemble Method for Concept Drift. Dynamic weighted majority maintains a weighted pool of experts, or base learners. [sent-270, score-0.331]
43 DWM then trains the experts in the ensemble on the new example (line 23). [sent-298, score-0.317]
44 As mentioned previously, if the global prediction is incorrect (line 16), DWM adds a new expert to the ensemble with a weight of one (lines 17–19). [sent-345, score-0.237]
45 Finally, after using the new example to train each learner in the ensemble (lines 22 and 23), DWM outputs the global prediction, which is the weighted vote of the expert predictions (line 24). [sent-346, score-0.338]
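To make the procedure concrete, the sketch below implements the update loop as just described: each expert predicts, a mistaken expert has its weight multiplied by beta (every p steps), the global prediction is the weighted vote, low-weight experts are removed and a new expert is added after a global mistake, and every expert is then trained on the example. The default parameter values and the toy MajorityClassExpert base learner are illustrative placeholders; the paper's variants use incremental naive Bayes (DWM-NB) or ITI (DWM-ITI) as experts.

```python
import numpy as np

class MajorityClassExpert:
    """Trivial online base learner: predicts the class it has seen most often."""
    def __init__(self, n_classes):
        self.counts = np.zeros(n_classes)
    def predict(self, x):
        return int(np.argmax(self.counts))      # defaults to class 0 when untrained
    def train(self, x, y):
        self.counts[y] += 1

class DWM:
    def __init__(self, n_classes, beta=0.5, theta=0.01, p=1):
        self.n_classes, self.beta, self.theta, self.p = n_classes, beta, theta, p
        self.experts = [MajorityClassExpert(n_classes)]
        self.weights = [1.0]
        self.t = 0

    def update(self, x, y):
        """Process one labeled example; return the global prediction made before training."""
        self.t += 1
        votes = np.zeros(self.n_classes)
        for i, e in enumerate(self.experts):
            pred = e.predict(x)
            if pred != y and self.t % self.p == 0:
                self.weights[i] *= self.beta                # penalize the mistaken expert
            votes[pred] += self.weights[i]
        global_pred = int(np.argmax(votes))                 # weighted-majority vote

        if self.t % self.p == 0:
            top = max(self.weights)
            self.weights = [w / top for w in self.weights]  # normalize weights
            kept = [i for i, w in enumerate(self.weights) if w >= self.theta]
            self.experts = [self.experts[i] for i in kept]  # drop low-weight experts
            self.weights = [self.weights[i] for i in kept]
            if global_pred != y:                            # global mistake: add an expert
                self.experts.append(MajorityClassExpert(self.n_classes))
                self.weights.append(1.0)

        for e in self.experts:                              # train every expert online
            e.train(x, y)
        return global_pred
```

Substituting an incremental naive Bayes or ITI learner for the placeholder expert recovers the DWM-NB and DWM-ITI variants evaluated in the paper.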
46 The first, most extensive evaluation involved the STAGGER concepts (Schlimmer and Granger, 1986), a standard benchmark for evaluating how learners cope with drifting concepts. [sent-382, score-0.226]
47 In an effort to determine how our method scales to larger problems involving concept drift, our second evaluation consisted of testing DWM-NB using the SEA concepts (Street and Kim, 2001), a problem recently proposed in the data mining community. [sent-389, score-0.234]
48 For the sake of comparison, in addition to these algorithms, we also evaluated naive Bayes, ITI, naive Bayes with perfect forgetting, and ITI with perfect forgetting. [sent-438, score-0.28]
49 The “standard” or “traditional” implementations of naive Bayes and ITI provided a worst-case evaluation, since these systems have not been designed to cope with concept drift and learn from all examples in the stream regardless of changes to the target concept. [sent-439, score-0.489]
50 The implementations with perfect forgetting, which is the same as training the methods on each target concept individually, provided a best-case evaluation, since the systems were never burdened with examples or concept descriptions from previous target concepts. [sent-440, score-0.406]
51 As expected, naive Bayes with perfect forgetting performed the best on all three concepts, while naive Bayes without forgetting performed the worst. [sent-442, score-0.34]
52 Figure 4 (y-axis: expert count; x-axis: time step t): Number of experts maintained with 95% confidence intervals for DWM-NB and DWM-ITI on the STAGGER concepts (Kolter and Maloof, 2003). [sent-449, score-0.28]
53 DWM-ITI converged more quickly than did DWM-NB to the second and third target concepts, but if we compare the plots for naive Bayes and ITI with perfect forgetting, we see that ITI converged more quickly to these target concepts than did naive Bayes. [sent-454, score-0.415]
54 On average, DWM-ITI maintained fewer experts than did DWM-NB, and we attribute this to the fact that ITI performed better on the individual concepts than did naive Bayes. [sent-457, score-0.427]
55 AQ11, although not designed to cope with concept drift, outperformed DWM-ITI in terms of asymptote on the first concept and in terms of slope on the third, but on the second concept, performed significantly worse than did DWM-ITI. [sent-470, score-0.342]
56 With respect to complexity of the experts themselves, the STAGGER concepts consist of three attributes, each taking one of three possible values. [sent-477, score-0.252]
57 Therefore, this implementation of weighted majority maintained 27 experts throughout the presentation of examples, as compared to the maximum of six that DWM-NB maintained. [sent-478, score-0.265]
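One way to arrive at this count of 27, assuming each expert pairs a value of one attribute with a value of a different attribute:

```latex
\underbrace{\binom{3}{2}}_{\text{pairs of attributes}}
\times
\underbrace{3 \times 3}_{\text{value combinations per pair}}
= 3 \times 9
= 27 \text{ experts.}
```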
58 Researchers have built several systems for coping with concept drift and have evaluated many of them on the STAGGER concepts. [sent-482, score-0.316]
59 In this experiment, when the concept changed from the first to the second, the examples of the first concept remaining in this window prevented AQ-PM from adjusting quickly to the second. [sent-487, score-0.321]
60 The only difference between these two learners is that AQ11-PM maintains a fixed window of thirty examples that have appeared on the boundaries of concept descriptions. [sent-501, score-0.295]
61 Generally, acquiring the second concept after learning the first is the hardest task, as the second concept is almost a reversal of the first; the two concepts share only one positive example. [sent-511, score-0.37]
62 Acquiring the third concept after acquiring the second is easier because the two concepts share a greater number of positive examples. [sent-512, score-0.246]
63 A learning method, such as STAGGER, that only refines concept descriptions will have more difficulty responding to concept drift, as compared to an ensemble method, such as DWM, that both refines existing concept descriptions and creates new ones. [sent-514, score-0.628]
64 Recall that Blum’s weighted majority uses as experts pairs of features with a brief history of past predictions. [sent-517, score-0.237]
65 Assume a learner incrementally modifies its concept descriptions as new examples arrive. [sent-528, score-0.247]
66 Unfortunately, target concepts are not always disjoint, it is difficult to determine precisely when concepts change, and it is challenging to identify which concept descriptions (or parts of concept descriptions) apply to new target concepts. [sent-531, score-0.545]
67 On this problem, we evaluated DWM-NB, naive Bayes, and naive Bayes with perfect forgetting. [sent-558, score-0.26]
68 In the left graph of Figure 7, we see the predictive accuracies for DWM-NB, naive Bayes, and naive Bayes with perfect forgetting on the SEA concepts with 10% class noise. [sent-570, score-0.48]
69 As with the STAGGER concepts, naive Bayes performed the worst, since it had no direct method of removing outdated concept descriptions. [sent-571, score-0.244]
70 The number of experts was capped at five and at ten, and these capped versions were compared to DWM-NB with no limit on the number of experts created. [sent-576, score-0.318]
71 A large number of experts obviously impacts memory utilization, and, depending on the complexity of the base learners, could also affect learning and performance time, since DWM trains and queries each expert in the ensemble when a new example arrives. [sent-581, score-0.428]
72 That is, after reaching the point where there are k experts in the ensemble, when DWM adds a new expert, it removes the expert with the lowest weight. [sent-584, score-0.256]
73 In this version of DWM, we set the threshold for removing experts to zero, which guaranteed that experts were removed only if they were the weakest member of the ensemble of k experts. [sent-585, score-0.458]
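A short sketch of this eviction rule, written against the expert and weight lists from the DWM sketch above; the helper name is hypothetical.

```python
def add_expert_with_cap(experts, weights, new_expert, k):
    """Add new_expert with weight 1.0; if the ensemble already holds k experts,
    first evict the one with the lowest weight (the removal threshold theta is
    assumed to be zero, so eviction is the only removal mechanism)."""
    if len(experts) >= k:
        worst = min(range(len(weights)), key=weights.__getitem__)
        del experts[worst]
        del weights[worst]
    experts.append(new_expert)
    weights.append(1.0)
```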
74 Indeed, DWM-NB with five experts performed almost as well as the original algorithm that placed no limit on the number of experts created. [sent-589, score-0.318]
75 If the new classifier improves the global performance of the ensemble, then it is added, provided the ensemble does not contain a maximum number of classifiers; otherwise, SEA replaces a poorly performing classifier in the ensemble with the new classifier. [sent-595, score-0.28]
76 However, if every classifier in the ensemble has been trained on a given target concept, and the concept changes to one that is disjoint, then SEA must replace at least half of the classifiers in the ensemble before accuracy on the new target concept will surpass that on the old. [sent-596, score-0.621]
77 For instance, if the ensemble consists of 20 classifiers, and each learns from a fixed set of 500 examples, then it would take at least 5,000 additional training examples before the ensemble contained a majority of classifiers trained on the new concept. [sent-597, score-0.35]
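Spelling out the arithmetic behind this lower bound (assuming, as stated, that each replacement classifier needs its full batch of 500 examples):

```latex
\underbrace{\tfrac{20}{2}}_{\text{classifiers to replace}}
\times
\underbrace{500}_{\text{examples per classifier}}
= 10 \times 500
= 5{,}000 \text{ additional training examples.}
```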
78 Assume p = 500, the ensemble consists of 20 fully trained classifiers, all with a weight of one, and the new concept is disjoint from the previous one. [sent-599, score-0.264]
79 The original 20 experts will again misclassify the example, and the new expert will predict correctly. [sent-604, score-0.236]
80 Generally, ensemble methods with weighting mechanisms, like those present in DWM, will converge more quickly to target concepts (i. [sent-620, score-0.264]
81 Regarding the number of experts that DWM maintained, we used a simple heuristic that added a new expert whenever the global prediction was incorrect, which, intuitively, should be problematic for noisy domains. [sent-623, score-0.256]
82 However, on the SEA concepts, while DWM-NB maintained as many as 40 experts at, say, time step 37,500, it maintained only 22 experts on average over the 10 runs, which is similar to the 20–25 that SEA reportedly stored (Street and Kim, 2001). [sent-624, score-0.374]
83 If the number of experts were to reach impractical levels, then DWM could simply stop creating experts after obtaining acceptable accuracy; training would continue. [sent-625, score-0.318]
84 Since Blum’s implementation forms concept descriptions consisting of weighted pairs of attribute values, we reasoned that conditionally dependent attributes would have less effect on its performance than they would on naive Bayes’. [sent-695, score-0.379]
85 As a data set derived from a real-world phenomenon, we cannot know definitively if or when concept drift occurred. [sent-726, score-0.291]
86 Concluding Remarks. Tracking concept drift is important for many applications. [sent-755, score-0.291]
87 In this paper, we presented an ensemble method based on the weighted majority algorithm (Littlestone and Warmuth, 1994). [sent-764, score-0.218]
88 Our method, dynamic weighted majority, creates and removes base algorithms in response to changes in performance, which makes it well suited for problems involving concept drift. [sent-765, score-0.276]
89 On the problems we considered, a weighted ensemble of learners with mechanisms to add and remove experts in response to changes in performance provided a better response to concept drift than did other learners, especially those that relied on only incremental learning (i. [sent-768, score-0.76]
90 , FLORA2 and the AQ-PM systems), or on an ensemble of unweighted learners (i. [sent-772, score-0.235]
91 We anticipate that these investigations will lead to general, robust, and scalable ensemble methods for tracking concept drift. [sent-790, score-0.324]
92 Classification of customer call data in the presence of concept drift and noise. [sent-1091, score-0.291]
93 A case-based technique for tracking concept drift in spam filtering. [sent-1139, score-0.351]
94 Detecting concept drift in financial time series prediction using symbolic machine learning. [sent-1217, score-0.311]
95 Refined time stamps for concept drift detection during mining for classification rules. [sent-1250, score-0.329]
96 Concept versioning: A methodology for tracking evolutionary concept drift in dynamic concept systems. [sent-1288, score-0.52]
97 Dynamic weighted majority: A new ensemble method for tracking concept drift. [sent-1307, score-0.359]
98 Using additive expert ensembles to cope with concept drift. [sent-1315, score-0.221]
99 Approaches to online learning and concept drift for user identification in computer security. [sent-1330, score-0.318]
100 Learning in the presence of concept drift and hidden contexts. [sent-1539, score-0.291]
wordName wordTfidf (topN-words)
[('dwm', 0.727), ('aq', 0.227), ('iti', 0.188), ('nb', 0.187), ('kolter', 0.175), ('drift', 0.167), ('stagger', 0.167), ('maloof', 0.166), ('experts', 0.159), ('ensemble', 0.14), ('concept', 0.124), ('naive', 0.12), ('sea', 0.094), ('concepts', 0.093), ('michalski', 0.083), ('aloof', 0.077), ('expert', 0.077), ('learners', 0.073), ('ajority', 0.073), ('pm', 0.066), ('predictive', 0.065), ('schlimmer', 0.064), ('tracking', 0.06), ('blum', 0.058), ('bayes', 0.058), ('descriptions', 0.049), ('eighted', 0.047), ('learner', 0.047), ('window', 0.046), ('dynamic', 0.045), ('electricity', 0.043), ('flora', 0.043), ('granger', 0.043), ('majority', 0.043), ('drifting', 0.04), ('forgetting', 0.04), ('harries', 0.04), ('calendar', 0.036), ('widmer', 0.036), ('weighted', 0.035), ('wah', 0.034), ('incremental', 0.034), ('base', 0.034), ('klinkenberg', 0.032), ('tree', 0.032), ('accuracy', 0.031), ('target', 0.031), ('predicts', 0.03), ('acquiring', 0.029), ('cap', 0.029), ('hulten', 0.029), ('static', 0.028), ('mechanisms', 0.028), ('maintained', 0.028), ('examples', 0.027), ('outperformed', 0.027), ('attribute', 0.027), ('online', 0.027), ('gama', 0.026), ('kubat', 0.026), ('maintains', 0.025), ('vfdt', 0.025), ('asymptote', 0.025), ('coping', 0.025), ('scheduling', 0.024), ('littlestone', 0.024), ('batch', 0.024), ('attributes', 0.024), ('warmuth', 0.023), ('stamp', 0.023), ('scholz', 0.023), ('accuracies', 0.022), ('slope', 0.022), ('asuncion', 0.022), ('hickey', 0.022), ('unweighted', 0.022), ('addexp', 0.021), ('maclin', 0.021), ('stamps', 0.021), ('vote', 0.021), ('period', 0.021), ('bagging', 0.021), ('day', 0.021), ('removes', 0.02), ('perfect', 0.02), ('prediction', 0.02), ('cope', 0.02), ('leaf', 0.02), ('arrives', 0.019), ('pricing', 0.019), ('street', 0.019), ('batches', 0.018), ('rules', 0.018), ('contextual', 0.018), ('trains', 0.018), ('predictions', 0.018), ('creates', 0.018), ('georgetown', 0.017), ('mining', 0.017), ('dence', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999952 29 jmlr-2007-Dynamic Weighted Majority: An Ensemble Method for Drifting Concepts
Author: J. Zico Kolter, Marcus A. Maloof
Abstract: We present an ensemble method for concept drift that dynamically creates and removes weighted experts in response to changes in performance. The method, dynamic weighted majority (DWM), uses four mechanisms to cope with concept drift: It trains online learners of the ensemble, it weights those learners based on their performance, it removes them, also based on their performance, and it adds new experts based on the global performance of the ensemble. After an extensive evaluation— consisting of five experiments, eight learners, and thirty data sets that varied in type of target concept, size, presence of noise, and the like—we concluded that DWM outperformed other learners that only incrementally learn concept descriptions, that maintain and use previously encountered examples, and that employ an unweighted, fixed-size ensemble of experts. Keywords: concept learning, online learning, ensemble methods, concept drift
2 0.18050787 48 jmlr-2007-Learning in Environments with Unknown Dynamics: Towards more Robust Concept Learners
Author: Marlon Núñez, Raúl Fidalgo, Rafael Morales
Abstract: In the process of concept learning, target concepts may have portions with short-term changes, other portions may support long-term changes, and yet others may not change at all. For this reason several local windows need to be handled. We suggest facing this problem, which naturally exists in the field of concept learning, by allocating windows which can adapt their size to portions of the target concept. We propose an incremental decision tree that is updated with incoming examples. Each leaf of the decision tree holds a time window and a local performance measure as the main parameter to be controlled. When the performance of a leaf decreases, the size of its local window is reduced. This learning algorithm, called OnlineTree2, automatically adjusts its internal parameters in order to face the current dynamics of the data stream. Results show that it is comparable to other batch algorithms when facing problems with no concept change, and it is better than evaluated methods in its ability to deal with concept drift when dealing with problems in which: concept change occurs at different speeds, noise may be present and, examples may arrive from different areas of the problem domain (virtual drift). Keywords: incremental algorithms, online learning, concept drift, decision trees, robust learners
3 0.064376481 22 jmlr-2007-Compression-Based Averaging of Selective Naive Bayes Classifiers (Special Topic on Model Selection)
Author: Marc Boullé
Abstract: The naive Bayes classifier has proved to be very effective on many real data applications. Its performance usually benefits from an accurate estimation of univariate conditional probabilities and from variable selection. However, although variable selection is a desirable feature, it is prone to overfitting. In this paper, we introduce a Bayesian regularization technique to select the most probable subset of variables compliant with the naive Bayes assumption. We also study the limits of Bayesian model averaging in the case of the naive Bayes assumption and introduce a new weighting scheme based on the ability of the models to conditionally compress the class labels. The weighting scheme on the models reduces to a weighting scheme on the variables, and finally results in a naive Bayes classifier with “soft variable selection”. Extensive experiments show that the compressionbased averaged classifier outperforms the Bayesian model averaging scheme. Keywords: naive Bayes, Bayesian, model selection, model averaging
4 0.063326374 59 jmlr-2007-Nonlinear Boosting Projections for Ensemble Construction
Author: Nicolás García-Pedrajas, César García-Osorio, Colin Fyfe
Abstract: In this paper we propose a novel approach for ensemble construction based on the use of nonlinear projections to achieve both accuracy and diversity of individual classifiers. The proposed approach combines the philosophy of boosting, putting more effort on difficult instances, with the basis of the random subspace method. Our main contribution is that instead of using a random subspace, we construct a projection taking into account the instances which have posed most difficulties to previous classifiers. In this way, consecutive nonlinear projections are created by a neural network trained using only incorrectly classified instances. The feature subspace induced by the hidden layer of this network is used as the input space to a new classifier. The method is compared with bagging and boosting techniques, showing an improved performance on a large set of 44 problems from the UCI Machine Learning Repository. An additional study showed that the proposed approach is less sensitive to noise in the data than boosting methods. Keywords: classifier ensembles, boosting, neural networks, nonlinear projections
5 0.045024756 11 jmlr-2007-Anytime Learning of Decision Trees
Author: Saher Esmeir, Shaul Markovitch
Abstract: The majority of existing algorithms for learning decision trees are greedy—a tree is induced topdown, making locally optimal decisions at each node. In most cases, however, the constructed tree is not globally optimal. Even the few non-greedy learners cannot learn good trees when the concept is difficult. Furthermore, they require a fixed amount of time and are not able to generate a better tree if additional time is available. We introduce a framework for anytime induction of decision trees that overcomes these problems by trading computation speed for better tree quality. Our proposed family of algorithms employs a novel strategy for evaluating candidate splits. A biased sampling of the space of consistent trees rooted at an attribute is used to estimate the size of the minimal tree under that attribute, and an attribute with the smallest expected tree is selected. We present two types of anytime induction algorithms: a contract algorithm that determines the sample size on the basis of a pre-given allocation of time, and an interruptible algorithm that starts with a greedy tree and continuously improves subtrees by additional sampling. Experimental results indicate that, for several hard concepts, our proposed approach exhibits good anytime behavior and yields significantly better decision trees when more time is available. Keywords: anytime algorithms, decision tree induction, lookahead, hard concepts, resource-bounded reasoning
6 0.041520823 88 jmlr-2007-Unlabeled Compression Schemes for Maximum Classes
7 0.036782585 79 jmlr-2007-Structure and Majority Classes in Decision Tree Learning
8 0.033763677 40 jmlr-2007-Harnessing the Expertise of 70,000 Human Editors: Knowledge-Based Feature Generation for Text Categorization
9 0.030604016 19 jmlr-2007-Classification in Networked Data: A Toolkit and a Univariate Case Study
10 0.027140401 4 jmlr-2007-A New Probabilistic Approach in Rank Regression with Optimal Bayesian Partitioning (Special Topic on Model Selection)
11 0.026223041 83 jmlr-2007-The On-Line Shortest Path Problem Under Partial Monitoring
12 0.025351135 23 jmlr-2007-Concave Learners for Rankboost
13 0.024166847 16 jmlr-2007-Boosted Classification Trees and Class Probability Quantile Estimation
14 0.024069835 14 jmlr-2007-Behavioral Shaping for Geometric Concepts
15 0.022661276 34 jmlr-2007-From External to Internal Regret (Special Topic on the Conference on Learning Theory 2005)
16 0.021632161 64 jmlr-2007-Online Learning of Multiple Tasks with a Shared Loss
17 0.021268953 65 jmlr-2007-PAC-Bayes Risk Bounds for Stochastic Averages and Majority Votes of Sample-Compressed Classifiers
18 0.021175731 52 jmlr-2007-Margin Trees for High-dimensional Classification
19 0.020860069 43 jmlr-2007-Integrating Naïve Bayes and FOIL
topicId topicWeight
[(0, 0.126), (1, 0.16), (2, -0.109), (3, 0.041), (4, 0.099), (5, 0.242), (6, 0.133), (7, 0.08), (8, -0.177), (9, 0.094), (10, -0.079), (11, 0.014), (12, -0.176), (13, -0.109), (14, -0.108), (15, 0.187), (16, -0.035), (17, 0.172), (18, -0.049), (19, 0.149), (20, -0.069), (21, 0.177), (22, -0.078), (23, 0.098), (24, -0.12), (25, 0.029), (26, -0.105), (27, 0.081), (28, 0.21), (29, 0.253), (30, -0.062), (31, 0.078), (32, 0.035), (33, -0.067), (34, -0.0), (35, 0.008), (36, -0.045), (37, 0.025), (38, -0.014), (39, -0.011), (40, -0.083), (41, 0.009), (42, 0.195), (43, -0.061), (44, -0.012), (45, 0.048), (46, -0.023), (47, 0.019), (48, 0.095), (49, -0.205)]
simIndex simValue paperId paperTitle
same-paper 1 0.96608192 29 jmlr-2007-Dynamic Weighted Majority: An Ensemble Method for Drifting Concepts
Author: J. Zico Kolter, Marcus A. Maloof
Abstract: We present an ensemble method for concept drift that dynamically creates and removes weighted experts in response to changes in performance. The method, dynamic weighted majority (DWM), uses four mechanisms to cope with concept drift: It trains online learners of the ensemble, it weights those learners based on their performance, it removes them, also based on their performance, and it adds new experts based on the global performance of the ensemble. After an extensive evaluation— consisting of five experiments, eight learners, and thirty data sets that varied in type of target concept, size, presence of noise, and the like—we concluded that DWM outperformed other learners that only incrementally learn concept descriptions, that maintain and use previously encountered examples, and that employ an unweighted, fixed-size ensemble of experts. Keywords: concept learning, online learning, ensemble methods, concept drift
2 0.7347427 48 jmlr-2007-Learning in Environments with Unknown Dynamics: Towards more Robust Concept Learners
Author: Marlon Núñez, Raúl Fidalgo, Rafael Morales
Abstract: In the process of concept learning, target concepts may have portions with short-term changes, other portions may support long-term changes, and yet others may not change at all. For this reason several local windows need to be handled. We suggest facing this problem, which naturally exists in the field of concept learning, by allocating windows which can adapt their size to portions of the target concept. We propose an incremental decision tree that is updated with incoming examples. Each leaf of the decision tree holds a time window and a local performance measure as the main parameter to be controlled. When the performance of a leaf decreases, the size of its local window is reduced. This learning algorithm, called OnlineTree2, automatically adjusts its internal parameters in order to face the current dynamics of the data stream. Results show that it is comparable to other batch algorithms when facing problems with no concept change, and it is better than evaluated methods in its ability to deal with concept drift when dealing with problems in which: concept change occurs at different speeds, noise may be present and, examples may arrive from different areas of the problem domain (virtual drift). Keywords: incremental algorithms, online learning, concept drift, decision trees, robust learners
3 0.33025026 59 jmlr-2007-Nonlinear Boosting Projections for Ensemble Construction
Author: Nicolás García-Pedrajas, César García-Osorio, Colin Fyfe
Abstract: In this paper we propose a novel approach for ensemble construction based on the use of nonlinear projections to achieve both accuracy and diversity of individual classifiers. The proposed approach combines the philosophy of boosting, putting more effort on difficult instances, with the basis of the random subspace method. Our main contribution is that instead of using a random subspace, we construct a projection taking into account the instances which have posed most difficulties to previous classifiers. In this way, consecutive nonlinear projections are created by a neural network trained using only incorrectly classified instances. The feature subspace induced by the hidden layer of this network is used as the input space to a new classifier. The method is compared with bagging and boosting techniques, showing an improved performance on a large set of 44 problems from the UCI Machine Learning Repository. An additional study showed that the proposed approach is less sensitive to noise in the data than boosting methods. Keywords: classifier ensembles, boosting, neural networks, nonlinear projections
4 0.28395569 22 jmlr-2007-Compression-Based Averaging of Selective Naive Bayes Classifiers (Special Topic on Model Selection)
Author: Marc Boullé
Abstract: The naive Bayes classifier has proved to be very effective on many real data applications. Its performance usually benefits from an accurate estimation of univariate conditional probabilities and from variable selection. However, although variable selection is a desirable feature, it is prone to overfitting. In this paper, we introduce a Bayesian regularization technique to select the most probable subset of variables compliant with the naive Bayes assumption. We also study the limits of Bayesian model averaging in the case of the naive Bayes assumption and introduce a new weighting scheme based on the ability of the models to conditionally compress the class labels. The weighting scheme on the models reduces to a weighting scheme on the variables, and finally results in a naive Bayes classifier with “soft variable selection”. Extensive experiments show that the compressionbased averaged classifier outperforms the Bayesian model averaging scheme. Keywords: naive Bayes, Bayesian, model selection, model averaging
5 0.20121023 88 jmlr-2007-Unlabeled Compression Schemes for Maximum Classes
Author: Dima Kuzmin, Manfred K. Warmuth
Abstract: n Maximum concept classes of VC dimension d over n domain points have size ≤d , and this is an upper bound on the size of any concept class of VC dimension d over n points. We give a compression scheme for any maximum class that represents each concept by a subset of up to d unlabeled domain points and has the property that for any sample of a concept in the class, the representative of exactly one of the concepts consistent with the sample is a subset of the domain of the sample. This allows us to compress any sample of a concept in the class to a subset of up to d unlabeled sample points such that this subset represents a concept consistent with the entire original sample. Unlike the previously known compression scheme for maximum classes (Floyd and Warmuth, 1995) which compresses to labeled subsets of the sample of size equal d, our new scheme is tight in the sense that the number of possible unlabeled compression sets of size at most d equals the number of concepts in the class. Keywords: compression schemes, VC dimension, maximum classes, one-inclusion graph
6 0.1640133 79 jmlr-2007-Structure and Majority Classes in Decision Tree Learning
8 0.13277724 19 jmlr-2007-Classification in Networked Data: A Toolkit and a Univariate Case Study
10 0.11124307 14 jmlr-2007-Behavioral Shaping for Geometric Concepts
11 0.11041359 43 jmlr-2007-Integrating Naïve Bayes and FOIL
12 0.10592815 34 jmlr-2007-From External to Internal Regret (Special Topic on the Conference on Learning Theory 2005)
14 0.10360622 58 jmlr-2007-Noise Tolerant Variants of the Perceptron Algorithm
15 0.10240054 66 jmlr-2007-Penalized Model-Based Clustering with Application to Variable Selection
16 0.10014398 64 jmlr-2007-Online Learning of Multiple Tasks with a Shared Loss
17 0.097356997 23 jmlr-2007-Concave Learners for Rankboost
18 0.096554674 36 jmlr-2007-Generalization Error Bounds in Semi-supervised Classification Under the Cluster Assumption
19 0.095541164 5 jmlr-2007-A Nonparametric Statistical Approach to Clustering via Mode Identification
20 0.095160067 17 jmlr-2007-Building Blocks for Variational Bayesian Learning of Latent Variable Models
topicId topicWeight
[(3, 0.381), (4, 0.013), (8, 0.014), (10, 0.096), (12, 0.022), (15, 0.034), (22, 0.012), (28, 0.041), (40, 0.022), (45, 0.031), (48, 0.031), (60, 0.031), (63, 0.017), (80, 0.034), (85, 0.069), (98, 0.071)]
simIndex simValue paperId paperTitle
same-paper 1 0.74999928 29 jmlr-2007-Dynamic Weighted Majority: An Ensemble Method for Drifting Concepts
Author: J. Zico Kolter, Marcus A. Maloof
Abstract: We present an ensemble method for concept drift that dynamically creates and removes weighted experts in response to changes in performance. The method, dynamic weighted majority (DWM), uses four mechanisms to cope with concept drift: It trains online learners of the ensemble, it weights those learners based on their performance, it removes them, also based on their performance, and it adds new experts based on the global performance of the ensemble. After an extensive evaluation— consisting of five experiments, eight learners, and thirty data sets that varied in type of target concept, size, presence of noise, and the like—we concluded that DWM outperformed other learners that only incrementally learn concept descriptions, that maintain and use previously encountered examples, and that employ an unweighted, fixed-size ensemble of experts. Keywords: concept learning, online learning, ensemble methods, concept drift
2 0.73568839 25 jmlr-2007-Covariate Shift Adaptation by Importance Weighted Cross Validation
Author: Masashi Sugiyama, Matthias Krauledat, Klaus-Robert Müller
Abstract: A common assumption in supervised learning is that the input points in the training set follow the same probability distribution as the input points that will be given in the future test phase. However, this assumption is not satisfied, for example, when the outside of the training region is extrapolated. The situation where the training input points and test input points follow different distributions while the conditional distribution of output values given input points is unchanged is called the covariate shift. Under the covariate shift, standard model selection techniques such as cross validation do not work as desired since its unbiasedness is no longer maintained. In this paper, we propose a new method called importance weighted cross validation (IWCV), for which we prove its unbiasedness even under the covariate shift. The IWCV procedure is the only one that can be applied for unbiased classification under covariate shift, whereas alternatives to IWCV exist for regression. The usefulness of our proposed method is illustrated by simulations, and furthermore demonstrated in the brain-computer interface, where strong non-stationarity effects can be seen between training and test sessions. Keywords: covariate shift, cross validation, importance sampling, extrapolation, brain-computer interface
3 0.38034278 48 jmlr-2007-Learning in Environments with Unknown Dynamics: Towards more Robust Concept Learners
Author: Marlon Núñez, Raúl Fidalgo, Rafael Morales
Abstract: In the process of concept learning, target concepts may have portions with short-term changes, other portions may support long-term changes, and yet others may not change at all. For this reason several local windows need to be handled. We suggest facing this problem, which naturally exists in the field of concept learning, by allocating windows which can adapt their size to portions of the target concept. We propose an incremental decision tree that is updated with incoming examples. Each leaf of the decision tree holds a time window and a local performance measure as the main parameter to be controlled. When the performance of a leaf decreases, the size of its local window is reduced. This learning algorithm, called OnlineTree2, automatically adjusts its internal parameters in order to face the current dynamics of the data stream. Results show that it is comparable to other batch algorithms when facing problems with no concept change, and it is better than evaluated methods in its ability to deal with concept drift when dealing with problems in which: concept change occurs at different speeds, noise may be present and, examples may arrive from different areas of the problem domain (virtual drift). Keywords: incremental algorithms, online learning, concept drift, decision trees, robust learners
4 0.35334483 8 jmlr-2007-A Unified Continuous Optimization Framework for Center-Based Clustering Methods
Author: Marc Teboulle
Abstract: Center-based partitioning clustering algorithms rely on minimizing an appropriately formulated objective function, and different formulations suggest different possible algorithms. In this paper, we start with the standard nonconvex and nonsmooth formulation of the partitioning clustering problem. We demonstrate that within this elementary formulation, convex analysis tools and optimization theory provide a unifying language and framework to design, analyze and extend hard and soft center-based clustering algorithms, through a generic algorithm which retains the computational simplicity of the popular k-means scheme. We show that several well known and more recent center-based clustering algorithms, which have been derived either heuristically, or/and have emerged from intuitive analogies in physics, statistical techniques and information theoretic perspectives can be recovered as special cases of the proposed analysis and we streamline their relationships. Keywords: clustering, k-means algorithm, convex analysis, support and asymptotic functions, distance-like functions, Bregman and Csiszar divergences, nonlinear means, nonsmooth optimization, smoothing algorithms, fixed point methods, deterministic annealing, expectation maximization, information theory and entropy methods
5 0.31756592 66 jmlr-2007-Penalized Model-Based Clustering with Application to Variable Selection
Author: Wei Pan, Xiaotong Shen
Abstract: Variable selection in clustering analysis is both challenging and important. In the context of modelbased clustering analysis with a common diagonal covariance matrix, which is especially suitable for “high dimension, low sample size” settings, we propose a penalized likelihood approach with an L1 penalty function, automatically realizing variable selection via thresholding and delivering a sparse solution. We derive an EM algorithm to fit our proposed model, and propose a modified BIC as a model selection criterion to choose the number of components and the penalization parameter. A simulation study and an application to gene function prediction with gene expression profiles demonstrate the utility of our method. Keywords: BIC, EM, mixture model, penalized likelihood, soft-thresholding, shrinkage
6 0.31353909 81 jmlr-2007-The Locally Weighted Bag of Words Framework for Document Representation
7 0.31030601 5 jmlr-2007-A Nonparametric Statistical Approach to Clustering via Mode Identification
8 0.30891645 32 jmlr-2007-Euclidean Embedding of Co-occurrence Data
9 0.30682555 17 jmlr-2007-Building Blocks for Variational Bayesian Learning of Latent Variable Models
10 0.30311656 7 jmlr-2007-A Stochastic Algorithm for Feature Selection in Pattern Recognition
11 0.29896265 39 jmlr-2007-Handling Missing Values when Applying Classification Models
12 0.29766929 19 jmlr-2007-Classification in Networked Data: A Toolkit and a Univariate Case Study
13 0.29523388 62 jmlr-2007-On the Effectiveness of Laplacian Normalization for Graph Semi-supervised Learning
14 0.29512423 33 jmlr-2007-Fast Iterative Kernel Principal Component Analysis
15 0.29456013 58 jmlr-2007-Noise Tolerant Variants of the Perceptron Algorithm
16 0.29435736 28 jmlr-2007-Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data
17 0.29335582 59 jmlr-2007-Nonlinear Boosting Projections for Ensemble Construction
18 0.29186991 49 jmlr-2007-Learning to Classify Ordinal Data: The Data Replication Method
20 0.29013917 88 jmlr-2007-Unlabeled Compression Schemes for Maximum Classes