nips nips2010 nips2010-2 knowledge-graph by maker-knowledge-mining

2 nips-2010-A Bayesian Approach to Concept Drift

Source: pdf

Author: Stephen Bach, Mark Maloof

Abstract: To cope with concept drift, we placed a probability distribution over the location of the most-recent drift point. We used Bayesian model comparison to update this distribution from the predictions of models trained on blocks of consecutive observations and pruned potential drift points with low probability. We compare our approach to a non-probabilistic method for drift and a probabilistic method for change-point detection. In our experiments, our approach generally yielded improved accuracy and/or speed over these other methods. 1

Reference: text

Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 edu Abstract To cope with concept drift, we placed a probability distribution over the location of the most-recent drift point. [sent-5, score-0.509]

2 We used Bayesian model comparison to update this distribution from the predictions of models trained on blocks of consecutive observations and pruned potential drift points with low probability. [sent-6, score-0.373]

3 We compare our approach to a non-probabilistic method for drift and a probabilistic method for change-point detection. [sent-7, score-0.268]

4 A common problem in online classiﬁcation tasks is concept drift, which is when the target concept changes over time. [sent-11, score-0.462]

5 If the correct label for some x is y1 at time step t1 and y2 at time step t2 , does this indicate concept drift or that the training examples are noisy? [sent-13, score-0.534]

6 Researchers have approached drift in a number of ways. [sent-14, score-0.249]

7 A probabilistic model of drift offers three main beneﬁts to the research community. [sent-25, score-0.268]

8 Second, probability theory is a well-developed theory that could offer new insights into the problem of concept drift. [sent-27, score-0.18]

9 In this paper we present a probabilistic model of the number of most-recent training examples that the active concept describes. [sent-30, score-0.237]

10 We then describe a Bayesian approach to concept drift. [sent-37, score-0.18]

11 Finally, we show the results of an empirical comparison among our method (pruned and unpruned), BMC for change points, and Dynamic Weighted Majority [5], an ensemble method for concept drift. [sent-38, score-0.313]

12 Fearnhead and Liu [13] introduced an online version of Fearnhead’s simulation method [12] which uses particle ﬁltering to quickly update the distribution over change points. [sent-53, score-0.082]

13 2 A method for online Bayesian change-point detection Adams and MacKay [14] proposed maintaining a discrete distribution over lt , the length in time steps of the longest substrings of observations that are identically distributed, ending at time step t. [sent-57, score-0.54]

14 This method therefore models the location of only the most recent change point, a cost-saving measure useful for many online problems. [sent-58, score-0.095]

15 A conditional prior distribution p(lt |lt−1 ) is used, such that   λ−1 if lt = 0; p(lt |lt−1 ) = 1 − λ−1 if lt = lt−1 + 1;  0 otherwise. [sent-59, score-0.731]

16 The crucial aspect is that, given that a substring is identically distributed, it assigns mass to only two outcomes: the next observation is distributed identically to the observations of the substring, or it is the ﬁrst of a new substring. [sent-61, score-0.117]

18 In other words, it is the predictive probability 2 of a model trained on the observations received from time steps t − lt to t − 1. [sent-73, score-0.458]

19 Once this posterior distribution p(lt |D1:t ) is calculated, each model in the ensemble is trained on the new observation. [sent-75, score-0.165]

20 3 Comparing conditional distributions for concept drift We propose a new approach to coping with concept drift. [sent-77, score-0.653]

21 Using [14] as a starting point, we place a distribution over lt , which now refers to the length in time steps that the currently active concept has been active. [sent-79, score-0.607]

22 There is now an important distinction between BMC for concept drift and BMC for change points: BMC for concept drift models changes in p(Y |X), whereas BMC for change points models changes in the joint distribution p(Y, X). [sent-80, score-1.036]

23 We use the conditional distribution to look for drift points because we do not wish to react to changes in the marginal distribution p(X). [sent-81, score-0.402]

24 A change point in the joint distribution p(Y, X) could correspond to a change point in p(X), a drift point in p(Y |X), or both. [sent-82, score-0.307]

25 In other words, we assume that neither the sequence of attribute values X1:t nor the sequence of class labels Y1:t alone provide information about lt . [sent-84, score-0.376]

26 We also assume that examples from different concepts are independent. [sent-86, score-0.112]

27 p(Yt |Y1:t−1 , X1:t ) (3) To classify unlabeled attribute values X with class label Y , the predictive distribution is t p(Y |X) = p(Y |X, Y1:t , X1:t , lt = i)p(lt = i). [sent-89, score-0.411]

28 If left unchecked, the size of its ensemble will grow linearly with the number of observations. [sent-91, score-0.112]

29 We hypothesized that looking for drift points in the conditional distribution p(Y |X) instead of change points in the joint distribution p(Y, X) would lead to higher accuracy on classiﬁcation tasks. [sent-98, score-0.401]

30 It is identical to BCMC, except that it uses Equation 2 to compute the posterior over lt , where D ≡ (Y, X). [sent-100, score-0.366]

31 We also hypothesized that PBCMC could achieve improved combinations of accuracy and speed compared to Dynamic Weighted Majority (DWM) [5], an ensemble method for concept drift that uses a heuristic weighting scheme and pruning. [sent-101, score-0.593]

32 Like the other learners, DWM maintains a dynamically-sized, weighted ensemble of models trained on blocks of examples. [sent-103, score-0.17]

33 Then if the algorithm’s global prediction was incorrect, it adds a new model to the ensemble with a weight of 1, and it removes any models with weights below a threshold θ. [sent-106, score-0.146]

34 1 Test problems We conducted our experiments using four problems previously used in the literature to evaluate methods for concept drift The STAGGER concepts [1, 3] are three target concepts in a binary classiﬁcation task presented over 120 time steps. [sent-109, score-0.691]

35 For the ﬁrst 40 time steps, the target concept is color = red ∧ size = small. [sent-111, score-0.242]

36 For the next 40 time steps, the target concept is color = green ∨ shape = circle. [sent-112, score-0.242]

37 Finally, for the last 40 time steps, the target concept is size = medium ∨ size = large. [sent-113, score-0.259]

38 A number of researchers have used this problem to evaluate methods for concept drift [4, 5, 3, 1]. [sent-114, score-0.453]

39 Per the problem’s usual formulation, we evaluated each learner by presenting it with a single, random example at each time step and then testing it on a set of 100 random examples, resampled after each time step. [sent-115, score-0.146]

40 The SEA concepts [8] are four target concepts in a binary classiﬁcation task, presented over 50,000 time steps. [sent-117, score-0.244]

41 The target concept changes every 12,500 time steps, and associated with each concept is a single, randomly generated test set of 2,500 examples. [sent-118, score-0.456]

42 At each time step, a learner is presented with a randomly generated example, which has a 10% chance of being labeled as the wrong class. [sent-119, score-0.104]

43 Every 100 time steps, the learner is tested on the active concept’s test set. [sent-120, score-0.143]

44 The target concepts are hyperplanes, such that y = + if x1 + x2 ≤ θ, where θ ∈ {7, 8, 9, 9. [sent-125, score-0.13]

45 Several researchers have used a shifting hyperplane to evaluate learners for concept drift [5, 6, 7, 2, 8]. [sent-128, score-0.495]

46 Each learner was tested on the 1,685 examples for User 1. [sent-134, score-0.124]

47 At each time step, the learner was presented the next example without its label. [sent-135, score-0.104]

48 The task is to predict whether the price of electricity will go up or down based on ﬁve numeric attributes: the day of the week, the 30-minute period of the day, the demand for electricity in New South Wales, the demand in Victoria, and the amount of electricity to be transferred between the two. [sent-138, score-0.37]

49 At each time step, the learner classiﬁed the next example in temporal order before being given the correct label and using it to learn. [sent-140, score-0.104]

50 For STAGGER and SEA, we measured accuracy on the test set, then computed average accuracy and 95% conﬁdence intervals at each time step. [sent-144, score-0.122]

51 We present both AUC under the entire curve and after the ﬁrst drift point to show both a learner’s overall performance and its performance after drift occurs. [sent-147, score-0.498]

52 For CAP and electricity prediction, we measured accuracy on the unlabeled observations. [sent-148, score-0.127]

53 It takes the Bayesian approach to probabilities (hence the additional “Bayes” in the name), meaning that it places distributions over the parameters that govern 4 Table 1: Results for (a) the STAGGER concepts and (b) the SEA concepts. [sent-155, score-0.091]

54 (a) STAGGER concepts AUC AUC (overall) (after drift) 0. [sent-156, score-0.091]

55 011 Learner and Parameters BNB , on each concept PBCMC , λ = 20, φ = 10−4 BCMC , λ = 20 BMC , λ = 50 DWM , β = 0. [sent-180, score-0.18]

56 5, θ = 10−4 BNB , on all examples (b) SEA concepts AUC BNB , on each concept DWM , β = 0. [sent-181, score-0.292]

57 We also tested BNB as a control to show the effects of not attempting to cope with drift and BNB trained using only examples from the active concept (when such information was available) to show possible accuracy given perfect information about drift. [sent-218, score-0.565]

58 Parameter selection is difﬁcult when evaluating methods for concept drift. [sent-219, score-0.18]

59 We therefore tested each learner on each problem using each of a set of values for each parameter. [sent-221, score-0.103]

60 5 Table 2: Accuracy on the CAP and electricity data sets. [sent-230, score-0.091]

61 On the SEA concepts, DWM was the top performer, matching the accuracy of BNB trained on each concept and outperforming all the other learner methods. [sent-267, score-0.316]

62 Table 2 shows the top results for the CAP and electricity data sets. [sent-269, score-0.091]

63 DWM performed the best on the location and duration data sets, while BCMC performed best on the start time and day-of-week data sets. [sent-270, score-0.097]

64 PBCMC matched the accuracy of BCMC on the day-of-week and duration data sets and came close to it on the others. [sent-271, score-0.083]

65 BCMC performed the best on the electricity data set, closely followed by PBCMC. [sent-273, score-0.091]

66 The ﬁrst conclusion is clear: looking for changes in the conditional distribution p(Y |X) led to better accuracy than looking for changes in the joint distribution p(Y, X). [sent-274, score-0.211]

67 For both PBCMC and DWM, some of the most reactive parameterizations we tested were optimal on STAGGER and electricity, but some of the most stable were optimal on SEA and CAP. [sent-289, score-0.196]

68 For each problem, almost all of the parameterizations of the top learner were more accurate than almost all of the parameterizations of the other. [sent-291, score-0.151]

69 This indicates that PBCMC was generally better for the concepts which favor reactivity, whereas DWM was generally better for the concepts which favor stability. [sent-292, score-0.236]

70 9, θ = 10 86 20 0 20 40 60 Time Step (t) 80 100 120 0 (a) 12500 25000 Time Step (t) 37500 50000 (b) Figure 1: Average accuracy on (a) the STAGGER concepts and (b) the SEA concepts. [sent-302, score-0.127]

71 better performing learners in each problem were faster to react to concept drift. [sent-304, score-0.273]

72 This shows that DWM did not perform better on SEA simply by being more stable whether the concept was or not. [sent-305, score-0.217]

73 On the SEA concepts, PBCMC did perform best with the most stable parameterization we tried, but its main problem was that it wasn’t reactive enough when drift occurred. [sent-306, score-0.417]

74 Perhaps we can achieve better performances by using a more reactive parameterization of DWM on certain problems and/or a more stable parameterization of PBCMC on other problems. [sent-308, score-0.197]

75 For the problems on which PBCMC was superior, DWM’s best results were not obtained using the most reactive parameterization. [sent-310, score-0.102]

76 In other words, simply using an even more reactive parameterization of DWM did not improve performance on these problems. [sent-311, score-0.131]

77 Further, on the duration problem in the CAP data sets, PBCMC also achieved the reported accuracy using λ = 5000 and φ = 10−2 , and on the location problem it acheived negligibly better accuracy using λ = 5000 and φ = 10−3 or φ = 10−4 . [sent-312, score-0.146]

78 PBCMC favors reactivity by adding a new model at every time step and decaying the weights of all models by the degree to which they are incorrect. [sent-318, score-0.165]

79 DWM favors stability by only adding a new model after incorrect overall predictions and only decaying weights of incorrect models, and then only by a constant factor. [sent-319, score-0.086]

80 This is supported by the results on problems favoring reactive parameterizations compared with the results on problems favoring stable parameterizations. [sent-320, score-0.208]

81 Further, that it is difﬁcult to close the performance gaps with better parameter selection suggests that there is a range of reactivity or stability each favors. [sent-321, score-0.139]

82 When parameterized beyond this range, the performance of each learner degrades, or at least plateaus. [sent-322, score-0.081]

83 To further support this theory, we consider trends in ensemble sizes. [sent-323, score-0.112]

84 The ﬁgure shows that the trends in ensemble sizes were roughly interchanged between the two learners on the two problems. [sent-326, score-0.154]

85 On both problems, one learner stayed within a relatively small range of ensemble sizes, whereas the other continued to expand the ensemble when the concept was stable, only signiﬁcantly pruning soon after drift. [sent-327, score-0.501]

86 On STAGGER , PBCMC expanded its ensemble size far more, whereas DWM did on SEA . [sent-328, score-0.128]

87 9, θ = 10 1400 60 1200 50 Ensemble size Ensemble size 1000 40 30 800 600 20 400 10 200 0 0 0 20 40 60 Time Step (t) 80 100 120 0 (a) 12500 25000 Time Step (t) 37500 50000 (b) Figure 2: Average numbers of models on (a) the STAGGER concepts and (b) the SEA concepts. [sent-333, score-0.109]

88 its ensemble more than when it is not as likely. [sent-335, score-0.112]

89 On STAGGER, PBCMC matched the performance of BNB on the ﬁrst target concept (not shown), whereas DWM made more mistakes as it reacted to erroneously inferred drift. [sent-339, score-0.26]

90 On SEA, PBCMC needs to be parameterized to be so stable that it cannot react quickly to drift. [sent-340, score-0.104]

91 5 Conclusion and Future Work In this paper we presented a Bayesian approach to coping with concept drift. [sent-341, score-0.205]

92 We showed that looking for changes in the conditional distribution p(Y |X) led to better accuracy than looking for changes in the joint distribution p(Y, X). [sent-343, score-0.211]

93 We also showed that our Bayesian approach is competitive with one of the top ensemble methods for concept drift, DWM, sometimes beating and sometimes losing to it. [sent-344, score-0.326]

94 We showed that both PBCMC and DWM appear to favor a different range of reactivity or stability. [sent-346, score-0.108]

95 Related to this task is a better characterization of their relative advantages and the relationships among them, their favored ranges of reactivity or stability, and the problems to which they are applied. [sent-348, score-0.111]

96 It also important to note that the more constrained ensemble sizes discussed above correspond to faster classiﬁcation speeds. [sent-349, score-0.112]

97 With a useful probabilistic model for concept drift, such as ours, one could potentially incorporate existing probabilistic domain knowledge to guide the search for drift points or build broader models that use beliefs about drift to guide decision making. [sent-352, score-0.75]

98 Learning in the presence of concept drift and hidden contexts. [sent-374, score-0.429]

99 Dynamic weighted majority: An ensemble method for drifting concepts. [sent-389, score-0.112]

100 Using additive expert ensembles to cope with concept drift. [sent-396, score-0.22]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('pbcmc', 0.533), ('dwm', 0.444), ('lt', 0.348), ('drift', 0.249), ('bcmc', 0.19), ('bnb', 0.19), ('stagger', 0.19), ('sea', 0.184), ('concept', 0.18), ('bmc', 0.142), ('ensemble', 0.112), ('reactive', 0.102), ('concepts', 0.091), ('electricity', 0.091), ('reactivity', 0.089), ('learner', 0.081), ('cap', 0.067), ('auc', 0.054), ('react', 0.051), ('duration', 0.047), ('fearnhead', 0.045), ('learners', 0.042), ('target', 0.039), ('bayesian', 0.038), ('stable', 0.037), ('accuracy', 0.036), ('day', 0.035), ('adams', 0.035), ('parameterizations', 0.035), ('changes', 0.034), ('identically', 0.033), ('dt', 0.033), ('parameterization', 0.029), ('online', 0.029), ('barry', 0.029), ('changepoint', 0.029), ('mackay', 0.029), ('stability', 0.028), ('attribute', 0.028), ('looking', 0.028), ('location', 0.027), ('intervals', 0.027), ('observations', 0.026), ('attributes', 0.025), ('coping', 0.025), ('maloof', 0.025), ('reacted', 0.025), ('schlimmer', 0.025), ('substring', 0.025), ('victoria', 0.025), ('mining', 0.025), ('researchers', 0.024), ('pruned', 0.024), ('time', 0.023), ('demand', 0.023), ('steps', 0.023), ('favored', 0.022), ('georgetown', 0.022), ('performer', 0.022), ('regularly', 0.022), ('gaps', 0.022), ('dence', 0.022), ('tested', 0.022), ('incorrect', 0.021), ('examples', 0.021), ('cope', 0.021), ('blocks', 0.021), ('change', 0.021), ('acm', 0.02), ('assign', 0.02), ('ensembles', 0.019), ('bayes', 0.019), ('favor', 0.019), ('trained', 0.019), ('predictive', 0.019), ('probabilistic', 0.019), ('conditional', 0.019), ('step', 0.019), ('posterior', 0.018), ('sigkdd', 0.018), ('conducted', 0.018), ('week', 0.018), ('models', 0.018), ('medium', 0.017), ('wish', 0.017), ('active', 0.017), ('sometimes', 0.017), ('favoring', 0.017), ('weights', 0.016), ('placed', 0.016), ('distribution', 0.016), ('hypothesized', 0.016), ('numeric', 0.016), ('seventh', 0.016), ('classi', 0.016), ('whereas', 0.016), ('quickly', 0.016), ('beliefs', 0.016), ('marginalizing', 0.016), ('ny', 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 2 nips-2010-A Bayesian Approach to Concept Drift

Author: Stephen Bach, Mark Maloof

2 0.069901668 165 nips-2010-MAP estimation in Binary MRFs via Bipartite Multi-cuts

Author: Sashank J. Reddi, Sunita Sarawagi, Sundar Vishwanathan

Abstract: We propose a new LP relaxation for obtaining the MAP assignment of a binary MRF with pairwise potentials. Our relaxation is derived from reducing the MAP assignment problem to an instance of a recently proposed Bipartite Multi-cut problem where the LP relaxation is guaranteed to provide an O(log k) approximation where k is the number of vertices adjacent to non-submodular edges in the MRF. We then propose a combinatorial algorithm to efﬁciently solve the LP and also provide a lower bound by concurrently solving its dual to within an approximation. The algorithm is up to an order of magnitude faster and provides better MAP scores and bounds than the state of the art message passing algorithm of [1] that tightens the local marginal polytope with third-order marginal constraints. 1

3 0.057619136 239 nips-2010-Sidestepping Intractable Inference with Structured Ensemble Cascades

Author: David Weiss, Benjamin Sapp, Ben Taskar

Abstract: For many structured prediction problems, complex models often require adopting approximate inference techniques such as variational methods or sampling, which generally provide no satisfactory accuracy guarantees. In this work, we propose sidestepping intractable inference altogether by learning ensembles of tractable sub-models as part of a structured prediction cascade. We focus in particular on problems with high-treewidth and large state-spaces, which occur in many computer vision tasks. Unlike other variational methods, our ensembles do not enforce agreement between sub-models, but ﬁlter the space of possible outputs by simply adding and thresholding the max-marginals of each constituent model. Our framework jointly estimates parameters for all models in the ensemble for each level of the cascade by minimizing a novel, convex loss function, yet requires only a linear increase in computation over learning or inference in a single tractable sub-model. We provide a generalization bound on the ﬁltering loss of the ensemble as a theoretical justiﬁcation of our approach, and we evaluate our method on both synthetic data and the task of estimating articulated human pose from challenging videos. We ﬁnd that our approach signiﬁcantly outperforms loopy belief propagation on the synthetic data and a state-of-the-art model on the pose estimation/tracking problem. 1

4 0.054215215 138 nips-2010-Large Margin Multi-Task Metric Learning

Author: Shibin Parameswaran, Kilian Q. Weinberger

Abstract: Multi-task learning (MTL) improves the prediction performance on multiple, different but related, learning problems through shared parameters or representations. One of the most prominent multi-task learning algorithms is an extension to support vector machines (svm) by Evgeniou et al. [15]. Although very elegant, multi-task svm is inherently restricted by the fact that support vector machines require each class to be addressed explicitly with its own weight vector which, in a multi-task setting, requires the different learning tasks to share the same set of classes. This paper proposes an alternative formulation for multi-task learning by extending the recently published large margin nearest neighbor (lmnn) algorithm to the MTL paradigm. Instead of relying on separating hyperplanes, its decision function is based on the nearest neighbor rule which inherently extends to many classes and becomes a natural ﬁt for multi-task learning. We evaluate the resulting multi-task lmnn on real-world insurance data and speech classiﬁcation problems and show that it consistently outperforms single-task kNN under several metrics and state-of-the-art MTL classiﬁers. 1

5 0.053620297 33 nips-2010-Approximate inference in continuous time Gaussian-Jump processes

Author: Manfred Opper, Andreas Ruttor, Guido Sanguinetti

Abstract: We present a novel approach to inference in conditionally Gaussian continuous time stochastic processes, where the latent process is a Markovian jump process. We ﬁrst consider the case of jump-diffusion processes, where the drift of a linear stochastic differential equation can jump at arbitrary time points. We derive partial differential equations for exact inference and present a very efﬁcient mean ﬁeld approximation. By introducing a novel lower bound on the free energy, we then generalise our approach to Gaussian processes with arbitrary covariance, such as the non-Markovian RBF covariance. We present results on both simulated and real data, showing that the approach is very accurate in capturing latent dynamics and can be useful in a number of real data modelling tasks.

6 0.051102743 222 nips-2010-Random Walk Approach to Regret Minimization

7 0.050336376 150 nips-2010-Learning concept graphs from text with stick-breaking priors

8 0.041589402 47 nips-2010-Co-regularization Based Semi-supervised Domain Adaptation

9 0.04028618 1 nips-2010-(RF)^2 -- Random Forest Random Field

10 0.039557133 124 nips-2010-Inductive Regularized Learning of Kernel Functions

11 0.038136076 261 nips-2010-Supervised Clustering

12 0.034860924 167 nips-2010-Mixture of time-warped trajectory models for movement decoding

13 0.034349553 27 nips-2010-Agnostic Active Learning Without Constraints

14 0.031967238 228 nips-2010-Reverse Multi-Label Learning

15 0.031493548 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach

16 0.030087737 130 nips-2010-Interval Estimation for Reinforcement-Learning Algorithms in Continuous-State Domains

17 0.030061791 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability

18 0.030037303 21 nips-2010-Accounting for network effects in neuronal responses using L1 regularized point process models

19 0.02934357 274 nips-2010-Trading off Mistakes and Don't-Know Predictions

20 0.028966319 36 nips-2010-Avoiding False Positive in Multi-Instance Learning

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.093), (1, 0.011), (2, 0.002), (3, -0.0), (4, -0.03), (5, 0.035), (6, -0.01), (7, -0.032), (8, 0.021), (9, 0.014), (10, -0.009), (11, -0.006), (12, 0.02), (13, -0.005), (14, 0.02), (15, 0.044), (16, -0.027), (17, 0.014), (18, -0.029), (19, 0.035), (20, -0.039), (21, 0.049), (22, 0.009), (23, 0.075), (24, -0.034), (25, -0.014), (26, -0.017), (27, -0.003), (28, -0.006), (29, -0.037), (30, 0.047), (31, 0.012), (32, -0.046), (33, -0.07), (34, 0.01), (35, -0.033), (36, -0.015), (37, 0.044), (38, 0.018), (39, -0.006), (40, -0.001), (41, 0.037), (42, -0.082), (43, 0.024), (44, -0.002), (45, 0.007), (46, 0.001), (47, -0.012), (48, -0.036), (49, -0.142)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.85695505 2 nips-2010-A Bayesian Approach to Concept Drift

Author: Stephen Bach, Mark Maloof

2 0.54384226 129 nips-2010-Inter-time segment information sharing for non-homogeneous dynamic Bayesian networks

Author: Dirk Husmeier, Frank Dondelinger, Sophie Lebre

Abstract: Conventional dynamic Bayesian networks (DBNs) are based on the homogeneous Markov assumption, which is too restrictive in many practical applications. Various approaches to relax the homogeneity assumption have recently been proposed, allowing the network structure to change with time. However, unless time series are very long, this ﬂexibility leads to the risk of overﬁtting and inﬂated inference uncertainty. In the present paper we investigate three regularization schemes based on inter-segment information sharing, choosing different prior distributions and different coupling schemes between nodes. We apply our method to gene expression time series obtained during the Drosophila life cycle, and compare the predicted segmentation with other state-of-the-art techniques. We conclude our evaluation with an application to synthetic biology, where the objective is to predict a known in vivo regulatory network of ﬁve genes in yeast. 1

3 0.52042598 47 nips-2010-Co-regularization Based Semi-supervised Domain Adaptation

Author: Abhishek Kumar, Avishek Saha, Hal Daume

Abstract: This paper presents a co-regularization based approach to semi-supervised domain adaptation. Our proposed approach (EA++) builds on the notion of augmented space (introduced in E ASYA DAPT (EA) [1]) and harnesses unlabeled data in target domain to further assist the transfer of information from source to target. This semi-supervised approach to domain adaptation is extremely simple to implement and can be applied as a pre-processing step to any supervised learner. Our theoretical analysis (in terms of Rademacher complexity) of EA and EA++ show that the hypothesis class of EA++ has lower complexity (compared to EA) and hence results in tighter generalization bounds. Experimental results on sentiment analysis tasks reinforce our theoretical ﬁndings and demonstrate the efﬁcacy of the proposed method when compared to EA as well as few other representative baseline approaches.

4 0.47103444 33 nips-2010-Approximate inference in continuous time Gaussian-Jump processes

Author: Manfred Opper, Andreas Ruttor, Guido Sanguinetti

5 0.45037451 167 nips-2010-Mixture of time-warped trajectory models for movement decoding

Author: Elaine Corbett, Eric Perreault, Konrad Koerding

Abstract: Applications of Brain-Machine-Interfaces typically estimate user intent based on biological signals that are under voluntary control. For example, we might want to estimate how a patient with a paralyzed arm wants to move based on residual muscle activity. To solve such problems it is necessary to integrate obtained information over time. To do so, state of the art approaches typically use a probabilistic model of how the state, e.g. position and velocity of the arm, evolves over time – a so-called trajectory model. We wanted to further develop this approach using two intuitive insights: (1) At any given point of time there may be a small set of likely movement targets, potentially identified by the location of objects in the workspace or by gaze information from the user. (2) The user may want to produce movements at varying speeds. We thus use a generative model with a trajectory model incorporating these insights. Approximate inference on that generative model is implemented using a mixture of extended Kalman filters. We find that the resulting algorithm allows us to decode arm movements dramatically better than when we use a trajectory model with linear dynamics. 1 In trod u cti on When patients have lost a limb or the ability to communicate with the outside world, brain machine interfaces (BMIs) are often used to enable robotic prostheses or restore communication. To achieve this, the user's intended state of the device must be decoded from biological signals. In the context of Bayesian statistics, two aspects are important for the design of an estimator of a temporally evolving state: the observation model, which describes how measured variables relate to the system’s state and the trajectory model which describes how the state changes over time in a probabilistic manner. Following this logic many recent BMI applications have relied on Bayesian estimation for a wide range of problems including the decoding of intended human [1] and animal [2] movements. In the context of BMIs, Bayesian approaches offer a principled way of formalizing the uncertainty about signals and thus often result in improvements over other signal processing techniques [1]-[3]. Most work on state estimation in dynamical systems has assumed linear dynamics and Gaussian noise. Under these circumstances, efficient algorithms result from belief propagation. The most frequent application uses the Kalman filter (KF), which recursively combines noisy state observations with the probabilistic evolution of state defined by the trajectory model to estimate the marginal distribution over states [4]. Such approaches have been used widely for applications including upper [1] and lower [5] extremity prosthetic 1 devices, functional electric stimulation [6] and human computer interactions [7]. As these algorithms are so commonly used, it seems promising to develop extensions to nonlinear trajectory models that may better describe the probabilistic distribution of movements in everyday life. One salient departure from the standard assumptions is that people tend to produce both slow and fast movements, depending on the situation. Models with linear dynamics only allow such deviation through the noise term, which makes these models poor at describing the natural variation of movement speeds during real world tasks. Explicitly incorporating movement speed into the trajectory model should lead to better movement estimates. Knowledge of the target position should also strongly affect trajectory models. After all , we tend to accelerate our arm early during movement and slow down later on. Target information can be linearly incorporated into the trajectory model, and this has greatly improved predictions [8]-[12]. Alternatively, if there are a small number of potential targets then a mixture of trajectory models approach [13] can be used. Here we are interested in the case where available data provide a prior over potential t argets but where movement targets may be anywhere. We want to incorporate target uncertainty and allow generalization to novel targets. Prior information about potential targets could come from a number of sources but would generally be noisy. For example, activity in the dorsal premotor cortex provides information about intended target location prior to movement and may be used where such recordings are available [14]. Target information may also be found noninvasively by tracking eye movements. However, such data will generally provide non-zero priors for a number of possible target locations as the subject saccades over the scene. While subjects almost always look at a target before reaching for it [15], there may be a delay of up to a second between looking at the target and the reach – a time interval over which up to 3 saccades are typically made. Each of these fixations could be the target. Hence, a probabilistic distribution of targets is appropriate when using either neural recordings or eye tracking to estimate potential reach targets Here we present an algorithm that uses a mixture of extended Kalman Filters (EKFs) to combine our insights related to the variation of movement speed and the availability of probabilistic target knowledge. Each of the mixture component s allows the speed of the movement to vary continuously over time. We tested how well we could use EMGs and eye movements to decode hand position of humans performing a three -dimensional large workspace reaching task. We find that using a trajectory model that allows for probabilistic target information and variation of speed leads to dramatic improvements in decoding quality. 2 Gen e ral Decod i n g S etti n g We wanted to test how well different decoding algorithms can decode human movement, over a wide range of dynamics. While many recent studies have looked at more restrictive, two-dimensional movements, a system to restore arm function should produce a wide range of 3D trajectories. We recorded arm kinematics and EMGs of healthy subjects during unconstrained 3D reaches to targets over a large workspace. Two healthy subjects were asked to reach at slow, normal and fast speeds, as they would in everyday life. Subjects were seated as they reached towards 16 LEDs in blocks of 150s, which were located on two planes positioned such that all targets were just reachable (Fig 1A). The target LED was lit for one second prior to an auditory go cue, at which time the subject would reach to the target at the appropriate speed. Slow, normal and fast reaches were allotted 3 s, 1.5s and 1s respectively; however, subjects determined the speed. An approximate total of 450 reaches were performed per subject. The subjects provided informed consent, and the protocol was approved by the Northwestern University Institutional Review Board. EMG signals were measured from the pectoralis major, and the three deltoid muscles of the shoulder. This represents a small subset of the muscles involved in reaching, and approximates those muscles retaining some voluntary control following mid-level cervical spinal cord injuries. 2 The EMG signals were band-pass filtered between 10 and 1,000 Hz, and subsequently anti aliased filtered. Hand, wrist, shoulder and head positions were tracked using an Optotrak motion analysis system. We simultaneously recorded eye movements with an ASL EYETRAC-6 head mounted eye tracker. Approximately 25% of the reaches were assigned to the test set, and the rest were used for training. Reaches for which either the motion capture data was incomplete, or there was visible motion artifact on the EMG were removed. As the state we used hand positions and joint angles (3 shoulder, 2 elbow, position, velocity and acceleration, 24 dimensions). Joint angles were calculated from the shoulder and wrist marker data using digitized bony landmarks which defined a coordinate system for the upper limb as detailed by Wu et al. [16]. As the motion data were sampled at 60Hz, the mean absolute value o f the EMG in the corresponding 16.7ms windows was used as an observation of the state at each time-step. Algorithm accuracy was quantified by normalizing the root -mean-squared error by the straight line distance between the first and final position of the endpoint for each reach. We compared the algorithms statistically using repeated measures ANOVAs with Tukey post -hoc tests, treating reach and subject as random effects. In the rest of the paper we will ask how well these reaching movements can be decoded from EMG and eye-tracking data. Figure 1: A Experimental setup and B sample kinematics and processed EMGs for one reach 3 Kal man Fi l ters w i th Target i n f ormati on All models that we consider in this paper assume linear observations with Gaussian noise: (1) where x is the state, y is the observation and v is the measurement noise with p(v) ~ N(0,R), and R is the observation covariance matrix. The model fitted the measured EMGs with an average r2 of 0.55. This highlights the need to integrate information over time. The standard approach also assumes linear dynamics and Gaussian process noise: (2) where, x t represents the hand and joint angle positions, w is the process noise with p(w) ~ N(0,Q), and Q is the state covariance matrix. The Kalman filter does optimal inference for this generative model. This model can effectively capture the dynamics of stereotypical reaches to a single target by appropriately tuning its parameters. However, when used to describe reaches to multiple targets, the model cannot describe target dependent aspects of reaching but boils down to a random drift model. Fast velocities are underestimated as they are unlikely under the trajectory model and there is excessive drift close to the target (Fig. 2A). 3 In many decoding applications we may know the subject’s target. A range of recent studies have addressed the issue of incorporating this information into the trajectory model [8, 13], and we might assume the effect of the target on the dynamics to be linear. This naturally suggests adding the target to the state space, which works well in practice [9, 12]. By appending the target to the state vector (KFT), the simple linear format of the KF may be retained: (3) where xTt is the vector of target positions, with dimensionality less than or equal to that of xt. This trajectory model thus allows describing both the rapid acceleration that characterizes the beginning of a reach and the stabilization towards its end. We compared the accuracy of the KF and the KFT to the Single Target Model (STM), a KF trained only on reaches to the target being tested (Fig. 2). The STM represents the best possible prediction that could be obtained with a Kalman filter. Assuming the target is perfectly known, we implemented the KFT by correctly initializing the target state xT at the beginning of the reach. We will relax this assumption below. The initial hand and joint angle positions were also assumed to be known. Figure 2: A Sample reach and predictions and B average accuracies with standard errors for KFT, KF and MTM. Consistent with the recent literature, both methods that incorporated target information produced higher prediction accuracy than the standard KF (both p<0.0001). Interestingly, there was no significant difference between the KFT and the STM (p=0.9). It seems that when we have knowledge of the target, we do not lose much by training a single model over the whole workspace rather than modeling the targets individually. This is encouraging, as we desire a BMI system that can generalize to any target within the workspace, not just specifically to those that are available in the training data. Clearly, adding the target to the state space allows the dynamics of typical movements to be modeled effectively, resulting in dramatic increases in decoding performance. 4 Ti me Warp i n g 4.1 I m p l e m e n t i n g a t i m e - w a r p e d t r a j e c t o r y mo d e l While the KFT above can capture the general reach trajectory profile, it does not allow for natural variability in the speed of movements. Depending on our task objectives, which would not directly be observed by a BMI, we might lazily reach toward a target or move a t maximal speed. We aim to change the trajectory model to explicitly incorporate a warping factor by which the average movement speed is scaled, allowing for such variability. As the movement speed will be positive in all practical cases, we model the logarithm of this factor, 4 and append it to the state vector: (4) We create a time-warped trajectory model by noting that if the average rate of a trajectory is to be scaled by a factor S, the position at time t will equal that of the original trajectory at time St. Differentiating, the velocity will be multiplied by S, and the acceleration by S 2. For simplicity, the trajectory noise is assumed to be additive and Gaussian, and the model is assumed to be stationary: (5) where Ip is the p-dimensional identity matrix and is a p p matrix of zeros. Only the terms used to predict the acceleration states need to be estimated to build the state transition matrix, and they are scaled as a nonlinear function of xs. After adding the variable movement speed to the state space the system is no longer linear. Therefore we need a different solution strategy. Instead of the typical KFT we use the Extended Kalman Filter (EKFT) to implement a nonlinear trajectory model by linearizing the dynamics around the best estimate at each time-step [17]. With this approach we add only small computational overhead to the KFT recursions. 4.2 Tr a i n i n g t h e t i m e w a r p i n g mo d e l The filter parameters were trained using a variant of the Expectation Maximization (EM) algorithm [18]. For extended Kalman filter learning the initialization for the variables may matter. S was initialized with the ground truth average reach speeds for each movement relative to the average speed across all movements. The state transition parameters were estimated using nonlinear least squares regression, while C, Q and R were estimated linearly for the new system, using the maximum likelihood solution [18] (M-step). For the E-step we used a standard extended Kalman smoother. We thus found the expected values for t he states given the current filter parameters. For this computation, and later when testing the algorithm, xs was initialized to its average value across all reaches while the remaining states were initialized to their true values. The smoothed estimate fo r xs was then used, along with the true values for the other states, to re-estimate the filter parameters in the M-step as before. We alternated between the E and M steps until the log likelihood converged (which it did in all cases). Following the training procedure, the diagonal of the state covariance matrix Q corresponding to xs was set to the variance of the smoothed xs over all reaches, according to how much this state should be allowed to change during prediction. This allowed the estimate of xs to develop over the course of the reach due to the evidence provided by the observations, better capturing the dynamics of reaches at different speeds. 4.3 P e r f o r ma n c e o f t h e t i m e - w a r p e d E K F T Incorporating time warping explicitly into the trajectory model pro duced a noticeable increase in decoding performance over the KFT. As the speed state xs is estimated throughout the course of the reach, based on the evidence provided by the observations, the trajectory model has the flexibility to follow the dynamics of the reach more accurately (Fig. 3). While at the normal self-selected speed the difference between the algorithms is small, for the slow and fast speeds, where the dynamics deviate from average, there i s a clear advantage to the time warping model. 5 Figure 3: Hand positions and predictions of the KFT and EKFT for sample reaches at A slow, B normal and C fast speeds. Note the different time scales between reaches. The models were first trained using data from all speeds (Fig. 4A). The EKFT was 1.8% more accurate on average (p<0.01), and the effect was significant at the slow (1.9%, p<0.05) and the fast (2.8%, p<0.01), but not at the normal (p=0.3) speed. We also trained the models from data using only reaches at the self-selected normal speed, as we wanted to see if there was enough variation to effectively train the EKFT (Fig. 4B). Interestingly, the performance of the EKFT was reduced by only 0.6%, and the KFT by 1.1%. The difference in performance between the EKFT and KFT was even more pronounced on aver age (2.3%, p<0.001), and for the slow and fast speeds (3.6 and 4.1%, both p< 0.0001). At the normal speed, the algorithms again were not statistically different (p=0.6). This result demonstrates that the EKFT is a practical option for a real BMI system, as it is not necessary to greatly vary the speeds while collecting training data for the model to be effective over a wide range of intended speeds. Explicitly incorporating speed information into the trajectory model helps decoding, by modeling the natural variation in volitional speed. Figure 4: Mean and standard error of EKFT and KFT accuracy at the different subjectselected speeds. Models were trained on reaches at A all speeds and B just normal speed reaches. Asterisks indicate statistically significant differences between the algorithms. 5 Mi xtu res of Target s So far, we have assumed that the targets of our reaches are perfectly known. In a real-world system, there will be uncertainty about the intended target of the reach. However, in typical applications there are a small number of possible objectives. Here we address this situation. Drawing on the recent literature, we use a mixture model to consider each of the possible targets [11, 13]. We condition the posterior probability for the state on the N possible targets, T: (6) 6 Using Bayes' Rule, this equation becomes: (7) As we are dealing with a mixture model, we perform the Kalman filter recursion for each possible target, xT, and our solution is a weighted sum of the outputs. The weights are proportional to the prior for that target, , and the likelihood of the model given that target . is independent of the target and does not need to be calculated. We tested mixtures of both algorithms, the mKFT and mEKFT, with real uncert ain priors obtained from eye-tracking in the one-second period preceding movement. As the targets were situated on two planes, the three-dimensional location of the eye gaze was found by projecting its direction onto those planes. The first, middle and last eye samples were selected, and all other samples were assigned to a group according to which of the three was closest. The mean and variance of these three groups were used to initialize three Kalman filters in the mixture model. The priors of the three groups were assigned proportional to the number of samples in them. If the subject looks at multiple positions prior to reaching, this method ensures with a high probability that the correct target was accounted for in one of the filters in the mixture. We also compared the MTM approach of Yu et al. [13], where a different KF model was generated for each target, and a mixture is performed over these models. This approach explicitly captures the dynamics of stereotypical reaches to specific targets. Given perfect target information, it would reduce to the STM described above. Priors for the MTM were found by assigning each valid eye sample to its closest two targets, and weighting the models proportional to the number of samples assigned to the corresponding target, divided by its distance from the mean of those samples. We tried other ways of assigning priors and the one presented gave the best results. We calculated the reduction in decoding quality when instead of perfect priors we provide eye-movement based noisy priors (Fig. 5). The accuracies of the mEKFT, the mKFT and the MTM were only degraded by 0.8, 1.9 and 2.1% respectively, compared to the perfect prior situation. The mEKFT was still close to 10% better than the KF. The mixture model framework is effective in accounting for uncertain priors. Figure 5: Mean and standard errors of accuracy for algorithms with perfect priors, and uncertain priors with full and partial training set. The asterisk indicates a statistically significant effects between the two training types, where real priors are used. Here, only reaches at normal speed were used to train the models, as this is a more realistic training set for a BMI application. This accounts for the degraded performance of the MTM with perfect priors relative to the STM from above (Fig. 2). With even more stereotyped training data for each target, the MTM doesn't generalize as well to new speeds. 7 We also wanted to know if the algorithms could generalize to new targets. In a real application, the available training data will generally not span the entire useable worksp ace. We compared the algorithms where reaches to all targets except the one being tested had been used to train the models. The performance of the MTM was significantly de graded unsurprisingly, as it was designed for reaches to a set of known targets. Performance of the mKFT and mEKFT degraded by about 1%, but not significantly (both p>0.7), demonstrating that the continuous approach to target information is preferable when the target could be anywhere in space, not just at locations for which training data is available. 6 Di scu ssi on and concl u si on s The goal of this work was to design a trajectory model that would improve decoding for BMIs with an application to reaching. We incorporated two features that prominently influence the dynamics of natural reach: the movement speed and the target location. Our approach is appropriate where uncertain target information is available. The model generalizes well to new regions of the workspace for which there is no training data, and across a broad range of reaching dynamics to widely spaced targets in three dimensions. The advantages over linear models in decoding precision we report here could be equally obtained using mixtures over many targets and speeds. While mixture models [11, 13] could allow for slow versus fast movements and any number of potential targets, this strategy will generally require many mixture components. Such an approach would require a lot more training data, as we have shown that it does not generalize well. It would also be run-time intensive which is problematic for prosthetic devices that rely on low power controllers. In contrast, the algorithm introduced here only takes a small amount of additional run-time in comparison to the standard KF approach. The EKF is only marginally slower than the standard KF and the algorithm will not generally need to consider more than 3 mixture components assuming the subject fixates the target within the second pre ceding the reach. In this paper we assumed that subjects always would fixate a reach target – along with other non-targets. While this is close to the way humans usually coordinate eyes and reaches [15], there might be cases where people do not fixate a reach target. Our approach could be easily extended to deal with such situations by adding a dummy mixture component that all ows the description of movements to any target. As an alternative to mixture approaches, a system can explicitly estimate the target position in the state vector [9]. This approach, however, would not straightforwardly allow for the rich target information available; we look at the target but also at other locations, strongly suggesting mixture distributions. A combination of the two approaches could further improve decoding quality. We could both estimate speed and target position for the EKFT in a continuous manner while retaining the mixture over target priors. We believe that the issues that we have addressed here are almost universal. Virtually all types of movements are executed at varying speed. A probabilistic distribution for a small number of action candidates may also be expected in most BMI applications – after all there are usually only a small number of actions that make sense in a given environment. While this work is presented in the context of decoding human reaching, it may be applied to a wide range of BMI applications including lower limb prosthetic devices and human computer interactions, as well as different signal sources such as electrode grid recordings and electroencephalograms. The increased user control in conveying their intended movements would significantly improve the functionality of a neuroprosthetic device. A c k n o w l e d g e me n t s T h e a u t h o r s t h a n k T. H a s w e l l , E . K r e p k o v i c h , a n d V. Ravichandran for assistance with experiments. This work was funded by the NSF Program in Cyber-Physical Systems. R e f e re n c e s [1] L.R. Hochberg, M.D. Serruya, G.M. Friehs, J.A. Mukand, M. Saleh, A.H. Caplan, A. Branner, D. 8 [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] Chen, R.D. Penn, and J.P. Donoghue, “Neuronal ensemble control of prosthetic devices by a human with tetraplegia,” Nature, vol. 442, 2006, pp. 164–171. W. Wu, Y. Gao, E. Bienenstock, J.P. Donoghue, and M.J. Black, “Bayesian population decoding of motor cortical activity using a Kalman filter,” Neural Computation, vol. 18, 2006, pp. 80–118. W. Wu, M.J. Black, Y. Gao, E. Bienenstock, M. Serruya, A. Shaikhouni, and J.P. Donoghue, “Neural decoding of cursor motion using a Kalman filter,” Advances in Neural Information Processing Systems 15: Proceedings of the 2002 Conference, 2003, p. 133. R.E. Kalman, “A new approach to linear filtering and prediction problems,” Journal of basic Engineering, vol. 82, 1960, pp. 35–45. G.G. Scandaroli, G.A. Borges, J.Y. Ishihara, M.H. Terra, A.F.D. Rocha, and F.A.D.O. Nascimento, “Estimation of foot orientation with respect to ground for an above knee robotic prosthesis,” Proceedings of the 2009 IEEE/RSJ international conference on Intelligent robots and systems, St. Louis, MO, USA: IEEE Press, 2009, pp. 1112-1117. I. Cikajlo, Z. Matjačić, and T. Bajd, “Efficient FES triggering applying Kalman filter during sensory supported treadmill walking,” Journal of Medical Engineering & Technology, vol. 32, 2008, pp. 133144. S. Kim, J.D. Simeral, L.R. Hochberg, J.P. Donoghue, and M.J. Black, “Neural control of computer cursor velocity by decoding motor cortical spiking activity in humans with tetraplegia,” Journal of Neural Engineering, vol. 5, 2008, pp. 455-476. L. Srinivasan, U.T. Eden, A.S. Willsky, and E.N. Brown, “A state-space analysis for reconstruction of goal-directed movements using neural signals,” Neural computation, vol. 18, 2006, pp. 2465–2494. G.H. Mulliken, S. Musallam, and R.A. Andersen, “Decoding trajectories from posterior parietal cortex ensembles,” Journal of Neuroscience, vol. 28, 2008, p. 12913. W. Wu, J.E. Kulkarni, N.G. Hatsopoulos, and L. Paninski, “Neural Decoding of Hand Motion Using a Linear State-Space Model With Hidden States,” IEEE Transactions on neural systems and rehabilitation engineering, vol. 17, 2009, p. 1. J.E. Kulkarni and L. Paninski, “State-space decoding of goal-directed movements,” IEEE Signal Processing Magazine, vol. 25, 2008, p. 78. C. Kemere and T. Meng, “Optimal estimation of feed-forward-controlled linear systems,” IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005. Proceedings.(ICASSP'05), 2005. B.M. Yu, C. Kemere, G. Santhanam, A. Afshar, S.I. Ryu, T.H. Meng, M. Sahani, and K.V. Shenoy, “Mixture of trajectory models for neural decoding of goal-directed movements,” Journal of neurophysiology, vol. 97, 2007, p. 3763. N. Hatsopoulos, J. Joshi, and J.G. O'Leary, “Decoding continuous and discrete motor behaviors using motor and premotor cortical ensembles,” Journal of neurophysiology, vol. 92, 2004, p. 1165. R.S. Johansson, G. Westling, A. Backstrom, and J.R. Flanagan, “Eye-hand coordination in object manipulation,” Journal of Neuroscience, vol. 21, 2001, p. 6917. G. Wu, F.C. van der Helm, H.E.J. Veeger, M. Makhsous, P. Van Roy, C. Anglin, J. Nagels, A.R. Karduna, and K. McQuade, “ISB recommendation on definitions of joint coordinate systems of various joints for the reporting of human joint motion–Part II: shoulder, elbow, wrist and hand,” Journal of biomechanics, vol. 38, 2005, pp. 981–992. D. Simon, Optimal state estimation: Kalman, H [infinity] and nonlinear approaches, John Wiley and Sons, 2006. Z. Ghahramani and G.E. Hinton, “Parameter estimation for linear dynamical systems,” University of Toronto technical report CRG-TR-96-2, vol. 6, 1996. 9

6 0.45003697 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning

7 0.42903879 274 nips-2010-Trading off Mistakes and Don't-Know Predictions

8 0.41828266 171 nips-2010-Movement extraction by detecting dynamics switches and repetitions

9 0.41026872 257 nips-2010-Structured Determinantal Point Processes

10 0.4081693 24 nips-2010-Active Learning Applied to Patient-Adaptive Heartbeat Classification

11 0.39831859 104 nips-2010-Generative Local Metric Learning for Nearest Neighbor Classification

12 0.3937389 239 nips-2010-Sidestepping Intractable Inference with Structured Ensemble Cascades

13 0.39159232 215 nips-2010-Probabilistic Deterministic Infinite Automata

14 0.39077324 138 nips-2010-Large Margin Multi-Task Metric Learning

15 0.38815635 85 nips-2010-Exact learning curves for Gaussian process regression on large random graphs

16 0.38755101 62 nips-2010-Discriminative Clustering by Regularized Information Maximization

17 0.38709116 1 nips-2010-(RF)^2 -- Random Forest Random Field

18 0.38457507 112 nips-2010-Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning

19 0.38343963 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach

20 0.3828578 50 nips-2010-Constructing Skill Trees for Reinforcement Learning Agents from Demonstration Trajectories

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.051), (17, 0.019), (27, 0.062), (30, 0.031), (45, 0.19), (50, 0.048), (52, 0.012), (53, 0.363), (60, 0.037), (77, 0.047), (78, 0.011), (90, 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.73188275 226 nips-2010-Repeated Games against Budgeted Adversaries

Author: Jacob D. Abernethy, Manfred K. Warmuth

Abstract: We study repeated zero-sum games against an adversary on a budget. Given that an adversary has some constraint on the sequence of actions that he plays, we consider what ought to be the player’s best mixed strategy with knowledge of this budget. We show that, for a general class of normal-form games, the minimax strategy is indeed efﬁciently computable and relies on a “random playout” technique. We give three diverse applications of this new algorithmic template: a cost-sensitive “Hedge” setting, a particular problem in Metrical Task Systems, and the design of combinatorial prediction markets. 1

same-paper 2 0.71753311 2 nips-2010-A Bayesian Approach to Concept Drift

Author: Stephen Bach, Mark Maloof

3 0.65821773 90 nips-2010-Fast Large-scale Mixture Modeling with Component-specific Data Partitions

Author: Bo Thiesson, Chong Wang

Abstract: Remarkably easy implementation and guaranteed convergence has made the EM algorithm one of the most used algorithms for mixture modeling. On the downside, the E-step is linear in both the sample size and the number of mixture components, making it impractical for large-scale data. Based on the variational EM framework, we propose a fast alternative that uses component-speciﬁc data partitions to obtain a sub-linear E-step in sample size, while the algorithm still maintains provable convergence. Our approach builds on previous work, but is signiﬁcantly faster and scales much better in the number of mixture components. We demonstrate this speedup by experiments on large-scale synthetic and real data. 1

4 0.65667146 208 nips-2010-Policy gradients in linearly-solvable MDPs

Author: Emanuel Todorov

Abstract: We present policy gradient results within the framework of linearly-solvable MDPs. For the ﬁrst time, compatible function approximators and natural policy gradients are obtained by estimating the cost-to-go function, rather than the (much larger) state-action advantage function as is necessary in traditional MDPs. We also develop the ﬁrst compatible function approximators and natural policy gradients for continuous-time stochastic systems.

5 0.63009256 163 nips-2010-Lower Bounds on Rate of Convergence of Cutting Plane Methods

Author: Xinhua Zhang, Ankan Saha, S.v.n. Vishwanathan

Abstract: In a recent paper Joachims [1] presented SVM-Perf, a cutting plane method (CPM) for training linear Support Vector Machines (SVMs) which converges to an accurate solution in O(1/ 2 ) iterations. By tightening the analysis, Teo et al. [2] showed that O(1/ ) iterations sufﬁce. Given the impressive convergence speed of CPM on a number of practical problems, it was conjectured that these rates could be further improved. In this paper we disprove this conjecture. We present counter examples which are not only applicable for training linear SVMs with hinge loss, but also hold for support vector methods which optimize a multivariate performance score. However, surprisingly, these problems are not inherently hard. By exploiting the structure of the objective function we can devise an algo√ rithm that converges in O(1/ ) iterations. 1

6 0.54223025 179 nips-2010-Natural Policy Gradient Methods with Parameter-based Exploration for Control Tasks

7 0.54005653 196 nips-2010-Online Markov Decision Processes under Bandit Feedback

8 0.53466785 180 nips-2010-Near-Optimal Bayesian Active Learning with Noisy Observations

9 0.53370076 38 nips-2010-Batch Bayesian Optimization via Simulation Matching

10 0.53064305 160 nips-2010-Linear Complementarity for Regularized Policy Evaluation and Improvement

11 0.52865255 64 nips-2010-Distributionally Robust Markov Decision Processes

12 0.52399462 212 nips-2010-Predictive State Temporal Difference Learning

13 0.52255213 4 nips-2010-A Computational Decision Theory for Interactive Assistants

14 0.52218801 184 nips-2010-Nonparametric Bayesian Policy Priors for Reinforcement Learning

15 0.52189261 78 nips-2010-Error Propagation for Approximate Policy and Value Iteration

16 0.51926953 183 nips-2010-Non-Stochastic Bandit Slate Problems

17 0.51734573 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models

18 0.51673514 269 nips-2010-Throttling Poisson Processes

19 0.5158987 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing

20 0.51583844 50 nips-2010-Constructing Skill Trees for Reinforcement Learning Agents from Demonstration Trajectories