jmlr jmlr2008 jmlr2008-53 knowledge-graph by maker-knowledge-mining

53 jmlr-2008-Learning to Combine Motor Primitives Via Greedy Additive Regression


Source: pdf

Author: Manu Chhabra, Robert A. Jacobs

Abstract: The computational complexities arising in motor control can be ameliorated through the use of a library of motor synergies. We present a new model, referred to as the Greedy Additive Regression (GAR) model, for learning a library of torque sequences, and for learning the coefficients of a linear combination of sequences minimizing a cost function. From the perspective of numerical optimization, the GAR model is interesting because it creates a library of “local features”—each sequence in the library is a solution to a single training task—and learns to combine these sequences using a local optimization procedure, namely, additive regression. We speculate that learners with local representational primitives and local optimization procedures will show good performance on nonlinear tasks. The GAR model is also interesting from the perspective of motor control because it outperforms several competing models. Results using a simulated two-joint arm suggest that the GAR model consistently shows excellent performance in the sense that it rapidly learns to perform novel, complex motor tasks. Moreover, its library is overcomplete and sparse, meaning that only a small fraction of the stored torque sequences are used when learning a new movement. The library is also robust in the sense that, after an initial training period, nearly all novel movements can be learned as additive combinations of sequences in the library, and in the sense that it shows good generalization when an arm’s dynamics are altered between training and test conditions, such as when a payload is added to the arm. Lastly, the GAR model works well regardless of whether motor tasks are specified in joint space or Cartesian space. We conclude that learning techniques using local primitives and optimization procedures are viable and potentially important methods for motor control and possibly other domains, and that these techniques deserve further examination by the artificial intelligence and cognitive science communities.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Department of Brain & Cognitive Sciences, University of Rochester, Rochester, NY 14627, USA. Editor: Peter Dayan. Abstract: The computational complexities arising in motor control can be ameliorated through the use of a library of motor synergies. [sent-6, score-1.205]

2 We present a new model, referred to as the Greedy Additive Regression (GAR) model, for learning a library of torque sequences, and for learning the coefficients of a linear combination of sequences minimizing a cost function. [sent-7, score-0.676]

3 The GAR model is also interesting from the perspective of motor control because it outperforms several competing models. [sent-10, score-0.528]

4 Results using a simulated two-joint arm suggest that the GAR model consistently shows excellent performance in the sense that it rapidly learns to perform novel, complex motor tasks. [sent-11, score-0.72]

5 Moreover, its library is overcomplete and sparse, meaning that only a small fraction of the stored torque sequences are used when learning a new movement. [sent-12, score-0.67]

6 Lastly, the GAR model works well regardless of whether motor tasks are specified in joint space or Cartesian space. [sent-14, score-0.57]

7 We conclude that learning techniques using local primitives and optimization procedures are viable and potentially important methods for motor control and possibly other domains, and that these techniques deserve further examination by the artificial intelligence and cognitive science communities. [sent-15, score-0.668]

8 Keywords: additive regression, motor primitives, sparse representations. [sent-16, score-0.572]

9 Consider, for example, an agent whose goal is to apply torques to each joint of a two-joint arm so that the endpoint of the arm moves from an initial location to a target location in 100 time steps. [sent-18, score-0.549]

10 A motor synergy is a dependency among the dimensions or parameters of a motor system. [sent-25, score-0.946]

11 For example, a coupling of the torques applied at the shoulder and elbow joints would be a motor synergy. [sent-26, score-0.63]

12 Motor synergies are useful because they reduce the number of parameters that must be independently controlled, thereby making motor control significantly easier (Bernstein, 1967). [sent-27, score-0.629]

13 For our current purposes, we focus here on frameworks in which an agent with a library of motor synergies quickly learns to perform complex motor tasks by linearly combining its synergies. [sent-29, score-1.402]

14 For example, Mussa-Ivaldi, Giszter, and Bizzi (1994) identified frogs’ motor synergies by stimulating sites in the frogs’ spinal cords. [sent-31, score-0.594]

15 The idea of characterizing complex movements as linear combinations of motor synergies has also been influential in the fields of artificial intelligence and cognitive science. [sent-33, score-0.725]

16 Other researchers have speculated that motor primitives can be learned using dimensionality-reduction techniques. [sent-38, score-0.575]

17 1 The fact that novel motor tasks can often be performed by linearly combining motor synergies is a surprising result. [sent-42, score-1.166]

18 To see why this result is unexpected, consider the case in which an agent needs to control a two-joint arm to perform a motor task. [sent-43, score-0.748]

19 … (i.e., desired angles for the shoulder and elbow joints at each time step of a movement), that a cost function is defined as the sum of squared error between the desired and actual joint angles, and that the agent has a library of motor synergies where a synergy is a sequence of torques (i.e., …). [sent-46, score-1.19]

20 The agent attempts to perform the motor task by finding a set of coefficients for a linear combination of synergies minimizing the cost function. [sent-49, score-0.701]
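
The setup in sentences 19 and 20 amounts to evaluating a sum-of-squared-error cost for one candidate coefficient vector. Below is a minimal sketch (not the authors' code): the helper name `simulate_arm` and the (T, 2) layout of torque sequences are illustrative assumptions.

```python
import numpy as np

def combined_torques(coeffs, synergies):
    """Linearly combine library synergies; each synergy has shape (T, 2): shoulder and elbow torques per time step."""
    return sum(c * s for c, s in zip(coeffs, synergies))

def movement_cost(coeffs, synergies, desired_angles, simulate_arm):
    """Sum of squared error between desired and actual joint angles over the movement.

    `simulate_arm` is a hypothetical stand-in that maps a (T, 2) torque sequence to the
    (T, 2) sequence of joint angles produced by the arm dynamics.
    """
    torques = combined_torques(coeffs, synergies)
    actual_angles = simulate_arm(torques)
    return float(np.sum((desired_angles - actual_angles) ** 2))
```

Minimizing this cost over the coefficient vector is what the optimization procedures discussed below (additive regression, policy gradient) are compared on.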

21 Our review focuses on frameworks in which complex movements are expressed as linear combinations of motor primitives. [sent-51, score-0.588]

22 There are, of course, frameworks that use motor primitives in other ways. [sent-52, score-0.575]

23 The Motor Basis Optimization Problem motivates the need to think about good ways of constructing a library of synergies, and good ways of learning to linearly combine the synergies to perform novel motor tasks. [sent-60, score-0.846]

24 In this paper, we propose a new learning model that learns a sparse and overcomplete representation of the space of potentially useful motor commands, and learns to linearly combine elements of this representation using a “greedy additive regression” procedure. [sent-61, score-0.693]

25 This paper introduces a new learning model for motor control referred to as the Greedy Additive Regression (GAR) model. [sent-71, score-0.528]

26 The GAR model maintains a library of torque sequences (i.e., …). [sent-72, score-0.65]

27 The PCA model learns a library of motor primitives using PCA, and finds coefficients for linearly combining the primitives using gradient descent. [sent-83, score-0.99]

28 Moreover, the library is overcomplete and also sparse, meaning that only a small fraction of the stored torque sequences are used when learning a novel movement. [sent-86, score-0.698]

29 If, for example, an arm is suddenly required to carry a payload during testing, torque sequences in the library can still be additively combined to rapidly learn new movements with this altered arm. [sent-91, score-1.036]

30 We also demonstrate that the model works well regardless of whether motor tasks are specified in joint space or Cartesian space. [sent-92, score-0.57]

31 Based on these results, we believe that the GAR model contains several desirable properties, including a library which maintains a sparse and overcomplete representation of the space of potentially useful motor commands, and an additive regression optimization procedure which is fast and robust. [sent-93, score-0.888]

32 Section 2 describes the two-joint arm that we simulated, and Section 3 describes the motor tasks that we used. [sent-95, score-0.732]

33 Motor Tasks. A motor task is to apply torques to the arm so that it follows a desired trajectory defined in joint space. [sent-107, score-0.946]

34 Given a desired joint-space trajectory, a motor task is to apply a time-varying torque to the arm so that the arm follows the desired trajectory. [sent-132, score-1.22]

35 The cost function corresponding to the motor task is the sum of squared errors between the desired and actual joint angles. Figure 2: A schematic description of the Greedy Additive Regression (GAR) model. [sent-134, score-0.523]

36 An additive regression algorithm is then used to construct a torque sequence by linearly combining sequences from the library. [sent-136, score-0.607]

37 The motor tasks defined here are more complex than tasks often used in the literature in at least two respects. [sent-141, score-0.581]

38 First, the desired Cartesian-space trajectories used here are typically highly curved, as opposed to straight-line reaching movements which are commonly used in experimental and computational studies of motor control. [sent-142, score-0.622]

39 This model rapidly learns new motor tasks using a library of torque sequences. [sent-147, score-1.023]

40 When a new motor task arrives, the model first checks whether a linear combination of sequences from the library achieves good performance on this task. [sent-149, score-0.924]
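
One plausible reading of this decision rule is sketched below, as pseudocode rather than the original implementation; the helper names `additive_regression` and `feedback_error_learning` and the threshold `epsilon` are placeholders for procedures described in the surrounding sentences.

```python
def gar_handle_task(task, library, epsilon, additive_regression, feedback_error_learning):
    """One GAR trial, sketched from the description above (not the authors' code).

    First try to solve the task by additively combining stored torque sequences; if the
    best combination is not good enough, fall back to feedback error learning and store
    the newly learned sequence in the library.
    """
    coeffs, cost = additive_regression(task, library)
    if cost <= epsilon:
        return coeffs, False              # the existing library sufficed
    new_sequence = feedback_error_learning(task)
    library.append(new_sequence)          # library growth; pruning is sketched further below
    return new_sequence, True
```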

41 Intuitively, the model removes the torque sequence that has been least used during the motor tasks that it has performed. [sent-158, score-0.835]

42 Let n be an index over motor tasks, k be an index over sequences in the library, and |ρk (n)| be the absolute value of the linear coefficient ρk (n) assigned to sequence k on task n by the additive regression algorithm. [sent-159, score-0.842]
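
Under this notation, the pruning rule of sentences 41 and 42 could look roughly as follows; accumulating |ρk(n)| by a plain sum over past tasks is our assumption about the exact usage statistic.

```python
import numpy as np

def prune_least_used(library, coeff_history):
    """Remove the sequence with the smallest cumulative |coefficient| over past tasks.

    `coeff_history` is a list of coefficient vectors rho(n), one per past task, each of
    length len(library). Summing |rho_k(n)| over tasks n is one plausible reading of the
    usage measure described in the text.
    """
    usage = np.abs(np.asarray(coeff_history)).sum(axis=0)   # usage[k] = sum_n |rho_k(n)|
    least_used = int(np.argmin(usage))
    return [seq for k, seq in enumerate(library) if k != least_used]
```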

43 To complete the description of the GAR model, we need to describe the additive regression algorithm for finding potentially good linear combinations of torque sequences from the library for a motor task. [sent-163, score-1.263]

44 At iteration t, the algorithm maintains an aggregate torque sequence F(t) to perform a motor task such that F(t) = ∑_{j=1}^{t} ρ_j f_j (Equation 2), where f_j is a sequence in the library and ρ_j is its corresponding coefficient. [sent-166, score-1.063]

45 Each torque sequence in the library is associated with a trajectory of joint angles. [sent-172, score-0.624]

46 Our work differs from their work in many details because the domain of motor control forces us to confront the complexities inherent in learning to control a dynamical system (see also Tassa, Erez, and Smart, 2008). [sent-199, score-0.56]

47 In addition, the additive regression algorithm can be seen as performing gradient descent where the direction of the gradient at each iteration is projected onto the library sequence whose trajectory is maximally correlated with this gradient. [sent-205, score-0.597]
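
A minimal numerical sketch of one such iteration is given below, assuming the cost gradient with respect to the aggregate torque sequence is available as an array; the particular correlation score and the closed-form step size are simplifying choices, not necessarily the authors' exact procedure.

```python
import numpy as np

def additive_regression_step(F, gradient, library):
    """One greedy step: add the library sequence most correlated with the descent direction.

    F, gradient and every library entry are arrays of shape (T, 2). The correlation score
    and the least-squares step size below are simplifying assumptions.
    """
    direction = -gradient                                        # descend on the cost
    scores = [abs(np.vdot(direction, f)) / (np.linalg.norm(f) + 1e-12) for f in library]
    k = int(np.argmax(scores))                                   # most correlated library sequence
    f_k = library[k]
    rho = np.vdot(direction, f_k) / (np.vdot(f_k, f_k) + 1e-12)  # least-squares coefficient along f_k
    return F + rho * f_k, k, rho
```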

48 The PCA model performs dimensionality-reduction via PCA to learn a library of motor primitives. [sent-212, score-0.7]

49 When given a novel motor task, the PCA model learns to perform the [sent-213, score-0.543]

50 task using a policy gradient optimization procedure (Sutton, McAllester, Singh, and Mansour, 1999; Williams, 1992) to learn a set of coefficients for linearly combining the motor primitives. [sent-219, score-0.64]
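
Williams (1992) corresponds to the REINFORCE algorithm; one generic way it could be applied to coefficient learning is sketched below, using a Gaussian exploration policy over the coefficient vector. The noise level, learning rate, and baseline are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def reinforce_coefficients(cost_fn, n_coeffs, iterations=1000, sigma=0.1, lr=0.01, seed=0):
    """REINFORCE-style search for combination coefficients (an illustrative sketch).

    Samples perturbed coefficient vectors from a Gaussian policy centred on `mu`, scores
    them with the (negative) task cost, and moves `mu` along the estimated policy gradient.
    `cost_fn` maps a coefficient vector to a scalar cost.
    """
    rng = np.random.default_rng(seed)
    mu = np.zeros(n_coeffs)
    baseline = 0.0
    for _ in range(iterations):
        sample = mu + sigma * rng.standard_normal(n_coeffs)
        reward = -cost_fn(sample)
        baseline = 0.9 * baseline + 0.1 * reward                       # running baseline reduces variance
        mu += lr * (reward - baseline) * (sample - mu) / sigma ** 2    # REINFORCE update for a Gaussian mean
    return mu
```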

51 1 GAR versus PCA In the PCA model, the library of motor synergies was created as follows. [sent-222, score-0.82]

52 We first generated 3000 motor tasks as described in Section 3, and then used feedback error learning to learn a torque sequence for each task. [sent-223, score-0.878]
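
Feedback error learning (in the spirit of Kawato's scheme) pairs a fixed feedback controller with a feedforward torque sequence that gradually absorbs the feedback signal. The sketch below is a hedged illustration, with assumed gains and an assumed `simulate_step` dynamics function, not the settings used in the paper.

```python
import numpy as np

def feedback_error_learning(desired_angles, desired_velocities, simulate_step,
                            epochs=50, kp=30.0, kd=1.0, eta=0.5):
    """Learn a feedforward torque sequence for one desired trajectory (illustrative sketch).

    On every pass, a PD feedback controller corrects tracking errors, and a fraction of
    the feedback torque is folded into the feedforward sequence, so that over epochs the
    feedforward term alone comes to drive the arm along the desired trajectory.
    `simulate_step(angles, velocities, torque)` advances the arm one time step and
    returns (angles, velocities); it stands in for the arm dynamics, which are not shown.
    """
    T = len(desired_angles)
    feedforward = np.zeros((T, 2))
    for _ in range(epochs):
        angles = np.array(desired_angles[0], dtype=float)
        velocities = np.array(desired_velocities[0], dtype=float)
        for t in range(T):
            feedback = kp * (desired_angles[t] - angles) + kd * (desired_velocities[t] - velocities)
            torque = feedforward[t] + feedback
            feedforward[t] += eta * feedback   # training signal: absorb feedback into feedforward
            angles, velocities = simulate_step(angles, velocities, torque)
    return feedforward
```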

53 To learn to perform a novel motor task from a test set, the PCA model searched for good linear combinations of the PCA sequences. [sent-230, score-0.581]

54 Its library of torque sequences was created by running the model on 3000 motor tasks. [sent-236, score-1.142]

55 To learn to perform a novel motor task from a test set, the GAR model learned to linearly combine the GAR sequences using the additive regression algorithm described above. [sent-239, score-0.866]

56 Figure 3: Average root mean squared errors of four systems (GAR+AR, PCA+PG, PCA+AR, GAR+PG) on a test set of 100 novel motor tasks (the error bars show the standard errors of the means). [sent-242, score-0.555]

57 The four systems use: (i) GAR sequences with additive regression (GAR model); (ii) PCA sequences with policy gradient (PCA model); (iii) PCA sequences with additive regression; and (iv) GAR sequences with policy gradient. [sent-243, score-1.086]

58 The PCA model combines PCA sequences with policy gradient, whereas the GAR model combines GAR sequences with additive regression. [sent-245, score-0.546]

59 For the sake of completeness, we also studied the remaining two combinations, namely, the combination of PCA sequences with additive regression and the combination of GAR sequences with policy gradient. [sent-246, score-0.577]

60 The vertical axis gives a system’s average root mean squared error (RMSE where the error is between the desired and actual joint angles) on a test set of 100 novel motor tasks. [sent-249, score-0.606]

61 Local optimization methods have been shown to be effective in motor control in previous research. [sent-281, score-0.53]

62 Visualizing Torque Sequences. The library of a GAR model is created on the basis of a wide variety of motor tasks. [sent-296, score-0.719]

63 The torque sequences in the library should, therefore, be “representative” of the tasks they encode. [sent-297, score-0.684]

64 We then ordered the sequences in the library by the percent of the model’s total activation that a sequence accounted for. [sent-300, score-0.521]

65 This graph illustrates that, even though the sequences are added to the library in an arbitrary order, the important sequences that remain in the library are representative of the motor tasks. [sent-307, score-1.239]

66 If the size, denoted K, is too small, then torque sequences that are often useful for learning novel motor tasks might be removed. [sent-311, score-0.978]

67 Figure 5: Cartesian-space trajectories generated by the three torque sequences that accounted for the largest percent of a GAR model’s total activation. [sent-326, score-0.53]

68 These trajectories were generated by initializing the shoulder and elbow joint angles of the arm to π/4 and π/2 respectively, and then applying the sequences to the arm with coefficients of 1 and -1. [sent-327, score-0.743]
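
Plotting such Cartesian-space trajectories only requires the standard planar forward kinematics of a two-joint arm; a small sketch follows, with illustrative link lengths rather than the simulated arm's actual parameters.

```python
import numpy as np

def endpoint_trajectory(shoulder_angles, elbow_angles, l1=0.3, l2=0.35):
    """Cartesian endpoint of a planar two-joint arm for each pair of joint angles.

    Standard planar forward kinematics; the link lengths l1, l2 (metres) are
    illustrative values, not the simulated arm's actual parameters.
    """
    x = l1 * np.cos(shoulder_angles) + l2 * np.cos(shoulder_angles + elbow_angles)
    y = l1 * np.sin(shoulder_angles) + l2 * np.sin(shoulder_angles + elbow_angles)
    return np.stack([x, y], axis=-1)

# Example: the starting configuration mentioned in the text (shoulder pi/4, elbow pi/2).
start = endpoint_trajectory(np.array([np.pi / 4]), np.array([np.pi / 2]))
```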

69 In Figure 6, the horizontal axis shows the number of motor tasks, and the vertical axis shows the percent of tasks in which a version of the GAR model needed to learn a torque sequence via feedback error learning. [sent-331, score-1.036]

70 The motor tasks were divided into 60 blocks of 50 trials each. [sent-333, score-0.527]

71 As training progresses, the library has many more useful sequences, and most novel motor tasks can be performed by linearly combining sequences from the library. [sent-337, score-0.955]

72 A comparison of the versions with different library sizes shows that the version with a library size of 50 used feedback error learning more often than versions with library sizes of 100 or 200. [sent-339, score-0.684]

73 The sequences in a library are ordered according to the percent of a model’s total activation that a sequence accounted for. [sent-342, score-0.521]

74 The libraries for the GAR model (with a library of size 100) and the PCA model were created as described above with an arm that did not carry a payload. [sent-368, score-0.519]

75 Test trials were conducted as described above; that is, for each test task, a linear combination of torque sequences in a library was found via the additive regression algorithm for the GAR model, and via policy gradient for the PCA model. [sent-370, score-0.88]

76 The sequences in a library are ordered according to the percent of a model’s total activation that a sequence accounted for. [sent-375, score-0.521]

77 combinations of libraries with optimization algorithms, namely, the GAR sequences with policy gradient, and the PCA sequences with additive regression. [sent-377, score-0.61]

78 It clearly demonstrates that the GAR model develops a useful library of torque sequences, and that the additive regression algorithm is a powerful optimization procedure for finding good linear combinations, even under test conditions that are very different from training conditions. [sent-385, score-0.622]

79 If this is the case, then additive regression should work well for tasks with novel payloads, even when using a library of GAR sequences constructed from zero-payload trials. [sent-389, score-0.591]

80 We first generated a library of 100 GAR sequences using a training set of 3000 tasks where the simulated arm did not contain a payload. [sent-390, score-0.642]

81 Motor Tasks Specified in Cartesian Space. In this subsection, we consider learning sequences for motor tasks when the desired trajectories are specified in Cartesian space instead of joint space. [sent-408, score-0.794]

82 Figure 9: Results when motor tasks were specified in Cartesian space (horizontal axis: number of iterations). [sent-416, score-0.527]

83 The GAR model was trained using 3000 motor tasks with a library of size 100. [sent-419, score-0.754]

84 A threshold of ε = 0.02 was used to determine if a linear combination of torque sequences from the library provided a “good” aggregate sequence for a task (note that this is different from a threshold of ε = 0. [sent-422, score-0.747]

85 This outcome indicates that the GAR model has wide applicability in the sense that it is effective regardless of whether motor tasks are specified in joint space or Cartesian space. [sent-428, score-0.57]

86 Using these optimal sequences as data items, we created a library of PCA sequences by extracting the top thirty principal components based on these data items. [sent-464, score-0.578]
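
A compact way to build such a PCA library from flattened torque sequences is sketched below; whether the original study centred or scaled the data in exactly this way is an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_library(torque_sequences, n_components=30):
    """Build a library of PCA 'synergies' from flattened torque sequences (a sketch).

    Each training sequence of shape (T, 2) is flattened into one row; the top principal
    components, reshaped back to (T, 2), serve as library elements.
    """
    data = np.stack([seq.reshape(-1) for seq in torque_sequences])   # (n_tasks, T*2)
    pca = PCA(n_components=n_components).fit(data)
    T = torque_sequences[0].shape[0]
    return [comp.reshape(T, 2) for comp in pca.components_]
```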

87 On average, the learner using GAR sequences and additive regression made 49 calls, the learner using PCA sequences and policy gradient made 3879 calls, the learner using PCA sequences and additive regression made 62 calls, and the learner using GAR sequences and policy gradient made 3422 calls. [sent-476, score-1.236]

88 The horizontal axis shows the learning system: (i) Optimal: optimal control signals; (ii) GAR+AR: GAR sequences with additive regression; (iii) PCA+PG: PCA sequences with policy gradient; (iv) PCA+AR: PCA sequences with additive regression; and (v) GAR+PG: GAR sequences with policy gradient. [sent-482, score-1.096]

89 Conclusions. In summary, the computational complexities arising in motor control can be ameliorated through the use of a library of motor synergies. [sent-484, score-1.205]

90 We presented a new model, referred to as the Greedy Additive Regression (GAR) model, for learning a library of torque sequences, and for learning the coefficients of a linear combination of library sequences minimizing a cost function. [sent-485, score-0.883]

91 Results using a simulated two-joint arm suggest that the GAR model consistently shows excellent performance in the sense that it rapidly learns to perform novel, complex motor tasks. [sent-486, score-0.72]

92 Moreover, its library is overcomplete and sparse, meaning that only a small fraction of the stored torque sequences are used when learning a new movement. [sent-487, score-0.67]

93 Additionally, we showed that the GAR model works well regardless of whether motor tasks are specified in joint space or Cartesian space. [sent-489, score-0.57]

94 The GAR model uses a library of local features—each sequence in its library is a solution to a single task from the training set—and a local optimization procedure, namely, additive regression. [sent-492, score-0.622]

95 In contrast, the PCA model uses a library of global features—each item in its library reflects properties of all tasks in the training set—and policy gradient which is a global optimization procedure because it seeks good combinations of all items in its library. [sent-493, score-0.646]

96 Future research will need to focus on the implications of the model for our understanding of motor control in biological organisms, the theoretical foundations of the model, and further empirical evaluations. [sent-500, score-0.528]

97 In regard to our understanding of biological motor control, it would be interesting to know whether sets of motor synergies used by biological organisms are sparse and overcomplete as suggested by the GAR model, or are fully distributed and non-redundant as suggested by the PCA model. [sent-501, score-1.127]

98 In regard to empirical evaluations, future research will need to evaluate the GAR model with larger and more complex motor systems and motor tasks. [sent-506, score-0.966]

99 Properties of synergies arising from a theory of optimal motor behavior. [sent-550, score-0.594]

100 Combinations of muscle synergies in the construction of a natural motor behavior. [sent-560, score-0.594]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('gar', 0.633), ('motor', 0.473), ('torque', 0.247), ('library', 0.207), ('arm', 0.205), ('sequences', 0.176), ('pca', 0.134), ('synergies', 0.121), ('trajectory', 0.106), ('primitives', 0.102), ('additive', 0.099), ('payload', 0.098), ('movements', 0.081), ('torques', 0.081), ('hhabra', 0.069), ('ombine', 0.063), ('otor', 0.063), ('rimitives', 0.063), ('feedback', 0.063), ('jacobs', 0.061), ('policy', 0.055), ('tasks', 0.054), ('libraries', 0.048), ('gradient', 0.047), ('cartesian', 0.045), ('angles', 0.045), ('sequence', 0.041), ('overcomplete', 0.04), ('percent', 0.039), ('trajectories', 0.036), ('pg', 0.036), ('control', 0.035), ('agent', 0.035), ('feedforward', 0.035), ('lacker', 0.035), ('perkins', 0.035), ('rochester', 0.035), ('viola', 0.034), ('combinations', 0.034), ('desired', 0.032), ('accounted', 0.032), ('axis', 0.032), ('ar', 0.03), ('representational', 0.029), ('payloads', 0.029), ('shoulder', 0.029), ('theiler', 0.029), ('todorov', 0.029), ('movement', 0.028), ('controller', 0.028), ('rmse', 0.028), ('aggregate', 0.028), ('novel', 0.028), ('regression', 0.027), ('activation', 0.026), ('task', 0.026), ('kg', 0.026), ('ft', 0.025), ('elbow', 0.024), ('atkeson', 0.024), ('coef', 0.024), ('cost', 0.024), ('descent', 0.023), ('joint', 0.023), ('chhabra', 0.023), ('hodgins', 0.023), ('joints', 0.023), ('matari', 0.023), ('sticks', 0.023), ('optimization', 0.022), ('combination', 0.022), ('altered', 0.022), ('learns', 0.022), ('performances', 0.022), ('earning', 0.021), ('dynamics', 0.021), ('greedy', 0.021), ('cients', 0.021), ('model', 0.02), ('jones', 0.02), ('procedures', 0.02), ('organisms', 0.02), ('velocities', 0.02), ('robotic', 0.02), ('learner', 0.019), ('created', 0.019), ('vertical', 0.018), ('ms', 0.017), ('bar', 0.017), ('ry', 0.017), ('horizontal', 0.017), ('accelerations', 0.017), ('ameliorated', 0.017), ('kgms', 0.017), ('resting', 0.017), ('system', 0.017), ('linearly', 0.017), ('cognitive', 0.016), ('signals', 0.016), ('robotics', 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999964 53 jmlr-2008-Learning to Combine Motor Primitives Via Greedy Additive Regression

Author: Manu Chhabra, Robert A. Jacobs

Abstract: The computational complexities arising in motor control can be ameliorated through the use of a library of motor synergies. We present a new model, referred to as the Greedy Additive Regression (GAR) model, for learning a library of torque sequences, and for learning the coefficients of a linear combination of sequences minimizing a cost function. From the perspective of numerical optimization, the GAR model is interesting because it creates a library of “local features”—each sequence in the library is a solution to a single training task—and learns to combine these sequences using a local optimization procedure, namely, additive regression. We speculate that learners with local representational primitives and local optimization procedures will show good performance on nonlinear tasks. The GAR model is also interesting from the perspective of motor control because it outperforms several competing models. Results using a simulated two-joint arm suggest that the GAR model consistently shows excellent performance in the sense that it rapidly learns to perform novel, complex motor tasks. Moreover, its library is overcomplete and sparse, meaning that only a small fraction of the stored torque sequences are used when learning a new movement. The library is also robust in the sense that, after an initial training period, nearly all novel movements can be learned as additive combinations of sequences in the library, and in the sense that it shows good generalization when an arm’s dynamics are altered between training and test conditions, such as when a payload is added to the arm. Lastly, the GAR model works well regardless of whether motor tasks are specified in joint space or Cartesian space. We conclude that learning techniques using local primitives and optimization procedures are viable and potentially important methods for motor control and possibly other domains, and that these techniques deserve further examination by the artificial intelligence and cognitive science

2 0.062178418 70 jmlr-2008-On Relevant Dimensions in Kernel Feature Spaces

Author: Mikio L. Braun, Joachim M. Buhmann, Klaus-Robert Müller

Abstract: We show that the relevant information of a supervised learning problem is contained up to negligible error in a finite number of leading kernel PCA components if the kernel matches the underlying learning problem in the sense that it can asymptotically represent the function to be learned and is sufficiently smooth. Thus, kernels do not only transform data sets such that good generalization can be achieved using only linear discriminant functions, but this transformation is also performed in a manner which makes economical use of feature space dimensions. In the best case, kernels provide efficient implicit representations of the data for supervised learning problems. Practically, we propose an algorithm which enables us to recover the number of leading kernel PCA components relevant for good classification. Our algorithm can therefore be applied (1) to analyze the interplay of data set and kernel in a geometric fashion, (2) to aid in model selection, and (3) to denoise in feature space in order to yield better classification results. Keywords: kernel methods, feature space, dimension reduction, effective dimensionality

3 0.046183687 55 jmlr-2008-Linear-Time Computation of Similarity Measures for Sequential Data

Author: Konrad Rieck, Pavel Laskov

Abstract: Efficient and expressive comparison of sequences is an essential procedure for learning with sequential data. In this article we propose a generic framework for computation of similarity measures for sequences, covering various kernel, distance and non-metric similarity functions. The basis for comparison is embedding of sequences using a formal language, such as a set of natural words, k-grams or all contiguous subsequences. As realizations of the framework we provide linear-time algorithms of different complexity and capabilities using sorted arrays, tries and suffix trees as underlying data structures. Experiments on data sets from bioinformatics, text processing and computer security illustrate the efficiency of the proposed algorithms—enabling peak performances of up to 106 pairwise comparisons per second. The utility of distances and non-metric similarity measures for sequences as alternatives to string kernels is demonstrated in applications of text categorization, network intrusion detection and transcription site recognition in DNA. Keywords: string kernels, string distances, learning with sequential data

4 0.043255851 29 jmlr-2008-Cross-Validation Optimization for Large Scale Structured Classification Kernel Methods

Author: Matthias W. Seeger

Abstract: We propose a highly efficient framework for penalized likelihood kernel methods applied to multiclass models with a large, structured set of classes. As opposed to many previous approaches which try to decompose the fitting problem into many smaller ones, we focus on a Newton optimization of the complete model, making use of model structure and linear conjugate gradients in order to approximate Newton search directions. Crucially, our learning method is based entirely on matrix-vector multiplication primitives with the kernel matrices and their derivatives, allowing straightforward specialization to new kernels, and focusing code optimization efforts to these primitives only. Kernel parameters are learned automatically, by maximizing the cross-validation log likelihood in a gradient-based way, and predictive probabilities are estimated. We demonstrate our approach on large scale text classification tasks with hierarchical structure on thousands of classes, achieving state-of-the-art results in an order of magnitude less time than previous work. Parts of this work appeared in the conference paper Seeger (2007). Keywords: multi-way classification, kernel logistic regression, hierarchical classification, cross validation optimization, Newton-Raphson optimization

5 0.04025906 78 jmlr-2008-Randomized Online PCA Algorithms with Regret Bounds that are Logarithmic in the Dimension

Author: Manfred K. Warmuth, Dima Kuzmin

Abstract: We design an online algorithm for Principal Component Analysis. In each trial the current instance is centered and projected into a probabilistically chosen low dimensional subspace. The regret of our online algorithm, that is, the total expected quadratic compression loss of the online algorithm minus the total quadratic compression loss of the batch algorithm, is bounded by a term whose dependence on the dimension of the instances is only logarithmic. We first develop our methodology in the expert setting of online learning by giving an algorithm for learning as well as the best subset of experts of a certain size. This algorithm is then lifted to the matrix setting where the subsets of experts correspond to subspaces. The algorithm represents the uncertainty over the best subspace as a density matrix whose eigenvalues are bounded. The running time is O(n2 ) per trial, where n is the dimension of the instances. Keywords: principal component analysis, online learning, density matrix, expert setting, quantum Bayes rule

6 0.036010809 56 jmlr-2008-Magic Moments for Structured Output Prediction

7 0.03566891 85 jmlr-2008-Shark    (Machine Learning Open Source Software Paper)

8 0.035008579 95 jmlr-2008-Value Function Based Reinforcement Learning in Changing Markovian Environments

9 0.03234525 2 jmlr-2008-A Library for Locally Weighted Projection Regression    (Machine Learning Open Source Software Paper)

10 0.0315619 8 jmlr-2008-Accelerated Neural Evolution through Cooperatively Coevolved Synapses

11 0.029850235 84 jmlr-2008-Search for Additive Nonlinear Time Series Causal Models

12 0.028968874 49 jmlr-2008-Learning Control Knowledge for Forward Search Planning

13 0.028589454 75 jmlr-2008-Optimal Solutions for Sparse Principal Component Analysis

14 0.028499674 17 jmlr-2008-Automatic PCA Dimension Selection for High Dimensional Data and Small Sample Sizes

15 0.026429251 51 jmlr-2008-Learning Similarity with Operator-valued Large-margin Classifiers

16 0.024078039 46 jmlr-2008-LIBLINEAR: A Library for Large Linear Classification    (Machine Learning Open Source Software Paper)

17 0.02306775 36 jmlr-2008-Finite-Time Bounds for Fitted Value Iteration

18 0.021185938 96 jmlr-2008-Visualizing Data using t-SNE

19 0.020321971 65 jmlr-2008-Multi-Agent Reinforcement Learning in Common Interest and Fixed Sum Stochastic Games: An Experimental Study

20 0.019739315 13 jmlr-2008-An Error Bound Based on a Worst Likely Assignment


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.107), (1, -0.038), (2, 0.001), (3, 0.013), (4, -0.061), (5, -0.068), (6, -0.049), (7, 0.054), (8, -0.06), (9, 0.009), (10, -0.084), (11, 0.069), (12, 0.084), (13, -0.015), (14, 0.015), (15, 0.091), (16, 0.043), (17, 0.014), (18, 0.036), (19, -0.187), (20, 0.051), (21, 0.227), (22, -0.205), (23, -0.036), (24, 0.083), (25, 0.197), (26, -0.051), (27, -0.067), (28, -0.09), (29, 0.081), (30, 0.23), (31, -0.003), (32, -0.155), (33, -0.043), (34, 0.132), (35, -0.147), (36, -0.081), (37, -0.107), (38, -0.252), (39, -0.229), (40, -0.253), (41, -0.207), (42, -0.033), (43, -0.171), (44, 0.04), (45, -0.064), (46, 0.192), (47, 0.212), (48, -0.281), (49, 0.087)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97201407 53 jmlr-2008-Learning to Combine Motor Primitives Via Greedy Additive Regression

Author: Manu Chhabra, Robert A. Jacobs

Abstract: The computational complexities arising in motor control can be ameliorated through the use of a library of motor synergies. We present a new model, referred to as the Greedy Additive Regression (GAR) model, for learning a library of torque sequences, and for learning the coefficients of a linear combination of sequences minimizing a cost function. From the perspective of numerical optimization, the GAR model is interesting because it creates a library of “local features”—each sequence in the library is a solution to a single training task—and learns to combine these sequences using a local optimization procedure, namely, additive regression. We speculate that learners with local representational primitives and local optimization procedures will show good performance on nonlinear tasks. The GAR model is also interesting from the perspective of motor control because it outperforms several competing models. Results using a simulated two-joint arm suggest that the GAR model consistently shows excellent performance in the sense that it rapidly learns to perform novel, complex motor tasks. Moreover, its library is overcomplete and sparse, meaning that only a small fraction of the stored torque sequences are used when learning a new movement. The library is also robust in the sense that, after an initial training period, nearly all novel movements can be learned as additive combinations of sequences in the library, and in the sense that it shows good generalization when an arm’s dynamics are altered between training and test conditions, such as when a payload is added to the arm. Lastly, the GAR model works well regardless of whether motor tasks are specified in joint space or Cartesian space. We conclude that learning techniques using local primitives and optimization procedures are viable and potentially important methods for motor control and possibly other domains, and that these techniques deserve further examination by the artificial intelligence and cognitive science

2 0.20597616 55 jmlr-2008-Linear-Time Computation of Similarity Measures for Sequential Data

Author: Konrad Rieck, Pavel Laskov

Abstract: Efficient and expressive comparison of sequences is an essential procedure for learning with sequential data. In this article we propose a generic framework for computation of similarity measures for sequences, covering various kernel, distance and non-metric similarity functions. The basis for comparison is embedding of sequences using a formal language, such as a set of natural words, k-grams or all contiguous subsequences. As realizations of the framework we provide linear-time algorithms of different complexity and capabilities using sorted arrays, tries and suffix trees as underlying data structures. Experiments on data sets from bioinformatics, text processing and computer security illustrate the efficiency of the proposed algorithms—enabling peak performances of up to 106 pairwise comparisons per second. The utility of distances and non-metric similarity measures for sequences as alternatives to string kernels is demonstrated in applications of text categorization, network intrusion detection and transcription site recognition in DNA. Keywords: string kernels, string distances, learning with sequential data

3 0.20189779 84 jmlr-2008-Search for Additive Nonlinear Time Series Causal Models

Author: Tianjiao Chu, Clark Glymour

Abstract: Pointwise consistent, feasible procedures for estimating contemporaneous linear causal structure from time series data have been developed using multiple conditional independence tests, but no such procedures are available for non-linear systems. We describe a feasible procedure for learning a class of non-linear time series structures, which we call additive non-linear time series. We show that for data generated from stationary models of this type, two classes of conditional independence relations among time series variables and their lags can be tested efficiently and consistently using tests based on additive model regression. Combining results of statistical tests for these two classes of conditional independence relations and the temporal structure of time series data, a new consistent model specification procedure is able to extract relatively detailed causal information. We investigate the finite sample behavior of the procedure through simulation, and illustrate the application of this method through analysis of the possible causal connections among four ocean indices. Several variants of the procedure are also discussed. Keywords: conditional independence test, contemporaneous causation, additive model regression, Granger causality, ocean indices

4 0.19733736 2 jmlr-2008-A Library for Locally Weighted Projection Regression    (Machine Learning Open Source Software Paper)

Author: Stefan Klanke, Sethu Vijayakumar, Stefan Schaal

Abstract: In this paper we introduce an improved implementation of locally weighted projection regression (LWPR), a supervised learning algorithm that is capable of handling high-dimensional input data. As the key features, our code supports multi-threading, is available for multiple platforms, and provides wrappers for several programming languages. Keywords: regression, local learning, online learning, C, C++, Matlab, Octave, Python

5 0.16568978 70 jmlr-2008-On Relevant Dimensions in Kernel Feature Spaces

Author: Mikio L. Braun, Joachim M. Buhmann, Klaus-Robert Müller

Abstract: We show that the relevant information of a supervised learning problem is contained up to negligible error in a finite number of leading kernel PCA components if the kernel matches the underlying learning problem in the sense that it can asymptotically represent the function to be learned and is sufficiently smooth. Thus, kernels do not only transform data sets such that good generalization can be achieved using only linear discriminant functions, but this transformation is also performed in a manner which makes economical use of feature space dimensions. In the best case, kernels provide efficient implicit representations of the data for supervised learning problems. Practically, we propose an algorithm which enables us to recover the number of leading kernel PCA components relevant for good classification. Our algorithm can therefore be applied (1) to analyze the interplay of data set and kernel in a geometric fashion, (2) to aid in model selection, and (3) to denoise in feature space in order to yield better classification results. Keywords: kernel methods, feature space, dimension reduction, effective dimensionality

6 0.15987168 56 jmlr-2008-Magic Moments for Structured Output Prediction

7 0.1597582 29 jmlr-2008-Cross-Validation Optimization for Large Scale Structured Classification Kernel Methods

8 0.13208684 94 jmlr-2008-Value Function Approximation using Multiple Aggregation for Multiattribute Resource Management

9 0.12703541 49 jmlr-2008-Learning Control Knowledge for Forward Search Planning

10 0.12611437 74 jmlr-2008-Online Learning of Complex Prediction Problems Using Simultaneous Projections

11 0.1255717 85 jmlr-2008-Shark    (Machine Learning Open Source Software Paper)

12 0.10952801 95 jmlr-2008-Value Function Based Reinforcement Learning in Changing Markovian Environments

13 0.10844859 78 jmlr-2008-Randomized Online PCA Algorithms with Regret Bounds that are Logarithmic in the Dimension

14 0.10671616 46 jmlr-2008-LIBLINEAR: A Library for Large Linear Classification    (Machine Learning Open Source Software Paper)

15 0.095282927 17 jmlr-2008-Automatic PCA Dimension Selection for High Dimensional Data and Small Sample Sizes

16 0.092062756 75 jmlr-2008-Optimal Solutions for Sparse Principal Component Analysis

17 0.090265073 59 jmlr-2008-Maximal Causes for Non-linear Component Extraction

18 0.083713993 35 jmlr-2008-Finding Optimal Bayesian Network Given a Super-Structure

19 0.082769096 8 jmlr-2008-Accelerated Neural Evolution through Cooperatively Coevolved Synapses

20 0.077793054 32 jmlr-2008-Estimating the Confidence Interval for Prediction Errors of Support Vector Machine Classifiers


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.017), (5, 0.022), (40, 0.039), (54, 0.055), (58, 0.032), (66, 0.031), (76, 0.015), (78, 0.017), (88, 0.109), (92, 0.032), (94, 0.068), (95, 0.392), (99, 0.032)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.76531976 53 jmlr-2008-Learning to Combine Motor Primitives Via Greedy Additive Regression

Author: Manu Chhabra, Robert A. Jacobs

Abstract: The computational complexities arising in motor control can be ameliorated through the use of a library of motor synergies. We present a new model, referred to as the Greedy Additive Regression (GAR) model, for learning a library of torque sequences, and for learning the coefficients of a linear combination of sequences minimizing a cost function. From the perspective of numerical optimization, the GAR model is interesting because it creates a library of “local features”—each sequence in the library is a solution to a single training task—and learns to combine these sequences using a local optimization procedure, namely, additive regression. We speculate that learners with local representational primitives and local optimization procedures will show good performance on nonlinear tasks. The GAR model is also interesting from the perspective of motor control because it outperforms several competing models. Results using a simulated two-joint arm suggest that the GAR model consistently shows excellent performance in the sense that it rapidly learns to perform novel, complex motor tasks. Moreover, its library is overcomplete and sparse, meaning that only a small fraction of the stored torque sequences are used when learning a new movement. The library is also robust in the sense that, after an initial training period, nearly all novel movements can be learned as additive combinations of sequences in the library, and in the sense that it shows good generalization when an arm’s dynamics are altered between training and test conditions, such as when a payload is added to the arm. Lastly, the GAR model works well regardless of whether motor tasks are specified in joint space or Cartesian space. We conclude that learning techniques using local primitives and optimization procedures are viable and potentially important methods for motor control and possibly other domains, and that these techniques deserve further examination by the artificial intelligence and cognitive science

2 0.38540843 44 jmlr-2008-Incremental Identification of Qualitative Models of Biological Systems using Inductive Logic Programming

Author: Ashwin Srinivasan, Ross D. King

Abstract: The use of computational models is increasingly expected to play an important role in predicting the behaviour of biological systems. Models are being sought at different scales of biological organisation namely: sub-cellular, cellular, tissue, organ, organism and ecosystem; with a view of identifying how different components are connected together, how they are controlled and how they behave when functioning as a system. Except for very simple biological processes, system identification from first principles can be extremely difficult. This has brought into focus automated techniques for constructing models using data of system behaviour. Such techniques face three principal issues: (1) The model representation language must be rich enough to capture system behaviour; (2) The system identification technique must be powerful enough to identify substantially complex models; and (3) There may not be sufficient data to obtain both the model’s structure and precise estimates of all of its parameters. In this paper, we address these issues in the following ways: (1) Models are represented in an expressive subset of first-order logic. Specifically, they are expressed as logic programs; (2) System identification is done using techniques developed in Inductive Logic Programming (ILP). This allows the identification of first-order logic models from data. Specifically, we employ an incremental approach in which increasingly complex models are constructed from simpler ones using snapshots of system behaviour; and (3) We restrict ourselves to “qualitative” models. These are non-parametric: thus, usually less data are required than for identifying parametric quantitative models. A further advantage is that the data need not be precise numerical observations (instead, they are abstractions like positive, negative, zero, increasing, decreasing and so on). We describe incremental construction of qualitative models using a simple physical system and demonstrate its application to identificatio

3 0.355432 96 jmlr-2008-Visualizing Data using t-SNE

Author: Laurens van der Maaten, Geoffrey Hinton

Abstract: We present a new technique called “t-SNE” that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. t-SNE is better than existing techniques at creating a single map that reveals structure at many different scales. This is particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds, such as images of objects from multiple classes seen from multiple viewpoints. For visualizing the structure of very large data sets, we show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is displayed. We illustrate the performance of t-SNE on a wide variety of data sets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap, and Locally Linear Embedding. The visualizations produced by t-SNE are significantly better than those produced by the other techniques on almost all of the data sets. Keywords: visualization, dimensionality reduction, manifold learning, embedding algorithms, multidimensional scaling

4 0.34396198 89 jmlr-2008-Support Vector Machinery for Infinite Ensemble Learning

Author: Hsuan-Tien Lin, Ling Li

Abstract: Ensemble learning algorithms such as boosting can achieve better performance by averaging over the predictions of some base hypotheses. Nevertheless, most existing algorithms are limited to combining only a finite number of hypotheses, and the generated ensemble is usually sparse. Thus, it is not clear whether we should construct an ensemble classifier with a larger or even an infinite number of hypotheses. In addition, constructing an infinite ensemble itself is a challenging task. In this paper, we formulate an infinite ensemble learning framework based on the support vector machine (SVM). The framework can output an infinite and nonsparse ensemble through embedding infinitely many hypotheses into an SVM kernel. We use the framework to derive two novel kernels, the stump kernel and the perceptron kernel. The stump kernel embodies infinitely many decision stumps, and the perceptron kernel embodies infinitely many perceptrons. We also show that the Laplacian radial basis function kernel embodies infinitely many decision trees, and can thus be explained through infinite ensemble learning. Experimental results show that SVM with these kernels is superior to boosting with the same base hypothesis set. In addition, SVM with the stump kernel or the perceptron kernel performs similarly to SVM with the Gaussian radial basis function kernel, but enjoys the benefit of faster parameter selection. These properties make the novel kernels favorable choices in practice. Keywords: ensemble learning, boosting, support vector machine, kernel

5 0.3361212 18 jmlr-2008-Bayesian Inference and Optimal Design for the Sparse Linear Model

Author: Matthias W. Seeger

Abstract: The linear model with sparsity-favouring prior on the coefficients has important applications in many different domains. In machine learning, most methods to date search for maximum a posteriori sparse solutions and neglect to represent posterior uncertainties. In this paper, we address problems of Bayesian optimal design (or experiment planning), for which accurate estimates of uncertainty are essential. To this end, we employ expectation propagation approximate inference for the linear model with Laplace prior, giving new insight into numerical stability properties and proposing a robust algorithm. We also show how to estimate model hyperparameters by empirical Bayesian maximisation of the marginal likelihood, and propose ideas in order to scale up the method to very large underdetermined problems. We demonstrate the versatility of our framework on the application of gene regulatory network identification from micro-array expression data, where both the Laplace prior and the active experimental design approach are shown to result in significant improvements. We also address the problem of sparse coding of natural images, and show how our framework can be used for compressive sensing tasks. Part of this work appeared in Seeger et al. (2007b). The gene network identification application appears in Steinke et al. (2007). Keywords: sparse linear model, Laplace prior, expectation propagation, approximate inference, optimal design, Bayesian statistics, gene network recovery, image coding, compressive sensing

6 0.33610725 9 jmlr-2008-Active Learning by Spherical Subdivision

7 0.3341482 34 jmlr-2008-Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks

8 0.33168441 36 jmlr-2008-Finite-Time Bounds for Fitted Value Iteration

9 0.3310051 57 jmlr-2008-Manifold Learning: The Price of Normalization

10 0.33057806 86 jmlr-2008-SimpleMKL

11 0.32969052 1 jmlr-2008-A Bahadur Representation of the Linear Support Vector Machine

12 0.3295126 74 jmlr-2008-Online Learning of Complex Prediction Problems Using Simultaneous Projections

13 0.32887286 58 jmlr-2008-Max-margin Classification of Data with Absent Features

14 0.32848179 43 jmlr-2008-Hit Miss Networks with Applications to Instance Selection

15 0.32587904 39 jmlr-2008-Gradient Tree Boosting for Training Conditional Random Fields

16 0.32483515 66 jmlr-2008-Multi-class Discriminant Kernel Learning via Convex Programming    (Special Topic on Model Selection)

17 0.32307392 49 jmlr-2008-Learning Control Knowledge for Forward Search Planning

18 0.32269931 83 jmlr-2008-Robust Submodular Observation Selection

19 0.32059661 56 jmlr-2008-Magic Moments for Structured Output Prediction

20 0.31901422 51 jmlr-2008-Learning Similarity with Operator-valued Large-margin Classifiers