jmlr jmlr2013 jmlr2013-1 knowledge-graph by maker-knowledge-mining

1 jmlr-2013-AC++Template-Based Reinforcement Learning Library: Fitting the Code to the Mathematics

Source: pdf

Author: Hervé Frezza-Buet, Matthieu Geist

Abstract: This paper introduces the rllib as an original C++ template-based library oriented toward value function estimation. Generic programming is promoted here as a way of having a good ﬁt between the mathematics of reinforcement learning and their implementation in a library. The main concepts of rllib are presented, as well as a short example. Keywords: reinforcement learning, C++, generic programming 1. C++ Genericity for Fitting the Mathematics of Reinforcement Learning Reinforcement learning (RL) is a ﬁeld of machine learning that beneﬁts from a rigorous mathematical formalism, as shown for example by Bertsekas (1995). Although this formalism is well accepted in the ﬁeld, its translation into efﬁcient computer science tools has surprisingly not led to any standard yet, as mentioned by Kovacs and Egginton (2011). The claim of this paper is that genericity enables a natural expression of the mathematics of RL. The rllib (2011) library implements this idea in the C++ language, where genericity relies on templates. Templates automate the re-writing of some generic code involving user types, offering a strong type checking at compile time that improves the code safety. Using the rllib templates requires that the user-deﬁned types ﬁt some documented concepts. For example, some class C deﬁning an agent should be designed so that C::state type is the type used for the states, C::action type is the type used for the actions, and the method C::action type policy(const C::state type& s) const is implemented, in order to compute the action to be performed in a given state. This concept deﬁnition speciﬁes what is required for an agent mathematically. Note that C does not need to inherit from any kind of abstract rl::Agent class to be used by the rllib tools. It can be directly provided as a type argument to any rllib template requiring an argument ﬁtting the concept of an agent, so that the re-written code actually compiles. 2. A Short Example Let us consider the following toy-example. The state space contains values from 0 to 9, and actions consist in increasing or decreasing the value. When the value gets out of bounds, a reward is returned ∗. Also at UMI 2958 Georgia Tech / CNRS, 2-3, rue Marconi, 57070 Metz, France. c 2013 Herv´ Frezza-Buet and Matthieu Geist. e F REZZA -B UET AND G EIST (-1 for bound 0, 1 for bound 9). Otherwise, a null reward is returned. Let us deﬁne this problem and run Sarsa. First, a simulator class ﬁtting the concept Simulator described in the documentation is needed. c l a s s Sim { / / Our s i m u l a t o r c l a s s . No i n h e r i t a n c e r e q u i r e d . private : i n t c u r r e n t ; double r ; public : typedef int phase type ; typedef int observation type ; t y p e d e f enum { up , down} a c t i o n t y p e ; t y p e d e f d o u b l e r e w a r d t y p e ; Sim ( v o i d ) : c u r r e n t ( 0 ) , r ( 0 ) {} v o i d s e t P h a s e ( c o n s t p h a s e t y p e &s; ) { c u r r e n t = s %10;} c o n s t o b s e r v a t i o n t y p e& s e n s e ( v o i d ) c o n s t { r e t u r n c u r r e n t ; } reward type reward ( void ) const { return r ;} v o i d t i m e S t e p ( c o n s t a c t i o n t y p e &a; ) { i f ( a == up ) c u r r e n t ++; e l s e c u r r e n t −−; i f ( c u r r e n t < 0 ) r =−1; e l s e i f ( c u r r e n t > 9 ) r = 1 ; e l s e r = 0 ; i f ( r ! = 0 ) throw r l : : e x c e p t i o n : : T e r m i n a l ( ” Out o f r a n g e ” ) ; } }; Following the concept requirements, the class Sim naturally implements a sensor method sense that provides an observation from the current phase of the controlled dynamical system, and a method timeStep that computes a transition consecutive to some action. Note the use of exceptions for terminal states. For the sake of simplicity in further code, the following is added. typedef typedef typedef typedef typedef Sim : : p h a s e t y p e Sim : : a c t i o n t y p e r l : : I t e r a t o r t y p e d e f r l : : a g e n t : : o n l i n e : : E p s i l o n G r e e d y Critic ; ArgmaxCritic ; TestAgent ; LearnAgent ; The rllib expresses that Sarsa provides a critic, offering a Q-function. As actions are discrete, the best action (i.e., argmaxa∈A Q(s, a)) can be found by considering all the actions sequentially. This is what ArgmaxCritic offers thanks to the action enumerator Aenum, in order to deﬁne greedy and ε-greedy agents. The main function then only consists in running episodes with the appropriate agents. i n t main ( i n t a r g c , char ∗ a r g v [ ] ) { Sim simulator ; Transition transition ; ArgmaxCritic c r i t i c ; LearnAgent learner ( critic ); TestAgent tester ( critic ); A a; S s; int episode , length , s t e p =0; // // // // // // // T h i s i s what t h e a g e n t c o n t r o l s . T h i s i s some s , a , r , s ’ , a ’ d a t a . T h i s c o m p u t e s Q and argmax a Q( s , a ) . SARSA u s e s t h i s a g e n t t o l e a r n t h e p o l i c y . This behaves according to the c r i t i c . Some a c t i o n . Some s t a t e . f o r ( e p i s o d e = 0 ; e p i s o d e < 1 0 0 0 0 ; ++ e p i s o d e ) { / / L e a r n i n g p h a s e s i m u l a t o r . setPhase ( rand ()%10); r l : : episode : : sa : : r u n a n d l e a r n ( simulator , l e a r n e r , t r a n s i t i o n , 0 , length ) ; } try { / / T e s t phase simulator . setPhase (0); while ( true ) { s = simulator . sense ( ) ; a = t e s t e r . policy ( s ) ; s t e p ++; s i m u l a t o r . t i m e S t e p ( a ) ; } } c a t c h ( r l : : e x c e p t i o n : : T e r m i n a l e ) { s t d : : c o u t << s t e p << ” s t e p s . ” << s t d : : e n d l ; } return 0 ; / / t h e message p r i n t e d i s ‘ ‘10 s t e p s . ’ ’ } 3. Features of the Library Using the library requires to deﬁne the features that are speciﬁc to the problem (the simulator and the Q-function architecture in our example) from scratch, but with the help of concepts. Then, the speciﬁc features can be handled by generic code provided by the library to implement RL techniques with value function estimation. 627 F REZZA -B UET AND G EIST Currently, Q-learing, Sarsa, KTD-Q, LSTD, and policy iteration are available, as well as a multi-layer perceptron architecture. Moreover, some benchmark problems (i.e., simulators) are also provided: the mountain car, the cliff walking, the inverted pendulum and the Boyan chain. Extending the library with new algorithms is allowed, since it consists in deﬁning new templates. This is a bit more technical than only using the existing algorithms, but the structure of existing concepts helps, since it reﬂects the mathematics of RL. For example, concepts like Feature, for linear approaches mainly (i.e., Q(s, a) = θT ϕ(s, a)) and Architecture (i.e., Q(s, a) = fθ (s, a) for more general approximation) orient the design toward functional approaches of RL. The algorithms implemented so far rely on the GNU Scientiﬁc Library (see GSL, 2011) for linear algebra computation, so the GPL licence of GSL propagates to the rllib. 4. Conclusion The rllib relies only on the C++ standard and the availability of the GSL on the system. It offers state-action function approximation tools for applying RL to real problems, as well as a design that ﬁts the mathematics. The latter allows for extensions, but is also compliant with pedagogical purpose. The design of the rllib aims at allowing the user to build (using C++ programming) its own experiment, using several algorithms, several agents, on-line or batch learning, and so on. Actually, the difﬁcult part of RL is the algorithms themselves, not the script-like part of the experiment where things are put together (see the main function in our example). With a framework, in the sense of Kovacs and Egginton (2011), the experiment is not directly accessible to the user programs, since it is handled by some libraries in order to offer graphical interface or analyzing tools. The user code is then called by the framework when required. We advocate that allowing the user to call the rllib functionality at his/her convenience provides an open and extensible access to RL for students, researchers and engineers. Last, the rllib ﬁts the requirements expressed by Kovacs and Egginton (2011, Section 4.3): support of good scientiﬁc research, formulation compliant with the domain, allowing for any kind of agents and any kind of approximators, interoperability of components (the Q function of the example can be used for different algorithms and agents), maximization of run-time speed (use of C++ and templates that inline massively the code), open source, etc. Extensions of rllib can be considered, for example for handling POMDPs, and contributions of users are expected. The use of templates is unfortunately unfamiliar to many programmers, but the effort is worth it, since it brings the code at the level of the mathematical formalism, increasing readability (by a rational use of typedefs) and reducing bugs. Even if the approach is dramatically different from existing frameworks, wrappings with frameworks can be considered in further development. References Dimitri P. Bertsekas. Dynamic Programming and Optimal Control. Athena Scientiﬁc, 3rd (20052007) edition, 1995. GSL, 2011. http://http://www.gnu.org/software/gsl. Tim Kovacs and Robert Egginton. On the analysis and design of software for reinforcement learning, with a survey of existing systems. Machine Learning, 84:7–49, 2011. rllib, 2011. http://ims.metz.supelec.fr/spip.php?article122. 628

Reference: text

Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 FR Sup´ lec e ´ 2 rue Edouard Belin 57070 Metz, France Editor: Mikio Braun Abstract This paper introduces the rllib as an original C++ template-based library oriented toward value function estimation. [sent-5, score-0.75]

2 Generic programming is promoted here as a way of having a good ﬁt between the mathematics of reinforcement learning and their implementation in a library. [sent-6, score-0.243]

3 The main concepts of rllib are presented, as well as a short example. [sent-7, score-0.607]

4 Although this formalism is well accepted in the ﬁeld, its translation into efﬁcient computer science tools has surprisingly not led to any standard yet, as mentioned by Kovacs and Egginton (2011). [sent-10, score-0.091]

5 The claim of this paper is that genericity enables a natural expression of the mathematics of RL. [sent-11, score-0.165]

6 The rllib (2011) library implements this idea in the C++ language, where genericity relies on templates. [sent-12, score-0.834]

7 Templates automate the re-writing of some generic code involving user types, offering a strong type checking at compile time that improves the code safety. [sent-13, score-0.436]

8 Using the rllib templates requires that the user-deﬁned types ﬁt some documented concepts. [sent-14, score-0.715]

9 This concept deﬁnition speciﬁes what is required for an agent mathematically. [sent-16, score-0.162]

10 Note that C does not need to inherit from any kind of abstract rl::Agent class to be used by the rllib tools. [sent-17, score-0.633]

11 It can be directly provided as a type argument to any rllib template requiring an argument ﬁtting the concept of an agent, so that the re-written code actually compiles. [sent-18, score-0.741]

12 The state space contains values from 0 to 9, and actions consist in increasing or decreasing the value. [sent-21, score-0.09]

13 When the value gets out of bounds, a reward is returned ∗. [sent-22, score-0.108]

14 Also at UMI 2958 Georgia Tech / CNRS, 2-3, rue Marconi, 57070 Metz, France. [sent-23, score-0.045]

15 First, a simulator class ﬁtting the concept Simulator described in the documentation is needed. [sent-28, score-0.283]

16 , argmaxa∈A Q(s, a)) can be found by considering all the actions sequentially. [sent-38, score-0.09]

17 This is what ArgmaxCritic offers thanks to the action enumerator Aenum, in order to deﬁne greedy and ε-greedy agents. [sent-39, score-0.107]

18 i n t main ( i n t a r g c , char ∗ a r g v [ ] ) { Sim simulator ; Transition transition ; ArgmaxCritic c r i t i c ; LearnAgent learner ( critic ); TestAgent tester ( critic ); A a; S s; int episode , length , s t e p =0; // // // // // // // T h i s i s what t h e a g e n t c o n t r o l s . [sent-41, score-0.787]

19 setPhase ( rand ()%10); r l : : episode : : sa : : r u n a n d l e a r n ( simulator , l e a r n e r , t r a n s i t i o n , 0 , length ) ; } try { / / T e s t phase simulator . [sent-49, score-0.612]

20 Features of the Library Using the library requires to deﬁne the features that are speciﬁc to the problem (the simulator and the Q-function architecture in our example) from scratch, but with the help of concepts. [sent-56, score-0.361]

21 Then, the speciﬁc features can be handled by generic code provided by the library to implement RL techniques with value function estimation. [sent-57, score-0.287]

22 627 F REZZA -B UET AND G EIST Currently, Q-learing, Sarsa, KTD-Q, LSTD, and policy iteration are available, as well as a multi-layer perceptron architecture. [sent-58, score-0.067]

23 Extending the library with new algorithms is allowed, since it consists in deﬁning new templates. [sent-62, score-0.108]

24 This is a bit more technical than only using the existing algorithms, but the structure of existing concepts helps, since it reﬂects the mathematics of RL. [sent-63, score-0.083]

25 For example, concepts like Feature, for linear approaches mainly (i. [sent-64, score-0.047]

26 , Q(s, a) = fθ (s, a) for more general approximation) orient the design toward functional approaches of RL. [sent-68, score-0.096]

27 The algorithms implemented so far rely on the GNU Scientiﬁc Library (see GSL, 2011) for linear algebra computation, so the GPL licence of GSL propagates to the rllib. [sent-69, score-0.029]

28 Conclusion The rllib relies only on the C++ standard and the availability of the GSL on the system. [sent-71, score-0.56]

29 It offers state-action function approximation tools for applying RL to real problems, as well as a design that ﬁts the mathematics. [sent-72, score-0.052]

30 The latter allows for extensions, but is also compliant with pedagogical purpose. [sent-73, score-0.086]

31 The design of the rllib aims at allowing the user to build (using C++ programming) its own experiment, using several algorithms, several agents, on-line or batch learning, and so on. [sent-74, score-0.66]

32 With a framework, in the sense of Kovacs and Egginton (2011), the experiment is not directly accessible to the user programs, since it is handled by some libraries in order to offer graphical interface or analyzing tools. [sent-76, score-0.089]

33 The user code is then called by the framework when required. [sent-77, score-0.131]

34 We advocate that allowing the user to call the rllib functionality at his/her convenience provides an open and extensible access to RL for students, researchers and engineers. [sent-78, score-0.694]

35 Last, the rllib ﬁts the requirements expressed by Kovacs and Egginton (2011, Section 4. [sent-79, score-0.591]

36 Extensions of rllib can be considered, for example for handling POMDPs, and contributions of users are expected. [sent-81, score-0.56]

37 The use of templates is unfortunately unfamiliar to many programmers, but the effort is worth it, since it brings the code at the level of the mathematical formalism, increasing readability (by a rational use of typedefs) and reducing bugs. [sent-82, score-0.271]

38 Even if the approach is dramatically different from existing frameworks, wrappings with frameworks can be considered in further development. [sent-83, score-0.049]

39 On the analysis and design of software for reinforcement learning, with a survey of existing systems. [sent-93, score-0.166]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('rllib', 0.56), ('typedef', 0.301), ('simulator', 0.214), ('critic', 0.172), ('kovacs', 0.172), ('sim', 0.172), ('gsl', 0.147), ('rl', 0.146), ('reinforcement', 0.14), ('argmaxcritic', 0.129), ('egginton', 0.129), ('eist', 0.129), ('genericity', 0.129), ('rezza', 0.129), ('sarsa', 0.129), ('uet', 0.129), ('agent', 0.122), ('templates', 0.122), ('reward', 0.108), ('library', 0.108), ('agents', 0.092), ('actions', 0.09), ('code', 0.087), ('compliant', 0.086), ('episode', 0.086), ('learnagent', 0.086), ('metz', 0.086), ('setphase', 0.086), ('supelec', 0.086), ('testagent', 0.086), ('const', 0.086), ('action', 0.081), ('herv', 0.074), ('matthieu', 0.074), ('policy', 0.067), ('formalism', 0.065), ('int', 0.065), ('fitting', 0.061), ('type', 0.054), ('frameworks', 0.049), ('concepts', 0.047), ('generic', 0.047), ('kind', 0.046), ('handled', 0.045), ('rue', 0.045), ('user', 0.044), ('phase', 0.044), ('offering', 0.043), ('transition', 0.041), ('concept', 0.04), ('architecture', 0.039), ('implements', 0.037), ('toward', 0.037), ('approximators', 0.037), ('automate', 0.037), ('boyan', 0.037), ('char', 0.037), ('compile', 0.037), ('gnu', 0.037), ('massively', 0.037), ('mikio', 0.037), ('pomdps', 0.037), ('programmers', 0.037), ('promoted', 0.037), ('tech', 0.037), ('throw', 0.037), ('tim', 0.037), ('mathematics', 0.036), ('scienti', 0.033), ('ts', 0.033), ('braun', 0.033), ('documented', 0.033), ('lstd', 0.033), ('orient', 0.033), ('timestep', 0.033), ('unfamiliar', 0.033), ('fr', 0.031), ('cnrs', 0.031), ('functionality', 0.031), ('requirements', 0.031), ('walking', 0.031), ('allowing', 0.03), ('programming', 0.03), ('documentation', 0.029), ('extensible', 0.029), ('georgia', 0.029), ('private', 0.029), ('propagates', 0.029), ('readability', 0.029), ('terminal', 0.029), ('exceptions', 0.027), ('expresses', 0.027), ('inherit', 0.027), ('rand', 0.027), ('sa', 0.027), ('scratch', 0.027), ('students', 0.027), ('design', 0.026), ('offers', 0.026), ('accepted', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 1 jmlr-2013-AC++Template-Based Reinforcement Learning Library: Fitting the Code to the Mathematics

Author: Hervé Frezza-Buet, Matthieu Geist

2 0.076807991 28 jmlr-2013-Construction of Approximation Spaces for Reinforcement Learning

Author: Wendelin Böhmer, Steffen Grünewälder, Yun Shen, Marek Musial, Klaus Obermayer

Abstract: Linear reinforcement learning (RL) algorithms like least-squares temporal difference learning (LSTD) require basis functions that span approximation spaces of potential value functions. This article investigates methods to construct these bases from samples. We hypothesize that an ideal approximation spaces should encode diffusion distances and that slow feature analysis (SFA) constructs such spaces. To validate our hypothesis we provide theoretical statements about the LSTD value approximation error and induced metric of approximation spaces constructed by SFA and the state-of-the-art methods Krylov bases and proto-value functions (PVF). In particular, we prove that SFA minimizes the average (over all tasks in the same environment) bound on the above approximation error. Compared to other methods, SFA is very sensitive to sampling and can sometimes fail to encode the whole state space. We derive a novel importance sampling modiﬁcation to compensate for this effect. Finally, the LSTD and least squares policy iteration (LSPI) performance of approximation spaces constructed by Krylov bases, PVF, SFA and PCA is compared in benchmark tasks and a visual robot navigation experiment (both in a realistic simulation and with a robot). The results support our hypothesis and suggest that (i) SFA provides subspace-invariant features for MDPs with self-adjoint transition operators, which allows strong guarantees on the approximation error, (ii) the modiﬁed SFA algorithm is best suited for LSPI in both discrete and continuous state spaces and (iii) approximation spaces encoding diffusion distances facilitate LSPI performance. Keywords: reinforcement learning, diffusion distance, proto value functions, slow feature analysis, least-squares policy iteration, visual robot navigation c 2013 Wendelin B¨ hmer, Steffen Gr¨ new¨ lder, Yun Shen, Marek Musial and Klaus Obermayer. o u a ¨ ¨ ¨ B OHMER , G R UNEW ALDER , S HEN , M USIAL AND O BERMAYER

3 0.06331405 54 jmlr-2013-JKernelMachines: A Simple Framework for Kernel Machines

Author: David Picard, Nicolas Thome, Matthieu Cord

Abstract: JKernelMachines is a Java library for learning with kernels. It is primarily designed to deal with custom kernels that are not easily found in standard libraries, such as kernels on structured data. These types of kernels are often used in computer vision or bioinformatics applications. We provide several kernels leading to state of the art classiﬁcation performances in computer vision, as well as various kernels on sets. The main focus of the library is to be easily extended with new kernels. Standard SVM optimization algorithms are available, but also more sophisticated learning-based kernel combination methods such as Multiple Kernel Learning (MKL), and a recently published algorithm to learn powered products of similarities (Product Kernel Learning). Keywords: classiﬁcation, support vector machines, kernel, computer vision

4 0.062536068 87 jmlr-2013-Performance Bounds for λ Policy Iteration and Application to the Game of Tetris

Author: Bruno Scherrer

Abstract: We consider the discrete-time inﬁnite-horizon optimal control problem formalized by Markov decision processes (Puterman, 1994; Bertsekas and Tsitsiklis, 1996). We revisit the work of Bertsekas and Ioffe (1996), that introduced λ policy iteration—a family of algorithms parametrized by a parameter λ—that generalizes the standard algorithms value and policy iteration, and has some deep connections with the temporal-difference algorithms described by Sutton and Barto (1998). We deepen the original theory developed by the authors by providing convergence rate bounds which generalize standard bounds for value iteration described for instance by Puterman (1994). Then, the main contribution of this paper is to develop the theory of this algorithm when it is used in an approximate form. We extend and unify the separate analyzes developed by Munos for approximate value iteration (Munos, 2007) and approximate policy iteration (Munos, 2003), and provide performance bounds in the discounted and the undiscounted situations. Finally, we revisit the use of this algorithm in the training of a Tetris playing controller as originally done by Bertsekas and Ioffe (1996). Our empirical results are different from those of Bertsekas and Ioffe (which were originally qualiﬁed as “paradoxical” and “intriguing”). We track down the reason to be a minor implementation error of the algorithm, which suggests that, in practice, λ policy iteration may be more stable than previously thought. Keywords: stochastic optimal control, reinforcement learning, Markov decision processes, analysis of algorithms

5 0.057279252 56 jmlr-2013-Keep It Simple And Sparse: Real-Time Action Recognition

Author: Sean Ryan Fanello, Ilaria Gori, Giorgio Metta, Francesca Odone

Abstract: Sparsity has been showed to be one of the most important properties for visual recognition purposes. In this paper we show that sparse representation plays a fundamental role in achieving one-shot learning and real-time recognition of actions. We start off from RGBD images, combine motion and appearance cues and extract state-of-the-art features in a computationally efﬁcient way. The proposed method relies on descriptors based on 3D Histograms of Scene Flow (3DHOFs) and Global Histograms of Oriented Gradient (GHOGs); adaptive sparse coding is applied to capture high-level patterns from data. We then propose a simultaneous on-line video segmentation and recognition of actions using linear SVMs. The main contribution of the paper is an effective realtime system for one-shot action modeling and recognition; the paper highlights the effectiveness of sparse coding techniques to represent 3D actions. We obtain very good results on three different data sets: a benchmark data set for one-shot action learning (the ChaLearn Gesture Data Set), an in-house data set acquired by a Kinect sensor including complex actions and gestures differing by small details, and a data set created for human-robot interaction purposes. Finally we demonstrate that our system is effective also in a human-robot interaction setting and propose a memory game, “All Gestures You Can”, to be played against a humanoid robot. Keywords: real-time action recognition, sparse representation, one-shot action learning, human robot interaction

6 0.046011977 65 jmlr-2013-Lower Bounds and Selectivity of Weak-Consistent Policies in Stochastic Multi-Armed Bandit Problem

7 0.041550331 112 jmlr-2013-Tapkee: An Efficient Dimension Reduction Library

8 0.033844948 67 jmlr-2013-MLPACK: A Scalable C++ Machine Learning Library

9 0.026218552 30 jmlr-2013-Counterfactual Reasoning and Learning Systems: The Example of Computational Advertising

10 0.025501877 68 jmlr-2013-Machine Learning with Operational Costs

11 0.022641305 38 jmlr-2013-Dynamic Affine-Invariant Shape-Appearance Handshape Features and Classification in Sign Language Videos

12 0.020306757 58 jmlr-2013-Language-Motivated Approaches to Action Recognition

13 0.019644029 89 jmlr-2013-QuantMiner for Mining Quantitative Association Rules

14 0.017786562 83 jmlr-2013-Orange: Data Mining Toolbox in Python

15 0.015238919 94 jmlr-2013-Ranked Bandits in Metric Spaces: Learning Diverse Rankings over Large Document Collections

16 0.014183113 46 jmlr-2013-GURLS: A Least Squares Library for Supervised Learning

17 0.01281271 98 jmlr-2013-Segregating Event Streams and Noise with a Markov Renewal Process Model

18 0.012427966 113 jmlr-2013-The CAM Software for Nonnegative Blind Source Separation in R-Java

19 0.012213862 44 jmlr-2013-Finding Optimal Bayesian Networks Using Precedence Constraints

20 0.011994717 10 jmlr-2013-Algorithms and Hardness Results for Parallel Large Margin Learning

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.073), (1, 0.025), (2, -0.104), (3, 0.046), (4, -0.117), (5, 0.052), (6, -0.138), (7, 0.025), (8, -0.028), (9, -0.041), (10, 0.036), (11, -0.105), (12, 0.062), (13, -0.009), (14, 0.059), (15, -0.182), (16, 0.036), (17, -0.033), (18, 0.032), (19, -0.057), (20, 0.118), (21, -0.103), (22, 0.044), (23, -0.016), (24, -0.064), (25, 0.01), (26, 0.041), (27, -0.09), (28, -0.008), (29, -0.099), (30, 0.039), (31, 0.098), (32, 0.06), (33, 0.039), (34, -0.051), (35, -0.182), (36, -0.061), (37, 0.167), (38, 0.011), (39, 0.105), (40, 0.019), (41, -0.069), (42, -0.113), (43, -0.247), (44, -0.074), (45, 0.017), (46, 0.068), (47, 0.05), (48, 0.073), (49, 0.129)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97868878 1 jmlr-2013-AC++Template-Based Reinforcement Learning Library: Fitting the Code to the Mathematics

Author: Hervé Frezza-Buet, Matthieu Geist

2 0.52877146 54 jmlr-2013-JKernelMachines: A Simple Framework for Kernel Machines

Author: David Picard, Nicolas Thome, Matthieu Cord

3 0.46169749 67 jmlr-2013-MLPACK: A Scalable C++ Machine Learning Library

Author: Ryan R. Curtin, James R. Cline, N. P. Slagle, William B. March, Parikshit Ram, Nishant A. Mehta, Alexander G. Gray

Abstract: MLPACK is a state-of-the-art, scalable, multi-platform C++ machine learning library released in late 2011 offering both a simple, consistent API accessible to novice users and high performance and ﬂexibility to expert users by leveraging modern features of C++. MLPACK provides cutting-edge algorithms whose benchmarks exhibit far better performance than other leading machine learning libraries. MLPACK version 1.0.3, licensed under the LGPL, is available at http://www.mlpack.org. Keywords: C++, dual-tree algorithms, machine learning software, open source software, largescale learning 1. Introduction and Goals Though several machine learning libraries are freely available online, few, if any, offer efﬁcient algorithms to the average user. For instance, the popular Weka toolkit (Hall et al., 2009) emphasizes ease of use but scales poorly; the distributed Apache Mahout library offers scalability at a cost of higher overhead (such as clusters and powerful servers often unavailable to the average user). Also, few libraries offer breadth; for instance, libsvm (Chang and Lin, 2011) and the Tilburg MemoryBased Learner (TiMBL) are highly scalable and accessible yet each offer only a single method. MLPACK, intended to be the machine learning analog to the general-purpose LAPACK linear algebra library, aims to combine efﬁciency and accessibility. Written in C++, MLPACK uses the highly efﬁcient Armadillo matrix library (Sanderson, 2010) and is freely available under the GNU Lesser General Public License (LGPL). Through the use of C++ templates, MLPACK both eliminates unnecessary copying of data sets and performs expression optimizations unavailable in other languages. Also, MLPACK is, to our knowledge, unique among existing libraries in using generic programming features of C++ to allow customization of the available machine learning methods without incurring performance penalties. c 2013 Ryan R. Curtin, James R. Cline, N. P. Slagle, William B. March, Parikshit Ram, Nishant A. Mehta and Alexander G. Gray. C URTIN , C LINE , S LAGLE , M ARCH , R AM , M EHTA AND G RAY In addition, users ranging from students to experts should ﬁnd the consistent, intuitive interface of MLPACK to be highly accessible. Finally, the source code provides references and comprehensive documentation. Four major goals of the development team of MLPACK are • • • • to implement scalable, fast machine learning algorithms, to design an intuitive, consistent, and simple API for non-expert users, to implement a variety of machine learning methods, and to provide cutting-edge machine learning algorithms unavailable elsewhere. This paper offers both an introduction to the simple and extensible API and a glimpse of the superior performance of the library. 2. Package Overview Each algorithm available in MLPACK features both a set of C++ library functions and a standalone command-line executable. Version 1.0.3 includes the following methods: • • • • • • • • • • • • • • nearest/furthest neighbor search with cover trees or kd-trees (k-nearest-neighbors) range search with cover trees or kd-trees Gaussian mixture models (GMMs) hidden Markov models (HMMs) LARS / Lasso regression k-means clustering fast hierarchical clustering (Euclidean MST calculation)1 (March et al., 2010) kernel PCA (and regular PCA) local coordinate coding1 (Yu et al., 2009) sparse coding using dictionary learning RADICAL (Robust, Accurate, Direct ICA aLgorithm) (Learned-Miller and Fisher, 2003) maximum variance unfolding (MVU) via LRSDP1 (Burer and Monteiro, 2003) the naive Bayes classiﬁer density estimation trees1 (Ram and Gray, 2011) The development team manages MLPACK with Subversion and the Trac bug reporting system, allowing easy downloads and simple bug reporting. The entire development process is transparent, so any interested user can easily contribute to the library. MLPACK can compile from source on Linux, Mac OS, and Windows; currently, different Linux distributions are reviewing MLPACK for inclusion in their package managers, which will allow users to install MLPACK without needing to compile from source. 3. A Consistent, Simple API MLPACK features a highly accessible API, both in style (such as consistent naming schemes and coding conventions) and ease of use (such as templated defaults), as well as stringent documentation standards. Consequently, a new user can execute algorithms out-of-the-box often with little or no adjustment to parameters, while the seasoned expert can expect extreme ﬂexibility in algorithmic 1. This algorithm is not available in any other comparable software package. 802 MLPACK: A S CALABLE C++ M ACHINE L EARNING L IBRARY Data Set wine cloud wine-qual isolet miniboone yp-msd corel covtype mnist randu MLPACK 0.0003 0.0069 0.0290 13.0197 20.2045 5430.0478 4.9716 14.3449 2719.8087 1020.9142 Weka 0.0621 0.1174 0.8868 213.4735 216.1469 >9000.0000 14.4264 45.9912 >9000.0000 2665.0921 Shogun 0.0277 0.5000 4.3617 37.6190 2351.4637 >9000.0000 555.9600 >9000.0000 3536.4477 >9000.0000 MATLAB 0.0021 0.0210 0.6465 46.9518 1088.1127 >9000.0000 60.8496 >9000.0000 4838.6747 1679.2893 mlpy 0.0025 0.3520 4.0431 52.0437 3219.2696 >9000.0000 209.5056 >9000.0000 5192.3586 >9000.0000 sklearn 0.0008 0.0192 0.1668 46.8016 714.2385 >9000.0000 160.4597 651.6259 5363.9650 8780.0176 Table 1: k-NN benchmarks (in seconds). Data Set UCI Name Size wine Wine 178x13 cloud Cloud 2048x10 wine-qual Wine Quality 6497x11 isolet ISOLET 7797x617 miniboone MiniBooNE 130064x50 Data Set UCI Name Size yp-msd YearPredictionMSD 515345x90 corel Corel 37749x32 covtype Covertype 581082x54 mnist N/A 70000x784 randu N/A 1000000x10 Table 2: Benchmark data set sizes. tuning. For example, the following line initializes an object which will perform the standard kmeans clustering in Euclidean space: KMeans

4 0.42609209 112 jmlr-2013-Tapkee: An Efficient Dimension Reduction Library

Author: Sergey Lisitsyn, Christian Widmer, Fernando J. Iglesias Garcia

Abstract: We present Tapkee, a C++ template library that provides efﬁcient implementations of more than 20 widely used dimensionality reduction techniques ranging from Locally Linear Embedding (Roweis and Saul, 2000) and Isomap (de Silva and Tenenbaum, 2002) to the recently introduced BarnesHut-SNE (van der Maaten, 2013). Our library was designed with a focus on performance and ﬂexibility. For performance, we combine efﬁcient multi-core algorithms, modern data structures and state-of-the-art low-level libraries. To achieve ﬂexibility, we designed a clean interface for applying methods to user data and provide a callback API that facilitates integration with the library. The library is freely available as open-source software and is distributed under the permissive BSD 3-clause license. We encourage the integration of Tapkee into other open-source toolboxes and libraries. For example, Tapkee has been integrated into the codebase of the Shogun toolbox (Sonnenburg et al., 2010), giving us access to a rich set of kernels, distance measures and bindings to common programming languages including Python, Octave, Matlab, R, Java, C#, Ruby, Perl and Lua. Source code, examples and documentation are available at http://tapkee.lisitsyn.me. Keywords: dimensionality reduction, machine learning, C++, open source software

5 0.39882928 87 jmlr-2013-Performance Bounds for λ Policy Iteration and Application to the Game of Tetris

Author: Bruno Scherrer

6 0.338539 113 jmlr-2013-The CAM Software for Nonnegative Blind Source Separation in R-Java

7 0.28061399 28 jmlr-2013-Construction of Approximation Spaces for Reinforcement Learning

8 0.1808234 40 jmlr-2013-Efficient Program Synthesis Using Constraint Satisfaction in Inductive Logic Programming

9 0.16313486 56 jmlr-2013-Keep It Simple And Sparse: Real-Time Action Recognition

10 0.15911357 65 jmlr-2013-Lower Bounds and Selectivity of Weak-Consistent Policies in Stochastic Multi-Armed Bandit Problem

11 0.14692876 91 jmlr-2013-Query Induction with Schema-Guided Pruning Strategies

12 0.14221641 68 jmlr-2013-Machine Learning with Operational Costs

13 0.13697413 30 jmlr-2013-Counterfactual Reasoning and Learning Systems: The Example of Computational Advertising

14 0.13144717 89 jmlr-2013-QuantMiner for Mining Quantitative Association Rules

15 0.12527573 38 jmlr-2013-Dynamic Affine-Invariant Shape-Appearance Handshape Features and Classification in Sign Language Videos

16 0.11888722 93 jmlr-2013-Random Walk Kernels and Learning Curves for Gaussian Process Regression on Random Graphs

17 0.112422 39 jmlr-2013-Efficient Active Learning of Halfspaces: An Aggressive Approach

18 0.10669226 11 jmlr-2013-Algorithms for Discovery of Multiple Markov Boundaries

19 0.10458657 17 jmlr-2013-Belief Propagation for Continuous State Spaces: Stochastic Message-Passing with Quantitative Guarantees

20 0.10309923 80 jmlr-2013-One-shot Learning Gesture Recognition from RGB-D Data Using Bag of Features

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.015), (5, 0.097), (6, 0.018), (10, 0.023), (20, 0.029), (23, 0.034), (44, 0.018), (70, 0.012), (75, 0.022), (87, 0.039), (90, 0.592)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.81045723 1 jmlr-2013-AC++Template-Based Reinforcement Learning Library: Fitting the Code to the Mathematics

Author: Hervé Frezza-Buet, Matthieu Geist

2 0.46419317 99 jmlr-2013-Semi-Supervised Learning Using Greedy Max-Cut

Author: Jun Wang, Tony Jebara, Shih-Fu Chang

Abstract: Graph-based semi-supervised learning (SSL) methods play an increasingly important role in practical machine learning systems, particularly in agnostic settings when no parametric information or other prior knowledge is available about the data distribution. Given the constructed graph represented by a weight matrix, transductive inference is used to propagate known labels to predict the values of all unlabeled vertices. Designing a robust label diffusion algorithm for such graphs is a widely studied problem and various methods have recently been suggested. Many of these can be formalized as regularized function estimation through the minimization of a quadratic cost. However, most existing label diffusion methods minimize a univariate cost with the classiﬁcation function as the only variable of interest. Since the observed labels seed the diffusion process, such univariate frameworks are extremely sensitive to the initial label choice and any label noise. To alleviate the dependency on the initial observed labels, this article proposes a bivariate formulation for graph-based SSL, where both the binary label information and a continuous classiﬁcation function are arguments of the optimization. This bivariate formulation is shown to be equivalent to a linearly constrained Max-Cut problem. Finally an efﬁcient solution via greedy gradient Max-Cut (GGMC) is derived which gradually assigns unlabeled vertices to each class with minimum connectivity. Once convergence guarantees are established, this greedy Max-Cut based SSL is applied on both artiﬁcial and standard benchmark data sets where it obtains superior classiﬁcation accuracy compared to existing state-of-the-art SSL methods. Moreover, GGMC shows robustness with respect to the graph construction method and maintains high accuracy over extensive experiments with various edge linking and weighting schemes. Keywords: graph transduction, semi-supervised learning, bivariate formulation, mixed integer programming, greedy

3 0.23159498 29 jmlr-2013-Convex and Scalable Weakly Labeled SVMs

Author: Yu-Feng Li, Ivor W. Tsang, James T. Kwok, Zhi-Hua Zhou

Abstract: In this paper, we study the problem of learning from weakly labeled data, where labels of the training examples are incomplete. This includes, for example, (i) semi-supervised learning where labels are partially known; (ii) multi-instance learning where labels are implicitly known; and (iii) clustering where labels are completely unknown. Unlike supervised learning, learning with weak labels involves a difﬁcult Mixed-Integer Programming (MIP) problem. Therefore, it can suffer from poor scalability and may also get stuck in local minimum. In this paper, we focus on SVMs and propose the W ELL SVM via a novel label generation strategy. This leads to a convex relaxation of the original MIP, which is at least as tight as existing convex Semi-Deﬁnite Programming (SDP) relaxations. Moreover, the W ELL SVM can be solved via a sequence of SVM subproblems that are much more scalable than previous convex SDP relaxations. Experiments on three weakly labeled learning tasks, namely, (i) semi-supervised learning; (ii) multi-instance learning for locating regions of interest in content-based information retrieval; and (iii) clustering, clearly demonstrate improved performance, and W ELL SVM is also readily applicable on large data sets. Keywords: weakly labeled data, semi-supervised learning, multi-instance learning, clustering, cutting plane, convex relaxation

4 0.19897857 28 jmlr-2013-Construction of Approximation Spaces for Reinforcement Learning

Author: Wendelin Böhmer, Steffen Grünewälder, Yun Shen, Marek Musial, Klaus Obermayer

5 0.18733181 55 jmlr-2013-Joint Harmonic Functions and Their Supervised Connections

Author: Mark Vere Culp, Kenneth Joseph Ryan

Abstract: The cluster assumption had a signiﬁcant impact on the reasoning behind semi-supervised classiﬁcation methods in graph-based learning. The literature includes numerous applications where harmonic functions provided estimates that conformed to data satisfying this well-known assumption, but the relationship between this assumption and harmonic functions is not as well-understood theoretically. We investigate these matters from the perspective of supervised kernel classiﬁcation and provide concrete answers to two fundamental questions. (i) Under what conditions do semisupervised harmonic approaches satisfy this assumption? (ii) If such an assumption is satisﬁed then why precisely would an observation sacriﬁce its own supervised estimate in favor of the cluster? First, a harmonic function is guaranteed to assign labels to data in harmony with the cluster assumption if a speciﬁc condition on the boundary of the harmonic function is satisﬁed. Second, it is shown that any harmonic function estimate within the interior is a probability weighted average of supervised estimates, where the weight is focused on supervised kernel estimates near labeled cases. We demonstrate that the uniqueness criterion for harmonic estimators is sensitive when the graph is sparse or the size of the boundary is relatively small. This sets the stage for a third contribution, a new regularized joint harmonic function for semi-supervised learning based on a joint optimization criterion. Mathematical properties of this estimator, such as its uniqueness even when the graph is sparse or the size of the boundary is relatively small, are proven. A main selling point is its ability to operate in circumstances where the cluster assumption may not be fully satisﬁed on real data by compromising between the purely harmonic and purely supervised estimators. The competitive stature of the new regularized joint harmonic approach is established. Keywords: harmonic function, joint training, cluster assumption, s

6 0.182924 51 jmlr-2013-Greedy Sparsity-Constrained Optimization

7 0.18048605 15 jmlr-2013-Bayesian Canonical Correlation Analysis

8 0.18007271 25 jmlr-2013-Communication-Efficient Algorithms for Statistical Optimization

9 0.17959438 116 jmlr-2013-Truncated Power Method for Sparse Eigenvalue Problems

10 0.1784728 10 jmlr-2013-Algorithms and Hardness Results for Parallel Large Margin Learning

11 0.17820151 48 jmlr-2013-Generalized Spike-and-Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation

12 0.17755099 4 jmlr-2013-A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion

13 0.17732763 52 jmlr-2013-How to Solve Classification and Regression Problems on High-Dimensional Data with a Supervised Extension of Slow Feature Analysis

14 0.17699024 46 jmlr-2013-GURLS: A Least Squares Library for Supervised Learning

15 0.17667191 17 jmlr-2013-Belief Propagation for Continuous State Spaces: Stochastic Message-Passing with Quantitative Guarantees

16 0.17605139 105 jmlr-2013-Sparsity Regret Bounds for Individual Sequences in Online Linear Regression

17 0.17588003 114 jmlr-2013-The Rate of Convergence of AdaBoost

18 0.17473385 102 jmlr-2013-Sparse Matrix Inversion with Scaled Lasso

19 0.1747123 87 jmlr-2013-Performance Bounds for λ Policy Iteration and Application to the Game of Tetris

20 0.17460412 59 jmlr-2013-Large-scale SVD and Manifold Learning