jmlr jmlr2012 jmlr2012-79 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: David Verstraeten, Benjamin Schrauwen, Sander Dieleman, Philemon Brakel, Pieter Buteneers, Dejan Pecevski
Abstract: Oger (OrGanic Environment for Reservoir computing) is a Python toolbox for building, training and evaluating modular learning architectures on large data sets. It builds on MDP for its modularity, and adds processing of sequential data sets, gradient descent training, several cross-validation schemes and parallel parameter optimization methods. Additionally, several learning algorithms are implemented, such as different reservoir implementations (both sigmoid and spiking), ridge regression, conditional restricted Boltzmann machine (CRBM) and others, including GPU accelerated versions. Oger is released under the GNU LGPL, and is available from http://organic.elis.ugent.be/oger. Keywords: Python, modular architectures, sequential processing
Reference: text
sentIndex sentText sentNum sentScore
1 Institute for Theoretical Computer Science, Graz University of Technology, Graz, Austria. Editor: Cheng Soon Ong. Abstract: Oger (OrGanic Environment for Reservoir computing) is a Python toolbox for building, training and evaluating modular learning architectures on large data sets. [sent-13, score-0.266]
2 It builds on MDP for its modularity, and adds processing of sequential data sets, gradient descent training, several cross-validation schemes and parallel parameter optimization methods. [sent-14, score-0.178]
3 Additionally, several learning algorithms are implemented, such as different reservoir implementations (both sigmoid and spiking), ridge regression, conditional restricted Boltzmann machine (CRBM) and others, including GPU accelerated versions. [sent-15, score-0.413]
4 Introduction. The Oger toolbox originated from the need to rapidly implement, investigate and compare complex architectures built from state-of-the-art sequential processing algorithms, focused on but not limited to reservoir computing, and to apply these architectures to large real-world tasks. [sent-21, score-0.681]
5 Oger focuses on reservoir computing (RC) (Verstraeten et al., 2007), a paradigm whereby a random non-linear dynamical system (usually a recurrent neural network) is left untrained and used as input to a simple learning algorithm such as linear regression. [sent-23, score-0.025]
6 A number of smaller toolboxes for reservoir computing are available, written in C++, Java and Matlab. [sent-24, score-0.375]
7 However, these are generally focused on specific implementations of RC (echo state networks or liquid state machines) and offer less flexibility in creating and evaluating complex architectures. [sent-25, score-0.05]
8 Rather than contribute yet another toolbox which reimplements many standard algorithms, one of our design choices for Oger was to incorporate existing packages where possible. [sent-26, score-0.081]
9 The basic processing blocks (nodes) are combined with methods for constructing and training architectures. [sent-34, score-0.023]
10 These architectures can then be evaluated in a validation and optimization framework. [sent-35, score-0.144]
11 Since modularity was one of the key requirements for Oger, it has been based on the well-known and widely used Modular Data Processing toolkit (MDP), which provides this modularity in addition to a wide variety of machine learning algorithms (Zito et al., 2008). [sent-36, score-0.084]
12 These nodes can then be combined into an arbitrary feedforward graph structure called a Flow. [sent-39, score-0.058]
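To give a feel for this composition model, here is a minimal sketch that chains two standard MDP nodes into a Flow; the node choice (PCA followed by Slow Feature Analysis) is purely illustrative and not taken from the paper.

import mdp
import numpy as np

x = np.random.randn(1000, 20)  # 1,000 samples of a 20-dimensional signal
# Chain two trainable nodes into a feedforward Flow.
flow = mdp.Flow([mdp.nodes.PCANode(output_dim=5), mdp.nodes.SFANode(output_dim=3)])
flow.train(x)        # trains each node in sequence on the same data
y = flow.execute(x)  # propagates the data through the trained flow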
13 Much of the error- and type-checking is abstracted away through the object-oriented interface, such that the developer can focus on implementing the actual algorithm. [sent-40, score-0.045]
14 Mature and feature-complete packages for plotting (matplotlib) and general scientific computing (SciPy) that in many respects come close to commercial alternatives are available, along with a plethora of smaller libraries providing specific functions. [sent-42, score-0.041]
15 Features. In this section we describe the main features of Oger and give a usage example. [sent-44, score-0.073]
16 Oger adds several new methods to this set (a short code sketch follows this list): – Several reservoir implementations: a basic reservoir with a customizable nonlinear function and weight topologies, a leaky integrator reservoir, and a GPU-optimized reservoir using CUDA. [sent-47, score-1.102]
17 – Wrappers for creating spiking reservoirs using PyNN-compatible neural network simulators (Davison et al., 2008). [sent-48, score-0.075]
18 – A logistic regression node trainable with different optimizers such as IRLS, conjugate gradient, BFGS and others. [sent-50, score-0.134]
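As a brief, hypothetical sketch of how such nodes are instantiated (the class names follow the usage example below, while the Oger.nodes module path and the constructor arguments shown here are assumptions):

import Oger

# Illustrative only: a leaky-integrator reservoir and a ridge-regression readout.
reservoir = Oger.nodes.LeakyReservoirNode(input_dim=1, output_dim=100, input_scaling=0.5)
readout = Oger.nodes.RidgeRegressionNode(ridge_param=0.001)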
19 Additionally, Oger supports backpropagation training using various methods of gradient descent, such as stochastic gradient descent, RPROP and others. [sent-57, score-0.023]
20 Finally, a FreerunFlow allows easy training and execution of architectures with feedback, for instance for time-series generation tasks (see the usage example below). [sent-58, score-0.271]
21 2 Validation, Optimization and Parallel Execution. Around the data processing algorithms described above, Oger offers functionality for large-scale validation and optimization. [sent-60, score-0.033]
22 The validation automates the process of constructing training and test sets, and the actual training and evaluation. [sent-61, score-0.101]
23 Several standard validation schemes are provided (n-fold, leave-one-out (LOO) cross-validation and others), but this can be customized (for example, if a fixed training and test set is defined). [sent-62, score-0.078]
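As a rough sketch of how such a validation run might look (the validate and n_fold_random names and the per-node data layout follow the Oger documentation, but should be treated as assumptions here, as should the choice of NRMSE as error measure and the narma30 benchmark helper):

import mdp
import Oger

# Assumed helpers: a benchmark data set, a reservoir node and a ridge-regression readout.
x, y = Oger.datasets.narma30(n_samples=10)
flow = mdp.Flow([Oger.nodes.LeakyReservoirNode(output_dim=100),
                 Oger.nodes.RidgeRegressionNode()])
# One entry per node: the reservoir sees only inputs, the readout sees (input, target) pairs.
data = [x, zip(x, y)]
errors = Oger.evaluation.validate(data, flow, Oger.utils.nrmse,
                                  cross_validate_function=Oger.evaluation.n_fold_random,
                                  n_folds=5)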
24 The optimization itself can be done using grid-searching, or using an interface to any of the algorithms in scipy. [sent-65, score-0.033]
25 Finally, a variety of error measures and utility classes such as a ConfusionMatrix are included. [sent-67, score-0.023]
26 Oger allows two modes of parallel execution, both local (multi-threaded or multi-process) and on a computing grid. [sent-68, score-0.102]
27 The first mode is inherited from MDP, where the training and execution of a flow on a data set consisting of different chunks can be done in parallel (if the nodes in the flow support this). [sent-69, score-0.261]
28 The second mode is the parallel evaluation of parameter points for grid-searching and CMA-ES (the scipy. [sent-70, score-0.093]
29 Both modes use runtime overloading of class methods by their parallel versions, which makes the transition from sequential to parallel execution very user-friendly and possible using a couple of lines of code (see the usage example below). [sent-72, score-0.405]
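A sketch of what those couple of lines might look like (the ProcessScheduler class comes from MDP's parallel package; the mdp.activate_extension prefix is an assumption, while the 'parallel' extension name is taken from the usage example below):

import mdp

# Mode 1: an MDP scheduler distributes the training of a flow over data chunks,
# here using several local processes.
scheduler = mdp.parallel.ProcessScheduler(n_processes=4)
# Mode 2: activating the 'parallel' extension overloads grid-search and validation
# with their parallel versions, so a subsequent opt.grid_search(...) evaluates its
# parameter points in parallel.
mdp.activate_extension('parallel')
scheduler.shutdown()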
30 Usage Example. As an illustrative example, we construct and train a reservoir and readout setup with output feedback for generating the Mackey-Glass time-series. [sent-74, score-0.556]
31 We refer to the Oger website and the Oger installation package for more usage examples. [sent-75, score-0.123]
32 1 2 3 4 5 6 7 8 9 10 11 from scipy import * import Oger , mdp signals = Oger . [sent-76, score-0.333]
33 mackey_glass ( n_samples =4 , sample_len =3000) res = Oger . [sent-78, score-0.055]
34 LeakyReservoirNode ( output_dim =400 , reset_states = False ) readout = Oger . [sent-80, score-0.177]
35 FreerunFlow ([ res , readout ], freerun_steps =300) parameters = { res :{ ’ input_scaling ’: arange (. [sent-84, score-0.405]
36 1) }} internal_params = { readout :{ ’ ridge_param ’: 10. [sent-91, score-0.177]
37 activate_extension ( ’ parallel ’) 2997 V ERSTRAETEN , S CHRAUWEN , D IELEMAN , B RAKEL , B UTENEERS AND P ECEVSKI 12 opt . [sent-101, score-0.144]
38 grid_search ([[] , signals [: -1]] , flow , Oger . [sent-102, score-0.098]
39 leave_one_out , internal_params ) 13 opt_flow = opt . [sent-104, score-0.074]
40 execute ( signals [ -1][0]) On line 3, the data set is generated, which in this case consists of four Mackey-Glass timeseries generated from different initial states. [sent-107, score-0.078]
41 In the next two lines, a reservoir node and a linear readout node trained with ridge regression are created. [sent-108, score-0.715]
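For reference, the ridge-regression readout has a closed-form solution; with X the matrix of collected reservoir states, Y the target outputs and lambda the ridge parameter, the readout weights are given by the standard textbook expression (not code from the toolbox):

W = (X^\top X + \lambda I)^{-1} X^\top Y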
42 Line 6 concatenates these nodes into a FreerunFlow, which provides one-step ahead prediction during training and feeds the output back to the input of the flow during execution. [sent-109, score-0.081]
43 Lines 7 and 8 define a search space for the reservoir parameters and the regularization constant of the readout node, which is optimized separately for each set of reservoir parameters. [sent-110, score-0.983]
44 On line 12, the actual optimization is performed using LOO cross-validation on the four time-series, while for each fold the regularization constant for the ridge regression is optimized again using LOO cross-validation. [sent-113, score-0.085]
45 On line 13 the Optimizer is queried to return the optimal flow, which is subsequently trained using all the training signals and applied to an unseen test signal in lines 14 and 15 respectively. [sent-115, score-0.105]
46 PyNN: A common interface for neuronal network simulators. [sent-128, score-0.058]
47 Modular toolkit for Data Processing (MDP): A Python data processing framework. [sent-147, score-0.131]
wordName wordTfidf (topN-words)
[('oger', 0.706), ('reservoir', 0.352), ('readout', 0.177), ('mdp', 0.157), ('verstraeten', 0.147), ('arange', 0.118), ('dejan', 0.118), ('schrauwen', 0.118), ('architectures', 0.111), ('python', 0.092), ('modular', 0.092), ('brakel', 0.088), ('buteneers', 0.088), ('dieleman', 0.088), ('freerunflow', 0.088), ('pecevski', 0.088), ('philemon', 0.088), ('sander', 0.075), ('node', 0.075), ('optimizer', 0.074), ('opt', 0.074), ('usage', 0.073), ('parallel', 0.07), ('pieter', 0.068), ('loo', 0.068), ('execution', 0.064), ('chrauwen', 0.059), ('davison', 0.059), ('ecevski', 0.059), ('erstraeten', 0.059), ('ghent', 0.059), ('ieleman', 0.059), ('neuroinformatics', 0.059), ('organic', 0.059), ('rakel', 0.059), ('trainable', 0.059), ('uteneers', 0.059), ('zito', 0.059), ('nodes', 0.058), ('res', 0.055), ('signals', 0.053), ('ow', 0.053), ('benjamin', 0.052), ('graz', 0.05), ('spiking', 0.05), ('modularity', 0.045), ('frontiers', 0.045), ('flow', 0.045), ('scipy', 0.045), ('sequential', 0.042), ('packages', 0.041), ('toolbox', 0.04), ('toolkit', 0.039), ('import', 0.039), ('ridge', 0.036), ('boltzmann', 0.035), ('interface', 0.033), ('validation', 0.033), ('modes', 0.032), ('website', 0.029), ('rc', 0.029), ('lines', 0.029), ('david', 0.027), ('optimized', 0.027), ('feedback', 0.027), ('creating', 0.025), ('implementations', 0.025), ('igi', 0.025), ('untrained', 0.025), ('neuronal', 0.025), ('mature', 0.025), ('bfgs', 0.025), ('timeseries', 0.025), ('originated', 0.025), ('rocessing', 0.025), ('blas', 0.025), ('cale', 0.025), ('crbm', 0.025), ('echo', 0.025), ('gpu', 0.025), ('overloading', 0.025), ('exibility', 0.024), ('mode', 0.023), ('utility', 0.023), ('training', 0.023), ('optionally', 0.023), ('crossvalidation', 0.023), ('lgpl', 0.023), ('offering', 0.023), ('commission', 0.023), ('chunks', 0.023), ('ger', 0.023), ('irls', 0.023), ('developer', 0.023), ('toolboxes', 0.023), ('numpy', 0.023), ('schemes', 0.022), ('actual', 0.022), ('adds', 0.021), ('installation', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 79 jmlr-2012-Oger: Modular Learning Architectures For Large-Scale Sequential Processing
Author: David Verstraeten, Benjamin Schrauwen, Sander Dieleman, Philemon Brakel, Pieter Buteneers, Dejan Pecevski
Abstract: Oger (OrGanic Environment for Reservoir computing) is a Python toolbox for building, training and evaluating modular learning architectures on large data sets. It builds on MDP for its modularity, and adds processing of sequential data sets, gradient descent training, several cross-validation schemes and parallel parameter optimization methods. Additionally, several learning algorithms are implemented, such as different reservoir implementations (both sigmoid and spiking), ridge regression, conditional restricted Boltzmann machine (CRBM) and others, including GPU accelerated versions. Oger is released under the GNU LGPL, and is available from http://organic.elis.ugent.be/oger. Keywords: Python, modular architectures, sequential processing
2 0.051045612 61 jmlr-2012-ML-Flex: A Flexible Toolbox for Performing Classification Analyses In Parallel
Author: Stephen R. Piccolo, Lewis J. Frey
Abstract: Motivated by a need to classify high-dimensional, heterogeneous data from the bioinformatics domain, we developed ML-Flex, a machine-learning toolbox that enables users to perform two-class and multi-class classification analyses in a systematic yet flexible manner. ML-Flex was written in Java but is capable of interfacing with third-party packages written in other programming languages. It can handle multiple input-data formats and supports a variety of customizations. MLFlex provides implementations of various validation strategies, which can be executed in parallel across multiple computing cores, processors, and nodes. Additionally, ML-Flex supports aggregating evidence across multiple algorithms and data sets via ensemble learning. This open-source software package is freely available from http://mlflex.sourceforge.net. Keywords: toolbox, classification, parallel, ensemble, reproducible research
3 0.043310747 90 jmlr-2012-Pattern for Python
Author: Tom De Smedt, Walter Daelemans
Abstract: Pattern is a package for Python 2.4+ with functionality for web mining (Google + Twitter + Wikipedia, web spider, HTML DOM parser), natural language processing (tagger/chunker, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, k-means clustering, Naive Bayes + k-NN + SVM classifiers) and network analysis (graph centrality and visualization). It is well documented and bundled with 30+ examples and 350+ unit tests. The source code is licensed under BSD and available from http://www.clips.ua.ac.be/pages/pattern. Keywords: Python, data mining, natural language processing, machine learning, graph networks
4 0.039235447 51 jmlr-2012-Integrating a Partial Model into Model Free Reinforcement Learning
Author: Aviv Tamar, Dotan Di Castro, Ron Meir
Abstract: In reinforcement learning an agent uses online feedback from the environment in order to adaptively select an effective policy. Model free approaches address this task by directly mapping environmental states to actions, while model based methods attempt to construct a model of the environment, followed by a selection of optimal actions based on that model. Given the complementary advantages of both approaches, we suggest a novel procedure which augments a model free algorithm with a partial model. The resulting hybrid algorithm switches between a model based and a model free mode, depending on the current state and the agent’s knowledge. Our method relies on a novel definition for a partially known model, and an estimator that incorporates such knowledge in order to reduce uncertainty in stochastic approximation iterations. We prove that such an approach leads to improved policy evaluation whenever environmental knowledge is available, without compromising performance when such knowledge is absent. Numerical simulations demonstrate the effectiveness of the approach on policy gradient and Q-learning algorithms, and its usefulness in solving a call admission control problem. Keywords: reinforcement learning, temporal difference, stochastic approximation, markov decision processes, hybrid model based model free algorithms
5 0.037183661 31 jmlr-2012-DEAP: Evolutionary Algorithms Made Easy
Author: Félix-Antoine Fortin, François-Michel De Rainville, Marc-André Gardner, Marc Parizeau, Christian Gagné
Abstract: DEAP is a novel evolutionary computation framework for rapid prototyping and testing of ideas. Its design departs from most other existing frameworks in that it seeks to make algorithms explicit and data structures transparent, as opposed to the more common black-box frameworks. Freely available with extensive documentation at http://deap.gel.ulaval.ca, DEAP is an open source project under an LGPL license. Keywords: distributed evolutionary algorithms, software tools
6 0.026039401 57 jmlr-2012-Learning Symbolic Representations of Hybrid Dynamical Systems
7 0.02301072 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches
8 0.0219969 119 jmlr-2012-glm-ie: Generalised Linear Models Inference & Estimation Toolbox
9 0.020875784 30 jmlr-2012-DARWIN: A Framework for Machine Learning and Computer Vision Research and Development
10 0.020634167 47 jmlr-2012-GPLP: A Local and Parallel Computation Toolbox for Gaussian Process Regression
11 0.020558449 75 jmlr-2012-NIMFA : A Python Library for Nonnegative Matrix Factorization
12 0.019756826 88 jmlr-2012-PREA: Personalized Recommendation Algorithms Toolkit
13 0.019372983 95 jmlr-2012-Random Search for Hyper-Parameter Optimization
14 0.018066341 101 jmlr-2012-SVDFeature: A Toolkit for Feature-based Collaborative Filtering
15 0.016233791 48 jmlr-2012-High-Dimensional Gaussian Graphical Model Selection: Walk Summability and Local Separation Criterion
16 0.015819857 20 jmlr-2012-Analysis of a Random Forests Model
17 0.015674761 55 jmlr-2012-Learning Algorithms for the Classification Restricted Boltzmann Machine
18 0.013601675 53 jmlr-2012-Jstacs: A Java Framework for Statistical Analysis and Classification of Biological Sequences
19 0.013456326 98 jmlr-2012-Regularized Bundle Methods for Convex and Non-Convex Risks
20 0.013131901 113 jmlr-2012-The huge Package for High-dimensional Undirected Graph Estimation in R
topicId topicWeight
[(0, -0.055), (1, 0.013), (2, 0.095), (3, -0.052), (4, 0.034), (5, 0.027), (6, 0.029), (7, -0.032), (8, -0.042), (9, -0.018), (10, -0.074), (11, 0.013), (12, 0.116), (13, -0.08), (14, -0.088), (15, 0.052), (16, 0.03), (17, -0.052), (18, -0.119), (19, -0.103), (20, -0.03), (21, -0.0), (22, -0.02), (23, 0.044), (24, -0.039), (25, -0.161), (26, 0.133), (27, -0.099), (28, -0.087), (29, -0.021), (30, -0.102), (31, 0.129), (32, -0.037), (33, 0.023), (34, 0.003), (35, 0.041), (36, -0.09), (37, -0.065), (38, -0.056), (39, 0.03), (40, 0.224), (41, -0.255), (42, 0.191), (43, -0.106), (44, -0.185), (45, 0.144), (46, 0.023), (47, -0.116), (48, 0.128), (49, -0.003)]
simIndex simValue paperId paperTitle
same-paper 1 0.97436434 79 jmlr-2012-Oger: Modular Learning Architectures For Large-Scale Sequential Processing
Author: David Verstraeten, Benjamin Schrauwen, Sander Dieleman, Philemon Brakel, Pieter Buteneers, Dejan Pecevski
Abstract: Oger (OrGanic Environment for Reservoir computing) is a Python toolbox for building, training and evaluating modular learning architectures on large data sets. It builds on MDP for its modularity, and adds processing of sequential data sets, gradient descent training, several cross-validation schemes and parallel parameter optimization methods. Additionally, several learning algorithms are implemented, such as different reservoir implementations (both sigmoid and spiking), ridge regression, conditional restricted Boltzmann machine (CRBM) and others, including GPU accelerated versions. Oger is released under the GNU LGPL, and is available from http://organic.elis.ugent.be/oger. Keywords: Python, modular architectures, sequential processing
2 0.44956756 31 jmlr-2012-DEAP: Evolutionary Algorithms Made Easy
Author: Félix-Antoine Fortin, François-Michel De Rainville, Marc-André Gardner, Marc Parizeau, Christian Gagné
Abstract: DEAP is a novel evolutionary computation framework for rapid prototyping and testing of ideas. Its design departs from most other existing frameworks in that it seeks to make algorithms explicit and data structures transparent, as opposed to the more common black-box frameworks. Freely available with extensive documentation at http://deap.gel.ulaval.ca, DEAP is an open source project under an LGPL license. Keywords: distributed evolutionary algorithms, software tools
3 0.41252017 61 jmlr-2012-ML-Flex: A Flexible Toolbox for Performing Classification Analyses In Parallel
Author: Stephen R. Piccolo, Lewis J. Frey
Abstract: Motivated by a need to classify high-dimensional, heterogeneous data from the bioinformatics domain, we developed ML-Flex, a machine-learning toolbox that enables users to perform two-class and multi-class classification analyses in a systematic yet flexible manner. ML-Flex was written in Java but is capable of interfacing with third-party packages written in other programming languages. It can handle multiple input-data formats and supports a variety of customizations. MLFlex provides implementations of various validation strategies, which can be executed in parallel across multiple computing cores, processors, and nodes. Additionally, ML-Flex supports aggregating evidence across multiple algorithms and data sets via ensemble learning. This open-source software package is freely available from http://mlflex.sourceforge.net. Keywords: toolbox, classification, parallel, ensemble, reproducible research
4 0.38729423 90 jmlr-2012-Pattern for Python
Author: Tom De Smedt, Walter Daelemans
Abstract: Pattern is a package for Python 2.4+ with functionality for web mining (Google + Twitter + Wikipedia, web spider, HTML DOM parser), natural language processing (tagger/chunker, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, k-means clustering, Naive Bayes + k-NN + SVM classifiers) and network analysis (graph centrality and visualization). It is well documented and bundled with 30+ examples and 350+ unit tests. The source code is licensed under BSD and available from http://www.clips.ua.ac.be/pages/pattern. Keywords: Python, data mining, natural language processing, machine learning, graph networks
5 0.32524446 57 jmlr-2012-Learning Symbolic Representations of Hybrid Dynamical Systems
Author: Daniel L. Ly, Hod Lipson
Abstract: A hybrid dynamical system is a mathematical model suitable for describing an extensive spectrum of multi-modal, time-series behaviors, ranging from bouncing balls to air traffic controllers. This paper describes multi-modal symbolic regression (MMSR): a learning algorithm to construct non-linear symbolic representations of discrete dynamical systems with continuous mappings from unlabeled, time-series data. MMSR consists of two subalgorithms—clustered symbolic regression, a method to simultaneously identify distinct behaviors while formulating their mathematical expressions, and transition modeling, an algorithm to infer symbolic inequalities that describe binary classification boundaries. These subalgorithms are combined to infer hybrid dynamical systems as a collection of apt, mathematical expressions. MMSR is evaluated on a collection of four synthetic data sets and outperforms other multi-modal machine learning approaches in both accuracy and interpretability, even in the presence of noise. Furthermore, the versatility of MMSR is demonstrated by identifying and inferring classical expressions of transistor modes from recorded measurements. Keywords: hybrid dynamical systems, evolutionary computation, symbolic piecewise functions, symbolic binary classification
6 0.27923781 51 jmlr-2012-Integrating a Partial Model into Model Free Reinforcement Learning
7 0.16673462 88 jmlr-2012-PREA: Personalized Recommendation Algorithms Toolkit
8 0.16482961 30 jmlr-2012-DARWIN: A Framework for Machine Learning and Computer Vision Research and Development
10 0.16045502 54 jmlr-2012-Large-scale Linear Support Vector Regression
11 0.15130654 119 jmlr-2012-glm-ie: Generalised Linear Models Inference & Estimation Toolbox
12 0.13830629 47 jmlr-2012-GPLP: A Local and Parallel Computation Toolbox for Gaussian Process Regression
13 0.13166861 20 jmlr-2012-Analysis of a Random Forests Model
14 0.12993625 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches
15 0.12659934 110 jmlr-2012-Static Prediction Games for Adversarial Learning Problems
16 0.11856937 8 jmlr-2012-A Primal-Dual Convergence Analysis of Boosting
17 0.10820568 38 jmlr-2012-Entropy Search for Information-Efficient Global Optimization
18 0.10761247 48 jmlr-2012-High-Dimensional Gaussian Graphical Model Selection: Walk Summability and Local Separation Criterion
19 0.1067974 33 jmlr-2012-Distance Metric Learning with Eigenvalue Optimization
20 0.10199341 101 jmlr-2012-SVDFeature: A Toolkit for Feature-based Collaborative Filtering
topicId topicWeight
[(21, 0.043), (26, 0.613), (27, 0.057), (29, 0.016), (56, 0.047), (69, 0.015), (75, 0.015), (77, 0.01), (92, 0.013), (96, 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.96014261 79 jmlr-2012-Oger: Modular Learning Architectures For Large-Scale Sequential Processing
Author: David Verstraeten, Benjamin Schrauwen, Sander Dieleman, Philemon Brakel, Pieter Buteneers, Dejan Pecevski
Abstract: Oger (OrGanic Environment for Reservoir computing) is a Python toolbox for building, training and evaluating modular learning architectures on large data sets. It builds on MDP for its modularity, and adds processing of sequential data sets, gradient descent training, several cross-validation schemes and parallel parameter optimization methods. Additionally, several learning algorithms are implemented, such as different reservoir implementations (both sigmoid and spiking), ridge regression, conditional restricted Boltzmann machine (CRBM) and others, including GPU accelerated versions. Oger is released under the GNU LGPL, and is available from http://organic.elis.ugent.be/oger. Keywords: Python, modular architectures, sequential processing
2 0.80616891 112 jmlr-2012-Structured Sparsity via Alternating Direction Methods
Author: Zhiwei Qin, Donald Goldfarb
Abstract: We consider a class of sparse learning problems in high dimensional feature space regularized by a structured sparsity-inducing norm that incorporates prior knowledge of the group structure of the features. Such problems often pose a considerable challenge to optimization algorithms due to the non-smoothness and non-separability of the regularization term. In this paper, we focus on two commonly adopted sparsity-inducing regularization terms, the overlapping Group Lasso penalty l1 /l2 -norm and the l1 /l∞ -norm. We propose a unified framework based on the augmented Lagrangian method, under which problems with both types of regularization and their variants can be efficiently solved. As one of the core building-blocks of this framework, we develop new algorithms using a partial-linearization/splitting technique and prove that the accelerated versions 1 of these algorithms require O( √ε ) iterations to obtain an ε-optimal solution. We compare the performance of these algorithms against that of the alternating direction augmented Lagrangian and FISTA methods on a collection of data sets and apply them to two real-world problems to compare the relative merits of the two norms. Keywords: structured sparsity, overlapping Group Lasso, alternating direction methods, variable splitting, augmented Lagrangian
Author: Michael U. Gutmann, Aapo Hyvärinen
Abstract: We consider the task of estimating, from observed data, a probabilistic model that is parameterized by a finite number of parameters. In particular, we are considering the situation where the model probability density function is unnormalized. That is, the model is only specified up to the partition function. The partition function normalizes a model so that it integrates to one for any choice of the parameters. However, it is often impossible to obtain it in closed form. Gibbs distributions, Markov and multi-layer networks are examples of models where analytical normalization is often impossible. Maximum likelihood estimation can then not be used without resorting to numerical approximations which are often computationally expensive. We propose here a new objective function for the estimation of both normalized and unnormalized models. The basic idea is to perform nonlinear logistic regression to discriminate between the observed data and some artificially generated noise. With this approach, the normalizing partition function can be estimated like any other parameter. We prove that the new estimation method leads to a consistent (convergent) estimator of the parameters. For large noise sample sizes, the new estimator is furthermore shown to behave like the maximum likelihood estimator. In the estimation of unnormalized models, there is a trade-off between statistical and computational performance. We show that the new method strikes a competitive trade-off in comparison to other estimation methods for unnormalized models. As an application to real data, we estimate novel two-layer models of natural image statistics with spline nonlinearities. Keywords: statistics unnormalized models, partition function, computation, estimation, natural image
4 0.78237808 66 jmlr-2012-Metric and Kernel Learning Using a Linear Transformation
Author: Prateek Jain, Brian Kulis, Jason V. Davis, Inderjit S. Dhillon
Abstract: Metric and kernel learning arise in several machine learning applications. However, most existing metric learning algorithms are limited to learning metrics over low-dimensional data, while existing kernel learning algorithms are often limited to the transductive setting and do not generalize to new data points. In this paper, we study the connections between metric learning and kernel learning that arise when studying metric learning as a linear transformation learning problem. In particular, we propose a general optimization framework for learning metrics via linear transformations, and analyze in detail a special case of our framework—that of minimizing the LogDet divergence subject to linear constraints. We then propose a general regularized framework for learning a kernel matrix, and show it to be equivalent to our metric learning framework. Our theoretical connections between metric and kernel learning have two main consequences: 1) the learned kernel matrix parameterizes a linear transformation kernel function and can be applied inductively to new data points, 2) our result yields a constructive method for kernelizing most existing Mahalanobis metric learning formulations. We demonstrate our learning approach by applying it to large-scale real world problems in computer vision, text mining and semi-supervised kernel dimensionality reduction. Keywords: divergence metric learning, kernel learning, linear transformation, matrix divergences, logdet
5 0.44334692 92 jmlr-2012-Positive Semidefinite Metric Learning Using Boosting-like Algorithms
Author: Chunhua Shen, Junae Kim, Lei Wang, Anton van den Hengel
Abstract: The success of many machine learning and pattern recognition methods relies heavily upon the identification of an appropriate distance metric on the input data. It is often beneficial to learn such a metric from the input training data, instead of using a default one such as the Euclidean distance. In this work, we propose a boosting-based technique, termed B OOST M ETRIC, for learning a quadratic Mahalanobis distance metric. Learning a valid Mahalanobis distance metric requires enforcing the constraint that the matrix parameter to the metric remains positive semidefinite. Semidefinite programming is often used to enforce this constraint, but does not scale well and is not easy to implement. B OOST M ETRIC is instead based on the observation that any positive semidefinite matrix can be decomposed into a linear combination of trace-one rank-one matrices. B OOST M ETRIC thus uses rank-one positive semidefinite matrices as weak learners within an efficient and scalable boosting-based learning process. The resulting methods are easy to implement, efficient, and can accommodate various types of constraints. We extend traditional boosting algorithms in that its weak learner is a positive semidefinite matrix with trace and rank being one rather than a classifier or regressor. Experiments on various data sets demonstrate that the proposed algorithms compare favorably to those state-of-the-art methods in terms of classification accuracy and running time. Keywords: Mahalanobis distance, semidefinite programming, column generation, boosting, Lagrange duality, large margin nearest neighbor
6 0.4341107 67 jmlr-2012-Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming
7 0.42424363 98 jmlr-2012-Regularized Bundle Methods for Convex and Non-Convex Risks
8 0.41317767 33 jmlr-2012-Distance Metric Learning with Eigenvalue Optimization
9 0.41294548 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models
10 0.3842321 18 jmlr-2012-An Improved GLMNET for L1-regularized Logistic Regression
11 0.37076527 72 jmlr-2012-Multi-Target Regression with Rule Ensembles
12 0.36537129 54 jmlr-2012-Large-scale Linear Support Vector Regression
13 0.36516225 21 jmlr-2012-Bayesian Mixed-Effects Inference on Classification Performance in Hierarchical Data Sets
14 0.36385739 65 jmlr-2012-MedLDA: Maximum Margin Supervised Topic Models
15 0.36323214 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches
16 0.35983887 119 jmlr-2012-glm-ie: Generalised Linear Models Inference & Estimation Toolbox
17 0.35954332 75 jmlr-2012-NIMFA : A Python Library for Nonnegative Matrix Factorization
18 0.35283208 103 jmlr-2012-Sampling Methods for the Nyström Method
19 0.34835085 26 jmlr-2012-Coherence Functions with Applications in Large-Margin Classification Methods
20 0.33294857 115 jmlr-2012-Trading Regret for Efficiency: Online Convex Optimization with Long Term Constraints