jmlr jmlr2012 jmlr2012-79 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: David Verstraeten, Benjamin Schrauwen, Sander Dieleman, Philemon Brakel, Pieter Buteneers, Dejan Pecevski
Abstract: Oger (OrGanic Environment for Reservoir computing) is a Python toolbox for building, training and evaluating modular learning architectures on large data sets. It builds on MDP for its modularity, and adds processing of sequential data sets, gradient descent training, several cross-validation schemes and parallel parameter optimization methods. Additionally, several learning algorithms are implemented, such as different reservoir implementations (both sigmoid and spiking), ridge regression, conditional restricted Boltzmann machine (CRBM) and others, including GPU accelerated versions. Oger is released under the GNU LGPL, and is available from http://organic.elis.ugent.be/oger. Keywords: Python, modular architectures, sequential processing
Reference: text
sentIndex sentText sentNum sentScore
1 Institute for Theoretical Computer Science, Graz University of Technology, Graz, Austria. Editor: Cheng Soon Ong. Abstract: Oger (OrGanic Environment for Reservoir computing) is a Python toolbox for building, training and evaluating modular learning architectures on large data sets. [sent-13, score-0.266]
2 It builds on MDP for its modularity, and adds processing of sequential data sets, gradient descent training, several cross-validation schemes and parallel parameter optimization methods. [sent-14, score-0.178]
3 Additionally, several learning algorithms are implemented, such as different reservoir implementations (both sigmoid and spiking), ridge regression, conditional restricted Boltzmann machine (CRBM) and others, including GPU accelerated versions. [sent-15, score-0.413]
4 Introduction. The Oger toolbox originated from the need to rapidly implement, investigate and compare complex architectures built from state-of-the-art sequential processing algorithms, focused on but not limited to reservoir computing, and to apply these architectures to large real-world tasks. [sent-21, score-0.681]
5 Oger focuses on reservoir computing (RC) (Verstraeten et al., 2007), a paradigm whereby a random non-linear dynamical system (usually a recurrent neural network) is left untrained and used as input to a simple learning algorithm such as linear regression. [sent-23, score-0.025]
6 A number of smaller toolboxes for reservoir computing are available, written in C++, Java and Matlab. [sent-24, score-0.375]
7 However, these are generally focused on specific implementations of RC (echo state networks or liquid state machines) and offer less flexibility in creating and evaluating complex architectures. [sent-25, score-0.05]
8 Rather than contribute yet another toolbox which reimplements many standard algorithms, one of our design choices for Oger was to incorporate existing packages where possible. [sent-26, score-0.081]
9 The basic processing blocks (nodes) are combined with methods for constructing and training architectures. [sent-34, score-0.023]
10 These architectures can then be evaluated in a validation and optimization framework. [sent-35, score-0.144]
11 Since modularity was one of the key requirements for Oger, it has been based on the well-known and widely used Modular Data Processing toolkit (MDP), which provides this modularity in addition to a wide variety of machine learning algorithms (Zito et al., 2008). [sent-36, score-0.084]
12 These nodes can then be combined into an arbitrary feedforward graph structure called a Flow. [sent-39, score-0.058]
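To give a feel for this composition model, here is a minimal sketch that chains two standard MDP nodes into a Flow; the node choice (PCA followed by Slow Feature Analysis) is purely illustrative and not taken from the paper.

import mdp
import numpy as np

x = np.random.randn(1000, 20)  # 1,000 samples of a 20-dimensional signal
# Chain two trainable nodes into a feedforward Flow.
flow = mdp.Flow([mdp.nodes.PCANode(output_dim=5), mdp.nodes.SFANode(output_dim=3)])
flow.train(x)        # trains each node in sequence on the same data
y = flow.execute(x)  # propagates the data through the trained flow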
13 Much of the error- and type-checking is abstracted away through the object-oriented interface, such that the developer can focus on implementing the actual algorithm. [sent-40, score-0.045]
14 Mature and feature-complete packages for plotting (matplotlib) and general scientific computing (SciPy) that in many respects come close to commercial alternatives are available, along with a plethora of smaller libraries providing specific functions. [sent-42, score-0.041]
15 Features. In this section we describe the main features of Oger and give a usage example. [sent-44, score-0.073]
16 Oger adds several new methods to this set (a short code sketch follows this list): – Several reservoir implementations: a basic reservoir with a customizable nonlinear function and weight topologies, a leaky integrator reservoir, and a GPU-optimized reservoir using CUDA. [sent-47, score-1.102]
17 – Wrappers for creating spiking reservoirs using PyNN-compatible neural network simulators (Davison et al., 2008). [sent-48, score-0.075]
18 – A logistic regression node trainable with different optimizers such as IRLS, conjugate gradient, BFGS and others. [sent-50, score-0.134]
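As a brief, hypothetical sketch of how such nodes are instantiated (the class names follow the usage example below, while the Oger.nodes module path and the constructor arguments shown here are assumptions):

import Oger

# Illustrative only: a leaky-integrator reservoir and a ridge-regression readout.
reservoir = Oger.nodes.LeakyReservoirNode(input_dim=1, output_dim=100, input_scaling=0.5)
readout = Oger.nodes.RidgeRegressionNode(ridge_param=0.001)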
19 Additionally, Oger supports backpropagation training using various methods of gradient descent, such as stochastic gradient descent, RPROP and others. [sent-57, score-0.023]
20 Finally, a FreerunFlow allows easy training and execution of architectures with feedback, for instance for time-series generation tasks (see the usage example below). [sent-58, score-0.271]
21 2 Validation, Optimization and Parallel Execution. Around the data processing algorithms described above, Oger offers functionality for large-scale validation and optimization. [sent-60, score-0.033]
22 The validation automates the process of constructing training and test sets, and the actual training and evaluation. [sent-61, score-0.101]
23 Several standard validation schemes are provided (n-fold, leave-one-out (LOO) cross-validation and others), but this can be customized (for example, if a fixed training and test set is defined). [sent-62, score-0.078]
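As a rough sketch of how such a validation run might look (the validate and n_fold_random names and the per-node data layout follow the Oger documentation, but should be treated as assumptions here, as should the choice of NRMSE as error measure and the narma30 benchmark helper):

import mdp
import Oger

# Assumed helpers: a benchmark data set, a reservoir node and a ridge-regression readout.
x, y = Oger.datasets.narma30(n_samples=10)
flow = mdp.Flow([Oger.nodes.LeakyReservoirNode(output_dim=100),
                 Oger.nodes.RidgeRegressionNode()])
# One entry per node: the reservoir sees only inputs, the readout sees (input, target) pairs.
data = [x, zip(x, y)]
errors = Oger.evaluation.validate(data, flow, Oger.utils.nrmse,
                                  cross_validate_function=Oger.evaluation.n_fold_random,
                                  n_folds=5)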
24 The optimization itself can be done using grid-searching, or using an interface to any of the algorithms in scipy. [sent-65, score-0.033]
25 Finally, a variety of error measures and utility classes such as a ConfusionMatrix are included. [sent-67, score-0.023]
26 Oger allows two modes of parallel execution, both local (multi-threaded or multi-process) and on a computing grid. [sent-68, score-0.102]
27 The first mode is inherited from MDP, where the training and execution of a flow on a data set consisting of different chunks can be done in parallel (if the nodes in the flow support this). [sent-69, score-0.261]
28 The second mode is the parallel evaluation of parameter points for grid-searching and CMA-ES (the scipy. [sent-70, score-0.093]
29 Both modes use runtime overloading of class methods by their parallel versions, which makes the transition from sequential to parallel execution very user-friendly and possible using a couple of lines of code (see the usage example below). [sent-72, score-0.405]
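A sketch of what those couple of lines might look like (the ProcessScheduler class comes from MDP's parallel package; the mdp.activate_extension prefix is an assumption, while the 'parallel' extension name is taken from the usage example below):

import mdp

# Mode 1: an MDP scheduler distributes the training of a flow over data chunks,
# here using several local processes.
scheduler = mdp.parallel.ProcessScheduler(n_processes=4)
# Mode 2: activating the 'parallel' extension overloads grid-search and validation
# with their parallel versions, so a subsequent opt.grid_search(...) evaluates its
# parameter points in parallel.
mdp.activate_extension('parallel')
scheduler.shutdown()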
30 Usage Example. As an illustrative example, we construct and train a reservoir and readout setup with output feedback for generating the Mackey-Glass time-series. [sent-74, score-0.556]
31 We refer to the Oger website and the Oger installation package for more usage examples. [sent-75, score-0.123]
32 1 2 3 4 5 6 7 8 9 10 11 from scipy import * import Oger , mdp signals = Oger . [sent-76, score-0.333]
33 mackey_glass ( n_samples =4 , sample_len =3000) res = Oger . [sent-78, score-0.055]
34 LeakyReservoirNode ( output_dim =400 , reset_states = False ) readout = Oger . [sent-80, score-0.177]
35 FreerunFlow ([ res , readout ], freerun_steps =300) parameters = { res :{ ’ input_scaling ’: arange (. [sent-84, score-0.405]
36 1) }} internal_params = { readout :{ ’ ridge_param ’: 10. [sent-91, score-0.177]
37 activate_extension ( ’ parallel ’) 2997 V ERSTRAETEN , S CHRAUWEN , D IELEMAN , B RAKEL , B UTENEERS AND P ECEVSKI 12 opt . [sent-101, score-0.144]
38 grid_search ([[] , signals [: -1]] , flow , Oger . [sent-102, score-0.098]
39 leave_one_out , internal_params ) 13 opt_flow = opt . [sent-104, score-0.074]
40 execute ( signals [ -1][0]) On line 3, the data set is generated, which in this case consists of four Mackey-Glass timeseries generated from different initial states. [sent-107, score-0.078]
41 In the next two lines, a reservoir node and a linear readout node trained with ridge regression are created. [sent-108, score-0.715]
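For reference, the ridge-regression readout has a closed-form solution; with X the matrix of collected reservoir states, Y the target outputs and lambda the ridge parameter, the readout weights are given by the standard textbook expression (not code from the toolbox):

W = (X^\top X + \lambda I)^{-1} X^\top Y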
42 Line 6 concatenates these nodes into a FreerunFlow, which provides one-step ahead prediction during training and feeds the output back to the input of the flow during execution. [sent-109, score-0.081]
43 Lines 7 and 8 define a search space for the reservoir parameters and the regularization constant of the readout node, which is optimized separately for each set of reservoir parameters. [sent-110, score-0.983]
44 On line 12, the actual optimization is performed using LOO cross-validation on the four time-series, while for each fold the regularization constant for the ridge regression is optimized again using LOO cross-validation. [sent-113, score-0.085]
45 On line 13 the Optimizer is queried to return the optimal flow, which is subsequently trained using all the training signals and applied to an unseen test signal in lines 14 and 15 respectively. [sent-115, score-0.105]
46 PyNN: A common interface for neuronal network simulators. [sent-128, score-0.058]
47 Modular toolkit for Data Processing (MDP): A Python data processing framework. [sent-147, score-0.131]
wordName wordTfidf (topN-words)
[('oger', 0.706), ('reservoir', 0.352), ('readout', 0.177), ('mdp', 0.157), ('verstraeten', 0.147), ('arange', 0.118), ('dejan', 0.118), ('schrauwen', 0.118), ('architectures', 0.111), ('python', 0.092), ('modular', 0.092), ('brakel', 0.088), ('buteneers', 0.088), ('dieleman', 0.088), ('freerunflow', 0.088), ('pecevski', 0.088), ('philemon', 0.088), ('sander', 0.075), ('node', 0.075), ('optimizer', 0.074), ('opt', 0.074), ('usage', 0.073), ('parallel', 0.07), ('pieter', 0.068), ('loo', 0.068), ('execution', 0.064), ('chrauwen', 0.059), ('davison', 0.059), ('ecevski', 0.059), ('erstraeten', 0.059), ('ghent', 0.059), ('ieleman', 0.059), ('neuroinformatics', 0.059), ('organic', 0.059), ('rakel', 0.059), ('trainable', 0.059), ('uteneers', 0.059), ('zito', 0.059), ('nodes', 0.058), ('res', 0.055), ('signals', 0.053), ('ow', 0.053), ('benjamin', 0.052), ('graz', 0.05), ('spiking', 0.05), ('modularity', 0.045), ('frontiers', 0.045), ('flow', 0.045), ('scipy', 0.045), ('sequential', 0.042), ('packages', 0.041), ('toolbox', 0.04), ('toolkit', 0.039), ('import', 0.039), ('ridge', 0.036), ('boltzmann', 0.035), ('interface', 0.033), ('validation', 0.033), ('modes', 0.032), ('website', 0.029), ('rc', 0.029), ('lines', 0.029), ('david', 0.027), ('optimized', 0.027), ('feedback', 0.027), ('creating', 0.025), ('implementations', 0.025), ('igi', 0.025), ('untrained', 0.025), ('neuronal', 0.025), ('mature', 0.025), ('bfgs', 0.025), ('timeseries', 0.025), ('originated', 0.025), ('rocessing', 0.025), ('blas', 0.025), ('cale', 0.025), ('crbm', 0.025), ('echo', 0.025), ('gpu', 0.025), ('overloading', 0.025), ('exibility', 0.024), ('mode', 0.023), ('utility', 0.023), ('training', 0.023), ('optionally', 0.023), ('crossvalidation', 0.023), ('lgpl', 0.023), ('offering', 0.023), ('commission', 0.023), ('chunks', 0.023), ('ger', 0.023), ('irls', 0.023), ('developer', 0.023), ('toolboxes', 0.023), ('numpy', 0.023), ('schemes', 0.022), ('actual', 0.022), ('adds', 0.021), ('installation', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 79 jmlr-2012-Oger: Modular Learning Architectures For Large-Scale Sequential Processing
Author: David Verstraeten, Benjamin Schrauwen, Sander Dieleman, Philemon Brakel, Pieter Buteneers, Dejan Pecevski
Abstract: Oger (OrGanic Environment for Reservoir computing) is a Python toolbox for building, training and evaluating modular learning architectures on large data sets. It builds on MDP for its modularity, and adds processing of sequential data sets, gradient descent training, several cross-validation schemes and parallel parameter optimization methods. Additionally, several learning algorithms are implemented, such as different reservoir implementations (both sigmoid and spiking), ridge regression, conditional restricted Boltzmann machine (CRBM) and others, including GPU accelerated versions. Oger is released under the GNU LGPL, and is available from http://organic.elis.ugent.be/oger. Keywords: Python, modular architectures, sequential processing
2 0.051045612 61 jmlr-2012-ML-Flex: A Flexible Toolbox for Performing Classification Analyses In Parallel
Author: Stephen R. Piccolo, Lewis J. Frey
Abstract: Motivated by a need to classify high-dimensional, heterogeneous data from the bioinformatics domain, we developed ML-Flex, a machine-learning toolbox that enables users to perform two-class and multi-class classification analyses in a systematic yet flexible manner. ML-Flex was written in Java but is capable of interfacing with third-party packages written in other programming languages. It can handle multiple input-data formats and supports a variety of customizations. MLFlex provides implementations of various validation strategies, which can be executed in parallel across multiple computing cores, processors, and nodes. Additionally, ML-Flex supports aggregating evidence across multiple algorithms and data sets via ensemble learning. This open-source software package is freely available from http://mlflex.sourceforge.net. Keywords: toolbox, classification, parallel, ensemble, reproducible research
3 0.043310747 90 jmlr-2012-Pattern for Python
Author: Tom De Smedt, Walter Daelemans
Abstract: Pattern is a package for Python 2.4+ with functionality for web mining (Google + Twitter + Wikipedia, web spider, HTML DOM parser), natural language processing (tagger/chunker, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, k-means clustering, Naive Bayes + k-NN + SVM classifiers) and network analysis (graph centrality and visualization). It is well documented and bundled with 30+ examples and 350+ unit tests. The source code is licensed under BSD and available from http://www.clips.ua.ac.be/pages/pattern. Keywords: Python, data mining, natural language processing, machine learning, graph networks
4 0.039235447 51 jmlr-2012-Integrating a Partial Model into Model Free Reinforcement Learning
Author: Aviv Tamar, Dotan Di Castro, Ron Meir
Abstract: In reinforcement learning an agent uses online feedback from the environment in order to adaptively select an effective policy. Model free approaches address this task by directly mapping environmental states to actions, while model based methods attempt to construct a model of the environment, followed by a selection of optimal actions based on that model. Given the complementary advantages of both approaches, we suggest a novel procedure which augments a model free algorithm with a partial model. The resulting hybrid algorithm switches between a model based and a model free mode, depending on the current state and the agent’s knowledge. Our method relies on a novel definition for a partially known model, and an estimator that incorporates such knowledge in order to reduce uncertainty in stochastic approximation iterations. We prove that such an approach leads to improved policy evaluation whenever environmental knowledge is available, without compromising performance when such knowledge is absent. Numerical simulations demonstrate the effectiveness of the approach on policy gradient and Q-learning algorithms, and its usefulness in solving a call admission control problem. Keywords: reinforcement learning, temporal difference, stochastic approximation, markov decision processes, hybrid model based model free algorithms
5 0.037183661 31 jmlr-2012-DEAP: Evolutionary Algorithms Made Easy
Author: Félix-Antoine Fortin, François-Michel De Rainville, Marc-André Gardner, Marc Parizeau, Christian Gagné
Abstract: DEAP is a novel evolutionary computation framework for rapid prototyping and testing of ideas. Its design departs from most other existing frameworks in that it seeks to make algorithms explicit and data structures transparent, as opposed to the more common black-box frameworks. Freely available with extensive documentation at http://deap.gel.ulaval.ca, DEAP is an open source project under an LGPL license. Keywords: distributed evolutionary algorithms, software tools
6 0.026039401 57 jmlr-2012-Learning Symbolic Representations of Hybrid Dynamical Systems
7 0.02301072 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches
8 0.0219969 119 jmlr-2012-glm-ie: Generalised Linear Models Inference & Estimation Toolbox
9 0.020875784 30 jmlr-2012-DARWIN: A Framework for Machine Learning and Computer Vision Research and Development
10 0.020634167 47 jmlr-2012-GPLP: A Local and Parallel Computation Toolbox for Gaussian Process Regression
11 0.020558449 75 jmlr-2012-NIMFA : A Python Library for Nonnegative Matrix Factorization
12 0.019756826 88 jmlr-2012-PREA: Personalized Recommendation Algorithms Toolkit
13 0.019372983 95 jmlr-2012-Random Search for Hyper-Parameter Optimization
14 0.018066341 101 jmlr-2012-SVDFeature: A Toolkit for Feature-based Collaborative Filtering
15 0.016233791 48 jmlr-2012-High-Dimensional Gaussian Graphical Model Selection: Walk Summability and Local Separation Criterion
16 0.015819857 20 jmlr-2012-Analysis of a Random Forests Model
17 0.015674761 55 jmlr-2012-Learning Algorithms for the Classification Restricted Boltzmann Machine
18 0.013601675 53 jmlr-2012-Jstacs: A Java Framework for Statistical Analysis and Classification of Biological Sequences
19 0.013456326 98 jmlr-2012-Regularized Bundle Methods for Convex and Non-Convex Risks
20 0.013131901 113 jmlr-2012-The huge Package for High-dimensional Undirected Graph Estimation in R
topicId topicWeight
[(0, -0.055), (1, 0.013), (2, 0.095), (3, -0.052), (4, 0.034), (5, 0.027), (6, 0.029), (7, -0.032), (8, -0.042), (9, -0.018), (10, -0.074), (11, 0.013), (12, 0.116), (13, -0.08), (14, -0.088), (15, 0.052), (16, 0.03), (17, -0.052), (18, -0.119), (19, -0.103), (20, -0.03), (21, -0.0), (22, -0.02), (23, 0.044), (24, -0.039), (25, -0.161), (26, 0.133), (27, -0.099), (28, -0.087), (29, -0.021), (30, -0.102), (31, 0.129), (32, -0.037), (33, 0.023), (34, 0.003), (35, 0.041), (36, -0.09), (37, -0.065), (38, -0.056), (39, 0.03), (40, 0.224), (41, -0.255), (42, 0.191), (43, -0.106), (44, -0.185), (45, 0.144), (46, 0.023), (47, -0.116), (48, 0.128), (49, -0.003)]
simIndex simValue paperId paperTitle
same-paper 1 0.97436434 79 jmlr-2012-Oger: Modular Learning Architectures For Large-Scale Sequential Processing
Author: David Verstraeten, Benjamin Schrauwen, Sander Dieleman, Philemon Brakel, Pieter Buteneers, Dejan Pecevski
Abstract: Oger (OrGanic Environment for Reservoir computing) is a Python toolbox for building, training and evaluating modular learning architectures on large data sets. It builds on MDP for its modularity, and adds processing of sequential data sets, gradient descent training, several cross-validation schemes and parallel parameter optimization methods. Additionally, several learning algorithms are implemented, such as different reservoir implementations (both sigmoid and spiking), ridge regression, conditional restricted Boltzmann machine (CRBM) and others, including GPU accelerated versions. Oger is released under the GNU LGPL, and is available from http://organic.elis.ugent.be/oger. Keywords: Python, modular architectures, sequential processing
2 0.44956756 31 jmlr-2012-DEAP: Evolutionary Algorithms Made Easy
Author: Félix-Antoine Fortin, François-Michel De Rainville, Marc-André Gardner, Marc Parizeau, Christian Gagné
Abstract: DEAP is a novel evolutionary computation framework for rapid prototyping and testing of ideas. Its design departs from most other existing frameworks in that it seeks to make algorithms explicit and data structures transparent, as opposed to the more common black-box frameworks. Freely available with extensive documentation at http://deap.gel.ulaval.ca, DEAP is an open source project under an LGPL license. Keywords: distributed evolutionary algorithms, software tools
3 0.41252017 61 jmlr-2012-ML-Flex: A Flexible Toolbox for Performing Classification Analyses In Parallel
Author: Stephen R. Piccolo, Lewis J. Frey
Abstract: Motivated by a need to classify high-dimensional, heterogeneous data from the bioinformatics domain, we developed ML-Flex, a machine-learning toolbox that enables users to perform two-class and multi-class classification analyses in a systematic yet flexible manner. ML-Flex was written in Java but is capable of interfacing with third-party packages written in other programming languages. It can handle multiple input-data formats and supports a variety of customizations. MLFlex provides implementations of various validation strategies, which can be executed in parallel across multiple computing cores, processors, and nodes. Additionally, ML-Flex supports aggregating evidence across multiple algorithms and data sets via ensemble learning. This open-source software package is freely available from http://mlflex.sourceforge.net. Keywords: toolbox, classification, parallel, ensemble, reproducible research
4 0.38729423 90 jmlr-2012-Pattern for Python
Author: Tom De Smedt, Walter Daelemans
Abstract: Pattern is a package for Python 2.4+ with functionality for web mining (Google + Twitter + Wikipedia, web spider, HTML DOM parser), natural language processing (tagger/chunker, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, k-means clustering, Naive Bayes + k-NN + SVM classifiers) and network analysis (graph centrality and visualization). It is well documented and bundled with 30+ examples and 350+ unit tests. The source code is licensed under BSD and available from http://www.clips.ua.ac.be/pages/pattern. Keywords: Python, data mining, natural language processing, machine learning, graph networks
5 0.32524446 57 jmlr-2012-Learning Symbolic Representations of Hybrid Dynamical Systems
Author: Daniel L. Ly, Hod Lipson
Abstract: A hybrid dynamical system is a mathematical model suitable for describing an extensive spectrum of multi-modal, time-series behaviors, ranging from bouncing balls to air traffic controllers. This paper describes multi-modal symbolic regression (MMSR): a learning algorithm to construct non-linear symbolic representations of discrete dynamical systems with continuous mappings from unlabeled, time-series data. MMSR consists of two subalgorithms—clustered symbolic regression, a method to simultaneously identify distinct behaviors while formulating their mathematical expressions, and transition modeling, an algorithm to infer symbolic inequalities that describe binary classification boundaries. These subalgorithms are combined to infer hybrid dynamical systems as a collection of apt, mathematical expressions. MMSR is evaluated on a collection of four synthetic data sets and outperforms other multi-modal machine learning approaches in both accuracy and interpretability, even in the presence of noise. Furthermore, the versatility of MMSR is demonstrated by identifying and inferring classical expressions of transistor modes from recorded measurements. Keywords: hybrid dynamical systems, evolutionary computation, symbolic piecewise functions, symbolic binary classification
6 0.27923781 51 jmlr-2012-Integrating a Partial Model into Model Free Reinforcement Learning
7 0.16673462 88 jmlr-2012-PREA: Personalized Recommendation Algorithms Toolkit
8 0.16482961 30 jmlr-2012-DARWIN: A Framework for Machine Learning and Computer Vision Research and Development
10 0.16045502 54 jmlr-2012-Large-scale Linear Support Vector Regression
11 0.15130654 119 jmlr-2012-glm-ie: Generalised Linear Models Inference & Estimation Toolbox
12 0.13830629 47 jmlr-2012-GPLP: A Local and Parallel Computation Toolbox for Gaussian Process Regression
13 0.13166861 20 jmlr-2012-Analysis of a Random Forests Model
14 0.12993625 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches
15 0.12659934 110 jmlr-2012-Static Prediction Games for Adversarial Learning Problems
16 0.11856937 8 jmlr-2012-A Primal-Dual Convergence Analysis of Boosting
17 0.10820568 38 jmlr-2012-Entropy Search for Information-Efficient Global Optimization
18 0.10761247 48 jmlr-2012-High-Dimensional Gaussian Graphical Model Selection: Walk Summability and Local Separation Criterion
19 0.1067974 33 jmlr-2012-Distance Metric Learning with Eigenvalue Optimization
20 0.10199341 101 jmlr-2012-SVDFeature: A Toolkit for Feature-based Collaborative Filtering
topicId topicWeight
[(21, 0.043), (26, 0.613), (27, 0.057), (29, 0.016), (56, 0.047), (69, 0.015), (75, 0.015), (77, 0.01), (92, 0.013), (96, 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.96014261 79 jmlr-2012-Oger: Modular Learning Architectures For Large-Scale Sequential Processing
Author: David Verstraeten, Benjamin Schrauwen, Sander Dieleman, Philemon Brakel, Pieter Buteneers, Dejan Pecevski
Abstract: Oger (OrGanic Environment for Reservoir computing) is a Python toolbox for building, training and evaluating modular learning architectures on large data sets. It builds on MDP for its modularity, and adds processing of sequential data sets, gradient descent training, several cross-validation schemes and parallel parameter optimization methods. Additionally, several learning algorithms are implemented, such as different reservoir implementations (both sigmoid and spiking), ridge regression, conditional restricted Boltzmann machine (CRBM) and others, including GPU accelerated versions. Oger is released under the GNU LGPL, and is available from http://organic.elis.ugent.be/oger. Keywords: Python, modular architectures, sequential processing
2 0.80616891 112 jmlr-2012-Structured Sparsity via Alternating Direction Methods
Author: Zhiwei Qin, Donald Goldfarb
Abstract: We consider a class of sparse learning problems in high dimensional feature space regularized by a structured sparsity-inducing norm that incorporates prior knowledge of the group structure of the features. Such problems often pose a considerable challenge to optimization algorithms due to the non-smoothness and non-separability of the regularization term. In this paper, we focus on two commonly adopted sparsity-inducing regularization terms, the overlapping Group Lasso penalty l1 /l2 -norm and the l1 /l∞ -norm. We propose a unified framework based on the augmented Lagrangian method, under which problems with both types of regularization and their variants can be efficiently solved. As one of the core building-blocks of this framework, we develop new algorithms using a partial-linearization/splitting technique and prove that the accelerated versions 1 of these algorithms require O( √ε ) iterations to obtain an ε-optimal solution. We compare the performance of these algorithms against that of the alternating direction augmented Lagrangian and FISTA methods on a collection of data sets and apply them to two real-world problems to compare the relative merits of the two norms. Keywords: structured sparsity, overlapping Group Lasso, alternating direction methods, variable splitting, augmented Lagrangian
Author: Michael U. Gutmann, Aapo Hyvärinen
Abstract: We consider the task of estimating, from observed data, a probabilistic model that is parameterized by a finite number of parameters. In particular, we are considering the situation where the model probability density function is unnormalized. That is, the model is only specified up to the partition function. The partition function normalizes a model so that it integrates to one for any choice of the parameters. However, it is often impossible to obtain it in closed form. Gibbs distributions, Markov and multi-layer networks are examples of models where analytical normalization is often impossible. Maximum likelihood estimation can then not be used without resorting to numerical approximations which are often computationally expensive. We propose here a new objective function for the estimation of both normalized and unnormalized models. The basic idea is to perform nonlinear logistic regression to discriminate between the observed data and some artificially generated noise. With this approach, the normalizing partition function can be estimated like any other parameter. We prove that the new estimation method leads to a consistent (convergent) estimator of the parameters. For large noise sample sizes, the new estimator is furthermore shown to behave like the maximum likelihood estimator. In the estimation of unnormalized models, there is a trade-off between statistical and computational performance. We show that the new method strikes a competitive trade-off in comparison to other estimation methods for unnormalized models. As an application to real data, we estimate novel two-layer models of natural image statistics with spline nonlinearities. Keywords: statistics unnormalized models, partition function, computation, estimation, natural image
4 0.78237808 66 jmlr-2012-Metric and Kernel Learning Using a Linear Transformation
Author: Prateek Jain, Brian Kulis, Jason V. Davis, Inderjit S. Dhillon
Abstract: Metric and kernel learning arise in several machine learning applications. However, most existing metric learning algorithms are limited to learning metrics over low-dimensional data, while existing kernel learning algorithms are often limited to the transductive setting and do not generalize to new data points. In this paper, we study the connections between metric learning and kernel learning that arise when studying metric learning as a linear transformation learning problem. In particular, we propose a general optimization framework for learning metrics via linear transformations, and analyze in detail a special case of our framework—that of minimizing the LogDet divergence subject to linear constraints. We then propose a general regularized framework for learning a kernel matrix, and show it to be equivalent to our metric learning framework. Our theoretical connections between metric and kernel learning have two main consequences: 1) the learned kernel matrix parameterizes a linear transformation kernel function and can be applied inductively to new data points, 2) our result yields a constructive method for kernelizing most existing Mahalanobis metric learning formulations. We demonstrate our learning approach by applying it to large-scale real world problems in computer vision, text mining and semi-supervised kernel dimensionality reduction. Keywords: divergence metric learning, kernel learning, linear transformation, matrix divergences, logdet
5 0.44334692 92 jmlr-2012-Positive Semidefinite Metric Learning Using Boosting-like Algorithms
Author: Chunhua Shen, Junae Kim, Lei Wang, Anton van den Hengel
Abstract: The success of many machine learning and pattern recognition methods relies heavily upon the identification of an appropriate distance metric on the input data. It is often beneficial to learn such a metric from the input training data, instead of using a default one such as the Euclidean distance. In this work, we propose a boosting-based technique, termed B OOST M ETRIC, for learning a quadratic Mahalanobis distance metric. Learning a valid Mahalanobis distance metric requires enforcing the constraint that the matrix parameter to the metric remains positive semidefinite. Semidefinite programming is often used to enforce this constraint, but does not scale well and is not easy to implement. B OOST M ETRIC is instead based on the observation that any positive semidefinite matrix can be decomposed into a linear combination of trace-one rank-one matrices. B OOST M ETRIC thus uses rank-one positive semidefinite matrices as weak learners within an efficient and scalable boosting-based learning process. The resulting methods are easy to implement, efficient, and can accommodate various types of constraints. We extend traditional boosting algorithms in that its weak learner is a positive semidefinite matrix with trace and rank being one rather than a classifier or regressor. Experiments on various data sets demonstrate that the proposed algorithms compare favorably to those state-of-the-art methods in terms of classification accuracy and running time. Keywords: Mahalanobis distance, semidefinite programming, column generation, boosting, Lagrange duality, large margin nearest neighbor
6 0.4341107 67 jmlr-2012-Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming
7 0.42424363 98 jmlr-2012-Regularized Bundle Methods for Convex and Non-Convex Risks
8 0.41317767 33 jmlr-2012-Distance Metric Learning with Eigenvalue Optimization
9 0.41294548 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models
10 0.3842321 18 jmlr-2012-An Improved GLMNET for L1-regularized Logistic Regression
11 0.37076527 72 jmlr-2012-Multi-Target Regression with Rule Ensembles
12 0.36537129 54 jmlr-2012-Large-scale Linear Support Vector Regression
13 0.36516225 21 jmlr-2012-Bayesian Mixed-Effects Inference on Classification Performance in Hierarchical Data Sets
14 0.36385739 65 jmlr-2012-MedLDA: Maximum Margin Supervised Topic Models
15 0.36323214 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches
16 0.35983887 119 jmlr-2012-glm-ie: Generalised Linear Models Inference & Estimation Toolbox
17 0.35954332 75 jmlr-2012-NIMFA : A Python Library for Nonnegative Matrix Factorization
18 0.35283208 103 jmlr-2012-Sampling Methods for the Nyström Method
19 0.34835085 26 jmlr-2012-Coherence Functions with Applications in Large-Margin Classification Methods
20 0.33294857 115 jmlr-2012-Trading Regret for Efficiency: Online Convex Optimization with Long Term Constraints