jmlr jmlr2011 jmlr2011-83 knowledge-graph by maker-knowledge-mining

83 jmlr-2011-Scikit-learn: Machine Learning in Python


Source: pdf

Author: Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, Édouard Duchesnay

Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net. Keywords: Python, supervised learning, unsupervised learning, model selection

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 FR Parietal, INRIA Saclay Neurospin, Bât 145, CEA Saclay, 91191 Gif sur Yvette – France Olivier Grisel OLIVIER . [sent-11, score-0.044]

2 COM Total SA, CSTJF avenue Larribau 64000 Pau – France Matthieu Perrot Édouard Duchesnay MATTHIEU . [sent-33, score-0.039]

3 FR LNAO Neurospin, Bât 145, CEA Saclay, 91191 Gif sur Yvette – France Editor: Mikio Braun Abstract Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. [sent-37, score-0.08]

4 Emphasis is put on ease of use, performance, documentation, and API consistency. [sent-39, score-0.038]

5 It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. [sent-40, score-0.058]

6 Source code, binaries, and documentation can be downloaded from http://scikit-learn. [sent-41, score-0.052]

7 Keywords: Python, supervised learning, unsupervised learning, model selection 1. [sent-44, score-0.036]

8 Introduction The Python programming language is establishing itself as one of the most popular languages for scientific computing. [sent-45, score-0.036]

9 Thanks to its high-level interactive nature and its maturing ecosystem of scientific libraries, it is an appealing choice for algorithmic development and exploratory data analysis (Dubois, 2007; Millman and Aivazis, 2011). [sent-46, score-0.08]

10 Scikit-learn harnesses this rich environment to provide state-of-the-art implementations of many well known machine learning algorithms, while maintaining an easy-to-use interface tightly integrated with the Python language. [sent-48, score-0.045]

11 Scikit-learn differs from other machine learning toolboxes in Python for several reasons: i) it is distributed under the BSD license; ii) it incorporates compiled code for efficiency, unlike MDP (Zito et al. [sent-50, score-0.301]

12 , 2010), iii) it depends only on numpy and scipy to facilitate easy distribution, unlike pymvpa (Hanke et al. [sent-52, score-0.5]

13 , 2009) that has optional dependencies such as R and shogun, and iv) it focuses on imperative programming, unlike pybrain which uses a data-flow framework. [sent-53, score-0.125]

14 While the package is mostly written in Python, it incorporates the C++ libraries LibSVM (Chang and Lin, 2001) and LibLinear (Fan et al. [sent-54, score-0.129]

15 Binary packages are available on a rich set of platforms including Windows and any POSIX platforms. [sent-56, score-0.058]

16 Finally, we strive to use consistent naming for the functions and parameters used throughout, with strict adherence to the Python coding guidelines and numpy-style documentation. [sent-63, score-0.25]

17 Most of the Python ecosystem is licensed with non-copyleft licenses. [sent-65, score-0.08]

18 While such policy is beneficial for adoption of these tools by commercial projects, it does impose some restrictions: we are unable to use some existing scientific code, such as the GSL. [sent-66, score-0.058]

19 To lower the barrier of entry, we avoid framework code and keep the number of different objects to a minimum, relying on numpy arrays for data containers. [sent-68, score-0.369]

20 Scikit-learn provides a ∼300 page user guide including narrative documentation, class references, a tutorial, installation instructions, as well as more than 60 examples, some featuring real-world applications. [sent-73, score-0.063]

21 Input data is presented as numpy arrays, thus integrating seamlessly with other scientific Python libraries. [sent-77, score-0.25]

22 Numpy’s view-based memory model limits copies, even when binding with compiled code (Van der Walt et al. [sent-78, score-0.167]

23 Scipy has bindings for many Fortran-based standard numerical packages, such as LAPACK. [sent-82, score-0.094]

24 This is important for ease of installation and portability, as providing libraries around Fortran code can prove challenging on various platforms. [sent-83, score-0.263]

25 Cython makes it easy to reach the performance of compiled languages with Python-like syntax and high-level operations. [sent-85, score-0.096]

26 It is also used to bind compiled libraries, eliminating the boilerplate code of Python/C extensions. [sent-86, score-0.167]

27 To facilitate the use of external objects with scikit-learn, inheritance is not enforced; instead, code conventions provide a consistent interface. [sent-89, score-0.071]

28 The central object is an estimator, that implements a fit method, accepting as arguments an input data array and, optionally, an array of labels for supervised problems. [sent-90, score-0.241]
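The estimator convention described above can be sketched as follows (class names as in current scikit-learn releases; the paper itself predates some module renames):

```python
import numpy as np
from sklearn.svm import SVC

# Four points in two classes; X is the input data array,
# y the optional label array for supervised problems.
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="linear")
clf.fit(X, y)                      # the estimator API: fit(X, y)
print(clf.predict([[2.5, 2.5]]))   # [1]
```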

29 Table 1: Time in seconds on the Madelon data set for various machine learning libraries exposed in Python: MLPy (Albanese et al. [sent-121, score-0.091]

30 The other important object is the cross-validation iterator, which provides pairs of train and test indices to split input data, for example K-fold, leave one out, or stratified cross-validation. [sent-130, score-0.039]
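A cross-validation iterator in the sense above yields disjoint train/test index arrays; a minimal K-fold sketch (the import path is `sklearn.model_selection` in current releases, `sklearn.cross_validation` in early ones):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)
kf = KFold(n_splits=5)
for train_idx, test_idx in kf.split(X):
    # each fold yields disjoint train and test index arrays
    assert len(set(train_idx) & set(test_idx)) == 0
print(sum(1 for _ in kf.split(X)))  # 5 folds
```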

31 Scikit-learn can evaluate an estimator’s performance or select parameters using cross-validation, optionally distributing the computation to several cores. [sent-132, score-0.063]

32 This is accomplished by wrapping an estimator in a GridSearchCV object, where the “CV” stands for “cross-validated”. [sent-133, score-0.048]

33 This object can therefore be used transparently as any other estimator. [sent-136, score-0.039]
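The wrapping described above looks like this in practice (current module paths; a sketch, not the paper's original code):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
# Wrap an estimator in a GridSearchCV object; the wrapper
# is itself an estimator with fit/predict.
search = GridSearchCV(SVC(), {"C": [0.1, 1.0, 10.0]}, cv=3)
search.fit(X, y)
preds = search.predict(X)   # used transparently as any other estimator
```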

34 Cross validation can be made more efficient for certain estimators by exploiting specific properties, such as warm restarts or regularization paths (Friedman et al. [sent-137, score-0.037]

35 Finally, a Pipeline object can combine several transformers and an estimator to create a combined estimator to, for example, apply dimension reduction before fitting. [sent-140, score-0.198]
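The Pipeline idea — dimension reduction chained before a final estimator — can be sketched as:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
# Transformers first, a final estimator last; fit() chains them,
# applying PCA's fit_transform before fitting the classifier.
pipe = Pipeline([("reduce", PCA(n_components=2)), ("clf", SVC())])
pipe.fit(X, y)
print(pipe.score(X, y))
```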

36 High-level yet Efficient: Some Trade Offs While scikit-learn focuses on ease of use, and is mostly written in a high level language, care has been taken to maximize computational efficiency. [sent-143, score-0.038]

37 While all of the packages compared call libsvm in the background, the performance of scikit-learn can be explained by two factors. [sent-148, score-0.141]

38 First, our bindings avoid memory copies and have up to 40% less overhead than the original libsvm Python bindings. [sent-149, score-0.177]

39 Second, we patch libsvm to improve efficiency on dense data, use a smaller memory footprint, and better use memory alignment and pipelining capabilities of modern processors. [sent-150, score-0.083]

40 Pymvpa uses this implementation via the Rpy R bindings and pays a heavy price in memory copies. [sent-154, score-0.094]

41 Its performance is limited by the fact that numpy’s array operations take multiple passes over data. [sent-166, score-0.061]

42 Conclusion Scikit-learn exposes a wide variety of machine learning algorithms, both supervised and unsupervised, using a consistent, task-oriented interface, thus enabling easy comparison of methods for a given application. [sent-168, score-0.036]
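The "easy comparison of methods" enabled by the consistent interface can be illustrated by swapping estimators behind the same fit/score calls (a sketch using current class names):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
# The shared fit/score interface makes estimators interchangeable,
# so comparing methods on one problem is a short loop.
scores = {}
for model in (SVC(), LogisticRegression(max_iter=1000), KNeighborsClassifier()):
    model.fit(X, y)
    scores[type(model).__name__] = model.score(X, y)
print(scores)
```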

43 A supervised clustering approach for fMRI-based inference of brain states. [sent-250, score-0.036]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('python', 0.454), ('numpy', 0.25), ('matthieu', 0.188), ('bsd', 0.186), ('varoquaux', 0.186), ('pymvpa', 0.156), ('cea', 0.125), ('pybrain', 0.125), ('alexandre', 0.115), ('gramfort', 0.106), ('shogun', 0.106), ('gmail', 0.099), ('michel', 0.096), ('compiled', 0.096), ('license', 0.096), ('bertrand', 0.094), ('bindings', 0.094), ('brucher', 0.094), ('dubourg', 0.094), ('duchesnay', 0.094), ('edouard', 0.094), ('edregosa', 0.094), ('grisel', 0.094), ('hanke', 0.094), ('kobe', 0.094), ('mlpy', 0.094), ('pedregosa', 0.094), ('perrot', 0.094), ('prettenhofer', 0.094), ('ramfort', 0.094), ('saclay', 0.094), ('schaul', 0.094), ('scipy', 0.094), ('vanderplas', 0.094), ('zito', 0.094), ('libraries', 0.091), ('libsvm', 0.083), ('mdp', 0.082), ('ecosystem', 0.08), ('thirion', 0.08), ('fr', 0.078), ('inria', 0.077), ('fabian', 0.072), ('code', 0.071), ('france', 0.063), ('albanese', 0.063), ('avaizis', 0.063), ('cournapeau', 0.063), ('cython', 0.063), ('dubois', 0.063), ('enthought', 0.063), ('fortran', 0.063), ('gif', 0.063), ('gridsearchcv', 0.063), ('installation', 0.063), ('mathieu', 0.063), ('milmann', 0.063), ('neurospin', 0.063), ('optionally', 0.063), ('passos', 0.063), ('transformers', 0.063), ('weimar', 0.063), ('ython', 0.063), ('yvette', 0.063), ('array', 0.061), ('vincent', 0.061), ('com', 0.058), ('commercial', 0.058), ('packages', 0.058), ('al', 0.055), ('elastic', 0.055), ('blondel', 0.053), ('neuroinformatics', 0.053), ('walt', 0.053), ('rokhlin', 0.053), ('ron', 0.053), ('olivier', 0.052), ('documentation', 0.052), ('scienti', 0.049), ('estimator', 0.048), ('arrays', 0.048), ('jake', 0.048), ('guyon', 0.048), ('liblinear', 0.048), ('gpl', 0.048), ('madelon', 0.048), ('interface', 0.045), ('sur', 0.044), ('fit', 0.044), ('achine', 0.041), ('pca', 0.039), ('avenue', 0.039), ('object', 0.039), ('amherst', 0.038), ('incorporates', 0.038), ('ease', 0.038), ('chang', 0.037), ('estimators', 0.037), ('supervised', 0.036), ('language', 0.036)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999964 83 jmlr-2011-Scikit-learn: Machine Learning in Python


2 0.036440812 92 jmlr-2011-The Stationary Subspace Analysis Toolbox

Author: Jan Saputra Müller, Paul von Bünau, Frank C. Meinecke, Franz J. Király, Klaus-Robert Müller

Abstract: The Stationary Subspace Analysis (SSA) algorithm linearly factorizes a high-dimensional time series into stationary and non-stationary components. The SSA Toolbox is a platform-independent efficient stand-alone implementation of the SSA algorithm with a graphical user interface written in Java, that can also be invoked from the command line and from Matlab. The graphical interface guides the user through the whole process; data can be imported and exported from comma separated values (CSV) and Matlab’s .mat files. Keywords: non-stationarities, blind source separation, dimensionality reduction, unsupervised learning

3 0.034936007 50 jmlr-2011-LPmade: Link Prediction Made Easy

Author: Ryan N. Lichtenwalter, Nitesh V. Chawla

Abstract: LPmade is a complete cross-platform software solution for multi-core link prediction and related tasks and analysis. Its first principal contribution is a scalable network library supporting highperformance implementations of the most commonly employed unsupervised link prediction methods. Link prediction in longitudinal data requires a sophisticated and disciplined procedure for correct results and fair evaluation, so the second principle contribution of LPmade is a sophisticated GNU make architecture that completely automates link prediction, prediction evaluation, and network analysis. Finally, LPmade streamlines and automates the procedure for creating multivariate supervised link prediction models with a version of WEKA modified to operate effectively on extremely large data sets. With mere minutes of manual work, one may start with a raw stream of records representing a network and progress through hundreds of steps to generate plots, gigabytes or terabytes of output, and actionable or publishable results. Keywords: link prediction, network analysis, multicore, GNU make, PropFlow, HPLP

4 0.034321137 93 jmlr-2011-The arules R-Package Ecosystem: Analyzing Interesting Patterns from Large Transaction Data Sets

Author: Michael Hahsler, Sudheer Chelluboina, Kurt Hornik, Christian Buchta

Abstract: This paper describes the ecosystem of R add-on packages developed around the infrastructure provided by the package arules. The packages provide comprehensive functionality for analyzing interesting patterns including frequent itemsets, association rules, frequent sequences and for building applications like associative classification. After discussing the ecosystem’s design we illustrate the ease of mining and visualizing rules with a short example. Keywords: frequent itemsets, association rules, frequent sequences, visualization 1. Overview Mining frequent itemsets and association rules is a popular and well researched method for discovering interesting relations between variables in large databases. Association rules are used in many applications and have become prominent as an important exploratory method for uncovering cross-selling opportunities in large retail databases. Agrawal et al. (1993) introduced the problem of mining association rules from transaction data as follows: Let I = {i1 , i2 , . . . , in } be a set of n binary attributes called items. Let D = {t1 ,t2 , . . . ,tm } be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form X ⇒ Y where / X,Y ⊆ I and X ∩ Y = 0 are called itemsets. On itemsets and rules several quality measures can be defined. The most important measures are support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset. Itemsets with a support which surpasses a user defined threshold σ are called frequent itemsets. The confidence of a rule is defined as conf(X ⇒ Y ) = supp(X ∪Y )/supp(X). Association rules are rules with supp(X ∪Y ) ≥ σ and conf(X) ≥ δ where σ and δ are user defined thresholds. ©2011 Michael Hahsler, Sudheer Chelluboina, Kurt Hornik and Christian Buchta. 
Figure 1: The arules ecosystem. The R package arules (Hahsler et al., 2005, 2010) implements the basic infrastructure for creating and manipulating transaction databases and basic algorithms to efficiently find and analyze association rules. Over the last five years several packages were built around the arules infrastructure to create the ecosystem shown in Figure 1. Compared to other tools, the arules ecosystem is fully integrated, implements the latest approaches and has the vast functionality of R for further analysis of found patterns at its disposal. 2. Design and Implementation The core package arules provides an object-oriented framework to represent transaction databases and patterns. To facilitate extensibility, patterns are implemented as an abstract superclass associations and then concrete subclasses implement individual types of patterns. In arules the associations itemsets and rules are provided. Databases and associations both use a sparse matrix representation for efficient storage and basic operations like sorting, subsetting and matching are supported. Different aspects of arules were discussed in previous publications (Hahsler et al., 2005; Hahsler and Hornik, 2007b,a; Hahsler et al., 2008). In this paper we focus on the ecosystem of several R-packages which are built on top of the arules infrastructure. While arules provides Apriori and Eclat (implementations by Borgelt, 2003), two of the most important frequent itemset/association rule mining algorithms, additional algorithms can easily be added as new packages. For example, package arulesNBMiner (Hahsler, 2010) implements an algorithm to find NB-frequent itemsets (Hahsler, 2006).
A collection of further implementations which could be interfaced by arules in the future and a comparison of state-of-the-art algorithms can be found at the Frequent Itemset Mining Implementations Repository (http://fimi.ua.ac.be/). arulesSequences (Buchta and Hahsler, 2010) implements mining frequent sequences in transaction databases. It implements additional association classes called sequences and sequencerules and provides the algorithm cSpade (Zaki, 2001) to efficiently mine frequent sequences. Another application currently under development is arulesClassify which uses the arules infrastructure to implement rule-based classifiers, including Classification Based on Association rules (CBA, Liu et al., 1998) and general associative classification techniques (Jalali-Heravi and Zaïane, 2010). A known drawback of mining for frequent patterns such as association rules is that typically the algorithm returns a very large set of results where only a small fraction of patterns is of interest to the analysts. Many researchers introduced visualization techniques including scatter plots, matrix visualizations, graphs, mosaic plots and parallel coordinates plots to analyze large sets of association rules (see Bruzzese and Davino, 2008, for a recent overview paper). [Figure 2: Visualization of all 410 rules as (a) a scatter plot and (b) the top 3 rules according to lift as a graph.]
arulesViz (Hahsler and Chelluboina, 2010) implements most of these methods for arules while also providing improvements using color shading, reordering and interactive features. Finally, arules provides a Predictive Model Markup Language (PMML) interface to import and export rules via package pmml (Williams et al., 2010). PMML is the leading standard for exchanging statistical and data mining models and is supported by all major solution providers. Although pmml provides interfaces for different packages it is still considered part of the arules ecosystem. The packages in the described ecosystem are available for Linux, OS X and Windows. All packages are distributed via the Comprehensive R Archive Network2 under GPL-2, along with comprehensive manuals, documentation, regression tests and source code. Development versions of most packages are available from R-Forge.3 3. User Interface We illustrate the user interface and the interaction between the packages in the arules ecosystem with a small example using a retail data set called Groceries which contains 9835 transactions with items aggregated to 169 categories. We mine association rules and then present the rules found as well as the top 3 rules according to the interest measure lift (deviation from independence) in two visualizations. > > > > library(

5 0.031157767 102 jmlr-2011-Waffles: A Machine Learning Toolkit

Author: Michael Gashler

Abstract: We present a breadth-oriented collection of cross-platform command-line tools for researchers in machine learning called Waffles. The Waffles tools are designed to offer a broad spectrum of functionality in a manner that is friendly for scripted automation. All functionality is also available in a C++ class library. Waffles is available under the GNU Lesser General Public License. Keywords: machine learning, toolkits, data mining, C++, open source

6 0.030941844 62 jmlr-2011-MSVMpack: A Multi-Class Support Vector Machine Package

7 0.028863307 63 jmlr-2011-MULAN: A Java Library for Multi-Label Learning

8 0.02851361 58 jmlr-2011-Learning from Partial Labels

9 0.023853324 100 jmlr-2011-Unsupervised Supervised Learning II: Margin-Based Classification Without Labels

10 0.023546297 97 jmlr-2011-Union Support Recovery in Multi-task Learning

11 0.022499692 44 jmlr-2011-Information Rates of Nonparametric Gaussian Process Methods

12 0.022280183 31 jmlr-2011-Efficient and Effective Visual Codebook Generation Using Additive Kernels

13 0.021981614 105 jmlr-2011-lp-Norm Multiple Kernel Learning

14 0.021679422 37 jmlr-2011-Group Lasso Estimation of High-dimensional Covariance Matrices

15 0.021605104 46 jmlr-2011-Introduction to the Special Topic on Grammar Induction, Representation of Language and Language Learning

16 0.020708527 79 jmlr-2011-Proximal Methods for Hierarchical Sparse Coding

17 0.020576652 3 jmlr-2011-A Cure for Variance Inflation in High Dimensional Kernel Principal Component Analysis

18 0.019687157 90 jmlr-2011-The Indian Buffet Process: An Introduction and Review

19 0.019554159 64 jmlr-2011-Minimum Description Length Penalization for Group and Multi-Task Sparse Learning

20 0.019262454 42 jmlr-2011-In All Likelihood, Deep Belief Is Not Enough


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.091), (1, -0.026), (2, 0.014), (3, -0.004), (4, -0.052), (5, 0.029), (6, -0.029), (7, -0.025), (8, -0.016), (9, -0.034), (10, -0.045), (11, -0.039), (12, 0.132), (13, -0.043), (14, -0.22), (15, -0.076), (16, 0.03), (17, 0.037), (18, 0.095), (19, 0.101), (20, 0.013), (21, -0.121), (22, 0.042), (23, -0.037), (24, -0.039), (25, -0.011), (26, -0.061), (27, 0.089), (28, 0.047), (29, -0.05), (30, 0.036), (31, -0.152), (32, 0.045), (33, 0.023), (34, -0.106), (35, 0.248), (36, 0.041), (37, -0.107), (38, -0.231), (39, -0.152), (40, 0.19), (41, -0.049), (42, 0.047), (43, -0.115), (44, 0.061), (45, 0.156), (46, -0.062), (47, -0.367), (48, -0.028), (49, -0.09)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97623903 83 jmlr-2011-Scikit-learn: Machine Learning in Python


2 0.38512903 93 jmlr-2011-The arules R-Package Ecosystem: Analyzing Interesting Patterns from Large Transaction Data Sets


3 0.34183919 102 jmlr-2011-Waffles: A Machine Learning Toolkit


4 0.28641108 62 jmlr-2011-MSVMpack: A Multi-Class Support Vector Machine Package

Author: Fabien Lauer, Yann Guermeur

Abstract: This paper describes MSVMpack, an open source software package dedicated to our generic model of multi-class support vector machine. All four multi-class support vector machines (M-SVMs) proposed so far in the literature appear as instances of this model. MSVMpack provides the first unified implementation of them and offers a convenient basis to develop other instances. This is also the first parallel implementation for M-SVMs. The package consists of a set of command-line tools with a callable library. The documentation includes a tutorial, a user's guide and a developer's guide. Keywords: multi-class support vector machines, open source, C
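MSVMpack's generic M-SVM model is not reproduced here, but the decision rule shared by the multi-class SVM formulations it covers can be sketched: each class k carries a score function f_k(x) = <w_k, x> + b_k, and a point is assigned to the class with the largest score. A pure-Python illustration with made-up weights (the class names and parameters below are hypothetical, not learned by any SVM):

```python
# Hypothetical per-class linear score functions; an M-SVM learns the
# weight vectors and offsets, here they are fixed for illustration.
weights = {
    "setosa":     ([ 1.0, -1.0],  0.0),
    "versicolor": ([ 0.0,  1.0], -0.5),
    "virginica":  ([-1.0,  1.0],  0.5),
}

def score(x, w, b):
    # Linear score <w, x> + b for one class.
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(x):
    # Multi-class decision rule: argmax over per-class scores.
    return max(weights, key=lambda k: score(x, *weights[k]))

print(predict([2.0, 0.0]))  # -> setosa
```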

5 0.28255227 92 jmlr-2011-The Stationary Subspace Analysis Toolbox

Author: Jan Saputra Müller, Paul von Bünau, Frank C. Meinecke, Franz J. Király, Klaus-Robert Müller

Abstract: The Stationary Subspace Analysis (SSA) algorithm linearly factorizes a high-dimensional time series into stationary and non-stationary components. The SSA Toolbox is an efficient, platform-independent, stand-alone implementation of the SSA algorithm with a graphical user interface written in Java; it can also be invoked from the command line and from Matlab. The graphical interface guides the user through the whole process; data can be imported from and exported to comma-separated values (CSV) and Matlab's .mat files. Keywords: non-stationarities, blind source separation, dimensionality reduction, unsupervised learning

6 0.21039878 6 jmlr-2011-A Simpler Approach to Matrix Completion

7 0.18932045 58 jmlr-2011-Learning from Partial Labels

8 0.15260199 75 jmlr-2011-Parallel Algorithm for Learning Optimal Bayesian Network Structure

9 0.15080658 31 jmlr-2011-Efficient and Effective Visual Codebook Generation Using Additive Kernels

10 0.14821172 3 jmlr-2011-A Cure for Variance Inflation in High Dimensional Kernel Principal Component Analysis

11 0.14682008 29 jmlr-2011-Efficient Learning with Partially Observed Attributes

12 0.14496559 76 jmlr-2011-Parameter Screening and Optimisation for ILP using Designed Experiments

13 0.12457649 12 jmlr-2011-Bayesian Co-Training

14 0.12002219 100 jmlr-2011-Unsupervised Supervised Learning II: Margin-Based Classification Without Labels

15 0.11906984 44 jmlr-2011-Information Rates of Nonparametric Gaussian Process Methods

16 0.11889856 95 jmlr-2011-Training SVMs Without Offset

17 0.11176367 46 jmlr-2011-Introduction to the Special Topic on Grammar Induction, Representation of Language and Language Learning

18 0.10647646 64 jmlr-2011-Minimum Description Length Penalization for Group and Multi-Task Sparse Learning

19 0.10001723 78 jmlr-2011-Producing Power-Law Distributions and Damping Word Frequencies with Two-Stage Language Models

20 0.095629603 34 jmlr-2011-Faster Algorithms for Max-Product Message-Passing


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(4, 0.013), (9, 0.014), (10, 0.016), (24, 0.026), (31, 0.04), (32, 0.024), (41, 0.011), (60, 0.717), (73, 0.013), (78, 0.027), (87, 0.012)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94862443 83 jmlr-2011-Scikit-learn: Machine Learning in Python

Author: Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, Édouard Duchesnay

Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net. Keywords: Python, supervised learning, unsupervised learning, model selection
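The "API consistency" this abstract emphasizes refers to scikit-learn's estimator convention: hyperparameters set in the constructor, learning done by fit(X, y), inference by predict(X), and learned attributes suffixed with a trailing underscore. The toy classifier below follows that convention in plain Python as a sketch; it is not scikit-learn code, and this NearestCentroid is a hypothetical re-implementation, not the library's class:

```python
class NearestCentroid:
    """Toy classifier following the scikit-learn estimator convention:
    learned state is set by fit() under names with a trailing
    underscore, and predictions come from predict()."""

    def fit(self, X, y):
        # Group training rows by label, then average each column.
        by_class = {}
        for row, label in zip(X, y):
            by_class.setdefault(label, []).append(row)
        self.centroids_ = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in by_class.items()
        }
        return self  # fit returns self, enabling clf.fit(X, y).predict(X)

    def predict(self, X):
        def dist2(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        # Assign each row to the label of the closest centroid.
        return [min(self.centroids_,
                    key=lambda c: dist2(row, self.centroids_[c]))
                for row in X]

clf = NearestCentroid().fit([[0, 0], [1, 0], [9, 9], [10, 10]],
                            ["a", "a", "b", "b"])
print(clf.predict([[0.2, 0.1], [9.5, 9.5]]))  # -> ['a', 'b']
```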

2 0.71126401 42 jmlr-2011-In All Likelihood, Deep Belief Is Not Enough

Author: Lucas Theis, Sebastian Gerwinn, Fabian Sinz, Matthias Bethge

Abstract: Statistical models of natural images provide an important tool for researchers in the fields of machine learning and computational neuroscience. The canonical measure to quantitatively assess and compare the performance of statistical models is given by the likelihood. One class of statistical models which has recently gained increasing popularity and has been applied to a variety of complex data is formed by deep belief networks. Analyses of these models, however, have often been limited to qualitative analyses based on samples due to the computationally intractable nature of their likelihood. Motivated by these circumstances, the present article introduces a consistent estimator for the likelihood of deep belief networks which is computationally tractable and simple to apply in practice. Using this estimator, we quantitatively investigate a deep belief network for natural image patches and compare its performance to the performance of other models for natural image patches. We find that the deep belief network is outperformed with respect to the likelihood even by very simple mixture models. Keywords: deep belief network, restricted Boltzmann machine, likelihood estimation, natural image statistics, potential log-likelihood
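The likelihood comparison the abstract describes can be made concrete with a small sketch: the average log-likelihood of data under a two-component one-dimensional Gaussian mixture, the kind of simple model the paper reports outperforming the deep belief network. The mixture parameters below are illustrative, not fitted to any data:

```python
import math

def gauss_logpdf(x, mu, sigma):
    # Log-density of a univariate Gaussian N(mu, sigma^2) at x.
    return (-0.5 * math.log(2 * math.pi * sigma ** 2)
            - (x - mu) ** 2 / (2 * sigma ** 2))

def mixture_loglik(data, components):
    """Average log-likelihood of `data` under a mixture given as
    (weight, mu, sigma) triples -- the quantity used to rank models."""
    total = 0.0
    for x in data:
        total += math.log(sum(w * math.exp(gauss_logpdf(x, m, s))
                              for w, m, s in components))
    return total / len(data)

model = [(0.5, -1.0, 0.5), (0.5, 1.0, 0.5)]  # illustrative parameters
print(mixture_loglik([-1.0, 1.0, 0.9], model))
```

A higher average log-likelihood on held-out data indicates a better model of the data distribution, which is the criterion used to compare the deep belief network against mixture models.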

3 0.24638635 62 jmlr-2011-MSVMpack: A Multi-Class Support Vector Machine Package

Author: Fabien Lauer, Yann Guermeur

Abstract: This paper describes MSVMpack, an open source software package dedicated to our generic model of multi-class support vector machine. All four multi-class support vector machines (M-SVMs) proposed so far in the literature appear as instances of this model. MSVMpack provides the first unified implementation of them and offers a convenient basis to develop other instances. This is also the first parallel implementation for M-SVMs. The package consists of a set of command-line tools with a callable library. The documentation includes a tutorial, a user's guide and a developer's guide. Keywords: multi-class support vector machines, open source, C

4 0.22203784 48 jmlr-2011-Kernel Analysis of Deep Networks

Author: Grégoire Montavon, Mikio L. Braun, Klaus-Robert Müller

Abstract: When training deep networks it is common knowledge that an efficient and well generalizing representation of the problem is formed. In this paper we aim to elucidate what makes the emerging representation successful. We analyze the layer-wise evolution of the representation in a deep network by building a sequence of deeper and deeper kernels that subsume the mapping performed by more and more layers of the deep network and measuring how these increasingly complex kernels fit the learning problem. We observe that deep networks create increasingly better representations of the learning problem and that the structure of the deep network controls how fast the representation of the task is formed layer after layer. Keywords: deep networks, kernel principal component analysis, representations

5 0.20339571 68 jmlr-2011-Natural Language Processing (Almost) from Scratch

Author: Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, Pavel Kuksa

Abstract: We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity recognition, and semantic role labeling. This versatility is achieved by trying to avoid task-specific engineering and therefore disregarding a lot of prior knowledge. Instead of exploiting man-made input features carefully optimized for each task, our system learns internal representations on the basis of vast amounts of mostly unlabeled training data. This work is then used as a basis for building a freely available tagging system with good performance and minimal computational requirements. Keywords: natural language processing, neural networks

6 0.19913329 96 jmlr-2011-Two Distributed-State Models For Generating High-Dimensional Time Series

7 0.19721153 102 jmlr-2011-Waffles: A Machine Learning Toolkit

8 0.15641521 92 jmlr-2011-The Stationary Subspace Analysis Toolbox

9 0.15574068 64 jmlr-2011-Minimum Description Length Penalization for Group and Multi-Task Sparse Learning

10 0.15123332 15 jmlr-2011-CARP: Software for Fishing Out Good Clustering Algorithms

11 0.13634311 31 jmlr-2011-Efficient and Effective Visual Codebook Generation Using Additive Kernels

12 0.13528693 25 jmlr-2011-Discriminative Learning of Bayesian Networks via Factorized Conditional Log-Likelihood

13 0.13266854 77 jmlr-2011-Posterior Sparsity in Unsupervised Dependency Parsing

14 0.13232908 40 jmlr-2011-Hyper-Sparse Optimal Aggregation

15 0.12975472 12 jmlr-2011-Bayesian Co-Training

16 0.12774697 50 jmlr-2011-LPmade: Link Prediction Made Easy

17 0.12515588 82 jmlr-2011-Robust Gaussian Process Regression with a Student-tLikelihood

18 0.12503763 67 jmlr-2011-Multitask Sparsity via Maximum Entropy Discrimination

19 0.12500367 46 jmlr-2011-Introduction to the Special Topic on Grammar Induction, Representation of Language and Language Learning

20 0.1238476 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling