jmlr jmlr2011 jmlr2011-50 knowledge-graph by maker-knowledge-mining
Title: LPmade: Link Prediction Made Easy
Source: pdf
Author: Ryan N. Lichtenwalter, Nitesh V. Chawla
Abstract: LPmade is a complete cross-platform software solution for multi-core link prediction and related tasks and analysis. Its first principal contribution is a scalable network library supporting high-performance implementations of the most commonly employed unsupervised link prediction methods. Link prediction in longitudinal data requires a sophisticated and disciplined procedure for correct results and fair evaluation, so the second principal contribution of LPmade is a sophisticated GNU make architecture that completely automates link prediction, prediction evaluation, and network analysis. Finally, LPmade streamlines and automates the procedure for creating multivariate supervised link prediction models with a version of WEKA modified to operate effectively on extremely large data sets. With mere minutes of manual work, one may start with a raw stream of records representing a network and progress through hundreds of steps to generate plots, gigabytes or terabytes of output, and actionable or publishable results. Keywords: link prediction, network analysis, multicore, GNU make, PropFlow, HPLP
Reference: text
Department of Computer Science, University of Notre Dame, Notre Dame, IN 46556, USA
Editor: Geoff Holmes
1. Introduction

Link prediction is succinctly stated as the problem of identifying yet-unobserved links in a network. This task is of increasing interest in both research and corporate contexts. Virtually every major conference and journal in data mining or machine learning now has a significant network science component, and these often include treatments of link prediction. Further, even for standard prediction algorithms, researchers must often write new code or cobble together existing code fragments. The workflow required to achieve predictions and fair evaluation is time-consuming, challenging, and error-prone.
LPmade is the first library to focus specifically on link prediction, incorporating general and extensible forms of the predictors introduced by Liben-Nowell and Kleinberg (2007). It also streamlines and parameterizes the complex link prediction workflow so that researchers can start with source data and achieve predictions in minimal time.
Many other network analysis packages exist: some offer extreme generality, some extreme efficiency, some modeling utilities, and some a dizzying array of algorithms. LPmade's software components are, by necessity, designed for high performance, and it offers a wide array of graph analysis algorithms, but it is first and foremost an extensive toolkit for performing link prediction to achieve both research and application goals. Unlike other options, LPmade provides an organized collection of link prediction algorithms in a build framework that is accessible to researchers across many disciplines.
2. The Software Package

The purpose of LPmade is to provide a workbench on which others may conduct link prediction research and applications. For link prediction tasks in many large networks, even a restricted set of predictions may involve millions, billions, or even trillions of lines of output. Each unsupervised link prediction method, the supervised classification framework from Lichtenwalter et al. (2010), and all of the evaluation tools are optimized for just such quantities of data. Nonetheless, the entire process of starting from raw source data and ending with predictions, evaluations, and plots involves an extensive series of steps that may each take a long time.
The software includes a carefully constructed dependency tracking system that minimizes overhead and simplifies the management of correct procedures. Both the build system and the link prediction library are modular and extensible. Researchers can incorporate their own prediction methods into the library and the automation framework just by writing a C++ class and changing a make variable.
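As a concrete illustration of that extension mechanism, a new unsupervised predictor might be written roughly as follows. This is a hypothetical sketch: the Network and LinkPredictor types and the generateScore method are stand-ins invented here to show the general shape of such a class, not the actual LPmade interfaces.

// Hypothetical sketch of adding a predictor to a library like LPmade's.
// Type and method names are illustrative stand-ins, not the real API.
#include <cstdio>
#include <cstddef>
#include <vector>

// Adjacency-list graph over vertices numbered 0..n-1.
struct Network {
    std::vector< std::vector<std::size_t> > neighbors;
    std::size_t degree(std::size_t v) const { return neighbors[v].size(); }
};

// Base class each predictor would derive from.
class LinkPredictor {
public:
    explicit LinkPredictor(const Network& net) : network(net) {}
    virtual ~LinkPredictor() {}
    // Score the potential link (s, t); higher means more likely.
    virtual double generateScore(std::size_t s, std::size_t t) = 0;
protected:
    const Network& network;
};

// New predictor: preferential attachment, the product of the degrees.
class DegreeProductPredictor : public LinkPredictor {
public:
    explicit DegreeProductPredictor(const Network& net) : LinkPredictor(net) {}
    double generateScore(std::size_t s, std::size_t t) {
        return static_cast<double>(network.degree(s)) * network.degree(t);
    }
};

int main() {
    Network net;
    net.neighbors.resize(3);
    net.neighbors[0].push_back(1); net.neighbors[1].push_back(0);
    net.neighbors[1].push_back(2); net.neighbors[2].push_back(1);
    DegreeProductPredictor predictor(net);
    std::printf("score(0, 2) = %f\n", predictor.generateScore(0, 2));
    return 0;
}

Registering such a class with the automation would then amount to the make-variable change the text describes, for example appending the predictor's name to a list in the common Makefile.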
The library includes clearly written yet optimized versions of the most common asymptotically optimal network analysis algorithms for sampling, finding connected components, computing centrality measures, and calculating useful statistics. LPmade specializes in link prediction by including commonly used unsupervised link prediction methods: Adamic/Adar, common neighbors, Jaccard's coefficient, Katz, preferential attachment, PropFlow, rooted PageRank, SimRank, and weighted rooted PageRank.
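For reference, the two simplest of these scores, common neighbors and Adamic/Adar, are defined for a candidate pair (x, y) in terms of the neighbor set Γ(v). These are the standard textbook definitions, not excerpts from the LPmade source:

\mathrm{CN}(x, y) = |\Gamma(x) \cap \Gamma(y)|, \qquad \mathrm{AA}(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{\log |\Gamma(z)|}

Each unsupervised method assigns such a score to every candidate pair, and ranking the pairs by score yields the predictions.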
The library also has some simpler methods useful in producing feature vectors for supervised learners: clustering coefficient, geodesic distance, degree, PageRank, volume or gregariousness, mutuality, path count, and shortest path count. These methods may be selectively incorporated as features into the supervised framework by Lichtenwalter et al. (2010).
Several graph libraries such as the Boost Graph Library are brilliantly designed for maximum generality and flexibility with template parameters and complex inheritance models. One minor drawback of such libraries is that the code is complex to read and modify. The code base for this library takes a narrower approach by offering fewer mechanisms for generality, but as a result it has a much shallower learning curve.
2.2 GNU make Script and Supporting Tools

Although it can be used and extended as such, LPmade is not just a library of C++ code for network analysis and link prediction. It is additionally an extensive set of scripts designed for sophisticated automation and dependency resolution. These scripts are all incorporated into a set of two co-dependent Makefiles: one task-specific and one common.
Each step involves multiple invocations of many programs to properly assemble data and perform fair evaluation. The automation requires of the user only the task-specific Makefile, which generally involves fewer than 20 lines of user code. This Makefile is where users specify the manner in which raw source data is converted to the initial data stream required by subsequent steps in the pipeline. It is also where rules from the common Makefile can be overridden for task-specific reasons.
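The following is a minimal sketch of what such a task-specific Makefile might look like. It is hypothetical: the variable names, target names, and the name of the included common Makefile are illustrative assumptions, not the identifiers LPmade actually defines.

# Hypothetical task-specific Makefile sketch; all names are assumptions.
# Recipe lines must begin with a tab character.
NETWORK := collaboration

# Convert raw source records into the gzip-compressed initial edge
# stream (source vertex, destination vertex, timestamp) consumed by
# the rest of the pipeline.
$(NETWORK).stream.gz: raw/records.csv
	awk -F, '{ print $$1, $$2, $$3 }' $< | sort -k3,3n | gzip > $@

# Pull in the general rules shared by every task.
include ../common.mk

Because logical tasks receive their own rules, invoking make with the -j flag (for example, make -j8) lets every rule whose prerequisites are satisfied run as a separate process on its own core, which is the multi-core behavior described below.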
The common Makefile includes all the general rules that apply to any network analysis or link prediction task once the task-specific Makefile is written to enable proper handling of raw input. It is designed with advanced template features that allow make to modify the original Makefile rules in accordance with user requirements. Logical tasks are aggressively given their own rules so that the multi-core features of GNU make are of optimal benefit.
In general, users need not be familiar with writing Makefiles. The important options governing the behavior of the automatic build system are presented at the top of the common Makefile along with documentation. Each rule in the Makefile script with no outstanding prerequisites is handled by a separate process to make use of additional cores.
For many large networks, link prediction and its supporting analysis yield very large output files. When this prolific output is further combined into data sets, both the I/O capacity and bandwidth requirements may become problematic. To combat this, most steps in the workflow create, accept, and output gzip-compressed results. Especially on multi-core systems, this yields a hefty decrease in I/O capacity and bandwidth requirements with a minimal impact on performance. In most cases, the output from gunzip is produced faster than the consuming process can accept it. Where necessary, named pipes are used to ameliorate potentially large temporary storage requirements.
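The same pattern is straightforward to follow in downstream consumers: rather than decompressing to a temporary file, a tool can stream records through gunzip. The following is a generic sketch of that idiom on a POSIX system; the record layout and file name are invented for illustration, and this is not LPmade source code.

// Generic sketch: stream gzip-compressed scores through gunzip via a
// pipe (POSIX popen) instead of writing a decompressed temporary file.
#include <cstdio>
#include <cstdlib>

int main(int argc, char** argv) {
    if (argc != 2) {
        std::fprintf(stderr, "usage: %s scores.gz\n", argv[0]);
        return 1;
    }
    char command[4096];
    std::snprintf(command, sizeof(command), "gunzip -c %s", argv[1]);
    FILE* pipe = popen(command, "r");  // read gunzip's stdout directly
    if (pipe == NULL) {
        return 1;
    }
    // Assumed record layout: source vertex, destination vertex, score.
    unsigned long src, dst, records = 0;
    double score, sum = 0.0;
    while (std::fscanf(pipe, "%lu %lu %lf", &src, &dst, &score) == 3) {
        sum += score;
        ++records;
    }
    pclose(pipe);
    std::printf("%lu records, mean score %g\n", records,
                records ? sum / records : 0.0);
    return 0;
}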
2.3 WEKA Modifications

LPmade includes a modified version of WEKA 3. LPmade does not rely on WEKA's graphical interfaces; instead, the build system uses WEKA classifier implementations to construct supervised models for link prediction. Unmodified, WEKA has several limitations that make even its command-line mode problematic for operation on enormous link prediction testing sets.
These include processing overhead for unwanted computations, Java string overflow and potential thrashing from in-memory result concatenation, and an inability to handle compressed C4.5 input. Alternatives such as MOA solve some but not all of these problems, and WEKA internal classes such as AbstractOutput are unavailable at the command line. We have chosen to modify the WEKA command-line evaluation path to compute only the necessary information and to output directly to standard output for LPmade scripted downstream processing. We also modified WEKA to accept gzip-compressed C4.5 input and use this support in the build system to take advantage of significant space savings on disk.
The network library includes an easily extended architecture for testing and verifying individual binaries. The C++ library is written in platform-independent C++ code using only the STL, and it may thus be built on any architecture and any operating system that provides a C++ compiler. An included set of high-speed evaluation tools is written in C99 and builds on any system with such a compiler.
The bundled distribution of WEKA is cross-platform but requires a suitable version of the Java virtual machine. The common Makefile additionally employs many standard tools such as cut, paste, sed, awk, perl, sort, and gzip, as well as a bundled distribution of gnuplot 4.
Acknowledgments

Research was sponsored in part by the Army Research Laboratory under Cooperative Agreement Number W911NF-09-2-0053 and in part by the National Science Foundation Grant BCS-0826958.
simIndex simValue paperId paperTitle

same-paper 1 1.0000001 50 jmlr-2011-50 LPmade: Link Prediction Made Easy
2 0.10519243 63 jmlr-2011-63 MULAN: A Java Library for Multi-Label Learning
3 0.064670347 102 jmlr-2011-102 Waffles: A Machine Learning Toolkit
4 0.034959294 75 jmlr-2011-75 Parallel Algorithm for Learning Optimal Bayesian Network Structure
5 0.034936007 83 jmlr-2011-83 Scikit-learn: Machine Learning in Python
6 0.034860071 68 jmlr-2011-Natural Language Processing (Almost) from Scratch
7 0.034314707 92 jmlr-2011-The Stationary Subspace Analysis Toolbox
8 0.033456415 20 jmlr-2011-Convex and Network Flow Optimization for Structured Sparsity
9 0.032560546 48 jmlr-2011-Kernel Analysis of Deep Networks
10 0.030710522 25 jmlr-2011-Discriminative Learning of Bayesian Networks via Factorized Conditional Log-Likelihood
11 0.028868554 103 jmlr-2011-Weisfeiler-Lehman Graph Kernels
12 0.027411535 93 jmlr-2011-The arules R-Package Ecosystem: Analyzing Interesting Patterns from Large Transaction Data Sets
13 0.025296099 100 jmlr-2011-Unsupervised Supervised Learning II: Margin-Based Classification Without Labels
14 0.025211688 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling
15 0.024798321 34 jmlr-2011-Faster Algorithms for Max-Product Message-Passing
16 0.024721127 62 jmlr-2011-MSVMpack: A Multi-Class Support Vector Machine Package
17 0.024164902 43 jmlr-2011-Information, Divergence and Risk for Binary Experiments
18 0.023461005 60 jmlr-2011-Locally Defined Principal Curves and Surfaces
19 0.021467503 30 jmlr-2011-Efficient Structure Learning of Bayesian Networks using Constraints
20 0.020547599 72 jmlr-2011-On the Relation between Realizable and Nonrealizable Cases of the Sequence Prediction Problem
simIndex simValue paperId paperTitle

same-paper 1 0.98369157 50 jmlr-2011-50 LPmade: Link Prediction Made Easy
2 0.88843775 63 jmlr-2011-63 MULAN: A Java Library for Multi-Label Learning
3 0.57522291 102 jmlr-2011-102 Waffles: A Machine Learning Toolkit
4 0.20069388 48 jmlr-2011-48 Kernel Analysis of Deep Networks
5 0.19117528 75 jmlr-2011-75 Parallel Algorithm for Learning Optimal Bayesian Network Structure
6 0.18807951 103 jmlr-2011-Weisfeiler-Lehman Graph Kernels
7 0.17786708 58 jmlr-2011-Learning from Partial Labels
8 0.16660024 20 jmlr-2011-Convex and Network Flow Optimization for Structured Sparsity
9 0.16460587 93 jmlr-2011-The arules R-Package Ecosystem: Analyzing Interesting Patterns from Large Transaction Data Sets
10 0.16011049 68 jmlr-2011-Natural Language Processing (Almost) from Scratch
11 0.14940557 34 jmlr-2011-Faster Algorithms for Max-Product Message-Passing
12 0.13935208 60 jmlr-2011-Locally Defined Principal Curves and Surfaces
13 0.13622884 17 jmlr-2011-Computationally Efficient Convolved Multiple Output Gaussian Processes
14 0.1359672 92 jmlr-2011-The Stationary Subspace Analysis Toolbox
15 0.13315511 72 jmlr-2011-On the Relation between Realizable and Nonrealizable Cases of the Sequence Prediction Problem
16 0.125093 43 jmlr-2011-Information, Divergence and Risk for Binary Experiments
17 0.12494544 25 jmlr-2011-Discriminative Learning of Bayesian Networks via Factorized Conditional Log-Likelihood
18 0.12168486 61 jmlr-2011-Logistic Stick-Breaking Process
19 0.12085337 9 jmlr-2011-An Asymptotic Behaviour of the Marginal Likelihood for General Markov Models
20 0.11222591 67 jmlr-2011-Multitask Sparsity via Maximum Entropy Discrimination
simIndex simValue paperId paperTitle

same-paper 1 0.87651193 50 jmlr-2011-50 LPmade: Link Prediction Made Easy
2 0.28800881 102 jmlr-2011-102 Waffles: A Machine Learning Toolkit
3 0.20975403 62 jmlr-2011-62 MSVMpack: A Multi-Class Support Vector Machine Package
4 0.19701606 15 jmlr-2011-15 CARP: Software for Fishing Out Good Clustering Algorithms
5 0.1884084 68 jmlr-2011-68 Natural Language Processing (Almost) from Scratch
6 0.17743918 63 jmlr-2011-MULAN: A Java Library for Multi-Label Learning
7 0.17727032 25 jmlr-2011-Discriminative Learning of Bayesian Networks via Factorized Conditional Log-Likelihood
8 0.17599013 43 jmlr-2011-Information, Divergence and Risk for Binary Experiments
9 0.16998486 96 jmlr-2011-Two Distributed-State Models For Generating High-Dimensional Time Series
10 0.16720949 48 jmlr-2011-Kernel Analysis of Deep Networks
11 0.16575845 77 jmlr-2011-Posterior Sparsity in Unsupervised Dependency Parsing
12 0.16535804 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling
13 0.16525257 42 jmlr-2011-In All Likelihood, Deep Belief Is Not Enough
14 0.16354744 16 jmlr-2011-Clustering Algorithms for Chains
15 0.16236497 5 jmlr-2011-A Refined Margin Analysis for Boosting Algorithms via Equilibrium Margin
16 0.16186339 12 jmlr-2011-Bayesian Co-Training
17 0.16108765 4 jmlr-2011-A Family of Simple Non-Parametric Kernel Learning Algorithms
18 0.16028313 64 jmlr-2011-Minimum Description Length Penalization for Group and Multi-Task Sparse Learning
19 0.15960784 7 jmlr-2011-Adaptive Exact Inference in Graphical Models
20 0.15942821 95 jmlr-2011-Training SVMs Without Offset