jmlr jmlr2011 knowledge-graph by maker-knowledge-mining
1 jmlr-2011-A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes
Author: Stéphane Ross, Joelle Pineau, Brahim Chaib-draa, Pierre Kreitmann
Abstract: Bayesian learning methods have recently been shown to provide an elegant solution to the exploration-exploitation trade-off in reinforcement learning. However, most investigations of Bayesian reinforcement learning to date focus on standard Markov Decision Processes (MDPs). The primary focus of this paper is to extend these ideas to the case of partially observable domains, by introducing the Bayes-Adaptive Partially Observable Markov Decision Process. This new framework can be used to simultaneously (1) learn a model of the POMDP domain through interaction with the environment, (2) track the state of the system under partial observability, and (3) plan (near-)optimal sequences of actions. An important contribution of this paper is to provide theoretical results showing how the model can be finitely approximated while preserving good learning performance. We present approximate algorithms for belief tracking and planning in this model, as well as empirical results that illustrate how the model estimate and agent’s return improve as a function of experience. Keywords: reinforcement learning, Bayesian inference, partially observable Markov decision processes
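Note (schematic only, hedged paraphrase rather than a verbatim statement from the paper): the framework maintains a belief jointly over the hidden state and the Dirichlet counts φ, ψ parameterizing the unknown transition and observation models. After action a and observation z, the updated belief has roughly the form

    b'(s', \varphi', \psi') \propto \sum_{s} b(s, \varphi, \psi)\,
        \frac{\varphi^{a}_{s s'}}{\sum_{s''} \varphi^{a}_{s s''}}\,
        \frac{\psi^{a}_{s' z}}{\sum_{z'} \psi^{a}_{s' z'}},
    \qquad \varphi' = \varphi + \delta^{a}_{s s'}, \quad \psi' = \psi + \delta^{a}_{s' z},

i.e., expected transition and observation probabilities under the Dirichlet posteriors, with the corresponding counts incremented. The paper's finite approximations and planning algorithms operate on beliefs of this kind.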
2 jmlr-2011-A Bayesian Approximation Method for Online Ranking
Author: Ruby C. Weng, Chih-Jen Lin
Abstract: This paper describes a Bayesian approximation method to obtain online ranking algorithms for games with multiple teams and multiple players. Large-scale online ranking systems have recently become much needed for Internet games. We consider game models in which a k-team game is treated as several two-team games. By approximating the expectation of teams’ (or players’) performances, we derive simple analytic update rules. These update rules, which require no numerical integration, are very easy to interpret and implement. Experiments on game data show that the accuracy of our approach is competitive with state-of-the-art systems such as TrueSkill, while both the running time and the code are much shorter. Keywords: Bayesian inference, rating system, Bradley-Terry model, Thurstone-Mosteller model, Plackett-Luce model
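As a toy illustration of the flavour of analytic update the abstract describes (not the paper's derived rules: skill variances, team aggregation, and the k-team decomposition are all omitted, and the function name is made up here), a Bradley-Terry-style online update for a two-player game could be sketched as:

    import math

    def online_rating_update(mu_winner, mu_loser, k=0.1):
        # Bradley-Terry win probability under the current skill estimates.
        p_win = 1.0 / (1.0 + math.exp(-(mu_winner - mu_loser)))
        # Move both skills along the gradient of the outcome's log-likelihood.
        mu_winner += k * (1.0 - p_win)
        mu_loser -= k * (1.0 - p_win)
        return mu_winner, mu_loser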
3 jmlr-2011-A Cure for Variance Inflation in High Dimensional Kernel Principal Component Analysis
Author: Trine Julie Abrahamsen, Lars Kai Hansen
Abstract: Small-sample, high-dimensional principal component analysis (PCA) suffers from variance inflation and lack of generalizability. It has earlier been pointed out that a simple leave-one-out variance renormalization scheme can cure the problem. In this paper we generalize the cure in two directions: first, we propose a computationally less intensive approximate leave-one-out estimator; second, we show that variance inflation is also present in kernel principal component analysis (kPCA) and provide a non-parametric renormalization scheme which can quite efficiently restore generalizability in kPCA. As for PCA, our analysis also suggests a simplified approximate expression. Keywords: PCA, kernel PCA, generalizability, variance renormalization
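For intuition, a naive leave-one-out check of projection variance (the expensive baseline; the paper's cheaper approximate estimator and its kernel extension are not reproduced here, and the function name is illustrative) might look like:

    import numpy as np

    def loo_projection_variances(X, n_components=2):
        # Project each point onto principal directions fitted without it; the
        # spread of these held-out scores reveals how inflated the corresponding
        # training-set projection variances are.
        scores = []
        for i in range(X.shape[0]):
            Xtr = np.delete(X, i, axis=0)
            mu = Xtr.mean(axis=0)
            _, _, Vt = np.linalg.svd(Xtr - mu, full_matrices=False)
            scores.append((X[i] - mu) @ Vt[:n_components].T)
        return np.var(np.array(scores), axis=0)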
4 jmlr-2011-A Family of Simple Non-Parametric Kernel Learning Algorithms
Author: Jinfeng Zhuang, Ivor W. Tsang, Steven C.H. Hoi
Abstract: Previous studies of Non-Parametric Kernel Learning (NPKL) usually formulate the learning task as a Semi-Definite Programming (SDP) problem that is often solved by some general purpose SDP solvers. However, for N data examples, the time complexity of NPKL using a standard interior-point SDP solver could be as high as O(N^6.5), which prevents NPKL methods from being applied to real applications, even for data sets of moderate size. In this paper, we present a family of efficient NPKL algorithms, termed “SimpleNPKL”, which can learn non-parametric kernels from a large set of pairwise constraints efficiently. In particular, we propose two efficient SimpleNPKL algorithms. One is the SimpleNPKL algorithm with linear loss, which enjoys a closed-form solution that can be efficiently computed by the Lanczos sparse eigendecomposition technique. The other is the SimpleNPKL algorithm with other loss functions (including square hinge loss, hinge loss, and square loss), which can be re-formulated as a saddle-point optimization problem and solved by a fast iterative algorithm. In contrast to previous NPKL approaches, our empirical results show that the proposed technique, while maintaining the same accuracy, is significantly more efficient and scalable. Finally, we demonstrate that the proposed technique can also speed up many kernel learning tasks, including colored maximum variance unfolding, minimum volume embedding, and structure preserving embedding. Keywords: non-parametric kernel learning, semi-definite programming, semi-supervised learning, side information, pairwise constraints
5 jmlr-2011-A Refined Margin Analysis for Boosting Algorithms via Equilibrium Margin
Author: Liwei Wang, Masashi Sugiyama, Zhaoxiang Jing, Cheng Yang, Zhi-Hua Zhou, Jufu Feng
Abstract: Much attention has been paid to the theoretical explanation of the empirical success of AdaBoost. The most influential work is the margin theory, which is essentially an upper bound for the generalization error of any voting classifier in terms of the margin distribution over the training data. However, important questions were raised about the margin explanation. Breiman (1999) proved a bound in terms of the minimum margin, which is sharper than the margin distribution bound. He argued that the minimum margin would be better in predicting the generalization error. Grove and Schuurmans (1998) developed an algorithm called LP-AdaBoost which maximizes the minimum margin while keeping all other factors the same as AdaBoost. In experiments however, LP-AdaBoost usually performs worse than AdaBoost, putting the margin explanation into serious doubt. In this paper, we make a refined analysis of the margin theory. We prove a bound in terms of a new margin measure called the Equilibrium margin (Emargin). The Emargin bound is uniformly sharper than Breiman’s minimum margin bound. Thus our result suggests that the minimum margin may not be crucial for the generalization error. We also show that a large Emargin and a small empirical error at Emargin imply a smaller bound on the generalization error. Experimental results on benchmark data sets demonstrate that AdaBoost usually has a larger Emargin and a smaller test error than LP-AdaBoost, which agrees well with our theory. Keywords: boosting, margin bounds, voting classifier
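For reference, the margin quantities these bounds are stated in terms of are the standard voting-classifier margins (the Emargin itself is defined in the paper and not restated here):

    \mathrm{margin}(x_i, y_i) = \frac{y_i \sum_t \alpha_t h_t(x_i)}{\sum_t \alpha_t},
    \qquad \text{minimum margin} = \min_{i} \, \mathrm{margin}(x_i, y_i).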
6 jmlr-2011-A Simpler Approach to Matrix Completion
Author: Benjamin Recht
Abstract: This paper provides the best bounds to date on the number of randomly sampled entries required to reconstruct an unknown low-rank matrix. These results improve on prior work by Candès and Recht (2009), Candès and Tao (2009), and Keshavan et al. (2009). The reconstruction is accomplished by minimizing the nuclear norm, or sum of the singular values, of the hidden matrix subject to agreement with the provided entries. If the underlying matrix satisfies a certain incoherence condition, then the number of entries required is equal to a quadratic logarithmic factor times the number of parameters in the singular value decomposition. The proof of this assertion is short, self-contained, and uses very elementary analysis. The novel techniques herein are based on recent work in quantum information theory. Keywords: matrix completion, low-rank matrices, convex optimization, nuclear norm minimization, random matrices, operator Chernoff bound, compressed sensing
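The paper analyzes the exact nuclear-norm program itself; as a hedged computational sketch of that objective (singular value thresholding, a standard heuristic solver for this problem, not the paper's contribution; function name and parameters are illustrative):

    import numpy as np

    def svt_complete(M, mask, tau=5.0, step=1.0, iters=200):
        # M holds the observed entries (zeros elsewhere); mask is 1 where observed.
        Y = np.zeros_like(M, dtype=float)
        X = np.zeros_like(M, dtype=float)
        for _ in range(iters):
            U, s, Vt = np.linalg.svd(Y, full_matrices=False)
            X = (U * np.maximum(s - tau, 0.0)) @ Vt   # soft-threshold the singular values
            Y = Y + step * mask * (M - X)             # push toward agreement on observed entries
        return X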
7 jmlr-2011-Adaptive Exact Inference in Graphical Models
Author: Özgür Sümer, Umut A. Acar, Alexander T. Ihler, Ramgopal R. Mettu
Abstract: Many algorithms and applications involve repeatedly solving variations of the same inference problem, for example to introduce new evidence to the model or to change conditional dependencies. As the model is updated, the goal of adaptive inference is to take advantage of previously computed quantities to perform inference more rapidly than from scratch. In this paper, we present algorithms for adaptive exact inference on general graphs that can be used to efficiently compute marginals and update MAP configurations under arbitrary changes to the input factor graph and its associated elimination tree. After a linear time preprocessing step, our approach enables updates to the model and the computation of any marginal in time that is logarithmic in the size of the input model. Moreover, in contrast to max-product our approach can also be used to update MAP configurations in time that is roughly proportional to the number of updated entries, rather than the size of the input model. To evaluate the practical effectiveness of our algorithms, we implement and test them using synthetic data as well as for two real-world computational biology applications. Our experiments show that adaptive inference can achieve substantial speedups over performing complete inference as the model undergoes small changes over time. Keywords: exact inference, factor graphs, factor elimination, marginalization, dynamic programming, MAP computation, model updates, parallel tree contraction
8 jmlr-2011-Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
Author: John Duchi, Elad Hazan, Yoram Singer
Abstract: We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. Metaphorically, the adaptation allows us to find needles in haystacks in the form of very predictive but rarely seen features. Our paradigm stems from recent advances in stochastic optimization and online learning which employ proximal functions to control the gradient steps of the algorithm. We describe and analyze an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight. We give several efficient algorithms for empirical risk minimization problems with common and important regularization functions and domain constraints. We experimentally study our theoretical analysis and show that adaptive subgradient methods outperform state-of-the-art, yet non-adaptive, subgradient algorithms. Keywords: subgradient methods, adaptivity, online learning, stochastic convex optimization
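A minimal sketch of the diagonal variant of such an adaptive step (learning rate, epsilon, and the function name are placeholders; the paper's composite-objective and full-matrix versions are not shown):

    import numpy as np

    def adagrad_step(w, g, G, lr=0.1, eps=1e-8):
        # Accumulate squared gradients per coordinate, then scale the step so
        # rarely-seen but predictive features keep relatively large step sizes.
        G = G + g * g
        w = w - lr * g / (np.sqrt(G) + eps)
        return w, G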
9 jmlr-2011-An Asymptotic Behaviour of the Marginal Likelihood for General Markov Models
Author: Piotr Zwiernik
Abstract: The standard Bayesian Information Criterion (BIC) is derived under regularity conditions which are not always satisfied in the case of graphical models with hidden variables. In this paper we derive the BIC for the binary graphical tree models where all the inner nodes of a tree represent binary hidden variables. This provides an extension of a similar formula given by Rusakov and Geiger for naive Bayes models. The main tool used in this paper is the connection between the growth behavior of marginal likelihood integrals and the real log-canonical threshold. Keywords: BIC, marginal likelihood, singular models, tree models, Bayesian networks, real log-canonical threshold
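Schematically, the contrast exploited here is between the regular-model BIC and the singular-model expansion governed by the real log-canonical threshold λ (with multiplicity m); the paper computes these exponents for binary tree models (the formulas below are the generic asymptotic forms, not the paper's specific values):

    \log p(\mathrm{data}) \approx \log L(\hat\theta) - \tfrac{d}{2} \log n                   % regular models (standard BIC)
    \log p(\mathrm{data}) \approx \log L(\hat\theta) - \lambda \log n + (m - 1) \log\log n   % singular models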
10 jmlr-2011-Anechoic Blind Source Separation Using Wigner Marginals
Author: Lars Omlor, Martin A. Giese
Abstract: Blind source separation problems emerge in many applications, where signals can be modeled as superpositions of multiple sources. Many popular applications of blind source separation are based on linear instantaneous mixture models. If specific invariance properties are known about the sources, for example, translation or rotation invariance, the simple linear model can be extended by inclusion of the corresponding transformations. When the sources are invariant against translations (spatial displacements or time shifts) the resulting model is called an anechoic mixing model. We present a new algorithmic framework for the solution of anechoic problems in arbitrary dimensions. This framework is derived from stochastic time-frequency analysis in general, and the marginal properties of the Wigner-Ville spectrum in particular. The method reduces the general anechoic problem to a set of anechoic problems with non-negativity constraints and a phase retrieval problem. The first type of subproblem can be solved by existing algorithms, for example by an appropriate modification of non-negative matrix factorization (NMF). The second subproblem is solved by established phase retrieval methods. We discuss and compare implementations of this new algorithmic framework for several example problems with synthetic and real-world data, including music streams, natural 2D images, human motion trajectories and two-dimensional shapes. Keywords: blind source separation, anechoic mixtures, time-frequency transformations, linear canonical transform, Wigner-Ville spectrum
11 jmlr-2011-Approximate Marginals in Latent Gaussian Models
Author: Botond Cseke, Tom Heskes
Abstract: We consider the problem of improving the Gaussian approximate posterior marginals computed by expectation propagation and the Laplace method in latent Gaussian models and propose methods that are similar in spirit to the Laplace approximation of Tierney and Kadane (1986). We show that in the case of sparse Gaussian models, the computational complexity of expectation propagation can be made comparable to that of the Laplace method by using a parallel updating scheme. In some cases, expectation propagation gives excellent estimates where the Laplace approximation fails. Inspired by bounds on the correct marginals, we arrive at factorized approximations, which can be applied on top of both expectation propagation and the Laplace method. The factorized approximations can give results nearly indistinguishable from the non-factorized approximations, and their computational complexity scales linearly with the number of variables. We find that the expectation propagation based marginal approximations we introduce are typically more accurate than the methods of similar complexity proposed by Rue et al. (2009). Keywords: approximate marginals, Gaussian Markov random fields, Laplace approximation, variational inference, expectation propagation
12 jmlr-2011-Bayesian Co-Training
Author: Shipeng Yu, Balaji Krishnapuram, Rómer Rosales, R. Bharat Rao
Abstract: Co-training (or more generally, co-regularization) has been a popular algorithm for semi-supervised learning in data with two feature representations (or views), but the fundamental assumptions underlying this type of model are still unclear. In this paper we propose a Bayesian undirected graphical model for co-training, or more generally for semi-supervised multi-view learning. This makes explicit the previously unstated assumptions of a large class of co-training type algorithms, and also clarifies the circumstances under which these assumptions fail. Building upon new insights from this model, we propose an improved method for co-training, which is a novel co-training kernel for Gaussian process classifiers. The resulting approach is convex and avoids local-maxima problems, and it can also automatically estimate how much each view should be trusted to accommodate noisy or unreliable views. The Bayesian co-training approach can also elegantly handle data samples with missing views, that is, some of the views are not available for some data points at learning time. This is further extended to an active sensing framework, in which the missing (sample, view) pairs are actively acquired to improve learning performance. The strength of the active sensing model is that one actively sensed (sample, view) pair can improve the joint multi-view classification on all the samples. Experiments on toy data and several real world data sets illustrate the benefits of this approach. Keywords: co-training, multi-view learning, semi-supervised learning, Gaussian processes, undirected graphical models, active sensing
13 jmlr-2011-Bayesian Generalized Kernel Mixed Models
Author: Zhihua Zhang, Guang Dai, Michael I. Jordan
Abstract: We propose a fully Bayesian methodology for generalized kernel mixed models (GKMMs), which are extensions of generalized linear mixed models in the feature space induced by a reproducing kernel. We place a mixture of a point-mass distribution and Silverman’s g-prior on the regression vector of a generalized kernel model (GKM). This mixture prior allows a fraction of the components of the regression vector to be zero. Thus, it serves for sparse modeling and is useful for Bayesian computation. In particular, we exploit data augmentation methodology to develop a Markov chain Monte Carlo (MCMC) algorithm in which the reversible jump method is used for model selection and a Bayesian model averaging method is used for posterior prediction. When the feature basis expansion in the reproducing kernel Hilbert space is treated as a stochastic process, this approach can be related to the Karhunen-Loève expansion of a Gaussian process (GP). Thus, our sparse modeling framework leads to a flexible approximation method for GPs. Keywords: reproducing kernel Hilbert spaces, generalized kernel models, Silverman’s g-prior, Bayesian model averaging, Gaussian processes
14 jmlr-2011-Better Algorithms for Benign Bandits
Author: Elad Hazan, Satyen Kale
Abstract: The online multi-armed bandit problem and its generalizations are repeated decision making problems, where the goal is to select one of several possible decisions in every round, and incur a cost associated with the decision, in such a way that the total cost incurred over all iterations is close to the cost of the best fixed decision in hindsight. The difference in these costs is known as the regret of the algorithm. The term bandit refers to the setting where one only obtains the cost of the decision used in a given iteration and no other information. A very general form of this problem is the non-stochastic bandit linear optimization problem, where the set of decisions is a convex set in some Euclidean space, and the cost functions are linear. Only recently was an efficient algorithm attaining Õ(√T) regret discovered in this setting. In this paper we propose a new algorithm for the bandit linear optimization problem which obtains a tighter regret bound of Õ(√Q), where Q is the total variation in the cost functions. This regret bound, previously conjectured to hold in the full information case, shows that it is possible to incur much less regret in a slowly changing environment even in the bandit setting. Our algorithm is efficient and applies several new ideas to bandit optimization such as reservoir sampling. Keywords: multi-armed bandit, regret minimization, online learning
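The abstract mentions reservoir sampling as one ingredient; for completeness, the classic subroutine (only the generic algorithm, not how the paper deploys it inside the bandit method) is:

    import random

    def reservoir_sample(stream, k):
        # Maintain a uniform random sample of size k from a stream of unknown length.
        sample = []
        for t, x in enumerate(stream):
            if t < k:
                sample.append(x)
            else:
                j = random.randint(0, t)  # uniform over {0, ..., t}
                if j < k:
                    sample[j] = x
        return sample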
15 jmlr-2011-CARP: Software for Fishing Out Good Clustering Algorithms
Author: Volodymyr Melnykov, Ranjan Maitra
Abstract: This paper presents the Clustering Algorithms’ Referee Package, or CARP, an open source GNU GPL-licensed C package for evaluating clustering algorithms. Calibrating performance of such algorithms is important and CARP addresses this need by generating datasets of different clustering complexity and by assessing the performance of the concerned algorithm in terms of its ability to classify each dataset relative to the true grouping. This paper briefly describes the software and its capabilities. Keywords: CARP, MixSim, clustering algorithm, Gaussian mixture, overlap
16 jmlr-2011-Clustering Algorithms for Chains
Author: Antti Ukkonen
Abstract: We consider the problem of clustering a set of chains to k clusters. A chain is a totally ordered subset of a finite set of items. Chains are an intuitive way to express preferences over a set of alternatives, as well as a useful representation of ratings in situations where the item-specific scores are either difficult to obtain, too noisy due to measurement error, or simply not as relevant as the order that they induce over the items. First we adapt the classical k-means for chains by proposing a suitable distance function and a centroid structure. We also present two different approaches for mapping chains to a vector space. The first one is related to the planted partition model, while the second one has an intuitive geometrical interpretation. Finally we discuss a randomization test for assessing the significance of a clustering. To this end we present an MCMC algorithm for sampling random sets of chains that share certain properties with the original data. The methods are studied in a series of experiments using real and artificial data. Results indicate that the methods produce interesting clusterings, and for certain types of inputs improve upon previous work on clustering algorithms for orders. Keywords: Lloyd’s algorithm, orders, preference statements, planted partition model, randomization testing
17 jmlr-2011-Computationally Efficient Convolved Multiple Output Gaussian Processes
Author: Mauricio A. Álvarez, Neil D. Lawrence
Abstract: Recently there has been an increasing interest in regression methods that deal with multiple outputs. This has been motivated partly by frameworks like multitask learning, multisensor networks or structured output data. From a Gaussian processes perspective, the problem reduces to specifying an appropriate covariance function that, whilst being positive semi-definite, captures the dependencies between all the data points and across all the outputs. One approach to account for non-trivial correlations between outputs employs convolution processes. Under a latent function interpretation of the convolution transform we establish dependencies between output variables. The main drawbacks of this approach are the associated computational and storage demands. In this paper we address these issues. We present different efficient approximations for dependent output Gaussian processes constructed through the convolution formalism. We exploit the conditional independencies present naturally in the model. This leads to a form of the covariance similar in spirit to the so called PITC and FITC approximations for a single output. We show experimental results with synthetic and real data, in particular, we show results in school exams score prediction, pollution prediction and gene expression data. Keywords: Gaussian processes, convolution processes, efficient approximations, multitask learning, structured outputs, multivariate processes
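The construction underlying these approximations is the standard convolution process: each output is a smoothing of a shared latent function, which induces cross-output covariances of the generic form below (stated schematically; the paper's specific smoothing kernels, multiple latent functions, and sparse approximations are not reproduced):

    f_d(x) = \int G_d(x - z)\, u(z)\, dz,
    \qquad
    \mathrm{cov}\big[f_d(x), f_{d'}(x')\big]
      = \iint G_d(x - z)\, G_{d'}(x' - z')\, k_u(z, z')\, dz\, dz'.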
18 jmlr-2011-Convergence Rates of Efficient Global Optimization Algorithms
Author: Adam D. Bull
Abstract: In the efficient global optimization problem, we minimize an unknown function f , using as few observations f(x) as possible. It can be considered a continuum-armed-bandit problem, with noiseless data, and simple regret. Expected-improvement algorithms are perhaps the most popular methods for solving the problem; in this paper, we provide theoretical results on their asymptotic behaviour. Implementing these algorithms requires a choice of Gaussian-process prior, which determines an associated space of functions, its reproducing-kernel Hilbert space (RKHS). When the prior is fixed, expected improvement is known to converge on the minimum of any function in its RKHS. We provide convergence rates for this procedure, optimal for functions of low smoothness, and describe a modified algorithm attaining optimal rates for smoother functions. In practice, however, priors are typically estimated sequentially from the data. For standard estimators, we show this procedure may never find the minimum of f . We then propose alternative estimators, chosen to minimize the constants in the rate of convergence, and show these estimators retain the convergence rates of a fixed prior. Keywords: convergence rates, efficient global optimization, expected improvement, continuum-armed bandit, Bayesian optimization
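The acquisition rule analyzed is the standard expected improvement under a Gaussian process posterior with mean μ(x) and standard deviation σ(x), with f* the best value observed so far (minimization convention; this is the usual closed form, restated here for orientation):

    \mathrm{EI}(x) = \mathbb{E}\big[\max\{f^* - f(x),\, 0\}\big]
                   = \big(f^* - \mu(x)\big)\,\Phi(z) + \sigma(x)\,\phi(z),
    \qquad z = \frac{f^* - \mu(x)}{\sigma(x)},

where Φ and φ are the standard normal distribution function and density.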
19 jmlr-2011-Convergence of Distributed Asynchronous Learning Vector Quantization Algorithms
Author: Benoît Patra
Abstract: Motivated by the problem of effectively executing clustering algorithms on very large data sets, we address a model for large scale distributed clustering methods. To this end, we briefly recall some standards on the quantization problem and some results on the almost sure convergence of the competitive learning vector quantization (CLVQ) procedure. A general model for linear distributed asynchronous algorithms well adapted to several parallel computing architectures is also discussed. Our approach brings together this scalable model and the CLVQ algorithm, and we call the resulting technique the distributed asynchronous learning vector quantization algorithm (DALVQ). An in-depth analysis of the almost sure convergence of the DALVQ algorithm is performed. A striking result is that we prove that the multiple versions of the quantizers distributed among the processors in the parallel architecture asymptotically reach a consensus almost surely. Furthermore, we also show that these versions converge almost surely towards the same nearly optimal value for the quantization criterion. Keywords: k-means, vector quantization, distributed, asynchronous, stochastic optimization, scalability, distributed consensus
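For orientation, one centralized CLVQ step (the synchronous building block; the paper's DALVQ runs such steps asynchronously across processors and studies the consensus of the resulting quantizer versions, none of which is shown here, and the step-size schedule below is just a placeholder):

    import numpy as np

    def clvq_step(prototypes, x, t, a=1.0, b=100.0):
        # Winner-take-all update: the closest prototype moves toward the sample
        # with a decreasing gain, as in online (stochastic) k-means.
        winner = np.argmin(np.linalg.norm(prototypes - x, axis=1))
        eps_t = a / (b + t)
        prototypes[winner] += eps_t * (x - prototypes[winner])
        return prototypes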
20 jmlr-2011-Convex and Network Flow Optimization for Structured Sparsity
Author: Julien Mairal, Rodolphe Jenatton, Guillaume Obozinski, Francis Bach
Abstract: We consider a class of learning problems regularized by a structured sparsity-inducing norm defined as the sum of ℓ2- or ℓ∞-norms over groups of variables. Whereas much effort has been put in developing fast optimization techniques when the groups are disjoint or embedded in a hierarchy, we address here the case of general overlapping groups. To this end, we present two different strategies: On the one hand, we show that the proximal operator associated with a sum of ℓ∞-norms can be computed exactly in polynomial time by solving a quadratic min-cost flow problem, allowing the use of accelerated proximal gradient methods. On the other hand, we use proximal splitting techniques, and address an equivalent formulation with non-overlapping groups, but in higher dimension and with additional constraints. We propose efficient and scalable algorithms exploiting these two strategies, which are significantly faster than alternative approaches. We illustrate these methods with several problems such as CUR matrix factorization, multi-task learning of tree-structured dictionaries, background subtraction in video sequences, image denoising with wavelets, and topographic dictionary learning of natural image patches. Keywords: convex optimization, proximal methods, sparse coding, structured sparsity, matrix factorization, network flow optimization, alternating direction method of multipliers
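As a baseline sketch of the kind of proximal operator involved (block soft-thresholding for disjoint ℓ2-groups; the paper's contribution is the harder overlapping case, computed via a quadratic min-cost flow problem, which is not shown here, and the function name is illustrative):

    import numpy as np

    def prox_group_l2(w, groups, lam):
        # Proximal operator of lam * sum_g ||w_g||_2 for *disjoint* groups:
        # each group of coefficients is shrunk toward zero as a block.
        w = w.copy()
        for g in groups:  # g: list or array of coordinate indices
            norm = np.linalg.norm(w[g])
            scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
            w[g] = w[g] * scale
        return w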
22 jmlr-2011-Differentially Private Empirical Risk Minimization
23 jmlr-2011-DirectLiNGAM: A Direct Method for Learning a Linear Non-Gaussian Structural Equation Model
24 jmlr-2011-Dirichlet Process Mixtures of Generalized Linear Models
25 jmlr-2011-Discriminative Learning of Bayesian Networks via Factorized Conditional Log-Likelihood
26 jmlr-2011-Distance Dependent Chinese Restaurant Processes
28 jmlr-2011-Double Updating Online Learning
29 jmlr-2011-Efficient Learning with Partially Observed Attributes
30 jmlr-2011-Efficient Structure Learning of Bayesian Networks using Constraints
31 jmlr-2011-Efficient and Effective Visual Codebook Generation Using Additive Kernels
33 jmlr-2011-Exploiting Best-Match Equations for Efficient Reinforcement Learning
34 jmlr-2011-Faster Algorithms for Max-Product Message-Passing
35 jmlr-2011-Forest Density Estimation
36 jmlr-2011-Generalized TD Learning
37 jmlr-2011-Group Lasso Estimation of High-dimensional Covariance Matrices
38 jmlr-2011-Hierarchical Knowledge Gradient for Sequential Sampling
39 jmlr-2011-High-dimensional Covariance Estimation Based On Gaussian Graphical Models
40 jmlr-2011-Hyper-Sparse Optimal Aggregation
41 jmlr-2011-Improved Moves for Truncated Convex Models
42 jmlr-2011-In All Likelihood, Deep Belief Is Not Enough
43 jmlr-2011-Information, Divergence and Risk for Binary Experiments
44 jmlr-2011-Information Rates of Nonparametric Gaussian Process Methods
45 jmlr-2011-Internal Regret with Partial Monitoring: Calibration-Based Optimal Algorithms
47 jmlr-2011-Inverse Reinforcement Learning in Partially Observable Environments
48 jmlr-2011-Kernel Analysis of Deep Networks
49 jmlr-2011-Kernel Regression in the Presence of Correlated Errors
50 jmlr-2011-LPmade: Link Prediction Made Easy
51 jmlr-2011-Laplacian Support Vector Machines Trained in the Primal
52 jmlr-2011-Large Margin Hierarchical Classification with Mutually Exclusive Class Membership
53 jmlr-2011-Learning High-Dimensional Markov Forest Distributions: Analysis of Error Rates
54 jmlr-2011-Learning Latent Tree Graphical Models
55 jmlr-2011-Learning Multi-modal Similarity
56 jmlr-2011-Learning Transformation Models for Ranking and Survival Analysis
57 jmlr-2011-Learning a Robust Relevance Model for Search Using Kernel Methods
58 jmlr-2011-Learning from Partial Labels
59 jmlr-2011-Learning with Structured Sparsity
60 jmlr-2011-Locally Defined Principal Curves and Surfaces
61 jmlr-2011-Logistic Stick-Breaking Process
62 jmlr-2011-MSVMpack: A Multi-Class Support Vector Machine Package
63 jmlr-2011-MULAN: A Java Library for Multi-Label Learning
64 jmlr-2011-Minimum Description Length Penalization for Group and Multi-Task Sparse Learning
65 jmlr-2011-Models of Cooperative Teaching and Learning
66 jmlr-2011-Multiple Kernel Learning Algorithms
67 jmlr-2011-Multitask Sparsity via Maximum Entropy Discrimination
68 jmlr-2011-Natural Language Processing (Almost) from Scratch
69 jmlr-2011-Neyman-Pearson Classification, Convexity and Stochastic Constraints
71 jmlr-2011-On Equivalence Relationships Between Classification and Ranking Algorithms
73 jmlr-2011-Online Learning in Case of Unbounded Losses Using Follow the Perturbed Leader Algorithm
74 jmlr-2011-Operator Norm Convergence of Spectral Clustering on Level Sets
75 jmlr-2011-Parallel Algorithm for Learning Optimal Bayesian Network Structure
76 jmlr-2011-Parameter Screening and Optimisation for ILP using Designed Experiments
77 jmlr-2011-Posterior Sparsity in Unsupervised Dependency Parsing
79 jmlr-2011-Proximal Methods for Hierarchical Sparse Coding
80 jmlr-2011-Regression on Fixed-Rank Positive Semidefinite Matrices: A Riemannian Approach
81 jmlr-2011-Robust Approximate Bilinear Programming for Value Function Approximation
82 jmlr-2011-Robust Gaussian Process Regression with a Student-t Likelihood
83 jmlr-2011-Scikit-learn: Machine Learning in Python
84 jmlr-2011-Semi-Supervised Learning with Measure Propagation
85 jmlr-2011-Smoothness, Disagreement Coefficient, and the Label Complexity of Agnostic Active Learning
86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling
87 jmlr-2011-Stochastic Methods for l1-regularized Loss Minimization
88 jmlr-2011-Structured Variable Selection with Sparsity-Inducing Norms
90 jmlr-2011-The Indian Buffet Process: An Introduction and Review
91 jmlr-2011-The Sample Complexity of Dictionary Learning
92 jmlr-2011-The Stationary Subspace Analysis Toolbox
94 jmlr-2011-Theoretical Analysis of Bayesian Matrix Factorization
95 jmlr-2011-Training SVMs Without Offset
96 jmlr-2011-Two Distributed-State Models For Generating High-Dimensional Time Series
97 jmlr-2011-Union Support Recovery in Multi-task Learning
98 jmlr-2011-Universality, Characteristic Kernels and RKHS Embedding of Measures
100 jmlr-2011-Unsupervised Supervised Learning II: Margin-Based Classification Without Labels
101 jmlr-2011-Variable Sparsity Kernel Learning
102 jmlr-2011-Waffles: A Machine Learning Toolkit
103 jmlr-2011-Weisfeiler-Lehman Graph Kernels
105 jmlr-2011-lp-Norm Multiple Kernel Learning