jmlr jmlr2009 knowledge-graph by maker-knowledge-mining
1 jmlr-2009-A Least-squares Approach to Direct Importance Estimation
Author: Takafumi Kanamori, Shohei Hido, Masashi Sugiyama
Abstract: We address the problem of estimating the ratio of two probability density functions, which is often referred to as the importance. The importance values can be used for various succeeding tasks such as covariate shift adaptation or outlier detection. In this paper, we propose a new importance estimation method that has a closed-form solution; the leave-one-out cross-validation score can also be computed analytically. Therefore, the proposed method is computationally highly efficient and simple to implement. We also elucidate theoretical properties of the proposed method such as the convergence rate and approximation error bounds. Numerical experiments show that the proposed method is comparable to the best existing method in accuracy, while it is computationally more efficient than competing approaches. Keywords: importance sampling, covariate shift adaptation, novelty detection, regularization path, leave-one-out cross validation
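To make the closed-form idea concrete, here is a minimal sketch of a least-squares density-ratio fit in the spirit of this abstract: the ratio is modelled as a linear combination of Gaussian basis functions and the coefficients come from a single regularized linear system, so no iterative optimization is needed. The function names and the hyperparameters (sigma, lam, n_centers) are illustrative choices, not the paper's exact specification, and the analytic leave-one-out score mentioned in the abstract is omitted.

```python
import numpy as np

def gaussian_design(X, centers, sigma):
    """Gaussian basis functions phi_l(x) = exp(-||x - c_l||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lsif_importance(x_nu, x_de, sigma=1.0, lam=0.1, n_centers=100, seed=0):
    """Closed-form least-squares fit of the density ratio p_nu(x) / p_de(x).

    The ratio is modelled as w(x) = sum_l theta_l * phi_l(x); minimizing the
    squared error over denominator samples plus an L2 penalty gives
    theta = (H + lam * I)^{-1} h, with H built from denominator samples and
    h from numerator samples.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(x_nu), size=min(n_centers, len(x_nu)), replace=False)
    centers = x_nu[idx]
    Phi_de = gaussian_design(x_de, centers, sigma)   # (n_de, b)
    Phi_nu = gaussian_design(x_nu, centers, sigma)   # (n_nu, b)
    H = Phi_de.T @ Phi_de / len(x_de)
    h = Phi_nu.mean(axis=0)
    theta = np.linalg.solve(H + lam * np.eye(len(h)), h)
    theta = np.maximum(theta, 0.0)                   # keep the estimated ratio non-negative
    return lambda X: gaussian_design(X, centers, sigma) @ theta

# toy usage: denominator samples from N(0,1), numerator samples from N(0.5,1)
x_de = np.random.default_rng(1).normal(0.0, 1.0, size=(500, 1))
x_nu = np.random.default_rng(2).normal(0.5, 1.0, size=(500, 1))
w = lsif_importance(x_nu, x_de)
print(w(x_de[:5]))   # estimated importance weights for a few denominator points
```

The estimated weights could then be used, for example, to reweight training losses under covariate shift or to flag low-importance points as potential outliers.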
Author: Jacob Abernethy, Francis Bach, Theodoros Evgeniou, Jean-Philippe Vert
Abstract: We present a general approach for collaborative filtering (CF) using spectral regularization to learn linear operators mapping a set of “users” to a set of possibly desired “objects”. In particular, several recent low-rank type matrix-completion methods for CF are shown to be special cases of our proposed framework. Unlike existing regularization-based CF, our approach can be used to incorporate additional information such as attributes of the users/objects—a feature currently lacking in existing regularization-based CF approaches—using popular and well-known kernel methods. We provide novel representer theorems that we use to develop new estimation methods. We then provide learning algorithms based on low-rank decompositions and test them on a standard CF data set. The experiments indicate the advantages of generalizing the existing regularization-based CF methods to incorporate related information about users and objects. Finally, we show that certain multi-task learning methods can be also seen as special cases of our proposed approach. Keywords: collaborative filtering, matrix completion, kernel methods, spectral regularization
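As a concrete point of reference for the low-rank matrix-completion methods that this abstract treats as special cases, the sketch below implements the generic soft-thresholded-SVD iteration for spectrally regularized completion. It is a standard illustration, not the authors' operator-estimation framework, and it ignores the user/object attributes that their kernel formulation can incorporate.

```python
import numpy as np

def soft_impute(R, mask, lam=1.0, n_iters=100):
    """Low-rank matrix completion via soft-thresholding of singular values.

    R    : (users x items) ratings matrix, arbitrary values where mask is False
    mask : boolean array, True where a rating is observed
    lam  : spectral (nuclear-norm style) regularization strength
    """
    Z = np.where(mask, R, 0.0)
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        s = np.maximum(s - lam, 0.0)          # soft-threshold the spectrum
        X = (U * s) @ Vt                      # current low-rank estimate
        Z = np.where(mask, R, X)              # keep observed entries, fill the rest
    return X

rng = np.random.default_rng(0)
true = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))   # rank-2 ground truth
mask = rng.random(true.shape) < 0.5                          # half the entries observed
completed = soft_impute(true, mask, lam=0.5)
print(np.abs((completed - true)[~mask]).mean())              # error on unobserved entries
```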
3 jmlr-2009-A Parameter-Free Classification Method for Large Scale Learning
Author: Marc Boullé
Abstract: With the rapid growth of computer storage capacities, available data and demand for scoring models both follow an increasing trend, sharper than that of the processing power. However, the main limitation to the widespread adoption of data mining solutions is the non-increasing availability of skilled data analysts, who play a key role in data preparation and model selection. In this paper, we present a parameter-free scalable classification method, which is a step towards fully automatic data mining. The method is based on Bayes-optimal univariate conditional density estimators, naive Bayes classification enhanced with a Bayesian variable selection scheme, and averaging of models using a logarithmic smoothing of the posterior distribution. We focus on the complexity of the algorithms and show how they can cope with data sets that are far larger than the available central memory. We finally report results on the Large Scale Learning challenge, where our method obtains state-of-the-art performance within practicable computation time. Keywords: large scale learning, naive Bayes, Bayesianism, model selection, model averaging
4 jmlr-2009-A Survey of Accuracy Evaluation Metrics of Recommendation Tasks
Author: Asela Gunawardana, Guy Shani
Abstract: Recommender systems are now popular both commercially and in the research community, where many algorithms have been suggested for providing recommendations. These algorithms typically perform differently in various domains and tasks. Therefore, it is important from the research perspective, as well as from a practical view, to be able to decide on an algorithm that matches the domain and the task of interest. The standard way to make such decisions is by comparing a number of algorithms offline using some evaluation metric. Indeed, many evaluation metrics have been suggested for comparing recommendation algorithms. The decision on the proper evaluation metric is often critical, as each metric may favor a different algorithm. In this paper we review the proper construction of offline experiments for deciding on the most appropriate algorithm. We discuss three important tasks of recommender systems, and classify a set of appropriate well known evaluation metrics for each task. We demonstrate how using an improper evaluation metric can lead to the selection of an improper algorithm for the task of interest. We also discuss other important considerations when designing offline experiments. Keywords: recommender systems, collaborative filtering, statistical analysis, comparative studies
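As a small illustration of the kind of task-specific metric this survey classifies, the snippet below computes precision and recall at a cut-off k for a single user in an offline top-N experiment; the variable names are illustrative.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Offline top-k evaluation for a single user.

    recommended : ranked list of item ids produced by the algorithm
    relevant    : set of item ids the user actually interacted with (held out)
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# e.g. a "recommend good items" task: precision@5 and recall@5 for one user
print(precision_recall_at_k([10, 3, 7, 42, 8, 1], {3, 8, 99}, k=5))
```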
5 jmlr-2009-Adaptive False Discovery Rate Control under Independence and Dependence
Author: Gilles Blanchard, Étienne Roquain
Abstract: In the context of multiple hypothesis testing, the proportion π0 of true null hypotheses in the pool of hypotheses to test often plays a crucial role, although it is generally unknown a priori. A testing procedure using an implicit or explicit estimate of this quantity in order to improve its efficiency is called adaptive. In this paper, we focus on the issue of false discovery rate (FDR) control and we present new adaptive multiple testing procedures with control of the FDR. In the first part, assuming independence of the p-values, we present two new procedures and give a unified review of other existing adaptive procedures that have provably controlled FDR. We report extensive simulation results comparing these procedures and testing their robustness when the independence assumption is violated. The newly proposed procedures appear competitive with existing ones. The overall best, though, is reported to be Storey’s estimator, albeit for a specific parameter setting that does not appear to have been considered before. In the second part, we propose adaptive versions of step-up procedures that have provably controlled FDR under positive dependence and unspecified dependence of the p-values, respectively. In the latter case, while simulations only show an improvement over non-adaptive procedures in limited situations, these are to our knowledge among the first theoretically founded adaptive multiple testing procedures that control the FDR when the p-values are not independent. Keywords: multiple testing, false discovery rate, adaptive procedure, positive regression dependence, p-values
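For orientation, the sketch below implements one standard adaptive procedure of the kind reviewed here: the Benjamini-Hochberg step-up run at level α/π̂0, with π0 estimated by Storey's plug-in estimator. The λ = 0.5 default is a common choice, not the specific parameter setting highlighted in the abstract, and the paper's new procedures are not reproduced.

```python
import numpy as np

def storey_pi0(pvals, lam=0.5):
    """Storey's plug-in estimator of the proportion of true null hypotheses."""
    m = len(pvals)
    return min(1.0, (np.sum(pvals > lam) + 1) / (m * (1.0 - lam)))

def adaptive_bh(pvals, alpha=0.05, lam=0.5):
    """Benjamini-Hochberg step-up procedure run at the adaptive level alpha / pi0_hat."""
    pvals = np.asarray(pvals, dtype=float)
    m = len(pvals)
    pi0 = storey_pi0(pvals, lam)
    order = np.argsort(pvals)
    thresholds = (np.arange(1, m + 1) / m) * (alpha / pi0)
    below = pvals[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])     # largest index passing the step-up test
        rejected[order[: k + 1]] = True
    return rejected, pi0

# toy usage: 90 null p-values (uniform) mixed with 10 small non-null p-values
pvals = np.concatenate([np.random.default_rng(0).uniform(size=90),
                        np.random.default_rng(1).beta(0.1, 10, size=10)])
rej, pi0 = adaptive_bh(pvals, alpha=0.1)
print(pi0, rej.sum())
```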
Author: Jose M. Peña, Roland Nilsson, Johan Björkegren, Jesper Tegnér
Abstract: We present a sound and complete graphical criterion for reading dependencies from the minimal undirected independence map G of a graphoid M that satisfies weak transitivity. Here, complete means that it is able to read all the dependencies in M that can be derived by applying the graphoid properties and weak transitivity to the dependencies used in the construction of G and the independencies obtained from G by vertex separation. We argue that assuming weak transitivity is not too restrictive. As an intermediate step in the derivation of the graphical criterion, we prove that for any undirected graph G there exists a strictly positive discrete probability distribution with the prescribed sample spaces that is faithful to G. We also report an algorithm that implements the graphical criterion and whose running time is considered to be at most O(n²(e + n)) for n nodes and e edges. Finally, we illustrate how the graphical criterion can be used within bioinformatics to identify biologically meaningful gene dependencies. Keywords: graphical models, vertex separation, graphoids, weak transitivity, bioinformatics
7 jmlr-2009-An Analysis of Convex Relaxations for MAP Estimation of Discrete MRFs
Author: M. Pawan Kumar, Vladimir Kolmogorov, Philip H.S. Torr
Abstract: The problem of obtaining the maximum a posteriori estimate of a general discrete Markov random field (i.e., a Markov random field defined using a discrete set of labels) is known to be NP-hard. However, due to its central importance in many applications, several approximation algorithms have been proposed in the literature. In this paper, we present an analysis of three such algorithms based on convex relaxations: (i) LP-S: the linear programming (LP) relaxation proposed by Schlesinger (1976) for a special case and independently in Chekuri et al. (2001), Koster et al. (1998), and Wainwright et al. (2005) for the general case; (ii) QP-RL: the quadratic programming (QP) relaxation of Ravikumar and Lafferty (2006); and (iii) SOCP-MS: the second order cone programming (SOCP) relaxation first proposed by Muramatsu and Suzuki (2003) for two label problems and later extended by Kumar et al. (2006) for a general label set. We show that the SOCP-MS and the QP-RL relaxations are equivalent. Furthermore, we prove that despite the flexibility in the form of the constraints/objective function offered by QP and SOCP, the LP-S relaxation strictly dominates (i.e., provides a better approximation than) QP-RL and SOCP-MS. We generalize these results by defining a large class of SOCP (and equivalent QP) relaxations which is dominated by the LP-S relaxation. Based on these results we propose some novel SOCP relaxations which define constraints using random variables that form cycles or cliques in the graphical model representation of the random field. Using some examples we show that the new SOCP relaxations strictly dominate the previous approaches. Keywords: probabilistic models, MAP estimation, discrete MRF, convex relaxations, linear programming, second-order cone programming, quadratic programming, dominating relaxations
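For reference, the LP-S relaxation discussed above is usually written over the local polytope; a standard formulation (up to sign and normalization conventions, which vary across the cited papers) is:

```latex
\begin{aligned}
\max_{\mu \ge 0}\quad
  & \sum_{i}\sum_{x_i} \theta_i(x_i)\,\mu_i(x_i)
    \;+\; \sum_{(i,j)\in E}\sum_{x_i,x_j} \theta_{ij}(x_i,x_j)\,\mu_{ij}(x_i,x_j) \\
\text{s.t.}\quad
  & \sum_{x_i} \mu_i(x_i) = 1 \quad \forall i, \\
  & \sum_{x_j} \mu_{ij}(x_i,x_j) = \mu_i(x_i), \qquad
    \sum_{x_i} \mu_{ij}(x_i,x_j) = \mu_j(x_j) \quad \forall (i,j)\in E.
\end{aligned}
```

Here θ_i and θ_ij denote the unary and pairwise potentials; QP-RL and SOCP-MS instead work with a quadratic objective and with second-order-cone constraints, respectively.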
8 jmlr-2009-An Anticorrelation Kernel for Subsystem Training in Multiple Classifier Systems
Author: Luciana Ferrer, Kemal Sönmez, Elizabeth Shriberg
Abstract: We present a method for training support vector machine (SVM)-based classification systems for combination with other classification systems designed for the same task. Ideally, a new system should be designed such that, when combined with existing systems, the resulting performance is optimized. We present a simple model for this problem and use the understanding gained from this analysis to propose a method to achieve better combination performance when training SVM systems. We include a regularization term in the SVM objective function that aims to reduce the average class-conditional covariance between the resulting scores and the scores produced by the existing systems, introducing a trade-off between such covariance and the system’s individual performance. That is, the new system “takes one for the team”, falling somewhat short of its best possible performance in order to increase the diversity of the ensemble. We report results on the NIST 2005 and 2006 speaker recognition evaluations (SREs) for a variety of subsystems. We show a gain of 19% on the equal error rate (EER) of a combination of four systems when applying the proposed method with respect to the performance obtained when the four systems are trained independently of each other. Keywords: system combination, ensemble diversity, multiple classifier systems, support vector machines, speaker recognition, kernel methods
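Schematically, the modified training problem described in this abstract can be thought of as an SVM objective augmented with a covariance penalty. The exact form of the penalty used in the paper is not reproduced here, so the following should be read only as a sketch, with g_j denoting the scores of the j-th existing system and γ the trade-off parameter:

```latex
\min_{w,\,b,\,\xi \ge 0}\;
  \frac{1}{2}\|w\|^2 \;+\; C\sum_{i}\xi_i
  \;+\; \gamma \sum_{j} \overline{\mathrm{cov}}\!\left(\langle w,\phi(x)\rangle,\; g_j(x)\right)
\qquad \text{s.t.}\;\; y_i\big(\langle w,\phi(x_i)\rangle + b\big) \ge 1-\xi_i ,
```

where \overline{\mathrm{cov}} stands for the class-conditional covariance averaged over classes. Larger γ pushes the new system's scores to decorrelate from the existing systems, trading individual accuracy for ensemble diversity.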
9 jmlr-2009-Analysis of Perceptron-Based Active Learning
Author: Sanjoy Dasgupta, Adam Tauman Kalai, Claire Monteleoni
Abstract: We start by showing that in an active learning setting, the Perceptron algorithm needs Ω(1/ε²) labels to learn linear separators within generalization error ε. We then present a simple active learning algorithm for this problem, which combines a modification of the Perceptron update with an adaptive filtering rule for deciding which points to query. For data distributed uniformly over the unit sphere, we show that our algorithm reaches generalization error ε after asking for just Õ(d log(1/ε)) labels. This exponential improvement over the usual sample complexity of supervised learning had previously been demonstrated only for the computationally more complex query-by-committee algorithm. Keywords: active learning, perceptron, label complexity bounds, online learning
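A rough sketch of the algorithmic idea (a margin-based query filter combined with the norm-preserving "modified" Perceptron update) is given below. It uses a fixed query threshold rather than the paper's adaptive filtering rule, and assumes unit-norm inputs; stream, oracle, threshold and label_budget are illustrative names.

```python
import numpy as np

def active_modified_perceptron(stream, w0, threshold=0.2, label_budget=100):
    """Simplified sketch of margin-based selective sampling with the
    'modified' Perceptron update (a reflection that preserves ||w|| = 1).

    stream : iterable of (x, oracle) pairs where x is a unit vector and
             oracle() returns the true label in {-1, +1} when queried
    """
    w = w0 / np.linalg.norm(w0)
    labels_used = 0
    for x, oracle in stream:
        if labels_used >= label_budget:
            break
        if abs(w @ x) > threshold:        # confident region: do not spend a label
            continue
        y = oracle()                      # query the label
        labels_used += 1
        if y * (w @ x) <= 0:              # mistake: reflect w across the hyperplane of x
            w = w - 2.0 * (w @ x) * x
    return w

# toy usage on a synthetic stream separated by w_true
rng = np.random.default_rng(0)
w_true = np.array([1.0, 0.0, 0.0])
def make_stream(n):
    for _ in range(n):
        x = rng.normal(size=3)
        x /= np.linalg.norm(x)
        yield x, (lambda x=x: 1 if w_true @ x > 0 else -1)

w_hat = active_modified_perceptron(make_stream(5000), np.array([0.0, 1.0, 0.0]))
print(w_hat @ w_true)   # cosine similarity with the target direction
```

The reflection update keeps ||w|| = 1 whenever x is unit norm, which is what makes it a "modification" of the usual additive Perceptron step.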
Author: Eitan Greenshtein, Junyong Park
Abstract: We consider the problem of classification using a high-dimensional feature space. In a paper by Bickel and Levina (2004), it is recommended to use naive-Bayes classifiers, that is, to treat the features as if they are statistically independent. Consider now a sparse setup, where only a few of the features are informative for classification. Fan and Fan (2008) suggested a variable selection and classification method, called FAIR. The FAIR method improves the design of naive-Bayes classifiers in sparse setups. The improvement comes from reducing the noise in estimating the features’ means, since only the means of a few selected variables need to be estimated. We also consider the design of naive Bayes classifiers. We show that a good alternative to variable selection is estimation of the means through a certain non-parametric empirical Bayes procedure. In sparse setups the empirical Bayes implicitly performs an efficient variable selection. It also adapts very well to non-sparse setups, and has the advantage of making use of the information from many “weakly informative” variables, which variable-selection-type classification procedures give up on using. We compare our method with FAIR and other classification methods in simulations for sparse and non-sparse setups, and in real data examples involving classification of normal versus malignant tissues based on microarray data. Keywords: non-parametric empirical Bayes, high dimension, classification
11 jmlr-2009-Bayesian Network Structure Learning by Recursive Autonomy Identification
Author: Raanan Yehezkel, Boaz Lerner
Abstract: We propose the recursive autonomy identification (RAI) algorithm for constraint-based (CB) Bayesian network structure learning. The RAI algorithm learns the structure by sequential application of conditional independence (CI) tests, edge direction and structure decomposition into autonomous sub-structures. The sequence of operations is performed recursively for each autonomous substructure while simultaneously increasing the order of the CI test. While other CB algorithms d-separate structures and then direct the resulting undirected graph, the RAI algorithm combines the two processes from the outset and along the procedure. By this means and due to structure decomposition, learning a structure using RAI requires a smaller number of high-order CI tests. This reduces the complexity and run-time of the algorithm and increases the accuracy by diminishing the curse of dimensionality. When the RAI algorithm learned structures from databases representing synthetic problems, known networks and natural problems, it demonstrated superiority with respect to computational complexity, run-time, structural correctness and classification accuracy over the PC, Three Phase Dependency Analysis, Optimal Reinsertion, greedy search, Greedy Equivalence Search, Sparse Candidate, and Max-Min Hill-Climbing algorithms. Keywords: Bayesian networks, constraint-based structure learning
12 jmlr-2009-Bi-Level Path Following for Cross Validated Solution of Kernel Quantile Regression
Author: Saharon Rosset
Abstract: We show how to follow the path of cross validated solutions to families of regularized optimization problems, defined by a combination of a parameterized loss function and a regularization term. A primary example is kernel quantile regression, where the parameter of the loss function is the quantile being estimated. Even though the bi-level optimization problem we encounter for every quantile is non-convex, the manner in which the optimal cross-validated solution evolves with the parameter of the loss function allows tracking of this solution. We prove this property, construct the resulting algorithm, and demonstrate it on real and artificial data. This algorithm allows us to efficiently solve the whole family of bi-level problems. We show how it can be extended to cover other modeling problems, like support vector regression, and alternative in-sample model selection approaches.
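For concreteness, the parameterized loss in the kernel quantile regression example is the pinball loss; a minimal definition follows (the bi-level path-following machinery of the paper itself is not shown).

```python
import numpy as np

def pinball_loss(residual, tau):
    """Quantile ('pinball') loss: the parameterized loss whose parameter tau
    is the quantile being estimated; residual = y - f(x)."""
    residual = np.asarray(residual, dtype=float)
    return np.where(residual >= 0, tau * residual, (tau - 1.0) * residual)

# estimating the 0.9 quantile penalizes under-prediction 9x more than over-prediction
print(pinball_loss([1.0, -1.0], tau=0.9))   # -> [0.9, 0.1]
```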
13 jmlr-2009-Bounded Kernel-Based Online Learning
Author: Francesco Orabona, Joseph Keshet, Barbara Caputo
Abstract: A common problem of kernel-based online algorithms, such as the kernel-based Perceptron algorithm, is the amount of memory required to store the online hypothesis, which may increase without bound as the algorithm progresses. Furthermore, the computational load of such algorithms grows linearly with the amount of memory used to store the hypothesis. To attack these problems, most previous work has focused on discarding some of the instances, in order to keep the memory bounded. In this paper we present a new algorithm, in which the instances are not discarded, but are instead projected onto the space spanned by the previous online hypothesis. We call this algorithm Projectron. While the memory size of the Projectron solution cannot be predicted before training, we prove that its solution is guaranteed to be bounded. We derive a relative mistake bound for the proposed algorithm, and deduce from it a slightly different algorithm which outperforms the Perceptron. We call this second algorithm Projectron++. We show that this algorithm can be extended to handle the multiclass and the structured output settings, resulting, as far as we know, in the first online bounded algorithm that can learn complex classification tasks. The method of bounding the hypothesis representation can be applied to any conservative online algorithm and to other online algorithms, as it is demonstrated for ALMA₂. Experimental results on various data sets show the empirical advantage of our technique compared to various bounded online algorithms, both in terms of memory and accuracy. Keywords: online learning, kernel methods, support vector machines, bounded support set
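The following is a compact sketch of the projection step that keeps the hypothesis bounded: when a mistaken example can be approximated well enough in the span of the current support set (projection error at most a tolerance eta), its contribution is redistributed over the existing coefficients instead of growing the support set. The class layout, the rbf kernel and eta are illustrative; the margin-based Projectron++ updates and the mistake-bound bookkeeping from the paper are omitted.

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    return np.exp(-gamma * np.sum((a - b) ** 2))

class ProjectronSketch:
    """Bounded kernel Perceptron: instead of adding every mistaken example to
    the support set, project it onto the span of the current support set when
    the projection error is small (eta controls the sparsity/accuracy trade-off)."""

    def __init__(self, kernel=rbf, eta=0.1):
        self.kernel, self.eta = kernel, eta
        self.S, self.alpha = [], []           # support vectors and their coefficients
        self.Kinv = np.zeros((0, 0))          # inverse kernel matrix of the support set

    def score(self, x):
        return sum(a * self.kernel(s, x) for s, a in zip(self.S, self.alpha))

    def update(self, x, y):
        if y * self.score(x) > 0:             # no mistake: hypothesis unchanged
            return
        if not self.S:                        # first support vector
            self.S, self.alpha = [x], [float(y)]
            self.Kinv = np.array([[1.0 / self.kernel(x, x)]])
            return
        k = np.array([self.kernel(s, x) for s in self.S])
        d = self.Kinv @ k                     # coordinates of the projection onto span(S)
        delta2 = self.kernel(x, x) - k @ d    # squared projection error
        if delta2 <= self.eta ** 2:           # project: spread y*phi(x) over existing terms
            self.alpha = [a + y * di for a, di in zip(self.alpha, d)]
        else:                                 # grow the support set and its inverse matrix
            self.S.append(x)
            self.alpha.append(float(y))
            n = len(self.S)
            Kinv = np.zeros((n, n))
            Kinv[:-1, :-1] = self.Kinv + np.outer(d, d) / delta2
            Kinv[:-1, -1] = -d / delta2
            Kinv[-1, :-1] = -d / delta2
            Kinv[-1, -1] = 1.0 / delta2
            self.Kinv = Kinv
```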
Author: Roberto Esposito, Daniele P. Radicioni
Abstract: The growth of information available to learning systems and the increasing complexity of learning tasks determine the need for devising algorithms that scale well with respect to all learning parameters. In the context of supervised sequential learning, the Viterbi algorithm plays a fundamental role, by allowing the evaluation of the best (most probable) sequence of labels with a time complexity linear in the number of time events, and quadratic in the number of labels. In this paper we propose CarpeDiem, a novel algorithm allowing the evaluation of the best possible sequence of labels with a sub-quadratic time complexity. We provide theoretical grounding together with solid empirical results supporting two chief facts. CarpeDiem always finds the optimal solution requiring, in most cases, only a small fraction of the time taken by the Viterbi algorithm; meanwhile, CarpeDiem is never asymptotically worse than the Viterbi algorithm, thus confirming it as a sound replacement. Keywords: Viterbi algorithm, sequence labeling, conditional models, classifiers optimization, exact inference
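For reference, the baseline that CarpeDiem is compared against is the standard Viterbi recursion, which scores every pair of adjacent labels and therefore costs O(T·K²) for T time events and K labels; a minimal implementation is sketched below (CarpeDiem's own sub-quadratic pruning strategy is not shown).

```python
import numpy as np

def viterbi(log_node, log_edge):
    """Standard Viterbi decoding: best label sequence in O(T * K^2) time.

    log_node : (T, K) per-position label scores (log potentials)
    log_edge : (K, K) transition scores between consecutive labels
    """
    T, K = log_node.shape
    score = log_node[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_edge          # (previous label, next label)
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(K)] + log_node[t]
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):                 # backtrack the best path
        path.append(int(back[t][path[-1]]))
    return path[::-1]

print(viterbi(np.log([[0.7, 0.3], [0.4, 0.6], [0.8, 0.2]]),
              np.log([[0.9, 0.1], [0.1, 0.9]])))
```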
15 jmlr-2009-Cautious Collective Classification
Author: Luke K. McDowell, Kalyan Moy Gupta, David W. Aha
Abstract: Many collective classification (CC) algorithms have been shown to increase accuracy when instances are interrelated. However, CC algorithms must be carefully applied because their use of estimated labels can in some cases decrease accuracy. In this article, we show that managing this label uncertainty through cautious algorithmic behavior is essential to achieving maximal, robust performance. First, we describe cautious inference and explain how four well-known families of CC algorithms can be parameterized to use varying degrees of such caution. Second, we introduce cautious learning and show how it can be used to improve the performance of almost any CC algorithm, with or without cautious inference. We then evaluate cautious inference and learning for the four collective inference families, with three local classifiers and a range of both synthetic and real-world data. We find that cautious learning and cautious inference typically outperform less cautious approaches. In addition, we identify the data characteristics that predict more substantial performance differences. Our results reveal that the degree of caution used usually has a larger impact on performance than the choice of the underlying inference algorithm. Together, these results identify the most appropriate CC algorithms to use for particular task characteristics and explain multiple conflicting findings from prior CC research. Keywords: collective inference, statistical relational learning, approximate probabilistic inference, networked data, cautious inference
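To illustrate what "cautious inference" can look like in practice, the sketch below runs an ICA-style collective inference loop that commits only a growing fraction of the most confident label predictions at each iteration, so that uncertain estimated labels are not fed back into the relational features too early. It assumes a scikit-learn-like classifier already trained on node attributes concatenated with neighbour-label counts; this is a simplified stand-in, not any of the four algorithm families or the cautious learning procedure evaluated in the paper.

```python
import numpy as np

def cautious_ica(features, adj, clf, n_iters=10):
    """Sketch of cautious collective inference (ICA-style).

    features : (n, d) node attributes
    adj      : list of neighbour index lists, one per node
    clf      : fitted classifier with predict_proba over [attributes | label counts]
    """
    n = len(features)
    n_classes = clf.classes_.shape[0]
    committed = np.full(n, -1)                      # -1 = label not yet committed
    for it in range(1, n_iters + 1):
        counts = np.zeros((n, n_classes))
        for i in range(n):
            for j in adj[i]:
                if committed[j] >= 0:
                    counts[i, committed[j]] += 1    # only trust committed neighbour labels
        proba = clf.predict_proba(np.hstack([features, counts]))
        conf, pred = proba.max(axis=1), proba.argmax(axis=1)
        k = int(np.ceil(n * it / n_iters))          # commit a growing fraction of nodes
        top = np.argsort(-conf)[:k]
        committed[:] = -1
        committed[top] = pred[top]
    return pred
```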
16 jmlr-2009-Classification with Gaussians and Convex Loss
Author: Dao-Hong Xiang, Ding-Xuan Zhou
Abstract: This paper considers binary classification algorithms generated from Tikhonov regularization schemes associated with general convex loss functions and varying Gaussian kernels. Our main goal is to provide fast convergence rates for the excess misclassification error. Allowing varying Gaussian kernels in the algorithms improves learning rates measured by regularization error and sample error. Special structures of Gaussian kernels enable us to construct, by a nice approximation scheme with a Fourier analysis technique, uniformly bounded regularizing functions achieving polynomial decays of the regularization error under a Sobolev smoothness condition. The sample error is estimated by using a projection operator and a tight bound for the covering numbers of reproducing kernel Hilbert spaces generated by Gaussian kernels. The convexity of the general loss function plays a very important role in our analysis. Keywords: reproducing kernel Hilbert space, binary classification, general convex loss, varying Gaussian kernels, covering number, approximation
17 jmlr-2009-Computing Maximum Likelihood Estimates in Recursive Linear Models with Correlated Errors
Author: Mathias Drton, Michael Eichler, Thomas S. Richardson
Abstract: In recursive linear models, the multivariate normal joint distribution of all variables exhibits a dependence structure induced by a recursive (or acyclic) system of linear structural equations. These linear models have a long tradition and appear in seemingly unrelated regressions, structural equation modelling, and approaches to causal inference. They are also related to Gaussian graphical models via a classical representation known as a path diagram. Despite the models’ long history, a number of problems remain open. In this paper, we address the problem of computing maximum likelihood estimates in the subclass of ‘bow-free’ recursive linear models. The term ‘bow-free’ refers to the condition that the errors for variables i and j be uncorrelated if variable i occurs in the structural equation for variable j. We introduce a new algorithm, termed Residual Iterative Conditional Fitting (RICF), that can be implemented using only least squares computations. In contrast to existing algorithms, RICF has clear convergence properties and yields exact maximum likelihood estimates after the first iteration whenever the MLE is available in closed form. Keywords: linear regression, maximum likelihood estimation, path diagram, structural equation model, recursive semi-Markov model, residual iterative conditional fitting
18 jmlr-2009-Consistency and Localizability
Author: Alon Zakai, Ya'acov Ritov
Abstract: We show that all consistent learning methods—that is, methods that asymptotically achieve the lowest possible expected loss for any distribution on (X,Y)—are necessarily localizable, by which we mean that they do not significantly change their response at a particular point when we show them only the part of the training set that is close to that point. This is true in particular for methods that appear to be defined in a non-local manner, such as support vector machines in classification and least-squares estimators in regression. Aside from showing that consistency implies a specific form of localizability, we also show that consistency is logically equivalent to the combination of two properties: (1) a form of localizability, and (2) that the method’s global mean (over the entire X distribution) correctly estimates the true mean. Consistency can therefore be seen as comprised of two aspects, one local and one global. Keywords: consistency, local learning, regression, classification
Author: Junning Li, Z. Jane Wang
Abstract: In real world applications, graphical statistical models are not only a tool for operations such as classification or prediction, but usually the network structures of the models themselves are also of great interest (e.g., in modeling brain connectivity). The false discovery rate (FDR), the expected ratio of falsely claimed connections to all those claimed, is often a reasonable error-rate criterion in these applications. However, current learning algorithms for graphical models have not been adequately adapted to the concerns of the FDR. The traditional practice of controlling the type I error rate and the type II error rate under a conventional level does not necessarily keep the FDR low, especially in the case of sparse networks. In this paper, we propose embedding an FDR-control procedure into the PC algorithm to curb the FDR of the skeleton of the learned graph. We prove that the proposed method can control the FDR under a user-specified level at the limit of large sample sizes. In the cases of moderate sample size (about several hundred), empirical experiments show that the method is still able to control the FDR under the user-specified level, and a heuristic modification of the method is able to control the FDR more accurately around the user-specified level. The proposed method is applicable to any models for which statistical tests of conditional independence are available, such as discrete models and Gaussian models. Keywords: Bayesian networks, false discovery rate, PC algorithm, directed acyclic graph, skeleton
20 jmlr-2009-DL-Learner: Learning Concepts in Description Logics
Author: Jens Lehmann
Abstract: In this paper, we introduce DL-Learner, a framework for learning in description logics and OWL. OWL is the official W3C standard ontology language for the Semantic Web. Concepts in this language can be learned for constructing and maintaining OWL ontologies or for solving problems similar to those in Inductive Logic Programming. DL-Learner includes several learning algorithms, support for different OWL formats, reasoner interfaces, and learning problems. It is a cross-platform framework implemented in Java. The framework allows easy programmatic access and provides a command line interface, a graphical interface as well as a WSDL-based web service. Keywords: concept learning, description logics, OWL, classification, open-source
21 jmlr-2009-Data-driven Calibration of Penalties for Least-Squares Regression
23 jmlr-2009-Discriminative Learning Under Covariate Shift
24 jmlr-2009-Distance Metric Learning for Large Margin Nearest Neighbor Classification
25 jmlr-2009-Distributed Algorithms for Topic Models
26 jmlr-2009-Dlib-ml: A Machine Learning Toolkit (Machine Learning Open Source Software Paper)
27 jmlr-2009-Efficient Online and Batch Learning Using Forward Backward Splitting
29 jmlr-2009-Estimating Labels from Label Proportions
30 jmlr-2009-Estimation of Sparse Binary Pairwise Markov Networks using Pseudo-likelihoods
31 jmlr-2009-Evolutionary Model Type Selection for Global Surrogate Modeling
33 jmlr-2009-Exploring Strategies for Training Deep Neural Networks
36 jmlr-2009-Fourier Theoretic Probabilistic Inference over Permutations
37 jmlr-2009-Generalization Bounds for Ranking Algorithms via Algorithmic Stability
38 jmlr-2009-Hash Kernels for Structured Data
39 jmlr-2009-Hybrid MPI OpenMP Parallel Linear Support Vector Machine Training
40 jmlr-2009-Identification of Recurrent Neural Networks by Bayesian Interrogation Techniques
42 jmlr-2009-Incorporating Functional Knowledge in Neural Networks
43 jmlr-2009-Java-ML: A Machine Learning Library (Machine Learning Open Source Software Paper)
44 jmlr-2009-Learning Acyclic Probabilistic Circuits Using Test Paths
45 jmlr-2009-Learning Approximate Sequential Patterns for Classification
46 jmlr-2009-Learning Halfspaces with Malicious Noise
47 jmlr-2009-Learning Linear Ranking Functions for Beam Search with Application to Planning
48 jmlr-2009-Learning Nondeterministic Classifiers
49 jmlr-2009-Learning Permutations with Exponential Weights
50 jmlr-2009-Learning When Concepts Abound
51 jmlr-2009-Low-Rank Kernel Learning with Bregman Matrix Divergences
52 jmlr-2009-Margin-based Ranking and an Equivalence between AdaBoost and RankBoost
53 jmlr-2009-Marginal Likelihood Integrals for Mixtures of Independence Models
55 jmlr-2009-Maximum Entropy Discrimination Markov Networks
57 jmlr-2009-Multi-task Reinforcement Learning in Partially Observable Stochastic Environments
60 jmlr-2009-Nieme: Large-Scale Energy-Based Models (Machine Learning Open Source Software Paper)
61 jmlr-2009-Nonextensive Information Theoretic Kernels on Measures
62 jmlr-2009-Nonlinear Models Using Dirichlet Process Mixtures
63 jmlr-2009-On Efficient Large Margin Semisupervised Learning: Method and Theory
64 jmlr-2009-On The Power of Membership Queries in Agnostic Learning
66 jmlr-2009-On the Consistency of Feature Selection using Greedy Least Squares Regression
67 jmlr-2009-Online Learning with Sample Path Constraints
68 jmlr-2009-Online Learning with Samples Drawn from Non-identical Distributions
69 jmlr-2009-Optimized Cutting Plane Algorithm for Large-Scale Risk Minimization
70 jmlr-2009-Particle Swarm Model Selection (Special Topic on Model Selection)
71 jmlr-2009-Perturbation Corrections in Approximate Inference: Mixture Modelling Applications
72 jmlr-2009-Polynomial-Delay Enumeration of Monotonic Graph Classes
73 jmlr-2009-Prediction With Expert Advice For The Brier Game
74 jmlr-2009-Properties of Monotonic Effects on Directed Acyclic Graphs
75 jmlr-2009-Provably Efficient Learning with Typed Parametric Models
78 jmlr-2009-Refinement of Reproducing Kernels
79 jmlr-2009-Reinforcement Learning in Finite MDPs: PAC Analysis
80 jmlr-2009-Reproducing Kernel Banach Spaces for Machine Learning
82 jmlr-2009-Robustness and Regularization of Support Vector Machines
83 jmlr-2009-SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent
86 jmlr-2009-Similarity-based Classification: Concepts and Algorithms
87 jmlr-2009-Sparse Online Learning via Truncated Gradient
88 jmlr-2009-Stable and Efficient Gaussian Process Calculations
89 jmlr-2009-Strong Limit Theorems for the Bayesian Scoring Criterion in Bayesian Networks
91 jmlr-2009-Subgroup Analysis via Recursive Partitioning
93 jmlr-2009-The Hidden Life of Latent Variables: Bayesian Learning with Mixed Graph Models
94 jmlr-2009-The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs
96 jmlr-2009-Transfer Learning for Reinforcement Learning Domains: A Survey
97 jmlr-2009-Ultrahigh Dimensional Feature Selection: Beyond The Linear Model
99 jmlr-2009-Using Local Dependencies within Batches to Improve Large Margin Classifiers
100 jmlr-2009-When Is There a Representer Theorem? Vector Versus Matrix Regularizers