jmlr jmlr2010 jmlr2010-28 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Christian R. Shelton, Yu Fan, William Lam, Joon Lee, Jing Xu
Abstract: We present a continuous time Bayesian network reasoning and learning engine (CTBN-RLE). A continuous time Bayesian network (CTBN) provides a compact (factored) description of a continuous-time Markov process. This software provides libraries and programs for most of the algorithms developed for CTBNs. For learning, CTBN-RLE implements structure and parameter learning for both complete and partial data. For inference, it implements exact inference and Gibbs and importance sampling approximate inference for any type of evidence pattern. Additionally, the library supplies visualization methods for graphically displaying CTBNs or trajectories of evidence. Keywords: continuous time Bayesian networks, C++, open source software
Reference: text
sentIndex sentText sentNum sentScore
1 Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA. Editor: Soeren Sonnenburg. Abstract: We present a continuous time Bayesian network reasoning and learning engine (CTBN-RLE). [sent-12, score-0.28]
2 A continuous time Bayesian network (CTBN) provides a compact (factored) description of a continuous-time Markov process. [sent-13, score-0.173]
3 This software provides libraries and programs for most of the algorithms developed for CTBNs. [sent-14, score-0.158]
4 For learning, CTBN-RLE implements structure and parameter learning for both complete and partial data. [sent-15, score-0.054]
5 For inference, it implements exact inference and Gibbs and importance sampling approximate inference for any type of evidence pattern. [sent-16, score-0.244]
6 Additionally, the library supplies visualization methods for graphically displaying CTBNs or trajectories of evidence. [sent-17, score-0.263]
7 Keywords: continuous time Bayesian networks, C++, open source software [sent-18, score-0.154]
8 1. Introduction. Continuous time Bayesian networks (CTBNs) represent a continuous-time finite-state Markov process compactly factored according to a graph (Nodelman et al., 2002). [sent-19, score-0.162]
9 The dynamics of the process are also factorized according to a directed graph, but this graph may contain cycles. [sent-22, score-0.293]
10 The edges in this second graph represent causal influence between variables of the system. [sent-23, score-0.07]
11 CTBNs compactly represent the dynamics differently than models in queueing theory (Bolch et al., 1998), Petri nets (Petri, 1962), or matrix diagrams (Ciardo and Miner, 1999). [sent-24, score-0.263]
13 The representation and algorithms developed so far for CTBNs emphasize reasoning about the transient properties of the system rather than its steady-state behavior. [sent-26, score-0.046]
14 Unfortunately, there were previously no commonly available software packages implementing CTBN algorithms. [sent-27, score-0.135]
15 This software package aims to reduce this barrier to entry by supplying our implementations of these methods in a complete object-oriented design. [sent-29, score-0.142]
16 It is implemented in C++ with demonstration programs for common functionality and a documented interface for users to develop their own programs. [sent-31, score-0.082]
17 Continuous Time Bayesian Networks. The dynamics of a continuous-time n-state Markov process are often described by an n-by-n intensity (or rate) matrix Q with elements q_ij. [sent-36, score-0.214]
18 The diagonal elements are non-positive and correspond to the parameters of the exponential distributions describing the duration of time the process stays in each state. [sent-37, score-0.091]
19 Therefore, the expected duration in state i is −1/q_ii. [sent-38, score-0.091]
20 The probability of transitioning from state i to state j is proportional to q_ij. [sent-40, score-0.042]
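To make the role of the intensity matrix concrete, here is a minimal, self-contained C++ sketch (illustrative only; it is not CTBN-RLE code and all names are invented) that samples a trajectory of a homogeneous Markov process from Q, drawing exponentially distributed dwell times with rate −q_ii and choosing the next state in proportion to the off-diagonal entries q_ij:

```cpp
#include <iostream>
#include <random>
#include <utility>
#include <vector>

// Illustrative sketch: sample a trajectory of an n-state Markov process with
// intensity matrix Q up to time horizon T, starting in state s0.
// Dwell time in state i ~ Exponential(rate = -Q[i][i]); the next state j != i
// is chosen with probability Q[i][j] / (-Q[i][i]).
std::vector<std::pair<double, int>> sampleTrajectory(
    const std::vector<std::vector<double>>& Q,
    int s0, double T, std::mt19937& rng) {
  std::vector<std::pair<double, int>> traj;  // (entry time, state)
  double t = 0.0;
  int s = s0;
  traj.push_back({t, s});
  while (true) {
    double rate = -Q[s][s];
    if (rate <= 0.0) break;                  // absorbing state: nothing happens
    std::exponential_distribution<double> dwell(rate);
    t += dwell(rng);
    if (t >= T) break;                       // past the horizon
    std::vector<double> w(Q[s].size(), 0.0); // off-diagonal intensities
    for (std::size_t j = 0; j < Q[s].size(); ++j)
      if (static_cast<int>(j) != s) w[j] = Q[s][j];
    std::discrete_distribution<int> next(w.begin(), w.end());
    s = next(rng);
    traj.push_back({t, s});
  }
  return traj;
}

int main() {
  // Two states: expected dwell of 1/2 in state 0 and 1 in state 1.
  std::vector<std::vector<double>> Q = {{-2.0, 2.0}, {1.0, -1.0}};
  std::mt19937 rng(0);
  for (const auto& [time, state] : sampleTrajectory(Q, 0, 5.0, rng))
    std::cout << "t=" << time << " state=" << state << "\n";
}
```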
21 The conditional intensity matrix Q_X|U is defined as a set of intensity matrices Q_X|u, one for each assignment u to the parent variables U. [sent-42, score-0.121]
22 At any instant, the evolution of variable X is governed by the intensity matrix Q_X|u if u is the current assignment to the parents of X. [sent-43, score-0.121]
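One way to picture this (purely illustrative; CTBN-RLE's actual classes are documented with the package and may look quite different) is a map from parent assignments to intensity matrices:

```cpp
#include <map>
#include <vector>

// Illustrative sketch of a conditional intensity matrix Q_X|U: each assignment
// u of the parent variables (encoded as a vector of their discrete values)
// selects the intensity matrix that governs X while the parents hold that value.
using IntensityMatrix = std::vector<std::vector<double>>;
using ParentAssignment = std::vector<int>;

struct ConditionalIntensityMatrix {
  std::map<ParentAssignment, IntensityMatrix> q;  // u -> Q_X|u

  const IntensityMatrix& matrixFor(const ParentAssignment& u) const {
    return q.at(u);  // throws if no matrix was specified for u
  }
};
```

A CTBN then associates one such object with each variable, together with a Bayesian network over the initial states.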
23 The three common types of queries are all implemented, and additional query types can be added (through subclassing) without knowledge of the details of the inference algorithms. [sent-48, score-0.109]
24 All are conditioned on a (possibly incomplete) trajectory of events (transitions of the states of the variables of the process). [sent-50, score-0.101]
25 Exact inference (which takes time exponential in the number of variables) is implemented. [sent-52, score-0.066]
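To see where the exponential cost comes from, recall that the marginal distribution of a homogeneous Markov process at time t is p(0) exp(Qt); for a CTBN, exact inference works with the joint (amalgamated) intensity matrix over all variables, whose dimension is the product of the individual state-space sizes. The sketch below (illustrative only, not CTBN-RLE code) evaluates p(t) by uniformization:

```cpp
#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>

// Illustrative sketch: p(t) = p(0) * exp(Q t) evaluated by uniformization.
// With lambda >= max_i(-q_ii) and P = I + Q/lambda (a stochastic matrix),
//   exp(Q t) = sum_k Poisson(k; lambda*t) * P^k.
std::vector<double> marginalAt(const std::vector<std::vector<double>>& Q,
                               std::vector<double> p0, double t,
                               int maxTerms = 200) {
  const int n = static_cast<int>(p0.size());
  double lambda = 0.0;
  for (int i = 0; i < n; ++i) lambda = std::max(lambda, -Q[i][i]);
  if (lambda == 0.0) return p0;                       // no transitions at all
  std::vector<std::vector<double>> P(n, std::vector<double>(n));
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j)
      P[i][j] = (i == j ? 1.0 : 0.0) + Q[i][j] / lambda;
  std::vector<double> result(n, 0.0), v = p0;         // v holds p0 * P^k
  double w = std::exp(-lambda * t);                   // Poisson weight for k = 0
  for (int k = 0; k < maxTerms; ++k) {
    for (int i = 0; i < n; ++i) result[i] += w * v[i];
    std::vector<double> next(n, 0.0);                 // v <- v * P
    for (int i = 0; i < n; ++i)
      for (int j = 0; j < n; ++j) next[j] += v[i] * P[i][j];
    v = next;
    w *= lambda * t / (k + 1);                        // next Poisson weight
  }
  return result;
}

int main() {
  std::vector<std::vector<double>> Q = {{-2.0, 2.0}, {1.0, -1.0}};
  auto p = marginalAt(Q, {1.0, 0.0}, 3.0);
  std::cout << p[0] << " " << p[1] << "\n";           // approaches (1/3, 2/3)
}
```

With n binary variables, the joint Q is 2^n-by-2^n, which is why the sampling methods described next are needed on larger networks.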
26 Additionally, two approximate inference methods based on sampling are implemented: Gibbs sampling (El-Hay et al., 2008) and importance sampling (Fan et al., 2010). [sent-53, score-0.182]
27 2 Learning. All learning methods estimate both the dynamics graph and the Bayesian network for the initial distribution. [sent-56, score-0.054]
28 For the latter, this is just standard Bayesian network learning, which exists in other packages; however, we supply our own implementation for simplicity and to avoid relying on other software. [sent-57, score-0.147]
29 Figure 1: (Left) Automatic layout of the drug effect network without parameters. [sent-92, score-0.284]
30 Maximum likelihood parameter learning is implemented both for complete data (Nodelman et al., 2003) and for incomplete data via expectation maximization (Nodelman et al., 2005). [sent-94, score-0.043]
31 Structure learning is also implemented for both complete and incomplete data. [sent-97, score-0.094]
32 Two structure searches are available: a brute-force search that tries all parent sets up to a certain size (not possible for the initial distribution because of its acyclicity constraint) and a graph-edit search that makes local changes to the graph. [sent-99, score-0.224]
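As a rough illustration of the first option (the interface and the scoring function below are invented for this sketch and do not reflect CTBN-RLE's API), a brute-force parent-set search simply enumerates every candidate subset up to the size limit and keeps the best-scoring one per variable:

```cpp
#include <functional>
#include <vector>

// Illustrative sketch: brute-force parent-set search for a single variable.
// Enumerate every subset of the candidate parents of size <= maxParents and
// return the subset with the highest score; 'score' stands in for whatever
// decomposable criterion (e.g., a marginal likelihood) the learner uses.
std::vector<int> bestParentSet(
    int numCandidates, int maxParents,
    const std::function<double(const std::vector<int>&)>& score) {
  std::vector<int> best;
  double bestScore = score(best);          // start from the empty parent set
  std::vector<int> subset;
  std::function<void(int)> extend = [&](int next) {
    if (static_cast<int>(subset.size()) >= maxParents) return;
    for (int c = next; c < numCandidates; ++c) {
      subset.push_back(c);
      double s = score(subset);
      if (s > bestScore) { bestScore = s; best = subset; }
      extend(c + 1);                       // try supersets of this subset
      subset.pop_back();
    }
  };
  extend(0);
  return best;
}
```

Because the dynamics graph may contain cycles, the parent set of each variable can be optimized independently in this way; the acyclicity requirement on the initial distribution's Bayesian network couples the choices, which is why the brute-force option is unavailable there.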
33 3 Modularity. Any supplied (or user-defined) inference method can be used in expectation maximization and structural expectation maximization. [sent-101, score-0.121]
34 Different proposal distributions for importance sampling can be added through simple subclassing. [sent-102, score-0.058]
35 As an example, we define "toggle" variables (in which only the change of state has meaning) through a very short subclass that implements the necessary parameter tying. [sent-104, score-0.054]
36 This new process can be mixed freely with other processes to create CTBNs of mixed node types. [sent-105, score-0.078]
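To convey the flavor of such a subclass (the class names here are hypothetical and are not CTBN-RLE's actual hierarchy), a toggle variable can be expressed by tying both off-diagonal intensities to a single rate parameter:

```cpp
#include <vector>

// Hypothetical base class standing in for a per-variable dynamics object;
// the real CTBN-RLE interfaces are documented with the package.
class VariableDynamics {
 public:
  virtual ~VariableDynamics() = default;
  virtual std::vector<std::vector<double>> intensityMatrix() const = 0;
};

// A "toggle" variable: only the fact that the state flipped carries meaning,
// so both transition directions share one tied rate parameter.
class ToggleDynamics : public VariableDynamics {
 public:
  explicit ToggleDynamics(double rate) : rate_(rate) {}
  std::vector<std::vector<double>> intensityMatrix() const override {
    return {{-rate_,  rate_},
            { rate_, -rate_}};
  }
 private:
  double rate_;  // the single tied parameter
};
```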
37 The first visualization method converts a CTBN into a text file suitable to be read by the open-source package graphviz, which can then lay out the CTBN in a variety of formats. [sent-108, score-0.211]
38 Figure 1 shows the output of this automatic visualization on the drug effect network from Nodelman et al. (2002). [sent-109, score-0.318]
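For concreteness, here is a minimal sketch of a graphviz (DOT) export for a directed dependency graph; the function name, the output file, and the three-variable fragment of the drug effect network are illustrative assumptions rather than CTBN-RLE's actual output format:

```cpp
#include <fstream>
#include <string>
#include <vector>

// Illustrative sketch: write a directed dependency graph in graphviz DOT format.
// parents[i] lists the indices of the parents of variable i.
void writeDot(const std::vector<std::string>& names,
              const std::vector<std::vector<int>>& parents,
              const std::string& path) {
  std::ofstream out(path);
  out << "digraph ctbn {\n";
  for (const auto& n : names)
    out << "  \"" << n << "\";\n";
  for (std::size_t child = 0; child < parents.size(); ++child)
    for (int p : parents[child])
      out << "  \"" << names[p] << "\" -> \"" << names[child] << "\";\n";
  out << "}\n";
}

int main() {
  // A tiny fragment of the drug effect example's structure, for illustration only.
  writeDot({"Eating", "Full stomach", "Hungry"},
           {{}, {0}, {1}},      // Eating -> Full stomach -> Hungry
           "ctbn.dot");         // render with, e.g., dot -Tpdf ctbn.dot -o ctbn.pdf
}
```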
39 Additionally, either a PostScript or a text visualization of a trajectory can be automatically generated. [sent-111, score-0.284]
40 Figure 2 shows these outputs for a partially observed trajectory drawn from the same drug effect network. [sent-112, score-0.211]
41 Figure 2: Automatic visualizations of a trajectory of the drug effect network (with all variables missing observations from t = 1 to 1.5), as a PostScript file (left) and in text (right). [sent-148, score-0.31]
43 We thank Uri Nodelman for sharing his initial implementation of CTBNs and Tal El-Hay for sharing his implementation of Gibbs sampling for CTBNs. [sent-150, score-0.152]
44 Any faults in our software package are purely our own. [sent-151, score-0.142]
45 Sampling for approximate inference in continuous time Bayesian networks. [sent-166, score-0.135]
46 Expectation maximization and complex duration distributions for continuous time Bayesian networks. [sent-178, score-0.16]
wordName wordTfidf (topN-words)
[('ctbns', 0.299), ('nodelman', 0.299), ('ctbn', 0.262), ('stomach', 0.249), ('shelton', 0.224), ('eating', 0.208), ('hungry', 0.208), ('ucr', 0.208), ('christian', 0.173), ('dynamics', 0.135), ('bayesian', 0.129), ('uri', 0.129), ('qx', 0.129), ('barometer', 0.125), ('fan', 0.118), ('layout', 0.112), ('drug', 0.11), ('engine', 0.103), ('querying', 0.103), ('trajectory', 0.101), ('ciardo', 0.097), ('postscript', 0.097), ('petri', 0.097), ('duration', 0.091), ('daphne', 0.086), ('visualization', 0.086), ('software', 0.085), ('supplies', 0.083), ('bolch', 0.083), ('drowsy', 0.083), ('helton', 0.083), ('joon', 0.083), ('queueing', 0.083), ('intensity', 0.079), ('pain', 0.075), ('uptake', 0.075), ('gibbs', 0.072), ('graph', 0.07), ('continuous', 0.069), ('inference', 0.066), ('tal', 0.064), ('jing', 0.064), ('lam', 0.064), ('network', 0.062), ('automatic', 0.06), ('cs', 0.059), ('sampling', 0.058), ('nets', 0.058), ('package', 0.057), ('supplied', 0.055), ('implements', 0.054), ('trajectories', 0.052), ('william', 0.052), ('incomplete', 0.051), ('packages', 0.05), ('factorized', 0.05), ('ee', 0.048), ('additionally', 0.047), ('sharing', 0.047), ('factored', 0.047), ('reasoning', 0.046), ('markov', 0.046), ('compactly', 0.045), ('force', 0.043), ('implemented', 0.043), ('uncertainty', 0.042), ('transitions', 0.042), ('assignment', 0.042), ('converts', 0.042), ('instant', 0.042), ('bonn', 0.042), ('continuoustime', 0.042), ('cshelton', 0.042), ('displaying', 0.042), ('jingxu', 0.042), ('kishor', 0.042), ('kommunikation', 0.042), ('meer', 0.042), ('riverside', 0.042), ('scanning', 0.042), ('transitioning', 0.042), ('yfan', 0.042), ('programs', 0.039), ('mixed', 0.039), ('directed', 0.038), ('hermann', 0.037), ('brute', 0.037), ('edit', 0.037), ('etwork', 0.037), ('ontinuous', 0.037), ('visualizations', 0.037), ('concentration', 0.037), ('parent', 0.037), ('yu', 0.035), ('gunter', 0.034), ('air', 0.034), ('qii', 0.034), ('modularity', 0.034), ('libraries', 0.034), ('raz', 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 28 jmlr-2010-Continuous Time Bayesian Network Reasoning and Learning Engine
Author: Christian R. Shelton, Yu Fan, William Lam, Joon Lee, Jing Xu
Abstract: We present a continuous time Bayesian network reasoning and learning engine (CTBN-RLE). A continuous time Bayesian network (CTBN) provides a compact (factored) description of a continuoustime Markov process. This software provides libraries and programs for most of the algorithms developed for CTBNs. For learning, CTBN-RLE implements structure and parameter learning for both complete and partial data. For inference, it implements exact inference and Gibbs and importance sampling approximate inference for any type of evidence pattern. Additionally, the library supplies visualization methods for graphically displaying CTBNs or trajectories of evidence. Keywords: continuous time Bayesian networks, C++, open source software
2 0.440292 51 jmlr-2010-Importance Sampling for Continuous Time Bayesian Networks
Author: Yu Fan, Jing Xu, Christian R. Shelton
Abstract: A continuous time Bayesian network (CTBN) uses a structured representation to describe a dynamic system with a finite number of states which evolves in continuous time. Exact inference in a CTBN is often intractable as the state space of the dynamic system grows exponentially with the number of variables. In this paper, we first present an approximate inference algorithm based on importance sampling. We then extend it to continuous-time particle filtering and smoothing algorithms. These three algorithms can estimate the expectation of any function of a trajectory, conditioned on any evidence set constraining the values of subsets of the variables over subsets of the time line. We present experimental results on both synthetic networks and a network learned from a real data set on people’s life history events. We show the accuracy as well as the time efficiency of our algorithms, and compare them to other approximate algorithms: expectation propagation and Gibbs sampling. Keywords: continuous time Bayesian networks, importance sampling, approximate inference, filtering, smoothing
3 0.21169358 75 jmlr-2010-Mean Field Variational Approximation for Continuous-Time Bayesian Networks
Author: Ido Cohn, Tal El-Hay, Nir Friedman, Raz Kupferman
Abstract: Continuous-time Bayesian networks is a natural structured representation language for multicomponent stochastic processes that evolve continuously over time. Despite the compact representation provided by this language, inference in such models is intractable even in relatively simple structured networks. We introduce a mean field variational approximation in which we use a product of inhomogeneous Markov processes to approximate a joint distribution over trajectories. This variational approach leads to a globally consistent distribution, which can be efficiently queried. Additionally, it provides a lower bound on the probability of observations, thus making it attractive for learning tasks. Here we describe the theoretical foundations for the approximation, an efficient implementation that exploits the wide range of highly optimized ordinary differential equations (ODE) solvers, experimentally explore characterizations of processes for which this approximation is suitable, and show applications to a large-scale real-world inference problem. Keywords: continuous time Markov processes, continuous time Bayesian networks, variational approximations, mean field approximation
4 0.056869883 64 jmlr-2010-Learning Non-Stationary Dynamic Bayesian Networks
Author: Joshua W. Robinson, Alexander J. Hartemink
Abstract: Learning dynamic Bayesian network structures provides a principled mechanism for identifying conditional dependencies in time-series data. An important assumption of traditional DBN structure learning is that the data are generated by a stationary process, an assumption that is not true in many important settings. In this paper, we introduce a new class of graphical model called a nonstationary dynamic Bayesian network, in which the conditional dependence structure of the underlying data-generation process is permitted to change over time. Non-stationary dynamic Bayesian networks represent a new framework for studying problems in which the structure of a network is evolving over time. Some examples of evolving networks are transcriptional regulatory networks during an organism’s development, neural pathways during learning, and traffic patterns during the day. We define the non-stationary DBN model, present an MCMC sampling algorithm for learning the structure of the model from time-series data under different assumptions, and demonstrate the effectiveness of the algorithm on both simulated and biological data. Keywords: Bayesian networks, graphical models, model selection, structure learning, Monte Carlo methods
5 0.050396606 118 jmlr-2010-libDAI: A Free and Open Source C++ Library for Discrete Approximate Inference in Graphical Models
Author: Joris M. Mooij
Abstract: This paper describes the software package libDAI, a free & open source C++ library that provides implementations of various exact and approximate inference methods for graphical models with discrete-valued variables. libDAI supports directed graphical models (Bayesian networks) as well as undirected ones (Markov random fields and factor graphs). It offers various approximations of the partition sum, marginal probability distributions and maximum probability states. Parameter learning is also supported. A feature comparison with other open source software packages for approximate inference is given. libDAI is licensed under the GPL v2+ license and is available at http://www.libdai.org. Keywords: probabilistic graphical models, approximate inference, open source software, factor graphs, Markov random fields, Bayesian networks
6 0.044565391 33 jmlr-2010-Efficient Heuristics for Discriminative Structure Learning of Bayesian Network Classifiers
7 0.041577969 58 jmlr-2010-Kronecker Graphs: An Approach to Modeling Networks
8 0.040565234 39 jmlr-2010-FastInf: An Efficient Approximate Inference Library
9 0.038795646 56 jmlr-2010-Introduction to Causal Inference
11 0.03723627 78 jmlr-2010-Model Selection: Beyond the Bayesian Frequentist Divide
12 0.035945833 44 jmlr-2010-Graph Kernels
13 0.035586622 116 jmlr-2010-WEKA−Experiences with a Java Open-Source Project
14 0.033643283 110 jmlr-2010-The SHOGUN Machine Learning Toolbox
15 0.03314371 41 jmlr-2010-Gaussian Processes for Machine Learning (GPML) Toolbox
16 0.030617436 54 jmlr-2010-Information Retrieval Perspective to Nonlinear Dimensionality Reduction for Data Visualization
17 0.027775962 63 jmlr-2010-Learning Instance-Specific Predictive Models
18 0.026730366 117 jmlr-2010-Why Does Unsupervised Pre-training Help Deep Learning?
19 0.025803449 52 jmlr-2010-Incremental Sigmoid Belief Networks for Grammar Learning
20 0.024524249 4 jmlr-2010-A Generalized Path Integral Control Approach to Reinforcement Learning
topicId topicWeight
[(0, -0.155), (1, 0.131), (2, -0.215), (3, -0.535), (4, 0.041), (5, 0.203), (6, -0.12), (7, 0.032), (8, -0.035), (9, -0.26), (10, 0.001), (11, -0.142), (12, -0.024), (13, 0.045), (14, -0.06), (15, -0.084), (16, -0.078), (17, -0.047), (18, 0.021), (19, 0.069), (20, 0.006), (21, -0.033), (22, 0.135), (23, -0.001), (24, 0.015), (25, 0.06), (26, 0.02), (27, 0.054), (28, -0.022), (29, 0.019), (30, 0.017), (31, -0.048), (32, 0.012), (33, 0.004), (34, 0.039), (35, 0.099), (36, -0.058), (37, -0.007), (38, -0.063), (39, -0.03), (40, -0.035), (41, -0.03), (42, 0.034), (43, 0.007), (44, 0.075), (45, 0.098), (46, -0.06), (47, 0.02), (48, -0.044), (49, 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.96904874 28 jmlr-2010-Continuous Time Bayesian Network Reasoning and Learning Engine
Author: Christian R. Shelton, Yu Fan, William Lam, Joon Lee, Jing Xu
Abstract: We present a continuous time Bayesian network reasoning and learning engine (CTBN-RLE). A continuous time Bayesian network (CTBN) provides a compact (factored) description of a continuoustime Markov process. This software provides libraries and programs for most of the algorithms developed for CTBNs. For learning, CTBN-RLE implements structure and parameter learning for both complete and partial data. For inference, it implements exact inference and Gibbs and importance sampling approximate inference for any type of evidence pattern. Additionally, the library supplies visualization methods for graphically displaying CTBNs or trajectories of evidence. Keywords: continuous time Bayesian networks, C++, open source software
2 0.89219052 51 jmlr-2010-Importance Sampling for Continuous Time Bayesian Networks
Author: Yu Fan, Jing Xu, Christian R. Shelton
Abstract: A continuous time Bayesian network (CTBN) uses a structured representation to describe a dynamic system with a finite number of states which evolves in continuous time. Exact inference in a CTBN is often intractable as the state space of the dynamic system grows exponentially with the number of variables. In this paper, we first present an approximate inference algorithm based on importance sampling. We then extend it to continuous-time particle filtering and smoothing algorithms. These three algorithms can estimate the expectation of any function of a trajectory, conditioned on any evidence set constraining the values of subsets of the variables over subsets of the time line. We present experimental results on both synthetic networks and a network learned from a real data set on people’s life history events. We show the accuracy as well as the time efficiency of our algorithms, and compare them to other approximate algorithms: expectation propagation and Gibbs sampling. Keywords: continuous time Bayesian networks, importance sampling, approximate inference, filtering, smoothing
3 0.62509549 75 jmlr-2010-Mean Field Variational Approximation for Continuous-Time Bayesian Networks
Author: Ido Cohn, Tal El-Hay, Nir Friedman, Raz Kupferman
Abstract: Continuous-time Bayesian networks is a natural structured representation language for multicomponent stochastic processes that evolve continuously over time. Despite the compact representation provided by this language, inference in such models is intractable even in relatively simple structured networks. We introduce a mean field variational approximation in which we use a product of inhomogeneous Markov processes to approximate a joint distribution over trajectories. This variational approach leads to a globally consistent distribution, which can be efficiently queried. Additionally, it provides a lower bound on the probability of observations, thus making it attractive for learning tasks. Here we describe the theoretical foundations for the approximation, an efficient implementation that exploits the wide range of highly optimized ordinary differential equations (ODE) solvers, experimentally explore characterizations of processes for which this approximation is suitable, and show applications to a large-scale real-world inference problem. Keywords: continuous time Markov processes, continuous time Bayesian networks, variational approximations, mean field approximation
4 0.37653089 64 jmlr-2010-Learning Non-Stationary Dynamic Bayesian Networks
Author: Joshua W. Robinson, Alexander J. Hartemink
Abstract: Learning dynamic Bayesian network structures provides a principled mechanism for identifying conditional dependencies in time-series data. An important assumption of traditional DBN structure learning is that the data are generated by a stationary process, an assumption that is not true in many important settings. In this paper, we introduce a new class of graphical model called a nonstationary dynamic Bayesian network, in which the conditional dependence structure of the underlying data-generation process is permitted to change over time. Non-stationary dynamic Bayesian networks represent a new framework for studying problems in which the structure of a network is evolving over time. Some examples of evolving networks are transcriptional regulatory networks during an organism’s development, neural pathways during learning, and traffic patterns during the day. We define the non-stationary DBN model, present an MCMC sampling algorithm for learning the structure of the model from time-series data under different assumptions, and demonstrate the effectiveness of the algorithm on both simulated and biological data. Keywords: Bayesian networks, graphical models, model selection, structure learning, Monte Carlo methods
5 0.18915139 39 jmlr-2010-FastInf: An Efficient Approximate Inference Library
Author: Ariel Jaimovich, Ofer Meshi, Ian McGraw, Gal Elidan
Abstract: The FastInf C++ library is designed to perform memory and time efficient approximate inference in large-scale discrete undirected graphical models. The focus of the library is propagation based approximate inference methods, ranging from the basic loopy belief propagation algorithm to propagation based on convex free energies. Various message scheduling schemes that improve on the standard synchronous or asynchronous approaches are included. Also implemented are a clique tree based exact inference, Gibbs sampling, and the mean field algorithm. In addition to inference, FastInf provides parameter estimation capabilities as well as representation and learning of shared parameters. It offers a rich interface that facilitates extension of the basic classes to other inference and learning methods. Keywords: graphical models, Markov random field, loopy belief propagation, approximate inference
6 0.18866388 110 jmlr-2010-The SHOGUN Machine Learning Toolbox
7 0.17849135 118 jmlr-2010-libDAI: A Free and Open Source C++ Library for Discrete Approximate Inference in Graphical Models
8 0.16380963 33 jmlr-2010-Efficient Heuristics for Discriminative Structure Learning of Bayesian Network Classifiers
9 0.15488349 56 jmlr-2010-Introduction to Causal Inference
10 0.15176317 54 jmlr-2010-Information Retrieval Perspective to Nonlinear Dimensionality Reduction for Data Visualization
11 0.14916569 58 jmlr-2010-Kronecker Graphs: An Approach to Modeling Networks
12 0.13424292 63 jmlr-2010-Learning Instance-Specific Predictive Models
13 0.12856011 41 jmlr-2010-Gaussian Processes for Machine Learning (GPML) Toolbox
14 0.12640291 93 jmlr-2010-PyBrain
16 0.12193632 116 jmlr-2010-WEKA−Experiences with a Java Open-Source Project
17 0.11154912 59 jmlr-2010-Large Scale Online Learning of Image Similarity Through Ranking
18 0.11105192 44 jmlr-2010-Graph Kernels
19 0.10930086 49 jmlr-2010-Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data
20 0.10696224 78 jmlr-2010-Model Selection: Beyond the Bayesian Frequentist Divide
topicId topicWeight
[(8, 0.01), (36, 0.053), (37, 0.031), (72, 0.02), (75, 0.787)]
simIndex simValue paperId paperTitle
same-paper 1 0.99737555 28 jmlr-2010-Continuous Time Bayesian Network Reasoning and Learning Engine
Author: Christian R. Shelton, Yu Fan, William Lam, Joon Lee, Jing Xu
Abstract: We present a continuous time Bayesian network reasoning and learning engine (CTBN-RLE). A continuous time Bayesian network (CTBN) provides a compact (factored) description of a continuoustime Markov process. This software provides libraries and programs for most of the algorithms developed for CTBNs. For learning, CTBN-RLE implements structure and parameter learning for both complete and partial data. For inference, it implements exact inference and Gibbs and importance sampling approximate inference for any type of evidence pattern. Additionally, the library supplies visualization methods for graphically displaying CTBNs or trajectories of evidence. Keywords: continuous time Bayesian networks, C++, open source software
2 0.9898386 77 jmlr-2010-Model-based Boosting 2.0
Author: Torsten Hothorn, Peter Bühlmann, Thomas Kneib, Matthias Schmid, Benjamin Hofner
Abstract: We describe version 2.0 of the R add-on package mboost. The package implements boosting for optimizing general risk functions using component-wise (penalized) least squares estimates or regression trees as base-learners for fitting generalized linear, additive and interaction models to potentially high-dimensional data. Keywords: component-wise functional gradient descent, splines, decision trees 1. Overview The R add-on package mboost (Hothorn et al., 2010) implements tools for fitting and evaluating a variety of regression and classification models that have been suggested in machine learning and statistics. Optimization within the empirical risk minimization framework is performed via a component-wise functional gradient descent algorithm. The algorithm originates from the statistical view on boosting algorithms (Friedman et al., 2000; Bühlmann and Yu, 2003). The theory and its implementation in mboost allow for fitting complex prediction models, taking potentially many interactions of features into account, as well as for fitting additive and linear models. The model class the package deals with is best described by so-called structured additive regression (STAR) models, where some characteristic ξ of the conditional distribution of a response variable Y given features X is modeled through a regression function f of the features ξ(Y |X = x) = f (x). In order to facilitate parsimonious and interpretable models, the regression function f is structured, that is, restricted to additive functions f (x) = ∑ p f j (x). Each model component f j (x) might take only j=1 c 2010 Torsten Hothorn, Peter Bühlmann, Thomas Kneib, Matthias Schmid and Benjamin Hofner. H OTHORN , B ÜHLMANN , K NEIB , S CHMID AND H OFNER a subset of the features into account. Special cases are linear models f (x) = x⊤ β, additive models f (x) = ∑ p f j (x( j) ), where f j is a function of the jth feature x( j) only (smooth functions or j=1 stumps, for example) or a more complex function where f (x) is implicitly defined as the sum of multiple decision trees including higher-order interactions. The latter case corresponds to boosting with trees. Combinations of these structures are also possible. The most important advantage of such a decomposition of the regression function is that each component of a fitted model can be looked at and interpreted separately for gaining a better understanding of the model at hand. The characteristic ξ of the distribution depends on the measurement scale of the response Y and the scientific question to be answered. For binary or numeric variables, some function of the expectation may be appropriate, but also quantiles or expectiles may be interesting. The definition of ξ is determined by defining a loss function ρ whose empirical risk is to be minimized under some algorithmic constraints (i.e., limited number of boosting iterations). The model is then fitted using n p ( fˆ1 , . . . , fˆp ) = argmin ∑ wi ρ yi , ∑ f j (x) . ( f1 ,..., f p ) i=1 j=1 Here (yi , xi ), i = 1, . . . , n, are n training samples with responses yi and potentially high-dimensional feature vectors xi , and wi are some weights. The component-wise boosting algorithm starts with some offset for f and iteratively fits residuals defined by the negative gradient of the loss function evaluated at the current fit by updating only the best model component in each iteration. The details have been described by Bühlmann and Yu (2003). 
Early stopping via resampling approaches or AIC leads to sparse models in the sense that only a subset of important model components f j defines the final model. A more thorough introduction to boosting with applications in statistics based on version 1.0 of mboost is given by Bühlmann and Hothorn (2007). As of version 2.0, the package allows for fitting models to binary, numeric, ordered and censored responses, that is, regression of the mean, robust regression, classification (logistic and exponential loss), ordinal regression,1 quantile1 and expectile1 regression, censored regression (including Cox, Weibull1 , log-logistic1 or lognormal1 models) as well as Poisson and negative binomial regression1 for count data can be performed. Because the structure of the regression function f (x) can be chosen independently from the loss function ρ, interesting new models can be fitted (e.g., in geoadditive regression, Kneib et al., 2009). 2. Design and Implementation The package incorporates an infrastructure for representing loss functions (so-called ‘families’), base-learners defining the structure of the regression function and thus the model components f j , and a generic implementation of component-wise functional gradient descent. The main progress in version 2.0 is that only one implementation of the boosting algorithm is applied to all possible models (linear, additive, tree-based) and all families. Earlier versions were based on three implementations, one for linear models, one for additive models, and one for tree-based boosting. In comparison to the 1.0 series, the reduced code basis is easier to maintain, more robust and regression tests have been set-up in a more unified way. Specifically, the new code basis results in an enhanced and more user-friendly formula interface. In addition, convenience functions for hyperparameter selection, faster computation of predictions and improved visual model diagnostics are available. 1. Model family is new in version 2.0 or was added after the release of mboost 1.0. 2110 M ODEL - BASED B OOSTING 2.0 Currently implemented base-learners include component-wise linear models (where only one variable is updated in each iteration of the algorithm), additive models with quadratic penalties (e.g., for fitting smooth functions via penalized splines, varying coefficients or bi- and trivariate tensor product splines, Schmid and Hothorn, 2008), and trees. As a major improvement over the 1.0 series, computations on larger data sets (both with respect to the number of observations and the number of variables) are now facilitated by memory efficient implementations of the base-learners, mostly by applying sparse matrix techniques (package Matrix, Bates and Mächler, 2009) and parallelization for a cross-validation-based choice of the number of boosting iterations (per default via package multicore, Urbanek, 2009). A more elaborate description of mboost 2.0 features is available from the mboost vignette.2 3. User Interface by Example We illustrate the main components of the user-interface by a small example on human body fat composition: Garcia et al. (2005) used a linear model for predicting body fat content by means of common anthropometric measurements that were obtained for n = 71 healthy German women. In addition, the women’s body composition was measured by Dual Energy X-Ray Absorptiometry (DXA). The aim is to describe the DXA measurements as a function of the anthropometric features. 
Here, we extend the linear model by i) an intrinsic variable selection via early stopping, ii) additional terms allowing for smooth deviations from linearity where necessary (by means of penalized splines orthogonalized to the linear effect, Kneib et al., 2009), iii) a possible interaction between two variables with known impact on body fat composition (hip and waist circumference) and iv) using a robust median regression approach instead of L2 risk. For the data (available as data frame bodyfat), the model structure is specified via a formula involving the base-learners corresponding to the different model components (linear terms: bols(); smooth terms: bbs(); interactions: btree()). The loss function (here, the check function for the 0.5 quantile) along with its negative gradient function are defined by the QuantReg(0.5) family (Fenske et al., 2009). The model structure (specified using the formula fm), the data and the family are then passed to function mboost() for model fitting:3 R> library(
3 0.98341858 65 jmlr-2010-Learning Translation Invariant Kernels for Classification
Author: Kamaledin Ghiasi-Shirazi, Reza Safabakhsh, Mostafa Shamsi
Abstract: Appropriate selection of the kernel function, which implicitly defines the feature space of an algorithm, has a crucial role in the success of kernel methods. In this paper, we consider the problem of optimizing a kernel function over the class of translation invariant kernels for the task of binary classification. The learning capacity of this class is invariant with respect to rotation and scaling of the features and it encompasses the set of radial kernels. We show that how translation invariant kernel functions can be embedded in a nested set of sub-classes and consider the kernel learning problem over one of these sub-classes. This allows the choice of an appropriate sub-class based on the problem at hand. We use the criterion proposed by Lanckriet et al. (2004) to obtain a functional formulation for the problem. It will be proven that the optimal kernel is a finite mixture of cosine functions. The kernel learning problem is then formulated as a semi-infinite programming (SIP) problem which is solved by a sequence of quadratically constrained quadratic programming (QCQP) sub-problems. Using the fact that the cosine kernel is of rank two, we propose a formulation of a QCQP sub-problem which does not require the kernel matrices to be loaded into memory, making the method applicable to large-scale problems. We also address the issue of including other classes of kernels, such as individual kernels and isotropic Gaussian kernels, in the learning process. Another interesting feature of the proposed method is that the optimal classifier has an expansion in terms of the number of cosine kernels, instead of support vectors, leading to a remarkable speedup at run-time. As a by-product, we also generalize the kernel trick to complex-valued kernel functions. Our experiments on artificial and real-world benchmark data sets, including the USPS and the MNIST digit recognition data sets, show the usefulness of the proposed method. Keywords: kernel learning, translation invariant k
4 0.98322994 3 jmlr-2010-A Fast Hybrid Algorithm for Large-Scalel1-Regularized Logistic Regression
Author: Jianing Shi, Wotao Yin, Stanley Osher, Paul Sajda
Abstract: ℓ1 -regularized logistic regression, also known as sparse logistic regression, is widely used in machine learning, computer vision, data mining, bioinformatics and neural signal processing. The use of ℓ1 regularization attributes attractive properties to the classifier, such as feature selection, robustness to noise, and as a result, classifier generality in the context of supervised learning. When a sparse logistic regression problem has large-scale data in high dimensions, it is computationally expensive to minimize the non-differentiable ℓ1 -norm in the objective function. Motivated by recent work (Koh et al., 2007; Hale et al., 2008), we propose a novel hybrid algorithm based on combining two types of optimization iterations: one being very fast and memory friendly while the other being slower but more accurate. Called hybrid iterative shrinkage (HIS), the resulting algorithm is comprised of a fixed point continuation phase and an interior point phase. The first phase is based completely on memory efficient operations such as matrix-vector multiplications, while the second phase is based on a truncated Newton’s method. Furthermore, we show that various optimization techniques, including line search and continuation, can significantly accelerate convergence. The algorithm has global convergence at a geometric rate (a Q-linear rate in optimization terminology). We present a numerical comparison with several existing algorithms, including an analysis using benchmark data from the UCI machine learning repository, and show our algorithm is the most computationally efficient without loss of accuracy. Keywords: logistic regression, ℓ1 regularization, fixed point continuation, supervised learning, large scale c 2010 Jianing Shi, Wotao Yin, Stanley Osher and Paul Sajda. S HI , Y IN , O SHER AND S AJDA
5 0.97389531 6 jmlr-2010-A Rotation Test to Verify Latent Structure
Author: Patrick O. Perry, Art B. Owen
Abstract: In multivariate regression models we have the opportunity to look for hidden structure unrelated to the observed predictors. However, when one fits a model involving such latent variables it is important to be able to tell if the structure is real, or just an artifact of correlation in the regression errors. We develop a new statistical test based on random rotations for verifying the existence of latent variables. The rotations are carefully constructed to rotate orthogonally to the column space of the regression model. We find that only non-Gaussian latent variables are detectable, a finding that parallels a well known phenomenon in independent components analysis. We base our test on a measure of non-Gaussianity in the histogram of the principal eigenvector components instead of on the eigenvalue. The method finds and verifies some latent dichotomies in the microarray data from the AGEMAP consortium. Keywords: independent components analysis, Kronecker covariance, latent variables, projection pursuit, transposable data
6 0.97035462 26 jmlr-2010-Consensus-Based Distributed Support Vector Machines
7 0.92355484 51 jmlr-2010-Importance Sampling for Continuous Time Bayesian Networks
8 0.9019773 87 jmlr-2010-Online Learning for Matrix Factorization and Sparse Coding
9 0.89522439 1 jmlr-2010-A Comparison of Optimization Methods and Software for Large-scale L1-regularized Linear Classification
10 0.88836801 43 jmlr-2010-Generalized Power Method for Sparse Principal Component Analysis
11 0.87870556 58 jmlr-2010-Kronecker Graphs: An Approach to Modeling Networks
12 0.87656724 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes
13 0.87618995 84 jmlr-2010-On Spectral Learning
14 0.87597477 17 jmlr-2010-Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing
15 0.87171078 40 jmlr-2010-Fast and Scalable Local Kernel Machines
16 0.86871946 63 jmlr-2010-Learning Instance-Specific Predictive Models
17 0.86545289 66 jmlr-2010-Linear Algorithms for Online Multitask Classification
18 0.8654108 105 jmlr-2010-Spectral Regularization Algorithms for Learning Large Incomplete Matrices
19 0.8651222 46 jmlr-2010-High Dimensional Inverse Covariance Matrix Estimation via Linear Programming
20 0.86122435 5 jmlr-2010-A Quasi-Newton Approach to Nonsmooth Convex Optimization Problems in Machine Learning