jmlr jmlr2011 jmlr2011-76 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ashwin Srinivasan, Ganesh Ramakrishnan
Abstract: Reports of experiments conducted with an Inductive Logic Programming system rarely describe how specific values of parameters of the system are arrived at when constructing models. Usually, no attempt is made to identify sensitive parameters, and those that are used are often given “factory-supplied” default values, or values obtained from some non-systematic exploratory analysis. The immediate consequence of this is, of course, that it is not clear if better models could have been obtained if some form of parameter selection and optimisation had been performed. Questions follow inevitably on the experiments themselves: specifically, are all algorithms being treated fairly, and is the exploratory phase sufficiently well-defined to allow the experiments to be replicated? In this paper, we investigate the use of parameter selection and optimisation techniques grouped under the study of experimental design. Screening and response surface methods determine, in turn, sensitive parameters and good values for these parameters. Screening is done here by constructing a stepwise regression model relating the utility of an ILP system’s hypothesis to its input parameters, using systematic combinations of values of input parameters (technically speaking, we use a two-level fractional factorial design of the input parameters). The parameters used by the regression model are taken to be the sensitive parameters for the system for that application. We then seek an assignment of values to these sensitive parameters that maximise the utility of the ILP model. This is done using the technique of constructing a local “response surface”. The parameters are then changed following the path of steepest ascent until a locally optimal value is reached. This combined use of parameter selection and response surface-driven optimisation has a long history of application in industrial engineering, and its role in ILP is demonstrated using well-known benchmarks. The results suggest that computational
Reference: text
sentIndex sentText sentNum sentScore
1 Screening and response surface methods determine, in turn, sensitive parameters and good values for these parameters. [sent-13, score-0.257]
2 Screening is done here by constructing a stepwise regression model relating the utility of an ILP system’s hypothesis to its input parameters, using systematic combinations of values of input parameters (technically speaking, we use a two-level fractional factorial design of the input parameters). [sent-14, score-0.372]
3 This combined use of parameter selection and response surface-driven optimisation has a long history of application in industrial engineering, and its role in ILP is demonstrated using well-known benchmarks. [sent-19, score-0.362]
4 Keywords: inductive logic programming, parameter screening and optimisation, experimental design. [sent-21, score-0.316]
5 Here we take up the questions of screening and optimisation of parameters directly, with the only restriction being that parameter and goodness values are quantitative in nature. [sent-41, score-0.386]
6 The system has some controllable factors and some uncontrollable ones, and the goals of an experiment could be to answer questions such as: which of the controllable factors are most influential on y; and at what levels should these factors be set for y to reach an optimal value? [sent-46, score-0.291]
7 The studies demonstrate how, for a given set of inputs, parameter screening and selection using designed experiments yields a better model than simply using default values, or performing an exhaustive combination of pre-determined values for parameters. [sent-61, score-0.337]
8 The paper is accompanied by two appendices that provide standard material from the literature concerned with the construction of linear models, and with specific aspects of the optimisation method used here. [sent-66, score-0.222]
9 This is different to improving the optimisation procedure performed by the system itself. [sent-88, score-0.246]
10 Rather, it is concerned with enabling the existing optimisation procedure to find better results, usually by changing the space of alternatives in some principled manner. [sent-89, score-0.222]
11 It is beyond the engineer’s remit to alter either the system’s inputs or its optimisation criterion as a means of improving system performance. [sent-90, score-0.246]
12 Of 100 experimental studies reported in papers presented between 1998 and 2008 to the principal conference in the area, none attempt any form of screening for relevant parameters. [sent-105, score-0.239]
13 Turning to the broader literature in ML, we have not been able to uncover any reports explicitly concerned with screening for relevant parameters. [sent-113, score-0.256]
14 That is, what can be done at best is to treat the ILP system as a black box (as we have done in the previous section) and empirically observe variations in its response to changes in the values of the parameters. [sent-122, score-0.236]
15 For tractability, these values are discretised a priori, and the approach essentially performs a sub-optimal search through a finite space of what are called k-level full-factorial designs in this paper (the k refers to the number of discrete values: more on such designs in the next section). [sent-125, score-0.233]
16 In this paper, we use an exhaustive search through such a space as a baseline for comparison against a gradient-based optimisation method. [sent-126, score-0.23]
17 The response of the MILP solver is then obtained, from which an ML system is used to construct a model relating the response to parameter values. [sent-132, score-0.412]
18 This work can be seen as a case of exhaustive enumeration of responses in a k-level full factorial design, followed by a single stage of ad hoc non-linear regression-based parameter screening and optimisation. [sent-136, score-0.398]
19 The problem of screening and tuning of parameters to optimise a system’s performance has been studied extensively in areas of industrial engineering, using results obtained from the design and analysis of experiments. [sent-137, score-0.276]
20 In this paper, we will restrict ourselves to a single response variable and the analysis of experimental designs by multiple regression. [sent-144, score-0.301]
21 Further, by “experimental design” we will mean nothing more than a selection of points from the factor-space, in order that a statistically sound relationship between the factors and the response variable can be obtained. [sent-146, score-0.257]
22 In this, each factor is taken to have just two levels (encoded as “-1” and “+1”, say), and the effect on the response variable of changing the levels of each factor is observed. [sent-152, score-0.226]
23 For example, with two factors, conducting a 2^2 full factorial design will result in a table such as the one shown in Figure 4. We are then able to construct a regression model relating the response variable to the factors: y = b0 + b1 x1 + b2 x2 + b3 x1 x2 . [sent-154, score-0.408]
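As an illustrative sketch (in Python, with made-up response values rather than any from the paper), the 2^2 design and the fitted model above can be reproduced with ordinary least squares:

```python
import numpy as np

# A 2^2 full factorial design in coded units, with illustrative responses
# (not values from the paper).
x1 = np.array([-1, +1, -1, +1])
x2 = np.array([-1, -1, +1, +1])
y = np.array([0.62, 0.71, 0.58, 0.83])

# Design matrix for y = b0 + b1*x1 + b2*x2 + b3*x1*x2.
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(dict(zip(["b0", "b1", "b2", "b3"], np.round(b, 3))))
```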
24 Further, it is also the case that with a 2-level full factorial design only linear effects can be estimated (that is, the effect of terms like xi^2 cannot be obtained: in general, an nth order polynomial will require n + 1 levels for each factor). [sent-158, score-0.269]
25 In this paper, we will use the coefficients of the regression model to guide the screening of parameters: that is, parameters with coefficients significantly different from 0 will be taken to be relevant (more on this in Appendix A). [sent-159, score-0.271]
26 Clearly, the number of experiments required in a full factorial design constitutes a substantial computational burden, especially as the number of factors increases. [sent-160, score-0.302]
27 Figure 5 below illustrates a 2-level fractional factorial design for 3 factors that uses half the number of experiments to estimate the main effects (from Steppan et al. [sent-195, score-0.411]
28 Figure 5: A full 2-level factorial design for 3 factors (left) and a “half fraction” design (right). [sent-235, score-0.333]
29 However, if it is our intention—as it is in the screening stage—only to estimate the main effects (such models are also called “first-order” models), then we can ignore interactions (see Figure 6). [sent-244, score-0.257]
30 Main effects can be estimated with a table that is a fraction of that required by the full factorial design: for example, the half fraction in Figure 5 is sufficient to obtain a regression equation with just the main effects x1 , x2 and x3 . [sent-245, score-0.269]
31 For the purpose of estimating the main effects, the surface on the right is adequate, as it shows that x2 has a much bigger effect than x1 on the response y (we are assuming here that x1 and x2 represent coded values on the same scale). [sent-247, score-0.276]
32 We use the techniques and results described there to direct the screening of factors by focusing on a linear model that contains the main effects only: y = b0 + b1 x1 + b2 x2 + · · · + bk xk . [sent-252, score-0.336]
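A minimal sketch of this screening step, assuming coded ±1 factor settings and using a simple coefficient-magnitude threshold as a crude stand-in for the stepwise regression and significance tests described in the paper (Appendix A):

```python
import numpy as np

def screen_factors(X_coded, y, names, threshold=0.01):
    # Fit the main-effects-only model y = b0 + sum_i b_i * x_i and keep the
    # factors whose coefficients are large in magnitude.
    A = np.column_stack([np.ones(len(y)), X_coded])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    effects = dict(zip(names, b[1:]))
    return {n: e for n, e in effects.items() if abs(e) > threshold}

# Illustrative use: a half fraction of a 2^3 design over three hypothetical factors.
X = np.array([[-1, -1, -1], [+1, -1, +1], [-1, +1, +1], [+1, +1, -1]])
y = np.array([0.55, 0.60, 0.74, 0.78])
print(screen_factors(X, y, ["C", "Nodes", "Minacc"]))
```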
33 2 Optimisation Using the Response Surface Suppose screening in the manner just described yields a set of k relevant factors from an original set of n factors (which we will denote here as x1 , x2 , . [sent-262, score-0.382]
34 (recall that if a stepwise regression procedure is used at the screening stage, then this is the model that would be obtained): y = b0 + ∑_{i=1}^{k} bi xi , or a second-order model involving quadratic terms like x1^2 , x2^2 , . [sent-273, score-0.291]
35 The principal approach adopted in optimising with the response surface is a sequential one. [sent-283, score-0.271]
36 First, a local approximation to the true response surface is constructed, using a first-order model. [sent-284, score-0.239]
37 Next, factors are varied along a path that improves the response the most (more on this in a moment). [sent-285, score-0.257]
38 At this point, a new first-order response surface is constructed, and the process repeated until it is evident that a first-order model is inadequate (or no more increases are possible). [sent-287, score-0.263]
39 Figure 7: Sequential optimisation of the response surface using the path of steepest ascent (the original figure shows, in the (x1, x2) plane, the region of the fitted first-order response surface, its contours, and the path of steepest ascent). [sent-290, score-0.971]
40 A first-order response surface is obtained in the shaded region. [sent-291, score-0.239]
41 The factors are then changed to move along a direction that gives the maximum increase in the response variable. [sent-292, score-0.257]
42 The sequential optimisation of the response surface just described involves calculating the gradient of the first-order model at the centre, or origin, of the experimental design (x1 = x2 = · · · = 0). [sent-305, score-0.514]
43 Sequential response optimisation proceeds by starting at the origin and increasing the xi along ∇ f for as long as increases in the response y are observed. [sent-315, score-0.538]
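The following is a rough Python sketch of one round of this procedure, assuming the ILP system is wrapped in a black-box `evaluate` function (a placeholder, not part of the paper's software); the factorial step, the gradient of the fitted first-order model, and the stopping rule follow the description above:

```python
import numpy as np
from itertools import product

def steepest_ascent(evaluate, centre, scale, delta=0.5, max_steps=20):
    # One round of response-surface optimisation: fit a local first-order
    # model around `centre`, then step along its gradient for as long as the
    # (black-box) response keeps improving.
    centre = np.asarray(centre, dtype=float)
    scale = np.asarray(scale, dtype=float)
    k = len(centre)

    # Two-level factorial around the centre, in coded units.
    coded = np.array(list(product([-1, +1], repeat=k)))
    y = np.array([evaluate(centre + c * scale) for c in coded])

    # First-order model y = b0 + sum_i b_i * x_i; its gradient sets the direction.
    A = np.column_stack([np.ones(len(y)), coded])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    direction = b[1:] / np.linalg.norm(b[1:])

    best_x, best_y = centre, evaluate(centre)
    for step in range(1, max_steps + 1):
        x = centre + step * delta * direction * scale
        fx = evaluate(x)
        if fx <= best_y:          # stop once the response no longer improves
            break
        best_x, best_y = x, fx
    return best_x, best_y

# Example with a synthetic response standing in for the ILP system:
f = lambda x: -((x[0] - 3.0) ** 2 + (x[1] - 2.0) ** 2)
print(steepest_ascent(f, centre=[0.0, 0.0], scale=[1.0, 1.0]))
```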
44 Devise a two-level fractional factorial design of Resolution III or higher, and obtain values of y for each experiment (or replicates of values of y, if so required). [sent-358, score-0.263]
45 Retain only those factors that are important, by examining the magnitude and significance of the coefficients of the xi in the regression model (alternatively, only those factors found by a stepwise regression procedure are retained: see Appendix A). [sent-361, score-0.286]
46 Construct a first-order response surface using the relevant factors only (this is not needed if stepwise regression was used at the screening stage). [sent-364, score-0.631]
47 If no adequate model is obtained, then return the combination of factor-values that gave the best response at the screening stage. [sent-365, score-0.388]
48 Progressively obtain new values for y by changing the relevant parameters along the gradient to the response surface. [sent-368, score-0.249]
49 A question is raised about what is to be done if the local maximum found in this manner is lower than a response value known already—for example, from an experiment from the screening stage. [sent-381, score-0.358]
50 Devise a full factorial design by combining each of the levels of the factors against those of the others. [sent-388, score-0.305]
51 This procedure, a multi-level full factorial design, is the basis of the wrapper-based optimisation method in Kohavi and John (1995), recast in the terminology of experimental design. [sent-392, score-0.351]
52 Suppose we always conduct a 2^(n−p) fractional design at the screening stage, and that this stage results in no more than r variables being selected as relevant. [sent-395, score-0.264]
53 It is evident that a procedure SO′ that employs ScreenFrac followed by OptimiseFact would always perform 2^(n−p) + l^r experiments (assuming, for simplicity, that all relevant factors are taken to have l levels during the optimisation stage). [sent-401, score-0.376]
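A quick back-of-the-envelope comparison of the experiment counts, with hypothetical values of n, p, r and l (not taken from the paper):

```python
# Rough comparison of experiment counts (ILP runs, ignoring replicates):
# n candidate factors screened with a 2^(n-p) fractional design, then the
# r relevant factors tuned at l levels each.
n, p, r, l = 6, 2, 2, 5
screen_then_grid = 2 ** (n - p) + l ** r   # ScreenFrac followed by OptimiseFact
full_grid = l ** n                         # multi-level full factorial over all n factors
print(screen_then_grid, full_grid)         # 41 versus 15625
```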
54 Is the value of the response variable obtained after optimisation a good estimate of the performance of the ILP system? [sent-408, score-0.362]
55 1 Aims Our aim here is to demonstrate the utility of the screening and optimisation procedure SO that we have described in Section 4. [sent-428, score-0.368]
56 2 ALGORITHMS AND MACHINES Experimental design and regression models for screening and the response surface are constructed by the procedures made available by the authors of Steppan et al. [sent-477, score-0.507]
57 3 to screen this set using a fractional factorial design of Resolution III or higher. [sent-488, score-0.281]
58 We propose to examine a two-level fractional factorial design, using the levels shown below (the column “Default” refers to the default values for the factors assigned by the Aleph system, and ±1 refers to the coded values of the factors): Factor C Nodes Minacc Minpos Levels Low (−1) 4 5000 0. [sent-521, score-0.463]
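For completeness, a small sketch of how coded levels relate to natural parameter values; the low/high settings below are hypothetical and not the exact levels in the paper's table:

```python
def coded_to_natural(x_coded, low, high):
    # Map a coded value in [-1, +1] to natural units, given the low (-1)
    # and high (+1) levels chosen for the two-level design.
    centre = (high + low) / 2.0
    half_range = (high - low) / 2.0
    return centre + x_coded * half_range

# Hypothetical levels (not the paper's exact table):
print(coded_to_natural(-1, low=4, high=10))         # a "C" setting of 4
print(coded_to_natural(+0.5, low=0.75, high=0.95))  # a "Minacc" setting of 0.90
```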
59 Additional experiments, and corresponding experimental performance values, will be needed in Step 3 to obtain values of the relevant parameters using the response surface. [sent-552, score-0.251]
60 This model is taken to approximate the local response surface: we then proceed to change levels of factors along the normal to this surface in the manner described in Figure 8. [sent-555, score-0.345]
61 4 Results and Discussion We present first the results concerned with screening for relevant factors. [sent-575, score-0.256]
62 Figure 9 shows responses from the ILP system for the preliminary experiments conducted for screening, using the fractional design described under “Methods”. [sent-576, score-0.435]
63 The sequence of experiments following this stage for optimising relevant parameter values using: (a) the response surface; and (b) a multi-level full factorial design are in Figures 10 and 11. [sent-577, score-0.496]
64 In this case, no further optimisation is performed, and all parameters are left at their default values. [sent-638, score-0.314]
65 The results suggest that default levels for factors need not yield optimal models for all problems, or even when the same problem is given different inputs (here, different background knowledge). [sent-640, score-0.239]
66 This means that using ILP systems just based on default values for parameters— the accepted practice at present—can give misleading estimates of the best response possible from the system. [sent-641, score-0.303]
67 (c) Carcinogenesis (Bmax ) Figure 10: Optimisation using the response surface (procedure OptimiseRSM in Section 4. [sent-720, score-0.239]
68 In each case, the response surface used is the first-order regression model found by stepwise regression at the screening stage (shown in Figure 9). [sent-722, score-0.574]
69 Parameters are varied along the path of steepest ascent of experimental performance values for the response variable. [sent-723, score-0.244]
70 No optimisation is performed for Carcinogenesis (Bmin ) since no adequate first-order response surface was found. [sent-725, score-0.455]
71 In each case, relevant factors are those obtained by screening (Figure 9). [sent-893, score-0.301]
72 A 5-level full factorial design is then used to find the best values for these factors, using experimental performance values for the response variable. [sent-894, score-0.394]
73 The screening results suggest that as inputs change, so can the relevance of factors (for example, when the background changes from Bmin to Bmax in Carcinogenesis, Minacc becomes a relevant factor). [sent-925, score-0.342]
74 Optimisation, as proposed here, requires the selection of an appropriate step-size and specification of a stopping criterion for a sequential search conducted along the gradient to the response surface. [sent-934, score-0.233]
75 Even if a set of relevant factors are available for a given input, a multi-level full factorial design can be an expensive method to determine appropriate levels. [sent-938, score-0.318]
76 Of course, screening and optimisation experiments would have to be conducted for each system in turn, since the factors relevant to one system (and their levels) would typically have no relation to those of any of the others. [sent-1047, score-0.648]
77 Figure 17(a) shows that having subjected both Toplog and Aleph to the same procedure for screening and optimisation (that is, SO), we find no significant difference in their performance. [sent-1054, score-0.368]
78 On the other hand, Figure 17(b) shows that performing screening and optimisation on one (Aleph), but not the other (Toplog), can lead to misleading results (that the performance of Aleph is significantly better than Toplog). [sent-1055, score-0.368]
79 Substantial effort has been expended in developing designs other than the fractional factorial designs used here. [sent-1123, score-0.422]
80 Response surface optimisation could also involve more complex models than the simple first-order models used here. [sent-1124, score-0.249]
81 Specifically, the quantity F = (SSR/k) / (SSE/(N − 1 − k)) is calculated, where SSE refers to the sum of squared residuals, ∑_{i=1}^{N} (yi − ŷi)^2 (N being the number of data points), and SSR is the sum of squares of deviations of the model’s response from the mean response, ∑_{i=1}^{N} (ŷi − ȳ)^2. [sent-1191, score-0.352]
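A direct translation of this quantity into code, assuming observed responses y, fitted values ŷ and k predictors:

```python
import numpy as np

def regression_f_statistic(y, y_hat, k):
    # F = (SSR / k) / (SSE / (N - 1 - k)) for a regression with k predictors
    # fitted to N observations.
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    N = len(y)
    sse = np.sum((y - y_hat) ** 2)          # residual sum of squares
    ssr = np.sum((y_hat - y.mean()) ** 2)   # regression sum of squares
    return (ssr / k) / (sse / (N - 1 - k))
```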
82 A Note on Constructing and Optimising Response Surfaces In this section we describe some issues that are relevant to constructing and optimising response surfaces. [sent-1241, score-0.246]
83 Specifically, we are concerned with: (1) A procedure for obtaining a fractional experimental design that is suitable for estimating the main effects using the regression procedure described just previously; (2) The search procedure along the gradient to the response surface. [sent-1242, score-0.481]
84 1 Fractional Factorial Designs We begin by assuming that we have k main effects and that the response surface is approximated by a first-order model with main effects only. [sent-1244, score-0.329]
85 Two-level fractional designs are obtained by dividing the full factorial design of k factors by some number 2^p (1 ≤ p < k). [sent-1247, score-0.467]
86 Select any k − p factors and construct a two-level full factorial design with these factors. [sent-1251, score-0.28]
87 Thus, suppose we initially had k = 4 factors (A, B, C, D say), and wanted to construct a 2^(4−1) factorial design (that is, p = 1). [sent-1254, score-0.263]
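A sketch of this construction in Python, using the common generator D = ABC (the generator itself is an assumption here, since the extracted text does not name it); the printed checks also illustrate the column properties mentioned below:

```python
import numpy as np
from itertools import product

# Build a full two-level factorial in the first k - p factors, then define
# each remaining factor as a product of some of them. Here k = 4, p = 1 and
# the generator D = ABC is assumed.
base = np.array(list(product([-1, +1], repeat=3)))   # full 2^3 design in A, B, C
D = base[:, 0] * base[:, 1] * base[:, 2]             # generator: D = ABC
design = np.column_stack([base, D])                  # the 2^(4-1) design: 8 runs

print(design.sum(axis=0))   # each column sums to zero
print(design.T @ design)    # off-diagonal zeros: columns are orthogonal; diagonal = 8 runs
```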
88 Thus, there are several 2^(4−1) fractional factorial designs that could have been devised. [sent-1267, score-0.316]
89 The column vectors in the two-level full and fractional factorial designs satisfy some properties: (a) The sum of each column is zero; (b) The sum of products of each pair of columns is zero; and (c) The sum of squares of each column is equal to the number of experiments. [sent-1271, score-0.333]
90 Two-level fractional factorial designs are categorised by their resolution. [sent-1278, score-0.316]
91 The resolution R of a fractional factorial design can be computed as the smallest number of factors that are confounded with the generator I. [sent-1279, score-0.409]
92 Orthogonal designs result in minimal variance when estimating coefficients, and both full factorial designs and fractional designs in which main effects are not aliased with each other (that is, Resolution III or more) are known to be orthogonal for first-order models (Montgomery, 2005). [sent-1290, score-0.618]
93 Once again, full factorial designs and fractional designs of Resolution III or more are rotatable designs for first-order models. [sent-1294, score-0.545]
94 In general, if there is a variation in response y even for fixed values of the factors, then we will need to perform several replicates of each experiment, and attempt to model the average response y. [sent-1300, score-0.352]
95 2 Gradient Ascent The primary device used in the paper is to seek local improvements in the response y by making small movements in the direction of the gradient to a response surface. [sent-1309, score-0.369]
96 Let us suppose that the response surface is given by a scalar field f defined on points that are some subset of ℜk , and whose values f (x1 , x2 , . [sent-1311, score-0.239]
97 In our case, we do not know the functional form of f : the first-order response surface is simply a local approximation to f that ceases to be appropriate after some value of δ. [sent-1329, score-0.239]
98 The simplest of these—and widely used in response surface methods (Neddermeijer et al. [sent-1331, score-0.239]
99 Figure 19: Data from steps of the gradient ascent used to estimate a polynomial regression model relating response (Acc) to step-size (δ). [sent-1350, score-0.256]
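A sketch of the step-size selection that Figure 19 suggests: fit a low-order polynomial relating the observed response to the step-size δ taken along the gradient, and move to its maximiser (the data points below are illustrative, not the figure's):

```python
import numpy as np

# Fit a quadratic relating response to step-size delta, then take the delta
# that maximises the fitted curve.
deltas = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
responses = np.array([0.70, 0.74, 0.77, 0.78, 0.76, 0.71])

c2, c1, c0 = np.polyfit(deltas, responses, deg=2)   # response ~ c2*d^2 + c1*d + c0
best_delta = -c1 / (2.0 * c2)                       # vertex of the fitted parabola
print(round(best_delta, 2))
```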
100 A framework for response surface methodology for simulation optimization. [sent-1550, score-0.239]
wordName wordTfidf (topN-words)
[('ilp', 0.778), ('optimisation', 0.186), ('screening', 0.182), ('response', 0.176), ('minacc', 0.146), ('factorial', 0.129), ('aleph', 0.124), ('minpos', 0.119), ('default', 0.11), ('alz', 0.108), ('designs', 0.106), ('amakrishnan', 0.097), ('rinivasan', 0.097), ('creening', 0.092), ('ptimisation', 0.092), ('toplog', 0.092), ('factors', 0.081), ('fractional', 0.081), ('srinivasan', 0.074), ('acc', 0.068), ('bmin', 0.065), ('carcinogenesis', 0.065), ('dsstox', 0.065), ('muggleton', 0.065), ('surface', 0.063), ('system', 0.06), ('mutagenesis', 0.059), ('stepwise', 0.058), ('bmax', 0.055), ('design', 0.053), ('signed', 0.05), ('engineer', 0.049), ('mut', 0.049), ('effects', 0.045), ('montgomery', 0.043), ('predicates', 0.043), ('logic', 0.042), ('relevant', 0.038), ('resolution', 0.037), ('optimised', 0.037), ('coded', 0.037), ('concerned', 0.036), ('regression', 0.033), ('landwehr', 0.032), ('screenfrac', 0.032), ('steppan', 0.032), ('optimising', 0.032), ('interactions', 0.03), ('ranks', 0.03), ('adequate', 0.03), ('ascent', 0.03), ('stage', 0.029), ('xk', 0.028), ('aliased', 0.028), ('minimise', 0.028), ('confounded', 0.028), ('acetyl', 0.027), ('amine', 0.027), ('favour', 0.027), ('fin', 0.027), ('fout', 0.027), ('maximise', 0.027), ('ssr', 0.027), ('tox', 0.027), ('levels', 0.025), ('evident', 0.024), ('background', 0.023), ('optimise', 0.023), ('exhaustive', 0.023), ('relational', 0.023), ('experiments', 0.022), ('accuracies', 0.022), ('carcin', 0.022), ('controllable', 0.022), ('optimisefact', 0.022), ('optimisersm', 0.022), ('vout', 0.022), ('walpole', 0.022), ('amongst', 0.021), ('search', 0.021), ('sst', 0.021), ('inductive', 0.02), ('acceptable', 0.02), ('conducted', 0.019), ('steepest', 0.019), ('experimental', 0.019), ('relevance', 0.018), ('exclusion', 0.018), ('clause', 0.018), ('experimentation', 0.018), ('mutagenic', 0.018), ('responses', 0.018), ('bi', 0.018), ('screen', 0.018), ('parameters', 0.018), ('ties', 0.018), ('cients', 0.017), ('full', 0.017), ('estimates', 0.017), ('gradient', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 76 jmlr-2011-Parameter Screening and Optimisation for ILP using Designed Experiments
Author: Ashwin Srinivasan, Ganesh Ramakrishnan
Abstract: Reports of experiments conducted with an Inductive Logic Programming system rarely describe how specific values of parameters of the system are arrived at when constructing models. Usually, no attempt is made to identify sensitive parameters, and those that are used are often given “factory-supplied” default values, or values obtained from some non-systematic exploratory analysis. The immediate consequence of this is, of course, that it is not clear if better models could have been obtained if some form of parameter selection and optimisation had been performed. Questions follow inevitably on the experiments themselves: specifically, are all algorithms being treated fairly, and is the exploratory phase sufficiently well-defined to allow the experiments to be replicated? In this paper, we investigate the use of parameter selection and optimisation techniques grouped under the study of experimental design. Screening and response surface methods determine, in turn, sensitive parameters and good values for these parameters. Screening is done here by constructing a stepwise regression model relating the utility of an ILP system’s hypothesis to its input parameters, using systematic combinations of values of input parameters (technically speaking, we use a two-level fractional factorial design of the input parameters). The parameters used by the regression model are taken to be the sensitive parameters for the system for that application. We then seek an assignment of values to these sensitive parameters that maximise the utility of the ILP model. This is done using the technique of constructing a local “response surface”. The parameters are then changed following the path of steepest ascent until a locally optimal value is reached. This combined use of parameter selection and response surface-driven optimisation has a long history of application in industrial engineering, and its role in ILP is demonstrated using well-known benchmarks. The results suggest that computational
2 0.073334277 24 jmlr-2011-Dirichlet Process Mixtures of Generalized Linear Models
Author: Lauren A. Hannah, David M. Blei, Warren B. Powell
Abstract: We propose Dirichlet Process mixtures of Generalized Linear Models (DP-GLM), a new class of methods for nonparametric regression. Given a data set of input-response pairs, the DP-GLM produces a global model of the joint distribution through a mixture of local generalized linear models. DP-GLMs allow both continuous and categorical inputs, and can model the same class of responses that can be modeled with a generalized linear model. We study the properties of the DP-GLM, and show why it provides better predictions and density estimates than existing Dirichlet process mixture regression models. We give conditions for weak consistency of the joint distribution and pointwise consistency of the regression estimate. Keywords: Bayesian nonparametrics, generalized linear models, posterior consistency
3 0.034635253 44 jmlr-2011-Information Rates of Nonparametric Gaussian Process Methods
Author: Aad van der Vaart, Harry van Zanten
Abstract: We consider the quality of learning a response function by a nonparametric Bayesian approach using a Gaussian process (GP) prior on the response function. We upper bound the quadratic risk of the learning procedure, which in turn is an upper bound on the Kullback-Leibler information between the predictive and true data distribution. The upper bound is expressed in small ball probabilities and concentration measures of the GP prior. We illustrate the computation of the upper bound for the Mat´ rn and squared exponential kernels. For these priors the risk, and hence the e information criterion, tends to zero for all continuous response functions. However, the rate at which this happens depends on the combination of true response function and Gaussian prior, and is expressible in a certain concentration function. In particular, the results show that for good performance, the regularity of the GP prior should match the regularity of the unknown response function. Keywords: Bayesian learning, Gaussian prior, information rate, risk, Mat´ rn kernel, squared e exponential kernel
4 0.026942734 64 jmlr-2011-Minimum Description Length Penalization for Group and Multi-Task Sparse Learning
Author: Paramveer S. Dhillon, Dean Foster, Lyle H. Ungar
Abstract: We propose a framework MIC (Multiple Inclusion Criterion) for learning sparse models based on the information theoretic Minimum Description Length (MDL) principle. MIC provides an elegant way of incorporating arbitrary sparsity patterns in the feature space by using two-part MDL coding schemes. We present MIC based models for the problems of grouped feature selection (MICG ROUP) and multi-task feature selection (MIC-M ULTI). MIC-G ROUP assumes that the features are divided into groups and induces two level sparsity, selecting a subset of the feature groups, and also selecting features within each selected group. MIC-M ULTI applies when there are multiple related tasks that share the same set of potentially predictive features. It also induces two level sparsity, selecting a subset of the features, and then selecting which of the tasks each feature should be added to. Lastly, we propose a model, T RANS F EAT, that can be used to transfer knowledge from a set of previously learned tasks to a new task that is expected to share similar features. All three methods are designed for selecting a small set of predictive features from a large pool of candidate features. We demonstrate the effectiveness of our approach with experimental results on data from genomics and from word sense disambiguation problems.1 Keywords: feature selection, minimum description length principle, multi-task learning
5 0.023389323 7 jmlr-2011-Adaptive Exact Inference in Graphical Models
Author: Özgür Sümer, Umut A. Acar, Alexander T. Ihler, Ramgopal R. Mettu
Abstract: Many algorithms and applications involve repeatedly solving variations of the same inference problem, for example to introduce new evidence to the model or to change conditional dependencies. As the model is updated, the goal of adaptive inference is to take advantage of previously computed quantities to perform inference more rapidly than from scratch. In this paper, we present algorithms for adaptive exact inference on general graphs that can be used to efficiently compute marginals and update MAP configurations under arbitrary changes to the input factor graph and its associated elimination tree. After a linear time preprocessing step, our approach enables updates to the model and the computation of any marginal in time that is logarithmic in the size of the input model. Moreover, in contrast to max-product our approach can also be used to update MAP configurations in time that is roughly proportional to the number of updated entries, rather than the size of the input model. To evaluate the practical effectiveness of our algorithms, we implement and test them using synthetic data as well as for two real-world computational biology applications. Our experiments show that adaptive inference can achieve substantial speedups over performing complete inference as the model undergoes small changes over time. Keywords: exact inference, factor graphs, factor elimination, marginalization, dynamic programming, MAP computation, model updates, parallel tree contraction
6 0.02327811 45 jmlr-2011-Internal Regret with Partial Monitoring: Calibration-Based Optimal Algorithms
7 0.021741824 100 jmlr-2011-Unsupervised Supervised Learning II: Margin-Based Classification Without Labels
8 0.02169879 34 jmlr-2011-Faster Algorithms for Max-Product Message-Passing
9 0.020736678 60 jmlr-2011-Locally Defined Principal Curves and Surfaces
10 0.020290237 82 jmlr-2011-Robust Gaussian Process Regression with a Student-tLikelihood
11 0.019807851 43 jmlr-2011-Information, Divergence and Risk for Binary Experiments
12 0.018641066 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling
13 0.016601356 18 jmlr-2011-Convergence Rates of Efficient Global Optimization Algorithms
14 0.016454961 78 jmlr-2011-Producing Power-Law Distributions and Damping Word Frequencies with Two-Stage Language Models
15 0.016053181 90 jmlr-2011-The Indian Buffet Process: An Introduction and Review
16 0.015800856 55 jmlr-2011-Learning Multi-modal Similarity
17 0.015401429 39 jmlr-2011-High-dimensional Covariance Estimation Based On Gaussian Graphical Models
18 0.015271056 97 jmlr-2011-Union Support Recovery in Multi-task Learning
19 0.01495686 10 jmlr-2011-Anechoic Blind Source Separation Using Wigner Marginals
20 0.014953136 57 jmlr-2011-Learning a Robust Relevance Model for Search Using Kernel Methods
topicId topicWeight
[(0, 0.09), (1, -0.037), (2, -0.032), (3, 0.01), (4, -0.015), (5, -0.009), (6, -0.015), (7, 0.028), (8, -0.054), (9, 0.012), (10, 0.023), (11, -0.006), (12, 0.057), (13, 0.038), (14, 0.031), (15, -0.01), (16, 0.025), (17, 0.017), (18, 0.072), (19, 0.088), (20, -0.045), (21, 0.065), (22, 0.06), (23, 0.116), (24, 0.247), (25, -0.038), (26, -0.038), (27, 0.178), (28, -0.256), (29, 0.015), (30, 0.222), (31, 0.167), (32, 0.088), (33, -0.112), (34, 0.061), (35, -0.251), (36, 0.086), (37, 0.09), (38, -0.247), (39, -0.099), (40, 0.199), (41, -0.053), (42, -0.207), (43, -0.12), (44, 0.157), (45, -0.14), (46, 0.119), (47, -0.179), (48, -0.259), (49, 0.076)]
simIndex simValue paperId paperTitle
same-paper 1 0.96298999 76 jmlr-2011-Parameter Screening and Optimisation for ILP using Designed Experiments
Author: Ashwin Srinivasan, Ganesh Ramakrishnan
Abstract: Reports of experiments conducted with an Inductive Logic Programming system rarely describe how specific values of parameters of the system are arrived at when constructing models. Usually, no attempt is made to identify sensitive parameters, and those that are used are often given “factory-supplied” default values, or values obtained from some non-systematic exploratory analysis. The immediate consequence of this is, of course, that it is not clear if better models could have been obtained if some form of parameter selection and optimisation had been performed. Questions follow inevitably on the experiments themselves: specifically, are all algorithms being treated fairly, and is the exploratory phase sufficiently well-defined to allow the experiments to be replicated? In this paper, we investigate the use of parameter selection and optimisation techniques grouped under the study of experimental design. Screening and response surface methods determine, in turn, sensitive parameters and good values for these parameters. Screening is done here by constructing a stepwise regression model relating the utility of an ILP system’s hypothesis to its input parameters, using systematic combinations of values of input parameters (technically speaking, we use a two-level fractional factorial design of the input parameters). The parameters used by the regression model are taken to be the sensitive parameters for the system for that application. We then seek an assignment of values to these sensitive parameters that maximise the utility of the ILP model. This is done using the technique of constructing a local “response surface”. The parameters are then changed following the path of steepest ascent until a locally optimal value is reached. This combined use of parameter selection and response surface-driven optimisation has a long history of application in industrial engineering, and its role in ILP is demonstrated using well-known benchmarks. The results suggest that computational
2 0.44025055 24 jmlr-2011-Dirichlet Process Mixtures of Generalized Linear Models
Author: Lauren A. Hannah, David M. Blei, Warren B. Powell
Abstract: We propose Dirichlet Process mixtures of Generalized Linear Models (DP-GLM), a new class of methods for nonparametric regression. Given a data set of input-response pairs, the DP-GLM produces a global model of the joint distribution through a mixture of local generalized linear models. DP-GLMs allow both continuous and categorical inputs, and can model the same class of responses that can be modeled with a generalized linear model. We study the properties of the DP-GLM, and show why it provides better predictions and density estimates than existing Dirichlet process mixture regression models. We give conditions for weak consistency of the joint distribution and pointwise consistency of the regression estimate. Keywords: Bayesian nonparametrics, generalized linear models, posterior consistency
3 0.18438222 7 jmlr-2011-Adaptive Exact Inference in Graphical Models
Author: Özgür Sümer, Umut A. Acar, Alexander T. Ihler, Ramgopal R. Mettu
Abstract: Many algorithms and applications involve repeatedly solving variations of the same inference problem, for example to introduce new evidence to the model or to change conditional dependencies. As the model is updated, the goal of adaptive inference is to take advantage of previously computed quantities to perform inference more rapidly than from scratch. In this paper, we present algorithms for adaptive exact inference on general graphs that can be used to efficiently compute marginals and update MAP configurations under arbitrary changes to the input factor graph and its associated elimination tree. After a linear time preprocessing step, our approach enables updates to the model and the computation of any marginal in time that is logarithmic in the size of the input model. Moreover, in contrast to max-product our approach can also be used to update MAP configurations in time that is roughly proportional to the number of updated entries, rather than the size of the input model. To evaluate the practical effectiveness of our algorithms, we implement and test them using synthetic data as well as for two real-world computational biology applications. Our experiments show that adaptive inference can achieve substantial speedups over performing complete inference as the model undergoes small changes over time. Keywords: exact inference, factor graphs, factor elimination, marginalization, dynamic programming, MAP computation, model updates, parallel tree contraction
4 0.18164964 64 jmlr-2011-Minimum Description Length Penalization for Group and Multi-Task Sparse Learning
Author: Paramveer S. Dhillon, Dean Foster, Lyle H. Ungar
Abstract: We propose a framework MIC (Multiple Inclusion Criterion) for learning sparse models based on the information theoretic Minimum Description Length (MDL) principle. MIC provides an elegant way of incorporating arbitrary sparsity patterns in the feature space by using two-part MDL coding schemes. We present MIC based models for the problems of grouped feature selection (MICG ROUP) and multi-task feature selection (MIC-M ULTI). MIC-G ROUP assumes that the features are divided into groups and induces two level sparsity, selecting a subset of the feature groups, and also selecting features within each selected group. MIC-M ULTI applies when there are multiple related tasks that share the same set of potentially predictive features. It also induces two level sparsity, selecting a subset of the features, and then selecting which of the tasks each feature should be added to. Lastly, we propose a model, T RANS F EAT, that can be used to transfer knowledge from a set of previously learned tasks to a new task that is expected to share similar features. All three methods are designed for selecting a small set of predictive features from a large pool of candidate features. We demonstrate the effectiveness of our approach with experimental results on data from genomics and from word sense disambiguation problems.1 Keywords: feature selection, minimum description length principle, multi-task learning
5 0.16633831 34 jmlr-2011-Faster Algorithms for Max-Product Message-Passing
Author: Julian J. McAuley, Tibério S. Caetano
Abstract: Maximum A Posteriori inference in graphical models is often solved via message-passing algorithms, such as the junction-tree algorithm or loopy belief-propagation. The exact solution to this problem is well-known to be exponential in the size of the maximal cliques of the triangulated model, while approximate inference is typically exponential in the size of the model’s factors. In this paper, we take advantage of the fact that many models have maximal cliques that are larger than their constituent factors, and also of the fact that many factors consist only of latent variables (i.e., they do not depend on an observation). This is a common case in a wide variety of applications that deal with grid-, tree-, and ring-structured models. In such cases, we are able to decrease the exponent of complexity for message-passing by 0.5 for both exact and approximate inference. We demonstrate that message-passing operations in such models are equivalent to some variant of matrix multiplication in the tropical semiring, for which we offer an O(N 2.5 ) expected-case solution. Keywords: graphical models, belief-propagation, tropical matrix multiplication
6 0.16075654 51 jmlr-2011-Laplacian Support Vector Machines Trained in the Primal
7 0.15603583 44 jmlr-2011-Information Rates of Nonparametric Gaussian Process Methods
8 0.15057941 83 jmlr-2011-Scikit-learn: Machine Learning in Python
9 0.14492646 73 jmlr-2011-Online Learning in Case of Unbounded Losses Using Follow the Perturbed Leader Algorithm
10 0.14179304 49 jmlr-2011-Kernel Regression in the Presence of Correlated Errors
11 0.14074564 16 jmlr-2011-Clustering Algorithms for Chains
12 0.13054992 32 jmlr-2011-Exploitation of Machine Learning Techniques in Modelling Phrase Movements for Machine Translation
13 0.12628716 75 jmlr-2011-Parallel Algorithm for Learning Optimal Bayesian Network Structure
14 0.11633539 60 jmlr-2011-Locally Defined Principal Curves and Surfaces
15 0.11259023 57 jmlr-2011-Learning a Robust Relevance Model for Search Using Kernel Methods
16 0.11222373 43 jmlr-2011-Information, Divergence and Risk for Binary Experiments
17 0.1080402 96 jmlr-2011-Two Distributed-State Models For Generating High-Dimensional Time Series
18 0.10258288 71 jmlr-2011-On Equivalence Relationships Between Classification and Ranking Algorithms
19 0.09774518 58 jmlr-2011-Learning from Partial Labels
20 0.093192004 100 jmlr-2011-Unsupervised Supervised Learning II: Margin-Based Classification Without Labels
topicId topicWeight
[(4, 0.037), (9, 0.028), (10, 0.039), (24, 0.034), (31, 0.075), (32, 0.033), (41, 0.042), (60, 0.016), (71, 0.014), (73, 0.025), (78, 0.053), (90, 0.041), (95, 0.446)]
simIndex simValue paperId paperTitle
same-paper 1 0.68525183 76 jmlr-2011-Parameter Screening and Optimisation for ILP using Designed Experiments
Author: Ashwin Srinivasan, Ganesh Ramakrishnan
Abstract: Reports of experiments conducted with an Inductive Logic Programming system rarely describe how specific values of parameters of the system are arrived at when constructing models. Usually, no attempt is made to identify sensitive parameters, and those that are used are often given “factory-supplied” default values, or values obtained from some non-systematic exploratory analysis. The immediate consequence of this is, of course, that it is not clear if better models could have been obtained if some form of parameter selection and optimisation had been performed. Questions follow inevitably on the experiments themselves: specifically, are all algorithms being treated fairly, and is the exploratory phase sufficiently well-defined to allow the experiments to be replicated? In this paper, we investigate the use of parameter selection and optimisation techniques grouped under the study of experimental design. Screening and response surface methods determine, in turn, sensitive parameters and good values for these parameters. Screening is done here by constructing a stepwise regression model relating the utility of an ILP system’s hypothesis to its input parameters, using systematic combinations of values of input parameters (technically speaking, we use a two-level fractional factorial design of the input parameters). The parameters used by the regression model are taken to be the sensitive parameters for the system for that application. We then seek an assignment of values to these sensitive parameters that maximise the utility of the ILP model. This is done using the technique of constructing a local “response surface”. The parameters are then changed following the path of steepest ascent until a locally optimal value is reached. This combined use of parameter selection and response surface-driven optimisation has a long history of application in industrial engineering, and its role in ILP is demonstrated using well-known benchmarks. The results suggest that computational
2 0.25787038 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling
Author: Ricardo Henao, Ole Winther
Abstract: In this paper we consider sparse and identifiable linear latent variable (factor) and linear Bayesian network models for parsimonious analysis of multivariate data. We propose a computationally efficient method for joint parameter and model inference, and model comparison. It consists of a fully Bayesian hierarchy for sparse models using slab and spike priors (two-component δ-function and continuous mixtures), non-Gaussian latent factors and a stochastic search over the ordering of the variables. The framework, which we call SLIM (Sparse Linear Identifiable Multivariate modeling), is validated and bench-marked on artificial and real biological data sets. SLIM is closest in spirit to LiNGAM (Shimizu et al., 2006), but differs substantially in inference, Bayesian network structure learning and model comparison. Experimentally, SLIM performs equally well or better than LiNGAM with comparable computational complexity. We attribute this mainly to the stochastic search strategy used, and to parsimony (sparsity and identifiability), which is an explicit part of the model. We propose two extensions to the basic i.i.d. linear framework: non-linear dependence on observed variables, called SNIM (Sparse Non-linear Identifiable Multivariate modeling) and allowing for correlations between latent variables, called CSLIM (Correlated SLIM), for the temporal and/or spatial data. The source code and scripts are available from http://cogsys.imm.dtu.dk/slim/. Keywords: parsimony, sparsity, identifiability, factor models, linear Bayesian networks
3 0.25627944 77 jmlr-2011-Posterior Sparsity in Unsupervised Dependency Parsing
Author: Jennifer Gillenwater, Kuzman Ganchev, João Graça, Fernando Pereira, Ben Taskar
Abstract: A strong inductive bias is essential in unsupervised grammar induction. In this paper, we explore a particular sparsity bias in dependency grammars that encourages a small number of unique dependency types. We use part-of-speech (POS) tags to group dependencies by parent-child types and investigate sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graça et al. (2007). In experiments with 12 different languages, we achieve significant gains in directed attachment accuracy over the standard expectation maximization (EM) baseline, with an average accuracy improvement of 6.5%, outperforming EM by at least 1% for 9 out of 12 languages. Furthermore, the new method outperforms models based on standard Bayesian sparsity-inducing parameter priors with an average improvement of 5% and positive gains of at least 1% for 9 out of 12 languages. On English text in particular, we show that our approach improves performance over other state-of-the-art techniques.
4 0.25473276 16 jmlr-2011-Clustering Algorithms for Chains
Author: Antti Ukkonen
Abstract: We consider the problem of clustering a set of chains to k clusters. A chain is a totally ordered subset of a finite set of items. Chains are an intuitive way to express preferences over a set of alternatives, as well as a useful representation of ratings in situations where the item-specific scores are either difficult to obtain, too noisy due to measurement error, or simply not as relevant as the order that they induce over the items. First we adapt the classical k-means for chains by proposing a suitable distance function and a centroid structure. We also present two different approaches for mapping chains to a vector space. The first one is related to the planted partition model, while the second one has an intuitive geometrical interpretation. Finally we discuss a randomization test for assessing the significance of a clustering. To this end we present an MCMC algorithm for sampling random sets of chains that share certain properties with the original data. The methods are studied in a series of experiments using real and artificial data. Results indicate that the methods produce interesting clusterings, and for certain types of inputs improve upon previous work on clustering algorithms for orders. Keywords: Lloyd’s algorithm, orders, preference statements, planted partition model, randomization testing
5 0.25420398 74 jmlr-2011-Operator Norm Convergence of Spectral Clustering on Level Sets
Author: Bruno Pelletier, Pierre Pudlo
Abstract: Following Hartigan (1975), a cluster is defined as a connected component of the t-level set of the underlying density, that is, the set of points for which the density is greater than t. A clustering algorithm which combines a density estimate with spectral clustering techniques is proposed. Our algorithm is composed of two steps. First, a nonparametric density estimate is used to extract the data points for which the estimated density takes a value greater than t. Next, the extracted points are clustered based on the eigenvectors of a graph Laplacian matrix. Under mild assumptions, we prove the almost sure convergence in operator norm of the empirical graph Laplacian operator associated with the algorithm. Furthermore, we give the typical behavior of the representation of the data set into the feature space, which establishes the strong consistency of our proposed algorithm. Keywords: spectral clustering, graph, unsupervised classification, level sets, connected components
6 0.25272861 4 jmlr-2011-A Family of Simple Non-Parametric Kernel Learning Algorithms
7 0.25171092 48 jmlr-2011-Kernel Analysis of Deep Networks
8 0.2509976 12 jmlr-2011-Bayesian Co-Training
9 0.25008801 43 jmlr-2011-Information, Divergence and Risk for Binary Experiments
10 0.25006893 91 jmlr-2011-The Sample Complexity of Dictionary Learning
11 0.24980596 64 jmlr-2011-Minimum Description Length Penalization for Group and Multi-Task Sparse Learning
12 0.24802265 67 jmlr-2011-Multitask Sparsity via Maximum Entropy Discrimination
13 0.247137 56 jmlr-2011-Learning Transformation Models for Ranking and Survival Analysis
14 0.24702282 9 jmlr-2011-An Asymptotic Behaviour of the Marginal Likelihood for General Markov Models
15 0.24680911 96 jmlr-2011-Two Distributed-State Models For Generating High-Dimensional Time Series
16 0.24658489 53 jmlr-2011-Learning High-Dimensional Markov Forest Distributions: Analysis of Error Rates
17 0.24620987 29 jmlr-2011-Efficient Learning with Partially Observed Attributes
18 0.24467938 104 jmlr-2011-X-Armed Bandits
19 0.24429172 59 jmlr-2011-Learning with Structured Sparsity
20 0.24416101 7 jmlr-2011-Adaptive Exact Inference in Graphical Models