nips nips2009 nips2009-124 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Eric Garcia, Maya Gupta
Abstract: We present a new empirical risk minimization framework for approximating functions from training samples for low-dimensional regression applications where a lattice (look-up table) is stored and interpolated at run-time for an efficient implementation. Rather than evaluating a fitted function at the lattice nodes without regard to the fact that samples will be interpolated, the proposed lattice regression approach estimates the lattice to minimize the interpolation error on the given training samples. Experiments show that lattice regression can reduce mean test error by as much as 25% compared to Gaussian process regression (GPR) for digital color management of printers, an application for which linearly interpolating a look-up table is standard. Simulations confirm that lattice regression performs consistently better than the naive approach to learning the lattice. Surprisingly, in some cases the proposed method — although motivated by computational efficiency — performs better than directly applying GPR with no lattice at all. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present a new empirical risk minimization framework for approximating functions from training samples for low-dimensional regression applications where a lattice (look-up table) is stored and interpolated at run-time for an efficient implementation. [sent-7, score-1.034]
2 Rather than evaluating a fitted function at the lattice nodes without regard to the fact that samples will be interpolated, the proposed lattice regression approach estimates the lattice to minimize the interpolation error on the given training samples. [sent-8, score-2.829]
3 Experiments show that lattice regression can reduce mean test error by as much as 25% compared to Gaussian process regression (GPR) for digital color management of printers, an application for which linearly interpolating a look-up table is standard. [sent-9, score-1.352]
4 Simulations confirm that lattice regression performs consistently better than the naive approach to learning the lattice. [sent-10, score-0.938]
5 Surprisingly, in some cases the proposed method — although motivated by computational efficiency — performs better than directly applying GPR with no lattice at all. [sent-11, score-0.798]
6 1 Introduction In high-throughput regression problems, the cost of evaluating test samples is just as important as the accuracy of the regression, and most non-parametric regression techniques do not produce models that admit efficient implementation, particularly in hardware. [sent-12, score-0.455]
7 For functions with a known and bounded domain, a standard efficient approach to regression is to store a regular lattice of function values spanning the domain, then interpolate each test sample from the lattice vertices that surround it. [sent-14, score-1.755]
8 Evaluating the lattice is independent of the size of any original training set, but exponential in the dimension of the input space, making it best-suited to low-dimensional applications. [sent-15, score-0.836]
9 For applications where one begins with training data and must learn the lattice, the standard approach is to first estimate a function that fits the training data, then evaluate the estimated function at the lattice points. [sent-17, score-0.856]
10 However, this is suboptimal because the effect of interpolation from the lattice nodes is not considered when estimating the function. [sent-18, score-1.033]
11 This raises the question: can we instead learn lattice outputs that accurately reproduce the training data upon interpolation? [sent-19, score-0.888]
12 Iterative post-processing solutions that update a given lattice to reduce the post-interpolation error have been proposed by researchers in geospatial analysis [4] and digital color management [5]. [sent-20, score-1.124]
13 In this paper, we propose a solution that we term lattice regression, which jointly estimates all of the lattice outputs by minimizing the regularized interpolation error on the training data. [sent-21, score-1.835]
14 Experiments with randomly-generated functions, geospatial data, and two color management tasks show that lattice regression consistently reduces error over the standard approach of evaluating a fitted function at the lattice points, in some cases by as much as 25%. [sent-22, score-2.014]
15 More surprisingly, the proposed method can perform better than evaluating test points by Gaussian process regression using no lattice at all. [sent-23, score-0.962]
16 2 Lattice Regression The motivation behind the proposed lattice regression is to jointly choose outputs for lattice nodes that interpolate the training data accurately. [sent-24, score-1.908]
17 The key to this estimation is that the linear interpolation operation can be directly inverted to solve for the node outputs that minimize the squared error of the training data. [sent-25, score-0.34]
18 For these reasons, we add two forms of regularization to the minimization of the interpolation error. [sent-28, score-0.179]
19 In total, the proposed form of lattice regression trades off three terms: empirical risk, Laplacian regularization, and a global bias. [sent-29, score-0.932]
20 Collect the training inputs x1, . . . , xn and the training outputs yi ∈ Rp in the p × n matrix Y = [y1, . . . , yn]. [sent-35, score-0.09]
21 Consider a lattice consisting of m nodes, where m = m1 × · · · × md and mj is the number of nodes along dimension j. [sent-39, score-0.988]
22 For any x ∈ D, there are q = 2^d nodes in the lattice that form a cell (hyper-rectangle) containing x from which an output will be interpolated; denote the indices of these nodes by c1(x), . . . , cq(x). [sent-48, score-0.972]
23 For our purposes, we restrict the interpolation to be a linear combination {w1(x), . . . , wq(x)} [sent-52, score-0.155]
24 of the surrounding node outputs {b̂c1(x), . . . , b̂cq(x)}. [sent-55, score-0.091]
25 There are many interpolation methods that correspond to distinct weightings (for instance, in three dimensions: trilinear, pyramidal, or tetrahedral interpolation [6]). [sent-61, score-0.31]
26 Additionally, one might consider a higher-order interpolation technique such as tricubic interpolation, which expands the linear weighting to the nodes directly adjacent to this cell. [sent-62, score-0.292]
27 In our experiments we investigate only the case of d-linear interpolation (e.g. [sent-63, score-0.155]
28 bilinear/trilinear interpolation) because it is arguably the most popular variant of linear interpolation, can be implemented efficiently, and has the theoretical support of being the maximum entropy solution to the underdetermined linear interpolation equations [7]. [sent-65, score-0.171]
29 Given the weights {w1(x), . . . , wq(x)} corresponding to an interpolation of x, let W(x) be the m × 1 sparse vector with cj(x)th entry wj(x) for j = 1, . . . , q and zeros elsewhere. [sent-69, score-0.171]
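To make the interpolation operator concrete, the following is a minimal Python sketch (not the authors' code); the function name dlinear_weights, the grid_axes representation of node locations, and the flat C-order node indexing are assumptions made for illustration.

```python
import itertools
import numpy as np

def dlinear_weights(x, grid_axes):
    """d-linear (bilinear/trilinear/...) interpolation weights for one query point x.

    x         : length-d array, the query point
    grid_axes : list of d sorted 1-D arrays of node locations, one per dimension
    Returns (flat_node_indices, weights) for the q = 2^d corners of the cell containing x.
    """
    d = len(grid_axes)
    lo, frac = [], []
    for j, axis in enumerate(grid_axes):
        # lower corner of the cell along dimension j, clamped so boundary points still work
        i = int(np.clip(np.searchsorted(axis, x[j], side="right") - 1, 0, len(axis) - 2))
        lo.append(i)
        frac.append((x[j] - axis[i]) / (axis[i + 1] - axis[i]))

    sizes = [len(a) for a in grid_axes]
    idx, w = [], []
    for corner in itertools.product([0, 1], repeat=d):            # the 2^d cell corners
        multi = tuple(lo[j] + corner[j] for j in range(d))
        weight = np.prod([frac[j] if corner[j] else 1.0 - frac[j] for j in range(d)])
        idx.append(np.ravel_multi_index(multi, sizes))             # flat node index c_j(x)
        w.append(weight)
    return np.array(idx), np.array(w)                              # weights sum to 1
```

Stacking the sparse vectors W(x_i) for all n training points as columns gives the m × n matrix W used in the objective below.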
30 The lattice outputs B* that minimize the total squared ℓ2 distortion between the lattice-interpolated training outputs BW and the given training outputs Y are B* = arg min_B tr((BW − Y)(BW − Y)^T). [sent-80, score-1.018]
31 As a form of regularization, we propose to penalize the average squared difference of the output on adjacent lattice nodes using Laplacian regularization. [sent-83, score-0.931]
32 The graph Laplacian [8] of the lattice is fully defined by the m×m lattice adjacency matrix E where Eij = 1 for nodes directly adjacent to one another and 0 otherwise. [sent-85, score-1.705]
33 The average squared error between adjacent lattice outputs can be compactly represented as (1/Σij Eij) tr(BLB^T) = (1/Σij Eij) Σ_{k=1}^{p} Σ_{ij} Eij (Bki − Bkj)^2. [sent-88, score-0.913]
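As an illustration of this regularizer, here is a small sketch (again hypothetical, not the authors' implementation) that builds the lattice adjacency matrix E and the graph Laplacian L = D − E for a regular grid; the paper's average-difference form additionally rescales by the number of adjacent pairs.

```python
import numpy as np

def lattice_laplacian(sizes):
    """Graph Laplacian L = D - E of a regular lattice with sizes[j] nodes along dimension j.

    Nodes are indexed in flat C-order; E[i, j] = 1 when nodes i and j differ by one
    step along exactly one dimension, and 0 otherwise.
    """
    m = int(np.prod(sizes))
    E = np.zeros((m, m))
    coords = np.array(list(np.ndindex(*sizes)))        # (m, d) multi-indices of the nodes
    for j in range(len(sizes)):
        nbr = coords.copy()
        nbr[:, j] += 1                                  # neighbour one step up in dimension j
        ok = nbr[:, j] < sizes[j]                       # keep neighbours that stay on the grid
        a = np.ravel_multi_index(coords[ok].T, sizes)
        b = np.ravel_multi_index(nbr[ok].T, sizes)
        E[a, b] = E[b, a] = 1.0
    return np.diag(E.sum(axis=1)) - E

# With this L, tr(B @ L @ B.T) sums the squared differences of outputs on adjacent nodes.
```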
34 When the training data samples do not span all of the grid cells, the lattice node outputs reconstruct a clipped function. [sent-93, score-0.936]
35 In order to endow the algorithm with an improved ability to extrapolate and regularize towards trends in the data, we also include a global bias term in the lattice regression optimization. [sent-94, score-1.061]
36 The global bias term penalizes the divergence of lattice node outputs from some global function f̃ : Rd → Rp that approximates the training data; f̃ can be learned using any regression technique. [sent-95, score-1.178]
37 Given f̃, we bias the lattice regression nodes towards f̃'s predictions for the lattice nodes by minimizing the average squared deviation (1/m) tr((B − f̃(A))(B − f̃(A))^T), where the columns of A are the lattice node locations. [sent-96, score-2.021]
38 We hypothesized that the lattice regression performance would be better if f̃ was itself a good regression of the training data. [sent-97, score-1.089]
39 Surprisingly, experiments comparing an accurate f̃, an inaccurate f̃, and no bias at all showed little difference in most cases (see Section 3 for details). [sent-98, score-0.114]
40 2.4 Lattice Regression Objective Function Combined, the empirical risk minimization, Laplacian regularization, and global bias form the proposed lattice regression objective. [sent-100, score-1.071]
41 In order to adapt an appropriate mixture of these terms, the regularization parameters α and γ trade off the first-order smoothness and the divergence from the bias function, relative to the empirical risk. [sent-101, score-0.138]
42 The combined objective solves for the lattice node outputs B* = arg min_B tr( (1/n)(BW − Y)(BW − Y)^T + α BLB^T + (γ/m)(B − f̃(A))(B − f̃(A))^T ), which has the closed-form solution B* = ( (1/n) Y W^T + (γ/m) f̃(A) ) ( (1/n) W W^T + α L + (γ/m) I )^{−1}, (2) where I is the m × m identity matrix. [sent-102, score-0.859]
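A compact sketch of the full estimator follows, assuming the dlinear_weights and lattice_laplacian helpers sketched above; the function name fit_lattice, the dense (rather than sparse) W, and the zero-bias default are illustrative choices, not the authors' implementation.

```python
import numpy as np

def fit_lattice(X, Y, grid_axes, alpha=1.0, gamma=1.0, f_bias=None):
    """Closed-form lattice regression, Eq. (2).

    X : d x n training inputs, Y : p x n training outputs.
    f_bias, if given, maps the d x m matrix A of lattice node locations to the
    p x m bias predictions f~(A); otherwise a zero bias is used.
    Returns the p x m matrix of lattice node outputs B*.
    """
    sizes = [len(a) for a in grid_axes]
    m, n = int(np.prod(sizes)), X.shape[1]

    W = np.zeros((m, n))                          # column i holds the weight vector W(x_i)
    for i in range(n):
        idx, w = dlinear_weights(X[:, i], grid_axes)
        W[idx, i] = w

    L = lattice_laplacian(sizes)
    A = np.array([g.ravel() for g in np.meshgrid(*grid_axes, indexing="ij")])
    FA = f_bias(A) if f_bias is not None else np.zeros((Y.shape[0], m))

    lhs = W @ W.T / n + alpha * L + (gamma / m) * np.eye(m)
    rhs = Y @ W.T / n + (gamma / m) * FA
    return np.linalg.solve(lhs, rhs.T).T          # B* = rhs @ lhs^{-1}; lhs is symmetric
```

At run time a test point z is then evaluated as B_star[:, idx] @ w with (idx, w) = dlinear_weights(z, grid_axes), which is exactly the look-up-table interpolation step the method is built around.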
43 Note that this is a joint optimization over all lattice nodes simultaneously. [sent-103, score-0.878]
44 For a given row, the only possible non-zero entries of WW^T correspond to nodes that are adjacent in one or more dimensions, and these non-zero entries overlap with those of L. [sent-107, score-0.123]
45 3 Experiments The effectiveness of the proposed method was analyzed with simulations on randomly-generated functions and tested on a real-data geospatial regression problem as well as two real-data color management tasks. [sent-108, score-0.407]
46 For all experiments, we compared the proposed method to Gaussian process regression (GPR) applied directly to the final test points (no lattice), and to estimating test points by interpolating a lattice where the lattice nodes are learned by the same GPR. [sent-109, score-1.865]
47 For the color management task, we also compared a state-of-the-art regression method used previously for this application: local ridge regression using the enclosing k-NN neighborhood [10]. [sent-110, score-0.512]
48 The starting point for this search was set at the default parameter setting for each algorithm: λ = 1 for local ridge regression and α = 1, γ = 1 for lattice regression. [sent-119, score-0.819]
49 We considered the very coarse m = 2^d lattice formed by the corner vertices of the original lattice and solved (1) for this one-cell lattice, using the result to interpolate the full set of lattice nodes, forming f̃(A). [sent-135, score-2.381]
50 4 Note that no locality parameter is needed for this local ridge regression as the neighborhood size is automatically determined by enclosing k-NN [10]. [sent-136, score-0.192]
51 Test samples {z1, . . . , zm} were used to evaluate the accuracy of each regression method in fitting these functions. [sent-147, score-0.127]
52 For the rth randomly-generated function fr, denote the estimate of the jth test sample by a regression method as (ŷj)r. [sent-148, score-0.158]
53 For each of the 100 functions and each regression method we computed the root mean squared error (RMSE), where the mean is over the m = 10,000 test samples: er = ( (1/m) Σ_{j=1}^{m} ( fr(zj) − (ŷj)r )^2 )^{1/2}. [sent-149, score-0.204]
54 Results are shown in Fig. 2 for lattice resolutions of 5, 9 and 17 nodes per dimension. [sent-155, score-0.935]
55 Consistently across the random functions, and in all 12 experiments, lattice regression with a GPR bias performs better than applying GPR to the nodes of the lattice. [sent-162, score-1.119]
56 At coarser lattice resolutions, the choice of bias function does not appear to be as important: in 7 of the 12 experiments (all at the low end of grid resolution) lattice regression using no bias does as well or better than that using a GPR bias. [sent-163, score-1.941]
57 Interestingly, in 3 of the 12 experiments, lattice regression with a GPR bias achieves statistically significantly lower errors (albeit by a marginal average amount) than applying GPR directly to the random functions. [sent-164, score-1.089]
58 This surprising behavior is also demonstrated on the real-world datasets in the following sections; it is likely due to large extrapolations made by GPR, whereas interpolation from the lattice regularizes the estimate and reduces the overall error in these cases. [sent-165, score-0.961]
59 3.2 Geospatial Interpolation Interpolation from a lattice is a common representation for storing geospatial data (measurements tied to geographic coordinates) such as elevation, rainfall, forest cover, wind speed, etc. [sent-167, score-0.869]
60 The statistical significance (one-sided Wilcoxon test, p = 0.05) of the differences in squared error on the 367 test samples was computed for each pair of techniques. [sent-173, score-0.1]
61 In contrast to the previous section in which significance was computed on the RMSE across 100 randomly drawn functions, significance in this section indicates that one technique produced consistently lower squared error across the individual test samples. [sent-174, score-0.104]
62 Figure 3: Shown is the RMSE of the estimates given by each method for the SIC97 test samples, plotted against lattice nodes per dimension (5, 9, 17, 33, 65); compared methods are GPR direct, GPR lattice, and lattice regression (LR) with GPR bias, d-linear bias, and no bias, along with a ranking by statistical significance. [sent-175, score-1.157]
63 Compared with GPR applied to a lattice, lattice regression with a GPR bias again produces a lower RMSE on all five lattice resolutions. [sent-178, score-1.809]
64 However, for four of the five lattice resolutions, there is no performance improvement as judged by the statistical significance of the individual test errors. [sent-179, score-0.846]
65 In comparing the effectiveness of the bias term, we see that on four of five lattice resolutions, using no bias and using the d-linear bias produce consistently lower errors than both the GPR bias and GPR applied to a lattice. [sent-180, score-1.289]
66 Additionally, for finer lattice resolutions (≥ 17 nodes per dimension) lattice regression either outperforms or is not significantly worse than GPR applied directly to the test points. [sent-181, score-1.891]
67 Inspection of the maximal errors confirms the behavior posited in the previous section: that interpolation from the lattice imposes a helpful regularization. [sent-182, score-0.961]
68 The range of values produced by applying GPR directly lies within [1, 552] while those produced by lattice regression (regardless of bias) lie in the range [3, 521]; the actual values at the test samples lie in the range [0, 517]. [sent-183, score-0.979]
69 For nonlinear devices such as printers, the color mapping is commonly estimated empirically by printing a page of color patches for a set of input RGB values and measuring the printed colors with a spectrophotometer. [sent-187, score-0.387]
70 From these training pairs of (Lab, RGB) colors, one estimates the inverse mapping f : Lab → RGB that specifies what RGB inputs to send to the printer in order to reproduce a desired Lab color. [sent-188, score-0.212]
71 Furthermore, millions of pixels must be processed in approximately real-time for every image without adding undue costs for hardware, which explains the popularity of using a lattice representation for color management in both hardware and software imaging systems. [sent-192, score-0.996]
72 Figure 4: A color-managed printer system: a desired Lab color passes through the learned device characterization to produce RGB values, then through per-channel 1D LUTs (R', G', B'), then the printer, yielding the printed color (L̂, â, b̂). [sent-193, score-0.138]
73 The proposed lattice regression was tested on an HP Photosmart D7260 ink jet printer and a Samsung CLP-300 laser printer. [sent-195, score-1.095]
74 As a baseline, we compared to a state-of-the-art color regression technique used previously in this application [10]: local ridge regression (LRR) using the enclosing k-NN neighborhood. [sent-196, score-0.441]
75 For training, we printed an RGB target image with 918 color patches that span the RGB colorspace. [sent-198, score-0.136]
76 We then measured the printed color patches with an X-Rite iSis spectrophotometer using D50 illuminant at a 2◦ observer angle and UV filter. [sent-199, score-0.164]
77 As shown in Fig. 4, and as is standard practice for this application, the data for each printer is first gray-balanced using 1D calibration look-up-tables (1D LUTs) for each color channel (see [10, 13] for details). [sent-201, score-0.26]
78 We use the same 1D LUTs for all the methods compared in the experiment and these were learned for each printer using direct GPR on the training data. [sent-202, score-0.174]
79 The test errors for the regression methods on the two printers are reported in Tables 1 and 2. [sent-204, score-0.283]
80 As is common in color management, we report ∆E76 error, which is the Euclidean distance between the desired test Lab color and the Lab color that results from printing the estimated RGB output of the regression (see Fig. 4). [sent-205, score-0.57]
81 For both printers, the lattice regression methods performed best in terms of mean, median and 95 %-ile error. [sent-207, score-0.928]
82 Additionally, according to a one-sided Wilcoxon test of statistical significance with p = 0.05, the error reductions of the lattice regression methods are statistically significant. (Footnote 5) We drew 918 samples iid uniformly over the RGB cube, printed these, and measured the resulting Lab values; these Lab values were used as test samples. [sent-208, score-0.147]
83 This is a standard approach to assuring that the test samples are Lab colors that are in the achievable color gamut of the printer [10]. [sent-209, score-0.343]
84 Table 1: Samsung CLP-300 laser printer, Euclidean Lab error (mean, median, 95%-ile) by method: Local Ridge Regression (to fit lattice nodes), GPR (to fit lattice nodes), and Lattice Regression (GPR bias). [sent-210, score-0.952]
85 Table 2: HP Photosmart D7260 inkjet printer, Euclidean Lab error (mean, median, 95%-ile) for the same three methods. [sent-234, score-0.945] [sent-243, score-0.784]
87 The bold face indicates that the individual errors are statistically significantly lower than the others as judged by a one-sided Wilcoxon significance test (p = 0.05). [sent-258, score-0.112]
88 First, the two printers have rather different nonlinearities because the underlying physical mechanisms differ substantially (one is a laser printer and the other is an inkjet printer), so it is a nod towards the generality of the lattice regression that it performs best in both cases. [sent-264, score-1.205]
89 Second, the lattice is used for computational efficiency, and we were surprised to see it perform better than estimating the test samples directly using the function estimated with GPR (no lattice). [sent-265, score-0.866]
90 Third, we hypothesized (incorrectly) that better performance would result from using the more accurate global bias term formed by GPR than using the very coarse fit provided by the global trilinear bias or no bias at all. [sent-266, score-0.445]
91 4 Conclusions In this paper we noted that low-dimensional functions can be efficiently implemented as interpolation from a regular lattice and we argued that an optimal approach to learning this structure from data should take into account the effect of this interpolation. [sent-267, score-0.939]
92 We showed that, in fact, one can directly estimate the lattice nodes to minimize the empirical interpolated training error and added two regularization terms to attain smoothness and extrapolation. [sent-268, score-1.013]
93 It should be noted that, in the experiments, extrapolation beyond the training data was not directly tested: test samples for the simulated and real-data experiments were drawn mainly from within the interior of the training data. [sent-269, score-0.16]
94 Real-data experiments showed that mean error on a practical digital color management problem could be reduced by 25% using the proposed lattice regression, and that the improvement was statistically significant. [sent-270, score-1.095]
95 Simulations also showed that lattice regression was statistically significantly better than the standard approach of first fitting a function then evaluating it at the lattice points. [sent-271, score-1.743]
96 Surprisingly, although the lattice architecture is motivated by computational efficiency, both our simulated and real-data experiments showed that the proposed lattice regression can work better than state-of-the-art regression of test samples without a lattice. [sent-272, score-1.896]
97 Bala, “Iterative technique for refining color correction look-up tables,” United States Patent 5,649,072, 1997. [sent-293, score-0.122]
98 Olshen, “Nonparametric supervised learning by linear interpolation with maximum entropy,” IEEE Trans. [sent-304, score-0.155]
99 Chin, “Adaptive local linear regression with application to printer color management,” IEEE Trans. [sent-321, score-0.387]
100 Dubois, "Spatial interpolation comparison 1997: Foreword and introduction," Special Issue of the Journal of Geographic Information and Decision Analysis, vol. [sent-327, score-0.155]
wordName wordTfidf (topN-words)
[('lattice', 0.784), ('gpr', 0.402), ('interpolation', 0.155), ('printer', 0.138), ('regression', 0.127), ('color', 0.122), ('bias', 0.114), ('printers', 0.103), ('nodes', 0.094), ('lab', 0.081), ('cance', 0.08), ('rgb', 0.073), ('management', 0.071), ('digital', 0.068), ('geospatial', 0.057), ('rmse', 0.056), ('outputs', 0.054), ('lr', 0.052), ('bw', 0.046), ('icc', 0.046), ('printing', 0.046), ('trilinear', 0.046), ('resolutions', 0.043), ('lut', 0.04), ('interpolated', 0.039), ('device', 0.037), ('laplacian', 0.036), ('training', 0.036), ('ridge', 0.035), ('bala', 0.034), ('crc', 0.034), ('rainfall', 0.034), ('samsung', 0.034), ('test', 0.031), ('judged', 0.031), ('enclosing', 0.03), ('laser', 0.03), ('wilcoxon', 0.03), ('colors', 0.029), ('interpolate', 0.029), ('adjacent', 0.029), ('statistically', 0.028), ('printed', 0.028), ('eij', 0.028), ('geographic', 0.028), ('consistently', 0.027), ('gupta', 0.026), ('devices', 0.026), ('risk', 0.025), ('handbook', 0.024), ('inputs', 0.024), ('squared', 0.024), ('regularization', 0.024), ('legend', 0.023), ('blb', 0.023), ('inkjet', 0.023), ('photosmart', 0.023), ('rgpr', 0.023), ('samples', 0.023), ('errors', 0.022), ('chapter', 0.022), ('error', 0.022), ('ranking', 0.022), ('hp', 0.022), ('node', 0.021), ('global', 0.021), ('simulated', 0.02), ('pro', 0.02), ('seattle', 0.02), ('lattices', 0.02), ('luts', 0.02), ('garcia', 0.02), ('evaluating', 0.02), ('rp', 0.02), ('surprisingly', 0.019), ('imaging', 0.019), ('rasmussen', 0.019), ('splines', 0.018), ('grid', 0.018), ('median', 0.017), ('iid', 0.017), ('drew', 0.017), ('signi', 0.017), ('dimension', 0.016), ('consortium', 0.016), ('williams', 0.016), ('underdetermined', 0.016), ('wq', 0.016), ('tested', 0.016), ('si', 0.015), ('hypothesized', 0.015), ('extrapolate', 0.015), ('reproduce', 0.014), ('wa', 0.014), ('inverted', 0.014), ('simulations', 0.014), ('directly', 0.014), ('per', 0.014), ('characterization', 0.014), ('patches', 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 124 nips-2009-Lattice Regression
Author: Eric Garcia, Maya Gupta
Abstract: We present a new empirical risk minimization framework for approximating functions from training samples for low-dimensional regression applications where a lattice (look-up table) is stored and interpolated at run-time for an efficient implementation. Rather than evaluating a fitted function at the lattice nodes without regard to the fact that samples will be interpolated, the proposed lattice regression approach estimates the lattice to minimize the interpolation error on the given training samples. Experiments show that lattice regression can reduce mean test error by as much as 25% compared to Gaussian process regression (GPR) for digital color management of printers, an application for which linearly interpolating a look-up table is standard. Simulations confirm that lattice regression performs consistently better than the naive approach to learning the lattice. Surprisingly, in some cases the proposed method — although motivated by computational efficiency — performs better than directly applying GPR with no lattice at all. 1
2 0.11678847 10 nips-2009-A Gaussian Tree Approximation for Integer Least-Squares
Author: Jacob Goldberger, Amir Leshem
Abstract: This paper proposes a new algorithm for the linear least squares problem where the unknown variables are constrained to be in a finite set. The factor graph that corresponds to this problem is very loopy; in fact, it is a complete graph. Hence, applying the Belief Propagation (BP) algorithm yields very poor results. The algorithm described here is based on an optimal tree approximation of the Gaussian density of the unconstrained linear system. It is shown that even though the approximation is not directly applied to the exact discrete distribution, applying the BP algorithm to the modified factor graph outperforms current methods in terms of both performance and complexity. The improved performance of the proposed algorithm is demonstrated on the problem of MIMO detection.
3 0.077194102 58 nips-2009-Constructing Topological Maps using Markov Random Fields and Loop-Closure Detection
Author: Roy Anati, Kostas Daniilidis
Abstract: We present a system which constructs a topological map of an environment given a sequence of images. This system includes a novel image similarity score which uses dynamic programming to match images using both the appearance and relative positions of local features simultaneously. Additionally, an MRF is constructed to model the probability of loop-closures. A locally optimal labeling is found using Loopy-BP. Finally we outline a method to generate a topological map from loop closure data. Results, presented on four urban sequences and one indoor sequence, outperform the state of the art. 1
4 0.064747714 188 nips-2009-Perceptual Multistability as Markov Chain Monte Carlo Inference
Author: Samuel Gershman, Ed Vul, Joshua B. Tenenbaum
Abstract: While many perceptual and cognitive phenomena are well described in terms of Bayesian inference, the necessary computations are intractable at the scale of realworld tasks, and it remains unclear how the human mind approximates Bayesian computations algorithmically. We explore the proposal that for some tasks, humans use a form of Markov Chain Monte Carlo to approximate the posterior distribution over hidden variables. As a case study, we show how several phenomena of perceptual multistability can be explained as MCMC inference in simple graphical models for low-level vision. 1
5 0.057172664 214 nips-2009-Semi-supervised Regression using Hessian energy with an application to semi-supervised dimensionality reduction
Author: Kwang I. Kim, Florian Steinke, Matthias Hein
Abstract: Semi-supervised regression based on the graph Laplacian suffers from the fact that the solution is biased towards a constant and the lack of extrapolating power. Based on these observations, we propose to use the second-order Hessian energy for semi-supervised regression which overcomes both these problems. If the data lies on or close to a low-dimensional submanifold in feature space, the Hessian energy prefers functions whose values vary linearly with respect to geodesic distance. We first derive the Hessian energy for smooth manifolds and continue to give a stable estimation procedure for the common case where only samples of the underlying manifold are given. The preference of ‘’linear” functions on manifolds renders the Hessian energy particularly suited for the task of semi-supervised dimensionality reduction, where the goal is to find a user-defined embedding function given some labeled points which varies smoothly (and ideally linearly) along the manifold. The experimental results suggest superior performance of our method compared with semi-supervised regression using Laplacian regularization or standard supervised regression techniques applied to this task. 1
6 0.0497519 120 nips-2009-Kernels and learning curves for Gaussian process regression on random graphs
7 0.046803419 108 nips-2009-Heterogeneous multitask learning with joint sparsity constraints
8 0.044878431 64 nips-2009-Data-driven calibration of linear estimators with minimal penalties
9 0.03752907 196 nips-2009-Quantification and the language of thought
10 0.034240924 225 nips-2009-Sparsistent Learning of Varying-coefficient Models with Structural Changes
11 0.033780072 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms
12 0.031709306 91 nips-2009-Fast, smooth and adaptive regression in metric spaces
13 0.031677864 20 nips-2009-A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers
14 0.031108985 229 nips-2009-Statistical Analysis of Semi-Supervised Learning: The Limit of Infinite Unlabelled Data
15 0.030340301 169 nips-2009-Nonlinear Learning using Local Coordinate Coding
16 0.028944703 128 nips-2009-Learning Non-Linear Combinations of Kernels
17 0.028613295 37 nips-2009-Asymptotically Optimal Regularization in Smooth Parametric Models
18 0.02738848 237 nips-2009-Subject independent EEG-based BCI decoding
19 0.027344825 55 nips-2009-Compressed Least-Squares Regression
20 0.02728473 109 nips-2009-Hierarchical Learning of Dimensional Biases in Human Categorization
topicId topicWeight
[(0, -0.099), (1, 0.02), (2, -0.021), (3, 0.032), (4, -0.025), (5, -0.004), (6, 0.017), (7, -0.013), (8, 0.009), (9, -0.041), (10, -0.004), (11, 0.013), (12, 0.032), (13, -0.016), (14, 0.043), (15, 0.024), (16, -0.023), (17, 0.039), (18, -0.017), (19, -0.037), (20, -0.044), (21, -0.034), (22, 0.077), (23, -0.072), (24, 0.038), (25, 0.089), (26, 0.04), (27, 0.05), (28, -0.018), (29, 0.035), (30, 0.002), (31, 0.056), (32, -0.005), (33, -0.01), (34, -0.053), (35, 0.009), (36, 0.136), (37, 0.043), (38, 0.081), (39, 0.101), (40, -0.081), (41, -0.001), (42, -0.034), (43, -0.067), (44, -0.083), (45, 0.028), (46, -0.081), (47, 0.058), (48, 0.105), (49, -0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.91466159 124 nips-2009-Lattice Regression
Author: Eric Garcia, Maya Gupta
Abstract: We present a new empirical risk minimization framework for approximating functions from training samples for low-dimensional regression applications where a lattice (look-up table) is stored and interpolated at run-time for an efficient implementation. Rather than evaluating a fitted function at the lattice nodes without regard to the fact that samples will be interpolated, the proposed lattice regression approach estimates the lattice to minimize the interpolation error on the given training samples. Experiments show that lattice regression can reduce mean test error by as much as 25% compared to Gaussian process regression (GPR) for digital color management of printers, an application for which linearly interpolating a look-up table is standard. Simulations confirm that lattice regression performs consistently better than the naive approach to learning the lattice. Surprisingly, in some cases the proposed method — although motivated by computational efficiency — performs better than directly applying GPR with no lattice at all. 1
2 0.57782906 10 nips-2009-A Gaussian Tree Approximation for Integer Least-Squares
Author: Jacob Goldberger, Amir Leshem
Abstract: This paper proposes a new algorithm for the linear least squares problem where the unknown variables are constrained to be in a finite set. The factor graph that corresponds to this problem is very loopy; in fact, it is a complete graph. Hence, applying the Belief Propagation (BP) algorithm yields very poor results. The algorithm described here is based on an optimal tree approximation of the Gaussian density of the unconstrained linear system. It is shown that even though the approximation is not directly applied to the exact discrete distribution, applying the BP algorithm to the modified factor graph outperforms current methods in terms of both performance and complexity. The improved performance of the proposed algorithm is demonstrated on the problem of MIMO detection.
3 0.44881314 58 nips-2009-Constructing Topological Maps using Markov Random Fields and Loop-Closure Detection
Author: Roy Anati, Kostas Daniilidis
Abstract: We present a system which constructs a topological map of an environment given a sequence of images. This system includes a novel image similarity score which uses dynamic programming to match images using both the appearance and relative positions of local features simultaneously. Additionally, an MRF is constructed to model the probability of loop-closures. A locally optimal labeling is found using Loopy-BP. Finally we outline a method to generate a topological map from loop closure data. Results, presented on four urban sequences and one indoor sequence, outperform the state of the art. 1
4 0.43829784 108 nips-2009-Heterogeneous multitask learning with joint sparsity constraints
Author: Xiaolin Yang, Seyoung Kim, Eric P. Xing
Abstract: Multitask learning addresses the problem of learning related tasks that presumably share some commonalities on their input-output mapping functions. Previous approaches to multitask learning usually deal with homogeneous tasks, such as purely regression tasks, or entirely classification tasks. In this paper, we consider the problem of learning multiple related tasks of predicting both continuous and discrete outputs from a common set of input variables that lie in a highdimensional feature space. All of the tasks are related in the sense that they share the same set of relevant input variables, but the amount of influence of each input on different outputs may vary. We formulate this problem as a combination of linear regressions and logistic regressions, and model the joint sparsity as L1 /L∞ or L1 /L2 norm of the model parameters. Among several possible applications, our approach addresses an important open problem in genetic association mapping, where the goal is to discover genetic markers that influence multiple correlated traits jointly. In our experiments, we demonstrate our method in this setting, using simulated and clinical asthma datasets, and we show that our method can effectively recover the relevant inputs with respect to all of the tasks. 1
5 0.42986849 81 nips-2009-Ensemble Nystrom Method
Author: Sanjiv Kumar, Mehryar Mohri, Ameet Talwalkar
Abstract: A crucial technique for scaling kernel methods to very large data sets reaching or exceeding millions of instances is based on low-rank approximation of kernel matrices. We introduce a new family of algorithms based on mixtures of Nyström approximations, ensemble Nyström algorithms, that yield more accurate low-rank approximations than the standard Nyström method. We give a detailed study of variants of these algorithms based on simple averaging, an exponential weight method, or regression-based methods. We also present a theoretical analysis of these algorithms, including novel error bounds guaranteeing a better convergence rate than the standard Nyström method. Finally, we report results of extensive experiments with several data sets containing up to 1M points demonstrating the significant improvement over the standard Nyström approximation. 1
6 0.41816631 216 nips-2009-Sequential effects reflect parallel learning of multiple environmental regularities
7 0.39152715 64 nips-2009-Data-driven calibration of linear estimators with minimal penalties
8 0.38650891 37 nips-2009-Asymptotically Optimal Regularization in Smooth Parametric Models
9 0.37285367 92 nips-2009-Fast Graph Laplacian Regularized Kernel Learning via Semidefinite–Quadratic–Linear Programming
10 0.37235099 188 nips-2009-Perceptual Multistability as Markov Chain Monte Carlo Inference
11 0.37050942 39 nips-2009-Bayesian Belief Polarization
12 0.34934273 120 nips-2009-Kernels and learning curves for Gaussian process regression on random graphs
13 0.34197932 244 nips-2009-The Wisdom of Crowds in the Recollection of Order Information
14 0.33964843 187 nips-2009-Particle-based Variational Inference for Continuous Systems
15 0.33668104 36 nips-2009-Asymptotic Analysis of MAP Estimation via the Replica Method and Compressed Sensing
16 0.33414844 207 nips-2009-Robust Nonparametric Regression with Metric-Space Valued Output
17 0.31633809 91 nips-2009-Fast, smooth and adaptive regression in metric spaces
18 0.31038859 30 nips-2009-An Integer Projected Fixed Point Method for Graph Matching and MAP Inference
19 0.30763376 237 nips-2009-Subject independent EEG-based BCI decoding
20 0.30193987 214 nips-2009-Semi-supervised Regression using Hessian energy with an application to semi-supervised dimensionality reduction
topicId topicWeight
[(24, 0.036), (25, 0.041), (35, 0.064), (36, 0.083), (39, 0.07), (55, 0.01), (58, 0.086), (61, 0.023), (71, 0.039), (74, 0.308), (81, 0.029), (86, 0.074), (91, 0.011)]
simIndex simValue paperId paperTitle
same-paper 1 0.76701903 124 nips-2009-Lattice Regression
Author: Eric Garcia, Maya Gupta
Abstract: We present a new empirical risk minimization framework for approximating functions from training samples for low-dimensional regression applications where a lattice (look-up table) is stored and interpolated at run-time for an efficient implementation. Rather than evaluating a fitted function at the lattice nodes without regard to the fact that samples will be interpolated, the proposed lattice regression approach estimates the lattice to minimize the interpolation error on the given training samples. Experiments show that lattice regression can reduce mean test error by as much as 25% compared to Gaussian process regression (GPR) for digital color management of printers, an application for which linearly interpolating a look-up table is standard. Simulations confirm that lattice regression performs consistently better than the naive approach to learning the lattice. Surprisingly, in some cases the proposed method — although motivated by computational efficiency — performs better than directly applying GPR with no lattice at all. 1
2 0.49503708 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization
Author: Ilya Sutskever, Joshua B. Tenenbaum, Ruslan Salakhutdinov
Abstract: We consider the problem of learning probabilistic models for complex relational structures between various types of objects. A model can help us “understand” a dataset of relational facts in at least two ways, by finding interpretable structure in the data, and by supporting predictions, or inferences about whether particular unobserved relations are likely to be true. Often there is a tradeoff between these two aims: cluster-based models yield more easily interpretable representations, while factorization-based approaches have given better predictive performance on large data sets. We introduce the Bayesian Clustered Tensor Factorization (BCTF) model, which embeds a factorized representation of relations in a nonparametric Bayesian clustering framework. Inference is fully Bayesian but scales well to large data sets. The model simultaneously discovers interpretable clusters and yields predictive performance that matches or beats previous probabilistic models for relational data.
3 0.49308971 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA
Author: Piyush Rai, Hal Daume
Abstract: Canonical Correlation Analysis (CCA) is a useful technique for modeling dependencies between two (or more) sets of variables. Building upon the recently suggested probabilistic interpretation of CCA, we propose a nonparametric, fully Bayesian framework that can automatically select the number of correlation components, and effectively capture the sparsity underlying the projections. In addition, given (partially) labeled data, our algorithm can also be used as a (semi)supervised dimensionality reduction technique, and can be applied to learn useful predictive features in the context of learning a set of related tasks. Experimental results demonstrate the efficacy of the proposed approach for both CCA as a stand-alone problem, and when applied to multi-label prediction. 1
4 0.49120092 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling
Author: Lei Shi, Thomas L. Griffiths
Abstract: The goal of perception is to infer the hidden states in the hierarchical process by which sensory data are generated. Human behavior is consistent with the optimal statistical solution to this problem in many tasks, including cue combination and orientation detection. Understanding the neural mechanisms underlying this behavior is of particular importance, since probabilistic computations are notoriously challenging. Here we propose a simple mechanism for Bayesian inference which involves averaging over a few feature detection neurons which fire at a rate determined by their similarity to a sensory stimulus. This mechanism is based on a Monte Carlo method known as importance sampling, commonly used in computer science and statistics. Moreover, a simple extension to recursive importance sampling can be used to perform hierarchical Bayesian inference. We identify a scheme for implementing importance sampling with spiking neurons, and show that this scheme can account for human behavior in cue combination and the oblique effect. 1
5 0.49005869 145 nips-2009-Manifold Embeddings for Model-Based Reinforcement Learning under Partial Observability
Author: Keith Bush, Joelle Pineau
Abstract: Interesting real-world datasets often exhibit nonlinear, noisy, continuous-valued states that are unexplorable, are poorly described by first principles, and are only partially observable. If partial observability can be overcome, these constraints suggest the use of model-based reinforcement learning. We experiment with manifold embeddings to reconstruct the observable state-space in the context of offline, model-based reinforcement learning. We demonstrate that the embedding of a system can change as a result of learning, and we argue that the best performing embeddings well-represent the dynamics of both the uncontrolled and adaptively controlled system. We apply this approach to learn a neurostimulation policy that suppresses epileptic seizures on animal brain slices. 1
6 0.48884109 70 nips-2009-Discriminative Network Models of Schizophrenia
7 0.4877547 104 nips-2009-Group Sparse Coding
8 0.48606819 30 nips-2009-An Integer Projected Fixed Point Method for Graph Matching and MAP Inference
9 0.48535436 41 nips-2009-Bayesian Source Localization with the Multivariate Laplace Prior
10 0.48350626 3 nips-2009-AUC optimization and the two-sample problem
11 0.48322928 254 nips-2009-Variational Gaussian-process factor analysis for modeling spatio-temporal data
12 0.48270857 148 nips-2009-Matrix Completion from Power-Law Distributed Samples
13 0.48193008 113 nips-2009-Improving Existing Fault Recovery Policies
14 0.48148161 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference
15 0.48090038 174 nips-2009-Nonparametric Latent Feature Models for Link Prediction
16 0.48081025 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models
17 0.47989058 100 nips-2009-Gaussian process regression with Student-t likelihood
18 0.47966605 38 nips-2009-Augmenting Feature-driven fMRI Analyses: Semi-supervised learning and resting state activity
19 0.47956318 137 nips-2009-Learning transport operators for image manifolds
20 0.47938085 169 nips-2009-Nonlinear Learning using Local Coordinate Coding