nips nips2012 nips2012-304 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Abhimanyu Das, Anirban Dasgupta, Ravi Kumar
Abstract: We study the problem of diverse feature selection in linear regression: selecting a small subset of diverse features that can predict a given objective. Diversity is useful for several reasons such as interpretability, robustness to noise, etc. We propose several spectral regularizers that capture a notion of diversity of features and show that these are all submodular set functions. These regularizers, when added to the objective function for linear regression, result in approximately submodular functions, which can then be maximized by efficient greedy and local search algorithms, with provable guarantees. We compare our algorithms to traditional greedy and ℓ1-regularization schemes and show that we obtain a more diverse set of features that result in the regression problem being stable under perturbations.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: We study the problem of diverse feature selection in linear regression: selecting a small subset of diverse features that can predict a given objective. [sent-6, score-0.622]
2 We propose several spectral regularizers that capture a notion of diversity of features and show that these are all submodular set functions. [sent-8, score-1.008]
3 These regularizers, when added to the objective function for linear regression, result in approximately submodular functions, which can then be maximized by efficient greedy and local search algorithms, with provable guarantees. [sent-9, score-0.625]
4 We compare our algorithms to traditional greedy and ℓ1-regularization schemes and show that we obtain a more diverse set of features that result in the regression problem being stable under perturbations. [sent-10, score-0.511]
5 In the first, features are greedily selected one by one up to the pre-specified budget k; the Forward or Backward greedy methods[19] fall into this type. [sent-18, score-0.32]
6 In the second, the feature selection process is intimately coupled with the regression objective itself by adding a (usually convex) regularizer. [sent-19, score-0.239]
7 For example, the Lasso [20] uses the ℓ1-norm of the coefficients as a regularizer to promote sparsity. [sent-20, score-0.218]
8 In this work we consider the feature selection problem of choosing the best set of features for predicting a specified target, coupled with the desire to choose as “diverse” features as possible; our goal will be to construct a regularizer that can capture diversity. [sent-21, score-0.543]
9 Secondly, as we show, the right notion of diversity can also make the feature selection task resistant to noise in the data. [sent-26, score-0.462]
10 Unfortunately, the traditional greedy and ℓ1-relaxation approaches to feature selection do not explicitly address feature diversity. [sent-32, score-0.258]
11 In this paper, we address this problem of diverse feature selection using an approach that falls between that of greedy methods and convex-regularization methods. [sent-33, score-0.502]
12 In particular, we construct regularizers that capture a notion of diversity — unlike regularizers such as Lasso, our regularizers are set functions as opposed to functions of the regression coefficient vector. [sent-34, score-0.972]
13 We then design provable approximation algorithms for such objectives using a combination of greedy and local search techniques. [sent-36, score-0.31]
14 By defining diversity to be a carefully chosen function of the spectrum of the chosen features, we tap into notions of submodularity and consequently into the rich literature for maximizing submodular functions [5, 9, 14]. [sent-38, score-0.878]
15 Our contributions are as follows: (i) We formulate an optimization problem for diverse feature selection and construct a family of submodular spectral regularizers that capture diversity notions. [sent-39, score-1.196]
16 (ii) We use a novel approach of combining the diversity regularizers with the optimization objective to obtain (approximately) submodular maximization problems, and optimize them using greedy and local search algorithms with provable guarantees. [sent-40, score-1.087]
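For concreteness, the sketch below shows the shape of this combined objective and a plain greedy maximizer in Python. It is illustrative only and not the paper's code: the assumption that the target z is centered and unit-norm, the trade-off weight nu, and the helper names are all assumptions introduced here.

```python
import numpy as np

def r2(X, z, S):
    """R^2 of regressing z on the columns of X indexed by S; assumes z is centered and unit-norm."""
    if not S:
        return 0.0
    XS = X[:, list(S)]
    coef, *_ = np.linalg.lstsq(XS, z, rcond=None)
    resid = z - XS @ coef
    return 1.0 - float(resid @ resid)

def greedy_diverse_selection(X, z, f, k, nu=1.0):
    """Greedily maximize g(S) = R^2(S) + nu * f(X, S) over subsets of at most k features."""
    S = []
    for _ in range(min(k, X.shape[1])):
        candidates = [(r2(X, z, S + [j]) + nu * f(X, S + [j]), j)
                      for j in range(X.shape[1]) if j not in S]
        _, best_j = max(candidates)
        S.append(best_j)
    return S
```

Here f can be any non-negative set-function regularizer; Section 5 instantiates it with the spectral regularizers fde, fgr, and fsv.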
17 2 Related work Feature selection and the closely related problems of sparse approximation/recovery have been extensively studied using two broad classes of methods: greedy [5, 19, 21, 11, 24] and convex relaxation [20, 25, 3, 22, 8]. [sent-42, score-0.271]
18 None of these methods, however, takes feature diversity into account during selection. [sent-43, score-0.333]
19 However, they do not incorporate any notion of feature diversity; they also require monotonicity, which does not hold for several regularizers we construct. [sent-45, score-0.311]
20 Also related is [12], which addresses the unstable behavior of Lasso in the presence of correlated features and proposes adding a trace norm regularizer to the error objective. [sent-47, score-0.359]
21 Previous work on diverse feature selection includes greedy heuristics for trading off information-theoretic feature relevance and feature redundancy criteria when selecting features [7, 23]. [sent-50, score-0.794]
22 There has been some work on selecting a diverse set of features to maximize the mutual information or the entropy of a set of variables [13, 17]. [sent-52, score-0.342]
23 But, the problem definition in these works does not specify a target prediction vector or variable; the goal instead is to select diverse features regardless of whether the features are relevant for predicting a particular target variable. [sent-53, score-0.423]
24 On the other hand, our work requires us to simultaneously optimize for both feature selection and diversity objectives. [sent-54, score-0.418]
25 For an n-dimensional Gaussian random vector v with covariance matrix C, we use H(v) = (1/2) log((2πe)^n det(C)) to denote the differential entropy of v. [sent-71, score-0.184]
26 Note that diversity is not a uniquely defined notion; however, we call a regularizer f diversity-promoting if the following two conditions are satisfied: for a fixed k, f(S) is maximized when S is an orthogonal set of vectors and is minimized when S has the lowest rank, where |S| ≤ k. [sent-75, score-0.523]
27 Das and Kempe [5] introduced the notion of submodularity ratio for a general set function to capture how close the function is to being submodular. [sent-84, score-0.274]
28 The submodularity ratio of f with respect to a set U and a parameter k ≥ 1 is γ_{U,k}(f) = min_{L⊆U, S:|S|≤k, S∩L=∅} [ Σ_{x∈S} (f(L ∪ {x}) − f(L)) ] / [ f(L ∪ S) − f(L) ]. [sent-86, score-0.23]
29 In particular, [5] defines the submodularity ratio for the R2 function and relates it to the smallest eigenvalue of the covariance matrix of the data. [sent-88, score-0.284]
30 They also show that, in practice, the submodularity ratio for R2 is often quite close to 1, and hence a greedy algorithm is a good approximation to maximizing R2 subject to a cardinality constraint. [sent-89, score-0.507]
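For very small ground sets the submodularity ratio can also be checked directly by brute force; the sketch below is an illustrative check of the definition above (the function signature and the tolerance are assumptions), feasible only for a handful of features.

```python
from itertools import combinations

def submodularity_ratio(f, U, ground, k, tol=1e-12):
    """Brute-force gamma_{U,k}(f): minimize the ratio of summed singleton gains
    to the joint gain, over L a subset of U and disjoint S of size at most k."""
    gamma = float("inf")
    for r in range(len(U) + 1):
        for L in map(set, combinations(U, r)):
            rest = [x for x in ground if x not in L]
            for s in range(1, k + 1):
                for S in map(set, combinations(rest, s)):
                    joint = f(L | S) - f(L)
                    if joint <= tol:          # skip degenerate denominators
                        continue
                    singles = sum(f(L | {x}) - f(L) for x in S)
                    gamma = min(gamma, singles / joint)
    return gamma
```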
31 1 Robustness to perturbations As mentioned earlier, in addition to providing better interpretability, another benefit of diverse feature selection is robustness to feature and label perturbations. [sent-93, score-0.454]
32 Given a selected subset S, we now obtain a connection between the robustness of the estimated regression coefficients and the spectrum of CS , in the presence of noise. [sent-94, score-0.253]
33 We show the following perturbation result relating the differential entropy of the perturbation error in the regression coefficients to the spectrum of CS . [sent-107, score-0.456]
34 Thus the perturbation error entropy is minimized by maximizing Σ_{i=1}^{k} log(λ_i(C_S)), which motivates the smoothed differential-entropy regularizer used in Section 5. [sent-114, score-0.5]
35 We can also show (supplementary material) that the two-norm of the perturbation error in the regression coefficients is also related to the spectrum of C_S: the expected noise in the regression coefficients depends on the sum of the eigenvalues of C_S^{-1}. [sent-116, score-0.367]
36 This suggests the use of −Σ_i λ_i(C_S)^{-1} as a diversity-promoting regularizer in Definition 1. [sent-117, score-0.218]
37 Unfortunately, this regularization function is not submodular and is thus hard to use directly. [sent-118, score-0.371]
38 However, as we show in Section 5, there are other related spectral functions that are indeed submodular and can thus be used as efficient regularizers. [sent-121, score-0.424]
39 4 Algorithms In this section we present a greedy and local-search based (GLS) approximation algorithm for solving (1) when f (S) is a non-negative (but not necessarily monotone) submodular function (w. [sent-122, score-0.573]
40 Next, we modify a local-search based algorithm for unconstrained submodular maximization to give an approximation of argmaxS g(S) (Theorem 7). [sent-129, score-0.449]
41 The greedy Forward Regression (FR) algorithm is the following. [sent-131, score-0.186]
42 Theorem 5 For any set T such that |T| ≤ k, the set S selected by the greedy FR algorithm satisfies g(S) = R²_Z(S) + f(S) ≥ (1 − e^(−γ_{S,2k}/2)) g(S ∪ T). [sent-139, score-0.236]
43 We now present the greedy and local search (GLS) algorithm for solving (1) for any submodular, non-monotone, non-negative regularizer. [sent-161, score-0.21]
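The sketch below gives the flavor of such a greedy-plus-local-search procedure in Python; it is a generic illustration (the improvement threshold eps and the exact move set are assumptions), not a transcription of the algorithm analyzed in Theorems 5–9.

```python
def gls(ground, g, k, eps=1e-3):
    """Greedy forward selection followed by add/remove local search on a non-negative set function g."""
    ground = set(ground)
    S = set()
    # Greedy FR phase: repeatedly add the element with the largest marginal gain.
    for _ in range(k):
        gains = [(g(S | {j}) - g(S), j) for j in ground - S]
        if not gains:
            break
        gain, j = max(gains)
        if gain <= 0:
            break
        S.add(j)
    # Local-search phase: accept single add/remove moves that improve g by a (1 + eps) factor.
    improved = True
    while improved:
        improved = False
        for j in sorted(ground - S):
            if len(S) < k and g(S | {j}) > (1 + eps) * g(S):
                S.add(j)
                improved = True
        for j in sorted(S):
            if g(S - {j}) > (1 + eps) * g(S):
                S.remove(j)
                improved = True
    return S
```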
44 Using the submodularity of f and the monotonicity of R²_Z(S), we obtain g(S1 ∪ C) + g(S2 ∪ C′) = R²_Z(S1 ∪ C) + R²_Z(S2 ∪ C′) + f(S1 ∪ C) + f(S2 ∪ C′) ≥ R²_Z(C) + f(S1 ∪ S2 ∪ C) + f(C′). [sent-172, score-0.243]
45 Theorem 9 If f is non-negative and submodular and ε < n/4, the set S̃ selected by the GLS algorithm gives a (1 − e^(−γ_{S̃,2k}/2)) / (2 + (1 − e^(−γ_{S̃,2k}/2))(4 + 4ε/n)) ≥ (1 − e^(−γ_{S̃,2k}/2))/7 approximation for solving argmax_{S:|S|≤k} g(S). [sent-175, score-0.437]
46 2 When f (S) is a monotone, non-negative, submodular function, the problem becomes much easier due to the proposition below that follows directly from the definition of the submodularity ratio. [sent-186, score-0.53]
47 Proposition 10 For any submodular set function f(S), the function g(S) = R²_Z(S) + f(S) satisfies γ_{U,k}(g) ≥ γ_{U,k}(R²) for any U and k. [sent-187, score-0.337]
48 Thus, since g(S) is monotone and approximately submodular, we can directly apply [4, Theorem 3] to show that the greedy FR algorithm gives a (1 − e^(−γ_{S̃,k}(f)))-approximation. [sent-188, score-0.327]
49 5 Spectral regularizers for diversity In this section we propose a number of diversity-promoting regularizers for the feature selection problem. [sent-189, score-0.808]
50 We then prove that our algorithms in the previous section can obtain provable guarantees for each of the corresponding regularized feature selection problems. [sent-190, score-0.229]
51 Most of our analysis requires the notion of operator antitone function [1] and its connection with submodularity that was recently obtained by Friedland and Gaubert [10]. [sent-191, score-0.379]
52 If the derivative of f is operator antitone on the interior of Γ, then for every n × n Hermitian matrix C with spectrum in Γ, the set function (from 2^[n] → R) tr(f(S)) = Σ_{i=1}^{|S|} f(λ_i(C_S)) is submodular. [sent-194, score-0.244]
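The sketch below is a small numerical sanity check of this fact for the smoothed log function used in the next subsection; the random data, the value of delta, and the ground-set size are assumptions chosen only to keep the exhaustive check cheap, and this is an illustration rather than a proof.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
X /= np.linalg.norm(X, axis=0)          # unit-norm feature columns
C = X.T @ X                             # Gram / covariance matrix

def f(S, delta=0.1):
    """Spectral trace function sum_i log2(delta + lambda_i(C_S))."""
    if not S:
        return 0.0
    CS = C[np.ix_(sorted(S), sorted(S))]
    return float(np.sum(np.log2(delta + np.linalg.eigvalsh(CS))))

ground = set(range(6))
ok = True
for b in range(len(ground) + 1):
    for B in map(set, combinations(ground, b)):
        for a in range(len(B) + 1):
            for A in map(set, combinations(B, a)):
                for x in ground - B:
                    # Diminishing returns: the gain of x at A is at least its gain at B.
                    ok &= f(A | {x}) - f(A) >= f(B | {x}) - f(B) - 1e-9
print("diminishing returns holds on this sample:", ok)
```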
53 1 Smoothed differential entropy regularizer For any set S with the corresponding covariance matrix C_S, we define the smoothed differential entropy regularizer as fde(S) = Σ_{i=1}^{|S|} log2(δ + λ_i(C_S)) − 3k log2 δ, where δ > 0 is the smoothing constant. [sent-199, score-1.032]
54 This is a smoothed version of the log-determinant function fld(S) = log(det(C_S)) = Σ_{i=1}^{|S|} log(λ_i(C_S)), which is also normalized by an additive term of 3k log2 δ in order to make the regularizer non-negative. [sent-200, score-0.367]
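A direct implementation of fde is sketched below. It is illustrative only and assumes X holds unit-norm feature columns so that C_S is their Gram matrix; the argument names and the default δ are assumptions.

```python
import numpy as np

def f_de(X, S, k, delta=0.1):
    """Smoothed differential entropy regularizer:
    sum_i log2(delta + lambda_i(C_S)) - 3 * k * log2(delta)."""
    offset = -3 * k * np.log2(delta)     # positive for delta < 1; keeps the value non-negative
    if not S:
        return float(offset)
    XS = X[:, sorted(S)]
    eig = np.linalg.eigvalsh(XS.T @ XS)
    return float(np.sum(np.log2(delta + eig)) + offset)
```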
55 As shown in Lemma 4, this regularizer also helps improve the robustness of the regression model to noise since maximizing fld (S) minimizes the entropy of the perturbation error. [sent-201, score-0.627]
56 For a multivariate Gaussian distribution, fld(S) is also equivalent (up to an additive factor depending on |S|) to the differential entropy of S. [sent-202, score-0.218]
57 However, fld (S) is undefined if S is rank-deficient and might also take negative values; the smoothed version fde (S) overcomes these issues. [sent-203, score-0.37]
58 It is also easy to show that fde (S) is a diversitypromoting regularizer. [sent-204, score-0.265]
59 We now show that the GLS algorithm to solve (1) with f (S) = fde (S) gives a constant-factor approximation algorithm. [sent-205, score-0.271]
60 Theorem 14 The set S̃ selected by the GLS algorithm gives a (1 − e^(−γ_{S̃,2k}/2))/7 multiplicative approximation guarantee for (1) using the smoothed differential entropy regularizer fde(S). [sent-206, score-0.793]
61 We first prove that fde (S) is non-negative and submodular. [sent-208, score-0.243]
62 Since h(t) is the derivative of f (t), a straightforward application of Theorem 12 gives us that fde (S) is submodular. [sent-216, score-0.247]
63 By Proposition 10, we obtain that g(S) is approximately submodular, with submodularity ratio at least γS,k (R2 ) . [sent-217, score-0.258]
64 Notice that since fde (S) is not monotone in general [13], we cannot use Theorem 3. [sent-219, score-0.334]
65 However, in the case when δ ≥ 1, a simple application of Lemma 13 shows that fde (S) becomes monotonically increasing and we can then use Theorem 3 to obtain a tighter approximation bound. [sent-220, score-0.298]
66 2 Generalized rank regularizer For any set S with covariance matrix C_S, and constant α such that 0 ≤ α ≤ 1, we define the generalized rank regularizer as fgr(S) = Σ_{i=1}^{|S|} λ_i(C_S)^α. [sent-222, score-0.805]
67 The rank function, however, does not discriminate between a full-rank matrix and an orthogonal matrix, and hence we define fgr(S) as a generalization of the rank function. [sent-224, score-0.345]
68 It is easy to show that fgr (S) is diversity-promoting. [sent-225, score-0.199]
69 We prove that fgr (S) is also monotone and submodular, and hence obtain approximation guarantees for the greedy FR algorithm for (1) with f (S) = fgr (S). [sent-226, score-0.769]
70 Theorem 15 The set S̃ selected by the greedy FR algorithm gives a (1 − e^(−γ_{S̃,k}(R²))) multiplicative approximation guarantee for (1) using the generalized rank regularizer fgr(S). [sent-227, score-0.824]
71 (Footnote: we need this regularizer to be non-negative for sets of size up to 3k, because of the use of f(S1 ∪ S2 ∪ C) in the proof of Lemma 8.) Proof. [sent-228, score-0.218]
72 Thus, Theorem 12 gives us that fgr (S) is submodular. [sent-231, score-0.199]
73 Hence, by applying Lemma 10, we obtain that g(S) is an approximately submodular function, with submodularity ratio at least γ_{S̃,k}(R²). [sent-232, score-0.621]
74 ˜ Thus, using Lemma 13, we get that fgr (S) and consequently g(S) is a monotonically increasing set function. [sent-234, score-0.226]
75 3 Spectral variance regularizer For a set S with covariance matrix C_S, we define the spectral variance regularizer as −Σ_{i=1}^{|S|} (λ_i(C_S) − 1)². [sent-237, score-0.623]
76 For non-negativity, we add a constant 9k² term to the regularizer and define fsv(S) = 9k² − Σ_{i=1}^{|S|} (λ_i(C_S) − 1)². [sent-239, score-0.284]
77 As with fde (S), we can show (proof relegated to the supplementary material) that fsv (S) is submodular, but it is not monotonically increasing in general. [sent-240, score-0.314]
78 Theorem 16 The set S̃ selected by the GLS algorithm gives a (1 − e^(−γ_{S̃,2k}/2))/7 multiplicative approximation guarantee for (1) using the spectral variance regularizer fsv(S). [sent-242, score-0.474]
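Both of these regularizers reduce to one-line computations on the eigenvalues of C_S; the sketch below is illustrative (same unit-norm-column assumption on X as before, and the default α is an assumption).

```python
import numpy as np

def _eigs(X, S):
    XS = X[:, sorted(S)]
    return np.linalg.eigvalsh(XS.T @ XS)

def f_gr(X, S, alpha=0.5):
    """Generalized rank regularizer: sum_i lambda_i(C_S)^alpha with 0 <= alpha <= 1."""
    if not S:
        return 0.0
    return float(np.sum(np.clip(_eigs(X, S), 0.0, None) ** alpha))

def f_sv(X, S, k):
    """Spectral variance regularizer: 9*k**2 - sum_i (lambda_i(C_S) - 1)**2."""
    if not S:
        return 9.0 * k * k
    return float(9.0 * k * k - np.sum((_eigs(X, S) - 1.0) ** 2))
```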
79 We compare our approach against two baselines: Lasso and greedy FR. [sent-244, score-0.186]
80 We use two baselines: lasso and no-reg, the greedy FR with no regularizer. [sent-253, score-0.347]
81 If S is the set of features selected, then the unperturbed regression coefficients are defined as α = C_S^{-1} X_S^T Z, and the perturbed coefficients as α′ = C_S^{-1} X_S^T Z′. [sent-264, score-0.236]
82 Here again, the perturbed regression coefficients are α′ = (C′_S)^{-1} (X′_S)^T y, where C′_S = (X′_S)^T X′_S, and the error is measured as ‖α − α′‖_2. [sent-269, score-0.176]
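The label-perturbation measurement can be reproduced schematically as below; this is an illustrative sketch only, and the Gaussian noise model, the noise level sigma, and the use of a pseudo-inverse are assumptions.

```python
import numpy as np

def coefficient_perturbation_error(X, z, S, sigma=0.1, trials=20, seed=0):
    """Average ||alpha - alpha'||_2 between coefficients fit on clean and noisy targets."""
    rng = np.random.default_rng(seed)
    XS = X[:, sorted(S)]
    CS_inv = np.linalg.pinv(XS.T @ XS)
    alpha = CS_inv @ XS.T @ z
    errors = []
    for _ in range(trials):
        z_noisy = z + sigma * rng.standard_normal(z.shape)
        errors.append(np.linalg.norm(alpha - CS_inv @ XS.T @ z_noisy))
    return float(np.mean(errors))
```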
83 For clarity of presentation, we have only shown the results of greedy FR for monotone regularizers (ld and gr) and GLS for nonmonotone (ld-0. [sent-274, score-0.494]
[Figure 1: All plots on mnist data. The panels plot results against the number of features selected, comparing lasso, no-reg, and the logdet, spec-var, and gen-rank regularized methods.]
86 We chose the regularization constant (ν) to be the maximum subject to the condition that the R2 value for that solution should be greater than that obtained by the lasso solution with corresponding sparsity. [sent-339, score-0.195]
87 This ensures we are not sacrificing diversity for solution quality. [sent-340, score-0.261]
88 As is obvious from the figure, the coefficient vector obtained by lasso is very susceptible to perturbation, and the effect of perturbation increases with the number of features used by lasso. [sent-342, score-0.332]
89 This indicates that as lasso starts incorporating more features, it does not ensure that the features are diverse enough so as to be robust to perturbation. [sent-343, score-0.404]
90 Greedy with no regularization seems more stable than lasso but still shows an increasing trend. [sent-344, score-0.195]
91 On the other hand, the error obtained under perturbation is much smaller for any of the regularizers, and increases only very mildly with k: it does not seem to matter which regularizer we employ. [sent-345, score-0.282]
92 In both cases, using any of the regularizers we are able to pick a set of features that are more robust to perturbation. [sent-347, score-0.279]
93 Figures 1(c)- 1(f) show that our features are also more diverse than the ones obtained by both lasso and no-reg. [sent-348, score-0.404]
94 Since there is no single definition of diversity, in each of the plots we report the diversity value corresponding to one of the four regularizers we use. [sent-349, score-0.456]
95 In order to be able to compare, the regularizer values for each k are normalized by the maximum value possible for that k. [sent-350, score-0.218]
96 For each of the plots we show the values of the diversity value for solutions at different levels of sparsity. [sent-351, score-0.261]
97 It is obvious that we get more diverse solutions than both lasso and no-reg. [sent-352, score-0.32]
98 7 Conclusions In this paper we proposed submodular spectral regularizers for diverse feature selection and obtained efficient approximation algorithms using greedy and local search algorithms. [sent-354, score-1.195]
99 Minimum redundancy feature selection from microarray gene expression data. [sent-396, score-0.219]
100 Adaptive forward-backward greedy algorithm for sparse learning with linear models. [sent-494, score-0.186]
wordName wordTfidf (topN-words)
[('cs', 0.341), ('submodular', 0.337), ('diversity', 0.261), ('fde', 0.221), ('regularizer', 0.218), ('fgr', 0.199), ('regularizers', 0.195), ('submodularity', 0.193), ('greedy', 0.186), ('rz', 0.186), ('gls', 0.177), ('lasso', 0.161), ('diverse', 0.159), ('logdet', 0.144), ('fr', 0.118), ('monotone', 0.113), ('ls', 0.102), ('argmaxs', 0.097), ('gen', 0.097), ('reg', 0.096), ('antitone', 0.088), ('fld', 0.088), ('perturbation', 0.087), ('spectral', 0.087), ('selection', 0.085), ('features', 0.084), ('regression', 0.082), ('opt', 0.079), ('spec', 0.078), ('feature', 0.072), ('perturbed', 0.07), ('entropy', 0.069), ('fsv', 0.066), ('perturbing', 0.064), ('hermitian', 0.064), ('differential', 0.061), ('smoothed', 0.061), ('lemma', 0.061), ('das', 0.059), ('rank', 0.058), ('cients', 0.056), ('xs', 0.055), ('coef', 0.055), ('operator', 0.054), ('theorem', 0.051), ('provable', 0.05), ('monotonicity', 0.05), ('selected', 0.05), ('approximation', 0.05), ('interpretability', 0.049), ('target', 0.048), ('det', 0.046), ('eigenvalues', 0.046), ('spectrum', 0.046), ('notion', 0.044), ('clash', 0.044), ('cpam', 0.044), ('diversitypromoting', 0.044), ('mnist', 0.043), ('robustness', 0.042), ('maximizing', 0.041), ('kempe', 0.039), ('friedland', 0.039), ('grave', 0.039), ('bs', 0.037), ('ratio', 0.037), ('anirban', 0.036), ('redundancy', 0.034), ('regularization', 0.034), ('maximization', 0.034), ('correlated', 0.033), ('multiplicative', 0.033), ('subset', 0.033), ('perturb', 0.032), ('selecting', 0.03), ('matrix', 0.03), ('guarantee', 0.03), ('approximately', 0.028), ('xj', 0.028), ('mountain', 0.028), ('microarray', 0.028), ('unconstrained', 0.028), ('monotonically', 0.027), ('ld', 0.026), ('gr', 0.026), ('nition', 0.026), ('semide', 0.026), ('derivative', 0.026), ('sv', 0.026), ('material', 0.025), ('si', 0.025), ('perturbations', 0.024), ('error', 0.024), ('yahoo', 0.024), ('search', 0.024), ('covariance', 0.024), ('cov', 0.023), ('var', 0.023), ('variance', 0.023), ('prove', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 304 nips-2012-Selecting Diverse Features via Spectral Regularization
Author: Abhimanyu Das, Anirban Dasgupta, Ravi Kumar
Abstract: We study the problem of diverse feature selection in linear regression: selecting a small subset of diverse features that can predict a given objective. Diversity is useful for several reasons such as interpretability, robustness to noise, etc. We propose several spectral regularizers that capture a notion of diversity of features and show that these are all submodular set functions. These regularizers, when added to the objective function for linear regression, result in approximately submodular functions, which can then be maximized by efficient greedy and local search algorithms, with provable guarantees. We compare our algorithms to traditional greedy and 1 -regularization schemes and show that we obtain a more diverse set of features that result in the regression problem being stable under perturbations. 1
2 0.22361772 236 nips-2012-Near-Optimal MAP Inference for Determinantal Point Processes
Author: Jennifer Gillenwater, Alex Kulesza, Ben Taskar
Abstract: Determinantal point processes (DPPs) have recently been proposed as computationally efficient probabilistic models of diverse sets for a variety of applications, including document summarization, image search, and pose estimation. Many DPP inference operations, including normalization and sampling, are tractable; however, finding the most likely configuration (MAP), which is often required in practice for decoding, is NP-hard, so we must resort to approximate inference. This optimization problem, which also arises in experimental design and sensor placement, involves finding the largest principal minor of a positive semidefinite matrix. Because the objective is log-submodular, greedy algorithms have been used in the past with some empirical success; however, these methods only give approximation guarantees in the special case of monotone objectives, which correspond to a restricted class of DPPs. In this paper we propose a new algorithm for approximating the MAP problem based on continuous techniques for submodular function maximization. Our method involves a novel continuous relaxation of the log-probability function, which, in contrast to the multilinear extension used for general submodular functions, can be evaluated and differentiated exactly and efficiently. We obtain a practical algorithm with a 1/4-approximation guarantee for a more general class of non-monotone DPPs; our algorithm also extends to MAP inference under complex polytope constraints, making it possible to combine DPPs with Markov random fields, weighted matchings, and other models. We demonstrate that our approach outperforms standard and recent methods on both synthetic and real-world data. 1
3 0.2000989 6 nips-2012-A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation
Author: Aaron Defazio, Tibério S. Caetano
Abstract: A key problem in statistics and machine learning is the determination of network structure from data. We consider the case where the structure of the graph to be reconstructed is known to be scale-free. We show that in such cases it is natural to formulate structured sparsity inducing priors using submodular functions, and we use their Lov´ sz extension to obtain a convex relaxation. For tractable classes a such as Gaussian graphical models, this leads to a convex optimization problem that can be efficiently solved. We show that our method results in an improvement in the accuracy of reconstructed networks for synthetic data. We also show how our prior encourages scale-free reconstructions on a bioinfomatics dataset.
4 0.19883104 328 nips-2012-Submodular-Bregman and the Lovász-Bregman Divergences with Applications
Author: Rishabh Iyer, Jeff A. Bilmes
Abstract: We introduce a class of discrete divergences on sets (equivalently binary vectors) that we call the submodular-Bregman divergences. We consider two kinds, defined either from tight modular upper or tight modular lower bounds of a submodular function. We show that the properties of these divergences are analogous to the (standard continuous) Bregman divergence. We demonstrate how they generalize many useful divergences, including the weighted Hamming distance, squared weighted Hamming, weighted precision, recall, conditional mutual information, and a generalized KL-divergence on sets. We also show that the generalized Bregman divergence on the Lov´ sz extension of a submodular function, which we a call the Lov´ sz-Bregman divergence, is a continuous extension of a submodular a Bregman divergence. We point out a number of applications, and in particular show that a proximal algorithm defined through the submodular Bregman divergence provides a framework for many mirror-descent style algorithms related to submodular function optimization. We also show that a generalization of the k-means algorithm using the Lov´ sz Bregman divergence is natural in clustering scenarios where a ordering is important. A unique property of this algorithm is that computing the mean ordering is extremely efficient unlike other order based distance measures. 1
Author: Demba Ba, Behtash Babadi, Patrick Purdon, Emery Brown
Abstract: We consider the problem of recovering a sequence of vectors, (xk )K , for which k=0 the increments xk − xk−1 are Sk -sparse (with Sk typically smaller than S1 ), based on linear measurements (yk = Ak xk + ek )K , where Ak and ek denote the meak=1 surement matrix and noise, respectively. Assuming each Ak obeys the restricted isometry property (RIP) of a certain order—depending only on Sk —we show that in the absence of noise a convex program, which minimizes the weighted sum of the ℓ1 -norm of successive differences subject to the linear measurement constraints, recovers the sequence (xk )K exactly. This is an interesting result bek=1 cause this convex program is equivalent to a standard compressive sensing problem with a highly-structured aggregate measurement matrix which does not satisfy the RIP requirements in the standard sense, and yet we can achieve exact recovery. In the presence of bounded noise, we propose a quadratically-constrained convex program for recovery and derive bounds on the reconstruction error of the sequence. We supplement our theoretical analysis with simulations and an application to real video data. These further support the validity of the proposed approach for acquisition and recovery of signals with time-varying sparsity.
6 0.10041688 214 nips-2012-Minimizing Sparse High-Order Energies by Submodular Vertex-Cover
7 0.099266768 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models
8 0.09449178 63 nips-2012-CPRL -- An Extension of Compressive Sensing to the Phase Retrieval Problem
9 0.084749252 141 nips-2012-GenDeR: A Generic Diversified Ranking Algorithm
10 0.083783254 228 nips-2012-Multilabel Classification using Bayesian Compressed Sensing
11 0.083247669 274 nips-2012-Priors for Diversity in Generative Latent Variable Models
12 0.08095821 312 nips-2012-Simultaneously Leveraging Output and Task Structures for Multiple-Output Regression
13 0.074028097 319 nips-2012-Sparse Prediction with the $k$-Support Norm
14 0.072327837 199 nips-2012-Link Prediction in Graphs with Autoregressive Features
15 0.067755565 16 nips-2012-A Polynomial-time Form of Robust Regression
16 0.065943733 84 nips-2012-Convergence Rate Analysis of MAP Coordinate Minimization Algorithms
17 0.065908588 131 nips-2012-Feature Clustering for Accelerating Parallel Coordinate Descent
18 0.065832011 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance
19 0.063635632 64 nips-2012-Calibrated Elastic Regularization in Matrix Completion
20 0.063518435 164 nips-2012-Iterative Thresholding Algorithm for Sparse Inverse Covariance Estimation
topicId topicWeight
[(0, 0.184), (1, 0.046), (2, 0.078), (3, -0.108), (4, 0.059), (5, 0.069), (6, -0.018), (7, -0.034), (8, -0.018), (9, -0.014), (10, -0.014), (11, 0.011), (12, -0.021), (13, 0.072), (14, 0.136), (15, 0.018), (16, 0.005), (17, -0.026), (18, -0.044), (19, -0.058), (20, 0.171), (21, 0.132), (22, 0.087), (23, 0.004), (24, 0.308), (25, 0.122), (26, -0.113), (27, -0.005), (28, -0.008), (29, -0.035), (30, -0.028), (31, 0.057), (32, 0.059), (33, 0.036), (34, -0.13), (35, 0.151), (36, -0.008), (37, 0.006), (38, -0.022), (39, 0.045), (40, 0.034), (41, -0.062), (42, -0.034), (43, -0.039), (44, 0.021), (45, -0.018), (46, 0.008), (47, -0.042), (48, -0.016), (49, -0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.91124618 304 nips-2012-Selecting Diverse Features via Spectral Regularization
Author: Abhimanyu Das, Anirban Dasgupta, Ravi Kumar
Abstract: We study the problem of diverse feature selection in linear regression: selecting a small subset of diverse features that can predict a given objective. Diversity is useful for several reasons such as interpretability, robustness to noise, etc. We propose several spectral regularizers that capture a notion of diversity of features and show that these are all submodular set functions. These regularizers, when added to the objective function for linear regression, result in approximately submodular functions, which can then be maximized by efficient greedy and local search algorithms, with provable guarantees. We compare our algorithms to traditional greedy and 1 -regularization schemes and show that we obtain a more diverse set of features that result in the regression problem being stable under perturbations. 1
2 0.8477686 236 nips-2012-Near-Optimal MAP Inference for Determinantal Point Processes
Author: Jennifer Gillenwater, Alex Kulesza, Ben Taskar
Abstract: Determinantal point processes (DPPs) have recently been proposed as computationally efficient probabilistic models of diverse sets for a variety of applications, including document summarization, image search, and pose estimation. Many DPP inference operations, including normalization and sampling, are tractable; however, finding the most likely configuration (MAP), which is often required in practice for decoding, is NP-hard, so we must resort to approximate inference. This optimization problem, which also arises in experimental design and sensor placement, involves finding the largest principal minor of a positive semidefinite matrix. Because the objective is log-submodular, greedy algorithms have been used in the past with some empirical success; however, these methods only give approximation guarantees in the special case of monotone objectives, which correspond to a restricted class of DPPs. In this paper we propose a new algorithm for approximating the MAP problem based on continuous techniques for submodular function maximization. Our method involves a novel continuous relaxation of the log-probability function, which, in contrast to the multilinear extension used for general submodular functions, can be evaluated and differentiated exactly and efficiently. We obtain a practical algorithm with a 1/4-approximation guarantee for a more general class of non-monotone DPPs; our algorithm also extends to MAP inference under complex polytope constraints, making it possible to combine DPPs with Markov random fields, weighted matchings, and other models. We demonstrate that our approach outperforms standard and recent methods on both synthetic and real-world data. 1
3 0.81927639 328 nips-2012-Submodular-Bregman and the Lovász-Bregman Divergences with Applications
Author: Rishabh Iyer, Jeff A. Bilmes
Abstract: We introduce a class of discrete divergences on sets (equivalently binary vectors) that we call the submodular-Bregman divergences. We consider two kinds, defined either from tight modular upper or tight modular lower bounds of a submodular function. We show that the properties of these divergences are analogous to the (standard continuous) Bregman divergence. We demonstrate how they generalize many useful divergences, including the weighted Hamming distance, squared weighted Hamming, weighted precision, recall, conditional mutual information, and a generalized KL-divergence on sets. We also show that the generalized Bregman divergence on the Lov´ sz extension of a submodular function, which we a call the Lov´ sz-Bregman divergence, is a continuous extension of a submodular a Bregman divergence. We point out a number of applications, and in particular show that a proximal algorithm defined through the submodular Bregman divergence provides a framework for many mirror-descent style algorithms related to submodular function optimization. We also show that a generalization of the k-means algorithm using the Lov´ sz Bregman divergence is natural in clustering scenarios where a ordering is important. A unique property of this algorithm is that computing the mean ordering is extremely efficient unlike other order based distance measures. 1
4 0.63407069 214 nips-2012-Minimizing Sparse High-Order Energies by Submodular Vertex-Cover
Author: Andrew Delong, Olga Veksler, Anton Osokin, Yuri Boykov
Abstract: Inference in high-order graphical models has become important in recent years. Several approaches are based, for example, on generalized message-passing, or on transformation to a pairwise model with extra ‘auxiliary’ variables. We focus on a special case where a much more efficient transformation is possible. Instead of adding variables, we transform the original problem into a comparatively small instance of submodular vertex-cover. These vertex-cover instances can then be attacked by existing algorithms (e.g. belief propagation, QPBO), where they often run 4–15 times faster and find better solutions than when applied to the original problem. We evaluate our approach on synthetic data, then we show applications within a fast hierarchical clustering and model-fitting framework. 1
5 0.61635464 6 nips-2012-A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation
Author: Aaron Defazio, Tibério S. Caetano
Abstract: A key problem in statistics and machine learning is the determination of network structure from data. We consider the case where the structure of the graph to be reconstructed is known to be scale-free. We show that in such cases it is natural to formulate structured sparsity inducing priors using submodular functions, and we use their Lov´ sz extension to obtain a convex relaxation. For tractable classes a such as Gaussian graphical models, this leads to a convex optimization problem that can be efficiently solved. We show that our method results in an improvement in the accuracy of reconstructed networks for synthetic data. We also show how our prior encourages scale-free reconstructions on a bioinfomatics dataset.
6 0.49974272 63 nips-2012-CPRL -- An Extension of Compressive Sensing to the Phase Retrieval Problem
7 0.43178809 320 nips-2012-Spectral Learning of General Weighted Automata via Constrained Matrix Completion
8 0.42337477 274 nips-2012-Priors for Diversity in Generative Latent Variable Models
9 0.40451664 34 nips-2012-Active Learning of Multi-Index Function Models
10 0.39644754 221 nips-2012-Multi-Stage Multi-Task Feature Learning
11 0.39437261 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model
12 0.38253438 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance
13 0.36561489 144 nips-2012-Gradient-based kernel method for feature extraction and variable selection
14 0.36545563 86 nips-2012-Convex Multi-view Subspace Learning
15 0.36252829 43 nips-2012-Approximate Message Passing with Consistent Parameter Estimation and Applications to Sparse Learning
16 0.35966742 359 nips-2012-Variational Inference for Crowdsourcing
17 0.3529869 25 nips-2012-A new metric on the manifold of kernel matrices with application to matrix geometric means
18 0.34518543 230 nips-2012-Multiple Choice Learning: Learning to Produce Multiple Structured Outputs
19 0.34339949 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models
20 0.34208196 17 nips-2012-A Scalable CUR Matrix Decomposition Algorithm: Lower Time Complexity and Tighter Bound
topicId topicWeight
[(0, 0.059), (3, 0.223), (21, 0.034), (38, 0.168), (39, 0.022), (42, 0.036), (54, 0.019), (55, 0.016), (68, 0.011), (74, 0.082), (76, 0.135), (80, 0.081), (92, 0.035)]
simIndex simValue paperId paperTitle
1 0.81488299 210 nips-2012-Memorability of Image Regions
Author: Aditya Khosla, Jianxiong Xiao, Antonio Torralba, Aude Oliva
Abstract: While long term human visual memory can store a remarkable amount of visual information, it tends to degrade over time. Recent works have shown that image memorability is an intrinsic property of an image that can be reliably estimated using state-of-the-art image features and machine learning algorithms. However, the class of features and image information that is forgotten has not been explored yet. In this work, we propose a probabilistic framework that models how and which local regions from an image may be forgotten using a data-driven approach that combines local and global images features. The model automatically discovers memorability maps of individual images without any human annotation. We incorporate multiple image region attributes in our algorithm, leading to improved memorability prediction of images as compared to previous works. 1
same-paper 2 0.81290281 304 nips-2012-Selecting Diverse Features via Spectral Regularization
Author: Abhimanyu Das, Anirban Dasgupta, Ravi Kumar
Abstract: We study the problem of diverse feature selection in linear regression: selecting a small subset of diverse features that can predict a given objective. Diversity is useful for several reasons such as interpretability, robustness to noise, etc. We propose several spectral regularizers that capture a notion of diversity of features and show that these are all submodular set functions. These regularizers, when added to the objective function for linear regression, result in approximately submodular functions, which can then be maximized by efficient greedy and local search algorithms, with provable guarantees. We compare our algorithms to traditional greedy and 1 -regularization schemes and show that we obtain a more diverse set of features that result in the regression problem being stable under perturbations. 1
3 0.73981237 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration
Author: Vasiliy Karasev, Alessandro Chiuso, Stefano Soatto
Abstract: We describe the tradeoff between the performance in a visual recognition problem and the control authority that the agent can exercise on the sensing process. We focus on the problem of “visual search” of an object in an otherwise known and static scene, propose a measure of control authority, and relate it to the expected risk and its proxy (conditional entropy of the posterior density). We show this analytically, as well as empirically by simulation using the simplest known model that captures the phenomenology of image formation, including scaling and occlusions. We show that a “passive” agent given a training set can provide no guarantees on performance beyond what is afforded by the priors, and that an “omnipotent” agent, capable of infinite control authority, can achieve arbitrarily good performance (asymptotically). In between these limiting cases, the tradeoff can be characterized empirically. 1
4 0.73838764 149 nips-2012-Hierarchical Optimistic Region Selection driven by Curiosity
Author: Odalric-ambrym Maillard
Abstract: This paper aims to take a step forwards making the term “intrinsic motivation” from reinforcement learning theoretically well founded, focusing on curiositydriven learning. To that end, we consider the setting where, a fixed partition P of a continuous space X being given, and a process ν defined on X being unknown, we are asked to sequentially decide which cell of the partition to select as well as where to sample ν in that cell, in order to minimize a loss function that is inspired from previous work on curiosity-driven learning. The loss on each cell consists of one term measuring a simple worst case quadratic sampling error, and a penalty term proportional to the range of the variance in that cell. The corresponding problem formulation extends the setting known as active learning for multi-armed bandits to the case when each arm is a continuous region, and we show how an adaptation of recent algorithms for that problem and of hierarchical optimistic sampling algorithms for optimization can be used in order to solve this problem. The resulting procedure, called Hierarchical Optimistic Region SElection driven by Curiosity (HORSE.C) is provided together with a finite-time regret analysis. 1
5 0.73694092 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback
Author: Andriy Mnih, Yee W. Teh
Abstract: User preferences for items can be inferred from either explicit feedback, such as item ratings, or implicit feedback, such as rental histories. Research in collaborative filtering has concentrated on explicit feedback, resulting in the development of accurate and scalable models. However, since explicit feedback is often difficult to collect it is important to develop effective models that take advantage of the more widely available implicit feedback. We introduce a probabilistic approach to collaborative filtering with implicit feedback based on modelling the user’s item selection process. In the interests of scalability, we restrict our attention to treestructured distributions over items and develop a principled and efficient algorithm for learning item trees from data. We also identify a problem with a widely used protocol for evaluating implicit feedback models and propose a way of addressing it using a small quantity of explicit feedback data. 1
6 0.73525178 199 nips-2012-Link Prediction in Graphs with Autoregressive Features
7 0.7335034 69 nips-2012-Clustering Sparse Graphs
8 0.73342043 236 nips-2012-Near-Optimal MAP Inference for Determinantal Point Processes
9 0.7329064 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models
10 0.73281825 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models
11 0.73252016 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs
12 0.73150492 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models
13 0.73088777 260 nips-2012-Online Sum-Product Computation Over Trees
14 0.72981662 6 nips-2012-A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation
15 0.72847331 38 nips-2012-Algorithms for Learning Markov Field Policies
16 0.72818899 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function
17 0.72810972 162 nips-2012-Inverse Reinforcement Learning through Structured Classification
18 0.72717106 112 nips-2012-Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model
19 0.72693199 187 nips-2012-Learning curves for multi-task Gaussian process regression
20 0.72610068 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes