nips nips2012 nips2012-290 knowledge-graph by maker-knowledge-mining

290 nips-2012-Recovery of Sparse Probability Measures via Convex Programming


Source: pdf

Author: Mert Pilanci, Laurent E. Ghaoui, Venkat Chandrasekaran

Abstract: We consider the problem of cardinality penalized optimization of a convex function over the probability simplex with additional convex constraints. The classical ℓ1 regularizer fails to promote sparsity on the probability simplex since the ℓ1 norm on the probability simplex is trivially constant. We propose a direct relaxation of the minimum cardinality problem and show that it can be efficiently solved using convex programming. As a first application we consider recovering a sparse probability measure given moment constraints, in which our formulation becomes linear programming, hence can be solved very efficiently. A sufficient condition for exact recovery of the minimum cardinality solution is derived for arbitrary affine constraints. We then develop a penalized version for the noisy setting which can be solved using second order cone programs. The proposed method outperforms known rescaling heuristics based on the ℓ1 norm. As a second application we consider convex clustering using a sparse Gaussian mixture and compare our results with the well known soft k-means algorithm.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We consider the problem of cardinality penalized optimization of a convex function over the probability simplex with additional convex constraints. [sent-4, score-1.261]

2 The classical ℓ1 regularizer fails to promote sparsity on the probability simplex since the ℓ1 norm on the probability simplex is trivially constant. [sent-5, score-0.498]

3 We propose a direct relaxation of the minimum cardinality problem and show that it can be efficiently solved using convex programming. [sent-6, score-0.91]

4 As a first application we consider recovering a sparse probability measure given moment constraints, in which our formulation becomes linear programming, hence can be solved very efficiently. [sent-7, score-0.436]

5 A sufficient condition for exact recovery of the minimum cardinality solution is derived for arbitrary affine constraints. [sent-8, score-0.88]

6 We then develop a penalized version for the noisy setting which can be solved using second order cone programs. [sent-9, score-0.284]

7 The proposed method outperforms known rescaling heuristics based on the ℓ1 norm. [sent-10, score-0.213]

8 As a second application we consider convex clustering using a sparse Gaussian mixture and compare our results with the well known soft k-means algorithm. [sent-11, score-0.522]

9 1 Introduction We consider optimization problems of the following form: p∗ = min_{x ∈ C, 1ᵀx = 1, x ≥ 0} f(x) + λ card(x), where f is a convex function, C is a convex set, card(x) denotes the number of nonzero elements of x, and λ ≥ 0 is a given tradeoff parameter for adjusting the desired sparsity. [sent-12, score-0.57]

10 Since the cardinality penalty is inherently combinatorial in nature, these problems are in general not solvable in polynomial time. [sent-13, score-0.525]

11 In recent years ℓ1 norm penalization as a proxy for penalizing cardinality has attracted a great deal of attention in machine learning, statistics, engineering and applied mathematics [1], [2], [3], [4]. [sent-14, score-0.618]

12 However the aforementioned types of sparse probability optimization problems are not amenable to the ℓ1 heuristic since ‖x‖_1 = 1ᵀx = 1 is constant on the probability simplex. [sent-15, score-0.431]

13 Numerous problems in machine learning, statistics, finance and signal processing fall into this category; however, to the authors' knowledge there is no known general convex optimization strategy for such problems constrained on the probability simplex. [sent-16, score-0.408]

14 The quantity max_i x_i can be used as a convex heuristic for penalizing cardinality on the probability simplex, and the resulting relaxations can be solved via convex optimization. [sent-19, score-1.638]

15 Figure 1(a) and 1(b) depict the level sets and an example of a sparse probability measure which has maximal infinity norm. [sent-20, score-0.212]

16 In the following sections we expand our discussion by exploring two specific problems: recovering a measure from given moments where f = 0 and C is affine, and convex clustering where f is a log-likelihood and C = R. [sent-21, score-0.531]

17 For the former case we give a sufficient condition for this convex relaxation to exactly recover the minimal cardinality solution of p∗ . [sent-22, score-0.965]

18 We then present numerical simulations for both problems which suggest that the proposed scheme offers a very efficient convex relaxation for penalizing cardinality on the probability simplex. [sent-23, score-1.012]

19 When constrained to the probability simplex, the lower bound on the cardinality simply becomes 1 / max_i x_i ≤ card(x), since 1 = Σ_i x_i ≤ card(x) · max_i x_i for any x on the simplex. [sent-25, score-0.805]

20 However below we show that the above problem can be exactly solved using convex programming. [sent-27, score-0.272]

21 The lower-bounding problem defined by p∗_∞ can be globally solved using the following n convex programs in n + 1 dimensions: [sent-30, score-0.467]

22 p∗ ≥ p∗_∞ = min_{i=1,…,n} min_{x ∈ C, 1ᵀx = 1, x ≥ 0, t ≥ 0} f(x) + t : x_i ≥ λ/t.    (2) [sent-33, score-0.211]

23 Note that the constraint x_i ≥ λ/t is jointly convex since 1/t is convex in t ∈ R_+, and it can be handled in most general-purpose convex optimizers, e.g. [sent-34, score-0.712]

24 cvx, using either the positive inverse function or rotated cone constraints. [sent-36, score-0.054]
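For illustration, here is a minimal sketch of one inner problem of (2) for a fixed index i, written with the positive inverse function as mentioned above. It assumes cvxpy, takes f = 0 unless a convex expression is supplied, and leaves membership in C to a user-provided list of constraints; names are illustrative, not the authors' code.

```python
import cvxpy as cp

def inner_program(i, n, lam, f_expr=None, extra_constraints=None):
    """One of the n convex programs in (2): min f(x) + t  s.t.  x_i >= lam / t."""
    x = cp.Variable(n, nonneg=True)      # x >= 0
    t = cp.Variable(nonneg=True)         # epigraph variable t >= 0
    objective = t if f_expr is None else f_expr(x) + t   # f = 0 unless supplied
    constraints = [cp.sum(x) == 1,                       # 1'x = 1 (probability simplex)
                   lam * cp.inv_pos(t) <= x[i]]          # x_i >= lam / t via the convex 1/t
    if extra_constraints is not None:
        constraints += extra_constraints(x)              # x in C
    problem = cp.Problem(cp.Minimize(objective), constraints)
    problem.solve()
    return problem.value, x.value

# The relaxed value p*_inf is the smallest value of inner_program(i, ...) over i = 1, ..., n.
```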

25 p∗_∞ = min_{x ∈ C, 1ᵀx = 1, x ≥ 0} f(x) + λ / max_i x_i  (3)  = min_i min_{x ∈ C, 1ᵀx = 1, x ≥ 0} f(x) + λ / x_i  (4)  = min_i min_{x ∈ C, 1ᵀx = 1, x ≥ 0, t ≥ 0} f(x) + t  s.t.  λ / x_i ≤ t.  (5) [sent-38, score-0.83]

26 The above formulation can be used to efficiently approximate the original cardinality constrained problem by lower-bounding, for arbitrary convex f and C. [sent-40, score-0.808]

27 In the next section we show how to compute the quality of approximation. [sent-41, score-0.041]

28 1 Computing a bound on the quality of approximation. By virtue of being a relaxation of the original cardinality problem, we have the following remarkable property. [sent-43, score-0.648]

29 We find a solution for the original cardinality-penalized problem if the following holds: f(x̂) + λ card(x̂) = p∗_∞. It should be noted that for general cardinality-penalized problems, using the ℓ1 heuristic does not yield such a quality bound, since it is not a lower or upper bound in general. [sent-47, score-1.404]

30 Moreover most of the known equivalence conditions for ℓ1 heuristics, such as the Restricted Isometry Property and its variants, are NP-hard to check. [sent-48, score-0.036]

31 Therefore a remarkable property of the proposed scheme is that it comes with a simple computable bound on the quality of approximation. [sent-49, score-0.153]
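A sketch of how this certificate might be checked numerically; the tolerance, the counting of nonzeros up to a threshold, and the helper name are assumptions for illustration.

```python
import numpy as np

def is_certified(x_hat, p_inf, lam, f=lambda x: 0.0, tol=1e-6):
    """Return True if f(x_hat) + lam * card(x_hat) matches the relaxed optimum p_inf,
    in which case x_hat also solves the original cardinality-penalized problem."""
    card = int(np.sum(np.abs(x_hat) > tol))      # numerical cardinality of x_hat
    return abs(f(x_hat) + lam * card - p_inf) <= tol
```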

32 3 Recovering a Sparse Measure. Suppose that µ is a discrete probability measure and we would like to know the sparsest measure satisfying some arbitrary moment constraints: p∗ = min_µ card(µ) : E_µ[X_i] = b_i, i = 1, …, m, [sent-50, score-0.486]

33 where the X_i are random variables and E_µ denotes expectation with respect to the measure µ. [sent-53, score-0.069]

34 One motivation for the above problem is the fact that it upper-bounds the minimum entropy power problem: p∗ ≥ min_µ exp H(µ) : E_µ[X_i] = b_i, i = 1, …, m. [sent-54, score-0.206]

35 Both of the above problems are non-convex and in general very hard to solve. [sent-58, score-0.033]

36 Below we show how the proposed scheme applies to this problem. [sent-66, score-0.072]

37 Using the general form in (2), the proposed relaxation is given by the following: (p∗)⁻¹ ≤ (p∗_∞)⁻¹ = max_{i=1,…,n} max_{1ᵀx = 1, x ≥ 0} x_i : Ax = b,    (10) [sent-67, score-0.148]

38 which can be solved very efficiently by solving n linear programs in n variables. [sent-71, score-0.164]
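A minimal sketch of the n linear programs in (10), keeping the candidate with the fewest nonzero entries (as formalized later in (12)-(13)); it assumes cvxpy and a numerical threshold for counting nonzeros, and is illustrative rather than the authors' code.

```python
import cvxpy as cp
import numpy as np

def recover_measure(A, b, tol=1e-6):
    """Solve max x_i s.t. Ax = b, 1'x = 1, x >= 0 for each i, and return the
    feasible candidate with the smallest numerical cardinality."""
    m, n = A.shape
    candidates = []
    for i in range(n):
        x = cp.Variable(n, nonneg=True)
        problem = cp.Problem(cp.Maximize(x[i]),
                             [A @ x == b, cp.sum(x) == 1])
        problem.solve()
        if x.value is not None:
            candidates.append(x.value)
    return min(candidates, key=lambda v: int(np.sum(np.abs(v) > tol)))
```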

39 It is easy to check that strong duality holds and the dual problems are given by the following: (p∗_∞)⁻¹ = max_{i=1,…,n} min_{w, λ} wᵀb + λ : Aᵀw + λ1 ≥ e_i,    (11) [sent-73, score-0.246]

40 where 1 is the all-ones vector and e_i is the vector of all zeros with a one in only the i'th coordinate. [sent-77, score-0.035]

41 Define x̂_i := argmax_{1ᵀx = 1, x ≥ 0} x_i : Ax = b (12), and x̂_min := argmin_{i=1,…,n} card(x̂_i) (13). [sent-80, score-0.48]

42 The following theorem gives a sufficient condition for the recovery of a sparse measure using the above method. [sent-83, score-0.381]

43 Assume that the solution to p∗ in (7) is unique and given by x∗ . [sent-86, score-0.101]

44 If there exist no x and y with A_S x = A_{S^c} y, x > 0, 1ᵀx = 1, y ≥ 0, 1ᵀy = 1, where b = Ax∗, A_S is the submatrix containing the columns of A corresponding to the non-zero elements of x∗, and A_{S^c} is the submatrix of the remaining columns, then the convex linear program max_{1ᵀx = 1, x ≥ 0} x_i : Ax = b has a unique solution given by x∗. [sent-89, score-0.593]

45 Let Conv(a_1, …, a_m) denote the convex hull of the m vectors {a_1, …, a_m}. [sent-93, score-0.201]

46 The following corollary depicts a geometric condition for recovery. [sent-97, score-0.088]

47 If Conv(A_{S^c}) does not intersect an extreme point of Conv(A_S), then x̂_min = x∗, i.e., [sent-100, score-0.171]

48 we recover the minimum cardinality solution using n linear programs. [sent-102, score-0.638]

49 Proof Outline: Consider the k'th inner linear program defined in the problem p∗_∞. [sent-103, score-0.052]

50 Since the solution of p∗ is unique and has minimum cardinality, we conclude that x∗ is indeed the unique solution to the k’th linear program. [sent-105, score-0.273]

51 Applying Farkas' lemma and duality theory, we arrive at the conditions defined in Theorem 3.1. [sent-106, score-0.07]

52 The corollary follows by first observing that the condition of Theorem 3.1 is satisfied if Conv(A_{S^c}) does not intersect an extreme point of Conv(A_S). [sent-108, score-0.088] [sent-109, score-0.053]

54 Finally observe that if any of the n linear programs recovers the minimal cardinality solution, then x̂_min = x∗, since card(x̂_min) ≤ card(x̂_k) for all k. [sent-110, score-0.816]

55 2 Noisy measure recovery. When the data contains noise and inaccuracies, such as the case when using empirical moments instead of exact moments, we propose the following noise-aware robust version, which follows from the general recipe given in the first section: [sent-112, score-0.595]

56 min_{i=1,…,n} min_{1ᵀx = 1, x ≥ 0, t ≥ 0} ‖Ax − b‖_2^2 + t : x_i ≥ λ/t,    (16) [sent-115, score-0.109]

57 where λ ≥ 0 is a penalty parameter for encouraging sparsity. [sent-116, score-0.031]

58 The above problem can be solved using n second-order cone programs in n + 1 variables, hence has O(n^4) worst case complexity. [sent-117, score-0.218]
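A corresponding sketch of the noise-aware version (16), again assuming cvxpy; each subproblem is a small second-order cone program and the best objective value over i is kept. This is an illustrative sketch, not the authors' implementation.

```python
import cvxpy as cp
import numpy as np

def recover_measure_noisy(A, b, lam):
    """Solve min ||Ax - b||_2^2 + t  s.t.  x_i >= lam/t, 1'x = 1, x >= 0 for each i,
    and return the minimizer with the best objective value, as in (16)."""
    m, n = A.shape
    best_value, best_x = np.inf, None
    for i in range(n):
        x = cp.Variable(n, nonneg=True)
        t = cp.Variable(nonneg=True)
        problem = cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b) + t),
                             [cp.sum(x) == 1, lam * cp.inv_pos(t) <= x[i]])
        problem.solve()
        if problem.value is not None and problem.value < best_value:
            best_value, best_x = problem.value, x.value
    return best_x
```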

59 The proposed measure recovery algorithms are investigated and compared with a known suboptimal heuristic in Section 6. [sent-118, score-0.439]

60 4 Convex Clustering. In this section we base our discussion on the exemplar-based convex clustering framework of [8]. [sent-119, score-0.321]

61 For the standard multivariate Normal distribution we have f(z_i; m_j) = e^{−β‖z_i − m_j‖_2^2} for some parameter β > 0. [sent-124, score-0.151]

62 As in [8] we'll further assume that the mean parameter m_j is one of the examples z_i, which is unknown a priori. [sent-125, score-0.151]

63 This assumption helps to simplify the log-likelihood, whose data dependence is now only through the kernel matrix K_ij := e^{−β‖z_i − z_j‖_2^2}, as follows: L = (1/n) Σ_{i=1}^{n} log( Σ_{j=1}^{k} x_j e^{−β‖z_i − z_j‖_2^2} )  (17)  = (1/n) Σ_{i=1}^{n} log( Σ_{j=1}^{k} x_j K_ij ).  (18) [sent-126, score-0.242]

64 Partitioning the data {z_1, …, z_n} into a few clusters is equivalent to having a sparse mixture x, i.e., [sent-129, score-0.242]

65 each example is assigned to a few centers (which are some other examples). [sent-131, score-0.037]

66 1 Algorithms: Exponentiated Gradient. Exponentiated gradient [7] is a proximal algorithm to optimize over the probability simplex which employs the Kullback-Leibler divergence D(x, y) = Σ_i x_i log(x_i / y_i) between two probability distributions. [sent-135, score-0.538]
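A minimal sketch of an exponentiated-gradient ascent step on the log-likelihood (18) over the probability simplex; the step size and the use of a plain multiplicative update (rather than the authors' exact inner solver) are assumptions for illustration.

```python
import numpy as np

def eg_step(x, K, eta):
    """One exponentiated-gradient step for maximizing
    L(x) = (1/n) * sum_i log(sum_j x_j * K[i, j]) over the probability simplex."""
    n = K.shape[0]
    grad = (K / (K @ x)[:, None]).sum(axis=0) / n   # dL/dx_j = (1/n) sum_i K_ij / (K_i. x)
    x_new = x * np.exp(eta * grad)                  # multiplicative (KL-proximal) update
    return x_new / x_new.sum()                      # renormalize onto the simplex

# usage: x = np.full(K.shape[1], 1.0 / K.shape[1]); iterate x = eg_step(x, K, eta=0.5)
```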

67 1 Numerical Results: Recovering a Measure from Gaussian Measurements. Here we show that the proposed recovery scheme is able to recover a sparse measure exactly with overwhelming probability when the matrix A ∈ R^{m×n} is chosen from the independent Gaussian ensemble. [sent-138, score-0.448]

68 As an alternative method we consider a commonly employed simple heuristic to optimize over a probability measure, which first drops the constraint 1ᵀx = 1 and solves the corresponding ℓ1-penalized problem. [sent-142, score-0.443]

69 Finally, it rescales the optimal x such that 1ᵀx = 1. [sent-143, score-0.078]

70 In the worst case, this procedure recovers the true solution whenever minimizing the ℓ1 norm recovers the solution. [sent-144, score-0.172]

71 This is clearly a suboptimal approach and we will refer to it as the rescaling heuristic. [sent-147, score-0.209]

72 We set n = 50 and randomly pick a probability vector x∗ which is k-sparse, let b = Ax∗ be m noiseless measurements, and then check the probability of recovery, i.e., [sent-148, score-0.185]

73 x̂ = x∗, where x̂ is the solution to max_{i=1,…,n} max_{1ᵀx = 1, x ≥ 0} x_i : Ax = b.    (22) [sent-150, score-0.1]

74 Figure 2(a) shows the probability of exact recovery as a function of m, the number of measurements, in 100 independent realizations of A for the proposed LP formulation and the rescaling heuristic. [sent-154, score-0.526]

75 As can be seen in Figure 2(a), the proposed method recovers the correct measure with probability almost 1 when m ≥ 5. [sent-155, score-0.197]

76 Quite interestingly, the rescaling heuristic does not succeed in recovering the true measure with high probability even for a cardinality-2 vector. [sent-156, score-0.977]
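A hypothetical re-creation of the noiseless experiment, reusing the recover_measure sketch above; n and the use of a Gaussian A follow the text, while the sparsity level, tolerance and trial count are illustrative assumptions.

```python
import numpy as np

def exact_recovery_trial(m, n=50, k=2, tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))                 # independent Gaussian ensemble
    support = rng.choice(n, size=k, replace=False)
    x_true = np.zeros(n)
    weights = rng.random(k)
    x_true[support] = weights / weights.sum()       # random k-sparse probability vector
    b = A @ x_true                                  # m noiseless measurements
    x_hat = recover_measure(A, b)                   # n LPs, from the earlier sketch
    return float(np.linalg.norm(x_hat - x_true, 1) < tol)

# empirical recovery probability over 100 realizations of A for a given m:
# rate = np.mean([exact_recovery_trial(m=6, seed=s) for s in range(100)])
```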

77 min_{i=1,…,n} min_{1ᵀx = 1, x ≥ 0, t ≥ 0} ‖Ax − b‖_2^2 + t : x_i ≥ λ/t.    (23) [sent-161, score-0.211]

78 We compare the above approach with the corresponding rescaling heuristic, which first solves a nonnegative Lasso, min_{x ≥ 0} ‖Ax − b‖_2^2 + λ‖x‖_1, (24) and then rescales x such that 1ᵀx = 1. [sent-162, score-0.393]
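For comparison, a sketch of the rescaling heuristic in (24) as described above, assuming cvxpy; for x ≥ 0 the ℓ1 norm is just the sum of the entries. The helper name is illustrative.

```python
import cvxpy as cp
import numpy as np

def rescaling_heuristic(A, b, lam):
    """Nonnegative Lasso (24) followed by rescaling onto the probability simplex."""
    n = A.shape[1]
    x = cp.Variable(n, nonneg=True)
    objective = cp.sum_squares(A @ x - b) + lam * cp.sum(x)   # ||x||_1 = 1'x for x >= 0
    cp.Problem(cp.Minimize(objective)).solve()
    v = np.maximum(x.value, 0.0)
    return v / v.sum() if v.sum() > 0 else v                  # rescale so that 1'x = 1
```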

79 For each realization of A and measurement noise we run both methods using a primal-dual interior point solver for 30 equally spaced values of λ ∈ [0, 10] and record the minimum error ‖x̂ − x∗‖_1. [sent-163, score-0.071]

80 The average error over 100 realizations is shown in Figure 2(b). [sent-164, score-0.041]

81 As can be seen in the figure, the proposed scheme clearly outperforms the rescaling heuristic since it can utilize the fact that x is on the probability simplex without trivializing its complexity regularizer. [sent-165, score-0.471]

82 [Figure 2: (a) probability of exact recovery as a function of m, the number of measurements (moment constraints); (b) averaged error ‖x̂ − x_t‖_1 of estimating the true measure, for the rescaling ℓ1 heuristic and the proposed relaxation.]

83 2 Convex Clustering. We generate synthetic data using a Gaussian mixture of 10 components with identity covariances and cluster the data using the proposed method; the resulting clusters given by the mixture density are presented in Figure 3. [sent-184, score-0.213]

84 The centers of the circles represent the means of the mixture components and the radii are proportional to the respective mixture weights. [sent-185, score-0.228]

85 We then repeat the clustering procedure using the well known soft k-means algorithm and present the results in Figure 4. [sent-186, score-0.171]

86 As can be seen from the figures, the proposed convex relaxation is able to penalize the cardinality of the mixture probability vector and produce clusters significantly better than the soft k-means algorithm. [sent-187, score-1.06]

87 Note that soft k-means is a non-convex procedure whose performance depends heavily on the initialization. [sent-188, score-0.086]

88 The proposed approach is convex, hence insensitive to initialization. [sent-189, score-0.201]

89 Note that in [8] the number of clusters is adjusted indirectly by varying the β parameter of the distribution. [sent-190, score-0.057]

90 Hence when the number of clusters is unknown, choosing a value of λ is usually easier than specifying a value of k for the k-means algorithms. [sent-192, score-0.057]

91 7 Conclusions and Future Directions We presented a convex cardinality penalization scheme for problems constrained on the probability simplex. [sent-193, score-0.922]

92 We then derived a sufficient condition for recovering the sparsest probability measure in an affine space using the proposed method. [sent-194, score-0.366]

93 An open theoretical question is to analyze the probability of exact recovery for a normally distributed A. [sent-196, score-0.308]

94 Another interesting direction is to extend the recovery analysis to the noisy setting and arbitrary functions such as the log-likelihood in the clustering example. [sent-197, score-0.315]

95 There might also be other problems where the proposed approach could be practically useful, such as portfolio optimization, where a sparse convex combination of assets is sought, or sparse multiple kernel learning. [sent-198, score-0.413]

96 [Figure 3, panel (d), λ = 45: proposed convex clustering scheme.] [sent-236, score-0.358]

97 "From sparse solutions of systems of equations to sparse modeling of signals and images". [sent-291, score-0.144]

98 Yeredor, "On the use of sparsity for recovering discrete probability distributions from their moments". [sent-311, score-0.162]

99 Golland, "Convex clustering with exemplar-based models", in NIPS, 2008. [sent-319, score-0.085]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('cardinality', 0.461), ('card', 0.302), ('ax', 0.289), ('convex', 0.201), ('recovery', 0.187), ('simplex', 0.178), ('rescaling', 0.177), ('heuristic', 0.151), ('exponentiated', 0.131), ('kij', 0.127), ('maxi', 0.127), ('conv', 0.12), ('xmin', 0.118), ('penalized', 0.116), ('xk', 0.111), ('xi', 0.109), ('relaxation', 0.106), ('min', 0.102), ('programs', 0.093), ('recovering', 0.091), ('soft', 0.086), ('clustering', 0.085), ('moments', 0.085), ('sparsest', 0.082), ('mj', 0.08), ('mixture', 0.078), ('rescales', 0.078), ('scheme', 0.072), ('sparse', 0.072), ('solved', 0.071), ('minimum', 0.071), ('zi', 0.071), ('probability', 0.071), ('measure', 0.069), ('penalizing', 0.068), ('measurements', 0.066), ('moment', 0.062), ('solution', 0.058), ('recovers', 0.057), ('clusters', 0.057), ('chandrasekaran', 0.055), ('cone', 0.054), ('condition', 0.053), ('intersect', 0.053), ('program', 0.052), ('xj', 0.05), ('exact', 0.05), ('allerton', 0.049), ('recover', 0.048), ('reciprocal', 0.047), ('donoho', 0.047), ('penalization', 0.047), ('th', 0.045), ('submatrix', 0.044), ('noiseless', 0.043), ('shannon', 0.043), ('unique', 0.043), ('noisy', 0.043), ('proxy', 0.042), ('max', 0.042), ('quality', 0.041), ('realizations', 0.041), ('af', 0.04), ('remarkable', 0.04), ('doesn', 0.039), ('california', 0.038), ('minimal', 0.038), ('ri', 0.037), ('zj', 0.037), ('centers', 0.037), ('constrained', 0.037), ('solves', 0.036), ('arrive', 0.036), ('heuristics', 0.036), ('corollary', 0.035), ('ei', 0.035), ('zn', 0.035), ('lp', 0.035), ('exemplar', 0.035), ('convexi', 0.035), ('golland', 0.035), ('lashkari', 0.035), ('assets', 0.035), ('farkas', 0.035), ('nphard', 0.035), ('pasadena', 0.035), ('radii', 0.035), ('ssp', 0.035), ('duality', 0.034), ('problems', 0.033), ('entropy', 0.033), ('programming', 0.033), ('optimization', 0.033), ('suboptimal', 0.032), ('optimizers', 0.032), ('bruckstein', 0.032), ('nasa', 0.032), ('hinted', 0.032), ('satisfying', 0.031), ('penalty', 0.031)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 290 nips-2012-Recovery of Sparse Probability Measures via Convex Programming

Author: Mert Pilanci, Laurent E. Ghaoui, Venkat Chandrasekaran

Abstract: We consider the problem of cardinality penalized optimization of a convex function over the probability simplex with additional convex constraints. The classical ℓ1 regularizer fails to promote sparsity on the probability simplex since the ℓ1 norm on the probability simplex is trivially constant. We propose a direct relaxation of the minimum cardinality problem and show that it can be efficiently solved using convex programming. As a first application we consider recovering a sparse probability measure given moment constraints, in which our formulation becomes linear programming, hence can be solved very efficiently. A sufficient condition for exact recovery of the minimum cardinality solution is derived for arbitrary affine constraints. We then develop a penalized version for the noisy setting which can be solved using second order cone programs. The proposed method outperforms known rescaling heuristics based on the ℓ1 norm. As a second application we consider convex clustering using a sparse Gaussian mixture and compare our results with the well known soft k-means algorithm.

2 0.15901363 120 nips-2012-Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential 1-Minimization

Author: Demba Ba, Behtash Babadi, Patrick Purdon, Emery Brown

Abstract: We consider the problem of recovering a sequence of vectors, (xk )K , for which k=0 the increments xk − xk−1 are Sk -sparse (with Sk typically smaller than S1 ), based on linear measurements (yk = Ak xk + ek )K , where Ak and ek denote the meak=1 surement matrix and noise, respectively. Assuming each Ak obeys the restricted isometry property (RIP) of a certain order—depending only on Sk —we show that in the absence of noise a convex program, which minimizes the weighted sum of the ℓ1 -norm of successive differences subject to the linear measurement constraints, recovers the sequence (xk )K exactly. This is an interesting result bek=1 cause this convex program is equivalent to a standard compressive sensing problem with a highly-structured aggregate measurement matrix which does not satisfy the RIP requirements in the standard sense, and yet we can achieve exact recovery. In the presence of bounded noise, we propose a quadratically-constrained convex program for recovery and derive bounds on the reconstruction error of the sequence. We supplement our theoretical analysis with simulations and an application to real video data. These further support the validity of the proposed approach for acquisition and recovery of signals with time-varying sparsity.

3 0.14988825 227 nips-2012-Multiclass Learning with Simplex Coding

Author: Youssef Mroueh, Tomaso Poggio, Lorenzo Rosasco, Jean-jeacques Slotine

Abstract: In this paper we discuss a novel framework for multiclass learning, defined by a suitable coding/decoding strategy, namely the simplex coding, that allows us to generalize to multiple classes a relaxation approach commonly used in binary classification. In this framework, we develop a relaxation error analysis that avoids constraints on the considered hypotheses class. Moreover, using this setting we derive the first provably consistent regularized method with training/tuning complexity that is independent to the number of classes. We introduce tools from convex analysis that can be used beyond the scope of this paper. 1

4 0.14739044 34 nips-2012-Active Learning of Multi-Index Function Models

Author: Tyagi Hemant, Volkan Cevher

Abstract: We consider the problem of actively learning multi-index functions of the form k f (x) = g(Ax) = i=1 gi (aT x) from point evaluations of f . We assume that i the function f is defined on an 2 -ball in Rd , g is twice continuously differentiable almost everywhere, and A ∈ Rk×d is a rank k matrix, where k d. We propose a randomized, active sampling scheme for estimating such functions with uniform approximation guarantees. Our theoretical developments leverage recent techniques from low rank matrix recovery, which enables us to derive an estimator of the function f along with sample complexity bounds. We also characterize the noise robustness of the scheme, and provide empirical evidence that the highdimensional scaling of our sample complexity bounds are quite accurate. 1

5 0.13886219 15 nips-2012-A Polylog Pivot Steps Simplex Algorithm for Classification

Author: Elad Hazan, Zohar Karnin

Abstract: We present a simplex algorithm for linear programming in a linear classification formulation. The paramount complexity parameter in linear classification problems is called the margin. We prove that for margin values of practical interest our simplex variant performs a polylogarithmic number of pivot steps in the worst case, and its overall running time is near linear. This is in contrast to general linear programming, for which no sub-polynomial pivot rule is known. 1

6 0.12517273 300 nips-2012-Scalable nonconvex inexact proximal splitting

7 0.12388299 69 nips-2012-Clustering Sparse Graphs

8 0.11000144 282 nips-2012-Proximal Newton-type methods for convex optimization

9 0.10366789 27 nips-2012-A quasi-Newton proximal splitting method

10 0.10217512 325 nips-2012-Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions

11 0.10085572 16 nips-2012-A Polynomial-time Form of Robust Regression

12 0.10011719 143 nips-2012-Globally Convergent Dual MAP LP Relaxation Solvers using Fenchel-Young Margins

13 0.09745691 292 nips-2012-Regularized Off-Policy TD-Learning

14 0.097443849 319 nips-2012-Sparse Prediction with the $k$-Support Norm

15 0.095643692 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model

16 0.094512448 6 nips-2012-A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation

17 0.093160279 204 nips-2012-MAP Inference in Chains using Column Generation

18 0.09301389 63 nips-2012-CPRL -- An Extension of Compressive Sensing to the Phase Retrieval Problem

19 0.09118598 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

20 0.090171792 164 nips-2012-Iterative Thresholding Algorithm for Sparse Inverse Covariance Estimation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.228), (1, 0.065), (2, 0.16), (3, -0.112), (4, 0.062), (5, 0.088), (6, -0.016), (7, -0.075), (8, 0.082), (9, 0.002), (10, 0.04), (11, 0.006), (12, -0.002), (13, 0.001), (14, 0.057), (15, -0.029), (16, -0.037), (17, 0.018), (18, 0.008), (19, -0.012), (20, -0.093), (21, 0.037), (22, -0.066), (23, 0.069), (24, -0.046), (25, 0.056), (26, -0.037), (27, -0.038), (28, 0.113), (29, 0.049), (30, 0.023), (31, -0.065), (32, 0.016), (33, 0.035), (34, -0.007), (35, 0.063), (36, 0.099), (37, 0.032), (38, 0.001), (39, -0.009), (40, -0.111), (41, 0.041), (42, -0.033), (43, -0.011), (44, -0.058), (45, -0.135), (46, 0.093), (47, -0.01), (48, 0.106), (49, -0.106)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95056605 290 nips-2012-Recovery of Sparse Probability Measures via Convex Programming

Author: Mert Pilanci, Laurent E. Ghaoui, Venkat Chandrasekaran

Abstract: We consider the problem of cardinality penalized optimization of a convex function over the probability simplex with additional convex constraints. The classical 1 regularizer fails to promote sparsity on the probability simplex since 1 norm on the probability simplex is trivially constant. We propose a direct relaxation of the minimum cardinality problem and show that it can be efficiently solved using convex programming. As a first application we consider recovering a sparse probability measure given moment constraints, in which our formulation becomes linear programming, hence can be solved very efficiently. A sufficient condition for exact recovery of the minimum cardinality solution is derived for arbitrary affine constraints. We then develop a penalized version for the noisy setting which can be solved using second order cone programs. The proposed method outperforms known rescaling heuristics based on 1 norm. As a second application we consider convex clustering using a sparse Gaussian mixture and compare our results with the well known soft k-means algorithm. 1

2 0.65328103 15 nips-2012-A Polylog Pivot Steps Simplex Algorithm for Classification

Author: Elad Hazan, Zohar Karnin

Abstract: We present a simplex algorithm for linear programming in a linear classification formulation. The paramount complexity parameter in linear classification problems is called the margin. We prove that for margin values of practical interest our simplex variant performs a polylogarithmic number of pivot steps in the worst case, and its overall running time is near linear. This is in contrast to general linear programming, for which no sub-polynomial pivot rule is known. 1

3 0.65194398 34 nips-2012-Active Learning of Multi-Index Function Models

Author: Tyagi Hemant, Volkan Cevher

Abstract: We consider the problem of actively learning multi-index functions of the form k f (x) = g(Ax) = i=1 gi (aT x) from point evaluations of f . We assume that i the function f is defined on an 2 -ball in Rd , g is twice continuously differentiable almost everywhere, and A ∈ Rk×d is a rank k matrix, where k d. We propose a randomized, active sampling scheme for estimating such functions with uniform approximation guarantees. Our theoretical developments leverage recent techniques from low rank matrix recovery, which enables us to derive an estimator of the function f along with sample complexity bounds. We also characterize the noise robustness of the scheme, and provide empirical evidence that the highdimensional scaling of our sample complexity bounds are quite accurate. 1

4 0.61675334 63 nips-2012-CPRL -- An Extension of Compressive Sensing to the Phase Retrieval Problem

Author: Henrik Ohlsson, Allen Yang, Roy Dong, Shankar Sastry

Abstract: While compressive sensing (CS) has been one of the most vibrant research fields in the past few years, most development only applies to linear models. This limits its application in many areas where CS could make a difference. This paper presents a novel extension of CS to the phase retrieval problem, where intensity measurements of a linear system are used to recover a complex sparse signal. We propose a novel solution using a lifting technique – CPRL, which relaxes the NP-hard problem to a nonsmooth semidefinite program. Our analysis shows that CPRL inherits many desirable properties from CS, such as guarantees for exact recovery. We further provide scalable numerical solvers to accelerate its implementation. 1

5 0.59209621 29 nips-2012-Accelerated Training for Matrix-norm Regularization: A Boosting Approach

Author: Xinhua Zhang, Dale Schuurmans, Yao-liang Yu

Abstract: Sparse learning models typically combine a smooth loss with a nonsmooth penalty, such as trace norm. Although recent developments in sparse approximation have offered promising solution methods, current approaches either apply only to matrix-norm constrained problems or provide suboptimal convergence rates. In this paper, we propose a boosting method for regularized learning that guarantees accuracy within O(1/ ) iterations. Performance is further accelerated by interlacing boosting with fixed-rank local optimization—exploiting a simpler local objective than previous work. The proposed method yields state-of-the-art performance on large-scale problems. We also demonstrate an application to latent multiview learning for which we provide the first efficient weak-oracle. 1

6 0.59160483 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance

7 0.59005755 27 nips-2012-A quasi-Newton proximal splitting method

8 0.57139981 120 nips-2012-Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential 1-Minimization

9 0.56772286 16 nips-2012-A Polynomial-time Form of Robust Regression

10 0.54169703 340 nips-2012-The representer theorem for Hilbert spaces: a necessary and sufficient condition

11 0.53795797 86 nips-2012-Convex Multi-view Subspace Learning

12 0.53202879 282 nips-2012-Proximal Newton-type methods for convex optimization

13 0.53177738 302 nips-2012-Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization

14 0.52435803 125 nips-2012-Factoring nonnegative matrices with linear programs

15 0.52311504 133 nips-2012-Finding Exemplars from Pairwise Dissimilarities via Simultaneous Sparse Recovery

16 0.52250671 247 nips-2012-Nonparametric Reduced Rank Regression

17 0.51745027 281 nips-2012-Provable ICA with Unknown Gaussian Noise, with Implications for Gaussian Mixtures and Autoencoders

18 0.51499587 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model

19 0.51457018 161 nips-2012-Interpreting prediction markets: a stochastic approach

20 0.50120145 85 nips-2012-Convergence and Energy Landscape for Cheeger Cut Clustering


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.051), (21, 0.021), (38, 0.212), (42, 0.054), (54, 0.012), (55, 0.021), (68, 0.224), (74, 0.037), (76, 0.156), (80, 0.089), (92, 0.036)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.86477959 290 nips-2012-Recovery of Sparse Probability Measures via Convex Programming

Author: Mert Pilanci, Laurent E. Ghaoui, Venkat Chandrasekaran

Abstract: We consider the problem of cardinality penalized optimization of a convex function over the probability simplex with additional convex constraints. The classical ℓ1 regularizer fails to promote sparsity on the probability simplex since the ℓ1 norm on the probability simplex is trivially constant. We propose a direct relaxation of the minimum cardinality problem and show that it can be efficiently solved using convex programming. As a first application we consider recovering a sparse probability measure given moment constraints, in which our formulation becomes linear programming, hence can be solved very efficiently. A sufficient condition for exact recovery of the minimum cardinality solution is derived for arbitrary affine constraints. We then develop a penalized version for the noisy setting which can be solved using second order cone programs. The proposed method outperforms known rescaling heuristics based on the ℓ1 norm. As a second application we consider convex clustering using a sparse Gaussian mixture and compare our results with the well known soft k-means algorithm.

2 0.85671228 143 nips-2012-Globally Convergent Dual MAP LP Relaxation Solvers using Fenchel-Young Margins

Author: Alex Schwing, Tamir Hazan, Marc Pollefeys, Raquel Urtasun

Abstract: While finding the exact solution for the MAP inference problem is intractable for many real-world tasks, MAP LP relaxations have been shown to be very effective in practice. However, the most efficient methods that perform block coordinate descent can get stuck in sub-optimal points as they are not globally convergent. In this work we propose to augment these algorithms with an -descent approach and present a method to efficiently optimize for a descent direction in the subdifferential using a margin-based formulation of the Fenchel-Young duality theorem. Furthermore, the presented approach provides a methodology to construct a primal optimal solution from its dual optimal counterpart. We demonstrate the efficiency of the presented approach on spin glass models and protein interaction problems and show that our approach outperforms state-of-the-art solvers. 1

3 0.83123225 126 nips-2012-FastEx: Hash Clustering with Exponential Families

Author: Amr Ahmed, Sujith Ravi, Alex J. Smola, Shravan M. Narayanamurthy

Abstract: Clustering is a key component in any data analysis toolbox. Despite its importance, scalable algorithms often eschew rich statistical models in favor of simpler descriptions such as k-means clustering. In this paper we present a sampler, capable of estimating mixtures of exponential families. At its heart lies a novel proposal distribution using random projections to achieve high throughput in generating proposals, which is crucial for clustering models with large numbers of clusters. 1

4 0.79281688 34 nips-2012-Active Learning of Multi-Index Function Models

Author: Tyagi Hemant, Volkan Cevher

Abstract: We consider the problem of actively learning multi-index functions of the form k f (x) = g(Ax) = i=1 gi (aT x) from point evaluations of f . We assume that i the function f is defined on an 2 -ball in Rd , g is twice continuously differentiable almost everywhere, and A ∈ Rk×d is a rank k matrix, where k d. We propose a randomized, active sampling scheme for estimating such functions with uniform approximation guarantees. Our theoretical developments leverage recent techniques from low rank matrix recovery, which enables us to derive an estimator of the function f along with sample complexity bounds. We also characterize the noise robustness of the scheme, and provide empirical evidence that the highdimensional scaling of our sample complexity bounds are quite accurate. 1

5 0.78455257 263 nips-2012-Optimal Regularized Dual Averaging Methods for Stochastic Optimization

Author: Xi Chen, Qihang Lin, Javier Pena

Abstract: This paper considers a wide spectrum of regularized stochastic optimization problems where both the loss function and regularizer can be non-smooth. We develop a novel algorithm based on the regularized dual averaging (RDA) method, that can simultaneously achieve the optimal convergence rates for both convex and strongly convex loss. In particular, for strongly convex loss, it achieves the opti1 1 mal rate of O( N + N 2 ) for N iterations, which improves the rate O( log N ) for preN vious regularized dual averaging algorithms. In addition, our method constructs the final solution directly from the proximal mapping instead of averaging of all previous iterates. For widely used sparsity-inducing regularizers (e.g., 1 -norm), it has the advantage of encouraging sparser solutions. We further develop a multistage extension using the proposed algorithm as a subroutine, which achieves the 1 uniformly-optimal rate O( N + exp{−N }) for strongly convex loss. 1

6 0.78293979 86 nips-2012-Convex Multi-view Subspace Learning

7 0.78185469 27 nips-2012-A quasi-Newton proximal splitting method

8 0.78177887 325 nips-2012-Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions

9 0.78142804 15 nips-2012-A Polylog Pivot Steps Simplex Algorithm for Classification

10 0.78061742 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback

11 0.78048319 187 nips-2012-Learning curves for multi-task Gaussian process regression

12 0.78008115 98 nips-2012-Dimensionality Dependent PAC-Bayes Margin Bound

13 0.77981228 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

14 0.77948707 227 nips-2012-Multiclass Learning with Simplex Coding

15 0.77885216 179 nips-2012-Learning Manifolds with K-Means and K-Flats

16 0.77864337 285 nips-2012-Query Complexity of Derivative-Free Optimization

17 0.77838993 226 nips-2012-Multiclass Learning Approaches: A Theoretical Comparison with Implications

18 0.77814889 361 nips-2012-Volume Regularization for Binary Classification

19 0.77804184 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

20 0.77801347 358 nips-2012-Value Pursuit Iteration