NIPS 2009, paper 79 (nips2009-79): Efficient Recovery of Jointly Sparse Vectors
Source: PDF
Author: Liang Sun, Jun Liu, Jianhui Chen, Jieping Ye
Abstract: We consider the reconstruction of sparse signals in the multiple measurement vector (MMV) model, in which the signal, represented as a matrix, consists of a set of jointly sparse vectors. MMV is an extension of the single measurement vector (SMV) model employed in standard compressive sensing (CS). Recent theoretical studies focus on the convex relaxation of the MMV problem based on the (2, 1)-norm minimization, which is an extension of the well-known 1-norm minimization employed in SMV. However, the resulting convex optimization problem in MMV is significantly more difficult to solve than the one in SMV. Existing algorithms reformulate it as a second-order cone programming (SOCP) or semidefinite programming (SDP) problem, which is computationally expensive to solve even for problems of moderate size. In this paper, we propose a new (dual) reformulation of the convex optimization problem in MMV and develop an efficient algorithm based on the prox-method. Interestingly, our theoretical analysis reveals a close connection between the proposed reformulation and multiple kernel learning. Our simulation studies demonstrate the scalability of the proposed algorithm.
1 Introduction

Compressive sensing (CS), also known as compressive sampling, has recently received increasing attention in many areas of science and engineering [3]. In CS, an unknown sparse signal is reconstructed from a single measurement vector. Recent theoretical studies show that one can recover certain sparse signals from far fewer samples or measurements than traditional methods require [4, 8]. In this paper, we consider the problem of reconstructing sparse signals in the multiple measurement vector (MMV) model, in which the signal, represented as a matrix, consists of a set of jointly sparse vectors. MMV is an extension of the single measurement vector (SMV) model employed in standard compressive sensing.

The MMV model was motivated by the need to solve the neuromagnetic inverse problem that arises in magnetoencephalography (MEG), a modality for imaging the brain [7]. It also arises in a variety of other applications, such as DNA microarrays [11], equalization of sparse communication channels [6], echo cancellation [9], magnetoencephalography [12], computing sparse solutions to linear inverse problems [7], and source localization in sensor networks [17]. Unlike SMV, the signal in the MMV model is represented as a set of jointly sparse vectors whose nonzeros occur in a common set of locations [5, 7]. It has been shown that this additional block-sparse structure can lead to improved performance in signal recovery [5, 10, 16, 21].

Several recovery algorithms have been proposed for the MMV model [5, 7, 18, 24, 25]. Since the sparse representation problem is a combinatorial optimization problem and is in general NP-hard [5], the algorithms in [18, 25] employ a greedy strategy to recover the signal using an iterative scheme. One alternative is to relax it into a convex optimization problem, for which the global optimal solution can be obtained. Recent studies have shown that most theoretical results on the convex relaxation of the SMV model can be extended to the MMV model [5], although further theoretical investigation is needed [26]. Unlike the SMV model, where the 1-norm minimization can be solved efficiently, the resulting convex optimization problem in MMV is much more difficult to solve. Existing algorithms formulate it as a second-order cone programming (SOCP) or semidefinite programming (SDP) [16] problem, which can be solved using standard software packages such as SeDuMi [23]. However, even for problems of moderate size, solving either SOCP or SDP is computationally expensive, which limits their use in practice.

In this paper, we derive a dual reformulation of the (2, 1)-norm minimization problem in MMV. More specifically, we show that the (2, 1)-norm minimization problem can be reformulated as a min-max problem, which can be solved efficiently via the prox-method with a nearly dimension-independent convergence rate [19]. Interestingly, our theoretical analysis reveals a close relationship between the resulting min-max problem and multiple kernel learning [14]. We have performed simulation studies, and our results demonstrate the scalability of the proposed algorithm in comparison with existing algorithms.
Notation: For a matrix $A \in \mathbb{R}^{m \times d}$, we denote by $a^i$ and $a_i$ the $i$th row and the $i$th column of $A$, respectively. The $(r, p)$-norm of $A$ is defined as

$$\|A\|_{r,p} := \Big( \sum_{i=1}^{m} \|a^i\|_r^p \Big)^{1/p}. \quad (1)$$
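As a quick illustration (this code is ours, not from the paper), the $(r,p)$-norm of Eq. (1) can be computed in a few lines of NumPy:

```python
import numpy as np

def norm_rp(A, r, p):
    """(r, p)-norm of A: the l_p norm of the vector of row-wise l_r norms."""
    row_norms = np.linalg.norm(A, ord=r, axis=1)
    return np.linalg.norm(row_norms, ord=p)

A = np.array([[3.0, 4.0], [0.0, 0.0], [5.0, 12.0]])
print(norm_rp(A, 2, 1))        # sum of row l2 norms: 5 + 0 + 13 = 18.0
print(norm_rp(A, 2, np.inf))   # largest row l2 norm: 13.0
```

With $r = 2$, $p = 1$ this gives the $\|\cdot\|_{2,1}$ objective used throughout the paper, and $p = \infty$ gives the $(2,\infty)$-norm that appears in the dual.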
2 The Multiple Measurement Vector Model

In the SMV model, one aims to recover a sparse signal $w$ from a measurement vector $b = Aw$ for a given matrix $A$ [3]. The SMV model can be extended to the multiple measurement vector (MMV) model, in which the signal is represented as a set of jointly sparse vectors sharing a common set of nonzeros [5, 7]. The MMV model aims to recover the sparse representations for multiple SMVs simultaneously. It has been shown that the MMV model provably improves on standard CS recovery by exploiting the block-sparse structure [10, 21].

Specifically, in the MMV model we consider the reconstruction of a signal represented by a matrix $W \in \mathbb{R}^{d \times n}$, which is given by a dictionary (or measurement matrix) $A \in \mathbb{R}^{m \times d}$ and a matrix of multiple measurement vectors $B \in \mathbb{R}^{m \times n}$ such that

$$B = AW.$$

A sparse representation means that the matrix $W$ has a small number of rows containing nonzero entries.
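The data model above can be sketched as follows (a minimal illustration of ours; the dimensions mirror the paper's simulation setup, and the Gaussian dictionary is our assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n, k = 100, 50, 80, 10              # signal dim, measurements, vectors, nonzero rows

A = rng.standard_normal((m, d))           # dictionary / measurement matrix
W = np.zeros((d, n))
support = rng.choice(d, size=k, replace=False)
W[support] = rng.standard_normal((k, n))  # all columns share the same k nonzero rows
B = A @ W                                 # multiple measurement vectors

nonzero_rows = np.count_nonzero(np.linalg.norm(W, axis=1))
print(B.shape, nonzero_rows)              # (50, 80) 10
```

The joint sparsity is exactly the row structure the $(2,1)$-norm penalizes: few rows of $W$ are nonzero, but each nonzero row is dense across the $n$ measurement vectors.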
Thus, the problem of finding the sparsest representation of the signal $W$ in MMV is equivalent to solving the following problem, known as the sparse representation problem:

$$(P0): \quad \min_W \ \|W\|_{p,0} \quad \text{s.t.} \quad AW = B.$$

However, solving (P0) requires enumerating all subsets of the set $\{1, 2, \dots, d\}$; it is essentially a combinatorial optimization problem and is in general NP-hard [5]. Similar to the use of the 1-norm minimization in the SMV model, one natural alternative is to replace $\|W\|_{p,0}$ with $\|W\|_{p,1}$, resulting in the following convex optimization problem:

$$(P1): \quad \min_W \ \|W\|_{p,1} \quad \text{s.t.} \quad AW = B.$$

For $p = 2$, the optimal $W$ is given by solving the following convex optimization problem:

$$\min_W \ \frac{1}{2}\|W\|_{2,1}^2 \quad \text{s.t.} \quad AW = B. \quad (5)$$
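For intuition, in the SMV special case ($n = 1$, $p = 1$) problem (P1) is a linear program; below is a minimal sketch of ours using SciPy's `linprog` with the standard split $w = u - v$, $u, v \ge 0$ (an illustration, not the paper's solver):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, d, k = 30, 60, 4
A = rng.standard_normal((m, d))
w_true = np.zeros(d)
w_true[rng.choice(d, size=k, replace=False)] = rng.standard_normal(k)
b = A @ w_true

# min ||w||_1  s.t.  Aw = b,  as an LP over (u, v) with w = u - v, u, v >= 0:
# minimize 1^T u + 1^T v  s.t.  [A, -A][u; v] = b.
c = np.ones(2 * d)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=b, bounds=[(0, None)] * (2 * d))
w_hat = res.x[:d] - res.x[d:]

print(np.allclose(A @ w_hat, b, atol=1e-6))                # feasible
print(np.abs(w_hat).sum() <= np.abs(w_true).sum() + 1e-6)  # objective no worse than w_true
```

The MMV objective in Eq. (5) has no such LP reformulation, which is why the conic formulations below (and the paper's dual approach) are needed.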
Existing approaches solve Eq. (5) as a second-order cone programming (SOCP) problem or a semidefinite programming (SDP) problem [16]. Note that Eq. (5) is equivalent to the following problem, obtained by removing the square in the objective:

$$\min_W \ \frac{1}{2}\|W\|_{2,1} \quad \text{s.t.} \quad AW = B.$$

By introducing auxiliary variables $t_i$ ($i = 1, \dots, d$), this problem can be reformulated in the standard SOCP form:

$$\min_{W, t_1, \dots, t_d} \ \frac{1}{2}\sum_{i=1}^{d} t_i \quad \text{s.t.} \quad \|w^i\|_2 \le t_i, \ t_i \ge 0, \ i = 1, \dots, d, \quad AW = B. \quad (6)$$

Based on this SOCP formulation, it can also be transformed into the standard SDP form:

$$\min_{W, t_1, \dots, t_d} \ \frac{1}{2}\sum_{i=1}^{d} t_i \quad \text{s.t.} \quad \begin{pmatrix} t_i I & (w^i)^\top \\ w^i & t_i \end{pmatrix} \succeq 0, \ t_i \ge 0, \ i = 1, \dots, d, \quad AW = B. \quad (7)$$
3 The Proposed Dual Formulation

In this section we present a dual reformulation of the optimization problem in Eq. (5). Without loss of generality, assume that $\|a^k\|_2 = \max_{1 \le i \le m} \|a^i\|_2$, so that $\|A\|_{2,\infty} = \|a^k\|_2$. Then we have

$$\langle A, X \rangle = \sum_{i=1}^{m} \langle a^i, x^i \rangle \le \sum_{i=1}^{m} \|a^i\|_2 \|x^i\|_2 \le \|a^k\|_2 \sum_{i=1}^{m} \|x^i\|_2 = \|A\|_{2,\infty} \|X\|_{2,1},$$

where the last inequality becomes an equality when the rows $x^i$ vanish outside the set $Q = \{k : 1 \le k \le m, \ \|a^k\|_2 = \max_{1 \le i \le m} \|a^i\|_2\}$. Then the following holds:

$$\max_X \ \langle A, X \rangle - \frac{1}{2}\|X\|_{2,1}^2 = \frac{1}{2}\|A\|_{2,\infty}^2, \quad (9)$$

which is achieved when $X$ is constructed from the maximal-norm rows of $A$ indexed by $Q$. Based on the results established in Lemmas 1 and 2, we can derive the dual formulation of the optimization problem in Eq. (5).
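The closed form in Lemma 2 is easy to verify numerically (a sanity-check sketch of ours): the maximum is attained by placing all mass of $X$ on a row of $A$ with maximal norm.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))

row_norms = np.linalg.norm(A, axis=1)
k = int(np.argmax(row_norms))            # a row of maximal l2 norm
closed_form = 0.5 * row_norms[k] ** 2    # (1/2) ||A||_{2,inf}^2

def objective(X):
    # <A, X> - (1/2) ||X||_{2,1}^2
    return np.sum(A * X) - 0.5 * np.sum(np.linalg.norm(X, axis=1)) ** 2

# The maximum is attained at X with x^k = a^k and all other rows zero.
X_star = np.zeros_like(A)
X_star[k] = A[k]
assert np.isclose(objective(X_star), closed_form)

# Random candidates never exceed the closed form.
assert all(objective(rng.standard_normal(A.shape)) <= closed_form + 1e-9
           for _ in range(1000))
print(closed_form)
```

Checking $X^\star$ directly: $\langle A, X^\star\rangle = \|a^k\|_2^2$ and $\frac{1}{2}\|X^\star\|_{2,1}^2 = \frac{1}{2}\|a^k\|_2^2$, so the objective equals $\frac{1}{2}\|a^k\|_2^2 = \frac{1}{2}\|A\|_{2,\infty}^2$.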
First we construct the Lagrangian

$$L(W, U) = \frac{1}{2}\|W\|_{2,1}^2 - \langle U, AW - B \rangle.$$

The dual problem can be formulated as follows:

$$\max_U \min_W \ \frac{1}{2}\|W\|_{2,1}^2 - \langle U, AW \rangle + \langle U, B \rangle. \quad (10)$$

It follows from Lemma 2 that

$$\min_W \ \frac{1}{2}\|W\|_{2,1}^2 - \langle U, AW \rangle = \min_W \ \frac{1}{2}\|W\|_{2,1}^2 - \langle A^\top U, W \rangle = -\frac{1}{2}\|A^\top U\|_{2,\infty}^2. \quad (11)$$

Thus, the dual problem can be simplified into the following form:

$$\max_U \ -\frac{1}{2}\|A^\top U\|_{2,\infty}^2 + \langle U, B \rangle. \quad (12)$$
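As a quick numerical sanity check (ours, not from the paper), weak duality holds by construction: the dual objective in Eq. (12) never exceeds the primal objective of any feasible $W$.

```python
import numpy as np

rng = np.random.default_rng(2)
m, d, n = 8, 20, 5
A = rng.standard_normal((m, d))
W_feas = np.zeros((d, n))
W_feas[:3] = rng.standard_normal((3, n))   # any feasible point works
B = A @ W_feas

# Primal objective of Eq. (5) at the feasible point.
primal = 0.5 * np.sum(np.linalg.norm(W_feas, axis=1)) ** 2

best_dual = -np.inf
for _ in range(200):
    U = rng.standard_normal((m, n))
    # Dual objective of Eq. (12): -(1/2)||A^T U||_{2,inf}^2 + <U, B>.
    dual = -0.5 * np.max(np.linalg.norm(A.T @ U, axis=1)) ** 2 + np.sum(U * B)
    assert dual <= primal + 1e-9           # weak duality
    best_dual = max(best_dual, dual)
print(best_dual, "<=", primal)
```

Strong duality (equality at the optimum) is what the paper's saddle-point analysis establishes; the check above only exercises the easy direction.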
Following the definition of the $(2, \infty)$-norm, we can reformulate the dual problem in Eq. (12) further: Eq. (5) can be formulated equivalently as

$$\min_{\sum_{i=1}^{d}\theta_i = 1, \ \theta_i \ge 0} \ \max_{u_1, \dots, u_n} \ \sum_{j=1}^{n} \Big( u_j^\top b_j - \frac{1}{2}\sum_{i=1}^{d} \theta_i \, u_j^\top G_i u_j \Big), \quad (13)$$

where the matrix $G_i$ is defined as $G_i = a_i a_i^\top$ ($1 \le i \le d$), and $a_i$ is the $i$th column of $A$. Note that $\|A^\top U\|_{2,\infty}^2$ can be reformulated as follows:

$$\|A^\top U\|_{2,\infty}^2 = \max_{1 \le i \le d} \|a_i^\top U\|_2^2 = \max_{1 \le i \le d} \operatorname{tr}(U^\top a_i a_i^\top U) = \max_{1 \le i \le d} \operatorname{tr}(U^\top G_i U) = \max_{\theta_i \ge 0, \ \sum_i \theta_i = 1} \ \sum_{i=1}^{d} \theta_i \operatorname{tr}(U^\top G_i U).$$
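The chain of identities above can be checked numerically (an illustrative sketch of ours):

```python
import numpy as np

rng = np.random.default_rng(3)
m, d, n = 6, 10, 4
A = rng.standard_normal((m, d))
U = rng.standard_normal((m, n))

# ||A^T U||_{2,inf}^2: squared max row norm of A^T U (row i is a_i^T U).
lhs = np.max(np.linalg.norm(A.T @ U, axis=1)) ** 2

# max_i tr(U^T G_i U) with G_i = a_i a_i^T built from the columns of A.
rhs = max(np.trace(U.T @ np.outer(A[:, i], A[:, i]) @ U) for i in range(d))

assert np.isclose(lhs, rhs)
print(lhs)
```

The final step of the chain, replacing the maximum over $i$ with a maximum over the simplex $\{\theta : \theta_i \ge 0, \sum_i \theta_i = 1\}$, holds because a linear function over the simplex is maximized at a vertex.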
Substituting this into Eq. (12), we obtain the following problem:

$$\max_U \ -\frac{1}{2}\|A^\top U\|_{2,\infty}^2 + \langle U, B \rangle \ \Longleftrightarrow \ \max_U \ \min_{\sum_i \theta_i = 1, \ \theta_i \ge 0} \ \langle U, B \rangle - \frac{1}{2}\sum_{i=1}^{d} \theta_i \operatorname{tr}(U^\top G_i U).$$
Based on the solution to the dual problem in Eq. (13), we can construct the optimal solution to the primal problem in Eq. (5). Therefore, for any $U$ and $\theta_1, \dots, \theta_d$ such that $\sum_{i=1}^{d} \theta_i = 1$, $\theta_i \ge 0$, we have

$$\phi(\alpha_1, \dots, \alpha_d, U) \le \phi(\alpha_1, \dots, \alpha_d, U^*) \le \phi(\theta_1, \dots, \theta_d, U^*), \quad (17)$$

which implies that $(\alpha_1, \dots, \alpha_d, U^*)$ is a saddle point of the min-max problem in Eq. (13). Theorem 2 thus shows that we can reconstruct the solution to the primal problem from the solution to the dual problem in Eq. (13). In this paper the prox-method [19], discussed in detail in the next section, is employed to solve the dual problem in Eq. (13).
The dual problem in Eq. (13) is closely related to the optimization problem in multiple kernel learning (MKL) [14]. Indeed, Eq. (13) can be reformulated as

$$\min_{\sum_{i=1}^{d}\theta_i = 1, \ \theta_i \ge 0} \ \max_{u_1, \dots, u_n} \ \sum_{j=1}^{n} \Big( u_j^\top b_j - \frac{1}{2} u_j^\top G u_j \Big), \quad (18)$$

where the positive semidefinite (kernel) matrix $G$ is constrained to be a linear combination of the set of base kernels $\{G_i = a_i a_i^\top\}_{i=1}^{d}$, i.e., $G = \sum_{i=1}^{d} \theta_i G_i$. In [27], an extended level set method was proposed to solve MKL, which was shown to outperform the approach based on the semi-infinite linear programming formulation [22]. However, the extended level set method involves solving a linear program in each iteration, and its theoretical convergence rate of $O(1/\sqrt{N})$ (where $N$ denotes the number of iterations) is slower than that of the algorithm presented in the next section.
4 The Main Algorithm

We propose to employ the prox-method [19] to solve the min-max formulation in Eq. (13). The prox-method is a first-order method [1, 19] specialized for solving saddle point problems, with a nearly dimension-independent convergence rate of $O(1/N)$, where $N$ denotes the number of iterations. It has been shown in [19] that when $\gamma \le \frac{1}{\sqrt{2}L}$, where $L$ denotes the Lipschitz constant of the operator $F(\cdot)$, the inner iteration converges within two iterations.
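To convey the flavor of the prox-method, here is a generic Euclidean extragradient iteration (an extrapolation step followed by a correction step) on a toy saddle problem. This is our illustration with an assumed step size $\gamma = 1/(2L)$, not the paper's MMVprox implementation:

```python
import numpy as np

# Toy saddle problem: min_x max_y 0.5||x||^2 + x^T M y - 0.5||y||^2,
# whose unique saddle point is (x*, y*) = (0, 0).
rng = np.random.default_rng(4)
M = rng.standard_normal((5, 5))
x, y = rng.standard_normal(5), rng.standard_normal(5)

def F(x, y):
    """Monotone operator (grad_x f, -grad_y f) of the saddle function f."""
    return x + M @ y, y - M.T @ x

L = np.linalg.norm(M, 2) + 1.0   # crude upper bound on the Lipschitz constant of F
gamma = 1.0 / (2.0 * L)

for _ in range(2000):
    gx, gy = F(x, y)
    xh, yh = x - gamma * gx, y - gamma * gy   # extrapolation (predictor) step
    gx, gy = F(xh, yh)
    x, y = x - gamma * gx, y - gamma * gy     # correction step from the old point

print(np.linalg.norm(x), np.linalg.norm(y))   # both converge to 0
```

In the paper's setting, $F$ would be the operator of the min-max problem in Eq. (13), and the prox operations would respect the simplex constraint on $\theta$ rather than the plain Euclidean steps used here.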
Recall that at each outer iteration $t$, the inner iteration takes at most two steps. In this case, the proposed MMVprox algorithm has a much smaller cost per iteration than SOCP.
Table 1: The averaged recovery results over 10 experiments ($d = 100$, $m = 50$, and $n = 80$).

5 Experiments

In this section, we conduct simulations to evaluate the proposed MMVprox algorithm in terms of recovery quality and scalability. We employed the optimization package SeDuMi [23] for solving the SOCP formulation.

Recovery quality. In this experiment, we evaluate the recovery quality of the proposed MMVprox algorithm, measured by the mean squared error $\|W - W_p\|_F^2/(dn)$, where $W_p$ denotes the recovered signal. We observe from Table 1 that MMVprox recovers the sparse signal successfully in all cases.

Next, we study how the recovery error changes as the sparsity of $W$ varies. We varied the number $k$ of nonzero rows of $W$ up to $0.7d$, again using $\|W - W_p\|_F^2/(dn)$ as the recovery quality measure. We observe from the figure that MMVprox works well in all cases, and a larger $k$ (a less sparse $W$) tends to result in a larger recovery error.
Figure 1: The increase of the recovery error as the sparsity level decreases.

Scalability. In this experiment, we study the scalability of the proposed MMVprox algorithm. The experimental results demonstrate the good scalability of MMVprox in comparison with the SOCP formulation. To further examine the scalability of both algorithms, we also compare the execution time per iteration of SOCP and the proposed algorithm: MMVprox is much cheaper per iteration, while SOCP requires fewer iterations, so there is a tradeoff between scalability and convergence rate.
6 Conclusions

In this paper, we consider the (2, 1)-norm minimization for the reconstruction of sparse signals in the multiple measurement vector (MMV) model, in which the signal consists of a set of jointly sparse vectors. Existing algorithms formulate it as second-order cone programming or semidefinite programming, which is computationally expensive to solve for problems of moderate size. We propose an equivalent dual formulation for the (2, 1)-norm minimization in the MMV model and develop the MMVprox algorithm, which solves the dual formulation via the prox-method. In addition, our theoretical analysis reveals a close connection between the proposed dual formulation and multiple kernel learning. Our simulation studies demonstrate the effectiveness of the proposed algorithm in terms of recovery quality and scalability. In the future, we plan to compare existing solvers for multiple kernel learning [14, 22, 27] with the proposed MMVprox algorithm.
References (partial):
- Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information.
- Sparse solutions to linear inverse problems with multiple measurement vectors.
- Robust recovery of signals from a structured union of subspaces.
- Empirical Bayes estimation of a sparse vector of gene expression changes.
- On the reconstruction of block-sparse signals with an optimal number of measurements.
- Reduce and boost: Recovering arbitrary sets of jointly sparse vectors.
- An extended level method for efficient multiple kernel learning.
Similar papers (simIndex, simValue, paperId, title):
1. (1.000) nips2009-79 — Efficient Recovery of Jointly Sparse Vectors (this paper; Liang Sun, Jun Liu, Jianhui Chen, Jieping Ye)
2. (0.084) nips2009-80 — Efficient and Accurate Lp-Norm Multiple Kernel Learning (Marius Kloft, Ulf Brefeld, Pavel Laskov, Klaus-Robert Müller, Alexander Zien, Sören Sonnenburg)
3. (0.083) nips2009-61 — Convex Relaxation of Mixture Regression with Efficient Algorithms (Novi Quadrianto, John Lim, Dale Schuurmans, Tibério S. Caetano)
4. (0.080) nips2009-179 — On the Algorithmics and Applications of a Mixed-norm based Kernel Learning Formulation (Saketha N. Jagarlapudi, Dinesh G, Raman S, Chiranjib Bhattacharyya, Aharon Ben-tal, Ramakrishnan K.r.)
5. (0.076) nips2009-208 — Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices via Convex Optimization (John Wright, Arvind Ganesh, Shankar Rao, Yigang Peng, Yi Ma)
6. (0.068) nips2009-92 — Fast Graph Laplacian Regularized Kernel Learning via Semidefinite–Quadratic–Linear Programming
7. (0.067) nips2009-222 — Sparse Estimation Using General Likelihoods and Non-Factorial Priors
8. (0.064) nips2009-147 — Matrix Completion from Noisy Entries
9. (0.061) nips2009-167 — Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations
10. (0.061) nips2009-225 — Sparsistent Learning of Varying-coefficient Models with Structural Changes
11. (0.061) nips2009-157 — Multi-Label Prediction via Compressed Sensing
12. (0.061) nips2009-33 — Analysis of SVM with Indefinite Kernels
13. (0.058) nips2009-1 — $L_1$-Penalized Robust Estimation for a Class of Inverse Problems Arising in Multiview Geometry
14. (0.056) nips2009-223 — Sparse Metric Learning via Smooth Optimization
15. (0.056) nips2009-245 — Thresholding Procedures for High Dimensional Variable Selection and Statistical Estimation
16. (0.054) nips2009-185 — Orthogonal Matching Pursuit From Noisy Random Measurements: A New Analysis
17. (0.051) nips2009-200 — Reconstruction of Sparse Circuits Using Multi-neuronal Excitation (RESCUME)
18. (0.047) nips2009-228 — Speeding up Magnetic Resonance Image Acquisition by Bayesian Multi-Slice Adaptive Compressed Sensing
19. (0.045) nips2009-138 — Learning with Compressible Priors
20. (0.045) nips2009-20 — A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers
simIndex simValue paperId paperTitle
same-paper 1 0.91137254 79 nips-2009-Efficient Recovery of Jointly Sparse Vectors
Author: Liang Sun, Jun Liu, Jianhui Chen, Jieping Ye
Abstract: We consider the reconstruction of sparse signals in the multiple measurement vector (MMV) model, in which the signal, represented as a matrix, consists of a set of jointly sparse vectors. MMV is an extension of the single measurement vector (SMV) model employed in standard compressive sensing (CS). Recent theoretical studies focus on the convex relaxation of the MMV problem based on the (2, 1)-norm minimization, which is an extension of the well-known 1-norm minimization employed in SMV. However, the resulting convex optimization problem in MMV is significantly much more difficult to solve than the one in SMV. Existing algorithms reformulate it as a second-order cone programming (SOCP) or semidefinite programming (SDP) problem, which is computationally expensive to solve for problems of moderate size. In this paper, we propose a new (dual) reformulation of the convex optimization problem in MMV and develop an efficient algorithm based on the prox-method. Interestingly, our theoretical analysis reveals the close connection between the proposed reformulation and multiple kernel learning. Our simulation studies demonstrate the scalability of the proposed algorithm.
2 0.61687005 222 nips-2009-Sparse Estimation Using General Likelihoods and Non-Factorial Priors
Author: David P. Wipf, Srikantan S. Nagarajan
Abstract: Finding maximally sparse representations from overcomplete feature dictionaries frequently involves minimizing a cost function composed of a likelihood (or data fit) term and a prior (or penalty function) that favors sparsity. While typically the prior is factorial, here we examine non-factorial alternatives that have a number of desirable properties relevant to sparse estimation and are easily implemented using an efficient and globally-convergent, reweighted 1 -norm minimization procedure. The first method under consideration arises from the sparse Bayesian learning (SBL) framework. Although based on a highly non-convex underlying cost function, in the context of canonical sparse estimation problems, we prove uniform superiority of this method over the Lasso in that, (i) it can never do worse, and (ii) for any dictionary and sparsity profile, there will always exist cases where it does better. These results challenge the prevailing reliance on strictly convex penalty functions for finding sparse solutions. We then derive a new non-factorial variant with similar properties that exhibits further performance improvements in some empirical tests. For both of these methods, as well as traditional factorial analogs, we demonstrate the effectiveness of reweighted 1 -norm algorithms in handling more general sparse estimation problems involving classification, group feature selection, and non-negativity constraints. As a byproduct of this development, a rigorous reformulation of sparse Bayesian classification (e.g., the relevance vector machine) is derived that, unlike the original, involves no approximation steps and descends a well-defined objective function. 1
3 0.56304753 138 nips-2009-Learning with Compressible Priors
Author: Volkan Cevher
Abstract: We describe a set of probability distributions, dubbed compressible priors, whose independent and identically distributed (iid) realizations result in p-compressible signals. A signal x ∈ RN is called p-compressible with magnitude R if its sorted coefficients exhibit a power-law decay as |x|(i) R · i−d , where the decay rate d is equal to 1/p. p-compressible signals live close to K-sparse signals (K N) in the r -norm (r > p) since their best K-sparse approximation error decreases with O R · K 1/r−1/p . We show that the membership of generalized Pareto, Student’s t, log-normal, Fr´ chet, and log-logistic distributions to the set of compresse ible priors depends only on the distribution parameters and is independent of N . In contrast, we demonstrate that the membership of the generalized Gaussian distribution (GGD) depends both on the signal dimension and the GGD parameters: the expected decay rate of N -sample iid realizations from the GGD with the shape parameter q is given by 1/ [q log (N/q)]. As stylized examples, we show via experiments that the wavelet coefficients of natural images are 1.67-compressible whereas their pixel gradients are 0.95 log (N/0.95)-compressible, on the average. We also leverage the connections between compressible priors and sparse signals to develop new iterative re-weighted sparse signal recovery algorithms that outperform the standard 1 -norm minimization. Finally, we describe how to learn the hyperparameters of compressible priors in underdetermined regression problems by exploiting the geometry of their order statistics during signal recovery. 1
4 0.55168557 80 nips-2009-Efficient and Accurate Lp-Norm Multiple Kernel Learning
Author: Marius Kloft, Ulf Brefeld, Pavel Laskov, Klaus-Robert Müller, Alexander Zien, Sören Sonnenburg
Abstract: Learning linear combinations of multiple kernels is an appealing strategy when the right choice of features is unknown. Previous approaches to multiple kernel learning (MKL) promote sparse kernel combinations to support interpretability. Unfortunately, ℓ1-norm MKL is hardly observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures, we generalize MKL to arbitrary ℓp-norms. We devise new insights on the connection between several existing MKL formulations and develop two efficient interleaved optimization strategies for arbitrary p > 1. Empirically, we demonstrate that the interleaved optimization strategies are much faster compared to the traditionally used wrapper approaches. Finally, we apply ℓp-norm MKL to real-world problems from computational biology, showing that non-sparse MKL achieves accuracies that go beyond the state-of-the-art.
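One building block of ℓp-norm MKL formulations is a closed-form update for the kernel mixing weights. The sketch below assumes the per-block norms ||w_m||_2 are already available from an SVM solve (that inner solve is omitted here) and uses the proportionality θ_m ∝ ||w_m||_2^(2/(p+1)), normalized so ||θ||_p = 1; treat it as an illustration of the non-sparse mixing behavior for p > 1, not the authors' full interleaved algorithm.

```python
import numpy as np

def lp_mkl_weights(block_norms, p):
    """Closed-form kernel mixing weights on the l_p-norm ball (p > 1).

    theta_m ∝ ||w_m||_2^(2/(p+1)), normalized so that ||theta||_p = 1.
    As p grows the mixture approaches uniform; as p -> 1 it concentrates
    on the blocks with the largest norms (the sparse l1 regime).
    """
    norms = np.asarray(block_norms, dtype=float)
    theta = norms ** (2.0 / (p + 1.0))
    return theta / np.linalg.norm(theta, ord=p)

def combine_kernels(kernels, theta):
    """Weighted sum of base kernel matrices with mixing weights theta."""
    return sum(t * K for t, K in zip(theta, kernels))
```

With p > 1 every base kernel keeps a strictly positive weight whenever its block norm is nonzero, which is exactly the "robust kernel mixture" property the abstract argues for.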
5 0.54138184 33 nips-2009-Analysis of SVM with Indefinite Kernels
Author: Yiming Ying, Colin Campbell, Mark Girolami
Abstract: The recent introduction of indefinite SVM by Luss and d’Aspremont [15] has effectively demonstrated SVM classification with a non-positive semi-definite kernel (indefinite kernel). This paper studies the properties of the objective function introduced there. In particular, we show that the objective function is continuously differentiable and its gradient can be explicitly computed. Indeed, we further show that its gradient is Lipschitz continuous. The main idea behind our analysis is that the objective function is smoothed by the penalty term, in its saddle (min-max) representation, measuring the distance between the indefinite kernel matrix and the proxy positive semi-definite one. Our elementary result greatly facilitates the application of gradient-based algorithms. Based on our analysis, we further develop Nesterov’s smooth optimization approach [17, 18] for indefinite SVM which has an optimal convergence rate for smooth problems. Experiments on various benchmark datasets validate our analysis and demonstrate the efficiency of our proposed algorithms.
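The penalty term above measures the distance between the indefinite kernel matrix and a proxy positive semi-definite one. A common way to obtain such a PSD proxy (shown as a standalone illustration, not necessarily the exact construction of [15]) is spectral clipping, i.e. the Frobenius-norm projection onto the PSD cone:

```python
import numpy as np

def nearest_psd(K):
    """Project a symmetric (possibly indefinite) kernel matrix onto the
    PSD cone in Frobenius norm by clipping negative eigenvalues at zero."""
    K = (K + K.T) / 2.0                      # symmetrize against round-off
    vals, vecs = np.linalg.eigh(K)
    return (vecs * np.clip(vals, 0.0, None)) @ vecs.T
```

A matrix that is already PSD is left unchanged by this projection, which is why it serves as a natural reference point for the distance penalty in the saddle (min-max) representation.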
6 0.54030913 185 nips-2009-Orthogonal Matching Pursuit From Noisy Random Measurements: A New Analysis
7 0.50135708 1 nips-2009-$L_1$-Penalized Robust Estimation for a Class of Inverse Problems Arising in Multiview Geometry
8 0.49659216 179 nips-2009-On the Algorithmics and Applications of a Mixed-norm based Kernel Learning Formulation
10 0.47384867 223 nips-2009-Sparse Metric Learning via Smooth Optimization
11 0.46868634 180 nips-2009-On the Convergence of the Concave-Convex Procedure
12 0.44408691 147 nips-2009-Matrix Completion from Noisy Entries
13 0.44322598 76 nips-2009-Efficient Learning using Forward-Backward Splitting
14 0.43713215 245 nips-2009-Thresholding Procedures for High Dimensional Variable Selection and Statistical Estimation
15 0.43614617 191 nips-2009-Positive Semidefinite Metric Learning with Boosting
16 0.43129832 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models
17 0.42721257 92 nips-2009-Fast Graph Laplacian Regularized Kernel Learning via Semidefinite–Quadratic–Linear Programming
18 0.42714557 73 nips-2009-Dual Averaging Method for Regularized Stochastic Learning and Online Optimization
19 0.42518941 93 nips-2009-Fast Image Deconvolution using Hyper-Laplacian Priors
20 0.42450231 36 nips-2009-Asymptotic Analysis of MAP Estimation via the Replica Method and Compressed Sensing
topicId topicWeight
[(7, 0.02), (24, 0.037), (25, 0.075), (35, 0.032), (36, 0.143), (39, 0.041), (58, 0.116), (61, 0.012), (71, 0.033), (81, 0.048), (84, 0.248), (86, 0.086), (91, 0.012)]
simIndex simValue paperId paperTitle
1 0.90091914 153 nips-2009-Modeling Social Annotation Data with Content Relevance using a Topic Model
Author: Tomoharu Iwata, Takeshi Yamada, Naonori Ueda
Abstract: We propose a probabilistic topic model for analyzing and extracting contentrelated annotations from noisy annotated discrete data such as web pages stored in social bookmarking services. In these services, since users can attach annotations freely, some annotations do not describe the semantics of the content, thus they are noisy, i.e. not content-related. The extraction of content-related annotations can be used as a preprocessing step in machine learning tasks such as text classification and image recognition, or can improve information retrieval performance. The proposed model is a generative model for content and annotations, in which the annotations are assumed to originate either from topics that generated the content or from a general distribution unrelated to the content. We demonstrate the effectiveness of the proposed method by using synthetic data and real social annotation data for text and images.
2 0.86016959 74 nips-2009-Efficient Bregman Range Search
Author: Lawrence Cayton
Abstract: We develop an algorithm for efficient range search when the notion of dissimilarity is given by a Bregman divergence. The range search task is to return all points in a potentially large database that are within some specified distance of a query. It arises in many learning algorithms such as locally-weighted regression, kernel density estimation, neighborhood graph-based algorithms, and in tasks like outlier detection and information retrieval. In metric spaces, efficient range search-like algorithms based on spatial data structures have been deployed on a variety of statistical tasks. Here we describe an algorithm for range search for an arbitrary Bregman divergence. This broad class of dissimilarity measures includes the relative entropy, Mahalanobis distance, Itakura-Saito divergence, and a variety of matrix divergences. Metric methods cannot be directly applied since Bregman divergences do not in general satisfy the triangle inequality. We derive geometric properties of Bregman divergences that yield an efficient algorithm for range search based on a recently proposed space decomposition for Bregman divergences.
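For readers unfamiliar with the class of dissimilarities involved: the Bregman divergence generated by a strictly convex function f is d_f(x, y) = f(x) − f(y) − ⟨∇f(y), x − y⟩. The sketch below instantiates two members the abstract names (squared Euclidean distance and relative entropy); the example vectors are illustrative.

```python
import numpy as np

def bregman(f, grad_f, x, y):
    """Bregman divergence d_f(x, y) = f(x) - f(y) - <grad f(y), x - y>."""
    return f(x) - f(y) - np.dot(grad_f(y), x - y)

# Squared Euclidean distance: f(x) = ||x||^2  ->  d_f(x, y) = ||x - y||^2
sq = lambda x: np.dot(x, x)
sq_grad = lambda x: 2 * x

# Relative entropy (generalized KL): f(x) = sum_i x_i log x_i - x_i, x > 0
kl = lambda x: np.sum(x * np.log(x) - x)
kl_grad = lambda x: np.log(x)

x = np.array([0.2, 0.3, 0.5])
y = np.array([0.3, 0.3, 0.4])
```

Note the asymmetry of the KL case (d_f(x, y) ≠ d_f(y, x) in general), which is precisely why metric tree methods relying on the triangle inequality cannot be applied directly.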
same-paper 3 0.81291533 79 nips-2009-Efficient Recovery of Jointly Sparse Vectors
Author: Liang Sun, Jun Liu, Jianhui Chen, Jieping Ye
Abstract: We consider the reconstruction of sparse signals in the multiple measurement vector (MMV) model, in which the signal, represented as a matrix, consists of a set of jointly sparse vectors. MMV is an extension of the single measurement vector (SMV) model employed in standard compressive sensing (CS). Recent theoretical studies focus on the convex relaxation of the MMV problem based on the (2, 1)-norm minimization, which is an extension of the well-known ℓ1-norm minimization employed in SMV. However, the resulting convex optimization problem in MMV is significantly more difficult to solve than the one in SMV. Existing algorithms reformulate it as a second-order cone programming (SOCP) or semidefinite programming (SDP) problem, which is computationally expensive to solve for problems of moderate size. In this paper, we propose a new (dual) reformulation of the convex optimization problem in MMV and develop an efficient algorithm based on the prox-method. Interestingly, our theoretical analysis reveals the close connection between the proposed reformulation and multiple kernel learning. Our simulation studies demonstrate the scalability of the proposed algorithm.
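To make the (2, 1)-norm concrete: for a matrix X whose columns are the jointly sparse signals, ||X||_{2,1} is the sum of the ℓ2 norms of the rows, so penalizing it drives entire rows to zero and forces the columns to share a common support. The sketch below shows the norm and its proximal operator (row-wise group soft-thresholding); this illustrates the penalty only, not the paper's dual prox-method.

```python
import numpy as np

def norm_21(X):
    """(2,1)-norm: sum of the l2 norms of the rows of X.

    Penalizing ||X||_{2,1} zeroes out whole rows, so the columns of X
    share a joint sparsity pattern -- the MMV analogue of the l1-norm
    penalty used in single measurement vector recovery.
    """
    return np.sum(np.linalg.norm(X, axis=1))

def prox_21(X, lam):
    """Proximal operator of lam * ||.||_{2,1}: row-wise group soft-thresholding."""
    row_norms = np.linalg.norm(X, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(row_norms, 1e-12))
    return scale * X
```

Rows whose ℓ2 norm falls below the threshold lam are set exactly to zero, while larger rows are uniformly shrunk; this is the mechanism by which the convex relaxation recovers a jointly sparse support.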
4 0.80393648 90 nips-2009-Factor Modeling for Advertisement Targeting
Author: Ye Chen, Michael Kapralov, John Canny, Dmitry Y. Pavlov
Abstract: We adapt a probabilistic latent variable model, namely GaP (Gamma-Poisson) [6], to ad targeting in the contexts of sponsored search (SS) and behaviorally targeted (BT) display advertising. We also approach the important problem of ad positional bias by formulating a one-latent-dimension GaP factorization. Learning from click-through data is intrinsically large scale, even more so for ads. We scale up the algorithm to terabytes of real-world SS and BT data that contains hundreds of millions of users and hundreds of thousands of features, by leveraging the scalability characteristics of the algorithm and the inherent structure of the problem including data sparsity and locality. Specifically, we demonstrate two somewhat orthogonal philosophies of scaling algorithms to large-scale problems, through the SS and BT implementations, respectively. Finally, we report the experimental results using Yahoo's vast datasets, and show that our approach substantially outperforms the state-of-the-art methods in prediction accuracy. For BT in particular, the ROC area achieved by GaP is exceeding 0.95, while one prior approach using Poisson regression [11] yielded 0.83. For computational performance, we compare a single-node sparse implementation with a parallel implementation using Hadoop MapReduce, the results are counterintuitive yet quite interesting. We therefore provide insights into the underlying principles of large-scale learning.
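A Gamma-Poisson factorization models a counts matrix V as Poisson(WH) with Gamma priors on the factors. As a minimal related sketch (the prior-free maximum-likelihood limit, i.e. KL-divergence NMF with multiplicative updates, not the paper's full GaP inference or its MapReduce scaling), one can fit V ≈ WH as follows:

```python
import numpy as np

def poisson_nmf(V, k, n_iter=200, seed=0):
    """Multiplicative updates for V ≈ W @ H under the Poisson likelihood.

    Each sweep applies the standard KL-NMF updates, which never increase
    the generalized KL divergence D(V || WH)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    eps = 1e-12
    W = rng.random((m, k)) + 0.1
    H = rng.random((k, n)) + 0.1
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

def kl_div(V, WH):
    """Generalized KL divergence D(V || WH), the Poisson NLL up to constants."""
    eps = 1e-12
    return np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH)
```

The updates keep W and H nonnegative by construction, which is what makes the factorization interpretable as latent "topics" over users and features.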
5 0.66191477 17 nips-2009-A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds
Author: Paris Smaragdis, Madhusudana Shashanka, Bhiksha Raj
Abstract: In this paper we present an algorithm for separating mixed sounds from a monophonic recording. Our approach makes use of training data which allows us to learn representations of the types of sounds that compose the mixture. In contrast to popular methods that attempt to extract compact generalizable models for each sound from training data, we employ the training data itself as a representation of the sources in the mixture. We show that mixtures of known sounds can be described as sparse combinations of the training data itself, and in doing so produce significantly better separation results as compared to similar systems based on compact statistical models. Keywords: Example-Based Representation, Signal Separation, Sparse Models.
6 0.66017604 104 nips-2009-Group Sparse Coding
7 0.65837705 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms
8 0.65763384 254 nips-2009-Variational Gaussian-process factor analysis for modeling spatio-temporal data
9 0.65634358 202 nips-2009-Regularized Distance Metric Learning:Theory and Algorithm
10 0.65338337 191 nips-2009-Positive Semidefinite Metric Learning with Boosting
11 0.65266353 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling
12 0.6522516 252 nips-2009-Unsupervised Feature Selection for the $k$-means Clustering Problem
13 0.65187955 80 nips-2009-Efficient and Accurate Lp-Norm Multiple Kernel Learning
14 0.65176952 70 nips-2009-Discriminative Network Models of Schizophrenia
15 0.65101689 128 nips-2009-Learning Non-Linear Combinations of Kernels
16 0.65050983 157 nips-2009-Multi-Label Prediction via Compressed Sensing
17 0.65024757 169 nips-2009-Nonlinear Learning using Local Coordinate Coding
18 0.64933306 76 nips-2009-Efficient Learning using Forward-Backward Splitting
19 0.64736134 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA
20 0.64732993 38 nips-2009-Augmenting Feature-driven fMRI Analyses: Semi-supervised learning and resting state activity