jmlr jmlr2012 jmlr2012-52 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Karthik Mohan, Maryam Fazel
Abstract: The problem of minimizing the rank of a matrix subject to affine constraints has applications in several areas including machine learning, and is known to be NP-hard. A tractable relaxation for this problem is nuclear norm (or trace norm) minimization, which is guaranteed to find the minimum rank matrix under suitable assumptions. In this paper, we propose a family of Iterative Reweighted Least Squares algorithms IRLS-p (with 0 ≤ p ≤ 1), as a computationally efficient way to improve over the performance of nuclear norm minimization. The algorithms can be viewed as (locally) minimizing certain smooth approximations to the rank function. When p = 1, we give theoretical guarantees similar to those for nuclear norm minimization, that is, recovery of low-rank matrices under certain assumptions on the operator defining the constraints. For p < 1, IRLS-p shows better empirical performance in terms of recovering low-rank matrices than nuclear norm minimization. We provide an efficient implementation for IRLS-p, and also present a related family of algorithms, sIRLS-p. These algorithms exhibit competitive run times and improved recovery when compared to existing algorithms for random instances of the matrix completion problem, as well as on the MovieLens movie recommendation data set. Keywords: matrix rank minimization, matrix completion, iterative algorithms, null-space property
Reference: text
sentIndex sentText sentNum sentScore
1 Department of Electrical Engineering, University of Washington, Seattle, WA 98195-4322, USA. Editor: Tong Zhang. Abstract: The problem of minimizing the rank of a matrix subject to affine constraints has applications in several areas including machine learning, and is known to be NP-hard. [sent-3, score-0.249]
2 A tractable relaxation for this problem is nuclear norm (or trace norm) minimization, which is guaranteed to find the minimum rank matrix under suitable assumptions. [sent-4, score-0.387]
3 In this paper, we propose a family of Iterative Reweighted Least Squares algorithms IRLS-p (with 0 ≤ p ≤ 1), as a computationally efficient way to improve over the performance of nuclear norm minimization. [sent-5, score-0.224]
4 When p = 1, we give theoretical guarantees similar to those for nuclear norm minimization, that is, recovery of low-rank matrices under certain assumptions on the operator defining the constraints. [sent-7, score-0.46]
5 For p < 1, IRLS-p shows better empirical performance in terms of recovering low-rank matrices than nuclear norm minimization. [sent-8, score-0.266]
6 These algorithms exhibit competitive run times and improved recovery when compared to existing algorithms for random instances of the matrix completion problem, as well as on the MovieLens movie recommendation data set. [sent-10, score-0.462]
7 Keywords: matrix rank minimization, matrix completion, iterative algorithms, null-space property. [sent-11, score-0.275]
8 This is closely related to the cardinality minimization or sparse vector recovery problem: minimize card(x) subject to Ax = b, where x ∈ R^n, A ∈ R^(m×n), and card(x) denotes the number of nonzero entries of x. [sent-26, score-0.367]
9 It has been shown in earlier work (2006) that by appropriately weighting the ℓ1 norm and iteratively updating the weights, the recovery performance of the algorithm is enhanced. [sent-37, score-0.318]
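For context, the reweighted ℓ1 iteration referred to here is commonly written in the following form; this is the standard formulation from the reweighted ℓ1 literature and is included only as a reminder, with ε > 0 a small stabilizing constant (the cited work may differ in such details as the exact weight update):

```latex
\[
x^{k+1} = \arg\min_{x}\ \sum_{i} w_i^{k}\,|x_i| \quad \text{subject to } Ax = b,
\qquad
w_i^{k+1} = \frac{1}{|x_i^{k+1}| + \epsilon}.
\]
```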
10 It is shown theoretically that for sparse recovery from noisy measurements, this algorithm has a better recovery error than ℓ1 minimization under suitable assumptions on the matrix A (Needell, 2009; Zhang, 2010). [sent-38, score-0.598]
11 The Reweighted ℓ1 algorithm has also been generalized to the recovery of low-rank matrices (Fazel et al.). [sent-39, score-0.256]
12 Another simple and computationally efficient reweighted algorithm for sparse recovery proposed in the literature is the iterative reweighted least squares algorithm, IRLS-p (Daubechies et al.). [sent-41, score-0.391]
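To make the vector version concrete, here is a minimal sketch of an IRLS-p style iteration for sparse recovery, assuming a dense A with full row rank and small problem sizes; the function name, parameter choices, and stopping rule are illustrative assumptions, not taken from the cited work.

```python
import numpy as np

def irls_p_sparse(A, b, p=1.0, eps=1.0, eps_min=1e-8, max_iter=100):
    """Sketch of IRLS-p for: minimize card(x) subject to Ax = b (approximately).

    Each iteration solves the weighted least squares subproblem
    minimize sum_i w_i x_i^2 subject to Ax = b in closed form,
    then updates the weights from the new iterate.
    """
    x = np.linalg.lstsq(A, b, rcond=None)[0]        # a feasible starting point
    for _ in range(max_iter):
        w = (x**2 + eps) ** (p / 2.0 - 1.0)         # large weights on small entries
        D = np.diag(1.0 / w)
        x_new = D @ A.T @ np.linalg.solve(A @ D @ A.T, b)
        if np.linalg.norm(x_new - x) < 1e-10:
            return x_new
        x = x_new
        eps = max(0.1 * eps, eps_min)               # gradually reduce the smoothing
    return x
```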
13 For p = 1, a theoretical guarantee for sparse recovery has been given for IRLS-p (Daubechies et al.). [sent-44, score-0.231]
14 It was empirically observed (Chartrand and Staneva, 2008; Chartrand and Yin, 2008) that IRLS-p shows a better recovery performance than ℓ1 minimization for p < 1 and has a similar performance as the reweighted ℓ1 algorithm when p is set to zero. [sent-48, score-0.451]
15 Iterative Reweighted Least Squares for ARMP. Towards answering this question, we propose the iterative reweighted least squares algorithm for rank minimization, outlined below. [sent-51, score-0.362]
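The outline of the IRLS-p iteration for the affine rank minimization problem, as I read it from this paper (paraphrased; W^0 = I and γ^k is a decreasing regularization sequence), is:

```latex
\[
X^{k+1} = \arg\min_{X}\ \operatorname{Tr}\!\big(W^{k} X^{T} X\big)
          \quad \text{subject to } \mathcal{A}(X) = b,
\qquad
W^{k+1} = \big((X^{k+1})^{T} X^{k+1} + \gamma^{k+1} I\big)^{\frac{p}{2}-1}.
\]
```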
16 Usually a reweighted algorithm trades off computational time for improved recovery performance when compared to the unweighted convex heuristic. [sent-55, score-0.433]
17 As an example, consider the reweighted ℓ1 algorithm for sparse recovery (Candes et al., 2008). [sent-56, score-0.391]
18 This algorithm and the reweighted nuclear norm algorithm (Mohan and Fazel, 2010a) for matrix rank minimization solve the corresponding standard convex relaxations (ℓ1 and nuclear norm minimization, respectively) in their first iteration. [sent-57, score-0.893]
19 However, the iterates of the IRLS-p family of algorithms simply minimize a weighted Frobenius norm, and have run times comparable with the nuclear norm heuristic, while (for p < 1) also enjoying improved recovery performance. [sent-59, score-0.561]
20 In the p = 1 case, we show that the algorithm minimizes a certain smooth approximation to the nuclear norm, allowing for efficient implementations while having theoretical guarantees similar to nuclear norm minimization. [sent-60, score-0.418]
21 The results exploit the fact that these algorithms can be derived from the KKT conditions for minimization problems whose objectives are suitable smooth approximations to the rank function. [sent-65, score-0.23]
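The smooth approximation to the rank function referred to here is, as far as I can tell from the surrounding text, the smooth Schatten-p function (with smoothing parameter γ > 0 and σ_i(X) the singular values of X):

```latex
\[
f_{p}(X) \;=\; \operatorname{Tr}\big(X^{T}X + \gamma I\big)^{p/2}
        \;=\; \sum_{i}\big(\sigma_i^{2}(X) + \gamma\big)^{p/2}.
\]
```

For p = 1 and γ → 0 this reduces to the nuclear norm, while for small p and small γ it behaves like a smooth surrogate for rank(X).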
22 Numerical experiments demonstrate that IRLS-0 and sIRLS-0 applied to the matrix completion problem have a better recovery performance than the Singular Value Thresholding algorithm, an implementation of nuclear norm minimization (Cai et al.). [sent-75, score-0.42]
23 Many approaches have been proposed for recovery of sparse vectors from linear measurements, including ℓ1 minimization and greedy algorithms (see, e.g., the cited literature). [sent-80, score-0.291]
24 As mentioned earlier, reweighted algorithms including iterative reweighted ℓ1 (Candes et al., 2008) [sent-83, score-0.32]
25 and IRLS-p (2010) with 0 < p ≤ 1 have been proposed to improve on the recovery performance of ℓ1 minimization. [sent-85, score-0.231]
26 For the ARMP, analogous algorithms have been proposed, including nuclear norm minimization (Fazel et al., 2001), [sent-86, score-0.264]
27 reweighted nuclear norm minimization (Mohan and Fazel, 2010a; Fazel et al.), [sent-87, score-0.424]
28 Developing efficient implementations for nuclear norm minimization is an important research area since standard semidefinite programming solvers cannot handle large problem sizes. [sent-90, score-0.288]
29 In a preliminary conference version of this paper (Mohan and Fazel, 2010b), we proposed the IRLS-p family of algorithms for ARMP, analogous to IRLS for sparse vector recovery (Daubechies et al.). [sent-95, score-0.251]
30 However, their analysis of low-rank recovery uses a Strong Null Space Property (SRNSP) (which is equivalent to the condition in Mohan and Fazel, 2010b) and is less general than the condition we consider in this paper, as discussed in Section 3. [sent-101, score-0.231]
31 Also, the authors have only considered recovery of matrices of size 500 × 500 in their experiments. [sent-102, score-0.256]
32 We present extensive comparisons of our algorithms with algorithms including Optspace, FPCA, and IHT for recovery of matrices of different sizes. [sent-103, score-0.256]
33 Recall that replacing the rank function in (1) by ‖X‖⋆ yields the nuclear norm heuristic: minimize ‖X‖⋆ subject to A(X) = b. [sent-121, score-0.411]
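As a concrete illustration of this heuristic for the matrix completion special case, the following is a hedged sketch using the cvxpy modeling package; it assumes cvxpy with a conic solver (such as the bundled SCS) is available, is only practical for small matrices, and is not the implementation used in the paper.

```python
import cvxpy as cp

def nuclear_norm_completion(M, observed):
    """M: data matrix; observed: iterable of (i, j) pairs of known entries.

    Solves: minimize ||X||_* subject to X_ij = M_ij for (i, j) in observed.
    """
    X = cp.Variable(M.shape)
    constraints = [X[i, j] == M[i, j] for (i, j) in observed]
    problem = cp.Problem(cp.Minimize(cp.normNuc(X)), constraints)
    problem.solve()
    return X.value
```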
34 We show that IRLS-1 solves the smooth Schatten-1 norm or nuclear norm minimization problem, that is, finds a globally optimal solution to (3) with p = 1. [sent-137, score-0.356]
35 We also give a matrix recovery guarantee for IRLS-1, under suitable assumptions on the null space of the linear operator A . [sent-157, score-0.331]
36 Convergence of IRLS-p. We show that the difference between successive iterates of the IRLS-p (0 ≤ p ≤ 1) algorithm converges to zero and that every cluster point of the iterates is a stationary point of (3). [sent-159, score-0.228]
37 By definition, X^(n_i+1) is the minimizer of: minimize Tr(W^(n_i) X^T X) subject to A(X) = b. (7) [sent-179, score-0.286]
38 Thus, X^(n_i+1) satisfies the KKT conditions of (7), that is, X^(n_i+1) W^(n_i) ∈ Ran(A*) and A(X^(n_i+1)) = b, where Ran(A*) denotes the range space of A*. [sent-180, score-0.42]
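For small problems this weighted Frobenius subproblem can be solved exactly, which also makes the KKT condition above easy to check numerically. Below is a hedged sketch using a dense matrix representation of the operator A acting on vec(X) (column-major); the function name and the dense Kronecker construction are illustrative assumptions, not the paper's implementation, which avoids forming these matrices.

```python
import numpy as np

def weighted_frobenius_subproblem(A, b, W, m, n):
    """Solve: minimize Tr(W X^T X) subject to A vec(X) = b (dense, small sizes).

    A : (q, m*n) array acting on vec(X) with column-major stacking
    W : (n, n) symmetric positive definite weight matrix
    """
    Q = np.kron(W, np.eye(m))              # Tr(W X^T X) = vec(X)^T Q vec(X)
    Qinv_At = np.linalg.solve(Q, A.T)
    lam = np.linalg.solve(A @ Qinv_At, b)  # Lagrange multipliers
    x = Qinv_At @ lam                      # stationarity: Q x = A^T lam
    X = x.reshape((m, n), order="F")
    # KKT check: vec(X W) = Q x lies in the range of A^T, i.e., X W is in Ran(A*).
    assert np.allclose(Q @ x, A.T @ lam, atol=1e-8)
    return X
```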
39 Performance Guarantee for IRLS-1. In this section, we discuss necessary and sufficient conditions for low-rank recovery using IRLS-1. [sent-190, score-0.231]
40 This condition requires that every nonzero matrix in the null space of A has a rank larger than 2r. [sent-221, score-0.231]
41 Note that the necessary and sufficient conditions for recovery of approximately low-rank matrices using IRLS-1 mirror those for recovery of approximately low-rank matrices using nuclear norm minimization, namely that A satisfies 1-NSP. [sent-255, score-0.776]
42 The condition proposed by Fornasier et al. (2010), shown to be sufficient for low-rank recovery using IRLSM, is equivalent to 1-NSP of order 2r. [sent-259, score-0.231]
43 In this paper, we show that τ-NSP of order r (as contrasted with the order 2r NSP) is both necessary and sufficient for recovery of approximately low-rank matrices using IRLS-1. [sent-260, score-0.256]
44 Thus, our condition for the recovery of approximately low-rank matrices using IRLS-1 generalizes those stated in previous literature (Mohan and Fazel, 2010b; Fornasier et al.). [sent-261, score-0.256]
45 The Matrix Completion Problem. The matrix completion problem is a special case of the affine rank minimization problem, with constraints that restrict some of the entries of the matrix variable X to equal given values. [sent-266, score-0.412]
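For this special case, the IRLS-p iteration can be carried out with a gradient projection step on the weighted Frobenius objective, with the projection simply resetting the observed entries. The sketch below shows such an (s)IRLS-p style loop; the step-size rule, stopping test, and γ schedule are my own illustrative choices, not the tuned values reported in the paper.

```python
import numpy as np

def sirls_completion(M_obs, mask, p=0.0, gamma=1e-2, gamma_decay=0.9,
                     max_iter=1000, tol=1e-5):
    """Sketch of an (s)IRLS-p style iteration for matrix completion.

    M_obs: matrix with observed entries (zeros elsewhere); mask: boolean Omega.
    """
    X = M_obs.copy()
    for _ in range(max_iter):
        # Weight matrix W = (X^T X + gamma I)^(p/2 - 1); only the row-space
        # part of W matters for the gradient X W, since X vanishes on the rest.
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        w = (s**2 + gamma) ** (p / 2.0 - 1.0)
        W = (Vt.T * w) @ Vt
        step = 1.0 / (2.0 * w.max())            # conservative step for grad = 2 X W
        X_new = X - step * 2.0 * (X @ W)        # gradient step on Tr(W X^T X)
        X_new[mask] = M_obs[mask]               # project onto the completion constraints
        if np.linalg.norm(X_new - X, "fro") < tol * max(1.0, np.linalg.norm(X, "fro")):
            return X_new
        X = X_new
        gamma *= gamma_decay                    # decrease the regularization
    return X
```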
46 To work around this, we observe that the singular values of subsequent iterates of IRLS cluster into two distinct groups, so a low rank approximation of the iterates (obtained by setting the smaller set of singular values to zero) can be used to compute the weighting matrix efficiently. [sent-280, score-0.51]
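The efficiency trick described here can replace the full SVD in the sketch above: compute only a rank-r factorization of the iterate and treat the discarded directions uniformly. The helper below is an assumed illustration of that idea (in practice a truncated or randomized SVD routine would be used instead of a full SVD).

```python
import numpy as np

def weight_from_lowrank_iterate(X, r, gamma, p=0.0):
    """W = (X_r^T X_r + gamma I)^(p/2 - 1), with X_r a rank-r approximation of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)   # stand-in for a truncated SVD
    Vr, sr = Vt[:r].T, s[:r]
    base = gamma ** (p / 2.0 - 1.0)                    # weight on the discarded directions
    w = (sr**2 + gamma) ** (p / 2.0 - 1.0) - base
    return base * np.eye(X.shape[1]) + (Vr * w) @ Vr.T
```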
47 Small values of γc (< 10^−3) do not give good recovery results (premature convergence to a larger relative error). [sent-337, score-0.231]
48 We define the recovery to be successful when the relative error ‖X − X0‖_F / ‖X0‖_F ≤ 10^−3 (with X being the output of the algorithm considered), and unsuccessful otherwise. [sent-363, score-0.503]
49 Comparison of (s)IRLS and Nuclear Norm Minimization. In this section, we compare the gradient projection implementation IRLS-GP (of IRLS-0 and IRLS-1) and the algorithms sIRLS-0 and sIRLS-1 with the Singular Value Thresholding (SVT) algorithm, an implementation of nuclear norm minimization (Cai et al.). [sent-396, score-0.368]
50 Note that SVT is not the only implementation of nuclear norm minimization. [sent-398, score-0.224]
[Table residue: per-problem iteration counts, number of successful recoveries (NS), and run times for IRLS-1, sIRLS-1, IRLS-0, and sIRLS-0 in the comparison with SVT; see the corresponding tables in the paper.]
54 This indicates that (s)IRLS-p with p = 0 has better recovery performance and lower computational time compared to p = 1. [sent-536, score-0.251]
55 There are also certain instances where SVT fails to have successful recovery while sIRLS-1 succeeds. [sent-546, score-0.272]
56 We also found that SVT was not successful in recovery for any of the hard problems. [sent-551, score-0.312]
57 (s)IRLS-0 also compares favorably with FPCA (Goldfarb and Ma, 2011) and Optspace (Keshavan and Oh, 2009) in terms of recovery and computational time on the easy and hard problem sets. [sent-552, score-0.291]
58 In summary, (s)IRLS-1,(s)IRLS-0 have a better recovery performance than a nuclear norm minimization implementation (SVT) as evidenced by successful recovery in both easy and hard problem sets. [sent-554, score-0.827]
59 We note that (s)IRLS-1 converges to the nuclear norm minimizer (as the regularization γ → 0) and empirically has a better recovery performance than SVT. [sent-555, score-0.231]
60 We also note that among the family of (s)IRLS-p algorithms tested, sIRLS-0 and IRLS-0 are better in both recovery performance and computational times. [sent-556, score-0.251]
61 We also compare our algorithms with IHT, Optspace, FPCA, and IRLSM (2010) over easy and hard problems under the assumption that the rank of the matrix to be recovered (X0) is known. [sent-618, score-0.245]
62 When the rank of X0 (denoted as r) is known, the weighting matrix W^k for sIRLS is computed using a rank-r approximation of X^k (see also Section 3). [sent-621, score-0.348]
[Table residue: per-problem iteration counts, number of successful recoveries (NS), and run times for sIRLS-0, sIRLS, IRLSM, IHT, and Optspace on the easy and hard problem sets with known rank; see the comparison tables (e.g., Table 7) in the paper.]
71 For hard problems, however, sIRLS has a clear advantage over IHT, Optspace and FPCA in successful recovery (Table 7). [sent-768, score-0.312]
72 On the other hand, IHT, Optspace, and FPCA have partial or unsuccessful recovery in many problems. [sent-770, score-0.231]
73 We consider the following noisy matrix completion problem: minimize rank(X) subject to P_Ω(X) = P_Ω(B), where P_Ω(B) = P_Ω(X0) + P_Ω(Z), X0 is a low-rank matrix of rank r that we wish to recover, and P_Ω(Z) is the measurement noise. [sent-775, score-0.583]
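For reference, a random noisy test instance of this kind can be generated as in the sketch below; the sampling scheme and noise model are generic assumptions meant to mirror the description above, not the exact experimental setup of Table 8.

```python
import numpy as np

def make_noisy_completion_instance(m, n, r, sample_rate, noise_sigma, seed=0):
    """Return (X0, M_obs, mask): true low-rank matrix, observed noisy entries, and Omega."""
    rng = np.random.default_rng(seed)
    X0 = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank-r target
    mask = rng.random((m, n)) < sample_rate                          # observed set Omega
    B = X0 + noise_sigma * rng.standard_normal((m, n))               # noisy measurements
    M_obs = np.where(mask, B, 0.0)                                   # P_Omega(B)
    return X0, M_obs, mask
```

Recovery can then be judged with the relative error criterion stated next.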
74 For all the algorithms tested in Table 8, we declare the recovery to be successful if ‖X − X0‖_F / ‖X0‖_F ≤ ε = 10^−3, where X is the output of the algorithm considered. [sent-785, score-0.272]
75 Table 8 shows that sIRLS has successful recovery for easy noisy matrix completion problems with a priori knowledge of rank. [sent-786, score-0.489]
[Table 8 residue: per-problem iteration counts, number of successful recoveries (NS), and run times for the noisy matrix completion experiments.]
77 Thus estimating the remaining ratings in the matrix corresponds to a matrix completion problem. [sent-850, score-0.262]
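In code, forming the partially observed ratings matrix and a standard evaluation metric might look like the sketch below; the helper names, the (user, item, rating) triple format, and the NMAE normalization by the rating range are assumptions made for illustration rather than details taken from the paper.

```python
import numpy as np

def ratings_to_matrix(triples, n_users, n_items):
    """Build the ratings matrix and observation mask from (user, item, rating) triples."""
    M = np.zeros((n_users, n_items))
    mask = np.zeros((n_users, n_items), dtype=bool)
    for u, i, rating in triples:
        M[u, i] = rating
        mask[u, i] = True
    return M, mask

def nmae(pred, truth, test_mask, r_min=1.0, r_max=5.0):
    """Normalized mean absolute error over held-out ratings (assumed 1-5 rating scale)."""
    err = np.abs(pred[test_mask] - truth[test_mask]).mean()
    return err / (r_max - r_min)
```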
78 We set the rank of the unknown ratings matrix equal to 5 while running all three algorithms. [sent-857, score-0.224]
79 We showed that IRLS-1 converges to the global minimum of the smoothed nuclear norm objective, and that IRLS-p with p < 1 converges to a stationary point of the corresponding non-convex yet smooth approximation to the rank function. [sent-881, score-0.38]
80 We gave a matrix recovery guarantee for IRLS-1, showing that it approximately recovers the true low-rank solution (within a small error depending on the algorithm parameter γ), if the null space of the affine measurement map satisfies certain properties. [sent-882, score-0.331]
81 We then focused on the matrix completion problem, a special case of affine rank minimization arising in collaborative filtering among other applications, and presented efficient implementations specialized to this problem. [sent-884, score-0.384]
82 Our first set of numerical experiments show that (s)IRLS-0 has a better recovery performance than nuclear norm minimization via SVT. [sent-887, score-0.495]
83 We show that sIRLS-0 has a good recovery performance even when noise is present. [sent-888, score-0.231]
84 Our second set of experiments demonstrate that sIRLS-0 compares favorably in terms of performance and run time with IHT, Optspace, and IRLSM when the rank of the low rank matrix to be recovered is known. [sent-889, score-0.356]
85 1 Future Directions Low-rank recovery problems have recently been pursued in machine learning motivated by applications including collaborative filtering. [sent-892, score-0.231]
86 Iterative reweighted algorithms for low-rank matrix recovery have empirically exhibited improved performance compared to unweighted convex relaxations. [sent-893, score-0.465]
87 The convex relaxation proposed for this problem minimizes a combination of nuclear norm and ℓ1 norm. [sent-902, score-0.226]
88 The simple Null Space Property used here, being based only on the singular values of elements in the null space, makes the connection between the associated vector and matrix recovery proofs clear and transparent, and may be of independent interest (see Oymak et al.). [sent-911, score-0.393]
89 Now, argmin_{Z: A(Z)=b} J_p(Z, W, γ) = argmin_{Z: A(Z)=b} Tr(W(Z^T Z + γI)) = argmin_{Z: A(Z)=b} Tr((X^T X + γI)^(p/2−1)(Z^T Z + γI)) = argmin_{Z: A(Z)=b} Tr(W*(Z^T Z + γI)) = argmin_{Z: A(Z)=b} F_p(Z, (W*)^(p/(p−2)), γ), where W* denotes the minimizer over W ≻ 0. [sent-947, score-0.25]
90 However, since lim J_p(X^i, W^i, γ_i) = lim J_p(X^(n_i), W^(n_i), γ_(n_i)) = lim J_p(X^(n_i+1), W^(n_i+1), γ_(n_i+1)), we have a contradiction. [sent-975, score-0.297]
91 A study of convex regularizers for sparse recovery and feature selection. [sent-993, score-0.253]
92 Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. [sent-1121, score-0.263]
93 Low-rank matrix recovery via iteratively reweighted least squares minimization. [sent-1128, score-0.474]
94 Gradient descent with sparsification: An iterative algorithm for sparse recovery with restricted isometry property. [sent-1142, score-0.271]
95 Reweighted nuclear norm minimization with application to system identification. [sent-1270, score-0.264]
96 CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. [sent-1288, score-0.231]
97 New null space results and recovery thresholds for matrix rank minimization. [sent-1293, score-0.462]
98 A simplified approach to recovery conditions for low rank matrices. [sent-1303, score-0.362]
99 Face recovery in conference video streaming using robust principal component analysis. [sent-1325, score-0.231]
100 An accelerated proximal gradient algorithm for nuclear norm regularized least squares problem. [sent-1335, score-0.268]
wordName wordTfidf (topN-words)
[('optspace', 0.353), ('iht', 0.335), ('sirls', 0.335), ('recovery', 0.231), ('iter', 0.229), ('fpca', 0.214), ('fazel', 0.203), ('irlsm', 0.167), ('reweighted', 0.16), ('nuclear', 0.151), ('eweighted', 0.149), ('ohan', 0.149), ('terative', 0.149), ('ank', 0.143), ('rank', 0.131), ('irls', 0.122), ('mohan', 0.121), ('completion', 0.117), ('inimization', 0.115), ('ni', 0.105), ('tr', 0.102), ('ns', 0.093), ('svt', 0.093), ('lgorithms', 0.088), ('atrix', 0.088), ('fornasier', 0.087), ('fr', 0.079), ('candes', 0.076), ('oymak', 0.074), ('trw', 0.074), ('goldfarb', 0.072), ('iterates', 0.071), ('wp', 0.066), ('keshavan', 0.064), ('singular', 0.062), ('minimization', 0.06), ('stationary', 0.059), ('armp', 0.056), ('nmae', 0.056), ('norm', 0.053), ('svd', 0.053), ('matrix', 0.052), ('argmin', 0.05), ('null', 0.048), ('chandrasekaran', 0.048), ('kkt', 0.043), ('successful', 0.041), ('ratings', 0.041), ('movie', 0.041), ('subject', 0.041), ('iterative', 0.04), ('hard', 0.04), ('smooth', 0.039), ('nsp', 0.037), ('xtemp', 0.037), ('recovering', 0.037), ('minimize', 0.035), ('weighting', 0.034), ('af', 0.033), ('cai', 0.033), ('daubechies', 0.033), ('gradient', 0.033), ('chartrand', 0.032), ('projection', 0.031), ('squares', 0.031), ('sk', 0.03), ('lim', 0.029), ('admira', 0.028), ('cosamp', 0.028), ('hassibi', 0.028), ('lens', 0.028), ('needell', 0.028), ('parrilo', 0.028), ('toh', 0.028), ('cluster', 0.027), ('alternately', 0.026), ('spectral', 0.025), ('minimizing', 0.025), ('oh', 0.025), ('matrices', 0.025), ('implementations', 0.024), ('noisy', 0.024), ('apriori', 0.024), ('meka', 0.024), ('xr', 0.024), ('converged', 0.023), ('convex', 0.022), ('recovered', 0.022), ('zr', 0.021), ('halko', 0.021), ('thresholding', 0.021), ('iteration', 0.021), ('competitive', 0.021), ('family', 0.02), ('time', 0.02), ('horn', 0.02), ('heuristic', 0.02), ('compressive', 0.02), ('mazumder', 0.02), ('implementation', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 52 jmlr-2012-Iterative Reweighted Algorithms for Matrix Rank Minimization
Author: Karthik Mohan, Maryam Fazel
Abstract: The problem of minimizing the rank of a matrix subject to affine constraints has applications in several areas including machine learning, and is known to be NP-hard. A tractable relaxation for this problem is nuclear norm (or trace norm) minimization, which is guaranteed to find the minimum rank matrix under suitable assumptions. In this paper, we propose a family of Iterative Reweighted Least Squares algorithms IRLS-p (with 0 ≤ p ≤ 1), as a computationally efficient way to improve over the performance of nuclear norm minimization. The algorithms can be viewed as (locally) minimizing certain smooth approximations to the rank function. When p = 1, we give theoretical guarantees similar to those for nuclear norm minimization, that is, recovery of low-rank matrices under certain assumptions on the operator defining the constraints. For p < 1, IRLSp shows better empirical performance in terms of recovering low-rank matrices than nuclear norm minimization. We provide an efficient implementation for IRLS-p, and also present a related family of algorithms, sIRLS-p. These algorithms exhibit competitive run times and improved recovery when compared to existing algorithms for random instances of the matrix completion problem, as well as on the MovieLens movie recommendation data set. Keywords: matrix rank minimization, matrix completion, iterative algorithms, null-space property
2 0.12966828 99 jmlr-2012-Restricted Strong Convexity and Weighted Matrix Completion: Optimal Bounds with Noise
Author: Sahand Negahban, Martin J. Wainwright
Abstract: We consider the matrix completion problem under a form of row/column weighted entrywise sampling, including the case of uniform entrywise sampling as a special case. We analyze the associated random observation operator, and prove that with high probability, it satisfies a form of restricted strong convexity with respect to weighted Frobenius norm. Using this property, we obtain as corollaries a number of error bounds on matrix completion in the weighted Frobenius norm under noisy sampling and for both exact and near low-rank matrices. Our results are based on measures of the “spikiness” and “low-rankness” of matrices that are less restrictive than the incoherence conditions imposed in previous work. Our technique involves an M-estimator that includes controls on both the rank and spikiness of the solution, and we establish non-asymptotic error bounds in weighted Frobenius norm for recovering matrices lying with ℓq -“balls” of bounded spikiness. Using information-theoretic methods, we show that no algorithm can achieve better estimates (up to a logarithmic factor) over these same sets, showing that our conditions on matrices and associated rates are essentially optimal. Keywords: matrix completion, collaborative filtering, convex optimization
3 0.06496878 43 jmlr-2012-Fast Approximation of Matrix Coherence and Statistical Leverage
Author: Petros Drineas, Malik Magdon-Ismail, Michael W. Mahoney, David P. Woodruff
Abstract: The statistical leverage scores of a matrix A are the squared row-norms of the matrix containing its (top) left singular vectors and the coherence is the largest leverage score. These quantities are of interest in recently-popular problems such as matrix completion and Nyström-based low-rank matrix approximation as well as in large-scale statistical data analysis applications more generally; moreover, they are of interest since they define the key structural nonuniformity that must be dealt with in developing fast randomized matrix algorithms. Our main result is a randomized algorithm that takes as input an arbitrary n × d matrix A, with n ≫ d, and that returns as output relative-error approximations to all n of the statistical leverage scores. The proposed algorithm runs (under assumptions on the precise values of n and d) in O(nd log n) time, as opposed to the O(nd^2) time required by the naïve algorithm that involves computing an orthogonal basis for the range of A. Our analysis may be viewed in terms of computing a relative-error approximation to an underconstrained least-squares approximation problem, or, relatedly, it may be viewed as an application of Johnson-Lindenstrauss type ideas. Several practically-important extensions of our basic result are also described, including the approximation of so-called cross-leverage scores, the extension of these ideas to matrices with n ≈ d, and the extension to streaming environments. Keywords: matrix coherence, statistical leverage, randomized algorithm
4 0.061777528 83 jmlr-2012-Online Learning in the Embedded Manifold of Low-rank Matrices
Author: Uri Shalit, Daphna Weinshall, Gal Chechik
Abstract: When learning models that are represented in matrix forms, enforcing a low-rank constraint can dramatically improve the memory and run time complexity, while providing a natural regularization of the model. However, naive approaches to minimizing functions over the set of low-rank matrices are either prohibitively time consuming (repeated singular value decomposition of the matrix) or numerically unstable (optimizing a factored representation of the low-rank matrix). We build on recent advances in optimization over manifolds, and describe an iterative online learning procedure, consisting of a gradient step, followed by a second-order retraction back to the manifold. While the ideal retraction is costly to compute, and so is the projection operator that approximates it, we describe another retraction that can be computed efficiently. It has run time and memory complexity of O ((n + m)k) for a rank-k matrix of dimension m × n, when using an online procedure with rank-one gradients. We use this algorithm, L ORETA, to learn a matrix-form similarity measure over pairs of documents represented as high dimensional vectors. L ORETA improves the mean average precision over a passive-aggressive approach in a factorized model, and also improves over a full model trained on pre-selected features using the same memory requirements. We further adapt L ORETA to learn positive semi-definite low-rank matrices, providing an online algorithm for low-rank metric learning. L ORETA also shows consistent improvement over standard weakly supervised methods in a large (1600 classes and 1 million images, using ImageNet) multi-label image classification task. Keywords: low rank, Riemannian manifolds, metric learning, retractions, multitask learning, online learning
5 0.055048026 35 jmlr-2012-EP-GIG Priors and Applications in Bayesian Sparse Learning
Author: Zhihua Zhang, Shusen Wang, Dehua Liu, Michael I. Jordan
Abstract: In this paper we propose a novel framework for the construction of sparsity-inducing priors. In particular, we define such priors as a mixture of exponential power distributions with a generalized inverse Gaussian density (EP-GIG). EP-GIG is a variant of generalized hyperbolic distributions, and the special cases include Gaussian scale mixtures and Laplace scale mixtures. Furthermore, Laplace scale mixtures can subserve a Bayesian framework for sparse learning with nonconvex penalization. The densities of EP-GIG can be explicitly expressed. Moreover, the corresponding posterior distribution also follows a generalized inverse Gaussian distribution. We exploit these properties to develop EM algorithms for sparse empirical Bayesian learning. We also show that these algorithms bear an interesting resemblance to iteratively reweighted ℓ2 or ℓ1 methods. Finally, we present two extensions for grouped variable selection and logistic regression. Keywords: sparsity priors, scale mixtures of exponential power distributions, generalized inverse Gaussian distributions, expectation-maximization algorithms, iteratively reweighted minimization methods
6 0.047116261 73 jmlr-2012-Multi-task Regression using Minimal Penalties
7 0.042757597 93 jmlr-2012-Quantum Set Intersection and its Application to Associative Memory
8 0.042531125 2 jmlr-2012-A Comparison of the Lasso and Marginal Regression
9 0.040906988 7 jmlr-2012-A Multi-Stage Framework for Dantzig Selector and LASSO
10 0.040246062 33 jmlr-2012-Distance Metric Learning with Eigenvalue Optimization
11 0.038985528 97 jmlr-2012-Regularization Techniques for Learning with Matrices
12 0.037914906 66 jmlr-2012-Metric and Kernel Learning Using a Linear Transformation
13 0.036560781 39 jmlr-2012-Estimation and Selection via Absolute Penalized Convex Minimization And Its Multistage Adaptive Applications
14 0.034053579 108 jmlr-2012-Sparse and Unique Nonnegative Matrix Factorization Through Data Preprocessing
15 0.032952942 103 jmlr-2012-Sampling Methods for the Nyström Method
16 0.030853657 17 jmlr-2012-An Active Learning Algorithm for Ranking from Pairwise Preferences with an Almost Optimal Query Complexity
17 0.030446686 12 jmlr-2012-Active Clustering of Biological Sequences
18 0.029364999 88 jmlr-2012-PREA: Personalized Recommendation Algorithms Toolkit
19 0.029282458 48 jmlr-2012-High-Dimensional Gaussian Graphical Model Selection: Walk Summability and Local Separation Criterion
20 0.029182035 64 jmlr-2012-Manifold Identification in Dual Averaging for Regularized Stochastic Online Learning
topicId topicWeight
[(0, -0.14), (1, 0.038), (2, 0.006), (3, 0.006), (4, -0.061), (5, 0.023), (6, -0.043), (7, -0.14), (8, -0.02), (9, -0.195), (10, -0.146), (11, 0.027), (12, -0.09), (13, 0.064), (14, 0.138), (15, -0.036), (16, 0.142), (17, -0.007), (18, 0.06), (19, -0.2), (20, -0.254), (21, -0.005), (22, -0.04), (23, -0.003), (24, -0.125), (25, 0.027), (26, -0.129), (27, -0.011), (28, 0.043), (29, 0.087), (30, 0.023), (31, 0.068), (32, -0.054), (33, 0.102), (34, -0.003), (35, 0.039), (36, 0.013), (37, 0.035), (38, 0.132), (39, -0.184), (40, 0.012), (41, -0.053), (42, -0.074), (43, -0.078), (44, 0.035), (45, -0.062), (46, -0.004), (47, -0.069), (48, -0.061), (49, 0.174)]
simIndex simValue paperId paperTitle
same-paper 1 0.9319604 52 jmlr-2012-Iterative Reweighted Algorithms for Matrix Rank Minimization
Author: Karthik Mohan, Maryam Fazel
Abstract: The problem of minimizing the rank of a matrix subject to affine constraints has applications in several areas including machine learning, and is known to be NP-hard. A tractable relaxation for this problem is nuclear norm (or trace norm) minimization, which is guaranteed to find the minimum rank matrix under suitable assumptions. In this paper, we propose a family of Iterative Reweighted Least Squares algorithms IRLS-p (with 0 ≤ p ≤ 1), as a computationally efficient way to improve over the performance of nuclear norm minimization. The algorithms can be viewed as (locally) minimizing certain smooth approximations to the rank function. When p = 1, we give theoretical guarantees similar to those for nuclear norm minimization, that is, recovery of low-rank matrices under certain assumptions on the operator defining the constraints. For p < 1, IRLSp shows better empirical performance in terms of recovering low-rank matrices than nuclear norm minimization. We provide an efficient implementation for IRLS-p, and also present a related family of algorithms, sIRLS-p. These algorithms exhibit competitive run times and improved recovery when compared to existing algorithms for random instances of the matrix completion problem, as well as on the MovieLens movie recommendation data set. Keywords: matrix rank minimization, matrix completion, iterative algorithms, null-space property
2 0.58491117 99 jmlr-2012-Restricted Strong Convexity and Weighted Matrix Completion: Optimal Bounds with Noise
Author: Sahand Negahban, Martin J. Wainwright
Abstract: We consider the matrix completion problem under a form of row/column weighted entrywise sampling, including the case of uniform entrywise sampling as a special case. We analyze the associated random observation operator, and prove that with high probability, it satisfies a form of restricted strong convexity with respect to weighted Frobenius norm. Using this property, we obtain as corollaries a number of error bounds on matrix completion in the weighted Frobenius norm under noisy sampling and for both exact and near low-rank matrices. Our results are based on measures of the “spikiness” and “low-rankness” of matrices that are less restrictive than the incoherence conditions imposed in previous work. Our technique involves an M-estimator that includes controls on both the rank and spikiness of the solution, and we establish non-asymptotic error bounds in weighted Frobenius norm for recovering matrices lying with ℓq -“balls” of bounded spikiness. Using information-theoretic methods, we show that no algorithm can achieve better estimates (up to a logarithmic factor) over these same sets, showing that our conditions on matrices and associated rates are essentially optimal. Keywords: matrix completion, collaborative filtering, convex optimization
3 0.58231956 93 jmlr-2012-Quantum Set Intersection and its Application to Associative Memory
Author: Tamer Salman, Yoram Baram
Abstract: We describe a quantum algorithm for computing the intersection of two sets and its application to associative memory. The algorithm is based on a modification of Grover’s quantum search algorithm (Grover, 1996). We present algorithms for pattern retrieval, pattern completion, and pattern correction. We show that the quantum associative memory can store an exponential number of memories and retrieve them in sub-exponential time. We prove that this model has advantages over known classical associative memories as well as previously proposed quantum models. Keywords: associative memory, pattern completion, pattern correction, quantum computation, quantum search
4 0.48649108 83 jmlr-2012-Online Learning in the Embedded Manifold of Low-rank Matrices
Author: Uri Shalit, Daphna Weinshall, Gal Chechik
Abstract: When learning models that are represented in matrix forms, enforcing a low-rank constraint can dramatically improve the memory and run time complexity, while providing a natural regularization of the model. However, naive approaches to minimizing functions over the set of low-rank matrices are either prohibitively time consuming (repeated singular value decomposition of the matrix) or numerically unstable (optimizing a factored representation of the low-rank matrix). We build on recent advances in optimization over manifolds, and describe an iterative online learning procedure, consisting of a gradient step, followed by a second-order retraction back to the manifold. While the ideal retraction is costly to compute, and so is the projection operator that approximates it, we describe another retraction that can be computed efficiently. It has run time and memory complexity of O ((n + m)k) for a rank-k matrix of dimension m × n, when using an online procedure with rank-one gradients. We use this algorithm, L ORETA, to learn a matrix-form similarity measure over pairs of documents represented as high dimensional vectors. L ORETA improves the mean average precision over a passive-aggressive approach in a factorized model, and also improves over a full model trained on pre-selected features using the same memory requirements. We further adapt L ORETA to learn positive semi-definite low-rank matrices, providing an online algorithm for low-rank metric learning. L ORETA also shows consistent improvement over standard weakly supervised methods in a large (1600 classes and 1 million images, using ImageNet) multi-label image classification task. Keywords: low rank, Riemannian manifolds, metric learning, retractions, multitask learning, online learning
5 0.41005319 43 jmlr-2012-Fast Approximation of Matrix Coherence and Statistical Leverage
Author: Petros Drineas, Malik Magdon-Ismail, Michael W. Mahoney, David P. Woodruff
Abstract: The statistical leverage scores of a matrix A are the squared row-norms of the matrix containing its (top) left singular vectors and the coherence is the largest leverage score. These quantities are of interest in recently-popular problems such as matrix completion and Nyström-based low-rank matrix approximation as well as in large-scale statistical data analysis applications more generally; moreover, they are of interest since they define the key structural nonuniformity that must be dealt with in developing fast randomized matrix algorithms. Our main result is a randomized algorithm that takes as input an arbitrary n × d matrix A, with n ≫ d, and that returns as output relative-error approximations to all n of the statistical leverage scores. The proposed algorithm runs (under assumptions on the precise values of n and d) in O(nd log n) time, as opposed to the O(nd^2) time required by the naïve algorithm that involves computing an orthogonal basis for the range of A. Our analysis may be viewed in terms of computing a relative-error approximation to an underconstrained least-squares approximation problem, or, relatedly, it may be viewed as an application of Johnson-Lindenstrauss type ideas. Several practically-important extensions of our basic result are also described, including the approximation of so-called cross-leverage scores, the extension of these ideas to matrices with n ≈ d, and the extension to streaming environments. Keywords: matrix coherence, statistical leverage, randomized algorithm
6 0.29056424 66 jmlr-2012-Metric and Kernel Learning Using a Linear Transformation
7 0.2786276 103 jmlr-2012-Sampling Methods for the Nyström Method
8 0.27130631 35 jmlr-2012-EP-GIG Priors and Applications in Bayesian Sparse Learning
9 0.26224011 119 jmlr-2012-glm-ie: Generalised Linear Models Inference & Estimation Toolbox
10 0.24763151 12 jmlr-2012-Active Clustering of Biological Sequences
11 0.23693897 51 jmlr-2012-Integrating a Partial Model into Model Free Reinforcement Learning
12 0.23619086 73 jmlr-2012-Multi-task Regression using Minimal Penalties
13 0.22580211 97 jmlr-2012-Regularization Techniques for Learning with Matrices
14 0.22320636 108 jmlr-2012-Sparse and Unique Nonnegative Matrix Factorization Through Data Preprocessing
15 0.21516873 33 jmlr-2012-Distance Metric Learning with Eigenvalue Optimization
16 0.21445301 2 jmlr-2012-A Comparison of the Lasso and Marginal Regression
17 0.20482975 30 jmlr-2012-DARWIN: A Framework for Machine Learning and Computer Vision Research and Development
18 0.19982705 112 jmlr-2012-Structured Sparsity via Alternating Direction Methods
19 0.19616601 7 jmlr-2012-A Multi-Stage Framework for Dantzig Selector and LASSO
20 0.19326274 36 jmlr-2012-Efficient Methods for Robust Classification Under Uncertainty in Kernel Matrices
topicId topicWeight
[(7, 0.012), (21, 0.021), (26, 0.027), (29, 0.019), (35, 0.013), (56, 0.017), (69, 0.016), (75, 0.024), (77, 0.01), (79, 0.014), (92, 0.096), (96, 0.642)]
simIndex simValue paperId paperTitle
1 0.99290359 40 jmlr-2012-Exact Covariance Thresholding into Connected Components for Large-Scale Graphical Lasso
Author: Rahul Mazumder, Trevor Hastie
Abstract: We consider the sparse inverse covariance regularization problem or graphical lasso with regularization parameter λ. Suppose the sample covariance graph formed by thresholding the entries of the sample covariance matrix at λ is decomposed into connected components. We show that the vertex-partition induced by the connected components of the thresholded sample covariance graph (at λ) is exactly equal to that induced by the connected components of the estimated concentration graph, obtained by solving the graphical lasso problem for the same λ. This characterizes a very interesting property of a path of graphical lasso solutions. Furthermore, this simple rule, when used as a wrapper around existing algorithms for the graphical lasso, leads to enormous performance gains. For a range of values of λ, our proposal splits a large graphical lasso problem into smaller tractable problems, making it possible to solve an otherwise infeasible large-scale problem. We illustrate the graceful scalability of our proposal via synthetic and real-life microarray examples. Keywords: sparse inverse covariance selection, sparsity, graphical lasso, Gaussian graphical models, graph connected components, concentration graph, large scale covariance estimation
2 0.99229765 32 jmlr-2012-Discriminative Hierarchical Part-based Models for Human Parsing and Action Recognition
Author: Yang Wang, Duan Tran, Zicheng Liao, David Forsyth
Abstract: We consider the problem of parsing human poses and recognizing their actions in static images with part-based models. Most previous work in part-based models only considers rigid parts (e.g., torso, head, half limbs) guided by human anatomy. We argue that this representation of parts is not necessarily appropriate. In this paper, we introduce hierarchical poselets—a new representation for modeling the pose configuration of human bodies. Hierarchical poselets can be rigid parts, but they can also be parts that cover large portions of human bodies (e.g., torso + left arm). In the extreme case, they can be the whole bodies. The hierarchical poselets are organized in a hierarchical way via a structured model. Human parsing can be achieved by inferring the optimal labeling of this hierarchical model. The pose information captured by this hierarchical model can also be used as a intermediate representation for other high-level tasks. We demonstrate it in action recognition from static images. Keywords: human parsing, action recognition, part-based models, hierarchical poselets, maxmargin structured learning
3 0.98027426 28 jmlr-2012-Confidence-Weighted Linear Classification for Text Categorization
Author: Koby Crammer, Mark Dredze, Fernando Pereira
Abstract: Confidence-weighted online learning is a generalization of margin-based learning of linear classifiers in which the margin constraint is replaced by a probabilistic constraint based on a distribution over classifier weights that is updated online as examples are observed. The distribution captures a notion of confidence on classifier weights, and in some cases it can also be interpreted as replacing a single learning rate by adaptive per-weight rates. Confidence-weighted learning was motivated by the statistical properties of natural-language classification tasks, where most of the informative features are relatively rare. We investigate several versions of confidence-weighted learning that use a Gaussian distribution over weight vectors, updated at each observed example to achieve high probability of correct classification for the example. Empirical evaluation on a range of textcategorization tasks show that our algorithms improve over other state-of-the-art online and batch methods, learn faster in the online setting, and lead to better classifier combination for a type of distributed training commonly used in cloud computing. Keywords: online learning, confidence prediction, text categorization
same-paper 4 0.9796952 52 jmlr-2012-Iterative Reweighted Algorithms for Matrix Rank Minimization
Author: Karthik Mohan, Maryam Fazel
Abstract: The problem of minimizing the rank of a matrix subject to affine constraints has applications in several areas including machine learning, and is known to be NP-hard. A tractable relaxation for this problem is nuclear norm (or trace norm) minimization, which is guaranteed to find the minimum rank matrix under suitable assumptions. In this paper, we propose a family of Iterative Reweighted Least Squares algorithms IRLS-p (with 0 ≤ p ≤ 1), as a computationally efficient way to improve over the performance of nuclear norm minimization. The algorithms can be viewed as (locally) minimizing certain smooth approximations to the rank function. When p = 1, we give theoretical guarantees similar to those for nuclear norm minimization, that is, recovery of low-rank matrices under certain assumptions on the operator defining the constraints. For p < 1, IRLSp shows better empirical performance in terms of recovering low-rank matrices than nuclear norm minimization. We provide an efficient implementation for IRLS-p, and also present a related family of algorithms, sIRLS-p. These algorithms exhibit competitive run times and improved recovery when compared to existing algorithms for random instances of the matrix completion problem, as well as on the MovieLens movie recommendation data set. Keywords: matrix rank minimization, matrix completion, iterative algorithms, null-space property
5 0.9438771 91 jmlr-2012-Plug-in Approach to Active Learning
Author: Stanislav Minsker
Abstract: We present a new active learning algorithm based on nonparametric estimators of the regression function. Our investigation provides probabilistic bounds for the rates of convergence of the generalization error achievable by proposed method over a broad class of underlying distributions. We also prove minimax lower bounds which show that the obtained rates are almost tight. Keywords: active learning, selective sampling, model selection, classification, confidence bands
6 0.79452151 106 jmlr-2012-Sign Language Recognition using Sub-Units
7 0.77750635 92 jmlr-2012-Positive Semidefinite Metric Learning Using Boosting-like Algorithms
8 0.77691013 49 jmlr-2012-Hope and Fear for Discriminative Training of Statistical Translation Models
9 0.76846159 45 jmlr-2012-Finding Recurrent Patterns from Continuous Sign Language Sentences for Automated Extraction of Signs
10 0.7511586 37 jmlr-2012-Eliminating Spammers and Ranking Annotators for Crowdsourced Labeling Tasks
11 0.74959689 18 jmlr-2012-An Improved GLMNET for L1-regularized Logistic Regression
12 0.74796695 115 jmlr-2012-Trading Regret for Efficiency: Online Convex Optimization with Long Term Constraints
13 0.74670255 86 jmlr-2012-Optimistic Bayesian Sampling in Contextual-Bandit Problems
14 0.74427587 36 jmlr-2012-Efficient Methods for Robust Classification Under Uncertainty in Kernel Matrices
15 0.74309748 23 jmlr-2012-Breaking the Curse of Kernelization: Budgeted Stochastic Gradient Descent for Large-Scale SVM Training
16 0.7426886 33 jmlr-2012-Distance Metric Learning with Eigenvalue Optimization
17 0.74253136 83 jmlr-2012-Online Learning in the Embedded Manifold of Low-rank Matrices
18 0.74101567 77 jmlr-2012-Non-Sparse Multiple Kernel Fisher Discriminant Analysis
19 0.74014497 73 jmlr-2012-Multi-task Regression using Minimal Penalties
20 0.73962146 108 jmlr-2012-Sparse and Unique Nonnegative Matrix Factorization Through Data Preprocessing