nips nips2012 nips2012-64 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Tingni Sun, Cun-hui Zhang
Abstract: This paper concerns the problem of matrix completion, which is to estimate a matrix from observations in a small subset of indices. We propose a calibrated spectrum elastic net method with a sum of the nuclear and Frobenius penalties and develop an iterative algorithm to solve the convex minimization problem. The iterative algorithm alternates between imputing the missing entries in the incomplete matrix by the current guess and estimating the matrix by a scaled soft-thresholding singular value decomposition of the imputed matrix until the resulting matrix converges. A calibration step follows to correct the bias caused by the Frobenius penalty. Under proper coherence conditions and for suitable penalty levels, we prove that the proposed estimator achieves an error bound of nearly optimal order and in proportion to the noise level. This provides a unified analysis of the noisy and noiseless matrix completion problems. Simulation results are presented to compare our proposal with previous ones. 1
Reference: text
sentIndex sentText sentNum sentScore
1 This paper concerns the problem of matrix completion, which is to estimate a matrix from observations in a small subset of indices. [sent-5, score-0.137]
2 We propose a calibrated spectrum elastic net method with a sum of the nuclear and Frobenius penalties and develop an iterative algorithm to solve the convex minimization problem. [sent-6, score-0.813]
3 The iterative algorithm alternates between imputing the missing entries in the incomplete matrix by the current guess and estimating the matrix by a scaled soft-thresholding singular value decomposition of the imputed matrix until the resulting matrix converges. [sent-7, score-0.403]
4 Under proper coherence conditions and for suitable penalty levels, we prove that the proposed estimator achieves an error bound of nearly optimal order and in proportion to the noise level. [sent-9, score-0.549]
5 This provides a unified analysis of the noisy and noiseless matrix completion problems. [sent-10, score-0.245]
6 Suppose we observe vectors (ωi, yi), yi = Θωi + εi, i = 1, . . . , n. [sent-19, score-0.11]
7 A well-known application of matrix completion is the Netflix problem where yi is the rating of movie bj by user ai for ω = (ai , bj ) ∈ Ω∗ [1]. [sent-24, score-0.267]
8 In such applications, the proportion of the observed entries is typically very small, so that the estimation or recovery of Θ is impossible without a structure assumption on Θ. [sent-25, score-0.125]
9 A focus of recent studies of matrix completion has been on a simpler formulation, also known as exact recovery, where the observations are assumed to be uncorrupted, i. [sent-27, score-0.226]
10 An iterative algorithm was proposed in [5] to project a trimmed SVD of the incomplete data matrix to the space of matrices of a fixed rank r. [sent-31, score-0.235]
11 The nuclear norm was proposed as a surrogate for the rank, leading to the following convex minimization problem in a linear space [2]: Θ(CR) = arg min_M { ||M||_(N) : Mωi = yi ∀ i ≤ n }. [sent-32, score-0.241]
12 We denote the nuclear norm by || · ||_(N) here and throughout this paper. [sent-33, score-0.136]
13 This procedure, analyzed in [2, 3, 4, 11] among others, is parallel to the replacement of the ℓ0 penalty by the ℓ1 penalty in solving the sparse recovery problem in a linear space. [sent-34, score-0.225]
14 In this paper, we focus on the problem of matrix completion with noisy observations (1) and take exact recovery as a special case. [sent-35, score-0.308]
15 Since the exact constraint is no longer appropriate in the presence of noise, the penalized squared error Σ_{i=1}^n (Mωi − yi)² is considered. [sent-36, score-0.138]
16 By reformulating the problem in Lagrange form, [8] proposed the spectrum Lasso Θ(MHT) = arg min_M { Σ_{i=1}^n Mωi²/2 − Σ_{i=1}^n yi Mωi + λ ||M||_(N) }, (2) along with an iterative convex minimization algorithm. [sent-37, score-0.476]
17 Both [7, 9] provided nearly optimal error bounds when the noise level is of no smaller order than the ℓ∞ norm of the target matrix Θ, but not when the noise level is of smaller order, and in particular not for exact recovery. [sent-41, score-0.377]
18 In a different approach, [6] proposed a non-convex recursive algorithm and provided error bounds in proportion to the noise level. [sent-42, score-0.179]
19 However, the procedure requires the knowledge of the rank r of the unknown Θ and the error bound is optimal only when d1 and d2 are of the same order. [sent-43, score-0.195]
20 Our goal is to develop an algorithm for matrix completion that can be as easily computed as the spectrum Lasso (2) and enjoys a nearly optimal error bound proportional to the noise level to continuously cover both the noisy and noiseless cases. [sent-44, score-0.797]
21 We propose to use an elastic penalty, a linear combination of the nuclear and Frobenius norms, which leads to the estimator Θ̂ = arg min_M { Σ_{i=1}^n Mωi²/2 − Σ_{i=1}^n yi Mωi + λ1 ||M||_(N) + (λ2/2) ||M||²_(F) }, (3) where || · ||_(N) and || · ||_(F) are the nuclear and Frobenius norms, respectively. [sent-45, score-0.43]
22 We call (3) spectrum elastic net (E-net) since it is parallel to the E-net in linear regression, the least squares estimator with a sum of the ℓ1 and ℓ2 penalties, introduced in [15]. [sent-46, score-0.546]
23 Here the nuclear penalty provides the sparsity in the spectrum, while the Frobenius penalty regularizes the inversion of the quadratic term. [sent-47, score-0.252]
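As a concrete illustration of (3), a small numpy sketch that evaluates this objective for a candidate matrix; the observed indices are passed as parallel row/column arrays, and all names are illustrative rather than taken from the paper.

```python
import numpy as np

def enet_objective(M, rows, cols, y, lam1, lam2):
    """Objective (3): sum_i M[w_i]^2/2 - sum_i y_i*M[w_i]
    + lam1*||M||_(N) + (lam2/2)*||M||_(F)^2."""
    m_obs = M[rows, cols]                              # entries of M at the observed indices
    quad = 0.5 * np.sum(m_obs ** 2) - np.sum(y * m_obs)
    nuclear = np.linalg.svd(M, compute_uv=False).sum() # nuclear penalty
    frob2 = np.sum(M ** 2)                             # squared Frobenius penalty
    return quad + lam1 * nuclear + 0.5 * lam2 * frob2
```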
24 Meanwhile, since the Frobenius penalty roughly shrinks the estimator by a factor π0/(π0 + λ2), we correct this bias by a calibration step, Θ̂(cal) = (1 + λ2/π0) Θ̂. (4) [sent-48, score-0.17]
25 We call this estimator the calibrated spectrum E-net. [sent-49, score-0.571]
26 The algorithm iteratively replaces the missing entries with those obtained from a scaled soft-thresholding singular value decomposition (SVD) until the resulting matrix converges. [sent-51, score-0.2]
27 Under proper coherence conditions, we prove that for suitable penalty levels λ1 and λ2 , the calibrated spectrum E-net (4) achieves a desired error bound in the Frobenius norm. [sent-53, score-0.916]
28 Our error bound is of nearly optimal order and in proportion to the noise level. [sent-54, score-0.23]
29 This provides a sharper result than those of [7, 9] when the noise level is of smaller order than the ℓ∞ norm of Θ, and than that of [6] when d2/d1 is large. [sent-55, score-0.159]
30 Our simulation results support the use of the calibrated spectrum E-net. [sent-56, score-0.555]
31 Our analysis of the calibrated spectrum E-net uses an inequality similar to a dual certificate bound in [3]. [sent-58, score-0.612]
32 The bound in [3] requires sample size n ≳ min{(r log d)², r(log d)⁶} d log d, where d = d1 + d2. [sent-59, score-0.188]
33 We use the method of moments to remove a log d factor in the first component of their sample size requirement. [sent-60, score-0.103]
34 This leads to a sample size requirement of n ≳ r²d log d, with an extra r in comparison to the ideal n ≳ rd log d. [sent-61, score-0.268]
35 Since the extra r does not appear in our error bound, its appearance in the sample size requirement seems to be a technicality. [sent-62, score-0.152]
36 In Section 2, we describe an iterative algorithm for the computation of the spectrum E-net and study its convergence. [sent-64, score-0.371]
37 In Section 3, we derive error bounds for the calibrated spectrum E-net. [sent-65, score-0.598]
38 For matrices M ∈ R^{d1×d2}, ||M||_(N) is the nuclear norm (the sum of all singular values of M), ||M||_(S) is the spectrum norm (the largest singular value), ||M||_(F) is the Frobenius norm (the ℓ2 norm of vectorized M), and ||M||_∞ = max_{jk} |Mjk|. [sent-69, score-0.787]
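In numpy, these four norms can be computed directly from a singular value decomposition (a small illustrative helper):

```python
import numpy as np

def matrix_norms(M):
    s = np.linalg.svd(M, compute_uv=False)    # singular values of M
    return {
        "nuclear":   s.sum(),                 # ||M||_(N): sum of singular values
        "spectrum":  s.max(),                 # ||M||_(S): largest singular value
        "frobenius": np.sqrt((M ** 2).sum()), # ||M||_(F): l2 norm of vectorized M
        "max":       np.abs(M).max(),         # ||M||_inf: max_{jk} |M_jk|
    }
```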
39 2 An algorithm for spectrum elastic regularization. We first present a lemma for the M-step of our iterative algorithm. [sent-81, score-0.502]
40 We use Lemma 1 to solve the M-step of the EM algorithm for the spectrum E-net (3). [sent-92, score-0.335]
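Lemma 1 itself is not reproduced in this excerpt. As a sketch of the closed form such a lemma typically yields, the generic problem arg min_M { ||Z − M||²_(F)/2 + λ1||M||_(N) + (λ2/2)||M||²_(F) } is solved by soft-thresholding the singular values of Z at λ1 and scaling by 1/(1 + λ2); the weighting by π0 and by the observation multiplicities in the paper's actual M-step is omitted in this illustration.

```python
import numpy as np

def scaled_svt(Z, lam1, lam2):
    """argmin_M 0.5*||Z - M||_F^2 + lam1*||M||_(N) + 0.5*lam2*||M||_F^2.
    Completing the square reduces this to a nuclear-norm prox of Z/(1+lam2),
    i.e. a soft-thresholded SVD followed by the scaling 1/(1+lam2)."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    s_thr = np.maximum(s - lam1, 0.0)         # soft-threshold the singular values
    return (U * (s_thr / (1.0 + lam2))) @ Vt
```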
41 We still need an E-step to impute a complete matrix given the observed data {yi, ωi : i = 1, . . . , n}. [sent-93, score-0.095]
42 Suppose that the complete data is composed of m∗ observations at each ω for a certain integer m∗ . [sent-99, score-0.071]
43 Let Yω^(com) be the sample mean of the complete data at ω and Y^(com) be the matrix with components Yω^(com). [sent-100, score-0.102]
44 Let Yω^(obs) be the sample mean of the observations {yi : ωi = ω} at ω and Y^(obs) the matrix with components Yω^(obs). Under the white noise model, the conditional expectation of Yω^(com) given Y^(obs) is (mω/m∗) Yω^(obs) + (1 − mω/m∗) Θω for mω ≤ m∗, where mω is the number of observations at ω. [sent-103, score-0.164]
45 We now present the EM-algorithm for the computation of the spectrum E-net Θ in (3). [sent-106, score-0.335]
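A minimal sketch of such an EM-style iteration in the simplest setting, where every entry of Ω∗ is observed at most once and m∗ = 1, so the E-step reduces to filling each unobserved entry with the current estimate; the stopping rule and all names are illustrative, and the calibration step (4) is applied at the end.

```python
import numpy as np

def calibrated_spectrum_enet(Y_obs, mask, lam1, lam2, max_iter=500, tol=1e-6):
    """Y_obs: matrix holding the observed entries (arbitrary elsewhere);
    mask: boolean matrix of observed positions; returns the calibrated estimate (4)."""
    pi0 = mask.mean()                         # fraction of observed entries
    Theta = np.zeros_like(Y_obs, dtype=float)
    for _ in range(max_iter):
        # E-step: impute the missing entries by the current guess
        Z = np.where(mask, Y_obs, Theta)
        # M-step: scaled soft-thresholding SVD (see scaled_svt above)
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        Theta_new = (U * (np.maximum(s - lam1, 0.0) / (1.0 + lam2))) @ Vt
        if np.linalg.norm(Theta_new - Theta) <= tol * (1.0 + np.linalg.norm(Theta)):
            Theta = Theta_new
            break
        Theta = Theta_new
    # calibration step (4): undo the shrinkage caused by the Frobenius penalty
    return (1.0 + lam2 / pi0) * Theta
```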
46 3 Analysis of estimation accuracy. In this section, we derive error bounds for the calibrated spectrum E-net. [sent-115, score-0.598]
47 Suppose that ||PT R||_(op) ≤ 1/2 and sr ≥ 5λ1/λ2, (7) that ||PT ∆||_(F) ≤ √r λ1/8 and ||∆ − R(PT R + PT)^{−1} PT ∆||_(S) ≤ λ1/4, (8) and that ||PT ε||_(F) ≤ √r λ1/8, ||Qε||_(S) ≤ 3λ1/4 and ||PT^⊥ ε||_(S) ≤ λ1. (9) [sent-126, score-0.121]
48 Then the calibrated spectrum E-net (4) satisfies ||Θ̂(cal) − Θ||_(F) ≤ 2√r λ1/π0. (10) [sent-127, score-0.335]
49 When ωi are random entries in Ω∗ , EH = π0 I, so that (8) and the first inequality of (7) are expected to hold under proper conditions. [sent-129, score-0.107]
50 Since the rank of PT ε is no greater than 2r, (9) essentially requires ||ε||_(S) ≲ λ1. [sent-130, score-0.092]
51 When λ2 /π0 diverges to infinity, the calibrated spectrum E-net (4) becomes the modified spectrum Lasso of [7]. [sent-133, score-0.853]
52 Theorem 2 provides sufficient conditions on the target matrix and the noise for achieving a certain level of estimation error. [sent-134, score-0.194]
53 Intuitively, these conditions on the target matrix Θ must imply a certain level of coherence (or flatness) of the unknown matrix since it is impossible to distinguish the unknown from zero when the observations are completely outside its support. [sent-135, score-0.406]
54 In [2, 3, 4, 11], coherence conditions are imposed on µ0 = max{(d1/r) ||U U'||_∞, (d2/r) ||V V'||_∞}, µ1 = √(d1 d2/r) ||U V'||_∞, (11) where U and V are matrices of singular vectors of Θ. [sent-136, score-0.291]
55 [9] considered a more general notion of spikiness of a matrix M, defined as the ratio between the ℓ∞ and dimension-normalized ℓ2 norms, αsp(M) = √(d1 d2) ||M||_∞ / ||M||_(F). (12) [sent-137, score-0.127]
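Assuming the reconstructed forms of (11) and (12) above, the coherence factors and the spikiness can be computed as follows (a sketch; names are ours):

```python
import numpy as np

def coherence_and_spikiness(Theta, r):
    d1, d2 = Theta.shape
    U, s, Vt = np.linalg.svd(Theta, full_matrices=False)
    U, V = U[:, :r], Vt[:r, :].T                        # singular vectors of Theta
    mu0 = max((d1 / r) * np.abs(U @ U.T).max(),
              (d2 / r) * np.abs(V @ V.T).max())         # mu_0 in (11)
    mu1 = np.sqrt(d1 * d2 / r) * np.abs(U @ V.T).max()  # mu_1 in (11)
    a_sp = np.sqrt(d1 * d2) * np.abs(Theta).max() / np.linalg.norm(Theta, "fro")  # (12)
    return mu0, mu1, a_sp
```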
56 Suppose in the rest of the section that ωi are iid points uniformly distributed in Ω∗ and εi are iid N(0, σ²) variables independent of {ωi}. [sent-138, score-0.084]
57 The following theorem asserts that under certain coherence conditions on the matrices Θ, U U', V V' and U V', all conditions of Theorem 2 hold with large probability when the sample size n is of the order r²d log d. [sent-139, score-0.407]
58 We require knowledge of the noise level σ to determine the penalty level, which is usually treated as a tuning parameter in practice. [sent-143, score-0.188]
59 In our simulation experiment, we use λ2 = λ1{n/(d log d)}^{1/4}/F with F = (Σ_{i=1}^n yi²/π0)^{1/2}. [sent-145, score-0.148]
60 A key element in our analysis is to find a probabilistic bound for the second inequality of (8), or equivalently an upper bound for P( ||R(PT R + PT)^{−1}(λ2 Θ + λ1 U V')||_(S) > λ1/4 ). (15) [sent-147, score-0.143]
61 This guarantees the existence of a primal dual certificate for the spectrum E-net penalty [14]. [sent-148, score-0.406]
62 For λ2 = 0, a similar inequality was proved in [3], where the sample size requirement is n ≥ C0 min{µ² r²(log d)² d, µ² r(log d)⁶ d} for a certain coherence factor µ. [sent-149, score-0.348]
63 We remove a log factor in the first bound, resulting in the sample size requirement in (14), which is optimal when r = O(1). [sent-150, score-0.174]
64 For exact recovery in the noiseless case, the sample size n ≳ rd(log d)² is sufficient if a golfing scheme is used to construct an approximate dual certificate [4, 11]. [sent-151, score-0.163]
65 We use a different graphical approach from those in [3] to bound E trace({(Rk M)'(Rk M)}^m) in the proof of Lemma 2. [sent-157, score-0.079]
66 By (16) with km ≍ log d for k ≥ 2 and an even simpler bound for k = 1 and Rem, (15) holds when (√(d1 d2)/r) ||M||_∞ ≲ λ1 η, where η ≍ r²d(log d)/n. [sent-161, score-0.134]
67 Finally, we use matrix exponential inequalities [10, 12] to verify other conditions of Theorem 2. [sent-163, score-0.116]
68 Compared with [7] and [9], the main advantage of Theorem 3 is the proportionality of its error bound to the noise level. [sent-168, score-0.158]
69 This error bound achieves the squared error rate σ²rd(log d)/n as in Theorem 3 when the noise level σ is of no smaller order than ||Θ||_∞, but not when σ is of smaller order. [sent-170, score-0.295]
70 In particular, (17) does not imply exact recovery when σ = 0. [sent-171, score-0.11]
71 In Theorem 3, the error bound converges to zero as the noise level diminishes, implying exact recovery in the noiseless case. [sent-172, score-0.325]
72 In [9], a constrained spectrum Lasso was proposed that minimizes (2) subject to ||M||_∞ ≤ α∗/√(d1 d2). [sent-173, score-0.335]
73 Scale change from the above error bound yields ||Θ(NW) − Θ||²_(F)/(d1 d2) ≤ C max{σ², ||Θ||²_(F)/(d1 d2)} (α∗)² rd(log d)/n. (18) [sent-175, score-0.103]
74 We shall point out that (17) and (18) only require sample size n ≳ rd log d. [sent-177, score-0.141]
75 Compared with [6], the main advantage of Theorem 3 is the independence of its sample size requirement from the aspect ratio d2/d1, where d2 ≥ d1 is assumed without loss of generality by symmetry. [sent-179, score-0.152]
76 The error bound in [6] implies ||Θ(KMO) − Θ||²_(F)/(d1 d2) ≤ C0 (s1/sr)⁴ σ² rd(log d)/n (19) for sample size n ≥ C1* rd log d + C2* r²d √(d2/d1), where {C1*, C2*} are constants depending on the same set of coherence factors as in (14) and s1 > · · · > sr are the singular values of Θ. [sent-180, score-0.6]
77 Therefore, Theorem 3 effectively replaces the root aspect ratio √(d2/d1) in the sample size requirement of (19) with a log factor, and removes the coherence factor (s1/sr)⁴ on the right-hand side of (19). [sent-181, score-0.412]
78 We note that s1/sr is a larger coherence factor than ||Θ||_(F)/(r^{1/2} sr) in the sample size requirement in Theorem 3. [sent-182, score-0.4]
79 The root aspect ratio can be removed from the sample size requirement for (19) if Θ can be divided into square blocks uniformly satisfying the coherence conditions. [sent-183, score-0.313]
80 We describe the simulation settings in our notation as follows: the target matrix is Θ = U V', where U_{d1×r} and V_{d2×r} are random matrices with independent standard normal entries. [sent-185, score-0.119]
81 The signal-to-noise ratio (SNR) is defined as SNR = √r/σ. [sent-198, score-0.082]
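A sketch of this data-generating process; the n observed positions are drawn without replacement and the noise level is set from the SNR definition above (names and these simplifications are ours):

```python
import numpy as np

def simulate(d1, d2, r, n, snr, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((d1, r))
    V = rng.standard_normal((d2, r))
    Theta = U @ V.T                                    # target matrix of rank r
    sigma = np.sqrt(r) / snr                           # from SNR = sqrt(r)/sigma
    idx = rng.choice(d1 * d2, size=n, replace=False)   # observed index set Omega
    rows, cols = np.unravel_index(idx, (d1, d2))
    y = Theta[rows, cols] + sigma * rng.standard_normal(n)
    return Theta, rows, cols, y, sigma
```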
82 We compare the calibrated spectrum E-net (4) with the spectrum Lasso (2) and its modification Θ(KLT) of [7]. [sent-199, score-0.853]
83 For all methods, we compute a series of estimators with 100 different penalty levels, where the smallest penalty level corresponds to a full-rank solution and the largest penalty level corresponds to a zero solution. [sent-200, score-0.275]
84 For the calibrated spectrum E-net, we always use λ2 = λ1{n/(d log d)}^{1/4}/F, where F = (Σ_{i=1}^n yi²/π0)^{1/2} is an estimator of ||Θ||_(F). [sent-201, score-0.682]
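In code, with π0 = n/(d1 d2) and d = d1 + d2, this default choice of λ2 could be computed as follows (λ1 itself is left as the tuning parameter, as in the experiments):

```python
import numpy as np

def default_lam2(y, lam1, d1, d2):
    n = y.size
    pi0 = n / (d1 * d2)                   # proportion of observed entries
    d = d1 + d2
    F = np.sqrt(np.sum(y ** 2) / pi0)     # estimate of ||Theta||_(F)
    return lam1 * (n / (d * np.log(d))) ** 0.25 / F
```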
85 We plot the training errors and test errors as functions of estimated ranks, where the training and test errors are defined as Training error = ||PΩ(Θ̂ − Y)||²_(F) / ||PΩ Y||²_(F), Test error = ||PΩ^⊥(Θ̂ − Θ)||²_(F) / ||PΩ^⊥ Θ||²_(F). [sent-202, score-0.201]
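With the definitions just reconstructed, the two errors can be computed as follows (a sketch; `mask` marks the observed set Ω and `Y_obs` holds the noisy observations):

```python
import numpy as np

def train_test_errors(Theta_hat, Theta, Y_obs, mask):
    # training error on the observed set, test error on its complement
    train = np.sum((Theta_hat - Y_obs)[mask] ** 2) / np.sum(Y_obs[mask] ** 2)
    test = np.sum((Theta_hat - Theta)[~mask] ** 2) / np.sum(Theta[~mask] ** 2)
    return train, test
```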
86 The rank of Θ is 10 but {Θ, Ω, ε} are regenerated in each replication. [sent-204, score-0.092]
87 Different noise levels and proportions of the observed entries are considered. [sent-205, score-0.104]
88 In this experiment, the calibrated spectrum E-net and the spectrum Lasso estimator have very close testing and training errors, and both of them significantly outperform the modified Lasso. [sent-207, score-0.906]
89 Figure 1 also illustrates that in most cases, the calibrated spectrum E-net and spectrum Lasso achieve the optimal test error when the estimated rank is around the true rank. [sent-208, score-0.999]
90 We note that the constrained spectrum Lasso estimator Θ(NW) would have the same performance as the spectrum Lasso when the constraint αsp (Θ) ≤ α∗ is set with a sufficiently high α∗ . [sent-209, score-0.723]
91 However, the analytic properties of the spectrum Lasso are unclear without constraint or modification. [sent-210, score-0.335]
92 5 Proof of Theorem 2. The proof of Theorem 2 requires the following proposition that controls the approximation error of the Taylor expansion of the nuclear norm with subdifferentiation. [sent-211, score-0.22]
93 in [13], is used to control the variation of the tangent space of the spectrum E-net estimator. [sent-225, score-0.358]
94 The last inequality above follows from the first inequalities in (7), (8) and (9). [sent-235, score-0.077]
95 We write the spectrum E-net estimator (3) as Θ̂ = arg min_M ⟨HM, M⟩/2 − ⟨Y, M⟩ + λ1 ||M||_(N) + (λ2/2) ||M||²_(F). [sent-238, score-0.438]
96 Since Θ∗ = ∆∗ − Θ ∈ T and the singular values of Θ are no smaller than (π0 sr − λ1)/(π0 + λ2) ≥ (sr − λ1/λ2)/ξ ≥ 4λ1/(λ2 ξ) by the second inequality in (7), Proposition 1 and (22) imply Rem1 ≤ 2||Θ∗ − Θ||²_(F)/{(π0 sr − λ1)/(π0 + λ2)} ≤ r(λ1/π0)²/(8ξλ1/λ2). [sent-247, score-0.411]
97 Therefore, the error bound (10) follows from (21), (22) and (26). [sent-249, score-0.103]
98 Therefore, the error bound (10) follows from (20) and (29). [sent-257, score-0.103]
99 Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion. [sent-302, score-0.08]
100 Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. [sent-314, score-0.081]
wordName wordTfidf (topN-words)
[('pt', 0.728), ('spectrum', 0.335), ('calibrated', 0.183), ('coherence', 0.161), ('obs', 0.129), ('sr', 0.121), ('hpt', 0.12), ('completion', 0.115), ('imp', 0.113), ('snr', 0.105), ('sp', 0.098), ('elastic', 0.094), ('lasso', 0.094), ('rank', 0.092), ('nuclear', 0.089), ('frobenius', 0.087), ('com', 0.086), ('singular', 0.074), ('cate', 0.073), ('penalty', 0.071), ('requirement', 0.071), ('certi', 0.069), ('rd', 0.058), ('recovery', 0.057), ('log', 0.056), ('noise', 0.055), ('matrix', 0.055), ('yi', 0.055), ('theorem', 0.054), ('error', 0.054), ('estimator', 0.053), ('noiseless', 0.05), ('bound', 0.049), ('svd', 0.048), ('norm', 0.047), ('nw', 0.047), ('inequality', 0.045), ('spikiness', 0.045), ('tingni', 0.045), ('proportion', 0.044), ('dv', 0.044), ('iid', 0.042), ('penalties', 0.038), ('net', 0.038), ('proper', 0.038), ('lemma', 0.037), ('klt', 0.037), ('simulation', 0.037), ('iterative', 0.036), ('op', 0.035), ('keshavan', 0.035), ('old', 0.033), ('rk', 0.033), ('inequalities', 0.032), ('montanari', 0.031), ('level', 0.031), ('modi', 0.031), ('errors', 0.031), ('proof', 0.03), ('conditions', 0.029), ('km', 0.029), ('exact', 0.029), ('arg', 0.029), ('nearly', 0.028), ('multiplicity', 0.028), ('observations', 0.027), ('matrices', 0.027), ('ratio', 0.027), ('sample', 0.027), ('aspect', 0.027), ('parallel', 0.026), ('smaller', 0.026), ('calibration', 0.026), ('pennsylvania', 0.026), ('bounds', 0.026), ('norms', 0.025), ('incomplete', 0.025), ('dr', 0.025), ('levels', 0.025), ('noisy', 0.025), ('entries', 0.024), ('scaled', 0.024), ('certain', 0.024), ('imply', 0.024), ('tangent', 0.023), ('replaces', 0.023), ('suppose', 0.023), ('min', 0.021), ('bj', 0.021), ('quadratic', 0.021), ('replaced', 0.021), ('factor', 0.02), ('em', 0.02), ('complete', 0.02), ('atness', 0.02), ('czhang', 0.02), ('dms', 0.02), ('impute', 0.02), ('rem', 0.02), ('uncorrupted', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 64 nips-2012-Calibrated Elastic Regularization in Matrix Completion
Author: Tingni Sun, Cun-hui Zhang
Abstract: This paper concerns the problem of matrix completion, which is to estimate a matrix from observations in a small subset of indices. We propose a calibrated spectrum elastic net method with a sum of the nuclear and Frobenius penalties and develop an iterative algorithm to solve the convex minimization problem. The iterative algorithm alternates between imputing the missing entries in the incomplete matrix by the current guess and estimating the matrix by a scaled soft-thresholding singular value decomposition of the imputed matrix until the resulting matrix converges. A calibration step follows to correct the bias caused by the Frobenius penalty. Under proper coherence conditions and for suitable penalties levels, we prove that the proposed estimator achieves an error bound of nearly optimal order and in proportion to the noise level. This provides a unified analysis of the noisy and noiseless matrix completion problems. Simulation results are presented to compare our proposal with previous ones. 1
2 0.17083639 258 nips-2012-Online L1-Dictionary Learning with Application to Novel Document Detection
Author: Shiva P. Kasiviswanathan, Huahua Wang, Arindam Banerjee, Prem Melville
Abstract: Given their pervasive use, social media, such as Twitter, have become a leading source of breaking news. A key task in the automated identification of such news is the detection of novel documents from a voluminous stream of text documents in a scalable manner. Motivated by this challenge, we introduce the problem of online 1 -dictionary learning where unlike traditional dictionary learning, which uses squared loss, the 1 -penalty is used for measuring the reconstruction error. We present an efficient online algorithm for this problem based on alternating directions method of multipliers, and establish a sublinear regret bound for this algorithm. Empirical results on news-stream and Twitter data, shows that this online 1 -dictionary learning algorithm for novel document detection gives more than an order of magnitude speedup over the previously known batch algorithm, without any significant loss in quality of results. 1
3 0.16044839 216 nips-2012-Mirror Descent Meets Fixed Share (and feels no regret)
Author: Nicolò Cesa-bianchi, Pierre Gaillard, Gabor Lugosi, Gilles Stoltz
Abstract: Mirror descent with an entropic regularizer is known to achieve shifting regret bounds that are logarithmic in the dimension. This is done using either a carefully designed projection or by a weight sharing technique. Via a novel unified analysis, we show that these two approaches deliver essentially equivalent bounds on a notion of regret generalizing shifting, adaptive, discounted, and other related regrets. Our analysis also captures and extends the generalized weight sharing technique of Bousquet and Warmuth, and can be refined in several ways, including improvements for small losses and adaptive tuning of parameters. 1
4 0.14932236 319 nips-2012-Sparse Prediction with the $k$-Support Norm
Author: Andreas Argyriou, Rina Foygel, Nathan Srebro
Abstract: We derive a novel norm that corresponds to the tightest convex relaxation of sparsity combined with an 2 penalty. We show that this new k-support norm provides a tighter relaxation than the elastic net and can thus be advantageous in in sparse prediction problems. We also bound the looseness of the elastic net, thus shedding new light on it and providing justification for its use. 1
5 0.14482729 44 nips-2012-Approximating Concavely Parameterized Optimization Problems
Author: Joachim Giesen, Jens Mueller, Soeren Laue, Sascha Swiercy
Abstract: We consider an abstract class of optimization problems that are parameterized concavely in a single parameter, and show that the solution path along the √ parameter can always be approximated with accuracy ε > 0 by a set of size O(1/ ε). A √ lower bound of size Ω(1/ ε) shows that the upper bound is tight up to a constant factor. We also devise an algorithm that calls a step-size oracle and computes an √ approximate path of size O(1/ ε). Finally, we provide an implementation of the oracle for soft-margin support vector machines, and a parameterized semi-definite program for matrix completion. 1
6 0.14318189 102 nips-2012-Distributed Non-Stochastic Experts
7 0.1408436 247 nips-2012-Nonparametric Reduced Rank Regression
8 0.12675016 260 nips-2012-Online Sum-Product Computation Over Trees
9 0.1024924 69 nips-2012-Clustering Sparse Graphs
10 0.10129955 165 nips-2012-Iterative ranking from pair-wise comparisons
11 0.097276017 199 nips-2012-Link Prediction in Graphs with Autoregressive Features
12 0.097113319 67 nips-2012-Classification Calibration Dimension for General Multiclass Losses
13 0.083768912 11 nips-2012-A Marginalized Particle Gaussian Process Regression
14 0.083764374 301 nips-2012-Scaled Gradients on Grassmann Manifolds for Matrix Completion
15 0.083027318 138 nips-2012-Fully Bayesian inference for neural models with negative-binomial spiking
16 0.082272284 208 nips-2012-Matrix reconstruction with the local max norm
17 0.075941101 164 nips-2012-Iterative Thresholding Algorithm for Sparse Inverse Covariance Estimation
18 0.075871557 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance
19 0.073712215 252 nips-2012-On Multilabel Classification and Ranking with Partial Feedback
20 0.069578163 320 nips-2012-Spectral Learning of General Weighted Automata via Constrained Matrix Completion
topicId topicWeight
[(0, 0.163), (1, 0.037), (2, 0.13), (3, -0.029), (4, 0.062), (5, 0.046), (6, -0.028), (7, 0.027), (8, -0.049), (9, 0.04), (10, 0.084), (11, 0.035), (12, -0.135), (13, -0.032), (14, 0.024), (15, 0.065), (16, 0.101), (17, 0.007), (18, 0.04), (19, -0.143), (20, -0.059), (21, 0.054), (22, -0.047), (23, 0.014), (24, -0.145), (25, 0.088), (26, -0.01), (27, 0.018), (28, 0.018), (29, -0.064), (30, -0.036), (31, 0.008), (32, 0.037), (33, 0.057), (34, 0.006), (35, 0.108), (36, 0.072), (37, 0.05), (38, -0.171), (39, 0.148), (40, 0.05), (41, -0.011), (42, -0.107), (43, 0.235), (44, -0.012), (45, 0.092), (46, -0.003), (47, -0.1), (48, -0.099), (49, -0.167)]
simIndex simValue paperId paperTitle
same-paper 1 0.94615722 64 nips-2012-Calibrated Elastic Regularization in Matrix Completion
Author: Tingni Sun, Cun-hui Zhang
Abstract: This paper concerns the problem of matrix completion, which is to estimate a matrix from observations in a small subset of indices. We propose a calibrated spectrum elastic net method with a sum of the nuclear and Frobenius penalties and develop an iterative algorithm to solve the convex minimization problem. The iterative algorithm alternates between imputing the missing entries in the incomplete matrix by the current guess and estimating the matrix by a scaled soft-thresholding singular value decomposition of the imputed matrix until the resulting matrix converges. A calibration step follows to correct the bias caused by the Frobenius penalty. Under proper coherence conditions and for suitable penalties levels, we prove that the proposed estimator achieves an error bound of nearly optimal order and in proportion to the noise level. This provides a unified analysis of the noisy and noiseless matrix completion problems. Simulation results are presented to compare our proposal with previous ones. 1
2 0.6307689 102 nips-2012-Distributed Non-Stochastic Experts
Author: Varun Kanade, Zhenming Liu, Bozidar Radunovic
Abstract: We consider the online distributed non-stochastic experts problem, where the distributed system consists of one coordinator node that is connected to k sites, and the sites are required to communicate with each other via the coordinator. At each time-step t, one of the k site nodes has to pick an expert from the set {1, . . . , n}, and the same site receives information about payoffs of all experts for that round. The goal of the distributed system is to minimize regret at time horizon T , while simultaneously keeping communication to a minimum. The two extreme solutions to this problem are: (i) Full communication: This essentially simulates the nondistributed setting to obtain the optimal O( log(n)T ) regret bound at the cost of T communication. (ii) No communication: Each site runs an independent copy – the regret is O( log(n)kT ) and the communication is 0. This paper shows the √ difficulty of simultaneously achieving regret asymptotically better than kT and communication better than T . We give a novel algorithm that for an oblivious √ adversary achieves a non-trivial trade-off: regret O( k 5(1+ )/6 T ) and communication O(T /k ), for any value of ∈ (0, 1/5). We also consider a variant of the model, where the coordinator picks the expert. In this model, we show that the label-efficient forecaster of Cesa-Bianchi et al. (2005) already gives us strategy that is near optimal in regret vs communication trade-off. 1
3 0.60615247 44 nips-2012-Approximating Concavely Parameterized Optimization Problems
Author: Joachim Giesen, Jens Mueller, Soeren Laue, Sascha Swiercy
Abstract: We consider an abstract class of optimization problems that are parameterized concavely in a single parameter, and show that the solution path along the √ parameter can always be approximated with accuracy ε > 0 by a set of size O(1/ ε). A √ lower bound of size Ω(1/ ε) shows that the upper bound is tight up to a constant factor. We also devise an algorithm that calls a step-size oracle and computes an √ approximate path of size O(1/ ε). Finally, we provide an implementation of the oracle for soft-margin support vector machines, and a parameterized semi-definite program for matrix completion. 1
4 0.56917971 258 nips-2012-Online L1-Dictionary Learning with Application to Novel Document Detection
Author: Shiva P. Kasiviswanathan, Huahua Wang, Arindam Banerjee, Prem Melville
Abstract: Given their pervasive use, social media, such as Twitter, have become a leading source of breaking news. A key task in the automated identification of such news is the detection of novel documents from a voluminous stream of text documents in a scalable manner. Motivated by this challenge, we introduce the problem of online 1 -dictionary learning where unlike traditional dictionary learning, which uses squared loss, the 1 -penalty is used for measuring the reconstruction error. We present an efficient online algorithm for this problem based on alternating directions method of multipliers, and establish a sublinear regret bound for this algorithm. Empirical results on news-stream and Twitter data, shows that this online 1 -dictionary learning algorithm for novel document detection gives more than an order of magnitude speedup over the previously known batch algorithm, without any significant loss in quality of results. 1
5 0.54101074 320 nips-2012-Spectral Learning of General Weighted Automata via Constrained Matrix Completion
Author: Borja Balle, Mehryar Mohri
Abstract: Many tasks in text and speech processing and computational biology require estimating functions mapping strings to real numbers. A broad class of such functions can be defined by weighted automata. Spectral methods based on the singular value decomposition of a Hankel matrix have been recently proposed for learning a probability distribution represented by a weighted automaton from a training sample drawn according to this same target distribution. In this paper, we show how spectral methods can be extended to the problem of learning a general weighted automaton from a sample generated by an arbitrary distribution. The main obstruction to this approach is that, in general, some entries of the Hankel matrix may be missing. We present a solution to this problem based on solving a constrained matrix completion problem. Combining these two ingredients, matrix completion and spectral method, a whole new family of algorithms for learning general weighted automata is obtained. We present generalization bounds for a particular algorithm in this family. The proofs rely on a joint stability analysis of matrix completion and spectral learning. 1
6 0.5404321 247 nips-2012-Nonparametric Reduced Rank Regression
7 0.51638848 208 nips-2012-Matrix reconstruction with the local max norm
8 0.51227385 17 nips-2012-A Scalable CUR Matrix Decomposition Algorithm: Lower Time Complexity and Tighter Bound
9 0.49957737 216 nips-2012-Mirror Descent Meets Fixed Share (and feels no regret)
10 0.46813127 319 nips-2012-Sparse Prediction with the $k$-Support Norm
11 0.45697433 221 nips-2012-Multi-Stage Multi-Task Feature Learning
12 0.43168002 281 nips-2012-Provable ICA with Unknown Gaussian Noise, with Implications for Gaussian Mixtures and Autoencoders
13 0.42948553 212 nips-2012-Minimax Multi-Task Learning and a Generalized Loss-Compositional Paradigm for MTL
14 0.41696152 199 nips-2012-Link Prediction in Graphs with Autoregressive Features
15 0.41327336 34 nips-2012-Active Learning of Multi-Index Function Models
16 0.40957746 260 nips-2012-Online Sum-Product Computation Over Trees
17 0.40333131 206 nips-2012-Majorization for CRFs and Latent Likelihoods
18 0.39274791 11 nips-2012-A Marginalized Particle Gaussian Process Regression
19 0.387299 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance
20 0.38305923 301 nips-2012-Scaled Gradients on Grassmann Manifolds for Matrix Completion
topicId topicWeight
[(0, 0.054), (4, 0.01), (17, 0.013), (21, 0.017), (31, 0.098), (38, 0.191), (39, 0.011), (42, 0.023), (54, 0.024), (55, 0.016), (74, 0.023), (76, 0.212), (80, 0.09), (92, 0.118)]
simIndex simValue paperId paperTitle
1 0.95568115 119 nips-2012-Entropy Estimations Using Correlated Symmetric Stable Random Projections
Author: Ping Li, Cun-hui Zhang
Abstract: Methods for efficiently estimating Shannon entropy of data streams have important applications in learning, data mining, and network anomaly detections (e.g., the DDoS attacks). For nonnegative data streams, the method of Compressed Counting (CC) [11, 13] based on maximally-skewed stable random projections can provide accurate estimates of the Shannon entropy using small storage. However, CC is no longer applicable when entries of data streams can be below zero, which is a common scenario when comparing two streams. In this paper, we propose an algorithm for entropy estimation in general data streams which allow negative entries. In our method, the Shannon entropy is approximated by the finite difference of two correlated frequency moments estimated from correlated samples of symmetric stable random variables. Interestingly, the estimator for the moment we recommend for entropy estimation barely has bounded variance itself, whereas the common geometric mean estimator (which has bounded higher-order moments) is not sufficient for entropy estimation. Our experiments confirm that this method is able to well approximate the Shannon entropy using small storage.
2 0.95510113 259 nips-2012-Online Regret Bounds for Undiscounted Continuous Reinforcement Learning
Author: Ronald Ortner, Daniil Ryabko
Abstract: We derive sublinear regret bounds for undiscounted reinforcement learning in continuous state space. The proposed algorithm combines state aggregation with the use of upper confidence bounds for implementing optimism in the face of uncertainty. Beside the existence of an optimal policy which satisfies the Poisson equation, the only assumptions made are H¨ lder continuity of rewards and transition o probabilities. 1
same-paper 3 0.95242536 64 nips-2012-Calibrated Elastic Regularization in Matrix Completion
Author: Tingni Sun, Cun-hui Zhang
Abstract: This paper concerns the problem of matrix completion, which is to estimate a matrix from observations in a small subset of indices. We propose a calibrated spectrum elastic net method with a sum of the nuclear and Frobenius penalties and develop an iterative algorithm to solve the convex minimization problem. The iterative algorithm alternates between imputing the missing entries in the incomplete matrix by the current guess and estimating the matrix by a scaled soft-thresholding singular value decomposition of the imputed matrix until the resulting matrix converges. A calibration step follows to correct the bias caused by the Frobenius penalty. Under proper coherence conditions and for suitable penalties levels, we prove that the proposed estimator achieves an error bound of nearly optimal order and in proportion to the noise level. This provides a unified analysis of the noisy and noiseless matrix completion problems. Simulation results are presented to compare our proposal with previous ones. 1
Author: Stefan Habenschuss, Johannes Bill, Bernhard Nessler
Abstract: Recent spiking network models of Bayesian inference and unsupervised learning frequently assume either inputs to arrive in a special format or employ complex computations in neuronal activation functions and synaptic plasticity rules. Here we show in a rigorous mathematical treatment how homeostatic processes, which have previously received little attention in this context, can overcome common theoretical limitations and facilitate the neural implementation and performance of existing models. In particular, we show that homeostatic plasticity can be understood as the enforcement of a ’balancing’ posterior constraint during probabilistic inference and learning with Expectation Maximization. We link homeostatic dynamics to the theory of variational inference, and show that nontrivial terms, which typically appear during probabilistic inference in a large class of models, drop out. We demonstrate the feasibility of our approach in a spiking WinnerTake-All architecture of Bayesian inference and learning. Finally, we sketch how the mathematical framework can be extended to richer recurrent network architectures. Altogether, our theory provides a novel perspective on the interplay of homeostatic processes and synaptic plasticity in cortical microcircuits, and points to an essential role of homeostasis during inference and learning in spiking networks. 1
5 0.94077426 236 nips-2012-Near-Optimal MAP Inference for Determinantal Point Processes
Author: Jennifer Gillenwater, Alex Kulesza, Ben Taskar
Abstract: Determinantal point processes (DPPs) have recently been proposed as computationally efficient probabilistic models of diverse sets for a variety of applications, including document summarization, image search, and pose estimation. Many DPP inference operations, including normalization and sampling, are tractable; however, finding the most likely configuration (MAP), which is often required in practice for decoding, is NP-hard, so we must resort to approximate inference. This optimization problem, which also arises in experimental design and sensor placement, involves finding the largest principal minor of a positive semidefinite matrix. Because the objective is log-submodular, greedy algorithms have been used in the past with some empirical success; however, these methods only give approximation guarantees in the special case of monotone objectives, which correspond to a restricted class of DPPs. In this paper we propose a new algorithm for approximating the MAP problem based on continuous techniques for submodular function maximization. Our method involves a novel continuous relaxation of the log-probability function, which, in contrast to the multilinear extension used for general submodular functions, can be evaluated and differentiated exactly and efficiently. We obtain a practical algorithm with a 1/4-approximation guarantee for a more general class of non-monotone DPPs; our algorithm also extends to MAP inference under complex polytope constraints, making it possible to combine DPPs with Markov random fields, weighted matchings, and other models. We demonstrate that our approach outperforms standard and recent methods on both synthetic and real-world data. 1
6 0.92486167 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models
7 0.92416233 319 nips-2012-Sparse Prediction with the $k$-Support Norm
8 0.92153513 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance
9 0.92082989 203 nips-2012-Locating Changes in Highly Dependent Data with Unknown Number of Change Points
10 0.92081153 110 nips-2012-Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems
11 0.92040694 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes
12 0.9197669 148 nips-2012-Hamming Distance Metric Learning
13 0.91905117 292 nips-2012-Regularized Off-Policy TD-Learning
14 0.91878748 34 nips-2012-Active Learning of Multi-Index Function Models
15 0.91754985 163 nips-2012-Isotropic Hashing
16 0.91736835 361 nips-2012-Volume Regularization for Binary Classification
17 0.91684973 65 nips-2012-Cardinality Restricted Boltzmann Machines
18 0.91638684 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models
19 0.91611338 349 nips-2012-Training sparse natural image models with a fast Gibbs sampler of an extended state space
20 0.91583353 220 nips-2012-Monte Carlo Methods for Maximum Margin Supervised Topic Models