nips nips2012 nips2012-139 knowledge-graph by maker-knowledge-mining

139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance


Source: pdf

Author: Arnak Dalalyan, Yin Chen

Abstract: In this paper, we develop a novel approach to the problem of learning sparse representations in the context of fused sparsity and unknown noise level. We propose an algorithm, termed Scaled Fused Dantzig Selector (SFDS), that accomplishes the aforementioned learning task by means of a second-order cone program. A special emphasis is put on the particular instance of fused sparsity corresponding to learning in the presence of outliers. We establish finite sample risk bounds and carry out an experimental evaluation on both synthetic and real data. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Fused sparsity and robust estimation for linear models with unknown variance Arnak S. [sent-1, score-0.339]

2 In this paper, we develop a novel approach to the problem of learning sparse representations in the context of fused sparsity and unknown noise level. [sent-7, score-0.655]

3 We propose an algorithm, termed Scaled Fused Dantzig Selector (SFDS), that accomplishes the aforementioned learning task by means of a second-order cone program. [sent-8, score-0.152]

4 A special emphasis is put on the particular instance of fused sparsity corresponding to learning in the presence of outliers. [sent-9, score-0.546]

5 We establish finite sample risk bounds and carry out an experimental evaluation on both synthetic and real data. [sent-10, score-0.111]

6 Even if the ambient dimensionality p of β ∗ is larger than n, it has proven possible to consistently estimate this vector under the sparsity assumption. [sent-12, score-0.116]

7 The most famous methods of estimating sparse vectors, the Lasso and the Dantzig Selector (DS), rely on a convex relaxation of the ℓ0-norm penalty, leading to a convex program that involves the ℓ1-norm of β. [sent-14, score-0.153]

8 The performance of these algorithms depends heavily on the choice of the tuning parameter λ. [sent-16, score-0.079]

9 Several empirical and theoretical studies emphasized that λ should be chosen proportionally to the noise standard deviation σ∗. [sent-17, score-0.136]

10 scaled Lasso [24]) and the ℓ1-penalized log-likelihood minimization [20]. [sent-24, score-0.201]
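
For illustration, one σ∗-free remedy that also appears later in the experiments is the square-root Lasso (SqRL), which replaces the residual sum of squares by its square root so that the recommended λ no longer depends on the noise level. A minimal cvxpy sketch follows; the simulated data, the constant 1.1 and the support threshold are assumptions made for this example, not taken from the paper.

```python
# Square-root Lasso: minimize ||Y - X beta||_2 + lam * ||beta||_1.
# Because the first term is not squared, the usual choice of lam is
# pivotal, i.e., it does not involve the unknown sigma*.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 200
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 2.0
Y = X @ beta_true + 0.5 * rng.standard_normal(n)

beta = cp.Variable(p)
lam = 1.1 * np.sqrt(2 * np.log(p))        # pivotal tuning level (illustrative choice)
problem = cp.Problem(cp.Minimize(cp.norm(Y - X @ beta, 2) + lam * cp.norm1(beta)))
problem.solve()
print("estimated support:", np.where(np.abs(beta.value) > 1e-4)[0])
```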

11 In the present work, we are interested in the setting where β ∗ is not necessarily sparse, but for a known q × p matrix M, the vector Mβ ∗ is sparse. [sent-25, score-0.079]

12 For general matrices M, tight risk bounds were proved in [14]. [sent-30, score-0.154]

13 We adopt here this framework of general M and aim at designing a computationally efficient procedure capable of handling the situation of unknown noise level, for which we are able to provide theoretical guarantees along with empirical evidence for its good performance. [sent-31, score-0.21]

14 This goal is attained by introducing a new procedure, termed Scaled Fused Dantzig Selector (SFDS), which is closely related to the penalized maximum likelihood estimator but has some advantages in terms of computational complexity. [sent-32, score-0.227]

15 We establish tight risk bounds for the SFDS, which are nearly as strong as those proved for the Lasso and the Dantzig selector in the case of known σ ∗ . [sent-33, score-0.341]

16 We also show that the robust estimation in linear models can be seen as a particular example of the fused sparsity scenario. [sent-34, score-0.685]

17 2 Estimation under fused sparsity with unknown level of noise. [sent-36, score-0.654]

18 2.1 Scaled Fused Dantzig Selector. We will only consider the case rank(M) = q ≤ p, which is more relevant for the applications we have in mind (image denoising and robust estimation). [sent-37, score-0.093]

19 Under this condition, one can find a (p − q) × p matrix N such that the augmented p × p matrix $\bar{M} = [M^\top\; N^\top]^\top$ is of full rank. [sent-38, score-0.088]

20 Let us denote by mj the jth column of the matrix $\bar{M}^{-1}$, so that $\bar{M}^{-1} = [m_1, \ldots, m_p]$. [sent-39, score-0.107]

21 Given two positive tuning parameters λ and µ, we define the Scaled Fused Dantzig Selector (SFDS) $(\hat{\beta}, \hat{\sigma})$ as a solution to the following optimization problem: minimize $\sum_{j=1}^{q} \|Xm_j\|_2\,|(M\beta)_j|$ over (β, σ), subject to $|m_j^\top X^\top(X\beta - Y)| \le \lambda\sigma\|Xm_j\|_2$ for every j ≤ q, $(N^\dagger)^\top X^\top(X\beta - Y) = 0$, and $n\mu\sigma^2 + Y^\top X\beta \le \|Y\|_2^2$. [sent-50, score-0.126]

22 2.2 Relation with the penalized maximum likelihood estimator. One natural way to approach the problem of estimating β∗ in our setup is to rely on the standard procedure of penalized log-likelihood minimization. [sent-54, score-0.348]

23 When p/n is not small, the maximum likelihood estimator is subject to overfitting and is of very poor quality. [sent-58, score-0.145]

24 If it is plausible to expect that the data can be fitted sufficiently well by a vector β ∗ such that for some matrix M, only a small fraction of elements of Mβ ∗ are nonzero, then one can considerably improve the quality of estimation by adding a penalty term to the log-likelihood. [sent-59, score-0.165]

25 This corresponds to defining the estimator $(\hat{\beta}^{\mathrm{PL}}, \hat{\sigma}^{\mathrm{PL}})$ as the minimizer of the penalized log-likelihood $\bar{\ell}(Y, X; \beta, \sigma) = n\log(\sigma) + \frac{\|Y - X\beta\|_2^2}{2\sigma^2} + \sum_{j=1}^{q} \bar{\omega}_j\,|(M\beta)_j|$. [sent-62, score-0.19]

26 This leads to the estimator $(\hat{\beta}^{\mathrm{PL}}, \hat{\sigma}^{\mathrm{PL}}) = \arg\min_{\beta,\sigma}\big\{ n\log(\sigma) + \frac{\|Y - X\beta\|_2^2}{2\sigma^2} + \sum_{j=1}^{q} \bar{\omega}_j\,|(M\beta)_j| \big\}$. [sent-64, score-0.098]

27 Our goal is to propose a procedure that is close in spirit to the penalized maximum likelihood but has the additional property of being computable by standard algorithms of second-order cone programming. [sent-67, score-0.269]

28 To achieve this goal, as a first step, we remark that it can be useful to introduce a penalty term that depends exclusively on σ and that prevents the estimator of σ∗ from being too large or too small. [sent-68, score-0.18]

29 One can show that the only function (up to a multiplicative constant) that can serve as a penalty without breaking the property of scale equivariance is the logarithmic function. [sent-69, score-0.186]

30 Therefore, we introduce an additional tuning parameter µ > 0 and minimize the criterion $n\mu\log(\sigma) + \frac{\|Y - X\beta\|_2^2}{2\sigma^2} + \sum_{j=1}^{q} \bar{\omega}_j\,\frac{|(M\beta)_j|}{\sigma}$ (2). [sent-70, score-0.079]
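
As a concrete reading of the criterion (2), here is a minimal numpy sketch; the function name, the weights `omega` and the default value of `mu` are choices made for this illustration, not code from the paper.

```python
# Criterion (2): n*mu*log(sigma) + ||Y - X beta||_2^2 / (2 sigma^2)
#                + sum_j omega_j * |(M beta)_j| / sigma.
import numpy as np

def sfds_criterion(beta, sigma, Y, X, M, omega, mu=1.0):
    n = Y.shape[0]
    resid = Y - X @ beta
    data_fit = resid @ resid / (2.0 * sigma ** 2)
    penalty = np.sum(omega * np.abs(M @ beta)) / sigma
    return n * mu * np.log(sigma) + data_fit + penalty
```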

31 If we make the change of variables φ1 = Mβ/σ, φ2 = Nβ/σ and ρ = 1/σ, we get a convex function for which the first-order conditions [20] take the form $m_j^\top X^\top(Y - X\beta) \in \bar{\omega}_j\,\mathrm{sign}\big((M\beta)_j\big)$ (3), $(N^\dagger)^\top X^\top(Y - X\beta) = 0$ (4), and $\frac{1}{n\mu}\big(\|Y\|_2^2 - Y^\top X\beta\big) = \sigma^2$ (5). [sent-71, score-0.137]

32 Therefore, to simplify the optimization problem, we propose to replace the minimization of (2) by the minimization of the weighted ℓ1-norm $\sum_j \omega_j |(M\beta)_j|$ subject to some constraints that are as close as possible to (3)–(5). [sent-73, score-0.111]

33 As we explain below, the particular choice of the weights ωj is dictated by the desire to enforce the scale equivariance of the procedure. [sent-76, score-0.145]

34 Indeed, one easily checks that if (β, σ) is a solution to (P1) for some inputs X, Y and M, then α(β, σ) will be a solution to (P1) for the inputs X, αY and M, whatever the value of α ∈ R is. [sent-79, score-0.121]

35 This is the equivariance with respect to the scale change in the response Y . [sent-80, score-0.145]

36 Our method is also equivariant with respect to the scale change in M. [sent-81, score-0.097]

37 More precisely, if (β, σ) is a solution to (P1) for some inputs X, Y and M, then (β, σ) will be a solution to (P1) for the inputs X, Y and DM, whatever the q × q diagonal matrix D is. [sent-82, score-0.2]

38 The latter property is important since if we believe that for a given matrix M the vector Mβ ∗ is sparse, then this is also the case for the vector DMβ ∗ , for any diagonal matrix D. [sent-83, score-0.123]

39 Having a procedure whose output is independent of the choice of D is of significant practical importance, since it leads to a solution that is robust with respect to small variations of the problem formulation. [sent-84, score-0.125]

40 The second attractive feature of the SFDS is that it can be computed by solving a second-order cone program (SOCP). [sent-85, score-0.154]

41 Note that all these constraints can be transformed into linear inequalities, except the last one, which is a second-order cone constraint. [sent-94, score-0.115]
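
The following cvxpy sketch shows how problem (P1) could be handed to a generic conic solver (the paper relies on an off-the-shelf SOCP solver such as SeDuMi); the constraint forms reproduce the reconstruction given above, and the handling of N through its pseudo-inverse is an assumption that should be checked against the paper.

```python
# Hedged sketch of the SFDS as a conic program.
import cvxpy as cp
import numpy as np

def sfds(X, Y, M, N, lam, mu):
    """Return (beta_hat, sigma_hat) for tuning parameters lam, mu > 0."""
    n, p = X.shape
    q = M.shape[0]
    m = np.linalg.inv(np.vstack([M, N]))[:, :q]   # columns m_1, ..., m_q of Mbar^{-1}
    w = np.linalg.norm(X @ m, axis=0)             # weights ||X m_j||_2

    beta = cp.Variable(p)
    sigma = cp.Variable(nonneg=True)
    resid = X @ beta - Y
    constraints = [
        cp.abs((X @ m).T @ resid) <= lam * sigma * w,   # q Dantzig-type constraints
        np.linalg.pinv(N).T @ X.T @ resid == 0,         # assumed form of the N-constraint
        n * mu * cp.square(sigma) + Y @ X @ beta <= Y @ Y,
    ]
    problem = cp.Problem(cp.Minimize(w @ cp.abs(M @ beta)), constraints)
    problem.solve()
    return beta.value, sigma.value
```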

42 2.4 Finite sample risk bound. To provide theoretical guarantees for our estimator, we impose the by now usual assumption of restricted eigenvalues on a suitably chosen matrix. [sent-97, score-0.184]

43 We say that an n × q matrix A satisfies the restricted eigenvalue condition RE(s, 1) if $\kappa(s, 1) \triangleq \min_{|J| \le s}\ \min_{\|\delta_{J^c}\|_1 \le \|\delta_J\|_1} \frac{\|A\delta\|_2}{\sqrt{n}\,\|\delta_J\|_2} > 0$. [sent-102, score-0.18]

44 We say that A satisfies the strong restricted eigenvalue condition RE(s, s, 1) if $\kappa(s, s, 1) \triangleq \min_{|J| \le s}\ \min_{\|\delta_{J^c}\|_1 \le \|\delta_J\|_1} \frac{\|A\delta\|_2}{\sqrt{n}\,\|\delta_{J \cup J_0}\|_2} > 0$, where $J_0$ is the subset of $\{1, \ldots, q\}$ corresponding to the s largest (in absolute value) entries of δ outside J. [sent-103, score-0.136]
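
For s = 1 the quantity κ(s, 1) can be computed exactly by solving one small convex program per candidate index, which gives a concrete feel for the condition; the sketch below uses cvxpy, and the function name and the random test matrix are choices made for this illustration (for s > 1 the inner minimization is no longer a single convex problem).

```python
# Compute kappa(1, 1) for an n x q matrix A: for J = {j} one may fix delta_j = 1
# and minimize ||A delta||_2 over ||delta_{J^c}||_1 <= 1, which is convex.
import cvxpy as cp
import numpy as np

def kappa_s1(A):
    n, q = A.shape
    best = np.inf
    for j in range(q):
        E = np.delete(np.eye(q), j, axis=1)   # embeds R^{q-1} into the coordinates != j
        e_j = np.eye(q)[:, j]
        v = cp.Variable(q - 1)
        delta = e_j + E @ v
        prob = cp.Problem(cp.Minimize(cp.norm(A @ delta, 2)), [cp.norm1(v) <= 1])
        prob.solve()
        best = min(best, prob.value / np.sqrt(n))
    return best

rng = np.random.default_rng(0)
print("kappa(1,1) for a 200 x 20 Gaussian matrix:", kappa_s1(rng.standard_normal((200, 20))))
```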

45 If the vector Mβ∗ is s-sparse and the matrix (In − Π)XM† satisfies the condition RE(s, 1) with some κ > 0, then, with probability at least 1 − 6δ, the risk bounds (7) and (8) hold, with right-hand sides of the form $\frac{4(\hat{\sigma} + \sigma^*)s}{\kappa^2}\sqrt{\frac{2\gamma\log(q/\delta)}{n}} + \frac{\sigma^*}{\kappa}\sqrt{\frac{2s\log(1/\delta)}{n}}$ for (7), and, for (8), a combination of the terms $\sqrt{2\gamma s\log(q/\delta)}$, $\sigma^*\sqrt{8\log(1/\delta)}$ and a remainder r. [sent-116, score-0.102]

46 Before looking at the consequences of these risk bounds in the particular case of robust estimation, let us present some comments highlighting the claims of Theorem 2.1. [sent-119, score-0.248]

47 The first comment is about the conditions on the tuning parameters µ and γ. [sent-121, score-0.114]

48 According to (8), the rate of estimation measured in the mean prediction loss $\frac{1}{n}\|X(\hat{\beta} - \beta^*)\|_2^2$ is of the order of s log(q)/n, which is known as the fast or parametric rate. [sent-127, score-0.08]

49 To the best of our knowledge, this is the first work where this kind of fast rate is derived in the context of fused sparsity with an unknown noise level. [sent-129, score-0.627]

50 With some extra work, one can check that if, for instance, γ = 1 and $|\mu - 1| \le cn^{-1/2}$ for some constant c, then the estimator $\hat{\sigma}$ also has a risk of the order of $sn^{-1/2}$. [sent-130, score-0.174]

51 However, the price to pay for being adaptive with respect to the noise level is the presence of $\|M\beta^*\|_1$ in the bound on $\hat{\sigma}$, which deteriorates the quality of estimation in the case of a large signal-to-noise ratio. [sent-131, score-0.206]

52 Although Theorem 2.1 requires the noise distribution to be Gaussian, the proposed algorithm remains valid in a far broader context, and tight risk bounds can be obtained under more general conditions on the noise distribution. [sent-133, score-0.374]

53 More precisely, the cornerstone of the proof of risk bounds for the Dantzig selector [4, 3, 9] is that the true parameter β ∗ is a feasible solution. [sent-138, score-0.298]

54 Our proposal is then to specify another vector β that simultaneously satisfies the following three conditions: Mβ has the same sparsity pattern as Mβ ∗ , β is close to β ∗ and lies in the feasible set. [sent-140, score-0.116]

55 A last remark is about the restricted eigenvalue conditions. [sent-141, score-0.119]

56 They are somewhat cumbersome in this abstract setting, but simplify a lot when the concrete example of robust estimation is considered, cf. [sent-142, score-0.173]

57 Unfortunately, this condition fails for the matrices appearing in the problem of multiple change-point detection, which is an important particular instance of fused sparsity. [sent-145, score-0.454]

58 The extension of this kind of argument to the case of unknown σ∗ is an open problem we intend to tackle in the near future. [sent-147, score-0.128]

59 3 Application to robust estimation. This methodology can be applied in the context of robust estimation, i.e., estimation in the presence of outliers. [sent-148, score-0.33]

60 The setting we are interested in is the one frequently encountered in computer vision [13, 25]: the dimensionality k of θ ∗ is small as compared to n but the presence of outliers causes the complete failure of the least squares estimator. [sent-155, score-0.248]

61 Thus, we have rewritten the problem of robust estimation in linear models as a problem of estimation in high dimension under the fused sparsity scenario. [sent-160, score-0.765]

62 Indeed, we have $X \in \mathbb{R}^{n \times (n+k)}$ and $\beta^* \in \mathbb{R}^{n+k}$, and we are interested in finding an estimator $\hat{\beta}$ of $\beta^*$ for which $\hat{\omega} = [I_n\ 0_{n \times k}]\hat{\beta}$ contains as many zeros as possible. [sent-161, score-0.133]

63 This means that we expect that the number of outliers is significantly smaller than the sample size. [sent-162, score-0.118]

64 We are thus in the setting of fused sparsity with M = [In 0n×k ]. [sent-163, score-0.512]
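
The reformulation can be mimicked with generic convex tools; the sketch below builds the augmented design X = [In A] and penalizes only Mβ, using a plain ℓ1 penalty as a stand-in for the SFDS (which additionally estimates σ); the simulated data, the penalty level and the detection threshold are assumptions made for this illustration.

```python
# Robust regression recast as fused sparsity: Y ~ A theta + omega with sparse omega.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, k, n_out = 100, 3, 10
A = rng.standard_normal((n, k))
theta_true = np.array([1.0, -2.0, 0.5])
Y = A @ theta_true + 0.1 * rng.standard_normal(n)
Y[:n_out] += 5.0 * rng.standard_normal(n_out)      # gross outliers on the first rows

X_aug = np.hstack([np.eye(n), A])                  # X in R^{n x (n+k)}
M = np.hstack([np.eye(n), np.zeros((n, k))])       # M beta = omega, the outlier block

beta = cp.Variable(n + k)
lam = 0.5                                          # illustrative penalty level
objective = cp.Minimize(0.5 * cp.sum_squares(Y - X_aug @ beta) + lam * cp.norm1(M @ beta))
cp.Problem(objective).solve()
omega_hat = M @ beta.value
print("flagged outliers:", np.where(np.abs(omega_hat) > 1e-3)[0])
```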

65 If the vector ω∗ is s-sparse and the matrix $\sqrt{n}(I_n - \Pi)$ satisfies the condition RE(s, 1) with some κ > 0, then, with probability at least 1 − 5δ, the bound (11) holds with right-hand side $\frac{4(\hat{\sigma} + \sigma^*)s}{\kappa^2}\sqrt{\frac{2\gamma\log(n/\delta)}{n}} + \frac{\sigma^*}{\kappa}\sqrt{\frac{2s\log(1/\delta)}{n}}$, together with a bound (12) on $\|(I_n - \Pi)(\hat{\omega} - \omega^*)\|_2$ involving the terms $2(\hat{\sigma} + \sigma^*)\sqrt{2s\log(n/\delta)}$ and $\sigma^*\sqrt{2\log(1/\delta)}$. [sent-173, score-0.102]

66 The comments made after Theorem 2.1, especially those concerning the tuning parameters and the rates of convergence, hold true for the risk bounds in Theorem 3.1. [sent-176, score-0.19]

67 Furthermore, the restricted eigenvalue condition in the latter theorem is much simpler and deserves special attention. [sent-178, score-0.136]

68 While the left-hand side of this inequality tends to $\mathbb{E}[|\delta_1|] > 0$, the right-hand side is upper-bounded by $\sqrt{\frac{2s}{n}}\max_i|\delta_i|$, which is of the order of $\sqrt{\frac{2s\log n}{n}}$. [sent-207, score-0.087]

69 Therefore, if $\sqrt{\frac{2s\log n}{n}}$ is small, the condition RE(s, 1) is satisfied. [sent-208, score-0.145]

70 They represent respectively the accuracy in estimating the regression vector and the level of noise. [sent-310, score-0.105]

71 4 Experiments. For the empirical evaluation we use a synthetic dataset with a randomly drawn Gaussian design matrix X and the real-world dataset fountain-P11, on which we apply our methodology for computing the fundamental matrices between consecutive images. [sent-314, score-0.115]

72 Once Y and X are available, we computed three estimators of the parameters using the standard sparsity penalization (in order to be able to compare our approach to the others): the SFDS, the Lasso and the square-root Lasso (SqRL). [sent-320, score-0.116]

73 Note that the latter is not really an estimator but rather an oracle since it exploits the knowledge of the true σ ∗ . [sent-322, score-0.143]

74 It consisted of computing least squares estimators after removing all the covariates corresponding to vanishing coefficients of the estimator of β∗. [sent-325, score-0.128]
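
A minimal sketch of this refitting step (the function name, the threshold and the use of numpy's least squares routine are choices made for the illustration):

```python
# Refit by ordinary least squares on the estimated support of beta_hat.
import numpy as np

def refit_on_support(X, Y, beta_hat, tol=1e-8):
    support = np.abs(beta_hat) > tol               # covariates that are kept
    beta_refit = np.zeros_like(beta_hat)
    if support.any():
        coef, *_ = np.linalg.lstsq(X[:, support], Y, rcond=None)
        beta_refit[support] = coef
    return beta_refit
```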

75 We stress, however, that the SFDS is designed to be applied in, and has theoretical guarantees for, the broader setting of fused sparsity. [sent-327, score-0.463]

76 4.2 Robust estimation of the fundamental matrix. To provide a qualitative evaluation of the proposed methodology on real data, we applied the SRDS to the problem of fundamental matrix estimation in multiple-view geometry. (Footnote 3: available at http://cvlab.)

77 There is a clear separation between outliers and inliers. [sent-358, score-0.118]

78 Top right: the first pair of images and the matches classified as wrong by SRDS. [sent-359, score-0.159]

79 In short, if we have two images I and I′ representing the same 3D scene, then there is a 3 × 3 matrix F, called the fundamental matrix, such that a point x = (x, y) in I matches with the point x′ = (x′, y′) in I′ only if $[x;\, y;\, 1]^\top F\, [x';\, y';\, 1] = 0$. [sent-362, score-0.199]

80 Thus, each pair x ↔ x′ of matching points in images I and I′ yields a linear constraint on the eight remaining coefficients of F. [sent-364, score-0.101]

81 Because of quantization and the presence of noise in images, these linear relations are satisfied up to some error. [sent-365, score-0.095]

82 Thus, estimation of F from a family of matching points {xi ↔ x′i ; i = 1, . . . , n} can be cast as a linear regression problem. [sent-366, score-0.115]
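
To make the linear structure explicit, here is a sketch of how each match contributes one row of a design matrix for the entries of F; the row-major flattening and the convention of fixing F33 = 1 are choices made for this illustration, not taken from the paper.

```python
# The relation [x; y; 1]^T F [x'; y'; 1] = 0 is linear in the entries of F.
# With F flattened row-major, the match (x, y) <-> (x', y') contributes the row
# [x*x', x*y', x, y*x', y*y', y, x', y', 1].
import numpy as np

def epipolar_design(pts, pts_prime):
    rows = [[x * xp, x * yp, x, y * xp, y * yp, y, xp, yp, 1.0]
            for (x, y), (xp, yp) in zip(pts, pts_prime)]
    return np.asarray(rows)

# Fixing one entry (say F_33 = 1) and moving its column to the right-hand side
# turns the noisy relations into a regression on the eight remaining coefficients,
# to which the robust procedure can then be applied.
```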

83 Typically, matches are computed by comparing local descriptors (such as SIFT [16]) and, for images of reasonable resolution, hundreds of matching points are found. [sent-370, score-0.151]

84 The computation of the fundamental matrix would not be a problem in this context of large sample size / low dimension, if the matching algorithms were perfectly correct. [sent-371, score-0.15]

85 However, due to noise, repetitive structures and other factors, a non-negligible fraction of detected matches are wrong (outliers). [sent-372, score-0.155]

86 Elimination of these outliers and robust estimation of F are crucial steps for performing 3D reconstruction. [sent-373, score-0.291]

87 Here, we apply the SRDS to the problem of estimation of F for 10 pairs of consecutive images provided by the fountain dataset [21]: the 11 images are shown at the bottom of Fig. [sent-374, score-0.305]

88 000 point matches in most pairs of images among the 10 pairs we are considering. [sent-377, score-0.116]

89 The number of outliers and the estimated noise-level for each pair of images are reported in Table 2. [sent-379, score-0.184]

90 They are all indeed wrong correspondences, even those which correspond to the windows (this is due to the repetitive structure of the window). [sent-382, score-0.105]

91 5 Conclusion and perspectives. We have presented a new procedure, the SFDS, for the problem of learning linear models with unknown noise level under the fused sparsity scenario. [sent-383, score-0.654]

92 We showed that this procedure is inspired by the penalized maximum likelihood but has the advantage of being computable by solving a second-order cone program. [sent-384, score-0.269]

93 We established tight, nonasymptotic theoretical guarantees for the SFDS, with special attention paid to robust estimation in linear models. [sent-385, score-0.209]

94 In the future, we intend to generalize the theoretical study of the performance of the SFDS to the case of non-Gaussian errors ξi , as well as to investigate its power in variable selection. [sent-387, score-0.081]

95 Templates for convex cone problems with applications to sparse signal recovery. [sent-390, score-0.154]

96 The Dantzig selector: statistical estimation when p is much larger than n. [sent-408, score-0.08]

97 L1-penalized robust estimation for a class of inverse problems arising in multiview geometry. [sent-430, score-0.093]

98 Robust estimation for an inverse problem arising in multiview geometry. [sent-434, score-0.119]

99 Robust regression through the Huber’s criterion and adaptive lasso penalty. [sent-481, score-0.213]

100 On the conditions used to prove oracle results for the Lasso. [sent-576, score-0.08]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('sfds', 0.466), ('fused', 0.396), ('dantzig', 0.208), ('selector', 0.187), ('lasso', 0.173), ('ave', 0.155), ('xmj', 0.124), ('outliers', 0.118), ('sparsity', 0.116), ('cone', 0.115), ('std', 0.113), ('equivariance', 0.11), ('estimator', 0.098), ('arnak', 0.093), ('dalalyan', 0.093), ('fountain', 0.093), ('sqrl', 0.093), ('srds', 0.093), ('robust', 0.093), ('penalized', 0.092), ('emmanuel', 0.09), ('log', 0.087), ('sedumi', 0.082), ('estimation', 0.08), ('tuning', 0.079), ('scaled', 0.077), ('risk', 0.076), ('socp', 0.071), ('rn', 0.07), ('images', 0.066), ('pl', 0.065), ('mj', 0.063), ('nonzero', 0.063), ('equivariant', 0.062), ('renaud', 0.062), ('repetitive', 0.062), ('noise', 0.061), ('xm', 0.058), ('re', 0.058), ('condition', 0.058), ('ds', 0.055), ('mq', 0.051), ('whatever', 0.051), ('indexes', 0.05), ('unknown', 0.05), ('matches', 0.05), ('im', 0.049), ('nn', 0.048), ('sara', 0.048), ('subject', 0.047), ('intend', 0.045), ('cand', 0.045), ('oracle', 0.045), ('comments', 0.044), ('matrix', 0.044), ('satis', 0.043), ('projector', 0.043), ('harchaoui', 0.043), ('tight', 0.043), ('wrong', 0.043), ('mp', 0.042), ('penalty', 0.041), ('remark', 0.041), ('eigenvalue', 0.04), ('regression', 0.04), ('multiview', 0.039), ('proportionally', 0.039), ('convex', 0.039), ('fundamental', 0.039), ('restricted', 0.038), ('rp', 0.038), ('termed', 0.037), ('theoretical', 0.036), ('tolerance', 0.036), ('conic', 0.036), ('matching', 0.035), ('bounds', 0.035), ('conditions', 0.035), ('interested', 0.035), ('dm', 0.035), ('alexandre', 0.035), ('inputs', 0.035), ('diagonal', 0.035), ('scale', 0.035), ('nicolas', 0.034), ('peter', 0.034), ('suitably', 0.034), ('presence', 0.034), ('estimating', 0.034), ('kind', 0.033), ('procedure', 0.032), ('context', 0.032), ('van', 0.032), ('minimization', 0.032), ('methodology', 0.032), ('level', 0.031), ('broader', 0.031), ('failure', 0.031), ('computable', 0.03), ('squares', 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance

Author: Arnak Dalalyan, Yin Chen

Abstract: In this paper, we develop a novel approach to the problem of learning sparse representations in the context of fused sparsity and unknown noise level. We propose an algorithm, termed Scaled Fused Dantzig Selector (SFDS), that accomplishes the aforementioned learning task by means of a second-order cone program. A special emphasis is put on the particular instance of fused sparsity corresponding to learning in the presence of outliers. We establish finite sample risk bounds and carry out an experimental evaluation on both synthetic and real data. 1

2 0.16030319 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

Author: Emile Richard, Stephane Gaiffas, Nicolas Vayatis

Abstract: In the paper, we consider the problem of link prediction in time-evolving graphs. We assume that certain graph features, such as the node degree, follow a vector autoregressive (VAR) model and we propose to use this information to improve the accuracy of prediction. Our strategy involves a joint optimization procedure over the space of adjacency matrices and VAR matrices which takes into account both sparsity and low rank properties of the matrices. Oracle inequalities are derived and illustrate the trade-offs in the choice of smoothing parameters when modeling the joint effect of sparsity and low rank property. The estimate is computed efficiently using proximal methods through a generalized forward-backward algorithm. 1

3 0.14244859 16 nips-2012-A Polynomial-time Form of Robust Regression

Author: Ozlem Aslan, Dale Schuurmans, Yao-liang Yu

Abstract: Despite the variety of robust regression methods that have been developed, current regression formulations are either NP-hard, or allow unbounded response to even a single leverage point. We present a general formulation for robust regression—Variational M-estimation—that unifies a number of robust regression methods while allowing a tractable approximation strategy. We develop an estimator that requires only polynomial-time, while achieving certain robustness and consistency guarantees. An experimental evaluation demonstrates the effectiveness of the new estimation approach compared to standard methods. 1

4 0.1082448 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model

Author: Yi Wu, David P. Wipf

Abstract: Sparse linear (or generalized linear) models combine a standard likelihood function with a sparse prior on the unknown coefficients. These priors can conveniently be expressed as a maximization over zero-mean Gaussians with different variance hyperparameters. Standard MAP estimation (Type I) involves maximizing over both the hyperparameters and coefficients, while an empirical Bayesian alternative (Type II) first marginalizes the coefficients and then maximizes over the hyperparameters, leading to a tractable posterior approximation. The underlying cost functions can be related via a dual-space framework from [22], which allows both the Type I or Type II objectives to be expressed in either coefficient or hyperparmeter space. This perspective is useful because some analyses or extensions are more conducive to development in one space or the other. Herein we consider the estimation of a trade-off parameter balancing sparsity and data fit. As this parameter is effectively a variance, natural estimators exist by assessing the problem in hyperparameter (variance) space, transitioning natural ideas from Type II to solve what is much less intuitive for Type I. In contrast, for analyses of update rules and sparsity properties of local and global solutions, as well as extensions to more general likelihood models, we can leverage coefficient-space techniques developed for Type I and apply them to Type II. For example, this allows us to prove that Type II-inspired techniques can be successful recovering sparse coefficients when unfavorable restricted isometry properties (RIP) lead to failure of popular ℓ1 reconstructions. It also facilitates the analysis of Type II when non-Gaussian likelihood models lead to intractable integrations. 1

5 0.10148942 247 nips-2012-Nonparametric Reduced Rank Regression

Author: Rina Foygel, Michael Horrell, Mathias Drton, John D. Lafferty

Abstract: We propose an approach to multivariate nonparametric regression that generalizes reduced rank regression for linear models. An additive model is estimated for each dimension of a q-dimensional response, with a shared p-dimensional predictor variable. To control the complexity of the model, we employ a functional form of the Ky-Fan or nuclear norm, resulting in a set of function estimates that have low rank. Backfitting algorithms are derived and justified using a nonparametric form of the nuclear norm subdifferential. Oracle inequalities on excess risk are derived that exhibit the scaling behavior of the procedure in the high dimensional setting. The methods are illustrated on gene expression data. 1

6 0.097844429 34 nips-2012-Active Learning of Multi-Index Function Models

7 0.097444095 164 nips-2012-Iterative Thresholding Algorithm for Sparse Inverse Covariance Estimation

8 0.088933319 290 nips-2012-Recovery of Sparse Probability Measures via Convex Programming

9 0.088889778 277 nips-2012-Probabilistic Low-Rank Subspace Clustering

10 0.088070013 312 nips-2012-Simultaneously Leveraging Output and Task Structures for Multiple-Output Regression

11 0.08601445 325 nips-2012-Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions

12 0.083945014 319 nips-2012-Sparse Prediction with the $k$-Support Norm

13 0.079124011 254 nips-2012-On the Sample Complexity of Robust PCA

14 0.077322975 117 nips-2012-Ensemble weighted kernel estimators for multivariate entropy estimation

15 0.075871557 64 nips-2012-Calibrated Elastic Regularization in Matrix Completion

16 0.075801767 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses

17 0.075746521 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

18 0.072010532 244 nips-2012-Nonconvex Penalization Using Laplace Exponents and Concave Conjugates

19 0.070360348 221 nips-2012-Multi-Stage Multi-Task Feature Learning

20 0.070084631 27 nips-2012-A quasi-Newton proximal splitting method


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.223), (1, 0.044), (2, 0.072), (3, -0.079), (4, 0.064), (5, 0.098), (6, -0.003), (7, 0.009), (8, -0.049), (9, -0.047), (10, 0.033), (11, 0.0), (12, -0.054), (13, -0.038), (14, 0.022), (15, 0.022), (16, 0.11), (17, -0.02), (18, -0.015), (19, -0.069), (20, 0.021), (21, 0.045), (22, -0.043), (23, 0.082), (24, -0.013), (25, -0.004), (26, 0.044), (27, 0.036), (28, 0.041), (29, -0.035), (30, -0.016), (31, -0.001), (32, -0.036), (33, 0.042), (34, 0.041), (35, 0.094), (36, 0.044), (37, -0.004), (38, 0.111), (39, -0.018), (40, 0.045), (41, 0.011), (42, -0.072), (43, -0.019), (44, -0.008), (45, -0.033), (46, -0.013), (47, -0.034), (48, 0.14), (49, -0.002)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93473345 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance

Author: Arnak Dalalyan, Yin Chen

Abstract: In this paper, we develop a novel approach to the problem of learning sparse representations in the context of fused sparsity and unknown noise level. We propose an algorithm, termed Scaled Fused Dantzig Selector (SFDS), that accomplishes the aforementioned learning task by means of a second-order cone program. A special emphasis is put on the particular instance of fused sparsity corresponding to learning in the presence of outliers. We establish finite sample risk bounds and carry out an experimental evaluation on both synthetic and real data. 1

2 0.73742181 34 nips-2012-Active Learning of Multi-Index Function Models

Author: Tyagi Hemant, Volkan Cevher

Abstract: We consider the problem of actively learning multi-index functions of the form $f(x) = g(Ax) = \sum_{i=1}^{k} g_i(a_i^\top x)$ from point evaluations of f. We assume that the function f is defined on an ℓ2-ball in $\mathbb{R}^d$, g is twice continuously differentiable almost everywhere, and $A \in \mathbb{R}^{k \times d}$ is a rank-k matrix, where k ≪ d. We propose a randomized, active sampling scheme for estimating such functions with uniform approximation guarantees. Our theoretical developments leverage recent techniques from low rank matrix recovery, which enables us to derive an estimator of the function f along with sample complexity bounds. We also characterize the noise robustness of the scheme, and provide empirical evidence that the high-dimensional scaling of our sample complexity bounds are quite accurate. 1

3 0.70003456 254 nips-2012-On the Sample Complexity of Robust PCA

Author: Matthew Coudron, Gilad Lerman

Abstract: We estimate the rate of convergence and sample complexity of a recent robust estimator for a generalized version of the inverse covariance matrix. This estimator is used in a convex algorithm for robust subspace recovery (i.e., robust PCA). Our model assumes a sub-Gaussian underlying distribution and an i.i.d. sample from it. Our main result shows with high probability that the norm of the difference between the generalized inverse covariance of the underlying distribution and its estimator from an i.i.d. sample of size N is of order $O(N^{-0.5+\epsilon})$ for arbitrarily small $\epsilon > 0$ (affecting the probabilistic estimate); this rate of convergence is close to the one of direct covariance estimation, i.e., $O(N^{-0.5})$. Our precise probabilistic estimate implies for some natural settings that the sample complexity of the generalized inverse covariance estimation when using the Frobenius norm is $O(D^{2+\delta})$ for arbitrarily small δ > 0 (whereas the sample complexity of direct covariance estimation with Frobenius norm is $O(D^2)$). These results provide similar rates of convergence and sample complexity for the corresponding robust subspace recovery algorithm. To the best of our knowledge, this is the only work analyzing the sample complexity of any robust PCA algorithm. 1

4 0.69260097 16 nips-2012-A Polynomial-time Form of Robust Regression

Author: Ozlem Aslan, Dale Schuurmans, Yao-liang Yu

Abstract: Despite the variety of robust regression methods that have been developed, current regression formulations are either NP-hard, or allow unbounded response to even a single leverage point. We present a general formulation for robust regression—Variational M-estimation—that unifies a number of robust regression methods while allowing a tractable approximation strategy. We develop an estimator that requires only polynomial-time, while achieving certain robustness and consistency guarantees. An experimental evaluation demonstrates the effectiveness of the new estimation approach compared to standard methods. 1

5 0.68982577 247 nips-2012-Nonparametric Reduced Rank Regression

Author: Rina Foygel, Michael Horrell, Mathias Drton, John D. Lafferty

Abstract: We propose an approach to multivariate nonparametric regression that generalizes reduced rank regression for linear models. An additive model is estimated for each dimension of a q-dimensional response, with a shared p-dimensional predictor variable. To control the complexity of the model, we employ a functional form of the Ky-Fan or nuclear norm, resulting in a set of function estimates that have low rank. Backfitting algorithms are derived and justified using a nonparametric form of the nuclear norm subdifferential. Oracle inequalities on excess risk are derived that exhibit the scaling behavior of the procedure in the high dimensional setting. The methods are illustrated on gene expression data. 1

6 0.6876902 268 nips-2012-Perfect Dimensionality Recovery by Variational Bayesian PCA

7 0.67916942 76 nips-2012-Communication-Efficient Algorithms for Statistical Optimization

8 0.64334452 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

9 0.63391304 319 nips-2012-Sparse Prediction with the $k$-Support Norm

10 0.62055957 320 nips-2012-Spectral Learning of General Weighted Automata via Constrained Matrix Completion

11 0.61223191 290 nips-2012-Recovery of Sparse Probability Measures via Convex Programming

12 0.60680264 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model

13 0.59591997 221 nips-2012-Multi-Stage Multi-Task Feature Learning

14 0.59488785 110 nips-2012-Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems

15 0.5833205 217 nips-2012-Mixability in Statistical Learning

16 0.5805943 29 nips-2012-Accelerated Training for Matrix-norm Regularization: A Boosting Approach

17 0.57845348 164 nips-2012-Iterative Thresholding Algorithm for Sparse Inverse Covariance Estimation

18 0.57726902 86 nips-2012-Convex Multi-view Subspace Learning

19 0.57637417 327 nips-2012-Structured Learning of Gaussian Graphical Models

20 0.5718329 269 nips-2012-Persistent Homology for Learning Densities with Bounded Support


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.036), (1, 0.035), (17, 0.029), (20, 0.113), (21, 0.024), (36, 0.012), (38, 0.159), (42, 0.026), (44, 0.012), (54, 0.03), (55, 0.063), (68, 0.014), (74, 0.062), (76, 0.18), (80, 0.073), (92, 0.045)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.95220804 350 nips-2012-Trajectory-Based Short-Sighted Probabilistic Planning

Author: Felipe Trevizan, Manuela Veloso

Abstract: Probabilistic planning captures the uncertainty of plan execution by probabilistically modeling the effects of actions in the environment, and therefore the probability of reaching different states from a given state and action. In order to compute a solution for a probabilistic planning problem, planners need to manage the uncertainty associated with the different paths from the initial state to a goal state. Several approaches to manage uncertainty were proposed, e.g., consider all paths at once, perform determinization of actions, and sampling. In this paper, we introduce trajectory-based short-sighted Stochastic Shortest Path Problems (SSPs), a novel approach to manage uncertainty for probabilistic planning problems in which states reachable with low probability are substituted by artificial goals that heuristically estimate their cost to reach a goal state. We also extend the theoretical results of Short-Sighted Probabilistic Planner (SSiPP) [1] by proving that SSiPP always finishes and is asymptotically optimal under sufficient conditions on the structure of short-sighted SSPs. We empirically compare SSiPP using trajectory-based short-sighted SSPs with the winners of the previous probabilistic planning competitions and other state-of-the-art planners in the triangle tireworld problems. Trajectory-based SSiPP outperforms all the competitors and is the only planner able to scale up to problem number 60, a problem in which the optimal solution contains approximately $10^{70}$ states. 1

2 0.93325895 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

Author: Will Zou, Shenghuo Zhu, Kai Yu, Andrew Y. Ng

Abstract: We apply salient feature detection and tracking in videos to simulate fixations and smooth pursuit in human vision. With tracked sequences as input, a hierarchical network of modules learns invariant features using a temporal slowness constraint. The network encodes invariance which are increasingly complex with hierarchy. Although learned from videos, our features are spatial instead of spatial-temporal, and well suited for extracting features from still images. We applied our features to four datasets (COIL-100, Caltech 101, STL-10, PubFig), and observe a consistent improvement of 4% to 5% in classification accuracy. With this approach, we achieve state-of-the-art recognition accuracy 61% on STL-10 dataset. 1

same-paper 3 0.9277361 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance

Author: Arnak Dalalyan, Yin Chen

Abstract: In this paper, we develop a novel approach to the problem of learning sparse representations in the context of fused sparsity and unknown noise level. We propose an algorithm, termed Scaled Fused Dantzig Selector (SFDS), that accomplishes the aforementioned learning task by means of a second-order cone program. A special emphasis is put on the particular instance of fused sparsity corresponding to learning in the presence of outliers. We establish finite sample risk bounds and carry out an experimental evaluation on both synthetic and real data. 1

4 0.92346478 301 nips-2012-Scaled Gradients on Grassmann Manifolds for Matrix Completion

Author: Thanh Ngo, Yousef Saad

Abstract: This paper describes gradient methods based on a scaled metric on the Grassmann manifold for low-rank matrix completion. The proposed methods significantly improve canonical gradient methods, especially on ill-conditioned matrices, while maintaining established global convergence and exact recovery guarantees. A connection between a form of subspace iteration for matrix completion and the scaled gradient descent procedure is also established. The proposed conjugate gradient method based on the scaled gradient outperforms several existing algorithms for matrix completion and is competitive with recently proposed methods. 1

5 0.89657396 188 nips-2012-Learning from Distributions via Support Measure Machines

Author: Krikamol Muandet, Kenji Fukumizu, Francesco Dinuzzo, Bernhard Schölkopf

Abstract: This paper presents a kernel-based discriminative learning framework on probability measures. Rather than relying on large collections of vectorial training examples, our framework learns using a collection of probability distributions that have been constructed to meaningfully represent training data. By representing these probability distributions as mean embeddings in the reproducing kernel Hilbert space (RKHS), we are able to apply many standard kernel-based learning techniques in straightforward fashion. To accomplish this, we construct a generalization of the support vector machine (SVM) called a support measure machine (SMM). Our analyses of SMMs provides several insights into their relationship to traditional SVMs. Based on such insights, we propose a flexible SVM (FlexSVM) that places different kernel functions on each training example. Experimental results on both synthetic and real-world data demonstrate the effectiveness of our proposed framework. 1

6 0.89358956 215 nips-2012-Minimizing Uncertainty in Pipelines

7 0.88984287 148 nips-2012-Hamming Distance Metric Learning

8 0.88874692 61 nips-2012-Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence

9 0.88780367 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

10 0.88649189 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

11 0.88571709 319 nips-2012-Sparse Prediction with the $k$-Support Norm

12 0.88534302 34 nips-2012-Active Learning of Multi-Index Function Models

13 0.88394779 69 nips-2012-Clustering Sparse Graphs

14 0.88350689 112 nips-2012-Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model

15 0.88348967 264 nips-2012-Optimal kernel choice for large-scale two-sample tests

16 0.88324296 117 nips-2012-Ensemble weighted kernel estimators for multivariate entropy estimation

17 0.88286179 111 nips-2012-Efficient Sampling for Bipartite Matching Problems

18 0.88244212 95 nips-2012-Density-Difference Estimation

19 0.882101 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses

20 0.88206989 284 nips-2012-Q-MKL: Matrix-induced Regularization in Multi-Kernel Learning with Applications to Neuroimaging