nips nips2012 nips2012-16 knowledge-graph by maker-knowledge-mining

16 nips-2012-A Polynomial-time Form of Robust Regression


Source: pdf

Author: Ozlem Aslan, Dale Schuurmans, Yao-liang Yu

Abstract: Despite the variety of robust regression methods that have been developed, current regression formulations are either NP-hard, or allow unbounded response to even a single leverage point. We present a general formulation for robust regression—Variational M-estimation—that unifies a number of robust regression methods while allowing a tractable approximation strategy. We develop an estimator that requires only polynomial-time, while achieving certain robustness and consistency guarantees. An experimental evaluation demonstrates the effectiveness of the new estimation approach compared to standard methods. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Despite the variety of robust regression methods that have been developed, current regression formulations are either NP-hard, or allow unbounded response to even a single leverage point. [sent-3, score-0.896]

2 We present a general formulation for robust regression—Variational M-estimation—that unifies a number of robust regression methods while allowing a tractable approximation strategy. [sent-4, score-0.615]

3 We develop an estimator that requires only polynomial-time, while achieving certain robustness and consistency guarantees. [sent-5, score-0.522]

4 1 Introduction: It is well known that outliers have a detrimental effect on standard regression estimators. [sent-7, score-0.327]

5 Thus the need for regression estimators that are both scalable and robust is increasing. [sent-10, score-0.494]

6 Although the field of robust regression is well established, it has not considered computational complexity analysis to be one of its central concerns. [sent-11, score-0.359]

7 Surprisingly, there remain no standard regression formulations that guarantee both polynomial run-time and bounded response to even a single outlier. [sent-13, score-0.642]

8 It is important to note that robustness and tractability can be achieved under restricted conditions. [sent-14, score-0.326]

9 For example, if the domain is bounded, then any estimator based on minimizing a convex and Lipschitz-continuous loss achieves high breakdown [4]. [sent-15, score-0.793]

10 Such results have been extended to kernel-based regression under the analogous assumption of a bounded kernel [5, 6]. [sent-16, score-0.37]

11 Unfortunately, these results can no longer hold when the domain or kernel is unbounded: in such a case arbitrary leverage can occur [4, 7] and no (non-constant) convex loss, even Lipschitz-continuous, can ensure robustness against even a single outlier [3]. [sent-17, score-0.728]

12 Unfortunately, the inapplicability of convex losses in this situation means that computational tractability becomes a major challenge, and new computational strategies are required to achieve tractable robust estimators. [sent-19, score-0.667]

13 The main contribution of this paper is to develop a new robust regression strategy that can guarantee both polynomial run-time and bounded response to individual outliers, including leverage points. [sent-20, score-0.901]

14 The first is a general formulation of adaptive M-estimation, Variational M-estimation, that unifies a number of robust regression formulations, including convex and bounded M-estimators with certain subset-selection estimators such as Least Trimmed Loss [7]. [sent-22, score-0.85]

15 By incorporating Tikhonov regularization, these estimators can be extended to reproducing kernel Hilbert spaces (RKHSs). [sent-23, score-0.226]

16 The second development is a convex relaxation scheme that ensures bounded outlier influence on the final estimator. [sent-24, score-0.636]

17 The overall estimation procedure is guaranteed to be tractable, robust to single outliers with unbounded leverage, and consistent under non-trivial conditions. [sent-25, score-0.506]

18 An experimental evaluation of the proposed estimator demonstrates effective performance compared to standard robust estimators. [sent-26, score-0.434]

19 The closest previous works are [8], which formulated variational representations of certain robust losses, and [9], which formulated a convex relaxation of bounded loss minimization. [sent-27, score-0.864]

20 Unfortunately, [8] did not offer a general characterization, while [9] did not prove their final estimator was robust, nor was any form of consistency established. [sent-28, score-0.302]

21 The formulation we present in this paper generalizes [8] while the convex relaxation scheme we propose is simpler and tighter than [9]; we are thus able to establish non-trivial forms of both robustness and consistency while maintaining tractability. [sent-29, score-0.621]

22 Another notion of robustness is algorithmic stability under leave-one-out perturbation [13], which analyzes specific learning procedures rather than describing how a stable algorithm might be generally achieved. [sent-33, score-0.299]

23 If the noise distribution has a known density p(·), then a standard estimator is given by maximum likelihood: $\hat{\theta}_{ML} \in \arg\min_{\theta\in\Theta} \frac{1}{n}\sum_{i=1}^{n} -\log p(y_i - X_{i:}\theta) = \arg\min_{\theta\in\Theta} \frac{1}{n}\sum_{i=1}^{n} -\log p(r_i)$ (2), where $r_i = y_i - X_{i:}\theta$ is the ith residual. [sent-41, score-0.595]
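
As a concrete illustration of (2) viewed as loss minimization over residuals, here is a minimal sketch (illustrative only; the optimizer choice and function names are assumptions, not from the paper):

```python
# Minimal M-estimation sketch for (2): minimize the average loss of the
# residuals r_i = y_i - X_{i:} theta. With rho(r) = r^2/2 this is least squares;
# other elementwise losses rho give other M-estimators.
import numpy as np
from scipy.optimize import minimize

def m_estimate(X, y, rho):
    """Return an approximate argmin_theta (1/n) sum_i rho(y_i - X_{i:} theta)."""
    n, p = X.shape
    objective = lambda theta: np.mean(rho(y - X @ theta))
    return minimize(objective, x0=np.zeros(p), method="L-BFGS-B").x

# Example usage (squared-error loss, i.e. Gaussian negative log-likelihood
# up to constants):
# theta_hat = m_estimate(X, y, rho=lambda r: 0.5 * r ** 2)
```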

24 Such a procedure is known as M-estimation in the robust statistics literature, and empirical risk minimization in the machine learning literature. [sent-43, score-0.4]

25 In particular we will use Tikhonov (“ridge”) regularization by adding a squared penalty: $\hat{\theta}_{MR} \in \arg\min_{\theta\in\Theta} \frac{1}{n}\mathbf{1}^{T}\rho(y - X\theta) + \frac{\lambda}{2}\|\theta\|_{2}^{2}$ for $\lambda \geq 0$ (4). The significance of Tikhonov regularization is that it ensures $\hat{\theta}_{MR} = X^{T}\alpha$ for some $\alpha \in \mathbb{R}^{n}$ [14]. [sent-45, score-0.257]
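
A quick numerical check of this cited fact, sketched for the special case of the squared loss $\rho(r) = r^2/2$ where both the primal estimate and the dual weights have closed forms (illustrative code, not from the paper):

```python
# Ridge regression with the 1/n loss scaling of (4): the primal solution equals
# X^T alpha with alpha = (X X^T + n*lambda*I)^{-1} y (push-through identity).
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 30, 5, 0.5
X, y = rng.standard_normal((n, p)), rng.standard_normal(n)

theta = np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T @ y)  # primal estimate
alpha = np.linalg.solve(X @ X.T + n * lam * np.eye(n), y)        # dual weights
assert np.allclose(theta, X.T @ alpha)
```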

26 More generally, under Tikhonov regularization, the regression problem can be conveniently expressed in a reproducing kernel Hilbert space (RKHS). [sent-46, score-0.285]

27 where each $x_i$ is drawn from some unknown marginal probability measure $P_x$, and $y_i$ are generated according to (5), the task is then to estimate the unknown deterministic function $f^{*} \in \mathcal{H}$. [sent-57, score-0.231]

28 To do so we can express the estimator (4) more generally as $\hat{f}_{MR} \in \arg\min_{f\in\mathcal{H}} \frac{1}{n}\sum_{i=1}^{n} \rho(y_i - f(x_i)) + \frac{\lambda}{2}\|f\|_{\mathcal{H}}^{2}$ (6). [sent-58, score-0.359]

29 By the representer theorem [14], the solution to (6) can be expressed by $\hat{f}_{MR}(x) = \sum_{i=1}^{n} \alpha_i \kappa(x_i, x)$ for some $\alpha \in \mathbb{R}^{n}$, and therefore (6) can be recovered by solving the finite dimensional problem $\hat{\alpha}_{MR} \in \arg\min_{\alpha} \frac{1}{n}\mathbf{1}^{T}\rho(y - K\alpha) + \frac{\lambda}{2}\alpha^{T}K\alpha$ (7) such that $K_{ij} = \kappa(x_i, x_j)$. [sent-59, score-0.259]
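
For the special case of the squared loss, (7) reduces to kernel ridge regression with the closed form $\alpha = (K + n\lambda I)^{-1} y$; a minimal sketch (the RBF kernel and helper names are assumptions, not from the paper):

```python
# Kernel ridge regression as an instance of (7) with rho(r) = r^2/2:
# alpha = (K + n*lambda*I)^{-1} y, and predictions are sum_i alpha_i k(x_i, x).
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_ridge_fit(X, y, lam, gamma=1.0):
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + len(y) * lam * np.eye(len(y)), y)

def kernel_ridge_predict(X_train, X_test, alpha, gamma=1.0):
    return rbf_kernel(X_test, X_train, gamma) @ alpha
```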

30 Our interest is in understanding the tractability, robustness and consistency aspects of such estimators. [sent-60, score-0.34]

31 Consistency: Much is known about the consistency properties of estimators expressed as regularized empirical risk minimizers. [sent-61, score-0.455]

32 The regularized M-estimator in RKHSs (6) is loss consistent under some general assumptions on the kernel, loss and training distribution. [sent-63, score-0.41]

33 For bounded kernel and bounded Lipschitz losses, one can similarly prove the loss consistency of the regularized M-estimator (6) (in RKHS). [sent-65, score-0.753]

34 Generally speaking, any estimator that can be expressed as a regularized empirical loss minimization is consistent under “reasonable” conditions. [sent-68, score-0.582]

35 That is, one can consider regularized loss minimization to be a (generally) sound principle for formulating regression estimators, at least from the perspective of consistency. [sent-69, score-0.499]

36 However, this is no longer the case when we consider robustness and tractability; here sharp distinctions begin to arise within this class of estimators. [sent-70, score-0.22]

37 Robustness: Although robustness is an intuitive notion, it has not been given a unique technical definition in the literature. [sent-71, score-0.22]

38 Although these analyses can be useful, we will focus on finite sample notions of robustness since these are most related to concerns of computational tractability. [sent-76, score-0.22]

39 In particular, we focus on the following definition related to the finite sample breakdown point [18, 19]. [sent-77, score-0.255]

40 Assuming the parameter set Θ is metrizable, an estimator has bounded response if for any finite data sample its output remains in a bounded interior subset of the closed parameter set Θ (or respectively H), no matter how a single observation pair is perturbed. [sent-79, score-0.682]

41 This is a much weaker definition than having a non-zero breakdown point: a breakdown of $\epsilon$ requires that bounded response be guaranteed when any $\epsilon$ fraction of the data is perturbed arbitrarily. [sent-80, score-0.863]

42 Bounded response is obviously a far more modest requirement. [sent-81, score-0.245]

43 However, importantly, the definition of bounded response allows the possibility of arbitrary leverage; that is, no bound is imposed on the magnitude of a perturbed input (i.e., a leverage point). [sent-82, score-0.353]

44 Surprisingly, we find that even such a weak robustness property is difficult to achieve while retaining computational tractability. [sent-85, score-0.267]

45 Computational Dilemma: The goals of robustness and computational tractability raise a dilemma: it is easy to achieve robustness (i.e., bounded response) or tractability in isolation, but difficult to achieve both at once. [sent-86, score-0.546]

46 Since a Tikhonov regularizer is automatically self-concordant, the minimization problems outlined above can all be solved in polynomial time with Newton-type algorithms, provided $\rho(r)$, $\rho'(r)$, and $\rho''(r)$ can all be evaluated in polynomial time for a self-concordant ρ [22, Ch. [sent-102, score-0.227]

47 Standard loss functions, such as squared error or Huber’s loss, satisfy these conditions; hence the corresponding estimators are polynomial-time. [sent-104, score-0.493]
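
For concreteness, one common parameterization of Huber's loss and the first two derivatives a Newton-type solver would evaluate (a sketch; the threshold delta is an assumed free parameter):

```python
import numpy as np

def huber(r, delta=1.0):
    # Quadratic near zero, linear in the tails: convex and Lipschitz-continuous.
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r ** 2, delta * (a - 0.5 * delta))

def huber_grad(r, delta=1.0):
    return np.clip(r, -delta, delta)

def huber_hess(r, delta=1.0):
    return (np.abs(r) <= delta).astype(float)
```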

48 Unfortunately, loss minimization with a (non-constant) convex loss yields unbounded response to even a single outlier [3, Ch. [sent-105, score-1.094]

49 Empirical risk minimization based on a (non-constant) convex loss cannot have bounded response if the domain (or kernel) is unbounded, even under Tikhonov regularization. [sent-109, score-0.83]
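
A toy computation illustrating this claim for the concrete case of Tikhonov-regularized squared loss: as a single response is driven to infinity with its input held fixed, the estimate grows without bound (the constants here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 1))
X[0, 0] = 1.0                                     # fix the corrupted point's input
y = 2.0 * X[:, 0] + 0.1 * rng.standard_normal(50)

lam = 1.0
for y_outlier in [1e2, 1e4, 1e6]:
    y_corrupt = y.copy()
    y_corrupt[0] = y_outlier                      # perturb a single response
    theta = np.linalg.solve(X.T @ X + lam * np.eye(1), X.T @ y_corrupt)
    print(y_outlier, theta[0])                    # grows roughly linearly in y_outlier
```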

50 By contrast, consider the case of a (non-constant) bounded loss function. [sent-111, score-0.358]

51 Bounded loss functions are a common choice in robust regression because they not only ensure bounded response (trivially), they can also ensure a high breakdown point of $(n - p)/(2n)$ [3, Ch. [sent-112, score-0.972]

52 Unfortunately, estimators based on bounded losses are inherently intractable. [sent-114, score-0.442]

53 These difficulties with empirical risk minimization have led the field of robust statistics to develop a variety of alternative estimators [4, Ch. [sent-118, score-0.535]

54 These estimators are known to have high breakdown [7], and obviously demonstrate bounded response to single outliers. [sent-123, score-0.781]

55 The key construction is a variational representation of M-estimation that can express a number of standard robust (and non-robust) methods in a common framework. [sent-126, score-0.265]

56 In particular, consider the following adaptive form of loss function: $\rho(r) = \min_{0\leq\eta\leq 1}\ \eta\,\ell(r) + \psi(\eta)$ (9). [sent-127, score-0.252]

57 Here r is a residual value, $\ell$ is a closed convex base loss, η is an adaptive weight on the base loss, and ψ is a convex auxiliary function. [sent-128, score-0.514]

58 The weight can choose to ignore the base loss if $\ell(r)$ is large, but this is balanced against a prior penalty ψ(η). [sent-129, score-0.235]

59 Different choices of base loss and auxiliary function will yield different results, and one can represent a wide variety of loss functions ρ in this way [8]. [sent-130, score-0.462]

60 For example, any convex loss ρ can be trivially represented in the form (9) by setting $\ell = \rho$ and $\psi(\eta) = \delta_{\{1\}}(\eta)$. [sent-131, score-0.356]
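
Conversely, a worked special case (our own illustration, not quoted from the paper): taking $\psi(\eta) = 1 - \eta$ turns any convex base loss $\ell$ into a clipped, bounded loss, since

$$
\rho(r) \;=\; \min_{0\le\eta\le 1}\ \eta\,\ell(r) + (1-\eta)
\;=\; 1 + \min_{0\le\eta\le 1}\ \eta\,(\ell(r)-1)
\;=\; \min\bigl(\ell(r),\,1\bigr),
$$

with the minimizing weight $\eta = 1$ when $\ell(r) \le 1$ and $\eta = 0$ when $\ell(r) > 1$, so large residuals are effectively ignored.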

61 A bounded function obviously cannot be convex over an unbounded domain unless it is constant. [sent-138, score-0.54]

62 When $n'$ approaches $n/2$, the breakdown of (8) approaches 1/2 [7]. [sent-139, score-0.255]

63 This particular form of regularization has two advantages: (i) it is a smooth function of η on $0 \leq \eta \leq 1$ (since $\|\eta\|_{1} = \mathbf{1}^{T}\eta$ in this case), and (ii) it enables a tight convex approximation strategy, as we will see below. [sent-145, score-0.287]

64 Note that other forms of robust regression can be expressed in a similar framework. [sent-146, score-0.406]

65 Least Trimmed Loss (8) can be expressed in the form (9) provided only that we add a shared constraint over η: $\hat{\theta}_{LTL} \in \arg\min_{\theta\in\Theta}\ \min_{0\leq\eta\leq 1:\ \mathbf{1}^{T}\eta = n'}\ \eta^{T}\ell(r) + \mathbf{1}^{T}\psi(\eta)$ (17), where $\psi(\eta_i) = 1 - \eta_i$ and $n' < n$ specifies the number of terms to consider in the sum of losses. [sent-148, score-0.263]
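
A small sketch of how (17) behaves for fixed θ: under the constraint $\mathbf{1}^{T}\eta = n'$, the optimal weights select the $n'$ smallest per-point losses, so the objective reduces to a trimmed sum (illustrative code; the names are assumptions):

```python
import numpy as np

def trimmed_loss(theta, X, y, base_loss, n_keep):
    # For fixed theta, minimizing over eta in [0,1]^n with 1^T eta = n_keep
    # keeps the n_keep smallest per-point losses and drops the rest.
    losses = base_loss(y - X @ theta)          # base_loss operates elementwise
    return np.sort(losses)[:n_keep].sum()
```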

66 These formulations are all convex in the parameters given the auxiliary weights, and vice versa. [sent-152, score-0.32]

67 4 Computationally Efficient Approximation We present a general approximation strategy for the variational regression estimators above that can guarantee polynomial run-time while ensuring certain robustness and consistency properties. [sent-157, score-0.825]

68 Although the problem is obviously convex in α given η, and vice versa, it is not jointly convex (recall the assumption that $\ell$ and ψ are both convex functions). [sent-164, score-0.666]

69 $\min_{0\leq\eta\leq 1}\ \min_{\alpha}\ \eta^{T}\ell(y - K\alpha) + \mathbf{1}^{T}\psi(\eta) + \frac{\lambda}{2}\|\eta\|_{1}\,\alpha^{T}K\alpha$ (18) $\;=\; \min_{0\leq\eta\leq 1}\ \sup_{\nu}\ \mathbf{1}^{T}\psi(\eta) - \eta^{T}(\ell^{*}(\nu) - \Delta(y)\nu) - \frac{1}{2\lambda}\,\nu^{T}\bigl(K \circ (\eta\eta^{T}\|\eta\|_{1}^{-1})\bigr)\nu$ (19), where the function evaluations are componentwise. [sent-171, score-0.27]
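
A sketch of how (19) can follow from (18), under the assumptions that $\Delta(\cdot)$ denotes the corresponding diagonal matrix, $\ell^{*}$ is the convex conjugate applied componentwise, and min and sup may be exchanged: writing $\eta^{T}\ell(y - K\alpha) = \sup_{\nu}\ \eta^{T}\Delta(y)\nu - \eta^{T}\ell^{*}(\nu) - \nu^{T}\Delta(\eta)K\alpha$ and minimizing (18) over α in closed form gives

$$
\min_{\alpha}\ -\nu^{T}\Delta(\eta)K\alpha + \tfrac{\lambda}{2}\|\eta\|_{1}\,\alpha^{T}K\alpha
\;=\; -\tfrac{1}{2\lambda\|\eta\|_{1}}\,\nu^{T}\Delta(\eta)K\Delta(\eta)\nu
\;=\; -\tfrac{1}{2\lambda}\,\nu^{T}\bigl(K\circ(\eta\eta^{T}\|\eta\|_{1}^{-1})\bigr)\nu,
$$

attained at $\alpha = \Delta(\eta)\nu/(\lambda\|\eta\|_{1})$, which recovers the last term of (19).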

70 $\;=\; \min_{0\leq\eta\leq 1}\ \min_{N\in\mathcal{N}_{\eta}}\ \sup_{\nu}\ \mathbf{1}^{T}\psi(\eta) - \eta^{T}(\ell^{*}(\nu) - \Delta(y)\nu) - \frac{1}{2\lambda}\,\nu^{T}(K \circ N)\,\nu$ (25) $\;\geq\; \min_{0\leq\eta\leq 1}\ \min_{M\in\mathcal{M}_{\eta}}\ \sup_{\nu}\ \mathbf{1}^{T}\psi(\eta) - \eta^{T}(\ell^{*}(\nu) - \Delta(y)\nu) - \frac{1}{2\lambda}\,\nu^{T}(K \circ M)\,\nu$. [sent-179, score-0.394]

71 Since a pointwise maximum of convex functions is convex, the problem is convex in (η, M ) [22, Ch. [sent-184, score-0.354]

72 Reoptimization: Finally, the rounded solution can be locally improved by alternating between η and α updates in (24) (or using any other local optimization method), yielding the final estimate $\tilde{\alpha}$. [sent-193, score-0.228]
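
A minimal sketch of this alternating local improvement (illustrative only; the coordinate-wise grid update for η, the optimizer choice, and all names are assumptions rather than the paper's exact procedure):

```python
import numpy as np
from scipy.optimize import minimize

def reoptimize(K, y, base_loss, psi, lam, n_iters=20):
    """Alternate eta/alpha descent on the objective (18); base_loss and psi
    are assumed to operate elementwise on numpy arrays."""
    n = len(y)
    eta, alpha = np.ones(n), np.zeros(n)
    grid = np.linspace(0.0, 1.0, 101)              # candidate values for each eta_i
    for _ in range(n_iters):
        # alpha-step: convex weighted fit for fixed eta
        def obj_alpha(a):
            return eta @ base_loss(y - K @ a) + 0.5 * lam * eta.sum() * (a @ K @ a)
        alpha = minimize(obj_alpha, alpha, method="L-BFGS-B").x
        # eta-step: the objective is separable across eta_i on [0, 1]
        losses = base_loss(y - K @ alpha)
        c = 0.5 * lam * (alpha @ K @ alpha)        # from the ||eta||_1 regularizer
        for i in range(n):
            eta[i] = grid[np.argmin(grid * (losses[i] + c) + psi(grid))]
    return alpha, eta
```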

73 5 Properties: Although a tight a priori bound on the size of the optimality gap is difficult to achieve, a rigorous bound on the optimality gap can be recovered post hoc once the re-optimized estimator is computed. [sent-194, score-0.487]

74 Tractability: Under mild assumptions on $\ell$ and ψ, computation of the approximate estimator (solving the relaxed problem, rounding, then re-optimizing) admits a polynomial-time solution; see Appendix E in the supplement. [sent-198, score-0.25]

75 Robustness: Despite the approximation, the relaxation remains sufficiently tight to preserve some of the robustness properties of bounded loss minimization. [sent-201, score-0.695]

76 To establish the robustness (and consistency) properties, we will need to make use of a specific technical definition of outliers and inliers. [sent-202, score-0.44]

77 For an L-Lipschitz loss $\ell$, an outlier is a point $(x_i, y_i)$ that satisfies $\ell(y_i) > L^{2}K_{ii}/(2\lambda) - \psi'(0)$, while an inlier satisfies $\ell(y_i) + L^{2}K_{ii}/(2\lambda) < -\psi'(1)$. [sent-204, score-0.554]

78 Assume the loss ρ is bounded and has a variational representation (9) such that $\ell$ is Lipschitz-continuous and ψ is bounded. [sent-206, score-0.411]

79 Under the following conditions, the rounded (re-optimized) estimator maintains bounded response: (i) If either y1 remains bounded, or κ(x1 , x1 ) remains bounded. [sent-208, score-0.498]

80 Table 1: RMSE on clean test data for an artificial data set with 5 features and 100 training points, with outlier probability p, and 10000 test data points. [sent-273, score-0.267]

81 Note that the latter condition causes any convex loss to demonstrate unbounded response (see proof of Theorem 5 in Appendix B). [sent-276, score-0.612]

82 Therefore, the approximate estimator is strictly more robust (in terms of bounded response) than regularized empirical risk minimization with a convex loss $\ell$. [sent-277, score-1.169]

83 Consistency: Finally, we can establish consistency of the approximate estimator in a limited albeit non-trivial setting, although we have yet to establish it generally. [sent-278, score-0.414]

84 Then the estimate θ produced by the rounded (re-optimized) method is loss consistent. [sent-282, score-0.316]

85 We then seeded the data set with outliers by randomly re-sampling each $y_i$ and $X_{i:}$ from $N(0, 10^{8})$ and $N(0, 10^{4})$ respectively, governed by an outlier probability p. [sent-289, score-0.475]
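
A sketch of this corruption process (the dimensions and the variances $10^{8}$/$10^{4}$ are taken from the text; the clean-noise scale and the N(mean, variance) convention are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, prob_outlier = 100, 5, 0.2
X = rng.standard_normal((n, p))
theta_true = rng.standard_normal(p)
y = X @ theta_true + 0.1 * rng.standard_normal(n)      # assumed clean noise scale

mask = rng.random(n) < prob_outlier                     # outlier indicator per point
y[mask] = rng.normal(0.0, np.sqrt(1e8), size=mask.sum())         # response outliers
X[mask] = rng.normal(0.0, np.sqrt(1e4), size=(mask.sum(), p))    # leverage points
```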

86 We compared to standard L2 and L1 loss minimization, as well as minimizing the Huber minimax loss (Huber) [4]. [sent-292, score-0.358]

87 We also considered standard methods from the robust statistics literature, including the least trimmed square method (LTS) [7, 24], and bounded loss minimization based on the Geman-McClure loss (GemMc) [8]. [sent-293, score-0.935]
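
For reference, one common parameterization of the Geman-McClure loss used by GemMc-style estimators (the scale parameter is an assumption; the exact form used in [8] may differ):

```python
# A bounded, non-convex loss that saturates at 1 for large residuals.
import numpy as np

def geman_mcclure(r, sigma=1.0):
    return r ** 2 / (r ** 2 + sigma ** 2)
```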

88 Finally we also compared to the alternating minimization strategies outlined at the end of Section 3 (AltBndL2 and AltBndL1 for L2 and L1 losses respectively), and implemented the strategy described in [9]. [sent-294, score-0.308]

89 However, once outliers start to appear, the least squares result is significantly skewed, while the classic robust statistics methods, Huber, L1 and LTS, do prove more robust than least squares but are nevertheless still affected significantly. [sent-303, score-0.604]

90 Both implementations of the new method perform comparably to the non-convex Geman-McClure loss while substantially improving on the alternating strategy under the L1 loss. [sent-304, score-0.272]

91 It is clear that all methods based on convex losses (L2, L1, Huber) suffer significantly from the added outliers. [sent-399, score-0.305]

92 7 Conclusion: We have developed a new robust regression method that can guarantee a form of robustness (bounded response) while ensuring tractability (polynomial run-time). [sent-404, score-0.726]

93 Nevertheless, an empirical evaluation reveals that the method meets or surpasses the generalization ability of state-of-the-art robust regression methods in experimental studies. [sent-406, score-0.394]

94 We are investigating whether the proposed estimator achieves stronger robustness properties, such as high breakdown or bounded influence. [sent-408, score-0.836]

95 It would be interesting to extend the approach to also estimate scale in a robust and tractable manner. [sent-409, score-0.256]

96 Consistency and robustness of kernel-based regression in convex risk minimization. [sent-444, score-0.61]

97 On consistency and robustness properties of support vector machines for heavy-tailed distributions. [sent-450, score-0.34]

98 On the unification of line processes, outlier rejection, and robust statistics with applications in early vision. [sent-460, score-0.428]

99 Relaxed clipping: A global training method for robust regression and classification. [sent-468, score-0.359]

100 Learning theory: Stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. [sent-494, score-0.221]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('breakdown', 0.255), ('lts', 0.243), ('robustness', 0.22), ('outlier', 0.216), ('robust', 0.212), ('estimator', 0.182), ('outliers', 0.18), ('loss', 0.179), ('bounded', 0.179), ('convex', 0.177), ('huber', 0.152), ('regression', 0.147), ('tikhonov', 0.142), ('response', 0.142), ('rounded', 0.137), ('estimators', 0.135), ('losses', 0.128), ('gemmc', 0.121), ('consistency', 0.12), ('unbounded', 0.114), ('appendix', 0.111), ('tractability', 0.106), ('trimmed', 0.099), ('fmr', 0.091), ('minimization', 0.087), ('rkhs', 0.086), ('inlier', 0.08), ('yi', 0.079), ('rousseeuw', 0.074), ('min', 0.073), ('minima', 0.072), ('leverage', 0.071), ('arg', 0.07), ('obviously', 0.07), ('polynomial', 0.07), ('gap', 0.07), ('relaxed', 0.068), ('risk', 0.066), ('rmse', 0.066), ('rounding', 0.066), ('unfortunately', 0.064), ('relaxation', 0.064), ('formulations', 0.063), ('pumadyn', 0.061), ('dilemma', 0.059), ('gaps', 0.059), ('mr', 0.057), ('regularization', 0.057), ('base', 0.056), ('alternating', 0.054), ('caramanis', 0.054), ('christmann', 0.054), ('inliers', 0.054), ('rkhss', 0.054), ('variational', 0.053), ('tight', 0.053), ('regularized', 0.052), ('clean', 0.051), ('sup', 0.051), ('hopefully', 0.049), ('kii', 0.049), ('auxiliary', 0.048), ('ri', 0.048), ('expressed', 0.047), ('reproducing', 0.047), ('weak', 0.047), ('granted', 0.046), ('residuals', 0.046), ('perturbation', 0.045), ('abalone', 0.044), ('kernel', 0.044), ('xi', 0.044), ('tractable', 0.044), ('componentwise', 0.042), ('restarts', 0.042), ('guarantee', 0.041), ('demonstrates', 0.04), ('establish', 0.04), ('optimality', 0.04), ('optimizations', 0.039), ('trapped', 0.039), ('strategy', 0.039), ('deviations', 0.039), ('unknown', 0.038), ('wiley', 0.038), ('conducted', 0.038), ('solution', 0.037), ('rank', 0.036), ('px', 0.035), ('empirical', 0.035), ('formulating', 0.034), ('generally', 0.034), ('xu', 0.034), ('modest', 0.033), ('jointly', 0.033), ('deterministic', 0.032), ('vice', 0.032), ('perturbed', 0.032), ('although', 0.032), ('recovered', 0.032)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999917 16 nips-2012-A Polynomial-time Form of Robust Regression

Author: Ozlem Aslan, Dale Schuurmans, Yao-liang Yu

Abstract: Despite the variety of robust regression methods that have been developed, current regression formulations are either NP-hard, or allow unbounded response to even a single leverage point. We present a general formulation for robust regression—Variational M-estimation—that unifies a number of robust regression methods while allowing a tractable approximation strategy. We develop an estimator that requires only polynomial-time, while achieving certain robustness and consistency guarantees. An experimental evaluation demonstrates the effectiveness of the new estimation approach compared to standard methods. 1

2 0.14899227 117 nips-2012-Ensemble weighted kernel estimators for multivariate entropy estimation

Author: Kumar Sricharan, Alfred O. Hero

Abstract: The problem of estimation of entropy functionals of probability densities has received much attention in the information theory, machine learning and statistics communities. Kernel density plug-in estimators are simple, easy to implement and widely used for estimation of entropy. However, for large feature dimension d, kernel plug-in estimators suffer from the curse of dimensionality: the MSE rate of convergence is glacially slow - of order O(T −γ/d ), where T is the number of samples, and γ > 0 is a rate parameter. In this paper, it is shown that for sufficiently smooth densities, an ensemble of kernel plug-in estimators can be combined via a weighted convex combination, such that the resulting weighted estimator has a superior parametric MSE rate of convergence of order O(T −1 ). Furthermore, it is shown that these optimal weights can be determined by solving a convex optimization problem which does not require training data or knowledge of the underlying density, and therefore can be performed offline. This novel result is remarkable in that, while each of the individual kernel plug-in estimators belonging to the ensemble suffer from the curse of dimensionality, by appropriate ensemble averaging we can achieve parametric convergence rates. 1

3 0.14677058 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

Author: Xianghang Liu, James Petterson, Tibério S. Caetano

Abstract: We present a new formulation for binary classification. Instead of relying on convex losses and regularizers such as in SVMs, logistic regression and boosting, or instead non-convex but continuous formulations such as those encountered in neural networks and deep belief networks, our framework entails a non-convex but discrete formulation, where estimation amounts to finding a MAP configuration in a graphical model whose potential functions are low-dimensional discrete surrogates for the misclassification loss. We argue that such a discrete formulation can naturally account for a number of issues that are typically encountered in either the convex or the continuous non-convex approaches, or both. By reducing the learning problem to a MAP inference problem, we can immediately translate the guarantees available for many inference settings to the learning problem itself. We empirically demonstrate in a number of experiments that this approach is promising in dealing with issues such as severe label noise, while still having global optimality guarantees. Due to the discrete nature of the formulation, it also allows for direct regularization through cardinality-based penalties, such as the 0 pseudo-norm, thus providing the ability to perform feature selection and trade-off interpretability and predictability in a principled manner. We also outline a number of open problems arising from the formulation. 1

4 0.14244859 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance

Author: Arnak Dalalyan, Yin Chen

Abstract: In this paper, we develop a novel approach to the problem of learning sparse representations in the context of fused sparsity and unknown noise level. We propose an algorithm, termed Scaled Fused Dantzig Selector (SFDS), that accomplishes the aforementioned learning task by means of a second-order cone program. A special emphasize is put on the particular instance of fused sparsity corresponding to the learning in presence of outliers. We establish finite sample risk bounds and carry out an experimental evaluation on both synthetic and real data. 1

5 0.13804917 254 nips-2012-On the Sample Complexity of Robust PCA

Author: Matthew Coudron, Gilad Lerman

Abstract: We estimate the rate of convergence and sample complexity of a recent robust estimator for a generalized version of the inverse covariance matrix. This estimator is used in a convex algorithm for robust subspace recovery (i.e., robust PCA). Our model assumes a sub-Gaussian underlying distribution and an i.i.d. sample from it. Our main result shows with high probability that the norm of the difference between the generalized inverse covariance of the underlying distribution and its estimator from an i.i.d. sample of size N is of order O(N −0.5+ ) for arbitrarily small > 0 (affecting the probabilistic estimate); this rate of convergence is close to the one of direct covariance estimation, i.e., O(N −0.5 ). Our precise probabilistic estimate implies for some natural settings that the sample complexity of the generalized inverse covariance estimation when using the Frobenius norm is O(D2+δ ) for arbitrarily small δ > 0 (whereas the sample complexity of direct covariance estimation with Frobenius norm is O(D2 )). These results provide similar rates of convergence and sample complexity for the corresponding robust subspace recovery algorithm. To the best of our knowledge, this is the only work analyzing the sample complexity of any robust PCA algorithm. 1

6 0.13492443 277 nips-2012-Probabilistic Low-Rank Subspace Clustering

7 0.13471144 227 nips-2012-Multiclass Learning with Simplex Coding

8 0.12236422 67 nips-2012-Classification Calibration Dimension for General Multiclass Losses

9 0.11350474 330 nips-2012-Supervised Learning with Similarity Functions

10 0.1133765 174 nips-2012-Learning Halfspaces with the Zero-One Loss: Time-Accuracy Tradeoffs

11 0.10704451 325 nips-2012-Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions

12 0.1018545 29 nips-2012-Accelerated Training for Matrix-norm Regularization: A Boosting Approach

13 0.10085572 290 nips-2012-Recovery of Sparse Probability Measures via Convex Programming

14 0.097688302 271 nips-2012-Pointwise Tracking the Optimal Regression Function

15 0.096456505 340 nips-2012-The representer theorem for Hilbert spaces: a necessary and sufficient condition

16 0.095511489 280 nips-2012-Proper losses for learning from partial labels

17 0.095324017 312 nips-2012-Simultaneously Leveraging Output and Task Structures for Multiple-Output Regression

18 0.093262926 134 nips-2012-Finite Sample Convergence Rates of Zero-Order Stochastic Optimization Methods

19 0.090407021 324 nips-2012-Stochastic Gradient Descent with Only One Projection

20 0.086525261 319 nips-2012-Sparse Prediction with the $k$-Support Norm


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.256), (1, 0.038), (2, 0.136), (3, -0.074), (4, 0.111), (5, 0.074), (6, -0.017), (7, 0.131), (8, -0.034), (9, -0.073), (10, 0.025), (11, 0.015), (12, -0.016), (13, -0.031), (14, 0.072), (15, -0.065), (16, 0.062), (17, -0.043), (18, -0.022), (19, -0.001), (20, -0.011), (21, 0.031), (22, 0.006), (23, 0.071), (24, -0.062), (25, 0.002), (26, 0.037), (27, 0.023), (28, -0.024), (29, 0.039), (30, 0.017), (31, -0.104), (32, -0.007), (33, 0.023), (34, 0.027), (35, 0.106), (36, 0.02), (37, -0.028), (38, 0.035), (39, -0.038), (40, 0.035), (41, -0.069), (42, 0.004), (43, -0.107), (44, -0.006), (45, -0.047), (46, -0.012), (47, 0.011), (48, -0.001), (49, -0.047)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97577286 16 nips-2012-A Polynomial-time Form of Robust Regression

Author: Ozlem Aslan, Dale Schuurmans, Yao-liang Yu

Abstract: Despite the variety of robust regression methods that have been developed, current regression formulations are either NP-hard, or allow unbounded response to even a single leverage point. We present a general formulation for robust regression—Variational M-estimation—that unifies a number of robust regression methods while allowing a tractable approximation strategy. We develop an estimator that requires only polynomial-time, while achieving certain robustness and consistency guarantees. An experimental evaluation demonstrates the effectiveness of the new estimation approach compared to standard methods. 1

2 0.78694081 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance

Author: Arnak Dalalyan, Yin Chen

Abstract: In this paper, we develop a novel approach to the problem of learning sparse representations in the context of fused sparsity and unknown noise level. We propose an algorithm, termed Scaled Fused Dantzig Selector (SFDS), that accomplishes the aforementioned learning task by means of a second-order cone program. A special emphasize is put on the particular instance of fused sparsity corresponding to the learning in presence of outliers. We establish finite sample risk bounds and carry out an experimental evaluation on both synthetic and real data. 1

3 0.76607174 76 nips-2012-Communication-Efficient Algorithms for Statistical Optimization

Author: Yuchen Zhang, Martin J. Wainwright, John C. Duchi

Abstract: We study two communication-efficient algorithms for distributed statistical optimization on large-scale data. The first algorithm is an averaging method that distributes the N data samples evenly to m machines, performs separate minimization on each subset, and then averages the estimates. We provide a sharp analysis of this average mixture algorithm, showing that under a reasonable set of conditions, the combined parameter achieves mean-squared error that decays as √ O(N −1 + (N/m)−2 ). Whenever m ≤ N , this guarantee matches the best possible rate achievable by a centralized algorithm having access to all N samples. The second algorithm is a novel method, based on an appropriate form of the bootstrap. Requiring only a single round of communication, it has mean-squared error that decays as O(N −1 + (N/m)−3 ), and so is more robust to the amount of parallelization. We complement our theoretical results with experiments on largescale problems from the internet search domain. In particular, we show that our methods efficiently solve an advertisement prediction problem from the Chinese SoSo Search Engine, which consists of N ≈ 2.4 × 108 samples and d ≥ 700, 000 dimensions. 1

4 0.75239241 174 nips-2012-Learning Halfspaces with the Zero-One Loss: Time-Accuracy Tradeoffs

Author: Aharon Birnbaum, Shai S. Shwartz

Abstract: Given α, ϵ, we study the time complexity required to improperly learn a halfspace with misclassification error rate of at most (1 + α) L∗ + ϵ, where L∗ is the γ γ optimal γ-margin error rate. For α = 1/γ, polynomial time and sample complexity is achievable using the hinge-loss. For α = 0, Shalev-Shwartz et al. [2011] showed that poly(1/γ) time is impossible, while learning is possible in ˜ time exp(O(1/γ)). An immediate question, which this paper tackles, is what is achievable if α ∈ (0, 1/γ). We derive positive results interpolating between the polynomial time for α = 1/γ and the exponential time for α = 0. In particular, we show that there are cases in which α = o(1/γ) but the problem is still solvable in polynomial time. Our results naturally extend to the adversarial online learning model and to the PAC learning with malicious noise model. 1

5 0.73537672 217 nips-2012-Mixability in Statistical Learning

Author: Tim V. Erven, Peter Grünwald, Mark D. Reid, Robert C. Williamson

Abstract: Statistical learning and sequential prediction are two different but related formalisms to study the quality of predictions. Mapping out their relations and transferring ideas is an active area of investigation. We provide another piece of the puzzle by showing that an important concept in sequential prediction, the mixability of a loss, has a natural counterpart in the statistical setting, which we call stochastic mixability. Just as ordinary mixability characterizes fast rates for the worst-case regret in sequential prediction, stochastic mixability characterizes fast rates in statistical learning. We show that, in the special case of log-loss, stochastic mixability reduces to a well-known (but usually unnamed) martingale condition, which is used in existing convergence theorems for minimum description length and Bayesian inference. In the case of 0/1-loss, it reduces to the margin condition of Mammen and Tsybakov, and in the case that the model under consideration contains all possible predictors, it is equivalent to ordinary mixability. 1

6 0.71104252 254 nips-2012-On the Sample Complexity of Robust PCA

7 0.70842093 29 nips-2012-Accelerated Training for Matrix-norm Regularization: A Boosting Approach

8 0.70789897 271 nips-2012-Pointwise Tracking the Optimal Regression Function

9 0.70616353 161 nips-2012-Interpreting prediction markets: a stochastic approach

10 0.70231837 144 nips-2012-Gradient-based kernel method for feature extraction and variable selection

11 0.69280428 221 nips-2012-Multi-Stage Multi-Task Feature Learning

12 0.67913401 95 nips-2012-Density-Difference Estimation

13 0.67762101 325 nips-2012-Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions

14 0.67502844 67 nips-2012-Classification Calibration Dimension for General Multiclass Losses

15 0.67351723 269 nips-2012-Persistent Homology for Learning Densities with Bounded Support

16 0.66808814 227 nips-2012-Multiclass Learning with Simplex Coding

17 0.66109705 361 nips-2012-Volume Regularization for Binary Classification

18 0.66060174 319 nips-2012-Sparse Prediction with the $k$-Support Norm

19 0.6597839 86 nips-2012-Convex Multi-view Subspace Learning

20 0.65886641 34 nips-2012-Active Learning of Multi-Index Function Models


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.039), (21, 0.024), (36, 0.01), (38, 0.192), (42, 0.044), (53, 0.011), (54, 0.018), (55, 0.037), (74, 0.057), (76, 0.158), (79, 0.167), (80, 0.127), (92, 0.042)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.9824391 53 nips-2012-Bayesian Pedigree Analysis using Measure Factorization

Author: Bonnie Kirkpatrick, Alexandre Bouchard-côté

Abstract: Pedigrees, or family trees, are directed graphs used to identify sites of the genome that are correlated with the presence or absence of a disease. With the advent of genotyping and sequencing technologies, there has been an explosion in the amount of data available, both in the number of individuals and in the number of sites. Some pedigrees number in the thousands of individuals. Meanwhile, analysis methods have remained limited to pedigrees of < 100 individuals which limits analyses to many small independent pedigrees. Disease models, such those used for the linkage analysis log-odds (LOD) estimator, have similarly been limited. This is because linkage analysis was originally designed with a different task in mind, that of ordering the sites in the genome, before there were technologies that could reveal the order. LODs are difficult to interpret and nontrivial to extend to consider interactions among sites. These developments and difficulties call for the creation of modern methods of pedigree analysis. Drawing from recent advances in graphical model inference and transducer theory, we introduce a simple yet powerful formalism for expressing genetic disease models. We show that these disease models can be turned into accurate and computationally efficient estimators. The technique we use for constructing the variational approximation has potential applications to inference in other large-scale graphical models. This method allows inference on larger pedigrees than previously analyzed in the literature, which improves disease site prediction. 1

2 0.94959956 128 nips-2012-Fast Resampling Weighted v-Statistics

Author: Chunxiao Zhou, Jiseong Park, Yun Fu

Abstract: In this paper, a novel and computationally fast algorithm for computing weighted v-statistics in resampling both univariate and multivariate data is proposed. To avoid any real resampling, we have linked this problem with finite group action and converted it into a problem of orbit enumeration. For further computational cost reduction, an efficient method is developed to list all orbits by their symmetry orders and calculate all index function orbit sums and data function orbit sums recursively. The computational complexity analysis shows reduction in the computational cost from n! or nn level to low-order polynomial level. 1

same-paper 3 0.90434515 16 nips-2012-A Polynomial-time Form of Robust Regression

Author: Ozlem Aslan, Dale Schuurmans, Yao-liang Yu

Abstract: Despite the variety of robust regression methods that have been developed, current regression formulations are either NP-hard, or allow unbounded response to even a single leverage point. We present a general formulation for robust regression—Variational M-estimation—that unifies a number of robust regression methods while allowing a tractable approximation strategy. We develop an estimator that requires only polynomial-time, while achieving certain robustness and consistency guarantees. An experimental evaluation demonstrates the effectiveness of the new estimation approach compared to standard methods. 1

4 0.85908812 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

Author: Vasiliy Karasev, Alessandro Chiuso, Stefano Soatto

Abstract: We describe the tradeoff between the performance in a visual recognition problem and the control authority that the agent can exercise on the sensing process. We focus on the problem of “visual search” of an object in an otherwise known and static scene, propose a measure of control authority, and relate it to the expected risk and its proxy (conditional entropy of the posterior density). We show this analytically, as well as empirically by simulation using the simplest known model that captures the phenomenology of image formation, including scaling and occlusions. We show that a “passive” agent given a training set can provide no guarantees on performance beyond what is afforded by the priors, and that an “omnipotent” agent, capable of infinite control authority, can achieve arbitrarily good performance (asymptotically). In between these limiting cases, the tradeoff can be characterized empirically. 1

5 0.85674506 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

Author: Emile Richard, Stephane Gaiffas, Nicolas Vayatis

Abstract: In the paper, we consider the problem of link prediction in time-evolving graphs. We assume that certain graph features, such as the node degree, follow a vector autoregressive (VAR) model and we propose to use this information to improve the accuracy of prediction. Our strategy involves a joint optimization procedure over the space of adjacency matrices and VAR matrices which takes into account both sparsity and low rank properties of the matrices. Oracle inequalities are derived and illustrate the trade-offs in the choice of smoothing parameters when modeling the joint effect of sparsity and low rank property. The estimate is computed efficiently using proximal methods through a generalized forward-backward agorithm. 1

6 0.85629505 227 nips-2012-Multiclass Learning with Simplex Coding

7 0.8552317 252 nips-2012-On Multilabel Classification and Ranking with Partial Feedback

8 0.85498923 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

9 0.85454959 236 nips-2012-Near-Optimal MAP Inference for Determinantal Point Processes

10 0.85437667 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models

11 0.85416019 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

12 0.85406744 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

13 0.8535521 179 nips-2012-Learning Manifolds with K-Means and K-Flats

14 0.85354787 358 nips-2012-Value Pursuit Iteration

15 0.85290855 187 nips-2012-Learning curves for multi-task Gaussian process regression

16 0.85245973 292 nips-2012-Regularized Off-Policy TD-Learning

17 0.85218608 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

18 0.85156155 206 nips-2012-Majorization for CRFs and Latent Likelihoods

19 0.85139209 330 nips-2012-Supervised Learning with Similarity Functions

20 0.85131705 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model