nips nips2009 nips2009-173 knowledge-graph by maker-knowledge-mining

173 nips-2009-Nonparametric Greedy Algorithms for the Sparse Learning Problem


Source: pdf

Author: Han Liu, Xi Chen

Abstract: This paper studies the forward greedy strategy in sparse nonparametric regression. For additive models, we propose an algorithm called additive forward regression; for general multivariate models, we propose an algorithm called generalized forward regression. Both algorithms simultaneously conduct estimation and variable selection in nonparametric settings for the high dimensional sparse learning problem. Our main emphasis is empirical: on both simulated and real data, these two simple greedy methods can clearly outperform several state-of-the-art competitors, including LASSO, a nonparametric version of LASSO called the sparse additive model (SpAM) and a recently proposed adaptive parametric forward-backward algorithm called Foba. We also provide some theoretical justifications of specific versions of the additive forward regression. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Nonparametric Greedy Algorithms for the Sparse Learning Problem Han Liu and Xi Chen School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 Abstract This paper studies the forward greedy strategy in sparse nonparametric regression. [sent-1, score-0.389]

2 For additive models, we propose an algorithm called additive forward regression; for general multivariate models, we propose an algorithm called generalized forward regression. [sent-2, score-0.693]

3 Both algorithms simultaneously conduct estimation and variable selection in nonparametric settings for the high dimensional sparse learning problem. [sent-3, score-0.304]

4 We also provide some theoretical justifications of specific versions of the additive forward regression. [sent-5, score-0.297]

5 At present, there are two major approaches to fit sparse linear models: convex regularization and greedy pursuit. [sent-7, score-0.25]

6 The convex regularization approach regularizes the model by adding a sparsity constraint, leading to methods like LASSO [19, 7] or the Dantzig selector [6]. [sent-8, score-0.169]

7 The greedy pursuit approach regularizes the model by iteratively selecting the current optimal approximation according to some criteria, leading to methods like the matching pursuit [14] or orthogonal matching pursuit (OMP) [20]. [sent-9, score-0.529]
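
A minimal sketch of the orthogonal matching pursuit idea for a linear model may help fix notation before the nonparametric extensions below; the function name, the step-count interface, and the use of NumPy are illustrative assumptions, not details taken from [14] or [20].

```python
import numpy as np

def omp(X, y, n_steps):
    """Greedy OMP sketch: pick the column most correlated with the residual,
    then refit all selected columns by least squares."""
    residual = y.astype(float).copy()
    active, coef = [], np.array([])
    for _ in range(n_steps):
        scores = np.abs(X.T @ residual)
        scores[active] = -np.inf          # never reselect an active column
        j = int(np.argmax(scores))
        active.append(j)
        coef, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
        residual = y - X[:, active] @ coef
    return active, coef
```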

8 Substantial progress has been made recently on applying the convex regularization idea to fit sparse additive models. [sent-10, score-0.297]

9 For splines, Lin and Zhang [12] propose a method called COSSO, which uses the sum of reproducing kernel Hilbert space norms as a sparsity inducing penalty, and can simultaneously conduct estimation and variable selection; Ravikumar et al. [sent-11, score-0.198]

10 All these methods can be viewed as different nonparametric variants of LASSO. [sent-15, score-0.081]

11 They have similar drawbacks: (i) it is hard to extend them to handle general multivariate regression where the mean functions are no longer additive; (ii) due to the large bias induced by the regularization penalty, the model estimation is suboptimal. [sent-16, score-0.211]

12 In contrast to the convex regularization methods, the greedy pursuit approaches do not suffer from such problems. [sent-18, score-0.288]

13 Instead of trying to formulate the whole learning task as a global convex optimization problem, the greedy pursuit approaches adopt iterative algorithms with a local view. [sent-19, score-0.276]

14 Thus they naturally extend to the general multivariate regression and do not induce large estimation bias, which makes them especially suitable for high dimensional nonparametric inference. [sent-21, score-0.279]

15 However, the greedy pursuit approaches do not attract as much attention as the convex regularization approaches in the nonparametric literature. [sent-22, score-0.369]

16 For additive models, the only works we know of are sparse boosting [4] and multivariate adaptive regression splines (MARS) [9]. [sent-23, score-0.444]

17 These methods mainly target additive models or lower-order functional ANOVA models, but come without much theoretical analysis. [sent-24, score-0.173]

18 For general multivariate regression, the only available method we are aware of is rodeo [11]. [sent-25, score-0.107]

19 However, rodeo requires the total number of variables to be no larger than a double-logarithm of the sample size, and does not explicitly identify relevant variables. [sent-26, score-0.077]

20 In this paper, we propose two new greedy algorithms for sparse nonparametric learning in high dimensions. [sent-27, score-0.265]

21 Both of them can simultaneously conduct estimation and variable selection in high dimensions. [sent-29, score-0.137]

22 (iii) We report thorough numerical results on both simulated and real-world datasets to demonstrate the superior performance of these two methods over the state-of-the-art competitors, including LASSO, SpAM, and an adaptive parametric forward-backward algorithm called Foba [22]. [sent-31, score-0.07]

23 Assuming n data points {(X^i, Y^i)}_{i=1}^n are observed from a high dimensional regression model Y^i = m(X^i) + ε^i, ε^i ∼ N(0, σ²), i = 1, . . . , n. [sent-37, score-0.106]

24 In this case, m decomposes into the sum of r univariate functions {mj}_{j∈S}: (Additive) m(x) = α + Σ_{j∈S} mj(xj), (3) where each component function mj is assumed to lie in a second order Sobolev ball with finite radius so that each element in the space is smooth enough. [sent-60, score-0.42]

25 Given the models in (2) or (3), we have two tasks: function estimation and variable selection. [sent-65, score-0.063]

26 In general, if the true index set for the relevant variables is known, the backfitting algorithm can be directly applied to estimate m [10]. [sent-71, score-0.08]

27 In particular, we denote the estimates on the jth variable Xj to be mj ≡ (mj(Xj^1), . . . , mj(Xj^n))^T. [sent-73, score-0.223]

28 Then mj can be estimated by regressing the partial residual vector Rj = Y − α − Σ_{k≠j} mk on the variable Xj. [sent-77, score-0.263]

29 This can be calculated by mj = Sj Rj, where Sj : R^n → R^n is a smoothing matrix which only depends on X^1, . . . , X^n. [sent-78, score-0.253]

30 Once mj is updated, the algorithm holds it fixed and repeats this process by cycling through each variable until convergence. [sent-82, score-0.223]
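
Below is a minimal sketch of this backfitting loop, assuming each smoothing matrix S[j] (an n x n array) has already been built from the j-th covariate, e.g. by a spline or local linear smoother; the centering step and the convergence test are illustrative details rather than choices taken from the paper.

```python
import numpy as np

def backfit(S, Y, n_iters=50, tol=1e-6):
    """S: list of p smoothing matrices (each n x n); Y: response vector of length n."""
    n, p = len(Y), len(S)
    alpha = Y.mean()
    m = [np.zeros(n) for _ in range(p)]        # fitted components at the data points
    for _ in range(n_iters):
        max_change = 0.0
        for j in range(p):
            partial_residual = Y - alpha - sum(m[k] for k in range(p) if k != j)
            new_mj = S[j] @ partial_residual   # m_j = S_j R_j
            new_mj = new_mj - new_mj.mean()    # center each component for identifiability
            max_change = max(max_change, float(np.max(np.abs(new_mj - m[j]))))
            m[j] = new_mj
        if max_change < tol:
            break
    return alpha, m
```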

31 Under mild conditions on the smoothing matrices S1 , . . . , Sp , [sent-83, score-0.058]

32 the backfitting algorithm is a first order algorithm that is guaranteed to converge [5] and achieves the minimax rate of convergence as if only estimating a univariate function. [sent-86, score-0.073]

33 However, for sparse learning problems, since the true index set is unknown, the backfitting algorithm no longer works due to the uncontrolled estimation variance. [sent-87, score-0.125]

34 By extending the idea of the orthogonal matching pursuit to sparse additive models, we design a forward greedy algorithm called the additive forward regression (AFR), which only involves a few variables in each iteration. [sent-88, score-1.059]

35 The stopping criterion is controlled by a predefined parameter η which is equivalent to the regularization tuning parameter in convex regularization methods. [sent-97, score-0.107]
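
A minimal sketch of an AFR-style forward loop is given below; `smooth_1d(x, r)` is an assumed helper that returns a univariate nonparametric fit of the residual r on a single covariate x, and the drop-in-MSE selection rule with threshold eta is an illustrative reading of the stopping criterion, not the paper's exact algorithm (which also refits the active components jointly after each selection).

```python
import numpy as np

def afr(X, Y, smooth_1d, max_steps, eta=1e-3):
    """Additive-forward-regression-style greedy selection over univariate smoothers."""
    n, p = X.shape
    alpha = Y.mean()
    residual = Y - alpha
    active, fits = [], {}
    for _ in range(max_steps):
        best_j, best_fit, best_drop = None, None, 0.0
        for j in range(p):
            if j in active:
                continue
            fit_j = smooth_1d(X[:, j], residual)     # univariate smooth of the residual
            drop = np.mean(residual ** 2) - np.mean((residual - fit_j) ** 2)
            if drop > best_drop:
                best_j, best_fit, best_drop = j, fit_j, drop
        if best_j is None or best_drop < eta:        # eta plays the stopping role above
            break
        active.append(best_j)
        fits[best_j] = best_fit
        residual = residual - best_fit
    return active, fits
```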

36 Moreover, the smoothing matrix Sj can be fairly general, e.g. [sent-100, score-0.058]

37 univariate local linear smoothers as described below, kernel smoothers or spline smoothers [21], etc. [sent-102, score-0.632]

38 To estimate the general multivariate mean function m(x), one of the most popular methods is the local linear regression: given an evaluation point x = (x1 , . . . , xp)^T, . . . [sent-107, score-0.107]

39 where e1 = (1, 0, . . . , 0)^T is the first canonical vector in R^{p+1} and Wx = diag( Π_{j=1}^p Khj(Xj^1 − xj), . . . , Π_{j=1}^p Khj(Xj^n − xj) ) is the diagonal matrix of kernel weights. [sent-124, score-0.095]

40 Here, Sx is the local linear smoothing matrix. [sent-134, score-0.087]

41 Note that if we constrain βx = 0, then the local linear estimate reduces to the kernel estimate. [sent-135, score-0.085]
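
The following is a minimal sketch of the multivariate local linear estimate at a single evaluation point x, using a Gaussian product kernel; the kernel choice, the bandwidth vector h, and the small ridge term added for numerical stability are assumptions for illustration.

```python
import numpy as np

def local_linear(X, Y, x, h):
    """Local linear fit at evaluation point x; X is (n, p), Y is (n,), h is (p,)."""
    n, p = X.shape
    D = X - x                                        # rows are X^i - x
    # product kernel weights, one per observation
    w = np.prod(np.exp(-0.5 * (D / h) ** 2) / (h * np.sqrt(2.0 * np.pi)), axis=1)
    Z = np.hstack([np.ones((n, 1)), D])              # rows [1, (X^i - x)^T]
    A = Z.T @ (w[:, None] * Z) + 1e-10 * np.eye(p + 1)   # small ridge for stability
    b = Z.T @ (w * Y)
    beta = np.linalg.solve(A, b)
    return beta[0]                                   # e1^T beta: the fitted value at x
```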

42 To handle the large p case, we again extend the idea of the orthogonal matching pursuit to this setting. [sent-137, score-0.155]

43 Given these definitions, the generalized forward regression (GFR) algorithm is described in Figure 2. [sent-142, score-0.202]

44 Similar to AFR, GFR also uses an active set A to index the variables included in the model. [sent-143, score-0.09]
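
A minimal sketch of a GFR-style loop is shown below; `fit_smoother(X_sub, Y)` is an assumed helper that returns fitted values from a multivariate nonparametric smoother (for example, the local linear estimator sketched above, evaluated at the data points), and the stopping rule is illustrative.

```python
import numpy as np

def gfr(X, Y, fit_smoother, max_steps, eta=1e-3):
    """Generalized-forward-regression-style greedy selection with a multivariate smoother."""
    n, p = X.shape
    active = []
    fitted = np.full(n, Y.mean())
    current_mse = float(np.mean((Y - fitted) ** 2))
    for _ in range(max_steps):
        best_j, best_fit, best_mse = None, None, current_mse
        for j in range(p):
            if j in active:
                continue
            fit_j = fit_smoother(X[:, active + [j]], Y)   # refit on the active set plus j
            mse_j = float(np.mean((Y - fit_j) ** 2))
            if mse_j < best_mse:
                best_j, best_fit, best_mse = j, fit_j, mse_j
        if best_j is None or current_mse - best_mse < eta:
            break
        active.append(best_j)
        fitted, current_mse = best_fit, best_mse
    return active, fitted
```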

45 The GFR algorithm using the multivariate local linear smoother can be computationally heavy for very high dimensional problems. [sent-145, score-0.193]

46 The only reason we use the local linear smoother as an illustrative example in this paper is its popularity and its potential advantage in correcting the boundary bias. [sent-150, score-0.108]

47 4 5 Theoretical Properties In this section, we provide the theoretical properties of the additive forward regression estimates using the spline smoother. [sent-151, score-0.463]

48 Due to the asymptotic equivalence of the spline smoother and the local linear smoother [18], we deduce that these results should also hold for the local linear smoother. [sent-152, score-0.304]

49 Our main result in Theorem 1 says that when the spline smoother with a certain truncation rate is used to implement the AFR algorithm, the resulting estimator is consistent with a certain rate. [sent-153, score-0.186]

50 When the underlying true component functions do not go to zero too fast, we also achieve variable selection consistency. [sent-154, score-0.074]

51 For all j ∈ {1, . . . , p}, we assume mj lies in a second-order Sobolev ball with finite radius, and m = α + Σ_{j=1}^p mj. [sent-162, score-0.39]

52 For the additive forward regression algorithm using the spline smoother with a truncation rate at n^{1/4}, after (n/ log n)^{1/2} steps, we obtain that ‖m̂ − m‖² = O_P(√(log n / n)). Furthermore, if we also assume min_{j∈S} ‖mj‖ = Ω(√(log n / n)), . . . [sent-163, score-0.756]

53 The rate for ‖m̂ − m‖² obtained from Theorem 1 is only O(n^{−1/2}), which is slower than the minimax rate O(n^{−4/5}). [sent-166, score-0.062]

54 This is mainly an artifact of our analysis instead of a drawback of the additive forward regression algorithm. [sent-167, score-0.375]

55 Sketch of Proof: We first describe an algorithm called group orthogonal greedy algorithm (GOGA), which solves a noiseless function approximation problem in a direct-sum Hilbert space. [sent-170, score-0.179]

56 GOGA is a group extension of the orthogonal greedy algorithm (OGA) in [3]. [sent-172, score-0.158]

57 m^{(k)} can then be calculated by projecting m onto the additive function space generated by A^{(k)} = D_{j(1)} + · · · + D_{j(k)}: m^{(k)} = arg min_{g ∈ span(A^{(k)})} ‖m − g‖². [sent-191, score-0.173]

58 AFR using regression splines is exactly GOGA when there is no noise. [sent-192, score-0.131]

59 The projection of the current residual vector onto each dictionary Dj is replaced by the corresponding nonparametric smoothers. [sent-194, score-0.121]

60 The desired results of the theorem follow from a simple argument on bounding the random covering number of spline spaces. [sent-196, score-0.088]

61 The main conclusion is that, in many cases, their performance on both function estimation and variable selection can clearly outperform those of LASSO, Foba, and SpAM. [sent-198, score-0.109]

62 For all the reported experiments, we use local linear smoothers to implement AFR and GFR. [sent-199, score-0.179]

63 The results for other smoothers, such as smoothing splines, are similar. [sent-200, score-0.058]

64 Note that different bandwidth parameters can have a large effect on the performance of local linear smoothers. [sent-201, score-0.1]

65 Our experiments simply use the plug-in bandwidths according to [8] and set the bandwidth for each variable to be the same. [sent-202, score-0.066]
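
As a stand-in for the plug-in bandwidths of [8], which are not reproduced here, the following illustrative normal-reference rule produces one bandwidth per variable.

```python
import numpy as np

def rule_of_thumb_bandwidths(X):
    """One normal-reference bandwidth per column of X (illustrative only)."""
    n = X.shape[0]
    return 1.06 * X.std(axis=0) * n ** (-1.0 / 5.0)
```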

66 For an estimate m̂, the estimation performance for the synthetic data is measured by the mean squared error (MSE), which is defined as MSE(m̂) = (1/n) Σ_{i=1}^n (m̂(X^i) − m(X^i))². [sent-206, score-0.093]
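
In code, this criterion is simply the following (a sketch; the two arguments are assumed to be the fitted and the true function values evaluated at the n data points):

```python
import numpy as np

def mse(m_hat_values, m_true_values):
    """Mean squared error between fitted and true function values at the data points."""
    diff = np.asarray(m_hat_values) - np.asarray(m_true_values)
    return float(np.mean(diff ** 2))
```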

67 We assume the true regression functions have r = 4 relevant variables: Y = m(X) + ε = m(X1 , . . . , X4) + ε. (6) [sent-221, score-0.078]

68 To evaluate the variable selection performance of different methods, we generate 50 designs and 50 trials for each design. [sent-225, score-0.074]

69 For each trial, we run the greedy forward algorithm r steps. [sent-226, score-0.25]

70 If all the relevant variables are included, the variable selection task for this trial is said to be successful. [sent-227, score-0.132]

71 We report the mean and standard deviation of the success rate in variable selection for various correlations between covariates, controlled by varying the values of t. [sent-228, score-0.115]
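
A minimal sketch of this success-rate protocol; `simulate_data` and `run_forward` are assumed helpers, not from the paper, that respectively generate one (X, Y) sample and return the variables selected after r forward steps.

```python
import numpy as np

def success_rate(simulate_data, run_forward, relevant, r, n_designs=50, n_trials=50):
    """Fraction of trials in which all relevant variables are selected within r steps."""
    rates = []
    for _ in range(n_designs):
        hits = 0
        for _ in range(n_trials):
            X, Y = simulate_data()
            selected = run_forward(X, Y, n_steps=r)
            hits += int(set(relevant).issubset(set(selected)))
        rates.append(hits / n_trials)
    return float(np.mean(rates)), float(np.std(rates))
```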

72 Table 1 shows the success rate of variable selection for these models with different correlations controlled by t. [sent-240, score-0.093]

73 From Figure 3, we see that AFR and GFR methods provide very good estimates for the underlying true regression functions as compared to others. [sent-241, score-0.078]

74 This is because they are convex regularization based approaches: to obtain a very sparse model, they induce very large estimation bias. [sent-243, score-0.159]

75 On the other hand, the greedy pursuit based methods like Foba, AFR and GFR do not suffer from such a problem. [sent-244, score-0.222]

76 For the nonlinear true regression function, AFR, GFR and SpAM outperform LASSO and Foba. [sent-246, score-0.078]

77 [Figure 3: Performance of the different algorithms on synthetic data: MSE versus sparsity level.] [sent-256, score-0.102] [sent-257, score-0.19]

79 Furthermore, we notice that when the true model is additive (Model 2) or nearly additive (Model 4), AFR performs the best. [sent-259, score-0.346]

80 However, for the non-additive general multivariate regression function (Model 3), GFR performs the best. [sent-260, score-0.135]

81 For all examples, when more and more irrelevant variables are included in the model, SpAM has a better generalization performance due to the regularization effect. [sent-261, score-0.099]

82 [Table 1: Comparison of variable selection success rates (mean and sd) under different correlations t = 0, 1, 2.] [sent-262, score-0.074]

83 The variable selection performances of the different methods in Table 1 are very similar to their estimation performances. [sent-383, score-0.142]

84 However, when comparing the variable selection performance, GFR is the best. [sent-389, score-0.074]

85 This suggests that for nonparametric inference, the goals of estimation consistency and variable selection consistency might not always be coherent. [sent-390, score-0.238]

86 We treat Ionosphere as a regression problem although the response is binary. (Footnote 1: Available from UCI Machine Learning Database Repository: http:archive.) [sent-395, score-0.078]

87 [Figure 4: Performance of the different algorithms on real datasets: CV error versus sparsity level.] From Figure 4, since all the error bars are tiny, we deem all the results significant. [sent-411, score-0.102]

88 For all these datasets, if we prefer very sparse models, the performance of the greedy methods is much better than that of the convex regularization methods, due to the much smaller bias being induced. [sent-413, score-0.25]

89 Both AFR and GFR achieve their best performance on this dataset when no more than 10 variables are included, while SpAM achieves its best CV score with 25 variables. [sent-415, score-0.06]

90 The main reason that SpAM can achieve good generalization performance when many variables are included is its regularization effect. [sent-417, score-0.099]

91 7 Conclusions and Discussions We presented two new greedy algorithms for nonparametric regression with either additive mean functions or general multivariate regression functions. [sent-420, score-0.593]

92 Both methods utilize the iterative forward stepwise strategy, which guarantees the model inference is always conducted in low dimensions in each iteration. [sent-421, score-0.148]

93 One thing worth noting is that people sometimes criticize forward greedy algorithms because they never have the chance to correct errors made in the early steps. [sent-423, score-0.25]

94 We addressed a similar question: whether a forward-backward procedure also helps in the nonparametric setting. [sent-425, score-0.081]

95 However, the backward step happens very rarely and the empirical performance is almost the same as that of the purely forward algorithm. [sent-428, score-0.157]

96 In summary, in nonparametric settings, the backward ingredient costs much more computational effort for only a tiny performance improvement. [sent-430, score-0.134]

97 A very recent research strand is to learn nonlinear models with the multiple kernel learning machinery [1, 2]; another direction for future work is to compare our methods with the multiple kernel learning approach from both theoretical and computational perspectives. [sent-432, score-0.07]

98 Consistency of the group lasso and multiple kernel learning. [sent-436, score-0.186]

99 On the estimation consistency of the group lasso and its applications. [sent-480, score-0.21]

100 On the consistency of feature selection using greedy least squares regression. [sent-518, score-0.093]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('afr', 0.533), ('gfr', 0.483), ('foba', 0.262), ('spam', 0.249), ('mj', 0.195), ('additive', 0.173), ('lasso', 0.151), ('smoothers', 0.15), ('greedy', 0.126), ('forward', 0.124), ('mse', 0.097), ('pursuit', 0.096), ('xj', 0.095), ('spline', 0.088), ('nonparametric', 0.081), ('smoother', 0.079), ('regression', 0.078), ('dj', 0.07), ('autompg', 0.067), ('goga', 0.067), ('han', 0.059), ('smoothing', 0.058), ('sparse', 0.058), ('multivariate', 0.057), ('cv', 0.054), ('housing', 0.053), ('splines', 0.053), ('sparsity', 0.051), ('ionosphere', 0.05), ('larry', 0.05), ('khj', 0.05), ('rodeo', 0.05), ('boston', 0.047), ('selection', 0.046), ('xx', 0.045), ('regularization', 0.041), ('ravikumar', 0.04), ('sd', 0.04), ('pradeep', 0.04), ('residual', 0.04), ('bandwidth', 0.038), ('annals', 0.037), ('synthetic', 0.037), ('sobolev', 0.036), ('estimation', 0.035), ('kernel', 0.035), ('backward', 0.033), ('lgorithm', 0.033), ('oga', 0.033), ('orward', 0.033), ('performances', 0.033), ('orthogonal', 0.032), ('tting', 0.032), ('index', 0.032), ('included', 0.031), ('back', 0.03), ('wx', 0.03), ('hilbert', 0.03), ('univariate', 0.03), ('sj', 0.03), ('lafferty', 0.03), ('local', 0.029), ('regularizes', 0.029), ('egression', 0.029), ('variable', 0.028), ('dimensional', 0.028), ('conduct', 0.028), ('xs', 0.027), ('meier', 0.027), ('matching', 0.027), ('variables', 0.027), ('fj', 0.026), ('sin', 0.025), ('competitors', 0.025), ('sx', 0.025), ('adaptive', 0.025), ('convex', 0.025), ('consistency', 0.024), ('conducted', 0.024), ('minimax', 0.024), ('simulated', 0.024), ('liu', 0.024), ('squares', 0.023), ('selector', 0.023), ('francis', 0.023), ('correlation', 0.022), ('dantzig', 0.022), ('called', 0.021), ('methodological', 0.021), ('tong', 0.021), ('crossvalidation', 0.021), ('hp', 0.021), ('estimate', 0.021), ('inner', 0.02), ('rj', 0.02), ('tiny', 0.02), ('john', 0.02), ('op', 0.02), ('trevor', 0.02), ('rate', 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999997 173 nips-2009-Nonparametric Greedy Algorithms for the Sparse Learning Problem

Author: Han Liu, Xi Chen

Abstract: This paper studies the forward greedy strategy in sparse nonparametric regression. For additive models, we propose an algorithm called additive forward regression; for general multivariate models, we propose an algorithm called generalized forward regression. Both algorithms simultaneously conduct estimation and variable selection in nonparametric settings for the high dimensional sparse learning problem. Our main emphasis is empirical: on both simulated and real data, these two simple greedy methods can clearly outperform several state-of-the-art competitors, including LASSO, a nonparametric version of LASSO called the sparse additive model (SpAM) and a recently proposed adaptive parametric forward-backward algorithm called Foba. We also provide some theoretical justifications of specific versions of the additive forward regression. 1

2 0.12912895 144 nips-2009-Lower bounds on minimax rates for nonparametric regression with additive sparsity and smoothness

Author: Garvesh Raskutti, Bin Yu, Martin J. Wainwright

Abstract: We study minimax rates for estimating high-dimensional nonparametric regression models with sparse additive structure and smoothness constraints. More precisely, our goal is to estimate a function f∗ : R^p → R that has an additive decomposition of the form f∗(X1 , . . . , Xp) = Σ_{j∈S} h∗_j(Xj), where each component function h∗_j lies in some class H of “smooth” functions, and S ⊂ {1, . . . , p} is an unknown subset with cardinality s = |S|. Given n i.i.d. observations of f∗(X) corrupted with additive white Gaussian noise where the covariate vectors (X1 , X2 , X3 , ..., Xp) are drawn with i.i.d. components from some distribution P, we determine lower bounds on the minimax rate for estimating the regression function with respect to squared-L2(P) error. Our main result is a lower bound on the minimax rate that scales as max{ s log(p/s)/n , s ε_n²(H) }. The first term reflects the sample size required for performing subset selection, and is independent of the function class H. The second term s ε_n²(H) is an s-dimensional estimation term corresponding to the sample size required for estimating a sum of s univariate functions, each chosen from the function class H. It depends linearly on the sparsity index s but is independent of the global dimension p. As a special case, if H corresponds to functions that are m-times differentiable (an mth-order Sobolev space), then the s-dimensional estimation term takes the form s ε_n²(H) ≍ s n^{−2m/(2m+1)}. Either of the two terms may be dominant in different regimes, depending on the relation between the sparsity and smoothness of the additive decomposition.

3 0.096829832 245 nips-2009-Thresholding Procedures for High Dimensional Variable Selection and Statistical Estimation

Author: Shuheng Zhou

Abstract: Given n noisy samples with p dimensions, where n ≪ p, we show that the multistep thresholding procedure can accurately estimate a sparse vector β ∈ Rp in a linear model, under the restricted eigenvalue conditions (Bickel-Ritov-Tsybakov 09). Thus our conditions for model selection consistency are considerably weaker than what has been achieved in previous works. More importantly, this method allows very significant values of s, which is the number of non-zero elements in the true parameter. For example, it works for cases where the ordinary Lasso would have failed. Finally, we show that if X obeys a uniform uncertainty principle and if the true parameter is sufficiently sparse, the Gauss-Dantzig selector (Candès-Tao 07) achieves the ℓ2 loss within a logarithmic factor of the ideal mean square error one would achieve with an oracle which would supply perfect information about which coordinates are non-zero and which are above the noise level, while selecting a sufficiently sparse model. 1

4 0.089960158 170 nips-2009-Nonlinear directed acyclic structure learning with weakly additive noise models

Author: Arthur Gretton, Peter Spirtes, Robert E. Tillman

Abstract: The recently proposed additive noise model has advantages over previous directed structure learning approaches since it (i) does not assume linearity or Gaussianity and (ii) can discover a unique DAG rather than its Markov equivalence class. However, for certain distributions, e.g. linear Gaussians, the additive noise model is invertible and thus not useful for structure learning, and it was originally proposed for the two variable case with a multivariate extension which requires enumerating all possible DAGs. We introduce weakly additive noise models, which extends this framework to cases where the additive noise model is invertible and when additive noise is not present. We then provide an algorithm that learns an equivalence class for such models from data, by combining a PC style search using recent advances in kernel measures of conditional dependence with local searches for additive noise models in substructures of the Markov equivalence class. This results in a more computationally efficient approach that is useful for arbitrary distributions even when additive noise models are invertible. 1

5 0.088980421 20 nips-2009-A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers

Author: Sahand Negahban, Bin Yu, Martin J. Wainwright, Pradeep K. Ravikumar

Abstract: High-dimensional statistical inference deals with models in which the number of parameters p is comparable to or larger than the sample size n. Since it is usually impossible to obtain consistent procedures unless p/n → 0, a line of recent work has studied models with various types of structure (e.g., sparse vectors; block-structured matrices; low-rank matrices; Markov assumptions). In such settings, a general approach to estimation is to solve a regularized convex program (known as a regularized M -estimator) which combines a loss function (measuring how well the model fits the data) with some regularization function that encourages the assumed structure. The goal of this paper is to provide a unified framework for establishing consistency and convergence rates for such regularized M estimators under high-dimensional scaling. We state one main theorem and show how it can be used to re-derive several existing results, and also to obtain several new results on consistency and convergence rates. Our analysis also identifies two key properties of loss and regularization functions, referred to as restricted strong convexity and decomposability, that ensure the corresponding regularized M -estimators have fast convergence rates. 1

6 0.085213587 225 nips-2009-Sparsistent Learning of Varying-coefficient Models with Structural Changes

7 0.075401559 64 nips-2009-Data-driven calibration of linear estimators with minimal penalties

8 0.070653655 185 nips-2009-Orthogonal Matching Pursuit From Noisy Random Measurements: A New Analysis

9 0.067697205 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models

10 0.063909329 157 nips-2009-Multi-Label Prediction via Compressed Sensing

11 0.063130654 222 nips-2009-Sparse Estimation Using General Likelihoods and Non-Factorial Priors

12 0.056640293 36 nips-2009-Asymptotic Analysis of MAP Estimation via the Replica Method and Compressed Sensing

13 0.055496138 161 nips-2009-Nash Equilibria of Static Prediction Games

14 0.053043209 213 nips-2009-Semi-supervised Learning using Sparse Eigenfunction Bases

15 0.052611735 214 nips-2009-Semi-supervised Regression using Hessian energy with an application to semi-supervised dimensionality reduction

16 0.051781803 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms

17 0.050566528 229 nips-2009-Statistical Analysis of Semi-Supervised Learning: The Limit of Infinite Unlabelled Data

18 0.049550269 108 nips-2009-Heterogeneous multitask learning with joint sparsity constraints

19 0.048670221 55 nips-2009-Compressed Least-Squares Regression

20 0.044525284 249 nips-2009-Tracking Dynamic Sources of Malicious Activity at Internet Scale


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.15), (1, 0.083), (2, 0.003), (3, 0.073), (4, -0.022), (5, -0.049), (6, 0.092), (7, -0.093), (8, 0.068), (9, 0.04), (10, 0.003), (11, -0.065), (12, -0.012), (13, 0.029), (14, 0.028), (15, 0.07), (16, -0.027), (17, -0.019), (18, -0.024), (19, 0.034), (20, -0.059), (21, 0.048), (22, -0.052), (23, 0.053), (24, -0.034), (25, 0.063), (26, 0.026), (27, 0.027), (28, -0.104), (29, -0.007), (30, -0.078), (31, 0.082), (32, -0.062), (33, 0.058), (34, 0.034), (35, 0.024), (36, -0.134), (37, 0.05), (38, -0.029), (39, -0.012), (40, 0.125), (41, 0.038), (42, -0.032), (43, -0.062), (44, -0.089), (45, -0.039), (46, 0.039), (47, 0.107), (48, 0.115), (49, 0.064)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.91273904 173 nips-2009-Nonparametric Greedy Algorithms for the Sparse Learning Problem

Author: Han Liu, Xi Chen

Abstract: This paper studies the forward greedy strategy in sparse nonparametric regression. For additive models, we propose an algorithm called additive forward regression; for general multivariate models, we propose an algorithm called generalized forward regression. Both algorithms simultaneously conduct estimation and variable selection in nonparametric settings for the high dimensional sparse learning problem. Our main emphasis is empirical: on both simulated and real data, these two simple greedy methods can clearly outperform several state-ofthe-art competitors, including LASSO, a nonparametric version of LASSO called the sparse additive model (SpAM) and a recently proposed adaptive parametric forward-backward algorithm called Foba. We also provide some theoretical justifications of specific versions of the additive forward regression. 1

2 0.68512136 105 nips-2009-Grouped Orthogonal Matching Pursuit for Variable Selection and Prediction

Author: Grzegorz Swirszcz, Naoki Abe, Aurelie C. Lozano

Abstract: We consider the problem of variable group selection for least squares regression, namely, that of selecting groups of variables for best regression performance, leveraging and adhering to a natural grouping structure within the explanatory variables. We show that this problem can be efficiently addressed by using a certain greedy style algorithm. More precisely, we propose the Group Orthogonal Matching Pursuit algorithm (Group-OMP), which extends the standard OMP procedure (also referred to as “forward greedy feature selection algorithm” for least squares regression) to perform stage-wise group variable selection. We prove that under certain conditions Group-OMP can identify the correct (groups of) variables. We also provide an upperbound on the l∞ norm of the difference between the estimated regression coefficients and the true coefficients. Experimental results on simulated and real world datasets indicate that Group-OMP compares favorably to Group Lasso, OMP and Lasso, both in terms of variable selection and prediction accuracy. 1

3 0.64040321 185 nips-2009-Orthogonal Matching Pursuit From Noisy Random Measurements: A New Analysis

Author: Sundeep Rangan, Alyson K. Fletcher

Abstract: A well-known analysis of Tropp and Gilbert shows that orthogonal matching pursuit (OMP) can recover a k-sparse n-dimensional real vector from m = 4k log(n) noise-free linear measurements obtained through a random Gaussian measurement matrix with a probability that approaches one as n → ∞. This work strengthens this result by showing that a lower number of measurements, m = 2k log(n − k), is in fact sufficient for asymptotic recovery. More generally, when the sparsity level satisfies kmin ≤ k ≤ kmax but is unknown, m = 2kmax log(n − kmin ) measurements is sufficient. Furthermore, this number of measurements is also sufficient for detection of the sparsity pattern (support) of the vector with measurement errors provided the signal-to-noise ratio (SNR) scales to infinity. The scaling m = 2k log(n − k) exactly matches the number of measurements required by the more complex lasso method for signal recovery in a similar SNR scaling.

4 0.62694865 144 nips-2009-Lower bounds on minimax rates for nonparametric regression with additive sparsity and smoothness

Author: Garvesh Raskutti, Bin Yu, Martin J. Wainwright

Abstract: We study minimax rates for estimating high-dimensional nonparametric regression models with sparse additive structure and smoothness constraints. More precisely, our goal is to estimate a function f∗ : R^p → R that has an additive decomposition of the form f∗(X1 , . . . , Xp) = Σ_{j∈S} h∗_j(Xj), where each component function h∗_j lies in some class H of “smooth” functions, and S ⊂ {1, . . . , p} is an unknown subset with cardinality s = |S|. Given n i.i.d. observations of f∗(X) corrupted with additive white Gaussian noise where the covariate vectors (X1 , X2 , X3 , ..., Xp) are drawn with i.i.d. components from some distribution P, we determine lower bounds on the minimax rate for estimating the regression function with respect to squared-L2(P) error. Our main result is a lower bound on the minimax rate that scales as max{ s log(p/s)/n , s ε_n²(H) }. The first term reflects the sample size required for performing subset selection, and is independent of the function class H. The second term s ε_n²(H) is an s-dimensional estimation term corresponding to the sample size required for estimating a sum of s univariate functions, each chosen from the function class H. It depends linearly on the sparsity index s but is independent of the global dimension p. As a special case, if H corresponds to functions that are m-times differentiable (an mth-order Sobolev space), then the s-dimensional estimation term takes the form s ε_n²(H) ≍ s n^{−2m/(2m+1)}. Either of the two terms may be dominant in different regimes, depending on the relation between the sparsity and smoothness of the additive decomposition.

5 0.62139004 36 nips-2009-Asymptotic Analysis of MAP Estimation via the Replica Method and Compressed Sensing

Author: Sundeep Rangan, Vivek Goyal, Alyson K. Fletcher

Abstract: The replica method is a non-rigorous but widely-accepted technique from statistical physics used in the asymptotic analysis of large, random, nonlinear problems. This paper applies the replica method to non-Gaussian maximum a posteriori (MAP) estimation. It is shown that with random linear measurements and Gaussian noise, the asymptotic behavior of the MAP estimate of an n-dimensional vector “decouples” as n scalar MAP estimators. The result is a counterpart to Guo and Verdú's replica analysis of minimum mean-squared error estimation. The replica MAP analysis can be readily applied to many estimators used in compressed sensing, including basis pursuit, lasso, linear estimation with thresholding, and zero norm-regularized estimation. In the case of lasso estimation the scalar estimator reduces to a soft-thresholding operator, and for zero norm-regularized estimation it reduces to a hard-threshold. Among other benefits, the replica method provides a computationally-tractable method for exactly computing various performance metrics including mean-squared error and sparsity pattern recovery probability.

6 0.61656833 245 nips-2009-Thresholding Procedures for High Dimensional Variable Selection and Statistical Estimation

7 0.56773317 20 nips-2009-A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers

8 0.5284636 170 nips-2009-Nonlinear directed acyclic structure learning with weakly additive noise models

9 0.4989523 225 nips-2009-Sparsistent Learning of Varying-coefficient Models with Structural Changes

10 0.49163756 222 nips-2009-Sparse Estimation Using General Likelihoods and Non-Factorial Priors

11 0.46201876 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models

12 0.46115759 108 nips-2009-Heterogeneous multitask learning with joint sparsity constraints

13 0.45950958 7 nips-2009-A Data-Driven Approach to Modeling Choice

14 0.45005506 64 nips-2009-Data-driven calibration of linear estimators with minimal penalties

15 0.42178819 42 nips-2009-Bayesian Sparse Factor Models and DAGs Inference and Comparison

16 0.41577449 67 nips-2009-Directed Regression

17 0.40497717 206 nips-2009-Riffled Independence for Ranked Data

18 0.39479032 10 nips-2009-A Gaussian Tree Approximation for Integer Least-Squares

19 0.38626158 138 nips-2009-Learning with Compressible Priors

20 0.36142358 11 nips-2009-A General Projection Property for Distribution Families


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(24, 0.094), (25, 0.094), (26, 0.25), (35, 0.05), (36, 0.124), (39, 0.036), (58, 0.1), (61, 0.035), (71, 0.034), (81, 0.011), (86, 0.055), (91, 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.82211161 173 nips-2009-Nonparametric Greedy Algorithms for the Sparse Learning Problem

Author: Han Liu, Xi Chen

Abstract: This paper studies the forward greedy strategy in sparse nonparametric regression. For additive models, we propose an algorithm called additive forward regression; for general multivariate models, we propose an algorithm called generalized forward regression. Both algorithms simultaneously conduct estimation and variable selection in nonparametric settings for the high dimensional sparse learning problem. Our main emphasis is empirical: on both simulated and real data, these two simple greedy methods can clearly outperform several state-of-the-art competitors, including LASSO, a nonparametric version of LASSO called the sparse additive model (SpAM) and a recently proposed adaptive parametric forward-backward algorithm called Foba. We also provide some theoretical justifications of specific versions of the additive forward regression. 1

2 0.77557951 1 nips-2009-$L 1$-Penalized Robust Estimation for a Class of Inverse Problems Arising in Multiview Geometry

Author: Arnak Dalalyan, Renaud Keriven

Abstract: We propose a new approach to the problem of robust estimation in multiview geometry. Inspired by recent advances in the sparse recovery problem of statistics, we define our estimator as a Bayesian maximum a posteriori with multivariate Laplace prior on the vector describing the outliers. This leads to an estimator in which the fidelity to the data is measured by the L∞ -norm while the regularization is done by the L1 -norm. The proposed procedure is fairly fast since the outlier removal is done by solving one linear program (LP). An important difference compared to existing algorithms is that for our estimator it is not necessary to specify neither the number nor the proportion of the outliers. We present strong theoretical results assessing the accuracy of our procedure, as well as a numerical example illustrating its efficiency on real data. 1

3 0.65643573 20 nips-2009-A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers

Author: Sahand Negahban, Bin Yu, Martin J. Wainwright, Pradeep K. Ravikumar

Abstract: High-dimensional statistical inference deals with models in which the number of parameters p is comparable to or larger than the sample size n. Since it is usually impossible to obtain consistent procedures unless p/n → 0, a line of recent work has studied models with various types of structure (e.g., sparse vectors; block-structured matrices; low-rank matrices; Markov assumptions). In such settings, a general approach to estimation is to solve a regularized convex program (known as a regularized M -estimator) which combines a loss function (measuring how well the model fits the data) with some regularization function that encourages the assumed structure. The goal of this paper is to provide a unified framework for establishing consistency and convergence rates for such regularized M estimators under high-dimensional scaling. We state one main theorem and show how it can be used to re-derive several existing results, and also to obtain several new results on consistency and convergence rates. Our analysis also identifies two key properties of loss and regularization functions, referred to as restricted strong convexity and decomposability, that ensure the corresponding regularized M -estimators have fast convergence rates. 1

4 0.64973223 207 nips-2009-Robust Nonparametric Regression with Metric-Space Valued Output

Author: Matthias Hein

Abstract: Motivated by recent developments in manifold-valued regression we propose a family of nonparametric kernel-smoothing estimators with metric-space valued output including several robust versions. Depending on the choice of the output space and the metric the estimator reduces to partially well-known procedures for multi-class classification, multivariate regression in Euclidean space, regression with manifold-valued output and even some cases of structured output learning. In this paper we focus on the case of regression with manifold-valued input and output. We show pointwise and Bayes consistency for all estimators in the family for the case of manifold-valued output and illustrate the robustness properties of the estimators with experiments. 1

5 0.64933366 180 nips-2009-On the Convergence of the Concave-Convex Procedure

Author: Gert R. Lanckriet, Bharath K. Sriperumbudur

Abstract: The concave-convex procedure (CCCP) is a majorization-minimization algorithm that solves d.c. (difference of convex functions) programs as a sequence of convex programs. In machine learning, CCCP is extensively used in many learning algorithms like sparse support vector machines (SVMs), transductive SVMs, sparse principal component analysis, etc. Though widely used in many applications, the convergence behavior of CCCP has not gotten a lot of specific attention. Yuille and Rangarajan analyzed its convergence in their original paper, however, we believe the analysis is not complete. Although the convergence of CCCP can be derived from the convergence of the d.c. algorithm (DCA), its proof is more specialized and technical than actually required for the specific case of CCCP. In this paper, we follow a different reasoning and show how Zangwill’s global convergence theory of iterative algorithms provides a natural framework to prove the convergence of CCCP, allowing a more elegant and simple proof. This underlines Zangwill’s theory as a powerful and general framework to deal with the convergence issues of iterative algorithms, after also being used to prove the convergence of algorithms like expectation-maximization, generalized alternating minimization, etc. In this paper, we provide a rigorous analysis of the convergence of CCCP by addressing these questions: (i) When does CCCP find a local minimum or a stationary point of the d.c. program under consideration? (ii) When does the sequence generated by CCCP converge? We also present an open problem on the issue of local convergence of CCCP.

6 0.64831805 94 nips-2009-Fast Learning from Non-i.i.d. Observations

7 0.64825445 37 nips-2009-Asymptotically Optimal Regularization in Smooth Parametric Models

8 0.64601153 122 nips-2009-Label Selection on Graphs

9 0.64563316 225 nips-2009-Sparsistent Learning of Varying-coefficient Models with Structural Changes

10 0.64433187 169 nips-2009-Nonlinear Learning using Local Coordinate Coding

11 0.64250475 91 nips-2009-Fast, smooth and adaptive regression in metric spaces

12 0.6423896 229 nips-2009-Statistical Analysis of Semi-Supervised Learning: The Limit of Infinite Unlabelled Data

13 0.63985485 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms

14 0.63953137 71 nips-2009-Distribution-Calibrated Hierarchical Classification

15 0.63894147 97 nips-2009-Free energy score space

16 0.63826627 116 nips-2009-Information-theoretic lower bounds on the oracle complexity of convex optimization

17 0.63758397 76 nips-2009-Efficient Learning using Forward-Backward Splitting

18 0.63657606 239 nips-2009-Submodularity Cuts and Applications

19 0.63622743 157 nips-2009-Multi-Label Prediction via Compressed Sensing

20 0.63469392 220 nips-2009-Slow Learners are Fast