nips nips2007 nips2007-179 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Han Liu, Larry Wasserman, John D. Lafferty, Pradeep K. Ravikumar
Abstract: We present a new class of models for high-dimensional nonparametric regression and classification called sparse additive models (SpAM). Our methods combine ideas from sparse linear modeling and additive nonparametric regression. We derive a method for fitting the models that is effective even when the number of covariates is larger than the sample size. A statistical analysis of the properties of SpAM is given together with empirical results on synthetic and real data, showing that SpAM can be effective in fitting sparse nonparametric models in high dimensional data. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Our methods combine ideas from sparse linear modeling and additive nonparametric regression. [sent-2, score-0.426]
2 We derive a method for fitting the models that is effective even when the number of covariates is larger than the sample size. [sent-3, score-0.082]
3 A statistical analysis of the properties of SpAM is given together with empirical results on synthetic and real data, showing that SpAM can be effective in fitting sparse nonparametric models in high dimensional data. [sent-4, score-0.233]
4 1 Introduction: Substantial progress has been made recently on the problem of fitting high-dimensional linear regression models of the form $Y_i = X_i^T \beta + \epsilon_i$, for $i = 1, \dots, n$. [sent-5, score-0.09]
5 Finding an estimate of $\beta$ when $p > n$ that is both statistically well-behaved and computationally efficient has proved challenging; however, the lasso estimator (Tibshirani, 1996) has been remarkably successful. [sent-10, score-0.157]
6 The lasso estimator $\hat\beta$ minimizes the $\ell_1$-penalized sum of squares $\sum_i (Y_i - X_i^T \beta)^2 + \lambda \sum_{j=1}^p |\beta_j|$ (1), with the $\ell_1$ penalty $\|\beta\|_1$ encouraging sparse solutions in which many components $\beta_j$ are zero. [sent-11, score-0.338]
7 The empirical success of this estimator has recently been backed up by results confirming that it has strong theoretical properties; see (Greenshtein and Ritov, 2004; Zhao and Yu, 2007; Meinshausen and Yu, 2006; Wainwright, 2006). [sent-12, score-0.074]
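As a concrete reference point for the sparse linear case, here is a minimal coordinate-descent sketch of the penalized criterion in (1); this is our own illustration (the function names and the plain cyclic coordinate-descent scheme are our choices, not the paper's), and it assumes the columns of X have been standardized.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Illustrative coordinate descent for
    sum_i (y_i - x_i^T beta)^2 + lam * sum_j |beta_j|,
    assuming the columns of X are standardized."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)                    # precomputed column norms
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ beta + X[:, j] * beta[j]   # partial residual excluding j
            beta[j] = soft_threshold(X[:, j] @ r_j, lam / 2.0) / col_sq[j]
    return beta
```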
8 The nonparametric regression model $Y_i = m(X_i) + \epsilon_i$, where $m$ is a general smooth function, relaxes the strong assumptions made by a linear model, but is much more challenging in high dimensions. [sent-13, score-0.203]
9 An extension of the additive model is the functional ANOVA model $Y_i = \sum_{1 \le j \le p} m_j(X_{ij}) + \sum_{j < k} m_{j,k}(X_{ij}, X_{ik}) + \cdots + \epsilon_i$ (3), which allows interactions among the variables. [sent-15, score-0.246]
10 Unfortunately, additive models only have good statistical and computational behavior when the number of variables p is not large relative to the sample size n. [sent-16, score-0.247]
11 In this paper we introduce sparse additive models (SpAM) that extend the advantages of sparse linear models to the additive, nonparametric setting. [sent-17, score-0.515]
12 The underlying model is the same as in (2), but constraints are placed on the component functions $\{m_j\}_{1 \le j \le p}$ to simultaneously encourage smoothness of each component and sparsity across components; the penalty is similar to that used by the COSSO of Lin and Zhang (2006). [sent-18, score-0.361]
13 The SpAM estimation procedure we introduce allows the use of arbitrary nonparametric smoothing techniques, and in the case where the underlying component functions are linear, it reduces to the lasso. [sent-19, score-0.305]
14 It naturally extends to classification problems using generalized additive models. [sent-20, score-0.222]
15 We first present a population version of the procedure that intuitively suggests how sparsity is achieved. [sent-23, score-0.179]
16 In the following section we derive a backfitting procedure for solving this optimization problem in the finite sample setting. [sent-25, score-0.1]
17 To motivate our approach, we first consider a formulation that scales each component function $g_j$ by a scalar $\beta_j$, and then imposes an $\ell_1$ constraint on $\beta = (\beta_1, \dots, \beta_p)$. [sent-26, score-0.142]
18 We consider functions $f(x_1, \dots, x_p)$ that have an additive form: $f(x) = \sum_j f_j(x_j)$. [sent-40, score-0.222]
19 The standard additive model optimization problem, in the population setting, is $\min_{f_j \in \mathcal{H}_j,\, 1 \le j \le p} \mathbb{E}\big(Y - \sum_{j=1}^p f_j(X_j)\big)^2$ (5), where $m(x) = \mathbb{E}(Y \mid X = x)$ is the unknown regression function. [sent-41, score-0.358]
20 Now consider the following modification of this problem that imposes additional constraints: (P) $\min_{\beta \in \mathbb{R}^p,\, g_j \in \mathcal{H}_j} \mathbb{E}\big(Y - \sum_{j=1}^p \beta_j g_j(X_j)\big)^2$ (6a), subject to $\sum_{j=1}^p |\beta_j| \le L$ (6b) and $\mathbb{E}(g_j^2) = 1$, $j = 1, \dots, p$. [sent-42, score-0.033]
21 Intuitively, the constraint that $\beta$ lies in the $\ell_1$-ball $\{\beta : \|\beta\|_1 \le L\}$ encourages sparsity of the estimated $\beta$, just as for the parametric lasso. [sent-49, score-0.235]
22 When $\beta$ is sparse, the estimated additive function $f(x) = \sum_{j=1}^p f_j(x_j) = \sum_{j=1}^p \beta_j g_j(x_j)$ will also be sparse, meaning that many of the component functions $f_j(\cdot) = \beta_j g_j(\cdot)$ are identically zero. [sent-50, score-0.323]
23 While this optimization problem makes plain the role of $\ell_1$ regularization of $\beta$ in achieving sparsity, it has the unfortunate drawback of not being convex. [sent-52, score-0.072]
24 More specifically, while the optimization problem is convex in $\beta$ and $\{g_j\}$ separately, it is not convex in $\beta$ and $\{g_j\}$ jointly. [sent-53, score-0.107]
25 However, consider the following related optimization problem: (Q) $\min_{f_j \in \mathcal{H}_j} \mathbb{E}\big(Y - \sum_{j=1}^p f_j(X_j)\big)^2$ (7a), subject to $\sum_{j=1}^p \sqrt{\mathbb{E}(f_j^2(X_j))} \le L$ (7b) and $\mathbb{E}(f_j) = 0$, $j = 1, \dots, p$. [sent-54, score-0.045]
26 Moreover, the solutions to problems (P) and (Q) are equivalent: if $(\beta^*, g^*)$ optimizes (P), then $f_j^* = \beta_j^* g_j^*$ optimizes (Q); if $f^*$ optimizes (Q), then $\beta^* = (\|f_1^*\|_2, \dots, \|f_p^*\|_2)^T$ and $g_j^* = f_j^* / \|f_j^*\|_2$ optimize (P). [sent-59, score-0.25]
27 While optimization problem (Q) has the important virtue of being convex, the way it encourages sparsity is not intuitive; the following observation provides some insight. [sent-60, score-0.227]
28 Then the projection $\pi_{12} C$ onto the first two components is an $\ell_2$ ball. [sent-62, score-0.077]
29 However, the projection $\pi_{13} C$ onto the first and third components is an $\ell_1$ ball. [sent-63, score-0.109]
30 In this way, it can be seen that the constraint $\sum_j \|f_j\|_2 \le L$ acts as an $\ell_1$ constraint across components to encourage sparsity, while it acts as an $\ell_2$ constraint within components to encourage smoothness, as in a ridge regression penalty. [sent-64, score-0.448]
31 It is thus crucial that the norm $\|f_j\|_2$ appears in the constraint, and not its square $\|f_j\|_2^2$. [sent-65, score-0.029]
32 For the purposes of sparsity, this constraint could be replaced by $\sum_j \|f_j\|_q \le L$ for any $q \ge 1$. [sent-66, score-0.045]
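To make the "ℓ1 across components, ℓ2 within components" behavior concrete, the toy calculation below (ours, with made-up numbers) applies the group soft-thresholding factor $[1 - \lambda/\|f_j\|_2]_+$ that a sum-of-norms constraint of this kind induces: a component whose empirical norm falls below the threshold is zeroed out entirely, while larger components are only shrunk.

```python
import numpy as np

# Toy "component functions" evaluated at four sample points (made-up numbers).
components = {
    "f1": np.array([0.9, -1.1, 0.8, -0.7]),
    "f2": np.array([0.05, -0.02, 0.04, 0.01]),   # a small, nearly irrelevant component
    "f3": np.array([1.5, 1.2, -1.4, 1.3]),
}
lam = 0.3

for name, f in components.items():
    norm = np.sqrt(np.mean(f ** 2))        # empirical L2 norm of the component
    scale = max(0.0, 1.0 - lam / norm)     # group soft-threshold factor [1 - lam/norm]_+
    status = "zeroed" if scale == 0.0 else "shrunk by %.2f" % scale
    print(f"{name}: norm={norm:.3f} -> {status}")
```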
33 When the component functions are linear in the observed covariate values $(x_{1j}, \dots, x_{nj})$, the optimization problem reduces to the lasso. [sent-73, score-0.045]
34 The use of scaling coefficients together with a nonnegative garrote penalty, similar to our problem (P), is considered by Yuan (2007). [sent-74, score-0.072]
35 However, the component functions $g_j$ are fixed, so that the procedure is not asymptotically consistent. [sent-75, score-0.094]
36 The form of the optimization problem (Q) is similar to that of the COSSO for smoothing spline ANOVA models (Lin and Zhang, 2006); however, our method differs significantly from the COSSO, as discussed below. [sent-76, score-0.141]
37 3 A Backfitting Algorithm for SpAM: We now derive a coordinate descent algorithm for fitting a sparse additive model. [sent-78, score-0.311]
38 We write the Lagrangian for the optimization problem (Q) as $\mathcal{L}(f, \lambda, \mu) = \frac{1}{2}\mathbb{E}\big(Y - \sum_{j=1}^p f_j(X_j)\big)^2 + \lambda \sum_{j=1}^p \sqrt{\mathbb{E}(f_j^2(X_j))} + \sum_j \mu_j \mathbb{E}(f_j)$. [sent-80, score-0.045]
39 Letting $P_j = \mathbb{E}[R_j \mid X_j]$ denote the projection of the residual onto $\mathcal{H}_j$, the solution satisfies $\big(1 + \lambda/\sqrt{\mathbb{E}(f_j^2)}\big) f_j = P_j$ if $\sqrt{\mathbb{E}(P_j^2)} > \lambda$, and $f_j = 0$ otherwise (conditions (10) and (11)). [sent-84, score-0.112]
40 The SpAM backfitting algorithm takes as input the data $(X_i, Y_i)$ and a regularization parameter $\lambda$; then, for each $j = 1, \dots, p$: compute the residual $R_j = Y - \sum_{k \neq j} f_k(X_k)$; estimate the projection $P_j = \mathbb{E}[R_j \mid X_j]$ by smoothing, $\hat P_j = \mathcal{S}_j R_j$; estimate the norm $\hat s_j = \sqrt{\mathbb{E}[P_j^2]}$ using, for example, (15) or (35); soft-threshold, $\hat f_j = \big[1 - \lambda/\hat s_j\big]_+ \hat P_j$; and center, $\hat f_j \leftarrow \hat f_j - \mathrm{mean}(\hat f_j)$. [sent-92, score-0.061]
41 Output: the component functions $\hat f_j$ and the estimator $\hat m(X_i) = \sum_j \hat f_j(X_{ij})$. [sent-93, score-0.074]
42 Condition (11), in turn, implies $\big(1 + \lambda/\sqrt{\mathbb{E}(f_j^2)}\big)\sqrt{\mathbb{E}(f_j^2)} = \sqrt{\mathbb{E}(P_j^2)}$, or $\sqrt{\mathbb{E}(f_j^2)} = \sqrt{\mathbb{E}(P_j^2)} - \lambda$. [sent-95, score-0.025]
43 In the finite sample case, as in standard backfitting (Hastie and Tibshirani, 1999), we estimate the projection $\mathbb{E}[R_j \mid X_j]$ by a smooth of the residuals: $\hat P_j = \mathcal{S}_j R_j$ (14), where $\mathcal{S}_j$ is a linear smoother, such as a local linear or kernel smoother. [sent-97, score-0.084]
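Putting the pieces together, the following is a minimal sketch of the backfitting loop with a Nadaraya-Watson kernel smoother standing in for $\mathcal{S}_j$; it is our own illustrative implementation (the function names, bandwidth, fixed number of sweeps, and the synthetic usage example are assumptions), not the authors' code.

```python
import numpy as np

def kernel_smoother_matrix(x, bandwidth=0.1):
    """Nadaraya-Watson smoother matrix S for one covariate x of shape (n,),
    so that S @ r estimates E[r | x] at the observed points."""
    d = (x[:, None] - x[None, :]) / bandwidth
    K = np.exp(-0.5 * d ** 2)
    return K / K.sum(axis=1, keepdims=True)

def spam_backfit(X, y, lam, bandwidth=0.1, n_sweeps=20):
    """Illustrative SpAM backfitting: smooth, soft-threshold, center."""
    n, p = X.shape
    S = [kernel_smoother_matrix(X[:, j], bandwidth) for j in range(p)]
    f = np.zeros((n, p))                # fitted component functions at the data points
    y_c = y - y.mean()                  # work with a centered response
    for _ in range(n_sweeps):
        for j in range(p):
            r_j = y_c - f.sum(axis=1) + f[:, j]   # partial residual R_j
            p_j = S[j] @ r_j                      # smoothed projection estimate P_j
            s_j = np.sqrt(np.mean(p_j ** 2))      # estimate of sqrt(E[P_j^2])
            if s_j <= lam:
                f[:, j] = 0.0                     # soft-threshold zeroes the component
            else:
                f_j = (1.0 - lam / s_j) * p_j
                f[:, j] = f_j - f_j.mean()        # center
    return f

# Usage sketch on synthetic data (settings are ours):
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 10))
y = np.sin(2 * np.pi * X[:, 0]) + (X[:, 1] - 0.5) ** 2 + 0.1 * rng.standard_normal(200)
f_hat = spam_backfit(X, y, lam=0.05)
print("selected:", [j for j in range(10) if np.abs(f_hat[:, j]).max() > 1e-8])
```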
44 While the motivating optimization problem (Q) is similar to that considered in the COSSO (Lin and Zhang, 2006) for smoothing splines, the SpAM backfitting algorithm decouples smoothing and sparsity, through a combination of soft-thresholding and smoothing. [sent-102, score-0.237]
45 In particular, SpAM backfitting can be carried out with any nonparametric smoother; it is not restricted to splines. [sent-103, score-0.115]
46 Moreover, by iteratively estimating over the components and using soft thresholding, our procedure is simple to implement and scales to high dimensions. [sent-104, score-0.081]
47 3.1 SpAM for Nonparametric Logistic Regression: The SpAM backfitting procedure can be extended to nonparametric logistic regression for classification. [sent-106, score-0.296]
48 The additive logistic model is $P(Y = 1 \mid X) \equiv p(X; f) = \frac{\exp\big(\sum_{j=1}^p f_j(X_j)\big)}{1 + \exp\big(\sum_{j=1}^p f_j(X_j)\big)}$ (16), where $Y \in \{0, 1\}$, and the population log-likelihood is $\ell(f) = \mathbb{E}\big[Y f(X) - \log\big(1 + \exp f(X)\big)\big]$. [sent-107, score-0.342]
49 Recall that in the local scoring algorithm for generalized additive models (Hastie and Tibshirani, 1999), in the logistic case one runs the backfitting procedure within Newton's method. [sent-108, score-0.369]
50 The weighted smooth is given by $\hat P_j = \frac{\mathcal{S}_j(w R_j)}{\mathcal{S}_j w}$. [sent-110, score-0.027]
51 As in the unregularized case, this condition is nonlinear in $f$, and so we linearize the gradient of the log-likelihood around $f^0$. [sent-112, score-0.039]
52 This yields the linearized condition $\mathbb{E}\big[w(X)(f(X) - Z) \mid X_j\big] + \lambda v_j = 0$. [sent-113, score-0.039]
53 When $\mathbb{E}(f_j^2) \neq 0$, this implies the condition $\Big(\mathbb{E}[w \mid X_j] + \frac{\lambda}{\sqrt{\mathbb{E}(f_j^2)}}\Big) f_j(X_j) = \mathbb{E}(w R_j \mid X_j)$ (20). [sent-114, score-0.064]
54 In the finite sample case, in terms of the smoothing matrix $\mathcal{S}_j$, this becomes $\hat f_j = \frac{\mathcal{S}_j(w R_j)}{\mathcal{S}_j w + \lambda/\sqrt{\mathbb{E}(f_j^2)}}$ (21). [sent-115, score-0.121]
55 If $\|\mathcal{S}_j(w R_j)\|_2 < \lambda$, then $\hat f_j = 0$. [sent-116, score-0.04]
56 Otherwise, this implicit, nonlinear equation for $f_j$ cannot be solved explicitly, so we propose to iterate until convergence: $\hat f_j \leftarrow \frac{\mathcal{S}_j(w R_j)}{\mathcal{S}_j w + \lambda\sqrt{n}/\|\hat f_j\|_2}$ (22). [sent-117, score-0.066]
57 When $\lambda = 0$, this yields the standard local scoring update (18). [sent-118, score-0.067]
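One local-scoring pass with the thresholded update (21)-(22) can be sketched as follows; this is our own illustration (the smoother matrices, the initialization of a zeroed component from the unpenalized weighted smooth, and the fixed number of inner iterations are assumptions), not the authors' implementation.

```python
import numpy as np

def logistic_spam_sweep(S, f, y, lam, n_inner=10, eps=1e-6):
    """One Newton / local-scoring sweep for logistic SpAM (illustrative).
    S: list of p smoother matrices (n x n); f: (n, p) current component fits; y in {0, 1}."""
    n, p = f.shape
    eta = f.sum(axis=1)
    prob = 1.0 / (1.0 + np.exp(-eta))            # p(X; f)
    w = np.maximum(prob * (1.0 - prob), eps)     # Newton weights
    z = eta + (y - prob) / w                     # working (linearized) response
    for j in range(p):
        r_j = z - eta + f[:, j]                  # partial working residual
        num = S[j] @ (w * r_j)                   # S_j(w R_j)
        den = np.maximum(S[j] @ w, eps)          # S_j w
        if np.linalg.norm(num) < lam:
            f[:, j] = 0.0                        # thresholded to zero
        else:
            f_j = f[:, j] if np.linalg.norm(f[:, j]) > 0 else num / den
            for _ in range(n_inner):             # iterate the implicit update
                f_j = num / (den + lam * np.sqrt(n) / np.linalg.norm(f_j))
            f[:, j] = f_j - f_j.mean()
        eta = f.sum(axis=1)                      # refresh the additive predictor
    return f
```

Wrapping this sweep in an outer loop until the component fits stabilize corresponds to running backfitting inside Newton's method, as described above.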
58 An example of logistic SpAM is given in Section 5. [sent-119, score-0.09]
59 4 Properties of SpAM. 4.1 SpAM is Persistent: The notion of risk consistency, or persistence, was studied by Juditsky and Nemirovski (2000) and Greenshtein and Ritov (2004) in the context of linear models. [sent-121, score-0.027]
60 Let $(X, Y)$ denote a new pair (independent of the observed data) and define the predictive risk when predicting $Y$ with $f(X)$ by $R(f) = \mathbb{E}(Y - f(X))^2$ (23). [sent-122, score-0.027]
61 Since we consider predictors of the form $f(x) = \sum_j \beta_j g_j(x_j)$, we also write the risk as $R(\beta, g)$, where $\beta = (\beta_1, \dots, \beta_p)$. [sent-123, score-0.027]
62 Following Greenshtein and Ritov (2004), we say that an estimator $\hat m_n$ is persistent relative to a class of functions $\mathcal{M}_n$ if $R(\hat m_n) - R(m_n^*) \xrightarrow{P} 0$ (24), where $m_n^* = \operatorname{argmin}_{f \in \mathcal{M}_n} R(f)$ is the predictive oracle. [sent-130, score-0.15]
63 Greenshtein and Ritov (2004) showed that the lasso is persistent for the class of linear models $\mathcal{M}_n = \{f(x) = x^T \beta : \|\beta\|_1 \le L_n\}$ if $L_n = o\big((n/\log n)^{1/4}\big)$. [sent-131, score-0.159]
64 Then SpAM is persistent relative to the class of additive models $\mathcal{M}_n = \big\{f(x) = \sum_{j=1}^p \beta_j g_j(x_j) : \|\beta\|_1 \le L_n\big\}$ if $L_n = o\big(n^{(1-\xi)/4}\big)$. [sent-136, score-0.298]
65 We show a similar result for SpAM with the sparse backfitting procedure. [sent-139, score-0.089]
66 For the purpose of analysis, we use orthogonal function regression as the smoothing procedure. [sent-140, score-0.181]
67 We truncate the basis to finite dimension $d_n$, and let $d_n \to \infty$ such that $d_n/n \to 0$. [sent-145, score-0.24]
68 The SpAM optimization problem can then be written as $\min_\beta \frac{1}{2n}\big\|Y - \sum_{j=1}^p \Psi_j \beta_j\big\|^2 + \lambda_n \sum_{j=1}^p \sqrt{\frac{1}{n}\beta_j^T \Psi_j^T \Psi_j \beta_j}$ (25), where $\Psi_j$ denotes the matrix of truncated basis functions evaluated at the data for the $j$-th covariate and each $\beta_j$ is a $d$-dimensional vector. [sent-151, score-0.045]
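In this truncated-basis form, (25) has the structure of a group lasso over the coefficient blocks $\beta_j$. The sketch below is ours and additionally assumes each basis matrix satisfies $\Psi_j^T \Psi_j / n = I$ (as with the orthogonal bases used for the analysis), so the penalty reduces to $\lambda_n \sum_j \|\beta_j\|_2$ and a proximal-gradient step becomes block soft-thresholding.

```python
import numpy as np

def spam_basis_fit(Psis, y, lam, n_iter=500):
    """Illustrative proximal-gradient (ISTA) solver for the truncated-basis SpAM
    objective, under the assumption Psi_j.T @ Psi_j / n = I, so the penalty
    is lam * sum_j ||beta_j||_2 (a group lasso)."""
    n = len(y)
    betas = [np.zeros(Psi.shape[1]) for Psi in Psis]
    # Safe step size from an upper bound on the Lipschitz constant of the smooth part.
    L = sum(np.linalg.norm(Psi, 2) ** 2 for Psi in Psis) / n
    step = 1.0 / L
    for _ in range(n_iter):
        resid = y - sum(Psi @ b for Psi, b in zip(Psis, betas))
        for j, Psi in enumerate(Psis):
            z = betas[j] + step * (Psi.T @ resid) / n        # gradient step on block j
            nz = np.linalg.norm(z)
            betas[j] = max(0.0, 1.0 - step * lam / nz) * z if nz > 0 else z
    return betas
```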
69 Let $\hat S_n = \{j : \hat\beta_j \neq 0\}$ denote the estimated set of variables from the minimizer $\hat\beta_n$ of (25). [sent-153, score-0.037]
70 5 Experiments: In this section we present experimental results for SpAM applied to both synthetic and real data, including regression and classification examples that illustrate the behavior of the algorithm in various conditions. [sent-158, score-0.061]
71 We first use simulated data to investigate the performance of the SpAM backfitting algorithm, where the true sparsity pattern is known. [sent-159, score-0.147]
72 If not explicitly stated otherwise, the data are always rescaled to lie in the d-dimensional cube $[0, 1]^d$, and a kernel smoother with a Gaussian kernel is used. [sent-161, score-0.044]
73 To tune the penalization parameter $\lambda$, we use a $C_p$ statistic, which is defined as $C_p(\hat f) = \frac{1}{n}\sum_{i=1}^n \big(Y_i - \sum_{j=1}^p \hat f_j(X_{ij})\big)^2 + \frac{2\hat\sigma^2}{n}\sum_{j=1}^p \mathrm{trace}(\mathcal{S}_j)\,\mathbf{1}[\hat f_j \neq 0]$ (29), where $\mathcal{S}_j$ is the smoothing matrix for the $j$-th dimension and $\hat\sigma^2$ is the estimated variance. [sent-162, score-0.218]
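As a direct transcription of the reconstructed criterion (29), a scoring helper might look like the following (names are ours; the noise variance is assumed to be estimated elsewhere), with $\lambda$ then chosen by minimizing this score over a grid of candidate values.

```python
import numpy as np

def spam_cp(y, f_hat, smoother_mats, sigma2):
    """C_p score for a fitted SpAM model.
    y: (n,) responses; f_hat: (n, p) fitted components evaluated at the data;
    smoother_mats: list of p smoother matrices S_j; sigma2: estimated noise variance."""
    n, p = f_hat.shape
    rss = np.mean((y - f_hat.sum(axis=1)) ** 2)
    active = [j for j in range(p) if np.any(np.abs(f_hat[:, j]) > 1e-12)]
    df = sum(np.trace(smoother_mats[j]) for j in active)
    return rss + 2.0 * sigma2 * df / n
```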
75 Figure 2: (Simulated data) Upper left: The empirical $\ell_2$ norm of the estimated components plotted against the tuning parameter $\lambda$; the value on the x-axis is proportional to $\sum_j \|\hat f_j\|_2$. [sent-235, score-0.117]
76 Upper center: The $C_p$ scores against the tuning parameter $\lambda$; the dashed vertical line corresponds to the value of $\lambda$ with the smallest $C_p$ score. [sent-236, score-0.025]
77 Upper right: The proportion of 200 trials where the correct relevant variables are selected, as a function of sample size n. [sent-237, score-0.025]
78 Lower (from left to right): Estimated (solid lines) versus true additive component functions (dashed lines) for the first 6 dimensions; the remaining components are zero. [sent-238, score-0.337]
79 5.2 Boston Housing: The Boston housing data were collected to study house values in the suburbs of Boston; there are altogether 506 observations with 10 covariates. [sent-240, score-0.094]
80 To explore the sparsistency properties of our method, we add 20 irrelevant variables. [sent-243, score-0.075]
81 Ten of them are randomly drawn from Uniform(0, 1); the remaining ten are a random permutation of the original ten covariates, so that they have the same empirical densities. [sent-244, score-0.062]
82 It correctly zeros out both types of irrelevant variables. [sent-247, score-0.085]
83 The importance of the variables nox and b is borderline. [sent-249, score-0.054]
84 However, using $C_p$ as the selection criterion, the variables indus, age, dist, and tax are estimated to be irrelevant, a result not seen in other studies. [sent-252, score-0.12]
85 5.3 SpAM for Spam: Here we consider an email spam classification problem, using the logistic SpAM backfitting algorithm from Section 3. [sent-254, score-0.902]
86 This problem was previously studied by Hastie et al. (2001), using a set of 3,065 emails as a training set and conducting hypothesis tests to choose significant variables; there are a total of 4,601 observations with p = 57 attributes, all numeric. [sent-257, score-0.043]
87 The attributes measure the percentage of specific words or characters in the email, the average and maximum run lengths of upper case letters, and the total number of such letters. [sent-258, score-0.026]
88 To demonstrate how SpAM performs well with sparse data, we only sample n = 300 emails as the training set, with the remaining 4301 data points used as the test set. [sent-259, score-0.157]
89 We also use the test data as the hold-out set to tune the penalization parameter λ. [sent-260, score-0.085]
90 The results of a typical run of logistic SpAM are summarized in Figure 4, using plug-in bandwidths. [sent-261, score-0.09]
91 Figure 3: (Boston housing) Left: The empirical $\ell_2$ norm of the estimated components versus the regularization parameter $\lambda$. [sent-323, score-0.144]
92 Center: The $C_p$ scores against $\lambda$; the dashed vertical line corresponds to the best $C_p$ score. [sent-324, score-0.025]
93 Figure 4: (Email spam) Classification accuracies and variable selection for logistic SpAM, plotted against the penalization parameter. [sent-334, score-0.176]
94 Greenshtein, E. and Ritov, Y. (2004). Persistency in high dimensional linear predictor-selection and the virtue of over-parametrization. Bernoulli. [sent-339, score-0.058]
95 Wainwright, M. J. (2006). Sharp thresholds for high-dimensional and noisy recovery of sparsity. Technical report, UC Berkeley. [sent-388, score-0.044]
wordName wordTfidf (topN-words)
[('spam', 0.752), ('additive', 0.222), ('tting', 0.13), ('sparsity', 0.119), ('back', 0.115), ('nonparametric', 0.115), ('cosso', 0.108), ('rdle', 0.108), ('boston', 0.1), ('smoothing', 0.096), ('greenshtein', 0.094), ('housing', 0.094), ('logistic', 0.09), ('sparse', 0.089), ('pj', 0.086), ('ritov', 0.086), ('lasso', 0.083), ('ibshirani', 0.081), ('sjw', 0.081), ('dn', 0.08), ('persistent', 0.076), ('estimator', 0.074), ('sparsistent', 0.071), ('anova', 0.071), ('supp', 0.064), ('mn', 0.064), ('component', 0.064), ('yi', 0.062), ('regression', 0.061), ('email', 0.06), ('penalization', 0.057), ('covariates', 0.057), ('lin', 0.056), ('tibshirani', 0.055), ('astie', 0.054), ('lstat', 0.054), ('nox', 0.054), ('ptratio', 0.054), ('tax', 0.054), ('irrelevant', 0.051), ('hastie', 0.051), ('components', 0.051), ('optimizes', 0.05), ('zhang', 0.05), ('encourage', 0.048), ('garrote', 0.047), ('cmax', 0.047), ('cmin', 0.047), ('constraint', 0.045), ('optimization', 0.045), ('recovery', 0.044), ('smoother', 0.044), ('emails', 0.043), ('penalty', 0.041), ('fj', 0.04), ('condition', 0.039), ('wainwright', 0.038), ('age', 0.038), ('estimated', 0.037), ('cp', 0.036), ('zeros', 0.034), ('subgradient', 0.034), ('encourages', 0.034), ('imposes', 0.033), ('projection', 0.032), ('convex', 0.031), ('ten', 0.031), ('population', 0.03), ('procedure', 0.03), ('norm', 0.029), ('selection', 0.029), ('virtue', 0.029), ('dimensional', 0.029), ('simulated', 0.028), ('stationary', 0.028), ('norms', 0.028), ('tune', 0.028), ('scoring', 0.027), ('acts', 0.027), ('risk', 0.027), ('lagrangian', 0.027), ('residual', 0.027), ('regularization', 0.027), ('smooth', 0.027), ('iterate', 0.026), ('attributes', 0.026), ('onto', 0.026), ('sn', 0.026), ('sample', 0.025), ('implies', 0.025), ('nonnegative', 0.025), ('smoothness', 0.025), ('dashed', 0.025), ('uc', 0.024), ('orthogonal', 0.024), ('functional', 0.024), ('riedman', 0.024), ('pradeep', 0.024), ('ravikumar', 0.024), ('sparsistency', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 179 nips-2007-SpAM: Sparse Additive Models
Author: Han Liu, Larry Wasserman, John D. Lafferty, Pradeep K. Ravikumar
Abstract: We present a new class of models for high-dimensional nonparametric regression and classification called sparse additive models (SpAM). Our methods combine ideas from sparse linear modeling and additive nonparametric regression. We derive a method for fitting the models that is effective even when the number of covariates is larger than the sample size. A statistical analysis of the properties of SpAM is given together with empirical results on synthetic and real data, showing that SpAM can be effective in fitting sparse nonparametric models in high dimensional data. 1
2 0.16962747 62 nips-2007-Convex Learning with Invariances
Author: Choon H. Teo, Amir Globerson, Sam T. Roweis, Alex J. Smola
Abstract: Incorporating invariances into a learning algorithm is a common problem in machine learning. We provide a convex formulation which can deal with arbitrary loss functions and arbitrary losses. In addition, it is a drop-in replacement for most optimization algorithms for kernels, including solvers of the SVMStruct family. The advantage of our setting is that it relies on column generation instead of modifying the underlying optimization problem directly. 1
3 0.11461718 99 nips-2007-Hierarchical Penalization
Author: Marie Szafranski, Yves Grandvalet, Pierre Morizet-mahoudeaux
Abstract: Hierarchical penalization is a generic framework for incorporating prior information in the fitting of statistical models, when the explicative variables are organized in a hierarchical structure. The penalizer is a convex functional that performs soft selection at the group level, and shrinks variables within each group. This favors solutions with few leading terms in the final combination. The framework, originally derived for taking prior knowledge into account, is shown to be useful in linear regression, when several parameters are used to model the influence of one feature, or in kernel regression, for learning multiple kernels. Keywords – Optimization: constrained and convex optimization. Supervised learning: regression, kernel methods, sparsity and feature selection. 1
4 0.10607809 53 nips-2007-Compressed Regression
Author: Shuheng Zhou, Larry Wasserman, John D. Lafferty
Abstract: Recent research has studied the role of sparsity in high dimensional regression and signal reconstruction, establishing theoretical limits for recovering sparse models from sparse data. In this paper we study a variant of this problem where the original n input variables are compressed by a random linear transformation to m n examples in p dimensions, and establish conditions under which a sparse linear model can be successfully recovered from the compressed data. A primary motivation for this compression procedure is to anonymize the data and preserve privacy by revealing little information about the original data. We characterize the number of random projections that are required for 1 -regularized compressed regression to identify the nonzero coefficients in the true model with probability approaching one, a property called “sparsistence.” In addition, we show that 1 -regularized compressed regression asymptotically predicts as well as an oracle linear model, a property called “persistence.” Finally, we characterize the privacy properties of the compression procedure in information-theoretic terms, establishing upper bounds on the rate of information communicated between the compressed and uncompressed data that decay to zero. 1
5 0.075386167 43 nips-2007-Catching Change-points with Lasso
Author: Céline Levy-leduc, Zaïd Harchaoui
Abstract: We propose a new approach for dealing with the estimation of the location of change-points in one-dimensional piecewise constant signals observed in white noise. Our approach consists in reframing this task in a variable selection context. We use a penalized least-squares criterion with a 1 -type penalty for this purpose. We prove some theoretical results on the estimated change-points and on the underlying piecewise constant estimated function. Then, we explain how to implement this method in practice by combining the LAR algorithm and a reduced version of the dynamic programming algorithm and we apply it to synthetic and real data. 1
6 0.071430095 8 nips-2007-A New View of Automatic Relevance Determination
7 0.07042633 186 nips-2007-Statistical Analysis of Semi-Supervised Regression
8 0.065604582 181 nips-2007-Sparse Overcomplete Latent Variable Decomposition of Counts Data
10 0.058244042 111 nips-2007-Learning Horizontal Connections in a Sparse Coding Model of Natural Images
11 0.054054819 192 nips-2007-Testing for Homogeneity with Kernel Fisher Discriminant Analysis
12 0.053903904 145 nips-2007-On Sparsity and Overcompleteness in Image Models
13 0.051284675 65 nips-2007-DIFFRAC: a discriminative and flexible framework for clustering
14 0.04501674 148 nips-2007-Online Linear Regression and Its Application to Model-Based Reinforcement Learning
15 0.042345461 32 nips-2007-Bayesian Co-Training
16 0.042306274 173 nips-2007-Second Order Bilinear Discriminant Analysis for single trial EEG analysis
17 0.040877957 180 nips-2007-Sparse Feature Learning for Deep Belief Networks
18 0.040150948 10 nips-2007-A Randomized Algorithm for Large Scale Support Vector Learning
19 0.039359689 195 nips-2007-The Generalized FITC Approximation
20 0.038795453 11 nips-2007-A Risk Minimization Principle for a Class of Parzen Estimators
topicId topicWeight
[(0, -0.155), (1, 0.035), (2, -0.053), (3, 0.064), (4, -0.008), (5, 0.027), (6, -0.018), (7, 0.001), (8, -0.065), (9, 0.068), (10, 0.117), (11, -0.026), (12, 0.086), (13, 0.001), (14, 0.078), (15, 0.022), (16, 0.073), (17, 0.075), (18, 0.014), (19, -0.127), (20, -0.059), (21, -0.042), (22, 0.106), (23, 0.011), (24, 0.033), (25, 0.071), (26, -0.094), (27, 0.053), (28, -0.091), (29, 0.09), (30, 0.05), (31, 0.037), (32, -0.055), (33, 0.128), (34, 0.004), (35, 0.016), (36, -0.088), (37, 0.049), (38, 0.148), (39, 0.008), (40, -0.086), (41, 0.017), (42, -0.071), (43, -0.164), (44, -0.04), (45, -0.064), (46, -0.063), (47, -0.053), (48, -0.112), (49, 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.92676479 179 nips-2007-SpAM: Sparse Additive Models
Author: Han Liu, Larry Wasserman, John D. Lafferty, Pradeep K. Ravikumar
Abstract: We present a new class of models for high-dimensional nonparametric regression and classification called sparse additive models (SpAM). Our methods combine ideas from sparse linear modeling and additive nonparametric regression. We derive a method for fitting the models that is effective even when the number of covariates is larger than the sample size. A statistical analysis of the properties of SpAM is given together with empirical results on synthetic and real data, showing that SpAM can be effective in fitting sparse nonparametric models in high dimensional data. 1
2 0.79989535 53 nips-2007-Compressed Regression
Author: Shuheng Zhou, Larry Wasserman, John D. Lafferty
Abstract: Recent research has studied the role of sparsity in high dimensional regression and signal reconstruction, establishing theoretical limits for recovering sparse models from sparse data. In this paper we study a variant of this problem where the original n input variables are compressed by a random linear transformation to m n examples in p dimensions, and establish conditions under which a sparse linear model can be successfully recovered from the compressed data. A primary motivation for this compression procedure is to anonymize the data and preserve privacy by revealing little information about the original data. We characterize the number of random projections that are required for 1 -regularized compressed regression to identify the nonzero coefficients in the true model with probability approaching one, a property called “sparsistence.” In addition, we show that 1 -regularized compressed regression asymptotically predicts as well as an oracle linear model, a property called “persistence.” Finally, we characterize the privacy properties of the compression procedure in information-theoretic terms, establishing upper bounds on the rate of information communicated between the compressed and uncompressed data that decay to zero. 1
3 0.67078161 99 nips-2007-Hierarchical Penalization
Author: Marie Szafranski, Yves Grandvalet, Pierre Morizet-mahoudeaux
Abstract: Hierarchical penalization is a generic framework for incorporating prior information in the fitting of statistical models, when the explicative variables are organized in a hierarchical structure. The penalizer is a convex functional that performs soft selection at the group level, and shrinks variables within each group. This favors solutions with few leading terms in the final combination. The framework, originally derived for taking prior knowledge into account, is shown to be useful in linear regression, when several parameters are used to model the influence of one feature, or in kernel regression, for learning multiple kernels. Keywords – Optimization: constrained and convex optimization. Supervised learning: regression, kernel methods, sparsity and feature selection. 1
4 0.65769881 43 nips-2007-Catching Change-points with Lasso
Author: Céline Levy-leduc, Zaïd Harchaoui
Abstract: We propose a new approach for dealing with the estimation of the location of change-points in one-dimensional piecewise constant signals observed in white noise. Our approach consists in reframing this task in a variable selection context. We use a penalized least-squares criterion with a 1 -type penalty for this purpose. We prove some theoretical results on the estimated change-points and on the underlying piecewise constant estimated function. Then, we explain how to implement this method in practice by combining the LAR algorithm and a reduced version of the dynamic programming algorithm and we apply it to synthetic and real data. 1
5 0.57931882 62 nips-2007-Convex Learning with Invariances
Author: Choon H. Teo, Amir Globerson, Sam T. Roweis, Alex J. Smola
Abstract: Incorporating invariances into a learning algorithm is a common problem in machine learning. We provide a convex formulation which can deal with arbitrary loss functions and arbitrary losses. In addition, it is a drop-in replacement for most optimization algorithms for kernels, including solvers of the SVMStruct family. The advantage of our setting is that it relies on column generation instead of modifying the underlying optimization problem directly. 1
6 0.50077081 8 nips-2007-A New View of Automatic Relevance Determination
7 0.46610758 82 nips-2007-Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization
8 0.45304167 67 nips-2007-Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation
9 0.42143285 186 nips-2007-Statistical Analysis of Semi-Supervised Regression
10 0.37932348 89 nips-2007-Feature Selection Methods for Improving Protein Structure Prediction with Rosetta
11 0.35877663 159 nips-2007-Progressive mixture rules are deviation suboptimal
12 0.34382743 111 nips-2007-Learning Horizontal Connections in a Sparse Coding Model of Natural Images
14 0.33985043 118 nips-2007-Learning with Transformation Invariant Kernels
15 0.3343133 178 nips-2007-Simulated Annealing: Rigorous finite-time guarantees for optimization on continuous domains
16 0.33217421 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine
17 0.32097632 145 nips-2007-On Sparsity and Overcompleteness in Image Models
18 0.3170763 88 nips-2007-Fast and Scalable Training of Semi-Supervised CRFs with Application to Activity Recognition
19 0.3080543 180 nips-2007-Sparse Feature Learning for Deep Belief Networks
20 0.30493194 40 nips-2007-Bundle Methods for Machine Learning
topicId topicWeight
[(4, 0.288), (5, 0.048), (13, 0.033), (16, 0.039), (19, 0.026), (21, 0.073), (31, 0.013), (34, 0.02), (35, 0.038), (47, 0.109), (49, 0.032), (83, 0.1), (85, 0.027), (87, 0.013), (90, 0.058)]
simIndex simValue paperId paperTitle
1 0.77591354 74 nips-2007-EEG-Based Brain-Computer Interaction: Improved Accuracy by Automatic Single-Trial Error Detection
Author: Pierre Ferrez, José Millán
Abstract: Brain-computer interfaces (BCIs), as any other interaction modality based on physiological signals and body channels (e.g., muscular activity, speech and gestures), are prone to errors in the recognition of subject’s intent. An elegant approach to improve the accuracy of BCIs consists in a verification procedure directly based on the presence of error-related potentials (ErrP) in the EEG recorded right after the occurrence of an error. Six healthy volunteer subjects with no prior BCI experience participated in a new human-robot interaction experiment where they were asked to mentally move a cursor towards a target that can be reached within a few steps using motor imagination. This experiment confirms the previously reported presence of a new kind of ErrP. These “Interaction ErrP” exhibit a first sharp negative peak followed by a positive peak and a second broader negative peak (∼290, ∼350 and ∼470 ms after the feedback, respectively). But in order to exploit these ErrP we need to detect them in each single trial using a short window following the feedback associated to the response of the classifier embedded in the BCI. We have achieved an average recognition rate of correct and erroneous single trials of 81.8% and 76.2%, respectively. Furthermore, we have achieved an average recognition rate of the subject’s intent while trying to mentally drive the cursor of 73.1%. These results show that it’s possible to simultaneously extract useful information for mental control to operate a brain-actuated device as well as cognitive states such as error potentials to improve the quality of the braincomputer interaction. Finally, using a well-known inverse model (sLORETA), we show that the main focus of activity at the occurrence of the ErrP are, as expected, in the pre-supplementary motor area and in the anterior cingulate cortex. 1
same-paper 2 0.74279273 179 nips-2007-SpAM: Sparse Additive Models
Author: Han Liu, Larry Wasserman, John D. Lafferty, Pradeep K. Ravikumar
Abstract: We present a new class of models for high-dimensional nonparametric regression and classification called sparse additive models (SpAM). Our methods combine ideas from sparse linear modeling and additive nonparametric regression. We derive a method for fitting the models that is effective even when the number of covariates is larger than the sample size. A statistical analysis of the properties of SpAM is given together with empirical results on synthetic and real data, showing that SpAM can be effective in fitting sparse nonparametric models in high dimensional data. 1
3 0.7290253 39 nips-2007-Boosting the Area under the ROC Curve
Author: Phil Long, Rocco Servedio
Abstract: We show that any weak ranker that can achieve an area under the ROC curve slightly better than 1/2 (which can be achieved by random guessing) can be efficiently boosted to achieve an area under the ROC curve arbitrarily close to 1. We further show that this boosting can be performed even in the presence of independent misclassification noise, given access to a noise-tolerant weak ranker.
4 0.54841626 167 nips-2007-Regulator Discovery from Gene Expression Time Series of Malaria Parasites: a Hierachical Approach
Author: José M. Hernández-lobato, Tjeerd Dijkstra, Tom Heskes
Abstract: We introduce a hierarchical Bayesian model for the discovery of putative regulators from gene expression data only. The hierarchy incorporates the knowledge that there are just a few regulators that by themselves only regulate a handful of genes. This is implemented through a so-called spike-and-slab prior, a mixture of Gaussians with different widths, with mixing weights from a hierarchical Bernoulli model. For efficient inference we implemented expectation propagation. Running the model on a malaria parasite data set, we found four genes with significant homology to transcription factors in an amoebe, one RNA regulator and three genes of unknown function (out of the top ten genes considered).
5 0.53775436 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
Author: Matthias Bethge, Philipp Berens
Abstract: Maximum entropy analysis of binary variables provides an elegant way for studying the role of pairwise correlations in neural populations. Unfortunately, these approaches suffer from their poor scalability to high dimensions. In sensory coding, however, high-dimensional data is ubiquitous. Here, we introduce a new approach using a near-maximum entropy model, that makes this type of analysis feasible for very high-dimensional data—the model parameters can be derived in closed form and sampling is easy. Therefore, our NearMaxEnt approach can serve as a tool for testing predictions from a pairwise maximum entropy model not only for low-dimensional marginals, but also for high dimensional measurements of more than thousand units. We demonstrate its usefulness by studying natural images with dichotomized pixel intensities. Our results indicate that the statistics of such higher-dimensional measurements exhibit additional structure that are not predicted by pairwise correlations, despite the fact that pairwise correlations explain the lower-dimensional marginal statistics surprisingly well up to the limit of dimensionality where estimation of the full joint distribution is feasible. 1
6 0.52869475 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data
7 0.52767378 158 nips-2007-Probabilistic Matrix Factorization
8 0.52665657 18 nips-2007-A probabilistic model for generating realistic lip movements from speech
9 0.52460623 53 nips-2007-Compressed Regression
10 0.52384114 86 nips-2007-Exponential Family Predictive Representations of State
11 0.52291042 195 nips-2007-The Generalized FITC Approximation
12 0.52237856 24 nips-2007-An Analysis of Inference with the Universum
13 0.52203768 100 nips-2007-Hippocampal Contributions to Control: The Third Way
14 0.5213964 122 nips-2007-Locality and low-dimensions in the prediction of natural experience from fMRI
15 0.52089489 28 nips-2007-Augmented Functional Time Series Representation and Forecasting with Gaussian Processes
16 0.52074379 63 nips-2007-Convex Relaxations of Latent Variable Training
17 0.51953858 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning
18 0.51945847 79 nips-2007-Efficient multiple hyperparameter learning for log-linear models
19 0.5191828 156 nips-2007-Predictive Matrix-Variate t Models
20 0.51914144 174 nips-2007-Selecting Observations against Adversarial Objectives