nips nips2007 nips2007-53 knowledge-graph by maker-knowledge-mining

53 nips-2007-Compressed Regression


Source: pdf

Author: Shuheng Zhou, Larry Wasserman, John D. Lafferty

Abstract: Recent research has studied the role of sparsity in high dimensional regression and signal reconstruction, establishing theoretical limits for recovering sparse models from sparse data. In this paper we study a variant of this problem where the original n input variables are compressed by a random linear transformation to m ≪ n examples in p dimensions, and establish conditions under which a sparse linear model can be successfully recovered from the compressed data. A primary motivation for this compression procedure is to anonymize the data and preserve privacy by revealing little information about the original data. We characterize the number of random projections that are required for ℓ1-regularized compressed regression to identify the nonzero coefficients in the true model with probability approaching one, a property called “sparsistence.” In addition, we show that ℓ1-regularized compressed regression asymptotically predicts as well as an oracle linear model, a property called “persistence.” Finally, we characterize the privacy properties of the compression procedure in information-theoretic terms, establishing upper bounds on the rate of information communicated between the compressed and uncompressed data that decay to zero. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 In this paper we study a variant of this problem where the original n input variables are compressed by a random linear transformation to m ≪ n examples in p dimensions, and establish conditions under which a sparse linear model can be successfully recovered from the compressed data. [sent-2, score-1.585]

2 A primary motivation for this compression procedure is to anonymize the data and preserve privacy by revealing little information about the original data. [sent-3, score-0.423]

3 We characterize the number of random projections that are required for ℓ1-regularized compressed regression to identify the nonzero coefficients in the true model with probability approaching one, a property called “sparsistence.” [sent-4, score-0.804]

4 In addition, we show that ℓ1-regularized compressed regression asymptotically predicts as well as an oracle linear model, a property called “persistence.” [sent-5, score-0.838]

5 Finally, we characterize the privacy properties of the compression procedure in information-theoretic terms, establishing upper bounds on the rate of information communicated between the compressed and uncompressed data that decay to zero. [sent-6, score-1.503]

6 It is often important to allow researchers to analyze data without compromising the privacy of customers or leaking confidential information outside the organization. [sent-10, score-0.264]

7 In this paper we show that sparse regression for high dimensional data can be carried out directly on a compressed form of the data, in a manner that can be shown to guard privacy in an information theoretic sense. [sent-11, score-1.192]

8 These compressed data can then be made available for statistical analyses; we focus on the problem of sparse linear regression for high dimensional data. [sent-13, score-0.912]

9 Informally, our theory ensures that the relevant predictors can be learned from the compressed data as well as they could be from the original uncompressed data. [sent-14, score-1.07]

10 However, the original data are not recoverable from the compressed data, and the compressed data effectively reveal no more information than would be revealed by a completely new sample. [sent-16, score-1.486]

11 At the same time, the inference algorithms run faster and require fewer resources than they would on the much larger uncompressed data. [sent-17, score-0.312]

12 The data are compressed by a random linear transformation X → X̃ ≡ ΦX, where Φ is a random m × n matrix with m ≪ n. [sent-21, score-0.774]

13 Such transformations have been called “matrix masking” in the privacy literature [6]. [sent-23, score-0.28]

14 However, even with the additive noise Δ = 0 and Φ known, recovering X from X̃ requires solving a highly under-determined linear system and comes with information-theoretic privacy guarantees, as we demonstrate. [sent-26, score-0.37]

15 In compressed regression, we assume that the response is also compressed, resulting in the transformed response Ỹ ∈ Rᵐ given by Y → Ỹ ≡ ΦY = ΦXβ + Φε = X̃β + ε̃. [sent-28, score-0.74]
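
A minimal sketch of this compression step (illustrative only, not the authors' code: the Gaussian projection with N(0, 1/n) entries matches the ensemble described later, while the toy dimensions and data are assumptions of the example):

```python
import numpy as np

def compress(X, Y, m, rng=None):
    """Matrix masking by a random linear map: returns (Phi @ X, Phi @ Y), where Phi is an
    m x n Gaussian matrix with i.i.d. N(0, 1/n) entries, so only m << n rows remain."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    Phi = rng.normal(0.0, 1.0 / np.sqrt(n), size=(m, n))  # m x n random projection
    return Phi @ X, Phi @ Y

# toy usage: n = 1000 original observations compressed to m = 100
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
Y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=1000)
X_tilde, Y_tilde = compress(X, Y, m=100, rng=1)
print(X_tilde.shape, Y_tilde.shape)   # (100, 20) (100,)
```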

16 In the sparse setting, the parameter β ∈ Rᵖ is sparse, with a relatively small number s = ‖β‖₀ of nonzero coefficients in β. [sent-33, score-0.086]

17 The method we focus on is ℓ1-regularized least squares, also known as the lasso [17]. [sent-34, score-0.352]

18 We study the ability of the compressed lasso estimator to identify the correct sparse set of relevant variables and to predict well. [sent-35, score-1.201]

19 Our first result shows that the lasso is sparsistent under compression, meaning that the correct sparse set of relevant variables is identified asymptotically. [sent-37, score-0.501]

20 Our second result shows that the lasso is persistent under compression. [sent-40, score-0.432]

21 Roughly speaking, persistence [10] means that the procedure predicts well, as measured by the predictive risk R(β) = E(Y − βᵀX)², where X ∈ Rᵖ is a new input vector and Y is the associated response. [sent-41, score-0.21]

22 Persistence under compression means that R(βn,m) − inf{β : ‖β‖1 ≤ Ln,m} R(β) → 0 in probability as n → ∞; our third result analyzes the privacy properties of compressed regression. [sent-45, score-0.997]

23 We evaluate privacy in information-theoretic terms by bounding the average mutual information I(X; X̃)/np per matrix entry in the original data matrix X, which can be viewed as a communication rate. [sent-46, score-0.444]

24 Bounding this mutual information is intimately connected with the problem of computing the channel capacity of certain multiple-antenna wireless communication systems [13]. [sent-47, score-0.186]

25 The rate at which information about X is revealed by the compressed data X̃ satisfies rn,m = sup I(X; X̃)/np = O(m/n) → 0, where the supremum is over distributions on the original data X. [sent-50, score-0.889]

26 As summarized by these results, compressed regression is a practical procedure for sparse learning in high dimensional data that has provably good properties. [sent-51, score-0.896]

27 Analyses of sparsistence, persistence, and privacy properties appear in Sections 3–5. [sent-53, score-0.386]

28 Simulations for sparsistence and persistence of the compressed lasso are presented in Section 6. [sent-54, score-1.252]

29 (Section 2: Background and Related Work.) In this section we briefly review related work in high dimensional statistical inference, compressed sensing, and privacy, to place our work in context. [sent-58, score-0.773]

30 An estimator that has received much attention in the recent literature is the ℓ1-penalized lasso βn [17], defined as βn = arg minβ (1/2n)‖Y − Xβ‖2² + λn‖β‖1, where λn is a regularization parameter. [sent-60, score-0.5]
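
As a concrete reading of this objective, here is a minimal proximal-gradient (ISTA) sketch under the 1/(2n) scaling above; it only illustrates the penalized criterion and is not the algorithm used in the paper, which relies on LARS and constrained solvers:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, applied elementwise."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, Y, lam, n_iter=500):
    """Minimize (1/(2n)) ||Y - X b||_2^2 + lam * ||b||_1 by proximal gradient descent."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2      # 1 / Lipschitz constant of the smooth part
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - Y) / n          # gradient of the quadratic term
        b = soft_threshold(b - step * grad, step * lam)
    return b

# tiny check on synthetic data with 3 active coefficients
rng = np.random.default_rng(0)
Xs = rng.normal(size=(200, 30))
bs = np.zeros(30); bs[:3] = 1.0
Ys = Xs @ bs + 0.1 * rng.normal(size=200)
print(np.flatnonzero(np.abs(lasso_ista(Xs, Ys, lam=0.05)) > 1e-8))
```

On well-conditioned toy data like this, the recovered support should agree with a packaged solver such as scikit-learn's Lasso.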

31 In [14] it was shown that the lasso is consistent in the high dimensional setting under certain assumptions. [sent-61, score-0.413]

32 However, one cannot simply apply this result directly to the compressed case. [sent-70, score-0.712]

33 Persistence for the lasso was first defined and studied by Greenshtein and Ritov in [10]; we review their result in Section 4. [sent-71, score-0.352]

34 Compressed regression has close connections to, and draws motivation from, compressed sensing [4, 2]. [sent-73, score-0.834]

35 However, in a sense, our motivation is the opposite of compressed sensing. [sent-74, score-0.73]

36 While compressed sensing of X allows a sparse X to be reconstructed from a small number of random measurements, our goal is to reconstruct a sparse function of X . [sent-75, score-0.9]

37 Indeed, from the point of view of privacy, approximately reconstructing X , which compressed sensing shows is possible if X is sparse, should be viewed as undesirable; we return to this point in Section ? [sent-76, score-0.762]

38 Several authors have considered variations on compressed sensing for statistical signal processing tasks [5, 11]. [sent-79, score-0.79]

39 They focus on certain hypothesis testing problems under sparse random measurements, and a generalization to classification of a signal into two or more classes. [sent-80, score-0.097]

40 Research on privacy in statistical data analysis has a long history, going back at least to [3]. [sent-86, score-0.264]

41 The privacy analysis is restricted to observing that recovering X from ΦX requires solving an under-determined linear system. [sent-90, score-0.338]

42 We are not aware of previous work that analyzes the asymptotic properties of a statistical estimator under random projection in the high dimensional setting, giving information-theoretic guarantees, although an information-theoretic quantification of privacy was proposed in [1]. [sent-91, score-0.41]

43 We cast privacy in terms of the rate of information communicated about X through X̃, maximizing over all distributions on X, and identify this with the problem of bounding the Shannon capacity of a multi-antenna wireless channel, as modeled in [13]. [sent-92, score-0.402]

44 Finally, it is important to mention the active area of cryptographic approaches to privacy from the theoretical computer science community, for instance [9, 7]; however, this line of work is quite different from our approach. [sent-93, score-0.264]

45 The lasso refers to the following: (P1) minβ ‖Y − Xβ‖2² such that ‖β‖1 ≤ L. [sent-96, score-0.389]

46 In compressed regression we project each column Xj ∈ Rⁿ of X to a subspace of m dimensions, using an m × n random projection matrix Φ. [sent-99, score-0.81]

47 Let X̃ = ΦX be the compressed design matrix, and let Ỹ = ΦY be the compressed response. [sent-100, score-1.443]

48 The compressed lasso is the following optimization problem, for Ỹ = X̃β + ε̃: (a) (P2) minβ∈Rp (1/2m)‖Ỹ − X̃β‖2² + λm‖β‖1, with (b) the corresponding set of optimal solutions arg minβ∈Rp (1/2m)‖Ỹ − X̃β‖2² + λm‖β‖1. [sent-105, score-1.184]
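
A minimal sketch of this compressed lasso (a simplified stand-in for the authors' experiments: scikit-learn's Lasso is assumed as the solver, λm is hand-picked, and all dimensions are toy values):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s, m = 2000, 64, 4, 200                  # compressed sample size m << n
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:s] = 2.0
Y = X @ beta + 0.5 * rng.normal(size=n)

Phi = rng.normal(0.0, 1.0 / np.sqrt(n), size=(m, n))   # Phi_ij ~ N(0, 1/n)
X_t, Y_t = Phi @ X, Phi @ Y                             # compressed design and response

# sklearn's Lasso minimizes (1/(2*m)) ||Y_t - X_t b||^2 + alpha * ||b||_1 on the m rows,
# which matches (P2) with alpha in the role of lambda_m
lam_m = 0.1
uncompressed = Lasso(alpha=lam_m).fit(X, Y)
compressed = Lasso(alpha=lam_m).fit(X_t, Y_t)
print("true support      :", np.flatnonzero(beta))
print("uncompressed lasso:", np.flatnonzero(uncompressed.coef_))
print("compressed lasso  :", np.flatnonzero(compressed.coef_))
```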

49 Although sparsistency is the primary goal in selecting the correct variables, our analysis establishes conditions for the stronger property of sign consistency, defined next. [sent-106, score-0.104]

50 (Sign Consistency) A set of estimators Ωn is sign consistent with the true β if P(∃ βn ∈ Ωn such that sgn(βn) = sgn(β)) → 1 as n → ∞. [sent-108, score-0.08]

51 Clearly, if a set of estimators is sign consistent then it is sparsistent. [sent-112, score-0.08]

52 All recent work establishing results on sparsity recovery assumes some form of incoherence condition on the data matrix X . [sent-113, score-0.168]

53 Assume that X is S-incoherent, where S = supp(β∗), and define s = |S| and ρm = min_{i ∈ S} |βi∗|. [sent-137, score-0.091]

54 Then the compressed lasso is sparsistent: P(supp(βm) = supp(β)) → 1 as m → ∞. [sent-143, score-1.246]

55 Roughly speaking, persistence implies that a procedure predicts well. [sent-146, score-0.139]

56 We review the arguments in [10] first; we then adapt them to the compressed case. [sent-147, score-0.712]

57 The predictive risk using predictor βᵀX is R(β) = E(Y − βᵀX)². [sent-150, score-0.097]
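
For concreteness, a Monte Carlo sketch of this predictive risk (the joint distribution of (X, Y) below is an assumed toy model, not the paper's setup):

```python
import numpy as np

def predictive_risk(beta, sample_xy, n_mc=100_000, rng=None):
    """Monte Carlo estimate of R(beta) = E[(Y - beta^T X)^2] under a sampler for (X, Y)."""
    rng = np.random.default_rng(rng)
    X_new, Y_new = sample_xy(n_mc, rng)
    return np.mean((Y_new - X_new @ beta) ** 2)

# toy joint distribution: X ~ N(0, I_p), Y = X @ beta_true + N(0, sigma^2)
p, beta_true, sigma = 10, np.r_[np.ones(3), np.zeros(7)], 1.0

def sample_xy(n, rng):
    X = rng.normal(size=(n, p))
    return X, X @ beta_true + sigma * rng.normal(size=n)

print(predictive_risk(beta_true, sample_xy, rng=0))   # close to sigma^2 = 1 for the true beta
print(predictive_risk(np.zeros(p), sample_xy, rng=0)) # close to ||beta_true||^2 + sigma^2 = 4
```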

58 Given Bn = {β : ‖β‖1 ≤ Ln} for Ln = o((n/log n)^{1/4}), we define the oracle predictor β∗,n = arg min{‖β‖1 ≤ Ln} R(β), and the uncompressed lasso estimator βn = arg min{‖β‖1 ≤ Ln} Rn(β). [sent-169, score-0.979]

59 Following arguments in [10], it can be shown that under Assumption 1, given a sequence of sets of estimators Bn = {β : ‖β‖1 ≤ Ln} for Ln = o((n/log n)^{1/4}), the sequence of uncompressed lasso estimators βn = arg minβ∈Bn Rn(β) is persistent, i.e., R(βn) − infβ∈Bn R(β) → 0 in probability. [sent-173, score-0.93]

60 For the compressed case, again we want to predict (X, Y), but now the estimator βn,m is based on the lasso from the compressed data of size mn. [sent-177, score-1.821]

61 Given compressed sample size mn, let Bn,m = {β : ‖β‖1 ≤ Ln,m}, where Ln,m = o((mn/log(npn))^{1/4}). [sent-182, score-0.731]

62 We define the compressed oracle predictor β∗,n,m = arg min{β : ‖β‖1 ≤ Ln,m} R(β) and the compressed lasso estimator βn,m = arg min{β : ‖β‖1 ≤ Ln,m} Rn,m(β). [sent-183, score-2.052]

63 The main difference between the sequence of compressed lasso estimators and the original uncompressed sequence is that n and mn together define the sequence of estimators for the compressed data. [sent-188, score-2.278]

64 In Section 6 we illustrate compressed lasso persistence via simulations that compare the empirical risks with the oracle risks on such a subsequence for a fixed n. [sent-190, score-1.227]

65 (Section 5: Information-Theoretic Analysis of Privacy.) Next we derive bounds on the rate at which the compressed data X̃ reveal information about the uncompressed data X. [sent-191, score-1.086]

66 Our general approach is to consider the mapping X → ΦX + Δ as a noisy communication channel, where the channel is characterized by multiplicative noise Φ and additive noise Δ. [sent-192, score-0.14]

67 Since the number of symbols in X is np, we normalize by this effective block length to define the information rate rn,m per symbol as rn,m = sup_{p(X)} I(X; X̃)/np. [sent-193, score-0.138]

68 Thus, we seek bounds on the capacity of this channel. [sent-194, score-0.149]

69 A privacy guarantee is given in terms of upper bounds showing that the rate rn,m decays to zero. [sent-195, score-0.303]

70 Intuitively, if the mutual information satisfies I(X; X̃) = H(X) − H(X | X̃) ≈ 0, then the compressed data X̃ reveal, on average, no more information about the original data X than could be obtained from an independent sample. [sent-196, score-0.756]

71 The underlying channel is equivalent to the multiple-antenna model for wireless communication [13], where there are n transmitter and m receiver antennas in a Rayleigh flat-fading environment. [sent-197, score-0.219]

72 The propagation coefficients between pairs of transmitter and receiver antennas are modeled by the matrix entries Φij; they remain constant for a coherence interval of p time periods. [sent-198, score-0.137]

73 Computing the channel capacity over multiple intervals requires optimization of the joint density of the pn transmitted signals, the problem studied in [13]. [sent-199, score-0.123]

74 Suppose that E[Xij²] ≤ P and the compressed data are formed by Z = ΦX + γΔ, where Φ is m × n with independent entries Φij ∼ N(0, 1/n) and Δ is m × p with independent entries Δij ∼ N(0, 1). [sent-203, score-0.76]

75 Then the information rate rn,m satisfies rn,m = sup_{p(X)} I(X; Z)/np ≤ (m/n) log(1 + P/γ²). [sent-204, score-0.177]

76 When Δ = 0, or equivalently γ = 0, which is the case assumed in our sparsistence and persistence results, the above analysis yields the trivial bound rn,m ≤ ∞. [sent-206, score-0.188]

77 Suppose that E[Xij²] ≤ P and the compressed data are formed by Z = ΦX, where Φ is m × n with independent entries Φij ∼ N(0, 1/n). [sent-210, score-0.736]

78 Then the information rate rn,m satisfies rn,m = sup_{p(X)} I(X; Z)/np ≤ (m/2n) log(2πeP). [sent-211, score-0.083]

79 Under our sparsistency lower bound on m, the above upper bounds give rn,m = O(log(np)/n). [sent-212, score-0.172]
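
These two upper bounds are easy to evaluate numerically; the sketch below is illustrative only (the choices of P, γ, and the stand-in scaling m ∝ log n are assumptions, not the paper's constants):

```python
import numpy as np

def rate_bound_noisy(m, n, P, gamma):
    """Upper bound (m/n) * log(1 + P / gamma^2) on the per-symbol information rate."""
    return (m / n) * np.log1p(P / gamma ** 2)

def rate_bound_noiseless(m, n, P):
    """Upper bound (m / (2n)) * log(2*pi*e*P) for the noiseless channel Z = Phi X."""
    return (m / (2 * n)) * np.log(2 * np.pi * np.e * P)

# with m growing only logarithmically while n grows, both bounds decay to zero
for n in (10**3, 10**4, 10**5):
    m = int(50 * np.log(n))          # crude stand-in for the sparsistency scaling of m
    print(n, rate_bound_noisy(m, n, P=1.0, gamma=1.0), rate_bound_noiseless(m, n, P=1.0))
```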

80 We note that these bounds may not be the best possible since they are obtained assuming knowledge of the compression matrix Φ, when in fact the privacy protocol requires that Φ and Δ are not public. [sent-213, score-0.428]

81 We first present results that show the compressed lasso is comparable to the uncompressed lasso in recovering the sparsity pattern of the true linear model. [sent-215, score-1.835]

82 We then show results on persistence that are in close agreement with the theoretical results of Section 4. [sent-216, score-0.122]

83 Here we run simulations to compare the compressed lasso with the uncompressed lasso in terms of the probability of success in recovering the sparsity pattern of β ∗ . [sent-219, score-1.877]

84 A design parameter is the compression factor f = n/m, which indicates how much the original data are compressed. [sent-221, score-0.16]

85 The results show that when the compression factor f is large enough, the thresholding behaviors as specified in (8) and (9) for the uncompressed lasso carry over to the compressed lasso, when X is drawn from a Gaussian ensemble. [sent-222, score-1.494]

86 In general, the compression factor f is well below the requirement that we have in Theorem 3. [sent-223, score-0.118]

87 Conversely, if the number of samples falls below the corresponding threshold, then the probability of recovering the true variables using the lasso approaches zero. [sent-231, score-0.41]

88 In the following simulations, we carry out the lasso using the procedure lars(Y, X), which implements the LARS algorithm of [8] to calculate the full regularization path. [sent-232, score-0.372]

89 For the uncompressed case, we run lars(Y, X) with Y = Xβ∗ + ε, and for the compressed case we run lars(ΦY, ΦX) with Y = Xβ∗ + ε. [sent-233, score-1.024]
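
A hedged sketch of this comparison, using scikit-learn's lars_path as a stand-in for the LARS implementation of [8] (the toy dimensions and the Gaussian ensemble for Φ are assumptions of the example):

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
n, p, s, m = 1000, 32, 3, 100
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:s] = 1.5
Y = X @ beta + 0.5 * rng.normal(size=n)
Phi = rng.normal(0.0, 1.0 / np.sqrt(n), size=(m, n))

# full lasso regularization paths for the uncompressed and compressed problems
_, _, coefs_unc = lars_path(X, Y, method="lasso")
_, _, coefs_cmp = lars_path(Phi @ X, Phi @ Y, method="lasso")

# step of the path at which each true variable first becomes nonzero
first_unc = [int(np.argmax(np.abs(coefs_unc[j]) > 0)) for j in np.flatnonzero(beta)]
first_cmp = [int(np.argmax(np.abs(coefs_cmp[j]) > 0)) for j in np.flatnonzero(beta)]
print("uncompressed: true variables enter at path steps", first_unc)
print("compressed:   true variables enter at path steps", first_cmp)
```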

90 The results show that the behavior under compression is close to the uncompressed case. [sent-235, score-0.43]

91 Here we solve the ℓ1-constrained optimization problem β = arg min{‖β‖1 ≤ L} ‖Y − Xβ‖2² directly, based on algorithms described by [15]. [sent-237, score-0.083]

92 Figure 1: Plots of the number of samples versus the probability of success for recovering sgn(β∗). [sent-281, score-0.085]

93 In the plots on the right, each curve has a compression factor f ∈ {5, 10, 20, 40, 80, 120} for the compressed lasso, thus n = f m; dashed lines mark θ = 1. [sent-287, score-0.83]

94 46 [19], for the uncompressed lasso in (8) and in (9). [sent-291, score-0.664]

95 Figure 2: Risk versus compressed dimension m, for n = 9000, p = 128, s = 9; the curves show the uncompressed predictive risk, the compressed predictive risk, and the compressed empirical risk. [sent-292, score-0.766]

96 The true coefficient vector β∗ is padded with zeros and chosen so that ‖β∗‖1 > Ln and β∗ ∉ Bn, and the uncompressed oracle predictive risk is R = 9. [sent-306, score-0.422]

97 For the compressed lasso, given n and pn, and a varying compressed sample size m, we take the ball Bn,m = {β : ‖β‖1 ≤ Ln,m} where Ln,m = m^{1/4}/log(npn). [sent-315, score-1.467]

98 The compressed lasso estimator βn,m, for log2(npn) ≤ m ≤ n, is persistent over Bn,m by Theorem 4. [sent-316, score-1.189]
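
A simplified version of this persistence experiment (a sketch only: it substitutes the penalized lasso for the ℓ1-ball-constrained estimator used in the paper, uses a closed-form risk valid only for the toy Gaussian model below, and picks all parameters arbitrarily):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s, sigma = 5000, 64, 5, 1.0
beta = np.zeros(p); beta[:s] = 1.0
X = rng.normal(size=(n, p))
Y = X @ beta + sigma * rng.normal(size=n)

def risk(b):
    # R(b) = E[(Y - b^T X)^2] = sigma^2 + ||b - beta||^2 when X ~ N(0, I_p)
    return sigma ** 2 + np.sum((b - beta) ** 2)

for m in (100, 500, 1000, 2500, 5000):
    Phi = rng.normal(0.0, 1.0 / np.sqrt(n), size=(m, n))
    b_m = Lasso(alpha=0.05).fit(Phi @ X, Phi @ Y).coef_
    print(f"m={m:5d}  predictive risk of compressed lasso: {risk(b_m):.3f}")
# as m grows toward n, the risk should approach the oracle value sigma^2 = 1
```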

99 On the design and quantification of privacy preserving data mining algorithms. [sent-324, score-0.319]

100 Random projection-based multiplicative data perturbation for privacy preserving distributed data mining. [sent-398, score-0.3]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('compressed', 0.712), ('lasso', 0.352), ('uncompressed', 0.312), ('privacy', 0.264), ('persistence', 0.122), ('compression', 0.118), ('sgn', 0.103), ('np', 0.094), ('supp', 0.091), ('npn', 0.082), ('persistent', 0.08), ('sparse', 0.069), ('sparsistence', 0.066), ('channel', 0.064), ('bn', 0.06), ('recovering', 0.058), ('sparsistency', 0.057), ('sparsistent', 0.057), ('regression', 0.054), ('sensing', 0.05), ('estimators', 0.049), ('arg', 0.046), ('lars', 0.046), ('estimator', 0.045), ('rn', 0.044), ('toeplitz', 0.044), ('risk', 0.044), ('dimensional', 0.043), ('prob', 0.043), ('greenshtein', 0.043), ('oracle', 0.039), ('log', 0.039), ('min', 0.037), ('preserving', 0.036), ('recovery', 0.035), ('communication', 0.034), ('capacity', 0.034), ('sparsity', 0.033), ('antennas', 0.033), ('communicated', 0.033), ('dential', 0.033), ('incoherence', 0.033), ('transmitter', 0.033), ('wireless', 0.033), ('theoretic', 0.032), ('simulations', 0.031), ('sign', 0.031), ('transformed', 0.028), ('ensemble', 0.028), ('signal', 0.028), ('success', 0.027), ('predictive', 0.027), ('predictor', 0.026), ('persistency', 0.026), ('ritov', 0.026), ('fp', 0.026), ('sup', 0.026), ('establishing', 0.025), ('pn', 0.025), ('matrix', 0.025), ('entries', 0.024), ('satis', 0.024), ('sequence', 0.023), ('original', 0.023), ('risks', 0.023), ('reveal', 0.023), ('relevant', 0.023), ('receiver', 0.022), ('mutual', 0.021), ('transformation', 0.021), ('subsequence', 0.021), ('approaching', 0.021), ('analyzes', 0.021), ('bounds', 0.021), ('noise', 0.021), ('theorem', 0.02), ('regularization', 0.02), ('bounding', 0.02), ('august', 0.019), ('mn', 0.019), ('projection', 0.019), ('proofs', 0.019), ('rm', 0.019), ('design', 0.019), ('rate', 0.018), ('motivation', 0.018), ('high', 0.018), ('speaking', 0.018), ('ball', 0.018), ('analyses', 0.017), ('condition', 0.017), ('nonzero', 0.017), ('predicts', 0.017), ('consistency', 0.017), ('transformations', 0.016), ('revealed', 0.016), ('es', 0.016), ('linear', 0.016), ('conditions', 0.016), ('suppose', 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999923 53 nips-2007-Compressed Regression

Author: Shuheng Zhou, Larry Wasserman, John D. Lafferty

Abstract: Recent research has studied the role of sparsity in high dimensional regression and signal reconstruction, establishing theoretical limits for recovering sparse models from sparse data. In this paper we study a variant of this problem where the original n input variables are compressed by a random linear transformation to m ≪ n examples in p dimensions, and establish conditions under which a sparse linear model can be successfully recovered from the compressed data. A primary motivation for this compression procedure is to anonymize the data and preserve privacy by revealing little information about the original data. We characterize the number of random projections that are required for ℓ1-regularized compressed regression to identify the nonzero coefficients in the true model with probability approaching one, a property called “sparsistence.” In addition, we show that ℓ1-regularized compressed regression asymptotically predicts as well as an oracle linear model, a property called “persistence.” Finally, we characterize the privacy properties of the compression procedure in information-theoretic terms, establishing upper bounds on the rate of information communicated between the compressed and uncompressed data that decay to zero. 1

2 0.16048899 43 nips-2007-Catching Change-points with Lasso

Author: Céline Levy-leduc, Zaïd Harchaoui

Abstract: We propose a new approach for dealing with the estimation of the location of change-points in one-dimensional piecewise constant signals observed in white noise. Our approach consists in reframing this task in a variable selection context. We use a penalized least-squares criterion with an ℓ1-type penalty for this purpose. We prove some theoretical results on the estimated change-points and on the underlying piecewise constant estimated function. Then, we explain how to implement this method in practice by combining the LAR algorithm and a reduced version of the dynamic programming algorithm and we apply it to synthetic and real data. 1

3 0.10607809 179 nips-2007-SpAM: Sparse Additive Models

Author: Han Liu, Larry Wasserman, John D. Lafferty, Pradeep K. Ravikumar

Abstract: We present a new class of models for high-dimensional nonparametric regression and classification called sparse additive models (SpAM). Our methods combine ideas from sparse linear modeling and additive nonparametric regression. We derive a method for fitting the models that is effective even when the number of covariates is larger than the sample size. A statistical analysis of the properties of SpAM is given together with empirical results on synthetic and real data, showing that SpAM can be effective in fitting sparse nonparametric models in high dimensional data. 1

4 0.10150715 8 nips-2007-A New View of Automatic Relevance Determination

Author: David P. Wipf, Srikantan S. Nagarajan

Abstract: Automatic relevance determination (ARD) and the closely-related sparse Bayesian learning (SBL) framework are effective tools for pruning large numbers of irrelevant features leading to a sparse explanatory subset. However, popular update rules used for ARD are either difficult to extend to more general problems of interest or are characterized by non-ideal convergence properties. Moreover, it remains unclear exactly how ARD relates to more traditional MAP estimation-based methods for learning sparse representations (e.g., the Lasso). This paper furnishes an alternative means of expressing the ARD cost function using auxiliary functions that naturally addresses both of these issues. First, the proposed reformulation of ARD can naturally be optimized by solving a series of re-weighted ℓ1 problems. The result is an efficient, extensible algorithm that can be implemented using standard convex programming toolboxes and is guaranteed to converge to a local minimum (or saddle point). Secondly, the analysis reveals that ARD is exactly equivalent to performing standard MAP estimation in weight space using a particular feature- and noise-dependent, non-factorial weight prior. We then demonstrate that this implicit prior maintains several desirable advantages over conventional priors with respect to feature selection. Overall these results suggest alternative cost functions and update procedures for selecting features and promoting sparse solutions in a variety of general situations. In particular, the methodology readily extends to handle problems such as non-negative sparse coding and covariance component estimation. 1

5 0.098181151 99 nips-2007-Hierarchical Penalization

Author: Marie Szafranski, Yves Grandvalet, Pierre Morizet-mahoudeaux

Abstract: Hierarchical penalization is a generic framework for incorporating prior information in the fitting of statistical models, when the explicative variables are organized in a hierarchical structure. The penalizer is a convex functional that performs soft selection at the group level, and shrinks variables within each group. This favors solutions with few leading terms in the final combination. The framework, originally derived for taking prior knowledge into account, is shown to be useful in linear regression, when several parameters are used to model the influence of one feature, or in kernel regression, for learning multiple kernels. Keywords – Optimization: constrained and convex optimization. Supervised learning: regression, kernel methods, sparsity and feature selection. 1

6 0.064920045 13 nips-2007-A Unified Near-Optimal Estimator For Dimension Reduction in $l_\alpha$ ($0<\alpha\leq 2$) Using Stable Random Projections

7 0.062869996 157 nips-2007-Privacy-Preserving Belief Propagation and Sampling

8 0.061244499 161 nips-2007-Random Projections for Manifold Learning

9 0.057858735 186 nips-2007-Statistical Analysis of Semi-Supervised Regression

10 0.056134593 189 nips-2007-Supervised Topic Models

11 0.04612061 111 nips-2007-Learning Horizontal Connections in a Sparse Coding Model of Natural Images

12 0.03889893 148 nips-2007-Online Linear Regression and Its Application to Model-Based Reinforcement Learning

13 0.037908889 181 nips-2007-Sparse Overcomplete Latent Variable Decomposition of Counts Data

14 0.034887608 147 nips-2007-One-Pass Boosting

15 0.033664495 61 nips-2007-Convex Clustering with Exemplar-Based Models

16 0.032916032 192 nips-2007-Testing for Homogeneity with Kernel Fisher Discriminant Analysis

17 0.032815125 145 nips-2007-On Sparsity and Overcompleteness in Image Models

18 0.031876821 173 nips-2007-Second Order Bilinear Discriminant Analysis for single trial EEG analysis

19 0.031368852 37 nips-2007-Blind channel identification for speech dereverberation using l1-norm sparse learning

20 0.027944075 44 nips-2007-Catching Up Faster in Bayesian Model Selection and Model Averaging


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.121), (1, 0.018), (2, -0.033), (3, 0.025), (4, -0.01), (5, 0.012), (6, -0.019), (7, 0.024), (8, -0.053), (9, 0.046), (10, 0.045), (11, 0.026), (12, 0.05), (13, -0.027), (14, 0.102), (15, 0.023), (16, 0.113), (17, 0.1), (18, -0.063), (19, -0.223), (20, -0.04), (21, -0.06), (22, 0.06), (23, -0.008), (24, 0.087), (25, 0.12), (26, -0.031), (27, -0.015), (28, -0.129), (29, 0.174), (30, 0.042), (31, 0.02), (32, -0.165), (33, 0.177), (34, 0.103), (35, 0.012), (36, -0.061), (37, 0.012), (38, 0.108), (39, 0.087), (40, 0.013), (41, -0.056), (42, 0.046), (43, -0.117), (44, -0.053), (45, -0.034), (46, -0.098), (47, -0.016), (48, -0.055), (49, 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94931269 53 nips-2007-Compressed Regression

Author: Shuheng Zhou, Larry Wasserman, John D. Lafferty

Abstract: Recent research has studied the role of sparsity in high dimensional regression and signal reconstruction, establishing theoretical limits for recovering sparse models from sparse data. In this paper we study a variant of this problem where the original n input variables are compressed by a random linear transformation to m ≪ n examples in p dimensions, and establish conditions under which a sparse linear model can be successfully recovered from the compressed data. A primary motivation for this compression procedure is to anonymize the data and preserve privacy by revealing little information about the original data. We characterize the number of random projections that are required for ℓ1-regularized compressed regression to identify the nonzero coefficients in the true model with probability approaching one, a property called “sparsistence.” In addition, we show that ℓ1-regularized compressed regression asymptotically predicts as well as an oracle linear model, a property called “persistence.” Finally, we characterize the privacy properties of the compression procedure in information-theoretic terms, establishing upper bounds on the rate of information communicated between the compressed and uncompressed data that decay to zero. 1

2 0.74013597 43 nips-2007-Catching Change-points with Lasso

Author: Céline Levy-leduc, Zaïd Harchaoui

Abstract: We propose a new approach for dealing with the estimation of the location of change-points in one-dimensional piecewise constant signals observed in white noise. Our approach consists in reframing this task in a variable selection context. We use a penalized least-squares criterion with an ℓ1-type penalty for this purpose. We prove some theoretical results on the estimated change-points and on the underlying piecewise constant estimated function. Then, we explain how to implement this method in practice by combining the LAR algorithm and a reduced version of the dynamic programming algorithm and we apply it to synthetic and real data. 1

3 0.72713143 179 nips-2007-SpAM: Sparse Additive Models

Author: Han Liu, Larry Wasserman, John D. Lafferty, Pradeep K. Ravikumar

Abstract: We present a new class of models for high-dimensional nonparametric regression and classification called sparse additive models (SpAM). Our methods combine ideas from sparse linear modeling and additive nonparametric regression. We derive a method for fitting the models that is effective even when the number of covariates is larger than the sample size. A statistical analysis of the properties of SpAM is given together with empirical results on synthetic and real data, showing that SpAM can be effective in fitting sparse nonparametric models in high dimensional data. 1

4 0.62012815 99 nips-2007-Hierarchical Penalization

Author: Marie Szafranski, Yves Grandvalet, Pierre Morizet-mahoudeaux

Abstract: Hierarchical penalization is a generic framework for incorporating prior information in the fitting of statistical models, when the explicative variables are organized in a hierarchical structure. The penalizer is a convex functional that performs soft selection at the group level, and shrinks variables within each group. This favors solutions with few leading terms in the final combination. The framework, originally derived for taking prior knowledge into account, is shown to be useful in linear regression, when several parameters are used to model the influence of one feature, or in kernel regression, for learning multiple kernels. Keywords – Optimization: constrained and convex optimization. Supervised learning: regression, kernel methods, sparsity and feature selection. 1

5 0.51009375 8 nips-2007-A New View of Automatic Relevance Determination

Author: David P. Wipf, Srikantan S. Nagarajan

Abstract: Automatic relevance determination (ARD) and the closely-related sparse Bayesian learning (SBL) framework are effective tools for pruning large numbers of irrelevant features leading to a sparse explanatory subset. However, popular update rules used for ARD are either difficult to extend to more general problems of interest or are characterized by non-ideal convergence properties. Moreover, it remains unclear exactly how ARD relates to more traditional MAP estimation-based methods for learning sparse representations (e.g., the Lasso). This paper furnishes an alternative means of expressing the ARD cost function using auxiliary functions that naturally addresses both of these issues. First, the proposed reformulation of ARD can naturally be optimized by solving a series of re-weighted ℓ1 problems. The result is an efficient, extensible algorithm that can be implemented using standard convex programming toolboxes and is guaranteed to converge to a local minimum (or saddle point). Secondly, the analysis reveals that ARD is exactly equivalent to performing standard MAP estimation in weight space using a particular feature- and noise-dependent, non-factorial weight prior. We then demonstrate that this implicit prior maintains several desirable advantages over conventional priors with respect to feature selection. Overall these results suggest alternative cost functions and update procedures for selecting features and promoting sparse solutions in a variety of general situations. In particular, the methodology readily extends to handle problems such as non-negative sparse coding and covariance component estimation. 1

6 0.44800192 13 nips-2007-A Unified Near-Optimal Estimator For Dimension Reduction in $l_\alpha$ ($0<\alpha\leq 2$) Using Stable Random Projections

7 0.35252976 186 nips-2007-Statistical Analysis of Semi-Supervised Regression

8 0.35215265 89 nips-2007-Feature Selection Methods for Improving Protein Structure Prediction with Rosetta

9 0.32586214 82 nips-2007-Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization

10 0.32026264 159 nips-2007-Progressive mixture rules are deviation suboptimal

11 0.28497306 44 nips-2007-Catching Up Faster in Bayesian Model Selection and Model Averaging

12 0.27902576 148 nips-2007-Online Linear Regression and Its Application to Model-Based Reinforcement Learning

13 0.25793025 96 nips-2007-Heterogeneous Component Analysis

14 0.24649426 37 nips-2007-Blind channel identification for speech dereverberation using l1-norm sparse learning

15 0.24540019 67 nips-2007-Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation

16 0.23981948 111 nips-2007-Learning Horizontal Connections in a Sparse Coding Model of Natural Images

17 0.23122258 145 nips-2007-On Sparsity and Overcompleteness in Image Models

18 0.22880001 161 nips-2007-Random Projections for Manifold Learning

19 0.22832412 176 nips-2007-Sequential Hypothesis Testing under Stochastic Deadlines

20 0.22399543 61 nips-2007-Convex Clustering with Exemplar-Based Models


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(4, 0.026), (5, 0.048), (13, 0.031), (18, 0.017), (19, 0.033), (21, 0.054), (31, 0.025), (34, 0.02), (35, 0.03), (47, 0.113), (49, 0.023), (64, 0.246), (83, 0.121), (85, 0.02), (87, 0.019), (90, 0.065)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.89202869 67 nips-2007-Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation

Author: Masashi Sugiyama, Shinichi Nakajima, Hisashi Kashima, Paul V. Buenau, Motoaki Kawanabe

Abstract: A situation where training and test samples follow different input distributions is called covariate shift. Under covariate shift, standard learning methods such as maximum likelihood estimation are no longer consistent—weighted variants according to the ratio of test and training input densities are consistent. Therefore, accurately estimating the density ratio, called the importance, is one of the key issues in covariate shift adaptation. A naive approach to this task is to first estimate training and test input densities separately and then estimate the importance by taking the ratio of the estimated densities. However, this naive approach tends to perform poorly since density estimation is a hard task particularly in high dimensional cases. In this paper, we propose a direct importance estimation method that does not involve density estimation. Our method is equipped with a natural cross validation procedure and hence tuning parameters such as the kernel width can be objectively optimized. Simulations illustrate the usefulness of our approach. 1

2 0.80397218 131 nips-2007-Modeling homophily and stochastic equivalence in symmetric relational data

Author: Peter Hoff

Abstract: This article discusses a latent variable model for inference and prediction of symmetric relational data. The model, based on the idea of the eigenvalue decomposition, represents the relationship between two nodes as the weighted inner-product of node-specific vectors of latent characteristics. This “eigenmodel” generalizes other popular latent variable models, such as latent class and distance models: It is shown mathematically that any latent class or distance model has a representation as an eigenmodel, but not vice-versa. The practical implications of this are examined in the context of three real datasets, for which the eigenmodel has as good or better out-of-sample predictive performance than the other two models. 1

same-paper 3 0.78947866 53 nips-2007-Compressed Regression

Author: Shuheng Zhou, Larry Wasserman, John D. Lafferty

Abstract: Recent research has studied the role of sparsity in high dimensional regression and signal reconstruction, establishing theoretical limits for recovering sparse models from sparse data. In this paper we study a variant of this problem where the original n input variables are compressed by a random linear transformation to m ≪ n examples in p dimensions, and establish conditions under which a sparse linear model can be successfully recovered from the compressed data. A primary motivation for this compression procedure is to anonymize the data and preserve privacy by revealing little information about the original data. We characterize the number of random projections that are required for ℓ1-regularized compressed regression to identify the nonzero coefficients in the true model with probability approaching one, a property called “sparsistence.” In addition, we show that ℓ1-regularized compressed regression asymptotically predicts as well as an oracle linear model, a property called “persistence.” Finally, we characterize the privacy properties of the compression procedure in information-theoretic terms, establishing upper bounds on the rate of information communicated between the compressed and uncompressed data that decay to zero. 1

4 0.60721493 179 nips-2007-SpAM: Sparse Additive Models

Author: Han Liu, Larry Wasserman, John D. Lafferty, Pradeep K. Ravikumar

Abstract: We present a new class of models for high-dimensional nonparametric regression and classification called sparse additive models (SpAM). Our methods combine ideas from sparse linear modeling and additive nonparametric regression. We derive a method for fitting the models that is effective even when the number of covariates is larger than the sample size. A statistical analysis of the properties of SpAM is given together with empirical results on synthetic and real data, showing that SpAM can be effective in fitting sparse nonparametric models in high dimensional data. 1

5 0.60392302 63 nips-2007-Convex Relaxations of Latent Variable Training

Author: Yuhong Guo, Dale Schuurmans

Abstract: We investigate a new, convex relaxation of an expectation-maximization (EM) variant that approximates a standard objective while eliminating local minima. First, a cautionary result is presented, showing that any convex relaxation of EM over hidden variables must give trivial results if any dependence on the missing values is retained. Although this appears to be a strong negative outcome, we then demonstrate how the problem can be bypassed by using equivalence relations instead of value assignments over hidden variables. In particular, we develop new algorithms for estimating exponential conditional models that only require equivalence relation information over the variable values. This reformulation leads to an exact expression for EM variants in a wide range of problems. We then develop a semidefinite relaxation that yields global training by eliminating local minima. 1

6 0.60177344 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images

7 0.5996927 18 nips-2007-A probabilistic model for generating realistic lip movements from speech

8 0.59952939 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data

9 0.59884721 74 nips-2007-EEG-Based Brain-Computer Interaction: Improved Accuracy by Automatic Single-Trial Error Detection

10 0.5984239 34 nips-2007-Bayesian Policy Learning with Trans-Dimensional MCMC

11 0.59793311 86 nips-2007-Exponential Family Predictive Representations of State

12 0.59788156 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)

13 0.59562689 168 nips-2007-Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods

14 0.59537411 122 nips-2007-Locality and low-dimensions in the prediction of natural experience from fMRI

15 0.59483945 115 nips-2007-Learning the 2-D Topology of Images

16 0.59456259 148 nips-2007-Online Linear Regression and Its Application to Model-Based Reinforcement Learning

17 0.59353524 43 nips-2007-Catching Change-points with Lasso

18 0.59178692 24 nips-2007-An Analysis of Inference with the Universum

19 0.59095877 158 nips-2007-Probabilistic Matrix Factorization

20 0.58869213 153 nips-2007-People Tracking with the Laplacian Eigenmaps Latent Variable Model