nips nips2012 nips2012-43 knowledge-graph by maker-knowledge-mining

43 nips-2012-Approximate Message Passing with Consistent Parameter Estimation and Applications to Sparse Learning


Source: pdf

Author: Ulugbek Kamilov, Sundeep Rangan, Michael Unser, Alyson K. Fletcher

Abstract: We consider the estimation of an i.i.d. vector x ∈ Rn from measurements y ∈ Rm obtained by a general cascade model consisting of a known linear transform followed by a probabilistic componentwise (possibly nonlinear) measurement channel. We present a method, called adaptive generalized approximate message passing (Adaptive GAMP), that enables joint learning of the statistics of the prior and measurement channel along with estimation of the unknown vector x. Our method can be applied to a large class of learning problems including the learning of sparse priors in compressed sensing or identification of linear-nonlinear cascade models in dynamical systems and neural spiking processes. We prove that for large i.i.d. Gaussian transform matrices the asymptotic componentwise behavior of the adaptive GAMP algorithm is predicted by a simple set of scalar state evolution equations. This analysis shows that the adaptive GAMP method can yield asymptotically consistent parameter estimates, which implies that the algorithm achieves a reconstruction quality equivalent to the oracle algorithm that knows the correct parameter values. The adaptive GAMP methodology thus provides a systematic, general and computationally efficient method applicable to a large range of complex linear-nonlinear models with provable guarantees. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We consider the estimation of an i.i.d. vector x ∈ Rn from measurements y ∈ Rm obtained by a general cascade model consisting of a known linear transform followed by a probabilistic componentwise (possibly nonlinear) measurement channel. [sent-13, score-0.241]

2 We present a method, called adaptive generalized approximate message passing (Adaptive GAMP), that enables joint learning of the statistics of the prior and measurement channel along with estimation of the unknown vector x. [sent-14, score-0.437]

3 Our method can be applied to a large class of learning problems including the learning of sparse priors in compressed sensing or identification of linear-nonlinear cascade models in dynamical systems and neural spiking processes. [sent-15, score-0.188]

4 We prove that for large i.i.d. Gaussian transform matrices the asymptotic componentwise behavior of the adaptive GAMP algorithm is predicted by a simple set of scalar state evolution equations. [sent-19, score-0.421]

5 This analysis shows that the adaptive GAMP method can yield asymptotically consistent parameter estimates, which implies that the algorithm achieves a reconstruction quality equivalent to the oracle algorithm that knows the correct parameter values. [sent-20, score-0.324]

6 The adaptive GAMP methodology thus provides a systematic, general and computationally efficient method applicable to a large range of complex linear-nonlinear models with provable guarantees. [sent-21, score-0.154]

7 1 Introduction. Consider the estimation of a random vector x ∈ Rn from a measurement vector y ∈ Rm. [sent-22, score-0.12]

8 The vector x, whose i.i.d. components satisfy xj ∼ PX, is passed through a known linear transform that outputs z = Ax ∈ Rm. [sent-26, score-0.08]

9 The components of y ∈ Rm are generated by a componentwise transfer function PY |Z . [sent-27, score-0.14]

10 An unknown vector x with prior PX(x|λx) passes through the linear transform A ∈ Rm×n followed by a componentwise nonlinear channel PY|Z(y|z, λz) to result in y ∈ Rm. [sent-36, score-0.162]
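
As a concrete illustration of this cascade model, here is a minimal simulation sketch; the Gauss-Bernoulli prior, the AWGN output channel, and all dimensions and parameter values below are assumptions for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 400, 200
rho, sigma_x2 = 0.2, 5.0   # assumed prior parameters, standing in for lambda_x
sigma_w2 = 0.1             # assumed channel parameter, standing in for lambda_z

# i.i.d. Gauss-Bernoulli prior P_X(x | lambda_x)
x = rng.binomial(1, rho, n) * rng.normal(0.0, np.sqrt(sigma_x2), n)

# known linear transform A with i.i.d. Gaussian entries
A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
z = A @ x

# componentwise measurement channel P_{Y|Z}(y | z, lambda_z); here simply AWGN
y = z + rng.normal(0.0, np.sqrt(sigma_w2), m)
```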

11 We propose adaptive GAMP to jointly estimate x and (λx , λz ) given the measurements y. [sent-38, score-0.163]

12 A motivating example is compressed sensing, which concerns the estimation of sparse vectors x from underdetermined measurements [3–5]. [sent-39, score-0.233]

13 In recent years, there has been considerable interest in so-called approximate message passing (AMP) methods for this estimation problem. [sent-41, score-0.226]

14 The AMP techniques use Gaussian and quadratic approximations of loopy belief propagation (LBP) to provide estimation methods that are computationally efficient, general and analytically tractable. [sent-42, score-0.116]

15 When the parameters λx and λz are unknown, various extensions have been proposed including combining AMP methods with Expectation Maximization (EM) estimation [9–12] and hybrid graphical models approaches [13]. [sent-44, score-0.067]

16 In this work, we present a novel method for joint parameter and vector estimation, called adaptive generalized AMP (adaptive GAMP), that extends the GAMP method of [14]. [sent-45, score-0.215]

17 We present two major theoretical results related to adaptive GAMP: we first show that, similar to the analysis of the standard GAMP algorithm, the componentwise asymptotic behavior of adaptive GAMP can be exactly described by a simple set of scalar state evolution (SE) equations [14–18]. [sent-46, score-0.559]

18 An important consequence of this result is a theoretical justification for the EM-GAMP algorithm in [9–12], which is a special case of adaptive GAMP with a particular choice of adaptation functions. [sent-47, score-0.229]

19 Our second result demonstrates the asymptotic consistency of adaptive GAMP when the adaptation functions correspond to maximum-likelihood (ML) parameter estimation. [sent-48, score-0.374]

20 We show that when the ML estimation is computed exactly, the estimated parameters converge to the true values and the performance of adaptive GAMP asymptotically coincides with that of the oracle GAMP algorithm that knows the correct parameter values. [sent-49, score-0.378]

21 Adaptive GAMP thus provides a computationally-efficient method for solving a wide variety of joint estimation and learning problems with a simple, exact performance characterization and provable conditions for asymptotic consistency. [sent-50, score-0.164]

22 2 Adaptive GAMP. Approximate message passing (AMP) refers to a class of algorithms based on Gaussian approximations of loopy belief propagation (LBP) for the estimation of the vectors x and z according to the model described in Section 1. [sent-52, score-0.246]

23 These methods originated from CDMA multiuser detection problems in [15, 20, 21]; more recently, they have attracted considerable attention in compressed sensing [17, 18, 22]. [sent-53, score-0.208]

24 The key benefits of AMP methods are their computational performance, their large domain of application, and, for certain large random A, their exact asymptotic performance characterizations with testable conditions for optimality [15–18]. [sent-55, score-0.073]

25 This paper considers an adaptive version of the so-called generalized AMP (GAMP) method of [14] that extends the algorithm in [22] to arbitrary output distributions PY |Z . [sent-56, score-0.159]

26 We propose an adaptive GAMP, shown in Algorithm 1, to allow for simultaneous estimation of the distributions PX and PY |Z along with the estimation of x and z. [sent-58, score-0.293]

27 Algorithm 1 Adaptive GAMP. Require: matrix A, estimation functions G_x^t, G_s^t and G_z^t, and adaptation functions H_x^t and H_z^t. [sent-61, score-0.22]

28 [Algorithm 1 fragment: 16: τ_x^{t+1} ← (τ_r^t/n) Σ_j ∂G_x^t(r_j^t, τ_r^t, λ_x^t)/∂r_j; 17: until Terminated.] The choice of the estimation and adaptation functions allows for considerable flexibility in the algorithm. [sent-71, score-0.226]

29 The adaptation functions can also be selected for a number of different parameter-estimation strategies. [sent-73, score-0.126]
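
The following purely structural sketch shows how an adaptation step could be interleaved with the GAMP estimation steps; all names, signatures, and the division of labor between gamp_step and the adaptation functions are hypothetical placeholders, not the paper's Algorithm 1.

```python
from typing import Callable
import numpy as np

def adaptive_gamp_loop(y, A, lam_x0, lam_z0,
                       gamp_step: Callable, H_x: Callable, H_z: Callable,
                       n_iter: int = 20):
    """Sketch only: gamp_step is assumed to run one (non-adaptive) GAMP iteration
    with the current parameter guesses and return its internal quantities."""
    lam_x, lam_z = lam_x0, lam_z0
    state = None
    x_hat = np.zeros(A.shape[1])
    for _ in range(n_iter):
        # estimation step: one GAMP iteration with the current parameter estimates
        x_hat, r, tau_r, p, tau_p, state = gamp_step(y, A, lam_x, lam_z, state)
        # adaptation step: re-estimate the parameters from the effective AWGN models
        lam_z = H_z(p, y, tau_p)   # e.g. ML fit of the output channel parameters
        lam_x = H_x(r, tau_r)      # e.g. ML fit of the input prior parameters
    return x_hat, lam_x, lam_z
```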

30 Because of space limitations, we present only the estimation functions for the sum-product GAMP algorithm from [14], along with an ML-type adaptation. [sent-74, score-0.094]

31 The estimation functions (2) correspond to scalar estimates of random variables in additive white Gaussian noise (AWGN). [sent-77, score-0.169]
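
For instance, under the Gauss-Bernoulli prior used later in the numerical example, the scalar posterior-mean (MMSE) estimation function for X observed in AWGN of variance τ_r can be sketched as follows; the closed form is standard Gaussian-mixture conditioning, and the parameterization is an assumption rather than the paper's equation (2).

```python
import numpy as np

def gauss_pdf(r, var):
    return np.exp(-0.5 * r**2 / var) / np.sqrt(2.0 * np.pi * var)

def G_x(r, tau_r, rho, sigma_x2):
    """Posterior mean E[X | X + N(0, tau_r) = r] for X ~ (1-rho)*delta_0 + rho*N(0, sigma_x2)."""
    p_on  = rho * gauss_pdf(r, sigma_x2 + tau_r)   # nonzero component
    p_off = (1.0 - rho) * gauss_pdf(r, tau_r)      # zero component
    pi = p_on / (p_on + p_off)                     # posterior P(X != 0 | r)
    gain = sigma_x2 / (sigma_x2 + tau_r)           # Wiener shrinkage for the nonzero case
    return pi * gain * r
```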

32 When the parameter estimates equal the true values, i.e., (λ_x^t, λ_z^t) = (λx, λz), the outputs x^t and z^t can be interpreted as sum-product estimates of the conditional expectations E(x|y) and E(z|y). [sent-80, score-0.071]

33 The algorithm thus reduces the vector-valued estimation problem to a computationally simple sequence of scalar AWGN estimation problems along with linear transforms. [sent-81, score-0.181]
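
A companion sketch for the scalar output estimation step, assuming an AWGN measurement channel y = z + N(0, σ_w²): with the GAMP pseudo-prior Z ~ N(p, τ_p), the posterior mean has a simple closed form (the channel choice and the names are assumptions).

```python
def G_z(p, y, tau_p, sigma_w2):
    """Posterior mean E[Z | p, y] for Z ~ N(p, tau_p) and y = z + N(0, sigma_w2)."""
    return (sigma_w2 * p + tau_p * y) / (sigma_w2 + tau_p)
```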

34 The adaptation functions H_x^t and H_z^t in Algorithm 1 produce the estimates for the parameters λ_x^t and λ_z^t. [sent-82, score-0.122]

35 In the special case when H_x^t and H_z^t produce fixed outputs H_z^t(p^t, y, τ_p^t) = λ_z^t and H_x^t(r^t, τ_r^t) = λ_x^t, for pre-computed values of λ_z^t and λ_x^t, the adaptive GAMP algorithm reduces to the standard (non-adaptive) GAMP algorithm of [14]. [sent-83, score-0.156]

36 When the parameters λx and λz are unknown, it has been proposed in [9–12] that they can be estimated via an EM method that exploits the fact that GAMP provides estimates of the posterior distributions of x and z given the current parameter estimates. [sent-85, score-0.075]

37 As described in the full paper [19], this EM-GAMP method corresponds to a special case of the adaptive GAMP method for a particular choice of the adaptation functions H_x^t and H_z^t. [sent-86, score-0.126]

38 However, in this work, we consider an alternate parameter estimation method based on ML adaptation. [sent-87, score-0.085]

39 Remarkably, the distributions of the components of rt and (pt , yt ) will follow (4) even if the estimation functions in the iterations prior to t used the incorrect parameter values. [sent-89, score-0.237]

40 3 Convergence and Asymptotic Consistency with Gaussian Transforms; 3.1 General State Evolution Analysis. Before proving the asymptotic consistency of the adaptive GAMP method with ML adaptation, we first prove a more general convergence result. [sent-92, score-0.23]

41 Similar to the SE analyses in [14, 18], we consider the asymptotic behavior of the adaptive GAMP algorithm with large i.i.d. Gaussian transform matrices. [sent-94, score-0.203]

42 Assumption 1. Consider the adaptive GAMP algorithm running on a sequence of problems indexed by the dimension n, satisfying the following: (a) For each n, the matrix A ∈ Rm×n has i.i.d. Gaussian components. [sent-100, score-0.13]

43 (b) The input vectors x and initial condition x^0 are deterministic sequences whose components converge empirically with bounded moments of order s = 2k − 2, lim_{n→∞} (x, x^0) = (X, X^0) in the PL(s) sense, (7) to some random vector (X, X^0) for k = 2. [sent-104, score-0.15]

44 (c) The output vectors z and y ∈ Rm are generated by z = Ax, y = h(z, w), (8) for some scalar function h(z, w), where the disturbance vector w is deterministic but converges empirically, lim_{n→∞} w = W in the PL(s) sense, (9) with s = 2k − 2, k = 2, and W some random variable. [sent-106, score-0.135]

45 (d) Suitable continuity assumptions hold on the estimation functions G_x^t, G_z^t and G_s^t and the adaptation functions H_x^t and H_z^t – see [19] for details. [sent-108, score-0.247]

46 θ_x^t := {(x_j, r_j^t, x_j^{t+1}), j = 1, ..., n}, θ_z^t := {(z_i, z_i^t, y_i, p_i^t), i = 1, ..., m}. (10) [sent-112, score-0.069]

47 The first vector set, θ_x^t, represents the components of the "true," but unknown, input vector x, its adaptive GAMP estimate x^t, as well as r^t. [sent-116, score-0.226]

48 The second vector set, θ_z^t, contains the components of the "true," but unknown, output vector z, its GAMP estimate z^t, as well as p^t and the observed vector y. [sent-117, score-0.12]

49 Our main result, Theorem 1 below, characterizes the asymptotic joint distribution of the components of these two sets as n → ∞. [sent-119, score-0.107]

50 Specifically, we will show that the empirical distributions of the components of θ_x^t and θ_z^t converge to random vectors of the form θ̄_x^t := (X, R^t, X^{t+1}), θ̄_z^t := (Z, Z^t, Y, P^t), (11) where X is the random variable in the initial condition (7). [sent-120, score-0.075]

51 The deterministic constants above can be computed iteratively with the state evolution (SE) equations shown in Algorithm 2. [sent-123, score-0.073]
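
For intuition only: in the special case of an AWGN output channel with an MMSE input denoiser, the SE recursion is commonly written in the familiar AMP form τ^{t+1} = σ_w² + (n/m)·mmse(τ^t). The sketch below assumes that special case and is not Algorithm 2 itself.

```python
def se_recursion(mmse_fn, sigma_w2, n_over_m, tau0, n_iter=20):
    """mmse_fn(tau) is assumed to return E[(G_x(X + sqrt(tau)*N, tau) - X)^2]."""
    taus = [tau0]
    for _ in range(n_iter):
        taus.append(sigma_w2 + n_over_m * mmse_fn(taus[-1]))
    return taus
```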

52 Then, for any fixed t, almost surely, the components of θ_x^t and θ_z^t converge empirically with bounded moments of order k = 2 as lim_{n→∞} θ_x^t = θ̄_x^t and lim_{n→∞} θ_z^t = θ̄_z^t in the PL(k) sense. (17) [sent-126, score-0.15]

53 In addition, for any t, the limits lim_n λ_x^t = λ̄_x^t, lim_n λ_z^t = λ̄_z^t, lim_n τ_r^t = τ̄_r^t, and lim_n τ_p^t = τ̄_p^t (18) also hold almost surely. [sent-128, score-0.173]

54 Similar to several other analyses of AMP algorithms such as [14–18], the theorem provides a scalar equivalent model for the componentwise behavior of the adaptive GAMP method. [sent-129, score-0.283]

55 That is, asymptotically the components of the sets θ_x^t and θ_z^t in (10) are distributed identically to simple scalar random variables. [sent-130, score-0.081]

56 From this scalar equivalent model, one can compute a large class of componentwise performance metrics such as mean-squared error (MSE) or detection error rates. [sent-136, score-0.153]
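
For example, a componentwise metric such as MSE can be evaluated from the scalar equivalent model by simple scalar Monte Carlo; the function names and the sampling approach below are illustrative assumptions.

```python
import numpy as np

def mse_from_scalar_model(denoiser, sample_prior, tau_r, n_samples=200_000, seed=0):
    rng = np.random.default_rng(seed)
    x = sample_prior(rng, n_samples)                          # X ~ P_X
    r = x + np.sqrt(tau_r) * rng.standard_normal(n_samples)   # R = X + N(0, tau_r)
    x_hat = denoiser(r, tau_r)                                # X_hat = G_x(R, tau_r)
    return float(np.mean((x_hat - x) ** 2))
```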

57 Thus, the SE analysis shows that, for essentially arbitrary estimation and adaptation functions and distributions on the true input and disturbance, we can exactly evaluate the asymptotic behavior of the adaptive GAMP algorithm. [sent-137, score-0.419]

58 In addition, when the parameter values λx and λz are fixed, the SE equations in Algorithm 2 reduce to SE equations for the standard (non-adaptive) GAMP algorithm described in [14]. [sent-138, score-0.074]

59 3.2 Asymptotic Consistency with ML Adaptation. The general result, Theorem 1, can be applied to the adaptive GAMP algorithm with arbitrary estimation and adaptation functions. [sent-140, score-0.296]

60 Here, we use the result to prove the asymptotic parameter consistency of Adaptive GAMP with ML adaptation. [sent-142, score-0.118]

61 The key point is to realize that the distributions (12) and (13) exactly match the distributions (4) assumed by the ML adaptation functions (5). [sent-143, score-0.184]

62 Thus, the ML adaptation should work provided that the maximizations in (5) yield the correct parameter estimates. [sent-144, score-0.134]

63 The functions φx and φz can be the log likelihood functions (6a) and (6b), although we permit other functions as well. [sent-157, score-0.081]
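
As an illustration of the ML adaptation step for the Gauss-Bernoulli case, the sketch below fits (ρ, σ_x²) by maximizing the marginal log-likelihood of r = X + N(0, τ_r); the optimizer choice, initialization, and bounds are assumptions rather than the paper's maximization (5).

```python
import numpy as np
from scipy.optimize import minimize

def H_x_ml(r, tau_r):
    """Approximate ML fit of (rho, sigma_x2) from r ~ X + N(0, tau_r), X Gauss-Bernoulli."""
    def neg_loglik(theta):
        rho, sigma_x2 = theta
        v_on, v_off = sigma_x2 + tau_r, tau_r
        p_on = rho * np.exp(-0.5 * r**2 / v_on) / np.sqrt(2 * np.pi * v_on)
        p_off = (1 - rho) * np.exp(-0.5 * r**2 / v_off) / np.sqrt(2 * np.pi * v_off)
        return -np.sum(np.log(p_on + p_off + 1e-300))
    res = minimize(neg_loglik, x0=np.array([0.5, 1.0]),
                   bounds=[(1e-3, 1 - 1e-3), (1e-3, None)], method="L-BFGS-B")
    return res.x   # estimated (rho, sigma_x2)
```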

64 Consider the outputs of the adaptive GAMP algorithm with the ML adaptation functions (5), using the functions φx and φz and the parameter sets in Definitions 1 and 2. [sent-160, score-0.327]

65 Then, under suitable continuity conditions (see [19] for details), for any fixed t, (a) the components of θ_x^t and θ_z^t in (10) converge empirically with bounded moments of order k = 2 as in (17), and the limits (18) hold almost surely. [sent-162, score-0.116]

66 The theorem shows, remarkably, that for a very large class of parameterized distributions, the adaptive GAMP algorithm with ML adaptation is able to asymptotically estimate the correct parameters. [sent-165, score-0.271]

67 Also, once the consistency limits in (b) and (c) hold, the SE equations in Algorithm 2 reduce to the SE equations for the non-adaptive GAMP method running with the true parameters. [sent-166, score-0.121]

68 Thus, we conclude there is asymptotically no performance loss between the adaptive GAMP algorithm and a corresponding oracle GAMP algorithm that knows the correct parameters in the sense that the empirical distributions of the algorithm outputs are described by the same SE equations. [sent-167, score-0.307]

69 4 Numerical Example: Estimation of a Gauss-Bernoulli Input. Recent results suggest that there is considerable value in learning priors PX in the context of compressed sensing [25], which considers the estimation of sparse vectors x from underdetermined measurements (m < n). [sent-168, score-0.333]

70 Here, we present a simple simulation to illustrate the performance gain of adaptive GAMP and its asymptotic consistency. [sent-172, score-0.203]

71 Figure 2 compares the performance of adaptive GAMP for estimation of a sparse Gauss-Bernoulli signal x ∈ Rn from m noisy measurements y = Ax + w. [Figure 2 residue: MSE (dB) curves for State Evolution, LASSO, Oracle GAMP, and Adaptive GAMP.] [sent-174, score-0.278]

72 [Figure 2 axes: (a) measurement ratio m/n; (b) noise variance σ2.] Figure 2: Reconstruction of a Gauss-Bernoulli signal from noisy measurements. [sent-176, score-0.076]

73 The average reconstruction MSE is plotted against (a) measurement ratio m/n and (b) AWGN variance σ 2 . [sent-177, score-0.122]

74 The plots illustrate that adaptive GAMP yields a considerable improvement over the ℓ1-based LASSO estimator. [sent-178, score-0.163]

75 Moreover, it exactly matches the performance of oracle GAMP that knows the prior parameters. [sent-179, score-0.08]

76 The signal of length n = 400 has 20% nonzero components drawn from the Gaussian distribution of variance 5. [sent-184, score-0.078]

77 Adaptive GAMP uses EM iterations, which approximate ML parameter estimation, to jointly recover the unknown signal x and the true prior parameters λx (sparsity ratio ρ = 0.2 and nonzero variance 5, as described above). [sent-185, score-0.108]

78 The performance of adaptive GAMP is compared to that of LASSO with an MSE-optimal regularization parameter and to oracle GAMP, which knows the parameters of the prior exactly. [sent-187, score-0.21]
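
A hedged sketch of the LASSO baseline with an MSE-optimal (oracle-tuned) regularization weight, as used for comparison in Figure 2; the regularization grid and the dB reporting convention are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

def best_lasso_mse_db(A, y, x_true, alphas=np.logspace(-3, 0, 20)):
    best = np.inf
    for alpha in alphas:
        # sweep the regularization weight and keep the best reconstruction MSE
        x_hat = Lasso(alpha=alpha, max_iter=10_000).fit(A, y).coef_
        best = min(best, np.mean((x_hat - x_true) ** 2))
    return 10 * np.log10(best)   # MSE in dB; normalization convention may differ
```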

79 In Figure 2(a), we fix the AWGN variance and plot the average MSE of the reconstruction against the measurement ratio m/n. [sent-193, score-0.105]

80 In Figure 2(b), we keep the measurement ratio m/n fixed and vary the AWGN variance σ2. [sent-194, score-0.069]

81 For completeness, we also provide the asymptotic MSE values computed via SE recursion. [sent-196, score-0.073]

82 Moreover, the results corroborate the consistency of adaptive GAMP, which achieves nearly identical reconstruction quality to that of oracle GAMP. [sent-198, score-0.228]

83 The performance results here and in [19] indicate that adaptive GAMP can be an effective method for estimation when the parameters of the problem are difficult to characterize and must be estimated from data. [sent-199, score-0.197]

84 5 Conclusions and Future Work. We have presented an adaptive GAMP method for the estimation of i.i.d. [sent-200, score-0.197]

85 vectors x observed through a known linear transform followed by an arbitrary, componentwise random transform. [sent-203, score-0.154]

86 The procedure, which is a generalization of the EM-GAMP methodology of [9, 10], estimates both the vector x and the parameters in the source and componentwise output transform. [sent-204, score-0.134]

87 For large i.i.d. Gaussian transforms with ML parameter estimation, it is shown that the adaptive GAMP method is provably asymptotically consistent in that the parameter estimates converge to the true values. [sent-208, score-0.287]

88 Moreover, the algorithm is computationally efficient since it reduces the vector-valued estimation problem to a sequence of scalar estimation problems in Gaussian noise. [sent-210, score-0.181]

89 We have mentioned the use of the method for learning sparse priors in compressed sensing. [sent-212, score-0.088]

90 [10] ——, “Expectation-maximization Gaussian-mixture approximate message passing,” in Proc. [sent-283, score-0.074]

91 Zdeborová, "Statistical physics-based reconstruction in compressed sensing," arXiv:1109. [sent-295, score-0.103]

92 [12] ——, “Probabilistic reconstruction in compressed sensing: Algorithms, phase diagrams, and threshold achieving matrices,” arXiv:1206. [sent-298, score-0.103]

93 Schniter, "Hybrid generalized approximate message passing with applications to structured sparsity," in Proc. [sent-307, score-0.109]

94 Rangan, “Generalized approximate message passing for estimation with random linear mixing,” in Proc. [sent-315, score-0.193]

95 Wang, “Asymptotic mean-square optimality of belief propagation for sparse linear systems,” in Proc. [sent-326, score-0.07]

96 Rangan, “Estimation with random linear mixing, belief propagation and compressed sensing,” in Proc. [sent-339, score-0.116]

97 Montanari, “The dynamics of message passing on dense graphs, with applications to compressed sensing,” IEEE Trans. [sent-349, score-0.176]

98 Unser, “Approximate message passing with consistent parameter estimation and applications to sparse learning,” arXiv:1207. [sent-362, score-0.215]

99 Caire, “Iterative multiuser joint decoding: Unified framework and asymptotic analysis,” IEEE Trans. [sent-368, score-0.11]

100 Montanari, "Compressed sensing over ℓp-balls: Minimax mean square error," in Proc. [sent-423, score-0.071]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('gamp', 0.834), ('py', 0.17), ('px', 0.137), ('gt', 0.135), ('adaptive', 0.13), ('kp', 0.117), ('amp', 0.113), ('componentwise', 0.106), ('adaptation', 0.099), ('sx', 0.096), ('se', 0.092), ('ml', 0.092), ('hx', 0.087), ('sz', 0.077), ('hz', 0.074), ('asymptotic', 0.073), ('mse', 0.071), ('sensing', 0.071), ('rangan', 0.07), ('pt', 0.069), ('compressed', 0.067), ('estimation', 0.067), ('rt', 0.062), ('message', 0.057), ('measurement', 0.053), ('passing', 0.052), ('vx', 0.049), ('scalar', 0.047), ('knows', 0.045), ('evolution', 0.045), ('kt', 0.043), ('awgn', 0.042), ('lim', 0.039), ('fletcher', 0.037), ('multiuser', 0.037), ('reconstruction', 0.036), ('channel', 0.036), ('lbp', 0.036), ('oracle', 0.035), ('components', 0.034), ('vz', 0.034), ('rj', 0.034), ('measurements', 0.033), ('considerable', 0.033), ('db', 0.032), ('cascade', 0.029), ('montanari', 0.029), ('lasso', 0.029), ('pl', 0.029), ('distributions', 0.029), ('propagation', 0.029), ('equations', 0.028), ('rm', 0.028), ('estimates', 0.028), ('cdma', 0.028), ('disturbance', 0.028), ('kamilov', 0.028), ('petersburg', 0.028), ('unser', 0.028), ('consistency', 0.027), ('ax', 0.027), ('functions', 0.027), ('signal', 0.027), ('transforms', 0.027), ('continuity', 0.027), ('identi', 0.027), ('outputs', 0.026), ('asymptotically', 0.025), ('unknown', 0.025), ('russia', 0.025), ('schniter', 0.025), ('provable', 0.024), ('justify', 0.024), ('maleki', 0.023), ('te', 0.023), ('mmse', 0.021), ('gaussian', 0.021), ('sparse', 0.021), ('vectors', 0.021), ('true', 0.021), ('belief', 0.02), ('underdetermined', 0.02), ('transform', 0.02), ('converge', 0.02), ('epfl', 0.019), ('bayesian', 0.019), ('deterministic', 0.018), ('moments', 0.018), ('rigorously', 0.018), ('parameter', 0.018), ('nonlinearities', 0.017), ('zt', 0.017), ('limits', 0.017), ('limn', 0.017), ('approximate', 0.017), ('correct', 0.017), ('variance', 0.017), ('donoho', 0.017), ('gs', 0.017), ('ratio', 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 43 nips-2012-Approximate Message Passing with Consistent Parameter Estimation and Applications to Sparse Learning

Author: Ulugbek Kamilov, Sundeep Rangan, Michael Unser, Alyson K. Fletcher

Abstract: We consider the estimation of an i.i.d. vector x ∈ Rn from measurements y ∈ Rm obtained by a general cascade model consisting of a known linear transform followed by a probabilistic componentwise (possibly nonlinear) measurement channel. We present a method, called adaptive generalized approximate message passing (Adaptive GAMP), that enables joint learning of the statistics of the prior and measurement channel along with estimation of the unknown vector x. Our method can be applied to a large class of learning problems including the learning of sparse priors in compressed sensing or identification of linear-nonlinear cascade models in dynamical systems and neural spiking processes. We prove that for large i.i.d. Gaussian transform matrices the asymptotic componentwise behavior of the adaptive GAMP algorithm is predicted by a simple set of scalar state evolution equations. This analysis shows that the adaptive GAMP method can yield asymptotically consistent parameter estimates, which implies that the algorithm achieves a reconstruction quality equivalent to the oracle algorithm that knows the correct parameter values. The adaptive GAMP methodology thus provides a systematic, general and computationally efficient method applicable to a large range of complex linear-nonlinear models with provable guarantees. 1

2 0.087864697 112 nips-2012-Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model

Author: Sander M. Bohte

Abstract: Neural adaptation underlies the ability of neurons to maximize encoded information over a wide dynamic range of input stimuli. Recent spiking neuron models like the adaptive Spike Response Model implement adaptation as additive fixed-size fast spike-triggered threshold dynamics and slow spike-triggered currents. Such adaptation accurately models neural spiking behavior over a limited dynamic input range. To extend efficient coding over large changes in dynamic input range, we propose a multiplicative adaptive Spike Response Model where the spike-triggered adaptation dynamics are scaled multiplicatively by the adaptation state at the time of spiking. We show that, unlike the additive adaptation model, the firing rate in our multiplicative adaptation model saturates to a realistic maximum spike-rate regardless of input magnitude. Additionally, when simulating variance switching experiments, the model quantitatively fits experimental data over a wide dynamic range. Dynamic threshold models of adaptation furthermore suggest a straightforward interpretation of neural activity in terms of dynamic differential signal encoding with shifted and weighted exponential kernels. We show that when thus encoding rectified filtered stimulus signals, the multiplicative adaptive Spike Response Model achieves a high coding efficiency and maintains this efficiency over changes in the dynamic signal range of several orders of magnitude, without changing model parameters. 1

3 0.085443117 57 nips-2012-Bayesian estimation of discrete entropy with mixtures of stick-breaking priors

Author: Evan Archer, Il M. Park, Jonathan W. Pillow

Abstract: We consider the problem of estimating Shannon’s entropy H in the under-sampled regime, where the number of possible symbols may be unknown or countably infinite. Dirichlet and Pitman-Yor processes provide tractable prior distributions over the space of countably infinite discrete distributions, and have found major applications in Bayesian non-parametric statistics and machine learning. Here we show that they provide natural priors for Bayesian entropy estimation, due to the analytic tractability of the moments of the induced posterior distribution over entropy H. We derive formulas for the posterior mean and variance of H given data. However, we show that a fixed Dirichlet or Pitman-Yor process prior implies a narrow prior on H, meaning the prior strongly determines the estimate in the under-sampled regime. We therefore define a family of continuous mixing measures such that the resulting mixture of Dirichlet or Pitman-Yor processes produces an approximately flat prior over H. We explore the theoretical properties of the resulting estimators and show that they perform well on data sampled from both exponential and power-law tailed distributions. 1

4 0.078065865 79 nips-2012-Compressive neural representation of sparse, high-dimensional probabilities

Author: Xaq Pitkow

Abstract: This paper shows how sparse, high-dimensional probability distributions could be represented by neurons with exponential compression. The representation is a novel application of compressive sensing to sparse probability distributions rather than to the usual sparse signals. The compressive measurements correspond to expected values of nonlinear functions of the probabilistically distributed variables. When these expected values are estimated by sampling, the quality of the compressed representation is limited only by the quality of sampling. Since the compression preserves the geometric structure of the space of sparse probability distributions, probabilistic computation can be performed in the compressed domain. Interestingly, functions satisfying the requirements of compressive sensing can be implemented as simple perceptrons. If we use perceptrons as a simple model of feedforward computation by neurons, these results show that the mean activity of a relatively small number of neurons can accurately represent a highdimensional joint distribution implicitly, even without accounting for any noise correlations. This comprises a novel hypothesis for how neurons could encode probabilities in the brain. 1

5 0.06654837 64 nips-2012-Calibrated Elastic Regularization in Matrix Completion

Author: Tingni Sun, Cun-hui Zhang

Abstract: This paper concerns the problem of matrix completion, which is to estimate a matrix from observations in a small subset of indices. We propose a calibrated spectrum elastic net method with a sum of the nuclear and Frobenius penalties and develop an iterative algorithm to solve the convex minimization problem. The iterative algorithm alternates between imputing the missing entries in the incomplete matrix by the current guess and estimating the matrix by a scaled soft-thresholding singular value decomposition of the imputed matrix until the resulting matrix converges. A calibration step follows to correct the bias caused by the Frobenius penalty. Under proper coherence conditions and for suitable penalties levels, we prove that the proposed estimator achieves an error bound of nearly optimal order and in proportion to the noise level. This provides a unified analysis of the noisy and noiseless matrix completion problems. Simulation results are presented to compare our proposal with previous ones. 1

6 0.061447561 228 nips-2012-Multilabel Classification using Bayesian Compressed Sensing

7 0.060474601 142 nips-2012-Generalization Bounds for Domain Adaptation

8 0.060461365 241 nips-2012-No-Regret Algorithms for Unconstrained Online Convex Optimization

9 0.06013336 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

10 0.059575636 60 nips-2012-Bayesian nonparametric models for ranked data

11 0.057329562 218 nips-2012-Mixing Properties of Conditional Markov Chains with Unbounded Feature Functions

12 0.055889454 119 nips-2012-Entropy Estimations Using Correlated Symmetric Stable Random Projections

13 0.053901598 78 nips-2012-Compressive Sensing MRI with Wavelet Tree Sparsity

14 0.050500959 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model

15 0.04551369 187 nips-2012-Learning curves for multi-task Gaussian process regression

16 0.045375839 117 nips-2012-Ensemble weighted kernel estimators for multivariate entropy estimation

17 0.044372961 121 nips-2012-Expectation Propagation in Gaussian Process Dynamical Systems

18 0.042464808 203 nips-2012-Locating Changes in Highly Dependent Data with Unknown Number of Change Points

19 0.042088997 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

20 0.038921535 258 nips-2012-Online L1-Dictionary Learning with Application to Novel Document Detection


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.114), (1, 0.012), (2, 0.045), (3, 0.032), (4, -0.006), (5, 0.032), (6, 0.007), (7, 0.026), (8, -0.009), (9, -0.026), (10, 0.002), (11, 0.005), (12, -0.004), (13, -0.052), (14, -0.021), (15, -0.008), (16, 0.041), (17, -0.025), (18, -0.01), (19, -0.051), (20, 0.036), (21, 0.055), (22, 0.012), (23, 0.017), (24, 0.003), (25, 0.065), (26, 0.018), (27, 0.038), (28, 0.043), (29, -0.01), (30, 0.029), (31, -0.008), (32, 0.086), (33, -0.045), (34, -0.12), (35, -0.015), (36, 0.108), (37, 0.052), (38, 0.003), (39, -0.064), (40, 0.012), (41, -0.042), (42, 0.014), (43, 0.02), (44, -0.031), (45, 0.059), (46, -0.026), (47, 0.047), (48, -0.068), (49, -0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.87302786 43 nips-2012-Approximate Message Passing with Consistent Parameter Estimation and Applications to Sparse Learning

Author: Ulugbek Kamilov, Sundeep Rangan, Michael Unser, Alyson K. Fletcher

Abstract: We consider the estimation of an i.i.d. vector x ∈ Rn from measurements y ∈ Rm obtained by a general cascade model consisting of a known linear transform followed by a probabilistic componentwise (possibly nonlinear) measurement channel. We present a method, called adaptive generalized approximate message passing (Adaptive GAMP), that enables joint learning of the statistics of the prior and measurement channel along with estimation of the unknown vector x. Our method can be applied to a large class of learning problems including the learning of sparse priors in compressed sensing or identification of linear-nonlinear cascade models in dynamical systems and neural spiking processes. We prove that for large i.i.d. Gaussian transform matrices the asymptotic componentwise behavior of the adaptive GAMP algorithm is predicted by a simple set of scalar state evolution equations. This analysis shows that the adaptive GAMP method can yield asymptotically consistent parameter estimates, which implies that the algorithm achieves a reconstruction quality equivalent to the oracle algorithm that knows the correct parameter values. The adaptive GAMP methodology thus provides a systematic, general and computationally efficient method applicable to a large range of complex linear-nonlinear models with provable guarantees. 1

2 0.57142943 119 nips-2012-Entropy Estimations Using Correlated Symmetric Stable Random Projections

Author: Ping Li, Cun-hui Zhang

Abstract: Methods for efficiently estimating Shannon entropy of data streams have important applications in learning, data mining, and network anomaly detections (e.g., the DDoS attacks). For nonnegative data streams, the method of Compressed Counting (CC) [11, 13] based on maximally-skewed stable random projections can provide accurate estimates of the Shannon entropy using small storage. However, CC is no longer applicable when entries of data streams can be below zero, which is a common scenario when comparing two streams. In this paper, we propose an algorithm for entropy estimation in general data streams which allow negative entries. In our method, the Shannon entropy is approximated by the finite difference of two correlated frequency moments estimated from correlated samples of symmetric stable random variables. Interestingly, the estimator for the moment we recommend for entropy estimation barely has bounded variance itself, whereas the common geometric mean estimator (which has bounded higher-order moments) is not sufficient for entropy estimation. Our experiments confirm that this method is able to well approximate the Shannon entropy using small storage.

3 0.5481894 57 nips-2012-Bayesian estimation of discrete entropy with mixtures of stick-breaking priors

Author: Evan Archer, Il M. Park, Jonathan W. Pillow

Abstract: We consider the problem of estimating Shannon’s entropy H in the under-sampled regime, where the number of possible symbols may be unknown or countably infinite. Dirichlet and Pitman-Yor processes provide tractable prior distributions over the space of countably infinite discrete distributions, and have found major applications in Bayesian non-parametric statistics and machine learning. Here we show that they provide natural priors for Bayesian entropy estimation, due to the analytic tractability of the moments of the induced posterior distribution over entropy H. We derive formulas for the posterior mean and variance of H given data. However, we show that a fixed Dirichlet or Pitman-Yor process prior implies a narrow prior on H, meaning the prior strongly determines the estimate in the under-sampled regime. We therefore define a family of continuous mixing measures such that the resulting mixture of Dirichlet or Pitman-Yor processes produces an approximately flat prior over H. We explore the theoretical properties of the resulting estimators and show that they perform well on data sampled from both exponential and power-law tailed distributions. 1

4 0.54457057 79 nips-2012-Compressive neural representation of sparse, high-dimensional probabilities

Author: Xaq Pitkow

Abstract: This paper shows how sparse, high-dimensional probability distributions could be represented by neurons with exponential compression. The representation is a novel application of compressive sensing to sparse probability distributions rather than to the usual sparse signals. The compressive measurements correspond to expected values of nonlinear functions of the probabilistically distributed variables. When these expected values are estimated by sampling, the quality of the compressed representation is limited only by the quality of sampling. Since the compression preserves the geometric structure of the space of sparse probability distributions, probabilistic computation can be performed in the compressed domain. Interestingly, functions satisfying the requirements of compressive sensing can be implemented as simple perceptrons. If we use perceptrons as a simple model of feedforward computation by neurons, these results show that the mean activity of a relatively small number of neurons can accurately represent a highdimensional joint distribution implicitly, even without accounting for any noise correlations. This comprises a novel hypothesis for how neurons could encode probabilities in the brain. 1

5 0.53426617 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model

Author: Yi Wu, David P. Wipf

Abstract: Sparse linear (or generalized linear) models combine a standard likelihood function with a sparse prior on the unknown coefficients. These priors can conveniently be expressed as a maximization over zero-mean Gaussians with different variance hyperparameters. Standard MAP estimation (Type I) involves maximizing over both the hyperparameters and coefficients, while an empirical Bayesian alternative (Type II) first marginalizes the coefficients and then maximizes over the hyperparameters, leading to a tractable posterior approximation. The underlying cost functions can be related via a dual-space framework from [22], which allows both the Type I or Type II objectives to be expressed in either coefficient or hyperparmeter space. This perspective is useful because some analyses or extensions are more conducive to development in one space or the other. Herein we consider the estimation of a trade-off parameter balancing sparsity and data fit. As this parameter is effectively a variance, natural estimators exist by assessing the problem in hyperparameter (variance) space, transitioning natural ideas from Type II to solve what is much less intuitive for Type I. In contrast, for analyses of update rules and sparsity properties of local and global solutions, as well as extensions to more general likelihood models, we can leverage coefficient-space techniques developed for Type I and apply them to Type II. For example, this allows us to prove that Type II-inspired techniques can be successful recovering sparse coefficients when unfavorable restricted isometry properties (RIP) lead to failure of popular ℓ1 reconstructions. It also facilitates the analysis of Type II when non-Gaussian likelihood models lead to intractable integrations. 1

6 0.47230226 63 nips-2012-CPRL -- An Extension of Compressive Sensing to the Phase Retrieval Problem

7 0.45965582 142 nips-2012-Generalization Bounds for Domain Adaptation

8 0.45226958 221 nips-2012-Multi-Stage Multi-Task Feature Learning

9 0.45087606 268 nips-2012-Perfect Dimensionality Recovery by Variational Bayesian PCA

10 0.44478899 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

11 0.44468388 117 nips-2012-Ensemble weighted kernel estimators for multivariate entropy estimation

12 0.44356573 222 nips-2012-Multi-Task Averaging

13 0.44163418 258 nips-2012-Online L1-Dictionary Learning with Application to Novel Document Detection

14 0.43297365 211 nips-2012-Meta-Gaussian Information Bottleneck

15 0.42501205 123 nips-2012-Exponential Concentration for Mutual Information Estimation with Application to Forests

16 0.42264098 95 nips-2012-Density-Difference Estimation

17 0.40100273 198 nips-2012-Learning with Target Prior

18 0.40028998 225 nips-2012-Multi-task Vector Field Learning

19 0.39913952 78 nips-2012-Compressive Sensing MRI with Wavelet Tree Sparsity

20 0.38922191 150 nips-2012-Hierarchical spike coding of sound


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.033), (1, 0.265), (21, 0.04), (38, 0.125), (39, 0.022), (42, 0.042), (54, 0.02), (55, 0.012), (74, 0.032), (76, 0.122), (80, 0.107), (92, 0.061), (94, 0.011)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.81282645 223 nips-2012-Multi-criteria Anomaly Detection using Pareto Depth Analysis

Author: Ko-jen Hsiao, Kevin Xu, Jeff Calder, Alfred O. Hero

Abstract: We consider the problem of identifying patterns in a data set that exhibit anomalous behavior, often referred to as anomaly detection. In most anomaly detection algorithms, the dissimilarity between data samples is calculated by a single criterion, such as Euclidean distance. However, in many cases there may not exist a single dissimilarity measure that captures all possible anomalous patterns. In such a case, multiple criteria can be defined, and one can test for anomalies by scalarizing the multiple criteria using a linear combination of them. If the importance of the different criteria are not known in advance, the algorithm may need to be executed multiple times with different choices of weights in the linear combination. In this paper, we introduce a novel non-parametric multi-criteria anomaly detection method using Pareto depth analysis (PDA). PDA uses the concept of Pareto optimality to detect anomalies under multiple criteria without having to run an algorithm multiple times with different choices of weights. The proposed PDA approach scales linearly in the number of criteria and is provably better than linear combinations of the criteria. 1

2 0.80794001 266 nips-2012-Patient Risk Stratification for Hospital-Associated C. diff as a Time-Series Classification Task

Author: Jenna Wiens, Eric Horvitz, John V. Guttag

Abstract: A patient’s risk for adverse events is affected by temporal processes including the nature and timing of diagnostic and therapeutic activities, and the overall evolution of the patient’s pathophysiology over time. Yet many investigators ignore this temporal aspect when modeling patient outcomes, considering only the patient’s current or aggregate state. In this paper, we represent patient risk as a time series. In doing so, patient risk stratification becomes a time-series classification task. The task differs from most applications of time-series analysis, like speech processing, since the time series itself must first be extracted. Thus, we begin by defining and extracting approximate risk processes, the evolving approximate daily risk of a patient. Once obtained, we use these signals to explore different approaches to time-series classification with the goal of identifying high-risk patterns. We apply the classification to the specific task of identifying patients at risk of testing positive for hospital acquired Clostridium difficile. We achieve an area under the receiver operating characteristic curve of 0.79 on a held-out set of several hundred patients. Our two-stage approach to risk stratification outperforms classifiers that consider only a patient’s current state (p<0.05). 1

same-paper 3 0.75849378 43 nips-2012-Approximate Message Passing with Consistent Parameter Estimation and Applications to Sparse Learning

Author: Ulugbek Kamilov, Sundeep Rangan, Michael Unser, Alyson K. Fletcher

Abstract: We consider the estimation of an i.i.d. vector x ∈ Rn from measurements y ∈ Rm obtained by a general cascade model consisting of a known linear transform followed by a probabilistic componentwise (possibly nonlinear) measurement channel. We present a method, called adaptive generalized approximate message passing (Adaptive GAMP), that enables joint learning of the statistics of the prior and measurement channel along with estimation of the unknown vector x. Our method can be applied to a large class of learning problems including the learning of sparse priors in compressed sensing or identification of linear-nonlinear cascade models in dynamical systems and neural spiking processes. We prove that for large i.i.d. Gaussian transform matrices the asymptotic componentwise behavior of the adaptive GAMP algorithm is predicted by a simple set of scalar state evolution equations. This analysis shows that the adaptive GAMP method can yield asymptotically consistent parameter estimates, which implies that the algorithm achieves a reconstruction quality equivalent to the oracle algorithm that knows the correct parameter values. The adaptive GAMP methodology thus provides a systematic, general and computationally efficient method applicable to a large range of complex linear-nonlinear models with provable guarantees. 1

4 0.71508712 61 nips-2012-Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence

Author: Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric

Abstract: We study the problem of identifying the best arm(s) in the stochastic multi-armed bandit setting. This problem has been studied in the literature from two different perspectives: fixed budget and fixed confidence. We propose a unifying approach that leads to a meta-algorithm called unified gap-based exploration (UGapE), with a common structure and similar theoretical analysis for these two settings. We prove a performance bound for the two versions of the algorithm showing that the two problems are characterized by the same notion of complexity. We also show how the UGapE algorithm as well as its theoretical analysis can be extended to take into account the variance of the arms and to multiple bandits. Finally, we evaluate the performance of UGapE and compare it with a number of existing fixed budget and fixed confidence algorithms. 1

5 0.70511937 188 nips-2012-Learning from Distributions via Support Measure Machines

Author: Krikamol Muandet, Kenji Fukumizu, Francesco Dinuzzo, Bernhard Schölkopf

Abstract: This paper presents a kernel-based discriminative learning framework on probability measures. Rather than relying on large collections of vectorial training examples, our framework learns using a collection of probability distributions that have been constructed to meaningfully represent training data. By representing these probability distributions as mean embeddings in the reproducing kernel Hilbert space (RKHS), we are able to apply many standard kernel-based learning techniques in straightforward fashion. To accomplish this, we construct a generalization of the support vector machine (SVM) called a support measure machine (SMM). Our analyses of SMMs provides several insights into their relationship to traditional SVMs. Based on such insights, we propose a flexible SVM (FlexSVM) that places different kernel functions on each training example. Experimental results on both synthetic and real-world data demonstrate the effectiveness of our proposed framework. 1

6 0.6978792 34 nips-2012-Active Learning of Multi-Index Function Models

7 0.64934874 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

8 0.63613683 65 nips-2012-Cardinality Restricted Boltzmann Machines

9 0.63405895 252 nips-2012-On Multilabel Classification and Ranking with Partial Feedback

10 0.63374603 292 nips-2012-Regularized Off-Policy TD-Learning

11 0.63332868 216 nips-2012-Mirror Descent Meets Fixed Share (and feels no regret)

12 0.63301027 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

13 0.63226312 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

14 0.63164485 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models

15 0.63135821 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

16 0.63093233 200 nips-2012-Local Supervised Learning through Space Partitioning

17 0.63001621 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

18 0.629592 56 nips-2012-Bayesian active learning with localized priors for fast receptive field characterization

19 0.62890661 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance

20 0.6288377 218 nips-2012-Mixing Properties of Conditional Markov Chains with Unbounded Feature Functions