nips nips2012 nips2012-79 knowledge-graph by maker-knowledge-mining

79 nips-2012-Compressive neural representation of sparse, high-dimensional probabilities


Source: pdf

Author: Xaq Pitkow

Abstract: This paper shows how sparse, high-dimensional probability distributions could be represented by neurons with exponential compression. The representation is a novel application of compressive sensing to sparse probability distributions rather than to the usual sparse signals. The compressive measurements correspond to expected values of nonlinear functions of the probabilistically distributed variables. When these expected values are estimated by sampling, the quality of the compressed representation is limited only by the quality of sampling. Since the compression preserves the geometric structure of the space of sparse probability distributions, probabilistic computation can be performed in the compressed domain. Interestingly, functions satisfying the requirements of compressive sensing can be implemented as simple perceptrons. If we use perceptrons as a simple model of feedforward computation by neurons, these results show that the mean activity of a relatively small number of neurons can accurately represent a highdimensional joint distribution implicitly, even without accounting for any noise correlations. This comprises a novel hypothesis for how neurons could encode probabilities in the brain. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: This paper shows how sparse, high-dimensional probability distributions could be represented by neurons with exponential compression. [sent-3, score-0.213]

2 The representation is a novel application of compressive sensing to sparse probability distributions rather than to the usual sparse signals. [sent-4, score-0.98]

3 The compressive measurements correspond to expected values of nonlinear functions of the probabilistically distributed variables. [sent-5, score-0.672]

4 Since the compression preserves the geometric structure of the space of sparse probability distributions, probabilistic computation can be performed in the compressed domain. [sent-7, score-0.479]

5 Interestingly, functions satisfying the requirements of compressive sensing can be implemented as simple perceptrons. [sent-8, score-0.682]

6 If we use perceptrons as a simple model of feedforward computation by neurons, these results show that the mean activity of a relatively small number of neurons can accurately represent a highdimensional joint distribution implicitly, even without accounting for any noise correlations. [sent-9, score-0.687]

7 Until recently, it was generally thought that encoding of sparse signals required dense sampling at a rate greater than or equal to signal bandwidth. [sent-20, score-0.296]

8 Here I apply such compression to sparse probability distributions over binary variables, which are, after all, just signals with some particular properties. [sent-22, score-0.388]

9 In most applications of compressive sensing, the ultimate goal is to reconstruct the original signal efficiently. [sent-23, score-0.63]

10 Instead, we use the guarantees that the signal could be reconstructed to ensure that the signal is accurately represented by its compressed version. [sent-25, score-0.526]

11 We don’t expect that the brain needs to explicitly reconstruct a probability distribution in some canonical mathematical representation in order to gain the advantages of probabilistic reasoning. [sent-27, score-0.259]

12 Traditional compressive sensing considers signals that live in an N-dimensional space but have only S nonzero coordinates in some basis. [sent-28, score-0.797]

13 If we were told the location of the nonzero entries, then we would need only S measurements to characterize their coefficients and thus the entire signal. [sent-30, score-0.241]

14 But even if we don’t know where those entries are, it still takes little more than S linear measurements to perfectly reconstruct the signal. [sent-31, score-0.268]

15 Furthermore, those measurements can be fixed in advance without any knowledge of the structure of the signal. [sent-32, score-0.191]

16 The basic mathematical setup of compressive sensing is as follows. [sent-34, score-0.682]

17 We make M linear measurements y of this signal by applying the M × N matrix A: y = As (1) We would then like to recover the original signal s from these measurements. [sent-36, score-0.438]
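As a concrete illustration of this measurement model (not taken from the paper; a minimal numpy sketch with arbitrary values of N, S, and M):

```python
import numpy as np

rng = np.random.default_rng(0)
N, S, M = 1000, 10, 120          # signal length, sparsity, number of measurements

# An S-sparse signal: S nonzero coefficients at random locations
s = np.zeros(N)
support = rng.choice(N, size=S, replace=False)
s[support] = rng.normal(size=S)

# A fixed random measurement matrix, chosen with no knowledge of the support of s
A = rng.normal(size=(M, N)) / np.sqrt(M)

# M linear measurements of the signal: y = A s
y = A @ s
```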

18 Compressive sensing is generally robust to two deviations from this ideal setup. [sent-38, score-0.239]

19 Second, measurements may be corrupted by noise with bounded amplitude ε. [sent-42, score-0.191]

20 Under these conditions, the error of the ℓ1-reconstructed signal ŝ is bounded by the error of the best S-sparse approximation sS plus a term proportional to the measurement noise: ‖ŝ − s‖2 ≤ C0 ‖sS − s‖2/√S + C1 ε (3) for some constants C0 and C1 [8]. [sent-43, score-0.261]

21 Several conditions on A have been used in compressive sensing to guarantee good performance [4, 6, 9, 10, 11]. [sent-44, score-0.682]

22 Modulo various nuances, they all essentially ensure that most or all relevant sparse signals lie sufficiently far from the null space of A: It would be impossible to recover signals in the null space since their measurements are all zero and cannot therefore be distinguished. [sent-45, score-0.442]

23 For random matrices whose elements are independent and identically distributed Gaussian or Bernoulli variates, the RIP holds as long as the number of measurements M satisfies M ≥ C S log(N/S) (5) for some constant C that depends on δS [8]. [sent-47, score-0.191]

24 2 Compressing sparse probability distributions: Compressive sensing allows us to use far fewer resources to accurately represent high-dimensional objects if they are sufficiently sparse. [sent-49, score-0.474]

25 Consider the signal to be a probability distribution over an n-dimensional binary vector x ∈ {−1, +1}^n, which I will write sometimes as a function p(x) and sometimes as a vector p indexed by the binary state x. [sent-52, score-0.261]

26 The measurement matrix A for probability vectors has size M × 2^n. [sent-55, score-0.22]

27 For example, if n = 3 then the column index would take the 8 values x ∈ {−−− ; −−+ ; −+− ; −++ ; +−− ; +−+ ; ++− ; +++} Each element of the measurement matrix, Ai (x), can be viewed as a function applied to the binary state. [sent-59, score-0.191]

28 For suitable measurement matrices A, we are guaranteed accurate reconstruction of S-sparse probability distributions as long as the number of measurements is M ≥ O(S log(N/S)) = O(S n − S log S) (7). The exponential size of the probability vector, N = 2^n, is cancelled by the logarithm. [sent-61, score-0.613]

29 For distributions with a fixed sparseness S, the required number of measurements per variable, M/n, is then independent of the number of variables. [sent-62, score-0.265]
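A quick numerical check of this scaling claim (a sketch, assuming base-2 logarithms and an arbitrary constant C; the true constant depends on the RIP parameter):

```python
import numpy as np

S, C = 8, 2.0                            # sparsity and an assumed constant C
for n in [10, 20, 40, 80]:
    N = 2.0 ** n                         # size of the probability vector
    M = C * S * np.log2(N / S)           # = C * (S*n - S*log2(S)), per (7)
    print(n, round(M), round(M / n, 2))  # M grows linearly in n, so M/n levels off
```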

30 This allows us to use the performance limits from robust compressive sensing, which according to (3) creates an error in the reconstructed probabilities that is bounded by ‖p̂ − p‖2 ≤ C0 ‖pS − p‖2 + C1/√T (9), where pS is a vector with the top S probabilities preserved and the rest set to zero. [sent-65, score-0.745]

31 Measurements by random perceptrons: In compressive sensing it is common to use a matrix with independent Bernoulli-distributed random values, Ai(x) ∼ B(1/2), which guarantees A satisfies the RIP [12]. [sent-69, score-1.068]

32 These are simple threshold functions of a weighted average of the input: Ai(x) = sgn(Σj Wij xj − θi) (10). (Footnote 1: Depending on the problem, the number of significant nonzero entries S may grow with the number of variables.) [sent-73, score-0.357]
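A sketch of how such a measurement matrix can be built explicitly for small n. Gaussian weights and zero thresholds are assumptions here; the extract only specifies the form of (10), and the Appendix analyzes the zero-threshold case.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
n, M = 9, 100                            # number of binary variables, number of perceptrons

# All 2^n binary states x in {-1,+1}^n; each state indexes one column of A
states = np.array(list(product([-1, 1], repeat=n)))     # shape (2^n, n)

W = rng.normal(size=(M, n))              # assumed Gaussian perceptron weights
theta = np.zeros(M)                      # zero thresholds, as in the Appendix

# Measurement matrix A_i(x) = sgn(sum_j W_ij x_j - theta_i), shape (M, 2^n)
A = np.sign(states @ W.T - theta).T
```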

33 An example measurement matrix for random perceptrons is shown in Figure 1. [sent-81, score-0.537]

34 These functions are readily implemented by individual neurons, where xj is the instantaneous activity of neuron j, Wij is the synaptic weight between neurons i and j, and the sgn function approximates a spiking threshold at θ. [sent-82, score-0.455]

35 Figure 1: Example measurement matrix Ai(x) for M = 100 random perceptrons applied to all 2^9 = 512 possible binary vectors of length n = 9; columns are indexed by the state vector x. [sent-83, score-0.577]

36 The step nonlinearity sgn is not essential, but some type of nonlinearity is: using a purely linear function of the states, A = W x, would result in measurements y = Ap = W⟨x⟩. [sent-84, score-0.462]

37 This provides at most n linearly independent measurements of p(x), even when M > n. [sent-85, score-0.191]

38 Although the dimensionality of W is merely M × n, which is much smaller than the 2^n-dimensional space of probabilities, (10) can generate O(2^n) distinct perceptrons [13]. [sent-88, score-0.386]

39 This shows that random perceptrons generate the canonical basis and can thus span the space of possible p(x). [sent-90, score-0.393]
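The need for the nonlinearity can be checked directly for a small n: a purely linear readout A = W X has rank at most n no matter how many rows it has, whereas the sign nonlinearity typically yields as many independent measurement functions as there are perceptrons. A sketch with arbitrary sizes:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
n, M = 8, 60

X = np.array(list(product([-1, 1], repeat=n))).T   # n x 2^n matrix of all states
W = rng.normal(size=(M, n))

A_linear = W @ X                            # rows are linear functions of the state
A_perceptron = np.sign(W @ X)               # rows are random perceptrons

print(np.linalg.matrix_rank(A_linear))      # at most n = 8, regardless of M
print(np.linalg.matrix_rank(A_perceptron))  # typically M = 60 here
```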

40 In the Appendix I prove that random perceptrons with zero threshold satisfy the requirements for compressive sensing in the limit of large n. [sent-92, score-1.077]

41 Present research is directed toward deriving the condition number of these measurement matrices for finite n, in order to provide rigorous bounds on the number of measurements required in practice. [sent-93, score-0.342]

42 Below I present empirical evidence that even a small number of random perceptrons largely preserves the information about sparse distributions. [sent-94, score-0.546]

43 Experiments: Fidelity of compressed sparse distributions. To test random perceptrons in compressive sensing of probabilities, I generated sparse distributions using small Boltzmann machines [14], and compressed them using random perceptrons driven by samples from the Boltzmann machine. [sent-96, score-2.1]

44 In a Boltzmann Machine, binary states x occur with probabilities given by the Boltzmann distribution with energy function E(x): p(x) ∝ exp(−E(x)), E(x) = −b⊤x − x⊤J x (11), determined by biases b and pairwise couplings J. [sent-98, score-0.271]
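A minimal Glauber/Gibbs sampler for this distribution (a sketch, not the paper's simulation code: it samples the joint distribution rather than the conditional p(h|v), assumes symmetric couplings with zero diagonal, and uses arbitrary bias and coupling scales):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10                                    # number of binary units
b = rng.normal(scale=0.5, size=n)         # biases (arbitrary scale)
J = rng.normal(scale=0.3, size=(n, n))
J = (J + J.T) / 2                         # symmetric couplings
np.fill_diagonal(J, 0.0)

def gibbs_samples(b, J, T=1000, burn_in=100):
    """Draw T samples from p(x) ~ exp(b.x + x.J.x) over x in {-1,+1}^n."""
    n = len(b)
    x = rng.choice([-1, 1], size=n)
    samples = []
    for t in range(burn_in + T):
        for i in range(n):
            h = b[i] + 2 * J[i] @ x                # local field on unit i (J_ii = 0)
            p_plus = 1.0 / (1.0 + np.exp(-2 * h))  # p(x_i = +1 | all other units)
            x[i] = 1 if rng.random() < p_plus else -1
        if t >= burn_in:
            samples.append(x.copy())
    return np.array(samples)

samples = gibbs_samples(b, J)             # shape (T, n)
```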

45 I then fixed only the visible units and allowed the hidden units to fluctuate according to the conditional probability p(h|v), which is the distribution to be represented. [sent-103, score-0.356]

46 This distribution of couplings produced sparse posterior distributions whose rank-ordered probabilities fell faster than rank^−1 and were thus compressible [4]. [sent-106, score-0.467]

47 The compression was accomplished by passing the hidden unit activities h through random perceptrons a with weights W , according to a = sgn (W h). [sent-107, score-0.806]

48 The mean activity of these perceptron units compressively senses the probability distribution according to (8). [sent-109, score-0.397]
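Continuing the sampler sketch above, the compressed representation is just the time-averaged perceptron response to the samples (again a sketch; W is drawn with assumed Gaussian entries and the block reuses `samples` from the previous sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
M = 50
W = rng.normal(size=(M, samples.shape[1]))   # random perceptron weights

a = np.sign(samples @ W.T)                   # (T, M) perceptron responses per sample
y = a.mean(axis=0)                           # mean activity: compressive measurement of the
                                             # sampled distribution, approximating y = A p
```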

49 Figure 2: Compressive sensing of a probability distribution by model neurons. [sent-111, score-0.547]

50 Sparse posterior probability distributions are generated by a Boltzmann Machine with visible units v (Inputs), hidden units h (Samplers), feedforward couplings Jvh from visible to hidden units, and recurrent connections between hidden units Jhh. [sent-114, score-0.869]

51 The hidden units are stochastic, and sample from a probability distribution p(h|v). [sent-116, score-0.221]

52 The samples are recoded by feedforward weights W to random perceptrons a. [sent-117, score-0.429]

53 The mean activity y of the time-dependent perceptron responses captures the sparse joint distribution of the hidden units. [sent-118, score-0.398]

54 We are not ultimately interested in reconstruction of the large, sparse distribution, but rather the distribution’s compressed representation. [sent-119, score-0.389]

55 I reconstruct sparse probabilities using nonnegative ℓ1 minimization with measurement constraints [16, 17], minimizing ‖p‖1 + λ‖Ap − y‖2^2 (13), where λ is a regularization parameter that was set to 2T in all simulations. [sent-121, score-0.41]
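One simple way to carry out this minimization is projected (sub)gradient descent; the sketch below is a generic solver for that objective, not the alternating-direction algorithm cited in [16, 17]:

```python
import numpy as np

def nonneg_l1_reconstruct(A, y, lam, n_iter=5000):
    """Minimize ||p||_1 + lam * ||A p - y||_2^2 subject to p >= 0."""
    M, N = A.shape
    step = 1.0 / (2 * lam * np.linalg.norm(A, 2) ** 2)   # 1 / Lipschitz constant of the smooth term
    p = np.full(N, 1.0 / N)                              # start from the uniform distribution
    for _ in range(n_iter):
        grad = 1.0 + 2 * lam * (A.T @ (A @ p - y))       # subgradient of the objective on p >= 0
        p = np.maximum(p - step * grad, 0.0)             # gradient step, then project onto p >= 0
    return p

# e.g. p_hat = nonneg_l1_reconstruct(A, y, lam=2 * T), matching the paper's lambda = 2T
```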

56 Even with far fewer measurements than signal dimensions, reconstruction accuracy is limited only by the sampling of the posterior. [sent-123, score-0.414]

57 With enough random perceptrons, no available information is lost. [sent-124, score-0.359]

58 In the context of probability distributions, ℓ1 reconstruction has a serious flaw: all distributions have the same ℓ1 norm, ‖p‖1 = Σx p(x) = 1! [sent-125, score-0.229]

59 Figure 3B shows that the shortfall is small: ℓ1 reconstruction recovers over 90% of the total probability mass. [sent-128, score-0.199]

60 Since marginalization is a linear operation on the probability distribution, this is readily implementable in the linearly compressed domain. [sent-132, score-0.227]

61 In contrast, evidence integration is a multiplicative process acting in the canonical basis, so this operation will be more complicated after the linear distortions of compressive measurement A. [sent-133, score-0.665]

62 Nonetheless, such computations should be feasible as long as the informative relationships are preserved in the compressed space: similar distributions should have similar compressive representations, and dissimilar distributions should have dissimilar ones. [sent-134, score-0.856]

63 (A) A sparse posterior distribution over 10 nodes in a Boltzmann machine is sampled 1000 times, fed to 50 random perceptrons, and reconstructed by nonnegative ℓ1 minimization. [sent-138, score-0.251]

64 (B) A histogram of the sum of reconstructed probabilities reveals the small shortfall from a proper normalization of 1. [sent-139, score-0.198]

65 Each box uses different numbers of compressive measurements M and numbers of samples T . [sent-141, score-0.634]

66 (D) With increasing numbers of compressive measurements, the mean squared reconstruction error falls to 1/T = 10^−3, the limit imposed by finite sampling. [sent-142, score-0.582]

67 In fact, that is precisely the guarantee of compressive sensing: topological properties of the underlying space are preserved in the compressive domain [18]. [sent-144, score-0.97]

68 Figure 4 illustrates how not only are individual sparse distributions recoverable despite significant compression, but the topology of the set of all such distributions is retained. [sent-145, score-0.239]

69 Each pattern in X is a translation of a single binary template x0 whose elements are generated by thresholding a noisy sinusoid (Figure 4A): x0j = sgn[4 sin(2πj/n) + ηj] with ηj ∼ N(0, 1). [sent-147, score-0.422]

70 On each trial, one of these possible patterns is drawn randomly with equal probability 1/|X|, and then is measured by a noisy process that randomly flips bits with a probability η = 0. [sent-148, score-0.207]
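A sketch of this stimulus-generation process (the bit-flip probability is truncated in this extract, so the value below is an assumption):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100

j = np.arange(n)
x0 = np.sign(4 * np.sin(2 * np.pi * j / n) + rng.normal(size=n))   # binary template x0

# The input set X: all cyclic translations of the template, one pattern per row
X = np.array([np.roll(x0, k) for k in range(n)])

# One trial: draw a pattern uniformly, then flip each bit independently
eta_flip = 0.2                               # assumed flip probability (not given here)
pattern = X[rng.integers(n)]
flips = rng.random(n) < eta_flip
observation = np.where(flips, -pattern, pattern)
```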

71 Just as the underlying set of inputs has a translation symmetry, the set of all possible posterior distributions has a cyclic permutation symmetry. [sent-154, score-0.244]

72 This symmetry can be revealed by a nonlinear embedding [19] of the set of posteriors into two dimensions (Figure 4B). [sent-155, score-0.2]

73 Compressive sensing of these posteriors by 10 random perceptrons produces a much lower-dimensional embedding that preserves this symmetry. [sent-156, score-0.791]

74 In compressive sensing, similarity is measured by Euclidean distance. [sent-158, score-0.443]

75 When applied to probability distributions it will be interesting to examine instead how well information-geometric measures like the Kullback-Leibler divergence are preserved under this dimensionality reduction [20]. [sent-159, score-0.2]

76 (A) The process of generating posterior distributions: (i) A set of 100 possible patterns is generated as cyclic translations of a binary pattern (only 9 shown). [sent-164, score-0.262]

77 From such noisy patterns, an observer can infer posterior probability distributions over possible inputs (iv). [sent-167, score-0.257]

78 (C) This circular structure is retained even after each posterior is compressed into the mean output of 10 random perceptrons. [sent-171, score-0.279]

79 Amazingly, the brain need not measure any correlations between the perceptron outputs to capture the joint statistics of the sparse input distribution. [sent-173, score-0.292]

80 Successful encoding in this compressed representation requires that the input distribution be sparse. [sent-176, score-0.244]

81 Posterior distributions over sensory stimuli like natural images are indeed expected to be highly sparse: the features are sparse [21], the prior over images is sparse [22], and the likelihood produced by sensory evidence is usually restrictive, so the posteriors should be even sparser. [sent-177, score-0.468]

82 The results presented here show that purely random connections are sufficient to ensure that a sparse probability distribution is properly encoded. [sent-181, score-0.192]

83 Surprisingly, more structured connections cannot allow a network with the same computational elements to encode distributions with substantially fewer neurons, since compressive sensing is already nearly optimal [8]. [sent-182, score-0.756]

84 Note that unknown randomness is not an impediment to further processing, as reconstruction can be performed even without explicit knowledge of the random perceptron measurement matrix [23]. [sent-184, score-0.441]

85 Since compressive sensing preserves the essential geometric relationships of the signal space, learning and inference based on these relationships may be no harder after the compression, and could even be more efficient due to the reduced dimensionality. [sent-186, score-0.939]

86 Developing biologically plausible mechanisms for implementing probabilistic computations in the compressed representation is important work for the future. [sent-187, score-0.248]

87 Appendix: Asymptotic orthogonality of random perceptron matrix To evaluate the quality of the compressive sensing matrix A, we need to ensure that S-sparse vectors are not projected to zero by the action of A. [sent-188, score-0.916]

88 Here I show that the random perceptrons are asymptotically well-conditioned: Â⊤Â → I for large n and M, where Â = A/√M. [sent-189, score-0.359]

89 For compactness I will write wi for the i-th row of the perceptron weight matrix W. [sent-192, score-0.299]

90 This angle is related to the Hamming distance h = h(x, x′): θ(h) = cos^−1(x · x′/n) = cos^−1(1 − 2h/n) (17). The signs of wi·x and wi·x′ agree within this wedge region and its reflection about W = 0, and disagree in the supplementary wedges. [sent-195, score-0.277]

91 The mean inner product depends only on the Hamming distance h between x and x′, which for sparse signals with random support has a binomial distribution, p(h) = (n choose h) 2^−n, with mean n/2 and variance n/4. [sent-199, score-0.288]

92 Consequently, with enough measurements the random perceptrons asymptotically provide an isometry. [sent-202, score-0.55]
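This asymptotic claim can be probed empirically for modest n and M by comparing column inner products of the normalized measurement matrix against the arccosine relation above (a sketch; the 1 − 2θ/π identity for Gaussian-weight, zero-threshold perceptrons is a standard sign-kernel result and is assumed here to match the paper's normalization):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(6)
n, M = 10, 2000

states = np.array(list(product([-1, 1], repeat=n)))      # (2^n, n)
W = rng.normal(size=(M, n))                              # zero-threshold random perceptrons
A_hat = np.sign(states @ W.T).T / np.sqrt(M)             # normalized measurement matrix A / sqrt(M)

x, xp = states[0], states[37]                            # an arbitrary pair of states
h = np.sum(x != xp)                                      # Hamming distance between them
empirical = A_hat[:, 0] @ A_hat[:, 37]                   # column inner product
predicted = 1 - 2 * np.arccos(1 - 2 * h / n) / np.pi     # 1 - 2*theta(h)/pi
print(h, round(float(empirical), 3), round(float(predicted), 3))
```

Diagonal entries of Â⊤Â equal 1 exactly, while off-diagonal entries concentrate around the predicted value, which is near zero for typical pairs with h ≈ n/2.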

93 Future work will investigate how the measurement matrix behaves for finite n and M , which will determine the number of measurements required in practice to capture a signal of a given sparseness. [sent-203, score-0.479]

94 [3] Candès E, Romberg J, Tao T (2006) Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. [sent-209, score-0.223]

95 [6] Candès E, Plan Y (2011) A probabilistic and RIPless theory of compressed sensing. [sent-215, score-0.211]

96 [8] Candès E, Wakin M (2008) An introduction to compressive sampling. [sent-219, score-0.443]

97 [9] Kueng R, Gross D (2012) RIPless compressed sensing from anisotropic measurements. [sent-221, score-0.424]

98 [10] Calderbank R, Howard S, Jafarpour S (2010) Construction of a large class of deterministic sensing matrices that satisfy a statistical isometry property. [sent-223, score-0.271]

99 [16] Yang J, Zhang Y (2011) Alternating direction algorithms for L1 problems in compressive sensing. [sent-237, score-0.443]

100 [23] Isely G, Hillar CJ, Sommer FT (2010) Deciphering subsampled data: adaptive compressive sampling as a principle of brain communication. [sent-252, score-0.494]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('compressive', 0.443), ('perceptrons', 0.359), ('sgn', 0.271), ('cxx', 0.24), ('sensing', 0.239), ('measurements', 0.191), ('compressed', 0.185), ('measurement', 0.151), ('perceptron', 0.15), ('wi', 0.122), ('reconstruction', 0.113), ('signal', 0.11), ('units', 0.099), ('ai', 0.098), ('neurons', 0.097), ('boltzmann', 0.097), ('sparse', 0.091), ('probabilities', 0.091), ('posteriors', 0.087), ('rip', 0.083), ('couplings', 0.08), ('reconstruct', 0.077), ('compression', 0.076), ('distributions', 0.074), ('jhh', 0.074), ('feedforward', 0.07), ('posterior', 0.068), ('visible', 0.065), ('signals', 0.065), ('reconstructed', 0.063), ('preserves', 0.059), ('reconstructions', 0.059), ('patterns', 0.059), ('preserved', 0.057), ('activity', 0.051), ('inner', 0.051), ('hidden', 0.051), ('brain', 0.051), ('wij', 0.05), ('cos', 0.05), ('nonzero', 0.05), ('jvh', 0.049), ('ripless', 0.049), ('activities', 0.049), ('cand', 0.048), ('embedding', 0.047), ('sensory', 0.044), ('glauber', 0.044), ('shortfall', 0.044), ('probability', 0.042), ('population', 0.041), ('binary', 0.04), ('wakin', 0.04), ('pouget', 0.04), ('arxiv', 0.04), ('nonlinear', 0.038), ('rochester', 0.038), ('pattern', 0.038), ('noisy', 0.038), ('computations', 0.037), ('evidence', 0.037), ('threshold', 0.036), ('translation', 0.035), ('hamming', 0.035), ('inputs', 0.035), ('compressing', 0.034), ('compressible', 0.034), ('canonical', 0.034), ('products', 0.034), ('angle', 0.033), ('nonetheless', 0.033), ('uctuate', 0.033), ('cyclic', 0.032), ('isometry', 0.032), ('ps', 0.032), ('recovery', 0.031), ('states', 0.031), ('encoding', 0.03), ('dissimilar', 0.03), ('ensure', 0.03), ('relationships', 0.03), ('variance', 0.029), ('donoho', 0.029), ('ss', 0.029), ('distribution', 0.029), ('boolean', 0.029), ('tao', 0.029), ('symmetry', 0.028), ('accurately', 0.028), ('essential', 0.028), ('dimensionality', 0.027), ('highdimensional', 0.027), ('modest', 0.027), ('topological', 0.027), ('matrix', 0.027), ('mean', 0.026), ('variations', 0.026), ('probabilistic', 0.026), ('bits', 0.026), ('translations', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000007 79 nips-2012-Compressive neural representation of sparse, high-dimensional probabilities

Author: Xaq Pitkow

Abstract: This paper shows how sparse, high-dimensional probability distributions could be represented by neurons with exponential compression. The representation is a novel application of compressive sensing to sparse probability distributions rather than to the usual sparse signals. The compressive measurements correspond to expected values of nonlinear functions of the probabilistically distributed variables. When these expected values are estimated by sampling, the quality of the compressed representation is limited only by the quality of sampling. Since the compression preserves the geometric structure of the space of sparse probability distributions, probabilistic computation can be performed in the compressed domain. Interestingly, functions satisfying the requirements of compressive sensing can be implemented as simple perceptrons. If we use perceptrons as a simple model of feedforward computation by neurons, these results show that the mean activity of a relatively small number of neurons can accurately represent a highdimensional joint distribution implicitly, even without accounting for any noise correlations. This comprises a novel hypothesis for how neurons could encode probabilities in the brain. 1

2 0.19667403 228 nips-2012-Multilabel Classification using Bayesian Compressed Sensing

Author: Ashish Kapoor, Raajay Viswanathan, Prateek Jain

Abstract: In this paper, we present a Bayesian framework for multilabel classification using compressed sensing. The key idea in compressed sensing for multilabel classification is to first project the label vector to a lower dimensional space using a random transformation and then learn regression functions over these projections. Our approach considers both of these components in a single probabilistic model, thereby jointly optimizing over compression as well as learning tasks. We then derive an efficient variational inference scheme that provides joint posterior distribution over all the unobserved labels. The two key benefits of the model are that a) it can naturally handle datasets that have missing labels and b) it can also measure uncertainty in prediction. The uncertainty estimate provided by the model allows for active learning paradigms where an oracle provides information about labels that promise to be maximally informative for the prediction task. Our experiments show significant boost over prior methods in terms of prediction performance over benchmark datasets, both in the fully labeled and the missing labels case. Finally, we also highlight various useful active learning scenarios that are enabled by the probabilistic model. 1

3 0.17958996 120 nips-2012-Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential ℓ1-Minimization

Author: Demba Ba, Behtash Babadi, Patrick Purdon, Emery Brown

Abstract: We consider the problem of recovering a sequence of vectors (xk), k = 0, . . . , K, for which the increments xk − xk−1 are Sk-sparse (with Sk typically smaller than S1), based on linear measurements (yk = Ak xk + ek), k = 1, . . . , K, where Ak and ek denote the measurement matrix and noise, respectively. Assuming each Ak obeys the restricted isometry property (RIP) of a certain order—depending only on Sk—we show that in the absence of noise a convex program, which minimizes the weighted sum of the ℓ1-norm of successive differences subject to the linear measurement constraints, recovers the sequence (xk) exactly. This is an interesting result because this convex program is equivalent to a standard compressive sensing problem with a highly-structured aggregate measurement matrix which does not satisfy the RIP requirements in the standard sense, and yet we can achieve exact recovery. In the presence of bounded noise, we propose a quadratically-constrained convex program for recovery and derive bounds on the reconstruction error of the sequence. We supplement our theoretical analysis with simulations and an application to real video data. These further support the validity of the proposed approach for acquisition and recovery of signals with time-varying sparsity.

4 0.16785878 78 nips-2012-Compressive Sensing MRI with Wavelet Tree Sparsity

Author: Chen Chen, Junzhou Huang

Abstract: In Compressive Sensing Magnetic Resonance Imaging (CS-MRI), one can reconstruct a MR image with good quality from only a small number of measurements. This can significantly reduce MR scanning time. According to structured sparsity theory, the measurements can be further reduced to O(K + log n) for tree-sparse data instead of O(K + K log n) for standard K-sparse data with length n. However, few of existing algorithms have utilized this for CS-MRI, while most of them model the problem with total variation and wavelet sparse regularization. On the other side, some algorithms have been proposed for tree sparse regularization, but few of them have validated the benefit of wavelet tree structure in CS-MRI. In this paper, we propose a fast convex optimization algorithm to improve CS-MRI. Wavelet sparsity, gradient sparsity and tree sparsity are all considered in our model for real MR images. The original complex problem is decomposed into three simpler subproblems then each of the subproblems can be efficiently solved with an iterative scheme. Numerous experiments have been conducted and show that the proposed algorithm outperforms the state-of-the-art CS-MRI algorithms, and gain better reconstructions results on real MR images than general tree based solvers or algorithms. 1

5 0.14809704 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model

Author: Yi Wu, David P. Wipf

Abstract: Sparse linear (or generalized linear) models combine a standard likelihood function with a sparse prior on the unknown coefficients. These priors can conveniently be expressed as a maximization over zero-mean Gaussians with different variance hyperparameters. Standard MAP estimation (Type I) involves maximizing over both the hyperparameters and coefficients, while an empirical Bayesian alternative (Type II) first marginalizes the coefficients and then maximizes over the hyperparameters, leading to a tractable posterior approximation. The underlying cost functions can be related via a dual-space framework from [22], which allows both the Type I or Type II objectives to be expressed in either coefficient or hyperparmeter space. This perspective is useful because some analyses or extensions are more conducive to development in one space or the other. Herein we consider the estimation of a trade-off parameter balancing sparsity and data fit. As this parameter is effectively a variance, natural estimators exist by assessing the problem in hyperparameter (variance) space, transitioning natural ideas from Type II to solve what is much less intuitive for Type I. In contrast, for analyses of update rules and sparsity properties of local and global solutions, as well as extensions to more general likelihood models, we can leverage coefficient-space techniques developed for Type I and apply them to Type II. For example, this allows us to prove that Type II-inspired techniques can be successful recovering sparse coefficients when unfavorable restricted isometry properties (RIP) lead to failure of popular ℓ1 reconstructions. It also facilitates the analysis of Type II when non-Gaussian likelihood models lead to intractable integrations. 1

6 0.13033879 63 nips-2012-CPRL -- An Extension of Compressive Sensing to the Phase Retrieval Problem

7 0.11049869 238 nips-2012-Neurally Plausible Reinforcement Learning of Working Memory Tasks

8 0.099321142 144 nips-2012-Gradient-based kernel method for feature extraction and variable selection

9 0.091732107 195 nips-2012-Learning visual motion in recurrent neural networks

10 0.08124651 200 nips-2012-Local Supervised Learning through Space Partitioning

11 0.080585681 65 nips-2012-Cardinality Restricted Boltzmann Machines

12 0.080295734 77 nips-2012-Complex Inference in Neural Circuits with Probabilistic Population Codes and Topic Models

13 0.079387367 24 nips-2012-A mechanistic model of early sensory processing based on subtracting sparse representations

14 0.078065865 43 nips-2012-Approximate Message Passing with Consistent Parameter Estimation and Applications to Sparse Learning

15 0.074477039 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo

16 0.071468897 126 nips-2012-FastEx: Hash Clustering with Exponential Families

17 0.066914529 114 nips-2012-Efficient coding provides a direct link between prior and likelihood in perceptual Bayesian inference

18 0.064371958 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

19 0.063624933 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

20 0.06275975 179 nips-2012-Learning Manifolds with K-Means and K-Flats


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.207), (1, 0.052), (2, -0.046), (3, 0.028), (4, -0.003), (5, 0.134), (6, -0.007), (7, 0.03), (8, 0.027), (9, 0.009), (10, -0.039), (11, -0.019), (12, 0.059), (13, 0.015), (14, -0.037), (15, -0.037), (16, 0.05), (17, -0.001), (18, 0.016), (19, -0.112), (20, -0.034), (21, 0.143), (22, -0.116), (23, 0.042), (24, 0.049), (25, 0.063), (26, -0.02), (27, 0.106), (28, 0.106), (29, 0.063), (30, -0.111), (31, -0.072), (32, 0.051), (33, -0.012), (34, -0.183), (35, -0.101), (36, 0.052), (37, 0.101), (38, -0.056), (39, -0.087), (40, -0.05), (41, 0.057), (42, -0.043), (43, -0.031), (44, -0.098), (45, 0.027), (46, 0.066), (47, 0.05), (48, 0.099), (49, 0.134)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94081068 79 nips-2012-Compressive neural representation of sparse, high-dimensional probabilities

Author: Xaq Pitkow

Abstract: This paper shows how sparse, high-dimensional probability distributions could be represented by neurons with exponential compression. The representation is a novel application of compressive sensing to sparse probability distributions rather than to the usual sparse signals. The compressive measurements correspond to expected values of nonlinear functions of the probabilistically distributed variables. When these expected values are estimated by sampling, the quality of the compressed representation is limited only by the quality of sampling. Since the compression preserves the geometric structure of the space of sparse probability distributions, probabilistic computation can be performed in the compressed domain. Interestingly, functions satisfying the requirements of compressive sensing can be implemented as simple perceptrons. If we use perceptrons as a simple model of feedforward computation by neurons, these results show that the mean activity of a relatively small number of neurons can accurately represent a highdimensional joint distribution implicitly, even without accounting for any noise correlations. This comprises a novel hypothesis for how neurons could encode probabilities in the brain. 1

2 0.73334217 63 nips-2012-CPRL -- An Extension of Compressive Sensing to the Phase Retrieval Problem

Author: Henrik Ohlsson, Allen Yang, Roy Dong, Shankar Sastry

Abstract: While compressive sensing (CS) has been one of the most vibrant research fields in the past few years, most development only applies to linear models. This limits its application in many areas where CS could make a difference. This paper presents a novel extension of CS to the phase retrieval problem, where intensity measurements of a linear system are used to recover a complex sparse signal. We propose a novel solution using a lifting technique – CPRL, which relaxes the NP-hard problem to a nonsmooth semidefinite program. Our analysis shows that CPRL inherits many desirable properties from CS, such as guarantees for exact recovery. We further provide scalable numerical solvers to accelerate its implementation. 1

3 0.63947982 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model

Author: Yi Wu, David P. Wipf

Abstract: Sparse linear (or generalized linear) models combine a standard likelihood function with a sparse prior on the unknown coefficients. These priors can conveniently be expressed as a maximization over zero-mean Gaussians with different variance hyperparameters. Standard MAP estimation (Type I) involves maximizing over both the hyperparameters and coefficients, while an empirical Bayesian alternative (Type II) first marginalizes the coefficients and then maximizes over the hyperparameters, leading to a tractable posterior approximation. The underlying cost functions can be related via a dual-space framework from [22], which allows both the Type I or Type II objectives to be expressed in either coefficient or hyperparmeter space. This perspective is useful because some analyses or extensions are more conducive to development in one space or the other. Herein we consider the estimation of a trade-off parameter balancing sparsity and data fit. As this parameter is effectively a variance, natural estimators exist by assessing the problem in hyperparameter (variance) space, transitioning natural ideas from Type II to solve what is much less intuitive for Type I. In contrast, for analyses of update rules and sparsity properties of local and global solutions, as well as extensions to more general likelihood models, we can leverage coefficient-space techniques developed for Type I and apply them to Type II. For example, this allows us to prove that Type II-inspired techniques can be successful recovering sparse coefficients when unfavorable restricted isometry properties (RIP) lead to failure of popular ℓ1 reconstructions. It also facilitates the analysis of Type II when non-Gaussian likelihood models lead to intractable integrations. 1

4 0.60747749 78 nips-2012-Compressive Sensing MRI with Wavelet Tree Sparsity

Author: Chen Chen, Junzhou Huang

Abstract: In Compressive Sensing Magnetic Resonance Imaging (CS-MRI), one can reconstruct a MR image with good quality from only a small number of measurements. This can significantly reduce MR scanning time. According to structured sparsity theory, the measurements can be further reduced to O(K + log n) for tree-sparse data instead of O(K + K log n) for standard K-sparse data with length n. However, few of existing algorithms have utilized this for CS-MRI, while most of them model the problem with total variation and wavelet sparse regularization. On the other side, some algorithms have been proposed for tree sparse regularization, but few of them have validated the benefit of wavelet tree structure in CS-MRI. In this paper, we propose a fast convex optimization algorithm to improve CS-MRI. Wavelet sparsity, gradient sparsity and tree sparsity are all considered in our model for real MR images. The original complex problem is decomposed into three simpler subproblems then each of the subproblems can be efficiently solved with an iterative scheme. Numerous experiments have been conducted and show that the proposed algorithm outperforms the state-of-the-art CS-MRI algorithms, and gain better reconstructions results on real MR images than general tree based solvers or algorithms. 1

5 0.55826163 228 nips-2012-Multilabel Classification using Bayesian Compressed Sensing

Author: Ashish Kapoor, Raajay Viswanathan, Prateek Jain

Abstract: In this paper, we present a Bayesian framework for multilabel classification using compressed sensing. The key idea in compressed sensing for multilabel classification is to first project the label vector to a lower dimensional space using a random transformation and then learn regression functions over these projections. Our approach considers both of these components in a single probabilistic model, thereby jointly optimizing over compression as well as learning tasks. We then derive an efficient variational inference scheme that provides joint posterior distribution over all the unobserved labels. The two key benefits of the model are that a) it can naturally handle datasets that have missing labels and b) it can also measure uncertainty in prediction. The uncertainty estimate provided by the model allows for active learning paradigms where an oracle provides information about labels that promise to be maximally informative for the prediction task. Our experiments show significant boost over prior methods in terms of prediction performance over benchmark datasets, both in the fully labeled and the missing labels case. Finally, we also highlight various useful active learning scenarios that are enabled by the probabilistic model. 1

6 0.52921396 43 nips-2012-Approximate Message Passing with Consistent Parameter Estimation and Applications to Sparse Learning

7 0.48582694 365 nips-2012-Why MCA? Nonlinear sparse coding with spike-and-slab prior for neurally plausible image encoding

8 0.48328549 103 nips-2012-Distributed Probabilistic Learning for Camera Networks with Missing Data

9 0.4724991 120 nips-2012-Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential ℓ1-Minimization

10 0.45660916 34 nips-2012-Active Learning of Multi-Index Function Models

11 0.43663767 65 nips-2012-Cardinality Restricted Boltzmann Machines

12 0.43587559 238 nips-2012-Neurally Plausible Reinforcement Learning of Working Memory Tasks

13 0.41958186 39 nips-2012-Analog readout for optical reservoir computers

14 0.41487199 130 nips-2012-Feature-aware Label Space Dimension Reduction for Multi-label Classification

15 0.41063207 54 nips-2012-Bayesian Probabilistic Co-Subspace Addition

16 0.40929377 239 nips-2012-Neuronal Spike Generation Mechanism as an Oversampling, Noise-shaping A-to-D converter

17 0.4051125 179 nips-2012-Learning Manifolds with K-Means and K-Flats

18 0.40498242 159 nips-2012-Image Denoising and Inpainting with Deep Neural Networks

19 0.39750695 224 nips-2012-Multi-scale Hyper-time Hardware Emulation of Human Motor Nervous System Based on Spiking Neurons using FPGA

20 0.39706472 56 nips-2012-Bayesian active learning with localized priors for fast receptive field characterization


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.029), (17, 0.021), (21, 0.044), (38, 0.124), (39, 0.01), (42, 0.041), (44, 0.013), (54, 0.043), (55, 0.025), (74, 0.036), (76, 0.138), (80, 0.126), (92, 0.061), (94, 0.212)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.98088551 362 nips-2012-Waveform Driven Plasticity in BiFeO3 Memristive Devices: Model and Implementation

Author: Christian Mayr, Paul Stärke, Johannes Partzsch, Love Cederstroem, Rene Schüffny, Yao Shuai, Nan Du, Heidemarie Schmidt

Abstract: Memristive devices have recently been proposed as efficient implementations of plastic synapses in neuromorphic systems. The plasticity in these memristive devices, i.e. their resistance change, is defined by the applied waveforms. This behavior resembles biological synapses, whose plasticity is also triggered by mechanisms that are determined by local waveforms. However, learning in memristive devices has so far been approached mostly on a pragmatic technological level. The focus seems to be on finding any waveform that achieves spike-timing-dependent plasticity (STDP), without regard to the biological veracity of said waveforms or to further important forms of plasticity. Bridging this gap, we make use of a plasticity model driven by neuron waveforms that explains a large number of experimental observations and adapt it to the characteristics of the recently introduced BiFeO3 memristive material. Based on this approach, we show STDP for the first time for this material, with learning window replication superior to previous memristor-based STDP implementations. We also demonstrate in measurements that it is possible to overlay short and long term plasticity at a memristive device in the form of the well-known triplet plasticity. To the best of our knowledge, this is the first implementations of triplet plasticity on any physical memristive device. 1

2 0.88354635 276 nips-2012-Probabilistic Event Cascades for Alzheimer's disease

Author: Jonathan Huang, Daniel Alexander

Abstract: Accurate and detailed models of neurodegenerative disease progression are crucially important for reliable early diagnosis and the determination of effective treatments. We introduce the ALPACA (Alzheimer’s disease Probabilistic Cascades) model, a generative model linking latent Alzheimer’s progression dynamics to observable biomarker data. In contrast with previous works which model disease progression as a fixed event ordering, we explicitly model the variability over such orderings among patients which is more realistic, particularly for highly detailed progression models. We describe efficient learning algorithms for ALPACA and discuss promising experimental results on a real cohort of Alzheimer’s patients from the Alzheimer’s Disease Neuroimaging Initiative. 1

same-paper 3 0.85361695 79 nips-2012-Compressive neural representation of sparse, high-dimensional probabilities

Author: Xaq Pitkow

Abstract: This paper shows how sparse, high-dimensional probability distributions could be represented by neurons with exponential compression. The representation is a novel application of compressive sensing to sparse probability distributions rather than to the usual sparse signals. The compressive measurements correspond to expected values of nonlinear functions of the probabilistically distributed variables. When these expected values are estimated by sampling, the quality of the compressed representation is limited only by the quality of sampling. Since the compression preserves the geometric structure of the space of sparse probability distributions, probabilistic computation can be performed in the compressed domain. Interestingly, functions satisfying the requirements of compressive sensing can be implemented as simple perceptrons. If we use perceptrons as a simple model of feedforward computation by neurons, these results show that the mean activity of a relatively small number of neurons can accurately represent a highdimensional joint distribution implicitly, even without accounting for any noise correlations. This comprises a novel hypothesis for how neurons could encode probabilities in the brain. 1

4 0.78594458 364 nips-2012-Weighted Likelihood Policy Search with Model Selection

Author: Tsuyoshi Ueno, Kohei Hayashi, Takashi Washio, Yoshinobu Kawahara

Abstract: Reinforcement learning (RL) methods based on direct policy search (DPS) have been actively discussed to achieve an efficient approach to complicated Markov decision processes (MDPs). Although they have brought much progress in practical applications of RL, there still remains an unsolved problem in DPS related to model selection for the policy. In this paper, we propose a novel DPS method, weighted likelihood policy search (WLPS), where a policy is efficiently learned through the weighted likelihood estimation. WLPS naturally connects DPS to the statistical inference problem and thus various sophisticated techniques in statistics can be applied to DPS problems directly. Hence, by following the idea of the information criterion, we develop a new measurement for model comparison in DPS based on the weighted log-likelihood.

5 0.78540373 68 nips-2012-Clustering Aggregation as Maximum-Weight Independent Set

Author: Nan Li, Longin J. Latecki

Abstract: We formulate clustering aggregation as a special instance of Maximum-Weight Independent Set (MWIS) problem. For a given dataset, an attributed graph is constructed from the union of the input clusterings generated by different underlying clustering algorithms with different parameters. The vertices, which represent the distinct clusters, are weighted by an internal index measuring both cohesion and separation. The edges connect the vertices whose corresponding clusters overlap. Intuitively, an optimal aggregated clustering can be obtained by selecting an optimal subset of non-overlapping clusters partitioning the dataset together. We formalize this intuition as the MWIS problem on the attributed graph, i.e., finding the heaviest subset of mutually non-adjacent vertices. This MWIS problem exhibits a special structure. Since the clusters of each input clustering form a partition of the dataset, the vertices corresponding to each clustering form a maximal independent set (MIS) in the attributed graph. We propose a variant of simulated annealing method that takes advantage of this special structure. Our algorithm starts from each MIS, which is close to a distinct local optimum of the MWIS problem, and utilizes a local search heuristic to explore its neighborhood in order to find the MWIS. Extensive experiments on many challenging datasets show that: 1. our approach to clustering aggregation automatically decides the optimal number of clusters; 2. it does not require any parameter tuning for the underlying clustering algorithms; 3. it can combine the advantages of different underlying clustering algorithms to achieve superior performance; 4. it is robust against moderate or even bad input clusterings. 1

6 0.78246737 357 nips-2012-Unsupervised Template Learning for Fine-Grained Object Recognition

7 0.75805807 120 nips-2012-Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential ℓ1-Minimization

8 0.73683536 197 nips-2012-Learning with Recursive Perceptual Representations

9 0.73679006 252 nips-2012-On Multilabel Classification and Ranking with Partial Feedback

10 0.73611909 65 nips-2012-Cardinality Restricted Boltzmann Machines

11 0.73523885 292 nips-2012-Regularized Off-Policy TD-Learning

12 0.73492217 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

13 0.73472023 77 nips-2012-Complex Inference in Neural Circuits with Probabilistic Population Codes and Topic Models

14 0.73462486 56 nips-2012-Bayesian active learning with localized priors for fast receptive field characterization

15 0.7343747 200 nips-2012-Local Supervised Learning through Space Partitioning

16 0.73407245 218 nips-2012-Mixing Properties of Conditional Markov Chains with Unbounded Feature Functions

17 0.73334837 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

18 0.73279649 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines

19 0.73274046 279 nips-2012-Projection Retrieval for Classification

20 0.73128182 190 nips-2012-Learning optimal spike-based representations