nips nips2005 nips2005-15 knowledge-graph by maker-knowledge-mining

15 nips-2005-A Theoretical Analysis of Robust Coding over Noisy Overcomplete Channels


Source: pdf

Author: Eizaburo Doi, Doru C. Balcan, Michael S. Lewicki

Abstract: Biological sensory systems are faced with the problem of encoding a high-fidelity sensory signal with a population of noisy, low-fidelity neurons. This problem can be expressed in information theoretic terms as coding and transmitting a multi-dimensional, analog signal over a set of noisy channels. Previously, we have shown that robust, overcomplete codes can be learned by minimizing the reconstruction error with a constraint on the channel capacity. Here, we present a theoretical analysis that characterizes the optimal linear coder and decoder for one- and two-dimensional data. The analysis allows for an arbitrary number of coding units, thus including both under- and over-complete representations, and provides a number of important insights into optimal coding strategies. In particular, we show how the form of the code adapts to the number of coding units and to different data and noise conditions to achieve robustness. We also report numerical solutions for robust coding of high-dimensional image data and show that these codes are substantially more robust than other image codes such as ICA and wavelets.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Biological sensory systems are faced with the problem of encoding a high-fidelity sensory signal with a population of noisy, low-fidelity neurons. [sent-5, score-0.36]

2 This problem can be expressed in information theoretic terms as coding and transmitting a multi-dimensional, analog signal over a set of noisy channels. [sent-6, score-0.335]

3 Previously, we have shown that robust, overcomplete codes can be learned by minimizing the reconstruction error with a constraint on the channel capacity. [sent-7, score-0.701]

4 Here, we present a theoretical analysis that characterizes the optimal linear coder and decoder for one- and two-dimensional data. [sent-8, score-0.238]

5 The analysis allows for an arbitrary number of coding units, thus including both under- and over-complete representations, and provides a number of important insights into optimal coding strategies. [sent-9, score-0.655]

6 In particular, we show how the form of the code adapts to the number of coding units and to different data and noise conditions to achieve robustness. [sent-10, score-0.434]

7 We also report numerical solutions for robust coding of high-dimensional image data and show that these codes are substantially more robust than other image codes such as ICA and wavelets. [sent-11, score-0.779]

8 1 Introduction In neural systems, the representational capacity of a single neuron is estimated to be as low as 1 bit/spike [1, 2]. [sent-12, score-0.32]

9 The characteristics of the optimal coding strategy under such conditions, however, remain an open question. [sent-13, score-0.385]

10 Recent efficient coding models for sensory coding such as sparse coding and ICA have provided many insights into visual sensory coding (for a review, see [3]), but those models made the implicit assumption that the representational capacity of individual neurons was infinite. [sent-14, score-1.458]

11 Intuitively, such a limit on representational precision should strongly influence the form of the optimal code. [sent-15, score-0.282]

12 In particular, it should be possible to increase the number of limited capacity units in a population to form a more precise representation of the sensory signal. [sent-16, score-0.595]

13 For simplicity, we assume that the encoder and decoder are both linear, and that the goal is to minimize the mean squared error (MSE) of the reconstruction. [sent-19, score-0.147]

14 In contrast to our previous report, which examined noisy overcomplete representations [4], the cost function does not contain a sparsity prior. [sent-20, score-0.258]

15 For each data point x, its representation r in the model is the linear transform of x through matrix W, perturbed by the additive noise (i.e., channel noise) n ∼ N(0, σn² IM): r = Wx + n = u + n. (1) [sent-23, score-0.163] [sent-25, score-0.227]

17 We refer to W as the encoding matrix and its row vectors as encoding vectors. [sent-26, score-0.498]

18 The reconstruction of a data point from its representation is simply the linear transform of the latter, using matrix A: x̂ = Ar = AWx + An. (2) [sent-27, score-0.332]

19 We refer to A as the decoding matrix and its column vectors as decoding vectors. [sent-28, score-0.678]

20 The term AWx in eq. 2 determines how the reconstruction depends on the data, while An reflects the channel noise in the reconstruction. [sent-30, score-0.466]

21 When there is no channel noise (n = 0), AW = I is equivalent to perfect reconstruction. [sent-31, score-0.29]

22 The goal of the system is to form an accurate representation of the data that is robust to the presence of channel noise. [sent-35, score-0.42]

23 We quantify the accuracy of the reconstruction by the mean squared error (MSE) over a set of data. [sent-36, score-0.236]

24 The error of each sample is ε = x − x̂ = (IN − AW)x − An, and the MSE is expressed in matrix form: E(A, W) = tr{(IN − AW)Σx(IN − AW)ᵀ} + σn² tr{AAᵀ}, (3) where we used ⟨εᵀε⟩ = tr(⟨εεᵀ⟩). [sent-37, score-0.103]
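
As a concrete illustration of eq. 3, the following sketch (ours, not from the paper; the function name, dimensions, and parameter values are illustrative) evaluates E(A, W) both from the matrix-trace expression and from a Monte Carlo average over samples, which should agree up to sampling error.

```python
import numpy as np

def robust_coding_mse(A, W, Sigma_x, sigma_n2):
    # Eq. 3: E(A, W) = tr{(I_N - AW) Sigma_x (I_N - AW)^T} + sigma_n^2 tr{A A^T}
    N = Sigma_x.shape[0]
    R = np.eye(N) - A @ W
    return np.trace(R @ Sigma_x @ R.T) + sigma_n2 * np.trace(A @ A.T)

rng = np.random.default_rng(0)
N, M = 2, 4                            # data dimension, number of coding units
Sigma_x = np.diag([2.0, 0.5])          # data covariance (zero-mean data assumed)
sigma_n2 = 0.1                         # channel noise variance
W = rng.standard_normal((M, N))        # arbitrary (not optimal) encoder
A = rng.standard_normal((N, M))        # arbitrary (not optimal) decoder

x = rng.multivariate_normal(np.zeros(N), Sigma_x, size=100_000)
n = rng.normal(0.0, np.sqrt(sigma_n2), size=(100_000, M))
x_hat = (x @ W.T + n) @ A.T            # x_hat = A (W x + n), eqs. 1-2
mc_mse = np.mean(np.sum((x - x_hat) ** 2, axis=1))

print(robust_coding_mse(A, W, Sigma_x, sigma_n2), mc_mse)   # the two estimates should be close
```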

25 Note that, due to the MSE objective along with the zero-mean assumptions, the optimal solution depends solely on second-order statistics of the data and the noise. [sent-38, score-0.228]

26 Since the SNR is limited in the neural representation [1, 2], we assume that each coding unit has a limited variance ⟨ui²⟩ = σu², so that the SNR is limited to the same constant value γ² = σu²/σn². [sent-39, score-0.58]

27 As the channel capacity of information is defined by C = (1/2) ln(γ² + 1), this is equivalent to limiting the capacity of each unit to the same level. [sent-40, score-0.709]

28 We will refer to this constraint as the channel capacity constraint. [sent-41, score-0.507]

29 As we will see shortly, it is convenient to define V ≡ WES/σu; then the condition ⟨ui²⟩ = σu² implies that VVᵀ = Cu = ⟨uuᵀ⟩/σu², (4) where Cu is the correlation matrix of the representation u. [sent-46, score-0.163]

30 3 The optimal solutions and their characteristics In this section we analyze the optimal solutions in some simple cases, namely for 1-dimensional (1-D) and 2-dimensional (2-D) data. [sent-49, score-0.58]

31 By solving the necessary condition for the minimum, ∂E/∂a = 0, with the channel capacity constraint (eq. 4), the entries of the optimal solutions are wi = ±σu/σx, ai = (1/wi) · γ²/(M·γ² + 1), (6) and the smallest value of the MSE is E = σx²/(M·γ² + 1). (7) [sent-53, score-0.542] [sent-54, score-0.325]

33 This minimum depends on the SNR (γ²) and on the number of units (M), and it is monotonically decreasing with respect to both. [sent-55, score-0.242]
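
A minimal numerical check of eqs. 6 and 7 (our own sketch with illustrative values, assuming the reconstruction of the equations given above): plugging wi and ai into the MSE of eq. 3 should reproduce E = σx²/(Mγ² + 1).

```python
import numpy as np

sigma_x, sigma_u, M = 1.5, 0.5, 4           # illustrative data std, unit std, number of units
gamma2 = 2.0                                # per-unit SNR
sigma_n2 = sigma_u**2 / gamma2              # channel noise variance, from gamma^2 = sigma_u^2 / sigma_n^2

w = np.full((M, 1), sigma_u / sigma_x)                        # encoding vectors (eq. 6)
a = (1.0 / w.T) * gamma2 / (M * gamma2 + 1.0)                 # decoding vectors (eq. 6)

Sigma_x = np.array([[sigma_x**2]])
R = np.eye(1) - a @ w
mse = np.trace(R @ Sigma_x @ R.T) + sigma_n2 * np.trace(a @ a.T)   # eq. 3

print(mse, sigma_x**2 / (M * gamma2 + 1.0))                   # both values should match (eq. 7)
```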

34 The second term of eq. 5 leads the optimal a to have as small a norm as possible, while the first term prevents it from being arbitrarily small. [sent-59, score-0.178]

35 2 2-D data In the 2-D case, the channel capacity constraint (eq. 4) restricts V such that the row vectors of V should be on the unit circle. [sent-62, score-0.507] [sent-63, score-0.178]

37 Therefore V can be parameterized as the M × 2 matrix whose i-th row is (cos θi, sin θi), where θi ∈ [0, 2π) is the angle between the i-th row of V and the principal eigenvector of the data e1 (E = [e1, e2], λ1 ≥ λ2 > 0). [sent-64, score-0.117] [sent-70, score-0.248]

39 The necessary condition for the minimum, ∂E/∂A = O, implies A = σu ESVᵀ(σu² VVᵀ + σn² IM)⁻¹. [sent-71, score-0.103]

40 From eqs. 8 and 9, the MSE can be expressed as E = [ (λ1 + λ2)/((M/2)γ² + 1) − γ²(λ1 − λ2) Re(Z)/(2((M/2)γ² + 1)²) ] / [ 1 − γ⁴|Z|²/(4((M/2)γ² + 1)²) ], (10) where by definition Z = Σ_{k=1..M} zk = Σ_{k=1..M} [cos(2θk) + i sin(2θk)]. [sent-73, score-0.137]
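
To sanity-check the closed form above, the following sketch (ours; variable names and parameter values are illustrative) draws random angles θk, builds W from V, computes the MSE directly with the optimal decoder of eq. 9, and compares it with eq. 10 as reconstructed here; the two values should coincide.

```python
import numpy as np

rng = np.random.default_rng(1)
lam1, lam2, M, gamma2 = 3.0, 1.0, 5, 1.5        # eigenvalues of Sigma_x, units, per-unit SNR
sigma_u = 1.0
sigma_n2 = sigma_u**2 / gamma2

theta = rng.uniform(0.0, 2.0 * np.pi, size=M)
V = np.column_stack([np.cos(theta), np.sin(theta)])        # unit-norm rows (eq. 4)
S = np.diag(np.sqrt([lam1, lam2]))
E_mat = np.eye(2)                                          # principal axes (identity for simplicity)
W = sigma_u * V @ np.linalg.inv(S) @ E_mat.T               # from V = W E S / sigma_u

# Optimal decoder for this W (eq. 9) and the resulting MSE (eq. 3).
A = sigma_u * E_mat @ S @ V.T @ np.linalg.inv(sigma_u**2 * V @ V.T + sigma_n2 * np.eye(M))
Sigma_x = E_mat @ S**2 @ E_mat.T
R = np.eye(2) - A @ W
mse_direct = np.trace(R @ Sigma_x @ R.T) + sigma_n2 * np.trace(A @ A.T)

# Closed form in terms of Z (eq. 10 as reconstructed above).
Z = np.sum(np.exp(2j * theta))
D = M * gamma2 / 2.0 + 1.0
num = (lam1 + lam2) / D - gamma2 * (lam1 - lam2) * Z.real / (2.0 * D**2)
den = 1.0 - gamma2**2 * abs(Z)**2 / (4.0 * D**2)
print(mse_direct, num / den)                               # should agree
```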

41 In the following we analyze the problem in two complementary cases: when the data variance is isotropic (i.e., λ1 = λ2), and when it is anisotropic (λ1 > λ2). [sent-78, score-0.15] [sent-80, score-0.147]

43 As we will see, the solutions are qualitatively different in these two cases. [sent-81, score-0.145]

44 11), yielding the optimal solutions W = (σu/σx) V, A = (σx/σu) · (γ²/(γ² + 1)) Vᵀ, (13) where V = V(θ1), ∀ θ1 ∈ [0, 2π). [sent-88, score-0.337]

45 13 means that the orientation of the encoding and decoding vectors is arbitrary, and that the length of those vectors is adjusted exactly as in the 1-D case (eq. [sent-90, score-0.696]

46 7 with M = 1), corresponding to the error component along the axis that the encoding/decoding vectors represent, while the second term is the whole data variance along the axis orthogonal to the encoding/decoding vectors, along which no reconstruction is made. [sent-95, score-1.095]

47 Accordingly, the optimal solution is W = (σu/σx) V, A = (σx/σu) · (γ²/((M/2)γ² + 1)) Vᵀ, (15) where the optimal V = V(θ1, · · · , θM) is given by such θ1, . [sent-99, score-0.29]

48 Specifically, if M = 2, then z1 and z2 must be antiparallel but are not otherwise constrained, making the pair of decoding vectors (and that of encoding vectors) orthogonal, yet free to rotate. [sent-103, score-0.556]

49 Note that both the encoding and the decoding vectors are parallel to the rows of V (eq. [sent-104, score-0.556]

50 15), and the angle of zk from the real axis is twice as large as that of ak (or wk ). [sent-105, score-0.298]

51 Likewise, if M = 3, the decoding vectors should be evenly distributed yet still free to rotate; if M = 4, the four vectors should just be two pairs of orthogonal vectors (not necessarily evenly distributed); if M ≥ 5, there is no obvious regularity. [sent-106, score-0.788]

52 We present examples in Fig 2: given M = 2, the reconstruction gets worse by lowering the SNR from 10 to 1; however, the reconstruction can be improved by increasing the number of units for a fixed SNR (γ 2 = 1). [sent-111, score-0.583]

53 Just as in the 1-D case, the norm of the decoding vectors gets smaller by increasing M or decreasing γ 2 , which is explicitly described by eq. [sent-112, score-0.462]

54 Figure 2: The optimal solutions for isotropic data (panels for M = 1 to 5 at γ² = 1 and for M = 2 at γ² = 10; rows show the Z-diagram, the decoding and encoding vectors, and the variance ellipses). [sent-114, score-0.378]

55 M is the number of units and γ 2 is the SNR in the representation. [sent-115, score-0.106]

56 “Variance” shows the variance ellipses for the data (gray) and the reconstruction (magenta). [sent-116, score-0.312]

57 “Encoding” and “Decoding” show encoding vectors (red) and decoding vectors (blue), respectively. [sent-118, score-0.696]

58 The gray vectors show the principal axes of the data, e1 and e2 . [sent-119, score-0.278]

59 11) in the complex plane, where each unit length bar corresponds to a zk , and the end point indicated by “×” represents the coordinates of Z. [sent-121, score-0.136]

60 The set of green dots in a plot corresponds to optimal values of Z; when this set reduces to a single dot, the optimal Z is unique. [sent-122, score-0.29]

61 In general there could be multiple configurations of bars for a single Z, implying multiple equivalent solutions of A and W for a given Z. [sent-123, score-0.224]

62 For M = 2 and γ 2 = 10, we drew with gray dotted bars an example of Z that is not optimal (corresponding encoding and decoding vectors not shown). [sent-124, score-0.817]

63 2 Anisotropic case In the anisotropic condition λ1 > λ2 , the MSE (eq. [sent-127, score-0.182]

64 17 is minimized iff θ1 = 0, yielding the optimal solutions W = (σu/√λ1) e1ᵀ, A = (√λ1/σu) · (γ²/(γ² + 1)) e1. (18) [sent-132, score-0.432]

65 In contrast to the isotropic case with M = 1, the encoding and decoding vectors are specified along the principal axis (e1) as illustrated in Fig. [sent-133, score-0.998]

66 If M ≥ 2, then we can derive the optimal y from the necessary condition for the minimum, dE/dy = 0, which yields ( ((√λ1 − √λ2)/(√λ1 + √λ2))(M + 2/γ²) − y ) · ( ((√λ1 + √λ2)/(√λ1 − √λ2))(M + 2/γ²) − y ) = 0. [sent-139, score-0.18]

67 Accordingly the optimal solutions are given by W = V diag(σu/√λ1, σu/√λ2) Eᵀ, A = ((√λ1 + √λ2)/(2σu)) · (γ²/((M/2)γ² + 1)) E Vᵀ, (21–23) where the optimal V = V(θ1, · · · , θM) is given by the Z-diagram as illustrated in Fig. [sent-143, score-0.435]

68 The minimum is therefore obtained when y = M, yielding the optimal solutions W = (σu/√λ1) 1M e1ᵀ, A = (√λ1/σu) · (γ²/(Mγ² + 1)) e1 1Mᵀ, (25) where 1M = (1, · · · , 1)ᵀ ∈ RM, and the minimum is given by E = λ1/(Mγ² + 1) + λ2. [sent-151, score-0.473]
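
The switch to the degenerate solution can be explored numerically. The sketch below (ours, with illustrative values) restricts Z to real values y ∈ [0, M], as in the analysis above, and reports whether the minimizer of eq. 10 sits at the boundary y = M (all units on e1) or in the interior (spread configuration) as the SNR varies.

```python
import numpy as np

def mse_of_y(y, lam1, lam2, M, gamma2):
    # Eq. 10 restricted to real Z = y (the optimal anisotropic configurations have Im(Z) = 0).
    D = M * gamma2 / 2.0 + 1.0
    num = (lam1 + lam2) / D - gamma2 * (lam1 - lam2) * y / (2.0 * D**2)
    den = 1.0 - gamma2**2 * y**2 / (4.0 * D**2)
    return num / den

lam1, lam2, M = 3.0, 1.0, 4
for gamma2 in [0.05, 0.2, 1.0, 5.0]:
    ys = np.linspace(0.0, M, 2001)
    y_best = ys[np.argmin(mse_of_y(ys, lam1, lam2, M, gamma2))]
    regime = "degenerate (y = M)" if np.isclose(y_best, M) else f"spread (y = {y_best:.2f})"
    print(f"gamma^2 = {gamma2:>5}: {regime}")
```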

69 To summarize, if the representational resource is too limited either by M or γ 2 , the best strategy is to represent only the principal axis. [sent-154, score-0.247]

70 Now we describe the optimal solutions using the Z-diagram (Fig. [sent-155, score-0.29]

71 First, the optimal solutions differ depending on the SNR. [sent-157, score-0.145]

72 If γ² > γc², the optimal Z is a certain point between 0 and M on the real axis. [sent-158, score-0.145]

73 If γ² ≤ γc², the optimal Z is M, and the optimal configuration is obtained only when all the bars align on the real axis. [sent-160, score-0.369]

74 In this case, encoding/decoding vectors are all parallel to the principal axis (e1 ), as described by eq. [sent-161, score-0.411]

75 Such a degenerate representation is unique for the anisotropic case and is determined by γc² (eq. [sent-163, score-0.335]

76 Figure 3: The optimal solutions for anisotropic data (panels for M = 2 at γ² = 10, 2, 1 and for M = 1, 2, 3, 8 at γ² = 1; rows show the Z-diagram, the decoding and encoding vectors, and the variance ellipses). [sent-165, score-0.437]

77 3, M = 2 with different γ 2 ) or by increasing the number of units (γ 2 = 1 with different M ). [sent-175, score-0.145]

78 Also, the optimal solutions for the overcomplete representation are, in general, not obtained by simple replication (except in the degenerate case). [sent-176, score-0.652]

79 3, the optimal solution for M = 8 is not identical to the replication of the optimal solution for M = 2, and we can formally prove it by using eq. [sent-178, score-0.384]

80 For M = 1 and for the degenerate case, where only one axis in two-dimensional space is represented, the optimal strategy is to preserve information along the principal axis at the cost of losing all information along the minor axis. [sent-180, score-0.923]

81 3 that the data along the principal axis is more accurately reconstructed than that along the minor axis; if there is no bias, the ellipse for the reconstruction should be similar to that of the data. [sent-183, score-0.691]

82 More precisely, we can prove that the error ratio along e1 is smaller than that along e2 at the ratio of √λ2 : √λ1 (note the switch of the subscripts), which describes the representation bias toward the main axis. [sent-184, score-0.33]

83 4 Application to image coding In the case of high-dimensional data we can employ an algorithm similar to the one in [4] to numerically compute optimal solutions that minimize the MSE subject to the channel capacity constraint. [sent-185, score-1.046]
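
There is no closed form in high dimensions, so the optimal code has to be found numerically. The sketch below is not the algorithm of [4]; it is a generic projected-gradient illustration of the same objective (our own, with illustrative names and parameters): descend the MSE of eq. 3 with the optimal decoder recomputed at each step, while rescaling each row of W so that every unit keeps the prescribed variance σu².

```python
import numpy as np

def optimize_robust_code(Sigma_x, M, gamma2, sigma_u2=1.0, steps=5000, lr=0.01, seed=0):
    # Toy projected gradient descent on eq. 3 under the per-unit channel capacity constraint.
    rng = np.random.default_rng(seed)
    N = Sigma_x.shape[0]
    sigma_n2 = sigma_u2 / gamma2

    def project(W):
        # Rescale each encoding vector so that <u_i^2> = w_i^T Sigma_x w_i = sigma_u^2.
        var_u = np.einsum('ij,jk,ik->i', W, Sigma_x, W)
        return W * np.sqrt(sigma_u2 / var_u)[:, None]

    def decoder(W):
        # Optimal linear decoder for a fixed W (Wiener solution).
        return Sigma_x @ W.T @ np.linalg.inv(W @ Sigma_x @ W.T + sigma_n2 * np.eye(M))

    W = project(rng.standard_normal((M, N)))
    for _ in range(steps):
        A = decoder(W)
        R = np.eye(N) - A @ W
        grad_W = -2.0 * A.T @ R @ Sigma_x        # gradient of tr{R Sigma_x R^T} w.r.t. W, with A held fixed
        W = project(W - lr * grad_W)
    A = decoder(W)
    R = np.eye(N) - A @ W
    mse = np.trace(R @ Sigma_x @ R.T) + sigma_n2 * np.trace(A @ A.T)
    return W, A, mse

# Example: a 4x overcomplete code for 2-D anisotropic data.
W, A, mse = optimize_robust_code(np.diag([3.0, 1.0]), M=8, gamma2=1.0)
print(mse)
```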

84 4 presents the performance of our model when applied to image coding in the presence of channel noise. [sent-187, score-0.503]

85 The robust coding model shows a dramatic reduction in the reconstruction error, when compared to alternatives such as ICA and wavelet codes. [sent-191, score-0.64]

86 This underscores the importance of taking into account the channel capacity constraint for better understanding the neural representation. [sent-192, score-0.507]

87 Figure 4: Reconstruction using one-bit channel capacity representations. [sent-197, score-0.497]

88 To fix the channel capacity at 1.0 bit for each coefficient, we added Gaussian noise to the coefficients of the ICA and “Daubechies 9/7” wavelet codes as in the robust coding. [sent-199, score-0.305]

89 5 Discussion In this study we measured the accuracy of the reconstruction by the MSE. [sent-202, score-0.204]

90 However, we can prove that this measure does not yield optimal solutions for the robust coding problem. [sent-204, score-0.663]

91 Assuming the data is Gaussian and the representation is complete, we can prove that the mutual information is upper-bounded, I(x, x̂) = (1/2) ln det(γ² VVᵀ + IN) ≤ (N/2) ln(γ² + 1), (27) with equality iff VVᵀ = I, i.e., a whitened representation. [sent-205, score-0.236]
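
A quick numerical illustration of eq. 27 (our own sketch, with illustrative values): for a complete representation (M = N) with unit-norm rows of V, the Gaussian mutual information never exceeds (N/2) ln(γ² + 1), and equality is attained at VVᵀ = I.

```python
import numpy as np

def mutual_info(V, gamma2):
    # I(x, x_hat) = 1/2 ln det(gamma^2 V V^T + I_N), eq. 27 (Gaussian data and noise).
    return 0.5 * np.log(np.linalg.det(gamma2 * V @ V.T + np.eye(V.shape[0])))

rng = np.random.default_rng(2)
N, gamma2 = 4, 3.0

V = rng.standard_normal((N, N))
V /= np.linalg.norm(V, axis=1, keepdims=True)       # unit-norm rows: the capacity constraint of eq. 4

bound = 0.5 * N * np.log(gamma2 + 1.0)
print(mutual_info(V, gamma2), "<=", bound)          # strict inequality for a generic V
print(mutual_info(np.eye(N), gamma2), "==", bound)  # equality when V V^T = I
```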

92 This result holds even for anisotropic data, which is different from the optimal MSE code, which can employ correlated, or even degenerate, representations. [sent-209, score-0.345]

93 The optimal MSE code over noisy channels was examined previously in [6] for N-dimensional data. [sent-212, score-0.362]

94 However, the capacity constraint there was defined for the population as a whole, and only the case of undercomplete codes was examined. [sent-213, score-0.376]

95 In the model studied here, motivated by the neural representation, the capacity constraint is imposed for individual units. [sent-214, score-0.28]

96 Furthermore, the model allows for an arbitrary number of units, which provides a way to arbitrarily improve the robustness of the code using a population code. [sent-215, score-0.14]

97 The theoretical analysis for one- and two-dimensional cases quantifies the amount of error reduction as a function of the SNR and the number of units, along with the data covariance matrix. [sent-216, score-0.277]

98 Finally, our numerical results for higher-dimensional image data demonstrate a dramatic improvement in the robustness of the code over both conventional transforms such as wavelets and representations optimized for statistical efficiency such as ICA. [sent-217, score-0.286]

99 Sparse coding of natural images using an overcomplete set of limited capacity units. [sent-241, score-0.626]

100 Optimal linear compression under unreliable representation and robust PCA neural models. [sent-257, score-0.193]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mse', 0.416), ('decoding', 0.253), ('coding', 0.24), ('snr', 0.234), ('channel', 0.227), ('capacity', 0.222), ('reconstruction', 0.204), ('axis', 0.17), ('encoding', 0.163), ('anisotropic', 0.147), ('optimal', 0.145), ('solutions', 0.145), ('vectors', 0.14), ('vvt', 0.133), ('aw', 0.132), ('overcomplete', 0.116), ('ica', 0.109), ('units', 0.106), ('principal', 0.101), ('representational', 0.098), ('zk', 0.098), ('robust', 0.097), ('representation', 0.096), ('degenerate', 0.092), ('isotropic', 0.088), ('along', 0.083), ('bars', 0.079), ('sensory', 0.074), ('cos', 0.073), ('minimum', 0.068), ('awx', 0.066), ('decoder', 0.066), ('codes', 0.064), ('variance', 0.062), ('wavelet', 0.061), ('constraint', 0.058), ('replication', 0.058), ('minimized', 0.057), ('noisy', 0.056), ('tr', 0.056), ('code', 0.053), ('cu', 0.053), ('delity', 0.053), ('wavelets', 0.053), ('minor', 0.05), ('encoder', 0.049), ('whitening', 0.049), ('population', 0.049), ('limited', 0.048), ('bit', 0.048), ('yielding', 0.047), ('examined', 0.047), ('ellipses', 0.046), ('sin', 0.044), ('evenly', 0.04), ('representations', 0.039), ('expressed', 0.039), ('increasing', 0.039), ('guration', 0.039), ('precision', 0.039), ('robustness', 0.038), ('unit', 0.038), ('iff', 0.038), ('dramatic', 0.038), ('vt', 0.038), ('ln', 0.037), ('gray', 0.037), ('prove', 0.036), ('image', 0.036), ('accordingly', 0.035), ('noise', 0.035), ('condition', 0.035), ('orthogonal', 0.035), ('wi', 0.035), ('plane', 0.034), ('origin', 0.033), ('im', 0.033), ('term', 0.033), ('monotonically', 0.033), ('matrix', 0.032), ('error', 0.032), ('channels', 0.032), ('root', 0.031), ('minimizes', 0.031), ('gurations', 0.03), ('gets', 0.03), ('insights', 0.03), ('angle', 0.03), ('analytically', 0.03), ('dimensional', 0.029), ('con', 0.029), ('mutual', 0.029), ('uut', 0.029), ('atick', 0.029), ('degeneration', 0.029), ('doi', 0.029), ('hyvarinen', 0.029), ('perfect', 0.028), ('theoretical', 0.027), ('reduced', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000005 15 nips-2005-A Theoretical Analysis of Robust Coding over Noisy Overcomplete Channels

Author: Eizaburo Doi, Doru C. Balcan, Michael S. Lewicki

Abstract: Biological sensory systems are faced with the problem of encoding a high-fidelity sensory signal with a population of noisy, low-fidelity neurons. This problem can be expressed in information theoretic terms as coding and transmitting a multi-dimensional, analog signal over a set of noisy channels. Previously, we have shown that robust, overcomplete codes can be learned by minimizing the reconstruction error with a constraint on the channel capacity. Here, we present a theoretical analysis that characterizes the optimal linear coder and decoder for one- and two-dimensional data. The analysis allows for an arbitrary number of coding units, thus including both under- and over-complete representations, and provides a number of important insights into optimal coding strategies. In particular, we show how the form of the code adapts to the number of coding units and to different data and noise conditions to achieve robustness. We also report numerical solutions for robust coding of high-dimensional image data and show that these codes are substantially more robust than other image codes such as ICA and wavelets.

2 0.14913349 101 nips-2005-Is Early Vision Optimized for Extracting Higher-order Dependencies?

Author: Yan Karklin, Michael S. Lewicki

Abstract: Linear implementations of the efficient coding hypothesis, such as independent component analysis (ICA) and sparse coding models, have provided functional explanations for properties of simple cells in V1 [1, 2]. These models, however, ignore the non-linear behavior of neurons and fail to match individual and population properties of neural receptive fields in subtle but important ways. Hierarchical models, including Gaussian Scale Mixtures [3, 4] and other generative statistical models [5, 6], can capture higher-order regularities in natural images and explain nonlinear aspects of neural processing such as normalization and context effects [6,7]. Previously, it had been assumed that the lower level representation is independent of the hierarchy, and had been fixed when training these models. Here we examine the optimal lower-level representations derived in the context of a hierarchical model and find that the resulting representations are strikingly different from those based on linear models. Unlike the the basis functions and filters learned by ICA or sparse coding, these functions individually more closely resemble simple cell receptive fields and collectively span a broad range of spatial scales. Our work unifies several related approaches and observations about natural image structure and suggests that hierarchical models might yield better representations of image structure throughout the hierarchy.

3 0.14768055 65 nips-2005-Estimating the wrong Markov random field: Benefits in the computation-limited setting

Author: Martin J. Wainwright

Abstract: Consider the problem of joint parameter estimation and prediction in a Markov random field: i.e., the model parameters are estimated on the basis of an initial set of data, and then the fitted model is used to perform prediction (e.g., smoothing, denoising, interpolation) on a new noisy observation. Working in the computation-limited setting, we analyze a joint method in which the same convex variational relaxation is used to construct an M-estimator for fitting parameters, and to perform approximate marginalization for the prediction step. The key result of this paper is that in the computation-limited setting, using an inconsistent parameter estimator (i.e., an estimator that returns the “wrong” model even in the infinite data limit) is provably beneficial, since the resulting errors can partially compensate for errors made by using an approximate prediction technique. En route to this result, we analyze the asymptotic properties of M-estimators based on convex variational relaxations, and establish a Lipschitz stability property that holds for a broad class of variational methods. We show that joint estimation/prediction based on the reweighted sum-product algorithm substantially outperforms a commonly used heuristic based on ordinary sum-product. 1 Keywords: Markov random fields; variational method; message-passing algorithms; sum-product; belief propagation; parameter estimation; learning. 1

4 0.11969766 106 nips-2005-Large-scale biophysical parameter estimation in single neurons via constrained linear regression

Author: Misha Ahrens, Liam Paninski, Quentin J. Huys

Abstract: Our understanding of the input-output function of single cells has been substantially advanced by biophysically accurate multi-compartmental models. The large number of parameters needing hand tuning in these models has, however, somewhat hampered their applicability and interpretability. Here we propose a simple and well-founded method for automatic estimation of many of these key parameters: 1) the spatial distribution of channel densities on the cell’s membrane; 2) the spatiotemporal pattern of synaptic input; 3) the channels’ reversal potentials; 4) the intercompartmental conductances; and 5) the noise level in each compartment. We assume experimental access to: a) the spatiotemporal voltage signal in the dendrite (or some contiguous subpart thereof, e.g. via voltage sensitive imaging techniques), b) an approximate kinetic description of the channels and synapses present in each compartment, and c) the morphology of the part of the neuron under investigation. The key observation is that, given data a)-c), all of the parameters 1)-4) may be simultaneously inferred by a version of constrained linear regression; this regression, in turn, is efficiently solved using standard algorithms, without any “local minima” problems despite the large number of parameters and complex dynamics. The noise level 5) may also be estimated by standard techniques. We demonstrate the method’s accuracy on several model datasets, and describe techniques for quantifying the uncertainty in our estimates. 1

5 0.11574968 132 nips-2005-Nearest Neighbor Based Feature Selection for Regression and its Application to Neural Activity

Author: Amir Navot, Lavi Shpigelman, Naftali Tishby, Eilon Vaadia

Abstract: We present a non-linear, simple, yet effective, feature subset selection method for regression and use it in analyzing cortical neural activity. Our algorithm involves a feature-weighted version of the k-nearest-neighbor algorithm. It is able to capture complex dependency of the target function on its input and makes use of the leave-one-out error as a natural regularization. We explain the characteristics of our algorithm on synthetic problems and use it in the context of predicting hand velocity from spikes recorded in motor cortex of a behaving monkey. By applying feature selection we are able to improve prediction quality and suggest a novel way of exploring neural data.

6 0.10577202 139 nips-2005-Non-iterative Estimation with Perturbed Gaussian Markov Processes

7 0.095706485 125 nips-2005-Message passing for task redistribution on sparse graphs

8 0.091242783 129 nips-2005-Modeling Neural Population Spiking Activity with Gibbs Distributions

9 0.073266953 24 nips-2005-An Approximate Inference Approach for the PCA Reconstruction Error

10 0.071187191 163 nips-2005-Recovery of Jointly Sparse Signals from Few Random Projections

11 0.070982486 102 nips-2005-Kernelized Infomax Clustering

12 0.069983192 81 nips-2005-Gaussian Processes for Multiuser Detection in CDMA receivers

13 0.066974349 88 nips-2005-Gradient Flow Independent Component Analysis in Micropower VLSI

14 0.065828592 162 nips-2005-Rate Distortion Codes in Sensor Networks: A System-level Analysis

15 0.064372107 158 nips-2005-Products of ``Edge-perts

16 0.063034937 2 nips-2005-A Bayes Rule for Density Matrices

17 0.061028212 167 nips-2005-Robust design of biological experiments

18 0.05987278 50 nips-2005-Convex Neural Networks

19 0.058096316 180 nips-2005-Spectral Bounds for Sparse PCA: Exact and Greedy Algorithms

20 0.056883227 23 nips-2005-An Application of Markov Random Fields to Range Sensing


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.196), (1, -0.013), (2, -0.032), (3, 0.053), (4, -0.018), (5, -0.009), (6, -0.177), (7, -0.124), (8, 0.105), (9, -0.053), (10, 0.021), (11, -0.012), (12, -0.027), (13, -0.03), (14, 0.102), (15, 0.022), (16, -0.065), (17, 0.05), (18, -0.164), (19, -0.132), (20, -0.071), (21, -0.076), (22, 0.005), (23, 0.109), (24, 0.098), (25, 0.007), (26, -0.027), (27, -0.171), (28, 0.282), (29, -0.099), (30, -0.155), (31, 0.036), (32, -0.124), (33, -0.002), (34, 0.028), (35, -0.061), (36, 0.022), (37, -0.021), (38, -0.055), (39, -0.101), (40, 0.036), (41, 0.067), (42, -0.083), (43, -0.105), (44, 0.021), (45, 0.032), (46, -0.136), (47, 0.063), (48, -0.125), (49, -0.036)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97400033 15 nips-2005-A Theoretical Analysis of Robust Coding over Noisy Overcomplete Channels

Author: Eizaburo Doi, Doru C. Balcan, Michael S. Lewicki

Abstract: Biological sensory systems are faced with the problem of encoding a high-fidelity sensory signal with a population of noisy, low-fidelity neurons. This problem can be expressed in information theoretic terms as coding and transmitting a multi-dimensional, analog signal over a set of noisy channels. Previously, we have shown that robust, overcomplete codes can be learned by minimizing the reconstruction error with a constraint on the channel capacity. Here, we present a theoretical analysis that characterizes the optimal linear coder and decoder for one- and two-dimensional data. The analysis allows for an arbitrary number of coding units, thus including both under- and over-complete representations, and provides a number of important insights into optimal coding strategies. In particular, we show how the form of the code adapts to the number of coding units and to different data and noise conditions to achieve robustness. We also report numerical solutions for robust coding of high-dimensional image data and show that these codes are substantially more robust than other image codes such as ICA and wavelets.

2 0.55382508 81 nips-2005-Gaussian Processes for Multiuser Detection in CDMA receivers

Author: Juan J. Murillo-fuentes, Sebastian Caro, Fernando Pérez-Cruz

Abstract: In this paper we propose a new receiver for digital communications. We focus on the application of Gaussian Processes (GPs) to the multiuser detection (MUD) in code division multiple access (CDMA) systems to solve the near-far problem. Hence, we aim to reduce the interference from other users sharing the same frequency band. While usual approaches minimize the mean square error (MMSE) to linearly retrieve the user of interest, we exploit the same criteria but in the design of a nonlinear MUD. Since the optimal solution is known to be nonlinear, the performance of this novel method clearly improves that of the MMSE detectors. Furthermore, the GP based MUD achieves excellent interference suppression even for short training sequences. We also include some experiments to illustrate that other nonlinear detectors such as those based on Support Vector Machines (SVMs) exhibit a worse performance. 1

3 0.53310943 101 nips-2005-Is Early Vision Optimized for Extracting Higher-order Dependencies?

Author: Yan Karklin, Michael S. Lewicki

Abstract: Linear implementations of the efficient coding hypothesis, such as independent component analysis (ICA) and sparse coding models, have provided functional explanations for properties of simple cells in V1 [1, 2]. These models, however, ignore the non-linear behavior of neurons and fail to match individual and population properties of neural receptive fields in subtle but important ways. Hierarchical models, including Gaussian Scale Mixtures [3, 4] and other generative statistical models [5, 6], can capture higher-order regularities in natural images and explain nonlinear aspects of neural processing such as normalization and context effects [6,7]. Previously, it had been assumed that the lower level representation is independent of the hierarchy, and had been fixed when training these models. Here we examine the optimal lower-level representations derived in the context of a hierarchical model and find that the resulting representations are strikingly different from those based on linear models. Unlike the the basis functions and filters learned by ICA or sparse coding, these functions individually more closely resemble simple cell receptive fields and collectively span a broad range of spatial scales. Our work unifies several related approaches and observations about natural image structure and suggests that hierarchical models might yield better representations of image structure throughout the hierarchy.

4 0.50666195 158 nips-2005-Products of ``Edge-perts

Author: Max Welling, Peter V. Gehler

Abstract: Images represent an important and abundant source of data. Understanding their statistical structure has important applications such as image compression and restoration. In this paper we propose a particular kind of probabilistic model, dubbed the “products of edge-perts model” to describe the structure of wavelet transformed images. We develop a practical denoising algorithm based on a single edge-pert and show state-ofthe-art denoising performance on benchmark images. 1

5 0.4987675 106 nips-2005-Large-scale biophysical parameter estimation in single neurons via constrained linear regression

Author: Misha Ahrens, Liam Paninski, Quentin J. Huys

Abstract: Our understanding of the input-output function of single cells has been substantially advanced by biophysically accurate multi-compartmental models. The large number of parameters needing hand tuning in these models has, however, somewhat hampered their applicability and interpretability. Here we propose a simple and well-founded method for automatic estimation of many of these key parameters: 1) the spatial distribution of channel densities on the cell’s membrane; 2) the spatiotemporal pattern of synaptic input; 3) the channels’ reversal potentials; 4) the intercompartmental conductances; and 5) the noise level in each compartment. We assume experimental access to: a) the spatiotemporal voltage signal in the dendrite (or some contiguous subpart thereof, e.g. via voltage sensitive imaging techniques), b) an approximate kinetic description of the channels and synapses present in each compartment, and c) the morphology of the part of the neuron under investigation. The key observation is that, given data a)-c), all of the parameters 1)-4) may be simultaneously inferred by a version of constrained linear regression; this regression, in turn, is efficiently solved using standard algorithms, without any “local minima” problems despite the large number of parameters and complex dynamics. The noise level 5) may also be estimated by standard techniques. We demonstrate the method’s accuracy on several model datasets, and describe techniques for quantifying the uncertainty in our estimates. 1

6 0.49212849 139 nips-2005-Non-iterative Estimation with Perturbed Gaussian Markov Processes

7 0.45300454 65 nips-2005-Estimating the wrong Markov random field: Benefits in the computation-limited setting

8 0.44817349 43 nips-2005-Comparing the Effects of Different Weight Distributions on Finding Sparse Representations

9 0.41481462 163 nips-2005-Recovery of Jointly Sparse Signals from Few Random Projections

10 0.40122968 132 nips-2005-Nearest Neighbor Based Feature Selection for Regression and its Application to Neural Activity

11 0.39138728 24 nips-2005-An Approximate Inference Approach for the PCA Reconstruction Error

12 0.386774 180 nips-2005-Spectral Bounds for Sparse PCA: Exact and Greedy Algorithms

13 0.37944803 167 nips-2005-Robust design of biological experiments

14 0.37733361 162 nips-2005-Rate Distortion Codes in Sensor Networks: A System-level Analysis

15 0.36011353 198 nips-2005-Using ``epitomes'' to model genetic diversity: Rational design of HIV vaccine cocktails

16 0.3224414 2 nips-2005-A Bayes Rule for Density Matrices

17 0.31735617 125 nips-2005-Message passing for task redistribution on sparse graphs

18 0.31265518 129 nips-2005-Modeling Neural Population Spiking Activity with Gibbs Distributions

19 0.309724 137 nips-2005-Non-Gaussian Component Analysis: a Semi-parametric Framework for Linear Dimension Reduction

20 0.3095918 172 nips-2005-Selecting Landmark Points for Sparse Manifold Learning


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.054), (10, 0.034), (27, 0.067), (31, 0.046), (34, 0.086), (39, 0.01), (55, 0.025), (65, 0.013), (69, 0.06), (73, 0.033), (77, 0.36), (88, 0.092), (91, 0.038)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.83934844 15 nips-2005-A Theoretical Analysis of Robust Coding over Noisy Overcomplete Channels

Author: Eizaburo Doi, Doru C. Balcan, Michael S. Lewicki

Abstract: Biological sensory systems are faced with the problem of encoding a high-fidelity sensory signal with a population of noisy, low-fidelity neurons. This problem can be expressed in information theoretic terms as coding and transmitting a multi-dimensional, analog signal over a set of noisy channels. Previously, we have shown that robust, overcomplete codes can be learned by minimizing the reconstruction error with a constraint on the channel capacity. Here, we present a theoretical analysis that characterizes the optimal linear coder and decoder for one- and two-dimensional data. The analysis allows for an arbitrary number of coding units, thus including both under- and over-complete representations, and provides a number of important insights into optimal coding strategies. In particular, we show how the form of the code adapts to the number of coding units and to different data and noise conditions to achieve robustness. We also report numerical solutions for robust coding of high-dimensional image data and show that these codes are substantially more robust than other image codes such as ICA and wavelets.

2 0.80100608 62 nips-2005-Efficient Estimation of OOMs

Author: Herbert Jaeger, Mingjie Zhao, Andreas Kolling

Abstract: A standard method to obtain stochastic models for symbolic time series is to train state-emitting hidden Markov models (SE-HMMs) with the Baum-Welch algorithm. Based on observable operator models (OOMs), in the last few months a number of novel learning algorithms for similar purposes have been developed: (1,2) two versions of an ”efficiency sharpening” (ES) algorithm, which iteratively improves the statistical efficiency of a sequence of OOM estimators, (3) a constrained gradient descent ML estimator for transition-emitting HMMs (TE-HMMs). We give an overview on these algorithms and compare them with SE-HMM/EM learning on synthetic and real-life data. 1

3 0.71096665 44 nips-2005-Computing the Solution Path for the Regularized Support Vector Regression

Author: Lacey Gunter, Ji Zhu

Abstract: In this paper we derive an algorithm that computes the entire solution path of the support vector regression, with essentially the same computational cost as fitting one SVR model. We also propose an unbiased estimate for the degrees of freedom of the SVR model, which allows convenient selection of the regularization parameter. 1

4 0.44840884 162 nips-2005-Rate Distortion Codes in Sensor Networks: A System-level Analysis

Author: Tatsuto Murayama, Peter Davis

Abstract: This paper provides a system-level analysis of a scalable distributed sensing model for networked sensors. In our system model, a data center acquires data from a bunch of L sensors which each independently encode their noisy observations of an original binary sequence, and transmit their encoded data sequences to the data center at a combined rate R, which is limited. Supposing that the sensors use independent LDGM rate distortion codes, we show that the system performance can be evaluated for any given finite R when the number of sensors L goes to infinity. The analysis shows how the optimal strategy for the distributed sensing problem changes at critical values of the data rate R or the noise level. 1

5 0.44403881 9 nips-2005-A Domain Decomposition Method for Fast Manifold Learning

Author: Zhenyue Zhang, Hongyuan Zha

Abstract: We propose a fast manifold learning algorithm based on the methodology of domain decomposition. Starting with the set of sample points partitioned into two subdomains, we develop the solution of the interface problem that can glue the embeddings on the two subdomains into an embedding on the whole domain. We provide a detailed analysis to assess the errors produced by the gluing process using matrix perturbation theory. Numerical examples are given to illustrate the efficiency and effectiveness of the proposed methods. 1

6 0.44399658 78 nips-2005-From Weighted Classification to Policy Search

7 0.4414002 102 nips-2005-Kernelized Infomax Clustering

8 0.43880025 168 nips-2005-Rodeo: Sparse Nonparametric Regression in High Dimensions

9 0.43560377 175 nips-2005-Sequence and Tree Kernels with Statistical Feature Mining

10 0.43175441 96 nips-2005-Inference with Minimal Communication: a Decision-Theoretic Variational Approach

11 0.42578575 163 nips-2005-Recovery of Jointly Sparse Signals from Few Random Projections

12 0.42202687 144 nips-2005-Off-policy Learning with Options and Recognizers

13 0.42198214 167 nips-2005-Robust design of biological experiments

14 0.42044359 132 nips-2005-Nearest Neighbor Based Feature Selection for Regression and its Application to Neural Activity

15 0.41927135 45 nips-2005-Conditional Visual Tracking in Kernel Space

16 0.41754979 200 nips-2005-Variable KD-Tree Algorithms for Spatial Pattern Search and Discovery

17 0.41655165 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models

18 0.41616374 184 nips-2005-Structured Prediction via the Extragradient Method

19 0.41543144 100 nips-2005-Interpolating between types and tokens by estimating power-law generators

20 0.41541383 81 nips-2005-Gaussian Processes for Multiuser Detection in CDMA receivers