nips nips2009 nips2009-140 knowledge-graph by maker-knowledge-mining

140 nips-2009-Linearly constrained Bayesian matrix factorization for blind source separation


Source: pdf

Author: Mikkel Schmidt

Abstract: We present a general Bayesian approach to probabilistic matrix factorization subject to linear constraints. The approach is based on a Gaussian observation model and Gaussian priors with bilinear equality and inequality constraints. We present an efficient Markov chain Monte Carlo inference procedure based on Gibbs sampling. Special cases of the proposed model are Bayesian formulations of nonnegative matrix factorization and factor analysis. The method is evaluated on a blind source separation problem. We demonstrate that our algorithm can be used to extract meaningful and interpretable features that are remarkably different from features extracted using existing related matrix factorization techniques.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Linearly constrained Bayesian matrix factorization for blind source separation Mikkel N. [sent-1, score-0.888]

2 Abstract: We present a general Bayesian approach to probabilistic matrix factorization subject to linear constraints. [sent-4, score-0.479]

3 The approach is based on a Gaussian observation model and Gaussian priors with bilinear equality and inequality constraints. [sent-5, score-0.25]

4 Special cases of the proposed model are Bayesian formulations of nonnegative matrix factorization and factor analysis. [sent-7, score-0.387]

5 The method is evaluated on a blind source separation problem. [sent-8, score-0.367]

6 We demonstrate that our algorithm can be used to extract meaningful and interpretable features that are remarkably different from features extracted using existing related matrix factorization techniques. [sent-9, score-0.449]

7 1 Introduction: Source separation problems arise when a number of signals are mixed together, and the objective is to estimate the underlying sources based on the observed mixture. [sent-10, score-0.57]

8 In the supervised, model-based approach to source separation, examples of isolated sources are used to train source models, which are then combined in order to separate a mixture. [sent-11, score-0.643]

9 Conversely, in unsupervised, blind source separation, only very general information about the sources is available. [sent-12, score-0.623]

10 Instead of estimating models of the sources, blind source separation is based on relatively weak criteria such as minimizing correlations, maximizing statistical independence, or fitting data subject to constraints. [sent-13, score-0.422]

11 Under the assumptions of linear mixing and additive noise, blind source separation can be expressed as a matrix factorization problem, $\underset{I \times J}{X} = \underset{I \times K}{A}\,\underset{K \times J}{B} + \underset{I \times J}{N}$, or equivalently, $x_{ij} = \sum_{k=1}^{K} a_{ik} b_{kj} + n_{ij}$, (1) where the subscripts below the matrices denote their dimensions. [sent-14, score-1.676]
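As a minimal illustration of Eq. (1), the generative model can be simulated directly (a sketch assuming NumPy; the dimensions match the later experiment, but the noise level and the choice of source and coefficient distributions are illustrative only, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K = 784, 4000, 40                  # dimensions as in the paper's experiment
A = rng.uniform(0.0, 1.0, size=(I, K))   # K unknown sources (columns of A)
B = rng.dirichlet(np.ones(K), size=J).T  # mixing coefficients, one column per observation
N = rng.normal(0.0, 0.1, size=(I, J))    # additive Gaussian noise
X = A @ B + N                            # Eq. (1): X = AB + N
```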

12 The columns of A represent K unknown sources, and the elements of B are the mixing coefficients. [sent-15, score-0.167]

13 Each of the J columns of X contains an observation that is a mixture of the sources plus additive noise represented by the columns of N . [sent-16, score-0.56]

14 [Figure 1 panel labels, partially recovered: (a) … orthant; (b) $b_{kj} \ge 0$; (c) $a_{ik} \ge 0$, $b_{kj} \ge 0$; (d) polytope in the non-negative orthant.] [sent-22, score-1.137]

15 The red hatched area indicates the feasible region for the source vectors (columns of A). [sent-24, score-0.177]

16 Dots are examples of specific positions of source vectors, and the black hatched area is the corresponding feasible region for the data vectors. [sent-25, score-0.177]

17 The main contribution of this paper is to show that specifying relevant constraints other than non-negativity substantially changes the quality of the results obtained using matrix factorization. [sent-28, score-0.243]

18 Some intuition about how the incorporation of different constraints affects the matrix factorization can be gained by considering their geometric implications. [sent-29, score-0.536]

19 Figure 1 shows how different linear constraints on A and B constrain the model space. [sent-30, score-0.149]

20 For example, if the mixing coefficients are constrained to be non-negative, data is modelled as the convex hull of a simplicial cone, and if the mixing coefficients are further constrained to sum to unity, data is modelled as the hull of a convex polytope. [sent-31, score-0.673]

21 In this paper, we develop a general and flexible framework for Bayesian matrix factorization, in which the unknown sources and the mixing coefficients are treated as hidden variables. [sent-32, score-0.586]

22 Furthermore, we allow any number of linear equality or inequality constraints to be specified. [sent-33, score-0.343]

23 On an unsupervised image separation problem, we demonstrate that, when relevant constraints are specified, the method finds interpretable features that are remarkably different from features computed using other matrix factorization techniques. [sent-34, score-0.789]

24 The proposed method is related to recently proposed Bayesian matrix factorization techniques: Bayesian matrix factorization based on Gibbs sampling has been demonstrated [7, 8] to scale up to very large datasets and to avoid the problem of overfitting associated with non-Bayesian techniques. [sent-35, score-0.852]

25 Bayesian methods for non-negative matrix factorization have also been proposed, either based on variational inference [1] or Gibbs sampling [4, 9]. [sent-36, score-0.493]

26 The paper is structured as follows: In Section 2, the linearly constrained Bayesian matrix factorization model is described. [sent-38, score-0.603]

27 In Section 4, the method is applied to an unsupervised source separation problem and compared to other existing matrix factorization methods. [sent-40, score-0.647]

28 2 The linearly constrained Bayesian matrix factorization model: In the following, we describe the linearly constrained Bayesian matrix factorization model. [sent-42, score-1.206]

29 [We assume a] zero-mean Gaussian noise model, $p(n_{ij}) = \mathcal{N}(n_{ij} \mid 0, v_{ij}) = \frac{1}{\sqrt{2\pi v_{ij}}} \exp\!\left( -\frac{n_{ij}^2}{2 v_{ij}} \right)$, (2) where, in the most general formulation, each matrix element has its own variance, $v_{ij}$; however, the variance parameters can easily be joined, e.g. [sent-46, score-0.704]

30 The likelihood is given by $p(x \mid \theta) = \prod_{i=1}^{I} \prod_{j=1}^{J} \mathcal{N}\!\left( x_{ij} \,\Big|\, \sum_{k=1}^{K} a_{ik} b_{kj},\, v_{ij} \right) = \prod_{i=1}^{I} \prod_{j=1}^{J} \frac{1}{\sqrt{2\pi v_{ij}}} \exp\!\left( -\frac{\left( x_{ij} - \sum_{k=1}^{K} a_{ik} b_{kj} \right)^2}{2 v_{ij}} \right)$, (3) where $\theta = \{A, B, \{v_{ij}\}\}$ denotes all parameters in the model. [sent-52, score-1.683]
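A direct transcription of the log of Eq. (3) (a sketch assuming NumPy; `V` is an I-by-J array holding the variances $v_{ij}$):

```python
import numpy as np

def log_likelihood(X, A, B, V):
    """Log of the Gaussian likelihood in Eq. (3)."""
    R = X - A @ B  # residuals x_ij - sum_k a_ik b_kj
    return float(np.sum(-0.5 * np.log(2.0 * np.pi * V) - R**2 / (2.0 * V)))
```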

31 For the noise variance parameters we choose conjugate inverse-gamma priors, $p(v_{ij}) = \mathrm{IG}(v_{ij} \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} v_{ij}^{-(\alpha+1)} \exp\!\left( \frac{-\beta}{v_{ij}} \right)$. [sent-53, score-0.327]

32 Priors for sources and mixing coefficients: We now define the prior distribution for the factorizing matrices, A and B. [sent-55, score-0.577]

33 We choose a Gaussian prior over a and b subject to inequality constraints, Q, and equality constraints, R, $$p(a, b) \propto \begin{cases} \mathcal{N}\!\left( \begin{bmatrix} a \\ b \end{bmatrix} \,\Big|\, \underbrace{\begin{bmatrix} \mu_a \\ \mu_b \end{bmatrix}}_{\equiv\, \mu},\ \underbrace{\begin{bmatrix} \Sigma_a & \Sigma_{ab} \\ \Sigma_{ab}^{\top} & \Sigma_b \end{bmatrix}}_{\equiv\, \Sigma} \right), & \text{if } Q(a, b) \le 0,\ R(a, b) = 0, \\ 0, & \text{otherwise.} \end{cases} \quad (5)$$ [sent-63, score-0.426]

34 In the most general formulation, the constraints, $Q : \mathbb{R}^{IK} \times \mathbb{R}^{KJ} \to \mathbb{R}^{N_Q}$ and $R : \mathbb{R}^{IK} \times \mathbb{R}^{KJ} \to \mathbb{R}^{N_R}$, are biaffine maps that define $N_Q$ inequality and $N_R$ equality constraints jointly in a and b. [sent-65, score-0.343]

35 By rearranging terms and combining the $N_Q$ constraints in matrix notation, we may write $$\bar{Q}^{\top} a \le \bar{q}_a, \quad \text{where } \bar{Q} = \begin{bmatrix} q_1^{(a)} + Q_1^{(ab)} b & \cdots & q_{N_Q}^{(a)} + Q_{N_Q}^{(ab)} b \end{bmatrix}, \quad \bar{q}_a = \begin{bmatrix} -q_1 - b^{\top} q_1^{(b)} \\ \vdots \\ -q_{N_Q} - b^{\top} q_{N_Q}^{(b)} \end{bmatrix}, \quad (6, 7)$$ [sent-67, score-0.243]

36 from which it is clear that the constraints are linear in a. [sent-70, score-0.149]

37 Likewise, the constraints can be rearranged to a linear form in b. [sent-71, score-0.149]

38 This general formulation of the priors allows all elements of a and b to have prior dependencies both through their covariance matrix, $\Sigma_{ab}$, and through the joint constraints; however, in some … [sent-73, score-0.157]

39 Figure 2: Graphical model for linearly constrained Bayesian matrix factorization, when A and B are independent in the prior. [sent-82, score-0.854]

40 Gibbs sampling: We propose an inference procedure based on Gibbs sampling. [sent-98, score-0.138]

41 Gibbs sampling is applicable when the joint density of the parameters is not known, but the parameters can be partitioned into groups, such that their posterior conditional densities are known. [sent-99, score-0.263]

42 Due to the choice of conjugate prior, the posterior density is an inverse-gamma, $p(v_{ij} \mid \theta \backslash v_{ij}) = \mathrm{IG}(v_{ij} \mid \bar{\alpha}, \bar{\beta})$, (9) with $\bar{\alpha} = \alpha + \tfrac{1}{2}$ and $\bar{\beta} = \beta + \tfrac{1}{2} \left( x_{ij} - \sum_{k=1}^{K} a_{ik} b_{kj} \right)^{2}$, (10) from which samples can be generated using standard acceptance-rejection methods. [sent-104, score-0.844]
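A sketch of the corresponding Gibbs update (assuming NumPy; it uses the standard gamma transform as one exact way to draw inverse-gamma samples):

```python
import numpy as np

def sample_noise_variances(X, A, B, alpha, beta, rng):
    """Draw every v_ij from its inverse-gamma conditional, Eqs. (9)-(10)."""
    alpha_bar = alpha + 0.5
    beta_bar = beta + 0.5 * (X - A @ B) ** 2
    # if G ~ Gamma(shape=alpha_bar, scale=1), then beta_bar / G ~ IG(alpha_bar, beta_bar)
    return beta_bar / rng.gamma(alpha_bar, 1.0, size=X.shape)
```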

43 We only discuss generating samples from a, since the sampling procedure for b is identical due to the symmetry of the model. [sent-106, score-0.137]

44 Conditioned on b, the prior density of a is a constrained Gaussian, $$p(a \mid b) \propto \begin{cases} \mathcal{N}(a \mid \tilde{\mu}_a, \tilde{\Sigma}_a), & \text{if } Q_a^{\top} a \le q_a,\ R_a^{\top} a = r_a, \\ 0, & \text{otherwise,} \end{cases} \quad (11)$$ $$\tilde{\mu}_a = \mu_a + \Sigma_{ab} \Sigma_b^{-1} (b - \mu_b), \qquad \tilde{\Sigma}_a = \Sigma_a - \Sigma_{ab} \Sigma_b^{-1} \Sigma_{ab}^{\top}, \quad (12)$$ where we have used Eq. [sent-107, score-0.411]

45 (9); second, a is generated from the constrained Gaussian density in Eq. [sent-118, score-0.19]
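The conditioning step of Eq. (12) is ordinary Gaussian conditioning; a minimal sketch (assuming NumPy, with the blocks of µ and Σ passed in explicitly):

```python
import numpy as np

def condition_prior_on_b(mu_a, mu_b, Sig_a, Sig_b, Sig_ab, b):
    """Mean and covariance of the conditional prior p(a | b), Eq. (12)."""
    mu_tilde = mu_a + Sig_ab @ np.linalg.solve(Sig_b, b - mu_b)
    Sig_tilde = Sig_a - Sig_ab @ np.linalg.solve(Sig_b, Sig_ab.T)
    return mu_tilde, Sig_tilde
```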

46 Sampling from a constrained Gaussian: An essential component of the proposed matrix factorization method is an algorithm for generating random samples from a multivariate Gaussian density subject to linear equality and inequality constraints. [sent-122, score-0.879]

47 A similar problem has previously been treated by Geweke [2], who proposes a Gibbs sampling procedure that handles no equality constraints and at most N inequality constraints. [sent-124, score-0.421]

48 [6] extends the method in [2] to an arbitrary number of inequality constraints, but does not provide an algorithm for handling equality constraints. [sent-126, score-0.194]

49 Here, we present a general Gibbs sampling procedure that handles any number of equality and inequality constraints. [sent-127, score-0.304]

50 The equality constraints restrict the distribution to an affine subspace of dimensionality N − R, where R is the number of linearly independent constraints. [sent-128, score-0.381]

51 The conditional distribution on that subspace is a Gaussian subject to inequality constraints. [sent-129, score-0.201]

52 We now define a transformed variable, y, that is related to x by $y = T_{\perp}(x - x_0)$, $y \in \mathbb{R}^{N-R}$, (17) where $x_0$ is some vector that satisfies the equality constraints, e.g. [sent-135, score-0.144]
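A sketch of how $x_0$ and a basis for this subspace can be computed (assuming NumPy/SciPy; here `T` stores the basis as columns, so the reduced coordinates are `y = T.T @ (x - x0)` and any `x = x0 + T @ y` satisfies the equality constraints):

```python
import numpy as np
from scipy.linalg import null_space

def equality_subspace(R, r):
    """Particular solution and null-space basis for the constraints R^T x = r."""
    x0 = np.linalg.lstsq(R.T, r, rcond=None)[0]  # any x0 with R^T x0 = r
    T = null_space(R.T)                          # columns span {x : R^T x = 0}
    return x0, T
```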

53 This may potentially improve the sampling procedure, because Gibbs sampling can suffer from slow mixing when the distribution is highly correlated. [sent-141, score-0.259]

54 Correlations between the elements of y are due to both the Gaussian covariance structure and the inequality constraints; however, for simplicity we only decorrelate with respect to the covariance of the underlying unconstrained Gaussian. [sent-142, score-0.201]

55 To this end, we define the transformed variable, z, given by $z = L^{-\top}(y - \mu_y)$, (20) where L is the Cholesky factor of the covariance matrix, $LL^{\top} = \Sigma_y$. [sent-143, score-0.368]
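A sketch of this decorrelation step (assuming NumPy; note that `np.linalg.cholesky` returns the lower-triangular factor L with $LL^{\top} = \Sigma_y$, for which the whitening transform reads $z = L^{-1}(y - \mu_y)$; with an upper-triangular factor, $\Sigma_y = L^{\top}L$, it reads $z = L^{-\top}(y - \mu_y)$ as above):

```python
import numpy as np

def whiten(y, mu_y, Sig_y):
    """Decorrelate y so that z has identity covariance (cf. Eq. (20))."""
    L = np.linalg.cholesky(Sig_y)        # lower triangular, L @ L.T == Sig_y
    return np.linalg.solve(L, y - mu_y)  # z = L^{-1} (y - mu_y)
```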

56 The distribution of z is then a standard Gaussian subject to inequality constraints, $$p(z \mid R_x^{\top} x = r_x) \propto \begin{cases} \mathcal{N}(z \mid 0, I), & \text{if } Q_z^{\top} z \le q_z, \\ 0, & \text{otherwise,} \end{cases}$$ with $q_z = q_y - Q_y^{\top} \mu_y$. [sent-144, score-0.18]

57 Samples from this density can be generated using standard methods such as inverse transform sampling (transforming a uniform random variable by the inverse cumulative distribution function); the efficient mixed rejection sampling algorithm proposed by Geweke [2]; or slice sampling [5]. [sent-146, score-0.42]
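A sketch of the inverse-transform option for one truncated standard-normal draw (assuming SciPy; the closed-form CDF and its inverse make this a two-liner, though precision degrades far out in the tails, which is where Geweke's mixed rejection sampler is preferable):

```python
import numpy as np
from scipy.stats import norm

def truncated_std_normal(lo, hi, rng):
    """Inverse-transform sample from N(0, 1) restricted to [lo, hi]."""
    u = rng.uniform(norm.cdf(lo), norm.cdf(hi))  # lo and hi may be -inf / +inf
    return norm.ppf(u)
```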

58 4 Experiments: We demonstrate the proposed linearly constrained Bayesian matrix factorization method on a blind image separation problem, and compare it to two other matrix factorization techniques: independent component analysis (ICA) and non-negative matrix factorization (NMF). [sent-150, score-1.649]

59 Data: We used a subset of the MNIST dataset, which consists of 28 × 28 pixel grayscale images of handwritten digits (see Figure 4). [sent-151, score-0.233]

60 From these images we created 4,000 image mixtures by adding the grayscale intensities of pairs of images, such that the different digits were combined in equal proportion. [sent-154, score-0.355]

61 We rescaled the mixed images so that their pixel intensities were in the 0–1 interval, and arranged the vectorized images as the columns of the matrix $X \in \mathbb{R}^{I \times J}$, where I = 784 and J = 4,000. [sent-155, score-0.382]
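A sketch of this preprocessing (assuming NumPy; the `digits` array, its loading path, the random pairing, and the global rescaling are our reading of the description, not taken from the paper):

```python
import numpy as np

digits = np.load("mnist_digits.npy")  # hypothetical path; assumed shape (num_images, 28, 28)

rng = np.random.default_rng(0)
pairs = rng.permutation(len(digits))[:8000].reshape(4000, 2)   # pair the images two and two
mix = digits[pairs[:, 0]].astype(float) + digits[pairs[:, 1]]  # add grayscale intensities
mix /= mix.max()                                               # rescale intensities into [0, 1]
X = mix.reshape(4000, -1).T                                    # X: 784 x 4000, one mixture per column
```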

62 Figure 3: Gibbs sampling from a multivariate Gaussian density subject to linear constraints. [sent-158, score-0.215]

63 c) Gibbs sampling proceeds iteratively by sweeping over the dimensions and sampling from the conditional distribution in each dimension conditioned on the current value in the other dimensions. [sent-161, score-0.271]
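A sketch of one such sweep for the standardized problem $z \sim \mathcal{N}(0, I)$ subject to $Q_z^{\top} z \le q_z$ (assuming NumPy/SciPy; each column of `Q` encodes one inequality, and `z` must start at a feasible point):

```python
import numpy as np
from scipy.stats import norm

def gibbs_sweep(z, Q, q, rng):
    """One sweep of Gibbs sampling for z ~ N(0, I) subject to Q.T @ z <= q."""
    for i in range(z.size):
        # slack of each constraint with the contribution of z_i removed
        slack = q - Q.T @ z + Q[i, :] * z[i]
        lo, hi = -np.inf, np.inf
        for m in range(Q.shape[1]):
            c = Q[i, m]
            if c > 0.0:
                hi = min(hi, slack[m] / c)   # c * z_i <= slack gives an upper bound
            elif c < 0.0:
                lo = max(lo, slack[m] / c)   # c * z_i <= slack gives a lower bound
        # inverse-transform draw from the 1-D truncated conditional
        z[i] = norm.ppf(rng.uniform(norm.cdf(lo), norm.cdf(hi)))
    return z
```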

64 Task: The objective is to factorize the data matrix in order to find a number of source images that explain the data. [sent-162, score-0.263]

65 Ideally, the sources should correspond to the original digits. [sent-163, score-0.389]

66 We cannot hope to find exactly 10 sources, each corresponding to one digit, because there are large variations in how each digit is written. [sent-164, score-0.423]

67 For that reason, we used 40 hidden sources in our experiments, which allowed 4 exemplars on average for each digit. [sent-165, score-0.389]

68 Method: For comparison, we factorized the mixed image data using two standard matrix factorization techniques: ICA, where we used the FastICA algorithm, and NMF, where we used Lee and Seung's multiplicative update algorithm [3]. [sent-166, score-0.467]
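For reference, a minimal sketch of Lee and Seung's multiplicative updates for the squared-error objective [3] (assuming NumPy; the `eps` guard against division by zero is our addition):

```python
import numpy as np

def nmf_multiplicative(X, K, iters=500, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates minimizing ||X - A @ B||_F^2."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(size=(X.shape[0], K))
    B = rng.uniform(size=(K, X.shape[1]))
    for _ in range(iters):
        B *= (A.T @ X) / (A.T @ A @ B + eps)
        A *= (X @ B.T) / (A @ B @ B.T + eps)
    return A, B
```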

69 The sources determined using these methods are shown in Figure 4. [sent-167, score-0.389]

70 For the linearly constrained Bayesian matrix factorization, we used an isotropic noise model. [sent-169, score-0.385]

71 The hidden sources were constrained to have the same range of pixel intensities as the image mixtures, $0 \le a_{ik} \le 1$. [sent-171, score-0.937]

72 This constraint allows the sources to be interpreted as images since pixel intensities outside this interval are not meaningful. [sent-172, score-0.552]

73 The mixing coefficients were constrained to be non-negative, $b_{kj} \ge 0$, and to sum to unity, $\sum_{k=1}^{K} b_{kj} = 1$; thus, the observed data is modeled as a convex combination of the sources. [sent-173, score-1.063]

74 The constraints ensure that only additive combinations of the sources are allowed, and they introduce a negative correlation between the mixing coefficients. [sent-174, score-0.669]
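In the paper's notation these experimental constraints are plain linear inequality/equality systems; a sketch of one way to encode them (assuming NumPy; each column of a Q matrix is one inequality of the form $Q^{\top} x \le q$):

```python
import numpy as np

I, K = 784, 40

# sources: 0 <= a_ik <= 1, i.e. -a <= 0 and a <= 1 for each column a of A
Q_a = np.hstack([-np.eye(I), np.eye(I)])
q_a = np.concatenate([np.zeros(I), np.ones(I)])

# mixing coefficients: b_kj >= 0 plus the equality sum_k b_kj = 1, per column b of B
Q_b = -np.eye(K)         # -b <= 0  is equivalent to  b >= 0
q_b = np.zeros(K)
R_b = np.ones((K, 1))    # R_b^T b = 1: the coefficients sum to unity
r_b = np.array([1.0])
```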

75 This negative correlation implies that if one source contributes more to a mixture the other sources must correspondingly contribute less. [sent-175, score-0.545]

76 The idea behind this constraint is that it will lead the sources to compete, rather than collaborate, to explain the data. [sent-176, score-0.417]

77 A geometric interpretation of the constraints is illustrated in Figure 1. [sent-177, score-0.177]

78 (Figure 1.h) The data vectors are modeled by a convex polytope in the non-negative unit hypercube, and the hidden sources are the vertices of this polytope. [sent-178, score-0.462]

79 The results of the matrix factorization are shown in Figure 4. [sent-180, score-0.387]

80 c) With ICA, the sources are not constrained to be non-negative, and therefore do not have a direct interpretation as images. [sent-183, score-0.551]

81 Most of the computed sources are complex patterns that appear to be dominated by a single digit. [sent-184, score-0.415]

82 While ICA certainly does find structure in the data, the estimated sources lack a clear interpretation. [sent-185, score-0.389]

83 Spatially, the sources are local as opposed to global. [sent-188, score-0.417]

84 The decomposition has an intuitive interpretation: Each source is a short line segment or a dot, and the different digits can be constructed by combining these parts. [sent-189, score-0.221]

85 e) The proposed method computes sources with a very clear and intuitive interpretation: Almost all of the 40 computed sources visually resemble handwritten digits, and are thus well aligned with the sources that were used to generate the mixtures. [sent-191, score-1.257]

86 b) The mixture data consists of 4000 images of two mixed digits (20 examples shown). [sent-194, score-0.213]

87 e) Sources computed using linearly constrained Bayesian matrix factorization (details explained in the text). [sent-197, score-0.629]

88 Two sources stand out: One is a black blob of approximately the same size as the digits, and another is an all-white feature; these are useful for adjusting the brightness. [sent-199, score-0.389]

89 5 Conclusions: We presented a linearly constrained Bayesian matrix factorization method as well as an inference procedure for this model. [sent-200, score-0.663]

90 On an unsupervised image separation problem, we have demonstrated that the method finds sources that have a clear and interpretable meaning. [sent-201, score-0.616]

91 As opposed to ICA and NMF, our method finds sources that visually resemble handwritten digits. [sent-202, score-0.481]

92 The Gaussian priors over the sources can be used if knowledge is available about the covariance structure of the sources, e.g. [sent-204, score-0.492]

93 The constraints we used in our experiments were separate for A and B, but the framework allows bilinearly dependent constraints to be specified, which can be used, for example, to specify constraints in the data domain, i.e. [sent-207, score-0.447]

94 As a general framework for constrained Bayesian matrix factorization, the proposed method has applications in many areas other than blind source separation. [sent-210, score-0.462]

95 The method can also be used in a supervised source separation setting, where the distributions over sources and mixing coefficients are learned from a training set of isolated sources. [sent-212, score-0.752]

96 It is an interesting challenge to develop methods for learning relevant constraints from data. [sent-213, score-0.149]

97 Efficient simulation from the multivariate normal and Student-t distributions subject to linear constraints and the evaluation of constraint probabilities. [sent-223, score-0.23]

98 Separation of non-negative mixture of non-negative sources using a Bayesian approach and MCMC sampling. [sent-244, score-0.418]

99 Efficient Gibbs sampling of truncated multivariate normal with application to constrained linear regression. [sent-259, score-0.418]

100 Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. [sent-269, score-0.416]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('bkj', 0.413), ('sources', 0.389), ('factorization', 0.293), ('vij', 0.283), ('aik', 0.261), ('ab', 0.152), ('gibbs', 0.149), ('constraints', 0.149), ('constrained', 0.134), ('separation', 0.133), ('qz', 0.131), ('source', 0.127), ('equality', 0.116), ('blind', 0.107), ('mixing', 0.103), ('digits', 0.094), ('matrix', 0.094), ('nmf', 0.093), ('intensities', 0.087), ('bayesian', 0.087), ('doi', 0.084), ('linearly', 0.082), ('nq', 0.08), ('sampling', 0.078), ('inequality', 0.078), ('ica', 0.075), ('simplicial', 0.075), ('polytope', 0.073), ('zi', 0.065), ('nij', 0.065), ('interpretable', 0.062), ('posterior', 0.062), ('factorizing', 0.06), ('density', 0.056), ('coef', 0.056), ('priors', 0.056), ('subject', 0.055), ('cients', 0.053), ('xij', 0.052), ('hatched', 0.05), ('orthant', 0.05), ('qnq', 0.05), ('rik', 0.05), ('rkj', 0.05), ('mixed', 0.048), ('covariance', 0.047), ('rx', 0.047), ('noise', 0.044), ('ra', 0.044), ('decouple', 0.044), ('geweke', 0.044), ('sweeping', 0.044), ('gaussian', 0.043), ('images', 0.042), ('cone', 0.04), ('erf', 0.04), ('dk', 0.037), ('conditioned', 0.037), ('ig', 0.037), ('qa', 0.037), ('nds', 0.035), ('bv', 0.035), ('unity', 0.035), ('columns', 0.035), ('pixel', 0.034), ('subspace', 0.034), ('mnist', 0.034), ('digit', 0.034), ('hull', 0.034), ('seung', 0.034), ('conditional', 0.034), ('densities', 0.033), ('resemble', 0.032), ('procedure', 0.032), ('handwritten', 0.032), ('image', 0.032), ('grayscale', 0.031), ('hypercube', 0.031), ('isotropic', 0.031), ('qm', 0.031), ('truncated', 0.031), ('vec', 0.03), ('elements', 0.029), ('chain', 0.029), ('mixture', 0.029), ('inference', 0.028), ('interpretation', 0.028), ('diag', 0.028), ('additive', 0.028), ('transformed', 0.028), ('opposed', 0.028), ('modelled', 0.028), ('salakhutdinov', 0.028), ('mixtures', 0.027), ('schmidt', 0.027), ('generating', 0.027), ('slice', 0.026), ('multivariate', 0.026), ('computed', 0.026), ('prior', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999982 140 nips-2009-Linearly constrained Bayesian matrix factorization for blind source separation

Author: Mikkel Schmidt

Abstract: We present a general Bayesian approach to probabilistic matrix factorization subject to linear constraints. The approach is based on a Gaussian observation model and Gaussian priors with bilinear equality and inequality constraints. We present an efficient Markov chain Monte Carlo inference procedure based on Gibbs sampling. Special cases of the proposed model are Bayesian formulations of nonnegative matrix factorization and factor analysis. The method is evaluated on a blind source separation problem. We demonstrate that our algorithm can be used to extract meaningful and interpretable features that are remarkably different from features extracted using existing related matrix factorization techniques.

2 0.19220026 41 nips-2009-Bayesian Source Localization with the Multivariate Laplace Prior

Author: Marcel V. Gerven, Botond Cseke, Robert Oostenveld, Tom Heskes

Abstract: We introduce a novel multivariate Laplace (MVL) distribution as a sparsity promoting prior for Bayesian source localization that allows the specification of constraints between and within sources. We represent the MVL distribution as a scale mixture that induces a coupling between source variances instead of their means. Approximation of the posterior marginals using expectation propagation is shown to be very efficient due to properties of the scale mixture representation. The computational bottleneck amounts to computing the diagonal elements of a sparse matrix inverse. Our approach is illustrated using a mismatch negativity paradigm for which MEG data and a structural MRI have been acquired. We show that spatial coupling leads to sources which are active over larger cortical areas as compared with an uncoupled prior. 1

3 0.16911611 17 nips-2009-A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds

Author: Paris Smaragdis, Madhusudana Shashanka, Bhiksha Raj

Abstract: In this paper we present an algorithm for separating mixed sounds from a monophonic recording. Our approach makes use of training data which allows us to learn representations of the types of sounds that compose the mixture. In contrast to popular methods that attempt to extract compact generalizable models for each sound from training data, we employ the training data itself as a representation of the sources in the mixture. We show that mixtures of known sounds can be described as sparse combinations of the training data itself, and in doing so produce significantly better separation results as compared to similar systems based on compact statistical models. Keywords: Example-Based Representation, Signal Separation, Sparse Models. 1

4 0.080169864 123 nips-2009-Large Scale Nonparametric Bayesian Inference: Data Parallelisation in the Indian Buffet Process

Author: Finale Doshi-velez, Shakir Mohamed, Zoubin Ghahramani, David A. Knowles

Abstract: Nonparametric Bayesian models provide a framework for flexible probabilistic modelling of complex datasets. Unfortunately, the high-dimensional averages required for Bayesian methods can be slow, especially with the unbounded representations used by nonparametric models. We address the challenge of scaling Bayesian inference to the increasingly large datasets found in real-world applications. We focus on parallelisation of inference in the Indian Buffet Process (IBP), which allows data points to have an unbounded number of sparse latent features. Our novel MCMC sampler divides a large data set between multiple processors and uses message passing to compute the global likelihoods and posteriors. This algorithm, the first parallel inference scheme for IBP-based models, scales to datasets orders of magnitude larger than have previously been possible. 1

5 0.079398483 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization

Author: Ilya Sutskever, Joshua B. Tenenbaum, Ruslan Salakhutdinov

Abstract: We consider the problem of learning probabilistic models for complex relational structures between various types of objects. A model can help us “understand” a dataset of relational facts in at least two ways, by finding interpretable structure in the data, and by supporting predictions, or inferences about whether particular unobserved relations are likely to be true. Often there is a tradeoff between these two aims: cluster-based models yield more easily interpretable representations, while factorization-based approaches have given better predictive performance on large data sets. We introduce the Bayesian Clustered Tensor Factorization (BCTF) model, which embeds a factorized representation of relations in a nonparametric Bayesian clustering framework. Inference is fully Bayesian but scales well to large data sets. The model simultaneously discovers interpretable clusters and yields predictive performance that matches or beats previous probabilistic models for relational data.

6 0.074758731 254 nips-2009-Variational Gaussian-process factor analysis for modeling spatio-temporal data

7 0.069689073 152 nips-2009-Measuring model complexity with the prior predictive

8 0.06776385 226 nips-2009-Spatial Normalized Gamma Processes

9 0.066608511 241 nips-2009-The 'tree-dependent components' of natural scenes are edge filters

10 0.062272299 43 nips-2009-Bayesian estimation of orientation preference maps

11 0.060670651 147 nips-2009-Matrix Completion from Noisy Entries

12 0.060528509 18 nips-2009-A Stochastic approximation method for inference in probabilistic graphical models

13 0.059826989 175 nips-2009-Occlusive Components Analysis

14 0.059115738 100 nips-2009-Gaussian process regression with Student-t likelihood

15 0.057832327 255 nips-2009-Variational Inference for the Nested Chinese Restaurant Process

16 0.057730582 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models

17 0.057362925 165 nips-2009-Noise Characterization, Modeling, and Reduction for In Vivo Neural Recording

18 0.057066917 132 nips-2009-Learning in Markov Random Fields using Tempered Transitions

19 0.056634106 91 nips-2009-Fast, smooth and adaptive regression in metric spaces

20 0.055811636 1 nips-2009-$L 1$-Penalized Robust Estimation for a Class of Inverse Problems Arising in Multiview Geometry


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.188), (1, -0.041), (2, -0.0), (3, -0.002), (4, 0.039), (5, -0.095), (6, 0.146), (7, -0.032), (8, 0.027), (9, -0.032), (10, -0.049), (11, 0.003), (12, -0.043), (13, 0.052), (14, -0.025), (15, -0.012), (16, 0.006), (17, 0.085), (18, -0.05), (19, -0.07), (20, -0.082), (21, -0.029), (22, -0.186), (23, -0.015), (24, -0.142), (25, -0.16), (26, -0.023), (27, 0.031), (28, 0.094), (29, -0.004), (30, 0.026), (31, -0.075), (32, 0.071), (33, -0.189), (34, -0.018), (35, -0.115), (36, 0.133), (37, 0.067), (38, 0.025), (39, 0.0), (40, -0.068), (41, -0.05), (42, -0.005), (43, 0.029), (44, -0.069), (45, 0.002), (46, 0.038), (47, 0.023), (48, -0.119), (49, -0.009)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95185155 140 nips-2009-Linearly constrained Bayesian matrix factorization for blind source separation

Author: Mikkel Schmidt

Abstract: We present a general Bayesian approach to probabilistic matrix factorization subject to linear constraints. The approach is based on a Gaussian observation model and Gaussian priors with bilinear equality and inequality constraints. We present an efficient Markov chain Monte Carlo inference procedure based on Gibbs sampling. Special cases of the proposed model are Bayesian formulations of nonnegative matrix factorization and factor analysis. The method is evaluated on a blind source separation problem. We demonstrate that our algorithm can be used to extract meaningful and interpretable features that are remarkably different from features extracted using existing related matrix factorization techniques.

2 0.83971786 41 nips-2009-Bayesian Source Localization with the Multivariate Laplace Prior

Author: Marcel V. Gerven, Botond Cseke, Robert Oostenveld, Tom Heskes

Abstract: We introduce a novel multivariate Laplace (MVL) distribution as a sparsity promoting prior for Bayesian source localization that allows the specification of constraints between and within sources. We represent the MVL distribution as a scale mixture that induces a coupling between source variances instead of their means. Approximation of the posterior marginals using expectation propagation is shown to be very efficient due to properties of the scale mixture representation. The computational bottleneck amounts to computing the diagonal elements of a sparse matrix inverse. Our approach is illustrated using a mismatch negativity paradigm for which MEG data and a structural MRI have been acquired. We show that spatial coupling leads to sources which are active over larger cortical areas as compared with an uncoupled prior. 1

3 0.78893745 17 nips-2009-A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds

Author: Paris Smaragdis, Madhusudana Shashanka, Bhiksha Raj

Abstract: In this paper we present an algorithm for separating mixed sounds from a monophonic recording. Our approach makes use of training data which allows us to learn representations of the types of sounds that compose the mixture. In contrast to popular methods that attempt to extract compact generalizable models for each sound from training data, we employ the training data itself as a representation of the sources in the mixture. We show that mixtures of known sounds can be described as sparse combinations of the training data itself, and in doing so produce significantly better separation results as compared to similar systems based on compact statistical models. Keywords: Example-Based Representation, Signal Separation, Sparse Models. 1

4 0.46754664 100 nips-2009-Gaussian process regression with Student-t likelihood

Author: Jarno Vanhatalo, Pasi Jylänki, Aki Vehtari

Abstract: In the Gaussian process regression the observation model is commonly assumed to be Gaussian, which is convenient in computational perspective. However, the drawback is that the predictive accuracy of the model can be significantly compromised if the observations are contaminated by outliers. A robust observation model, such as the Student-t distribution, reduces the influence of outlying observations and improves the predictions. The problem, however, is the analytically intractable inference. In this work, we discuss the properties of a Gaussian process regression model with the Student-t likelihood and utilize the Laplace approximation for approximate inference. We compare our approach to a variational approximation and a Markov chain Monte Carlo scheme, which utilize the commonly used scale mixture representation of the Student-t distribution. 1

5 0.42520103 94 nips-2009-Fast Learning from Non-i.i.d. Observations

Author: Ingo Steinwart, Andreas Christmann

Abstract: We prove an oracle inequality for generic regularized empirical risk minimization algorithms learning from α-mixing processes. To illustrate this oracle inequality, we use it to derive learning rates for some learning methods including least squares SVMs. Since the proof of the oracle inequality uses recent localization ideas developed for independent and identically distributed (i.i.d.) processes, it turns out that these learning rates are close to the optimal rates known in the i.i.d. case. 1

6 0.4012807 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization

7 0.39921364 254 nips-2009-Variational Gaussian-process factor analysis for modeling spatio-temporal data

8 0.39420515 42 nips-2009-Bayesian Sparse Factor Models and DAGs Inference and Comparison

9 0.38997871 152 nips-2009-Measuring model complexity with the prior predictive

10 0.38266632 150 nips-2009-Maximum likelihood trajectories for continuous-time Markov chains

11 0.37969741 226 nips-2009-Spatial Normalized Gamma Processes

12 0.36744305 170 nips-2009-Nonlinear directed acyclic structure learning with weakly additive noise models

13 0.36413872 172 nips-2009-Nonparametric Bayesian Texture Learning and Synthesis

14 0.36165407 247 nips-2009-Time-rescaling methods for the estimation and assessment of non-Poisson neural encoding models

15 0.35846955 59 nips-2009-Construction of Nonparametric Bayesian Models from Parametric Bayes Equations

16 0.35050449 165 nips-2009-Noise Characterization, Modeling, and Reduction for In Vivo Neural Recording

17 0.34965593 138 nips-2009-Learning with Compressible Priors

18 0.34393558 132 nips-2009-Learning in Markov Random Fields using Tempered Transitions

19 0.34004581 147 nips-2009-Matrix Completion from Noisy Entries

20 0.33696544 188 nips-2009-Perceptual Multistability as Markov Chain Monte Carlo Inference


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(7, 0.017), (21, 0.015), (24, 0.035), (25, 0.069), (35, 0.073), (36, 0.074), (39, 0.036), (58, 0.09), (71, 0.053), (81, 0.392), (86, 0.061)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.98519361 200 nips-2009-Reconstruction of Sparse Circuits Using Multi-neuronal Excitation (RESCUME)

Author: Tao Hu, Anthony Leonardo, Dmitri B. Chklovskii

Abstract: One of the central problems in neuroscience is reconstructing synaptic connectivity in neural circuits. Synapses onto a neuron can be probed by sequentially stimulating potentially pre-synaptic neurons while monitoring the membrane voltage of the post-synaptic neuron. Reconstructing a large neural circuit using such a “brute force” approach is rather time-consuming and inefficient because the connectivity in neural circuits is sparse. Instead, we propose to measure a post-synaptic neuron’s voltage while stimulating sequentially random subsets of multiple potentially pre-synaptic neurons. To reconstruct these synaptic connections from the recorded voltage we apply a decoding algorithm recently developed for compressive sensing. Compared to the brute force approach, our method promises significant time savings that grow with the size of the circuit. We use computer simulations to find optimal stimulation parameters and explore the feasibility of our reconstruction method under realistic experimental conditions including noise and non-linear synaptic integration. Multineuronal stimulation allows reconstructing synaptic connectivity just from the spiking activity of post-synaptic neurons, even when sub-threshold voltage is unavailable. By using calcium indicators, voltage-sensitive dyes, or multi-electrode arrays one could monitor activity of multiple postsynaptic neurons simultaneously, thus mapping their synaptic inputs in parallel, potentially reconstructing a complete neural circuit. 1 In tro d uc tio n Understanding information processing in neural circuits requires systematic characterization of synaptic connectivity [1, 2]. The most direct way to measure synapses between a pair of neurons is to stimulate potentially pre-synaptic neuron while recording intra-cellularly from the potentially post-synaptic neuron [3-8]. This method can be scaled to reconstruct multiple synaptic connections onto one neuron by combining intracellular recordings from the postsynaptic neuron with photo-activation of pre-synaptic neurons using glutamate uncaging [913] or channelrhodopsin [14, 15], or with multi-electrode arrays [16, 17]. Neurons are sequentially stimulated to fire action potentials by scanning a laser beam (or electrode voltage) over a brain slice, while synaptic weights are measured by recording post-synaptic voltage. Although sequential excitation of single potentially pre-synaptic neurons could reveal connectivity, such a “brute force” approach is inefficient because the connectivity among neurons is sparse. Even among nearby neurons in the cerebral cortex, the probability of connection is only about ten percent [3-8]. Connection probability decays rapidly with the 1 distance between neurons and falls below one percent on the scale of a cortical column [3, 8]. Thus, most single-neuron stimulation trials would result in zero response making the brute force approach slow, especially for larger circuits. Another drawback of the brute force approach is that single-neuron stimulation cannot be combined efficiently with methods allowing parallel recording of neural activity, such as calcium imaging [18-22], voltage-sensitive dyes [23-25] or multi-electrode arrays [17, 26]. As these techniques do not reliably measure sub-threshold potential but report only spiking activity, they would reveal only the strongest connections that can drive a neuron to fire [2730]. Therefore, such combination would reveal only a small fraction of the circuit. 
We propose to circumvent the above limitations of the brute force approach by stimulating multiple potentially pre-synaptic neurons simultaneously and reconstructing individual connections by using a recently developed method called compressive sensing (CS) [31-35]. In each trial, we stimulate F neurons randomly chosen out of N potentially pre-synaptic neurons and measure post-synaptic activity. Although each measurement yields only a combined response to stimulated neurons, if synaptic inputs sum linearly in a post-synaptic neuron, one can reconstruct the weights of individual connections by using an optimization algorithm. Moreover, if the synaptic connections are sparse, i.e. only K << N potentially pre-synaptic neurons make synaptic connections onto a post-synaptic neuron, the required number of trials M ~ K log(N/K), which is much less than N [31-35]. The proposed method can be used even if only spiking activity is available. Because multiple neurons are driven to fire simultaneously, if several of them synapse on the post-synaptic neuron, they can induce one or more spikes in that neuron. As quantized spike counts carry less information than analog sub-threshold voltage recordings, reconstruction requires a larger number of trials. Yet, the method can be used to reconstruct a complete feedforward circuit from spike recordings. Reconstructing neural circuit with multi-neuronal excitation may be compared with mapping retinal ganglion cell receptive fields. Typically, photoreceptors are stimulated by white-noise checkerboard stimulus and the receptive field is obtained by Reverse Correlation (RC) in case of sub-threshold measurements or Spike-Triggered Average (STA) of the stimulus [36, 37]. Although CS may use the same stimulation protocol, for a limited number of trials, the reconstruction quality is superior to RC or STA. 2 Ma pp i ng sy na pti c inp ut s o nto o n e ne uro n We start by formalizing the problem of mapping synaptic connections from a population of N potentially pre-synaptic neurons onto a single neuron, as exemplified by granule cells synapsing onto a Purkinje cell (Figure 1a). Our experimental protocol can be illustrated using linear algebra formalism, Figure 1b. We represent synaptic weights as components of a column vector x, where zeros represent non-existing connections. Each row in the stimulation matrix A represents a trial, ones indicating neurons driven to spike once and zeros indicating non-spiking neurons. The number of rows in the stimulation matrix A is equal to the number of trials M. The column vector y represents M measurements of membrane voltage obtained by an intra-cellular recording from the post-synaptic neuron: y = Ax. (1) In order to recover individual synaptic weights, Eq. (1) must be solved for x. RC (or STA) solution to this problem is x = (ATA)-1AT y, which minimizes (y-Ax)2 if M>N. In the case M << N for a sparse circuit. In this section we search computationally for the minimum number of trials required for exact reconstruction as a function of the number of non-zero synaptic weights K out of N potentially pre-synaptic neurons. First, note that the number of trials depends on the number of stimulated neurons F. If F = 1 we revert to the brute force approach and the number of measurements is N, while for F = N, the measurements are redundant and no finite number suffices. As the minimum number of measurements is expected to scale as K logN, there must be an optimal F which makes each measurement most informative about x. 
To determine the optimal number of stimulated neurons F for given K and N, we search for the minimum number of trials M, which allows a perfect reconstruction of the synaptic connectivity x. For each F, we generate 50 synaptic weight vectors and attempt reconstruction from sequentially increasing numbers of trials. The value of M, at which all 50 recoveries are successful (up to computer round-off error), estimates the number of trial needed for reconstruction with probability higher than 98%. By repeating this procedure 50 times for each F, we estimate the mean and standard deviation of M. We find that, for given N and K, the minimum number of trials, M, as a function of the number of stimulated neurons, F, has a shallow minimum. As K decreases, the minimum shifts towards larger F because more neurons should be stimulated simultaneously for sparser x. For the explored range of simulation parameters, the minimum is located close to 0.1N. Next, we set F = 0.1N and explore how the minimum number of measurements required for exact reconstruction depends on K and N. Results of the simulations following the recipe described above are shown in Figure 3a. As expected, when x is sparse, M grows approximately linearly with K (Figure 3b), and logarithmically with N (Figure 3c). N = 1000 180 25 160 20 140 120 15 100 10 80 5 150 250 400 650 1000 Number of potential connections (N) K = 30 220 200 (a) 200 220 Number of measurements (M) 30 Number of measurements (M) Number of actual connections (K) Number of necessary measurements (M) (b) 180 160 140 120 100 80 5 10 15 20 25 30 Number of actual connections (K) 210 (c) 200 190 180 170 160 150 140 130 120 2 10 10 3 Number of potential connections (N) Figure 3: a) Minimum number of measurements M required for reconstruction as a function of the number of actual synapses, K, and the number of potential synapses, N. b) For given N, we find M ~ K. c) For given K, we find M ~ logN (note semi-logarithmic scale in c). 4 R o b ust nes s o f re con st r uc t io n s t o noi se a n d v io la tio n o f si m pli fy in g a ss umpt io n s To make our simulation more realistic we now take into account three possible sources of noise: 1) In reality, post-synaptic voltage on a given synapse varies from trial to trial [4, 5, 46-52], an effect we call synaptic noise. Such noise detrimentally affects reconstructions because each row of A is multiplied by a different instantiation of vector x. 2) Stimulation of neurons may be imprecise exciting a slightly different subset of neurons than intended and/or firing intended neurons multiple times. We call this effect stimulation noise. Such noise detrimentally affects reconstructions because, in its presence, the actual measurement matrix A is different from the one used for recovery. 3) A synapse may fail to release neurotransmitter with some probability. Naturally, in the presence of noise, reconstructions cannot be exact. We quantify the 4 reconstruction x − xr l2 = error ∑ N i =1 by the normalized x − xr l2–error l2 / xl , where 2 ( xi − xri ) 2 . We plot normalized reconstruction error in brute force approach (M = N = 500 trials) as a function of noise, as well as CS and RC reconstruction errors (M = 200, 600 trials), Figure 4. 2 0.9 Normalized reconstruction error ||x-x|| /||x|| 1 r 2 For each noise source, the reconstruction error of the brute force approach can be achieved with 60% fewer trials by CS method for the above parameters (Figure 4). For the same number of trials, RC method performs worse. 
Naturally, the reconstruction error decreases with the number of trials. The reconstruction error is most sensitive to stimulation noise and least sensitive to synaptic noise. 1 (a) 1 (b) 0.9 0.8 (c) 0.9 0.8 0.8 0.7 0.7 0.6 0.6 0.5 0.5 0.4 0.4 0.4 0.3 0.3 0.3 0.2 0.2 0.2 0.1 0.1 0.1 0 0 0.7 RC: M=200 RC: M=600 Brute force method: M=500 CS: M=200 CS: M=600 0.6 0.5 0 0.05 0.1 0.15 Synaptic noise level 0.2 0.25 0 0.05 0.1 0.15 Stimulation noise level 0.2 0.25 0 0 0.05 0.1 0.15 0.2 0.25 Synaptic failure probability Figure 4: Impact of noise on the reconstruction quality for N = 500, K = 30, F = 50. a) Recovery error due to trial-to-trial variation in synaptic weight. The response y is calculated using the synaptic connectivity x perturbed by an additive Gaussian noise. The noise level is given by the coefficient of variation of synaptic weight. b) Recovery error due to stimulation noise. The matrix A used for recovery is obtained from the binary matrix used to calculate the measurement vector y by shifting, in each row, a fraction of ones specified by the noise level to random positions. c) Recovery error due to synaptic failures. The detrimental effect of the stimulation noise on the reconstruction can be eliminated by monitoring spiking activity of potentially pre-synaptic neurons. By using calcium imaging [18-22], voltage-sensitive dyes [23] or multi-electrode arrays [17, 26] one could record the actual stimulation matrix. Because most random matrices satisfy the reconstruction requirements [31, 34, 35], the actual stimulation matrix can be used for a successful recovery instead of the intended one. If neuronal activity can be monitored reliably, experiments can be done in a different mode altogether. Instead of stimulating designated neurons with high fidelity by using highly localized and intense light, one could stimulate all neurons with low probability. Random firing events can be detected and used in the recovery process. The light intensity can be tuned to stimulate the optimal number of neurons per trial. Next, we explore the sensitivity of the proposed reconstruction method to the violation of simplifying assumptions. First, whereas our simulation assumes that the actual number of connections, K, is known, in reality, connectivity sparseness is known a priori only approximately. Will this affect reconstruction results? In principle, CS does not require prior knowledge of K for reconstruction [31, 34, 35]. For the CoSaMP algorithm, however, it is important to provide value K larger than the actual value (Figure 5a). Then, the algorithm will find all the actual synaptic weights plus some extra non-zero weights, negligibly small when compared to actual ones. Thus, one can provide the algorithm with the value of K safely larger than the actual one and then threshold the reconstruction result according to the synaptic noise level. Second, whereas we assumed a linear summation of inputs [53], synaptic integration may be 2 non-linear [54]. We model non-linearity by setting y = yl + α yl , where yl represents linearly summed synaptic inputs. Results of simulations (Figure 5b) show that although nonlinearity can significantly degrade CS reconstruction quality, it still performs better than RC. 
5 (b) 0.45 0.4 Actual K = 30 0.35 0.3 0.25 0.2 0.15 0.1 0.05 0 10 20 30 40 50 60 Normalized reconstruction error ||x-x||2/||x|| r 2 Normalized reconstrcution error ||x-x||2/||x|| r 2 (a) 0.9 0.8 0.7 CS RC 0.6 0.5 0.4 0.3 0.2 0.1 0 -0.15-0.12-0.09-0.06-0.03 0 0.03 0.06 0.09 0.12 0.15 Relative strength of the non-linear term α × mean(y ) K fed to CoSaMP l Figure 5: Sensitivity of reconstruction error to the violation of simplifying assumptions for N = 500, K = 30, M = 200, F = 50. a) The quality of the reconstruction is not affected if the CoSaMP algorithm is fed with the value of K larger than actual. b) Reconstruction error computed in 100 realizations for each value of the quadratic term relative to the linear term. 5 Ma pp i ng sy na pti c inp ut s o nto a n e uro na l po pu la tio n Until now, we considered reconstruction of synaptic inputs onto one neuron using subthreshold measurements of its membrane potential. In this section, we apply CS to reconstructing synaptic connections onto a population of potentially post-synaptic neurons. Because in CS the choice of stimulated neurons is non-adaptive, by recording from all potentially post-synaptic neurons in response to one sequence of trials one can reconstruct a complete feedforward network (Figure 6). (a) x p(y=1) A Normalized reconstruction error ||x/||x||2−xr/||xr||2||2 y (b) (c) 100 (d) 500 700 900 1100 Number of spikes 1 STA CS 0.8 0.6 0.4 0.2 0 1000 0 300 3000 5000 7000 9000 Number of trials (M) Ax Figure 6: Mapping of a complete feedforward network. a) Each post-synaptic neuron (red) receives synapses from a sparse subset of potentially pre-synaptic neurons (blue). b) Linear algebra representation of the experimental protocol. c) Probability of firing as a function of synaptic current. d) Comparison of CS and STA reconstruction error using spike trains for N = 500, K = 30 and F = 50. Although attractive, such parallelization raises several issues. First, patching a large number of neurons is unrealistic and, therefore, monitoring membrane potential requires using different methods, such as calcium imaging [18-22], voltage sensitive dyes [23-25] or multielectrode arrays [17, 26]. As these methods can report reliably only spiking activity, the measurement is not analog but discrete. Depending on the strength of summed synaptic inputs compared to the firing threshold, the postsynaptic neuron may be silent, fire once or multiple times. As a result, the measured response y is quantized by the integer number of spikes. Such quantized measurements are less informative than analog measurements of the sub-threshold membrane potential. In the extreme case of only two quantization levels, spike or no spike, each measurement contains only 1 bit of information. Therefore, to achieve reasonable reconstruction quality using quantized measurements, a larger number of trials M>>N is required. We simulate circuit reconstruction from spike recordings in silico as follows. First, we draw synaptic weights from an experimentally motivated distribution. Second, we generate a 6 random stimulation matrix and calculate the product Ax. Third, we linear half-wave rectify this product and use the result as the instantaneous firing rate for the Poisson spike generator (Figure 6c). We used a rectifying threshold that results in 10% of spiking trials as typically observed in experiments. Fourth, we reconstruct synaptic weights using STA and CS and compare the results with the generated weights. 
We calculated mean error over 100 realizations of the simulation protocol (Figure 6d). Due to the non-linear spike generating procedure, x can be recovered only up to a scaling factor. We propose to calibrate x with a few brute-force measurements of synaptic weights. Thus, in calculating the reconstruction error using l2 norm, we normalize both the generated and recovered synaptic weights. Such definition is equivalent to the angular error, which is often used to evaluate the performance of STA in mapping receptive field [37, 55]. Why is CS superior to STA for a given number of trials (Figure 6d)? Note that spikeless trials, which typically constitute a majority, also carry information about connectivity. While STA discards these trials, CS takes them into account. In particular, CoSaMP starts with the STA solution as zeroth iteration and improves on it by using the results of all trials and the sparseness prior. 6 D i s c uss ion We have demonstrated that sparse feedforward networks can be reconstructed by stimulating multiple potentially pre-synaptic neurons simultaneously and monitoring either subthreshold or spiking response of potentially post-synaptic neurons. When sub-threshold voltage is recorded, significantly fewer measurements are required than in the brute force approach. Although our method is sensitive to noise (with stimulation noise worse than synapse noise), it is no less robust than the brute force approach or RC. The proposed reconstruction method can also recover inputs onto a neuron from spike counts, albeit with more trials than from sub-threshold potential measurements. This is particularly useful when intra-cellular recordings are not feasible and only spiking can be detected reliably, for example, when mapping synaptic inputs onto multiple neurons in parallel. For a given number of trials, our method yields smaller error than STA. The proposed reconstruction method assumes linear summation of synaptic inputs (both excitatory and inhibitory) and is sensitive to non-linearity of synaptic integration. Therefore, it is most useful for studying connections onto neurons, in which synaptic integration is close to linear. On the other hand, multi-neuron stimulation is closer than single-neuron stimulation to the intrinsic activity in the live brain and can be used to study synaptic integration under realistic conditions. In contrast to circuit reconstruction using intrinsic neuronal activity [56, 57], our method relies on extrinsic stimulation of neurons. Can our method use intrinsic neuronal activity instead? We see two major drawbacks of such approach. First, activity of non-monitored presynaptic neurons may significantly distort reconstruction results. Thus, successful reconstruction would require monitoring all active pre-synaptic neurons, which is rather challenging. Second, reliable reconstruction is possible only when the activity of presynaptic neurons is uncorrelated. Yet, their activity may be correlated, for example, due to common input. We thank Ashok Veeraraghavan for introducing us to CS, Anthony Leonardo for making a retina dataset available for the analysis, Lou Scheffer and Hong Young Noh for commenting on the manuscript and anonymous reviewers for helpful suggestions. References [1] Luo, L., Callaway, E.M. & Svoboda, K. (2008) Genetic dissection of neural circuits. Neuron 57(5):634-660. [2] Helmstaedter, M., Briggman, K.L. & Denk, W. (2008) 3D structural imaging of the brain with photons and electrons. Current opinion in neurobiology 18(6):633-641. 
[3] Holmgren, C., Harkany, T., Svennenfors, B. & Zilberter, Y. (2003) Pyramidal cell communication within local networks in layer 2/3 of rat neocortex. Journal of Physiology 551:139-153.
[4] Markram, H. (1997) A network of tufted layer 5 pyramidal neurons. Cerebral Cortex 7(6):523-533.
[5] Markram, H., Lubke, J., Frotscher, M., Roth, A. & Sakmann, B. (1997) Physiology and anatomy of synaptic connections between thick tufted pyramidal neurones in the developing rat neocortex. Journal of Physiology 500(2):409-440.
[6] Thomson, A.M. & Bannister, A.P. (2003) Interlaminar connections in the neocortex. Cerebral Cortex 13(1):5-14.
[7] Thomson, A.M., West, D.C., Wang, Y. & Bannister, A.P. (2002) Synaptic connections and small circuits involving excitatory and inhibitory neurons in layers 2-5 of adult rat and cat neocortex: triple intracellular recordings and biocytin labelling in vitro. Cerebral Cortex 12(9):936-953.
[8] Song, S., Sjostrom, P.J., Reigl, M., Nelson, S. & Chklovskii, D.B. (2005) Highly nonrandom features of synaptic connectivity in local cortical circuits. PLoS Biology 3(3):e68.
[9] Callaway, E.M. & Katz, L.C. (1993) Photostimulation using caged glutamate reveals functional circuitry in living brain slices. Proceedings of the National Academy of Sciences of the United States of America 90(16):7661-7665.
[10] Dantzker, J.L. & Callaway, E.M. (2000) Laminar sources of synaptic input to cortical inhibitory interneurons and pyramidal neurons. Nature Neuroscience 3(7):701-707.
[11] Shepherd, G.M. & Svoboda, K. (2005) Laminar and columnar organization of ascending excitatory projections to layer 2/3 pyramidal neurons in rat barrel cortex. Journal of Neuroscience 25(24):5670-5679.
[12] Nikolenko, V., Poskanzer, K.E. & Yuste, R. (2007) Two-photon photostimulation and imaging of neural circuits. Nature Methods 4(11):943-950.
[13] Shoham, S., O'Connor, D.H., Sarkisov, D.V. & Wang, S.S. (2005) Rapid neurotransmitter uncaging in spatially defined patterns. Nature Methods 2(11):837-843.
[14] Gradinaru, V., Thompson, K.R., Zhang, F., Mogri, M., Kay, K., Schneider, M.B. & Deisseroth, K. (2007) Targeting and readout strategies for fast optical neural control in vitro and in vivo. Journal of Neuroscience 27(52):14231-14238.
[15] Petreanu, L., Huber, D., Sobczyk, A. & Svoboda, K. (2007) Channelrhodopsin-2-assisted circuit mapping of long-range callosal projections. Nature Neuroscience 10(5):663-668.
[16] Na, L., Watson, B.O., MacLean, J.N., Yuste, R. & Shepard, K.L. (2008) A 256×256 CMOS microelectrode array for extracellular neural stimulation of acute brain slices. Solid-State Circuits Conference, 2008. ISSCC 2008. Digest of Technical Papers. IEEE International.
[17] Fujisawa, S., Amarasingham, A., Harrison, M.T. & Buzsaki, G. (2008) Behavior-dependent short-term assembly dynamics in the medial prefrontal cortex. Nature Neuroscience 11(7):823-833.
[18] Ikegaya, Y., Aaron, G., Cossart, R., Aronov, D., Lampl, I., Ferster, D. & Yuste, R. (2004) Synfire chains and cortical songs: temporal modules of cortical activity. Science 304(5670):559-564.
[19] Ohki, K., Chung, S., Ch'ng, Y.H., Kara, P. & Reid, R.C. (2005) Functional imaging with cellular resolution reveals precise micro-architecture in visual cortex. Nature 433(7026):597-603.
[20] Stosiek, C., Garaschuk, O., Holthoff, K. & Konnerth, A. (2003) In vivo two-photon calcium imaging of neuronal networks. Proceedings of the National Academy of Sciences of the United States of America 100(12):7319-7324.
[21] Svoboda, K., Denk, W., Kleinfeld, D. & Tank, D.W. (1997) In vivo dendritic calcium dynamics in neocortical pyramidal neurons. Nature 385(6612):161-165.
[22] Sasaki, T., Minamisawa, G., Takahashi, N., Matsuki, N. & Ikegaya, Y. (2009) Reverse optical trawling for synaptic connections in situ. Journal of Neurophysiology 102(1):636-643.
[23] Zecevic, D., Djurisic, M., Cohen, L.B., Antic, S., Wachowiak, M., Falk, C.X. & Zochowski, M.R. (2003) Imaging nervous system activity with voltage-sensitive dyes. Current Protocols in Neuroscience Chapter 6:Unit 6.17.
[24] Cacciatore, T.W., Brodfuehrer, P.D., Gonzalez, J.E., Jiang, T., Adams, S.R., Tsien, R.Y., Kristan, W.B., Jr. & Kleinfeld, D. (1999) Identification of neural circuits by imaging coherent electrical activity with FRET-based dyes. Neuron 23(3):449-459.
[25] Taylor, A.L., Cottrell, G.W., Kleinfeld, D. & Kristan, W.B., Jr. (2003) Imaging reveals synaptic targets of a swim-terminating neuron in the leech CNS. Journal of Neuroscience 23(36):11402-11410.
[26] Hutzler, M., Lambacher, A., Eversmann, B., Jenkner, M., Thewes, R. & Fromherz, P. (2006) High-resolution multitransistor array recording of electrical field potentials in cultured brain slices. Journal of Neurophysiology 96(3):1638-1645.
[27] Egger, V., Feldmeyer, D. & Sakmann, B. (1999) Coincidence detection and changes of synaptic efficacy in spiny stellate neurons in rat barrel cortex. Nature Neuroscience 2(12):1098-1105.
[28] Feldmeyer, D., Egger, V., Lubke, J. & Sakmann, B. (1999) Reliable synaptic connections between pairs of excitatory layer 4 neurones within a single 'barrel' of developing rat somatosensory cortex. Journal of Physiology 521:169-190.
[29] Peterlin, Z.A., Kozloski, J., Mao, B.Q., Tsiola, A. & Yuste, R. (2000) Optical probing of neuronal circuits with calcium indicators. Proceedings of the National Academy of Sciences of the United States of America 97(7):3619-3624.
[30] Thomson, A.M., Deuchars, J. & West, D.C. (1993) Large, deep layer pyramid-pyramid single axon EPSPs in slices of rat motor cortex display paired pulse and frequency-dependent depression, mediated presynaptically and self-facilitation, mediated postsynaptically. Journal of Neurophysiology 70(6):2354-2369.
[31] Baraniuk, R.G. (2007) Compressive sensing. IEEE Signal Processing Magazine 24(4):118-120.
[32] Candes, E.J. (2008) Compressed sensing. Twenty-Second Annual Conference on Neural Information Processing Systems, Tutorials.
[33] Candes, E.J., Romberg, J.K. & Tao, T. (2006) Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics 59(8):1207-1223.
[34] Candes, E.J. & Tao, T. (2006) Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Transactions on Information Theory 52(12):5406-5425.
[35] Donoho, D.L. (2006) Compressed sensing. IEEE Transactions on Information Theory 52(4):1289-1306.
[36] Ringach, D. & Shapley, R. (2004) Reverse correlation in neurophysiology. Cognitive Science 28:147-166.
[37] Schwartz, O., Pillow, J.W., Rust, N.C. & Simoncelli, E.P. (2006) Spike-triggered neural characterization. Journal of Vision 6(4):484-507.
[38] Candes, E.J. & Tao, T. (2005) Decoding by linear programming. IEEE Transactions on Information Theory 51(12):4203-4215.
[39] Needell, D. & Vershynin, R. (2009) Uniform uncertainty principle and signal recovery via regularized orthogonal matching pursuit. Foundations of Computational Mathematics 9(3):317-334.
[40] Tropp, J.A. & Gilbert, A.C. (2007) Signal recovery from random measurements via orthogonal matching pursuit. IEEE Transactions on Information Theory 53(12):4655-4666.
[41] Dai, W. & Milenkovic, O. (2009) Subspace pursuit for compressive sensing signal reconstruction. IEEE Transactions on Information Theory 55(5):2230-2249.
[42] Needell, D. & Tropp, J.A. (2009) CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Applied and Computational Harmonic Analysis 26(3):301-321.
[43] Varshney, L.R., Sjostrom, P.J. & Chklovskii, D.B. (2006) Optimal information storage in noisy synapses under resource constraints. Neuron 52(3):409-423.
[44] Brunel, N., Hakim, V., Isope, P., Nadal, J.P. & Barbour, B. (2004) Optimal information storage and the distribution of synaptic weights: perceptron versus Purkinje cell. Neuron 43(5):745-757.
[45] Napoletani, D. & Sauer, T.D. (2008) Reconstructing the topology of sparsely connected dynamical networks. Physical Review E 77(2):026103.
[46] Allen, C. & Stevens, C.F. (1994) An evaluation of causes for unreliability of synaptic transmission. Proceedings of the National Academy of Sciences of the United States of America 91(22):10380-10383.
[47] Hessler, N.A., Shirke, A.M. & Malinow, R. (1993) The probability of transmitter release at a mammalian central synapse. Nature 366(6455):569-572.
[48] Isope, P. & Barbour, B. (2002) Properties of unitary granule cell→Purkinje cell synapses in adult rat cerebellar slices. Journal of Neuroscience 22(22):9668-9678.
[49] Mason, A., Nicoll, A. & Stratford, K. (1991) Synaptic transmission between individual pyramidal neurons of the rat visual cortex in vitro. Journal of Neuroscience 11(1):72-84.
[50] Raastad, M., Storm, J.F. & Andersen, P. (1992) Putative single quantum and single fibre excitatory postsynaptic currents show similar amplitude range and variability in rat hippocampal slices. European Journal of Neuroscience 4(1):113-117.
[51] Rosenmund, C., Clements, J.D. & Westbrook, G.L. (1993) Nonuniform probability of glutamate release at a hippocampal synapse. Science 262(5134):754-757.
[52] Sayer, R.J., Friedlander, M.J. & Redman, S.J. (1990) The time course and amplitude of EPSPs evoked at synapses between pairs of CA3/CA1 neurons in the hippocampal slice. Journal of Neuroscience 10(3):826-836.
[53] Cash, S. & Yuste, R. (1999) Linear summation of excitatory inputs by CA1 pyramidal neurons. Neuron 22(2):383-394.
[54] Polsky, A., Mel, B.W. & Schiller, J. (2004) Computational subunits in thin dendrites of pyramidal cells. Nature Neuroscience 7(6):621-627.
[55] Paninski, L. (2003) Convergence properties of three spike-triggered analysis techniques. Network: Computation in Neural Systems 14(3):437-464.
[56] Okatan, M., Wilson, M.A. & Brown, E.N. (2005) Analyzing functional connectivity using a network likelihood model of ensemble neural spiking activity. Neural Computation 17(9):1927-1961.
[57] Timme, M. (2007) Revealing network connectivity from response dynamics. Physical Review Letters 98(22):224101.
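As promised before the Discussion, here is a minimal sketch of the scale-invariant reconstruction error used in Figure 6d. This is our own illustration, not the authors' code; the function and variable names (angular_error, x_true, x_rec, simulate, reconstruct) are invented for the example.

```python
import numpy as np

def angular_error(x_true, x_rec):
    """Scale-invariant reconstruction error: normalize both the generated
    and the recovered synaptic weight vectors to unit l2 norm, then take
    their l2 distance.  For unit vectors, ||u - v||^2 = 2 - 2*cos(theta),
    so this is a monotone function of the angle theta between the vectors,
    i.e. equivalent to the angular error used to evaluate STA [37, 55]."""
    u = x_true / np.linalg.norm(x_true)
    v = x_rec / np.linalg.norm(x_rec)
    return np.linalg.norm(u - v)

# Mean error over 100 realizations, with hypothetical simulate() and
# reconstruct() helpers standing in for the simulation protocol:
# errors = []
# for _ in range(100):
#     x, trials = simulate()
#     errors.append(angular_error(x, reconstruct(trials)))
# print(np.mean(errors))
```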

same-paper 2 0.86221546 140 nips-2009-Linearly constrained Bayesian matrix factorization for blind source separation

Author: Mikkel Schmidt

Abstract: We present a general Bayesian approach to probabilistic matrix factorization subject to linear constraints. The approach is based on a Gaussian observation model and Gaussian priors with bilinear equality and inequality constraints. We present an efficient Markov chain Monte Carlo inference procedure based on Gibbs sampling. Special cases of the proposed model are Bayesian formulations of nonnegative matrix factorization and factor analysis. The method is evaluated on a blind source separation problem. We demonstrate that our algorithm can be used to extract meaningful and interpretable features that are remarkably different from features extracted using existing related matrix factorization techniques.
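To make the Gibbs construction concrete, here is a toy sampler for the nonnegative special case (all elements of A and B constrained to be nonnegative). It deliberately simplifies the model: the noise variance is fixed and the priors are flat on the nonnegative orthant, so every conditional reduces to a truncated normal; the full method also handles Gaussian priors and general bilinear equality and inequality constraints. All names are ours, and this is a sketch under those assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.stats import truncnorm

def gibbs_nmf(X, K, sigma2=0.1, n_iter=500, seed=0):
    """Toy Gibbs sampler for X ~ N(A @ B, sigma2 * I) with A, B >= 0.
    Flat priors truncated at zero make each conditional a truncated
    normal, which can be sampled exactly."""
    rng = np.random.default_rng(seed)
    I, J = X.shape
    A = rng.random((I, K))
    B = rng.random((K, J))
    for _ in range(n_iter):
        for k in range(K):
            # Residual of X with component k taken out.
            R = X - A @ B + np.outer(A[:, k], B[k])
            # Conditional for column k of A: with B fixed, the a_ik are
            # independent truncated normals across rows i.
            s2 = B[k] @ B[k]
            if s2 > 0:
                mean = (R @ B[k]) / s2
                std = np.sqrt(sigma2 / s2)
                A[:, k] = truncnorm.rvs((0 - mean) / std, np.inf,
                                        loc=mean, scale=std, random_state=rng)
            # Symmetric conditional for row k of B.
            R = X - A @ B + np.outer(A[:, k], B[k])
            s2 = A[:, k] @ A[:, k]
            if s2 > 0:
                mean = (A[:, k] @ R) / s2
                std = np.sqrt(sigma2 / s2)
                B[k] = truncnorm.rvs((0 - mean) / std, np.inf,
                                     loc=mean, scale=std, random_state=rng)
    return A, B
```

With K set to the number of sources, averaging the retained samples of A estimates the source matrix; a serious implementation would also sample the noise variance and use the proper constrained Gaussian priors.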

3 0.79286838 75 nips-2009-Efficient Large-Scale Distributed Training of Conditional Maximum Entropy Models

Author: Ryan Mcdonald, Mehryar Mohri, Nathan Silberman, Dan Walker, Gideon S. Mann

Abstract: Training conditional maximum entropy models on massive data sets requires significant computational resources. We examine three common distributed training methods for conditional maxent: a distributed gradient computation method, a majority vote method, and a mixture weight method. We analyze and compare the CPU and network time complexity of each of these methods and present a theoretical analysis of conditional maxent models, including a study of the convergence of the mixture weight method, the most resource-efficient technique. We also report the results of large-scale experiments comparing these three methods which demonstrate the benefits of the mixture weight method: this method consumes less resources, while achieving a performance comparable to that of standard approaches.
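As a concrete illustration of the mixture weight idea (our sketch, not the authors' code): each worker trains a conditional maxent model on its own shard, and the only communication is a single averaging step over the learned parameters. Binary logistic regression stands in for conditional maxent here, and the mixture weights are uniform.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def mixture_weight_train(shards, C=1.0):
    """Fit one maxent (logistic) model per data shard in isolation, then
    return the uniform average of the learned parameters.  Only one weight
    vector per shard crosses the network, which is what makes this the
    most resource-efficient of the three methods compared."""
    coefs, intercepts = [], []
    for X, y in shards:                      # each shard: (features, labels)
        clf = LogisticRegression(C=C, max_iter=1000).fit(X, y)
        coefs.append(clf.coef_)
        intercepts.append(clf.intercept_)
    return np.mean(coefs, axis=0), np.mean(intercepts, axis=0)

def predict(w, b, X):
    """Predict labels with the averaged ('mixture weight') model."""
    return (X @ w.ravel() + b > 0).astype(int)
```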

4 0.74270892 141 nips-2009-Local Rules for Global MAP: When Do They Work ?

Author: Kyomin Jung, Pushmeet Kohli, Devavrat Shah

Abstract: We consider the question of computing the Maximum A Posteriori (MAP) assignment in an arbitrary pairwise Markov Random Field (MRF). We present a randomized iterative algorithm based on simple local updates. Starting from an arbitrary initial assignment, the algorithm updates it in each iteration by first picking a random node, then selecting an (appropriately chosen) random local neighborhood, and optimizing over that neighborhood. Somewhat surprisingly, we show that this algorithm finds a near-optimal assignment within n log^2 n iterations with high probability for any n-node pairwise MRF with geometry (i.e., an MRF whose graph has polynomial growth), with the approximation error depending, in a reasonable manner, on the geometric growth rate of the graph and the average radius of the local neighborhood; this allows a graceful tradeoff between the complexity of the algorithm and the approximation error. Through extensive simulations, we show that our algorithm finds extremely good approximate solutions for various kinds of MRFs with geometry.
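A sketch of such a local-update procedure for a pairwise MRF with discrete labels (our illustration under simplifying assumptions, not the paper's code): each iteration re-optimizes, by exhaustive search, the labels inside a small ball around a randomly chosen node while everything outside the ball is held fixed. The brute-force scoring keeps the sketch short and is only practical for small label sets and radii.

```python
import itertools
import random
import numpy as np

def local_map(node_pot, edges, edge_pot, radius=1, n_iter=1000, seed=0):
    """Randomized local updates for MAP in a pairwise MRF.  node_pot is an
    (n, q) array of node potentials, edges a list of (u, v) pairs, and
    edge_pot a dict mapping each (u, v) edge to a (q, q) array."""
    rng = random.Random(seed)
    n, q = node_pot.shape
    nbrs = {v: [] for v in range(n)}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    x = [rng.randrange(q) for _ in range(n)]   # arbitrary initial assignment

    def ball(v):
        seen, frontier = {v}, [v]
        for _ in range(radius):
            nxt = {w for u in frontier for w in nbrs[u]} - seen
            seen |= nxt
            frontier = list(nxt)
        return sorted(seen)

    def score():
        s = sum(node_pot[v, x[v]] for v in range(n))
        return s + sum(edge_pot[u, v][x[u], x[v]] for u, v in edges)

    for _ in range(n_iter):
        B = ball(rng.randrange(n))
        best_labels, best_s = None, -np.inf
        # Exhaustively re-optimize the ball, conditioned on its boundary.
        for labels in itertools.product(range(q), repeat=len(B)):
            for v, l in zip(B, labels):
                x[v] = l
            s = score()                        # brute force, for clarity only
            if s > best_s:
                best_labels, best_s = labels, s
        for v, l in zip(B, best_labels):
            x[v] = l
    return x
```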

5 0.62296867 99 nips-2009-Functional network reorganization in motor cortex can be explained by reward-modulated Hebbian learning

Author: Steven Chase, Andrew Schwartz, Wolfgang Maass, Robert A. Legenstein

Abstract: The control of neuroprosthetic devices from the activity of motor cortex neurons benefits from learning effects where the function of these neurons is adapted to the control task. It was recently shown that tuning properties of neurons in monkey motor cortex are adapted selectively in order to compensate for an erroneous interpretation of their activity. In particular, it was shown that the tuning curves of those neurons whose preferred directions had been misinterpreted changed more than those of other neurons. In this article, we show that the experimentally observed self-tuning properties of the system can be explained on the basis of a simple learning rule. This learning rule utilizes neuronal noise for exploration and performs Hebbian weight updates that are modulated by a global reward signal. In contrast to most previously proposed reward-modulated Hebbian learning rules, this rule does not require extraneous knowledge about what is noise and what is signal. The learning rule is able to optimize the performance of the model system within biologically realistic periods of time and under high noise levels. When the neuronal noise is fitted to experimental data, the model produces learning effects similar to those found in monkey experiments.
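The rule itself fits in a few lines. The toy sketch below (ours, with invented names and a single scalar "neuron", not the paper's model) correlates reward fluctuations with output fluctuations around running averages; because those averages are estimated online, the update never needs to be told which part of the output is noise. It assumes the input changes slowly relative to the exploratory noise.

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([0.8, -0.3, 0.5])      # hypothetical target readout
w = np.zeros(3)                          # learned weights
lr, tau = 5e-3, 0.8                      # learning rate, averaging factor

for trial in range(3000):
    x = rng.standard_normal(3)           # input held fixed within a trial
    y_avg = R_avg = None
    for step in range(20):               # noise varies faster than the input
        y = w @ x + 0.1 * rng.standard_normal()   # noisy output: exploration
        R = -(y - w_true @ x) ** 2       # reward: negative squared error
        if y_avg is not None:
            # Reward-modulated Hebbian step: correlate reward and output
            # fluctuations; no explicit access to the noise is required.
            w = w + lr * (R - R_avg) * (y - y_avg) * x
        # Running averages stand in for the rule's low-pass filters.
        y_avg = y if y_avg is None else tau * y_avg + (1 - tau) * y
        R_avg = R if R_avg is None else tau * R_avg + (1 - tau) * R

print(np.round(w, 2))                    # approaches w_true
```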

6 0.5964238 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling

7 0.57621241 121 nips-2009-Know Thy Neighbour: A Normative Theory of Synaptic Depression

8 0.56848639 210 nips-2009-STDP enables spiking neurons to detect hidden causes of their inputs

9 0.5529294 17 nips-2009-A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds

10 0.54438812 13 nips-2009-A Neural Implementation of the Kalman Filter

11 0.539823 19 nips-2009-A joint maximum-entropy model for binary neural population patterns and continuous signals

12 0.51983994 41 nips-2009-Bayesian Source Localization with the Multivariate Laplace Prior

13 0.50914085 62 nips-2009-Correlation Coefficients are Insufficient for Analyzing Spike Count Dependencies

14 0.50326467 70 nips-2009-Discriminative Network Models of Schizophrenia

15 0.50128335 228 nips-2009-Speeding up Magnetic Resonance Image Acquisition by Bayesian Multi-Slice Adaptive Compressed Sensing

16 0.49528605 168 nips-2009-Non-stationary continuous dynamic Bayesian networks

17 0.48478654 254 nips-2009-Variational Gaussian-process factor analysis for modeling spatio-temporal data

18 0.48471141 145 nips-2009-Manifold Embeddings for Model-Based Reinforcement Learning under Partial Observability

19 0.48467433 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization

20 0.47626993 38 nips-2009-Augmenting Feature-driven fMRI Analyses: Semi-supervised learning and resting state activity