nips nips2003 nips2003-193 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Manfred Opper, Ole Winther
Abstract: A general linear response method for deriving improved estimates of correlations in the variational Bayes framework is presented. Three applications are given and it is discussed how to use linear response as a general principle for improving mean field approximations.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract A general linear response method for deriving improved estimates of correlations in the variational Bayes framework is presented. [sent-6, score-0.774]
2 Three applications are given and it is discussed how to use linear response as a general principle for improving mean field approximations. [sent-7, score-0.287]
3 The maturity of the field has recently been underpinned by the appearance of the variational Bayes method [2, 3, 4] and associated software that makes it possible, through a window-based interface, to define and perform inference for a diverse range of graphical models [5, 6]. [sent-11, score-0.446]
4 The most important is that it is based upon the variational assumption of independent variables. [sent-13, score-0.437]
5 However, if this is not the case, the variational method can grossly underestimate the width of marginal distributions because variance contributions induced by other variables are ignored as a consequence of the assumed independence. [sent-15, score-0.659]
6 Secondly, the variational approximation may be non-convex which is indicated by the occurrence of multiple solutions for the variational distribution. [sent-16, score-0.86]
7 Linear response (LR) is a perturbation technique that gives an improved estimate of the correlations between the stochastic variables by expanding around the solution to the variational distribution [8]. [sent-18, score-0.724]
8 This means that we can get non-trivial estimates of correlations from the factorizing variational distribution. [sent-19, score-0.546]
9 In this paper, variational calculus is used to derive a general linear response correction from the variational distribution. [sent-23, score-0.704]
10 It is demonstrated that the variational LR correction can be calculated as systematically as the variational distribution in the Variational Bayes framework (albeit at a somewhat higher computational cost). [sent-24, score-0.888]
11 Three applications are given: a model with quadratic interactions, a Bayesian model for estimation of the mean and variance of a 1D Gaussian, and a Variational Bayes mixture of multinomials (i.e. [sent-25, score-0.211]
12 For the two analytically tractable models (the Gaussian and example two above), it is shown that LR gives the correct analytical result where the variational method does not. [sent-28, score-0.465]
13 [5] and references therein, that is performing exact inference for solvable subgraphs, might thus be eliminated by the use of linear response. [sent-31, score-0.144]
14 The objectives of a Bayesian analysis are typically the following: to derive the marginal likelihood p(y|M) = ∫ ds p(s, y|M) and marginal distributions, e.g. [sent-33, score-0.44]
15 the one-variable p_i(s_i|y) = (1/p(y)) ∫ ∏_{k≠i} ds_k p(s, y) and the two-variable p_ij(s_i, s_j|y) = (1/p(y)) ∫ ∏_{k≠i,j} ds_k p(s, y). [sent-35, score-0.555]
16 In this paper, we will only discuss how to derive linear response approximations to marginal distributions. [sent-36, score-0.468]
17 Linear response corrected marginal likelihoods can also be calculated, see Ref. [sent-37, score-0.313]
18 The paper is organized as follows: in section 2 we discuss how to use the marginal likelihood as a generating function for deriving marginal distributions. [sent-39, score-0.4]
19 In section 4 we use this result to derive the linear response approximation to the two-variable marginals and derive an explicit solution of these equations in section 5. [sent-40, score-0.428]
20 In section 6 we discuss why, in cases where the variational method gives a reasonable solution, LR will give an even better result. [sent-41, score-0.464]
21 In section 7, we give the three applications, and in section 8 we conclude and discuss how to combine the mean field approximation (variational, Bethe, . . . [sent-42, score-0.127]
22 ) with linear response to give more precise mean field approaches. [sent-45, score-0.264]
23 (8) and furthermore extend linear response to the Bethe approximation, give several general results for the properties of linear response estimates and derive belief propagation algorithms for computing the linear response estimates. [sent-47, score-0.723]
24 2 Generating Marginal Distributions In this section it is shown how exact marginal distributions can be obtained from functional derivatives of a generating function (the log partition function). [sent-50, score-0.274]
25 In the derivation of the variational linear response approximation to the two-variable marginal distribution p_ij(s_i, s_j|y), we can use this result by replacing the exact marginal distribution with the variational approximation. [sent-51, score-1.826]
26 To get marginal distributions we introduce a generating function Z[a] = ∫ ds p(s, y) e^{Σ_i a_i(s_i)} (1), which is a functional of the arbitrary functions a_i(s_i), and a is shorthand for the vector of functions a = (a_1(s_1), a_2(s_2), . . . [sent-52, score-0.482]
27 We can now obtain the marginal distribution p(s_i|y, a) by taking the functional derivative1 with respect to a_i(s_i): δ ln Z[a]/δa_i(s_i) = (e^{a_i(s_i)}/Z[a]) ∫ ∏_{k≠i} dŝ_k e^{a_k(ŝ_k)} p(ŝ, y) = p_i(s_i|y, a). [sent-56, score-0.671]
28 The functional derivative is defined by δa_i(s_i)/δa_j(s_j) = δ_ij δ(s_i − s_j) and the chain rule. [sent-57, score-0.324]
29 This will give us a function that is closely related to the two-variable marginal distribution. [sent-60, score-0.147]
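The equation referred to below as eq. (3) is not preserved in this extraction; presumably it is the standard cumulant identity, sketched here as my own reconstruction:
\[
\frac{\delta^2 \ln Z[a]}{\delta a_i(s_i)\,\delta a_j(s_j)}
= \frac{\delta p_i(s_i|y,a)}{\delta a_j(s_j)}
= p_{ij}(s_i,s_j|y,a) - p_i(s_i|y,a)\,p_j(s_j|y,a) \quad (i \neq j),
\]
and for i = j it equals δ(s_i − s_j) p_i(s_i|y, a) − p_i(s_i|y, a) p_i(s_j|y, a); this "mean subtracted" two-variable marginal is what the variational linear response theorem of section 4 approximates.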
30 In the next two sections, variational approximations to the single-variable and two-variable marginals are derived. [sent-62, score-0.536]
31 A prominent method is the variational one, where a simpler factorized distribution q(s) = ∏_i q_i(s_i) is used instead of the posterior distribution. [sent-67, score-0.235]
32 Approximations to the marginal distributions pi (si |y) and pij (si , sj |y) are now simply qi (si ) and qi (si )qj (sj ). [sent-68, score-0.961]
33 The purpose of this paper is to show that it is possible within the variational framework to go beyond the factorized distribution for two-variable marginals. [sent-69, score-0.46]
34 For this purpose we need the distribution q(s) which minimizes the KL-divergence or 'distance' between q(s) and p(s|y): KL(q(s)||p(s|y)) = ∫ ds q(s) ln [q(s)/p(s|y)]. (4) [sent-70, score-0.403]
35 The variational approximation to the likelihood is obtained from − ln Z_v[a] = ∫ ds q(s) ln [q(s)/(p(s, y) e^{Σ_k a_k(s_k)})] = − ln Z[a] + KL(q(s)||p(s|y, a)), where a has been introduced to be able to use q_i(s_i|a) as a generating function. [sent-71, score-1.653]
36 Introducing Lagrange multipliers {λ_i} to enforce normalization and minimizing KL + Σ_i λ_i (∫ ds_i q_i(s_i) − 1) with respect to q_i(s_i) and λ_i, one finds q_i(s_i|a) = exp(a_i(s_i) + ∫ ∏_{k≠i} {ds_k q_k(s_k|a)} ln p(s, y)) / ∫ dŝ_i exp(a_i(ŝ_i) + ∫ ∏_{k≠i} {dŝ_k q_k(ŝ_k|a)} ln p(ŝ, y)). (5) [sent-72, score-1.441]
37 Note that q_i(s_i|a) depends upon all a through the implicit dependence in the q_k's appearing on the right hand side. [sent-73, score-0.234]
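As a quick sanity check (my own note, not part of the extracted text), setting a = 0 in eq. (5) recovers the familiar mean-field fixed point,
\[
q_i(s_i) \;\propto\; \exp\Big(\big\langle \ln p(s,y)\big\rangle_{\prod_{k\neq i} q_k}\Big),
\]
i.e. the standard variational Bayes coordinate update; the a-dependence is only needed because q_i(s_i|a) is used as a generating function below.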
38 as a factor graph p(s, y) = ∏_i ψ_i(s_i) ∏_{i>j} ψ_{i,j}(s_i, s_j), (6) [sent-76, score-0.302]
39 it is easy to see that potentials that do not depend upon s_i will drop out of the variational distribution. [sent-79, score-0.947]
40 A similar property will be used below to simplify the variational two-variable marginals. [sent-80, score-0.414]
41 (3) shows that we can obtain the two-variable marginal as the derivative of the marginal distribution. [sent-82, score-0.294]
42 To get the variational linear response approximation we exchange the exact marginal with the variational approximation eq. [sent-83, score-1.33]
43 In section 6 an argument is given for why one can expect the variational approach to work in many cases and why the linear response approximation gives improved estimates of correlations in these cases. [sent-86, score-0.785]
44 Defining the variational 'mean subtracted' two-variable marginal as C_ij(s_i, s_j|a) ≡ δq_i(s_i|a)/δa_j(s_j), (7) it is now possible to derive an expression corresponding to eq. [sent-87, score-0.896]
45 What makes the derivation a bit cumbersome is that it is necessary to take into account the implicit dependence on a_j(s_j) in q_k(s_k|a), and the result will consequently be expressed as a set of linear integral equations in C_ij(s_i, s_j|a). [sent-89, score-0.61]
46 Taking into account both the explicit and implicit a dependence we get the variational linear response theorem: C_ij(s_i, s_j|a) = δ_ij δ(s_i − s_j) q_i(s_i|a) − q_i(s_i|a) q_j(s_j|a) + q_i(s_i|a) Σ_{l≠i} ∫ ∏_{k≠i} ds_k {∏_{k≠i,l} q_k(s_k|a)} C_lj(s_l, s_j|a) × [ln p(s, y) − ∫ ds_i q_i(s_i|a) ln p(s, y)]. (8) [sent-91, score-2.774]
47 The first term represents the normal variational correlation estimate and the second term is the linear response correction, which expresses the coupling between the two-variable marginals. [sent-92, score-0.712]
48 (6), it is easily seen that potentials that do not depend on both si and sl will drop out in the last term. [sent-94, score-0.54]
49 This property will make the calculations for most variational Bayes models quite simple, since it means that one only has to sum over variables that are directly connected in the graphical model. [sent-95, score-0.449]
50 5 Explicit Solution The integral equation can be simplified by introducing the symmetric kernel K_ij(s, s') = (1 − δ_ij) [⟨ln p(s, y)⟩_{\(i,j)} − ⟨ln p(s, y)⟩_{\j} − ⟨ln p(s, y)⟩_{\i} + ⟨ln p(s, y)⟩], where the brackets [sent-96, score-1.232]
51 ⟨·⟩_{q\(i,j)} denote expectations over q for all variables except s_i and s_j, and similarly for the other brackets. [sent-102, score-0.781]
52 One can easily show that ∫ ds q_i(s) K_ij(s, s') = 0. [sent-106, score-0.269]
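This follows in one line from the definitions above (my own note): averaging over q_i completes the corresponding expectations,
\[
\int ds\, q_i(s)\,\langle \ln p(s,y)\rangle_{\setminus(i,j)} = \langle \ln p(s,y)\rangle_{\setminus j},
\qquad
\int ds\, q_i(s)\,\langle \ln p(s,y)\rangle_{\setminus i} = \langle \ln p(s,y)\rangle ,
\]
while the remaining two terms do not depend on s, so the four contributions cancel pairwise.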
53 Writing C in the form C_ij(s, s') = q_i(s) q_j(s') [δ_ij δ(s − s')/q_j(s') − 1 + R_ij(s, s')], (9) we obtain an integral equation for the function R: R_ij(s, s') = Σ_l ∫ ds̃ q_l(s̃) K_il(s, s̃) R_lj(s̃, s') + K_ij(s, s'). [sent-107, score-0.311]
54 The integral equation reduces to a system of linear equations for the coefficients A^{αα'}_{ij}. [sent-114, score-0.203]
55 We now discuss the simplest case where K_ij(s, s') = J_ij φ_i(s) φ_j(s'). [sent-115, score-0.118]
56 Using R_ij(s, s') = A_ij φ_i(s) φ_j(s') and augmenting the matrix of J_ij's with the diagonal elements J_ii ≡ −1/⟨φ_i²⟩_q yields the solution A_ij = −J_ii J_jj [J^{-1} − D(J_ii)^{-1}]_{ij}, (12) where D(J_ii) is a diagonal matrix with entries J_ii. [sent-117, score-0.151]
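A sketch of where this solution comes from (my own rearrangement; the displayed form of eq. (12) could not be recovered exactly from the extraction, so the expression above should be read together with this): substituting the ansatz into the integral equation for R gives, for the zero-diagonal couplings J_ij,
\[
A_{ij} = J_{ij} + \sum_{l\neq i} J_{il}\,\langle\phi_l^2\rangle_q\,A_{lj},
\qquad \langle\phi_l^2\rangle_q = -1/J_{ll},
\]
and with the augmented matrix J (diagonal D(J_ii)) this linear system solves, in matrix form, to
\[
A = D(J_{ii}) - D(J_{ii})\,J^{-1} D(J_{ii}),
\qquad\text{i.e.}\qquad
A_{ij} = \delta_{ij} J_{ii} - J_{ii} J_{jj}\,(J^{-1})_{ij},
\]
which is equivalent to the form quoted above and reproduces eq. (18) below for quadratic interactions.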
57 To shed more light on this phenomenon, we would like to see how the true partition function, which serves as a generating function for expectations, differs from the mean field one when the approximating mean field distribution q and the true posterior are close. [sent-120, score-0.176]
58 (1) the parameter ε: Z_ε[a] = ∫ ds q(s) e^{ε(Σ_i a_i(s_i) + ln p(s|y) − ln q(s))} (14), which serves as a bookkeeping device for collecting relevant terms when ln p(s|y) − ln q(s) is assumed to be small. [sent-122, score-0.475]
59 Then expanding the partition function to first order in ε, we get ln Z_ε[a] = ε [Σ_i ⟨a_i(s_i)⟩_q + ⟨ln p(s|y) − ln q(s)⟩_q] + O(ε²) = ε [Σ_i ⟨a_i(s_i)⟩_q − KL(q||p)] + O(ε²). (15) [sent-124, score-1.068]
60 i Keeping only the linear term, setting = 1 and inserting the minimizing mean field distribution for q yields δ ln Z pi (s|y, a) = = qi (s|a) + O( 2 ) . [sent-125, score-0.689]
61 (16) δai (s) Hence the computation of the correlations via Bij (s, s ) = δ 2 ln Z δpi (s|a) δqi (s|a) = = + O( 2 ) = Cij (s, s ) + O( 2 ) (17) δai (s)δaj (s ) δaj (s ) δaj (s ) can be assumed to incorporate correctly effects of linear order in ln p(s|a) − ln q(s). [sent-126, score-1.02]
62 On the other hand, one should expect p(s_i, s_j|y) − q_i(s_i) q_j(s_j) to be of order ε. [sent-127, score-0.491]
63 Although the above does not prove that diagonal correlations are estimated more precisely from C_ii(s, s') than from q_i(s) (only that both are correct to linear order in ε), one often observes this in practice, see below. [sent-128, score-0.336]
64 7.1 Quadratic Interactions The quadratic interaction model, ln ψ_ij(s_i, s_j) = s_i J_ij s_j with arbitrary ψ_i(s_i), i.e. [sent-130, score-1.145]
65 ln p(s, y) = Σ_i ln ψ_i(s_i) + (1/2) Σ_{i≠j} s_i J_ij s_j + constant, is used in many contexts, e. [sent-132, score-1.383]
66 (13) to get ⟨s_i s_j⟩ − ⟨s_i⟩⟨s_j⟩ = −(J^{-1})_{ij}, (18) where we have set J_ii = −1/(⟨s_i²⟩_q − ⟨s_i⟩_q²). [sent-136, score-2.076]
67 We can apply this to the Gaussian model ln ψ_i(s_i) = h_i s_i + A_i s_i²/2. The variational distribution is Gaussian with variance −1/A_i (and covariance zero). [sent-137, score-1.283]
68 The exact marginals have mean −[J−1 h]i and covariance −[J−1 ]ij . [sent-140, score-0.199]
69 The variance estimates are −1/J_ii = 1 for the variational case and [−J^{-1}]_ii = 1/(1 − ε²) for the exact case, for the two-variable example J = (−1 ε; ε −1). [sent-144, score-0.514]
70 The latter diverges for completely correlated variables, ε → 1, illustrating that the variational covariance estimate breaks down when the interaction between the variables is strong. [sent-145, score-0.52]
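A numerical sketch of eq. (18) (my own addition, written in Python; the exact toy matrix used in the two sentences above could not be recovered from the extraction, so the symmetric 2x2 coupling below is an assumption): for a Gaussian model the linear response covariance −(J^{-1}) is exact, while the factorized variational estimate keeps only the diagonal −1/J_ii.

import numpy as np

# Linear response covariance estimate of eq. (18): Cov_LR = -(J^{-1}),
# where J holds the pairwise couplings J_ij (i != j) and the augmented
# diagonal J_ii = -1 / Var_q(s_i) from the factorized variational solution.
def lr_covariance(J):
    return -np.linalg.inv(J)

eps = 0.9                               # coupling strength (illustrative value)
J = np.array([[-1.0, eps],
              [eps, -1.0]])             # assumed form of the 2x2 toy example

cov = lr_covariance(J)
print("variational variances:", [-1.0 / J[0, 0], -1.0 / J[1, 1]])  # -> [1.0, 1.0]
print("LR (exact for Gaussian) variances:", np.diag(cov))          # -> 1/(1 - eps^2) each
print("LR covariance:", cov[0, 1])                                 # -> eps/(1 - eps^2)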
71 Choosing non-informative priors, p(µ) flat and p(β) ∝ 1/β, the variational distribution q_µ(µ) becomes Gaussian with mean ȳ and variance 1/(N⟨β⟩_q), and q_β(β) becomes a Gamma distribution Γ(β|b, c) ∝ β^{c−1} e^{−β/b}, with parameters c_q = N/2 and 1/b_q = (N/2)(σ̂² + ⟨(µ − ȳ)²⟩_q). [sent-155, score-0.533]
72 The mean and variance of the Gamma distribution are given by bc and b²c. [sent-156, score-0.12]
73 A comparison shows that the mean bc is the same in both cases whereas the variational approximation underestimates the variance b²c. [sent-159, score-0.512]
74 This is a quite generic property of the variational approach. [sent-160, score-0.414]
75 Inverting the 2 × 2 matrix J, we immediately get ⟨φ_1²⟩_q = Var(β) = −(J^{-1})_{11} = b_q⟨β⟩_q/(1 − b_q/(2⟨β⟩_q)). Inserting the result for ⟨β⟩_q, we find that this is in fact the correct result. [sent-164, score-0.129]
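A worked check (my own addition; it assumes the standard exact posterior for these priors, Γ(β|b, c) with c = (N−1)/2 and 1/b = Nσ̂²/2, and evaluates ⟨β⟩_q at the variational fixed point):
\[
\langle\beta\rangle_q = b_q c_q = \frac{N-1}{N\hat\sigma^2},
\qquad
\frac{b_q}{2\langle\beta\rangle_q} = \frac{1}{N},
\qquad
\mathrm{Var}_{LR}(\beta) = \frac{b_q\langle\beta\rangle_q}{1-1/N} = \frac{2(N-1)}{N^2\hat\sigma^4} = b^2 c ,
\]
so the linear response estimate matches the exact posterior variance, while the uncorrected variational value b_q² c_q = 2(N−1)²/(N³σ̂⁴) is smaller by a factor (N−1)/N.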
76 7.3 Variational Bayes Mixture of Multinomials As a final example, we take a mixture model of practical interest and show that linear response corrections can straightforwardly be calculated. [sent-166, score-0.259]
77 We can model this with a mixture of multinomials (Lars Kai Hansen 2003, in preparation): p(y_n|π, ρ) = Σ_{k=1}^K π_k ∏_{j=1}^D ρ_kj^{y_nj}, (20) where π_k is the probability of the kth mixture and ρ_kj is the probability of observing the jth histogram given we are in the kth component, i.e. [sent-174, score-0.199]
78 Eventually in the variational Bayes treatment we will introduce Dirichlet priors for the variables. [sent-177, score-0.414]
79 But the general linear response expression is independent of this. [sent-178, score-0.219]
80 To rewrite the model such that it is suitable for a variational treatment, i.e. [sent-179, score-0.414]
81 in a product form, we introduce hidden (Potts) variables x_n = {x_nk}, x_nk ∈ {0, 1} with Σ_k x_nk = 1, and write the joint probability of observed and hidden variables as: p(y_n, x_n|π, ρ) = ∏_{k=1}^K [π_k ∏_{j=1}^D ρ_kj^{y_nj}]^{x_nk}. [sent-181, score-0.692]
82 We can now identify the interaction terms in Σ_n ln p(y_n, x_n, π, ρ) as x_nk ln π_k and y_nj x_nk ln ρ_kj. [sent-183, score-1.551]
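Spelling this out in the K_ij = J_ij φ_i φ_j form of section 5 (my own addition, consistent with the matrix element (J_{ρ_k x_n})_{jk} = y_nj quoted below): with mean-subtracted features, the only non-zero off-diagonal couplings connect each hidden x_n to ln π and to ln ρ,
\[
K_{x_n\pi} = \sum_k \big(x_{nk}-\langle x_{nk}\rangle\big)\big(\ln\pi_k-\langle\ln\pi_k\rangle\big),
\qquad
K_{x_n\rho_k} = \sum_j y_{nj}\big(x_{nk}-\langle x_{nk}\rangle\big)\big(\ln\rho_{kj}-\langle\ln\rho_{kj}\rangle\big),
\]
with coefficients 1 and y_nj respectively; π and ρ are not directly coupled to each other, which is why the block structure of the coupling matrix in eq. (22) is sparse.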
83 To get the explicit solution we need to write the coupling matrix for the problem and add diagonal terms and invert. [sent-186, score-0.13]
84 However, it turns out that the two variable marginal distributions involving the hidden variables—the number of which scales with the number of examples—can be eliminated analytically. [sent-188, score-0.202]
85 (the block coupling matrix J, eq. (22)), where for simplicity the log on π and ρ is omitted and (J_{ρ_k x_n})_{jk} = y_nj. [sent-196, score-0.122]
86 To get the covariance V we introduce diagonal elements into J (which are all tractable in ⟨·⟩_q): [sent-198, score-0.13]
87 −(J_{x_n x_n}^{-1})_{kk'} = ⟨x_nk x_nk'⟩ − ⟨x_nk⟩⟨x_nk'⟩ = δ_kk' ⟨x_nk⟩ − ⟨x_nk⟩⟨x_nk'⟩ (23), −(J_{ππ}^{-1})_{kk'} = ⟨ln π_k ln π_k'⟩ − ⟨ln π_k⟩⟨ln π_k'⟩ (24), −(J_{ρ_k ρ_k}^{-1})_{jj'} = ⟨ln ρ_kj ln ρ_kj'⟩ − ⟨ln ρ_kj⟩⟨ln ρ_kj'⟩ (25), and invert: V = −J^{-1}. [sent-204, score-4.195]
88 8 Conclusion and Outlook In this paper we have shown that it is possible to extend linear response to completely general variational distributions and solve the linear response equations explicitly. [sent-207, score-0.166]
89 that linear response provides approximations of increased quality for two-variable marginals and 2. [sent-209, score-0.341]
90 Together this suggests that building linear response into variational Bayes software such as VIBES [5, 6] would be useful. [sent-211, score-0.633]
91 Welling and Teh [12, 13] have, as mentioned in the introduction, shown how to apply the general linear response methods to the Bethe approximation. [sent-212, score-0.219]
92 (8): C_ii(s_i, s_i') = δ(s_i − s_i') q(s_i) − q(s_i) q(s_i'). [sent-214, score-0.958]
93 Generalizing this idea to general potentials, general mean field approximations, deriving the corresponding marginal likelihoods and deriving guaranteed convergent algorithms for the approximations are under current investigation. [sent-219, score-0.322]
94 Attias, “A variational Bayesian framework for graphical models,” in Advances in Neural Information Processing Systems 12, T. [sent-224, score-0.414]
95 Beal, “Propagation algorithms for variational Bayesian learning,” in Advances in Neural Information Processing Systems 13. [sent-238, score-0.414]
96 Winn, “VIBES: A variational inference engine for Bayesian networks,” in Advances in Neural Information Processing Systems 15, 2002. [sent-245, score-0.446]
97 Winn, “Structured variational distributions in VIBES,” in Artificial Intelligence and Statistics, Key West, Florida, 2003. [sent-249, score-0.447]
98 Rodríguez, “Efficient learning in Boltzmann machines using linear response theory,” Neural Computation, vol. [sent-260, score-0.219]
99 Teh, “Linear response algorithms for approximate inference,” Artificial Intelligence Journal, 2003. [sent-282, score-0.166]
100 Teh, “Propagation rules for linear response estimates of joint pairwise probabilities,” preprint, 2003. [sent-286, score-0.252]
wordName wordTfidf (topN-words)
[('si', 0.479), ('variational', 0.414), ('sj', 0.302), ('ln', 0.301), ('xnk', 0.246), ('qi', 0.189), ('response', 0.166), ('marginal', 0.147), ('jii', 0.132), ('kj', 0.112), ('lr', 0.105), ('qj', 0.094), ('ij', 0.091), ('aj', 0.088), ('jij', 0.082), ('ds', 0.08), ('marginals', 0.08), ('dsk', 0.076), ('rij', 0.07), ('kij', 0.066), ('cij', 0.066), ('jx', 0.066), ('kk', 0.066), ('ai', 0.065), ('xn', 0.065), ('correlations', 0.064), ('bayes', 0.062), ('eld', 0.06), ('qk', 0.06), ('bij', 0.06), ('welling', 0.057), ('pi', 0.057), ('eai', 0.057), ('vibes', 0.057), ('ynj', 0.057), ('teh', 0.055), ('linear', 0.053), ('winther', 0.045), ('multinomials', 0.045), ('mean', 0.045), ('deriving', 0.044), ('pij', 0.044), ('opper', 0.042), ('approximations', 0.042), ('coupling', 0.041), ('mixture', 0.04), ('interactions', 0.039), ('correction', 0.038), ('dsi', 0.038), ('jxx', 0.038), ('winn', 0.038), ('boltzmann', 0.037), ('exact', 0.037), ('covariance', 0.037), ('kl', 0.036), ('generating', 0.035), ('variables', 0.035), ('get', 0.035), ('interaction', 0.034), ('estimates', 0.033), ('derive', 0.033), ('mackay', 0.033), ('bq', 0.033), ('denmark', 0.033), ('hansen', 0.033), ('distributions', 0.033), ('inference', 0.032), ('approximation', 0.032), ('potentials', 0.031), ('equations', 0.031), ('gamma', 0.03), ('sk', 0.03), ('sl', 0.03), ('variance', 0.03), ('diagonal', 0.03), ('serves', 0.029), ('quadratic', 0.028), ('integral', 0.028), ('immediately', 0.028), ('tractable', 0.028), ('histogram', 0.028), ('discuss', 0.027), ('writing', 0.026), ('derivation', 0.026), ('bayesian', 0.026), ('yn', 0.025), ('explicit', 0.024), ('factorized', 0.024), ('gives', 0.023), ('upon', 0.023), ('bethe', 0.023), ('aij', 0.023), ('bc', 0.023), ('kth', 0.023), ('applications', 0.023), ('functional', 0.022), ('distribution', 0.022), ('inserting', 0.022), ('eliminated', 0.022), ('implicit', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 193 nips-2003-Variational Linear Response
Author: Manfred Opper, Ole Winther
Abstract: A general linear response method for deriving improved estimates of correlations in the variational Bayes framework is presented. Three applications are given and it is discussed how to use linear response as a general principle for improving mean field approximations.
2 0.19497621 4 nips-2003-A Biologically Plausible Algorithm for Reinforcement-shaped Representational Learning
Author: Maneesh Sahani
Abstract: Significant plasticity in sensory cortical representations can be driven in mature animals either by behavioural tasks that pair sensory stimuli with reinforcement, or by electrophysiological experiments that pair sensory input with direct stimulation of neuromodulatory nuclei, but usually not by sensory stimuli presented alone. Biologically motivated theories of representational learning, however, have tended to focus on unsupervised mechanisms, which may play a significant role on evolutionary or developmental timescales, but which neglect this essential role of reinforcement in adult plasticity. By contrast, theoretical reinforcement learning has generally dealt with the acquisition of optimal policies for action in an uncertain world, rather than with the concurrent shaping of sensory representations. This paper develops a framework for representational learning which builds on the relative success of unsupervised generativemodelling accounts of cortical encodings to incorporate the effects of reinforcement in a biologically plausible way. 1
3 0.18552759 155 nips-2003-Perspectives on Sparse Bayesian Learning
Author: Jason Palmer, Bhaskar D. Rao, David P. Wipf
Abstract: Recently, relevance vector machines (RVM) have been fashioned from a sparse Bayesian learning (SBL) framework to perform supervised learning using a weight prior that encourages sparsity of representation. The methodology incorporates an additional set of hyperparameters governing the prior, one for each weight, and then adopts a specific approximation to the full marginalization over all weights and hyperparameters. Despite its empirical success however, no rigorous motivation for this particular approximation is currently available. To address this issue, we demonstrate that SBL can be recast as the application of a rigorous variational approximation to the full model by expressing the prior in a dual form. This formulation obviates the necessity of assuming any hyperpriors and leads to natural, intuitive explanations of why sparsity is achieved in practice. 1
4 0.18096314 94 nips-2003-Information Maximization in Noisy Channels : A Variational Approach
Author: David Barber, Felix V. Agakov
Abstract: The maximisation of information transmission over noisy channels is a common, albeit generally computationally difficult problem. We approach the difficulty of computing the mutual information for noisy channels by using a variational approximation. The resulting IM algorithm is analagous to the EM algorithm, yet maximises mutual information, as opposed to likelihood. We apply the method to several practical examples, including linear compression, population encoding and CDMA. 1
5 0.17824405 129 nips-2003-Minimising Contrastive Divergence in Noisy, Mixed-mode VLSI Neurons
Author: Hsin Chen, Patrice Fleury, Alan F. Murray
Abstract: This paper presents VLSI circuits with continuous-valued probabilistic behaviour realized by injecting noise into each computing unit(neuron). Interconnecting the noisy neurons forms a Continuous Restricted Boltzmann Machine (CRBM), which has shown promising performance in modelling and classifying noisy biomedical data. The Minimising-Contrastive-Divergence learning algorithm for CRBM is also implemented in mixed-mode VLSI, to adapt the noisy neurons’ parameters on-chip. 1
6 0.16382985 142 nips-2003-On the Concentration of Expectation and Approximate Inference in Layered Networks
7 0.14021812 103 nips-2003-Learning Bounds for a Generalized Family of Bayesian Posterior Distributions
8 0.13005349 174 nips-2003-Semidefinite Relaxations for Approximate Inference on Graphs with Cycles
9 0.12843618 100 nips-2003-Laplace Propagation
10 0.12746736 117 nips-2003-Linear Response for Approximate Inference
11 0.12064029 31 nips-2003-Approximate Analytical Bootstrap Averages for Support Vector Classifiers
12 0.093024246 32 nips-2003-Approximate Expectation Maximization
13 0.092194483 163 nips-2003-Probability Estimates for Multi-Class Classification by Pairwise Coupling
14 0.084762558 189 nips-2003-Tree-structured Approximations by Expectation Propagation
15 0.083062828 59 nips-2003-Efficient and Robust Feature Extraction by Maximum Margin Criterion
16 0.081257105 177 nips-2003-Simplicial Mixtures of Markov Chains: Distributed Modelling of Dynamic User Profiles
17 0.074721664 135 nips-2003-Necessary Intransitive Likelihood-Ratio Classifiers
18 0.073238179 162 nips-2003-Probabilistic Inference of Speech Signals from Phaseless Spectrograms
19 0.071024083 179 nips-2003-Sparse Representation and Its Applications in Blind Source Separation
20 0.067956239 60 nips-2003-Eigenvoice Speaker Adaptation via Composite Kernel Principal Component Analysis
topicId topicWeight
[(0, -0.21), (1, -0.051), (2, -0.008), (3, 0.166), (4, 0.237), (5, -0.033), (6, 0.202), (7, -0.142), (8, -0.172), (9, -0.154), (10, -0.094), (11, -0.263), (12, -0.068), (13, -0.086), (14, 0.086), (15, 0.063), (16, -0.073), (17, 0.022), (18, -0.133), (19, -0.064), (20, -0.011), (21, -0.126), (22, 0.172), (23, 0.032), (24, 0.016), (25, 0.134), (26, -0.053), (27, 0.051), (28, -0.051), (29, 0.094), (30, -0.053), (31, -0.073), (32, 0.1), (33, 0.018), (34, 0.003), (35, 0.014), (36, 0.052), (37, 0.073), (38, -0.05), (39, 0.027), (40, 0.021), (41, -0.009), (42, -0.046), (43, 0.028), (44, -0.005), (45, 0.109), (46, -0.035), (47, -0.034), (48, 0.094), (49, -0.054)]
simIndex simValue paperId paperTitle
same-paper 1 0.98811382 193 nips-2003-Variational Linear Response
Author: Manfred Opper, Ole Winther
Abstract: A general linear response method for deriving improved estimates of correlations in the variational Bayes framework is presented. Three applications are given and it is discussed how to use linear response as a general principle for improving mean field approximations.
2 0.7139709 94 nips-2003-Information Maximization in Noisy Channels : A Variational Approach
Author: David Barber, Felix V. Agakov
Abstract: The maximisation of information transmission over noisy channels is a common, albeit generally computationally difficult problem. We approach the difficulty of computing the mutual information for noisy channels by using a variational approximation. The resulting IM algorithm is analagous to the EM algorithm, yet maximises mutual information, as opposed to likelihood. We apply the method to several practical examples, including linear compression, population encoding and CDMA. 1
3 0.56793624 155 nips-2003-Perspectives on Sparse Bayesian Learning
Author: Jason Palmer, Bhaskar D. Rao, David P. Wipf
Abstract: Recently, relevance vector machines (RVM) have been fashioned from a sparse Bayesian learning (SBL) framework to perform supervised learning using a weight prior that encourages sparsity of representation. The methodology incorporates an additional set of hyperparameters governing the prior, one for each weight, and then adopts a specific approximation to the full marginalization over all weights and hyperparameters. Despite its empirical success however, no rigorous motivation for this particular approximation is currently available. To address this issue, we demonstrate that SBL can be recast as the application of a rigorous variational approximation to the full model by expressing the prior in a dual form. This formulation obviates the necessity of assuming any hyperpriors and leads to natural, intuitive explanations of why sparsity is achieved in practice. 1
4 0.55627126 129 nips-2003-Minimising Contrastive Divergence in Noisy, Mixed-mode VLSI Neurons
Author: Hsin Chen, Patrice Fleury, Alan F. Murray
Abstract: This paper presents VLSI circuits with continuous-valued probabilistic behaviour realized by injecting noise into each computing unit(neuron). Interconnecting the noisy neurons forms a Continuous Restricted Boltzmann Machine (CRBM), which has shown promising performance in modelling and classifying noisy biomedical data. The Minimising-Contrastive-Divergence learning algorithm for CRBM is also implemented in mixed-mode VLSI, to adapt the noisy neurons’ parameters on-chip. 1
5 0.50699508 4 nips-2003-A Biologically Plausible Algorithm for Reinforcement-shaped Representational Learning
Author: Maneesh Sahani
Abstract: Significant plasticity in sensory cortical representations can be driven in mature animals either by behavioural tasks that pair sensory stimuli with reinforcement, or by electrophysiological experiments that pair sensory input with direct stimulation of neuromodulatory nuclei, but usually not by sensory stimuli presented alone. Biologically motivated theories of representational learning, however, have tended to focus on unsupervised mechanisms, which may play a significant role on evolutionary or developmental timescales, but which neglect this essential role of reinforcement in adult plasticity. By contrast, theoretical reinforcement learning has generally dealt with the acquisition of optimal policies for action in an uncertain world, rather than with the concurrent shaping of sensory representations. This paper develops a framework for representational learning which builds on the relative success of unsupervised generativemodelling accounts of cortical encodings to incorporate the effects of reinforcement in a biologically plausible way. 1
6 0.47237518 142 nips-2003-On the Concentration of Expectation and Approximate Inference in Layered Networks
7 0.45328861 100 nips-2003-Laplace Propagation
8 0.44586775 59 nips-2003-Efficient and Robust Feature Extraction by Maximum Margin Criterion
9 0.44482097 103 nips-2003-Learning Bounds for a Generalized Family of Bayesian Posterior Distributions
10 0.40448537 174 nips-2003-Semidefinite Relaxations for Approximate Inference on Graphs with Cycles
11 0.38025942 32 nips-2003-Approximate Expectation Maximization
12 0.36416766 54 nips-2003-Discriminative Fields for Modeling Spatial Dependencies in Natural Images
13 0.30717421 31 nips-2003-Approximate Analytical Bootstrap Averages for Support Vector Classifiers
14 0.30087262 163 nips-2003-Probability Estimates for Multi-Class Classification by Pairwise Coupling
15 0.29164651 135 nips-2003-Necessary Intransitive Likelihood-Ratio Classifiers
16 0.28402096 60 nips-2003-Eigenvoice Speaker Adaptation via Composite Kernel Principal Component Analysis
17 0.27786303 151 nips-2003-PAC-Bayesian Generic Chaining
18 0.2738297 117 nips-2003-Linear Response for Approximate Inference
19 0.27155083 93 nips-2003-Information Dynamics and Emergent Computation in Recurrent Circuits of Spiking Neurons
20 0.26170981 189 nips-2003-Tree-structured Approximations by Expectation Propagation
topicId topicWeight
[(0, 0.028), (5, 0.011), (11, 0.011), (30, 0.012), (35, 0.042), (53, 0.07), (71, 0.05), (73, 0.025), (76, 0.037), (85, 0.528), (91, 0.054), (99, 0.02)]
simIndex simValue paperId paperTitle
1 0.98699349 95 nips-2003-Insights from Machine Learning Applied to Human Visual Classification
Author: Felix A. Wichmann, Arnulf B. Graf
Abstract: We attempt to understand visual classification in humans using both psychophysical and machine learning techniques. Frontal views of human faces were used for a gender classification task. Human subjects classified the faces and their gender judgment, reaction time and confidence rating were recorded. Several hyperplane learning algorithms were used on the same classification task using the Principal Components of the texture and shape representation of the faces. The classification performance of the learning algorithms was estimated using the face database with the true gender of the faces as labels, and also with the gender estimated by the subjects. We then correlated the human responses to the distance of the stimuli to the separating hyperplane of the learning algorithms. Our results suggest that human classification can be modeled by some hyperplane algorithms in the feature space we used. For classification, the brain needs more processing for stimuli close to that hyperplane than for those further away. 1
same-paper 2 0.98114002 193 nips-2003-Variational Linear Response
Author: Manfred Opper, Ole Winther
Abstract: A general linear response method for deriving improved estimates of correlations in the variational Bayes framework is presented. Three applications are given and it is discussed how to use linear response as a general principle for improving mean field approximations.
3 0.97503757 136 nips-2003-New Algorithms for Efficient High Dimensional Non-parametric Classification
Author: Ting liu, Andrew W. Moore, Alexander Gray
Abstract: This paper is about non-approximate acceleration of high dimensional nonparametric operations such as k nearest neighbor classifiers and the prediction phase of Support Vector Machine classifiers. We attempt to exploit the fact that even if we want exact answers to nonparametric queries, we usually do not need to explicitly find the datapoints close to the query, but merely need to ask questions about the properties about that set of datapoints. This offers a small amount of computational leeway, and we investigate how much that leeway can be exploited. For clarity, this paper concentrates on pure k-NN classification and the prediction phase of SVMs. We introduce new ball tree algorithms that on real-world datasets give accelerations of 2-fold up to 100-fold compared against highly optimized traditional ball-tree-based k-NN. These results include datasets with up to 106 dimensions and 105 records, and show non-trivial speedups while giving exact answers. 1
4 0.94126815 2 nips-2003-ARA*: Anytime A* with Provable Bounds on Sub-Optimality
Author: Maxim Likhachev, Geoffrey J. Gordon, Sebastian Thrun
Abstract: In real world planning problems, time for deliberation is often limited. Anytime planners are well suited for these problems: they find a feasible solution quickly and then continually work on improving it until time runs out. In this paper we propose an anytime heuristic search, ARA*, which tunes its performance bound based on available search time. It starts by finding a suboptimal solution quickly using a loose bound, then tightens the bound progressively as time allows. Given enough time it finds a provably optimal solution. While improving its bound, ARA* reuses previous search efforts and, as a result, is significantly more efficient than other anytime search methods. In addition to our theoretical analysis, we demonstrate the practical utility of ARA* with experiments on a simulated robot kinematic arm and a dynamic path planning problem for an outdoor rover. 1
5 0.89551228 64 nips-2003-Estimating Internal Variables and Paramters of a Learning Agent by a Particle Filter
Author: Kazuyuki Samejima, Kenji Doya, Yasumasa Ueda, Minoru Kimura
Abstract: When we model a higher order functions, such as learning and memory, we face a difficulty of comparing neural activities with hidden variables that depend on the history of sensory and motor signals and the dynamics of the network. Here, we propose novel method for estimating hidden variables of a learning agent, such as connection weights from sequences of observable variables. Bayesian estimation is a method to estimate the posterior probability of hidden variables from observable data sequence using a dynamic model of hidden and observable variables. In this paper, we apply particle filter for estimating internal parameters and metaparameters of a reinforcement learning model. We verified the effectiveness of the method using both artificial data and real animal behavioral data. 1
6 0.85462236 124 nips-2003-Max-Margin Markov Networks
7 0.80330259 192 nips-2003-Using the Forest to See the Trees: A Graphical Model Relating Features, Objects, and Scenes
8 0.75448948 28 nips-2003-Application of SVMs for Colour Classification and Collision Detection with AIBO Robots
9 0.72165602 3 nips-2003-AUC Optimization vs. Error Rate Minimization
10 0.7164734 134 nips-2003-Near-Minimax Optimal Classification with Dyadic Classification Trees
11 0.68323493 109 nips-2003-Learning a Rare Event Detection Cascade by Direct Feature Selection
12 0.66663474 50 nips-2003-Denoising and Untangling Graphs Using Degree Priors
13 0.6527245 20 nips-2003-All learning is Local: Multi-agent Learning in Global Reward Games
14 0.64074177 57 nips-2003-Dynamical Modeling with Kernels for Nonlinear Time Series Prediction
15 0.6303885 148 nips-2003-Online Passive-Aggressive Algorithms
16 0.62346447 147 nips-2003-Online Learning via Global Feedback for Phrase Recognition
17 0.62324876 41 nips-2003-Boosting versus Covering
18 0.62098801 101 nips-2003-Large Margin Classifiers: Convex Loss, Low Noise, and Convergence Rates
19 0.61679149 188 nips-2003-Training fMRI Classifiers to Detect Cognitive States across Multiple Human Subjects
20 0.61357647 52 nips-2003-Different Cortico-Basal Ganglia Loops Specialize in Reward Prediction at Different Time Scales