nips nips2001 nips2001-132 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Hilbert J. Kappen, Wim Wiegerinck
Abstract: The Cluster Variation method is a class of approximation methods containing the Bethe and Kikuchi approximations as special cases. We derive two novel iteration schemes for the Cluster Variation Method. One is a fixed point iteration scheme which gives a significant improvement over loopy BP, mean field and TAP methods on directed graphical models. The other is a gradient based method that is guaranteed to converge and is shown to give useful results on random graphs with mild frustration. We conclude that the methods are of significant practical value for large inference problems. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Novel iteration schemes for the Cluster Variation Method Hilbert J. [sent-1, score-0.105]
2 nl Abstract The Cluster Variation method is a class of approximation methods containing the Bethe and Kikuchi approximations as special cases. [sent-6, score-0.168]
3 We derive two novel iteration schemes for the Cluster Variation Method. [sent-7, score-0.105]
4 One is a fixed point iteration scheme which gives a significant improvement over loopy BP, mean field and TAP methods on directed graphical models. [sent-8, score-0.57]
5 The other is a gradient based method that is guaranteed to converge and is shown to give useful results on random graphs with mild frustration. [sent-9, score-0.149]
6 We conclude that the methods are of significant practical value for large inference problems. [sent-10, score-0.042]
7 1 Introduction Belief Propagation (BP) is a message passing scheme, which is known to yield exact inference in tree structured graphical models [1]. [sent-11, score-0.316]
8 It has been noted by several authors that Belief Propagation can also give impressive results for graphs that are not trees [2]. [sent-12, score-0.031]
9 The Cluster Variation Method (CVM) is a method that has been developed in the physics community for approximate inference in the Ising model [3]. [sent-13, score-0.184]
10 The CVM approximates the joint probability distribution by a number of (overlapping) marginal distributions (clusters). [sent-14, score-0.031]
11 The quality of the approximation is determined by the size and number of clusters. [sent-15, score-0.088]
12 When the clusters consist of only two variables, the method is known as the Bethe approximation. [sent-16, score-0.352]
13 Recently, the method has been introduced by Yedidia et al. [sent-17, score-0.065]
14 [4] into the machine learning community, showing that in the Bethe approximation, the CVM solution coincides with the fixed points of the belief propagation algorithm. [sent-18, score-0.249]
15 For clusters consisting of more than two variables, [4] present a message passing scheme called generalized belief propagation (GBP). [sent-19, score-0.574]
16 This approximation to the free energy is often referred to as the Kikuchi approximation. [sent-20, score-0.279]
17 They show that GBP gives a significant improvement over the Bethe approximation for a small two dimensional Ising lattice with random couplings. [sent-21, score-0.093]
18 However, for larger lattices, both GBP and BP fail to converge [4, 5]. [sent-22, score-0.056]
19 In [5] the CCCP method is proposed, which is a double loop iteration algorithm that is guaranteed to converge for the general CVM problem. [sent-23, score-0.351]
20 Intuitively, the method consists of iterating over a sequence of convex subproblems (outer loop), each of which is solved using a fixed point iteration method (inner loop). [sent-24, score-0.468]
21 In this sense, the method is similar to the UPS algorithm of [6] which identifies trees as subproblems. [sent-25, score-0.096]
22 In this paper, we propose two algorithms: one is a fixed point iteration procedure, the other a gradient based method. [sent-26, score-0.283]
23 We show that the fixed point iteration method gives very fast convergence and accurate results for some classical directed graphical models. [sent-27, score-0.432]
24 However, for more challenging cases the fixed point method does not converge and the gradient based approach, which is guaranteed to converge, is preferable. [sent-28, score-0.342]
25 2 The Cluster Variation Method In this section, we briefly present the cluster variation method. [sent-29, score-0.418]
26 When one minimizes F_H(p) with respect to p under the normalization constraint Σ_x p(x) = 1, one obtains p_H. [sent-38, score-0.081]
27 Computing marginals of p_H such as p_H(x_i) or p_H(x_i, x_j) involves sums over all states, which is intractable for large n. [sent-39, score-0.245]
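To make the intractability concrete, the following is a minimal sketch (not from the paper) of brute-force marginals for a Boltzmann-Gibbs distribution p_H(x) ∝ exp(−H(x)) with a pairwise Ising-type H; the couplings, field strengths, and function names here are illustrative assumptions, and the cost is a sum over all 2^n joint states.

```python
# A minimal sketch of why exact inference is intractable: p_H(x) ∝ exp(-H(x)) over
# binary variables, so every marginal requires a sum over all 2^n joint states.
import itertools
import numpy as np

def exact_single_node_marginals(J, theta):
    """Brute-force p_H(x_i = 1) for H(x) = -0.5 * sum_ij J_ij x_i x_j - sum_i theta_i x_i."""
    n = len(theta)
    states = np.array(list(itertools.product([-1, 1], repeat=n)))   # 2^n rows
    energies = -0.5 * np.einsum('si,ij,sj->s', states, J, states) - states @ theta
    weights = np.exp(-energies)
    weights /= weights.sum()                                        # Boltzmann distribution p_H
    return ((states == 1) * weights[:, None]).sum(axis=0)           # p_H(x_i = 1)

rng = np.random.default_rng(0)
n = 10                                  # already 1024 states; n = 40 would be ~10^12
J = rng.normal(0, 0.5 / np.sqrt(n), (n, n)); J = (J + J.T) / 2; np.fill_diagonal(J, 0)
theta = rng.normal(0, 0.1, n)
print(exact_single_node_marginals(J, theta))
```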
28 The cluster variation method replaces the probability distribution p_H(x) by a large number of (possibly overlapping) probability distributions, each describing a subset (cluster) of variables. [sent-41, score-0.528]
29 Due to the one-to-one correspondence between a probability distribution and the minimum of a free energy, we can define approximate probability distributions by constructing approximate free energies and computing their minima. [sent-42, score-0.575]
30 The solution is obtained by minimizing this approximate free energy subject to normalization and consistency constraints. [sent-45, score-0.399]
31 Define clusters as subsets of distinct variables: x_α = (x_{i_1}, ..., x_{i_k}). [sent-46, score-0.287]
32 Consider the set of clusters P that describe the interactions in H and write H as a sum of these interactions: H(x) = Σ_{α ∈ P} H_α(x_α). We now define a set of clusters B that will determine our approximation in the cluster variation method. [sent-50, score-1.197]
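As an illustration of this decomposition (the dictionary and potential tables below are made-up toy values, not anything from the paper), each interaction cluster α ∈ P can be stored as a table H_α with one axis per variable, and H(x) evaluated as the sum of the relevant table entries:

```python
# A possible representation of H as a sum of cluster interactions: each cluster of
# variables carries its own potential table H_alpha.
import numpy as np

# interactions P: cluster (tuple of variable indices) -> table with one axis per variable
H_interactions = {
    (0, 1): np.array([[0.0, 1.0], [1.0, 0.0]]),    # H_{01}(x_0, x_1)
    (1, 2): np.array([[0.5, -0.5], [-0.5, 0.5]]),  # H_{12}(x_1, x_2)
    (2,):   np.array([0.0, -1.0]),                 # a single-node term H_2(x_2)
}

def H(x, interactions=H_interactions):
    """Evaluate H(x) = sum over clusters alpha in P of H_alpha(x_alpha) for a joint state x."""
    return sum(table[tuple(x[i] for i in cluster)] for cluster, table in interactions.items())

print(H((0, 1, 1)))   # H_{01}(0,1) + H_{12}(1,1) + H_2(1) = 1.0 + 0.5 - 1.0 = 0.5
```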
33 For each cluster α ∈ B, we introduce a probability distribution p_α(x_α), which jointly must approximate p(x). [sent-51, score-0.349]
34 B should at least contain the interactions in p(x) in the following way: ∀α ∈ P ⇒ ∃α' ∈ B, α ⊂ α'. [sent-52, score-0.096]
35 In addition, we demand that no two clusters in B contain each other: α, α' ∈ B ⇒ α ⊄ α', α' ⊄ α. [sent-53, score-0.401]
36 The minimal choice for B is to choose clusters from P itself. [sent-54, score-0.287]
37 The maximal choice for B is the cliques obtained when constructing the junction tree[8]. [sent-55, score-0.04]
38 In this case, the clusters in B form a tree structure and the CVM method is exact. [sent-56, score-0.394]
39 Define a set of clusters M that consists of all intersections of a number of clusters of B: M = {β | β = ∩_k α_k, α_k ∈ B}, and define U = B ∪ M. [sent-57, score-0.666]
40 Once U is given, we define numbers a_β recursively by the Moebius formula 1 = Σ_{α ∈ U, α ⊇ β} a_α for all β ∈ U. In particular, this shows that a_α = 1 for α ∈ B. [sent-58, score-0.146]
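A small sketch of this bookkeeping follows; the helper names (build_U, moebius_numbers) and the three-node example are assumptions for illustration. M is built from intersections of clusters in B, U = B ∪ M, and a_β is obtained by processing clusters from large to small so that 1 = Σ_{α ⊇ β} a_α holds for every β ∈ U.

```python
# Construct M and U from B, then compute the Moebius counting numbers a_alpha.
from itertools import combinations

def build_U(B):
    B = [frozenset(a) for a in B]
    M = set()
    for r in range(2, len(B) + 1):
        for combo in combinations(B, r):
            inter = frozenset.intersection(*combo)
            if inter and inter not in B:
                M.add(inter)
    return B, sorted(M, key=len, reverse=True)

def moebius_numbers(B, M):
    U = list(B) + list(M)
    a = {alpha: 1 for alpha in B}            # clusters in B are contained in no other cluster of U
    for beta in M:                           # process intersections from large to small
        supersets = [alpha for alpha in U if beta < alpha]
        a[beta] = 1 - sum(a[alpha] for alpha in supersets)
    return a

# Bethe-like choice: B = all pairs of a 3-node chain 0-1-2
B, M = build_U([{0, 1}, {1, 2}])
print(moebius_numbers(B, M))    # pairs get a = 1, the shared node {1} gets a = 1 - 2 = -1
```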
41 The Moebius formula allows us to rewrite ⟨H⟩ in terms of the cluster probabilities: ⟨H⟩ = Σ_{α ∈ U} a_α Σ_{x_α} p_α(x_α) H_α(x_α), (2) with H_α(x_α) = Σ_{β ∈ P, β ⊆ α} H_β(x_β). [sent-59, score-0.332]
42 Since interactions H_β may appear in more than one H_α, the constants a_α ensure that double counting is compensated for. [sent-61, score-0.097]
43 Whereas ⟨H⟩ can be written exactly in terms of the p_α, this is not the case for the entropy term in Eq. [sent-62, score-0.089]
44 The approach is to decompose the entropy of a cluster α, S_α = −Σ_{x_α} p_α(x_α) log p_α(x_α), in terms of 'connected entropies' in the following way: S_α = Σ_{β ⊆ α} S̄_β, (3) where the sum over β contains all subclusters of α. [sent-64, score-0.699]
45 3), we have expressed the entropy in terms of the cluster probabilities p_α. [sent-69, score-0.367]
46 The quality of this approximation is illustrated in Fig. [sent-70, score-0.088]
47 Note that both the Bethe and Kikuchi approximations strongly deteriorate around J = 1, which is where the spin-glass phase starts. [sent-72, score-0.084]
48 For J < 1, the Kikuchi approximation is superior to the Bethe approximation. [sent-73, score-0.051]
49 Note, however, that this figure only illustrates the quality of the truncations in Eq. [sent-74, score-0.037]
50 It does not say anything about the accuracy of the approximate marginals using the approximate free energy. [sent-76, score-0.5]
51 1 we obtain the approximate free energy of the Cluster Variation method. [sent-79, score-0.299]
52 This free energy must be minimized subject to normalization constraints Σ_{x_α} p_α(x_α) = 1 [sent-80, score-0.364]
53 and consistency constraints p_α(x_β) = p_β(x_β), with p_α(x_β) = Σ_{x_{α\β}} p_α(x_α). [sent-81, score-0.154]
54 For instance, when α = (i), S_(i) = S̄_(i) is the usual mean field entropy, and S_(ij) = S̄_(i) + S̄_(j) + S̄_(ij) defines the two node correction S̄_(ij). [sent-85, score-0.222]
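For the pair case quoted above, the two node correction can be computed directly from a joint table. A small illustrative sketch (the example distribution is made up): since the single-node connected entropies equal the ordinary ones, S̄_(ij) = S_(ij) − S_(i) − S_(j), which is minus the mutual information and vanishes exactly when the two variables are independent.

```python
# Connected entropy of a pair: S̄_ij = S_ij - S_i - S_j (= -I(i;j)).
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

p_ij = np.array([[0.4, 0.1],
                 [0.1, 0.4]])          # some joint distribution of two binary variables
S_ij = entropy(p_ij.ravel())
S_i = entropy(p_ij.sum(axis=1))
S_j = entropy(p_ij.sum(axis=0))
S_bar_ij = S_ij - S_i - S_j            # negative mutual information; zero iff independent
print(S_i, S_j, S_bar_ij)
```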
55 Figure 1: Exact and approximate entropies for the fully connected Boltzmann-Gibbs distribution on n = 10 variables with random couplings (SK model) as a function of mean coupling strength. [sent-89, score-0.259]
56 Couplings Wij are chosen from a normal Gaussian distribution with mean zero and standard deviation J /. [sent-90, score-0.031]
57 External fields ()i are chosen from a normal Gaussian distribution with mean zero and standard deviation 0. [sent-93, score-0.031]
58 The Bethe and Kikuchi approximations are computed using the approximate entropy expression Eq. [sent-97, score-0.212]
59 5 with exact marginals and by choosing B as the set of all pairs and all triplets, respectively. [sent-98, score-0.317]
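A sketch of the Bethe-style computation for the pairwise choice of B, under the assumption that the truncated entropy of Eq. 5 is the counting-number weighted sum S ≈ Σ_{α∈U} a_α S_α; with B = all pairs, the Moebius numbers are 1 for each pair and 1 − d_i for a node contained in d_i pairs. The function names and the independent-coins check are illustrative:

```python
# Bethe-style entropy approximation from pair and single-node marginals.
import itertools
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def bethe_entropy(pair_marginals, single_marginals):
    """pair_marginals: dict {(i, j): 2x2 array}; single_marginals: dict {i: length-2 array}."""
    degree = {i: 0 for i in single_marginals}
    S = 0.0
    for (i, j), p_ij in pair_marginals.items():
        S += entropy(p_ij.ravel())          # a = 1 for every pair cluster
        degree[i] += 1
        degree[j] += 1
    for i, p_i in single_marginals.items():
        S += (1 - degree[i]) * entropy(p_i)  # a = 1 - d_i for node i
    return S

# toy check on three independent fair coins, all pairs in B: exact entropy is 3*log(2)
singles = {i: np.array([0.5, 0.5]) for i in range(3)}
pairs = {(i, j): np.outer(singles[i], singles[j]) for i, j in itertools.combinations(range(3), 2)}
print(bethe_entropy(pairs, singles), 3 * np.log(2))
```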
60 The set of consistency constraints can be significantly reduced because some constraints imply others. [sent-99, score-0.249]
61 • If β ⊂ β' ⊂ α and p_α(x_β') = p_β'(x_β') and p_α(x_β) = p_β(x_β), then p_β'(x_β) = p_β(x_β). This means that constraints between clusters in M can be removed. [sent-107, score-0.382]
62 • If β ⊂ β' ⊂ α, α' and p_α(x_β') = p_α'(x_β') and p_α(x_β) = p_β(x_β), then p_α'(x_β) = p_β(x_β). This means that some constraints between clusters in B and M can be removed. [sent-108, score-0.382]
63 We denote the remaining necessary constraints by α → β. [sent-109, score-0.095]
64 Σ_{β: α→β} λ_{αβ}(x_β)) (9) The remaining task is to solve for the Lagrange multipliers such that all constraints (Eq. [sent-115, score-0.229]
65 6 one obtains a system of coupled non-linear equations. [sent-120, score-0.04]
66 In [4] a message passing algorithm was proposed to find a solution to this problem. [sent-122, score-0.133]
67 3.1 Fixed point iteration Consider the constraints Eq. [sent-125, score-0.2]
68 6 for some fixed cluster β and all clusters α → β, and define B_β = {α ∈ B | α → β}. We wish to solve for all constraints α → β, with α ∈ B_β, by adjusting [sent-126, score-0.88]
69 λ_{αβ}, α ∈ B_β, and consider all other Lagrange multipliers as fixed. [sent-136, score-0.262]
70 λ_{αβ}(x_β) = −(a_β/(a_β + |B_β|)) H_β(x_β) − Σ_{α'} A_{αα'} log p_{α'}(x_β), with A_{αα'} = δ_{αα'} − 1/(a_β + |B_β|). We update the probabilities with the new values of the Lagrange multipliers using Eqs. [sent-146, score-0.174]
71 3.2 Gradient descent We define an auxiliary cost function C = Σ_{αβ} Σ_{x_β} p_β(x_β) log(p_β(x_β)/p_α(x_β)) = Σ_{αβ} C_{αβ}, (11) that is zero when all constraints are satisfied and positive otherwise, and minimize this cost function with respect to the Lagrange multipliers λ. [sent-150, score-0.321]
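Assuming the reconstruction of Eq. 11 above, C is a sum of KL divergences between each p_β and the marginal that the cluster α induces on x_β, taken over the retained constraints α → β; a minimal sketch with hypothetical helper names:

```python
# Auxiliary constraint cost: sum of KL(p_beta || p_alpha(x_beta)) over constraints alpha -> beta.
import numpy as np

def kl(p, q):
    mask = p > 0
    return (p[mask] * np.log(p[mask] / q[mask])).sum()

def constraint_cost(constraints):
    """constraints: iterable of (p_alpha_on_beta, p_beta) pairs of 1-D marginal tables."""
    return sum(kl(p_beta, p_alpha_on_beta) for p_alpha_on_beta, p_beta in constraints)

p_beta = np.array([0.3, 0.7])
p_alpha_on_beta = np.array([0.4, 0.6])                 # marginal of some cluster alpha on x_beta
print(constraint_cost([(p_alpha_on_beta, p_beta)]))    # > 0 until the constraint is met
```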
72 The gradient of C with respect to λ_{αβ}(x_β) involves, for the clusters α' → β, the differences log(p_β(x_β)/p_{α'}(x_β)) − C_{α'β} and the marginal mismatches p_{α'}(x_β) − p_α(x_β). [sent-153, score-0.05]
73 4 Numerical results 4.1 Directed Graphical models We show the performance of the fixed point iteration procedure on several 'real world' directed graphical models. [sent-154, score-0.367]
74 In figure 2a, we plot the exact single node marginals against the approximate marginals for the Asia problem [8]. [sent-155, score-0.684]
75 Convergence was reached in 6 iterations using fixed point iteration. [sent-157, score-0.128]
76 For comparison, we computed the mean field and TAP approximations, as previously introduced by [9]. [sent-160, score-0.082]
77 This is not surprising, since both the MF and TAP approximations are based on a single node approximation, whereas the CVM method uses potentials up to size 3. [sent-162, score-0.167]
78 In figure 2b, we plot the exact single node marginals against the approximate CVM marginals for the alarm network [10]. [sent-163, score-0.717]
79 Clusters in B are defined according to the conditional probability tables and maximally contain 5 variables. [sent-169, score-0.034]
80 Convergence was reached in 15 iterations using fixed point iteration. [sent-170, score-0.128]
81 Ordinary loopy BP gives an error in the marginals of approximately 0. [sent-173, score-0.291]
82 Mean field and TAP methods did not give reproducible results on this problem. [sent-175, score-0.051]
wordName wordTfidf (topN-words)
[('cvm', 0.305), ('clusters', 0.287), ('cluster', 0.278), ('fj', 0.263), ('marginals', 0.245), ('pa', 0.203), ('lagrange', 0.177), ('bethe', 0.169), ('po', 0.162), ('variation', 0.14), ('multipliers', 0.134), ('fixed', 0.128), ('nijmegen', 0.121), ('kikuchi', 0.121), ('tap', 0.121), ('energy', 0.115), ('ph', 0.115), ('gbp', 0.115), ('moebius', 0.115), ('free', 0.113), ('xa', 0.113), ('xo', 0.106), ('iteration', 0.105), ('constraints', 0.095), ('define', 0.092), ('entropy', 0.089), ('bp', 0.086), ('aeu', 0.076), ('couplings', 0.076), ('slij', 0.076), ('message', 0.076), ('ho', 0.076), ('exact', 0.072), ('approximate', 0.071), ('graphical', 0.069), ('ao', 0.067), ('directed', 0.065), ('method', 0.065), ('interactions', 0.062), ('propagation', 0.061), ('biophysics', 0.06), ('ising', 0.06), ('mf', 0.06), ('belief', 0.06), ('consistency', 0.059), ('passing', 0.057), ('converge', 0.056), ('ib', 0.056), ('formula', 0.054), ('netherlands', 0.053), ('approximations', 0.052), ('node', 0.051), ('field', 0.051), ('approximation', 0.051), ('fh', 0.05), ('eu', 0.05), ('hh', 0.05), ('co', 0.05), ('gradient', 0.05), ('community', 0.048), ('sh', 0.048), ('sk', 0.048), ('loop', 0.047), ('loopy', 0.046), ('entropies', 0.046), ('logp', 0.045), ('yedidia', 0.045), ('sub', 0.045), ('guaranteed', 0.043), ('significant', 0.042), ('tree', 0.042), ('normalization', 0.041), ('overlapping', 0.04), ('rt', 0.04), ('obtains', 0.04), ('aa', 0.04), ('maximal', 0.04), ('lp', 0.038), ('quality', 0.037), ('double', 0.035), ('variables', 0.035), ('contain', 0.034), ('hl', 0.033), ('alarm', 0.033), ('vm', 0.033), ('aaa', 0.033), ('substitutes', 0.033), ('lpa', 0.033), ('wiegerinck', 0.033), ('deteriorate', 0.033), ('bert', 0.033), ('sii', 0.033), ('ups', 0.033), ('wim', 0.033), ('scheme', 0.033), ('xi', 0.033), ('substituting', 0.032), ('mean', 0.031), ('approximates', 0.031), ('trees', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999917 132 nips-2001-Novel iteration schemes for the Cluster Variation Method
Author: Hilbert J. Kappen, Wim Wiegerinck
Abstract: The Cluster Variation method is a class of approximation methods containing the Bethe and Kikuchi approximations as special cases. We derive two novel iteration schemes for the Cluster Variation Method. One is a fixed point iteration scheme which gives a significant improvement over loopy BP, mean field and TAP methods on directed graphical models. The other is a gradient based method, that is guaranteed to converge and is shown to give useful results on random graphs with mild frustration. We conclude that the methods are of significant practical value for large inference problems. 1
2 0.25858638 188 nips-2001-The Unified Propagation and Scaling Algorithm
Author: Yee W. Teh, Max Welling
Abstract: In this paper we will show that a restricted class of constrained minimum divergence problems, named generalized inference problems, can be solved by approximating the KL divergence with a Bethe free energy. The algorithm we derive is closely related to both loopy belief propagation and iterative scaling. This unified propagation and scaling algorithm reduces to a convergent alternative to loopy belief propagation when no constraints are present. Experiments show the viability of our algorithm.
3 0.2295994 135 nips-2001-On Spectral Clustering: Analysis and an algorithm
Author: Andrew Y. Ng, Michael I. Jordan, Yair Weiss
Abstract: Despite many empirical successes of spectral clustering methodsalgorithms that cluster points using eigenvectors of matrices derived from the data- there are several unresolved issues. First, there are a wide variety of algorithms that use the eigenvectors in slightly different ways. Second, many of these algorithms have no proof that they will actually compute a reasonable clustering. In this paper, we present a simple spectral clustering algorithm that can be implemented using a few lines of Matlab. Using tools from matrix perturbation theory, we analyze the algorithm, and give conditions under which it can be expected to do well. We also show surprisingly good experimental results on a number of challenging clustering problems. 1
4 0.1623909 192 nips-2001-Tree-based reparameterization for approximate inference on loopy graphs
Author: Martin J. Wainwright, Tommi Jaakkola, Alan S. Willsky
Abstract: We develop a tree-based reparameterization framework that provides a new conceptual view of a large class of iterative algorithms for computing approximate marginals in graphs with cycles. It includes belief propagation (BP), which can be reformulated as a very local form of reparameterization. More generally, we consider algorithms that perform exact computations over spanning trees of the full graph. On the practical side, we find that such tree reparameterization (TRP) algorithms have convergence properties superior to BP. The reparameterization perspective also provides a number of theoretical insights into approximate inference, including a new characterization of fixed points; and an invariance intrinsic to TRP /BP. These two properties enable us to analyze and bound the error between the TRP /BP approximations and the actual marginals. While our results arise naturally from the TRP perspective, most of them apply in an algorithm-independent manner to any local minimum of the Bethe free energy. Our results also have natural extensions to more structured approximations [e.g. , 1, 2]. 1
5 0.16071217 24 nips-2001-Active Information Retrieval
Author: Tommi Jaakkola, Hava T. Siegelmann
Abstract: In classical large information retrieval systems, the system responds to a user initiated query with a list of results ranked by relevance. The users may further refine their query as needed. This process may result in a lengthy correspondence without conclusion. We propose an alternative active learning approach, where the system responds to the initial user's query by successively probing the user for distinctions at multiple levels of abstraction. The system's initiated queries are optimized for speedy recovery and the user is permitted to respond with multiple selections or may reject the query. The information is in each case unambiguously incorporated by the system and the subsequent queries are adjusted to minimize the need for further exchange. The system's initiated queries are subject to resource constraints pertaining to the amount of information that can be presented to the user per iteration. 1
6 0.15109366 178 nips-2001-TAP Gibbs Free Energy, Belief Propagation and Sparsity
7 0.13373628 180 nips-2001-The Concave-Convex Procedure (CCCP)
8 0.12112883 171 nips-2001-Spectral Relaxation for K-means Clustering
9 0.11210443 117 nips-2001-MIME: Mutual Information Minimization and Entropy Maximization for Bayesian Belief Propagation
10 0.080335528 30 nips-2001-Agglomerative Multivariate Information Bottleneck
11 0.078595817 185 nips-2001-The Method of Quantum Clustering
12 0.07840392 94 nips-2001-Incremental Learning and Selective Sampling via Parametric Optimization Framework for SVM
13 0.077565119 119 nips-2001-Means, Correlations and Bounds
14 0.076971002 21 nips-2001-A Variational Approach to Learning Curves
15 0.076406285 100 nips-2001-Iterative Double Clustering for Unsupervised and Semi-Supervised Learning
16 0.075337455 189 nips-2001-The g Factor: Relating Distributions on Features to Distributions on Images
17 0.070051819 59 nips-2001-Direct value-approximation for factored MDPs
18 0.067759559 58 nips-2001-Covariance Kernels from Bayesian Generative Models
19 0.065617912 36 nips-2001-Approximate Dynamic Programming via Linear Programming
20 0.063346937 196 nips-2001-Very loopy belief propagation for unwrapping phase images
topicId topicWeight
[(0, -0.194), (1, 0.011), (2, 0.051), (3, -0.184), (4, 0.076), (5, -0.296), (6, -0.036), (7, -0.205), (8, 0.049), (9, 0.086), (10, 0.162), (11, -0.044), (12, -0.01), (13, 0.123), (14, 0.052), (15, 0.077), (16, -0.132), (17, 0.076), (18, -0.054), (19, -0.139), (20, 0.153), (21, -0.232), (22, -0.17), (23, -0.078), (24, 0.077), (25, 0.006), (26, -0.067), (27, 0.051), (28, 0.001), (29, -0.128), (30, -0.094), (31, -0.09), (32, -0.093), (33, -0.027), (34, 0.027), (35, 0.053), (36, 0.03), (37, -0.038), (38, -0.094), (39, -0.051), (40, 0.047), (41, 0.014), (42, 0.01), (43, -0.002), (44, 0.051), (45, -0.037), (46, -0.004), (47, -0.015), (48, 0.024), (49, 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.97816277 132 nips-2001-Novel iteration schemes for the Cluster Variation Method
Author: Hilbert J. Kappen, Wim Wiegerinck
Abstract: The Cluster Variation method is a class of approximation methods containing the Bethe and Kikuchi approximations as special cases. We derive two novel iteration schemes for the Cluster Variation Method. One is a fixed point iteration scheme which gives a significant improvement over loopy BP, mean field and TAP methods on directed graphical models. The other is a gradient based method, that is guaranteed to converge and is shown to give useful results on random graphs with mild frustration. We conclude that the methods are of significant practical value for large inference problems. 1
2 0.76364893 188 nips-2001-The Unified Propagation and Scaling Algorithm
Author: Yee W. Teh, Max Welling
Abstract: In this paper we will show that a restricted class of constrained minimum divergence problems, named generalized inference problems, can be solved by approximating the KL divergence with a Bethe free energy. The algorithm we derive is closely related to both loopy belief propagation and iterative scaling. This unified propagation and scaling algorithm reduces to a convergent alternative to loopy belief propagation when no constraints are present. Experiments show the viability of our algorithm.
3 0.69274527 192 nips-2001-Tree-based reparameterization for approximate inference on loopy graphs
Author: Martin J. Wainwright, Tommi Jaakkola, Alan S. Willsky
Abstract: We develop a tree-based reparameterization framework that provides a new conceptual view of a large class of iterative algorithms for computing approximate marginals in graphs with cycles. It includes belief propagation (BP), which can be reformulated as a very local form of reparameterization. More generally, we consider algorithms that perform exact computations over spanning trees of the full graph. On the practical side, we find that such tree reparameterization (TRP) algorithms have convergence properties superior to BP. The reparameterization perspective also provides a number of theoretical insights into approximate inference, including a new characterization of fixed points; and an invariance intrinsic to TRP /BP. These two properties enable us to analyze and bound the error between the TRP /BP approximations and the actual marginals. While our results arise naturally from the TRP perspective, most of them apply in an algorithm-independent manner to any local minimum of the Bethe free energy. Our results also have natural extensions to more structured approximations [e.g. , 1, 2]. 1
4 0.58694661 180 nips-2001-The Concave-Convex Procedure (CCCP)
Author: Alan L. Yuille, Anand Rangarajan
Abstract: We introduce the Concave-Convex procedure (CCCP) which constructs discrete time iterative dynamical systems which are guaranteed to monotonically decrease global optimization/energy functions. It can be applied to (almost) any optimization problem and many existing algorithms can be interpreted in terms of CCCP. In particular, we prove relationships to some applications of Legendre transform techniques. We then illustrate CCCP by applications to Potts models, linear assignment, EM algorithms, and Generalized Iterative Scaling (GIS). CCCP can be used both as a new way to understand existing optimization algorithms and as a procedure for generating new algorithms. 1
5 0.50840896 135 nips-2001-On Spectral Clustering: Analysis and an algorithm
Author: Andrew Y. Ng, Michael I. Jordan, Yair Weiss
Abstract: Despite many empirical successes of spectral clustering methodsalgorithms that cluster points using eigenvectors of matrices derived from the data- there are several unresolved issues. First, there are a wide variety of algorithms that use the eigenvectors in slightly different ways. Second, many of these algorithms have no proof that they will actually compute a reasonable clustering. In this paper, we present a simple spectral clustering algorithm that can be implemented using a few lines of Matlab. Using tools from matrix perturbation theory, we analyze the algorithm, and give conditions under which it can be expected to do well. We also show surprisingly good experimental results on a number of challenging clustering problems. 1
6 0.45790187 196 nips-2001-Very loopy belief propagation for unwrapping phase images
7 0.43836254 24 nips-2001-Active Information Retrieval
8 0.43801305 171 nips-2001-Spectral Relaxation for K-means Clustering
9 0.43356466 178 nips-2001-TAP Gibbs Free Energy, Belief Propagation and Sparsity
10 0.40883493 100 nips-2001-Iterative Double Clustering for Unsupervised and Semi-Supervised Learning
11 0.40059233 30 nips-2001-Agglomerative Multivariate Information Bottleneck
12 0.37287918 53 nips-2001-Constructing Distributed Representations Using Additive Clustering
13 0.34005049 117 nips-2001-MIME: Mutual Information Minimization and Entropy Maximization for Bayesian Belief Propagation
14 0.3202242 185 nips-2001-The Method of Quantum Clustering
15 0.30558348 94 nips-2001-Incremental Learning and Selective Sampling via Parametric Optimization Framework for SVM
16 0.25515825 120 nips-2001-Minimax Probability Machine
17 0.24322887 21 nips-2001-A Variational Approach to Learning Curves
18 0.23583907 59 nips-2001-Direct value-approximation for factored MDPs
19 0.23521084 36 nips-2001-Approximate Dynamic Programming via Linear Programming
20 0.22290689 190 nips-2001-Thin Junction Trees
topicId topicWeight
[(14, 0.041), (17, 0.043), (19, 0.03), (27, 0.155), (30, 0.042), (36, 0.018), (59, 0.065), (72, 0.092), (79, 0.072), (83, 0.023), (87, 0.188), (91, 0.148)]
simIndex simValue paperId paperTitle
1 0.95052999 154 nips-2001-Products of Gaussians
Author: Christopher Williams, Felix V. Agakov, Stephen N. Felderhof
Abstract: Recently Hinton (1999) has introduced the Products of Experts (PoE) model in which several individual probabilistic models for data are combined to provide an overall model of the data. Below we consider PoE models in which each expert is a Gaussian. Although the product of Gaussians is also a Gaussian, if each Gaussian has a simple structure the product can have a richer structure. We examine (1) Products of Gaussian pancakes which give rise to probabilistic Minor Components Analysis, (2) products of I-factor PPCA models and (3) a products of experts construction for an AR(l) process. Recently Hinton (1999) has introduced the Products of Experts (PoE) model in which several individual probabilistic models for data are combined to provide an overall model of the data. In this paper we consider PoE models in which each expert is a Gaussian. It is easy to see that in this case the product model will also be Gaussian. However, if each Gaussian has a simple structure, the product can have a richer structure. Using Gaussian experts is attractive as it permits a thorough analysis of the product architecture, which can be difficult with other models , e.g. models defined over discrete random variables. Below we examine three cases of the products of Gaussians construction: (1) Products of Gaussian pancakes (PoGP) which give rise to probabilistic Minor Components Analysis (MCA), providing a complementary result to probabilistic Principal Components Analysis (PPCA) obtained by Tipping and Bishop (1999); (2) Products of I-factor PPCA models; (3) A products of experts construction for an AR(l) process. Products of Gaussians If each expert is a Gaussian pi(xI8 i ) '
same-paper 2 0.89862216 132 nips-2001-Novel iteration schemes for the Cluster Variation Method
Author: Hilbert J. Kappen, Wim Wiegerinck
Abstract: The Cluster Variation method is a class of approximation methods containing the Bethe and Kikuchi approximations as special cases. We derive two novel iteration schemes for the Cluster Variation Method. One is a fixed point iteration scheme which gives a significant improvement over loopy BP, mean field and TAP methods on directed graphical models. The other is a gradient based method, that is guaranteed to converge and is shown to give useful results on random graphs with mild frustration. We conclude that the methods are of significant practical value for large inference problems. 1
3 0.78490573 88 nips-2001-Grouping and dimensionality reduction by locally linear embedding
Author: Marzia Polito, Pietro Perona
Abstract: Locally Linear Embedding (LLE) is an elegant nonlinear dimensionality-reduction technique recently introduced by Roweis and Saul [2]. It fails when the data is divided into separate groups. We study a variant of LLE that can simultaneously group the data and calculate local embedding of each group. An estimate for the upper bound on the intrinsic dimension of the data set is obtained automatically. 1
4 0.78227758 13 nips-2001-A Natural Policy Gradient
Author: Sham M. Kakade
Abstract: We provide a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space. Although gradient methods cannot make large changes in the values of the parameters, we show that the natural gradient is moving toward choosing a greedy optimal action rather than just a better action. These greedy optimal actions are those that would be chosen under one improvement step of policy iteration with approximate, compatible value functions, as defined by Sutton et al. [9]. We then show drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris. 1
5 0.7811656 127 nips-2001-Multi Dimensional ICA to Separate Correlated Sources
Author: Roland Vollgraf, Klaus Obermayer
Abstract: We present a new method for the blind separation of sources, which do not fulfill the independence assumption. In contrast to standard methods we consider groups of neighboring samples (
6 0.7807647 95 nips-2001-Infinite Mixtures of Gaussian Process Experts
7 0.77848756 135 nips-2001-On Spectral Clustering: Analysis and an algorithm
8 0.7745989 8 nips-2001-A General Greedy Approximation Algorithm with Applications
9 0.772632 58 nips-2001-Covariance Kernels from Bayesian Generative Models
10 0.77089751 7 nips-2001-A Dynamic HMM for On-line Segmentation of Sequential Data
12 0.76995462 44 nips-2001-Blind Source Separation via Multinode Sparse Representation
13 0.76945925 61 nips-2001-Distribution of Mutual Information
14 0.76937973 36 nips-2001-Approximate Dynamic Programming via Linear Programming
15 0.76856732 143 nips-2001-PAC Generalization Bounds for Co-training
16 0.76753402 84 nips-2001-Global Coordination of Local Linear Models
17 0.7673173 188 nips-2001-The Unified Propagation and Scaling Algorithm
18 0.76460379 114 nips-2001-Learning from Infinite Data in Finite Time
19 0.76313394 155 nips-2001-Quantizing Density Estimators
20 0.76257968 121 nips-2001-Model-Free Least-Squares Policy Iteration