nips nips2005 nips2005-2 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Manfred K. Warmuth
Abstract: The classical Bayes rule computes the posterior model probability from the prior probability and the data likelihood. We generalize this rule to the case when the prior is a density matrix (symmetric positive definite and trace one) and the data likelihood a covariance matrix. The classical Bayes rule is retained as the special case when the matrices are diagonal. In the classical setting, the calculation of the probability of the data is an expected likelihood, where the expectation is over the prior distribution. In the generalized setting, this is replaced by an expected variance calculation where the variance is computed along the eigenvectors of the prior density matrix and the expectation is over the eigenvalues of the density matrix (which form a probability vector). The variance along any direction is determined by the covariance matrix. Curiously enough, this expected variance calculation is a quantum measurement where the co-variance matrix specifies the instrument and the prior density matrix the mixture state of the particle. We motivate both the classical and the generalized Bayes rule with a minimum relative entropy principle, where the Kullback-Leibler version gives the classical Bayes rule and Umegaki’s quantum relative entropy the new Bayes rule for density matrices. 1
Reference: text
sentIndex sentText sentNum sentScore
1 edu Abstract The classical Bayes rule computes the posterior model probability from the prior probability and the data likelihood. [sent-4, score-0.432]
2 We generalize this rule to the case when the prior is a density matrix (symmetric positive definite and trace one) and the data likelihood a covariance matrix. [sent-5, score-0.909]
3 The classical Bayes rule is retained as the special case when the matrices are diagonal. [sent-6, score-0.526]
4 In the classical setting, the calculation of the probability of the data is an expected likelihood, where the expectation is over the prior distribution. [sent-7, score-0.291]
5 In the generalized setting, this is replaced by an expected variance calculation where the variance is computed along the eigenvectors of the prior density matrix and the expectation is over the eigenvalues of the density matrix (which form a probability vector). [sent-8, score-1.073]
6 Curiously enough this expected variance calculation is a quantum measurement where the co-variance matrix specifies the instrument and the prior density matrix the mixture state of the particle. [sent-10, score-1.215]
7 We motivate both the classical and the generalized Bayes rule with a minimum relative entropy principle, where the Kullback-Leibler version gives the classical Bayes rule and Umegaki’s quantum relative entropy the new Bayes rule for density matrices. [sent-11, score-1.869]
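For orientation, the two update rules described in this abstract can be written side by side. The generalized form below is assembled from the operation (3) and the trace normalization (5) that appear later in this text; the notation D(.) for the prior density matrix and D(y|.) for the data covariance matrix follows the paper.

```latex
% Classical Bayes rule (probability vector over models):
\[ P(M_i \mid y) \;=\; \frac{P(M_i)\,P(y \mid M_i)}{\sum_j P(M_j)\,P(y \mid M_j)} \]
% Generalized Bayes rule (density-matrix prior, covariance-matrix likelihood):
\[ D(\cdot \mid y) \;=\; \frac{\exp\bigl(\log D(\cdot) + \log D(y \mid \cdot)\bigr)}
                             {\operatorname{tr}\,\exp\bigl(\log D(\cdot) + \log D(y \mid \cdot)\bigr)} \]
% The second rule reduces to the first when all matrices involved are diagonal.
```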
8 1 Introduction In [TRW05] various on-line updates were generalized from vector parameters to matrix parameters. [sent-12, score-0.304]
9 In this paper we use the same method for deriving a Bayes rule for density matrices (symmetric positive definite matrices of trace one). [sent-14, score-0.933]
10 When the parameters are probability vectors over the set of models, then the “classical” Bayes rule can be derived using the relative entropy as the divergence (e. [sent-15, score-0.355]
11 Analogously we now use the quantum relative entropy, introduced by Umegaki, to derive the generalized Bayes rule. [sent-18, score-0.603]
12 The classical Bayes rule may be seen as a soft maximum calculation. [sent-22, score-0.377]
13 Figure 2: We depict seven iterations of the generalized Bayes rule with the bold NW-SE ellipse as the prior density and the bold-dashed SE-NW ellipse as the data covariance matrix. [sent-23, score-0.924]
14 The posterior density matrices (dashed) gradually move from the prior to the longest axis of the covariance matrix. [sent-24, score-0.503]
15 The new rule uses matrix logarithms and exponentials to avoid the fact that symmetric positive definite matrices are not closed under the matrix product. [sent-25, score-0.854]
16 The rule is strikingly similar to the classical Bayes rule and retains the latter as a special case when the matrices are diagonal. [sent-26, score-0.751]
17 Various cancellations occur when the classical Bayes rule is applied iteratively and similar cancellations happen with the new rule. [sent-27, score-0.472]
18 We shall see that the classical Bayes rule may be seen as a soft maximum calculation and the new rule as a soft calculation of the eigenvector with the largest eigenvalue (See Figures 1 and 2). [sent-28, score-0.856]
19 The mathematics applied in this paper is most commonly used in quantum physics. [sent-29, score-0.47]
20 For example, the data likelihood becomes a quantum measurement. [sent-30, score-0.502]
21 It is tempting to call the new rule the “quantum Bayes rule”. [sent-31, score-0.225]
22 Also, the term “quantum Bayes rule” has been claimed before in [SBC01] where the classical Bayes rule is used to update probabilities that happen to arise in the context of quantum physics. [sent-34, score-0.837]
23 Our work is most closely related to a paper by Cerf and Adam [CA99] who also give a formula for conditional densities that relies on the matrix exponential and logarithm. [sent-36, score-0.208]
24 However they are interested in the multivariate case (which requires the use of tensors) and their motivation is to obtain a generalization of a conditional quantum entropy. [sent-37, score-0.496]
25 We hope to build on the great body of work done with the classical Bayes rule in the statistics community and therefore believe that this line of research holds great promise. [sent-38, score-0.344]
26 In the classical setup, model Mi is chosen with prior probability P (Mi ) and then Mi generates a datum y with probability P (y|Mi ). [sent-44, score-0.183]
27 The classical Bayes rule then gives the posterior P(Mi|y) = P(Mi) P(y|Mi) / Σj P(Mj) P(y|Mj) (1). Figure 3: An ellipse S in R2: The eigenvectors are the directions of the axes and the eigenvalues their lengths. [sent-46, score-0.316]
28 (For unit u, the dyad uuᵀ is a degenerate one-dimensional ellipse with its single axis in direction u. [sent-48, score-0.411]
29 ) The solid curve of the ellipse is a plot of Su and the outer dashed figure eight is direction u times the variance uᵀSu. [sent-49, score-0.251]
30 Figure 4: When the ellipses S and T don’t have the same span, then S ⊙ T lies in the intersection of both spans and is a degenerate ellipse of dimension one (bold line). [sent-51, score-0.424]
31 This generalizes the following intersection property of the matrix product when S and T are both diagonal (here of dimension four): diag(S) diag(T) = diag(ST), whose non-zero diagonal entries lie exactly in the intersection of the supports of S and T. [sent-52, score-0.264]
32 3 Density Matrices as Priors We now let our prior D be an arbitrary symmetric positive1 definite matrix of trace one. [sent-56, score-0.474]
33 Such matrices are called density matrices in quantum physics. [sent-57, score-0.96]
34 Any mixture Σi αi ai aiᵀ of dyads ai aiᵀ is a density matrix as long as the coefficients αi are non-negative and sum to one. [sent-59, score-0.515]
35 The trace of such a mixture is one because dyads have trace one and Σi αi = 1. [sent-61, score-0.462]
36 Of course any density matrix D can be decomposed based on an eigensystem. [sent-62, score-0.267]
37 In quantum physics, the dyads are called pure states and density matrices are mixtures over such states. [sent-65, score-0.916]
38 The probability vector (P(Mi)) can be represented as a diagonal matrix diag((P(Mi))) = Σi P(Mi) ei eiᵀ, where ei denotes the ith standard basis vector. [sent-67, score-0.349]
39 (Footnote 1: We use the convention that positive definite matrices have non-negative eigenvalues and strictly positive definite matrices have positive eigenvalues.) [sent-68, score-0.664]
40 This means that probability vectors are special density matrices where the eigenvectors are fixed to the standard basis vectors. [sent-69, score-0.392]
41 4 Co-variance Matrices and Basic Notation In this paper we replace the (conditional) data likelihoods P (y|Mi ) by a data covariance matrix D(y|.). [sent-70, score-0.25]
42 A covariance matrix S can be depicted as an ellipse {Su : ||u||2 ≤ 1} centered at the origin, where the eigenvectors form the principal axes and the eigenvalues are the lengths of the axes (See Figure 3). [sent-73, score-0.575]
43 Assume S is the covariance matrix of some random cost vector c ∈ Rn, i.e. S = E((c − E(c))(c − E(c))ᵀ). [sent-74, score-0.245]
44 Note that a covariance matrix S is diagonal if the components of the cost vector are independent. [sent-77, score-0.281]
45 The variance of the cost vector c along a unit vector u has the form V(cᵀu) = E((cᵀu − E(cᵀu))²) = E(((c − E(c))ᵀu)²) = uᵀSu and the variance along an eigenvector is the corresponding eigenvalue (See Figure 3). [sent-78, score-0.289]
46 Thirdly, uᵀSu is a quantum measurement of the pure state u with an instrument represented by S. [sent-87, score-0.586]
47 The trace tr(A) of a square matrix A is the sum of its diagonal elements Aii . [sent-96, score-0.38]
48 Therefore the trace of a square matrix may be seen as the total variance along any set of orthonormal directions ui: tr(A) = tr(IA) = tr((Σi ui uiᵀ) A) = Σi uiᵀ A ui. [sent-102, score-0.443]
49 In particular, the trace of a square matrix is the sum of its eigenvalues. [sent-103, score-0.402]
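A quick numerical illustration of this trace identity (a sketch in NumPy; generating the orthonormal directions with a QR decomposition is our own choice and not part of the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# A random symmetric matrix A and a random orthonormal basis u_1, ..., u_n.
n = 4
A = rng.standard_normal((n, n))
A = (A + A.T) / 2
U, _ = np.linalg.qr(rng.standard_normal((n, n)))  # columns of U are orthonormal

# The total variance along the basis directions equals the trace,
# which in turn equals the sum of the eigenvalues.
total_variance = sum(U[:, i] @ A @ U[:, i] for i in range(n))
print(np.isclose(total_variance, np.trace(A)))               # True
print(np.isclose(np.trace(A), np.linalg.eigvalsh(A).sum()))  # True
```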
50 The matrix exponential exp(S) of the symmetric matrix S with eigendecomposition S = U σ Uᵀ is defined as exp(S) = U exp(σ) Uᵀ, where exp(σ) is obtained by exponentiating the diagonal entries (eigenvalues). [sent-104, score-0.426]
51 The matrix logarithm log(S) is defined similarly but now S must be strictly positive definite. [sent-105, score-0.285]
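These eigendecomposition-based definitions are easy to spell out in code; the helpers below (sym_exp and sym_log are our own names) agree with scipy.linalg.expm and logm on symmetric inputs:

```python
import numpy as np
from scipy.linalg import expm, logm

def sym_exp(S):
    """Matrix exponential of a symmetric matrix: exponentiate its eigenvalues."""
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def sym_log(S):
    """Matrix logarithm of a strictly positive definite symmetric matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

rng = np.random.default_rng(1)
B = rng.standard_normal((3, 3))
S = B @ B.T + 3 * np.eye(3)   # strictly positive definite and symmetric

print(np.allclose(sym_exp(sym_log(S)), S))   # True
print(np.allclose(sym_exp(S), expm(S)))      # True
print(np.allclose(sym_log(S), logm(S)))      # True (up to numerical error)
```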
52 It is important to remember that exp(S + T) = exp(S) exp(T) holds iff the two symmetric matrices commute2, i.e. ST = TS. [sent-107, score-0.33]
53 However, the following trace inequality, known as the Golden-Thompson inequality [Bha97], always holds: tr(exp S exp T ) ≥ tr(exp (S + T )). [sent-110, score-0.246]
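A numerical spot check of the Golden-Thompson inequality on random symmetric matrices (an illustrative sketch, not from the paper):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
ok = True
for _ in range(1000):
    A = rng.standard_normal((3, 3)); S = (A + A.T) / 2
    B = rng.standard_normal((3, 3)); T = (B + B.T) / 2
    # Golden-Thompson: tr(exp(S) exp(T)) >= tr(exp(S + T)).
    ok &= np.trace(expm(S) @ expm(T)) >= np.trace(expm(S + T)) * (1 - 1e-9)
print(ok)   # True
```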
54 If the prior density matrix is decomposed as D(.) = Σi δi di diᵀ, then the dyad (or pure state) di diᵀ is chosen with probability δi and a random variable cᵀdi is observed, where c has covariance matrix D(y|.). [sent-112, score-0.916]
55 2 This occurs iff the two symmetric matrices have the same eigensystem. [sent-114, score-0.29]
56 In our generalization we replace the expected data likelihood P(y) = Σi P(Mi) P(y|Mi) by the following trace: tr(D(.) D(y|.)) = Σi δi diᵀ D(y|.) di. [sent-115, score-0.217]
57 Therefore the above trace is the expected variance along the eigenvectors of the density matrix weighted by the eigenvalues. [sent-124, score-0.584]
58 Curiously enough, this trace computation is a quantum measurement, where D(y|.) specifies the instrument and the prior density matrix D(.) the mixture state of the particle. [sent-125, score-0.631]
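The equivalence between the expected-variance reading and the single trace can be checked directly; in the sketch below D_prior stands for D(.) and S for D(y|.), names we introduce only for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

def random_density_matrix(n, rng):
    A = rng.standard_normal((n, n))
    D = A @ A.T
    return D / np.trace(D)          # symmetric positive definite, trace one

n = 4
D_prior = random_density_matrix(n, rng)          # plays the role of D(.)
B = rng.standard_normal((n, n)); S = B @ B.T     # covariance matrix, role of D(y|.)

# Expected variance: mix over the prior's eigen-directions, weighted by its eigenvalues.
delta, d = np.linalg.eigh(D_prior)               # eigenvalues delta_i, eigenvectors d_i
expected_variance = sum(delta[i] * (d[:, i] @ S @ d[:, i]) for i in range(n))

# The same number as a single trace, i.e. a quantum measurement of the mixed state D_prior.
print(np.isclose(expected_variance, np.trace(D_prior @ S)))   # True
```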
59 In the generalized Bayes rule we cannot simply multiply the prior density matrix with the covariance matrix that corresponds to the data likelihood. [sent-128, score-0.877]
60 This is because a product of two symmetric positive definite matrices may be neither symmetric nor positive definite. [sent-129, score-0.539]
61 Instead we define the operation ⊙ on the cone of symmetric positive definite matrices. [sent-130, score-0.206]
62 We begin by defining this operation for the case when the matrices S and T are strictly positive definite (and symmetric): S ⊙ T := exp(log S + log T). [sent-131, score-0.397]
63 (3) The matrix log of both matrices produces symmetric matrices that sum to a symmetric matrix. [sent-132, score-0.785]
64 Finally the matrix exponential of the sum produces again a symmetric positive definite matrix. [sent-133, score-0.306]
65 Note that the matrix log is not defined when the matrix has a zero eigenvalue. [sent-134, score-0.346]
66 However for arbitrary symmetric positive definite matrices one can define the operation as the following limit: S ⊙ T := lim_{n→∞} (S^{1/n} T^{1/n})^n. [sent-135, score-0.388]
67 This limit is the Lie Product Formula [Bha97] when S and T are both strictly positive, but it exists even if the matrices don’t have full rank and by Theorem 1. [sent-136, score-0.235]
68 Assume that B ∈ Rn×k, BᵀB = Ik, and range(B) = range(S) ∩ range(T), and that log+ denotes the modified matrix logarithm that takes logs of the non-zero eigenvalues but leaves zero eigenvalues unchanged. [sent-140, score-0.327]
69 (4) When both matrices have the same eigensystem, then ⊙ becomes the matrix product. [sent-142, score-0.323]
70 One can show that ⊙ is associative, commutative, has the identity matrix I as its neutral element, and for any strictly positive definite and symmetric matrix S, S ⊙ S⁻¹ = I. [sent-143, score-0.5]
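For strictly positive definite inputs, the operation of equation (3), written ⊙ in this text, is a one-liner on top of the matrix logarithm and exponential. The sketch below (matrix_odot is our own name) also checks commutativity and the diagonal case, where ⊙ reduces to the ordinary matrix product:

```python
import numpy as np
from scipy.linalg import expm, logm

def matrix_odot(S, T):
    """The operation of equation (3): exp(log S + log T), for strictly
    positive definite symmetric S and T."""
    return expm(logm(S) + logm(T)).real

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3)); S = A @ A.T + np.eye(3)
B = rng.standard_normal((3, 3)); T = B @ B.T + np.eye(3)

print(np.allclose(matrix_odot(S, T), matrix_odot(T, S)))   # commutative
Sd, Td = np.diag([1.0, 2.0, 3.0]), np.diag([0.5, 4.0, 1.0])
print(np.allclose(matrix_odot(Sd, Td), Sd @ Td))           # diagonal case: ordinary product
```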
71 Using this new product operation, the generalized Bayes rule becomes: D(.|y) = D(.) ⊙ D(y|.) / tr(D(.) ⊙ D(y|.)) (5). [sent-145, score-0.352]
72 Normalizing by the trace assures that the trace of the posterior density matrix is one. [sent-151, score-0.613]
73 Assume the prior density matrix is the circle D(.) = ½ I (the 2×2 identity scaled to trace one) [sent-153, score-0.331]
74 and the data covariance matrix is the degenerate NE-SW ellipse D(y|.), a single dyad whose one axis lies along that diagonal. [sent-154, score-0.433]
75 In particular, when the prior is diag((P(Mi))) and the covariance matrix diag((P(y|Mi))), then the new rule realizes the classical rule and computes diag((P(Mi|y))). [sent-169, score-0.854]
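Putting the pieces together, the generalized Bayes rule (5) and its diagonal special case can be verified in a few lines; generalized_bayes is our own helper name, and the classical comparison follows equation (1):

```python
import numpy as np
from scipy.linalg import expm, logm

def generalized_bayes(D_prior, D_like):
    """Generalized Bayes rule (5): normalize exp(log D(.) + log D(y|.)) to trace one.
    Both arguments are assumed symmetric and strictly positive definite."""
    G = expm(logm(D_prior) + logm(D_like)).real
    return G / np.trace(G)

# Diagonal special case: the update reduces to the classical Bayes rule (1).
prior = np.array([0.5, 0.3, 0.2])        # P(M_i)
like = np.array([0.1, 0.6, 0.3])         # P(y | M_i)
posterior_classical = prior * like / np.sum(prior * like)

D_post = generalized_bayes(np.diag(prior), np.diag(like))
print(np.allclose(np.diag(D_post), posterior_classical))   # True
```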
76 In the case of the generalized Bayes rule, the expected variance only upper bounds the normalization factor via the Golden-Thompson inequality (2): tr(D(.) ⊙ D(y|.)) ≤ tr(D(.) D(y|.)) (6). [sent-172, score-0.228]
77 The classical Bayes rule can be applied iteratively to a sequence of data and various cancellations occur. [sent-177, score-0.408]
78 (7) Finally, the product of the expected variances for both trials combines in a similar way, except that in the generalized case the equality becomes an inequality: tr(D(. [sent-192, score-0.237]
79 6 The Derivation of the Generalized Bayes Rule The classical Bayes rule can be derived4 by minimizing a relative entropy to the prior plus a convex combination of the log losses of the models (See e. [sent-205, score-0.602]
80 The negative entropy ameliorates the maximum calculation and pulls the optimal solution towards the prior. [sent-211, score-0.233]
81 By introducing a Lagrange multiplier for the remaining constraint and differentiating, we obtain the solution γi∗ = P(Mi) P(y|Mi) / Σj P(Mj) P(y|Mj), which is the classical Bayes rule (1). [sent-213, score-0.394]
82 Notice that this is minus the logarithm of the normalization of the Bayes rule (1) and is also the log loss associated with the standard Bayesian setup. [sent-215, score-0.323]
83 To derive the new generalized Bayes rule in an analogous way, we use the quantum physics generalizations of the relative entropy between two densities G and D (due to Umegaki): tr(G(log G − log D)). [sent-216, score-0.989]
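Umegaki's quantum relative entropy is straightforward to evaluate numerically. The sketch below (helper name ours) also confirms that it reduces to the ordinary Kullback-Leibler divergence when both density matrices are diagonal:

```python
import numpy as np
from scipy.linalg import logm

def quantum_relative_entropy(G, D):
    """Umegaki's divergence tr(G (log G - log D)) for strictly positive
    definite density matrices G and D."""
    return float(np.trace(G @ (logm(G) - logm(D))).real)

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])
kl = np.sum(p * np.log(p / q))
print(np.isclose(quantum_relative_entropy(np.diag(p), np.diag(q)), kl))   # True
```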
84 We also need to replace the mixture of negative log likelihoods by the trace −tr(G log D(y|.)). [sent-217, score-0.351]
85 Now the matrix parameter G is constrained to be a density matrix and the minimization problem becomes5: inf_G tr(G(log G − log D(.))) − tr(G log D(y|.)). [sent-219, score-0.472]
86 Except for the quantum relative entropy term, the argument of the infimum is again linear in the variable G and is minimized when G is a single dyad uuᵀ, where u is the eigenvector belonging to the maximum eigenvalue of the matrix log D(y|.). [sent-223, score-1.002]
87 The linear term pulls G toward a direction of high variance of this matrix, whereas the quantum relative entropy pulls G toward the prior density matrix. [sent-225, score-1.041]
88 The density matrix constraint requires the eigenvalues of G to be non-negative and the trace of G to be one. [sent-226, score-0.504]
89 Again by introducing a Lagrange multiplier for the remaining trace constraint and differentiating (following [TRW05]), we arrive at a formula for the optimum G ∗ which coincides with the formula for the D(. [sent-228, score-0.32]
90 |y) given in the generalized Bayes rule (5), where ⊙ is defined6 as in (3). [sent-229, score-0.325]
91 Since the quantum relative entropy is strictly convex [NC00] in G, the optimum G ∗ is unique. [sent-230, score-0.68]
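As a sanity check on this derivation, one can compare the closed-form posterior (5) against random density matrices on the objective tr(G(log G − log D(.))) − tr(G log D(y|.)). This is only an illustrative sketch, assuming both input matrices are strictly positive definite; it does not reproduce the Lagrange-multiplier argument itself:

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(5)

def rand_density(n):
    A = rng.standard_normal((n, n))
    D = A @ A.T + 1e-3 * np.eye(n)      # strictly positive definite
    return D / np.trace(D)              # trace one

def objective(G, D_prior, D_like):
    # Quantum relative entropy to the prior plus the negative expected log-likelihood term.
    rel_ent = np.trace(G @ (logm(G) - logm(D_prior))).real
    log_loss = -np.trace(G @ logm(D_like)).real
    return float(rel_ent + log_loss)

n = 3
D_prior, D_like = rand_density(n), rand_density(n)

G_star = expm(logm(D_prior) + logm(D_like)).real
G_star /= np.trace(G_star)              # generalized Bayes posterior (5)

best = objective(G_star, D_prior, D_like)
print(all(objective(rand_density(n), D_prior, D_like) >= best - 1e-9
          for _ in range(2000)))        # True: no random density matrix beats the closed form
```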
92 6 With some work, one can also derive the Bayes rule with the fancier operation (4). [sent-235, score-0.266]
93 7 Conclusion Our generalized Bayes rule suggests a definition of conditional density matrices and we are currently developing a calculus for such matrices. [sent-236, score-0.659]
94 In particular, a common formalism is needed that includes the multivariate conditional density matrices defined in [CA99] based on tensors. [sent-237, score-0.334]
95 i.e. square matrices in Cn×n for which S equals its conjugate transpose, S = S∗. [sent-240, score-0.224]
96 Now both the prior density matrix and the data covariance matrix must be Hermitian instead of symmetric. [sent-241, score-0.552]
97 The generalized Bayes rule for symmetric positive definite matrices relies on computing eigendecompositions (Ω(n³) time). [sent-242, score-0.672]
98 Hopefully, there exist O(n²) versions of the update that approximate the generalized Bayes rule sufficiently well. [sent-243, score-0.348]
99 In preliminary research we showed that one can maintain a density matrix over the base experts instead and derive updates similar to the generalized Bayes rule given in this paper. [sent-247, score-0.672]
100 Most importantly, the bounds generalize to the case when mixtures over experts are replaced by density matrices. [sent-248, score-0.214]
wordName wordTfidf (topN-words)
[('quantum', 0.47), ('mi', 0.431), ('tr', 0.277), ('bayes', 0.268), ('rule', 0.225), ('matrices', 0.182), ('trace', 0.161), ('ellipse', 0.152), ('matrix', 0.141), ('su', 0.133), ('density', 0.126), ('di', 0.12), ('classical', 0.119), ('symmetric', 0.108), ('dyads', 0.107), ('generalized', 0.1), ('entropy', 0.097), ('covariance', 0.08), ('eigenvalues', 0.076), ('calculation', 0.072), ('log', 0.064), ('cancellations', 0.064), ('dyad', 0.064), ('pulls', 0.064), ('umegaki', 0.064), ('diag', 0.064), ('prior', 0.064), ('degenerate', 0.06), ('positive', 0.057), ('mj', 0.057), ('uu', 0.056), ('strictly', 0.053), ('instrument', 0.051), ('eigenvectors', 0.05), ('mum', 0.047), ('variance', 0.047), ('inequality', 0.045), ('argmaxi', 0.043), ('cerf', 0.043), ('curiously', 0.043), ('eigensystem', 0.043), ('erentiating', 0.043), ('hermitian', 0.043), ('square', 0.042), ('operation', 0.041), ('formula', 0.041), ('experts', 0.041), ('exp', 0.04), ('nite', 0.04), ('eigenvalue', 0.04), ('range', 0.04), ('expert', 0.039), ('updates', 0.039), ('axes', 0.038), ('ei', 0.038), ('intersection', 0.038), ('eigenvector', 0.037), ('expected', 0.036), ('diagonal', 0.036), ('logarithm', 0.034), ('basis', 0.034), ('manfred', 0.034), ('measurement', 0.034), ('relative', 0.033), ('mixture', 0.033), ('soft', 0.033), ('likelihood', 0.032), ('kivinen', 0.032), ('warmuth', 0.032), ('pure', 0.031), ('au', 0.03), ('ellipses', 0.03), ('replace', 0.029), ('ui', 0.029), ('exponentiated', 0.028), ('direction', 0.028), ('ai', 0.027), ('axis', 0.027), ('product', 0.027), ('equality', 0.027), ('optimum', 0.027), ('conditional', 0.026), ('analogously', 0.026), ('multiplier', 0.026), ('bold', 0.025), ('posteriors', 0.025), ('toward', 0.024), ('unit', 0.024), ('ab', 0.024), ('outer', 0.024), ('sake', 0.024), ('vector', 0.024), ('introducing', 0.024), ('posterior', 0.024), ('replaced', 0.024), ('update', 0.023), ('physical', 0.023), ('along', 0.023), ('generalize', 0.023), ('dimension', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 2 nips-2005-A Bayes Rule for Density Matrices
Author: Manfred K. Warmuth
Abstract: The classical Bayes rule computes the posterior model probability from the prior probability and the data likelihood. We generalize this rule to the case when the prior is a density matrix (symmetric positive definite and trace one) and the data likelihood a covariance matrix. The classical Bayes rule is retained as the special case when the matrices are diagonal. In the classical setting, the calculation of the probability of the data is an expected likelihood, where the expectation is over the prior distribution. In the generalized setting, this is replaced by an expected variance calculation where the variance is computed along the eigenvectors of the prior density matrix and the expectation is over the eigenvalues of the density matrix (which form a probability vector). The variances along any direction is determined by the covariance matrix. Curiously enough this expected variance calculation is a quantum measurement where the co-variance matrix specifies the instrument and the prior density matrix the mixture state of the particle. We motivate both the classical and the generalized Bayes rule with a minimum relative entropy principle, where the Kullbach-Leibler version gives the classical Bayes rule and Umegaki’s quantum relative entropy the new Bayes rule for density matrices. 1
2 0.13586335 192 nips-2005-The Information-Form Data Association Filter
Author: Brad Schumitsch, Sebastian Thrun, Gary Bradski, Kunle Olukotun
Abstract: This paper presents a new filter for online data association problems in high-dimensional spaces. The key innovation is a representation of the data association posterior in information form, in which the “proximity” of objects and tracks are expressed by numerical links. Updating these links requires linear time, compared to exponential time required for computing the exact posterior probabilities. The paper derives the algorithm formally and provides comparative results using data obtained by a real-world camera array and by a large-scale sensor network simulation.
3 0.10142334 70 nips-2005-Fast Information Value for Graphical Models
Author: Brigham S. Anderson, Andrew W. Moore
Abstract: Calculations that quantify the dependencies between variables are vital to many operations with graphical models, e.g., active learning and sensitivity analysis. Previously, pairwise information gain calculation has involved a cost quadratic in network size. In this work, we show how to perform a similar computation with cost linear in network size. The loss function that allows this is of a form amenable to computation by dynamic programming. The message-passing algorithm that results is described and empirical results demonstrate large speedups without decrease in accuracy. In the cost-sensitive domains examined, superior accuracy is achieved.
4 0.098390438 24 nips-2005-An Approximate Inference Approach for the PCA Reconstruction Error
Author: Manfred Opper
Abstract: The problem of computing a resample estimate for the reconstruction error in PCA is reformulated as an inference problem with the help of the replica method. Using the expectation consistent (EC) approximation, the intractable inference problem can be solved efficiently using only two variational parameters. A perturbative correction to the result is computed and an alternative simplified derivation is also presented. 1
5 0.080790944 113 nips-2005-Learning Multiple Related Tasks using Latent Independent Component Analysis
Author: Jian Zhang, Zoubin Ghahramani, Yiming Yang
Abstract: We propose a probabilistic model based on Independent Component Analysis for learning multiple related tasks. In our model the task parameters are assumed to be generated from independent sources which account for the relatedness of the tasks. We use Laplace distributions to model hidden sources which makes it possible to identify the hidden, independent components instead of just modeling correlations. Furthermore, our model enjoys a sparsity property which makes it both parsimonious and robust. We also propose efficient algorithms for both empirical Bayes method and point estimation. Our experimental results on two multi-label text classification data sets show that the proposed approach is promising.
6 0.076193571 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models
7 0.069610223 105 nips-2005-Large-Scale Multiclass Transduction
8 0.069545113 189 nips-2005-Tensor Subspace Analysis
9 0.06950549 21 nips-2005-An Alternative Infinite Mixture Of Gaussian Process Experts
10 0.069134288 140 nips-2005-Nonparametric inference of prior probabilities from Bayes-optimal behavior
11 0.066040196 56 nips-2005-Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators
12 0.065878607 202 nips-2005-Variational EM Algorithms for Non-Gaussian Latent Variable Models
13 0.064467907 98 nips-2005-Infinite latent feature models and the Indian buffet process
14 0.064124085 136 nips-2005-Noise and the two-thirds power Law
15 0.063034937 15 nips-2005-A Theoretical Analysis of Robust Coding over Noisy Overcomplete Channels
16 0.062154263 58 nips-2005-Divergences, surrogate loss functions and experimental design
17 0.059498284 126 nips-2005-Metric Learning by Collapsing Classes
18 0.058063783 47 nips-2005-Consistency of one-class SVM and related algorithms
19 0.053362351 114 nips-2005-Learning Rankings via Convex Hull Separation
20 0.052373309 65 nips-2005-Estimating the wrong Markov random field: Benefits in the computation-limited setting
topicId topicWeight
[(0, 0.181), (1, 0.061), (2, -0.025), (3, -0.011), (4, 0.047), (5, -0.085), (6, -0.107), (7, -0.055), (8, -0.013), (9, -0.047), (10, 0.033), (11, -0.018), (12, 0.052), (13, 0.052), (14, 0.019), (15, 0.07), (16, -0.114), (17, -0.034), (18, -0.048), (19, 0.04), (20, -0.002), (21, 0.076), (22, 0.11), (23, 0.049), (24, 0.105), (25, 0.106), (26, 0.145), (27, 0.0), (28, -0.032), (29, 0.054), (30, 0.003), (31, 0.003), (32, -0.028), (33, -0.124), (34, 0.115), (35, 0.028), (36, -0.174), (37, -0.183), (38, 0.0), (39, -0.09), (40, 0.033), (41, -0.163), (42, -0.042), (43, -0.024), (44, 0.016), (45, 0.035), (46, -0.11), (47, 0.045), (48, -0.238), (49, 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.97804213 2 nips-2005-A Bayes Rule for Density Matrices
Author: Manfred K. Warmuth
Abstract: The classical Bayes rule computes the posterior model probability from the prior probability and the data likelihood. We generalize this rule to the case when the prior is a density matrix (symmetric positive definite and trace one) and the data likelihood a covariance matrix. The classical Bayes rule is retained as the special case when the matrices are diagonal. In the classical setting, the calculation of the probability of the data is an expected likelihood, where the expectation is over the prior distribution. In the generalized setting, this is replaced by an expected variance calculation where the variance is computed along the eigenvectors of the prior density matrix and the expectation is over the eigenvalues of the density matrix (which form a probability vector). The variances along any direction is determined by the covariance matrix. Curiously enough this expected variance calculation is a quantum measurement where the co-variance matrix specifies the instrument and the prior density matrix the mixture state of the particle. We motivate both the classical and the generalized Bayes rule with a minimum relative entropy principle, where the Kullbach-Leibler version gives the classical Bayes rule and Umegaki’s quantum relative entropy the new Bayes rule for density matrices. 1
2 0.67996353 192 nips-2005-The Information-Form Data Association Filter
Author: Brad Schumitsch, Sebastian Thrun, Gary Bradski, Kunle Olukotun
Abstract: This paper presents a new filter for online data association problems in high-dimensional spaces. The key innovation is a representation of the data association posterior in information form, in which the “proximity” of objects and tracks are expressed by numerical links. Updating these links requires linear time, compared to exponential time required for computing the exact posterior probabilities. The paper derives the algorithm formally and provides comparative results using data obtained by a real-world camera array and by a large-scale sensor network simulation.
3 0.55463839 24 nips-2005-An Approximate Inference Approach for the PCA Reconstruction Error
Author: Manfred Opper
Abstract: The problem of computing a resample estimate for the reconstruction error in PCA is reformulated as an inference problem with the help of the replica method. Using the expectation consistent (EC) approximation, the intractable inference problem can be solved efficiently using only two variational parameters. A perturbative correction to the result is computed and an alternative simplified derivation is also presented. 1
4 0.45068529 189 nips-2005-Tensor Subspace Analysis
Author: Xiaofei He, Deng Cai, Partha Niyogi
Abstract: Previous work has demonstrated that the image variations of many objects (human faces in particular) under variable lighting can be effectively modeled by low dimensional linear spaces. The typical linear subspace learning algorithms include Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Locality Preserving Projection (LPP). All of these methods consider an n1 × n2 image as a high dimensional vector in Rn1 ×n2 , while an image represented in the plane is intrinsically a matrix. In this paper, we propose a new algorithm called Tensor Subspace Analysis (TSA). TSA considers an image as the second order tensor in Rn1 ⊗ Rn2 , where Rn1 and Rn2 are two vector spaces. The relationship between the column vectors of the image matrix and that between the row vectors can be naturally characterized by TSA. TSA detects the intrinsic local geometrical structure of the tensor space by learning a lower dimensional tensor subspace. We compare our proposed approach with PCA, LDA and LPP methods on two standard databases. Experimental results demonstrate that TSA achieves better recognition rate, while being much more efficient. 1
5 0.44648007 137 nips-2005-Non-Gaussian Component Analysis: a Semi-parametric Framework for Linear Dimension Reduction
Author: Gilles Blanchard, Masashi Sugiyama, Motoaki Kawanabe, Vladimir Spokoiny, Klaus-Robert Müller
Abstract: We propose a new linear method for dimension reduction to identify nonGaussian components in high dimensional data. Our method, NGCA (non-Gaussian component analysis), uses a very general semi-parametric framework. In contrast to existing projection methods we define what is uninteresting (Gaussian): by projecting out uninterestingness, we can estimate the relevant non-Gaussian subspace. We show that the estimation error of finding the non-Gaussian components tends to zero at a parametric rate. Once NGCA components are identified and extracted, various tasks can be applied in the data analysis process, like data visualization, clustering, denoising or classification. A numerical study demonstrates the usefulness of our method. 1
6 0.40777442 86 nips-2005-Generalized Nonnegative Matrix Approximations with Bregman Divergences
7 0.39984578 205 nips-2005-Worst-Case Bounds for Gaussian Process Models
8 0.39916781 140 nips-2005-Nonparametric inference of prior probabilities from Bayes-optimal behavior
9 0.38274994 29 nips-2005-Analyzing Coupled Brain Sources: Distinguishing True from Spurious Interaction
10 0.37966937 105 nips-2005-Large-Scale Multiclass Transduction
11 0.37600249 15 nips-2005-A Theoretical Analysis of Robust Coding over Noisy Overcomplete Channels
12 0.34897593 113 nips-2005-Learning Multiple Related Tasks using Latent Independent Component Analysis
13 0.34553337 70 nips-2005-Fast Information Value for Graphical Models
14 0.34420583 32 nips-2005-Augmented Rescorla-Wagner and Maximum Likelihood Estimation
15 0.33725527 62 nips-2005-Efficient Estimation of OOMs
16 0.33197954 71 nips-2005-Fast Krylov Methods for N-Body Learning
17 0.33109462 129 nips-2005-Modeling Neural Population Spiking Activity with Gibbs Distributions
18 0.32834306 167 nips-2005-Robust design of biological experiments
19 0.32523823 133 nips-2005-Nested sampling for Potts models
20 0.32400903 30 nips-2005-Assessing Approximations for Gaussian Process Classification
topicId topicWeight
[(3, 0.034), (10, 0.021), (27, 0.024), (31, 0.03), (34, 0.595), (50, 0.014), (55, 0.042), (69, 0.025), (73, 0.022), (77, 0.017), (88, 0.045), (91, 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.98961085 2 nips-2005-A Bayes Rule for Density Matrices
Author: Manfred K. Warmuth
Abstract: The classical Bayes rule computes the posterior model probability from the prior probability and the data likelihood. We generalize this rule to the case when the prior is a density matrix (symmetric positive definite and trace one) and the data likelihood a covariance matrix. The classical Bayes rule is retained as the special case when the matrices are diagonal. In the classical setting, the calculation of the probability of the data is an expected likelihood, where the expectation is over the prior distribution. In the generalized setting, this is replaced by an expected variance calculation where the variance is computed along the eigenvectors of the prior density matrix and the expectation is over the eigenvalues of the density matrix (which form a probability vector). The variances along any direction is determined by the covariance matrix. Curiously enough this expected variance calculation is a quantum measurement where the co-variance matrix specifies the instrument and the prior density matrix the mixture state of the particle. We motivate both the classical and the generalized Bayes rule with a minimum relative entropy principle, where the Kullbach-Leibler version gives the classical Bayes rule and Umegaki’s quantum relative entropy the new Bayes rule for density matrices. 1
2 0.97784573 12 nips-2005-A PAC-Bayes approach to the Set Covering Machine
Author: François Laviolette, Mario Marchand, Mohak Shah
Abstract: We design a new learning algorithm for the Set Covering Machine from a PAC-Bayes perspective and propose a PAC-Bayes risk bound which is minimized for classifiers achieving a non trivial margin-sparsity trade-off. 1
3 0.96934134 10 nips-2005-A General and Efficient Multiple Kernel Learning Algorithm
Author: Sören Sonnenburg, Gunnar Rätsch, Christin Schäfer
Abstract: While classical kernel-based learning algorithms are based on a single kernel, in practice it is often desirable to use multiple kernels. Lankriet et al. (2004) considered conic combinations of kernel matrices for classification, leading to a convex quadratically constraint quadratic program. We show that it can be rewritten as a semi-infinite linear program that can be efficiently solved by recycling the standard SVM implementations. Moreover, we generalize the formulation and our method to a larger class of problems, including regression and one-class classification. Experimental results show that the proposed algorithm helps for automatic model selection, improving the interpretability of the learning result and works for hundred thousands of examples or hundreds of kernels to be combined. 1
4 0.96883237 42 nips-2005-Combining Graph Laplacians for Semi--Supervised Learning
Author: Andreas Argyriou, Mark Herbster, Massimiliano Pontil
Abstract: A foundational problem in semi-supervised learning is the construction of a graph underlying the data. We propose to use a method which optimally combines a number of differently constructed graphs. For each of these graphs we associate a basic graph kernel. We then compute an optimal combined kernel. This kernel solves an extended regularization problem which requires a joint minimization over both the data and the set of graph kernels. We present encouraging results on different OCR tasks where the optimal combined kernel is computed from graphs constructed with a variety of distances functions and the ‘k’ in nearest neighbors. 1
5 0.946854 143 nips-2005-Off-Road Obstacle Avoidance through End-to-End Learning
Author: Urs Muller, Jan Ben, Eric Cosatto, Beat Flepp, Yann L. Cun
Abstract: We describe a vision-based obstacle avoidance system for off-road mobile robots. The system is trained from end to end to map raw input images to steering angles. It is trained in supervised mode to predict the steering angles provided by a human driver during training runs collected in a wide variety of terrains, weather conditions, lighting conditions, and obstacle types. The robot is a 50cm off-road truck, with two forwardpointing wireless color cameras. A remote computer processes the video and controls the robot via radio. The learning system is a large 6-layer convolutional network whose input is a single left/right pair of unprocessed low-resolution images. The robot exhibits an excellent ability to detect obstacles and navigate around them in real time at speeds of 2 m/s.
6 0.81920719 44 nips-2005-Computing the Solution Path for the Regularized Support Vector Regression
7 0.78135997 57 nips-2005-Distance Metric Learning for Large Margin Nearest Neighbor Classification
8 0.76842618 114 nips-2005-Learning Rankings via Convex Hull Separation
9 0.76152194 123 nips-2005-Maximum Margin Semi-Supervised Learning for Structured Variables
10 0.76031977 190 nips-2005-The Curse of Highly Variable Functions for Local Kernel Machines
11 0.75829315 31 nips-2005-Asymptotics of Gaussian Regularized Least Squares
12 0.75459373 105 nips-2005-Large-Scale Multiclass Transduction
13 0.74527013 77 nips-2005-From Lasso regression to Feature vector machine
14 0.73626018 184 nips-2005-Structured Prediction via the Extragradient Method
15 0.72355878 58 nips-2005-Divergences, surrogate loss functions and experimental design
16 0.72332728 50 nips-2005-Convex Neural Networks
17 0.7182526 126 nips-2005-Metric Learning by Collapsing Classes
18 0.71813285 47 nips-2005-Consistency of one-class SVM and related algorithms
19 0.70823473 138 nips-2005-Non-Local Manifold Parzen Windows
20 0.70737904 195 nips-2005-Transfer learning for text classification