nips nips2001 nips2001-97 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Toshiyuki Tanaka, Shiro Ikeda, Shun-ichi Amari
Abstract: We report a result of perturbation analysis on decoding error of the belief propagation decoder for Gallager codes. The analysis is based on information geometry, and it shows that the principal term of decoding error at equilibrium comes from the m-embedding curvature of the log-linear submanifold spanned by the estimated pseudoposteriors, one for the full marginal, and K for partial posteriors, each of which takes a single check into account, where K is the number of checks in the Gallager code. It is then shown that the principal error term vanishes when the parity-check matrix of the code is so sparse that there are no two columns with overlap greater than 1. 1
Reference: text
sentIndex sentText sentNum sentScore
1 We report a result of perturbation analysis on decoding error of the belief propagation decoder for Gallager codes. [sent-10, score-1.279]
2 It is then shown that the principal error term vanishes when the parity-check matrix of the code is so sparse that there are no two columns with overlap greater than 1. [sent-12, score-0.412]
3 1 Introduction Recent progress on error-correcting codes has attracted much attention because their decoders, exhibiting performance very close to Shannon's limit, can be implemented as neural networks. [sent-13, score-0.235]
4 Important examples are turbo codes and Gallager codes [1]. [sent-14, score-0.6]
5 They are, however, not exact but approximate, since the associated graphical representations have loops in both cases. [sent-16, score-0.177]
6 An important problem posed is to quantify the effect that comes from the existence of loops in the underlying graph. [sent-17, score-0.154]
7 The so-called TAP approach [4] in statistical physics is an alternative way to formulate the same decoding algorithm [5]. [sent-18, score-0.388]
8 We specifically make use of the information geometry [6] and report a result of perturbation analysis on decoding error of the belief propagation decoder. [sent-21, score-1.054]
9 2 Gallager codes A Gallager code is defined by its parity-check matrix A, which has the form A = [C1 | C2], (1) where C1 and C2 are K × M and K × K matrices, both of which are taken to be very sparse. [sent-22, score-0.369]
10 We define the generator matrix of the Gallager code to be G^T = [I ; C2^{-1} C1], (2) a (K + M) × M block matrix in which the M × M identity matrix I is stacked above C2^{-1} C1. [sent-24, score-0.177]
11 The whole model of communication with the Gallager code is shown in Fig. 1. [sent-26, score-0.089]
12 An information vector s of length M is encoded into a codeword t = G^T s mod 2 of length N ≡ K + M. [sent-28, score-0.31]
13 The codeword t is then transmitted over a channel. [sent-29, score-0.095]
14 We assume that the transmission channel is a binary symmetric channel (BSC) with bit-error probability σ . [sent-30, score-0.143]
15 The received vector is then r = t + n mod 2, where n is the noise vector. [sent-31, score-0.187]
16 The decoder tries to find the most probable x satisfying the parity-check equation Ax = z mod 2, (3) where z ≡ Ar mod 2 is the syndrome vector; since A G^T = 0 mod 2, we have z = A(G^T s + n) = An mod 2, so the syndrome depends only on the noise. [sent-32, score-0.459]
17 If we are successful in finding the true noise vector n, we can reconstruct, from r, the original codeword t by t = r + n mod 2, and then the information vector s. [sent-35, score-0.256]
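As a concrete illustration of Eqs. (1)-(3), the following sketch builds a toy parity-check matrix A = [C1 | C2], forms G^T = [I ; C2^{-1} C1] over GF(2), encodes an information vector, and computes the syndrome of a received word. The specific matrices are made up for illustration only; a real Gallager code would use large, very sparse C1 and C2.

```python
import numpy as np

def gf2_inv(C):
    """Invert a square binary matrix over GF(2) by Gaussian elimination."""
    n = C.shape[0]
    aug = np.concatenate([C % 2, np.eye(n, dtype=int)], axis=1)
    for col in range(n):
        pivots = np.nonzero(aug[col:, col])[0]
        if pivots.size == 0:
            raise ValueError("matrix is singular over GF(2)")
        p = pivots[0] + col
        aug[[col, p]] = aug[[p, col]]            # bring a pivot row into place
        for row in range(n):
            if row != col and aug[row, col]:
                aug[row] ^= aug[col]             # clear the rest of this column
    return aug[:, n:]

# Toy parity-check matrix A = [C1 | C2] with K = M = 3, N = 6.
C1 = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 1]])
C2 = np.array([[1, 1, 0], [0, 1, 1], [0, 0, 1]])
A = np.concatenate([C1, C2], axis=1)

# Generator matrix G^T = [I ; C2^{-1} C1] (Eq. 2), so that A G^T = 0 (mod 2).
M = C1.shape[1]
GT = np.concatenate([np.eye(M, dtype=int), gf2_inv(C2) @ C1 % 2], axis=0)
assert not (A @ GT % 2).any()

s = np.array([1, 0, 1])                  # information vector, length M
t = GT @ s % 2                           # codeword, length N = K + M
n = np.array([0, 0, 0, 1, 0, 0])         # noise added by the BSC
r = (t + n) % 2                          # received vector
z = A @ r % 2                            # syndrome (Eq. 3); equals A n mod 2
print("codeword:", t, "syndrome:", z, "A n mod 2:", A @ n % 2)
```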
18 Since Eq. (3) is underdetermined, one has to take into account the prior knowledge of the noise in order to solve it properly. [sent-37, score-0.025]
19 The decoding problem can be cast into the Bayes framework. [sent-38, score-0.388]
20 The prior for x is p(x) = exp[ β1 · x − N ψ(β) ], ψ(β) ≡ log(e^β + e^{−β}), (4) where 1 is an N-dimensional vector whose elements are all 1, i.e., 1 ≡ (1, . . . , 1). [sent-40, score-0.093]
21 β is a parameter which is related to the bit-error probability σ of the transmission channel by σ = (1/2)(1 − tanh β). (5) [sent-46, score-0.125]
22 For the sake of analytical tractability, we assume that the syndrome vector z is observed via another BSC channel with bit-error probability σ (see Fig. 1). [sent-47, score-0.193]
23 This leads to p(z|x) ∝ exp( ρ Σ_{r=1}^{K} zr Π_{i∈L(r)} xi ), (6) where L(r) is the set of all indices of nonzero elements in row r of the parity-check matrix A, i.e., L(r) ≡ {i | Ari = 1}. [sent-49, score-0.295]
24 ρ is defined by σ = (1/2)(1 − tanh ρ). [sent-51, score-0.038]
25 If we take the limit ρ → +∞, then we recover the conventional situation of observing the syndrome in a deterministic way. [sent-52, score-0.137]
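The channel parameters enter the decoder only through β and ρ. Inverting Eq. (5), and the corresponding relation σ = (1/2)(1 − tanh ρ) for the syndrome channel, gives the minimal sketch below; the name sigma_syn for the syndrome-channel bit-error probability is ours, used only to keep it distinct from the transmission-channel σ in code.

```python
import numpy as np

def bsc_field(sigma):
    """Invert sigma = (1 - tanh(beta)) / 2, i.e. beta = atanh(1 - 2*sigma)."""
    return np.arctanh(1.0 - 2.0 * sigma)

beta = bsc_field(0.1)          # prior field for a BSC with bit-error probability 0.1
sigma_syn = 1e-6               # assumed error probability of the syndrome observation
rho = bsc_field(sigma_syn)     # large rho; rho -> +inf recovers deterministic checks
print(beta, rho)
```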
26 Since experimental findings suggest that decoding results of Gallager codes usually violate none of the parity-check constraints [3], we believe that making the parity-check constraints soft does not alter the essential properties of the problem. [sent-54, score-0.651]
27 3 Decoding The posterior distribution of x for a given observed syndrome z is derived from the prior p(x) and the conditional p(z|x) by applying the Bayes formula, and the result is p(x|z) ∝ exp( c0(x) + ρ Σ_{r=1}^{K} cr(x) ), (7) where we let c0(x) ≡ β1 · x and cr(x) ≡ zr Π_{i∈L(r)} xi (r = 1, . . . , K). (8) [sent-55, score-0.486]
28 The objective of the decoder of Gallager codes is to obtain the marginal-posterior-mode (MPM) estimate based on the posterior p(x|z): x̂i = arg max_{xi} Σ_{x \ xi} p(x|z). (9) [sent-59, score-0.704]
29 The MPM estimation provides the Bayes-optimum decoder minimizing the expected bit-error probability of the decoding results. [sent-60, score-0.711]
30 However, the marginalization is in general computationally hard, which renders the decoding problem intractable. [sent-61, score-0.509]
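For intuition, the MPM estimate of Eq. (9) can be computed exactly for a tiny code by summing the unnormalized posterior of Eq. (7) over all 2^N configurations; the exponential cost of this enumeration is precisely what makes the problem intractable in general. The sketch below is ours: it assumes the usual ±1 spin convention (bit b ↔ spin 1 − 2b, so that cr(x) = zr Π xi) and reuses the toy matrix from the earlier sketch.

```python
import numpy as np
from itertools import product

def mpm_decode_bruteforce(A, z_bits, beta, rho):
    """Exact MPM estimate (Eq. 9) by enumerating all 2^N spin vectors x in {-1,+1}^N.

    Feasible only for tiny N; belief propagation replaces this enumeration in practice."""
    K, N = A.shape
    checks = [np.nonzero(A[r])[0] for r in range(K)]      # L(r) = {i | A_ri = 1}
    z = 1 - 2 * np.asarray(z_bits)                        # syndrome in {-1,+1}
    means = np.zeros(N)                                   # accumulates eta_i = E[x_i | z]
    norm = 0.0
    for x in product((-1, 1), repeat=N):
        x = np.array(x)
        logw = beta * x.sum() + rho * sum(z[r] * x[idx].prod() for r, idx in enumerate(checks))
        w = np.exp(logw)                                  # unnormalized posterior (Eq. 7)
        means += w * x
        norm += w
    eta = means / norm
    return np.sign(eta), eta                              # MPM decision and posterior means

A = np.array([[1, 1, 0, 1, 1, 0],
              [0, 1, 1, 0, 1, 1],
              [1, 0, 1, 0, 0, 1]])
z_bits = A @ np.array([0, 0, 0, 1, 0, 0]) % 2             # syndrome of a single flipped bit
x_hat, eta = mpm_decode_bruteforce(A, z_bits, beta=1.5, rho=4.0)
print("estimated noise bits:", ((1 - x_hat) // 2).astype(int))
```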
31 An approximate decoding algorithm, originally proposed by Gallager himself [1], is known to be efficient in practice. [sent-62, score-0.388]
32 It has recently been rediscovered by MacKay [3] through application of belief propagation to the underlying graphical model. [sent-63, score-0.525]
33 The decoder implementing the algorithm is called the belief propagation decoder, and is analyzed in the following. [sent-66, score-0.798]
34 We define a family of distributions with a set of parameters ζ = (ζ1, . . . , ζN) and v = (v1, . . . , vK): [sent-67, score-0.027]
35 S = { p(x; ζ, v) | p(x; ζ, v) = exp( ζ · x + v · c(x) − ϕ(ζ, v) ) }, (10) where c(x) ≡ (c1(x), . . . , cK(x))^T. [sent-73, score-0.042]
36 The family S includes the factorizable test distribution p0(x; θ0) (= p(x; θ0, 0)), the true posterior p(x|z) (= p(x; β1, ρ1)), and K partial posteriors pr(x; θr) (= p(x; θr, ρ er), where er ≡ (0, . . . , 1, . . . , 0) has its single 1 in the r-th position). [sent-77, score-0.308]
37 We then define the expectation parameter η(ζ, v) by η(ζ, v) ≡ Σ_x x p(x; ζ, v). (11) [sent-84, score-0.032]
38 The MPM estimation of Eq. (9) corresponds to evaluating the expectation parameter of the true posterior, since for xi ∈ {−1, +1} the decision is x̂i = sign ηi(β1, ρ1). [sent-86, score-0.032]
39 We now introduce the equimarginal family M(θ0) ≡ { p(x; ζ, v) | η(ζ, v) = η(θ0, 0) }, (12) and define the marginalization operator Π as follows: for p ∈ S, Π ◦ p ≡ θ0 if p ∈ M(θ0). [sent-87, score-0.196]
40 Horizontal step: Evaluate the marginalization of pr(x; θr^t) ∈ Mr to produce a guess ζr^t based on the current prior information θr^t and the check zr: ζr^t = Π ◦ pr(x; θr^t), r = 1, . . . , K, (15) [sent-92, score-0.424]
41 and calculate a net contribution (the 'cavity field' [7]) from the check zr by subtracting the prior information: ξr^t = ζr^t − θr^t. (16) [sent-95, score-0.238]
42 (It should be noted that (ξr^t)i = 0 holds for i ∉ L(r), as it should be, since the constituent decoder with check zr cannot provide any contribution regarding information of xi for i ∉ L(r).) [sent-96, score-0.644]
43 The desired decoding result η(β1, ρ1) is then approximated by η(θ ∗ , 0), where θ ∗ is the convergent value of {θ t }. [sent-102, score-0.388]
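A rough sketch of the iteration is given below. The horizontal step follows Eqs. (15)-(16), with Π ◦ pr computed exactly by enumerating only the spins inside each check (for a factorizable law, ηi = tanh ζi, so the operator reduces to ζ = atanh η). The vertical step is not part of the extracted text; the update θ^{t+1} = β1 + Σ_r ξr^t with θr^{t+1} = θ^{t+1} − ξr^t used here is our assumption about the standard BP schedule, consistent with the equilibrium relations quoted below, and not a quotation of the paper's algorithm.

```python
import numpy as np
from itertools import product

def marginalize_check(theta_r, rho, z_spin, idx):
    """Pi o p_r: expectations under p_r(x; theta_r) = p(x; theta_r, rho*e_r), returned as
    the natural parameter zeta_r of the equimarginal factorizable distribution."""
    eta = np.tanh(theta_r)                        # spins outside L(r) stay factorized
    sub, norm = np.zeros(len(idx)), 0.0
    for xs in product((-1, 1), repeat=len(idx)):  # brute force over the check's own spins
        xs = np.array(xs)
        w = np.exp(theta_r[idx] @ xs + rho * z_spin * xs.prod())
        sub += w * xs
        norm += w
    eta[idx] = sub / norm
    return np.arctanh(np.clip(eta, -1 + 1e-12, 1 - 1e-12))

def bp_decode(A, z_bits, beta, rho, n_iter=50):
    K, N = A.shape
    checks = [np.nonzero(A[r])[0] for r in range(K)]
    z = 1 - 2 * np.asarray(z_bits)                # syndrome in {-1,+1}
    xi = np.zeros((K, N))                         # cavity fields xi_r
    theta = beta * np.ones(N)                     # full pseudo-posterior field theta
    for _ in range(n_iter):
        for r in range(K):
            theta_r = theta - xi[r]               # leave-one-out prior for check r
            zeta_r = marginalize_check(theta_r, rho, z[r], checks[r])
            xi[r] = zeta_r - theta_r              # cavity field, Eq. (16)
            xi[r][np.setdiff1d(np.arange(N), checks[r])] = 0.0   # (xi_r)_i = 0 off the check
        theta = beta * np.ones(N) + xi.sum(axis=0)               # assumed vertical step
    eta = np.tanh(theta)                          # approximates eta(beta*1, rho*1)
    return ((1 - np.sign(eta)) // 2).astype(int), eta

A = np.array([[1, 1, 0, 1, 1, 0],
              [0, 1, 1, 0, 1, 1],
              [1, 0, 1, 0, 0, 1]])
z_bits = A @ np.array([0, 0, 0, 1, 0, 0]) % 2
n_hat, eta = bp_decode(A, z_bits, beta=1.5, rho=4.0)
print("BP noise estimate:", n_hat)
```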
44 4 Information-geometrical characterization of equilibrium Assume that the convergence is achieved and let (θ*, ξ1*, . . . , ξK*) be the equilibrium values of (θ^t, ξ1^t, . . . , ξK^t). [sent-103, score-0.053]
46 From Eqs. (15) and (16), we have Π ◦ pr(x; θ* − ξr*) = θ*, r = 1, . . . , K. (19) [sent-111, score-0.104]
47 This means that p0(x; θ*) and pr(x; θ* − ξr*), r = 1, . . . , K, are equimarginal, that is, pr(x; θ* − ξr*) ∈ M(θ*), r = 1, . . . , K (20) holds. [sent-115, score-0.104]
49 Figure 2: Geometric structure of the belief propagation decoder (showing S, M(θ*), E(θ*), the submanifolds M0, M1, . . . , MK, the points p0(x; θ*), pr(x; θ* − ξr*), and the true posterior p(x|z)). [sent-121, score-0.798]
50 Another property of the equilibrium is the log-linear relation log p(x|z) − log p0(x; θ*) = Σ_{r=1}^{K} [ log pr(x; θ* − ξr*) − log p0(x; θ*) ] + const. (21) [sent-122, score-0.257]
51 or, in the (ζ, v) coordinate, (β1, ρ1) − (θ*, 0) = Σ_{r=1}^{K} [ (θ* − ξr*, ρ er) − (θ*, 0) ]. (22) [sent-123, score-0.056]
52 This means that the true posterior p(x|z) belongs to the 'log-linear submanifold' E(θ*), the affine subspace in the (ζ, v)-coordinate rooted at (θ*, 0) and spanned by (−ξr*, ρ er), r = 1, . . . , K. [sent-124, score-0.056]
53 These two properties do not imply p(x|z) ∈ M(θ∗ ). [sent-128, score-0.028]
54 In fact, if we were to assume, instead of the log-linear relation (21), the linear relation p(x|z) − p0(x; θ*) = Σ_{r=1}^{K} [ pr(x; θ* − ξr*) − p0(x; θ*) ], (23) then we would have p(x|z) ∈ M(θ*) and thus η(β1, ρ1) = η(θ*, 0). [sent-129, score-0.192]
55 The degree to which the log-linear relation deviates from the linear relation determines the decoding error of the belief propagation decoder. [sent-131, score-1.01]
56 The structure is best described on the basis of information geometry [6]. [sent-132, score-0.098]
57 Figure 2 illustrates the geometric structure of the belief propagation decoder. [sent-133, score-0.475]
58 It should be noted that the geometrical structure shown here is essentially the same as that for the turbo decoding [8, 9]. [sent-134, score-0.585]
59 5 Main result Based on the information geometry, we have evaluated the decoding error, i.e., the difference between the true expectation η(β1, ρ1) and its estimate η(θ*, 0) produced by the belief propagation decoder, via perturbation analysis. [sent-135, score-1.276]
60 Taking into account the terms up to second order, we have η(β1, ρ1) − η(θ*, 0) = (ρ²/2) Σ_{r,s; r≠s} Brs η(θ*, 0) + O(ρ³), (24) where Brs ≡ ( ∂/∂vr − Σ_{k=1}^{N} Ark g^{kk} ∂/∂θk )( ∂/∂vs − Σ_{j=1}^{N} Asj g^{jj} ∂/∂θj ), (25) and ∂ηi(θ*, 0)/∂vr = Cov_{θ*,0}[ xi, cr(x) ]. (26) [sent-136, score-0.055]
61 {Brs} are the elements of the m-embedding curvature tensor of the manifold E(θ*) in S. [sent-137, score-0.061]
62 g^{ii} ≡ 1/(1 − ηi(θ*, 0)²) are the diagonal elements of the inverse of the Fisher information of p0(x; θ*). [sent-138, score-0.026]
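Under the factorizable p0(x; θ*) these quantities are cheap to evaluate: ηi = tanh θi*, g^{ii} = 1/(1 − ηi²), and the covariance of Eq. (26) has a simple closed form. The sketch below uses our own derivation of that closed form, Cov[xi, cr(x)] = zr (1 − ηi²) Π_{j∈L(r), j≠i} ηj for i ∈ L(r) and 0 otherwise (not quoted from the paper), and verifies it by brute force on an illustrative θ*.

```python
import numpy as np
from itertools import product

def cov_x_cr(theta, z_spin, idx):
    """Cov_{theta,0}[x_i, c_r(x)] (Eq. 26) under the factorizable p0(x; theta)."""
    eta = np.tanh(theta)
    cov = np.zeros_like(eta)
    for i in idx:                                      # covariance is zero outside L(r)
        others = np.prod([eta[j] for j in idx if j != i])
        cov[i] = z_spin * (1.0 - eta[i] ** 2) * others
    return cov

theta_star = np.array([0.8, -0.3, 1.1, 0.5, 0.2, 0.9])   # illustrative equilibrium values
eta = np.tanh(theta_star)
g_diag = 1.0 / (1.0 - eta ** 2)                           # g^{ii}: inverse Fisher diagonal
idx_r = np.array([0, 1, 3, 4])                            # L(r) of some check r
z_r = -1                                                  # observed syndrome spin of check r

# Brute-force check over all 2^N configurations of the factorizable p0.
X = np.array(list(product((-1, 1), repeat=len(theta_star))))
p = np.exp(X @ theta_star); p /= p.sum()
c_r = z_r * X[:, idx_r].prod(axis=1)
cov_bf = (p[:, None] * X * c_r[:, None]).sum(axis=0) - (p @ X) * (p @ c_r)
assert np.allclose(cov_bf, cov_x_cr(theta_star, z_r, idx_r))
print(cov_x_cr(theta_star, z_r, idx_r), g_diag)
```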
63 This is the generalization of the result obtained for the turbo decoding [8]. [sent-139, score-0.518]
64 Theorem 1 If the parity-check matrix A has no two columns with overlap greater than 1, then the principal error term, given in Eq. (24), vanishes. [sent-144, score-0.26]
65 Eq. (24) also shows that the principal error term is not coordinate invariant, since the summation with respect to r and s in the right-hand side of Eq. (24) depends on the choice of the parity-check matrix A. [sent-148, score-0.129]
66 This corresponds to the empirical fact that the performance does depend on the design of the code, that is, the choice of the parity-check matrix A. [sent-150, score-0.045]
67 Explicit evaluation of the principal error term, as in Theorem 1, makes it possible to improve the performance of a code, just in the same way as the perturbational approach to improving the naive mean-field approximation [10, 11, 12, 13, 14, 15, 16, 17]. [sent-151, score-0.101]
68 It is believed [3] that Gallager codes have a smaller average probability of decoding error if no two columns of the parity-check matrix A have overlap greater than 1. [sent-152, score-0.817]
69 An intuitive explanation for this belief is that such a design prevents loops of length 4 from appearing in the graphical representation. [sent-153, score-0.45]
70 Since short loops are expected to harm the proper functioning of belief propagation, their existence may raise the possibility of decoding errors. [sent-154, score-0.785]
71 Our result supports this belief by showing analytically that the principal term of the decoding error vanishes when the parity-check matrix of the code is so sparse, and prepared with enough care, that there are no two columns with overlap greater than 1. [sent-155, score-1.046]
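The sparsity condition in Theorem 1 is easy to test for any given matrix: the overlap of two columns is the number of checks containing both, i.e. an off-diagonal entry of A^T A computed over the integers, and 'no two columns with overlap greater than 1' is the same as the absence of length-4 loops in the graphical representation. A minimal sketch with made-up example matrices:

```python
import numpy as np

def max_column_overlap(A):
    """Largest overlap between two distinct columns of the binary parity-check matrix A.

    Theorem 1's condition holds iff the returned value is at most 1
    (equivalently, the graph of A contains no loops of length 4)."""
    gram = A.T @ A                     # gram[i, j] = number of checks shared by columns i, j
    np.fill_diagonal(gram, 0)
    return int(gram.max())

A_bad = np.array([[1, 1, 0, 1, 1, 0],      # toy matrix used above: columns 1 and 4
                  [0, 1, 1, 0, 1, 1],      # share checks 0 and 1, so the overlap is 2
                  [1, 0, 1, 0, 0, 1]])
A_good = np.array([[1, 1, 0, 0, 1, 0],     # hand-made matrix with all overlaps <= 1
                   [0, 1, 1, 1, 0, 0],
                   [1, 0, 0, 1, 0, 1]])
print(max_column_overlap(A_bad), max_column_overlap(A_good))   # prints 2 1
```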
72 Loops of length greater than 4 do not contribute to the decoding error through the principal term, but they may have effects via higher-order terms. [sent-156, score-0.516]
73 Our analysis presented here can be extended in a straightforward manner to higher-order perturbation analysis in order to quantify these effects. [sent-157, score-0.085]
74 It should be noted that the approach taken in this paper differs from the common approach in the literature to analyzing the properties of the belief propagation decoder, in that we do not consider ensembles of codes. [sent-158, score-0.902]
75 The statistical-mechanical approach to performance analysis of Gallager-type codes [5] also assumes random ensembles. [sent-162, score-0.235]
76 Our analysis, on the other hand, does not assume ensembles but allows, albeit asymptotically, performance evaluation of the belief propagation decoder applied to Gallager codes with any single instance of a finite-size parity-check matrix. [sent-163, score-1.118]
77 Cheng, “Turbo decoding as an instance of Pearl's `belief propagation' algorithm,” IEEE J. [sent-180, score-0.388]
78 MacKay, “Good error-correcting codes based on very sparse matrices,” IEEE Trans. [sent-191, score-0.235]
79 Amari, “Information geometrical framework for analyzing belief propagation decoder,” in T. [sent-237, score-0.506]
80 Amari, “Information geometry of turbo codes and low-density paritycheck codes,” submitted to IEEE Trans. [sent-246, score-0.463]
81 Rodriguez, “Boltzmann machine learning using mean field theory and linear response correction,” in M. [sent-261, score-0.076]
82 Tanaka, “A theory of mean field approximation,” in M. [sent-269, score-0.076]
83 Tanaka, “Information geometry of mean-field approximation,” Neural Computation, vol. [sent-277, score-0.098]
84 Yedidia, “An idiosyncratic journey beyond mean field theory,” in M. [sent-283, score-0.032]
85 Wiegerinck, “Mean field theory for graphical models,” in M. [sent-292, score-0.094]
86 Shimokawa, “Information geometry of α-projection in mean field approximation,” in M. [sent-300, score-0.13]
87 Urbanke, “The capacity of low-density parity-check codes under message-passing decoding,” IEEE Trans. [sent-315, score-0.235]
wordName wordTfidf (topN-words)
[('decoding', 0.388), ('gallager', 0.38), ('decoder', 0.323), ('belief', 0.246), ('codes', 0.235), ('propagation', 0.229), ('mod', 0.161), ('zr', 0.143), ('tanaka', 0.141), ('saad', 0.141), ('syndrome', 0.137), ('turbo', 0.13), ('loops', 0.127), ('marginalization', 0.121), ('pr', 0.104), ('geometry', 0.098), ('bsc', 0.095), ('codeword', 0.095), ('eld', 0.093), ('code', 0.089), ('ikeda', 0.087), ('opper', 0.083), ('brs', 0.082), ('amari', 0.078), ('advanced', 0.077), ('factorizable', 0.071), ('kabashima', 0.071), ('check', 0.07), ('tap', 0.069), ('principal', 0.066), ('gt', 0.061), ('kappen', 0.061), ('perturbation', 0.058), ('channel', 0.056), ('cr', 0.055), ('murayama', 0.055), ('rodriguez', 0.055), ('equilibrium', 0.053), ('graphical', 0.05), ('overlap', 0.05), ('ar', 0.048), ('decoders', 0.048), ('equimarginal', 0.048), ('mpm', 0.048), ('shiro', 0.048), ('field', 0.045), ('matrix', 0.045), ('mackay', 0.045), ('theory', 0.044), ('er', 0.044), ('relation', 0.044), ('japan', 0.043), ('generator', 0.043), ('mr', 0.043), ('exp', 0.042), ('ensembles', 0.04), ('checks', 0.04), ('submanifold', 0.04), ('xi', 0.039), ('vr', 0.038), ('tanh', 0.038), ('noted', 0.036), ('tokyo', 0.036), ('columns', 0.036), ('curvature', 0.035), ('vanishes', 0.035), ('error', 0.035), ('posteriors', 0.033), ('constituent', 0.033), ('mean', 0.032), ('boltzmann', 0.032), ('expectation', 0.032), ('geometrical', 0.031), ('knowing', 0.031), ('transmission', 0.031), ('practice', 0.03), ('mit', 0.03), ('ef', 0.03), ('ne', 0.029), ('posterior', 0.029), ('greater', 0.028), ('log', 0.028), ('properties', 0.028), ('term', 0.028), ('quantify', 0.027), ('spanned', 0.027), ('family', 0.027), ('length', 0.027), ('ensemble', 0.027), ('elements', 0.026), ('received', 0.026), ('prior', 0.025), ('de', 0.024), ('expecting', 0.024), ('ari', 0.024), ('functioning', 0.024), ('compose', 0.024), ('agt', 0.024), ('cheng', 0.024), ('deviates', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 97 nips-2001-Information-Geometrical Significance of Sparsity in Gallager Codes
Author: Toshiyuki Tanaka, Shiro Ikeda, Shun-ichi Amari
Abstract: We report a result of perturbation analysis on decoding error of the belief propagation decoder for Gallager codes. The analysis is based on information geometry, and it shows that the principal term of decoding error at equilibrium comes from the m-embedding curvature of the log-linear submanifold spanned by the estimated pseudoposteriors, one for the full marginal, and K for partial posteriors, each of which takes a single check into account, where K is the number of checks in the Gallager code. It is then shown that the principal error term vanishes when the parity-check matrix of the code is so sparse that there are no two columns with overlap greater than 1. 1
2 0.51874119 98 nips-2001-Information Geometrical Framework for Analyzing Belief Propagation Decoder
Author: Shiro Ikeda, Toshiyuki Tanaka, Shun-ichi Amari
Abstract: The mystery of belief propagation (BP) decoder, especially of the turbo decoding, is studied from information geometrical viewpoint. The loopy belief network (BN) of turbo codes makes it difficult to obtain the true “belief” by BP, and the characteristics of the algorithm and its equilibrium are not clearly understood. Our study gives an intuitive understanding of the mechanism, and a new framework for the analysis. Based on the framework, we reveal basic properties of the turbo decoding.
3 0.18738823 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes
Author: Si Wu, Shun-ichi Amari
Abstract: This study investigates a population decoding paradigm, in which the estimation of stimulus in the previous step is used as prior knowledge for consecutive decoding. We analyze the decoding accuracy of such a Bayesian decoder (Maximum a Posteriori Estimate), and show that it can be implemented by a biologically plausible recurrent network, where the prior knowledge of stimulus is conveyed by the change in recurrent interactions as a result of Hebbian learning. 1
4 0.17992048 196 nips-2001-Very loopy belief propagation for unwrapping phase images
Author: Brendan J. Frey, Ralf Koetter, Nemanja Petrovic
Abstract: Since the discovery that the best error-correcting decoding algorithm can be viewed as belief propagation in a cycle-bound graph, researchers have been trying to determine under what circumstances
5 0.12450356 178 nips-2001-TAP Gibbs Free Energy, Belief Propagation and Sparsity
Author: Lehel Csató, Manfred Opper, Ole Winther
Abstract: The adaptive TAP Gibbs free energy for a general densely connected probabilistic model with quadratic interactions and arbritary single site constraints is derived. We show how a specific sequential minimization of the free energy leads to a generalization of Minka’s expectation propagation. Lastly, we derive a sparse representation version of the sequential algorithm. The usefulness of the approach is demonstrated on classification and density estimation with Gaussian processes and on an independent component analysis problem.
6 0.12300797 159 nips-2001-Reducing multiclass to binary by coupling probability estimates
7 0.094641156 192 nips-2001-Tree-based reparameterization for approximate inference on loopy graphs
8 0.084363803 188 nips-2001-The Unified Propagation and Scaling Algorithm
9 0.06464985 119 nips-2001-Means, Correlations and Bounds
10 0.06248612 132 nips-2001-Novel iteration schemes for the Cluster Variation Method
11 0.048760191 21 nips-2001-A Variational Approach to Learning Curves
12 0.0456945 167 nips-2001-Semi-supervised MarginBoost
13 0.045238663 152 nips-2001-Prodding the ROC Curve: Constrained Optimization of Classifier Performance
14 0.043533813 174 nips-2001-Spike timing and the coding of naturalistic sounds in a central auditory area of songbirds
15 0.042262152 117 nips-2001-MIME: Mutual Information Minimization and Entropy Maximization for Bayesian Belief Propagation
16 0.041152988 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family
17 0.039529733 43 nips-2001-Bayesian time series classification
18 0.03852066 110 nips-2001-Learning Hierarchical Structures with Linear Relational Embedding
19 0.038450588 19 nips-2001-A Rotation and Translation Invariant Discrete Saliency Network
20 0.038296994 102 nips-2001-KLD-Sampling: Adaptive Particle Filters
topicId topicWeight
[(0, -0.155), (1, -0.04), (2, -0.023), (3, -0.045), (4, 0.009), (5, -0.301), (6, 0.093), (7, -0.336), (8, 0.307), (9, 0.268), (10, -0.234), (11, -0.068), (12, -0.083), (13, 0.041), (14, 0.019), (15, -0.013), (16, 0.239), (17, -0.141), (18, 0.016), (19, 0.152), (20, -0.012), (21, 0.188), (22, 0.098), (23, -0.023), (24, 0.019), (25, -0.019), (26, 0.05), (27, -0.047), (28, -0.023), (29, 0.039), (30, 0.027), (31, 0.016), (32, 0.098), (33, -0.028), (34, -0.05), (35, 0.018), (36, -0.011), (37, -0.006), (38, 0.062), (39, 0.05), (40, 0.026), (41, 0.042), (42, -0.039), (43, -0.035), (44, -0.017), (45, -0.02), (46, -0.004), (47, -0.002), (48, -0.024), (49, -0.047)]
simIndex simValue paperId paperTitle
1 0.9765867 98 nips-2001-Information Geometrical Framework for Analyzing Belief Propagation Decoder
Author: Shiro Ikeda, Toshiyuki Tanaka, Shun-ichi Amari
Abstract: The mystery of belief propagation (BP) decoder, especially of the turbo decoding, is studied from information geometrical viewpoint. The loopy belief network (BN) of turbo codes makes it difficult to obtain the true “belief” by BP, and the characteristics of the algorithm and its equilibrium are not clearly understood. Our study gives an intuitive understanding of the mechanism, and a new framework for the analysis. Based on the framework, we reveal basic properties of the turbo decoding.
same-paper 2 0.97388369 97 nips-2001-Information-Geometrical Significance of Sparsity in Gallager Codes
Author: Toshiyuki Tanaka, Shiro Ikeda, Shun-ichi Amari
Abstract: We report a result of perturbation analysis on decoding error of the belief propagation decoder for Gallager codes. The analysis is based on information geometry, and it shows that the principal term of decoding error at equilibrium comes from the m-embedding curvature of the log-linear submanifold spanned by the estimated pseudoposteriors, one for the full marginal, and K for partial posteriors, each of which takes a single check into account, where K is the number of checks in the Gallager code. It is then shown that the principal error term vanishes when the parity-check matrix of the code is so sparse that there are no two columns with overlap greater than 1. 1
3 0.61781245 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes
Author: Si Wu, Shun-ichi Amari
Abstract: This study investigates a population decoding paradigm, in which the estimation of stimulus in the previous step is used as prior knowledge for consecutive decoding. We analyze the decoding accuracy of such a Bayesian decoder (Maximum a Posteriori Estimate), and show that it can be implemented by a biologically plausible recurrent network, where the prior knowledge of stimulus is conveyed by the change in recurrent interactions as a result of Hebbian learning. 1
4 0.61032873 196 nips-2001-Very loopy belief propagation for unwrapping phase images
Author: Brendan J. Frey, Ralf Koetter, Nemanja Petrovic
Abstract: Since the discovery that the best error-correcting decoding algorithm can be viewed as belief propagation in a cycle-bound graph, researchers have been trying to determine under what circumstances
5 0.39784136 159 nips-2001-Reducing multiclass to binary by coupling probability estimates
Author: B. Zadrozny
Abstract: This paper presents a method for obtaining class membership probability estimates for multiclass classification problems by coupling the probability estimates produced by binary classifiers. This is an extension for arbitrary code matrices of a method due to Hastie and Tibshirani for pairwise coupling of probability estimates. Experimental results with Boosted Naive Bayes show that our method produces calibrated class membership probability estimates, while having similar classification accuracy as loss-based decoding, a method for obtaining the most likely class that does not generate probability estimates.
6 0.34760416 192 nips-2001-Tree-based reparameterization for approximate inference on loopy graphs
7 0.28801861 178 nips-2001-TAP Gibbs Free Energy, Belief Propagation and Sparsity
8 0.26898369 188 nips-2001-The Unified Propagation and Scaling Algorithm
9 0.22646669 83 nips-2001-Geometrical Singularities in the Neuromanifold of Multilayer Perceptrons
10 0.21510293 57 nips-2001-Correlation Codes in Neuronal Populations
11 0.18572399 110 nips-2001-Learning Hierarchical Structures with Linear Relational Embedding
12 0.14668876 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family
13 0.13959967 143 nips-2001-PAC Generalization Bounds for Co-training
14 0.13863941 5 nips-2001-A Bayesian Model Predicts Human Parse Preference and Reading Times in Sentence Processing
15 0.13566056 194 nips-2001-Using Vocabulary Knowledge in Bayesian Multinomial Estimation
16 0.13470413 3 nips-2001-ACh, Uncertainty, and Cortical Inference
17 0.13454646 186 nips-2001-The Noisy Euclidean Traveling Salesman Problem and Learning
18 0.13402875 120 nips-2001-Minimax Probability Machine
19 0.13048315 19 nips-2001-A Rotation and Translation Invariant Discrete Saliency Network
20 0.12926236 76 nips-2001-Fast Parameter Estimation Using Green's Functions
topicId topicWeight
[(14, 0.022), (16, 0.313), (17, 0.022), (19, 0.033), (27, 0.175), (30, 0.079), (38, 0.018), (59, 0.054), (72, 0.063), (79, 0.031), (83, 0.015), (88, 0.018), (91, 0.067)]
simIndex simValue paperId paperTitle
same-paper 1 0.83532256 97 nips-2001-Information-Geometrical Significance of Sparsity in Gallager Codes
Author: Toshiyuki Tanaka, Shiro Ikeda, Shun-ichi Amari
Abstract: We report a result of perturbation analysis on decoding error of the belief propagation decoder for Gallager codes. The analysis is based on information geometry, and it shows that the principal term of decoding error at equilibrium comes from the m-embedding curvature of the log-linear submanifold spanned by the estimated pseudoposteriors, one for the full marginal, and K for partial posteriors, each of which takes a single check into account, where K is the number of checks in the Gallager code. It is then shown that the principal error term vanishes when the parity-check matrix of the code is so sparse that there are no two columns with overlap greater than 1. 1
2 0.68826902 98 nips-2001-Information Geometrical Framework for Analyzing Belief Propagation Decoder
Author: Shiro Ikeda, Toshiyuki Tanaka, Shun-ichi Amari
Abstract: The mystery of belief propagation (BP) decoder, especially of the turbo decoding, is studied from information geometrical viewpoint. The loopy belief network (BN) of turbo codes makes it difficult to obtain the true “belief” by BP, and the characteristics of the algorithm and its equilibrium are not clearly understood. Our study gives an intuitive understanding of the mechanism, and a new framework for the analysis. Based on the framework, we reveal basic properties of the turbo decoding.
3 0.62555742 123 nips-2001-Modeling Temporal Structure in Classical Conditioning
Author: Aaron C. Courville, David S. Touretzky
Abstract: The Temporal Coding Hypothesis of Miller and colleagues [7] suggests that animals integrate related temporal patterns of stimuli into single memory representations. We formalize this concept using quasi-Bayes estimation to update the parameters of a constrained hidden Markov model. This approach allows us to account for some surprising temporal effects in the second order conditioning experiments of Miller et al. [1 , 2, 3], which other models are unable to explain. 1
4 0.57074857 40 nips-2001-Batch Value Function Approximation via Support Vectors
Author: Thomas G. Dietterich, Xin Wang
Abstract: We present three ways of combining linear programming with the kernel trick to find value function approximations for reinforcement learning. One formulation is based on SVM regression; the second is based on the Bellman equation; and the third seeks only to ensure that good moves have an advantage over bad moves. All formulations attempt to minimize the number of support vectors while fitting the data. Experiments in a difficult, synthetic maze problem show that all three formulations give excellent performance, but the advantage formulation is much easier to train. Unlike policy gradient methods, the kernel methods described here can easily 'adjust the complexity of the function approximator to fit the complexity of the value function. 1
5 0.56811309 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family
Author: Michael Collins, S. Dasgupta, Robert E. Schapire
Abstract: Principal component analysis (PCA) is a commonly applied technique for dimensionality reduction. PCA implicitly minimizes a squared loss function, which may be inappropriate for data that is not real-valued, such as binary-valued data. This paper draws on ideas from the Exponential family, Generalized linear models, and Bregman distances, to give a generalization of PCA to loss functions that we argue are better suited to other data types. We describe algorithms for minimizing the loss functions, and give examples on simulated data.
6 0.55166513 190 nips-2001-Thin Junction Trees
7 0.55113357 103 nips-2001-Kernel Feature Spaces and Nonlinear Blind Souce Separation
8 0.55013669 137 nips-2001-On the Convergence of Leveraging
9 0.54067266 60 nips-2001-Discriminative Direction for Kernel Classifiers
10 0.54063153 92 nips-2001-Incorporating Invariances in Non-Linear Support Vector Machines
11 0.53916776 127 nips-2001-Multi Dimensional ICA to Separate Correlated Sources
12 0.53888226 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior
13 0.53863764 8 nips-2001-A General Greedy Approximation Algorithm with Applications
14 0.53826731 88 nips-2001-Grouping and dimensionality reduction by locally linear embedding
15 0.5367052 106 nips-2001-Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering
16 0.53566855 114 nips-2001-Learning from Infinite Data in Finite Time
17 0.5328033 77 nips-2001-Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade
18 0.53270775 27 nips-2001-Activity Driven Adaptive Stochastic Resonance
19 0.5326069 138 nips-2001-On the Generalization Ability of On-Line Learning Algorithms
20 0.53249383 197 nips-2001-Why Neuronal Dynamics Should Control Synaptic Learning Rules