nips nips2001 nips2001-97 knowledge-graph by maker-knowledge-mining

97 nips-2001-Information-Geometrical Significance of Sparsity in Gallager Codes


Source: pdf

Author: Toshiyuki Tanaka, Shiro Ikeda, Shun-ichi Amari

Abstract: We report a result of perturbation analysis on decoding error of the belief propagation decoder for Gallager codes. The analysis is based on information geometry, and it shows that the principal term of decoding error at equilibrium comes from the m-embedding curvature of the log-linear submanifold spanned by the estimated pseudoposteriors, one for the full marginal, and K for partial posteriors, each of which takes a single check into account, where K is the number of checks in the Gallager code. It is then shown that the principal error term vanishes when the parity-check matrix of the code is so sparse that there are no two columns with overlap greater than 1. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 jp Abstract We report a result of perturbation analysis on decoding error of the belief propagation decoder for Gallager codes. [sent-10, score-1.279]

2 It is then shown that the principal error term vanishes when the parity-check matrix of the code is so sparse that there are no two columns with overlap greater than 1. [sent-12, score-0.412]

3 1 Introduction: Recent progress on error-correcting codes has attracted much attention because their decoders, exhibiting performance very close to Shannon's limit, can be implemented as neural networks. [sent-13, score-0.235]

4 Important examples are turbo codes and Gallager codes [1]. [sent-14, score-0.6]

5 They are, however, not exact but approximate, since the associated graphical representations have loops in both cases. [sent-16, score-0.177]

6 An important problem posed is to quantify the effect that comes from the existence of loops in the underlying graph. [sent-17, score-0.154]

7 The so-called TAP approach [4] in statistical physics is an alternative way to formulate the same decoding algorithm [5]. [sent-18, score-0.388]

8 We specifically make use of information geometry [6] and report a result of perturbation analysis on the decoding error of the belief propagation decoder. [sent-21, score-1.054]

9 2 Gallager codes: A Gallager code is defined by its parity-check matrix A, which has the form A = [C1 | C2], (1) where C1 and C2 are K × M and K × K matrices, both of which are taken to be very sparse. [sent-22, score-0.369]

10 We define the generator matrix of the Gallager code to be GT = [I; C2^{-1} C1], (2) where I is the M × M identity matrix. [sent-24, score-0.177]

11 The whole model of communication with the Gallager code is shown in Fig. [sent-26, score-0.089]

12 An information vector s of length M is encoded into a codeword t = GT s mod 2 of length N ≡ K + M. [sent-28, score-0.31]

13 The codeword t is then transmitted over a channel. [sent-29, score-0.095]

14 We assume that the transmission channel is a binary symmetric channel (BSC) with bit-error probability σ . [sent-30, score-0.143]

15 The received vector is then r = t + n mod 2, where n is the noise vector. [sent-31, score-0.187]

16 The decoder tries to find the most probable x satisfying the parity-check equation Ax = z mod 2, (3) where z ≡ Ar mod 2 is the syndrome vector (since AGT s = 0 mod 2, the syndrome equals An mod 2 and depends only on the noise). [sent-32, score-0.459]

17 If we are successful in finding the true noise vector n, we can reconstruct, from r, the original codeword t by t = r + n mod 2, and then the information vector s. [sent-35, score-0.256]
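
As an illustration of Eqs. (1)-(3), the sketch below builds a toy code and runs the encoding and syndrome computation numerically. It is only a sketch under simplifying assumptions: C2 is taken to be the identity (so its inverse mod 2 is trivial), C1 is a small random matrix rather than a genuinely sparse one, and the sizes and bit-error probability are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

M, K = 4, 3                                 # information bits and checks; N = M + K
C1 = rng.integers(0, 2, size=(K, M))        # toy K x M part of A (a real code would be sparse)
C2 = np.eye(K, dtype=int)                   # K x K part; identity, so trivially invertible mod 2

A = np.hstack([C1, C2]) % 2                 # parity-check matrix A = [C1 | C2], Eq. (1)
GT = np.vstack([np.eye(M, dtype=int),       # generator matrix G^T = [I; C2^{-1} C1], Eq. (2)
                C1]) % 2                    # (C2^{-1} C1 = C1 here; in general invert C2 over GF(2))
assert np.all((A @ GT) % 2 == 0)            # every codeword satisfies all parity checks

s = rng.integers(0, 2, size=M)              # information vector
t = (GT @ s) % 2                            # codeword of length N = K + M
n = (rng.random(M + K) < 0.1).astype(int)   # BSC noise, bit-error probability 0.1
r = (t + n) % 2                             # received vector
z = (A @ r) % 2                             # syndrome, Eq. (3)
assert np.all(z == (A @ n) % 2)             # the syndrome depends only on the noise
```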

18 Since Eq. (3) is underdetermined, one has to take into account the prior knowledge of the noise in order to solve it properly. [sent-37, score-0.025]

19 The decoding problem can be cast into the Bayes framework. [sent-38, score-0.388]

20 The prior for x is p(x) = exp[β1 · x − N ψ(β)], ψ(β) ≡ log(e^β + e^{−β}), (4) where 1 is an N-dimensional vector whose elements are all 1. [sent-40, score-0.093]

21 β is a parameter related to the bit-error probability σ of the transmission channel by σ = (1/2)(1 − tanh β). (5) [sent-46, score-0.125]

22 For the sake of analytical tractability, we assume that the syndrome vector z is observed via another BSC channel with bit-error probability σ (see Fig. [sent-47, score-0.193]

23 This leads to p(z|x) ∝ exp[ρ Σ_{r=1}^{K} zr ∏_{i∈L(r)} xi], (6) where L(r) is the set of all indices of nonzero elements in row r of the parity-check matrix A, [sent-49, score-0.295]

24 i.e., L(r) ≡ {i | Ari = 1}, and ρ is defined by σ = (1/2)(1 − tanh ρ). [sent-51, score-0.038]

25 If we take the limit ρ → +∞, then we recover the conventional situation of observing the syndrome in a deterministic way. [sent-52, score-0.137]
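
For concreteness, Eq. (5) and the corresponding definition of ρ can be inverted to map a bit-error probability to the associated parameter; a minimal sketch with arbitrary numerical values:

```python
import numpy as np

def field_from_bit_error_prob(sigma):
    """Invert sigma = (1/2)(1 - tanh(beta)) from Eq. (5): beta = atanh(1 - 2*sigma)."""
    return np.arctanh(1.0 - 2.0 * sigma)

beta = field_from_bit_error_prob(0.1)    # transmission channel with bit-error probability 0.1
rho  = field_from_bit_error_prob(1e-6)   # nearly noiseless syndrome observation; rho -> +inf as sigma -> 0
print(beta, rho)
```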

26 Since experimental findings suggest that decoding results of Gallager codes usually violate no parity-check constraints [3], we believe that making the parity-check constraints soft does not alter essential properties of the problem. [sent-54, score-0.651]

27 3 Decoding: The posterior distribution of x for a given observed syndrome z is derived from the prior p(x) and the conditional p(z|x) by applying the Bayes formula, and the result is p(x|z) ∝ exp[c0(x) + ρ Σ_{r=1}^{K} cr(x)], (7) where we let c0(x) ≡ β1 · x and cr(x) ≡ zr ∏_{i∈L(r)} xi (r = 1, ..., K). (8) [sent-55, score-0.486]

28 The objective of the decoder of Gallager codes is to obtain the marginal-posterior-mode (MPM) estimate based on the posterior p(x|z): x̂i = arg max_{xi} Σ_{x∖xi} p(x|z). (9) [sent-59, score-0.704]

29 The MPM estimation provides the Bayes-optimum decoder minimizing the expected bit-error probability of the decoding results. [sent-60, score-0.711]

30 However, the marginalization is in general computationally hard, which renders the decoding problem intractable. [sent-61, score-0.509]
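
To make Eqs. (7)-(9) concrete, the following brute-force sketch enumerates all 2^N spin configurations; the helper name and the spin convention xi, zr ∈ {−1, +1} are our own choices, and the exponential cost is exactly the intractability referred to above.

```python
import itertools
import numpy as np

def mpm_bruteforce(A, z, beta, rho):
    """Exact MPM estimate of Eq. (9) by exhaustive enumeration (cost O(2^N): toy sizes only).
    A is the K x N parity-check matrix; z holds the observed syndrome as spins in {-1, +1}."""
    K, N = A.shape
    L = [np.flatnonzero(A[r]) for r in range(K)]            # index sets L(r)
    xs = np.array(list(itertools.product([-1, 1], repeat=N)))
    c0 = beta * xs.sum(axis=1)                              # c0(x) = beta * 1 . x
    cr = np.stack([z[r] * xs[:, L[r]].prod(axis=1)          # cr(x) = z_r * prod over L(r), Eq. (8)
                   for r in range(K)], axis=1)
    logp = c0 + rho * cr.sum(axis=1)                        # log of the unnormalized posterior, Eq. (7)
    p = np.exp(logp - logp.max())
    p /= p.sum()
    eta = p @ xs                                            # marginal expectations eta_i
    return np.sign(eta), eta                                # hard MPM decisions and soft marginals
```

With the toy A, z, beta and rho from the earlier sketches (after converting the syndrome bits to spins, e.g. z_spin = 1 - 2*z), mpm_bruteforce(A, z_spin, beta, rho) returns the exact marginals; the cost doubles with every additional bit.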

31 An approximate decoding algorithm, originally proposed by Gallager himself [1], is known to be efficient in practice. [sent-62, score-0.388]

32 It has recently been rediscovered by MacKay [3] by applying belief propagation to the underlying graphical model. [sent-63, score-0.525]

33 The decoder implementing the algorithm is called the belief propagation decoder, and is analyzed in the following. [sent-66, score-0.798]

34 We define a family of distributions with sets of parameters ζ = (ζ1, ..., ζN) and v = (v1, ..., vK): [sent-67, score-0.027]

35 S = {p(x; ζ, v) | p(x; ζ, v) = exp[ζ · x + v · c(x) − ϕ(ζ, v)]}, (10) where c(x) ≡ (c1(x), ..., cK(x))T. [sent-73, score-0.042]

36 The family S includes the factorizable test distribution p0(x; θ0) (= p(x; θ0, 0)), the true posterior p(x|z) (= p(x; β1, ρ1)), and K partial posteriors pr(x; θr) (= p(x; θr, ρer)), where er ≡ (0, ..., 0, 1, 0, ..., 0) is the unit vector with a 1 in the r-th position. [sent-77, score-0.308]

37 We then define the expectation parameter η(ζ, v) by η(ζ, v) ≡ Σ_x x p(x; ζ, v). [sent-84, score-0.032]

38 Eq. (9) corresponds to evaluating the expectation parameter of the true posterior. [sent-86, score-0.032]

39 We now introduce the equimarginal family M(θ0) ≡ {p(x; ζ, v) | η(ζ, v) = η(θ0, 0)}, (12) and define the marginalization operator Π as follows: for p ∈ S, Π ∘ p ≡ θ0 if p ∈ M(θ0). [sent-87, score-0.196]

40 Horizontal step: Evaluate the marginalization of pr(x; θr^t) ∈ Mr to produce a guess ζr^t, based on the current prior information θr^t and the check zr: ζr^t = Π ∘ pr(x; θr^t), r = 1, ..., K, (15) [sent-92, score-0.424]

41 and calculate a net contribution (the 'cavity field' [7]) from the check zr by subtracting the prior information: ξr^t = ζr^t − θr^t. (16) [sent-95, score-0.238]

42 (It should be noted that (ξr^t)i = 0 holds for i ∉ L(r), as it should be, since the constituent decoder with check zr cannot provide any contribution regarding information of xi for i ∉ L(r).) [sent-96, score-0.644]

43 The desired decoding result η(β1, ρ1) is then approximated by η(θ ∗ , 0), where θ ∗ is the convergent value of {θ t }. [sent-102, score-0.388]
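
A hedged sketch of one possible implementation of this iteration is given below. The horizontal step (15)-(16) is evaluated in closed form with the usual tanh rule for a single-check model (our own elementary derivation), while the vertical step shown, θr ← β1 + Σ_{r'≠r} ξr', is our own reading consistent with the equilibrium relation (22) quoted later, since the corresponding equations are not reproduced in this summary; the update schedule and the absence of damping are arbitrary choices.

```python
import numpy as np

def bp_decode(A, z, beta, rho, n_iter=50):
    """Sketch of the belief propagation decoder in the (theta, xi) parametrization.
    z holds the syndrome as spins in {-1, +1}.  Returns the approximate marginals
    eta(theta*, 0) = tanh(theta*) of the factorizable pseudoposterior."""
    K, N = A.shape
    L = [np.flatnonzero(A[r]) for r in range(K)]        # index sets L(r)
    xi = np.zeros((K, N))                               # cavity fields xi_r; (xi_r)_i = 0 off L(r)
    for _ in range(n_iter):
        theta = beta * np.ones(N) + xi.sum(axis=0)      # total parameter, consistent with Eq. (22)
        for r in range(K):
            th_r = theta - xi[r]                        # prior information theta_r for check r
            new_xi = np.zeros(N)
            for i in L[r]:
                others = L[r][L[r] != i]
                # marginalize p_r (Eq. (15)) via the tanh rule, then subtract as in Eq. (16)
                m = np.tanh(rho * z[r]) * np.prod(np.tanh(th_r[others]))
                new_xi[i] = np.arctanh(np.clip(m, -1 + 1e-12, 1 - 1e-12))
            xi[r] = new_xi
    theta = beta * np.ones(N) + xi.sum(axis=0)
    return np.tanh(theta)                               # eta(theta*, 0)
```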

44 4 Information-geometrical characterization of equilibrium [sent-103, score-0.053]

45 Assume that the convergence is achieved and let (θ∗, ξ1∗, ..., ξK∗) be the equilibrium values of (θ^t, ξ1^t, ..., ξK^t). [sent-106, score-0.053]

46 From Eqs. (15) and (16), we have Π ∘ pr(x; θ∗ − ξr∗) = θ∗, r = 1, ..., K. (19) [sent-111, score-0.104]

47 This means that p0(x; θ∗) and pr(x; θ∗ − ξr∗), r = 1, ..., K, are equimarginal, [sent-115, score-0.104]

48 that is, pr(x; θ∗ − ξr∗) ∈ M(θ∗), r = 1, ..., K, (20) holds. [sent-118, score-0.104]

49 [Figure 2: Geometric structure of belief propagation decoder.] [sent-121, score-0.798]

50 Another property of the equilibrium is the log-linear relation log p(x|z) − log p0(x; θ∗) = Σ_{r=1}^{K} [log pr(x; θ∗ − ξr∗) − log p0(x; θ∗)] + const., (21) [sent-122, score-0.257]

51 or, in the (ζ, v) coordinate, (β1, ρ1) − (θ∗, 0) = Σ_{r=1}^{K} [(θ∗ − ξr∗, ρer) − (θ∗, 0)]. (22) [sent-123, score-0.056]

52 This means that the true posterior p(x|z) belongs to the 'log-linear submanifold' E(θ∗), the affine subspace in the (ζ, v)-coordinate rooted at (θ∗, 0) and spanned by (−ξr∗, ρer), r = 1, ..., K. [sent-124, score-0.056]

53 These two properties do not imply p(x|z) ∈ M(θ∗ ). [sent-128, score-0.028]

54 In fact, if we were to assume, instead of the log-linear relation (21), the linear relation p(x|z) − p0(x; θ∗) = Σ_{r=1}^{K} [pr(x; θ∗ − ξr∗) − p0(x; θ∗)], (23) then we would have p(x|z) ∈ M(θ∗) and thus η(β1, ρ1) = η(θ∗, 0). [sent-129, score-0.192]

55 The degree to which the log-linear relation deviates from the linear relation determines the decoding error of the belief propagation decoder. [sent-131, score-1.01]

56 The structure is best described on the basis of information geometry [6]. [sent-132, score-0.098]

57 Figure 2 illustrates the geometric structure of the belief propagation decoder. [sent-133, score-0.475]

58 It should be noted that the geometrical structure shown here is essentially the same as that for the turbo decoding [8, 9]. [sent-134, score-0.585]

59 5 Main result: Based on the information geometry, we have evaluated the decoding error, that is, the difference between the true expectation η(β1, ρ1) and its estimate by the belief propagation decoder, η(θ∗, 0), via perturbation analysis. [sent-135, score-1.276]
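
Assuming the toy quantities (A, z, beta, rho) and the helper functions mpm_bruteforce and bp_decode from the earlier sketches (our own illustrative constructions, not the paper's), this decoding error can be probed numerically:

```python
import numpy as np

# z was computed above as bits; Eqs. (6)-(8) use the spin convention, so convert first.
z_spin = 1 - 2 * z

_, eta_exact = mpm_bruteforce(A, z_spin, beta, rho)   # true expectation eta(beta*1, rho*1)
eta_bp = bp_decode(A, z_spin, beta, rho)              # BP estimate eta(theta*, 0)
print(np.max(np.abs(eta_exact - eta_bp)))             # the gap that Eq. (24) expands in powers of rho
```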

60 Taking into account the terms up to second order, we have η(β1, ρ1) − η(θ∗, 0) = (ρ²/2) Σ_{r,s; r≠s} Brs η(θ∗, 0) + O(ρ³), (24) where Brs ≡ (∂/∂vr − Σ_{k=1}^{N} g^{kk} Ark ∂/∂θk)(∂/∂vs − Σ_{j=1}^{N} g^{jj} Asj ∂/∂θj), (25) and ∂ηi(θ∗, 0)/∂vr = Cov_{θ∗,0}[xi, cr(x)]. (26) [sent-136, score-0.055]

61 {Brs} are the elements of the m-embedding curvature tensor of the manifold E(θ∗) in S. [sent-137, score-0.061]

62 g^{ii} ≡ 1/(1 − ηi(θ∗, 0)²) are the diagonal elements of the inverse of the Fisher information of p0(x; θ∗). [sent-138, score-0.026]
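
As a sanity check on Eq. (26), the covariance on its right-hand side has a simple closed form under a factorizable p0; the closed form below is our own elementary calculation (it also exhibits 1 − ηi² as the Fisher-information diagonal), and all numerical values are arbitrary toy choices.

```python
import itertools
import numpy as np

theta = np.array([0.3, -0.7, 1.1, 0.2])          # toy parameters of p0(x; theta), N = 4
Lr, zr, i = np.array([0, 2, 3]), -1, 0           # one toy check r with L(r) = {0, 2, 3}, and bit i = 0
eta = np.tanh(theta)                             # marginals of the factorizable p0

xs = np.array(list(itertools.product([-1, 1], repeat=4)))
p = np.exp(xs @ theta)
p /= p.sum()
cr = zr * xs[:, Lr].prod(axis=1)                 # c_r(x) = z_r * prod_{j in L(r)} x_j
cov = p @ (xs[:, i] * cr) - (p @ xs[:, i]) * (p @ cr)          # Cov_{theta,0}[x_i, c_r(x)], Eq. (26)
closed = zr * (1 - eta[i] ** 2) * np.prod(eta[Lr[Lr != i]])    # (1 - eta_i^2) is the Fisher diagonal
assert np.isclose(cov, closed)
```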

63 This is a generalization of the result obtained for turbo decoding [8]. [sent-139, score-0.518]

64 If the parity-check matrix A has no two columns with overlap greater than 1, then the principal error term, given in Eq. (24), vanishes. [sent-144, score-0.26]
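
The sparsity condition of this result is easy to test for a given parity-check matrix; a small sketch with an arbitrary toy matrix (off-diagonal Gram entries greater than 1 correspond to the length-4 loops discussed below):

```python
import numpy as np

def max_column_overlap(A):
    """Largest number of rows in which two distinct columns of A are both 1.
    The condition above holds iff this value is at most 1."""
    G = A.T @ A                       # Gram matrix; off-diagonal entries count pairwise overlaps
    np.fill_diagonal(G, 0)
    return int(G.max())

A_example = np.array([[1, 1, 0, 1, 0],
                      [0, 1, 1, 0, 1],
                      [1, 0, 1, 0, 0]])
print(max_column_overlap(A_example))  # 1, so the principal error term of Eq. (24) vanishes for this A
```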

65 The principal error term of Eq. (24) is not coordinate invariant, owing to the summation with respect to r and s in the right-hand side of Eq. (24). [sent-148, score-0.129]

66 This corresponds to the empirical fact that the performance does depend on the design of the code, that is, the choice of the parity-check matrix A. [sent-150, score-0.045]

67 Explicit evaluation of the principal error term, as in Theorem 1, makes it possible to improve the performance of a code, just in the same way as the perturbational approach to improving the naive mean-field approximation [10, 11, 12, 13, 14, 15, 16, 17]. [sent-151, score-0.101]

68 It is believed [3] that Gallager codes have a smaller average probability of decoding error if we prevent any two columns of the parity-check matrix A from having overlap greater than 1. [sent-152, score-0.817]

69 An intuitive explanation for this belief is that such avoidance prevents loops of length 4 from appearing in the graphical representation. [sent-153, score-0.45]

70 Since short loops are expected to harm the proper functioning of belief propagation, their existence may raise the possibility of decoding errors. [sent-154, score-0.785]

71 Our result supports this belief by showing analytically that the principal term of the decoding error vanishes when the parity-check matrix of the code is sparse and prepared with care so that no two columns have overlap greater than 1. [sent-155, score-1.046]

72 Loops of length longer than 4 do not contribute to the decoding error, at least via the principal term, but they may have effects via higher-order terms. [sent-156, score-0.516]

73 Our analysis presented here can be extended in a straightforward manner to higher-order perturbation analysis in order to quantify these effects. [sent-157, score-0.085]

74 It should be noted that the approach taken in this paper differs from the common approach in the literature to analyzing the properties of the belief propagation decoder, in that we do not consider ensembles of codes. [sent-158, score-0.902]

75 The statistical-mechanical approach to performance analysis of Gallager-type codes [5] also assumes random ensembles. [sent-162, score-0.235]

76 Our analysis, on the other hand, does not assume ensembles but allows, albeit asymptotically, performance evaluation of the belief propagation decoder for Gallager codes with any single instance of the parity-check matrix of finite size. [sent-163, score-1.118]

77 Cheng, “Turbo decoding as an instance of Pearl's `belief propagation' algorithm,” IEEE J. [sent-180, score-0.388]

78 MacKay, “Good error-correcting codes based on very sparse matrices,” IEEE Trans. [sent-191, score-0.235]

79 Amari, “Information geometrical framework for analyzing belief propagation decoder,” in T. [sent-237, score-0.506]

80 Amari, “Information geometry of turbo codes and low-density parity-check codes,” submitted to IEEE Trans. [sent-246, score-0.463]

81 Rodriguez, “Boltzmann machine learning using mean field theory and linear response correction,” in M. [sent-261, score-0.076]

82 Tanaka, “A theory of mean field approximation,” in M. [sent-269, score-0.076]

83 Tanaka, “Information geometry of mean-field approximation,” Neural Computation, vol. [sent-277, score-0.098]

84 Yedidia, “An idiosyncratic journey beyond mean field theory,” in M. [sent-283, score-0.032]

85 Wiegerinck, “Mean field theory for graphical models,” in M. [sent-292, score-0.094]

86 Shimokawa, “Information geometry of α-projection in mean field approximation,” in M. [sent-300, score-0.13]

87 Urbanke, “The capacity of low-density parity-check codes under message-passing decoding,” IEEE Trans. [sent-315, score-0.235]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('decoding', 0.388), ('gallager', 0.38), ('decoder', 0.323), ('belief', 0.246), ('codes', 0.235), ('propagation', 0.229), ('mod', 0.161), ('zr', 0.143), ('tanaka', 0.141), ('saad', 0.141), ('syndrome', 0.137), ('turbo', 0.13), ('loops', 0.127), ('marginalization', 0.121), ('pr', 0.104), ('geometry', 0.098), ('bsc', 0.095), ('codeword', 0.095), ('eld', 0.093), ('code', 0.089), ('ikeda', 0.087), ('opper', 0.083), ('brs', 0.082), ('amari', 0.078), ('advanced', 0.077), ('factorizable', 0.071), ('kabashima', 0.071), ('check', 0.07), ('tap', 0.069), ('principal', 0.066), ('gt', 0.061), ('kappen', 0.061), ('perturbation', 0.058), ('channel', 0.056), ('cr', 0.055), ('murayama', 0.055), ('rodriguez', 0.055), ('equilibrium', 0.053), ('graphical', 0.05), ('overlap', 0.05), ('ar', 0.048), ('decoders', 0.048), ('equimarginal', 0.048), ('mpm', 0.048), ('shiro', 0.048), ('field', 0.045), ('matrix', 0.045), ('mackay', 0.045), ('theory', 0.044), ('er', 0.044), ('relation', 0.044), ('japan', 0.043), ('generator', 0.043), ('mr', 0.043), ('exp', 0.042), ('ensembles', 0.04), ('checks', 0.04), ('submanifold', 0.04), ('xi', 0.039), ('vr', 0.038), ('tanh', 0.038), ('noted', 0.036), ('tokyo', 0.036), ('columns', 0.036), ('curvature', 0.035), ('vanishes', 0.035), ('error', 0.035), ('posteriors', 0.033), ('constituent', 0.033), ('mean', 0.032), ('boltzmann', 0.032), ('expectation', 0.032), ('geometrical', 0.031), ('knowing', 0.031), ('transmission', 0.031), ('practice', 0.03), ('mit', 0.03), ('ef', 0.03), ('ne', 0.029), ('posterior', 0.029), ('greater', 0.028), ('log', 0.028), ('properties', 0.028), ('term', 0.028), ('quantify', 0.027), ('spanned', 0.027), ('family', 0.027), ('length', 0.027), ('ensemble', 0.027), ('elements', 0.026), ('received', 0.026), ('prior', 0.025), ('de', 0.024), ('expecting', 0.024), ('ari', 0.024), ('functioning', 0.024), ('compose', 0.024), ('agt', 0.024), ('cheng', 0.024), ('deviates', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 97 nips-2001-Information-Geometrical Significance of Sparsity in Gallager Codes

Author: Toshiyuki Tanaka, Shiro Ikeda, Shun-ichi Amari

Abstract: We report a result of perturbation analysis on decoding error of the belief propagation decoder for Gallager codes. The analysis is based on information geometry, and it shows that the principal term of decoding error at equilibrium comes from the m-embedding curvature of the log-linear submanifold spanned by the estimated pseudoposteriors, one for the full marginal, and K for partial posteriors, each of which takes a single check into account, where K is the number of checks in the Gallager code. It is then shown that the principal error term vanishes when the parity-check matrix of the code is so sparse that there are no two columns with overlap greater than 1. 1

2 0.51874119 98 nips-2001-Information Geometrical Framework for Analyzing Belief Propagation Decoder

Author: Shiro Ikeda, Toshiyuki Tanaka, Shun-ichi Amari

Abstract: The mystery of belief propagation (BP) decoder, especially of the turbo decoding, is studied from information geometrical viewpoint. The loopy belief network (BN) of turbo codes makes it difficult to obtain the true “belief” by BP, and the characteristics of the algorithm and its equilibrium are not clearly understood. Our study gives an intuitive understanding of the mechanism, and a new framework for the analysis. Based on the framework, we reveal basic properties of the turbo decoding.

3 0.18738823 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes

Author: Si Wu, Shun-ichi Amari

Abstract: This study investigates a population decoding paradigm, in which the estimation of stimulus in the previous step is used as prior knowledge for consecutive decoding. We analyze the decoding accuracy of such a Bayesian decoder (Maximum a Posteriori Estimate), and show that it can be implemented by a biologically plausible recurrent network, where the prior knowledge of stimulus is conveyed by the change in recurrent interactions as a result of Hebbian learning. 1

4 0.17992048 196 nips-2001-Very loopy belief propagation for unwrapping phase images

Author: Brendan J. Frey, Ralf Koetter, Nemanja Petrovic

Abstract: Since the discovery that the best error-correcting decoding algorithm can be viewed as belief propagation in a cycle-bound graph, researchers have been trying to determine under what circumstances

5 0.12450356 178 nips-2001-TAP Gibbs Free Energy, Belief Propagation and Sparsity

Author: Lehel Csató, Manfred Opper, Ole Winther

Abstract: The adaptive TAP Gibbs free energy for a general densely connected probabilistic model with quadratic interactions and arbritary single site constraints is derived. We show how a specific sequential minimization of the free energy leads to a generalization of Minka’s expectation propagation. Lastly, we derive a sparse representation version of the sequential algorithm. The usefulness of the approach is demonstrated on classification and density estimation with Gaussian processes and on an independent component analysis problem.

6 0.12300797 159 nips-2001-Reducing multiclass to binary by coupling probability estimates

7 0.094641156 192 nips-2001-Tree-based reparameterization for approximate inference on loopy graphs

8 0.084363803 188 nips-2001-The Unified Propagation and Scaling Algorithm

9 0.06464985 119 nips-2001-Means, Correlations and Bounds

10 0.06248612 132 nips-2001-Novel iteration schemes for the Cluster Variation Method

11 0.048760191 21 nips-2001-A Variational Approach to Learning Curves

12 0.0456945 167 nips-2001-Semi-supervised MarginBoost

13 0.045238663 152 nips-2001-Prodding the ROC Curve: Constrained Optimization of Classifier Performance

14 0.043533813 174 nips-2001-Spike timing and the coding of naturalistic sounds in a central auditory area of songbirds

15 0.042262152 117 nips-2001-MIME: Mutual Information Minimization and Entropy Maximization for Bayesian Belief Propagation

16 0.041152988 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family

17 0.039529733 43 nips-2001-Bayesian time series classification

18 0.03852066 110 nips-2001-Learning Hierarchical Structures with Linear Relational Embedding

19 0.038450588 19 nips-2001-A Rotation and Translation Invariant Discrete Saliency Network

20 0.038296994 102 nips-2001-KLD-Sampling: Adaptive Particle Filters


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.155), (1, -0.04), (2, -0.023), (3, -0.045), (4, 0.009), (5, -0.301), (6, 0.093), (7, -0.336), (8, 0.307), (9, 0.268), (10, -0.234), (11, -0.068), (12, -0.083), (13, 0.041), (14, 0.019), (15, -0.013), (16, 0.239), (17, -0.141), (18, 0.016), (19, 0.152), (20, -0.012), (21, 0.188), (22, 0.098), (23, -0.023), (24, 0.019), (25, -0.019), (26, 0.05), (27, -0.047), (28, -0.023), (29, 0.039), (30, 0.027), (31, 0.016), (32, 0.098), (33, -0.028), (34, -0.05), (35, 0.018), (36, -0.011), (37, -0.006), (38, 0.062), (39, 0.05), (40, 0.026), (41, 0.042), (42, -0.039), (43, -0.035), (44, -0.017), (45, -0.02), (46, -0.004), (47, -0.002), (48, -0.024), (49, -0.047)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.9765867 98 nips-2001-Information Geometrical Framework for Analyzing Belief Propagation Decoder

Author: Shiro Ikeda, Toshiyuki Tanaka, Shun-ichi Amari

Abstract: The mystery of belief propagation (BP) decoder, especially of the turbo decoding, is studied from information geometrical viewpoint. The loopy belief network (BN) of turbo codes makes it difficult to obtain the true “belief” by BP, and the characteristics of the algorithm and its equilibrium are not clearly understood. Our study gives an intuitive understanding of the mechanism, and a new framework for the analysis. Based on the framework, we reveal basic properties of the turbo decoding.

same-paper 2 0.97388369 97 nips-2001-Information-Geometrical Significance of Sparsity in Gallager Codes

Author: Toshiyuki Tanaka, Shiro Ikeda, Shun-ichi Amari

Abstract: We report a result of perturbation analysis on decoding error of the belief propagation decoder for Gallager codes. The analysis is based on information geometry, and it shows that the principal term of decoding error at equilibrium comes from the m-embedding curvature of the log-linear submanifold spanned by the estimated pseudoposteriors, one for the full marginal, and K for partial posteriors, each of which takes a single check into account, where K is the number of checks in the Gallager code. It is then shown that the principal error term vanishes when the parity-check matrix of the code is so sparse that there are no two columns with overlap greater than 1. 1

3 0.61781245 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes

Author: Si Wu, Shun-ichi Amari

Abstract: This study investigates a population decoding paradigm, in which the estimation of stimulus in the previous step is used as prior knowledge for consecutive decoding. We analyze the decoding accuracy of such a Bayesian decoder (Maximum a Posteriori Estimate), and show that it can be implemented by a biologically plausible recurrent network, where the prior knowledge of stimulus is conveyed by the change in recurrent interactions as a result of Hebbian learning. 1

4 0.61032873 196 nips-2001-Very loopy belief propagation for unwrapping phase images

Author: Brendan J. Frey, Ralf Koetter, Nemanja Petrovic

Abstract: Since the discovery that the best error-correcting decoding algorithm can be viewed as belief propagation in a cycle-bound graph, researchers have been trying to determine under what circumstances

5 0.39784136 159 nips-2001-Reducing multiclass to binary by coupling probability estimates

Author: B. Zadrozny

Abstract: This paper presents a method for obtaining class membership probability estimates for multiclass classification problems by coupling the probability estimates produced by binary classifiers. This is an extension for arbitrary code matrices of a method due to Hastie and Tibshirani for pairwise coupling of probability estimates. Experimental results with Boosted Naive Bayes show that our method produces calibrated class membership probability estimates, while having similar classification accuracy as loss-based decoding, a method for obtaining the most likely class that does not generate probability estimates.

6 0.34760416 192 nips-2001-Tree-based reparameterization for approximate inference on loopy graphs

7 0.28801861 178 nips-2001-TAP Gibbs Free Energy, Belief Propagation and Sparsity

8 0.26898369 188 nips-2001-The Unified Propagation and Scaling Algorithm

9 0.22646669 83 nips-2001-Geometrical Singularities in the Neuromanifold of Multilayer Perceptrons

10 0.21510293 57 nips-2001-Correlation Codes in Neuronal Populations

11 0.18572399 110 nips-2001-Learning Hierarchical Structures with Linear Relational Embedding

12 0.14668876 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family

13 0.13959967 143 nips-2001-PAC Generalization Bounds for Co-training

14 0.13863941 5 nips-2001-A Bayesian Model Predicts Human Parse Preference and Reading Times in Sentence Processing

15 0.13566056 194 nips-2001-Using Vocabulary Knowledge in Bayesian Multinomial Estimation

16 0.13470413 3 nips-2001-ACh, Uncertainty, and Cortical Inference

17 0.13454646 186 nips-2001-The Noisy Euclidean Traveling Salesman Problem and Learning

18 0.13402875 120 nips-2001-Minimax Probability Machine

19 0.13048315 19 nips-2001-A Rotation and Translation Invariant Discrete Saliency Network

20 0.12926236 76 nips-2001-Fast Parameter Estimation Using Green's Functions


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(14, 0.022), (16, 0.313), (17, 0.022), (19, 0.033), (27, 0.175), (30, 0.079), (38, 0.018), (59, 0.054), (72, 0.063), (79, 0.031), (83, 0.015), (88, 0.018), (91, 0.067)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.83532256 97 nips-2001-Information-Geometrical Significance of Sparsity in Gallager Codes

Author: Toshiyuki Tanaka, Shiro Ikeda, Shun-ichi Amari

Abstract: We report a result of perturbation analysis on decoding error of the belief propagation decoder for Gallager codes. The analysis is based on information geometry, and it shows that the principal term of decoding error at equilibrium comes from the m-embedding curvature of the log-linear submanifold spanned by the estimated pseudoposteriors, one for the full marginal, and K for partial posteriors, each of which takes a single check into account, where K is the number of checks in the Gallager code. It is then shown that the principal error term vanishes when the parity-check matrix of the code is so sparse that there are no two columns with overlap greater than 1. 1

2 0.68826902 98 nips-2001-Information Geometrical Framework for Analyzing Belief Propagation Decoder

Author: Shiro Ikeda, Toshiyuki Tanaka, Shun-ichi Amari

Abstract: The mystery of belief propagation (BP) decoder, especially of the turbo decoding, is studied from information geometrical viewpoint. The loopy belief network (BN) of turbo codes makes it difficult to obtain the true “belief” by BP, and the characteristics of the algorithm and its equilibrium are not clearly understood. Our study gives an intuitive understanding of the mechanism, and a new framework for the analysis. Based on the framework, we reveal basic properties of the turbo decoding.

3 0.62555742 123 nips-2001-Modeling Temporal Structure in Classical Conditioning

Author: Aaron C. Courville, David S. Touretzky

Abstract: The Temporal Coding Hypothesis of Miller and colleagues [7] suggests that animals integrate related temporal patterns of stimuli into single memory representations. We formalize this concept using quasi-Bayes estimation to update the parameters of a constrained hidden Markov model. This approach allows us to account for some surprising temporal effects in the second order conditioning experiments of Miller et al. [1 , 2, 3], which other models are unable to explain. 1

4 0.57074857 40 nips-2001-Batch Value Function Approximation via Support Vectors

Author: Thomas G. Dietterich, Xin Wang

Abstract: We present three ways of combining linear programming with the kernel trick to find value function approximations for reinforcement learning. One formulation is based on SVM regression; the second is based on the Bellman equation; and the third seeks only to ensure that good moves have an advantage over bad moves. All formulations attempt to minimize the number of support vectors while fitting the data. Experiments in a difficult, synthetic maze problem show that all three formulations give excellent performance, but the advantage formulation is much easier to train. Unlike policy gradient methods, the kernel methods described here can easily 'adjust the complexity of the function approximator to fit the complexity of the value function. 1

5 0.56811309 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family

Author: Michael Collins, S. Dasgupta, Robert E. Schapire

Abstract: Principal component analysis (PCA) is a commonly applied technique for dimensionality reduction. PCA implicitly minimizes a squared loss function, which may be inappropriate for data that is not real-valued, such as binary-valued data. This paper draws on ideas from the Exponential family, Generalized linear models, and Bregman distances, to give a generalization of PCA to loss functions that we argue are better suited to other data types. We describe algorithms for minimizing the loss functions, and give examples on simulated data.

6 0.55166513 190 nips-2001-Thin Junction Trees

7 0.55113357 103 nips-2001-Kernel Feature Spaces and Nonlinear Blind Souce Separation

8 0.55013669 137 nips-2001-On the Convergence of Leveraging

9 0.54067266 60 nips-2001-Discriminative Direction for Kernel Classifiers

10 0.54063153 92 nips-2001-Incorporating Invariances in Non-Linear Support Vector Machines

11 0.53916776 127 nips-2001-Multi Dimensional ICA to Separate Correlated Sources

12 0.53888226 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior

13 0.53863764 8 nips-2001-A General Greedy Approximation Algorithm with Applications

14 0.53826731 88 nips-2001-Grouping and dimensionality reduction by locally linear embedding

15 0.5367052 106 nips-2001-Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering

16 0.53566855 114 nips-2001-Learning from Infinite Data in Finite Time

17 0.5328033 77 nips-2001-Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade

18 0.53270775 27 nips-2001-Activity Driven Adaptive Stochastic Resonance

19 0.5326069 138 nips-2001-On the Generalization Ability of On-Line Learning Algorithms

20 0.53249383 197 nips-2001-Why Neuronal Dynamics Should Control Synaptic Learning Rules