nips nips2001 nips2001-98 knowledge-graph by maker-knowledge-mining

98 nips-2001-Information Geometrical Framework for Analyzing Belief Propagation Decoder


Source: pdf

Author: Shiro Ikeda, Toshiyuki Tanaka, Shun-ichi Amari

Abstract: The mystery of the belief propagation (BP) decoder, especially of turbo decoding, is studied from an information-geometrical viewpoint. The loopy belief network (BN) of turbo codes makes it difficult to obtain the true “belief” by BP, and the characteristics of the algorithm and its equilibrium are not clearly understood. Our study gives an intuitive understanding of the mechanism and a new framework for the analysis. Based on the framework, we reveal basic properties of turbo decoding.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 The mystery of the belief propagation (BP) decoder, especially of turbo decoding, is studied from an information-geometrical viewpoint. [sent-13, score-0.966]

2 The loopy belief network (BN) of turbo codes makes it difficult to obtain the true “belief” by BP, and the characteristics of the algorithm and its equilibrium are not clearly understood. [sent-14, score-1.063]

3 Our study gives an intuitive understanding of the mechanism, and a new framework for the analysis. [sent-15, score-0.15]

4 Based on the framework, we reveal basic properties of the turbo decoding. [sent-16, score-0.613]

5 1 Introduction Since the proposal of turbo codes [2], they have attracted a great deal of interest because of their high error-correction performance. [sent-17, score-0.673]

6 Although thorough experimental results strongly support the potential of this iterative decoding method, its mathematical background is not sufficiently understood. [sent-18, score-0.525]

7 [5] have shown its relation to Pearl’s BP, but the BN for turbo decoding is loopy, and the BP solution gives only an approximation. [sent-20, score-1.116]

8 The problem of turbo decoding is a specific example of the general problem of marginalizing an exponential-family distribution. [sent-21, score-1.063]

9 The distribution includes higher order correlations, and its direct marginalization is intractable. [sent-22, score-0.271]

10 By collecting and exchanging the BP results of the partial models, the true “belief” is approximated. [sent-24, score-0.075]

11 This structure is common among various iterative methods, such as Gallager codes, the Bethe approximation in statistical physics [4], and BP for loopy BNs. [sent-25, score-0.106]

12 We investigate the problem from an information-geometrical viewpoint [1]. [sent-26, score-0.113]

13 It gives a new framework for analyzing these iterative methods and provides an intuitive understanding of them. [sent-27, score-0.227]

14 It also reveals many basic properties, such as the characteristics of the equilibrium, the stability condition, the cost function related to the decoder, and the decoding error. [sent-28, score-0.551]

15 In this paper, we focus on turbo decoding because its structure is simple; the framework itself is general, and the main results can be generalized. [sent-29, score-0.644]

16 [Equation-only sentence; the mathematical symbols could not be recovered from the PDF extraction.] [sent-32, score-0.066]

17 Let us consider a distribution defined as in Eq. (1), where the first term is a linear function of the bits and each remaining term contains their higher-order correlations. [sent-34, score-0.075]

18 The problem of turbo codes and similar iterative methods is to marginalize this distribution. [sent-35, score-0.76]

19 Let us denote the operator of marginalization; marginalization is then equivalent to taking the expectation of each bit. [sent-36, score-0.442]

20 In the case of MPM (maximization of the posterior marginals) decoding, the sign of each marginal expectation is the decoding result. [sent-37, score-0.45]
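
To make the statement concrete, here is a minimal brute-force sketch of MPM decoding: enumerate all ±1 configurations, compute the marginal expectation of each bit, and take its sign. All names (mpm_decode, log_p, h) are illustrative, since the original symbols were lost in the extraction; the coupling term only mimics the higher-order correlations mentioned above.

```python
import itertools
import numpy as np

def mpm_decode(log_p, n):
    """Brute-force MPM decoding over x in {-1,+1}^n.

    log_p: unnormalized log-probability of a configuration x.
    Returns the sign of each marginal expectation E[x_i] (the MPM estimate).
    """
    configs = np.array(list(itertools.product([-1, 1], repeat=n)))
    logw = np.array([log_p(x) for x in configs])
    w = np.exp(logw - logw.max())                     # stable unnormalized weights
    w /= w.sum()                                      # normalize to a distribution
    marginals = (configs * w[:, None]).sum(axis=0)    # E[x_i] for each bit
    return np.sign(marginals), marginals

# Toy distribution with a linear term plus one higher-order correlation term.
h = np.array([0.3, -0.2, 0.5])
log_p = lambda x: float(h @ x + 0.4 * x[0] * x[1] * x[2])
print(mpm_decode(log_p, n=3))
```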

21 Marginalizing Eq. (1) is not tractable, but the marginalization of the following distribution is tractable. [sent-40, score-0.226]

22 The iterative methods exchange information through these parameters for each constituent model, and finally approximate the true marginalization. [sent-43, score-0.12]

23 In the case of turbo codes, the information bits are the source from which the turbo encoder generates two sets of parity bits (Fig. [sent-53, score-1.316]

24 Each parity bit is expressed as a product of information bits, where the product is taken over a subset of them. [sent-55, score-0.04]

25 The codeword is then transmitted over a noisy channel, which we assume to be a BSC (binary symmetric channel) with a given flipping probability. [sent-56, score-0.043]
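
Sentences 23–25 describe the encoding and the channel; the sketch below is an illustrative toy version, not the encoder of [2]: in the ±1 representation each parity bit is a product of information bits over a subset, and the BSC flips each transmitted symbol independently. The subset choices and all names are assumptions made for the example.

```python
import numpy as np

def encode_parity(x, subsets):
    """Each parity bit is the product of the ±1 information bits in one subset."""
    return np.array([np.prod(x[list(s)]) for s in subsets])

def bsc(word, flip_prob, rng):
    """Binary symmetric channel: flip each ±1 symbol independently with probability flip_prob."""
    flips = rng.random(word.shape) < flip_prob
    return np.where(flips, -word, word)

rng = np.random.default_rng(0)
x = rng.choice([-1, 1], size=6)                    # information bits
subsets1 = [(0, 1), (1, 2), (2, 3), (3, 4)]        # illustrative parity structure, decoder 1
subsets2 = [(0, 2), (1, 3), (2, 4), (3, 5)]        # illustrative parity structure, decoder 2
codeword = np.concatenate([x, encode_parity(x, subsets1), encode_parity(x, subsets2)])
received = bsc(codeword, flip_prob=0.1, rng=rng)
print(received)
```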

26 The ultimate goal of turbo decoding is the MPM decoding of the posterior. Since the channel is memoryless, the likelihood factorizes over the transmitted bits. [sent-58, score-1.62]

27 By assuming a uniform prior on the information bits, the posterior distribution is given as follows. [sent-59, score-0.049]

28 When the number of information bits is large, marginalization of the posterior is intractable since it requires summation over exponentially many terms. [sent-64, score-0.203]

29 Turbo codes utilize two decoders, each of which solves the MPM decoding of a tractable constituent distribution. [sent-65, score-0.595]

30 Each constituent distribution is derived from the corresponding likelihood and a prior that takes a factorizable form. [sent-67, score-0.045]

31 The marginalization of each constituent distribution is feasible since its BN is loop-free. [sent-68, score-0.203]

32 The parameter serves as the window for exchanging information between the two decoders. [sent-69, score-0.075]

33 The MPM decoding is approximated by updating this parameter iteratively in a “turbo”-like way. [sent-70, score-0.45]

34 This is a submanifold in which every distribution defined by Eq. (4) can be rewritten in a form showing that it is decomposable, or factorizable. [sent-75, score-0.273]

35 From information geometry [1], we have the following theorem on the –projection. [sent-76, score-0.053]

36 Let an –flat submanifold be given, and let the point that minimizes the KL-divergence from a distribution to the submanifold be its projection, denoted accordingly. [sent-78, score-0.196]

37 It is easy to show that the marginalization corresponds to the –projection onto this submanifold [7]. [sent-81, score-0.203]

38 Since MPM decoding and marginalization are equivalent, MPM decoding is also equivalent to the –projection onto the submanifold. [sent-82, score-1.103]
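
The equivalence between marginalization and this projection can be checked numerically on a toy example: over the fully factorizable family, the member that minimizes the KL-divergence from a fixed distribution is exactly the one matching its marginal expectations. The construction below (3 bits, brute force) is a sketch with assumed names, not code from the paper.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 3
configs = np.array(list(itertools.product([-1, 1], repeat=n)))

# A non-factorizable target distribution p over {-1,+1}^3.
logw = configs @ np.array([0.3, -0.1, 0.2]) + 0.5 * configs[:, 0] * configs[:, 1]
p = np.exp(logw)
p /= p.sum()

def factorized(theta):
    """Fully factorizable distribution q(x) proportional to exp(theta . x)."""
    q = np.exp(configs @ theta)
    return q / q.sum()

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

# Projection onto the factorizable submanifold: match the marginal expectations of p.
theta_star = np.arctanh((configs * p[:, None]).sum(axis=0))
best = kl(p, factorized(theta_star))

# Any other factorizable distribution is farther from p in KL-divergence.
for _ in range(5):
    assert best <= kl(p, factorized(theta_star + 0.3 * rng.standard_normal(n))) + 1e-12
```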

39 Let the parameters of the –projected distribution be denoted accordingly; the turbo decoding process is then written as the following iterative steps. [sent-83, score-1.063]
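
A hedged sketch of such an iterative process is given below, written as the standard extrinsic-information exchange between two constituent decoders in the ±1 natural-parameter representation; it is not necessarily the paper's exact parameterization, and c0, c1, c2, xi1, xi2 are assumed names (the channel term, the two constituent higher-order terms, and the exchanged parameters).

```python
import itertools
import numpy as np

def project_theta(log_p, n):
    """Project an n-bit ±1 distribution onto the factorizable family: return theta
    with E[x_i] = tanh(theta_i), computed by brute-force marginalization."""
    configs = np.array(list(itertools.product([-1, 1], repeat=n)))
    logw = np.array([log_p(x) for x in configs])
    w = np.exp(logw - logw.max())
    w /= w.sum()
    m = (configs * w[:, None]).sum(axis=0)
    return np.arctanh(np.clip(m, -0.999999, 0.999999))

def turbo_decode(c0, c1, c2, n, n_iter=20):
    """Each decoder marginalizes its own tractable model and passes the extrinsic
    parameters (xi) to the other decoder; iterate toward an equilibrium."""
    xi1, xi2 = np.zeros(n), np.zeros(n)
    for _ in range(n_iter):
        theta1 = project_theta(lambda x: c0 @ x + c1(x) + xi2 @ x, n)  # decoder 1
        xi1 = theta1 - xi2 - c0        # extrinsic information produced by decoder 1
        theta2 = project_theta(lambda x: c0 @ x + c2(x) + xi1 @ x, n)  # decoder 2
        xi2 = theta2 - xi1 - c0        # extrinsic information produced by decoder 2
    return np.sign(theta2), theta2

# Toy example: a linear channel term and two sets of higher-order "parity" terms.
c0 = np.array([0.5, -0.2, 0.4, 0.1])
c1 = lambda x: 0.6 * x[0] * x[1] + 0.6 * x[2] * x[3]
c2 = lambda x: 0.6 * x[0] * x[2] + 0.6 * x[1] * x[3]
print(turbo_decode(c0, c1, c2, n=4))
```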

40 Let us define an –flat version of the submanifold in a log-linear manner; from its definition, the expectation is the same for any member, and its –projection coincides with the same point. [sent-94, score-0.302]

41 This is an –flat submanifold [1], and we call it an equimarginal submanifold. [sent-95, score-0.065]
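
The equimarginal property can be illustrated numerically: distributions that share the same expectations of the bits form a set that is closed under mixtures of its members (mixture-flatness in information-geometric terms). The sketch below constructs two such distributions by a generic null-space perturbation; the construction and all names are assumptions made for the example.

```python
import itertools
import numpy as np

configs = np.array(list(itertools.product([-1, 1], repeat=3)), dtype=float)

def expectations(p):
    return (configs * p[:, None]).sum(axis=0)     # E[x_i] under p

# Base distribution p on {-1,+1}^3.
p = np.exp(configs @ np.array([0.4, -0.3, 0.1]))
p /= p.sum()

# Perturbation d with sum(d) = 0 and configs^T d = 0, so q = p + eps*d is a
# different distribution with exactly the same expectations of the bits.
A = np.vstack([np.ones(len(p)), configs.T])       # constraints: normalization + marginals
_, _, vt = np.linalg.svd(A)
d = vt[-1]                                        # a vector in the null space of A
q = p + 0.05 * p.min() * d / np.abs(d).max()      # small enough to keep q positive
q /= q.sum()

assert np.allclose(expectations(p), expectations(q))

# Closure under mixtures: every convex combination of p and q is also equimarginal.
for lam in (0.2, 0.5, 0.8):
    assert np.allclose(expectations(lam * p + (1 - lam) * q), expectations(p))
```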

42 Let us define a manifold as in Eqs. (6) and (7); when the turbo decoding converges, the equilibrium solution defines three important distributions. [sent-98, score-1.185]

43 These relations hold because, in general, the full distribution includes cross terms of the two constituent parts. [sent-108, score-0.045]

44 The information geometrical view of the turbo decoding is schematically shown in Fig. [sent-109, score-1.176]

45 We now define the submanifold corresponding to each decoder, as given in the equation. [sent-111, score-0.166]

46 An intuitive understanding of the turbo decoding, based on Eq. (5), is as follows. [sent-112, score-1.153]

47 The distribution is updated, and the belief is estimated by projecting it onto the factorizable submanifold. [sent-114, score-0.053]

48 The corresponding term in Eq. (5) is replaced, and the belief is estimated by the –projection of the resulting distribution. [sent-116, score-0.068]

49 The turbo decoding approximates the estimated parameter, the projection onto the submanifold, iteratively; if it has not converged, it returns to step 2. [sent-117, score-1.184]

50 Project onto the factorizable submanifold and calculate the corresponding parameters for each decoder in turn. [sent-118, score-0.06]

51 Following the same line for step 3, we derive a theorem which coincides with the result of Richardson [6]. [sent-125, score-0.085]

52 The Fisher information matrix is defined as follows. [sent-128, score-0.026]
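
For the factorizable ±1 model used here, the Fisher information with respect to the natural parameters is the covariance matrix of the bits, which is diagonal with entries 1 - tanh(theta_i)^2. The brute-force check below uses a 3-bit example with assumed names; it illustrates the definition rather than reproducing the paper's notation.

```python
import itertools
import numpy as np

configs = np.array(list(itertools.product([-1, 1], repeat=3)))
theta = np.array([0.2, -0.4, 0.7])

# Factorizable model p0(x; theta) proportional to exp(theta . x).
p = np.exp(configs @ theta)
p /= p.sum()

# Fisher information in the natural parameters = covariance of x under p0.
mean = (configs * p[:, None]).sum(axis=0)
fisher = (configs - mean).T @ ((configs - mean) * p[:, None])

assert np.allclose(np.diag(fisher), 1 - np.tanh(theta) ** 2)      # diagonal entries
assert np.allclose(fisher - np.diag(np.diag(fisher)), 0.0)        # off-diagonals vanish
print(fisher)
```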

53 Here, we give a sufficiently small perturbation, apply one turbo decoding step, and examine the resulting –projection. [sent-129, score-1.089]

54 Equation (6) is rewritten as follows with these parameters. [sent-130, score-0.061]

55 The expectation parameters are defined as follows, with reference to the corresponding equation. [sent-131, score-0.062]

56 If the submanifold includes the distribution, the –projection gives its true marginalization. [sent-135, score-0.248]

57 This fact means that the two distributions are not necessarily equimarginal, which is the origin of the decoding error. [sent-137, score-0.479]

58 When the turbo decoding procedure converges, the convergent probability distributions belong to the equimarginal submanifold, while its –flat version includes these three distributions and also the posterior distribution (Fig. [sent-139, score-1.395]

59 Let us consider the set in which every distribution has the same expectation parameter, that is, the equimarginal condition holds. [sent-169, score-0.023]

60 Here, using the Taylor expansion, we define a distribution as in Eq. (8) for the following discussion. [sent-170, score-0.091]

61 The quantity is always negative; by transforming the variables, this shows how the algorithm works, but it does not give the characteristics of the equilibrium point. [sent-174, score-0.16]

62 Direct calculation shows that the relation holds at equilibrium, and the proof is completed. [sent-176, score-0.029]

63 We give the cost function which plays an important role in turbo decoding. [sent-182, score-0.635]

64 3 Cost Function and Characteristics of Equilibrium. When the stability condition holds for all components, the equilibrium point is stable. [sent-184, score-0.166]

65 From Eq. (10) we have the following approximation; next, we consider the parameter that satisfies this equation, where the parameter is not necessarily included in the submanifold. [sent-193, score-0.084]

66 Also, when we put these definitions together, we have the following result, which gives the approximation accuracy of the BP decoding. [sent-197, score-0.029]

67 Let the true belief be given; we evaluate the difference between it and the decoded belief. [sent-198, score-0.127]

68 The true expectation is approximated as follows, where Eq. (11) gives the solution of the turbo decoding. [sent-201, score-0.649]

69 The result can be generalized. 4 Discussion We have shown a new framework for understanding and analyzing the belief propagation decoder. [sent-205, score-0.32]

70 Since the BN of turbo codes is loopy, we do not have enough theoretical results for the BP algorithm, while many experiments show that it works surprisingly well in such cases. [sent-206, score-0.753]

71 The mystery of the BP decoders is summarized in two points: the approximation accuracy and the convergence property. [sent-207, score-0.118]

72 Our results elucidate the mathematical background of the BP decoding algorithm. [sent-208, score-0.48]

73 The information geometrical structure of the equilibrium is summarized in Theorem 2. [sent-209, score-0.267]

74 It shows that the –flat submanifold plays an important role. [sent-210, score-0.188]

75 Furthermore, Theorem 5 shows that the relation between the equilibrium and the –flat submanifold causes the decoding error, and the principal component of the error is the curvature of that submanifold. [sent-211, score-0.687]

76 Since the curvature strongly depends on the codeword, we can control it by the encoder design. [sent-212, score-0.097]

77 This shows room for improvement of the “near optimum error correcting code” [2]. [sent-213, score-0.06]

78 For the convergence property, we have shown the energy function, which is known as the Bethe free energy [4, 9]. [sent-214, score-0.093]

79 Unfortunately, the fixed point of the turbo decoding algorithm is generally a saddle of the function, which makes further analysis difficult. [sent-215, score-1.091]

80 We have only shown a local stability condition; the global property is left for future work. [sent-216, score-0.03]

81 This paper gives a first step toward the information-geometrical understanding of the belief propagation decoder. [sent-217, score-0.399]

82 The main results are for turbo decoding, but the mechanism is common to a wider class of methods, and the framework is valid for them. [sent-218, score-0.644]

83 We believe further study in this direction will lead us to better understanding and improvements of these methods. [sent-219, score-0.06]

84 (1996) Near optimum error correcting coding and decoding: Turbo-codes. [sent-230, score-0.06]

85 (2001) Information geometry of turbo codes and low-density parity-check codes. [sent-236, score-0.775]

86 Saad, editors, Advanced Mean Field Methods – Theory and Practice, chapter 6, pages 65–84. [sent-244, score-0.03]

87 (1998) Turbo decoding as an instance of Pearl’s “belief propagation” algorithm. [sent-254, score-0.45]

88 Saad, editors, Advanced Mean Field Methods – Theory and Practice, chapter 17, pages 259–273. [sent-266, score-0.03]

89 (2001) Bethe free energy, Kikuchi approximations, and belief propagation algorithms. [sent-285, score-0.219]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('turbo', 0.613), ('decoding', 0.45), ('marginalization', 0.203), ('uwv', 0.175), ('mpm', 0.174), ('submanifold', 0.166), ('bp', 0.155), ('belief', 0.127), ('equilibrium', 0.122), ('geometrical', 0.113), ('codes', 0.102), ('srq', 0.1), ('decoder', 0.087), ('exchanging', 0.075), ('vip', 0.075), ('tanaka', 0.074), ('vh', 0.074), ('propagation', 0.07), ('bn', 0.069), ('projection', 0.068), ('hf', 0.066), ('equimarginal', 0.065), ('loopy', 0.061), ('understanding', 0.06), ('geometry', 0.06), ('ikeda', 0.059), ('de', 0.053), ('theorem', 0.053), ('beth', 0.05), ('encoder', 0.05), ('curvature', 0.047), ('includes', 0.045), ('iterative', 0.045), ('holds', 0.044), ('mystery', 0.043), ('codeword', 0.043), ('decoders', 0.043), ('gallager', 0.043), ('kabashima', 0.043), ('shiro', 0.043), ('aii', 0.043), ('gf', 0.041), ('japan', 0.04), ('wy', 0.04), ('parity', 0.04), ('mceliece', 0.04), ('xf', 0.04), ('channel', 0.039), ('characteristics', 0.038), ('energy', 0.038), ('lot', 0.038), ('ned', 0.037), ('saad', 0.037), ('expectation', 0.036), ('amari', 0.036), ('rewritten', 0.035), ('correcting', 0.035), ('communications', 0.035), ('ii', 0.035), ('fisher', 0.033), ('xp', 0.033), ('tokyo', 0.033), ('convergent', 0.033), ('summarized', 0.032), ('analyzing', 0.032), ('coincides', 0.032), ('framework', 0.031), ('intuitive', 0.03), ('stability', 0.03), ('chapter', 0.03), ('opper', 0.03), ('onto', 0.03), ('mathematical', 0.03), ('let', 0.03), ('necessarily', 0.029), ('pearl', 0.029), ('gives', 0.029), ('advanced', 0.028), ('saddle', 0.028), ('ig', 0.027), ('ne', 0.027), ('ge', 0.026), ('bits', 0.026), ('perturbation', 0.026), ('follows', 0.026), ('correlations', 0.026), ('optimum', 0.025), ('condition', 0.025), ('relation', 0.024), ('ip', 0.024), ('distribution', 0.023), ('free', 0.022), ('plays', 0.022), ('wtu', 0.022), ('attracting', 0.022), ('mitsubishi', 0.022), ('uv', 0.022), ('xip', 0.022), ('bsc', 0.022), ('factorizable', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 98 nips-2001-Information Geometrical Framework for Analyzing Belief Propagation Decoder

Author: Shiro Ikeda, Toshiyuki Tanaka, Shun-ichi Amari

Abstract: The mystery of the belief propagation (BP) decoder, especially of turbo decoding, is studied from an information-geometrical viewpoint. The loopy belief network (BN) of turbo codes makes it difficult to obtain the true “belief” by BP, and the characteristics of the algorithm and its equilibrium are not clearly understood. Our study gives an intuitive understanding of the mechanism and a new framework for the analysis. Based on the framework, we reveal basic properties of turbo decoding.

2 0.51874119 97 nips-2001-Information-Geometrical Significance of Sparsity in Gallager Codes

Author: Toshiyuki Tanaka, Shiro Ikeda, Shun-ichi Amari

Abstract: We report a result of perturbation analysis on decoding error of the belief propagation decoder for Gallager codes. The analysis is based on information geometry, and it shows that the principal term of decoding error at equilibrium comes from the m-embedding curvature of the log-linear submanifold spanned by the estimated pseudoposteriors, one for the full marginal, and K for partial posteriors, each of which takes a single check into account, where K is the number of checks in the Gallager code. It is then shown that the principal error term vanishes when the parity-check matrix of the code is so sparse that there are no two columns with overlap greater than 1. 1

3 0.19952647 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes

Author: Si Wu, Shun-ichi Amari

Abstract: This study investigates a population decoding paradigm, in which the estimation of stimulus in the previous step is used as prior knowledge for consecutive decoding. We analyze the decoding accuracy of such a Bayesian decoder (Maximum a Posteriori Estimate), and show that it can be implemented by a biologically plausible recurrent network, where the prior knowledge of stimulus is conveyed by the change in recurrent interactions as a result of Hebbian learning. 1

4 0.14952168 188 nips-2001-The Unified Propagation and Scaling Algorithm

Author: Yee W. Teh, Max Welling

Abstract: In this paper we will show that a restricted class of constrained minimum divergence problems, named generalized inference problems, can be solved by approximating the KL divergence with a Bethe free energy. The algorithm we derive is closely related to both loopy belief propagation and iterative scaling. This unified propagation and scaling algorithm reduces to a convergent alternative to loopy belief propagation when no constraints are present. Experiments show the viability of our algorithm.

5 0.12382632 192 nips-2001-Tree-based reparameterization for approximate inference on loopy graphs

Author: Martin J. Wainwright, Tommi Jaakkola, Alan S. Willsky

Abstract: We develop a tree-based reparameterization framework that provides a new conceptual view of a large class of iterative algorithms for computing approximate marginals in graphs with cycles. It includes belief propagation (BP), which can be reformulated as a very local form of reparameterization. More generally, we consider algorithms that perform exact computations over spanning trees of the full graph. On the practical side, we find that such tree reparameterization (TRP) algorithms have convergence properties superior to BP. The reparameterization perspective also provides a number of theoretical insights into approximate inference, including a new characterization of fixed points; and an invariance intrinsic to TRP /BP. These two properties enable us to analyze and bound the error between the TRP /BP approximations and the actual marginals. While our results arise naturally from the TRP perspective, most of them apply in an algorithm-independent manner to any local minimum of the Bethe free energy. Our results also have natural extensions to more structured approximations [e.g. , 1, 2]. 1

6 0.11907539 196 nips-2001-Very loopy belief propagation for unwrapping phase images

7 0.096618935 159 nips-2001-Reducing multiclass to binary by coupling probability estimates

8 0.083518207 178 nips-2001-TAP Gibbs Free Energy, Belief Propagation and Sparsity

9 0.05987709 132 nips-2001-Novel iteration schemes for the Cluster Variation Method

10 0.040683612 21 nips-2001-A Variational Approach to Learning Curves

11 0.040479701 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family

12 0.037073482 83 nips-2001-Geometrical Singularities in the Neuromanifold of Multilayer Perceptrons

13 0.036352962 81 nips-2001-Generalization Performance of Some Learning Problems in Hilbert Functional Spaces

14 0.033422757 37 nips-2001-Associative memory in realistic neuronal networks

15 0.033027302 146 nips-2001-Playing is believing: The role of beliefs in multi-agent learning

16 0.032348882 117 nips-2001-MIME: Mutual Information Minimization and Entropy Maximization for Bayesian Belief Propagation

17 0.032284096 164 nips-2001-Sampling Techniques for Kernel Methods

18 0.032067027 8 nips-2001-A General Greedy Approximation Algorithm with Applications

19 0.031741798 150 nips-2001-Probabilistic Inference of Hand Motion from Neural Activity in Motor Cortex

20 0.031582762 143 nips-2001-PAC Generalization Bounds for Co-training


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.128), (1, -0.032), (2, -0.008), (3, -0.039), (4, 0.027), (5, -0.299), (6, 0.094), (7, -0.338), (8, 0.307), (9, 0.282), (10, -0.236), (11, -0.071), (12, -0.074), (13, 0.049), (14, 0.013), (15, -0.001), (16, 0.225), (17, -0.115), (18, 0.014), (19, 0.149), (20, 0.011), (21, 0.195), (22, 0.103), (23, -0.026), (24, -0.001), (25, -0.016), (26, 0.05), (27, -0.039), (28, -0.015), (29, 0.009), (30, 0.026), (31, 0.024), (32, 0.107), (33, -0.019), (34, -0.071), (35, 0.02), (36, -0.003), (37, -0.013), (38, 0.056), (39, 0.024), (40, 0.033), (41, 0.042), (42, -0.04), (43, -0.008), (44, 0.011), (45, -0.029), (46, 0.003), (47, -0.033), (48, -0.035), (49, -0.05)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97782272 98 nips-2001-Information Geometrical Framework for Analyzing Belief Propagation Decoder

Author: Shiro Ikeda, Toshiyuki Tanaka, Shun-ichi Amari

Abstract: The mystery of the belief propagation (BP) decoder, especially of turbo decoding, is studied from an information-geometrical viewpoint. The loopy belief network (BN) of turbo codes makes it difficult to obtain the true “belief” by BP, and the characteristics of the algorithm and its equilibrium are not clearly understood. Our study gives an intuitive understanding of the mechanism and a new framework for the analysis. Based on the framework, we reveal basic properties of turbo decoding.

2 0.95959067 97 nips-2001-Information-Geometrical Significance of Sparsity in Gallager Codes

Author: Toshiyuki Tanaka, Shiro Ikeda, Shun-ichi Amari

Abstract: We report a result of perturbation analysis on decoding error of the belief propagation decoder for Gallager codes. The analysis is based on information geometry, and it shows that the principal term of decoding error at equilibrium comes from the m-embedding curvature of the log-linear submanifold spanned by the estimated pseudoposteriors, one for the full marginal, and K for partial posteriors, each of which takes a single check into account, where K is the number of checks in the Gallager code. It is then shown that the principal error term vanishes when the parity-check matrix of the code is so sparse that there are no two columns with overlap greater than 1. 1

3 0.61971748 196 nips-2001-Very loopy belief propagation for unwrapping phase images

Author: Brendan J. Frey, Ralf Koetter, Nemanja Petrovic

Abstract: Since the discovery that the best error-correcting decoding algorithm can be viewed as belief propagation in a cycle-bound graph, researchers have been trying to determine under what circumstances

4 0.59153575 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes

Author: Si Wu, Shun-ichi Amari

Abstract: This study investigates a population decoding paradigm, in which the estimation of stimulus in the previous step is used as prior knowledge for consecutive decoding. We analyze the decoding accuracy of such a Bayesian decoder (Maximum a Posteriori Estimate), and show that it can be implemented by a biologically plausible recurrent network, where the prior knowledge of stimulus is conveyed by the change in recurrent interactions as a result of Hebbian learning. 1

5 0.3592557 192 nips-2001-Tree-based reparameterization for approximate inference on loopy graphs

Author: Martin J. Wainwright, Tommi Jaakkola, Alan S. Willsky

Abstract: We develop a tree-based reparameterization framework that provides a new conceptual view of a large class of iterative algorithms for computing approximate marginals in graphs with cycles. It includes belief propagation (BP), which can be reformulated as a very local form of reparameterization. More generally, we consider algorithms that perform exact computations over spanning trees of the full graph. On the practical side, we find that such tree reparameterization (TRP) algorithms have convergence properties superior to BP. The reparameterization perspective also provides a number of theoretical insights into approximate inference, including a new characterization of fixed points; and an invariance intrinsic to TRP /BP. These two properties enable us to analyze and bound the error between the TRP /BP approximations and the actual marginals. While our results arise naturally from the TRP perspective, most of them apply in an algorithm-independent manner to any local minimum of the Bethe free energy. Our results also have natural extensions to more structured approximations [e.g. , 1, 2]. 1

6 0.34897408 159 nips-2001-Reducing multiclass to binary by coupling probability estimates

7 0.28982323 188 nips-2001-The Unified Propagation and Scaling Algorithm

8 0.26124597 178 nips-2001-TAP Gibbs Free Energy, Belief Propagation and Sparsity

9 0.21704102 83 nips-2001-Geometrical Singularities in the Neuromanifold of Multilayer Perceptrons

10 0.1834372 57 nips-2001-Correlation Codes in Neuronal Populations

11 0.14090729 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family

12 0.13617761 110 nips-2001-Learning Hierarchical Structures with Linear Relational Embedding

13 0.1233076 132 nips-2001-Novel iteration schemes for the Cluster Variation Method

14 0.12103004 66 nips-2001-Efficiency versus Convergence of Boolean Kernels for On-Line Learning Algorithms

15 0.11993129 164 nips-2001-Sampling Techniques for Kernel Methods

16 0.11797839 3 nips-2001-ACh, Uncertainty, and Cortical Inference

17 0.11408854 143 nips-2001-PAC Generalization Bounds for Co-training

18 0.11342825 165 nips-2001-Scaling Laws and Local Minima in Hebbian ICA

19 0.11322916 120 nips-2001-Minimax Probability Machine

20 0.11041331 136 nips-2001-On the Concentration of Spectral Properties


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(14, 0.018), (16, 0.13), (17, 0.018), (19, 0.03), (27, 0.258), (30, 0.061), (38, 0.012), (59, 0.036), (72, 0.05), (79, 0.026), (83, 0.03), (88, 0.149), (91, 0.065)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.91396916 98 nips-2001-Information Geometrical Framework for Analyzing Belief Propagation Decoder

Author: Shiro Ikeda, Toshiyuki Tanaka, Shun-ichi Amari

Abstract: The mystery of the belief propagation (BP) decoder, especially of turbo decoding, is studied from an information-geometrical viewpoint. The loopy belief network (BN) of turbo codes makes it difficult to obtain the true “belief” by BP, and the characteristics of the algorithm and its equilibrium are not clearly understood. Our study gives an intuitive understanding of the mechanism and a new framework for the analysis. Based on the framework, we reveal basic properties of turbo decoding.

2 0.85542917 97 nips-2001-Information-Geometrical Significance of Sparsity in Gallager Codes

Author: Toshiyuki Tanaka, Shiro Ikeda, Shun-ichi Amari

Abstract: We report a result of perturbation analysis on decoding error of the belief propagation decoder for Gallager codes. The analysis is based on information geometry, and it shows that the principal term of decoding error at equilibrium comes from the m-embedding curvature of the log-linear submanifold spanned by the estimated pseudoposteriors, one for the full marginal, and K for partial posteriors, each of which takes a single check into account, where K is the number of checks in the Gallager code. It is then shown that the principal error term vanishes when the parity-check matrix of the code is so sparse that there are no two columns with overlap greater than 1. 1

3 0.8492707 45 nips-2001-Boosting and Maximum Likelihood for Exponential Models

Author: Guy Lebanon, John D. Lafferty

Abstract: We derive an equivalence between AdaBoost and the dual of a convex optimization problem, showing that the only difference between minimizing the exponential loss used by AdaBoost and maximum likelihood for exponential models is that the latter requires the model to be normalized to form a conditional probability distribution over labels. In addition to establishing a simple and easily understood connection between the two methods, this framework enables us to derive new regularization procedures for boosting that directly correspond to penalized maximum likelihood. Experiments on UCI datasets support our theoretical analysis and give additional insight into the relationship between boosting and logistic regression.

4 0.81100988 190 nips-2001-Thin Junction Trees

Author: Francis R. Bach, Michael I. Jordan

Abstract: We present an algorithm that induces a class of models with thin junction trees—models that are characterized by an upper bound on the size of the maximal cliques of their triangulated graph. By ensuring that the junction tree is thin, inference in our models remains tractable throughout the learning process. This allows both an efficient implementation of an iterative scaling parameter estimation algorithm and also ensures that inference can be performed efficiently with the final model. We illustrate the approach with applications in handwritten digit recognition and DNA splice site detection.

5 0.81032592 75 nips-2001-Fast, Large-Scale Transformation-Invariant Clustering

Author: Brendan J. Frey, Nebojsa Jojic

Abstract: In previous work on “transformed mixtures of Gaussians” and “transformed hidden Markov models”, we showed how the EM algorithm in a discrete latent variable model can be used to jointly normalize data (e.g., center images, pitch-normalize spectrograms) and learn a mixture model of the normalized data. The only input to the algorithm is the data, a list of possible transformations, and the number of clusters to find. The main criticism of this work was that the exhaustive computation of the posterior probabilities over transformations would make scaling up to large feature vectors and large sets of transformations intractable. Here, we describe how a tremendous speed-up is acheived through the use of a variational technique for decoupling transformations, and a fast Fourier transform method for computing posterior probabilities. For N ×N images, learning C clusters under N rotations, N scales, N x-translations and N y-translations takes only (C + 2 log N )N 2 scalar operations per iteration. In contrast, the original algorithm takes CN 6 operations to account for these transformations. We give results on learning a 4-component mixture model from a video sequence with frames of size 320 ×240. The model accounts for 360 rotations and 76,800 translations. Each iteration of EM takes only 10 seconds per frame in MATLAB, which is over 5 million times faster than the original algorithm. 1

6 0.80645299 106 nips-2001-Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering

7 0.80458951 129 nips-2001-Multiplicative Updates for Classification by Mixture Models

8 0.80165398 133 nips-2001-On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes

9 0.80109715 23 nips-2001-A theory of neural integration in the head-direction system

10 0.79963136 165 nips-2001-Scaling Laws and Local Minima in Hebbian ICA

11 0.79807591 47 nips-2001-Causal Categorization with Bayes Nets

12 0.79442906 117 nips-2001-MIME: Mutual Information Minimization and Entropy Maximization for Bayesian Belief Propagation

13 0.78043878 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family

14 0.76752335 137 nips-2001-On the Convergence of Leveraging

15 0.73522878 103 nips-2001-Kernel Feature Spaces and Nonlinear Blind Souce Separation

16 0.72146612 127 nips-2001-Multi Dimensional ICA to Separate Correlated Sources

17 0.72073245 8 nips-2001-A General Greedy Approximation Algorithm with Applications

18 0.71448821 114 nips-2001-Learning from Infinite Data in Finite Time

19 0.71402413 88 nips-2001-Grouping and dimensionality reduction by locally linear embedding

20 0.70850658 81 nips-2001-Generalization Performance of Some Learning Problems in Hilbert Functional Spaces