nips nips2001 nips2001-180 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Alan L. Yuille, Anand Rangarajan
Abstract: We introduce the Concave-Convex procedure (CCCP) which constructs discrete time iterative dynamical systems which are guaranteed to monotonically decrease global optimization/energy functions. It can be applied to (almost) any optimization problem and many existing algorithms can be interpreted in terms of CCCP. In particular, we prove relationships to some applications of Legendre transform techniques. We then illustrate CCCP by applications to Potts models, linear assignment, EM algorithms, and Generalized Iterative Scaling (GIS). CCCP can be used both as a new way to understand existing optimization algorithms and as a procedure for generating new algorithms. 1
Reference: text
sentIndex sentText sentNum sentScore
1 edu Abstract We introduce the Concave-Convex procedure (CCCP) which constructs discrete time iterative dynamical systems which are guaranteed to monotonically decrease global optimization/energy functions. [sent-15, score-0.226]
2 It can be applied to (almost) any optimization problem and many existing algorithms can be interpreted in terms of CCCP. [sent-16, score-0.088]
3 In particular, we prove relationships to some applications of Legendre transform techniques. [sent-17, score-0.05]
4 We then illustrate CCCP by applications to Potts models, linear assignment, EM algorithms, and Generalized Iterative Scaling (GIS). [sent-18, score-0.029]
5 CCCP can be used both as a new way to understand existing optimization algorithms and as a procedure for generating new algorithms. [sent-19, score-0.112]
6 1 Introduction There is a lot of interest in designing discrete time dynamical systems for inference and learning (see, for example, [10], [3], [7], [13]). [sent-20, score-0.106]
7 This paper describes a simple geometrical Concave-Convex procedure (CCCP) for constructing discrete time dynamical systems which can be guaranteed to decrease almost any global optimization/energy function (see technical conditions in section (2)). [sent-21, score-0.16]
8 We prove that there is a relationship between CCCP and optimization techniques based on introducing auxiliary variables using Legendre transforms. [sent-22, score-0.088]
9 In the former approach (Legendre min-max; see [6]), the introduction of auxiliary variables converts the problem to a min-max problem where the goal is to find a saddle point. [sent-24, score-0.093]
10 By contrast, in Legendre minimization, see [8], the problem remains a minimization one (and so it becomes easier to analyze convergence). [sent-25, score-0.06]
11 CCCP relates to Legendre minimization only and gives a geometrical perspective which complements the algebraic manipulations presented in [8]. [sent-26, score-0.182]
12 CCCP can be used both as a new way to understand existing optimization algorithms and as a procedure for generating new algorithms. [sent-27, score-0.112]
13 We illustrate this by giving examples from Potts models, EM, linear assignment, and Generalized Iterative Scaling. [sent-28, score-0.029]
14 Recently, CCCP has also been used to construct algorithms to minimize the Bethe/Kikuchi free energy [13]. [sent-29, score-0.267]
15 Theorem 1 shows that any function, subject to weak conditions, can be expressed as the sum of a convex and concave part (this decomposition is not unique). [sent-33, score-0.403]
16 This implies that CCCP can be applied to (almost) any optimization problem. [sent-34, score-0.029]
17 Let E(x) be an energy function with bounded Hessian ∂²E(x)/∂x∂x. [sent-36, score-0.179]
18 Then we can always decompose it into the sum of a convex function and a concave function. [sent-37, score-0.437]
19 Select any convex function F(x) with positive definite Hessian with eigenvalues bounded below by ε > 0. [sent-39, score-0.209]
20 Then there exists a positive constant λ such that the Hessian of E(x) + λF(x) is positive definite and hence E(x) + λF(x) is convex. [sent-40, score-0.026]
21 Hence we can express E(x) as the sum of a convex part, E(x) + λF(x), and a concave part, −λF(x). [sent-41, score-0.403]
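Theorem 1 can be sketched numerically. The example below is hypothetical (not from the paper): for E(x) = x⁴ − 5x², choosing F(x) = x² and λ = 5 makes E(x) + λF(x) = x⁴ convex, with concave remainder −5x².

```python
import numpy as np

# Hypothetical 1-D instance of Theorem 1: E(x) = x^4 - 5x^2 has an indefinite
# Hessian, but adding lam * F(x) with F(x) = x^2 and lam = 5 yields the convex
# part E_vex(x) = x^4; the concave remainder is E_cave(x) = -5x^2.
def E(x):      return x**4 - 5 * x**2
def E_vex(x):  return E(x) + 5 * x**2   # = x^4, convex
def E_cave(x): return -5 * x**2         # concave

xs = np.linspace(-3, 3, 601)
# The decomposition is exact: E = E_vex + E_cave everywhere on the grid.
decomposition_error = np.max(np.abs(E_vex(xs) + E_cave(xs) - E(xs)))
# Discrete second differences certify convexity/concavity on the grid.
second_diff = lambda f: np.diff(f(xs), 2)
vex_convex = bool(np.all(second_diff(E_vex) >= 0))
cave_concave = bool(np.all(second_diff(E_cave) <= 0))
```

The choice λ = 5 follows the proof of Theorem 1: the Hessian of E is 12x² − 10, so adding 2λ ≥ 10 makes it nonnegative.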
22 Figure 1: Decomposing a function into convex and concave parts. [sent-42, score-0.403]
23 The original function (Left Panel) can be expressed as the sum of a convex function (Centre Panel) and a concave function (Right Panel). [sent-43, score-0.403]
24 Our main result is given by Theorem 2 which defines the CCCP procedure and proves that it converges to a minimum or saddle point of the energy. [sent-46, score-0.081]
25 Consider an energy function E(x) (bounded below) of form E(x) = E_vex(x) + E_cave(x), where E_vex(x), E_cave(x) are convex and concave functions of x respectively. [sent-48, score-0.74]
26 Then the discrete iterative CCCP algorithm x^t ↦ x^{t+1} given by: ∇E_vex(x^{t+1}) = −∇E_cave(x^t), (1) is guaranteed to monotonically decrease the energy E(x) as a function of time and hence to converge to a minimum or saddle point of E(x). [sent-49, score-0.399]
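Update (1) can be sketched on a minimal hypothetical example (not from the paper): with E(x) = x⁴ − 5x², E_vex(x) = x⁴ and E_cave(x) = −5x², the update reads 4(x^{t+1})³ = 10 x^t, solvable in closed form at each step.

```python
# CCCP on E(x) = x^4 - 5x^2 with E_vex = x^4, E_cave = -5x^2 (a hypothetical
# example). Update (1): 4 x_{t+1}^3 = 10 x_t, i.e. x_{t+1} = (2.5 x_t)^(1/3)
# for x_t > 0. By Theorem 2 the energy must decrease monotonically.
def E(x): return x**4 - 5 * x**2

x = 0.5                                # initial point
energies = [E(x)]
for _ in range(50):
    x = (2.5 * x) ** (1.0 / 3.0)       # solve grad E_vex(x_new) = -grad E_cave(x)
    energies.append(E(x))

monotone = all(e_next <= e_prev + 1e-12
               for e_prev, e_next in zip(energies, energies[1:]))
x_star = 2.5 ** 0.5                    # analytic minimizer of E on x > 0
```

The iterates converge to the stationary point where 4x³ = 10x, i.e. x* = √2.5, exactly where E'(x) = 0.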
27 We can get a graphical illustration of this algorithm by the reformulation shown in figure (2) (suggested by James M. Coughlan). [sent-58, score-0.038]
28 Think of decomposing the energy function E(x) into E_1(x) − E_2(x) where both E_1(x) and E_2(x) are convex. [sent-60, score-0.215]
29 (This is equivalent to decomposing E(x) into a convex term E_1(x) plus a concave term −E_2(x).) [sent-61, score-0.514]
30 The algorithm proceeds by matching points on the two terms which have the same tangents. [sent-62, score-0.063]
31 Figure 2: A CCCP algorithm illustrated for Convex minus Convex. [sent-65, score-0.074]
32 We want to minimize the function in the Left Panel. [sent-66, score-0.038]
33 We decompose it (Right Panel) into a convex part (top curve) minus a convex term (bottom curve). [sent-67, score-0.462]
34 The algorithm iterates by matching points on the two curves which have the same tangent vectors, see text for more details. [sent-68, score-0.063]
35 The algorithm rapidly converges to the solution at x = 5. [sent-69, score-0.038]
36 We can extend Theorem 2 to allow for linear constraints on the variables x, for example Σ_i c^μ_i x_i = α^μ, where the {c^μ_i}, {α^μ} are constants. [sent-71, score-0.066]
37 This follows directly because properties such as convexity and concavity are preserved when linear constraints are imposed. [sent-72, score-0.114]
38 In these cases we will need the following theorem, where we re-express CCCP in terms of minimizing a time sequence of convex update energy functions E^{t+1}(x^{t+1}) to obtain the updates x^{t+1} (i.e. [sent-78, score-0.494]
39 at the t-th iteration of CCCP we need to minimize the energy E^{t+1}(x^{t+1})). [sent-80, score-0.217]
40 Let E(x) = E_vex(x) + E_cave(x) where x is required to satisfy the linear constraints Σ_i c^μ_i x_i = α^μ, where the {c^μ_i}, {α^μ} are constants. [sent-83, score-0.112]
41 Then the update rule for x^{t+1} can be formulated as minimizing a time sequence of convex update energy functions E^{t+1}(x^{t+1}): E^{t+1}(x^{t+1}) = E_vex(x^{t+1}) + x^{t+1} · ∇E_cave(x^t) + Σ_μ λ^μ {Σ_i c^μ_i x^{t+1}_i − α^μ}, (2) where the Lagrange parameters {λ^μ} impose the linear constraints. [sent-84, score-0.532]
42 The convexity of E^{t+1}(x^{t+1}) implies that there is a unique minimum corresponding to x^{t+1}. [sent-87, score-0.047]
43 3 Legendre Transformations The Legendre transform can be used to reformulate optimization problems by introducing auxiliary variables [6]. [sent-89, score-0.166]
44 The idea is that some of the formulations may be more effective (and computationally cheaper) than others. [sent-90, score-0.035]
45 An advantage of Legendre minimization is that mathematical convergence proofs can be given. [sent-92, score-0.06]
46 (For example, [8] proved convergence results for the algorithm implemented in [7]. [sent-93, score-0.038]
47 ) In Theorem 4 we show that Legendre minimization algorithms are equivalent to CCCP. [sent-94, score-0.109]
48 The CCCP viewpoint emphasizes the geometry of the approach and complements the algebraic manipulations given in [8]. [sent-95, score-0.093]
49 (Moreover, our results of the previous section show the generality of CCCP while, by contrast, the Legendre transform methods have been applied only on a case by case basis). [sent-96, score-0.05]
50 Then F*(Y) is concave and is the Legendre transform of F(x). [sent-101, score-0.27]
51 Then applying CCCP to E_1(x) is equivalent to minimizing E_2(x, y) with respect to x and y alternatively (for suitable choices of g(·)). [sent-114, score-0.104]
52 Thus minimizing E_1(x) with respect to x is equivalent to minimizing E_2(x, y) = f(x) + x · y + g*(y) with respect to x and y alternatively. [sent-125, score-0.107]
53 Alternating minimization over x and y gives: (i) ∂f/∂x = y to determine x_{t+1} in terms of y_t, and (ii) ∂g*/∂y = x to determine y_t in terms of x_t, which, by Property 1 of the Legendre transform, is equivalent to setting y = −∂g/∂x. [sent-128, score-0.133]
54 Combining these two stages gives CCCP: ∂f/∂x(x^{t+1}) = −∂g/∂x(x^t). [sent-129, score-0.071]
55 These attempt to minimize discrete energy functions of form E[V] = Σ_{i,j,a,b} T_{ijab} V_{ia} V_{jb} + Σ_{i,a} θ_{ia} V_{ia}, where the {V_{ia}} take discrete values {0, 1} with linear constraints Σ_i V_{ia} = 1, ∀a. [sent-133, score-0.504]
56 Mean field algorithms minimize a continuous effective energy E_eff[S; T] to obtain a minimum of the discrete energy E[V] in the limit as T ↦ 0. [sent-135, score-0.614]
57 The {S_{ia}} are continuous variables in the range [0, 1] and correspond to (approximate) estimates of the mean states of the {V_{ia}}. [sent-136, score-0.033]
58 As described in [12], to ensure that the minima of E[V] and E_eff[S; T] all coincide (as T ↦ 0) it is sufficient that T_{ijab} be negative definite. [sent-137, score-0.079]
59 Moreover, this can be attained by adding a term −K Σ_{i,a} V_{ia}² to E[V] (for sufficiently large K) without altering the structure of the minima of E[V]. [sent-138, score-0.176]
60 Hence, without loss of generality we can consider Σ_{i,j,a,b} T_{ijab} V_{ia} V_{jb} to be a concave function. [sent-139, score-0.22]
61 We impose the linear constraints by adding a Lagrange multiplier term Σ_a p_a {Σ_i V_{ia} − 1} to the energy, where the {p_a} are the Lagrange multipliers. [sent-141, score-0.27]
62 The effective energy becomes: E_eff[S] = Σ_{i,j,a,b} T_{ijab} S_{ia} S_{jb} + Σ_{i,a} θ_{ia} S_{ia} + T Σ_{i,a} S_{ia} log S_{ia} + Σ_a p_a {Σ_i S_{ia} − 1}. We can then incorporate the Lagrange multiplier term into the convex part. [sent-142, score-0.723]
63 This gives a discrete update rule satisfying Σ_i S_{ia}(t + 1) = 1, ∀a: S_{ia}(t + 1) = e^{−(1/T){2 Σ_{j,b} T_{ijab} S_{jb}(t) + θ_{ia}}} / Σ_k e^{−(1/T){2 Σ_{j,b} T_{kjab} S_{jb}(t) + θ_{ka}}}. [sent-144, score-0.106]
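The mean-field CCCP update for the Potts model can be sketched on a small synthetic instance. Everything below (the sizes, the randomly generated negative-definite T, the linear term θ, and the temperature) is an illustrative assumption, not data from the paper; the softmax normalizes over i for each label a, matching the constraints Σ_i S_{ia} = 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n_i, n_a = 4, 3                          # nodes i, labels a
dim = n_i * n_a

# Negative semi-definite interaction T (so the quadratic term is concave),
# random linear term theta, and a temperature Temp > 0 (all hypothetical).
B = rng.standard_normal((dim, dim))
T = -(B @ B.T) / dim                     # symmetric, negative semi-definite
theta = rng.standard_normal(dim)
Temp = 1.0

def energy(S):
    s = S.reshape(-1)
    return s @ T @ s + theta @ s + Temp * np.sum(S * np.log(S))

# CCCP / mean-field step: the convex entropy part is minimized against the
# linearized concave part, giving a softmax normalized over i for each a.
def cccp_step(S):
    grad_cave = 2 * (T @ S.reshape(-1)) + theta   # gradient of concave part
    logits = (-1.0 / Temp) * grad_cave.reshape(n_i, n_a)
    logits -= logits.max(axis=0)                  # numerical stability
    W = np.exp(logits)
    return W / W.sum(axis=0)                      # enforce sum_i S_ia = 1

S = np.full((n_i, n_a), 1.0 / n_i)               # uniform, feasible start
energies = [energy(S)]
for _ in range(100):
    S = cccp_step(S)
    energies.append(energy(S))

monotone = all(b <= a + 1e-9 for a, b in zip(energies, energies[1:]))
```

Each step exactly minimizes the convex surrogate of Theorem 3 on the constraint set, so the effective energy is nonincreasing.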
64 The EM algorithm seeks to estimate a variable f* = arg max_f log Σ_{l} P(f, l), where {f}, {l} are variables that depend on the specific problem formulation. [sent-149, score-0.187]
65 It was shown in [4] that this is equivalent to minimizing the following effective energy with respect to the variables f and P(l): [sent-150, score-0.391]
66 E_eff[f, P(l)] = −Σ_l P(l) log P(f, l) + Σ_l P(l) log P(l). [sent-151, score-0.18]
67 To apply CCCP to an effective energy like this we need either: (a) to decompose E_eff[f, P(l)] [sent-152, score-0.327]
68 into convex and concave functions of f and P(l), or (b) to eliminate either variable and obtain a convex-concave decomposition in the remaining variable. [sent-153, score-0.806]
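The EM behaviour used here can be made concrete with a minimal sketch. The two-component unit-variance Gaussian mixture below is a hypothetical example (not the paper's): each E-step/M-step pair decreases the effective energy, i.e. the data log-likelihood never decreases.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic 1-D data from two unit-variance Gaussians (hypothetical example).
data = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 300)])

def log_likelihood(mu, w):
    comp = np.stack([w[k] * np.exp(-0.5 * (data - mu[k]) ** 2) / np.sqrt(2 * np.pi)
                     for k in range(2)])
    return np.sum(np.log(comp.sum(axis=0)))

mu = np.array([0.0, 1.0])        # initial means
w = np.array([0.5, 0.5])         # initial mixing weights
lls = [log_likelihood(mu, w)]
for _ in range(50):
    # E-step: responsibilities P(l | x_i) -- the distribution P(l) above.
    comp = np.stack([w[k] * np.exp(-0.5 * (data - mu[k]) ** 2) for k in range(2)])
    resp = comp / comp.sum(axis=0)
    # M-step: re-estimate means and mixing weights.
    mu = (resp * data).sum(axis=1) / resp.sum(axis=1)
    w = resp.mean(axis=1)
    lls.append(log_likelihood(mu, w))

monotone = all(b >= a - 1e-8 for a, b in zip(lls, lls[1:]))
```

The monotone log-likelihood increase is exactly the CCCP-style guarantee: the E-step minimizes E_eff over P(l), the M-step over f.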
69 Let E_eff[S, y] be the effective energy for the elastic net; then the {y_a} variables can be eliminated and the resulting E_s[S] can be minimized using CCCP. [sent-159, score-0.373]
70 (Note that the standard elastic net only enforces the second set of linear constraints). [sent-160, score-0.193]
71 The elastic net energy function can be expressed as [11]: E_eff[S, y] = Σ_{i,a} S_{ia} |x_i − y_a|² + γ Σ_{a,b} y_a · A_{ab} y_b + T Σ_{i,a} S_{ia} log S_{ia}, where we impose the conditions Σ_a S_{ia} = 1, ∀i and Σ_i S_{ia} = 1, ∀a. [sent-162, score-0.554]
72 The EM algorithm can be applied to estimate the {Ya}. [sent-163, score-0.038]
73 Alternatively we can solve for the {y_a} variables to obtain y_b = Σ_{i,a} P_{ab} S_{ia} x_i, where {P_{ab}} = {J_{ab} + 2γ A_{ab}}^{-1}. [sent-164, score-0.033]
74 We substitute this back into E_eff[S, y] to get a new energy E_s[S] given by: E_s[S] = −Σ_{i,j,a,b} P_{ab} (x_i · x_j) S_{ia} S_{jb} + T Σ_{i,a} S_{ia} log S_{ia}. (6) Once again this is a sum of a concave and a convex part (the first term is concave because of the minus sign and the fact that {P_{ba}} and {x_i · x_j} are positive semi-definite). [sent-165, score-0.864]
75 We can now apply CCCP and obtain the standard EM algorithm for this problem. [sent-167, score-0.038]
76 Our final example is a discrete iterative algorithm to solve the linear assignment problem. [sent-169, score-0.237]
77 This algorithm was reported by Kosowsky and Yuille in [5] where it was also shown to correspond to the well-known Sinkhorn algorithm [9]. [sent-170, score-0.076]
78 We now show that both Kosowsky and Yuille's linear assignment algorithm, and hence Sinkhorn's algorithm, are examples of CCCP (after a change of variables). [sent-171, score-0.119]
79 The linear assignment problem seeks to find the permutation matrix {Π_{ia}} which minimizes the energy E[Π] = Σ_{i,a} Π_{ia} A_{ia}, where {A_{ia}} is a set of assignment values. [sent-173, score-0.967]
80 As shown in [5] this is equivalent to minimizing the (convex) E_p[p] energy given by E_p[p] = Σ_a p_a + (1/β) Σ_i log Σ_a e^{−β(A_{ia}+p_a)}, where the solution is given by Π*_{ia} = e^{−β(A_{ia}+p_a)} / Σ_b e^{−β(A_{ib}+p_b)} rounded off to the nearest integer (for sufficiently large β). [sent-174, score-0.334]
81 The iterative algorithm to minimize E_p[p] (which can be re-expressed as Sinkhorn's algorithm, see [5]) is of form: p_a^{t+1} = (1/β) log Σ_i {e^{−βA_{ia}} / Σ_b e^{−β(A_{ib}+p_b^t)}}, (7) and can be re-expressed as CCCP. [sent-175, score-0.142]
82 By performing the change of coordinates βp_a = −log r_a, ∀a (for r_a > 0, ∀a), we can re-express the E_p[p] energy as: E_r[r] = −(1/β) Σ_a log r_a + (1/β) Σ_i log Σ_a e^{−βA_{ia}} r_a. (8) Observe that the first term of E_r[r] is convex and the second term is concave (this can be verified by calculating the Hessian). [sent-177, score-0.724]
83 Applying CCCP gives the update rule: 1/r_a^{t+1} = Σ_i {e^{−βA_{ia}} / Σ_b e^{−βA_{ib}} r_b^t}, (9) which corresponds to equation (7). [sent-178, score-0.054]
84 The GIS algorithm is designed to estimate the parameter λ of a distribution P(x; λ) = e^{λ·φ(x)}/Z[λ]. [sent-182, score-0.038]
85 Darroch and Ratcliff [1] prove that the following GIS algorithm is guaranteed to converge to the value λ* that minimizes the (convex) cost function E(λ) = log Z[λ] − λ · h. [sent-189, score-0.158]
86 λ^{t+1} = λ^t − log h^t + log h, (10) where h^t = Σ_x P(x; λ^t) φ(x) (evaluate log h componentwise: (log h)_μ = log h_μ). To show that GIS can be reformulated as CCCP, we introduce a new variable r = e^λ (componentwise). [sent-192, score-0.436]
87 We reformulate the problem in terms of minimizing the cost function E_r[r] = log Z[log r] − h · log r. [sent-193, score-0.16]
88 The term −h · log r is a convex function of r with first derivative −h/r (where the division is componentwise). [sent-196, score-0.183]
89 The first derivative of log Z[log r] is (1/r) Σ_x φ(x) P(x; log r) (evaluated componentwise). [sent-197, score-0.18]
90 To show that log Z[log r] is concave requires computing its Hessian and applying the Cauchy-Schwarz inequality, using the fact that Σ_μ φ_μ(x) = 1, ∀x and that φ_μ(x) ≥ 0, ∀μ. [sent-198, score-0.31]
91 We can therefore apply CCCP to E_r[r], which yields 1/r^{t+1} = (1/r^t) × (1/h) × h^t (componentwise), which is GIS (by taking logs and using log r = λ). [sent-200, score-0.128]
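The GIS/CCCP update can be sketched on a toy problem. The 3-state, 2-feature instance below is a hypothetical assumption (not from the paper); the features satisfy Σ_μ φ_μ(x) = 1 and φ_μ(x) ≥ 0 as the concavity argument requires.

```python
import numpy as np

# Features phi(x) for x in {0, 1, 2}; rows are nonnegative and sum to 1.
phi = np.array([[0.2, 0.8],
                [0.5, 0.5],
                [0.9, 0.1]])
# Observed feature expectations h under a hypothetical target distribution.
target = np.array([0.2, 0.3, 0.5])
h = target @ phi                 # lies strictly inside the feasible marginals

lam = np.zeros(2)
for _ in range(500):
    p = np.exp(phi @ lam)
    p /= p.sum()                 # P(x; lam) = e^{lam . phi(x)} / Z[lam]
    h_t = p @ phi                # model expectations at lam_t
    lam = lam - np.log(h_t) + np.log(h)   # GIS update (10)

p = np.exp(phi @ lam)
p /= p.sum()
max_gap = np.max(np.abs(p @ phi - h))
```

At convergence the model expectations h^t match the observed h, which is exactly the fixed point 1/r^{t+1} = (1/r^t)(1/h)h^t of the CCCP form.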
92 5 Conclusion CCCP is a general principle which can be used to construct discrete time iterative dynamical systems for almost any energy minimization problem. [sent-201, score-0.411]
93 It gives a geometric perspective on Legendre minimization (though not on Legendre min-max). [sent-202, score-0.089]
94 We have illustrated that several existing discrete time iterative algorithms can be reinterpreted in terms of CCCP (see Yuille and Rangarajan, in preparation, for other examples). [sent-203, score-0.177]
95 Therefore CCCP gives a novel way of thinking about and classifying existing algorithms. [sent-204, score-0.062]
96 See, for example, recent work [13] where CCCP was used to construct a double loop algorithm to minimize the Bethe/Kikuchi free energy (which are generalizations of the mean field free energy). [sent-206, score-0.353]
97 There also are similarities to the work of Hoang Tuy who has shown that any arbitrary closed set is the projection of a difference of two convex sets in a space with one more dimension. [sent-210, score-0.183]
98 " An Analysis of an Elastic net Approach to the Traveling Salesman Problem". [sent-235, score-0.067]
99 " A Convergence Proof for the Softassign Quadratic assignment Problem". [sent-285, score-0.081]
100 "Generalized Deformable Models, Statistical Physics and Matching Problems," Neural Computation, 2 pp 1-24. [sent-310, score-0.088]
wordName wordTfidf (topN-words)
[('cccp', 0.566), ('legendre', 0.299), ('evex', 0.258), ('concave', 0.22), ('xt', 0.217), ('convex', 0.183), ('energy', 0.179), ('ecave', 0.159), ('ia', 0.15), ('elastic', 0.126), ('gis', 0.126), ('yuille', 0.124), ('sia', 0.119), ('fj', 0.105), ('rangarajan', 0.103), ('log', 0.09), ('pp', 0.088), ('assignment', 0.081), ('cave', 0.079), ('ett', 0.079), ('potts', 0.079), ('sinkhorn', 0.079), ('ij', 0.079), ('componentwise', 0.079), ('net', 0.067), ('iterative', 0.066), ('theorem', 0.065), ('minimization', 0.06), ('aia', 0.06), ('kosowsky', 0.06), ('tijabsjb', 0.06), ('xo', 0.055), ('em', 0.055), ('hessian', 0.054), ('dynamical', 0.054), ('pa', 0.053), ('rt', 0.053), ('discrete', 0.052), ('anand', 0.052), ('bia', 0.052), ('coughlan', 0.052), ('transform', 0.05), ('ep', 0.048), ('convexity', 0.047), ('lagrange', 0.046), ('af', 0.042), ('minimizing', 0.042), ('ya', 0.041), ('james', 0.041), ('ti', 0.04), ('apa', 0.04), ('hfj', 0.04), ('jab', 0.04), ('sial', 0.04), ('algebraic', 0.039), ('alternatively', 0.039), ('algorithm', 0.038), ('minimize', 0.038), ('preparation', 0.038), ('ht', 0.038), ('generalized', 0.037), ('minus', 0.036), ('decomposing', 0.036), ('panel', 0.035), ('effective', 0.035), ('concavity', 0.034), ('cities', 0.034), ('salesman', 0.034), ('saddle', 0.034), ('decompose', 0.034), ('variables', 0.033), ('existing', 0.033), ('constraints', 0.033), ('xd', 0.032), ('impose', 0.032), ('guaranteed', 0.03), ('illustrate', 0.029), ('complements', 0.029), ('gives', 0.029), ('optimization', 0.029), ('reformulate', 0.028), ('darroch', 0.028), ('field', 0.026), ('auxiliary', 0.026), ('algorithms', 0.026), ('definite', 0.026), ('zt', 0.026), ('seeks', 0.026), ('term', 0.026), ('matching', 0.025), ('manipulations', 0.025), ('ag', 0.025), ('update', 0.025), ('loop', 0.024), ('physics', 0.024), ('procedure', 0.024), ('scaling', 0.024), ('free', 0.024), ('equivalent', 0.023), ('proves', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 180 nips-2001-The Concave-Convex Procedure (CCCP)
Author: Alan L. Yuille, Anand Rangarajan
Abstract: We introduce the Concave-Convex procedure (CCCP) which constructs discrete time iterative dynamical systems which are guaranteed to monotonically decrease global optimization/energy functions. It can be applied to (almost) any optimization problem and many existing algorithms can be interpreted in terms of CCCP. In particular, we prove relationships to some applications of Legendre transform techniques. We then illustrate CCCP by applications to Potts models, linear assignment, EM algorithms, and Generalized Iterative Scaling (GIS). CCCP can be used both as a new way to understand existing optimization algorithms and as a procedure for generating new algorithms. 1
2 0.13373628 132 nips-2001-Novel iteration schemes for the Cluster Variation Method
Author: Hilbert J. Kappen, Wim Wiegerinck
Abstract: The Cluster Variation method is a class of approximation methods containing the Bethe and Kikuchi approximations as special cases. We derive two novel iteration schemes for the Cluster Variation Method. One is a fixed point iteration scheme which gives a significant improvement over loopy BP, mean field and TAP methods on directed graphical models. The other is a gradient based method, that is guaranteed to converge and is shown to give useful results on random graphs with mild frustration. We conclude that the methods are of significant practical value for large inference problems. 1
3 0.10704603 105 nips-2001-Kernel Machines and Boolean Functions
Author: Adam Kowalczyk, Alex J. Smola, Robert C. Williamson
Abstract: We give results about the learnability and required complexity of logical formulae to solve classification problems. These results are obtained by linking propositional logic with kernel machines. In particular we show that decision trees and disjunctive normal forms (DNF) can be represented by the help of a special kernel, linking regularized risk to separation margin. Subsequently we derive a number of lower bounds on the required complexity of logic formulae using properties of algorithms for generation of linear estimators, such as perceptron and maximal perceptron learning.
4 0.096211821 117 nips-2001-MIME: Mutual Information Minimization and Entropy Maximization for Bayesian Belief Propagation
Author: Anand Rangarajan, Alan L. Yuille
Abstract: Bayesian belief propagation in graphical models has been recently shown to have very close ties to inference methods based in statistical physics. After Yedidia et al. demonstrated that belief propagation fixed points correspond to extrema of the so-called Bethe free energy, Yuille derived a double loop algorithm that is guaranteed to converge to a local minimum of the Bethe free energy. Yuille’s algorithm is based on a certain decomposition of the Bethe free energy and he mentions that other decompositions are possible and may even be fruitful. In the present work, we begin with the Bethe free energy and show that it has a principled interpretation as pairwise mutual information minimization and marginal entropy maximization (MIME). Next, we construct a family of free energy functions from a spectrum of decompositions of the original Bethe free energy. For each free energy in this family, we develop a new algorithm that is guaranteed to converge to a local minimum. Preliminary computer simulations are in agreement with this theoretical development. 1
5 0.091821015 188 nips-2001-The Unified Propagation and Scaling Algorithm
Author: Yee W. Teh, Max Welling
Abstract: In this paper we will show that a restricted class of constrained minimum divergence problems, named generalized inference problems, can be solved by approximating the KL divergence with a Bethe free energy. The algorithm we derive is closely related to both loopy belief propagation and iterative scaling. This unified propagation and scaling algorithm reduces to a convergent alternative to loopy belief propagation when no constraints are present. Experiments show the viability of our algorithm.
6 0.083698452 8 nips-2001-A General Greedy Approximation Algorithm with Applications
7 0.077706791 156 nips-2001-Rao-Blackwellised Particle Filtering via Data Augmentation
8 0.07658112 62 nips-2001-Duality, Geometry, and Support Vector Regression
9 0.075631328 147 nips-2001-Pranking with Ranking
10 0.071932636 171 nips-2001-Spectral Relaxation for K-means Clustering
11 0.070621714 195 nips-2001-Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning
12 0.06483987 189 nips-2001-The g Factor: Relating Distributions on Features to Distributions on Images
13 0.062427361 21 nips-2001-A Variational Approach to Learning Curves
14 0.061482765 81 nips-2001-Generalization Performance of Some Learning Problems in Hilbert Functional Spaces
15 0.059613924 129 nips-2001-Multiplicative Updates for Classification by Mixture Models
16 0.057481568 137 nips-2001-On the Convergence of Leveraging
17 0.056972507 178 nips-2001-TAP Gibbs Free Energy, Belief Propagation and Sparsity
18 0.051924914 186 nips-2001-The Noisy Euclidean Traveling Salesman Problem and Learning
19 0.050255895 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family
20 0.048591603 69 nips-2001-Escaping the Convex Hull with Extrapolated Vector Machines
topicId topicWeight
[(0, -0.149), (1, 0.034), (2, 0.03), (3, -0.019), (4, 0.07), (5, -0.155), (6, 0.017), (7, -0.037), (8, 0.017), (9, 0.003), (10, 0.042), (11, 0.057), (12, 0.074), (13, 0.105), (14, 0.064), (15, 0.039), (16, -0.114), (17, 0.109), (18, 0.019), (19, -0.047), (20, 0.025), (21, -0.093), (22, -0.069), (23, 0.065), (24, 0.026), (25, -0.195), (26, -0.007), (27, -0.034), (28, -0.051), (29, -0.007), (30, -0.102), (31, -0.035), (32, -0.176), (33, -0.012), (34, 0.062), (35, 0.104), (36, 0.068), (37, -0.018), (38, 0.019), (39, -0.033), (40, -0.058), (41, -0.107), (42, -0.036), (43, -0.106), (44, 0.012), (45, -0.154), (46, 0.044), (47, 0.013), (48, -0.069), (49, -0.053)]
simIndex simValue paperId paperTitle
same-paper 1 0.94856936 180 nips-2001-The Concave-Convex Procedure (CCCP)
Author: Alan L. Yuille, Anand Rangarajan
Abstract: We introduce the Concave-Convex procedure (CCCP) which constructs discrete time iterative dynamical systems which are guaranteed to monotonically decrease global optimization/energy functions. It can be applied to (almost) any optimization problem and many existing algorithms can be interpreted in terms of CCCP. In particular, we prove relationships to some applications of Legendre transform techniques. We then illustrate CCCP by applications to Potts models, linear assignment, EM algorithms, and Generalized Iterative Scaling (GIS). CCCP can be used both as a new way to understand existing optimization algorithms and as a procedure for generating new algorithms. 1
2 0.56084257 132 nips-2001-Novel iteration schemes for the Cluster Variation Method
Author: Hilbert J. Kappen, Wim Wiegerinck
Abstract: The Cluster Variation method is a class of approximation methods containing the Bethe and Kikuchi approximations as special cases. We derive two novel iteration schemes for the Cluster Variation Method. One is a fixed point iteration scheme which gives a significant improvement over loopy BP, mean field and TAP methods on directed graphical models. The other is a gradient based method, that is guaranteed to converge and is shown to give useful results on random graphs with mild frustration. We conclude that the methods are of significant practical value for large inference problems. 1
3 0.48372722 156 nips-2001-Rao-Blackwellised Particle Filtering via Data Augmentation
Author: Christophe Andrieu, Nando D. Freitas, Arnaud Doucet
Abstract: In this paper, we extend the Rao-Blackwellised particle filtering method to more complex hybrid models consisting of Gaussian latent variables and discrete observations. This is accomplished by augmenting the models with artificial variables that enable us to apply Rao-Blackwellisation. Other improvements include the design of an optimal importance proposal distribution and being able to swap the sampling an selection steps to handle outliers. We focus on sequential binary classifiers that consist of linear combinations of basis functions , whose coefficients evolve according to a Gaussian smoothness prior. Our results show significant improvements. 1
4 0.47967637 120 nips-2001-Minimax Probability Machine
Author: Gert Lanckriet, Laurent E. Ghaoui, Chiranjib Bhattacharyya, Michael I. Jordan
Abstract: When constructing a classifier, the probability of correct classification of future data points should be maximized. In the current paper this desideratum is translated in a very direct way into an optimization problem, which is solved using methods from convex optimization. We also show how to exploit Mercer kernels in this setting to obtain nonlinear decision boundaries. A worst-case bound on the probability of misclassification of future data is obtained explicitly. 1
5 0.47820628 117 nips-2001-MIME: Mutual Information Minimization and Entropy Maximization for Bayesian Belief Propagation
Author: Anand Rangarajan, Alan L. Yuille
Abstract: Bayesian belief propagation in graphical models has been recently shown to have very close ties to inference methods based in statistical physics. After Yedidia et al. demonstrated that belief propagation fixed points correspond to extrema of the so-called Bethe free energy, Yuille derived a double loop algorithm that is guaranteed to converge to a local minimum of the Bethe free energy. Yuille’s algorithm is based on a certain decomposition of the Bethe free energy and he mentions that other decompositions are possible and may even be fruitful. In the present work, we begin with the Bethe free energy and show that it has a principled interpretation as pairwise mutual information minimization and marginal entropy maximization (MIME). Next, we construct a family of free energy functions from a spectrum of decompositions of the original Bethe free energy. For each free energy in this family, we develop a new algorithm that is guaranteed to converge to a local minimum. Preliminary computer simulations are in agreement with this theoretical development. 1
6 0.41811603 195 nips-2001-Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning
7 0.39786336 62 nips-2001-Duality, Geometry, and Support Vector Regression
8 0.38853109 188 nips-2001-The Unified Propagation and Scaling Algorithm
9 0.37203208 192 nips-2001-Tree-based reparameterization for approximate inference on loopy graphs
10 0.36087516 147 nips-2001-Pranking with Ranking
11 0.34509903 105 nips-2001-Kernel Machines and Boolean Functions
12 0.31933787 45 nips-2001-Boosting and Maximum Likelihood for Exponential Models
13 0.31339246 69 nips-2001-Escaping the Convex Hull with Extrapolated Vector Machines
14 0.29887578 171 nips-2001-Spectral Relaxation for K-means Clustering
15 0.29521647 21 nips-2001-A Variational Approach to Learning Curves
16 0.29136154 178 nips-2001-TAP Gibbs Free Energy, Belief Propagation and Sparsity
17 0.2860077 8 nips-2001-A General Greedy Approximation Algorithm with Applications
18 0.27514538 89 nips-2001-Grouping with Bias
19 0.25619861 189 nips-2001-The g Factor: Relating Distributions on Features to Distributions on Images
20 0.25334799 167 nips-2001-Semi-supervised MarginBoost
topicId topicWeight
[(14, 0.032), (17, 0.037), (19, 0.03), (27, 0.123), (30, 0.037), (38, 0.013), (59, 0.014), (72, 0.048), (79, 0.467), (91, 0.1)]
simIndex simValue paperId paperTitle
1 0.96608502 35 nips-2001-Analysis of Sparse Bayesian Learning
Author: Anita C. Faul, Michael E. Tipping
Abstract: The recent introduction of the 'relevance vector machine' has effectively demonstrated how sparsity may be obtained in generalised linear models within a Bayesian framework. Using a particular form of Gaussian parameter prior, 'learning' is the maximisation, with respect to hyperparameters, of the marginal likelihood of the data. This paper studies the properties of that objective function, and demonstrates that conditioned on an individual hyperparameter, the marginal likelihood has a unique maximum which is computable in closed form. It is further shown that if a derived 'sparsity criterion' is satisfied, this maximum is exactly equivalent to 'pruning' the corresponding parameter from the model. 1
2 0.91628861 2 nips-2001-3 state neurons for contextual processing
Author: Ádám Kepecs, S. Raghavachari
Abstract: Neurons receive excitatory inputs via both fast AMPA and slow NMDA type receptors. We find that neurons receiving input via NMDA receptors can have two stable membrane states which are input dependent. Action potentials can only be initiated from the higher voltage state. Similar observations have been made in several brain areas which might be explained by our model. The interactions between the two kinds of inputs lead us to suggest that some neurons may operate in 3 states: disabled, enabled and firing. Such enabled, but non-firing modes can be used to introduce context-dependent processing in neural networks. We provide a simple example and discuss possible implications for neuronal processing and response variability. 1
3 0.91589051 115 nips-2001-Linear-time inference in Hierarchical HMMs
Author: Kevin P. Murphy, Mark A. Paskin
Abstract: The hierarchical hidden Markov model (HHMM) is a generalization of the hidden Markov model (HMM) that models sequences with structure at many length/time scales [FST98]. Unfortunately, the original infertime, where is ence algorithm is rather complicated, and takes the length of the sequence, making it impractical for many domains. In this paper, we show how HHMMs are a special kind of dynamic Bayesian network (DBN), and thereby derive a much simpler inference algorithm, which only takes time. Furthermore, by drawing the connection between HHMMs and DBNs, we enable the application of many standard approximation techniques to further speed up inference. ¥ ©§ £ ¨¦¥¤¢ © £ ¦¥¤¢
same-paper 4 0.91181713 180 nips-2001-The Concave-Convex Procedure (CCCP)
Author: Alan L. Yuille, Anand Rangarajan
Abstract: We introduce the Concave-Convex procedure (CCCP) which constructs discrete time iterative dynamical systems which are guaranteed to monotonically decrease global optimization/energy functions. It can be applied to (almost) any optimization problem and many existing algorithms can be interpreted in terms of CCCP. In particular, we prove relationships to some applications of Legendre transform techniques. We then illustrate CCCP by applications to Potts models, linear assignment, EM algorithms, and Generalized Iterative Scaling (GIS). CCCP can be used both as a new way to understand existing optimization algorithms and as a procedure for generating new algorithms. 1
5 0.88678622 78 nips-2001-Fragment Completion in Humans and Machines
Author: David Jacobs, Bas Rokers, Archisman Rudra, Zili Liu
Abstract: Partial information can trigger a complete memory. At the same time, human memory is not perfect. A cue can contain enough information to specify an item in memory, but fail to trigger that item. In the context of word memory, we present experiments that demonstrate some basic patterns in human memory errors. We use cues that consist of word fragments. We show that short and long cues are completed more accurately than medium length ones and study some of the factors that lead to this behavior. We then present a novel computational model that shows some of the flexibility and patterns of errors that occur in human memory. This model iterates between bottom-up and top-down computations. These are tied together using a Markov model of words that allows memory to be accessed with a simple feature set, and enables a bottom-up process to compute a probability distribution of possible completions of word fragments, in a manner similar to models of visual perceptual completion.
6 0.59585774 183 nips-2001-The Infinite Hidden Markov Model
7 0.55455112 118 nips-2001-Matching Free Trees with Replicator Equations
8 0.54898149 86 nips-2001-Grammatical Bigrams
9 0.53815949 172 nips-2001-Speech Recognition using SVMs
10 0.53724921 184 nips-2001-The Intelligent surfer: Probabilistic Combination of Link and Content Information in PageRank
11 0.52826023 162 nips-2001-Relative Density Nets: A New Way to Combine Backpropagation with HMM's
12 0.52171057 192 nips-2001-Tree-based reparameterization for approximate inference on loopy graphs
13 0.51886868 3 nips-2001-ACh, Uncertainty, and Cortical Inference
14 0.51883543 169 nips-2001-Small-World Phenomena and the Dynamics of Information
15 0.51670378 171 nips-2001-Spectral Relaxation for K-means Clustering
16 0.51159227 188 nips-2001-The Unified Propagation and Scaling Algorithm
17 0.51035279 132 nips-2001-Novel iteration schemes for the Cluster Variation Method
18 0.50502884 79 nips-2001-Gaussian Process Regression with Mismatched Models
19 0.50308168 194 nips-2001-Using Vocabulary Knowledge in Bayesian Multinomial Estimation
20 0.50084919 89 nips-2001-Grouping with Bias