nips nips2006 nips2006-39 knowledge-graph by maker-knowledge-mining

39 nips-2006-Balanced Graph Matching

Source: pdf

Author: Timothee Cour, Praveen Srinivasan, Jianbo Shi

Abstract: Graph matching is a fundamental problem in Computer Vision and Machine Learning. We present two contributions. First, we give a new spectral relaxation technique for approximate solutions to matching problems, that naturally incorporates one-to-one or one-to-many constraints within the relaxation scheme. The second is a normalization procedure for existing graph matching scoring functions that can dramatically improve the matching accuracy. It is based on a reinterpretation of the graph matching compatibility matrix as a bipartite graph on edges for which we seek a bistochastic normalization. We evaluate our two contributions on a comprehensive test set of random graph matching problems, as well as on image correspondence problem. Our normalization procedure can be used to improve the performance of many existing graph matching algorithms, including spectral matching, graduated assignment and semideﬁnite programming. 1

Reference: text

Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 edu Abstract Graph matching is a fundamental problem in Computer Vision and Machine Learning. [sent-3, score-0.373]

2 First, we give a new spectral relaxation technique for approximate solutions to matching problems, that naturally incorporates one-to-one or one-to-many constraints within the relaxation scheme. [sent-5, score-0.809]

3 The second is a normalization procedure for existing graph matching scoring functions that can dramatically improve the matching accuracy. [sent-6, score-1.16]

4 It is based on a reinterpretation of the graph matching compatibility matrix as a bipartite graph on edges for which we seek a bistochastic normalization. [sent-7, score-1.409]

5 We evaluate our two contributions on a comprehensive test set of random graph matching problems, as well as on image correspondence problem. [sent-8, score-0.698]

6 Our normalization procedure can be used to improve the performance of many existing graph matching algorithms, including spectral matching, graduated assignment and semideﬁnite programming. [sent-9, score-1.096]

7 As a result, the correspondence problem is commonly referred to as graph matching. [sent-12, score-0.293]

8 In this setting, graph nodes represent feature points extracted from each instance (e. [sent-13, score-0.248]

9 a test image and a template image) and graph edges represent relationships between feature points. [sent-15, score-0.333]

10 The problem of graph matching is to ﬁnd a mapping between the two node sets that preserves as much as possible the relationships between nodes. [sent-16, score-0.637]

11 Because of its combinatorial nature, graph matching is either solved exactly in a very restricted setting (bipartite matching, for example with the Hungarian method) or approximately. [sent-17, score-0.582]

12 Most of the recent literature on graph matching has followed this second path, developing approximate relaxations to the graph matching problem. [sent-18, score-1.22]

13 The ﬁrst contribution is a spectral relaxation for the graph matching problem that incorporates one-to-one or one-to-many mapping constraints, represented as afﬁne constraints. [sent-20, score-0.9]

14 Our second contribution relates to the graph matching scoring function itself, which we argue, is prone to systematic confusion errors. [sent-23, score-0.66]

15 We show how a proper bistochastic normalization of the graph matching compatibility matrix is able to considerably reduce those errors and improve the overall matching performance. [sent-24, score-1.659]

16 This improvement is demonstrated both for our spectral relaxation algorithm, and for three state of the art graph matching algorithms: spectral matching, graduated assignment and semideﬁnite programming. [sent-25, score-1.181]

17 2 Problem formulation Attributed Graph We deﬁne an attributed graph[1] as a graph G = (V, E, A) where each edge e = ij ∈ E is assigned an attribute Ae , which could be a real number or a vector in case of multiattributes. [sent-26, score-0.477]

18 We represent vertex attributes as special edge attributes, i. [sent-27, score-0.224]

19 For example, the nodes could represent feature points with attributes for spatial location/orientation and image feature descriptors, while edge attributes could represent spatial relationships between two nodes such as relative position/orientation. [sent-30, score-0.377]

20 We want to ﬁnd a mapping between V and V ′ that best preserves the attributes between edges e = ij ∈ E and e′ = i′ j ′ ∈ E ′ . [sent-32, score-0.279]

21 Equivalently, we seek a set of correspondences, or matches M = {ii′ } so as to maximize the graph matching score, deﬁned as: ǫGM (M ) = f (Aij , A′ ′ j ′ ) = i ii′ ∈M,jj ′ ∈M ′ ′ f (Ae , A′ ′ ), e (1) e∼e′ with the shorthand notation e ∼ e iff ii ∈ M, jj ′ ∈ M . [sent-33, score-0.724]

22 The function f (·, ·) measures the similarity between edge attributes. [sent-34, score-0.168]

23 For most problems, one requires the matching to have a special structure, such as one-to-one or one-to-many: this is the mapping constraint. [sent-39, score-0.397]

24 Cx ≤ b, ′ x ∈ {0, 1}nn (2) W is a nn′ × nn′ compatibility matrix with Wii′ ,jj ′ = f (Aij , A′ ′ j ′ ). [sent-44, score-0.339]

25 Graph Matching Relaxations Continuous relaxations of the IQP (2) are among the most successful methods for non-bipartite graph matching, and so we focus on them. [sent-46, score-0.265]

26 We review three state of the art matching algorithms: semideﬁnite programming (SDP) [2, 3], graduated assignment (GA) [4], and spectral matching (SM) [5]. [sent-47, score-1.082]

27 We also introduce a new method, Spectral Matching with Afﬁne Constraints (SMAC) that provides a tigher relaxation than SM (and more accurate results in our experiments) while still retaining the speed and scalability beneﬁts of spectral methods, which we also quantify in our evaluations. [sent-48, score-0.335]

28 The relaxation squares the problem size, which we will see, prevents SDP from scaling to large problems. [sent-55, score-0.146]

29 Spectral Matching (SM) In [5], the authors drop the constraint Cx ≤ b during relaxation and only incorporate it during the discretization step. [sent-59, score-0.252]

30 Our method is closely related to the spectral matching formulation of [5], but we are able to impose afﬁne constraints Cx = b on the relaxed solution. [sent-65, score-0.557]

31 We demonstrate later that the ability to maintain this constraint, coupled with scalability and speed of spectral methods, results in a very effective solution to graph matching. [sent-66, score-0.422]

32 Cx = b (3) Note, for one-to-one matching the objective coincides with the IQP for binary x since xT x = n. [sent-69, score-0.373]

33 Discretization We show here how to tighten our approximation during the discretization step in the case of one-to-one matching (we can fall back to this case by introducing dummy nodes). [sent-75, score-0.424]

34 It is a well known result that for any n × n matrix X, X is a permutation matrix iff X1 = X T 1 = 1, X is orthogonal, and X ≥ 0 elementwise. [sent-77, score-0.176]

35 We show here we can obtain a tighter relaxation by incorporating the ﬁrst 2 (out of 3) constraints as a postprocessing before the ﬁnal discretization. [sent-78, score-0.173]

36 We ran extensive graph matching experiments on both real image graphs and synthetic graphs with the algorithms presented above. [sent-92, score-0.709]

37 We noticed a clear trend: the algorithms get confused when there is ambiguity in the compatibility matrix. [sent-93, score-0.301]

38 We extracted a set of feature points (indexed by i and i′ ) in two airplane images, and for each edge e = ij in the ﬁrst graph, we plotted the most similar edges e′ = i′ j ′ in the second graph. [sent-95, score-0.318]

39 As we can see, the ﬁrst edge plotted has many correspondences everywhere in the image and is therefore uninformative. [sent-96, score-0.225]

40 The second edge on the other hand has correspondences with roughly only 5 locations, it is informative, and yet its contribution is outweighted by the ﬁrst edge. [sent-97, score-0.224]

41 Blue arrows indicate edges with high similarity, showing 2 groups: cliques of type 1 (pairing roughly horizontal edges in the 2 images) are uninformative, cliques of type 2 (pairing vertical edges) are distinctive. [sent-104, score-0.291]

42 Figure 2: Left: edges 12 and 13 are uninformative and make spurious connections of strength σ to all edges in the second graph. [sent-105, score-0.349]

43 Middle: corresponding compatibility matrices W (top: before normalization, bottom: after normalization). [sent-107, score-0.329]

44 Right: margin as a function of σ (difference between correct matching score and best runner-up score). [sent-108, score-0.423]

45 In our simple noise model, edges 12 and 13 are uninformative and make connections to every edge in the second graph, with strength σ (our noise parameter). [sent-111, score-0.509]

46 The informative edge 23 on the other hand only connects to 2′ 3′ . [sent-112, score-0.164]

47 When the noise is small enough, the optimal matching is the desired permutation p∗ = {11′ , 22′ , 33′ }, with an initial score of 8 for σ = 0. [sent-114, score-0.564]

48 W is unbalanced, with some edges making spurious connections, overwhelming the inﬂuence of other edges with few connections. [sent-117, score-0.235]

49 In fact we argue this is the main source of confusion for graph matching. [sent-119, score-0.256]

50 The next section introduces a normalization algorithm to address this problem. [sent-120, score-0.205]

51 Figure 3: Left: matching compatibility matrix W and edge similarity matrix S. [sent-121, score-0.918]

52 5 How to balance the Compatibility Matrix As we saw in the previous section, a main source of confusion for graph matching algorithms is the unbalance in the compatibility matrix. [sent-124, score-0.96]

53 This confusion occurs when an edge e ∈ E has many good potential matches e′ ∈ E ′ . [sent-125, score-0.209]

54 Such an edge is not discriminative and its inﬂuence should be decreased. [sent-126, score-0.125]

55 On the other hand, an edge with small number of good matches will help disambiguate the optimal matching. [sent-127, score-0.162]

56 Edge Similarity Matrix S The similarity function f (·, ·) can be interpreted in two ways: either as a similarity between edges ij ∈ E and i′ j ′ ∈ E ′ , or as a compatibility between match hypothesis ii′ ∈ M and jj ′ ∈ M . [sent-132, score-0.595]

57 We deﬁne the similarity matrix S of size m × m′ as Sij,i′ j ′ = f (Aij , A′ ′ j ′ ), and (as before) the i compatibility matrix W of size nn′ × nn′ as Wii′ ,jj ′ = f (Aij , A′ ′ j ′ ), see Figure 3. [sent-133, score-0.468]

58 Each vertex i i in the ﬁrst graph should ideally match to a small number of vertices i′ in the second graph. [sent-134, score-0.265]

59 Similarly, each edge e = ij ∈ E should also match to a small number of edges e′ = i′ j ′ ∈ E ′ . [sent-135, score-0.306]

60 2 Bistochastic Normalization of Edge Similarity Matrix S Recall we are given a compatibility matrix W . [sent-140, score-0.339]

61 We can formulate the normalization as solving the following balancing problem: Find (D, D′ ) diagonal matrices of order m, m′ s. [sent-144, score-0.233]

62 DSD′ is rectangular bistochastic (5) We propose the following algorithm to solve (5), and then show its correctness. [sent-146, score-0.19]

63 Input: compatibility matrix W, of size nn′ × nn′ 2. [sent-148, score-0.363]

64 Spurious correspondences are suppressed and informative correspondances such as W23,2′ 3′ are enhanced, which makes the correct correspondence clearer. [sent-158, score-0.191]

65 The plot on the right shows that normalization makes the matching robust to arbitrarily large noise in this model, while unnormalized correspondences will eventually result in incorrect matchings. [sent-159, score-0.735]

66 Spectral matching and SMAC were implemented using the standard Lanczos eigensolver available with MATLAB, and we implemented an optimized version of GA in C++. [sent-164, score-0.413]

67 1 One-to-one Attributed Graph Matching on Random Graphs Following [4], we performed a comprehensive evaluation of the 4 algorithms on random one-to-one graph matching problems. [sent-166, score-0.582]

68 For each matching problem, we constructed a graph G with n = 20 nodes, and m random edges (m = 10%n2 in a ﬁrst series of experiments). [sent-167, score-0.674]

69 Each edge ij ∈ E, was assigned a random attribute Aij distributed uniformly in [0, 1]. [sent-168, score-0.215]

70 We then created a perturbed graph G′ by adding noise on the edge attributes: A′ ′ j ′ = Ap(i)p(j) + noise, where p is a random permutation; i the noise was distributed uniformly in [0, σ], σ varying from 0 to 6. [sent-169, score-0.538]

71 The compatibility matrix W was computed from graphs G, G′ as follows: Wii′ ,jj ′ = exp(−|Aij − A′ ′ j ′ |2 ), ∀ij ∈ E, i′ j ′ ∈ E ′ . [sent-170, score-0.37]

72 i For each noise level we generated 100 different matching problems and computed the average error rate by comparing the discretized matching to the ground truth permutation. [sent-171, score-0.928]

73 Effect of Normalization on each method We computed the average error rates with and without normalization of the compatibility matrix W , for each method: SDP, GA, SM and SMAC, see Figure 4. [sent-172, score-0.569]

74 We can see dramatic improvement due to normalization, regardless of the relaxation method used. [sent-173, score-0.146]

75 Comparison Across Methods We plotted the performance of all 4 methods using normalized compatibility matrices in Figure 5 (left), again, with 100 trials per noise level. [sent-175, score-0.445]

76 These results validate SMAC with normalization as a state-of-the-art relaxation method for graph matching. [sent-177, score-0.56]

77 Inﬂuence of edge density and graph size We experimented with varying edge density: noise σ = 2, n = 20, edge density varying from 10% to 100% by increments of 10% with 20 trials per increment. [sent-178, score-0.776]

78 For SMAC, the normalization resulted in an average absolute error reduction of 60%, and for all density levels the reduction was at least 40%. [sent-179, score-0.23]

79 We also did the same experiments, but with ﬁxed edge density and varying graph sizes, from 10 to 100 nodes. [sent-181, score-0.36]

80 For SMAC, normalization resulted in an average absolute error reduction of 52%; for all graph sizes the reduction was at least 40%. [sent-182, score-0.439]

81 1 1 2 3 4 Noise level 5 6 0 1 2 3 4 Noise level 5 6 Figure 4: Comparison of matching performance with normalized and unnormalized compatibility matrices. [sent-228, score-0.744]

82 As mentioned previously, the SDP relaxation squares the problem size (in addition to requiring expensive solvers), greatly impacting its speed and scalability. [sent-233, score-0.195]

83 For a set of random one-to-one matching problems of varying size n (horizontal axis), we averaged the time for computing the relaxed solution of all four methods (10 trials for each n). [sent-235, score-0.514]

84 2 Image correspondence We also tested the effect of normalization on a simple but instructive image correspondence task. [sent-240, score-0.405]

85 In each of two images to match, we formed a multiple attribute graph by sub-sampling n = 100 canny edge points as graph nodes. [sent-241, score-0.572]

86 Each pair of feature points e = ij within 30 pixels was assigned two → − → − attributes: angle ∠e = ∠ ij and distance d(e) = || ij ||. [sent-242, score-0.183]

87 By using simple geometric attributes, we ′ emphasized the effect of normalization on the energy function, rather than feature design. [sent-245, score-0.205]

88 Figure 6 shows an image correspondence example between the two airplane images of Figure 1. [sent-246, score-0.156]

89 Let us return to Figure 1 to see the effect of normalization on S(e, e′ ). [sent-252, score-0.205]

90 As we saw, there are roughly 2 types of connections: 1) horizontal edges (uninformative) and 2) vertical edges (discriminative). [sent-253, score-0.217]

91 We can view normalization as imposing an upper bound on the contribution of each connection: the upper bound is smaller for spurious matches, and higher for discriminative matches. [sent-259, score-0.287]

92 7 Conclusion While recent literature mostly focuses on improving relaxation methods for graph matching problems, we contribute both an improved relaxation algorithm, SMAC, and a method for improving the energy function itself, graph balancing with bistochastic normalization. [sent-260, score-1.243]

93 We motivate the normalization with an intuitive example, showing it improves noise tolerance by enhancing informative matches and de-enhancing uninformative matches. [sent-262, score-0.444]

94 The experiments we performed on random one-to-one matchings show that normalization dramatically improves both our relaxation method SMAC, and the three algorithms mentioned. [sent-263, score-0.351]

95 We also demonstrated the value of normalization for establishing one-to-one correspondences between image pairs. [sent-264, score-0.305]

96 Normalization imposes an upper bound on the score contribution of each edge in proportion to its saliency. [sent-265, score-0.206]

97 05 0 1 2 3 4 Noise level 5 6 −3 10 0 50 100 Problem Size (|V|=|V’| = # of nodes) 150 0 0 50 100 Problem Size (|V|=|V’| = # of nodes) 150 Figure 5: Left: comparison of different methods with normalized compatibility matrices. [sent-283, score-0.336]

98 Axes: vertical is error rate averaged over 100 trials; horizontal is noise level. [sent-284, score-0.18]

99 Middle,right: computation times of graph matching methods (left: log-scale, right: linear scale). [sent-286, score-0.582]

100 A spectral technique for correspondence problems using pairwise constraints. [sent-306, score-0.201]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('smac', 0.38), ('matching', 0.373), ('compatibility', 0.301), ('sdp', 0.254), ('graph', 0.209), ('normalization', 0.205), ('ga', 0.177), ('sm', 0.174), ('bistochastic', 0.16), ('iqp', 0.16), ('xorth', 0.16), ('relaxation', 0.146), ('ceq', 0.14), ('graduated', 0.14), ('nn', 0.138), ('edge', 0.125), ('spectral', 0.117), ('cx', 0.095), ('edges', 0.092), ('noise', 0.089), ('af', 0.087), ('correspondence', 0.084), ('wii', 0.079), ('uninformative', 0.074), ('attributes', 0.071), ('correspondences', 0.068), ('aij', 0.064), ('ij', 0.061), ('weq', 0.06), ('relaxations', 0.056), ('constraint', 0.055), ('attributed', 0.053), ('assignment', 0.052), ('permutation', 0.052), ('discretization', 0.051), ('spurious', 0.051), ('score', 0.05), ('pc', 0.048), ('iff', 0.048), ('discretize', 0.048), ('scalability', 0.047), ('confusion', 0.047), ('xii', 0.044), ('similarity', 0.043), ('relaxed', 0.04), ('affine', 0.04), ('aii', 0.04), ('airplane', 0.04), ('beq', 0.04), ('dsd', 0.04), ('eigensolver', 0.04), ('jianbo', 0.04), ('sedumi', 0.04), ('xt', 0.04), ('connections', 0.04), ('nodes', 0.039), ('informative', 0.039), ('matrix', 0.038), ('matches', 0.037), ('cliques', 0.037), ('level', 0.035), ('pairing', 0.035), ('uence', 0.033), ('synthetic', 0.033), ('horizontal', 0.033), ('rate', 0.033), ('image', 0.032), ('program', 0.032), ('rayleigh', 0.032), ('elementwise', 0.032), ('eigenvector', 0.032), ('contribution', 0.031), ('graphs', 0.031), ('preserves', 0.031), ('ii', 0.03), ('rectangular', 0.03), ('orthogonal', 0.03), ('saw', 0.03), ('ae', 0.03), ('semide', 0.029), ('attribute', 0.029), ('match', 0.028), ('matrices', 0.028), ('vertex', 0.028), ('art', 0.027), ('trials', 0.027), ('proposition', 0.027), ('constraints', 0.027), ('bipartite', 0.027), ('jj', 0.027), ('varying', 0.026), ('normalizing', 0.026), ('contributed', 0.025), ('speed', 0.025), ('error', 0.025), ('rows', 0.025), ('solution', 0.024), ('size', 0.024), ('mapping', 0.024), ('mm', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999958 39 nips-2006-Balanced Graph Matching

Author: Timothee Cour, Praveen Srinivasan, Jianbo Shi

2 0.20220561 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming

Author: Kilian Q. Weinberger, Fei Sha, Qihui Zhu, Lawrence K. Saul

Abstract: In many areas of science and engineering, the problem arises how to discover low dimensional representations of high dimensional data. Recently, a number of researchers have converged on common solutions to this problem using methods from convex optimization. In particular, many results have been obtained by constructing semideﬁnite programs (SDPs) with low rank solutions. While the rank of matrix variables in SDPs cannot be directly constrained, it has been observed that low rank solutions emerge naturally by computing high variance or maximal trace solutions that respect local distance constraints. In this paper, we show how to solve very large problems of this type by a matrix factorization that leads to much smaller SDPs than those previously studied. The matrix factorization is derived by expanding the solution of the original problem in terms of the bottom eigenvectors of a graph Laplacian. The smaller SDPs obtained from this matrix factorization yield very good approximations to solutions of the original problem. Moreover, these approximations can be further reﬁned by conjugate gradient descent. We illustrate the approach on localization in large scale sensor networks, where optimizations involving tens of thousands of nodes can be solved in just a few minutes. 1

3 0.14998892 70 nips-2006-Doubly Stochastic Normalization for Spectral Clustering

Author: Ron Zass, Amnon Shashua

Abstract: In this paper we focus on the issue of normalization of the afﬁnity matrix in spectral clustering. We show that the difference between N-cuts and Ratio-cuts is in the error measure being used (relative-entropy versus L1 norm) in ﬁnding the closest doubly-stochastic matrix to the input afﬁnity matrix. We then develop a scheme for ﬁnding the optimal, under Frobenius norm, doubly-stochastic approximation using Von-Neumann’s successive projections lemma. The new normalization scheme is simple and efﬁcient and provides superior clustering performance over many of the standardized tests. 1

4 0.14739154 34 nips-2006-Approximate Correspondences in High Dimensions

Author: Kristen Grauman, Trevor Darrell

Abstract: Pyramid intersection is an efﬁcient method for computing an approximate partial matching between two sets of feature vectors. We introduce a novel pyramid embedding based on a hierarchy of non-uniformly shaped bins that takes advantage of the underlying structure of the feature space and remains accurate even for sets with high-dimensional feature vectors. The matching similarity is computed in linear time and forms a Mercer kernel. Whereas previous matching approximation algorithms suffer from distortion factors that increase linearly with the feature dimension, we demonstrate that our approach can maintain constant accuracy even as the feature dimension increases. When used as a kernel in a discriminative classiﬁer, our approach achieves improved object recognition results over a state-of-the-art set kernel. 1

5 0.14183837 201 nips-2006-Using Combinatorial Optimization within Max-Product Belief Propagation

Author: Daniel Tarlow, Gal Elidan, Daphne Koller, John C. Duchi

Abstract: In general, the problem of computing a maximum a posteriori (MAP) assignment in a Markov random ﬁeld (MRF) is computationally intractable. However, in certain subclasses of MRF, an optimal or close-to-optimal assignment can be found very efﬁciently using combinatorial optimization algorithms: certain MRFs with mutual exclusion constraints can be solved using bipartite matching, and MRFs with regular potentials can be solved using minimum cut methods. However, these solutions do not apply to the many MRFs that contain such tractable components as sub-networks, but also other non-complying potentials. In this paper, we present a new method, called C OMPOSE, for exploiting combinatorial optimization for sub-networks within the context of a max-product belief propagation algorithm. C OMPOSE uses combinatorial optimization for computing exact maxmarginals for an entire sub-network; these can then be used for inference in the context of the network as a whole. We describe highly efﬁcient methods for computing max-marginals for subnetworks corresponding both to bipartite matchings and to regular networks. We present results on both synthetic and real networks encoding correspondence problems between images, which involve both matching constraints and pairwise geometric constraints. We compare to a range of current methods, showing that the ability of C OMPOSE to transmit information globally across the network leads to improved convergence, decreased running time, and higher-scoring assignments.

6 0.11690661 35 nips-2006-Approximate inference using planar graph decomposition

7 0.1145426 80 nips-2006-Fundamental Limitations of Spectral Clustering

8 0.099696368 77 nips-2006-Fast Computation of Graph Kernels

9 0.09707208 86 nips-2006-Graph-Based Visual Saliency

10 0.093162969 163 nips-2006-Prediction on a Graph with a Perceptron

11 0.083292291 117 nips-2006-Learning on Graph with Laplacian Regularization

12 0.076899931 110 nips-2006-Learning Dense 3D Correspondence

13 0.075109929 14 nips-2006-A Small World Threshold for Economic Network Formation

14 0.07506986 92 nips-2006-High-Dimensional Graphical Model Selection Using $\ell 1$-Regularized Logistic Regression

15 0.074250698 5 nips-2006-A Kernel Method for the Two-Sample-Problem

16 0.073249429 66 nips-2006-Detecting Humans via Their Pose

17 0.07098686 7 nips-2006-A Local Learning Approach for Clustering

18 0.069208354 151 nips-2006-On the Relation Between Low Density Separation, Spectral Clustering and Graph Cuts

19 0.068582065 128 nips-2006-Manifold Denoising

20 0.066476703 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.202), (1, 0.072), (2, 0.032), (3, 0.105), (4, 0.134), (5, -0.108), (6, 0.025), (7, 0.092), (8, -0.012), (9, -0.159), (10, 0.149), (11, -0.139), (12, 0.01), (13, -0.035), (14, 0.08), (15, -0.052), (16, -0.162), (17, -0.077), (18, 0.149), (19, -0.03), (20, 0.173), (21, -0.119), (22, 0.072), (23, 0.012), (24, 0.006), (25, -0.062), (26, 0.013), (27, -0.053), (28, 0.184), (29, 0.062), (30, 0.02), (31, -0.03), (32, 0.111), (33, -0.015), (34, -0.021), (35, 0.055), (36, -0.007), (37, 0.041), (38, -0.128), (39, 0.204), (40, -0.056), (41, -0.047), (42, -0.13), (43, -0.073), (44, -0.029), (45, 0.052), (46, -0.054), (47, 0.032), (48, 0.104), (49, -0.129)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97189778 39 nips-2006-Balanced Graph Matching

Author: Timothee Cour, Praveen Srinivasan, Jianbo Shi

2 0.58954626 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming

Author: Kilian Q. Weinberger, Fei Sha, Qihui Zhu, Lawrence K. Saul

3 0.58701277 201 nips-2006-Using Combinatorial Optimization within Max-Product Belief Propagation

Author: Daniel Tarlow, Gal Elidan, Daphne Koller, John C. Duchi

4 0.55721134 34 nips-2006-Approximate Correspondences in High Dimensions

Author: Kristen Grauman, Trevor Darrell

5 0.53652185 35 nips-2006-Approximate inference using planar graph decomposition

Author: Amir Globerson, Tommi S. Jaakkola

Abstract: A number of exact and approximate methods are available for inference calculations in graphical models. Many recent approximate methods for graphs with cycles are based on tractable algorithms for tree structured graphs. Here we base the approximation on a different tractable model, planar graphs with binary variables and pure interaction potentials (no external ﬁeld). The partition function for such models can be calculated exactly using an algorithm introduced by Fisher and Kasteleyn in the 1960s. We show how such tractable planar models can be used in a decomposition to derive upper bounds on the partition function of non-planar models. The resulting algorithm also allows for the estimation of marginals. We compare our planar decomposition to the tree decomposition method of Wainwright et. al., showing that it results in a much tighter bound on the partition function, improved pairwise marginals, and comparable singleton marginals. Graphical models are a powerful tool for modeling multivariate distributions, and have been successfully applied in various ﬁelds such as coding theory and image processing. Applications of graphical models typically involve calculating two types of quantities, namely marginal distributions, and MAP assignments. The evaluation of the model partition function is closely related to calculating marginals [12]. These three problems can rarely be solved exactly in polynomial time, and are provably computationally hard in the general case [1]. When the model conforms to a tree structure, however, all these problems can be solved in polynomial time. This has prompted extensive research into tree based methods. For example, the junction tree method [6] converts a graphical model into a tree by clustering nodes into cliques, such that the graph over cliques is a tree. The resulting maximal clique size (cf. tree width) may nevertheless be prohibitively large. Wainwright et. al. [9, 11] proposed an approximate method based on trees known as tree reweighting (TRW). The TRW approach decomposes the potential vector of a graphical model into a mixture over spanning trees of the model, and then uses convexity arguments to bound various quantities, such as the partition function. One key advantage of this approach is that it provides bounds on partition function value, a property which is not shared by approximations based on Bethe free energies [13]. In this paper we focus on a different class of tractable models: planar graphs. A graph is called planar if it can be drawn in the plane without crossing edges. Works in the 1960s by physicists Fisher [5] and Kasteleyn [7], among others, have shown that the partition function for planar graphs may be calculated in polynomial time. This, however, is true under two key restrictions. One is that the variables xi are binary. The other is that the interaction potential depends only on xi xj (where xi ∈ {±1}), and not on their individual values (i.e., the zero external ﬁeld case). Here we show how the above method can be used to obtain upper bounds on the partition function for non-planar graphs. As in TRW, we decompose the potential of a non-planar graph into a sum over spanning planar models, and then use a convexity argument to obtain an upper bound on the log partition function. The bound optimization is a convex problem, and can be solved in polynomial time. We compare our method with TRW on a planar graph with an external ﬁeld, and show that it performs favorably with respect to both pairwise marginals and the bound on the partition function, and the two methods give similar results for singleton marginals. 1 Deﬁnitions and Notations Given a graph G with n vertices and a set of edges E, we are interested in pairwise Markov Random Fields (MRF) over the graph G. A pairwise MRF [13] is a multivariate distribution over variables x = {x1 , . . . , xn } deﬁned as 1 P p(x) = e ij∈E fij (xi ,xj ) (1) Z where fij are a set of |E| functions, or interaction potentials, deﬁned over pairs of variables. The P partition function is deﬁned as Z = x e ij∈E fij (xi ,xj ) . Here we will focus on the case where xi ∈ {±1}. Furthermore, we will be interested in interaction potentials which only depend on agreement or disagreement between the signs of their variables. We deﬁne those by 1 θij (1 + xi xj ) = θij I(xi = xj ) (2) 2 so that fij (xi , xj ) is zero if xi = xj and θij if xi = xj . The model is then deﬁned via the set of parameters θij . We use θ to denote the vector of parameters θij , and denote the partition function by Z(θ) to highlight its dependence on these parameters. f (xi , xj ) = A graph G is deﬁned as planar if it can be drawn in the plane without any intersection of edges [4]. With some abuse of notation, we deﬁne E as the set of line segments in 2 corresponding to the edges in the graph. The regions of 2 \ E are deﬁned as the faces of the graph. The face which corresponds to an unbounded region is called the external face. Given a planar graph G, its dual graph G∗ is deﬁned in the following way: the vertices of G∗ correspond to faces of G, and there is an edge between two vertices in G∗ iff the two corresponding faces in G share an edge. If the graph G is weighted, the weight on an edge in G∗ is the weight on the edge shared by the corresponding faces in G. A plane triangulation of a planar graph G is obtained from G by adding edges such that all the faces of the resulting graph have exactly three vertices. Thus a plane triangulated graph has a dual where all vertices have degree three. It can be shown that every plane graph can be plane triangulated [4]. We shall also need the notion of a perfect matching on a graph. A perfect matching on a graph G is deﬁned as a set of edges H ⊆ E such that every vertex in G has exactly one edge in H incident on it. If the graph is weighted, the weight of the matching is deﬁned as the product of the weights of the edges in the matching. Finally, we recall the deﬁnition of a marginal polytope of a graph [12]. Consider an MRF over a graph G where fij are given by Equation 2. Denote the probability of the event I(xi = xj ) under p(x) by τij . The marginal polytope of G, denoted by M(G), is deﬁned as the set of values τij that can be obtained under some assignment to the parameters θij . For a general graph G the polytope M(G) cannot be described using a polynomial number of inequalities. However, for planar graphs, it turns out that a set of O(n3 ) constraints, commonly referred to as triangle inequalities, sufﬁce to describe M(G) (see [3] page 434). The triangle inequalities are deﬁned by 1 TRI(n) = {τij : τij + τjk − τik ≤ 1, τij + τjk + τik ≥ 1, ∀i, j, k ∈ {1, . . . , n}} (3) Note that the above inequalities actually contain variables τij which do not correspond to edges in the original graph G. Thus the equality M(G) = TRI(n) should be understood as referring only to the values of τij that correspond to edges in the graph. Importantly, the values of τij for edges not in the graph need not be valid marginals for any MRF. In other words M(G) is a projection of TRI(n) on the set of edges of G. It is well known that the marginal polytope for trees is described via pairwise constraints. It is thus interesting that for planar graphs, it is triplets, rather than pairwise 1 The deﬁnition here is slightly different from that in [3], since here we refer to agreement probabilities, whereas [3] refers to disagreement probabilities. This polytope is also referred to as the cut polytope. constraints, that characterize the polytope. In this sense, planar graphs and trees may be viewed as a hierarchy of polytope complexity classes. It remains an interesting problem to characterize other structures in this hierarchy and their related inference algorithms. 2 Exact calculation of partition function using perfect matching The seminal works of Kasteleyn [7] and Fisher [5] have shown how one can calculate the partition function for a binary MRF over a planar graph with pure interaction potentials. We brieﬂy review Fisher’s construction, which we will use in what follows. Our interpretation of the method differs somewhat from that of Fisher, but we believe it is more straightforward. The key idea in calculating the partition function is to convert the summation over values of x to the problem of calculating the sum of weights of all perfect matchings in a graph constructed from G, as shown below. In this section, we consider weighted graphs (graphs with numbers assigned to their edges). For the graph G associated with the pairwise MRF, we assign weights wij = e2θij to the edges. The ﬁrst step in the construction is to plane triangulate the graph G. Let us call the resulting graph GT . We deﬁne an MRF on GT by assigning a parameter θij = 0 to the edges that have been added to G, and the corresponding weight wij = 1. Thus GT essentially describes the same distribution as G, and therefore has the same partition function. We can thus restrict our attention to calculating the partition function for the MRF on GT . As a ﬁrst step in calculating a partition function over GT , we introduce the following deﬁnition: a ˆ set of edges E in GT is an agreement edge set (or AES) if for every triangle face F in GT one of the ˆ ˆ following holds: The edges in F are all in E, or exactly one of the edges in F is in E. The weight ˆ is deﬁned as the product of the weights of the edges in E. ˆ of a set E It can be shown that there exists a bijection between pairs of assignments {x, −x} and agreement edge sets. The mapping from x to an edge set is simply the set of edges such that xi = xj . It is easy to see that this is an agreement edge set. The reverse mapping is obtained by ﬁnding an assignment x such that xi = xj iff the corresponding edge is in the agreement edge set. The existence of this mapping can be shown by induction on the number of (triangle) faces. P The contribution of a given assignment x to the partition function is e ˆ sponds to an AES denoted by E it is easy to see that P e ij∈E θij I(xi =xj ) = e− P ij∈E θij P e ˆ ij∈E 2θij = ce P ˆ ij∈E ij∈E 2θij θij I(xi =xj ) =c wij . If x corre(4) ˆ ij∈E P where c = e− ij∈E θij . Deﬁne the superset Λ as the set of agreement edge sets. The above then implies that Z(θ) = 2c E∈Λ ij∈E wij , and is thus proportional to the sum of AES weights. ˆ ˆ To sum over agreement edge sets, we use the following elegant trick introduced by Fisher [5]. Construct a new graph GPM from the dual of GT by introducing new vertices and edges according to the following rule: Replace each original vertex with three vertices that are connected to each other, and assign a weight of one to the new edges. Next, consider the three neighbors of the original vertex 2 . Connect each of the three new vertices to one of these three neighbors, keeping the original weights on these edges. The transformation is illustrated in Figure 1. The new graph GPM has O(3n) vertices, and is also planar. It can be seen that there is a one to one correspondence between perfect matchings in GPM and agreement edge sets in GT . Deﬁne Ω to be the set of perfect matchings in GPM . Then Z(θ) = 2c M ∈Ω ij∈M wij where we have used the fact that all the new weights have a value of one. Thus, the partition function is a sum over the weights of perfect matchings in GPM . Finally, we need a way of summing over the weights of the set of perfect matchings in a graph. Kasteleyn [7] proved that for a planar graph GPM , this sum may be obtained using the following sequence of steps: • Direct the edges of the graph GPM such that for every face (except possibly the external face), the number of edges on its perimeter oriented in a clockwise manner is odd. Kasteleyn showed that such a so called Pfafﬁan orientation may be constructed in polynomial time for a planar graph (see also [8] page 322). 2 Note that in the dual of GT all vertices have degree three, since GT is plane triangulated. 1.2 0.7 0.6 1 1 1 0.8 0.6 0.8 1.5 1.4 1.5 1 1 1.2 1 1 1 1 0.7 1.4 1 1 1 Figure 1: Illustration of the graph transformations in Section 2 for a complete graph with four vertices. Left panel shows the original weighted graph (dotted edges and grey vertices) and its dual (solid edges and black vertices). Right panel shows the dual graph with each vertex replaced by a triangle (the graph GPM in the text). Weights for dual graph edges correspond to the weights on the original graph. • Deﬁne the matrix P (GPM ) to be a skew symmetric matrix such that Pij = 0 if ij is not an edge, Pij = wij if the arrow on edge ij runs from i to j and Pij = −wij otherwise. • The sum over weighted matchings can then be shown to equal |P (GPM )|. The partition function is thus given by Z(θ) = 2c |P (GPM )|. To conclude this section we reiterate the following two key points: the partition function of a binary MRF over a planar graph with interaction potentials as in Equation 2 may be calculated in polynomial time by calculating the determinant of a matrix of size O(3n). An important outcome of this result is that the functional relation between Z(θ) and the parameters θij is known, a fact we shall use in what follows. 3 Partition function bounds via planar decomposition Given a non-planar graph G over binary variables with a vector of interaction potentials θ, we wish to use the exact planar computation to obtain a bound on the partition function of the MRF on G. We assume for simplicity that the potentials on the MRF for G are given in the form of Equation 2. Thus, G violates the assumptions of the previous section only in its non-planarity. Deﬁne G(r) as a set of spanning planar subgraphs of G, i.e., each graph G(r) is planar and contains all the vertices of G and some its edges. Denote by m the number of such graphs. Introduce the following deﬁnitions: (r) • θ (r) is a set of parameters on the edges of G(r) , and θij is an element in this set. Z(θ (r) ) is the partition function of the MRF on G(r) with parameters θ (r) . ˆ (r) ˆ(r) • θ is a set of parameters on the edges of G such that if edge (ij) is in G(r) then θij = (r) ˆ(r) θ , and otherwise θ = 0. ij ij Given a distribution ρ(r) on the graphs G(r) (i.e., ρ(r) ≥ 0 for r = 1, . . . , m and assume that the parameters for G(r) are such that ˆ ρ(r)θ θ= (r) r ρ(r) = 1), (5) r Then, by the convexity of the log partition function, as a function of the model parameters, we have ρ(r) log Z(θ (r) ) ≡ f (θ, ρ, θ (r) ) log Z(θ) ≤ (6) r Since by assumption the graphs G(r) are planar, this bound can be calculated in polynomial time. Since this bound is true for any set of parameters θ (r) which satisﬁes the condition in Equation 5 and for any distribution ρ(r), we may optimize over these two variables to obtain the tightest bound possible. Deﬁne the optimal bound for a ﬁxed value of ρ(r) by g(ρ, θ) (optimization is w.r.t. θ (r) ) g(ρ, θ) = f (θ, ρ, θ (r) ) min θ (r) : P ˆ ρ(r)θ (r) =θ (7) Also, deﬁne the optimum of the above w.r.t. ρ by h(θ). h(θ) = min g(θ, ρ) ρ(r) ≥ 0, ρ(r) = 1 (8) Thus, h(θ) is the optimal upper bound for the given parameter vector θ. In the following section we argue that we can in fact ﬁnd the global optimum of the above problem. 4 Globally Optimal Bound Optimization First consider calculating g(ρ, θ) from Equation 7. Note that since log Z(θ (r) ) is a convex function of θ (r) , and the constraints are linear, the overall optimization is convex and can be solved efﬁciently. In the current implementation, we use a projected gradient algorithm [2]. The gradient of f (θ, ρ, θ (r) ) w.r.t. θ (r) is given by ∂f (θ, ρ, θ (r) ) (r) ∂θij (r) = ρ(r) 1 + eθij (r) P −1 (GPM ) (r) k(i,j) Sign(Pk(i,j) (GPM )) (9) where k(i, j) returns the row and column indices of the element in the upper triangular matrix of (r) (r) P (GPM ), which contains the element e2θij . Since the optimization in Equation 7 is convex, it has an equivalent convex dual. Although we do not use this dual for optimization (because of the difﬁculty of expressing the entropy of planar models solely in terms of triplet marginals), it nevertheless allows some insight into the structure of the problem. The dual in this case is closely linked to the notion of the marginal polytope deﬁned in Section 1. Using a derivation similar to [11], we arrive at the following characterization of the dual g(ρ, θ) = max τ ∈TRI(n) ρ(r)H(θ (r) (τ )) θ·τ + (10) r where θ (r) (τ ) denotes the parameters of an MRF on G(r) such that its marginals are given by the restriction of τ to the edges of G(r) , and H(θ (r) (τ )) denotes the entropy of the MRF over G(r) with parameters θ (r) (τ ). The maximized function in Equation 10 is linear in ρ and thus g(ρ, θ) is a pointwise maximum over (linear) convex functions in ρ and is thus convex in ρ. It therefore has no (r) local minima. Denote by θmin (ρ) the set of parameters that minimizes Equation 7 for a given value of ρ. Using a derivation similar to that in [11], the gradient of g(ρ, θ) can be shown to be ∂g(ρ, θ) (r) = H(θmin (ρ)) ∂ρ(r) (11) Since the partition function for G(r) can be calculated efﬁciently, so can the entropy. We can now summarize the algorithm for calculating h(θ) • Initialize ρ0 . Iterate: – For ρt , ﬁnd θ (r) which solves the minimization in Equation 7. – Calculate the gradient of g(ρ, θ) at ρt using the expression in Equation 11 – Update ρt+1 = ρt + αv where v is a feasible search direction calculated from the gradient of g(ρ, θ) and the simplex constraints on ρ. The step size α is calculated via an Armijo line search. – Halt when the change in g(ρ, θ) is smaller than some threshold. Note that the minimization w.r.t. θ (r) is not very time consuming since we can initialize it with the minimum from the previous step, and thus only a few iterations are needed to ﬁnd the new optimum, provided the change in ρ is not too big. The above algorithm is guaranteed to converge to a global optimum of ρ [2], and thus we obtain the tightest possible upper bound on Z(θ) given our planar graph decomposition. The procedure described here is asymmetric w.r.t. ρ and θ (r) . In a symmetric formulation the minimizing gradient steps could be carried out jointly or in an alternating sequence. The symmetric ˆ (r) formulation can be obtained by decoupling ρ and θ (r) in the bi-linear constraint ρ(r)θ = θ. Field Figure 2: Illustration of planar subgraph construction for a rectangular lattice with external ﬁeld. Original graph is shown on the left. The ﬁeld vertex is connected to all vertices (edges not shown). The graph on the right results from isolating the 4th ,5th columns of the original graph (shown in grey), and connecting the ﬁeld vertex to the external vertices of the three disconnected components. Note that the resulting graph is planar. ˜ ˜ Speciﬁcally, we introduce θ (r) = θ (r) ρ(r) and perform the optimization w.r.t. ρ and θ (r) . It can be ˜(r) ) with the relevant (de-coupled) constraint is equivalent shown that a stationary point of f (θ, ρ, θ to the procedure described above. The advantage of this approach is that the exact minimization w.r.t θ (r) is not required before modifying ρ. Our experiments have shown, however, that the methods take comparable times to converge, although this may be a property of the implementation. 5 Estimating Marginals The optimization problem as deﬁned above minimizes an upper bound on the partition function. However, it may also be of interest to obtain estimates of the marginals of the MRF over G. To obtain marginal estimates, we follow the approach in [11]. We ﬁrst characterize the optimum of Equation 7 for a ﬁxed value of ρ. Deriving the Lagrangian of Equation 7 w.r.t. θ (r) we obtain the (r) following characterization of θmin (ρ): Marginal Optimality Criterion: For any two graphs G(r) , G(s) such that the edge (ij) is in both (r) (s) graphs, the optimal parameter vector satisﬁes τij (θmin (ρ)) = τij (θmin (ρ)). Thus, the optimal set of parameters for the graphs G(r) is such that every two graphs agree on the marginals of all the edges they share. This implies that at the optimum, there is a well deﬁned set of marginals over all the edges. We use this set as an approximation to the true marginals. A different method for estimating marginals uses the partition function bound directly. We ﬁrst P calculate partition function bounds on the sums: αi (1) = x:xi =1 e ij∈E fij (xi ,xj ) and αi (−1) = P αi (1) e ij∈E fij (xi ,xj ) and then normalize αi (1)+αi (−1) to obtain an estimate for p(xi = 1). This method has the advantage of being more numerically stable (since it does not depend on derivatives of log Z). However, it needs to be calculated separately for each variable, so that it may be time consuming if one is interested in marginals for a large set of variables. x:xi =−1 6 Experimental Evaluation We study the application of our Planar Decomposition (PDC) P method to a binary MRF on a square P lattice with an external ﬁeld. The MRF is given by p(x) ∝ e ij∈E θij xi xj + i∈V θi xi where V are the lattice vertices, and θi and θij are parameters. Note that this interaction does not satisfy the conditions for exact calculation of the partition function, even though the graph is planar. This problem is in fact NP hard [1]. However, it is possible to obtain the desired interaction form by introducing an additional variable xn+1 that is connected to all the original variables.P Denote the correspondP ij∈E θij xi xj + i∈V θi,n+1 xi xn+1 , where ing graph by Gf . Consider the distribution p(x, xn+1 ) ∝ e θi,n+1 = θi . It is easy to see that any property of p(x) (e.g., partition function, marginals) may be calculated from the corresponding property of p(x, xn+1 ). The advantage of the latter distribution is that it has the desired interaction form. We can thus apply PDC by choosing planar subgraphs of the non-planar graph Gf . 0.25 0.15 0.1 0.05 0.5 1 1.5 Interaction Strength 0.03 Singleton Marginal Error Z Bound Error Pairwise Marginals Error 0.08 PDC TRW 0.2 0.07 0.06 0.05 0.04 0.03 0.02 2 0.5 1 1.5 Interaction Strength 0.025 0.02 0.015 0.01 0.005 2 0.5 1 1.5 Interaction Strength 2 !3 x 10 0.025 0.02 0.015 0.5 1 Field Strength 1.5 2 Singleton Marginal Error Pairwise Marginals Error Z Bound Error 0.03 0.03 0.025 0.02 0.015 0.5 1 Field Strength 1.5 2 9 8 7 6 5 4 3 0.5 1 Field Strength 1.5 2 Figure 3: Comparison of the TRW and Planar Decomposition (PDC) algorithms on a 7×7 square lattice. TRW results shown in red squares, and PDC in blue circles. Left column shows the error in the log partition bound. Middle column is the mean error for pairwise marginals, and right column is the error for the singleton marginal of the variable at the lattice center. Results in upper row are for ﬁeld parameters drawn from U[−0.05, 0.05] and various interaction parameters. Results in the lower row are for interaction parameters drawn from U [−0.5, 0.5] and various ﬁeld parameters. Error bars are standard errors calculated from 40 random trials. There are clearly many ways to choose spanning planar subgraphs of Gf . Spanning subtrees are one option, and were used in [11]. Since our optimization is polynomial in the number of subgraphs, √ we preferred to use a number of subgraphs that is linear in n. The key idea in generating these planar subgraphs is to generate disconnected components of the lattice and connect xn+1 only to the external vertices of these components. Here we generate three disconnected components by isolating two neighboring columns (or rows) from the rest of the graph, resulting in three components. This is √ illustrated in Figure 2. To this set of 2 n graphs, we add the independent variables graph consisting only of edges from the ﬁeld node to all the other nodes. We compared the performance of the PDC and TRW methods 3 4 on a 7 × 7 lattice . Since the exact partition function and marginals can be calculated for this case, we could compare both algorithms to the true values. The MRF parameters were set according to the two following scenarios: 1) Varying Interaction - The ﬁeld parameters θi were drawn uniformly from U[−0.05, 0.05], and the interaction θij from U[−α, α] where α ∈ {0.2, 0.4, . . . , 2}. This is the setting tested in [11]. 2) Varying Field θi was drawn uniformly from U[−α, α], where α ∈ {0.2, 0.4, . . . , 2} and θij from U[−0.5, 0.5]. For each scenario, we calculated the following measures: 1) Normalized log partition error 1 1 alg − log Z true ). 2) Error in pairwise marginals |E| ij∈E |palg (xi = 1, xj = 1) − 49 (log Z ptrue (xi = 1, xj = 1)|. Pairwise marginals were calculated jointly using the marginal optimality criterion of Section 5. 3) Error in singleton marginals. We calculated the singleton marginals for the innermost node in the lattice (i.e., coordinate [3, 3]), which intuitively should be the most difﬁcult for the planar based algorithm. This marginal was calculated using two partition functions, as explained in Section 5 5 . The same method was used for TRW. The reported error measure is |palg (xi = 1) − ptrue (xi = 1)|. Results were averaged over 40 random trials. Results for the two scenarios and different evaluation measures are given in Figure 3. It can be seen that the partition function bound for PDC is signiﬁcantly better than TRW for almost all parameter settings, although the difference becomes smaller for large ﬁeld values. Error for the PDC pairwise 3 TRW and PDC bounds were optimized over both the subgraph parameters and the mixture parameters ρ. In terms of running time, PDC optimization for a ﬁxed value of ρ took about 30 seconds, which is still slower than the TRW message passing implementation. 5 Results using the marginal optimality criterion were worse for PDC, possibly due to its reduced numerical precision. 4 marginals are smaller than those of TRW for all parameter settings. For the singleton parameters, TRW slightly outperforms PDC. This is not surprising since the ﬁeld is modeled by every spanning tree in the TRW decomposition, whereas in PDC not all the structures model a given ﬁeld. 7 Discussion We have presented a method for using planar graphs as the basis for approximating non-planar graphs such as planar graphs with external ﬁelds. While the restriction to binary variables limits the applicability of our approach, it remains relevant in many important applications, such as coding theory and combinatorial optimization. Moreover, it is always possible to convert a non-binary graphical model to a binary one by introducing additional variables. The resulting graph will typically not be planar, even when the original graph over k−ary variables is. However, the planar decomposition method can then be applied to this non-planar graph. The optimization of the decomposition is carried out explicitly over the planar subgraphs, thus limiting the number of subgraphs that can be used in the approximation. In the TRW method this problem is circumvented since it is possible to implicitly optimize over all spanning trees. The reason this can be done for trees is that the entropy of an MRF over a tree may be written as a function of its marginal variables. We do not know of an equivalent result for planar graphs, and it remains a challenge to ﬁnd one. It is however possible to combine the planar and tree decompositions into one single bound, which is guaranteed to outperform the tree or planar approximations alone. The planar decomposition idea may in principle be applied to bounding the value of the MAP assignment. However, as in TRW, it can be shown that the solution is not dependent on the decomposition (as long as each edge appears in some structure), and the problem is equivalent to maximizing a linear function over the marginal polytope (which can be done in polynomial time for planar graphs). However, such a decomposition may suggest new message passing algorithms, as in [10]. Acknowledgments The authors acknowledge support from the Defense Advanced Research Projects Agency (Transfer Learning program). Amir Globerson is also supported by the Rothschild Yad-Hanadiv fellowship. The authors also wish to thank Martin Wainwright for providing his TRW code. References [1] F. Barahona. On the computational complexity of ising spin glass models. J. Phys. A., 15(10):3241–3253, 1982. [2] D. P. Bertsekas, editor. Nonlinear Programming. Athena Scientiﬁc, Belmont, MA, 1995. [3] M.M. Deza and M. Laurent. Geometry of Cuts and Metrics. Springe-Verlag, 1997. [4] R. Diestel. Graph Theory. Springer-Verlag, 1997. [5] M.E. Fisher. On the dimer solution of planar ising models. J. Math. Phys., 7:1776–1781, 1966. [6] M.I. Jordan, editor. Learning in graphical models. MIT press, Cambridge, MA, 1998. [7] P.W. Kasteleyn. Dimer statistics and phase transitions. Journal of Math. Physics, 4:287–293, 1963. [8] L. Lovasz and M.D. Plummer. Matching Theory, volume 29 of Annals of discrete mathematics. NorthHolland, New-York, 1986. [9] M. J. Wainwright, T. Jaakkola, and A. S. Willsky. Tree-based reparameterization framework for analysis of sum-product and related algorithms. IEEE Trans. on Information Theory, 49(5):1120–1146, 2003. [10] M. J. Wainwright, T. Jaakkola, and A. S. Willsky. Map estimation via agreement on trees: messagepassing and linear programming. IEEE Trans. on Information Theory, 51(11):1120–1146, 2005. [11] M. J. Wainwright, T. Jaakkola, and A. S. Willsky. A new class of upper bounds on the log partition function. IEEE Trans. on Information Theory, 51(7):2313–2335, 2005. [12] M.J. Wainwright and M.I. Jordan. Graphical models, exponential families, and variational inference. Technical report, UC Berkeley Dept. of Statistics, 2003. [13] J.S. Yedidia, W.T. W.T. Freeman, and Y. Weiss. Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Trans. on Information Theory, 51(7):2282–2312, 2005.

6 0.52740347 70 nips-2006-Doubly Stochastic Normalization for Spectral Clustering

7 0.49972424 77 nips-2006-Fast Computation of Graph Kernels

8 0.49375498 93 nips-2006-Hyperparameter Learning for Graph Based Semi-supervised Learning Algorithms

9 0.48290262 123 nips-2006-Learning with Hypergraphs: Clustering, Classification, and Embedding

10 0.46947065 160 nips-2006-Part-based Probabilistic Point Matching using Equivalence Constraints

11 0.40478256 163 nips-2006-Prediction on a Graph with a Perceptron

12 0.352193 117 nips-2006-Learning on Graph with Laplacian Regularization

13 0.34616461 5 nips-2006-A Kernel Method for the Two-Sample-Problem

14 0.34138885 128 nips-2006-Manifold Denoising

15 0.33928451 86 nips-2006-Graph-Based Visual Saliency

16 0.30890739 110 nips-2006-Learning Dense 3D Correspondence

17 0.30828744 80 nips-2006-Fundamental Limitations of Spectral Clustering

18 0.2712968 14 nips-2006-A Small World Threshold for Economic Network Formation

19 0.26608017 147 nips-2006-Non-rigid point set registration: Coherent Point Drift

20 0.26601055 166 nips-2006-Recursive Attribute Factoring

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(1, 0.077), (3, 0.033), (7, 0.093), (9, 0.067), (12, 0.366), (20, 0.015), (22, 0.05), (44, 0.054), (57, 0.066), (65, 0.033), (69, 0.048)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.82072711 39 nips-2006-Balanced Graph Matching

Author: Timothee Cour, Praveen Srinivasan, Jianbo Shi

2 0.78849882 122 nips-2006-Learning to parse images of articulated bodies

Author: Deva Ramanan

Abstract: We consider the machine vision task of pose estimation from static images, specifically for the case of articulated objects. This problem is hard because of the large number of degrees of freedom to be estimated. Following a established line of research, pose estimation is framed as inference in a probabilistic model. In our experience however, the success of many approaches often lie in the power of the features. Our primary contribution is a novel casting of visual inference as an iterative parsing process, where one sequentially learns better and better features tuned to a particular image. We show quantitative results for human pose estimation on a database of over 300 images that suggest our algorithm is competitive with or surpasses the state-of-the-art. Since our procedure is quite general (it does not rely on face or skin detection), we also use it to estimate the poses of horses in the Weizmann database. 1

3 0.67877239 33 nips-2006-Analysis of Representations for Domain Adaptation

Author: Shai Ben-David, John Blitzer, Koby Crammer, Fernando Pereira

Abstract: Discriminative learning methods for classiﬁcation perform well when training and test data are drawn from the same distribution. In many situations, though, we have labeled training data for a source domain, and we wish to learn a classiﬁer which performs well on a target domain with a different distribution. Under what conditions can we adapt a classiﬁer trained on the source domain for use in the target domain? Intuitively, a good feature representation is a crucial factor in the success of domain adaptation. We formalize this intuition theoretically with a generalization bound for domain adaption. Our theory illustrates the tradeoffs inherent in designing a representation for domain adaptation and gives a new justiﬁcation for a recently proposed model. It also points toward a promising new model for domain adaptation: one which explicitly minimizes the difference between the source and target domains, while at the same time maximizing the margin of the training set. 1

4 0.49120843 80 nips-2006-Fundamental Limitations of Spectral Clustering

Author: Boaz Nadler, Meirav Galun

Abstract: Spectral clustering methods are common graph-based approaches to clustering of data. Spectral clustering algorithms typically start from local information encoded in a weighted graph on the data and cluster according to the global eigenvectors of the corresponding (normalized) similarity matrix. One contribution of this paper is to present fundamental limitations of this general local to global approach. We show that based only on local information, the normalized cut functional is not a suitable measure for the quality of clustering. Further, even with a suitable similarity measure, we show that the ﬁrst few eigenvectors of such adjacency matrices cannot successfully cluster datasets that contain structures at different scales of size and density. Based on these ﬁndings, a second contribution of this paper is a novel diffusion based measure to evaluate the coherence of individual clusters. Our measure can be used in conjunction with any bottom-up graph-based clustering method, it is scale-free and can determine coherent clusters at all scales. We present both synthetic examples and real image segmentation problems where various spectral clustering algorithms fail. In contrast, using this coherence measure ﬁnds the expected clusters at all scales. Keywords: Clustering, kernels, learning theory. 1

5 0.46194336 110 nips-2006-Learning Dense 3D Correspondence

Author: Florian Steinke, Volker Blanz, Bernhard Schölkopf

Abstract: Establishing correspondence between distinct objects is an important and nontrivial task: correctness of the correspondence hinges on properties which are difﬁcult to capture in an a priori criterion. While previous work has used a priori criteria which in some cases led to very good results, the present paper explores whether it is possible to learn a combination of features that, for a given training set of aligned human heads, characterizes the notion of correct correspondence. By optimizing this criterion, we are then able to compute correspondence and morphs for novel heads. 1

6 0.44976926 87 nips-2006-Graph Laplacian Regularization for Large-Scale Semidefinite Programming

7 0.44212982 65 nips-2006-Denoising and Dimension Reduction in Feature Space

8 0.43712658 56 nips-2006-Conditional Random Sampling: A Sketch-based Sampling Technique for Sparse Data

9 0.43630809 163 nips-2006-Prediction on a Graph with a Perceptron

10 0.43198556 195 nips-2006-Training Conditional Random Fields for Maximum Labelwise Accuracy

11 0.43106014 188 nips-2006-Temporal and Cross-Subject Probabilistic Models for fMRI Prediction Tasks

12 0.43067175 136 nips-2006-Multi-Instance Multi-Label Learning with Application to Scene Classification

13 0.42953902 158 nips-2006-PG-means: learning the number of clusters in data

14 0.42729267 86 nips-2006-Graph-Based Visual Saliency

15 0.42658252 167 nips-2006-Recursive ICA

16 0.42575213 184 nips-2006-Stratification Learning: Detecting Mixed Density and Dimensionality in High Dimensional Point Clouds

17 0.42426237 123 nips-2006-Learning with Hypergraphs: Clustering, Classification, and Embedding

18 0.42300272 177 nips-2006-Sparse Kernel Orthonormalized PLS for feature extraction in large data sets

19 0.42076841 119 nips-2006-Learning to Rank with Nonsmooth Cost Functions

20 0.41772494 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning