nips nips2004 nips2004-141 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Mukund Narasimhan, Jeff A. Bilmes
Abstract: We investigate the problem of reducing the complexity of a graphical model (G, PG ) by finding a subgraph H of G, chosen from a class of subgraphs H, such that H is optimal with respect to KL-divergence. We do this by first defining a decomposition tree representation for G, which is closely related to the junction-tree representation for G. We then give an algorithm which uses this representation to compute the optimal H ∈ H. Gavril [2] and Tarjan [3] have used graph separation properties to solve several combinatorial optimization problems when the size of the minimal separators in the graph is bounded. We present an extension of this technique which applies to some important choices of H even when the size of the minimal separators of G are arbitrarily large. In particular, this applies to problems such as finding an optimal subgraphical model over a (k − 1)-tree of a graphical model over a k-tree (for arbitrary k) and selecting an optimal subgraphical model with (a constant) d fewer edges with respect to KL-divergence can be solved in time polynomial in |V (G)| using this formulation. 1 Introduction and Preliminaries The complexity of inference in graphical models is typically exponential in some parameter of the graph, such as the size of the largest clique. Therefore, it is often required to find a subgraphical model that has lower complexity (smaller clique size) without introducing a large error in inference results. The KL-divergence between the original probability distribution and the probability distribution on the simplified graphical model is often used to measure the impact on inference. Existing techniques for reducing the complexity of graphical models including annihilation and edge-removal [4] are greedy in nature and cannot make any guarantees regarding the optimality of the solution. This problem is NP-complete [9] and so, in general, one cannot expect a polynomial time algorithm to find the optimal solution. However, we show that when we restrict the problem to some sets of subgraphs, the optimal solution can be found quite quickly using a dynamic programming algorithm in time polynomial in the tree-width of the graph. 1.1 Notation and Terminology A graph G = (V, E) is said to be triangulated if every cycle of length greater than 3 has a chord. A clique of G is a non-empty set S ⊆ V such that {a, b} ∈ E for all ∗ This work was supported by NSF grant IIS-0093430 and an Intel Corporation Grant. {b, c, d} d {c, f, g} {b, c} {b, e, c} b c {f, c} {c, e} {e, c, f } g {b, e} a e f {a, b, e} Figure 1: A triangulated graph G and a junction-tree for G a, b ∈ S. A clique S is maximal if S is not properly contained in another clique. If α and β are non-adjacent vertices of G then a set of vertices S ⊆ V \ {α, β} is called an (α, β)-separator if α and β are in distinct components of G[V \ S]. S is a minimal (α, β)-separator if no proper subset of S is an (α, β)-separator. S is said to be a minimal separator if S is a minimal (α, β)-separator for some non adjacent a, b ∈ V . If T = (K, S) is a junction-tree for G (see [7]), then the nodes K of T correspond to the maximalcliques of G, while the links S correspond to minimal separators of G (We reserve the terms vertices/edges for elements of G, and nodes/links for the elements of T ). If G is triangulated, then the number of maximal cliques is at most |V |. For example, in the graph G shown in Figure 1, K = {{b, c, d} , {a, b, e} , {b, e, c} , {e, c, f } , {c, f, g}}. The links S of T correspond to minimal-separators of G in the following way. If Vi Vj ∈ S (where Vi , Vj ∈ K and hence are cliques of G), then Vi ∩ Vj = φ. We label each edge Vi Vj ∈ S with the set Vij = Vi ∩ Vj , which is a non-empty complete separator in G. The removal of any link Vi Vj ∈ S disconnects T into two subtrees which we denote T (i) and T (j) (chosen so that T (i) contains Vi ). We will let K(i) be the nodes of T (i) , and V (i) = ∪V ∈K (i) V be the set of vertices corresponding to the subtree T (i) . The junction tree property ensures that V (i) ∩ V (j) = Vi ∩ Vj = Vij . We will let G(i) be the subgraph induced by V (i) . A graphical model is a pair (G, P ) where P is the joint probability distribution for random variables X1 , X2 , . . . , Xn , and G is a graph with vertex set V (G) = {X1 , X2 , . . . , Xn } such that the separators in G imply conditional independencies in P (so P factors according to G). If G is triangulated, then the junction-tree algorithm can be used for exact inference in the probability distribution P . The complexity of this algorithm grows with the treewidth of G (which is one less than the size of the largest clique in G when G is triangulated). The growth is exponential when P is a discrete probability distribution, thus rendering exact inference for graphs with large treewidth impractical. Therefore, we seek another graphical model (H, PH ) which allows tractable inference (so H should have lower treewidth than G has). The general problem of finding a graphical model of tree-width at most k so as to minimize the KL-divergence from a specified probability distribution is NP complete for general k ([9]) However, it is known that this problem is solvable in polynomial time (in |V (G)|) for some special cases cases (such as when G has bounded treewidth or when k = 1 [1]). If (G, PG ) and (H, PH ) are graphical models, then we say that (H, PH ) is a subgraphical model of (G, PG ) if H is a spanning subgraph of G. Note in particular that separators in G are separators in H, and hence (G, PH ) is also a graphical model. 2 Graph Decompositions and Divide-and-Conquer Algorithms For the remainder of the paper, we will be assuming that G = (V, E) is some triangulated graph, with junction tree T = (K, S). As observed above, if Vi Vj ∈ S, then the removal {b, c, d} d {b, c} {b, e, c} b c c {c, f, g} {f, c} {e, c, f } g {b, e} a e e f {a, b, e} Figure 2: The graphs G(i) , G(j) and junction-trees T (i) and T (j) resulting from the removal of the link Vij = {c, e} of Vij = Vi ∩ Vj disconnects G into two (vertex-induced) subgraphs G(i) and G(j) which are both triangulated, with junction-trees T (i) and T (j) respectively. We can recursively decompose each of G(i) and G(j) into smaller and smaller subgraphs till the resulting subgraphs are cliques. When the size of all the minimal separators are bounded, we may use these decompositions to easily solve problems that are hard in general. For example, in [5] it is shown that NP-complete problems like vertex coloring, and finding maximum independent sets can be solved in polynomial time on graphs with bounded tree-width (which are equivalent to spanning graphs with bounded size separators). We will be interested in finding (triangulated) subgraphs of G that satisfy some conditions, such as a bound on the number of edges, or a bound on the tree-width and which optimize separable objective functions (described in Section 2) One reason why problems such as this can often be solved easily when the tree-width of G is bounded by some constant is this : If Vij is a separator decomposing G into G(i) and G(j) , then a divide-and-conquer approach would suggest that we try and find optimal subgraphs of G(i) and G(j) and then splice the two together to get an optimal subgraph of G. There are two issues with this approach. First, the optimal subgraphs of G (i) and G(j) need not necessarily match up on Vij , the set of common vertices. Second, even if the two subgraphs agree on the set of common vertices, the graph resulting from splicing the two subgraphs together need not be triangulated (which could happen even if the two subgraphs individually are triangulated). To rectify the situation, we can do the following. We partition the set of subgraphs of G(i) and G(j) into classes, so that any subgraph of G(i) and any subgraph G(j) corresponding to the same class are compatible in the sense that they match up on their intersection namely Vij , and so that by splicing the two subgraphs together, we get a subgraph of G which is acceptable (and in particular is triangulated). Then given optimal subgraphs of both G(i) and G(j) corresponding to each class, we can enumerate over all the classes and pick the best one. Of course, to ensure that we do not repeatedly solve the same problem, we need to work bottom-up (a.k.a dynamic programming) or memoize our solutions. This procedure can be carried out in polynomial (in |V |) time as long as we have only a polynomial number of classes. Now, if we have a polynomial number of classes, these classes need not actually be a partition of all the acceptable subgraphs, though the union of the classes must cover all acceptable subgraphs (so the same subgraph can be contained in more than one class). For our application, every class can be thought of to be the set of subgraphs that satisfy some constraint, and we need to pick a polynomial number of constraints that cover all possibilities. The bound on the tree-width helps us here. If k |Vij | = k, then in any subgraph H of G, H[Vij ] must be one of the 2(2) possible subgraphs k of G[Vij ]. So, if k is sufficiently small (so 2(2) is bounded by some polynomial in |V |), then this procedure results in a polynomial time algorithm. In this paper, we show that in some cases we can characterize the space H so that we still have a polynomial number of constraints even when the tree-width of G is not bounded by a small constant. 2.1 Separable objective functions For cases where exact inference in the graphical model (G, PG ) is intractable, it is natural to try to find a subgraphical model (H, PH ) such that D(PG PH ) is minimized, and inference using H is tractable. We will denote by H the set of subgraphs of G that are tractable for inference. For example, this set could be the set of subgraphs of G with treewidth one less than the treewidth of G, or perhaps the set of subgraphs of G with at d fewer edges. For a specified subgraph H of G, there is a unique probability distribution PH factoring over H that minimizes D(PG PH ). Hence, finding a optimal subgraphical model is equivalent to finding a subgraph H for which D(PG PH ) is minimized. If Vij is a separator of G, we will attempt to find optimal subgraphs of G by finding optimal subgraphs of G (i) and G(j) and splicing them together. However, to do this, we need to ensure that the objective criteria also decomposes along the separator Vij . Suppose that H is any triangulated subgraph of G. Let PG(i) and PG(j) be the (marginalized) distributions of PG on V (i) and V (j) respectively, and PH (i) and PH (j) be the (marginalized) distributions of the distribution PH on V (i) and V (j) where H (i) = H[V (i) ] and H (j) = H[V (j) ], The following result assures us that the KL-divergence also factors according to the separator Vij . Lemma 1. Suppose that (G, PG ) is a graphical model, H is a triangulated subgraph of G, and PH factors over H. Then D(PG PH ) = D(PG(i) PH (i) ) + D(PG(j) PH (j) ) − D(PG[Vij ] PH[Vij ] ). Proof. Since H is a subgraph of G, and Vij is a separator of G, Vij must also be a separator of H. Therefore, PH {Xv }v∈V = PH (i) ({Xv }v∈V (i) )·PH (j) ({Xv }v∈V (j) ) . PH[Vij ] ({Xv }v∈V ) The result ij follows immediately. Therefore, there is hope that we can reduce our our original problem of finding an optimal subgraph H ∈ H as one of finding subgraphs of H (i) ⊆ G(i) and H (j) ⊆ G(j) that are compatible, in the sense that they match up on the overlap Vij , and for which D(PG PH ) is minimized. Throughout this paper, for the sake of concreteness, we will assume that the objective criterion is to minimize the KL-divergence. However, all the results can be extended to other objective functions, as long as they “separate” in the sense that for any separator, the objective function is the sum of the objective functions of the two parts, possibly modulo some correction factor which is purely a function of the separator. Another example might be the complexity r(H) of representing the graphical model H. A very natural representation satisfies r(G) = r(G(i) ) + r(G(j) ) if G has a separator G(i) ∩ G(j) . Therefore, the representation cost reduction would satisfy r(G) − r(H) = (r(G (i) ) − r(H (i) )) + (r(G(j) ) − r(H (j) )), and so also factors according to the separators. Finally note that any linear combinations of such separable functions is also separable, and so this technique could also be used to determine tradeoffs (representation cost vs. KL-divergence loss for example). In Section 4 we discuss some issues regarding computing this function. 2.2 Decompositions and decomposition trees For the algorithms considered in this paper, we will be mostly interested in the decompositions that are specified by the junction tree, and we will represent these decompositions by a rooted tree called a decomposition tree. This representation was introduced in [2, 3], and is similar in spirit to Darwiche’s dtrees [6] which specify decompositions of directed acyclic graphs. In this section and the next, we show how a decomposition tree for a graph may be constructed, and show how it is used to solve a number of optimization problems. abd; ce; gf a; be; cd d; bc; e abe dbc ebc e; cf ; g cef cf g Figure 3: The separator tree corresponding to Figure 1 A decomposition tree for G is a rooted tree whose vertices correspond to separators and cliques of G. We describe the construction of the decomposition tree in terms of a junctiontree T = (K, S) for G. The interior nodes of the decomposition tree R(T ) correspond to S (the links of T and hence the minimal separators of G). The leaf or terminal nodes represent the elements of K (the nodes of T and hence the maximal cliques of G). R(T ) can be recursively constructed from T as follows : If T consists of just one node K, (and hence no edges), then R consists of just one node, which is given the label K as well. If however, T has more than one node, then T must contain at least one link. To begin, let Vi Vj ∈ S be any link in T . Then removal of the link Vi Vj results in two disjoint junctiontrees T (i) and T (j) . We label the root of R by the decomposition (V (i) ; Vij ; V (j) ). The rest of R is recursively built by successively picking links of T (i) and T (j) (decompositions of G(i) and G(j) ) to form the interior nodes of R. The effect of this procedure on the junction tree of Figure 1 is shown in Figure 3, where the decomposition associated with the interior nodes is shown inside the nodes. Let M be the set of all nodes of R(T ). For any interior node M induced by the the link Vi Vj ∈ S of T , then we will let M (i) and M (j) represent the left and right children of M , and R(i) and R(j) be the left and right trees below M . 3 3.1 Finding optimal subgraphical models Optimal sub (k − 1)-trees of k-trees Suppose that G is a k-tree. A sub (k − 1)-tree of G is a subgraph H of G that is (k − 1)tree. Now, if Vij is any minimal separator of G, then both G(i) and G(j) are k-trees on vertex sets V (i) and V (j) respectively. It is clear that the induced subgraphs H[V (i) ] and H[V (j) ] are subgraphs of G(i) and G(j) and are partial (k − 1)-trees. We will be interested in finding sub (k − 1)-trees of k trees and this problem is trivial by the result of [1] when k = 2. Therefore, we assume that k ≥ 3. The following result characterizes the various possibilities for H[Vij ] in this case. Lemma 2. Suppose that G is a k-tree, and S = Vij is a minimal separator of G corresponding to the link ij of the junction-tree T . In any (k − 1)-tree H ⊆ G either 1. There is a u ∈ S such that u is not connected to vertices in both V (i) \ S and V (j) \ S. Then S \ {u} is a minimal separator in H and hence is complete. 2. Every vertex in S is connected to vertices in both V (i) \S and V (j) \S. Then there are vertices {x, y} ⊆ S such that the edge H[S] is missing only the edge {x, y}. Further either H[V (i) ] or H[V (j) ] does not contain a unchorded x-y path. Proof. We consider two possibilities. In the first, there is some vertex u ∈ S such that u is not connected to vertices in both V (i) \S and V (j) \. Since the removal of S disconnects G, the removal of S must also disconnect H. Therefore, S must contain a minimal separator of H. Since H is a (k − 1)-tree, all minimal separators of H must contain k − 1 vertices which must therefore be S \{u}. This corresponds to case (1) above. Clearly this possiblity can occur. If there is no such u ∈ S, then every vertex in S is connected to vertices in both V (i) \ S and V (j) \ S. If x ∈ S is connected to some yi ∈ V (i) \ S and yj ∈ V (j) \ S, then x is contained in every minimal yi /yj separator (see [5]). Therefore, every vertex in S is part of a minimal separator. Since each minimal separator contains k − 1 vertices, there must be at least two distinct minimum separators contained in S. Let Sx = S \ {x} and Sy = S \ {y} be two distinct minimal separators. We claim that H[S] contains all edges except the edge {x, y}. To see this, note that if z, w ∈ S, with z = w and {z, w} = {x, y} (as sets), then either {z, w} ⊂ Sy or {z, w} ⊂ Sx . Since both Sx and Sy are complete in H, this edge must be present in H. The edge {x, y} is not present in H[S] because all minimal separators in H must be of size k − 1. Further, if both V (i) and V (j) contain an unchorded path between x and y, then by joining the two paths at x and y, we get a unchorded cycle in H which contradicts the fact that H is triangulated. Therefore, we may associate k · 2 + 2 · k constraints with each separator Vij of G as 2 follows. There are k possible constraints corresponding to case (1) above (one for each choice of x), and k · 2 choices corresponding to case (2) above. This is because for each 2 pair {x, y} corresponding to the missing edge, we have either V (i) contains no unchorded xy paths or V (j) contains no unchorded xy paths. More explicitly, we can encode the set of constraints CM associated with each separator S corresponding to an interior node M of the decomposition tree as follows: CM = { (x, y, s) : x ∈ S, y ∈ S, s ∈ {i, j}}. If y = x, then this corresponds to case (1) of the above lemma. If s = i, then x is connected only to H (i) and if s = j, then x is connected only to H (j) . If y = x, then this corresponds to case (2) in the above lemma. If s = i, then H (i) does not contain any unchorded path between x and y, and there is no constraint on H (j) . Similarly if s = j, then H (j) does not contain any unchorded path between x and y, and there is no constraint on H (i) . Now suppose that H (i) and H (j) are triangulated subgraphs of G(i) and G(j) respectively, then it is clear that if H (i) and H (j) both satisfy the same constraint they must match up on the common vertices Vij . Therefore to splice together two solutions corresponding to the same constraint, we only need to check that the graph obtained by splicing the graphs is triangulated. Lemma 3. Suppose that H (i) and H (j) are triangulated subgraphs of G(i) and G(j) respectively such that both of them satisfy the same constraint as described above. Then the graph H obtained by splicing H (i) and H (j) together is triangulated. Proof. Suppose that both H (i) and H (j) are both triangulated and both satisfy the same constraint. If both H (i) and H (j) satisfy the same constraint corresponding to case (1) in Lemma 2 and H has an unchorded cycle, then this cycle must involve elements of both H (i) and H (j) . Therefore, there must be two vertices of S \{u} on the cycle, and hence this cycle has a chord as S \ {u} is complete. This contradiction shows that H is triangulated. So assume that both of them satisfy the constraint corresponding to case (2) of Lemma 2. Then if H is not triangulated, there must be a t-cycle (for t ≥ 4) with no chord. Now, since {x, y} is the only missing edge of S in H, and because H (i) and H (j) are individually triangulated, the cycle must contain x, y and vertices of both V (i) \ S and V (j) \ S. We may split this unchorded cycle into two unchorded paths, one contained in V (i) and one in V (j) thus violating our assumption that both H (i) and H (j) satisfy the same constraint. If |S| = k, then there are 2k + 2 · k ∈ O(k 2 ) ∈ O(n2 ). We can use a divide and conquer 2 strategy to find the optimal sub (k − 1) tree once we have taken care of the base case, where G is just a single clique (of k + 1) elements. However, for this case, it is easily checked that any subgraph of G obtained by deleting exactly one edge results in a (k − 1) tree, and every sub (k−1)-tree results from this operation. Therefore, the optimal (k−1)-tree can be found using Algorithm 1, and in this case, the complexity of Algorithm 1 is O(n(k + 1) 2 ). This procedure can be generalized to find the optimal sub (k − d)- tree for any fixed d. However, the number of constraints grows exponentially with d (though it is still polynomial in n). Therefore for small, fixed values of d, we can compute the optimal sub (k − d)-tree of G. While we can compute (k − d)-trees of G by first going from a k tree to a (k − 1) tree, then from a (k − 1)-tree to a (k − 2)-tree, and so on in a greedy fashion, this will not be optimal in general. However, this might be a good algorithm to try when d is large. 3.2 Optimal triangulated subgraphs with |E(G)| − d edges Suppose that we are interested in a (triangulated) subgraph of G that contains d fewer edges that G does. That is, we want to find an optimal subgraph H ⊂ G such that |E(H)| = |E(G)| − d. Note that by the result of [4] there is always a triangulated subgraph with d fewer edges (if d < |E(G)|). Two possibilities for finding such an optimal subgraph are 1. Use the procedure described in [4]. This is a greedy procedure which works in d steps by deleting an edge at each step. At each state, the edge is picked from the set of edges whose deletion leaves a triangulated graph. Then the edge which causes the least increase in KL-divergence is picked at each stage. 2. For each possible subset A of E(G) of size d, whose deletion leaves a triangulated graph, compute the KL divergence using the formula above, and then pick the optimal one. Since there are |E(G)| such sets, this can be done in polynomial d time (in |V (G)|) when d is a constant. The first greedy algorithm is not guaranteed to yield the optimal solution. The second takes time that is O(n2d ). Now, let us solve this problem using the framework we’ve described. Let H be the set of subgraphs of G which may be obtained by deletion of d edges. For each M = ij ∈ M corresponding to the separator Vij , let CM = (l, r, c, s, A) : l + r − c = d, s a d bit string, A ∈ E(G[Vij ]) . The constraint reprec sented by (l, r, c, A) is this : A is a set of d edges of G[Vij ] that are missing in H, l edges are missing from the left subgraph, and r edges are missing from the right subgraph. c represents the double count, and so is subtracted from the total. If k is the size of the largest k clique, then the total number of such constraints is bounded by 2d · 2d · (2) ∈ O(k 2d ) d which could be better than O(n2d ) and is polynomial in |V | when d is constant. See [10] for additional details. 4 Conclusions Algorithm 1 will compute the optimal H ∈ H for the two examples discussed above and is polynomial (for fixed constant d) even if k is O(n). In [10] a generalization is presented which will allow finding the optimal solution for other classes of subgraphical models. Now, we assume an oracle model for computing KL-divergences of probability distributions on vertex sets of cliques. It is clear that these KL-divergences can be computed R ← separator-tree for G; for each vertex M of R in order of increasing height (bottom up) do for each constraint cM of M do if M is an interior vertex of R corresponding to edge ij of the junction tree then Let Ml and Mr be the left and right children of M ; Pick constraint cl ∈ CMl compatible with cM to minimize table[Ml , cl ]; Pick constraint cr ∈ CMr compatible with cM to minimize table[Mr , cr ]; loss ← D(PG [M ] PH [M ]); table[M, cM ] ← table[Ml , cl ] + table[Mr , cr ] − loss; else table[M, cM ] ← D(PG [M ] PH [M ]); end end end Algorithm 1: Finding optimal set of constraints efficiently for distributions like Gaussians, but for discrete distributions this may not be possible when k is large. However even in this case this algorithm will result in only polynomial calls to the oracle. The standard algorithm [3] which is exponential in the treewidth will make O(2k ) calls to this oracle. Therefore, when the cost of computing the KL-divergence is large, this algorithm becomes even more attractive as it results in exponential speedup over the standard algorithm. Alternatively, if we can compute approximate KL-divergences, or approximately optimal solutions, then we can compute an approximate solution by using the same algorithm. References [1] C. Chow and C. Liu, “Approximating discrete probability distributions with dependence trees”, IEEE Transactions on Information Theory, v. 14, 1968, Pages 462–467. [2] F. Gavril, “Algorithms on clique separable graphs”, Discrete Mathematics v. 9 (1977), pp. 159–165. [3] R. E. Tarjan. “Decomposition by Clique Separators”, Discrete Mathematics, v. 55 (1985), pp. 221–232. [4] U. Kjaerulff. “Reduction of computational complexity in Bayesian networks through removal of weak dependencies”, Proceedings of the Tenth Annual Conference on Uncertainty in Artificial Intelligence, pp. 374–382, 1994. [5] T. Kloks, “Treewidth: Computations and Approximations”, Springer-Verlag, 1994. [6] A. Darwiche and M. Hopkins. “Using recursive decomposition to construct elimination orders, jointrees and dtrees”, Technical Report D-122, Computer Science Dept., UCLA. [7] S. Lauritzen. “Graphical Models”, Oxford University Press, Oxford, 1996. [8] T. A. McKee and F. R. McMorris. “Topics in Intersection Graph Theory”, SIAM Monographs on Discrete Mathematics and Applications, 1999. [9] D. Karger and N. Srebro. “Learning Markov networks: Maximum bounded tree-width graphs.” In Symposium on Discrete Algorithms, 2001, Pages 391-401. [10] M. Narasimhan and J. Bilmes. “Optimization on separator-clique trees.”, Technical report UWEETR 2004-10, June 2004.
Reference: text
sentIndex sentText sentNum sentScore
1 edu Abstract We investigate the problem of reducing the complexity of a graphical model (G, PG ) by finding a subgraph H of G, chosen from a class of subgraphs H, such that H is optimal with respect to KL-divergence. [sent-4, score-0.69]
2 We do this by first defining a decomposition tree representation for G, which is closely related to the junction-tree representation for G. [sent-5, score-0.191]
3 Gavril [2] and Tarjan [3] have used graph separation properties to solve several combinatorial optimization problems when the size of the minimal separators in the graph is bounded. [sent-7, score-0.405]
4 We present an extension of this technique which applies to some important choices of H even when the size of the minimal separators of G are arbitrarily large. [sent-8, score-0.299]
5 Therefore, it is often required to find a subgraphical model that has lower complexity (smaller clique size) without introducing a large error in inference results. [sent-11, score-0.267]
6 1 Notation and Terminology A graph G = (V, E) is said to be triangulated if every cycle of length greater than 3 has a chord. [sent-17, score-0.448]
7 {b, c, d} d {c, f, g} {b, c} {b, e, c} b c {f, c} {c, e} {e, c, f } g {b, e} a e f {a, b, e} Figure 1: A triangulated graph G and a junction-tree for G a, b ∈ S. [sent-19, score-0.38]
8 If α and β are non-adjacent vertices of G then a set of vertices S ⊆ V \ {α, β} is called an (α, β)-separator if α and β are in distinct components of G[V \ S]. [sent-21, score-0.198]
9 S is said to be a minimal separator if S is a minimal (α, β)-separator for some non adjacent a, b ∈ V . [sent-23, score-0.417]
10 If T = (K, S) is a junction-tree for G (see [7]), then the nodes K of T correspond to the maximalcliques of G, while the links S correspond to minimal separators of G (We reserve the terms vertices/edges for elements of G, and nodes/links for the elements of T ). [sent-24, score-0.399]
11 We label each edge Vi Vj ∈ S with the set Vij = Vi ∩ Vj , which is a non-empty complete separator in G. [sent-29, score-0.271]
12 The removal of any link Vi Vj ∈ S disconnects T into two subtrees which we denote T (i) and T (j) (chosen so that T (i) contains Vi ). [sent-30, score-0.159]
13 The junction tree property ensures that V (i) ∩ V (j) = Vi ∩ Vj = Vij . [sent-32, score-0.157]
14 We will let G(i) be the subgraph induced by V (i) . [sent-33, score-0.219]
15 , Xn } such that the separators in G imply conditional independencies in P (so P factors according to G). [sent-40, score-0.22]
16 The complexity of this algorithm grows with the treewidth of G (which is one less than the size of the largest clique in G when G is triangulated). [sent-42, score-0.242]
17 The growth is exponential when P is a discrete probability distribution, thus rendering exact inference for graphs with large treewidth impractical. [sent-43, score-0.216]
18 Therefore, we seek another graphical model (H, PH ) which allows tractable inference (so H should have lower treewidth than G has). [sent-44, score-0.224]
19 If (G, PG ) and (H, PH ) are graphical models, then we say that (H, PH ) is a subgraphical model of (G, PG ) if H is a spanning subgraph of G. [sent-46, score-0.435]
20 Note in particular that separators in G are separators in H, and hence (G, PH ) is also a graphical model. [sent-47, score-0.484]
21 2 Graph Decompositions and Divide-and-Conquer Algorithms For the remainder of the paper, we will be assuming that G = (V, E) is some triangulated graph, with junction tree T = (K, S). [sent-48, score-0.484]
22 We can recursively decompose each of G(i) and G(j) into smaller and smaller subgraphs till the resulting subgraphs are cliques. [sent-50, score-0.677]
23 When the size of all the minimal separators are bounded, we may use these decompositions to easily solve problems that are hard in general. [sent-51, score-0.39]
24 For example, in [5] it is shown that NP-complete problems like vertex coloring, and finding maximum independent sets can be solved in polynomial time on graphs with bounded tree-width (which are equivalent to spanning graphs with bounded size separators). [sent-52, score-0.326]
25 First, the optimal subgraphs of G (i) and G(j) need not necessarily match up on Vij , the set of common vertices. [sent-55, score-0.4]
26 Second, even if the two subgraphs agree on the set of common vertices, the graph resulting from splicing the two subgraphs together need not be triangulated (which could happen even if the two subgraphs individually are triangulated). [sent-56, score-1.464]
27 Then given optimal subgraphs of both G(i) and G(j) corresponding to each class, we can enumerate over all the classes and pick the best one. [sent-59, score-0.433]
28 This procedure can be carried out in polynomial (in |V |) time as long as we have only a polynomial number of classes. [sent-63, score-0.21]
29 Now, if we have a polynomial number of classes, these classes need not actually be a partition of all the acceptable subgraphs, though the union of the classes must cover all acceptable subgraphs (so the same subgraph can be contained in more than one class). [sent-64, score-0.81]
30 For our application, every class can be thought of to be the set of subgraphs that satisfy some constraint, and we need to pick a polynomial number of constraints that cover all possibilities. [sent-65, score-0.518]
31 If k |Vij | = k, then in any subgraph H of G, H[Vij ] must be one of the 2(2) possible subgraphs k of G[Vij ]. [sent-67, score-0.578]
32 So, if k is sufficiently small (so 2(2) is bounded by some polynomial in |V |), then this procedure results in a polynomial time algorithm. [sent-68, score-0.247]
33 In this paper, we show that in some cases we can characterize the space H so that we still have a polynomial number of constraints even when the tree-width of G is not bounded by a small constant. [sent-69, score-0.157]
34 1 Separable objective functions For cases where exact inference in the graphical model (G, PG ) is intractable, it is natural to try to find a subgraphical model (H, PH ) such that D(PG PH ) is minimized, and inference using H is tractable. [sent-71, score-0.275]
35 We will denote by H the set of subgraphs of G that are tractable for inference. [sent-72, score-0.326]
36 For example, this set could be the set of subgraphs of G with treewidth one less than the treewidth of G, or perhaps the set of subgraphs of G with at d fewer edges. [sent-73, score-0.936]
37 For a specified subgraph H of G, there is a unique probability distribution PH factoring over H that minimizes D(PG PH ). [sent-74, score-0.219]
38 Hence, finding a optimal subgraphical model is equivalent to finding a subgraph H for which D(PG PH ) is minimized. [sent-75, score-0.403]
39 If Vij is a separator of G, we will attempt to find optimal subgraphs of G by finding optimal subgraphs of G (i) and G(j) and splicing them together. [sent-76, score-1.046]
40 However, to do this, we need to ensure that the objective criteria also decomposes along the separator Vij . [sent-77, score-0.243]
41 Suppose that H is any triangulated subgraph of G. [sent-78, score-0.546]
42 Suppose that (G, PG ) is a graphical model, H is a triangulated subgraph of G, and PH factors over H. [sent-81, score-0.635]
43 Since H is a subgraph of G, and Vij is a separator of G, Vij must also be a separator of H. [sent-84, score-0.686]
44 Therefore, there is hope that we can reduce our our original problem of finding an optimal subgraph H ∈ H as one of finding subgraphs of H (i) ⊆ G(i) and H (j) ⊆ G(j) that are compatible, in the sense that they match up on the overlap Vij , and for which D(PG PH ) is minimized. [sent-87, score-0.619]
45 A very natural representation satisfies r(G) = r(G(i) ) + r(G(j) ) if G has a separator G(i) ∩ G(j) . [sent-91, score-0.234]
46 2 Decompositions and decomposition trees For the algorithms considered in this paper, we will be mostly interested in the decompositions that are specified by the junction tree, and we will represent these decompositions by a rooted tree called a decomposition tree. [sent-97, score-0.525]
47 In this section and the next, we show how a decomposition tree for a graph may be constructed, and show how it is used to solve a number of optimization problems. [sent-99, score-0.21]
48 abd; ce; gf a; be; cd d; bc; e abe dbc ebc e; cf ; g cef cf g Figure 3: The separator tree corresponding to Figure 1 A decomposition tree for G is a rooted tree whose vertices correspond to separators and cliques of G. [sent-100, score-0.987]
49 We describe the construction of the decomposition tree in terms of a junctiontree T = (K, S) for G. [sent-101, score-0.157]
50 The interior nodes of the decomposition tree R(T ) correspond to S (the links of T and hence the minimal separators of G). [sent-102, score-0.607]
51 The leaf or terminal nodes represent the elements of K (the nodes of T and hence the maximal cliques of G). [sent-103, score-0.149]
52 The effect of this procedure on the junction tree of Figure 1 is shown in Figure 3, where the decomposition associated with the interior nodes is shown inside the nodes. [sent-110, score-0.321]
53 1 Finding optimal subgraphical models Optimal sub (k − 1)-trees of k-trees Suppose that G is a k-tree. [sent-114, score-0.269]
54 A sub (k − 1)-tree of G is a subgraph H of G that is (k − 1)tree. [sent-115, score-0.304]
55 Now, if Vij is any minimal separator of G, then both G(i) and G(j) are k-trees on vertex sets V (i) and V (j) respectively. [sent-116, score-0.39]
56 It is clear that the induced subgraphs H[V (i) ] and H[V (j) ] are subgraphs of G(i) and G(j) and are partial (k − 1)-trees. [sent-117, score-0.652]
57 Suppose that G is a k-tree, and S = Vij is a minimal separator of G corresponding to the link ij of the junction-tree T . [sent-122, score-0.376]
58 Then S \ {u} is a minimal separator in H and hence is complete. [sent-125, score-0.335]
59 Every vertex in S is connected to vertices in both V (i) \S and V (j) \S. [sent-127, score-0.203]
60 Then there are vertices {x, y} ⊆ S such that the edge H[S] is missing only the edge {x, y}. [sent-128, score-0.243]
61 Further either H[V (i) ] or H[V (j) ] does not contain a unchorded x-y path. [sent-129, score-0.194]
62 In the first, there is some vertex u ∈ S such that u is not connected to vertices in both V (i) \S and V (j) \. [sent-132, score-0.203]
63 Since the removal of S disconnects G, the removal of S must also disconnect H. [sent-133, score-0.222]
64 Therefore, S must contain a minimal separator of H. [sent-134, score-0.38]
65 Since H is a (k − 1)-tree, all minimal separators of H must contain k − 1 vertices which must therefore be S \{u}. [sent-135, score-0.516]
66 If there is no such u ∈ S, then every vertex in S is connected to vertices in both V (i) \ S and V (j) \ S. [sent-138, score-0.203]
67 If x ∈ S is connected to some yi ∈ V (i) \ S and yj ∈ V (j) \ S, then x is contained in every minimal yi /yj separator (see [5]). [sent-139, score-0.378]
68 Therefore, every vertex in S is part of a minimal separator. [sent-140, score-0.173]
69 Since each minimal separator contains k − 1 vertices, there must be at least two distinct minimum separators contained in S. [sent-141, score-0.579]
70 The edge {x, y} is not present in H[S] because all minimal separators in H must be of size k − 1. [sent-146, score-0.386]
71 Further, if both V (i) and V (j) contain an unchorded path between x and y, then by joining the two paths at x and y, we get a unchorded cycle in H which contradicts the fact that H is triangulated. [sent-147, score-0.448]
72 Therefore, we may associate k · 2 + 2 · k constraints with each separator Vij of G as 2 follows. [sent-148, score-0.241]
73 This is because for each 2 pair {x, y} corresponding to the missing edge, we have either V (i) contains no unchorded xy paths or V (j) contains no unchorded xy paths. [sent-150, score-0.43]
74 More explicitly, we can encode the set of constraints CM associated with each separator S corresponding to an interior node M of the decomposition tree as follows: CM = { (x, y, s) : x ∈ S, y ∈ S, s ∈ {i, j}}. [sent-151, score-0.47]
75 If s = i, then H (i) does not contain any unchorded path between x and y, and there is no constraint on H (j) . [sent-155, score-0.23]
76 Similarly if s = j, then H (j) does not contain any unchorded path between x and y, and there is no constraint on H (i) . [sent-156, score-0.23]
77 Now suppose that H (i) and H (j) are triangulated subgraphs of G(i) and G(j) respectively, then it is clear that if H (i) and H (j) both satisfy the same constraint they must match up on the common vertices Vij . [sent-157, score-0.917]
78 Therefore to splice together two solutions corresponding to the same constraint, we only need to check that the graph obtained by splicing the graphs is triangulated. [sent-158, score-0.199]
79 Suppose that H (i) and H (j) are triangulated subgraphs of G(i) and G(j) respectively such that both of them satisfy the same constraint as described above. [sent-160, score-0.729]
80 Suppose that both H (i) and H (j) are both triangulated and both satisfy the same constraint. [sent-163, score-0.367]
81 If both H (i) and H (j) satisfy the same constraint corresponding to case (1) in Lemma 2 and H has an unchorded cycle, then this cycle must involve elements of both H (i) and H (j) . [sent-164, score-0.341]
82 Therefore, there must be two vertices of S \{u} on the cycle, and hence this cycle has a chord as S \ {u} is complete. [sent-165, score-0.218]
83 Now, since {x, y} is the only missing edge of S in H, and because H (i) and H (j) are individually triangulated, the cycle must contain x, y and vertices of both V (i) \ S and V (j) \ S. [sent-169, score-0.337]
84 We may split this unchorded cycle into two unchorded paths, one contained in V (i) and one in V (j) thus violating our assumption that both H (i) and H (j) satisfy the same constraint. [sent-170, score-0.466]
85 We can use a divide and conquer 2 strategy to find the optimal sub (k − 1) tree once we have taken care of the base case, where G is just a single clique (of k + 1) elements. [sent-172, score-0.321]
86 However, for this case, it is easily checked that any subgraph of G obtained by deleting exactly one edge results in a (k − 1) tree, and every sub (k−1)-tree results from this operation. [sent-173, score-0.386]
87 This procedure can be generalized to find the optimal sub (k − d)- tree for any fixed d. [sent-175, score-0.252]
88 While we can compute (k − d)-trees of G by first going from a k tree to a (k − 1) tree, then from a (k − 1)-tree to a (k − 2)-tree, and so on in a greedy fashion, this will not be optimal in general. [sent-178, score-0.176]
89 2 Optimal triangulated subgraphs with |E(G)| − d edges Suppose that we are interested in a (triangulated) subgraph of G that contains d fewer edges that G does. [sent-181, score-1.027]
90 That is, we want to find an optimal subgraph H ⊂ G such that |E(H)| = |E(G)| − d. [sent-182, score-0.272]
91 Note that by the result of [4] there is always a triangulated subgraph with d fewer edges (if d < |E(G)|). [sent-183, score-0.625]
92 Two possibilities for finding such an optimal subgraph are 1. [sent-184, score-0.272]
93 At each state, the edge is picked from the set of edges whose deletion leaves a triangulated graph. [sent-187, score-0.477]
94 For each possible subset A of E(G) of size d, whose deletion leaves a triangulated graph, compute the KL divergence using the formula above, and then pick the optimal one. [sent-190, score-0.451]
95 Let H be the set of subgraphs of G which may be obtained by deletion of d edges. [sent-195, score-0.365]
96 For each M = ij ∈ M corresponding to the separator Vij , let CM = (l, r, c, s, A) : l + r − c = d, s a d bit string, A ∈ E(G[Vij ]) . [sent-196, score-0.236]
97 The constraint reprec sented by (l, r, c, A) is this : A is a set of d edges of G[Vij ] that are missing in H, l edges are missing from the left subgraph, and r edges are missing from the right subgraph. [sent-197, score-0.315]
98 If k is the size of the largest k clique, then the total number of such constraints is bounded by 2d · 2d · (2) ∈ O(k 2d ) d which could be better than O(n2d ) and is polynomial in |V | when d is constant. [sent-199, score-0.157]
99 In [10] a generalization is presented which will allow finding the optimal solution for other classes of subgraphical models. [sent-202, score-0.206]
100 The standard algorithm [3] which is exponential in the treewidth will make O(2k ) calls to this oracle. [sent-206, score-0.154]
wordName wordTfidf (topN-words)
[('vij', 0.398), ('ph', 0.392), ('triangulated', 0.327), ('subgraphs', 0.326), ('pg', 0.256), ('subgraph', 0.219), ('separator', 0.217), ('separators', 0.199), ('unchorded', 0.164), ('subgraphical', 0.131), ('treewidth', 0.131), ('minimal', 0.1), ('vertices', 0.099), ('tree', 0.096), ('polynomial', 0.096), ('vj', 0.094), ('decompositions', 0.091), ('clique', 0.087), ('sub', 0.085), ('vi', 0.078), ('cm', 0.077), ('vertex', 0.073), ('splicing', 0.071), ('removal', 0.07), ('graphical', 0.068), ('cycle', 0.068), ('decomposition', 0.061), ('junction', 0.061), ('edges', 0.057), ('edge', 0.054), ('optimal', 0.053), ('graph', 0.053), ('interior', 0.05), ('disconnects', 0.049), ('xv', 0.048), ('nding', 0.044), ('cliques', 0.043), ('link', 0.04), ('satisfy', 0.04), ('separable', 0.039), ('deletion', 0.039), ('sx', 0.039), ('cr', 0.039), ('compatible', 0.038), ('bounded', 0.037), ('constraint', 0.036), ('sy', 0.036), ('missing', 0.036), ('suppose', 0.035), ('nodes', 0.035), ('cl', 0.034), ('must', 0.033), ('darwiche', 0.033), ('dtrees', 0.033), ('gavril', 0.033), ('narasimhan', 0.033), ('graphs', 0.033), ('pick', 0.032), ('acceptable', 0.031), ('connected', 0.031), ('links', 0.031), ('contain', 0.03), ('contained', 0.03), ('mr', 0.029), ('deleting', 0.028), ('greedy', 0.027), ('lemma', 0.027), ('discrete', 0.027), ('objective', 0.026), ('recursively', 0.025), ('inference', 0.025), ('complexity', 0.024), ('splice', 0.024), ('trees', 0.024), ('constraints', 0.024), ('ml', 0.024), ('calls', 0.023), ('paths', 0.022), ('node', 0.022), ('classes', 0.022), ('fewer', 0.022), ('therefore', 0.022), ('xy', 0.022), ('match', 0.021), ('factors', 0.021), ('marginalized', 0.021), ('cf', 0.021), ('rooted', 0.021), ('mathematics', 0.021), ('interested', 0.019), ('ij', 0.019), ('intersection', 0.019), ('procedure', 0.018), ('hence', 0.018), ('maximal', 0.018), ('together', 0.018), ('individually', 0.017), ('representation', 0.017), ('correspond', 0.017), ('spanning', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 141 nips-2004-Optimal sub-graphical models
Author: Mukund Narasimhan, Jeff A. Bilmes
Abstract: We investigate the problem of reducing the complexity of a graphical model (G, PG ) by finding a subgraph H of G, chosen from a class of subgraphs H, such that H is optimal with respect to KL-divergence. We do this by first defining a decomposition tree representation for G, which is closely related to the junction-tree representation for G. We then give an algorithm which uses this representation to compute the optimal H ∈ H. Gavril [2] and Tarjan [3] have used graph separation properties to solve several combinatorial optimization problems when the size of the minimal separators in the graph is bounded. We present an extension of this technique which applies to some important choices of H even when the size of the minimal separators of G are arbitrarily large. In particular, this applies to problems such as finding an optimal subgraphical model over a (k − 1)-tree of a graphical model over a k-tree (for arbitrary k) and selecting an optimal subgraphical model with (a constant) d fewer edges with respect to KL-divergence can be solved in time polynomial in |V (G)| using this formulation. 1 Introduction and Preliminaries The complexity of inference in graphical models is typically exponential in some parameter of the graph, such as the size of the largest clique. Therefore, it is often required to find a subgraphical model that has lower complexity (smaller clique size) without introducing a large error in inference results. The KL-divergence between the original probability distribution and the probability distribution on the simplified graphical model is often used to measure the impact on inference. Existing techniques for reducing the complexity of graphical models including annihilation and edge-removal [4] are greedy in nature and cannot make any guarantees regarding the optimality of the solution. This problem is NP-complete [9] and so, in general, one cannot expect a polynomial time algorithm to find the optimal solution. However, we show that when we restrict the problem to some sets of subgraphs, the optimal solution can be found quite quickly using a dynamic programming algorithm in time polynomial in the tree-width of the graph. 1.1 Notation and Terminology A graph G = (V, E) is said to be triangulated if every cycle of length greater than 3 has a chord. A clique of G is a non-empty set S ⊆ V such that {a, b} ∈ E for all ∗ This work was supported by NSF grant IIS-0093430 and an Intel Corporation Grant. {b, c, d} d {c, f, g} {b, c} {b, e, c} b c {f, c} {c, e} {e, c, f } g {b, e} a e f {a, b, e} Figure 1: A triangulated graph G and a junction-tree for G a, b ∈ S. A clique S is maximal if S is not properly contained in another clique. If α and β are non-adjacent vertices of G then a set of vertices S ⊆ V \ {α, β} is called an (α, β)-separator if α and β are in distinct components of G[V \ S]. S is a minimal (α, β)-separator if no proper subset of S is an (α, β)-separator. S is said to be a minimal separator if S is a minimal (α, β)-separator for some non adjacent a, b ∈ V . If T = (K, S) is a junction-tree for G (see [7]), then the nodes K of T correspond to the maximalcliques of G, while the links S correspond to minimal separators of G (We reserve the terms vertices/edges for elements of G, and nodes/links for the elements of T ). If G is triangulated, then the number of maximal cliques is at most |V |. For example, in the graph G shown in Figure 1, K = {{b, c, d} , {a, b, e} , {b, e, c} , {e, c, f } , {c, f, g}}. The links S of T correspond to minimal-separators of G in the following way. If Vi Vj ∈ S (where Vi , Vj ∈ K and hence are cliques of G), then Vi ∩ Vj = φ. We label each edge Vi Vj ∈ S with the set Vij = Vi ∩ Vj , which is a non-empty complete separator in G. The removal of any link Vi Vj ∈ S disconnects T into two subtrees which we denote T (i) and T (j) (chosen so that T (i) contains Vi ). We will let K(i) be the nodes of T (i) , and V (i) = ∪V ∈K (i) V be the set of vertices corresponding to the subtree T (i) . The junction tree property ensures that V (i) ∩ V (j) = Vi ∩ Vj = Vij . We will let G(i) be the subgraph induced by V (i) . A graphical model is a pair (G, P ) where P is the joint probability distribution for random variables X1 , X2 , . . . , Xn , and G is a graph with vertex set V (G) = {X1 , X2 , . . . , Xn } such that the separators in G imply conditional independencies in P (so P factors according to G). If G is triangulated, then the junction-tree algorithm can be used for exact inference in the probability distribution P . The complexity of this algorithm grows with the treewidth of G (which is one less than the size of the largest clique in G when G is triangulated). The growth is exponential when P is a discrete probability distribution, thus rendering exact inference for graphs with large treewidth impractical. Therefore, we seek another graphical model (H, PH ) which allows tractable inference (so H should have lower treewidth than G has). The general problem of finding a graphical model of tree-width at most k so as to minimize the KL-divergence from a specified probability distribution is NP complete for general k ([9]) However, it is known that this problem is solvable in polynomial time (in |V (G)|) for some special cases cases (such as when G has bounded treewidth or when k = 1 [1]). If (G, PG ) and (H, PH ) are graphical models, then we say that (H, PH ) is a subgraphical model of (G, PG ) if H is a spanning subgraph of G. Note in particular that separators in G are separators in H, and hence (G, PH ) is also a graphical model. 2 Graph Decompositions and Divide-and-Conquer Algorithms For the remainder of the paper, we will be assuming that G = (V, E) is some triangulated graph, with junction tree T = (K, S). As observed above, if Vi Vj ∈ S, then the removal {b, c, d} d {b, c} {b, e, c} b c c {c, f, g} {f, c} {e, c, f } g {b, e} a e e f {a, b, e} Figure 2: The graphs G(i) , G(j) and junction-trees T (i) and T (j) resulting from the removal of the link Vij = {c, e} of Vij = Vi ∩ Vj disconnects G into two (vertex-induced) subgraphs G(i) and G(j) which are both triangulated, with junction-trees T (i) and T (j) respectively. We can recursively decompose each of G(i) and G(j) into smaller and smaller subgraphs till the resulting subgraphs are cliques. When the size of all the minimal separators are bounded, we may use these decompositions to easily solve problems that are hard in general. For example, in [5] it is shown that NP-complete problems like vertex coloring, and finding maximum independent sets can be solved in polynomial time on graphs with bounded tree-width (which are equivalent to spanning graphs with bounded size separators). We will be interested in finding (triangulated) subgraphs of G that satisfy some conditions, such as a bound on the number of edges, or a bound on the tree-width and which optimize separable objective functions (described in Section 2) One reason why problems such as this can often be solved easily when the tree-width of G is bounded by some constant is this : If Vij is a separator decomposing G into G(i) and G(j) , then a divide-and-conquer approach would suggest that we try and find optimal subgraphs of G(i) and G(j) and then splice the two together to get an optimal subgraph of G. There are two issues with this approach. First, the optimal subgraphs of G (i) and G(j) need not necessarily match up on Vij , the set of common vertices. Second, even if the two subgraphs agree on the set of common vertices, the graph resulting from splicing the two subgraphs together need not be triangulated (which could happen even if the two subgraphs individually are triangulated). To rectify the situation, we can do the following. We partition the set of subgraphs of G(i) and G(j) into classes, so that any subgraph of G(i) and any subgraph G(j) corresponding to the same class are compatible in the sense that they match up on their intersection namely Vij , and so that by splicing the two subgraphs together, we get a subgraph of G which is acceptable (and in particular is triangulated). Then given optimal subgraphs of both G(i) and G(j) corresponding to each class, we can enumerate over all the classes and pick the best one. Of course, to ensure that we do not repeatedly solve the same problem, we need to work bottom-up (a.k.a dynamic programming) or memoize our solutions. This procedure can be carried out in polynomial (in |V |) time as long as we have only a polynomial number of classes. Now, if we have a polynomial number of classes, these classes need not actually be a partition of all the acceptable subgraphs, though the union of the classes must cover all acceptable subgraphs (so the same subgraph can be contained in more than one class). For our application, every class can be thought of to be the set of subgraphs that satisfy some constraint, and we need to pick a polynomial number of constraints that cover all possibilities. The bound on the tree-width helps us here. If k |Vij | = k, then in any subgraph H of G, H[Vij ] must be one of the 2(2) possible subgraphs k of G[Vij ]. So, if k is sufficiently small (so 2(2) is bounded by some polynomial in |V |), then this procedure results in a polynomial time algorithm. In this paper, we show that in some cases we can characterize the space H so that we still have a polynomial number of constraints even when the tree-width of G is not bounded by a small constant. 2.1 Separable objective functions For cases where exact inference in the graphical model (G, PG ) is intractable, it is natural to try to find a subgraphical model (H, PH ) such that D(PG PH ) is minimized, and inference using H is tractable. We will denote by H the set of subgraphs of G that are tractable for inference. For example, this set could be the set of subgraphs of G with treewidth one less than the treewidth of G, or perhaps the set of subgraphs of G with at d fewer edges. For a specified subgraph H of G, there is a unique probability distribution PH factoring over H that minimizes D(PG PH ). Hence, finding a optimal subgraphical model is equivalent to finding a subgraph H for which D(PG PH ) is minimized. If Vij is a separator of G, we will attempt to find optimal subgraphs of G by finding optimal subgraphs of G (i) and G(j) and splicing them together. However, to do this, we need to ensure that the objective criteria also decomposes along the separator Vij . Suppose that H is any triangulated subgraph of G. Let PG(i) and PG(j) be the (marginalized) distributions of PG on V (i) and V (j) respectively, and PH (i) and PH (j) be the (marginalized) distributions of the distribution PH on V (i) and V (j) where H (i) = H[V (i) ] and H (j) = H[V (j) ], The following result assures us that the KL-divergence also factors according to the separator Vij . Lemma 1. Suppose that (G, PG ) is a graphical model, H is a triangulated subgraph of G, and PH factors over H. Then D(PG PH ) = D(PG(i) PH (i) ) + D(PG(j) PH (j) ) − D(PG[Vij ] PH[Vij ] ). Proof. Since H is a subgraph of G, and Vij is a separator of G, Vij must also be a separator of H. Therefore, PH {Xv }v∈V = PH (i) ({Xv }v∈V (i) )·PH (j) ({Xv }v∈V (j) ) . PH[Vij ] ({Xv }v∈V ) The result ij follows immediately. Therefore, there is hope that we can reduce our our original problem of finding an optimal subgraph H ∈ H as one of finding subgraphs of H (i) ⊆ G(i) and H (j) ⊆ G(j) that are compatible, in the sense that they match up on the overlap Vij , and for which D(PG PH ) is minimized. Throughout this paper, for the sake of concreteness, we will assume that the objective criterion is to minimize the KL-divergence. However, all the results can be extended to other objective functions, as long as they “separate” in the sense that for any separator, the objective function is the sum of the objective functions of the two parts, possibly modulo some correction factor which is purely a function of the separator. Another example might be the complexity r(H) of representing the graphical model H. A very natural representation satisfies r(G) = r(G(i) ) + r(G(j) ) if G has a separator G(i) ∩ G(j) . Therefore, the representation cost reduction would satisfy r(G) − r(H) = (r(G (i) ) − r(H (i) )) + (r(G(j) ) − r(H (j) )), and so also factors according to the separators. Finally note that any linear combinations of such separable functions is also separable, and so this technique could also be used to determine tradeoffs (representation cost vs. KL-divergence loss for example). In Section 4 we discuss some issues regarding computing this function. 2.2 Decompositions and decomposition trees For the algorithms considered in this paper, we will be mostly interested in the decompositions that are specified by the junction tree, and we will represent these decompositions by a rooted tree called a decomposition tree. This representation was introduced in [2, 3], and is similar in spirit to Darwiche’s dtrees [6] which specify decompositions of directed acyclic graphs. In this section and the next, we show how a decomposition tree for a graph may be constructed, and show how it is used to solve a number of optimization problems. abd; ce; gf a; be; cd d; bc; e abe dbc ebc e; cf ; g cef cf g Figure 3: The separator tree corresponding to Figure 1 A decomposition tree for G is a rooted tree whose vertices correspond to separators and cliques of G. We describe the construction of the decomposition tree in terms of a junctiontree T = (K, S) for G. The interior nodes of the decomposition tree R(T ) correspond to S (the links of T and hence the minimal separators of G). The leaf or terminal nodes represent the elements of K (the nodes of T and hence the maximal cliques of G). R(T ) can be recursively constructed from T as follows : If T consists of just one node K, (and hence no edges), then R consists of just one node, which is given the label K as well. If however, T has more than one node, then T must contain at least one link. To begin, let Vi Vj ∈ S be any link in T . Then removal of the link Vi Vj results in two disjoint junctiontrees T (i) and T (j) . We label the root of R by the decomposition (V (i) ; Vij ; V (j) ). The rest of R is recursively built by successively picking links of T (i) and T (j) (decompositions of G(i) and G(j) ) to form the interior nodes of R. The effect of this procedure on the junction tree of Figure 1 is shown in Figure 3, where the decomposition associated with the interior nodes is shown inside the nodes. Let M be the set of all nodes of R(T ). For any interior node M induced by the the link Vi Vj ∈ S of T , then we will let M (i) and M (j) represent the left and right children of M , and R(i) and R(j) be the left and right trees below M . 3 3.1 Finding optimal subgraphical models Optimal sub (k − 1)-trees of k-trees Suppose that G is a k-tree. A sub (k − 1)-tree of G is a subgraph H of G that is (k − 1)tree. Now, if Vij is any minimal separator of G, then both G(i) and G(j) are k-trees on vertex sets V (i) and V (j) respectively. It is clear that the induced subgraphs H[V (i) ] and H[V (j) ] are subgraphs of G(i) and G(j) and are partial (k − 1)-trees. We will be interested in finding sub (k − 1)-trees of k trees and this problem is trivial by the result of [1] when k = 2. Therefore, we assume that k ≥ 3. The following result characterizes the various possibilities for H[Vij ] in this case. Lemma 2. Suppose that G is a k-tree, and S = Vij is a minimal separator of G corresponding to the link ij of the junction-tree T . In any (k − 1)-tree H ⊆ G either 1. There is a u ∈ S such that u is not connected to vertices in both V (i) \ S and V (j) \ S. Then S \ {u} is a minimal separator in H and hence is complete. 2. Every vertex in S is connected to vertices in both V (i) \S and V (j) \S. Then there are vertices {x, y} ⊆ S such that the edge H[S] is missing only the edge {x, y}. Further either H[V (i) ] or H[V (j) ] does not contain a unchorded x-y path. Proof. We consider two possibilities. In the first, there is some vertex u ∈ S such that u is not connected to vertices in both V (i) \S and V (j) \. Since the removal of S disconnects G, the removal of S must also disconnect H. Therefore, S must contain a minimal separator of H. Since H is a (k − 1)-tree, all minimal separators of H must contain k − 1 vertices which must therefore be S \{u}. This corresponds to case (1) above. Clearly this possiblity can occur. If there is no such u ∈ S, then every vertex in S is connected to vertices in both V (i) \ S and V (j) \ S. If x ∈ S is connected to some yi ∈ V (i) \ S and yj ∈ V (j) \ S, then x is contained in every minimal yi /yj separator (see [5]). Therefore, every vertex in S is part of a minimal separator. Since each minimal separator contains k − 1 vertices, there must be at least two distinct minimum separators contained in S. Let Sx = S \ {x} and Sy = S \ {y} be two distinct minimal separators. We claim that H[S] contains all edges except the edge {x, y}. To see this, note that if z, w ∈ S, with z = w and {z, w} = {x, y} (as sets), then either {z, w} ⊂ Sy or {z, w} ⊂ Sx . Since both Sx and Sy are complete in H, this edge must be present in H. The edge {x, y} is not present in H[S] because all minimal separators in H must be of size k − 1. Further, if both V (i) and V (j) contain an unchorded path between x and y, then by joining the two paths at x and y, we get a unchorded cycle in H which contradicts the fact that H is triangulated. Therefore, we may associate k · 2 + 2 · k constraints with each separator Vij of G as 2 follows. There are k possible constraints corresponding to case (1) above (one for each choice of x), and k · 2 choices corresponding to case (2) above. This is because for each 2 pair {x, y} corresponding to the missing edge, we have either V (i) contains no unchorded xy paths or V (j) contains no unchorded xy paths. More explicitly, we can encode the set of constraints CM associated with each separator S corresponding to an interior node M of the decomposition tree as follows: CM = { (x, y, s) : x ∈ S, y ∈ S, s ∈ {i, j}}. If y = x, then this corresponds to case (1) of the above lemma. If s = i, then x is connected only to H (i) and if s = j, then x is connected only to H (j) . If y = x, then this corresponds to case (2) in the above lemma. If s = i, then H (i) does not contain any unchorded path between x and y, and there is no constraint on H (j) . Similarly if s = j, then H (j) does not contain any unchorded path between x and y, and there is no constraint on H (i) . Now suppose that H (i) and H (j) are triangulated subgraphs of G(i) and G(j) respectively, then it is clear that if H (i) and H (j) both satisfy the same constraint they must match up on the common vertices Vij . Therefore to splice together two solutions corresponding to the same constraint, we only need to check that the graph obtained by splicing the graphs is triangulated. Lemma 3. Suppose that H (i) and H (j) are triangulated subgraphs of G(i) and G(j) respectively such that both of them satisfy the same constraint as described above. Then the graph H obtained by splicing H (i) and H (j) together is triangulated. Proof. Suppose that both H (i) and H (j) are both triangulated and both satisfy the same constraint. If both H (i) and H (j) satisfy the same constraint corresponding to case (1) in Lemma 2 and H has an unchorded cycle, then this cycle must involve elements of both H (i) and H (j) . Therefore, there must be two vertices of S \{u} on the cycle, and hence this cycle has a chord as S \ {u} is complete. This contradiction shows that H is triangulated. So assume that both of them satisfy the constraint corresponding to case (2) of Lemma 2. Then if H is not triangulated, there must be a t-cycle (for t ≥ 4) with no chord. Now, since {x, y} is the only missing edge of S in H, and because H (i) and H (j) are individually triangulated, the cycle must contain x, y and vertices of both V (i) \ S and V (j) \ S. We may split this unchorded cycle into two unchorded paths, one contained in V (i) and one in V (j) thus violating our assumption that both H (i) and H (j) satisfy the same constraint. If |S| = k, then there are 2k + 2 · k ∈ O(k 2 ) ∈ O(n2 ). We can use a divide and conquer 2 strategy to find the optimal sub (k − 1) tree once we have taken care of the base case, where G is just a single clique (of k + 1) elements. However, for this case, it is easily checked that any subgraph of G obtained by deleting exactly one edge results in a (k − 1) tree, and every sub (k−1)-tree results from this operation. Therefore, the optimal (k−1)-tree can be found using Algorithm 1, and in this case, the complexity of Algorithm 1 is O(n(k + 1) 2 ). This procedure can be generalized to find the optimal sub (k − d)- tree for any fixed d. However, the number of constraints grows exponentially with d (though it is still polynomial in n). Therefore for small, fixed values of d, we can compute the optimal sub (k − d)-tree of G. While we can compute (k − d)-trees of G by first going from a k tree to a (k − 1) tree, then from a (k − 1)-tree to a (k − 2)-tree, and so on in a greedy fashion, this will not be optimal in general. However, this might be a good algorithm to try when d is large. 3.2 Optimal triangulated subgraphs with |E(G)| − d edges Suppose that we are interested in a (triangulated) subgraph of G that contains d fewer edges that G does. That is, we want to find an optimal subgraph H ⊂ G such that |E(H)| = |E(G)| − d. Note that by the result of [4] there is always a triangulated subgraph with d fewer edges (if d < |E(G)|). Two possibilities for finding such an optimal subgraph are 1. Use the procedure described in [4]. This is a greedy procedure which works in d steps by deleting an edge at each step. At each state, the edge is picked from the set of edges whose deletion leaves a triangulated graph. Then the edge which causes the least increase in KL-divergence is picked at each stage. 2. For each possible subset A of E(G) of size d, whose deletion leaves a triangulated graph, compute the KL divergence using the formula above, and then pick the optimal one. Since there are |E(G)| such sets, this can be done in polynomial d time (in |V (G)|) when d is a constant. The first greedy algorithm is not guaranteed to yield the optimal solution. The second takes time that is O(n2d ). Now, let us solve this problem using the framework we’ve described. Let H be the set of subgraphs of G which may be obtained by deletion of d edges. For each M = ij ∈ M corresponding to the separator Vij , let CM = (l, r, c, s, A) : l + r − c = d, s a d bit string, A ∈ E(G[Vij ]) . The constraint reprec sented by (l, r, c, A) is this : A is a set of d edges of G[Vij ] that are missing in H, l edges are missing from the left subgraph, and r edges are missing from the right subgraph. c represents the double count, and so is subtracted from the total. If k is the size of the largest k clique, then the total number of such constraints is bounded by 2d · 2d · (2) ∈ O(k 2d ) d which could be better than O(n2d ) and is polynomial in |V | when d is constant. See [10] for additional details. 4 Conclusions Algorithm 1 will compute the optimal H ∈ H for the two examples discussed above and is polynomial (for fixed constant d) even if k is O(n). In [10] a generalization is presented which will allow finding the optimal solution for other classes of subgraphical models. Now, we assume an oracle model for computing KL-divergences of probability distributions on vertex sets of cliques. It is clear that these KL-divergences can be computed R ← separator-tree for G; for each vertex M of R in order of increasing height (bottom up) do for each constraint cM of M do if M is an interior vertex of R corresponding to edge ij of the junction tree then Let Ml and Mr be the left and right children of M ; Pick constraint cl ∈ CMl compatible with cM to minimize table[Ml , cl ]; Pick constraint cr ∈ CMr compatible with cM to minimize table[Mr , cr ]; loss ← D(PG [M ] PH [M ]); table[M, cM ] ← table[Ml , cl ] + table[Mr , cr ] − loss; else table[M, cM ] ← D(PG [M ] PH [M ]); end end end Algorithm 1: Finding optimal set of constraints efficiently for distributions like Gaussians, but for discrete distributions this may not be possible when k is large. However even in this case this algorithm will result in only polynomial calls to the oracle. The standard algorithm [3] which is exponential in the treewidth will make O(2k ) calls to this oracle. Therefore, when the cost of computing the KL-divergence is large, this algorithm becomes even more attractive as it results in exponential speedup over the standard algorithm. Alternatively, if we can compute approximate KL-divergences, or approximately optimal solutions, then we can compute an approximate solution by using the same algorithm. References [1] C. Chow and C. Liu, “Approximating discrete probability distributions with dependence trees”, IEEE Transactions on Information Theory, v. 14, 1968, Pages 462–467. [2] F. Gavril, “Algorithms on clique separable graphs”, Discrete Mathematics v. 9 (1977), pp. 159–165. [3] R. E. Tarjan. “Decomposition by Clique Separators”, Discrete Mathematics, v. 55 (1985), pp. 221–232. [4] U. Kjaerulff. “Reduction of computational complexity in Bayesian networks through removal of weak dependencies”, Proceedings of the Tenth Annual Conference on Uncertainty in Artificial Intelligence, pp. 374–382, 1994. [5] T. Kloks, “Treewidth: Computations and Approximations”, Springer-Verlag, 1994. [6] A. Darwiche and M. Hopkins. “Using recursive decomposition to construct elimination orders, jointrees and dtrees”, Technical Report D-122, Computer Science Dept., UCLA. [7] S. Lauritzen. “Graphical Models”, Oxford University Press, Oxford, 1996. [8] T. A. McKee and F. R. McMorris. “Topics in Intersection Graph Theory”, SIAM Monographs on Discrete Mathematics and Applications, 1999. [9] D. Karger and N. Srebro. “Learning Markov networks: Maximum bounded tree-width graphs.” In Symposium on Discrete Algorithms, 2001, Pages 391-401. [10] M. Narasimhan and J. Bilmes. “Optimization on separator-clique trees.”, Technical report UWEETR 2004-10, June 2004.
2 0.1347885 19 nips-2004-An Application of Boosting to Graph Classification
Author: Taku Kudo, Eisaku Maeda, Yuji Matsumoto
Abstract: This paper presents an application of Boosting for classifying labeled graphs, general structures for modeling a number of real-world data, such as chemical compounds, natural language texts, and bio sequences. The proposal consists of i) decision stumps that use subgraph as features, and ii) a Boosting algorithm in which subgraph-based decision stumps are used as weak learners. We also discuss the relation between our algorithm and SVMs with convolution kernels. Two experiments using natural language data and chemical compounds show that our method achieves comparable or even better performance than SVMs with convolution kernels as well as improves the testing efficiency. 1
3 0.074605398 23 nips-2004-Analysis of a greedy active learning strategy
Author: Sanjoy Dasgupta
Abstract: We abstract out the core search problem of active learning schemes, to better understand the extent to which adaptive labeling can improve sample complexity. We give various upper and lower bounds on the number of labels which need to be queried, and we prove that a popular greedy active learning rule is approximately as good as any other strategy for minimizing this number of labels. 1
4 0.074058004 165 nips-2004-Semi-supervised Learning on Directed Graphs
Author: Dengyong Zhou, Thomas Hofmann, Bernhard Schölkopf
Abstract: Given a directed graph in which some of the nodes are labeled, we investigate the question of how to exploit the link structure of the graph to infer the labels of the remaining unlabeled nodes. To that extent we propose a regularization framework for functions defined over nodes of a directed graph that forces the classification function to change slowly on densely linked subgraphs. A powerful, yet computationally simple classification algorithm is derived within the proposed framework. The experimental evaluation on real-world Web classification problems demonstrates encouraging results that validate our approach. 1
5 0.06645079 177 nips-2004-Supervised Graph Inference
Author: Jean-philippe Vert, Yoshihiro Yamanishi
Abstract: We formulate the problem of graph inference where part of the graph is known as a supervised learning problem, and propose an algorithm to solve it. The method involves the learning of a mapping of the vertices to a Euclidean space where the graph is easy to infer, and can be formulated as an optimization problem in a reproducing kernel Hilbert space. We report encouraging results on the problem of metabolic network reconstruction from genomic data. 1
6 0.053910337 67 nips-2004-Exponentiated Gradient Algorithms for Large-margin Structured Classification
7 0.052308638 61 nips-2004-Efficient Out-of-Sample Extension of Dominant-Set Clusters
8 0.050939001 100 nips-2004-Learning Preferences for Multiclass Problems
9 0.044305697 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
10 0.042913586 187 nips-2004-The Entire Regularization Path for the Support Vector Machine
11 0.039282378 149 nips-2004-Probabilistic Inference of Alternative Splicing Events in Microarray Data
12 0.038606666 168 nips-2004-Semigroup Kernels on Finite Sets
13 0.038247623 150 nips-2004-Proximity Graphs for Clustering and Manifold Learning
14 0.038085494 76 nips-2004-Hierarchical Bayesian Inference in Networks of Spiking Neurons
15 0.038000781 160 nips-2004-Seeing through water
16 0.037378147 3 nips-2004-A Feature Selection Algorithm Based on the Global Minimization of a Generalization Error Bound
17 0.036677882 11 nips-2004-A Second Order Cone programming Formulation for Classifying Missing Data
18 0.036573157 63 nips-2004-Expectation Consistent Free Energies for Approximate Inference
19 0.035243966 137 nips-2004-On the Adaptive Properties of Decision Trees
20 0.034237023 70 nips-2004-Following Curved Regularized Optimization Solution Paths
topicId topicWeight
[(0, -0.109), (1, 0.041), (2, 0.005), (3, 0.03), (4, 0.01), (5, 0.043), (6, 0.004), (7, 0.029), (8, -0.063), (9, -0.099), (10, -0.037), (11, -0.046), (12, -0.014), (13, 0.035), (14, -0.007), (15, 0.046), (16, 0.064), (17, -0.013), (18, 0.045), (19, 0.047), (20, 0.09), (21, 0.01), (22, -0.036), (23, 0.029), (24, -0.031), (25, 0.009), (26, -0.004), (27, -0.133), (28, 0.046), (29, -0.114), (30, -0.064), (31, 0.068), (32, 0.016), (33, 0.001), (34, -0.148), (35, 0.053), (36, 0.051), (37, -0.036), (38, 0.142), (39, -0.04), (40, -0.051), (41, 0.151), (42, -0.195), (43, 0.059), (44, -0.215), (45, 0.03), (46, 0.067), (47, 0.057), (48, -0.106), (49, 0.044)]
simIndex simValue paperId paperTitle
same-paper 1 0.95322776 141 nips-2004-Optimal sub-graphical models
Author: Mukund Narasimhan, Jeff A. Bilmes
Abstract: We investigate the problem of reducing the complexity of a graphical model (G, PG ) by finding a subgraph H of G, chosen from a class of subgraphs H, such that H is optimal with respect to KL-divergence. We do this by first defining a decomposition tree representation for G, which is closely related to the junction-tree representation for G. We then give an algorithm which uses this representation to compute the optimal H ∈ H. Gavril [2] and Tarjan [3] have used graph separation properties to solve several combinatorial optimization problems when the size of the minimal separators in the graph is bounded. We present an extension of this technique which applies to some important choices of H even when the size of the minimal separators of G are arbitrarily large. In particular, this applies to problems such as finding an optimal subgraphical model over a (k − 1)-tree of a graphical model over a k-tree (for arbitrary k) and selecting an optimal subgraphical model with (a constant) d fewer edges with respect to KL-divergence can be solved in time polynomial in |V (G)| using this formulation. 1 Introduction and Preliminaries The complexity of inference in graphical models is typically exponential in some parameter of the graph, such as the size of the largest clique. Therefore, it is often required to find a subgraphical model that has lower complexity (smaller clique size) without introducing a large error in inference results. The KL-divergence between the original probability distribution and the probability distribution on the simplified graphical model is often used to measure the impact on inference. Existing techniques for reducing the complexity of graphical models including annihilation and edge-removal [4] are greedy in nature and cannot make any guarantees regarding the optimality of the solution. This problem is NP-complete [9] and so, in general, one cannot expect a polynomial time algorithm to find the optimal solution. However, we show that when we restrict the problem to some sets of subgraphs, the optimal solution can be found quite quickly using a dynamic programming algorithm in time polynomial in the tree-width of the graph. 1.1 Notation and Terminology A graph G = (V, E) is said to be triangulated if every cycle of length greater than 3 has a chord. A clique of G is a non-empty set S ⊆ V such that {a, b} ∈ E for all ∗ This work was supported by NSF grant IIS-0093430 and an Intel Corporation Grant. {b, c, d} d {c, f, g} {b, c} {b, e, c} b c {f, c} {c, e} {e, c, f } g {b, e} a e f {a, b, e} Figure 1: A triangulated graph G and a junction-tree for G a, b ∈ S. A clique S is maximal if S is not properly contained in another clique. If α and β are non-adjacent vertices of G then a set of vertices S ⊆ V \ {α, β} is called an (α, β)-separator if α and β are in distinct components of G[V \ S]. S is a minimal (α, β)-separator if no proper subset of S is an (α, β)-separator. S is said to be a minimal separator if S is a minimal (α, β)-separator for some non adjacent a, b ∈ V . If T = (K, S) is a junction-tree for G (see [7]), then the nodes K of T correspond to the maximalcliques of G, while the links S correspond to minimal separators of G (We reserve the terms vertices/edges for elements of G, and nodes/links for the elements of T ). If G is triangulated, then the number of maximal cliques is at most |V |. For example, in the graph G shown in Figure 1, K = {{b, c, d} , {a, b, e} , {b, e, c} , {e, c, f } , {c, f, g}}. The links S of T correspond to minimal-separators of G in the following way. If Vi Vj ∈ S (where Vi , Vj ∈ K and hence are cliques of G), then Vi ∩ Vj = φ. We label each edge Vi Vj ∈ S with the set Vij = Vi ∩ Vj , which is a non-empty complete separator in G. The removal of any link Vi Vj ∈ S disconnects T into two subtrees which we denote T (i) and T (j) (chosen so that T (i) contains Vi ). We will let K(i) be the nodes of T (i) , and V (i) = ∪V ∈K (i) V be the set of vertices corresponding to the subtree T (i) . The junction tree property ensures that V (i) ∩ V (j) = Vi ∩ Vj = Vij . We will let G(i) be the subgraph induced by V (i) . A graphical model is a pair (G, P ) where P is the joint probability distribution for random variables X1 , X2 , . . . , Xn , and G is a graph with vertex set V (G) = {X1 , X2 , . . . , Xn } such that the separators in G imply conditional independencies in P (so P factors according to G). If G is triangulated, then the junction-tree algorithm can be used for exact inference in the probability distribution P . The complexity of this algorithm grows with the treewidth of G (which is one less than the size of the largest clique in G when G is triangulated). The growth is exponential when P is a discrete probability distribution, thus rendering exact inference for graphs with large treewidth impractical. Therefore, we seek another graphical model (H, PH ) which allows tractable inference (so H should have lower treewidth than G has). The general problem of finding a graphical model of tree-width at most k so as to minimize the KL-divergence from a specified probability distribution is NP complete for general k ([9]) However, it is known that this problem is solvable in polynomial time (in |V (G)|) for some special cases cases (such as when G has bounded treewidth or when k = 1 [1]). If (G, PG ) and (H, PH ) are graphical models, then we say that (H, PH ) is a subgraphical model of (G, PG ) if H is a spanning subgraph of G. Note in particular that separators in G are separators in H, and hence (G, PH ) is also a graphical model. 2 Graph Decompositions and Divide-and-Conquer Algorithms For the remainder of the paper, we will be assuming that G = (V, E) is some triangulated graph, with junction tree T = (K, S). As observed above, if Vi Vj ∈ S, then the removal {b, c, d} d {b, c} {b, e, c} b c c {c, f, g} {f, c} {e, c, f } g {b, e} a e e f {a, b, e} Figure 2: The graphs G(i) , G(j) and junction-trees T (i) and T (j) resulting from the removal of the link Vij = {c, e} of Vij = Vi ∩ Vj disconnects G into two (vertex-induced) subgraphs G(i) and G(j) which are both triangulated, with junction-trees T (i) and T (j) respectively. We can recursively decompose each of G(i) and G(j) into smaller and smaller subgraphs till the resulting subgraphs are cliques. When the size of all the minimal separators are bounded, we may use these decompositions to easily solve problems that are hard in general. For example, in [5] it is shown that NP-complete problems like vertex coloring, and finding maximum independent sets can be solved in polynomial time on graphs with bounded tree-width (which are equivalent to spanning graphs with bounded size separators). We will be interested in finding (triangulated) subgraphs of G that satisfy some conditions, such as a bound on the number of edges, or a bound on the tree-width and which optimize separable objective functions (described in Section 2) One reason why problems such as this can often be solved easily when the tree-width of G is bounded by some constant is this : If Vij is a separator decomposing G into G(i) and G(j) , then a divide-and-conquer approach would suggest that we try and find optimal subgraphs of G(i) and G(j) and then splice the two together to get an optimal subgraph of G. There are two issues with this approach. First, the optimal subgraphs of G (i) and G(j) need not necessarily match up on Vij , the set of common vertices. Second, even if the two subgraphs agree on the set of common vertices, the graph resulting from splicing the two subgraphs together need not be triangulated (which could happen even if the two subgraphs individually are triangulated). To rectify the situation, we can do the following. We partition the set of subgraphs of G(i) and G(j) into classes, so that any subgraph of G(i) and any subgraph G(j) corresponding to the same class are compatible in the sense that they match up on their intersection namely Vij , and so that by splicing the two subgraphs together, we get a subgraph of G which is acceptable (and in particular is triangulated). Then given optimal subgraphs of both G(i) and G(j) corresponding to each class, we can enumerate over all the classes and pick the best one. Of course, to ensure that we do not repeatedly solve the same problem, we need to work bottom-up (a.k.a dynamic programming) or memoize our solutions. This procedure can be carried out in polynomial (in |V |) time as long as we have only a polynomial number of classes. Now, if we have a polynomial number of classes, these classes need not actually be a partition of all the acceptable subgraphs, though the union of the classes must cover all acceptable subgraphs (so the same subgraph can be contained in more than one class). For our application, every class can be thought of to be the set of subgraphs that satisfy some constraint, and we need to pick a polynomial number of constraints that cover all possibilities. The bound on the tree-width helps us here. If k |Vij | = k, then in any subgraph H of G, H[Vij ] must be one of the 2(2) possible subgraphs k of G[Vij ]. So, if k is sufficiently small (so 2(2) is bounded by some polynomial in |V |), then this procedure results in a polynomial time algorithm. In this paper, we show that in some cases we can characterize the space H so that we still have a polynomial number of constraints even when the tree-width of G is not bounded by a small constant. 2.1 Separable objective functions For cases where exact inference in the graphical model (G, PG ) is intractable, it is natural to try to find a subgraphical model (H, PH ) such that D(PG PH ) is minimized, and inference using H is tractable. We will denote by H the set of subgraphs of G that are tractable for inference. For example, this set could be the set of subgraphs of G with treewidth one less than the treewidth of G, or perhaps the set of subgraphs of G with at d fewer edges. For a specified subgraph H of G, there is a unique probability distribution PH factoring over H that minimizes D(PG PH ). Hence, finding a optimal subgraphical model is equivalent to finding a subgraph H for which D(PG PH ) is minimized. If Vij is a separator of G, we will attempt to find optimal subgraphs of G by finding optimal subgraphs of G (i) and G(j) and splicing them together. However, to do this, we need to ensure that the objective criteria also decomposes along the separator Vij . Suppose that H is any triangulated subgraph of G. Let PG(i) and PG(j) be the (marginalized) distributions of PG on V (i) and V (j) respectively, and PH (i) and PH (j) be the (marginalized) distributions of the distribution PH on V (i) and V (j) where H (i) = H[V (i) ] and H (j) = H[V (j) ], The following result assures us that the KL-divergence also factors according to the separator Vij . Lemma 1. Suppose that (G, PG ) is a graphical model, H is a triangulated subgraph of G, and PH factors over H. Then D(PG PH ) = D(PG(i) PH (i) ) + D(PG(j) PH (j) ) − D(PG[Vij ] PH[Vij ] ). Proof. Since H is a subgraph of G, and Vij is a separator of G, Vij must also be a separator of H. Therefore, PH {Xv }v∈V = PH (i) ({Xv }v∈V (i) )·PH (j) ({Xv }v∈V (j) ) . PH[Vij ] ({Xv }v∈V ) The result ij follows immediately. Therefore, there is hope that we can reduce our our original problem of finding an optimal subgraph H ∈ H as one of finding subgraphs of H (i) ⊆ G(i) and H (j) ⊆ G(j) that are compatible, in the sense that they match up on the overlap Vij , and for which D(PG PH ) is minimized. Throughout this paper, for the sake of concreteness, we will assume that the objective criterion is to minimize the KL-divergence. However, all the results can be extended to other objective functions, as long as they “separate” in the sense that for any separator, the objective function is the sum of the objective functions of the two parts, possibly modulo some correction factor which is purely a function of the separator. Another example might be the complexity r(H) of representing the graphical model H. A very natural representation satisfies r(G) = r(G(i) ) + r(G(j) ) if G has a separator G(i) ∩ G(j) . Therefore, the representation cost reduction would satisfy r(G) − r(H) = (r(G (i) ) − r(H (i) )) + (r(G(j) ) − r(H (j) )), and so also factors according to the separators. Finally note that any linear combinations of such separable functions is also separable, and so this technique could also be used to determine tradeoffs (representation cost vs. KL-divergence loss for example). In Section 4 we discuss some issues regarding computing this function. 2.2 Decompositions and decomposition trees For the algorithms considered in this paper, we will be mostly interested in the decompositions that are specified by the junction tree, and we will represent these decompositions by a rooted tree called a decomposition tree. This representation was introduced in [2, 3], and is similar in spirit to Darwiche’s dtrees [6] which specify decompositions of directed acyclic graphs. In this section and the next, we show how a decomposition tree for a graph may be constructed, and show how it is used to solve a number of optimization problems. abd; ce; gf a; be; cd d; bc; e abe dbc ebc e; cf ; g cef cf g Figure 3: The separator tree corresponding to Figure 1 A decomposition tree for G is a rooted tree whose vertices correspond to separators and cliques of G. We describe the construction of the decomposition tree in terms of a junctiontree T = (K, S) for G. The interior nodes of the decomposition tree R(T ) correspond to S (the links of T and hence the minimal separators of G). The leaf or terminal nodes represent the elements of K (the nodes of T and hence the maximal cliques of G). R(T ) can be recursively constructed from T as follows : If T consists of just one node K, (and hence no edges), then R consists of just one node, which is given the label K as well. If however, T has more than one node, then T must contain at least one link. To begin, let Vi Vj ∈ S be any link in T . Then removal of the link Vi Vj results in two disjoint junctiontrees T (i) and T (j) . We label the root of R by the decomposition (V (i) ; Vij ; V (j) ). The rest of R is recursively built by successively picking links of T (i) and T (j) (decompositions of G(i) and G(j) ) to form the interior nodes of R. The effect of this procedure on the junction tree of Figure 1 is shown in Figure 3, where the decomposition associated with the interior nodes is shown inside the nodes. Let M be the set of all nodes of R(T ). For any interior node M induced by the the link Vi Vj ∈ S of T , then we will let M (i) and M (j) represent the left and right children of M , and R(i) and R(j) be the left and right trees below M . 3 3.1 Finding optimal subgraphical models Optimal sub (k − 1)-trees of k-trees Suppose that G is a k-tree. A sub (k − 1)-tree of G is a subgraph H of G that is (k − 1)tree. Now, if Vij is any minimal separator of G, then both G(i) and G(j) are k-trees on vertex sets V (i) and V (j) respectively. It is clear that the induced subgraphs H[V (i) ] and H[V (j) ] are subgraphs of G(i) and G(j) and are partial (k − 1)-trees. We will be interested in finding sub (k − 1)-trees of k trees and this problem is trivial by the result of [1] when k = 2. Therefore, we assume that k ≥ 3. The following result characterizes the various possibilities for H[Vij ] in this case. Lemma 2. Suppose that G is a k-tree, and S = Vij is a minimal separator of G corresponding to the link ij of the junction-tree T . In any (k − 1)-tree H ⊆ G either 1. There is a u ∈ S such that u is not connected to vertices in both V (i) \ S and V (j) \ S. Then S \ {u} is a minimal separator in H and hence is complete. 2. Every vertex in S is connected to vertices in both V (i) \S and V (j) \S. Then there are vertices {x, y} ⊆ S such that the edge H[S] is missing only the edge {x, y}. Further either H[V (i) ] or H[V (j) ] does not contain a unchorded x-y path. Proof. We consider two possibilities. In the first, there is some vertex u ∈ S such that u is not connected to vertices in both V (i) \S and V (j) \. Since the removal of S disconnects G, the removal of S must also disconnect H. Therefore, S must contain a minimal separator of H. Since H is a (k − 1)-tree, all minimal separators of H must contain k − 1 vertices which must therefore be S \{u}. This corresponds to case (1) above. Clearly this possiblity can occur. If there is no such u ∈ S, then every vertex in S is connected to vertices in both V (i) \ S and V (j) \ S. If x ∈ S is connected to some yi ∈ V (i) \ S and yj ∈ V (j) \ S, then x is contained in every minimal yi /yj separator (see [5]). Therefore, every vertex in S is part of a minimal separator. Since each minimal separator contains k − 1 vertices, there must be at least two distinct minimum separators contained in S. Let Sx = S \ {x} and Sy = S \ {y} be two distinct minimal separators. We claim that H[S] contains all edges except the edge {x, y}. To see this, note that if z, w ∈ S, with z = w and {z, w} = {x, y} (as sets), then either {z, w} ⊂ Sy or {z, w} ⊂ Sx . Since both Sx and Sy are complete in H, this edge must be present in H. The edge {x, y} is not present in H[S] because all minimal separators in H must be of size k − 1. Further, if both V (i) and V (j) contain an unchorded path between x and y, then by joining the two paths at x and y, we get a unchorded cycle in H which contradicts the fact that H is triangulated. Therefore, we may associate k · 2 + 2 · k constraints with each separator Vij of G as 2 follows. There are k possible constraints corresponding to case (1) above (one for each choice of x), and k · 2 choices corresponding to case (2) above. This is because for each 2 pair {x, y} corresponding to the missing edge, we have either V (i) contains no unchorded xy paths or V (j) contains no unchorded xy paths. More explicitly, we can encode the set of constraints CM associated with each separator S corresponding to an interior node M of the decomposition tree as follows: CM = { (x, y, s) : x ∈ S, y ∈ S, s ∈ {i, j}}. If y = x, then this corresponds to case (1) of the above lemma. If s = i, then x is connected only to H (i) and if s = j, then x is connected only to H (j) . If y = x, then this corresponds to case (2) in the above lemma. If s = i, then H (i) does not contain any unchorded path between x and y, and there is no constraint on H (j) . Similarly if s = j, then H (j) does not contain any unchorded path between x and y, and there is no constraint on H (i) . Now suppose that H (i) and H (j) are triangulated subgraphs of G(i) and G(j) respectively, then it is clear that if H (i) and H (j) both satisfy the same constraint they must match up on the common vertices Vij . Therefore to splice together two solutions corresponding to the same constraint, we only need to check that the graph obtained by splicing the graphs is triangulated. Lemma 3. Suppose that H (i) and H (j) are triangulated subgraphs of G(i) and G(j) respectively such that both of them satisfy the same constraint as described above. Then the graph H obtained by splicing H (i) and H (j) together is triangulated. Proof. Suppose that both H (i) and H (j) are both triangulated and both satisfy the same constraint. If both H (i) and H (j) satisfy the same constraint corresponding to case (1) in Lemma 2 and H has an unchorded cycle, then this cycle must involve elements of both H (i) and H (j) . Therefore, there must be two vertices of S \{u} on the cycle, and hence this cycle has a chord as S \ {u} is complete. This contradiction shows that H is triangulated. So assume that both of them satisfy the constraint corresponding to case (2) of Lemma 2. Then if H is not triangulated, there must be a t-cycle (for t ≥ 4) with no chord. Now, since {x, y} is the only missing edge of S in H, and because H (i) and H (j) are individually triangulated, the cycle must contain x, y and vertices of both V (i) \ S and V (j) \ S. We may split this unchorded cycle into two unchorded paths, one contained in V (i) and one in V (j) thus violating our assumption that both H (i) and H (j) satisfy the same constraint. If |S| = k, then there are 2k + 2 · k ∈ O(k 2 ) ∈ O(n2 ). We can use a divide and conquer 2 strategy to find the optimal sub (k − 1) tree once we have taken care of the base case, where G is just a single clique (of k + 1) elements. However, for this case, it is easily checked that any subgraph of G obtained by deleting exactly one edge results in a (k − 1) tree, and every sub (k−1)-tree results from this operation. Therefore, the optimal (k−1)-tree can be found using Algorithm 1, and in this case, the complexity of Algorithm 1 is O(n(k + 1) 2 ). This procedure can be generalized to find the optimal sub (k − d)- tree for any fixed d. However, the number of constraints grows exponentially with d (though it is still polynomial in n). Therefore for small, fixed values of d, we can compute the optimal sub (k − d)-tree of G. While we can compute (k − d)-trees of G by first going from a k tree to a (k − 1) tree, then from a (k − 1)-tree to a (k − 2)-tree, and so on in a greedy fashion, this will not be optimal in general. However, this might be a good algorithm to try when d is large. 3.2 Optimal triangulated subgraphs with |E(G)| − d edges Suppose that we are interested in a (triangulated) subgraph of G that contains d fewer edges that G does. That is, we want to find an optimal subgraph H ⊂ G such that |E(H)| = |E(G)| − d. Note that by the result of [4] there is always a triangulated subgraph with d fewer edges (if d < |E(G)|). Two possibilities for finding such an optimal subgraph are 1. Use the procedure described in [4]. This is a greedy procedure which works in d steps by deleting an edge at each step. At each state, the edge is picked from the set of edges whose deletion leaves a triangulated graph. Then the edge which causes the least increase in KL-divergence is picked at each stage. 2. For each possible subset A of E(G) of size d, whose deletion leaves a triangulated graph, compute the KL divergence using the formula above, and then pick the optimal one. Since there are |E(G)| such sets, this can be done in polynomial d time (in |V (G)|) when d is a constant. The first greedy algorithm is not guaranteed to yield the optimal solution. The second takes time that is O(n2d ). Now, let us solve this problem using the framework we’ve described. Let H be the set of subgraphs of G which may be obtained by deletion of d edges. For each M = ij ∈ M corresponding to the separator Vij , let CM = (l, r, c, s, A) : l + r − c = d, s a d bit string, A ∈ E(G[Vij ]) . The constraint reprec sented by (l, r, c, A) is this : A is a set of d edges of G[Vij ] that are missing in H, l edges are missing from the left subgraph, and r edges are missing from the right subgraph. c represents the double count, and so is subtracted from the total. If k is the size of the largest k clique, then the total number of such constraints is bounded by 2d · 2d · (2) ∈ O(k 2d ) d which could be better than O(n2d ) and is polynomial in |V | when d is constant. See [10] for additional details. 4 Conclusions Algorithm 1 will compute the optimal H ∈ H for the two examples discussed above and is polynomial (for fixed constant d) even if k is O(n). In [10] a generalization is presented which will allow finding the optimal solution for other classes of subgraphical models. Now, we assume an oracle model for computing KL-divergences of probability distributions on vertex sets of cliques. It is clear that these KL-divergences can be computed R ← separator-tree for G; for each vertex M of R in order of increasing height (bottom up) do for each constraint cM of M do if M is an interior vertex of R corresponding to edge ij of the junction tree then Let Ml and Mr be the left and right children of M ; Pick constraint cl ∈ CMl compatible with cM to minimize table[Ml , cl ]; Pick constraint cr ∈ CMr compatible with cM to minimize table[Mr , cr ]; loss ← D(PG [M ] PH [M ]); table[M, cM ] ← table[Ml , cl ] + table[Mr , cr ] − loss; else table[M, cM ] ← D(PG [M ] PH [M ]); end end end Algorithm 1: Finding optimal set of constraints efficiently for distributions like Gaussians, but for discrete distributions this may not be possible when k is large. However even in this case this algorithm will result in only polynomial calls to the oracle. The standard algorithm [3] which is exponential in the treewidth will make O(2k ) calls to this oracle. Therefore, when the cost of computing the KL-divergence is large, this algorithm becomes even more attractive as it results in exponential speedup over the standard algorithm. Alternatively, if we can compute approximate KL-divergences, or approximately optimal solutions, then we can compute an approximate solution by using the same algorithm. References [1] C. Chow and C. Liu, “Approximating discrete probability distributions with dependence trees”, IEEE Transactions on Information Theory, v. 14, 1968, Pages 462–467. [2] F. Gavril, “Algorithms on clique separable graphs”, Discrete Mathematics v. 9 (1977), pp. 159–165. [3] R. E. Tarjan. “Decomposition by Clique Separators”, Discrete Mathematics, v. 55 (1985), pp. 221–232. [4] U. Kjaerulff. “Reduction of computational complexity in Bayesian networks through removal of weak dependencies”, Proceedings of the Tenth Annual Conference on Uncertainty in Artificial Intelligence, pp. 374–382, 1994. [5] T. Kloks, “Treewidth: Computations and Approximations”, Springer-Verlag, 1994. [6] A. Darwiche and M. Hopkins. “Using recursive decomposition to construct elimination orders, jointrees and dtrees”, Technical Report D-122, Computer Science Dept., UCLA. [7] S. Lauritzen. “Graphical Models”, Oxford University Press, Oxford, 1996. [8] T. A. McKee and F. R. McMorris. “Topics in Intersection Graph Theory”, SIAM Monographs on Discrete Mathematics and Applications, 1999. [9] D. Karger and N. Srebro. “Learning Markov networks: Maximum bounded tree-width graphs.” In Symposium on Discrete Algorithms, 2001, Pages 391-401. [10] M. Narasimhan and J. Bilmes. “Optimization on separator-clique trees.”, Technical report UWEETR 2004-10, June 2004.
2 0.58347702 19 nips-2004-An Application of Boosting to Graph Classification
Author: Taku Kudo, Eisaku Maeda, Yuji Matsumoto
Abstract: This paper presents an application of Boosting for classifying labeled graphs, general structures for modeling a number of real-world data, such as chemical compounds, natural language texts, and bio sequences. The proposal consists of i) decision stumps that use subgraph as features, and ii) a Boosting algorithm in which subgraph-based decision stumps are used as weak learners. We also discuss the relation between our algorithm and SVMs with convolution kernels. Two experiments using natural language data and chemical compounds show that our method achieves comparable or even better performance than SVMs with convolution kernels as well as improves the testing efficiency. 1
3 0.57689476 165 nips-2004-Semi-supervised Learning on Directed Graphs
Author: Dengyong Zhou, Thomas Hofmann, Bernhard Schölkopf
Abstract: Given a directed graph in which some of the nodes are labeled, we investigate the question of how to exploit the link structure of the graph to infer the labels of the remaining unlabeled nodes. To that extent we propose a regularization framework for functions defined over nodes of a directed graph that forces the classification function to change slowly on densely linked subgraphs. A powerful, yet computationally simple classification algorithm is derived within the proposed framework. The experimental evaluation on real-world Web classification problems demonstrates encouraging results that validate our approach. 1
4 0.51267099 57 nips-2004-Economic Properties of Social Networks
Author: Sham M. Kakade, Michael Kearns, Luis E. Ortiz, Robin Pemantle, Siddharth Suri
Abstract: We examine the marriage of recent probabilistic generative models for social networks with classical frameworks from mathematical economics. We are particularly interested in how the statistical structure of such networks influences global economic quantities such as price variation. Our findings are a mixture of formal analysis, simulation, and experiments on an international trade data set from the United Nations. 1
5 0.44591057 177 nips-2004-Supervised Graph Inference
Author: Jean-philippe Vert, Yoshihiro Yamanishi
Abstract: We formulate the problem of graph inference where part of the graph is known as a supervised learning problem, and propose an algorithm to solve it. The method involves the learning of a mapping of the vertices to a Euclidean space where the graph is easy to infer, and can be formulated as an optimization problem in a reproducing kernel Hilbert space. We report encouraging results on the problem of metabolic network reconstruction from genomic data. 1
6 0.36754894 150 nips-2004-Proximity Graphs for Clustering and Manifold Learning
7 0.34054723 130 nips-2004-Newscast EM
8 0.33624163 61 nips-2004-Efficient Out-of-Sample Extension of Dominant-Set Clusters
9 0.32343361 95 nips-2004-Large-Scale Prediction of Disulphide Bond Connectivity
10 0.3186821 100 nips-2004-Learning Preferences for Multiclass Problems
11 0.30446479 47 nips-2004-Contextual Models for Object Detection Using Boosted Random Fields
12 0.30415806 54 nips-2004-Distributed Information Regularization on Graphs
13 0.30031589 30 nips-2004-Binet-Cauchy Kernels
14 0.29395124 23 nips-2004-Analysis of a greedy active learning strategy
15 0.29290411 170 nips-2004-Similarity and Discrimination in Classical Conditioning: A Latent Variable Account
16 0.28476825 22 nips-2004-An Investigation of Practical Approximate Nearest Neighbor Algorithms
17 0.27437493 146 nips-2004-Pictorial Structures for Molecular Modeling: Interpreting Density Maps
18 0.24832389 149 nips-2004-Probabilistic Inference of Alternative Splicing Events in Microarray Data
19 0.24411698 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
20 0.23999979 126 nips-2004-Nearly Tight Bounds for the Continuum-Armed Bandit Problem
topicId topicWeight
[(13, 0.045), (15, 0.096), (26, 0.055), (31, 0.027), (32, 0.024), (33, 0.178), (39, 0.038), (50, 0.015), (77, 0.401)]
simIndex simValue paperId paperTitle
same-paper 1 0.76301956 141 nips-2004-Optimal sub-graphical models
Author: Mukund Narasimhan, Jeff A. Bilmes
Abstract: We investigate the problem of reducing the complexity of a graphical model (G, PG ) by finding a subgraph H of G, chosen from a class of subgraphs H, such that H is optimal with respect to KL-divergence. We do this by first defining a decomposition tree representation for G, which is closely related to the junction-tree representation for G. We then give an algorithm which uses this representation to compute the optimal H ∈ H. Gavril [2] and Tarjan [3] have used graph separation properties to solve several combinatorial optimization problems when the size of the minimal separators in the graph is bounded. We present an extension of this technique which applies to some important choices of H even when the size of the minimal separators of G are arbitrarily large. In particular, this applies to problems such as finding an optimal subgraphical model over a (k − 1)-tree of a graphical model over a k-tree (for arbitrary k) and selecting an optimal subgraphical model with (a constant) d fewer edges with respect to KL-divergence can be solved in time polynomial in |V (G)| using this formulation. 1 Introduction and Preliminaries The complexity of inference in graphical models is typically exponential in some parameter of the graph, such as the size of the largest clique. Therefore, it is often required to find a subgraphical model that has lower complexity (smaller clique size) without introducing a large error in inference results. The KL-divergence between the original probability distribution and the probability distribution on the simplified graphical model is often used to measure the impact on inference. Existing techniques for reducing the complexity of graphical models including annihilation and edge-removal [4] are greedy in nature and cannot make any guarantees regarding the optimality of the solution. This problem is NP-complete [9] and so, in general, one cannot expect a polynomial time algorithm to find the optimal solution. However, we show that when we restrict the problem to some sets of subgraphs, the optimal solution can be found quite quickly using a dynamic programming algorithm in time polynomial in the tree-width of the graph. 1.1 Notation and Terminology A graph G = (V, E) is said to be triangulated if every cycle of length greater than 3 has a chord. A clique of G is a non-empty set S ⊆ V such that {a, b} ∈ E for all ∗ This work was supported by NSF grant IIS-0093430 and an Intel Corporation Grant. {b, c, d} d {c, f, g} {b, c} {b, e, c} b c {f, c} {c, e} {e, c, f } g {b, e} a e f {a, b, e} Figure 1: A triangulated graph G and a junction-tree for G a, b ∈ S. A clique S is maximal if S is not properly contained in another clique. If α and β are non-adjacent vertices of G then a set of vertices S ⊆ V \ {α, β} is called an (α, β)-separator if α and β are in distinct components of G[V \ S]. S is a minimal (α, β)-separator if no proper subset of S is an (α, β)-separator. S is said to be a minimal separator if S is a minimal (α, β)-separator for some non adjacent a, b ∈ V . If T = (K, S) is a junction-tree for G (see [7]), then the nodes K of T correspond to the maximalcliques of G, while the links S correspond to minimal separators of G (We reserve the terms vertices/edges for elements of G, and nodes/links for the elements of T ). If G is triangulated, then the number of maximal cliques is at most |V |. For example, in the graph G shown in Figure 1, K = {{b, c, d} , {a, b, e} , {b, e, c} , {e, c, f } , {c, f, g}}. The links S of T correspond to minimal-separators of G in the following way. If Vi Vj ∈ S (where Vi , Vj ∈ K and hence are cliques of G), then Vi ∩ Vj = φ. We label each edge Vi Vj ∈ S with the set Vij = Vi ∩ Vj , which is a non-empty complete separator in G. The removal of any link Vi Vj ∈ S disconnects T into two subtrees which we denote T (i) and T (j) (chosen so that T (i) contains Vi ). We will let K(i) be the nodes of T (i) , and V (i) = ∪V ∈K (i) V be the set of vertices corresponding to the subtree T (i) . The junction tree property ensures that V (i) ∩ V (j) = Vi ∩ Vj = Vij . We will let G(i) be the subgraph induced by V (i) . A graphical model is a pair (G, P ) where P is the joint probability distribution for random variables X1 , X2 , . . . , Xn , and G is a graph with vertex set V (G) = {X1 , X2 , . . . , Xn } such that the separators in G imply conditional independencies in P (so P factors according to G). If G is triangulated, then the junction-tree algorithm can be used for exact inference in the probability distribution P . The complexity of this algorithm grows with the treewidth of G (which is one less than the size of the largest clique in G when G is triangulated). The growth is exponential when P is a discrete probability distribution, thus rendering exact inference for graphs with large treewidth impractical. Therefore, we seek another graphical model (H, PH ) which allows tractable inference (so H should have lower treewidth than G has). The general problem of finding a graphical model of tree-width at most k so as to minimize the KL-divergence from a specified probability distribution is NP complete for general k ([9]) However, it is known that this problem is solvable in polynomial time (in |V (G)|) for some special cases cases (such as when G has bounded treewidth or when k = 1 [1]). If (G, PG ) and (H, PH ) are graphical models, then we say that (H, PH ) is a subgraphical model of (G, PG ) if H is a spanning subgraph of G. Note in particular that separators in G are separators in H, and hence (G, PH ) is also a graphical model. 2 Graph Decompositions and Divide-and-Conquer Algorithms For the remainder of the paper, we will be assuming that G = (V, E) is some triangulated graph, with junction tree T = (K, S). As observed above, if Vi Vj ∈ S, then the removal {b, c, d} d {b, c} {b, e, c} b c c {c, f, g} {f, c} {e, c, f } g {b, e} a e e f {a, b, e} Figure 2: The graphs G(i) , G(j) and junction-trees T (i) and T (j) resulting from the removal of the link Vij = {c, e} of Vij = Vi ∩ Vj disconnects G into two (vertex-induced) subgraphs G(i) and G(j) which are both triangulated, with junction-trees T (i) and T (j) respectively. We can recursively decompose each of G(i) and G(j) into smaller and smaller subgraphs till the resulting subgraphs are cliques. When the size of all the minimal separators are bounded, we may use these decompositions to easily solve problems that are hard in general. For example, in [5] it is shown that NP-complete problems like vertex coloring, and finding maximum independent sets can be solved in polynomial time on graphs with bounded tree-width (which are equivalent to spanning graphs with bounded size separators). We will be interested in finding (triangulated) subgraphs of G that satisfy some conditions, such as a bound on the number of edges, or a bound on the tree-width and which optimize separable objective functions (described in Section 2) One reason why problems such as this can often be solved easily when the tree-width of G is bounded by some constant is this : If Vij is a separator decomposing G into G(i) and G(j) , then a divide-and-conquer approach would suggest that we try and find optimal subgraphs of G(i) and G(j) and then splice the two together to get an optimal subgraph of G. There are two issues with this approach. First, the optimal subgraphs of G (i) and G(j) need not necessarily match up on Vij , the set of common vertices. Second, even if the two subgraphs agree on the set of common vertices, the graph resulting from splicing the two subgraphs together need not be triangulated (which could happen even if the two subgraphs individually are triangulated). To rectify the situation, we can do the following. We partition the set of subgraphs of G(i) and G(j) into classes, so that any subgraph of G(i) and any subgraph G(j) corresponding to the same class are compatible in the sense that they match up on their intersection namely Vij , and so that by splicing the two subgraphs together, we get a subgraph of G which is acceptable (and in particular is triangulated). Then given optimal subgraphs of both G(i) and G(j) corresponding to each class, we can enumerate over all the classes and pick the best one. Of course, to ensure that we do not repeatedly solve the same problem, we need to work bottom-up (a.k.a dynamic programming) or memoize our solutions. This procedure can be carried out in polynomial (in |V |) time as long as we have only a polynomial number of classes. Now, if we have a polynomial number of classes, these classes need not actually be a partition of all the acceptable subgraphs, though the union of the classes must cover all acceptable subgraphs (so the same subgraph can be contained in more than one class). For our application, every class can be thought of to be the set of subgraphs that satisfy some constraint, and we need to pick a polynomial number of constraints that cover all possibilities. The bound on the tree-width helps us here. If k |Vij | = k, then in any subgraph H of G, H[Vij ] must be one of the 2(2) possible subgraphs k of G[Vij ]. So, if k is sufficiently small (so 2(2) is bounded by some polynomial in |V |), then this procedure results in a polynomial time algorithm. In this paper, we show that in some cases we can characterize the space H so that we still have a polynomial number of constraints even when the tree-width of G is not bounded by a small constant. 2.1 Separable objective functions For cases where exact inference in the graphical model (G, PG ) is intractable, it is natural to try to find a subgraphical model (H, PH ) such that D(PG PH ) is minimized, and inference using H is tractable. We will denote by H the set of subgraphs of G that are tractable for inference. For example, this set could be the set of subgraphs of G with treewidth one less than the treewidth of G, or perhaps the set of subgraphs of G with at d fewer edges. For a specified subgraph H of G, there is a unique probability distribution PH factoring over H that minimizes D(PG PH ). Hence, finding a optimal subgraphical model is equivalent to finding a subgraph H for which D(PG PH ) is minimized. If Vij is a separator of G, we will attempt to find optimal subgraphs of G by finding optimal subgraphs of G (i) and G(j) and splicing them together. However, to do this, we need to ensure that the objective criteria also decomposes along the separator Vij . Suppose that H is any triangulated subgraph of G. Let PG(i) and PG(j) be the (marginalized) distributions of PG on V (i) and V (j) respectively, and PH (i) and PH (j) be the (marginalized) distributions of the distribution PH on V (i) and V (j) where H (i) = H[V (i) ] and H (j) = H[V (j) ], The following result assures us that the KL-divergence also factors according to the separator Vij . Lemma 1. Suppose that (G, PG ) is a graphical model, H is a triangulated subgraph of G, and PH factors over H. Then D(PG PH ) = D(PG(i) PH (i) ) + D(PG(j) PH (j) ) − D(PG[Vij ] PH[Vij ] ). Proof. Since H is a subgraph of G, and Vij is a separator of G, Vij must also be a separator of H. Therefore, PH {Xv }v∈V = PH (i) ({Xv }v∈V (i) )·PH (j) ({Xv }v∈V (j) ) . PH[Vij ] ({Xv }v∈V ) The result ij follows immediately. Therefore, there is hope that we can reduce our our original problem of finding an optimal subgraph H ∈ H as one of finding subgraphs of H (i) ⊆ G(i) and H (j) ⊆ G(j) that are compatible, in the sense that they match up on the overlap Vij , and for which D(PG PH ) is minimized. Throughout this paper, for the sake of concreteness, we will assume that the objective criterion is to minimize the KL-divergence. However, all the results can be extended to other objective functions, as long as they “separate” in the sense that for any separator, the objective function is the sum of the objective functions of the two parts, possibly modulo some correction factor which is purely a function of the separator. Another example might be the complexity r(H) of representing the graphical model H. A very natural representation satisfies r(G) = r(G(i) ) + r(G(j) ) if G has a separator G(i) ∩ G(j) . Therefore, the representation cost reduction would satisfy r(G) − r(H) = (r(G (i) ) − r(H (i) )) + (r(G(j) ) − r(H (j) )), and so also factors according to the separators. Finally note that any linear combinations of such separable functions is also separable, and so this technique could also be used to determine tradeoffs (representation cost vs. KL-divergence loss for example). In Section 4 we discuss some issues regarding computing this function. 2.2 Decompositions and decomposition trees For the algorithms considered in this paper, we will be mostly interested in the decompositions that are specified by the junction tree, and we will represent these decompositions by a rooted tree called a decomposition tree. This representation was introduced in [2, 3], and is similar in spirit to Darwiche’s dtrees [6] which specify decompositions of directed acyclic graphs. In this section and the next, we show how a decomposition tree for a graph may be constructed, and show how it is used to solve a number of optimization problems. abd; ce; gf a; be; cd d; bc; e abe dbc ebc e; cf ; g cef cf g Figure 3: The separator tree corresponding to Figure 1 A decomposition tree for G is a rooted tree whose vertices correspond to separators and cliques of G. We describe the construction of the decomposition tree in terms of a junctiontree T = (K, S) for G. The interior nodes of the decomposition tree R(T ) correspond to S (the links of T and hence the minimal separators of G). The leaf or terminal nodes represent the elements of K (the nodes of T and hence the maximal cliques of G). R(T ) can be recursively constructed from T as follows : If T consists of just one node K, (and hence no edges), then R consists of just one node, which is given the label K as well. If however, T has more than one node, then T must contain at least one link. To begin, let Vi Vj ∈ S be any link in T . Then removal of the link Vi Vj results in two disjoint junctiontrees T (i) and T (j) . We label the root of R by the decomposition (V (i) ; Vij ; V (j) ). The rest of R is recursively built by successively picking links of T (i) and T (j) (decompositions of G(i) and G(j) ) to form the interior nodes of R. The effect of this procedure on the junction tree of Figure 1 is shown in Figure 3, where the decomposition associated with the interior nodes is shown inside the nodes. Let M be the set of all nodes of R(T ). For any interior node M induced by the the link Vi Vj ∈ S of T , then we will let M (i) and M (j) represent the left and right children of M , and R(i) and R(j) be the left and right trees below M . 3 3.1 Finding optimal subgraphical models Optimal sub (k − 1)-trees of k-trees Suppose that G is a k-tree. A sub (k − 1)-tree of G is a subgraph H of G that is (k − 1)tree. Now, if Vij is any minimal separator of G, then both G(i) and G(j) are k-trees on vertex sets V (i) and V (j) respectively. It is clear that the induced subgraphs H[V (i) ] and H[V (j) ] are subgraphs of G(i) and G(j) and are partial (k − 1)-trees. We will be interested in finding sub (k − 1)-trees of k trees and this problem is trivial by the result of [1] when k = 2. Therefore, we assume that k ≥ 3. The following result characterizes the various possibilities for H[Vij ] in this case. Lemma 2. Suppose that G is a k-tree, and S = Vij is a minimal separator of G corresponding to the link ij of the junction-tree T . In any (k − 1)-tree H ⊆ G either 1. There is a u ∈ S such that u is not connected to vertices in both V (i) \ S and V (j) \ S. Then S \ {u} is a minimal separator in H and hence is complete. 2. Every vertex in S is connected to vertices in both V (i) \S and V (j) \S. Then there are vertices {x, y} ⊆ S such that the edge H[S] is missing only the edge {x, y}. Further either H[V (i) ] or H[V (j) ] does not contain a unchorded x-y path. Proof. We consider two possibilities. In the first, there is some vertex u ∈ S such that u is not connected to vertices in both V (i) \S and V (j) \. Since the removal of S disconnects G, the removal of S must also disconnect H. Therefore, S must contain a minimal separator of H. Since H is a (k − 1)-tree, all minimal separators of H must contain k − 1 vertices which must therefore be S \{u}. This corresponds to case (1) above. Clearly this possiblity can occur. If there is no such u ∈ S, then every vertex in S is connected to vertices in both V (i) \ S and V (j) \ S. If x ∈ S is connected to some yi ∈ V (i) \ S and yj ∈ V (j) \ S, then x is contained in every minimal yi /yj separator (see [5]). Therefore, every vertex in S is part of a minimal separator. Since each minimal separator contains k − 1 vertices, there must be at least two distinct minimum separators contained in S. Let Sx = S \ {x} and Sy = S \ {y} be two distinct minimal separators. We claim that H[S] contains all edges except the edge {x, y}. To see this, note that if z, w ∈ S, with z = w and {z, w} = {x, y} (as sets), then either {z, w} ⊂ Sy or {z, w} ⊂ Sx . Since both Sx and Sy are complete in H, this edge must be present in H. The edge {x, y} is not present in H[S] because all minimal separators in H must be of size k − 1. Further, if both V (i) and V (j) contain an unchorded path between x and y, then by joining the two paths at x and y, we get a unchorded cycle in H which contradicts the fact that H is triangulated. Therefore, we may associate k · 2 + 2 · k constraints with each separator Vij of G as 2 follows. There are k possible constraints corresponding to case (1) above (one for each choice of x), and k · 2 choices corresponding to case (2) above. This is because for each 2 pair {x, y} corresponding to the missing edge, we have either V (i) contains no unchorded xy paths or V (j) contains no unchorded xy paths. More explicitly, we can encode the set of constraints CM associated with each separator S corresponding to an interior node M of the decomposition tree as follows: CM = { (x, y, s) : x ∈ S, y ∈ S, s ∈ {i, j}}. If y = x, then this corresponds to case (1) of the above lemma. If s = i, then x is connected only to H (i) and if s = j, then x is connected only to H (j) . If y = x, then this corresponds to case (2) in the above lemma. If s = i, then H (i) does not contain any unchorded path between x and y, and there is no constraint on H (j) . Similarly if s = j, then H (j) does not contain any unchorded path between x and y, and there is no constraint on H (i) . Now suppose that H (i) and H (j) are triangulated subgraphs of G(i) and G(j) respectively, then it is clear that if H (i) and H (j) both satisfy the same constraint they must match up on the common vertices Vij . Therefore to splice together two solutions corresponding to the same constraint, we only need to check that the graph obtained by splicing the graphs is triangulated. Lemma 3. Suppose that H (i) and H (j) are triangulated subgraphs of G(i) and G(j) respectively such that both of them satisfy the same constraint as described above. Then the graph H obtained by splicing H (i) and H (j) together is triangulated. Proof. Suppose that both H (i) and H (j) are both triangulated and both satisfy the same constraint. If both H (i) and H (j) satisfy the same constraint corresponding to case (1) in Lemma 2 and H has an unchorded cycle, then this cycle must involve elements of both H (i) and H (j) . Therefore, there must be two vertices of S \{u} on the cycle, and hence this cycle has a chord as S \ {u} is complete. This contradiction shows that H is triangulated. So assume that both of them satisfy the constraint corresponding to case (2) of Lemma 2. Then if H is not triangulated, there must be a t-cycle (for t ≥ 4) with no chord. Now, since {x, y} is the only missing edge of S in H, and because H (i) and H (j) are individually triangulated, the cycle must contain x, y and vertices of both V (i) \ S and V (j) \ S. We may split this unchorded cycle into two unchorded paths, one contained in V (i) and one in V (j) thus violating our assumption that both H (i) and H (j) satisfy the same constraint. If |S| = k, then there are 2k + 2 · k ∈ O(k 2 ) ∈ O(n2 ). We can use a divide and conquer 2 strategy to find the optimal sub (k − 1) tree once we have taken care of the base case, where G is just a single clique (of k + 1) elements. However, for this case, it is easily checked that any subgraph of G obtained by deleting exactly one edge results in a (k − 1) tree, and every sub (k−1)-tree results from this operation. Therefore, the optimal (k−1)-tree can be found using Algorithm 1, and in this case, the complexity of Algorithm 1 is O(n(k + 1) 2 ). This procedure can be generalized to find the optimal sub (k − d)- tree for any fixed d. However, the number of constraints grows exponentially with d (though it is still polynomial in n). Therefore for small, fixed values of d, we can compute the optimal sub (k − d)-tree of G. While we can compute (k − d)-trees of G by first going from a k tree to a (k − 1) tree, then from a (k − 1)-tree to a (k − 2)-tree, and so on in a greedy fashion, this will not be optimal in general. However, this might be a good algorithm to try when d is large. 3.2 Optimal triangulated subgraphs with |E(G)| − d edges Suppose that we are interested in a (triangulated) subgraph of G that contains d fewer edges that G does. That is, we want to find an optimal subgraph H ⊂ G such that |E(H)| = |E(G)| − d. Note that by the result of [4] there is always a triangulated subgraph with d fewer edges (if d < |E(G)|). Two possibilities for finding such an optimal subgraph are 1. Use the procedure described in [4]. This is a greedy procedure which works in d steps by deleting an edge at each step. At each state, the edge is picked from the set of edges whose deletion leaves a triangulated graph. Then the edge which causes the least increase in KL-divergence is picked at each stage. 2. For each possible subset A of E(G) of size d, whose deletion leaves a triangulated graph, compute the KL divergence using the formula above, and then pick the optimal one. Since there are |E(G)| such sets, this can be done in polynomial d time (in |V (G)|) when d is a constant. The first greedy algorithm is not guaranteed to yield the optimal solution. The second takes time that is O(n2d ). Now, let us solve this problem using the framework we’ve described. Let H be the set of subgraphs of G which may be obtained by deletion of d edges. For each M = ij ∈ M corresponding to the separator Vij , let CM = (l, r, c, s, A) : l + r − c = d, s a d bit string, A ∈ E(G[Vij ]) . The constraint reprec sented by (l, r, c, A) is this : A is a set of d edges of G[Vij ] that are missing in H, l edges are missing from the left subgraph, and r edges are missing from the right subgraph. c represents the double count, and so is subtracted from the total. If k is the size of the largest k clique, then the total number of such constraints is bounded by 2d · 2d · (2) ∈ O(k 2d ) d which could be better than O(n2d ) and is polynomial in |V | when d is constant. See [10] for additional details. 4 Conclusions Algorithm 1 will compute the optimal H ∈ H for the two examples discussed above and is polynomial (for fixed constant d) even if k is O(n). In [10] a generalization is presented which will allow finding the optimal solution for other classes of subgraphical models. Now, we assume an oracle model for computing KL-divergences of probability distributions on vertex sets of cliques. It is clear that these KL-divergences can be computed R ← separator-tree for G; for each vertex M of R in order of increasing height (bottom up) do for each constraint cM of M do if M is an interior vertex of R corresponding to edge ij of the junction tree then Let Ml and Mr be the left and right children of M ; Pick constraint cl ∈ CMl compatible with cM to minimize table[Ml , cl ]; Pick constraint cr ∈ CMr compatible with cM to minimize table[Mr , cr ]; loss ← D(PG [M ] PH [M ]); table[M, cM ] ← table[Ml , cl ] + table[Mr , cr ] − loss; else table[M, cM ] ← D(PG [M ] PH [M ]); end end end Algorithm 1: Finding optimal set of constraints efficiently for distributions like Gaussians, but for discrete distributions this may not be possible when k is large. However even in this case this algorithm will result in only polynomial calls to the oracle. The standard algorithm [3] which is exponential in the treewidth will make O(2k ) calls to this oracle. Therefore, when the cost of computing the KL-divergence is large, this algorithm becomes even more attractive as it results in exponential speedup over the standard algorithm. Alternatively, if we can compute approximate KL-divergences, or approximately optimal solutions, then we can compute an approximate solution by using the same algorithm. References [1] C. Chow and C. Liu, “Approximating discrete probability distributions with dependence trees”, IEEE Transactions on Information Theory, v. 14, 1968, Pages 462–467. [2] F. Gavril, “Algorithms on clique separable graphs”, Discrete Mathematics v. 9 (1977), pp. 159–165. [3] R. E. Tarjan. “Decomposition by Clique Separators”, Discrete Mathematics, v. 55 (1985), pp. 221–232. [4] U. Kjaerulff. “Reduction of computational complexity in Bayesian networks through removal of weak dependencies”, Proceedings of the Tenth Annual Conference on Uncertainty in Artificial Intelligence, pp. 374–382, 1994. [5] T. Kloks, “Treewidth: Computations and Approximations”, Springer-Verlag, 1994. [6] A. Darwiche and M. Hopkins. “Using recursive decomposition to construct elimination orders, jointrees and dtrees”, Technical Report D-122, Computer Science Dept., UCLA. [7] S. Lauritzen. “Graphical Models”, Oxford University Press, Oxford, 1996. [8] T. A. McKee and F. R. McMorris. “Topics in Intersection Graph Theory”, SIAM Monographs on Discrete Mathematics and Applications, 1999. [9] D. Karger and N. Srebro. “Learning Markov networks: Maximum bounded tree-width graphs.” In Symposium on Discrete Algorithms, 2001, Pages 391-401. [10] M. Narasimhan and J. Bilmes. “Optimization on separator-clique trees.”, Technical report UWEETR 2004-10, June 2004.
2 0.6571722 38 nips-2004-Co-Validation: Using Model Disagreement on Unlabeled Data to Validate Classification Algorithms
Author: Omid Madani, David M. Pennock, Gary W. Flake
Abstract: In the context of binary classification, we define disagreement as a measure of how often two independently-trained models differ in their classification of unlabeled data. We explore the use of disagreement for error estimation and model selection. We call the procedure co-validation, since the two models effectively (in)validate one another by comparing results on unlabeled data, which we assume is relatively cheap and plentiful compared to labeled data. We show that per-instance disagreement is an unbiased estimate of the variance of error for that instance. We also show that disagreement provides a lower bound on the prediction (generalization) error, and a tight upper bound on the “variance of prediction error”, or the variance of the average error across instances, where variance is measured across training sets. We present experimental results on several data sets exploring co-validation for error estimation and model selection. The procedure is especially effective in active learning settings, where training sets are not drawn at random and cross validation overestimates error. 1
3 0.65396708 90 nips-2004-Joint Probabilistic Curve Clustering and Alignment
Author: Scott J. Gaffney, Padhraic Smyth
Abstract: Clustering and prediction of sets of curves is an important problem in many areas of science and engineering. It is often the case that curves tend to be misaligned from each other in a continuous manner, either in space (across the measurements) or in time. We develop a probabilistic framework that allows for joint clustering and continuous alignment of sets of curves in curve space (as opposed to a fixed-dimensional featurevector space). The proposed methodology integrates new probabilistic alignment models with model-based curve clustering algorithms. The probabilistic approach allows for the derivation of consistent EM learning algorithms for the joint clustering-alignment problem. Experimental results are shown for alignment of human growth data, and joint clustering and alignment of gene expression time-course data.
4 0.62571824 48 nips-2004-Convergence and No-Regret in Multiagent Learning
Author: Michael Bowling
Abstract: Learning in a multiagent system is a challenging problem due to two key factors. First, if other agents are simultaneously learning then the environment is no longer stationary, thus undermining convergence guarantees. Second, learning is often susceptible to deception, where the other agents may be able to exploit a learner’s particular dynamics. In the worst case, this could result in poorer performance than if the agent was not learning at all. These challenges are identifiable in the two most common evaluation criteria for multiagent learning algorithms: convergence and regret. Algorithms focusing on convergence or regret in isolation are numerous. In this paper, we seek to address both criteria in a single algorithm by introducing GIGA-WoLF, a learning algorithm for normalform games. We prove the algorithm guarantees at most zero average regret, while demonstrating the algorithm converges in many situations of self-play. We prove convergence in a limited setting and give empirical results in a wider variety of situations. These results also suggest a third new learning criterion combining convergence and regret, which we call negative non-convergence regret (NNR). 1
5 0.490576 104 nips-2004-Linear Multilayer Independent Component Analysis for Large Natural Scenes
Author: Yoshitatsu Matsuda, Kazunori Yamaguchi
Abstract: In this paper, linear multilayer ICA (LMICA) is proposed for extracting independent components from quite high-dimensional observed signals such as large-size natural scenes. There are two phases in each layer of LMICA. One is the mapping phase, where a one-dimensional mapping is formed by a stochastic gradient algorithm which makes more highlycorrelated (non-independent) signals be nearer incrementally. Another is the local-ICA phase, where each neighbor (namely, highly-correlated) pair of signals in the mapping is separated by the MaxKurt algorithm. Because LMICA separates only the highly-correlated pairs instead of all ones, it can extract independent components quite efficiently from appropriate observed signals. In addition, it is proved that LMICA always converges. Some numerical experiments verify that LMICA is quite efficient and effective in large-size natural image processing.
6 0.48783299 160 nips-2004-Seeing through water
7 0.48333338 19 nips-2004-An Application of Boosting to Graph Classification
8 0.47175068 150 nips-2004-Proximity Graphs for Clustering and Manifold Learning
9 0.47146347 161 nips-2004-Self-Tuning Spectral Clustering
10 0.47009668 3 nips-2004-A Feature Selection Algorithm Based on the Global Minimization of a Generalization Error Bound
11 0.46971819 108 nips-2004-Markov Networks for Detecting Overalpping Elements in Sequence Data
12 0.46941856 45 nips-2004-Confidence Intervals for the Area Under the ROC Curve
13 0.46930003 44 nips-2004-Conditional Random Fields for Object Recognition
14 0.46887302 23 nips-2004-Analysis of a greedy active learning strategy
15 0.4682956 67 nips-2004-Exponentiated Gradient Algorithms for Large-margin Structured Classification
16 0.4681803 167 nips-2004-Semi-supervised Learning with Penalized Probabilistic Clustering
17 0.46744713 2 nips-2004-A Direct Formulation for Sparse PCA Using Semidefinite Programming
18 0.46716502 72 nips-2004-Generalization Error and Algorithmic Convergence of Median Boosting
19 0.46612269 7 nips-2004-A Large Deviation Bound for the Area Under the ROC Curve
20 0.4659892 77 nips-2004-Hierarchical Clustering of a Mixture Model