jmlr jmlr2006 jmlr2006-25 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Thomas Kämpke
Abstract: Similarity of edge labeled graphs is considered in the sense of minimum squared distance between corresponding values. Vertex correspondences are established by isomorphisms if both graphs are of equal size and by subisomorphisms if one graph has fewer vertices than the other. Best fit isomorphisms and subisomorphisms amount to solutions of quadratic assignment problems and are computed exactly as well as approximately by minimum cost flow, linear assignment relaxations and related graph algorithms. Keywords: assignment problem, best approximation, branch and bound, inexact graph matching, model data base
Reference: text
sentIndex sentText sentNum sentScore
1 Vertex correspondences are established by isomorphisms if both graphs are of equal size and by subisomorphisms if one graph has fewer vertices than the other. [sent-4, score-0.703]
2 Best fit isomorphisms and subisomorphisms amount to solutions of quadratic assignment problems and are computed exactly as well as approximately by minimum cost flow, linear assignment relaxations and related graph algorithms. [sent-5, score-0.83]
3 Keywords: assignment problem, best approximation, branch and bound, inexact graph matching, model data base 1. [sent-6, score-0.536]
4 Typical roles of the two graphs are that of a model graph from a model data base or a prototype data base and that of an instance graph representing an 'as is' structure which is encountered 'at run time'. [sent-16, score-0.554]
5 Moreover, the graphs are simple, which means that there is at most one edge between any two vertices and there are no loops, so no edge begins and ends in the same vertex. [sent-36, score-0.619]
6 The graph vertices denote objects or states and the edge labels denote distances, transition times etc. [sent-49, score-0.663]
7 1 Problem Formulation. The best approximation of a labeled graph by another labeled graph is defined by a subisomorphism from the vertex set of the first graph to the vertex set of the second graph. [sent-52, score-2.065]
8 Thereby, the edge labels of the first graph are approximated by the corresponding edge labels of the second graph as the minimum sum of squared differences. [sent-53, score-0.898]
9 The first graph has n = |V1 | vertices and the second graph has m = |V2 | vertices with n ≤ m. [sent-55, score-0.92]
10 In order to be well defined, the problem entails a technical condition that, whenever two vertices of the first graph are joined by an edge, the images of the two vertices must be joined by an edge in the second graph. [sent-59, score-0.917]
11 The edge labelings then denote distance matrices D1, D2 with entries D1(vi, vj) = l1(vi, vj) for vi ≠ vj and D1(vi, vj) = 0 for vi = vj, and D2(wi, wj) = l2(wi, wj) for wi ≠ wj and D2(wi, wj) = 0 for wi = wj. [sent-62, score-0.8]
12 The best approximation objective can then be written as follows since counting over all edges amounts to counting twice over all vertex pairs. [sent-63, score-0.598]
13 Suppose H1 and H2 are two unlabeled graphs with same number of vertices and arbitrary edge sets. [sent-73, score-0.5]
14 Each graph is extended to the complete graph on its vertex set and receives the edge labels li(e) := 1 if edge e belongs to the original graph Hi and li(e) := 0 if edge e does not belong to the original graph Hi, i = 1, 2. [sent-74, score-1.84]
15 The single-edge graph is best approximated by that edge from the larger graph whose label comes closest to the label of the single-edge graph. [sent-79, score-0.787]
16 Best graph approximation can be considered as search for a minimum weight clique of given size in a suitably defined graph, the association graph or correspondence graph. [sent-81, score-0.564]
17 Each vertex of one graph is paired with each vertex of the other graph and each such pair is identified with a vertex of the association graph. [sent-85, score-1.697]
18 Two vertices of the association graph are joined by an edge if the two vertices stem from four distinct vertices of the original graphs. [sent-86, score-1.121]
19 The subisomorphism is ϕ(v1 ) = w1 , ϕ(v2 ) = w4 and ϕ(v3 ) = w3 with the two vertices w2 and w5 being unattained. [sent-89, score-0.638]
20 The cost of the edge between the vertices (vi, wj) and (vk, wl) is set equal to (D1(vi, vk) − D2(wj, wl))^2. [sent-93, score-0.638]
21 The meaning of selecting any vertex of the form (vi , w j ) is that the original vertex vi is mapped to the original vertex w j by a subisomorphism. [sent-94, score-1.45]
22 These "associations" motivate the name association graph, and the cost of a clique, which equals the cost sum over all edges between selected vertices, is the objective of graph approximation. [sent-95, score-0.764]
23 The binary variable xij attaining the value one means that vertex vi is assigned to vertex wj, and the variable attaining the value zero means that this assignment is not made. [sent-103, score-1.231]
24 The vector x has n · m coordinates and C is an n · m × n · m matrix denoting the cost incurred by pairwise assignments; the assignments vi → w j and vk → wl entail the cost (D1 (vi , vk ) − D2 (w j , wl ))2 . [sent-104, score-0.777]
25 The indicated clique of the vertices (v1 , w4 ), (v2 , w2 ) and (v3 , w3 ) denotes the subisomorphism ϕ(v1 ) = w4 , ϕ(v2 ) = w2 and ϕ(v3 ) = w3 . [sent-106, score-0.671]
26 That problem does not admit edge labels and it considers the two operations of vertex removal and edge removal for the transition from the larger to the smaller graph. [sent-126, score-0.753]
27 Here, only vertex removals matter and edge removals are allowed only in so far as they are implied by vertex removals. [sent-129, score-1.022]
28 One graph is therefore modified by a minimum number of vertex insertions and deletions and by edge insertions and deletions. [sent-131, score-0.779]
29 Conflicts of tentatively assigning several vertices of the smaller graph to one vertex of the larger graph are resolved in a hierarchical manner. [sent-137, score-1.138]
30 When these are dropped, that is, when the numbers are given as mere entries of a single list and are viewed as Euclidean distances on the real line, a complete line graph may be searched for such that the given numbers form a coherent distance labeling. [sent-161, score-0.588]
31 The distance list of a vertex contains the labels of all edges that are incident with the vertex. [sent-191, score-0.817]
32 The distance list of a vertex v is denoted by distlist(v). [sent-193, score-0.625]
33 A distance list in a complete graph with edge labels D(·, ·) is given by distlist(v) = (D(v, vi))_{vi ∈ V − {v}}. [sent-194, score-0.893]
34 Distance lists generalize vertex degrees since they count the "ones" when unlabeled graphs receive the binary edge labeling as given above for the embedding of the graph isomorphism problem. [sent-196, score-1.108]
35 In the foregoing example, selections of two out of the four coordinates from the distance lists of the five-vertex graph are made. [sent-203, score-0.524]
36 The approximation of a distance list by a selection from a larger distance list can be formulated as a weighted or cost minimal assignment problem. [sent-204, score-0.772]
37 Each coordinate receives one vertex and each coordinate of one distance list is connected by an edge to each coordinate of the other distance list. [sent-212, score-0.862]
38 Figure 4: Complete bipartite graph for two distance lists. [sent-227, score-0.542]
39 The vertices correspond to the coordinates of the distance lists rather than to the vertices of the original graphs. [sent-228, score-0.697]
40 Alternatively, the best approximation problem for distance lists can be formulated as a cost minimal perfect matching problem and as a cost minimal integral flow problem with the flow value set to n − 1, compare with Section 4. [sent-235, score-0.581]
41 The best approximation of a distance list distlist(v) by the distance list distlist(w) is denoted as the projection pr(distlist(w), distlist(v)). [sent-236, score-0.502]
42 2 Best Approximations by Distance Lists. The idea of cost minimal assignments can be carried over from distance lists of single vertices to the whole graph. [sent-240, score-0.62]
43 Figure 5: Complete bipartite graph for a heuristic solution of best graph approximation. [sent-258, score-0.7]
44 A greedy heuristic for best graph approximation can now be based on iterative decisions according to minimum distance list errors. [sent-260, score-0.51]
45 While the sets A and B of unassigned vertices decrease along the iterations of the algorithm, the distance lists to consider become fewer but not smaller. [sent-277, score-0.522]
46 The resulting subisomorphism is given by selecting the minima for each vertex from the smaller graph. [sent-281, score-0.811]
47 An optimal solution of this problem (which still need not lead to the best graph approximation since the assignment problem is only an approximate encoding of the best graph approximation problem) can be found by any weighted assignment algorithm for bipartite graphs. [sent-284, score-1.042]
48 Therefore, all vertices of the smaller graph are connected to an extra source vertex and all vertices of the larger graph are connected to an extra sink vertex. [sent-286, score-1.372]
49 This flow amounts to a cost minimal assignment of all vertices from the smaller graph. [sent-290, score-0.529]
50 A different solution for cost minimal assignment problems can be obtained from straightforward transformations to cost minimal perfect matching problems by introducing additional vertices for the smaller graph. [sent-293, score-0.682]
51 All dummy vertices for the smaller graph are connected to all vertices from the larger graph by edges with zero cost. [sent-295, score-1.031]
52 A matching is perfect if each vertex of the graph is covered by an edge. [sent-297, score-0.673]
53 The edges that are incident either to the source or the sink carry capacities but no cost coefficients while the edges with cost coefficients do not carry capacities. [sent-315, score-0.509]
54 3 Direct Methods. Direct methods for graph approximation will use the original edge labels and the unaltered best approximation errors for all fitting assessments. [sent-317, score-0.561]
55 1 Sequential Assignments. A subisomorphism can be constructed by sequentially assigning vertices from the smaller graph to the larger graph so that the sum of squared label distances over all new edge pairs is minimal. [sent-320, score-1.328]
56 Selection of one vertex v1 of the two vertices incident with e0 . [sent-327, score-0.682]
57 Selection of one vertex w1 of the two vertices incident with f 0 . [sent-328, score-0.682]
58 2 Improvements. Whenever a subisomorphism is not optimal or not known to be optimal, improvements can be attempted by swapping two vertex assignments or by swapping an assigned with an unassigned vertex. [sent-346, score-1.046]
59 First, two vertices vi, vj with 1 ≤ i ≠ j ≤ n are considered for swapping their assignments while all other assignments are preserved. [sent-352, score-0.655]
60 The new subisomorphism leads to a smaller approximation error if and only if $\sum_{k=1,\,k\neq i,j}^{n} \bigl(l_1(v_i, v_k) - l_1(v_j, v_k)\bigr)\cdot\bigl(l_2(w_j, w_k) - l_2(w_i, w_k)\bigr) > 0$. [sent-354, score-0.795]
61 Second, one vertex vi, 1 ≤ i ≤ n, is considered for changing its assignment to an unassigned vertex w0 ∈ V2 − ϕ(V1), that is, the present subisomorphism is compared to the new subisomorphism ϕ′ with ϕ′(vk) = w0 if k = i and ϕ′(vk) = wk if k ≠ i. [sent-356, score-1.986]
62 The new subisomorphism leads to a smaller approximation error if and only if $\sum_{k=1,\,k\neq i}^{n} \bigl(l_2(w_i, w_k)^2 - l_2(w_0, w_k)^2\bigr) > 2 \sum_{k=1,\,k\neq i}^{n} l_1(v_i, v_k)\cdot\bigl(l_2(w_i, w_k) - l_2(w_0, w_k)\bigr)$. [sent-357, score-0.888]
63 Each vertex from that set is paired with all vertices from the second vertex set which amounts to considering their common vertices in the association graph. [sent-383, score-1.297]
64 This minimization implies a function from the first vertex set into the second vertex set by vi → wµ(i) . [sent-392, score-1.046]
65 2 Improvement. After initialization, the relaxation method proceeds by iteratively selecting an overassigned vertex and redirecting or backtracking at least one assignment to a yet unassigned vertex. [sent-408, score-0.745]
66 All edges of the current structure are oriented as backward edges from the second vertex set to the first vertex set and all other edges are oriented as forward edges from the first to the second vertex set, see Figure 8. [sent-412, score-1.656]
67 The wave front begins in that vertex and ends in a suitable vertex from the second set from which no edge leads back into the first set. [sent-417, score-1.048]
68 The labels are potentials which equal the cost of functions from the first vertex set or a subset thereof into the second vertex set. [sent-419, score-0.949]
69 The set S(u) denotes the set of all immediate successors of vertex u which is the set of all vertices to which an edge points from vertex u. [sent-439, score-1.185]
70 The graph vertices which are not contained in the list are permanently labeled. [sent-441, score-0.544]
71 The algorithm terminates as soon as the first yet unassigned vertex from the second set receives a permanent label. [sent-442, score-0.529]
72 It may occur during wave front propagation that an already assigned vertex from the second set is permanently labeled. [sent-443, score-0.498]
73 The node potential algorithm terminates with exactly one additional vertex assignment from the second graph. [sent-445, score-0.589]
74 2 can be obtained from the node potential algorithm NoPo by starting at a vertex that is attained exactly once (with all vertices of the second set being attained at most once). [sent-449, score-0.635]
75 The idea is that of selecting a vertex from the larger graph as pivot element. [sent-465, score-0.671]
76 This means that best graph approximation is tentatively reduced to only those subisomorphisms which attain that vertex, and the optimum of such pivot-subisomorphisms is computed exactly or approximately. [sent-466, score-0.908]
77 Eventually, all vertices of the second graph are chosen as pivot elements and the pivot-subisomorphism with the smallest objective value is reported as the best one. [sent-467, score-0.535]
78 Any subisomorphism which attains the pivot element w j0 ∈ V2 from some vertex vi0 ∈ V1 is denoted by ϕi0 → j0 . [sent-469, score-0.872]
79 A candidate for the best subisomorphism of this type will be computed and the best ϕj0 of these over all vertices from the first graph is selected by independent computations. [sent-470, score-0.941]
80 Figure 9: Complete bipartite graph for pivot vertex and preselected vertex from the first graph. [sent-492, score-1.274]
81 1 Flows. Best graph approximation can be formulated as a cost minimal flow problem in loose analogy to the distance list heuristic. [sent-518, score-0.659]
82 The main problem of transforming the best graph approximation into a flow problem is that subisomorphisms refer to vertices while the costs refer to edge pairs. [sent-521, score-0.836]
83 The cost issue is therefore dealt with by introducing a biquadratic number of network vertices that represent all possible edge pairings induced by the vertex assignments. [sent-522, score-0.865]
84 Each pair of distinct vertices from the smaller graph may correspond to each pair of distinct vertices from the larger graph. [sent-523, score-0.691]
85 The incurred cost is an edge label with the edge connecting a network vertex (vi , v j , wk , wl ) with the sink in the flow network. [sent-524, score-0.963]
86 A subisomorphism in the network is specified by considering each vertex vi of the smaller graph and all its possible assignments by introducing the network vertices (vi, w1), ... [sent-525, score-1.347]
87 Since each vertex of the second graph is attained at most once by any subisomorphism, the edges into the network vertices (v 1 , w j ), . [sent-530, score-0.975]
88 ..., (vi, wm) together allow an inflow of strength one only for each of the vertices vi. [sent-554, score-0.529]
89 The flow from the source vertex to each of the graph vertices is bounded by one and the flow out of each vertex (vi, wj) is bounded by 1/(n − 1). [sent-557, score-1.268]
90 The reason for this bound is that each vertex of the smaller graph is incident with all n − 1 other vertices and thus n − 1 edges of the smaller graph are incident with each vertex. [sent-558, score-1.298]
91 A best graph approximation amounts to a cost minimal flow of strength n from source to sink such that the flows along the edges between all vi and (vi , w j ) are integer. [sent-576, score-0.822]
92 A partial subisomorphism is a subisomorphism defined over a subset of the vertex set of the smaller graph. [sent-581, score-1.249]
93 The domain of a partial subisomorphism is denoted by A(V1 ) and the partial subisomorphism itself is denoted by ϕ|A(V1 ) . [sent-582, score-0.922]
94 The special case of the partial subisomorphism being defined over the complete vertex set of the smaller graph is denoted by ϕ = ϕ|V1 . [sent-583, score-1.119]
95 A lower bound for the approximation distance of a partial subisomorphism can be obtained by independently minimizing distances between unassigned and assigned vertices. [sent-584, score-0.741]
96 The vertex set of the bipartite graph for the assignment problem consists of both sets of unassigned vertices, which are V1 − A(V1) and V2 − ϕ(A(V1)). [sent-588, score-1.188]
97 Figure 12: Assignment problem for improved lower bound of partial subisomorphism ϕ|A(V1). [sent-602, score-0.706]
98 The value M is the approximation distance of the best subisomorphism found so far and Lopt is a list of all these best subisomorphisms. [sent-611, score-0.725]
99 Whenever a partial subisomorphism is not pruned by the bounding step, the next iteration of step 2 may or may not select a refinement of this partial subisomorphism to continue with. [sent-636, score-0.876]
100 Conclusion. Best graph approximation has been formulated for labeled graphs in analogy to isomorphism for unlabeled graphs. [sent-638, score-0.631]
wordName wordTfidf (topN-words)
[('subisomorphism', 0.407), ('vertex', 0.404), ('distlist', 0.26), ('vi', 0.238), ('vertices', 0.231), ('graph', 0.229), ('assignment', 0.185), ('subisomorphisms', 0.147), ('edge', 0.146), ('ampke', 0.124), ('cap', 0.124), ('isomorphism', 0.12), ('rr', 0.117), ('distance', 0.114), ('istance', 0.113), ('val', 0.112), ('edges', 0.111), ('imilarity', 0.096), ('tructural', 0.096), ('graphs', 0.096), ('lopt', 0.091), ('unassigned', 0.091), ('wk', 0.088), ('lists', 0.086), ('ow', 0.085), ('cost', 0.084), ('list', 0.084), ('vk', 0.083), ('imp', 0.079), ('dl', 0.079), ('assignments', 0.076), ('wm', 0.06), ('wave', 0.06), ('similarity', 0.06), ('labels', 0.057), ('distances', 0.052), ('branch', 0.051), ('subgraph', 0.049), ('grid', 0.049), ('bipartite', 0.048), ('sink', 0.048), ('wl', 0.047), ('incident', 0.047), ('approximation', 0.046), ('messmer', 0.045), ('tentatively', 0.045), ('wiskott', 0.045), ('patterns', 0.043), ('analogy', 0.041), ('ows', 0.041), ('labeled', 0.04), ('matching', 0.04), ('vn', 0.04), ('pivot', 0.038), ('submodular', 0.038), ('best', 0.037), ('coordinates', 0.035), ('inexact', 0.034), ('ahuja', 0.034), ('digest', 0.034), ('foregoing', 0.034), ('overassigned', 0.034), ('overassignments', 0.034), ('pami', 0.034), ('permanent', 0.034), ('removals', 0.034), ('rrdt', 0.034), ('shasha', 0.034), ('swapping', 0.034), ('squared', 0.034), ('front', 0.034), ('clique', 0.033), ('formulated', 0.032), ('wi', 0.032), ('partial', 0.031), ('relaxation', 0.031), ('minimal', 0.029), ('edit', 0.029), ('garey', 0.029), ('swaps', 0.029), ('whenever', 0.028), ('association', 0.027), ('computations', 0.027), ('unlabeled', 0.027), ('selections', 0.026), ('joined', 0.026), ('dom', 0.026), ('isomorphic', 0.026), ('complete', 0.025), ('arcs', 0.024), ('capacities', 0.024), ('approximations', 0.023), ('denoted', 0.023), ('argminw', 0.023), ('bruno', 0.023), ('bunke', 0.023), ('caetano', 0.023), ('champin', 0.023), ('desper', 0.023), ('elasticity', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 25 jmlr-2006-Distance Patterns in Structural Similarity
Author: Thomas Kämpke
Abstract: Similarity of edge labeled graphs is considered in the sense of minimum squared distance between corresponding values. Vertex correspondences are established by isomorphisms if both graphs are of equal size and by subisomorphisms if one graph has fewer vertices than the other. Best fit isomorphisms and subisomorphisms amount to solutions of quadratic assignment problems and are computed exactly as well as approximately by minimum cost flow, linear assignment relaxations and related graph algorithms. Keywords: assignment problem, best approximation, branch and bound, inexact graph matching, model data base
2 0.18033689 5 jmlr-2006-A Robust Procedure For Gaussian Graphical Model Search From Microarray Data WithpLarger Thann
Author: Robert Castelo, Alberto Roverato
Abstract: Learning of large-scale networks of interactions from microarray data is an important and challenging problem in bioinformatics. A widely used approach is to assume that the available data constitute a random sample from a multivariate distribution belonging to a Gaussian graphical model. As a consequence, the prime objects of inference are full-order partial correlations which are partial correlations between two variables given the remaining ones. In the context of microarray data the number of variables exceed the sample size and this precludes the application of traditional structure learning procedures because a sampling version of full-order partial correlations does not exist. In this paper we consider limited-order partial correlations, these are partial correlations computed on marginal distributions of manageable size, and provide a set of rules that allow one to assess the usefulness of these quantities to derive the independence structure of the underlying Gaussian graphical model. Furthermore, we introduce a novel structure learning procedure based on a quantity, obtained from limited-order partial correlations, that we call the non-rejection rate. The applicability and usefulness of the procedure are demonstrated by both simulated and real data. Keywords: Gaussian distribution, gene network, graphical model, microarray data, non-rejection rate, partial correlation, small-sample inference
3 0.11793937 53 jmlr-2006-Learning a Hidden Hypergraph
Author: Dana Angluin, Jiang Chen
Abstract: We consider the problem of learning a hypergraph using edge-detecting queries. In this model, the learner may query whether a set of vertices induces an edge of the hidden hypergraph or not. We show that an r-uniform hypergraph with m edges and n vertices is learnable with O(2 4r m · poly(r, log n)) queries with high probability. The queries can be made in O(min(2 r (log m + r)2 , (log m + r)3 )) rounds. We also give an algorithm that learns an almost uniform hypergraph of ∆ ∆ dimension r using O(2O((1+ 2 )r) · m1+ 2 · poly(log n)) queries with high probability, where ∆ is the difference between the maximum and the minimum edge sizes. This upper bound matches our ∆ lower bound of Ω(( m∆ )1+ 2 ) for this class of hypergraphs in terms of dependence on m. The 1+ 2 queries can also be made in O((1 + ∆) · min(2r (log m + r)2 , (log m + r)3 )) rounds. Keywords: query learning, hypergraph, multiple round algorithm, sampling, chemical reaction network
Author: Shai Shalev-Shwartz, Yoram Singer
Abstract: We discuss the problem of learning to rank labels from a real valued feedback associated with each label. We cast the feedback as a preferences graph where the nodes of the graph are the labels and edges express preferences over labels. We tackle the learning problem by defining a loss function for comparing a predicted graph with a feedback graph. This loss is materialized by decomposing the feedback graph into bipartite sub-graphs. We then adopt the maximum-margin framework which leads to a quadratic optimization problem with linear constraints. While the size of the problem grows quadratically with the number of the nodes in the feedback graph, we derive a problem of a significantly smaller size and prove that it attains the same minimum. We then describe an efficient algorithm, called SOPOPO, for solving the reduced problem by employing a soft projection onto the polyhedron defined by a reduced set of constraints. We also describe and analyze a wrapper procedure for batch learning when multiple graphs are provided for training. We conclude with a set of experiments which show significant improvements in run time over a state of the art interior-point algorithm.
Author: Juho Rousu, Craig Saunders, Sandor Szedmak, John Shawe-Taylor
Abstract: We present a kernel-based algorithm for hierarchical text classification where the documents are allowed to belong to more than one category at a time. The classification model is a variant of the Maximum Margin Markov Network framework, where the classification hierarchy is represented as a Markov tree equipped with an exponential family defined on the edges. We present an efficient optimization algorithm based on incremental conditional gradient ascent in single-example subspaces spanned by the marginal dual variables. The optimization is facilitated with a dynamic programming based algorithm that computes best update directions in the feasible set. Experiments show that the algorithm can feasibly optimize training sets of thousands of examples and classification hierarchies consisting of hundreds of nodes. Training of the full hierarchical model is as efficient as training independent SVM-light classifiers for each node. The algorithm’s predictive accuracy was found to be competitive with other recently introduced hierarchical multicategory or multilabel classification learning algorithms. Keywords: kernel methods, hierarchical classification, text categorization, convex optimization, structured outputs
6 0.064134471 2 jmlr-2006-A Graphical Representation of Equivalence Classes of AMP Chain Graphs
7 0.059142426 46 jmlr-2006-Learning Factor Graphs in Polynomial Time and Sample Complexity
8 0.057918385 52 jmlr-2006-Learning Spectral Clustering, With Application To Speech Separation
9 0.054950647 95 jmlr-2006-Walk-Sums and Belief Propagation in Gaussian Graphical Models
10 0.052020136 89 jmlr-2006-Structured Prediction, Dual Extragradient and Bregman Projections (Special Topic on Machine Learning and Optimization)
11 0.046741452 92 jmlr-2006-Toward Attribute Efficient Learning of Decision Lists and Parities
13 0.042016063 68 jmlr-2006-On the Complexity of Learning Lexicographic Strategies
15 0.041566744 54 jmlr-2006-Learning the Structure of Linear Latent Variable Models
16 0.041497074 28 jmlr-2006-Estimating the "Wrong" Graphical Model: Benefits in the Computation-Limited Setting
17 0.039557204 60 jmlr-2006-Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples
18 0.039431661 37 jmlr-2006-Incremental Algorithms for Hierarchical Classification
19 0.038354378 35 jmlr-2006-Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation
20 0.034166817 20 jmlr-2006-Collaborative Multiagent Reinforcement Learning by Payoff Propagation
topicId topicWeight
[(0, 0.191), (1, -0.056), (2, -0.147), (3, 0.094), (4, -0.064), (5, 0.056), (6, 0.43), (7, -0.061), (8, -0.149), (9, 0.111), (10, -0.104), (11, 0.105), (12, 0.077), (13, 0.023), (14, 0.151), (15, -0.007), (16, 0.105), (17, 0.051), (18, -0.122), (19, 0.101), (20, 0.061), (21, -0.002), (22, 0.012), (23, 0.072), (24, -0.058), (25, -0.087), (26, -0.043), (27, -0.081), (28, 0.066), (29, -0.033), (30, 0.052), (31, -0.085), (32, -0.03), (33, 0.038), (34, -0.116), (35, 0.009), (36, 0.093), (37, -0.132), (38, 0.029), (39, 0.032), (40, 0.042), (41, 0.015), (42, 0.021), (43, 0.0), (44, 0.001), (45, -0.005), (46, -0.051), (47, -0.023), (48, 0.004), (49, 0.042)]
simIndex simValue paperId paperTitle
same-paper 1 0.98240143 25 jmlr-2006-Distance Patterns in Structural Similarity
2 0.76393253 5 jmlr-2006-A Robust Procedure For Gaussian Graphical Model Search From Microarray Data WithpLarger Thann
3 0.72344875 53 jmlr-2006-Learning a Hidden Hypergraph
4 0.40994337 2 jmlr-2006-A Graphical Representation of Equivalence Classes of AMP Chain Graphs
Author: Alberto Roverato, Milan Studený
Abstract: This paper deals with chain graph models under alternative AMP interpretation. A new representative of an AMP Markov equivalence class, called the largest deflagged graph, is proposed. The representative is based on revealed internal structure of the AMP Markov equivalence class. More specifically, the AMP Markov equivalence class decomposes into finer strong equivalence classes and there exists a distinguished strong equivalence class among those forming the AMP Markov equivalence class. The largest deflagged graph is the largest chain graph in that distinguished strong equivalence class. A composed graphical procedure to get the largest deflagged graph on the basis of any AMP Markov equivalent chain graph is presented. In general, the largest deflagged graph differs from the AMP essential graph, which is another representative of the AMP Markov equivalence class. Keywords: chain graph, AMP Markov equivalence, strong equivalence, largest deflagged graph, component merging procedure, deflagging procedure, essential graph
6 0.31370777 46 jmlr-2006-Learning Factor Graphs in Polynomial Time and Sample Complexity
7 0.27817515 95 jmlr-2006-Walk-Sums and Belief Propagation in Gaussian Graphical Models
8 0.26565483 54 jmlr-2006-Learning the Structure of Linear Latent Variable Models
10 0.26054591 68 jmlr-2006-On the Complexity of Learning Lexicographic Strategies
12 0.23046029 52 jmlr-2006-Learning Spectral Clustering, With Application To Speech Separation
15 0.17773385 35 jmlr-2006-Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation
16 0.1753982 28 jmlr-2006-Estimating the "Wrong" Graphical Model: Benefits in the Computation-Limited Setting
17 0.17160639 60 jmlr-2006-Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples
18 0.17077015 20 jmlr-2006-Collaborative Multiagent Reinforcement Learning by Payoff Propagation
19 0.15827321 10 jmlr-2006-Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems
20 0.15788965 7 jmlr-2006-A Simulation-Based Algorithm for Ergodic Control of Markov Chains Conditioned on Rare Events
topicId topicWeight
[(8, 0.012), (14, 0.222), (36, 0.052), (45, 0.027), (46, 0.011), (50, 0.045), (61, 0.213), (63, 0.035), (76, 0.057), (78, 0.018), (79, 0.022), (81, 0.023), (84, 0.02), (90, 0.018), (91, 0.061), (96, 0.067)]
simIndex simValue paperId paperTitle
same-paper 1 0.84295249 25 jmlr-2006-Distance Patterns in Structural Similarity
2 0.69361997 3 jmlr-2006-A Hierarchy of Support Vector Machines for Pattern Detection
Author: Hichem Sahbi, Donald Geman
Abstract: We introduce a computational design for pattern detection based on a tree-structured network of support vector machines (SVMs). An SVM is associated with each cell in a recursive partitioning of the space of patterns (hypotheses) into increasingly finer subsets. The hierarchy is traversed coarse-to-fine and each chain of positive responses from the root to a leaf constitutes a detection. Our objective is to design and build a network which balances overall error and computation. Initially, SVMs are constructed for each cell with no constraints. This “free network” is then perturbed, cell by cell, into another network, which is “graded” in two ways: first, the number of support vectors of each SVM is reduced (by clustering) in order to adjust to a pre-determined, increasing function of cell depth; second, the decision boundaries are shifted to preserve all positive responses from the original set of training data. The limits on the numbers of clusters (virtual support vectors) result from minimizing the mean computational cost of collecting all detections subject to a bound on the expected number of false positives. When applied to detecting faces in cluttered scenes, the patterns correspond to poses and the free network is already faster and more accurate than applying a single pose-specific SVM many times. The graded network promotes very rapid processing of background regions while maintaining the discriminatory power of the free network. Keywords: statistical learning, hierarchy of classifiers, coarse-to-fine computation, support vector machines, face detection
3 0.66469681 83 jmlr-2006-Sparse Boosting
Author: Peter Bühlmann, Bin Yu
Abstract: We propose Sparse Boosting (the SparseL2 Boost algorithm), a variant on boosting with the squared error loss. SparseL2 Boost yields sparser solutions than the previously proposed L2 Boosting by minimizing some penalized L2 -loss functions, the FPE model selection criteria, through smallstep gradient descent. Although boosting may give already relatively sparse solutions, for example corresponding to the soft-thresholding estimator in orthogonal linear models, there is sometimes a desire for more sparseness to increase prediction accuracy and ability for better variable selection: such goals can be achieved with SparseL2 Boost. We prove an equivalence of SparseL2 Boost to Breiman’s nonnegative garrote estimator for orthogonal linear models and demonstrate the generic nature of SparseL2 Boost for nonparametric interaction modeling. For an automatic selection of the tuning parameter in SparseL2 Boost we propose to employ the gMDL model selection criterion which can also be used for early stopping of L2 Boosting. Consequently, we can select between SparseL2 Boost and L2 Boosting by comparing their gMDL scores. Keywords: lasso, minimum description length (MDL), model selection, nonnegative garrote, regression
4 0.42561886 53 jmlr-2006-Learning a Hidden Hypergraph
6 0.39002383 5 jmlr-2006-A Robust Procedure For Gaussian Graphical Model Search From Microarray Data WithpLarger Thann
7 0.38913596 52 jmlr-2006-Learning Spectral Clustering, With Application To Speech Separation
8 0.38848156 95 jmlr-2006-Walk-Sums and Belief Propagation in Gaussian Graphical Models
11 0.37780437 20 jmlr-2006-Collaborative Multiagent Reinforcement Learning by Payoff Propagation
12 0.37679896 49 jmlr-2006-Learning Parts-Based Representations of Data
13 0.37551823 47 jmlr-2006-Learning Image Components for Object Recognition
14 0.37243363 43 jmlr-2006-Large Scale Multiple Kernel Learning (Special Topic on Machine Learning and Optimization)
16 0.36927804 9 jmlr-2006-Accurate Error Bounds for the Eigenvalues of the Kernel Matrix
17 0.36402112 28 jmlr-2006-Estimating the "Wrong" Graphical Model: Benefits in the Computation-Limited Setting
19 0.36231872 1 jmlr-2006-A Direct Method for Building Sparse Kernel Learning Algorithms
20 0.36055616 90 jmlr-2006-Superior Guarantees for Sequential Prediction and Lossless Compression via Alphabet Decomposition