jmlr jmlr2006 jmlr2006-53 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Dana Angluin, Jiang Chen
Abstract: We consider the problem of learning a hypergraph using edge-detecting queries. In this model, the learner may query whether a set of vertices induces an edge of the hidden hypergraph or not. We show that an r-uniform hypergraph with m edges and n vertices is learnable with O(2^{4r} m · poly(r, log n)) queries with high probability. The queries can be made in O(min(2^r (log m + r)^2, (log m + r)^3)) rounds. We also give an algorithm that learns an almost uniform hypergraph of dimension r using O(2^{O((1+∆/2)r)} · m^{1+∆/2} · poly(log n)) queries with high probability, where ∆ is the difference between the maximum and the minimum edge sizes. This upper bound matches our lower bound of Ω((m/(1+∆/2))^{1+∆/2}) for this class of hypergraphs in terms of dependence on m. The queries can also be made in O((1 + ∆) · min(2^r (log m + r)^2, (log m + r)^3)) rounds. Keywords: query learning, hypergraph, multiple round algorithm, sampling, chemical reaction network
Reference: text
sentIndex sentText sentNum sentScore
1 We consider the problem of learning a hypergraph using edge-detecting queries. [sent-7, score-0.512]
2 In this model, the learner may query whether a set of vertices induces an edge of the hidden hypergraph or not. [sent-8, score-0.897]
3 We show that an r-uniform hypergraph with m edges and n vertices is learnable with O(2^{4r} m · poly(r, log n)) queries with high probability. [sent-9, score-1.159]
4 We also give an algorithm that learns an almost uniform hypergraph of dimension r using O(2^{O((1+∆/2)r)} · m^{1+∆/2} · poly(log n)) queries with high probability, where ∆ is the difference between the maximum and the minimum edge sizes. [sent-11, score-1.059]
5 Introduction A hypergraph H = (V, E) is given by a set of vertices V and a set of edges E, which is a subset of the power set of V (E ⊆ 2^V). [sent-15, score-0.763]
6 The dimension of a hypergraph H is the cardinality of the largest set in E. [sent-16, score-0.512]
7 In this paper, we are interested in learning a hidden hypergraph using edge-detecting queries of the following form Q_H(S): does S include at least one edge of H? [sent-18, score-0.967]
8 The query Q_H(S) is answered 1 or 0, indicating whether S contains all vertices of at least one edge of H or not. [sent-20, score-0.491]
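To make the query model concrete, the oracle is easy to simulate when H is known; a minimal Python sketch (names are illustrative, not from the paper):

    # Minimal simulation of the edge-detecting oracle: Q_H(S) answers 1
    # iff S contains all vertices of at least one edge of H.
    def make_oracle(edges):
        edges = [frozenset(e) for e in edges]
        def query(S):
            S = frozenset(S)
            return int(any(e <= S for e in edges))
        return query

    Q = make_oracle([{1, 2, 3}, {2, 4, 5}])
    assert Q({1, 2, 3, 6}) == 1   # contains the edge {1, 2, 3}
    assert Q({1, 2, 4}) == 0      # contains no edge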
9 Grebinski and Kucherov (2000) also study a somewhat different and interesting query model, which they call the additive model, where instead of giving a 1 or 0 answer, a query tells you the total number of edges contained in a certain vertex set. [sent-35, score-0.631]
10 It is shown in Angluin and Chen (2004) that Ω((2m/r)^{r/2}) edge-detecting queries are required to learn a general hypergraph of dimension r with m edges. [sent-42, score-0.808]
11 However, this lower bound does not preclude efficient algorithms for classes of hypergraphs whose edge sizes are close. [sent-45, score-0.392]
12 In particular, the question of whether uniform hypergraphs can be learned using a number of queries that is linear in the number of edges is still open; this question is the main subject of this paper. [sent-46, score-0.727]
13 We show that an r-uniform hypergraph is learnable with O(2^{4r} m · poly(r, log n, log(1/δ))) queries with probability at least 1 − δ. [sent-49, score-1.087]
14 Formally speaking, Definition 1 A hypergraph is (r, ∆)-uniform, where ∆ < r, if its dimension is r and the difference between its maximum and minimum edge sizes is ∆, or equivalently, the maximum and the minimum edge sizes are r and r − ∆ respectively. [sent-51, score-0.866]
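Definition 1 can be checked directly on an explicit edge list; a minimal sketch:

    # Sketch: test whether an edge list forms an (r, Delta)-uniform
    # hypergraph per Definition 1 (dimension r, minimum edge size r - Delta).
    def is_r_delta_uniform(edges, r, delta):
        sizes = [len(e) for e in edges]
        return max(sizes) == r and min(sizes) == r - delta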
15 On the other hand, we extend the algorithm that learns uniform hypergraphs to learning the class of (r, ∆)-uniform hypergraphs with O(2^{O((1+∆/2)r)} · m^{1+∆/2} · poly(log n, log(1/δ))) queries with probability at least 1 − δ. [sent-55, score-1.012]
16 In this paper, we show that in our algorithm for r-uniform hypergraphs, queries can be made in O(min(2^r (log m + r)^2, (log m + r)^3)) rounds, and in our algorithm for (r, ∆)-uniform hypergraphs, queries can be made in O((1 + ∆) · min(2^r (log m + r)^2, (log m + r)^3)) rounds. [sent-64, score-0.628]
17 Basically, an independent covering family of a hypergraph is a collection of independent sets that cover all non-edges. [sent-66, score-0.755]
18 An interesting observation is that the set of negative queries of any algorithm that learns a hypergraph drawn from a class of hypergraphs that is closed under the operation of adding an edge is an independent covering family of that hypergraph. [sent-67, score-1.499]
19 Note that both the class of r-uniform hypergraphs and the class of (r, ∆)-uniform hypergraphs are closed under the operation of adding an edge. [sent-68, score-0.482]
20 This implies that the query complexity of learning such a hypergraph is bounded below by the minimum size of its independent covering families. [sent-69, score-0.85]
21 In this paper, we give a randomized construction of an independent covering family of size O(r 2^{2r} m log n) for r-uniform hypergraphs with m edges. [sent-72, score-0.625]
22 This yields a learning algorithm using a number of queries that is quadratic in m, which is further improved to give an algorithm using a number of queries that is linear in m. [sent-73, score-0.65]
23 As mentioned in Angluin and Chen (2004) and some other papers, the hypergraph learning problem may also be viewed as the problem of learning a monotone disjunctive normal form (DNF) boolean formula using membership queries only. [sent-74, score-0.832]
24 A membership query assigns 1 or 0 to each variable, and is answered 1 if the assignment satisfies at least one term, and 0 otherwise, that is, if the set of vertices corresponding to the variables assigned 1 contains all vertices of at least one edge of H. [sent-76, score-0.572]
25 An (r, ∆)-uniform hypergraph corresponds to a monotone DNF whose terms are of sizes in the range of [r − ∆, r]. [sent-78, score-0.536]
26 In Section 3, we formally define the concept of an independent covering family and give a randomized construction of independent covering families for general r-uniform hypergraphs. [sent-81, score-0.46]
27 In Section 4, we show how to efficiently find an arbitrary edge in a hypergraph and give a simple learning algorithm using a number of queries that is quadratic in the number of edges. [sent-82, score-1.007]
28 In Section 5, we give an algorithm that learns r-uniform hypergraphs using a number of queries that is linear in the number of edges. [sent-83, score-0.59]
29 In this paper, we assume that edges do not contain each other, as there is no way to detect the existence of edges that contain other edges using edge-detecting queries. [sent-88, score-0.505]
30 We use the term non-edge to denote any set that is a candidate edge in some class of hypergraphs but is not an edge in the target hypergraph. [sent-90, score-0.559]
31 The degree of a set χ ⊆ V in a hypergraph H, denoted d_H(χ), is the number of edges of H that contain χ. [sent-93, score-0.689]
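Read off directly from this definition, d_H(χ) can be computed as follows (illustrative sketch):

    # Sketch: d_H(chi) counts the edges of H that contain the vertex set chi.
    def degree(edges, chi):
        chi = frozenset(chi)
        return sum(1 for e in edges if chi <= frozenset(e))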
32 An Independent Covering Family Definition 2 An independent covering family of a hypergraph H is a collection of independent sets of H such that every non-edge not containing an edge is contained in one of these independent sets. [sent-97, score-0.982]
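Definition 2 can be verified by brute force for an r-uniform hypergraph; the sketch below is exponential in n and purely illustrative:

    from itertools import combinations

    # Brute-force check of Definition 2 for an r-uniform hypergraph: each
    # family member must be independent (contain no edge), and every
    # r-subset of V containing no edge must lie inside some member.
    def is_independent_covering_family(V, edges, r, family):
        edges = [frozenset(e) for e in edges]
        family = [frozenset(f) for f in family]
        if any(e <= f for f in family for e in edges):
            return False                  # a member is not independent
        for cand in map(frozenset, combinations(V, r)):
            if any(e <= cand for e in edges):
                continue                  # contains an edge: not a non-edge
            if not any(cand <= f for f in family):
                return False              # uncovered non-edge
        return True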
33 First, we observe that if the target hypergraph is drawn from a class of hypergraphs that is closed under the operation of adding an edge [sent-106, score-0.93]
34 (e.g., the class of all r-uniform hypergraphs), the set of negative queries of any algorithm that learns it is an independent covering family of this hypergraph. [sent-108, score-0.569]
35 This is because if there is a non-edge not contained in any of the sets on which these negative queries are made, we will not be able to distinguish between the target hypergraph and the hypergraph with this non-edge being an extra edge. [sent-109, score-1.365]
36 Second, although the task of constructing an independent covering family seems substantially easier than that of learning, since the hypergraph is known in the construction task, we show that efficient construction of small-sized independent covering families yields an efficient learning algorithm. [sent-113, score-0.979]
37 In Section 4, we will show how to find an arbitrary edge out of a hypergraph of dimension r using O(r log n) queries. [sent-114, score-0.79]
38 Imagine a simple algorithm in which, at each iteration, we maintain a sub-hypergraph of the target hypergraph consisting of the edges found so far, construct an independent covering family for it, and ask queries on all the sets in the family. [sent-115, score-1.253]
39 If there is a set whose query is answered positively, we can find at least one edge out of this set. [sent-116, score-0.383]
40 We repeat this process until we have collected all the edges in the target hypergraph, in which case the independent covering family we construct is a proof of this fact. [sent-118, score-0.371]
41 Suppose that we can construct an independent covering family of size at most f(m) for any hypergraph with at most m edges drawn from a certain class of hypergraphs. [sent-119, score-0.901]
42 The above algorithm learns this class of hypergraphs using only O( f (m) · m · r log n) queries. [sent-120, score-0.413]
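A sketch of this iteration, under assumed helper signatures: build_covering(V, found) stands in for the construction of Section 3, and find_one_edge for the subroutine of Section 4 (sketched later in this extraction). Any positive member of the family must hide a new edge, since the family's sets are independent in the found sub-hypergraph.

    def learn_by_covering(query, V, build_covering, find_one_edge):
        found = []
        while True:
            family = build_covering(V, found)
            positives = [S for S in family if query(S) == 1]
            if not positives:
                return found              # the family certifies completeness
            found.append(find_one_edge(query, positives[0]))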
43 In the rest of this section, we give a randomized construction of a linear-sized (linear in the number of edges) independent covering family of an r-uniform hypergraph which succeeds with probability at least 1/2. [sent-121, score-0.87]
44 By the standard probabilistic argument, the construction proves the existence of an independent covering family of size linear in the number of edges for any uniform hypergraph. [sent-122, score-0.418]
45 Theorem 3 Any r-uniform hypergraph with m edges has an independent covering family of size O(r 2^{2r} m log n). [sent-125, score-1.002]
46 We call a vertex set χ ⊆ V relevant if it is contained in at least one edge in the hypergraph. [sent-127, score-0.421]
47 Similarly, a vertex is relevant if it is contained in at least one edge in the hypergraph. [sent-128, score-0.391]
48 Algorithm 1 Construction of an independent covering family 1: F_H ← a set containing 4(ln 2 + r ln n) · 2^r d_H(χ) (χ, p_H(χ))-samples drawn independently for every relevant set χ. [sent-144, score-0.395]
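A rough Python rendering of this sampling step. A (χ, p)-sample contains χ and each other vertex independently with probability p (Definition 25 below, with ν empty). The probabilities p_H(χ) are not reproduced in this extraction, so they are passed in as an assumed function p_of; samples that happen to contain an edge are not independent sets, and the paper's choice of p_H is what makes this unlikely, which is treated as an assumption here.

    import math
    import random
    from itertools import combinations

    def covering_family(V, edges, r, n, p_of):
        relevant = {frozenset(c)              # sets contained in some edge
                    for e in edges
                    for k in range(1, len(e) + 1)
                    for c in combinations(sorted(e), k)}
        F = []
        for chi in relevant:
            d = sum(1 for e in edges if chi <= frozenset(e))   # d_H(chi)
            count = math.ceil(4 * (math.log(2) + r * math.log(n)) * 2 ** r * d)
            for _ in range(count):
                s = set(chi) | {v for v in V if v not in chi
                                and random.random() < p_of(chi)}
                F.append(frozenset(s))
        return F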
49 A Simple Quadratic Algorithm In this section, we first give an algorithm that finds an arbitrary edge in a hypergraph of dimension r using only r log n edge-detecting queries. [sent-156, score-0.808]
50 Using the high-probability version of the construction, we obtain an algorithm that learns an r-uniform hypergraph with m edges with high probability, using a number of queries that is quadratic in m. [sent-159, score-1.034]
51 Lemma 6 FIND-ONE-VERTEX finds one relevant vertex in a non-empty hypergraph with n vertices using at most log n edge-detecting queries. [sent-171, score-0.899]
52 Since we assume that the hypergraph is non-empty, the above equalities clearly hold for our initial assignment of S and A. [sent-175, score-0.558]
53 The algorithm takes at most log n edge-detecting queries in total, as it makes one query in each iteration. [sent-182, score-0.578]
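The invariant above (Q_H(S) = 1 and Q_H(S \ A) = 0, so S contains an edge and every edge inside S meets A) makes the halving argument concrete; a minimal sketch with illustrative names, which also returns the final S for reuse in the FIND-ONE-EDGE sketch below:

    def find_one_vertex(query, V):
        S, A = set(V), list(V)            # requires query(V) == 1
        while len(A) > 1:
            A1, A2 = A[:len(A) // 2], A[len(A) // 2:]
            if query(S - set(A1)) == 1:   # an edge survives without A1
                S -= set(A1)
                A = A2
            else:                         # every edge of S meets A1
                A = A1
        # With |A| = 1, query(S) = 1 and query(S \ A) = 0: the remaining
        # vertex lies in every edge contained in S, hence is relevant.
        return A[0], S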
54 8: Call FIND-ONE-EDGE on the hypergraph induced on S with the vertex v removed. [sent-192, score-0.627]
55 In fact, each time, we make edge-detecting queries on the union of a subset of S and the set of vertices already found. [sent-198, score-0.396]
56 Lemma 7 FIND-ONE-EDGE finds one edge in a non-empty hypergraph of dimension r with n vertices using r log n edge-detecting queries. [sent-202, score-0.871]
57 It is evident that if FIND-ONE-EDGE uses (r − 1) log n queries for a hypergraph of dimension r − 1, [sent-204, score-0.927]
58 then it uses at most (r − 1) log n + log n = r log n queries for a hypergraph of dimension r. [sent-205, score-1.165]
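A hedged sketch of this recursion, reusing find_one_vertex from the sketch above. Each round fixes one vertex that lies in every edge of the shrunken S, remembers it, and continues with queries on the union of a subset of S and the vertices already found, as described in the surrounding text; at most r rounds of log n queries each.

    def find_one_edge(query, V):
        found, S = set(), set(V)          # assumes query(V) == 1 (non-empty H)
        while query(found) == 0:          # `found` is not yet an edge
            q = lambda X, f=frozenset(found): query(set(X) | f)
            v, S = find_one_vertex(q, S)  # every edge in S (under q) contains v
            found.add(v)
            S.discard(v)                  # recurse on S with v removed
        return found                      # equal to an edge of H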
59 Algorithm 4 learns a uniform hypergraph with probability at least 1 − δ. [sent-210, score-0.609]
60 Algorithm 4 finds one new edge at each iteration because F_H is an independent covering family of the already-found sub-hypergraph H. [sent-221, score-0.408]
61 The query complexity will be O(2^{2r} m^2 · poly(r, log n) · log(1/δ)), which is quadratic in m. [sent-225, score-0.405]
62 3 An Improved FIND-ONE-EDGE Despite the simplicity of FIND-ONE-EDGE, its queries have to be made in r log n rounds. [sent-227, score-0.415]
63 When irrelevant vertices abound, that is, when n is much larger than m, we would like to arrange queries in a smaller number of rounds. [sent-228, score-0.377]
64 In the following, we use a technique developed in Damaschke (1998) (for learning monotone boolean functions) to find one edge from a non-empty hypergraph with high probability using only O(log m + r) rounds and O((log m + r) log n) queries. [sent-229, score-0.878]
65 When we split a set A according to its ith (i ∈ [1, log n]) bit, we divide A into two sets, one containing the vertices whose ith bits are 0 and the other containing the vertices whose ith bits are 1. [sent-241, score-0.447]
66 One of the key ideas in Damaschke (1998) is that because the splits are pre-determined and the queries are monotone with respect to the subset relation, we can make queries on pre-determined splits to make predictions. [sent-247, score-0.635]
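A sketch of the split operation, assuming vertices are identified with integer ids of log n bits:

    # Splitting A on its i-th bit separates the vertices whose i-th bit
    # is 0 from those whose i-th bit is 1.
    def split_on_bit(A, i):
        zeros = {v for v in A if (v >> i) & 1 == 0}
        return zeros, set(A) - zeros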
67 15: if all queries are answered 1 then 16: A ← A_{i*}, S ← S_{i*} (i* is the largest index in I in this case). [sent-262, score-0.375]
68 We first make queries on (S\A) ∪ A|_{b_i=0} (= S \ A|_{b_i=1}) and (S\A) ∪ A|_{b_i=1} (= S \ A|_{b_i=0}) for every i. [sent-266, score-0.416]
69 case 1: If there exists i such that R_i = (0, 0), that is, both queries are answered 0, all edges contained in S are split between A|_{b_i=0} and A|_{b_i=1}, that is, the intersections of each edge with these two sets form a partition of the edge. [sent-268, score-0.752]
70 case 2: If there exists i such that R_i = (1, 1), that is, both queries are answered 1, we can set S to be either of the two sets (S\A) ∪ A|_{b_i=0} and (S\A) ∪ A|_{b_i=1}, as they both contain edges, and set A to be A|_{b_i=0} or A|_{b_i=1} respectively. [sent-274, score-0.401]
71 Because they do not share a common edge (their intersection S\A does not contain an edge), the sum of the numbers of edges contained in these two sets is at most the number of edges contained in S. [sent-277, score-0.597]
72 case 3: If neither of the two events happens, we need to deal with the third case where ∀i ∈ I, one of the queries is answered 0 and the other is answered 1. [sent-281, score-0.472]
73 If all queries are answered 1, i* is the largest index in I and A_{i*} is a singleton set containing a relevant vertex. [sent-293, score-0.447]
74 At each iteration, we use at most 3 log n queries which are made in at most 2 rounds. [sent-301, score-0.415]
75 Therefore, Lemma 8 In expectation, PARA-FIND-ONE-VERTEX finds one relevant vertex using O((log m + r) log n) queries, and the queries can be made in 2(log m + r) rounds. [sent-302, score-0.602]
76 PARA-FIND-ONE-VERTEX can work with FIND-ONE-EDGE to find an edge using expected O(r(log m + r) log n) queries in expected 2r(log m + r) rounds. [sent-303, score-0.574]
77 In the new vertex finding process, instead of starting with A = V = S (recall that in a recursive call of FIND-ONE-EDGE, we look for a relevant edge contained in S). [sent-310, score-0.421]
78 Lemma 9 There is an algorithm that finds an edge in a non-empty hypergraph using expected O((log m+r) log n) edge-detecting queries. [sent-315, score-0.808]
79 Corollary 10 With probability at least 1 − δ, PARA-FIND-ONE-EDGE finds an edge using O((log m + r) log n log(1/δ)) edge-detecting queries, and the queries can be made in 4(log m + r) rounds. [sent-322, score-0.734]
80 A Linear-Query Algorithm Reconstructing an independent covering family at the discovery of every new edge is indeed wasteful. [sent-324, score-0.443]
81 The difference is that discovery probabilities reflect degrees in the hypergraph we have found, while threshold probabilities reflect degrees in the target hypergraph. [sent-339, score-0.675]
82 Intuitively, a high-degree relevant set in the target hypergraph (not necessarily a high-degree relevant set in H), or a relevant set with small threshold probability is important, because an edge or a non-edge may not be found if its important relevant subsets are not found. [sent-371, score-1.035]
83 2 The Algorithm Let H = (V, E) be the hypergraph the algorithm has found so far. [sent-377, score-0.53]
84 The queries can be made in O(log m + r) rounds, as queries of each call to PARA-FIND-ONE-EDGE can be made in O(log m + r) rounds. [sent-382, score-0.622]
85 Since ∑_χ d_H(χ) ≤ 2^r m and ∑_χ c(χ) ≤ (2^r + 1)m (note that c(χ) is one more than the number of new edges that χ-samples in F_H produce), the number of queries made at each iteration is at most O(2^{4r} m · poly(r, log n, log(1/δ))). [sent-401, score-0.714]
86 Therefore, the total number of queries will be linear in the number of edges with high probability, as desired. [sent-402, score-0.447]
87 Let H be the hypergraph the algorithm has found at the beginning of the iteration. [sent-405, score-0.53]
88 Let H be the hypergraph the algorithm has found before this group is processed. [sent-412, score-0.553]
89 Assertion 17 If χ is inactive, then at the end of this iteration, either e has been found or a subset of e whose threshold probability is at most p_χ/2 has been found (a relevant subset is found when an edge that contains it is found). [sent-414, score-0.372]
90 Let H be the hypergraph that has been found at the end of the iteration. [sent-434, score-0.512]
91 Theorem 22 Ω((2m/r)^{r/2}) edge-detecting queries are required to identify a hypergraph drawn from the class of all (r, r − 2)-uniform hypergraphs with n vertices and m edges. [sent-461, score-1.148]
92 Theorem 23 Ω((2m/(∆ + 2))^{1+∆/2}) edge-detecting queries are required to identify a hypergraph drawn from the class of all (r, ∆)-uniform hypergraphs with n vertices and m edges. [sent-463, score-1.148]
93 Proof Given a (∆ + 2, ∆)-uniform hypergraph H = (V, E), let H′ = (V ∪ V′, E′) be an (r, ∆)-uniform hypergraph, where |V′| = r − ∆ − 2, V ∩ V′ = ∅ and E′ = {e ∪ V′ | e ∈ E}. [sent-464, score-0.512]
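The padding in this proof is mechanical; a hedged sketch in which the fresh "pad" vertices stand in for V′:

    # Every edge of a (Delta+2, Delta)-uniform hypergraph gains the same
    # r - Delta - 2 fresh vertices, yielding an (r, Delta)-uniform
    # hypergraph with the same number of edges.
    def pad_to_dimension(V, edges, r, delta):
        fresh = [("pad", i) for i in range(r - delta - 2)]  # V', disjoint from V
        return list(V) + fresh, [frozenset(e) | frozenset(fresh) for e in edges]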
94 Theorem 24 There is a randomized algorithm that learns an (r, ∆)-uniform hypergraph with m edges and n vertices with probability at least 1 − δ, using O(2^{O((1+∆/2)r)} · m^{1+∆/2} · poly(log n, log(1/δ))) queries. [sent-470, score-0.976]
95 (We remark that this improvement is available for the uniform hypergraph problem in the case when |χ| = r − 1, but is not as important.) [sent-487, score-0.533]
96 Definition 25 A (χ, ν, p)-sample (χ ∩ ν = ∅) is a random set of vertices that contains χ, does not contain any vertex in ν, and contains each other vertex independently with probability p. [sent-490, score-0.432]
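A direct reading of Definition 25 as a sketch:

    import random

    def chi_nu_p_sample(V, chi, nu, p):
        assert not set(chi) & set(nu)     # Definition 25 requires disjointness
        return set(chi) | {v for v in V
                           if v not in chi and v not in nu
                           and random.random() < p}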
97 Algorithm 7 Learning an (r, ∆)-uniform hypergraph All PARA-FIND-ONE-EDGEs are called with parameter δ′. [sent-491, score-0.512]
98 Let H be the hypergraph the algorithm has found before the group is processed. [sent-545, score-0.553]
99 If no assertion is violated, the round complexity of Algorithm 7 is O((1 + ∆) · min(2^r (log m + r)^2, (log m + r)^3)). We can choose δ′ so that the algorithm succeeds with probability 1 − δ and log(1/δ′) ≤ poly(r, log n) · log(1/δ). [sent-597, score-0.564]
100 Therefore, the total number of queries the algorithm makes is bounded by O(2^{O((1+∆/2)r)} · m^{1+∆/2} · poly(log n, log(1/δ))). [sent-604, score-0.433]
wordName wordTfidf (topN-words)
[('hypergraph', 0.512), ('dh', 0.371), ('fh', 0.339), ('queries', 0.296), ('hypergraphs', 0.241), ('ph', 0.197), ('edge', 0.159), ('covering', 0.152), ('edges', 0.151), ('query', 0.145), ('bi', 0.12), ('log', 0.119), ('vertex', 0.115), ('ngluin', 0.102), ('ypergraph', 0.093), ('hen', 0.087), ('ln', 0.085), ('angluin', 0.085), ('vertices', 0.081), ('answered', 0.079), ('poly', 0.072), ('relevant', 0.072), ('idden', 0.065), ('discovery', 0.064), ('assertion', 0.058), ('alon', 0.056), ('assertions', 0.056), ('reaction', 0.056), ('succeeds', 0.052), ('phase', 0.051), ('chemicals', 0.047), ('grebinski', 0.046), ('equalities', 0.046), ('chen', 0.045), ('contained', 0.045), ('family', 0.045), ('probability', 0.041), ('beigel', 0.039), ('round', 0.038), ('newly', 0.035), ('draw', 0.035), ('learns', 0.035), ('threshold', 0.035), ('probabilities', 0.032), ('jiang', 0.032), ('divide', 0.031), ('call', 0.03), ('phases', 0.03), ('inactive', 0.03), ('ri', 0.03), ('iteration', 0.029), ('terminates', 0.029), ('kucherov', 0.028), ('noga', 0.028), ('calls', 0.028), ('earning', 0.028), ('contains', 0.027), ('contain', 0.026), ('construction', 0.026), ('iterations', 0.025), ('samples', 0.025), ('dna', 0.025), ('bits', 0.025), ('monotone', 0.024), ('react', 0.024), ('qh', 0.024), ('dnf', 0.024), ('halves', 0.024), ('independent', 0.023), ('ends', 0.023), ('group', 0.023), ('nds', 0.023), ('rounds', 0.023), ('ch', 0.023), ('pr', 0.022), ('lemma', 0.022), ('split', 0.022), ('quadratic', 0.022), ('dana', 0.021), ('nder', 0.021), ('hamiltonian', 0.021), ('ith', 0.021), ('uniform', 0.021), ('intersection', 0.02), ('families', 0.02), ('incident', 0.019), ('abbreviate', 0.019), ('excluding', 0.019), ('event', 0.019), ('subset', 0.019), ('randomized', 0.019), ('active', 0.019), ('asodi', 0.019), ('doubles', 0.019), ('algorithm', 0.018), ('drawn', 0.018), ('succeed', 0.018), ('shrink', 0.018), ('minimum', 0.018), ('events', 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 53 jmlr-2006-Learning a Hidden Hypergraph
2 0.11793937 25 jmlr-2006-Distance Patterns in Structural Similarity
Author: Thomas Kämpke
Abstract: Similarity of edge labeled graphs is considered in the sense of minimum squared distance between corresponding values. Vertex correspondences are established by isomorphisms if both graphs are of equal size and by subisomorphisms if one graph has fewer vertices than the other. Best fit isomorphisms and subisomorphisms amount to solutions of quadratic assignment problems and are computed exactly as well as approximately by minimum cost flow, linear assignment relaxations and related graph algorithms. Keywords: assignment problem, best approximation, branch and bound, inexact graph matching, model data base
3 0.092470042 5 jmlr-2006-A Robust Procedure For Gaussian Graphical Model Search From Microarray Data WithpLarger Thann
Author: Robert Castelo, Alberto Roverato
Abstract: Learning of large-scale networks of interactions from microarray data is an important and challenging problem in bioinformatics. A widely used approach is to assume that the available data constitute a random sample from a multivariate distribution belonging to a Gaussian graphical model. As a consequence, the prime objects of inference are full-order partial correlations which are partial correlations between two variables given the remaining ones. In the context of microarray data the number of variables exceed the sample size and this precludes the application of traditional structure learning procedures because a sampling version of full-order partial correlations does not exist. In this paper we consider limited-order partial correlations, these are partial correlations computed on marginal distributions of manageable size, and provide a set of rules that allow one to assess the usefulness of these quantities to derive the independence structure of the underlying Gaussian graphical model. Furthermore, we introduce a novel structure learning procedure based on a quantity, obtained from limited-order partial correlations, that we call the non-rejection rate. The applicability and usefulness of the procedure are demonstrated by both simulated and real data. Keywords: Gaussian distribution, gene network, graphical model, microarray data, non-rejection rate, partial correlation, small-sample inference
4 0.057595219 92 jmlr-2006-Toward Attribute Efficient Learning of Decision Lists and Parities
Author: Adam R. Klivans, Rocco A. Servedio
Abstract: We consider two well-studied problems regarding attribute efficient learning: learning decision lists and learning parity functions. First, we give an algorithm for learning decision lists of length ˜ 1/3 ˜ 1/3 k over n variables using 2O(k ) log n examples and time nO(k ) . This is the first algorithm for learning decision lists that has both subexponential sample complexity and subexponential running time in the relevant parameters. Our approach establishes a relationship between attribute efficient learning and polynomial threshold functions and is based on a new construction of low degree, low weight polynomial threshold functions for decision lists. For a wide range of parameters our construction matches a lower bound due to Beigel for decision lists and gives an essentially optimal tradeoff between polynomial threshold function degree and weight. Second, we give an algorithm for learning an unknown parity function on k out of n variables using O(n1−1/k ) examples in poly(n) time. For k = o(log n) this yields a polynomial time algorithm with sample complexity o(n); this is the first polynomial time algorithm for learning parity on a superconstant number of variables with sublinear sample complexity. We also give a simple algorithm for learning an unknown length-k parity using O(k log n) examples in nk/2 time, which improves on the naive nk time bound of exhaustive search. Keywords: PAC learning, attribute efficiency, learning parity, decision lists, Winnow
5 0.049184851 63 jmlr-2006-New Algorithms for Efficient High-Dimensional Nonparametric Classification
Author: Ting Liu, Andrew W. Moore, Alexander Gray
Abstract: This paper is about non-approximate acceleration of high-dimensional nonparametric operations such as k nearest neighbor classifiers. We attempt to exploit the fact that even if we want exact answers to nonparametric queries, we usually do not need to explicitly find the data points close to the query, but merely need to answer questions about the properties of that set of data points. This offers a small amount of computational leeway, and we investigate how much that leeway can be exploited. This is applicable to many algorithms in nonparametric statistics, memory-based learning and kernel-based learning. But for clarity, this paper concentrates on pure k-NN classification. We introduce new ball-tree algorithms that on real-world data sets give accelerations from 2-fold to 100-fold compared against highly optimized traditional ball-tree-based k-NN . These results include data sets with up to 106 dimensions and 105 records, and demonstrate non-trivial speed-ups while giving exact answers. keywords: ball-tree, k-NN classification
7 0.038646042 54 jmlr-2006-Learning the Structure of Linear Latent Variable Models
8 0.037037376 81 jmlr-2006-Some Discriminant-Based PAC Algorithms
9 0.034887653 46 jmlr-2006-Learning Factor Graphs in Polynomial Time and Sample Complexity
10 0.034839805 12 jmlr-2006-Active Learning with Feedback on Features and Instances
12 0.032964237 10 jmlr-2006-Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems
13 0.028301733 95 jmlr-2006-Walk-Sums and Belief Propagation in Gaussian Graphical Models
14 0.027842367 28 jmlr-2006-Estimating the "Wrong" Graphical Model: Benefits in the Computation-Limited Setting
16 0.026538305 89 jmlr-2006-Structured Prediction, Dual Extragradient and Bregman Projections (Special Topic on Machine Learning and Optimization)
17 0.026208609 34 jmlr-2006-Generalized Bradley-Terry Models and Multi-Class Probability Estimates
18 0.02498123 20 jmlr-2006-Collaborative Multiagent Reinforcement Learning by Payoff Propagation
19 0.024979368 84 jmlr-2006-Stability Properties of Empirical Risk Minimization over Donsker Classes
20 0.023242969 90 jmlr-2006-Superior Guarantees for Sequential Prediction and Lossless Compression via Alphabet Decomposition
topicId topicWeight
[(0, 0.139), (1, -0.033), (2, -0.117), (3, 0.021), (4, -0.089), (5, 0.1), (6, 0.295), (7, -0.096), (8, -0.061), (9, 0.051), (10, -0.022), (11, 0.062), (12, 0.133), (13, -0.012), (14, -0.017), (15, 0.012), (16, 0.059), (17, 0.113), (18, -0.06), (19, -0.009), (20, -0.062), (21, -0.036), (22, 0.045), (23, 0.162), (24, -0.1), (25, -0.024), (26, -0.006), (27, -0.142), (28, 0.094), (29, -0.169), (30, -0.035), (31, 0.019), (32, -0.106), (33, 0.069), (34, -0.066), (35, 0.075), (36, 0.187), (37, -0.289), (38, -0.054), (39, 0.095), (40, 0.073), (41, 0.101), (42, -0.031), (43, 0.003), (44, 0.129), (45, 0.12), (46, -0.293), (47, 0.041), (48, 0.024), (49, 0.005)]
simIndex simValue paperId paperTitle
same-paper 1 0.95706004 53 jmlr-2006-Learning a Hidden Hypergraph
2 0.63870907 25 jmlr-2006-Distance Patterns in Structural Similarity
3 0.40203187 5 jmlr-2006-A Robust Procedure For Gaussian Graphical Model Search From Microarray Data WithpLarger Thann
4 0.33044228 63 jmlr-2006-New Algorithms for Efficient High-Dimensional Nonparametric Classification
Author: Eyal Even-Dar, Shie Mannor, Yishay Mansour
Abstract: We incorporate statistical confidence intervals in both the multi-armed bandit and the reinforcement learning problems. In the bandit problem we show that given n arms, it suffices to pull the arms a total of O (n/ε2 ) log(1/δ) times to find an ε-optimal arm with probability of at least 1 − δ. This bound matches the lower bound of Mannor and Tsitsiklis (2004) up to constants. We also devise action elimination procedures in reinforcement learning algorithms. We describe a framework that is based on learning the confidence interval around the value function or the Q-function and eliminating actions that are not optimal (with high probability). We provide a model-based and a model-free variants of the elimination method. We further derive stopping conditions guaranteeing that the learned policy is approximately optimal with high probability. Simulations demonstrate a considerable speedup and added robustness over ε-greedy Q-learning.
6 0.22221984 92 jmlr-2006-Toward Attribute Efficient Learning of Decision Lists and Parities
7 0.22180194 46 jmlr-2006-Learning Factor Graphs in Polynomial Time and Sample Complexity
8 0.20731325 54 jmlr-2006-Learning the Structure of Linear Latent Variable Models
10 0.16227919 90 jmlr-2006-Superior Guarantees for Sequential Prediction and Lossless Compression via Alphabet Decomposition
11 0.15124142 88 jmlr-2006-Streamwise Feature Selection
12 0.15077667 58 jmlr-2006-Lower Bounds and Aggregation in Density Estimation
13 0.14927876 15 jmlr-2006-Bayesian Network Learning with Parameter Constraints (Special Topic on Machine Learning and Optimization)
14 0.14291665 68 jmlr-2006-On the Complexity of Learning Lexicographic Strategies
15 0.13503303 87 jmlr-2006-Stochastic Complexities of Gaussian Mixtures in Variational Bayesian Approximation
16 0.13307279 36 jmlr-2006-In Search of Non-Gaussian Components of a High-Dimensional Distribution
17 0.12469439 12 jmlr-2006-Active Learning with Feedback on Features and Instances
18 0.12334343 23 jmlr-2006-Consistency and Convergence Rates of One-Class SVMs and Related Algorithms
19 0.12205734 34 jmlr-2006-Generalized Bradley-Terry Models and Multi-Class Probability Estimates
20 0.1193097 94 jmlr-2006-Using Machine Learning to Guide Architecture Simulation
topicId topicWeight
[(8, 0.016), (36, 0.068), (37, 0.373), (45, 0.019), (50, 0.057), (61, 0.045), (63, 0.027), (66, 0.013), (68, 0.014), (76, 0.045), (78, 0.017), (79, 0.022), (81, 0.029), (84, 0.017), (90, 0.02), (91, 0.048), (96, 0.072)]
simIndex simValue paperId paperTitle
same-paper 1 0.6942997 53 jmlr-2006-Learning a Hidden Hypergraph
2 0.59430838 57 jmlr-2006-Linear State-Space Models for Blind Source Separation
Author: Rasmus Kongsgaard Olsson, Lars Kai Hansen
Abstract: We apply a type of generative modelling to the problem of blind source separation in which prior knowledge about the latent source signals, such as time-varying auto-correlation and quasiperiodicity, are incorporated into a linear state-space model. In simulations, we show that in terms of signal-to-error ratio, the sources are inferred more accurately as a result of the inclusion of strong prior knowledge. We explore different schemes of maximum-likelihood optimization for the purpose of learning the model parameters. The Expectation Maximization algorithm, which is often considered the standard optimization method in this context, results in slow convergence when the noise variance is small. In such scenarios, quasi-Newton optimization yields substantial improvements in a range of signal to noise ratios. We analyze the performance of the methods on convolutive mixtures of speech signals. Keywords: blind source separation, state-space model, independent component analysis, convolutive model, EM, speech modelling
3 0.34427673 25 jmlr-2006-Distance Patterns in Structural Similarity
Author: Juho Rousu, Craig Saunders, Sandor Szedmak, John Shawe-Taylor
Abstract: We present a kernel-based algorithm for hierarchical text classification where the documents are allowed to belong to more than one category at a time. The classification model is a variant of the Maximum Margin Markov Network framework, where the classification hierarchy is represented as a Markov tree equipped with an exponential family defined on the edges. We present an efficient optimization algorithm based on incremental conditional gradient ascent in single-example subspaces spanned by the marginal dual variables. The optimization is facilitated with a dynamic programming based algorithm that computes best update directions in the feasible set. Experiments show that the algorithm can feasibly optimize training sets of thousands of examples and classification hierarchies consisting of hundreds of nodes. Training of the full hierarchical model is as efficient as training independent SVM-light classifiers for each node. The algorithm’s predictive accuracy was found to be competitive with other recently introduced hierarchical multicategory or multilabel classification learning algorithms. Keywords: kernel methods, hierarchical classification, text categorization, convex optimization, structured outputs
5 0.33016515 9 jmlr-2006-Accurate Error Bounds for the Eigenvalues of the Kernel Matrix
Author: Mikio L. Braun
Abstract: The eigenvalues of the kernel matrix play an important role in a number of kernel methods, in particular, in kernel principal component analysis. It is well known that the eigenvalues of the kernel matrix converge as the number of samples tends to infinity. We derive probabilistic finite sample size bounds on the approximation error of individual eigenvalues which have the important property that the bounds scale with the eigenvalue under consideration, reflecting the actual behavior of the approximation errors as predicted by asymptotic results and observed in numerical simulations. Such scaling bounds have so far only been known for tail sums of eigenvalues. Asymptotically, the bounds presented here have a slower than stochastic rate, but the number of sample points necessary to make this disadvantage noticeable is often unrealistically large. Therefore, under practical conditions, and for all but the largest few eigenvalues, the bounds presented here form a significant improvement over existing non-scaling bounds. Keywords: kernel matrix, eigenvalues, relative perturbation bounds
6 0.32369843 3 jmlr-2006-A Hierarchy of Support Vector Machines for Pattern Detection
7 0.32313013 52 jmlr-2006-Learning Spectral Clustering, With Application To Speech Separation
8 0.32123363 95 jmlr-2006-Walk-Sums and Belief Propagation in Gaussian Graphical Models
9 0.32002884 5 jmlr-2006-A Robust Procedure For Gaussian Graphical Model Search From Microarray Data WithpLarger Thann
11 0.31690615 92 jmlr-2006-Toward Attribute Efficient Learning of Decision Lists and Parities
13 0.31509572 44 jmlr-2006-Large Scale Transductive SVMs
14 0.31486759 20 jmlr-2006-Collaborative Multiagent Reinforcement Learning by Payoff Propagation
15 0.31472778 17 jmlr-2006-Bounds for the Loss in Probability of Correct Classification Under Model Based Approximation
16 0.31425261 29 jmlr-2006-Estimation of Gradients and Coordinate Covariation in Classification
17 0.31347904 43 jmlr-2006-Large Scale Multiple Kernel Learning (Special Topic on Machine Learning and Optimization)
18 0.31339344 28 jmlr-2006-Estimating the "Wrong" Graphical Model: Benefits in the Computation-Limited Setting
19 0.31235325 1 jmlr-2006-A Direct Method for Building Sparse Kernel Learning Algorithms
20 0.31234631 11 jmlr-2006-Active Learning in Approximately Linear Regression Based on Conditional Expectation of Generalization Error