jmlr jmlr2013 jmlr2013-78 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Dana Angluin, James Aspnes, Sarah Eisenstat, Aryeh Kontorovich
Abstract: PAC learning of unrestricted regular languages has long been known to be a difficult problem. The class of shuffle ideals is a very restricted subclass of regular languages, where the shuffle ideal generated by a string u is the collection of all strings containing u as a subsequence. This fundamental language family is of theoretical interest in its own right and provides the building blocks for other important language families. Despite its apparent simplicity, the class of shuffle ideals appears quite difficult to learn. In particular, just as for unrestricted regular languages, the class is not properly PAC learnable in polynomial time unless RP = NP, and PAC learning the class improperly in polynomial time would imply polynomial time algorithms for certain fundamental problems in cryptography. In the positive direction, we give an efficient algorithm for properly learning shuffle ideals in the statistical query (and therefore also PAC) model under the uniform distribution. Keywords: PAC learning, statistical queries, regular languages, deterministic finite automata, shuffle ideals, subsequences
Reference: text
sentIndex sentText sentNum sentScore
1 The class of shuffle ideals is a very restricted subclass of regular languages, where the shuffle ideal generated by a string u is the collection of all strings containing u as a subsequence. [sent-10, score-0.943]
2 In particular, just as for unrestricted regular languages, the class is not properly PAC learnable in polynomial time unless RP = NP, and PAC learning the class improperly in polynomial time would imply polynomial time algorithms for certain fundamental problems in cryptography. [sent-13, score-0.326]
3 In the positive direction, we give an efficient algorithm for properly learning shuffle ideals in the statistical query (and therefore also PAC) model under the uniform distribution. [sent-14, score-0.309]
4 Their approach was to embed a specific family of regular languages (the piecewise-testable ones) in a Hilbert space via a kernel and to identify languages with hyperplanes. [sent-44, score-0.258]
5 (1995) consider the learnability … Figure 1: The canonical DFA for recognizing the shuffle ideal of u = aab over Σ = {a, b, c}, which accepts precisely those strings that contain u as a subsequence. [sent-53, score-0.364]
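The canonical DFA generalizes directly: for a pattern u of length L, states 0 through L track how much of u has been matched so far, state i advances on the next pattern symbol and self-loops on every other symbol, and state L is accepting and absorbing. A minimal sketch in Python (function names are ours, not the paper's):

```python
def shuffle_ideal_dfa(u, sigma):
    """Transition table of the canonical DFA for the shuffle ideal of u.

    State i means "the prefix u[:i] has been matched"; state len(u) accepts.
    """
    return [{a: i + 1 if a == u[i] else i for a in sigma}
            for i in range(len(u))]

def accepts(u, sigma, x):
    """Run the DFA on x: True iff u is a subsequence of x."""
    delta, state = shuffle_ideal_dfa(u, sigma), 0
    for a in x:
        if state == len(u):      # accepting state is absorbing
            break
        state = delta[state][a]
    return state == len(u)

# Matches Figure 1: u = aab over {a, b, c}.
assert accepts("aab", "abc", "acab")        # contains a, a, b in order
assert not accepts("aab", "abc", "abcb")    # only one a before the b's
```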
6 The shuffle ideal generated by a string u is the collection of all strings containing u as a (not necessarily contiguous) subsequence (see Figure 1 for an illustration). [sent-57, score-0.963]
7 Despite being a particularly simple subfamily of the regular languages, shuffle ideals play a prominent role in formal language theory. [sent-58, score-0.325]
8 On a more applied front, the shuffle ideals capture some rudimentary phenomena in human language morphology (Kontorovich et al. [sent-61, score-0.265]
9 In Section 3 we show that shuffle ideals of known length are exactly learnable in the statistical query model under the uniform distribution, though not efficiently. [sent-63, score-0.351]
10 On the other hand, in Section 4 we show that the shuffle ideals are not properly PAC learnable under general distributions unless RP=NP. [sent-65, score-0.32]
11 In Section 5 we show that a polynomial time improper PAC learning algorithm for the class of shuffle ideals would imply the existence of polynomial time algorithms to break the RSA cryptosystem, factor Blum integers, and test quadratic residuosity. [sent-66, score-0.324]
12 The elements of Σ∗ will be referred to as strings with their length denoted by |·|; the empty string is λ. [sent-71, score-0.659]
13 The concatenation of strings u1 and u2 is denoted by u1 · u2 or u1 u2 . [sent-72, score-0.257]
14 The string u is a prefix of a string v if there exists a string w such that v = uw. [sent-73, score-1.128]
15 Similarly, u is a suffix of v if there exists a string w such that v = wu. [sent-74, score-0.376]
16 We use exponential notation for repeated concatenation of a string with itself, that is, u^n is the concatenation of n copies of u. [sent-75, score-0.458]
17 If u ⊑ v then the leftmost span of u in v is the shortest prefix v1 of v such that u ⊑ v1, and the rightmost span of u in v is the shortest suffix v2 of v such that u ⊑ v2. [sent-82, score-0.305]
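Both spans can be computed by greedy matching; the sketch below (our helper names, not the paper's) returns span lengths, with the rightmost span obtained by matching the reversed strings:

```python
def leftmost_span(u, v):
    """Length of the shortest prefix v1 of v with u a subsequence of v1,
    or None if u is not a subsequence of v."""
    if not u:
        return 0
    j = 0
    for i, c in enumerate(v):
        if c == u[j]:
            j += 1
            if j == len(u):
                return i + 1
    return None

def rightmost_span(u, v):
    """Length of the shortest suffix v2 of v with u a subsequence of v2."""
    return leftmost_span(u[::-1], v[::-1])

assert leftmost_span("ab", "aabb") == 3    # leftmost span is "aab"
assert rightmost_span("ab", "aabb") == 3   # rightmost span is "abb"
```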
18 The shuffle ideal of string u consists of all strings v over the given alphabet such that u ⊑ v. [sent-87, score-0.724]
19 Lemma 1 Suppose u = u1 u2 u3 and v = v1 v2 v3 are strings such that u ⊑ v, v1 is the leftmost span of u1 in v, and v3 is the rightmost span of u3 in v. Then u2 ⊑ v2. [sent-91, score-0.562]
20 Proof If u = λ, then u is certainly a subsequence of x. [sent-94, score-0.296]
21 If there is no occurrence in x of the first symbol a of u, then u is certainly not a subsequence of x. [sent-96, score-0.296]
22 Otherwise, we write x = yax′ , where y contains no occurrence of a; then u is a subsequence of x if and only if u′ is a subsequence of x′ , so we continue recursively with u′ and x′ . [sent-97, score-0.636]
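The recursion just described is the standard greedy subsequence test; a direct transcription:

```python
def is_subsequence(u, x):
    """Decide whether u is a subsequence of x, mirroring the proof above:
    find the first occurrence of u[0] in x, discard everything up to and
    including it, and recurse on the remainders."""
    if not u:                   # u = λ is a subsequence of anything
        return True
    i = x.find(u[0])
    if i == -1:                 # no occurrence of u[0]: not a subsequence
        return False
    return is_subsequence(u[1:], x[i + 1:])

assert is_subsequence("aab", "acacb")
assert not is_subsequence("aab", "abc")
```

This greedy rule also yields the leftmost embedding used repeatedly below.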
23 If n is an upper bound on the length of the string u ∈ Σ∗ generating the target shuffle ideal, then our concept class contains exactly ∑_{ℓ=0}^{n} |Σ|^ℓ = O(|Σ|^n) members. [sent-110, score-0.402]
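The count is a geometric series, (|Σ|^{n+1} − 1)/(|Σ| − 1); for example, with |Σ| = 3 and n = 4 the class has 121 members:

```python
n, s = 4, 3
assert sum(s**l for l in range(n + 1)) == (s**(n + 1) - 1) // (s - 1) == 121
```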
24 Hence, the problem of properly PAC learning shuffle ideals has been reduced to finding one that is consistent with a given sample. [sent-112, score-0.26]
25 In addition, in Section 5 we show that the existence of a polynomial time improper PAC learning algorithm for shuffle ideals would imply the existence of polynomial time algorithms for certain cryptographic problems. [sent-115, score-0.387]
26 SQ Learning Under the Uniform Distribution The main result of this section is that shuffle ideals are efficiently PAC learnable under the uniform distribution. [sent-117, score-0.276]
27 If u is not a subsequence of x′ , χu,a (x, y) = 0. [sent-126, score-0.296]
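The extract gives only part of the definition of χu,a, so the following is a rough simulation of the algorithm's flavor rather than the paper's exact statistic: having recovered a prefix of the pattern, for each candidate symbol a it estimates from labelled samples how likely an example is to be positive given that a is the first symbol left unmatched after a leftmost embedding of the prefix, and it extends the prefix by the best candidate. Empirical averages stand in for statistical-query answers; the exact statistic and tolerances (Theorems 4 and 5 below) are not reproduced here.

```python
import random

def next_unmatched_symbol(prefix, x):
    """Symbol of x immediately following a leftmost embedding of prefix,
    or None if the embedding does not finish before the end of x."""
    j = 0
    for c in x:
        if j == len(prefix):
            return c
        if c == prefix[j]:
            j += 1
    return None

def learn_pattern(sigma, L, sample):
    """Recover a length-L pattern from uniformly drawn labelled pairs (x, y).
    Only a sketch of the statistical-query approach."""
    prefix = ""
    for _ in range(L):
        def score(a):
            ys = [y for x, y in sample if next_unmatched_symbol(prefix, x) == a]
            return sum(ys) / len(ys) if ys else 0.0
        prefix += max(sigma, key=score)
    return prefix

# Toy run, reusing is_subsequence from the earlier sketch:
sigma, target, n = "abc", "aab", 20
data = []
for _ in range(5000):
    x = "".join(random.choice(sigma) for _ in range(n))
    data.append((x, int(is_subsequence(target, x))))
print(learn_pattern(sigma, len(target), data))   # usually prints "aab"
```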
28 Proof Fix an unknown string ū of length L ≥ 1; by assumption, we have recovered its first ℓ symbols u1 · · · uℓ. [sent-138, score-0.402]
29 Let X be a random variable representing the uniformly chosen sample string x. [sent-146, score-0.376]
30 TA is the length of the longest prefix of u′ that is a subsequence of X with XIℓ+1 excluded: TA = max{t : u′1 · · · u′t is a subsequence of X with XIℓ+1 excluded}. [sent-184, score-0.322]
31 Intuitively, TB is the length of the longest prefix of u′ with u′ℓ+1 excluded that is a subsequence of X with XIℓ+1 excluded. [sent-194, score-0.322]
32 If X is a positive example, then ū is a subsequence of X and a leftmost embedding of ū in X embeds u1 … [sent-230, score-0.474]
33 Theorem 4 When the length L of the target string ū is known, ū is exactly identifiable with O(Ls) statistical queries at tolerance τ = (2/(3(s − 1))) P(L, n, s). [sent-265, score-0.524]
34 Theorem 5 When the length L of the target string ū is known, ū is approximately identifiable to within ε > 0 with O(Ls) statistical queries at tolerance τ = 2ε/(9(s − 1)n). [sent-270, score-0.524]
35 Theorem 7 For any alphabet of size at least 2, given two disjoint sets of strings S, T ⊂ Σ∗, the problem of determining whether there exists a string u such that u ⊑ x for each x ∈ S and u ̸⊑ x for each x ∈ T is NP-complete. [sent-294, score-0.69]
36 Let Σ = {0, 1}, let n be a positive integer and define An to be the set of 2^n binary strings described by the regular expression ((00000 + 00100)11)^n. [sent-296, score-0.317]
37 Define strings v0 = 000100, v1 = 001000, d = 11, and let Sn consist of the two strings s0 = (v0 d)^n, s1 = (v1 d)^n. [sent-297, score-0.514]
38 The strings ti,0 , ti,1 and ti,2 are obtained from s0 by replacing occurrence i of v0 by y0 , y1 , and z, respectively. [sent-299, score-0.301]
39 The string ti,3 is obtained from s0 by replacing occurrence i of d by d0 . [sent-300, score-0.42]
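The construction is easy to write out. The replacement strings y0, y1, z and d0 are not restated in this extract; the values below (y0 = 00010, y1 = 01000, z = 0000, d0 = 1) are inferred from the proof text that follows (each ui must be a subsequence of 000100 but of none of 00010, 01000, 0000, and v0 d0 v0 must read 0001001000100), so they should be taken as a reconstruction:

```python
V0, V1, D = "000100", "001000", "11"
Y0, Y1, Z, D0 = "00010", "01000", "0000", "1"   # reconstructed, see above

def s_strings(n):
    """The two strings of Sn: s0 = (v0 d)^n and s1 = (v1 d)^n."""
    return [(V0 + D) * n, (V1 + D) * n]

def t_strings(n):
    """The 4n strings of Tn: t_{i,0..2} replace occurrence i of v0 in s0
    by y0, y1, z; t_{i,3} replaces occurrence i of d by d0."""
    ts = []
    for i in range(n):
        for repl in (Y0, Y1, Z):
            blocks = [V0 + D] * n
            blocks[i] = repl + D
            ts.append("".join(blocks))
        blocks = [V0 + D] * n
        blocks[i] = V0 + D0
        ts.append("".join(blocks))
    return ts
```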
40 The following lemma shows that the set of strings consistent with Sn and Tn is precisely the set of 2^n strings in An. [sent-302, score-0.514]
41 Lemma 8 Let Cn be the set of strings u such that u is a subsequence of both strings in Sn and not a subsequence of any string in Tn. Then Cn = An. [sent-303, score-1.482]
42 Proof We first observe that for any positive integer m and any string u ∈ Am, the leftmost span of u in (v0 d)^m is (v0 d)^m itself, and the leftmost span of u in (v1 d)^m is (v1 d)^m itself. [sent-305, score-0.76]
43 Similarly, for any string u ∈ Am, the rightmost span of du in d(v0 d)^m is d(v0 d)^m itself, and the rightmost span of du in d(v1 d)^m is d(v1 d)^m itself. [sent-308, score-0.602]
44 The leftmost span of u′ in ti,0 is (v0 d)^{i−1}, and the rightmost span of u′′ in ti,0 is d(v0 d)^{n−i}, which implies that ui ⊑ y0 by Lemma 1. [sent-317, score-0.42]
45 Similar arguments show that u is not a subsequence of ti,1 or ti,2 . [sent-320, score-0.296]
46 We divide u into parts, u = u′ ui d ui+1 u′′, where u′ = u1 d · · · ui−1 d and u′′ = d ui+2 · · · un d. [sent-322, score-0.261]
47 That is, at least one of the strings 000001100000, 001001100000, 000001100100, 001001100100 must be a subsequence of 0001001000100, which is false, showing that u is not a subsequence of ti,3. [sent-325, score-0.849]
48 Thus u is not a subsequence of any string in Tn , and u ∈ Cn . [sent-326, score-0.672]
49 Similarly, if ui is a subsequence of y0 , y1 or z, then u is a subsequence of ti,0 , ti,1 , or ti,2 , respectively, so we know that each ui is a subsequence of the string 000100, but not a subsequence of the strings 00010, 01000, or 0000. [sent-332, score-2.047]
50 To eliminate the third possibility we use the fact that u is a subsequence of s1 . [sent-334, score-0.296]
51 Consider any string w = w1 d w2 d · · · wn d, where wi = 000100 and each w j for j ≠ i is either 00000 or 00100. [sent-335, score-0.376]
52 If w ⊑ s1, then the leftmost span of w′ in s1 is (v1 d)^{i−1}, and the rightmost span of w′′ in s1 is d(v1 d)^{n−i}, which by Lemma 1 means that 000100 must be a subsequence of v1 = 001000, a contradiction. [sent-337, score-0.601]
53 Thus no such w is a subsequence of s1 , and we must have ui equal to 00000 or 00100 for all i, that is, u must be in An . [sent-338, score-0.411]
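For small n, Lemma 8 can be checked exhaustively with the generators above and is_subsequence from the earlier sketch (the check passes under the reconstructed values of y0, y1, z, d0):

```python
from itertools import product

def A_n(n):
    """The 2^n strings described by ((00000 + 00100)11)^n."""
    return {"".join(b + "11" for b in blocks)
            for blocks in product(["00000", "00100"], repeat=n)}

def C_n(n):
    """Strings consistent with Sn and Tn (|u| <= |s0|, since u ⊑ s0)."""
    S, T = s_strings(n), t_strings(n)
    out = set()
    for L in range(len(S[0]) + 1):
        for bits in product("01", repeat=L):
            u = "".join(bits)
            if (all(is_subsequence(u, x) for x in S)
                    and not any(is_subsequence(u, x) for x in T)):
                out.add(u)
    return out

assert C_n(1) == A_n(1)   # brute force confirms the lemma for n = 1
```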
54 Proof To see that this decision problem is in NP, note that if S is empty, then any string longer than the longest string in T satisfies the necessary requirements, so the answer in this case is necessarily "yes." [sent-341, score-0.778]
55 Given a CNF formula φ over the n variables xi for 1 ≤ i ≤ n, we construct two sets of binary strings S and T such that φ is satisfiable if and only if there exists a shuffle string u that is a subsequence of every string in S and of no string in T . [sent-344, score-1.736]
56 The set T consists of the strings in Tn together with additional strings determined by the clauses of φ. [sent-346, score-0.514]
57 By Lemma 8, the strings consistent with Sn and Tn are the 2^n strings in An. [sent-347, score-0.514]
58 We use each u = u1 d u2 d · · · un d in An to represent an assignment to the n variables xi by choosing xi = 0 if ui = 00000 and xi = 1 if ui = 00100. [sent-348, score-0.403]
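In code, this correspondence between assignments and strings of An is a simple block code, seven symbols per variable; a small sketch:

```python
def encode(assignment):
    """Bits -> string in An: xi = 1 maps block i to 00100, else 00000."""
    return "".join(("00100" if b == "1" else "00000") + "11" for b in assignment)

def decode(u):
    """String in An -> bits (each block is 5 pattern symbols plus d = 11)."""
    return "".join("1" if u[i:i + 5] == "00100" else "0"
                   for i in range(0, len(u), 7))

assert decode(encode("0110")) == "0110"
```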
59 We construct additional elements of T based on the clauses of the formula φ to exclude any strings representing assignments that do not satisfy φ. [sent-349, score-0.313]
60 The strings in An that are subsequences of t j are exactly those that correspond to assignments that falsify clause j of φ, and adding t j to T eliminates these strings from those consistent with S and T . [sent-351, score-0.648]
61 By adding one string t j to T for each clause j of φ, we ensure that the only strings u that are subsequences of both elements of S and not subsequences of any element of T are exactly those elements of An that correspond to assignments that do not falsify any clause of φ. [sent-352, score-0.847]
62 Thus, there exists at least one string u that is a subsequence of both strings in S and not a subsequence of any string in T if and only if φ is satisfiable. [sent-353, score-1.601]
63 Note that S contains two strings of length O(n), Tn contains 4n strings of length O(n), and T additionally contains one string of length O(n) for each clause of φ, so the sizes of S and T are polynomial in the size of φ. [sent-354, score-1.067]
64 Thus, the class of shuffle ideals faces the same cryptographic limitations on PAC learnability as demonstrated by Kearns and Valiant for the class of general regular languages represented by deterministic finite automata. [sent-359, score-0.511]
65 Thus, an OR of m inputs is equivalent to a threshold function with threshold 1, and an AND of m inputs is equivalent to a threshold function with threshold m. [sent-362, score-0.354]
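In symbols, THRt(b1, . . . , bm) = 1 iff at least t of the inputs are 1, so OR = THR1 and AND = THRm:

```python
def threshold(t, bits):
    """THR_t: outputs 1 iff at least t of the inputs are 1."""
    return int(sum(bits) >= t)

assert threshold(1, [0, 1, 0]) == 1      # OR of three inputs
assert threshold(3, [0, 1, 1]) == 0      # AND of three inputs
```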
66 For d > 0, the formulas of depth d consist of a threshold function with m inputs applied to a sequence of m formulas of depth d − 1. [sent-373, score-0.36]
67 The string alphabet consists of the symbols 0 and 1 and a set of d + 1 delimiters: #0, #1, . . . , #d. [sent-380, score-0.433]
68 In this case, the shuffle string is r0( f ) = y1 #0 y2 #0 · · · yn #0. [sent-385, score-0.376]
69 If the assignment a is given by a binary string a1 a2 · · · an, indicating that xi is assigned the value ai, [sent-391, score-0.438]
70 then the string representing the assignment is just s0(a) = a1 #0 a2 #0 · · · an #0. [sent-394, score-0.464]
71 It is clear that r0( f ) is a subsequence of s0(a) if and only if the n occurrences of #0 in each string are matched, and y j is a subsequence of a j for all j = 1, 2, . . . , n. [sent-398, score-0.968]
72 Thus, when f is a constant or a literal, r0 ( f ) is a subsequence of s0 (a) if and only if a satisfies f . [sent-405, score-0.296]
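The base case can be made concrete. The extract does not restate the blocks y j, but for the stated equivalence to hold with one-bit a j, the natural choice is y j = 1 for the literal x j, y j = 0 for its negation, and y j empty for every other index; read this as a reconstruction (we write # for the delimiter #0):

```python
def s0(a):
    """Level-0 assignment string a1 #0 a2 #0 ... an #0 for a bit string a."""
    return "".join(bit + "#" for bit in a)

def r0_literal(j, value, n):
    """Level-0 shuffle string for x_j (value = 1) or its negation (value = 0);
    every y-block is empty except block j. Reconstructed choice, see above."""
    blocks = [""] * n
    blocks[j] = str(value)
    return "".join(y + "#" for y in blocks)

# a = 011 satisfies x_1 (0-indexed) but a = 001 does not:
assert is_subsequence(r0_literal(1, 1, 3), s0("011"))
assert not is_subsequence(r0_literal(1, 1, 3), s0("001"))
```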
73 In addition to defining the shuffle string and the assignment strings at each level, we also define a slack string. [sent-406, score-0.735]
74 For level 0, the slack string z0 is defined as follows. [sent-407, score-0.416]
75 z0 = (01 #0)^n; that is, z0 consists of n repetitions of the string 01#0. [sent-408, score-0.376]
76 For level d, the slack string is designed to ensure that rd ( f ) is a subsequence of zd for any f ∈ T (n, m, d); this clearly holds at level d = 0. [sent-409, score-0.84]
77 Suppose f = θ( f1, . . . , fm ), where each fi is a depth d − 1 threshold formula and θ is a threshold function with threshold t. [sent-418, score-0.358]
78 We define the shuffle string rd( f ) = u1 u1 u2 u2 · · · um um (#d)^{2t}, where for each i = 1, 2, . . . , m, ui = rd−1( fi ) #d. [sent-419, score-0.427]
79 Given an assignment a to the variables Vn, we define a level d assignment string sd(a) = v^{2m}, where v = sd−1(a) #d zd−1 #d. [sent-428, score-0.61]
80 That is, sd(a) is 2m copies of the string v consisting of the level d − 1 code for a, followed by #d, followed by the level d − 1 slack string, followed by #d. [sent-429, score-0.575]
81 Finally, the level d slack string is defined as follows. [sent-431, score-0.416]
82 A straightforward induction shows that for any threshold formula f in T (n, m, d), rd ( f ) is a subsequence of zd , and for any assignment a to the variables, sd (a) is also a subsequence of zd . [sent-433, score-1.073]
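The recursion can be sketched end to end. Two pieces are reconstructions rather than quotations from the paper: the block ui = rd−1( fi ) #d (as completed above; this is what makes the vv-consumption argument below work), and the level-d slack recursion, whose definition is missing from this extract and which we guess as zd = (zd−1 #d)^{4m}, a choice consistent with the stated length bound (10m)^d (3n). Delimiters are stand-in characters, with # playing #0 as in the base case:

```python
DELIM = "#ABCDEF"          # one distinct character per level; DELIM[0] = '#'

def z(d, n, m):
    """Slack string: z0 = (01 #0)^n is from the text; the recursion for
    d > 0 is our guess (see the lead-in above)."""
    if d == 0:
        return ("01" + DELIM[0]) * n
    return (z(d - 1, n, m) + DELIM[d]) * (4 * m)

def r(f, d, n, m):
    """Shuffle string rd(f); formulas are ('lit', j, value) at the leaves
    and ('thr', t, [subformulas]) at inner nodes."""
    if d == 0:
        _, j, value = f
        return r0_literal(j, value, n)
    _, t, subs = f
    us = [r(fi, d - 1, n, m) + DELIM[d] for fi in subs]   # ui = r_{d-1}(fi) #d
    return "".join(ui + ui for ui in us) + DELIM[d] * (2 * t)

def s(a, d, n, m):
    """Assignment string sd(a) = (s_{d-1}(a) #d z_{d-1} #d)^{2m}."""
    if d == 0:
        return s0(a)
    v = s(a, d - 1, n, m) + DELIM[d] + z(d - 1, n, m) + DELIM[d]
    return v * (2 * m)

# THR_2(x0, x1, x2): satisfied by 110 (two true inputs) but not by 100.
f = ("thr", 2, [("lit", 0, 1), ("lit", 1, 1), ("lit", 2, 1)])
assert is_subsequence(r(f, 1, 3, 3), s("110", 1, 3, 3))
assert not is_subsequence(r(f, 1, 3, 3), s("100", 1, 3, 3))
```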
83 Lemma 9 For all threshold formulas f in T (n, m, d) and assignments a to the variables in Vn , a satisfies f if and only if rd ( f ) is a subsequence of sd (a). [sent-434, score-0.63]
84 For d = 0, the basis construction showed that for all constants or literals f and assignments a, a satisfies f if and only if r0 ( f ) is a subsequence of s0 (a). [sent-436, score-0.347]
85 Suppose f = θ( f1, . . . , fm ), where each fi is a depth d − 1 threshold formula and θ is a threshold function with threshold t. [sent-441, score-0.358]
86 Because rd−1 ( fi ) is a subsequence of the slack string zd−1 , ui ui is a subsequence of vv. [sent-443, score-1.284]
87 Also, ui ui is a subsequence of v if and only if rd−1 ( fi ) is a subsequence of sd−1 (a), which holds if and only if a satisfies fi , by the inductive assumption. [sent-444, score-0.914]
88 If ui ui is not a subsequence of v, then a leftmost embedding of ui ui in vv must match the first #d in ui ui to the second #d in vv and the second #d in ui ui to the fourth #d in vv, thereby “consuming” all of vv for the embedding. [sent-445, score-1.493]
89 By the inductive assumption, this means that rd−1 ( fi ) is a subsequence of sd−1 (a) for each i ∈ T . [sent-448, score-0.342]
90 For i ∈ T , ui ui is a subsequence of vv but not of v. [sent-450, score-0.559]
91 Conversely, suppose that rd ( f ) is a subsequence of sd (a), and consider a leftmost embedding. [sent-453, score-0.592]
92 Considering the segments ui ui of rd ( f ) from left to right, we see that the leftmost embedding consumes one copy of v if a satisfies fi and two copies if a does not satisfy fi . [sent-454, score-0.6]
93 Each is a subsequence of zd, and for m ≥ 2, the length of zd is bounded by (10m)^d (3n). [sent-458, score-0.476]
94 The first result assumes a polynomial time algorithm to learn shuffle ideals over some fixed alphabet. [sent-461, score-0.27]
95 Theorem 10 Suppose for some positive integer d, there exists a polynomial time algorithm to PAC learn shuffle ideals over an alphabet of size d + 2. [sent-462, score-0.327]
96 The second result assumes a polynomial time algorithm to learn shuffle ideals over an arbitrary finite alphabet, where the dependence on the alphabet size must be at most exponential. [sent-464, score-0.327]
97 Theorem 11 Suppose there exists an algorithm to PAC learn shuffle ideals over arbitrary finite alphabets that runs in time polynomial in n and C^s, where n is a bound on the length of examples, s is the alphabet size and C is a fixed constant. [sent-465, score-0.353]
98 The assignment strings for the assignment a = 001 are as follows. [sent-476, score-0.381]
99 Assignment a satisfies f and r2 ( f ) is a subsequence of s2 (a). [sent-478, score-0.296]
100 Discussion We have shown that the class of shuffle ideals is not efficiently properly PAC learnable unless RP = NP, and is not efficiently improperly PAC learnable under certain cryptographic assumptions. [sent-480, score-0.443]
wordName wordTfidf (topN-words)
[('string', 0.376), ('pac', 0.352), ('subsequence', 0.296), ('shuf', 0.29), ('strings', 0.257), ('ideals', 0.216), ('kontorovich', 0.211), ('ta', 0.168), ('tb', 0.165), ('leftmost', 0.135), ('ui', 0.115), ('sd', 0.11), ('sq', 0.105), ('doi', 0.102), ('languages', 0.099), ('aryeh', 0.095), ('earnability', 0.095), ('huffle', 0.095), ('isenstat', 0.095), ('ngluin', 0.095), ('spnes', 0.095), ('dana', 0.09), ('automata', 0.088), ('issn', 0.081), ('kearns', 0.079), ('pr', 0.079), ('zd', 0.077), ('threshold', 0.075), ('learnability', 0.073), ('formulas', 0.071), ('cryptographic', 0.063), ('angluin', 0.063), ('tolerance', 0.063), ('assignment', 0.062), ('regular', 0.06), ('learnable', 0.06), ('queries', 0.059), ('depth', 0.058), ('alphabet', 0.057), ('span', 0.057), ('rightmost', 0.056), ('polynomial', 0.054), ('aspnes', 0.053), ('symbol', 0.052), ('deals', 0.051), ('rd', 0.051), ('mehryar', 0.05), ('language', 0.049), ('copies', 0.049), ('query', 0.049), ('dfa', 0.049), ('fi', 0.046), ('clause', 0.045), ('pre', 0.044), ('properly', 0.044), ('occurrence', 0.044), ('embedding', 0.043), ('delimiters', 0.042), ('pitt', 0.042), ('valiant', 0.041), ('automaton', 0.041), ('slack', 0.04), ('mohri', 0.038), ('leonid', 0.037), ('branching', 0.035), ('subsequences', 0.035), ('ideal', 0.034), ('vv', 0.033), ('rp', 0.033), ('un', 0.033), ('tn', 0.032), ('corinna', 0.032), ('delimiter', 0.032), ('sarah', 0.032), ('formula', 0.029), ('boolean', 0.029), ('isbn', 0.028), ('np', 0.028), ('assignments', 0.027), ('inputs', 0.027), ('vazirani', 0.027), ('falsify', 0.027), ('eisenstat', 0.027), ('parekh', 0.027), ('dui', 0.027), ('alt', 0.026), ('november', 0.026), ('yale', 0.026), ('xi', 0.026), ('length', 0.026), ('linguistics', 0.025), ('bshouty', 0.024), ('literal', 0.024), ('literals', 0.024), ('morphological', 0.024), ('palmer', 0.024), ('heidelberg', 0.024), ('ul', 0.022), ('cortes', 0.022), ('james', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000008 78 jmlr-2013-On the Learnability of Shuffle Ideals
2 0.18735556 63 jmlr-2013-Learning Trees from Strings: A Strong Learning Algorithm for some Context-Free Grammars
Author: Alexander Clark
Abstract: Standard models of language learning are concerned with weak learning: the learner, receiving as input only information about the strings in the language, must learn to generalise and to generate the correct, potentially infinite, set of strings generated by some target grammar. Here we define the corresponding notion of strong learning: the learner, again only receiving strings as input, must learn a grammar that generates the correct set of structures or parse trees. We formalise this using a modification of Gold’s identification in the limit model, requiring convergence to a grammar that is isomorphic to the target grammar. We take as our starting point a simple learning algorithm for substitutable context-free languages, based on principles of distributional learning, and modify it so that it will converge to a canonical grammar for each language. We prove a corresponding strong learning result for a subclass of context-free grammars. Keywords: context-free grammars, grammatical inference, identification in the limit, structure learning
3 0.15347983 33 jmlr-2013-Dimension Independent Similarity Computation
Author: Reza Bosagh Zadeh, Ashish Goel
Abstract: We present a suite of algorithms for Dimension Independent Similarity Computation (DISCO) to compute all pairwise similarities between very high-dimensional sparse vectors. All of our results are provably independent of dimension, meaning that apart from the initial cost of trivially reading in the data, all subsequent operations are independent of the dimension; thus the dimension can be very large. We study Cosine, Dice, Overlap, and the Jaccard similarity measures. For Jaccard similarity we include an improved version of MinHash. Our results are geared toward the MapReduce framework. We empirically validate our theorems with large scale experiments using data from the social networking site Twitter. At time of writing, our algorithms are live in production at twitter.com. Keywords: cosine, Jaccard, overlap, dice, similarity, MapReduce, dimension independent
4 0.076077551 10 jmlr-2013-Algorithms and Hardness Results for Parallel Large Margin Learning
Author: Philip M. Long, Rocco A. Servedio
Abstract: We consider the problem of learning an unknown large-margin halfspace in the context of parallel computation, giving both positive and negative results. As our main positive result, we give a parallel algorithm for learning a large-margin halfspace, based on an algorithm of Nesterov’s that performs gradient descent with a momentum term. We show that this algorithm can learn an unknown γ-margin halfspace over n dimensions using ˜ n · poly(1/γ) processors and running in time O(1/γ) + O(log n). In contrast, naive parallel algorithms that learn a γ-margin halfspace in time that depends polylogarithmically on n have an inverse quadratic running time dependence on the margin parameter γ. Our negative result deals with boosting, which is a standard approach to learning large-margin halfspaces. We prove that in the original PAC framework, in which a weak learning algorithm is provided as an oracle that is called by the booster, boosting cannot be parallelized. More precisely, we show that, if the algorithm is allowed to call the weak learner multiple times in parallel within a single boosting stage, this ability does not reduce the overall number of successive stages of boosting needed for learning by even a single stage. Our proof is information-theoretic and does not rely on unproven assumptions. Keywords: PAC learning, parallel learning algorithms, halfspace learning, linear classifiers
5 0.071742453 44 jmlr-2013-Finding Optimal Bayesian Networks Using Precedence Constraints
Author: Pekka Parviainen, Mikko Koivisto
Abstract: We consider the problem of finding a directed acyclic graph (DAG) that optimizes a decomposable Bayesian network score. While in a favorable case an optimal DAG can be found in polynomial time, in the worst case the fastest known algorithms rely on dynamic programming across the node subsets, taking time and space 2n , to within a factor polynomial in the number of nodes n. In practice, these algorithms are feasible to networks of at most around 30 nodes, mainly due to the large space requirement. Here, we generalize the dynamic programming approach to enhance its feasibility in three dimensions: first, the user may trade space against time; second, the proposed algorithms easily and efficiently parallelize onto thousands of processors; third, the algorithms can exploit any prior knowledge about the precedence relation on the nodes. Underlying all these results is the key observation that, given a partial order P on the nodes, an optimal DAG compatible with P can be found in time and space roughly proportional to the number of ideals of P , which can be significantly less than 2n . Considering sufficiently many carefully chosen partial orders guarantees that a globally optimal DAG will be found. Aside from the generic scheme, we present and analyze concrete tradeoff schemes based on parallel bucket orders. Keywords: exact algorithm, parallelization, partial order, space-time tradeoff, structure learning
6 0.070350282 91 jmlr-2013-Query Induction with Schema-Guided Pruning Strategies
7 0.063708872 40 jmlr-2013-Efficient Program Synthesis Using Constraint Satisfaction in Inductive Logic Programming
8 0.043095108 21 jmlr-2013-Classifier Selection using the Predicate Depth
9 0.042821743 5 jmlr-2013-A Near-Optimal Algorithm for Differentially-Private Principal Components
10 0.042052258 39 jmlr-2013-Efficient Active Learning of Halfspaces: An Aggressive Approach
11 0.034870323 34 jmlr-2013-Distance Preserving Embeddings for General n-Dimensional Manifolds
12 0.030799085 94 jmlr-2013-Ranked Bandits in Metric Spaces: Learning Diverse Rankings over Large Document Collections
13 0.029879658 6 jmlr-2013-A Plug-in Approach to Neyman-Pearson Classification
14 0.029740604 112 jmlr-2013-Tapkee: An Efficient Dimension Reduction Library
15 0.029497094 35 jmlr-2013-Distribution-Dependent Sample Complexity of Large Margin Learning
16 0.029103294 77 jmlr-2013-On the Convergence of Maximum Variance Unfolding
17 0.029009668 86 jmlr-2013-Parallel Vector Field Embedding
18 0.028891277 114 jmlr-2013-The Rate of Convergence of AdaBoost
19 0.028717404 66 jmlr-2013-MAGIC Summoning: Towards Automatic Suggesting and Testing of Gestures With Low Probability of False Positives During Use
20 0.027928445 8 jmlr-2013-A Theory of Multiclass Boosting
topicId topicWeight
[(0, -0.138), (1, 0.065), (2, -0.014), (3, 0.005), (4, -0.047), (5, -0.006), (6, 0.132), (7, 0.026), (8, -0.309), (9, 0.191), (10, -0.006), (11, -0.303), (12, -0.115), (13, 0.327), (14, -0.35), (15, -0.008), (16, -0.018), (17, -0.053), (18, -0.043), (19, 0.013), (20, -0.069), (21, -0.131), (22, -0.029), (23, 0.041), (24, -0.099), (25, 0.003), (26, -0.055), (27, -0.027), (28, -0.088), (29, 0.005), (30, -0.101), (31, 0.059), (32, 0.019), (33, -0.064), (34, 0.054), (35, 0.08), (36, -0.001), (37, 0.014), (38, 0.02), (39, -0.022), (40, -0.024), (41, 0.054), (42, 0.019), (43, 0.028), (44, 0.002), (45, 0.043), (46, -0.006), (47, -0.01), (48, 0.029), (49, 0.035)]
simIndex simValue paperId paperTitle
same-paper 1 0.969405 78 jmlr-2013-On the Learnability of Shuffle Ideals
2 0.82139647 63 jmlr-2013-Learning Trees from Strings: A Strong Learning Algorithm for some Context-Free Grammars
3 0.61522847 33 jmlr-2013-Dimension Independent Similarity Computation
4 0.30153972 40 jmlr-2013-Efficient Program Synthesis Using Constraint Satisfaction in Inductive Logic Programming
Author: John Ahlgren, Shiu Yin Yuen
Abstract: We present NrSample, a framework for program synthesis in inductive logic programming. NrSample uses propositional logic constraints to exclude undesirable candidates from the search. This is achieved by representing constraints as propositional formulae and solving the associated constraint satisfaction problem. We present a variety of such constraints: pruning, input-output, functional (arithmetic), and variable splitting. NrSample is also capable of detecting search space exhaustion, leading to further speedups in clause induction and optimality. We benchmark NrSample against enumeration search (Aleph’s default) and Progol’s A∗ search in the context of program synthesis. The results show that, on large program synthesis problems, NrSample induces between 1 and 1358 times faster than enumeration (236 times faster on average), always with similar or better accuracy. Compared to Progol A∗ , NrSample is 18 times faster on average with similar or better accuracy except for two problems: one in which Progol A∗ substantially sacrificed accuracy to induce faster, and one in which Progol A∗ was a clear winner. Functional constraints provide a speedup of up to 53 times (21 times on average) with similar or better accuracy. We also benchmark using a few concept learning (non-program synthesis) problems. The results indicate that without strong constraints, the overhead of solving constraints is not compensated for. Keywords: inductive logic programming, program synthesis, theory induction, constraint satisfaction, Boolean satisfiability problem
5 0.25192317 44 jmlr-2013-Finding Optimal Bayesian Networks Using Precedence Constraints
6 0.24473517 91 jmlr-2013-Query Induction with Schema-Guided Pruning Strategies
7 0.23814005 10 jmlr-2013-Algorithms and Hardness Results for Parallel Large Margin Learning
8 0.19515589 21 jmlr-2013-Classifier Selection using the Predicate Depth
9 0.19003305 39 jmlr-2013-Efficient Active Learning of Halfspaces: An Aggressive Approach
11 0.15619612 5 jmlr-2013-A Near-Optimal Algorithm for Differentially-Private Principal Components
12 0.14768511 94 jmlr-2013-Ranked Bandits in Metric Spaces: Learning Diverse Rankings over Large Document Collections
13 0.1420045 104 jmlr-2013-Sparse Single-Index Model
14 0.13239923 35 jmlr-2013-Distribution-Dependent Sample Complexity of Large Margin Learning
15 0.13038734 9 jmlr-2013-A Widely Applicable Bayesian Information Criterion
16 0.12872083 112 jmlr-2013-Tapkee: An Efficient Dimension Reduction Library
17 0.12870751 34 jmlr-2013-Distance Preserving Embeddings for General n-Dimensional Manifolds
18 0.12631798 6 jmlr-2013-A Plug-in Approach to Neyman-Pearson Classification
19 0.12437458 72 jmlr-2013-Multi-Stage Multi-Task Feature Learning
20 0.12422371 18 jmlr-2013-Beyond Fano's Inequality: Bounds on the Optimal F-Score, BER, and Cost-Sensitive Risk and Their Implications
topicId topicWeight
[(0, 0.012), (5, 0.081), (6, 0.03), (9, 0.538), (10, 0.04), (20, 0.017), (23, 0.02), (41, 0.024), (68, 0.015), (70, 0.029), (75, 0.033), (85, 0.035), (87, 0.018), (89, 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.83307159 78 jmlr-2013-On the Learnability of Shuffle Ideals
2 0.82376492 40 jmlr-2013-Efficient Program Synthesis Using Constraint Satisfaction in Inductive Logic Programming
3 0.25782928 63 jmlr-2013-Learning Trees from Strings: A Strong Learning Algorithm for some Context-Free Grammars
4 0.23434523 5 jmlr-2013-A Near-Optimal Algorithm for Differentially-Private Principal Components
Author: Kamalika Chaudhuri, Anand D. Sarwate, Kaushik Sinha
Abstract: The principal components analysis (PCA) algorithm is a standard tool for identifying good lowdimensional approximations to high-dimensional data. Many data sets of interest contain private or sensitive information about individuals. Algorithms which operate on such data should be sensitive to the privacy risks in publishing their outputs. Differential privacy is a framework for developing tradeoffs between privacy and the utility of these outputs. In this paper we investigate the theory and empirical performance of differentially private approximations to PCA and propose a new method which explicitly optimizes the utility of the output. We show that the sample complexity of the proposed method differs from the existing procedure in the scaling with the data dimension, and that our method is nearly optimal in terms of this scaling. We furthermore illustrate our results, showing that on real data there is a large performance gap between the existing method and our method. Keywords: differential privacy, principal components analysis, dimension reduction
5 0.22208388 10 jmlr-2013-Algorithms and Hardness Results for Parallel Large Margin Learning
6 0.21559785 37 jmlr-2013-Divvy: Fast and Intuitive Exploratory Data Analysis
7 0.21100372 118 jmlr-2013-Using Symmetry and Evolutionary Search to Minimize Sorting Networks
9 0.2005112 25 jmlr-2013-Communication-Efficient Algorithms for Statistical Optimization
10 0.19944064 105 jmlr-2013-Sparsity Regret Bounds for Individual Sequences in Online Linear Regression
12 0.19695307 4 jmlr-2013-A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion
13 0.19585708 26 jmlr-2013-Conjugate Relation between Loss Functions and Uncertainty Sets in Classification Problems
14 0.19581044 39 jmlr-2013-Efficient Active Learning of Halfspaces: An Aggressive Approach
15 0.19559555 102 jmlr-2013-Sparse Matrix Inversion with Scaled Lasso
16 0.19529793 101 jmlr-2013-Sparse Activity and Sparse Connectivity in Supervised Learning
18 0.19379885 17 jmlr-2013-Belief Propagation for Continuous State Spaces: Stochastic Message-Passing with Quantitative Guarantees
20 0.19300313 72 jmlr-2013-Multi-Stage Multi-Task Feature Learning