jmlr jmlr2009 jmlr2009-64 knowledge-graph by maker-knowledge-mining

64 jmlr-2009-On The Power of Membership Queries in Agnostic Learning


Source: pdf

Author: Vitaly Feldman

Abstract: We study the properties of the agnostic learning framework of Haussler (1992) and Kearns, Schapire, and Sellie (1994). In particular, we address the question: is there any situation in which membership queries are useful in agnostic learning? Our results show that the answer is negative for distribution-independent agnostic learning and positive for agnostic learning with respect to a specific marginal distribution. Namely, we give a simple proof that any concept class learnable agnostically by a distribution-independent algorithm with access to membership queries is also learnable agnostically without membership queries. This resolves an open problem posed by Kearns et al. (1994). For agnostic learning with respect to the uniform distribution over {0, 1}n we show a concept class that is learnable with membership queries but computationally hard to learn from random examples alone (assuming that one-way functions exist). Keywords: agnostic learning, membership query, separation, PAC learning

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We study the properties of the agnostic learning framework of Haussler (1992) and Kearns, Schapire, and Sellie (1994). [sent-4, score-0.475]

2 In particular, we address the question: is there any situation in which membership queries are useful in agnostic learning? [sent-5, score-0.932]

3 Our results show that the answer is negative for distribution-independent agnostic learning and positive for agnostic learning with respect to a specific marginal distribution. [sent-6, score-0.95]

4 Namely, we give a simple proof that any concept class learnable agnostically by a distribution-independent algorithm with access to membership queries is also learnable agnostically without membership queries. [sent-7, score-1.488]

5 For agnostic learning with respect to the uniform distribution over {0, 1}n we show a concept class that is learnable with membership queries but computationally hard to learn from random examples alone (assuming that one-way functions exist). [sent-10, score-1.224]

6 Keywords: agnostic learning, membership query, separation, PAC learning 1. [sent-11, score-0.773]

7 Introduction The agnostic framework (Haussler, 1992; Kearns et al. [sent-12, score-0.475]

8 The goal of the agnostic learning algorithm for a concept class C is to produce a hypothesis h whose error on the target concept is close to the best possible by a concept from C . [sent-15, score-0.8]

9 Furthermore, strong computational hardness results are known for agnostic learning of even the simplest classes of functions such as parities, monomials and halfspaces (Håstad, 2001; Feldman, 2006; Feldman et al. [sent-23, score-0.514]

10 agnostic learning of simple classes of functions provide another indication of the hardness of agnostic learning (Kearns et al. [sent-30, score-0.514]

11 A membership oracle allows a learning algorithm to obtain the value of the unknown target function f on any point in the domain. [sent-34, score-0.385]

12 Learning with membership queries in both PAC and Angluin’s exact models (Angluin, 1988) was studied in numerous works. [sent-36, score-0.481]

13 For example, monotone DNF formulas, finite automata and decision trees are only known to be learnable with membership queries (Valiant, 1984; Angluin, 1988; Bshouty, 1995). [sent-37, score-0.581]

14 It is well-known and easy to prove that the PAC model with membership queries is strictly stronger than the PAC model without membership queries (if one-way functions exist). [sent-38, score-0.914]

15 Membership queries are also used in several agnostic learning algorithms. [sent-39, score-0.634]

16 The first one is the famous algorithm of Goldreich and Levin (1989) introduced in a cryptographic context (even before the definition of the agnostic learning model). [sent-40, score-0.541]

17 Their algorithm learns parities agnostically with respect to the uniform distribution using membership queries. [sent-41, score-0.666]

18 Recently, Gopalan, Kalai, and Klivans (2008) gave an elegant algorithm that learns decision trees agnostically over the uniform distribution and uses membership queries. [sent-44, score-0.575]

19 1 Our Contribution In this work we study the power of membership queries in the agnostic learning model. [sent-46, score-0.932]

20 The question of whether or not membership queries can aid in agnostic learning was first asked by Kearns et al. [sent-47, score-0.932]

21 In the first result we prove that every concept class learnable agnostically with membership queries is also learnable agnostically without membership queries (see Th. [sent-51, score-1.619]

22 The simple proof of this result explains why the known distribution-independent agnostic learning algorithms do not use membership queries (Kearns et al. [sent-56, score-0.932]

23 The proof of this result also shows equivalence of two standard agnostic models: the one in which examples are labeled by an unrestricted function and the one in which examples come from a joint distribution over the domain and the labels. [sent-59, score-0.522]

24 Our second result is a proof that there exists a concept class that is agnostically learnable with membership queries over the uniform distribution on {0, 1}^n but hard to learn in the same setting without membership queries (see Th. [sent-60, score-1.377]

25 Our construction is based on the use of pseudorandom function families, list-decodable codes and a variant of an idea from the work of Elbaz, Lee, Servedio, and Wan (2007). [sent-66, score-0.322]

26 If one assumes that learning of parities with noise is intractable, then it immediately follows that membership queries are provably helpful in agnostic learning over the uniform distribution on {0, 1}^n. [sent-75, score-1.099]

27 Here the main obstacle is the same as in proving positive results for agnostic learning: the requirements of the model impose severe limits on concept classes for which the agnostic guarantees can be provably satisfied. [sent-80, score-1.042]

28 In this model, for a concept f and distribution D over X, an example oracle EX(D, f) is the oracle that, upon request, returns an example ⟨x, f(x)⟩, where x is chosen randomly with respect to D. [sent-99, score-0.347]

29 2 Agnostic Learning Model The agnostic learning model was introduced by Haussler (1992) and Kearns et al. [sent-109, score-0.475]

30 The goal of an agnostic learning algorithm for a concept class C is to produce a hypothesis whose error on examples generated from A is close to the best possible by a concept from C . [sent-112, score-0.708]

31 Definition 2 An algorithm Alg agnostically learns a representation class C by a representation class H assuming A if for every ε > 0, δ > 0, A ∈ A, Alg, given access to examples drawn randomly from A, outputs, with probability at least 1 − δ, a hypothesis h ∈ H such that ∆(A, h) ≤ ∆(A, C) + ε. [sent-122, score-0.371]
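
For reference, the error notation in this definition is the standard one from the paper, restated here for readability (the summary omits the lines where it is introduced): ∆(A, h) is the probability that h disagrees with a labeled example drawn from A, and ∆(A, C) is the smallest such error over the class.

```latex
% Standard agnostic-learning error notation (restated; assumed from the paper).
\Delta(A, h) = \Pr_{\langle x, b \rangle \sim A}\left[\, h(x) \neq b \,\right],
\qquad
\Delta(A, \mathcal{C}) = \inf_{c \in \mathcal{C}} \Delta(A, c).
```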

32 A number of special cases of the above definition are commonly considered (and often referred to as the agnostic learning model). [sent-126, score-0.475]

33 In fully agnostic learning A is the set of all distributions over X × {−1, 1}. [sent-127, score-0.475]

34 (1994), we refer to this version as agnostic PAC learning. [sent-132, score-0.475]

35 We also note that the agnostic PAC learning model can also be thought of as a model of adversarial classification noise. [sent-135, score-0.475]

36 1 UNIFORM CONVERGENCE A natural approach to agnostic learning is to first draw a sample of fixed size and then choose a hypothesis that best fits the observed labels. [sent-141, score-0.524]

37 3 Membership Queries A membership oracle for a function f is the oracle that, given any point z ∈ {0, 1}^n, returns the value f(z) (Valiant, 1984). [sent-153, score-0.503]

38 We refer to agnostic PAC learning with access to MEM( f ), where f is the unknown function that labels the examples, as agnostic PAC+MQ learning. [sent-155, score-1.001]

39 Similarly, one can extend the definition of a membership oracle to fully agnostic learning. [sent-156, score-0.86]

40 When learning with persistent membership queries the learning algorithm is allowed to fail with some negligible probability over the answers of MEM(A). [sent-159, score-0.541]

41 4 List-Decodable Codes As we have mentioned earlier, agnostic learning can be seen as recovery of an unknown concept from possibly malicious errors. [sent-163, score-0.567]

42 Therefore, encoding of information that allows recovery from errors, or error-correcting codes, can be useful in the design of agnostic learning algorithms. [sent-164, score-0.663]

43 1 HADAMARD CODE The Hadamard code encodes a vector a ∈ {0, 1}^n as the values of the parity function χ_a on all the points in {0, 1}^n (that is, the length of the encoding is 2^n). [sent-179, score-0.351]
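
As a concrete illustration of this encoding, the following Python sketch (illustrative helper names, not code from the paper) lists the 2^n values of the parity χ_a, which together form the Hadamard codeword of a.

```python
from itertools import product

def parity(a, x):
    """chi_a(x): parity of the bits of x selected by a (output in {0, 1})."""
    return sum(ai & xi for ai, xi in zip(a, x)) % 2

def hadamard_encode(a):
    """Hadamard codeword of a in {0,1}^n: the 2^n values of chi_a on {0,1}^n."""
    n = len(a)
    return [parity(a, x) for x in product([0, 1], repeat=n)]

# Example: encoding a 3-bit vector yields a codeword of length 2^3 = 8.
print(hadamard_encode((1, 0, 1)))  # -> [0, 1, 0, 1, 1, 0, 1, 0]
```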

44 Therefore, both list-decoding of the Hadamard code and agnostic learning of parities amount to finding the largest (within 2ε) Fourier coefficient of φ(x). [sent-193, score-0.654]

45 Given access to a membership oracle, for every ε > 0 their algorithm can efficiently find all ε-heavy Fourier coefficients. [sent-195, score-0.372]
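
To make the use of membership queries concrete, here is a minimal, hedged sketch of estimating a single Fourier coefficient of a ±1-valued function from point queries by sampling; the Goldreich-Levin algorithm organizes such estimates recursively to enumerate all ε-heavy coefficients, which this toy code does not attempt.

```python
import random

def estimate_fourier_coefficient(mem_query, a, n, num_samples=2000):
    """Estimate hat{f}(a) = E_x[f(x) * chi_a(x)] for a {-1,+1}-valued f,
    using only point (membership) queries to f.  Illustrative sketch only."""
    total = 0.0
    for _ in range(num_samples):
        x = [random.randint(0, 1) for _ in range(n)]
        chi = (-1) ** (sum(ai & xi for ai, xi in zip(a, x)) % 2)
        total += mem_query(x) * chi
    return total / num_samples

# Example with a noiseless parity: the coefficient at a is close to 1.
n, a = 6, [1, 0, 1, 1, 0, 0]
f = lambda x: (-1) ** (sum(ai & xi for ai, xi in zip(a, x)) % 2)
print(round(estimate_fourier_coefficient(f, a, n), 2))  # approximately 1.0
```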

46 5 Pseudo-random Function Families A key part of our construction in Section 4 will be based on the use of pseudorandom function families defined by Goldreich, Goldwasser, and Micali (1986). [sent-200, score-0.299]

47 Definition 5 A function family F = {F_n}_{n=1}^∞, where F_n = {π_z}_{z∈{0,1}^n}, is a pseudorandom function family of Boolean functions over {0, 1}^n if • There exists a polynomial-time algorithm that, for every n, given z ∈ {0, 1}^n and x ∈ {0, 1}^n, computes π_z(x). [sent-201, score-0.353]
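
The sketch below only illustrates the interface this definition asks for: each n-bit key z names an efficiently computable Boolean function π_z on {0, 1}^n. HMAC-SHA256 is used purely as a heuristic stand-in for the keyed function; it is not the Goldreich-Goldwasser-Micali construction the paper relies on.

```python
import hmac
import hashlib

def prf_bit(z: bytes, x: bytes) -> int:
    """pi_z(x): a single {0,1}-valued output for key z and input x.
    HMAC-SHA256 is a heuristic placeholder, not the GGM construction."""
    return hmac.new(z, x, hashlib.sha256).digest()[0] & 1

# Fixing the key z gives one member of the family; evaluation is polynomial time.
z = bytes(16)                         # hypothetical 128-bit key
print(prf_bit(z, b"\x00\x01\x02"))    # -> 0 or 1
```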

48 (1986) give a construction of pseudorandom function families based on the existence of one-way functions. [sent-207, score-0.299]

49 Distribution-Independent Agnostic Learning In this section we show that in distribution-independent agnostic learning membership queries do not help. [sent-209, score-0.932]

50 In addition, we prove that fully agnostic learning is equivalent to agnostic PAC learning. [sent-210, score-0.95]

51 Our proof is based on two simple observations about agnostic learning via empirical error minimization. [sent-211, score-0.475]

52 Therefore membership queries do not make empirical error minimization easier. [sent-213, score-0.457]
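
A minimal sketch of empirical error minimization over a finite hypothesis class, to make this observation concrete: the minimizer is a function of the labeled sample alone, so query access to the target adds nothing at this step. The hypotheses and data below are toy stand-ins.

```python
def empirical_error(h, sample):
    """Fraction of labeled examples (x, b) that hypothesis h misclassifies."""
    return sum(h(x) != b for x, b in sample) / len(sample)

def erm(hypotheses, sample):
    """Return the hypothesis with the smallest empirical error on the sample."""
    return min(hypotheses, key=lambda h: empirical_error(h, sample))

# Toy usage: choose between two threshold rules on one-dimensional data.
sample = [(0.1, -1), (0.4, -1), (0.6, 1), (0.9, 1)]
hypotheses = [lambda x: 1 if x > 0.5 else -1,
              lambda x: 1 if x > 0.8 else -1]
print(empirical_error(erm(hypotheses, sample), sample))  # 0.0
```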

53 Theorem 6 Let Alg be an algorithm that agnostically PAC+MQ learns a concept class C in time T(σ, ε, δ) and outputs a hypothesis in a representation class H(σ, ε). [sent-216, score-0.364]

54 Then C is (fully) agnostically learnable by H(σ, ε/2) in time T(σ, ε/2, δ/2) + O((d · log(d/ε) + log(1/δ))/ε^2), where d is the VC dimension of H(σ, ε/2) ∪ C. [sent-217, score-0.367]

55 Note that this simulates A in the agnostic PAC+MQ setting over the distribution (U_S, f). [sent-229, score-0.499]
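
A hedged sketch of the simulation idea referred to here: draw a labeled sample S, fix any function f consistent with it (answers outside S are arbitrary), answer membership queries from f, and feed the MQ algorithm uniform examples from S. The names below are illustrative; the actual proof handles the accuracy and confidence bookkeeping this toy omits.

```python
import random

def simulate_mq_over_sample(mq_learner, sample):
    """Run an MQ learning algorithm against the distribution (U_S, f), where
    U_S is uniform over the labeled sample S and f is any function that
    agrees with S (points outside S are answered arbitrarily, here with +1)."""
    table = {tuple(x): b for x, b in sample}           # f restricted to S
    mem_query = lambda x: table.get(tuple(x), 1)       # arbitrary outside S
    draw_example = lambda: random.choice(sample)       # uniform example from S
    return mq_learner(draw_example, mem_query)
```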

56 Learning with Respect to the Uniform Distribution In this section we show that when learning with respect to the uniform distribution over {0, 1}^n, membership queries are helpful. [sent-245, score-0.533]

57 Specifically, we show that if one-way functions exist, then there exists a concept class C that is not agnostically PAC learnable (even weakly) with respect to the uniform distribution but is agnostically learnable over the uniform distribution given membership queries. [sent-246, score-1.132]

58 Our agnostic learning algorithm is successful only when ε ≥ 1/p(n) for a polynomial p fixed in advance (the definition of C depends on p). [sent-247, score-0.544]

59 While this is slightly weaker than required by the definition of the model it still exhibits the gap between agnostic learning with and without membership queries. [sent-248, score-0.773]

60 We remark that a number of known PAC and agnostic learning algorithms are efficient only for restricted values of ε (O’Donnell and Servedio, 2006; Gopalan et al. [sent-249, score-0.475]

61 1 Background We first show why some of the known separation results will not work in the agnostic setting. [sent-258, score-0.519]

62 It is well-known that the PAC model with membership queries is strictly stronger than the PAC model without membership queries (under the same cryptographic assumption). [sent-259, score-0.98]

63 The separation result is obtained by using a concept class C that is not PAC learnable and augmenting each concept f ∈ C with the encoding of f in a fixed part of the domain. [sent-260, score-0.54]

64 This encoding is readable using membership queries and therefore an MQ algorithm can “learn” the augmented C by querying the points that contain the encoding. [sent-261, score-0.672]

65 This simple approach would fail in the agnostic setting. [sent-263, score-0.475]

66 The agreement of the unknown function with a concept from C is almost 1 but membership queries on the points of encoding will not yield any useful information. [sent-265, score-0.737]

67 There too the secret encoding can be rendered unusable by a function that agrees with a concept in C on a significant fraction of the domain. [sent-268, score-0.387]

68 As in most other separation results our goal is to create a concept class that is not learnable from uniform examples but includes an encoding of the unknown function that is readable using membership queries. [sent-271, score-0.825]

69 We first note that in order for this approach to work in the agnostic setting the secret encoding has to be “spread” over at least a 1 − 2ε fraction of {0, 1}^n. [sent-272, score-0.77]

70 Another requirement that the construction has to satisfy is that the encoding of the secret has to be resilient to almost any amount of noise. [sent-280, score-0.309]

71 In particular, since the encoding is a part of the function, we also need to be able to reconstruct an encoding that is close to the best possible. [sent-281, score-0.376]

72 The Hadamard code is equivalent to encoding a vector a ∈ {0, 1}^n as the values of the parity function χ_a on all points in {0, 1}^n. [sent-287, score-0.351]

73 The label of a random point (z, x) ∈ {0, 1}^{2n} is the XOR of a pseudorandom bit with an independent bit and therefore is pseudorandom. [sent-298, score-0.313]

74 Values of a pseudorandom function b on any polynomial-size set of distinct points are pseudorandom, and therefore random points will have pseudorandom labels as long as their z parts are distinct. [sent-299, score-0.852]

75 Hence it is possible to find a using membership queries with the same prefix. [sent-302, score-0.457]

76 Finally, the problem with the construction we have so far is that while a membership query learning algorithm can find the secret a, it cannot predict g(z, x) without knowing d. [sent-305, score-0.419]

77 We are unaware of any constructions of pseudorandom functions that would remain pseudorandom when the value of the function is “mixed” with the description of the function (see the work of Halevi and Krawczyk (2007) for a discussion of this problem). [sent-309, score-0.522]

78 They used another pseudorandom function π_{d_1} to “hide” the encoding of d, then used another pseudorandom function π_{d_2} to “hide” the encoding of d_1, and so on. [sent-312, score-0.898]

79 First, in the agnostic setting all the encodings but the one using the largest fraction of the domain can be “corrupted”. [sent-316, score-0.549]

80 In addition, in the agnostic setting the encoding of d_i for every odd i can be completely “corrupted”, making all the other encodings unrecoverable. [sent-318, score-0.736]

81 To solve these problems, in our construction we split the domain into p equal parts, and on part i we use a pseudorandom function π_{d_i} to “hide” the encoding of d_j for all j < i. [sent-319, score-0.487]

82 Specifically, the only part where the unknown concept cannot be “recovered” agnostically is the part i such that for all j > i the agreement of the target function with every g_d ∈ Cnp on part j is close to 1/2, and hence d_j cannot be recovered. [sent-324, score-0.636]

83 Here k indexes the encodings, z is the input to the k-th pseudorandom function and x is the input to a parity function on n(p − 1) variables that encodes the secret keys for all pseudorandom functions used for encodings 1 through k − 1. [sent-333, score-0.73]

84 4 Hardness of Learning Cnp From Random Examples We start by showing that Cnp is not agnostically learnable from random and uniform examples only. [sent-348, score-0.347]

85 Proof In order to prove the claim we show that a weak PAC learning algorithm for Cnp can be used to distinguish a pseudorandom function family from a truly random function. [sent-353, score-0.331]

86 Our concept class Cnp uses numerous pseudorandom functions from F_n and therefore we use a so-called “hybrid” argument to show that one can replace a single π_{d_k}(z) with a truly random function to cause Alg to fail. [sent-357, score-0.42]

87 5 Agnostic Learning of Cnp with Membership Queries We now describe a (fully) agnostic learning algorithm for Cnp that uses membership queries and is successful for any ε ≥ 1/p(n). [sent-412, score-0.959]

88 By the definition of j, g_e has error of at least 1/2 − ε/4 − 1/(2p) ≥ 1/2 − ε on this part of the domain and therefore our hypothesis has error close to that of g_e. [sent-432, score-0.413]

89 For i ∈ [p] and y ∈ {0, 1}^n denote ∆_{i,y} = Pr_A[b ≠ g_e(k, z, x) | k = i, z = y] = Pr_{⟨x,b⟩∼A_{i,y}}[b ≠ g_e(i, y, x)]. [sent-465, score-0.364]

90 On the other hand, by the properties of j, for all i > j, ∆_i ≥ 1/2 − ε/4 and thus ∆(A, g_e) = (1/p) ∑_{i∈[p]} ∆_i ≥ (1/p) ( ∑_{i<j} ∆_i + (p − j)(1/2 − ε/4) ). By combining these equations we obtain that ∆(A, h_{e(j), j, b_j}) − ∆(A, g_e) ≤ … As noted before, this implies the claim. [sent-481, score-0.364]

91 This implies that AgnLearn agnostically learns Cnp from persistent membership queries. [sent-492, score-0.559]

92 6 Bounds on ε In Theorem 11 Cnp is defined over {0, 1}^m for m = n · p(n) + log p(n) and is learnable agnostically for any ε ≥ 1/p(n). [sent-494, score-0.331]

93 Discussion Our results clarify the role of membership queries in agnostic learning. [sent-509, score-0.932]

94 They imply that in order to extract any meaningful information from membership queries the learner needs to have significant prior knowledge about the distribution of examples. [sent-510, score-0.481]

95 Specifically, either the set of possible classification functions has to be restricted (as in the PAC model) or the set of possible marginal distributions (as in distribution-specific agnostic learning). [sent-511, score-0.475]

96 An interesting result in this direction would be a demonstration that membership queries are useful for distribution-specific agnostic learning of a natural concept class such as halfspaces. [sent-512, score-1.024]

97 Finally, we would be interested to see a proof that membership queries are useful in distribution-specific agnostic learning that places no restriction on ε. [sent-513, score-0.932]

98 More efficient PAC learning of DNF with membership queries under the uniform distribution. [sent-542, score-0.509]

99 More efficient PAC-learning of DNF with membership queries under the uniform distribution. [sent-548, score-0.509]

100 On efficient agnostic learning of linear combinations of basis functions. [sent-683, score-0.475]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('agnostic', 0.475), ('membership', 0.298), ('pseudorandom', 0.261), ('cnp', 0.202), ('mem', 0.202), ('pac', 0.196), ('alg', 0.19), ('encoding', 0.188), ('hadamard', 0.182), ('ge', 0.182), ('agnostically', 0.171), ('queries', 0.159), ('agnlearn', 0.154), ('learnable', 0.124), ('eldman', 0.119), ('embership', 0.107), ('ueries', 0.107), ('goldreich', 0.095), ('concept', 0.092), ('parities', 0.091), ('feldman', 0.09), ('code', 0.088), ('oracle', 0.087), ('secret', 0.083), ('kearns', 0.082), ('ower', 0.081), ('parity', 0.075), ('elbaz', 0.071), ('levin', 0.071), ('kalai', 0.07), ('polynomial', 0.069), ('gl', 0.066), ('cryptographic', 0.066), ('boolean', 0.064), ('guruswami', 0.061), ('fourier', 0.06), ('pra', 0.059), ('gopalan', 0.054), ('valiant', 0.054), ('uniform', 0.052), ('access', 0.051), ('encodings', 0.05), ('hypothesis', 0.049), ('ex', 0.049), ('mq', 0.047), ('gd', 0.046), ('mansour', 0.046), ('separation', 0.044), ('truly', 0.043), ('ips', 0.041), ('bshouty', 0.041), ('dnf', 0.04), ('hardness', 0.039), ('construction', 0.038), ('fn', 0.038), ('pr', 0.037), ('log', 0.036), ('sudan', 0.036), ('vitaly', 0.036), ('corrupted', 0.034), ('coin', 0.034), ('hide', 0.033), ('servedio', 0.033), ('persistent', 0.033), ('returns', 0.031), ('weakly', 0.031), ('request', 0.031), ('negligible', 0.03), ('learns', 0.03), ('parseval', 0.03), ('goldman', 0.03), ('contradicting', 0.029), ('haussler', 0.028), ('stoc', 0.027), ('np', 0.027), ('earning', 0.027), ('readable', 0.027), ('concatenated', 0.027), ('stad', 0.027), ('bit', 0.026), ('randomly', 0.026), ('bits', 0.025), ('jackson', 0.025), ('blum', 0.025), ('distribution', 0.024), ('goldwasser', 0.024), ('halevi', 0.024), ('kushilevitz', 0.024), ('touchstone', 0.024), ('xor', 0.024), ('numerous', 0.024), ('fraction', 0.024), ('every', 0.023), ('unrestricted', 0.023), ('codes', 0.023), ('simulate', 0.022), ('angluin', 0.022), ('outputs', 0.022), ('runs', 0.022), ('probability', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 64 jmlr-2009-On The Power of Membership Queries in Agnostic Learning

Author: Vitaly Feldman

Abstract: We study the properties of the agnostic learning framework of Haussler (1992) and Kearns, Schapire, and Sellie (1994). In particular, we address the question: is there any situation in which membership queries are useful in agnostic learning? Our results show that the answer is negative for distribution-independent agnostic learning and positive for agnostic learning with respect to a specific marginal distribution. Namely, we give a simple proof that any concept class learnable agnostically by a distribution-independent algorithm with access to membership queries is also learnable agnostically without membership queries. This resolves an open problem posed by Kearns et al. (1994). For agnostic learning with respect to the uniform distribution over {0, 1}n we show a concept class that is learnable with membership queries but computationally hard to learn from random examples alone (assuming that one-way functions exist). Keywords: agnostic learning, membership query, separation, PAC learning

2 0.090730563 32 jmlr-2009-Exploiting Product Distributions to Identify Relevant Variables of Correlation Immune Functions

Author: Lisa Hellerstein, Bernard Rosell, Eric Bach, Soumya Ray, David Page

Abstract: A Boolean function f is correlation immune if each input variable is independent of the output, under the uniform distribution on inputs. For example, the parity function is correlation immune. We consider the problem of identifying relevant variables of a correlation immune function, in the presence of irrelevant variables. We address this problem in two different contexts. First, we analyze Skewing, a heuristic method that was developed to improve the ability of greedy decision tree algorithms to identify relevant variables of correlation immune Boolean functions, given examples drawn from the uniform distribution (Page and Ray, 2003). We present theoretical results revealing both the capabilities and limitations of skewing. Second, we explore the problem of identifying relevant variables in the Product Distribution Choice (PDC) learning model, a model in which the learner can choose product distributions and obtain examples from them. We prove a lemma establishing a property of Boolean functions that may be of independent interest. Using this lemma, we give two new algorithms for finding relevant variables of correlation immune functions in the PDC model. Keywords: correlation immune functions, skewing, relevant variables, Boolean functions, product distributions c 2009 Lisa Hellerstein, Bernard Rosell, Eric Bach, Soumya Ray and David Page. H ELLERSTEIN , ROSELL , BACH , R AY AND PAGE

3 0.074447632 46 jmlr-2009-Learning Halfspaces with Malicious Noise

Author: Adam R. Klivans, Philip M. Long, Rocco A. Servedio

Abstract: We give new algorithms for learning halfspaces in the challenging malicious noise model, where an adversary may corrupt both the labels and the underlying distribution of examples. Our algorithms can tolerate malicious noise rates exponentially larger than previous work in terms of the dependence on the dimension n, and succeed for the fairly broad class of all isotropic log-concave distributions. We give poly(n, 1/ε)-time algorithms for solving the following problems to accuracy ε: • Learning origin-centered halfspaces in Rn with respect to the uniform distribution on the unit ball with malicious noise rate η = Ω(ε2 / log(n/ε)). (The best previous result was Ω(ε/(n log(n/ε))1/4 ).) • Learning origin-centered halfspaces with respect to any isotropic logconcave distribution on Rn with malicious noise rate η = Ω(ε3 / log2 (n/ε)). This is the first efficient algorithm for learning under isotropic log-concave distributions in the presence of malicious noise. We also give a poly(n, 1/ε)-time algorithm for learning origin-centered halfspaces under any isotropic log-concave distribution on Rn in the presence of adversarial label noise at rate η = Ω(ε3 / log(1/ε)). In the adversarial label noise setting (or agnostic model), labels can be noisy, but not example points themselves. Previous results could handle η = Ω(ε) but had running time exponential in an unspecified function of 1/ε. Our analysis crucially exploits both concentration and anti-concentration properties of isotropic log-concave distributions. Our algorithms combine an iterative outlier removal procedure using Principal Component Analysis together with “smooth” boosting. Keywords: PAC learning, noise tolerance, malicious noise, agnostic learning, label noise, halfspace learning, linear classifiers c 2009 Adam R. Klivans, Philip M. Long and Rocco A. Servedio. K LIVANS , L ONG AND S ERVEDIO

4 0.074276485 98 jmlr-2009-Universal Kernel-Based Learning with Applications to Regular Languages    (Special Topic on Mining and Learning with Graphs and Relations)

Author: Leonid (Aryeh) Kontorovich, Boaz Nadler

Abstract: We propose a novel framework for supervised learning of discrete concepts. Since the 1970’s, the standard computational primitive has been to find the most consistent hypothesis in a given complexity class. In contrast, in this paper we propose a new basic operation: for each pair of input instances, count how many concepts of bounded complexity contain both of them. Our approach maps instances to a Hilbert space, whose metric is induced by a universal kernel coinciding with our computational primitive, and identifies concepts with half-spaces. We prove that all concepts are linearly separable under this mapping. Hence, given a labeled sample and an oracle for evaluating the universal kernel, we can efficiently compute a linear classifier (via SVM, for example) and use margin bounds to control its generalization error. Even though exact evaluation of the universal kernel may be infeasible, in various natural situations it is efficiently approximable. Though our approach is general, our main application is to regular languages. Our approach presents a substantial departure from current learning paradigms and in particular yields a novel method for learning this fundamental concept class. Unlike existing techniques, we make no structural assumptions on the corresponding unknown automata, the string distribution or the completeness of the training set. Instead, given a labeled sample our algorithm outputs a classifier with guaranteed distribution-free generalization bounds; to our knowledge, the proposed framework is the only one capable of achieving the latter. Along the way, we touch upon several fundamental questions in complexity, automata, and machine learning. Keywords: grammar induction, regular language, finite state automaton, maximum margin hyperplane, kernel approximation

5 0.045218188 36 jmlr-2009-Fourier Theoretic Probabilistic Inference over Permutations

Author: Jonathan Huang, Carlos Guestrin, Leonidas Guibas

Abstract: Permutations are ubiquitous in many real-world problems, such as voting, ranking, and data association. Representing uncertainty over permutations is challenging, since there are n! possibilities, and typical compact and factorized probability distribution representations, such as graphical models, cannot capture the mutual exclusivity constraints associated with permutations. In this paper, we use the “low-frequency” terms of a Fourier decomposition to represent distributions over permutations compactly. We present Kronecker conditioning, a novel approach for maintaining and updating these distributions directly in the Fourier domain, allowing for polynomial time bandlimited approximations. Low order Fourier-based approximations, however, may lead to functions that do not correspond to valid distributions. To address this problem, we present a quadratic program defined directly in the Fourier domain for projecting the approximation onto a relaxation of the polytope of legal marginal distributions. We demonstrate the effectiveness of our approach on a real camera-based multi-person tracking scenario. Keywords: identity management, permutations, approximate inference, group theoretical methods, sensor networks

6 0.043201879 9 jmlr-2009-Analysis of Perceptron-Based Active Learning

7 0.037286647 79 jmlr-2009-Reinforcement Learning in Finite MDPs: PAC Analysis

8 0.031448718 35 jmlr-2009-Feature Selection with Ensembles, Artificial Variables, and Redundancy Elimination    (Special Topic on Model Selection)

9 0.030523436 90 jmlr-2009-Structure Spaces

10 0.025909869 70 jmlr-2009-Particle Swarm Model Selection    (Special Topic on Model Selection)

11 0.025413299 44 jmlr-2009-Learning Acyclic Probabilistic Circuits Using Test Paths

12 0.023149928 63 jmlr-2009-On Efficient Large Margin Semisupervised Learning: Method and Theory

13 0.020762157 38 jmlr-2009-Hash Kernels for Structured Data

14 0.020710178 50 jmlr-2009-Learning When Concepts Abound

15 0.020053085 59 jmlr-2009-Nearest Neighbor Clustering: A Baseline Method for Consistent Clustering with Arbitrary Objective Functions

16 0.017228009 72 jmlr-2009-Polynomial-Delay Enumeration of Monotonic Graph Classes

17 0.017196905 49 jmlr-2009-Learning Permutations with Exponential Weights

18 0.017188121 18 jmlr-2009-Consistency and Localizability

19 0.016647626 94 jmlr-2009-The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs

20 0.016586386 71 jmlr-2009-Perturbation Corrections in Approximate Inference: Mixture Modelling Applications


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.108), (1, -0.04), (2, 0.027), (3, 0.018), (4, -0.07), (5, -0.01), (6, -0.05), (7, 0.04), (8, -0.053), (9, 0.045), (10, 0.126), (11, -0.065), (12, -0.074), (13, 0.164), (14, -0.359), (15, -0.257), (16, -0.0), (17, -0.022), (18, -0.059), (19, 0.188), (20, -0.067), (21, -0.084), (22, 0.084), (23, 0.201), (24, 0.197), (25, -0.134), (26, -0.121), (27, 0.057), (28, 0.032), (29, 0.086), (30, -0.07), (31, -0.034), (32, -0.05), (33, 0.017), (34, -0.065), (35, -0.021), (36, -0.052), (37, -0.089), (38, -0.0), (39, -0.012), (40, -0.034), (41, -0.088), (42, -0.104), (43, -0.042), (44, -0.023), (45, -0.103), (46, -0.059), (47, -0.144), (48, 0.03), (49, -0.131)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96606922 64 jmlr-2009-On The Power of Membership Queries in Agnostic Learning

Author: Vitaly Feldman

Abstract: We study the properties of the agnostic learning framework of Haussler (1992) and Kearns, Schapire, and Sellie (1994). In particular, we address the question: is there any situation in which membership queries are useful in agnostic learning? Our results show that the answer is negative for distribution-independent agnostic learning and positive for agnostic learning with respect to a specific marginal distribution. Namely, we give a simple proof that any concept class learnable agnostically by a distribution-independent algorithm with access to membership queries is also learnable agnostically without membership queries. This resolves an open problem posed by Kearns et al. (1994). For agnostic learning with respect to the uniform distribution over {0, 1}n we show a concept class that is learnable with membership queries but computationally hard to learn from random examples alone (assuming that one-way functions exist). Keywords: agnostic learning, membership query, separation, PAC learning

2 0.59435552 32 jmlr-2009-Exploiting Product Distributions to Identify Relevant Variables of Correlation Immune Functions

Author: Lisa Hellerstein, Bernard Rosell, Eric Bach, Soumya Ray, David Page

Abstract: A Boolean function f is correlation immune if each input variable is independent of the output, under the uniform distribution on inputs. For example, the parity function is correlation immune. We consider the problem of identifying relevant variables of a correlation immune function, in the presence of irrelevant variables. We address this problem in two different contexts. First, we analyze Skewing, a heuristic method that was developed to improve the ability of greedy decision tree algorithms to identify relevant variables of correlation immune Boolean functions, given examples drawn from the uniform distribution (Page and Ray, 2003). We present theoretical results revealing both the capabilities and limitations of skewing. Second, we explore the problem of identifying relevant variables in the Product Distribution Choice (PDC) learning model, a model in which the learner can choose product distributions and obtain examples from them. We prove a lemma establishing a property of Boolean functions that may be of independent interest. Using this lemma, we give two new algorithms for finding relevant variables of correlation immune functions in the PDC model. Keywords: correlation immune functions, skewing, relevant variables, Boolean functions, product distributions c 2009 Lisa Hellerstein, Bernard Rosell, Eric Bach, Soumya Ray and David Page. H ELLERSTEIN , ROSELL , BACH , R AY AND PAGE

3 0.48719802 46 jmlr-2009-Learning Halfspaces with Malicious Noise

Author: Adam R. Klivans, Philip M. Long, Rocco A. Servedio

Abstract: We give new algorithms for learning halfspaces in the challenging malicious noise model, where an adversary may corrupt both the labels and the underlying distribution of examples. Our algorithms can tolerate malicious noise rates exponentially larger than previous work in terms of the dependence on the dimension n, and succeed for the fairly broad class of all isotropic log-concave distributions. We give poly(n, 1/ε)-time algorithms for solving the following problems to accuracy ε: • Learning origin-centered halfspaces in Rn with respect to the uniform distribution on the unit ball with malicious noise rate η = Ω(ε2 / log(n/ε)). (The best previous result was Ω(ε/(n log(n/ε))1/4 ).) • Learning origin-centered halfspaces with respect to any isotropic logconcave distribution on Rn with malicious noise rate η = Ω(ε3 / log2 (n/ε)). This is the first efficient algorithm for learning under isotropic log-concave distributions in the presence of malicious noise. We also give a poly(n, 1/ε)-time algorithm for learning origin-centered halfspaces under any isotropic log-concave distribution on Rn in the presence of adversarial label noise at rate η = Ω(ε3 / log(1/ε)). In the adversarial label noise setting (or agnostic model), labels can be noisy, but not example points themselves. Previous results could handle η = Ω(ε) but had running time exponential in an unspecified function of 1/ε. Our analysis crucially exploits both concentration and anti-concentration properties of isotropic log-concave distributions. Our algorithms combine an iterative outlier removal procedure using Principal Component Analysis together with “smooth” boosting. Keywords: PAC learning, noise tolerance, malicious noise, agnostic learning, label noise, halfspace learning, linear classifiers c 2009 Adam R. Klivans, Philip M. Long and Rocco A. Servedio. K LIVANS , L ONG AND S ERVEDIO

4 0.4799858 98 jmlr-2009-Universal Kernel-Based Learning with Applications to Regular Languages    (Special Topic on Mining and Learning with Graphs and Relations)

Author: Leonid (Aryeh) Kontorovich, Boaz Nadler

Abstract: We propose a novel framework for supervised learning of discrete concepts. Since the 1970’s, the standard computational primitive has been to find the most consistent hypothesis in a given complexity class. In contrast, in this paper we propose a new basic operation: for each pair of input instances, count how many concepts of bounded complexity contain both of them. Our approach maps instances to a Hilbert space, whose metric is induced by a universal kernel coinciding with our computational primitive, and identifies concepts with half-spaces. We prove that all concepts are linearly separable under this mapping. Hence, given a labeled sample and an oracle for evaluating the universal kernel, we can efficiently compute a linear classifier (via SVM, for example) and use margin bounds to control its generalization error. Even though exact evaluation of the universal kernel may be infeasible, in various natural situations it is efficiently approximable. Though our approach is general, our main application is to regular languages. Our approach presents a substantial departure from current learning paradigms and in particular yields a novel method for learning this fundamental concept class. Unlike existing techniques, we make no structural assumptions on the corresponding unknown automata, the string distribution or the completeness of the training set. Instead, given a labeled sample our algorithm outputs a classifier with guaranteed distribution-free generalization bounds; to our knowledge, the proposed framework is the only one capable of achieving the latter. Along the way, we touch upon several fundamental questions in complexity, automata, and machine learning. Keywords: grammar induction, regular language, finite state automaton, maximum margin hyperplane, kernel approximation

5 0.21668638 79 jmlr-2009-Reinforcement Learning in Finite MDPs: PAC Analysis

Author: Alexander L. Strehl, Lihong Li, Michael L. Littman

Abstract: We study the problem of learning near-optimal behavior in finite Markov Decision Processes (MDPs) with a polynomial number of samples. These “PAC-MDP” algorithms include the wellknown E3 and R-MAX algorithms as well as the more recent Delayed Q-learning algorithm. We summarize the current state-of-the-art by presenting bounds for the problem in a unified theoretical framework. A more refined analysis for upper and lower bounds is presented to yield insight into the differences between the model-free Delayed Q-learning and the model-based R-MAX. Keywords: reinforcement learning, Markov decision processes, PAC-MDP, exploration, sample complexity

6 0.21121919 44 jmlr-2009-Learning Acyclic Probabilistic Circuits Using Test Paths

7 0.18523692 70 jmlr-2009-Particle Swarm Model Selection    (Special Topic on Model Selection)

8 0.17795917 94 jmlr-2009-The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs

9 0.16813911 36 jmlr-2009-Fourier Theoretic Probabilistic Inference over Permutations

10 0.16499931 9 jmlr-2009-Analysis of Perceptron-Based Active Learning

11 0.16360377 63 jmlr-2009-On Efficient Large Margin Semisupervised Learning: Method and Theory

12 0.14755768 35 jmlr-2009-Feature Selection with Ensembles, Artificial Variables, and Redundancy Elimination    (Special Topic on Model Selection)

13 0.14413901 75 jmlr-2009-Provably Efficient Learning with Typed Parametric Models

14 0.1403999 11 jmlr-2009-Bayesian Network Structure Learning by Recursive Autonomy Identification

15 0.12788002 31 jmlr-2009-Evolutionary Model Type Selection for Global Surrogate Modeling

16 0.12607178 42 jmlr-2009-Incorporating Functional Knowledge in Neural Networks

17 0.12310052 22 jmlr-2009-Deterministic Error Analysis of Support Vector Regression and Related Regularized Kernel Methods

18 0.12098733 59 jmlr-2009-Nearest Neighbor Clustering: A Baseline Method for Consistent Clustering with Arbitrary Objective Functions

19 0.12062729 55 jmlr-2009-Maximum Entropy Discrimination Markov Networks

20 0.12037612 20 jmlr-2009-DL-Learner: Learning Concepts in Description Logics


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(8, 0.012), (11, 0.021), (19, 0.035), (26, 0.016), (38, 0.034), (39, 0.045), (47, 0.013), (52, 0.025), (55, 0.041), (58, 0.023), (66, 0.094), (90, 0.039), (93, 0.502), (96, 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.6799261 64 jmlr-2009-On The Power of Membership Queries in Agnostic Learning

Author: Vitaly Feldman

Abstract: We study the properties of the agnostic learning framework of Haussler (1992) and Kearns, Schapire, and Sellie (1994). In particular, we address the question: is there any situation in which membership queries are useful in agnostic learning? Our results show that the answer is negative for distribution-independent agnostic learning and positive for agnostic learning with respect to a specific marginal distribution. Namely, we give a simple proof that any concept class learnable agnostically by a distribution-independent algorithm with access to membership queries is also learnable agnostically without membership queries. This resolves an open problem posed by Kearns et al. (1994). For agnostic learning with respect to the uniform distribution over {0, 1}n we show a concept class that is learnable with membership queries but computationally hard to learn from random examples alone (assuming that one-way functions exist). Keywords: agnostic learning, membership query, separation, PAC learning

2 0.65957928 1 jmlr-2009-A Least-squares Approach to Direct Importance Estimation

Author: Takafumi Kanamori, Shohei Hido, Masashi Sugiyama

Abstract: We address the problem of estimating the ratio of two probability density functions, which is often referred to as the importance. The importance values can be used for various succeeding tasks such as covariate shift adaptation or outlier detection. In this paper, we propose a new importance estimation method that has a closed-form solution; the leave-one-out cross-validation score can also be computed analytically. Therefore, the proposed method is computationally highly efficient and simple to implement. We also elucidate theoretical properties of the proposed method such as the convergence rate and approximation error bounds. Numerical experiments show that the proposed method is comparable to the best existing method in accuracy, while it is computationally more efficient than competing approaches. Keywords: importance sampling, covariate shift adaptation, novelty detection, regularization path, leave-one-out cross validation

3 0.27650222 46 jmlr-2009-Learning Halfspaces with Malicious Noise

Author: Adam R. Klivans, Philip M. Long, Rocco A. Servedio

Abstract: We give new algorithms for learning halfspaces in the challenging malicious noise model, where an adversary may corrupt both the labels and the underlying distribution of examples. Our algorithms can tolerate malicious noise rates exponentially larger than previous work in terms of the dependence on the dimension n, and succeed for the fairly broad class of all isotropic log-concave distributions. We give poly(n, 1/ε)-time algorithms for solving the following problems to accuracy ε: • Learning origin-centered halfspaces in Rn with respect to the uniform distribution on the unit ball with malicious noise rate η = Ω(ε2 / log(n/ε)). (The best previous result was Ω(ε/(n log(n/ε))1/4 ).) • Learning origin-centered halfspaces with respect to any isotropic logconcave distribution on Rn with malicious noise rate η = Ω(ε3 / log2 (n/ε)). This is the first efficient algorithm for learning under isotropic log-concave distributions in the presence of malicious noise. We also give a poly(n, 1/ε)-time algorithm for learning origin-centered halfspaces under any isotropic log-concave distribution on Rn in the presence of adversarial label noise at rate η = Ω(ε3 / log(1/ε)). In the adversarial label noise setting (or agnostic model), labels can be noisy, but not example points themselves. Previous results could handle η = Ω(ε) but had running time exponential in an unspecified function of 1/ε. Our analysis crucially exploits both concentration and anti-concentration properties of isotropic log-concave distributions. Our algorithms combine an iterative outlier removal procedure using Principal Component Analysis together with “smooth” boosting. Keywords: PAC learning, noise tolerance, malicious noise, agnostic learning, label noise, halfspace learning, linear classifiers c 2009 Adam R. Klivans, Philip M. Long and Rocco A. Servedio. K LIVANS , L ONG AND S ERVEDIO

4 0.24680638 32 jmlr-2009-Exploiting Product Distributions to Identify Relevant Variables of Correlation Immune Functions

Author: Lisa Hellerstein, Bernard Rosell, Eric Bach, Soumya Ray, David Page

Abstract: A Boolean function f is correlation immune if each input variable is independent of the output, under the uniform distribution on inputs. For example, the parity function is correlation immune. We consider the problem of identifying relevant variables of a correlation immune function, in the presence of irrelevant variables. We address this problem in two different contexts. First, we analyze Skewing, a heuristic method that was developed to improve the ability of greedy decision tree algorithms to identify relevant variables of correlation immune Boolean functions, given examples drawn from the uniform distribution (Page and Ray, 2003). We present theoretical results revealing both the capabilities and limitations of skewing. Second, we explore the problem of identifying relevant variables in the Product Distribution Choice (PDC) learning model, a model in which the learner can choose product distributions and obtain examples from them. We prove a lemma establishing a property of Boolean functions that may be of independent interest. Using this lemma, we give two new algorithms for finding relevant variables of correlation immune functions in the PDC model. Keywords: correlation immune functions, skewing, relevant variables, Boolean functions, product distributions c 2009 Lisa Hellerstein, Bernard Rosell, Eric Bach, Soumya Ray and David Page. H ELLERSTEIN , ROSELL , BACH , R AY AND PAGE

5 0.22652031 82 jmlr-2009-Robustness and Regularization of Support Vector Machines

Author: Huan Xu, Constantine Caramanis, Shie Mannor

Abstract: We consider regularized support vector machines (SVMs) and show that they are precisely equivalent to a new robust optimization formulation. We show that this equivalence of robust optimization and regularization has implications for both algorithms, and analysis. In terms of algorithms, the equivalence suggests more general SVM-like algorithms for classification that explicitly build in protection to noise, and at the same time control overfitting. On the analysis front, the equivalence of robustness and regularization provides a robust optimization interpretation for the success of regularized SVMs. We use this new robustness interpretation of SVMs to give a new proof of consistency of (kernelized) SVMs, thus establishing robustness as the reason regularized SVMs generalize well. Keywords: robustness, regularization, generalization, kernel, support vector machine

6 0.22580867 9 jmlr-2009-Analysis of Perceptron-Based Active Learning

7 0.22573759 83 jmlr-2009-SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent

8 0.22405306 66 jmlr-2009-On the Consistency of Feature Selection using Greedy Least Squares Regression

9 0.2224974 19 jmlr-2009-Controlling the False Discovery Rate of the Association Causality Structure Learned with the PC Algorithm    (Special Topic on Mining and Learning with Graphs and Relations)

10 0.22033013 93 jmlr-2009-The Hidden Life of Latent Variables: Bayesian Learning with Mixed Graph Models

11 0.22024962 18 jmlr-2009-Consistency and Localizability

12 0.21991429 98 jmlr-2009-Universal Kernel-Based Learning with Applications to Regular Languages    (Special Topic on Mining and Learning with Graphs and Relations)

13 0.21985747 75 jmlr-2009-Provably Efficient Learning with Typed Parametric Models

14 0.21975012 79 jmlr-2009-Reinforcement Learning in Finite MDPs: PAC Analysis

15 0.21930496 22 jmlr-2009-Deterministic Error Analysis of Support Vector Regression and Related Regularized Kernel Methods

16 0.21874872 14 jmlr-2009-CarpeDiem: Optimizing the Viterbi Algorithm and Applications to Supervised Sequential Learning

17 0.21855661 80 jmlr-2009-Reproducing Kernel Banach Spaces for Machine Learning

18 0.21831229 94 jmlr-2009-The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs

19 0.21827368 78 jmlr-2009-Refinement of Reproducing Kernels

20 0.21716562 55 jmlr-2009-Maximum Entropy Discrimination Markov Networks