jmlr jmlr2007 jmlr2007-47 knowledge-graph by maker-knowledge-mining

47 jmlr-2007-Learning Horn Expressions with LOGAN-H


Source: pdf

Author: Marta Arias, Roni Khardon, Jérôme Maloberti

Abstract: The paper introduces LOGAN-H—a system for learning first-order function-free Horn expressions from interpretations. The system is based on an algorithm that learns by asking questions and that was proved correct in previous work. The current paper shows how the algorithm can be implemented in a practical system, and introduces a new algorithm based on it that avoids interaction and learns from examples only. The LOGAN-H system implements these algorithms and adds several facilities and optimizations that allow efficient applications in a wide range of problems. As one of the important ingredients, the system includes several fast procedures for solving the subsumption problem, an NP-complete problem that needs to be solved many times during the learning process. We describe qualitative and quantitative experiments in several domains. The experiments demonstrate that the system can deal with varied problems, large amounts of data, and that it achieves good classification accuracy. Keywords: inductive logic programming, subsumption, bottom-up learning, learning with queries

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 As one of the important ingredients, the system includes several fast procedures for solving the subsumption problem, an NP-complete problem that needs to be solved many times during the learning process. [sent-10, score-0.532]

2 The hypothesis of the system may include recursive clauses where the same predicate appears both in the condition of the rule and in the conclusion (obviously with different arguments). [sent-91, score-0.391]

3 One of the new methods modifies DJANGO to use a different representation of the subsumption problem and requires slightly different constraint satisfaction algorithms. [sent-107, score-0.529]

4 The experiments also demonstrate that the different subsumption algorithms are effective on a range of problems but no one dominates the others with LOGAN-H. [sent-123, score-0.458]

5 Further experiments evaluating the subsumption methods on their own show that our modified DJANGO method is a promising approach when hard subsumption problems need to be solved. [sent-124, score-0.916]

6 The paper also makes a contribution to the study of efficient subsumption algorithms. [sent-129, score-0.458]

7 We develop some new subsumption algorithms and evaluate them both in the context of machine learning and independently. [sent-130, score-0.458]

8 One of the new methods, called DJANGO PRIMAL below, seems particularly promising when hard subsumption problems need to be solved. [sent-131, score-0.458]

9 Section 4 describes three subsumption engines that our system uses. [sent-135, score-0.532]

10 Section 5 describes extensive experiments that demonstrate the validity of our method and of the subsumption procedures. [sent-136, score-0.458]

11 A Horn clause (sometimes called a Horn rule or just a rule) is an expression C = (∧_{n∈N} n) → P, where N is a set of atoms and P is an atom. [sent-145, score-0.398]

12 The clause set [s, c] (where c ≠ ∅) represents the conjunction of clauses ∧_{b∈c}(s → b). [sent-162, score-0.485]

13 The first step results in a set of clauses [s, c] such that s (the antecedent) is the conjunction of all atoms true in I and c (the conclusions) is the set of all atoms (over the domain of I) which are false in I. [sent-187, score-0.425]
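
To make this first step concrete, here is a minimal Python sketch of building the ground clause set [s, c] from an interpretation; the function name ground_clause_set and the tuple encoding of atoms are our own illustrative assumptions, not the system's actual code.

    from itertools import product

    # An interpretation is a set of ground atoms, each encoded as
    # (predicate_name, (object_1, ..., object_k)).
    def ground_clause_set(interpretation, predicates, domain):
        """s = all atoms true in I, c = all atoms over I's domain false in I."""
        s = set(interpretation)
        c = set()
        for pred, arity in predicates:
            for args in product(domain, repeat=arity):
                if (pred, args) not in s:
                    c.add((pred, args))
        return s, c

    # Tiny example in the spirit of the text: I = {p(1,2), p(2,2), q(2,1)}.
    I = {("p", (1, 2)), ("p", (2, 2)), ("q", (2, 1))}
    s, c = ground_clause_set(I, predicates=[("p", 2), ("q", 2)], domain=[1, 2])
    print(len(s), len(c))  # 3 true atoms, 5 false atoms over the domain {1, 2}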

14 This clause set is such that all arguments to the predicates are domain elements of I. [sent-188, score-0.423]

15 While domain elements are not constants in the language and we do not allow constants in the rules, we slightly abuse normal terminology and call this intermediate form a ground clause set. [sent-190, score-0.382]

16 For example, the clause set rel-cands(e3 ) includes among others the clauses [p(x1 , x2 ) ∧ p(x2 , x2 ) ∧ q(x2 , x1 ) → p(x2 , x1 )], and [p(x1 , x2 ) ∧ p(x2 , x2 ) ∧ q(x2 , x1 ) → q(x1 , x1 )], where all variables are universally quantified. [sent-192, score-0.485]

17 When removing an object from a clause set [s, c] we remove all atoms referring to it from s and c. [sent-198, score-0.398]

18 Pairing: The pairing operation combines two clause sets [s_a, c_a] and [s_b, c_b] to create a new clause set [s_p, c_p]. [sent-213, score-0.713]

19 The clause set [s_p, c_p] obtained by the pairing can be more general than the original clause sets [s_a, c_a] and [s_b, c_b] since s_p is contained in both s_a and s_b (under the injective mapping) and it thus subsumes both s_a and s_b. [sent-225, score-0.713]

20 Hence, the pairing operation can be intuitively viewed as a generalization of both participating clause sets. [sent-226, score-0.414]
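
The antecedent part of a pairing can be sketched directly from this description: under an injective mapping between objects, s_p keeps the atoms common to s_a and the renamed s_b. The sketch below is our own illustrative code and shows only this antecedent intersection; the rule for forming c_p and the search for a useful mapping are left out, since the excerpt does not spell them out.

    def pair_antecedent(sa, sb, mapping):
        """sp: atoms of sa that also appear in sb once sb's objects are renamed
        through the injective mapping (objects of sb -> objects of sa)."""
        def rename(atom):
            pred, args = atom
            return (pred, tuple(mapping[a] for a in args))
        sb_renamed = {rename(atom) for atom in sb
                      if all(a in mapping for a in atom[1])}
        return sa & sb_renamed

    sa = {("p", (1, 2)), ("p", (2, 2)), ("q", (2, 1))}
    sb = {("p", (3, 4)), ("q", (4, 3))}
    print(pair_antecedent(sa, sb, {3: 1, 4: 2}))  # {('p', (1, 2)), ('q', (2, 1))} (set order may vary)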

21 Now when we pair this clause set with another one the antecedent will be a subset of s and surely p will still be wrong. [sent-233, score-0.409]

22 The algorithm then tries to find a “useful” pairing of [s, c] with one of the clause sets [s_i, c_i] in S. [sent-249, score-0.414]

23 In case no such pairing is found for any of the [s_i, c_i], the minimized clause set [s, c] is added to S as the last element. [sent-254, score-0.414]

24 Then the interactive algorithm will stop and produce a hypothesis equivalent to T with O(m p k^a k^k) clauses after O(m p k^a k^k) equivalence queries and O((n + m^2) p k^a k^{3k}) membership queries. [sent-288, score-0.46]

25 The one-pass procedure: Given a clause set [s, c] the procedure one-pass tests clauses in [s, c] against all positive examples in E. [sent-312, score-0.485]

26 First, once we match the antecedent we can test all the consequents simultaneously so it is better to keep clause sets together rather than split them into individual clauses. [sent-323, score-0.513]
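
A minimal sketch of the one-pass idea as we read it from these sentences: every consequent falsified under some embedding of the antecedent into a positive example is dropped, and all consequents are checked against the same substitution at once. The helper embeddings() stands for whatever subsumption engine enumerates the substitutions; its name and the data layout (examples as sets of ground atoms, as in the earlier sketch) are our assumptions.

    def one_pass(s, c, positive_examples, embeddings):
        """Return [s, c'] where c' keeps only the consequents that hold in every
        positive example under every embedding of the antecedent s."""
        c = set(c)
        for example in positive_examples:
            for theta in embeddings(s, example):  # theta maps terms of s to objects
                if not c:
                    return s, c
                c = {(pred, args) for (pred, args) in c
                     if (pred, tuple(theta.get(t, t) for t in args)) in example}
        return s, c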

27 Minimization: The minimization procedure acting on a clause set [s, c] assumes the input clause set has already been validated by one-pass. [sent-327, score-0.598]

28 If we can guarantee in addition that all membership queries implicit in one-pass are answered correctly then the batch algorithm can be seen as performing some run of the interactive algorithm and bounds on queries and hypothesis size translate as well. [sent-340, score-0.435]

29 As a result any clause C produced by the interactive algorithm has at most k variables, which in turn guarantees that we can test whether I |= C in time O(n^k) where I has n domain elements. [sent-380, score-0.428]

30 It is therefore crucial for our system to have efficient procedures to test subsumption and enumerate matchings. [sent-385, score-0.532]

31 In addition, sorting the examples by size helps reduce run time since the rules generated from the small examples have fewer variables, a property that generally implies faster subsumption tests. [sent-393, score-0.526]

32 Aside from these, the most crucial run time issue is the subsumption test which we discuss next. [sent-408, score-0.484]

33 Efficient Subsumption Tests The one-pass procedure must enumerate all substitutions that embed a clause in an example. [sent-411, score-0.384]

34 To compute all substitutions between an example and a clause the system repeatedly performs joins of these tables (in the database sense) to get a table of all substitutions. [sent-419, score-0.581]

35 Then for each predicate in the clause we pull the appropriate table from the example, and perform a join which matches the variables already instantiated in our intermediate table. [sent-421, score-0.443]

36 Thus if the predicate in the clause does not introduce new variables the table size cannot grow. [sent-422, score-0.414]
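
Sentences 33–36 describe the table-based engine: keep a table of partial substitutions and join it with the example's facts for each clause literal in turn. Here is a small, self-contained Python sketch of that idea; all names are ours, and the real system stores per-predicate tables and adds lookahead and other optimizations.

    def all_substitutions(clause_atoms, example):
        """Enumerate all substitutions embedding the clause into the example by
        repeated joins; `example` maps predicate name -> set of argument tuples."""
        rows = [dict()]                      # the table starts with the empty substitution
        for pred, args in clause_atoms:      # one join per clause literal
            new_rows = []
            for row in rows:
                for fact in example.get(pred, ()):
                    ext, ok = dict(row), True
                    for var, val in zip(args, fact):
                        if ext.get(var, val) != val:
                            ok = False
                            break
                        ext[var] = val
                    if ok:
                        new_rows.append(ext)
            rows = new_rows                  # a literal with no new variables can only shrink the table
        return rows

    # Clause from sentence 16: p(x1,x2) ∧ p(x2,x2) ∧ q(x2,x1)
    clause = [("p", ("x1", "x2")), ("p", ("x2", "x2")), ("q", ("x2", "x1"))]
    ex = {"p": {(1, 2), (2, 2)}, "q": {(2, 1)}}
    print(all_substitutions(clause, ex))     # [{'x1': 1, 'x2': 2}]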

37 It may be worth clarifying here that this is the standard subsumption problem and we do not require different variables to be matched to different objects. [sent-439, score-0.458]

38 Randomized Table Based Subsumption: If lookahead is still not sufficient or is too slow, we can resort to randomized subsumption tests. [sent-451, score-0.521]

39 Subsumption Based on Constraint Satisfaction Algorithms: The idea of using constraint satisfaction algorithms to solve subsumption problems has been investigated in Maloberti and Sebag (2004), where a very effective system, DJANGO, is developed. [sent-461, score-0.603]

40 The DJANGO system was originally designed to find a single solution for the subsumption problem, but this can be easily extended to give all solutions through backtracking. [sent-462, score-0.532]

41 To illustrate how constraint satisfaction can be used, consider the following subsumption problem, where the clause is Cl and the example is Ex: Cl: p(X_0, X_1), q(X_0, X_2, X_3), r(X_0); Ex: p(a_0, a_1), p(a_1, a_2), q(a_0, a_2, a_3), q(a_0, a_1, a_3), r(a_0). [sent-473, score-0.828]
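
Read as a constraint satisfaction problem (the primal view), each clause variable becomes a CSP variable over the example's constants, and each clause literal constrains its variables to jointly match some fact of the same predicate. The following is a bare-bones backtracking sketch of that encoding for the Cl/Ex instance above; it is our own illustration, whereas the DJANGO engines additionally use arc-consistency and dynamic variable ordering.

    def subsumes(clause, example):
        """Primal CSP view: each clause variable ranges over the example's constants;
        every literal p(V1,...,Vk) must match some fact p(c1,...,ck) of the example.
        Plain backtracking only, for illustration."""
        constants = {c for facts in example.values() for fact in facts for c in fact}
        variables = list(dict.fromkeys(v for _, args in clause for v in args))

        def consistent(assign):
            for pred, args in clause:
                if all(v in assign for v in args):
                    if tuple(assign[v] for v in args) not in example.get(pred, set()):
                        return False
            return True

        def backtrack(assign, remaining):
            if not remaining:
                return True
            v = remaining[0]
            for c in constants:
                assign[v] = c
                if consistent(assign) and backtrack(assign, remaining[1:]):
                    return True
                del assign[v]
            return False

        return backtrack({}, variables)

    cl = [("p", ("X0", "X1")), ("q", ("X0", "X2", "X3")), ("r", ("X0",))]
    ex = {"p": {("a0", "a1"), ("a1", "a2")},
          "q": {("a0", "a2", "a3"), ("a0", "a1", "a3")},
          "r": {("a0",)}}
    print(subsumes(cl, ex))  # True: e.g. X0=a0, X1=a1, X2=a2 (or a1), X3=a3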

42 It does however incur an overhead before starting to solve the subsumption problem and can represent an important part of the execution time in case of easy instances of subsumption problems. [sent-499, score-0.916]

43 Despite the different representations, both versions use a similar method to solve the subsumption problem, using arc-consistency followed by depth-first search with dynamic variable ordering. [sent-503, score-0.458]

44 Within LOGAN-H the backtracking approach can benefit when we have many substitutions of which only a few are needed to remove all consequents from a clause set. [sent-512, score-0.488]

45 (2003) have previously introduced a table based method for subsumption tests (although our system was developed independently). [sent-514, score-0.56]

46 The table-based subsumption method incorporates aspects from both the primal and dual representation of DJANGO. [sent-520, score-0.519]

47 Finally, calculating the dual form representation of the subsumption problem incurs some overhead that can be noticeable if the subsumption problems themselves are quickly solved. [sent-524, score-0.944]

48 Since in LOGAN-H we have many subsumption tests with the same example or the same clause, these can be avoided with additional caching. [sent-525, score-0.757]

49 Other systems have developed different methods to reduce the computational cost of subsumption tests. [sent-527, score-0.458]

50 The first one, initially reported in Khardon (2000), implements the interactive and batch algorithms in the Prolog language but does not include discretization, pruning or special subsumption engines. [sent-541, score-0.676]

51 In this section we describe several experiments using the system and its subsumption engines. [sent-543, score-0.532]

52 Fourth, the experiments evaluate the performance of different subsumption engines both within the learning system and independently. [sent-548, score-0.532]

53 They also demonstrate that different subsumption engines may lead to faster execution in different problems. [sent-555, score-0.458]

54 We also compare the subsumption methods in an experiment based on the Phase Transition phenomenon similar to the experiments performed by Maloberti and Sebag (2004). [sent-556, score-0.458]

55 These experiments show that DJANGO PRIMAL is very effective and may perform better than the other two approaches if hard subsumption problems around the phase transition region need to be solved. [sent-557, score-0.526]

56 The Prolog system required 35 equivalence queries, 455 membership queries and about 8 minutes (running Sicstus Prolog on a Linux platform using a Pentium 2/366MHz processor) to recover the set of clauses exactly. [sent-572, score-0.402]

57 The first 3 targets have 2 clauses each but vary in the number of atoms in the antecedent and the fourth one has 10 clauses of the larger kind making the problem more challenging. [sent-663, score-0.581]

58 Notice that for target I the task is not too hard since it is not unlikely that we get a random example matching the antecedent of a rule exactly (so that discovering the clause is easy) but for the larger targets this is not the case. [sent-672, score-0.409]

59 Table 7: Runtime comparison for subsumption tests on the KRK-illegal data set, averaged over 10 runs on an independent test set of 10000 examples. [sent-778, score-0.458]

60 This domain is also a good case to illustrate the various subsumption tests in our system. [sent-785, score-0.499]

61 Note that since we put the position predicate in the antecedent, the consequent is nullary so iterative subsumption tests are likely to be faster. [sent-786, score-0.628]

62 80 GHz) for various subsumption settings averaged over 10 independent runs. [sent-789, score-0.458]

63 Table 9: Runtime comparison for subsumption tests on the Mutagenesis data set. [sent-861, score-0.458]

64 We have also run experiments comparing run time with the different subsumption engines. [sent-862, score-0.51]

65 For this domain, deterministic table-based subsumption was not possible, not even with lookahead and arc-consistency, since the table size grew beyond the memory capacity of our computer. [sent-863, score-0.518]

66 Subsumption and Phase Transition: The previous experiments have demonstrated that different subsumption engines may lead to faster performance in different domains. [sent-871, score-0.458]

67 In this section we further compare the different algorithms but purely on subsumption problems, that is, not in the context of learning. [sent-872, score-0.458]

68 Previous work (Giordana and Saitta, 2000) has shown that one can parameterize subsumption problems so that there is a sharp transition between regions where most problems have a solution and regions where most problems do not have a solution. [sent-873, score-0.496]

69 This is known as the phase transition phenomenon, and it has been used to evaluate subsumption and learning algorithms (Giordana and Saitta, 2000; Giordana et al. [sent-874, score-0.526]

70 All m literals in a generated clause C are built on distinct binary predicate symbols and clause C is connected, that is, all n variables are linked. [sent-878, score-0.794]

71 The latter requirement prevents the subsumption problem from being decomposable into simpler problems. [sent-879, score-0.458]

72 Each literal in Ex is built on a predicate symbol occurring in C (other literals are irrelevant to the subsumption problem). [sent-881, score-0.695]

73 For each pair of values (m, L), 50 clauses and 50 examples are generated, and each clause is tested against all examples. [sent-883, score-0.485]
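
A rough sketch of the random problem generator described in sentences 70–73: a clause with m literals over distinct binary predicates whose n variables form a connected (linked) structure, and an example with L facts per predicate over a fixed set of constants. The exact generator of Giordana and Saitta (2000) has more detail; the parameter names and construction below are our simplified reading.

    import random

    def random_clause(n_vars, m):
        """m literals on distinct binary predicates; each new literal links a fresh
        variable to one already used, so all n variables end up connected
        (assumes m >= n_vars - 1)."""
        vars_ = [f"X{i}" for i in range(n_vars)]
        literals, used = [], [vars_[0]]
        for j in range(m):
            a = random.choice(used)
            b = vars_[len(used)] if len(used) < n_vars else random.choice(used)
            if b not in used:
                used.append(b)
            literals.append((f"p{j}", (a, b)))
        return literals

    def random_example(clause, n_constants, n_literals_per_pred):
        """For every predicate of the clause, n_literals_per_pred distinct binary
        facts over n_constants constants (assumes this does not exceed n_constants**2)."""
        consts = range(n_constants)
        example = {}
        for pred, _ in clause:
            facts = set()
            while len(facts) < n_literals_per_pred:
                facts.add((random.choice(consts), random.choice(consts)))
            example[pred] = facts
        return example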

74 This region is called Phase Transition, and is particularly important for the subsumption problem, since computationally hard problems are located in this region. [sent-886, score-0.458]

75 This is important in the context of a system that dynamically builds clauses and tests subsumption for them. [sent-892, score-0.718]

76 Finally, in Maloberti and Sebag (2004), 100 clauses and 100 examples were generated, and each clause was tested against one example. [sent-900, score-0.485]

77 (2003) and Maloberti and Suzuki (2004) for simple subsumption as well as for enumerating all solutions. [sent-916, score-0.458]

78 To summarize, the subsumption experiments suggest that in general the DJANGO methods are more robust than the table based methods and that DJANGO PRIMAL is a promising alternative. [sent-920, score-0.486]

79 However, these results must be interpreted with caution, since in these experiments the tables methods … Figure 4: (A) Percentage of wrong subsumption tests for the randomized tables method on 50×50 pairs (C, Ex), with TH = 1 and R = 1. [sent-921, score-0.638]

80 Notice that the table based method with such low valued parameters is not able to discover any subsumption substitution, and hence the error rate corresponds to the satisfiability rate of the subsumption problem suite. [sent-923, score-0.944]

81 The paper also introduced new algorithms for solving the subsumption problem and evaluated their performance. [sent-939, score-0.458]

82 The table based methods give competitive performance within LOGAN-H, and DJANGO PRIMAL is a promising new approach where hard subsumption problems in the phase transition region are solved. [sent-940, score-0.554]

83 As discussed above, bottom up search suffers from two aspects: subsumption tests are more costly than in top down approaches, and overfitting may occur in small data sets with large examples. [sent-948, score-0.488]

84 This clause is then used as a seed for a small step refinement search that evaluates clauses as usual. [sent-954, score-0.485]

85 A related problem occurs when we use randomized subsumption tests. [sent-976, score-0.489]

86 Here since the subsumption test is incomplete we may not notice that a rule in the hypothesis is violated by a negative example. [sent-977, score-0.502]

87 2 Caching The algorithms described above may produce repeated calls to one-pass with the same antecedent since pairings of one clause set with several others may result in the same clause set. [sent-983, score-0.75]

88 If we try to cache a universally quantified expression then matching it requires a subsumption test which is expensive. [sent-986, score-0.499]

89 For both algorithms the system caches interpretations rather than clauses or clause sets (the s part of [s, c]). [sent-988, score-0.608]

90 In fact, for the batch algorithm we only need to cache positive interpretations—if a clause set [s, ∅] was returned by one-pass then s does not imply any of the possible consequents and therefore it is a positive interpretation. [sent-989, score-0.535]
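
A small sketch of the caching idea for the batch algorithm, as we read it from these sentences: remember the s parts that one-pass has already certified as positive interpretations (calls that returned [s, ∅]) and skip the recomputation when the same antecedent comes back from another pairing. The wrapper below is illustrative; the actual cache organization in the system may differ.

    def one_pass_cached(s, c, positives, one_pass, positive_cache):
        """positive_cache: set of frozensets of atoms already known to be positive
        interpretations. This works because pairing keeps the object names of
        existing clause sets, so identical antecedents really are equal atom sets."""
        key = frozenset(s)
        if key in positive_cache:
            return s, set()              # one-pass would drop every consequent anyway
        s_out, c_out = one_pass(s, c, positives)
        if not c_out:                    # [s, {}] returned: s is a positive interpretation
            positive_cache.add(key)
        return s_out, c_out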

91 This is matched with the fact that pairing keeps object names of existing clause sets in the hypothesis. [sent-992, score-0.414]

92 Caching can reduce or increase run time of the system, depending on the data set, the cost for subsumption for examples in the data set, and the rate of cache hits. [sent-994, score-0.525]

93 Recall that the system starts with an example and essentially turns objects into variables in the maximally specific clause set. [sent-1036, score-0.416]

94 For example, if we discretize the logp attribute from above and variabilize, we get logp(X), logp_val>0(X), logp_val>1(X), logp_val<2(X), logp_val<3(X). [sent-1040, score-0.737]
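
The logp example can be reproduced with a tiny helper that turns a numeric attribute value into threshold atoms, one per discretization boundary. The predicate naming below mirrors the pattern shown in the sentence but is our guess at the exact convention, and the thresholds and value are made up; variabilization would then replace the object constant by a variable.

    def discretize(attr, value, obj, thresholds):
        """One atom asserting the attribute, plus one threshold atom per boundary:
        'above boundary i' if value > thresholds[i], otherwise 'below boundary i'."""
        atoms = [(attr, (obj,))]
        for i, t in enumerate(thresholds):
            if value > t:
                atoms.append((f"{attr}_val>{i}", (obj,)))
            else:
                atoms.append((f"{attr}_val<{i}", (obj,)))
        return atoms

    print(discretize("logp", 1.7, "b", thresholds=[0.5, 1.0, 2.0, 3.0]))
    # [('logp', ('b',)), ('logp_val>0', ('b',)), ('logp_val>1', ('b',)),
    #  ('logp_val<2', ('b',)), ('logp_val<3', ('b',))]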

95 Notice that if the system ever considers the clause BK(b) → b as a candidate, one-pass will find the positive interpretation I and will drop b, as desired. [sent-1108, score-0.403]

96 In fact, one-pass will return [BK(b), ∅] for any input clause set [BK(b), c] since it will drop all consequents in c that are not included in BK(b) (including false), thus precluding clause BK(b) → b from ever being included in any hypothesis. [sent-1109, score-0.702]

97 As an example, suppose that in the normal ILP setting, the clause p(a, b) ∧ p(b, c) → q() is labeled positive and the clause p(a, b) → q() is labeled negative. [sent-1115, score-0.598]

98 In the case of zero arity consequents, the check whether a given clause C is satisfied by some interpretation I can be considerably simplified. [sent-1118, score-0.386]

99 As a result subsumption procedures that enumerate solutions one by one can quit early, after the first substitution, and are likely to be faster in this case. [sent-1120, score-0.458]
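
With a zero-arity consequent, the check in sentences 98–99 needs only the existence of one embedding, so a lazy enumeration can stop at the first substitution. A minimal sketch, assuming embeddings() is a generator such as a backtracking engine and interpretations are dicts of fact sets as in the earlier sketches:

    def clause_holds(antecedent, consequent_pred, interpretation, embeddings):
        """I |= (antecedent -> q()) for a nullary q(): trivially true if q() holds
        in I; otherwise true exactly when no embedding of the antecedent exists,
        so we can stop after the first substitution found."""
        if () in interpretation.get(consequent_pred, set()):
            return True
        return next(iter(embeddings(antecedent, interpretation)), None) is None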

100 A signature captures the idea that when we map a literal in a clause onto a literal in the example, the entire neighborhood of the first literal must exist in the example as well. [sent-1149, score-0.422]
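
One simple way to operationalize the signature idea in this last sentence, purely as an illustration (the system's actual signature definition may be richer): summarize each literal by the multiset of predicates of the literals sharing a variable with it, and allow a clause literal to map onto an example literal only if the example side offers at least as rich a neighborhood.

    from collections import Counter

    def signature(i, literals):
        """Multiset of predicate names of the literals sharing an argument with literal i."""
        _, args = literals[i]
        neigh = Counter()
        for j, (pred, other_args) in enumerate(literals):
            if j != i and set(other_args) & set(args):
                neigh[pred] += 1
        return neigh

    def may_map(clause_lits, i, example_lits, j):
        """Prune candidate matches whose neighborhood in the example is too poor."""
        need, have = signature(i, clause_lits), signature(j, example_lits)
        return all(have[p] >= k for p, k in need.items())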


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('subsumption', 0.458), ('jango', 0.336), ('clause', 0.299), ('og', 0.279), ('clauses', 0.186), ('logp', 0.14), ('ilp', 0.129), ('aloberti', 0.116), ('orn', 0.116), ('rias', 0.116), ('xpressions', 0.116), ('pairing', 0.115), ('antecedent', 0.11), ('literals', 0.109), ('consequents', 0.104), ('rimal', 0.104), ('atoms', 0.099), ('hardon', 0.098), ('khardon', 0.098), ('horn', 0.096), ('muggleton', 0.093), ('batch', 0.091), ('interactive', 0.088), ('predicate', 0.087), ('substitutions', 0.085), ('blockeel', 0.083), ('predicates', 0.083), ('consequent', 0.083), ('logic', 0.08), ('csp', 0.078), ('raedt', 0.076), ('queries', 0.074), ('system', 0.074), ('laer', 0.073), ('maloberti', 0.073), ('sebag', 0.067), ('grammar', 0.067), ('ual', 0.067), ('icl', 0.067), ('mode', 0.063), ('giordana', 0.061), ('tables', 0.059), ('arity', 0.057), ('prolog', 0.055), ('interpretations', 0.049), ('discretization', 0.047), ('lnai', 0.047), ('atom', 0.047), ('mutagenesis', 0.047), ('srinivasan', 0.046), ('val', 0.046), ('bk', 0.046), ('background', 0.045), ('inductive', 0.044), ('hypothesis', 0.044), ('np', 0.043), ('objects', 0.043), ('rules', 0.042), ('pairings', 0.042), ('dropping', 0.042), ('satisfaction', 0.041), ('cache', 0.041), ('literal', 0.041), ('domain', 0.041), ('earning', 0.04), ('pruning', 0.039), ('th', 0.039), ('lt', 0.038), ('transition', 0.038), ('membership', 0.038), ('adj', 0.037), ('variabilize', 0.037), ('joins', 0.036), ('relational', 0.034), ('primal', 0.033), ('lookahead', 0.032), ('nement', 0.031), ('vp', 0.031), ('golem', 0.031), ('arias', 0.031), ('si', 0.031), ('wrong', 0.031), ('randomized', 0.031), ('bongard', 0.031), ('programming', 0.03), ('interpretation', 0.03), ('constraint', 0.03), ('phase', 0.03), ('equivalence', 0.03), ('bottom', 0.03), ('join', 0.029), ('king', 0.029), ('substitution', 0.028), ('dual', 0.028), ('table', 0.028), ('bins', 0.028), ('logical', 0.028), ('sentences', 0.027), ('run', 0.026), ('ex', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999958 47 jmlr-2007-Learning Horn Expressions with LOGAN-H

Author: Marta Arias, Roni Khardon, Jérôme Maloberti

Abstract: The paper introduces LOGAN-H—a system for learning first-order function-free Horn expressions from interpretations. The system is based on an algorithm that learns by asking questions and that was proved correct in previous work. The current paper shows how the algorithm can be implemented in a practical system, and introduces a new algorithm based on it that avoids interaction and learns from examples only. The LOGAN-H system implements these algorithms and adds several facilities and optimizations that allow efficient applications in a wide range of problems. As one of the important ingredients, the system includes several fast procedures for solving the subsumption problem, an NP-complete problem that needs to be solved many times during the learning process. We describe qualitative and quantitative experiments in several domains. The experiments demonstrate that the system can deal with varied problems, large amounts of data, and that it achieves good classification accuracy. Keywords: inductive logic programming, subsumption, bottom-up learning, learning with queries

2 0.16446698 43 jmlr-2007-Integrating Naïve Bayes and FOIL

Author: Niels Landwehr, Kristian Kersting, Luc De Raedt

Abstract: A novel relational learning approach that tightly integrates the naïve Bayes learning scheme with the inductive logic programming rule-learner FOIL is presented. In contrast to previous combinations that have employed naïve Bayes only for post-processing the rule sets, the presented approach employs the naïve Bayes criterion to guide its search directly. The proposed technique is implemented in the NFOIL and TFOIL systems, which employ standard naïve Bayes and tree augmented naïve Bayes models respectively. We show that these integrated approaches to probabilistic model and rule learning outperform post-processing approaches. They also yield significantly more accurate models than simple rule learning and are competitive with more sophisticated ILP systems. Keywords: rule learning, naïve Bayes, statistical relational learning, inductive logic programming

3 0.061987944 67 jmlr-2007-Polynomial Identification in the Limit of Substitutable Context-free Languages

Author: Alexander Clark, Rémi Eyraud

Abstract: This paper formalises the idea of substitutability introduced by Zellig Harris in the 1950s and makes it the basis for a learning algorithm from positive data only for a subclass of context-free languages. We show that there is a polynomial characteristic set, and thus prove polynomial identification in the limit of this class. We discuss the relationship of this class of languages to other common classes discussed in grammatical inference. It transpires that it is not necessary to identify constituents in order to learn a context-free language—it is sufficient to identify the syntactic congruence, and the operations of the syntactic monoid can be converted into a context-free grammar. We also discuss modifications to the algorithm that produces a reduction system rather than a context-free grammar, that will be much more compact. We discuss the relationship to Angluin’s notion of reversibility for regular languages. We also demonstrate that an implementation of this algorithm is capable of learning a classic example of structure dependent syntax in English: this constitutes a refutation of an argument that has been used in support of nativist theories of language. Keywords: grammatical inference, context-free languages, positive data only, reduction system, natural languages

4 0.040248182 12 jmlr-2007-Attribute-Efficient and Non-adaptive Learning of Parities and DNF Expressions     (Special Topic on the Conference on Learning Theory 2005)

Author: Vitaly Feldman

Abstract: We consider the problems of attribute-efficient PAC learning of two well-studied concept classes: parity functions and DNF expressions over {0, 1}^n. We show that attribute-efficient learning of parities with respect to the uniform distribution is equivalent to decoding high-rate random linear codes from low number of errors, a long-standing open problem in coding theory. This is the first evidence that attribute-efficient learning of a natural PAC learnable concept class can be computationally hard. An algorithm is said to use membership queries (MQs) non-adaptively if the points at which the algorithm asks MQs do not depend on the target concept. Using a simple non-adaptive parity learning algorithm and a modification of Levin’s algorithm for locating a weakly-correlated parity due to Bshouty et al. (1999), we give the first non-adaptive and attribute-efficient algorithm for learning DNF with respect to the uniform distribution. Our algorithm runs in time Õ(ns^4/ε) and uses Õ(s^4 · log^2 n/ε) non-adaptive MQs, where s is the number of terms in the shortest DNF representation of the target concept. The algorithm improves on the best previous algorithm for learning DNF (of Bshouty et al., 1999) and can also be easily modified to tolerate random persistent classification noise in MQs. Keywords: attribute-efficient, non-adaptive, membership query, DNF, parity function, random linear code

5 0.038550209 19 jmlr-2007-Classification in Networked Data: A Toolkit and a Univariate Case Study

Author: Sofus A. Macskassy, Foster Provost

Abstract: This paper is about classifying entities that are interlinked with entities for which the class is known. After surveying prior work, we present NetKit, a modular toolkit for classification in networked data, and a case-study of its application to networked data used in prior machine learning research. NetKit is based on a node-centric framework in which classifiers comprise a local classifier, a relational classifier, and a collective inference procedure. Various existing node-centric relational learning algorithms can be instantiated with appropriate choices for these components, and new combinations of components realize new algorithms. The case study focuses on univariate network classification, for which the only information used is the structure of class linkage in the network (i.e., only links and some class labels). To our knowledge, no work previously has evaluated systematically the power of class-linkage alone for classification in machine learning benchmark data sets. The results demonstrate that very simple network-classification models perform quite well—well enough that they should be used regularly as baseline classifiers for studies of learning with networked data. The simplest method (which performs remarkably well) highlights the close correspondence between several existing methods introduced for different purposes—that is, Gaussian-field classifiers, Hopfield networks, and relational-neighbor classifiers. The case study also shows that there are two sets of techniques that are preferable in different situations, namely when few versus many labels are known initially. We also demonstrate that link selection plays an important role similar to traditional feature selection. Keywords: relational learning, network learning, collective inference, collective classification, networked data, probabilistic relational models, network analysis, network data

6 0.037065282 72 jmlr-2007-Relational Dependency Networks

7 0.036677092 11 jmlr-2007-Anytime Learning of Decision Trees

8 0.036356095 58 jmlr-2007-Noise Tolerant Variants of the Perceptron Algorithm

9 0.034595061 14 jmlr-2007-Behavioral Shaping for Geometric Concepts

10 0.027278729 20 jmlr-2007-Combining PAC-Bayesian and Generic Chaining Bounds

11 0.026710266 40 jmlr-2007-Harnessing the Expertise of 70,000 Human Editors: Knowledge-Based Feature Generation for Text Categorization

12 0.026594412 84 jmlr-2007-The Pyramid Match Kernel: Efficient Learning with Sets of Features

13 0.025265444 7 jmlr-2007-A Stochastic Algorithm for Feature Selection in Pattern Recognition

14 0.024995206 6 jmlr-2007-A Probabilistic Analysis of EM for Mixtures of Separated, Spherical Gaussians

15 0.024430458 48 jmlr-2007-Learning in Environments with Unknown Dynamics: Towards more Robust Concept Learners

16 0.024403879 4 jmlr-2007-A New Probabilistic Approach in Rank Regression with Optimal Bayesian Partitioning     (Special Topic on Model Selection)

17 0.024175938 17 jmlr-2007-Building Blocks for Variational Bayesian Learning of Latent Variable Models

18 0.023699293 27 jmlr-2007-Distances between Data Sets Based on Summary Statistics

19 0.022527907 90 jmlr-2007-Value Regularization and Fenchel Duality

20 0.022188798 64 jmlr-2007-Online Learning of Multiple Tasks with a Shared Loss


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.149), (1, 0.108), (2, -0.054), (3, 0.013), (4, -0.017), (5, 0.119), (6, 0.036), (7, -0.02), (8, 0.005), (9, -0.217), (10, 0.038), (11, 0.241), (12, 0.096), (13, 0.136), (14, 0.03), (15, 0.25), (16, -0.064), (17, -0.464), (18, 0.217), (19, 0.1), (20, 0.073), (21, 0.055), (22, -0.083), (23, 0.013), (24, 0.04), (25, 0.006), (26, -0.005), (27, -0.085), (28, 0.096), (29, -0.02), (30, -0.032), (31, -0.006), (32, -0.019), (33, -0.022), (34, 0.023), (35, -0.032), (36, -0.057), (37, -0.018), (38, -0.041), (39, 0.047), (40, -0.016), (41, -0.103), (42, 0.037), (43, -0.041), (44, -0.03), (45, 0.033), (46, 0.014), (47, -0.029), (48, 0.009), (49, 0.073)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95114696 47 jmlr-2007-Learning Horn Expressions with LOGAN-H

Author: Marta Arias, Roni Khardon, Jérôme Maloberti

Abstract: The paper introduces LOGAN-H—a system for learning first-order function-free Horn expressions from interpretations. The system is based on an algorithm that learns by asking questions and that was proved correct in previous work. The current paper shows how the algorithm can be implemented in a practical system, and introduces a new algorithm based on it that avoids interaction and learns from examples only. The LOGAN-H system implements these algorithms and adds several facilities and optimizations that allow efficient applications in a wide range of problems. As one of the important ingredients, the system includes several fast procedures for solving the subsumption problem, an NP-complete problem that needs to be solved many times during the learning process. We describe qualitative and quantitative experiments in several domains. The experiments demonstrate that the system can deal with varied problems, large amounts of data, and that it achieves good classification accuracy. Keywords: inductive logic programming, subsumption, bottom-up learning, learning with queries

2 0.86173308 43 jmlr-2007-Integrating Naïve Bayes and FOIL

Author: Niels Landwehr, Kristian Kersting, Luc De Raedt

Abstract: A novel relational learning approach that tightly integrates the naïve Bayes learning scheme with the inductive logic programming rule-learner FOIL is presented. In contrast to previous combinations that have employed naïve Bayes only for post-processing the rule sets, the presented approach employs the naïve Bayes criterion to guide its search directly. The proposed technique is implemented in the NFOIL and TFOIL systems, which employ standard naïve Bayes and tree augmented naïve Bayes models respectively. We show that these integrated approaches to probabilistic model and rule learning outperform post-processing approaches. They also yield significantly more accurate models than simple rule learning and are competitive with more sophisticated ILP systems. Keywords: rule learning, naïve Bayes, statistical relational learning, inductive logic programming

3 0.27496275 67 jmlr-2007-Polynomial Identification in the Limit of Substitutable Context-free Languages

Author: Alexander Clark, Rémi Eyraud

Abstract: This paper formalises the idea of substitutability introduced by Zellig Harris in the 1950s and makes it the basis for a learning algorithm from positive data only for a subclass of context-free languages. We show that there is a polynomial characteristic set, and thus prove polynomial identification in the limit of this class. We discuss the relationship of this class of languages to other common classes discussed in grammatical inference. It transpires that it is not necessary to identify constituents in order to learn a context-free language—it is sufficient to identify the syntactic congruence, and the operations of the syntactic monoid can be converted into a context-free grammar. We also discuss modifications to the algorithm that produces a reduction system rather than a context-free grammar, that will be much more compact. We discuss the relationship to Angluin’s notion of reversibility for regular languages. We also demonstrate that an implementation of this algorithm is capable of learning a classic example of structure dependent syntax in English: this constitutes a refutation of an argument that has been used in support of nativist theories of language. Keywords: grammatical inference, context-free languages, positive data only, reduction system, natural languages

4 0.17784274 58 jmlr-2007-Noise Tolerant Variants of the Perceptron Algorithm

Author: Roni Khardon, Gabriel Wachman

Abstract: A large number of variants of the Perceptron algorithm have been proposed and partially evaluated in recent work. One type of algorithm aims for noise tolerance by replacing the last hypothesis of the perceptron with another hypothesis or a vote among hypotheses. Another type simply adds a margin term to the perceptron in order to increase robustness and accuracy, as done in support vector machines. A third type borrows further from support vector machines and constrains the update function of the perceptron in ways that mimic soft-margin techniques. The performance of these algorithms, and the potential for combining different techniques, has not been studied in depth. This paper provides such an experimental study and reveals some interesting facts about the algorithms. In particular the perceptron with margin is an effective method for tolerating noise and stabilizing the algorithm. This is surprising since the margin in itself is not designed or used for noise tolerance, and there are no known guarantees for such performance. In most cases, similar performance is obtained by the voted-perceptron which has the advantage that it does not require parameter selection. Techniques using soft margin ideas are run-time intensive and do not give additional performance benefits. The results also highlight the difficulty with automatic parameter selection which is required with some of these variants. Keywords: perceptron algorithm, on-line learning, noise tolerance, kernel methods

5 0.1575751 14 jmlr-2007-Behavioral Shaping for Geometric Concepts

Author: Manu Chhabra, Robert A. Jacobs, Daniel Štefankovič

Abstract: In a search problem, an agent uses the membership oracle of a target concept to find a positive example of the concept. In a shaped search problem the agent is aided by a sequence of increasingly restrictive concepts leading to the target concept (analogous to behavioral shaping). The concepts are given by membership oracles, and the agent has to find a positive example of the target concept while minimizing the total number of oracle queries. We show that for the concept class of intervals on the real line an agent using a bounded number of queries per oracle exists. In contrast, for the concept class of unions of two intervals on the real line no agent with a bounded number of queries per oracle exists. We then investigate the (amortized) number of queries per oracle needed for the shaped search problem over other concept classes. We explore the following methods to obtain efficient agents. For axis-parallel rectangles we use a bootstrapping technique to obtain gradually better approximations of the target concept. We show that given rectangles R ⊆ A ⊆ R^d one can obtain a rectangle A′ ⊇ R with vol(A′)/vol(R) ≤ 2, using only O(d · vol(A)/vol(R)) random samples from A. For ellipsoids of bounded eccentricity in R^d we analyze a deterministic ray-shooting process which uses a sequence of rays to get close to the centroid. Finally, we use algorithms for generating random points in convex bodies (Dyer et al., 1991; Kannan et al., 1997) to give a randomized agent for the concept class of convex bodies. In the final section, we explore connections between our bootstrapping method and active learning. Specifically, we use the bootstrapping technique for axis-parallel rectangles to active learn axis-parallel rectangles under the uniform distribution in O(d ln(1/ε)) labeled samples. Keywords: computational learning theory, behavioral shaping, active learning

6 0.15335201 11 jmlr-2007-Anytime Learning of Decision Trees

7 0.15054959 12 jmlr-2007-Attribute-Efficient and Non-adaptive Learning of Parities and DNF Expressions     (Special Topic on the Conference on Learning Theory 2005)

8 0.12736735 84 jmlr-2007-The Pyramid Match Kernel: Efficient Learning with Sets of Features

9 0.11763563 85 jmlr-2007-Transfer Learning via Inter-Task Mappings for Temporal Difference Learning

10 0.11618993 19 jmlr-2007-Classification in Networked Data: A Toolkit and a Univariate Case Study

11 0.11549243 49 jmlr-2007-Learning to Classify Ordinal Data: The Data Replication Method

12 0.11534671 72 jmlr-2007-Relational Dependency Networks

13 0.11192155 7 jmlr-2007-A Stochastic Algorithm for Feature Selection in Pattern Recognition

14 0.11123752 40 jmlr-2007-Harnessing the Expertise of 70,000 Human Editors: Knowledge-Based Feature Generation for Text Categorization

15 0.10678813 17 jmlr-2007-Building Blocks for Variational Bayesian Learning of Latent Variable Models

16 0.10561039 21 jmlr-2007-Comments on the "Core Vector Machines: Fast SVM Training on Very Large Data Sets"

17 0.10371687 4 jmlr-2007-A New Probabilistic Approach in Rank Regression with Optimal Bayesian Partitioning     (Special Topic on Model Selection)

18 0.10260554 62 jmlr-2007-On the Effectiveness of Laplacian Normalization for Graph Semi-supervised Learning

19 0.102062 30 jmlr-2007-Dynamics and Generalization Ability of LVQ Algorithms

20 0.098985158 90 jmlr-2007-Value Regularization and Fenchel Duality


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.019), (12, 0.02), (15, 0.02), (22, 0.013), (28, 0.519), (40, 0.029), (45, 0.019), (48, 0.027), (58, 0.013), (60, 0.069), (77, 0.011), (80, 0.031), (85, 0.032), (98, 0.076)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.94944745 16 jmlr-2007-Boosted Classification Trees and Class Probability Quantile Estimation

Author: David Mease, Abraham J. Wyner, Andreas Buja

Abstract: The standard by which binary classifiers are usually judged, misclassification error, assumes equal costs of misclassifying the two classes or, equivalently, classifying at the 1/2 quantile of the conditional class probability function P[y = 1|x]. Boosted classification trees are known to perform quite well for such problems. In this article we consider the use of standard, off-the-shelf boosting for two more general problems: 1) classification with unequal costs or, equivalently, classification at quantiles other than 1/2, and 2) estimation of the conditional class probability function P[y = 1|x]. We first examine whether the latter problem, estimation of P[y = 1|x], can be solved with LogitBoost, and with AdaBoost when combined with a natural link function. The answer is negative: both approaches are often ineffective because they overfit P[y = 1|x] even though they perform well as classifiers. A major negative point of the present article is the disconnect between class probability estimation and classification. Next we consider the practice of over/under-sampling of the two classes. We present an algorithm that uses AdaBoost in conjunction with Over/Under-Sampling and Jittering of the data (“JOUS-Boost”). This algorithm is simple, yet successful, and it preserves the advantage of relative protection against overfitting, but for arbitrary misclassification costs and, equivalently, arbitrary quantile boundaries. We then use collections of classifiers obtained from a grid of quantiles to form estimators of class probabilities. The estimates of the class probabilities compare favorably to those obtained by a variety of methods across both simulated and real data sets. Keywords: boosting algorithms, LogitBoost, AdaBoost, class probability estimation, over-sampling, under-sampling, stratification, data jittering

2 0.93001717 45 jmlr-2007-Learnability of Gaussians with Flexible Variances

Author: Yiming Ying, Ding-Xuan Zhou

Abstract: Gaussian kernels with flexible variances provide a rich family of Mercer kernels for learning algorithms. We show that the union of the unit balls of reproducing kernel Hilbert spaces generated by Gaussian kernels with flexible variances is a uniform Glivenko-Cantelli (uGC) class. This result confirms a conjecture concerning learnability of Gaussian kernels and verifies the uniform convergence of many learning algorithms involving Gaussians with changing variances. Rademacher averages and empirical covering numbers are used to estimate sample errors of multi-kernel regularization schemes associated with general loss functions. It is then shown that the regularization error associated with the least square loss and the Gaussian kernels can be greatly improved when flexible variances are allowed. Finally, for regularization schemes generated by Gaussian kernels with flexible variances we present explicit learning rates for regression with least square loss and classification with hinge loss. Keywords: Gaussian kernel, flexible variances, learning theory, Glivenko-Cantelli class, regularization scheme, empirical covering number

same-paper 3 0.87814134 47 jmlr-2007-Learning Horn Expressions with LOGAN-H

Author: Marta Arias, Roni Khardon, Jérôme Maloberti

Abstract: The paper introduces LOGAN-H—a system for learning first-order function-free Horn expressions from interpretations. The system is based on an algorithm that learns by asking questions and that was proved correct in previous work. The current paper shows how the algorithm can be implemented in a practical system, and introduces a new algorithm based on it that avoids interaction and learns from examples only. The LOGAN-H system implements these algorithms and adds several facilities and optimizations that allow efficient applications in a wide range of problems. As one of the important ingredients, the system includes several fast procedures for solving the subsumption problem, an NP-complete problem that needs to be solved many times during the learning process. We describe qualitative and quantitative experiments in several domains. The experiments demonstrate that the system can deal with varied problems, large amounts of data, and that it achieves good classification accuracy. Keywords: inductive logic programming, subsumption, bottom-up learning, learning with queries

4 0.61565912 71 jmlr-2007-Refinable Kernels

Author: Yuesheng Xu, Haizhang Zhang

Abstract: Motivated by mathematical learning from training data, we introduce the notion of refinable kernels. Various characterizations of refinable kernels are presented. The concept of refinable kernels leads to the introduction of wavelet-like reproducing kernels. We also investigate a refinable kernel that forms a Riesz basis. In particular, we characterize refinable translation invariant kernels, and refinable kernels defined by refinable functions. This study leads to multiresolution analysis of reproducing kernel Hilbert spaces. Keywords: refinable kernels, refinable feature maps, wavelet-like reproducing kernels, dual kernels, learning with kernels, reproducing kernel Hilbert spaces, Riesz bases

5 0.60854203 63 jmlr-2007-On the Representer Theorem and Equivalent Degrees of Freedom of SVR

Author: Francesco Dinuzzo, Marta Neve, Giuseppe De Nicolao, Ugo Pietro Gianazza

Abstract: Support Vector Regression (SVR) for discrete data is considered. An alternative formulation of the representer theorem is derived. This result is based on the newly introduced notion of pseudoresidual and the use of subdifferential calculus. The representer theorem is exploited to analyze the sensitivity properties of ε-insensitive SVR and introduce the notion of approximate degrees of freedom. The degrees of freedom are shown to play a key role in the evaluation of the optimism, that is the difference between the expected in-sample error and the expected empirical risk. In this way, it is possible to define a C_p-like statistic that can be used for tuning the parameters of SVR. The proposed tuning procedure is tested on a simulated benchmark problem and on a real world problem (Boston Housing data set). Keywords: statistical learning, reproducing kernel Hilbert spaces, support vector machines, representer theorem, regularization theory

6 0.57677615 89 jmlr-2007-VC Theory of Large Margin Multi-Category Classifiers     (Special Topic on Model Selection)

7 0.52840358 58 jmlr-2007-Noise Tolerant Variants of the Perceptron Algorithm

8 0.52704567 76 jmlr-2007-Spherical-Homoscedastic Distributions: The Equivalency of Spherical and Normal Distributions in Classification

9 0.52535403 12 jmlr-2007-Attribute-Efficient and Non-adaptive Learning of Parities and DNF Expressions     (Special Topic on the Conference on Learning Theory 2005)

10 0.51979387 49 jmlr-2007-Learning to Classify Ordinal Data: The Data Replication Method

11 0.50668061 66 jmlr-2007-Penalized Model-Based Clustering with Application to Variable Selection

12 0.50214756 24 jmlr-2007-Consistent Feature Selection for Pattern Recognition in Polynomial Time

13 0.49701059 19 jmlr-2007-Classification in Networked Data: A Toolkit and a Univariate Case Study

14 0.49517611 44 jmlr-2007-Large Margin Semi-supervised Learning

15 0.49218726 5 jmlr-2007-A Nonparametric Statistical Approach to Clustering via Mode Identification

16 0.49212801 36 jmlr-2007-Generalization Error Bounds in Semi-supervised Classification Under the Cluster Assumption

17 0.4863686 77 jmlr-2007-Stagewise Lasso

18 0.48513013 37 jmlr-2007-GiniSupport Vector Machine: Quadratic Entropy Based Robust Multi-Class Probability Regression

19 0.48160979 20 jmlr-2007-Combining PAC-Bayesian and Generic Chaining Bounds

20 0.47464055 13 jmlr-2007-Bayesian Quadratic Discriminant Analysis