jmlr jmlr2005 jmlr2005-59 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Judy Goldsmith, Robert H. Sloan
Abstract: A revision algorithm is a learning algorithm that identifies the target concept, starting from an initial concept. Such an algorithm is considered efficient if its complexity (in terms of the measured resource) is polynomial in the syntactic distance between the initial and the target concept, but only polylogarithmic in the number of variables in the universe. We give efficient revision algorithms in the model of learning with equivalence and membership queries. The algorithms work in a general revision model where both deletion and addition revision operators are allowed. In this model one of the main open problems is the efficient revision of Horn formulas. Two revision algorithms are presented for special cases of this problem: for depth-1 acyclic Horn formulas, and for definite Horn formulas with unique heads. Keywords: theory revision, Horn formulas, query learning, exact learning, computational learning theory
Reference: text
sentIndex sentText sentNum sentScore
1 The algorithms work in a general revision model where both deletion and addition revision operators are allowed. [sent-8, score-0.801]
2 Two revision algorithms are presented for special cases of this problem: for depth-1 acyclic Horn formulas, and for definite Horn formulas with unique heads. [sent-10, score-0.514]
3 In machine learning, the study of revision algorithms is referred to as theory revision; more detailed references to the literature are given in Wrobel’s overviews of theory revision (Wrobel, 1994, 1995) and also in our recent papers (Goldsmith et al. [sent-31, score-0.768]
4 We have studied revision algorithms in the model of learning with equivalence and membership queries (Goldsmith et al. [sent-34, score-0.55]
5 Finding an efficient revision algorithm for Horn formulas in the general revision model (deletions and additions) emerged as perhaps the main open problem posed by our previous work on revision algorithms. [sent-40, score-1.241]
6 In this way, an efficient equivalence and membership query algorithm can be turned into an efficient practical revision algorithm. [sent-66, score-0.55]
7 (Section heading: Classes of Horn Formulas Considered.) Horn revision is the problem that most practical theory revision systems address. [sent-86, score-0.768]
8 In this paper we present results for the revision problem outlined above: the revision of Horn formulas in the general revision model allowing both deletions and additions (more precise definitions are given in Section 2). [sent-90, score-1.407]
9 Each clause with a head, or nonnegated variable, is interpreted as a potential justification for making the head variable true in some model of the program. [sent-98, score-0.881]
10 One of our main results, Theorem 5, shows that this class can be revised using O(dist(ϕ, ψ) · m^3 · log n) queries, where n is the number of variables, ϕ is the m-clause initial formula, ψ is the target formula, and dist is the revision distance, which will be defined formally in Section 2. [sent-107, score-0.518]
11 Thus, our second topic in this paper is revision with queries for theories consisting of unique explanations. [sent-117, score-0.503]
12 We also give a revision algorithm for definite Horn formulas with distinct heads, meaning that no variable ever occurs as the head of more than one Horn clause. [sent-118, score-0.704]
13 (Section heading: Preliminaries.) We use standard notions from propositional logic such as variable, literal, term (or conjunction), clause (or disjunction), etc. [sent-126, score-0.72]
14 A Horn clause is a disjunction with at most one unnegated variable; we will usually think of it as an implication and call the clause’s unnegated variable its head, and its negated variables its body. [sent-153, score-0.715]
15 A Horn clause with an unnegated variable is called definite (or positive). [sent-156, score-0.675]
16 If a definite clause contains only one variable, then that clause is called a fact. [sent-157, score-1.3]
17 We will consider clauses with no unnegated variables to have head F, and will sometimes write them as (body → F). [sent-158, score-0.497]
18 If x satisfies the body of Horn clause c, considered as a term, we say x covers c. [sent-168, score-0.896]
19 For Horn clause body b (or any monotone term) and vector x, we use b ∩ x for the monotone term that has those variables of b that correspond to 1’s in x. [sent-171, score-0.919]
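To make the two set-level operations above concrete, here is a minimal sketch in Python (our own illustration, not code from the paper); a clause is a (body, head) pair, an assignment is the set of variables it makes true, "covers" is a subset test, and b ∩ x is a set intersection.

    def covers(x, body):
        # x covers clause c = (body -> head) iff x satisfies body viewed as a term,
        # i.e. every body variable is true (1) in x
        return body <= x

    def body_intersect(body, x):
        # b ∩ x: the monotone term keeping exactly the variables of b that are 1 in x
        return body & x

    body = {"x1", "x4"}             # body of the clause (x1 ∧ x4 -> x5)
    x = {"x1", "x4", "x6"}          # assignment with x1, x4, x6 true
    print(covers(x, body))          # True
    print(body_intersect(body, x))  # {'x1', 'x4'}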
20 (Section heading: Revision.) The revision distance between a formula ϕ and a concept C is defined to be the minimum number of applications of a specified set of syntactic revision operators to ϕ needed to obtain a formula for C. [sent-179, score-0.9]
21 (Adding a new literal to open up a new clause or term would be an even more general addition-type operator, which we have not considered so far.) [sent-185, score-0.67]
22 We use dist(ϕ, ψ) to denote the revision distance from ϕ to ψ whenever the revision operators are clear from context. [sent-188, score-0.787]
23 A revision algorithm for a formula ϕ has access to membership and equivalence oracles for an unknown target concept and must return some representation of the target concept. [sent-190, score-0.652]
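For readers unfamiliar with the query model, the following is a toy, brute-force stand-in for the two oracles, usable only for very small variable sets; it is our own illustration and not part of the paper's algorithm.

    from itertools import combinations

    def make_mq(target):
        # membership oracle: label a single assignment (a set of true variables)
        return lambda x: target(x)

    def make_eq(target, variables):
        # equivalence oracle: return a counterexample to the hypothesis, or None if correct;
        # brute force over all 2^n assignments, so only sensible for tiny n
        def eq(hypothesis):
            for r in range(len(variables) + 1):
                for subset in combinations(variables, r):
                    x = set(subset)
                    if hypothesis(x) != target(x):
                        return x
            return None
        return eq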
24 Depth-1 acyclic Horn formulas are precisely those where each variable that occurs as a head either occurs as a fact (the head of an empty-bodied clause) or never occurs in the body of any clause. [sent-205, score-0.821]
25 Previously we gave a revision algorithm for unate DNF (which would dualize to unate CNF) that was presented as being able to revise specifically two clauses (Goldsmith et al. [sent-208, score-0.69]
26 Let us call those variables that occur as the head of a clause of the initial formula head variables. [sent-219, score-1.189]
27 Note that x^v cannot falsify any clause with a head other than v. [sent-221, score-0.935]
28 Since v will normally be the head of a Horn clause and we use F to denote the “head” of a headless Horn clause, we will use x^F to denote x modified to set all head variables to 1. [sent-222, score-1.127]
29 If x^h falsifies a clause of the target depth-1 acyclic Horn formula with head h, then x also falsifies that clause. [sent-225, score-1.106]
30 Proof Consider target clause b → h, where b is nonempty and b → h is falsified by x^h. [sent-226, score-0.785]
31 If x^h covers b, then x covers b, since no head variables may occur in b, and x^h \ x consists only of those head variables besides h. [sent-228, score-0.642]
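A small sketch of the x^v and x^F constructions used in Proposition 1 (our own code and names; the paper only defines the notation): starting from an assignment x, every head variable other than v is forced to 1, and for x^F all head variables are forced to 1.

    def raise_heads(x, head_vars, keep=None):
        # return x with every head variable set to 1, except `keep` (left as it is in x)
        return set(x) | {h for h in head_vars if h != keep}

    head_vars = {"x1", "x2", "x5"}
    x = {"x3"}
    x_v = raise_heads(x, head_vars, keep="x5")  # x^{x5} = {x1, x2, x3}
    x_F = raise_heads(x, head_vars)             # x^F   = {x1, x2, x3, x5}
    print(sorted(x_v), sorted(x_F))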
32 The algorithm begins with an assumption that the revision distance from the initial theory to the target theory is d. [sent-230, score-0.493]
33 We stop when the first such membership query returns 0; we know that x falsifies a clause with head h. [sent-240, score-1.044]
34 Once a negative counterexample x is associated with a head, we first try to use x to make deletions from each existing hypothesis clause with the same head. [sent-242, score-0.867]
35 If no such deletions are possible, then we use x to add a new clause to the hypothesis. [sent-243, score-0.747]
36 If (body(c) ∩ x)^h (or, equivalently, body(c)^h ∩ x^h) is a negative instance, which we can determine by a membership query, then we can create a new smaller hypothesis clause whose body is body(c) ∩ x. [sent-245, score-0.994]
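The deletion step just described can be sketched as follows (a hypothetical helper, not the paper's code; mq stands for the membership oracle): if the shrunken body, with all other head variables raised to 1, is still a negative instance, the hypothesis clause can safely be replaced by the smaller one.

    def try_delete(mq, clause, x, head_vars):
        # clause is a (body_set, head) pair; x is a negative counterexample (set of true vars)
        body, head = clause
        candidate = body & x                                       # body(c) ∩ x
        raised = candidate | {h for h in head_vars if h != head}   # (body(c) ∩ x)^h
        if mq(raised) == 0:     # still negative: the dropped variables were unnecessary
            return (candidate, head)
        return clause           # otherwise keep the clause unchanged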
37 For each initial theory clause with the same head as the one associated with x (which for F is all initial theory clauses, since deletions of heads are allowed), use binary search from the intersection of x with the initial clause body (with the other head variables set to 1) up to x. [sent-250, score-1.798]
38 This guarantees that in particular we try the initial theory clause with smallest revision distance to the target clause that x falsifies. [sent-253, score-1.793]
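The BinarySearch calls mentioned in the next sentence can be pictured as the standard membership-query binary search between a positive and a negative instance; the sketch below is ours (mq and the toy target are assumptions) and returns one variable whose addition flips the oracle's answer, that is, one variable that must be added to the clause body.

    def find_flip_variable(mq, bottom, top):
        # bottom: positive instance; top ⊇ bottom: negative instance (both sets of true vars)
        assert mq(bottom) == 1 and mq(top) == 0
        diff = sorted(top - bottom)
        lo, hi = 0, len(diff)            # invariant: bottom ∪ diff[:lo] is positive,
        while hi - lo > 1:               #            bottom ∪ diff[:hi] is negative
            mid = (lo + hi) // 2
            if mq(bottom | set(diff[:mid])) == 1:
                lo = mid
            else:
                hi = mid
        return diff[lo]                  # adding this one variable turns a 1 into a 0

    # toy target: the single Horn clause (x1 ∧ x7 -> x5)
    def mq(assignment):
        return 0 if {"x1", "x7"} <= assignment and "x5" not in assignment else 1

    print(find_flip_variable(mq, bottom={"x1"}, top={"x1", "x6", "x7", "x8"}))  # x7

Each such search costs O(log n) membership queries, which is presumably where the log n factor in Theorem 5 comes from.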
39 All necessary additions to this clause are found by the calls to BinarySearch; later only deletions will be needed. [sent-254, score-0.816]
40 (Flattened fragment of the algorithm's pseudocode, reformatted for readability:)
    3:  ...” and d > 0 do
    4:      h := Associate(x, ϕ)
    5:      if H has at least one clause then
    6:          for all clauses c ∈ H with head h do
    7:              if MQ((body(c) ∩ x)^h) == 0 then   {delete vars from c}
    8:                  body(c) := body(c) ∩ x
    9:                  d := d − (number of variables removed)
    10:             end if
    11:         end for
    12:     end if
    13:     if no vars. [sent-263, score-1.122]
41 For the inductive step, consider how we update the hypothesis, either by adding a new clause or deleting variables from the body of an existing clause. [sent-268, score-0.899]
42 Before creating or updating a clause to have head h and body y, we have ensured (at Line 7 for updates of existing hypothesis clauses and at Line 27 for adding new clauses) that MQ(y^h) = 0, that is, that y^h is a negative instance. [sent-269, score-1.43]
43 Because of that, y^h must falsify some clause, and because of its form and the syntactic form of the target, it must be a clause with head h. [sent-270, score-0.967]
44 None of the head variables in y^h \ y can be in any body, so y must indeed be a superset of the variables of some target clause with head h, as claimed. [sent-271, score-1.191]
45 If x is not used to make deletions, then x^h falsifies any target clause with head h whose body is covered by x. [sent-273, score-0.836]
46 Further, if x is used to add a new clause with head h to the hypothesis, then the body of the new clause does not cover any target clause body covered by any other hypothesis clause with head h. [sent-274, score-3.649]
47 Proof If x falsified the same target clause as an existing hypothesis clause body with head h, then the membership query at Line 7 would return 0, and x would be used to delete variables from that hypothesis clause body. [sent-275, score-2.732]
48 Thus if and when x is used to add a new clause, x cannot falsify the same target clause as any existing hypothesis clause with the same head. [sent-278, score-1.483]
49 The newly added hypothesis clause’s body is a subset of x, so that clause body does not cover any other hypothesis clause body with head h. [sent-279, score-2.331]
50 Lemma 3 HornReviseUpToD(ϕ, d) succeeds in finding the target Horn formula ψ if it has revision distance at most d from ϕ. [sent-281, score-0.514]
51 We will assume throughout the analysis in this proof that the terms of the initial and target formulas are numbered so that they “line up” for calculating the revision distance from ϕ to ψ. [sent-283, score-0.582]
52 Part of our inductive claim is that there is a map a(i) (technically a relation) of hypothesis clauses to target clauses such that (body(c_i))^{head(c_i)} falsifies target clause c*_{a(i)}. [sent-289, score-1.303]
53 The relation a maps every index i of a hypothesis clause to at least one target clause index, and is one-to-one in the sense that no two hypothesis clauses are ever mapped to the same target clause. [sent-291, score-2.409]
54 The relation a evolves in only two ways: (1) when a(i) is more than one index, sometimes one of those indices gets dropped, and (2) a new i gets added to the domain of a each time a clause is added to the hypothesis. [sent-292, score-0.688]
55 We need to show that a new clause is found using x by at least one iteration of the for all c loop starting at Line 15. [sent-298, score-0.68]
56 Let c*_i be a target clause with head h or F that x falsifies. [sent-299, score-0.943]
57 At some point c_i will be used as the clause in the for all c ∈ ϕ loop at Line 15. [sent-300, score-0.711]
58 As long as x^h falsifies only target clause c*_i, after at most d calls to BinarySearch all necessary additions to c_i will have been found and a clause will be added, completing the base case. [sent-301, score-1.555]
59 By Lemma 2, x does not cover any target clauses covered by clause bodies in the hypothesis, so it must cover one or more new target clauses. [sent-313, score-1.033]
60 To complete the inductive step for this case, note that the relation a can indeed be extended by relating the index of the new hypothesis clause to the one or more target clauses whose bodies its body covers, so Equation (2) holds. [sent-320, score-1.458]
61 Thus the updated hypothesis clause c_i := (body(c_i) ∩ x) must falsify some or all of the clause(s) c*_{a(i)}, and the relation a is either unchanged, or altered by decreasing the range of a(i). [sent-325, score-0.782]
62 By Proposition 1, since MQ((body(c_i) ∩ x)^h) = 0, it must be that body(c_i) ∩ x falsifies a target clause with head h. [sent-328, score-0.943]
63 Further, using the numbering of the target clauses that makes that target clause correspond to c_i at the start of the round, the number of variables removed from body(c) is subtracted from the parameter d, and Equation (3) still holds, completing the induction step. [sent-329, score-1.046]
64 Furthermore, the clause added will have all the variables in body(c*_i) and no variables not in body(c_i) ∪ body(c*_i) (i. [sent-331, score-0.669]
65 Theorem 5 There is a revision algorithm for depth-1 acyclic Horn formulas with query complexity O(d · m^3 · log n), where d is the revision distance, n is the number of variables in the universe, and m is the number of clauses in the initial formula. [sent-350, score-1.253]
66 The revision distance from ϕ to ψ is 5: 1 for the deletion of head x3 from the second clause, 1 for the deletion of the third clause, and 3 for adding the literals x7, x8, and x5 to the fourth clause (i. [sent-355, score-1.35]
67 Now we call Associate(ϕ, 11101110) to find a candidate head for a clause negated by 11101110. [sent-362, score-0.896]
68 Hypothesis H currently has no clauses, so we will use 11101110 to add a new clause to H starting at Line 13. [sent-365, score-0.67]
69 There is no clause in H with head x5 , so we do not try to use instance 01110111 to delete variables from any clause of H. [sent-380, score-1.574]
70 In the for loop starting at Line 13 we consider only clauses with head x5 ; there is exactly one: c = (x1 ∧ x4 → x5 ). [sent-381, score-0.502]
71 In Lines 36–38 we set all head variables of x to 0 so x = 00010111 and add a new clause of the form x → h to H; thus we update H to be H = (x1 → F) ∧ (x4 ∧ x6 ∧ x7 ∧ x8 → x5 ), and decrement d by 3, so d has now been reduced by 4 altogether. [sent-391, score-0.901]
72 There is no clause in H with head x2 , so we do not try to use x to delete variables from existing clauses of H. [sent-394, score-1.149]
73 This time we consider only the one clause c = x2 with head x2 (and empty body). [sent-396, score-0.881]
74 A revision of a formula from class C must also be in class C, so in particular, a revision of a definite Horn formula must also be a definite Horn formula. [sent-406, score-0.866]
75 Its query complexity is O(d · m^3 + m^4), where d is the revision distance and m is the number of clauses in the initial formula. [sent-410, score-0.758]
76 For each clause c = (b → h), the check in Line 4 (whether the vector that is 0 at position h and 1 elsewhere is a negative instance) determines whether clause c should be deleted altogether. [sent-417, score-1.331]
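As a sketch (ours, with an assumed mq oracle), the Line 4 check amounts to a single membership query on the vector that is 0 only at the clause's head:

    def keep_clause(mq, head, all_vars):
        # 0^h: the assignment that is 0 at h and 1 everywhere else
        zero_at_h = set(all_vars) - {head}
        # negative answer: some revision of c with head h survives in the target;
        # positive answer: the target has no clause with head h, so c is dropped
        return mq(zero_at_h) == 0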
77 To find all necessary additions to the body b of clause c = (b → h), we use a constructed example x_c. [sent-418, score-1.011]
78 We initialize x_c to b^h (the body variables from b, plus all head variables except h). [sent-419, score-0.503]
79 Notice that the only way MQ(x_c) can be 0 is if x_c covers the body of a clause but not its head. [sent-420, score-0.896]
80 Since x_c includes all heads except h, it is clear which clause body is or is not covered by x_c; the notion of “pivots” is not needed in this algorithm. [sent-421, score-1.018]
81 To begin the binary search, x_c is the known positive instance that must satisfy the target clause c* derived from c, and the assignment with a 0 in position h and a 1 everywhere else is the known negative instance that must falsify c*. [sent-425, score-0.875]
82 The process ends when x_c becomes a negative instance; at that point a clause with head h, whose body has a variable corresponding to each 1 in x_c, is added to the hypothesis. [sent-427, score-1.23]
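Putting the last few sentences together, a hedged sketch of the per-clause addition-finding loop might look as follows (our code, reusing the find_flip_variable helper sketched earlier; all names are illustrative).

    def find_additions(mq, body, head, head_vars, all_vars):
        # grow the constructed example x_c until it becomes a negative instance
        other_heads = {v for v in head_vars if v != head}
        neg_top = set(all_vars) - {head}       # 0^h: the known negative endpoint
        x_c = set(body) | other_heads          # b^h: body vars plus all heads except h
        while mq(x_c) == 1:                    # still positive: a body variable is missing
            v = find_flip_variable(mq, x_c, neg_top)
            x_c.add(v)                         # v must occur in the target clause body
        return x_c                             # its 1-positions form the new clause body

The extra head variables that remain in x_c are the up to m − 1 spurious body variables per clause that the later deletions-only pass removes, which is the source of the d + m(m − 1) bound mentioned later.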
83 Once the necessary additions to every clause in the initial theory are found, a Horn formula needing only deletions has been produced, and the deletions-only algorithm DeletionsOnlyRevise from (Goldsmith et al. [sent-428, score-0.893]
84 Notice that each x_c is generated, and each clause is added to the hypothesis, without any equivalence queries being asked. [sent-430, score-0.825]
85 First we show that deletions of entire clauses are correct; then we consider the addition of variables to an initial clause. [sent-434, score-0.755]
86 Lemma 6 Algorithm DefiniteHornRevise adds, at Line 12, a clause that is either the initial formula clause c itself or a revision of initial clause c made by adding variables to body(c), if and only if some revision of c appears in the target formula. [sent-435, score-2.885]
87 If any clause that is a revision of c appears in the target (not counting the everywhere true clause, which can be omitted from any conjunction), then 0^h must falsify this target clause. [sent-438, score-1.212]
88 Conversely, if 0^h is a positive instance, then it must be that the target contains no clause with head h, and hence, since the formulas are definite Horn formulas with unique heads, no clause that is a revision of c. [sent-440, score-2.155]
89 In this case, the algorithm does not add any clause that is a revision of c to its hypothesis. [sent-441, score-1.054]
90 Lemma 7 If any variable is added to the body of an initial clause c of ϕ in Algorithm DefiniteHornRevise(ϕ), then some clause c* that is derived from c must be in the target formula, and every variable added to c in the loop in Lines 6–9 must be in c*. [sent-442, score-1.687]
91 Proof If variables are added to the body of clause c, then eventually a clause is added to the hypothesis, and by Lemma 6, we know that this means that a clause derived from c must be in the target formula. [sent-443, score-2.279]
92 By the construction, x and every other vector queried in the binary search must have a 1 in the position of every head except for the head h of c, so x must falsify a target clause that is a revision of c. [sent-446, score-1.631]
93 Proof By Lemmas 6 and 7, each variable added to a clause is necessary, and any clause deleted in the for loop is unnecessary. [sent-449, score-1.349]
94 , 2004b), where m is the number of clauses in the formula to be revised, and d is the revision distance. [sent-452, score-0.674]
95 Now the formula given to Algorithm DeletionsOnlyRevise has revision distance at most d + m(m − 1), where the m(m − 1) comes from the up to m − 1 heads added to the bodies of up to m clauses. [sent-453, score-0.542]
96 We insert the clause (x2 ∧ x3 ∧ x5 ∧ x6 → x1 ) into the hypothesis. [sent-465, score-0.65]
97 We insert the clause (x1 ∧ x5 ∧ x6 → x2 ) into the hypothesis. [sent-469, score-0.65]
98 We insert the clause (x1 ∧ x2 ∧ x3 ∧ x4 ∧ x6 → x5 ) into the hypothesis. [sent-473, score-0.65]
99 Clause (x2 → x6): Vector 0^{x6} = 111110 is positive, so we do not put this clause in the hypothesis (i. [sent-475, score-0.697]
100 Could the complexity of revision depend on a fixed maximum number of occurrences of each variable as the head of a clause? [sent-494, score-1.265]
wordName wordTfidf (topN-words)
[('clause', 0.65), ('revision', 0.384), ('horn', 0.264), ('clauses', 0.241), ('head', 0.231), ('body', 0.229), ('orn', 0.143), ('evise', 0.103), ('falsi', 0.103), ('formulas', 0.089), ('additions', 0.089), ('goldsmith', 0.089), ('queries', 0.086), ('query', 0.086), ('mq', 0.08), ('counterexample', 0.078), ('deletions', 0.077), ('xh', 0.073), ('earch', 0.069), ('inary', 0.062), ('target', 0.062), ('neg', 0.059), ('falsify', 0.054), ('heads', 0.053), ('membership', 0.053), ('numaddedvars', 0.049), ('oldsmith', 0.049), ('ssociate', 0.049), ('formula', 0.049), ('hypothesis', 0.047), ('dist', 0.044), ('evision', 0.044), ('xc', 0.043), ('acyclic', 0.041), ('logic', 0.04), ('pos', 0.037), ('efinite', 0.034), ('theories', 0.033), ('deletion', 0.033), ('dnf', 0.033), ('angluin', 0.033), ('ci', 0.031), ('loop', 0.03), ('propositional', 0.03), ('revising', 0.03), ('xf', 0.03), ('sloan', 0.029), ('loan', 0.029), ('initial', 0.028), ('ew', 0.028), ('equivalence', 0.027), ('delete', 0.027), ('revise', 0.025), ('revisions', 0.025), ('foundaclause', 0.025), ('judy', 0.025), ('rgy', 0.025), ('unnegated', 0.025), ('returns', 0.024), ('lgorithms', 0.022), ('dr', 0.022), ('tur', 0.021), ('monotone', 0.02), ('add', 0.02), ('cnf', 0.02), ('deleting', 0.02), ('eletions', 0.02), ('literal', 0.02), ('mid', 0.02), ('nly', 0.02), ('oe', 0.02), ('pivot', 0.02), ('polylogarithmic', 0.02), ('sz', 0.02), ('unate', 0.02), ('distance', 0.019), ('added', 0.019), ('position', 0.019), ('sentences', 0.018), ('bodies', 0.018), ('answer', 0.017), ('eq', 0.017), ('covers', 0.017), ('bal', 0.017), ('dana', 0.017), ('rth', 0.017), ('yh', 0.017), ('bshouty', 0.017), ('counterexamples', 0.017), ('expert', 0.016), ('instance', 0.016), ('automaton', 0.015), ('doshi', 0.015), ('headless', 0.015), ('negated', 0.015), ('oracles', 0.015), ('revises', 0.015), ('syntactic', 0.015), ('negative', 0.015), ('line', 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999934 59 jmlr-2005-New Horn Revision Algorithms
Author: Judy Goldsmith, Robert H. Sloan
Abstract: A revision algorithm is a learning algorithm that identifies the target concept, starting from an initial concept. Such an algorithm is considered efficient if its complexity (in terms of the measured resource) is polynomial in the syntactic distance between the initial and the target concept, but only polylogarithmic in the number of variables in the universe. We give efficient revision algorithms in the model of learning with equivalence and membership queries. The algorithms work in a general revision model where both deletion and addition revision operators are allowed. In this model one of the main open problems is the efficient revision of Horn formulas. Two revision algorithms are presented for special cases of this problem: for depth-1 acyclic Horn formulas, and for definite Horn formulas with unique heads. Keywords: theory revision, Horn formulas, query learning, exact learning, computational learning theory
2 0.039034076 56 jmlr-2005-Maximum Margin Algorithms with Boolean Kernels
Author: Roni Khardon, Rocco A. Servedio
Abstract: Recent work has introduced Boolean kernels with which one can learn linear threshold functions over a feature space containing all conjunctions of length up to k (for any 1 ≤ k ≤ n) over the original n Boolean features in the input space. This motivates the question of whether maximum margin algorithms such as Support Vector Machines can learn Disjunctive Normal Form expressions in the Probably Approximately Correct (PAC) learning model by using this kernel. We study this question, as well as a variant in which structural risk minimization (SRM) is performed where the class hierarchy is taken over the length of conjunctions. We show that maximum margin algorithms using the Boolean kernels do not PAC learn t(n)term DNF for any t(n) = ω(1), even when used with such a SRM scheme. We also consider PAC learning under the uniform distribution and show that if the kernel uses conjunctions of length √ ˜ ω( n) then the maximum margin hypothesis will fail on the uniform distribution as well. Our results concretely illustrate that margin based algorithms may overfit when learning simple target functions with natural kernels. Keywords: computational learning theory, kernel methods, PAC learning, Boolean functions
3 0.020596813 51 jmlr-2005-Local Propagation in Conditional Gaussian Bayesian Networks
Author: Robert G. Cowell
Abstract: This paper describes a scheme for local computation in conditional Gaussian Bayesian networks that combines the approach of Lauritzen and Jensen (2001) with some elements of Shachter and Kenley (1989). Message passing takes place on an elimination tree structure rather than the more compact (and usual) junction tree of cliques. This yields a local computation scheme in which all calculations involving the continuous variables are performed by manipulating univariate regressions, and hence matrix operations are avoided. Keywords: Bayesian networks, conditional Gaussian distributions, propagation algorithm, elimination tree
4 0.01894662 2 jmlr-2005-A Bayesian Model for Supervised Clustering with the Dirichlet Process Prior
Author: Hal Daumé III, Daniel Marcu
Abstract: We develop a Bayesian framework for tackling the supervised clustering problem, the generic problem encountered in tasks such as reference matching, coreference resolution, identity uncertainty and record linkage. Our clustering model is based on the Dirichlet process prior, which enables us to define distributions over the countably infinite sets that naturally arise in this problem. We add supervision to our model by positing the existence of a set of unobserved random variables (we call these “reference types”) that are generic across all clusters. Inference in our framework, which requires integrating over infinitely many parameters, is solved using Markov chain Monte Carlo techniques. We present algorithms for both conjugate and non-conjugate priors. We present a simple—but general—parameterization of our model based on a Gaussian assumption. We evaluate this model on one artificial task and three real-world tasks, comparing it against both unsupervised and state-of-the-art supervised algorithms. Our results show that our model is able to outperform other models across a variety of tasks and performance metrics. Keywords: supervised clustering, record linkage, citation matching, coreference, Dirichlet process, non-parametric Bayesian
5 0.016638596 8 jmlr-2005-Active Coevolutionary Learning of Deterministic Finite Automata
Author: Josh Bongard, Hod Lipson
Abstract: This paper describes an active learning approach to the problem of grammatical inference, specifically the inference of deterministic finite automata (DFAs). We refer to the algorithm as the estimation-exploration algorithm (EEA). This approach differs from previous passive and active learning approaches to grammatical inference in that training data is actively proposed by the algorithm, rather than passively receiving training data from some external teacher. Here we show that this algorithm outperforms one version of the most powerful set of algorithms for grammatical inference, evidence driven state merging (EDSM), on randomly-generated DFAs. The performance increase is due to the fact that the EDSM algorithm only works well for DFAs with specific balances (percentage of positive labelings), while the EEA is more consistent over a wider range of balances. Based on this finding we propose a more general method for generating DFAs to be used in the development of future grammatical inference algorithms. Keywords: grammatical inference, evolutionary computation, deterministic finite automata, active learning, system identification
6 0.012823917 34 jmlr-2005-Feature Selection for Unsupervised and Supervised Inference: The Emergence of Sparsity in a Weight-Based Approach
7 0.012110857 4 jmlr-2005-A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data
8 0.0120164 47 jmlr-2005-Learning from Examples as an Inverse Problem
9 0.011810743 20 jmlr-2005-Clustering with Bregman Divergences
10 0.011547098 70 jmlr-2005-Universal Algorithms for Learning Theory Part I : Piecewise Constant Functions
11 0.011167649 66 jmlr-2005-Smooth ε-Insensitive Regression by Loss Symmetrization
12 0.01090516 29 jmlr-2005-Efficient Margin Maximizing with Boosting
13 0.010543216 42 jmlr-2005-Large Margin Methods for Structured and Interdependent Output Variables
14 0.0097662918 32 jmlr-2005-Expectation Consistent Approximate Inference
15 0.0096095046 41 jmlr-2005-Kernel Methods for Measuring Independence
16 0.0091631273 43 jmlr-2005-Learning Hidden Variable Networks: The Information Bottleneck Approach
17 0.0088761104 10 jmlr-2005-Adaptive Online Prediction by Following the Perturbed Leader
18 0.0087553877 15 jmlr-2005-Asymptotic Model Selection for Naive Bayesian Networks
19 0.0087364987 58 jmlr-2005-Multiclass Classification with Multi-Prototype Support Vector Machines
20 0.0084912218 69 jmlr-2005-Tutorial on Practical Prediction Theory for Classification
topicId topicWeight
[(0, 0.053), (1, 0.011), (2, 0.046), (3, 0.008), (4, -0.025), (5, -0.007), (6, -0.036), (7, -0.033), (8, 0.037), (9, -0.037), (10, 0.078), (11, 0.002), (12, 0.036), (13, 0.072), (14, -0.01), (15, -0.162), (16, 0.01), (17, -0.054), (18, 0.148), (19, -0.201), (20, 0.038), (21, -0.011), (22, -0.387), (23, -0.065), (24, 0.308), (25, -0.32), (26, -0.034), (27, 0.065), (28, -0.289), (29, -0.12), (30, -0.513), (31, 0.134), (32, 0.105), (33, -0.016), (34, 0.199), (35, -0.169), (36, -0.06), (37, -0.137), (38, 0.055), (39, 0.058), (40, 0.013), (41, -0.016), (42, 0.082), (43, -0.07), (44, -0.058), (45, 0.006), (46, -0.064), (47, -0.048), (48, 0.027), (49, -0.004)]
simIndex simValue paperId paperTitle
same-paper 1 0.9917751 59 jmlr-2005-New Horn Revision Algorithms
Author: Judy Goldsmith, Robert H. Sloan
Abstract: A revision algorithm is a learning algorithm that identifies the target concept, starting from an initial concept. Such an algorithm is considered efficient if its complexity (in terms of the measured resource) is polynomial in the syntactic distance between the initial and the target concept, but only polylogarithmic in the number of variables in the universe. We give efficient revision algorithms in the model of learning with equivalence and membership queries. The algorithms work in a general revision model where both deletion and addition revision operators are allowed. In this model one of the main open problems is the efficient revision of Horn formulas. Two revision algorithms are presented for special cases of this problem: for depth-1 acyclic Horn formulas, and for definite Horn formulas with unique heads. Keywords: theory revision, Horn formulas, query learning, exact learning, computational learning theory
2 0.10262726 56 jmlr-2005-Maximum Margin Algorithms with Boolean Kernels
Author: Roni Khardon, Rocco A. Servedio
Abstract: Recent work has introduced Boolean kernels with which one can learn linear threshold functions over a feature space containing all conjunctions of length up to k (for any 1 ≤ k ≤ n) over the original n Boolean features in the input space. This motivates the question of whether maximum margin algorithms such as Support Vector Machines can learn Disjunctive Normal Form expressions in the Probably Approximately Correct (PAC) learning model by using this kernel. We study this question, as well as a variant in which structural risk minimization (SRM) is performed where the class hierarchy is taken over the length of conjunctions. We show that maximum margin algorithms using the Boolean kernels do not PAC learn t(n)term DNF for any t(n) = ω(1), even when used with such a SRM scheme. We also consider PAC learning under the uniform distribution and show that if the kernel uses conjunctions of length √ ˜ ω( n) then the maximum margin hypothesis will fail on the uniform distribution as well. Our results concretely illustrate that margin based algorithms may overfit when learning simple target functions with natural kernels. Keywords: computational learning theory, kernel methods, PAC learning, Boolean functions
3 0.066662572 51 jmlr-2005-Local Propagation in Conditional Gaussian Bayesian Networks
Author: Robert G. Cowell
Abstract: This paper describes a scheme for local computation in conditional Gaussian Bayesian networks that combines the approach of Lauritzen and Jensen (2001) with some elements of Shachter and Kenley (1989). Message passing takes place on an elimination tree structure rather than the more compact (and usual) junction tree of cliques. This yields a local computation scheme in which all calculations involving the continuous variables are performed by manipulating univariate regressions, and hence matrix operations are avoided. Keywords: Bayesian networks, conditional Gaussian distributions, propagation algorithm, elimination tree
4 0.057949271 66 jmlr-2005-Smooth ε-Insensitive Regression by Loss Symmetrization
Author: Ofer Dekel, Shai Shalev-Shwartz, Yoram Singer
Abstract: We describe new loss functions for regression problems along with an accompanying algorithmic framework which utilizes these functions. These loss functions are derived by symmetrization of margin-based losses commonly used in boosting algorithms, namely, the logistic loss and the exponential loss. The resulting symmetric logistic loss can be viewed as a smooth approximation to the ε-insensitive hinge loss used in support vector regression. We describe and analyze two parametric families of batch learning algorithms for minimizing these symmetric losses. The first family employs an iterative log-additive update which can be viewed as a regression counterpart to recent boosting algorithms. The second family utilizes an iterative additive update step. We also describe and analyze online gradient descent (GD) and exponentiated gradient (EG) algorithms for the symmetric logistic loss. A byproduct of our work is a new simple form of regularization for boosting-based classification and regression algorithms. Our regression framework also has implications on classification algorithms, namely, a new additive update boosting algorithm for classification. We demonstrate the merits of our algorithms in a series of experiments.
5 0.054974474 2 jmlr-2005-A Bayesian Model for Supervised Clustering with the Dirichlet Process Prior
Author: Hal Daumé III, Daniel Marcu
Abstract: We develop a Bayesian framework for tackling the supervised clustering problem, the generic problem encountered in tasks such as reference matching, coreference resolution, identity uncertainty and record linkage. Our clustering model is based on the Dirichlet process prior, which enables us to define distributions over the countably infinite sets that naturally arise in this problem. We add supervision to our model by positing the existence of a set of unobserved random variables (we call these “reference types”) that are generic across all clusters. Inference in our framework, which requires integrating over infinitely many parameters, is solved using Markov chain Monte Carlo techniques. We present algorithms for both conjugate and non-conjugate priors. We present a simple—but general—parameterization of our model based on a Gaussian assumption. We evaluate this model on one artificial task and three real-world tasks, comparing it against both unsupervised and state-of-the-art supervised algorithms. Our results show that our model is able to outperform other models across a variety of tasks and performance metrics. Keywords: supervised clustering, record linkage, citation matching, coreference, Dirichlet process, non-parametric Bayesian
6 0.050558556 8 jmlr-2005-Active Coevolutionary Learning of Deterministic Finite Automata
7 0.041619465 47 jmlr-2005-Learning from Examples as an Inverse Problem
8 0.041376282 43 jmlr-2005-Learning Hidden Variable Networks: The Information Bottleneck Approach
9 0.040249754 20 jmlr-2005-Clustering with Bregman Divergences
10 0.037690639 11 jmlr-2005-Algorithmic Stability and Meta-Learning
11 0.037376415 32 jmlr-2005-Expectation Consistent Approximate Inference
12 0.036549944 41 jmlr-2005-Kernel Methods for Measuring Independence
13 0.036489882 4 jmlr-2005-A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data
14 0.036014158 58 jmlr-2005-Multiclass Classification with Multi-Prototype Support Vector Machines
15 0.035081796 53 jmlr-2005-Machine Learning Methods for Predicting Failures in Hard Drives: A Multiple-Instance Application
16 0.033917163 24 jmlr-2005-Core Vector Machines: Fast SVM Training on Very Large Data Sets
17 0.032469124 61 jmlr-2005-Prioritization Methods for Accelerating MDP Solvers
18 0.03030796 15 jmlr-2005-Asymptotic Model Selection for Naive Bayesian Networks
19 0.029195426 62 jmlr-2005-Probabilistic Non-linear Principal Component Analysis with Gaussian Process Latent Variable Models
20 0.028911345 67 jmlr-2005-Stability of Randomized Learning Algorithms
topicId topicWeight
[(13, 0.011), (17, 0.015), (36, 0.015), (37, 0.021), (43, 0.023), (47, 0.015), (52, 0.05), (59, 0.015), (70, 0.642), (82, 0.015), (88, 0.032), (94, 0.02)]
simIndex simValue paperId paperTitle
1 0.91983289 1 jmlr-2005-A Bayes Optimal Approach for Partitioning the Values of Categorical Attributes
Author: Marc Boullé
Abstract: In supervised machine learning, the partitioning of the values (also called grouping) of a categorical attribute aims at constructing a new synthetic attribute which keeps the information of the initial attribute and reduces the number of its values. In this paper, we propose a new grouping method MODL1 founded on a Bayesian approach. The method relies on a model space of grouping models and on a prior distribution defined on this model space. This results in an evaluation criterion of grouping, which is minimal for the most probable grouping given the data, i.e. the Bayes optimal grouping. We propose new super-linear optimization heuristics that yields near-optimal groupings. Extensive comparative experiments demonstrate that the MODL grouping method builds high quality groupings in terms of predictive quality, robustness and small number of groups. Keywords: data preparation, grouping, Bayesianism, model selection, classification, naïve Bayes 1
same-paper 2 0.90898389 59 jmlr-2005-New Horn Revision Algorithms
Author: Judy Goldsmith, Robert H. Sloan
Abstract: A revision algorithm is a learning algorithm that identifies the target concept, starting from an initial concept. Such an algorithm is considered efficient if its complexity (in terms of the measured resource) is polynomial in the syntactic distance between the initial and the target concept, but only polylogarithmic in the number of variables in the universe. We give efficient revision algorithms in the model of learning with equivalence and membership queries. The algorithms work in a general revision model where both deletion and addition revision operators are allowed. In this model one of the main open problems is the efficient revision of Horn formulas. Two revision algorithms are presented for special cases of this problem: for depth-1 acyclic Horn formulas, and for definite Horn formulas with unique heads. Keywords: theory revision, Horn formulas, query learning, exact learning, computational learning theory
3 0.84071058 14 jmlr-2005-Assessing Approximate Inference for Binary Gaussian Process Classification
Author: Malte Kuss, Carl Edward Rasmussen
Abstract: Gaussian process priors can be used to define flexible, probabilistic classification models. Unfortunately exact Bayesian inference is analytically intractable and various approximation techniques have been proposed. In this work we review and compare Laplace’s method and Expectation Propagation for approximate Bayesian inference in the binary Gaussian process classification model. We present a comprehensive comparison of the approximations, their predictive performance and marginal likelihood estimates to results obtained by MCMC sampling. We explain theoretically and corroborate empirically the advantages of Expectation Propagation compared to Laplace’s method. Keywords: Gaussian process priors, probabilistic classification, Laplace’s approximation, expectation propagation, marginal likelihood, evidence, MCMC
4 0.33791986 36 jmlr-2005-Gaussian Processes for Ordinal Regression
Author: Wei Chu, Zoubin Ghahramani
Abstract: We present a probabilistic kernel approach to ordinal regression based on Gaussian processes. A threshold model that generalizes the probit function is used as the likelihood function for ordinal variables. Two inference techniques, based on the Laplace approximation and the expectation propagation algorithm respectively, are derived for hyperparameter learning and model selection. We compare these two Gaussian process approaches with a previous ordinal regression method based on support vector machines on some benchmark and real-world data sets, including applications of ordinal regression to collaborative filtering and gene expression analysis. Experimental results on these data sets verify the usefulness of our approach. Keywords: Gaussian processes, ordinal regression, approximate Bayesian inference, collaborative filtering, gene expression analysis, feature selection
5 0.29972601 32 jmlr-2005-Expectation Consistent Approximate Inference
Author: Manfred Opper, Ole Winther
Abstract: We propose a novel framework for approximations to intractable probabilistic models which is based on a free energy formulation. The approximation can be understood as replacing an average over the original intractable distribution with a tractable one. It requires two tractable probability distributions which are made consistent on a set of moments and encode different features of the original intractable distribution. In this way we are able to use Gaussian approximations for models with discrete or bounded variables which allow us to include non-trivial correlations. These are neglected in many other methods. We test the framework on toy benchmark problems for binary variables on fully connected graphs and 2D grids and compare with other methods, such as loopy belief propagation. Good performance is already achieved by using single nodes as tractable substructures. Significant improvements are obtained when a spanning tree is used instead.
6 0.29456931 19 jmlr-2005-Clustering on the Unit Hypersphere using von Mises-Fisher Distributions
7 0.28453216 71 jmlr-2005-Variational Message Passing
8 0.27891839 31 jmlr-2005-Estimation of Non-Normalized Statistical Models by Score Matching
9 0.26585299 8 jmlr-2005-Active Coevolutionary Learning of Deterministic Finite Automata
10 0.26460576 54 jmlr-2005-Managing Diversity in Regression Ensembles
11 0.26359171 63 jmlr-2005-Quasi-Geodesic Neural Learning Algorithms Over the Orthogonal Group: A Tutorial
12 0.25814146 65 jmlr-2005-Separating a Real-Life Nonlinear Image Mixture
13 0.24936157 51 jmlr-2005-Local Propagation in Conditional Gaussian Bayesian Networks
14 0.24380735 46 jmlr-2005-Learning a Mahalanobis Metric from Equivalence Constraints
15 0.24107333 68 jmlr-2005-Tree-Based Batch Mode Reinforcement Learning
16 0.24018642 49 jmlr-2005-Learning the Kernel with Hyperkernels (Kernel Machines Section)
17 0.23639768 7 jmlr-2005-A Unifying View of Sparse Approximate Gaussian Process Regression
18 0.23349082 69 jmlr-2005-Tutorial on Practical Prediction Theory for Classification
19 0.22951245 13 jmlr-2005-Analysis of Variance of Cross-Validation Estimators of the Generalization Error
20 0.22265129 52 jmlr-2005-Loopy Belief Propagation: Convergence and Effects of Message Errors