jmlr jmlr2005 jmlr2005-59 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Judy Goldsmith, Robert H. Sloan
Abstract: A revision algorithm is a learning algorithm that identifies the target concept, starting from an initial concept. Such an algorithm is considered efficient if its complexity (in terms of the measured resource) is polynomial in the syntactic distance between the initial and the target concept, but only polylogarithmic in the number of variables in the universe. We give efficient revision algorithms in the model of learning with equivalence and membership queries. The algorithms work in a general revision model where both deletion and addition revision operators are allowed. In this model one of the main open problems is the efficient revision of Horn formulas. Two revision algorithms are presented for special cases of this problem: for depth-1 acyclic Horn formulas, and for definite Horn formulas with unique heads. Keywords: theory revision, Horn formulas, query learning, exact learning, computational learning theory
Reference: text
sentIndex sentText sentNum sentScore
1 The algorithms work in a general revision model where both deletion and addition revision operators are allowed. [sent-8, score-0.801]
2 Two revision algorithms are presented for special cases of this problem: for depth-1 acyclic Horn formulas, and for definite Horn formulas with unique heads. [sent-10, score-0.514]
3 In machine learning, the study of revision algorithms is referred to as theory revision; more detailed references to the literature are given in Wrobel’s overviews of theory revision (Wrobel, 1994, 1995) and also in our recent papers (Goldsmith et al. [sent-31, score-0.768]
4 We have studied revision algorithms in the model of learning with equivalence and membership queries (Goldsmith et al. [sent-34, score-0.55]
5 Finding an efficient revision algorithm for Horn formulas in the general revision model (deletions and additions) emerged as perhaps the main open problem posed by our previous work on revision algorithms. [sent-40, score-1.241]
6 In this way, an efficient equivalence and membership query algorithm can be turned into an efficient practical revision algorithm. [sent-66, score-0.55]
7 (Section heading: Classes of Horn Formulas Considered.) Horn revision is the problem that most practical theory revision systems address. [sent-86, score-0.768]
8 In this paper we present results for the revision problem outlined above: the revision of Horn formulas in the general revision model allowing both deletions and additions (more precise definitions are given in Section 2). [sent-90, score-1.407]
9 Each clause with a head, or nonnegated variable, is interpreted as a potential justification for making the head variable true in some model of the program. [sent-98, score-0.881]
10 One of our main results, Theorem 5, shows that this class can be revised using O(dist(ϕ, ψ) · m^3 · log n) queries, where n is the number of variables, ϕ is the m-clause initial formula, ψ is the target formula, and dist is the revision distance, which will be defined formally in Section 2. [sent-107, score-0.518]
11 Thus, our second topic in this paper is revision with queries for theories consisting of unique explanations. [sent-117, score-0.503]
12 We also give a revision algorithm for definite Horn formulas with distinct heads, meaning that no variable ever occurs as the head of more than one Horn clause. [sent-118, score-0.704]
13 (Section heading: Preliminaries.) We use standard notions from propositional logic such as variable, literal, term (or conjunction), clause (or disjunction), etc. [sent-126, score-0.72]
14 A Horn clause is a disjunction with at most one unnegated variable; we will usually think of it as an implication and call the clause’s unnegated variable its head, and its negated variables its body. [sent-153, score-0.715]
15 A Horn clause with an unnegated variable is called definite (or positive). [sent-156, score-0.675]
16 If a definite clause contains only one variable, then that clause is called a fact. [sent-157, score-1.3]
17 We will consider clauses with no unnegated variables to have head F, and will sometimes write them as (body → F). [sent-158, score-0.497]
18 If x satisfies the body of Horn clause c, considered as a term, we say x covers c. [sent-168, score-0.896]
19 For Horn clause body b (or any monotone term) and vector x, we use b ∩ x for the monotone term that has those variables of b that correspond to 1’s in x. [sent-171, score-0.919]
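To make the two set-level operations above concrete, here is a minimal sketch in Python (our own illustration, not code from the paper); a clause is a (body, head) pair, an assignment is the set of variables it makes true, "covers" is a subset test, and b ∩ x is a set intersection.

    def covers(x, body):
        # x covers clause c = (body -> head) iff x satisfies body viewed as a term,
        # i.e. every body variable is true (1) in x
        return body <= x

    def body_intersect(body, x):
        # b ∩ x: the monotone term keeping exactly the variables of b that are 1 in x
        return body & x

    body = {"x1", "x4"}             # body of the clause (x1 ∧ x4 -> x5)
    x = {"x1", "x4", "x6"}          # assignment with x1, x4, x6 true
    print(covers(x, body))          # True
    print(body_intersect(body, x))  # {'x1', 'x4'}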
20 (Section heading: Revision.) The revision distance between a formula ϕ and a concept C is defined to be the minimum number of applications of a specified set of syntactic revision operators to ϕ needed to obtain a formula for C. [sent-179, score-0.9]
21 (Adding a new literal to open up a new clause or term would be an even more general addition-type operator, which we have not considered so far.) [sent-185, score-0.67]
22 We use dist(ϕ, ψ) to denote the revision distance from ϕ to ψ whenever the revision operators are clear from context. [sent-188, score-0.787]
23 A revision algorithm for a formula ϕ has access to membership and equivalence oracles for an unknown target concept and must return some representation of the target concept. [sent-190, score-0.652]
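For readers unfamiliar with the query model, the following is a toy, brute-force stand-in for the two oracles, usable only for very small variable sets; it is our own illustration and not part of the paper's algorithm.

    from itertools import combinations

    def make_mq(target):
        # membership oracle: label a single assignment (a set of true variables)
        return lambda x: target(x)

    def make_eq(target, variables):
        # equivalence oracle: return a counterexample to the hypothesis, or None if correct;
        # brute force over all 2^n assignments, so only sensible for tiny n
        def eq(hypothesis):
            for r in range(len(variables) + 1):
                for subset in combinations(variables, r):
                    x = set(subset)
                    if hypothesis(x) != target(x):
                        return x
            return None
        return eq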
24 Depth-1 acyclic Horn formulas are precisely those where each variable that occurs as a head either occurs as a fact (the head of an empty-bodied clause) or never occurs in the body of any clause. [sent-205, score-0.821]
25 Previously we gave a revision algorithm for unate DNF (which would dualize to unate CNF) that was presented as being able to revise specifically two clauses (Goldsmith et al. [sent-208, score-0.69]
26 Let us call those variables that occur as the head of a clause of the initial formula head variables. [sent-219, score-1.189]
27 Note that x^v cannot falsify any clause with a head other than v. [sent-221, score-0.935]
28 Since v will normally be the head of a Horn clause and we use F to denote the “head” of a headless Horn clause, we will use x^F to denote x modified to set all head variables to 1. [sent-222, score-1.127]
29 If x^h falsifies a clause of the target depth-1 acyclic Horn formula with head h, then x also falsifies that clause. [sent-225, score-1.106]
30 Proof Consider target clause b → h, where b is nonempty and b → h is falsified by x^h. [sent-226, score-0.785]
31 If x^h covers b, then x covers b, since no head variables may occur in b, and x^h \ x consists only of those head variables besides h. [sent-228, score-0.642]
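A small sketch of the x^v and x^F constructions used in Proposition 1 (our own code and names; the paper only defines the notation): starting from an assignment x, every head variable other than v is forced to 1, and for x^F all head variables are forced to 1.

    def raise_heads(x, head_vars, keep=None):
        # return x with every head variable set to 1, except `keep` (left as it is in x)
        return set(x) | {h for h in head_vars if h != keep}

    head_vars = {"x1", "x2", "x5"}
    x = {"x3"}
    x_v = raise_heads(x, head_vars, keep="x5")  # x^{x5} = {x1, x2, x3}
    x_F = raise_heads(x, head_vars)             # x^F   = {x1, x2, x3, x5}
    print(sorted(x_v), sorted(x_F))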
32 The algorithm begins with an assumption that the revision distance from the initial theory to the target theory is d. [sent-230, score-0.493]
33 We stop when the first such membership query returns 0; we know that x falsifies a clause with head h. [sent-240, score-1.044]
34 Once a negative counterexample x is associated with a head, we first try to use x to make deletions from each existing hypothesis clause with the same head. [sent-242, score-0.867]
35 If no such deletions are possible, then we use x to add a new clause to the hypothesis. [sent-243, score-0.747]
36 If (body(c) ∩ x)^h (or, equivalently, body(c)^h ∩ x^h) is a negative instance, which we can determine by a membership query, then we can create a new smaller hypothesis clause whose body is body(c) ∩ x. [sent-245, score-0.994]
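The deletion step just described can be sketched as follows (a hypothetical helper, not the paper's code; mq stands for the membership oracle): if the shrunken body, with all other head variables raised to 1, is still a negative instance, the hypothesis clause can safely be replaced by the smaller one.

    def try_delete(mq, clause, x, head_vars):
        # clause is a (body_set, head) pair; x is a negative counterexample (set of true vars)
        body, head = clause
        candidate = body & x                                       # body(c) ∩ x
        raised = candidate | {h for h in head_vars if h != head}   # (body(c) ∩ x)^h
        if mq(raised) == 0:     # still negative: the dropped variables were unnecessary
            return (candidate, head)
        return clause           # otherwise keep the clause unchanged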
37 For each initial theory clause with the same head as the one associated with x (which for F is all initial theory clauses, since deletions of heads are allowed), use binary search from the intersection of x with the initial clause body (with the other head variables set to 1) up to x. [sent-250, score-1.798]
38 This guarantees that in particular we try the initial theory clause with smallest revision distance to the target clause that x falsifies. [sent-253, score-1.793]
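The BinarySearch calls mentioned in the next sentence can be pictured as the standard membership-query binary search between a positive and a negative instance; the sketch below is ours (mq and the toy target are assumptions) and returns one variable whose addition flips the oracle's answer, that is, one variable that must be added to the clause body.

    def find_flip_variable(mq, bottom, top):
        # bottom: positive instance; top ⊇ bottom: negative instance (both sets of true vars)
        assert mq(bottom) == 1 and mq(top) == 0
        diff = sorted(top - bottom)
        lo, hi = 0, len(diff)            # invariant: bottom ∪ diff[:lo] is positive,
        while hi - lo > 1:               #            bottom ∪ diff[:hi] is negative
            mid = (lo + hi) // 2
            if mq(bottom | set(diff[:mid])) == 1:
                lo = mid
            else:
                hi = mid
        return diff[lo]                  # adding this one variable turns a 1 into a 0

    # toy target: the single Horn clause (x1 ∧ x7 -> x5)
    def mq(assignment):
        return 0 if {"x1", "x7"} <= assignment and "x5" not in assignment else 1

    print(find_flip_variable(mq, bottom={"x1"}, top={"x1", "x6", "x7", "x8"}))  # x7

Each such search costs O(log n) membership queries, which is presumably where the log n factor in Theorem 5 comes from.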
39 All necessary additions to this clause are found by the calls to BinarySearch; later only deletions will be needed. [sent-254, score-0.816]
40 (Flattened fragment of the algorithm's pseudocode, reformatted for readability:)
    3:  ...” and d > 0 do
    4:      h := Associate(x, ϕ)
    5:      if H has at least one clause then
    6:          for all clauses c ∈ H with head h do
    7:              if MQ((body(c) ∩ x)^h) == 0 then   {delete vars from c}
    8:                  body(c) := body(c) ∩ x
    9:                  d := d − (number of variables removed)
    10:             end if
    11:         end for
    12:     end if
    13:     if no vars. [sent-263, score-1.122]
41 For the inductive step, consider how we update the hypothesis, either by adding a new clause or deleting variables from the body of an existing clause. [sent-268, score-0.899]
42 Before creating or updating a clause to have head h and body y, we have ensured (at Line 7 for updates of existing hypothesis clauses and at Line 27 for adding new clauses) that MQ(y^h) = 0, that is, that y^h is a negative instance. [sent-269, score-1.43]
43 Because of that, y^h must falsify some clause, and because of its form and the syntactic form of the target, it must be a clause with head h. [sent-270, score-0.967]
44 None of the head variables in y^h \ y can be in any body, so y must indeed be a superset of the variables of some target clause with head h, as claimed. [sent-271, score-1.191]
45 If x is not used to make deletions, then x^h falsifies any target clause with head h whose body is covered by x. [sent-273, score-0.836]
46 Further, if x is used to add a new clause with head h to the hypothesis, then the body of the new clause does not cover any target clause body covered by any other hypothesis clause with head h. [sent-274, score-3.649]
47 Proof If x falsified the same target clause as an existing hypothesis clause body with head h, then the membership query at Line 7 would return 0, and x would be used to delete variables from that hypothesis clause body. [sent-275, score-2.732]
48 Thus if and when x is used to add a new clause, x cannot falsify the same target clause as any existing hypothesis clause with the same head. [sent-278, score-1.483]
49 The newly added hypothesis clause’s body is a subset of x, so that clause body does not cover any other hypothesis clause body with head h. [sent-279, score-2.331]
50 Lemma 3 HornReviseUpToD(ϕ, d) succeeds in finding the target Horn formula ψ if it has revision distance at most d from ϕ. [sent-281, score-0.514]
51 We will assume throughout the analysis in this proof that the terms of the initial and target formulas are numbered so that they “line up” for calculating the revision distance from ϕ to ψ. [sent-283, score-0.582]
52 Part of our inductive claim is that there is a map a(i) (technically a relation) of hypothesis clauses to target clauses such that (body(c_i))^{head(c_i)} falsifies target clause c*_{a(i)}. [sent-289, score-1.303]
53 The relation a maps every index i of a hypothesis clause to at least one target clause index, and is one-to-one in the sense that no two hypothesis clauses are ever mapped to the same target clause. [sent-291, score-2.409]
54 The relation a evolves in only two ways: (1) when a(i) is more than one index, sometimes one of those indices gets dropped, and (2) a new i gets added to the domain of a each time a clause is added to the hypothesis. [sent-292, score-0.688]
55 We need to show that a new clause is found using x by at least one iteration of the for all c loop starting at Line 15. [sent-298, score-0.68]
56 Let c*_i be a target clause with head h or F that x falsifies. [sent-299, score-0.943]
57 At some point c_i will be used as the clause in the for all c ∈ ϕ loop at Line 15. [sent-300, score-0.711]
58 As long as x^h falsifies only target clause c*_i, after at most d calls to BinarySearch all necessary additions to c_i will have been found and a clause will be added, completing the base case. [sent-301, score-1.555]
59 By Lemma 2, x does not cover any target clauses covered by clause bodies in the hypothesis, so it must cover one or more new target clauses. [sent-313, score-1.033]
60 To complete the inductive step for this case, note that the relation a can indeed be extended by relating the index of the new hypothesis clause to the one or more target clauses whose bodies its body covers, so Equation (2) holds. [sent-320, score-1.458]
61 Thus the updated hypothesis clause c_i := (body(c_i) ∩ x) must falsify some or all of the clause(s) c*_{a(i)}, and the relation a is either unchanged, or altered by decreasing the range of a(i). [sent-325, score-0.782]
62 By Proposition 1, since MQ((body(c_i) ∩ x)^h) = 0, it must be that body(c_i) ∩ x falsifies a target clause with head h. [sent-328, score-0.943]
63 Further, using the numbering of the target clauses that makes that target clause correspond to c_i at the start of the round, the number of variables removed from body(c) is subtracted from the parameter d, and Equation (3) still holds, completing the induction step. [sent-329, score-1.046]
64 Furthermore, the clause added will have all the variables in body(c*_i) and no variables not in body(c_i) ∪ body(c*_i) (i. [sent-331, score-0.669]
65 Theorem 5 There is a revision algorithm for depth-1 acyclic Horn formulas with query complexity O(d · m^3 · log n), where d is the revision distance, n is the number of variables in the universe, and m is the number of clauses in the initial formula. [sent-350, score-1.253]
66 The revision distance from ϕ to ψ is 5: 1 for the deletion of head x3 from the second clause, 1 for the deletion of the third clause, and 3 for adding the literals x7, x8, and x5 to the fourth clause (i. [sent-355, score-1.35]
67 Now we call Associate(ϕ, 11101110) to find a candidate head for a clause negated by 11101110. [sent-362, score-0.896]
68 Hypothesis H currently has no clauses, so we will use 11101110 to add a new clause to H starting at Line 13. [sent-365, score-0.67]
69 There is no clause in H with head x5 , so we do not try to use instance 01110111 to delete variables from any clause of H. [sent-380, score-1.574]
70 In the for loop starting at Line 13 we consider only clauses with head x5 ; there is exactly one: c = (x1 ∧ x4 → x5 ). [sent-381, score-0.502]
71 In Lines 36–38 we set all head variables of x to 0 so x = 00010111 and add a new clause of the form x → h to H; thus we update H to be H = (x1 → F) ∧ (x4 ∧ x6 ∧ x7 ∧ x8 → x5 ), and decrement d by 3, so d has now been reduced by 4 altogether. [sent-391, score-0.901]
72 There is no clause in H with head x2 , so we do not try to use x to delete variables from existing clauses of H. [sent-394, score-1.149]
73 This time we consider only the one clause c = x2 with head x2 (and empty body). [sent-396, score-0.881]
74 A revision of a formula from class C must also be in class C, so in particular, a revision of a definite Horn formula must also be a definite Horn formula. [sent-406, score-0.866]
75 Its query complexity is O(d · m^3 + m^4), where d is the revision distance and m is the number of clauses in the initial formula. [sent-410, score-0.758]
76 For each clause c = (b → h), the check in Line 4 (whether the vector that is 0 at position h and 1 elsewhere is a negative instance) determines whether clause c should be deleted altogether. [sent-417, score-1.331]
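As a sketch (ours, with an assumed mq oracle), the Line 4 check amounts to a single membership query on the vector that is 0 only at the clause's head:

    def keep_clause(mq, head, all_vars):
        # 0^h: the assignment that is 0 at h and 1 everywhere else
        zero_at_h = set(all_vars) - {head}
        # negative answer: some revision of c with head h survives in the target;
        # positive answer: the target has no clause with head h, so c is dropped
        return mq(zero_at_h) == 0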
77 To find all necessary additions to the body b of clause c = (b → h), we use a constructed example x_c. [sent-418, score-1.011]
78 We initialize x_c to b^h (the body variables from b, plus all head variables except h). [sent-419, score-0.503]
79 Notice that the only way MQ(x_c) can be 0 is if x_c covers the body of a clause but not its head. [sent-420, score-0.896]
80 Since x_c includes all heads except h, it is clear which clause body is or is not covered by x_c; the notion of “pivots” is not needed in this algorithm. [sent-421, score-1.018]
81 To begin the binary search, x_c is the known positive instance that must satisfy the target clause c* derived from c, and the assignment with a 0 in position h and a 1 everywhere else is the known negative instance that must falsify c*. [sent-425, score-0.875]
82 The process ends when x_c becomes a negative instance; at that point a clause with head h, whose body has a variable corresponding to each 1 in x_c, is added to the hypothesis. [sent-427, score-1.23]
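Putting the last few sentences together, a hedged sketch of the per-clause addition-finding loop might look as follows (our code, reusing the find_flip_variable helper sketched earlier; all names are illustrative).

    def find_additions(mq, body, head, head_vars, all_vars):
        # grow the constructed example x_c until it becomes a negative instance
        other_heads = {v for v in head_vars if v != head}
        neg_top = set(all_vars) - {head}       # 0^h: the known negative endpoint
        x_c = set(body) | other_heads          # b^h: body vars plus all heads except h
        while mq(x_c) == 1:                    # still positive: a body variable is missing
            v = find_flip_variable(mq, x_c, neg_top)
            x_c.add(v)                         # v must occur in the target clause body
        return x_c                             # its 1-positions form the new clause body

The extra head variables that remain in x_c are the up to m − 1 spurious body variables per clause that the later deletions-only pass removes, which is the source of the d + m(m − 1) bound mentioned later.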
83 Once the necessary additions to every clause in the initial theory are found, a Horn formula needing only deletions has been produced, and the deletions-only algorithm DeletionsOnlyRevise from (Goldsmith et al. [sent-428, score-0.893]
84 Notice that each x_c is generated, and each clause is added to the hypothesis, without any equivalence queries being asked. [sent-430, score-0.825]
85 First we show that deletions of entire clauses are correct; then we consider the addition of variables to an initial clause. [sent-434, score-0.755]
86 Lemma 6 Algorithm DefiniteHornRevise adds, at Line 12, a clause that is either the initial formula clause c itself or a revision of initial clause c made by adding variables to body(c), if and only if some revision of c appears in the target formula. [sent-435, score-2.885]
87 If any clause that is a revision of c appears in the target (not counting the everywhere true clause, which can be omitted from any conjunction), then 0^h must falsify this target clause. [sent-438, score-1.212]
88 Conversely, if 0^h is a positive instance, then it must be that the target contains no clause with head h, and hence, since the formulas are definite Horn formulas with unique heads, no clause that is a revision of c. [sent-440, score-2.155]
89 In this case, the algorithm does not add any clause that is a revision of c to its hypothesis. [sent-441, score-1.054]
90 Lemma 7 If any variable is added to the body of an initial clause c of ϕ in Algorithm DefiniteHornRevise(ϕ), then some clause c* that is derived from c must be in the target formula, and every variable added to c in the loop in Lines 6–9 must be in c*. [sent-442, score-1.687]
91 Proof If variables are added to the body of clause c, then eventually a clause is added to the hypothesis, and by Lemma 6, we know that this means that a clause derived from c must be in the target formula. [sent-443, score-2.279]
92 By the construction, x and every other vector queried in the binary search must have a 1 in the position of every head except for the head h of c, so x must falsify a target clause that is a revision of c. [sent-446, score-1.631]
93 Proof By Lemmas 6 and 7, each variable added to a clause is necessary, and any clause deleted in the for loop is unnecessary. [sent-449, score-1.349]
94 , 2004b), where m is the number of clauses in the formula to be revised, and d is the revision distance. [sent-452, score-0.674]
95 Now the formula given to Algorithm DeletionsOnlyRevise has revision distance at most d + m(m − 1), where the m(m − 1) comes from the up to m − 1 heads added to the bodies of up to m clauses. [sent-453, score-0.542]
96 We insert the clause (x2 ∧ x3 ∧ x5 ∧ x6 → x1 ) into the hypothesis. [sent-465, score-0.65]
97 We insert the clause (x1 ∧ x5 ∧ x6 → x2 ) into the hypothesis. [sent-469, score-0.65]
98 We insert the clause (x1 ∧ x2 ∧ x3 ∧ x4 ∧ x6 → x5 ) into the hypothesis. [sent-473, score-0.65]
99 Clause (x2 → x6): Vector 0^{x6} = 111110 is positive, so we do not put this clause in the hypothesis (i. [sent-475, score-0.697]
100 Could the complexity of revision depend on a fixed maximum number of occurrences of each variable as the head of a clause? [sent-494, score-1.265]
wordName wordTfidf (topN-words)
[('clause', 0.65), ('revision', 0.384), ('horn', 0.264), ('clauses', 0.241), ('head', 0.231), ('body', 0.229), ('orn', 0.143), ('evise', 0.103), ('falsi', 0.103), ('formulas', 0.089), ('additions', 0.089), ('goldsmith', 0.089), ('queries', 0.086), ('query', 0.086), ('mq', 0.08), ('counterexample', 0.078), ('deletions', 0.077), ('xh', 0.073), ('earch', 0.069), ('inary', 0.062), ('target', 0.062), ('neg', 0.059), ('falsify', 0.054), ('heads', 0.053), ('membership', 0.053), ('numaddedvars', 0.049), ('oldsmith', 0.049), ('ssociate', 0.049), ('formula', 0.049), ('hypothesis', 0.047), ('dist', 0.044), ('evision', 0.044), ('xc', 0.043), ('acyclic', 0.041), ('logic', 0.04), ('pos', 0.037), ('efinite', 0.034), ('theories', 0.033), ('deletion', 0.033), ('dnf', 0.033), ('angluin', 0.033), ('ci', 0.031), ('loop', 0.03), ('propositional', 0.03), ('revising', 0.03), ('xf', 0.03), ('sloan', 0.029), ('loan', 0.029), ('initial', 0.028), ('ew', 0.028), ('equivalence', 0.027), ('delete', 0.027), ('revise', 0.025), ('revisions', 0.025), ('foundaclause', 0.025), ('judy', 0.025), ('rgy', 0.025), ('unnegated', 0.025), ('returns', 0.024), ('lgorithms', 0.022), ('dr', 0.022), ('tur', 0.021), ('monotone', 0.02), ('add', 0.02), ('cnf', 0.02), ('deleting', 0.02), ('eletions', 0.02), ('literal', 0.02), ('mid', 0.02), ('nly', 0.02), ('oe', 0.02), ('pivot', 0.02), ('polylogarithmic', 0.02), ('sz', 0.02), ('unate', 0.02), ('distance', 0.019), ('added', 0.019), ('position', 0.019), ('sentences', 0.018), ('bodies', 0.018), ('answer', 0.017), ('eq', 0.017), ('covers', 0.017), ('bal', 0.017), ('dana', 0.017), ('rth', 0.017), ('yh', 0.017), ('bshouty', 0.017), ('counterexamples', 0.017), ('expert', 0.016), ('instance', 0.016), ('automaton', 0.015), ('doshi', 0.015), ('headless', 0.015), ('negated', 0.015), ('oracles', 0.015), ('revises', 0.015), ('syntactic', 0.015), ('negative', 0.015), ('line', 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999934 59 jmlr-2005-New Horn Revision Algorithms
Author: Judy Goldsmith, Robert H. Sloan
Abstract: A revision algorithm is a learning algorithm that identifies the target concept, starting from an initial concept. Such an algorithm is considered efficient if its complexity (in terms of the measured resource) is polynomial in the syntactic distance between the initial and the target concept, but only polylogarithmic in the number of variables in the universe. We give efficient revision algorithms in the model of learning with equivalence and membership queries. The algorithms work in a general revision model where both deletion and addition revision operators are allowed. In this model one of the main open problems is the efficient revision of Horn formulas. Two revision algorithms are presented for special cases of this problem: for depth-1 acyclic Horn formulas, and for definite Horn formulas with unique heads. Keywords: theory revision, Horn formulas, query learning, exact learning, computational learning theory
2 0.039034076 56 jmlr-2005-Maximum Margin Algorithms with Boolean Kernels
Author: Roni Khardon, Rocco A. Servedio
Abstract: Recent work has introduced Boolean kernels with which one can learn linear threshold functions over a feature space containing all conjunctions of length up to k (for any 1 ≤ k ≤ n) over the original n Boolean features in the input space. This motivates the question of whether maximum margin algorithms such as Support Vector Machines can learn Disjunctive Normal Form expressions in the Probably Approximately Correct (PAC) learning model by using this kernel. We study this question, as well as a variant in which structural risk minimization (SRM) is performed where the class hierarchy is taken over the length of conjunctions. We show that maximum margin algorithms using the Boolean kernels do not PAC learn t(n)term DNF for any t(n) = ω(1), even when used with such a SRM scheme. We also consider PAC learning under the uniform distribution and show that if the kernel uses conjunctions of length √ ˜ ω( n) then the maximum margin hypothesis will fail on the uniform distribution as well. Our results concretely illustrate that margin based algorithms may overfit when learning simple target functions with natural kernels. Keywords: computational learning theory, kernel methods, PAC learning, Boolean functions
3 0.020596813 51 jmlr-2005-Local Propagation in Conditional Gaussian Bayesian Networks
Author: Robert G. Cowell
Abstract: This paper describes a scheme for local computation in conditional Gaussian Bayesian networks that combines the approach of Lauritzen and Jensen (2001) with some elements of Shachter and Kenley (1989). Message passing takes place on an elimination tree structure rather than the more compact (and usual) junction tree of cliques. This yields a local computation scheme in which all calculations involving the continuous variables are performed by manipulating univariate regressions, and hence matrix operations are avoided. Keywords: Bayesian networks, conditional Gaussian distributions, propagation algorithm, elimination tree
4 0.01894662 2 jmlr-2005-A Bayesian Model for Supervised Clustering with the Dirichlet Process Prior
Author: Hal Daumé III, Daniel Marcu
Abstract: We develop a Bayesian framework for tackling the supervised clustering problem, the generic problem encountered in tasks such as reference matching, coreference resolution, identity uncertainty and record linkage. Our clustering model is based on the Dirichlet process prior, which enables us to define distributions over the countably infinite sets that naturally arise in this problem. We add supervision to our model by positing the existence of a set of unobserved random variables (we call these “reference types”) that are generic across all clusters. Inference in our framework, which requires integrating over infinitely many parameters, is solved using Markov chain Monte Carlo techniques. We present algorithms for both conjugate and non-conjugate priors. We present a simple—but general—parameterization of our model based on a Gaussian assumption. We evaluate this model on one artificial task and three real-world tasks, comparing it against both unsupervised and state-of-the-art supervised algorithms. Our results show that our model is able to outperform other models across a variety of tasks and performance metrics. Keywords: supervised clustering, record linkage, citation matching, coreference, Dirichlet process, non-parametric Bayesian
5 0.016638596 8 jmlr-2005-Active Coevolutionary Learning of Deterministic Finite Automata
Author: Josh Bongard, Hod Lipson
Abstract: This paper describes an active learning approach to the problem of grammatical inference, specifically the inference of deterministic finite automata (DFAs). We refer to the algorithm as the estimation-exploration algorithm (EEA). This approach differs from previous passive and active learning approaches to grammatical inference in that training data is actively proposed by the algorithm, rather than passively receiving training data from some external teacher. Here we show that this algorithm outperforms one version of the most powerful set of algorithms for grammatical inference, evidence driven state merging (EDSM), on randomly-generated DFAs. The performance increase is due to the fact that the EDSM algorithm only works well for DFAs with specific balances (percentage of positive labelings), while the EEA is more consistent over a wider range of balances. Based on this finding we propose a more general method for generating DFAs to be used in the development of future grammatical inference algorithms. Keywords: grammatical inference, evolutionary computation, deterministic finite automata, active learning, system identification
6 0.012823917 34 jmlr-2005-Feature Selection for Unsupervised and Supervised Inference: The Emergence of Sparsity in a Weight-Based Approach
7 0.012110857 4 jmlr-2005-A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data
8 0.0120164 47 jmlr-2005-Learning from Examples as an Inverse Problem
9 0.011810743 20 jmlr-2005-Clustering with Bregman Divergences
10 0.011547098 70 jmlr-2005-Universal Algorithms for Learning Theory Part I : Piecewise Constant Functions
11 0.011167649 66 jmlr-2005-Smooth ε-Insensitive Regression by Loss Symmetrization
12 0.01090516 29 jmlr-2005-Efficient Margin Maximizing with Boosting
13 0.010543216 42 jmlr-2005-Large Margin Methods for Structured and Interdependent Output Variables
14 0.0097662918 32 jmlr-2005-Expectation Consistent Approximate Inference
15 0.0096095046 41 jmlr-2005-Kernel Methods for Measuring Independence
16 0.0091631273 43 jmlr-2005-Learning Hidden Variable Networks: The Information Bottleneck Approach
17 0.0088761104 10 jmlr-2005-Adaptive Online Prediction by Following the Perturbed Leader
18 0.0087553877 15 jmlr-2005-Asymptotic Model Selection for Naive Bayesian Networks
19 0.0087364987 58 jmlr-2005-Multiclass Classification with Multi-Prototype Support Vector Machines
20 0.0084912218 69 jmlr-2005-Tutorial on Practical Prediction Theory for Classification
topicId topicWeight
[(0, 0.053), (1, 0.011), (2, 0.046), (3, 0.008), (4, -0.025), (5, -0.007), (6, -0.036), (7, -0.033), (8, 0.037), (9, -0.037), (10, 0.078), (11, 0.002), (12, 0.036), (13, 0.072), (14, -0.01), (15, -0.162), (16, 0.01), (17, -0.054), (18, 0.148), (19, -0.201), (20, 0.038), (21, -0.011), (22, -0.387), (23, -0.065), (24, 0.308), (25, -0.32), (26, -0.034), (27, 0.065), (28, -0.289), (29, -0.12), (30, -0.513), (31, 0.134), (32, 0.105), (33, -0.016), (34, 0.199), (35, -0.169), (36, -0.06), (37, -0.137), (38, 0.055), (39, 0.058), (40, 0.013), (41, -0.016), (42, 0.082), (43, -0.07), (44, -0.058), (45, 0.006), (46, -0.064), (47, -0.048), (48, 0.027), (49, -0.004)]
simIndex simValue paperId paperTitle
same-paper 1 0.9917751 59 jmlr-2005-New Horn Revision Algorithms
Author: Judy Goldsmith, Robert H. Sloan
Abstract: A revision algorithm is a learning algorithm that identifies the target concept, starting from an initial concept. Such an algorithm is considered efficient if its complexity (in terms of the measured resource) is polynomial in the syntactic distance between the initial and the target concept, but only polylogarithmic in the number of variables in the universe. We give efficient revision algorithms in the model of learning with equivalence and membership queries. The algorithms work in a general revision model where both deletion and addition revision operators are allowed. In this model one of the main open problems is the efficient revision of Horn formulas. Two revision algorithms are presented for special cases of this problem: for depth-1 acyclic Horn formulas, and for definite Horn formulas with unique heads. Keywords: theory revision, Horn formulas, query learning, exact learning, computational learning theory
2 0.10262726 56 jmlr-2005-Maximum Margin Algorithms with Boolean Kernels
Author: Roni Khardon, Rocco A. Servedio
Abstract: Recent work has introduced Boolean kernels with which one can learn linear threshold functions over a feature space containing all conjunctions of length up to k (for any 1 ≤ k ≤ n) over the original n Boolean features in the input space. This motivates the question of whether maximum margin algorithms such as Support Vector Machines can learn Disjunctive Normal Form expressions in the Probably Approximately Correct (PAC) learning model by using this kernel. We study this question, as well as a variant in which structural risk minimization (SRM) is performed where the class hierarchy is taken over the length of conjunctions. We show that maximum margin algorithms using the Boolean kernels do not PAC learn t(n)term DNF for any t(n) = ω(1), even when used with such a SRM scheme. We also consider PAC learning under the uniform distribution and show that if the kernel uses conjunctions of length √ ˜ ω( n) then the maximum margin hypothesis will fail on the uniform distribution as well. Our results concretely illustrate that margin based algorithms may overfit when learning simple target functions with natural kernels. Keywords: computational learning theory, kernel methods, PAC learning, Boolean functions
3 0.066662572 51 jmlr-2005-Local Propagation in Conditional Gaussian Bayesian Networks
Author: Robert G. Cowell
Abstract: This paper describes a scheme for local computation in conditional Gaussian Bayesian networks that combines the approach of Lauritzen and Jensen (2001) with some elements of Shachter and Kenley (1989). Message passing takes place on an elimination tree structure rather than the more compact (and usual) junction tree of cliques. This yields a local computation scheme in which all calculations involving the continuous variables are performed by manipulating univariate regressions, and hence matrix operations are avoided. Keywords: Bayesian networks, conditional Gaussian distributions, propagation algorithm, elimination tree
4 0.057949271 66 jmlr-2005-Smooth ε-Insensitive Regression by Loss Symmetrization
Author: Ofer Dekel, Shai Shalev-Shwartz, Yoram Singer
Abstract: We describe new loss functions for regression problems along with an accompanying algorithmic framework which utilizes these functions. These loss functions are derived by symmetrization of margin-based losses commonly used in boosting algorithms, namely, the logistic loss and the exponential loss. The resulting symmetric logistic loss can be viewed as a smooth approximation to the ε-insensitive hinge loss used in support vector regression. We describe and analyze two parametric families of batch learning algorithms for minimizing these symmetric losses. The first family employs an iterative log-additive update which can be viewed as a regression counterpart to recent boosting algorithms. The second family utilizes an iterative additive update step. We also describe and analyze online gradient descent (GD) and exponentiated gradient (EG) algorithms for the symmetric logistic loss. A byproduct of our work is a new simple form of regularization for boosting-based classification and regression algorithms. Our regression framework also has implications on classification algorithms, namely, a new additive update boosting algorithm for classification. We demonstrate the merits of our algorithms in a series of experiments.
5 0.054974474 2 jmlr-2005-A Bayesian Model for Supervised Clustering with the Dirichlet Process Prior
Author: Hal Daumé III, Daniel Marcu
Abstract: We develop a Bayesian framework for tackling the supervised clustering problem, the generic problem encountered in tasks such as reference matching, coreference resolution, identity uncertainty and record linkage. Our clustering model is based on the Dirichlet process prior, which enables us to define distributions over the countably infinite sets that naturally arise in this problem. We add supervision to our model by positing the existence of a set of unobserved random variables (we call these “reference types”) that are generic across all clusters. Inference in our framework, which requires integrating over infinitely many parameters, is solved using Markov chain Monte Carlo techniques. We present algorithms for both conjugate and non-conjugate priors. We present a simple—but general—parameterization of our model based on a Gaussian assumption. We evaluate this model on one artificial task and three real-world tasks, comparing it against both unsupervised and state-of-the-art supervised algorithms. Our results show that our model is able to outperform other models across a variety of tasks and performance metrics. Keywords: supervised clustering, record linkage, citation matching, coreference, Dirichlet process, non-parametric Bayesian
6 0.050558556 8 jmlr-2005-Active Coevolutionary Learning of Deterministic Finite Automata
7 0.041619465 47 jmlr-2005-Learning from Examples as an Inverse Problem
8 0.041376282 43 jmlr-2005-Learning Hidden Variable Networks: The Information Bottleneck Approach
9 0.040249754 20 jmlr-2005-Clustering with Bregman Divergences
10 0.037690639 11 jmlr-2005-Algorithmic Stability and Meta-Learning
11 0.037376415 32 jmlr-2005-Expectation Consistent Approximate Inference
12 0.036549944 41 jmlr-2005-Kernel Methods for Measuring Independence
13 0.036489882 4 jmlr-2005-A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data
14 0.036014158 58 jmlr-2005-Multiclass Classification with Multi-Prototype Support Vector Machines
15 0.035081796 53 jmlr-2005-Machine Learning Methods for Predicting Failures in Hard Drives: A Multiple-Instance Application
16 0.033917163 24 jmlr-2005-Core Vector Machines: Fast SVM Training on Very Large Data Sets
17 0.032469124 61 jmlr-2005-Prioritization Methods for Accelerating MDP Solvers
18 0.03030796 15 jmlr-2005-Asymptotic Model Selection for Naive Bayesian Networks
19 0.029195426 62 jmlr-2005-Probabilistic Non-linear Principal Component Analysis with Gaussian Process Latent Variable Models
20 0.028911345 67 jmlr-2005-Stability of Randomized Learning Algorithms
topicId topicWeight
[(13, 0.011), (17, 0.015), (36, 0.015), (37, 0.021), (43, 0.023), (47, 0.015), (52, 0.05), (59, 0.015), (70, 0.642), (82, 0.015), (88, 0.032), (94, 0.02)]
simIndex simValue paperId paperTitle
1 0.91983289 1 jmlr-2005-A Bayes Optimal Approach for Partitioning the Values of Categorical Attributes
Author: Marc Boullé
Abstract: In supervised machine learning, the partitioning of the values (also called grouping) of a categorical attribute aims at constructing a new synthetic attribute which keeps the information of the initial attribute and reduces the number of its values. In this paper, we propose a new grouping method MODL1 founded on a Bayesian approach. The method relies on a model space of grouping models and on a prior distribution defined on this model space. This results in an evaluation criterion of grouping, which is minimal for the most probable grouping given the data, i.e. the Bayes optimal grouping. We propose new super-linear optimization heuristics that yields near-optimal groupings. Extensive comparative experiments demonstrate that the MODL grouping method builds high quality groupings in terms of predictive quality, robustness and small number of groups. Keywords: data preparation, grouping, Bayesianism, model selection, classification, naïve Bayes 1
same-paper 2 0.90898389 59 jmlr-2005-New Horn Revision Algorithms
Author: Judy Goldsmith, Robert H. Sloan
Abstract: A revision algorithm is a learning algorithm that identifies the target concept, starting from an initial concept. Such an algorithm is considered efficient if its complexity (in terms of the measured resource) is polynomial in the syntactic distance between the initial and the target concept, but only polylogarithmic in the number of variables in the universe. We give efficient revision algorithms in the model of learning with equivalence and membership queries. The algorithms work in a general revision model where both deletion and addition revision operators are allowed. In this model one of the main open problems is the efficient revision of Horn formulas. Two revision algorithms are presented for special cases of this problem: for depth-1 acyclic Horn formulas, and for definite Horn formulas with unique heads. Keywords: theory revision, Horn formulas, query learning, exact learning, computational learning theory
3 0.84071058 14 jmlr-2005-Assessing Approximate Inference for Binary Gaussian Process Classification
Author: Malte Kuss, Carl Edward Rasmussen
Abstract: Gaussian process priors can be used to define flexible, probabilistic classification models. Unfortunately exact Bayesian inference is analytically intractable and various approximation techniques have been proposed. In this work we review and compare Laplace’s method and Expectation Propagation for approximate Bayesian inference in the binary Gaussian process classification model. We present a comprehensive comparison of the approximations, their predictive performance and marginal likelihood estimates to results obtained by MCMC sampling. We explain theoretically and corroborate empirically the advantages of Expectation Propagation compared to Laplace’s method. Keywords: Gaussian process priors, probabilistic classification, Laplace’s approximation, expectation propagation, marginal likelihood, evidence, MCMC
4 0.33791986 36 jmlr-2005-Gaussian Processes for Ordinal Regression
Author: Wei Chu, Zoubin Ghahramani
Abstract: We present a probabilistic kernel approach to ordinal regression based on Gaussian processes. A threshold model that generalizes the probit function is used as the likelihood function for ordinal variables. Two inference techniques, based on the Laplace approximation and the expectation propagation algorithm respectively, are derived for hyperparameter learning and model selection. We compare these two Gaussian process approaches with a previous ordinal regression method based on support vector machines on some benchmark and real-world data sets, including applications of ordinal regression to collaborative filtering and gene expression analysis. Experimental results on these data sets verify the usefulness of our approach. Keywords: Gaussian processes, ordinal regression, approximate Bayesian inference, collaborative filtering, gene expression analysis, feature selection
5 0.29972601 32 jmlr-2005-Expectation Consistent Approximate Inference
Author: Manfred Opper, Ole Winther
Abstract: We propose a novel framework for approximations to intractable probabilistic models which is based on a free energy formulation. The approximation can be understood as replacing an average over the original intractable distribution with a tractable one. It requires two tractable probability distributions which are made consistent on a set of moments and encode different features of the original intractable distribution. In this way we are able to use Gaussian approximations for models with discrete or bounded variables which allow us to include non-trivial correlations. These are neglected in many other methods. We test the framework on toy benchmark problems for binary variables on fully connected graphs and 2D grids and compare with other methods, such as loopy belief propagation. Good performance is already achieved by using single nodes as tractable substructures. Significant improvements are obtained when a spanning tree is used instead.
6 0.29456931 19 jmlr-2005-Clustering on the Unit Hypersphere using von Mises-Fisher Distributions
7 0.28453216 71 jmlr-2005-Variational Message Passing
8 0.27891839 31 jmlr-2005-Estimation of Non-Normalized Statistical Models by Score Matching
9 0.26585299 8 jmlr-2005-Active Coevolutionary Learning of Deterministic Finite Automata
10 0.26460576 54 jmlr-2005-Managing Diversity in Regression Ensembles
11 0.26359171 63 jmlr-2005-Quasi-Geodesic Neural Learning Algorithms Over the Orthogonal Group: A Tutorial
12 0.25814146 65 jmlr-2005-Separating a Real-Life Nonlinear Image Mixture
13 0.24936157 51 jmlr-2005-Local Propagation in Conditional Gaussian Bayesian Networks
14 0.24380735 46 jmlr-2005-Learning a Mahalanobis Metric from Equivalence Constraints
15 0.24107333 68 jmlr-2005-Tree-Based Batch Mode Reinforcement Learning
16 0.24018642 49 jmlr-2005-Learning the Kernel with Hyperkernels (Kernel Machines Section)
17 0.23639768 7 jmlr-2005-A Unifying View of Sparse Approximate Gaussian Process Regression
18 0.23349082 69 jmlr-2005-Tutorial on Practical Prediction Theory for Classification
19 0.22951245 13 jmlr-2005-Analysis of Variance of Cross-Validation Estimators of the Generalization Error
20 0.22265129 52 jmlr-2005-Loopy Belief Propagation: Convergence and Effects of Message Errors