jmlr jmlr2013 jmlr2013-40 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: John Ahlgren, Shiu Yin Yuen
Abstract: We present NrSample, a framework for program synthesis in inductive logic programming. NrSample uses propositional logic constraints to exclude undesirable candidates from the search. This is achieved by representing constraints as propositional formulae and solving the associated constraint satisfaction problem. We present a variety of such constraints: pruning, input-output, functional (arithmetic), and variable splitting. NrSample is also capable of detecting search space exhaustion, leading to further speedups in clause induction and optimality. We benchmark NrSample against enumeration search (Aleph’s default) and Progol’s A∗ search in the context of program synthesis. The results show that, on large program synthesis problems, NrSample induces between 1 and 1358 times faster than enumeration (236 times faster on average), always with similar or better accuracy. Compared to Progol A∗ , NrSample is 18 times faster on average with similar or better accuracy except for two problems: one in which Progol A∗ substantially sacrificed accuracy to induce faster, and one in which Progol A∗ was a clear winner. Functional constraints provide a speedup of up to 53 times (21 times on average) with similar or better accuracy. We also benchmark using a few concept learning (non-program synthesis) problems. The results indicate that without strong constraints, the overhead of solving constraints is not compensated for. Keywords: inductive logic programming, program synthesis, theory induction, constraint satisfaction, Boolean satisfiability problem
Reference: text
sentIndex sentText sentNum sentScore
1 The search space hence becomes a lattice structure with a partial generality order, with the bodyless clause as top element and bottom clause as bottom element. [sent-33, score-1.076]
2 This input-output specification implicitly defines a logical constraint on the chaining of literals as the clause is computed from left to right. [sent-35, score-0.696]
3 The mode declaration types are automatically handled by the bottom clause construction algorithm, but correct input-output chaining may be violated since candidates contain a subset of the bottom clause’s literals. [sent-45, score-0.97]
4 We shall refer to candidates that are correct with respect to their mode declarations as mode conformant. [sent-54, score-0.64]
5 We achieve mode conformance by representing the input-output logic given by the mode declarations as propositional clauses (Russell et al.). [sent-55, score-0.853]
6 Constraints have been used in ILP before, although not to constrain the bottom clause literals. [sent-80, score-0.517]
7 Hence, each candidate can be represented as a bit string where bit bi signifies the presence or absence of the body literal at position i. [sent-101, score-0.53]
8 Seen from a slightly different perspective, we represent the occurrence or non-occurrence of a literal at position i in the bottom clause by the propositional variable bi or its negation ¬bi, respectively. [sent-102, score-1.07]
9 The propositional constraints correspond to the syntactic bias induced by the user-generated mode declarations and the search lattice pruning. [sent-104, score-0.686]
10 Definition 1 Let B be a definite clause (a clause with a head). [sent-111, score-0.892]
11 A clause C is a candidate from B if and only if C’s head is the same as B’s head, and C’s body is a subsequence of B’s body. [sent-112, score-0.761]
12 Here, B is intended to be the bottom clause, and C a clause that is created by possibly removing body literals of B. [sent-113, score-0.816]
13 Example 1 With bottom clause B = h(X,Y) ← q1(X,Z), q2(Z,Y), q3(X,Y,Z), the clause C = h(X,Y) ← q1(X,Z), q3(X,Y,Z) is a candidate from B since they have the same head and (q1(X,Z), q3(X,Y,Z)) is a subsequence of (q1(X,Z), q2(Z,Y), q3(X,Y,Z)). [sent-117, score-1.202]
14 To create a one-to-one correspondence between first-order Horn clauses and propositional formulae with respect to a bottom clause, we represent each literal of the bottom clause from left to right as propositional variables b1, b2, ..., bn. [sent-122, score-1.424]
15 Given a clause C which has some of the body literals in B, the propositional formula for C is then the conjunction of all propositional variables in B with positive sign if they occur in C and negative sign otherwise. [sent-126, score-1.215]
16 The propositional formula for C with respect to B is PC = l1 ∧ l2 ∧ ... ∧ ln, where li = bi if C contains the ith literal in B, and li = ¬bi otherwise. [sent-128, score-0.571]
17 We write PC when there is no confusion about which bottom clause we are referring to. [sent-129, score-0.517]
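A minimal sketch of this encoding (Python, illustrative only; the function and variable names are not from the paper's implementation):

```python
# Sketch: encode a candidate (a subsequence of the bottom clause's body
# literals) as a bit string and as the conjunction PC of signed variables.
def encode_candidate(bottom_body, candidate_body):
    kept = set(candidate_body)
    bits = [1 if lit in kept else 0 for lit in bottom_body]
    # PC as signed indices: +i stands for bi, -i for its negation
    pc = [(i + 1) if bit else -(i + 1) for i, bit in enumerate(bits)]
    return bits, pc

# Example 1: B = h(X,Y) <- q1(X,Z), q2(Z,Y), q3(X,Y,Z); C keeps literals 1 and 3
bottom = ["q1(X,Z)", "q2(Z,Y)", "q3(X,Y,Z)"]
print(encode_candidate(bottom, ["q1(X,Z)", "q3(X,Y,Z)"]))  # ([1, 0, 1], [1, -2, 3])
```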
18 Definition 3 Let B be a clause with n body literals, FB a propositional formula containing a subset of the propositional variables {b1, ..., bn}. [sent-140, score-0.972]
19 Definition 4 Let B be a clause with n body literals and FB a propositional formula. [sent-151, score-0.961]
20 Mode Declaration Constraints: Intuitively, mode declarations show where the inputs and outputs of a literal are located, and which types it takes. [sent-171, score-0.616]
21 The bottom clause is then constructed using the mode declarations so that no literal is introduced until all its inputs have been instantiated by previous literals. [sent-172, score-1.194]
22 These aspects are taken care of by the bottom clause construction algorithm. [sent-181, score-0.517]
23 This declares =/2 as a body literal (modeb) with a list on the left-hand side as input, to obtain the head and tail of the list as outputs on the right-hand side. [sent-194, score-0.558]
24 In conjunction with the modeh declaration above, we may now generate the mode conformant clause member(E, L) ← L = [E|T], which is the proper base case definition of list membership. [sent-196, score-0.902]
25 The occurrence of L in the head is as input (given by the modeh declaration), so L is already instantiated in the body literal as required. [sent-197, score-0.613]
26 An example of a clause that is not mode conformant is member(E, L) ← X = [E|T], since X is not an input type (it has not previously been instantiated). [sent-199, score-0.673]
27 Example 7 Assume we have computed bottom clause member(A, B) ← B = [C|D], member(A, D), member(C, B) from an example. [sent-203, score-0.517]
28 The clause member(A, B) ← member(A, D) is a candidate, albeit not a mode conformant one, since D is never instantiated before appearing in the body literal (as required by the third mode declaration). [sent-206, score-1.273]
29 We would need to include the bottom clause literal B = [C|D] first, as it is the only literal that would instantiate D. [sent-207, score-1.085]
30 Mode Declarations as Propositional Clauses: A candidate is obtained from the bottom clause by taking a subsequence of the bottom clause’s body literals and copying the head. [sent-214, score-1.022]
31 Each body literal input must be instantiated from previous body literal outputs or from inputs to the head (inputs to the head are instantiated by the query). [sent-219, score-1.05]
32 The head outputs must be instantiated from head inputs or body literal outputs. [sent-221, score-0.629]
33 Definition 5 Let C be a clause with n body literals. [sent-222, score-0.522]
34 The clause D = h(X,Z) ← q1(Y,Z) is output-instantiated since Z in the head grabs the output from q1(Y,Z), but not input-instantiated since Y in q1(Y,Z) is not instantiated. [sent-231, score-0.55]
35 The clause E = h(X,Z) ← q1(X,Y), q2(Y,Z) is both input- and output-instantiated, and hence mode conformant. [sent-232, score-0.625]
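A rough sketch of this conformance test, assuming each body literal is summarized by its input and output variable sets (a simplification; none of these names come from NrSample's code):

```python
# Sketch: check input- and output-instantiation of a candidate clause.
def is_mode_conformant(head_in, head_out, body):
    available = set(head_in)                 # variables instantiated so far
    for inputs, outputs in body:
        if not set(inputs) <= available:     # some input not yet instantiated
            return False                     # not input-instantiated
        available |= set(outputs)
    return set(head_out) <= available        # output-instantiated?

# E = h(X,Z) <- q1(X,Y), q2(Y,Z) is conformant; D = h(X,Z) <- q1(Y,Z) is not
print(is_mode_conformant({"X"}, {"Z"}, [({"X"}, {"Y"}), ({"Y"}, {"Z"})]))  # True
print(is_mode_conformant({"X"}, {"Z"}, [({"Y"}, {"Z"})]))                  # False
```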
36 Lemma 6 Given mode declarations, a bottom clause is always input-instantiated but not necessarily output-instantiated. [sent-234, score-0.696]
37 Proof Input-instantiation is a direct consequence of the way in which the bottom clause is constructed: we only add body literals when all their inputs have been instantiated by the head or previous body literals. [sent-235, score-1.082]
38 To see that a bottom clause may not be output-instantiated, it is enough to consider a mode head declaration in which the output does not correspond to any body literal outputs: modeh(∗, h(+type1, −type2)), modeb(∗, b(+type1, −type1)). [sent-236, score-1.234]
39 With type definitions type1(a) ∧ type2(b), example h(a, b) and background knowledge b(a, a), we get the bottom clause B = h(X,Y) ← b(X,X). [sent-237, score-0.517]
40 Since the bottom clause head is always present in a candidate, no constraints apply when an input can be caught from the head. [sent-245, score-0.717]
41 Definition 7 Let B be a bottom clause with n body literals. [sent-247, score-0.593]
42 Let Ii and Oi be the input and output variables of body literal i (as given by the mode declarations), respectively. [sent-248, score-0.539]
43 The mode constraints F of a bottom clause B form a conjunction of clauses, constructed as follows. [sent-250, score-0.812]
44 Note that if an output variable of B’s head cannot be instantiated by any body literals, F will contain the empty clause (generated by the second rule) and therefore be inconsistent. [sent-259, score-0.687]
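A sketch in the spirit of Definition 7 (the exact rule set is the paper's; this simplified generator and its argument names are assumptions for illustration):

```python
# Sketch: build mode constraint clauses over signed indices (+i = bi, -i = not bi).
def mode_constraints(head_in, head_out, ins, outs):
    clauses, n = [], len(ins)
    for i in range(1, n + 1):
        for v in ins[i - 1]:
            if v in head_in:
                continue                       # instantiated by the query
            producers = [j for j in range(1, i) if v in outs[j - 1]]
            clauses.append([-i] + producers)   # bi -> some earlier producer of v
    for v in head_out:
        if v not in head_in:
            clauses.append([j for j in range(1, n + 1) if v in outs[j - 1]])
    return clauses

# Example 7: member(A,B) <- B=[C|D], member(A,D), member(C,B)
print(mode_constraints({"A", "B"}, set(),
                       ins=[{"B"}, {"A", "D"}, {"C", "B"}],
                       outs=[{"C", "D"}, set(), set()]))
# [[-2, 1], [-3, 1]]: literals 2 and 3 each require literal 1
```

Note that, consistently with the remark above, a head output with no producing body literal yields an empty clause from the second loop.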
45 Since our search lattice defines a subset order on the literals, where the top element is the bodyless clause and the bottom element is the bottom clause, the generality order is the subset order for body literals. [sent-269, score-0.706]
46 We write C ⊆B D, omitting B whenever the bottom clause is clear from context. [sent-272, score-0.517]
47 For any candidate X from B, let VX be all the propositional variables corresponding to the first-order body literals of B that occur in X. [sent-298, score-0.632]
48 Variable Splitting Constraints: Atom’s and Progol’s bottom clause construction assumes that equivalent ground terms are related when lifting instances. [sent-319, score-0.517]
49 The solution, also taken by the Aleph ILP system (Srinivasan, 2001), is then to let the bottom clause represent these equality assumptions (variable splitting may also add many more literals to the bottom clause, but this will not affect our constraints). [sent-324, score-0.856]
50 Example 11 Let B be the bottom clause h(X,Y) ← V = W, q2(X,V), q3(Y,W), q4(X,Y). [sent-339, score-0.517]
51 If C is a candidate from B containing the literal V = W, we require that both the variables V and W occur in C (other than as equalities generated from bottom clause construction). [sent-340, score-0.918]
52 V occurs in the second literal and W in the third, so our propositional constraints are {¬b1, b2} and {¬b1, b3}. [sent-341, score-0.596]
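A small sketch of how such splitting constraints could be generated (illustrative; the helper and its arguments are not from the paper):

```python
# Sketch: for an equality literal V = W at position i, require that each of V
# and W also occurs in some other chosen literal whenever literal i is chosen.
def splitting_constraints(i, v, w, literal_vars):
    clauses = []
    for var in (v, w):
        uses = [j for j, vs in enumerate(literal_vars, start=1)
                if j != i and var in vs]
        clauses.append([-i] + uses)
    return clauses

# Example 11: h(X,Y) <- V=W, q2(X,V), q3(Y,W), q4(X,Y)
print(splitting_constraints(1, "V", "W",
                            [{"V", "W"}, {"X", "V"}, {"Y", "W"}, {"X", "Y"}]))
# [[-1, 2], [-1, 3]]
```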
53 Definition 14 Let B be a bottom clause with n literals. [sent-355, score-0.517]
54 If the ith literal of B is declared a functional predicate, and none of its outputs occur in the head (as input or output), we define its corresponding functional constraint to be the propositional clause {¬bi} ∪ {bx : x ∈ {1, 2, ..., n}, x ≠ i, and an output variable of literal i occurs in literal x}. [sent-357, score-0.687]
55 Example 13 Elaborating more on Example 12, consider the bottom clause B = average(X,Y,A) ← S is X + Y, D is Y − X, A is D + X, A is S/2. [sent-365, score-0.517]
56 Intuitively, the constraint reflects the fact that if the second literal is used (D is Y − X), then D must appear elsewhere since otherwise the literal does nothing useful in functional terms. [sent-367, score-0.623]
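A corresponding sketch for the functional constraint (again an assumption-laden illustration of the idea, not Definition 14 verbatim):

```python
# Sketch: if functional literal i is used and its outputs do not reach the head,
# some other literal consuming one of its outputs must be used as well.
def functional_constraint(i, outputs_i, literal_vars):
    consumers = [j for j, vs in enumerate(literal_vars, start=1)
                 if j != i and outputs_i & vs]
    return [-i] + consumers

# Example 13: average(X,Y,A) <- S is X+Y, D is Y-X, A is D+X, A is S/2
lits = [{"S", "X", "Y"}, {"D", "X", "Y"}, {"A", "D", "X"}, {"A", "S"}]
print(functional_constraint(2, {"D"}, lits))   # [-2, 3]
```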
57 NrSample: Constraint-Based Induction. NrSample’s search algorithm is simple: after a bottom clause has been constructed, the mode constraints are generated. [sent-375, score-0.834]
58 The bottom clause construction algorithm is equivalent to Progol’s and is somewhat involved; we refer the reader to Muggleton (1995) for a detailed description. [sent-383, score-0.517]
59 In the following sections we focus on important details of SAT solving, search order, and maximum clause lengths (step 5), as well as efficient candidate evaluation (step 8). [sent-408, score-0.605]
60 By Definition 7, all mode declaration constraints have at most one negative literal (that is, they are dual Horn clauses). [sent-416, score-0.616]
61 They construct models by first selecting an unassigned literal and assigning it either to true or false (according to some selection strategy), then propagating the effects of this assignment to all propositional clauses in the database. [sent-434, score-0.612]
62 Propagation of an assignment bi = true works as follows: all clauses containing the literal bi are removed (since they are now covered), and all clauses containing its negation ¬bi will have that literal removed (since that literal is not covered). [sent-435, score-1.156]
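A minimal sketch of this propagation step over clause sets of signed indices (+i for bi, -i for ¬bi); this is a generic DPLL-style reduction, not NrSample's actual solver code:

```python
# Sketch: propagate the assignment "signed literal `lit` is true" through the clauses.
def propagate(clauses, lit):
    result = []
    for clause in clauses:
        if lit in clause:
            continue                 # clause satisfied, drop it
        reduced = clause - {-lit}    # remove the now-false complementary literal
        result.append(reduced)       # an empty set here signals a conflict
    return result

clauses = [{-2, 1}, {-3, 1}]         # mode constraints from Example 7
print(propagate(clauses, -1))        # exclude literal 1: [{-2}, {-3}], forcing b2 = b3 = false
```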
63 Freedom to pick any unassigned literal and assign it either true or false corresponds to choosing a literal from the bottom clause and deciding whether to include it or not (recall that a propositional literal bi corresponds to the ith first-order literal of the bottom clause). [sent-452, score-1.993]
64 This corresponds to choosing that the third and eighth literals of the bottom clause should be included in the candidate, whereas the first should not. [sent-455, score-0.801]
65 In bit string notation (0 signifies false and 1 true), with a bottom clause containing 3 literals, this sequence is: (000, 100, 010, 110, 001, 101, 011, 111). [sent-480, score-0.517]
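The sequence above is ordinary binary counting with b1 as the least significant bit; a tiny sketch that reproduces it (purely illustrative):

```python
# Sketch: enumerate candidate bit strings in the order shown above.
def candidate_order(n):
    for k in range(2 ** n):
        yield "".join(str((k >> i) & 1) for i in range(n))

print(list(candidate_order(3)))
# ['000', '100', '010', '110', '001', '101', '011', '111']
```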
66 That we can never have full control of search order traversal using our propositional constraints is clear, since the very reason for using them is to prevent certain candidates from being generated. [sent-494, score-0.537]
67 For example, Progol allows users to set a limit on the number of iterations in the bottom clause construction, as well as on the maximum number of candidates to be evaluated (per bottom clause). [sent-516, score-0.717]
68 These limits do not interfere with NrSample’s constraint framework, since bottom clause construction is separated from search. [sent-517, score-0.544]
69 For example, with a bottom clause containing n = 4 body literals, the following formulae are needed to restrict all its candidates to at most c = 2 of those: {¬b1, ¬b2, ¬b3} ∧ {¬b1, ¬b2, ¬b4} ∧ {¬b1, ¬b3, ¬b4} ∧ {¬b2, ¬b3, ¬b4}. [sent-520, score-0.756]
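The general pattern forbids every (c + 1)-subset of literals from being chosen together; a short sketch (illustrative naming):

```python
from itertools import combinations

# Sketch: "at most c of n body literals" as clauses of negated variables.
def at_most(n, c):
    return [[-i for i in subset] for subset in combinations(range(1, n + 1), c + 1)]

print(at_most(4, 2))
# [[-1, -2, -3], [-1, -2, -4], [-1, -3, -4], [-2, -3, -4]]
```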
70 Moreover, we generalize this technique so that our algorithm stores a mapping of which literals were generated from which mode declarations during bottom clause construction, and then keeps track of how many times a mode declaration has been used during SAT solving. [sent-528, score-1.325]
71 This enables us to specify upper bounds on the number of times a certain mode may be used in a candidate solution, a feature which is particularly useful to prevent large numbers of recursive literals in a predicate. [sent-529, score-0.547]
72 This is implemented by using an extended mode declaration modeb(Recall, Atom, Max_Occurs) in which the third argument Max_Occurs specifies the maximum number of times a literal generated from this mode can occur in any candidate solution. [sent-530, score-0.833]
73 Note that the bottom clause may still contain more occurrences, although not all of them may be used at the same time in a candidate. [sent-531, score-0.517]
74 For example, if the number of positive and negative examples covered is P and N respectively, and L is the number of literals in the body of the candidate, Progol’s A∗ computes the fitness as P − N − L − E, where E is the fewest literals needed to get an input-output connected clause. [sent-536, score-0.522]
75 Intuitively, this states that (consistent) candidates are first compared by coverage, and only when they cover equally many examples are they compared by number of literals (fewer literals is better). [sent-540, score-0.575]
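A sketch contrasting the two scoring schemes, with coverage taken here as P − N purely for illustration (the helper names are assumptions, not the paper's API):

```python
# Sketch: Progol A* collapses everything into one score; the lexicographic
# comparison ranks by coverage first and breaks ties by shorter bodies.
def progol_fitness(p, n, l, e):
    return p - n - l - e

def lexicographic_key(p, n, l):
    return (p - n, -l)              # higher coverage first, then fewer literals

print(progol_fitness(10, 1, 2, 0))                        # 7
cands = [(10, 1, 4), (10, 1, 2), (9, 0, 1)]               # (P, N, body length)
print(max(cands, key=lambda c: lexicographic_key(*c)))    # (9, 0, 1): tie broken by length
```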
76 This optimization is possible since we need not consider clause lengths when two clauses have different coverage. [sent-555, score-0.532]
77 Seen as bit strings (s1, ..., sn), where a 1 indicates that literal bi from the bottom clause is used, we start with the candidate corresponding to the bit string "1...10...0" (the leftmost literals set). [sent-570, score-0.971]
78 In other words, in each layer we start by generating the candidate containing all leftmost literals of the bottom clause (as many as the depth we are considering), and cycle through all permutations until we reach the candidate that has all right-most literals. [sent-577, score-0.974]
79 From a technical point of view, emulate_nrsample does not actually generate candidates as clauses, but rather as sets representing which literals from the bottom clause each candidate is made up of. [sent-623, score-0.986]
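A sketch of that layer-wise generation, with candidates as index sets; whether this matches emulate_nrsample's exact internal order is an assumption:

```python
from itertools import combinations

# Sketch: for each depth (layer), generate candidates as sets of bottom clause
# literal indices, from the leftmost combination to the rightmost.
def emulate_layers(n, max_depth):
    for depth in range(max_depth + 1):
        for subset in combinations(range(1, n + 1), depth):
            yield set(subset)

print([sorted(s) for s in emulate_layers(3, 2)])
# [[], [1], [2], [3], [1, 2], [1, 3], [2, 3]]
```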
80 During bottom clause construction, a maximum of i = 3 iterations are performed. [sent-692, score-0.517]
81 For each bottom clause constructed, a maximum of n = 1000 (valid) candidates are explored. [sent-693, score-0.646]
82 Aleph’s enumeration explicitly states that the invalid candidates are to be ignored, as the following quotation from the Aleph homepage shows: With these directives Aleph ensures that for any hypothesised clause of the form H:- B1, B2, ..., Bc: [sent-700, score-0.67]
83 Any input variable of type T in a body literal Bi appears as an output variable of type T in a body literal that appears before Bi, or appears as an input variable of type T in H. [sent-704, score-0.72]
84 NrSample also needs a limit to ensure tractability while solving its propositional constraints: we set a limit of 50000 literal assignments per model constructed (equivalently: for each constructed candidate). [sent-708, score-0.527]
85 When this limit is reached, the algorithm simply gives up the current bottom clause search, adds the example, and resumes sequential covering. [sent-709, score-0.517]
86 Conclusions: We have provided a novel framework for non-redundant candidate construction in inductive logic programming (ILP), using propositional constraints relative to a bottom clause. [sent-1035, score-0.559]
87 First, we show that the mode constraints only generate mode conformant candidates (soundness). [sent-1081, score-0.631]
88 All literals of PC are in B¬v , so PC is a conjunction of literals where each literal of Bv is negative. [sent-1112, score-0.75]
89 But by Definition 7, F contains the clause {bi : bi ∈ Bv}, so MC is not a model for F. [sent-1114, score-0.552]
90 Next, we show that the mode constraints can generate any mode conformant candidate (completeness). [sent-1115, score-0.619]
91 If bn does not appear in C, MC(bn) = false and the clause is satisfied. [sent-1137, score-0.543]
92 If bn appears in C, we note that since C is mode conformant, v appears in a previous body literal or the head. [sent-1138, score-0.572]
93 If the clause is empty, B has no body literal that outputs v, so no candidate from B is mode conformant either, contradicting our assumption that C is mode conformant. [sent-1143, score-1.329]
94 They are used during bottom clause construction, and are thus not specific to our benchmarked algorithms. [sent-1256, score-0.517]
95 Both operators are intended to prevent certain ground truths from occurring in the bottom clause (or their corresponding lifted instances). [sent-1257, score-0.545]
96 Example 17 The clause ‘prevent p(X, X)’ ensures that no bottom clause literal of the form p(X, X) occurs, where matching is done so that X unifies with ground terms. [sent-1261, score-0.963]
97 As a matter of technicality, when using lifted bottom clauses, that is, with mode declarations, variables in the bottom clause must be treated as ground terms. [sent-1264, score-0.767]
98 For example, the prevent rule X = X should not match with bottom clause literal Y = 0, as what we have in mind is to prevent trivial unifications of identical terms. [sent-1265, score-0.857]
99 By matching against unlifted bottom clause literals (that is, ground truths), this problem does not arise. [sent-1266, score-0.74]
100 Example 19 The clause ‘p(X,Y) prevents q(Y,X)’ ensures that the literal q(Y,X) does not occur if p(X,Y) occurs as a previous literal in the bottom clause. [sent-1270, score-1.085]
wordName wordTfidf (topN-words)
[('clause', 0.446), ('modeb', 0.35), ('nrsample', 0.302), ('literal', 0.284), ('literals', 0.223), ('propositional', 0.216), ('mode', 0.179), ('ilp', 0.175), ('progol', 0.158), ('clist', 0.153), ('declarations', 0.153), ('sat', 0.14), ('candidates', 0.129), ('candidate', 0.117), ('head', 0.104), ('constraints', 0.096), ('modeh', 0.088), ('sublist', 0.088), ('clauses', 0.086), ('nrsfun', 0.083), ('append', 0.082), ('predicates', 0.079), ('body', 0.076), ('declaration', 0.074), ('hlgren', 0.074), ('uen', 0.074), ('enumeration', 0.071), ('bottom', 0.071), ('atisfaction', 0.07), ('rogram', 0.07), ('alse', 0.064), ('member', 0.061), ('wlist', 0.061), ('instantiated', 0.061), ('ynthesis', 0.06), ('synthesis', 0.057), ('aleph', 0.057), ('snum', 0.057), ('onstraint', 0.054), ('bi', 0.053), ('const', 0.049), ('conformant', 0.048), ('list', 0.047), ('pc', 0.045), ('backtracking', 0.044), ('muggleton', 0.044), ('sing', 0.042), ('search', 0.042), ('reverse', 0.042), ('logic', 0.04), ('pruning', 0.039), ('predicate', 0.037), ('chaff', 0.035), ('nrs', 0.035), ('animal', 0.035), ('formulae', 0.034), ('bn', 0.033), ('inconsistent', 0.033), ('solver', 0.033), ('car', 0.031), ('nimals', 0.031), ('rammar', 0.031), ('mc', 0.028), ('primitive', 0.028), ('functional', 0.028), ('prevent', 0.028), ('lists', 0.028), ('assignments', 0.027), ('constraint', 0.027), ('traversal', 0.026), ('mg', 0.026), ('program', 0.026), ('assignment', 0.026), ('add', 0.025), ('ih', 0.025), ('coverage', 0.025), ('execution', 0.024), ('satisfaction', 0.024), ('invalid', 0.024), ('rain', 0.022), ('specializations', 0.022), ('ember', 0.022), ('logically', 0.022), ('moskewicz', 0.022), ('ppend', 0.022), ('raedt', 0.022), ('sorted', 0.022), ('benchmarks', 0.02), ('substantially', 0.02), ('dd', 0.02), ('splitting', 0.02), ('conjunction', 0.02), ('inductive', 0.019), ('ms', 0.019), ('generates', 0.019), ('retrieve', 0.019), ('formula', 0.018), ('subsequence', 0.018), ('solvers', 0.018), ('faster', 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000007 40 jmlr-2013-Efficient Program Synthesis Using Constraint Satisfaction in Inductive Logic Programming
Author: John Ahlgren, Shiu Yin Yuen
Abstract: We present NrSample, a framework for program synthesis in inductive logic programming. NrSample uses propositional logic constraints to exclude undesirable candidates from the search. This is achieved by representing constraints as propositional formulae and solving the associated constraint satisfaction problem. We present a variety of such constraints: pruning, input-output, functional (arithmetic), and variable splitting. NrSample is also capable of detecting search space exhaustion, leading to further speedups in clause induction and optimality. We benchmark NrSample against enumeration search (Aleph’s default) and Progol’s A∗ search in the context of program synthesis. The results show that, on large program synthesis problems, NrSample induces between 1 and 1358 times faster than enumeration (236 times faster on average), always with similar or better accuracy. Compared to Progol A∗ , NrSample is 18 times faster on average with similar or better accuracy except for two problems: one in which Progol A∗ substantially sacrificed accuracy to induce faster, and one in which Progol A∗ was a clear winner. Functional constraints provide a speedup of up to 53 times (21 times on average) with similar or better accuracy. We also benchmark using a few concept learning (non-program synthesis) problems. The results indicate that without strong constraints, the overhead of solving constraints is not compensated for. Keywords: inductive logic programming, program synthesis, theory induction, constraint satisfaction, Boolean satisfiability problem
2 0.063708872 78 jmlr-2013-On the Learnability of Shuffle Ideals
Author: Dana Angluin, James Aspnes, Sarah Eisenstat, Aryeh Kontorovich
Abstract: PAC learning of unrestricted regular languages is long known to be a difficult problem. The class of shuffle ideals is a very restricted subclass of regular languages, where the shuffle ideal generated by a string u is the collection of all strings containing u as a subsequence. This fundamental language family is of theoretical interest in its own right and provides the building blocks for other important language families. Despite its apparent simplicity, the class of shuffle ideals appears quite difficult to learn. In particular, just as for unrestricted regular languages, the class is not properly PAC learnable in polynomial time if RP = NP, and PAC learning the class improperly in polynomial time would imply polynomial time algorithms for certain fundamental problems in cryptography. In the positive direction, we give an efficient algorithm for properly learning shuffle ideals in the statistical query (and therefore also PAC) model under the uniform distribution. Keywords: PAC learning, statistical queries, regular languages, deterministic finite automata, shuffle ideals, subsequences
3 0.031099176 91 jmlr-2013-Query Induction with Schema-Guided Pruning Strategies
Author: Joachim Niehren, Jérôme Champavère, Aurélien Lemay, Rémi Gilleron
Abstract: Inference algorithms for tree automata that define node selecting queries in unranked trees rely on tree pruning strategies. These impose additional assumptions on node selection that are needed to compensate for small numbers of annotated examples. Pruning-based heuristics in query learning algorithms for Web information extraction often boost the learning quality and speed up the learning process. We will distinguish the class of regular queries that are stable under a given schemaguided pruning strategy, and show that this class is learnable with polynomial time and data. Our learning algorithm is obtained by adding pruning heuristics to the traditional learning algorithm for tree automata from positive and negative examples. While justified by a formal learning model, our learning algorithm for stable queries also performs very well in practice of XML information extraction. Keywords: XML information extraction, XML schemas, interactive learning, tree automata, grammatical inference
4 0.026295977 90 jmlr-2013-Quasi-Newton Method: A New Direction
Author: Philipp Hennig, Martin Kiefel
Abstract: Four decades after their invention, quasi-Newton methods are still state of the art in unconstrained numerical optimization. Although not usually interpreted thus, these are learning algorithms that fit a local quadratic approximation to the objective function. We show that many, including the most popular, quasi-Newton methods can be interpreted as approximations of Bayesian linear regression under varying prior assumptions. This new notion elucidates some shortcomings of classical algorithms, and lights the way to a novel nonparametric quasi-Newton method, which is able to make more efficient use of available information at computational cost similar to its predecessors. Keywords: optimization, numerical analysis, probability, Gaussian processes
5 0.022575961 68 jmlr-2013-Machine Learning with Operational Costs
Author: Theja Tulabandhula, Cynthia Rudin
Abstract: This work proposes a way to align statistical modeling with decision making. We provide a method that propagates the uncertainty in predictive modeling to the uncertainty in operational cost, where operational cost is the amount spent by the practitioner in solving the problem. The method allows us to explore the range of operational costs associated with the set of reasonable statistical models, so as to provide a useful way for practitioners to understand uncertainty. To do this, the operational cost is cast as a regularization term in a learning algorithm’s objective function, allowing either an optimistic or pessimistic view of possible costs, depending on the regularization parameter. From another perspective, if we have prior knowledge about the operational cost, for instance that it should be low, this knowledge can help to restrict the hypothesis space, and can help with generalization. We provide a theoretical generalization bound for this scenario. We also show that learning with operational costs is related to robust optimization. Keywords: statistical learning theory, optimization, covering numbers, decision theory
6 0.0225577 84 jmlr-2013-PC Algorithm for Nonparanormal Graphical Models
7 0.022531491 110 jmlr-2013-Sub-Local Constraint-Based Learning of Bayesian Networks Using A Joint Dependence Criterion
8 0.021061117 63 jmlr-2013-Learning Trees from Strings: A Strong Learning Algorithm for some Context-Free Grammars
9 0.020265767 22 jmlr-2013-Classifying With Confidence From Incomplete Information
10 0.019004973 120 jmlr-2013-Variational Algorithms for Marginal MAP
11 0.017040674 99 jmlr-2013-Semi-Supervised Learning Using Greedy Max-Cut
12 0.015750503 44 jmlr-2013-Finding Optimal Bayesian Networks Using Precedence Constraints
13 0.015181415 115 jmlr-2013-Training Energy-Based Models for Time-Series Imputation
14 0.014462685 21 jmlr-2013-Classifier Selection using the Predicate Depth
15 0.014140562 74 jmlr-2013-Multivariate Convex Regression with Adaptive Partitioning
16 0.014094248 71 jmlr-2013-Message-Passing Algorithms for Quadratic Minimization
17 0.014058828 56 jmlr-2013-Keep It Simple And Sparse: Real-Time Action Recognition
18 0.013185011 8 jmlr-2013-A Theory of Multiclass Boosting
19 0.013095199 16 jmlr-2013-Bayesian Nonparametric Hidden Semi-Markov Models
20 0.01276102 39 jmlr-2013-Efficient Active Learning of Halfspaces: An Aggressive Approach
topicId topicWeight
[(0, -0.072), (1, 0.008), (2, -0.022), (3, 0.015), (4, 0.016), (5, 0.038), (6, 0.027), (7, 0.024), (8, -0.095), (9, 0.056), (10, 0.009), (11, -0.11), (12, -0.015), (13, 0.052), (14, -0.108), (15, -0.025), (16, -0.019), (17, 0.028), (18, -0.05), (19, 0.071), (20, -0.032), (21, -0.076), (22, -0.059), (23, 0.05), (24, 0.054), (25, -0.122), (26, -0.111), (27, -0.054), (28, -0.076), (29, -0.031), (30, -0.002), (31, 0.069), (32, -0.1), (33, -0.061), (34, 0.023), (35, 0.025), (36, 0.119), (37, -0.069), (38, -0.162), (39, 0.073), (40, 0.111), (41, -0.212), (42, -0.4), (43, -0.074), (44, -0.159), (45, -0.156), (46, 0.114), (47, -0.055), (48, -0.07), (49, -0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.97226965 40 jmlr-2013-Efficient Program Synthesis Using Constraint Satisfaction in Inductive Logic Programming
Author: John Ahlgren, Shiu Yin Yuen
Abstract: We present NrSample, a framework for program synthesis in inductive logic programming. NrSample uses propositional logic constraints to exclude undesirable candidates from the search. This is achieved by representing constraints as propositional formulae and solving the associated constraint satisfaction problem. We present a variety of such constraints: pruning, input-output, functional (arithmetic), and variable splitting. NrSample is also capable of detecting search space exhaustion, leading to further speedups in clause induction and optimality. We benchmark NrSample against enumeration search (Aleph’s default) and Progol’s A∗ search in the context of program synthesis. The results show that, on large program synthesis problems, NrSample induces between 1 and 1358 times faster than enumeration (236 times faster on average), always with similar or better accuracy. Compared to Progol A∗ , NrSample is 18 times faster on average with similar or better accuracy except for two problems: one in which Progol A∗ substantially sacrificed accuracy to induce faster, and one in which Progol A∗ was a clear winner. Functional constraints provide a speedup of up to 53 times (21 times on average) with similar or better accuracy. We also benchmark using a few concept learning (non-program synthesis) problems. The results indicate that without strong constraints, the overhead of solving constraints is not compensated for. Keywords: inductive logic programming, program synthesis, theory induction, constraint satisfaction, Boolean satisfiability problem
2 0.32200766 91 jmlr-2013-Query Induction with Schema-Guided Pruning Strategies
Author: Joachim Niehren, Jérôme Champavère, Aurélien Lemay, Rémi Gilleron
Abstract: Inference algorithms for tree automata that define node selecting queries in unranked trees rely on tree pruning strategies. These impose additional assumptions on node selection that are needed to compensate for small numbers of annotated examples. Pruning-based heuristics in query learning algorithms for Web information extraction often boost the learning quality and speed up the learning process. We will distinguish the class of regular queries that are stable under a given schemaguided pruning strategy, and show that this class is learnable with polynomial time and data. Our learning algorithm is obtained by adding pruning heuristics to the traditional learning algorithm for tree automata from positive and negative examples. While justified by a formal learning model, our learning algorithm for stable queries also performs very well in practice of XML information extraction. Keywords: XML information extraction, XML schemas, interactive learning, tree automata, grammatical inference
3 0.32164872 110 jmlr-2013-Sub-Local Constraint-Based Learning of Bayesian Networks Using A Joint Dependence Criterion
Author: Rami Mahdi, Jason Mezey
Abstract: Constraint-based learning of Bayesian networks (BN) from limited data can lead to multiple testing problems when recovering dense areas of the skeleton and to conflicting results in the orientation of edges. In this paper, we present a new constraint-based algorithm, light mutual min (LMM) for improved accuracy of BN learning from small sample data. LMM improves the assessment of candidate edges by using a ranking criterion that considers conditional independence on neighboring variables at both sides of an edge simultaneously. The algorithm also employs an adaptive relaxation of constraints that, selectively, allows some nodes not to condition on some neighbors. This relaxation aims at reducing the incorrect rejection of true edges connecting high degree nodes due to multiple testing. LMM additionally incorporates a new criterion for ranking v-structures that is used to recover the completed partially directed acyclic graph (CPDAG) and to resolve conflicting v-structures, a common problem in small sample constraint-based learning. Using simulated data, each of these components of LMM is shown to significantly improve network inference compared to commonly applied methods when learning from limited data, including more accurate recovery of skeletons and CPDAGs compared to the PC, MaxMin, and MaxMin hill climbing algorithms. A proof of asymptotic correctness is also provided for LMM for recovering the correct skeleton and CPDAG. Keywords: Bayesian networks, skeleton, constraint-based learning, mutual min
4 0.30033439 63 jmlr-2013-Learning Trees from Strings: A Strong Learning Algorithm for some Context-Free Grammars
Author: Alexander Clark
Abstract: Standard models of language learning are concerned with weak learning: the learner, receiving as input only information about the strings in the language, must learn to generalise and to generate the correct, potentially infinite, set of strings generated by some target grammar. Here we define the corresponding notion of strong learning: the learner, again only receiving strings as input, must learn a grammar that generates the correct set of structures or parse trees. We formalise this using a modification of Gold’s identification in the limit model, requiring convergence to a grammar that is isomorphic to the target grammar. We take as our starting point a simple learning algorithm for substitutable context-free languages, based on principles of distributional learning, and modify it so that it will converge to a canonical grammar for each language. We prove a corresponding strong learning result for a subclass of context-free grammars. Keywords: context-free grammars, grammatical inference, identification in the limit, structure learning
5 0.29154873 118 jmlr-2013-Using Symmetry and Evolutionary Search to Minimize Sorting Networks
Author: Vinod K. Valsalam, Risto Miikkulainen
Abstract: Sorting networks are an interesting class of parallel sorting algorithms with applications in multiprocessor computers and switching networks. They are built by cascading a series of comparisonexchange units called comparators. Minimizing the number of comparators for a given number of inputs is a challenging optimization problem. This paper presents a two-pronged approach called Symmetry and Evolution based Network Sort Optimization (SENSO) that makes it possible to scale the solutions to networks with a larger number of inputs than previously possible. First, it uses the symmetry of the problem to decompose the minimization goal into subgoals that are easier to solve. Second, it minimizes the resulting greedy solutions further by using an evolutionary algorithm to learn the statistical distribution of comparators in minimal networks. The final solutions improve upon half-century of results published in patents, books, and peer-reviewed literature, demonstrating the potential of the SENSO approach for solving difficult combinatorial problems. Keywords: symmetry, evolution, estimation of distribution algorithms, sorting networks, combinatorial optimization
6 0.28064778 78 jmlr-2013-On the Learnability of Shuffle Ideals
7 0.26362166 90 jmlr-2013-Quasi-Newton Method: A New Direction
8 0.21086973 21 jmlr-2013-Classifier Selection using the Predicate Depth
9 0.20137592 74 jmlr-2013-Multivariate Convex Regression with Adaptive Partitioning
10 0.19203626 22 jmlr-2013-Classifying With Confidence From Incomplete Information
11 0.18655971 1 jmlr-2013-AC++Template-Based Reinforcement Learning Library: Fitting the Code to the Mathematics
13 0.15723875 46 jmlr-2013-GURLS: A Least Squares Library for Supervised Learning
14 0.15290149 116 jmlr-2013-Truncated Power Method for Sparse Eigenvalue Problems
15 0.15196773 38 jmlr-2013-Dynamic Affine-Invariant Shape-Appearance Handshape Features and Classification in Sign Language Videos
16 0.14823832 68 jmlr-2013-Machine Learning with Operational Costs
17 0.14078641 120 jmlr-2013-Variational Algorithms for Marginal MAP
18 0.12417327 51 jmlr-2013-Greedy Sparsity-Constrained Optimization
19 0.11637575 16 jmlr-2013-Bayesian Nonparametric Hidden Semi-Markov Models
20 0.11614971 7 jmlr-2013-A Risk Comparison of Ordinary Least Squares vs Ridge Regression
topicId topicWeight
[(0, 0.028), (5, 0.077), (6, 0.023), (9, 0.566), (10, 0.034), (20, 0.026), (23, 0.025), (41, 0.013), (68, 0.017), (70, 0.017), (75, 0.031), (85, 0.01), (87, 0.012), (89, 0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.81397438 40 jmlr-2013-Efficient Program Synthesis Using Constraint Satisfaction in Inductive Logic Programming
Author: John Ahlgren, Shiu Yin Yuen
Abstract: We present NrSample, a framework for program synthesis in inductive logic programming. NrSample uses propositional logic constraints to exclude undesirable candidates from the search. This is achieved by representing constraints as propositional formulae and solving the associated constraint satisfaction problem. We present a variety of such constraints: pruning, input-output, functional (arithmetic), and variable splitting. NrSample is also capable of detecting search space exhaustion, leading to further speedups in clause induction and optimality. We benchmark NrSample against enumeration search (Aleph’s default) and Progol’s A∗ search in the context of program synthesis. The results show that, on large program synthesis problems, NrSample induces between 1 and 1358 times faster than enumeration (236 times faster on average), always with similar or better accuracy. Compared to Progol A∗ , NrSample is 18 times faster on average with similar or better accuracy except for two problems: one in which Progol A∗ substantially sacrificed accuracy to induce faster, and one in which Progol A∗ was a clear winner. Functional constraints provide a speedup of up to 53 times (21 times on average) with similar or better accuracy. We also benchmark using a few concept learning (non-program synthesis) problems. The results indicate that without strong constraints, the overhead of solving constraints is not compensated for. Keywords: inductive logic programming, program synthesis, theory induction, constraint satisfaction, Boolean satisfiability problem
2 0.81341636 78 jmlr-2013-On the Learnability of Shuffle Ideals
Author: Dana Angluin, James Aspnes, Sarah Eisenstat, Aryeh Kontorovich
Abstract: PAC learning of unrestricted regular languages is long known to be a difficult problem. The class of shuffle ideals is a very restricted subclass of regular languages, where the shuffle ideal generated by a string u is the collection of all strings containing u as a subsequence. This fundamental language family is of theoretical interest in its own right and provides the building blocks for other important language families. Despite its apparent simplicity, the class of shuffle ideals appears quite difficult to learn. In particular, just as for unrestricted regular languages, the class is not properly PAC learnable in polynomial time if RP = NP, and PAC learning the class improperly in polynomial time would imply polynomial time algorithms for certain fundamental problems in cryptography. In the positive direction, we give an efficient algorithm for properly learning shuffle ideals in the statistical query (and therefore also PAC) model under the uniform distribution. Keywords: PAC learning, statistical queries, regular languages, deterministic finite automata, shuffle ideals, subsequences
3 0.22900373 63 jmlr-2013-Learning Trees from Strings: A Strong Learning Algorithm for some Context-Free Grammars
Author: Alexander Clark
Abstract: Standard models of language learning are concerned with weak learning: the learner, receiving as input only information about the strings in the language, must learn to generalise and to generate the correct, potentially infinite, set of strings generated by some target grammar. Here we define the corresponding notion of strong learning: the learner, again only receiving strings as input, must learn a grammar that generates the correct set of structures or parse trees. We formalise this using a modification of Gold’s identification in the limit model, requiring convergence to a grammar that is isomorphic to the target grammar. We take as our starting point a simple learning algorithm for substitutable context-free languages, based on principles of distributional learning, and modify it so that it will converge to a canonical grammar for each language. We prove a corresponding strong learning result for a subclass of context-free grammars. Keywords: context-free grammars, grammatical inference, identification in the limit, structure learning
4 0.20885584 5 jmlr-2013-A Near-Optimal Algorithm for Differentially-Private Principal Components
Author: Kamalika Chaudhuri, Anand D. Sarwate, Kaushik Sinha
Abstract: The principal components analysis (PCA) algorithm is a standard tool for identifying good lowdimensional approximations to high-dimensional data. Many data sets of interest contain private or sensitive information about individuals. Algorithms which operate on such data should be sensitive to the privacy risks in publishing their outputs. Differential privacy is a framework for developing tradeoffs between privacy and the utility of these outputs. In this paper we investigate the theory and empirical performance of differentially private approximations to PCA and propose a new method which explicitly optimizes the utility of the output. We show that the sample complexity of the proposed method differs from the existing procedure in the scaling with the data dimension, and that our method is nearly optimal in terms of this scaling. We furthermore illustrate our results, showing that on real data there is a large performance gap between the existing method and our method. Keywords: differential privacy, principal components analysis, dimension reduction
5 0.19913828 10 jmlr-2013-Algorithms and Hardness Results for Parallel Large Margin Learning
Author: Philip M. Long, Rocco A. Servedio
Abstract: We consider the problem of learning an unknown large-margin halfspace in the context of parallel computation, giving both positive and negative results. As our main positive result, we give a parallel algorithm for learning a large-margin halfspace, based on an algorithm of Nesterov’s that performs gradient descent with a momentum term. We show that this algorithm can learn an unknown γ-margin halfspace over n dimensions using ˜ n · poly(1/γ) processors and running in time O(1/γ) + O(log n). In contrast, naive parallel algorithms that learn a γ-margin halfspace in time that depends polylogarithmically on n have an inverse quadratic running time dependence on the margin parameter γ. Our negative result deals with boosting, which is a standard approach to learning large-margin halfspaces. We prove that in the original PAC framework, in which a weak learning algorithm is provided as an oracle that is called by the booster, boosting cannot be parallelized. More precisely, we show that, if the algorithm is allowed to call the weak learner multiple times in parallel within a single boosting stage, this ability does not reduce the overall number of successive stages of boosting needed for learning by even a single stage. Our proof is information-theoretic and does not rely on unproven assumptions. Keywords: PAC learning, parallel learning algorithms, halfspace learning, linear classifiers
6 0.19735359 37 jmlr-2013-Divvy: Fast and Intuitive Exploratory Data Analysis
7 0.1969472 118 jmlr-2013-Using Symmetry and Evolutionary Search to Minimize Sorting Networks
9 0.17656763 51 jmlr-2013-Greedy Sparsity-Constrained Optimization
10 0.17394154 48 jmlr-2013-Generalized Spike-and-Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation
11 0.17373048 25 jmlr-2013-Communication-Efficient Algorithms for Statistical Optimization
13 0.16999362 17 jmlr-2013-Belief Propagation for Continuous State Spaces: Stochastic Message-Passing with Quantitative Guarantees
15 0.16967435 28 jmlr-2013-Construction of Approximation Spaces for Reinforcement Learning
16 0.16961303 4 jmlr-2013-A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion
17 0.16845538 26 jmlr-2013-Conjugate Relation between Loss Functions and Uncertainty Sets in Classification Problems
18 0.16822059 47 jmlr-2013-Gaussian Kullback-Leibler Approximate Inference
19 0.16819017 75 jmlr-2013-Nested Expectation Propagation for Gaussian Process Classification with a Multinomial Probit Likelihood
20 0.16740231 99 jmlr-2013-Semi-Supervised Learning Using Greedy Max-Cut