nips nips2012 nips2012-267 knowledge-graph by maker-knowledge-mining

267 nips-2012-Perceptron Learning of SAT


Source: pdf

Author: Alex Flint, Matthew Blaschko

Abstract: Boolean satisfiability (SAT) as a canonical NP-complete decision problem is one of the most important problems in computer science. In practice, real-world SAT sentences are drawn from a distribution that may result in efficient algorithms for their solution. Such SAT instances are likely to have shared characteristics and substructures. This work approaches the exploration of a family of SAT solvers as a learning problem. In particular, we relate polynomial time solvability of a SAT subset to a notion of margin between sentences mapped by a feature function into a Hilbert space. Provided this mapping is based on polynomial time computable statistics of a sentence, we show that the existence of a margin between these data points implies the existence of a polynomial time solver for that SAT subset based on the Davis-Putnam-Logemann-Loveland algorithm. Furthermore, we show that a simple perceptron-style learning rule will find an optimal SAT solver with a bounded number of training updates. We derive a linear time computable set of features and show analytically that margins exist for important polynomial special cases of SAT. Empirical results show an order of magnitude improvement over a state-of-the-art SAT solver on a hardware verification task. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 In practice, real-world SAT sentences are drawn from a distribution that may result in efficient algorithms for their solution. [sent-8, score-0.166]

2 In particular, we relate polynomial time solvability of a SAT subset to a notion of margin between sentences mapped by a feature function into a Hilbert space. [sent-11, score-0.371]

3 Provided this mapping is based on polynomial time computable statistics of a sentence, we show that the existence of a margin between these data points implies the existence of a polynomial time solver for that SAT subset based on the Davis-Putnam-Logemann-Loveland algorithm. [sent-12, score-0.417]

4 We derive a linear time computable set of features and show analytically that margins exist for important polynomial special cases of SAT. [sent-14, score-0.172]

5 In this work, we explore the application of a perceptron-inspired learning algorithm to branching heuristics in the Davis-Putnam-Logemann-Loveland algorithm [8, 7]. [sent-21, score-0.252]

6 The Davis-Putnam-Logemann-Loveland (DPLL) algorithm formulates SAT as a search problem, resulting in a valuation of variables that satisfies the sentence, or a tree resolution refutation proof indicating that the sentence is not satisfiable. [sent-22, score-0.404]
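
A minimal sketch of the DPLL recursion described here may help fix ideas; the clause representation (signed integers), helper names, and unit-propagation details below are illustrative assumptions, not the authors' implementation.

```python
def simplify(clauses, lit):
    """Assign literal `lit` true: drop clauses it satisfies, delete its negation elsewhere."""
    out = []
    for c in clauses:
        if lit in c:
            continue                              # clause satisfied, drop it
        out.append([q for q in c if q != -lit])   # remove the falsified literal
    return out

def unit_propagate(clauses, assignment):
    """Repeatedly instantiate literals that appear in unit clauses (forced choices)."""
    while True:
        units = [c[0] for c in clauses if len(c) == 1]
        if not units or any(len(c) == 0 for c in clauses):
            return clauses, assignment
        assignment = assignment | {units[0]}
        clauses = simplify(clauses, units[0])

def dpll(clauses, assignment, pick_branch):
    """Depth-first search over valuations; returns a satisfying set of literals, or None."""
    clauses, assignment = unit_propagate(clauses, assignment)
    if any(len(c) == 0 for c in clauses):          # empty clause derived: conflict, backtrack
        return None
    if not clauses:                                # every clause satisfied
        return assignment
    p = pick_branch(clauses, assignment)           # heuristic choice of branching literal
    for lit in (p, -p):                            # try p, then its negation
        result = dpll(simplify(clauses, lit), assignment | {lit}, pick_branch)
        if result is not None:
            return result
    return None

# Example: (q1 ∨ q2) ∧ (¬q1 ∨ q2), literals encoded as signed integers.
print(dpll([[1, 2], [-1, 2]], frozenset(), lambda cs, a: cs[0][0]))   # e.g. frozenset({1, 2})
```

Later sketches in this section reuse `simplify` and `unit_propagate` from this block.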

7 The branching rule in this depth-first search procedure is a key determinant of the efficiency of the algorithm, and numerous heuristics have been proposed in the SAT literature [15, 16, 26, 18, 13]. [sent-23, score-0.278]

8 Inspired by the recent framing of learning as search optimization [6], we explore here the application of a perceptron-inspired learning rule to application-specific samples of the SAT problem. [sent-24, score-0.172]

9 Ruml applied reinforcement learning to find valuations of satisfiable sentences [25]. [sent-27, score-0.218]

10 A similar approach to learning heuristics for search was explored in [12]. [sent-34, score-0.154]

11 2 Theorem Proving as a Search Problem The SAT problem [5] is to determine whether a sentence Ω in propositional logic is satisfiable. [sent-35, score-0.304]

12 A literal p is a proposition of the form q (a “positive literal”) or ¬q (a “negative literal”). [sent-38, score-0.337]

13 A clause ωk is a disjunction of nk literals, p1 ∨ p2 ∨ · · · ∨ pnk . [sent-39, score-0.215]

14 A sentence Ω in conjunctive normal form (CNF) [15] is a conjunction of m clauses, ω1 ∧ω2 ∧· · ·∧ωm . [sent-41, score-0.175]

15 A sentence Ω is satisfiable iff there exists a valuation under which Ω is true. [sent-44, score-0.3]

16 All sentences in propositional logic can be transformed to CNF [15]. [sent-46, score-0.295]

17 [7] proposed a simple procedure for recognising satisfiable CNF sentences on N variables. [sent-49, score-0.192]

18 Their algorithm is essentially a depth-first search over all possible 2^N valuations over the input sentence, with specialized criteria to prune the search and transformation rules to simplify the sentence. [sent-50, score-0.2]

19 PickBranch applies a heuristic to choose a literal in Ω. [sent-53, score-0.37]

20 Much recent work has focussed on choosing heuristics for the selection of branching literals since good heuristics have been empirically shown to reduce processing time by several orders of magnitude [28, 16, 13]. [sent-55, score-0.376]

21 In this paper we learn heuristics by optimizing over a family of the form argmax_p f(x, p), where x is a node in the search tree, p is a candidate literal, and f is a priority function mapping possible branches to real numbers. [sent-56, score-0.272]
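
Since f is identified below with an element of a Hilbert space, this argmax family admits a direct linear-scoring reading; a hedged sketch (φ and f are placeholders for whatever polynomial-time statistics and learned weights are used, not the paper's exact feature map):

```python
import numpy as np

def pick_branch_learned(clauses, assignment, f, phi):
    """Branch on the literal maximizing the learned priority <f, phi(x, p)>."""
    candidates = {q for c in clauses for q in c}   # literals still occurring in the sentence
    state = (clauses, assignment)                  # x: at least the simplified CNF sentence
    return max(candidates, key=lambda p: float(np.dot(f, phi(state, p))))
```

A closure such as `lambda cs, a: pick_branch_learned(cs, a, f, phi)` can be passed to the DPLL sketch above.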

22 The state x will contain at least a CNF sentence and possibly pointers to ancestor nodes or statistics of the local search region. [sent-57, score-0.308]

23 Given this relaxed notion of the search state, we are unaware of any branching heuristics in the literature that cannot be expressed in this form. [sent-58, score-0.275]

24 3 Perceptron Learning of SAT We propose to learn f from a sequence of sentences drawn from some distribution determined by a given application. [sent-60, score-0.166]

25 We identify f with an element of a Hilbert space, H, the properties of which are determined by a set of statistics polynomial time computable from a SAT instance, Ω. [sent-61, score-0.172]

26 We use xj to denote a node that is visited in the application of the DPLL algorithm, and φi (xj ) to denote the feature map associated with instantiating literal pi . [sent-63, score-0.478]

27 We define yij to be +1 if the instantiation of pi at xj leads to the shortest possible proof, and −1 otherwise. [sent-66, score-0.145]

28 Our learning procedure therefore will ideally learn a setting of f that only instantiates literals for which yij is +1. [sent-67, score-0.209]

29 We do note, however, that the DPLL algorithm is a depth–first search over literal valuations. [sent-73, score-0.411]

30 Furthermore, for satisfiable sentences the length of the shortest proof is bounded by the number of variables. [sent-74, score-0.193]

31 Consequently, in this case, all nodes visited on a branch of the search tree that resolved to unsatisfiable have yij = −1 and the nodes on the branch leading to satisfiable have yij = +1. [sent-75, score-0.606]

32 We may run the DPLL algorithm with a current setting of f and if the sentence is satisfiable, update f using the inferred yij . [sent-76, score-0.289]

33 This learning framework is capable of computing in polynomial time valuations of satisfiable sentences in the following sense. [sent-77, score-0.34]

34 Theorem 1 ∃ a polynomial time computable φ with γ > 0 ⇐⇒ Ω belongs to a subset of satisfiable sentences for which there exists a polynomial time algorithm to find a valid valuation. [sent-78, score-0.46]

35 Proof Necessity is shown by noting that the argmax in each step of the DPLL algorithm is computable in time polynomial in the sentence length by computing φ for all literals, and that there exists a setting of f such that there will be at most a number of steps equal to the number of variables. [sent-79, score-0.347]

36 Sufficiency is shown by noting that we may run the polynomial algorithm to find a valid valuation and use that valuation to construct a feature space with γ ≥ 0 in polynomial time. [sent-80, score-0.486]

37 Concretely, choose a canonical ordering of literals indexed by i and let φi (xj ) be a scalar. [sent-81, score-0.142]

38 Set φi (xj ) = +i if literal pi is instantiated in the solution found by the polynomial algorithm, −1 otherwise. [sent-82, score-0.459]

39 Corollary 1 ∃ polynomial time computable feature space with γ > 0 for SAT ⇐⇒ P = NP. Proof: If P = NP there is a polynomial time solution to SAT, meaning that there is a polynomial time solution to finding valuations of satisfiable sentences. [sent-84, score-0.502]

40 2 While Theorem 1 is positive for finding variable settings that satisfy sentences, unsatisfiable sentences remain problematic when we are unsure that there exists γ > 0 or if we have an incorrect setting of f . [sent-87, score-0.191]

41 We are unaware of an efficient method to determine all yij for visited nodes in proofs of unsatisfiable sentences. [sent-88, score-0.205]

42 However, we expect that similar substructures will exist in satisfiable and unsatisfiable sentences resulting from the same application. [sent-89, score-0.166]

43 Early iterations of our learning algorithm will mistakenly explore branches of the search tree for satisfiable sentences and these branches will share important characteristics with inefficient branches of proofs of unsatisfiability. [sent-90, score-0.408]

44 The positive nodes with a score less than T are averaged, as are negative nodes with a score greater than T . [sent-97, score-0.212]

45 2 Davis-Putnam-Logemann-Loveland Stochastic Gradient We use a modified perceptron-style update based on the learning as search optimization framework proposed in [6]. [sent-102, score-0.169]

46 We know that nodes on a path to a valuation that satisfies the sentence have positive labels, and those nodes that require backtracking have negative labels (Figure 1). [sent-105, score-0.451]

47 If the sentence is satisfiable, we may compute a DPLL stochastic gradient, ∇DPLL, and update f. [sent-106, score-0.197]

48 We define two sets of nodes, S+ and S− , such that all nodes in S+ have positive label and lower score than all nodes in S− (Figure 2). [sent-107, score-0.163]

49 In this work, we have used the sufficient condition of defining these sets by setting a score threshold, T , such that fk (φi (xj )) < T ∀(i, j) ∈ S+ , fk (φi (xj )) > T ∀(i, j) ∈ S− , and |S+ | × |S− | is maximized. [sent-108, score-0.23]
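
A sketch of this set construction and the resulting update follows; treating ∇DPLL as the mean feature vector of S− minus that of S+ is an assumption (consistent with fk+1 = fk − ∇DPLL raising the scores of positive nodes), not a detail given in the extracted sentences.

```python
import numpy as np

def dpll_gradient(f, nodes):
    """nodes: list of (phi, y) pairs collected from one DPLL run, with y in {+1, -1}.

    Choose a threshold T with positive nodes scoring below it and negative nodes
    scoring above it, maximizing |S+| * |S-|, then return mean(S-) - mean(S+).
    """
    scored = [(float(np.dot(f, phi)), phi, y) for phi, y in nodes]
    best_size, best_sets = 0, None
    for T, _, _ in scored:                          # candidate thresholds at observed scores
        s_plus = [phi for s, phi, y in scored if y > 0 and s < T]
        s_minus = [phi for s, phi, y in scored if y < 0 and s > T]
        if len(s_plus) * len(s_minus) > best_size:
            best_size, best_sets = len(s_plus) * len(s_minus), (s_plus, s_minus)
    if best_sets is None:
        return np.zeros_like(f)                     # no violated pair: no update needed
    s_plus, s_minus = best_sets
    return np.mean(s_minus, axis=0) - np.mean(s_plus, axis=0)

# Perceptron-style step on a satisfiable instance:  f = f - dpll_gradient(f, nodes)
```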

50 Considering the kth update, ‖fk+1‖² = ‖fk − ∇DPLL‖² = ‖fk‖² − 2⟨fk, ∇DPLL⟩ + ‖∇DPLL‖² ≤ ‖fk‖² + 0 + R². (4) [sent-114, score-0.42]

51 We note that it is the case that ⟨fk, ∇DPLL⟩ ≥ 0 for any selection of training examples such that the average of the negative examples scores higher than the average of the positive examples generated by running a DPLL search. [sent-115, score-0.207]

52 It is possible that some negative examples with lower scores than some positive nodes will be visited during the depth-first search of the DPLL algorithm, but we are guaranteed that at least one of them will have a higher score. [sent-116, score-0.219]

53 Features are computed as a function of a sentence Ω and a literal p. [sent-119, score-0.512]

54 We next obtain a lower bound on ⟨u, fk+1⟩ = ⟨u, fk⟩ − ⟨u, ∇DPLL⟩ ≥ ⟨u, fk⟩ + γ. [sent-125, score-0.21]

55 That −⟨u, ∇DPLL⟩ ≥ γ follows from the fact that the means of the positive and negative training examples lie in the convex hull of the positive and negative sets, respectively, and that u achieves a margin of γ. [sent-126, score-0.185]
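
Combining the norm bound (4) with this lower bound gives the usual perceptron-style bound on the number of updates; a worked version of that step, under the standard assumptions f0 = 0 and ‖u‖ = 1 (these assumptions are not stated in the extracted sentences):

```latex
% assuming f_0 = 0 and \|u\| = 1, by induction over updates:
\|f_{k+1}\| \le \sqrt{k+1}\, R, \qquad \langle u, f_{k+1}\rangle \ge (k+1)\,\gamma,
% and by Cauchy--Schwarz:
(k+1)\,\gamma \;\le\; \langle u, f_{k+1}\rangle \;\le\; \|u\|\,\|f_{k+1}\| \;\le\; \sqrt{k+1}\, R
\;\Longrightarrow\; k+1 \;\le\; R^2/\gamma^2 .
```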

56 Recall that each node xj consists of a CNF sentence Ω together with a valuation for zero or more variables. [sent-135, score-0.354]

57 Our feature function φ(x, p) maps a node x and a candidate branching literal p to a real vector φ. [sent-136, score-0.492]

58 Many heuristics involve counting occurrences of literals and variables. [sent-137, score-0.329]

59 For notational convenience let C(p) be the number of occurrences of p in Ω and let Ck(p) be the number of occurrences of p among clauses of size k. [sent-138, score-0.539]
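
For the signed-integer clause representation used in the sketches above, these counts take a single pass over the sentence (an illustrative sketch; the paper's full feature vector may differ):

```python
def literal_counts(clauses, p):
    """Return C(p), the occurrences of literal p in the sentence, and a dict Ck
    mapping clause size k to Ck(p), the occurrences of p among clauses of size k."""
    C, Ck = 0, {}
    for c in clauses:
        n = c.count(p)
        if n:
            C += n
            Ck[len(c)] = Ck.get(len(c), 0) + n
    return C, Ck
```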

60 1 Relationship to previous branching heuristics Many branching heuristics have been proposed in the SAT literature [28, 13, 18, 26]. [sent-141, score-0.358]

61 Silva [26] suggested two simple heuristics based directly on literal counts. [sent-144, score-0.417]

62 The first was to always branch on the literal that maximizes C(p) and the second was to maximize C(p) + C(¬p). [sent-145, score-0.424]

63 Freeman [13] proposed a heuristic that identified the size of the smallest unsatisfied clause, m = min{|ω| : ω ∈ Ω}, and then identified the literal appearing most frequently amongst clauses of size m. [sent-148, score-0.645]

64 Bohm [3] proposed a heuristic that selects the literal maximizing α max(Ck(p, xj), Ck(¬p, xj)) + β min(Ck(p, xj), Ck(¬p, xj)) (5), with k = 2, or in the case of a tie, with k = 3 (and so on until all ties are broken). [sent-151, score-0.582]

65 Jeroslow and Wang [18] proposed a voting scheme in which clauses vote for their components with weight 2^(−k), where k is the length of the clause. [sent-154, score-0.275]

66 The total vote for a literal p is J(p) = Σ_{ω ∋ p} 2^(−|ω|) (6), where the sum is over clauses ω that contain p. [sent-155, score-0.612]
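
Equation (6) translates directly under the same clause representation:

```python
def jeroslow_wang(clauses, p):
    """J(p): sum of 2^(-|w|) over clauses w that contain literal p (eq. 6)."""
    return sum(2.0 ** -len(c) for c in clauses if p in c)
```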

67 Many modern SAT solvers use boolean constraint propagation (BCP) to speed up the search process [23]. [sent-160, score-0.163]

68 One component of BCP generates new clauses as a result of conflicts encountered during the search. [sent-161, score-0.275]

69 Several modern SAT solvers use the time since a variable was last added to a conflict clause to measure the “activity” of that variable. [sent-162, score-0.212]

70 Empirically, resolving variables that have most recently appeared in conflict clauses results in an efficient search [14]. [sent-163, score-0.275]

71 After each decision we update the most–recent–activity table T(p) := t for each p added to a conflict clause during that iteration. [sent-166, score-0.281]

72 1 Horn A Horn clause [4] is a disjunction containing at most one positive literal, ¬q1 ∨ · · · ∨ ¬qk−1 ∨ qk . [sent-173, score-0.24]

73 A sentence Ω is a Horn formula iff it is a conjunction of Horn clauses. [sent-174, score-0.196]

74 If there are no unit clauses in Ω then Ω is trivially satisfiable by setting all variables to false. [sent-177, score-0.33]

75 Delete any clause from Ω that contains p and remove ¬p wherever it appears. [sent-179, score-0.191]
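
The unit-propagation procedure of sentences 74–75 decides Horn satisfiability; a sketch reusing `simplify` from the DPLL block above (same illustrative representation):

```python
def horn_satisfiable(clauses):
    """Decide a Horn formula by repeated unit propagation."""
    while True:
        if any(len(c) == 0 for c in clauses):
            return False                        # empty clause derived: unsatisfiable
        units = [c[0] for c in clauses if len(c) == 1]
        if not units:
            return True                         # no unit clause: set remaining variables false
        clauses = simplify(clauses, units[0])   # instantiate the forced literal p
```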

76 Theorem 3 There is a margin for Horn clauses in our feature space. [sent-181, score-0.358]

77 Proof We will show that there is a margin for Horn clauses in our feature space by showing that for a particular priority function f0 , our algorithm will emulate the unit propagation algorithm above. [sent-182, score-0.522]

78 Consider a node x and let Ω be the input sentence Ω0 simplified according to the (perhaps partial) valuation at x. [sent-185, score-0.301]

79 If Ω contains no unit clauses then clearly ⟨φ(x, p), f0⟩ will be maximized for a negative literal p = ¬q. [sent-186, score-0.696]

80 If Ω does contain unit clauses then for literals p which appear in unit clauses we have ⟨φ(x, p), f0⟩ ≥ 1, while for all other literals we have ⟨φ(x, p), f0⟩ < 1. [sent-187, score-0.894]

81 Therefore H will select a unit literal if Ω contains one. [sent-188, score-0.392]

82 First note that every sentence encountered contains at least one unit clause, since, if not, that sentence would be trivially satisfiable by setting all variables to false and this would contradict the assumption that Ω is unsatisfiable. [sent-191, score-0.405]

83 So at each node x, the algorithm will first branch on some unit clause p, then later will back–track to x and branch on ¬p. [sent-192, score-0.442]

84 But since p appears in a unit clause at x this will immediately generate a contradiction and no further nodes will be expanded along that path. [sent-193, score-0.331]

85 For concreteness let the constant be 1/(K+1), where K is the length of the input sentence Ω. Figure 4: Results for our algorithm applied to (a) planar graph colouring; (b) hardware verification. [sent-195, score-0.501]

86 In figure (a) we report the mistake rate on the current training example since no training example is ever repeated, while in figure (b) it is computed on a separate validation set (see figure 5). [sent-197, score-0.173]

87 2 2–CNF A 2–CNF sentence is a CNF sentence in which every clause contains exactly two literals. [sent-200, score-0.541]

88 In this section we show that a function exists in our feature space for recognising satisfiable 2–CNF sentences in polynomial time. [sent-201, score-0.348]

89 If there are no unit clauses in Ω then pick any literal and add it to Ω. [sent-204, score-0.667]

90 Otherwise, let {p} be a unit clause in Ω and apply unit propagation to p as described in the previous section. [sent-205, score-0.326]

91 If a contradiction is generated then back–track to the last branch and negate the literal added there. [sent-206, score-0.45]
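
Taken together, sentences 89–91 describe the classical 2–CNF procedure; a sketch under the same representation, again reusing `simplify` from the DPLL block (for 2–CNF a single retry per branch suffices, since successful propagation leaves an independent subproblem):

```python
def two_cnf_satisfiable(clauses):
    """Decide a 2-CNF sentence: unit-propagate, branch on an arbitrary literal,
    and on contradiction back-track once and negate the branch literal."""
    def propagate(cs):
        while True:
            if any(len(c) == 0 for c in cs):
                return None                     # contradiction generated
            units = [c[0] for c in cs if len(c) == 1]
            if not units:
                return cs
            cs = simplify(cs, units[0])

    clauses = propagate(clauses)
    while clauses:                              # None: contradiction, []: satisfied
        branch = clauses[0][0]                  # no unit clauses remain, so pick any literal
        nxt = propagate(simplify(clauses, branch))
        if nxt is None:                         # back-track and negate the literal added
            nxt = propagate(simplify(clauses, -branch))
        if nxt is None:
            return False
        clauses = nxt
    return clauses == []
```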

92 Theorem 4 Under our feature space, H contains a priority function that recognizes 2–SAT sentences in polynomial time. [sent-210, score-0.37]

93 When using this weight vector, our algorithm will branch on a unit literal whenever one is present. [sent-213, score-0.479]

94 6 Empirical Results Planar Graph Colouring: We applied our algorithm to the problem of planar graph colouring, for which polynomial time algorithms are known [1]. [sent-216, score-0.183]

95 Due to the large size of these problems we extended an existing high–performance SAT solver, minisat [10], replacing its decision heuristic with our perceptron strategy. [sent-237, score-0.206]

96 Planar graph colouring is a known polynomial time computable problem, but it is difficult to characterize theoretically and an automated theorem prover was employed in the proof of polynomial solvability. [sent-244, score-0.458]

97 In this work, we argued that learning on positive examples is sufficient if the subset of SAT sentences generated by our application has a positive margin. [sent-248, score-0.216]

98 Additionally, we may consider updates to f during a run of the DPLL algorithm when the algorithm backtracks from a branch of the search tree for which we can prove that all yij = −1. [sent-251, score-0.305]

99 Linear-time algorithms for testing the satisfiability of propositional Horn formulae. [sent-317, score-0.149]

100 The impact of branching heuristics in propositional satisfiability algorithms. [sent-419, score-0.263]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('sat', 0.492), ('dpll', 0.382), ('literal', 0.337), ('clauses', 0.275), ('unsatis', 0.206), ('clause', 0.191), ('sentence', 0.175), ('sentences', 0.166), ('occurences', 0.132), ('cnf', 0.131), ('polynomial', 0.122), ('literals', 0.117), ('fk', 0.105), ('valuation', 0.104), ('branching', 0.099), ('yij', 0.092), ('colouring', 0.088), ('branch', 0.087), ('propositional', 0.084), ('heuristics', 0.08), ('satis', 0.078), ('search', 0.074), ('jeroslow', 0.073), ('minisat', 0.073), ('perceptron', 0.073), ('horn', 0.065), ('mistake', 0.064), ('planar', 0.061), ('nodes', 0.059), ('hardware', 0.058), ('unit', 0.055), ('xj', 0.053), ('valuations', 0.052), ('computable', 0.05), ('margin', 0.049), ('branches', 0.048), ('priority', 0.048), ('able', 0.047), ('logic', 0.045), ('veri', 0.043), ('boolean', 0.043), ('activity', 0.041), ('emulate', 0.036), ('ict', 0.034), ('davis', 0.034), ('feature', 0.034), ('heuristic', 0.033), ('ck', 0.032), ('visited', 0.032), ('ci', 0.031), ('validation', 0.031), ('instances', 0.03), ('bcp', 0.029), ('hooker', 0.029), ('logemann', 0.029), ('pickbranch', 0.029), ('unitpropagate', 0.029), ('automated', 0.029), ('negative', 0.029), ('updates', 0.028), ('training', 0.028), ('checking', 0.028), ('proof', 0.027), ('decision', 0.027), ('contradiction', 0.026), ('neighbour', 0.026), ('existance', 0.026), ('emulates', 0.026), ('recognising', 0.026), ('hoos', 0.026), ('hutter', 0.026), ('canonical', 0.025), ('propagation', 0.025), ('rule', 0.025), ('positive', 0.025), ('cue', 0.024), ('colours', 0.024), ('negated', 0.024), ('symbolic', 0.024), ('disjunction', 0.024), ('jw', 0.024), ('tree', 0.024), ('xl', 0.023), ('unaware', 0.022), ('node', 0.022), ('ever', 0.022), ('competition', 0.022), ('gure', 0.022), ('update', 0.022), ('solver', 0.022), ('competitions', 0.021), ('ft', 0.021), ('iff', 0.021), ('solvers', 0.021), ('broken', 0.02), ('back', 0.02), ('score', 0.02), ('theorem', 0.02), ('ability', 0.02), ('tracks', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999976 267 nips-2012-Perceptron Learning of SAT

Author: Alex Flint, Matthew Blaschko

Abstract: Boolean satisfiability (SAT) as a canonical NP-complete decision problem is one of the most important problems in computer science. In practice, real-world SAT sentences are drawn from a distribution that may result in efficient algorithms for their solution. Such SAT instances are likely to have shared characteristics and substructures. This work approaches the exploration of a family of SAT solvers as a learning problem. In particular, we relate polynomial time solvability of a SAT subset to a notion of margin between sentences mapped by a feature function into a Hilbert space. Provided this mapping is based on polynomial time computable statistics of a sentence, we show that the existence of a margin between these data points implies the existence of a polynomial time solver for that SAT subset based on the Davis-Putnam-Logemann-Loveland algorithm. Furthermore, we show that a simple perceptron-style learning rule will find an optimal SAT solver with a bounded number of training updates. We derive a linear time computable set of features and show analytically that margins exist for important polynomial special cases of SAT. Empirical results show an order of magnitude improvement over a state-of-the-art SAT solver on a hardware verification task. 1

2 0.089338601 322 nips-2012-Spiking and saturating dendrites differentially expand single neuron computation capacity

Author: Romain Cazé, Mark Humphries, Boris S. Gutkin

Abstract: The integration of excitatory inputs in dendrites is non-linear: multiple excitatory inputs can produce a local depolarization departing from the arithmetic sum of each input’s response taken separately. If this depolarization is bigger than the arithmetic sum, the dendrite is spiking; if the depolarization is smaller, the dendrite is saturating. Decomposing a dendritic tree into independent dendritic spiking units greatly extends its computational capacity, as the neuron then maps onto a two layer neural network, enabling it to compute linearly non-separable Boolean functions (lnBFs). How can these lnBFs be implemented by dendritic architectures in practise? And can saturating dendrites equally expand computational capacity? To address these questions we use a binary neuron model and Boolean algebra. First, we confirm that spiking dendrites enable a neuron to compute lnBFs using an architecture based on the disjunctive normal form (DNF). Second, we prove that saturating dendrites as well as spiking dendrites enable a neuron to compute lnBFs using an architecture based on the conjunctive normal form (CNF). Contrary to a DNF-based architecture, in a CNF-based architecture, dendritic unit tunings do not imply the neuron tuning, as has been observed experimentally. Third, we show that one cannot use a DNF-based architecture with saturating dendrites. Consequently, we show that an important family of lnBFs implemented with a CNF-architecture can require an exponential number of saturating dendritic units, whereas the same family implemented with either a DNF-architecture or a CNF-architecture always require a linear number of spiking dendritic units. This minimization could explain why a neuron spends energetic resources to make its dendrites spike. 1

3 0.073344052 174 nips-2012-Learning Halfspaces with the Zero-One Loss: Time-Accuracy Tradeoffs

Author: Aharon Birnbaum, Shai S. Shwartz

Abstract: Given α, ϵ, we study the time complexity required to improperly learn a halfspace with misclassification error rate of at most (1 + α)L∗γ + ϵ, where L∗γ is the optimal γ-margin error rate. For α = 1/γ, polynomial time and sample complexity is achievable using the hinge-loss. For α = 0, Shalev-Shwartz et al. [2011] showed that poly(1/γ) time is impossible, while learning is possible in time exp(Õ(1/γ)). An immediate question, which this paper tackles, is what is achievable if α ∈ (0, 1/γ). We derive positive results interpolating between the polynomial time for α = 1/γ and the exponential time for α = 0. In particular, we show that there are cases in which α = o(1/γ) but the problem is still solvable in polynomial time. Our results naturally extend to the adversarial online learning model and to the PAC learning with malicious noise model. 1

4 0.056271885 334 nips-2012-Tensor Decomposition for Fast Parsing with Latent-Variable PCFGs

Author: Michael Collins, Shay B. Cohen

Abstract: We describe an approach to speed-up inference with latent-variable PCFGs, which have been shown to be highly effective for natural language parsing. Our approach is based on a tensor formulation recently introduced for spectral estimation of latent-variable PCFGs coupled with a tensor decomposition algorithm well-known in the multilinear algebra literature. We also describe an error bound for this approximation, which gives guarantees showing that if the underlying tensors are well approximated, then the probability distribution over trees will also be well approximated. Empirical evaluation on real-world natural language parsing data demonstrates a significant speed-up at minimal cost for parsing performance. 1

5 0.055814255 156 nips-2012-Identifiability and Unmixing of Latent Parse Trees

Author: Percy Liang, Daniel J. Hsu, Sham M. Kakade

Abstract: This paper explores unsupervised learning of parsing models along two directions. First, which models are identifiable from infinite data? We use a general technique for numerically checking identifiability based on the rank of a Jacobian matrix, and apply it to several standard constituency and dependency parsing models. Second, for identifiable models, how do we estimate the parameters efficiently? EM suffers from local optima, while recent work using spectral methods [1] cannot be directly applied since the topology of the parse tree varies across sentences. We develop a strategy, unmixing, which deals with this additional complexity for restricted classes of parsing models. 1

6 0.047895052 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction

7 0.046917528 173 nips-2012-Learned Prioritization for Trading Off Accuracy and Speed

8 0.046628203 200 nips-2012-Local Supervised Learning through Space Partitioning

9 0.046083573 251 nips-2012-On Lifting the Gibbs Sampling Algorithm

10 0.045480147 15 nips-2012-A Polylog Pivot Steps Simplex Algorithm for Classification

11 0.045180667 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering

12 0.042382862 81 nips-2012-Context-Sensitive Decision Forests for Object Detection

13 0.04130549 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

14 0.04043521 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function

15 0.036335886 108 nips-2012-Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search

16 0.036163781 69 nips-2012-Clustering Sparse Graphs

17 0.034204893 25 nips-2012-A new metric on the manifold of kernel matrices with application to matrix geometric means

18 0.034063477 314 nips-2012-Slice Normalized Dynamic Markov Logic Networks

19 0.033085451 98 nips-2012-Dimensionality Dependent PAC-Bayes Margin Bound

20 0.032974415 16 nips-2012-A Polynomial-time Form of Robust Regression


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.108), (1, -0.008), (2, -0.003), (3, -0.026), (4, 0.009), (5, -0.007), (6, -0.014), (7, 0.011), (8, -0.036), (9, 0.028), (10, -0.002), (11, 0.006), (12, 0.045), (13, 0.017), (14, -0.017), (15, -0.023), (16, -0.04), (17, -0.0), (18, 0.017), (19, 0.018), (20, -0.023), (21, 0.022), (22, -0.002), (23, -0.012), (24, -0.055), (25, 0.025), (26, 0.023), (27, -0.059), (28, 0.008), (29, 0.059), (30, 0.035), (31, -0.005), (32, 0.014), (33, 0.043), (34, -0.046), (35, -0.009), (36, -0.09), (37, -0.057), (38, -0.031), (39, 0.004), (40, -0.082), (41, -0.034), (42, 0.036), (43, 0.003), (44, -0.047), (45, -0.082), (46, -0.006), (47, 0.019), (48, 0.045), (49, 0.012)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.85277075 267 nips-2012-Perceptron Learning of SAT

Author: Alex Flint, Matthew Blaschko

Abstract: Boolean satisfiability (SAT) as a canonical NP-complete decision problem is one of the most important problems in computer science. In practice, real-world SAT sentences are drawn from a distribution that may result in efficient algorithms for their solution. Such SAT instances are likely to have shared characteristics and substructures. This work approaches the exploration of a family of SAT solvers as a learning problem. In particular, we relate polynomial time solvability of a SAT subset to a notion of margin between sentences mapped by a feature function into a Hilbert space. Provided this mapping is based on polynomial time computable statistics of a sentence, we show that the existence of a margin between these data points implies the existence of a polynomial time solver for that SAT subset based on the Davis-Putnam-Logemann-Loveland algorithm. Furthermore, we show that a simple perceptron-style learning rule will find an optimal SAT solver with a bounded number of training updates. We derive a linear time computable set of features and show analytically that margins exist for important polynomial special cases of SAT. Empirical results show an order of magnitude improvement over a state-of-the-art SAT solver on a hardware verification task. 1

2 0.60113108 156 nips-2012-Identifiability and Unmixing of Latent Parse Trees

Author: Percy Liang, Daniel J. Hsu, Sham M. Kakade

Abstract: This paper explores unsupervised learning of parsing models along two directions. First, which models are identifiable from infinite data? We use a general technique for numerically checking identifiability based on the rank of a Jacobian matrix, and apply it to several standard constituency and dependency parsing models. Second, for identifiable models, how do we estimate the parameters efficiently? EM suffers from local optima, while recent work using spectral methods [1] cannot be directly applied since the topology of the parse tree varies across sentences. We develop a strategy, unmixing, which deals with this additional complexity for restricted classes of parsing models. 1

3 0.57842112 334 nips-2012-Tensor Decomposition for Fast Parsing with Latent-Variable PCFGs

Author: Michael Collins, Shay B. Cohen

Abstract: We describe an approach to speed-up inference with latent-variable PCFGs, which have been shown to be highly effective for natural language parsing. Our approach is based on a tensor formulation recently introduced for spectral estimation of latent-variable PCFGs coupled with a tensor decomposition algorithm well-known in the multilinear algebra literature. We also describe an error bound for this approximation, which gives guarantees showing that if the underlying tensors are well approximated, then the probability distribution over trees will also be well approximated. Empirical evaluation on real-world natural language parsing data demonstrates a significant speed-up at minimal cost for parsing performance. 1

4 0.49326909 115 nips-2012-Efficient high dimensional maximum entropy modeling via symmetric partition functions

Author: Paul Vernaza, Drew Bagnell

Abstract: Maximum entropy (MaxEnt) modeling is a popular choice for sequence analysis in applications such as natural language processing, where the sequences are embedded in discrete, tractably-sized spaces. We consider the problem of applying MaxEnt to distributions over paths in continuous spaces of high dimensionality— a problem for which inference is generally intractable. Our main contribution is to show that this intractability can be avoided as long as the constrained features possess a certain kind of low dimensional structure. In this case, we show that the associated partition function is symmetric and that this symmetry can be exploited to compute the partition function efficiently in a compressed form. Empirical results are given showing an application of our method to learning models of high-dimensional human motion capture data. 1

5 0.47277683 183 nips-2012-Learning Partially Observable Models Using Temporally Abstract Decision Trees

Author: Erik Talvitie

Abstract: This paper introduces timeline trees, which are partial models of partially observable environments. Timeline trees are given some specific predictions to make and learn a decision tree over history. The main idea of timeline trees is to use temporally abstract features to identify and split on features of key events, spread arbitrarily far apart in the past (whereas previous decision-tree-based methods have been limited to a finite suffix of history). Experiments demonstrate that timeline trees can learn to make high quality predictions in complex, partially observable environments with high-dimensional observations (e.g. an arcade game). 1

6 0.47214699 250 nips-2012-On-line Reinforcement Learning Using Incremental Kernel-Based Stochastic Factorization

7 0.46906614 204 nips-2012-MAP Inference in Chains using Column Generation

8 0.46552676 198 nips-2012-Learning with Target Prior

9 0.45281744 48 nips-2012-Augmented-SVM: Automatic space partitioning for combining multiple non-linear dynamics

10 0.45063207 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering

11 0.44813639 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function

12 0.44559681 322 nips-2012-Spiking and saturating dendrites differentially expand single neuron computation capacity

13 0.43833831 206 nips-2012-Majorization for CRFs and Latent Likelihoods

14 0.43188488 174 nips-2012-Learning Halfspaces with the Zero-One Loss: Time-Accuracy Tradeoffs

15 0.42574909 290 nips-2012-Recovery of Sparse Probability Measures via Convex Programming

16 0.42227697 302 nips-2012-Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization

17 0.41970709 232 nips-2012-Multiplicative Forests for Continuous-Time Processes

18 0.41316089 133 nips-2012-Finding Exemplars from Pairwise Dissimilarities via Simultaneous Sparse Recovery

19 0.41094941 361 nips-2012-Volume Regularization for Binary Classification

20 0.40349177 143 nips-2012-Globally Convergent Dual MAP LP Relaxation Solvers using Fenchel-Young Margins


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.024), (17, 0.014), (21, 0.017), (38, 0.127), (39, 0.014), (42, 0.032), (54, 0.024), (55, 0.025), (74, 0.053), (76, 0.098), (80, 0.085), (92, 0.038), (98, 0.354)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.80286682 154 nips-2012-How They Vote: Issue-Adjusted Models of Legislative Behavior

Author: Sean Gerrish, David M. Blei

Abstract: We develop a probabilistic model of legislative data that uses the text of the bills to uncover lawmakers’ positions on specific political issues. Our model can be used to explore how a lawmaker’s voting patterns deviate from what is expected and how that deviation depends on what is being voted on. We derive approximate posterior inference algorithms based on variational methods. Across 12 years of legislative data, we demonstrate both improvement in heldout predictive performance and the model’s utility in interpreting an inherently multi-dimensional space. 1

same-paper 2 0.73079896 267 nips-2012-Perceptron Learning of SAT

Author: Alex Flint, Matthew Blaschko

Abstract: Boolean satisfiability (SAT) as a canonical NP-complete decision problem is one of the most important problems in computer science. In practice, real-world SAT sentences are drawn from a distribution that may result in efficient algorithms for their solution. Such SAT instances are likely to have shared characteristics and substructures. This work approaches the exploration of a family of SAT solvers as a learning problem. In particular, we relate polynomial time solvability of a SAT subset to a notion of margin between sentences mapped by a feature function into a Hilbert space. Provided this mapping is based on polynomial time computable statistics of a sentence, we show that the existence of a margin between these data points implies the existence of a polynomial time solver for that SAT subset based on the Davis-Putnam-Logemann-Loveland algorithm. Furthermore, we show that a simple perceptron-style learning rule will find an optimal SAT solver with a bounded number of training updates. We derive a linear time computable set of features and show analytically that margins exist for important polynomial special cases of SAT. Empirical results show an order of magnitude improvement over a state-of-the-art SAT solver on a hardware verification task. 1

3 0.66155642 55 nips-2012-Bayesian Warped Gaussian Processes

Author: Miguel Lázaro-gredilla

Abstract: Warped Gaussian processes (WGP) [1] model output observations in regression tasks as a parametric nonlinear transformation of a Gaussian process (GP). The use of this nonlinear transformation, which is included as part of the probabilistic model, was shown to enhance performance by providing a better prior model on several data sets. In order to learn its parameters, maximum likelihood was used. In this work we show that it is possible to use a non-parametric nonlinear transformation in WGP and variationally integrate it out. The resulting Bayesian WGP is then able to work in scenarios in which the maximum likelihood WGP failed: Low data regime, data with censored values, classification, etc. We demonstrate the superior performance of Bayesian warped GPs on several real data sets.

4 0.63228416 356 nips-2012-Unsupervised Structure Discovery for Semantic Analysis of Audio

Author: Sourish Chaudhuri, Bhiksha Raj

Abstract: Approaches to audio classification and retrieval tasks largely rely on detectionbased discriminative models. We submit that such models make a simplistic assumption in mapping acoustics directly to semantics, whereas the actual process is likely more complex. We present a generative model that maps acoustics in a hierarchical manner to increasingly higher-level semantics. Our model has two layers with the first layer modeling generalized sound units with no clear semantic associations, while the second layer models local patterns over these sound units. We evaluate our model on a large-scale retrieval task from TRECVID 2011, and report significant improvements over standard baselines. 1

5 0.60385805 174 nips-2012-Learning Halfspaces with the Zero-One Loss: Time-Accuracy Tradeoffs

Author: Aharon Birnbaum, Shai S. Shwartz

Abstract: Given α, ϵ, we study the time complexity required to improperly learn a halfspace with misclassification error rate of at most (1 + α)L∗γ + ϵ, where L∗γ is the optimal γ-margin error rate. For α = 1/γ, polynomial time and sample complexity is achievable using the hinge-loss. For α = 0, Shalev-Shwartz et al. [2011] showed that poly(1/γ) time is impossible, while learning is possible in time exp(Õ(1/γ)). An immediate question, which this paper tackles, is what is achievable if α ∈ (0, 1/γ). We derive positive results interpolating between the polynomial time for α = 1/γ and the exponential time for α = 0. In particular, we show that there are cases in which α = o(1/γ) but the problem is still solvable in polynomial time. Our results naturally extend to the adversarial online learning model and to the PAC learning with malicious noise model. 1

6 0.58527821 180 nips-2012-Learning Mixtures of Tree Graphical Models

7 0.48404521 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

8 0.48146552 162 nips-2012-Inverse Reinforcement Learning through Structured Classification

9 0.47869077 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

10 0.47841454 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

11 0.47813493 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

12 0.477357 227 nips-2012-Multiclass Learning with Simplex Coding

13 0.47715592 292 nips-2012-Regularized Off-Policy TD-Learning

14 0.47714788 230 nips-2012-Multiple Choice Learning: Learning to Produce Multiple Structured Outputs

15 0.47712931 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models

16 0.47692078 216 nips-2012-Mirror Descent Meets Fixed Share (and feels no regret)

17 0.47676378 65 nips-2012-Cardinality Restricted Boltzmann Machines

18 0.47669342 255 nips-2012-On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes

19 0.47665831 236 nips-2012-Near-Optimal MAP Inference for Determinantal Point Processes

20 0.47652504 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs