nips nips2011 nips2011-7 "A Machine Learning Approach to Predict Chemical Reactions" knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Matthew A. Kayala, Pierre F. Baldi
Abstract: Being able to predict the course of arbitrary chemical reactions is essential to the theory and applications of organic chemistry. Previous approaches are not high-throughput, are not generalizable or scalable, or lack sufficient data to be effective. We describe single mechanistic reactions as concerted electron movements from an electron orbital source to an electron orbital sink. We use an existing rule-based expert system to derive a dataset consisting of 2,989 productive mechanistic steps and 6.14 million non-productive mechanistic steps. We then pose identifying productive mechanistic steps as a ranking problem: rank potential orbital interactions such that the top ranked interactions yield the major products. The machine learning implementation follows a two-stage approach, in which we first train atom level reactivity filters to prune 94.0% of non-productive reactions with less than a 0.1% false negative rate. Then, we train an ensemble of ranking models on pairs of interacting orbitals to learn a relative productivity function over single mechanistic reactions in a given system. Without the use of explicit transformation patterns, the ensemble perfectly ranks the productive mechanisms at the top 89.1% of the time, rising to 99.9% of the time when top ranked lists with at most four non-productive reactions are considered. The final system allows multi-step reaction prediction. Furthermore, it is generalizable, making reasonable predictions over reactants and conditions which the rule-based expert system does not handle.
Reference: text
sentIndex sentText sentNum sentScore
1 edu Abstract Being able to predict the course of arbitrary chemical reactions is essential to the theory and applications of organic chemistry. [sent-4, score-0.767]
2 We describe single mechanistic reactions as concerted electron movements from an electron orbital source to an electron orbital sink. [sent-6, score-1.289]
3 We use an existing rule-based expert system to derive a dataset consisting of 2,989 productive mechanistic steps and 6.14 million non-productive mechanistic steps. [sent-7, score-0.554]
4 We then pose identifying productive mechanistic steps as a ranking problem: rank potential orbital interactions such that the top ranked interactions yield the major products. [sent-9, score-0.7]
5 Then, we train an ensemble of ranking models on pairs of interacting orbitals to learn a relative productivity function over single mechanistic reactions in a given system. [sent-13, score-0.908]
6 Without the use of explicit transformation patterns, the ensemble perfectly ranks the productive mechanisms at the top 89.1% of the time, [sent-14, score-0.306]
7 rising to 99.9% of the time when top ranked lists with at most four non-productive reactions are considered. [sent-16, score-0.59]
8 Furthermore, it is generalizable, making reasonable predictions over reactants and conditions which the rule-based expert system does not handle. [sent-18, score-0.218]
9 1 Introduction Determining the major products of chemical reactions given the input reactants and conditions is a fundamental problem in organic chemistry. [sent-19, score-0.918]
10 This leads to modeling reactions as minimum energy paths between stable atom configurations on a high-dimensional potential energy surface, where the path through the lowest energy transition state, i.e., the kinetically most favorable path, determines the observed products. [sent-27, score-0.678]
11 Figure 1: Overall transformation of an alkene (hydrocarbon with a double bond) with hydrobromic acid (HBr) and the corresponding mechanistic reactions; the accompanying atom-mapped reaction string reads [H:3][Br:4]>>[Br:4][C:1][C:2][H:3]. [sent-37, score-0.29]
12 (b) shows the two mechanistic reactions composing the overall transformation as arrow-pushing diagrams [15, 16]. [sent-43, score-0.81]
13 Somewhere between low-level QM treatment and abstract graph-based overall transformations, one can consider reactions at the mechanistic level. [sent-47, score-0.777]
14 A mechanistic, or elementary, reaction is a concerted electron movement through a single transition state[15, 16]. [sent-48, score-0.699]
15 These mechanistic reactions can be composed to yield overall transformations. [sent-49, score-0.8]
16 For example, Figure 1 shows the overall transformation of an alkene interacting with hydrobromic acid to yield an alkyl halide, along with the two elementary reactions which compose the transformation. [sent-50, score-0.754]
17 A mechanistic reaction is described as an idealized molecular orbital (MO) interaction between an electron source (donor) MO and an electron sink (acceptor) MO. [sent-51, score-1.34]
18 In general, potential electron sources are composed of lone pairs of electrons and bonds, and potential electron sinks are composed of empty atomic orbitals and bonds. [sent-53, score-0.432]
19 Bonds can act as either a source or a sink depending on the context. [sent-54, score-0.28]
20 Note that by considering all possible pairings of source and sink MOs, this representation allows the exhaustive enumeration of all potential mechanistic reactions over an arbitrary set of molecules. [sent-57, score-1.044]
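As a concrete illustration of this exhaustive pairing, a minimal Python sketch is given below; the MO class and its fields are illustrative assumptions, not the authors' actual data structures.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class MO:
    kind: str        # e.g. "lone_pair", "sigma_bond", "pi_bond", "empty_orbital"
    main_atom: int   # index of the atom the orbital is centered on
    can_donate: bool # usable as an electron source
    can_accept: bool # usable as an electron sink (bonds can be both)

def enumerate_reactions(mos):
    """All (source, sink) MO pairings, i.e., all potential elementary reactions."""
    sources = [m for m in mos if m.can_donate]
    sinks = [m for m in mos if m.can_accept]
    return [(src, snk) for src, snk in product(sources, sinks) if src is not snk]
```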
21 Here, the elementary reactions represent “productive” mechanistic steps, i.e., [sent-59, score-0.794]
22 those reactions which lead to the overall major products. [sent-61, score-0.583]
23 Thus, elementary reactions which are not the most kinetically favorable, but which eventually lead to the overall thermodynamic transformation product, may be considered “productive”. [sent-62, score-0.691]
24 While mechanistic reaction representations are approximations quite far from the Schrödinger equation, we expect them to be closer to the underlying reality and therefore more useful than overall transformations. [sent-64, score-0.832]
25 Furthermore, we expect them also to be easier to predict than overall transformations due to their more elementary nature and mechanistic interpretation. [sent-65, score-0.313]
26 In combination, these arguments suggest that working with mechanistic steps may facilitate the application of statistical machine learning approaches, and take advantage of their capability to generalize. [sent-66, score-0.194]
27 Thus, in this work, reactions are modeled as mechanisms, and for the remainder of the paper, we consider the term “reaction” to denote a single elementary reaction. [sent-67, score-0.6]
28 Furthermore, we consider the problem of reaction prediction to be precisely that of identifying the “productive” reactions over a given set of reactants under particular conditions. [sent-68, score-1.249]
29 There has been very little work on machine learning approaches to reaction prediction. [sent-69, score-0.546]
30 The sole example is a paper from 1990 on inductively extracting overall transformation patterns from reaction databases[12], a method which was never actually incorporated into a full reaction prediction system. [sent-70, score-1.186]
31 Given improvements in both computing power and machine learning methods over the past 20 years, one could imagine a machine learning system that mines reaction information to learn the grammar of chemistry. [sent-72, score-0.583]
32 While large reaction databases exist, e.g., Reaxys [20] or SPRESI [21], the reactions in these databases are mostly unbalanced, not atom-mapped, and lack mechanistic detail [22]. [sent-80, score-0.762]
33 As a result, and to the best of our knowledge, effective machine learning approaches to reaction prediction still need to be developed. [sent-82, score-0.571]
34 2 A new approach The limitations of previous work motivate a fresh approach to reaction prediction combining machine learning with mechanistic representations. [sent-84, score-0.765]
35 The key idea is to first enumerate all potential source and sink MOs, and thus all possible reactions by their pairing, and then use classification and ranking techniques to identify productive reactions. [sent-85, score-1.246]
36 By using very general rules to enumerate possible reactions, the approach is not restricted to manually curated reaction patterns. [sent-87, score-0.546]
37 By detailing individual reactions at the mechanistic level, the system may be able to statistically learn efficient predictive models based on physicochemical attributes rather than abstract overall transformations. [sent-88, score-0.847]
38 And by ranking possible reactions instead of making binary decisions, the system may provide results amenable to flexible interpretation. [sent-89, score-0.707]
39 2 The data challenge A mechanistically defined dataset of reactions to use with the proposed approach does not currently exist. [sent-92, score-0.585]
40 The validation suite is a manually composed set of reactants, reagents, and products covering a complete undergraduate organic chemistry curriculum. [sent-94, score-0.224]
41 Entering a set of reactants and a reagent model into Reaction Explorer yields the complete sequence of mechanistic steps leading to the final products, where all reactions in this sequence share the conditions encoded by the corresponding reagent model. [sent-95, score-0.938]
42 Each one of these mechanistic steps is considered to be a distinct productive elementary reaction. [sent-96, score-0.52]
43 For a given set of reactants and conditions, which we call a (r, c) query tuple, the Reaction Explorer system labels a small set of reactions productive, while all other reactions enumerated by pairing source and sink MOs over the reactants are considered non-productive. [sent-97, score-1.673]
44 An (a, c) tuple has label srcreact = 1 if its atom is the main atom of a source MO in a productive reaction over any corresponding (r, c) query, and has label srcreact = 0 otherwise. [sent-99, score-1.093]
45 The label sinkreact is defined similarly using sink MOs. [sent-100, score-0.225]
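A hedged sketch of how these labels could be computed from the set of productive reactions; the reaction objects and field names (source, sink, conditions) are assumptions for illustration.

```python
def atom_reactivity_labels(productive_reactions, atom_condition_tuples):
    """srcreact / sinkreact labels for (a, c) tuples, per the definition above."""
    src_pos = {(r.source.main_atom, r.conditions) for r in productive_reactions}
    snk_pos = {(r.sink.main_atom, r.conditions) for r in productive_reactions}
    return {(a, c): {"srcreact": int((a, c) in src_pos),
                     "sinkreact": int((a, c) in snk_pos)}
            for a, c in atom_condition_tuples}
```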
46 Note that any mechanistic interaction with the solvent or reagent is explicitly modeled. [sent-104, score-0.249]
47 As an initial validation of the method, we consider general ionic reactions from the Reaction Explorer validation suite involving C, H, N, O, Li, Mg, and the halides. [sent-107, score-0.571]
48 Extensions to include stereoselective, pericyclic, and radical reactions are discussed in Section 5. [sent-108, score-0.569]
49 This yields 6.14 million reactions composed of 84,825 source and 74,725 sink MOs from 2,752 distinct reactants and reaction conditions, i.e., (r, c) query tuples. [sent-110, score-1.527]
50 There are 22,894 atom symmetry classes, which, when paired with reaction conditions, yield 29,104 (a, c) tuples. [sent-115, score-0.654]
51 3 The combinatorial complexity challenge In the dataset, the average molecule has 44 source MOs and 50 sink MOs. [sent-121, score-0.328]
52 For this average molecule, considering only intermolecular reactions with a second copy of the same molecule gives 44 × 50 = 2200 potential elementary reactions. [sent-122, score-0.671]
53 Thus, the number of possible reactions is very large, motivating a two-stage approach to identifying productive reactions given an (r, c) query. [sent-123, score-1.367]
54 In the first stage, we train filters using classification techniques on the source and sink reactivity labels. [sent-124, score-0.335]
55 The idea is to train highly sensitive classifiers which reduce the breadth of possible reactions without erroneously filtering productive reactions. [sent-125, score-0.82]
56 Then, only those source and sink MOs whose main atom passes the respective atom-level filter are considered when enumerating candidate reactions for the second, ranking stage, which predicts reaction productivity. [sent-126, score-1.712]
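A minimal sketch of this two-stage pipeline, assuming trained filter and ranker objects with the hypothetical interfaces shown:

```python
def predict_reactions(mos, conditions, src_filter, snk_filter, ranker, top_k=5):
    # Stage 1: atom-level reactivity filters prune most non-productive MOs.
    sources = [m for m in mos if m.can_donate
               and src_filter.is_reactive(m.main_atom, conditions)]
    sinks = [m for m in mos if m.can_accept
             and snk_filter.is_reactive(m.main_atom, conditions)]
    # Stage 2: rank the surviving (source, sink) pairings by predicted productivity.
    candidates = [(s, k) for s in sources for k in sinks if s is not k]
    candidates.sort(key=lambda pair: ranker.score(pair, conditions), reverse=True)
    return candidates[:top_k]
```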
57 Here, we train two separate classifiers to predict the source and sink atom level reactivity labels, each using the same feature descriptions and machine learning implementations. [sent-127, score-0.443]
58 There are 14 real-valued physicochemical features such as the reaction conditions, the molecular weight of the molecule, and the charge at and around the atom. [sent-131, score-0.625]
59 In addition to standard molecular graph features, we also include similar topological features over a restricted-alphabet pharmacophore point graph, where pharmacophore point graph definitions are adapted from Hähnke et al. [23]. [sent-134, score-0.228]
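The paper trains its own classifiers on these features; as a stand-in illustration of a highly sensitive filter, the sketch below fits a scikit-learn logistic regression and chooses a decision threshold that caps the training false negative rate, in the spirit of the reported sub-0.1% FNR. The model choice and threshold procedure are assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_reactivity_filter(X, y, max_fnr=0.001):
    """Fit a filter and pick a threshold keeping the FNR at or below max_fnr."""
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    pos_scores = np.sort(clf.predict_proba(X[y == 1])[:, 1])
    # Lowest threshold that misclassifies at most a max_fnr fraction of positives.
    cut_idx = int(np.floor(max_fnr * len(pos_scores)))
    threshold = pos_scores[cut_idx]
    return clf, threshold  # an atom passes iff predict_proba >= threshold
```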
60 3 Results We report the true negative rate (TNR) and the false negative rate (FNR) for both the source and sink classification problems as well as for the actual reaction filtering problem, as shown in Table 1. [sent-149, score-0.826]
61 Source reactive and sink reactive rows show results on the respective classification problems. [sent-157, score-0.327]
62 The reaction row shows results of using the two atom classifiers for an initial reaction filtering. [sent-158, score-1.2]
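For reference, the two reported rates can be computed as follows (y_true and y_pred are 0/1 arrays; a prediction of 0 means the atom or reaction is filtered out):

```python
import numpy as np

def tnr_fnr(y_true, y_pred):
    """True negative rate and false negative rate, as reported in Table 1."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tnr = np.mean(y_pred[y_true == 0] == 0)  # non-productive correctly pruned
    fnr = np.mean(y_pred[y_true == 1] == 0)  # productive wrongly pruned
    return tnr, fnr
```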
63 4 The ranking challenge We pose the task of identifying the productive reactions as a ranking problem. [sent-177, score-1.066]
64 1 Feature representation Each reaction is composed of a source and sink MO. [sent-183, score-0.849]
65 The reaction feature vector is the concatenation of the corresponding source and sink atom level feature vectors with some modifications. [sent-184, score-0.934]
66 And finally, 450 features describing the forward and inverse reactions are calculated, including the atoms and bonds involved and the implied transition-state geometry. [sent-187, score-0.58]
67 This leads to a total of 1,677 reaction features. [sent-188, score-0.546]
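A sketch of the concatenation described above; the per-atom block size is a placeholder, while the 450 reaction-level descriptors follow the text (the full vector has 1,677 features):

```python
import numpy as np

def atom_features(atom, conditions):
    return np.zeros(16)   # placeholder per-atom feature block

def pair_features(source_mo, sink_mo):
    return np.zeros(450)  # placeholder forward/inverse reaction descriptors

def reaction_features(source_mo, sink_mo, conditions):
    """Concatenate source-atom, sink-atom, and reaction-level descriptors."""
    return np.concatenate([atom_features(source_mo.main_atom, conditions),
                           atom_features(sink_mo.main_atom, conditions),
                           pair_features(source_mo, sink_mo)])
```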
68 The final output will approach 1 if the first (source, sink) pair, A, is predicted to be relatively more productive than the second pair, B, and 0 otherwise. [sent-206, score-0.273]
69 Each machine in the ensemble is trained with all the productive reactions (from the training set) and a random partition of the non-productive reactions (from the training set). [sent-212, score-1.367]
70 Final ranking on the test set is determined by either simple majority vote or by ranking the average scores from the linear output node of the inner shared-weight network for each machine in the ensemble. [sent-213, score-0.246]
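A minimal PyTorch sketch of one such ensemble member, consistent with the shared-weight description above; the layer sizes and activation are assumptions:

```python
import torch
import torch.nn as nn

class PairwiseRanker(nn.Module):
    """Scores two reactions with a shared inner network; the sigmoid of the
    score difference approaches 1 when reaction A looks more productive than B."""
    def __init__(self, n_features=1677, hidden=64):
        super().__init__()
        self.scorer = nn.Sequential(  # shared weights for both inputs
            nn.Linear(n_features, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, feats_a, feats_b):
        return torch.sigmoid(self.scorer(feats_a) - self.scorer(feats_b))

# Training pairs each productive reaction (as A) against sampled non-productive
# ones (as B) and minimizes binary cross-entropy against a target of 1.
```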
71 NDCG@i is a common information retrieval metric[26] that sums the overall usefulness (or gain) of productive reactions in a given list of the top-i results, where individual gain decays exponentially with lower position. [sent-217, score-0.876]
72 For example, NDCG@1 is the fraction of (r, c) queries in which the top ranked reaction is a productive reaction. [sent-219, score-0.896]
73 Percent Within-n is simply the fraction of (r, c) queries that have at most n non-productive reactions in the smallest ranked list containing all productive reactions. [sent-220, score-0.917]
74 For example, Percent Within-0 measures the percent of (r, c) queries with perfect rank, and Percent Within-4 measures how often all productive reactions are recovered with at most 4 mis-ranked non-productive reactions. [sent-221, score-0.904]
75 Note that the NDCG@1 and Percent Within-0 will differ because roughly 10% of (r, c) queries have more than one productive reaction. [sent-222, score-0.307]
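Both metrics are straightforward to compute per (r, c) query; the NDCG sketch below uses the standard log-discounted form with binary relevance, one common variant that may differ in detail from the paper's gain function:

```python
import numpy as np

def ndcg_at(ranked_labels, i):
    """ranked_labels: 0/1 productivity of reactions in ranked order."""
    rel = np.asarray(ranked_labels, dtype=float)
    discount = 1.0 / np.log2(np.arange(2, len(rel) + 2))
    dcg = float((rel[:i] * discount[:i]).sum())
    ideal = np.sort(rel)[::-1]
    idcg = float((ideal[:i] * discount[:i]).sum())
    return dcg / idcg if idcg > 0 else 0.0

def within_n(ranked_labels, n):
    """True iff the shortest prefix holding all productive reactions
    contains at most n non-productive ones."""
    hits = [k for k, r in enumerate(ranked_labels) if r == 1]
    if not hits:
        return True
    return (hits[-1] + 1 - len(hits)) <= n
```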
76 The non-productive MO interactions vastly outnumber the productive interactions. [sent-223, score-0.273]
77 5% of the queries, the top ranked reaction is productive. [sent-226, score-0.589]
78 99.9% of queries recover all productive reactions by considering lists with at most four non-productive reactions. [sent-229, score-0.854]
79 4 Chemical applications The strong performance of the ranking system is exhibited by its ability to make accurate multi-step reaction predictions. [sent-255, score-0.706]
80 An example, shown in the first row of Table 3, is an intramolecular Claisen condensation reaction with conditions (room temperature, polar aprotic solvent) requiring three elementary steps. [sent-256, score-0.687]
81 The ranking method correctly predicts the given reaction as the highest ranked reaction at each step. [sent-257, score-1.258]
82 The first row shows an example of full multi-step reaction prediction by the ranking system, a three-step intramolecular Claisen condensation (room temperature, polar aprotic solvent). [sent-259, score-0.716]
83 At each stage, the reaction shown is the top ranked when all possible reactions are considered by the two stage machine learning system. [sent-261, score-1.136]
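Multi-step prediction can then be sketched as a simple loop: rank, apply the top reaction, and repeat. The pipeline and apply_reaction objects below are hypothetical stand-ins; a real implementation would also need a stopping criterion for when no productive reaction remains.

```python
def multistep_predict(reactants, conditions, pipeline, apply_reaction, max_steps=10):
    """Iteratively apply the top-ranked elementary reaction (hedged sketch)."""
    path, state = [], reactants
    for _ in range(max_steps):
        ranked = pipeline.rank(state, conditions)  # two-stage filter + rank
        if not ranked:
            break
        best = ranked[0]
        path.append(best)
        state = apply_reaction(state, best)  # caller-supplied molecular update
    return path
```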
84 These reactions lead to the formation of a seven-membered homocycle (7 carbons) on the left and a seven-membered heterocycle (6 carbons, 1 oxygen) on the right. [sent-263, score-0.595]
85 (Table 3 row labels: Multi-Step Reaction Prediction, Generality, Reasonable Errors.) A generalizable system should be able to make reasonable predictions about reactants and reaction types with which it has only had implicit, rather than explicit, experience. [sent-265, score-0.747]
86 Reaction Explorer, as a rule-based expert system without explicit rules about larger ring-forming reactions, does not make any predictions about seven- and eight-atom cyclizations. [sent-266, score-0.241]
87 In reality, though, larger ring-forming reactions are possible. [sent-267, score-0.603]
88 The second row of Table 3 shows the top two ranked reactions over a set of bromo-hept-1-en-2-olate reactants, leading to seven-membered ring formation. [sent-268, score-0.612]
89 The ranking model, without ever being trained with seven- or eight-membered ring-forming reactions, returns the enolate attack as the most favorable, but also returns the lone pair nucleophilic substitution as the second most favorable. [sent-269, score-0.233]
90 For example, the third row of Table 3 shows two reactions involving an oxonium compound and a bromide anion. [sent-275, score-0.58]
91 Our ranking models return these two reactions as the highest ranked, placing the deprotonation slightly ahead of the substitution. [sent-276, score-0.837]
92 This is considered a Within-1 ranking because the Reaction Explorer system labels only the substitution reaction as productive. [sent-277, score-0.706]
93 However, the immediate precursor reaction in the sequence of Reaction Explorer mechanisms leading to these reactants is the inverse of the deprotonation reaction, i.e., a protonation. [sent-278, score-0.721]
94 Hydrogen transfer reactions like this are reversible, and thus the deprotonation is likely the kinetically favored mechanism, i.e., a chemically reasonable prediction. [sent-281, score-0.613]
95 5 Conclusion Being able to predict the outcome of chemical reactions is a fundamental scientific problem. [sent-286, score-0.658]
96 The ultimate goal of a reaction prediction system is to recapitulate and eventually surpass the ability of human chemists. [sent-287, score-0.608]
97 In this work, we take a significant step in this direction, showing how to formulate reaction prediction as a machine learning problem and building an accurate implementation for a large and key subset of organic chemistry. [sent-288, score-0.68]
98 There are a number of immediate applications of our system, including validating retro-synthetic suggestions, generating virtual libraries of molecules, and mechanistically annotating existing reaction databases. [sent-289, score-0.584]
99 No electron left behind: a rule-based expert system to predict chemical reactions and reaction mechanisms. [sent-414, score-1.406]
100 Automated derivation of reaction rules for the EROS 6.0 system. [sent-423, score-0.546]
wordName wordTfidf (topN-words)
[('reactions', 0.547), ('reaction', 0.546), ('productive', 0.273), ('sink', 0.203), ('mechanistic', 0.194), ('reactants', 0.131), ('ranking', 0.123), ('electron', 0.115), ('chemical', 0.111), ('explorer', 0.109), ('organic', 0.109), ('atom', 0.108), ('cv', 0.097), ('ndcg', 0.087), ('source', 0.077), ('mos', 0.076), ('reactive', 0.062), ('mo', 0.058), ('electrons', 0.055), ('pharmacophore', 0.055), ('reactivity', 0.055), ('elementary', 0.053), ('expert', 0.05), ('percent', 0.05), ('bond', 0.048), ('chemistry', 0.048), ('molecule', 0.048), ('molecular', 0.046), ('deprotonation', 0.044), ('orbital', 0.044), ('solvation', 0.044), ('tnr', 0.044), ('ranked', 0.043), ('concerted', 0.038), ('mechanistically', 0.038), ('system', 0.037), ('overall', 0.036), ('queries', 0.034), ('reality', 0.034), ('transformation', 0.033), ('polar', 0.033), ('aprotic', 0.033), ('bonds', 0.033), ('bromide', 0.033), ('carbons', 0.033), ('generalizable', 0.033), ('lone', 0.033), ('physicochemical', 0.033), ('reagent', 0.033), ('srcreact', 0.033), ('attack', 0.031), ('transformations', 0.03), ('drug', 0.029), ('curation', 0.026), ('graph', 0.026), ('molecules', 0.025), ('string', 0.025), ('prediction', 0.025), ('internal', 0.024), ('suite', 0.024), ('seven', 0.024), ('composed', 0.023), ('potential', 0.023), ('tuple', 0.023), ('sd', 0.023), ('sigmoidal', 0.023), ('ring', 0.022), ('acs', 0.022), ('alkene', 0.022), ('alkyl', 0.022), ('anion', 0.022), ('carbocation', 0.022), ('chemistries', 0.022), ('chemoinformatics', 0.022), ('claisen', 0.022), ('dinger', 0.022), ('fnr', 0.022), ('gasteiger', 0.022), ('hnke', 0.022), ('hydrobromic', 0.022), ('intelligible', 0.022), ('intramolecular', 0.022), ('kinetically', 0.022), ('nudged', 0.022), ('orbitals', 0.022), ('pericyclic', 0.022), ('productivity', 0.022), ('radical', 0.022), ('sinkreact', 0.022), ('solvent', 0.022), ('stereoselective', 0.022), ('untapped', 0.022), ('site', 0.022), ('databases', 0.021), ('products', 0.02), ('list', 0.02), ('topological', 0.02), ('schr', 0.019), ('acid', 0.019), ('baldi', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 7 nips-2011-A Machine Learning Approach to Predict Chemical Reactions
Author: Matthew A. Kayala, Pierre F. Baldi
Abstract: Being able to predict the course of arbitrary chemical reactions is essential to the theory and applications of organic chemistry. Previous approaches are not highthroughput, are not generalizable or scalable, or lack sufficient data to be effective. We describe single mechanistic reactions as concerted electron movements from an electron orbital source to an electron orbital sink. We use an existing rule-based expert system to derive a dataset consisting of 2,989 productive mechanistic steps and 6.14 million non-productive mechanistic steps. We then pose identifying productive mechanistic steps as a ranking problem: rank potential orbital interactions such that the top ranked interactions yield the major products. The machine learning implementation follows a two-stage approach, in which we first train atom level reactivity filters to prune 94.0% of non-productive reactions with less than a 0.1% false negative rate. Then, we train an ensemble of ranking models on pairs of interacting orbitals to learn a relative productivity function over single mechanistic reactions in a given system. Without the use of explicit transformation patterns, the ensemble perfectly ranks the productive mechanisms at the top 89.1% of the time, rising to 99.9% of the time when top ranked lists with at most four nonproductive reactions are considered. The final system allows multi-step reaction prediction. Furthermore, it is generalizable, making reasonable predictions over reactants and conditions which the rule-based expert system does not handle.
2 0.092246778 22 nips-2011-Active Ranking using Pairwise Comparisons
Author: Kevin G. Jamieson, Robert Nowak
Abstract: This paper examines the problem of ranking a collection of objects using pairwise comparisons (rankings of two objects). In general, the ranking of n objects can be identified by standard sorting methods using n log2 n pairwise comparisons. We are interested in natural situations in which relationships among the objects may allow for ranking using far fewer pairwise comparisons. Specifically, we assume that the objects can be embedded into a d-dimensional Euclidean space and that the rankings reflect their relative distances from a common reference point in Rd . We show that under this assumption the number of possible rankings grows like n2d and demonstrate an algorithm that can identify a randomly selected ranking using just slightly more than d log n adaptively selected pairwise comparisons, on average. If instead the comparisons are chosen at random, then almost all pairwise comparisons must be made in order to identify any ranking. In addition, we propose a robust, error-tolerant algorithm that only requires that the pairwise comparisons are probably correct. Experimental studies with synthetic and real datasets support the conclusions of our theoretical analysis. 1
3 0.063825786 19 nips-2011-Active Classification based on Value of Classifier
Author: Tianshi Gao, Daphne Koller
Abstract: Modern classification tasks usually involve many class labels and can be informed by a broad range of features. Many of these tasks are tackled by constructing a set of classifiers, which are then applied at test time and then pieced together in a fixed procedure determined in advance or at training time. We present an active classification process at the test time, where each classifier in a large ensemble is viewed as a potential observation that might inform our classification process. Observations are then selected dynamically based on previous observations, using a value-theoretic computation that balances an estimate of the expected classification gain from each observation as well as its computational cost. The expected classification gain is computed using a probabilistic model that uses the outcome from previous observations. This active classification process is applied at test time for each individual test instance, resulting in an efficient instance-specific decision path. We demonstrate the benefit of the active scheme on various real-world datasets, and show that it can achieve comparable or even higher classification accuracy at a fraction of the computational costs of traditional methods.
4 0.057905767 75 nips-2011-Dynamical segmentation of single trials from population neural data
Author: Biljana Petreska, Byron M. Yu, John P. Cunningham, Gopal Santhanam, Stephen I. Ryu, Krishna V. Shenoy, Maneesh Sahani
Abstract: Simultaneous recordings of many neurons embedded within a recurrentlyconnected cortical network may provide concurrent views into the dynamical processes of that network, and thus its computational function. In principle, these dynamics might be identified by purely unsupervised, statistical means. Here, we show that a Hidden Switching Linear Dynamical Systems (HSLDS) model— in which multiple linear dynamical laws approximate a nonlinear and potentially non-stationary dynamical process—is able to distinguish different dynamical regimes within single-trial motor cortical activity associated with the preparation and initiation of hand movements. The regimes are identified without reference to behavioural or experimental epochs, but nonetheless transitions between them correlate strongly with external events whose timing may vary from trial to trial. The HSLDS model also performs better than recent comparable models in predicting the firing rate of an isolated neuron based on the firing rates of others, suggesting that it captures more of the “shared variance” of the data. Thus, the method is able to trace the dynamical processes underlying the coordinated evolution of network activity in a way that appears to reflect its computational role. 1
5 0.047355451 201 nips-2011-On the Completeness of First-Order Knowledge Compilation for Lifted Probabilistic Inference
Author: Guy Broeck
Abstract: Probabilistic logics are receiving a lot of attention today because of their expressive power for knowledge representation and learning. However, this expressivity is detrimental to the tractability of inference, when done at the propositional level. To solve this problem, various lifted inference algorithms have been proposed that reason at the first-order level, about groups of objects as a whole. Despite the existence of various lifted inference approaches, there are currently no completeness results about these algorithms. The key contribution of this paper is that we introduce a formal definition of lifted inference that allows us to reason about the completeness of lifted inference algorithms relative to a particular class of probabilistic models. We then show how to obtain a completeness result using a first-order knowledge compilation approach for theories of formulae containing up to two logical variables. 1 Introduction and related work Probabilistic logic models build on first-order logic to capture relational structure and on graphical models to represent and reason about uncertainty [1, 2]. Due to their expressivity, these models can concisely represent large problems with many interacting random variables. While the semantics of these logics is often defined through grounding the models [3], performing inference at the propositional level is – as for first-order logic – inefficient. This has motivated the quest for lifted inference methods that exploit the structure of probabilistic logic models for efficient inference, by reasoning about groups of objects as a whole and avoiding repeated computations. The first approaches to exact lifted inference have upgraded the variable elimination algorithm to the first-order level [4, 5, 6]. More recent work is based on methods from logical inference [7, 8, 9, 10], such as knowledge compilation. While these approaches often yield dramatic improvements in runtime over propositional inference methods on specific problems, it is still largely unclear for which classes of models these lifted inference operators will be useful and for which ones they will eventually have to resort to propositional inference. One notable exception in this regard is lifted belief propagation [11], which performs exact lifted inference on any model whose factor graph representation is a tree. A first contribution of this paper is that we introduce a notion of domain lifted inference, which formally defines what lifting means, and which can be used to characterize the classes of probabilistic models to which lifted inference applies. Domain lifted inference essentially requires that probabilistic inference runs in polynomial time in the domain size of the logical variables appearing in the model. As a second contribution we show that the class of models expressed as 2-WFOMC formulae (weighted first-order model counting with up to 2 logical variables per formula) can be domain lifted using an extended first-order knowledge compilation approach [10]. The resulting approach allows for lifted inference even in the presence of (anti-) symmetric or total relations in a theory. These are extremely common and useful concepts that cannot be lifted by any of the existing first-order knowledge compilation inference rules. 1 2 Background We will use standard concepts of function-free first-order logic (FOL). An atom p(t1 , . . . , tn ) consists of a predicate p/n of arity n followed by n arguments, which are either constants or logical variables. 
An atom is ground if it does not contain any variables. A literal is an atom a or its negation ¬a. A clause is a disjunction l1 ∨ ... ∨ lk of literals. If k = 1, it is a unit clause. An expression is an atom, literal or clause. The pred(a) function maps an atom to its predicate and the vars(e) function maps an expression to its logical variables. A theory in conjunctive normal form (CNF) is a conjunction of clauses. We often represent theories by their set of clauses and clauses by their set of literals. Furthermore, we will assume that all logical variables are universally quantified. In addition, we associate a set of constraints with each clause or atom, either of the form X = t, where X is a logical variable and t is a constant or variable, or of the form X ∈ D, where D is a domain, or the negation of these constraints. These define a finite domain for each logical variable. Abusing notation, we will use constraints of the form X = t to denote a substitution of X by t. The function atom(e) maps an expression e to its atoms, now associating the constraints on e with each atom individually. To add the constraint c to an expression e, we use the notation e ∧ c. Two atoms unify if there is a substitution which makes them identical and if the conjunction of the constraints on both atoms with the substitution is satisfiable. Two expressions e1 and e2 are independent, written e1 ⊥ e2 , if no atom a1 ∈ atom(e1 ) unifies with an atom a2 ∈ atom(e2 ). ⊥ We adopt the Weighted First-Order Model Counting (WFOMC) [10] formalism to represent probabilistic logic models, building on the notion of a Herbrand interpretation. Herbrand interpretations are subsets of the Herbrand base HB (T ), which consists of all ground atoms that can be constructed with the available predicates and constant symbols in T . The atoms in a Herbrand interpretation are assumed to be true. All other atoms in HB (T ) are assumed to be false. An interpretation I satisfies a theory T , written as I |= T , if it satisfies all the clauses c ∈ T . The WFOMC problem is defined on a weighted logic theory T , which is a logic theory augmented with a positive weight function w and a negative weight function w, which assign a weight to each predicate. The WFOMC problem involves computing wmc(T, w, w) = w(pred(a)) I|=T a∈I 3 3.1 w(pred(a)). (1) a∈HB(T )\I First-order knowledge compilation for lifted probabilistic inference Lifted probabilistic inference A first-order probabilistic model defines a probability distribution P over the set of Herbrand interpretations H. Probabilistic inference in these models is concerned with computing the posterior probability P(q|e) of query q given evidence e, where q and e are logical expressions in general: P(q|e) = h∈H,h|=q∧e P(h) h∈H,h|=e P(h) (2) We propose one notion of lifted inference for first-order probabilistic models, defined in terms of the computational complexity of inference w.r.t. the domains of logical variables. It is clear that other notions of lifted inference are conceivable, especially in the case of approximate inference. Definition 1 (Domain Lifted Probabilistic Inference). A probabilistic inference procedure is domain lifted for a model m, query q and evidence e iff the inference procedure runs in polynomial time in |D1 |, . . . , |Dk | with Di the domain of the logical variable vi ∈ vars(m, q, e). 
Domain lifted inference does not prohibit the algorithm to be exponential in the size of the vocabulary, that is, the number of predicates, arguments and constants, of the probabilistic model, query and evidence. In fact, the definition allows inference to be exponential in the number of constants which occur in arguments of atoms in the theory, query or evidence, as long as it is polynomial in the cardinality of the logical variable domains. This definition of lifted inference stresses the ability to efficiently deal with the domains of the logical variables that arise, regardless of their size, and formalizes what seems to be generally accepted in the lifted inference literature. 2 A class of probabilistic models is a set of probabilistic models expressed in a particular formalism. As examples, consider Markov logic networks (MLN) [12] or parfactors [4], or the weighted FOL theories for WFOMC that we introduced above, when the weights are normalized. Definition 2 (Completeness). Restricting queries to atoms and evidence to a conjunction of literals, a procedure that is domain lifted for all probabilistic models m in a class of models M and for all queries q and evidence e, is called complete for M . 3.2 First-order knowledge compilation First-order knowledge compilation is an approach to lifted probabilistic inference consisting of the following three steps (see Van den Broeck et al. [10] for details): 1. Convert the probabilistic logical model to a weighted CNF. Converting MLNs or parfactors requires adding new atoms to the theory that represent the (truth) value of each factor or formula. set-disjunction 2 friends(X, Y ) ∧ smokes(X) ⇒ smokes(Y ) Smokers ⊆ People decomposable conjunction unit clause leaf (a) MLN Model ∧ smokes(X), X ∈ Smokers smokes(Y ) ∨ ¬ smokes(X) ∨¬ friends(X, Y ) ∨ ¬ f(X, Y ) friends(X, Y ) ∨ f(X, Y ) smokes(X) ∨ f(X, Y ) ¬ smokes(Y ) ∨ f(X, Y ). ∧ f(X, Y ), Y ∈ Smokers ∧ ¬ smokes(Y ), Y ∈ Smokers / ∧ f(X, Y ), X ∈ Smokers, Y ∈ Smokers / / x ∈ Smokers (b) CNF Theory deterministic disjunction Predicate friends smokes f w 1 1 e2 w 1 1 1 y ∈ Smokers / ∨ set-conjunction ∧ f(x, y) (c) Weight Functions ¬ friends(x, y) ∧ friends(x, y) ¬ f(x, y) (d) First-Order d-DNNF Circuit Figure 1: Friends-smokers example (taken from [10]) Example 1. The MLN in Figure 1a assigns a weight to a formula in FOL. Figure 1b represents the same model as a weighted CNF, introducing a new atom f(X, Y ) to encode the truth value of the MLN formula. The probabilistic information is captured by the weight functions in Figure 1c. 2. Compile the logical theory into a First-Order d-DNNF (FO d-DNNF) circuit. Figure 1d shows an example of such a circuit. Leaves represent unit clauses. Inner nodes represent the disjunction or conjunction of their children l and r, but with the constraint that disjunctions must be deterministic (l ∧ r is unsatisfiable) and conjunctions must be decomposable (l ⊥ r). ⊥ 3. Perform WFOMC inference to compute posterior probabilities. In a FO d-DNNF circuit, WFOMC is polynomial in the size of the circuit and the cardinality of the domains. To compile the CNF theory into a FO d-DNNF circuit, Van den Broeck et al. [10] propose a set of compilation rules, which we will refer to as CR 1 . We will now briefly describe these rules. Unit Propagation introduces a decomposable conjunction when the theory contains a unit clause. Independence creates a decomposable conjunction when the theory contains independent subtheories. 
Shannon decomposition applies when the theory contains ground atoms and introduces a deterministic disjunction between two modified theories: one where the ground atom is true, and one where it is false. Shattering splits clauses in the theory until all pairs of atoms represent either a disjoint or identical set of ground atoms. Example 2. In Figure 2a, the first two clauses are made independent from the friends(X, X) clause and split off in a decomposable conjunction by unit propagation. The unit clause becomes a leaf of the FO d-DNNF circuit, while the other operand requires further compilation. 3 dislikes(X, Y ) ∨ friends(X, Y ) fun(X) ∨ ¬ friends(X, Y ) friends(X, Y ) ∨ dislikes(X, Y ) ¬ friends(X, Y ) ∨ likes(X, Y ) friends(X, X) fun(X) ∨ ¬ friends(X, Y ) fun(X) ∨ ¬ friends(Y, X) ∧ FunPeople ⊆ People x ∈ People friends(X, X) friends(X, Y ) ∨ dislikes(X, Y ), X = Y ¬ friends(X, Y ) ∨ likes(X, Y ), X = Y likes(X, X) dislikes(x, Y ) ∨ friends(x, Y ) fun(x) ∨ ¬ friends(x, Y ) fun(X), X ∈ FunPeople ¬ fun(X), X ∈ FunPeople / fun(X) ∨ ¬ friends(X, Y ) fun(X) ∨ ¬ friends(Y, X) (a) Unit propagation of friends(X, X) (b) Independent partial grounding (c) Atom counting of fun(X) Figure 2: Examples of compilation rules. Circles are FO d-DNNF inner nodes. White rectangles show theories before and after applying the rule. All variable domains are People. (taken from [10]) Independent Partial Grounding creates a decomposable conjunction over a set of child circuits, which are identical up to the value of a grounding constant. Since they are structurally identical, only one child circuit is actually compiled. Atom Counting applies when the theory contains an atom with a single logical variable X ∈ D. It explicitly represents the domain D ⊆ D of X for which the atom is true. It compiles the theory into a deterministic disjunction between all possible such domains. Again, these child circuits are identical up to the value of D and only one is compiled. Example 3. The theory in Figure 2b is compiled into a decomposable set-conjunction of theories that are independent and identical up to the value of the x constant. The theory in Figure 2c contains an atom with one logical variable: fun(X). Atom counting compiles it into a deterministic setdisjunction over theories that differ in FunPeople, which is the domain of X for which fun(X) is true. Subsequent steps of unit propagation remove the fun(X) atoms from the theory entirely. 3.3 Completeness We will now characterize those theories where the CR 1 compilation rules cannot be used, and where the inference procedure has to resort to grounding out the theory to propositional logic. For these, first-order knowledge compilation using CR 1 is not yet domain lifted. When a logical theory contains symmetric, anti-symmetric or total relations, such as friends(X, Y ) ⇒ friends(Y, X), parent(X, Y ) ⇒ ¬ parent(Y, X), X = Y, ≤ (X, Y) ∨ ≤ (Y, X), or more general formulas, such as enemies(X, Y ) ⇒ ¬ friend(X, Y ) ∧ ¬ friend(Y, X), (3) (4) (5) (6) none of the CR 1 rules apply. Intuitively, the underlying problem is the presence of either: • Two unifying (not independent) atoms in the same clause which contain the same logical variable in different positions of the argument list. Examples include (the CNF of) Formulas 3, 4 and 5, where the X and Y variable are bound by unifying two atoms from the same clause. • Two logical variables that bind when unifying one pair of atoms but appear in different positions of the argument list of two other unifying atoms. 
Examples include Formula 6, which in CNF is ¬ friend(X, Y ) ∨ ¬ enemies(X, Y ) ¬ friend(Y, X) ∨ ¬ enemies(X, Y ) Here, unifying the enemies(X, Y ) atoms binds the X variables from both clauses, which appear in different positions of the argument lists of the unifying atoms friend(X, Y ) and friend(Y, X). Both of these properties preclude the use of CR 1 rules. Also in the context of other model classes, such as MLNs, probabilistic versions of the above formulas cannot be processed by CR 1 rules. 4 Even though first-order knowledge compilation with CR 1 rules does not have a clear completeness result, we can show some properties of theories to which none of the compilation rules apply. First, we need to distinguish between the arity of an atom and its dimension. A predicate with arity two might have atoms with dimension one, when one of the arguments is ground or both are identical. Definition 3 (Dimension of an Expression). The dimension of an expression e is the number of logical variables it contains: dim(e) = | vars(e)|. Lemma 1 (CR 1 Postconditions). The CR 1 rules remove all atoms from the theory T which have zero or one logical variable arguments, such that afterwards ∀a ∈ atom(T ) : dim(a) > 1. When no CR 1 rule applies, the theory is shattered and contains no independent subtheories. Proof. Ground atoms are removed by the Shannon decomposition operator followed by unit propagation. Atoms with a single logical variable (including unary relations) are removed by the atom counting operator followed by unit propagation. If T contains independent subtheories, the independence operator can be applied. Shattering is always applied when T is not yet shattered. 4 Extending first-order knowledge compilation In this section we introduce a new operator which does apply to the theories from Section 3.3. 4.1 Logical variable properties To formally define the operator we propose, and prove its correctness, we first introduce some mathematical concepts related to the logical variables in a theory (partly after Jha et al. [8]). Definition 4 (Binding Variables). Two logical variables X, Y are directly binding b(X, Y ) if they are bound by unifying a pair of atoms in the theory. The binding relationship b+ (X, Y ) is the transitive closure of the directly binding relation b(X, Y ). Example 4. In the theory ¬ p(W, X) ∨ ¬ q(X) r(Y ) ∨ ¬ q(Y ) ¬ r(Z) ∨ s(Z) the variable pairs (X, Y ) and (Y, Z) are directly binding. The variables X, Y and Z are binding. Variable W does not bind to any other variable. Note that the binding relationship b+ (X, Y ) is an equivalence relation that defines two equivalence classes: {X, Y, Z} and {W }. Lemma 2 (Binding Domains). After shattering, binding logical variables have identical domains. Proof. During shattering (see Section 3.2), when two atoms unify, binding two variables with partially overlapping domains, the atoms’ clauses are split up into clauses where the domain of the variables is identical, and clauses where the domains are disjoint and the atoms no longer unify. Definition 5 (Root Binding Class). A root variable is a variable that appears in all the atoms in its clause. A root binding class is an equivalence class of binding variables where all variables are root. Example 5. In the theory of Example 4, {X, Y, Z} is a root binding class and {W } is not. 4.2 Domain recursion We will now introduce the new domain recursion operator, starting with its preconditions. Definition 6. 
A theory allows for domain recursion when (i) the theory is shattered, (ii) the theory contains no independent subtheories and (iii) there exists a root binding class. From now on, we will denote with C the set of clauses of the theory at hand and with B a root binding class guaranteed to exist if C allows for domain recursion. Lemma 2 states that all variables in B have identical domains. We will denote the domain of these variables with D. The intuition behind the domain recursion operator is that it modifies D by making one element explicit: D = D ∪ {xD } with xD ∈ D . This explicit domain element is introduced by the S PLIT D / function, which splits clauses w.r.t. the new subdomain D and element xD . 5 Definition 7 (S PLIT D). For a clause c and given set of variables Vc ⊆ vars(c) with domain D, let S PLIT D(c, Vc ) = c, if Vc = ∅ S PLIT D(c1 , Vc \ {V }) ∪ S PLIT D(c2 , Vc \ {V }), if Vc = ∅ (7) where c1 = c ∧ (V = xD ) and c2 = c ∧ (V = xD ) ∧ (V ∈ D ) for some V ∈ Vc . For a set of clauses C and set of variables V with domain D: S PLIT D(C, V) = c∈C S PLIT D(c, V ∩ vars(c)). The domain recursion operator creates three sets of clauses: S PLIT D(C, B) = Cx ∪ Cv ∪ Cr , with Cx = {c ∧ (V = xD )|c ∈ C}, (8) V ∈B∩vars(c) Cv = {c ∧ (V = xD ) ∧ (V ∈ D )|c ∈ C}, (9) V ∈B∩vars(c) Cr = S PLIT D(C, B) \ Cx \ Cv . (10) Proposition 3. The conjunction of the domain recursion sets is equivalent to the original theory: c∈Cr c . c∈Cv c ∧ c∈C c ≡ c∈S PLIT D(C,B) c and therefore c∈C c ≡ c∈Cx c ∧ We will now show that these sets are independent and that their conjunction is decomposable. Theorem 4. The theories Cx , Cv and Cr are independent: Cx ⊥ Cv , Cx ⊥ Cr and Cv ⊥ Cr . ⊥ ⊥ ⊥ The proof of Theorem 4 relies on the following Lemma. Lemma 5. If the theory allows for domain recursion, all clauses and atoms contain the same number of variables from B: ∃n, ∀c ∈ C, ∀a ∈ atom(C) : | vars(c) ∩ B | = | vars(a) ∩ B | = n. c Proof. Denote with Cn the clauses in C that contain n logical variables from B and with Cn its compliment in C. If C is nonempty, there is a n > 0 for which Cn is nonempty. Then every atom in Cn contains exactly n variables from B (Definition 5). Since the theory contains no independent c c subtheories, there must be an atom a in Cn which unifies with an atom ac in Cn , or Cn is empty. After shattering, all unifications bind one variable from a to a single variable from ac . Because a contains exactly n variables from B, ac must also contain exactly n (Definition 4), and because B is c a root binding class, the clause of ac also contains exactly n, which contradicts the definition of Cn . c Therefore, Cn is empty, and because the variables in B are root, they also appear in all atoms. Proof of Theorem 4. From Lemma 5, all atoms in C contain the same number of variables from B. In Cx , these variables are all constrained to be equal to xD , while in Cv and Cr at least one variable is constrained to be different from xD . An attempt to unify an atom from Cx with an atom from Cv or Cr therefore creates an unsatisfiable set of constraints. Similarly, atoms from Cv and Cr cannot be unified. Finally, we extend the FO d-DNNF language proposed in Van den Broeck et al. [10] with a new node, the recursive decomposable conjunction ∧ r , and define the domain recursion compilation rule. Definition 8 ( ∧ r ). The FO d-DNNF node ∧ r (nx , nr , D, D , V) represents a decomposable conjunction between the d-DNNF nodes nx , nr and a d-DNNF node isomorphic to the ∧ r node itself. 
In particular, the isomorphic operand is identical to the node itself, except for the size of the domain of the variables in V, which becomes one smaller, going from D to D in the isomorphic operand. We have shown that the conjunction between sets Cx , Cv and Cr is decomposable (Theorem 4) and logically equivalent to the original theory (Proposition 3). Furthermore, Cv is identical to C, up to the constraints on the domain of the variables in B. This leads us to the following definition of domain recursion. Definition 9 (Domain Recursion). The domain recursion compilation rule compiles C into ∧ r (nx , nr , D, D , B), where nx , nr are the compiled circuits for Cx , Cr . The third set Cv is represented by the recursion on D, according to Definition 8. 6 nv Cr ∧r ¬ friends(x, X) ∨ friends(X, x), X = x ¬ friends(X, x) ∨ friends(x, X), X = x P erson ← P erson \ {x} nr nx ∨ Cx ¬ friends(x, x) ∨ friends(x, x) x ∈P erson x =x ¬ friends(x, x) friends(x, x) ∨ ∧ ¬ friends(x, x ) ¬ friends(x , x) ∧ friends(x, x ) friends(x , x) Figure 3: Circuit for the symmetric relation in Equation 3, rooted in a recursive conjunction. Example 6. Figure 3 shows the FO d-DNNF circuit for Equation 3. The theory is split up into three independent theories: Cr and Cx , shown in the Figure 3, and Cv = {¬ friends(X, Y ) ∨ friends(Y, X), X = x, Y = x}. The conjunction of these theories is equivalent to Equation 3. Theory Cv is identical to Equation 3, up to the inequality constraints on X and Y . Theorem 6. Given a function size, which maps domains to their size, the weighted first-order model count of a ∧ r (nx , nr , D, D , V) node is size(D) wmc( ∧ r (nx , nr , D, D , V), size) = wmc(nx , size)size(D) wmc(nr , size ∪{D → s}), s=0 (11) where size ∪{D → s} adds to the size function that the subdomain D has cardinality s. Proof. If C allows for domain recursion, due to Theorem 4, the weighted model count is wmc(C, size) = 1, if size(D) = 0 wmc(Cx ) · wmc(Cv , size ) · wmc(Cr , size ) if size(D) > 0 (12) where size = size ∪{D → size(D) − 1}. Theorem 7. The Independent Partial Grounding compilation rule is a special case of the domain recursion rule, where ∀c ∈ C : | vars(c) ∩ B | = 1 (and therefore Cr = ∅). 4.3 Completeness In this section, we introduce a class of models for which first-order knowledge compilation with domain recursion is complete. Definition 10 (k-WFOMC). The class of k-WFOMC consist of WFOMC theories with clauses that have up to k logical variables. A first completeness result is for 2-WFOMC, using the set of knowledge compilation rules CR 2 , which are the rules in CR 1 extended with domain recursion. Theorem 8 (Completeness for 2-WFOMC). First-order knowledge compilation using the CR 2 compilation rules is a complete domain lifted probabilistic inference algorithm for 2-WFOMC. Proof. From Lemma 1, after applying the CR 1 rules, the theory contains only atoms with dimension larger than or equal to two. From Definition 10, each clause has dimension smaller than or equal to two. Therefore, each logical variable in the theory is a root variable and according to Definition 5, every equivalence class of binding variables is a root binding class. Because of Lemma 1, the theory allows for domain recursion, which requires further compilation of two theories: Cx and Cr into nx and nr . Both have dimension smaller than 2 and can be lifted by CR 1 compilation rules. The properties of 2-WFOMC are a sufficient but not necessary condition for first-order knowledge compilation to be domain lifted. 
We can obtain a similar result for MLNs or parfactors by reducing them to a WFOMC problem. If an MLN contains only formulae with up to k logical variables, then its WFOMC representation will be in k-WFOMC. 7 This result for 2-WFOMC is not trivial. Van den Broeck et al. [10] showed in their experiments that counting first-order variable elimination (C-FOVE) [6] fails to lift the “Friends Smoker Drinker” problem, which is in 2-WFOMC. We will show in the next section that the CR 1 rules fail to lift the theory in Figure 4a, which is in 2-WFOMC. Note that there are also useful theories that are not in 2-WFOMC, such as those containing the transitive relation friends(X, Y ) ∧ friends(Y, Z) ⇒ friends(X, Z). 5 Empirical evaluation To complement the theoretical results of the previous section, we extended the WFOMC implementation1 with the domain recursion rule. We performed experiments with the theory in Figure 4a, which is a version of the friends and smokers model [11] extended with the symmetric relation of Equation 3. We evaluate the performance querying P(smokes(bob)) with increasing domain size, comparing our approach to the existing WFOMC implementation and its propositional counterpart, which first grounds the theory and then compiles it with the c2d compiler [13] to a propositional d-DNNF circuit. We did not compare to C-FOVE [6] because it cannot perform lifted inference on this model. 2 smokes(X) ∧ friends(X, Y ) ⇒ smokes(Y ) friends(X, Y ) ⇒ friends(Y, X). Runtime [s] Propositional inference quickly becomes intractable when there are more than 20 people. The lifted inference algorithms scale much better. The CR 1 rules can exploit some regularities in the model. For example, they eliminate all the smokes(X) atoms from the theory. They do, however, resort to grounding at a later stage of the compilation process. With the domain recursion rule, there is no need for grounding. This advantage is clear in the experiments, our approach having an almost constant inference time in this range of domains sizes. Note that the runtimes for c2d include compilation and evaluation of the circuit, whereas the WFOMC runtimes only represent evaluation of the FO d-DNNF. After all, propositional compilation depends on the domain size but first-order compilation does not. First-order compilation takes a constant two seconds for both rule sets. 10000 1000 100 10 1 0.1 0.01 c2d WFOMC - CR1 WFOMC - CR2 10 20 30 40 50 60 Number of People 70 80 (b) Evaluation Runtime (a) MLN Model Figure 4: Symmetric friends and smokers experiment, comparing propositional knowledge compilation (c2d) to WFOMC using compilation rules CR 1 and CR 2 (which includes domain recursion). 6 Conclusions We proposed a definition of complete domain lifted probabilistic inference w.r.t. classes of probabilistic logic models. This definition considers algorithms to be lifted if they are polynomial in the size of logical variable domains. Existing first-order knowledge compilation turns out not to admit an intuitive completeness result. Therefore, we generalized the existing Independent Partial Grounding compilation rule to the domain recursion rule. With this one extra rule, we showed that first-order knowledge compilation is complete for a significant class of probabilistic logic models, where the WFOMC representation has up to two logical variables per clause. Acknowledgments The author would like to thank Luc De Raedt, Jesse Davis and the anonymous reviewers for valuable feedback. 
This work was supported by the Research Foundation-Flanders (FWO-Vlaanderen). 1 http://dtai.cs.kuleuven.be/wfomc/ 8 References [1] Lise Getoor and Ben Taskar, editors. An Introduction to Statistical Relational Learning. MIT Press, 2007. [2] Luc De Raedt, Paolo Frasconi, Kristian Kersting, and Stephen Muggleton, editors. Probabilistic inductive logic programming: theory and applications. Springer-Verlag, Berlin, Heidelberg, 2008. [3] Daan Fierens, Guy Van den Broeck, Ingo Thon, Bernd Gutmann, and Luc De Raedt. Inference in probabilistic logic programs using weighted CNF’s. In Proceedings of UAI, pages 256–265, 2011. [4] David Poole. First-order probabilistic inference. In Proceedings of IJCAI, pages 985–991, 2003. [5] Rodrigo de Salvo Braz, Eyal Amir, and Dan Roth. Lifted first-order probabilistic inference. In Proceedings of IJCAI, pages 1319–1325, 2005. [6] Brian Milch, Luke S. Zettlemoyer, Kristian Kersting, Michael Haimes, and Leslie Pack Kaelbling. Lifted Probabilistic Inference with Counting Formulas. In Proceedings of AAAI, pages 1062–1068, 2008. [7] Vibhav Gogate and Pedro Domingos. Exploiting Logical Structure in Lifted Probabilistic Inference. In Proceedings of StarAI, 2010. [8] Abhay Jha, Vibhav Gogate, Alexandra Meliou, and Dan Suciu. Lifted Inference Seen from the Other Side: The Tractable Features. In Proceedings of NIPS, 2010. [9] Vibhav Gogate and Pedro Domingos. Probabilistic theorem proving. In Proceedings of UAI, pages 256–265, 2011. [10] Guy Van den Broeck, Nima Taghipour, Wannes Meert, Jesse Davis, and Luc De Raedt. Lifted Probabilistic Inference by First-Order Knowledge Compilation. In Proceedings of IJCAI, pages 2178–2185, 2011. [11] Parag Singla and Pedro Domingos. Lifted first-order belief propagation. In Proceedings of AAAI, pages 1094–1099, 2008. [12] Matthew Richardson and Pedro Domingos. Markov logic networks. Machine Learning, 62(1): 107–136, 2006. [13] Adnan Darwiche. New advances in compiling CNF to decomposable negation normal form. In Proceedings of ECAI, pages 328–332, 2004. 9
6 0.038752589 12 nips-2011-A Two-Stage Weighting Framework for Multi-Source Domain Adaptation
7 0.033117011 291 nips-2011-Transfer from Multiple MDPs
8 0.031977706 205 nips-2011-Online Submodular Set Cover, Ranking, and Repeated Active Learning
9 0.030935887 53 nips-2011-Co-Training for Domain Adaptation
10 0.030574227 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding
11 0.029971175 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
12 0.026508909 219 nips-2011-Predicting response time and error rates in visual search
13 0.02531627 157 nips-2011-Learning to Search Efficiently in High Dimensions
14 0.024979319 155 nips-2011-Learning to Agglomerate Superpixel Hierarchies
15 0.024650529 169 nips-2011-Maximum Margin Multi-Label Structured Prediction
16 0.024016282 304 nips-2011-Why The Brain Separates Face Recognition From Object Recognition
17 0.023975685 40 nips-2011-Automated Refinement of Bayes Networks' Parameters based on Test Ordering Constraints
18 0.02299956 231 nips-2011-Randomized Algorithms for Comparison-based Search
19 0.022308409 146 nips-2011-Learning Higher-Order Graph Structure with Features by Structure Penalty
20 0.021585463 250 nips-2011-Shallow vs. Deep Sum-Product Networks
topicId topicWeight
[(0, 0.081), (1, 0.017), (2, -0.006), (3, 0.032), (4, 0.016), (5, -0.013), (6, -0.022), (7, -0.039), (8, -0.022), (9, -0.021), (10, -0.028), (11, -0.025), (12, -0.007), (13, 0.021), (14, 0.05), (15, -0.006), (16, -0.024), (17, 0.04), (18, 0.021), (19, -0.007), (20, 0.008), (21, 0.056), (22, -0.003), (23, 0.014), (24, -0.048), (25, 0.002), (26, 0.006), (27, -0.004), (28, -0.04), (29, -0.025), (30, 0.085), (31, -0.089), (32, -0.002), (33, -0.006), (34, -0.051), (35, -0.069), (36, -0.033), (37, 0.01), (38, -0.047), (39, -0.015), (40, -0.003), (41, -0.023), (42, -0.01), (43, -0.023), (44, -0.025), (45, 0.119), (46, -0.045), (47, 0.064), (48, -0.104), (49, -0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.90680349 7 nips-2011-A Machine Learning Approach to Predict Chemical Reactions
Author: Matthew A. Kayala, Pierre F. Baldi
2 0.4746899 279 nips-2011-Target Neighbor Consistent Feature Weighting for Nearest Neighbor Classification
Author: Ichiro Takeuchi, Masashi Sugiyama
Abstract: We consider feature selection and weighting for nearest neighbor classifiers. A technical challenge in this scenario is how to cope with discrete update of nearest neighbors when the feature space metric is changed during the learning process. This issue, called the target neighbor change, was not properly addressed in the existing feature weighting and metric learning literature. In this paper, we propose a novel feature weighting algorithm that can exactly and efficiently keep track of the correct target neighbors via sequential quadratic programming. To the best of our knowledge, this is the first algorithm that guarantees the consistency between target neighbors and the feature space metric. We further show that the proposed algorithm can be naturally combined with regularization path tracking, allowing computationally efficient selection of the regularization parameter. We demonstrate the effectiveness of the proposed algorithm through experiments.
3 0.47101173 136 nips-2011-Inverting Grice's Maxims to Learn Rules from Natural Language Extractions
Author: Mohammad S. Sorower, Janardhan R. Doppa, Walker Orr, Prasad Tadepalli, Thomas G. Dietterich, Xiaoli Z. Fern
Abstract: We consider the problem of learning rules from natural language text sources. These sources, such as news articles and web texts, are created by a writer to communicate information to a reader, where the writer and reader share substantial domain knowledge. Consequently, the texts tend to be concise and mention the minimum information necessary for the reader to draw the correct conclusions. We study the problem of learning domain knowledge from such concise texts, which is an instance of the general problem of learning in the presence of missing data. However, unlike standard approaches to missing data, in this setting we know that facts are more likely to be missing from the text in cases where the reader can infer them from the facts that are mentioned combined with the domain knowledge. Hence, we can explicitly model this “missingness” process and invert it via probabilistic inference to learn the underlying domain knowledge. This paper introduces a mention model that models the probability of facts being mentioned in the text based on what other facts have already been mentioned and domain knowledge in the form of Horn clause rules. Learning must simultaneously search the space of rules and learn the parameters of the mention model. We accomplish this via an application of Expectation Maximization within a Markov Logic framework. An experimental evaluation on synthetic and natural text data shows that the method can learn accurate rules and apply them to new texts to make correct inferences. Experiments also show that the method out-performs the standard EM approach that assumes mentions are missing at random.
4 0.46349278 201 nips-2011-On the Completeness of First-Order Knowledge Compilation for Lifted Probabilistic Inference
Author: Guy Van den Broeck
Abstract: Probabilistic logics are receiving a lot of attention today because of their expressive power for knowledge representation and learning. However, this expressivity is detrimental to the tractability of inference, when done at the propositional level. To solve this problem, various lifted inference algorithms have been proposed that reason at the first-order level, about groups of objects as a whole. Despite the existence of various lifted inference approaches, there are currently no completeness results about these algorithms. The key contribution of this paper is that we introduce a formal definition of lifted inference that allows us to reason about the completeness of lifted inference algorithms relative to a particular class of probabilistic models. We then show how to obtain a completeness result using a first-order knowledge compilation approach for theories of formulae containing up to two logical variables.
1 Introduction and related work
Probabilistic logic models build on first-order logic to capture relational structure and on graphical models to represent and reason about uncertainty [1, 2]. Due to their expressivity, these models can concisely represent large problems with many interacting random variables. While the semantics of these logics is often defined through grounding the models [3], performing inference at the propositional level is – as for first-order logic – inefficient. This has motivated the quest for lifted inference methods that exploit the structure of probabilistic logic models for efficient inference, by reasoning about groups of objects as a whole and avoiding repeated computations. The first approaches to exact lifted inference have upgraded the variable elimination algorithm to the first-order level [4, 5, 6]. More recent work is based on methods from logical inference [7, 8, 9, 10], such as knowledge compilation. While these approaches often yield dramatic improvements in runtime over propositional inference methods on specific problems, it is still largely unclear for which classes of models these lifted inference operators will be useful and for which ones they will eventually have to resort to propositional inference. One notable exception in this regard is lifted belief propagation [11], which performs exact lifted inference on any model whose factor graph representation is a tree. A first contribution of this paper is that we introduce a notion of domain lifted inference, which formally defines what lifting means, and which can be used to characterize the classes of probabilistic models to which lifted inference applies. Domain lifted inference essentially requires that probabilistic inference runs in polynomial time in the domain size of the logical variables appearing in the model. As a second contribution we show that the class of models expressed as 2-WFOMC formulae (weighted first-order model counting with up to 2 logical variables per formula) can be domain lifted using an extended first-order knowledge compilation approach [10]. The resulting approach allows for lifted inference even in the presence of (anti-) symmetric or total relations in a theory. These are extremely common and useful concepts that cannot be lifted by any of the existing first-order knowledge compilation inference rules.
2 Background
We will use standard concepts of function-free first-order logic (FOL). An atom p(t1, ..., tn) consists of a predicate p/n of arity n followed by n arguments, which are either constants or logical variables.
An atom is ground if it does not contain any variables. A literal is an atom a or its negation ¬a. A clause is a disjunction l1 ∨ ... ∨ lk of literals. If k = 1, it is a unit clause. An expression is an atom, literal or clause. The pred(a) function maps an atom to its predicate and the vars(e) function maps an expression to its logical variables. A theory in conjunctive normal form (CNF) is a conjunction of clauses. We often represent theories by their set of clauses and clauses by their set of literals. Furthermore, we will assume that all logical variables are universally quantified. In addition, we associate a set of constraints with each clause or atom, either of the form X = t, where X is a logical variable and t is a constant or variable, or of the form X ∈ D, where D is a domain, or the negation of these constraints. These define a finite domain for each logical variable. Abusing notation, we will use constraints of the form X = t to denote a substitution of X by t. The function atom(e) maps an expression e to its atoms, now associating the constraints on e with each atom individually. To add the constraint c to an expression e, we use the notation e ∧ c. Two atoms unify if there is a substitution which makes them identical and if the conjunction of the constraints on both atoms with the substitution is satisfiable. Two expressions e1 and e2 are independent, written e1 ⊥ e2, if no atom a1 ∈ atom(e1) unifies with an atom a2 ∈ atom(e2). We adopt the Weighted First-Order Model Counting (WFOMC) [10] formalism to represent probabilistic logic models, building on the notion of a Herbrand interpretation. Herbrand interpretations are subsets of the Herbrand base HB(T), which consists of all ground atoms that can be constructed with the available predicates and constant symbols in T. The atoms in a Herbrand interpretation are assumed to be true. All other atoms in HB(T) are assumed to be false. An interpretation I satisfies a theory T, written as I |= T, if it satisfies all the clauses c ∈ T. The WFOMC problem is defined on a weighted logic theory T, which is a logic theory augmented with a positive weight function w and a negative weight function w̄, which assign a weight to each predicate. The WFOMC problem involves computing
wmc(T, w, w̄) = Σ_{I |= T} Π_{a ∈ I} w(pred(a)) · Π_{a ∈ HB(T) \ I} w̄(pred(a)).   (1)
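To make Equation 1 concrete, here is a minimal brute-force sketch of the definition (not of the paper's lifted algorithm): it enumerates every Herbrand interpretation of a small ground theory and sums the weights of the satisfying ones. The atom encoding and the example grounding of the symmetric relation over a two-constant domain are illustrative assumptions, not taken from the paper.

    from itertools import product

    def wfomc(atoms, clauses, pred, w, w_bar):
        # Sum over all Herbrand interpretations (truth assignments to atoms).
        total = 0.0
        for bits in product([False, True], repeat=len(atoms)):
            interp = dict(zip(atoms, bits))
            # An interpretation contributes iff it satisfies every clause;
            # a clause is a set of (atom, sign) pairs, sign True = positive literal.
            if all(any(interp[a] == sign for a, sign in c) for c in clauses):
                weight = 1.0
                for a, value in interp.items():
                    weight *= w[pred(a)] if value else w_bar[pred(a)]
                total += weight
        return total

    # Grounding of friends(X, Y) => friends(Y, X) over the domain {a, b};
    # the X = Y groundings are tautologies and are omitted.
    atoms = ["friends(a,a)", "friends(a,b)", "friends(b,a)", "friends(b,b)"]
    clauses = [{("friends(a,b)", False), ("friends(b,a)", True)},
               {("friends(b,a)", False), ("friends(a,b)", True)}]
    pred = lambda a: a.split("(")[0]
    print(wfomc(atoms, clauses, pred, {"friends": 1.0}, {"friends": 1.0}))  # 8.0

With all weights equal to 1, the sum reduces to plain model counting: the off-diagonal pair of atoms must agree, so 8 of the 16 interpretations survive. The cost is exponential in |HB(T)|, which is precisely what the lifted compilation described next is designed to avoid.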
3 First-order knowledge compilation for lifted probabilistic inference
3.1 Lifted probabilistic inference
A first-order probabilistic model defines a probability distribution P over the set of Herbrand interpretations H. Probabilistic inference in these models is concerned with computing the posterior probability P(q|e) of query q given evidence e, where q and e are logical expressions in general:
P(q|e) = Σ_{h ∈ H, h |= q ∧ e} P(h) / Σ_{h ∈ H, h |= e} P(h).   (2)
We propose one notion of lifted inference for first-order probabilistic models, defined in terms of the computational complexity of inference w.r.t. the domains of logical variables. It is clear that other notions of lifted inference are conceivable, especially in the case of approximate inference. Definition 1 (Domain Lifted Probabilistic Inference). A probabilistic inference procedure is domain lifted for a model m, query q and evidence e iff the inference procedure runs in polynomial time in |D1|, ..., |Dk| with Di the domain of the logical variable vi ∈ vars(m, q, e). Domain lifted inference does not prohibit the algorithm to be exponential in the size of the vocabulary, that is, the number of predicates, arguments and constants, of the probabilistic model, query and evidence. In fact, the definition allows inference to be exponential in the number of constants which occur in arguments of atoms in the theory, query or evidence, as long as it is polynomial in the cardinality of the logical variable domains. This definition of lifted inference stresses the ability to efficiently deal with the domains of the logical variables that arise, regardless of their size, and formalizes what seems to be generally accepted in the lifted inference literature. A class of probabilistic models is a set of probabilistic models expressed in a particular formalism. As examples, consider Markov logic networks (MLN) [12] or parfactors [4], or the weighted FOL theories for WFOMC that we introduced above, when the weights are normalized. Definition 2 (Completeness). Restricting queries to atoms and evidence to a conjunction of literals, a procedure that is domain lifted for all probabilistic models m in a class of models M and for all queries q and evidence e, is called complete for M.
3.2 First-order knowledge compilation
First-order knowledge compilation is an approach to lifted probabilistic inference consisting of the following three steps (see Van den Broeck et al. [10] for details): 1. Convert the probabilistic logical model to a weighted CNF. Converting MLNs or parfactors requires adding new atoms to the theory that represent the (truth) value of each factor or formula.
[Figure 1: Friends-smokers example (taken from [10]). Panels: (a) MLN model, the weighted formula 2 friends(X, Y) ∧ smokes(X) ⇒ smokes(Y); (b) CNF theory, with clauses smokes(Y) ∨ ¬smokes(X) ∨ ¬friends(X, Y) ∨ ¬f(X, Y); friends(X, Y) ∨ f(X, Y); smokes(X) ∨ f(X, Y); ¬smokes(Y) ∨ f(X, Y); (c) weight functions, with w = e² and w̄ = 1 for f, and w = w̄ = 1 for friends and smokes; (d) the compiled First-Order d-DNNF circuit, built from a set-disjunction over Smokers ⊆ People with set-conjunction, decomposable conjunction, deterministic disjunction and unit clause leaf nodes.]
Example 1. The MLN in Figure 1a assigns a weight to a formula in FOL. Figure 1b represents the same model as a weighted CNF, introducing a new atom f(X, Y) to encode the truth value of the MLN formula. The probabilistic information is captured by the weight functions in Figure 1c. 2. Compile the logical theory into a First-Order d-DNNF (FO d-DNNF) circuit. Figure 1d shows an example of such a circuit. Leaves represent unit clauses. Inner nodes represent the disjunction or conjunction of their children l and r, but with the constraint that disjunctions must be deterministic (l ∧ r is unsatisfiable) and conjunctions must be decomposable (l ⊥ r). 3. Perform WFOMC inference to compute posterior probabilities. In a FO d-DNNF circuit, WFOMC is polynomial in the size of the circuit and the cardinality of the domains.
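As a hedged illustration of steps 1 and 3, the sketch below grounds the four CNF clauses of Figure 1b over a made-up two-person domain, gives the auxiliary f atoms positive weight e² as in Figure 1c, and recovers the MLN partition function by brute-force weighted model counting; the lifted circuit of step 2 exists exactly to avoid this exponential enumeration. The domain and atom names are assumptions for the example.

    import math
    from itertools import product

    # Hypothetical two-person domain; atom names are illustrative.
    DOMAIN = ["alice", "bob"]
    pairs = [(x, y) for x in DOMAIN for y in DOMAIN]
    atoms = ([f"smokes({x})" for x in DOMAIN]
             + [f"friends({x},{y})" for x, y in pairs]
             + [f"f({x},{y})" for x, y in pairs])

    def clauses():
        # The four clauses of Figure 1b, grounded for every pair (X, Y).
        for x, y in pairs:
            sx, sy = f"smokes({x})", f"smokes({y})"
            fr, fa = f"friends({x},{y})", f"f({x},{y})"
            yield [(sy, True), (sx, False), (fr, False), (fa, False)]
            yield [(fr, True), (fa, True)]
            yield [(sx, True), (fa, True)]
            yield [(sy, False), (fa, True)]

    Z = 0.0
    for bits in product([False, True], repeat=len(atoms)):
        I = dict(zip(atoms, bits))
        if all(any(I[a] == s for a, s in c) for c in clauses()):
            # w(f) = e^2 and all other weights are 1 (Figure 1c).
            Z += math.exp(2.0) ** sum(I[f"f({x},{y})"] for x, y in pairs)
    print(Z)  # MLN partition function: sum over worlds of exp(2 * #true groundings)

Even with two people this enumerates 2^10 interpretations; conditioning the same sum on a query and on evidence gives the posterior of Equation 2.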
To compile the CNF theory into a FO d-DNNF circuit, Van den Broeck et al. [10] propose a set of compilation rules, which we will refer to as CR1. We will now briefly describe these rules. Unit Propagation introduces a decomposable conjunction when the theory contains a unit clause. Independence creates a decomposable conjunction when the theory contains independent subtheories. Shannon decomposition applies when the theory contains ground atoms and introduces a deterministic disjunction between two modified theories: one where the ground atom is true, and one where it is false. Shattering splits clauses in the theory until all pairs of atoms represent either a disjoint or identical set of ground atoms. Example 2. In Figure 2a, the first two clauses are made independent from the friends(X, X) clause and split off in a decomposable conjunction by unit propagation. The unit clause becomes a leaf of the FO d-DNNF circuit, while the other operand requires further compilation.
[Figure 2: Examples of compilation rules: (a) unit propagation of friends(X, X), (b) independent partial grounding, (c) atom counting of fun(X). Circles are FO d-DNNF inner nodes. White rectangles show theories before and after applying the rule. All variable domains are People. (taken from [10])]
Independent Partial Grounding creates a decomposable conjunction over a set of child circuits, which are identical up to the value of a grounding constant. Since they are structurally identical, only one child circuit is actually compiled. Atom Counting applies when the theory contains an atom with a single logical variable X ∈ D. It explicitly represents the domain D′ ⊆ D of X for which the atom is true. It compiles the theory into a deterministic disjunction between all possible such domains. Again, these child circuits are identical up to the value of D′ and only one is compiled. Example 3. The theory in Figure 2b is compiled into a decomposable set-conjunction of theories that are independent and identical up to the value of the x constant. The theory in Figure 2c contains an atom with one logical variable: fun(X). Atom counting compiles it into a deterministic set-disjunction over theories that differ in FunPeople, which is the domain of X for which fun(X) is true. Subsequent steps of unit propagation remove the fun(X) atoms from the theory entirely.
3.3 Completeness
We will now characterize those theories where the CR1 compilation rules cannot be used, and where the inference procedure has to resort to grounding out the theory to propositional logic. For these, first-order knowledge compilation using CR1 is not yet domain lifted. When a logical theory contains symmetric, anti-symmetric or total relations, such as
friends(X, Y) ⇒ friends(Y, X),   (3)
parent(X, Y) ⇒ ¬parent(Y, X), X ≠ Y,   (4)
≤(X, Y) ∨ ≤(Y, X),   (5)
or more general formulas, such as
enemies(X, Y) ⇒ ¬friend(X, Y) ∧ ¬friend(Y, X),   (6)
none of the CR1 rules apply. Intuitively, the underlying problem is the presence of either: • Two unifying (not independent) atoms in the same clause which contain the same logical variable in different positions of the argument list. Examples include (the CNF of) Formulas 3, 4 and 5, where the X and Y variable are bound by unifying two atoms from the same clause. • Two logical variables that bind when unifying one pair of atoms but appear in different positions of the argument list of two other unifying atoms.
Examples include Formula 6, which in CNF is ¬friend(X, Y) ∨ ¬enemies(X, Y) and ¬friend(Y, X) ∨ ¬enemies(X, Y). Here, unifying the enemies(X, Y) atoms binds the X variables from both clauses, which appear in different positions of the argument lists of the unifying atoms friend(X, Y) and friend(Y, X). Both of these properties preclude the use of CR1 rules. Also in the context of other model classes, such as MLNs, probabilistic versions of the above formulas cannot be processed by CR1 rules. Even though first-order knowledge compilation with CR1 rules does not have a clear completeness result, we can show some properties of theories to which none of the compilation rules apply. First, we need to distinguish between the arity of an atom and its dimension. A predicate with arity two might have atoms with dimension one, when one of the arguments is ground or both are identical. Definition 3 (Dimension of an Expression). The dimension of an expression e is the number of logical variables it contains: dim(e) = |vars(e)|. Lemma 1 (CR1 Postconditions). The CR1 rules remove all atoms from the theory T which have zero or one logical variable arguments, such that afterwards ∀a ∈ atom(T): dim(a) > 1. When no CR1 rule applies, the theory is shattered and contains no independent subtheories. Proof. Ground atoms are removed by the Shannon decomposition operator followed by unit propagation. Atoms with a single logical variable (including unary relations) are removed by the atom counting operator followed by unit propagation. If T contains independent subtheories, the independence operator can be applied. Shattering is always applied when T is not yet shattered.
4 Extending first-order knowledge compilation
In this section we introduce a new operator which does apply to the theories from Section 3.3.
4.1 Logical variable properties
To formally define the operator we propose, and prove its correctness, we first introduce some mathematical concepts related to the logical variables in a theory (partly after Jha et al. [8]). Definition 4 (Binding Variables). Two logical variables X, Y are directly binding b(X, Y) if they are bound by unifying a pair of atoms in the theory. The binding relationship b⁺(X, Y) is the transitive closure of the directly binding relation b(X, Y). Example 4. In the theory ¬p(W, X) ∨ ¬q(X), r(Y) ∨ ¬q(Y), ¬r(Z) ∨ s(Z), the variable pairs (X, Y) and (Y, Z) are directly binding. The variables X, Y and Z are binding. Variable W does not bind to any other variable. Note that the binding relationship b⁺(X, Y) is an equivalence relation that defines two equivalence classes: {X, Y, Z} and {W}. Lemma 2 (Binding Domains). After shattering, binding logical variables have identical domains. Proof. During shattering (see Section 3.2), when two atoms unify, binding two variables with partially overlapping domains, the atoms' clauses are split up into clauses where the domain of the variables is identical, and clauses where the domains are disjoint and the atoms no longer unify. Definition 5 (Root Binding Class). A root variable is a variable that appears in all the atoms in its clause. A root binding class is an equivalence class of binding variables where all variables are root. Example 5. In the theory of Example 4, {X, Y, Z} is a root binding class and {W} is not.
4.2 Domain recursion
We will now introduce the new domain recursion operator, starting with its preconditions. Definition 6.
A theory allows for domain recursion when (i) the theory is shattered, (ii) the theory contains no independent subtheories and (iii) there exists a root binding class. From now on, we will denote with C the set of clauses of the theory at hand and with B a root binding class guaranteed to exist if C allows for domain recursion. Lemma 2 states that all variables in B have identical domains. We will denote the domain of these variables with D. The intuition behind the domain recursion operator is that it modifies D by making one element explicit: D = D′ ∪ {x_D} with x_D ∉ D′. This explicit domain element is introduced by the SplitD function, which splits clauses w.r.t. the new subdomain D′ and element x_D. Definition 7 (SplitD). For a clause c and given set of variables V_c ⊆ vars(c) with domain D, let
SplitD(c, V_c) = c, if V_c = ∅; SplitD(c1, V_c \ {V}) ∪ SplitD(c2, V_c \ {V}), if V_c ≠ ∅,   (7)
where c1 = c ∧ (V = x_D) and c2 = c ∧ (V ≠ x_D) ∧ (V ∈ D′) for some V ∈ V_c. For a set of clauses C and set of variables V with domain D: SplitD(C, V) = ⋃_{c ∈ C} SplitD(c, V ∩ vars(c)). The domain recursion operator creates three sets of clauses: SplitD(C, B) = Cx ∪ Cv ∪ Cr, with
Cx = {c ∧ ⋀_{V ∈ B ∩ vars(c)} (V = x_D) | c ∈ C},   (8)
Cv = {c ∧ ⋀_{V ∈ B ∩ vars(c)} ((V ≠ x_D) ∧ (V ∈ D′)) | c ∈ C},   (9)
Cr = SplitD(C, B) \ Cx \ Cv.   (10)
Proposition 3. The conjunction of the domain recursion sets is equivalent to the original theory: ⋀_{c ∈ C} c ≡ ⋀_{c ∈ SplitD(C, B)} c, and therefore ⋀_{c ∈ C} c ≡ ⋀_{c ∈ Cx} c ∧ ⋀_{c ∈ Cv} c ∧ ⋀_{c ∈ Cr} c. We will now show that these sets are independent and that their conjunction is decomposable. Theorem 4. The theories Cx, Cv and Cr are independent: Cx ⊥ Cv, Cx ⊥ Cr and Cv ⊥ Cr. The proof of Theorem 4 relies on the following Lemma. Lemma 5. If the theory allows for domain recursion, all clauses and atoms contain the same number of variables from B: ∃n, ∀c ∈ C, ∀a ∈ atom(C): |vars(c) ∩ B| = |vars(a) ∩ B| = n. Proof. Denote with Cn the clauses in C that contain n logical variables from B and with Cn^c its complement in C. If C is nonempty, there is an n > 0 for which Cn is nonempty. Then every atom in Cn contains exactly n variables from B (Definition 5). Since the theory contains no independent subtheories, there must be an atom a in Cn which unifies with an atom a_c in Cn^c, or Cn^c is empty. After shattering, all unifications bind one variable from a to a single variable from a_c. Because a contains exactly n variables from B, a_c must also contain exactly n (Definition 4), and because B is a root binding class, the clause of a_c also contains exactly n, which contradicts the definition of Cn^c. Therefore, Cn^c is empty, and because the variables in B are root, they also appear in all atoms. Proof of Theorem 4. From Lemma 5, all atoms in C contain the same number of variables from B. In Cx, these variables are all constrained to be equal to x_D, while in Cv and Cr at least one variable is constrained to be different from x_D. An attempt to unify an atom from Cx with an atom from Cv or Cr therefore creates an unsatisfiable set of constraints. Similarly, atoms from Cv and Cr cannot be unified.
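The following small sketch (with hypothetical clause and variable names, and constraints kept as plain strings) illustrates how Definition 7 carves a clause into the cases behind Equations 8–10.

    def split_d(clause, constraints, split_vars):
        # Definition 7: split each variable V into V = xD versus V != xD, V in D'.
        if not split_vars:
            return [(clause, tuple(constraints))]
        v, rest = split_vars[0], split_vars[1:]
        left = split_d(clause, constraints + [f"{v} = xD"], rest)
        right = split_d(clause, constraints + [f"{v} != xD", f"{v} in D'"], rest)
        return left + right

    # Splitting the symmetric clause on its root binding class B = {X, Y}:
    for cl, cons in split_d("¬friends(X,Y) ∨ friends(Y,X)", [], ["X", "Y"]):
        print(cons, ":", cl)
    # Of the four resulting clauses, the all-equal case is Cx (Equation 8), the
    # all-different case is Cv (Equation 9), and the two mixed cases form Cr (10).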
Finally, we extend the FO d-DNNF language proposed in Van den Broeck et al. [10] with a new node, the recursive decomposable conjunction ∧r, and define the domain recursion compilation rule. Definition 8 (∧r). The FO d-DNNF node ∧r(nx, nr, D, D′, V) represents a decomposable conjunction between the d-DNNF nodes nx, nr and a d-DNNF node isomorphic to the ∧r node itself. In particular, the isomorphic operand is identical to the node itself, except for the size of the domain of the variables in V, which becomes one smaller, going from D to D′ in the isomorphic operand. We have shown that the conjunction between sets Cx, Cv and Cr is decomposable (Theorem 4) and logically equivalent to the original theory (Proposition 3). Furthermore, Cv is identical to C, up to the constraints on the domain of the variables in B. This leads us to the following definition of domain recursion. Definition 9 (Domain Recursion). The domain recursion compilation rule compiles C into ∧r(nx, nr, D, D′, B), where nx, nr are the compiled circuits for Cx, Cr. The third set Cv is represented by the recursion on D′, according to Definition 8.
[Figure 3: Circuit for the symmetric relation in Equation 3, rooted in a recursive conjunction ∧r. Its operands are nx, compiling Cx = {¬friends(x, x) ∨ friends(x, x)}; nr, compiling Cr = {¬friends(x, X) ∨ friends(X, x), X ≠ x; ¬friends(X, x) ∨ friends(x, X), X ≠ x}; and the recursive operand nv over the reduced domain Person ← Person \ {x}.]
Example 6. Figure 3 shows the FO d-DNNF circuit for Equation 3. The theory is split up into three independent theories: Cr and Cx, shown in Figure 3, and Cv = {¬friends(X, Y) ∨ friends(Y, X), X ≠ x, Y ≠ x}. The conjunction of these theories is equivalent to Equation 3. Theory Cv is identical to Equation 3, up to the inequality constraints on X and Y. Theorem 6. Given a function size, which maps domains to their size, the weighted first-order model count of a ∧r(nx, nr, D, D′, V) node is
wmc(∧r(nx, nr, D, D′, V), size) = wmc(nx, size)^size(D) · Π_{s=0}^{size(D)−1} wmc(nr, size ∪ {D′ → s}),   (11)
where size ∪ {D′ → s} adds to the size function that the subdomain D′ has cardinality s. Proof. If C allows for domain recursion, due to Theorem 4, the weighted model count is
wmc(C, size) = 1, if size(D) = 0; wmc(Cx) · wmc(Cv, size′) · wmc(Cr, size′), if size(D) > 0,   (12)
where size′ = size ∪ {D′ → size(D) − 1}. Theorem 7. The Independent Partial Grounding compilation rule is a special case of the domain recursion rule, where ∀c ∈ C: |vars(c) ∩ B| = 1 (and therefore Cr = ∅).
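As a sanity check on Theorem 6, here is a hedged sketch that instantiates the recursion of Equation 12 for the symmetric theory of Equation 3 with all weights equal to 1, so that the WFOMC is a model count, and compares it against brute-force enumeration of the ground theory. Unrolling the recursion reproduces the shape of Equation 11 and gives the closed form 2^n · 2^{n(n−1)/2}.

    from itertools import product

    def wmc_symmetric(n):
        # Equation 12 for friends(X, Y) => friends(Y, X) with unit weights:
        # wmc(Cx) = 2 (friends(x, x) is unconstrained) and wmc(Cr) = 2**(n-1)
        # (for each remaining y, friends(x, y) and friends(y, x) must agree).
        if n == 0:
            return 1
        return 2 * (2 ** (n - 1)) * wmc_symmetric(n - 1)

    def wmc_bruteforce(n):
        # Enumerate all assignments to the n*n ground atoms (tiny n only).
        count = 0
        for bits in product([0, 1], repeat=n * n):
            f = {(i, j): bits[i * n + j] for i in range(n) for j in range(n)}
            if all(f[i, j] <= f[j, i] for i in range(n) for j in range(n)):
                count += 1
        return count

    for n in range(4):
        assert wmc_symmetric(n) == wmc_bruteforce(n)
    print([wmc_symmetric(n) for n in range(5)])  # [1, 2, 8, 64, 1024]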
4.3 Completeness
In this section, we introduce a class of models for which first-order knowledge compilation with domain recursion is complete. Definition 10 (k-WFOMC). The class of k-WFOMC consists of WFOMC theories with clauses that have up to k logical variables. A first completeness result is for 2-WFOMC, using the set of knowledge compilation rules CR2, which are the rules in CR1 extended with domain recursion. Theorem 8 (Completeness for 2-WFOMC). First-order knowledge compilation using the CR2 compilation rules is a complete domain lifted probabilistic inference algorithm for 2-WFOMC. Proof. From Lemma 1, after applying the CR1 rules, the theory contains only atoms with dimension larger than or equal to two. From Definition 10, each clause has dimension smaller than or equal to two. Therefore, each logical variable in the theory is a root variable and according to Definition 5, every equivalence class of binding variables is a root binding class. Because of Lemma 1, the theory allows for domain recursion, which requires further compilation of two theories: Cx and Cr into nx and nr. Both have dimension smaller than 2 and can be lifted by CR1 compilation rules. The properties of 2-WFOMC are a sufficient but not necessary condition for first-order knowledge compilation to be domain lifted. We can obtain a similar result for MLNs or parfactors by reducing them to a WFOMC problem. If an MLN contains only formulae with up to k logical variables, then its WFOMC representation will be in k-WFOMC. This result for 2-WFOMC is not trivial. Van den Broeck et al. [10] showed in their experiments that counting first-order variable elimination (C-FOVE) [6] fails to lift the “Friends Smoker Drinker” problem, which is in 2-WFOMC. We will show in the next section that the CR1 rules fail to lift the theory in Figure 4a, which is in 2-WFOMC. Note that there are also useful theories that are not in 2-WFOMC, such as those containing the transitive relation friends(X, Y) ∧ friends(Y, Z) ⇒ friends(X, Z).
5 Empirical evaluation
To complement the theoretical results of the previous section, we extended the WFOMC implementation¹ with the domain recursion rule. We performed experiments with the theory in Figure 4a, which is a version of the friends and smokers model [11] extended with the symmetric relation of Equation 3.
[Figure 4: Symmetric friends and smokers experiment, comparing propositional knowledge compilation (c2d) to WFOMC using compilation rules CR1 and CR2 (which includes domain recursion). Panel (a), MLN model: 2 smokes(X) ∧ friends(X, Y) ⇒ smokes(Y) and friends(X, Y) ⇒ friends(Y, X). Panel (b), evaluation runtime in seconds (log scale, 0.01–10000) against the number of people (10–80) for c2d, WFOMC-CR1 and WFOMC-CR2.]
We evaluate the performance querying P(smokes(bob)) with increasing domain size, comparing our approach to the existing WFOMC implementation and its propositional counterpart, which first grounds the theory and then compiles it with the c2d compiler [13] to a propositional d-DNNF circuit. We did not compare to C-FOVE [6] because it cannot perform lifted inference on this model. Propositional inference quickly becomes intractable when there are more than 20 people. The lifted inference algorithms scale much better. The CR1 rules can exploit some regularities in the model. For example, they eliminate all the smokes(X) atoms from the theory. They do, however, resort to grounding at a later stage of the compilation process. With the domain recursion rule, there is no need for grounding. This advantage is clear in the experiments, our approach having an almost constant inference time in this range of domain sizes. Note that the runtimes for c2d include compilation and evaluation of the circuit, whereas the WFOMC runtimes only represent evaluation of the FO d-DNNF. After all, propositional compilation depends on the domain size but first-order compilation does not. First-order compilation takes a constant two seconds for both rule sets.
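The scaling gap in Figure 4b has a simple back-of-envelope explanation; the sketch below counts ground atoms and clauses for the Figure 4a model under the assumed encoding of Section 3.2 (one auxiliary f atom per grounding of the soft formula, and hard symmetry clauses), which is an assumption about the exact grounding rather than a figure from the paper.

    def ground_size(n):
        # smokes/1 contributes n atoms; friends/2 and f/2 contribute n*n each.
        atoms = n + 2 * n * n
        # Four encoding clauses plus one hard symmetry clause per pair (X, Y).
        clauses = 4 * n * n + n * n
        return atoms, clauses

    for n in (10, 20, 40, 80):
        print(n, ground_size(n))   # (210, 500), (820, 2000), (3240, 8000), ...

Already at 20 people the propositional circuit must reason over hundreds of atoms and thousands of clauses, consistent with the c2d curve becoming intractable, while the size of the first-order circuit is independent of the domain.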
6 Conclusions
We proposed a definition of complete domain lifted probabilistic inference w.r.t. classes of probabilistic logic models. This definition considers algorithms to be lifted if they are polynomial in the size of logical variable domains. Existing first-order knowledge compilation turns out not to admit an intuitive completeness result. Therefore, we generalized the existing Independent Partial Grounding compilation rule to the domain recursion rule. With this one extra rule, we showed that first-order knowledge compilation is complete for a significant class of probabilistic logic models, where the WFOMC representation has up to two logical variables per clause.
Acknowledgments
The author would like to thank Luc De Raedt, Jesse Davis and the anonymous reviewers for valuable feedback. This work was supported by the Research Foundation-Flanders (FWO-Vlaanderen).
¹ http://dtai.cs.kuleuven.be/wfomc/
References
[1] Lise Getoor and Ben Taskar, editors. An Introduction to Statistical Relational Learning. MIT Press, 2007.
[2] Luc De Raedt, Paolo Frasconi, Kristian Kersting, and Stephen Muggleton, editors. Probabilistic Inductive Logic Programming: Theory and Applications. Springer-Verlag, Berlin, Heidelberg, 2008.
[3] Daan Fierens, Guy Van den Broeck, Ingo Thon, Bernd Gutmann, and Luc De Raedt. Inference in probabilistic logic programs using weighted CNF's. In Proceedings of UAI, pages 256–265, 2011.
[4] David Poole. First-order probabilistic inference. In Proceedings of IJCAI, pages 985–991, 2003.
[5] Rodrigo de Salvo Braz, Eyal Amir, and Dan Roth. Lifted first-order probabilistic inference. In Proceedings of IJCAI, pages 1319–1325, 2005.
[6] Brian Milch, Luke S. Zettlemoyer, Kristian Kersting, Michael Haimes, and Leslie Pack Kaelbling. Lifted probabilistic inference with counting formulas. In Proceedings of AAAI, pages 1062–1068, 2008.
[7] Vibhav Gogate and Pedro Domingos. Exploiting logical structure in lifted probabilistic inference. In Proceedings of StarAI, 2010.
[8] Abhay Jha, Vibhav Gogate, Alexandra Meliou, and Dan Suciu. Lifted inference seen from the other side: The tractable features. In Proceedings of NIPS, 2010.
[9] Vibhav Gogate and Pedro Domingos. Probabilistic theorem proving. In Proceedings of UAI, pages 256–265, 2011.
[10] Guy Van den Broeck, Nima Taghipour, Wannes Meert, Jesse Davis, and Luc De Raedt. Lifted probabilistic inference by first-order knowledge compilation. In Proceedings of IJCAI, pages 2178–2185, 2011.
[11] Parag Singla and Pedro Domingos. Lifted first-order belief propagation. In Proceedings of AAAI, pages 1094–1099, 2008.
[12] Matthew Richardson and Pedro Domingos. Markov logic networks. Machine Learning, 62(1):107–136, 2006.
[13] Adnan Darwiche. New advances in compiling CNF to decomposable negation normal form. In Proceedings of ECAI, pages 328–332, 2004.
5 0.45413437 22 nips-2011-Active Ranking using Pairwise Comparisons
Author: Kevin G. Jamieson, Robert Nowak
Abstract: This paper examines the problem of ranking a collection of objects using pairwise comparisons (rankings of two objects). In general, the ranking of n objects can be identified by standard sorting methods using n log₂ n pairwise comparisons. We are interested in natural situations in which relationships among the objects may allow for ranking using far fewer pairwise comparisons. Specifically, we assume that the objects can be embedded into a d-dimensional Euclidean space and that the rankings reflect their relative distances from a common reference point in R^d. We show that under this assumption the number of possible rankings grows like n^{2d} and demonstrate an algorithm that can identify a randomly selected ranking using just slightly more than d log n adaptively selected pairwise comparisons, on average. If instead the comparisons are chosen at random, then almost all pairwise comparisons must be made in order to identify any ranking. In addition, we propose a robust, error-tolerant algorithm that only requires that the pairwise comparisons are probably correct. Experimental studies with synthetic and real datasets support the conclusions of our theoretical analysis.
6 0.44918343 130 nips-2011-Inductive reasoning about chimeric creatures
7 0.43977973 42 nips-2011-Bayesian Bias Mitigation for Crowdsourcing
8 0.43877137 19 nips-2011-Active Classification based on Value of Classifier
9 0.4293009 277 nips-2011-Submodular Multi-Label Learning
10 0.42737225 33 nips-2011-An Exact Algorithm for F-Measure Maximization
11 0.40976599 53 nips-2011-Co-Training for Domain Adaptation
12 0.37705433 231 nips-2011-Randomized Algorithms for Comparison-based Search
13 0.37359142 8 nips-2011-A Model for Temporal Dependencies in Event Streams
14 0.370078 254 nips-2011-Similarity-based Learning via Data Driven Embeddings
15 0.35748443 275 nips-2011-Structured Learning for Cell Tracking
16 0.35636622 229 nips-2011-Query-Aware MCMC
17 0.35163841 169 nips-2011-Maximum Margin Multi-Label Structured Prediction
18 0.3462202 74 nips-2011-Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection
19 0.34220043 195 nips-2011-On Learning Discrete Graphical Models using Greedy Methods
20 0.34088218 292 nips-2011-Two is better than one: distinct roles for familiarity and recollection in retrieving palimpsest memories
topicId topicWeight
[(0, 0.024), (4, 0.042), (20, 0.032), (26, 0.029), (31, 0.054), (33, 0.02), (43, 0.042), (44, 0.404), (45, 0.073), (57, 0.031), (65, 0.015), (74, 0.036), (83, 0.044), (99, 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.77900273 7 nips-2011-A Machine Learning Approach to Predict Chemical Reactions
Author: Matthew A. Kayala, Pierre F. Baldi
2 292 nips-2011-Two is better than one: distinct roles for familiarity and recollection in retrieving palimpsest memories
Author: Cristina Savin, Peter Dayan, Máté Lengyel
Abstract: Storing a new pattern in a palimpsest memory system comes at the cost of interfering with the memory traces of previously stored items. Knowing the age of a pattern thus becomes critical for recalling it faithfully. This implies that there should be a tight coupling between estimates of age, as a form of familiarity, and the neural dynamics of recollection, something which current theories omit. Using a normative model of autoassociative memory, we show that a dual memory system, consisting of two interacting modules for familiarity and recollection, has best performance for both recollection and recognition. This finding provides a new window onto actively contentious psychological and neural aspects of recognition memory.
3 0.43500432 229 nips-2011-Query-Aware MCMC
Author: Michael L. Wick, Andrew McCallum
Abstract: Traditional approaches to probabilistic inference such as loopy belief propagation and Gibbs sampling typically compute marginals for all the unobserved variables in a graphical model. However, in many real-world applications the user's interests are focused on a subset of the variables, specified by a query. In this case it would be wasteful to uniformly sample, say, one million variables when the query concerns only ten. In this paper we propose a query-specific approach to MCMC that accounts for the query variables and their generalized mutual information with neighboring variables in order to achieve higher computational efficiency. Surprisingly there has been almost no previous work on query-aware MCMC. We demonstrate the success of our approach with positive experimental results on a wide range of graphical models.
4 0.31089103 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries
Author: Alexander Rakhlin, Karthik Sridharan, Ambuj Tewari
Abstract: Learning theory has largely focused on two main learning scenarios: the classical statistical setting where instances are drawn i.i.d. from a fixed distribution, and the adversarial scenario wherein, at every time step, an adversarially chosen instance is revealed to the player. It can be argued that in the real world neither of these assumptions is reasonable. We define the minimax value of a game where the adversary is restricted in his moves, capturing stochastic and non-stochastic assumptions on data. Building on the sequential symmetrization approach, we define a notion of distribution-dependent Rademacher complexity for the spectrum of problems ranging from i.i.d. to worst-case. The bounds let us immediately deduce variation-type bounds. We study a smoothed online learning scenario and show that an exponentially small amount of noise can make function classes with infinite Littlestone dimension learnable.
5 0.30984733 231 nips-2011-Randomized Algorithms for Comparison-based Search
Author: Dominique Tschopp, Suhas Diggavi, Payam Delgosha, Soheil Mohajer
Abstract: This paper addresses the problem of finding the nearest neighbor (or one of the R-nearest neighbors) of a query object q in a database of n objects, when we can only use a comparison oracle. The comparison oracle, given two reference objects and a query object, returns the reference object most similar to the query object. The main problem we study is how to search the database for the nearest neighbor (NN) of a query, while minimizing the questions. The difficulty of this problem depends on properties of the underlying database. We show the importance of a characterization: combinatorial disorder D, which defines approximate triangle inequalities on ranks. We present a lower bound of Ω(D log n/D + D²) average number of questions in the search phase for any randomized algorithm, which demonstrates the fundamental role of D for worst case behavior. We develop a randomized scheme for NN retrieval in O(D³ log² n + D log² n log log n^{D³}) questions. The learning requires asking O(nD³ log² n + D log² n log log n^{D³}) questions and O(n log² n / log(2D)) bits to store.
6 0.30967343 227 nips-2011-Pylon Model for Semantic Segmentation
7 0.30934936 22 nips-2011-Active Ranking using Pairwise Comparisons
8 0.30916235 186 nips-2011-Noise Thresholds for Spectral Clustering
9 0.30878922 180 nips-2011-Multiple Instance Filtering
10 0.30826357 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions
11 0.30788362 258 nips-2011-Sparse Bayesian Multi-Task Learning
12 0.30785388 303 nips-2011-Video Annotation and Tracking with Active Learning
13 0.30740929 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs
14 0.30706543 276 nips-2011-Structured sparse coding via lateral inhibition
15 0.30625325 84 nips-2011-EigenNet: A Bayesian hybrid of generative and conditional models for sparse learning
16 0.30599135 149 nips-2011-Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries
17 0.30574465 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity
18 0.30569965 64 nips-2011-Convergent Bounds on the Euclidean Distance
19 0.30545905 102 nips-2011-Generalised Coupled Tensor Factorisation
20 0.30541232 219 nips-2011-Predicting response time and error rates in visual search