nips nips2011 nips2011-7 "A Machine Learning Approach to Predict Chemical Reactions" knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Matthew A. Kayala, Pierre F. Baldi
Abstract: Being able to predict the course of arbitrary chemical reactions is essential to the theory and applications of organic chemistry. Previous approaches are not high-throughput, are not generalizable or scalable, or lack sufficient data to be effective. We describe single mechanistic reactions as concerted electron movements from an electron orbital source to an electron orbital sink. We use an existing rule-based expert system to derive a dataset consisting of 2,989 productive mechanistic steps and 6.14 million non-productive mechanistic steps. We then pose identifying productive mechanistic steps as a ranking problem: rank potential orbital interactions such that the top ranked interactions yield the major products. The machine learning implementation follows a two-stage approach, in which we first train atom level reactivity filters to prune 94.0% of non-productive reactions with less than a 0.1% false negative rate. Then, we train an ensemble of ranking models on pairs of interacting orbitals to learn a relative productivity function over single mechanistic reactions in a given system. Without the use of explicit transformation patterns, the ensemble perfectly ranks the productive mechanisms at the top 89.1% of the time, rising to 99.9% of the time when top ranked lists with at most four non-productive reactions are considered. The final system allows multi-step reaction prediction. Furthermore, it is generalizable, making reasonable predictions over reactants and conditions which the rule-based expert system does not handle.
Reference: text
sentIndex sentText sentNum sentScore
1 edu Abstract Being able to predict the course of arbitrary chemical reactions is essential to the theory and applications of organic chemistry. [sent-4, score-0.767]
2 We describe single mechanistic reactions as concerted electron movements from an electron orbital source to an electron orbital sink. [sent-6, score-1.289]
3 We use an existing rule-based expert system to derive a dataset consisting of 2,989 productive mechanistic steps and 6.14 million non-productive mechanistic steps. [sent-7, score-0.554]
4 We then pose identifying productive mechanistic steps as a ranking problem: rank potential orbital interactions such that the top ranked interactions yield the major products. [sent-9, score-0.7]
5 Then, we train an ensemble of ranking models on pairs of interacting orbitals to learn a relative productivity function over single mechanistic reactions in a given system. [sent-13, score-0.908]
6 Without the use of explicit transformation patterns, the ensemble perfectly ranks the productive mechanisms at the top 89.1% of the time, [sent-14, score-0.306]
7 rising to 99.9% of the time when top ranked lists with at most four non-productive reactions are considered. [sent-16, score-0.59]
8 Furthermore, it is generalizable, making reasonable predictions over reactants and conditions which the rule-based expert system does not handle. [sent-18, score-0.218]
9 1 Introduction Determining the major products of chemical reactions given the input reactants and conditions is a fundamental problem in organic chemistry. [sent-19, score-0.918]
10 This leads to modeling reactions as minimum energy paths between stable atom configurations on a high-dimensional potential energy surface, where the path through the lowest energy transition state, i.e., the kinetically most favorable path, determines the observed products. [sent-27, score-0.678]
11 Figure 1: Overall transformation of an alkene (hydrocarbon with a double bond) with hydrobromic acid (HBr) and the corresponding mechanistic reactions; the accompanying atom-mapped reaction string reads [H:3][Br:4]>>[Br:4][C:1][C:2][H:3]. [sent-37, score-0.29]
12 (b) shows the two mechanistic reactions composing the overall transformation as arrow-pushing diagrams [15, 16]. [sent-43, score-0.81]
13 Somewhere between low-level QM treatment and abstract graph-based overall transformations, one can consider reactions at the mechanistic level. [sent-47, score-0.777]
14 A mechanistic, or elementary, reaction is a concerted electron movement through a single transition state[15, 16]. [sent-48, score-0.699]
15 These mechanistic reactions can be composed to yield overall transformations. [sent-49, score-0.8]
16 For example, Figure 1 shows the overall transformation of an alkene interacting with hydrobromic acid to yield an alkyl halide, along with the two elementary reactions which compose the transformation. [sent-50, score-0.754]
17 A mechanistic reaction is described as an idealized molecular orbital (MO) interaction between an electron source (donor) MO and an electron sink (acceptor) MO. [sent-51, score-1.34]
18 In general, potential electron sources are composed of lone pairs of electrons and bonds, and potential electron sinks are composed of empty atomic orbitals and bonds. [sent-53, score-0.432]
19 Bonds can act as either a source or a sink depending on the context. [sent-54, score-0.28]
20 Note that by considering all possible pairings of source and sink MOs, this representation allows the exhaustive enumeration of all potential mechanistic reactions over an arbitrary set of molecules. [sent-57, score-1.044]
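As a concrete illustration of this exhaustive pairing, a minimal Python sketch is given below; the MO class and its fields are illustrative assumptions, not the authors' actual data structures.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class MO:
    kind: str        # e.g. "lone_pair", "sigma_bond", "pi_bond", "empty_orbital"
    main_atom: int   # index of the atom the orbital is centered on
    can_donate: bool # usable as an electron source
    can_accept: bool # usable as an electron sink (bonds can be both)

def enumerate_reactions(mos):
    """All (source, sink) MO pairings, i.e., all potential elementary reactions."""
    sources = [m for m in mos if m.can_donate]
    sinks = [m for m in mos if m.can_accept]
    return [(src, snk) for src, snk in product(sources, sinks) if src is not snk]
```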
21 Here, the elementary reactions represent “productive” mechanistic steps, i.e., [sent-59, score-0.794]
22 those reactions which lead to the overall major products. [sent-61, score-0.583]
23 Thus, elementary reactions which are not the most kinetically favorable, but which eventually lead to the overall thermodynamic transformation product, may be considered “productive”. [sent-62, score-0.691]
24 While mechanistic reaction representations are approximations quite far from the Schrödinger equation, we expect them to be closer to the underlying reality and therefore more useful than overall transformations. [sent-64, score-0.832]
25 Furthermore, we expect them also to be easier to predict than overall transformations due to their more elementary nature and mechanistic interpretation. [sent-65, score-0.313]
26 In combination, these arguments suggest that working with mechanistic steps may facilitate the application of statistical machine learning approaches, and take advantage of their capability to generalize. [sent-66, score-0.194]
27 Thus, in this work, reactions are modeled as mechanisms, and for the remainder of the paper, we consider the term “reaction” to denote a single elementary reaction. [sent-67, score-0.6]
28 Furthermore, we consider the problem of reaction prediction to be precisely that of identifying the “productive” reactions over a given set of reactants under particular conditions. [sent-68, score-1.249]
29 There has been very little work on machine learning approaches to reaction prediction. [sent-69, score-0.546]
30 The sole example is a paper from 1990 on inductively extracting overall transformation patterns from reaction databases[12], a method which was never actually incorporated into a full reaction prediction system. [sent-70, score-1.186]
31 Given improvements in both computing power and machine learning methods over the past 20 years, one could imagine a machine learning system that mines reaction information to learn the grammar of chemistry. [sent-72, score-0.583]
32 While large reaction databases exist, e.g., Reaxys [20] or SPRESI [21], the reactions in these databases are mostly unbalanced, not atom-mapped, and lack mechanistic detail [22]. [sent-80, score-0.762]
33 As a result, and to the best of our knowledge, effective machine learning approaches to reaction prediction still need to be developed. [sent-82, score-0.571]
34 2 A new approach The limitations of previous work motivate a fresh approach to reaction prediction combining machine learning with mechanistic representations. [sent-84, score-0.765]
35 The key idea is to first enumerate all potential source and sink MOs, and thus all possible reactions by their pairing, and then use classification and ranking techniques to identify productive reactions. [sent-85, score-1.246]
36 By using very general rules to enumerate possible reactions, the approach is not restricted to manually curated reaction patterns. [sent-87, score-0.546]
37 By detailing individual reactions at the mechanistic level, the system may be able to statistically learn efficient predictive models based on physicochemical attributes rather than abstract overall transformations. [sent-88, score-0.847]
38 And by ranking possible reactions instead of making binary decisions, the system may provide results amenable to flexible interpretation. [sent-89, score-0.707]
39 2 The data challenge A mechanistically defined dataset of reactions to use with the proposed approach does not currently exist. [sent-92, score-0.585]
40 The validation suite is a manually composed set of reactants, reagents, and products covering a complete undergraduate organic chemistry curriculum. [sent-94, score-0.224]
41 Entering a set of reactants and a reagent model into Reaction Explorer yields the complete sequence of mechanistic steps leading to the final products, where all reactions in this sequence share the conditions encoded by the corresponding reagent model. [sent-95, score-0.938]
42 Each one of these mechanistic steps is considered to be a distinct productive elementary reaction. [sent-96, score-0.52]
43 For a given set of reactants and conditions, which we call a (r, c) query tuple, the Reaction Explorer system labels a small set of reactions productive, while all other reactions enumerated by pairing source and sink MOs over the reactants are considered non-productive. [sent-97, score-1.673]
44 An (a, c) tuple has label srcreact = 1 if its atom is the main atom of a source MO in a productive reaction over any corresponding (r, c) query, and has label srcreact = 0 otherwise. [sent-99, score-1.093]
45 The label sinkreact is defined similarly using sink MOs. [sent-100, score-0.225]
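A hedged sketch of how these labels could be computed from the set of productive reactions; the reaction objects and field names (source, sink, conditions) are assumptions for illustration.

```python
def atom_reactivity_labels(productive_reactions, atom_condition_tuples):
    """srcreact / sinkreact labels for (a, c) tuples, per the definition above."""
    src_pos = {(r.source.main_atom, r.conditions) for r in productive_reactions}
    snk_pos = {(r.sink.main_atom, r.conditions) for r in productive_reactions}
    return {(a, c): {"srcreact": int((a, c) in src_pos),
                     "sinkreact": int((a, c) in snk_pos)}
            for a, c in atom_condition_tuples}
```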
46 Note that any mechanistic interaction with the solvent or reagent is explicitly modeled. [sent-104, score-0.249]
47 As an initial validation of the method, we consider general ionic reactions from the Reaction Explorer validation suite involving C, H, N, O, Li, Mg, and the halides. [sent-107, score-0.571]
48 Extensions to include stereoselective, pericyclic, and radical reactions are discussed in Section 5. [sent-108, score-0.569]
49 This yields 6.14 million reactions composed of 84,825 source and 74,725 sink MOs from 2,752 distinct reactants and reaction conditions, i.e., (r, c) query tuples. [sent-110, score-1.527]
50 There are 22,894 atom symmetry classes, which, when paired with reaction conditions, yield 29,104 (a, c) tuples. [sent-115, score-0.654]
51 3 The combinatorial complexity challenge In the dataset, the average molecule has 44 source MOs and 50 sink MOs. [sent-121, score-0.328]
52 For this average molecule, considering only intermolecular reactions with a second copy of the same molecule gives 44 × 50 = 2200 potential elementary reactions. [sent-122, score-0.671]
53 Thus, the number of possible reactions is very large, motivating a two-stage approach to identifying productive reactions given an (r, c) query. [sent-123, score-1.367]
54 In the first stage, we train filters using classification techniques on the source and sink reactivity labels. [sent-124, score-0.335]
55 The idea is to train highly sensitive classifiers which reduce the breadth of possible reactions without erroneously filtering productive reactions. [sent-125, score-0.82]
56 Then, only those source and sink MOs whose main atom passes the respective atom-level filter are considered when enumerating candidate reactions for the second, ranking stage, which predicts reaction productivity. [sent-126, score-1.712]
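A minimal sketch of this two-stage pipeline, assuming trained filter and ranker objects with the hypothetical interfaces shown:

```python
def predict_reactions(mos, conditions, src_filter, snk_filter, ranker, top_k=5):
    # Stage 1: atom-level reactivity filters prune most non-productive MOs.
    sources = [m for m in mos if m.can_donate
               and src_filter.is_reactive(m.main_atom, conditions)]
    sinks = [m for m in mos if m.can_accept
             and snk_filter.is_reactive(m.main_atom, conditions)]
    # Stage 2: rank the surviving (source, sink) pairings by predicted productivity.
    candidates = [(s, k) for s in sources for k in sinks if s is not k]
    candidates.sort(key=lambda pair: ranker.score(pair, conditions), reverse=True)
    return candidates[:top_k]
```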
57 Here, we train two separate classifiers to predict the source and sink atom level reactivity labels, each using the same feature descriptions and machine learning implementations. [sent-127, score-0.443]
58 There are 14 real-valued physicochemical features such as the reaction conditions, the molecular weight of the molecule, and the charge at and around the atom. [sent-131, score-0.625]
59 In addition to standard molecular graph features, we also include similar topological features over a restricted-alphabet pharmacophore point graph, where pharmacophore point graph definitions are adapted from Hähnke et al. [23]. [sent-134, score-0.228]
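The paper trains its own classifiers on these features; as a stand-in illustration of a highly sensitive filter, the sketch below fits a scikit-learn logistic regression and chooses a decision threshold that caps the training false negative rate, in the spirit of the reported sub-0.1% FNR. The model choice and threshold procedure are assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_reactivity_filter(X, y, max_fnr=0.001):
    """Fit a filter and pick a threshold keeping the FNR at or below max_fnr."""
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    pos_scores = np.sort(clf.predict_proba(X[y == 1])[:, 1])
    # Lowest threshold that misclassifies at most a max_fnr fraction of positives.
    cut_idx = int(np.floor(max_fnr * len(pos_scores)))
    threshold = pos_scores[cut_idx]
    return clf, threshold  # an atom passes iff predict_proba >= threshold
```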
60 3 Results We report the true negative rate (TNR) and the false negative rate (FNR) for both the source and sink classification problems as well as for the actual reaction filtering problem, as shown in Table 1. [sent-149, score-0.826]
61 Source reactive and sink reactive rows show results on the respective classification problems. [sent-157, score-0.327]
62 The reaction row shows results of using the two atom classifiers for an initial reaction filtering. [sent-158, score-1.2]
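For reference, the two reported rates can be computed as follows (y_true and y_pred are 0/1 arrays; a prediction of 0 means the atom or reaction is filtered out):

```python
import numpy as np

def tnr_fnr(y_true, y_pred):
    """True negative rate and false negative rate, as reported in Table 1."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tnr = np.mean(y_pred[y_true == 0] == 0)  # non-productive correctly pruned
    fnr = np.mean(y_pred[y_true == 1] == 0)  # productive wrongly pruned
    return tnr, fnr
```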
63 4 The ranking challenge We pose the task of identifying the productive reactions as a ranking problem. [sent-177, score-1.066]
64 1 Feature representation Each reaction is composed of a source and sink MO. [sent-183, score-0.849]
65 The reaction feature vector is the concatenation of the corresponding source and sink atom level feature vectors with some modifications. [sent-184, score-0.934]
66 And finally, 450 features describing the forward and inverse reactions are calculated, including the atoms and bonds involved and the implied transition-state geometry. [sent-187, score-0.58]
67 This leads to a total of 1,677 reaction features. [sent-188, score-0.546]
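A sketch of the concatenation described above; the per-atom block size is a placeholder, while the 450 reaction-level descriptors follow the text (the full vector has 1,677 features):

```python
import numpy as np

def atom_features(atom, conditions):
    return np.zeros(16)   # placeholder per-atom feature block

def pair_features(source_mo, sink_mo):
    return np.zeros(450)  # placeholder forward/inverse reaction descriptors

def reaction_features(source_mo, sink_mo, conditions):
    """Concatenate source-atom, sink-atom, and reaction-level descriptors."""
    return np.concatenate([atom_features(source_mo.main_atom, conditions),
                           atom_features(sink_mo.main_atom, conditions),
                           pair_features(source_mo, sink_mo)])
```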
68 The final output will approach 1 if the first (source, sink) pair, A, is predicted to be relatively more productive than the second pair, B, and 0 otherwise. [sent-206, score-0.273]
69 Each machine in the ensemble is trained with all the productive reactions (from the training set) and a random partition of the non-productive reactions (from the training set). [sent-212, score-1.367]
70 Final ranking on the test set is determined by either simple majority vote or by ranking the average scores from the linear output node of the inner shared-weight network for each machine in the ensemble. [sent-213, score-0.246]
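A minimal PyTorch sketch of one such ensemble member, consistent with the shared-weight description above; the layer sizes and activation are assumptions:

```python
import torch
import torch.nn as nn

class PairwiseRanker(nn.Module):
    """Scores two reactions with a shared inner network; the sigmoid of the
    score difference approaches 1 when reaction A looks more productive than B."""
    def __init__(self, n_features=1677, hidden=64):
        super().__init__()
        self.scorer = nn.Sequential(  # shared weights for both inputs
            nn.Linear(n_features, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, feats_a, feats_b):
        return torch.sigmoid(self.scorer(feats_a) - self.scorer(feats_b))

# Training pairs each productive reaction (as A) against sampled non-productive
# ones (as B) and minimizes binary cross-entropy against a target of 1.
```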
71 NDCG@i is a common information retrieval metric[26] that sums the overall usefulness (or gain) of productive reactions in a given list of the top-i results, where individual gain decays exponentially with lower position. [sent-217, score-0.876]
72 For example, NDCG@1 is the fraction of (r, c) queries in which the top ranked reaction is a productive reaction. [sent-219, score-0.896]
73 Percent Within-n is simply the fraction of (r, c) queries that have at most n non-productive reactions in the smallest ranked list containing all productive reactions. [sent-220, score-0.917]
74 For example, Percent Within-0 measures the percent of (r, c) queries with perfect rank, and Percent Within-4 measures how often all productive reactions are recovered with at most 4 mis-ranked non-productive reactions. [sent-221, score-0.904]
75 Note that the NDCG@1 and Percent Within-0 will differ because roughly 10% of (r, c) queries have more than one productive reaction. [sent-222, score-0.307]
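Both metrics are straightforward to compute per (r, c) query; the NDCG sketch below uses the standard log-discounted form with binary relevance, one common variant that may differ in detail from the paper's gain function:

```python
import numpy as np

def ndcg_at(ranked_labels, i):
    """ranked_labels: 0/1 productivity of reactions in ranked order."""
    rel = np.asarray(ranked_labels, dtype=float)
    discount = 1.0 / np.log2(np.arange(2, len(rel) + 2))
    dcg = float((rel[:i] * discount[:i]).sum())
    ideal = np.sort(rel)[::-1]
    idcg = float((ideal[:i] * discount[:i]).sum())
    return dcg / idcg if idcg > 0 else 0.0

def within_n(ranked_labels, n):
    """True iff the shortest prefix holding all productive reactions
    contains at most n non-productive ones."""
    hits = [k for k, r in enumerate(ranked_labels) if r == 1]
    if not hits:
        return True
    return (hits[-1] + 1 - len(hits)) <= n
```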
76 The non-productive MO interactions vastly outnumber the productive interactions. [sent-223, score-0.273]
77 5% of the queries, the top ranked reaction is productive. [sent-226, score-0.589]
78 99.9% of queries recover all productive reactions by considering lists with at most four non-productive reactions. [sent-229, score-0.854]
79 4 Chemical applications The strong performance of the ranking system is exhibited by its ability to make accurate multi-step reaction predictions. [sent-255, score-0.706]
80 An example, shown in the first row of Table 3, is an intramolecular Claisen condensation reaction with conditions (room temperature, polar aprotic solvent) requiring three elementary steps. [sent-256, score-0.687]
81 The ranking method correctly predicts the given reaction as the highest ranked reaction at each step. [sent-257, score-1.258]
82 The first row shows an example of full multi-step reaction prediction by the ranking system, a three-step intramolecular Claisen condensation (room temperature, polar aprotic solvent). [sent-259, score-0.716]
83 At each stage, the reaction shown is the top ranked when all possible reactions are considered by the two stage machine learning system. [sent-261, score-1.136]
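Multi-step prediction can then be sketched as a simple loop: rank, apply the top reaction, and repeat. The pipeline and apply_reaction objects below are hypothetical stand-ins; a real implementation would also need a stopping criterion for when no productive reaction remains.

```python
def multistep_predict(reactants, conditions, pipeline, apply_reaction, max_steps=10):
    """Iteratively apply the top-ranked elementary reaction (hedged sketch)."""
    path, state = [], reactants
    for _ in range(max_steps):
        ranked = pipeline.rank(state, conditions)  # two-stage filter + rank
        if not ranked:
            break
        best = ranked[0]
        path.append(best)
        state = apply_reaction(state, best)  # caller-supplied molecular update
    return path
```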
84 These reactions lead to the formation of a seven-membered homocycle (7 carbons) on the left and a seven-membered heterocycle (6 carbons, 1 oxygen) on the right. [sent-263, score-0.595]
85 (Table 3 row labels: Multi-Step Reaction Prediction, Generality, Reasonable Errors.) A generalizable system should be able to make reasonable predictions about reactants and reaction types with which it has only had implicit, rather than explicit, experience. [sent-265, score-0.747]
86 Reaction Explorer, as a rule-based expert system without explicit rules about larger ring-forming reactions, does not make any predictions about seven- and eight-atom cyclizations. [sent-266, score-0.241]
87 In reality, though, larger ring-forming reactions are possible. [sent-267, score-0.603]
88 The second row of Table 3 shows the top two ranked reactions over a set of bromo-hept-1-en-2-olate reactants, leading to seven-membered ring formation. [sent-268, score-0.612]
89 The ranking model, without ever being trained with seven- or eight-membered ring-forming reactions, returns the enolate attack as the most favorable, but also returns the lone pair nucleophilic substitution as the second most favorable. [sent-269, score-0.233]
90 For example, the third row of Table 3 shows two reactions involving an oxonium compound and a bromide anion. [sent-275, score-0.58]
91 Our ranking models return these two reactions as the highest ranked, placing the deprotonation slightly ahead of the substitution. [sent-276, score-0.837]
92 This is considered a Within-1 ranking because the Reaction Explorer system labels only the substitution reaction as productive. [sent-277, score-0.706]
93 However, the immediate precursor reaction in the sequence of Reaction Explorer mechanisms leading to these reactants is the inverse of the deprotonation reaction, i.e., a protonation. [sent-278, score-0.721]
94 Hydrogen transfer reactions like this are reversible, and thus the deprotonation is likely the kinetically favored mechanism, i.e., a chemically reasonable prediction. [sent-281, score-0.613]
95 5 Conclusion Being able to predict the outcome of chemical reactions is a fundamental scientific problem. [sent-286, score-0.658]
96 The ultimate goal of a reaction prediction system is to recapitulate and eventually surpass the ability of human chemists. [sent-287, score-0.608]
97 In this work, we take a significant step in this direction, showing how to formulate reaction prediction as a machine learning problem and building an accurate implementation for a large and key subset of organic chemistry. [sent-288, score-0.68]
98 There are a number of immediate applications of our system, including validating retro-synthetic suggestions, generating virtual libraries of molecules, and mechanistically annotating existing reaction databases. [sent-289, score-0.584]
99 No electron left behind: a rule-based expert system to predict chemical reactions and reaction mechanisms. [sent-414, score-1.406]
100 Automated derivation of reaction rules for the EROS 6.0 system. [sent-423, score-0.546]
wordName wordTfidf (topN-words)
[('reactions', 0.547), ('reaction', 0.546), ('productive', 0.273), ('sink', 0.203), ('mechanistic', 0.194), ('reactants', 0.131), ('ranking', 0.123), ('electron', 0.115), ('chemical', 0.111), ('explorer', 0.109), ('organic', 0.109), ('atom', 0.108), ('cv', 0.097), ('ndcg', 0.087), ('source', 0.077), ('mos', 0.076), ('reactive', 0.062), ('mo', 0.058), ('electrons', 0.055), ('pharmacophore', 0.055), ('reactivity', 0.055), ('elementary', 0.053), ('expert', 0.05), ('percent', 0.05), ('bond', 0.048), ('chemistry', 0.048), ('molecule', 0.048), ('molecular', 0.046), ('deprotonation', 0.044), ('orbital', 0.044), ('solvation', 0.044), ('tnr', 0.044), ('ranked', 0.043), ('concerted', 0.038), ('mechanistically', 0.038), ('system', 0.037), ('overall', 0.036), ('queries', 0.034), ('reality', 0.034), ('transformation', 0.033), ('polar', 0.033), ('aprotic', 0.033), ('bonds', 0.033), ('bromide', 0.033), ('carbons', 0.033), ('generalizable', 0.033), ('lone', 0.033), ('physicochemical', 0.033), ('reagent', 0.033), ('srcreact', 0.033), ('attack', 0.031), ('transformations', 0.03), ('drug', 0.029), ('curation', 0.026), ('graph', 0.026), ('molecules', 0.025), ('string', 0.025), ('prediction', 0.025), ('internal', 0.024), ('suite', 0.024), ('seven', 0.024), ('composed', 0.023), ('potential', 0.023), ('tuple', 0.023), ('sd', 0.023), ('sigmoidal', 0.023), ('ring', 0.022), ('acs', 0.022), ('alkene', 0.022), ('alkyl', 0.022), ('anion', 0.022), ('carbocation', 0.022), ('chemistries', 0.022), ('chemoinformatics', 0.022), ('claisen', 0.022), ('dinger', 0.022), ('fnr', 0.022), ('gasteiger', 0.022), ('hnke', 0.022), ('hydrobromic', 0.022), ('intelligible', 0.022), ('intramolecular', 0.022), ('kinetically', 0.022), ('nudged', 0.022), ('orbitals', 0.022), ('pericyclic', 0.022), ('productivity', 0.022), ('radical', 0.022), ('sinkreact', 0.022), ('solvent', 0.022), ('stereoselective', 0.022), ('untapped', 0.022), ('site', 0.022), ('databases', 0.021), ('products', 0.02), ('list', 0.02), ('topological', 0.02), ('schr', 0.019), ('acid', 0.019), ('baldi', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 7 nips-2011-A Machine Learning Approach to Predict Chemical Reactions
Author: Matthew A. Kayala, Pierre F. Baldi
Abstract: Being able to predict the course of arbitrary chemical reactions is essential to the theory and applications of organic chemistry. Previous approaches are not highthroughput, are not generalizable or scalable, or lack sufficient data to be effective. We describe single mechanistic reactions as concerted electron movements from an electron orbital source to an electron orbital sink. We use an existing rule-based expert system to derive a dataset consisting of 2,989 productive mechanistic steps and 6.14 million non-productive mechanistic steps. We then pose identifying productive mechanistic steps as a ranking problem: rank potential orbital interactions such that the top ranked interactions yield the major products. The machine learning implementation follows a two-stage approach, in which we first train atom level reactivity filters to prune 94.0% of non-productive reactions with less than a 0.1% false negative rate. Then, we train an ensemble of ranking models on pairs of interacting orbitals to learn a relative productivity function over single mechanistic reactions in a given system. Without the use of explicit transformation patterns, the ensemble perfectly ranks the productive mechanisms at the top 89.1% of the time, rising to 99.9% of the time when top ranked lists with at most four nonproductive reactions are considered. The final system allows multi-step reaction prediction. Furthermore, it is generalizable, making reasonable predictions over reactants and conditions which the rule-based expert system does not handle.
2 0.092246778 22 nips-2011-Active Ranking using Pairwise Comparisons
Author: Kevin G. Jamieson, Robert Nowak
Abstract: This paper examines the problem of ranking a collection of objects using pairwise comparisons (rankings of two objects). In general, the ranking of n objects can be identified by standard sorting methods using n log2 n pairwise comparisons. We are interested in natural situations in which relationships among the objects may allow for ranking using far fewer pairwise comparisons. Specifically, we assume that the objects can be embedded into a d-dimensional Euclidean space and that the rankings reflect their relative distances from a common reference point in Rd . We show that under this assumption the number of possible rankings grows like n2d and demonstrate an algorithm that can identify a randomly selected ranking using just slightly more than d log n adaptively selected pairwise comparisons, on average. If instead the comparisons are chosen at random, then almost all pairwise comparisons must be made in order to identify any ranking. In addition, we propose a robust, error-tolerant algorithm that only requires that the pairwise comparisons are probably correct. Experimental studies with synthetic and real datasets support the conclusions of our theoretical analysis. 1
3 0.063825786 19 nips-2011-Active Classification based on Value of Classifier
Author: Tianshi Gao, Daphne Koller
Abstract: Modern classification tasks usually involve many class labels and can be informed by a broad range of features. Many of these tasks are tackled by constructing a set of classifiers, which are then applied at test time and then pieced together in a fixed procedure determined in advance or at training time. We present an active classification process at the test time, where each classifier in a large ensemble is viewed as a potential observation that might inform our classification process. Observations are then selected dynamically based on previous observations, using a value-theoretic computation that balances an estimate of the expected classification gain from each observation as well as its computational cost. The expected classification gain is computed using a probabilistic model that uses the outcome from previous observations. This active classification process is applied at test time for each individual test instance, resulting in an efficient instance-specific decision path. We demonstrate the benefit of the active scheme on various real-world datasets, and show that it can achieve comparable or even higher classification accuracy at a fraction of the computational costs of traditional methods.
4 0.057905767 75 nips-2011-Dynamical segmentation of single trials from population neural data
Author: Biljana Petreska, Byron M. Yu, John P. Cunningham, Gopal Santhanam, Stephen I. Ryu, Krishna V. Shenoy, Maneesh Sahani
Abstract: Simultaneous recordings of many neurons embedded within a recurrentlyconnected cortical network may provide concurrent views into the dynamical processes of that network, and thus its computational function. In principle, these dynamics might be identified by purely unsupervised, statistical means. Here, we show that a Hidden Switching Linear Dynamical Systems (HSLDS) model— in which multiple linear dynamical laws approximate a nonlinear and potentially non-stationary dynamical process—is able to distinguish different dynamical regimes within single-trial motor cortical activity associated with the preparation and initiation of hand movements. The regimes are identified without reference to behavioural or experimental epochs, but nonetheless transitions between them correlate strongly with external events whose timing may vary from trial to trial. The HSLDS model also performs better than recent comparable models in predicting the firing rate of an isolated neuron based on the firing rates of others, suggesting that it captures more of the “shared variance” of the data. Thus, the method is able to trace the dynamical processes underlying the coordinated evolution of network activity in a way that appears to reflect its computational role. 1
5 0.047355451 201 nips-2011-On the Completeness of First-Order Knowledge Compilation for Lifted Probabilistic Inference
Author: Guy Broeck
Abstract: Probabilistic logics are receiving a lot of attention today because of their expressive power for knowledge representation and learning. However, this expressivity is detrimental to the tractability of inference, when done at the propositional level. To solve this problem, various lifted inference algorithms have been proposed that reason at the first-order level, about groups of objects as a whole. Despite the existence of various lifted inference approaches, there are currently no completeness results about these algorithms. The key contribution of this paper is that we introduce a formal definition of lifted inference that allows us to reason about the completeness of lifted inference algorithms relative to a particular class of probabilistic models. We then show how to obtain a completeness result using a first-order knowledge compilation approach for theories of formulae containing up to two logical variables. 1 Introduction and related work Probabilistic logic models build on first-order logic to capture relational structure and on graphical models to represent and reason about uncertainty [1, 2]. Due to their expressivity, these models can concisely represent large problems with many interacting random variables. While the semantics of these logics is often defined through grounding the models [3], performing inference at the propositional level is – as for first-order logic – inefficient. This has motivated the quest for lifted inference methods that exploit the structure of probabilistic logic models for efficient inference, by reasoning about groups of objects as a whole and avoiding repeated computations. The first approaches to exact lifted inference have upgraded the variable elimination algorithm to the first-order level [4, 5, 6]. More recent work is based on methods from logical inference [7, 8, 9, 10], such as knowledge compilation. While these approaches often yield dramatic improvements in runtime over propositional inference methods on specific problems, it is still largely unclear for which classes of models these lifted inference operators will be useful and for which ones they will eventually have to resort to propositional inference. One notable exception in this regard is lifted belief propagation [11], which performs exact lifted inference on any model whose factor graph representation is a tree. A first contribution of this paper is that we introduce a notion of domain lifted inference, which formally defines what lifting means, and which can be used to characterize the classes of probabilistic models to which lifted inference applies. Domain lifted inference essentially requires that probabilistic inference runs in polynomial time in the domain size of the logical variables appearing in the model. As a second contribution we show that the class of models expressed as 2-WFOMC formulae (weighted first-order model counting with up to 2 logical variables per formula) can be domain lifted using an extended first-order knowledge compilation approach [10]. The resulting approach allows for lifted inference even in the presence of (anti-) symmetric or total relations in a theory. These are extremely common and useful concepts that cannot be lifted by any of the existing first-order knowledge compilation inference rules. 1 2 Background We will use standard concepts of function-free first-order logic (FOL). An atom p(t1 , . . . , tn ) consists of a predicate p/n of arity n followed by n arguments, which are either constants or logical variables. 
An atom is ground if it does not contain any variables. A literal is an atom a or its negation ¬a. A clause is a disjunction l1 ∨ ... ∨ lk of literals. If k = 1, it is a unit clause. An expression is an atom, literal or clause. The pred(a) function maps an atom to its predicate and the vars(e) function maps an expression to its logical variables. A theory in conjunctive normal form (CNF) is a conjunction of clauses. We often represent theories by their set of clauses and clauses by their set of literals. Furthermore, we will assume that all logical variables are universally quantified. In addition, we associate a set of constraints with each clause or atom, either of the form X = t, where X is a logical variable and t is a constant or variable, or of the form X ∈ D, where D is a domain, or the negation of these constraints. These define a finite domain for each logical variable. Abusing notation, we will use constraints of the form X = t to denote a substitution of X by t. The function atom(e) maps an expression e to its atoms, now associating the constraints on e with each atom individually. To add the constraint c to an expression e, we use the notation e ∧ c. Two atoms unify if there is a substitution which makes them identical and if the conjunction of the constraints on both atoms with the substitution is satisfiable. Two expressions e1 and e2 are independent, written e1 ⊥ e2 , if no atom a1 ∈ atom(e1 ) unifies with an atom a2 ∈ atom(e2 ). ⊥ We adopt the Weighted First-Order Model Counting (WFOMC) [10] formalism to represent probabilistic logic models, building on the notion of a Herbrand interpretation. Herbrand interpretations are subsets of the Herbrand base HB (T ), which consists of all ground atoms that can be constructed with the available predicates and constant symbols in T . The atoms in a Herbrand interpretation are assumed to be true. All other atoms in HB (T ) are assumed to be false. An interpretation I satisfies a theory T , written as I |= T , if it satisfies all the clauses c ∈ T . The WFOMC problem is defined on a weighted logic theory T , which is a logic theory augmented with a positive weight function w and a negative weight function w, which assign a weight to each predicate. The WFOMC problem involves computing wmc(T, w, w) = w(pred(a)) I|=T a∈I 3 3.1 w(pred(a)). (1) a∈HB(T )\I First-order knowledge compilation for lifted probabilistic inference Lifted probabilistic inference A first-order probabilistic model defines a probability distribution P over the set of Herbrand interpretations H. Probabilistic inference in these models is concerned with computing the posterior probability P(q|e) of query q given evidence e, where q and e are logical expressions in general: P(q|e) = h∈H,h|=q∧e P(h) h∈H,h|=e P(h) (2) We propose one notion of lifted inference for first-order probabilistic models, defined in terms of the computational complexity of inference w.r.t. the domains of logical variables. It is clear that other notions of lifted inference are conceivable, especially in the case of approximate inference. Definition 1 (Domain Lifted Probabilistic Inference). A probabilistic inference procedure is domain lifted for a model m, query q and evidence e iff the inference procedure runs in polynomial time in |D1 |, . . . , |Dk | with Di the domain of the logical variable vi ∈ vars(m, q, e). 
Domain lifted inference does not prohibit the algorithm to be exponential in the size of the vocabulary, that is, the number of predicates, arguments and constants, of the probabilistic model, query and evidence. In fact, the definition allows inference to be exponential in the number of constants which occur in arguments of atoms in the theory, query or evidence, as long as it is polynomial in the cardinality of the logical variable domains. This definition of lifted inference stresses the ability to efficiently deal with the domains of the logical variables that arise, regardless of their size, and formalizes what seems to be generally accepted in the lifted inference literature. 2 A class of probabilistic models is a set of probabilistic models expressed in a particular formalism. As examples, consider Markov logic networks (MLN) [12] or parfactors [4], or the weighted FOL theories for WFOMC that we introduced above, when the weights are normalized. Definition 2 (Completeness). Restricting queries to atoms and evidence to a conjunction of literals, a procedure that is domain lifted for all probabilistic models m in a class of models M and for all queries q and evidence e, is called complete for M . 3.2 First-order knowledge compilation First-order knowledge compilation is an approach to lifted probabilistic inference consisting of the following three steps (see Van den Broeck et al. [10] for details): 1. Convert the probabilistic logical model to a weighted CNF. Converting MLNs or parfactors requires adding new atoms to the theory that represent the (truth) value of each factor or formula. set-disjunction 2 friends(X, Y ) ∧ smokes(X) ⇒ smokes(Y ) Smokers ⊆ People decomposable conjunction unit clause leaf (a) MLN Model ∧ smokes(X), X ∈ Smokers smokes(Y ) ∨ ¬ smokes(X) ∨¬ friends(X, Y ) ∨ ¬ f(X, Y ) friends(X, Y ) ∨ f(X, Y ) smokes(X) ∨ f(X, Y ) ¬ smokes(Y ) ∨ f(X, Y ). ∧ f(X, Y ), Y ∈ Smokers ∧ ¬ smokes(Y ), Y ∈ Smokers / ∧ f(X, Y ), X ∈ Smokers, Y ∈ Smokers / / x ∈ Smokers (b) CNF Theory deterministic disjunction Predicate friends smokes f w 1 1 e2 w 1 1 1 y ∈ Smokers / ∨ set-conjunction ∧ f(x, y) (c) Weight Functions ¬ friends(x, y) ∧ friends(x, y) ¬ f(x, y) (d) First-Order d-DNNF Circuit Figure 1: Friends-smokers example (taken from [10]) Example 1. The MLN in Figure 1a assigns a weight to a formula in FOL. Figure 1b represents the same model as a weighted CNF, introducing a new atom f(X, Y ) to encode the truth value of the MLN formula. The probabilistic information is captured by the weight functions in Figure 1c. 2. Compile the logical theory into a First-Order d-DNNF (FO d-DNNF) circuit. Figure 1d shows an example of such a circuit. Leaves represent unit clauses. Inner nodes represent the disjunction or conjunction of their children l and r, but with the constraint that disjunctions must be deterministic (l ∧ r is unsatisfiable) and conjunctions must be decomposable (l ⊥ r). ⊥ 3. Perform WFOMC inference to compute posterior probabilities. In a FO d-DNNF circuit, WFOMC is polynomial in the size of the circuit and the cardinality of the domains. To compile the CNF theory into a FO d-DNNF circuit, Van den Broeck et al. [10] propose a set of compilation rules, which we will refer to as CR 1 . We will now briefly describe these rules. Unit Propagation introduces a decomposable conjunction when the theory contains a unit clause. Independence creates a decomposable conjunction when the theory contains independent subtheories. 
Shannon decomposition applies when the theory contains ground atoms and introduces a deterministic disjunction between two modified theories: one where the ground atom is true, and one where it is false. Shattering splits clauses in the theory until all pairs of atoms represent either a disjoint or identical set of ground atoms. Example 2. In Figure 2a, the first two clauses are made independent from the friends(X, X) clause and split off in a decomposable conjunction by unit propagation. The unit clause becomes a leaf of the FO d-DNNF circuit, while the other operand requires further compilation. 3 dislikes(X, Y ) ∨ friends(X, Y ) fun(X) ∨ ¬ friends(X, Y ) friends(X, Y ) ∨ dislikes(X, Y ) ¬ friends(X, Y ) ∨ likes(X, Y ) friends(X, X) fun(X) ∨ ¬ friends(X, Y ) fun(X) ∨ ¬ friends(Y, X) ∧ FunPeople ⊆ People x ∈ People friends(X, X) friends(X, Y ) ∨ dislikes(X, Y ), X = Y ¬ friends(X, Y ) ∨ likes(X, Y ), X = Y likes(X, X) dislikes(x, Y ) ∨ friends(x, Y ) fun(x) ∨ ¬ friends(x, Y ) fun(X), X ∈ FunPeople ¬ fun(X), X ∈ FunPeople / fun(X) ∨ ¬ friends(X, Y ) fun(X) ∨ ¬ friends(Y, X) (a) Unit propagation of friends(X, X) (b) Independent partial grounding (c) Atom counting of fun(X) Figure 2: Examples of compilation rules. Circles are FO d-DNNF inner nodes. White rectangles show theories before and after applying the rule. All variable domains are People. (taken from [10]) Independent Partial Grounding creates a decomposable conjunction over a set of child circuits, which are identical up to the value of a grounding constant. Since they are structurally identical, only one child circuit is actually compiled. Atom Counting applies when the theory contains an atom with a single logical variable X ∈ D. It explicitly represents the domain D ⊆ D of X for which the atom is true. It compiles the theory into a deterministic disjunction between all possible such domains. Again, these child circuits are identical up to the value of D and only one is compiled. Example 3. The theory in Figure 2b is compiled into a decomposable set-conjunction of theories that are independent and identical up to the value of the x constant. The theory in Figure 2c contains an atom with one logical variable: fun(X). Atom counting compiles it into a deterministic setdisjunction over theories that differ in FunPeople, which is the domain of X for which fun(X) is true. Subsequent steps of unit propagation remove the fun(X) atoms from the theory entirely. 3.3 Completeness We will now characterize those theories where the CR 1 compilation rules cannot be used, and where the inference procedure has to resort to grounding out the theory to propositional logic. For these, first-order knowledge compilation using CR 1 is not yet domain lifted. When a logical theory contains symmetric, anti-symmetric or total relations, such as friends(X, Y ) ⇒ friends(Y, X), parent(X, Y ) ⇒ ¬ parent(Y, X), X = Y, ≤ (X, Y) ∨ ≤ (Y, X), or more general formulas, such as enemies(X, Y ) ⇒ ¬ friend(X, Y ) ∧ ¬ friend(Y, X), (3) (4) (5) (6) none of the CR 1 rules apply. Intuitively, the underlying problem is the presence of either: • Two unifying (not independent) atoms in the same clause which contain the same logical variable in different positions of the argument list. Examples include (the CNF of) Formulas 3, 4 and 5, where the X and Y variable are bound by unifying two atoms from the same clause. • Two logical variables that bind when unifying one pair of atoms but appear in different positions of the argument list of two other unifying atoms. 
Examples include Formula 6, which in CNF is ¬ friend(X, Y ) ∨ ¬ enemies(X, Y ) ¬ friend(Y, X) ∨ ¬ enemies(X, Y ) Here, unifying the enemies(X, Y ) atoms binds the X variables from both clauses, which appear in different positions of the argument lists of the unifying atoms friend(X, Y ) and friend(Y, X). Both of these properties preclude the use of CR 1 rules. Also in the context of other model classes, such as MLNs, probabilistic versions of the above formulas cannot be processed by CR 1 rules. 4 Even though first-order knowledge compilation with CR 1 rules does not have a clear completeness result, we can show some properties of theories to which none of the compilation rules apply. First, we need to distinguish between the arity of an atom and its dimension. A predicate with arity two might have atoms with dimension one, when one of the arguments is ground or both are identical. Definition 3 (Dimension of an Expression). The dimension of an expression e is the number of logical variables it contains: dim(e) = | vars(e)|. Lemma 1 (CR 1 Postconditions). The CR 1 rules remove all atoms from the theory T which have zero or one logical variable arguments, such that afterwards ∀a ∈ atom(T ) : dim(a) > 1. When no CR 1 rule applies, the theory is shattered and contains no independent subtheories. Proof. Ground atoms are removed by the Shannon decomposition operator followed by unit propagation. Atoms with a single logical variable (including unary relations) are removed by the atom counting operator followed by unit propagation. If T contains independent subtheories, the independence operator can be applied. Shattering is always applied when T is not yet shattered. 4 Extending first-order knowledge compilation In this section we introduce a new operator which does apply to the theories from Section 3.3. 4.1 Logical variable properties To formally define the operator we propose, and prove its correctness, we first introduce some mathematical concepts related to the logical variables in a theory (partly after Jha et al. [8]). Definition 4 (Binding Variables). Two logical variables X, Y are directly binding b(X, Y ) if they are bound by unifying a pair of atoms in the theory. The binding relationship b+ (X, Y ) is the transitive closure of the directly binding relation b(X, Y ). Example 4. In the theory ¬ p(W, X) ∨ ¬ q(X) r(Y ) ∨ ¬ q(Y ) ¬ r(Z) ∨ s(Z) the variable pairs (X, Y ) and (Y, Z) are directly binding. The variables X, Y and Z are binding. Variable W does not bind to any other variable. Note that the binding relationship b+ (X, Y ) is an equivalence relation that defines two equivalence classes: {X, Y, Z} and {W }. Lemma 2 (Binding Domains). After shattering, binding logical variables have identical domains. Proof. During shattering (see Section 3.2), when two atoms unify, binding two variables with partially overlapping domains, the atoms’ clauses are split up into clauses where the domain of the variables is identical, and clauses where the domains are disjoint and the atoms no longer unify. Definition 5 (Root Binding Class). A root variable is a variable that appears in all the atoms in its clause. A root binding class is an equivalence class of binding variables where all variables are root. Example 5. In the theory of Example 4, {X, Y, Z} is a root binding class and {W } is not. 4.2 Domain recursion We will now introduce the new domain recursion operator, starting with its preconditions. Definition 6. 
A theory allows for domain recursion when (i) the theory is shattered, (ii) the theory contains no independent subtheories and (iii) there exists a root binding class. From now on, we will denote with C the set of clauses of the theory at hand and with B a root binding class guaranteed to exist if C allows for domain recursion. Lemma 2 states that all variables in B have identical domains. We will denote the domain of these variables with D. The intuition behind the domain recursion operator is that it modifies D by making one element explicit: D = D ∪ {xD } with xD ∈ D . This explicit domain element is introduced by the S PLIT D / function, which splits clauses w.r.t. the new subdomain D and element xD . 5 Definition 7 (S PLIT D). For a clause c and given set of variables Vc ⊆ vars(c) with domain D, let S PLIT D(c, Vc ) = c, if Vc = ∅ S PLIT D(c1 , Vc \ {V }) ∪ S PLIT D(c2 , Vc \ {V }), if Vc = ∅ (7) where c1 = c ∧ (V = xD ) and c2 = c ∧ (V = xD ) ∧ (V ∈ D ) for some V ∈ Vc . For a set of clauses C and set of variables V with domain D: S PLIT D(C, V) = c∈C S PLIT D(c, V ∩ vars(c)). The domain recursion operator creates three sets of clauses: S PLIT D(C, B) = Cx ∪ Cv ∪ Cr , with Cx = {c ∧ (V = xD )|c ∈ C}, (8) V ∈B∩vars(c) Cv = {c ∧ (V = xD ) ∧ (V ∈ D )|c ∈ C}, (9) V ∈B∩vars(c) Cr = S PLIT D(C, B) \ Cx \ Cv . (10) Proposition 3. The conjunction of the domain recursion sets is equivalent to the original theory: c∈Cr c . c∈Cv c ∧ c∈C c ≡ c∈S PLIT D(C,B) c and therefore c∈C c ≡ c∈Cx c ∧ We will now show that these sets are independent and that their conjunction is decomposable. Theorem 4. The theories Cx , Cv and Cr are independent: Cx ⊥ Cv , Cx ⊥ Cr and Cv ⊥ Cr . ⊥ ⊥ ⊥ The proof of Theorem 4 relies on the following Lemma. Lemma 5. If the theory allows for domain recursion, all clauses and atoms contain the same number of variables from B: ∃n, ∀c ∈ C, ∀a ∈ atom(C) : | vars(c) ∩ B | = | vars(a) ∩ B | = n. c Proof. Denote with Cn the clauses in C that contain n logical variables from B and with Cn its compliment in C. If C is nonempty, there is a n > 0 for which Cn is nonempty. Then every atom in Cn contains exactly n variables from B (Definition 5). Since the theory contains no independent c c subtheories, there must be an atom a in Cn which unifies with an atom ac in Cn , or Cn is empty. After shattering, all unifications bind one variable from a to a single variable from ac . Because a contains exactly n variables from B, ac must also contain exactly n (Definition 4), and because B is c a root binding class, the clause of ac also contains exactly n, which contradicts the definition of Cn . c Therefore, Cn is empty, and because the variables in B are root, they also appear in all atoms. Proof of Theorem 4. From Lemma 5, all atoms in C contain the same number of variables from B. In Cx , these variables are all constrained to be equal to xD , while in Cv and Cr at least one variable is constrained to be different from xD . An attempt to unify an atom from Cx with an atom from Cv or Cr therefore creates an unsatisfiable set of constraints. Similarly, atoms from Cv and Cr cannot be unified. Finally, we extend the FO d-DNNF language proposed in Van den Broeck et al. [10] with a new node, the recursive decomposable conjunction ∧ r , and define the domain recursion compilation rule. Definition 8 ( ∧ r ). The FO d-DNNF node ∧ r (nx , nr , D, D , V) represents a decomposable conjunction between the d-DNNF nodes nx , nr and a d-DNNF node isomorphic to the ∧ r node itself. 
In particular, the isomorphic operand is identical to the node itself, except for the size of the domain of the variables in V, which becomes one smaller, going from D to D in the isomorphic operand. We have shown that the conjunction between sets Cx , Cv and Cr is decomposable (Theorem 4) and logically equivalent to the original theory (Proposition 3). Furthermore, Cv is identical to C, up to the constraints on the domain of the variables in B. This leads us to the following definition of domain recursion. Definition 9 (Domain Recursion). The domain recursion compilation rule compiles C into ∧ r (nx , nr , D, D , B), where nx , nr are the compiled circuits for Cx , Cr . The third set Cv is represented by the recursion on D, according to Definition 8. 6 nv Cr ∧r ¬ friends(x, X) ∨ friends(X, x), X = x ¬ friends(X, x) ∨ friends(x, X), X = x P erson ← P erson \ {x} nr nx ∨ Cx ¬ friends(x, x) ∨ friends(x, x) x ∈P erson x =x ¬ friends(x, x) friends(x, x) ∨ ∧ ¬ friends(x, x ) ¬ friends(x , x) ∧ friends(x, x ) friends(x , x) Figure 3: Circuit for the symmetric relation in Equation 3, rooted in a recursive conjunction. Example 6. Figure 3 shows the FO d-DNNF circuit for Equation 3. The theory is split up into three independent theories: Cr and Cx , shown in the Figure 3, and Cv = {¬ friends(X, Y ) ∨ friends(Y, X), X = x, Y = x}. The conjunction of these theories is equivalent to Equation 3. Theory Cv is identical to Equation 3, up to the inequality constraints on X and Y . Theorem 6. Given a function size, which maps domains to their size, the weighted first-order model count of a ∧ r (nx , nr , D, D , V) node is size(D) wmc( ∧ r (nx , nr , D, D , V), size) = wmc(nx , size)size(D) wmc(nr , size ∪{D → s}), s=0 (11) where size ∪{D → s} adds to the size function that the subdomain D has cardinality s. Proof. If C allows for domain recursion, due to Theorem 4, the weighted model count is wmc(C, size) = 1, if size(D) = 0 wmc(Cx ) · wmc(Cv , size ) · wmc(Cr , size ) if size(D) > 0 (12) where size = size ∪{D → size(D) − 1}. Theorem 7. The Independent Partial Grounding compilation rule is a special case of the domain recursion rule, where ∀c ∈ C : | vars(c) ∩ B | = 1 (and therefore Cr = ∅). 4.3 Completeness In this section, we introduce a class of models for which first-order knowledge compilation with domain recursion is complete. Definition 10 (k-WFOMC). The class of k-WFOMC consist of WFOMC theories with clauses that have up to k logical variables. A first completeness result is for 2-WFOMC, using the set of knowledge compilation rules CR 2 , which are the rules in CR 1 extended with domain recursion. Theorem 8 (Completeness for 2-WFOMC). First-order knowledge compilation using the CR 2 compilation rules is a complete domain lifted probabilistic inference algorithm for 2-WFOMC. Proof. From Lemma 1, after applying the CR 1 rules, the theory contains only atoms with dimension larger than or equal to two. From Definition 10, each clause has dimension smaller than or equal to two. Therefore, each logical variable in the theory is a root variable and according to Definition 5, every equivalence class of binding variables is a root binding class. Because of Lemma 1, the theory allows for domain recursion, which requires further compilation of two theories: Cx and Cr into nx and nr . Both have dimension smaller than 2 and can be lifted by CR 1 compilation rules. The properties of 2-WFOMC are a sufficient but not necessary condition for first-order knowledge compilation to be domain lifted. 
We can obtain a similar result for MLNs or parfactors by reducing them to a WFOMC problem. If an MLN contains only formulae with up to k logical variables, then its WFOMC representation will be in k-WFOMC. 7 This result for 2-WFOMC is not trivial. Van den Broeck et al. [10] showed in their experiments that counting first-order variable elimination (C-FOVE) [6] fails to lift the “Friends Smoker Drinker” problem, which is in 2-WFOMC. We will show in the next section that the CR 1 rules fail to lift the theory in Figure 4a, which is in 2-WFOMC. Note that there are also useful theories that are not in 2-WFOMC, such as those containing the transitive relation friends(X, Y ) ∧ friends(Y, Z) ⇒ friends(X, Z). 5 Empirical evaluation To complement the theoretical results of the previous section, we extended the WFOMC implementation1 with the domain recursion rule. We performed experiments with the theory in Figure 4a, which is a version of the friends and smokers model [11] extended with the symmetric relation of Equation 3. We evaluate the performance querying P(smokes(bob)) with increasing domain size, comparing our approach to the existing WFOMC implementation and its propositional counterpart, which first grounds the theory and then compiles it with the c2d compiler [13] to a propositional d-DNNF circuit. We did not compare to C-FOVE [6] because it cannot perform lifted inference on this model. 2 smokes(X) ∧ friends(X, Y ) ⇒ smokes(Y ) friends(X, Y ) ⇒ friends(Y, X). Runtime [s] Propositional inference quickly becomes intractable when there are more than 20 people. The lifted inference algorithms scale much better. The CR 1 rules can exploit some regularities in the model. For example, they eliminate all the smokes(X) atoms from the theory. They do, however, resort to grounding at a later stage of the compilation process. With the domain recursion rule, there is no need for grounding. This advantage is clear in the experiments, our approach having an almost constant inference time in this range of domains sizes. Note that the runtimes for c2d include compilation and evaluation of the circuit, whereas the WFOMC runtimes only represent evaluation of the FO d-DNNF. After all, propositional compilation depends on the domain size but first-order compilation does not. First-order compilation takes a constant two seconds for both rule sets. 10000 1000 100 10 1 0.1 0.01 c2d WFOMC - CR1 WFOMC - CR2 10 20 30 40 50 60 Number of People 70 80 (b) Evaluation Runtime (a) MLN Model Figure 4: Symmetric friends and smokers experiment, comparing propositional knowledge compilation (c2d) to WFOMC using compilation rules CR 1 and CR 2 (which includes domain recursion). 6 Conclusions We proposed a definition of complete domain lifted probabilistic inference w.r.t. classes of probabilistic logic models. This definition considers algorithms to be lifted if they are polynomial in the size of logical variable domains. Existing first-order knowledge compilation turns out not to admit an intuitive completeness result. Therefore, we generalized the existing Independent Partial Grounding compilation rule to the domain recursion rule. With this one extra rule, we showed that first-order knowledge compilation is complete for a significant class of probabilistic logic models, where the WFOMC representation has up to two logical variables per clause. Acknowledgments The author would like to thank Luc De Raedt, Jesse Davis and the anonymous reviewers for valuable feedback. 
This work was supported by the Research Foundation-Flanders (FWO-Vlaanderen). 1 http://dtai.cs.kuleuven.be/wfomc/ 8 References [1] Lise Getoor and Ben Taskar, editors. An Introduction to Statistical Relational Learning. MIT Press, 2007. [2] Luc De Raedt, Paolo Frasconi, Kristian Kersting, and Stephen Muggleton, editors. Probabilistic inductive logic programming: theory and applications. Springer-Verlag, Berlin, Heidelberg, 2008. [3] Daan Fierens, Guy Van den Broeck, Ingo Thon, Bernd Gutmann, and Luc De Raedt. Inference in probabilistic logic programs using weighted CNF’s. In Proceedings of UAI, pages 256–265, 2011. [4] David Poole. First-order probabilistic inference. In Proceedings of IJCAI, pages 985–991, 2003. [5] Rodrigo de Salvo Braz, Eyal Amir, and Dan Roth. Lifted first-order probabilistic inference. In Proceedings of IJCAI, pages 1319–1325, 2005. [6] Brian Milch, Luke S. Zettlemoyer, Kristian Kersting, Michael Haimes, and Leslie Pack Kaelbling. Lifted Probabilistic Inference with Counting Formulas. In Proceedings of AAAI, pages 1062–1068, 2008. [7] Vibhav Gogate and Pedro Domingos. Exploiting Logical Structure in Lifted Probabilistic Inference. In Proceedings of StarAI, 2010. [8] Abhay Jha, Vibhav Gogate, Alexandra Meliou, and Dan Suciu. Lifted Inference Seen from the Other Side: The Tractable Features. In Proceedings of NIPS, 2010. [9] Vibhav Gogate and Pedro Domingos. Probabilistic theorem proving. In Proceedings of UAI, pages 256–265, 2011. [10] Guy Van den Broeck, Nima Taghipour, Wannes Meert, Jesse Davis, and Luc De Raedt. Lifted Probabilistic Inference by First-Order Knowledge Compilation. In Proceedings of IJCAI, pages 2178–2185, 2011. [11] Parag Singla and Pedro Domingos. Lifted first-order belief propagation. In Proceedings of AAAI, pages 1094–1099, 2008. [12] Matthew Richardson and Pedro Domingos. Markov logic networks. Machine Learning, 62(1): 107–136, 2006. [13] Adnan Darwiche. New advances in compiling CNF to decomposable negation normal form. In Proceedings of ECAI, pages 328–332, 2004. 9
6 0.038752589 12 nips-2011-A Two-Stage Weighting Framework for Multi-Source Domain Adaptation
7 0.033117011 291 nips-2011-Transfer from Multiple MDPs
8 0.031977706 205 nips-2011-Online Submodular Set Cover, Ranking, and Repeated Active Learning
9 0.030935887 53 nips-2011-Co-Training for Domain Adaptation
10 0.030574227 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding
11 0.029971175 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
12 0.026508909 219 nips-2011-Predicting response time and error rates in visual search
13 0.02531627 157 nips-2011-Learning to Search Efficiently in High Dimensions
14 0.024979319 155 nips-2011-Learning to Agglomerate Superpixel Hierarchies
15 0.024650529 169 nips-2011-Maximum Margin Multi-Label Structured Prediction
16 0.024016282 304 nips-2011-Why The Brain Separates Face Recognition From Object Recognition
17 0.023975685 40 nips-2011-Automated Refinement of Bayes Networks' Parameters based on Test Ordering Constraints
18 0.02299956 231 nips-2011-Randomized Algorithms for Comparison-based Search
19 0.022308409 146 nips-2011-Learning Higher-Order Graph Structure with Features by Structure Penalty
20 0.021585463 250 nips-2011-Shallow vs. Deep Sum-Product Networks
topicId topicWeight
[(0, 0.081), (1, 0.017), (2, -0.006), (3, 0.032), (4, 0.016), (5, -0.013), (6, -0.022), (7, -0.039), (8, -0.022), (9, -0.021), (10, -0.028), (11, -0.025), (12, -0.007), (13, 0.021), (14, 0.05), (15, -0.006), (16, -0.024), (17, 0.04), (18, 0.021), (19, -0.007), (20, 0.008), (21, 0.056), (22, -0.003), (23, 0.014), (24, -0.048), (25, 0.002), (26, 0.006), (27, -0.004), (28, -0.04), (29, -0.025), (30, 0.085), (31, -0.089), (32, -0.002), (33, -0.006), (34, -0.051), (35, -0.069), (36, -0.033), (37, 0.01), (38, -0.047), (39, -0.015), (40, -0.003), (41, -0.023), (42, -0.01), (43, -0.023), (44, -0.025), (45, 0.119), (46, -0.045), (47, 0.064), (48, -0.104), (49, -0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.90680349 7 nips-2011-A Machine Learning Approach to Predict Chemical Reactions
Author: Matthew A. Kayala, Pierre F. Baldi
2 0.4746899 279 nips-2011-Target Neighbor Consistent Feature Weighting for Nearest Neighbor Classification
Author: Ichiro Takeuchi, Masashi Sugiyama
Abstract: We consider feature selection and weighting for nearest neighbor classifiers. A technical challenge in this scenario is how to cope with discrete update of nearest neighbors when the feature space metric is changed during the learning process. This issue, called the target neighbor change, was not properly addressed in the existing feature weighting and metric learning literature. In this paper, we propose a novel feature weighting algorithm that can exactly and efficiently keep track of the correct target neighbors via sequential quadratic programming. To the best of our knowledge, this is the first algorithm that guarantees the consistency between target neighbors and the feature space metric. We further show that the proposed algorithm can be naturally combined with regularization path tracking, allowing computationally efficient selection of the regularization parameter. We demonstrate the effectiveness of the proposed algorithm through experiments.
3 0.47101173 136 nips-2011-Inverting Grice's Maxims to Learn Rules from Natural Language Extractions
Author: Mohammad S. Sorower, Janardhan R. Doppa, Walker Orr, Prasad Tadepalli, Thomas G. Dietterich, Xiaoli Z. Fern
Abstract: We consider the problem of learning rules from natural language text sources. These sources, such as news articles and web texts, are created by a writer to communicate information to a reader, where the writer and reader share substantial domain knowledge. Consequently, the texts tend to be concise and mention the minimum information necessary for the reader to draw the correct conclusions. We study the problem of learning domain knowledge from such concise texts, which is an instance of the general problem of learning in the presence of missing data. However, unlike standard approaches to missing data, in this setting we know that facts are more likely to be missing from the text in cases where the reader can infer them from the facts that are mentioned combined with the domain knowledge. Hence, we can explicitly model this “missingness” process and invert it via probabilistic inference to learn the underlying domain knowledge. This paper introduces a mention model that models the probability of facts being mentioned in the text based on what other facts have already been mentioned and domain knowledge in the form of Horn clause rules. Learning must simultaneously search the space of rules and learn the parameters of the mention model. We accomplish this via an application of Expectation Maximization within a Markov Logic framework. An experimental evaluation on synthetic and natural text data shows that the method can learn accurate rules and apply them to new texts to make correct inferences. Experiments also show that the method out-performs the standard EM approach that assumes mentions are missing at random.
4 0.46349278 201 nips-2011-On the Completeness of First-Order Knowledge Compilation for Lifted Probabilistic Inference
Author: Guy Van den Broeck
Abstract: Probabilistic logics are receiving a lot of attention today because of their expressive power for knowledge representation and learning. However, this expressivity is detrimental to the tractability of inference, when done at the propositional level. To solve this problem, various lifted inference algorithms have been proposed that reason at the first-order level, about groups of objects as a whole. Despite the existence of various lifted inference approaches, there are currently no completeness results about these algorithms. The key contribution of this paper is that we introduce a formal definition of lifted inference that allows us to reason about the completeness of lifted inference algorithms relative to a particular class of probabilistic models. We then show how to obtain a completeness result using a first-order knowledge compilation approach for theories of formulae containing up to two logical variables.
1 Introduction and related work
Probabilistic logic models build on first-order logic to capture relational structure and on graphical models to represent and reason about uncertainty [1, 2]. Due to their expressivity, these models can concisely represent large problems with many interacting random variables. While the semantics of these logics is often defined through grounding the models [3], performing inference at the propositional level is – as for first-order logic – inefficient. This has motivated the quest for lifted inference methods that exploit the structure of probabilistic logic models for efficient inference, by reasoning about groups of objects as a whole and avoiding repeated computations. The first approaches to exact lifted inference have upgraded the variable elimination algorithm to the first-order level [4, 5, 6]. More recent work is based on methods from logical inference [7, 8, 9, 10], such as knowledge compilation. While these approaches often yield dramatic improvements in runtime over propositional inference methods on specific problems, it is still largely unclear for which classes of models these lifted inference operators will be useful and for which ones they will eventually have to resort to propositional inference. One notable exception in this regard is lifted belief propagation [11], which performs exact lifted inference on any model whose factor graph representation is a tree. A first contribution of this paper is that we introduce a notion of domain lifted inference, which formally defines what lifting means, and which can be used to characterize the classes of probabilistic models to which lifted inference applies. Domain lifted inference essentially requires that probabilistic inference runs in polynomial time in the domain size of the logical variables appearing in the model. As a second contribution we show that the class of models expressed as 2-WFOMC formulae (weighted first-order model counting with up to 2 logical variables per formula) can be domain lifted using an extended first-order knowledge compilation approach [10]. The resulting approach allows for lifted inference even in the presence of (anti-) symmetric or total relations in a theory. These are extremely common and useful concepts that cannot be lifted by any of the existing first-order knowledge compilation inference rules.
2 Background
We will use standard concepts of function-free first-order logic (FOL). An atom p(t1, ..., tn) consists of a predicate p/n of arity n followed by n arguments, which are either constants or logical variables.
An atom is ground if it does not contain any variables. A literal is an atom a or its negation ¬a. A clause is a disjunction l1 ∨ ... ∨ lk of literals. If k = 1, it is a unit clause. An expression is an atom, literal or clause. The pred(a) function maps an atom to its predicate and the vars(e) function maps an expression to its logical variables. A theory in conjunctive normal form (CNF) is a conjunction of clauses. We often represent theories by their set of clauses and clauses by their set of literals. Furthermore, we will assume that all logical variables are universally quantified. In addition, we associate a set of constraints with each clause or atom, either of the form X = t, where X is a logical variable and t is a constant or variable, or of the form X ∈ D, where D is a domain, or the negation of these constraints. These define a finite domain for each logical variable. Abusing notation, we will use constraints of the form X = t to denote a substitution of X by t. The function atom(e) maps an expression e to its atoms, now associating the constraints on e with each atom individually. To add the constraint c to an expression e, we use the notation e ∧ c. Two atoms unify if there is a substitution which makes them identical and if the conjunction of the constraints on both atoms with the substitution is satisfiable. Two expressions e1 and e2 are independent, written e1 ⊥ e2, if no atom a1 ∈ atom(e1) unifies with an atom a2 ∈ atom(e2). We adopt the Weighted First-Order Model Counting (WFOMC) [10] formalism to represent probabilistic logic models, building on the notion of a Herbrand interpretation. Herbrand interpretations are subsets of the Herbrand base HB(T), which consists of all ground atoms that can be constructed with the available predicates and constant symbols in T. The atoms in a Herbrand interpretation are assumed to be true. All other atoms in HB(T) are assumed to be false. An interpretation I satisfies a theory T, written as I |= T, if it satisfies all the clauses c ∈ T. The WFOMC problem is defined on a weighted logic theory T, which is a logic theory augmented with a positive weight function w and a negative weight function w̄, which assign a weight to each predicate. The WFOMC problem involves computing
wmc(T, w, w̄) = Σ_{I |= T} Π_{a ∈ I} w(pred(a)) · Π_{a ∈ HB(T) \ I} w̄(pred(a)).   (1)
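To make Equation 1 concrete, here is a minimal brute-force sketch of the definition (not of the paper's lifted algorithm): it enumerates every Herbrand interpretation of a small ground theory and sums the weights of the satisfying ones. The atom encoding and the example grounding of the symmetric relation over a two-constant domain are illustrative assumptions, not taken from the paper.

    from itertools import product

    def wfomc(atoms, clauses, pred, w, w_bar):
        # Sum over all Herbrand interpretations (truth assignments to atoms).
        total = 0.0
        for bits in product([False, True], repeat=len(atoms)):
            interp = dict(zip(atoms, bits))
            # An interpretation contributes iff it satisfies every clause;
            # a clause is a set of (atom, sign) pairs, sign True = positive literal.
            if all(any(interp[a] == sign for a, sign in c) for c in clauses):
                weight = 1.0
                for a, value in interp.items():
                    weight *= w[pred(a)] if value else w_bar[pred(a)]
                total += weight
        return total

    # Grounding of friends(X, Y) => friends(Y, X) over the domain {a, b};
    # the X = Y groundings are tautologies and are omitted.
    atoms = ["friends(a,a)", "friends(a,b)", "friends(b,a)", "friends(b,b)"]
    clauses = [{("friends(a,b)", False), ("friends(b,a)", True)},
               {("friends(b,a)", False), ("friends(a,b)", True)}]
    pred = lambda a: a.split("(")[0]
    print(wfomc(atoms, clauses, pred, {"friends": 1.0}, {"friends": 1.0}))  # 8.0

With all weights equal to 1, the sum reduces to plain model counting: the off-diagonal pair of atoms must agree, so 8 of the 16 interpretations survive. The cost is exponential in |HB(T)|, which is precisely what the lifted compilation described next is designed to avoid.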
3 First-order knowledge compilation for lifted probabilistic inference
3.1 Lifted probabilistic inference
A first-order probabilistic model defines a probability distribution P over the set of Herbrand interpretations H. Probabilistic inference in these models is concerned with computing the posterior probability P(q|e) of query q given evidence e, where q and e are logical expressions in general:
P(q|e) = Σ_{h ∈ H, h |= q ∧ e} P(h) / Σ_{h ∈ H, h |= e} P(h).   (2)
We propose one notion of lifted inference for first-order probabilistic models, defined in terms of the computational complexity of inference w.r.t. the domains of logical variables. It is clear that other notions of lifted inference are conceivable, especially in the case of approximate inference. Definition 1 (Domain Lifted Probabilistic Inference). A probabilistic inference procedure is domain lifted for a model m, query q and evidence e iff the inference procedure runs in polynomial time in |D1|, ..., |Dk| with Di the domain of the logical variable vi ∈ vars(m, q, e). Domain lifted inference does not prohibit the algorithm to be exponential in the size of the vocabulary, that is, the number of predicates, arguments and constants, of the probabilistic model, query and evidence. In fact, the definition allows inference to be exponential in the number of constants which occur in arguments of atoms in the theory, query or evidence, as long as it is polynomial in the cardinality of the logical variable domains. This definition of lifted inference stresses the ability to efficiently deal with the domains of the logical variables that arise, regardless of their size, and formalizes what seems to be generally accepted in the lifted inference literature. A class of probabilistic models is a set of probabilistic models expressed in a particular formalism. As examples, consider Markov logic networks (MLN) [12] or parfactors [4], or the weighted FOL theories for WFOMC that we introduced above, when the weights are normalized. Definition 2 (Completeness). Restricting queries to atoms and evidence to a conjunction of literals, a procedure that is domain lifted for all probabilistic models m in a class of models M and for all queries q and evidence e, is called complete for M.
3.2 First-order knowledge compilation
First-order knowledge compilation is an approach to lifted probabilistic inference consisting of the following three steps (see Van den Broeck et al. [10] for details): 1. Convert the probabilistic logical model to a weighted CNF. Converting MLNs or parfactors requires adding new atoms to the theory that represent the (truth) value of each factor or formula.
[Figure 1: Friends-smokers example (taken from [10]). Panels: (a) MLN model, the weighted formula 2 friends(X, Y) ∧ smokes(X) ⇒ smokes(Y); (b) CNF theory, with clauses smokes(Y) ∨ ¬smokes(X) ∨ ¬friends(X, Y) ∨ ¬f(X, Y); friends(X, Y) ∨ f(X, Y); smokes(X) ∨ f(X, Y); ¬smokes(Y) ∨ f(X, Y); (c) weight functions, with w = e² and w̄ = 1 for f, and w = w̄ = 1 for friends and smokes; (d) the compiled First-Order d-DNNF circuit, built from a set-disjunction over Smokers ⊆ People with set-conjunction, decomposable conjunction, deterministic disjunction and unit clause leaf nodes.]
Example 1. The MLN in Figure 1a assigns a weight to a formula in FOL. Figure 1b represents the same model as a weighted CNF, introducing a new atom f(X, Y) to encode the truth value of the MLN formula. The probabilistic information is captured by the weight functions in Figure 1c. 2. Compile the logical theory into a First-Order d-DNNF (FO d-DNNF) circuit. Figure 1d shows an example of such a circuit. Leaves represent unit clauses. Inner nodes represent the disjunction or conjunction of their children l and r, but with the constraint that disjunctions must be deterministic (l ∧ r is unsatisfiable) and conjunctions must be decomposable (l ⊥ r). 3. Perform WFOMC inference to compute posterior probabilities. In a FO d-DNNF circuit, WFOMC is polynomial in the size of the circuit and the cardinality of the domains.
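As a hedged illustration of steps 1 and 3, the sketch below grounds the four CNF clauses of Figure 1b over a made-up two-person domain, gives the auxiliary f atoms positive weight e² as in Figure 1c, and recovers the MLN partition function by brute-force weighted model counting; the lifted circuit of step 2 exists exactly to avoid this exponential enumeration. The domain and atom names are assumptions for the example.

    import math
    from itertools import product

    # Hypothetical two-person domain; atom names are illustrative.
    DOMAIN = ["alice", "bob"]
    pairs = [(x, y) for x in DOMAIN for y in DOMAIN]
    atoms = ([f"smokes({x})" for x in DOMAIN]
             + [f"friends({x},{y})" for x, y in pairs]
             + [f"f({x},{y})" for x, y in pairs])

    def clauses():
        # The four clauses of Figure 1b, grounded for every pair (X, Y).
        for x, y in pairs:
            sx, sy = f"smokes({x})", f"smokes({y})"
            fr, fa = f"friends({x},{y})", f"f({x},{y})"
            yield [(sy, True), (sx, False), (fr, False), (fa, False)]
            yield [(fr, True), (fa, True)]
            yield [(sx, True), (fa, True)]
            yield [(sy, False), (fa, True)]

    Z = 0.0
    for bits in product([False, True], repeat=len(atoms)):
        I = dict(zip(atoms, bits))
        if all(any(I[a] == s for a, s in c) for c in clauses()):
            # w(f) = e^2 and all other weights are 1 (Figure 1c).
            Z += math.exp(2.0) ** sum(I[f"f({x},{y})"] for x, y in pairs)
    print(Z)  # MLN partition function: sum over worlds of exp(2 * #true groundings)

Even with two people this enumerates 2^10 interpretations; conditioning the same sum on a query and on evidence gives the posterior of Equation 2.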
To compile the CNF theory into a FO d-DNNF circuit, Van den Broeck et al. [10] propose a set of compilation rules, which we will refer to as CR1. We will now briefly describe these rules. Unit Propagation introduces a decomposable conjunction when the theory contains a unit clause. Independence creates a decomposable conjunction when the theory contains independent subtheories. Shannon decomposition applies when the theory contains ground atoms and introduces a deterministic disjunction between two modified theories: one where the ground atom is true, and one where it is false. Shattering splits clauses in the theory until all pairs of atoms represent either a disjoint or identical set of ground atoms. Example 2. In Figure 2a, the first two clauses are made independent from the friends(X, X) clause and split off in a decomposable conjunction by unit propagation. The unit clause becomes a leaf of the FO d-DNNF circuit, while the other operand requires further compilation.
[Figure 2: Examples of compilation rules: (a) unit propagation of friends(X, X), (b) independent partial grounding, (c) atom counting of fun(X). Circles are FO d-DNNF inner nodes. White rectangles show theories before and after applying the rule. All variable domains are People. (taken from [10])]
Independent Partial Grounding creates a decomposable conjunction over a set of child circuits, which are identical up to the value of a grounding constant. Since they are structurally identical, only one child circuit is actually compiled. Atom Counting applies when the theory contains an atom with a single logical variable X ∈ D. It explicitly represents the domain D′ ⊆ D of X for which the atom is true. It compiles the theory into a deterministic disjunction between all possible such domains. Again, these child circuits are identical up to the value of D′ and only one is compiled. Example 3. The theory in Figure 2b is compiled into a decomposable set-conjunction of theories that are independent and identical up to the value of the x constant. The theory in Figure 2c contains an atom with one logical variable: fun(X). Atom counting compiles it into a deterministic set-disjunction over theories that differ in FunPeople, which is the domain of X for which fun(X) is true. Subsequent steps of unit propagation remove the fun(X) atoms from the theory entirely.
3.3 Completeness
We will now characterize those theories where the CR1 compilation rules cannot be used, and where the inference procedure has to resort to grounding out the theory to propositional logic. For these, first-order knowledge compilation using CR1 is not yet domain lifted. When a logical theory contains symmetric, anti-symmetric or total relations, such as
friends(X, Y) ⇒ friends(Y, X),   (3)
parent(X, Y) ⇒ ¬parent(Y, X), X ≠ Y,   (4)
≤(X, Y) ∨ ≤(Y, X),   (5)
or more general formulas, such as
enemies(X, Y) ⇒ ¬friend(X, Y) ∧ ¬friend(Y, X),   (6)
none of the CR1 rules apply. Intuitively, the underlying problem is the presence of either: • Two unifying (not independent) atoms in the same clause which contain the same logical variable in different positions of the argument list. Examples include (the CNF of) Formulas 3, 4 and 5, where the X and Y variable are bound by unifying two atoms from the same clause. • Two logical variables that bind when unifying one pair of atoms but appear in different positions of the argument list of two other unifying atoms.
Examples include Formula 6, which in CNF is ¬friend(X, Y) ∨ ¬enemies(X, Y) and ¬friend(Y, X) ∨ ¬enemies(X, Y). Here, unifying the enemies(X, Y) atoms binds the X variables from both clauses, which appear in different positions of the argument lists of the unifying atoms friend(X, Y) and friend(Y, X). Both of these properties preclude the use of CR1 rules. Also in the context of other model classes, such as MLNs, probabilistic versions of the above formulas cannot be processed by CR1 rules. Even though first-order knowledge compilation with CR1 rules does not have a clear completeness result, we can show some properties of theories to which none of the compilation rules apply. First, we need to distinguish between the arity of an atom and its dimension. A predicate with arity two might have atoms with dimension one, when one of the arguments is ground or both are identical. Definition 3 (Dimension of an Expression). The dimension of an expression e is the number of logical variables it contains: dim(e) = |vars(e)|. Lemma 1 (CR1 Postconditions). The CR1 rules remove all atoms from the theory T which have zero or one logical variable arguments, such that afterwards ∀a ∈ atom(T): dim(a) > 1. When no CR1 rule applies, the theory is shattered and contains no independent subtheories. Proof. Ground atoms are removed by the Shannon decomposition operator followed by unit propagation. Atoms with a single logical variable (including unary relations) are removed by the atom counting operator followed by unit propagation. If T contains independent subtheories, the independence operator can be applied. Shattering is always applied when T is not yet shattered.
4 Extending first-order knowledge compilation
In this section we introduce a new operator which does apply to the theories from Section 3.3.
4.1 Logical variable properties
To formally define the operator we propose, and prove its correctness, we first introduce some mathematical concepts related to the logical variables in a theory (partly after Jha et al. [8]). Definition 4 (Binding Variables). Two logical variables X, Y are directly binding b(X, Y) if they are bound by unifying a pair of atoms in the theory. The binding relationship b⁺(X, Y) is the transitive closure of the directly binding relation b(X, Y). Example 4. In the theory ¬p(W, X) ∨ ¬q(X), r(Y) ∨ ¬q(Y), ¬r(Z) ∨ s(Z), the variable pairs (X, Y) and (Y, Z) are directly binding. The variables X, Y and Z are binding. Variable W does not bind to any other variable. Note that the binding relationship b⁺(X, Y) is an equivalence relation that defines two equivalence classes: {X, Y, Z} and {W}. Lemma 2 (Binding Domains). After shattering, binding logical variables have identical domains. Proof. During shattering (see Section 3.2), when two atoms unify, binding two variables with partially overlapping domains, the atoms' clauses are split up into clauses where the domain of the variables is identical, and clauses where the domains are disjoint and the atoms no longer unify. Definition 5 (Root Binding Class). A root variable is a variable that appears in all the atoms in its clause. A root binding class is an equivalence class of binding variables where all variables are root. Example 5. In the theory of Example 4, {X, Y, Z} is a root binding class and {W} is not.
4.2 Domain recursion
We will now introduce the new domain recursion operator, starting with its preconditions. Definition 6.
A theory allows for domain recursion when (i) the theory is shattered, (ii) the theory contains no independent subtheories and (iii) there exists a root binding class. From now on, we will denote with C the set of clauses of the theory at hand and with B a root binding class guaranteed to exist if C allows for domain recursion. Lemma 2 states that all variables in B have identical domains. We will denote the domain of these variables with D. The intuition behind the domain recursion operator is that it modifies D by making one element explicit: D = D′ ∪ {x_D} with x_D ∉ D′. This explicit domain element is introduced by the SplitD function, which splits clauses w.r.t. the new subdomain D′ and element x_D. Definition 7 (SplitD). For a clause c and given set of variables V_c ⊆ vars(c) with domain D, let
SplitD(c, V_c) = c, if V_c = ∅; SplitD(c1, V_c \ {V}) ∪ SplitD(c2, V_c \ {V}), if V_c ≠ ∅,   (7)
where c1 = c ∧ (V = x_D) and c2 = c ∧ (V ≠ x_D) ∧ (V ∈ D′) for some V ∈ V_c. For a set of clauses C and set of variables V with domain D: SplitD(C, V) = ⋃_{c ∈ C} SplitD(c, V ∩ vars(c)). The domain recursion operator creates three sets of clauses: SplitD(C, B) = Cx ∪ Cv ∪ Cr, with
Cx = {c ∧ ⋀_{V ∈ B ∩ vars(c)} (V = x_D) | c ∈ C},   (8)
Cv = {c ∧ ⋀_{V ∈ B ∩ vars(c)} ((V ≠ x_D) ∧ (V ∈ D′)) | c ∈ C},   (9)
Cr = SplitD(C, B) \ Cx \ Cv.   (10)
Proposition 3. The conjunction of the domain recursion sets is equivalent to the original theory: ⋀_{c ∈ C} c ≡ ⋀_{c ∈ SplitD(C, B)} c, and therefore ⋀_{c ∈ C} c ≡ ⋀_{c ∈ Cx} c ∧ ⋀_{c ∈ Cv} c ∧ ⋀_{c ∈ Cr} c. We will now show that these sets are independent and that their conjunction is decomposable. Theorem 4. The theories Cx, Cv and Cr are independent: Cx ⊥ Cv, Cx ⊥ Cr and Cv ⊥ Cr. The proof of Theorem 4 relies on the following Lemma. Lemma 5. If the theory allows for domain recursion, all clauses and atoms contain the same number of variables from B: ∃n, ∀c ∈ C, ∀a ∈ atom(C): |vars(c) ∩ B| = |vars(a) ∩ B| = n. Proof. Denote with Cn the clauses in C that contain n logical variables from B and with Cn^c its complement in C. If C is nonempty, there is an n > 0 for which Cn is nonempty. Then every atom in Cn contains exactly n variables from B (Definition 5). Since the theory contains no independent subtheories, there must be an atom a in Cn which unifies with an atom a_c in Cn^c, or Cn^c is empty. After shattering, all unifications bind one variable from a to a single variable from a_c. Because a contains exactly n variables from B, a_c must also contain exactly n (Definition 4), and because B is a root binding class, the clause of a_c also contains exactly n, which contradicts the definition of Cn^c. Therefore, Cn^c is empty, and because the variables in B are root, they also appear in all atoms. Proof of Theorem 4. From Lemma 5, all atoms in C contain the same number of variables from B. In Cx, these variables are all constrained to be equal to x_D, while in Cv and Cr at least one variable is constrained to be different from x_D. An attempt to unify an atom from Cx with an atom from Cv or Cr therefore creates an unsatisfiable set of constraints. Similarly, atoms from Cv and Cr cannot be unified.
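The following small sketch (with hypothetical clause and variable names, and constraints kept as plain strings) illustrates how Definition 7 carves a clause into the cases behind Equations 8–10.

    def split_d(clause, constraints, split_vars):
        # Definition 7: split each variable V into V = xD versus V != xD, V in D'.
        if not split_vars:
            return [(clause, tuple(constraints))]
        v, rest = split_vars[0], split_vars[1:]
        left = split_d(clause, constraints + [f"{v} = xD"], rest)
        right = split_d(clause, constraints + [f"{v} != xD", f"{v} in D'"], rest)
        return left + right

    # Splitting the symmetric clause on its root binding class B = {X, Y}:
    for cl, cons in split_d("¬friends(X,Y) ∨ friends(Y,X)", [], ["X", "Y"]):
        print(cons, ":", cl)
    # Of the four resulting clauses, the all-equal case is Cx (Equation 8), the
    # all-different case is Cv (Equation 9), and the two mixed cases form Cr (10).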
Finally, we extend the FO d-DNNF language proposed in Van den Broeck et al. [10] with a new node, the recursive decomposable conjunction ∧r, and define the domain recursion compilation rule. Definition 8 (∧r). The FO d-DNNF node ∧r(nx, nr, D, D′, V) represents a decomposable conjunction between the d-DNNF nodes nx, nr and a d-DNNF node isomorphic to the ∧r node itself. In particular, the isomorphic operand is identical to the node itself, except for the size of the domain of the variables in V, which becomes one smaller, going from D to D′ in the isomorphic operand. We have shown that the conjunction between sets Cx, Cv and Cr is decomposable (Theorem 4) and logically equivalent to the original theory (Proposition 3). Furthermore, Cv is identical to C, up to the constraints on the domain of the variables in B. This leads us to the following definition of domain recursion. Definition 9 (Domain Recursion). The domain recursion compilation rule compiles C into ∧r(nx, nr, D, D′, B), where nx, nr are the compiled circuits for Cx, Cr. The third set Cv is represented by the recursion on D′, according to Definition 8.
[Figure 3: Circuit for the symmetric relation in Equation 3, rooted in a recursive conjunction ∧r. Its operands are nx, compiling Cx = {¬friends(x, x) ∨ friends(x, x)}; nr, compiling Cr = {¬friends(x, X) ∨ friends(X, x), X ≠ x; ¬friends(X, x) ∨ friends(x, X), X ≠ x}; and the recursive operand nv over the reduced domain Person ← Person \ {x}.]
Example 6. Figure 3 shows the FO d-DNNF circuit for Equation 3. The theory is split up into three independent theories: Cr and Cx, shown in Figure 3, and Cv = {¬friends(X, Y) ∨ friends(Y, X), X ≠ x, Y ≠ x}. The conjunction of these theories is equivalent to Equation 3. Theory Cv is identical to Equation 3, up to the inequality constraints on X and Y. Theorem 6. Given a function size, which maps domains to their size, the weighted first-order model count of a ∧r(nx, nr, D, D′, V) node is
wmc(∧r(nx, nr, D, D′, V), size) = wmc(nx, size)^size(D) · Π_{s=0}^{size(D)−1} wmc(nr, size ∪ {D′ → s}),   (11)
where size ∪ {D′ → s} adds to the size function that the subdomain D′ has cardinality s. Proof. If C allows for domain recursion, due to Theorem 4, the weighted model count is
wmc(C, size) = 1, if size(D) = 0; wmc(Cx) · wmc(Cv, size′) · wmc(Cr, size′), if size(D) > 0,   (12)
where size′ = size ∪ {D′ → size(D) − 1}. Theorem 7. The Independent Partial Grounding compilation rule is a special case of the domain recursion rule, where ∀c ∈ C: |vars(c) ∩ B| = 1 (and therefore Cr = ∅).
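As a sanity check on Theorem 6, here is a hedged sketch that instantiates the recursion of Equation 12 for the symmetric theory of Equation 3 with all weights equal to 1, so that the WFOMC is a model count, and compares it against brute-force enumeration of the ground theory. Unrolling the recursion reproduces the shape of Equation 11 and gives the closed form 2^n · 2^{n(n−1)/2}.

    from itertools import product

    def wmc_symmetric(n):
        # Equation 12 for friends(X, Y) => friends(Y, X) with unit weights:
        # wmc(Cx) = 2 (friends(x, x) is unconstrained) and wmc(Cr) = 2**(n-1)
        # (for each remaining y, friends(x, y) and friends(y, x) must agree).
        if n == 0:
            return 1
        return 2 * (2 ** (n - 1)) * wmc_symmetric(n - 1)

    def wmc_bruteforce(n):
        # Enumerate all assignments to the n*n ground atoms (tiny n only).
        count = 0
        for bits in product([0, 1], repeat=n * n):
            f = {(i, j): bits[i * n + j] for i in range(n) for j in range(n)}
            if all(f[i, j] <= f[j, i] for i in range(n) for j in range(n)):
                count += 1
        return count

    for n in range(4):
        assert wmc_symmetric(n) == wmc_bruteforce(n)
    print([wmc_symmetric(n) for n in range(5)])  # [1, 2, 8, 64, 1024]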
4.3 Completeness
In this section, we introduce a class of models for which first-order knowledge compilation with domain recursion is complete. Definition 10 (k-WFOMC). The class of k-WFOMC consists of WFOMC theories with clauses that have up to k logical variables. A first completeness result is for 2-WFOMC, using the set of knowledge compilation rules CR2, which are the rules in CR1 extended with domain recursion. Theorem 8 (Completeness for 2-WFOMC). First-order knowledge compilation using the CR2 compilation rules is a complete domain lifted probabilistic inference algorithm for 2-WFOMC. Proof. From Lemma 1, after applying the CR1 rules, the theory contains only atoms with dimension larger than or equal to two. From Definition 10, each clause has dimension smaller than or equal to two. Therefore, each logical variable in the theory is a root variable and according to Definition 5, every equivalence class of binding variables is a root binding class. Because of Lemma 1, the theory allows for domain recursion, which requires further compilation of two theories: Cx and Cr into nx and nr. Both have dimension smaller than 2 and can be lifted by CR1 compilation rules. The properties of 2-WFOMC are a sufficient but not necessary condition for first-order knowledge compilation to be domain lifted. We can obtain a similar result for MLNs or parfactors by reducing them to a WFOMC problem. If an MLN contains only formulae with up to k logical variables, then its WFOMC representation will be in k-WFOMC. This result for 2-WFOMC is not trivial. Van den Broeck et al. [10] showed in their experiments that counting first-order variable elimination (C-FOVE) [6] fails to lift the “Friends Smoker Drinker” problem, which is in 2-WFOMC. We will show in the next section that the CR1 rules fail to lift the theory in Figure 4a, which is in 2-WFOMC. Note that there are also useful theories that are not in 2-WFOMC, such as those containing the transitive relation friends(X, Y) ∧ friends(Y, Z) ⇒ friends(X, Z).
5 Empirical evaluation
To complement the theoretical results of the previous section, we extended the WFOMC implementation¹ with the domain recursion rule. We performed experiments with the theory in Figure 4a, which is a version of the friends and smokers model [11] extended with the symmetric relation of Equation 3.
[Figure 4: Symmetric friends and smokers experiment, comparing propositional knowledge compilation (c2d) to WFOMC using compilation rules CR1 and CR2 (which includes domain recursion). Panel (a), MLN model: 2 smokes(X) ∧ friends(X, Y) ⇒ smokes(Y) and friends(X, Y) ⇒ friends(Y, X). Panel (b), evaluation runtime in seconds (log scale, 0.01–10000) against the number of people (10–80) for c2d, WFOMC-CR1 and WFOMC-CR2.]
We evaluate the performance querying P(smokes(bob)) with increasing domain size, comparing our approach to the existing WFOMC implementation and its propositional counterpart, which first grounds the theory and then compiles it with the c2d compiler [13] to a propositional d-DNNF circuit. We did not compare to C-FOVE [6] because it cannot perform lifted inference on this model. Propositional inference quickly becomes intractable when there are more than 20 people. The lifted inference algorithms scale much better. The CR1 rules can exploit some regularities in the model. For example, they eliminate all the smokes(X) atoms from the theory. They do, however, resort to grounding at a later stage of the compilation process. With the domain recursion rule, there is no need for grounding. This advantage is clear in the experiments, our approach having an almost constant inference time in this range of domain sizes. Note that the runtimes for c2d include compilation and evaluation of the circuit, whereas the WFOMC runtimes only represent evaluation of the FO d-DNNF. After all, propositional compilation depends on the domain size but first-order compilation does not. First-order compilation takes a constant two seconds for both rule sets.
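The scaling gap in Figure 4b has a simple back-of-envelope explanation; the sketch below counts ground atoms and clauses for the Figure 4a model under the assumed encoding of Section 3.2 (one auxiliary f atom per grounding of the soft formula, and hard symmetry clauses), which is an assumption about the exact grounding rather than a figure from the paper.

    def ground_size(n):
        # smokes/1 contributes n atoms; friends/2 and f/2 contribute n*n each.
        atoms = n + 2 * n * n
        # Four encoding clauses plus one hard symmetry clause per pair (X, Y).
        clauses = 4 * n * n + n * n
        return atoms, clauses

    for n in (10, 20, 40, 80):
        print(n, ground_size(n))   # (210, 500), (820, 2000), (3240, 8000), ...

Already at 20 people the propositional circuit must reason over hundreds of atoms and thousands of clauses, consistent with the c2d curve becoming intractable, while the size of the first-order circuit is independent of the domain.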
6 Conclusions
We proposed a definition of complete domain lifted probabilistic inference w.r.t. classes of probabilistic logic models. This definition considers algorithms to be lifted if they are polynomial in the size of logical variable domains. Existing first-order knowledge compilation turns out not to admit an intuitive completeness result. Therefore, we generalized the existing Independent Partial Grounding compilation rule to the domain recursion rule. With this one extra rule, we showed that first-order knowledge compilation is complete for a significant class of probabilistic logic models, where the WFOMC representation has up to two logical variables per clause.
Acknowledgments
The author would like to thank Luc De Raedt, Jesse Davis and the anonymous reviewers for valuable feedback. This work was supported by the Research Foundation-Flanders (FWO-Vlaanderen).
¹ http://dtai.cs.kuleuven.be/wfomc/
References
[1] Lise Getoor and Ben Taskar, editors. An Introduction to Statistical Relational Learning. MIT Press, 2007.
[2] Luc De Raedt, Paolo Frasconi, Kristian Kersting, and Stephen Muggleton, editors. Probabilistic Inductive Logic Programming: Theory and Applications. Springer-Verlag, Berlin, Heidelberg, 2008.
[3] Daan Fierens, Guy Van den Broeck, Ingo Thon, Bernd Gutmann, and Luc De Raedt. Inference in probabilistic logic programs using weighted CNF's. In Proceedings of UAI, pages 256–265, 2011.
[4] David Poole. First-order probabilistic inference. In Proceedings of IJCAI, pages 985–991, 2003.
[5] Rodrigo de Salvo Braz, Eyal Amir, and Dan Roth. Lifted first-order probabilistic inference. In Proceedings of IJCAI, pages 1319–1325, 2005.
[6] Brian Milch, Luke S. Zettlemoyer, Kristian Kersting, Michael Haimes, and Leslie Pack Kaelbling. Lifted probabilistic inference with counting formulas. In Proceedings of AAAI, pages 1062–1068, 2008.
[7] Vibhav Gogate and Pedro Domingos. Exploiting logical structure in lifted probabilistic inference. In Proceedings of StarAI, 2010.
[8] Abhay Jha, Vibhav Gogate, Alexandra Meliou, and Dan Suciu. Lifted inference seen from the other side: The tractable features. In Proceedings of NIPS, 2010.
[9] Vibhav Gogate and Pedro Domingos. Probabilistic theorem proving. In Proceedings of UAI, pages 256–265, 2011.
[10] Guy Van den Broeck, Nima Taghipour, Wannes Meert, Jesse Davis, and Luc De Raedt. Lifted probabilistic inference by first-order knowledge compilation. In Proceedings of IJCAI, pages 2178–2185, 2011.
[11] Parag Singla and Pedro Domingos. Lifted first-order belief propagation. In Proceedings of AAAI, pages 1094–1099, 2008.
[12] Matthew Richardson and Pedro Domingos. Markov logic networks. Machine Learning, 62(1):107–136, 2006.
[13] Adnan Darwiche. New advances in compiling CNF to decomposable negation normal form. In Proceedings of ECAI, pages 328–332, 2004.
5 0.45413437 22 nips-2011-Active Ranking using Pairwise Comparisons
Author: Kevin G. Jamieson, Robert Nowak
Abstract: This paper examines the problem of ranking a collection of objects using pairwise comparisons (rankings of two objects). In general, the ranking of n objects can be identified by standard sorting methods using n log₂ n pairwise comparisons. We are interested in natural situations in which relationships among the objects may allow for ranking using far fewer pairwise comparisons. Specifically, we assume that the objects can be embedded into a d-dimensional Euclidean space and that the rankings reflect their relative distances from a common reference point in R^d. We show that under this assumption the number of possible rankings grows like n^{2d} and demonstrate an algorithm that can identify a randomly selected ranking using just slightly more than d log n adaptively selected pairwise comparisons, on average. If instead the comparisons are chosen at random, then almost all pairwise comparisons must be made in order to identify any ranking. In addition, we propose a robust, error-tolerant algorithm that only requires that the pairwise comparisons are probably correct. Experimental studies with synthetic and real datasets support the conclusions of our theoretical analysis.
6 0.44918343 130 nips-2011-Inductive reasoning about chimeric creatures
7 0.43977973 42 nips-2011-Bayesian Bias Mitigation for Crowdsourcing
8 0.43877137 19 nips-2011-Active Classification based on Value of Classifier
9 0.4293009 277 nips-2011-Submodular Multi-Label Learning
10 0.42737225 33 nips-2011-An Exact Algorithm for F-Measure Maximization
11 0.40976599 53 nips-2011-Co-Training for Domain Adaptation
12 0.37705433 231 nips-2011-Randomized Algorithms for Comparison-based Search
13 0.37359142 8 nips-2011-A Model for Temporal Dependencies in Event Streams
14 0.370078 254 nips-2011-Similarity-based Learning via Data Driven Embeddings
15 0.35748443 275 nips-2011-Structured Learning for Cell Tracking
16 0.35636622 229 nips-2011-Query-Aware MCMC
17 0.35163841 169 nips-2011-Maximum Margin Multi-Label Structured Prediction
18 0.3462202 74 nips-2011-Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection
19 0.34220043 195 nips-2011-On Learning Discrete Graphical Models using Greedy Methods
20 0.34088218 292 nips-2011-Two is better than one: distinct roles for familiarity and recollection in retrieving palimpsest memories
topicId topicWeight
[(0, 0.024), (4, 0.042), (20, 0.032), (26, 0.029), (31, 0.054), (33, 0.02), (43, 0.042), (44, 0.404), (45, 0.073), (57, 0.031), (65, 0.015), (74, 0.036), (83, 0.044), (99, 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.77900273 7 nips-2011-A Machine Learning Approach to Predict Chemical Reactions
Author: Matthew A. Kayala, Pierre F. Baldi
2 292 nips-2011-Two is better than one: distinct roles for familiarity and recollection in retrieving palimpsest memories
Author: Cristina Savin, Peter Dayan, Máté Lengyel
Abstract: Storing a new pattern in a palimpsest memory system comes at the cost of interfering with the memory traces of previously stored items. Knowing the age of a pattern thus becomes critical for recalling it faithfully. This implies that there should be a tight coupling between estimates of age, as a form of familiarity, and the neural dynamics of recollection, something which current theories omit. Using a normative model of autoassociative memory, we show that a dual memory system, consisting of two interacting modules for familiarity and recollection, has best performance for both recollection and recognition. This finding provides a new window onto actively contentious psychological and neural aspects of recognition memory.
3 0.43500432 229 nips-2011-Query-Aware MCMC
Author: Michael L. Wick, Andrew McCallum
Abstract: Traditional approaches to probabilistic inference such as loopy belief propagation and Gibbs sampling typically compute marginals for all the unobserved variables in a graphical model. However, in many real-world applications the user's interests are focused on a subset of the variables, specified by a query. In this case it would be wasteful to uniformly sample, say, one million variables when the query concerns only ten. In this paper we propose a query-specific approach to MCMC that accounts for the query variables and their generalized mutual information with neighboring variables in order to achieve higher computational efficiency. Surprisingly there has been almost no previous work on query-aware MCMC. We demonstrate the success of our approach with positive experimental results on a wide range of graphical models.
4 0.31089103 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries
Author: Alexander Rakhlin, Karthik Sridharan, Ambuj Tewari
Abstract: Learning theory has largely focused on two main learning scenarios: the classical statistical setting where instances are drawn i.i.d. from a fixed distribution, and the adversarial scenario wherein, at every time step, an adversarially chosen instance is revealed to the player. It can be argued that in the real world neither of these assumptions is reasonable. We define the minimax value of a game where the adversary is restricted in his moves, capturing stochastic and non-stochastic assumptions on data. Building on the sequential symmetrization approach, we define a notion of distribution-dependent Rademacher complexity for the spectrum of problems ranging from i.i.d. to worst-case. The bounds let us immediately deduce variation-type bounds. We study a smoothed online learning scenario and show that an exponentially small amount of noise can make function classes with infinite Littlestone dimension learnable.
5 0.30984733 231 nips-2011-Randomized Algorithms for Comparison-based Search
Author: Dominique Tschopp, Suhas Diggavi, Payam Delgosha, Soheil Mohajer
Abstract: This paper addresses the problem of finding the nearest neighbor (or one of the R-nearest neighbors) of a query object q in a database of n objects, when we can only use a comparison oracle. The comparison oracle, given two reference objects and a query object, returns the reference object most similar to the query object. The main problem we study is how to search the database for the nearest neighbor (NN) of a query, while minimizing the questions. The difficulty of this problem depends on properties of the underlying database. We show the importance of a characterization: combinatorial disorder D, which defines approximate triangle inequalities on ranks. We present a lower bound of Ω(D log n/D + D²) average number of questions in the search phase for any randomized algorithm, which demonstrates the fundamental role of D for worst case behavior. We develop a randomized scheme for NN retrieval in O(D³ log² n + D log² n log log n^{D³}) questions. The learning requires asking O(nD³ log² n + D log² n log log n^{D³}) questions and O(n log² n / log(2D)) bits to store.
6 0.30967343 227 nips-2011-Pylon Model for Semantic Segmentation
7 0.30934936 22 nips-2011-Active Ranking using Pairwise Comparisons
8 0.30916235 186 nips-2011-Noise Thresholds for Spectral Clustering
9 0.30878922 180 nips-2011-Multiple Instance Filtering
10 0.30826357 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions
11 0.30788362 258 nips-2011-Sparse Bayesian Multi-Task Learning
12 0.30785388 303 nips-2011-Video Annotation and Tracking with Active Learning
13 0.30740929 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs
14 0.30706543 276 nips-2011-Structured sparse coding via lateral inhibition
15 0.30625325 84 nips-2011-EigenNet: A Bayesian hybrid of generative and conditional models for sparse learning
16 0.30599135 149 nips-2011-Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries
17 0.30574465 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity
18 0.30569965 64 nips-2011-Convergent Bounds on the Euclidean Distance
19 0.30545905 102 nips-2011-Generalised Coupled Tensor Factorisation
20 0.30541232 219 nips-2011-Predicting response time and error rates in visual search