nips nips2010 nips2010-159 knowledge-graph by maker-knowledge-mining

159 nips-2010-Lifted Inference Seen from the Other Side : The Tractable Features


Source: pdf

Author: Abhay Jha, Vibhav Gogate, Alexandra Meliou, Dan Suciu

Abstract: Lifted Inference algorithms for representations that combine first-order logic and graphical models have been the focus of much recent research. All lifted algorithms developed to date are based on the same underlying idea: take a standard probabilistic inference algorithm (e.g., variable elimination, belief propagation etc.) and improve its efficiency by exploiting repeated structure in the first-order model. In this paper, we propose an approach from the other side in that we use techniques from logic for probabilistic inference. In particular, we define a set of rules that look only at the logical representation to identify models for which exact efficient inference is possible. Our rules yield new tractable classes that could not be solved efficiently by any of the existing techniques. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 All lifted algorithms developed to date are based on the same underlying idea: take a standard probabilistic inference algorithm (e.g., variable elimination, belief propagation etc.) and improve its efficiency by exploiting repeated structure in the first-order model. [sent-4, score-0.436]

2 In this paper, we propose an approach from the other side in that we use techniques from logic for probabilistic inference. [sent-8, score-0.142]

3 In particular, we define a set of rules that look only at the logical representation to identify models for which exact efficient inference is possible. [sent-9, score-0.251]

4 Our rules yield new tractable classes that could not be solved efficiently by any of the existing techniques. [sent-10, score-0.129]

5 1 Introduction. Recently, there has been a push towards combining logical and probabilistic approaches in Artificial Intelligence. [sent-11, score-0.135]

6 Many representations that combine logic and graphical models, a popular probabilistic representation [1, 2], have been proposed over the last few years. [sent-15, score-0.142]

7 In its simplest form, an MLN is a set of weighted first-order logic formulas, and can be viewed as a template for generating a Markov network. [sent-17, score-0.207]

8 Specifically, given a set of constants that model objects in the domain, it represents a ground Markov network that has one (propositional) feature for each grounding of each (first-order) formula with constants in the domain. [sent-18, score-0.267]
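
The following is a small illustrative sketch (not from the paper) of the grounding step described above: each assignment of domain constants to a feature's variables yields one propositional feature of the ground Markov network. The feature encoding, predicate names, and domains are assumptions made only for the example.

```python
from itertools import product

# Hypothetical encoding: a feature is a list of (predicate, variables) atoms;
# here the conjunctive feature R(X) ^ S(X, Y) over small example domains.
feature = [("R", ("X",)), ("S", ("X", "Y"))]
domains = {"X": ["a1", "a2"], "Y": ["b1", "b2"]}

def groundings(feature, domains):
    """One propositional feature per assignment of domain constants to the variables."""
    variables = sorted({v for _, args in feature for v in args})
    for values in product(*(domains[v] for v in variables)):
        theta = dict(zip(variables, values))
        yield [(pred, tuple(theta[v] for v in args)) for pred, args in feature]

for g in groundings(feature, domains):
    print(g)          # 4 ground features, e.g. [('R', ('a1',)), ('S', ('a1', 'b1'))]
```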

9 Until recently, most inference schemes for MLNs were propositional: inference was carried out by first constructing a ground Markov network and then running a standard probabilistic inference algorithm over it. [sent-19, score-0.288]

10 Fortunately, in some cases, one can perform lifted inference in MLNs without grounding out the domain. [sent-22, score-0.471]

11 Lifted inference treats sets of indistinguishable objects as one, and can yield exponential speed-ups over propositional inference. [sent-23, score-0.146]

12 Many lifted inference algorithms have been proposed over the last few years (c. [sent-24, score-0.393]

13 In other words, these algorithms are basically lifted versions of standard probabilistic inference algorithms. [sent-34, score-0.436]

14 For example, first-order variable elimination [4, 5, 7] lifts the standard variable elimination algorithm [8, 9], while lifted Belief propagation [10] lifts Pearl’s Belief propagation [11, 12]. [sent-35, score-0.543]

15 In this paper, we depart from existing approaches, and present a new approach to lifted inference from the other, logical side. [sent-36, score-0.485]

16 In particular, we propose a set of rewriting rules that exploit the structure of the logical formulas for inference. [sent-37, score-0.227]

17 Each rule takes an MLN as input and expresses its partition function as a combination of partition functions of simpler MLNs (if the preconditions of the rule are satisfied). [sent-38, score-0.376]

18 Our work derives heavily from database literature in which inference techniques based on manipulating logical formulas (queries) have been investigated rigorously [13, 14]. [sent-41, score-0.204]

19 Our algorithm extends their techniques to lifted inference, and thus can be applied to a strictly larger class of probabilistic models. [sent-43, score-0.368]

20 However, the set of tractable MLNs is quite large, and includes MLNs that cannot be solved in PTIME by any of the existing lifted approaches. [sent-46, score-0.363]

21 This MLN is also out of reach of state-of-the-art propositional inference approaches such as variable elimination [8, 9], which are exponential in treewidth. [sent-48, score-0.205]

22 This is because the treewidth of the ground Markov network is polynomial in the number of constants in the domain. [sent-49, score-0.163]

23 An atom is a predicate symbol applied to a tuple of variables or constants. [sent-62, score-0.405]

24 A conjunctive feature is of the form ∀X̄ r1 ∧ r2 ∧ · · · ∧ rk, where each ri is an atom or the negation of an atom, and X̄ are the variables used in the atoms. [sent-64, score-0.326]

25 The former asserts that everyone in the domain of X has asthma and does not smoke. [sent-67, score-0.302]

26 The latter says that if a person smokes, he/she cannot be friends with Bob. [sent-68, score-0.297]

27 A grounding of a feature is an assignment of the variables to constants from their domain. [sent-69, score-0.207]

28 For example, ¬Smokes(alice) ∨ ¬Friends(bob,alice) is a grounding of the disjunctive feature fd . [sent-70, score-0.21]

29 We assume that no predicate symbol occurs more than once in a feature i. [sent-71, score-0.221]

30 There is an implicit type-safety assumption in the MLNs: if a predicate symbol occurs in more than one feature, then the variables used at the same position must have the same domain. [sent-77, score-0.189]

31 Consider the possible world ω below: Smokes = {bob}, Asthma = {bob, alice}, Friends = {(bob,bob), (bob,alice), (alice,alice)}. Then from Equation (1): Pr(ω) = (1/Z) exp(1. [sent-80, score-0.298]

32 A Counting Program is a set of features f̄ along with indeterminates ᾱ, where αi is the indeterminate for fi. [sent-89, score-0.373]

33 For a CP with features f1, . . . , fk, we define its generating function (GF) FP as follows: FP(ᾱ) = Σ_ω Π_i αi^{N(fi,ω)} (2). The generating function as expressed in Eq. [sent-93, score-0.216]

34 The program (R(X, Y), α) has GF (1 + α)^{|∆X||∆Y|}, which is in closed form. [sent-101, score-0.139]

35 While the program (R(X) ∧ S(X, Y) ∧ T(Y), α) has GF 2^{|∆X||∆Y|} Σ_{i=0}^{|∆X|} C(|∆X|, i) (1 + ((1 + α)/2)^i)^{|∆Y|}, which is a polynomial expression. [sent-102, score-0.149]

36 In the following section we demonstrate an algorithm that computes the generating function, and allows us to identify cases where the generating function falls under one of the above categories. [sent-104, score-0.216]
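
As a sanity check of the closed-form example above, here is a brute-force sketch (my own, feasible only for tiny domains) that enumerates every possible world of the single-feature counting program (R(X, Y), α), sums α^N(f, ω), and compares the result against (1 + α)^{|∆X||∆Y|}. It assumes sympy is available.

```python
from itertools import product
import sympy as sp

alpha = sp.Symbol("alpha")
nX, nY = 2, 3                                   # tiny |Delta_X|, |Delta_Y|

atoms = [(x, y) for x in range(nX) for y in range(nY)]   # ground atoms of R(X, Y)

gf = 0
for bits in product([0, 1], repeat=len(atoms)):
    world = dict(zip(atoms, bits))
    count = sum(world[a] for a in atoms)        # N(f, world) for the feature R(X, Y)
    gf += alpha ** count

closed_form = (1 + alpha) ** (nX * nY)
print(sp.simplify(sp.expand(gf) - sp.expand(closed_form)))   # 0: matches the closed form
```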

37 In this section, we present some rules that can be used to compute the GF of a CP from simpler CPs. [sent-106, score-0.113]

38 A counting program is called normal if it satisfies the following properties: [sent-114, score-0.241]

39 If two distinct atoms with the same predicate symbol have variables X and Y in the same position, then ∆X = ∆Y . [sent-117, score-0.238]

40 Computing the partition function of an MLN can be reduced in PTIME to computing the generating function of a normal CP. [sent-119, score-0.185]
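
The details of this reduction are not in the extracted sentences, but under the standard MLN semantics Pr(ω) ∝ exp(Σ_i wi N(fi, ω)), the partition function is just the generating function evaluated at αi = exp(wi). A tiny hypothetical sketch, using the closed-form GF (1 + α)^n of the single-feature program (R(X), α) mentioned later in the text:

```python
import math

def gf_single_feature(alpha, n):
    # GF of the one-feature program (R(X), alpha) over a domain of size n:
    # each ground atom contributes a factor (1 + alpha), so F_P = (1 + alpha)**n.
    return (1.0 + alpha) ** n

weight, n = 1.5, 10
Z = gf_single_feature(math.exp(weight), n)   # partition function of the MLN {(R(X), w = 1.5)}
print(Z)
```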

41 To normalize it, we can replace the two features by: (i) Friends1(Y) ≡ Friends(bob, Y), and (ii) Friends2(Z, Y) ≡ Friends(X, Y), X ≠ bob, where the domain of Z is ∆Z = ∆X \ {bob}. [sent-124, score-0.156]

42 Furthermore, without loss of generality, we assume numeric domains for each logical variable, namely ∆X = {1, . . . , |∆X|}. [sent-132, score-0.116]

43 We define a substitution f[a/X], where X ∈ Vars(f) and a ∈ ∆X, as the replacement of X with a in every atom of f. [sent-136, score-0.238]

44 P[a/X] applies the substitution fi[a/X] to every feature fi in P. [sent-137, score-0.614]
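
A minimal sketch of the substitution operators f[a/X] and P[a/X], using an illustrative representation of a feature as a list of (predicate, arguments) atoms; the names here are assumptions made for the example, not the paper's notation.

```python
def substitute_feature(feature, var, const):
    """f[a/X]: replace variable `var` with constant `const` in every atom of f."""
    return [(pred, tuple(const if t == var else t for t in args)) for pred, args in feature]

def substitute_program(program, var, const):
    """P[a/X]: apply the substitution to every feature of the counting program."""
    return [(substitute_feature(f, var, const), alpha) for f, alpha in program]

# Feature R(X) ^ S(X, Y) with an indeterminate named "alpha" (illustrative only).
P = [([("R", ("X",)), ("S", ("X", "Y"))], "alpha")]
print(substitute_program(P, "X", "a1"))
# [([('R', ('a1',)), ('S', ('a1', 'Y'))], 'alpha')]
```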

45 Define a relation U among the variables of a CP as follows : U (X, Y ) iff there exist two atoms ri , rj with the same predicate, such that X ∈ V ars(ri ), Y ∈ V ars(rj ), and X and Y appear at the same position in ri and rj respectively. [sent-139, score-0.167]

46 Given a feature, a variable is a root variable iff it appears in every atom of the feature. [sent-144, score-0.203]

47 For some variable X, the set X̄ = Unify(X) is a separator if ∀Y ∈ X̄: Y ∈ Vars(fi) implies Y must be a root variable for fi. [sent-145, score-0.281]

48 We define a process Split(Y, k) that splits every feature in the CP that contains the variable Y into two features with disjoint domains: one with ∆Y = {k} and the other with ∆Yᶜ = ∆Y − {k}. [sent-150, score-0.121]
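
A rough sketch of Split(Y, k), under the assumption that each feature carries an explicit per-variable domain; this representation is my own simplification, not the paper's.

```python
def split(program, var, k):
    """Split(Y, k): replace every feature whose domains mention `var` with two copies
    having disjoint domains for it: {k} and the original domain minus {k}."""
    out = []
    for feature, domains, alpha in program:
        if var in domains:
            out.append((feature, {**domains, var: {k}}, alpha))
            out.append((feature, {**domains, var: domains[var] - {k}}, alpha))
        else:
            out.append((feature, domains, alpha))
    return out

P = [([("R", ("Y",))], {"Y": {1, 2, 3}}, "alpha")]
print(split(P, "Y", 1))
# [([('R', ('Y',))], {'Y': {1}}, 'alpha'), ([('R', ('Y',))], {'Y': {2, 3}}, 'alpha')]
```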

49 Also, Cond(i, r, k) defines a process that removes an atom r from feature fi . [sent-152, score-0.49]

50 Denote fi′ = fi \ {r}; then Cond(i, r, k) replaces fi with (i) two features (TRUE, αi^k) and (fi′, 1) if r ⇒ fi′, (ii) one feature (fi′, 1) if r ⇒ ¬fi′, and (iii) (fi′, αi) otherwise. [sent-153, score-1.1]

51 3 The Algorithm. Our algorithm is basically a recursive application of a series of rewriting rules (see rules R1-R6 given below). [sent-155, score-0.182]

52 Each (non-trivial) rule takes a CP as input and if the preconditions for applying it are satisfied, then it expresses the generating function of the input CP as a combination of generating functions of a few simpler CPs. [sent-156, score-0.384]

53 The recursion terminates when the generating function of the CP is trivial to compute (SUCCESS) or when none of the rules can be applied (FAILURE). [sent-158, score-0.199]

54 Given a CP, we go through the rules in order and apply the first applicable rule, which may require us to recursively compute the GF of simpler CPs, for which we continue in the same way. [sent-161, score-0.113]
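
A compact, runnable skeleton of this driver (my own simplification, not the authors' implementation), with one stand-in rule in place of R1-R6 and a CP representation assumed only for the example:

```python
import sympy as sp

def is_trivial(cp):
    # Base case in the spirit of rule R2: the whole program is a single ground tuple.
    return len(cp) == 1 and cp[0]["num_groundings"] == 1

def trivial_gf(cp):
    return 1 + cp[0]["alpha"]              # GF of a single tuple is 1 + alpha

def rule_independence(cp, recurse):
    """Stand-in for rule R3: if the first feature shares no predicate symbols with
    the rest, the GF factors into a product of two recursively computed GFs."""
    if len(cp) < 2:
        return None
    if any(cp[0]["predicates"] & f["predicates"] for f in cp[1:]):
        return None                        # precondition fails
    head, rest = recurse([cp[0]]), recurse(cp[1:])
    return None if head is None or rest is None else head * rest

RULES = [rule_independence]                # the real algorithm tries R1..R6 in order

def generating_function(cp):
    if is_trivial(cp):
        return trivial_gf(cp)              # SUCCESS
    for rule in RULES:
        gf = rule(cp, generating_function)
        if gf is not None:
            return gf
    return None                            # FAILURE: no rule applies

a, b = sp.symbols("alpha beta")
cp = [{"predicates": {"R"}, "num_groundings": 1, "alpha": a},
      {"predicates": {"S"}, "num_groundings": 1, "alpha": b}]
print(generating_function(cp))             # (alpha + 1)*(beta + 1)
```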

55 Our first rule uses feature and variable equivalence to reduce the size of the CP. [sent-162, score-0.198]

56 Formally, Rule R1 (Variable and Feature Equivalence Rule) If variables X and Y are equivalent, replace the pair with a single new variable Z in every atom where they occur. [sent-163, score-0.251]

57 If two features fi , fj are identical, then we replace them with a single feature fi with indeterminate αi αj that is the product of their individual indeterminates. [sent-165, score-0.703]
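
A small sketch of the feature-equivalence half of Rule R1, under an assumed list-of-atoms feature representation: identical features collapse into one feature whose indeterminate is the product of their individual indeterminates.

```python
import sympy as sp

def merge_equivalent_features(program):
    """Feature half of R1: identical features collapse into a single feature whose
    indeterminate is the product of the individual indeterminates."""
    merged = {}
    for feature, alpha in program:
        key = tuple(sorted(feature))        # canonical form of the feature
        merged[key] = merged.get(key, 1) * alpha
    return [(list(key), alpha) for key, alpha in merged.items()]

a1, a2, b = sp.symbols("alpha1 alpha2 beta")
P = [([("R", ("X", "Y"))], a1),
     ([("R", ("X", "Y"))], a2),             # identical to the first feature
     ([("S", ("X",))], b)]
print(merge_equivalent_features(P))
# [([('R', ('X', 'Y'))], alpha1*alpha2), ([('S', ('X',))], beta)]
```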

58 If a program P is just a tuple then FP = 1 + α, where α is the indeterminate. [sent-174, score-0.137]

59 If some feature fi has indeterminate αi = 1 (due to R6), then remove all the atoms in fi of a predicate symbol that is present in some other feature. [sent-175, score-0.851]

60 Intuitively, given two CPs which are independent, namely they have no atoms in common, the generating function of the joint CP is simply the product of the generating function of the two CPs. [sent-178, score-0.265]

61 Formally, Rule R3 (Independence Rule) If a CP P can be split into two programs P1 and P2 such that the two programs don’t have any predicate symbols in common, then FP = FP1 · FP2 . [sent-179, score-0.294]
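
A brute-force check of Rule R3 on a toy pair of programs with disjoint predicate symbols (a sketch of my own; sympy is assumed): the GF of the joint program equals the product of the individual GFs.

```python
from itertools import product
import sympy as sp

a, b = sp.symbols("alpha beta")
n = 2                                      # |Delta_X|, kept tiny for brute force

# P1 = (R(X), alpha) and P2 = (T(X), beta) share no predicate symbols.
R_atoms = [("R", x) for x in range(n)]
T_atoms = [("T", x) for x in range(n)]

def brute_force_gf(atoms, features):
    """Sum over all worlds of prod_i alpha_i^N(f_i, world); `features` maps an
    indeterminate to the list of atoms whose true groundings it counts."""
    total = 0
    for bits in product([0, 1], repeat=len(atoms)):
        world = dict(zip(atoms, bits))
        term = 1
        for indet, atom_list in features.items():
            term *= indet ** sum(world[x] for x in atom_list)
        total += term
    return sp.expand(total)

joint = brute_force_gf(R_atoms + T_atoms, {a: R_atoms, b: T_atoms})
factored = brute_force_gf(R_atoms, {a: R_atoms}) * brute_force_gf(T_atoms, {b: T_atoms})
print(sp.simplify(joint - sp.expand(factored)))    # 0, i.e. F_P = F_P1 * F_P2
```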

62 The correctness of Rule R3 follows from the fact that every world ω of P can be written as a concatenation of two disjoint worlds, namely ω = (ω1 ∪ ω2 ) where ω1 and ω2 are the worlds from P1 and P2 respectively. [sent-180, score-0.141]

63 Hence the GF can be written as: FP = Σ_{ω1∪ω2} Π_{fi∈P1} αi^{N(fi,ω1)} Π_{fi∈P2} αi^{N(fi,ω2)} = (Σ_{ω1} Π_{fi∈P1} αi^{N(fi,ω1)}) (Σ_{ω2} Π_{fi∈P2} αi^{N(fi,ω2)}) = FP1 · FP2 (3). The next rule allows us to split a feature if it has a component that is independent of the rest of the program. [sent-181, score-0.251]

64 Note that while the previous rule splits the program into two independent sets of features, this rule enables us to split a single feature. [sent-182, score-0.141]

65 Rule R4 (Dirichlet Convolution Rule) If the program contains feature f = f1 ∧ f2, s.t. [sent-183, score-0.161]

66 f1 doesn’t share any variables or symbols with any atom in the program, then FP = Ff1 ∗ FP−f+f2. [sent-185, score-0.239]

67 Hence Ff(α) = Σ_i C(f, i) α^i = Σ_i (Σ_{i1,i2 : i1·i2 = i} C(f1, i1) C(f2, i2)) α^i = Ff1 ∗ Ff2. Our next rule utilizes the similarity property in addition to the independence property. [sent-190, score-0.114]
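
A small sketch of the Dirichlet convolution used here, on hypothetical coefficient sequences C(f1, ·) and C(f2, ·): the coefficient of α^i in Ff1 ∗ Ff2 sums products over index pairs whose product is i, since the grounding counts of f1 and f2 multiply when they share no variables (the precondition of R4).

```python
from collections import defaultdict

def dirichlet_convolution(C1, C2):
    """(C1 * C2)(i) = sum of C1(i1) * C2(i2) over all pairs with i1 * i2 = i."""
    out = defaultdict(int)
    for i1, c1 in C1.items():
        for i2, c2 in C2.items():
            out[i1 * i2] += c1 * c2
    return dict(out)

# Hypothetical coefficient sequences: C[i] = number of worlds with grounding count i.
C1 = {0: 3, 1: 2, 2: 1}
C2 = {0: 1, 2: 4}
print(dirichlet_convolution(C1, C2))   # {0: 18, 2: 8, 4: 4}
```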

68 Given a set P of independent but equivalent CPs, the generating function of the joint CP equals the generating function of any CP Pi ∈ P, raised to the power |P|. [sent-191, score-0.216]

69 By definition, every instantiation ā of a separator X̄ defines a CP that has no tuple in common with the programs for other instantiations X̄ = b̄, ā ≠ b̄. [sent-192, score-0.148]

70 Our final rule generalizes the counting arguments presented in [5, 7]. [sent-197, score-0.222]

71 Rule R6 (Generalized Binomial Rule) Let Pred(X) be a singleton atom in some feature. [sent-203, score-0.214]

72 Then for every feature fi in the new program containing an atom r = Pred(Y) apply (fi, αi) ← Cond(i, r, k), and similarly (fi, αi) ← Cond(i, ¬r, ∆Yᶜ − k) for those containing r = Pred(Yᶜ). [sent-205, score-0.614]
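
A brute-force check of a binomial-style decomposition in the spirit of Rule R6, on the illustrative program (R(X) ∧ S(X, Y), α) (my own example, not necessarily the paper's): conditioning on the number k of constants for which R is true gives Σ_k C(n, k) ((1 + α)^m)^k (2^m)^(n−k).

```python
from itertools import product
from math import comb
import sympy as sp

alpha = sp.Symbol("alpha")
n, m = 2, 2                                   # |Delta_X|, |Delta_Y|, kept tiny

R_atoms = list(range(n))
S_atoms = [(x, y) for x in range(n) for y in range(m)]

# Brute-force GF of the single-feature program (R(X) ^ S(X, Y), alpha).
gf = 0
for bits in product([0, 1], repeat=n + len(S_atoms)):
    R = dict(zip(R_atoms, bits[:n]))
    S = dict(zip(S_atoms, bits[n:]))
    count = sum(R[x] * S[(x, y)] for x in range(n) for y in range(m))
    gf += alpha ** count

# Condition on k, the number of constants x with R(x) true (a binomial-style split).
decomposed = sum(comb(n, k) * ((1 + alpha) ** m) ** k * (2 ** m) ** (n - k)
                 for k in range(n + 1))

print(sp.simplify(sp.expand(gf) - sp.expand(decomposed)))   # 0
```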

73 • If P can be evaluated using only rules R1, R2, R3 and R5, then it has a closed form. [sent-212, score-0.127]

74 • If P can be evaluated using only rules R1, R2, R3, R4, and R5, then it has a polynomial expression. [sent-213, score-0.137]

75 When we want to recurse over a set of features, simply keeping the partition function for smaller features is not enough; we need more information than that. [sent-220, score-0.117]

76 We will use simple predicate symbols like R, S, T and assume the domain of all variables as [n]. [sent-237, score-0.264]

77 Note that for a single tuple, say R(a) with indeterminate α, GF = 1 + α from rule R2. [sent-238, score-0.195]

78 Now suppose we have a simple program like P = {(R(X), α)} (a single feature R(X) with indeterminate α). [sent-239, score-0.242]

79 Now assume the following program P with multiple features: (R(X1) ∧ S(X1, Y1), α) and (S(X2, Y2) ∧ T(X2), β). Note that (X1, X2) form a separator. [sent-244, score-0.145]

80 5 Experiments. The algorithm that we described is based on computing the generating functions of counting programs to perform lifted inference, which approaches the problem from a completely different angle than existing techniques. [sent-250, score-0.603]

81 Due to this novelty, we can solve MLNs that are intractable for other existing lifted algorithms such as first-order variable elimination (FOVE) [5, 6, 7]. [sent-251, score-0.384]

82 The set of features used in this MLN fall into the class of counting programs having a pseudo-polynomial generating function. [sent-254, score-0.32]

83 As the size of the domain increases, our approach should scale better than the existing techniques, which cannot do lifted inference on this MLN. [sent-258, score-0.485]

84 Fig. 5 displays the execution time of our CP algorithm vs. the FOVE approach for domain sizes varying from 5 to 100, in the presence of 30% evidence. [sent-262, score-0.115]

85 FOVE cannot do lifted inference on this MLN and resorts to grounding. [sent-264, score-0.393]

86 The figure also displays the extrapolated data points for FOVE’s behavior in larger domain sizes, and shows its runtime growing exponentially. [sent-266, score-0.141]

87 We chose a domain size of 13 to run FOVE, since it couldn’t terminate for higher domain sizes. [sent-272, score-0.184]

88 The figure displays the runtime of our algorithm for domain sizes of 13 and 100. [sent-273, score-0.141]

89 This is because the main time-consuming rule used in this MLN is R4. [sent-276, score-0.114]

90 R4 chooses a singleton atom in the last feature, say Asthma, and eliminates it. [sent-277, score-0.214]

91 This involves time complexity proportional to the domain of the atom and the running time of the smaller MLN obtained after removing that atom. [sent-278, score-0.274]

92 As evidence increases, the atom corresponding to Asthma may be split into many smaller predicates; but the domain size of each predicate also keeps getting smaller. [sent-279, score-0.466]

93 6 Conclusion and Future Work. We have presented a novel approach to lifted inference that uses the theory of generating functions to do efficient inference. [sent-281, score-0.501]

94 This is the first work that tries to address the complexity of lifted inference in terms of only the features (formulas). [sent-283, score-0.435]

95 This is beneficial because using a set of tractable features ensures that inference is always efficient and hence it will scale to large domains. [sent-284, score-0.148]

96 Another avenue for future work is extending other lifted inference approaches [5, 7] with rules that we have developed in this paper. [sent-290, score-0.484]

97 Namely, when lifted inference is not possible, they ground the domain and resort to propositional inference. [sent-292, score-0.604]

98 In particular, ground networks generated by logical formulas have some repetition in their structure that is difficult to capture after grounding. [sent-294, score-0.177]

99 This feature is in PTIME by our algorithm, but if we create a ground Markov network by grounding this feature then it can have unbounded treewidth (as big as the domain itself). [sent-296, score-0.386]

100 We think our approach can provide an insight about how to best construct a graphical model from the groundings of a logical formula. [sent-297, score-0.14]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('lifted', 0.325), ('fp', 0.306), ('friends', 0.297), ('gf', 0.273), ('fi', 0.25), ('mln', 0.247), ('asthma', 0.21), ('cp', 0.208), ('fove', 0.193), ('atom', 0.182), ('smokes', 0.177), ('unify', 0.15), ('mlns', 0.143), ('predicate', 0.115), ('rule', 0.114), ('counting', 0.108), ('generating', 0.108), ('bob', 0.104), ('program', 0.103), ('logic', 0.099), ('domain', 0.092), ('logical', 0.092), ('rules', 0.091), ('cps', 0.091), ('indeterminate', 0.081), ('propositional', 0.078), ('grounding', 0.078), ('ars', 0.071), ('inference', 0.068), ('cond', 0.064), ('ptime', 0.064), ('programs', 0.062), ('elimination', 0.059), ('alice', 0.058), ('feature', 0.058), ('predicates', 0.057), ('evidence', 0.053), ('disjunctive', 0.052), ('atoms', 0.049), ('worlds', 0.049), ('symbol', 0.048), ('groundings', 0.048), ('partition', 0.047), ('polynomial', 0.046), ('constants', 0.045), ('formulas', 0.044), ('probabilistic', 0.043), ('features', 0.042), ('ground', 0.041), ('correctness', 0.039), ('tractable', 0.038), ('conjunctive', 0.037), ('closed', 0.036), ('pedro', 0.035), ('substitution', 0.035), ('dan', 0.034), ('tuple', 0.034), ('convolution', 0.032), ('dalvi', 0.032), ('gfs', 0.032), ('nilesh', 0.032), ('preconditions', 0.032), ('smoke', 0.032), ('singleton', 0.032), ('world', 0.032), ('symbols', 0.031), ('dirichlet', 0.031), ('separator', 0.031), ('treewidth', 0.031), ('normal', 0.03), ('operators', 0.029), ('recurse', 0.028), ('lifts', 0.028), ('pods', 0.028), ('markov', 0.028), ('aaai', 0.028), ('runtime', 0.026), ('relational', 0.026), ('equivalence', 0.026), ('variables', 0.026), ('morgan', 0.025), ('cial', 0.025), ('people', 0.025), ('ru', 0.024), ('split', 0.024), ('domains', 0.024), ('arti', 0.023), ('displays', 0.023), ('rj', 0.023), ('closure', 0.023), ('transitive', 0.023), ('ri', 0.023), ('arithmetic', 0.022), ('fd', 0.022), ('replace', 0.022), ('belief', 0.022), ('propagation', 0.022), ('simpler', 0.022), ('kaufmann', 0.022), ('every', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 159 nips-2010-Lifted Inference Seen from the Other Side : The Tractable Features

Author: Abhay Jha, Vibhav Gogate, Alexandra Meliou, Dan Suciu

Abstract: Lifted Inference algorithms for representations that combine first-order logic and graphical models have been the focus of much recent research. All lifted algorithms developed to date are based on the same underlying idea: take a standard probabilistic inference algorithm (e.g., variable elimination, belief propagation etc.) and improve its efficiency by exploiting repeated structure in the first-order model. In this paper, we propose an approach from the other side in that we use techniques from logic for probabilistic inference. In particular, we define a set of rules that look only at the logical representation to identify models for which exact efficient inference is possible. Our rules yield new tractable classes that could not be solved efficiently by any of the existing techniques. 1

2 0.15877354 83 nips-2010-Evidence-Specific Structures for Rich Tractable CRFs

Author: Anton Chechetka, Carlos Guestrin

Abstract: We present a simple and effective approach to learning tractable conditional random fields with structure that depends on the evidence. Our approach retains the advantages of tractable discriminative models, namely efficient exact inference and arbitrarily accurate parameter learning in polynomial time. At the same time, our algorithm does not suffer a large expressive power penalty inherent to fixed tractable structures. On real-life relational datasets, our approach matches or exceeds state of the art accuracy of the dense models, and at the same time provides an order of magnitude speedup. 1

3 0.13868167 18 nips-2010-A novel family of non-parametric cumulative based divergences for point processes

Author: Sohan Seth, Park Il, Austin Brockmeier, Mulugeta Semework, John Choi, Joseph Francis, Jose Principe

Abstract: Hypothesis testing on point processes has several applications such as model fitting, plasticity detection, and non-stationarity detection. Standard tools for hypothesis testing include tests on mean firing rate and time varying rate function. However, these statistics do not fully describe a point process, and therefore, the conclusions drawn by these tests can be misleading. In this paper, we introduce a family of non-parametric divergence measures for hypothesis testing. A divergence measure compares the full probability structure and, therefore, leads to a more robust test of hypothesis. We extend the traditional Kolmogorov–Smirnov and Cramér–von-Mises tests to the space of spike trains via stratification, and show that these statistics can be consistently estimated from data without any free parameter. We demonstrate an application of the proposed divergences as a cost function to find optimally matched point processes. 1

4 0.08079382 63 nips-2010-Distributed Dual Averaging In Networks

Author: Alekh Agarwal, Martin J. Wainwright, John C. Duchi

Abstract: The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication. We develop and analyze distributed algorithms based on dual averaging of subgradients, and provide sharp bounds on their convergence rates as a function of the network size and topology. Our analysis clearly separates the convergence of the optimization algorithm itself from the effects of communication constraints arising from the network structure. We show that the number of iterations required by our algorithm scales inversely in the spectral gap of the network. The sharpness of this prediction is confirmed both by theoretical lower bounds and simulations for various networks. 1

5 0.080364265 144 nips-2010-Learning Efficient Markov Networks

Author: Vibhav Gogate, William Webb, Pedro Domingos

Abstract: We present an algorithm for learning high-treewidth Markov networks where inference is still tractable. This is made possible by exploiting context-specific independence and determinism in the domain. The class of models our algorithm can learn has the same desirable properties as thin junction trees: polynomial inference, closed-form weight learning, etc., but is much broader. Our algorithm searches for a feature that divides the state space into subspaces where the remaining variables decompose into independent subsets (conditioned on the feature and its negation) and recurses on each subspace/subset of variables until no useful new features can be found. We provide probabilistic performance guarantees for our algorithm under the assumption that the maximum feature length is bounded by a constant k (the treewidth can be much larger) and dependences are of bounded strength. We also propose a greedy version of the algorithm that, while forgoing these guarantees, is much more efficient. Experiments on a variety of domains show that our approach outperforms many state-of-the-art Markov network structure learners. 1

6 0.080041282 32 nips-2010-Approximate Inference by Compilation to Arithmetic Circuits

7 0.078759052 149 nips-2010-Learning To Count Objects in Images

8 0.068316162 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning

9 0.06739983 9 nips-2010-A New Probabilistic Model for Rank Aggregation

10 0.066097654 211 nips-2010-Predicting Execution Time of Computer Programs Using Sparse Polynomial Regression

11 0.060934171 71 nips-2010-Efficient Relational Learning with Hidden Variable Detection

12 0.05578386 239 nips-2010-Sidestepping Intractable Inference with Structured Ensemble Cascades

13 0.047085438 94 nips-2010-Feature Set Embedding for Incomplete Data

14 0.045702796 274 nips-2010-Trading off Mistakes and Don't-Know Predictions

15 0.045600016 165 nips-2010-MAP estimation in Binary MRFs via Bipartite Multi-cuts

16 0.044967625 289 nips-2010-b-Bit Minwise Hashing for Estimating Three-Way Similarities

17 0.042700287 283 nips-2010-Variational Inference over Combinatorial Spaces

18 0.041956197 217 nips-2010-Probabilistic Multi-Task Feature Selection

19 0.038800742 85 nips-2010-Exact learning curves for Gaussian process regression on large random graphs

20 0.038358048 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.12), (1, 0.029), (2, 0.01), (3, 0.017), (4, -0.056), (5, 0.006), (6, -0.007), (7, 0.023), (8, 0.081), (9, 0.012), (10, -0.101), (11, 0.007), (12, -0.006), (13, 0.007), (14, -0.003), (15, -0.044), (16, -0.028), (17, -0.029), (18, -0.002), (19, -0.076), (20, -0.085), (21, 0.053), (22, 0.067), (23, -0.086), (24, -0.007), (25, -0.069), (26, -0.009), (27, -0.043), (28, 0.02), (29, 0.104), (30, -0.119), (31, -0.137), (32, -0.09), (33, -0.098), (34, -0.111), (35, 0.075), (36, -0.101), (37, 0.107), (38, 0.044), (39, -0.089), (40, -0.093), (41, -0.032), (42, 0.037), (43, 0.085), (44, -0.046), (45, -0.147), (46, 0.053), (47, 0.006), (48, -0.06), (49, 0.035)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93615997 159 nips-2010-Lifted Inference Seen from the Other Side : The Tractable Features

Author: Abhay Jha, Vibhav Gogate, Alexandra Meliou, Dan Suciu

Abstract: Lifted Inference algorithms for representations that combine first-order logic and graphical models have been the focus of much recent research. All lifted algorithms developed to date are based on the same underlying idea: take a standard probabilistic inference algorithm (e.g., variable elimination, belief propagation etc.) and improve its efficiency by exploiting repeated structure in the first-order model. In this paper, we propose an approach from the other side in that we use techniques from logic for probabilistic inference. In particular, we define a set of rules that look only at the logical representation to identify models for which exact efficient inference is possible. Our rules yield new tractable classes that could not be solved efficiently by any of the existing techniques. 1

2 0.70365238 83 nips-2010-Evidence-Specific Structures for Rich Tractable CRFs

Author: Anton Chechetka, Carlos Guestrin

Abstract: We present a simple and effective approach to learning tractable conditional random fields with structure that depends on the evidence. Our approach retains the advantages of tractable discriminative models, namely efficient exact inference and arbitrarily accurate parameter learning in polynomial time. At the same time, our algorithm does not suffer a large expressive power penalty inherent to fixed tractable structures. On real-life relational datasets, our approach matches or exceeds state of the art accuracy of the dense models, and at the same time provides an order of magnitude speedup. 1

3 0.63349885 71 nips-2010-Efficient Relational Learning with Hidden Variable Detection

Author: Ni Lao, Jun Zhu, Liu Xinwang, Yandong Liu, William W. Cohen

Abstract: Markov networks (MNs) can incorporate arbitrarily complex features in modeling relational data. However, this flexibility comes at a sharp price of training an exponentially complex model. To address this challenge, we propose a novel relational learning approach, which consists of a restricted class of relational MNs (RMNs) called relation tree-based RMN (treeRMN), and an efficient Hidden Variable Detection algorithm called Contrastive Variable Induction (CVI). On one hand, the restricted treeRMN only considers simple (e.g., unary and pairwise) features in relational data and thus achieves computational efficiency; and on the other hand, the CVI algorithm efficiently detects hidden variables which can capture long range dependencies. Therefore, the resultant approach is highly efficient yet does not sacrifice its expressive power. Empirical results on four real datasets show that the proposed relational learning method can achieve similar prediction quality as the state-of-the-art approaches, but is significantly more efficient in training; and the induced hidden variables are semantically meaningful and crucial to improve the training speed and prediction qualities of treeRMNs.

4 0.54327387 144 nips-2010-Learning Efficient Markov Networks

Author: Vibhav Gogate, William Webb, Pedro Domingos

Abstract: We present an algorithm for learning high-treewidth Markov networks where inference is still tractable. This is made possible by exploiting context-specific independence and determinism in the domain. The class of models our algorithm can learn has the same desirable properties as thin junction trees: polynomial inference, closed-form weight learning, etc., but is much broader. Our algorithm searches for a feature that divides the state space into subspaces where the remaining variables decompose into independent subsets (conditioned on the feature and its negation) and recurses on each subspace/subset of variables until no useful new features can be found. We provide probabilistic performance guarantees for our algorithm under the assumption that the maximum feature length is bounded by a constant k (the treewidth can be much larger) and dependences are of bounded strength. We also propose a greedy version of the algorithm that, while forgoing these guarantees, is much more efficient. Experiments on a variety of domains show that our approach outperforms many state-of-the-art Markov network structure learners. 1

5 0.49818256 32 nips-2010-Approximate Inference by Compilation to Arithmetic Circuits

Author: Daniel Lowd, Pedro Domingos

Abstract: Arithmetic circuits (ACs) exploit context-specific independence and determinism to allow exact inference even in networks with high treewidth. In this paper, we introduce the first ever approximate inference methods using ACs, for domains where exact inference remains intractable. We propose and evaluate a variety of techniques based on exact compilation, forward sampling, AC structure learning, Markov network parameter learning, variational inference, and Gibbs sampling. In experiments on eight challenging real-world domains, we find that the methods based on sampling and learning work best: one such method (AC2 -F) is faster and usually more accurate than loopy belief propagation, mean field, and Gibbs sampling; another (AC2 -G) has a running time similar to Gibbs sampling but is consistently more accurate than all baselines. 1

6 0.48409349 9 nips-2010-A New Probabilistic Model for Rank Aggregation

7 0.44885167 118 nips-2010-Implicit Differentiation by Perturbation

8 0.42191011 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning

9 0.41084504 211 nips-2010-Predicting Execution Time of Computer Programs Using Sparse Polynomial Regression

10 0.387077 121 nips-2010-Improving Human Judgments by Decontaminating Sequential Dependencies

11 0.37627456 67 nips-2010-Dynamic Infinite Relational Model for Time-varying Relational Data Analysis

12 0.37550446 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction

13 0.37402111 283 nips-2010-Variational Inference over Combinatorial Spaces

14 0.36634541 126 nips-2010-Inference with Multivariate Heavy-Tails in Linear Models

15 0.35400185 84 nips-2010-Exact inference and learning for cumulative distribution functions on loopy graphs

16 0.35291117 18 nips-2010-A novel family of non-parametric cumulative based divergences for point processes

17 0.32171619 107 nips-2010-Global seismic monitoring as probabilistic inference

18 0.31630278 35 nips-2010-Auto-Regressive HMM Inference with Incomplete Data for Short-Horizon Wind Forecasting

19 0.30700505 1 nips-2010-(RF)^2 -- Random Forest Random Field

20 0.30441287 215 nips-2010-Probabilistic Deterministic Infinite Automata


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.026), (17, 0.01), (27, 0.048), (30, 0.077), (35, 0.023), (45, 0.212), (50, 0.041), (52, 0.021), (59, 0.012), (60, 0.03), (74, 0.352), (77, 0.035), (78, 0.014), (90, 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.83637327 106 nips-2010-Global Analytic Solution for Variational Bayesian Matrix Factorization

Author: Shinichi Nakajima, Masashi Sugiyama, Ryota Tomioka

Abstract: Bayesian methods of matrix factorization (MF) have been actively explored recently as promising alternatives to classical singular value decomposition. In this paper, we show that, despite the fact that the optimization problem is non-convex, the global optimal solution of variational Bayesian (VB) MF can be computed analytically by solving a quartic equation. This is highly advantageous over a popular VBMF algorithm based on iterated conditional modes since it can only find a local optimal solution after iterations. We further show that the global optimal solution of empirical VBMF (hyperparameters are also learned from data) can also be analytically computed. We illustrate the usefulness of our results through experiments.

same-paper 2 0.75805593 159 nips-2010-Lifted Inference Seen from the Other Side : The Tractable Features

Author: Abhay Jha, Vibhav Gogate, Alexandra Meliou, Dan Suciu

Abstract: Lifted Inference algorithms for representations that combine first-order logic and graphical models have been the focus of much recent research. All lifted algorithms developed to date are based on the same underlying idea: take a standard probabilistic inference algorithm (e.g., variable elimination, belief propagation etc.) and improve its efficiency by exploiting repeated structure in the first-order model. In this paper, we propose an approach from the other side in that we use techniques from logic for probabilistic inference. In particular, we define a set of rules that look only at the logical representation to identify models for which exact efficient inference is possible. Our rules yield new tractable classes that could not be solved efficiently by any of the existing techniques. 1

3 0.69715452 36 nips-2010-Avoiding False Positive in Multi-Instance Learning

Author: Yanjun Han, Qing Tao, Jue Wang

Abstract: In multi-instance learning, there are two kinds of prediction failure, i.e., false negative and false positive. Current research mainly focus on avoiding the former. We attempt to utilize the geometric distribution of instances inside positive bags to avoid both the former and the latter. Based on kernel principal component analysis, we define a projection constraint for each positive bag to classify its constituent instances far away from the separating hyperplane while place positive instances and negative instances at opposite sides. We apply the Constrained Concave-Convex Procedure to solve the resulted problem. Empirical results demonstrate that our approach offers improved generalization performance.

4 0.56493217 63 nips-2010-Distributed Dual Averaging In Networks

Author: Alekh Agarwal, Martin J. Wainwright, John C. Duchi

Abstract: The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication. We develop and analyze distributed algorithms based on dual averaging of subgradients, and provide sharp bounds on their convergence rates as a function of the network size and topology. Our analysis clearly separates the convergence of the optimization algorithm itself from the effects of communication constraints arising from the network structure. We show that the number of iterations required by our algorithm scales inversely in the spectral gap of the network. The sharpness of this prediction is confirmed both by theoretical lower bounds and simulations for various networks. 1

5 0.56446159 155 nips-2010-Learning the context of a category

Author: Dan Navarro

Abstract: This paper outlines a hierarchical Bayesian model for human category learning that learns both the organization of objects into categories, and the context in which this knowledge should be applied. The model is fit to multiple data sets, and provides a parsimonious method for describing how humans learn context specific conceptual representations.

6 0.56227452 243 nips-2010-Smoothness, Low Noise and Fast Rates

7 0.56015921 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction

8 0.56003606 88 nips-2010-Extensions of Generalized Binary Search to Group Identification and Exponential Costs

9 0.55952972 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models

10 0.55928379 290 nips-2010-t-logistic regression

11 0.55857807 148 nips-2010-Learning Networks of Stochastic Differential Equations

12 0.55839843 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning

13 0.55817717 135 nips-2010-Label Embedding Trees for Large Multi-Class Tasks

14 0.55817151 27 nips-2010-Agnostic Active Learning Without Constraints

15 0.55806237 211 nips-2010-Predicting Execution Time of Computer Programs Using Sparse Polynomial Regression

16 0.55799663 12 nips-2010-A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups

17 0.5579741 201 nips-2010-PAC-Bayesian Model Selection for Reinforcement Learning

18 0.55791771 236 nips-2010-Semi-Supervised Learning with Adversarially Missing Label Information

19 0.55787301 287 nips-2010-Worst-Case Linear Discriminant Analysis

20 0.55764639 1 nips-2010-(RF)^2 -- Random Forest Random Field