nips nips2013 nips2013-220 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Guy van den Broeck, Adnan Darwiche
Abstract: Lifted inference algorithms exploit symmetries in probabilistic models to speed up inference. They show impressive performance when calculating unconditional probabilities in relational models, but often resort to non-lifted inference when computing conditional probabilities. The reason is that conditioning on evidence breaks many of the model’s symmetries, which can preempt standard lifting techniques. Recent theoretical results show, for example, that conditioning on evidence which corresponds to binary relations is #P-hard, suggesting that no lifting is to be expected in the worst case. In this paper, we balance this negative result by identifying the Boolean rank of the evidence as a key parameter for characterizing the complexity of conditioning in lifted inference. In particular, we show that conditioning on binary evidence with bounded Boolean rank is efficient. This opens up the possibility of approximating evidence by a low-rank Boolean matrix factorization, which we investigate both theoretically and empirically. 1
Reference: text
sentIndex sentText sentNum sentScore
1 The reason is that conditioning on evidence breaks many of the model’s symmetries, which can preempt standard lifting techniques. [sent-5, score-0.497]
2 Recent theoretical results show, for example, that conditioning on evidence which corresponds to binary relations is #P-hard, suggesting that no lifting is to be expected in the worst case. [sent-6, score-0.608]
3 In this paper, we balance this negative result by identifying the Boolean rank of the evidence as a key parameter for characterizing the complexity of conditioning in lifted inference. [sent-7, score-1.098]
4 In particular, we show that conditioning on binary evidence with bounded Boolean rank is efficient. [sent-8, score-0.645]
5 This opens up the possibility of approximating evidence by a low-rank Boolean matrix factorization, which we investigate both theoretically and empirically. [sent-9, score-0.355]
6 1 Introduction. Statistical relational models are capable of representing both probabilistic dependencies and relational structure [1, 2]. [sent-10, score-0.243]
7 Lifted inference algorithms [3] attempt to overcome this problem by exploiting symmetries found in the relational structure of the model. [sent-12, score-0.313]
8 In the absence of evidence, exact lifted inference algorithms can work well. [sent-13, score-0.594]
9 For large classes of statistical relational models [4], they perform inference that is polynomial in the number of objects in the model [5], and are therefore exponentially faster than classical inference algorithms. [sent-14, score-0.311]
10 When conditioning a query on a set of evidence literals, however, these lifted algorithms lose their advantage over classical ones. [sent-15, score-0.908]
11 The intuitive reason is that evidence breaks the symmetries in the model. [sent-16, score-0.468]
12 This issue is implicitly reflected in the experiment sections of exact lifted inference papers. [sent-18, score-0.594]
13 Others found ways to efficiently deal with evidence on only unary predicates. [sent-21, score-0.511]
14 They perform experiments without evidence on binary or higher-arity relations. [sent-22, score-0.386]
15 This evidence problem has largely been ignored in the exact lifted inference literature until recently, when Bui et al. [10] and Van den Broeck and Davis [11] showed that conditioning on unary evidence is tractable. [sent-24, score-0.919] [sent-25, score-0.708]
17 More precisely, conditioning on unary evidence is polynomial in the size of evidence. [sent-26, score-0.646]
18 This type of evidence expresses attributes of objects in the world, but not relations between them. [sent-27, score-0.399]
19 Unfortunately, Van den Broeck and Davis [11] also showed that this tractability does not extend to evidence on binary relations, for which conditioning on evidence is #P-hard. [sent-28, score-0.908]
20 It is clear that some binary evidence is easy to condition on, even if it talks about a large number of objects, for example when all atoms are true (∀X, Y p(X, Y )) or false (∀X, Y ¬ p(X, Y )). [sent-30, score-0.476]
21 As our first main contribution, we formalize this intuition and characterize the complexity of conditioning more precisely in terms of the Boolean rank of the evidence. [sent-31, score-0.295]
22 Despite the limitations, useful applications of exact lifted inference were found by sidestepping the evidence problem. [sent-33, score-0.919]
23 For example, in lifted generative learning [14], the most challenging task is to compute partition functions without evidence. [sent-34, score-0.478]
24 Regardless, the lack of symmetries in real applications is often cited as a reason for rejecting the idea of lifted inference entirely (informally called the “death sentence for lifted inference”). [sent-35, score-1.162]
25 This problem has been avoided for too long, and as lifted inference gains maturity, solving it becomes paramount. [sent-36, score-0.565]
26 As our second main contribution, we present a first general solution to the evidence problem. [sent-37, score-0.325]
27 We propose to approximate evidence by an over-symmetric matrix, and will show that this can be achieved by minimizing Boolean rank. [sent-38, score-0.356]
28 The need for approximating evidence is new and specific to lifted inference: in (undirected) probabilistic graphical models, more evidence typically makes inference easier. [sent-39, score-1.244]
29 The evidence problem is less pronounced in the approximate lifted inference literature. [sent-41, score-0.921]
30 These algorithms often introduce approximations that lead to symmetries in their computation, even when there are no symmetries in the model. [sent-42, score-0.276]
31 Also for approximate methods, however, the benefits of lifting will decrease with the amount of symmetry-breaking evidence. [sent-43, score-0.399]
32 We will show experimentally that over-symmetric evidence approximation is also a viable technique for approximate lifted inference. [sent-47, score-0.859]
33 2 Encoding Binary Relations in Unary. Our analysis of conditioning is based on a reduction that turns evidence on a binary relation into evidence on several unary predicates. [sent-48, score-1.046]
34 w : profpage(X) ∧ linkto(X, Y) ⇒ coursepage(Y). In this context, evidence e is a truth-value assignment to a set of ground atoms, and is often represented as a conjunction of literals. [sent-68, score-0.385]
35 Without loss of generality, we assume full evidence on certain predicates (i.e., evidence specifying a truth value for every ground atom of those predicates). [sent-74, score-0.386]
36 We will sometimes represent unary evidence as a Boolean vector and binary evidence as a Boolean matrix. [sent-77, score-0.897]
37 Footnote 1: Partial evidence on the relation p can be encoded as full evidence on predicates p0 and p1 by adding the formulas ∀X, Y : p(X, Y) ⇐ p1(X, Y) and ∀X, Y : ¬p(X, Y) ⇐ p0(X, Y) to the model. [sent-78, score-0.777]
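As a minimal sketch of this footnote's encoding (the function name and the dict-based representation of observed literals are our own assumptions, not from the paper), partial evidence on p can be split into two fully specified Boolean relations p1 and p0:

```python
import numpy as np

def split_partial_evidence(n_objects, observed):
    """Split partial evidence on p(X, Y) into full evidence on p1 (known-true
    pairs) and p0 (known-false pairs). `observed` maps (x, y) index pairs to
    True/False; unobserved pairs end up in neither relation."""
    p1 = np.zeros((n_objects, n_objects), dtype=bool)
    p0 = np.zeros((n_objects, n_objects), dtype=bool)
    for (x, y), value in observed.items():
        if value:
            p1[x, y] = True
        else:
            p0[x, y] = True
    return p1, p0
```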
38 2 Vector-Product Binary Evidence. Certain binary relations can be represented by a pair of unary predicates. [sent-83, score-0.321]
39 By adding the formula ∀X, Y : p(X, Y) ⇔ q(X) ∧ r(Y) (1) to our statistical relational model and conditioning on the q and r relations, we can condition on certain types of binary p relations. [sent-84, score-0.343]
40 If we now represent these unary relations by vectors q and r, and the binary relation by the binary matrix P, the above technique allows us to condition on any relation P that can be factorized as the outer vector product P = q r^T. [sent-87, score-0.546]
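As a small illustration of this rank-1 case (a sketch only; the helper name and toy vectors are ours, not the paper's), P is recovered as the Boolean outer product of q and r:

```python
import numpy as np

def boolean_outer(q, r):
    """Boolean outer product: P[x, y] = q[x] AND r[y]."""
    return np.outer(q, r).astype(bool)

# Toy example: p(X, Y) holds exactly when q(X) and r(Y) both hold,
# so conditioning on the unary relations q and r fixes the binary relation P.
q = np.array([True, True, False])
r = np.array([False, True, True])
P = boolean_outer(q, r)
```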
41 3 Matrix-Product Binary Evidence. This idea of encoding a binary relation in unary relations can be generalized to n pairs of unary relations, by adding the following formula to our model: ∀X, Y : p(X, Y) ⇔ (q1(X) ∧ r1(Y)) ∨ ... ∨ (qn(X) ∧ rn(Y)) (2). [sent-92, score-0.6]
42 The Boolean and real-valued ranks are incomparable, and the Boolean rank can be exponentially smaller than the real-valued rank. [sent-107, score-0.308]
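For concreteness, a sketch (our naming) of the Boolean matrix product over the (OR, AND) semiring that defines this notion of rank; each column pair of Q and R plays the role of one pair of unary predicates (qi, ri):

```python
import numpy as np

def boolean_matrix_product(Q, R):
    """P[x, y] = OR_i (Q[x, i] AND R[y, i]).

    Q is m-by-n and R is m'-by-n. The smallest n for which some Q, R
    reproduce a given P exactly is the Boolean rank of P, which in
    general differs from its real-valued rank."""
    return np.einsum('xi,yi->xy', Q.astype(int), R.astype(int)) > 0
```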
43 4 Complexity of Binary Evidence. Our goal in this section is to provide a new complexity result for reasoning with binary evidence in the context of lifted inference. [sent-128, score-0.9]
44 Consider an MLN ∆ and let Γm contain a set of ground literals representing binary evidence. [sent-134, score-0.272]
45 That is, for some binary predicate p(X, Y ), evidence Γm contains precisely one literal (positive or negative) for each grounding of predicate p(X, Y ). [sent-135, score-0.53]
46 Our analysis will apply to classes of models ∆ that are domain-liftable [4], which means that the complexity of computing Prm (q) without evidence is polynomial in m. [sent-140, score-0.391]
47 Our task is then to compute the posterior probability Prm (q|em ), where em is a conjunction of the ground literals in binary evidence Γm . [sent-142, score-0.617]
48 Moreover, our goal here is to characterize the complexity of this computation as a function of evidence size m. [sent-143, score-0.361]
49 Then there exists a domain-liftable MLN ∆ with a corresponding distribution Prm, and a posterior marginal Prm(q|em) that cannot be computed by any algorithm whose complexity grows polynomially in evidence size m, unless P = NP. [sent-147, score-0.384]
50 Yet, for these models, the complexity of inference can be parametrized, allowing one to bound the complexity of inference on some models. [sent-149, score-0.246]
51 We now provide a similar parameterized complexity result, but for evidence in lifted inference. [sent-153, score-0.839]
52 Suppose that evidence Γm is binary and has a bounded Boolean rank. [sent-156, score-0.386]
53 Then for every domain-liftable MLN ∆ and corresponding distribution Prm , the complexity of computing posterior marginal Prm (q|em ) grows polynomially in evidence size m. [sent-157, score-0.384]
54 The proof of this theorem is based on the reduction from binary to unary evidence, which is described in Section 2. [sent-158, score-0.247]
55 In particular, our reduction first extends the MLN ∆ with Formula 2, leading to a new MLN ∆′ and new pairs of unary predicates qi and ri. [sent-159, score-0.301]
56 We then replace binary evidence Γm by unary evidence Γ′. [sent-161, score-0.897]
57 That is, the ground literals of the binary predicate p are replaced by ground literals of the unary predicates qi and ri (see Example 4). [sent-162, score-0.845]
58 This unary evidence is obtained by Boolean matrix factorization. [sent-163, score-0.541]
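A sketch of this replacement step, assuming a square evidence matrix over a single list of constants and a Boolean factorization (Q, R) of it; the predicate names q_i, r_i and the literal representation are illustrative only:

```python
def factorization_to_unary_evidence(Q, R, constants):
    """Turn a Boolean factorization (Q, R) of the evidence matrix into unary
    evidence literals on predicates q_i and r_i. Q[x, i] == True means
    q_i(constants[x]) is observed true; likewise R[y, i] for r_i.
    Returns signed literals as (predicate, constant, truth-value) triples."""
    literals = []
    for i in range(Q.shape[1]):
        for x, c in enumerate(constants):
            literals.append((f'q_{i}', c, bool(Q[x, i])))
        for y, c in enumerate(constants):
            literals.append((f'r_{i}', c, bool(R[y, i])))
    return literals
```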
59 The complexity of Boolean matrix factorization for matrices with bounded Boolean rank is polynomial in their size. [sent-166, score-0.307]
60 Hence, when the Boolean rank n is bounded by a constant, the size of the extended MLN ∆′ is independent of the evidence size and is proportional to the size of the original MLN ∆. [sent-168, score-0.479]
61 We have now reduced inference on MLN ∆ and binary evidence Γm into inference on an extended MLN ∆′ and unary evidence Γ′. [sent-169, score-1.071]
62 Then for every domain-liftable MLN ∆ and corresponding distribution Prm , the complexity of computing posterior marginal Prm (q|em ) grows polynomially in evidence size m. [sent-173, score-0.384]
63 Hence, computing posterior probabilities can be done in time which is polynomial in the size of unary evidence m, which completes our proof. [sent-174, score-0.541]
64 Whereas treewidth is a measure of tree-likeness and sparsity of the graphical model, Boolean rank seems to be a fundamentally different property, more related to the presence of symmetries in evidence. [sent-179, score-0.33]
65 Even for evidence with high Boolean rank, it is possible to find a low-rank approximate BMF of the evidence, as is commonly done for other data mining and machine learning problems. [sent-181, score-0.38]
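A rough sketch of one greedy way to build such a low-rank approximate BMF; this is only an illustration of the idea, not the ASSO algorithm of [20], and the candidate-selection heuristic is deliberately simplistic:

```python
import numpy as np

def greedy_boolean_factorization(P, k):
    """Greedy rank-k approximate Boolean factorization P ≈ OR_i q_i r_i^T.

    At each step, take a row of the residual as a candidate column pattern r_i,
    let q_i select the rows for which adopting r_i fixes more 1s than it breaks
    0s, and OR the resulting rank-1 block into the reconstruction."""
    m, mp = P.shape
    Q = np.zeros((m, k), dtype=bool)
    R = np.zeros((mp, k), dtype=bool)
    recon = np.zeros_like(P, dtype=bool)
    for i in range(k):
        residual = P & ~recon
        # Candidate pattern: the residual row with the most uncovered 1s.
        r_i = residual[np.argmax(residual.sum(axis=1))]
        if not r_i.any():
            break
        gain = (P & r_i).sum(axis=1) - (~P & r_i).sum(axis=1)
        q_i = gain > 0
        Q[:, i], R[:, i] = q_i, r_i
        recon |= np.outer(q_i, r_i)
    return Q, R
```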
66 The evidence matrix from Example 4 has Boolean rank three. [sent-185, score-0.509]
67 By paying this price, the evidence has more symmetries, and we can condition on the binary relation by introducing only two instead of three new pairs (qi , ri ) of unary predicates. [sent-188, score-0.66]
68 Low-rank approximate BMF is an instance of a more general idea: that of over-symmetric evidence approximation. [sent-189, score-0.356]
69 This means that when we want to compute Pr(q | e), we approximate it by computing Pr(q | e′) instead, with evidence e′ that permits more efficient inference. [sent-190, score-0.356]
70 Because all lifted inference algorithms, exact or approximate, exploit symmetries, we expect this general idea, and low-rank approximate BMF in particular, to improve the performance of any lifted inference algorithm. [sent-192, score-1.19]
71 Having a small amount of incorrect evidence in the approximation need not be a problem. [sent-193, score-0.376]
72 Hence, a low-rank approximation may actually improve the performance of, for example, a lifted collective classification algorithm. [sent-195, score-0.503]
73 On the other hand, the approximation made in Example 6 may not be desirable if we are querying attributes of the constant c, and we may prefer to approximate other areas of the evidence matrix instead. [sent-196, score-0.411]
74 There are many challenges in finding appropriate evidence approximations, which makes the task all the more interesting. [sent-197, score-0.325]
75 Q3 Is over-symmetric evidence approximation a viable technique for approximate lifted inference? [sent-201, score-0.859]
76 To answer Q1, we compute approximations of the linkto binary relation in the WebKB data set using the ASSO algorithm for approximate BMF [20]. [sent-202, score-0.284]
77 The exact evidence matrix for the linkto relation ranges in size from 861 by 861 to 1240 by 1240. [sent-206, score-0.538]
78 Figure 1 plots the approximation error for increasing Boolean ranks, measured as the number of incorrect evidence literals. [sent-209, score-0.376]
79 The error goes down quickly for low rank, and is reduced by half after Boolean rank 70 to 80, even though the matrix dimensions and real-valued rank are much higher. [sent-210, score-0.338]
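The error measure used here can be read off directly from a factorization; a small sketch (helper name ours):

```python
import numpy as np

def num_incorrect_literals(P, Q, R):
    """Number of ground literals on which the rank-n approximation
    OR_i q_i r_i^T disagrees with the exact evidence matrix P."""
    recon = np.einsum('xi,yi->xy', Q.astype(int), R.astype(int)) > 0
    return int(np.sum(recon != P))
```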
80 Note that these evidence matrices contain around a million entries, and are sparse. [sent-211, score-0.325]
81 Figure 1: Approximation BMF error in terms of the number of incorrect literals for the WebKB linkto relation. [sent-216, score-0.287]
82 From top to bottom, the lines represent exact evidence (blue), and approximations (red) of rank 150, 100, 75, 50, 20, 10, 5, 2, and 1. [sent-222, score-0.546]
83 Firstly, we look at exact lifted inference and investigate the influence of adding Formula 2 to the “peer-to-peer” and “hierarchical” MLNs from Example 1. [sent-224, score-0.594]
84 The goal is to condition on linkto relations with increasing rank n. [sent-225, score-0.359]
85 Since the connection between rank and exact inference is obvious from Theorem 2, the more interesting question in Q2 is whether Boolean rank is indicative of the complexity of approximate lifted inference as well. [sent-232, score-1.056]
86 We learn its parameters with the Alchemy package and obtain evidence sets of varying Boolean rank from the factorizations of Figure 1. [sent-237, score-0.479]
87 For these, we run both vanilla and lifted MCMC, and measure the KL divergence (KLD) between the marginal distribution at each iteration and a ground truth obtained from 3 million iterations on the corresponding evidence set. [sent-239, score-0.884]
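A sketch of this evaluation metric under our reading (per-atom Bernoulli marginals compared by an averaged KL divergence; the exact computation is not spelled out in the text, so treat this as an assumption):

```python
import numpy as np

def marginal_kld(truth, estimate, eps=1e-6):
    """Average KL divergence between ground-truth and estimated Bernoulli
    marginals, one entry per query atom. `eps` guards against log(0)."""
    p = np.clip(np.asarray(truth, dtype=float), eps, 1 - eps)
    q = np.clip(np.asarray(estimate, dtype=float), eps, 1 - eps)
    kld = p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))
    return float(kld.mean())
```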
88 To answer Q3, we look at the KLD between the distributions under different evidence approximations Pr(. | ea) and under the exact evidence Pr(. | e). [sent-242, score-0.363]
89 For two approximations ea and eb such that rank a < b, we expect LMCMC to converge faster to Pr(. | ea); but because Pr(. | ea) is a worse approximation of Pr(. | e) than Pr(. | eb) is, the KLD at convergence should be worse for a. [sent-247, score-0.246] [sent-253, score-0.353]
(Footnote: When synthetically generating evidence of these ranks, results are comparable.)
91 In Figure 4(a), the rank-2 and rank-10 approximations outperform LMCMC with the exact evidence at first. [sent-264, score-0.508]
92 Exact evidence overtakes rank 2 after 40k iterations, and rank 10 after 50k. [sent-265, score-0.633]
93 Note here that at around iteration 50k, rank 75 in turn outperforms the rank 150 approximation, which has fewer symmetries and does not permit as much lifting. [sent-269, score-0.427]
94 The second is a technique to approximate binary evidence by a low-rank Boolean matrix factorization. [sent-277, score-0.447]
95 This is a first type of over-symmetric evidence approximation that can speed up lifted inference. [sent-278, score-0.828]
96 For future work, we want to evaluate the practical implications of the theory developed for other lifted inference algorithms, such as lifted BP, and look at the performance of over-symmetric evidence approximation on machine learning tasks such as collective classification. [sent-280, score-1.393]
97 Furthermore, we want to investigate other subsets of binary relations for which conditioning could be efficient, in particular functional relations p(X, Y ), where each X has at most a limited number of associated Y values. [sent-286, score-0.314]
98 On the completeness of first-order knowledge compilation for lifted probabilistic inference. [sent-304, score-0.527]
99 Exact lifted inference with distinct soft evidence on every object. [sent-335, score-0.89]
100 Conditioning in first-order knowledge compilation and lifted probabilistic inference. [sent-338, score-0.527]
wordName wordTfidf (topN-words)
[('lifted', 0.478), ('boolean', 0.434), ('evidence', 0.325), ('bmf', 0.261), ('unary', 0.186), ('mln', 0.165), ('rank', 0.154), ('literals', 0.151), ('kld', 0.137), ('lmcmc', 0.124), ('prm', 0.124), ('symmetries', 0.119), ('mcmc', 0.111), ('linkto', 0.11), ('relational', 0.107), ('conditioning', 0.105), ('broeck', 0.097), ('den', 0.092), ('inference', 0.087), ('webkb', 0.082), ('relations', 0.074), ('kersting', 0.073), ('ranks', 0.071), ('atoms', 0.069), ('pauli', 0.069), ('guy', 0.064), ('binary', 0.061), ('predicate', 0.061), ('predicates', 0.061), ('ground', 0.06), ('davis', 0.058), ('factorization', 0.057), ('treewidth', 0.057), ('mlns', 0.055), ('wfomc', 0.055), ('logical', 0.053), ('formula', 0.049), ('salvo', 0.048), ('pr', 0.047), ('miettinen', 0.045), ('relation', 0.044), ('lifting', 0.043), ('propositional', 0.042), ('jesse', 0.042), ('studentpage', 0.041), ('taneli', 0.041), ('circuit', 0.04), ('approximations', 0.038), ('van', 0.037), ('complexity', 0.036), ('braz', 0.036), ('mathias', 0.036), ('bui', 0.036), ('web', 0.033), ('qi', 0.031), ('algebra', 0.031), ('approximate', 0.031), ('atom', 0.03), ('polynomial', 0.03), ('matrix', 0.03), ('probabilistic', 0.029), ('exact', 0.029), ('cornell', 0.028), ('eb', 0.028), ('fove', 0.027), ('nnf', 0.027), ('wisconsin', 0.027), ('incorrect', 0.026), ('ea', 0.026), ('parametrized', 0.026), ('outer', 0.025), ('approximation', 0.025), ('aaai', 0.025), ('breaks', 0.024), ('gautam', 0.024), ('heikki', 0.024), ('ahmadi', 0.024), ('luc', 0.024), ('meert', 0.024), ('taghipour', 0.024), ('wannes', 0.024), ('aristides', 0.024), ('peer', 0.024), ('mining', 0.024), ('polynomially', 0.023), ('ri', 0.023), ('kl', 0.023), ('tiling', 0.022), ('literal', 0.022), ('mielik', 0.022), ('formulas', 0.022), ('decomposition', 0.022), ('divergence', 0.021), ('condition', 0.021), ('gogate', 0.021), ('ijcai', 0.021), ('em', 0.02), ('compilation', 0.02), ('arti', 0.019), ('gionis', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 220 nips-2013-On the Complexity and Approximation of Binary Evidence in Lifted Inference
Author: Guy van den Broeck, Adnan Darwiche
Abstract: Lifted inference algorithms exploit symmetries in probabilistic models to speed up inference. They show impressive performance when calculating unconditional probabilities in relational models, but often resort to non-lifted inference when computing conditional probabilities. The reason is that conditioning on evidence breaks many of the model’s symmetries, which can preempt standard lifting techniques. Recent theoretical results show, for example, that conditioning on evidence which corresponds to binary relations is #P-hard, suggesting that no lifting is to be expected in the worst case. In this paper, we balance this negative result by identifying the Boolean rank of the evidence as a key parameter for characterizing the complexity of conditioning in lifted inference. In particular, we show that conditioning on binary evidence with bounded Boolean rank is efficient. This opens up the possibility of approximating evidence by a low-rank Boolean matrix factorization, which we investigate both theoretically and empirically. 1
2 0.34763312 122 nips-2013-First-order Decomposition Trees
Author: Nima Taghipour, Jesse Davis, Hendrik Blockeel
Abstract: Lifting attempts to speedup probabilistic inference by exploiting symmetries in the model. Exact lifted inference methods, like their propositional counterparts, work by recursively decomposing the model and the problem. In the propositional case, there exist formal structures, such as decomposition trees (dtrees), that represent such a decomposition and allow us to determine the complexity of inference a priori. However, there is currently no equivalent structure nor analogous complexity results for lifted inference. In this paper, we introduce FO-dtrees, which upgrade propositional dtrees to the first-order level. We show how these trees can characterize a lifted inference solution for a probabilistic logical model (in terms of a sequence of lifted operations), and make a theoretical analysis of the complexity of lifted inference in terms of the novel notion of lifted width for the tree. 1
3 0.081892252 199 nips-2013-More data speeds up training time in learning halfspaces over sparse vectors
Author: Amit Daniely, Nati Linial, Shai Shalev-Shwartz
Abstract: The increased availability of data in recent years has led several authors to ask whether it is possible to use data as a computational resource. That is, if more data is available, beyond the sample complexity limit, is it possible to use the extra examples to speed up the computation time required to perform the learning task? We give the first positive answer to this question for a natural supervised learning problem: we consider agnostic PAC learning of halfspaces over 3-sparse vectors in {−1, 1, 0}^n. This class is inefficiently learnable using O(n/ε^2) examples. Our main contribution is a novel, non-cryptographic, methodology for establishing computational-statistical gaps, which allows us to show that, under a widely believed assumption that refuting random 3CNF formulas is hard, it is impossible to efficiently learn this class using only O(n/ε^2) examples. We further show that under stronger hardness assumptions, even O(n^1.499/ε^2) examples do not suffice. On the other hand, we show a new algorithm that learns this class efficiently using Ω̃(n^2/ε^2) examples. This formally establishes the tradeoff between sample and computational complexity for a natural supervised learning problem. 1
4 0.080969848 221 nips-2013-On the Expressive Power of Restricted Boltzmann Machines
Author: James Martens, Arkadev Chattopadhya, Toni Pitassi, Richard Zemel
Abstract: This paper examines the question: What kinds of distributions can be efficiently represented by Restricted Boltzmann Machines (RBMs)? We characterize the RBM’s unnormalized log-likelihood function as a type of neural network, and through a series of simulation results relate these networks to ones whose representational properties are better understood. We show the surprising result that RBMs can efficiently capture any distribution whose density depends on the number of 1’s in their input. We also provide the first known example of a particular type of distribution that provably cannot be efficiently represented by an RBM, assuming a realistic exponential upper bound on the weights. By formally demonstrating that a relatively simple distribution cannot be represented efficiently by an RBM our results provide a new rigorous justification for the use of potentially more expressive generative models, such as deeper ones. 1
5 0.065037258 34 nips-2013-Analyzing Hogwild Parallel Gaussian Gibbs Sampling
Author: Matthew Johnson, James Saunderson, Alan Willsky
Abstract: Sampling inference methods are computationally difficult to scale for many models in part because global dependencies can reduce opportunities for parallel computation. Without strict conditional independence structure among variables, standard Gibbs sampling theory requires sample updates to be performed sequentially, even if dependence between most variables is not strong. Empirical work has shown that some models can be sampled effectively by going “Hogwild” and simply running Gibbs updates in parallel with only periodic global communication, but the successes and limitations of such a strategy are not well understood. As a step towards such an understanding, we study the Hogwild Gibbs sampling strategy in the context of Gaussian distributions. We develop a framework which provides convergence conditions and error bounds along with simple proofs and connections to methods in numerical linear algebra. In particular, we show that if the Gaussian precision matrix is generalized diagonally dominant, then any Hogwild Gibbs sampler, with any update schedule or allocation of variables to processors, yields a stable sampling process with the correct sample mean. 1
6 0.062346943 186 nips-2013-Matrix factorization with binary components
7 0.058145586 281 nips-2013-Robust Low Rank Kernel Embeddings of Multivariate Distributions
8 0.053992078 254 nips-2013-Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms
9 0.05246897 53 nips-2013-Bayesian inference for low rank spatiotemporal neural receptive fields
10 0.052364036 185 nips-2013-Matrix Completion From any Given Set of Observations
11 0.049162943 161 nips-2013-Learning Stochastic Inverses
12 0.046385184 143 nips-2013-Integrated Non-Factorized Variational Inference
13 0.046029869 151 nips-2013-Learning Chordal Markov Networks by Constraint Satisfaction
14 0.042922776 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
15 0.040438108 75 nips-2013-Convex Two-Layer Modeling
16 0.040389333 52 nips-2013-Bayesian inference as iterated random functions with applications to sequential inference in graphical models
17 0.039842047 258 nips-2013-Projecting Ising Model Parameters for Fast Mixing
18 0.03706862 74 nips-2013-Convex Tensor Decomposition via Structured Schatten Norm Regularization
19 0.036756098 263 nips-2013-Reasoning With Neural Tensor Networks for Knowledge Base Completion
20 0.035933044 295 nips-2013-Simultaneous Rectification and Alignment via Robust Recovery of Low-rank Tensors
topicId topicWeight
[(0, 0.109), (1, 0.044), (2, -0.006), (3, 0.045), (4, 0.022), (5, 0.023), (6, 0.049), (7, -0.03), (8, 0.048), (9, -0.01), (10, 0.02), (11, 0.016), (12, 0.066), (13, 0.02), (14, -0.036), (15, 0.004), (16, -0.046), (17, -0.017), (18, -0.014), (19, -0.006), (20, -0.007), (21, -0.01), (22, -0.003), (23, -0.028), (24, 0.036), (25, 0.003), (26, -0.008), (27, -0.02), (28, 0.057), (29, 0.071), (30, -0.055), (31, -0.162), (32, -0.025), (33, 0.058), (34, 0.001), (35, -0.107), (36, 0.031), (37, 0.033), (38, 0.146), (39, 0.017), (40, -0.371), (41, 0.091), (42, 0.142), (43, -0.25), (44, -0.127), (45, 0.104), (46, 0.053), (47, 0.239), (48, 0.01), (49, -0.19)]
simIndex simValue paperId paperTitle
same-paper 1 0.95760429 220 nips-2013-On the Complexity and Approximation of Binary Evidence in Lifted Inference
Author: Guy van den Broeck, Adnan Darwiche
Abstract: Lifted inference algorithms exploit symmetries in probabilistic models to speed up inference. They show impressive performance when calculating unconditional probabilities in relational models, but often resort to non-lifted inference when computing conditional probabilities. The reason is that conditioning on evidence breaks many of the model’s symmetries, which can preempt standard lifting techniques. Recent theoretical results show, for example, that conditioning on evidence which corresponds to binary relations is #P-hard, suggesting that no lifting is to be expected in the worst case. In this paper, we balance this negative result by identifying the Boolean rank of the evidence as a key parameter for characterizing the complexity of conditioning in lifted inference. In particular, we show that conditioning on binary evidence with bounded Boolean rank is efficient. This opens up the possibility of approximating evidence by a low-rank Boolean matrix factorization, which we investigate both theoretically and empirically. 1
2 0.92565817 122 nips-2013-First-order Decomposition Trees
Author: Nima Taghipour, Jesse Davis, Hendrik Blockeel
Abstract: Lifting attempts to speedup probabilistic inference by exploiting symmetries in the model. Exact lifted inference methods, like their propositional counterparts, work by recursively decomposing the model and the problem. In the propositional case, there exist formal structures, such as decomposition trees (dtrees), that represent such a decomposition and allow us to determine the complexity of inference a priori. However, there is currently no equivalent structure nor analogous complexity results for lifted inference. In this paper, we introduce FO-dtrees, which upgrade propositional dtrees to the first-order level. We show how these trees can characterize a lifted inference solution for a probabilistic logical model (in terms of a sequence of lifted operations), and make a theoretical analysis of the complexity of lifted inference in terms of the novel notion of lifted width for the tree. 1
3 0.34471166 53 nips-2013-Bayesian inference for low rank spatiotemporal neural receptive fields
Author: Mijung Park, Jonathan W. Pillow
Abstract: The receptive field (RF) of a sensory neuron describes how the neuron integrates sensory stimuli over time and space. In typical experiments with naturalistic or flickering spatiotemporal stimuli, RFs are very high-dimensional, due to the large number of coefficients needed to specify an integration profile across time and space. Estimating these coefficients from small amounts of data poses a variety of challenging statistical and computational problems. Here we address these challenges by developing Bayesian reduced rank regression methods for RF estimation. This corresponds to modeling the RF as a sum of space-time separable (i.e., rank-1) filters. This approach substantially reduces the number of parameters needed to specify the RF, from 1K-10K down to mere 100s in the examples we consider, and confers substantial benefits in statistical power and computational efficiency. We introduce a novel prior over low-rank RFs using the restriction of a matrix normal prior to the manifold of low-rank matrices, and use “localized” row and column covariances to obtain sparse, smooth, localized estimates of the spatial and temporal RF components. We develop two methods for inference in the resulting hierarchical model: (1) a fully Bayesian method using blocked-Gibbs sampling; and (2) a fast, approximate method that employs alternating ascent of conditional marginal likelihoods. We develop these methods for Gaussian and Poisson noise models, and show that low-rank estimates substantially outperform full rank estimates using neural data from retina and V1. 1
4 0.34175122 186 nips-2013-Matrix factorization with binary components
Author: Martin Slawski, Matthias Hein, Pavlo Lutsik
Abstract: Motivated by an application in computational biology, we consider low-rank matrix factorization with {0, 1}-constraints on one of the factors and optionally convex constraints on the second one. In addition to the non-convexity shared with other matrix factorization schemes, our problem is further complicated by a combinatorial constraint set of size 2^(m·r), where m is the dimension of the data points and r the rank of the factorization. Despite apparent intractability, we provide, in the line of recent work on non-negative matrix factorization by Arora et al. (2012), an algorithm that provably recovers the underlying factorization in the exact case with O(mr2^r + mnr + r^2 n) operations for n datapoints. To obtain this result, we use theory around the Littlewood-Offord lemma from combinatorics.
5 0.33851773 37 nips-2013-Approximate Bayesian Image Interpretation using Generative Probabilistic Graphics Programs
Author: Vikash Mansinghka, Tejas D. Kulkarni, Yura N. Perov, Josh Tenenbaum
Abstract: The idea of computer vision as the Bayesian inverse problem to computer graphics has a long history and an appealing elegance, but it has proved difficult to directly implement. Instead, most vision tasks are approached via complex bottom-up processing pipelines. Here we show that it is possible to write short, simple probabilistic graphics programs that define flexible generative models and to automatically invert them to interpret real-world images. Generative probabilistic graphics programs (GPGP) consist of a stochastic scene generator, a renderer based on graphics software, a stochastic likelihood model linking the renderer’s output and the data, and latent variables that adjust the fidelity of the renderer and the tolerance of the likelihood. Representations and algorithms from computer graphics are used as the deterministic backbone for highly approximate and stochastic generative models. This formulation combines probabilistic programming, computer graphics, and approximate Bayesian computation, and depends only on generalpurpose, automatic inference techniques. We describe two applications: reading sequences of degraded and adversarially obscured characters, and inferring 3D road models from vehicle-mounted camera images. Each of the probabilistic graphics programs we present relies on under 20 lines of probabilistic code, and yields accurate, approximately Bayesian inferences about real-world images. 1
6 0.33834183 337 nips-2013-Transportability from Multiple Environments with Limited Experiments
7 0.33340669 101 nips-2013-EDML for Learning Parameters in Directed and Undirected Graphical Models
8 0.31787592 350 nips-2013-Wavelets on Graphs via Deep Learning
9 0.3048723 342 nips-2013-Unsupervised Spectral Learning of Finite State Transducers
10 0.29694664 199 nips-2013-More data speeds up training time in learning halfspaces over sparse vectors
11 0.29497758 153 nips-2013-Learning Feature Selection Dependencies in Multi-task Learning
12 0.28754562 184 nips-2013-Marginals-to-Models Reducibility
13 0.28671086 134 nips-2013-Graphical Models for Inference with Missing Data
14 0.27937597 254 nips-2013-Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms
15 0.27798635 52 nips-2013-Bayesian inference as iterated random functions with applications to sequential inference in graphical models
16 0.27370796 34 nips-2013-Analyzing Hogwild Parallel Gaussian Gibbs Sampling
17 0.27219751 352 nips-2013-What do row and column marginals reveal about your dataset?
18 0.26576909 92 nips-2013-Discovering Hidden Variables in Noisy-Or Networks using Quartet Tests
19 0.24146332 167 nips-2013-Learning the Local Statistics of Optical Flow
20 0.23532261 213 nips-2013-Nonparametric Multi-group Membership Model for Dynamic Networks
topicId topicWeight
[(16, 0.022), (33, 0.097), (34, 0.116), (41, 0.021), (49, 0.022), (56, 0.059), (70, 0.028), (85, 0.026), (89, 0.018), (93, 0.022), (95, 0.472)]
simIndex simValue paperId paperTitle
same-paper 1 0.80430353 220 nips-2013-On the Complexity and Approximation of Binary Evidence in Lifted Inference
Author: Guy van den Broeck, Adnan Darwiche
Abstract: Lifted inference algorithms exploit symmetries in probabilistic models to speed up inference. They show impressive performance when calculating unconditional probabilities in relational models, but often resort to non-lifted inference when computing conditional probabilities. The reason is that conditioning on evidence breaks many of the model’s symmetries, which can preempt standard lifting techniques. Recent theoretical results show, for example, that conditioning on evidence which corresponds to binary relations is #P-hard, suggesting that no lifting is to be expected in the worst case. In this paper, we balance this negative result by identifying the Boolean rank of the evidence as a key parameter for characterizing the complexity of conditioning in lifted inference. In particular, we show that conditioning on binary evidence with bounded Boolean rank is efficient. This opens up the possibility of approximating evidence by a low-rank Boolean matrix factorization, which we investigate both theoretically and empirically. 1
2 0.72725832 268 nips-2013-Reflection methods for user-friendly submodular optimization
Author: Stefanie Jegelka, Francis Bach, Suvrit Sra
Abstract: Recently, it has become evident that submodularity naturally captures widely occurring concepts in machine learning, signal processing and computer vision. Consequently, there is need for efficient optimization procedures for submodular functions, especially for minimization problems. While general submodular minimization is challenging, we propose a new method that exploits existing decomposability of submodular functions. In contrast to previous approaches, our method is neither approximate, nor impractical, nor does it need any cumbersome parameter tuning. Moreover, it is easy to implement and parallelize. A key component of our method is a formulation of the discrete submodular minimization problem as a continuous best approximation problem that is solved through a sequence of reflections, and its solution can be easily thresholded to obtain an optimal discrete solution. This method solves both the continuous and discrete formulations of the problem, and therefore has applications in learning, inference, and reconstruction. In our experiments, we illustrate the benefits of our method on two image segmentation tasks. 1
3 0.65934086 309 nips-2013-Statistical Active Learning Algorithms
Author: Maria-Florina Balcan, Vitaly Feldman
Abstract: We describe a framework for designing efficient active learning algorithms that are tolerant to random classification noise and differentially-private. The framework is based on active learning algorithms that are statistical in the sense that they rely on estimates of expectations of functions of filtered random examples. It builds on the powerful statistical query framework of Kearns [30]. We show that any efficient active statistical learning algorithm can be automatically converted to an efficient active learning algorithm which is tolerant to random classification noise as well as other forms of “uncorrelated” noise. We show that commonly studied concept classes including thresholds, rectangles, and linear separators can be efficiently actively learned in our framework. These results combined with our generic conversion lead to the first computationally-efficient algorithms for actively learning some of these concept classes in the presence of random classification noise that provide exponential improvement in the dependence on the error over their passive counterparts. In addition, we show that our algorithms can be automatically converted to efficient active differentially-private algorithms. This leads to the first differentially-private active learning algorithms with exponential label savings over the passive case. 1
4 0.61526012 271 nips-2013-Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima
Author: Po-Ling Loh, Martin J. Wainwright
Abstract: We establish theoretical results concerning local optima of regularized M-estimators, where both loss and penalty functions are allowed to be nonconvex. Our results show that as long as the loss satisfies restricted strong convexity and the penalty satisfies suitable regularity conditions, any local optimum of the composite objective lies within statistical precision of the true parameter vector. Our theory covers a broad class of nonconvex objective functions, including corrected versions of the Lasso for errors-in-variables linear models and regression in generalized linear models using nonconvex regularizers such as SCAD and MCP. On the optimization side, we show that a simple adaptation of composite gradient descent may be used to compute a global optimum up to the statistical precision ε_stat in log(1/ε_stat) iterations, the fastest possible rate for any first-order method. We provide simulations to illustrate the sharpness of our theoretical predictions. 1
5 0.58109009 154 nips-2013-Learning Gaussian Graphical Models with Observed or Latent FVSs
Author: Ying Liu, Alan Willsky
Abstract: Gaussian Graphical Models (GGMs) or Gauss Markov random fields are widely used in many applications, and the trade-off between the modeling capacity and the efficiency of learning and inference has been an important research problem. In this paper, we study the family of GGMs with small feedback vertex sets (FVSs), where an FVS is a set of nodes whose removal breaks all the cycles. Exact inference such as computing the marginal distributions and the partition function has complexity O(k^2 n) using message-passing algorithms, where k is the size of the FVS, and n is the total number of nodes. We propose efficient structure learning algorithms for two cases: 1) All nodes are observed, which is useful in modeling social or flight networks where the FVS nodes often correspond to a small number of highly influential nodes, or hubs, while the rest of the networks is modeled by a tree. Regardless of the maximum degree, without knowing the full graph structure, we can exactly compute the maximum likelihood estimate with complexity O(kn^2 + n^2 log n) if the FVS is known or in polynomial time if the FVS is unknown but has bounded size. 2) The FVS nodes are latent variables, where structure learning is equivalent to decomposing an inverse covariance matrix (exactly or approximately) into the sum of a tree-structured matrix and a low-rank matrix. By incorporating efficient inference into the learning steps, we can obtain a learning algorithm using alternating low-rank corrections with complexity O(kn^2 + n^2 log n) per iteration. We perform experiments using both synthetic data as well as real data of flight delays to demonstrate the modeling capacity with FVSs of various sizes. 1
6 0.5300746 185 nips-2013-Matrix Completion From any Given Set of Observations
7 0.49195981 102 nips-2013-Efficient Algorithm for Privately Releasing Smooth Queries
8 0.48430946 78 nips-2013-Curvature and Optimal Algorithms for Learning and Minimizing Submodular Functions
9 0.47509813 42 nips-2013-Auditing: Active Learning with Outcome-Dependent Query Costs
10 0.46512839 122 nips-2013-First-order Decomposition Trees
11 0.44212317 184 nips-2013-Marginals-to-Models Reducibility
12 0.44126558 149 nips-2013-Latent Structured Active Learning
13 0.43616408 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields
14 0.43066645 319 nips-2013-Submodular Optimization with Submodular Cover and Submodular Knapsack Constraints
15 0.42894879 97 nips-2013-Distributed Submodular Maximization: Identifying Representative Elements in Massive Data
16 0.4258413 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
17 0.42563528 132 nips-2013-Global MAP-Optimality by Shrinking the Combinatorial Search Area with Convex Relaxation
18 0.41995606 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
19 0.41284651 33 nips-2013-An Approximate, Efficient LP Solver for LP Rounding
20 0.41037029 57 nips-2013-Beyond Pairwise: Provably Fast Algorithms for Approximate $k$-Way Similarity Search