nips nips2013 nips2013-291 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Daniel S. Levine, Jonathan P. How
Abstract: We consider the sensor selection problem on multivariate Gaussian distributions where only a subset of latent variables is of inferential interest. For pairs of vertices connected by a unique path in the graph, we show that there exist decompositions of nonlocal mutual information into local information measures that can be computed efficiently from the output of message passing algorithms. We integrate these decompositions into a computationally efficient greedy selector where the computational expense of quantification can be distributed across nodes in the network. Experimental results demonstrate the comparative efficiency of our algorithms for sensor selection in high-dimensional distributions. We additionally derive an online-computable performance bound based on augmentations of the relevant latent variable set that, when such a valid augmentation exists, is applicable for any distribution with nuisances. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: We consider the sensor selection problem on multivariate Gaussian distributions where only a subset of latent variables is of inferential interest. [sent-4, score-0.235]
2 For pairs of vertices connected by a unique path in the graph, we show that there exist decompositions of nonlocal mutual information into local information measures that can be computed efficiently from the output of message passing algorithms. [sent-5, score-0.686]
3 We integrate these decompositions into a computationally efficient greedy selector where the computational expense of quantification can be distributed across nodes in the network. [sent-6, score-0.434]
4 Experimental results demonstrate the comparative efficiency of our algorithms for sensor selection in high-dimensional distributions. [sent-7, score-0.145]
5 We additionally derive an online-computable performance bound based on augmentations of the relevant latent variable set that, when such a valid augmentation exists, is applicable for any distribution with nuisances. [sent-8, score-0.141]
6 1 Introduction This paper addresses the problem of focused active inference: selecting a subset of observable random variables that is maximally informative with respect to a specified subset of latent random variables. [sent-9, score-0.227]
7 The subset selection problem is motivated by the desire to reduce the overall cost of inference while providing greater inferential accuracy. [sent-10, score-0.136]
8 For example, in the context of sensor networks, control of the data acquisition process can lead to lower energy expenses in terms of sensing, computation, and communication [1, 2]. [sent-11, score-0.096]
9 On their own, nuisances are not of any extrinsic importance to the uncertainty reduction task and merely serve as intermediaries when describing statistical relationships, as encoded with the joint distribution, between variables. [sent-13, score-0.104]
10 The structure in the joint can be represented parsimoniously with a probabilistic graphical model, often leading to efficient inference algorithms [3, 4, 5]. [sent-14, score-0.121]
11 However, marginalization of nuisance variables is potentially expensive and can mar the very sparsity of the graphical model that permitted efficient inference. [sent-15, score-0.187]
12 Therefore, we seek methods for selecting informative subsets of observations in graphical models that retain nuisance variables. [sent-16, score-0.15]
13 Observation random variables and relevant latent variables may be nonadjacent in the graphical model due to the interposition of nuisances between them, requiring the development of information measures that extend beyond adjacency (alternatively, locality) in the graph. [sent-18, score-0.277]
14 More generally, the absence of certain conditional independencies, particularly between observations conditioned on the relevant latent variable set, means that one cannot directly apply the performance bounds associated with submodularity [6, 7, 8]. [sent-19, score-0.217]
15 In an effort to pave the way for analyzing focused active inference on the class of general distributions, this paper specifically examines multivariate Gaussian distributions – which exhibit a number of properties amenable to analysis – and later specializes to Gaussian trees. [sent-20, score-0.224]
16 This paper presents a decomposition of pairwise nonlocal mutual information (MI) measures on Gaussian graphs that permits efficient information valuation. [sent-21, score-0.429]
17 Both the valuation and subsequent selection may be distributed over nodes in the network, which can be of benefit for high-dimensional distributions and/or large-scale distributed sensor networks. [sent-24, score-0.311]
18 It is also shown how an augmentation to the relevant set can lead to an online-computable performance bound for general distributions with nuisances. [sent-25, score-0.095]
19 The nonlocal MI decomposition extensively exploits properties of Gaussian distributions, Markov random fields, and Gaussian belief propagation (GaBP), which are reviewed in Section 2. [sent-26, score-0.382]
20 The formal problem statement of focused active inference is given in Section 3, along with an example that contrasts focused and unfocused selection. [sent-27, score-0.585]
21 Section 4 presents pairwise nonlocal MI decompositions for scalar and vectoral Gaussian Markov random fields. [sent-28, score-0.723]
22 Section 5 shows how to integrate pairwise nonlocal MI into a distributed greedy selection algorithm for the focused active inference problem; this algorithm is benchmarked in Section 6. [sent-29, score-0.696]
23 A performance bound applicable to any focused selector is presented in Section 7. [sent-30, score-0.24]
24 A u-v path is a finite sequence of adjacent vertices, starting with vertex u and terminating at vertex v, that does not repeat any vertex. [sent-34, score-0.116]
25 If |PG(u, v)| = 1, then there is a unique path between u and v, and we denote the sole element of PG(u, v) by P̄u:v. [sent-37, score-0.17]
26 A chain is said to be embedded in graph G if the nodes in the chain comprise a unique path in G. [sent-43, score-0.461]
27 For MRFs, the global Markov property relates connectivity in the graph to implied conditional independencies. [sent-44, score-0.099]
28 (Note that the conditional precision matrix is independent of the value of the realized x2.) [sent-54, score-0.096]
29 3 Gaussian MRFs (GMRFs) If x ∼ N⁻¹(h, J), the conditional independence structure of px(·) can be represented with a Gaussian MRF (GMRF) G = (V, E), where E is determined by the sparsity pattern of J and the pairwise Markov property: {i, j} ∈ E iff Jij ≠ 0. [sent-66, score-0.183]
30 In a scalar GMRF, V indexes scalar components of x. [sent-67, score-0.314]
31 In a vectoral GMRF, V indexes disjoint subvectors of x, each of potentially different dimension. [sent-68, score-0.378]
32 The block submatrix Jii can be thought of as specifying the sparsity pattern of the scalar micro-network within the vectoral macro-node i ∈ V. [sent-69, score-0.381]
33 4 Gaussian Belief Propagation (GaBP) If x can be partitioned into n subvectors of dimension at most d, and the resulting graph is tree-shaped, then all marginal precision matrices Ji, i ∈ V can be computed by Gaussian belief propagation (GaBP) [10] in O(n · d³). [sent-71, score-0.288]
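To make the O(n · d³) figure concrete, the following is a minimal sketch (not the authors' implementation; interfaces are assumed) of GaBP on a scalar tree-shaped precision matrix: an upward and a downward pass over the tree produce every marginal precision, and the vectoral analogue simply replaces scalar divisions with d × d block inverses.

```python
# Minimal GaBP sketch for a scalar tree GMRF (assumed interfaces, not the
# authors' code). Two passes over the tree yield all marginal precisions J_i;
# the vectoral analogue uses d x d blocks, giving the O(n * d^3) cost above.
import numpy as np

def gabp_marginal_precisions(J, adj, root=0):
    """J: n x n tree-structured precision matrix; adj: {i: list of neighbors}."""
    n = J.shape[0]
    parent, order, stack = {root: None}, [], [root]
    while stack:                              # DFS preorder: parents before children
        i = stack.pop()
        order.append(i)
        for j in adj[i]:
            if j not in parent:
                parent[j] = i
                stack.append(j)
    msg = {}                                  # msg[(i, j)]: precision contribution i -> j
    for i in reversed(order):                 # upward pass (leaves to root)
        j = parent[i]
        if j is not None:
            incoming = sum(msg[(k, i)] for k in adj[i] if k != j)
            msg[(i, j)] = -J[i, j] * J[j, i] / (J[i, i] + incoming)
    for i in order:                           # downward pass (root to leaves)
        j = parent[i]
        if j is not None:
            incoming = sum(msg[(k, j)] for k in adj[j] if k != i)
            msg[(j, i)] = -J[j, i] * J[i, j] / (J[j, j] + incoming)
    return np.array([J[i, i] + sum(msg[(k, i)] for k in adj[i]) for i in range(n)])
```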
34 In light of (3), pairwise MI quantities between adjacent nodes i and j may be expressed as I(xi; xj) = H(xi) + H(xj) − H(xi, xj) = −(1/2) ln det(Ji) − (1/2) ln det(Jj) + (1/2) ln det(J{i,j}), for {i, j} ∈ E, (4) [sent-73, score-0.237]
35 That is, purely in terms of node and edge marginal precision matrices. [sent-75, score-0.154]
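As an illustration, the quantity in (4) can be evaluated directly from the node and edge marginal precisions produced by GaBP; a short sketch follows (function name assumed, not from the paper).

```python
# Pairwise MI of Eq. (4) from marginal precisions (sketch; name assumed).
import numpy as np

def pairwise_mi(J_i, J_j, J_ij):
    """I(x_i; x_j) = -0.5 ln det(J_i) - 0.5 ln det(J_j) + 0.5 ln det(J_{i,j})."""
    ld = lambda M: np.linalg.slogdet(np.atleast_2d(M))[1]
    return 0.5 * (ld(J_ij) - ld(J_i) - ld(J_j))
```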
36 Moreover, the graphical inference community appears to best understand the convergence of message passing algorithms for continuous distributions on subclasses of multivariate Gaussians. [sent-78, score-0.241]
37 3 Problem Statement Let px(·) = N⁻¹(·; h, J) be represented by GMRF G = (V, E), and consider a partition of V into the subsets of latent nodes U and observable nodes S, with R ⊆ U denoting the subset of relevant latent variables (i.e., the latent variables of inferential interest). [sent-81, score-0.434]
38 Given a cost function c : 2^S → R≥0 over subsets of observations, and a budget β ∈ R≥0, the focused active inference problem is maximize_{A⊆S} I(xR; xA) s.t. c(A) ≤ β. (5) [sent-84, score-0.224]
39 The focused active inference problem in (5) is distinguished from the unfocused active inference problem maximize_{A⊆S} I(xU; xA) s.t. c(A) ≤ β. [sent-87, score-0.559]
40 By the chain rule and nonnegativity of MI, I(xU ; xA ) = I(xR ; xA ) + I(xU \R ; xA | xR ) ≥ I(xR ; xA ), for any A ⊆ S. [sent-91, score-0.138]
41 Therefore, maximizing unfocused MI does not imply maximizing focused MI. [sent-92, score-0.361]
42 Focused active inference must be posed as a separate problem to avoid the situation where the observation selector becomes fixated on inferring nuisance variables as a result of I(xU \R ; xA | xR ) being included implicitly in the valuation. [sent-93, score-0.286]
43 In fact, an unfocused selector can perform arbitrarily poorly with respect to a focused metric, as the following example illustrates. [sent-94, score-0.476]
44 Consider a scalar GMRF over a four-node chain (Figure 1a), whereby J13 = J14 = J24 = 0 by the pairwise Markov property, with R = {2}, S = {1, 4}, c(A) = |A| (i.e., each observation has unit cost). [sent-96, score-0.27]
45 The optimal unfocused decision rule A*(UF) = argmax_{a∈{1,4}} I(x2, x3; xa) can be shown, by conditional independence and positive definiteness of J, to reduce to A*(UF) = {4} if |J34| > |J12| and A*(UF) = {1} if |J34| < |J12|, independent of J23, which parameterizes the edge potential between nodes 2 and 3. [sent-99, score-0.863]
46 The reason for this loss is that as |J23 | → 0+ , the information that node 3 can convey about node 2 also approaches zero, although the unfocused decision rule is oblivious to this fact. [sent-102, score-0.394]
47 There exists a range of values for |J23 | such that the unfocused and focused policies coincide; however, as |J23 | → 0+ , the unfocused policy approaches complete performance loss with respect to the focused measure. [sent-126, score-0.763]
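A quick numerical illustration of this example (the precision values below are invented for illustration and are not taken from the paper): with |J34| > |J12| the unfocused rule selects node 4, yet as |J23| → 0+ the focused value I(x2; x4) collapses while I(x2; x1) does not.

```python
# Sketch of the four-node chain example with made-up precision values.
import numpy as np

def gaussian_mi(Sigma, A, B):
    """I(x_A; x_B) = 0.5*(ln det S_AA + ln det S_BB - ln det S_{A u B})."""
    ld = lambda idx: np.linalg.slogdet(Sigma[np.ix_(idx, idx)])[1]
    return 0.5 * (ld(A) + ld(B) - ld(A + B))

J = np.array([[2.0, 0.5, 0.0,  0.0],      # chain 1-2-3-4; J13 = J14 = J24 = 0
              [0.5, 2.0, 1e-3, 0.0],
              [0.0, 1e-3, 2.0, 0.9],
              [0.0, 0.0, 0.9,  2.0]])
Sigma = np.linalg.inv(J)
for a in (0, 3):                           # candidate selections: node 1 or node 4
    print("select node", a + 1,
          "| unfocused I(x2,x3;xa) = %.4f" % gaussian_mi(Sigma, [1, 2], [a]),
          "| focused I(x2;xa) = %.6f" % gaussian_mi(Sigma, [1], [a]))
```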
48 Figure 2: (a) Example of a nontree graph G with a unique path P̄1:k between nodes 1 and k. [sent-132, score-0.372]
49 (b) Example of a vectoral graph with thin edges and a "sidegraph" attached to each node i ∈ P̃, with internal (scalar) structure depicted. [sent-134, score-0.411]
50 4 Nonlocal MI Decomposition For GMRFs with n nodes indexing d-dimensional random subvectors, I(xR; xA) can be computed exactly in O((nd)³) via Schur complements/inversions on the precision matrix J. [sent-135, score-0.185]
51 However, certain graph structures permit the computation via belief propagation of all local pairwise MI terms I(xi; xj), for adjacent nodes i, j ∈ V, in O(n · d³) – a substantial savings for large networks. [sent-136, score-0.385]
52 This section describes a transformation of nonlocal MI between uniquely path-connected nodes that permits a decomposition into the sum of transformed local MI quantities. [sent-137, score-0.422]
53 Furthermore, the local MI terms can be transformed in constant time, yielding an O(n · d³) procedure for computing any pairwise nonlocal MI quantity coinciding with a unique path. [sent-140, score-0.382]
54 For disjoint subsets A, B, C ⊆ V, the warped mutual information measure W : 2^V × 2^V × 2^V → (−∞, 0] is defined such that W(A; B|C) ≜ (1/2) log(1 − exp{−2 I(xA; xB|xC)}). [sent-142, score-0.192]
55 For i, j ∈ V indexing scalar nodes, the warped MI of Definition 1 reduces to W(i; j) = log |ρij|, where ρij ∈ [−1, 1] is the correlation coefficient between the scalar random variables xi and xj. [sent-145, score-0.358]
56 The measure log |ρij | has long been known to the graphical model learning community as an “additive tree distance” [15, 16], and our decomposition for vectoral graphs is a novel application for sensor selection problems. [sent-148, score-0.477]
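For concreteness, the warp and its inverse can be implemented as below (a sketch; function names are assumptions). The scalar reduction follows since I(xi; xj) = −(1/2) log(1 − ρij²), so 1 − exp{−2I} = ρij² and W(i; j) = log |ρij|.

```python
# Sketch of the warp of Definition 1 and its inverse (names assumed).
import math

def warp(mi):
    """W = 0.5 * log(1 - exp(-2*I)); W <= 0 for any I >= 0."""
    return 0.5 * math.log1p(-math.exp(-2.0 * mi)) if mi > 0 else float("-inf")

def unwarp(w):
    """I = -0.5 * log(1 - exp(2*W)); inverse of the warp."""
    return -0.5 * math.log1p(-math.exp(2.0 * w)) if w < 0 else float("inf")
```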
57 Proposition 3 requires only that the path between vertices u and v be unique. [sent-154, score-0.132]
58 However, the result holds on any graph for which: the subgraph induced by P̄u:v is a chain; and every i ∈ P̄u:v separates N(i) \ P̄u:v from P̄u:v \ {i}, where N(i) ≜ {j : {i, j} ∈ E} is the neighbor set of i. [sent-156, score-0.12]
59 See Figure 2a for an example of a nontree graph with a unique path. [sent-157, score-0.167]
60 An edge {i, j} ∈ E of GMRF G = (V, E; J) is thin if the corresponding submatrix Jij has exactly one nonzero scalar component. [sent-159, score-0.283]
61 For vectoral problems, each node may contain a subnetwork of arbitrarily connected scalar random variables (see Figure 2b). [sent-161, score-0.405]
62 Under the assumption of thin edges (Definition 5), a unique path between nodes u and v must enter each interstitial node through one scalar random variable and exit through one scalar random variable. [sent-162, score-0.728]
63 Here ζi(u, v|C) denotes the warped MI between the enter and exit scalar r.v.s of interstitial vectoral node i on P̄u:v, with conditioning set C ⊆ V \ {u, v}. [sent-168, score-0.41]
64 For any GMRF G = (V, E) where V indexes random vectors of dimension at most d and the edges in E are thin, if |PG(u, v)| = 1 for distinct vertices u, v ∈ V, then for any C ⊆ V \ {u, v}, I(xu; xv|xC) can be decomposed as W(u; v|C) = Σ_{edges {i,j} on P̄u:v} W(i; j|C) + Σ_{interstitial i on P̄u:v} ζi(u, v|C). [sent-171, score-0.147]
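Used operationally, and under the stated assumptions (thin edges, unique u-v path), the decomposition says one can sum the warped edge terms and the interstitial ζ terms along the path and then unwarp the result; a sketch (data layout and names assumed):

```python
# Sketch: recover the nonlocal MI I(x_u; x_v | x_C) from warped local terms
# along the unique u-v path (names and data layout assumed).
import math

def nonlocal_mi_along_path(edge_W, interstitial_zeta):
    """edge_W: W(i;j|C) for consecutive edges on the path; interstitial_zeta:
    zeta_i(u,v|C) for the path's interior nodes (empty in the scalar case)."""
    w = sum(edge_W) + sum(interstitial_zeta)
    return -0.5 * math.log1p(-math.exp(2.0 * w)) if w < 0 else float("inf")
```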
65 Running GaBP on the graph G conditioned on A and subsequently computing all terms W(i; j|A), ∀{i, j} ∈ E, incurs a computational cost of O(n · d³). [sent-174, score-0.101]
66 Each neighbor i ∈ N (r) receives that message with value modified by W (r; i|A); there is no ζ term because there are no interstitial nodes between r and its neighbors. [sent-176, score-0.276]
67 Since there are at most n − 1 edges in a forest, the total cost of dissemination is still O(n · d³), after which all nodes y in the same component as r will have received an r-message whose value on arrival is W(r; y|A), from which I(xr; xy|A) can be computed in constant time. [sent-179, score-0.282]
68 Thus, for |R| = 1, all scores I(xR; xy|xA) for y ∈ S \ A can collectively be computed at each iteration of the greedy algorithm in O(n · d³). [sent-180, score-0.213]
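A sketch of this dissemination for the scalar case, where the ζ terms vanish and only warped edge terms accumulate along each unique path from r (data structures assumed; a vectoral version would additionally add a ζi term whenever a message passes through an interstitial node i):

```python
# Sketch of disseminating warped scores from a single relevant node r over a
# tree/forest (scalar case; data structures assumed). Each reached node y ends
# up with W(r; y|A), from which I(x_r; x_y | x_A) follows in constant time.
import math
from collections import deque

def disseminate_scores(adj, edge_W, r):
    """adj: {i: iterable of neighbors}; edge_W[frozenset({i, j})] = W(i; j|A)."""
    W_score, queue = {r: 0.0}, deque([r])
    while queue:
        i = queue.popleft()
        for j in adj[i]:
            if j not in W_score:
                W_score[j] = W_score[i] + edge_W[frozenset((i, j))]
                queue.append(j)
    return {y: -0.5 * math.log1p(-math.exp(2.0 * w)) if w < 0 else float("inf")
            for y, w in W_score.items() if y != r}
```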
69 Then, by the chain rule of mutual information, I(xR; xy | xA) = Σ_{k=1}^{|R|} I(xrk; xy | xA∪Rk−1), y ∈ S \ A, where each term in the sum is a pairwise (potentially nonlocal) MI evaluation. [sent-186, score-0.463]
70 The implication is that one can run |R| separate instances of GaBP, each using a different conditioning set A ∪ Rk−1 , to compute “node and edge weights” (W and ζ terms) for the r-message passing scheme outlined above. [sent-187, score-0.145]
71 The chain rule suggests one should then sum the unwarped r-scores of these |R| instances to yield the scores I(xR ; xy |xA ) for y ∈ S \ A. [sent-188, score-0.216]
72 The total cost of a greedy update is then O(|R| · n · d³). [sent-189, score-0.101]
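Putting the pieces together, the greedy selector can be organized as in the following high-level sketch (helper functions are assumptions; score_pass would be one conditioned GaBP pass plus the warped-score dissemination above):

```python
# High-level sketch of the focused greedy selector (helpers assumed).
def greedy_focused_selection(S, R, budget, cost, score_pass):
    """S: set of observable nodes; R: ordered relevant nodes; score_pass(r, cond)
    returns {y: I(x_r; x_y | x_cond)} for every candidate y not in cond."""
    A = set()
    while True:
        feasible = [y for y in S - A if cost(A | {y}) <= budget]
        if not feasible:
            return A
        scores = dict.fromkeys(feasible, 0.0)
        prefix = []                       # R_{k-1}: relevant nodes already handled
        for r in R:                       # chain rule over the relevant set
            pass_scores = score_pass(r, A | set(prefix))
            for y in feasible:
                scores[y] += pass_scores[y]
            prefix.append(r)
        A.add(max(feasible, key=scores.get))
```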
73 One of the benefits of the focused greedy selection algorithm is its amenability to parallelization. [sent-190, score-0.275]
74 1 As node i may have additional neighbors that are not on the u-v path, using the notation ζi(u, v|C) is a convenient way to implicitly specify the enter/exit scalar r.v.s. [sent-195, score-0.193]
75 Any unique path subsuming u-v, or any unique path subsumed in u-v for which i is interstitial, will have equivalent ζi terms. [sent-198, score-0.274]
76 2 If i is in the conditioning set, its outgoing message can be set to be −∞, so that the nodes it blocks from reaching r see an apparent information score of 0. [sent-199, score-0.247]
77 It should also be noted that if the quantification is instead performed using serial BP – which can be conceptualized as choosing an arbitrary root, collecting messages from the leaves up to the root, and disseminating messages back down again – a factor of 2 savings can be achieved for R2, . . . , R|R|. [sent-201, score-0.197]
78 This follows by noting that in moving between instances k and k + 1, only rk is added to the conditioning set. [sent-204, score-0.108]
79 For the subsequent instance (i.e., with A ∪ Rk as the conditioning set), only the second half of the message passing schedule (disseminating messages from the root to the leaves) is necessary. [sent-207, score-0.249]
80 We compare our algorithm with greedy selectors that use matrix inversion (with cubic complexity) to compute nonlocal mutual information measures. [sent-210, score-0.46]
81 At each iteration of the greedy selector, the blocked inversion-based quantifier computes first JR∪Sfeas |A (entailing a block marginalization of nuisances), from which JR|A and JR|A∪y , ∀y ∈ Sfeas , are computed. [sent-212, score-0.138]
82 Then I(xR ; xy | xA ), ∀y ∈ Sfeas , are computed via a variant of (3). [sent-213, score-0.112]
83 The naïve inversion-based quantifier computes I(xR; xy | xA), ∀y ∈ Sfeas, "from scratch" by using separate Schur complements of J submatrices and not storing intermediate results. [sent-214, score-0.112]
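For reference, such inversion-based scoring can be sketched as below (assumed helper names; this is the cubic-cost baseline, not the paper's BP-based method), using the facts that conditioning a Gaussian in information form keeps a submatrix of J while marginalization is a Schur complement:

```python
# Sketch of a from-scratch inversion-based quantifier (baseline; names assumed):
# I(x_R; x_y | x_A) = 0.5 * (ln det J_{R | A u {y}} - ln det J_{R | A}).
import numpy as np

def cond_marg_precision(J, keep, cond):
    """Precision of x_keep given x_cond, with all other variables marginalized."""
    drop = [i for i in range(J.shape[0]) if i not in keep and i not in cond]
    Jk = J[np.ix_(keep, keep)]
    if drop:
        Jk = Jk - J[np.ix_(keep, drop)] @ np.linalg.solve(J[np.ix_(drop, drop)],
                                                          J[np.ix_(drop, keep)])
    return Jk

def naive_score(J, R, y, A):
    ld = lambda M: np.linalg.slogdet(np.atleast_2d(M))[1]
    return 0.5 * (ld(cond_marg_precision(J, list(R), list(A) + [y]))
                  - ld(cond_marg_precision(J, list(R), list(A))))
```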
84 For each n, the mean of the runtimes over 20 random scalar problem instances is displayed. [sent-217, score-0.132]
85 Figure 3 shows the comparative mean runtime performance of each of the quantifiers for scalar networks of size n, where the mean is taken over the 20 problem instances proposed for each value of n. [sent-219, score-0.179]
86 Each problem instance consists of a randomly generated, symmetric, positive-definite, tree-shaped precision matrix J, along with a randomly labeled S (such that, arbitrarily, |S| = 0. [sent-220, score-0.104]
87 Note that all selectors return the same greedy selection; we are concerned with how the decompositions proposed in this paper aid in the computational performance. [sent-222, score-0.2]
88 Conversely, the behavior of the BP-based quantifiers empirically confirms the asymptotic O(n) complexity of our method for scalar networks. [sent-224, score-0.132]
89 7 Performance Bounds Due to the presence of nuisances in the model, even if the subgraph induced by S is completely disconnected, it is not always the case that the nodes in S are conditionally independent when conditioned on only the relevant latent set R. [sent-225, score-0.428]
90 Lack of conditional independence means one cannot guarantee submodularity of the information measure, as per [6]. [sent-226, score-0.118]
91 Let R̂ be any subset such that R ⊂ R̂ ⊆ U and such that nodes in S are conditionally independent conditioned on R̂. Then, by Corollary 4 of [6], I(xR̂; xA) is submodular and nondecreasing on S. [sent-228, score-0.128]
92 Proposition 7 can be used at runtime to determine what percentage δ̄R(A, R̂) of the optimal objective is guaranteed, for any focused selector, despite the lack of conditional independence of S conditioned on R. [sent-236, score-0.289]
93 In order to compute the bound, a greedy heuristic running on a separate, surrogate problem with R̂ as the relevant set is required. [sent-237, score-0.15]
94 8 Conclusion In this paper, we have considered the sensor selection problem on multivariate Gaussian distributions that, in order to preserve a parsimonious representation, contain nuisances. [sent-239, score-0.145]
95 For pairs of nodes connected in the graph by a unique path, there exist decompositions of nonlocal mutual information into local MI measures that can be computed efficiently from the output of message passing algorithms. [sent-240, score-0.742]
96 For tree-shaped models, we have presented a greedy selector where the computational expense of quantification can be distributed across nodes in the network. [sent-241, score-0.377]
97 Despite the lack of conditional independence among the observations, we have derived an online-computable performance bound based on an augmentation of the relevant set. [sent-242, score-0.171]
98 An information-based approach to sensor management in large dynamic networks. [sent-254, score-0.096]
99 Correctness of belief propagation in Gaussian graphical models of arbitrary topology. [sent-323, score-0.166]
100 Feedback message passing for inference in Gaussian graphical models. [sent-340, score-0.289]
wordName wordTfidf (topN-words)
[('xa', 0.351), ('nonlocal', 0.252), ('gabp', 0.236), ('unfocused', 0.236), ('mi', 0.214), ('xr', 0.213), ('vectoral', 0.212), ('gmrf', 0.166), ('pg', 0.158), ('xb', 0.133), ('scalar', 0.132), ('nodes', 0.128), ('pu', 0.126), ('focused', 0.125), ('diam', 0.118), ('sfeas', 0.118), ('selector', 0.115), ('xy', 0.112), ('det', 0.106), ('nuisances', 0.104), ('greedy', 0.101), ('quanti', 0.098), ('sensor', 0.096), ('quant', 0.094), ('warped', 0.094), ('interstitial', 0.083), ('subvectors', 0.083), ('graphical', 0.078), ('thin', 0.078), ('path', 0.077), ('messages', 0.075), ('nuisance', 0.072), ('pairwise', 0.07), ('chain', 0.068), ('mutual', 0.065), ('message', 0.065), ('schur', 0.063), ('ag', 0.063), ('ja', 0.062), ('xc', 0.061), ('node', 0.061), ('graph', 0.06), ('subgraph', 0.06), ('unique', 0.06), ('precision', 0.057), ('decompositions', 0.057), ('active', 0.056), ('passing', 0.055), ('vertices', 0.055), ('conditioning', 0.054), ('rk', 0.054), ('bp', 0.052), ('jr', 0.051), ('indexes', 0.05), ('selection', 0.049), ('relevant', 0.049), ('gaussian', 0.048), ('propagation', 0.047), ('mrfs', 0.047), ('runtime', 0.047), ('disseminating', 0.047), ('inv', 0.047), ('maximizea', 0.047), ('nontree', 0.047), ('treeshaped', 0.047), ('trees', 0.047), ('xu', 0.046), ('augmentation', 0.046), ('latent', 0.046), ('inferential', 0.044), ('inference', 0.043), ('decomposition', 0.042), ('submodularity', 0.042), ('choi', 0.042), ('edges', 0.042), ('gmrfs', 0.042), ('lids', 0.042), ('selectors', 0.042), ('belief', 0.041), ('policy', 0.041), ('ji', 0.041), ('conditioned', 0.041), ('proposition', 0.04), ('eu', 0.04), ('adjacent', 0.039), ('conditional', 0.039), ('jij', 0.038), ('valuation', 0.038), ('independence', 0.037), ('px', 0.037), ('submatrix', 0.037), ('marginalization', 0.037), ('edge', 0.036), ('rule', 0.036), ('markov', 0.036), ('nonnegativity', 0.034), ('java', 0.034), ('expense', 0.033), ('disjoint', 0.033), ('sole', 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances
Author: Daniel S. Levine, Jonathan P. How
Abstract: We consider the sensor selection problem on multivariate Gaussian distributions where only a subset of latent variables is of inferential interest. For pairs of vertices connected by a unique path in the graph, we show that there exist decompositions of nonlocal mutual information into local information measures that can be computed efficiently from the output of message passing algorithms. We integrate these decompositions into a computationally efficient greedy selector where the computational expense of quantification can be distributed across nodes in the network. Experimental results demonstrate the comparative efficiency of our algorithms for sensor selection in high-dimensional distributions. We additionally derive an online-computable performance bound based on augmentations of the relevant latent variable set that, when such a valid augmentation exists, is applicable for any distribution with nuisances. 1
2 0.11444416 151 nips-2013-Learning Chordal Markov Networks by Constraint Satisfaction
Author: Jukka Corander, Tomi Janhunen, Jussi Rintanen, Henrik Nyman, Johan Pensar
Abstract: We investigate the problem of learning the structure of a Markov network from data. It is shown that the structure of such networks can be described in terms of constraints which enables the use of existing solver technology with optimization capabilities to compute optimal networks starting from initial scores computed from the data. To achieve efficient encodings, we develop a novel characterization of Markov network structure using a balancing condition on the separators between cliques forming the network. The resulting translations into propositional satisfiability and its extensions such as maximum satisfiability, satisfiability modulo theories, and answer set programming, enable us to prove optimal certain networks which have been previously found by stochastic search. 1
3 0.11188758 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
Author: Marcelo Fiori, Pablo Sprechmann, Joshua Vogelstein, Pablo Muse, Guillermo Sapiro
Abstract: Graph matching is a challenging problem with very important applications in a wide range of fields, from image and video analysis to biological and biomedical problems. We propose a robust graph matching algorithm inspired in sparsityrelated techniques. We cast the problem, resembling group or collaborative sparsity formulations, as a non-smooth convex optimization problem that can be efficiently solved using augmented Lagrangian techniques. The method can deal with weighted or unweighted graphs, as well as multimodal data, where different graphs represent different types of data. The proposed approach is also naturally integrated with collaborative graph inference techniques, solving general network inference problems where the observed variables, possibly coming from different modalities, are not in correspondence. The algorithm is tested and compared with state-of-the-art graph matching techniques in both synthetic and real graphs. We also present results on multimodal graphs and applications to collaborative inference of brain connectivity from alignment-free functional magnetic resonance imaging (fMRI) data. The code is publicly available. 1
4 0.1071578 132 nips-2013-Global MAP-Optimality by Shrinking the Combinatorial Search Area with Convex Relaxation
Author: Bogdan Savchynskyy, Jörg Hendrik Kappes, Paul Swoboda, Christoph Schnörr
Abstract: We consider energy minimization for undirected graphical models, also known as the MAP-inference problem for Markov random fields. Although combinatorial methods, which return a provably optimal integral solution of the problem, made a significant progress in the past decade, they are still typically unable to cope with large-scale datasets. On the other hand, large scale datasets are often defined on sparse graphs and convex relaxation methods, such as linear programming relaxations then provide good approximations to integral solutions. We propose a novel method of combining combinatorial and convex programming techniques to obtain a global solution of the initial combinatorial problem. Based on the information obtained from the solution of the convex relaxation, our method confines application of the combinatorial solver to a small fraction of the initial graphical model, which allows to optimally solve much larger problems. We demonstrate the efficacy of our approach on a computer vision energy minimization benchmark. 1
5 0.097501568 168 nips-2013-Learning to Pass Expectation Propagation Messages
Author: Nicolas Heess, Daniel Tarlow, John Winn
Abstract: Expectation Propagation (EP) is a popular approximate posterior inference algorithm that often provides a fast and accurate alternative to sampling-based methods. However, while the EP framework in theory allows for complex nonGaussian factors, there is still a significant practical barrier to using them within EP, because doing so requires the implementation of message update operators, which can be difficult and require hand-crafted approximations. In this work, we study the question of whether it is possible to automatically derive fast and accurate EP updates by learning a discriminative model (e.g., a neural network or random forest) to map EP message inputs to EP message outputs. We address the practical concerns that arise in the process, and we provide empirical analysis on several challenging and diverse factors, indicating that there is a space of factors where this approach appears promising. 1
6 0.093864687 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields
7 0.093800187 101 nips-2013-EDML for Learning Parameters in Directed and Undirected Graphical Models
8 0.087048151 218 nips-2013-On Sampling from the Gibbs Distribution with Random Maximum A-Posteriori Perturbations
9 0.084542021 149 nips-2013-Latent Structured Active Learning
10 0.08266025 154 nips-2013-Learning Gaussian Graphical Models with Observed or Latent FVSs
11 0.080395706 289 nips-2013-Scalable kernels for graphs with continuous attributes
12 0.077010371 50 nips-2013-Bayesian Mixture Modelling and Inference based Thompson Sampling in Monte-Carlo Tree Search
13 0.07650166 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction
14 0.074286282 215 nips-2013-On Decomposing the Proximal Map
15 0.073747516 288 nips-2013-Scalable Influence Estimation in Continuous-Time Diffusion Networks
16 0.073636778 227 nips-2013-Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions
17 0.071897961 319 nips-2013-Submodular Optimization with Submodular Cover and Submodular Knapsack Constraints
18 0.070393629 97 nips-2013-Distributed Submodular Maximization: Identifying Representative Elements in Massive Data
19 0.069937743 66 nips-2013-Computing the Stationary Distribution Locally
20 0.069454372 180 nips-2013-Low-rank matrix reconstruction and clustering via approximate message passing
topicId topicWeight
[(0, 0.194), (1, 0.006), (2, -0.018), (3, 0.055), (4, 0.047), (5, 0.106), (6, 0.034), (7, -0.133), (8, 0.037), (9, -0.005), (10, 0.056), (11, -0.118), (12, 0.128), (13, -0.083), (14, -0.021), (15, -0.034), (16, -0.032), (17, 0.029), (18, -0.003), (19, -0.025), (20, 0.019), (21, 0.054), (22, 0.038), (23, -0.1), (24, 0.071), (25, 0.02), (26, 0.013), (27, 0.128), (28, 0.041), (29, 0.094), (30, 0.057), (31, 0.029), (32, -0.025), (33, 0.064), (34, 0.084), (35, 0.05), (36, 0.033), (37, 0.059), (38, -0.014), (39, -0.013), (40, 0.017), (41, -0.043), (42, -0.058), (43, -0.11), (44, 0.074), (45, -0.017), (46, 0.001), (47, -0.005), (48, -0.047), (49, 0.052)]
simIndex simValue paperId paperTitle
same-paper 1 0.9481644 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances
Author: Daniel S. Levine, Jonathan P. How
Abstract: We consider the sensor selection problem on multivariate Gaussian distributions where only a subset of latent variables is of inferential interest. For pairs of vertices connected by a unique path in the graph, we show that there exist decompositions of nonlocal mutual information into local information measures that can be computed efficiently from the output of message passing algorithms. We integrate these decompositions into a computationally efficient greedy selector where the computational expense of quantification can be distributed across nodes in the network. Experimental results demonstrate the comparative efficiency of our algorithms for sensor selection in high-dimensional distributions. We additionally derive an online-computable performance bound based on augmentations of the relevant latent variable set that, when such a valid augmentation exists, is applicable for any distribution with nuisances. 1
2 0.77017021 151 nips-2013-Learning Chordal Markov Networks by Constraint Satisfaction
Author: Jukka Corander, Tomi Janhunen, Jussi Rintanen, Henrik Nyman, Johan Pensar
Abstract: We investigate the problem of learning the structure of a Markov network from data. It is shown that the structure of such networks can be described in terms of constraints which enables the use of existing solver technology with optimization capabilities to compute optimal networks starting from initial scores computed from the data. To achieve efficient encodings, we develop a novel characterization of Markov network structure using a balancing condition on the separators between cliques forming the network. The resulting translations into propositional satisfiability and its extensions such as maximum satisfiability, satisfiability modulo theories, and answer set programming, enable us to prove optimal certain networks which have been previously found by stochastic search. 1
3 0.70482606 154 nips-2013-Learning Gaussian Graphical Models with Observed or Latent FVSs
Author: Ying Liu, Alan Willsky
Abstract: Gaussian Graphical Models (GGMs) or Gauss Markov random fields are widely used in many applications, and the trade-off between the modeling capacity and the efficiency of learning and inference has been an important research problem. In this paper, we study the family of GGMs with small feedback vertex sets (FVSs), where an FVS is a set of nodes whose removal breaks all the cycles. Exact inference such as computing the marginal distributions and the partition function has complexity O(k 2 n) using message-passing algorithms, where k is the size of the FVS, and n is the total number of nodes. We propose efficient structure learning algorithms for two cases: 1) All nodes are observed, which is useful in modeling social or flight networks where the FVS nodes often correspond to a small number of highly influential nodes, or hubs, while the rest of the networks is modeled by a tree. Regardless of the maximum degree, without knowing the full graph structure, we can exactly compute the maximum likelihood estimate with complexity O(kn2 + n2 log n) if the FVS is known or in polynomial time if the FVS is unknown but has bounded size. 2) The FVS nodes are latent variables, where structure learning is equivalent to decomposing an inverse covariance matrix (exactly or approximately) into the sum of a tree-structured matrix and a low-rank matrix. By incorporating efficient inference into the learning steps, we can obtain a learning algorithm using alternating low-rank corrections with complexity O(kn2 + n2 log n) per iteration. We perform experiments using both synthetic data as well as real data of flight delays to demonstrate the modeling capacity with FVSs of various sizes. 1
4 0.6675145 189 nips-2013-Message Passing Inference with Chemical Reaction Networks
Author: Nils E. Napp, Ryan P. Adams
Abstract: Recent work on molecular programming has explored new possibilities for computational abstractions with biomolecules, including logic gates, neural networks, and linear systems. In the future such abstractions might enable nanoscale devices that can sense and control the world at a molecular scale. Just as in macroscale robotics, it is critical that such devices can learn about their environment and reason under uncertainty. At this small scale, systems are typically modeled as chemical reaction networks. In this work, we develop a procedure that can take arbitrary probabilistic graphical models, represented as factor graphs over discrete random variables, and compile them into chemical reaction networks that implement inference. In particular, we show that marginalization based on sum-product message passing can be implemented in terms of reactions between chemical species whose concentrations represent probabilities. We show algebraically that the steady state concentration of these species correspond to the marginal distributions of the random variables in the graph and validate the results in simulations. As with standard sum-product inference, this procedure yields exact results for tree-structured graphs, and approximate solutions for loopy graphs.
5 0.65547442 184 nips-2013-Marginals-to-Models Reducibility
Author: Tim Roughgarden, Michael Kearns
Abstract: We consider a number of classical and new computational problems regarding marginal distributions, and inference in models specifying a full joint distribution. We prove general and efficient reductions between a number of these problems, which demonstrate that algorithmic progress in inference automatically yields progress for “pure data” problems. Our main technique involves formulating the problems as linear programs, and proving that the dual separation oracle required by the ellipsoid method is provided by the target problem. This technique may be of independent interest in probabilistic inference. 1
6 0.64148402 8 nips-2013-A Graphical Transformation for Belief Propagation: Maximum Weight Matchings and Odd-Sized Cycles
7 0.62036777 132 nips-2013-Global MAP-Optimality by Shrinking the Combinatorial Search Area with Convex Relaxation
8 0.6060521 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
9 0.59430814 332 nips-2013-Tracking Time-varying Graphical Structure
10 0.57938707 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields
11 0.57626402 288 nips-2013-Scalable Influence Estimation in Continuous-Time Diffusion Networks
12 0.5435096 101 nips-2013-EDML for Learning Parameters in Directed and Undirected Graphical Models
13 0.53569275 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
14 0.52672148 340 nips-2013-Understanding variable importances in forests of randomized trees
15 0.5239563 3 nips-2013-A* Lasso for Learning a Sparse Bayesian Network Structure for Continuous Variables
16 0.51889688 67 nips-2013-Conditional Random Fields via Univariate Exponential Families
17 0.50193805 52 nips-2013-Bayesian inference as iterated random functions with applications to sequential inference in graphical models
18 0.49693644 47 nips-2013-Bayesian Hierarchical Community Discovery
19 0.48511812 149 nips-2013-Latent Structured Active Learning
20 0.47661421 168 nips-2013-Learning to Pass Expectation Propagation Messages
topicId topicWeight
[(2, 0.013), (16, 0.043), (33, 0.142), (34, 0.067), (41, 0.048), (43, 0.231), (49, 0.044), (56, 0.101), (70, 0.033), (85, 0.08), (89, 0.034), (93, 0.053), (95, 0.036)]
simIndex simValue paperId paperTitle
1 0.80684376 138 nips-2013-Higher Order Priors for Joint Intrinsic Image, Objects, and Attributes Estimation
Author: Vibhav Vineet, Carsten Rother, Philip Torr
Abstract: Many methods have been proposed to solve the problems of recovering intrinsic scene properties such as shape, reflectance and illumination from a single image, and object class segmentation separately. While these two problems are mutually informative, in the past not many papers have addressed this topic. In this work we explore such joint estimation of intrinsic scene properties recovered from an image, together with the estimation of the objects and attributes present in the scene. In this way, our unified framework is able to capture the correlations between intrinsic properties (reflectance, shape, illumination), objects (table, tv-monitor), and materials (wooden, plastic) in a given scene. For example, our model is able to enforce the condition that if a set of pixels take same object label, e.g. table, most likely those pixels would receive similar reflectance values. We cast the problem in an energy minimization framework and demonstrate the qualitative and quantitative improvement in the overall accuracy on the NYU and Pascal datasets. 1
same-paper 2 0.80107903 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances
Author: Daniel S. Levine, Jonathan P. How
Abstract: We consider the sensor selection problem on multivariate Gaussian distributions where only a subset of latent variables is of inferential interest. For pairs of vertices connected by a unique path in the graph, we show that there exist decompositions of nonlocal mutual information into local information measures that can be computed efficiently from the output of message passing algorithms. We integrate these decompositions into a computationally efficient greedy selector where the computational expense of quantification can be distributed across nodes in the network. Experimental results demonstrate the comparative efficiency of our algorithms for sensor selection in high-dimensional distributions. We additionally derive an online-computable performance bound based on augmentations of the relevant latent variable set that, when such a valid augmentation exists, is applicable for any distribution with nuisances. 1
3 0.79603589 96 nips-2013-Distributed Representations of Words and Phrases and their Compositionality
Author: Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, Jeff Dean
Abstract: The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of “Canada” and “Air” cannot be easily combined to obtain “Air Canada”. Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
4 0.6683889 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
Author: Yuhong Guo
Abstract: Principal component analysis (PCA), a well-established technique for data analysis and processing, provides a convenient form of dimensionality reduction that is effective for cleaning small Gaussian noises presented in the data. However, the applicability of standard principal component analysis in real scenarios is limited by its sensitivity to large errors. In this paper, we tackle the challenge problem of recovering data corrupted with errors of high magnitude by developing a novel robust transfer principal component analysis method. Our method is based on the assumption that useful information for the recovery of a corrupted data matrix can be gained from an uncorrupted related data matrix. Specifically, we formulate the data recovery problem as a joint robust principal component analysis problem on the two data matrices, with common principal components shared across matrices and individual principal components specific to each data matrix. The formulated optimization problem is a minimization problem over a convex objective function but with non-convex rank constraints. We develop an efficient proximal projected gradient descent algorithm to solve the proposed optimization problem with convergence guarantees. Our empirical results over image denoising tasks show the proposed method can effectively recover images with random large errors, and significantly outperform both standard PCA and robust PCA with rank constraints. 1
5 0.66570079 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables
Author: Cho-Jui Hsieh, Matyas A. Sustik, Inderjit Dhillon, Pradeep Ravikumar, Russell Poldrack
Abstract: The 1 -regularized Gaussian maximum likelihood estimator (MLE) has been shown to have strong statistical guarantees in recovering a sparse inverse covariance matrix even under high-dimensional settings. However, it requires solving a difficult non-smooth log-determinant program with number of parameters scaling quadratically with the number of Gaussian variables. State-of-the-art methods thus do not scale to problems with more than 20, 000 variables. In this paper, we develop an algorithm B IG QUIC, which can solve 1 million dimensional 1 regularized Gaussian MLE problems (which would thus have 1000 billion parameters) using a single machine, with bounded memory. In order to do so, we carefully exploit the underlying structure of the problem. Our innovations include a novel block-coordinate descent method with the blocks chosen via a clustering scheme to minimize repeated computations; and allowing for inexact computation of specific components. In spite of these modifications, we are able to theoretically analyze our procedure and show that B IG QUIC can achieve super-linear or even quadratic convergence rates. 1
6 0.66526031 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
7 0.66388202 104 nips-2013-Efficient Online Inference for Bayesian Nonparametric Relational Models
8 0.66248661 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
9 0.66104257 206 nips-2013-Near-Optimal Entrywise Sampling for Data Matrices
10 0.66066509 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
11 0.66027683 311 nips-2013-Stochastic Convex Optimization with Multiple Objectives
12 0.65984154 232 nips-2013-Online PCA for Contaminated Data
13 0.65871459 304 nips-2013-Sparse nonnegative deconvolution for compressive calcium imaging: algorithms and phase transitions
14 0.65823239 288 nips-2013-Scalable Influence Estimation in Continuous-Time Diffusion Networks
15 0.65806121 142 nips-2013-Information-theoretic lower bounds for distributed statistical estimation with communication constraints
16 0.65768123 350 nips-2013-Wavelets on Graphs via Deep Learning
17 0.65689361 352 nips-2013-What do row and column marginals reveal about your dataset?
18 0.65656924 233 nips-2013-Online Robust PCA via Stochastic Optimization
19 0.6563409 35 nips-2013-Analyzing the Harmonic Structure in Graph-Based Learning
20 0.65583813 154 nips-2013-Learning Gaussian Graphical Models with Observed or Latent FVSs