nips nips2013 nips2013-184 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Tim Roughgarden, Michael Kearns
Abstract: We consider a number of classical and new computational problems regarding marginal distributions, and inference in models specifying a full joint distribution. We prove general and efficient reductions between a number of these problems, which demonstrate that algorithmic progress in inference automatically yields progress for “pure data” problems. Our main technique involves formulating the problems as linear programs, and proving that the dual separation oracle required by the ellipsoid method is provided by the target problem. This technique may be of independent interest in probabilistic inference. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: We consider a number of classical and new computational problems regarding marginal distributions, and inference in models specifying a full joint distribution. [sent-5, score-0.276]
2 We prove general and efficient reductions between a number of these problems, which demonstrate that algorithmic progress in inference automatically yields progress for “pure data” problems. [sent-6, score-0.281]
3 Our main technique involves formulating the problems as linear programs, and proving that the dual separation oracle required by the ellipsoid method is provided by the target problem. [sent-7, score-0.579]
4 1 Introduction The movement between the specification of “local” marginals and models for complete joint distributions is ingrained in the language and methods of modern probabilistic inference. [sent-9, score-0.414]
5 For instance, in Bayesian networks, we begin with a (perhaps partial) specification of local marginals or CPTs, which then allows us to construct a graphical model for the full joint distribution. [sent-10, score-0.411]
6 In turn, this allows us to make inferences (perhaps conditioned on observed evidence) regarding marginals that were not part of the original specification. [sent-11, score-0.371]
7 In many applications, the specification of marginals is derived from some combination of (noisy) observed data and (imperfect) domain expertise. [sent-12, score-0.329]
8 As such, even before the passage to models for the full joint distribution, there are a number of basic computational questions we might wish to ask of given marginals, such as whether they are consistent with any joint distribution, and if not, what the nearest consistent marginals are. [sent-13, score-0.655]
9 In this paper, we prove a number of general, polynomial time reductions between such problems regarding data or marginals, and problems of inference in graphical models. [sent-15, score-0.471]
10 By “general” we mean the reductions are not restricted to particular classes of graphs or algorithmic approaches, but show that any computational progress on the target problem immediately transfers to progress on the source problem. [sent-16, score-0.313]
11 Thus, for instance, we immediately obtain that the tractability of MAP inference in trees or tree-like graphs yields an efficient algorithm for marginal consistency in tree data graphs; and any future progress in MAP inference for other classes G will similarly transfer. [sent-18, score-0.386]
12 For instance, for any class G for which we can prove the intractability of marginal consistency, we can immediately infer the intractability of MAP inference as well. [sent-20, score-0.246]
13 One, as we have already suggested, is the fact that given marginals may not be consistent with any joint distribution. Figure 1: Summary of main results. [sent-22, score-0.483]
14 Arrows indicate that the source problem can be reduced to the target problem for any class of graphs G, and in polynomial time. [sent-23, score-0.279]
15 At the other extreme, given marginals may be consistent with many joint distributions, with potentially very different properties. [sent-26, score-0.483]
16 We thus consider four natural algorithmic problems involving (partially) specified marginals: • CONSISTENCY: Is there any joint distribution consistent with given marginals? [sent-28, score-0.229]
17 • CLOSEST CONSISTENCY: What are the consistent marginals closest to given inconsistent marginals? [sent-29, score-0.503]
18 • SMALL SUPPORT: Of the consistent distributions with the closest marginals, can we compute one with support size polynomial in the data (i.e., in the number of given marginal values)? [sent-30, score-0.317]
19 The consistency problem has been studied before as the membership problem for the marginal polytope (see Related Work); in the case of inconsistency, the closest consistency problem seeks the minimal perturbation to the data necessary to recover coherence. [sent-34, score-0.272]
20 For example, consider the three features “votes Republican”, “supports universal healthcare”, and “supports tougher gun control”, and suppose the single marginals are 0. [sent-37, score-0.329]
21 We also consider two standard algorithmic inference problems on full joint distributions (models). (Footnote 1) For a simple example, consider three random variables for which each pairwise marginal specifies that the settings (0,1) and (1,0) each occurs with probability 1/2. [sent-45, score-0.377]
22 Suppose the pairwise marginals for X and Y and for Y and Z specify that all four binary settings are equally likely. [sent-49, score-0.381]
23 No pairwise marginals for X and Z are given, so the data graph is a two-hop path. [sent-50, score-0.461]
24 All six of these problems are parameterized by a class of graphs G — for the four marginals problems, this is the graph induced by the given pairwise marginals, while for the models problems, it is the graph of the given Markov network. [sent-55, score-0.689]
25 All of our reductions are of the form “for every class G, if there is a polynomial-time algorithm for solving inference problem B for (model) graphs in G, then there is a polynomial-time algorithm for marginals problem A for (marginal) graphs in G” — that is, A reduces to B. [sent-56, score-0.689]
26 All of our reductions share a common and powerful technique: the use of the ellipsoid method for Linear Programming (LP), with the key step being the articulation of an appropriate separation oracle. [sent-62, score-0.501]
27 Since our goal is to run in time polynomial in the input length (the number and size of given marginals), the straightforward LP formulation will not suffice. [sent-64, score-0.19]
28 However, by passing to the dual LP, we instead obtain an LP with only a polynomial number of variables, but an exponential number of constraints that can be represented implicitly. [sent-65, score-0.264]
29 For each of the reductions above, we show that the required separation oracle for these implicit constraints is provided exactly by the corresponding inference problem (MAP INFERENCE or GENERALIZED PARTITION). [sent-66, score-0.452]
30 For the marginal problems, our reductions (via the ellipsoid method) effectively create a series of “fictitious” Markov networks such that the solutions to corresponding inference problems (MAP INFERENCE and GENERALIZED PARTITION) indirectly lead to a solution to the original marginal problems. [sent-69, score-0.681]
31 In contrast, here we develop general and efficient reductions between marginal and inference problems that hold regardless of the graph structure or algorithmic approach; we are not aware of prior efforts in this vein. [sent-71, score-0.462]
32 In particular, [8] shows that finding the MAP assignment for Markov random fields with pairwise potentials can be cast as an integer linear program over the marginal polytope — that is, algorithms for the CONSISTENCY problem are useful subroutines for inference. [sent-74, score-0.402]
33 To our knowledge, we are the first to show a converse, that inference algorithms are useful subroutines for decision and optimization problems for the marginal polytope. [sent-77, score-0.251]
34 Our approach dodges this ambitious requirement, in that it only needs a polynomial-time separation oracle (which, for this problem, turns out to be MAP inference). [sent-79, score-0.216]
35 These works provide reductions of some form, but not ones that are both general (independent of graph structure) and polynomial time. [sent-82, score-0.357]
36 The work in [6, 7] makes the point that maximizing entropy subject to an (approximate) consistency condition yields a distribution that can be represented as a Markov network over the graph induced by the original data or marginals. [sent-85, score-0.219]
37 The input is at most one real-valued single marginal value µ_{is} for every variable i ∈ [n] and value s ∈ [k], and at most one real-valued pairwise marginal value µ_{ijst} for every ordered variable pair (i, j) ∈ [n]×[n] with i < j and every value pair s, t ∈ [k]. [sent-98, score-0.266]
38 The data graph induced by a set of marginals has one vertex per random variable X_i, and an undirected edge (i, j) if and only if at least one of the given pairwise marginal values involves the variables X_i and X_j. [sent-100, score-0.728]
39 Let M_1 and M_2 denote the sets of indices (i, s) and (i, j, s, t) of the given single and pairwise marginal values, and m = |M_1| + |M_2| the total number of marginal values. [sent-101, score-0.266]
40 We say that the given marginals µ are consistent if there exists a (joint) probability distribution consistent with all of them (i.e., one whose single and pairwise marginals agree with the given values). [sent-103, score-0.539]
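For concreteness, a minimal sketch (ours, not from the paper) of how the induced data graph can be read off the index sets M_1 and M_2 defined above; the dictionary encoding of the marginal values is an assumption made for the example.

```python
from typing import Dict, Set, Tuple

def induced_data_graph(M1: Dict[Tuple[int, int], float],
                       M2: Dict[Tuple[int, int, int, int], float],
                       n: int) -> Set[Tuple[int, int]]:
    """Return the edge set of the data graph on vertices {0, ..., n-1}.

    M1 maps (i, s) to a given single marginal mu_is; M2 maps (i, j, s, t),
    with i < j, to a given pairwise marginal mu_ijst.  An edge (i, j) is
    present iff at least one pairwise value for the pair (i, j) is given.
    """
    edges = {(i, j) for (i, j, s, t) in M2}
    assert all(0 <= i < j < n for (i, j) in edges), "pairs must satisfy i < j"
    return edges

# e.g. induced_data_graph({}, {(0, 1, 0, 1): 0.5}, n=3) == {(0, 1)}
```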
41 With these basic definitions, we can now give formal definitions for the marginals problems we consider. [sent-106, score-0.37]
42 • CONSISTENCY (G): Given marginals µ such that the induced data graph falls in G, are they consistent? [sent-108, score-0.502]
43 • CLOSEST CONSISTENCY (G): Given (possibly inconsistent) marginals µ such that the induced data graph falls in G, compute the consistent marginals ν minimizing ||ν − µ||_1. [sent-109, score-0.936]
44 • SMALL SUPPORT (G): Given (consistent or inconsistent) marginals µ such that the induced data graph falls in G, compute a distribution that has a polynomial-size support and marginals ν that minimize ||ν − µ||_1. [sent-110, score-0.853]
45 • MAX ENTROPY (G): Given (consistent or inconsistent) marginals µ such that the induced data graph falls in G, compute the maximum entropy distribution that has marginals ν that minimize ||ν − µ||_1. [sent-111, score-0.897]
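All four problems are stated over the exponentially large set A of joint assignments. As a toy illustration (ours), CONSISTENCY can be checked directly via the explicit linear feasibility problem over all joint assignments, which is exactly the exponential-size formulation the paper's reductions are designed to avoid; the SciPy-based encoding below is an assumption of the example.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def is_consistent(M1, M2, n, k):
    """Brute-force CONSISTENCY check via the explicit LP (toy sizes only).

    One variable p_a per joint assignment a in [k]^n, one equality constraint
    per given marginal value, plus normalization and non-negativity.
    """
    A = list(itertools.product(range(k), repeat=n))
    rows, rhs = [], []
    for (i, s), mu in M1.items():
        rows.append([1.0 if a[i] == s else 0.0 for a in A]); rhs.append(mu)
    for (i, j, s, t), mu in M2.items():
        rows.append([1.0 if (a[i], a[j]) == (s, t) else 0.0 for a in A]); rhs.append(mu)
    rows.append([1.0] * len(A)); rhs.append(1.0)          # distribution sums to 1
    res = linprog(c=np.zeros(len(A)), A_eq=np.array(rows), b_eq=np.array(rhs),
                  bounds=[(0.0, None)] * len(A), method="highs")
    return res.success

# Footnote 1's example: each pairwise marginal forces the two variables to
# differ, which is impossible for three binary variables, so this prints False.
M2 = {(0, 1, 0, 1): 0.5, (0, 1, 1, 0): 0.5,
      (1, 2, 0, 1): 0.5, (1, 2, 1, 0): 0.5,
      (0, 2, 0, 1): 0.5, (0, 2, 1, 0): 0.5}
print(is_consistent({}, M2, n=3, k=2))
```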
46 The first, which has been addressed in previous work, is to circumvent the exponential number of decision variables via a separation oracle. [sent-113, score-0.195]
47 (Footnote 5) All of our results generalize to the case of higher-order marginals in a straightforward manner. [sent-119, score-0.346]
48 (Footnote 4) It is important to emphasize that all of the problems above are “model-free”, in that we do not assume that the marginals are consistent with, or generated by, any particular model (such as a Markov network). [sent-120, score-0.475]
49 Our contribution is in showing a strong connection between tractability for these marginals problems and the following inference problems for any class G. [sent-124, score-0.511]
50 • MAP INFERENCE (G): Given a Markov network whose graph falls in G, find the maximum a posteriori (MAP) or most probable joint assignment. [sent-125, score-0.228]
51 • GENERALIZED PARTITION: Given a Markov network whose graph falls in G, compute the partition function or normalization constant for the full joint distribution, possibly after conditioning on the value of a single vertex or edge. [sent-126, score-0.309]
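For tiny instances, both inference problems can be illustrated by exhaustive enumeration (a sketch of ours; the theorems require polynomial-time solvers for the graph class G, not brute force). The dictionary layout of the log-potentials is assumed for the example.

```python
import itertools
import numpy as np

def map_and_log_partition(n, k, log_phi_node, log_phi_edge, edges):
    """Exhaustive MAP assignment and log partition function for a tiny
    pairwise Markov network (illustration only; exponential in n).

    log_phi_node[i][s] and log_phi_edge[(i, j)][(s, t)] are log-potentials.
    """
    best_a, best_score, scores = None, -np.inf, []
    for a in itertools.product(range(k), repeat=n):
        score = sum(log_phi_node[i][a[i]] for i in range(n))
        score += sum(log_phi_edge[(i, j)][(a[i], a[j])] for (i, j) in edges)
        scores.append(score)
        if score > best_score:
            best_a, best_score = a, score
    return best_a, best_score, np.logaddexp.reduce(scores)
```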
52 There are a number of algorithms that solve explicitly described linear programs in time polynomial in the description size. [sent-129, score-0.237]
53 To address this challenge, we turn to the ellipsoid method [9], which can solve in polynomial time linear programs that have an exponential number of implicitly described constraints, provided there is a polynomial-time “separation oracle” for these constraints. [sent-131, score-0.459]
54 The ellipsoid method is discussed exhaustively in [10, 11]; we record in this section the facts necessary for our results. [sent-132, score-0.243]
55 A separation oracle for P is an algorithm that takes as input a vector x ∈ R^n, and either (i) verifies that x ∈ P; or (ii) returns a constraint i such that a_i^T x > b_i. [sent-138, score-0.262]
56 A polynomial-time separation oracle runs in time polynomial in n, the maximum description length of a single constraint, and the description length of the input x. [sent-139, score-0.47]
57 One obvious separation oracle is to simply check, given a candidate solution x, each of the m constraints in turn. [sent-140, score-0.252]
58 More interesting and relevant are constraint sets that have size exponential in the dimension n but admit a polynomial-time separation oracle. [sent-141, score-0.209]
59 (Theorem 2.2) Suppose P = {x ∈ R^n : a_1^T x ≤ b_1, . . . , a_m^T x ≤ b_m} admits a polynomial-time separation oracle and c^T x is a linear objective function. [sent-146, score-0.292]
60 Then, the ellipsoid method solves the optimization problem {max c^T x : x ∈ P} in time polynomial in n and the maximum description length of a single constraint or objective function. [sent-147, score-0.498]
61 Moreover, if P is non-empty and bounded, the ellipsoid method returns a vertex of P. [sent-149, score-0.333]
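A hedged sketch of the central-cut ellipsoid method in its feasibility form, to make the role of the separation oracle concrete; the radius bound, tolerance, and iteration cap below are placeholders, and a rigorous implementation needs the bit-complexity bookkeeping of [10, 11].

```python
import numpy as np

def ellipsoid_feasible_point(separation_oracle, n, R=1e3, tol=1e-9, max_iter=100000):
    """Central-cut ellipsoid method, feasibility version (sketch, n >= 2).

    separation_oracle(x) returns None if x lies in P, otherwise the vector `a`
    of a violated constraint a^T x > b (so all of P satisfies a^T y <= b).
    P is assumed to be contained in the ball of radius R around the origin.
    """
    x = np.zeros(n)                      # ellipsoid center
    Q = (R ** 2) * np.eye(n)             # ellipsoid shape matrix
    for _ in range(max_iter):
        a = separation_oracle(x)
        if a is None:
            return x                     # feasible point found
        Qa = Q @ a
        gamma = np.sqrt(a @ Qa)
        if gamma < tol:                  # ellipsoid has become degenerate
            return None
        g = Qa / gamma
        x = x - g / (n + 1)              # shift the center away from the cut
        Q = (n ** 2 / (n ** 2 - 1.0)) * (Q - (2.0 / (n + 1)) * np.outer(g, g))
    return None                          # declared infeasible (up to tolerance)
```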
62 Theorem 2.2 provides a general reduction from a problem to an intuitively easier one: if the problem of verifying membership in P can be solved in polynomial time, then the problem of optimizing an arbitrary linear function over P can also be solved in polynomial time. [sent-151, score-0.473]
63 This reduction is “many-to-one,” meaning that the ellipsoid method invokes the separation oracle for P a large (but polynomial) number of times, each with a different candidate point x. [sent-152, score-0.459]
64 1 for a high-level description of the ellipsoid method and [10, 11] for a detailed treatment. [sent-154, score-0.287]
65 The ellipsoid method also applies to convex programming problems under some additional technical conditions. [sent-155, score-0.33]
66 (Footnote 6) Formally, the input is a graph G = (V, E) with a log-potential log φ_i(s) and log φ_{ij}(s, t) for each vertex i ∈ V and edge (i, j) ∈ E, and each value s ∈ [k] = {0, 1, 2, . . . , k − 1}. [sent-159, score-0.179]
67 If the MAP INFERENCE (G) problem can be solved in polynomial time, then the CONSISTENCY (G) problem can be solved in polynomial time. [sent-169, score-0.408]
68 Solving (P) using the ellipsoid method (Theorem 2. [sent-173, score-0.243]
69 With an eye toward applying the ellipsoid method (Theorem 2. [sent-178, score-0.243]
70 Given a vector y indexed by M_1 ∪ M_2, we define y(a) = Σ_{(i,s)∈M_1 : a_i=s} y_{is} + Σ_{(i,j,s,t)∈M_2 : a_i=s, a_j=t} y_{ijst} (1) for each assignment a ∈ A, and µ^T y = Σ_{(i,s)∈M_1} µ_{is} y_{is} + Σ_{(i,j,s,t)∈M_2} µ_{ijst} y_{ijst}. [sent-181, score-0.459]
71 Lemma 3.3 (Dual Linear Programming Formulation) An instance of the CONSISTENCY problem admits a consistent distribution if and only if the optimal value of the following linear program (D) is 0: (D) max_{y,z} µ^T y + z subject to y(a) + z ≤ 0 for all a ∈ A; y, z unrestricted. [sent-184, score-0.282]
72 The number of variables in (D) — one per constraint of the primal linear program — is polynomial in the size of the CONSISTENCY input. [sent-185, score-0.346]
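For orientation, a plausible explicit form of the primal (P) implied by the definitions above, with one equality constraint per given marginal value plus normalization and non-negativity, matching the constraint counts cited later in the proof; this rendering is ours, not quoted from the paper.

```latex
\text{(P)}\quad \text{find } p \ge 0 \text{ over assignments } a \in A \text{ such that}\quad
\sum_{a \in A :\; a_i = s} p_a = \mu_{is} \;\; \forall (i,s) \in M_1, \qquad
\sum_{a \in A :\; a_i = s,\, a_j = t} p_a = \mu_{ijst} \;\; \forall (i,j,s,t) \in M_2, \qquad
\sum_{a \in A} p_a = 1 .
```

Read this way, (D) is the corresponding Farkas/LP dual: one variable y_r per marginal constraint, z for normalization, and the constraint y(a) + z ≤ 0 is the column of (P) indexed by the assignment a.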
73 Lemma 3.4 (MAP Inference as a Separation Oracle) Let G be a set of graphs and suppose that the MAP INFERENCE (G) problem can be solved in polynomial time. [sent-189, score-0.254]
74 Consider an instance of the CONSISTENCY problem with a data graph in G, and a candidate solution y, z to the corresponding dual linear program (D). [sent-190, score-0.285]
75 Then, there is a polynomial-time algorithm that checks whether or not there is an assignment a ∈ A that satisfies Σ_{(i,s)∈M_1 : a_i=s} y_{is} + Σ_{(i,j,s,t)∈M_2 : a_i=s, a_j=t} y_{ijst} > −z, (3) and produces such an assignment if one exists. [sent-191, score-0.396]
76 The vertex set V and edge set E correspond to the random variables and edge set of the data graph of the CONSISTENCY instance. [sent-194, score-0.227]
77 The underlying graph of N is the same as the data graph of the given CONSISTENCY instance and hence is a member of G. [sent-198, score-0.188]
78 That is, the MAP assignment for the Markov network N is the assignment that maximizes the left-hand side of (3). [sent-200, score-0.182]
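A brute-force stand-in (ours) for the separation oracle of Lemma 3.4: it maximizes y(a) over assignments and compares the maximum with −z, which is exactly what the MAP call on the fictitious network N accomplishes in polynomial time for graphs in G.

```python
import itertools

def violated_dual_constraint(y1, y2, z, n, k):
    """Separation oracle for (D), sketched by exhaustive search.

    y1 maps (i, s) to y_is and y2 maps (i, j, s, t) to y_ijst; a real
    implementation would instead call a MAP solver for the class G on the
    Markov network N whose log-potentials are these dual values.
    Returns an assignment a with y(a) > -z (a violated constraint), or None.
    """
    best_a, best_val = None, float("-inf")
    for a in itertools.product(range(k), repeat=n):
        val = sum(v for (i, s), v in y1.items() if a[i] == s)
        val += sum(v for (i, j, s, t), v in y2.items() if a[i] == s and a[j] == t)
        if val > best_val:
            best_a, best_val = a, val
    return best_a if best_val > -z else None
```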
79 Deciding whether or not this instance has a consistent distribution is equivalent to solving the program (D) in Lemma 3.3. [sent-206, score-0.214]
80 By Theorem 2.2, the ellipsoid method can be used to solve (D) in polynomial time, provided the constraint set admits a polynomial-time separation oracle. [sent-209, score-0.577]
81 Lemma 3.4 shows that the relevant separation oracle is equivalent to computing the MAP assignment of a Markov network with graph G ∈ G. [sent-211, score-0.401]
82 By assumption, the latter problem can be solved in polynomial time. [sent-212, score-0.204]
83 For instances that admit a consistent distribution, we can also ask for a succinct representation of a distribution that witnesses the marginals’ consistency. [sent-214, score-0.177]
84 We strengthen Theorem 3.1 by showing that for consistent instances, under the same hypothesis, we can compute a small-support consistent distribution in polynomial time. [sent-216, score-0.358]
85 If the MAP INFERENCE (G) problem can be solved in polynomial time, then for every consistent instance of the CONSISTENCY (G) problem with m = |M_1| + |M_2| marginal values, a consistent distribution with support size at most m + 1 can be computed in polynomial time. [sent-220, score-0.736]
86 Proof: Consider a consistent instance of CONSISTENCY with data graph G ∈ G. [sent-221, score-0.213]
87 Solve the dual linear program (D) using the ellipsoid method (Theorem 2.2), using the given polynomial-time algorithm for MAP INFERENCE (G) to implement the ellipsoid separation oracle (see Lemma 3.4). [sent-231, score-0.459]
88 Explicitly form the reduced primal linear program (P-red), obtained from (P) by retaining only the variables that correspond to the dual inequalities generated by the separation oracle in Step 1. [sent-236, score-0.491]
89 Figure 2: High-level description of the polynomial-time reduction from CONSISTENCY (G) to MAP INFERENCE (G) (Steps 1 and 2) and postprocessing to extract a small-support distribution that witnesses consistent marginals (Steps 3 and 4). [sent-239, score-0.521]
90 In particular, this reduced primal linear program is feasible. [sent-241, score-0.179]
91 The reduced primal linear program has a polynomial number of variables and constraints, so it can be solved by the ellipsoid method (or any other polynomial-time method) to obtain a feasible point p. [sent-242, score-0.63]
92 Recall from Theorem 2.2 that p is a vertex of the feasible region, meaning it satisfies K linearly independent constraints of the reduced primal linear program with equality. [sent-245, score-0.287]
93 This linear program has at most one constraint for each of the m given marginal values, at most one normalization constraint Σ_{a∈A} p_a = 1, and non-negativity constraints. [sent-246, score-0.299]
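A sketch (ours) of Steps 3 and 4 of Figure 2: the reduced primal (P-red) keeps one variable per assignment generated by the separation oracle during the ellipsoid run on (D), and a basic (vertex) solution of this small LP puts positive mass on at most m + 1 of them. Requesting SciPy's dual-simplex backend so that a basic solution is returned is an assumption of the example.

```python
import numpy as np
from scipy.optimize import linprog

def small_support_distribution(columns, M1, M2):
    """Solve the reduced primal (P-red) and read off a sparse witness.

    columns: the assignments produced by the separation oracle; M1 / M2: the
    given single and pairwise marginal values, encoded as in the CONSISTENCY
    sketch above.  A basic solution has at most m + 1 positive entries.
    """
    rows, rhs = [], []
    for (i, s), mu in M1.items():
        rows.append([1.0 if a[i] == s else 0.0 for a in columns]); rhs.append(mu)
    for (i, j, s, t), mu in M2.items():
        rows.append([1.0 if (a[i], a[j]) == (s, t) else 0.0 for a in columns]); rhs.append(mu)
    rows.append([1.0] * len(columns)); rhs.append(1.0)       # normalization
    res = linprog(c=np.zeros(len(columns)), A_eq=np.array(rows), b_eq=np.array(rhs),
                  bounds=[(0.0, None)] * len(columns), method="highs-ds")
    return {a: p for a, p in zip(columns, res.x) if p > 1e-9}
```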
94 The input to these problems is the same as in the CONSISTENCY problem — single marginal values µ_{is} for (i, s) ∈ M_1 and pairwise marginal values µ_{ijst} for (i, j, s, t) ∈ M_2. [sent-250, score-0.324]
95 The goal is to compute sets of marginals {ν_{is}}_{M_1} and {ν_{ijst}}_{M_2} that are consistent and, subject to this constraint, minimize the ℓ_1 norm ||µ − ν||_1 with respect to the given marginals. [sent-251, score-0.434]
96 An algorithm for the CLOSEST CONSISTENCY problem solves the CONSISTENCY problem as a special case, since a given set of marginals is consistent if and only if the corresponding CLOSEST CONSISTENCY problem has optimal objective function value 0. [sent-252, score-0.485]
97 Despite this greater generality, the CLOSEST CONSISTENCY problem also reduces in polynomial time to the MAP INFERENCE problem, as does the still more general SMALL SUPPORT problem. [sent-253, score-0.191]
98 If the MAP INFERENCE (G) problem can be solved in polynomial time, then the CLOSEST CONSISTENCY (G) problem can be solved in polynomial time. [sent-256, score-0.408]
99 Moreover, a distribution consistent with the optimal marginals with support size at most 3m + 1 can be computed in polynomial time, where m = |M_1| + |M_2| denotes the number of marginal values. [sent-257, score-0.711]
100 2, except with the given marginals µ replaced by the computed consistent marginals ν — but a nonlinear objective function ||µ − ν||1 . [sent-259, score-0.763]
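One standard way (our rendering, not quoted from the paper) to express the ℓ_1 objective ||µ − ν||_1 in linear-programming form is with slack variables e_r:

```latex
\min_{\nu,\, e,\, p}\ \sum_{r \in M_1 \cup M_2} e_r
\quad \text{subject to} \quad
-e_r \le \nu_r - \mu_r \le e_r \;\; \forall r \in M_1 \cup M_2, \qquad
\nu \text{ realized by the distribution } p, \quad p \ge 0, \quad \sum_{a \in A} p_a = 1 .
```

Under this linearization the LP has the same column structure as (P), so the same ellipsoid-plus-MAP-oracle machinery applies; a vertex of the enlarged program is tight on roughly 3m + 1 constraints, which is one plausible reading of the 3m + 1 support bound stated above.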
wordName wordTfidf (topN-words)
[('onsistency', 0.634), ('marginals', 0.329), ('nference', 0.282), ('ellipsoid', 0.243), ('losest', 0.211), ('polynomial', 0.148), ('mall', 0.141), ('reductions', 0.129), ('separation', 0.129), ('upport', 0.123), ('map', 0.109), ('marginal', 0.107), ('eneralized', 0.106), ('ntropy', 0.106), ('consistent', 0.105), ('ijst', 0.088), ('oracle', 0.087), ('program', 0.081), ('graph', 0.08), ('assignment', 0.077), ('vertex', 0.072), ('yijst', 0.07), ('yis', 0.07), ('dual', 0.057), ('inference', 0.054), ('markov', 0.053), ('ax', 0.053), ('falls', 0.053), ('pairwise', 0.052), ('ai', 0.051), ('graphs', 0.05), ('joint', 0.049), ('entropy', 0.048), ('lp', 0.047), ('primal', 0.046), ('programming', 0.046), ('inconsistent', 0.046), ('description', 0.044), ('witnesses', 0.043), ('problems', 0.041), ('induced', 0.04), ('solved', 0.039), ('lemma', 0.037), ('constraints', 0.036), ('intractability', 0.034), ('algorithmic', 0.034), ('graphical', 0.033), ('pa', 0.033), ('progress', 0.032), ('reduced', 0.03), ('admit', 0.029), ('admits', 0.029), ('tractability', 0.029), ('constraint', 0.028), ('network', 0.028), ('instance', 0.028), ('edge', 0.027), ('subroutines', 0.027), ('partition', 0.027), ('membership', 0.026), ('perhaps', 0.026), ('reduces', 0.026), ('tim', 0.026), ('ips', 0.026), ('regarding', 0.025), ('formulation', 0.025), ('bm', 0.025), ('coin', 0.025), ('consistency', 0.023), ('closest', 0.023), ('programs', 0.023), ('exponential', 0.023), ('support', 0.022), ('linear', 0.022), ('appendix', 0.022), ('theorem', 0.022), ('decision', 0.022), ('variables', 0.021), ('johnson', 0.021), ('chandrasekaran', 0.02), ('informally', 0.02), ('distributions', 0.019), ('classes', 0.019), ('duality', 0.019), ('polytope', 0.019), ('ij', 0.018), ('retaining', 0.018), ('returns', 0.018), ('maximum', 0.018), ('arrows', 0.018), ('aj', 0.018), ('trees', 0.018), ('wish', 0.018), ('aware', 0.017), ('inferences', 0.017), ('probabilistic', 0.017), ('supports', 0.017), ('straightforward', 0.017), ('class', 0.017), ('problem', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 184 nips-2013-Marginals-to-Models Reducibility
Author: Tim Roughgarden, Michael Kearns
Abstract: We consider a number of classical and new computational problems regarding marginal distributions, and inference in models specifying a full joint distribution. We prove general and efficient reductions between a number of these problems, which demonstrate that algorithmic progress in inference automatically yields progress for “pure data” problems. Our main technique involves formulating the problems as linear programs, and proving that the dual separation oracle required by the ellipsoid method is provided by the target problem. This technique may be of independent interest in probabilistic inference. 1
2 0.092685491 352 nips-2013-What do row and column marginals reveal about your dataset?
Author: Behzad Golshan, John Byers, Evimaria Terzi
Abstract: Numerous datasets ranging from group memberships within social networks to purchase histories on e-commerce sites are represented by binary matrices. While this data is often either proprietary or sensitive, aggregated data, notably row and column marginals, is often viewed as much less sensitive, and may be furnished for analysis. Here, we investigate how these data can be exploited to make inferences about the underlying matrix H. Instead of assuming a generative model for H, we view the input marginals as constraints on the dataspace of possible realizations of H and compute the probability density function of particular entries H(i, j) of interest. We do this for all the cells of H simultaneously, without generating realizations, but rather via implicitly sampling the datasets that satisfy the input marginals. The end result is an efficient algorithm with asymptotic running time the same as that required by standard sampling techniques to generate a single dataset from the same dataspace. Our experimental evaluation demonstrates the efficiency and the efficacy of our framework in multiple settings. 1
3 0.084649198 218 nips-2013-On Sampling from the Gibbs Distribution with Random Maximum A-Posteriori Perturbations
Author: Tamir Hazan, Subhransu Maji, Tommi Jaakkola
Abstract: In this paper we describe how MAP inference can be used to sample efficiently from Gibbs distributions. Specifically, we provide means for drawing either approximate or unbiased samples from Gibbs’ distributions by introducing low dimensional perturbations and solving the corresponding MAP assignments. Our approach also leads to new ways to derive lower bounds on partition functions. We demonstrate empirically that our method excels in the typical “high signal high coupling” regime. The setting results in ragged energy landscapes that are challenging for alternative approaches to sampling and/or lower bounds. 1
4 0.080973014 107 nips-2013-Embed and Project: Discrete Sampling with Universal Hashing
Author: Stefano Ermon, Carla P. Gomes, Ashish Sabharwal, Bart Selman
Abstract: We consider the problem of sampling from a probability distribution defined over a high-dimensional discrete set, specified for instance by a graphical model. We propose a sampling algorithm, called PAWS, based on embedding the set into a higher-dimensional space which is then randomly projected using universal hash functions to a lower-dimensional subspace and explored using combinatorial search methods. Our scheme can leverage fast combinatorial optimization tools as a blackbox and, unlike MCMC methods, samples produced are guaranteed to be within an (arbitrarily small) constant factor of the true probability distribution. We demonstrate that by using state-of-the-art combinatorial search tools, PAWS can efficiently sample from Ising grids with strong interactions and from software verification instances, while MCMC and variational methods fail in both cases. 1
5 0.076255955 258 nips-2013-Projecting Ising Model Parameters for Fast Mixing
Author: Justin Domke, Xianghang Liu
Abstract: Inference in general Ising models is difficult, due to high treewidth making treebased algorithms intractable. Moreover, when interactions are strong, Gibbs sampling may take exponential time to converge to the stationary distribution. We present an algorithm to project Ising model parameters onto a parameter set that is guaranteed to be fast mixing, under several divergences. We find that Gibbs sampling using the projected parameters is more accurate than with the original parameters when interaction strengths are strong and when limited time is available for sampling.
6 0.071967758 33 nips-2013-An Approximate, Efficient LP Solver for LP Rounding
7 0.068312712 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
8 0.061535154 80 nips-2013-Data-driven Distributionally Robust Polynomial Optimization
9 0.060375504 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
10 0.058547303 192 nips-2013-Minimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation
11 0.05615196 318 nips-2013-Structured Learning via Logistic Regression
12 0.049618665 149 nips-2013-Latent Structured Active Learning
13 0.047106426 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction
14 0.046865519 8 nips-2013-A Graphical Transformation for Belief Propagation: Maximum Weight Matchings and Odd-Sized Cycles
15 0.04648222 151 nips-2013-Learning Chordal Markov Networks by Constraint Satisfaction
16 0.045539226 268 nips-2013-Reflection methods for user-friendly submodular optimization
17 0.044053033 132 nips-2013-Global MAP-Optimality by Shrinking the Combinatorial Search Area with Convex Relaxation
18 0.042376507 152 nips-2013-Learning Efficient Random Maximum A-Posteriori Predictors with Non-Decomposable Loss Functions
19 0.041404199 161 nips-2013-Learning Stochastic Inverses
20 0.039079405 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances
topicId topicWeight
[(0, 0.125), (1, 0.023), (2, 0.012), (3, 0.022), (4, 0.029), (5, 0.069), (6, 0.014), (7, -0.05), (8, 0.01), (9, 0.015), (10, 0.042), (11, -0.021), (12, 0.07), (13, -0.032), (14, -0.002), (15, 0.002), (16, -0.036), (17, -0.011), (18, -0.028), (19, 0.052), (20, -0.054), (21, 0.007), (22, 0.03), (23, -0.008), (24, -0.026), (25, 0.014), (26, 0.02), (27, 0.002), (28, 0.007), (29, 0.098), (30, 0.004), (31, -0.059), (32, 0.025), (33, 0.107), (34, 0.063), (35, 0.036), (36, -0.074), (37, 0.032), (38, -0.012), (39, 0.03), (40, -0.065), (41, 0.018), (42, -0.119), (43, -0.048), (44, 0.048), (45, 0.082), (46, -0.024), (47, -0.0), (48, 0.029), (49, 0.009)]
simIndex simValue paperId paperTitle
same-paper 1 0.92287284 184 nips-2013-Marginals-to-Models Reducibility
Author: Tim Roughgarden, Michael Kearns
Abstract: We consider a number of classical and new computational problems regarding marginal distributions, and inference in models specifying a full joint distribution. We prove general and efficient reductions between a number of these problems, which demonstrate that algorithmic progress in inference automatically yields progress for “pure data” problems. Our main technique involves formulating the problems as linear programs, and proving that the dual separation oracle required by the ellipsoid method is provided by the target problem. This technique may be of independent interest in probabilistic inference. 1
2 0.7972123 8 nips-2013-A Graphical Transformation for Belief Propagation: Maximum Weight Matchings and Odd-Sized Cycles
Author: Jinwoo Shin, Andrew E. Gelfand, Misha Chertkov
Abstract: Max-product ‘belief propagation’ (BP) is a popular distributed heuristic for finding the Maximum A Posteriori (MAP) assignment in a joint probability distribution represented by a Graphical Model (GM). It was recently shown that BP converges to the correct MAP assignment for a class of loopy GMs with the following common feature: the Linear Programming (LP) relaxation to the MAP problem is tight (has no integrality gap). Unfortunately, tightness of the LP relaxation does not, in general, guarantee convergence and correctness of the BP algorithm. The failure of BP in such cases motivates reverse engineering a solution – namely, given a tight LP, can we design a ‘good’ BP algorithm. In this paper, we design a BP algorithm for the Maximum Weight Matching (MWM) problem over general graphs. We prove that the algorithm converges to the correct optimum if the respective LP relaxation, which may include inequalities associated with non-intersecting odd-sized cycles, is tight. The most significant part of our approach is the introduction of a novel graph transformation designed to force convergence of BP. Our theoretical result suggests an efficient BP-based heuristic for the MWM problem, which consists of making sequential, “cutting plane”, modifications to the underlying GM. Our experiments show that this heuristic performs as well as traditional cutting-plane algorithms using LP solvers on MWM problems. 1
3 0.68451464 132 nips-2013-Global MAP-Optimality by Shrinking the Combinatorial Search Area with Convex Relaxation
Author: Bogdan Savchynskyy, Jörg Hendrik Kappes, Paul Swoboda, Christoph Schnörr
Abstract: We consider energy minimization for undirected graphical models, also known as the MAP-inference problem for Markov random fields. Although combinatorial methods, which return a provably optimal integral solution of the problem, made a significant progress in the past decade, they are still typically unable to cope with large-scale datasets. On the other hand, large scale datasets are often defined on sparse graphs and convex relaxation methods, such as linear programming relaxations then provide good approximations to integral solutions. We propose a novel method of combining combinatorial and convex programming techniques to obtain a global solution of the initial combinatorial problem. Based on the information obtained from the solution of the convex relaxation, our method confines application of the combinatorial solver to a small fraction of the initial graphical model, which allows to optimally solve much larger problems. We demonstrate the efficacy of our approach on a computer vision energy minimization benchmark. 1
4 0.67724258 33 nips-2013-An Approximate, Efficient LP Solver for LP Rounding
Author: Srikrishna Sridhar, Stephen Wright, Christopher Re, Ji Liu, Victor Bittorf, Ce Zhang
Abstract: Many problems in machine learning can be solved by rounding the solution of an appropriate linear program (LP). This paper shows that we can recover solutions of comparable quality by rounding an approximate LP solution instead of the exact one. These approximate LP solutions can be computed efficiently by applying a parallel stochastic-coordinate-descent method to a quadratic-penalty formulation of the LP. We derive worst-case runtime and solution quality guarantees of this scheme using novel perturbation and convergence analysis. Our experiments demonstrate that on such combinatorial problems as vertex cover, independent set and multiway-cut, our approximate rounding scheme is up to an order of magnitude faster than Cplex (a commercial LP solver) while producing solutions of similar quality. 1
5 0.64287388 189 nips-2013-Message Passing Inference with Chemical Reaction Networks
Author: Nils E. Napp, Ryan P. Adams
Abstract: Recent work on molecular programming has explored new possibilities for computational abstractions with biomolecules, including logic gates, neural networks, and linear systems. In the future such abstractions might enable nanoscale devices that can sense and control the world at a molecular scale. Just as in macroscale robotics, it is critical that such devices can learn about their environment and reason under uncertainty. At this small scale, systems are typically modeled as chemical reaction networks. In this work, we develop a procedure that can take arbitrary probabilistic graphical models, represented as factor graphs over discrete random variables, and compile them into chemical reaction networks that implement inference. In particular, we show that marginalization based on sum-product message passing can be implemented in terms of reactions between chemical species whose concentrations represent probabilities. We show algebraically that the steady state concentration of these species correspond to the marginal distributions of the random variables in the graph and validate the results in simulations. As with standard sum-product inference, this procedure yields exact results for tree-structured graphs, and approximate solutions for loopy graphs.
6 0.6103363 168 nips-2013-Learning to Pass Expectation Propagation Messages
7 0.60280216 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances
8 0.57070208 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
10 0.55695379 101 nips-2013-EDML for Learning Parameters in Directed and Undirected Graphical Models
11 0.55421758 134 nips-2013-Graphical Models for Inference with Missing Data
12 0.54090852 107 nips-2013-Embed and Project: Discrete Sampling with Universal Hashing
13 0.53125322 318 nips-2013-Structured Learning via Logistic Regression
14 0.49027592 80 nips-2013-Data-driven Distributionally Robust Polynomial Optimization
15 0.48991522 218 nips-2013-On Sampling from the Gibbs Distribution with Random Maximum A-Posteriori Perturbations
16 0.48948106 151 nips-2013-Learning Chordal Markov Networks by Constraint Satisfaction
17 0.48515058 152 nips-2013-Learning Efficient Random Maximum A-Posteriori Predictors with Non-Decomposable Loss Functions
18 0.47240421 258 nips-2013-Projecting Ising Model Parameters for Fast Mixing
19 0.45269868 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
20 0.45039615 328 nips-2013-The Total Variation on Hypergraphs - Learning on Hypergraphs Revisited
topicId topicWeight
[(16, 0.021), (33, 0.11), (34, 0.094), (41, 0.076), (49, 0.028), (56, 0.124), (70, 0.045), (81, 0.235), (85, 0.041), (89, 0.032), (93, 0.035), (95, 0.066)]
simIndex simValue paperId paperTitle
1 0.77706057 148 nips-2013-Latent Maximum Margin Clustering
Author: Guang-Tong Zhou, Tian Lan, Arash Vahdat, Greg Mori
Abstract: We present a maximum margin framework that clusters data using latent variables. Using latent representations enables our framework to model unobserved information embedded in the data. We implement our idea by large margin learning, and develop an alternating descent algorithm to effectively solve the resultant non-convex optimization problem. We instantiate our latent maximum margin clustering framework with tag-based video clustering tasks, where each video is represented by a latent tag model describing the presence or absence of video tags. Experimental results obtained on three standard datasets show that the proposed method outperforms non-latent maximum margin clustering as well as conventional clustering approaches. 1
same-paper 2 0.77052796 184 nips-2013-Marginals-to-Models Reducibility
Author: Tim Roughgarden, Michael Kearns
Abstract: We consider a number of classical and new computational problems regarding marginal distributions, and inference in models specifying a full joint distribution. We prove general and efficient reductions between a number of these problems, which demonstrate that algorithmic progress in inference automatically yields progress for “pure data” problems. Our main technique involves formulating the problems as linear programs, and proving that the dual separation oracle required by the ellipsoid method is provided by the target problem. This technique may be of independent interest in probabilistic inference. 1
3 0.66313034 268 nips-2013-Reflection methods for user-friendly submodular optimization
Author: Stefanie Jegelka, Francis Bach, Suvrit Sra
Abstract: Recently, it has become evident that submodularity naturally captures widely occurring concepts in machine learning, signal processing and computer vision. Consequently, there is need for efficient optimization procedures for submodular functions, especially for minimization problems. While general submodular minimization is challenging, we propose a new method that exploits existing decomposability of submodular functions. In contrast to previous approaches, our method is neither approximate, nor impractical, nor does it need any cumbersome parameter tuning. Moreover, it is easy to implement and parallelize. A key component of our method is a formulation of the discrete submodular minimization problem as a continuous best approximation problem that is solved through a sequence of reflections, and its solution can be easily thresholded to obtain an optimal discrete solution. This method solves both the continuous and discrete formulations of the problem, and therefore has applications in learning, inference, and reconstruction. In our experiments, we illustrate the benefits of our method on two image segmentation tasks. 1
4 0.66176659 62 nips-2013-Causal Inference on Time Series using Restricted Structural Equation Models
Author: Jonas Peters, Dominik Janzing, Bernhard Schölkopf
Abstract: Causal inference uses observational data to infer the causal structure of the data generating system. We study a class of restricted Structural Equation Models for time series that we call Time Series Models with Independent Noise (TiMINo). These models require independent residual time series, whereas traditional methods like Granger causality exploit the variance of residuals. This work contains two main contributions: (1) Theoretical: By restricting the model class (e.g. to additive noise) we provide general identifiability results. They cover lagged and instantaneous effects that can be nonlinear and unfaithful, and non-instantaneous feedbacks between the time series. (2) Practical: If there are no feedback loops between time series, we propose an algorithm based on non-linear independence tests of time series. We show empirically that when the data are causally insufficient or the model is misspecified, the method avoids incorrect answers. We extend the theoretical and the algorithmic part to situations in which the time series have been measured with different time delays. TiMINo is applied to artificial and real data and code is provided. 1
5 0.65995276 180 nips-2013-Low-rank matrix reconstruction and clustering via approximate message passing
Author: Ryosuke Matsushita, Toshiyuki Tanaka
Abstract: We study the problem of reconstructing low-rank matrices from their noisy observations. We formulate the problem in the Bayesian framework, which allows us to exploit structural properties of matrices in addition to low-rankedness, such as sparsity. We propose an efficient approximate message passing algorithm, derived from the belief propagation algorithm, to perform the Bayesian inference for matrix reconstruction. We have also successfully applied the proposed algorithm to a clustering problem, by reformulating it as a low-rank matrix reconstruction problem with an additional structural property. Numerical experiments show that the proposed algorithm outperforms Lloyd’s K-means algorithm. 1
6 0.65919244 11 nips-2013-A New Convex Relaxation for Tensor Completion
7 0.65629566 102 nips-2013-Efficient Algorithm for Privately Releasing Smooth Queries
8 0.65263557 249 nips-2013-Polar Operators for Structured Sparse Estimation
9 0.65179622 271 nips-2013-Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima
10 0.65121108 77 nips-2013-Correlations strike back (again): the case of associative memory retrieval
11 0.64848018 154 nips-2013-Learning Gaussian Graphical Models with Observed or Latent FVSs
12 0.64661455 239 nips-2013-Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result
13 0.64588529 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
14 0.64435077 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
15 0.64402205 318 nips-2013-Structured Learning via Logistic Regression
16 0.6420266 319 nips-2013-Submodular Optimization with Submodular Cover and Submodular Knapsack Constraints
17 0.64159322 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
18 0.641047 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
19 0.64072603 228 nips-2013-Online Learning of Dynamic Parameters in Social Networks
20 0.64029199 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables