nips nips2012 nips2012-302 knowledge-graph by maker-knowledge-mining

302 nips-2012-Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization


Source: pdf

Author: Stephen Bach, Matthias Broecheler, Lise Getoor, Dianne O'leary

Abstract: Probabilistic graphical models are powerful tools for analyzing constrained, continuous domains. However, finding most-probable explanations (MPEs) in these models can be computationally expensive. In this paper, we improve the scalability of MPE inference in a class of graphical models with piecewise-linear and piecewise-quadratic dependencies and linear constraints over continuous domains. We derive algorithms based on a consensus-optimization framework and demonstrate their superior performance over state of the art. We show empirically that in a large-scale voter-preference modeling problem our algorithms scale linearly in the number of dependencies and constraints. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Probabilistic graphical models are powerful tools for analyzing constrained, continuous domains. [sent-10, score-0.066]

2 In this paper, we improve the scalability of MPE inference in a class of graphical models with piecewise-linear and piecewise-quadratic dependencies and linear constraints over continuous domains. [sent-12, score-0.205]

3 We show empirically that in a large-scale voter-preference modeling problem our algorithms scale linearly in the number of dependencies and constraints. [sent-14, score-0.085]

4 In this paper, we focus on scaling up most-probable-explanation (MPE) inference for a particular class of graphical models called constrained continuous Markov random fields (CCMRFs) [2]. [sent-18, score-0.136]

5 Like other Markov random fields (MRFs), CCMRFs define a joint distribution over a collection of random variables and capture local dependencies through potential functions. [sent-19, score-0.179]

6 1 MPE inference for CCMRFs is tractable under mild convexity assumptions because it can be cast as a convex numeric optimization problem, which can be solved by interior-point methods [3]. [sent-23, score-0.095]

7 1 We show how hinge-loss potential functions that are often used to model real world problems in CCMRFs (see, e. [sent-26, score-0.144]

8 , [3, 2, 4, 5, 6, 7]) can be exploited to significantly speed up the numeric optimization and therefore MPE inference. [sent-28, score-0.071]

9 To do so, we rely on a consensus optimization framework [8]. [sent-29, score-0.29]

10 Consensus optimization has recently been shown to perform well on relaxations of discrete optimization problems, like MRF MPE inference [8, 9, 10]. [sent-30, score-0.144]

11 The contributions of this paper are as follows: First, we derive algorithms for the MPE problem in CCMRFs with piecewise-linear and piecewise-quadratic dependencies in Section 3. [sent-31, score-0.078]

12 Next, we improve the performance of consensus optimization by deriving an algorithm that exploits opportunities for closed-form solutions to subproblems, based on the current optimization iterate, before resorting to an iterative solver when the closed-form solution is not applicable. [sent-32, score-0.391]

13 Then, we present an experimental evaluation (Section 4) that demonstrates superior performance of our approach over a commercial interior-point method, the current state-of-the-art for CCMRF MPE inference. [sent-33, score-0.069]

14 In a voter-preference modeling problem, our algorithms scaled linearly in the number of dependencies and constraints. [sent-34, score-0.085]

15 To the best of our knowledge, we are the first to show results on MPE inference for any MRF variant using consensus optimization with iterative methods to solve subproblems. [sent-38, score-0.368]

16 2 Background: In this section we formally introduce the class of probabilistic graphical models for which we derive inference algorithms and present a simple running example (this is the same example used in our experiments in Section 4). [sent-39, score-0.115]

17 We also give an overview of consensus optimization [8], the abstract framework we will use to derive our algorithms in Section 3. [sent-40, score-0.315]

18 2.1 Constrained continuous Markov random fields and the MPE problem: A constrained continuous Markov random field (CCMRF) is a probabilistic graphical model defined over continuous random variables with a constrained domain [2]. [sent-42, score-0.271]

19 In this paper, we focus on a common subclass in which dependencies among continuous random variables are defined in terms of hinge-loss functions and linear constraints: Definition 1. [sent-43, score-0.154]

20 A hinge-loss constrained continuous Markov random field f is a probability density over a finite set of n random variables X = {X1 , . [sent-44, score-0.119]

21 , φm } be a finite set of m continuous potential functions of the form φj(X) = [max{ℓj(X), 0}]^pj, where ℓj is a linear function of X and pj ∈ {1, 2}. [sent-51, score-0.192]

22 , Cr } be a finite set of r linear constraint functions associated with two index sets denoting equality and inequality constraints, E and I, which define the feasible set D̃ = {X ∈ D | Ck(X) = 0, ∀k ∈ E and Ck(X) ≥ 0, ∀k ∈ I}. [sent-55, score-0.091]

23 It says that hinge-loss CCMRFs are models in which densities of assignments to variables are defined by an exponential of the negated, weighted sum of functions over those assignments, unless any constraint is violated, in which case the density is zero. [sent-62, score-0.098]

24 In a hinge-loss CCMRF, the normalizing function Z(Λ) is constant over X for fixed parameters and the exponential is maximized by minimizing its negated argument, so the MPE problem is arg max_X f(X) ≡ arg min_X Σ_{j=1}^{m} Λj φj(X) s.t. Ck(X) = 0, ∀k ∈ E, and Ck(X) ≥ 0, ∀k ∈ I (Problem (1)). [sent-64, score-0.193]
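
As a rough, hedged illustration (not the authors' code; the function names, tolerance, and constraint encoding below are assumptions), the weighted hinge-loss objective being minimized in Problem (1), with infeasible assignments excluded, can be evaluated as:

import numpy as np

def mpe_objective(x, potentials, weights, eq_constraints, ineq_constraints):
    # Weighted hinge-loss objective of Problem (1); infinite outside the feasible set.
    # potentials: list of (ell, p) pairs, where ell(x) is a linear function and p in {1, 2}.
    if any(abs(C(x)) > 1e-8 for C in eq_constraints):    # Ck(x) = 0 for k in E
        return np.inf
    if any(C(x) < -1e-8 for C in ineq_constraints):      # Ck(x) >= 0 for k in I
        return np.inf
    # phi_j(x) = max(ell_j(x), 0) ** p_j, weighted by Lambda_j
    return sum(w * max(ell(x), 0.0) ** p for w, (ell, p) in zip(weights, potentials))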

25 Instances of hinge-loss CCMRFs have been used previously to model many problems, including link prediction, collective classification [3, 2], prediction of opinion diffusion [4], medical decision making [5], trust analysis in social networks [6], and group detection in social networks [7]. [sent-71, score-0.505]

26 Consider a social network S ≡ (V, E) of voters in a set V with relationships defined by annotated, unweighted, directed edges (va , vb )τ ∈ E. [sent-77, score-0.201]

27 Here, va , vb ∈ V and τ is an annotation denoting the type of relationship: friend, boss, etc. [sent-78, score-0.187]

28 To reason about voters' opinions toward two hypothetical political parties, liberal (L) and conservative (C), we introduce two nonnegative random variables Xa,L and Xa,C , summing to at most one, representing the strength of voter va's preferences for each political party. [sent-79, score-0.35]

29 We assume that va ’s preference results from an intrinsic opinion and the influence of va ’s social group. [sent-80, score-0.591]

30 We represent the intrinsic opinion by opinion(va ), ranging from −1 (strongly favoring L) to 1 (strongly favoring C). [sent-81, score-0.241]

31 The influence of the social group is modeled by potential functions that we generically denote as φ. [sent-82, score-0.243]

32 If opinion(va ) < 0, then φ ≡ [max{|opinion(va )| − Xa,L , 0}]p , which penalizes preferences that are weaker than intrinsic opinions. [sent-84, score-0.076]

33 These hinge-loss potential functions are weighted by a fixed parameter Λopinion . [sent-87, score-0.121]

34 Next we penalize disagreements between voters in a social group. [sent-88, score-0.177]

35 For each edge (va , vb )τ we introduce potential functions φ ≡ [max{Xb,L − Xa,L , 0}]p and φ ≡ [max{Xb,C − Xa,C , 0}]p , penalizing preferences of va that are not as strong as those of vb . [sent-89, score-0.392]

36 These potential functions are weighted by parameters Λτ defining the relative influence of the τ relationship. [sent-90, score-0.121]

37 We consider p = 1, meaning that the model has no preference between distributing the loss and accumulating it on a single potential function, and p = 2, meaning that the model prefers to distribute the loss among multiple hinge-loss functions. [sent-92, score-0.093]

38 To illustrate the choice, consider a single voter in a CCMRF with two equally-weighted potential functions φ1 ≡ [max{0. [sent-93, score-0.19]

39 We see that, all else being equal, squared potential functions “respect” the minima of individual potential functions if they cannot all be minimized. [sent-105, score-0.284]
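
A minimal numerical sketch of this p = 1 versus p = 2 behavior (the thresholds 0.8 and 0.2 below are hypothetical stand-ins, not the constants from the paper's example):

import numpy as np

xs = np.linspace(0.0, 1.0, 1001)

def total_loss(x, p):
    # Two conflicting, equally weighted hinge-loss potentials on one variable:
    # phi1 is zero only when x >= 0.8, phi2 only when x <= 0.2.
    phi1 = np.maximum(0.8 - x, 0.0) ** p
    phi2 = np.maximum(x - 0.2, 0.0) ** p
    return phi1 + phi2

# p = 1: the total loss is flat (0.6) over all of [0.2, 0.8], so the model is
# indifferent to how the loss is split between the two potentials.
# p = 2: the total loss has a unique minimum at x = 0.5, splitting it evenly.
for p in (1, 2):
    vals = total_loss(xs, p)
    print(p, float(xs[np.argmin(vals)]), float(vals.min()))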

40 As we demonstrate in Section 4, scaling MPE inference for CCMRFs with piecewise-quadratic potential functions is one of the contributions of our work. [sent-107, score-0.145]

41 2.2 Consensus optimization: Consensus optimization is a framework that optimizes an objective by dividing it into independent subproblems and then iterating to reach a consensus on the optimum [8]. [sent-109, score-0.475]

42 In this subsection we present an abstract consensus optimization algorithm for Problem (1), the MPE problem for CCMRFs. [sent-110, score-0.29]

43 In Section 3 we will derive specialized versions for different potential functions. [sent-111, score-0.118]

44 Given a CCMRF (X, φ, C, E, I, Λ) and parameter ρ > 0, the algorithm first constructs a modified MPE problem in which each potential and constraint is a function of different variables. [sent-112, score-0.13]

45 The variables are constrained to make the new and original MPE problems equivalent. [sent-113, score-0.102]

46 We let xj be a copy of the variables in X that are used in the potential function φj , j = 1, . [sent-114, score-0.22]

47 , m and xk+m be a copy of those used in the constraint function Ck , k = 1, . [sent-117, score-0.081]

48 We also introduce an indicator function Ik for each constraint function where Ik [Ck (xk+m )] = 0 if the constraint is satisfied and ∞ if it is not. [sent-121, score-0.074]

49 Consensus optimization solves the new MPE problem arg min_{xi ∈ [0,1]^{ni}} Σ_{j=1}^{m} Λj φj(xj) + Σ_{k=1}^{r} Ik[Ck(xk+m)] subject to xi = Xi (2). Algorithm (Consensus optimization). Input: CCMRF (X, φ, C, E, I, Λ), ρ > 0. Initialize xj as a copy of the variables in X that appear in φj , j = 1, . [sent-126, score-0.285]

50 , m Initialize xk+m as a copy of the variables in X that appear in Ck , k = 1, . [sent-129, score-0.077]

51 , m do: xj ← arg min_{xj ∈ [0,1]^{nj}} Λj φj(xj) + (ρ/2) ‖xj − Xj + (1/ρ)yj‖₂²; end for; for k = 1, . [sent-143, score-0.166]

52 , r do: xk+m ← arg min_{xk+m ∈ [0,1]^{nk+m}} Ik[Ck(xk+m)] + (ρ/2) ‖xk+m − Xk+m + (1/ρ)yk+m‖₂²; end for; set each variable in X to the average of its copies; end while, where i = 1, . [sent-146, score-0.066]

53 ADMM can be viewed as an approach to combining the scalability of dual decomposition and the convergence properties of augmented Lagrangian methods [8]. [sent-152, score-0.072]

54 It then averages the copies of variables to get the consensus variables X for the next iteration. [sent-155, score-0.31]

55 In the next section we derive algorithms with specific methods for updating each xj . [sent-161, score-0.075]
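
As a hedged sketch (not the authors' Java implementation), the abstract consensus-optimization loop above can be written as follows; solve_local stands in for the specialized xj-updates derived in Section 3, the fixed iteration count replaces a proper convergence test, and all names are assumptions:

import numpy as np

def consensus_optimization(n_vars, terms, rho=1.0, n_iters=500):
    # terms: list of (idx, solve_local) pairs, one per potential or constraint.
    #   idx: indices of the consensus variables that the term depends on.
    #   solve_local(d): returns argmin over x in [0,1]^k of
    #                   term(x) + (rho/2) * ||x - d||^2  (rho fixed by closure).
    X = np.full(n_vars, 0.5)                         # consensus variables
    x = [X[idx].copy() for idx, _ in terms]          # local copies x_j
    y = [np.zeros(len(idx)) for idx, _ in terms]     # Lagrange multipliers y_j
    for _ in range(n_iters):
        for j, (idx, solve_local) in enumerate(terms):
            y[j] = y[j] + rho * (x[j] - X[idx])      # dual update
            x[j] = solve_local(X[idx] - y[j] / rho)  # independent subproblem
        # Consensus step: each variable becomes the average of its copies.
        sums, counts = np.zeros(n_vars), np.zeros(n_vars)
        for j, (idx, _) in enumerate(terms):
            np.add.at(sums, idx, x[j])
            np.add.at(counts, idx, 1.0)
        X = np.where(counts > 0, sums / np.maximum(counts, 1.0), X)
    return X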

56 3 Solving the MPE problem with consensus optimization: We now derive algorithms to update xj for each potential function φj . [sent-162, score-0.493]

57 At this point we drop the more complex notation and view each update as an instance of the problem arg min_{x ∈ [0,1]^n} Λ[max{cᵀx + c0, 0}]^p + (ρ/2)‖x − d‖₂² (Problem (3)), where c, d ∈ Rⁿ, c0 ∈ R, Λ ≥ 0, p ∈ {1, 2}, and ρ > 0. [sent-163, score-0.101]

58 To map an update to Problem (3) for a potential function φj and parameter Λj , let n = nj , cᵀx + c0 = ℓj(xj), d = Xj − (1/ρ)yj , Λ = Λj , p = pj , and keep ρ the same. [sent-164, score-0.159]

59 , each potential function has at most two unknowns and is piecewise-linear. [sent-167, score-0.118]

60 We present the update in terms of the intermediate optimization problems it solves. [sent-168, score-0.104]

61 For each component αj(1) of α(1): αj(1) = 0 if dj < 0, αj(1) = dj if 0 ≤ dj ≤ 1, and αj(1) = 1 if dj > 1, where j = 1, . [sent-171, score-0.18]

62 We refer to this procedure as clipping the vector d to the interval [0, 1]. [sent-175, score-0.064]

63 In this section, when we refer to clipping to [a, b], we mean an identical vector except that any component outside a bound a or b is changed to that bound. [sent-176, score-0.064]

64 Inspect the reduced objective and clip the unconstrained minimizer to [min, max]. [sent-183, score-0.074]

65 In the second case, if cᵀα(1) + c0 > 0 but cᵀα(2) + c0 ≥ 0, then observe that α(2) minimizes an objective which bounds the update objective from below, and the two objectives are equal at α(2), so α(2) is optimal. [sent-187, score-0.087]

66 We know ∃x ∈ [0, 1]ⁿ such that cᵀx + c0 = 0, so the problem can be split into two feasible problems: β(1) ≡ arg min_{x ∈ [0,1]ⁿ} (ρ/2)‖x − d‖₂² s.t. cᵀx + c0 ≤ 0. [sent-190, score-0.092]

67 β(2) ≡ arg min_{x ∈ [0,1]ⁿ} Λcᵀx + (ρ/2)‖x − d‖₂² s.t. cᵀx + c0 ≥ 0. [sent-192, score-0.066]
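
A hedged sketch of the p = 1 update just described: clip first, test the two closed-form candidates α(1) and α(2), and fall back to the two split problems only when neither test passes (the generic SLSQP call is a placeholder for the more specialized fallback in the paper; the variable names are assumptions):

import numpy as np
from scipy.optimize import minimize

def update_linear(c, c0, d, lam, rho):
    # argmin over x in [0,1]^n of lam * max(c.x + c0, 0) + (rho/2) * ||x - d||^2
    clip = lambda v: np.clip(v, 0.0, 1.0)
    a1 = clip(d)                            # alpha(1): ignore the hinge term entirely
    if c @ a1 + c0 <= 0.0:                  # hinge is inactive at a1, so a1 is optimal
        return a1
    a2 = clip(d - (lam / rho) * c)          # alpha(2): treat the hinge as the linear term
    if c @ a2 + c0 >= 0.0:                  # the lower bound is tight at a2, so a2 is optimal
        return a2
    # Neither closed form applies: solve the two split feasible problems, keep the better one.
    obj = lambda x: lam * max(c @ x + c0, 0.0) + 0.5 * rho * float(np.sum((x - d) ** 2))
    bounds = [(0.0, 1.0)] * len(d)
    b1 = minimize(lambda x: 0.5 * rho * np.sum((x - d) ** 2), a1, method="SLSQP",
                  bounds=bounds,
                  constraints=[{"type": "ineq", "fun": lambda x: -(c @ x + c0)}]).x
    b2 = minimize(lambda x: lam * (c @ x + c0) + 0.5 * rho * np.sum((x - d) ** 2), a2,
                  method="SLSQP", bounds=bounds,
                  constraints=[{"type": "ineq", "fun": lambda x: c @ x + c0}]).x
    return b1 if obj(b1) <= obj(b2) else b2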

68 4 Experiments: We evaluated the scalability of CO-Linear and CO-Quad by generating social networks of varying sizes, constructing CCMRFs with them, and measuring the running time required to find an MPE. [sent-216, score-0.225]

69 We compared our approach to the previous state-of-the-art approach for finding MPEs in CCMRFs, which uses an interior point method implemented in MOSEK, a commercial optimization package (http://www. [sent-217, score-0.091]

70 [4] to generate social networks using power-law degree distributions. [sent-224, score-0.146]

71 We used six edge types with various parameters to represent relationships in social networks with different combinations of abundance and exclusivity, choosing γ between 2 and 3, and α between 0 and 1, as suggested by Broecheler et al. [sent-228, score-0.146]

72 We generated social networks with between 22,050 and 66,150 vertices, which induced CCMRFs with between 130,082 and 397,494 total potential functions and constraints. [sent-231, score-0.267]

73 In all the CCMRFs, between 83% and 85% of those totals were potential functions and between 15% and 17% were constraints. [sent-232, score-0.121]

74 For each social network, we created both a CCMRF to test CO-Linear (p = 1 in Definition 1) and one to test CO-Quad (p = 2). [sent-233, score-0.122]

75 We used the interior-point method in MOSEK to find α3 in the update for CO-Quad when necessary by passing the problem via MOSEK’s Java native interface wrapper. [sent-241, score-0.074]

76 005 on problems with more than 175,000 potential functions and constraints. [sent-251, score-0.144]

77 Although we do not show it in the figure, the average running time on the largest problem was about 4,900 seconds (over 1 hour, 20 minutes). [sent-259, score-0.085]

78 The average running time on the largest problem was about 130 seconds (2 minutes, 10 seconds). [sent-262, score-0.085]

79 Further, the running time grows linearly in the number of potential functions and constraints in the CCMRF, i. [sent-263, score-0.216]

80 , the number of subproblems that must be solved at each iteration. [sent-265, score-0.113]

81 We emphasize that the implementation of CO-Linear is research code written in Java and the interior-point method is a commercial package running as native code. [sent-269, score-0.124]

82 We could only test it on the three smallest problems, the largest of which took an average of about 56,500 seconds to solve (over 15 hours, 40 minutes). [sent-274, score-0.068]

83 We also evaluated a naive variant of CO-Quad which immediately updates xj via the interior-point method when there are two unknowns. [sent-279, score-0.082]

84 Although this is often acceptable, we quantified the mix of infeasibility and suboptimality by repairing the infeasibility and measuring the resulting total suboptimality. [sent-283, score-0.078]

85 We first projected the solutions returned by consensus optimization onto the feasible region, which took a negligible amount of computational time. [sent-284, score-0.342]

86 Let pC be the value of the objective in Problem (1) at such a point and let pIPM be the value of the objective at the point returned by the interior-point method. [sent-285, score-0.078]

87 This shows that consensus optimization was accurate, in addition to being dramatically faster (lower absolute time) and more scalable (smaller growth in time with problem size). [sent-293, score-0.29]

88 With specialized algorithms, consensus optimization offers far superior scalability. [sent-295, score-0.314]

89 In our experiments the computational cost grew linearly with the number of potential functions and constraints. [sent-296, score-0.153]

90 The well-understood theory of consensus optimization can help here. [sent-299, score-0.29]

91 [4], which used heuristics to solve the MPE problem by partitioning CCMRFs, fixing values of variables at the boundaries, solving relatively large subproblems with interior-point methods, and repeating with different partitions. [sent-302, score-0.193]

92 Much work has studied dual decomposition for solving relaxations of discrete MPE problems [15]. [sent-305, score-0.108]

93 [9], and Meshi and Globerson [10] recently studied using consensus optimization to solve convex relaxations of the MPE problem for discrete MRFs. [sent-308, score-0.341]

94 They solved the problem for MRFs which induced subproblems with closed-form solutions. [sent-309, score-0.113]

95 [16], an algorithm for solving a relaxed MPE problem by solving a sequence of subproblems in a process called proximal minimization. [sent-313, score-0.188]

96 The first is to expand the number of unknowns in subproblems that can be solved in closed form. [sent-315, score-0.138]

97 Another is analyzing the Karush-Kuhn-Tucker optimality conditions for the subproblems to eliminate variables when possible and solve them more efficiently. [sent-316, score-0.169]

98 While all (hinge-loss) CCMRF subproblems could be solved with a general-purpose algorithm, such as an interior-point method, we showed that even in cases when an algorithm might have to resort to an interior-point method, exploiting opportunities for closed-form solutions greatly improved speed. [sent-317, score-0.137]

99 A scalable framework for modeling competitive diffusion in social networks. [sent-349, score-0.159]

100 Probabilistic soft logic for trust analysis in social networks. [sent-364, score-0.148]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mpe', 0.536), ('ct', 0.371), ('ccmrfs', 0.353), ('consensus', 0.244), ('ccmrf', 0.235), ('broecheler', 0.196), ('opinion', 0.15), ('va', 0.14), ('social', 0.122), ('subproblems', 0.113), ('ck', 0.102), ('mosek', 0.098), ('minx', 0.093), ('potential', 0.093), ('xxt', 0.071), ('voter', 0.069), ('arg', 0.066), ('inspection', 0.065), ('clipping', 0.064), ('xk', 0.062), ('martins', 0.06), ('getoor', 0.057), ('park', 0.056), ('dependencies', 0.053), ('xj', 0.05), ('college', 0.05), ('java', 0.048), ('clip', 0.048), ('vb', 0.047), ('optimization', 0.046), ('constrained', 0.046), ('mrfs', 0.045), ('dj', 0.045), ('seconds', 0.045), ('commercial', 0.045), ('copy', 0.044), ('meshi', 0.043), ('maryland', 0.043), ('else', 0.042), ('running', 0.04), ('continuous', 0.04), ('scalability', 0.039), ('intrinsic', 0.039), ('native', 0.039), ('impractically', 0.039), ('infeasibility', 0.039), ('mpes', 0.039), ('pip', 0.039), ('diffusion', 0.037), ('preferences', 0.037), ('constraint', 0.037), ('ik', 0.036), ('globerson', 0.036), ('update', 0.035), ('minutes', 0.033), ('variables', 0.033), ('dual', 0.033), ('naive', 0.032), ('friend', 0.032), ('quad', 0.032), ('negated', 0.032), ('iarpa', 0.032), ('voters', 0.032), ('linearly', 0.032), ('iterative', 0.031), ('md', 0.031), ('pj', 0.031), ('bach', 0.029), ('max', 0.029), ('multipliers', 0.028), ('functions', 0.028), ('relaxations', 0.028), ('proximal', 0.027), ('trust', 0.026), ('outgoing', 0.026), ('favoring', 0.026), ('pc', 0.026), ('feasible', 0.026), ('graphical', 0.026), ('objective', 0.026), ('returned', 0.026), ('admm', 0.025), ('unknowns', 0.025), ('uence', 0.025), ('derive', 0.025), ('numeric', 0.025), ('opinions', 0.025), ('solving', 0.024), ('inference', 0.024), ('hour', 0.024), ('par', 0.024), ('opportunities', 0.024), ('networks', 0.024), ('superior', 0.024), ('elds', 0.023), ('constraints', 0.023), ('penalize', 0.023), ('political', 0.023), ('problems', 0.023), ('solve', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 302 nips-2012-Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization

Author: Stephen Bach, Matthias Broecheler, Lise Getoor, Dianne O'leary

Abstract: Probabilistic graphical models are powerful tools for analyzing constrained, continuous domains. However, finding most-probable explanations (MPEs) in these models can be computationally expensive. In this paper, we improve the scalability of MPE inference in a class of graphical models with piecewise-linear and piecewise-quadratic dependencies and linear constraints over continuous domains. We derive algorithms based on a consensus-optimization framework and demonstrate their superior performance over state of the art. We show empirically that in a large-scale voter-preference modeling problem our algorithms scale linearly in the number of dependencies and constraints. 1

2 0.15532835 100 nips-2012-Discriminative Learning of Sum-Product Networks

Author: Robert Gens, Pedro Domingos

Abstract: Sum-product networks are a new deep architecture that can perform fast, exact inference on high-treewidth models. Only generative methods for training SPNs have been proposed to date. In this paper, we present the first discriminative training algorithms for SPNs, combining the high accuracy of the former with the representational power and tractability of the latter. We show that the class of tractable discriminative SPNs is broader than the class of tractable generative ones, and propose an efficient backpropagation-style algorithm for computing the gradient of the conditional log likelihood. Standard gradient descent suffers from the diffusion problem, but networks with many layers can be learned reliably using “hard” gradient descent, where marginal inference is replaced by MPE inference (i.e., inferring the most probable state of the non-evidence variables). The resulting updates have a simple and intuitive form. We test discriminative SPNs on standard image classification tasks. We obtain the best results to date on the CIFAR-10 dataset, using fewer features than prior methods with an SPN architecture that learns local image structure discriminatively. We also report the highest published test accuracy on STL-10 even though we only use the labeled portion of the dataset. 1

3 0.14173277 14 nips-2012-A P300 BCI for the Masses: Prior Information Enables Instant Unsupervised Spelling

Author: Pieter-jan Kindermans, Hannes Verschore, David Verstraeten, Benjamin Schrauwen

Abstract: The usability of Brain Computer Interfaces (BCI) based on the P300 speller is severely hindered by the need for long training times and many repetitions of the same stimulus. In this contribution we introduce a set of unsupervised hierarchical probabilistic models that tackle both problems simultaneously by incorporating prior knowledge from two sources: information from other training subjects (through transfer learning) and information about the words being spelled (through language models). We show, that due to this prior knowledge, the performance of the unsupervised models parallels and in some cases even surpasses that of supervised models, while eliminating the tedious training session. 1

4 0.091181487 60 nips-2012-Bayesian nonparametric models for ranked data

Author: Francois Caron, Yee W. Teh

Abstract: We develop a Bayesian nonparametric extension of the popular Plackett-Luce choice model that can handle an infinite number of choice items. Our framework is based on the theory of random atomic measures, with the prior specified by a gamma process. We derive a posterior characterization and a simple and effective Gibbs sampler for posterior simulation. We develop a time-varying extension of our model, and apply it to the New York Times lists of weekly bestselling books. 1

5 0.078342699 282 nips-2012-Proximal Newton-type methods for convex optimization

Author: Jason Lee, Yuekai Sun, Michael Saunders

Abstract: We seek to solve convex optimization problems in composite form: minimize f (x) := g(x) + h(x), n x∈R where g is convex and continuously differentiable and h : Rn → R is a convex but not necessarily differentiable function whose proximal mapping can be evaluated efficiently. We derive a generalization of Newton-type methods to handle such convex but nonsmooth objective functions. We prove such methods are globally convergent and achieve superlinear rates of convergence in the vicinity of an optimal solution. We also demonstrate the performance of these methods using problems of relevance in machine learning and statistics. 1

6 0.070970766 300 nips-2012-Scalable nonconvex inexact proximal splitting

7 0.068318382 6 nips-2012-A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation

8 0.064206086 68 nips-2012-Clustering Aggregation as Maximum-Weight Independent Set

9 0.059812695 204 nips-2012-MAP Inference in Chains using Column Generation

10 0.058000717 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

11 0.05657877 143 nips-2012-Globally Convergent Dual MAP LP Relaxation Solvers using Fenchel-Young Margins

12 0.055660676 84 nips-2012-Convergence Rate Analysis of MAP Coordinate Minimization Algorithms

13 0.052959051 27 nips-2012-A quasi-Newton proximal splitting method

14 0.050971821 169 nips-2012-Label Ranking with Partial Abstention based on Thresholded Probabilistic Models

15 0.050035115 194 nips-2012-Learning to Discover Social Circles in Ego Networks

16 0.04947326 103 nips-2012-Distributed Probabilistic Learning for Camera Networks with Missing Data

17 0.048446171 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

18 0.047959957 213 nips-2012-Minimization of Continuous Bethe Approximations: A Positive Variation

19 0.046640966 286 nips-2012-Random Utility Theory for Social Choice

20 0.046264365 149 nips-2012-Hierarchical Optimistic Region Selection driven by Curiosity


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.136), (1, 0.028), (2, 0.038), (3, -0.041), (4, -0.003), (5, 0.007), (6, -0.01), (7, -0.064), (8, 0.003), (9, 0.062), (10, -0.056), (11, 0.05), (12, 0.031), (13, 0.022), (14, -0.006), (15, -0.03), (16, -0.097), (17, -0.009), (18, -0.023), (19, 0.016), (20, -0.016), (21, 0.004), (22, 0.033), (23, -0.01), (24, 0.011), (25, -0.078), (26, 0.014), (27, 0.068), (28, 0.01), (29, 0.085), (30, 0.015), (31, 0.016), (32, 0.121), (33, 0.05), (34, -0.007), (35, -0.069), (36, 0.05), (37, -0.042), (38, 0.031), (39, 0.033), (40, -0.038), (41, 0.015), (42, 0.08), (43, -0.031), (44, -0.029), (45, -0.054), (46, 0.066), (47, -0.09), (48, -0.073), (49, -0.072)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.87243307 302 nips-2012-Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization

Author: Stephen Bach, Matthias Broecheler, Lise Getoor, Dianne O'leary

Abstract: Probabilistic graphical models are powerful tools for analyzing constrained, continuous domains. However, finding most-probable explanations (MPEs) in these models can be computationally expensive. In this paper, we improve the scalability of MPE inference in a class of graphical models with piecewise-linear and piecewise-quadratic dependencies and linear constraints over continuous domains. We derive algorithms based on a consensus-optimization framework and demonstrate their superior performance over state of the art. We show empirically that in a large-scale voter-preference modeling problem our algorithms scale linearly in the number of dependencies and constraints. 1

2 0.56060773 100 nips-2012-Discriminative Learning of Sum-Product Networks

Author: Robert Gens, Pedro Domingos

Abstract: Sum-product networks are a new deep architecture that can perform fast, exact inference on high-treewidth models. Only generative methods for training SPNs have been proposed to date. In this paper, we present the first discriminative training algorithms for SPNs, combining the high accuracy of the former with the representational power and tractability of the latter. We show that the class of tractable discriminative SPNs is broader than the class of tractable generative ones, and propose an efficient backpropagation-style algorithm for computing the gradient of the conditional log likelihood. Standard gradient descent suffers from the diffusion problem, but networks with many layers can be learned reliably using “hard” gradient descent, where marginal inference is replaced by MPE inference (i.e., inferring the most probable state of the non-evidence variables). The resulting updates have a simple and intuitive form. We test discriminative SPNs on standard image classification tasks. We obtain the best results to date on the CIFAR-10 dataset, using fewer features than prior methods with an SPN architecture that learns local image structure discriminatively. We also report the highest published test accuracy on STL-10 even though we only use the labeled portion of the dataset. 1

3 0.51763701 288 nips-2012-Rational inference of relative preferences

Author: Nisheeth Srivastava, Paul R. Schrater

Abstract: Statistical decision theory axiomatically assumes that the relative desirability of different options that humans perceive is well described by assigning them optionspecific scalar utility functions. However, this assumption is refuted by observed human behavior, including studies wherein preferences have been shown to change systematically simply through variation in the set of choice options presented. In this paper, we show that interpreting desirability as a relative comparison between available options at any particular decision instance results in a rational theory of value-inference that explains heretofore intractable violations of rational choice behavior in human subjects. Complementarily, we also characterize the conditions under which a rational agent selecting optimal options indicated by dynamic value inference in our framework will behave identically to one whose preferences are encoded using a static ordinal utility function. 1

4 0.47715002 115 nips-2012-Efficient high dimensional maximum entropy modeling via symmetric partition functions

Author: Paul Vernaza, Drew Bagnell

Abstract: Maximum entropy (MaxEnt) modeling is a popular choice for sequence analysis in applications such as natural language processing, where the sequences are embedded in discrete, tractably-sized spaces. We consider the problem of applying MaxEnt to distributions over paths in continuous spaces of high dimensionality— a problem for which inference is generally intractable. Our main contribution is to show that this intractability can be avoided as long as the constrained features possess a certain kind of low dimensional structure. In this case, we show that the associated partition function is symmetric and that this symmetry can be exploited to compute the partition function efficiently in a compressed form. Empirical results are given showing an application of our method to learning models of high-dimensional human motion capture data. 1

5 0.47350499 286 nips-2012-Random Utility Theory for Social Choice

Author: Hossein Azari, David Parks, Lirong Xia

Abstract: Random utility theory models an agent’s preferences on alternatives by drawing a real-valued score on each alternative (typically independently) from a parameterized distribution, and then ranking the alternatives according to scores. A special case that has received significant attention is the Plackett-Luce model, for which fast inference methods for maximum likelihood estimators are available. This paper develops conditions on general random utility models that enable fast inference within a Bayesian framework through MC-EM, providing concave loglikelihood functions and bounded sets of global maxima solutions. Results on both real-world and simulated data provide support for the scalability of the approach and capability for model selection among general random utility models including Plackett-Luce. 1

6 0.46470878 191 nips-2012-Learning the Architecture of Sum-Product Networks Using Clustering on Variables

7 0.46347252 204 nips-2012-MAP Inference in Chains using Column Generation

8 0.44566643 298 nips-2012-Scalable Inference of Overlapping Communities

9 0.44399104 27 nips-2012-A quasi-Newton proximal splitting method

10 0.43935862 14 nips-2012-A P300 BCI for the Masses: Prior Information Enables Instant Unsupervised Spelling

11 0.43920949 143 nips-2012-Globally Convergent Dual MAP LP Relaxation Solvers using Fenchel-Young Margins

12 0.43011197 84 nips-2012-Convergence Rate Analysis of MAP Coordinate Minimization Algorithms

13 0.42930415 48 nips-2012-Augmented-SVM: Automatic space partitioning for combining multiple non-linear dynamics

14 0.41901183 60 nips-2012-Bayesian nonparametric models for ranked data

15 0.41758779 309 nips-2012-Semi-supervised Eigenvectors for Locally-biased Learning

16 0.41642094 253 nips-2012-On Triangular versus Edge Representations --- Towards Scalable Modeling of Networks

17 0.41005591 282 nips-2012-Proximal Newton-type methods for convex optimization

18 0.40764108 267 nips-2012-Perceptron Learning of SAT

19 0.40370116 194 nips-2012-Learning to Discover Social Circles in Ego Networks

20 0.40157947 290 nips-2012-Recovery of Sparse Probability Measures via Convex Programming


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.064), (17, 0.011), (21, 0.307), (38, 0.14), (42, 0.017), (54, 0.023), (55, 0.022), (74, 0.034), (76, 0.141), (80, 0.085), (92, 0.042)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.9489454 195 nips-2012-Learning visual motion in recurrent neural networks

Author: Marius Pachitariu, Maneesh Sahani

Abstract: We present a dynamic nonlinear generative model for visual motion based on a latent representation of binary-gated Gaussian variables. Trained on sequences of images, the model learns to represent different movement directions in different variables. We use an online approximate inference scheme that can be mapped to the dynamics of networks of neurons. Probed with drifting grating stimuli and moving bars of light, neurons in the model show patterns of responses analogous to those of direction-selective simple cells in primary visual cortex. Most model neurons also show speed tuning and respond equally well to a range of motion directions and speeds aligned to the constraint line of their respective preferred speed. We show how these computations are enabled by a specific pattern of recurrent connections learned by the model. 1

2 0.91043031 1 nips-2012-3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model

Author: Sanja Fidler, Sven Dickinson, Raquel Urtasun

Abstract: This paper addresses the problem of category-level 3D object detection. Given a monocular image, our aim is to localize the objects in 3D by enclosing them with tight oriented 3D bounding boxes. We propose a novel approach that extends the well-acclaimed deformable part-based model [1] to reason in 3D. Our model represents an object class as a deformable 3D cuboid composed of faces and parts, which are both allowed to deform with respect to their anchors on the 3D box. We model the appearance of each face in fronto-parallel coordinates, thus effectively factoring out the appearance variation induced by viewpoint. Our model reasons about face visibility patters called aspects. We train the cuboid model jointly and discriminatively and share weights across all aspects to attain efficiency. Inference then entails sliding and rotating the box in 3D and scoring object hypotheses. While for inference we discretize the search space, the variables are continuous in our model. We demonstrate the effectiveness of our approach in indoor and outdoor scenarios, and show that our approach significantly outperforms the stateof-the-art in both 2D [1] and 3D object detection [2]. 1

3 0.8977046 300 nips-2012-Scalable nonconvex inexact proximal splitting

Author: Suvrit Sra

Abstract: We study a class of large-scale, nonsmooth, and nonconvex optimization problems. In particular, we focus on nonconvex problems with composite objectives. This class includes the extensively studied class of convex composite objective problems as a subclass. To solve composite nonconvex problems we introduce a powerful new framework based on asymptotically nonvanishing errors, avoiding the common stronger assumption of vanishing errors. Within our new framework we derive both batch and incremental proximal splitting algorithms. To our knowledge, our work is first to develop and analyze incremental nonconvex proximalsplitting algorithms, even if we were to disregard the ability to handle nonvanishing errors. We illustrate one instance of our general framework by showing an application to large-scale nonsmooth matrix factorization. 1

4 0.86700636 60 nips-2012-Bayesian nonparametric models for ranked data

Author: Francois Caron, Yee W. Teh

Abstract: We develop a Bayesian nonparametric extension of the popular Plackett-Luce choice model that can handle an infinite number of choice items. Our framework is based on the theory of random atomic measures, with the prior specified by a gamma process. We derive a posterior characterization and a simple and effective Gibbs sampler for posterior simulation. We develop a time-varying extension of our model, and apply it to the New York Times lists of weekly bestselling books. 1

5 0.8461377 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction

Author: Christoph H. Lampert

Abstract: We study the problem of maximum marginal prediction (MMP) in probabilistic graphical models, a task that occurs, for example, as the Bayes optimal decision rule under a Hamming loss. MMP is typically performed as a two-stage procedure: one estimates each variable’s marginal probability and then forms a prediction from the states of maximal probability. In this work we propose a simple yet effective technique for accelerating MMP when inference is sampling-based: instead of the above two-stage procedure we directly estimate the posterior probability of each decision variable. This allows us to identify the point of time when we are sufficiently certain about any individual decision. Whenever this is the case, we dynamically prune the variables we are confident about from the underlying factor graph. Consequently, at any time only samples of variables whose decision is still uncertain need to be created. Experiments in two prototypical scenarios, multi-label classification and image inpainting, show that adaptive sampling can drastically accelerate MMP without sacrificing prediction accuracy. 1

same-paper 6 0.82898015 302 nips-2012-Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization

7 0.81592381 123 nips-2012-Exponential Concentration for Mutual Information Estimation with Application to Forests

8 0.75749862 18 nips-2012-A Simple and Practical Algorithm for Differentially Private Data Release

9 0.75495625 23 nips-2012-A lattice filter model of the visual pathway

10 0.74573034 113 nips-2012-Efficient and direct estimation of a neural subunit model for sensory coding

11 0.71698612 190 nips-2012-Learning optimal spike-based representations

12 0.70751721 114 nips-2012-Efficient coding provides a direct link between prior and likelihood in perceptual Bayesian inference

13 0.70730853 112 nips-2012-Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model

14 0.7048977 94 nips-2012-Delay Compensation with Dynamical Synapses

15 0.69935602 120 nips-2012-Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential 1-Minimization

16 0.69766474 152 nips-2012-Homeostatic plasticity in Bayesian spiking networks as Expectation Maximization with posterior constraints

17 0.69230074 137 nips-2012-From Deformations to Parts: Motion-based Segmentation of 3D Objects

18 0.69034427 24 nips-2012-A mechanistic model of early sensory processing based on subtracting sparse representations

19 0.69020599 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

20 0.68496138 341 nips-2012-The topographic unsupervised learning of natural sounds in the auditory cortex