nips nips2010 nips2010-126 knowledge-graph by maker-knowledge-mining

126 nips-2010-Inference with Multivariate Heavy-Tails in Linear Models


Source: pdf

Author: Danny Bickson, Carlos Guestrin

Abstract: Heavy-tailed distributions naturally occur in many real life problems. Unfortunately, it is typically not possible to compute inference in closed-form in graphical models which involve such heavy-tailed distributions. In this work, we propose a novel simple linear graphical model for independent latent random variables, called linear characteristic model (LCM), defined in the characteristic function domain. Using stable distributions, a heavy-tailed family of distributions which is a generalization of the Cauchy, Lévy and Gaussian distributions, we show for the first time how to compute both exact and approximate inference in such a linear multivariate graphical model. LCMs are not limited to stable distributions; in fact, LCMs are always defined for any random variables (discrete, continuous or a mixture of both). We provide a realistic problem from the field of computer networks to demonstrate the applicability of our construction. Another potential application is iterative decoding of linear channels with non-Gaussian noise.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Unfortunately, it is typically not possible to compute inference in closed-form in graphical models which involve such heavy-tailed distributions. [sent-4, score-0.18]

2 In this work, we propose a novel simple linear graphical model for independent latent random variables, called linear characteristic model (LCM), defined in the characteristic function domain. [sent-5, score-0.575]

3 Using stable distributions, a heavy-tailed family of distributions which is a generalization of the Cauchy, Lévy and Gaussian distributions, we show for the first time how to compute both exact and approximate inference in such a linear multivariate graphical model. [sent-6, score-0.8]

4 LCMs are not limited to stable distributions, in fact LCMs are always defined for any random variables (discrete, continuous or a mixture of both). [sent-7, score-0.495]

5 Another common property of communication networks is that network traffic tends to be linear [8, 23]. [sent-13, score-0.257]

6 Recently, several works propose to use linear multivariate statistical methods for monitoring network health, performance analysis or intrusion detection [15, 13, 16, 14]. [sent-15, score-0.179]

7 Some aspects of network traffic make the task of modeling it using probabilistic graphical models challenging. [sent-16, score-0.283]

8 In the current work, we propose a novel linear probabilistic graphical model called linear characteristic model (LCM) to model linear interactions of independent heavy-tailed random variables (Section 3). [sent-19, score-0.304]

9 Using the stable family of distributions (defined in Section 2), a family of heavy-tailed distributions, we show how to compute both exact and approximate inference (Section 4). [sent-20, score-0.632]

10 Using real data from the domain of computer networks we demonstrate the applicability of our proposed methods for computing inference in LCM (Section 5). [sent-21, score-0.176]

11 We summarize our contributions below: 1 • We propose a new linear graphical model called LCM, defined as a product of factors in the cf domain. [sent-22, score-0.551]

12 We show that our model is well defined for any collection of random variables, since any random variable has a matching cf. [sent-23, score-0.198]

13 In this work, we extend the applicability of belief propagation to the stable family of distributions, a generalization of the Gaussian, Cauchy and Lévy distributions. [sent-25, score-0.592]

14 We analyze both exact and approximate inference algorithms, including convergence and accuracy of the solution. [sent-26, score-0.246]

15 • We demonstrate the applicability of our proposed method, performing inference in real settings, using network tomography data obtained from the PlanetLab network. [sent-27, score-0.259]

16 CFGs assume that the probability distribution factorizes as a convolution of potentials, and propose to use duality to derive a product factorization in the characteristic function (cf) domain. [sent-32, score-0.247]
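
(Illustrative sketch, not from the paper.) The duality referred to here is the classical one: the pdf of a sum of independent variables is the convolution of the individual pdfs, while its characteristic function is the product of the individual cfs. The snippet below, using only numpy, checks this numerically via empirical characteristic functions; the variable names and the Cauchy/Gaussian choice are ours, not the paper's.

```python
# Sketch: convolution in the pdf domain <-> product in the cf domain.
# We check phi_{X+Y}(t) == phi_X(t) * phi_Y(t) for independent X, Y using
# empirical characteristic functions phi_hat(t) = mean(exp(i*t*samples)).
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.standard_cauchy(n)          # heavy-tailed component
y = rng.normal(0.0, 1.0, n)
z = x + y

def ecf(samples, t):
    """Empirical characteristic function evaluated at points t."""
    return np.exp(1j * np.outer(t, samples)).mean(axis=1)

t = np.linspace(-2.0, 2.0, 9)
lhs = ecf(z, t)                      # cf of the sum
rhs = ecf(x, t) * ecf(y, t)          # product of the individual cfs
print(np.max(np.abs(lhs - rhs)))     # small; only Monte-Carlo error remains
```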

17 In this work we extend CFG by defining the graphical model as a product of factors in the cf domain. [sent-33, score-0.386]

18 Unlike CFGs, LCMs are always defined for any probability distribution, whereas a CFG may not be defined when the inverse Fourier transform does not exist. [sent-34, score-0.266]

19 The main difference is that Copulas transform each marginal variable into a uniform distribution and perform inference in the cumulative distribution function (cdf) domain. [sent-37, score-0.337]

20 In contrast, we perform inference in the cf domain. [sent-38, score-0.358]

21 In our case of interest, when the underlying distributions are stable, Copulas can not be used since stable distributions are not analytically expressible in the cdf domain. [sent-39, score-0.467]

22 2 Stable distribution. The stable distribution [30] is a family of heavy-tailed distributions, where the Cauchy, Lévy and Gaussian distributions are special instances of this family (see Figure 1). [sent-43, score-0.225]

23 As we will soon show with our networking example, network flows exhibit an empirical distribution which can be modeled remarkably well by stable distributions. [sent-46, score-0.593]

24 We denote a stable distribution by a tuple of four parameters: S(α, β, γ, δ). [sent-47, score-0.336]

25 A Gaussian N(μ, σ²) is a stable distribution with parameters S(2, 0, σ/√2, μ), a Cauchy distribution Cauchy(γ, δ) is stable with S(1, 0, γ, δ), and a Lévy distribution Lévy(γ, δ) is stable with S(1/2, 1, γ, δ). [sent-50, score-1.097]
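
(Illustrative sketch.) The three closed-form special cases can be reproduced in Python with scipy.stats.levy_stable; note that scipy's (alpha, beta, loc, scale) convention may differ in detail from the S(α, β, γ, δ) parametrization used in the paper, so the mapping below is an assumption to be checked rather than a definitive translation.

```python
# Sketch: the three closed-form special cases of the stable family, drawn with
# scipy.stats.levy_stable and compared against the dedicated distributions via quantiles.
# Assumption: scipy's default parameterization lines up with S(alpha, beta, gamma, delta).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 50_000
mu, sigma, gamma, delta = 0.5, 2.0, 1.5, 0.3

pairs = {
    # Gaussian N(mu, sigma^2)  ~  S(2, 0, sigma/sqrt(2), mu)
    "gaussian": (stats.levy_stable(2.0, 0.0, loc=mu, scale=sigma / np.sqrt(2)),
                 stats.norm(loc=mu, scale=sigma)),
    # Cauchy(gamma, delta)     ~  S(1, 0, gamma, delta)
    "cauchy":   (stats.levy_stable(1.0, 0.0, loc=delta, scale=gamma),
                 stats.cauchy(loc=delta, scale=gamma)),
    # Levy(gamma, delta)       ~  S(1/2, 1, gamma, delta)
    "levy":     (stats.levy_stable(0.5, 1.0, loc=delta, scale=gamma),
                 stats.levy(loc=delta, scale=gamma)),
}
for name, (stable_rv, special_rv) in pairs.items():
    s = stable_rv.rvs(size=n, random_state=rng)
    # compare central quantiles (means are useless for Cauchy/Levy tails)
    q = np.quantile(s, [0.25, 0.5, 0.75])
    print(name, np.round(q, 2), np.round(special_rv.ppf([0.25, 0.5, 0.75]), 2))
```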

26 We begin by defining a unit scale, zero-centered stable random variable. [sent-52, score-0.371]

27 We formally define the characteristic function in the supplementary material. [sent-58, score-0.218]

28 Next we define a general stable random variable. [sent-60, score-0.401]

29 A basic property of stable laws is that a weighted sum of α-stable random variables is again α-stable (hence the family is called stable). [sent-67, score-0.326]
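
(Illustrative sketch.) The closure property is easiest to check numerically in the symmetric α = 1 (Cauchy) case, where the update rule is simply scale → |w1|γ1 + |w2|γ2 and location → w1δ1 + w2δ2; the general β and δ updates (including the α = 1 log-correction given below) are more involved and are not reproduced here.

```python
# Sketch: closure of stable laws under weighted sums, shown for the symmetric
# alpha = 1 (Cauchy) case: w1*X1 + w2*X2 with Xk ~ Cauchy(gamma_k, delta_k)
# is Cauchy(|w1|*gamma1 + |w2|*gamma2, w1*delta1 + w2*delta2).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 400_000
w1, w2 = 2.0, -0.5
g1, d1 = 1.0, 0.3
g2, d2 = 0.7, -1.0

x1 = stats.cauchy(loc=d1, scale=g1).rvs(size=n, random_state=rng)
x2 = stats.cauchy(loc=d2, scale=g2).rvs(size=n, random_state=rng)
s = w1 * x1 + w2 * x2

pred = stats.cauchy(loc=w1 * d1 + w2 * d2, scale=abs(w1) * g1 + abs(w2) * g2)
probs = [0.1, 0.25, 0.5, 0.75, 0.9]
print(np.round(np.quantile(s, probs), 3))   # empirical quantiles of the weighted sum
print(np.round(pred.ppf(probs), 3))         # quantiles predicted by the closure rule
```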

30 This property will be useful in the next section where we compute inference in a linear graphical model with underlying stable distributions. [sent-68, score-0.518]

31 For α = 1, ξ = (2/π)[βγ log γ − β1γ1 log γ1 − β2γ2 log γ2]. Note that both X1, X2 have to be distributed with the same characteristic exponent α. [sent-78, score-0.181]

32 3 Linear characteristic models. A drawback of general stable distributions is that they do not have a closed-form equation for the pdf or the cdf. [sent-79, score-0.49]

33 This fact makes the handling of stable distributions more difficult. [sent-80, score-0.365]

34 This is probably one of the reasons stable distributions are rarely used in the probabilistic graphical models community. [sent-81, score-0.403]

35 We propose a novel approach for modeling linear interactions between random variables distributed according to stable distributions, using a new linear probabilistic graphical model called LCM. [sent-82, score-0.483]

36 A new graphical model is needed, since previous approaches like CFG or the Copula method cannot be used for computing inference in closed-form in linear models involving stable distributions, because they require computation in the pdf or cdf domains, respectively. [sent-83, score-0.631]

37 For example, in linear channel decoding, X are the transmitted codewords, the matrix A is the linear channel transformation and Y is a vector of observations. [sent-90, score-0.162]

38 Besides the network application we focus on, another potential application of our current work is linear channel decoding with stable, non-Gaussian noise. [sent-94, score-0.2]

39 In the rest of this section we develop the foundations for computing inference in a linear model using underlying stable distributions. [sent-95, score-0.451]

40 Because stable distributions do not have closed-form equations in the pdf domain, we must work in the cf domain. [sent-96, score-0.689]

41 Hence, we define a dual linear model in the cf domain. [sent-97, score-0.43]

42 (LCM) Given the linear model Y = AX, we define the linear characteristic model (LCM) as ϕ(t1, ∙∙∙, tn, s1, ∙∙∙, sm) ≜ ∏i ϕ(ti, s1, ∙∙∙, sm), where ϕ(ti, s1, ∙∙∙, sm) is the characteristic function of the joint distribution p(xi, y1, ∙∙∙, ym). [sent-106, score-0.849]
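
(Illustrative sketch.) The product structure in the cf domain reflects an elementary identity for a linear model Y = AX with independent components: ϕ_Y(s) = ∏_i ϕ_{X_i}((Aᵀs)_i). The snippet below verifies this identity for Cauchy components; it illustrates the idea behind cf-domain factorizations and is not the paper's LCM factor construction itself.

```python
# Sketch: cf-domain product structure of a linear model Y = A X with independent X_i.
# The joint cf of Y factorizes as phi_Y(s) = prod_i phi_{X_i}((A^T s)_i), which we
# check against the empirical cf of simulated Y.
import numpy as np

rng = np.random.default_rng(3)
A = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, -0.3]])          # m = 2 observations, n = 3 hidden variables

def cauchy_cf(t, gamma=1.0, delta=0.0):
    """cf of a Cauchy(gamma, delta) variable: exp(i*delta*t - gamma*|t|)."""
    return np.exp(1j * delta * t - gamma * np.abs(t))

n_samples = 300_000
X = rng.standard_cauchy((3, n_samples))   # independent Cauchy(1, 0) components
Y = A @ X                                 # linear model

s = np.array([0.4, -0.7])                 # a test point in the cf domain
analytic = np.prod(cauchy_cf(A.T @ s))    # product of per-variable cfs at (A^T s)_i
empirical = np.mean(np.exp(1j * (s @ Y))) # empirical cf of Y at s
print(analytic, empirical)                # agree up to Monte-Carlo error
```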

43 The following two theorems establish duality between the LCM and its dual representation in the pdf domain. [sent-107, score-0.167]

44 Given an LCM, assuming p(x, y) as defined in (2) has a closed form and the Fourier transform F[p(x, y)] exists, then F[p(x, y)] = ϕ(t1, ∙∙∙, tn, s1, ∙∙∙, sm). [sent-111, score-0.405]

45 Given an LCM, when the inverse Fourier transform exists, F⁻¹(ϕ(t1, ∙∙∙, tn, s1, ∙∙∙, sm)) = p(x, y) as defined in (2). [sent-114, score-0.337]

46 The duality is useful, since it allows us to compute inference in either representation, whichever is more convenient. [sent-118, score-0.161]

47 4 Main result: exact and approximate inference in LCM. This section presents our main result. [sent-119, score-0.209]

48 Typically, exact inference in linear models with continuous variables is limited to the well-understood cases of Gaussians and simple regression problems in exponential families. [sent-120, score-0.221]

49 In this section we extend previous results to show how to compute inference (both exact and approximate) in linear models with underlying stable distributions. [sent-121, score-0.518]

50 1 Exact inference in LCM. The inference task typically involves computation of a marginal or conditional distribution of a probability function. [sent-123, score-0.382]

51 Unfortunately, when working with stable distributions, the above integral is intractable. [sent-126, score-0.297]

52 Instead, we propose to use a dual operation called slicing, computed in the cf domain. [sent-127, score-0.285]

53 Given random variables X1, X2, the joint cf is ϕX1,X2(t1, t2) = E[e^{i t1 x1 + i t2 x2}]. [sent-132, score-0.245]

54 The marginal cf is derived from the joint cf by ϕX1 (t1 ) = ϕX1 ,X2 (t1 , 0). [sent-134, score-0.568]

55 The following theorem establishes that the marginal distribution can be computed in the cf domain using the slicing operation. [sent-137, score-0.431]
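
(Illustrative sketch.) Slicing is just evaluation of the joint cf with the other arguments set to zero. For a bivariate Gaussian, where both the joint and the marginal cfs are known in closed form, the check is immediate:

```python
# Sketch: the slicing operation. For a bivariate Gaussian with mean mu and covariance C,
# the joint cf is phi(t) = exp(i mu.t - 0.5 t^T C t); setting t2 = 0 ("slicing")
# yields exactly the cf of the marginal X1 ~ N(mu1, C11).
import numpy as np

mu = np.array([1.0, -2.0])
C = np.array([[2.0, 0.8],
              [0.8, 1.5]])

def joint_cf(t):
    return np.exp(1j * mu @ t - 0.5 * t @ C @ t)

def marginal_cf_x1(t1):
    return np.exp(1j * mu[0] * t1 - 0.5 * C[0, 0] * t1**2)

for t1 in [-1.0, 0.3, 2.0]:
    sliced = joint_cf(np.array([t1, 0.0]))   # slice: set t2 = 0
    print(t1, sliced, marginal_cf_x1(t1))    # identical values
```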

56 Remove ϕ(tj, s1, ∙∙∙, sm) and ti from the LCM. [sent-145, score-0.217]

57 Figure 1: The three special cases of the stable distribution where a closed-form pdf exists. [sent-152, score-0.415]

58 Given an LCM, the marginal cf of the random variable Xi can be computed using ϕ(ti) = [∏j ϕ(tj, s1, ∙∙∙, sm)]|T\i=0 (3), where T\i = 0 denotes setting all arguments except ti to zero. In case the inverse Fourier transform exists, the marginal probability of the hidden variable Xi is given by p(xi) ∼ F⁻¹{ϕ(ti)}. [sent-155, score-0.572]

59 We propose an exact inference algorithm, LCM-Elimination, for computing the marginal cf (shown in Algorithm 1). [sent-158, score-0.503]

60 LCM-Elimination operates in the cf domain, by evaluating one variable at a time, and updating the remaining graphical model accordingly. [sent-162, score-0.312]

61 Once the marginal cf ϕ(ti) is computed, assuming the inverse Fourier transform exists, we can compute the desired marginal probability p(xi). [sent-164, score-0.469]
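
(Illustrative sketch.) Recovering p(xi) from ϕ(ti) amounts to a Fourier inversion, p(x) = (1/2π) ∫ e^{−itx} ϕ(t) dt. The snippet below does this with naive numerical quadrature for the Cauchy cf, where the true pdf is known; a production implementation would use an FFT-based or adaptive scheme.

```python
# Sketch: recovering a marginal pdf from its cf by numerical Fourier inversion,
#   p(x) = (1/(2*pi)) * Integral exp(-i t x) phi(t) dt,
# illustrated for the Cauchy cf phi(t) = exp(-|t|), whose pdf 1/(pi*(1+x^2)) is known.
import numpy as np

t = np.linspace(-60.0, 60.0, 20_001)   # truncated integration grid
dt = t[1] - t[0]
phi = np.exp(-np.abs(t))               # cf of Cauchy(1, 0)

def pdf_from_cf(x):
    integrand = np.exp(-1j * t * x) * phi
    return float(np.real(integrand.sum()) * dt / (2.0 * np.pi))

for x in [0.0, 1.0, 3.0]:
    print(x, pdf_from_cf(x), 1.0 / (np.pi * (1.0 + x**2)))   # numeric vs. exact pdf
```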

62 2 Exact inference in stable distributions After defining LCM and showing that inference can be computed in the cf domain, we are finally ready to show how to compute exact inference in a linear model with underlying stable distributions. [sent-166, score-1.428]

63 We assume that all observation nodes Yi are distributed according to a stable distribution. [sent-167, score-0.334]

64 From the linearity property of stable distribution, it is clear that the hidden variables Xi are distributed according to a stable distribution as well. [sent-168, score-0.67]

65 Iterate until convergence: each message mij(tj) is computed by multiplying the local factor ϕi(ti, s1, ∙∙∙, sm) with the incoming messages mki(ti), k∈N(i)\j, and slicing at ti = 0; the dual pdf-domain update computes mij(xj) by integrating over xi. Finally, the marginals are assembled from the incoming messages mki(ti). [sent-185, score-0.532]

66 3 Approximate Inference in LCM. Typically, the cost of exact inference may be high. [sent-193, score-0.18]

67 For example, in the related linear model of a multivariate Gaussian (a special case of the stable distribution), LCM-Elimination reduces to a Gaussian-elimination-type algorithm with a cost of O(n³), where n is the number of variables. [sent-194, score-0.338]

68 Approximate methods for inference, like belief propagation [26], usually require less work than exact inference, but may not always converge (or may converge to an unsatisfactory solution). [sent-195, score-0.361]

69 The cost of exact inference motivates us to devise more efficient approximations. [sent-196, score-0.294]

70 We propose two novel algorithms that are variants of belief propagation for computing approximate inference in LCM. [sent-197, score-0.286]

71 The first, Characteristic-Slice-Product (CSP) is defined in LCM (shown in Algorithm 2(a)). [sent-198, score-0.198]

72 As in belief propagation, our algorithms are exact on tree graphical models. [sent-200, score-0.204]

73 Given an LCM with underlying tree topology (the matrix A is an irreducible adjacency matrix of a tree graph), the CSP and IC algorithms compute exact inference, resulting in the marginal cf and the marginal distribution, respectively. [sent-204, score-0.507]

74 The basic property which allows us to devise the CSP algorithm is that the LCM is defined as a product of factors in the cf domain. [sent-205, score-0.474]

75 Typically, belief propagation algorithms are applied to a probability distribution which factors as a product of potentials in the pdf domain. [sent-206, score-0.262]

76 The sum-product algorithm uses the distributivity of the integral and product operation to devise efficient recursive evaluation of the marginal probability. [sent-207, score-0.266]

77 Equivalently, the Characteristic-Slice-Product algorithm uses the distributivity of the slicing and product operations to perform efficient inference to compute the marginal cf in the cf domain, as shown in Theorem 4. [sent-208, score-0.941]

78 In a similar way, the Integral-Convolution algorithm uses distributivity of the integral and convolution operations to perform efficient inference in the pdf domain. [sent-210, score-0.429]

79 4 Approximate inference for stable distributions For the case of stable distributions, we derive an approximation algorithm, Stable-Jacobi (Algorithm 2(c)), out of the CSP update rules. [sent-214, score-0.775]

80 Figure 2: (a) Distribution of network flows (by source port) on a typical PlanetLab host is fitted quite well with a Lévy distribution; (b) β, γ, δ vs. iteration; (c) Pajek network plot. [sent-227, score-0.269]

81 where ρ(R) is the spectral radius (the largest absolute value of the eigenvalues of R), R ≜ I − A, |R| is the entrywise absolute value, and |R|^α is the entrywise exponentiation. [sent-242, score-0.184]
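
(Illustrative sketch.) The quantities appearing in this condition are cheap to evaluate with numpy; the snippet computes ρ(|R|) and ρ(|R|^α) for a toy matrix A normalized to unit diagonal. The exact sufficient condition combining these is the one stated in the paper; the matrix and α below are arbitrary stand-ins.

```python
# Sketch: evaluating the spectral-radius quantities referenced for Stable-Jacobi,
# with R = I - A, |R| the entrywise absolute value and |R|**alpha the entrywise power.
# Assumes A has been normalized to unit diagonal (cf. footnote 7 in the text).
import numpy as np

def spectral_radius(M):
    return np.max(np.abs(np.linalg.eigvals(M)))

rng = np.random.default_rng(4)
n, alpha = 6, 1.2
A = np.eye(n) + 0.05 * rng.standard_normal((n, n))   # toy matrix
np.fill_diagonal(A, 1.0)                             # unit diagonal

R = np.eye(n) - A
print("rho(|R|)       =", spectral_radius(np.abs(R)))
print("rho(|R|^alpha) =", spectral_radius(np.abs(R) ** alpha))
```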

82 5 Application: Network flow monitoring. In this section we propose a novel application for inference in LCMs to model network traffic flows of a large operational worldwide testbed. [sent-245, score-0.625]

83 We define a network flow as a directed edge between a transmitting and a receiving host. [sent-250, score-0.312]

84 Empirically, we found that the network flow distribution on a single PlanetLab node is fitted quite well by a Lévy distribution, a stable distribution with α = 0.5. [sent-254, score-0.717]

85 For performing the fitting, we use Mark Veillette’s Matlab stable distribution package [31]. [sent-257, score-0.336]
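
(Illustrative sketch.) A rough Python analogue of this fitting step could use scipy; the flow data below is synthetic, since the PlanetLab traces are not reproduced here. The general stable fit scipy.stats.levy_stable.fit is assumed available but can be slow, so the Lévy special case (α = 1/2) is fitted directly instead.

```python
# Sketch: fitting a Levy / stable model to (synthetic) flow sizes in Python,
# as a stand-in for the Matlab stable package [31] used in the paper.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
flows = stats.levy(loc=0.0, scale=3.0).rvs(size=5_000, random_state=rng)  # stand-in data

# Fit the alpha = 1/2 (Levy) special case; starting guesses keep loc below min(flows).
loc_hat, scale_hat = stats.levy.fit(flows, loc=flows.min() - 1.0, scale=1.0)
print("Levy fit: loc =", round(loc_hat, 3), "scale =", round(scale_hat, 3))

# General stable fit (uncomment; can be slow for large samples):
# alpha, beta, loc, scale = stats.levy_stable.fit(flows[:1000])
# print(alpha, beta, loc, scale)
```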

86 In contrast, by approximating the network flow distribution with stable distributions, we need only 4 parameters. (Footnote 7: When the matrix A is positive definite it is always possible to normalize it to a unit diagonal.) [sent-259, score-0.512]

87 Note that there is an interesting relation to the walk-summability convergence condition [12] of belief propagation in the Gaussian case: ρ(|R|) < 1. [sent-262, score-0.181]

88 Furthermore, using the theory developed in previous sections, we are able to linearly aggregate the distribution of flows in clusters of nodes. [sent-266, score-0.187]

89 We extracted a connected component of traffic flows connecting the core network of 652 nodes. [sent-267, score-0.364]

90 We fitted a stable distribution characterizing flow behavior for each machine. [sent-268, score-0.467]

91 A partition of 376 machines served as the observed flows Yi (where the flow distribution is known). [sent-269, score-0.28]

92 The task is to predict the distribution of the unobserved remaining 376 flows Xi , based on the observed traffic flows (entries of Aij ). [sent-270, score-0.468]

93 We ran approximate inference using Stable-Jacobi and compared the results to the exact result computed by LCM-Elimination. [sent-271, score-0.209]

94 We emphasize again that using related techniques (the Copula method, CFG, and ICA) it is not possible to compute exact inference for the problem at hand. [sent-272, score-0.18]

95 6 Conclusion and future work We have presented a novel linear graphical model called LCM, defined in the cf domain. [sent-281, score-0.551]

96 We have shown for the first time how to perform exact and approximate inference in a linear multivariate graphical model when the underlying distributions are stable. [sent-282, score-0.385]

97 We have discussed an application of our construction for computing inference of network flows. [sent-283, score-0.196]

98 We have proposed to borrow ideas from belief propagation, for computing efficient inference, based on the distributivity property of the slice-product operations and the integral-convolution operations. [sent-284, score-0.261]

99 Other families of distributions like geometric stable distributions or Wishart can be analyzed in our model. [sent-287, score-0.433]

100 Nolan (American University), for sharing parts of his excellent book about stable distributions online, Mark Veillette (Boston University) for sharing his stable distribution code online, to Jason K. [sent-291, score-0.672]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('lcm', 0.552), ('stable', 0.297), ('cf', 0.245), ('cfg', 0.21), ('defined', 0.198), ('flows', 0.148), ('planetlab', 0.147), ('aij', 0.134), ('traffic', 0.133), ('ti', 0.114), ('characteristic', 0.114), ('inference', 0.113), ('define', 0.104), ('tan', 0.104), ('sm', 0.103), ('csp', 0.097), ('fourier', 0.095), ('flow', 0.093), ('entrywise', 0.092), ('vy', 0.089), ('cauchy', 0.083), ('efficient', 0.083), ('network', 0.083), ('lcms', 0.081), ('pdf', 0.079), ('marginal', 0.078), ('propagation', 0.074), ('definition', 0.074), ('bickson', 0.074), ('crovella', 0.074), ('defining', 0.074), ('distributivity', 0.074), ('yi', 0.07), ('belief', 0.07), ('slicing', 0.069), ('distributions', 0.068), ('transform', 0.068), ('exact', 0.067), ('graphical', 0.067), ('xi', 0.066), ('sigcomm', 0.065), ('sufficient', 0.065), ('mki', 0.059), ('lakhina', 0.055), ('monitoring', 0.055), ('ica', 0.055), ('sign', 0.052), ('ym', 0.051), ('copulas', 0.05), ('ihler', 0.048), ('xj', 0.048), ('duality', 0.048), ('mij', 0.047), ('convolution', 0.046), ('ax', 0.042), ('linear', 0.041), ('tj', 0.04), ('channel', 0.04), ('dual', 0.04), ('distribution', 0.039), ('copula', 0.038), ('fitted', 0.038), ('convergence', 0.037), ('distributed', 0.037), ('benefit', 0.037), ('diot', 0.037), ('imc', 0.037), ('iuz', 0.037), ('multiuser', 0.037), ('planetflow', 0.037), ('stablejacobi', 0.037), ('veillette', 0.037), ('tn', 0.036), ('decoding', 0.036), ('operations', 0.034), ('ic', 0.034), ('cdf', 0.034), ('applicability', 0.033), ('boston', 0.033), ('transmitting', 0.032), ('devise', 0.031), ('gaussian', 0.03), ('exponent', 0.03), ('domain', 0.03), ('tomography', 0.03), ('artificial', 0.03), ('identification', 0.03), ('approximate', 0.029), ('mutually', 0.029), ('family', 0.029), ('convolutional', 0.028), ('mao', 0.028), ('port', 0.028), ('bandwidth', 0.027), ('fixed', 0.027), ('ny', 0.027), ('graphs', 0.027), ('zi', 0.026), ('levy', 0.026), ('networking', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999982 126 nips-2010-Inference with Multivariate Heavy-Tails in Linear Models

Author: Danny Bickson, Carlos Guestrin

Abstract: Heavy-tailed distributions naturally occur in many real life problems. Unfortunately, it is typically not possible to compute inference in closed-form in graphical models which involve such heavy-tailed distributions. In this work, we propose a novel simple linear graphical model for independent latent random variables, called linear characteristic model (LCM), defined in the characteristic function domain. Using stable distributions, a heavy-tailed family of distributions which is a generalization of the Cauchy, Lévy and Gaussian distributions, we show for the first time how to compute both exact and approximate inference in such a linear multivariate graphical model. LCMs are not limited to stable distributions; in fact, LCMs are always defined for any random variables (discrete, continuous or a mixture of both). We provide a realistic problem from the field of computer networks to demonstrate the applicability of our construction. Another potential application is iterative decoding of linear channels with non-Gaussian noise.

2 0.093573317 223 nips-2010-Rates of convergence for the cluster tree

Author: Kamalika Chaudhuri, Sanjoy Dasgupta

Abstract: For a density f on Rd , a high-density cluster is any connected component of {x : f (x) ≥ λ}, for some λ > 0. The set of all high-density clusters form a hierarchy called the cluster tree of f . We present a procedure for estimating the cluster tree given samples from f . We give finite-sample convergence rates for our algorithm, as well as lower bounds on the sample complexity of this estimation problem. 1

3 0.087991163 63 nips-2010-Distributed Dual Averaging In Networks

Author: Alekh Agarwal, Martin J. Wainwright, John C. Duchi

Abstract: The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication. We develop and analyze distributed algorithms based on dual averaging of subgradients, and provide sharp bounds on their convergence rates as a function of the network size and topology. Our analysis clearly separates the convergence of the optimization algorithm itself from the effects of communication constraints arising from the network structure. We show that the number of iterations required by our algorithm scales inversely in the spectral gap of the network. The sharpness of this prediction is confirmed both by theoretical lower bounds and simulations for various networks. 1

4 0.078046724 118 nips-2010-Implicit Differentiation by Perturbation

Author: Justin Domke

Abstract: This paper proposes a simple and efficient finite difference method for implicit differentiation of marginal inference results in discrete graphical models. Given an arbitrary loss function, defined on marginals, we show that the derivatives of this loss with respect to model parameters can be obtained by running the inference procedure twice, on slightly perturbed model parameters. This method can be used with approximate inference, with a loss function over approximate marginals. Convenient choices of loss functions make it practical to fit graphical models with hidden variables, high treewidth and/or model misspecification. 1

5 0.075679451 224 nips-2010-Regularized estimation of image statistics by Score Matching

Author: Diederik P. Kingma, Yann L. Cun

Abstract: Score Matching is a recently-proposed criterion for training high-dimensional density models for which maximum likelihood training is intractable. It has been applied to learning natural image statistics but has so-far been limited to simple models due to the difficulty of differentiating the loss with respect to the model parameters. We show how this differentiation can be automated with an extended version of the double-backpropagation algorithm. In addition, we introduce a regularization term for the Score Matching loss that enables its use for a broader range of problem by suppressing instabilities that occur with finite training sample sizes and quantized input values. Results are reported for image denoising and super-resolution.

6 0.06713862 53 nips-2010-Copula Bayesian Networks

7 0.058793228 169 nips-2010-More data means less inference: A pseudo-max approach to structured learning

8 0.058731094 54 nips-2010-Copula Processes

9 0.058058433 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning

10 0.057408508 30 nips-2010-An Inverse Power Method for Nonlinear Eigenproblems with Applications in 1-Spectral Clustering and Sparse PCA

11 0.055995606 148 nips-2010-Learning Networks of Stochastic Differential Equations

12 0.055801518 101 nips-2010-Gaussian sampling by local perturbations

13 0.055378433 269 nips-2010-Throttling Poisson Processes

14 0.054224119 257 nips-2010-Structured Determinantal Point Processes

15 0.053314339 65 nips-2010-Divisive Normalization: Justification and Effectiveness as Efficient Coding Transform

16 0.052903511 154 nips-2010-Learning sparse dynamic linear systems using stable spline kernels and exponential hyperpriors

17 0.051053949 258 nips-2010-Structured sparsity-inducing norms through submodular functions

18 0.05094314 108 nips-2010-Graph-Valued Regression

19 0.05064353 83 nips-2010-Evidence-Specific Structures for Rich Tractable CRFs

20 0.050364055 190 nips-2010-On the Convexity of Latent Social Network Inference


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.155), (1, 0.045), (2, 0.014), (3, 0.063), (4, -0.057), (5, -0.002), (6, 0.008), (7, 0.027), (8, 0.032), (9, 0.006), (10, -0.145), (11, -0.058), (12, 0.001), (13, -0.061), (14, -0.051), (15, -0.024), (16, 0.034), (17, -0.04), (18, 0.006), (19, -0.013), (20, -0.034), (21, -0.039), (22, 0.048), (23, -0.043), (24, -0.009), (25, -0.021), (26, 0.08), (27, -0.018), (28, 0.072), (29, -0.074), (30, -0.009), (31, 0.035), (32, 0.049), (33, -0.002), (34, -0.006), (35, -0.01), (36, 0.005), (37, 0.058), (38, 0.049), (39, 0.039), (40, -0.039), (41, -0.001), (42, 0.02), (43, 0.002), (44, -0.005), (45, -0.061), (46, -0.022), (47, -0.086), (48, 0.085), (49, 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.91905963 126 nips-2010-Inference with Multivariate Heavy-Tails in Linear Models

Author: Danny Bickson, Carlos Guestrin

Abstract: Heavy-tailed distributions naturally occur in many real life problems. Unfortunately, it is typically not possible to compute inference in closed-form in graphical models which involve such heavy-tailed distributions. In this work, we propose a novel simple linear graphical model for independent latent random variables, called linear characteristic model (LCM), defined in the characteristic function domain. Using stable distributions, a heavy-tailed family of distributions which is a generalization of the Cauchy, Lévy and Gaussian distributions, we show for the first time how to compute both exact and approximate inference in such a linear multivariate graphical model. LCMs are not limited to stable distributions; in fact, LCMs are always defined for any random variables (discrete, continuous or a mixture of both). We provide a realistic problem from the field of computer networks to demonstrate the applicability of our construction. Another potential application is iterative decoding of linear channels with non-Gaussian noise.

2 0.73837519 84 nips-2010-Exact inference and learning for cumulative distribution functions on loopy graphs

Author: Nebojsa Jojic, Chris Meek, Jim C. Huang

Abstract: Many problem domains including climatology and epidemiology require models that can capture both heavy-tailed statistics and local dependencies. Specifying such distributions using graphical models for probability density functions (PDFs) generally leads to intractable inference and learning. Cumulative distribution networks (CDNs) provide a means to tractably specify multivariate heavy-tailed models as a product of cumulative distribution functions (CDFs). Existing algorithms for inference and learning in CDNs are limited to those with tree-structured (nonloopy) graphs. In this paper, we develop inference and learning algorithms for CDNs with arbitrary topology. Our approach to inference and learning relies on recursively decomposing the computation of mixed derivatives based on junction trees over the cumulative distribution functions. We demonstrate that our systematic approach to utilizing the sparsity represented by the junction tree yields significant performance improvements over the general symbolic differentiation programs Mathematica and D*. Using two real-world datasets, we demonstrate that non-tree structured (loopy) CDNs are able to provide significantly better fits to the data as compared to tree-structured and unstructured CDNs and other heavy-tailed multivariate distributions such as the multivariate copula and logistic models.

3 0.7305162 53 nips-2010-Copula Bayesian Networks

Author: Gal Elidan

Abstract: We present the Copula Bayesian Network model for representing multivariate continuous distributions, while taking advantage of the relative ease of estimating univariate distributions. Using a novel copula-based reparameterization of a conditional density, joined with a graph that encodes independencies, our model offers great flexibility in modeling high-dimensional densities, while maintaining control over the form of the univariate marginals. We demonstrate the advantage of our framework for generalization over standard Bayesian networks as well as tree structured copula models for varied real-life domains that are of substantially higher dimension than those typically considered in the copula literature. 1

4 0.67493677 32 nips-2010-Approximate Inference by Compilation to Arithmetic Circuits

Author: Daniel Lowd, Pedro Domingos

Abstract: Arithmetic circuits (ACs) exploit context-specific independence and determinism to allow exact inference even in networks with high treewidth. In this paper, we introduce the first ever approximate inference methods using ACs, for domains where exact inference remains intractable. We propose and evaluate a variety of techniques based on exact compilation, forward sampling, AC structure learning, Markov network parameter learning, variational inference, and Gibbs sampling. In experiments on eight challenging real-world domains, we find that the methods based on sampling and learning work best: one such method (AC2 -F) is faster and usually more accurate than loopy belief propagation, mean field, and Gibbs sampling; another (AC2 -G) has a running time similar to Gibbs sampling but is consistently more accurate than all baselines. 1

5 0.62252885 118 nips-2010-Implicit Differentiation by Perturbation

Author: Justin Domke

Abstract: This paper proposes a simple and efficient finite difference method for implicit differentiation of marginal inference results in discrete graphical models. Given an arbitrary loss function, defined on marginals, we show that the derivatives of this loss with respect to model parameters can be obtained by running the inference procedure twice, on slightly perturbed model parameters. This method can be used with approximate inference, with a loss function over approximate marginals. Convenient choices of loss functions make it practical to fit graphical models with hidden variables, high treewidth and/or model misspecification. 1

6 0.61326802 54 nips-2010-Copula Processes

7 0.59986514 224 nips-2010-Regularized estimation of image statistics by Score Matching

8 0.59869879 63 nips-2010-Distributed Dual Averaging In Networks

9 0.5706864 113 nips-2010-Heavy-Tailed Process Priors for Selective Shrinkage

10 0.56312203 144 nips-2010-Learning Efficient Markov Networks

11 0.53798771 120 nips-2010-Improvements to the Sequence Memoizer

12 0.52996868 190 nips-2010-On the Convexity of Latent Social Network Inference

13 0.52547616 80 nips-2010-Estimation of Renyi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs

14 0.51659405 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning

15 0.48206729 58 nips-2010-Decomposing Isotonic Regression for Efficiently Solving Large Problems

16 0.45926625 159 nips-2010-Lifted Inference Seen from the Other Side : The Tractable Features

17 0.456543 101 nips-2010-Gaussian sampling by local perturbations

18 0.45477369 223 nips-2010-Rates of convergence for the cluster tree

19 0.44907448 283 nips-2010-Variational Inference over Combinatorial Spaces

20 0.44811001 216 nips-2010-Probabilistic Inference and Differential Privacy


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.023), (17, 0.02), (27, 0.037), (30, 0.039), (45, 0.147), (50, 0.562), (52, 0.024), (60, 0.02), (77, 0.026), (90, 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.87768382 126 nips-2010-Inference with Multivariate Heavy-Tails in Linear Models

Author: Danny Bickson, Carlos Guestrin

Abstract: Heavy-tailed distributions naturally occur in many real life problems. Unfortunately, it is typically not possible to compute inference in closed-form in graphical models which involve such heavy-tailed distributions. In this work, we propose a novel simple linear graphical model for independent latent random variables, called linear characteristic model (LCM), defined in the characteristic function domain. Using stable distributions, a heavy-tailed family of distributions which is a generalization of the Cauchy, Lévy and Gaussian distributions, we show for the first time how to compute both exact and approximate inference in such a linear multivariate graphical model. LCMs are not limited to stable distributions; in fact, LCMs are always defined for any random variables (discrete, continuous or a mixture of both). We provide a realistic problem from the field of computer networks to demonstrate the applicability of our construction. Another potential application is iterative decoding of linear channels with non-Gaussian noise.

2 0.87632322 120 nips-2010-Improvements to the Sequence Memoizer

Author: Jan Gasthaus, Yee W. Teh

Abstract: The sequence memoizer is a model for sequence data with state-of-the-art performance on language modeling and compression. We propose a number of improvements to the model and inference algorithm, including an enlarged range of hyperparameters, a memory-efficient representation, and inference algorithms operating on the new representation. Our derivations are based on precise definitions of the various processes that will also allow us to provide an elementary proof of the “mysterious” coagulation and fragmentation properties used in the original paper on the sequence memoizer by Wood et al. (2009). We present some experimental results supporting our improvements. 1

3 0.85142028 101 nips-2010-Gaussian sampling by local perturbations

Author: George Papandreou, Alan L. Yuille

Abstract: We present a technique for exact simulation of Gaussian Markov random fields (GMRFs), which can be interpreted as locally injecting noise to each Gaussian factor independently, followed by computing the mean/mode of the perturbed GMRF. Coupled with standard iterative techniques for the solution of symmetric positive definite systems, this yields a very efficient sampling algorithm with essentially linear complexity in terms of speed and memory requirements, well suited to extremely large scale probabilistic models. Apart from synthesizing data under a Gaussian model, the proposed technique directly leads to an efficient unbiased estimator of marginal variances. Beyond Gaussian models, the proposed algorithm is also very useful for handling highly non-Gaussian continuously-valued MRFs such as those arising in statistical image modeling or in the first layer of deep belief networks describing real-valued data, where the non-quadratic potentials coupling different sites can be represented as finite or infinite mixtures of Gaussians with the help of local or distributed latent mixture assignment variables. The Bayesian treatment of such models most naturally involves a block Gibbs sampler which alternately draws samples of the conditionally independent latent mixture assignments and the conditionally multivariate Gaussian continuous vector and we show that it can directly benefit from the proposed methods. 1

4 0.8269698 42 nips-2010-Boosting Classifier Cascades

Author: Nuno Vasconcelos, Mohammad J. Saberian

Abstract: The problem of optimal and automatic design of a detector cascade is considered. A novel mathematical model is introduced for a cascaded detector. This model is analytically tractable, leads to recursive computation, and accounts for both classification and complexity. A boosting algorithm, FCBoost, is proposed for fully automated cascade design. It exploits the new cascade model, minimizes a Lagrangian cost that accounts for both classification risk and complexity. It searches the space of cascade configurations to automatically determine the optimal number of stages and their predictors, and is compatible with bootstrapping of negative examples and cost sensitive learning. Experiments show that the resulting cascades have state-of-the-art performance in various computer vision problems. 1

5 0.77972561 147 nips-2010-Learning Multiple Tasks with a Sparse Matrix-Normal Penalty

Author: Yi Zhang, Jeff G. Schneider

Abstract: In this paper, we propose a matrix-variate normal penalty with sparse inverse covariances to couple multiple tasks. Learning multiple (parametric) models can be viewed as estimating a matrix of parameters, where rows and columns of the matrix correspond to tasks and features, respectively. Following the matrix-variate normal density, we design a penalty that decomposes the full covariance of matrix elements into the Kronecker product of row covariance and column covariance, which characterizes both task relatedness and feature representation. Several recently proposed methods are variants of the special cases of this formulation. To address the overfitting issue and select meaningful task and feature structures, we include sparse covariance selection into our matrix-normal regularization via ℓ1 penalties on task and feature inverse covariances. We empirically study the proposed method and compare with related models in two real-world problems: detecting landmines in multiple fields and recognizing faces between different subjects. Experimental results show that the proposed framework provides an effective and flexible way to model various different structures of multiple tasks.

6 0.71794438 33 nips-2010-Approximate inference in continuous time Gaussian-Jump processes

7 0.55848026 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata

8 0.55678558 54 nips-2010-Copula Processes

9 0.55175877 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes

10 0.54190016 113 nips-2010-Heavy-Tailed Process Priors for Selective Shrinkage

11 0.52635664 242 nips-2010-Slice sampling covariance hyperparameters of latent Gaussian models

12 0.51810765 96 nips-2010-Fractionally Predictive Spiking Neurons

13 0.50697786 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing

14 0.50293368 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning

15 0.49030074 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior

16 0.48514134 217 nips-2010-Probabilistic Multi-Task Feature Selection

17 0.48001695 122 nips-2010-Improving the Asymptotic Performance of Markov Chain Monte-Carlo by Inserting Vortices

18 0.4769215 154 nips-2010-Learning sparse dynamic linear systems using stable spline kernels and exponential hyperpriors

19 0.4768571 158 nips-2010-Learning via Gaussian Herding

20 0.47581455 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers