nips nips2012 nips2012-244 knowledge-graph by maker-knowledge-mining

244 nips-2012-Nonconvex Penalization Using Laplace Exponents and Concave Conjugates


Source: pdf

Author: Zhihua Zhang, Bojun Tu

Abstract: In this paper we study sparsity-inducing nonconvex penalty functions using Lévy processes. We define such a penalty as the Laplace exponent of a subordinator. Accordingly, we propose a novel approach for the construction of sparsity-inducing nonconvex penalties. Particularly, we show that the nonconvex logarithmic (LOG) and exponential (EXP) penalty functions are the Laplace exponents of Gamma and compound Poisson subordinators, respectively. Additionally, we explore the concave conjugate of nonconvex penalties. We find that the LOG and EXP penalties are the concave conjugates of negative Kullback-Leibler (KL) distance functions. Furthermore, the relationship between these two penalties is due to the asymmetry of the KL distance.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 In this paper we study sparsity-inducing nonconvex penalty functions using Lévy processes. [sent-3, score-0.703]

2 We define such a penalty as the Laplace exponent of a subordinator. [sent-4, score-0.206]

3 Accordingly, we propose a novel approach for the construction of sparsity-inducing nonconvex penalties. [sent-5, score-0.423]

4 Particularly, we show that the nonconvex logarithmic (LOG) and exponential (EXP) penalty functions are the Laplace exponents of Gamma and compound Poisson subordinators, respectively. [sent-6, score-0.772]

5 Additionally, we explore the concave conjugate of nonconvex penalties. [sent-7, score-0.47]

6 We find that the LOG and EXP penalties are the concave conjugates of negative Kullback-Leibler (KL) distance functions. [sent-8, score-0.369]

7 Furthermore, the relationship between these two penalties is due to the asymmetry of the KL distance. [sent-9, score-0.164]

8 1 Introduction. Variable selection plays a fundamental role in statistical modeling for high-dimensional data sets, especially when the underlying model has a sparse representation. [sent-10, score-0.074]

9 The approach based on penalty theory has been widely used for variable selection in the literature. [sent-11, score-0.162]

10 A principled approach is due to the lasso of [17], which uses the ℓ1-norm penalty. [sent-12, score-0.076]

11 There has also been work on nonconvex penalties within a Bayesian framework. [sent-14, score-0.514]

12 In particular, they showed that the bridge penalty can be obtained by mixing the Laplace distribution with a stable distribution. [sent-16, score-0.172]

13 Other authors have shown that the prior induced from the LOG penalty has an interpretation as a scale mixture of Laplace distributions with an inverse gamma density [5, 9, 12, 2]. [sent-18, score-0.316]

14 [22] extended this class of Laplace variance mixtures by using a generalized inverse Gaussian density. [sent-20, score-0.044]

15 Our work is motivated by recent developments of Bayesian nonparametric methods in feature selection [10, 18, 4, 15]. [sent-22, score-0.073]

16 Especially, Polson and Scott [15] proposed a nonparametric approach for normal variance mixtures using Lévy processes, which embeds finite-dimensional normal variance mixtures in infinite ones. [sent-23, score-0.352]

17 We develop a Bayesian nonparametric approach for the construction of sparsity-inducing nonconvex penalties. [sent-24, score-0.429]

18 Particularly, we show that Laplace transformations of Lévy processes can be viewed as pseudo-priors and the corresponding Laplace exponents then form sparsity-inducing nonconvex penalties. [sent-25, score-0.714]

19 Moreover, we exemplify that the LOG and EXP penalties can be respectively regarded as Laplace exponents of Gamma and compound Poisson subordinators. [sent-26, score-0.423]

20 This construction recovers an inherent connection between LOG and EXP. [sent-28, score-0.068]

21 Moreover, it provides us with an approach for adaptively updating tuning hyperparameters, which is a very important computational issue in nonconvex sparse penalization. [sent-29, score-0.418]

22 Typically, the multi-stage LLA and SparseNet algorithms with nonconvex penalties [21, 13] implement a two-dimensional grid search, so they incur higher computational costs. [sent-30, score-0.514]

23 2 Lévy Processes for Nonconvex Penalty Functions. Suppose we are given a set of training data {(xi, yi) : i = 1, . . . , n}. [sent-32, score-0.219]

24 We aim to find a sparse estimate of the regression vector b = (b1, . . . , bp)T. [sent-44, score-0.053]

25 We particularly study the use of Laplace variance mixtures in sparsity modeling. [sent-48, score-0.1]

26 For this purpose, we define a hierarchical model: [bj|ηj, σ] ∼ind L(bj|0, σ(2ηj)⁻¹), [ηj] ∼iid p(ηj), p(σ) = "Constant", where the ηj's are known as the local shrinkage parameters and L(b|u, η) denotes a Laplace distribution with density L(b|u, η) = (1/(4η)) exp(−(1/(2η))|b − u|). [sent-49, score-0.334]

27 The classical regularization framework is based on a penalty function induced from the marginal prior p(bj|σ). [sent-50, score-0.134]

28 Let ψ(|b|) = −log p(b|σ), where p(b|σ) = ∫_0^∞ L(b|0, ση⁻¹) p(η) dη. [sent-51, score-0.067]

29 Then the penalized regression problem is min_b { F(b) ≜ (1/2)∥y − Xb∥₂² + λ ∑_{j=1}^p ψ(|bj|) }. [sent-52, score-0.076]
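
The construction above is easy to probe numerically. The following sketch (ours, not code from the paper) computes the induced penalty ψ(|b|) = −log p(b|σ) for a hypothetical Gamma mixing density p(η), with σ fixed to 1, and checks that the result is nondecreasing and concave in |b|; the Laplace parametrization follows the density given above.

```python
import numpy as np
from scipy import integrate, stats

def induced_penalty(b_abs, mix_pdf, sigma=1.0):
    """psi(|b|) = -log p(b|sigma), with p(b|sigma) = int_0^inf L(b|0, sigma/eta) p(eta) d eta
    and L(b|0, s) = 1/(4s) * exp(-|b|/(2s)), matching the parametrization used above."""
    def integrand(eta):
        scale = sigma / eta                              # sigma * eta^{-1}
        return (1.0 / (4.0 * scale)) * np.exp(-b_abs / (2.0 * scale)) * mix_pdf(eta)
    val, _ = integrate.quad(integrand, 0.0, np.inf)
    return -np.log(val)

# Hypothetical Gamma mixing density for the local shrinkage parameter eta (our choice).
gamma_mix = stats.gamma(a=2.0, scale=1.0).pdf

bs = np.linspace(0.0, 5.0, 200)
psi = np.array([induced_penalty(b, gamma_mix) for b in bs])
psi -= psi[0]                                            # normalize so psi(0) = 0

d1 = np.diff(psi)                                        # first differences: should be >= 0
d2 = np.diff(psi, n=2)                                   # second differences: should be <= 0
print("nondecreasing:", np.all(d1 >= -1e-6), " concave:", np.all(d2 <= 1e-6))
```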

30 This implies that ψ(|b|) is nondecreasing and concave in |b|. [sent-54, score-0.152]

31 In other words, ψ(|b|) forms a class of nonconvex penalty functions for b. [sent-55, score-0.484]

32 Motivated by use of Bayesian nonparametrics in sparsity modeling, we now explore Laplace scale mixtures by relating η with a subordinator. [sent-56, score-0.1]

33 We thus have a Bayesian nonparametric formulation for the construction of joint priors of the bj ’s. [sent-57, score-0.351]

34 Lemma 1. Let ν be the Lévy measure such that ∫_0^∞ min(u, 1) ν(du) < ∞. [sent-63, score-0.219]

35 Roughly speaking, a subordinator is a one-dimensional Lévy process that is non-decreasing (a.s.). [sent-67, score-0.461]

36 An important property of subordinators is given in the following lemma. [sent-70, score-0.266]

37 ψ(s) = βs + ∫_0^∞ (1 − exp(−su)) ν(du). (2) Here β ≥ 0 and ν is the Lévy measure defined in Lemma 1. [sent-72, score-0.219]

38 The function ψ in (2) is usually called the Laplace exponent of the subordinator and it satisfies ψ(0) = 0. [sent-75, score-0.314]

39 Lemma 1 implies that the Laplace exponent ψ is a Bernstein function and the corresponding Laplace transformation exp(−tψ(s)) is completely monotone. [sent-76, score-0.072]

40 Recall that the Laplace exponent ψ(s) is nonnegative, nondecreasing and concave on (0, ∞). [sent-77, score-0.224]

41 Thus, if we let s = |b|, then ψ(|b|) defines a nonconvex penalty function of b on (−∞, ∞). [sent-78, score-0.484]

42 If ψ(s) is not identically zero, then ψ(|b|) is a nondifferentiable and nonconvex function of b on (−∞, ∞). [sent-85, score-0.384]

43 The subordinator T (t) plays the same role as the local shrinkage parameter η, which is also called a latent variable. [sent-87, score-0.343]

44 Moreover, we will see that t plays the role of a tuning hyperparameter. [sent-88, score-0.07]

45 Theorem 1 shows an explicit relationship between the local shrinkage parameter and the corresponding tuning hyperparameter; i. [sent-89, score-0.097]

46 Thus, if ∫_0^∞ T(t)⁻¹ p(T(t)) dT(t) = 1/C < ∞, then p∗(T(t)) = C T(t)⁻¹ p(T(t)) defines a new proper density for T(t). [sent-93, score-0.072]

47 If ∫_0^∞ T(t)⁻¹ p(T(t)) dT(t) = ∞, then p∗(T(t)) ∝ T(t)⁻¹ p(T(t)) defines an improper density for T(t). [sent-97, score-0.1]

48 Thus, the improper prior exp(−tψ(|b|)) is a mixture of L(b|0, (2T (t))−1 ) with p∗ (T (t)). [sent-98, score-0.09]

49 If ∫_0^∞ exp(−tψ(s)) ds is infinite, exp(−tψ(|b|)) is an improper density w.r.t. b. [sent-99, score-0.1]

50 2 The MAP Estimation. Based on the subordinator given in the previous subsection, we rewrite the hierarchical representation for the joint prior of the bj's under the regression framework. [sent-106, score-0.545]

51 That is, [bj|ηj, σ] ∼ind L(bj|0, σ(2ηj)⁻¹), p∗(ηj) ∝ σηj⁻¹ p(ηj), which is equivalent to [bj, ηj|σ] ∝ind exp(−(ηj/σ)|bj|) p(ηj). [sent-107, score-0.308]

52 The joint marginal pseudo-prior of the bj's is p∗(b|σ) = ∏_{j=1}^p ∫_0^∞ exp(−(ηj/σ)|bj|) P(ηj) dηj = ∏_{j=1}^p exp(−tj ψ(|bj|/σ)). [sent-109, score-0.567]

53 Thus, the MAP estimate of b is based on the following optimization problem: min_b { (1/2)∥y − Xb∥₂² + σ ∑_{j=1}^p tj ψ(|bj|/σ) }. [sent-110, score-0.295]
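
For concreteness, the MAP objective can be written down directly. The snippet below is our own illustration (function names and toy values are assumptions, not the authors' code); it evaluates the objective for the LOG and EXP penalties with σ fixed to 1.

```python
import numpy as np

def log_penalty(b_abs, alpha, gamma):
    return np.log(alpha * b_abs + 1.0) / gamma

def exp_penalty(b_abs, alpha, gamma):
    return (1.0 - np.exp(-alpha * b_abs)) / gamma

def map_objective(b, y, X, t, psi, sigma=1.0):
    """(1/2)||y - Xb||_2^2 + sigma * sum_j t_j * psi(|b_j| / sigma)."""
    resid = y - X @ b
    return 0.5 * resid @ resid + sigma * np.sum(t * psi(np.abs(b) / sigma))

# toy usage with arbitrary data
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
b_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0])
y = X @ b_true + 0.1 * rng.standard_normal(20)
t = np.full(5, 0.5)                                  # tuning hyperparameters t_j
psi = lambda s: log_penalty(s, alpha=3.0, gamma=np.log(1.0 + 3.0))
print(map_objective(b_true, y, X, t, psi))
```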

54 Clearly, the tj's are tuning hyperparameters and the ηj's are latent variables. [sent-111, score-0.214]

55 Moreover, it is interesting that ηj (= T(tj)) is defined as a subordinator w.r.t. tj. [sent-112, score-0.164]

56 3 Gamma and Compound Poisson Subordinators. In [15], the authors discussed the use of α-stable subordinators and inverted-beta subordinators. [sent-116, score-0.266]

57 In this section we study applications of Gamma and Compound Poisson subordinators in constructing nonconvex penalty functions. [sent-117, score-0.773]

58 We establish an interesting connection between these two subordinators and the nonconvex logarithmic (LOG) and exponential (EXP) penalties. [sent-118, score-0.679]

59 Particularly, these two penalties are the Laplace exponents of the two subordinators, respectively. [sent-119, score-0.269]

60 3.1 The LOG penalty and Gamma Subordinator. The log-penalty function is defined by ψ(|b|) = (1/γ) log(α|b| + 1), where α, γ > 0. [sent-121, score-0.164]

61 Thus, it is the Laplace exponent of a subordinator. [sent-123, score-0.072]

62 Then, (1/γ) log(αs + 1) = ∫_0^∞ [1 − exp(−su)] ν(du), where the Lévy measure ν is ν(du) = (1/(γu)) exp(−u/α) du. [sent-126, score-0.286]

63 Furthermore, if t > γ, we can form the pseudo-prior as a proper distribution, which is the mixture of L(b|0, T(t)⁻¹) with the Gamma distribution Ga(T(t)|γ⁻¹t − 1, α). [sent-131, score-0.062]
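
A quick Monte Carlo sanity check of this correspondence (our own sketch, with arbitrary parameter values): if T(t) ∼ Ga(t/γ, α), i.e. shape t/γ and scale α, then E[exp(−sT(t))] should match exp(−tψ(s)) with the LOG Laplace exponent ψ(s) = (1/γ) log(αs + 1).

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, gamma, t, s = 2.0, 0.7, 1.3, 0.9

# Gamma subordinator marginal: shape t/gamma, scale alpha.
T = rng.gamma(shape=t / gamma, scale=alpha, size=2_000_000)
mc = np.exp(-s * T).mean()                       # Monte Carlo estimate of E[exp(-s T(t))]

psi = np.log(alpha * s + 1.0) / gamma            # LOG penalty as Laplace exponent
exact = np.exp(-t * psi)

print(f"Monte Carlo: {mc:.5f}   exp(-t*psi(s)): {exact:.5f}")
```

With a few million samples the two printed values should agree to three or four decimal places.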

64 Then T(t) ≜ Z(1) + · · · + Z(K(t)) for t ≥ 0 follows a compound Poisson distribution (denoted T(t) ∼ Po(T(t)|λt, µZ)). [sent-141, score-0.154]

65 We then call {T (t) : t ≥ 0} the compound Poisson process. [sent-142, score-0.154]

66 A compound Poisson process is a subordinator if and only if the Z(k) are nonnegative random variables [16]. [sent-144, score-0.434]

67 In this section we employ the compound Poisson process to explore the EXP penalty, which is ψ(|b|) = (1/γ)(1 − exp(−α|b|)), where α, γ > 0. [sent-145, score-0.154]

68 Then ψ(s) = ∫_0^∞ [1 − exp(−su)] ν(du) with the Lévy measure ν(du) = γ⁻¹ δα(u) du. [sent-148, score-0.219]

69 Furthermore, exp(−tψ(s)) = ∫_0^∞ exp(−sT(t)) P(T(t)) dT(t), where {T(t) : t ≥ 0} is a compound Poisson subordinator, each T(t) ∼ Po(T(t)|t/γ, δα(·)), and δu(·) is the Dirac delta measure. [sent-149, score-0.154]
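
The analogous check for the EXP penalty (again our own sketch, arbitrary values): a compound Poisson subordinator whose jumps all equal α, so that T(t) = αK with K ∼ Poisson(t/γ), should have Laplace transform exp(−tψ(s)) with ψ(s) = (1/γ)(1 − exp(−αs)).

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, gamma, t, s = 1.5, 0.5, 2.0, 0.8

# Compound Poisson subordinator with Dirac-delta jump measure at alpha:
# K(t) ~ Poisson(t/gamma) jumps, each of size exactly alpha, so T(t) = alpha * K(t).
K = rng.poisson(lam=t / gamma, size=2_000_000)
T = alpha * K
mc = np.exp(-s * T).mean()

psi = (1.0 - np.exp(-alpha * s)) / gamma          # EXP penalty as Laplace exponent
exact = np.exp(-t * psi)

print(f"Monte Carlo: {mc:.5f}   exp(-t*psi(s)): {exact:.5f}")
```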

70 Note that ∫_R (1 − exp(−α|b|)) db = ∞, so γ⁻¹(1 − exp(−α|b|)) is an improper prior of b. [sent-150, score-0.061]

71 Usually, for the LOG penalty one sets γ = log(1 + α), because the corresponding ψ(|b|) goes from ∥b∥1 to ∥b∥0 as α varies from 0 to ∞. [sent-152, score-0.134]
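
This interpolation is easy to visualize; the short plot below (ours, with arbitrarily chosen α values) shows the normalized LOG penalty log(α|b|+1)/log(1+α) approaching |b| for small α and an ℓ0-like indicator for large α.

```python
import numpy as np
import matplotlib.pyplot as plt

b = np.linspace(-3, 3, 601)
for alpha in [0.1, 1.0, 10.0, 100.0]:
    psi = np.log(alpha * np.abs(b) + 1.0) / np.log(1.0 + alpha)  # gamma = log(1 + alpha)
    plt.plot(b, psi, label=f"alpha={alpha}")
plt.plot(b, np.abs(b), "k--", label="|b| (l1 limit)")
plt.legend(); plt.xlabel("b"); plt.ylabel("normalized LOG penalty")
plt.show()
```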

72 In this section we have shown an interesting connection between the LOG and EXP penalties based on the relationship between the Gamma and compound Poisson subordinators. [sent-162, score-0.352]

73 Subordinators help us establish a direct connection between the tuning hyperparameters tj and the latent variables ηj (T(tj)). [sent-163, score-0.248]

74 However, when we implement the MAP estimation, it is challenging to select these tuning hyperparameters. [sent-164, score-0.046]

75 [14] considered the application of concave conjugates in developing variational EM algorithms for non-Gaussian latent variable models. [sent-166, score-0.231]

76 In the next section we rederive the nonconvex LOG and EXP penalties via the concave conjugate. [sent-167, score-0.634]

77 4 A View of Concave Conjugate. Our derivation of the LOG and EXP penalties is based on the Kullback-Leibler (KL) distance. [sent-169, score-0.164]

78 Given two nonnegative vectors a = (a1, . . . , ap)T and s = (s1, . . . , sp)T, the KL distance between them is KL(a, s) = ∑_{j=1}^p [aj log(aj/sj) − aj + sj], where 0 log 0 = 0. [sent-176, score-0.442]

79 Let a = (a1, . . . , ap)T be a nonnegative vector and |b| = (|b1|, . . . , |bp|)T. [sent-181, score-0.061]

80 Then, ∑_{j=1}^p aj ψ(|bj|) = ∑_{j=1}^p (aj/α) log(α|bj| + 1) = min_{w≥0} { wT|b| + (1/α) KL(a, w) }, attained when wj = aj/(1 + α|bj|), and ∑_{j=1}^p aj ψ(|bj|) = ∑_{j=1}^p (aj/α) [1 − exp(−α|bj|)] = min_{w≥0} { wT|b| + (1/α) KL(w, a) }, attained when wj = aj exp(−α|bj|). [sent-185, score-1.059]

81 When setting aj = α tj , we readily see the LOG and EXP penalties. [sent-186, score-0.258]

82 Since KL(a, w) is strictly convex in either w or a, the LOG and EXP penalties are respectively the concave conjugates of −α⁻¹KL(a, w) and −α⁻¹KL(w, a). [sent-188, score-0.369]
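
These variational identities can be checked numerically. The sketch below (ours, with arbitrary test values for a, |b| and α) plugs the stated minimizers into the right-hand sides and compares with the closed-form left-hand sides.

```python
import numpy as np

def kl(a, s):
    """KL(a, s) = sum_j a_j log(a_j/s_j) - a_j + s_j  (with 0 log 0 = 0)."""
    return np.sum(np.where(a > 0, a * np.log(a / s), 0.0) - a + s)

rng = np.random.default_rng(3)
alpha = 2.0
a = rng.uniform(0.5, 2.0, size=4)
b_abs = rng.uniform(0.0, 3.0, size=4)

# LOG: sum_j (a_j/alpha) log(alpha|b_j|+1) = min_{w>=0} w.|b| + KL(a, w)/alpha
w_log = a / (1.0 + alpha * b_abs)
lhs_log = np.sum(a / alpha * np.log(alpha * b_abs + 1.0))
rhs_log = w_log @ b_abs + kl(a, w_log) / alpha

# EXP: sum_j (a_j/alpha)(1 - exp(-alpha|b_j|)) = min_{w>=0} w.|b| + KL(w, a)/alpha
w_exp = a * np.exp(-alpha * b_abs)
lhs_exp = np.sum(a / alpha * (1.0 - np.exp(-alpha * b_abs)))
rhs_exp = w_exp @ b_abs + kl(w_exp, a) / alpha

print(np.isclose(lhs_log, rhs_log), np.isclose(lhs_exp, rhs_exp))

# spot-check that perturbing w away from the minimizer only increases the objective
w_pert = w_log * 1.1
assert w_pert @ b_abs + kl(a, w_pert) / alpha >= rhs_log - 1e-12
```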

83 The construction method for the nonconvex penalties provides us with a new approach for solving the corresponding penalized regression model. [sent-189, score-0.624]

84 In particular, to solve the nonconvex penalized regression problem min_b { J(b, a) ≜ (1/2)∥y − Xb∥₂² + ∑_{j=1}^p aj ψ(|bj|) } (5), we equivalently formulate it as min_{b,a} min_{w≥0} { (1/2)∥y − Xb∥₂² + wT|b| + (1/α) D(w, a) }. [sent-190, score-0.569]

85 The first step calculates w(k) via w(k) = argmin_{w>0} { ∑_{j=1}^p wj |bj(k)| + (1/α) D(w, a(k)) }. In particular, wj(k) = aj(k)/(1 + α|bj(k)|) in LOG, while wj(k) = aj(k) exp(−α|bj(k)|) in EXP. [sent-201, score-0.054]

86 The second step then calculates (b(k+1), a(k+1)) via (b(k+1), a(k+1)) = argmin_{b,a} { (1/2)∥y − Xb∥₂² + |b|T w(k) + (1/α) D(w(k), a) }. [sent-203, score-0.054]

87 Namely, a(k+1) = w(k) and b(k+1) = argmin_b { (1/2)∥y − Xb∥₂² + ∑_{j=1}^p wj(k) |bj| }. [sent-206, score-0.092]
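
Putting the two steps together gives a simple alternating scheme. The sketch below is our own illustration, not the authors' implementation: it uses the LOG weight update wj = aj/(1 + α|bj|) and solves the weighted-ℓ1 subproblem by coordinate descent with soft-thresholding; all names and toy data are assumptions.

```python
import numpy as np

def soft_threshold(z, thr):
    return np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)

def weighted_lasso_cd(X, y, w, b0, n_iter=200):
    """Coordinate descent for min_b 0.5*||y - Xb||^2 + sum_j w_j |b_j|."""
    b = b0.copy()
    col_sq = np.sum(X ** 2, axis=0)
    r = y - X @ b
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            r += X[:, j] * b[j]                       # partial residual without coordinate j
            b[j] = soft_threshold(X[:, j] @ r, w[j]) / col_sq[j]
            r -= X[:, j] * b[j]                       # restore full residual
    return b

def log_penalty_alternating(X, y, a, alpha, n_outer=20):
    """Alternate closed-form w-updates with weighted-lasso b-updates (LOG penalty)."""
    b = np.zeros(X.shape[1])
    for _ in range(n_outer):
        w = a / (1.0 + alpha * np.abs(b))             # step 1: w^(k)
        b = weighted_lasso_cd(X, y, w, b)             # step 2: b^(k+1)
        a = w                                         # a^(k+1) = w^(k)
    return b

rng = np.random.default_rng(4)
X = rng.standard_normal((50, 10))
b_true = np.zeros(10); b_true[[0, 3]] = [3.0, -2.0]
y = X @ b_true + 0.1 * rng.standard_normal(50)
print(np.round(log_penalty_alternating(X, y, a=np.full(10, 1.0), alpha=2.0), 2))
```

Any weighted-ℓ1 solver could replace the inner loop; the point is that the outer loop adaptively rescales the weights, which is how the tuning hyperparameters are updated without a grid search.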

88 Recall that the LOG and EXP penalties are differentiable and strictly concave in |b| on [0, ∞). [sent-207, score-0.284]

89 In the second experiment, we apply our methods to regression problems on four datasets from the UCI Machine Learning Repository and the cookie (Near-Infrared (NIR) Spectroscopy of Biscuit Doughs) dataset [7]. [sent-230, score-0.054]

90 We report the mean and standard deviation of the Root Mean Square Error (RMSE) and the model sparsity (proportion of zero coefficients in the model) in Tables 1 and 2. [sent-232, score-0.056]

91 We form four different datasets for the four responses (“fat”, “sucrose”, “dry flour” and “water”) in the experiment, and report the RMSE on the test set and the model sparsity in Table 3. [sent-234, score-0.079]

92 But the nonconvex LOG, EXP and MCP penalties show strong feature-selection ability. [sent-236, score-0.35]

93 Conclusion. In this paper we have introduced subordinators of Lévy processes into the definition of nonconvex penalties. [sent-360, score-0.875]

94 This leads us to a Bayesian nonparametric approach for constructing sparsity-inducing penalties. [sent-361, score-0.068]

95 Along this line, it would be interesting to investigate other penalty functions via subordinators and compare the performance of these penalties. [sent-363, score-0.4]

96 Feature selection via concave minimization and support vector machines. [sent-382, score-0.148]

97 Variable selection via nonconcave penalized likelihood and its Oracle properties. [sent-397, score-0.116]

98 Application of near-infrared reflectance spectroscopy to compositional analysis of biscuits and biscuit dough. [sent-405, score-0.087]

99 Group sparse coding with a Laplacian scale mixture prior. [sent-419, score-0.051]

100 Local shrinkage rules, Lévy processes, and regularized regression. [sent-462, score-0.27]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('nonconvex', 0.35), ('bj', 0.272), ('laplace', 0.268), ('subordinators', 0.266), ('subordinator', 0.242), ('vy', 0.219), ('mcp', 0.218), ('adlasso', 0.193), ('exp', 0.18), ('penalties', 0.164), ('compound', 0.154), ('nir', 0.149), ('aj', 0.143), ('penalty', 0.134), ('poisson', 0.127), ('kl', 0.126), ('xb', 0.123), ('spe', 0.121), ('concave', 0.12), ('tj', 0.115), ('gamma', 0.114), ('bernstein', 0.106), ('exponents', 0.105), ('fse', 0.085), ('conjugates', 0.085), ('rmse', 0.078), ('lasso', 0.076), ('du', 0.075), ('lla', 0.073), ('exponent', 0.072), ('dt', 0.07), ('snr', 0.07), ('log', 0.067), ('wj', 0.067), ('ind', 0.064), ('po', 0.063), ('improper', 0.061), ('sparsity', 0.056), ('zou', 0.053), ('shrinkage', 0.051), ('su', 0.05), ('biscuit', 0.048), ('pyrim', 0.048), ('sparsenet', 0.048), ('sucrose', 0.048), ('triazines', 0.048), ('tuning', 0.046), ('nonparametric', 0.045), ('penalized', 0.045), ('mixtures', 0.044), ('bayesian', 0.043), ('ga', 0.043), ('dry', 0.043), ('polson', 0.043), ('nonconcave', 0.043), ('scad', 0.043), ('processes', 0.04), ('sparsityinducing', 0.039), ('fat', 0.039), ('spectroscopy', 0.039), ('density', 0.039), ('bridge', 0.038), ('nonnegative', 0.038), ('caron', 0.037), ('abalone', 0.035), ('housing', 0.035), ('connection', 0.034), ('nondifferentiable', 0.034), ('palmer', 0.034), ('construction', 0.034), ('proper', 0.033), ('nondecreasing', 0.032), ('regression', 0.031), ('st', 0.03), ('water', 0.03), ('grif', 0.03), ('mixture', 0.029), ('logarithmic', 0.029), ('calculates', 0.029), ('selection', 0.028), ('moreover', 0.027), ('hyperparameters', 0.027), ('lemma', 0.027), ('latent', 0.026), ('wt', 0.026), ('argmin', 0.025), ('uk', 0.025), ('bp', 0.025), ('monotone', 0.025), ('plays', 0.024), ('china', 0.024), ('accordingly', 0.024), ('ap', 0.023), ('datasets', 0.023), ('constructing', 0.023), ('theorem', 0.023), ('sparse', 0.022), ('intensity', 0.022), ('lemmas', 0.022), ('sj', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 244 nips-2012-Nonconvex Penalization Using Laplace Exponents and Concave Conjugates

Author: Zhihua Zhang, Bojun Tu

Abstract: In this paper we study sparsity-inducing nonconvex penalty functions using Lévy processes. We define such a penalty as the Laplace exponent of a subordinator. Accordingly, we propose a novel approach for the construction of sparsity-inducing nonconvex penalties. Particularly, we show that the nonconvex logarithmic (LOG) and exponential (EXP) penalty functions are the Laplace exponents of Gamma and compound Poisson subordinators, respectively. Additionally, we explore the concave conjugate of nonconvex penalties. We find that the LOG and EXP penalties are the concave conjugates of negative Kullback-Leibler (KL) distance functions. Furthermore, the relationship between these two penalties is due to the asymmetry of the KL distance.

2 0.10465504 247 nips-2012-Nonparametric Reduced Rank Regression

Author: Rina Foygel, Michael Horrell, Mathias Drton, John D. Lafferty

Abstract: We propose an approach to multivariate nonparametric regression that generalizes reduced rank regression for linear models. An additive model is estimated for each dimension of a q-dimensional response, with a shared p-dimensional predictor variable. To control the complexity of the model, we employ a functional form of the Ky-Fan or nuclear norm, resulting in a set of function estimates that have low rank. Backfitting algorithms are derived and justified using a nonparametric form of the nuclear norm subdifferential. Oracle inequalities on excess risk are derived that exhibit the scaling behavior of the procedure in the high dimensional setting. The methods are illustrated on gene expression data. 1

3 0.098225035 59 nips-2012-Bayesian nonparametric models for bipartite graphs

Author: Francois Caron

Abstract: We develop a novel Bayesian nonparametric model for random bipartite graphs. The model is based on the theory of completely random measures and is able to handle a potentially infinite number of nodes. We show that the model has appealing properties and in particular it may exhibit a power-law behavior. We derive a posterior characterization, a generative process for network growth, and a simple Gibbs sampler for posterior simulation. Our model is shown to be well fitted to several real-world social networks. 1

4 0.09674345 37 nips-2012-Affine Independent Variational Inference

Author: Edward Challis, David Barber

Abstract: We consider inference in a broad class of non-conjugate probabilistic models based on minimising the Kullback-Leibler divergence between the given target density and an approximating ‘variational’ density. In particular, for generalised linear models we describe approximating densities formed from an affine transformation of independently distributed latent variables, this class including many well known densities as special cases. We show how all relevant quantities can be efficiently computed using the fast Fourier transform. This extends the known class of tractable variational approximations and enables the fitting for example of skew variational densities to the target density. 1

5 0.096281417 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model

Author: Yi Wu, David P. Wipf

Abstract: Sparse linear (or generalized linear) models combine a standard likelihood function with a sparse prior on the unknown coefficients. These priors can conveniently be expressed as a maximization over zero-mean Gaussians with different variance hyperparameters. Standard MAP estimation (Type I) involves maximizing over both the hyperparameters and coefficients, while an empirical Bayesian alternative (Type II) first marginalizes the coefficients and then maximizes over the hyperparameters, leading to a tractable posterior approximation. The underlying cost functions can be related via a dual-space framework from [22], which allows both the Type I or Type II objectives to be expressed in either coefficient or hyperparmeter space. This perspective is useful because some analyses or extensions are more conducive to development in one space or the other. Herein we consider the estimation of a trade-off parameter balancing sparsity and data fit. As this parameter is effectively a variance, natural estimators exist by assessing the problem in hyperparameter (variance) space, transitioning natural ideas from Type II to solve what is much less intuitive for Type I. In contrast, for analyses of update rules and sparsity properties of local and global solutions, as well as extensions to more general likelihood models, we can leverage coefficient-space techniques developed for Type I and apply them to Type II. For example, this allows us to prove that Type II-inspired techniques can be successful recovering sparse coefficients when unfavorable restricted isometry properties (RIP) lead to failure of popular ℓ1 reconstructions. It also facilitates the analysis of Type II when non-Gaussian likelihood models lead to intractable integrations. 1

6 0.094233342 300 nips-2012-Scalable nonconvex inexact proximal splitting

7 0.082730621 138 nips-2012-Fully Bayesian inference for neural models with negative-binomial spiking

8 0.07834667 60 nips-2012-Bayesian nonparametric models for ranked data

9 0.073407792 73 nips-2012-Coding efficiency and detectability of rate fluctuations with non-Poisson neuronal firing

10 0.072010532 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance

11 0.071956068 127 nips-2012-Fast Bayesian Inference for Non-Conjugate Gaussian Process Regression

12 0.070774212 47 nips-2012-Augment-and-Conquer Negative Binomial Processes

13 0.069935106 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models

14 0.062618554 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

15 0.062450312 13 nips-2012-A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function

16 0.057037055 64 nips-2012-Calibrated Elastic Regularization in Matrix Completion

17 0.056434441 262 nips-2012-Optimal Neural Tuning Curves for Arbitrary Stimulus Distributions: Discrimax, Infomax and Minimum $L p$ Loss

18 0.056375172 278 nips-2012-Probabilistic n-Choose-k Models for Classification and Ranking

19 0.053381346 206 nips-2012-Majorization for CRFs and Latent Likelihoods

20 0.051782187 349 nips-2012-Training sparse natural image models with a fast Gibbs sampler of an extended state space


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.154), (1, 0.043), (2, 0.053), (3, 0.0), (4, -0.065), (5, 0.035), (6, 0.0), (7, -0.0), (8, 0.045), (9, -0.028), (10, -0.033), (11, 0.028), (12, 0.015), (13, -0.031), (14, 0.023), (15, -0.025), (16, 0.034), (17, -0.037), (18, -0.039), (19, -0.06), (20, 0.019), (21, 0.047), (22, 0.012), (23, 0.03), (24, -0.029), (25, 0.01), (26, 0.022), (27, -0.063), (28, -0.082), (29, -0.019), (30, -0.115), (31, -0.003), (32, 0.063), (33, 0.028), (34, 0.089), (35, 0.119), (36, 0.05), (37, -0.057), (38, 0.037), (39, 0.072), (40, -0.175), (41, 0.051), (42, 0.052), (43, 0.002), (44, -0.043), (45, 0.012), (46, -0.066), (47, -0.058), (48, 0.065), (49, -0.003)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.92822665 244 nips-2012-Nonconvex Penalization Using Laplace Exponents and Concave Conjugates

Author: Zhihua Zhang, Bojun Tu

Abstract: In this paper we study sparsity-inducing nonconvex penalty functions using Lévy processes. We define such a penalty as the Laplace exponent of a subordinator. Accordingly, we propose a novel approach for the construction of sparsity-inducing nonconvex penalties. Particularly, we show that the nonconvex logarithmic (LOG) and exponential (EXP) penalty functions are the Laplace exponents of Gamma and compound Poisson subordinators, respectively. Additionally, we explore the concave conjugate of nonconvex penalties. We find that the LOG and EXP penalties are the concave conjugates of negative Kullback-Leibler (KL) distance functions. Furthermore, the relationship between these two penalties is due to the asymmetry of the KL distance.

2 0.71372575 59 nips-2012-Bayesian nonparametric models for bipartite graphs

Author: Francois Caron

Abstract: We develop a novel Bayesian nonparametric model for random bipartite graphs. The model is based on the theory of completely random measures and is able to handle a potentially infinite number of nodes. We show that the model has appealing properties and in particular it may exhibit a power-law behavior. We derive a posterior characterization, a generative process for network growth, and a simple Gibbs sampler for posterior simulation. Our model is shown to be well fitted to several real-world social networks. 1

3 0.64689606 47 nips-2012-Augment-and-Conquer Negative Binomial Processes

Author: Mingyuan Zhou, Lawrence Carin

Abstract: By developing data augmentation methods unique to the negative binomial (NB) distribution, we unite seemingly disjoint count and mixture models under the NB process framework. We develop fundamental properties of the models and derive efficient Gibbs sampling inference. We show that the gamma-NB process can be reduced to the hierarchical Dirichlet process with normalization, highlighting its unique theoretical, structural and computational advantages. A variety of NB processes with distinct sharing mechanisms are constructed and applied to topic modeling, with connections to existing algorithms, showing the importance of inferring both the NB dispersion and probability parameters. 1

4 0.58991444 247 nips-2012-Nonparametric Reduced Rank Regression

Author: Rina Foygel, Michael Horrell, Mathias Drton, John D. Lafferty

Abstract: We propose an approach to multivariate nonparametric regression that generalizes reduced rank regression for linear models. An additive model is estimated for each dimension of a q-dimensional response, with a shared p-dimensional predictor variable. To control the complexity of the model, we employ a functional form of the Ky-Fan or nuclear norm, resulting in a set of function estimates that have low rank. Backfitting algorithms are derived and justified using a nonparametric form of the nuclear norm subdifferential. Oracle inequalities on excess risk are derived that exhibit the scaling behavior of the procedure in the high dimensional setting. The methods are illustrated on gene expression data. 1

5 0.53633684 219 nips-2012-Modelling Reciprocating Relationships with Hawkes Processes

Author: Charles Blundell, Jeff Beck, Katherine A. Heller

Abstract: We present a Bayesian nonparametric model that discovers implicit social structure from interaction time-series data. Social groups are often formed implicitly, through actions among members of groups. Yet many models of social networks use explicitly declared relationships to infer social structure. We consider a particular class of Hawkes processes, a doubly stochastic point process, that is able to model reciprocity between groups of individuals. We then extend the Infinite Relational Model by using these reciprocating Hawkes processes to parameterise its edges, making events associated with edges co-dependent through time. Our model outperforms general, unstructured Hawkes processes as well as structured Poisson process-based models at predicting verbal and email turn-taking, and military conflicts among nations. 1

6 0.53008646 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models

7 0.51091063 60 nips-2012-Bayesian nonparametric models for ranked data

8 0.49116457 319 nips-2012-Sparse Prediction with the $k$-Support Norm

9 0.47113189 138 nips-2012-Fully Bayesian inference for neural models with negative-binomial spiking

10 0.46861789 73 nips-2012-Coding efficiency and detectability of rate fluctuations with non-Poisson neuronal firing

11 0.46689981 294 nips-2012-Repulsive Mixtures

12 0.45281279 89 nips-2012-Coupling Nonparametric Mixtures via Latent Dirichlet Processes

13 0.44868237 365 nips-2012-Why MCA? Nonlinear sparse coding with spike-and-slab prior for neurally plausible image encoding

14 0.44257301 336 nips-2012-The Coloured Noise Expansion and Parameter Estimation of Diffusion Processes

15 0.43283677 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance

16 0.43087763 37 nips-2012-Affine Independent Variational Inference

17 0.42829159 97 nips-2012-Diffusion Decision Making for Adaptive k-Nearest Neighbor Classification

18 0.42681211 54 nips-2012-Bayesian Probabilistic Co-Subspace Addition

19 0.40962803 206 nips-2012-Majorization for CRFs and Latent Likelihoods

20 0.40573016 359 nips-2012-Variational Inference for Crowdsourcing


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.036), (21, 0.035), (38, 0.102), (39, 0.012), (42, 0.029), (46, 0.325), (54, 0.023), (55, 0.015), (74, 0.041), (76, 0.171), (80, 0.089), (92, 0.033)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.80367213 256 nips-2012-On the connections between saliency and tracking

Author: Vijay Mahadevan, Nuno Vasconcelos

Abstract: A model connecting visual tracking and saliency has recently been proposed. This model is based on the saliency hypothesis for tracking which postulates that tracking is achieved by the top-down tuning, based on target features, of discriminant center-surround saliency mechanisms over time. In this work, we identify three main predictions that must hold if the hypothesis were true: 1) tracking reliability should be larger for salient than for non-salient targets, 2) tracking reliability should have a dependence on the defining variables of saliency, namely feature contrast and distractor heterogeneity, and must replicate the dependence of saliency on these variables, and 3) saliency and tracking can be implemented with common low level neural mechanisms. We confirm that the first two predictions hold by reporting results from a set of human behavior studies on the connection between saliency and tracking. We also show that the third prediction holds by constructing a common neurophysiologically plausible architecture that can computationally solve both saliency and tracking. This architecture is fully compliant with the standard physiological models of V1 and MT, and with what is known about attentional control in area LIP, while explaining the results of the human behavior experiments.

same-paper 2 0.75849164 244 nips-2012-Nonconvex Penalization Using Laplace Exponents and Concave Conjugates

Author: Zhihua Zhang, Bojun Tu

Abstract: In this paper we study sparsity-inducing nonconvex penalty functions using Lévy processes. We define such a penalty as the Laplace exponent of a subordinator. Accordingly, we propose a novel approach for the construction of sparsity-inducing nonconvex penalties. Particularly, we show that the nonconvex logarithmic (LOG) and exponential (EXP) penalty functions are the Laplace exponents of Gamma and compound Poisson subordinators, respectively. Additionally, we explore the concave conjugate of nonconvex penalties. We find that the LOG and EXP penalties are the concave conjugates of negative Kullback-Leibler (KL) distance functions. Furthermore, the relationship between these two penalties is due to the asymmetry of the KL distance.

3 0.73497623 183 nips-2012-Learning Partially Observable Models Using Temporally Abstract Decision Trees

Author: Erik Talvitie

Abstract: This paper introduces timeline trees, which are partial models of partially observable environments. Timeline trees are given some specific predictions to make and learn a decision tree over history. The main idea of timeline trees is to use temporally abstract features to identify and split on features of key events, spread arbitrarily far apart in the past (whereas previous decision-tree-based methods have been limited to a finite suffix of history). Experiments demonstrate that timeline trees can learn to make high quality predictions in complex, partially observable environments with high-dimensional observations (e.g. an arcade game). 1

4 0.73330563 88 nips-2012-Cost-Sensitive Exploration in Bayesian Reinforcement Learning

Author: Dongho Kim, Kee-eung Kim, Pascal Poupart

Abstract: In this paper, we consider Bayesian reinforcement learning (BRL) where actions incur costs in addition to rewards, and thus exploration has to be constrained in terms of the expected total cost while learning to maximize the expected longterm total reward. In order to formalize cost-sensitive exploration, we use the constrained Markov decision process (CMDP) as the model of the environment, in which we can naturally encode exploration requirements using the cost function. We extend BEETLE, a model-based BRL method, for learning in the environment with cost constraints. We demonstrate the cost-sensitive exploration behaviour in a number of simulated problems. 1

5 0.57271391 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC

Author: Ben Calderhead, Mátyás A. Sustik

Abstract: One of the enduring challenges in Markov chain Monte Carlo methodology is the development of proposal mechanisms to make moves distant from the current point, that are accepted with high probability and at low computational cost. The recent introduction of locally adaptive MCMC methods based on the natural underlying Riemannian geometry of such models goes some way to alleviating these problems for certain classes of models for which the metric tensor is analytically tractable, however computational efficiency is not assured due to the necessity of potentially high-dimensional matrix operations at each iteration. In this paper we firstly investigate a sampling-based approach for approximating the metric tensor and suggest a valid MCMC algorithm that extends the applicability of Riemannian Manifold MCMC methods to statistical models that do not admit an analytically computable metric tensor. Secondly, we show how the approximation scheme we consider naturally motivates the use of 1 regularisation to improve estimates and obtain a sparse approximate inverse of the metric, which enables stable and sparse approximations of the local geometry to be made. We demonstrate the application of this algorithm for inferring the parameters of a realistic system of ordinary differential equations using a biologically motivated robust Student-t error model, for which the Expected Fisher Information is analytically intractable. 1

6 0.57177949 203 nips-2012-Locating Changes in Highly Dependent Data with Unknown Number of Change Points

7 0.57137877 188 nips-2012-Learning from Distributions via Support Measure Machines

8 0.57108045 74 nips-2012-Collaborative Gaussian Processes for Preference Learning

9 0.57059318 321 nips-2012-Spectral learning of linear dynamics from generalised-linear observations with application to neural population data

10 0.57029814 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes

11 0.57001847 280 nips-2012-Proper losses for learning from partial labels

12 0.56982058 277 nips-2012-Probabilistic Low-Rank Subspace Clustering

13 0.56974065 294 nips-2012-Repulsive Mixtures

14 0.56921643 232 nips-2012-Multiplicative Forests for Continuous-Time Processes

15 0.56905007 279 nips-2012-Projection Retrieval for Classification

16 0.56895781 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

17 0.5689441 126 nips-2012-FastEx: Hash Clustering with Exponential Families

18 0.56889188 103 nips-2012-Distributed Probabilistic Learning for Camera Networks with Missing Data

19 0.56867361 272 nips-2012-Practical Bayesian Optimization of Machine Learning Algorithms

20 0.56761003 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses