nips nips2012 nips2012-244 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Zhihua Zhang, Bojun Tu
Abstract: In this paper we study sparsity-inducing nonconvex penalty functions using Lévy processes. We define such a penalty as the Laplace exponent of a subordinator. Accordingly, we propose a novel approach for the construction of sparsity-inducing nonconvex penalties. Particularly, we show that the nonconvex logarithmic (LOG) and exponential (EXP) penalty functions are the Laplace exponents of Gamma and compound Poisson subordinators, respectively. Additionally, we explore the concave conjugate of nonconvex penalties. We find that the LOG and EXP penalties are the concave conjugates of negative Kullback-Leibler (KL) distance functions. Furthermore, the relationship between these two penalties is due to the asymmetry of the KL distance.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: In this paper we study sparsity-inducing nonconvex penalty functions using Lévy processes. [sent-3, score-0.703]
2 We define such a penalty as the Laplace exponent of a subordinator. [sent-4, score-0.206]
3 Accordingly, we propose a novel approach for the construction of sparsity-inducing nonconvex penalties. [sent-5, score-0.423]
4 Particularly, we show that the nonconvex logarithmic (LOG) and exponential (EXP) penalty functions are the Laplace exponents of Gamma and compound Poisson subordinators, respectively. [sent-6, score-0.772]
5 Additionally, we explore the concave conjugate of nonconvex penalties. [sent-7, score-0.47]
6 We find that the LOG and EXP penalties are the concave conjugates of negative Kullback-Leibler (KL) distance functions. [sent-8, score-0.369]
7 Furthermore, the relationship between these two penalties is due to the asymmetry of the KL distance. [sent-9, score-0.164]
8 1 Introduction. Variable selection plays a fundamental role in statistical modeling for high-dimensional data sets, especially when the underlying model has a sparse representation. [sent-10, score-0.074]
9 The approach based on penalty theory has been widely used for variable selection in the literature. [sent-11, score-0.162]
10 A principled approach is the lasso of [17], which uses the ℓ1-norm penalty. [sent-12, score-0.076]
11 There has also been work on nonconvex penalties within a Bayesian framework. [sent-14, score-0.514]
12 In particular, they showed that the bridge penalty can be obtained by mixing the Laplace distribution with a stable distribution. [sent-16, score-0.172]
13 Other authors have shown that the prior induced from the LOG penalty has an interpretation as a scale mixture of Laplace distributions with an inverse gamma density [5, 9, 12, 2]. [sent-18, score-0.316]
14 [22] extended this class of Laplace variance mixtures by using a generalized inverse Gaussian density. [sent-20, score-0.044]
15 Our work is motivated by recent developments of Bayesian nonparametric methods in feature selection [10, 18, 4, 15]. [sent-22, score-0.073]
16 In particular, Polson and Scott [15] proposed a nonparametric approach for normal variance mixtures using Lévy processes, which embeds finite-dimensional normal variance mixtures in infinite ones. [sent-23, score-0.352]
17 We develop a Bayesian nonparametric approach for the construction of sparsity-inducing nonconvex penalties. [sent-24, score-0.429]
18 Particularly, we show that Laplace transformations of Lévy processes can be viewed as pseudo-priors and the corresponding Laplace exponents then form sparsity-inducing nonconvex penalties. [sent-25, score-0.714]
19 Moreover, we exemplify that the LOG and EXP penalties can be respectively regarded as Laplace exponents of Gamma and compound Poisson subordinators. [sent-26, score-0.423]
20 This construction recovers an inherent connection between LOG and EXP. [sent-28, score-0.068]
21 Moreover, it provides us with an approach for adaptively updating tuning hyperparameters, which is a very important computational issue in nonconvex sparse penalization. [sent-29, score-0.418]
22 Typically, the multi-stage LLA and SparseNet algorithms with nonconvex penalties [21, 13] implement a two-dimensional grid search, so they incur higher computational costs. [sent-30, score-0.514]
23 2 Lévy Processes for Nonconvex Penalty Functions. Suppose we are given a set of training data {(xi, yi) : i = 1, . [sent-32, score-0.219]
24 We aim to find a sparse estimate of the regression vector b = (b1, . [sent-44, score-0.053]
25 We study in particular the use of Laplace variance mixtures in sparsity modeling. [sent-48, score-0.1]
26 For this purpose, we define a hierarchical model: [bj | ηj, σ] ∼ L(bj | 0, σ(2ηj)^{−1}) independently, [ηj] ∼ p(ηj) i.i.d., p(σ) = "Constant", where the ηj's are known as the local shrinkage parameters and L(b|u, η) denotes a Laplace distribution with density L(b|u, η) = (1/(4η)) exp(−|b − u|/(2η)). [sent-49, score-0.334]
27 The classical regularization framework is based on a penalty function induced from the marginal prior p(bj|σ). [sent-50, score-0.134]
28 Let ψ(|b|) = −log p(b|σ), where p(b|σ) = ∫_0^∞ L(b|0, ση^{−1}) p(η) dη. [sent-51, score-0.067]
29 Then the penalized regression problem is min_b { F(b) ≜ (1/2)‖y − Xb‖₂² + λ ∑_{j=1}^p ψ(|bj|) }. [sent-52, score-0.076]
30 This implies that ψ(|b|) is nondecreasing and concave in |b|. [sent-54, score-0.076]
31 In other words, ψ(|b|) forms a class of nonconvex penalty functions for b. [sent-55, score-0.484]
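The following is a minimal numerical sketch (our own illustration, not the authors' code) of how a Laplace scale mixture induces such a penalty: it mixes L(b | 0, ση⁻¹) over a hypothetical Gamma mixing density p(η) and checks that ψ(|b|) = −log p(b|σ) is nondecreasing and concave in |b|. The Gamma shape/scale, σ, and the evaluation grid are illustrative assumptions.

```python
import numpy as np
from scipy import integrate, stats

sigma = 1.0
# Hypothetical mixing density for the local shrinkage parameter eta
# (shape/scale chosen only for illustration, not taken from the paper).
mixing = stats.gamma(a=2.0, scale=1.0)

def marginal_prior(b):
    """p(b | sigma) = int_0^inf L(b | 0, sigma/eta) p(eta) d eta,
    with L(b | 0, sigma/eta) = eta/(4*sigma) * exp(-eta*|b|/(2*sigma))."""
    integrand = lambda eta: (eta / (4.0 * sigma)
                             * np.exp(-eta * abs(b) / (2.0 * sigma))
                             * mixing.pdf(eta))
    val, _ = integrate.quad(integrand, 0.0, np.inf)
    return val

bs = np.linspace(0.01, 5.0, 60)
psi = np.array([-np.log(marginal_prior(b)) for b in bs])
# The induced penalty should be nondecreasing and concave in |b|.
print(np.all(np.diff(psi) > 0), np.all(np.diff(psi, 2) < 1e-6))
```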
32 Motivated by use of Bayesian nonparametrics in sparsity modeling, we now explore Laplace scale mixtures by relating η with a subordinator. [sent-56, score-0.1]
33 We thus have a Bayesian nonparametric formulation for the construction of joint priors of the bj ’s. [sent-57, score-0.351]
34 Lemma 1 Let ν be a Lévy measure such that ∫_0^∞ min(u, 1) ν(du) < ∞. [sent-63, score-0.219]
35 Roughly speaking, a subordinator is a one-dimensional Lévy process that is non-decreasing (a.s.). [sent-67, score-0.461]
36 An important property of subordinators is given in the following lemma. [sent-70, score-0.266]
37 Equation (2) reads ψ(s) = βs + ∫_0^∞ (1 − exp(−su)) ν(du); here β ≥ 0 and ν is the Lévy measure defined in Lemma 1. [sent-72, score-0.219]
38 The function ψ in (2) is usually called the Laplace exponent of the subordinator and it satisfies ψ(0) = 0. [sent-75, score-0.314]
39 Lemma 1 implies that the Laplace exponent ψ is a Bernstein function and the corresponding Laplace transformation exp(−tψ(s)) is completely monotone. [sent-76, score-0.072]
40 Recall that the Laplace exponent ψ(s) is nonnegative, nondecreasing and concave on (0, ∞). [sent-77, score-0.224]
41 Thus, if we let s = |b|, then ψ(|b|) defines a nonconvex penalty function of b on (−∞, ∞). [sent-78, score-0.484]
42 If ψ(s) = 0, then ψ(|b|) is a nondifferentiable and nonconvex function of b on (−∞, ∞). [sent-85, score-0.384]
43 The subordinator T (t) plays the same role as the local shrinkage parameter η, which is also called a latent variable. [sent-87, score-0.343]
44 Moreover, we will see that t plays the role of a tuning hyperparameter. [sent-88, score-0.07]
45 Theorem 1 shows an explicit relationship between the local shrinkage parameter and the corresponding tuning hyperparameter; i. [sent-89, score-0.097]
46 Thus, if ∫_0^∞ T(t)^{−1} p(T(t)) dT(t) = 1/C < ∞, then p∗(T(t)) ≜ C T(t)^{−1} p(T(t)) defines a new proper density for T(t). [sent-93, score-0.072]
47 If ∫_0^∞ T(t)^{−1} p(T(t)) dT(t) = ∞, then p∗(T(t)) ∝ T(t)^{−1} p(T(t)) defines an improper density for T(t). [sent-97, score-0.1]
48 Thus, the improper prior exp(−tψ(|b|)) is a mixture of L(b|0, (2T (t))−1 ) with p∗ (T (t)). [sent-98, score-0.09]
49 If ∫_0^∞ exp(−tψ(s)) ds is infinite, then exp(−tψ(|b|)) is an improper density w. [sent-99, score-0.1]
50 2.2 The MAP Estimation. Based on the subordinator given in the previous subsection, we rewrite the hierarchical representation for the joint prior of the bj's under the regression framework. [sent-106, score-0.545]
51 That is, [bj | ηj, σ] ∼ L(bj | 0, σ(2ηj)^{−1}) independently, p∗(ηj) ∝ σ ηj^{−1} p(ηj), which is equivalent to [bj, ηj | σ] ∝ exp(−(ηj/σ)|bj|) p(ηj) independently over j. [sent-107, score-0.308]
52 The joint marginal pseudo-prior of the bj's is p∗(b|σ) = ∏_{j=1}^p ∫_0^∞ exp(−(ηj/σ)|bj|) P(ηj) dηj = ∏_{j=1}^p exp(−tj ψ(|bj|/σ)). [sent-109, score-0.567]
53 p∗ (b|σ) = exp − |bj | P (ηj )dηj = σ σ 0 j=1 j=1 Thus, the MAP estimate of b is based on the following optimization problem min b {1 2 ∥y − Xb∥2 + σ 2 p ∑ } tj ψ(|bj |/σ) . [sent-110, score-0.295]
54 j=1 Clearly, the tj ’s are tuning hyperparameters and the ηj ’s are latent variables. [sent-111, score-0.214]
55 Moreover, it is interesting that ηj (T (tj )) is defined as a subordinator w. [sent-112, score-0.242]
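To spell out where this objective comes from, here is a short derivation sketch of our own; it assumes a Gaussian likelihood y | b, σ ∼ N(Xb, σ²Iₙ), which is not stated explicitly in the extracted text:

```latex
% Negative log-posterior under the pseudo-prior \prod_{j} \exp\{-t_j \psi(|b_j|/\sigma)\}
% and the assumed Gaussian likelihood y \mid b, \sigma \sim N(Xb, \sigma^2 I_n):
-\log p(b \mid y, \sigma) \;\propto\; \frac{1}{2\sigma^2}\lVert y - Xb\rVert_2^2
  \;+\; \sum_{j=1}^{p} t_j\, \psi(|b_j|/\sigma).
% Multiplying through by \sigma^2 yields the MAP problem quoted above:
\min_{b}\ \Bigl\{ \tfrac{1}{2}\lVert y - Xb\rVert_2^2
  + \sigma^2 \sum_{j=1}^{p} t_j\, \psi(|b_j|/\sigma) \Bigr\}.
```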
56 3 Gamma and Compound Poisson Subordinators. In [15], the authors discussed the use of α-stable subordinators and inverted-beta subordinators. [sent-116, score-0.266]
57 In this section we study applications of Gamma and Compound Poisson subordinators in constructing nonconvex penalty functions. [sent-117, score-0.773]
58 We establish an interesting connection of these two subordinators with nonconvex logarithmic (LOG) and exponential (EXP) penalties. [sent-118, score-0.679]
59 Particularly, these two penalties are the Laplace exponents of the two subordinators, respectively. [sent-119, score-0.269]
60 3.1 The LOG penalty and Gamma Subordinator. The log-penalty function is defined by ψ(|b|) = (1/γ) log(α|b| + 1), with α, γ > 0. [sent-121, score-0.201]
61 Thus, it is the Laplace exponent of a subordinator. [sent-123, score-0.072]
62 Then, (1/γ) log(αs + 1) = ∫_0^∞ [1 − exp(−su)] ν(du), where the Lévy measure ν is ν(du) = (1/(γu)) exp(−u/α) du. [sent-126, score-0.286]
63 Furthermore, if t > γ, we can form the pseudo-prior as a proper distribution, which is the mixture of L(b|0, T(t)^{−1}) with the Gamma distribution Ga(T(t) | γ^{−1}t − 1, α). [sent-131, score-0.062]
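As a quick sanity check (our own, not from the paper), the sketch below numerically verifies that this Lévy measure reproduces the LOG penalty through the Laplace-exponent formula ψ(s) = ∫₀^∞ (1 − exp(−su)) ν(du); the values of α, γ and s are illustrative.

```python
import numpy as np
from scipy import integrate

alpha, gamma_ = 2.0, 0.5  # illustrative values

def log_penalty(s):
    return np.log(alpha * s + 1.0) / gamma_

def laplace_exponent(s):
    """psi(s) = int_0^inf (1 - exp(-s*u)) * exp(-u/alpha) / (gamma * u) du (a Frullani integral)."""
    # max(u, 1e-12) guards against an evaluation exactly at u = 0.
    integrand = lambda u: (1.0 - np.exp(-s * u)) * np.exp(-u / alpha) / (gamma_ * max(u, 1e-12))
    val, _ = integrate.quad(integrand, 0.0, np.inf)
    return val

for s in [0.1, 1.0, 5.0]:
    print(s, log_penalty(s), laplace_exponent(s))  # the last two columns should agree
```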
64 Then T(t) ≜ Z(1) + · · · + Z(K(t)) for t ≥ 0 follows a compound Poisson distribution (denoted T(t) ∼ Po(T(t) | λt, µZ)). [sent-141, score-0.154]
65 We then call {T (t) : t ≥ 0} the compound Poisson process. [sent-142, score-0.154]
66 A compound Poisson process is a subordinator if and only if the Z(k) are nonnegative random variables [16]. [sent-144, score-0.434]
67 In this section we employ the compound Poisson process to explore the EXP penalty, which is ψ(|b|) = (1/γ)(1 − exp(−α|b|)), with α, γ > 0. [sent-145, score-0.154]
68 Then ψ(s) = ∫_0^∞ [1 − exp(−su)] ν(du) with the Lévy measure ν(du) = γ^{−1} δα(u) du. [sent-148, score-0.219]
69 Furthermore, exp(−tψ(s)) = ∫_0^∞ exp(−sT(t)) P(T(t)) dT(t), where {T(t) : t ≥ 0} is a compound Poisson subordinator, each T(t) ∼ Po(T(t) | t/γ, δα(·)), and δu(·) is the Dirac delta measure. [sent-149, score-0.154]
70 Note that ∫_ℝ (1 − exp(−α|b|)) db = ∞, so γ^{−1}(1 − exp(−α|b|)) is an improper prior of b. [sent-150, score-0.061]
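The following short derivation (our own restatement of the step above, not verbatim from the paper) spells out why this Lévy measure gives the EXP penalty:

```latex
% With \nu(du) = \gamma^{-1}\delta_\alpha(u)\,du, the Laplace-exponent formula gives
\psi(s) \;=\; \int_0^\infty \bigl[1 - e^{-su}\bigr]\,\nu(du)
        \;=\; \frac{1}{\gamma}\bigl(1 - e^{-\alpha s}\bigr),
% so \psi(|b|) = \gamma^{-1}(1 - e^{-\alpha|b|}) is exactly the EXP penalty.
% Equivalently, E\bigl[e^{-sT(t)}\bigr] = \exp\{-\lambda t\,(1 - E[e^{-sZ}])\}
% with rate \lambda = 1/\gamma and fixed jump size Z \equiv \alpha.
```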
71 Usually, for the LOG penalty one sets γ = log(1 + α), because the corresponding ψ(|b|) goes from ∥b∥1 to ∥b∥0 as α varies from 0 to ∞. [sent-152, score-0.134]
72 In this section we have established an interesting connection between the LOG and EXP penalties based on the relationship between the Gamma and compound Poisson subordinators. [sent-162, score-0.352]
73 Subordinators help us establish a direct connection between the tuning hyperparameters tj and the latent variables ηj (T(tj)). [sent-163, score-0.248]
74 However, when we implement the MAP estimation, it is challenging to select these tuning hyperparameters. [sent-164, score-0.046]
75 [14] considered the application of concave conjugates in developing variational EM algorithms for non-Gaussian latent variable models. [sent-166, score-0.231]
76 In the next section we rederive the nonconvex LOG and EXP penalties via concave conjugate. [sent-167, score-0.634]
77 4 A View of Concave Conjugate. Our derivation of the LOG and EXP penalties is based on the Kullback-Leibler (KL) distance. [sent-169, score-0.164]
78 For two nonnegative vectors a = (a1, . . . , ap)T and s = (s1, . . . , sp)T, the KL distance between them is KL(a, s) = ∑_{j=1}^p [ aj log(aj/sj) − aj + sj ], where 0 log 0 = 0. [sent-176, score-0.442]
79 Let a = (a1, . . . , ap)T be a nonnegative vector and |b| = (|b1|, . [sent-181, score-0.061]
80 Then, ∑_{j=1}^p aj ψ(|bj|) = (1/α) ∑_{j=1}^p aj log(α|bj| + 1) = min_{w≥0} { wT|b| + (1/α) KL(a, w) }, attained at wj = aj/(1 + α|bj|); and ∑_{j=1}^p aj ψ(|bj|) = (1/α) ∑_{j=1}^p aj [1 − exp(−α|bj|)] = min_{w≥0} { wT|b| + (1/α) KL(w, a) }, attained at wj = aj exp(−α|bj|). [sent-185, score-1.059]
81 When setting aj = α tj , we readily see the LOG and EXP penalties. [sent-186, score-0.258]
82 Since KL(a, w) is strictly convex in either w or a, the LOG and EXP penalties are respectively the concave conjugates of −α−1 KL(a, w) and −α−1 KL(w, a). [sent-188, score-0.369]
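As a sanity check on the LOG case (our own working, not text from the paper), minimizing the right-hand side coordinate-wise recovers the stated minimizer and value:

```latex
% Coordinate-wise objective for the LOG case (a_j and |b_j| fixed, w_j \ge 0):
f(w_j) \;=\; w_j |b_j| + \tfrac{1}{\alpha}\Bigl(a_j \log\tfrac{a_j}{w_j} - a_j + w_j\Bigr),
\qquad
f'(w_j) \;=\; |b_j| + \tfrac{1}{\alpha}\Bigl(1 - \tfrac{a_j}{w_j}\Bigr) = 0
\;\Longrightarrow\; w_j^\star = \frac{a_j}{1 + \alpha |b_j|}.
% Substituting w_j^\star back gives f(w_j^\star) = \tfrac{a_j}{\alpha}\log(\alpha|b_j| + 1),
% i.e. the LOG term; the EXP case follows the same pattern with KL(w, a) in place of KL(a, w).
```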
83 The construction method for the nonconvex penalties provides us with a new approach for solving the corresponding penalized regression model. [sent-189, score-0.624]
84 In particular, to solve the nonconvex penalized regression problem min_b { J(b, a) ≜ (1/2)‖y − Xb‖₂² + ∑_{j=1}^p aj ψ(|bj|) } (5), we equivalently formulate it as min_b min_{w≥0} { (1/2)‖y − Xb‖₂² + wT|b| + (1/α) D(w, a) }. [sent-190, score-0.569]
85 The first step calculates w(k) via w(k) = argmin_{w>0} { ∑_{j=1}^p wj |bj(k)| + (1/α) D(w, a(k)) }; in particular, wj(k) = aj(k)/(1 + α|bj(k)|) for LOG, while wj(k) = aj(k) exp(−α|bj(k)|) for EXP. [sent-201, score-0.398]
86 The second step then calculates (b(k+1), a(k+1)) via (b(k+1), a(k+1)) = argmin_{b,a} { (1/2)‖y − Xb‖₂² + |b|T w(k) + (1/α) D(w(k), a) }. [sent-203, score-0.054]
87 Namely, a(k+1) = w(k) and b(k+1) = argmin_b { (1/2)‖y − Xb‖₂² + ∑_{j=1}^p wj(k) |bj| }. [sent-206, score-0.092]
88 j=1 Recall that the LOG and EXP penalties are differentiable and strictly concave in |b| on [0, ∞). [sent-207, score-0.284]
89 In the second experiment, we apply our methods to regression problems on four datasets from the UCI Machine Learning Repository and the cookie (Near-Infrared (NIR) Spectroscopy of Biscuit Doughs) dataset [7]. [sent-230, score-0.054]
90 We report the mean and standard deviation of the Root Mean Square Error (RMSE) and the model sparsity (proportion of zero coefficients in the model) in Tables 1 and 2. [sent-232, score-0.056]
91 We form four different datasets for the four responses (“fat”, “sucrose”, “dry flour” and “water”) in the experiment, and report the RMSE on the test set and the model sparsity in Table 3. [sent-234, score-0.079]
92 But the nonconvex LOG, EXP and MCP penalties have a strong ability to perform feature selection. [sent-236, score-0.35]
93 Conclusion. In this paper we have introduced subordinators of Lévy processes into the definition of nonconvex penalties. [sent-360, score-0.875]
94 This leads us to a Bayesian nonparametric approach for constructing sparsity-inducing penalties. [sent-361, score-0.068]
95 Along this line, it would be interesting to investigate other penalty functions via subordinators and compare the performance of these penalties. [sent-363, score-0.4]
96 Feature selection via concave minimization and support vector machines. [sent-382, score-0.148]
97 Variable selection via nonconcave penalized likelihood and its Oracle properties. [sent-397, score-0.116]
98 Application of near-infrared reflectance spectroscopy to compositional analysis of biscuits and biscuit dough. [sent-405, score-0.087]
99 Group sparse coding with a Laplacian scale mixture prior. [sent-419, score-0.051]
100 Local shrinkage rules, Lévy processes, and regularized regression. [sent-462, score-0.27]
wordName wordTfidf (topN-words)
[('nonconvex', 0.35), ('bj', 0.272), ('laplace', 0.268), ('subordinators', 0.266), ('subordinator', 0.242), ('vy', 0.219), ('mcp', 0.218), ('adlasso', 0.193), ('exp', 0.18), ('penalties', 0.164), ('compound', 0.154), ('nir', 0.149), ('aj', 0.143), ('penalty', 0.134), ('poisson', 0.127), ('kl', 0.126), ('xb', 0.123), ('spe', 0.121), ('concave', 0.12), ('tj', 0.115), ('gamma', 0.114), ('bernstein', 0.106), ('exponents', 0.105), ('fse', 0.085), ('conjugates', 0.085), ('rmse', 0.078), ('lasso', 0.076), ('du', 0.075), ('lla', 0.073), ('exponent', 0.072), ('dt', 0.07), ('snr', 0.07), ('log', 0.067), ('wj', 0.067), ('ind', 0.064), ('po', 0.063), ('improper', 0.061), ('sparsity', 0.056), ('zou', 0.053), ('shrinkage', 0.051), ('su', 0.05), ('biscuit', 0.048), ('pyrim', 0.048), ('sparsenet', 0.048), ('sucrose', 0.048), ('triazines', 0.048), ('tuning', 0.046), ('nonparametric', 0.045), ('penalized', 0.045), ('mixtures', 0.044), ('bayesian', 0.043), ('ga', 0.043), ('dry', 0.043), ('polson', 0.043), ('nonconcave', 0.043), ('scad', 0.043), ('processes', 0.04), ('sparsityinducing', 0.039), ('fat', 0.039), ('spectroscopy', 0.039), ('density', 0.039), ('bridge', 0.038), ('nonnegative', 0.038), ('caron', 0.037), ('abalone', 0.035), ('housing', 0.035), ('connection', 0.034), ('nondifferentiable', 0.034), ('palmer', 0.034), ('construction', 0.034), ('proper', 0.033), ('nondecreasing', 0.032), ('regression', 0.031), ('st', 0.03), ('water', 0.03), ('grif', 0.03), ('mixture', 0.029), ('logarithmic', 0.029), ('calculates', 0.029), ('selection', 0.028), ('moreover', 0.027), ('hyperparameters', 0.027), ('lemma', 0.027), ('latent', 0.026), ('wt', 0.026), ('argmin', 0.025), ('uk', 0.025), ('bp', 0.025), ('monotone', 0.025), ('plays', 0.024), ('china', 0.024), ('accordingly', 0.024), ('ap', 0.023), ('datasets', 0.023), ('constructing', 0.023), ('theorem', 0.023), ('sparse', 0.022), ('intensity', 0.022), ('lemmas', 0.022), ('sj', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 244 nips-2012-Nonconvex Penalization Using Laplace Exponents and Concave Conjugates
Author: Zhihua Zhang, Bojun Tu
Abstract: In this paper we study sparsity-inducing nonconvex penalty functions using Lévy processes. We define such a penalty as the Laplace exponent of a subordinator. Accordingly, we propose a novel approach for the construction of sparsity-inducing nonconvex penalties. Particularly, we show that the nonconvex logarithmic (LOG) and exponential (EXP) penalty functions are the Laplace exponents of Gamma and compound Poisson subordinators, respectively. Additionally, we explore the concave conjugate of nonconvex penalties. We find that the LOG and EXP penalties are the concave conjugates of negative Kullback-Leibler (KL) distance functions. Furthermore, the relationship between these two penalties is due to the asymmetry of the KL distance.
2 0.10465504 247 nips-2012-Nonparametric Reduced Rank Regression
Author: Rina Foygel, Michael Horrell, Mathias Drton, John D. Lafferty
Abstract: We propose an approach to multivariate nonparametric regression that generalizes reduced rank regression for linear models. An additive model is estimated for each dimension of a q-dimensional response, with a shared p-dimensional predictor variable. To control the complexity of the model, we employ a functional form of the Ky-Fan or nuclear norm, resulting in a set of function estimates that have low rank. Backfitting algorithms are derived and justified using a nonparametric form of the nuclear norm subdifferential. Oracle inequalities on excess risk are derived that exhibit the scaling behavior of the procedure in the high dimensional setting. The methods are illustrated on gene expression data. 1
3 0.098225035 59 nips-2012-Bayesian nonparametric models for bipartite graphs
Author: Francois Caron
Abstract: We develop a novel Bayesian nonparametric model for random bipartite graphs. The model is based on the theory of completely random measures and is able to handle a potentially infinite number of nodes. We show that the model has appealing properties and in particular it may exhibit a power-law behavior. We derive a posterior characterization, a generative process for network growth, and a simple Gibbs sampler for posterior simulation. Our model is shown to be well fitted to several real-world social networks. 1
4 0.09674345 37 nips-2012-Affine Independent Variational Inference
Author: Edward Challis, David Barber
Abstract: We consider inference in a broad class of non-conjugate probabilistic models based on minimising the Kullback-Leibler divergence between the given target density and an approximating ‘variational’ density. In particular, for generalised linear models we describe approximating densities formed from an affine transformation of independently distributed latent variables, this class including many well known densities as special cases. We show how all relevant quantities can be efficiently computed using the fast Fourier transform. This extends the known class of tractable variational approximations and enables the fitting for example of skew variational densities to the target density. 1
5 0.096281417 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model
Author: Yi Wu, David P. Wipf
Abstract: Sparse linear (or generalized linear) models combine a standard likelihood function with a sparse prior on the unknown coefficients. These priors can conveniently be expressed as a maximization over zero-mean Gaussians with different variance hyperparameters. Standard MAP estimation (Type I) involves maximizing over both the hyperparameters and coefficients, while an empirical Bayesian alternative (Type II) first marginalizes the coefficients and then maximizes over the hyperparameters, leading to a tractable posterior approximation. The underlying cost functions can be related via a dual-space framework from [22], which allows both the Type I or Type II objectives to be expressed in either coefficient or hyperparmeter space. This perspective is useful because some analyses or extensions are more conducive to development in one space or the other. Herein we consider the estimation of a trade-off parameter balancing sparsity and data fit. As this parameter is effectively a variance, natural estimators exist by assessing the problem in hyperparameter (variance) space, transitioning natural ideas from Type II to solve what is much less intuitive for Type I. In contrast, for analyses of update rules and sparsity properties of local and global solutions, as well as extensions to more general likelihood models, we can leverage coefficient-space techniques developed for Type I and apply them to Type II. For example, this allows us to prove that Type II-inspired techniques can be successful recovering sparse coefficients when unfavorable restricted isometry properties (RIP) lead to failure of popular ℓ1 reconstructions. It also facilitates the analysis of Type II when non-Gaussian likelihood models lead to intractable integrations. 1
6 0.094233342 300 nips-2012-Scalable nonconvex inexact proximal splitting
7 0.082730621 138 nips-2012-Fully Bayesian inference for neural models with negative-binomial spiking
8 0.07834667 60 nips-2012-Bayesian nonparametric models for ranked data
9 0.073407792 73 nips-2012-Coding efficiency and detectability of rate fluctuations with non-Poisson neuronal firing
10 0.072010532 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance
11 0.071956068 127 nips-2012-Fast Bayesian Inference for Non-Conjugate Gaussian Process Regression
12 0.070774212 47 nips-2012-Augment-and-Conquer Negative Binomial Processes
13 0.069935106 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models
14 0.062618554 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models
15 0.062450312 13 nips-2012-A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function
16 0.057037055 64 nips-2012-Calibrated Elastic Regularization in Matrix Completion
17 0.056434441 262 nips-2012-Optimal Neural Tuning Curves for Arbitrary Stimulus Distributions: Discrimax, Infomax and Minimum $L p$ Loss
18 0.056375172 278 nips-2012-Probabilistic n-Choose-k Models for Classification and Ranking
19 0.053381346 206 nips-2012-Majorization for CRFs and Latent Likelihoods
20 0.051782187 349 nips-2012-Training sparse natural image models with a fast Gibbs sampler of an extended state space
topicId topicWeight
[(0, 0.154), (1, 0.043), (2, 0.053), (3, 0.0), (4, -0.065), (5, 0.035), (6, 0.0), (7, -0.0), (8, 0.045), (9, -0.028), (10, -0.033), (11, 0.028), (12, 0.015), (13, -0.031), (14, 0.023), (15, -0.025), (16, 0.034), (17, -0.037), (18, -0.039), (19, -0.06), (20, 0.019), (21, 0.047), (22, 0.012), (23, 0.03), (24, -0.029), (25, 0.01), (26, 0.022), (27, -0.063), (28, -0.082), (29, -0.019), (30, -0.115), (31, -0.003), (32, 0.063), (33, 0.028), (34, 0.089), (35, 0.119), (36, 0.05), (37, -0.057), (38, 0.037), (39, 0.072), (40, -0.175), (41, 0.051), (42, 0.052), (43, 0.002), (44, -0.043), (45, 0.012), (46, -0.066), (47, -0.058), (48, 0.065), (49, -0.003)]
simIndex simValue paperId paperTitle
same-paper 1 0.92822665 244 nips-2012-Nonconvex Penalization Using Laplace Exponents and Concave Conjugates
Author: Zhihua Zhang, Bojun Tu
Abstract: In this paper we study sparsity-inducing nonconvex penalty functions using Lévy processes. We define such a penalty as the Laplace exponent of a subordinator. Accordingly, we propose a novel approach for the construction of sparsity-inducing nonconvex penalties. Particularly, we show that the nonconvex logarithmic (LOG) and exponential (EXP) penalty functions are the Laplace exponents of Gamma and compound Poisson subordinators, respectively. Additionally, we explore the concave conjugate of nonconvex penalties. We find that the LOG and EXP penalties are the concave conjugates of negative Kullback-Leibler (KL) distance functions. Furthermore, the relationship between these two penalties is due to the asymmetry of the KL distance.
2 0.71372575 59 nips-2012-Bayesian nonparametric models for bipartite graphs
Author: Francois Caron
Abstract: We develop a novel Bayesian nonparametric model for random bipartite graphs. The model is based on the theory of completely random measures and is able to handle a potentially infinite number of nodes. We show that the model has appealing properties and in particular it may exhibit a power-law behavior. We derive a posterior characterization, a generative process for network growth, and a simple Gibbs sampler for posterior simulation. Our model is shown to be well fitted to several real-world social networks. 1
3 0.64689606 47 nips-2012-Augment-and-Conquer Negative Binomial Processes
Author: Mingyuan Zhou, Lawrence Carin
Abstract: By developing data augmentation methods unique to the negative binomial (NB) distribution, we unite seemingly disjoint count and mixture models under the NB process framework. We develop fundamental properties of the models and derive efficient Gibbs sampling inference. We show that the gamma-NB process can be reduced to the hierarchical Dirichlet process with normalization, highlighting its unique theoretical, structural and computational advantages. A variety of NB processes with distinct sharing mechanisms are constructed and applied to topic modeling, with connections to existing algorithms, showing the importance of inferring both the NB dispersion and probability parameters. 1
4 0.58991444 247 nips-2012-Nonparametric Reduced Rank Regression
Author: Rina Foygel, Michael Horrell, Mathias Drton, John D. Lafferty
Abstract: We propose an approach to multivariate nonparametric regression that generalizes reduced rank regression for linear models. An additive model is estimated for each dimension of a q-dimensional response, with a shared p-dimensional predictor variable. To control the complexity of the model, we employ a functional form of the Ky-Fan or nuclear norm, resulting in a set of function estimates that have low rank. Backfitting algorithms are derived and justified using a nonparametric form of the nuclear norm subdifferential. Oracle inequalities on excess risk are derived that exhibit the scaling behavior of the procedure in the high dimensional setting. The methods are illustrated on gene expression data. 1
5 0.53633684 219 nips-2012-Modelling Reciprocating Relationships with Hawkes Processes
Author: Charles Blundell, Jeff Beck, Katherine A. Heller
Abstract: We present a Bayesian nonparametric model that discovers implicit social structure from interaction time-series data. Social groups are often formed implicitly, through actions among members of groups. Yet many models of social networks use explicitly declared relationships to infer social structure. We consider a particular class of Hawkes processes, a doubly stochastic point process, that is able to model reciprocity between groups of individuals. We then extend the Infinite Relational Model by using these reciprocating Hawkes processes to parameterise its edges, making events associated with edges co-dependent through time. Our model outperforms general, unstructured Hawkes processes as well as structured Poisson process-based models at predicting verbal and email turn-taking, and military conflicts among nations. 1
6 0.53008646 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models
7 0.51091063 60 nips-2012-Bayesian nonparametric models for ranked data
8 0.49116457 319 nips-2012-Sparse Prediction with the $k$-Support Norm
9 0.47113189 138 nips-2012-Fully Bayesian inference for neural models with negative-binomial spiking
10 0.46861789 73 nips-2012-Coding efficiency and detectability of rate fluctuations with non-Poisson neuronal firing
11 0.46689981 294 nips-2012-Repulsive Mixtures
12 0.45281279 89 nips-2012-Coupling Nonparametric Mixtures via Latent Dirichlet Processes
13 0.44868237 365 nips-2012-Why MCA? Nonlinear sparse coding with spike-and-slab prior for neurally plausible image encoding
14 0.44257301 336 nips-2012-The Coloured Noise Expansion and Parameter Estimation of Diffusion Processes
15 0.43283677 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance
16 0.43087763 37 nips-2012-Affine Independent Variational Inference
17 0.42829159 97 nips-2012-Diffusion Decision Making for Adaptive k-Nearest Neighbor Classification
18 0.42681211 54 nips-2012-Bayesian Probabilistic Co-Subspace Addition
19 0.40962803 206 nips-2012-Majorization for CRFs and Latent Likelihoods
20 0.40573016 359 nips-2012-Variational Inference for Crowdsourcing
topicId topicWeight
[(0, 0.036), (21, 0.035), (38, 0.102), (39, 0.012), (42, 0.029), (46, 0.325), (54, 0.023), (55, 0.015), (74, 0.041), (76, 0.171), (80, 0.089), (92, 0.033)]
simIndex simValue paperId paperTitle
1 0.80367213 256 nips-2012-On the connections between saliency and tracking
Author: Vijay Mahadevan, Nuno Vasconcelos
Abstract: A model connecting visual tracking and saliency has recently been proposed. This model is based on the saliency hypothesis for tracking which postulates that tracking is achieved by the top-down tuning, based on target features, of discriminant center-surround saliency mechanisms over time. In this work, we identify three main predictions that must hold if the hypothesis were true: 1) tracking reliability should be larger for salient than for non-salient targets, 2) tracking reliability should have a dependence on the defining variables of saliency, namely feature contrast and distractor heterogeneity, and must replicate the dependence of saliency on these variables, and 3) saliency and tracking can be implemented with common low level neural mechanisms. We confirm that the first two predictions hold by reporting results from a set of human behavior studies on the connection between saliency and tracking. We also show that the third prediction holds by constructing a common neurophysiologically plausible architecture that can computationally solve both saliency and tracking. This architecture is fully compliant with the standard physiological models of V1 and MT, and with what is known about attentional control in area LIP, while explaining the results of the human behavior experiments.
same-paper 2 0.75849164 244 nips-2012-Nonconvex Penalization Using Laplace Exponents and Concave Conjugates
Author: Zhihua Zhang, Bojun Tu
Abstract: In this paper we study sparsity-inducing nonconvex penalty functions using Lévy processes. We define such a penalty as the Laplace exponent of a subordinator. Accordingly, we propose a novel approach for the construction of sparsity-inducing nonconvex penalties. Particularly, we show that the nonconvex logarithmic (LOG) and exponential (EXP) penalty functions are the Laplace exponents of Gamma and compound Poisson subordinators, respectively. Additionally, we explore the concave conjugate of nonconvex penalties. We find that the LOG and EXP penalties are the concave conjugates of negative Kullback-Leibler (KL) distance functions. Furthermore, the relationship between these two penalties is due to the asymmetry of the KL distance.
3 0.73497623 183 nips-2012-Learning Partially Observable Models Using Temporally Abstract Decision Trees
Author: Erik Talvitie
Abstract: This paper introduces timeline trees, which are partial models of partially observable environments. Timeline trees are given some specific predictions to make and learn a decision tree over history. The main idea of timeline trees is to use temporally abstract features to identify and split on features of key events, spread arbitrarily far apart in the past (whereas previous decision-tree-based methods have been limited to a finite suffix of history). Experiments demonstrate that timeline trees can learn to make high quality predictions in complex, partially observable environments with high-dimensional observations (e.g. an arcade game). 1
4 0.73330563 88 nips-2012-Cost-Sensitive Exploration in Bayesian Reinforcement Learning
Author: Dongho Kim, Kee-eung Kim, Pascal Poupart
Abstract: In this paper, we consider Bayesian reinforcement learning (BRL) where actions incur costs in addition to rewards, and thus exploration has to be constrained in terms of the expected total cost while learning to maximize the expected longterm total reward. In order to formalize cost-sensitive exploration, we use the constrained Markov decision process (CMDP) as the model of the environment, in which we can naturally encode exploration requirements using the cost function. We extend BEETLE, a model-based BRL method, for learning in the environment with cost constraints. We demonstrate the cost-sensitive exploration behaviour in a number of simulated problems. 1
5 0.57271391 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC
Author: Ben Calderhead, Mátyás A. Sustik
Abstract: One of the enduring challenges in Markov chain Monte Carlo methodology is the development of proposal mechanisms to make moves distant from the current point, that are accepted with high probability and at low computational cost. The recent introduction of locally adaptive MCMC methods based on the natural underlying Riemannian geometry of such models goes some way to alleviating these problems for certain classes of models for which the metric tensor is analytically tractable, however computational efficiency is not assured due to the necessity of potentially high-dimensional matrix operations at each iteration. In this paper we firstly investigate a sampling-based approach for approximating the metric tensor and suggest a valid MCMC algorithm that extends the applicability of Riemannian Manifold MCMC methods to statistical models that do not admit an analytically computable metric tensor. Secondly, we show how the approximation scheme we consider naturally motivates the use of 1 regularisation to improve estimates and obtain a sparse approximate inverse of the metric, which enables stable and sparse approximations of the local geometry to be made. We demonstrate the application of this algorithm for inferring the parameters of a realistic system of ordinary differential equations using a biologically motivated robust Student-t error model, for which the Expected Fisher Information is analytically intractable. 1
6 0.57177949 203 nips-2012-Locating Changes in Highly Dependent Data with Unknown Number of Change Points
7 0.57137877 188 nips-2012-Learning from Distributions via Support Measure Machines
8 0.57108045 74 nips-2012-Collaborative Gaussian Processes for Preference Learning
10 0.57029814 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes
11 0.57001847 280 nips-2012-Proper losses for learning from partial labels
12 0.56982058 277 nips-2012-Probabilistic Low-Rank Subspace Clustering
13 0.56974065 294 nips-2012-Repulsive Mixtures
14 0.56921643 232 nips-2012-Multiplicative Forests for Continuous-Time Processes
15 0.56905007 279 nips-2012-Projection Retrieval for Classification
16 0.56895781 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models
17 0.5689441 126 nips-2012-FastEx: Hash Clustering with Exponential Families
18 0.56889188 103 nips-2012-Distributed Probabilistic Learning for Camera Networks with Missing Data
19 0.56867361 272 nips-2012-Practical Bayesian Optimization of Machine Learning Algorithms
20 0.56761003 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses