nips nips2013 nips2013-147 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jie Wang, Jiayu Zhou, Peter Wonka, Jieping Ye
Abstract: Lasso is a widely used regression technique to find sparse representations. When the dimension of the feature space and the number of samples are extremely large, solving the Lasso problem remains challenging. To improve the efficiency of solving large-scale Lasso problems, El Ghaoui and his colleagues have proposed the SAFE rules which are able to quickly identify the inactive predictors, i.e., predictors that have 0 components in the solution vector. Then, the inactive predictors or features can be removed from the optimization problem to reduce its scale. By transforming the standard Lasso to its dual form, it can be shown that the inactive predictors include the set of inactive constraints on the optimal dual solution. In this paper, we propose an efficient and effective screening rule via Dual Polytope Projections (DPP), which is mainly based on the uniqueness and nonexpansiveness of the optimal dual solution due to the fact that the feasible set in the dual space is a convex and closed polytope. Moreover, we show that our screening rule can be extended to identify inactive groups in group Lasso. To the best of our knowledge, there is currently no “exact” screening rule for group Lasso. We have evaluated our screening rule using many real data sets. Results show that our rule is more effective in identifying inactive predictors than existing state-of-the-art screening rules for Lasso. 1
Reference: text
sentIndex sentText sentNum sentScore
1 To improve the efficiency of solving large-scale Lasso problems, El Ghaoui and his colleagues have proposed the SAFE rules, which are able to quickly identify the inactive predictors, i.e., [sent-9, score-0.541]
2 predictors that have 0 components in the solution vector. [sent-11, score-0.213]
3 Then, the inactive predictors or features can be removed from the optimization problem to reduce its scale. [sent-12, score-0.46]
4 By transforming the standard Lasso to its dual form, it can be shown that the inactive predictors include the set of inactive constraints on the optimal dual solution. [sent-13, score-0.944]
5 In this paper, we propose an efficient and effective screening rule via Dual Polytope Projections (DPP), which is mainly based on the uniqueness and nonexpansiveness of the optimal dual solution due to the fact that the feasible set in the dual space is a convex and closed polytope. [sent-14, score-0.771]
6 Moreover, we show that our screening rule can be extended to identify inactive groups in group Lasso. [sent-15, score-0.714]
7 To the best of our knowledge, there is currently no “exact” screening rule for group Lasso. [sent-16, score-0.424]
8 We have evaluated our screening rule using many real data sets. [sent-17, score-0.389]
9 Results show that our rule is more effective in identifying inactive predictors than existing state-of-the-art screening rules for Lasso. [sent-18, score-1.117]
10 The idea of a screening test proposed by El Ghaoui et al. [sent-32, score-0.343]
11 [8] is to first identify inactive predictors that have 0 components in the solution and then remove them from the optimization. [sent-33, score-0.484]
12 In [8], the “SAFE” rule discards x_i when |x_i^T y| < λ − ||x_i||_2 ||y||_2 (λmax − λ)/λmax, (2) where λmax = max_i |x_i^T y| is the largest parameter value such that the solution is nontrivial. [sent-35, score-0.206]
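To make the reconstructed SAFE criterion (2) concrete, here is a minimal NumPy sketch; the function and variable names are ours, and it illustrates the formula rather than reproducing the authors' code.

```python
import numpy as np

def safe_screen(X, y, lam):
    """Boolean mask of predictors kept by the SAFE rule (2); False means provably inactive."""
    corr = np.abs(X.T @ y)                      # |x_i^T y| for every column x_i
    lam_max = corr.max()                        # largest lambda giving a nontrivial solution
    col_norms = np.linalg.norm(X, axis=0)       # ||x_i||_2
    thresh = lam - col_norms * np.linalg.norm(y) * (lam_max - lam) / lam_max
    return corr >= thresh                       # discard columns with corr < thresh
```

Columns flagged False can be removed from X before solving problem (1) without changing its solution.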
13 Tibshirani et al. [21] proposed a set of strong rules which were more effective in identifying inactive predictors. [sent-37, score-0.553]
14 However, it should be noted that the proposed strong rules might mistakenly discard active predictors, i.e., [sent-39, score-0.333]
15 predictors which have nonzero coefficients in the solution vector. [sent-41, score-0.213]
16 Xiang et al. [26, 25] developed a set of screening tests based on the estimation of the optimal dual solution, and they have shown that the SAFE rules are in fact a special case of the general sphere test. [sent-43, score-0.772]
17 In this paper, we develop new efficient and effective screening rules for the Lasso problem; our screening rules are exact in the sense that no active predictors will be discarded. [sent-44, score-1.449]
18 By transforming problem (1) to its dual form, we obtain three geometric observations in the dual space that motivate our approach. [sent-45, score-0.266]
19 First, the active predictors belong to a subset of the active constraints on the optimal dual solution, which is a direct consequence of the KKT conditions. [sent-46, score-0.377]
20 Second, the optimal dual solution is in fact the projection of the scaled response vector onto the feasible set of the dual variables. [sent-47, score-0.408]
21 Third, because the feasible set of the dual variables is closed and convex, the projection is nonexpansive with respect to λ [2], which results in an effective estimation of its variation. [sent-48, score-0.3]
22 Moreover, based on the basic DPP rules, we propose the “Enhanced DPP” rules which are able to detect more inactive features than DPP. [sent-49, score-0.526]
23 We evaluate our screening rules on real data sets from many different applications. [sent-50, score-0.628]
24 The experimental results demonstrate that our rules are more effective in discarding inactive features than existing state-of-the-art screening rules. [sent-51, score-0.927]
25 2 Screening Rules for Lasso via Dual Polytope Projections. In this section, we present the basics of the dual formulation of problem (1), including its geometric properties (Section 2.1). [sent-52, score-0.162]
26 We then present the fundamental DPP screening rule in Section 2.2 (Theorem 2), which can be used to construct screening rules for Lasso. [sent-55, score-0.615]
27 Moreover, we present the enhanced DPP rules in a later part of Section 2. [sent-60, score-0.317]
28 We first transform problem (1) to its dual form (to make the paper self-contained, we provide the detailed derivation of the dual form in the supplemental materials): sup_θ { (1/2)||y||_2^2 − (λ^2/2)||θ − y/λ||_2^2 : |x_i^T θ| ≤ 1, i = 1, 2, ..., p }. (3) [sent-64, score-0.25]
29 From the objective function of the dual problem (3), it is easy to see that the optimal dual solution θ∗ is the feasible θ which is closest to y/λ. [sent-69, score-0.304]
30 We know that the optimal primal and dual solutions satisfy y = Xβ∗ + λθ∗, and the KKT conditions for the Lasso problem (1) are: (θ∗)^T x_i = sign([β∗]_i) if [β∗]_i ≠ 0, and (θ∗)^T x_i ∈ [−1, 1] if [β∗]_i = 0, where [·]_k denotes the k-th component. [sent-72, score-0.217]
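For readability, here is a cleaned-up display of the dual problem (3), the primal-dual link, and the KKT conditions as reconstructed from the fragments above (standard Lasso duality; the LaTeX rendering and the amsmath usage are ours):

```latex
\begin{gather*}
\sup_{\theta}\ \Big\{\tfrac{1}{2}\|y\|_2^2-\tfrac{\lambda^2}{2}\big\|\theta-\tfrac{y}{\lambda}\big\|_2^2
  \ :\ |x_i^T\theta|\le 1,\ i=1,\dots,p\Big\} \tag{3}\\[0.5ex]
y = X\beta^* + \lambda\theta^*, \qquad
x_i^T\theta^* \in
\begin{cases}
  \{\operatorname{sign}([\beta^*]_i)\} & \text{if } [\beta^*]_i \neq 0,\\
  [-1,\,1] & \text{if } [\beta^*]_i = 0.
\end{cases}
\end{gather*}
```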
31 As a result, if |(θ∗)^T x_i| < 1, then [β∗]_i = 0, so x_i is an inactive predictor and can be removed from the optimization. [sent-75, score-0.305]
32 On the other hand, let ∂H(x_i) = {z : z^T x_i = 1} and H(x_i)^− = {z : z^T x_i ≤ 1} be the hyperplane and half-space determined by x_i, respectively. [sent-76, score-0.159]
33 Consider the dual problem (3); the constraints induced by each x_i are equivalent to requiring each feasible θ to lie inside the intersection of H(x_i)^− and H(−x_i)^−. [sent-77, score-0.225]
34 Hence, once we know θ∗, i.e., the projection of y/λ onto F, the predictors in the inactive set on θ∗ can be discarded from the optimization. [sent-92, score-0.527]
35 It is worthwhile to mention that inactive predictors, i.e., [sent-93, score-0.27]
36 predictors that have 0 components in the solution, are not the same as predictors in the inactive set. [sent-95, score-0.632]
37 In fact, by the KKT conditions, predictors in the inactive set must be inactive predictors since they are guaranteed to have 0 components in the solution, but the converse may not be true. [sent-96, score-0.884]
38 2.2 Fundamental Screening Rules via Dual Polytope Projections. Motivated by the above geometric intuitions, we next show how to find the predictors in the inactive set on θ∗. [sent-98, score-0.458]
39 If we know exactly where θ∗ (λ) is, it will be trivial to find the predictors in the inactive set. [sent-100, score-0.465]
40 How can we estimate θ∗(λ) and its inactive set for another value of λ? [sent-103, score-0.252]
41 Given θ∗(λ′), the next theorem shows how to estimate θ∗(λ) and its inactive set for another parameter λ. [sent-112, score-0.27]
42 For the Lasso problem, assume we are given the solution of its dual problem, θ∗(λ′), for a specific λ′. [sent-114, score-0.148]
43 By the dual problem (3), θ∗(λ) is the projection of y/λ onto the feasible set F. [sent-120, score-0.219]
44 Then |x_i^T θ∗(λ)| ≤ |x_i^T(θ∗(λ) − θ∗(λ′))| + |x_i^T θ∗(λ′)| < ||x_i||_2 ||y||_2 |1/λ − 1/λ′| + 1 − ||x_i||_2 ||y||_2 |1/λ − 1/λ′| = 1. From Theorem 2, it is easy to see that our rule is quite flexible, since every θ∗(λ′) would result in a new screening rule. [sent-124, score-0.498]
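Read off from the reconstruction above, Theorem 2's screening condition is: given θ∗(λ′), for any λ < λ′ the predictor x_i is inactive whenever |x_i^T θ∗(λ′)| < 1 − ||x_i||_2 ||y||_2 (1/λ − 1/λ′). A minimal NumPy sketch of this check (our naming and our reconstruction, not the authors' code):

```python
import numpy as np

def dpp_screen(X, y, theta_prev, lam_prev, lam):
    """DPP rule (Theorem 2, as reconstructed): theta_prev = theta*(lam_prev), lam < lam_prev.
    Returns a boolean mask; False marks predictors that are provably inactive at lam."""
    col_norms = np.linalg.norm(X, axis=0)                          # ||x_i||_2
    radius = np.linalg.norm(y) * abs(1.0 / lam - 1.0 / lam_prev)   # ball radius from nonexpansiveness
    inactive = np.abs(X.T @ theta_prev) < 1.0 - col_norms * radius
    return ~inactive
```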
45 And the smaller the gap between λ and λ′, the more effective the screening rule is. [sent-125, score-0.404]
46 By “more effective”, we mean a stronger capability of identifying inactive predictors. [sent-126, score-0.267]
47 If all predictors are in the inactive set at θ∗(λ), we conclude that the solution to problem (1) is 0. [sent-135, score-0.465]
48 Table 1: Illustration of the running time for DPP screening and for solving the Lasso problem after screening. [sent-149, score-0.357]
49 Considering the DPP rules in Theorem 2, this is equivalent to setting q = θ∗(λ′) and r = ||y||_2 |1/λ − 1/λ′|. [sent-183, score-0.256]
50 Therefore, unlike the sphere test and Dome developed in [26, 25], where the radius r is fixed at the beginning, the construction of our DPP rules amounts to a process in which r decreases. [sent-184, score-0.281]
51 Clearly, the smaller r is, the more effective the DPP rules will be. [sent-185, score-0.286]
52 In Table 1, we report both the time used for screening and the time needed to solve the Lasso problem after screening. [sent-208, score-0.357]
53 From Table 1, we can see that compared with the time saved by the screening rules, the time used for screening is negligible. [sent-210, score-0.702]
54 In practice, DPP rules built on the first few θ∗(λ(i))'s lead to more significant performance improvement than existing state-of-the-art screening tests. [sent-212, score-0.599]
55 We will demonstrate the effectiveness of our DPP rules in the experiment section. [sent-213, score-0.256]
56 Hence we can remove the predictors in the inactive set from the optimization problem (1). [sent-222, score-0.442]
57 Existing screening rules, e.g., [21], [8], solve several Lasso problems for different parameters to improve the screening performance. [sent-226, score-0.46]
58 Thus, different from [21], [8], which need to iteratively compute a screening step and a Lasso step, DPP algorithms only compute one screening step and one Lasso step. [sent-229, score-0.686]
59 Given a sequence of parameter values λ1 > λ2 > ... > λm, we can first apply DPP to discard inactive predictors for the Lasso problem (1) with the parameter being λ1. [sent-235, score-0.459]
60 According to Theorem 2, once we know the optimal dual solution θ∗ (λ1 ), we can construct a new screening rule to identify inactive predictors for problem (1) with λ = λ2 . [sent-239, score-1.022]
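A sketch of this sequential use of Theorem 2 along λ1 > λ2 > ... > λm. It reuses `dpp_screen` from the sketch above, starts from the standard facts that λmax = max_i |x_i^T y| and θ∗(λmax) = y/λmax, and delegates the reduced problem to a generic `lasso_solver` placeholder; all names here are ours.

```python
import numpy as np

def sequential_dpp(X, y, lambdas, lasso_solver):
    """Screen and solve problem (1) down a decreasing sequence of lambdas.

    lasso_solver(X_reduced, y, lam) must return the Lasso coefficients on the
    reduced design; any solver of problem (1) can be plugged in here.
    """
    n, p = X.shape
    lam_max = np.abs(X.T @ y).max()
    theta_prev, lam_prev = y / lam_max, lam_max    # dual optimum at lambda_max
    solutions = []
    for lam in lambdas:                            # lambdas sorted decreasing, all below lam_max
        keep = dpp_screen(X, y, theta_prev, lam_prev, lam)
        beta = np.zeros(p)
        beta[keep] = lasso_solver(X[:, keep], y, lam)
        solutions.append(beta)
        # update the dual solution via the primal-dual relation y = X beta* + lam theta*
        theta_prev, lam_prev = (y - X @ beta) / lam, lam
    return solutions
```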
61 Remark: There are some other related works on screening rules, e.g., [sent-247, score-0.343]
62 [24] built screening rules for ℓ1-penalized logistic regression based on the inner products between the response vector and each predictor; Tibshirani et al. [sent-250, score-0.64]
63 [21] developed strong rules for a set of Lasso-type problems via the inner products between the residual and predictors; in [9], Fan and Lv studied screening rules for Lasso and related problems. [sent-251, score-0.855]
64 But all of the above works may mistakenly discard predictors that have non-zero coefficients in the solution. [sent-252, score-0.236]
65 Similar to [8, 26, 25], our DPP rules are exact in the sense that the predictors discarded by our rules are inactive predictors, i.e., predictors that have 0 components in the solution. [sent-253, score-0.976]
66 From the inequality in (9), we can see that the larger the right hand side is, the more inactive features can be detected. [sent-258, score-0.312]
67 For the Lasso problem, assume we are given the solution of its dual problem, θ∗(λ′), for a specific λ′. [sent-278, score-0.148]
68 Thus, the enhanced DPP is able to detect more inactive features than DPP. [sent-282, score-0.331]
69 3 Extensions to Group Lasso. To demonstrate the flexibility of DPP rules, we extend our idea to the group Lasso problem [27]: inf_{β ∈ R^p} (1/2)||y − Σ_{g=1}^G X_g β_g||_2^2 + λ Σ_{g=1}^G √(n_g) ||β_g||_2, (16) where X_g ∈ R^{N×n_g} is the data matrix for the g-th group and p = Σ_{g=1}^G n_g. [sent-295, score-0.217]
70 The corresponding dual problem of (16) is (see the detailed derivation in the supplemental materials): sup_θ { (1/2)||y||_2^2 − (λ^2/2)||θ − y/λ||_2^2 : ||X_g^T θ||_2 ≤ √(n_g), g = 1, 2, ..., G }. (17) [sent-296, score-0.176]
71 Similar to the Lasso problem, the primal and dual optimal solutions of the group Lasso satisfy y = Σ_{g=1}^G X_g β_g∗ + λθ∗, and the KKT conditions are: X_g^T θ∗ = √(n_g) β_g∗/||β_g∗||_2 if β_g∗ ≠ 0, and X_g^T θ∗ = √(n_g) u for some ||u||_2 ≤ 1 if β_g∗ = 0, for g = 1, 2, ..., G. [sent-299, score-0.293]
72 It is easy to see that the dual optimal θ∗ is the projection of y/λ onto the feasible set. [sent-305, score-0.219]
73 Therefore, the feasible set of the dual problem (17) is the intersection of ellipsoids and thus closed and convex. [sent-307, score-0.196]
74 Hence θ∗(λ) is also nonexpansive with respect to λ for the group Lasso problem. [sent-308, score-0.339]
75 For the group Lasso problem, assume we are given the solution of its dual problem, θ∗(λ′), for a specific λ′. [sent-311, score-0.198]
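The extraction drops the statement of the group-level rule itself, but its shape follows from the KKT conditions above: β_g∗ = 0 whenever ||X_g^T θ∗||_2 < √(n_g). Below is our analogue of the Lasso rule, bounding ||X_g^T θ∗(λ)||_2 through the nonexpansiveness radius ||y||_2 |1/λ − 1/λ′| and the spectral norm of X_g; the exact constants in the paper's group DPP rule may differ.

```python
import numpy as np

def group_dpp_screen(X_groups, y, theta_prev, lam_prev, lam):
    """Per-group screening sketch: True means the group may still be active at lam < lam_prev.

    X_groups   : list of (n, n_g) blocks X_g
    theta_prev : dual solution theta*(lam_prev)
    """
    radius = np.linalg.norm(y) * abs(1.0 / lam - 1.0 / lam_prev)
    keep = []
    for Xg in X_groups:
        n_g = Xg.shape[1]
        # ||X_g^T theta*(lam)||_2 <= ||X_g^T theta_prev||_2 + ||X_g||_2 * radius
        bound = np.linalg.norm(Xg.T @ theta_prev) + np.linalg.norm(Xg, 2) * radius
        keep.append(bound >= np.sqrt(n_g))        # otherwise beta_g* = 0 by the KKT conditions
    return np.array(keep)
```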
76 In Section 4.1, we first evaluate the DPP and DPP∗ rules on both real and synthetic data. [sent-333, score-0.29]
77 We then compare the performance of DPP with Dome (see [25, 26]), which achieves state-of-the-art performance for the Lasso problem among exact screening rules [25]. [sent-334, score-0.599]
78 We are not aware of any “exact” screening rules for the group Lasso problem at this point. [sent-337, score-0.649]
79 Figure 1: Comparison of DPP and DPP∗ rules on the MNIST and COIL data sets; panels (a) MNIST-DPP2/DPP∗2, (b) MNIST-DPP5/DPP∗5, (c) COIL-DPP2/DPP∗2, (d) COIL-DPP5/DPP∗5. [sent-338, score-0.271]
80 To measure the performance of our screening rules, we compute the rejection rate, i.e., [sent-339, score-0.37]
81 the ratio between the number of predictors discarded by screening rules and the actual number of zero predictors in the ground truth. [sent-341, score-0.541]
82 Since our rules are exact, i.e., no active predictors will be mistakenly discarded, the rejection rate will be less than one. [sent-344, score-0.277]
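A small helper for the rejection rate as defined above (predictors discarded by screening divided by the number of truly zero predictors); the names and the tolerance argument are ours.

```python
import numpy as np

def rejection_rate(keep_mask, beta_true, tol=0.0):
    """keep_mask: boolean output of a screening rule (True = kept).
    beta_true: ground-truth solution of problem (1)."""
    n_discarded = np.sum(~keep_mask)
    n_zero = np.sum(np.abs(beta_true) <= tol)   # actual number of zero predictors
    return n_discarded / n_zero if n_zero > 0 else 0.0
```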
83 Similarly to previous works [26], we do not report the computational time saved by screening because it can be easily computed from the rejection ratio. [sent-346, score-0.4]
84 For each data set, after we construct the data matrix X and the response y, we run the screening rules along a sequence of 100 values equally spaced on the λ/λmax scale from 0 to 1. [sent-349, score-0.7]
85 All of the screening rules are implemented in Matlab. [sent-351, score-0.599]
86 4.1 DPPs and DPP∗s for the Lasso Problem. In this experiment, we first compare the performance of the proposed DPP rules with the enhanced DPP rules (DPP∗) on (a) the MNIST handwritten digit data set [13] and (b) the COIL rotational image data set [16]. [sent-355, score-0.603]
87 We show that the DPP∗ rules are more effective in identifying inactive features than the DPP rules. [sent-358, score-0.571]
88 Then we evaluate the DPP∗/SDPP∗ rules and Dome on (c) the ADNI data set; (d) the Olivetti Faces data set [19]; (e) the Yahoo web pages data sets [22]; and (f) a synthetic data set whose entries are drawn i.i.d. [sent-361, score-0.364]
89 All inactive features detected by the DPP rules can also be detected by the DPP∗ rules. [sent-368, score-0.508]
90 Clearly, the predictors in the data sets are highly correlated. [sent-382, score-0.219]
91 4.2 Comparison of DPP∗/SDPP∗ and Dome. In this experiment, we compare the DPP∗/SDPP∗ rules with Dome. [sent-388, score-0.256]
92 We only report the performance of DPP∗ 5 and DPP∗ 10 among the family of DPP∗ rules on the following four data sets. [sent-389, score-0.285]
93 The predictors in ADNI, Yahoo-Computers and Olivetti data sets are highly correlated as indicated by the average λmax . [sent-415, score-0.219]
94 As noted in [26, 25], Dome is very effective in discarding inactive features when λmax is large. [sent-417, score-0.328]
95 However, the proposed rules are able to identify far more inactive features than Dome on both real and synthetic data, even for the cases in which λmax is small. [sent-420, score-0.579]
96 5 Conclusion. In this paper, we develop new screening rules for the Lasso problem by making use of the nonexpansiveness of the projection operator with respect to a closed convex set. [sent-437, score-0.704]
97 Our rules, i.e., the DPP rules, are able to effectively identify inactive predictors of the Lasso problem, thus greatly reducing the size of the optimization problem. [sent-440, score-0.461]
98 Moreover, we further improve DPP rules and propose the enhanced DPP rules, that is, the DPP∗ rules, which are even more effective in discarding inactive predictors than DPP rules. [sent-441, score-0.817]
99 The idea of DPP and DPP∗ rules can be easily generalized to screen the inactive groups of the group Lasso problem. [sent-442, score-0.577]
100 Moreover, DPP and DPP∗ rules can be combined with any Lasso solver as a speedup tool. [sent-444, score-0.274]
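As an illustration of this last point, a sketch that screens with the DPP check and then hands the reduced problem to an off-the-shelf solver. We use scikit-learn's Lasso as a stand-in (its objective is (1/(2n))||y − Xβ||_2^2 + α||β||_1, hence the rescaling of λ); this pairing is our example, not something prescribed by the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

def screened_lasso(X, y, lam, theta_prev, lam_prev):
    """Screen with dpp_screen (defined earlier), then solve the reduced Lasso."""
    n, p = X.shape
    keep = dpp_screen(X, y, theta_prev, lam_prev, lam)
    model = Lasso(alpha=lam / n, fit_intercept=False)   # match objective (1) up to the 1/n scaling
    model.fit(X[:, keep], y)
    beta = np.zeros(p)
    beta[keep] = model.coef_
    return beta
```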
wordName wordTfidf (topN-words)
[('dpp', 0.704), ('screening', 0.343), ('rules', 0.256), ('inactive', 0.252), ('lasso', 0.241), ('predictors', 0.19), ('dual', 0.125), ('dome', 0.106), ('sdpp', 0.095), ('xg', 0.086), ('xt', 0.084), ('gdpp', 0.083), ('sgdpp', 0.083), ('breakpoints', 0.071), ('max', 0.066), ('adni', 0.063), ('enhanced', 0.061), ('kkt', 0.06), ('olivetti', 0.059), ('coil', 0.058), ('xi', 0.053), ('ng', 0.051), ('pf', 0.05), ('group', 0.05), ('nonexpansive', 0.048), ('projection', 0.042), ('response', 0.041), ('safe', 0.039), ('xiang', 0.034), ('synthetic', 0.034), ('lars', 0.032), ('active', 0.031), ('polytope', 0.031), ('feasible', 0.031), ('rule', 0.031), ('effective', 0.03), ('mistakenly', 0.029), ('homotopy', 0.029), ('inequality', 0.028), ('discarding', 0.028), ('rejection', 0.027), ('please', 0.026), ('images', 0.026), ('sphere', 0.025), ('hastie', 0.025), ('discards', 0.025), ('mnist', 0.025), ('closed', 0.024), ('gdpps', 0.024), ('nonexpansiveness', 0.024), ('slep', 0.024), ('solution', 0.023), ('know', 0.023), ('discarded', 0.022), ('maxi', 0.021), ('corollary', 0.021), ('onto', 0.021), ('ghaoui', 0.021), ('basics', 0.021), ('donoho', 0.02), ('yahoo', 0.02), ('disease', 0.02), ('royal', 0.019), ('arizona', 0.019), ('alzheimer', 0.019), ('maxg', 0.019), ('identify', 0.019), ('groups', 0.019), ('supplement', 0.018), ('theorem', 0.018), ('worthwhile', 0.018), ('solver', 0.018), ('remark', 0.018), ('features', 0.018), ('pc', 0.017), ('el', 0.017), ('tibshirani', 0.017), ('discard', 0.017), ('tl', 0.016), ('saved', 0.016), ('geometric', 0.016), ('construct', 0.016), ('friedman', 0.016), ('intersection', 0.016), ('primal', 0.016), ('convex', 0.015), ('grey', 0.015), ('identifying', 0.015), ('path', 0.015), ('sparse', 0.015), ('regularization', 0.015), ('data', 0.015), ('side', 0.014), ('spaced', 0.014), ('patients', 0.014), ('clearly', 0.014), ('sets', 0.014), ('report', 0.014), ('solving', 0.014), ('society', 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 147 nips-2013-Lasso Screening Rules via Dual Polytope Projection
Author: Jie Wang, Jiayu Zhou, Peter Wonka, Jieping Ye
Abstract: Lasso is a widely used regression technique to find sparse representations. When the dimension of the feature space and the number of samples are extremely large, solving the Lasso problem remains challenging. To improve the efficiency of solving large-scale Lasso problems, El Ghaoui and his colleagues have proposed the SAFE rules which are able to quickly identify the inactive predictors, i.e., predictors that have 0 components in the solution vector. Then, the inactive predictors or features can be removed from the optimization problem to reduce its scale. By transforming the standard Lasso to its dual form, it can be shown that the inactive predictors include the set of inactive constraints on the optimal dual solution. In this paper, we propose an efficient and effective screening rule via Dual Polytope Projections (DPP), which is mainly based on the uniqueness and nonexpansiveness of the optimal dual solution due to the fact that the feasible set in the dual space is a convex and closed polytope. Moreover, we show that our screening rule can be extended to identify inactive groups in group Lasso. To the best of our knowledge, there is currently no “exact” screening rule for group Lasso. We have evaluated our screening rule using many real data sets. Results show that our rule is more effective in identifying inactive predictors than existing state-of-the-art screening rules for Lasso. 1
2 0.4932549 40 nips-2013-Approximate Inference in Continuous Determinantal Processes
Author: Raja Hafiz Affandi, Emily Fox, Ben Taskar
Abstract: Determinantal point processes (DPPs) are random point processes well-suited for modeling repulsion. In machine learning, the focus of DPP-based models has been on diverse subset selection from a discrete and finite base set. This discrete setting admits an efficient sampling algorithm based on the eigendecomposition of the defining kernel matrix. Recently, there has been growing interest in using DPPs defined on continuous spaces. While the discrete-DPP sampler extends formally to the continuous case, computationally, the steps required are not tractable in general. In this paper, we present two efficient DPP sampling schemes that apply to a wide range of kernel functions: one based on low rank approximations via Nystr¨ m o and random Fourier feature techniques and another based on Gibbs sampling. We demonstrate the utility of continuous DPPs in repulsive mixture modeling and synthesizing human poses spanning activity spaces. 1
3 0.32115293 118 nips-2013-Fast Determinantal Point Process Sampling with Application to Clustering
Author: Byungkon Kang
Abstract: Determinantal Point Process (DPP) has gained much popularity for modeling sets of diverse items. The gist of DPP is that the probability of choosing a particular set of items is proportional to the determinant of a positive definite matrix that defines the similarity of those items. However, computing the determinant requires time cubic in the number of items, and is hence impractical for large sets. In this paper, we address this problem by constructing a rapidly mixing Markov chain, from which we can acquire a sample from the given DPP in sub-cubic time. In addition, we show that this framework can be extended to sampling from cardinalityconstrained DPPs. As an application, we show how our sampling algorithm can be used to provide a fast heuristic for determining the number of clusters, resulting in better clustering.
4 0.18083927 6 nips-2013-A Determinantal Point Process Latent Variable Model for Inhibition in Neural Spiking Data
Author: Jasper Snoek, Richard Zemel, Ryan P. Adams
Abstract: Point processes are popular models of neural spiking behavior as they provide a statistical distribution over temporal sequences of spikes and help to reveal the complexities underlying a series of recorded action potentials. However, the most common neural point process models, the Poisson process and the gamma renewal process, do not capture interactions and correlations that are critical to modeling populations of neurons. We develop a novel model based on a determinantal point process over latent embeddings of neurons that effectively captures and helps visualize complex inhibitory and competitive interaction. We show that this model is a natural extension of the popular generalized linear model to sets of interacting neurons. The model is extended to incorporate gain control or divisive normalization, and the modulation of neural spiking based on periodic phenomena. Applied to neural spike recordings from the rat hippocampus, we see that the model captures inhibitory relationships, a dichotomy of classes of neurons, and a periodic modulation by the theta rhythm known to be present in the data. 1
5 0.13555217 109 nips-2013-Estimating LASSO Risk and Noise Level
Author: Mohsen Bayati, Murat A. Erdogdu, Andrea Montanari
Abstract: We study the fundamental problems of variance and risk estimation in high dimensional statistical modeling. In particular, we consider the problem of learning a coefficient vector θ0 ∈ Rp from noisy linear observations y = Xθ0 + w ∈ Rn (p > n) and the popular estimation procedure of solving the 1 -penalized least squares objective known as the LASSO or Basis Pursuit DeNoising (BPDN). In this context, we develop new estimators for the 2 estimation risk θ − θ0 2 and the variance of the noise when distributions of θ0 and w are unknown. These can be used to select the regularization parameter optimally. Our approach combines Stein’s unbiased risk estimate [Ste81] and the recent results of [BM12a][BM12b] on the analysis of approximate message passing and the risk of LASSO. We establish high-dimensional consistency of our estimators for sequences of matrices X of increasing dimensions, with independent Gaussian entries. We establish validity for a broader class of Gaussian designs, conditional on a certain conjecture from statistical physics. To the best of our knowledge, this result is the first that provides an asymptotically consistent risk estimator for the LASSO solely based on data. In addition, we demonstrate through simulations that our variance estimation outperforms several existing methods in the literature. 1
6 0.11734093 219 nips-2013-On model selection consistency of penalized M-estimators: a geometric theory
7 0.10988762 194 nips-2013-Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition
8 0.10323607 303 nips-2013-Sparse Overlapping Sets Lasso for Multitask Learning and its Application to fMRI Analysis
9 0.094886228 4 nips-2013-A Comparative Framework for Preconditioned Lasso Algorithms
11 0.082920931 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
12 0.055512898 3 nips-2013-A* Lasso for Learning a Sparse Bayesian Network Structure for Continuous Variables
13 0.053814977 230 nips-2013-Online Learning with Costly Features and Labels
14 0.052859511 77 nips-2013-Correlations strike back (again): the case of associative memory retrieval
15 0.051691238 215 nips-2013-On Decomposing the Proximal Map
16 0.047760967 56 nips-2013-Better Approximation and Faster Algorithm Using the Proximal Average
17 0.047608513 62 nips-2013-Causal Inference on Time Series using Restricted Structural Equation Models
18 0.044712726 204 nips-2013-Multiscale Dictionary Learning for Estimating Conditional Distributions
19 0.044337586 213 nips-2013-Nonparametric Multi-group Membership Model for Dynamic Networks
20 0.043619856 268 nips-2013-Reflection methods for user-friendly submodular optimization
topicId topicWeight
[(0, 0.126), (1, 0.053), (2, 0.03), (3, 0.01), (4, -0.063), (5, 0.091), (6, -0.014), (7, 0.014), (8, -0.105), (9, -0.003), (10, 0.011), (11, -0.099), (12, -0.119), (13, -0.302), (14, 0.076), (15, 0.022), (16, 0.007), (17, -0.329), (18, 0.515), (19, 0.053), (20, 0.11), (21, -0.171), (22, -0.017), (23, -0.004), (24, 0.052), (25, -0.013), (26, 0.062), (27, 0.048), (28, -0.038), (29, 0.012), (30, 0.039), (31, 0.006), (32, -0.077), (33, -0.001), (34, 0.016), (35, 0.093), (36, -0.018), (37, 0.011), (38, -0.004), (39, 0.013), (40, -0.028), (41, -0.016), (42, -0.029), (43, -0.039), (44, 0.011), (45, 0.038), (46, 0.007), (47, 0.016), (48, 0.011), (49, -0.001)]
simIndex simValue paperId paperTitle
same-paper 1 0.93548125 147 nips-2013-Lasso Screening Rules via Dual Polytope Projection
Author: Jie Wang, Jiayu Zhou, Peter Wonka, Jieping Ye
Abstract: Lasso is a widely used regression technique to find sparse representations. When the dimension of the feature space and the number of samples are extremely large, solving the Lasso problem remains challenging. To improve the efficiency of solving large-scale Lasso problems, El Ghaoui and his colleagues have proposed the SAFE rules which are able to quickly identify the inactive predictors, i.e., predictors that have 0 components in the solution vector. Then, the inactive predictors or features can be removed from the optimization problem to reduce its scale. By transforming the standard Lasso to its dual form, it can be shown that the inactive predictors include the set of inactive constraints on the optimal dual solution. In this paper, we propose an efficient and effective screening rule via Dual Polytope Projections (DPP), which is mainly based on the uniqueness and nonexpansiveness of the optimal dual solution due to the fact that the feasible set in the dual space is a convex and closed polytope. Moreover, we show that our screening rule can be extended to identify inactive groups in group Lasso. To the best of our knowledge, there is currently no “exact” screening rule for group Lasso. We have evaluated our screening rule using many real data sets. Results show that our rule is more effective in identifying inactive predictors than existing state-of-the-art screening rules for Lasso. 1
2 0.72172874 118 nips-2013-Fast Determinantal Point Process Sampling with Application to Clustering
Author: Byungkon Kang
Abstract: Determinantal Point Process (DPP) has gained much popularity for modeling sets of diverse items. The gist of DPP is that the probability of choosing a particular set of items is proportional to the determinant of a positive definite matrix that defines the similarity of those items. However, computing the determinant requires time cubic in the number of items, and is hence impractical for large sets. In this paper, we address this problem by constructing a rapidly mixing Markov chain, from which we can acquire a sample from the given DPP in sub-cubic time. In addition, we show that this framework can be extended to sampling from cardinalityconstrained DPPs. As an application, we show how our sampling algorithm can be used to provide a fast heuristic for determining the number of clusters, resulting in better clustering.
3 0.71660215 40 nips-2013-Approximate Inference in Continuous Determinantal Processes
Author: Raja Hafiz Affandi, Emily Fox, Ben Taskar
Abstract: Determinantal point processes (DPPs) are random point processes well-suited for modeling repulsion. In machine learning, the focus of DPP-based models has been on diverse subset selection from a discrete and finite base set. This discrete setting admits an efficient sampling algorithm based on the eigendecomposition of the defining kernel matrix. Recently, there has been growing interest in using DPPs defined on continuous spaces. While the discrete-DPP sampler extends formally to the continuous case, computationally, the steps required are not tractable in general. In this paper, we present two efficient DPP sampling schemes that apply to a wide range of kernel functions: one based on low rank approximations via Nystr¨ m o and random Fourier feature techniques and another based on Gibbs sampling. We demonstrate the utility of continuous DPPs in repulsive mixture modeling and synthesizing human poses spanning activity spaces. 1
4 0.34742716 303 nips-2013-Sparse Overlapping Sets Lasso for Multitask Learning and its Application to fMRI Analysis
Author: Nikhil Rao, Christopher Cox, Rob Nowak, Timothy T. Rogers
Abstract: Multitask learning can be effective when features useful in one task are also useful for other tasks, and the group lasso is a standard method for selecting a common subset of features. In this paper, we are interested in a less restrictive form of multitask learning, wherein (1) the available features can be organized into subsets according to a notion of similarity and (2) features useful in one task are similar, but not necessarily identical, to the features best suited for other tasks. The main contribution of this paper is a new procedure called Sparse Overlapping Sets (SOS) lasso, a convex optimization that automatically selects similar features for related learning tasks. Error bounds are derived for SOSlasso and its consistency is established for squared error loss. In particular, SOSlasso is motivated by multisubject fMRI studies in which functional activity is classified using brain voxels as features. Experiments with real and synthetic data demonstrate the advantages of SOSlasso compared to the lasso and group lasso. 1
5 0.3429679 219 nips-2013-On model selection consistency of penalized M-estimators: a geometric theory
Author: Jason Lee, Yuekai Sun, Jonathan E. Taylor
Abstract: Penalized M-estimators are used in diverse areas of science and engineering to fit high-dimensional models with some low-dimensional structure. Often, the penalties are geometrically decomposable, i.e. can be expressed as a sum of support functions over convex sets. We generalize the notion of irrepresentable to geometrically decomposable penalties and develop a general framework for establishing consistency and model selection consistency of M-estimators with such penalties. We then use this framework to derive results for some special cases of interest in bioinformatics and statistical learning. 1
6 0.32954961 6 nips-2013-A Determinantal Point Process Latent Variable Model for Inhibition in Neural Spiking Data
7 0.30794886 194 nips-2013-Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition
8 0.28334516 4 nips-2013-A Comparative Framework for Preconditioned Lasso Algorithms
9 0.26631474 109 nips-2013-Estimating LASSO Risk and Noise Level
10 0.25640315 76 nips-2013-Correlated random features for fast semi-supervised learning
11 0.25446403 3 nips-2013-A* Lasso for Learning a Sparse Bayesian Network Structure for Continuous Variables
12 0.2132898 354 nips-2013-When in Doubt, SWAP: High-Dimensional Sparse Recovery from Correlated Measurements
13 0.21165198 204 nips-2013-Multiscale Dictionary Learning for Estimating Conditional Distributions
14 0.20894572 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
15 0.20418178 128 nips-2013-Generalized Method-of-Moments for Rank Aggregation
16 0.17880245 271 nips-2013-Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima
17 0.17457215 131 nips-2013-Geometric optimisation on positive definite matrices for elliptically contoured distributions
18 0.16660699 249 nips-2013-Polar Operators for Structured Sparse Estimation
19 0.15695263 198 nips-2013-More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server
20 0.15321411 111 nips-2013-Estimation, Optimization, and Parallelism when Data is Sparse
topicId topicWeight
[(8, 0.32), (16, 0.029), (33, 0.124), (34, 0.097), (41, 0.025), (49, 0.022), (56, 0.094), (70, 0.021), (85, 0.032), (89, 0.055), (93, 0.031), (95, 0.035)]
simIndex simValue paperId paperTitle
same-paper 1 0.73248273 147 nips-2013-Lasso Screening Rules via Dual Polytope Projection
Author: Jie Wang, Jiayu Zhou, Peter Wonka, Jieping Ye
Abstract: Lasso is a widely used regression technique to find sparse representations. When the dimension of the feature space and the number of samples are extremely large, solving the Lasso problem remains challenging. To improve the efficiency of solving large-scale Lasso problems, El Ghaoui and his colleagues have proposed the SAFE rules which are able to quickly identify the inactive predictors, i.e., predictors that have 0 components in the solution vector. Then, the inactive predictors or features can be removed from the optimization problem to reduce its scale. By transforming the standard Lasso to its dual form, it can be shown that the inactive predictors include the set of inactive constraints on the optimal dual solution. In this paper, we propose an efficient and effective screening rule via Dual Polytope Projections (DPP), which is mainly based on the uniqueness and nonexpansiveness of the optimal dual solution due to the fact that the feasible set in the dual space is a convex and closed polytope. Moreover, we show that our screening rule can be extended to identify inactive groups in group Lasso. To the best of our knowledge, there is currently no “exact” screening rule for group Lasso. We have evaluated our screening rule using many real data sets. Results show that our rule is more effective in identifying inactive predictors than existing state-of-the-art screening rules for Lasso. 1
2 0.62907797 18 nips-2013-A simple example of Dirichlet process mixture inconsistency for the number of components
Author: Jeffrey W. Miller, Matthew T. Harrison
Abstract: For data assumed to come from a finite mixture with an unknown number of components, it has become common to use Dirichlet process mixtures (DPMs) not only for density estimation, but also for inferences about the number of components. The typical approach is to use the posterior distribution on the number of clusters — that is, the posterior on the number of components represented in the observed data. However, it turns out that this posterior is not consistent — it does not concentrate at the true number of components. In this note, we give an elementary proof of this inconsistency in what is perhaps the simplest possible setting: a DPM with normal components of unit variance, applied to data from a “mixture” with one standard normal component. Further, we show that this example exhibits severe inconsistency: instead of going to 1, the posterior probability that there is one cluster converges (in probability) to 0. 1
3 0.52688164 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
Author: Vincent Q. Vu, Juhee Cho, Jing Lei, Karl Rohe
Abstract: We propose a novel convex relaxation of sparse principal subspace estimation based on the convex hull of rank-d projection matrices (the Fantope). The convex problem can be solved efficiently using alternating direction method of multipliers (ADMM). We establish a near-optimal convergence rate, in terms of the sparsity, ambient dimension, and sample size, for estimation of the principal subspace of a general covariance matrix without assuming the spiked covariance model. In the special case of d = 1, our result implies the near-optimality of DSPCA (d’Aspremont et al. [1]) even when the solution is not rank 1. We also provide a general theoretical framework for analyzing the statistical properties of the method for arbitrary input matrices that extends the applicability and provable guarantees to a wide array of settings. We demonstrate this with an application to Kendall’s tau correlation matrices and transelliptical component analysis. 1
4 0.52411562 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables
Author: Cho-Jui Hsieh, Matyas A. Sustik, Inderjit Dhillon, Pradeep Ravikumar, Russell Poldrack
Abstract: The 1 -regularized Gaussian maximum likelihood estimator (MLE) has been shown to have strong statistical guarantees in recovering a sparse inverse covariance matrix even under high-dimensional settings. However, it requires solving a difficult non-smooth log-determinant program with number of parameters scaling quadratically with the number of Gaussian variables. State-of-the-art methods thus do not scale to problems with more than 20, 000 variables. In this paper, we develop an algorithm B IG QUIC, which can solve 1 million dimensional 1 regularized Gaussian MLE problems (which would thus have 1000 billion parameters) using a single machine, with bounded memory. In order to do so, we carefully exploit the underlying structure of the problem. Our innovations include a novel block-coordinate descent method with the blocks chosen via a clustering scheme to minimize repeated computations; and allowing for inexact computation of specific components. In spite of these modifications, we are able to theoretically analyze our procedure and show that B IG QUIC can achieve super-linear or even quadratic convergence rates. 1
5 0.52268678 259 nips-2013-Provable Subspace Clustering: When LRR meets SSC
Author: Yu-Xiang Wang, Huan Xu, Chenlei Leng
Abstract: Sparse Subspace Clustering (SSC) and Low-Rank Representation (LRR) are both considered as the state-of-the-art methods for subspace clustering. The two methods are fundamentally similar in that both are convex optimizations exploiting the intuition of “Self-Expressiveness”. The main difference is that SSC minimizes the vector 1 norm of the representation matrix to induce sparsity while LRR minimizes nuclear norm (aka trace norm) to promote a low-rank structure. Because the representation matrix is often simultaneously sparse and low-rank, we propose a new algorithm, termed Low-Rank Sparse Subspace Clustering (LRSSC), by combining SSC and LRR, and develops theoretical guarantees of when the algorithm succeeds. The results reveal interesting insights into the strength and weakness of SSC and LRR and demonstrate how LRSSC can take the advantages of both methods in preserving the “Self-Expressiveness Property” and “Graph Connectivity” at the same time. 1
6 0.52163637 173 nips-2013-Least Informative Dimensions
7 0.52066475 350 nips-2013-Wavelets on Graphs via Deep Learning
8 0.52043647 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits
9 0.52019775 97 nips-2013-Distributed Submodular Maximization: Identifying Representative Elements in Massive Data
10 0.52017289 239 nips-2013-Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result
11 0.52001083 310 nips-2013-Statistical analysis of coupled time series with Kernel Cross-Spectral Density operators.
12 0.5196358 101 nips-2013-EDML for Learning Parameters in Directed and Undirected Graphical Models
13 0.5195092 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
14 0.51943707 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
15 0.51923758 233 nips-2013-Online Robust PCA via Stochastic Optimization
16 0.51865673 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
17 0.51860195 318 nips-2013-Structured Learning via Logistic Regression
18 0.51856673 102 nips-2013-Efficient Algorithm for Privately Releasing Smooth Queries
19 0.51855117 354 nips-2013-When in Doubt, SWAP: High-Dimensional Sparse Recovery from Correlated Measurements
20 0.51830506 52 nips-2013-Bayesian inference as iterated random functions with applications to sequential inference in graphical models