nips nips2011 nips2011-108 knowledge-graph by maker-knowledge-mining

108 nips-2011-Greedy Algorithms for Structurally Constrained High Dimensional Problems


Source: pdf

Author: Ambuj Tewari, Pradeep K. Ravikumar, Inderjit S. Dhillon

Abstract: A hallmark of modern machine learning is its ability to deal with high dimensional problems by exploiting structural assumptions that limit the degrees of freedom in the underlying model. A deep understanding of the capabilities and limits of high dimensional learning methods under specific assumptions such as sparsity, group sparsity, and low rank has been attained. Efforts [1,2] are now underway to distill this valuable experience by proposing general unified frameworks that can achieve the twin goals of summarizing previous analyses and enabling their application to notions of structure hitherto unexplored. Inspired by these developments, we propose and analyze a general computational scheme based on a greedy strategy to solve convex optimization problems that arise when dealing with structurally constrained high-dimensional problems. Our framework not only unifies existing greedy algorithms by recovering them as special cases but also yields novel ones. Finally, we extend our results to infinite dimensional settings by using interesting connections between smoothness of norms and behavior of martingales in Banach spaces.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: A hallmark of modern machine learning is its ability to deal with high dimensional problems by exploiting structural assumptions that limit the degrees of freedom in the underlying model. [sent-8, score-0.271]

2 A deep understanding of the capabilities and limits of high dimensional learning methods under specific assumptions such as sparsity, group sparsity, and low rank has been attained. [sent-9, score-0.478]

3 Efforts [1,2] are now underway to distill this valuable experience by proposing general unified frameworks that can achieve the twin goals of summarizing previous analyses and enabling their application to notions of structure hitherto unexplored. [sent-10, score-0.543]

4 Inspired by these developments, we propose and analyze a general computational scheme based on a greedy strategy to solve convex optimization problems that arise when dealing with structurally constrained high-dimensional problems. [sent-11, score-0.38]

5 Our framework not only unifies existing greedy algorithms by recovering them as special cases but also yields novel ones. [sent-12, score-0.086]

6 Finally, we extend our results to infinite dimensional settings by using interesting connections between smoothness of norms and behavior of martingales in Banach spaces. [sent-13, score-0.471]

7 1 Introduction Increasingly in modern settings, in domains across science and engineering, one is faced with the challenge of working with high-dimensional models where the number of parameters is large, particularly when compared to the number of observations. [sent-14, score-0.101]

8 In such high-dimensional regimes, a growing body of literature in machine learning and statistics has shown that it is typically impossible to obtain consistent estimators unless some low-dimensional "structure" is imposed on the high dimensional object that is being estimated from the data. [sent-15, score-0.242]

9 For instance, the signal could be sparse in some basis, could lie on some manifold, have some graphical model structure, or be matrix-structured with a low rank. [sent-16, score-0.085]

10 Indeed, given the variety of high dimensional problems that researchers face, it is natural that many novel notions of such low-dimensional structure will continue to appear in the future. [sent-17, score-0.322]

11 There are a variety of issues that researchers have grappled with in this area but two themes stand out. [sent-18, score-0.23]

12 First, there is the statistical problem of identifying the minimum amount of data needed to accurately estimate high-dimensional models. For an atom set A with convex hull C_A, the gauge is defined as ‖x‖_A := inf{t > 0 : x ∈ t·C_A}. [sent-19, score-0.039]

13 ‖·‖_A is not a norm in general, unless for instance A satisfies a technical condition, namely that it be centrally symmetric: x ∈ A iff −x ∈ A. [sent-21, score-0.547]

14 Also, define the support function ‖x‖_A^* := sup{⟨x, a⟩ : a ∈ A}. [sent-22, score-0.058]

15 If ‖·‖_A happens to be a norm, then this is just the dual norm of ‖·‖_A. [sent-24, score-0.137]
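For reference, the gauge and support function used above form the standard atomic-norm pair from the general literature (stated here as background, not quoted verbatim from the paper); in LaTeX:

    \|x\|_{A} := \inf\{\, t > 0 \;:\; x \in t\,C_A \,\},
    \qquad
    \|y\|_{A}^{*} := \sup\{\, \langle y, a\rangle \;:\; a \in A \,\},
    \qquad
    \langle x, y\rangle \;\le\; \|x\|_{A}\,\|y\|_{A}^{*}.

The last inequality is the generalized Cauchy-Schwarz/Hölder relation between the gauge and its support function.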

16 (Sparse vectors) A huge amount of recent literature deals with the notion of sparsity of high-dimensional vectors. [sent-28, score-0.202]

17 Here, the set A ⊂ R^p of atoms is finite and consists of the 2p vectors ±e_i. [sent-29, score-0.375]

18 This is a centrally symmetric set and hence ‖·‖_A becomes a norm, viz. the ℓ1 norm. [sent-30, score-0.359]
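As a quick numerical illustration of this example (a sketch with made-up names and test data, not code from the paper), the gauge for the atom set {±e_i} reduces to the ℓ1 norm and its support function to the ℓ∞ norm:

    import numpy as np

    def sparse_atomic_norm(x):
        # With atoms {+e_i, -e_i}, the convex hull C_A is the l1 ball,
        # so the gauge ||x||_A is simply the l1 norm.
        return float(np.sum(np.abs(x)))

    def sparse_support_function(y):
        # sup over atoms a of <y, a> picks the largest |y_i|: the l-infinity norm.
        return float(np.max(np.abs(y)))

    x = np.array([1.0, -2.0, 0.5])
    print(sparse_atomic_norm(x))       # 3.5
    print(sparse_support_function(x))  # 2.0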

19 (Sparse non-negative vectors) Using a slight variation on the previous example, the atoms can be the p non-negative basis vectors e_i. [sent-33, score-0.359]

20 The convex hull C_A is the (p − 1)-dimensional probability simplex. [sent-34, score-0.322]

21 (Group sparse matrices) Here the structure we have in mind for a p × k matrix is that it only has a few non-zero rows. [sent-38, score-0.087]

22 This generalizes Example 1, which can be thought of as the case when k = 1. [sent-39, score-0.066]

23 There are an infinite number of atoms: all matrices with a single non-zero row where that row has unit ℓ2 norm. [sent-40, score-0.473]

24 The convex hull C_A becomes the unit ball of the corresponding group norm. [sent-42, score-0.411]

25 This group norm on R^{p×k} is defined to be the sum of the ℓ2 norms of the rows. [sent-44, score-0.122]
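A minimal sketch of this group norm (the helper name is ours; the row-wise formula follows from the atom set described above):

    import numpy as np

    def group_norm(X):
        # Sum of the l2 norms of the rows of a p x k matrix; its unit ball is
        # the convex hull of matrices with a single unit-l2-norm non-zero row.
        return float(np.sum(np.linalg.norm(X, axis=1)))

    X = np.array([[3.0, 4.0],
                  [0.0, 0.0],
                  [0.0, 1.0]])
    print(group_norm(X))  # 5.0 + 0.0 + 1.0 = 6.0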

26 (Low rank matrices) This is another example that has attracted a huge amount of attention in the recent literature. [sent-47, score-0.21]

27 The set¹ A ⊂ R^{p×p} of atoms here is infinite and consists of rank-one matrices with Frobenius norm 1. [sent-48, score-0.761]

28 This is centrally symmetric and ‖·‖_A becomes the trace norm (also called the nuclear or Schatten-1 norm; it is equal to the sum of the singular values of a matrix). [sent-49, score-0.496]
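For completeness, a small sketch of the trace norm via the SVD (the helper name is ours):

    import numpy as np

    def trace_norm(X):
        # Trace (nuclear / Schatten-1) norm: the sum of the singular values of X.
        return float(np.sum(np.linalg.svd(X, compute_uv=False)))

    X = np.diag([3.0, 1.0])
    print(trace_norm(X))  # 4.0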

29 (Low rank tensors) This is a generalization of the previous example to higher order tensors. [sent-51, score-0.06]

30 Considering order-three tensors, the set A of atoms can be taken to be all rank-one tensors of the form u_1 ⊗ u_2 ⊗ u_3 ∈ R^{p×p×p} for u_i ∈ R^p, ‖u_i‖_2 = 1. [sent-52, score-0.446]

31 The resulting gauge ‖·‖_A can be thought of as the tensor nuclear norm. [sent-54, score-0.233]

32 Unfortunately, the tensor nuclear norm is intractable to compute and hence there is a need to consider relaxations to retain tractability. [sent-55, score-0.39]

33 (Permutation matrices) Here, we consider permutation matrices² of size p × p as the set A of atoms. [sent-57, score-0.112]

34 Although there are p! of them, their convex hull has a succinct description thanks to the Birkhoff–von Neumann theorem: the convex hull of permutation matrices is the set of doubly stochastic matrices. [sent-59, score-1.059]
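The tractability point can be made concrete: the greedy step needs a linear maximization over the atom set, and over permutation matrices that is a linear assignment problem. The sketch below is one possible way to implement such an oracle with scipy's assignment solver; it is an illustration under our own assumptions, not code from the paper:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def best_permutation_atom(G):
        # argmax over permutation matrices P of <G, P>; by Birkhoff-von Neumann
        # this linear program over doubly stochastic matrices has a permutation
        # matrix as an optimal vertex, found here via linear assignment.
        rows, cols = linear_sum_assignment(G, maximize=True)
        P = np.zeros_like(G, dtype=float)
        P[rows, cols] = 1.0
        return P

    G = np.random.randn(4, 4)
    print(best_permutation_atom(G))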

35 As we shall see later, this fact will be crucial for the greedy algorithm to be tractable in this case. [sent-60, score-0.121]

36 Setup We consider the general optimization problem min_{x : ‖x‖_A ≤ κ} f(x), (2) where f is a convex and smooth function, and {x : ‖x‖_A ≤ κ} is the atomic norm constraint set that encourages some specific structure. [sent-62, score-0.482]

37 This is a convex optimization problem that is a constrained version of the usual regularized problem, min_x f(x) + λ‖x‖_A. [sent-63, score-0.192]
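To make the greedy strategy concrete, here is a minimal Frank-Wolfe-style sketch for problem (2) with a finite atom set; the step-size schedule, the variable names, and the tiny least-squares example are our own assumptions, not the paper's exact algorithm or guarantees:

    import numpy as np

    def greedy_atomic_minimize(grad_f, atoms, kappa, T=100):
        # Greedy / Frank-Wolfe-style method for min f(x) s.t. ||x||_A <= kappa.
        # grad_f: callable returning the gradient of f at x.
        # atoms : (m, p) array whose rows are the atoms in A (finite case).
        p = atoms.shape[1]
        x = np.zeros(p)                       # feasible starting point
        for t in range(T):
            g = grad_f(x)
            scores = atoms @ (-g)             # alignment of each atom with -grad
            a = atoms[np.argmax(scores)]      # greedy atom selection
            gamma = 2.0 / (t + 2.0)           # standard Frank-Wolfe step size
            x = (1.0 - gamma) * x + gamma * kappa * a
        return x

    # Example: least squares with the sparse-vector atoms {+e_i, -e_i}.
    rng = np.random.default_rng(0)
    p = 5
    atoms = np.vstack([np.eye(p), -np.eye(p)])
    A = rng.standard_normal((20, p))
    x_true = np.array([1.0, 0.0, 0.0, -2.0, 0.0])
    b = A @ x_true
    x_hat = greedy_atomic_minimize(lambda x: A.T @ (A @ x - b), atoms, kappa=3.0)
    print(np.round(x_hat, 2))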

38 A line of recent work (see, for example, [2], and the references therein) has focused on different cases, with different atomic norms. ¹For simplicity we consider square matrices. [sent-64, score-0.086]

39 It is definitely also possible to consider rectangular matrices in R^{p1×p2} for p1 ≠ p2. [sent-65, score-0.151]

40 ²A permutation matrix is one consisting only of 0's & 1's such that there is exactly a single 1 in each row & column. [sent-66, score-0.064]

41 A non-negative matrix with every row & column sum equal to 1 is called a doubly stochastic matrix. [sent-67, score-0.171]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('iia', 0.488), ('atoms', 0.279), ('centrally', 0.246), ('infinite', 0.244), ('hull', 0.23), ('tensors', 0.167), ('ilxiia', 0.163), ('norm', 0.137), ('austin', 0.131), ('specific', 0.131), ('texas', 0.128), ('permutation', 0.112), ('structurally', 0.112), ('doubly', 0.107), ('ambuj', 0.107), ('dimensional', 0.106), ('inderjit', 0.103), ('matrices', 0.101), ('convex', 0.092), ('nuclear', 0.09), ('ii', 0.088), ('greedy', 0.086), ('atomic', 0.086), ('symmetric', 0.08), ('tensor', 0.077), ('notions', 0.077), ('tca', 0.071), ('ua', 0.071), ('hallmark', 0.071), ('themes', 0.071), ('underway', 0.071), ('unified', 0.071), ('huge', 0.066), ('thought', 0.066), ('gauge', 0.066), ('defined', 0.066), ('pradeepr', 0.066), ('satisfies', 0.066), ('norms', 0.065), ('row', 0.064), ('rp', 0.062), ('researchers', 0.061), ('rank', 0.06), ('define', 0.058), ('sparsity', 0.058), ('ball', 0.056), ('modern', 0.056), ('group', 0.056), ('stand', 0.056), ('succinct', 0.056), ('martingales', 0.056), ('pradeep', 0.056), ('unless', 0.055), ('constrained', 0.054), ('dhillon', 0.052), ('ca', 0.05), ('rectangular', 0.05), ('finite', 0.049), ('summarizing', 0.049), ('banach', 0.049), ('vectors', 0.047), ('capabilities', 0.047), ('mind', 0.047), ('enabling', 0.047), ('minx', 0.046), ('faced', 0.045), ('attracted', 0.045), ('low', 0.045), ('proposing', 0.044), ('therein', 0.044), ('retain', 0.043), ('frameworks', 0.043), ('developments', 0.043), ('iff', 0.043), ('relaxations', 0.043), ('variety', 0.042), ('growing', 0.042), ('ravikumar', 0.04), ('efforts', 0.04), ('sparse', 0.04), ('regimes', 0.039), ('thanks', 0.039), ('deals', 0.039), ('amount', 0.039), ('imposed', 0.039), ('freedom', 0.038), ('goals', 0.038), ('rn', 0.037), ('encourages', 0.036), ('dealing', 0.036), ('continue', 0.036), ('shall', 0.035), ('experience', 0.035), ('analyses', 0.034), ('valuable', 0.034), ('slight', 0.033), ('becomes', 0.033), ('increasingly', 0.033), ('manifold', 0.033), ('limits', 0.033)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 108 nips-2011-Greedy Algorithms for Structurally Constrained High Dimensional Problems

Author: Ambuj Tewari, Pradeep K. Ravikumar, Inderjit S. Dhillon

Abstract: A hallmark of modern machine learning is its ability to deal with high dimensional problems by exploiting structural assumptions that limit the degrees of freedom in the underlying model. A deep understanding of the capabilities and limits of high dimensional learning methods under specific assumptions such as sparsity, group sparsity, and low rank has been attained. Efforts [1,2] are now underway to distill this valuable experience by proposing general unified frameworks that can achieve the twin goals of summarizing previous analyses and enabling their application to notions of structure hitherto unexplored. Inspired by these developments, we propose and analyze a general computational scheme based on a greedy strategy to solve convex optimization problems that arise when dealing with structurally constrained high-dimensional problems. Our framework not only unifies existing greedy algorithms by recovering them as special cases but also yields novel ones. Finally, we extend our results to infinite dimensional settings by using interesting connections between smoothness of norms and behavior of martingales in Banach spaces.

2 0.12901311 270 nips-2011-Statistical Performance of Convex Tensor Decomposition

Author: Ryota Tomioka, Taiji Suzuki, Kohei Hayashi, Hisashi Kashima

Abstract: We analyze the statistical performance of a recently proposed convex tensor decomposition algorithm. Conventionally tensor decomposition has been formulated as non-convex optimization problems, which hindered the analysis of their performance. We show under some conditions that the mean squared error of the convex method scales linearly with the quantity we call the normalized rank of the true tensor. The current analysis naturally extends the analysis of convex low-rank matrix estimation to tensors. Furthermore, we show through numerical experiments that our theory can precisely predict the scaling behaviour in practice.

3 0.12120905 202 nips-2011-On the Universality of Online Mirror Descent

Author: Nati Srebro, Karthik Sridharan, Ambuj Tewari

Abstract: We show that for a general class of convex online learning problems, Mirror Descent can always achieve a (nearly) optimal regret guarantee. 1

4 0.10016108 182 nips-2011-Nearest Neighbor based Greedy Coordinate Descent

Author: Inderjit S. Dhillon, Pradeep K. Ravikumar, Ambuj Tewari

Abstract: Increasingly, optimization problems in machine learning, especially those arising from high-dimensional statistical estimation, have a large number of variables. Modern statistical estimators developed over the past decade have statistical or sample complexity that depends only weakly on the number of parameters when there is some structure to the problem, such as sparsity. A central question is whether similar advances can be made in their computational complexity as well. In this paper, we propose strategies that indicate that such advances can indeed be made. In particular, we investigate the greedy coordinate descent algorithm, and note that performing the greedy step efficiently weakens the costly dependence on the problem size provided the solution is sparse. We then propose a suite of methods that perform these greedy steps efficiently by a reduction to nearest neighbor search. We also devise a more amenable form of greedy descent for composite non-smooth objectives; as well as several approximate variants of such greedy descent. We develop a practical implementation of our algorithm that combines greedy coordinate descent with locality sensitive hashing. Without tuning the latter data structure, we are not only able to significantly speed up the vanilla greedy method, but also outperform cyclic descent when the problem size becomes large. Our results indicate the effectiveness of our nearest neighbor strategies, and also point to many open questions regarding the development of computational geometric techniques tailored towards first-order optimization methods.

5 0.081980146 289 nips-2011-Trace Lasso: a trace norm regularization for correlated designs

Author: Edouard Grave, Guillaume R. Obozinski, Francis R. Bach

Abstract: Using the ℓ1-norm to regularize the estimation of the parameter vector of a linear model leads to an unstable estimator when covariates are highly correlated. In this paper, we introduce a new penalty function which takes into account the correlation of the design matrix to stabilize the estimation. This norm, called the trace Lasso, uses the trace norm of the selected covariates, which is a convex surrogate of their rank, as the criterion of model complexity. We analyze the properties of our norm, describe an optimization algorithm based on reweighted least-squares, and illustrate the behavior of this norm on synthetic data, showing that it is more adapted to strong correlations than competing methods such as the elastic net. 1

6 0.081931733 259 nips-2011-Sparse Estimation with Structured Dictionaries

7 0.080713741 211 nips-2011-Penalty Decomposition Methods for Rank Minimization

8 0.080463603 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model

9 0.074303389 195 nips-2011-On Learning Discrete Graphical Models using Greedy Methods

10 0.073246121 102 nips-2011-Generalised Coupled Tensor Factorisation

11 0.072941646 201 nips-2011-On the Completeness of First-Order Knowledge Compilation for Lifted Probabilistic Inference

12 0.061146867 297 nips-2011-Universal low-rank matrix recovery from Pauli measurements

13 0.060456108 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity

14 0.059385858 179 nips-2011-Multilinear Subspace Regression: An Orthogonal Tensor Decomposition Approach

15 0.058540706 258 nips-2011-Sparse Bayesian Multi-Task Learning

16 0.056613054 189 nips-2011-Non-parametric Group Orthogonal Matching Pursuit for Sparse Learning with Multiple Kernels

17 0.053057909 260 nips-2011-Sparse Features for PCA-Like Linear Regression

18 0.045957018 165 nips-2011-Matrix Completion for Multi-label Image Classification

19 0.043967381 276 nips-2011-Structured sparse coding via lateral inhibition

20 0.043865371 251 nips-2011-Shaping Level Sets with Submodular Functions


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.127), (1, 0.002), (2, -0.046), (3, -0.111), (4, -0.072), (5, 0.061), (6, 0.028), (7, 0.077), (8, 0.039), (9, 0.031), (10, 0.062), (11, -0.016), (12, -0.074), (13, 0.031), (14, 0.043), (15, -0.112), (16, -0.011), (17, 0.056), (18, -0.027), (19, 0.009), (20, -0.06), (21, -0.052), (22, -0.082), (23, -0.026), (24, 0.005), (25, 0.051), (26, -0.006), (27, -0.072), (28, -0.069), (29, -0.06), (30, 0.02), (31, -0.085), (32, -0.061), (33, 0.113), (34, 0.056), (35, -0.109), (36, -0.005), (37, 0.001), (38, -0.08), (39, 0.144), (40, 0.01), (41, -0.084), (42, -0.023), (43, 0.097), (44, -0.053), (45, 0.022), (46, -0.036), (47, 0.054), (48, -0.094), (49, -0.062)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93905747 108 nips-2011-Greedy Algorithms for Structurally Constrained High Dimensional Problems

Author: Ambuj Tewari, Pradeep K. Ravikumar, Inderjit S. Dhillon

Abstract: A hallmark of modern machine learning is its ability to deal with high dimensional problems by exploiting structural assumptions that limit the degrees of freedom in the underlying model. A deep understanding of the capabilities and limits of high dimensional learning methods under specific assumptions such as sparsity, group sparsity, and low rank has been attained. Efforts [1,2] are now underway to distill this valuable experience by proposing general unified frameworks that can achieve the twin goals of summarizing previous analyses and enabling their application to notions of structure hitherto unexplored. Inspired by these developments, we propose and analyze a general computational scheme based on a greedy strategy to solve convex optimization problems that arise when dealing with structurally constrained high-dimensional problems. Our framework not only unifies existing greedy algorithms by recovering them as special cases but also yields novel ones. Finally, we extend our results to infinite dimensional settings by using interesting connections between smoothness of norms and behavior of martingales in Banach spaces.

2 0.67658406 270 nips-2011-Statistical Performance of Convex Tensor Decomposition

Author: Ryota Tomioka, Taiji Suzuki, Kohei Hayashi, Hisashi Kashima

Abstract: We analyze the statistical performance of a recently proposed convex tensor decomposition algorithm. Conventionally tensor decomposition has been formulated as non-convex optimization problems, which hindered the analysis of their performance. We show under some conditions that the mean squared error of the convex method scales linearly with the quantity we call the normalized rank of the true tensor. The current analysis naturally extends the analysis of convex low-rank matrix estimation to tensors. Furthermore, we show through numerical experiments that our theory can precisely predict the scaling behaviour in practice.

3 0.55654812 211 nips-2011-Penalty Decomposition Methods for Rank Minimization

Author: Yong Zhang, Zhaosong Lu

Abstract: In this paper we consider general rank minimization problems with rank appearing in either objective function or constraint. We first show that a class of matrix optimization problems can be solved as lower dimensional vector optimization problems. As a consequence, we establish that a class of rank minimization problems have closed form solutions. Using this result, we then propose penalty decomposition methods for general rank minimization problems. The convergence results of the PD methods have been shown in the longer version of the paper [19]. Finally, we test the performance of our methods by applying them to matrix completion and nearest low-rank correlation matrix problems. The computational results demonstrate that our methods generally outperform the existing methods in terms of solution quality and/or speed. 1

4 0.53481346 202 nips-2011-On the Universality of Online Mirror Descent

Author: Nati Srebro, Karthik Sridharan, Ambuj Tewari

Abstract: We show that for a general class of convex online learning problems, Mirror Descent can always achieve a (nearly) optimal regret guarantee. 1

5 0.48650223 102 nips-2011-Generalised Coupled Tensor Factorisation

Author: Kenan Y. Yılmaz, Ali T. Cemgil, Umut Simsekli

Abstract: We derive algorithms for generalised tensor factorisation (GTF) by building upon the well-established theory of Generalised Linear Models. Our algorithms are general in the sense that we can compute arbitrary factorisations in a message passing framework, derived for a broad class of exponential family distributions including special cases such as Tweedie’s distributions corresponding to β-divergences. By bounding the step size of the Fisher Scoring iteration of the GLM, we obtain general updates for real data and multiplicative updates for non-negative data. The GTF framework is, then extended easily to address the problems when multiple observed tensors are factorised simultaneously. We illustrate our coupled factorisation approach on synthetic data as well as on a musical audio restoration problem. 1

6 0.48645079 259 nips-2011-Sparse Estimation with Structured Dictionaries

7 0.4580135 182 nips-2011-Nearest Neighbor based Greedy Coordinate Descent

8 0.45430869 179 nips-2011-Multilinear Subspace Regression: An Orthogonal Tensor Decomposition Approach

9 0.43502635 201 nips-2011-On the Completeness of First-Order Knowledge Compilation for Lifted Probabilistic Inference

10 0.4146122 297 nips-2011-Universal low-rank matrix recovery from Pauli measurements

11 0.40556431 136 nips-2011-Inverting Grice's Maxims to Learn Rules from Natural Language Extractions

12 0.39635378 195 nips-2011-On Learning Discrete Graphical Models using Greedy Methods

13 0.39328369 167 nips-2011-Maximum Covariance Unfolding : Manifold Learning for Bimodal Data

14 0.38645685 65 nips-2011-Convergent Fitted Value Iteration with Linear Function Approximation

15 0.37594748 5 nips-2011-A Denoising View of Matrix Completion

16 0.37264919 203 nips-2011-On the accuracy of l1-filtering of signals with block-sparse structure

17 0.37026602 289 nips-2011-Trace Lasso: a trace norm regularization for correlated designs

18 0.36811993 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity

19 0.3661103 142 nips-2011-Large-Scale Sparse Principal Component Analysis with Application to Text Data

20 0.35682994 276 nips-2011-Structured sparse coding via lateral inhibition


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.066), (4, 0.045), (20, 0.032), (22, 0.318), (26, 0.042), (31, 0.037), (43, 0.078), (45, 0.117), (57, 0.021), (74, 0.055), (83, 0.04), (84, 0.011), (99, 0.054)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.75847787 108 nips-2011-Greedy Algorithms for Structurally Constrained High Dimensional Problems

Author: Ambuj Tewari, Pradeep K. Ravikumar, Inderjit S. Dhillon

Abstract: A hallmark of modern machine learning is its ability to deal with high dimensional problems by exploiting structural assumptions that limit the degrees of freedom in the underlying model. A deep understanding of the capabilities and limits of high dimensional learning methods under specific assumptions such as sparsity, group sparsity, and low rank has been attained. Efforts [1,2] are now underway to distill this valuable experience by proposing general unified frameworks that can achieve the twin goals of summarizing previous analyses and enabling their application to notions of structure hitherto unexplored. Inspired by these developments, we propose and analyze a general computational scheme based on a greedy strategy to solve convex optimization problems that arise when dealing with structurally constrained high-dimensional problems. Our framework not only unifies existing greedy algorithms by recovering them as special cases but also yields novel ones. Finally, we extend our results to infinite dimensional settings by using interesting connections between smoothness of norms and behavior of martingales in Banach spaces.

2 0.58340871 163 nips-2011-MAP Inference for Bayesian Inverse Reinforcement Learning

Author: Jaedeug Choi, Kee-eung Kim

Abstract: The difficulty in inverse reinforcement learning (IRL) arises in choosing the best reward function since there are typically an infinite number of reward functions that yield the given behaviour data as optimal. Using a Bayesian framework, we address this challenge by using the maximum a posteriori (MAP) estimation for the reward function, and show that most of the previous IRL algorithms can be modeled into our framework. We also present a gradient method for the MAP estimation based on the (sub)differentiability of the posterior distribution. We show the effectiveness of our approach by comparing the performance of the proposed method to those of the previous algorithms. 1

3 0.50052708 29 nips-2011-Algorithms and hardness results for parallel large margin learning

Author: Phil Long, Rocco Servedio

Abstract: We study the fundamental problem of learning an unknown large-margin halfspace in the context of parallel computation. Our main positive result is a parallel algorithm for learning a large-margin halfspace that is based on interior point methods from convex optimization and fast parallel algorithms for matrix computations. We show that this algorithm learns an unknown γ-margin halfspace over n dimensions using poly(n, 1/γ) processors and runs in time Õ(1/γ) + O(log n). In contrast, naive parallel algorithms that learn a γ-margin halfspace in time that depends polylogarithmically on n have Ω(1/γ²) runtime dependence on γ. Our main negative result deals with boosting, which is a standard approach to learning large-margin halfspaces. We give an information-theoretic proof that in the original PAC framework, in which a weak learning algorithm is provided as an oracle that is called by the booster, boosting cannot be parallelized: the ability to call the weak learner multiple times in parallel within a single boosting stage does not reduce the overall number of successive stages of boosting that are required. 1

4 0.49758464 186 nips-2011-Noise Thresholds for Spectral Clustering

Author: Sivaraman Balakrishnan, Min Xu, Akshay Krishnamurthy, Aarti Singh

Abstract: Although spectral clustering has enjoyed considerable empirical success in machine learning, its theoretical properties are not yet fully developed. We analyze the performance of a spectral algorithm for hierarchical clustering and show that on a class of hierarchically structured similarity matrices, this algorithm can tolerate noise that grows with the number of data points while still perfectly recovering the hierarchical clusters with high probability. We additionally improve upon previous results for k-way spectral clustering to derive conditions under which spectral clustering makes no mistakes. Further, using minimax analysis, we derive tight upper and lower bounds for the clustering problem and compare the performance of spectral clustering to these information theoretic limits. We also present experiments on simulated and real world data illustrating our results. 1

5 0.49707091 265 nips-2011-Sparse recovery by thresholded non-negative least squares

Author: Martin Slawski, Matthias Hein

Abstract: Non-negative data are commonly encountered in numerous fields, making nonnegative least squares regression (NNLS) a frequently used tool. At least relative to its simplicity, it often performs rather well in practice. Serious doubts about its usefulness arise for modern high-dimensional linear models. Even in this setting − unlike what first intuition may suggest − we show that for a broad class of designs, NNLS is resistant to overfitting and works excellently for sparse recovery when combined with thresholding, experimentally even outperforming ℓ1 regularization. Since NNLS also circumvents the delicate choice of a regularization parameter, our findings suggest that NNLS may be the method of choice. 1

6 0.49639368 80 nips-2011-Efficient Online Learning via Randomized Rounding

7 0.49473253 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity

8 0.49432394 202 nips-2011-On the Universality of Online Mirror Descent

9 0.49306443 22 nips-2011-Active Ranking using Pairwise Comparisons

10 0.49301252 185 nips-2011-Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction

11 0.49108207 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions

12 0.49071488 251 nips-2011-Shaping Level Sets with Submodular Functions

13 0.48605046 294 nips-2011-Unifying Framework for Fast Learning Rate of Non-Sparse Multiple Kernel Learning

14 0.48532993 187 nips-2011-Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning

15 0.48414099 198 nips-2011-On U-processes and clustering performance

16 0.48413649 106 nips-2011-Generalizing from Several Related Classification Tasks to a New Unlabeled Sample

17 0.48252928 195 nips-2011-On Learning Discrete Graphical Models using Greedy Methods

18 0.48242223 199 nips-2011-On fast approximate submodular minimization

19 0.48183209 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries

20 0.4816277 239 nips-2011-Robust Lasso with missing and grossly corrupted observations