nips nips2010 nips2010-219 knowledge-graph by maker-knowledge-mining

219 nips-2010-Random Conic Pursuit for Semidefinite Programming


Source: pdf

Author: Ariel Kleiner, Ali Rahimi, Michael I. Jordan

Abstract: We present a novel algorithm, Random Conic Pursuit, that solves semidefinite programs (SDPs) via repeated optimization over randomly selected two-dimensional subcones of the PSD cone. This scheme is simple, easily implemented, applicable to very general SDPs, scalable, and theoretically interesting. Its advantages are realized at the expense of an ability to readily compute highly exact solutions, though useful approximate solutions are easily obtained. This property renders Random Conic Pursuit of particular interest for machine learning applications, in which the relevant SDPs are generally based upon random data and so exact minima are often not a priority. Indeed, we present empirical results to this effect for various SDPs encountered in machine learning; these experiments demonstrate the potential practical usefulness of Random Conic Pursuit. We also provide a preliminary analysis that yields insight into the theoretical properties and convergence of the algorithm. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 As a result, semidefinite programs (SDPs) have appeared as the basis for many procedures in machine learning, such as sparse PCA [8], distance metric learning [24], nonlinear dimensionality reduction [23], multiple kernel learning [14], multitask learning [19], and matrix completion [2]. [sent-15, score-0.125]

2 General-purpose solvers, often based on interior point methods, do exist and readily provide high-accuracy solutions. [sent-17, score-0.226]

3 However, their memory requirements do not scale well with problem size, and they typically do not allow a fine-grained tradeoff between optimization accuracy and speed, which is often a desirable tradeoff in machine learning problems that are based on random data. [sent-18, score-0.106]

4 Although some SDPs do admit tailored solvers which are fast and scalable (e. [sent-20, score-0.1]

5 , [17, 3, 7]), deriving and implementing these methods is often challenging, and an easily usable solver that alleviates these issues has been elusive. [sent-22, score-0.107]

6 In this work, we present Random Conic Pursuit, a randomized solver for general SDPs that is simple, easily implemented, scalable, and of inherent interest due to its novel construction. [sent-24, score-0.147]

7 min over X of f (X) subject to gj (X) ≤ 0, j = 1, . . . , k, and X ⪰ 0 (1), where f and the gj are convex real-valued functions, and ⪰ denotes the ordering induced by the PSD cone. [sent-30, score-0.103]

8 Random Conic Pursuit minimizes the objective function iteratively, repeatedly randomly sampling a PSD matrix and optimizing over the random two-dimensional subcone given by this matrix and the current iterate. [sent-31, score-0.125]

9 This construction maintains feasibility while avoiding the computational expense of deterministically finding feasible directions or of projecting into the feasible set. [sent-32, score-0.129]

10 The resulting algorithm, despite its simplicity and randomized nature, converges fairly quickly to useful approximate solutions. [sent-35, score-0.086]

11 Unlike interior point methods, Random Conic Pursuit does not excel in producing highly exact solutions. [sent-36, score-0.224]

12 At each iteration, the algorithm considers the two-dimensional cone spanned by the current iterate, Xt , and a random rank one PSD matrix, Yt . [sent-40, score-0.195]

13 It selects as its next iterate, Xt+1 , the point in this cone that minimizes the objective f subject to the constraints gj (Xt+1 ) ≤ 0 in (1). [sent-41, score-0.302]

14 The distribution of the random matrices is periodically updated based on the current iterate (e. [sent-42, score-0.135]

15 , to match the current iterate in expectation); these updates yield random matrices that are better matched to the optimum of the SDP at hand. [sent-44, score-0.155]
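
A minimal Python/NumPy sketch of this loop is given below, under the bracketed sampling scheme (p = N(0, Σ) with Σ = (1 − κ)Xt + κI). The helper solve_two_dim is a hypothetical placeholder for the subproblem solver described next; any constraints gj are assumed to be handled inside it.

    import numpy as np

    def random_conic_pursuit(f, solve_two_dim, X0, n_iters=1000, kappa=1e-4):
        """Sketch of Random Conic Pursuit (Algorithm 1); not the authors' reference code.

        f             : convex objective on PSD matrices
        solve_two_dim : returns (alpha, beta) >= 0 approximately minimizing f(alpha*Y + beta*X)
        X0            : feasible PSD starting point
        """
        d = X0.shape[0]
        X = X0.copy()
        Sigma = (1 - kappa) * X + kappa * np.eye(d)       # bracketed sampling scheme
        for _ in range(n_iters):
            x = np.random.multivariate_normal(np.zeros(d), Sigma)
            Y = np.outer(x, x)                            # random rank-one PSD matrix Y_t
            alpha, beta = solve_two_dim(f, Y, X)          # two-variable subproblem (2)
            X = alpha * Y + beta * X                      # next iterate, still PSD
            if alpha > 0:                                 # periodically re-match p to the iterate
                Sigma = (1 - kappa) * X + kappa * np.eye(d)
        return X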

16 The two-variable optimization (2) can be solved quickly in general via a two-dimensional bisection search. [sent-45, score-0.129]

17 Additionally, SDPs with a trace constraint tr X = 1 force α + β = 1 and therefore require only a one-dimensional optimization. [sent-47, score-0.076]
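
A hedged sketch of that one-dimensional case follows; it assumes Yt has been rescaled to unit trace so that α + β = 1 preserves tr X = 1, and uses a bounded scalar minimizer in place of an explicit bisection.

    import numpy as np
    from scipy.optimize import minimize_scalar

    def solve_trace_constrained(f, Y, X):
        """One-variable version of subproblem (2) when tr X = 1 is enforced."""
        Y = Y / np.trace(Y)                               # normalize so tr Y = 1
        res = minimize_scalar(lambda a: f(a * Y + (1 - a) * X),
                              bounds=(0.0, 1.0), method="bounded")
        return res.x, 1.0 - res.x                         # (alpha, beta) with alpha + beta = 1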

18 First, its iterates are feasible for (1) because each iterate is a positive sum of two PSD matrices, and because the constraints gj of (2) are also those of (1). [sent-49, score-0.301]

19 Second, the objective values decrease monotonically because β = 1, α = 0 is a feasible solution to (2). [sent-50, score-0.127]

20 We must also note two limitations of Random Conic Pursuit: it does not admit general equality constraints, and it requires a feasible starting point. [sent-51, score-0.106]

21 Nonetheless, for many of the SDPs that appear in machine learning, feasible points are easy to identify, and equality constraints are either absent or fortuitously pose no difficulty. [sent-52, score-0.106]

22 We can gain further intuition by observing that Random Conic Pursuit’s iterates, Xt , are positive weighted sums of random rank one matrices and so lie in the random polyhedral cones Ft^x := { Σi=1..t γi xi xi⊤ : γi ≥ 0 } ⊂ {X : X ⪰ 0} (3). [sent-53, score-0.75]

23 Thus, the algorithm minimizes the objective subject to the gj constraints within an expanding sequence of random cones {Ft^x}. [sent-57, score-0.197]

24 These cones yield successively better inner approximations of the PSD cone (a basis for which is the set of all rank one matrices) while allowing us to easily ensure that the iterates remain PSD. [sent-58, score-0.284]

25 In light of this discussion, one might consider approximating the original SDP by sampling a random cone Fn^x in one shot and replacing the constraint X ⪰ 0 in (1) with the simpler linear constraints X ∈ Fn^x. [sent-59, score-0.252]

26 For sufficiently large n, Fn^x would approximate the PSD cone well (see Theorem 2 below), yielding an inner approximation that upper bounds the original SDP; the resulting problem would be easier than the original (e. [sent-60, score-0.141]

27 , it would become a linear program if the gj were linear). [sent-62, score-0.084]
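
As a rough illustration of this one-shot approximation, the sketch below assumes a linear objective f(X) = <C, X> and linear constraints <Aj, X> ≤ bj (these names and the use of scipy.optimize.linprog are our own choices); restricting X to the sampled cone Fn^x turns the SDP into an LP over the nonnegative weights γ.

    import numpy as np
    from scipy.optimize import linprog

    def one_shot_polyhedral_sdp(C, A_list, b, n, d, seed=0):
        """Approximate  min <C, X>  s.t.  <A_j, X> <= b_j,  X PSD,
        by restricting X to the cone of n random rank-one matrices."""
        rng = np.random.default_rng(seed)
        xs = rng.standard_normal((n, d))                          # x_i ~ N(0, I)
        rank_ones = [np.outer(x, x) for x in xs]                  # generators of F_n^x
        c = np.array([np.sum(C * R) for R in rank_ones])          # objective in gamma
        A_ub = np.array([[np.sum(A * R) for R in rank_ones] for A in A_list])
        res = linprog(c, A_ub=A_ub, b_ub=np.asarray(b), bounds=(0, None))
        return sum(g * R for g, R in zip(res.x, rank_ones))       # approximate solution X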

28 (Algorithm 1, remaining steps) solve (2): minimize f(αYt + βXt−1) over α, β ≥ 0 subject to gj(αYt + βXt−1) ≤ 0, j = 1, . . . , k; set Xt ← αYt + βXt−1 at the minimizing (α, β); if α > 0 then update p based on Xt [bracketed scheme: p ← N(0, Σ) with Σ = (1 − κ)Xt + κId]; return Xn. [sent-68, score-0.084]

29 Random Conic Pursuit successfully resolves this issue by iteratively expanding the random cone Ft^x. [sent-71, score-0.164]

30 3 Applications and Experiments We assess the practical convergence and scaling properties of Random Conic Pursuit by applying it to three different machine learning tasks that rely on SDPs: distance metric learning, sparse PCA, and maximum variance unfolding. [sent-74, score-0.146]

31 For each, we compare the performance of Random Conic Pursuit (implemented in MATLAB) to that of a standard and widely used interior point solver, SeDuMi [21] (via cvx [9]), and to the best available solver which has been customized for each problem. [sent-75, score-0.386]

32 To evaluate convergence, we first compute a ground-truth solution X ∗ for each problem instance by running the interior point solver with extremely low tolerance. [sent-76, score-0.359]

33 Then, for each algorithm, we plot the normalized objective value errors [f (Xt ) − f (X ∗ )]/|f (X ∗ )| of its iterates Xt as a function of the amount of time required to generate each iterate. [sent-77, score-0.112]

34 We evaluate scaling with problem dimensionality by running the various solvers on problems of different dimensionalities and computing various metrics on the solver runs as described below for each experiment. [sent-80, score-0.226]

35 Unless otherwise noted, we use the bracketed sampling scheme given in Algorithm 1 with κ = 10−4 for all runs of Random Conic Pursuit. [sent-81, score-0.102]

36 3.1 Metric Learning Given a set of datapoints in Rd and a pairwise similarity relation over them, metric learning extracts a Mahalanobis distance dA(x, y) = (x − y)⊤A(x − y) under which similar points are nearby and dissimilar points are far apart [24]. [sent-83, score-0.121]
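
A one-line illustration of this learned distance (the variable names are ours):

    import numpy as np

    def mahalanobis_sq(A, x, y):
        """d_A(x, y) = (x - y)^T A (x - y) for a PSD matrix A."""
        diff = np.asarray(x) - np.asarray(y)
        return float(diff @ A @ diff)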

37 We solve the two-variable optimization (2) via a double bisection search: at each iteration, α is optimized out with a one-variable bisection search over α given fixed β, yielding a function of β only. [sent-89, score-0.163]
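
A hedged sketch of this nested search, substituting a bounded scalar minimizer for hand-rolled bisection (the upper bounds on α and β are illustrative assumptions, and any constraints gj would have to be folded into the inner objective, e.g., via a large penalty):

    from scipy.optimize import minimize_scalar

    def solve_double_bisection(f, Y, X, alpha_max=10.0, beta_max=10.0):
        """Approximately solve min_{alpha,beta >= 0} f(alpha*Y + beta*X)."""
        def best_alpha(beta):
            res = minimize_scalar(lambda a: f(a * Y + beta * X),
                                  bounds=(0.0, alpha_max), method="bounded")
            return res.x, res.fun

        # Outer one-dimensional search over beta on the reduced objective.
        outer = minimize_scalar(lambda b: best_alpha(b)[1],
                                bounds=(0.0, beta_max), method="bounded")
        alpha, _ = best_alpha(outer.x)
        return alpha, outer.x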

38 (plots) Trajectories of objective value error (left) and Q (right) on UCI ionosphere data. [sent-127, score-0.073]

39 (table) Scaling experiments on synthetic data (IP = interior point, RCP = Random Conic Pursuit, PG = projected gradient), with two trials per d for RCP and times in seconds. [sent-128, score-0.203]

40 As the application-specific metric for this problem, we measure the extent to which the metric learning goal has been achieved: similar datapoints should be near each other, and dissimilar datapoints should be farther away. [sent-130, score-0.242]

41 We adopt the following metric of quality of a solution matrix X, where ζ = Σi |{j : (i, j) ∈ S}| · |{l : (i, l) ∈ S̄}| and 1[·] is the indicator function: Q(X) = (1/ζ) Σi Σ j:(i,j)∈S Σ l:(i,l)∈S̄ 1[dij(X) < dil(X)]. [sent-131, score-0.085]
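
A direct, unvectorized sketch of this quality metric; it assumes S is a set of ordered index pairs of similar points and takes the dissimilar pairs to be all remaining pairs (i, l) with l ≠ i:

    import numpy as np

    def quality_Q(X, data, S):
        """Fraction of (similar, dissimilar) pairs ordered correctly by the learned metric X."""
        def d(i, j):
            diff = data[i] - data[j]
            return float(diff @ X @ diff)

        n = len(data)
        sim = {i: [j for j in range(n) if (i, j) in S] for i in range(n)}
        dis = {i: [l for l in range(n) if l != i and (i, l) not in S] for i in range(n)}
        zeta = sum(len(sim[i]) * len(dis[i]) for i in range(n))
        hits = sum(1 for i in range(n) for j in sim[i] for l in dis[i] if d(i, j) < d(i, l))
        return hits / zeta if zeta > 0 else 0.0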

42 To examine convergence behavior, we first apply the metric learning SDP to the UCI ionosphere dataset, which has d = 34 and 351 datapoints with two distinct labels (S contains pairs with identical labels). [sent-132, score-0.177]

43 We selected this dataset from among those used in [24] because it is among the datasets which have the largest dimensionality and experience the greatest impact from metric learning in that work’s clustering application. [sent-133, score-0.077]

44 Because the interior point solver scales prohibitively badly in the number of datapoints, we subsampled the dataset to yield 4 × 34 = 136 datapoints. [sent-134, score-0.346]

45 The best known customized solver for the metric learning SDP is a projected gradient algorithm [24], for which we used code available from the author’s website. [sent-140, score-0.28]

46 The two trajectory plots, for an ionosphere data problem instance, show that Random Conic Pursuit converges to a very high-quality solution (with high Q and negligible objective value error) significantly faster than interior point. [sent-142, score-0.272]

47 Additionally, our performance is comparable to that of the projected gradient method which has been customized for this task. [sent-143, score-0.116]

48 On this synthetic data, projected gradient appears to reach high Q somewhat more quickly, though Random Conic Pursuit consistently yields significantly better objective values, indicating better-quality solutions. [sent-147, score-0.101]

49 A subsequent rounding step returns the dominant eigenvector of the SDP’s solution, yielding a sparse principal component. [sent-153, score-0.074]

50 We use the colon cancer dataset [1] that has been used frequently in past studies of sparse PCA and contains 2,000 microarray readings for 62 subjects. [sent-154, score-0.106]

51 [Figure 2 residue: plot x-axis ticks in time (sec); table columns report f after 4 hrs and sparsity after 4 hrs for IP, RCP, and DSPCA.] [sent-162, score-0.097]

52 [Figure 2 residue: plot axis labels are top eigenvector sparsity and normalized objective value error; table rows are indexed by d, with one IP entry marked failed.] [sent-178, score-0.175]

53 All solvers quickly yield similar captured variance (not shown here). [sent-191, score-0.137]

54 (plots) Trajectories of objective value error (left) and sparsity (right), for a problem with d = 100. [sent-192, score-0.07]

55 (table) Scaling experiments (IP = interior point, RCP = Random Conic Pursuit), with two trials per d for RCP. [sent-193, score-0.171]

56 Thus, we can simplify the two-variable optimization (2) to a one-variable optimization, which we solve by bisection search. [sent-199, score-0.083]

57 The fastest available customized solver for the sparse PCA SDP is an adaptation of Nesterov’s smooth optimization procedure [8] (denoted by DSPCA), for which we used a MATLAB implementation with heavy MEX optimizations that is downloadable from the author’s web site. [sent-200, score-0.24]

58 We compute two application-specific metrics which capture the two goals of sparse PCA: high captured variance and high sparsity. [sent-201, score-0.073]

59 Given the top eigenvector u of a solution matrix X, its captured variance is u⊤Au, and its sparsity is given by (1/d) Σj 1[|uj| < τ]; we take τ = 10−3 based on qualitative inspection of the raw microarray data covariance matrix A. [sent-202, score-0.126]
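
A short sketch of these two metrics, taking u to be the leading eigenvector of the solution X and τ = 10−3 as in the text:

    import numpy as np

    def sparse_pca_metrics(X, A, tau=1e-3):
        """Captured variance u'Au and sparsity of the top eigenvector u of X."""
        _, eigvecs = np.linalg.eigh(X)
        u = eigvecs[:, -1]                                # dominant eigenvector (eigh sorts ascending)
        captured_variance = float(u @ A @ u)
        sparsity = float(np.mean(np.abs(u) < tau))        # fraction of near-zero entries
        return captured_variance, sparsity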

60 As seen in the two plots, on a problem instance with d = 100, Random Conic Pursuit quickly achieves an objective value within 4% of optimal and thereafter continues to converge, albeit more slowly; we also quickly achieve fairly high sparsity (compared to that of the exact SDP optimum). [sent-204, score-0.187]

61 In contrast, interior point is able to achieve lower objective value and even higher sparsity within the timeframe shown, but, unlike Random Conic Pursuit, it does not provide the option of spending less time to achieve a solution which is still relatively sparse. [sent-205, score-0.296]

62 All of the solvers quickly achieve very similar captured variances, which are not shown. [sent-206, score-0.117]

63 However, that procedure is highly customized (via several pages of derivation and an optimized implementation), whereas Random Conic Pursuit and interior point are general-purpose. [sent-208, score-0.258]

64 The table in Figure 2 illustrates scaling by reporting achieved objective values and sparsities after the solvers have each run for 4 hours. [sent-209, score-0.079]

65 Interior point fails due to memory requirements for d > 130, whereas Random Conic Pursuit continues to function and provide useful solutions, as seen from the achieved sparsity values, which are much larger than those of the raw data covariance matrix. [sent-210, score-0.077]

66 (table) Scaling experiments showing convergence as a function of m (IP = interior point, RCP = Random Conic Pursuit, GD = gradient descent). [sent-249, score-0.223]

67 This scheme empirically yields improved performance for the MVU problem as compared to the bracketed sampling scheme in Algorithm 1. [sent-251, score-0.122]

68 MVU also admits a gradient descent algorithm, which serves as a straw-man large-scale solver for the MVU SDP. [sent-254, score-0.151]

69 At each iteration, the step size is picked by a line search, and the spectrum of the iterate is truncated to maintain PSDness. [sent-255, score-0.071]
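
A hedged sketch of that spectrum-truncation step (the MVU gradient and line search themselves are problem-specific and omitted):

    import numpy as np

    def project_psd(M):
        """Truncate the spectrum of a symmetric matrix so the iterate stays PSD."""
        M = 0.5 * (M + M.T)                               # symmetrize against numerical drift
        eigvals, eigvecs = np.linalg.eigh(M)
        eigvals = np.clip(eigvals, 0.0, None)             # drop negative eigenvalues
        return (eigvecs * eigvals) @ eigvecs.T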

70 To quantify the amount of time it takes a solver to converge, we run it until its objective curve appears qualitatively flat and declare the convergence point to be the earliest iterate whose objective is within 1% of the best objective value seen so far (which we denote by fˆ). [sent-258, score-0.368]
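
A small sketch of this convergence criterion applied to a recorded trajectory of (time, objective value) pairs; here the best value seen is taken over the whole run:

    def convergence_time(times, objectives, tol=0.01):
        """Earliest time at which the objective is within tol (1%) of the best value recorded."""
        f_hat = min(objectives)
        for t, f_val in zip(times, objectives):
            if abs(f_val - f_hat) <= tol * abs(f_hat):
                return t
        return times[-1]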

71 Figure 3 illustrates that Random Conic Pursuit’s objective values converge quickly, and on problems where the interior point solver achieves the optimum, Random Conic Pursuit nearly achieves that optimum. [sent-259, score-0.373]

72 The interior point solver runs out of memory when m > 400 and also fails on smaller problems if its tolerance parameter is not tuned. [sent-260, score-0.305]

73 Random Conic Pursuit easily runs on larger problems for which interior point fails, and for smaller problems its running time is within a small factor of that of the interior point solver; Random Conic Pursuit typically converges within 1000 iterations. [sent-261, score-0.396]

74 The gradient descent solver is orders of magnitude slower than the other solvers and failed to converge to a meaningful solution for m ≥ 400 even after 2000 iterations (which took 8 hours). [sent-262, score-0.329]

75 4 Analysis Analysis of Random Conic Pursuit is complicated by the procedure’s use of randomness and its handling of the constraints gj ≤ 0 explicitly in the sub-problem (2), rather than via penalty functions or projections. [sent-263, score-0.109]

76 We thus obtain a bound on the rate at which the objective values of Random Conic Pursuit’s iterates converge to the SDP’s optimal value when the problem has no constraints of the form gj ≤ 0: Theorem 1 (Convergence rate of Random Conic Pursuit when f is weakly convex and k = 0). [sent-265, score-0.263]

77 Let X1 , . . . , Xt be the iterates of Algorithm 1 when applied to this problem starting at iterate X0 (using the bracketed sampling scheme given in the algorithm specification), and suppose ∥Xt − X ∗ ∥ is bounded. [sent-270, score-0.24]

78 In particular, the choice Yt := γt (xt ) xt xt⊤ with γt (x) = N (x|0, X ∗ )/N (x|0, Σt ) satisfies this. [sent-278, score-0.265]
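
For concreteness, this weight is a ratio of two zero-mean Gaussian densities; the sketch below (using scipy.stats.multivariate_normal, with a small ridge so both covariances are nonsingular) is only meaningful for the analysis, since it presumes knowledge of X ∗.

    import numpy as np
    from scipy.stats import multivariate_normal

    def importance_weight(x, X_star, Sigma_t, ridge=1e-8):
        """gamma_t(x) = N(x | 0, X*) / N(x | 0, Sigma_t), as in the proof of Theorem 1."""
        d = len(x)
        num = multivariate_normal.pdf(x, mean=np.zeros(d), cov=X_star + ridge * np.eye(d))
        den = multivariate_normal.pdf(x, mean=np.zeros(d), cov=Sigma_t + ridge * np.eye(d))
        return num / den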

79 Despite the extremely simple and randomized nature of Random Conic Pursuit, the theorem guarantees that its objective values converge at the rate O(1/t) on an important subclass of SDPs. [sent-287, score-0.134]

80 We omit here some readily available extensions: for example, the probability that a trajectory of iterates violates the above rate can be bounded by noting that the iterates’ objective values behave as a finite difference sub-martingale. [sent-288, score-0.14]

81 Directly characterizing the convergence of Random Conic Pursuit on problems with constraints appears to be significantly more difficult and seems to require introduction of new quantities depending on the constraint set (e. [sent-290, score-0.079]

82 , condition number of the constraint set and its overlap with the PSD cone) whose implications for the algorithm are difficult to explicitly characterize with respect to d and the properties of the gj , X ∗ , and the Yt sampling distribution. [sent-292, score-0.147]

83 A more general analysis is the subject of continuing work, though our experiments confirm empirically that we realize usefully fast convergence of Random Conic Pursuit even when it is applied to a variety of constrained SDPs. [sent-295, score-0.072]

84 We obtain a different analytical perspective by recalling that Random Conic Pursuit computes a solution within the random polyhedral cone Fn^x, defined in (3) above. [sent-296, score-0.229]

85 The distance between this cone and the optimal matrix X ∗ is closely related to the quality of solutions produced by Random Conic Pursuit. [sent-297, score-0.153]

86 The following theorem characterizes the distance between a sampled cone Fn^x and any fixed X ∗ in the PSD cone: Theorem 2. [sent-298, score-0.121]

87 As expected, Fn^x provides a progressively better approximation to the PSD cone (with high probability) as n grows. [sent-307, score-0.121]

88 The constant Γ in Theorem 1 can hide a dependence on the dimensionality of the problem d, though the proof of Theorem 2 helps to elucidate the dependence of Γ on d and X ∗ for the particular case when Σ does not vary over time (the constants in Theorem 2 arise from bounding γt (xt ) xt xt⊤). [sent-309, score-0.285]

89 However, our empirical results in Section 3 show that Random Conic Pursuit does indeed decrease the objective function usefully quickly on real problems with relatively large d and solution matrices X ∗ which are rank one, a case predicted by the analysis to be among the most difficult. [sent-313, score-0.195]

90 Our procedure is closely related to feasible direction methods [22], which move along descent directions in the feasible set defined by the constraints at the current iterate. [sent-315, score-0.174]

91 Random Conic Pursuit overcomes the difficulty of finding feasible descent directions or cutting planes, respectively, by sampling directions randomly and also allowing the current iterate to be rescaled. [sent-317, score-0.248]

92 Pursuit-based optimization methods [6, 13] return a solution within the convex hull of an a priori-specified convenient set of points M. [sent-318, score-0.07]

93 At each iteration, they refine their solution to a point between the current iterate and a point in M. [sent-319, score-0.153]

94 For SDPs having only a trace equality constraint and with M the set of rank one PSD matrices, Hazan [10] shows that such points in M can be found via an eigenvalue computation, thereby obtaining a convergence rate of O(1/t). [sent-321, score-0.112]

95 Finally, whereas Random Conic Pursuit utilizes a randomized polyhedral inner approximation of the PSD cone, the work of Calafiore and Campi [5] yields a randomized outer approximation to the PSD cone obtained by replacing the PSD constraint X ⪰ 0 with a set of sampled linear inequality constraints. [sent-325, score-0.264]

96 It can be shown that for linear SDPs, the dual of the interior LP relaxation is identical to the exterior LP relaxation of the dual of the SDP. [sent-326, score-0.171]

97 6 Conclusion We have presented Random Conic Pursuit, a simple, easily implemented randomized solver for general SDPs. [sent-328, score-0.147]

98 Unlike interior point methods, our procedure does not excel at producing highly exact solutions. [sent-329, score-0.224]

99 A direct formulation for sparse PCA using semidefinite programming. [sent-391, score-0.075]

100 A simple lemma on greedy approximation in Hilbert space and convergence rates for projection pursuit regression and neural network training. [sent-417, score-0.495]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('conic', 0.477), ('pursuit', 0.467), ('sdps', 0.3), ('xt', 0.265), ('rcp', 0.24), ('yt', 0.207), ('interior', 0.171), ('psd', 0.151), ('sdp', 0.143), ('dspca', 0.135), ('cone', 0.121), ('solver', 0.107), ('ip', 0.092), ('gj', 0.084), ('failed', 0.08), ('mvu', 0.075), ('iterate', 0.071), ('iterates', 0.067), ('semide', 0.067), ('datapoints', 0.064), ('customized', 0.06), ('pg', 0.06), ('bisection', 0.06), ('fn', 0.058), ('metric', 0.057), ('feasible', 0.054), ('tr', 0.05), ('solvers', 0.047), ('pca', 0.046), ('sec', 0.046), ('quickly', 0.046), ('objective', 0.045), ('bracketed', 0.045), ('cones', 0.045), ('random', 0.043), ('randomized', 0.04), ('sampling', 0.037), ('polyhedral', 0.037), ('hrs', 0.036), ('alg', 0.036), ('scaling', 0.032), ('projected', 0.032), ('gd', 0.032), ('solutions', 0.032), ('rank', 0.031), ('cala', 0.03), ('colon', 0.03), ('ore', 0.03), ('sedumi', 0.03), ('sparse', 0.029), ('rd', 0.029), ('ionosphere', 0.028), ('convergence', 0.028), ('solution', 0.028), ('readily', 0.028), ('scalable', 0.028), ('point', 0.027), ('equality', 0.027), ('constraint', 0.026), ('excel', 0.026), ('extremely', 0.026), ('sparsity', 0.025), ('eigenvector', 0.025), ('constraints', 0.025), ('continues', 0.025), ('admit', 0.025), ('inexpensive', 0.024), ('usefully', 0.024), ('cutting', 0.024), ('microarray', 0.024), ('gradient', 0.024), ('captured', 0.024), ('optimization', 0.023), ('berkeley', 0.023), ('converge', 0.023), ('readings', 0.023), ('burer', 0.023), ('id', 0.022), ('ft', 0.022), ('nonetheless', 0.022), ('cvx', 0.021), ('optimizations', 0.021), ('badly', 0.021), ('iteration', 0.021), ('matrices', 0.021), ('directions', 0.021), ('plots', 0.021), ('continuing', 0.02), ('tradeoff', 0.02), ('yield', 0.02), ('descent', 0.02), ('matlab', 0.02), ('yielding', 0.02), ('programming', 0.02), ('scheme', 0.02), ('metrics', 0.02), ('trajectories', 0.02), ('dimensionality', 0.02), ('programs', 0.019), ('convex', 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 219 nips-2010-Random Conic Pursuit for Semidefinite Programming

Author: Ariel Kleiner, Ali Rahimi, Michael I. Jordan

Abstract: We present a novel algorithm, Random Conic Pursuit, that solves semidefinite programs (SDPs) via repeated optimization over randomly selected two-dimensional subcones of the PSD cone. This scheme is simple, easily implemented, applicable to very general SDPs, scalable, and theoretically interesting. Its advantages are realized at the expense of an ability to readily compute highly exact solutions, though useful approximate solutions are easily obtained. This property renders Random Conic Pursuit of particular interest for machine learning applications, in which the relevant SDPs are generally based upon random data and so exact minima are often not a priority. Indeed, we present empirical results to this effect for various SDPs encountered in machine learning; these experiments demonstrate the potential practical usefulness of Random Conic Pursuit. We also provide a preliminary analysis that yields insight into the theoretical properties and convergence of the algorithm. 1

2 0.18804176 231 nips-2010-Robust PCA via Outlier Pursuit

Author: Huan Xu, Constantine Caramanis, Sujay Sanghavi

Abstract: Singular Value Decomposition (and Principal Component Analysis) is one of the most widely used techniques for dimensionality reduction: successful and efficiently computable, it is nevertheless plagued by a well-known, well-documented sensitivity to outliers. Recent work has considered the setting where each point has a few arbitrarily corrupted components. Yet, in applications of SVD or PCA such as robust collaborative filtering or bioinformatics, malicious agents, defective genes, or simply corrupted or contaminated experiments may effectively yield entire points that are completely corrupted. We present an efficient convex optimization-based algorithm we call Outlier Pursuit, that under some mild assumptions on the uncorrupted points (satisfied, e.g., by the standard generative assumption in PCA problems) recovers the exact optimal low-dimensional subspace, and identifies the corrupted points. Such identification of corrupted points that do not conform to the low-dimensional approximation, is of paramount interest in bioinformatics and financial applications, and beyond. Our techniques involve matrix decomposition using nuclear norm minimization, however, our results, setup, and approach, necessarily differ considerably from the existing line of work in matrix completion and matrix decomposition, since we develop an approach to recover the correct column space of the uncorrupted matrix, rather than the exact matrix itself.

3 0.11767657 222 nips-2010-Random Walk Approach to Regret Minimization

Author: Hariharan Narayanan, Alexander Rakhlin

Abstract: We propose a computationally efficient random walk on a convex body which rapidly mixes to a time-varying Gibbs distribution. In the setting of online convex optimization and repeated games, the algorithm yields low regret and presents a novel efficient method for implementing mixture forecasting strategies. 1

4 0.11493398 265 nips-2010-The LASSO risk: asymptotic results and real world examples

Author: Mohsen Bayati, José Pereira, Andrea Montanari

Abstract: We consider the problem of learning a coefficient vector x0 ∈ RN from noisy linear observation y = Ax0 + w ∈ Rn . In many contexts (ranging from model selection to image processing) it is desirable to construct a sparse estimator x. In this case, a popular approach consists in solving an ℓ1 -penalized least squares problem known as the LASSO or Basis Pursuit DeNoising (BPDN). For sequences of matrices A of increasing dimensions, with independent gaussian entries, we prove that the normalized risk of the LASSO converges to a limit, and we obtain an explicit expression for this limit. Our result is the first rigorous derivation of an explicit formula for the asymptotic mean square error of the LASSO for random instances. The proof technique is based on the analysis of AMP, a recently developed efficient algorithm, that is inspired from graphical models ideas. Through simulations on real data matrices (gene expression data and hospital medical records) we observe that these results can be relevant in a broad array of practical applications.

5 0.11069065 182 nips-2010-New Adaptive Algorithms for Online Classification

Author: Francesco Orabona, Koby Crammer

Abstract: We propose a general framework to online learning for classification problems with time-varying potential functions in the adversarial setting. This framework allows to design and prove relative mistake bounds for any generic loss function. The mistake bounds can be specialized for the hinge loss, allowing to recover and improve the bounds of known online classification algorithms. By optimizing the general bound we derive a new online classification algorithm, called NAROW, that hybridly uses adaptive- and fixed- second order information. We analyze the properties of the algorithm and illustrate its performance using synthetic dataset. 1

6 0.10512754 246 nips-2010-Sparse Coding for Learning Interpretable Spatio-Temporal Primitives

7 0.1015175 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability

8 0.075844176 41 nips-2010-Block Variable Selection in Multivariate Regression and High-dimensional Causal Inference

9 0.074952856 196 nips-2010-Online Markov Decision Processes under Bandit Feedback

10 0.071634933 134 nips-2010-LSTD with Random Projections

11 0.067489773 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices

12 0.067020878 180 nips-2010-Near-Optimal Bayesian Active Learning with Noisy Observations

13 0.06194482 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models

14 0.061473306 224 nips-2010-Regularized estimation of image statistics by Score Matching

15 0.061089635 104 nips-2010-Generative Local Metric Learning for Nearest Neighbor Classification

16 0.059864771 210 nips-2010-Practical Large-Scale Optimization for Max-norm Regularization

17 0.059653603 287 nips-2010-Worst-Case Linear Discriminant Analysis

18 0.057993792 63 nips-2010-Distributed Dual Averaging In Networks

19 0.056870311 61 nips-2010-Direct Loss Minimization for Structured Prediction

20 0.056607127 228 nips-2010-Reverse Multi-Label Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.161), (1, 0.001), (2, 0.091), (3, 0.04), (4, 0.071), (5, -0.053), (6, 0.014), (7, 0.036), (8, -0.09), (9, 0.084), (10, 0.074), (11, 0.004), (12, 0.078), (13, 0.023), (14, 0.033), (15, 0.041), (16, -0.003), (17, -0.115), (18, 0.009), (19, 0.042), (20, 0.061), (21, 0.058), (22, 0.04), (23, -0.019), (24, -0.104), (25, -0.114), (26, 0.063), (27, -0.021), (28, -0.08), (29, -0.056), (30, -0.103), (31, 0.092), (32, -0.061), (33, 0.059), (34, 0.025), (35, -0.004), (36, -0.194), (37, -0.062), (38, -0.127), (39, -0.035), (40, -0.047), (41, 0.116), (42, -0.036), (43, 0.02), (44, -0.073), (45, 0.036), (46, 0.003), (47, -0.055), (48, 0.187), (49, 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93847424 219 nips-2010-Random Conic Pursuit for Semidefinite Programming

Author: Ariel Kleiner, Ali Rahimi, Michael I. Jordan

Abstract: We present a novel algorithm, Random Conic Pursuit, that solves semidefinite programs (SDPs) via repeated optimization over randomly selected two-dimensional subcones of the PSD cone. This scheme is simple, easily implemented, applicable to very general SDPs, scalable, and theoretically interesting. Its advantages are realized at the expense of an ability to readily compute highly exact solutions, though useful approximate solutions are easily obtained. This property renders Random Conic Pursuit of particular interest for machine learning applications, in which the relevant SDPs are generally based upon random data and so exact minima are often not a priority. Indeed, we present empirical results to this effect for various SDPs encountered in machine learning; these experiments demonstrate the potential practical usefulness of Random Conic Pursuit. We also provide a preliminary analysis that yields insight into the theoretical properties and convergence of the algorithm. 1

2 0.70281905 231 nips-2010-Robust PCA via Outlier Pursuit

Author: Huan Xu, Constantine Caramanis, Sujay Sanghavi

Abstract: Singular Value Decomposition (and Principal Component Analysis) is one of the most widely used techniques for dimensionality reduction: successful and efficiently computable, it is nevertheless plagued by a well-known, well-documented sensitivity to outliers. Recent work has considered the setting where each point has a few arbitrarily corrupted components. Yet, in applications of SVD or PCA such as robust collaborative filtering or bioinformatics, malicious agents, defective genes, or simply corrupted or contaminated experiments may effectively yield entire points that are completely corrupted. We present an efficient convex optimization-based algorithm we call Outlier Pursuit, that under some mild assumptions on the uncorrupted points (satisfied, e.g., by the standard generative assumption in PCA problems) recovers the exact optimal low-dimensional subspace, and identifies the corrupted points. Such identification of corrupted points that do not conform to the low-dimensional approximation, is of paramount interest in bioinformatics and financial applications, and beyond. Our techniques involve matrix decomposition using nuclear norm minimization, however, our results, setup, and approach, necessarily differ considerably from the existing line of work in matrix completion and matrix decomposition, since we develop an approach to recover the correct column space of the uncorrupted matrix, rather than the exact matrix itself.

3 0.55152333 222 nips-2010-Random Walk Approach to Regret Minimization

Author: Hariharan Narayanan, Alexander Rakhlin

Abstract: We propose a computationally efficient random walk on a convex body which rapidly mixes to a time-varying Gibbs distribution. In the setting of online convex optimization and repeated games, the algorithm yields low regret and presents a novel efficient method for implementing mixture forecasting strategies. 1

4 0.4855971 45 nips-2010-CUR from a Sparse Optimization Viewpoint

Author: Jacob Bien, Ya Xu, Michael W. Mahoney

Abstract: The CUR decomposition provides an approximation of a matrix X that has low reconstruction error and that is sparse in the sense that the resulting approximation lies in the span of only a few columns of X. In this regard, it appears to be similar to many sparse PCA methods. However, CUR takes a randomized algorithmic approach, whereas most sparse PCA methods are framed as convex optimization problems. In this paper, we try to understand CUR from a sparse optimization viewpoint. We show that CUR is implicitly optimizing a sparse regression objective and, furthermore, cannot be directly cast as a sparse PCA method. We also observe that the sparsity attained by CUR possesses an interesting structure, which leads us to formulate a sparse PCA method that achieves a CUR-like sparsity.

5 0.45406163 182 nips-2010-New Adaptive Algorithms for Online Classification

Author: Francesco Orabona, Koby Crammer

Abstract: We propose a general framework to online learning for classification problems with time-varying potential functions in the adversarial setting. This framework allows to design and prove relative mistake bounds for any generic loss function. The mistake bounds can be specialized for the hinge loss, allowing to recover and improve the bounds of known online classification algorithms. By optimizing the general bound we derive a new online classification algorithm, called NAROW, that hybridly uses adaptive- and fixed- second order information. We analyze the properties of the algorithm and illustrate its performance using synthetic dataset. 1

6 0.43855181 246 nips-2010-Sparse Coding for Learning Interpretable Spatio-Temporal Primitives

7 0.43826365 136 nips-2010-Large-Scale Matrix Factorization with Missing Data under Additional Constraints

8 0.4340325 265 nips-2010-The LASSO risk: asymptotic results and real world examples

9 0.42852503 180 nips-2010-Near-Optimal Bayesian Active Learning with Noisy Observations

10 0.4277086 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices

11 0.39879876 110 nips-2010-Guaranteed Rank Minimization via Singular Value Projection

12 0.3953827 41 nips-2010-Block Variable Selection in Multivariate Regression and High-dimensional Causal Inference

13 0.39099306 19 nips-2010-A rational decision making framework for inhibitory control

14 0.38272583 287 nips-2010-Worst-Case Linear Discriminant Analysis

15 0.38093209 61 nips-2010-Direct Loss Minimization for Structured Prediction

16 0.36953205 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability

17 0.35810408 225 nips-2010-Relaxed Clipping: A Global Training Method for Robust Regression and Classification

18 0.34262004 68 nips-2010-Effects of Synaptic Weight Diffusion on Learning in Decision Making Networks

19 0.33657444 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning

20 0.33116996 122 nips-2010-Improving the Asymptotic Performance of Markov Chain Monte-Carlo by Inserting Vortices


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.076), (16, 0.269), (17, 0.013), (27, 0.03), (30, 0.051), (35, 0.027), (45, 0.19), (50, 0.072), (52, 0.02), (60, 0.067), (77, 0.053), (78, 0.013), (90, 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.8184486 136 nips-2010-Large-Scale Matrix Factorization with Missing Data under Additional Constraints

Author: Kaushik Mitra, Sameer Sheorey, Rama Chellappa

Abstract: Matrix factorization in the presence of missing data is at the core of many computer vision problems such as structure from motion (SfM), non-rigid SfM and photometric stereo. We formulate the problem of matrix factorization with missing data as a low-rank semidefinite program (LRSDP) with the advantage that: 1) an efficient quasi-Newton implementation of the LRSDP enables us to solve large-scale factorization problems, and 2) additional constraints such as orthonormality, required in orthographic SfM, can be directly incorporated in the new formulation. Our empirical evaluations suggest that, under the conditions of matrix completion theory, the proposed algorithm finds the optimal solution, and also requires fewer observations compared to the current state-of-the-art algorithms. We further demonstrate the effectiveness of the proposed algorithm in solving the affine SfM problem, non-rigid SfM and photometric stereo problems.

same-paper 2 0.7858361 219 nips-2010-Random Conic Pursuit for Semidefinite Programming

Author: Ariel Kleiner, Ali Rahimi, Michael I. Jordan

Abstract: We present a novel algorithm, Random Conic Pursuit, that solves semidefinite programs (SDPs) via repeated optimization over randomly selected two-dimensional subcones of the PSD cone. This scheme is simple, easily implemented, applicable to very general SDPs, scalable, and theoretically interesting. Its advantages are realized at the expense of an ability to readily compute highly exact solutions, though useful approximate solutions are easily obtained. This property renders Random Conic Pursuit of particular interest for machine learning applications, in which the relevant SDPs are generally based upon random data and so exact minima are often not a priority. Indeed, we present empirical results to this effect for various SDPs encountered in machine learning; these experiments demonstrate the potential practical usefulness of Random Conic Pursuit. We also provide a preliminary analysis that yields insight into the theoretical properties and convergence of the algorithm. 1

3 0.74537355 171 nips-2010-Movement extraction by detecting dynamics switches and repetitions

Author: Silvia Chiappa, Jan R. Peters

Abstract: Many time-series such as human movement data consist of a sequence of basic actions, e.g., forehands and backhands in tennis. Automatically extracting and characterizing such actions is an important problem for a variety of different applications. In this paper, we present a probabilistic segmentation approach in which an observed time-series is modeled as a concatenation of segments corresponding to different basic actions. Each segment is generated through a noisy transformation of one of a few hidden trajectories representing different types of movement, with possible time re-scaling. We analyze three different approximation methods for dealing with model intractability, and demonstrate how the proposed approach can successfully segment table tennis movements recorded using a robot arm as haptic input device. 1

4 0.71527874 1 nips-2010-(RF)^2 -- Random Forest Random Field

Author: Nadia Payet, Sinisa Todorovic

Abstract: We combine random forest (RF) and conditional random field (CRF) into a new computational framework, called random forest random field (RF)2 . Inference of (RF)2 uses the Swendsen-Wang cut algorithm, characterized by MetropolisHastings jumps. A jump from one state to another depends on the ratio of the proposal distributions, and on the ratio of the posterior distributions of the two states. Prior work typically resorts to a parametric estimation of these four distributions, and then computes their ratio. Our key idea is to instead directly estimate these ratios using RF. RF collects in leaf nodes of each decision tree the class histograms of training examples. We use these class histograms for a nonparametric estimation of the distribution ratios. We derive the theoretical error bounds of a two-class (RF)2 . (RF)2 is applied to a challenging task of multiclass object recognition and segmentation over a random field of input image regions. In our empirical evaluation, we use only the visual information provided by image regions (e.g., color, texture, spatial layout), whereas the competing methods additionally use higher-level cues about the horizon location and 3D layout of surfaces in the scene. Nevertheless, (RF)2 outperforms the state of the art on benchmark datasets, in terms of accuracy and computation time.

5 0.69459844 212 nips-2010-Predictive State Temporal Difference Learning

Author: Byron Boots, Geoffrey J. Gordon

Abstract: We propose a new approach to value function approximation which combines linear temporal difference reinforcement learning with subspace identification. In practical applications, reinforcement learning (RL) is complicated by the fact that state is either high-dimensional or partially observable. Therefore, RL methods are designed to work with features of state rather than state itself, and the success or failure of learning is often determined by the suitability of the selected features. By comparison, subspace identification (SSID) methods are designed to select a feature set which preserves as much information as possible about state. In this paper we connect the two approaches, looking at the problem of reinforcement learning with a large set of features, each of which may only be marginally useful for value function approximation. We introduce a new algorithm for this situation, called Predictive State Temporal Difference (PSTD) learning. As in SSID for predictive state representations, PSTD finds a linear compression operator that projects a large set of features down to a small set that preserves the maximum amount of predictive information. As in RL, PSTD then uses a Bellman recursion to estimate a value function. We discuss the connection between PSTD and prior approaches in RL and SSID. We prove that PSTD is statistically consistent, perform several experiments that illustrate its properties, and demonstrate its potential on a difficult optimal stopping problem. 1

6 0.6487959 265 nips-2010-The LASSO risk: asymptotic results and real world examples

7 0.64876926 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning

8 0.64706796 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery

9 0.64669156 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models

10 0.64603335 196 nips-2010-Online Markov Decision Processes under Bandit Feedback

11 0.64600229 63 nips-2010-Distributed Dual Averaging In Networks

12 0.64555031 117 nips-2010-Identifying graph-structured activation patterns in networks

13 0.64365757 7 nips-2010-A Family of Penalty Functions for Structured Sparsity

14 0.64310706 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models

15 0.64256245 148 nips-2010-Learning Networks of Stochastic Differential Equations

16 0.64163154 122 nips-2010-Improving the Asymptotic Performance of Markov Chain Monte-Carlo by Inserting Vortices

17 0.64115584 12 nips-2010-A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups

18 0.6409899 30 nips-2010-An Inverse Power Method for Nonlinear Eigenproblems with Applications in 1-Spectral Clustering and Sparse PCA

19 0.64043868 287 nips-2010-Worst-Case Linear Discriminant Analysis

20 0.63968915 222 nips-2010-Random Walk Approach to Regret Minimization