nips nips2012 nips2012-34 knowledge-graph by maker-knowledge-mining

34 nips-2012-Active Learning of Multi-Index Function Models


Source: pdf

Author: Tyagi Hemant, Volkan Cevher

Abstract: We consider the problem of actively learning multi-index functions of the form f(x) = g(Ax) = Σ_{i=1}^{k} g_i(a_i^T x) from point evaluations of f. We assume that the function f is defined on an ℓ_2-ball in R^d, g is twice continuously differentiable almost everywhere, and A ∈ R^{k×d} is a rank-k matrix, where k ≪ d. We propose a randomized, active sampling scheme for estimating such functions with uniform approximation guarantees. Our theoretical developments leverage recent techniques from low-rank matrix recovery, which enable us to derive an estimator of the function f along with sample complexity bounds. We also characterize the noise robustness of the scheme, and provide empirical evidence that the high-dimensional scaling of our sample complexity bounds is quite accurate.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Active Learning of Multi-Index Function Models Hemant Tyagi and Volkan Cevher LIONS – EPFL Abstract We consider the problem of actively learning multi-index functions of the form f(x) = g(Ax) = Σ_{i=1}^{k} g_i(a_i^T x) from point evaluations of f . [sent-1, score-0.141]

2 We propose a randomized, active sampling scheme for estimating such functions with uniform approximation guarantees. [sent-3, score-0.403]

3 Our theoretical developments leverage recent techniques from low rank matrix recovery, which enables us to derive an estimator of the function f along with sample complexity bounds. [sent-4, score-0.235]

4 We also characterize the noise robustness of the scheme, and provide empirical evidence that the high-dimensional scaling of our sample complexity bounds is quite accurate. [sent-5, score-0.238]

5 Therefore, further assumptions on the multivariate functions beyond smoothness are needed for the tractability of successful learning [13, 14, 16, 19]. [sent-21, score-0.192]

6 To this end, we seek to learn low-dimensional function models f(x) = g(Ax) that decompose as Model 1: f(x) = Σ_{i=1}^{k} g_i(a_i^T x) | Model 2: f(x) = a_1^T x + Σ_{i=2}^{k} g_i(a_i^T x), (1) thereby constraining f to effectively live on k-dimensional subspaces, where k ≪ d. [sent-22, score-0.264]
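
To make Model 1 concrete, the following is a minimal numerical sketch (our own illustration, not taken from the paper) of a multi-index function with k = 2 ridge components in d = 20 dimensions; the profiles g_i and the matrix A below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 20, 2

# Illustrative rank-k matrix A with orthonormal rows (k << d).
Q, _ = np.linalg.qr(rng.standard_normal((d, k)))
A = Q.T                                   # shape (k, d)

# Illustrative univariate profiles g_i; Model 1: f(x) = sum_i g_i(a_i^T x).
g = [np.tanh, lambda t: t ** 2]

def f(x):
    """Evaluate the multi-index function f(x) = sum_i g_i(<a_i, x>)."""
    y = A @ x                             # the k ridge coordinates
    return sum(gi(yi) for gi, yi in zip(g, y))

x = rng.standard_normal(d)
x /= np.linalg.norm(x)                    # a point on the unit sphere S^{d-1}
print(f(x))
```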

7 In stark contrast to the classical regression setting where (y_i, x_i)_{i=1}^{m} is given a priori, we posit the active learning setting where we can query the function to obtain first an explicit approximation of A and subsequently of f . [sent-24, score-0.246]

8 As a stylized example of the active learning setting, consider numerical solutions of parametric partial differential equations (PDE). [sent-25, score-0.161]

9 As we have the ability to choose the samples, we can minimize the number of queries to the PDE solver in order to learn an explicit approximation of f [13]. [sent-27, score-0.175]

10 active learning), what the underlying low-dimensional model is (low-rank vs. [sent-30, score-0.129]

11 Noting the differentiability of f , we observe that the gradients ∇f(x) = A^T ∇g(Ax) live within the low-dimensional subspace spanned by the columns of A^T . [sent-34, score-0.102]

12 Assuming that g has sufficient richness to span k-dimensional subspaces of the rows of A, we use the given samples to obtain Hessian estimates via local smoothing techniques, such as kernel estimates, nearest-neighbor, or spline methods. [sent-35, score-0.105]

13 In some cases, we can even establish the asymptotic distribution of the estimates of A, but not finite sample complexity bounds. [sent-37, score-0.166]

14 Regression/sparse [8–12]: We add sparsity restrictions on the function models: for instance, we assume only one coordinate is active per additive term in (1). [sent-38, score-0.208]

15 We employ greedy algorithms, back-fitting approaches, or convex regularizers to not only estimate the active coordinates but also the function itself. [sent-40, score-0.129]

16 We can then establish finite sample complexity rates with guarantees of the form ‖f̂ − f‖_{L_2} ≤ δ, which grow logarithmically with d and match the minimax bounds for the learning problem. [sent-41, score-0.241]

17 Moreover, the function estimation incurs a linear cost in k since the problem formulation affords a rotation-free structure between x and gi ’s. [sent-42, score-0.107]

18 Active learning [13–15]: The majority of the (rather limited) literature on active non-parametric function learning makes sparsity assumptions on A to obtain guarantees of the form ‖f̂ − f‖_{L_∞} ≤ δ, where f ∈ C^s with s > 1. [sent-43, score-0.205]

19 1 For instance, we consider the form f(x) = g(Ax), where the rows of A live in a weak ℓ_q ball with q < 2 (i. [sent-44, score-0.133]

20 2 We then leverage a prescribed random sampling, and prove that the sample complexity grows logarithmically with d and is inversely proportional to the k-th singular value of a “Hessian” matrix H_f (for a precise definition of H_f, see (7)). [sent-47, score-0.323]

21 Thus far, the only known characterization for the k-th singular value of H_f is for radial basis functions, i. [sent-48, score-0.178]

22 Just recently, we also see a low-rank model to handle f(x) = g(a^T x) for a general a (k = 1) with a sample complexity proportional to d [15]. [sent-51, score-0.122]

23 Our contributions: In this paper, which is a summary of [26], we take the active learning perspective via low-rank methods, where we have a general A with only C^s assumptions on the g_i's. [sent-52, score-0.276]

24 k-th singular value of H_f [14, 15]: Based on the random sampling schemes of [14, 15], we rigorously establish the first high-dimensional scaling characterization of the k-th singular value of H_f, which governs the sample complexity for both sparse and general A in the multi-index models of (1). [sent-54, score-0.52]

25 Generalization of [13–15]: We derive the first sample complexity bound for the C^s functions in (1) with an arbitrary number of linear parameters k, without the compressibility assumptions on the rows of A. [sent-57, score-0.322]

26 Along the way, we leverage the conventional low-rank models in regression approaches and bridge them with the recent low-rank recovery algorithms. [sent-58, score-0.243]

27 Our result also lifts the sparse additive models in regression [8–12] to a basis-free setting. [sent-59, score-0.146]

28 Impact of additive noise: We analytically show how additive white Gaussian noise in the function queries impacts the sample complexity of our low-rank approach. [sent-61, score-0.492]

29 1 Not to be confused with the online active learning approaches, which “optimize” a function, such as finding its maximum [25]. [sent-62, score-0.129]

30 In contrast, we would like to obtain uniform approximation guarantees on f , which might lead to redundant samples if we truly are only interested in finding a critical point of the function. [sent-63, score-0.119]

31 2 As having one known basis to sparsify all k-dimensions in order to obtain a sparse A is rather restrictive, this model does not provide a basis-free generalization of the sparse additive models in regression [8–12]. [sent-64, score-0.202]

32 2 A recipe for active learning of low-dimensional non-parametric models: This section provides the preliminaries for our low-rank active learning approach for multi-index models in (1). [sent-65, score-0.295]

33 We first introduce our sampling scheme (based on [14, 15]), summarize our main observation model (based on [6, 7, 14, 15]), and explain our algorithmic approach (based on [15]). [sent-66, score-0.157]

34 Our sampling scheme: Our sampling approach relies on a specific interaction of two sets: sampling centers and an associated set of directions for each center. [sent-68, score-0.31]

35 We denote the set of sampling centers as X = {ξ_j ∈ S^{d−1}; j = 1, . [sent-69, score-0.118]

36 We form X by sampling points uniformly at random in S^{d−1} (the unit sphere in d dimensions) according to the uniform measure µ_{S^{d−1}}. [sent-73, score-0.149]
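
A standard way to realize this step numerically (continuing the illustrative sketch above; the value of m_X is arbitrary) is to normalize i.i.d. Gaussian vectors:

```python
m_X = 50                                  # number of sampling centers (illustrative)

def sample_sphere(m, d, rng):
    """Draw m points uniformly at random on the unit sphere S^{d-1} by
    normalizing i.i.d. standard Gaussian vectors."""
    Z = rng.standard_normal((m, d))
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)

centers = sample_sphere(m_X, d, rng)      # rows are the centers xi_j in X
```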

37 The directions for the j-th center are collected as Φ_j = [φ_{1,j} | · · · | φ_{m_Φ,j}], and we construct the sampling directions operator Φ for j = 1, . [sent-77, score-0.112]

38 Our low-rank observation model: We first write the Taylor series approximation of f as f(x + εφ) = f(x) + ε φ^T ∇f(x) + (ε²/2) φ^T ∇²f(ζ(x, φ)) φ, (3) where ε ≪ 1, E(x, ε, φ) is the curvature error, and ζ(x, φ) ∈ [x, x + εφ] ⊂ B_{R^d}(1 + r̄). [sent-87, score-0.118]

39 Based on (4), we then derive the following linear system via the operator Φ : R^{d×m_X} → R^{m_Φ m_X}: y = Φ(X) + E, with y_i = ε^{-1}[f(ξ_j + εφ_{i,j}) − f(ξ_j)], (5) where y ∈ R^{m_Φ m_X} contains the perturbed measurements of X with [Φ(X)]_j = trace(Φ_j^T X), and E = E(X, ε, Φ) is the vector of curvature perturbations. [sent-90, score-0.151]
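
Continuing the sketch, the measurements in (5) are plain finite differences of point queries; the number of directions m_Φ and the step size ε below are arbitrary, and the noiseless oracle f from the first sketch stands in for the actual query mechanism.

```python
m_Phi, eps = 60, 1e-3                     # directions per center and step size (illustrative)

# One set of m_Phi directions phi_{i,j} per center, drawn uniformly from the sphere.
Phi_dirs = sample_sphere(m_Phi * m_X, d, rng).reshape(m_X, m_Phi, d)

# y[j, i] = eps^{-1} [ f(xi_j + eps * phi_{i,j}) - f(xi_j) ], an (approximately)
# linear measurement of the rank-k matrix X = [grad f(xi_1), ..., grad f(xi_mX)].
y = np.empty((m_X, m_Phi))
for j in range(m_X):
    base = f(centers[j])
    for i in range(m_Phi):
        y[j, i] = (f(centers[j] + eps * Phi_dirs[j, i]) - base) / eps
y = y.ravel()                             # stacked measurement vector of length m_X * m_Phi
```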

40 The formulation (5) motivates us to leverage affine rank-minimization algorithms [27–29] for low-rank matrix recovery, since rank(X) ≤ k ≪ d. [sent-91, score-0.237]

41 Our active low-rank learning algorithm: Algorithm 1 outlines the main steps involved in our approximation scheme. [sent-92, score-0.181]

42 Step 3 maps the recovered low-rank matrix to A using the singular value decomposition (SVD) and rank-k approximation. [sent-95, score-0.12]

43 2: Obtain X̂ via a stable low-rank recovery algorithm (see Section 3 for an example). [sent-98, score-0.193]

44 3: Compute SVD(X̂) = UΣV^T and set Â^T = U^{(k)}, corresponding to the k largest singular values. [sent-99, score-0.087]
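
Step 3 is simply a truncated SVD; below is a minimal sketch (continuing the running example, with a synthetic low-rank matrix standing in for the output of the recovery algorithm in step 2).

```python
def top_k_subspace(X_hat, num_k):
    """Return the top-num_k left singular vectors of X_hat (d x m_X), i.e. an
    orthonormal basis for the estimated row space of A."""
    U, s, Vt = np.linalg.svd(X_hat, full_matrices=False)
    return U[:, :num_k]

# Synthetic stand-in for the recovered matrix: columns lie near span(A^T).
X_hat = A.T @ rng.standard_normal((k, m_X)) + 1e-3 * rng.standard_normal((d, m_X))
A_hat_T = top_k_subspace(X_hat, k)        # plays the role of \hat{A}^T = U^{(k)}

# A is identifiable only up to rotation, so compare orthogonal projectors instead.
print(np.linalg.norm(A_hat_T @ A_hat_T.T - A.T @ A))
```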

45 We uniformly approximate the function g by first sampling it on a rectangular grid hZ^k ∩ (−(1 + r̄), (1 + r̄))^k with uniformly spaced points in each direction (step size h). [sent-102, score-0.156]

46 We then use quasi-interpolants to interpolate in between the points, thereby obtaining the approximation ĝ_h, where the complexity is exponential in k (see the tractability discussion in the introduction). [sent-103, score-0.176]
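
A rough stand-in for this last step is sketched below (continuing the running example, and assuming SciPy is available): a linear grid interpolator replaces the quasi-interpolants analyzed in the paper, and the grid margin r̄ and step h are arbitrary. It also makes the exponential-in-k cost explicit, since the grid has on the order of h^{-k} nodes.

```python
from scipy.interpolate import RegularGridInterpolator

h, r_bar = 0.1, 0.1                                       # grid step and margin (illustrative)
axis = np.linspace(-(1 + r_bar), 1 + r_bar, int(round(2 * (1 + r_bar) / h)) + 1)
grid = np.meshgrid(*([axis] * k), indexing="ij")          # O(h^{-k}) grid nodes
nodes = np.stack([c.ravel() for c in grid], axis=-1)      # shape (n_nodes, k)

# Sample g on the grid through the estimated subspace: f(A_hat^T y) = g(R y) for a rotation R.
values = np.array([f(A_hat_T @ p) for p in nodes]).reshape(grid[0].shape)

g_hat = RegularGridInterpolator((axis,) * k, values)      # linear stand-in for quasi-interpolants
f_hat = lambda z: g_hat(A_hat_T.T @ z)                    # f_hat approximates f on the unit ball
```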

47 3 Stable low-rank recovery algorithms within our learning scheme: By stable low-rank recovery in Algorithm 1, we mean any algorithm that returns an X̂ with the following guarantee: ‖X̂ − X‖_F ≤ c_1 ‖X − X_k‖_F + c_2 ‖E‖_2, where c_{1,2} are constants, and X_k is the best rank-k approximation of X. [sent-105, score-0.484]

48 Since there exists a vast set of algorithms with such guarantees, we use the matrix Dantzig selector [29] as a running example. [sent-106, score-0.123]

49 This discussion is intended to expose the reader to the key elements necessary to re-derive the sample complexity of our scheme in Section 4 for different algorithms, which might offer additional computational trade-offs. [sent-107, score-0.231]

50 Stable embedding: We first explain an elementary result stating that our sampling mechanism satisfies the restricted isometry property (RIP) for all rank-k matrices with overwhelming probability. [sent-108, score-0.166]

51 This property can be used in establishing stability of virtually all low-rank recovery algorithms. [sent-110, score-0.194]

52 Recovery algorithm and its tuning parameters: The Dantzig selector criterion is given by X̂_DS = arg min_M ‖M‖_* s.t. ‖Φ*(y − Φ(M))‖ ≤ λ. [sent-114, score-0.09]
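
The constraint above is the standard operator-norm form of the matrix Dantzig selector from the low-rank recovery literature; the sketch below is our own illustration of it with a generic convex solver (assuming cvxpy is installed), where the sensing matrices Φ_j are passed as an explicit array and the tuning parameter lam is left to be set as in Proposition 1.

```python
import numpy as np
import cvxpy as cp

def matrix_dantzig_selector(Phis, y, lam):
    """Sketch of the matrix Dantzig selector: minimize the nuclear norm of M
    subject to the spectral norm of the back-projected residual
    sum_j (<Phi_j, M> - y_j) Phi_j being at most lam.
    Phis has shape (m, d, m_X); slice j holds the sensing matrix Phi_j."""
    m = Phis.shape[0]
    M = cp.Variable(Phis.shape[1:])
    residuals = [cp.sum(cp.multiply(Phis[j], M)) - y[j] for j in range(m)]  # <Phi_j, M> - y_j
    corr = sum(residuals[j] * Phis[j] for j in range(m))                    # adjoint of the residual
    problem = cp.Problem(cp.Minimize(cp.norm(M, "nuc")),
                         [cp.sigma_max(corr) <= lam])
    problem.solve()
    return M.value
```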

53 Proposition 1 is a new result that provides the typical low-rank recovery algorithm tuning parameters for the random sampling scheme in Section 2. [sent-123, score-0.349]

54 Note that the dimension d appears in the bound as we do not make any compressibility assumption on A. [sent-125, score-0.103]

55 Stability of low-rank recovery: We first restate a stability result from [29] for bounded noise in Theorem 1. [sent-130, score-0.11]

56 The approximation guarantee in Corollary 1 can be tightened if other low-rank recovery algorithms are employed in estimation of X. [sent-138, score-0.214]

57 However, we note again that the Dantzig selector enables us to highlight the key steps that lead to the sample complexity of our approach in the next section. [sent-139, score-0.212]

58 This is critical in ensuring that ∇g explores the full k-dimensional subspace spanned by A^T, lest X be rank deficient. [sent-141, score-0.09]

59 This property is typically key in proving low-rank recovery guarantees. [sent-144, score-0.162]

60 The step size ε in (3) manages the impact of the curvature effects E in the linear system (5). [sent-145, score-0.148]

61 Unfortunately, this comes at the collateral cost of amplifying the impact of noise if the queries are corrupted. [sent-146, score-0.257]

62 By our setup, g also lives on a compact set; hence all its partial derivatives up to order two are bounded as a result of the Stone-Weierstrass theorem: sup_{|β|≤2} ‖D^β g‖_∞ ≤ C_2, where D^β g = ∂^{|β|} g / (∂y_1^{β_1} · · · ∂y_k^{β_k}). [sent-156, score-0.164]

63 Finally, the effectiveness of our sampling approach depends on whether or not the following “Hessian” matrix H_f is well-conditioned: H_f := ∫_{S^{d−1}} ∇f(x) ∇f(x)^T dµ_{S^{d−1}}(x). [sent-160, score-0.113]

64 (7) That is, for the singular values of H_f, we assume σ_1(H_f) ≥ · · · ≥ σ_k(H_f) ≥ α > 0 for some α. [sent-161, score-0.087]
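
This conditioning can be checked numerically on the running toy example by Monte Carlo integration over the sphere; the finite-difference gradient and the number of Monte Carlo points below are our own illustrative choices.

```python
def grad_f(x, h=1e-5):
    """Central finite-difference approximation of the gradient of f."""
    e = np.eye(len(x))
    return np.array([(f(x + h * e[i]) - f(x - h * e[i])) / (2 * h) for i in range(len(x))])

n_mc = 2000
pts = sample_sphere(n_mc, d, rng)
G = np.stack([grad_f(p) for p in pts])    # rows: gradients at uniform points on S^{d-1}
H_f = G.T @ G / n_mc                      # Monte Carlo estimate of the integral in (7)

sing_vals = np.linalg.svd(H_f, compute_uv=False)
print("estimated sigma_k(H_f), a proxy for alpha:", sing_vals[k - 1])
```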

65 Restricted singular values of multi-index models: Our first main technical contribution provides a local condition in Proposition 2 that fully characterizes α for multi-index models in (1) below. [sent-163, score-0.125]

66 Assume that g ∈ C^2 : B_{R^k} → R has Lipschitz continuous second-order partial derivatives in an open neighborhood of the origin U_θ = B_{R^k}(θ), for some fixed θ = O(d^{−(s+1)}) and some s > 0: |∂²g/(∂y_i ∂y_j)(y_1) − ∂²g/(∂y_i ∂y_j)(y_2)| / ‖y_1 − y_2‖ ≤ L_{i,j} for all y_1, y_2 ∈ U_θ with y_1 ≠ y_2, and i, j = 1, . [sent-166, score-0.094]

67 Also assume, The proof of Proposition 2 also leads to the following proposition for tractability of learning the general set f (x) = g(Ax) without the particular modular decomposition as in (1): Proposition 3. [sent-179, score-0.186]

68 3 Unless further assumptions are made on f or gi ’s, we can only identify the subspace spanned by the rows of A up to a rotation. [sent-182, score-0.2]

69 Hence, while we discuss approximation results on A, the reader should keep in mind that our final guarantees only apply to the function f and not necessarily for A and g individually. [sent-183, score-0.12]

70 This is not a restriction, but a consequence of our scheme, as we work with directional derivatives of f at points on the unit sphere S^{d−1}. [sent-187, score-0.172]

71 Theorem 2 characterizes the necessary scaling of the sample complexity for our active learning scheme in order to obtain uniform approximation guarantees on f with overwhelming probability: m_X = O((k log k)/α), m_Φ = O(k(d + m_X)), and ε = O(√α/√d). [sent-192, score-0.557]

72 Finally, we also mention that the sample complexity can be written differently to trade off δ among m_X, m_Φ, and ε. [sent-194, score-0.152]

73 For instance, we can remove the δ dependence in the sampling bound for ε: let δ < 1; then we just need to scale m_X by δ^{−2} and m_Φ by δ^{−4}. [sent-195, score-0.109]

74 Note that the sample complexity in [14] for learning compressible A is m = O(k^{(4−q)/(2−q)} d^{2/(2−q)} log(k)) with uniform approximation guarantees on f ∈ C^2. [sent-197, score-0.278]

75 Surprisingly, our sample complexity for multi-index models (1) not only generalizes this result for general A but features a better dimensional dependence for q ∈ (1, 2): m = O(k^3 d^2 (log(k))^2). [sent-199, score-0.122]

76 Of course, we require more computation since we use low-rank recovery as opposed to sparse recovery methods. [sent-200, score-0.352]

77 Our motivation is to understand how additive noise in function queries, a realistic assumption in many applications, can impact our learning scheme, which will form the basis of our third main technical contribution. [sent-202, score-0.267]

78 Hence, unless the noise level decreases with the dimension (i.e., σ reduces with d), we can only conclude that our learning scheme with the matrix Dantzig selector is sensitive to noise, unless we resample the same data points O(ε^{−1}) times and average. [sent-214, score-0.307]

79 If the noise variance σ^2 is constant, this would keep the impact of noise below a constant times the impact of the curvature errors, which our scheme can handle. [sent-215, score-0.463]

80 The sample complexity then becomes m = O(√(d/α)) m_X(m_Φ + 1), since we choose m_X(m_Φ + 1) unique points, and then re-query and average the same points O(√(d/α)) times. [sent-216, score-0.122]
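
The resampling remedy is straightforward to implement; the sketch below (continuing the running example, with an assumed noise level and with α taken from the Monte Carlo estimate above) re-queries each point O(√(d/α)) times and averages, which shrinks the noise standard deviation by the square root of the number of repeats.

```python
sigma_noise = 0.05                        # assumed noise standard deviation (illustrative)
alpha = sing_vals[k - 1]                  # stand-in for the lower bound on sigma_k(H_f)

def noisy_query(x):
    """Point evaluation of f corrupted by additive white Gaussian noise."""
    return f(x) + sigma_noise * rng.standard_normal()

def averaged_query(x):
    """Re-query the same point O(sqrt(d / alpha)) times and return the average."""
    repeats = int(np.ceil(np.sqrt(d / alpha)))
    return np.mean([noisy_query(x) for _ in range(repeats)])
```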

81 Unfortunately, we cannot qualitatively improve the O(√(d/α)) expansion for noise robustness by simply changing the low-rank recovery algorithm, since it depends on the relative ratio of the curvature errors ‖E‖_2 to the norm of the noise vector ‖z‖_2. [sent-217, score-0.384]

82 5 Numerical Experiments: We present simulation results on toy examples to empirically demonstrate the tightness of the sampling bounds. [sent-219, score-0.116]

83 In the sequel, we assume A to be row orthonormal and concern ourselves only with the recovery of A up to an orthonormal transformation. [sent-220, score-0.162]

84 We observe that for large values of d, the minimum number of directional derivatives needed to achieve the performance bound on |⟨â, a⟩| scales approximately linearly with d, with a scaling factor of around 1. [sent-235, score-0.261]

85 Sum of Gaussian functions (k > 1): We next consider functions of the form f(x) = g(Ax + b) = Σ_{i=1}^{k} g_i(a_i^T x + b_i), where g_i(y) = (2πσ_i^2)^{−1/2} exp(−(y + b_i)^2/(2σ_i^2)). [sent-242, score-0.282]
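
For reference, this test function is easy to code up (continuing the running example; the particular A, the 0.2 scaling of b, and the σ_i below are illustrative placeholders, and we follow the displayed formula literally, with the shift b_i appearing both in the ridge argument and inside g_i).

```python
def make_gaussian_sum(A_mat, b, sigmas):
    """Sum-of-Gaussians test function f(x) = sum_i g_i(a_i^T x + b_i) with
    g_i(y) = (2*pi*sigma_i^2)^{-1/2} * exp(-(y + b_i)^2 / (2*sigma_i^2))."""
    def f_test(x):
        y = A_mat @ x + b
        return np.sum(np.exp(-(y + b) ** 2 / (2 * sigmas ** 2))
                      / np.sqrt(2 * np.pi * sigmas ** 2))
    return f_test

k_test = 3
Q_test, _ = np.linalg.qr(rng.standard_normal((d, k_test)))
A_test = Q_test.T                                         # row-orthonormal, as in the experiments
b_test = rng.standard_normal(k_test)
b_test *= 0.2 / np.linalg.norm(b_test)                    # scaled point on S^{k-1} (scale is arbitrary)
f_gauss = make_gaussian_sum(A_test, b_test, np.full(k_test, 0.5))
```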

86 In each trial, we select the rows of A over the left Haar measure on S^{d−1}, and the parameter b uniformly at random on S^{k−1} scaled by a factor 0. [sent-248, score-0.091]

87 f(x) = g(Ax) = ‖Ax − b‖_2^2 with the point queries corrupted with Gaussian noise. [sent-255, score-0.097]

88 For each d we perturb the point queries with Gaussian noise of standard deviation: 0. [sent-258, score-0.175]

89 This is the same as repeatedly sampling each random location approximately d^{3/2} times followed by averaging. [sent-260, score-0.12]

90 In Figure 2(b), we see that mΦ scales approximately linearly with d, which follows our sample complexity bound for mΦ in Theorem 2. [sent-265, score-0.22]

91 Figure 2: (a) k > 1 (Gaussian); (b) k > 1 (quadratic) with noise. The empirical performance of our oracle-based low-rank learning scheme (circles) agrees well with the theoretical scaling (dashed). [sent-266, score-0.193]

92 By introducing a new analysis tool based on a Lipschitz property of the second-order derivatives, we provide the first rigorous characterization of the dimension dependence of the k-restricted singular value of the “Hessian” matrix H_f for general multi-index models. [sent-270, score-0.184]

93 We establish the first sample complexity bound for learning non-parametric multi-index models with low-rank recovery algorithms and also analyze the impact of additive noise on the sample complexity of the scheme. [sent-271, score-0.718]

94 Lastly, we provide empirical evidence on toy examples to show the tightness of the sampling bounds. [sent-272, score-0.116]

95 Finally, while our active learning scheme ensures the tractability of learning non-parametric multi-index models, it does not establish a lower bound on the sample complexity, which is left for future work. [sent-273, score-0.342]

96 The authors thank Jan Vybiral for useful discussions and Anastasios Kyrillidis for his help with the low-rank matrix recovery simulations. [sent-276, score-0.195]

97 Minimax-optimal rates for sparse additive models over kernel classes via convex programming. [sent-341, score-0.107]

98 Learning ridge functions with randomized sampling in high dimensions. [sent-375, score-0.14]

99 Learning non-parametric basis independent models from point queries via low-rank methods. [sent-451, score-0.125]

100 Tight oracle bounds for low-rank matrix recovery from a minimal e number of random measurements. [sent-472, score-0.195]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mx', 0.698), ('xds', 0.214), ('ax', 0.211), ('recovery', 0.162), ('brd', 0.161), ('sd', 0.14), ('active', 0.129), ('gi', 0.107), ('proposition', 0.106), ('queries', 0.097), ('dantzig', 0.09), ('rip', 0.09), ('selector', 0.09), ('singular', 0.087), ('impact', 0.082), ('sampling', 0.08), ('tyagi', 0.08), ('additive', 0.079), ('noise', 0.078), ('complexity', 0.077), ('scheme', 0.077), ('curvature', 0.066), ('hessian', 0.064), ('derivatives', 0.062), ('pde', 0.062), ('brk', 0.054), ('devore', 0.054), ('hemant', 0.054), ('volkan', 0.054), ('rows', 0.053), ('approximation', 0.052), ('isometry', 0.052), ('cand', 0.052), ('subspaces', 0.052), ('live', 0.05), ('tractability', 0.047), ('sample', 0.045), ('establish', 0.044), ('compressibility', 0.044), ('lipschitz', 0.043), ('xk', 0.042), ('leverage', 0.042), ('smoothness', 0.041), ('assumptions', 0.04), ('approximately', 0.04), ('logarithmically', 0.039), ('aat', 0.039), ('cevher', 0.039), ('haar', 0.039), ('regression', 0.039), ('rank', 0.038), ('uniformly', 0.038), ('characterizes', 0.038), ('scaling', 0.038), ('centers', 0.038), ('impacts', 0.037), ('compressible', 0.037), ('epfl', 0.037), ('recipe', 0.037), ('tightness', 0.036), ('guarantees', 0.036), ('annals', 0.035), ('till', 0.035), ('lives', 0.035), ('functions', 0.034), ('overwhelming', 0.034), ('characterization', 0.034), ('modular', 0.033), ('directional', 0.033), ('matrix', 0.033), ('contributions', 0.032), ('corollary', 0.032), ('partial', 0.032), ('reader', 0.032), ('directions', 0.032), ('stability', 0.032), ('uniform', 0.031), ('stable', 0.031), ('ball', 0.03), ('dm', 0.03), ('dimension', 0.03), ('needed', 0.03), ('yi', 0.03), ('mention', 0.03), ('aa', 0.03), ('differentiable', 0.029), ('radial', 0.029), ('linearly', 0.029), ('bound', 0.029), ('unless', 0.029), ('perturbed', 0.028), ('theorem', 0.028), ('sparse', 0.028), ('basis', 0.028), ('measurements', 0.027), ('explicit', 0.026), ('ridge', 0.026), ('nuclear', 0.026), ('august', 0.026), ('remark', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999976 34 nips-2012-Active Learning of Multi-Index Function Models

Author: Tyagi Hemant, Volkan Cevher

Abstract: We consider the problem of actively learning multi-index functions of the form f(x) = g(Ax) = Σ_{i=1}^{k} g_i(a_i^T x) from point evaluations of f. We assume that the function f is defined on an ℓ_2-ball in R^d, g is twice continuously differentiable almost everywhere, and A ∈ R^{k×d} is a rank-k matrix, where k ≪ d. We propose a randomized, active sampling scheme for estimating such functions with uniform approximation guarantees. Our theoretical developments leverage recent techniques from low-rank matrix recovery, which enable us to derive an estimator of the function f along with sample complexity bounds. We also characterize the noise robustness of the scheme, and provide empirical evidence that the high-dimensional scaling of our sample complexity bounds is quite accurate.

2 0.14739044 290 nips-2012-Recovery of Sparse Probability Measures via Convex Programming

Author: Mert Pilanci, Laurent E. Ghaoui, Venkat Chandrasekaran

Abstract: We consider the problem of cardinality penalized optimization of a convex function over the probability simplex with additional convex constraints. The classical 1 regularizer fails to promote sparsity on the probability simplex since 1 norm on the probability simplex is trivially constant. We propose a direct relaxation of the minimum cardinality problem and show that it can be efficiently solved using convex programming. As a first application we consider recovering a sparse probability measure given moment constraints, in which our formulation becomes linear programming, hence can be solved very efficiently. A sufficient condition for exact recovery of the minimum cardinality solution is derived for arbitrary affine constraints. We then develop a penalized version for the noisy setting which can be solved using second order cone programs. The proposed method outperforms known rescaling heuristics based on 1 norm. As a second application we consider convex clustering using a sparse Gaussian mixture and compare our results with the well known soft k-means algorithm. 1

3 0.12832078 120 nips-2012-Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential 1-Minimization

Author: Demba Ba, Behtash Babadi, Patrick Purdon, Emery Brown

Abstract: We consider the problem of recovering a sequence of vectors, (xk )K , for which k=0 the increments xk − xk−1 are Sk -sparse (with Sk typically smaller than S1 ), based on linear measurements (yk = Ak xk + ek )K , where Ak and ek denote the meak=1 surement matrix and noise, respectively. Assuming each Ak obeys the restricted isometry property (RIP) of a certain order—depending only on Sk —we show that in the absence of noise a convex program, which minimizes the weighted sum of the ℓ1 -norm of successive differences subject to the linear measurement constraints, recovers the sequence (xk )K exactly. This is an interesting result bek=1 cause this convex program is equivalent to a standard compressive sensing problem with a highly-structured aggregate measurement matrix which does not satisfy the RIP requirements in the standard sense, and yet we can achieve exact recovery. In the presence of bounded noise, we propose a quadratically-constrained convex program for recovery and derive bounds on the reconstruction error of the sequence. We supplement our theoretical analysis with simulations and an application to real video data. These further support the validity of the proposed approach for acquisition and recovery of signals with time-varying sparsity.

4 0.098598294 32 nips-2012-Active Comparison of Prediction Models

Author: Christoph Sawade, Niels Landwehr, Tobias Scheffer

Abstract: We address the problem of comparing the risks of two given predictive models—for instance, a baseline model and a challenger—as confidently as possible on a fixed labeling budget. This problem occurs whenever models cannot be compared on held-out training data, possibly because the training data are unavailable or do not reflect the desired test distribution. In this case, new test instances have to be drawn and labeled at a cost. We devise an active comparison method that selects instances according to an instrumental sampling distribution. We derive the sampling distribution that maximizes the power of a statistical test applied to the observed empirical risks, and thereby minimizes the likelihood of choosing the inferior model. Empirically, we investigate model selection problems on several classification and regression tasks and study the accuracy of the resulting p-values. 1

5 0.097844429 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance

Author: Arnak Dalalyan, Yin Chen

Abstract: In this paper, we develop a novel approach to the problem of learning sparse representations in the context of fused sparsity and unknown noise level. We propose an algorithm, termed Scaled Fused Dantzig Selector (SFDS), that accomplishes the aforementioned learning task by means of a second-order cone program. A special emphasize is put on the particular instance of fused sparsity corresponding to the learning in presence of outliers. We establish finite sample risk bounds and carry out an experimental evaluation on both synthetic and real data. 1

6 0.096960805 25 nips-2012-A new metric on the manifold of kernel matrices with application to matrix geometric means

7 0.095126167 281 nips-2012-Provable ICA with Unknown Gaussian Noise, with Implications for Gaussian Mixtures and Autoencoders

8 0.081965797 27 nips-2012-A quasi-Newton proximal splitting method

9 0.08146134 228 nips-2012-Multilabel Classification using Bayesian Compressed Sensing

10 0.080796257 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

11 0.078732572 254 nips-2012-On the Sample Complexity of Robust PCA

12 0.075625025 16 nips-2012-A Polynomial-time Form of Robust Regression

13 0.07291729 247 nips-2012-Nonparametric Reduced Rank Regression

14 0.070853122 85 nips-2012-Convergence and Energy Landscape for Cheeger Cut Clustering

15 0.070080131 325 nips-2012-Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions

16 0.069939405 292 nips-2012-Regularized Off-Policy TD-Learning

17 0.069909289 277 nips-2012-Probabilistic Low-Rank Subspace Clustering

18 0.068153128 134 nips-2012-Finite Sample Convergence Rates of Zero-Order Stochastic Optimization Methods

19 0.065923691 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model

20 0.063511215 29 nips-2012-Accelerated Training for Matrix-norm Regularization: A Boosting Approach


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.193), (1, 0.035), (2, 0.108), (3, -0.064), (4, 0.038), (5, 0.085), (6, 0.002), (7, 0.017), (8, 0.034), (9, -0.036), (10, -0.019), (11, 0.003), (12, -0.01), (13, -0.043), (14, -0.022), (15, -0.004), (16, 0.056), (17, -0.015), (18, 0.029), (19, -0.036), (20, -0.058), (21, 0.07), (22, -0.056), (23, 0.056), (24, 0.031), (25, 0.057), (26, -0.022), (27, 0.038), (28, 0.036), (29, -0.013), (30, -0.046), (31, -0.012), (32, 0.019), (33, 0.071), (34, -0.059), (35, -0.013), (36, 0.013), (37, 0.018), (38, 0.106), (39, 0.033), (40, 0.02), (41, -0.009), (42, -0.108), (43, 0.044), (44, -0.027), (45, -0.109), (46, 0.125), (47, -0.068), (48, 0.045), (49, -0.069)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.91193879 34 nips-2012-Active Learning of Multi-Index Function Models

Author: Tyagi Hemant, Volkan Cevher

Abstract: We consider the problem of actively learning multi-index functions of the form f(x) = g(Ax) = Σ_{i=1}^{k} g_i(a_i^T x) from point evaluations of f. We assume that the function f is defined on an ℓ_2-ball in R^d, g is twice continuously differentiable almost everywhere, and A ∈ R^{k×d} is a rank-k matrix, where k ≪ d. We propose a randomized, active sampling scheme for estimating such functions with uniform approximation guarantees. Our theoretical developments leverage recent techniques from low-rank matrix recovery, which enable us to derive an estimator of the function f along with sample complexity bounds. We also characterize the noise robustness of the scheme, and provide empirical evidence that the high-dimensional scaling of our sample complexity bounds is quite accurate.

2 0.67892945 290 nips-2012-Recovery of Sparse Probability Measures via Convex Programming

Author: Mert Pilanci, Laurent E. Ghaoui, Venkat Chandrasekaran

Abstract: We consider the problem of cardinality penalized optimization of a convex function over the probability simplex with additional convex constraints. The classical 1 regularizer fails to promote sparsity on the probability simplex since 1 norm on the probability simplex is trivially constant. We propose a direct relaxation of the minimum cardinality problem and show that it can be efficiently solved using convex programming. As a first application we consider recovering a sparse probability measure given moment constraints, in which our formulation becomes linear programming, hence can be solved very efficiently. A sufficient condition for exact recovery of the minimum cardinality solution is derived for arbitrary affine constraints. We then develop a penalized version for the noisy setting which can be solved using second order cone programs. The proposed method outperforms known rescaling heuristics based on 1 norm. As a second application we consider convex clustering using a sparse Gaussian mixture and compare our results with the well known soft k-means algorithm. 1

3 0.67575395 63 nips-2012-CPRL -- An Extension of Compressive Sensing to the Phase Retrieval Problem

Author: Henrik Ohlsson, Allen Yang, Roy Dong, Shankar Sastry

Abstract: While compressive sensing (CS) has been one of the most vibrant research fields in the past few years, most development only applies to linear models. This limits its application in many areas where CS could make a difference. This paper presents a novel extension of CS to the phase retrieval problem, where intensity measurements of a linear system are used to recover a complex sparse signal. We propose a novel solution using a lifting technique – CPRL, which relaxes the NP-hard problem to a nonsmooth semidefinite program. Our analysis shows that CPRL inherits many desirable properties from CS, such as guarantees for exact recovery. We further provide scalable numerical solvers to accelerate its implementation. 1

4 0.67568201 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance

Author: Arnak Dalalyan, Yin Chen

Abstract: In this paper, we develop a novel approach to the problem of learning sparse representations in the context of fused sparsity and unknown noise level. We propose an algorithm, termed Scaled Fused Dantzig Selector (SFDS), that accomplishes the aforementioned learning task by means of a second-order cone program. A special emphasize is put on the particular instance of fused sparsity corresponding to the learning in presence of outliers. We establish finite sample risk bounds and carry out an experimental evaluation on both synthetic and real data. 1

5 0.62596017 320 nips-2012-Spectral Learning of General Weighted Automata via Constrained Matrix Completion

Author: Borja Balle, Mehryar Mohri

Abstract: Many tasks in text and speech processing and computational biology require estimating functions mapping strings to real numbers. A broad class of such functions can be defined by weighted automata. Spectral methods based on the singular value decomposition of a Hankel matrix have been recently proposed for learning a probability distribution represented by a weighted automaton from a training sample drawn according to this same target distribution. In this paper, we show how spectral methods can be extended to the problem of learning a general weighted automaton from a sample generated by an arbitrary distribution. The main obstruction to this approach is that, in general, some entries of the Hankel matrix may be missing. We present a solution to this problem based on solving a constrained matrix completion problem. Combining these two ingredients, matrix completion and spectral method, a whole new family of algorithms for learning general weighted automata is obtained. We present generalization bounds for a particular algorithm in this family. The proofs rely on a joint stability analysis of matrix completion and spectral learning. 1

6 0.61112797 268 nips-2012-Perfect Dimensionality Recovery by Variational Bayesian PCA

7 0.61001176 281 nips-2012-Provable ICA with Unknown Gaussian Noise, with Implications for Gaussian Mixtures and Autoencoders

8 0.58086509 120 nips-2012-Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential 1-Minimization

9 0.56932104 17 nips-2012-A Scalable CUR Matrix Decomposition Algorithm: Lower Time Complexity and Tighter Bound

10 0.56352472 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model

11 0.56289971 32 nips-2012-Active Comparison of Prediction Models

12 0.54588509 340 nips-2012-The representer theorem for Hilbert spaces: a necessary and sufficient condition

13 0.54448968 27 nips-2012-A quasi-Newton proximal splitting method

14 0.53494942 79 nips-2012-Compressive neural representation of sparse, high-dimensional probabilities

15 0.53184253 247 nips-2012-Nonparametric Reduced Rank Regression

16 0.52906877 64 nips-2012-Calibrated Elastic Regularization in Matrix Completion

17 0.52680159 254 nips-2012-On the Sample Complexity of Robust PCA

18 0.52342916 286 nips-2012-Random Utility Theory for Social Choice

19 0.52291989 85 nips-2012-Convergence and Energy Landscape for Cheeger Cut Clustering

20 0.52181786 25 nips-2012-A new metric on the manifold of kernel matrices with application to matrix geometric means


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.03), (1, 0.185), (21, 0.03), (38, 0.183), (42, 0.043), (53, 0.011), (54, 0.032), (55, 0.018), (68, 0.026), (74, 0.047), (76, 0.172), (80, 0.077), (92, 0.056)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.94415522 223 nips-2012-Multi-criteria Anomaly Detection using Pareto Depth Analysis

Author: Ko-jen Hsiao, Kevin Xu, Jeff Calder, Alfred O. Hero

Abstract: We consider the problem of identifying patterns in a data set that exhibit anomalous behavior, often referred to as anomaly detection. In most anomaly detection algorithms, the dissimilarity between data samples is calculated by a single criterion, such as Euclidean distance. However, in many cases there may not exist a single dissimilarity measure that captures all possible anomalous patterns. In such a case, multiple criteria can be defined, and one can test for anomalies by scalarizing the multiple criteria using a linear combination of them. If the importance of the different criteria are not known in advance, the algorithm may need to be executed multiple times with different choices of weights in the linear combination. In this paper, we introduce a novel non-parametric multi-criteria anomaly detection method using Pareto depth analysis (PDA). PDA uses the concept of Pareto optimality to detect anomalies under multiple criteria without having to run an algorithm multiple times with different choices of weights. The proposed PDA approach scales linearly in the number of criteria and is provably better than linear combinations of the criteria. 1

2 0.90058076 266 nips-2012-Patient Risk Stratification for Hospital-Associated C. diff as a Time-Series Classification Task

Author: Jenna Wiens, Eric Horvitz, John V. Guttag

Abstract: A patient’s risk for adverse events is affected by temporal processes including the nature and timing of diagnostic and therapeutic activities, and the overall evolution of the patient’s pathophysiology over time. Yet many investigators ignore this temporal aspect when modeling patient outcomes, considering only the patient’s current or aggregate state. In this paper, we represent patient risk as a time series. In doing so, patient risk stratification becomes a time-series classification task. The task differs from most applications of time-series analysis, like speech processing, since the time series itself must first be extracted. Thus, we begin by defining and extracting approximate risk processes, the evolving approximate daily risk of a patient. Once obtained, we use these signals to explore different approaches to time-series classification with the goal of identifying high-risk patterns. We apply the classification to the specific task of identifying patients at risk of testing positive for hospital acquired Clostridium difficile. We achieve an area under the receiver operating characteristic curve of 0.79 on a held-out set of several hundred patients. Our two-stage approach to risk stratification outperforms classifiers that consider only a patient’s current state (p<0.05). 1

3 0.89891428 43 nips-2012-Approximate Message Passing with Consistent Parameter Estimation and Applications to Sparse Learning

Author: Ulugbek Kamilov, Sundeep Rangan, Michael Unser, Alyson K. Fletcher

Abstract: We consider the estimation of an i.i.d. vector x ∈ Rn from measurements y ∈ Rm obtained by a general cascade model consisting of a known linear transform followed by a probabilistic componentwise (possibly nonlinear) measurement channel. We present a method, called adaptive generalized approximate message passing (Adaptive GAMP), that enables joint learning of the statistics of the prior and measurement channel along with estimation of the unknown vector x. Our method can be applied to a large class of learning problems including the learning of sparse priors in compressed sensing or identification of linear-nonlinear cascade models in dynamical systems and neural spiking processes. We prove that for large i.i.d. Gaussian transform matrices the asymptotic componentwise behavior of the adaptive GAMP algorithm is predicted by a simple set of scalar state evolution equations. This analysis shows that the adaptive GAMP method can yield asymptotically consistent parameter estimates, which implies that the algorithm achieves a reconstruction quality equivalent to the oracle algorithm that knows the correct parameter values. The adaptive GAMP methodology thus provides a systematic, general and computationally efficient method applicable to a large range of complex linear-nonlinear models with provable guarantees. 1

same-paper 4 0.88175064 34 nips-2012-Active Learning of Multi-Index Function Models

Author: Tyagi Hemant, Volkan Cevher

Abstract: We consider the problem of actively learning multi-index functions of the form f(x) = g(Ax) = Σ_{i=1}^{k} g_i(a_i^T x) from point evaluations of f. We assume that the function f is defined on an ℓ_2-ball in R^d, g is twice continuously differentiable almost everywhere, and A ∈ R^{k×d} is a rank-k matrix, where k ≪ d. We propose a randomized, active sampling scheme for estimating such functions with uniform approximation guarantees. Our theoretical developments leverage recent techniques from low-rank matrix recovery, which enable us to derive an estimator of the function f along with sample complexity bounds. We also characterize the noise robustness of the scheme, and provide empirical evidence that the high-dimensional scaling of our sample complexity bounds is quite accurate.

5 0.87922567 61 nips-2012-Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence

Author: Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric

Abstract: We study the problem of identifying the best arm(s) in the stochastic multi-armed bandit setting. This problem has been studied in the literature from two different perspectives: fixed budget and fixed confidence. We propose a unifying approach that leads to a meta-algorithm called unified gap-based exploration (UGapE), with a common structure and similar theoretical analysis for these two settings. We prove a performance bound for the two versions of the algorithm showing that the two problems are characterized by the same notion of complexity. We also show how the UGapE algorithm as well as its theoretical analysis can be extended to take into account the variance of the arms and to multiple bandits. Finally, we evaluate the performance of UGapE and compare it with a number of existing fixed budget and fixed confidence algorithms. 1

6 0.85885888 188 nips-2012-Learning from Distributions via Support Measure Machines

7 0.83460718 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance

8 0.82994682 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

9 0.82215154 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

10 0.82020891 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

11 0.8192659 179 nips-2012-Learning Manifolds with K-Means and K-Flats

12 0.81918544 325 nips-2012-Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions

13 0.81894499 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback

14 0.81870317 38 nips-2012-Algorithms for Learning Markov Field Policies

15 0.8180775 290 nips-2012-Recovery of Sparse Probability Measures via Convex Programming

16 0.81709814 236 nips-2012-Near-Optimal MAP Inference for Determinantal Point Processes

17 0.81670147 361 nips-2012-Volume Regularization for Binary Classification

18 0.81659937 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

19 0.8161881 117 nips-2012-Ensemble weighted kernel estimators for multivariate entropy estimation

20 0.81617838 319 nips-2012-Sparse Prediction with the $k$-Support Norm