nips nips2012 nips2012-247 knowledge-graph by maker-knowledge-mining

247 nips-2012-Nonparametric Reduced Rank Regression


Source: pdf

Author: Rina Foygel, Michael Horrell, Mathias Drton, John D. Lafferty

Abstract: We propose an approach to multivariate nonparametric regression that generalizes reduced rank regression for linear models. An additive model is estimated for each dimension of a q-dimensional response, with a shared p-dimensional predictor variable. To control the complexity of the model, we employ a functional form of the Ky-Fan or nuclear norm, resulting in a set of function estimates that have low rank. Backfitting algorithms are derived and justified using a nonparametric form of the nuclear norm subdifferential. Oracle inequalities on excess risk are derived that exhibit the scaling behavior of the procedure in the high dimensional setting. The methods are illustrated on gene expression data. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 An additive model is estimated for each dimension of a q-dimensional response, with a shared p-dimensional predictor variable. [sent-2, score-0.12]

2 To control the complexity of the model, we employ a functional form of the Ky-Fan or nuclear norm, resulting in a set of function estimates that have low rank. [sent-3, score-0.242]

3 Backfitting algorithms are derived and justified using a nonparametric form of the nuclear norm subdifferential. [sent-4, score-0.323]

4 Oracle inequalities on excess risk are derived that exhibit the scaling behavior of the procedure in the high dimensional setting. [sent-5, score-0.18]

5 1 Introduction In the multivariate regression problem the objective is to estimate the conditional mean E(Y ∣ X) = m(X) = (m^1(X), …, m^q(X))⊺, [sent-7, score-0.142]

6 where Y is a q-dimensional response vector and X is a p-dimensional covariate vector. [sent-10, score-0.229]

7 Under a linear model, the mean is estimated as m(X) = BX where B ∈ R^{q×p} is a q × p matrix of regression coefficients. [sent-13, score-0.147]

8 In reduced rank regression the matrix B is estimated under a rank constraint r = rank(B) ≤ C, so that the rows or columns of B lie in an r-dimensional subspace of R^q or R^p. [sent-15, score-0.703]

9 In low dimensions, the constrained rank model can be computed as an orthogonal projection of the least squares solution; but in high dimensions this is not well defined. [sent-17, score-0.35]
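
The following is a minimal numpy sketch of this classical low-dimensional construction; it is not code from the paper, and it uses the convention Y ≈ XB with B ∈ R^{p×q} (the transpose of the convention in the text). The rank-r solution is the OLS fit with its fitted values projected onto their top r principal directions.

```python
import numpy as np

def reduced_rank_regression(X, Y, r):
    """Rank-r least squares fit (Euclidean loss, n > p assumed):
    project the OLS fitted values onto their top-r principal directions."""
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)      # p x q OLS coefficients
    Y_hat = X @ B_ols                                  # n x q fitted values
    _, _, Vt = np.linalg.svd(Y_hat, full_matrices=False)
    P = Vt[:r].T @ Vt[:r]                              # rank-r projection in R^{q x q}
    return B_ols @ P                                   # rank-r coefficient matrix
```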

10 Recent research has studied the use of the nuclear norm as a convex surrogate for the rank constraint. [sent-18, score-0.475]

11 The nuclear norm ∥B∥∗, also known as the trace or Ky-Fan norm, is the sum of the singular values of B. [sent-19, score-0.308]

12 A rank constraint can be thought of as imposing sparsity, but in an unknown basis; the nuclear norm plays the role of the ℓ₁ norm in sparse estimation. [sent-20, score-0.557]
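
This analogy can be checked numerically with a small sketch (not from the paper): the rank counts nonzero singular values, while the nuclear norm sums them, just as the ℓ₀ "norm" counts nonzero entries and the ℓ₁ norm sums their magnitudes.

```python
import numpy as np

B = np.random.randn(5, 3) @ np.random.randn(3, 8)   # a 5 x 8 matrix of rank 3
sigma = np.linalg.svd(B, compute_uv=False)          # singular values of B
rank = int(np.sum(sigma > 1e-10))                   # rank: number of nonzero singular values
nuclear = sigma.sum()                               # nuclear norm: their sum
print(rank, nuclear)                                # rank is 3; the nuclear norm is its convex surrogate
```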

13 Its use for low rank estimation problems was proposed by Fazel in [2]. [sent-21, score-0.294]

14 More recently, nuclear norm regularization in multivariate linear regression has been studied by Yuan et al. [sent-22, score-0.403]

15 In this paper we study nonparametric parallels of reduced rank linear models. [sent-24, score-0.445]

16 We focus our attention on additive models, so that the regression function m(X) = (m^1(X), …, m^q(X))⊺ [sent-25, score-0.216]

17 has each component m^k(X) = ∑_{j=1}^p m^k_j(X_j) equal to a sum of p functions, one for each covariate. [sent-28, score-0.454]

18 The objective is then to estimate the q × p matrix of functions M(X) = [m^k_j(X_j)]. [sent-29, score-0.104]

19 The first problem we address, in Section 2, is to determine a replacement for the regularization penalty ∥B∥∗ in the linear model. [sent-30, score-0.261]

20 Because we must estimate a matrix of functions, the analogue of the nuclear norm is not immediately apparent. [sent-31, score-0.28]

21 We propose two related regularization penalties for nonparametric low rank regression, and show how they specialize to the linear case. [sent-32, score-0.597]

22 We then study, in Section 4, the (infinite dimensional) subdifferential of these penalties. [sent-33, score-0.11]

23 In the population setting, this leads to stationary conditions for the minimizer of the regularized mean squared error. [sent-34, score-0.142]

24 This subdifferential calculus then justifies penalized backfitting algorithms for carrying out the optimization for a finite sample. [sent-35, score-0.14]

25 A uniform bound on the excess risk of the estimator relative to an oracle is given in Section 6. [sent-38, score-0.148]

26 The analysis requires a concentration result for nonparametric covariance matrices in the spectral norm. [sent-40, score-0.167]

27 Experiments with gene data, which are used to illustrate different facets of the proposed nonparametric reduced rank regression techniques, are given in Section 7. [sent-41, score-0.585]

28 2 Nonparametric Nuclear Norm Penalization We begin by presenting the penalty that we will use to induce nonparametric regression estimates to be low rank. [sent-42, score-0.467]

29 To motivate our choice of penalty and provide some intuition, suppose that f^1(x), …, f^q(x) is a collection of functions. [sent-43, score-0.229]

30 What does it mean for this collection of functions to be low rank? [sent-47, score-0.101]

31 We require that the n × q matrix of function values F(x_{1:n}) = [f^k(x_i)] is low rank. [sent-52, score-0.099]

32 This matrix is of rank at most r < q for every set {x_i} of arbitrary size n if and only if the functions {f^k} are r-linearly independent: each function can be written as a linear combination of r of the other functions. [sent-53, score-0.35]

33 In the multivariate regression setting, but still assuming the domain is one-dimensional for simplicity (q > 1 and p = 1), we have a random sample X_1, …, X_n, … [sent-54, score-0.172]

34 … (m^1, …, m^q) of q smooth (regression) functions, and suppose that n > q. [sent-61, score-0.132]

35 This suggests the penalty ∥M∥∗ = ∑_{s=1}^q σ_s(M) = ∑_{s=1}^q √λ_s(M⊺M), where {λ_s(A)} denotes the eigenvalues of a symmetric matrix A and {σ_s(B)} denotes the singular values of a matrix B. [sent-63, score-0.41]

36 Now, assuming the columns of M are centered, and E[m^k(X)] = 0 for each k, we recognize (1/n) M⊺M as the sample covariance Σ̂(M) of the population covariance Σ(M) ∶= Cov(M(X)) = [E(m^k(X) m^l(X))]. [sent-64, score-0.207]

37 This motivates the following sample and population penalties, where A^{1/2} denotes the matrix square root: the population penalty is ∥Σ(M)^{1/2}∥∗ = ∥Cov(M(X))^{1/2}∥∗, and the sample penalty is ∥Σ̂(M)^{1/2}∥∗ = (1/√n) ∥M∥∗. [sent-65, score-0.325]
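
The sample identity ∥Σ̂(M)^{1/2}∥∗ = (1/√n)∥M∥∗ holds because the eigenvalues of Σ̂(M) = (1/n)M⊺M are the squared singular values of M divided by n. A quick numerical check (a sketch, not code from the paper):

```python
import numpy as np

n, q = 200, 4
M = np.random.randn(n, q)
M -= M.mean(axis=0)                       # center the columns of function values

Sigma_hat = M.T @ M / n                   # sample covariance of the fitted functions
eigs = np.clip(np.linalg.eigvalsh(Sigma_hat), 0, None)
lhs = np.sqrt(eigs).sum()                                     # ||Sigma_hat^{1/2}||_*
rhs = np.linalg.svd(M, compute_uv=False).sum() / np.sqrt(n)   # ||M||_* / sqrt(n)
assert np.allclose(lhs, rhs)
```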

38 We consider the family of additive models, with regression functions of the form m(X) = (m^1(X), …, m^q(X))⊺ = ∑_{j=1}^p M_j(X_j), [sent-74, score-0.269]

39 where each term M_j(X_j) = (m^1_j(X_j), …, m^q_j(X_j))⊺ [sent-77, score-0.132]

40 is a q-vector of functions evaluated at X_j. [sent-80, score-0.185]

41 Assume that the functions m^k_j(X_j) all have mean zero; this is required for identifiability in the additive model. [sent-86, score-0.334]

42 As a shorthand, let Σ_j = Σ(M_j) = Cov(M_j(X_j)) denote the covariance matrix of the j-th component functions, with sample version Σ̂_j. [sent-87, score-0.116]

43 The population and sample versions of the first penalty are then given by ∥Σ_1^{1/2}∥∗ + ∥Σ_2^{1/2}∥∗ + ⋯ + ∥Σ_p^{1/2}∥∗ (3.2), with Σ̂_j replacing Σ_j in the sample version. [sent-88, score-0.366]

44 The second penalty, intuitively, encourages the set of q vector-valued functions (m^k_1, m^k_2, …, m^k_p), k = 1, …, q, to be of low rank. [sent-91, score-0.138]

45 This penalty is given by ∥(Σ_1^{1/2} ⋯ Σ_p^{1/2})∥∗. [sent-95, score-0.229]

46 The corresponding population and empirical risk functionals, for the first penalty, are then (1/2) E∥Y − ∑_{j=1}^p M_j(X_j)∥₂² + λ ∑_{j=1}^p ∥Σ_j^{1/2}∥∗, together with its empirical analogue. [sent-98, score-0.21]

47 Some straightforward calculation shows that the penalties reduce to ∑_{j=1}^p ∥Σ_j^{1/2}∥∗ = ∑_{j=1}^p ∥B_j∥₂ for the first penalty, and ∥(Σ_1^{1/2} ⋯ Σ_p^{1/2})∥∗ = ∥B∥∗ for the second. [sent-104, score-0.138]

48 Thus, in the linear case the first penalty encourages B to be column-wise sparse, so that many of the B_j are zero, meaning that X_j does not appear in the fit. [sent-105, score-0.229]

49 The second penalty reduces to the nuclear norm regularization ∥B∥∗ used for high-dimensional reduced-rank regression. [sent-107, score-0.49]
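
A numerical check of this reduction (a sketch assuming standardized covariates with Var(X_j) = 1, so that Σ_j = B_j B_j⊺ for the linear model m(X) = BX; not code from the paper):

```python
import numpy as np

def sqrtm_psd(A):
    """Symmetric PSD matrix square root via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T

q, p = 6, 10
B = np.random.randn(q, p)                       # linear model m(X) = B X
cols = [B[:, [j]] for j in range(p)]            # columns B_j of B

roots = [sqrtm_psd(Bj @ Bj.T) for Bj in cols]   # Sigma_j^{1/2} with Sigma_j = B_j B_j^T

penalty1 = sum(np.linalg.svd(R, compute_uv=False).sum() for R in roots)
penalty2 = np.linalg.svd(np.hstack(roots), compute_uv=False).sum()

assert np.allclose(penalty1, sum(np.linalg.norm(Bj) for Bj in cols))    # group-lasso norm of B
assert np.allclose(penalty2, np.linalg.svd(B, compute_uv=False).sum())  # nuclear norm of B
```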

50 4 Subdifferentials for Functional Matrix Norms A key to deriving algorithms for functional low-rank regression is computation of the subdifferentials of the penalties. [sent-108, score-0.143]

51 For each column index j and row index k, f^k_j is a function of a random variable X_j, and we will take expectations with respect to X_j implicitly. [sent-110, score-0.28]

52 We define the inner product between two matrices of functions as ⟪F, G⟫ ∶= ∑_{j=1}^p ∑_{k=1}^q E(f^k_j g^k_j) = ∑_{j=1}^p E(F_j⊺ G_j) = tr(E(F G⊺)). [sent-112, score-0.179]

53 In fact, these two norms are dual: for any F, ∣∣∣F∣∣∣∗ = sup_{∣∣∣G∣∣∣_sp ≤ 1} ⟪G, F⟫. [sent-117, score-0.114]

54 The subdifferential of ∣∣∣F∣∣∣∗ is the set S(F) ∶= { (E(F F⊺))^{−1/2} F + H ∶ ∣∣∣H∣∣∣_sp ≤ 1, E(F H⊺) = 0_{q×q}, E(F F⊺) H = 0_{q×p} a.s. }. [sent-121, score-0.11]

55 The fact that S(F ) contains the subdifferential ∂∣∣∣F ∣∣∣∗ can be proved by comparing our setting (matrices of functions) to the ordinary matrix case; see [9, 7]. [sent-126, score-0.161]

56 Next, let E(F F ⊺ ) = V DV ⊺ be a reduced singular value decomposition, where D is a positive diagonal matrix of size q′ ≤ q. [sent-137, score-0.194]

57 This implies that E(F F⊺) ⋅ E(H H⊺) = 0_{q×q} and so these two symmetric matrices have orthogonal row spans and orthogonal column spans. [sent-145, score-0.126]

58 This gives the subdifferential of penalty 2, defined in Section 3. [sent-148, score-0.339]

59 We can view the first penalty update as just a special case of the second penalty update. [sent-150, score-0.458]

60 … which is clearly just a special case of penalty 2, with a single q-vector of functions instead of p different q-vectors of functions. [sent-153, score-0.282]

61 5 Stationary Conditions and Backfitting Algorithms Returning to the base case of p = 1 covariate, consider the population regularized risk optimization. [sent-158, score-0.21]

62 Let E(P P⊺) = U diag(τ) U⊺ be the singular value decomposition and define M = U diag([1 − λ/√τ]₊) U⊺ P. [sent-178, score-0.106]
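
The following is a hypothetical numpy sketch of the sample analogue of this soft-thresholding update (the function name, the n × q orientation of P, and the small eigenvalue guard are assumptions, not details from the paper): with rows P[i, :] = P(x_i), the empirical version of E(P P⊺) is P⊺P/n, and each factor 1 − λ/√τ_s is truncated at zero.

```python
import numpy as np

def soft_threshold_update(P, lam):
    """Apply M = U diag([1 - lam/sqrt(tau)]_+) U^T to every row of P,
    where E(P P^T) is estimated by P^T P / n (P is n x q)."""
    n = P.shape[0]
    tau, U = np.linalg.eigh(P.T @ P / n)                          # empirical E(P P^T)
    shrink = np.clip(1.0 - lam / np.sqrt(np.maximum(tau, 1e-12)), 0.0, None)
    return P @ U @ np.diag(shrink) @ U.T                          # soft-thresholded fit
```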

63 √ It remains to show that H satisfies the conditions of the subdifferential in (4. [sent-199, score-0.11]

64 The analysis above justifies a backfitting algorithm for estimating a constrained rank additive model with the first penalty, where the objective is min { (1/2) E∥Y − ∑_{j=1}^p M_j(X_j)∥₂² + λ ∑_{j=1}^p ∣∣∣M_j∣∣∣∗ } (5.9). [sent-210, score-0.391]

65 For a given coordinate j, we form the residual Z_j = Y − ∑_{k≠j} M_k, and then compute the projection P_j = E(Z_j ∣ X_j), with singular value decomposition E(P_j P_j⊺) = U diag(τ) U⊺. [sent-212, score-0.138]

66 This is a Gauss-Seidel procedure that parallels the population backfitting algorithm for SpAM [6]. [sent-215, score-0.148]

67 In the sample version we replace the conditional expectation P_j = E(Z_j ∣ X_j) by a nonparametric linear smoother, P̂_j = S_j Z_j. [sent-216, score-0.124]
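
Putting the pieces together, here is a hypothetical sketch of the sample backfitting loop for the first penalty (not the paper's Figure 1 algorithm; the function name, the smoother matrices S_j, the iteration count, and the eigenvalue guard are assumptions). Each pass smooths the partial residual for coordinate j and then soft-thresholds the singular structure exactly as in the previous sketch.

```python
import numpy as np

def backfit_first_penalty(Y, smoothers, lam, n_iters=20):
    """Y: n x q responses; smoothers[j]: n x n linear smoother matrix S_j
    built from covariate X_j (e.g. a spline or local-linear smoother)."""
    n, q = Y.shape
    p = len(smoothers)
    M = [np.zeros((n, q)) for _ in range(p)]             # component fits M_j(X_j)
    for _ in range(n_iters):
        for j in range(p):
            Zj = Y - sum(M[k] for k in range(p) if k != j)   # partial residual
            Pj = smoothers[j] @ Zj                       # smoothed projection S_j Z_j
            Pj -= Pj.mean(axis=0)                        # center for identifiability
            tau, U = np.linalg.eigh(Pj.T @ Pj / n)       # empirical E(P_j P_j^T)
            shrink = np.clip(1 - lam / np.sqrt(np.maximum(tau, 1e-12)), 0, None)
            M[j] = Pj @ U @ np.diag(shrink) @ U.T        # soft-threshold singular values
    return M
```

A coordinate j whose shrink factors are all zero drops out of the fit entirely, which matches the rank-zero columns seen in the gene expression experiments described later.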

68 6 Excess Risk Bounds The population risk of a q × p regression matrix M(X) = [M_1(X_1) ⋯ M_p(X_p)] is R(M) = E∥Y − M(X) 1_p∥₂², with sample version denoted R̂(M). [sent-223, score-0.387]

69 The population risk can be reexpressed as a quadratic form in the joint second-moment matrix of (Y, V(X)⊺): R(M) = tr{ (−I_q; D U⊺)⊺ E[(Y; V(X)⊺)(Y; V(X)⊺)⊺] (−I_q; D U⊺) } = tr{ (−I_q; D U⊺)⊺ Σ(V) (−I_q; D U⊺) }, and similarly for the sample risk, with Σ̂_n(V) replacing Σ(V) ∶= Cov((Y, V(X)⊺)) above. [sent-225, score-0.354]

70 We can express the remaining "controllable" risk as Rc(M) = R(M) − Ru = tr{ (−2I_q; D U⊺)⊺ Σ(V) (0_q; D U⊺) }. [sent-227, score-0.16]

71 It holds that sup_V ∥Σ(V) − Σ̂_n(V)∥_sp ≤ C sup_V sup_{w∈N} w⊺(Σ(V) − Σ̂_n(V)) w, where N is a 1/2-covering of the unit (q + r)-sphere, which has size ∣N∣ ≤ 6^{q+r} ≤ 36^q; see [8]. [sent-231, score-0.219]

72 We now assume that the functions v_{sj}(x_j) are uniformly bounded and belong to a Sobolev space of order two. [sent-232, score-0.134]

73 Let {ψ_{jk}} denote a uniformly bounded, orthonormal basis with respect to L²[0, 1], and assume that v_{sj} ∈ H_j where H_j = { f_j ∶ f_j(x_j) = ∑_{k=0}^∞ a_{jk} ψ_{jk}(x_j), ∑_{k=0}^∞ a²_{jk} k⁴ ≤ K² } for some 0 < K < ∞. [sent-236, score-0.377]

74 Subtracting R̂_u − R_u from each of the bracketed differences, we obtain that R(M̂) − inf_{M∈M_n} R(M) ≤ [R_c(M̂) − R̂_c(M̂)] − [R_c(M_∗) − R̂_c(M_∗)] ≤ 2 sup_{M∈M_n} {R_c(M) − R̂_c(M)}. [sent-247, score-0.105]

75 Let M̂ minimize the empirical risk (1/n) ∑_i ∥Y_i − ∑_j M_j(X_{ij})∥² over the class M_n. [sent-253, score-0.103]

76 7 Application to Gene Expression Data To illustrate the proposed nonparametric reduced rank regression techniques, we consider data on gene expression in E. coli. [sent-255, score-0.624]

77 In this challenge genes were classified as transcription factors (TFs) or target genes (TGs). [sent-257, score-0.158]

78 Our motivation for analyzing these 33 genes is that, according to the gold standard gene regulatory network used for the DREAM 5 challenge, the 6 TFs form the parent set common to two additional TFs, which have the 27 TGs as their child nodes. [sent-260, score-0.214]

79 This means that if we treat the gold standard as a causal network, then up to noise, the functional relationship between X and Y is given by the composition of a map g ∶ R6 → R2 and a map h ∶ R2 → R27 . [sent-262, score-0.166]

80 If g and h are both linear, their composition h ○ g is a linear map of rank no more than 2. [sent-263, score-0.289]
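
A tiny numerical illustration of this point (a sketch, not from the paper): composing two linear maps through a 2-dimensional bottleneck yields a matrix of rank at most 2.

```python
import numpy as np

G = np.random.randn(2, 6)                  # linear g : R^6 -> R^2
H = np.random.randn(27, 2)                 # linear h : R^2 -> R^27
composite = H @ G                          # h o g as a 27 x 6 matrix
print(np.linalg.matrix_rank(composite))    # at most 2
```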

81 As observed in Section 2, such a reduced rank linear model is a special case of an additive model with reduced rank in the sense of penalty 2. [sent-264, score-0.969]

82 More generally, if g is an additive function and h is linear, then h ○ g has rank at most 2 in the sense of penalty 2. [sent-265, score-0.595]

83 Higher rank can in principle occur [sent-266, score-0.246]

84 under functional composition, since even a univariate additive map h ∶ R → R^q may have rank up to q under our penalties (recall that penalties 1 and 2 coincide for univariate maps). [sent-273, score-0.852]

85 The backfitting algorithm of Figure 1 with penalty 1 and a rather aggressive choice of the tuning parameter λ produces the estimates shown in Figure 2, for which we have selected three of the 27 TGs. [sent-274, score-0.255]

86 Under such strong regularization, the 5th column of functions is rank zero and, thus, identically zero. [sent-275, score-0.299]

87 The remaining columns have rank one; the estimated fitted values are scalar multiples of one another. [sent-276, score-0.246]

88 This property characterizes the difference between penalties 1 and 2; in an application of penalty 2, the scalings would have been the same across all functions in a given row. [sent-280, score-0.459]

89 Next, we illustrate a higher-rank solution for penalty 2. [sent-281, score-0.229]

90 Choosing the regularization parameter λ by ten-fold cross-validation gives a fit of rank 5, considerably lower than 27, the maximum possible rank. [sent-282, score-0.278]

91 Under rank five, each row of functions is a linear combination of up to five other, linearly independent rows. [sent-284, score-0.325]

92 Hence, if the causal relationships for these data were indeed additive and low rank, then the true low rank might well be smaller than five. [sent-286, score-0.49]

93 8 Summary This paper introduced two penalties that induce reduced rank fits in multivariate additive nonparametric regression. [sent-287, score-0.708]

94 Under linearity, the penalties specialize to group lasso and nuclear norm penalties for classical reduced rank regression. [sent-288, score-0.854]

95 Examining the subdifferentials of each of these penalties, we developed backfitting algorithms for the two resulting optimization problems that are based on softthresholding of singular values of smoothed residual matrices. [sent-289, score-0.165]

96 The algorithms were demonstrated on a gene expression data set constructed to have a naturally low-rank structure. [sent-290, score-0.124]

97 We also provided a persistence analysis that shows error tending to zero under a scaling assumption on the sample size n and the dimensions q and p of the regression problem. [sent-291, score-0.158]

98 Minimax-optimal rates for sparse additive models over kernel classes via convex programming. [sent-325, score-0.12]

99 Guaranteed minimum rank solutions to linear matrix equations via nuclear norm minimization. [sent-333, score-0.526]

100 How close is the sample covariance matrix to the actual covariance matrix? [sent-336, score-0.151]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('sp', 0.396), ('mj', 0.29), ('fj', 0.254), ('rank', 0.246), ('penalty', 0.229), ('rc', 0.187), ('pj', 0.169), ('xj', 0.167), ('mk', 0.161), ('nuclear', 0.147), ('diag', 0.145), ('penalties', 0.138), ('mq', 0.132), ('hj', 0.127), ('hh', 0.124), ('additive', 0.12), ('du', 0.12), ('subdifferential', 0.11), ('tfs', 0.108), ('tgs', 0.108), ('population', 0.107), ('risk', 0.103), ('zj', 0.103), ('regression', 0.096), ('cram', 0.096), ('nonparametric', 0.094), ('gene', 0.085), ('mn', 0.084), ('ru', 0.083), ('norm', 0.082), ('pam', 0.081), ('vsj', 0.081), ('singular', 0.079), ('tting', 0.077), ('back', 0.074), ('sup', 0.073), ('pq', 0.073), ('reduced', 0.064), ('genes', 0.057), ('tr', 0.057), ('cov', 0.056), ('dream', 0.054), ('subdifferentials', 0.054), ('functions', 0.053), ('rq', 0.053), ('matrix', 0.051), ('bj', 0.051), ('gold', 0.048), ('low', 0.048), ('maryam', 0.048), ('functional', 0.047), ('multivariate', 0.046), ('excess', 0.045), ('transcription', 0.044), ('composition', 0.043), ('jk', 0.042), ('response', 0.042), ('op', 0.041), ('parallels', 0.041), ('norms', 0.041), ('fazel', 0.039), ('scalings', 0.039), ('specialize', 0.039), ('expression', 0.039), ('yuan', 0.039), ('tted', 0.038), ('matrices', 0.038), ('sj', 0.036), ('ming', 0.036), ('univariate', 0.036), ('stationary', 0.035), ('dv', 0.035), ('covariance', 0.035), ('regularization', 0.032), ('residual', 0.032), ('dd', 0.032), ('functionals', 0.032), ('scaling', 0.032), ('inf', 0.032), ('covariate', 0.031), ('gj', 0.031), ('orthogonal', 0.031), ('sample', 0.03), ('penalized', 0.03), ('soft', 0.03), ('smoother', 0.029), ('returning', 0.029), ('causal', 0.028), ('decomposition', 0.027), ('justi', 0.027), ('xij', 0.026), ('tuning', 0.026), ('row', 0.026), ('constrained', 0.025), ('yi', 0.025), ('network', 0.024), ('pdimensional', 0.024), ('marbach', 0.024), ('dissertation', 0.024), ('lgorithm', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 247 nips-2012-Nonparametric Reduced Rank Regression

Author: Rina Foygel, Michael Horrell, Mathias Drton, John D. Lafferty

Abstract: We propose an approach to multivariate nonparametric regression that generalizes reduced rank regression for linear models. An additive model is estimated for each dimension of a q-dimensional response, with a shared p-dimensional predictor variable. To control the complexity of the model, we employ a functional form of the Ky-Fan or nuclear norm, resulting in a set of function estimates that have low rank. Backfitting algorithms are derived and justified using a nonparametric form of the nuclear norm subdifferential. Oracle inequalities on excess risk are derived that exhibit the scaling behavior of the procedure in the high dimensional setting. The methods are illustrated on gene expression data. 1

2 0.15250382 319 nips-2012-Sparse Prediction with the $k$-Support Norm

Author: Andreas Argyriou, Rina Foygel, Nathan Srebro

Abstract: We derive a novel norm that corresponds to the tightest convex relaxation of sparsity combined with an ℓ₂ penalty. We show that this new k-support norm provides a tighter relaxation than the elastic net and can thus be advantageous in sparse prediction problems. We also bound the looseness of the elastic net, thus shedding new light on it and providing justification for its use. 1

3 0.14462809 164 nips-2012-Iterative Thresholding Algorithm for Sparse Inverse Covariance Estimation

Author: Benjamin Rolfs, Bala Rajaratnam, Dominique Guillot, Ian Wong, Arian Maleki

Abstract: The 1 -regularized maximum likelihood estimation problem has recently become a topic of great interest within the machine learning, statistics, and optimization communities as a method for producing sparse inverse covariance estimators. In this paper, a proximal gradient method (G-ISTA) for performing 1 -regularized covariance matrix estimation is presented. Although numerous algorithms have been proposed for solving this problem, this simple proximal gradient method is found to have attractive theoretical and numerical properties. G-ISTA has a linear rate of convergence, resulting in an O(log ε) iteration complexity to reach a tolerance of ε. This paper gives eigenvalue bounds for the G-ISTA iterates, providing a closed-form linear convergence rate. The rate is shown to be closely related to the condition number of the optimal point. Numerical convergence results and timing comparisons for the proposed method are presented. G-ISTA is shown to perform very well, especially when the optimal point is well-conditioned. 1

4 0.1408436 64 nips-2012-Calibrated Elastic Regularization in Matrix Completion

Author: Tingni Sun, Cun-hui Zhang

Abstract: This paper concerns the problem of matrix completion, which is to estimate a matrix from observations in a small subset of indices. We propose a calibrated spectrum elastic net method with a sum of the nuclear and Frobenius penalties and develop an iterative algorithm to solve the convex minimization problem. The iterative algorithm alternates between imputing the missing entries in the incomplete matrix by the current guess and estimating the matrix by a scaled soft-thresholding singular value decomposition of the imputed matrix until the resulting matrix converges. A calibration step follows to correct the bias caused by the Frobenius penalty. Under proper coherence conditions and for suitable penalties levels, we prove that the proposed estimator achieves an error bound of nearly optimal order and in proportion to the noise level. This provides a unified analysis of the noisy and noiseless matrix completion problems. Simulation results are presented to compare our proposal with previous ones. 1

5 0.14016154 208 nips-2012-Matrix reconstruction with the local max norm

Author: Rina Foygel, Nathan Srebro, Ruslan Salakhutdinov

Abstract: We introduce a new family of matrix norms, the “local max” norms, generalizing existing methods such as the max norm, the trace norm (nuclear norm), and the weighted or smoothed weighted trace norms, which have been extensively used in the literature as regularizers for matrix reconstruction problems. We show that this new family can be used to interpolate between the (weighted or unweighted) trace norm and the more conservative max norm. We test this interpolation on simulated data and on the large-scale Netflix and MovieLens ratings data, and find improved accuracy relative to the existing matrix norms. We also provide theoretical results showing learning guarantees for some of the new norms. 1

6 0.1373938 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

7 0.10709526 29 nips-2012-Accelerated Training for Matrix-norm Regularization: A Boosting Approach

8 0.10465504 244 nips-2012-Nonconvex Penalization Using Laplace Exponents and Concave Conjugates

9 0.10148942 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance

10 0.096216843 342 nips-2012-The variational hierarchical EM algorithm for clustering hidden Markov models

11 0.095096998 312 nips-2012-Simultaneously Leveraging Output and Task Structures for Multiple-Output Regression

12 0.087661222 86 nips-2012-Convex Multi-view Subspace Learning

13 0.083584219 47 nips-2012-Augment-and-Conquer Negative Binomial Processes

14 0.080573373 16 nips-2012-A Polynomial-time Form of Robust Regression

15 0.077346116 327 nips-2012-Structured Learning of Gaussian Graphical Models

16 0.077287711 206 nips-2012-Majorization for CRFs and Latent Likelihoods

17 0.077146336 290 nips-2012-Recovery of Sparse Probability Measures via Convex Programming

18 0.07291729 34 nips-2012-Active Learning of Multi-Index Function Models

19 0.072534218 32 nips-2012-Active Comparison of Prediction Models

20 0.071517758 188 nips-2012-Learning from Distributions via Support Measure Machines


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.199), (1, 0.056), (2, 0.109), (3, -0.088), (4, 0.027), (5, 0.092), (6, -0.005), (7, 0.022), (8, -0.022), (9, -0.033), (10, -0.004), (11, 0.008), (12, -0.057), (13, -0.026), (14, 0.084), (15, 0.002), (16, 0.13), (17, 0.031), (18, 0.033), (19, -0.096), (20, 0.005), (21, 0.026), (22, -0.04), (23, 0.022), (24, -0.086), (25, -0.038), (26, 0.064), (27, -0.062), (28, -0.001), (29, 0.018), (30, -0.102), (31, 0.032), (32, -0.047), (33, 0.105), (34, 0.012), (35, 0.225), (36, -0.02), (37, -0.157), (38, -0.042), (39, 0.053), (40, -0.12), (41, 0.062), (42, -0.066), (43, 0.085), (44, 0.05), (45, 0.01), (46, 0.055), (47, -0.048), (48, 0.059), (49, -0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9627589 247 nips-2012-Nonparametric Reduced Rank Regression

Author: Rina Foygel, Michael Horrell, Mathias Drton, John D. Lafferty

Abstract: We propose an approach to multivariate nonparametric regression that generalizes reduced rank regression for linear models. An additive model is estimated for each dimension of a q-dimensional response, with a shared p-dimensional predictor variable. To control the complexity of the model, we employ a functional form of the Ky-Fan or nuclear norm, resulting in a set of function estimates that have low rank. Backfitting algorithms are derived and justified using a nonparametric form of the nuclear norm subdifferential. Oracle inequalities on excess risk are derived that exhibit the scaling behavior of the procedure in the high dimensional setting. The methods are illustrated on gene expression data. 1

2 0.7795673 208 nips-2012-Matrix reconstruction with the local max norm

Author: Rina Foygel, Nathan Srebro, Ruslan Salakhutdinov

Abstract: We introduce a new family of matrix norms, the “local max” norms, generalizing existing methods such as the max norm, the trace norm (nuclear norm), and the weighted or smoothed weighted trace norms, which have been extensively used in the literature as regularizers for matrix reconstruction problems. We show that this new family can be used to interpolate between the (weighted or unweighted) trace norm and the more conservative max norm. We test this interpolation on simulated data and on the large-scale Netflix and MovieLens ratings data, and find improved accuracy relative to the existing matrix norms. We also provide theoretical results showing learning guarantees for some of the new norms. 1

3 0.75241297 319 nips-2012-Sparse Prediction with the $k$-Support Norm

Author: Andreas Argyriou, Rina Foygel, Nathan Srebro

Abstract: We derive a novel norm that corresponds to the tightest convex relaxation of sparsity combined with an ℓ₂ penalty. We show that this new k-support norm provides a tighter relaxation than the elastic net and can thus be advantageous in sparse prediction problems. We also bound the looseness of the elastic net, thus shedding new light on it and providing justification for its use. 1

4 0.62402689 64 nips-2012-Calibrated Elastic Regularization in Matrix Completion

Author: Tingni Sun, Cun-hui Zhang

Abstract: This paper concerns the problem of matrix completion, which is to estimate a matrix from observations in a small subset of indices. We propose a calibrated spectrum elastic net method with a sum of the nuclear and Frobenius penalties and develop an iterative algorithm to solve the convex minimization problem. The iterative algorithm alternates between imputing the missing entries in the incomplete matrix by the current guess and estimating the matrix by a scaled soft-thresholding singular value decomposition of the imputed matrix until the resulting matrix converges. A calibration step follows to correct the bias caused by the Frobenius penalty. Under proper coherence conditions and for suitable penalties levels, we prove that the proposed estimator achieves an error bound of nearly optimal order and in proportion to the noise level. This provides a unified analysis of the noisy and noiseless matrix completion problems. Simulation results are presented to compare our proposal with previous ones. 1

5 0.62023181 244 nips-2012-Nonconvex Penalization Using Laplace Exponents and Concave Conjugates

Author: Zhihua Zhang, Bojun Tu

Abstract: In this paper we study sparsity-inducing nonconvex penalty functions using L´ vy e processes. We define such a penalty as the Laplace exponent of a subordinator. Accordingly, we propose a novel approach for the construction of sparsityinducing nonconvex penalties. Particularly, we show that the nonconvex logarithmic (LOG) and exponential (EXP) penalty functions are the Laplace exponents of Gamma and compound Poisson subordinators, respectively. Additionally, we explore the concave conjugate of nonconvex penalties. We find that the LOG and EXP penalties are the concave conjugates of negative Kullback-Leiber (KL) distance functions. Furthermore, the relationship between these two penalties is due to asymmetricity of the KL distance. 1

6 0.61340612 86 nips-2012-Convex Multi-view Subspace Learning

7 0.6070767 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance

8 0.5769949 320 nips-2012-Spectral Learning of General Weighted Automata via Constrained Matrix Completion

9 0.5716275 221 nips-2012-Multi-Stage Multi-Task Feature Learning

10 0.55699992 29 nips-2012-Accelerated Training for Matrix-norm Regularization: A Boosting Approach

11 0.53681171 212 nips-2012-Minimax Multi-Task Learning and a Generalized Loss-Compositional Paradigm for MTL

12 0.51602906 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

13 0.51388115 34 nips-2012-Active Learning of Multi-Index Function Models

14 0.49960762 290 nips-2012-Recovery of Sparse Probability Measures via Convex Programming

15 0.49959734 17 nips-2012-A Scalable CUR Matrix Decomposition Algorithm: Lower Time Complexity and Tighter Bound

16 0.49450934 144 nips-2012-Gradient-based kernel method for feature extraction and variable selection

17 0.49414074 254 nips-2012-On the Sample Complexity of Robust PCA

18 0.48029479 16 nips-2012-A Polynomial-time Form of Robust Regression

19 0.46197382 312 nips-2012-Simultaneously Leveraging Output and Task Structures for Multiple-Output Regression

20 0.45957074 125 nips-2012-Factoring nonnegative matrices with linear programs


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.031), (21, 0.011), (36, 0.01), (38, 0.122), (39, 0.015), (42, 0.011), (54, 0.023), (55, 0.041), (56, 0.036), (68, 0.013), (74, 0.03), (76, 0.47), (80, 0.072), (92, 0.033)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.99371296 28 nips-2012-A systematic approach to extracting semantic information from functional MRI data

Author: Francisco Pereira, Matthew Botvinick

Abstract: This paper introduces a novel classification method for functional magnetic resonance imaging datasets with tens of classes. The method is designed to make predictions using information from as many brain locations as possible, instead of resorting to feature selection, and does this by decomposing the pattern of brain activation into differently informative sub-regions. We provide results over a complex semantic processing dataset that show that the method is competitive with state-of-the-art feature selection and also suggest how the method may be used to perform group or exploratory analyses of complex class structure. 1

2 0.99231821 175 nips-2012-Learning High-Density Regions for a Generalized Kolmogorov-Smirnov Test in High-Dimensional Data

Author: Assaf Glazer, Michael Lindenbaum, Shaul Markovitch

Abstract: We propose an efficient, generalized, nonparametric, statistical KolmogorovSmirnov test for detecting distributional change in high-dimensional data. To implement the test, we introduce a novel, hierarchical, minimum-volume sets estimator to represent the distributions to be tested. Our work is motivated by the need to detect changes in data streams, and the test is especially efficient in this context. We provide the theoretical foundations of our test and show its superiority over existing methods. 1

3 0.9912622 311 nips-2012-Shifting Weights: Adapting Object Detectors from Image to Video

Author: Kevin Tang, Vignesh Ramanathan, Li Fei-fei, Daphne Koller

Abstract: Typical object detectors trained on images perform poorly on video, as there is a clear distinction in domain between the two types of data. In this paper, we tackle the problem of adapting object detectors learned from images to work well on videos. We treat the problem as one of unsupervised domain adaptation, in which we are given labeled data from the source domain (image), but only unlabeled data from the target domain (video). Our approach, self-paced domain adaptation, seeks to iteratively adapt the detector by re-training the detector with automatically discovered target domain examples, starting with the easiest first. At each iteration, the algorithm adapts by considering an increased number of target domain examples, and a decreased number of source domain examples. To discover target domain examples from the vast amount of video data, we introduce a simple, robust approach that scores trajectory tracks instead of bounding boxes. We also show how rich and expressive features specific to the target domain can be incorporated under the same framework. We show promising results on the 2011 TRECVID Multimedia Event Detection [1] and LabelMe Video [2] datasets that illustrate the benefit of our approach to adapt object detectors to video. 1

4 0.99049801 286 nips-2012-Random Utility Theory for Social Choice

Author: Hossein Azari, David Parks, Lirong Xia

Abstract: Random utility theory models an agent’s preferences on alternatives by drawing a real-valued score on each alternative (typically independently) from a parameterized distribution, and then ranking the alternatives according to scores. A special case that has received significant attention is the Plackett-Luce model, for which fast inference methods for maximum likelihood estimators are available. This paper develops conditions on general random utility models that enable fast inference within a Bayesian framework through MC-EM, providing concave loglikelihood functions and bounded sets of global maxima solutions. Results on both real-world and simulated data provide support for the scalability of the approach and capability for model selection among general random utility models including Plackett-Luce. 1

5 0.98996991 169 nips-2012-Label Ranking with Partial Abstention based on Thresholded Probabilistic Models

Author: Weiwei Cheng, Willem Waegeman, Volkmar Welker, Eyke Hüllermeier

Abstract: Several machine learning methods allow for abstaining from uncertain predictions. While being common for settings like conventional classification, abstention has been studied much less in learning to rank. We address abstention for the label ranking setting, allowing the learner to declare certain pairs of labels as being incomparable and, thus, to predict partial instead of total orders. In our method, such predictions are produced via thresholding the probabilities of pairwise preferences between labels, as induced by a predicted probability distribution on the set of all rankings. We formally analyze this approach for the Mallows and the Plackett-Luce model, showing that it produces proper partial orders as predictions and characterizing the expressiveness of the induced class of partial orders. These theoretical results are complemented by experiments demonstrating the practical usefulness of the approach. 1

6 0.98944736 205 nips-2012-MCMC for continuous-time discrete-state systems

7 0.98806643 33 nips-2012-Active Learning of Model Evidence Using Bayesian Quadrature

same-paper 8 0.97591299 247 nips-2012-Nonparametric Reduced Rank Regression

9 0.97291827 164 nips-2012-Iterative Thresholding Algorithm for Sparse Inverse Covariance Estimation

10 0.96829885 307 nips-2012-Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning

11 0.93444467 338 nips-2012-The Perturbed Variation

12 0.927109 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC

13 0.925147 142 nips-2012-Generalization Bounds for Domain Adaptation

14 0.91552079 264 nips-2012-Optimal kernel choice for large-scale two-sample tests

15 0.91449291 291 nips-2012-Reducing statistical time-series problems to binary classification

16 0.91299337 41 nips-2012-Ancestor Sampling for Particle Gibbs

17 0.9128961 203 nips-2012-Locating Changes in Highly Dependent Data with Unknown Number of Change Points

18 0.91018337 95 nips-2012-Density-Difference Estimation

19 0.90941107 327 nips-2012-Structured Learning of Gaussian Graphical Models

20 0.90682328 222 nips-2012-Multi-Task Averaging