nips nips2008 nips2008-155 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Han Liu, Larry Wasserman, John D. Lafferty
Abstract: We propose new families of models and algorithms for high-dimensional nonparametric learning with joint sparsity constraints. Our approach is based on a regularization method that enforces common sparsity patterns across different function components in a nonparametric additive model. The algorithms employ a coordinate descent approach that is based on a functional soft-thresholding operator. The framework yields several new models, including multi-task sparse additive models, multi-response sparse additive models, and sparse additive multi-category logistic regression. The methods are illustrated with experiments on synthetic data and gene microarray data. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Our approach is based on a regularization method that enforces common sparsity patterns across different function components in a nonparametric additive model. [sent-2, score-0.519]
2 The algorithms employ a coordinate descent approach that is based on a functional soft-thresholding operator. [sent-3, score-0.106]
3 The framework yields several new models, including multi-task sparse additive models, multi-response sparse additive models, and sparse additive multi-category logistic regression. [sent-4, score-0.872]
4 The methods are illustrated with experiments on synthetic data and gene microarray data. [sent-5, score-0.141]
5 In a multi-category classification problem, it is required to discriminate between the different categories using a set of high-dimensional feature vectors—for instance, classifying the type of tumor in a cancer patient from gene expression data. [sent-7, score-0.257]
6 In a multi-task regression problem, it is of interest to form several regression estimators for related data sets that share common types of covariates—for instance, predicting test scores across different school districts. [sent-8, score-0.221]
7 In other areas, such as multi-channel signal processing, it is of interest to simultaneously decompose multiple signals in terms of a large common overcomplete dictionary, which is a multi-response regression problem. [sent-9, score-0.114]
8 In each case, while the details of the estimators vary from instance to instance, across categories, or tasks, they may share a common sparsity pattern of relevant variables selected from a high-dimensional space. [sent-10, score-0.156]
9 How to find this common sparsity pattern is an interesting learning task. [sent-11, score-0.107]
10 In the parametric setting, progress has been recently made on such problems using regularization based on the sum of supremum norms (Turlach et al. [sent-12, score-0.182]
11 For example, consider the K-task linear regression problem $y_i^{(k)} = \beta_0^{(k)} + \sum_{j=1}^{p} \beta_j^{(k)} x_{ij}^{(k)} + \epsilon_i^{(k)}$, where the superscript $k$ indexes the tasks and the subscript $i$ indexes the observations within each task. [sent-15, score-0.272]
12 The sum of sup-norms regularization has the effect of "grouping" the elements of $\beta_j = (\beta_j^{(1)}, \ldots, \beta_j^{(K)})$ across tasks so that they can be shrunk towards zero simultaneously. [sent-23, score-0.131]
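As an illustration of this grouping penalty (a minimal sketch, not code from the paper; the array layout is our assumption), the sum of sup-norms of equation (1) can be computed directly from a task-by-variable coefficient matrix:

```python
import numpy as np

def sum_of_sup_norms(beta):
    """Sum-of-sup-norms penalty sum_j max_k |beta_j^{(k)}|.

    beta is assumed to be a (K, p) array whose column j stacks the
    coefficients beta_j^{(1)}, ..., beta_j^{(K)} for the K tasks.
    """
    return np.abs(beta).max(axis=0).sum()

# toy usage: two tasks, three covariates
beta = np.array([[0.0, 2.0, -1.0],
                 [0.0, 0.5,  3.0]])
print(sum_of_sup_norms(beta))  # 0 + 2 + 3 = 5.0
```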
13 The problems of multi-response (or multivariate) regression and multi-category classification can be viewed as special cases of the multi-task regression problem in which all tasks share the same design matrix. [sent-24, score-0.198]
14 (2005) and Fornasier and Rauhut (2008) propose the same sum of sup-norms regularization as in (1) for such problems in the linear model setting. [sent-26, score-0.131]
15 (2008) propose the sup-norm support vector machine, demonstrating its effectiveness on gene data. [sent-28, score-0.087]
16 In this paper we develop new methods for nonparametric estimation for such multi-task and multicategory regression and classification problems. [sent-29, score-0.217]
17 Rather than fitting a linear model, we instead estimate smooth functions of the data, and formulate a regularization framework that encourages joint functional sparsity, where the component functions can be different across tasks while sharing a common sparsity pattern. [sent-30, score-0.495]
18 Building on a recently proposed method called sparse additive models, or “SpAM” (Ravikumar et al. [sent-31, score-0.317]
19 , 2007), we propose a convex regularization functional that can be viewed as a nonparametric analog of the sum of sup-norms regularization for linear models. [sent-32, score-0.459]
20 Based on this regularization functional, we develop new models for nonparametric multi-task regression and classification, including multi-task sparse additive models (MT-SpAM), multi-response sparse additive models (MR-SpAM), and sparse multi-category additive logistic regression (SMALR). [sent-33, score-1.26]
21 For covariates $X = (X_1, \ldots, X_p)$, let $\mathcal{H}_j$ denote the Hilbert subspace $L_2(P_{X_j})$ of $P_{X_j}$-measurable functions $f_j(x_j)$ of the single scalar variable $X_j$ with zero mean, i.e., $\mathbb{E}[f_j(X_j)] = 0$. [sent-44, score-0.696]
22 The additive model takes the form $f(x_1, \ldots, x_p) = \alpha + \sum_{j} f_j(x_j)$, with $f_j \in \mathcal{H}_j$ for $j = 1, \ldots, p$. [sent-56, score-1.358]
23 2.1 Multi-task/Multi-response Sparse Additive Models. In a K-task regression problem, we have observations $\{(x_i^{(k)}, y_i^{(k)}),\ i = 1, \ldots, n\}$ for each task $k$. [sent-65, score-0.152]
24 Here $x_i^{(k)} = (x_{i1}^{(k)}, \ldots, x_{ip}^{(k)})^T$ is a p-dimensional covariate vector, the superscript $k$ indexes tasks, and $i$ indexes the i.i.d. observations. [sent-74, score-0.191]
25 To encourage common sparsity patterns across different function components, we define the regularization functional $\Phi_K(f)$ by $\Phi_K(f) = \sum_{j=1}^{p} \max_{k} \|f_j^{(k)}\|$. [sent-91, score-0.368]
26 If each fj is a linear function, then ΦK (f ) reduces to the sum of sup-norms regularization term as in (1). [sent-97, score-0.794]
27 We shall employ ΦK (f ) to induce joint functional sparsity in nonparametric multi-task inference. [sent-98, score-0.302]
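A corresponding empirical sketch of $\Phi_K(f)$ is shown below. It assumes the component functions are represented by their fitted values at the sample points and uses the empirical $L_2$ norm, which is our choice for the sketch rather than something specified in this text.

```python
import numpy as np

def phi_K(F):
    """Empirical joint-sparsity functional Phi_K(f) = sum_j max_k ||f_j^{(k)}||.

    F is assumed to have shape (K, p, n) with F[k, j, i] = f_j^{(k)}(x_{ij}).
    The norm of a component is taken as the empirical L2 norm over the n samples.
    """
    norms = np.sqrt((F ** 2).mean(axis=2))   # shape (K, p): ||f_j^{(k)}||
    return norms.max(axis=0).sum()           # sum over j of max over k
```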
28 Using this regularization functional, the multi-task sparse additive model (MT-SpAM) is formulated as a penalized M-estimator, by framing the optimization problem $(\hat{f}^{(1)}, \ldots, \hat{f}^{(K)}) = \arg\min_{f^{(1)}, \ldots, f^{(K)}} \Big\{ \frac{1}{2n} \sum_{i=1}^{n} \sum_{k=1}^{K} Q_{f^{(k)}}\big(x_i^{(k)}, y_i^{(k)}\big) + \lambda \Phi_K(f) \Big\}$ (3). [sent-99, score-0.446] [sent-102, score-0.105]
30 The multi-response sparse additive model (MR-SpAM) has exactly the same formulation as in (3) except that a common design matrix is used across the K different tasks. [sent-112, score-0.321]
31 Here $x_i = (x_{i1}, \ldots, x_{ip})^T$ is a p-dimensional predictor vector and $y_i = (y_i^{(1)}, \ldots, y_i^{(K-1)})$ is a (K−1)-dimensional response vector in which at most one element can be one, with all the others being zero. [sent-120, score-0.112] [sent-123, score-0.069]
33 Here, we adopt the common "1-of-K" labeling convention where $y_i^{(k)} = 1$ if $x_i$ has category $k$ and $y_i^{(k)} = 0$ otherwise; if all elements of $y_i$ are zero, then $x_i$ is assigned the K-th category. [sent-124, score-0.342]
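A small helper for the labeling convention just described (our own sketch; the input format of integer labels 1..K is an assumption):

```python
import numpy as np

def encode_one_of_K(labels, K):
    """Map integer labels in {1, ..., K} to (K-1)-dimensional response vectors.

    Category K is the reference class and is encoded as the all-zero vector,
    matching the convention described above.
    """
    labels = np.asarray(labels)
    Y = np.zeros((labels.shape[0], K - 1))
    for i, c in enumerate(labels):
        if c < K:
            Y[i, c - 1] = 1.0
    return Y

# example with K = 4 categories
print(encode_one_of_K([1, 4, 2], K=4))
# [[1. 0. 0.]
#  [0. 0. 0.]
#  [0. 1. 0.]]
```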
34 The multi-category additive logistic regression model is $\mathbb{P}(Y^{(k)} = 1 \mid X = x) = \dfrac{\exp f^{(k)}(x)}{1 + \sum_{k'=1}^{K-1} \exp f^{(k')}(x)}$, $k = 1, \ldots, K-1$ (4), where $f^{(k)}(x) = \alpha^{(k)} + \sum_{j=1}^{p} f_j^{(k)}(x_j)$ has an additive form. [sent-125, score-0.323] [sent-128, score-0.829]
36 We take $f = (f^{(1)}, \ldots, f^{(K-1)})$ to be a discriminant function and $p_f^{(k)}(x) = \mathbb{P}(Y^{(k)} = 1 \mid X = x)$ to be the conditional probability of category $k$ given $X = x$. [sent-132, score-0.283]
37 The logistic regression classifier $h_f(\cdot)$ induced by $f$, which is a mapping from the sample space to the category labels, is simply given by $h_f(x) = \arg\max_{k} p_f^{(k)}(x)$. [sent-133, score-0.293]
38 If a variable $X_j$ is irrelevant, then all of the component functions $f_j^{(k)}$ are identically zero, for each $k = 1, 2, \ldots, K-1$. [sent-137, score-0.696]
39 This motivates the use of the regularization functional $\Phi_{K-1}(f)$ to zero out entire vectors $f_j = (f_j^{(1)}, \ldots, f_j^{(K-1)})$. [sent-141, score-0.9]
40 Denoting $\ell_f(x, y) = \sum_{k=1}^{K-1} y^{(k)} f^{(k)}(x) - \log\Big(1 + \sum_{k'=1}^{K-1} \exp f^{(k')}(x)\Big)$ as the multinomial log-loss, the sparse multi-category additive logistic regression estimator (SMALR) is thus formulated as the solution to the optimization problem $(\hat{f}^{(1)}, \ldots, \hat{f}^{(K-1)}) = \arg\min_{f^{(1)}, \ldots, f^{(K-1)}} \Big\{ -\frac{1}{n} \sum_{i=1}^{n} \ell_f(x_i, y_i) + \lambda \Phi_{K-1}(f) \Big\}$ (5), where $f_j^{(k)} \in \mathcal{H}_j$ for $j = 1, \ldots, p$ and $k = 1, \ldots, K-1$. [sent-145, score-0.423] [sent-148, score-0.105] [sent-151, score-0.663]
43 3 Simultaneous Sparse Backfitting. We use a blockwise coordinate descent algorithm to minimize the functional defined in (3). [sent-158, score-0.106]
44 Finally, a finite sample version of the algorithm can be derived by plugging in nonparametric smoothers for these conditional expectations. [sent-161, score-0.134]
45 For the $j$-th block of component functions $f_j^{(1)}, \ldots, f_j^{(K)}$, let $R_j^{(k)} = Y^{(k)} - \sum_{l \neq j} f_l^{(k)}(X_l)$ denote the partial residuals. [sent-162, score-0.726] [sent-165, score-0.663]
47 Assuming all but the functions in the $j$-th block to be fixed, the optimization problem is reduced to $\min_{f_j^{(1)}, \ldots, f_j^{(K)}} \Big\{ \frac{1}{2} \sum_{k=1}^{K} \mathbb{E}\Big[\big(R_j^{(k)} - f_j^{(k)}(X_j^{(k)})\big)^2\Big] + \lambda \max_{k} \|f_j^{(k)}\| \Big\}$ (6). [sent-166, score-0.726]
48 Let $P_j^{(k)} = \mathbb{E}\big[R_j^{(k)} \mid X_j\big]$ and $s_j^{(k)} = \|P_j^{(k)}\|$, and order the indices according to $s_j^{(k_1)} \ge s_j^{(k_2)} \ge \cdots \ge s_j^{(k_K)}$. [sent-178, score-0.705]
49 Then the solution to (6) is given by $f_j^{(k_i)} = P_j^{(k_i)}$ for $i > m^*$, and $f_j^{(k_i)} = \Big[\frac{1}{m^*}\Big(\sum_{i'=1}^{m^*} s_j^{(k_{i'})} - \lambda\Big)\Big]_+ \dfrac{P_j^{(k_i)}}{s_j^{(k_i)}}$ for $i \le m^*$, where $m^* = \arg\max_m \frac{1}{m}\Big(\sum_{i'=1}^{m} s_j^{(k_{i'})} - \lambda\Big)$ and $[\cdot]_+$ denotes the positive part. (7) [sent-182, score-0.663] [sent-183, score-0.335]
51 Therefore, the optimization problem in (6) is solved by a soft-thresholding operator, given in equation (7), which we shall denote as the operator of equation (8) mapping $(P_j^{(1)}, \ldots, P_j^{(K)})$ to $(f_j^{(1)}, \ldots, f_j^{(K)})$. [sent-184, score-0.235]
52 While the proof of this result is lengthy, we sketch the key steps below, which are a functional extension of the subdifferential calculus approach of Fornasier and Rauhut (2008) in the linear setting. [sent-191, score-0.226]
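The thresholding rule of equation (7) can be sketched as follows. This is our own illustration, not the authors' code: each block $P_j^{(k)}$ is represented by its values at the $n$ sample points, and the norm $s_j^{(k)}$ is taken to be the empirical $L_2$ norm, an assumption of the sketch.

```python
import numpy as np

def supnorm_soft_threshold(P, lam):
    """Sup-norm soft-thresholding of equation (7).

    P: (K, n) array; row k holds P_j^{(k)} evaluated at the n sample points.
    Returns the thresholded block (f_j^{(1)}, ..., f_j^{(K)}) with the same shape.
    """
    K, n = P.shape
    s = np.sqrt((P ** 2).mean(axis=1))               # s_j^{(k)} (empirical norms)
    order = np.argsort(-s)                           # k_1, k_2, ... with s decreasing
    cums = np.cumsum(s[order])
    crit = (cums - lam) / np.arange(1, K + 1)        # (1/m)(sum_{i'<=m} s - lambda)
    m_star = int(np.argmax(crit)) + 1                # m* = argmax_m of the criterion
    shrink = max(crit[m_star - 1], 0.0)              # [.]_+ : common norm of top blocks
    F = P.copy()
    top = order[:m_star]
    # blocks with i <= m* are rescaled to norm `shrink`; the rest are left at P
    F[top] = shrink * P[top] / np.maximum(s[top][:, None], 1e-12)
    return F
```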
53 The functions $f_j^{(k)}$ are solutions to (6) if and only if $f_j^{(k)} - P_j^{(k)} + \lambda u_k v_k = 0$ (almost surely), for $k = 1, \ldots, K$. [sent-194, score-1.394]
54 Here the former denotes the subdifferential of the convex functional $\|\cdot\|_\infty$ evaluated at $(\|f_j^{(1)}\|, \ldots, \|f_j^{(K)}\|)$. [sent-207, score-0.197]
55 Next, the following proposition from Rockafellar and Wets (1998) is used to characterize the subdifferential of sup-norms. [sent-212, score-0.091]
56 The subdifferential of $\|\cdot\|_\infty$ on $\mathbb{R}^K$ is $\partial\|\cdot\|_\infty(x) = B_1(1)$ (the unit $\ell_1$ ball) if $x = 0$, and $\partial\|\cdot\|_\infty(x) = \mathrm{conv}\{\mathrm{sign}(x_k)\, e_k : |x_k| = \|x\|_\infty\}$ otherwise. [sent-214, score-0.091]
57 Using Lemma 2 and Lemma 3, the proof of Theorem 1 proceeds by considering three cases for the sup-norm subdifferential evaluated at $(\|f_j^{(1)}\|, \ldots, \|f_j^{(K)}\|)$. [sent-216, score-0.091]
58 The sup-norm is attained at precisely $m > 1$ entries if and only if $m$ is the largest number such that $s_j^{(k_m)} \ge \frac{1}{m-1}\Big(\sum_{i'=1}^{m-1} s_j^{(k_{i'})} - \lambda\Big)$. [sent-231, score-0.235]
59 The proof of Theorem 1 then follows from the above lemmas and some calculus. [sent-232, score-0.235]
60 Based on this result, the data version of the soft-thresholding operator is obtained by replacing the conditional expectation $P_j^{(k)} = \mathbb{E}(R_j^{(k)} \mid X_j^{(k)})$ by $S_j^{(k)} R_j^{(k)}$, where $S_j^{(k)}$ is a nonparametric smoother for variable $X_j^{(k)}$. [sent-233, score-0.151]
61 The resulting simultaneous sparse backfitting algorithm for multi-task and multi-response sparse additive models (MT-SpAM and MR-SpAM) is shown in Figure 2. [sent-236, score-0.418]
62 [Residue of the Figure 2 algorithm box: the thresholding step of equation (7) followed by centering, $\hat f_j^{(k)} \leftarrow \hat f_j^{(k)} - \mathrm{mean}(\hat f_j^{(k)})$ for $k = 1, \ldots, K$.] [sent-254, score-0.898]
63 [Figure 2 box header: MULTI-TASK AND MULTI-RESPONSE SPAM. Input: data $(x_i^{(k)}, y_i^{(k)})$, $i = 1, \ldots$] [sent-263, score-0.069]
64 Figure 2: The simultaneous sparse backfitting algorithm for MT-SpAM or MR-SpAM. [sent-296, score-0.152]
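Since the algorithm box of Figure 2 did not survive extraction, here is a rough sketch of the simultaneous sparse backfitting loop as described in the surrounding text. The smoother, the convergence criterion (a fixed iteration count), and the data layout are all assumptions of the sketch; `soft_threshold` can be the sup-norm thresholding sketch given after equation (7).

```python
import numpy as np

def mt_spam_backfit(X, Y, lam, smooth, soft_threshold, n_iter=50):
    """Simultaneous sparse backfitting for MT-SpAM / MR-SpAM (sketch).

    X: (K, n, p) covariates and Y: (K, n) responses (a common sample size n
    across tasks is assumed).  `smooth(x, r)` is a univariate nonparametric
    smoother returning fitted values of r on x; `soft_threshold(P, lam)` is
    the blockwise thresholding operator of equation (7).
    Returns F of shape (K, p, n), with F[k, j] the fit of f_j^{(k)}.
    """
    K, n, p = X.shape
    alpha = Y.mean(axis=1)                     # task intercepts
    F = np.zeros((K, p, n))
    for _ in range(n_iter):
        for j in range(p):
            # partial residuals R_j^{(k)} = Y^{(k)} - alpha^{(k)} - sum_{l != j} f_l^{(k)}
            R = Y - alpha[:, None] - (F.sum(axis=1) - F[:, j, :])
            # smoother estimate of P_j^{(k)} = E[R_j^{(k)} | X_j^{(k)}]
            P = np.stack([smooth(X[k, :, j], R[k]) for k in range(K)])
            Fj = soft_threshold(P, lam)
            F[:, j, :] = Fj - Fj.mean(axis=1, keepdims=True)   # recenter each component
    return F
```

Any univariate smoother (for example a Nadaraya-Watson kernel smoother) can be passed in for `smooth`.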
65 3.1 Penalized Local Scoring Algorithm for SMALR. We now derive a penalized local scoring algorithm for sparse multi-category additive logistic regression (SMALR), which can be viewed as a variant of Newton's method in function space. [sent-299, score-0.514]
66 At each iteration, a quadratic approximation to the loss is used as a surrogate functional with the regularization term added to induce joint functional sparsity. [sent-300, score-0.372]
67 , 2007) for sparse binary nonparametric logistic regression does not apply. [sent-302, score-0.348]
68 A second-order Lagrange-form Taylor expansion of $L(f)$ at $\hat f$ is then $L(f) = L(\hat f) + \mathbb{E}\big[\nabla L(\hat f)^T (f - \hat f)\big] + \frac{1}{2}\mathbb{E}\big[(f - \hat f)^T H(\bar f)(f - \hat f)\big]$ (9) for some function $\bar f$, where the gradient is $\nabla L(f) = Y - p_f(X)$ with $p_f(X) = \big(p_f(Y^{(1)} = 1 \mid X), \ldots, p_f(Y^{(K-1)} = 1 \mid X)\big)^T$, and the Hessian is $H(f) = -\mathrm{diag}\big(p_f(X)\big) + p_f(X) p_f(X)^T$. [sent-309, score-0.446] [sent-312, score-0.669]
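For concreteness, the gradient and Hessian just defined can be computed pointwise as follows (our own sketch; the single-observation interface is an assumption):

```python
import numpy as np

def multinomial_probs(f):
    """p_f^{(k)}(x) = exp(f^{(k)}(x)) / (1 + sum_{k'} exp(f^{(k')}(x))), as in (4).

    f: length-(K-1) array of discriminant values at one point x.
    """
    m = f.max()
    e = np.exp(f - m)
    return e / (np.exp(-m) + e.sum())          # shift by max(f) for numerical stability

def loglik_gradient_hessian(y, f):
    """Gradient Y - p_f and Hessian -diag(p_f) + p_f p_f^T at one observation."""
    p = multinomial_probs(f)
    grad = y - p
    hess = -np.diag(p) + np.outer(p, p)
    return grad, hess
```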
[Figure 3 box (reconstructed from residue): Initialize $\hat f_j^{(k)} = 0$ and $\hat\alpha^{(k)} = \log\Big(\sum_{i=1}^{n} y_i^{(k)} \Big/ \big(n - \sum_{i=1}^{n}\sum_{k'=1}^{K-1} y_i^{(k')}\big)\Big)$ for $k = 1, \ldots, K-1$. Iterate until convergence: (1) compute $p_{\hat f}^{(k)}(x_i) \equiv \mathbb{P}(Y^{(k)} = 1 \mid X = x_i)$ as in (4) for $k = 1, \ldots, K-1$; (2) calculate the transformed responses $Z_i^{(k)} = 4\big(y_i^{(k)} - p_{\hat f}^{(k)}(x_i)\big) + \hat\alpha^{(k)} + \sum_{j=1}^{p} \hat f_j^{(k)}(x_{ij})$ for $k = 1, \ldots, K-1$.]
73 Figure 3: The penalized local scoring algorithm for SMALR. [sent-341, score-0.091]
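The steps of the Figure 3 box translate roughly into the single iteration below. This is a sketch under our assumptions about the data layout; `sparse_backfit` stands for a multi-response sparse backfitting routine such as the MT-SpAM sketch above, applied with one common design matrix.

```python
import numpy as np

def smalr_local_scoring_step(X, Y, F, alpha, lam, sparse_backfit):
    """One penalized local scoring iteration for SMALR (sketch of Figure 3).

    X: (n, p) design matrix; Y: (n, K-1) responses in the 1-of-K coding;
    F: (K-1, p, n) current component fits; alpha: (K-1,) intercepts.
    """
    K1, p, n = F.shape
    fit = alpha[:, None] + F.sum(axis=1)                  # f^{(k)}(x_i), shape (K-1, n)
    expf = np.exp(fit)
    prob = expf / (1.0 + expf.sum(axis=0, keepdims=True)) # p_f^{(k)}(x_i), eq. (4)
    # transformed responses Z_i^{(k)} = 4 (y_i^{(k)} - p^{(k)}(x_i)) + current fit
    Z = 4.0 * (Y.T - prob) + fit
    # refit a multi-response sparse additive model to (X, Z) with lambda' = 2 * lam
    X_common = np.repeat(X[None, :, :], K1, axis=0)       # same design for all responses
    return sparse_backfit(X_common, Z, 2.0 * lam)
```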
74 The solution $f$ that maximizes the right-hand side of (10) is equivalent to the solution that minimizes $\frac{1}{2}\mathbb{E}\|Z - Af\|^2$, where $A = (-B)^{1/2}$ and $Z = A^{-1}(Y - p_{\hat f}) + A\hat f$. [sent-344, score-0.223]
75 Recalling that $f^{(k)} = \alpha^{(k)} + \sum_{j=1}^{p} f_j^{(k)}$, equation (9) and Lemma 5 then justify the use of the auxiliary functional $\frac{1}{2}\sum_{k=1}^{K-1} \mathbb{E}\Big[\big(Z'^{(k)} - \sum_{j=1}^{p} f_j^{(k)}(X_j)\big)^2\Big] + \lambda' \Phi_{K-1}(f)$ (11), where $Z'^{(k)} = 4\big(Y^{(k)} - P_{\hat f}(Y^{(k)} = 1 \mid X)\big) + \hat\alpha^{(k)} + \sum_{j=1}^{p} \hat f_j^{(k)}(X_j)$ and $\lambda' = 2\lambda$. [sent-345, score-1.432]
76 4 Experiments In this section, we first use simulated data to investigate the performance of the MT-SpAM simultaneous sparse backfitting algorithm. [sent-348, score-0.152]
77 We then apply SMALR to a tumor classification and biomarker identification problem. [sent-349, score-0.177]
78 To choose the regularization parameter λ, we simply use J-fold cross-validation or the GCV score from (Ravikumar et al. [sent-352, score-0.182]
79 4.1 Synthetic Data. We generated n = 100 observations from a 10-dimensional three-task additive model with four relevant variables: $y_i^{(k)} = \sum_{j=1}^{4} f_j^{(k)}(x_{ij}^{(k)}) + \epsilon_i^{(k)}$, $k = 1, 2, 3$, where $\epsilon_i^{(k)} \sim N(0, 1)$; the component functions $f_j^{(k)}$ are plotted in Figure 4. [sent-355, score-1.594]
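A data-generating sketch in the spirit of this setup is given below. The particular component functions are placeholders (the paper's functions are only shown in Figure 4), and the same placeholders are reused across the three tasks for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, K = 100, 10, 3
# placeholder component functions for the four relevant variables (not those of Figure 4)
g = [np.sin, np.cos, lambda x: x ** 2, np.tanh]

X = rng.uniform(-1.0, 1.0, size=(K, n, p))
Y = np.zeros((K, n))
for k in range(K):
    for j in range(4):                      # only the first four variables are relevant
        Y[k] += g[j](X[k, :, j])
    Y[k] += rng.standard_normal(n)          # epsilon_i^{(k)} ~ N(0, 1)
```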
80 The middle three figures show regularization paths as the parameter λ varies; each curve is a plot of the maximum empirical L1 norm of the component functions for each variable, with the red vertical line representing the selected model using the GCV score. [sent-501, score-0.219]
81 Using the same setup but with one common design matrix, we also compare the quantitative performance of MR-SpAM with MARS (Friedman, 1991), which is a popular method for multi-response additive regression. [sent-503, score-0.197]
82 (The MARS simulations are carried out in R, using the default options of the mars function in the mda library. [sent-505, score-0.094]
83 2 Gene Microarray Data Here we apply the sparse multi-category additive logistic regression model to a microarray dataset for small round blue cell tumors (SRBCT) (Khan et al. [sent-507, score-0.614]
84 The data consist of expression profiles of 2,308 genes (Khan et al. [sent-509, score-0.193]
85 , 2001) with tumors classified into 4 categories: neuroblastoma (NB), rhabdomyosarcoma (RMS), non-Hodgkin lymphoma (NHL), and the Ewing family of tumors (EWS). [sent-510, score-0.172]
86 The main purpose is to identify important biomarkers, which are a small set of genes that can accurately predict the type of tumor of a patient. [sent-513, score-0.255]
87 (2008) identify 53 genes using the sup-norm support vector machine. [sent-519, score-0.142]
88 , 2008), and use a very simple screening step based on the marginal correlation to first reduce the number of genes to 500. [sent-522, score-0.142]
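The screening step can be sketched as follows; how the multi-class labels enter the marginal correlation is not specified in the text, so the numeric class coding used here is an assumption.

```python
import numpy as np

def marginal_correlation_screen(X, y, keep=500):
    """Keep the `keep` genes with largest absolute marginal correlation with y.

    X: (n, G) expression matrix; y: length-n numeric vector (e.g. class codes).
    Returns the sorted indices of the retained genes.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    corr = np.abs(Xc.T @ yc) / denom
    return np.sort(np.argsort(-corr)[:keep])
```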
89 08, and the regularization parameter λ is tuned using 4-fold cross validation. [sent-524, score-0.131]
90 Figure 5: SMALR results on gene data: heat map (left), marginal fits (center), and CV score (right). [sent-644, score-0.125]
91 The genes are ordered by hierarchical clustering of their expression profiles. [sent-646, score-0.142]
92 The heatmap clearly shows four block structures for the four tumor categories. [sent-647, score-0.113]
93 This suggests visually that the 20 genes selected are highly informative of the tumor type. [sent-648, score-0.28]
94 Interestingly, only 10 of the 20 identified genes from our method are among the 43 genes selected using the shrunken centroids approach of Tibshirani et al. [sent-656, score-0.467]
95 16 of them are among the 96 genes selected by the neural network approach of Khan et al. [sent-658, score-0.218]
96 5 Discussion and Acknowledgements We have presented new approaches to fitting sparse nonparametric multi-task regression models and sparse multi-category classification models. [sent-661, score-0.374]
97 Recovery algorithms for vector valued data with joint sparsity constraints. [sent-667, score-0.105]
98 Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. [sent-690, score-0.087]
99 Sparse multinomial logistic regression: Fast algorithms and generalization bounds. [sent-697, score-0.074]
100 Diagnosis of multiple cancer types by shrunken centroids of gene expression. [sent-720, score-0.221]
wordName wordTfidf (topN-words)
[('fj', 0.663), ('sj', 0.235), ('pf', 0.223), ('smalr', 0.193), ('additive', 0.166), ('rj', 0.163), ('genes', 0.142), ('regularization', 0.131), ('pj', 0.123), ('tumor', 0.113), ('ki', 0.109), ('functional', 0.106), ('sparse', 0.1), ('xj', 0.095), ('mars', 0.094), ('nonparametric', 0.091), ('subdifferential', 0.091), ('spam', 0.09), ('gene', 0.087), ('tumors', 0.086), ('regression', 0.083), ('sparsity', 0.076), ('khan', 0.075), ('logistic', 0.074), ('yi', 0.069), ('nk', 0.065), ('biomarker', 0.064), ('gcv', 0.064), ('intercepts', 0.064), ('maxm', 0.064), ('qf', 0.064), ('shrunken', 0.064), ('ulti', 0.056), ('microarray', 0.054), ('simultaneous', 0.052), ('ravikumar', 0.052), ('et', 0.051), ('hj', 0.05), ('xij', 0.05), ('penalized', 0.049), ('indexes', 0.046), ('maxk', 0.046), ('lemma', 0.044), ('centroids', 0.043), ('fornasier', 0.043), ('iu', 0.043), ('multicategory', 0.043), ('pxj', 0.043), ('rockafellar', 0.043), ('smoothers', 0.043), ('xip', 0.043), ('scoring', 0.042), ('sub', 0.042), ('tting', 0.039), ('heat', 0.038), ('multiresponse', 0.038), ('rauhut', 0.038), ('zhang', 0.037), ('arg', 0.036), ('xi', 0.036), ('classi', 0.035), ('vk', 0.035), ('hf', 0.034), ('kk', 0.034), ('hang', 0.034), ('turlach', 0.034), ('smoother', 0.033), ('functions', 0.033), ('category', 0.032), ('tasks', 0.032), ('conv', 0.032), ('cv', 0.032), ('wasserman', 0.032), ('xp', 0.032), ('common', 0.031), ('han', 0.03), ('norm', 0.03), ('th', 0.03), ('categories', 0.03), ('joint', 0.029), ('calculus', 0.029), ('hu', 0.029), ('smoothing', 0.029), ('discriminant', 0.028), ('ek', 0.028), ('gj', 0.028), ('tibshirani', 0.027), ('df', 0.027), ('cancer', 0.027), ('covariates', 0.027), ('operator', 0.027), ('back', 0.027), ('selected', 0.025), ('pro', 0.025), ('residuals', 0.025), ('px', 0.025), ('iterate', 0.024), ('univariate', 0.024), ('across', 0.024), ('superscript', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 155 nips-2008-Nonparametric regression and classification with joint sparsity constraints
Author: Han Liu, Larry Wasserman, John D. Lafferty
Abstract: We propose new families of models and algorithms for high-dimensional nonparametric learning with joint sparsity constraints. Our approach is based on a regularization method that enforces common sparsity patterns across different function components in a nonparametric additive model. The algorithms employ a coordinate descent approach that is based on a functional soft-thresholding operator. The framework yields several new models, including multi-task sparse additive models, multi-response sparse additive models, and sparse additive multi-category logistic regression. The methods are illustrated with experiments on synthetic data and gene microarray data. 1
2 0.13959102 14 nips-2008-Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models
Author: Tong Zhang
Abstract: Consider linear prediction models where the target function is a sparse linear combination of a set of basis functions. We are interested in the problem of identifying those basis functions with non-zero coefficients and reconstructing the target function from noisy observations. Two heuristics that are widely used in practice are forward and backward greedy algorithms. First, we show that neither idea is adequate. Second, we propose a novel combination that is based on the forward greedy algorithm but takes backward steps adaptively whenever beneficial. We prove strong theoretical results showing that this procedure is effective in learning sparse representations. Experimental results support our theory. 1
3 0.12548488 202 nips-2008-Robust Regression and Lasso
Author: Huan Xu, Constantine Caramanis, Shie Mannor
Abstract: We consider robust least-squares regression with feature-wise disturbance. We show that this formulation leads to tractable convex optimization problems, and we exhibit a particular uncertainty set for which the robust problem is equivalent to 1 regularized regression (Lasso). This provides an interpretation of Lasso from a robust optimization perspective. We generalize this robust formulation to consider more general uncertainty sets, which all lead to tractable convex optimization problems. Therefore, we provide a new methodology for designing regression algorithms, which generalize known formulations. The advantage is that robustness to disturbance is a physical property that can be exploited: in addition to obtaining new formulations, we use it directly to show sparsity properties of Lasso, as well as to prove a general consistency result for robust regression problems, including Lasso, from a unified robustness perspective. 1
4 0.10773381 156 nips-2008-Nonparametric sparse hierarchical models describe V1 fMRI responses to natural images
Author: Vincent Q. Vu, Bin Yu, Thomas Naselaris, Kendrick Kay, Jack Gallant, Pradeep K. Ravikumar
Abstract: We propose a novel hierarchical, nonlinear model that predicts brain activity in area V1 evoked by natural images. In the study reported here brain activity was measured by means of functional magnetic resonance imaging (fMRI), a noninvasive technique that provides an indirect measure of neural activity pooled over a small volume (≈ 2mm cube) of brain tissue. Our model, which we call the V-SPAM model, is based on the reasonable assumption that fMRI measurements reflect the (possibly nonlinearly) pooled, rectified output of a large population of simple and complex cells in V1. It has a hierarchical filtering stage that consists of three layers: model simple cells, model complex cells, and a third layer in which the complex cells are linearly pooled (called “pooled-complex” cells). The pooling stage then obtains the measured fMRI signals as a sparse additive model (SpAM) in which a sparse nonparametric (nonlinear) combination of model complex cell and model pooled-complex cell outputs are summed. Our results show that the V-SPAM model predicts fMRI responses evoked by natural images better than a benchmark model that only provides linear pooling of model complex cells. Furthermore, the spatial receptive fields, frequency tuning and orientation tuning curves of the V-SPAM model estimated for each voxel appears to be consistent with the known properties of V1, and with previous analyses of this data set. A visualization procedure applied to the V-SPAM model shows that most of the nonlinear pooling consists of simple compressive or saturating nonlinearities. 1
5 0.10420334 91 nips-2008-Generative and Discriminative Learning with Unknown Labeling Bias
Author: Steven J. Phillips, Miroslav Dudík
Abstract: We apply robust Bayesian decision theory to improve both generative and discriminative learners under bias in class proportions in labeled training data, when the true class proportions are unknown. For the generative case, we derive an entropybased weighting that maximizes expected log likelihood under the worst-case true class proportions. For the discriminative case, we derive a multinomial logistic model that minimizes worst-case conditional log loss. We apply our theory to the modeling of species geographic distributions from presence data, an extreme case of labeling bias since there is no absence data. On a benchmark dataset, we find that entropy-based weighting offers an improvement over constant estimates of class proportions, consistently reducing log loss on unbiased test data. 1
6 0.099545762 145 nips-2008-Multi-stage Convex Relaxation for Learning with Sparse Regularization
7 0.099005707 143 nips-2008-Multi-label Multiple Kernel Learning
8 0.097342454 9 nips-2008-A mixture model for the evolution of gene expression in non-homogeneous datasets
9 0.096437104 62 nips-2008-Differentiable Sparse Coding
10 0.091053829 235 nips-2008-The Infinite Hierarchical Factor Regression Model
11 0.086420253 25 nips-2008-An interior-point stochastic approximation method and an L1-regularized delta rule
12 0.085209057 179 nips-2008-Phase transitions for high-dimensional joint support recovery
13 0.078898042 99 nips-2008-High-dimensional support union recovery in multivariate regression
14 0.073952429 226 nips-2008-Supervised Dictionary Learning
15 0.069432482 194 nips-2008-Regularized Learning with Networks of Features
16 0.068899751 21 nips-2008-An Homotopy Algorithm for the Lasso with Online Observations
17 0.065307587 177 nips-2008-Particle Filter-based Policy Gradient in POMDPs
18 0.064734243 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
19 0.064687297 214 nips-2008-Sparse Online Learning via Truncated Gradient
20 0.060923725 113 nips-2008-Kernelized Sorting
topicId topicWeight
[(0, -0.194), (1, -0.047), (2, -0.071), (3, 0.075), (4, 0.093), (5, 0.01), (6, -0.065), (7, 0.044), (8, -0.044), (9, 0.11), (10, -0.012), (11, -0.047), (12, -0.011), (13, -0.091), (14, 0.031), (15, -0.023), (16, -0.052), (17, -0.098), (18, 0.012), (19, 0.028), (20, -0.0), (21, 0.149), (22, -0.128), (23, -0.091), (24, -0.036), (25, 0.003), (26, 0.11), (27, 0.057), (28, -0.002), (29, -0.073), (30, 0.049), (31, -0.107), (32, 0.072), (33, 0.168), (34, -0.031), (35, -0.032), (36, 0.014), (37, -0.056), (38, -0.151), (39, 0.033), (40, -0.076), (41, -0.049), (42, 0.05), (43, -0.002), (44, -0.051), (45, -0.073), (46, -0.067), (47, -0.089), (48, 0.072), (49, -0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.93806255 155 nips-2008-Nonparametric regression and classification with joint sparsity constraints
Author: Han Liu, Larry Wasserman, John D. Lafferty
Abstract: We propose new families of models and algorithms for high-dimensional nonparametric learning with joint sparsity constraints. Our approach is based on a regularization method that enforces common sparsity patterns across different function components in a nonparametric additive model. The algorithms employ a coordinate descent approach that is based on a functional soft-thresholding operator. The framework yields several new models, including multi-task sparse additive models, multi-response sparse additive models, and sparse additive multi-category logistic regression. The methods are illustrated with experiments on synthetic data and gene microarray data. 1
2 0.6415382 14 nips-2008-Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models
Author: Tong Zhang
Abstract: Consider linear prediction models where the target function is a sparse linear combination of a set of basis functions. We are interested in the problem of identifying those basis functions with non-zero coefficients and reconstructing the target function from noisy observations. Two heuristics that are widely used in practice are forward and backward greedy algorithms. First, we show that neither idea is adequate. Second, we propose a novel combination that is based on the forward greedy algorithm but takes backward steps adaptively whenever beneficial. We prove strong theoretical results showing that this procedure is effective in learning sparse representations. Experimental results support our theory. 1
3 0.58250219 9 nips-2008-A mixture model for the evolution of gene expression in non-homogeneous datasets
Author: Gerald Quon, Yee W. Teh, Esther Chan, Timothy Hughes, Michael Brudno, Quaid D. Morris
Abstract: We address the challenge of assessing conservation of gene expression in complex, non-homogeneous datasets. Recent studies have demonstrated the success of probabilistic models in studying the evolution of gene expression in simple eukaryotic organisms such as yeast, for which measurements are typically scalar and independent. Models capable of studying expression evolution in much more complex organisms such as vertebrates are particularly important given the medical and scientific interest in species such as human and mouse. We present Brownian Factor Phylogenetic Analysis, a statistical model that makes a number of significant extensions to previous models to enable characterization of changes in expression among highly complex organisms. We demonstrate the efficacy of our method on a microarray dataset profiling diverse tissues from multiple vertebrate species. We anticipate that the model will be invaluable in the study of gene expression patterns in other diverse organisms as well, such as worms and insects. 1
4 0.53811026 145 nips-2008-Multi-stage Convex Relaxation for Learning with Sparse Regularization
Author: Tong Zhang
Abstract: We study learning formulations with non-convex regularizaton that are natural for sparse linear models. There are two approaches to this problem: • Heuristic methods such as gradient descent that only find a local minimum. A drawback of this approach is the lack of theoretical guarantee showing that the local minimum gives a good solution. • Convex relaxation such as L1 -regularization that solves the problem under some conditions. However it often leads to sub-optimal sparsity in reality. This paper tries to remedy the above gap between theory and practice. In particular, we investigate a multi-stage convex relaxation scheme for solving problems with non-convex regularization. Theoretically, we analyze the behavior of a resulting two-stage relaxation scheme for the capped-L1 regularization. Our performance bound shows that the procedure is superior to the standard L1 convex relaxation for learning sparse targets. Experiments confirm the effectiveness of this method on some simulation and real data. 1
5 0.52841175 156 nips-2008-Nonparametric sparse hierarchical models describe V1 fMRI responses to natural images
Author: Vincent Q. Vu, Bin Yu, Thomas Naselaris, Kendrick Kay, Jack Gallant, Pradeep K. Ravikumar
Abstract: We propose a novel hierarchical, nonlinear model that predicts brain activity in area V1 evoked by natural images. In the study reported here brain activity was measured by means of functional magnetic resonance imaging (fMRI), a noninvasive technique that provides an indirect measure of neural activity pooled over a small volume (≈ 2mm cube) of brain tissue. Our model, which we call the V-SPAM model, is based on the reasonable assumption that fMRI measurements reflect the (possibly nonlinearly) pooled, rectified output of a large population of simple and complex cells in V1. It has a hierarchical filtering stage that consists of three layers: model simple cells, model complex cells, and a third layer in which the complex cells are linearly pooled (called “pooled-complex” cells). The pooling stage then obtains the measured fMRI signals as a sparse additive model (SpAM) in which a sparse nonparametric (nonlinear) combination of model complex cell and model pooled-complex cell outputs are summed. Our results show that the V-SPAM model predicts fMRI responses evoked by natural images better than a benchmark model that only provides linear pooling of model complex cells. Furthermore, the spatial receptive fields, frequency tuning and orientation tuning curves of the V-SPAM model estimated for each voxel appears to be consistent with the known properties of V1, and with previous analyses of this data set. A visualization procedure applied to the V-SPAM model shows that most of the nonlinear pooling consists of simple compressive or saturating nonlinearities. 1
6 0.52141654 202 nips-2008-Robust Regression and Lasso
7 0.51455331 91 nips-2008-Generative and Discriminative Learning with Unknown Labeling Bias
8 0.48454282 226 nips-2008-Supervised Dictionary Learning
9 0.46480933 62 nips-2008-Differentiable Sparse Coding
10 0.4567948 21 nips-2008-An Homotopy Algorithm for the Lasso with Online Observations
11 0.43160826 75 nips-2008-Estimating vector fields using sparse basis field expansions
12 0.41830027 25 nips-2008-An interior-point stochastic approximation method and an L1-regularized delta rule
13 0.41805306 146 nips-2008-Multi-task Gaussian Process Learning of Robot Inverse Dynamics
14 0.40663719 235 nips-2008-The Infinite Hierarchical Factor Regression Model
15 0.39882934 11 nips-2008-A spatially varying two-sample recombinant coalescent, with applications to HIV escape response
16 0.39195621 185 nips-2008-Privacy-preserving logistic regression
17 0.38710499 68 nips-2008-Efficient Direct Density Ratio Estimation for Non-stationarity Adaptation and Outlier Detection
18 0.38112503 143 nips-2008-Multi-label Multiple Kernel Learning
19 0.36123058 149 nips-2008-Near-minimax recursive density estimation on the binary hypercube
20 0.357869 153 nips-2008-Nonlinear causal discovery with additive noise models
topicId topicWeight
[(4, 0.016), (6, 0.101), (7, 0.084), (12, 0.037), (15, 0.031), (28, 0.142), (46, 0.258), (57, 0.058), (63, 0.038), (71, 0.019), (77, 0.039), (83, 0.062)]
simIndex simValue paperId paperTitle
same-paper 1 0.8167485 155 nips-2008-Nonparametric regression and classification with joint sparsity constraints
Author: Han Liu, Larry Wasserman, John D. Lafferty
Abstract: We propose new families of models and algorithms for high-dimensional nonparametric learning with joint sparsity constraints. Our approach is based on a regularization method that enforces common sparsity patterns across different function components in a nonparametric additive model. The algorithms employ a coordinate descent approach that is based on a functional soft-thresholding operator. The framework yields several new models, including multi-task sparse additive models, multi-response sparse additive models, and sparse additive multi-category logistic regression. The methods are illustrated with experiments on synthetic data and gene microarray data. 1
2 0.63634634 62 nips-2008-Differentiable Sparse Coding
Author: J. A. Bagnell, David M. Bradley
Abstract: Prior work has shown that features which appear to be biologically plausible as well as empirically useful can be found by sparse coding with a prior such as a laplacian (L1 ) that promotes sparsity. We show how smoother priors can preserve the benefits of these sparse priors while adding stability to the Maximum A-Posteriori (MAP) estimate that makes it more useful for prediction problems. Additionally, we show how to calculate the derivative of the MAP estimate efficiently with implicit differentiation. One prior that can be differentiated this way is KL-regularization. We demonstrate its effectiveness on a wide variety of applications, and find that online optimization of the parameters of the KL-regularized model can significantly improve prediction performance. 1
3 0.62887698 75 nips-2008-Estimating vector fields using sparse basis field expansions
Author: Stefan Haufe, Vadim V. Nikulin, Andreas Ziehe, Klaus-Robert Müller, Guido Nolte
Abstract: We introduce a novel framework for estimating vector fields using sparse basis field expansions (S-FLEX). The notion of basis fields, which are an extension of scalar basis functions, arises naturally in our framework from a rotational invariance requirement. We consider a regression setting as well as inverse problems. All variants discussed lead to second-order cone programming formulations. While our framework is generally applicable to any type of vector field, we focus in this paper on applying it to solving the EEG/MEG inverse problem. It is shown that significantly more precise and neurophysiologically more plausible location and shape estimates of cerebral current sources from EEG/MEG measurements become possible with our method when comparing to the state-of-the-art. 1
4 0.62815136 202 nips-2008-Robust Regression and Lasso
Author: Huan Xu, Constantine Caramanis, Shie Mannor
Abstract: We consider robust least-squares regression with feature-wise disturbance. We show that this formulation leads to tractable convex optimization problems, and we exhibit a particular uncertainty set for which the robust problem is equivalent to 1 regularized regression (Lasso). This provides an interpretation of Lasso from a robust optimization perspective. We generalize this robust formulation to consider more general uncertainty sets, which all lead to tractable convex optimization problems. Therefore, we provide a new methodology for designing regression algorithms, which generalize known formulations. The advantage is that robustness to disturbance is a physical property that can be exploited: in addition to obtaining new formulations, we use it directly to show sparsity properties of Lasso, as well as to prove a general consistency result for robust regression problems, including Lasso, from a unified robustness perspective. 1
5 0.62489104 194 nips-2008-Regularized Learning with Networks of Features
Author: Ted Sandler, John Blitzer, Partha P. Talukdar, Lyle H. Ungar
Abstract: For many supervised learning problems, we possess prior knowledge about which features yield similar information about the target variable. In predicting the topic of a document, we might know that two words are synonyms, and when performing image recognition, we know which pixels are adjacent. Such synonymous or neighboring features are near-duplicates and should be expected to have similar weights in an accurate model. Here we present a framework for regularized learning when one has prior knowledge about which features are expected to have similar and dissimilar weights. The prior knowledge is encoded as a network whose vertices are features and whose edges represent similarities and dissimilarities between them. During learning, each feature’s weight is penalized by the amount it differs from the average weight of its neighbors. For text classification, regularization using networks of word co-occurrences outperforms manifold learning and compares favorably to other recently proposed semi-supervised learning methods. For sentiment analysis, feature networks constructed from declarative human knowledge significantly improve prediction accuracy. 1
6 0.62449169 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
7 0.62110269 245 nips-2008-Unlabeled data: Now it helps, now it doesn't
8 0.62057287 143 nips-2008-Multi-label Multiple Kernel Learning
9 0.61923975 156 nips-2008-Nonparametric sparse hierarchical models describe V1 fMRI responses to natural images
10 0.61785668 14 nips-2008-Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models
11 0.6167528 91 nips-2008-Generative and Discriminative Learning with Unknown Labeling Bias
12 0.61669225 226 nips-2008-Supervised Dictionary Learning
13 0.61617053 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization
14 0.61308593 120 nips-2008-Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
15 0.61266112 196 nips-2008-Relative Margin Machines
16 0.61229885 54 nips-2008-Covariance Estimation for High Dimensional Data Vectors Using the Sparse Matrix Transform
17 0.61148691 182 nips-2008-Posterior Consistency of the Silverman g-prior in Bayesian Model Choice
18 0.61132056 162 nips-2008-On the Design of Loss Functions for Classification: theory, robustness to outliers, and SavageBoost
19 0.61130416 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
20 0.61124712 175 nips-2008-PSDBoost: Matrix-Generation Linear Programming for Positive Semidefinite Matrices Learning