nips nips2010 nips2010-147 knowledge-graph by maker-knowledge-mining

147 nips-2010-Learning Multiple Tasks with a Sparse Matrix-Normal Penalty


Source: pdf

Author: Yi Zhang, Jeff G. Schneider

Abstract: In this paper, we propose a matrix-variate normal penalty with sparse inverse covariances to couple multiple tasks. Learning multiple (parametric) models can be viewed as estimating a matrix of parameters, where rows and columns of the matrix correspond to tasks and features, respectively. Following the matrix-variate normal density, we design a penalty that decomposes the full covariance of matrix elements into the Kronecker product of row covariance and column covariance, which characterizes both task relatedness and feature representation. Several recently proposed methods are variants of the special cases of this formulation. To address the overfitting issue and select meaningful task and feature structures, we include sparse covariance selection into our matrix-normal regularization via ℓ1 penalties on task and feature inverse covariances. We empirically study the proposed method and compare with related models in two real-world problems: detecting landmines in multiple fields and recognizing faces between different subjects. Experimental results show that the proposed framework provides an effective and flexible way to model various different structures of multiple tasks.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 edu Abstract In this paper, we propose a matrix-variate normal penalty with sparse inverse covariances to couple multiple tasks. [sent-5, score-0.604]

2 Learning multiple (parametric) models can be viewed as estimating a matrix of parameters, where rows and columns of the matrix correspond to tasks and features, respectively. [sent-6, score-0.44]

3 Following the matrix-variate normal density, we design a penalty that decomposes the full covariance of matrix elements into the Kronecker product of row covariance and column covariance, which characterizes both task relatedness and feature representation. [sent-7, score-1.204]

4 To address the overfitting issue and select meaningful task and feature structures, we include sparse covariance selection into our matrix-normal regularization via ℓ1 penalties on task and feature inverse covariances. [sent-9, score-1.032]

5 We empirically study the proposed method and compare with related models in two real-world problems: detecting landmines in multiple fields and recognizing faces between different subjects. [sent-10, score-0.219]

6 1 Introduction Learning multiple tasks has been studied for more than a decade [6, 24, 11]. [sent-12, score-0.258]

7 Research in the following two directions has drawn considerable interest: learning a common feature representation shared by tasks [1, 12, 30, 2, 3, 9, 23], and directly inferring the relatedness of tasks [4, 26, 21, 29]. [sent-13, score-0.603]

8 Both have a natural interpretation if we view learning multiple tasks as estimating a matrix of model parameters, where the rows and columns correspond to tasks and features. [sent-14, score-0.543]

9 From this perspective, learning the feature structure corresponds to discovering the structure of the columns in the parameter matrix, and modeling the task relatedness aims to find and utilize the relations among rows. [sent-15, score-0.337]

10 Regularization methods have shown promising results in finding either feature or task structure [1, 2, 12, 21]. [sent-16, score-0.238]

11 The key contribution is a matrix-normal penalty with sparse inverse covariances, which provides a framework for characterizing and coupling the model parameters of related tasks. [sent-18, score-0.297]

12 Following the matrix normal density, we design a penalty that decomposes the full covariance of matrix elements into the Kronecker product of row and column covariances, which correspond to task and feature structures in multi-task learning. [sent-19, score-1.06]
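
A minimal numpy sketch of the quadratic coupling penalty implied by this matrix-normal view, added here for concreteness: the trace form below follows from the matrix normal density, but the squared-loss objective, the helper names, and the single shared λ are illustrative assumptions rather than the paper's exact objective, and the log-determinant terms that arise when Ω and Σ are also estimated are omitted.

```python
import numpy as np

def matrix_normal_penalty(W, Omega, Sigma):
    """Coupling penalty tr(Omega^{-1} W Sigma^{-1} W^T) for an m x p parameter matrix W.

    Omega : (m, m) row (task) covariance; Sigma : (p, p) column (feature) covariance.
    """
    return np.trace(np.linalg.inv(Omega) @ W @ np.linalg.inv(Sigma) @ W.T)

def joint_objective(W, X_list, y_list, Omega, Sigma, lam):
    """Illustrative multi-task objective: per-task squared loss plus the coupling penalty."""
    loss = sum(np.sum((X @ w - y) ** 2) for w, X, y in zip(W, X_list, y_list))
    return loss + lam * matrix_normal_penalty(W, Omega, Sigma)
```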

13 To address overfitting and select task and feature structures, we incorporate sparse covariance selection techniques into our matrix-normal regularization framework via ℓ1 penalties on task and feature inverse covariances. [sent-20, score-1.032]

14 We compare the proposed method to related models on two real-world data sets: detecting landmines in multiple fields and recognizing faces between different subjects. [sent-21, score-0.219]

15 For joint learning of multiple tasks, connections need to be established to couple related tasks. [sent-23, score-0.102]

16 One direction is to find a common feature structure shared by tasks. [sent-24, score-0.176]

17 Along this direction, researchers proposed to infer task structure via principal components [1, 12], independent components [30] and covariance [2, 3] in the parameter space, to select a common subset of features [9, 23], as well as to use shared hidden nodes in neural networks [6, 11]. [sent-25, score-0.534]

18 Specifically, learning a shared feature covariance for model parameters [2] is a special case of our proposed framework. [sent-26, score-0.38]

19 On the other hand, assuming models of all tasks are equally similar is risky. [sent-27, score-0.18]

20 Researchers recently began exploring methods to infer the relatedness of tasks. [sent-28, score-0.108]

21 The present paper uses the matrix normal density and ℓ1-regularized sparse covariance selection to specify a structured penalty, which provides a systematic way to characterize and select both task and feature structures in multiple parametric models. [sent-30, score-0.978]

22 Matrix normal distributions have been studied in probability and statistics for several decades [13, 16, 18] and applied to predictive modeling in the Bayesian literature. [sent-31, score-0.131]

23 For example, the standard matrix normal can serve as a prior for Bayesian variable selection in multivariate regression [9], where MCMC is used for sampling from the resulting posterior. [sent-32, score-0.314]

24 Recently, matrix normal distributions have also been used in nonparametric Bayesian approaches, especially in learning Gaussian Processes (GPs) for multi-output prediction [7] and collaborative filtering [27, 28]. [sent-33, score-0.208]

25 In this case, the covariance function of the GP prior is decomposed as the Kronecker product of a covariance over functions and a covariance over examples. [sent-34, score-0.777]

26 We note that the proposed matrix-normal penalty with sparse inverse covariances in this paper can also be viewed as a new matrix-variate prior, upon which Bayesian inference can be performed. [sent-35, score-0.371]

27 1 Definition The matrix-variate normal distribution is one of the most widely studied matrix-variate distributions [18, 13, 16]. [sent-38, score-0.131]

28 Since we can vectorize W to be an mp × 1 vector, the normal distribution on a matrix W can be considered as a multivariate normal distribution on a vector of mp dimensions. [sent-40, score-0.594]

29 However, such an ordinary multivariate distribution ignores the special structure of W as an m × p matrix, and as a result, the covariance characterizing the elements of W is of size mp × mp. [sent-41, score-0.425]

30 2 Maximum likelihood estimation (MLE) Consider a set of n samples {Wi}, i = 1, …, n, where each Wi is an m × p matrix generated by a matrix-variate normal distribution as eq. [sent-46, score-0.237]

31 If (Ω∗, Σ∗) is an MLE estimate for the row and column covariances, then for any α > 0, (αΩ∗, (1/α)Σ∗) will lead to the same log density and thus is also an MLE estimate. [sent-51, score-0.136]
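
Both facts above are easy to verify numerically: with column-stacking vectorization (the convention assumed in this sketch), the mp × mp covariance of vec(W) is the Kronecker product Σ ⊗ Ω, its quadratic form equals the compact trace penalty, and rescaling (Ω, Σ) to (αΩ, Σ/α) leaves the product unchanged. The covariances below are random positive-definite matrices, not estimates from data.

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 3, 4
W = rng.standard_normal((m, p))
# Hypothetical positive-definite row (task) and column (feature) covariances.
A = rng.standard_normal((m, m)); Omega = A @ A.T + m * np.eye(m)
B = rng.standard_normal((p, p)); Sigma = B @ B.T + p * np.eye(p)

# vec(W) with column stacking; the full covariance of vec(W) is Sigma ⊗ Omega (mp x mp).
w_vec = W.flatten(order="F")
full_cov = np.kron(Sigma, Omega)
quad = w_vec @ np.linalg.inv(full_cov) @ w_vec
trace_form = np.trace(np.linalg.inv(Sigma) @ W.T @ np.linalg.inv(Omega) @ W)
assert np.isclose(quad, trace_form)  # mp x mp quadratic form == compact trace form

# Only the product Sigma ⊗ Omega is identifiable: (alpha*Omega, Sigma/alpha) is equivalent.
alpha = 2.5
assert np.allclose(np.kron(Sigma / alpha, Omega * alpha), full_cov)
```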

32 Classical regularization penalties (for single-task learning) can be interpreted as assuming a multivariate prior distribution on the parameter vector and performing maximum-a-posteriori estimation, e.g. [sent-55, score-0.194]

33 , ℓ2 penalty and ℓ1 penalty correspond to multivariate Gaussian and Laplacian priors, respectively. [sent-57, score-0.321]
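
A quick scipy check of this correspondence, with arbitrary illustrative prior scales: up to additive constants that do not depend on w, the negative log-density of an i.i.d. Gaussian prior is a scaled squared ℓ2 norm, and that of a Laplacian prior is a scaled ℓ1 norm.

```python
import numpy as np
from scipy.stats import norm, laplace

rng = np.random.default_rng(1)
w = rng.standard_normal(5)
sigma, b = 0.7, 0.3  # arbitrary prior scales, for illustration only

neg_log_gauss = -norm.logpdf(w, scale=sigma).sum()
neg_log_laplace = -laplace.logpdf(w, scale=b).sum()

# Up to additive constants (independent of w), these are the l2 and l1 penalties.
assert np.isclose(neg_log_gauss,
                  0.5 * np.sum(w ** 2) / sigma ** 2 + len(w) * np.log(sigma * np.sqrt(2 * np.pi)))
assert np.isclose(neg_log_laplace,
                  np.sum(np.abs(w)) / b + len(w) * np.log(2 * b))
```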

34 In this section, we propose a matrix-normal penalty with sparse inverse covariances for learning multiple related tasks. [sent-59, score-0.42]

35 1 we start with learning multiple tasks with a matrix-normal penalty. [sent-61, score-0.229]

36 2 we study how to incorporate sparse covariance selection into our framework by further imposing ℓ1 penalties on task and feature inverse covariances. [sent-63, score-0.709]

37 1 Learning with a Matrix Normal Penalty Consider a multi-task learning problem with m tasks in a p-dimensional feature space. [sent-68, score-0.267]

38 The training sets are {Dt}, t = 1, …, m, where each set Dt contains nt examples {(xi(t), yi(t)): i = 1, …, nt}. [sent-69, score-0.15]

39 We want to learn m models for the m tasks but appropriately share knowledge among tasks. [sent-70, score-0.212]

40 Model parameters are represented by an m × p matrix W, where parameters for a task correspond to a row. [sent-71, score-0.223]

41 When we fix Ω = Im and Σ = Ip , the penalty term can be decomposed into standard ℓ2-norm penalties on the m rows of W. [sent-82, score-0.253]

42 In this case, the m tasks in (5) can be learned almost independently using single-task ℓ2 regularization (but tasks are still tied by sharing the parameter λ). [sent-83, score-0.434]
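
A small sanity check of this reduction: with Ω = Im and Σ = Ip, the trace penalty from the earlier sketch is just the sum of squared ℓ2 norms of the rows of W, i.e. m independent ridge penalties. The 19 × 10 shape mirrors the landmine setting described later, but the entries are random.

```python
import numpy as np

rng = np.random.default_rng(2)
m, p = 19, 10
W = rng.standard_normal((m, p))

# With Omega = I_m and Sigma = I_p the matrix-normal penalty collapses to the sum of
# squared l2 norms of the rows of W.
penalty = np.trace(np.eye(m) @ W @ np.eye(p) @ W.T)
assert np.isclose(penalty, np.sum(np.linalg.norm(W, axis=1) ** 2))
```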

43 When we fix Ω = Im , tasks are linked only by a shared feature covariance Σ. [sent-84, score-0.56]

44 This corresponds to a multi-task feature learning framework [2, 3] which optimizes eq. [sent-85, score-0.116]

45 When we fix Σ = Ip , tasks are coupled only by a task similarity matrix Ω. [sent-90, score-0.41]

46 W and Ω, with additional constraints on the singular values of Ω that are motivated and derived from task clustering. [sent-95, score-0.14]

47 We usually do not know task and feature structures in advance. [sent-98, score-0.255]

48 When Ω has a sparse inverse, task pairs corresponding to zero entries in Ω−1 will not be explicitly coupled in the penalty of (6). [sent-118, score-0.349]

49 Also, note that a clustering of tasks can be expressed by block-wise sparsity of Ω−1 . [sent-120, score-0.21]

50 Covariance selection aims to select nonzero entries in the Gaussian inverse covariance and discover conditional independence between variables (indicated by zero entries in the inverse covariance) [14, 5, 17, 15]. [sent-121, score-0.592]
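
For illustration, sparse covariance selection on synthetic data with scikit-learn's GraphicalLasso; the chain-structured precision matrix, the sample size, and the ℓ1 strength are made up for this sketch and are unrelated to the paper's data.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(3)
# Synthetic samples whose true precision matrix is sparse (a chain structure).
n, d = 500, 6
prec = np.eye(d) + 0.4 * (np.eye(d, k=1) + np.eye(d, k=-1))
cov = np.linalg.inv(prec)
X = rng.multivariate_normal(np.zeros(d), cov, size=n)

model = GraphicalLasso(alpha=0.05).fit(X)
# Zero entries of the estimated precision matrix indicate (estimated)
# conditional independence between the corresponding pair of variables.
print(np.round(model.precision_, 2))
```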

51 (6) enables us to perform sparse covariance selection to regularize and select task and feature structures. [sent-123, score-0.608]

52 Due to the property of matrix normal distributions that only Σ ⊗ Ω is identifiable, we can safely reduce the complexity of choosing regularization parameters by considering the restriction λΩ = λΣ (eq. 12). The following lemma proves that restricting λΩ and λΣ to be equal in eq. [sent-128, score-0.34]

53 Step 2) needs to solve ℓ1-regularized covariance selection problems as in (11). [sent-143, score-0.298]

54 We use the state-of-the-art technique [17], but more efficient optimization for large covariances is still desirable. [sent-144, score-0.114]
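
A schematic sketch of this kind of alternation, using scikit-learn's graphical_lasso for the ℓ1-penalized covariance selection steps. The scalings of the two "sample covariance" matrices, the diagonal jitter, and the fixed iteration count are assumptions for illustration and do not reproduce the paper's exact update (11); in particular, W is held fixed here rather than re-estimated against the training data between steps.

```python
import numpy as np
from sklearn.covariance import graphical_lasso

def flip_flop_covariances(W, lam, n_iter=10, eps=1e-3):
    """Alternately estimate a sparse task precision (Omega^{-1}) and feature
    precision (Sigma^{-1}) from a fixed m x p parameter matrix W (illustrative only)."""
    m, p = W.shape
    Omega, Sigma = np.eye(m), np.eye(p)
    for _ in range(n_iter):
        # "Sample covariance" over tasks given the current Sigma, then l1-penalized selection.
        S_omega = W @ np.linalg.inv(Sigma) @ W.T / p + eps * np.eye(m)
        Omega, Omega_inv = graphical_lasso(S_omega, alpha=lam)   # returns (covariance, precision)
        # Same for features given the current Omega.
        S_sigma = W.T @ np.linalg.inv(Omega) @ W / m + eps * np.eye(p)
        Sigma, Sigma_inv = graphical_lasso(S_sigma, alpha=lam)
    return Omega, Sigma, Omega_inv, Sigma_inv
```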

55 For example, off-diagonal entries of task covariance Ω characterize the task similarity; diagonal entries indicate different amounts of regularization on tasks, which may be fixed as a constant if we prefer tasks to be equally regularized. [sent-154, score-0.853]

56 As a result, the “flip-flop” algorithm in (11) needs to solve ℓ1 penalized covariance selection with equality constraints (15) or (16), where the dual block coordinate descent [5] and graphical lasso [17] are no longer directly applicable. [sent-171, score-0.381]

57 We will study this direction (efficient constrained sparse covariance selection) in future work. [sent-173, score-0.3]

58 5 Empirical Studies In this section, we present our empirical studies on a landmine detection problem and a face recognition problem, where multiple tasks correspond to detecting landmines at different landmine fields and classifying faces between different subjects, respectively. [sent-174, score-1.012]

59 1 Data Sets and Experimental Settings The landmine detection data set from [26] contains examples collected from different landmine fields. [sent-178, score-0.476]

60 Each example in the data set is represented by a 9-dimensional feature vector extracted from radar imaging, which includes moment-based features, correlation-based features, an energy ratio feature and a spatial variance feature. [sent-179, score-0.174]

61 Following [26], we jointly learn 19 tasks from landmine fields 1–10 and 19–24 in the data set. [sent-181, score-0.431]

62 As a result, the model parameters W are a 19 × 10 matrix, corresponding to 19 tasks and 10 coefficients (including the intercept) for each task. [sent-182, score-0.18]

63 Therefore, we use the average AUC (Area Under the ROC Curve) over 19 tasks as the performance measure. [sent-184, score-0.18]

64 We vary the size of the training set for each task as 30, 40, 80 and 160. [sent-185, score-0.169]

65 Note that we intentionally keep the training sets small because the need for cross-task learning diminishes as the training set becomes large relative to the number of parameters being learned. [sent-186, score-0.106]

66 For each training set size, we randomly select training examples for each task and the rest is used as the testing set. [sent-187, score-0.268]
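
The protocol above amounts to a per-task random split followed by averaging AUC over tasks; a small helper under that reading, with predicted scores assumed to come from whatever model is being evaluated.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def split_per_task(task_data, train_size, seed=0):
    """Randomly pick `train_size` training examples per task; the rest is the test set."""
    rng = np.random.default_rng(seed)
    splits = []
    for X, y in task_data:  # task_data: list of (X, y) pairs, one per task
        idx = rng.permutation(len(y))
        splits.append((idx[:train_size], idx[train_size:]))
    return splits

def average_auc(y_true_per_task, y_score_per_task):
    """Mean AUC over tasks, given test labels and predicted scores for each task."""
    return float(np.mean([roc_auc_score(y, s)
                          for y, s in zip(y_true_per_task, y_score_per_task)]))
```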

67 The face recognition data set is the Yale face database, which contains 165 images of 15 subjects. [sent-194, score-0.183]

68 Choice of features is important for face recognition problems. [sent-202, score-0.107]

69 In each random run, we extract 30 orthogonal Laplacianfaces using the selected training set of all 8 subjects, and conduct experiments of all 28 classification tasks in the extracted feature space. [sent-204, score-0.351]

70 STL: learn ℓ2 regularized logistic regression for each task separately. [sent-208, score-0.148]
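
A sketch of this STL baseline with scikit-learn, reusing the hypothetical split_per_task helper from the previous sketch; the regularization strength C is a placeholder rather than the tuned λ from the paper.

```python
from sklearn.linear_model import LogisticRegression

def fit_stl(task_data, splits, C=1.0):
    """Fit an independent l2-regularized logistic regression per task (the STL baseline)."""
    models = []
    for (X, y), (train, _) in zip(task_data, splits):
        clf = LogisticRegression(penalty="l2", C=C, max_iter=1000)
        clf.fit(X[train], y[train])
        models.append(clf)
    return models
```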

71 MTL-C: clustered multi-task learning [21], which encourages task clustering in regularization. [sent-209, score-0.197]

72 MTL-F: multi-task feature learning [2], which corresponds to fixing the task covariance Ω as Im and optimizing (6) with only the feature covariance Σ. [sent-213, score-0.768]

73 MTL(Ω & Ip): learn W and task covariance Ω using (9), with feature covariance Σ fixed as Ip. [sent-215, score-0.713]

74 MTL(Im & Σ): learn W and feature covariance Σ using (9), with task covariance Ω fixed as Im. [sent-216, score-0.713]

75 MTL(Ω & Σ): learn W, Ω and Σ using (9), inferring both task and feature structures. [sent-217, score-0.273]

76 Table 1: Average AUC scores (%) on landmine detection: means (and standard errors) over 30 random runs. [sent-296, score-0.219]

77 For each column, the best model is marked with ∗ and competitive models (by paired t-tests) are shown in bold. [sent-297, score-0.112]

78 3 Results on Landmine Detection The results on landmine detection are shown in Table 1. [sent-301, score-0.257]

79 For small training sizes, restricted Ω and Σ (Ωii = Σjj = 1) offer better prediction; for large training size (160 per task), free Ω and Σ give the best performance. [sent-311, score-0.136]

80 , even the simplest coupling among tasks (by sharing λ) can be helpful when the size of training data is small. [sent-315, score-0.273]

81 Consider the performance of MTL(Ω & Ip) and MTL(Im & Σ), which learn either a task structure or a feature structure. [sent-316, score-0.27]

82 For small training sizes (e.g., 30 or 40), coupling by task similarity is more effective, and as the training size increases, learning a common feature representation is more helpful. [sent-319, score-0.333]

83 MTL(Ω&Σ)Ωii =1 performs similarly to MTL(Ω&Σ)Ωii =Σjj =1 , indicating no significant variation of feature importance in this problem. [sent-323, score-0.117]

84 4 Results on Face Recognition Empirical results on face recognition are shown in Table 2, with the best model in each column marked with ∗ and competitive models displayed in bold. [sent-325, score-0.229]

85 One possible explanation is that, since tasks are to classify faces between different subjects, there may not be a clustered structure over tasks and thus a cluster norm will be inappropriate. [sent-327, score-0.49]

86 In this case, using a task similarity matrix may be more appropriate than clustering over tasks. [sent-328, score-0.26]

87 Compared to MTL(Ω&Σ), MTL(Ω&Σ)Ωii =1 imposes restrictions on diagonal entries of task covariance Ω: all tasks seem to be similarly difficult and should be equally regularized. [sent-330, score-0.64]

88 Compared to MTL(Ω&Σ)Ωii =Σjj =1 , MTL(Ω&Σ)Ωii =1 allows the diagonal entries of feature covariance Σ to capture varying degrees of importance of Laplacianfaces. [sent-331, score-0.376]

89 Table 2: Average classification errors (%) on face recognition: means (and standard errors) over 30 random runs. [sent-386, score-0.111]

90 For each column, the best model is marked with ∗ and competitive models (by paired t-tests) are shown in bold. [sent-387, score-0.112]

91 6 Conclusion We propose a matrix-variate normal penalty with sparse inverse covariances to couple multiple tasks. [sent-388, score-0.604]

92 The proposed framework provides an effective and flexible way to characterize and select both task and feature structures for learning multiple tasks. [sent-389, score-0.378]

93 Several recently proposed methods can be viewed as variants of the special cases of our formulation, and our empirical results on landmine detection and face recognition show that we consistently outperform previous methods. [sent-390, score-0.364]

94 The first part is the empirical loss on training examples, depending only on W (and training data). [sent-406, score-0.162]

95 The second part is the log-density of matrix normal distributions, which depends on W and Σ ⊗ Ω. [sent-407, score-0.232]

96 (9) are not changed: 1) W′ = W so the first part remains unchanged; 2) Σ′ ⊗ Ω′ = Σ ⊗ Ω so the second part of the matrix normal log-density is the same; 3) by our construction, the third part is not changed. [sent-411, score-0.28]

97 A framework for learning predictive structures from multiple tasks and unlabeled data. [sent-417, score-0.281]

98 Model selection through sparse maximum likelihood estimation for multivariate gaussian or binary data. [sent-443, score-0.167]

99 Joint covariate selection and joint subspace selection for multiple classification problems. [sent-565, score-0.167]

100 Learning multiple related tasks using latent independent component analysis. [sent-611, score-0.229]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mtl', 0.609), ('covariance', 0.239), ('im', 0.231), ('landmine', 0.219), ('jj', 0.191), ('tasks', 0.18), ('laplacianfaces', 0.146), ('normal', 0.131), ('obj', 0.128), ('penalty', 0.122), ('task', 0.116), ('covariances', 0.114), ('mp', 0.104), ('mle', 0.104), ('ii', 0.098), ('landmines', 0.097), ('kronecker', 0.096), ('tr', 0.089), ('feature', 0.087), ('wt', 0.079), ('op', 0.078), ('matrix', 0.077), ('face', 0.076), ('ip', 0.075), ('auc', 0.074), ('regularization', 0.074), ('inverse', 0.074), ('stl', 0.073), ('penalties', 0.073), ('nt', 0.068), ('relatedness', 0.064), ('sparse', 0.061), ('selection', 0.059), ('shared', 0.054), ('couple', 0.053), ('training', 0.053), ('structures', 0.052), ('wi', 0.051), ('clustered', 0.051), ('entries', 0.05), ('multiple', 0.049), ('multivariate', 0.047), ('select', 0.046), ('column', 0.046), ('faces', 0.044), ('infer', 0.044), ('determinant', 0.043), ('marked', 0.042), ('ec', 0.041), ('coupling', 0.04), ('bonilla', 0.039), ('detection', 0.038), ('inferring', 0.038), ('similarity', 0.037), ('paired', 0.036), ('errors', 0.035), ('glasso', 0.035), ('structure', 0.035), ('equality', 0.034), ('subjects', 0.034), ('competitive', 0.034), ('density', 0.033), ('safely', 0.033), ('avg', 0.033), ('learn', 0.032), ('loss', 0.032), ('recognition', 0.031), ('yu', 0.031), ('orthogonal', 0.031), ('tth', 0.031), ('formula', 0.031), ('decomposed', 0.031), ('per', 0.03), ('indicating', 0.03), ('restrictions', 0.03), ('clustering', 0.03), ('correspond', 0.03), ('row', 0.03), ('product', 0.029), ('yi', 0.029), ('samples', 0.029), ('detecting', 0.029), ('optimizes', 0.029), ('argyriou', 0.029), ('decade', 0.029), ('runs', 0.028), ('tresp', 0.028), ('characterize', 0.028), ('log', 0.027), ('rows', 0.027), ('bayesian', 0.027), ('elds', 0.026), ('classi', 0.025), ('lasso', 0.025), ('imposes', 0.025), ('lemma', 0.025), ('decomposes', 0.024), ('constraints', 0.024), ('part', 0.024), ('exible', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 147 nips-2010-Learning Multiple Tasks with a Sparse Matrix-Normal Penalty

Author: Yi Zhang, Jeff G. Schneider

Abstract: In this paper, we propose a matrix-variate normal penalty with sparse inverse covariances to couple multiple tasks. Learning multiple (parametric) models can be viewed as estimating a matrix of parameters, where rows and columns of the matrix correspond to tasks and features, respectively. Following the matrix-variate normal density, we design a penalty that decomposes the full covariance of matrix elements into the Kronecker product of row covariance and column covariance, which characterizes both task relatedness and feature representation. Several recently proposed methods are variants of the special cases of this formulation. To address the overfitting issue and select meaningful task and feature structures, we include sparse covariance selection into our matrix-normal regularization via ℓ1 penalties on task and feature inverse covariances. We empirically study the proposed method and compare with related models in two real-world problems: detecting landmines in multiple fields and recognizing faces between different subjects. Experimental results show that the proposed framework provides an effective and flexible way to model various different structures of multiple tasks.

2 0.41986254 146 nips-2010-Learning Multiple Tasks using Manifold Regularization

Author: Arvind Agarwal, Samuel Gerber, Hal Daume

Abstract: We present a novel method for multitask learning (MTL) based on manifold regularization: assume that all task parameters lie on a manifold. This is the generalization of a common assumption made in the existing literature: task parameters share a common linear subspace. One proposed method uses the projection distance from the manifold to regularize the task parameters. The manifold structure and the task parameters are learned using an alternating optimization framework. When the manifold structure is fixed, our method decomposes across tasks which can be learnt independently. An approximation of the manifold regularization scheme is presented that preserves the convexity of the single task learning problem, and makes the proposed MTL framework efficient and easy to implement. We show the efficacy of our method on several datasets. 1

3 0.23196597 138 nips-2010-Large Margin Multi-Task Metric Learning

Author: Shibin Parameswaran, Kilian Q. Weinberger

Abstract: Multi-task learning (MTL) improves the prediction performance on multiple, different but related, learning problems through shared parameters or representations. One of the most prominent multi-task learning algorithms is an extension to support vector machines (svm) by Evgeniou et al. [15]. Although very elegant, multi-task svm is inherently restricted by the fact that support vector machines require each class to be addressed explicitly with its own weight vector which, in a multi-task setting, requires the different learning tasks to share the same set of classes. This paper proposes an alternative formulation for multi-task learning by extending the recently published large margin nearest neighbor (lmnn) algorithm to the MTL paradigm. Instead of relying on separating hyperplanes, its decision function is based on the nearest neighbor rule which inherently extends to many classes and becomes a natural fit for multi-task learning. We evaluate the resulting multi-task lmnn on real-world insurance data and speech classification problems and show that it consistently outperforms single-task kNN under several metrics and state-of-the-art MTL classifiers. 1

4 0.16409305 217 nips-2010-Probabilistic Multi-Task Feature Selection

Author: Yu Zhang, Dit-Yan Yeung, Qian Xu

Abstract: Recently, some variants of the ℓ1 norm, particularly matrix norms such as the ℓ1,2 and ℓ1,∞ norms, have been widely used in multi-task learning, compressed sensing and other related areas to enforce sparsity via joint regularization. In this paper, we unify the ℓ1,2 and ℓ1,∞ norms by considering a family of ℓ1,q norms for 1 < q ≤ ∞ and study the problem of determining the most appropriate sparsity enforcing norm to use in the context of multi-task feature selection. Using the generalized normal distribution, we provide a probabilistic interpretation of the general multi-task feature selection problem using the ℓ1,q norm. Based on this probabilistic interpretation, we develop a probabilistic model using the noninformative Jeffreys prior. We also extend the model to learn and exploit more general types of pairwise relationships between tasks. For both versions of the model, we devise expectation-maximization (EM) algorithms to learn all model parameters, including q, automatically. Experiments have been conducted on two cancer classification applications using microarray gene expression data. 1

5 0.12430516 177 nips-2010-Multitask Learning without Label Correspondences

Author: Novi Quadrianto, James Petterson, Tibério S. Caetano, Alex J. Smola, S.v.n. Vishwanathan

Abstract: We propose an algorithm to perform multitask learning where each task has potentially distinct label sets and label correspondences are not readily available. This is in contrast with existing methods which either assume that the label sets shared by different tasks are the same or that there exists a label mapping oracle. Our method directly maximizes the mutual information among the labels, and we show that the resulting objective function can be efficiently optimized using existing algorithms. Our proposed approach has a direct application for data integration with different label spaces, such as integrating Yahoo! and DMOZ web directories. 1

6 0.08658056 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models

7 0.083225578 44 nips-2010-Brain covariance selection: better individual functional connectivity models using population prior

8 0.076369494 7 nips-2010-A Family of Penalty Functions for Structured Sparsity

9 0.076042011 246 nips-2010-Sparse Coding for Learning Interpretable Spatio-Temporal Primitives

10 0.075716011 48 nips-2010-Collaborative Filtering in a Non-Uniform World: Learning with the Weighted Trace Norm

11 0.073725037 287 nips-2010-Worst-Case Linear Discriminant Analysis

12 0.072603188 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models

13 0.069970898 162 nips-2010-Link Discovery using Graph Feature Tracking

14 0.069276661 103 nips-2010-Generating more realistic images using gated MRF's

15 0.069180928 242 nips-2010-Slice sampling covariance hyperparameters of latent Gaussian models

16 0.068990469 5 nips-2010-A Dirty Model for Multi-task Learning

17 0.068638898 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models

18 0.06698031 73 nips-2010-Efficient and Robust Feature Selection via Joint ℓ2,1-Norms Minimization

19 0.065195926 23 nips-2010-Active Instance Sampling via Matrix Partition

20 0.064086974 137 nips-2010-Large Margin Learning of Upstream Scene Understanding Models


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.231), (1, 0.066), (2, 0.047), (3, -0.012), (4, 0.041), (5, -0.081), (6, 0.048), (7, -0.003), (8, -0.193), (9, -0.068), (10, 0.029), (11, 0.094), (12, 0.202), (13, -0.004), (14, 0.121), (15, 0.015), (16, 0.022), (17, -0.093), (18, -0.018), (19, 0.015), (20, -0.335), (21, -0.193), (22, 0.023), (23, 0.121), (24, -0.077), (25, -0.085), (26, -0.208), (27, -0.063), (28, 0.116), (29, 0.147), (30, -0.025), (31, -0.003), (32, 0.073), (33, -0.035), (34, 0.01), (35, -0.037), (36, -0.063), (37, -0.016), (38, -0.024), (39, -0.067), (40, -0.039), (41, -0.054), (42, 0.069), (43, -0.081), (44, -0.034), (45, 0.029), (46, -0.101), (47, 0.042), (48, 0.047), (49, 0.085)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.92616993 147 nips-2010-Learning Multiple Tasks with a Sparse Matrix-Normal Penalty

Author: Yi Zhang, Jeff G. Schneider

Abstract: In this paper, we propose a matrix-variate normal penalty with sparse inverse covariances to couple multiple tasks. Learning multiple (parametric) models can be viewed as estimating a matrix of parameters, where rows and columns of the matrix correspond to tasks and features, respectively. Following the matrix-variate normal density, we design a penalty that decomposes the full covariance of matrix elements into the Kronecker product of row covariance and column covariance, which characterizes both task relatedness and feature representation. Several recently proposed methods are variants of the special cases of this formulation. To address the overfitting issue and select meaningful task and feature structures, we include sparse covariance selection into our matrix-normal regularization via ℓ1 penalties on task and feature inverse covariances. We empirically study the proposed method and compare with related models in two real-world problems: detecting landmines in multiple fields and recognizing faces between different subjects. Experimental results show that the proposed framework provides an effective and flexible way to model various different structures of multiple tasks.

2 0.90280545 146 nips-2010-Learning Multiple Tasks using Manifold Regularization

Author: Arvind Agarwal, Samuel Gerber, Hal Daume

Abstract: We present a novel method for multitask learning (MTL) based on manifold regularization: assume that all task parameters lie on a manifold. This is the generalization of a common assumption made in the existing literature: task parameters share a common linear subspace. One proposed method uses the projection distance from the manifold to regularize the task parameters. The manifold structure and the task parameters are learned using an alternating optimization framework. When the manifold structure is fixed, our method decomposes across tasks which can be learnt independently. An approximation of the manifold regularization scheme is presented that preserves the convexity of the single task learning problem, and makes the proposed MTL framework efficient and easy to implement. We show the efficacy of our method on several datasets. 1

3 0.63715589 217 nips-2010-Probabilistic Multi-Task Feature Selection

Author: Yu Zhang, Dit-Yan Yeung, Qian Xu

Abstract: Recently, some variants of the ℓ1 norm, particularly matrix norms such as the ℓ1,2 and ℓ1,∞ norms, have been widely used in multi-task learning, compressed sensing and other related areas to enforce sparsity via joint regularization. In this paper, we unify the ℓ1,2 and ℓ1,∞ norms by considering a family of ℓ1,q norms for 1 < q ≤ ∞ and study the problem of determining the most appropriate sparsity enforcing norm to use in the context of multi-task feature selection. Using the generalized normal distribution, we provide a probabilistic interpretation of the general multi-task feature selection problem using the ℓ1,q norm. Based on this probabilistic interpretation, we develop a probabilistic model using the noninformative Jeffreys prior. We also extend the model to learn and exploit more general types of pairwise relationships between tasks. For both versions of the model, we devise expectation-maximization (EM) algorithms to learn all model parameters, including q, automatically. Experiments have been conducted on two cancer classification applications using microarray gene expression data. 1

4 0.60545766 138 nips-2010-Large Margin Multi-Task Metric Learning

Author: Shibin Parameswaran, Kilian Q. Weinberger

Abstract: Multi-task learning (MTL) improves the prediction performance on multiple, different but related, learning problems through shared parameters or representations. One of the most prominent multi-task learning algorithms is an extension to support vector machines (svm) by Evgeniou et al. [15]. Although very elegant, multi-task svm is inherently restricted by the fact that support vector machines require each class to be addressed explicitly with its own weight vector which, in a multi-task setting, requires the different learning tasks to share the same set of classes. This paper proposes an alternative formulation for multi-task learning by extending the recently published large margin nearest neighbor (lmnn) algorithm to the MTL paradigm. Instead of relying on separating hyperplanes, its decision function is based on the nearest neighbor rule which inherently extends to many classes and becomes a natural fit for multi-task learning. We evaluate the resulting multi-task lmnn on real-world insurance data and speech classification problems and show that it consistently outperforms single-task kNN under several metrics and state-of-the-art MTL classifiers. 1

5 0.59402138 177 nips-2010-Multitask Learning without Label Correspondences

Author: Novi Quadrianto, James Petterson, Tibério S. Caetano, Alex J. Smola, S.v.n. Vishwanathan

Abstract: We propose an algorithm to perform multitask learning where each task has potentially distinct label sets and label correspondences are not readily available. This is in contrast with existing methods which either assume that the label sets shared by different tasks are the same or that there exists a label mapping oracle. Our method directly maximizes the mutual information among the labels, and we show that the resulting objective function can be efficiently optimized using existing algorithms. Our proposed approach has a direct application for data integration with different label spaces, such as integrating Yahoo! and DMOZ web directories. 1

6 0.49402642 114 nips-2010-Humans Learn Using Manifolds, Reluctantly

7 0.41789216 73 nips-2010-Efficient and Robust Feature Selection via Joint ℓ2,1-Norms Minimization

8 0.41029653 5 nips-2010-A Dirty Model for Multi-task Learning

9 0.37554032 57 nips-2010-Decoding Ipsilateral Finger Movements from ECoG Signals in Humans

10 0.37548539 26 nips-2010-Adaptive Multi-Task Lasso: with Application to eQTL Detection

11 0.37248567 248 nips-2010-Sparse Inverse Covariance Selection via Alternating Linearization Methods

12 0.36107326 41 nips-2010-Block Variable Selection in Multivariate Regression and High-dimensional Causal Inference

13 0.35852313 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models

14 0.35464677 287 nips-2010-Worst-Case Linear Discriminant Analysis

15 0.34925288 99 nips-2010-Gated Softmax Classification

16 0.34726191 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices

17 0.32849583 62 nips-2010-Discriminative Clustering by Regularized Information Maximization

18 0.32247373 108 nips-2010-Graph-Valued Regression

19 0.31560296 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models

20 0.31343842 158 nips-2010-Learning via Gaussian Herding


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.099), (17, 0.012), (27, 0.044), (30, 0.038), (35, 0.026), (45, 0.176), (50, 0.434), (52, 0.027), (60, 0.023), (77, 0.019), (90, 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.92737997 120 nips-2010-Improvements to the Sequence Memoizer

Author: Jan Gasthaus, Yee W. Teh

Abstract: The sequence memoizer is a model for sequence data with state-of-the-art performance on language modeling and compression. We propose a number of improvements to the model and inference algorithm, including an enlarged range of hyperparameters, a memory-efficient representation, and inference algorithms operating on the new representation. Our derivations are based on precise definitions of the various processes that will also allow us to provide an elementary proof of the “mysterious” coagulation and fragmentation properties used in the original paper on the sequence memoizer by Wood et al. (2009). We present some experimental results supporting our improvements. 1

2 0.92624557 126 nips-2010-Inference with Multivariate Heavy-Tails in Linear Models

Author: Danny Bickson, Carlos Guestrin

Abstract: Heavy-tailed distributions naturally occur in many real life problems. Unfortunately, it is typically not possible to compute inference in closed-form in graphical models which involve such heavy-tailed distributions. In this work, we propose a novel simple linear graphical model for independent latent random variables, called linear characteristic model (LCM), defined in the characteristic function domain. Using stable distributions, a heavy-tailed family of distributions which is a generalization of Cauchy, Lévy and Gaussian distributions, we show for the first time how to compute both exact and approximate inference in such a linear multivariate graphical model. LCMs are not limited to stable distributions, in fact LCMs are always defined for any random variables (discrete, continuous or a mixture of both). We provide a realistic problem from the field of computer networks to demonstrate the applicability of our construction. Another potential application is iterative decoding of linear channels with non-Gaussian noise. 1

3 0.91062987 101 nips-2010-Gaussian sampling by local perturbations

Author: George Papandreou, Alan L. Yuille

Abstract: We present a technique for exact simulation of Gaussian Markov random fields (GMRFs), which can be interpreted as locally injecting noise to each Gaussian factor independently, followed by computing the mean/mode of the perturbed GMRF. Coupled with standard iterative techniques for the solution of symmetric positive definite systems, this yields a very efficient sampling algorithm with essentially linear complexity in terms of speed and memory requirements, well suited to extremely large scale probabilistic models. Apart from synthesizing data under a Gaussian model, the proposed technique directly leads to an efficient unbiased estimator of marginal variances. Beyond Gaussian models, the proposed algorithm is also very useful for handling highly non-Gaussian continuously-valued MRFs such as those arising in statistical image modeling or in the first layer of deep belief networks describing real-valued data, where the non-quadratic potentials coupling different sites can be represented as finite or infinite mixtures of Gaussians with the help of local or distributed latent mixture assignment variables. The Bayesian treatment of such models most naturally involves a block Gibbs sampler which alternately draws samples of the conditionally independent latent mixture assignments and the conditionally multivariate Gaussian continuous vector and we show that it can directly benefit from the proposed methods. 1

4 0.89169759 42 nips-2010-Boosting Classifier Cascades

Author: Nuno Vasconcelos, Mohammad J. Saberian

Abstract: The problem of optimal and automatic design of a detector cascade is considered. A novel mathematical model is introduced for a cascaded detector. This model is analytically tractable, leads to recursive computation, and accounts for both classification and complexity. A boosting algorithm, FCBoost, is proposed for fully automated cascade design. It exploits the new cascade model and minimizes a Lagrangian cost that accounts for both classification risk and complexity. It searches the space of cascade configurations to automatically determine the optimal number of stages and their predictors, and is compatible with bootstrapping of negative examples and cost-sensitive learning. Experiments show that the resulting cascades have state-of-the-art performance in various computer vision problems. 1

same-paper 5 0.87361223 147 nips-2010-Learning Multiple Tasks with a Sparse Matrix-Normal Penalty

Author: Yi Zhang, Jeff G. Schneider

Abstract: In this paper, we propose a matrix-variate normal penalty with sparse inverse covariances to couple multiple tasks. Learning multiple (parametric) models can be viewed as estimating a matrix of parameters, where rows and columns of the matrix correspond to tasks and features, respectively. Following the matrix-variate normal density, we design a penalty that decomposes the full covariance of matrix elements into the Kronecker product of row covariance and column covariance, which characterizes both task relatedness and feature representation. Several recently proposed methods are variants of the special cases of this formulation. To address the overfitting issue and select meaningful task and feature structures, we include sparse covariance selection into our matrix-normal regularization via ℓ1 penalties on task and feature inverse covariances. We empirically study the proposed method and compare with related models in two real-world problems: detecting landmines in multiple fields and recognizing faces between different subjects. Experimental results show that the proposed framework provides an effective and flexible way to model various different structures of multiple tasks.

6 0.80491865 33 nips-2010-Approximate inference in continuous time Gaussian-Jump processes

7 0.66370684 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes

8 0.66346759 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata

9 0.65819782 54 nips-2010-Copula Processes

10 0.64608341 113 nips-2010-Heavy-Tailed Process Priors for Selective Shrinkage

11 0.64396781 242 nips-2010-Slice sampling covariance hyperparameters of latent Gaussian models

12 0.6293416 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing

13 0.62574077 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning

14 0.62002712 96 nips-2010-Fractionally Predictive Spiking Neurons

15 0.62001097 217 nips-2010-Probabilistic Multi-Task Feature Selection

16 0.61819208 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior

17 0.6062535 122 nips-2010-Improving the Asymptotic Performance of Markov Chain Monte-Carlo by Inserting Vortices

18 0.60149109 158 nips-2010-Learning via Gaussian Herding

19 0.60035515 117 nips-2010-Identifying graph-structured activation patterns in networks

20 0.59782612 257 nips-2010-Structured Determinantal Point Processes