nips nips2010 nips2010-146 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Arvind Agarwal, Samuel Gerber, Hal Daume
Abstract: We present a novel method for multitask learning (MTL) based on manifold regularization: assume that all task parameters lie on a manifold. This is the generalization of a common assumption made in the existing literature: task parameters share a common linear subspace. One proposed method uses the projection distance from the manifold to regularize the task parameters. The manifold structure and the task parameters are learned using an alternating optimization framework. When the manifold structure is fixed, our method decomposes across tasks which can be learnt independently. An approximation of the manifold regularization scheme is presented that preserves the convexity of the single task learning problem, and makes the proposed MTL framework efficient and easy to implement. We show the efficacy of our method on several datasets. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present a novel method for multitask learning (MTL) based on manifold regularization: assume that all task parameters lie on a manifold. [sent-7, score-0.849]
2 This is the generalization of a common assumption made in the existing literature: task parameters share a common linear subspace. [sent-8, score-0.242]
3 One proposed method uses the projection distance from the manifold to regularize the task parameters. [sent-9, score-0.846]
4 The manifold structure and the task parameters are learned using an alternating optimization framework. [sent-10, score-0.734]
5 When the manifold structure is fixed, our method decomposes across tasks which can be learnt independently. [sent-11, score-0.747]
6 An approximation of the manifold regularization scheme is presented that preserves the convexity of the single task learning problem, and makes the proposed MTL framework efficient and easy to implement. [sent-12, score-0.713]
7 1 Introduction Recently, it has been shown that learning multiple tasks together helps learning [8, 19, 9] when the tasks are related, and one is able to use an appropriate notion of task relatedness. [sent-14, score-0.471]
8 This notion of relatedness is usually incorporated in the form of a regularizer [4, 16, 13] or a prior [15, 22, 21]. [sent-17, score-0.256]
9 In this work we present a novel approach for multitask learning (MTL) that considers a notion of relatedness based on ideas from manifold regularization1 . [sent-18, score-0.78]
10 Our approach is based on the assumption that the parameters of related tasks can not vary arbitrarily but rather lie on a low dimensional manifold. [sent-19, score-0.271]
11 A similar idea underlies the standard manifold learning problems: the data does not change arbitrarily, but instead follows a manifold structure. [sent-20, score-1.012]
12 Our assumption is also a generalization of the assumption made in [1] which assumes that all tasks share a linear subspace, and a learning framework consists of learning this linear subspace and task parameters simultaneously. [sent-21, score-0.533]
13 In our proposed approach we learn the task parameters and the task-manifold alternately, learning one while keeping the other fixed, similar to [4]. [sent-23, score-0.235]
14 First, we learn all task parameters using a single task learning (STL) method, and then use these task parameters to learn the initial task manifold. [sent-24, score-0.716]
15 The task-manifold is then used to relearn the task parameters using manifold regularization. [sent-25, score-0.737]
16 Learning of manifold and task parameters is repeated until convergence. [sent-26, score-0.684]
17 We emphasize that when we learn the task parameters (keeping the manifold structure fixed), the MTL framework decomposes across the tasks. (∗This work was done at the School of Computing, University of Utah, Salt Lake City, Utah. ¹It is not to be confused with the manifold regularization presented in [7].) [sent-27, score-1.373]
18 Note that unlike most manifold learning algorithms, our framework learns an explicit representation of the manifold and naturally extends to new tasks. [sent-32, score-1.039]
19 Whenever a new task arrives, one can simply use the existing manifold to learn the parameters of the new task. [sent-33, score-0.722]
20 Given a black box for manifold learning, STL algorithms can be adapted to the proposed MTL setting. [sent-36, score-0.506]
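As a rough sketch of this plug-in property (not the authors' implementation), the following numpy snippet learns a new task's parameters while the manifold stays fixed, treating the learned projection g(h(·)) as a black box. The function names, the toy unit-ball "manifold", the step size, and the iteration count are all illustrative assumptions; the penalty uses the approximation described later in the paper, in which the projected point is held fixed during each gradient step.

```python
import numpy as np

def learn_new_task(X, y, project_to_manifold, lam=0.1, gamma=0.1,
                   lr=0.005, n_iters=500):
    """Fit one new task's parameter vector theta while the task-manifold
    (represented by the black-box projection g(h(.))) stays fixed.

    The penalty ||theta - g(h(theta))||^2 is handled with the paper's
    approximation: the projected point is treated as a constant within
    each gradient step."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        theta_m = project_to_manifold(theta)          # fixed within the step
        grad = (2 * X.T @ (X @ theta - y)             # squared loss
                + 2 * lam * theta                     # ridge / RKHS term
                + 2 * gamma * (theta - theta_m))      # manifold term (approx.)
        theta -= lr * grad
    return theta

# toy usage with a trivial stand-in "manifold": projection onto the unit ball
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 5))
    y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=20)
    proj = lambda t: t / max(1.0, np.linalg.norm(t))
    print(learn_new_task(X, y, proj))
```

Any STL solver could replace the plain gradient loop; the only MTL-specific piece is the extra quadratic pull toward the projected point.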
21 2 Related Work In MTL, task relatedness is a fundamental question and models differ in the ways they answer this question. [sent-40, score-0.305]
22 Like our method, most of the existing methods first assume a structure that defines the task relatedness, and then incorporate this structure in the MTL framework in the form of a regularizer [4, 16, 13]. [sent-41, score-0.291]
23 One plausible approach is to assume that all task parameters lie in a subspace [1]. [sent-42, score-0.282]
24 The tasks are learned by forcing the parameters to lie in a common linear subspace, thereby exploiting the assumed relatedness in the model. [sent-43, score-0.505]
25 In this work, the relatedness structure is forced by applying a function F on a covariance matrix D which yields a regularization of the form tr(F (D)W W T ) on the parameters W . [sent-47, score-0.261]
26 Here, the function F can model different kinds of relatedness structures among tasks, including the linear subspace structure [1]. [sent-48, score-0.429]
27 Given a function F, this framework learns both the relatedness matrix D and the task parameters W. [sent-49, score-0.368]
28 Our framework generalizes the linear framework by introducing the nonlinearity through the manifold structure learned automatically from the data, and thus avoids the need of any external function. [sent-52, score-0.668]
29 This method is, in spirit, very similar to ours, except that we learn an explicit manifold; our method is therefore naturally extensible to new tasks. [sent-56, score-0.649]
30 Another work that models the task relatedness in the form of proximity of the parameters is [16], which assumes that the task parameter wt for each task is close to some common parameter w0 with some variance vt. [sent-57, score-0.867]
31 The task parameters are learned under this cluster assumption by minimizing a combination of different penalty functions. [sent-60, score-0.226]
32 There is another line of work [10], where task relatedness is modeled in terms of a matrix B which needs to be provided externally. [sent-61, score-0.305]
33 There is also a large body of work on multitask learning that finds the shared structure in the tasks using Bayesian inference [23, 24, 9], which, in spirit, is similar to the above approaches, but done in a Bayesian way. [sent-62, score-0.271]
34 As mentioned earlier, our framework assumes that the task parameters lie on a manifold, which is a step beyond the assumption made in [1], i.e. [sent-66, score-0.784]
35 that the task parameters lie on a linear subspace or share a common set of features. [sent-68, score-0.324]
36 Similar to the linear subspace algorithm [1], which learns the task parameters (and the shared subspace) by regularizing the STL framework with the orthogonal projections of the task parameters onto the subspace, we propose to learn the task parameters (and the non-linear subspace, i.e. [sent-69, score-0.842]
37 the task-manifold) by regularizing the STL with the projection distance of the task parameters from this task-manifold (see Figure 1). [sent-71, score-0.36]
38 Now for each task t, let θt be the parameter vector, referred to as the task parameter. [sent-88, score-0.284]
39 Restricting ft to functions in the RKHS and denoting it by $f(x;\theta_t) = \langle\theta_t, \phi(x)\rangle$, single task learning solves the following optimization problem: $\theta_t^{*} = \arg\min_{\theta_t} \sum_{x \in X_t} L(f(x;\theta_t), y) + \lambda\,\|f_t\|_{\mathcal{H}_k}^{2}$ (1), where λ is a regularization parameter. [sent-95, score-0.24]
40 Figure 1: Projection of the estimated parameters w of the task in hand on the manifold learned from all tasks parameters. [sent-98, score-0.865]
41 In MTL, tasks are related; this notion of relatedness is incorporated through a regularizer. [sent-101, score-0.337]
42 For example, for the assumption that the task parameters are close to a common task θ0, the regularizer would just be $\|\theta_t - \theta_0\|^{2}$. [sent-113, score-0.416]
43 The joint regularizer over (θ1, . . . , θT) is decomposed into T different regularizers u(θt, M) such that u(θt, M) regularizes the parameter of task t while considering the effect of the other tasks through the manifold M. [sent-117, score-0.803]
44 If the manifold structure M is fixed, then the above optimization problem decomposes into T independent optimization problems. [sent-126, score-0.557]
45 In our approach, the regularizer depends on the structure of the manifold constructed from the task parameters {θ1, . . . , θT}. [sent-127, score-0.782]
46 Now one can use this projection distance as a regularizer u(θt , M) in the cost function since all task parameters are assumed to lie on the task manifold M. [sent-132, score-1.101]
47 $C = \sum_{t=1}^{T}\Big(\sum_{x \in X_t} L(f(x;\theta_t), y) + \lambda\,\|f_t\|_{\mathcal{H}}^{2} + \gamma\, u(\theta_t, \mathcal{M})\Big)$ (4). Since the manifold structure is not known, the cost function (4) needs to be optimized simultaneously for the task parameters (θ1, . . . , θT) and the manifold M. [sent-134, score-0.708]
48 Next, we fix the manifold M, and learn the task parameters by minimizing (4). [sent-140, score-0.722]
49 This requires an expression for computing the projection distance of the task parameters from the manifold. [sent-143, score-0.341]
50 1 Manifold Regularization Our approach relies heavily on the capability to learn a manifold, and to be able to compute the gradient of the projection distances onto the manifold. [sent-146, score-0.219]
51 Much recent work in manifold learning has focused on uncovering low-dimensional representations [18, 6, 17, 20] of the data. [sent-147, score-0.526]
52 Recent work [11] addresses this issue and proposes a manifold learning algorithm based on the idea of principal surfaces [12]. [sent-151, score-0.548]
53 It explicitly represents the manifold in the ambient space as a parametric surface which can be used to compute the projection distance and its gradient. [sent-152, score-0.669]
54 The method is based on minimizing the expected reconstruction error $E[\|g(h(\theta)) - \theta\|]$ of the task parameter θ onto the manifold M. [sent-154, score-0.709]
55 Here h is the mapping from the manifold to the lower dimensional Euclidean space and g is the mapping from the lower dimensional Euclidean space to the manifold. [sent-155, score-0.546]
56 Thus, the composition g ◦ h maps a point belonging to the manifold to the manifold, using the mapping to the Euclidean space as an intermediate step. [sent-156, score-0.506]
57 These mappings g and h can be formulated in terms of kernel regressions over the data points: $h(\theta) = \sum_{j=1}^{T} \frac{K_\theta(\theta - \theta_j)}{\sum_{l=1}^{T} K_\theta(\theta - \theta_l)}\, z_j$ (5), with $K_\theta$ a kernel function and $z_j$ a set of parameters to be estimated in the manifold learning process. [sent-158, score-0.673]
58 In [11] it is shown that in the limit, as the number of samples to learn from increases, h indeed yields an orthogonal projection onto g. [sent-166, score-0.233]
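The sketch below illustrates the kernel-regression form of equation (5): both h and g are Nadaraya-Watson regressions, differing only in which points act as centers and targets. The Gaussian kernel, the bandwidths, the one-dimensional coordinates z_j, and the toy task parameters are illustrative assumptions; in the paper the z_j are estimated during manifold learning rather than fixed by hand.

```python
import numpy as np

def nw_regression(query, centers, targets, bandwidth=1.0):
    """Nadaraya-Watson kernel regression with a Gaussian kernel:
    a weighted average of `targets`, weighted by K(query - centers).
    Both h (ambient -> low-dim coordinates z_j) and g (low-dim -> ambient)
    have this form, with the roles of centers and targets swapped."""
    d2 = np.sum((centers - query) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    w /= w.sum()
    return w @ targets

# hypothetical setup: T task parameters theta_j in R^3, coordinates z_j in R^1
rng = np.random.default_rng(0)
T = 30
z = rng.uniform(0, 2 * np.pi, size=(T, 1))               # low-dim coordinates
thetas = np.hstack([np.cos(z), np.sin(z), 0.1 * z])      # points near a curve

def h(theta):      # ambient space -> low-dimensional Euclidean space
    return nw_regression(theta, thetas, z, bandwidth=0.5)

def g(r):          # low-dimensional Euclidean space -> manifold in ambient space
    return nw_regression(r, z, thetas, bandwidth=0.5)

theta = thetas[0] + 0.05
theta_m = g(h(theta))   # g(h(theta)): (approximate) projection onto the manifold
print(theta, theta_m)
```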
59 Estimating h, i.e. the manifold learning, can be done through gradient descent on the sample mean of the projection distance, $\frac{1}{T}\sum_{i=1}^{T}\|g(h(\theta_i)) - \theta_i\|$, using a global manifold learning approach for initialization. [sent-169, score-1.206]
60 Once h is estimated, the projection distance is immediate: $P_M(\theta) = \|\theta - g(h(\theta))\|^{2} = \|\theta - \theta_M\|^{2}$ (7). For the optimization of (4) we need the gradient of the projection distance, which is $\frac{dP_M(\theta)}{d\theta} = 2\,(g(h(\theta)) - \theta)\,\frac{dg(r)}{dr}\Big|_{r=h(\theta)}\,\frac{dh(\theta)}{d\theta}$ (8). [sent-170, score-0.357]
61 Computing the projection distance for a single task's parameters is O(n) due to the definition of h and g as kernel regressions, which also show up in the projection distance gradient through $\frac{dg(r)}{dr}\big|_{r=h(\theta)}$ and $\frac{dh(\theta)}{d\theta}$. [sent-171, score-0.652]
62 This is fairly expensive, therefore we propose an approximation to the exact projection gradient, justified by the convergence of h to an orthogonal projection. [sent-172, score-0.332]
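A small sketch of the trade-off just described. The "exact" gradient of (8) is emulated here with finite differences (every evaluation of g and h is a kernel regression over all tasks, which is what makes it expensive), while the cheap alternative treats the projected point g(h(θ)) as a constant within a gradient step, leaving only 2(θ − g(h(θ))). The function names and the finite-difference stand-in for the chain rule are ours, not the paper's.

```python
import numpy as np

def proj_distance(theta, g, h):
    """P_M(theta) = ||theta - g(h(theta))||^2, as in equation (7)."""
    diff = theta - g(h(theta))
    return diff @ diff

def proj_distance_grad_chain(theta, g, h, eps=1e-5):
    """Numerical (finite-difference) stand-in for the chain-rule gradient
    in (8); each call re-evaluates the kernel regressions g and h, which
    is what makes the exact gradient costly."""
    grad = np.zeros_like(theta, dtype=float)
    for i in range(len(theta)):
        e = np.zeros_like(theta, dtype=float)
        e[i] = eps
        grad[i] = (proj_distance(theta + e, g, h)
                   - proj_distance(theta - e, g, h)) / (2 * eps)
    return grad

def proj_distance_grad_approx(theta, g, h):
    """Cheap approximation: treat g(h(theta)) as fixed during the gradient
    step, so the gradient reduces to 2 * (theta - g(h(theta)))."""
    return 2.0 * (theta - g(h(theta)))
```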
63 The proposed manifold regularization approximation allows one to use any STL method without much change in the optimization of the STL problem. [sent-196, score-0.579]
64 The proposed method for MTL pipelines manifold learning with the STL. [sent-197, score-0.541]
65 That is, one does not have to worry about the projection of the point onto the manifold moving during the gradient step. [sent-201, score-0.661]
66 This approximation allows one to treat the manifold regularizer similarly to the RKHS regularizer $\|\theta_t\|^{2}$ and to solve the generalized learning problem (4) with non-linear kernels. [sent-203, score-0.654]
67 In the learning framework (4), the loss function is $L(x, y, w_t) = (y - \langle w_t, x\rangle)^{2}$ with the linear kernel $k(x, y) = \langle x, y\rangle$. [sent-208, score-0.229]
68 The cost function for linear regression can now be written as: $C_P = \sum_{t=1}^{T}\Big(\sum_{x \in X_t}(y - \langle w_t, x\rangle)^{2} + \lambda\,\|w_t\|^{2} + \gamma\, P_M(w_t)\Big)$ (11). This cost function may be convex or non-convex depending upon the manifold terms $P_M(w_t)$. [sent-210, score-0.616]
69 wt is the orthogonal projection of w on the manifold. [sent-214, score-0.233]
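A minimal sketch of evaluating the cost (11), assuming a black-box `project` that plays the role of the learned g(h(·)); the function name and default hyperparameter values are illustrative, not the paper's.

```python
import numpy as np

def mtl_cost(W, Xs, ys, project, lam=0.1, gamma=0.1):
    """Evaluate the linear-regression MTL cost of equation (11):
    sum over tasks of squared loss + lam*||w_t||^2
    + gamma*||w_t - project(w_t)||^2, where `project` stands in
    for the learned g(h(.)) mapping."""
    total = 0.0
    for w, X, y in zip(W, Xs, ys):
        resid = y - X @ w
        pm = np.sum((w - project(w)) ** 2)      # manifold penalty P_M(w_t)
        total += resid @ resid + lam * (w @ w) + gamma * pm
    return total
```

With the approximation described above, each task's term can be minimized independently by any STL solver augmented with a quadratic pull toward the (temporarily fixed) projected point.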
70 3 Algorithm Description The algorithm for MTL with manifold regularization is straightforward and shown in Algorithm 1. [sent-216, score-0.544]
71 Keeping the manifold structure fixed, we relearn all task parameters using manifold regularization. [sent-221, score-1.267]
72 Equation (9) is used to compute the gradient of the projection distance used in relearning the parameters. [sent-222, score-0.225]
73 Once the parameters for all tasks are learned, the manifold is re-estimated based on the updated task parameters. [sent-225, score-0.839]
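A condensed sketch of this alternation under strong simplifying assumptions: ridge regression as the STL method, plain gradient descent with the approximate projection gradient for relearning, and, in place of the principal-surfaces manifold learner of [11], a PCA subspace used only so the loop runs end to end. Function names, step sizes, and iteration counts are ours, not the paper's.

```python
import numpy as np

def fit_stl(X, y, lam):
    """Single task learning: ridge regression in closed form."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def fit_manifold(Theta, dim=2):
    """Stand-in manifold learner: a PCA subspace of the task parameters.
    (The paper uses the non-linear principal-surfaces method of [11];
    PCA is only a placeholder so the loop below is self-contained.)"""
    mean = Theta.mean(axis=0)
    _, _, Vt = np.linalg.svd(Theta - mean, full_matrices=False)
    B = Vt[:dim].T
    return lambda theta: mean + B @ (B.T @ (theta - mean))   # plays g(h(theta))

def mtl_manifold_regularization(Xs, ys, lam=0.1, gamma=1.0, lr=0.005,
                                inner_iters=200, outer_iters=5):
    """Alternate between re-estimating the manifold from the current task
    parameters and relearning each task with the approximate
    projection-distance regularizer."""
    Theta = np.array([fit_stl(X, y, lam) for X, y in zip(Xs, ys)])  # STL init
    for _ in range(outer_iters):
        project = fit_manifold(Theta)              # manifold from task params
        for t, (X, y) in enumerate(zip(Xs, ys)):   # decomposes across tasks
            theta = Theta[t].copy()
            for _ in range(inner_iters):
                theta_m = project(theta)           # held fixed within the step
                grad = (2 * X.T @ (X @ theta - y)
                        + 2 * lam * theta
                        + 2 * gamma * (theta - theta_m))
                # step size chosen for small, well-scaled toy data;
                # in practice it would be tuned
                theta -= lr * grad
            Theta[t] = theta
    return Theta
```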
74 This data is generated from the task parameters sampled from a known manifold (swiss roll). [sent-231, score-0.684]
75 The data is generated by first sampling the points from the 3-dimensional swiss roll, and then using these points as the task parameters to generate the examples using the linear regression model. [sent-232, score-0.299]
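A sketch of how such a synthetic setup might be generated; the sizes, swiss-roll parameterization, and noise level below are illustrative guesses, not the paper's exact protocol.

```python
import numpy as np

def make_swiss_roll_tasks(n_tasks=100, n_examples=20, noise=0.1, seed=0):
    """Sample task parameters from a 3-D swiss roll, then generate each
    task's examples from a linear regression model with that parameter."""
    rng = np.random.default_rng(seed)
    t = rng.uniform(1.5 * np.pi, 4.5 * np.pi, size=n_tasks)   # roll angle
    height = rng.uniform(0.0, 10.0, size=n_tasks)             # roll height
    W = np.stack([t * np.cos(t), height, t * np.sin(t)], axis=1)  # params in R^3
    Xs, ys = [], []
    for w in W:
        X = rng.normal(size=(n_examples, 3))
        y = X @ w + noise * rng.normal(size=n_examples)
        Xs.append(X)
        ys.append(y)
    return Xs, ys, W
```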
76 First, the task at hand (it is linear) is relatively easy, and a larger number of examples gives a nearly perfect regression model with the STL method itself, leaving almost no room for improvement. [sent-235, score-0.388]
77 Blue dots denote the MTL performance of our method while green crosses denote the performance of the baseline method [4]. [sent-244, score-0.239]
78 It is clear from Figure 2(a) that our method is able to use the manifold information and therefore outperforms both the STL and MTL-baseline methods. [sent-247, score-0.566]
79 2 Real Regression Dataset We now evaluate our method on two real datasets, the school dataset and the computer survey dataset [14], the same datasets as used in the baseline model [4]. [sent-258, score-0.462]
80 Moreover they have also been used in previous MTL studies, for example, school dataset in [5, 10] and computer dataset in [14]. [sent-259, score-0.252]
81 Green crosses are the baseline method and blue dots are the manifold method. [sent-303, score-0.691]
82 Similar to the synthetic dataset, hyperparameters of the baseline method and manifold method (γ and λ) were tuned on a small validation dataset picked randomly from the training set. [sent-328, score-0.845]
83 In order to see if learning tasks simultaneously helps, we did not consider the zero value while tuning the hyperparameters of MTL, to avoid reducing the MTL method to the STL one. [sent-332, score-0.228]
84 Figure 3(a) and Figure 3(b) show the taskwise performance on the computer and school datasets, respectively. [sent-333, score-0.227]
85 For the school dataset, we perform better than both STL and the baseline method though relative performance improvement is not as significant as in the computer dataset. [sent-338, score-0.297]
86 On the school dataset, the baseline method has a mixed behavior relative to the STL method, performing well on some tasks while performing worse on others. [sent-339, score-0.401]
87 We emphasize that the baseline method has two parameters that are very important: the regularization parameter and P. [sent-345, score-0.249]
88 Figure 4: RMSE vs. number of tasks for (a) the computer dataset and (b) the school dataset. Now we show the performance variation with respect to the number of training examples. [sent-358, score-0.717]
89 Note that when the number of examples is relatively low, the baseline method outperforms our method because we do not have enough examples to estimate the task parameters, which are used for the manifold construction. [sent-361, score-0.953]
90 But as we increase the number of examples, we get a better estimate of the parameters and hence better manifold regularization. [sent-362, score-0.506]
91 Variation of the performance with n is not shown for the computer dataset because the computer dataset has only 20 examples per task. [sent-364, score-0.22]
92 Performance variation with respect to the number of tasks for school and computer datasets is shown in Figure 4. [sent-365, score-0.311]
93 We outperform the STL method and the baseline method on the computer dataset, while performing better or equal on the school dataset. [sent-366, score-0.394]
94 This suggests that tasks in the school dataset are related linearly (the manifold and baseline methods have the same performance⁴), while tasks in the computer dataset are related non-linearly, which is why the baseline method performs poorly compared to the STL method. [sent-368, score-0.787]
95 This is especially true for the computer dataset since it only has 13 features and only a few tasks are required to learn the task relatedness structure. [sent-371, score-0.586]
96 In summary, our method improves the performance over STL on all of these datasets (no negative transfer), while the baseline method performs comparably on the school dataset and worse on the computer dataset. [sent-372, score-0.42]
97 5 Conclusion We have presented a novel method for multitask learning based on a natural and intuitive assumption about the task relatedness. [sent-373, score-0.291]
98 We have used the manifold assumption to enforce the task relatedness which is a generalization of the previous notions of relatedness. [sent-374, score-0.852]
99 We emphasize that unlike the baseline method, we improve over single task learning in almost all cases and do not encounter negative transfer. [sent-380, score-0.301]
100 This is the reason we perform equally on the school dataset when the number of tasks is high. [sent-383, score-0.319]
wordName wordTfidf (topN-words)
[('manifold', 0.506), ('mtl', 0.469), ('stl', 0.455), ('rmse', 0.204), ('relatedness', 0.163), ('tasks', 0.155), ('task', 0.142), ('projection', 0.124), ('baseline', 0.111), ('school', 0.1), ('multitask', 0.092), ('regularizer', 0.074), ('pm', 0.074), ('argyriou', 0.074), ('taskwise', 0.071), ('subspace', 0.066), ('dataset', 0.064), ('wt', 0.064), ('xt', 0.062), ('ft', 0.06), ('utah', 0.057), ('dg', 0.057), ('kernel', 0.053), ('avgbaseline', 0.053), ('avgmanifold', 0.053), ('relearn', 0.053), ('rkhs', 0.048), ('avg', 0.048), ('orthogonal', 0.045), ('examples', 0.044), ('micchelli', 0.042), ('students', 0.04), ('kr', 0.04), ('distance', 0.039), ('dh', 0.039), ('evgeniou', 0.039), ('dr', 0.039), ('lie', 0.038), ('regularization', 0.038), ('learn', 0.038), ('external', 0.037), ('parameters', 0.036), ('gerber', 0.035), ('method', 0.035), ('synthetic', 0.033), ('belkin', 0.033), ('datasets', 0.032), ('salt', 0.031), ('dpm', 0.031), ('relearning', 0.031), ('swiss', 0.031), ('gradient', 0.031), ('cp', 0.03), ('emphasize', 0.029), ('lake', 0.029), ('roll', 0.029), ('carin', 0.029), ('transfer', 0.027), ('improvement', 0.027), ('framework', 0.027), ('decomposes', 0.027), ('learned', 0.026), ('giving', 0.026), ('euclidean', 0.026), ('onto', 0.026), ('outperform', 0.025), ('examination', 0.025), ('liao', 0.025), ('regressions', 0.025), ('rated', 0.025), ('regression', 0.025), ('computer', 0.024), ('daum', 0.024), ('structure', 0.024), ('yt', 0.023), ('tuned', 0.023), ('principal', 0.023), ('representer', 0.022), ('hal', 0.022), ('pontil', 0.022), ('assumption', 0.022), ('share', 0.021), ('linear', 0.021), ('dimensional', 0.02), ('dots', 0.02), ('nt', 0.02), ('crosses', 0.019), ('regularizing', 0.019), ('dimensionality', 0.019), ('hyperparameters', 0.019), ('reduction', 0.019), ('keeping', 0.019), ('surfaces', 0.019), ('picked', 0.019), ('negative', 0.019), ('enforce', 0.019), ('notion', 0.019), ('green', 0.019), ('clustered', 0.018), ('city', 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 146 nips-2010-Learning Multiple Tasks using Manifold Regularization
Author: Arvind Agarwal, Samuel Gerber, Hal Daume
Abstract: We present a novel method for multitask learning (MTL) based on manifold regularization: assume that all task parameters lie on a manifold. This is the generalization of a common assumption made in the existing literature: task parameters share a common linear subspace. One proposed method uses the projection distance from the manifold to regularize the task parameters. The manifold structure and the task parameters are learned using an alternating optimization framework. When the manifold structure is fixed, our method decomposes across tasks which can be learnt independently. An approximation of the manifold regularization scheme is presented that preserves the convexity of the single task learning problem, and makes the proposed MTL framework efficient and easy to implement. We show the efficacy of our method on several datasets. 1
2 0.41986254 147 nips-2010-Learning Multiple Tasks with a Sparse Matrix-Normal Penalty
Author: Yi Zhang, Jeff G. Schneider
Abstract: In this paper, we propose a matrix-variate normal penalty with sparse inverse covariances to couple multiple tasks. Learning multiple (parametric) models can be viewed as estimating a matrix of parameters, where rows and columns of the matrix correspond to tasks and features, respectively. Following the matrix-variate normal density, we design a penalty that decomposes the full covariance of matrix elements into the Kronecker product of row covariance and column covariance, which characterizes both task relatedness and feature representation. Several recently proposed methods are variants of the special cases of this formulation. To address the overfitting issue and select meaningful task and feature structures, we include sparse covariance selection into our matrix-normal regularization via ℓ1 penalties on task and feature inverse covariances. We empirically study the proposed method and compare with related models in two real-world problems: detecting landmines in multiple fields and recognizing faces between different subjects. Experimental results show that the proposed framework provides an effective and flexible way to model various different structures of multiple tasks.
3 0.20131718 232 nips-2010-Sample Complexity of Testing the Manifold Hypothesis
Author: Hariharan Narayanan, Sanjoy Mitter
Abstract: The hypothesis that high dimensional data tends to lie in the vicinity of a low dimensional manifold is the basis of a collection of methodologies termed Manifold Learning. In this paper, we study statistical aspects of the question of fitting a manifold with a nearly optimal least squared error. Given upper bounds on the dimension, volume, and curvature, we show that Empirical Risk Minimization can produce a nearly optimal manifold using a number of random samples that is independent of the ambient dimension of the space in which data lie. We obtain an upper bound on the required number of samples that depends polynomially on the curvature, exponentially on the intrinsic dimension, and linearly on the intrinsic volume. For constant error, we prove a matching minimax lower bound on the sample complexity that shows that this dependence on intrinsic dimension, volume log 1 and curvature is unavoidable. Whether the known lower bound of O( k + 2 δ ) 2 for the sample complexity of Empirical Risk minimization on k−means applied to data in a unit ball of arbitrary dimension is tight, has been an open question since 1997 [3]. Here is the desired bound on the error and δ is a bound on the probability of failure. We improve the best currently known upper bound [14] of 2 log 1 log4 k log 1 O( k2 + 2 δ ) to O k min k, 2 + 2 δ . Based on these results, we 2 devise a simple algorithm for k−means and another that uses a family of convex programs to fit a piecewise linear curve of a specified length to high dimensional data, where the sample complexity is independent of the ambient dimension. 1
4 0.19166465 138 nips-2010-Large Margin Multi-Task Metric Learning
Author: Shibin Parameswaran, Kilian Q. Weinberger
Abstract: Multi-task learning (MTL) improves the prediction performance on multiple, different but related, learning problems through shared parameters or representations. One of the most prominent multi-task learning algorithms is an extension to support vector machines (svm) by Evgeniou et al. [15]. Although very elegant, multi-task svm is inherently restricted by the fact that support vector machines require each class to be addressed explicitly with its own weight vector which, in a multi-task setting, requires the different learning tasks to share the same set of classes. This paper proposes an alternative formulation for multi-task learning by extending the recently published large margin nearest neighbor (lmnn) algorithm to the MTL paradigm. Instead of relying on separating hyperplanes, its decision function is based on the nearest neighbor rule which inherently extends to many classes and becomes a natural fit for multi-task learning. We evaluate the resulting multi-task lmnn on real-world insurance data and speech classification problems and show that it consistently outperforms single-task kNN under several metrics and state-of-the-art MTL classifiers. 1
5 0.15924658 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices
Author: Uri Shalit, Daphna Weinshall, Gal Chechik
Abstract: When learning models that are represented in matrix forms, enforcing a low-rank constraint can dramatically improve the memory and run time complexity, while providing a natural regularization of the model. However, naive approaches for minimizing functions over the set of low-rank matrices are either prohibitively time consuming (repeated singular value decomposition of the matrix) or numerically unstable (optimizing a factored representation of the low rank matrix). We build on recent advances in optimization over manifolds, and describe an iterative online learning procedure, consisting of a gradient step, followed by a second-order retraction back to the manifold. While the ideal retraction is hard to compute, and so is the projection operator that approximates it, we describe another second-order retraction that can be computed efficiently, with run time and memory complexity of O ((n + m)k) for a rank-k matrix of dimension m × n, given rank-one gradients. We use this algorithm, LORETA, to learn a matrixform similarity measure over pairs of documents represented as high dimensional vectors. LORETA improves the mean average precision over a passive- aggressive approach in a factorized model, and also improves over a full model trained over pre-selected features using the same memory requirements. LORETA also showed consistent improvement over standard methods in a large (1600 classes) multi-label image classification task. 1
6 0.1569379 177 nips-2010-Multitask Learning without Label Correspondences
7 0.13427962 114 nips-2010-Humans Learn Using Manifolds, Reluctantly
8 0.074925311 250 nips-2010-Spectral Regularization for Support Estimation
9 0.073773548 217 nips-2010-Probabilistic Multi-Task Feature Selection
10 0.072998747 124 nips-2010-Inductive Regularized Learning of Kernel Functions
11 0.069814786 182 nips-2010-New Adaptive Algorithms for Online Classification
12 0.062041838 220 nips-2010-Random Projection Trees Revisited
13 0.058069091 280 nips-2010-Unsupervised Kernel Dimension Reduction
14 0.057053741 57 nips-2010-Decoding Ipsilateral Finger Movements from ECoG Signals in Humans
15 0.055630669 35 nips-2010-Auto-Regressive HMM Inference with Incomplete Data for Short-Horizon Wind Forecasting
16 0.051986579 133 nips-2010-Kernel Descriptors for Visual Recognition
17 0.051792875 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models
18 0.050836258 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models
19 0.050358985 59 nips-2010-Deep Coding Network
20 0.049999479 202 nips-2010-Parallelized Stochastic Gradient Descent
topicId topicWeight
[(0, 0.163), (1, 0.038), (2, 0.077), (3, -0.045), (4, 0.093), (5, -0.002), (6, 0.091), (7, -0.044), (8, -0.18), (9, -0.012), (10, 0.08), (11, 0.025), (12, 0.197), (13, -0.021), (14, 0.131), (15, 0.04), (16, 0.052), (17, -0.236), (18, -0.052), (19, 0.01), (20, -0.328), (21, -0.202), (22, 0.017), (23, 0.051), (24, -0.065), (25, -0.092), (26, -0.319), (27, -0.097), (28, 0.055), (29, 0.144), (30, -0.008), (31, 0.042), (32, 0.159), (33, -0.001), (34, -0.062), (35, -0.146), (36, -0.031), (37, 0.039), (38, -0.071), (39, -0.051), (40, 0.012), (41, -0.007), (42, 0.064), (43, -0.078), (44, 0.011), (45, -0.02), (46, -0.097), (47, 0.068), (48, 0.07), (49, 0.035)]
simIndex simValue paperId paperTitle
same-paper 1 0.96325004 146 nips-2010-Learning Multiple Tasks using Manifold Regularization
Author: Arvind Agarwal, Samuel Gerber, Hal Daume
Abstract: We present a novel method for multitask learning (MTL) based on manifold regularization: assume that all task parameters lie on a manifold. This is the generalization of a common assumption made in the existing literature: task parameters share a common linear subspace. One proposed method uses the projection distance from the manifold to regularize the task parameters. The manifold structure and the task parameters are learned using an alternating optimization framework. When the manifold structure is fixed, our method decomposes across tasks which can be learnt independently. An approximation of the manifold regularization scheme is presented that preserves the convexity of the single task learning problem, and makes the proposed MTL framework efficient and easy to implement. We show the efficacy of our method on several datasets. 1
2 0.74970376 147 nips-2010-Learning Multiple Tasks with a Sparse Matrix-Normal Penalty
Author: Yi Zhang, Jeff G. Schneider
Abstract: In this paper, we propose a matrix-variate normal penalty with sparse inverse covariances to couple multiple tasks. Learning multiple (parametric) models can be viewed as estimating a matrix of parameters, where rows and columns of the matrix correspond to tasks and features, respectively. Following the matrix-variate normal density, we design a penalty that decomposes the full covariance of matrix elements into the Kronecker product of row covariance and column covariance, which characterizes both task relatedness and feature representation. Several recently proposed methods are variants of the special cases of this formulation. To address the overfitting issue and select meaningful task and feature structures, we include sparse covariance selection into our matrix-normal regularization via ℓ1 penalties on task and feature inverse covariances. We empirically study the proposed method and compare with related models in two real-world problems: detecting landmines in multiple fields and recognizing faces between different subjects. Experimental results show that the proposed framework provides an effective and flexible way to model various different structures of multiple tasks.
3 0.61628956 114 nips-2010-Humans Learn Using Manifolds, Reluctantly
Author: Tim Rogers, Chuck Kalish, Joseph Harrison, Xiaojin Zhu, Bryan R. Gibson
Abstract: When the distribution of unlabeled data in feature space lies along a manifold, the information it provides may be used by a learner to assist classification in a semi-supervised setting. While manifold learning is well-known in machine learning, the use of manifolds in human learning is largely unstudied. We perform a set of experiments which test a human’s ability to use a manifold in a semisupervised learning task, under varying conditions. We show that humans may be encouraged into using the manifold, overcoming the strong preference for a simple, axis-parallel linear boundary. 1
4 0.49576554 138 nips-2010-Large Margin Multi-Task Metric Learning
Author: Shibin Parameswaran, Kilian Q. Weinberger
Abstract: Multi-task learning (MTL) improves the prediction performance on multiple, different but related, learning problems through shared parameters or representations. One of the most prominent multi-task learning algorithms is an extension to support vector machines (svm) by Evgeniou et al. [15]. Although very elegant, multi-task svm is inherently restricted by the fact that support vector machines require each class to be addressed explicitly with its own weight vector which, in a multi-task setting, requires the different learning tasks to share the same set of classes. This paper proposes an alternative formulation for multi-task learning by extending the recently published large margin nearest neighbor (lmnn) algorithm to the MTL paradigm. Instead of relying on separating hyperplanes, its decision function is based on the nearest neighbor rule which inherently extends to many classes and becomes a natural fit for multi-task learning. We evaluate the resulting multi-task lmnn on real-world insurance data and speech classification problems and show that it consistently outperforms single-task kNN under several metrics and state-of-the-art MTL classifiers. 1
5 0.48405507 177 nips-2010-Multitask Learning without Label Correspondences
Author: Novi Quadrianto, James Petterson, Tibério S. Caetano, Alex J. Smola, S.v.n. Vishwanathan
Abstract: We propose an algorithm to perform multitask learning where each task has potentially distinct label sets and label correspondences are not readily available. This is in contrast with existing methods which either assume that the label sets shared by different tasks are the same or that there exists a label mapping oracle. Our method directly maximizes the mutual information among the labels, and we show that the resulting objective function can be efficiently optimized using existing algorithms. Our proposed approach has a direct application for data integration with different label spaces, such as integrating Yahoo! and DMOZ web directories. 1
6 0.45352334 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices
7 0.3970516 220 nips-2010-Random Projection Trees Revisited
8 0.39474386 232 nips-2010-Sample Complexity of Testing the Manifold Hypothesis
9 0.36472288 217 nips-2010-Probabilistic Multi-Task Feature Selection
10 0.2770645 57 nips-2010-Decoding Ipsilateral Finger Movements from ECoG Signals in Humans
11 0.25825393 124 nips-2010-Inductive Regularized Learning of Kernel Functions
12 0.24163733 248 nips-2010-Sparse Inverse Covariance Selection via Alternating Linearization Methods
13 0.23625827 280 nips-2010-Unsupervised Kernel Dimension Reduction
14 0.23367836 48 nips-2010-Collaborative Filtering in a Non-Uniform World: Learning with the Weighted Trace Norm
15 0.23277555 219 nips-2010-Random Conic Pursuit for Semidefinite Programming
16 0.22904912 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models
17 0.22896472 99 nips-2010-Gated Softmax Classification
18 0.21597554 62 nips-2010-Discriminative Clustering by Regularized Information Maximization
19 0.21412048 35 nips-2010-Auto-Regressive HMM Inference with Incomplete Data for Short-Horizon Wind Forecasting
20 0.21016395 5 nips-2010-A Dirty Model for Multi-task Learning
topicId topicWeight
[(13, 0.474), (17, 0.013), (27, 0.032), (30, 0.039), (35, 0.02), (45, 0.181), (50, 0.049), (52, 0.023), (60, 0.018), (77, 0.034), (90, 0.024)]
simIndex simValue paperId paperTitle
1 0.9376567 259 nips-2010-Subgraph Detection Using Eigenvector L1 Norms
Author: Benjamin Miller, Nadya Bliss, Patrick J. Wolfe
Abstract: When working with network datasets, the theoretical framework of detection theory for Euclidean vector spaces no longer applies. Nevertheless, it is desirable to determine the detectability of small, anomalous graphs embedded into background networks with known statistical properties. Casting the problem of subgraph detection in a signal processing context, this article provides a framework and empirical results that elucidate a “detection theory” for graph-valued data. Its focus is the detection of anomalies in unweighted, undirected graphs through L1 properties of the eigenvectors of the graph’s so-called modularity matrix. This metric is observed to have relatively low variance for certain categories of randomly-generated graphs, and to reveal the presence of an anomalous subgraph with reasonable reliability when the anomaly is not well-correlated with stronger portions of the background graph. An analysis of subgraphs in real network datasets confirms the efficacy of this approach. 1
2 0.89281744 192 nips-2010-Online Classification with Specificity Constraints
Author: Andrey Bernstein, Shie Mannor, Nahum Shimkin
Abstract: We consider the online binary classification problem, where we are given m classifiers. At each stage, the classifiers map the input to the probability that the input belongs to the positive class. An online classification meta-algorithm is an algorithm that combines the outputs of the classifiers in order to attain a certain goal, without having prior knowledge on the form and statistics of the input, and without prior knowledge on the performance of the given classifiers. In this paper, we use sensitivity and specificity as the performance metrics of the meta-algorithm. In particular, our goal is to design an algorithm that satisfies the following two properties (asymptotically): (i) its average false positive rate (fp-rate) is under some given threshold; and (ii) its average true positive rate (tp-rate) is not worse than the tp-rate of the best convex combination of the m given classifiers that satisfies fprate constraint, in hindsight. We show that this problem is in fact a special case of the regret minimization problem with constraints, and therefore the above goal is not attainable. Hence, we pose a relaxed goal and propose a corresponding practical online learning meta-algorithm that attains it. In the case of two classifiers, we show that this algorithm takes a very simple form. To our best knowledge, this is the first algorithm that addresses the problem of the average tp-rate maximization under average fp-rate constraints in the online setting. 1
3 0.8748219 45 nips-2010-CUR from a Sparse Optimization Viewpoint
Author: Jacob Bien, Ya Xu, Michael W. Mahoney
Abstract: The CUR decomposition provides an approximation of a matrix X that has low reconstruction error and that is sparse in the sense that the resulting approximation lies in the span of only a few columns of X. In this regard, it appears to be similar to many sparse PCA methods. However, CUR takes a randomized algorithmic approach, whereas most sparse PCA methods are framed as convex optimization problems. In this paper, we try to understand CUR from a sparse optimization viewpoint. We show that CUR is implicitly optimizing a sparse regression objective and, furthermore, cannot be directly cast as a sparse PCA method. We also observe that the sparsity attained by CUR possesses an interesting structure, which leads us to formulate a sparse PCA method that achieves a CUR-like sparsity.
same-paper 4 0.84807646 146 nips-2010-Learning Multiple Tasks using Manifold Regularization
Author: Arvind Agarwal, Samuel Gerber, Hal Daume
Abstract: We present a novel method for multitask learning (MTL) based on manifold regularization: assume that all task parameters lie on a manifold. This is the generalization of a common assumption made in the existing literature: task parameters share a common linear subspace. One proposed method uses the projection distance from the manifold to regularize the task parameters. The manifold structure and the task parameters are learned using an alternating optimization framework. When the manifold structure is fixed, our method decomposes across tasks which can be learnt independently. An approximation of the manifold regularization scheme is presented that preserves the convexity of the single task learning problem, and makes the proposed MTL framework efficient and easy to implement. We show the efficacy of our method on several datasets. 1
5 0.83282739 284 nips-2010-Variational bounds for mixed-data factor analysis
Author: Mohammad E. Khan, Guillaume Bouchard, Kevin P. Murphy, Benjamin M. Marlin
Abstract: We propose a new variational EM algorithm for fitting factor analysis models with mixed continuous and categorical observations. The algorithm is based on a simple quadratic bound to the log-sum-exp function. In the special case of fully observed binary data, the bound we propose is significantly faster than previous variational methods. We show that EM is significantly more robust in the presence of missing data compared to treating the latent factors as parameters, which is the approach used by exponential family PCA and other related matrix-factorization methods. A further benefit of the variational approach is that it can easily be extended to the case of mixtures of factor analyzers, as we show. We present results on synthetic and real data sets demonstrating several desirable properties of our proposed method. 1
6 0.82406425 221 nips-2010-Random Projections for $k$-means Clustering
7 0.8101837 261 nips-2010-Supervised Clustering
8 0.73154813 262 nips-2010-Switched Latent Force Models for Movement Segmentation
9 0.71541929 136 nips-2010-Large-Scale Matrix Factorization with Missing Data under Additional Constraints
10 0.65574223 89 nips-2010-Factorized Latent Spaces with Structured Sparsity
11 0.65201926 210 nips-2010-Practical Large-Scale Optimization for Max-norm Regularization
12 0.6336115 226 nips-2010-Repeated Games against Budgeted Adversaries
13 0.62481725 110 nips-2010-Guaranteed Rank Minimization via Singular Value Projection
14 0.62013763 30 nips-2010-An Inverse Power Method for Nonlinear Eigenproblems with Applications in 1-Spectral Clustering and Sparse PCA
15 0.61678416 166 nips-2010-Minimum Average Cost Clustering
16 0.61338848 246 nips-2010-Sparse Coding for Learning Interpretable Spatio-Temporal Primitives
17 0.60587299 196 nips-2010-Online Markov Decision Processes under Bandit Feedback
18 0.60568851 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices
19 0.60265946 182 nips-2010-New Adaptive Algorithms for Online Classification
20 0.60191178 117 nips-2010-Identifying graph-structured activation patterns in networks