nips nips2010 nips2010-138 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Shibin Parameswaran, Kilian Q. Weinberger
Abstract: Multi-task learning (MTL) improves the prediction performance on multiple, different but related, learning problems through shared parameters or representations. One of the most prominent multi-task learning algorithms is an extension to support vector machines (svm) by Evgeniou et al. [15]. Although very elegant, multi-task svm is inherently restricted by the fact that support vector machines require each class to be addressed explicitly with its own weight vector which, in a multi-task setting, requires the different learning tasks to share the same set of classes. This paper proposes an alternative formulation for multi-task learning by extending the recently published large margin nearest neighbor (lmnn) algorithm to the MTL paradigm. Instead of relying on separating hyperplanes, its decision function is based on the nearest neighbor rule which inherently extends to many classes and becomes a natural fit for multi-task learning. We evaluate the resulting multi-task lmnn on real-world insurance data and speech classification problems and show that it consistently outperforms single-task kNN under several metrics and state-of-the-art MTL classifiers. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Although very elegant, multi-task svm is inherently restricted by the fact that support vector machines require each class to be addressed explicitly with its own weight vector which, in a multi-task setting, requires the different learning tasks to share the same set of classes. [sent-9, score-0.341]
2 This paper proposes an alternative formulation for multi-task learning by extending the recently published large margin nearest neighbor (lmnn) algorithm to the MTL paradigm. [sent-10, score-0.35]
3 Instead of relying on separating hyperplanes, its decision function is based on the nearest neighbor rule which inherently extends to many classes and becomes a natural fit for multi-task learning. [sent-11, score-0.382]
4 We evaluate the resulting multi-task lmnn on real-world insurance data and speech classification problems and show that it consistently outperforms single-task kNN under several metrics and state-of-the-art MTL classifiers. [sent-12, score-0.653]
5 One particularly successful instance of multi-task learning is its adaptation to support vector machines (svm) [14, 15]. [sent-22, score-0.148]
As a consequence, the MTL adaptation of svm [15] requires all tasks to share an identical set of labels (or requires side-information about task dependencies) for meaningful transfer of knowledge. [sent-24, score-0.466]
This is a serious limitation in many domains (binary or non-binary) where different tasks might not share the same classes. [sent-25, score-0.253]
Weinberger et al. introduced Large Margin Nearest Neighbor (lmnn) [20], an algorithm that translates the maximum margin learning principle behind svms to k-nearest neighbor classification (kNN) [9]. [sent-29, score-0.265]
Similar to svms, the solution of lmnn is also obtained through a convex optimization problem that maximizes a large margin between input vectors from different classes. [sent-30, score-0.538]
10 However, instead of positioning a separating hyperplane, lmnn learns a Mahalanobis metric. [sent-31, score-0.485]
Weinberger et al. show that the lmnn metric improves the kNN classification accuracy to be on par with kernelized svms [20]. [sent-33, score-0.669]
12 Our algorithm learns one metric that is shared amongst all the tasks and one specific metric unique to each task. [sent-38, score-0.754]
13 We demonstrate on several multi-task settings that these shared metrics significantly reduce the overall classification error. [sent-40, score-0.198]
14 Further, our algorithm tends to outperform multi-task neural networks [6] and svm [15] on tasks with many class-labels. [sent-41, score-0.237]
15 To our knowledge, this paper introduces the first multi-task metric learning algorithm for the kNN rule that explicitly models the commonalities and specifics of different tasks. [sent-42, score-0.305]
2 Large Margin Nearest Neighbor This section describes the large margin nearest neighbor algorithm as introduced in [20]. [sent-43, score-0.35]
For now, we will focus on a single-task learning framework, with a training set consisting of n examples of dimensionality d, {(x_i, y_i)}_{i=1}^n, where x_i ∈ R^d and y_i ∈ {1, 2, . . . }. [sent-44, score-0.209]
The Mahalanobis distance between two inputs x_i and x_j is defined as d_M(x_i, x_j) = √((x_i − x_j)^⊤ M (x_i − x_j)), (1) where M is a symmetric positive semidefinite matrix (M ⪰ 0). [sent-49, score-0.986]
Eq. (1) reduces to the Euclidean metric if we set M to the identity matrix, i.e., M = I. [sent-51, score-0.298]
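As a small, hedged illustration (not taken from the paper), eq. (1) can be evaluated directly; the function name and toy inputs below are our own.

```python
import numpy as np

def mahalanobis_distance(xi, xj, M):
    """Eq. (1): sqrt((xi - xj)^T M (xi - xj)) for a symmetric PSD matrix M."""
    diff = xi - xj
    return np.sqrt(diff @ M @ diff)

xi, xj = np.array([1.0, 2.0]), np.array([3.0, 2.0])
print(mahalanobis_distance(xi, xj, np.eye(2)))            # M = I: Euclidean distance, 2.0
print(mahalanobis_distance(xi, xj, np.diag([4.0, 1.0])))  # a rescaled metric, 4.0
```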
The lmnn algorithm learns the matrix M for the Mahalanobis metric¹ in eq. (1) explicitly to enhance k-nearest neighbor classification. [sent-54, score-0.453] [sent-55, score-0.164]
Figure 1: An illustration of a data set before and after the metric is learned; the circles represent points of equal distance to the vector x_i, and the Mahalanobis metric rescales directions to push impostors further away than target neighbors by a large margin.
Lmnn mimics the non-continuous and non-differentiable leave-one-out classification error of kNN with a convex loss function. [sent-57, score-0.138] [sent-58, score-0.501]
24 One of the advantages of lmnn over related work [12, 17] is that the (global) metric is optimized locally, which allows it to work with multi-modal data distributions and encourages better generalization. [sent-62, score-0.618]
25 To achieve this, the algorithm requires k target neighbors to be identified for every input prior to learning, which should become the k nearest neighbors after the optimization. [sent-63, score-0.328]
Usually, these are picked with the help of side-information or, in the absence thereof, as the k nearest neighbors within the same class under the Euclidean metric. [sent-64, score-0.209]
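A minimal sketch of this target-neighbor selection step, assuming no side-information is available; the function name is ours, not the paper's.

```python
import numpy as np

def select_target_neighbors(X, y, k=3):
    """For each x_i, return the indices of its k nearest same-class neighbors
    under the Euclidean metric (assumes every class has more than k members)."""
    n = X.shape[0]
    targets = np.zeros((n, k), dtype=int)
    for i in range(n):
        same = np.where(y == y[i])[0]
        same = same[same != i]                        # never pick the point itself
        dist = np.linalg.norm(X[same] - X[i], axis=1)
        targets[i] = same[np.argsort(dist)[:k]]
    return targets
```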
We use the notation j ⇝ i to indicate that x_j is a target neighbor of x_i. [sent-65, score-0.471]
28 Lmnn learns a Mahalanobis metric that keeps each input xi closer to its target neighbors than other inputs with different class labels (impostors) — by a large margin. [sent-66, score-0.606]
For an input x_i, target neighbor x_j, and impostor x_k, this relation can be expressed as a linear inequality constraint with respect to the squared distance d²_M(·, ·): d²_M(x_i, x_k) − d²_M(x_i, x_j) ≥ 1. (2) [sent-67, score-0.804]
30 Here, all points on the circles have equal distance from xi . [sent-72, score-0.138]
31 Under the Mahalanobis metric this circle is deformed to an ellipsoid, which causes the impostors (marked as squares) to be further away than the target neighbors. [sent-73, score-0.397]
The semidefinite program (SDP) introduced by [20] moves target neighbors close by minimizing Σ_{j⇝i} d²_M(x_i, x_j) while penalizing violations of the constraint in eq. (2). [sent-74, score-0.274]
The latter is achieved through additional slack variables ξ_ijk.¹ (¹For simplicity we will refer to pseudo-metrics also as metrics, as the distinction has no implications for our algorithm.) [sent-76, score-0.128]
With the set of triples S = {(i, j, k) : j ⇝ i, y_k ≠ y_i}, the problem can be stated as
min_M  Σ_{j⇝i} d²_M(x_i, x_j) + µ Σ_{(i,j,k)∈S} ξ_ijk
subject to ∀(i, j, k) ∈ S: (1) d²_M(x_i, x_k) − d²_M(x_i, x_j) ≥ 1 − ξ_ijk, (2) ξ_ijk ≥ 0, (3) M ⪰ 0. [sent-83, score-0.337]
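To make the objective concrete, here is a naive evaluation sketch of ours (not the authors' SDP solver); `target_neighbors` and `triples` are assumed index structures, and a real implementation would also keep M positive semidefinite, e.g. by projecting onto the PSD cone after each update.

```python
import numpy as np

def lmnn_objective(M, X, target_neighbors, triples, mu=1.0):
    """Value of the objective above for a fixed metric M.
    target_neighbors[i] lists the target neighbors of x_i; triples holds the
    (i, j, k) index triples of the set S."""
    def d2(i, j):
        v = X[i] - X[j]
        return v @ M @ v
    pull = sum(d2(i, j) for i in range(len(X)) for j in target_neighbors[i])
    # at the optimum the slack is xi_ijk = [1 + d2(i, j) - d2(i, k)]_+
    push = sum(max(0.0, 1.0 + d2(i, j) - d2(i, k)) for (i, j, k) in triples)
    return pull + mu * push
```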
Each input (x_i, y_i) belongs to exactly one of the tasks 1, . . . , T. [sent-88, score-0.191]
An example x_i ∈ I_t is classified by the rule y_i = sign(x_i^⊤ (w_0 + w_t)). [sent-98, score-0.232]
The weight vectors w_0, . . . , w_T are learned jointly by solving
min_{w_0, ..., w_T}  Σ_{t=0}^{T} γ_t ||w_t||²_2 + Σ_{t=1}^{T} Σ_{i∈I_t} [1 − y_i (w_0 + w_t)^⊤ x_i]_+   (3)
where [a]_+ = max(0, a). [sent-102, score-0.198]
38 In the extreme case, if γ0 → +∞, then w0 = 0 and all tasks are decoupled; on the other hand, when γ0 is small and γt>0 → +∞ we obtain wt>0 = 0 and all the tasks share the same decision function with weights w0 . [sent-105, score-0.368]
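For concreteness, a hedged sketch of eq. (3); the per-task list layout of the data and the function name are our own assumptions.

```python
import numpy as np

def mt_svm_objective(w, X_tasks, y_tasks, gammas):
    """Eq. (3): w = [w_0, w_1, ..., w_T]; task t classifies with w_0 + w_t.
    X_tasks[t-1] and y_tasks[t-1] hold the inputs and +/-1 labels of task t."""
    T = len(X_tasks)
    obj = sum(gammas[t] * (w[t] @ w[t]) for t in range(T + 1))   # regularizers
    for t in range(1, T + 1):
        margins = y_tasks[t - 1] * (X_tasks[t - 1] @ (w[0] + w[t]))
        obj += np.maximum(0.0, 1.0 - margins).sum()              # hinge losses
    return obj
```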
39 Although the mt-svm formulation is very elegant, it requires all tasks to share the same class labels. [sent-106, score-0.221]
40 4 Multi-Task Large Margin Nearest Neighbor In this section we combine large margin nearest neighbor classification from section 2 with the multi-task learning paradigm from section 3. [sent-108, score-0.387]
41 Our goal is to learn a metric dt (·, ·) for each of the T tasks that minimizes the kNN leave-one-out classification error. [sent-110, score-0.412]
Inspired by the methodology of the previous section, we model the commonalities between various tasks through a shared Mahalanobis metric M_0 ⪰ 0 and the task-specific idiosyncrasies with additional matrices M_1, . . . , M_T. [sent-111, score-0.508]
We define the distance for task t as d_t(x_i, x_j) = √((x_i − x_j)^⊤ (M_0 + M_t)(x_i − x_j)). (4) [sent-115, score-0.497]
Intuitively, the metric defined by M_0 picks up general trends across multiple data sets and the M_{t>0} specialize the metric further for each particular task. [sent-116, score-0.507]
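A short sketch of eq. (4) together with the kNN decision rule it feeds into; the function names and the simple majority vote are our own choices.

```python
import numpy as np

def task_distance(xi, xj, M0, Mt):
    """Eq. (4): distance for task t under the shared metric M_0 plus M_t."""
    diff = xi - xj
    return np.sqrt(diff @ (M0 + Mt) @ diff)

def knn_predict(x, X_train, y_train, M0, Mt, k=3):
    """k-nearest neighbor rule for task t under the learned metric."""
    dists = np.array([task_distance(x, xi, M0, Mt) for xi in X_train])
    votes = y_train[np.argsort(dists)[:k]]
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]
```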
45 If γ0 → ∞, the shared metric M0 reduces to the plain Euclidean metric and if γt>0 → ∞, the task-specific metrics Mt>0 become irrelevant zero matrices. [sent-139, score-0.648]
46 Therefore, if γt>0 → ∞ and γ0 is small, we learn a single metric M0 across all tasks. [sent-140, score-0.256]
47 In this case we want the result to be equivalent to applying lmnn on the union of all data sets. [sent-141, score-0.393]
48 In the other extreme case, when γ0 = 0 and γt>0 → ∞, we want our formulation to reduce to T independent lmnn algorithms. [sent-142, score-0.393]
Similar to the set of triples S defined in section 2, let S_t be the set of triples restricted to only vectors from task t, S_t = {(i, j, k) ∈ I_t³ : j ⇝ i, y_k ≠ y_i}. [sent-143, score-0.241]
In this extreme case the formulation indeed reduces to independent runs of lmnn applied to each of the T tasks. [sent-147, score-0.393]
51 We refer to the resulting algorithm as multi-task large margin nearest neighbor (mt-lmnn). [sent-151, score-0.35]
Factorizing M_0 + M_t = L_t^⊤ L_t, we can rewrite eq. (4) as d_t(x_i, x_j) = √((x_i − x_j)^⊤ L_t^⊤ L_t (x_i − x_j)), (6) which is equivalent to the Euclidean distance after the transformation x_i → L_t x_i. [sent-165, score-0.618]
In particular, when L_t has full rank, d_t(x_i, x_j) = 0 if and only if x_i = x_j, and d_t(·, ·) is a proper metric. [sent-173, score-0.324]
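A quick numerical check of this equivalence (our sketch): taking a Cholesky factor of a positive definite stand-in for M_0 + M_t, the metric distance matches the Euclidean distance between transformed points.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
M_sum = A.T @ A + 1e-3 * np.eye(4)     # positive definite stand-in for M_0 + M_t
L_t = np.linalg.cholesky(M_sum).T      # M_sum = L_t^T @ L_t

xi, xj = rng.standard_normal(4), rng.standard_normal(4)
d_metric = np.sqrt((xi - xj) @ M_sum @ (xi - xj))
d_euclidean = np.linalg.norm(L_t @ xi - L_t @ xj)
assert np.isclose(d_metric, d_euclidean)
```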
The mt-lmnn optimization problem is
min_{M_0, ..., M_T}  γ_0 ||M_0 − I||²_F + Σ_{t=1}^{T} ( γ_t ||M_t||²_F + Σ_{(i,j)∈I_t, j⇝i} d²_t(x_i, x_j) + Σ_{(i,j,k)∈S_t} ξ_ijk )
subject to: ∀t, ∀(i, j, k) ∈ S_t: (1) d²_t(x_i, x_k) − d²_t(x_i, x_j) ≥ 1 − ξ_ijk, (2) ξ_ijk ≥ 0, (3) M_0, M_1, . . . , M_T ⪰ 0. [sent-177, score-0.269]
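Putting the pieces together, a hedged sketch (not the authors' solver) that evaluates this objective for fixed matrices; the `tasks` layout is assumed, and a real implementation would additionally project M_0, . . . , M_T onto the PSD cone.

```python
import numpy as np

def mt_lmnn_objective(M0, M_tasks, tasks, gammas):
    """tasks[t-1] = (X, target_neighbors, triples) for task t; gammas[0] weighs
    pulling M_0 toward the identity, gammas[t] shrinks the task-specific M_t."""
    d = M0.shape[0]
    obj = gammas[0] * np.linalg.norm(M0 - np.eye(d), 'fro') ** 2
    for t, (X, target_neighbors, triples) in enumerate(tasks, start=1):
        Mt = M_tasks[t - 1]
        obj += gammas[t] * np.linalg.norm(Mt, 'fro') ** 2
        def d2(i, j, M=M0 + Mt):
            v = X[i] - X[j]
            return v @ M @ v
        obj += sum(d2(i, j) for i in range(len(X)) for j in target_neighbors[i])
        obj += sum(max(0.0, 1.0 + d2(i, j) - d2(i, k)) for (i, j, k) in triples)
    return obj
```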
55 One of the advantages of lmnn over alternative distance metric learning algorithms, for example NCA [17], is that it can be stated as a convex optimization problem. [sent-182, score-0.734]
This can be expressed as d²_t(x_i, x_j) = trace(M_0 v_ij v_ij^⊤) + trace(M_t v_ij v_ij^⊤), (7) where v_ij = (x_i − x_j). [sent-189, score-0.647]
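A short numerical check of eq. (7), written by us: the squared distance is linear in M_0 and M_t because v^⊤ M v = trace(M v v^⊤).

```python
import numpy as np

rng = np.random.default_rng(1)
A0, A1 = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
M0, Mt = A0 @ A0.T, A1 @ A1.T             # two PSD matrices
xi, xj = rng.standard_normal(3), rng.standard_normal(3)

v = xi - xj                               # v_ij in eq. (7)
lhs = v @ (M0 + Mt) @ v                   # squared task distance
rhs = np.trace(M0 @ np.outer(v, v)) + np.trace(Mt @ np.outer(v, v))
assert np.isclose(lhs, rhs)
```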
57 We first provide a brief overview of the two datasets and then present results in various multi-task and domain adaptation settings. [sent-197, score-0.129]
The Isolet dataset was collected from 150 speakers uttering all characters in the English alphabet twice, i.e., 52 utterances per speaker. [sent-198, score-0.164]
The five tasks differ because the groups of speakers vary greatly in the way they utter the characters of the English alphabet. [sent-206, score-0.268]
Our target variables consist of attributes 1, 4, 5, 6, 44 and 86, (footnote 2: Available for download from the UCI Machine Learning Repository) [sent-213, score-0.132]
Table 3: Error rates on label-compatible Isolet tasks when tested with task-specific train sets. [sent-267, score-0.186]
which indicate customer subtype, customer age bracket, customer occupation, a discretized percentage of Roman Catholics in that area, contribution from a third-party insurance, and a binary value that signifies whether the customer has a caravan insurance policy. [sent-268, score-0.54]
63 The tasks have a different number of output labels but they share the same input data. [sent-269, score-0.263]
64 The neighborhood size k was fixed to k = 3, which is the setting recommended in the original lmnn publication [20]. [sent-277, score-0.421]
65 Our algorithm is very effective when the feature space is dense and when dealing with multi-label tasks with or without the same set of output labels. [sent-280, score-0.147]
66 The second subsection provides a brief demonstration of the use of mt-lmnn in the domain adaptation (or cold start) scenario. [sent-282, score-0.129]
67 In the label-compatible MTL scenario, all the tasks share the same label set. [sent-285, score-0.221]
The label-incompatible scenario arises when applying MTL to a group of multi-class classification tasks that do not share the same set of labels. [sent-286, score-0.312]
69 Label-Compatible Multi-task Learning The experiments in this setting were conducted on the Isolet data, where isolet1-5 are the 5 tasks and all of them share the same 26 labels. [sent-288, score-0.221]
“U-lmnn” is the metric obtained from lmnn trained on the union of all tasks (pooling the training data and ignoring the multi-task aspect), and “st-lmnn” is single-task lmnn trained independently of the other tasks. [sent-303, score-0.653] [sent-315, score-0.393]
A special case arises in terms of the kNN-based classifiers in the label-compatible scenario: during the actual classification step, regardless of what metric is used, the kNN training data set can either consist of only task-specific data or of the training data pooled across all tasks. [sent-318, score-0.317]
Table 5: Error rates on Isolet label-incompatible tasks with task-specific train sets. [sent-360, score-0.186]
The kNN results obtained from using pooled training sets at the classification phase are shown in table 4. [sent-406, score-0.145]
Both sets of results, in tables 3 and 4, show that mt-lmnn obtains considerable improvement over its single-task counterparts on all 5 tasks and generally outperforms the other multi-task algorithms based on neural networks and support vector machines. [sent-407, score-0.304]
76 Label-Incompatible Multi-task Learning To demonstrate mt-lmnn’s ability to learn multiple tasks having different sets of class labels, we ran experiments on the CoIL dataset and on artificially incompatible versions of Isolet tasks. [sent-408, score-0.173]
77 Also, U-lmnn cannot be used with CoIL data tasks since all of them share the same input. [sent-410, score-0.221]
78 For each original subset of Isolet we picked 10 labels at random and reduced the dataset to only examples with these labels (resulting in 600 data points per set and different sets of output labels). [sent-411, score-0.149]
79 Table 5 shows the results of the kNN algorithm under the various metrics along with single-task and multi-task versions of svm and neural networks on these tasks. [sent-412, score-0.218]
80 The classification error rates on CoIL data tasks are shown in Table 6. [sent-414, score-0.186]
The multi-task neural network and svm have a hard time with most of the tasks and, at times, perform worse than their single-task versions. [sent-415, score-0.203]
Both svm and neural networks perform very well on the tasks with the fewest classes, whereas mt-lmnn does very well on tasks with many classes (in particular the 40-way classification of task 1). [sent-417, score-0.479]
2 Domain Adaptation Domain adaptation attempts to learn a severely undersampled target domain with the help of source domains that have plenty of data but may not share the same sample distribution as the target. [sent-419, score-0.183]
84 In such cases, we would like the learned classifier to gracefully adapt its recognition / classification rule to the target domain as more data becomes available. [sent-421, score-0.187]
85 Unlike the previous setting, we now have one specific target task which can be heavily under-sampled. [sent-422, score-0.162]
86 We evaluate the domain adaptation capability of mt-lmnn with isolet1-4 as the source and isolet5 as the target domain across varying amounts of available labeled target data. [sent-423, score-0.437]
In the absence of any training data from isolet5 (also referred to as the cold-start scenario), we used the global metric M_0 learned by mt-lmnn on tasks isolet1-4. [sent-425, score-0.401]
(Figure: test error rate in % on isolet5 for Euc, U-lmnn and mt-lmnn.)
U-lmnn and the mt-lmnn global metric perform much better than the Euclidean metric, with U-lmnn giving slightly better classification. [sent-426, score-0.618]
89 6 Related Work Caruana was the first to demonstrate results on multi-task learning for k-nearest neighbor regression and locally weighted averaging [6]. [sent-438, score-0.164]
90 In contrast, our work focuses on classification and learns different metrics with shared components. [sent-440, score-0.258]
91 However, our method uses a different optimization and learns metrics rather than separating hyperplanes. [sent-447, score-0.245]
To our knowledge, it is the first metric learning algorithm that embraces the multi-task learning paradigm and goes beyond feature re-weighting for pooled training data. [sent-449, score-0.341]
Mt-lmnn consistently outperformed single-task metrics for kNN in almost all of the learning settings and obtained better classification results than multi-task neural networks and support-vector machines. [sent-451, score-0.162]
94 Addressing a major limitation of mt-svm, mt-lmnn is applicable (and effective) on multiple multi-class tasks with different sets of classes. [sent-452, score-0.173]
95 This MTL framework can also be easily adapted for other metric learning algorithms including the online version of lmnn [7]. [sent-453, score-0.618]
96 A further research extension is to incorporate known structure by introducing additional sub-global metrics that are shared only by a strict subset of the tasks. [sent-454, score-0.198]
97 The nearest neighbor classification rule is a natural fit for multi-task learning, if accompanied with a suitable metric. [sent-455, score-0.318]
98 By extending one of the state-of-the-art metric learning algorithms to the multi-task learning paradigm, mt-lmnn provides a more integrative methodology for metric learning across multiple learning problems. [sent-456, score-0.481]
99 Fast speaker adaptation using constrained estimation of Gaussian mixtures. [sent-538, score-0.139]
100 Distance metric learning for large margin nearest neighbor classification. [sent-590, score-0.575]
wordName wordTfidf (topN-words)
[('lmnn', 0.393), ('mt', 0.315), ('isolet', 0.292), ('knn', 0.256), ('mtl', 0.244), ('metric', 0.225), ('neighbor', 0.164), ('ijk', 0.15), ('tasks', 0.147), ('euc', 0.146), ('coil', 0.129), ('metrics', 0.128), ('mahalanobis', 0.128), ('xj', 0.116), ('nearest', 0.111), ('target', 0.099), ('customer', 0.098), ('impostor', 0.097), ('classi', 0.097), ('lt', 0.093), ('xi', 0.092), ('speakers', 0.09), ('adaptation', 0.084), ('vij', 0.083), ('weinberger', 0.077), ('margin', 0.075), ('share', 0.074), ('insurance', 0.074), ('impostors', 0.073), ('shared', 0.07), ('evgeniou', 0.067), ('avg', 0.067), ('euclidean', 0.064), ('task', 0.063), ('learns', 0.06), ('neighbors', 0.059), ('speech', 0.058), ('svm', 0.056), ('triples', 0.055), ('speaker', 0.055), ('wt', 0.053), ('pooled', 0.05), ('labelincompatible', 0.049), ('multitask', 0.046), ('distance', 0.046), ('semide', 0.045), ('domain', 0.045), ('convex', 0.045), ('yi', 0.044), ('rule', 0.043), ('kilian', 0.043), ('alphabet', 0.043), ('english', 0.042), ('scenario', 0.042), ('labels', 0.042), ('table', 0.04), ('dt', 0.04), ('rates', 0.039), ('clidean', 0.039), ('picked', 0.039), ('machines', 0.039), ('xk', 0.037), ('commonalities', 0.037), ('paradigm', 0.037), ('trace', 0.036), ('utterances', 0.035), ('networks', 0.034), ('labeled', 0.034), ('spoken', 0.033), ('download', 0.033), ('st', 0.033), ('separating', 0.032), ('counterparts', 0.032), ('louis', 0.032), ('classes', 0.032), ('across', 0.031), ('ers', 0.031), ('cation', 0.031), ('characters', 0.031), ('letter', 0.03), ('inputs', 0.029), ('matrices', 0.029), ('training', 0.029), ('neighborhood', 0.028), ('saul', 0.027), ('sdp', 0.027), ('cs', 0.027), ('amongst', 0.027), ('aspect', 0.027), ('dm', 0.026), ('svms', 0.026), ('sets', 0.026), ('optimization', 0.025), ('support', 0.025), ('improves', 0.025), ('elegant', 0.025), ('hyperplane', 0.024), ('trivially', 0.024), ('page', 0.024), ('yk', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 138 nips-2010-Large Margin Multi-Task Metric Learning
Author: Shibin Parameswaran, Kilian Q. Weinberger
Abstract: Multi-task learning (MTL) improves the prediction performance on multiple, different but related, learning problems through shared parameters or representations. One of the most prominent multi-task learning algorithms is an extension to support vector machines (svm) by Evgeniou et al. [15]. Although very elegant, multi-task svm is inherently restricted by the fact that support vector machines require each class to be addressed explicitly with its own weight vector which, in a multi-task setting, requires the different learning tasks to share the same set of classes. This paper proposes an alternative formulation for multi-task learning by extending the recently published large margin nearest neighbor (lmnn) algorithm to the MTL paradigm. Instead of relying on separating hyperplanes, its decision function is based on the nearest neighbor rule which inherently extends to many classes and becomes a natural fit for multi-task learning. We evaluate the resulting multi-task lmnn on real-world insurance data and speech classification problems and show that it consistently outperforms single-task kNN under several metrics and state-of-the-art MTL classifiers. 1
2 0.28886399 104 nips-2010-Generative Local Metric Learning for Nearest Neighbor Classification
Author: Yung-kyun Noh, Byoung-tak Zhang, Daniel D. Lee
Abstract: We consider the problem of learning a local metric to enhance the performance of nearest neighbor classification. Conventional metric learning methods attempt to separate data distributions in a purely discriminative manner; here we show how to take advantage of information from parametric generative models. We focus on the bias in the information-theoretic error arising from finite sampling effects, and find an appropriate local metric that maximally reduces the bias based upon knowledge from generative models. As a byproduct, the asymptotic theoretical analysis in this work relates metric learning with dimensionality reduction, which was not understood from previous discriminative approaches. Empirical experiments show that this learned local metric enhances the discriminative nearest neighbor performance on various datasets using simple class conditional generative models. 1
3 0.23196597 147 nips-2010-Learning Multiple Tasks with a Sparse Matrix-Normal Penalty
Author: Yi Zhang, Jeff G. Schneider
Abstract: In this paper, we propose a matrix-variate normal penalty with sparse inverse covariances to couple multiple tasks. Learning multiple (parametric) models can be viewed as estimating a matrix of parameters, where rows and columns of the matrix correspond to tasks and features, respectively. Following the matrix-variate normal density, we design a penalty that decomposes the full covariance of matrix elements into the Kronecker product of row covariance and column covariance, which characterizes both task relatedness and feature representation. Several recently proposed methods are variants of the special cases of this formulation. To address the overfitting issue and select meaningful task and feature structures, we include sparse covariance selection into our matrix-normal regularization via ℓ1 penalties on task and feature inverse covariances. We empirically study the proposed method and compare with related models in two real-world problems: detecting landmines in multiple fields and recognizing faces between different subjects. Experimental results show that the proposed framework provides an effective and flexible way to model various different structures of multiple tasks.
4 0.19166465 146 nips-2010-Learning Multiple Tasks using Manifold Regularization
Author: Arvind Agarwal, Samuel Gerber, Hal Daume
Abstract: We present a novel method for multitask learning (MTL) based on manifold regularization: assume that all task parameters lie on a manifold. This is the generalization of a common assumption made in the existing literature: task parameters share a common linear subspace. One proposed method uses the projection distance from the manifold to regularize the task parameters. The manifold structure and the task parameters are learned using an alternating optimization framework. When the manifold structure is fixed, our method decomposes across tasks which can be learnt independently. An approximation of the manifold regularization scheme is presented that preserves the convexity of the single task learning problem, and makes the proposed MTL framework efficient and easy to implement. We show the efficacy of our method on several datasets. 1
5 0.15327527 108 nips-2010-Graph-Valued Regression
Author: Han Liu, Xi Chen, Larry Wasserman, John D. Lafferty
Abstract: Undirected graphical models encode in a graph G the dependency structure of a random vector Y . In many applications, it is of interest to model Y given another random vector X as input. We refer to the problem of estimating the graph G(x) of Y conditioned on X = x as “graph-valued regression”. In this paper, we propose a semiparametric method for estimating G(x) that builds a tree on the X space just as in CART (classification and regression trees), but at each leaf of the tree estimates a graph. We call the method “Graph-optimized CART”, or GoCART. We study the theoretical properties of Go-CART using dyadic partitioning trees, establishing oracle inequalities on risk minimization and tree partition consistency. We also demonstrate the application of Go-CART to a meteorological dataset, showing how graph-valued regression can provide a useful tool for analyzing complex data. 1
6 0.12044764 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach
7 0.11830612 124 nips-2010-Inductive Regularized Learning of Kernel Functions
8 0.11000943 209 nips-2010-Pose-Sensitive Embedding by Nonlinear NCA Regression
9 0.1067697 177 nips-2010-Multitask Learning without Label Correspondences
10 0.10054265 203 nips-2010-Parametric Bandits: The Generalized Linear Case
11 0.097864993 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models
12 0.083761603 182 nips-2010-New Adaptive Algorithms for Online Classification
13 0.083058856 228 nips-2010-Reverse Multi-Label Learning
14 0.077305049 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata
15 0.075944312 47 nips-2010-Co-regularization Based Semi-supervised Domain Adaptation
16 0.071662858 135 nips-2010-Label Embedding Trees for Large Multi-Class Tasks
17 0.071461365 217 nips-2010-Probabilistic Multi-Task Feature Selection
18 0.071204573 94 nips-2010-Feature Set Embedding for Incomplete Data
19 0.06792102 222 nips-2010-Random Walk Approach to Regret Minimization
20 0.067683794 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models
topicId topicWeight
[(0, 0.226), (1, 0.074), (2, 0.064), (3, -0.101), (4, 0.095), (5, 0.035), (6, 0.024), (7, -0.074), (8, -0.093), (9, 0.0), (10, 0.055), (11, 0.034), (12, 0.202), (13, -0.073), (14, 0.095), (15, -0.002), (16, 0.015), (17, -0.144), (18, -0.093), (19, -0.1), (20, -0.299), (21, -0.188), (22, 0.007), (23, 0.12), (24, -0.039), (25, -0.061), (26, -0.003), (27, 0.023), (28, 0.01), (29, -0.032), (30, -0.066), (31, 0.062), (32, -0.059), (33, -0.105), (34, 0.096), (35, 0.028), (36, -0.026), (37, 0.003), (38, 0.036), (39, -0.042), (40, -0.146), (41, -0.006), (42, -0.13), (43, 0.075), (44, 0.07), (45, 0.086), (46, -0.031), (47, 0.044), (48, -0.026), (49, -0.047)]
simIndex simValue paperId paperTitle
same-paper 1 0.9314571 138 nips-2010-Large Margin Multi-Task Metric Learning
Author: Shibin Parameswaran, Kilian Q. Weinberger
Abstract: Multi-task learning (MTL) improves the prediction performance on multiple, different but related, learning problems through shared parameters or representations. One of the most prominent multi-task learning algorithms is an extension to support vector machines (svm) by Evgeniou et al. [15]. Although very elegant, multi-task svm is inherently restricted by the fact that support vector machines require each class to be addressed explicitly with its own weight vector which, in a multi-task setting, requires the different learning tasks to share the same set of classes. This paper proposes an alternative formulation for multi-task learning by extending the recently published large margin nearest neighbor (lmnn) algorithm to the MTL paradigm. Instead of relying on separating hyperplanes, its decision function is based on the nearest neighbor rule which inherently extends to many classes and becomes a natural fit for multi-task learning. We evaluate the resulting multi-task lmnn on real-world insurance data and speech classification problems and show that it consistently outperforms single-task kNN under several metrics and state-of-the-art MTL classifiers. 1
2 0.72734803 104 nips-2010-Generative Local Metric Learning for Nearest Neighbor Classification
Author: Yung-kyun Noh, Byoung-tak Zhang, Daniel D. Lee
Abstract: We consider the problem of learning a local metric to enhance the performance of nearest neighbor classification. Conventional metric learning methods attempt to separate data distributions in a purely discriminative manner; here we show how to take advantage of information from parametric generative models. We focus on the bias in the information-theoretic error arising from finite sampling effects, and find an appropriate local metric that maximally reduces the bias based upon knowledge from generative models. As a byproduct, the asymptotic theoretical analysis in this work relates metric learning with dimensionality reduction, which was not understood from previous discriminative approaches. Empirical experiments show that this learned local metric enhances the discriminative nearest neighbor performance on various datasets using simple class conditional generative models. 1
3 0.63425153 146 nips-2010-Learning Multiple Tasks using Manifold Regularization
Author: Arvind Agarwal, Samuel Gerber, Hal Daume
Abstract: We present a novel method for multitask learning (MTL) based on manifold regularization: assume that all task parameters lie on a manifold. This is the generalization of a common assumption made in the existing literature: task parameters share a common linear subspace. One proposed method uses the projection distance from the manifold to regularize the task parameters. The manifold structure and the task parameters are learned using an alternating optimization framework. When the manifold structure is fixed, our method decomposes across tasks which can be learnt independently. An approximation of the manifold regularization scheme is presented that preserves the convexity of the single task learning problem, and makes the proposed MTL framework efficient and easy to implement. We show the efficacy of our method on several datasets. 1
4 0.63279754 147 nips-2010-Learning Multiple Tasks with a Sparse Matrix-Normal Penalty
Author: Yi Zhang, Jeff G. Schneider
Abstract: In this paper, we propose a matrix-variate normal penalty with sparse inverse covariances to couple multiple tasks. Learning multiple (parametric) models can be viewed as estimating a matrix of parameters, where rows and columns of the matrix correspond to tasks and features, respectively. Following the matrix-variate normal density, we design a penalty that decomposes the full covariance of matrix elements into the Kronecker product of row covariance and column covariance, which characterizes both task relatedness and feature representation. Several recently proposed methods are variants of the special cases of this formulation. To address the overfitting issue and select meaningful task and feature structures, we include sparse covariance selection into our matrix-normal regularization via ℓ1 penalties on task and feature inverse covariances. We empirically study the proposed method and compare with related models in two real-world problems: detecting landmines in multiple fields and recognizing faces between different subjects. Experimental results show that the proposed framework provides an effective and flexible way to model various different structures of multiple tasks.
5 0.59987986 177 nips-2010-Multitask Learning without Label Correspondences
Author: Novi Quadrianto, James Petterson, Tibério S. Caetano, Alex J. Smola, S.v.n. Vishwanathan
Abstract: We propose an algorithm to perform multitask learning where each task has potentially distinct label sets and label correspondences are not readily available. This is in contrast with existing methods which either assume that the label sets shared by different tasks are the same or that there exists a label mapping oracle. Our method directly maximizes the mutual information among the labels, and we show that the resulting objective function can be efficiently optimized using existing algorithms. Our proposed approach has a direct application for data integration with different label spaces, such as integrating Yahoo! and DMOZ web directories. 1
6 0.50597852 209 nips-2010-Pose-Sensitive Embedding by Nonlinear NCA Regression
7 0.47233969 287 nips-2010-Worst-Case Linear Discriminant Analysis
8 0.46792138 2 nips-2010-A Bayesian Approach to Concept Drift
9 0.46444511 24 nips-2010-Active Learning Applied to Patient-Adaptive Heartbeat Classification
10 0.46306741 62 nips-2010-Discriminative Clustering by Regularized Information Maximization
11 0.44028407 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach
12 0.42715171 124 nips-2010-Inductive Regularized Learning of Kernel Functions
13 0.42535207 280 nips-2010-Unsupervised Kernel Dimension Reduction
14 0.41751724 99 nips-2010-Gated Softmax Classification
15 0.41509464 158 nips-2010-Learning via Gaussian Herding
16 0.40809691 228 nips-2010-Reverse Multi-Label Learning
17 0.40335354 112 nips-2010-Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning
18 0.40259996 108 nips-2010-Graph-Valued Regression
19 0.39119181 217 nips-2010-Probabilistic Multi-Task Feature Selection
20 0.39109966 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models
topicId topicWeight
[(13, 0.097), (17, 0.016), (27, 0.036), (30, 0.052), (35, 0.014), (41, 0.206), (45, 0.222), (50, 0.051), (52, 0.023), (60, 0.088), (77, 0.048), (78, 0.034), (90, 0.041)]
simIndex simValue paperId paperTitle
1 0.86790651 71 nips-2010-Efficient Relational Learning with Hidden Variable Detection
Author: Ni Lao, Jun Zhu, Liu Xinwang, Yandong Liu, William W. Cohen
Abstract: Markov networks (MNs) can incorporate arbitrarily complex features in modeling relational data. However, this flexibility comes at a sharp price of training an exponentially complex model. To address this challenge, we propose a novel relational learning approach, which consists of a restricted class of relational MNs (RMNs) called relation tree-based RMN (treeRMN), and an efficient Hidden Variable Detection algorithm called Contrastive Variable Induction (CVI). On one hand, the restricted treeRMN only considers simple (e.g., unary and pairwise) features in relational data and thus achieves computational efficiency; and on the other hand, the CVI algorithm efficiently detects hidden variables which can capture long range dependencies. Therefore, the resultant approach is highly efficient yet does not sacrifice its expressive power. Empirical results on four real datasets show that the proposed relational learning method can achieve similar prediction quality as the state-of-the-art approaches, but is significantly more efficient in training; and the induced hidden variables are semantically meaningful and crucial to improve the training speed and prediction qualities of treeRMNs.
2 0.84899408 176 nips-2010-Multiple Kernel Learning and the SMO Algorithm
Author: Zhaonan Sun, Nawanol Ampornpunt, Manik Varma, S.v.n. Vishwanathan
Abstract: Our objective is to train p-norm Multiple Kernel Learning (MKL) and, more generally, linear MKL regularised by the Bregman divergence, using the Sequential Minimal Optimization (SMO) algorithm. The SMO algorithm is simple, easy to implement and adapt, and efficiently scales to large problems. As a result, it has gained widespread acceptance and SVMs are routinely trained using SMO in diverse real world applications. Training using SMO has been a long standing goal in MKL for the very same reasons. Unfortunately, the standard MKL dual is not differentiable, and therefore can not be optimised using SMO style co-ordinate ascent. In this paper, we demonstrate that linear MKL regularised with the p-norm squared, or with certain Bregman divergences, can indeed be trained using SMO. The resulting algorithm retains both simplicity and efficiency and is significantly faster than state-of-the-art specialised p-norm MKL solvers. We show that we can train on a hundred thousand kernels in approximately seven minutes and on fifty thousand points in less than half an hour on a single core. 1
same-paper 3 0.83930123 138 nips-2010-Large Margin Multi-Task Metric Learning
Author: Shibin Parameswaran, Kilian Q. Weinberger
Abstract: Multi-task learning (MTL) improves the prediction performance on multiple, different but related, learning problems through shared parameters or representations. One of the most prominent multi-task learning algorithms is an extension to support vector machines (svm) by Evgeniou et al. [15]. Although very elegant, multi-task svm is inherently restricted by the fact that support vector machines require each class to be addressed explicitly with its own weight vector which, in a multi-task setting, requires the different learning tasks to share the same set of classes. This paper proposes an alternative formulation for multi-task learning by extending the recently published large margin nearest neighbor (lmnn) algorithm to the MTL paradigm. Instead of relying on separating hyperplanes, its decision function is based on the nearest neighbor rule which inherently extends to many classes and becomes a natural fit for multi-task learning. We evaluate the resulting multi-task lmnn on real-world insurance data and speech classification problems and show that it consistently outperforms single-task kNN under several metrics and state-of-the-art MTL classifiers. 1
4 0.78042495 196 nips-2010-Online Markov Decision Processes under Bandit Feedback
Author: Gergely Neu, Andras Antos, András György, Csaba Szepesvári
Abstract: We consider online learning in finite stochastic Markovian environments where in each time step a new reward function is chosen by an oblivious adversary. The goal of the learning agent is to compete with the best stationary policy in terms of the total reward received. In each time step the agent observes the current state and the reward associated with the last transition, however, the agent does not observe the rewards associated with other state-action pairs. The agent is assumed to know the transition probabilities. The state of the art result for this setting is a no-regret algorithm. In this paper we propose a new learning algorithm and, assuming that stationary policies mix uniformly fast, we show that after T time steps, the expected regret of the new algorithm is O T 2/3 (ln T )1/3 , giving the first rigorously proved regret bound for the problem. 1
5 0.77601606 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models
Author: Armand Joulin, Jean Ponce, Francis R. Bach
Abstract: Dimensionality reduction is commonly used in the setting of multi-label supervised classification to control the learning capacity and to provide a meaningful representation of the data. We introduce a simple forward probabilistic model which is a multinomial extension of reduced rank regression, and show that this model provides a probabilistic interpretation of discriminative clustering methods with added benefits in terms of number of hyperparameters and optimization. While the expectation-maximization (EM) algorithm is commonly used to learn these probabilistic models, it usually leads to local maxima because it relies on a non-convex cost function. To avoid this problem, we introduce a local approximation of this cost function, which in turn leads to a quadratic non-convex optimization problem over a product of simplices. In order to maximize quadratic functions, we propose an efficient algorithm based on convex relaxations and lowrank representations of the data, capable of handling large-scale problems. Experiments on text document classification show that the new model outperforms other supervised dimensionality reduction methods, while simulations on unsupervised clustering show that our probabilistic formulation has better properties than existing discriminative clustering methods. 1
6 0.77430344 265 nips-2010-The LASSO risk: asymptotic results and real world examples
7 0.77078134 261 nips-2010-Supervised Clustering
8 0.77044916 222 nips-2010-Random Walk Approach to Regret Minimization
9 0.76924217 63 nips-2010-Distributed Dual Averaging In Networks
10 0.76912278 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery
11 0.76899546 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models
12 0.76845551 36 nips-2010-Avoiding False Positive in Multi-Instance Learning
13 0.76712549 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices
14 0.766074 219 nips-2010-Random Conic Pursuit for Semidefinite Programming
15 0.76595736 7 nips-2010-A Family of Penalty Functions for Structured Sparsity
16 0.76591295 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning
17 0.76465678 210 nips-2010-Practical Large-Scale Optimization for Max-norm Regularization
18 0.76453018 31 nips-2010-An analysis on negative curvature induced by singularity in multi-layer neural-network learning
19 0.76331967 287 nips-2010-Worst-Case Linear Discriminant Analysis
20 0.76318318 12 nips-2010-A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups