nips nips2007 nips2007-134 knowledge-graph by maker-knowledge-mining

134 nips-2007-Multi-Task Learning via Conic Programming


Source: pdf

Author: Tsuyoshi Kato, Hisashi Kashima, Masashi Sugiyama, Kiyoshi Asai

Abstract: When we have several related tasks, solving them simultaneously is shown to be more effective than solving them individually. This approach is called multi-task learning (MTL) and has been studied extensively. Existing approaches to MTL often treat all the tasks as uniformly related to each other and the relatedness of the tasks is controlled globally. For this reason, the existing methods can lead to undesired solutions when some tasks are not highly related to each other, and some pairs of related tasks can have significantly different solutions. In this paper, we propose a novel MTL algorithm that can overcome these problems. Our method makes use of a task network, which describes the relation structure among tasks. This allows us to deal with intricate relation structures in a systematic way. Furthermore, we control the relatedness of the tasks locally, so all pairs of related tasks are guaranteed to have similar solutions. We apply the above idea to support vector machines (SVMs) and show that the optimization problem can be cast as a second order cone program, which is convex and can be solved efficiently. The usefulness of our approach is demonstrated through simulations with protein super-family classification and ordinal regression problems.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Existing approaches to MTL often treat all the tasks as uniformly related to each other and the relatedness of the tasks is controlled globally. [sent-13, score-0.62]

2 For this reason, the existing methods can lead to undesired solutions when some tasks are not highly related to each other, and some pairs of related tasks can have significantly different solutions. [sent-14, score-0.703]

3 Our method makes use of a task network, which describes the relation structure among tasks. [sent-16, score-0.283]

4 This allows us to deal with intricate relation structures in a systematic way. [sent-17, score-0.151]

5 Furthermore, we control the relatedness of the tasks locally, so all pairs of related tasks are guaranteed to have similar solutions. [sent-18, score-0.651]

6 We apply the above idea to support vector machines (SVMs) and show that the optimization problem can be cast as a second order cone program, which is convex and can be solved efficiently. [sent-19, score-0.174]

7 The usefulness of our approach is demonstrated through simulations with protein super-family classification and ordinal regression problems. [sent-20, score-0.516]

8 1 Introduction: In many practical situations, a classification task can often be divided into related sub-tasks. [sent-21, score-0.246]

9 Typically, the ‘relatedness’ among tasks is implemented by imposing the solutions of related tasks to be similar. [sent-27, score-0.614]

10 The second problem is that the related tasks are often imposed to be close in the sense that the sum of the distances between solutions over all pairs of related tasks is upper-bounded [8] (which is often referred to as the global constraint [10]). [sent-33, score-0.827]

11 This implies that all the solutions of related tasks are not necessarily close, but some can be quite different. [sent-34, score-0.348]

12 We settle the first issue by making use of a task network that describes the relation structure among tasks. [sent-36, score-0.351]

13 This enables us to deal with intricate relation structures in a systematic way. [sent-37, score-0.151]

14 We solve the second problem by directly upper-bounding each distance between the solutions of related task pairs (which we call local constraints). [sent-38, score-0.333]

15 We apply this idea in the framework of support vector machines (SVMs) and show that linear SVMs can be trained via a second order cone program (SOCP) [3] in the primal. [sent-39, score-0.135]

16 We further show that the kernelized version of the proposed method can be formulated as a matrix-fractional program (MFP) [3] in the dual, which can be again cast as an SOCP; thus the optimization problem of the kernelized variant is still convex and the global solution can be computed efficiently. [sent-41, score-0.369]

17 Through experiments with artificial and real-world protein super-family classification data sets, we show that the proposed MTL method compares favorably with existing MTL methods. [sent-42, score-0.175]

18 We further test the performance of the proposed approach in ordinal regression scenarios [9], where the goal is to predict ordinal class labels such as users’ preferences (‘like’/‘neutral’/‘dislike’) or students’ grades (from ‘A’ to ‘F’). [sent-43, score-0.847]

19 The ordinal regression problems can be formulated as a set of one-versus-one classification problems. [sent-44, score-0.489]

20 In ordinal regression, the relatedness among tasks is highly structured. [sent-49, score-0.711]

21 Our experiments demonstrate that the proposed method is also useful in the ordinal regression scenarios and tends to outperform existing approaches [9, 8]. 2 Problem Setting: In this section, we formulate the MTL problem. [sent-57, score-0.567]

22 Let {(x_t, y_t)}_{t=1}^{ℓ} be the training set, where x_t ∈ X and y_t ∈ {±1} for t = 1, …, ℓ. [sent-60, score-0.128]

23 Each data sample (x_t, y_t) has its target task; we denote the set of sample indices of the i-th task by I_i. [sent-64, score-0.209]

24 The goal is to learn the score function of each classification task: f_i(x; w_i, b_i) = w_iᵀ x + b_i, for i = 1, …, M. [sent-68, score-0.386]

25 Here w_i ∈ R^d and b_i ∈ R are the model parameters of the i-th task. [sent-71, score-0.193]

26 The task network describes the relationships among tasks, where each node represents a task and two nodes are connected by an edge if they are related to each other (footnote 1). [sent-73, score-0.563]

27 1 Basic Idea: When the relation among tasks is not available, we may just solve M penalized fitting problems individually: minimize (1/2)‖w_i‖² + C_α Σ_{t∈I_i} Hinge(f_i(x_t; w_i, b_i), y_t), for i = 1, …, M. (1) [sent-77, score-0.701]

28 This individual approach tends to perform poorly if the number of training samples in each task is limited—the performance is expected to be improved if more training samples are available. [sent-81, score-0.145]

29 Here, we can exploit the information of the task network. [sent-82, score-0.145]

30 A naive idea would be to use the training samples of neighboring tasks in the task network for solving the target fitting problem. [sent-88, score-0.47]

31 (Footnote 1) More generally, the tasks can be related in an inhomogeneous way. [sent-83, score-0.325]

32 That is, the strength of the relationship among tasks can be dependent on tasks. [sent-85, score-0.266]

33 However, this does not fully make use of the network structure, since many other tasks are connected only indirectly via paths on the network. [sent-89, score-0.362]

34 To cope with this problem, we take another approach here, which is based on the expectation that the solutions of related tasks are close to each other. [sent-90, score-0.348]

35 More specifically, we impose the following constraint on the optimization problem (1): (1/2)‖w_{i_k} − w_{j_k}‖² ≤ ρ, for ∀k = 1, …, K. (2) [sent-91, score-0.195]

36 Namely, we upper-bound each difference between the solutions of related task pairs by a positive scalar ρ ∈ R_+. [sent-95, score-0.298]

37 We refer to this constraint as the local constraint, following [10]. [sent-96, score-0.169]

38 Note that we do not impose a constraint on the bias parameter b_i since the bias could be significantly different even among related tasks. [sent-97, score-0.253]

39 The constraint (2) allows us to implicitly increase the number of training samples over the task network in a systematic way through the solutions of related tasks. [sent-98, score-0.466]

40 We combine (1) and (2) as: minimize (1/(2M)) Σ_{i=1}^{M} ‖w_i‖² + C_α Σ_{i=1}^{M} Σ_{t∈I_i} Hinge(f_i(x_t; θ), y_t) + C_ρ ρ, (3) where C_ρ is a positive trade-off parameter. [sent-100, score-0.175]

41 This is subject to (1/2)‖w_{i_k} − w_{j_k}‖² ≤ ρ, ∀k, and y_t(w_iᵀ x_t + b_i) ≥ 1 − ξ^α_t, ∀t ∈ I_i, ∀i, where ξ^α ≡ [ξ^α_1, …, ξ^α_ℓ]ᵀ. [sent-104, score-0.305]
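The objective (3) and the constraints above can be written down almost verbatim in an off-the-shelf convex modeling tool. The following is a minimal sketch using CVXPY with made-up toy data; the names (M, d, edges, C_alpha, C_rho) and the generic solver call are illustrative assumptions and not the authors' implementation, which uses the dedicated SOCP form of Problem 2.

    import numpy as np
    import cvxpy as cp

    # Hypothetical toy setup: M tasks with d-dimensional features, per-task data
    # (X_i, y_i), and a task network given as a list of related pairs (i_k, j_k).
    M, d = 3, 5
    rng = np.random.default_rng(0)
    data = [(rng.standard_normal((20, d)), rng.choice([-1.0, 1.0], 20)) for _ in range(M)]
    edges = [(0, 1), (1, 2)]              # task network (0-indexed)
    C_alpha, C_rho = 1.0, 1.0             # trade-off parameters, chosen arbitrarily

    W = cp.Variable((M, d))               # one weight vector per task
    b = cp.Variable(M)                    # per-task bias, not tied across tasks
    rho = cp.Variable(nonneg=True)        # shared upper bound in the local constraints

    hinge = 0
    for i, (X, y) in enumerate(data):
        margins = cp.multiply(y, X @ W[i] + b[i])
        hinge = hinge + cp.sum(cp.pos(1 - margins))

    objective = cp.Minimize(cp.sum_squares(W) / (2 * M) + C_alpha * hinge + C_rho * rho)
    # Local constraints (2): every related task pair must have close weight vectors.
    constraints = [0.5 * cp.sum_squares(W[i] - W[j]) <= rho for i, j in edges]
    cp.Problem(objective, constraints).solve()

Making C_rho very large pushes the related weight vectors together, while C_rho = 0 leaves ρ unpenalized and recovers independent per-task SVMs, which matches the two extremes discussed later in this summary.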

42 2 Primal MTL Learning by SOCP: The second order cone program (SOCP) is a class of convex programs minimizing a linear function over an intersection of second-order cones [3]; see Problem 2 in the paper. [sent-112, score-0.128]

43 4 Local MTL with Task Network: Kernelization. The previous section showed that a linear version of the proposed MTL method can be cast as an SOCP. [sent-128, score-0.132]

44 1 Dual Formulation: Let K^fea be a positive semidefinite matrix with the (s, t)-th element being the inner product of the feature vectors x_s and x_t: K^fea_{s,t} ≡ ⟨x_s, x_t⟩. [sent-132, score-0.275]

45 Let Z ∈ N^{M×ℓ} be the indicator of a task and a sample such that Z_{i,t} = 1 if t ∈ I_i and Z_{i,t} = 0 otherwise. [sent-137, score-0.145]

46 Then the information about the tasks is expressed by the ℓ × ℓ kernel matrix Zᵀ K^net(λ) Z. [sent-138, score-0.315]

47 We integrate the two kernel matrices K^fea and Zᵀ K^net(λ) Z by K^int(λ) ≡ K^fea ◦ (Zᵀ K^net(λ) Z), (6) where ◦ denotes the Hadamard product (a.k.a. the element-wise product). [sent-139, score-0.268]
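As a concrete reading of Eq. (6), here is a small numpy sketch that assembles the integrated kernel from a feature Gram matrix, a task kernel, and the task-membership indicator Z of sentence 45; the function and variable names are illustrative assumptions, not the authors' code.

    import numpy as np

    def integrated_kernel(K_fea, K_net, task_of_sample):
        # K_int = K_fea o (Z^T K_net Z): Hadamard product of the feature kernel over
        # samples and the task kernel lifted to sample pairs via Z, cf. Eq. (6).
        M = K_net.shape[0]
        ell = K_fea.shape[0]
        Z = np.zeros((M, ell))
        Z[task_of_sample, np.arange(ell)] = 1.0   # Z[i, t] = 1 iff sample t belongs to task i
        return K_fea * (Z.T @ K_net @ Z)

    # Hypothetical usage with a linear feature kernel and an arbitrary PSD task kernel.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((6, 3))
    K_fea = X @ X.T
    K_net = np.eye(2) + 0.5 * np.ones((2, 2))
    K_int = integrated_kernel(K_fea, K_net, np.array([0, 0, 0, 1, 1, 1]))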

48 Based on the above notations, the dual formulation of Problem 1 can be expressed using the parameterized integrated kernel matrix K^int(λ) as follows: Problem 3. [sent-143, score-0.143]

49 Changing the definition of K^fea from the linear kernel to an arbitrary kernel, we can extend the proposed linear MTL method to non-linear domains. [sent-148, score-0.349]

50 Furthermore, we can also deal with nonvectorial structured data by employing a suitable kernel such as the string kernel and the Fisher kernel. [sent-149, score-0.178]

51 In the test stage, a new sample x in the j-th task is classified by f_j(x) = Σ_{t=1}^{ℓ} Σ_{i=1}^{M} α_t y_t k^fea(x_t, x) k^net(i, j) Z_{i,t} + b_j, (8) where k^fea(·, ·) and k^net(·, ·) are the kernel functions over features and tasks, respectively. [sent-150, score-0.658]
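A literal transcription of the test-stage score (8) in numpy, assuming that α, b, and the per-sample task indices are already available from training; since Z_{i,t} picks out the task of sample t, the double sum collapses to a single sum over training samples. The helper names are hypothetical.

    import numpy as np

    def predict_score(x, j, X_train, y_train, task_of_sample, alpha, b, k_fea, K_net):
        # f_j(x) = sum_t alpha_t y_t k_fea(x_t, x) k_net(i_t, j) + b_j, cf. Eq. (8),
        # where i_t is the task of training sample t (the Z_{i,t} factor in the paper).
        k_vec = np.array([k_fea(x_t, x) for x_t in X_train])
        net_vec = K_net[task_of_sample, j]          # k_net(i_t, j) for every training sample t
        return float(np.sum(alpha * y_train * k_vec * net_vec) + b[j])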

52 Here S^n_+ and S^n_{++} denote the positive semidefinite cone and the strictly positive definite cone of n × n matrices, respectively. [sent-157, score-0.182]

53 Let us re-define d as the rank of the feature kernel matrix K^fea. [sent-158, score-0.268]

54 We introduce a matrix V^fea ∈ R^{ℓ×d} which decomposes the feature kernel matrix as K^fea = V^fea (V^fea)ᵀ. [sent-159, score-0.447]

55 Define the ℓ-dimensional vector f_h ∈ R^ℓ of the h-th feature such that V^fea ≡ [f_1, …, f_d]. [sent-160, score-0.333]

56 Let us further introduce the vector v_k ∈ R^M for each edge: v_k = e_{i_k} − e_{j_k}, where e_{i_k} is a unit vector with the i_k-th element being one. [sent-169, score-0.164]

57 Then we can re-express the graph Laplacian matrix of tasks as U_λ = V^lap diag(λ) (V^lap)ᵀ. [sent-174, score-0.293]
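Sentences 56 and 57 describe the weighted graph Laplacian of the task network through an incidence-style factorization. A small numpy sketch of that construction follows; the function and variable names are illustrative.

    import numpy as np

    def weighted_task_laplacian(M, edges, lam):
        # U_lambda = V_lap diag(lambda) V_lap^T, with one column v_k = e_{i_k} - e_{j_k}
        # per edge (i_k, j_k) of the task network.
        V = np.zeros((M, len(edges)))
        for k, (i, j) in enumerate(edges):
            V[i, k], V[j, k] = 1.0, -1.0
        return V @ np.diag(lam) @ V.T

    # Hypothetical chain of four tasks with unit edge weights.
    U = weighted_task_laplacian(4, [(0, 1), (1, 2), (2, 3)], np.ones(3))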

58 5 Discussion: In this section, we discuss the properties of the proposed MTL method and the relation to existing methods. [sent-182, score-0.18]

59 MTL with Common Bias: A possible variant of the proposed MTL method would be to share a common bias parameter among all tasks (i.e., b_1 = … = b_M). [sent-183, score-0.347]

60 The idea is expected to be useful particularly when the number of samples in each task is very small. [sent-186, score-0.145]

61 We can also apply the common bias idea in the kernelized version just by replacing the constraint Z diag(y) α = 0_M in Problem 3 with yᵀα = 0. [sent-187, score-0.161]

62 Local Constraints: Micchelli and Pontil [8] have proposed a related MTL method which upper-bounds the sum of the differences over the K related task pairs, i.e., Σ_{k=1}^{K} ‖w_{i_k} − w_{j_k}‖² ≤ ρ. [sent-189, score-0.358]

63 This global constraint can also have a similar effect to our local constraint (2). [sent-193, score-0.347]

64 That is, the related task pairs tend to have close solutions. [sent-195, score-0.242]

65 However, the global constraint can allow some of the distances to be large since only the sum is upper-bounded. [sent-196, score-0.156]

66 We note that the idea of local constraints is also used in the kernel learning problem [10]. [sent-198, score-0.159]

67 Indeed, when the number of tasks is one, Problem 3 is reduced to the standard SVM optimization problem. [sent-200, score-0.263]

68 Ordinal Regression: As we mentioned in Section 1, MTL approaches are useful in ordinal regression problems. [sent-202, score-0.454]

69 Ordinal regression is a task of learning multiple quantiles, which can be formulated as a set of one-versus-one classification problems. [sent-203, score-0.291]

70 A naive approach to ordinal regression is to individually train M SVMs with score functions f_i(x) = ⟨w_i, x⟩ + b_i, i = 1, …, M. [sent-204, score-0.714]

71 Each subfigure contains the 10-th, 30-th, 50-th, 70-th, and 90-th tasks in the top row and the 110-th, 130-th, 150-th, 170-th, and 190-th tasks in the bottom row. [sent-369, score-0.452]

72 Shashua and Levin [9] proposed an ordinal regression method called support vector ordinal regression (SVOR), where the weight vectors are shared by all SVMs (i.e., w_1 = … = w_M). [sent-370, score-0.989]

73 The proposed MTL method can be naturally employed in ordinal regression by constraining the weight vectors as ‖w_i − w_{i+1}‖² ≤ ρ, i = 1, …, M−1. [sent-373, score-0.646]

74 That is, the task network only has a weight between consecutive tasks. [sent-378, score-0.273]
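For ordinal regression the task network therefore reduces to a chain, so the local constraints only tie consecutive weight vectors. A minimal sketch of that construction, reusing the CVXPY variables W and rho from the earlier primal sketch (both names are the same illustrative assumptions):

    # Chain-structured task network: with M quantile tasks, only pairs (i, i+1) are related.
    M = 4                                          # e.g., five ordered labels give four thresholds
    edges = [(i, i + 1) for i in range(M - 1)]
    # constraints = [0.5 * cp.sum_squares(W[i] - W[j]) <= rho for i, j in edges]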

75 This method actually includes the above two ordinal regression approaches as special cases. [sent-379, score-0.485]

76 Cρ = 0 (i.e., ignoring the task network) yields the independent training of SVMs, and Cρ = ∞ (i.e., forcing the weight vectors of related tasks to coincide) yields SVOR. [sent-381, score-0.145]

77 Thus, in the context of ordinal regression, the proposed method smoothly bridges two extremes and allows us to control the belief of task constraints. [sent-384, score-0.569]

78 1 Toy Multiple Classification Tasks: First, we illustrate how the proposed method behaves using a 2-dimensional toy data set, which includes 200 tasks (see Figure 1(a)). [sent-387, score-0.339]

79 Each task possesses a circular-shaped classification boundary with different centers and a fixed radius 0. [sent-388, score-0.145]

80 The location of the center in the i-th task is (−1 + 0. [sent-390, score-0.145]

81 We construct a task network where consecutive tasks are connected in a circular manner. [sent-394, score-0.59]

82 That is, the edges are …, (99, 100), and (100, 1) for the first 100 tasks, and (101, 102), (102, 103), … for the last 100 tasks. [sent-399, score-0.226]
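The circular construction for the toy tasks could look like the following sketch (0-indexed; the helper name is hypothetical):

    def ring_edges(start, size):
        # Connect consecutive tasks and close the ring: (start, start+1), ..., (start+size-1, start).
        return [(start + t, start + (t + 1) % size) for t in range(size)]

    # Two rings of 100 tasks each, matching the description above.
    edges = ring_edges(0, 100) + ring_edges(100, 100)
    assert (99, 0) in edges and (199, 100) in edges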

83 However, the result is still not so reliable since non-consecutive unrelated tasks heavily damage the solutions. [sent-406, score-0.226]

84 Thus, given an appropriate task network, the proposed MTL-SVM(local/network) can effectively exploit information of the related tasks. [sent-408, score-0.261]

85 2 Protein Super-Family Classification: Next, we test the performance of the proposed method with real-world protein super-family classification problems. [sent-471, score-0.143]

86 We use RBF kernels, where the kernel width σ_rbf is set to the average of the squared distances to the fifth nearest neighbors. [sent-476, score-0.18]
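A small numpy sketch of this kernel-width heuristic, i.e., averaging the squared distance to each point's fifth nearest neighbor; how σ_rbf then enters the RBF kernel (e.g., exp(-d²/σ_rbf)) is an assumption, since the exact parameterization is not spelled out in this summary.

    import numpy as np

    def rbf_width_from_knn(X, k=5):
        # Average squared distance from each point to its k-th nearest neighbor.
        D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        D2_sorted = np.sort(D2, axis=1)        # column 0 is the zero self-distance
        return float(np.mean(D2_sorted[:, k]))

    # sigma_rbf = rbf_width_from_knn(features); K = np.exp(-pairwise_sq_dists / sigma_rbf)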

87 The task networks are constructed as follows: if the positive super-family or the negative super-family is common to two tasks, the two tasks are regarded as a related task pair and connected by an edge. [sent-487, score-0.648]
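The rule stated here translates into a simple pairwise scan over tasks; the sketch below assumes each task is summarized by a (positive super-family, negative super-family) pair, which is an illustrative encoding rather than the authors' actual data format.

    from itertools import combinations

    def superfamily_task_network(task_labels):
        # Connect two tasks whenever they share the positive or the negative super-family.
        edges = []
        for (a, la), (b, lb) in combinations(enumerate(task_labels), 2):
            if la[0] == lb[0] or la[1] == lb[1]:
                edges.append((a, b))
        return edges

    # e.g., three tasks defined by (positive, negative) super-family identifiers
    edges = superfamily_task_network([("SF1", "SF2"), ("SF1", "SF3"), ("SF4", "SF3")])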

88 One-SVM regards the multiple tasks as one big task and learns the big task at once by a standard SVM. [sent-489, score-0.516]

89 In this simulation, the task network is constructed rather heuristically. [sent-496, score-0.244]

90 Even so, the proposed MTL-SVM(local/network) is shown to significantly outperform MTL-SVM(global/full), which does not use the network structure. [sent-497, score-0.149]

91 This implies that the proposed method still works well even when the task network contains small errors. [sent-498, score-0.325]

92 It is interesting to note that MTL-SVM(global/network) actually does not work well in this simulation, implying that the task relatedness is not properly controlled by the global constraint. [sent-499, score-0.297]

93 3 Ordinal Regression As discussed in Section 5, MTL methods are useful in ordinal regression. [sent-502, score-0.343]

94 Here we create five ordinal regression data sets as described in Table 2; all the data sets are originally regression problems and the output values are divided into five quantiles. [sent-503, score-0.6]

95 Therefore, the overall task can be divided into four isolated classification tasks, each of which estimates a quantile. [sent-504, score-0.18]

96 Table 2: The accuracy of each method in ordinal regression tasks. [sent-511, score-0.485]

97 The averaged performance over five runs is described in Table 2, showing that the proposed MTL-SVM(local/network) is also promising in ordinal regression scenarios. [sent-562, score-0.504]

98 7 Conclusions: In this paper, we proposed a new multi-task learning method, which overcomes the limitation of existing approaches by making use of a task network and local constraints. [sent-563, score-0.392]

99 We demonstrated through simulations that the proposed method is useful in multi-task learning scenarios; moreover, it also works excellently in ordinal regression scenarios. [sent-564, score-0.535]

100 Other possible future work includes the elucidation of the entire regularization path and the application to learning from multiple networks; developing algorithms for learning probabilistic models with a task network is also a promising direction to be explored. [sent-569, score-0.244]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mtl', 0.51), ('ordinal', 0.343), ('socp', 0.285), ('tasks', 0.226), ('fea', 0.179), ('fh', 0.154), ('knet', 0.154), ('task', 0.145), ('svms', 0.122), ('regression', 0.111), ('wi', 0.111), ('kfea', 0.103), ('relatedness', 0.102), ('network', 0.099), ('cone', 0.091), ('kernel', 0.089), ('bi', 0.082), ('kint', 0.077), ('mfp', 0.077), ('socps', 0.077), ('svor', 0.077), ('classi', 0.071), ('constraint', 0.067), ('lap', 0.067), ('wjk', 0.067), ('relation', 0.067), ('related', 0.066), ('programs', 0.065), ('diag', 0.064), ('yt', 0.064), ('hinge', 0.064), ('protein', 0.062), ('ik', 0.062), ('jk', 0.062), ('wik', 0.061), ('rm', 0.059), ('solutions', 0.056), ('circular', 0.054), ('kernelized', 0.054), ('dual', 0.054), ('semide', 0.053), ('rbf', 0.052), ('asai', 0.051), ('eik', 0.051), ('intricate', 0.051), ('kd', 0.051), ('mtlsvm', 0.051), ('scop', 0.051), ('uh', 0.051), ('vfea', 0.051), ('vlap', 0.051), ('neutral', 0.051), ('cast', 0.051), ('global', 0.05), ('proposed', 0.05), ('tokyo', 0.049), ('kernels', 0.049), ('sn', 0.048), ('xt', 0.048), ('boundaries', 0.046), ('shashua', 0.045), ('dislike', 0.045), ('program', 0.044), ('fold', 0.043), ('ii', 0.041), ('cation', 0.041), ('micchelli', 0.041), ('wm', 0.041), ('among', 0.04), ('bias', 0.04), ('distances', 0.039), ('binding', 0.038), ('yair', 0.038), ('reduced', 0.037), ('connected', 0.037), ('multitask', 0.036), ('fth', 0.036), ('amino', 0.036), ('rk', 0.036), ('formulated', 0.035), ('constraints', 0.035), ('local', 0.035), ('divided', 0.035), ('acid', 0.034), ('wrt', 0.034), ('individually', 0.034), ('naive', 0.033), ('systematic', 0.033), ('toy', 0.032), ('existing', 0.032), ('solved', 0.032), ('overcomes', 0.031), ('solvers', 0.031), ('pairs', 0.031), ('edge', 0.031), ('method', 0.031), ('folds', 0.03), ('saul', 0.03), ('regarded', 0.029), ('consecutive', 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 134 nips-2007-Multi-Task Learning via Conic Programming

Author: Tsuyoshi Kato, Hisashi Kashima, Masashi Sugiyama, Kiyoshi Asai

Abstract: When we have several related tasks, solving them simultaneously is shown to be more effective than solving them individually. This approach is called multi-task learning (MTL) and has been studied extensively. Existing approaches to MTL often treat all the tasks as uniformly related to each other and the relatedness of the tasks is controlled globally. For this reason, the existing methods can lead to undesired solutions when some tasks are not highly related to each other, and some pairs of related tasks can have significantly different solutions. In this paper, we propose a novel MTL algorithm that can overcome these problems. Our method makes use of a task network, which describes the relation structure among tasks. This allows us to deal with intricate relation structures in a systematic way. Furthermore, we control the relatedness of the tasks locally, so all pairs of related tasks are guaranteed to have similar solutions. We apply the above idea to support vector machines (SVMs) and show that the optimization problem can be cast as a second order cone program, which is convex and can be solved efficiently. The usefulness of our approach is demonstrated through simulations with protein super-family classification and ordinal regression problems.

2 0.34139207 175 nips-2007-Semi-Supervised Multitask Learning

Author: Qiuhua Liu, Xuejun Liao, Lawrence Carin

Abstract: A semi-supervised multitask learning (MTL) framework is presented, in which M parameterized semi-supervised classifiers, each associated with one of M partially labeled data manifolds, are learned jointly under the constraint of a softsharing prior imposed over the parameters of the classifiers. The unlabeled data are utilized by basing classifier learning on neighborhoods, induced by a Markov random walk over a graph representation of each manifold. Experimental results on real data sets demonstrate that semi-supervised MTL yields significant improvements in generalization performance over either semi-supervised single-task learning (STL) or supervised MTL. 1

3 0.20229171 23 nips-2007-An Analysis of Convex Relaxations for MAP Estimation

Author: Pawan Mudigonda, Vladimir Kolmogorov, Philip Torr

Abstract: The problem of obtaining the maximum a posteriori estimate of a general discrete random field (i.e. a random field defined using a finite and discrete set of labels) is known to be NP-hard. However, due to its central importance in many applications, several approximate algorithms have been proposed in the literature. In this paper, we present an analysis of three such algorithms based on convex relaxations: (i) LP - S: the linear programming (LP) relaxation proposed by Schlesinger [20] for a special case and independently in [4, 12, 23] for the general case; (ii) QP - RL: the quadratic programming (QP) relaxation by Ravikumar and Lafferty [18]; and (iii) SOCP - MS: the second order cone programming (SOCP) relaxation first proposed by Muramatsu and Suzuki [16] for two label problems and later extended in [14] for a general label set. We show that the SOCP - MS and the QP - RL relaxations are equivalent. Furthermore, we prove that despite the flexibility in the form of the constraints/objective function offered by QP and SOCP, the LP - S relaxation strictly dominates (i.e. provides a better approximation than) QP - RL and SOCP - MS. We generalize these results by defining a large class of SOCP (and equivalent QP) relaxations which is dominated by the LP - S relaxation. Based on these results we propose some novel SOCP relaxations which strictly dominate the previous approaches.

4 0.11619104 126 nips-2007-McRank: Learning to Rank Using Multiple Classification and Gradient Boosting

Author: Ping Li, Qiang Wu, Christopher J. Burges

Abstract: We cast the ranking problem as (1) multiple classification (“Mc”) (2) multiple ordinal classification, which lead to computationally tractable learning algorithms for relevance ranking in Web search. We consider the DCG criterion (discounted cumulative gain), a standard quality measure in information retrieval. Our approach is motivated by the fact that perfect classifications result in perfect DCG scores and the DCG errors are bounded by classification errors. We propose using the Expected Relevance to convert class probabilities into ranking scores. The class probabilities are learned using a gradient boosting tree algorithm. Evaluations on large-scale datasets show that our approach can improve LambdaRank [5] and the regressions-based ranker [6], in terms of the (normalized) DCG scores. An efficient implementation of the boosting tree algorithm is also presented.

5 0.11367887 135 nips-2007-Multi-task Gaussian Process Prediction

Author: Edwin V. Bonilla, Kian M. Chai, Christopher Williams

Abstract: In this paper we investigate multi-task learning in the context of Gaussian Processes (GP). We propose a model that learns a shared covariance function on input-dependent features and a “free-form” covariance matrix over tasks. This allows for good flexibility when modelling inter-task dependencies while avoiding the need for large amounts of data for training. We show that under the assumption of noise-free observations and a block design, predictions for a given task only depend on its target values and therefore a cancellation of inter-task transfer occurs. We evaluate the benefits of our model on two practical applications: a compiler performance prediction problem and an exam score prediction task. Additionally, we make use of GP approximations and properties of our model in order to provide scalability to large data sets. 1

6 0.10321877 190 nips-2007-Support Vector Machine Classification with Indefinite Kernels

7 0.095739096 11 nips-2007-A Risk Minimization Principle for a Class of Parzen Estimators

8 0.092329107 12 nips-2007-A Spectral Regularization Framework for Multi-Task Structure Learning

9 0.082322568 65 nips-2007-DIFFRAC: a discriminative and flexible framework for clustering

10 0.07994362 62 nips-2007-Convex Learning with Invariances

11 0.071706437 146 nips-2007-On higher-order perceptron algorithms

12 0.071324594 160 nips-2007-Random Features for Large-Scale Kernel Machines

13 0.070397839 207 nips-2007-Transfer Learning using Kolmogorov Complexity: Basic Theory and Empirical Evaluations

14 0.069009893 112 nips-2007-Learning Monotonic Transformations for Classification

15 0.067444079 109 nips-2007-Kernels on Attributed Pointsets with Applications

16 0.066869907 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine

17 0.06566707 99 nips-2007-Hierarchical Penalization

18 0.063939378 148 nips-2007-Online Linear Regression and Its Application to Model-Based Reinforcement Learning

19 0.061712395 97 nips-2007-Hidden Common Cause Relations in Relational Learning

20 0.060984913 71 nips-2007-Discriminative Keyword Selection Using Support Vector Machines


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.216), (1, 0.032), (2, -0.108), (3, 0.125), (4, -0.005), (5, -0.007), (6, 0.084), (7, -0.022), (8, -0.012), (9, 0.055), (10, 0.077), (11, -0.129), (12, 0.034), (13, 0.12), (14, -0.04), (15, -0.025), (16, -0.169), (17, -0.125), (18, 0.18), (19, 0.01), (20, -0.167), (21, 0.147), (22, -0.041), (23, -0.028), (24, 0.154), (25, 0.046), (26, 0.129), (27, 0.347), (28, -0.063), (29, 0.151), (30, -0.013), (31, 0.147), (32, 0.094), (33, -0.018), (34, 0.098), (35, 0.136), (36, -0.095), (37, -0.052), (38, -0.081), (39, -0.098), (40, 0.09), (41, 0.099), (42, 0.076), (43, -0.015), (44, -0.03), (45, 0.055), (46, 0.063), (47, -0.133), (48, -0.068), (49, -0.037)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94831884 134 nips-2007-Multi-Task Learning via Conic Programming

Author: Tsuyoshi Kato, Hisashi Kashima, Masashi Sugiyama, Kiyoshi Asai

Abstract: When we have several related tasks, solving them simultaneously is shown to be more effective than solving them individually. This approach is called multi-task learning (MTL) and has been studied extensively. Existing approaches to MTL often treat all the tasks as uniformly related to each other and the relatedness of the tasks is controlled globally. For this reason, the existing methods can lead to undesired solutions when some tasks are not highly related to each other, and some pairs of related tasks can have significantly different solutions. In this paper, we propose a novel MTL algorithm that can overcome these problems. Our method makes use of a task network, which describes the relation structure among tasks. This allows us to deal with intricate relation structures in a systematic way. Furthermore, we control the relatedness of the tasks locally, so all pairs of related tasks are guaranteed to have similar solutions. We apply the above idea to support vector machines (SVMs) and show that the optimization problem can be cast as a second order cone program, which is convex and can be solved efficiently. The usefulness of our approach is demonstrated through simulations with protein super-family classification and ordinal regression problems.

2 0.73234385 175 nips-2007-Semi-Supervised Multitask Learning

Author: Qiuhua Liu, Xuejun Liao, Lawrence Carin

Abstract: A semi-supervised multitask learning (MTL) framework is presented, in which M parameterized semi-supervised classifiers, each associated with one of M partially labeled data manifolds, are learned jointly under the constraint of a softsharing prior imposed over the parameters of the classifiers. The unlabeled data are utilized by basing classifier learning on neighborhoods, induced by a Markov random walk over a graph representation of each manifold. Experimental results on real data sets demonstrate that semi-supervised MTL yields significant improvements in generalization performance over either semi-supervised single-task learning (STL) or supervised MTL. 1

3 0.48429871 12 nips-2007-A Spectral Regularization Framework for Multi-Task Structure Learning

Author: Andreas Argyriou, Massimiliano Pontil, Yiming Ying, Charles A. Micchelli

Abstract: Learning the common structure shared by a set of supervised tasks is an important practical and theoretical problem. Knowledge of this structure may lead to better generalization performance on the tasks and may also facilitate learning new tasks. We propose a framework for solving this problem, which is based on regularization with spectral functions of matrices. This class of regularization problems exhibits appealing computational properties and can be optimized efficiently by an alternating minimization algorithm. In addition, we provide a necessary and sufficient condition for convexity of the regularizer. We analyze concrete examples of the framework, which are equivalent to regularization with Lp matrix norms. Experiments on two real data sets indicate that the algorithm scales well with the number of tasks and improves on state of the art statistical performance. 1

4 0.46861902 23 nips-2007-An Analysis of Convex Relaxations for MAP Estimation

Author: Pawan Mudigonda, Vladimir Kolmogorov, Philip Torr

Abstract: The problem of obtaining the maximum a posteriori estimate of a general discrete random field (i.e. a random field defined using a finite and discrete set of labels) is known to be NP-hard. However, due to its central importance in many applications, several approximate algorithms have been proposed in the literature. In this paper, we present an analysis of three such algorithms based on convex relaxations: (i) LP - S: the linear programming (LP) relaxation proposed by Schlesinger [20] for a special case and independently in [4, 12, 23] for the general case; (ii) QP - RL: the quadratic programming (QP) relaxation by Ravikumar and Lafferty [18]; and (iii) SOCP - MS: the second order cone programming (SOCP) relaxation first proposed by Muramatsu and Suzuki [16] for two label problems and later extended in [14] for a general label set. We show that the SOCP - MS and the QP - RL relaxations are equivalent. Furthermore, we prove that despite the flexibility in the form of the constraints/objective function offered by QP and SOCP, the LP - S relaxation strictly dominates (i.e. provides a better approximation than) QP - RL and SOCP - MS. We generalize these results by defining a large class of SOCP (and equivalent QP) relaxations which is dominated by the LP - S relaxation. Based on these results we propose some novel SOCP relaxations which strictly dominate the previous approaches.

5 0.42602777 135 nips-2007-Multi-task Gaussian Process Prediction

Author: Edwin V. Bonilla, Kian M. Chai, Christopher Williams

Abstract: In this paper we investigate multi-task learning in the context of Gaussian Processes (GP). We propose a model that learns a shared covariance function on input-dependent features and a “free-form” covariance matrix over tasks. This allows for good flexibility when modelling inter-task dependencies while avoiding the need for large amounts of data for training. We show that under the assumption of noise-free observations and a block design, predictions for a given task only depend on its target values and therefore a cancellation of inter-task transfer occurs. We evaluate the benefits of our model on two practical applications: a compiler performance prediction problem and an exam score prediction task. Additionally, we make use of GP approximations and properties of our model in order to provide scalability to large data sets. 1

6 0.4073236 190 nips-2007-Support Vector Machine Classification with Indefinite Kernels

7 0.39425626 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine

8 0.3854717 112 nips-2007-Learning Monotonic Transformations for Classification

9 0.37435299 207 nips-2007-Transfer Learning using Kolmogorov Complexity: Basic Theory and Empirical Evaluations

10 0.35223991 109 nips-2007-Kernels on Attributed Pointsets with Applications

11 0.34604231 152 nips-2007-Parallelizing Support Vector Machines on Distributed Computers

12 0.34548157 99 nips-2007-Hierarchical Penalization

13 0.33895612 126 nips-2007-McRank: Learning to Rank Using Multiple Classification and Gradient Boosting

14 0.33386961 160 nips-2007-Random Features for Large-Scale Kernel Machines

15 0.32461786 71 nips-2007-Discriminative Keyword Selection Using Support Vector Machines

16 0.3132661 67 nips-2007-Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation

17 0.27664912 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)

18 0.27660376 118 nips-2007-Learning with Transformation Invariant Kernels

19 0.27626058 79 nips-2007-Efficient multiple hyperparameter learning for log-linear models

20 0.27332017 89 nips-2007-Feature Selection Methods for Improving Protein Structure Prediction with Rosetta


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.264), (5, 0.043), (13, 0.069), (16, 0.044), (21, 0.059), (31, 0.021), (34, 0.02), (35, 0.06), (47, 0.097), (49, 0.022), (83, 0.116), (85, 0.045), (87, 0.013), (90, 0.055)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.85247809 52 nips-2007-Competition Adds Complexity

Author: Judy Goldsmith, Martin Mundhenk

Abstract: It is known that determinining whether a DEC-POMDP, namely, a cooperative partially observable stochastic game (POSG), has a cooperative strategy with positive expected reward is complete for NEXP. It was not known until now how cooperation affected that complexity. We show that, for competitive POSGs, the complexity of determining whether one team has a positive-expected-reward strategy is complete for NEXPNP .

same-paper 2 0.74364847 134 nips-2007-Multi-Task Learning via Conic Programming

Author: Tsuyoshi Kato, Hisashi Kashima, Masashi Sugiyama, Kiyoshi Asai

Abstract: When we have several related tasks, solving them simultaneously is shown to be more effective than solving them individually. This approach is called multi-task learning (MTL) and has been studied extensively. Existing approaches to MTL often treat all the tasks as uniformly related to each other and the relatedness of the tasks is controlled globally. For this reason, the existing methods can lead to undesired solutions when some tasks are not highly related to each other, and some pairs of related tasks can have significantly different solutions. In this paper, we propose a novel MTL algorithm that can overcome these problems. Our method makes use of a task network, which describes the relation structure among tasks. This allows us to deal with intricate relation structures in a systematic way. Furthermore, we control the relatedness of the tasks locally, so all pairs of related tasks are guaranteed to have similar solutions. We apply the above idea to support vector machines (SVMs) and show that the optimization problem can be cast as a second order cone program, which is convex and can be solved efficiently. The usefulness of our approach is demonstrated through simulations with protein super-family classification and ordinal regression problems.

3 0.58735764 63 nips-2007-Convex Relaxations of Latent Variable Training

Author: Yuhong Guo, Dale Schuurmans

Abstract: We investigate a new, convex relaxation of an expectation-maximization (EM) variant that approximates a standard objective while eliminating local minima. First, a cautionary result is presented, showing that any convex relaxation of EM over hidden variables must give trivial results if any dependence on the missing values is retained. Although this appears to be a strong negative outcome, we then demonstrate how the problem can be bypassed by using equivalence relations instead of value assignments over hidden variables. In particular, we develop new algorithms for estimating exponential conditional models that only require equivalence relation information over the variable values. This reformulation leads to an exact expression for EM variants in a wide range of problems. We then develop a semidefinite relaxation that yields global training by eliminating local minima. 1

4 0.57640505 86 nips-2007-Exponential Family Predictive Representations of State

Author: David Wingate, Satinder S. Baveja

Abstract: In order to represent state in controlled, partially observable, stochastic dynamical systems, some sort of sufficient statistic for history is necessary. Predictive representations of state (PSRs) capture state as statistics of the future. We introduce a new model of such systems called the “Exponential family PSR,” which defines as state the time-varying parameters of an exponential family distribution which models n sequential observations in the future. This choice of state representation explicitly connects PSRs to state-of-the-art probabilistic modeling, which allows us to take advantage of current efforts in high-dimensional density estimation, and in particular, graphical models and maximum entropy models. We present a parameter learning algorithm based on maximum likelihood, and we show how a variety of current approximate inference methods apply. We evaluate the quality of our model with reinforcement learning by directly evaluating the control performance of the model. 1

5 0.57584995 84 nips-2007-Expectation Maximization and Posterior Constraints

Author: Kuzman Ganchev, Ben Taskar, João Gama

Abstract: The expectation maximization (EM) algorithm is a widely used maximum likelihood estimation procedure for statistical models when the values of some of the variables in the model are not observed. Very often, however, our aim is primarily to find a model that assigns values to the latent variables that have intended meaning for our data and maximizing expected likelihood only sometimes accomplishes this. Unfortunately, it is typically difficult to add even simple a-priori information about latent variables in graphical models without making the models overly complex or intractable. In this paper, we present an efficient, principled way to inject rich constraints on the posteriors of latent variables into the EM algorithm. Our method can be used to learn tractable graphical models that satisfy additional, otherwise intractable constraints. Focusing on clustering and the alignment problem for statistical machine translation, we show that simple, intuitive posterior constraints can greatly improve the performance over standard baselines and be competitive with more complex, intractable models. 1

6 0.57476956 79 nips-2007-Efficient multiple hyperparameter learning for log-linear models

7 0.57399261 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images

8 0.57360011 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning

9 0.57229638 18 nips-2007-A probabilistic model for generating realistic lip movements from speech

10 0.57104492 24 nips-2007-An Analysis of Inference with the Universum

11 0.5707581 156 nips-2007-Predictive Matrix-Variate t Models

12 0.57072163 115 nips-2007-Learning the 2-D Topology of Images

13 0.5705989 177 nips-2007-Simplified Rules and Theoretical Analysis for Information Bottleneck Optimization and PCA with Spiking Neurons

14 0.57012814 158 nips-2007-Probabilistic Matrix Factorization

15 0.56791049 148 nips-2007-Online Linear Regression and Its Application to Model-Based Reinforcement Learning

16 0.56757843 195 nips-2007-The Generalized FITC Approximation

17 0.5667612 102 nips-2007-Incremental Natural Actor-Critic Algorithms

18 0.56466991 168 nips-2007-Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods

19 0.56416464 41 nips-2007-COFI RANK - Maximum Margin Matrix Factorization for Collaborative Ranking

20 0.56342185 180 nips-2007-Sparse Feature Learning for Deep Belief Networks