nips nips2013 nips2013-244 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ichiro Takeuchi, Tatsuya Hongo, Masashi Sugiyama, Shinichi Nakajima
Abstract: We introduce an extended formulation of multi-task learning (MTL) called parametric task learning (PTL) that can systematically handle infinitely many tasks parameterized by a continuous parameter. Our key finding is that, for a certain class of PTL problems, the path of the optimal task-wise solutions can be represented as piecewise-linear functions of the continuous task parameter. Based on this fact, we employ a parametric programming technique to obtain the common shared representation across all the continuously parameterized tasks. We show that our PTL formulation is useful in various scenarios such as learning under non-stationarity, cost-sensitive learning, and quantile regression. We demonstrate the advantage of our approach in these scenarios.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We introduce an extended formulation of multi-task learning (MTL) called parametric task learning (PTL) that can systematically handle infinitely many tasks parameterized by a continuous parameter. [sent-13, score-0.325]
2 Our key finding is that, for a certain class of PTL problems, the path of the optimal task-wise solutions can be represented as piecewise-linear functions of the continuous task parameter. [sent-14, score-0.203]
3 Based on this fact, we employ a parametric programming technique to obtain the common shared representation across all the continuously parameterized tasks. [sent-15, score-0.28]
4 We show that our PTL formulation is useful in various scenarios such as learning under non-stationarity, cost-sensitive learning, and quantile regression. [sent-16, score-0.475]
5 1 Introduction Multi-task learning (MTL) has been studied for learning multiple related tasks simultaneously. [sent-18, score-0.049]
6 A key assumption behind MTL is that there exists a common shared representation across the tasks. [sent-19, score-0.083]
7 Many MTL algorithms attempt to find such a common representation and at the same time to learn multiple tasks under that shared representation. [sent-20, score-0.132]
8 For example, we can enforce all the tasks to share a common feature subspace or a common set of variables by using an algorithm introduced in [1, 2] that alternately optimizes the shared representation and the task-wise solutions. [sent-21, score-0.281]
9 Although the standard MTL formulation can handle only a finite number of tasks, it is sometimes more natural to consider infinitely many tasks parameterized by a continuous parameter, e. [sent-22, score-0.183]
10 g., in learning under non-stationarity [3] where learning problems change over continuous time, cost-sensitive learning [4] where loss functions are asymmetric with a continuous cost balance, and quantile regression [5] where the quantile is a continuous variable between zero and one. [sent-24, score-1.138]
11 In order to handle these infinitely many parametrized tasks, we propose in this paper an extended formulation of MTL called parametric-task learning (PTL). [sent-25, score-0.073]
12 The key contribution of this paper is to show that, for a certain class of PTL problems, the optimal common representation shared across infinitely many parameterized tasks can be obtainable. [sent-26, score-0.178]
13 Specifically, we develop an alternating minimization algorithm à la [1, 2] for finding the entire continuum of solutions and the common feature subspace (or the common set of variables) among infinitely many parameterized tasks. [sent-27, score-0.299]
14 Our algorithm exploits the fact that, for those classes of PTL problems, the path of task-wise solutions is piecewise-linear in the task parameter. [sent-28, score-0.131]
15 We use the parametric programming technique [6, 7, 8, 9] for computing those piecewise linear solutions. [sent-29, score-0.162]
16 Let {(xi , yi )}i∈Nn be the set of n training instances, where xi ∈ X ⊆ Rd is the input and yi ∈ Y is the output. [sent-36, score-0.52]
17 We define wi(t) ∈ [0, 1], t ∈ NT as the weight of the ith instance for the tth task, where T is the number of tasks. [sent-37, score-0.277]
18 It was shown [1] that the problem (1) is equivalent to min_{ {β̃t}t∈NT } ∑_{t∈NT} ∑_{i∈Nn} wi(t) ℓt(r(yi, β̃t⊤ x̃i)) + (γ/T) ||B||tr², where B is the d × T matrix whose tth column is given by the vector βt, and ||B||tr := tr((BB⊤)^{1/2}) is the trace norm of B. [sent-47, score-0.385]
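To make the objective above concrete, here is a minimal NumPy sketch (illustrative only, not the authors' code): it evaluates the trace-norm-regularized MTL objective for a given matrix B of task-wise solutions, with the squared loss and the random toy data being assumptions made purely for the example.

```python
import numpy as np

def trace_norm(B):
    # ||B||_tr = tr((B B^T)^{1/2}) = sum of the singular values of B
    return np.linalg.svd(B, compute_uv=False).sum()

def mtl_objective(B, X, Y, W, gamma):
    """Weighted MTL objective with squared loss (an illustrative choice of loss).

    B: (d, T) matrix whose t-th column is beta_t
    X: (n, d) inputs, Y: (n,) outputs
    W: (n, T) instance weights w_i(t)
    """
    residuals = Y[:, None] - X @ B           # (n, T): y_i - beta_t^T x_i
    loss = np.sum(W * residuals ** 2)        # sum over tasks t and instances i
    T = B.shape[1]
    return loss + (gamma / T) * trace_norm(B) ** 2

# toy usage with random data
rng = np.random.default_rng(0)
X, Y = rng.normal(size=(20, 5)), rng.normal(size=20)
B, W = rng.normal(size=(5, 3)), rng.uniform(size=(20, 3))
print(mtl_objective(B, X, Y, W, gamma=1.0))
```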
19 As shown in [10], the trace norm is the convex upper envelope of the rank of B, and (1) can be interpreted as the problem of finding a common feature subspace across T tasks. [sent-48, score-0.143]
20 This problem is often referred to as multi-task feature learning. [sent-49, score-0.054]
21 If the matrix D is restricted to be diagonal, the formulation (1) is reduced to multi-task variable selection [11, 12]. [sent-50, score-0.053]
22 This algorithm alternately optimizes the task-wise solutions {β̃t}t∈NT and the common representation matrix D; in Step 1, each β̃t can be optimized independently. [sent-52, score-0.094]
23 3 Parametric-Task Learning (PTL) We consider the case where we have infinitely many tasks parametrized by a single continuous parameter. [sent-58, score-0.104]
24 Let θ ∈ [θL , θU ] be a continuous task parameter. [sent-59, score-0.066]
25 Instead of the set of weights wi (t), t ∈ NT , we consider a weight function wi : [θL , θU ] → [0, 1] for each instance i ∈ Nn . [sent-60, score-0.348]
26 In PTL, we learn a parameter vector β̃θ ∈ R^{d+1} as a continuous function of the task parameter θ: min_{ {β̃θ}θ∈[θL,θU], D∈S^d_{++}, tr(D)≤1 } ∫_{θL}^{θU} ∑_{i∈Nn} wi(θ) ℓθ(r(yi, β̃θ⊤ x̃i)) dθ + γ ∫_{θL}^{θU} βθ⊤ D^{-1} βθ dθ, (2) where the loss function ℓθ may itself depend on θ. [sent-61, score-0.364]
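For intuition about objective (2), the two integrals over θ can be approximated on a grid; the paper instead handles them exactly via parametric programming, so the sketch below is only an illustration, and the squared loss, the caller-supplied weight_fn, and the trapezoidal rule are assumptions made here rather than details taken from the paper.

```python
import numpy as np

def trapezoid(vals, xs):
    # simple trapezoidal rule for the integrals over theta
    vals, xs = np.asarray(vals, dtype=float), np.asarray(xs, dtype=float)
    return np.sum((vals[1:] + vals[:-1]) * np.diff(xs)) / 2.0

def ptl_objective_grid(betas, thetas, X, Y, weight_fn, D, gamma):
    """Grid approximation of the PTL objective (2), with squared loss.

    betas: (K, d) array, the solution beta_theta at each grid point thetas[k]
    weight_fn(i, theta): instance weight w_i(theta); D: (d, d) shared matrix
    """
    n = X.shape[0]
    D_inv = np.linalg.inv(D)
    loss_vals, reg_vals = [], []
    for beta, theta in zip(betas, thetas):
        w = np.array([weight_fn(i, theta) for i in range(n)])
        r = Y - X @ beta
        loss_vals.append(np.sum(w * r ** 2))      # sum_i w_i(theta) * loss
        reg_vals.append(beta @ D_inv @ beta)      # beta_theta^T D^{-1} beta_theta
    return trapezoid(loss_vals, thetas) + gamma * trapezoid(reg_vals, thetas)
```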
27 In the standard MTL setting, the weight wi(t) takes 1 only if the ith instance is used in the tth task. [sent-64, score-0.08]
28 We slightly generalize the setup so that each instance can be used in multiple tasks with different weights. [sent-65, score-0.049]
29 Algorithm 1 ALTERNATING MINIMIZATION ALGORITHM FOR MTL [1]
1: Input: Data {(xi, yi)}i∈Nn and weights {wi(t)}i∈Nn,t∈NT;
2: Initialize: D ← Id/d (Id is the d × d identity matrix)
3: while convergence condition is not true do
4:   Step 1: For t = 1, . . . , T do
       β̃t ← arg min_{β̃} ∑_{i∈Nn} wi(t) ℓt(r(yi, β̃⊤ x̃i)) + (γ/T) β̃⊤ D^{-1} β̃
5:   Step 2: D ← C^{1/2} / tr(C^{1/2}) = arg min_{D∈S^d_{++}, tr(D)≤1} ∑_{t∈NT} βt⊤ D^{-1} βt,
       where C := BB⊤ and its (j, k)th element is defined as Cj,k := ∑_{t∈NT} βtj βtk
6: end while
7: Output: {β̃t}t∈NT and D; [sent-66, score-0.203] [sent-69, score-0.349]
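A compact sketch of Algorithm 1 is given below. It is illustrative rather than the authors' implementation: the squared loss is assumed so that Step 1 has a closed form, a small eps keeps D invertible, and a fixed iteration count stands in for the convergence test.

```python
import numpy as np

def mtl_feature_learning(X, Y, W, gamma, n_iter=50, eps=1e-6):
    """Alternating minimization in the style of Algorithm 1 (squared-loss version).

    X: (n, d) inputs, Y: (n,) outputs, W: (n, T) weights w_i(t).
    Returns B (d, T) with the task-wise solutions as columns, and D (d, d).
    """
    n, d = X.shape
    T = W.shape[1]
    D = np.eye(d) / d                               # Initialize: D <- I_d / d
    B = np.zeros((d, T))
    for _ in range(n_iter):
        D_inv = np.linalg.inv(D + eps * np.eye(d))
        # Step 1: each task is a weighted generalized ridge problem (closed form)
        for t in range(T):
            Xw = X * W[:, t:t + 1]                  # rows of X scaled by w_i(t)
            A = X.T @ Xw + (gamma / T) * D_inv
            B[:, t] = np.linalg.solve(A, Xw.T @ Y)
        # Step 2: D <- C^{1/2} / tr(C^{1/2}) with C = B B^T
        U, s, _ = np.linalg.svd(B, full_matrices=False)
        C_sqrt = (U * s) @ U.T                      # (B B^T)^{1/2}
        D = C_sqrt / (np.trace(C_sqrt) + eps)
    return B, D
```

With the squared loss, Step 1 is an independent weighted ridge regression per task, and Step 2 reduces to the matrix square root of BB⊤ normalized by its trace.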
31 However, at first glance, the PTL optimization problem (2) seems computationally intractable since we need to find infinitely many task-wise solutions as well as the common feature subspace (or the common set of variables if D is restricted to be diagonal) shared by infinitely many tasks. [sent-71, score-0.246]
32 Our key finding is that, for a certain class of PTL problems, when D is fixed, the optimal path of the task-wise solutions β̃θ is shown to be piecewise-linear in θ. [sent-72, score-0.1]
33 By exploiting this piecewise-linearity, we can efficiently handle infinitely many parameterized tasks, and the optimal solutions of this class of PTL problems can be computed exactly. [sent-73, score-0.138]
34 In the following theorem, we prove that the task-wise solution β̃θ is piecewise-linear in θ if the weight functions and the loss function satisfy certain conditions. [sent-74, score-0.149]
35 In the proof in Appendix A, we show that, if the weight functions and the loss function satisfy the conditions (a) or (b), the problem (3) is reformulated as a parametric quadratic program (parametric QP), where the parameter θ only appears in the linear term of the objective function. [sent-76, score-0.198]
36 As shown, for example, in [9], the optimal solution path of this class of parametric QP has a piecewise-linear form. [sent-77, score-0.149]
37 If β̃θ is piecewise-linear in θ, we can exactly compute the entire solution path by using parametric programming. [sent-78, score-0.149]
38 We start from the solution at θ = θL , and follow the path of the optimal solutions while θ is continuously increased. [sent-80, score-0.119]
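In practice such a path can be stored as an increasing sequence of breakpoints in θ together with the solutions at those breakpoints, with linear interpolation in between. The sketch below shows only that representation (the class name and breakpoint values are placeholders for illustration), not the parametric-programming routine that actually produces the breakpoints.

```python
import numpy as np

class PiecewiseLinearPath:
    """Solution path beta_theta stored at breakpoints, linear in between."""

    def __init__(self, breakpoints, solutions):
        self.t = np.asarray(breakpoints, dtype=float)   # (K+1,) increasing theta values
        self.b = np.asarray(solutions, dtype=float)     # (K+1, d) beta at each breakpoint

    def __call__(self, theta):
        k = np.clip(np.searchsorted(self.t, theta) - 1, 0, len(self.t) - 2)
        lam = (theta - self.t[k]) / (self.t[k + 1] - self.t[k])
        return (1 - lam) * self.b[k] + lam * self.b[k + 1]

# placeholder path with three breakpoints
path = PiecewiseLinearPath([0.0, 0.5, 1.0], [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
print(path(0.25))   # -> [0.75 0.25]
```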
39 Note that, by exploiting the piecewise linearity of βθ , we can compute the integral at Step 2 (Eq. [sent-83, score-0.075]
40 , λd) where λj = √(∫_{θL}^{θU} β_{θ,j}² dθ) / ∑_{j′∈Nd} √(∫_{θL}^{θU} β_{θ,j′}² dθ) for all j ∈ Nd, which can also be computed efficiently by exploiting the piecewise linearity of βθ. [sent-88, score-0.075]
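Because β_{θ,j} is linear in θ on each segment of the path, β_{θ,j}² is quadratic there, so each integral has a closed form that can be summed segment by segment. The sketch below follows the reconstruction written above (including its square-root normalization, which is part of that reconstruction rather than a quotation of the paper) and should be read as an illustration.

```python
import numpy as np

def integrate_beta_squared(breakpoints, solutions):
    """Exact per-dimension integral of beta_{theta,j}^2 over a piecewise-linear path.

    breakpoints: (K+1,) increasing theta values; solutions: (K+1, d) beta at breakpoints.
    On each segment beta_j is linear, so beta_j^2 is quadratic and Simpson's rule
    (b - a)/6 * (q(a) + 4 q((a+b)/2) + q(b)) integrates it exactly.
    """
    t = np.asarray(breakpoints, dtype=float)
    b = np.asarray(solutions, dtype=float)
    dt = np.diff(t)[:, None]                      # (K, 1) segment lengths
    left, right = b[:-1] ** 2, b[1:] ** 2
    mid = ((b[:-1] + b[1:]) / 2.0) ** 2
    return np.sum(dt / 6.0 * (left + 4.0 * mid + right), axis=0)   # (d,)

def diagonal_D_update(breakpoints, solutions):
    # lambda_j proportional to the square root of the integral, normalized to sum to one
    lam = np.sqrt(integrate_beta_squared(breakpoints, solutions))
    return np.diag(lam / lam.sum())
```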
41 Binary Classification Under Non-Stationarity Suppose that we observe n training instances sequentially, and denote them as {(xi , yi , τi )}i∈Nn , where xi ∈ Rd , yi ∈ {−1, 1}, and τi is the time when the ith instance is observed. [sent-90, score-0.561]
42 Under non-stationarity, if we are requested to learn a classifier to predict the output for a test input x observed at time τ , the training instances observed around time τ should have more influence on the classifier than others. [sent-95, score-0.118]
43 Let wi (τ ) denote the weight of the ith instance when training a classifier for a test point at time τ . [sent-96, score-0.282]
44 We can for example use the following triangular weight function (see Figure 1): wi(τ) = 1 + s^{-1}(τi − τ) if τ − s ≤ τi < τ; 1 − s^{-1}(τi − τ) if τ ≤ τi < τ + s; and 0 otherwise, (6) where s > 0 determines the width of the triangular time windows. [sent-97, score-0.269]
45 The problem of training a classifier for time τ is then formulated as min_{β̃} ∑_{i∈Nn} wi(τ) max(0, 1 − yi β̃⊤ x̃i) + γ||β̃||₂², where we used the hinge loss. [sent-98, score-0.532]
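For one fixed τ this is simply a weighted linear SVM. The sketch below pairs the triangular weights of (6) with a plain subgradient descent on the weighted hinge loss; the optimizer, its step size, and the iteration count are illustrative choices rather than the solver used in the paper, and both function names are introduced only for this example.

```python
import numpy as np

def triangular_weights(tau_i, tau, s):
    """Weight function (6): a triangle of half-width s centred at tau."""
    d = np.asarray(tau_i, dtype=float) - tau
    w = np.where(d < 0, 1.0 + d / s, 1.0 - d / s)
    return np.clip(w, 0.0, 1.0) * (np.abs(d) < s)

def weighted_hinge_svm(X, y, w, gamma, n_iter=2000, lr=1e-3):
    """min_beta  sum_i w_i * max(0, 1 - y_i beta^T x_i) + gamma * ||beta||^2."""
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(n_iter):
        margins = y * (X @ beta)
        active = (margins < 1).astype(float) * w      # weights of margin-violating instances
        grad = -(X * (active * y)[:, None]).sum(axis=0) + 2.0 * gamma * beta
        beta -= lr * grad                             # subgradient step
    return beta
```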
46 3 In regularization path-following, one computes the optimal solution path w.r.t. the regularization parameter, whereas we compute the optimal solution path w.r.t. the task parameter θ. [sent-99, score-0.092] [sent-102, score-0.092]
48 4 Figure 1: Examples of weight functions {wi (τ )}i∈Nn in non-stationary time-series learning. [sent-106, score-0.083]
49 Given training instances (xi, yi) at time τi for i = 1, . [sent-107, score-0.254]
50 , n under non-stationary condition, it is reasonable to use the weights {wi (τ )}i∈Nn as shown here when we learn a classifier to predict the output of a test input at time τ . [sent-110, score-0.048]
51 If we have the belief that a set of classifiers for different times should have some common structure, we can apply our PTL approach to this problem. [sent-111, score-0.04]
52 If we consider a time interval τ ∈ [τL, τU], the parametric-task feature learning problem is formulated as min_{ {β̃τ}τ∈[τL,τU], D } ∫_{τL}^{τU} ∑_{i∈Nn} wi(τ) max(0, 1 − yi β̃τ⊤ x̃i) dτ + γ ∫_{τL}^{τU} βτ⊤ D^{-1} βτ dτ. [sent-112, score-0.553]
53 When the costs of false positives and false negatives are unequal, or when the numbers of positive and negative training instances are highly imbalanced, it is effective to use the cost-sensitive learning approach [16]. [sent-115, score-0.168]
54 Suppose that we are given a set of training instances {(xi , yi )}i∈Nn with xi ∈ Rd and yi ∈ {−1, 1}. [sent-116, score-0.532]
55 When the exact false positive and false negative costs in the test scenario are unknown [4], it is often desirable to train several cost-sensitive SVMs with different values of θ. [sent-118, score-0.121]
56 If we consider an interval θ ∈ [θL, θU], 0 < θL < θU < 1, the parametric-task feature learning problem is formulated as min_{ {β̃θ}θ∈[θL,θU], D } ∫_{θL}^{θU} ∑_{i∈Nn} wi(θ) max(0, 1 − yi β̃θ⊤ x̃i) dθ + γ ∫_{θL}^{θU} βθ⊤ D^{-1} βθ dθ. [sent-120, score-0.553]
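The weight function wi(θ) used for cost-sensitive SVMs is not reproduced in the extracted text above, so the sketch below adopts one common convention, giving positive instances weight 1 − θ and negative instances weight θ; this is an assumption for illustration and should be checked against the paper's own definition.

```python
import numpy as np

def cost_sensitive_weights(y, theta):
    """One common cost-sensitive weighting: w_i = 1 - theta for y_i = +1, theta for y_i = -1."""
    y = np.asarray(y)
    return np.where(y > 0, 1.0 - theta, theta)

# e.g. theta = 0.9 makes errors on negative instances (false positives) more costly
print(cost_sensitive_weights([+1, -1, -1], 0.9))   # -> [0.1 0.9 0.9]
```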
57 Figure 2 shows an example of joint cost-sensitive learning applied to a toy 2D binary classification problem. [sent-122, score-0.078]
58 Jointly estimating multiple conditional quantile functions is often useful for exploring the stochastic relationship between X and Y (see Section 5 for an example of joint quantile regression problems). [sent-124, score-1.091]
59 Linear quantile regression along with L2 regularization [20] at order τ ∈ (0, 1) is formulated as min_{β̃} ∑_{i∈Nn} ρτ(yi − β̃⊤ x̃i) + γ||β̃||₂², where ρτ(r) := (1 − τ)|r| if r ≤ 0 and τ|r| if r > 0. [sent-125, score-0.693]
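A minimal sketch of the check (pinball) loss ρτ and of an L2-regularized linear quantile regression fitted by subgradient descent is shown below; the optimizer and its settings are illustrative stand-ins, not the procedure of [20] or of this paper.

```python
import numpy as np

def pinball_loss(r, tau):
    """rho_tau(r) = (1 - tau)|r| if r <= 0, tau*|r| if r > 0."""
    r = np.asarray(r, dtype=float)
    return np.where(r > 0, tau * r, (tau - 1.0) * r)

def quantile_regression(X, y, tau, gamma, n_iter=5000, lr=1e-3):
    """min_beta  sum_i rho_tau(y_i - beta^T x_i) + gamma * ||beta||^2 (subgradient descent)."""
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(n_iter):
        r = y - X @ beta
        # subgradient: d/d beta sum_i rho_tau(r_i) = -X^T (tau - 1[r <= 0])
        g = -(tau - (r <= 0).astype(float))
        grad = X.T @ g + 2.0 * gamma * beta
        beta -= lr * grad
    return beta
```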
60 (a) The left plot shows the results obtained by independently training each cost-sensitive SVM. [sent-133, score-0.057]
61 (b) The right plot shows the results obtained by jointly training infinitely many cost-sensitive SVMs for the entire continuum of θ ∈ [0. [sent-134, score-0.083]
62 95] using the methodology we present in this paper (both are trained with the same regularization parameter γ). [sent-136, score-0.055]
63 We assume that our data generating mechanism produces the training set {(xi , yi , τi )}i∈Nn with n = 100 as follows. [sent-143, score-0.217]
64 , (n − 1)·2π/n}, the output yi is first determined as yi = 1 if i is odd, while yi = −1 if i is even. [sent-147, score-0.552]
65 Then, xi ∈ Rd is generated as xi1 ∼ N(yi cos τi, 1²), xi2 ∼ N(yi sin τi, 1²), xij ∼ N(0, 1²), ∀j ∈ {3, . [sent-148, score-0.094]
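This data-generating mechanism is easy to reproduce. The sketch below follows the description above (observation times on a uniform grid over [0, 2π), labels alternating with i, class-dependent means rotating with τi, unit variance); the function name and random seed are arbitrary choices for illustration.

```python
import numpy as np

def generate_nonstationary_data(n=100, d=10, seed=0):
    """Rotating two-class Gaussian data; only the first two dimensions are informative."""
    rng = np.random.default_rng(seed)
    i = np.arange(1, n + 1)
    tau = (i - 1) * 2.0 * np.pi / n                 # observation times on a grid in [0, 2*pi)
    y = np.where(i % 2 == 1, 1, -1)                 # odd i -> +1, even i -> -1
    X = rng.normal(size=(n, d))                     # noise dimensions ~ N(0, 1)
    X[:, 0] = rng.normal(y * np.cos(tau), 1.0)      # informative dimension 1
    X[:, 1] = rng.normal(y * np.sin(tau), 1.0)      # informative dimension 2
    return X, y, tau
```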
66 Namely, only the first two dimensions of x differ between the two classes, and the remaining d − 2 dimensions are considered as noise. [sent-152, score-0.066]
67 In addition, according to the value of τi , the means of the class-wise distributions in the first two dimensions gradually change. [sent-153, score-0.064]
68 Here, we applied our PT feature learning approach with triangular time windows in (6) with s = 0. [sent-157, score-0.09]
69 Figure 4 shows the mis-classification rate of PT feature learning (PTFL) and ordinary independent learning (IND) on a similarly generated test sample with size 1000. [sent-159, score-0.103]
70 When the input dimension d = 2, there is no advantage for learning common features since these two input dimensions are important for classification. [sent-160, score-0.123]
71 On the other hand, as d increases, PT feature learning becomes more and more advantageous. [sent-161, score-0.054]
72 6 Figure 3: The first 2 input dimensions of artificial example at τ = 0, 0. [sent-163, score-0.058]
73 The class-wise distributions in these two dimensions gradually change with τ ∈ [0, 2π]. [sent-166, score-0.092]
74 The red symbols indicate the results of our PT feature learning (PTFL) whereas the blue symbols indicate ordinary independent learning (IND). [sent-185, score-0.126]
75 Joint Cost-Sensitive SVM Learning on Benchmark Datasets Here, we report the experimental results on joint cost-sensitive SVM learning discussed in Section 4. [sent-189, score-0.058]
76 In PTFL and PTVS, we learned common feature subspaces and common sets of variables, respectively, shared across the continuum of cost-sensitive SVMs for θ ∈ [0. [sent-191, score-0.227]
77 Joint Quantile Regression Finally, we applied PT feature learning to joint quantile regression problems. [sent-210, score-0.607]
78 Given a training set {(xi, yi)}i∈Nn, we first estimated the conditional mean function E[Y|X = x] by least-squares regression, and computed the residuals ri := yi − Ê[Y|X = xi], where Ê is the estimated conditional mean function. [sent-212, score-0.626]
79 Then, we applied PT feature learning to {(xi, ri)}i∈Nn, and estimated the conditional τth quantile function as F̂_{Y|X=x}^{-1}(τ) := Ê[Y|X = x] + f̂res(x|τ), where f̂res(·|τ) is the estimated τth quantile regression fitted to the residuals. [sent-213, score-0.213]
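A sketch of this two-step construction is shown below: the conditional mean is fitted by ordinary least squares and the residual quantile function is supplied by the caller (for instance the illustrative quantile_regression sketch given earlier, or, in the homoscedastic case, simply the empirical quantile of the residuals); all names here are introduced for illustration only.

```python
import numpy as np

def two_step_conditional_quantile(X, y, tau, fit_quantile_regression):
    """Conditional tau-th quantile via mean regression plus a residual quantile fit.

    fit_quantile_regression(X, r, tau) must return a callable f_res such that
    f_res(Xnew) estimates the tau-th conditional quantile of the residuals
    (a caller-supplied stand-in for the PT feature learning fit used in the paper).
    """
    Xb = np.column_stack([X, np.ones(len(X))])            # add an intercept column
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)         # least-squares conditional mean
    residuals = y - Xb @ coef
    f_res = fit_quantile_regression(X, residuals, tau)
    def quantile_fn(Xnew):
        Xnewb = np.column_stack([Xnew, np.ones(len(Xnew))])
        return Xnewb @ coef + f_res(Xnew)
    return quantile_fn

# homoscedastic stand-in: an unconditional empirical quantile of the residuals
empirical = lambda X, r, tau: (lambda Xnew: np.full(len(Xnew), np.quantile(r, tau)))
```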
80 When multiple quantile regressions with different values of τ are independently learned, we often encounter a notorious problem known as quantile crossing (see Section 2. [sent-214, score-0.983]
81 For example, in Figure 5(a), some of the estimated conditional quantile functions cross each other (which never happens in the true conditional quantile functions). [sent-216, score-1.039]
82 In the simplest case, if we assume that the data is homoscedastic (i. [sent-218, score-0.042]
83 , the conditional distribution P(Y|x) does not depend on x except its location), Table 1: Average (and standard deviation) of test errors obtained by joint cost-sensitive SVMs on benchmark datasets. [sent-220, score-0.133]
84 n is the sample size, d is the input dimension, Ind indicates the results when each cost-sensitive SVM was trained independently, while PTFL and PTVS indicate the results from PT feature learning and PT feature selection, respectively. [sent-221, score-0.154]
85 quantile regressions at different values of τ can be obtained by just vertically shifting another quantile regression function (see Figure 5(f)). [sent-283, score-1.005]
86 Our PT feature learning approach, when applied to the joint quantile regression problem, allows us to interpolate between these two extreme cases. [sent-284, score-0.607]
87 Figure 5 shows a joint QR example on the bone mineral density (BMD) data [21]. [sent-285, score-0.058]
88 When (a) γ → 0, our approach is identical to independently estimating each quantile regression, while it coincides with the homoscedastic case when (f) γ → ∞. [sent-287, score-0.515]
[Figure 5 panels (a) γ → 0 through (f) γ → ∞: estimated conditional quantile functions plotted against (Standardized) Age and (Standardized) Relative BMD Change.]
95 Figure 5: Joint quantile regression examples on BMD data [21] for six different γs. [sent-350, score-0.495]
96 6 Conclusions In this paper, we introduced the parametric-task learning (PTL) approach, which can systematically handle infinitely many tasks parameterized by a continuous parameter. [sent-351, score-0.177]
97 We illustrated the usefulness of this approach by providing three examples that can be naturally formulated as PTL. [sent-352, score-0.041]
98 An algorithm for the solution of the parametric quadratic programming problem. [sent-405, score-0.132]
99 Joint covariate selection and joint subspace selection for multiple classification problems. [sent-427, score-0.112]
100 The entire regularization path for the support vector machine. [sent-447, score-0.092]
wordName wordTfidf (topN-words)
[('ptl', 0.548), ('quantile', 0.449), ('nn', 0.266), ('mtl', 0.186), ('standardized', 0.184), ('yi', 0.184), ('bmd', 0.169), ('ptfl', 0.169), ('wi', 0.151), ('ind', 0.137), ('nt', 0.113), ('ptvs', 0.105), ('pt', 0.103), ('nitely', 0.095), ('xi', 0.094), ('parametric', 0.091), ('nagoya', 0.074), ('age', 0.068), ('regressions', 0.061), ('svms', 0.06), ('joint', 0.058), ('path', 0.058), ('feature', 0.054), ('conditional', 0.052), ('japan', 0.051), ('tth', 0.051), ('svm', 0.051), ('continuum', 0.05), ('tasks', 0.049), ('false', 0.049), ('parameterized', 0.046), ('weight', 0.046), ('dh', 0.046), ('regression', 0.046), ('fy', 0.044), ('classi', 0.044), ('shared', 0.043), ('rd', 0.042), ('fres', 0.042), ('homoscedastic', 0.042), ('inimization', 0.042), ('lternating', 0.042), ('solutions', 0.042), ('programming', 0.041), ('tokyo', 0.041), ('formulated', 0.041), ('common', 0.04), ('ch', 0.04), ('tr', 0.038), ('lgorithm', 0.037), ('ah', 0.037), ('bh', 0.037), ('mext', 0.037), ('instances', 0.037), ('functions', 0.037), ('triangular', 0.036), ('continuous', 0.035), ('id', 0.034), ('regularization', 0.034), ('bb', 0.033), ('dimensions', 0.033), ('training', 0.033), ('kakenhi', 0.032), ('takeuchi', 0.032), ('task', 0.031), ('gradually', 0.031), ('th', 0.031), ('piecewise', 0.03), ('min', 0.029), ('ith', 0.029), ('change', 0.028), ('alternately', 0.028), ('residual', 0.027), ('subspace', 0.027), ('er', 0.027), ('argyriou', 0.027), ('qp', 0.027), ('handle', 0.027), ('selection', 0.027), ('ordinary', 0.026), ('formulation', 0.026), ('breast', 0.026), ('input', 0.025), ('loss', 0.024), ('independently', 0.024), ('symbols', 0.023), ('exploiting', 0.023), ('test', 0.023), ('cancer', 0.023), ('arg', 0.023), ('trace', 0.022), ('linearity', 0.022), ('trained', 0.021), ('relative', 0.021), ('systematically', 0.02), ('af', 0.02), ('toy', 0.02), ('parametrized', 0.02), ('condition', 0.019), ('continuously', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 244 nips-2013-Parametric Task Learning
Author: Ichiro Takeuchi, Tatsuya Hongo, Masashi Sugiyama, Shinichi Nakajima
Abstract: We introduce an extended formulation of multi-task learning (MTL) called parametric task learning (PTL) that can systematically handle infinitely many tasks parameterized by a continuous parameter. Our key finding is that, for a certain class of PTL problems, the path of the optimal task-wise solutions can be represented as piecewise-linear functions of the continuous task parameter. Based on this fact, we employ a parametric programming technique to obtain the common shared representation across all the continuously parameterized tasks. We show that our PTL formulation is useful in various scenarios such as learning under non-stationarity, cost-sensitive learning, and quantile regression. We demonstrate the advantage of our approach in these scenarios.
2 0.13558498 358 nips-2013-q-OCSVM: A q-Quantile Estimator for High-Dimensional Distributions
Author: Assaf Glazer, Michael Lindenbaum, Shaul Markovitch
Abstract: In this paper we introduce a novel method that can efficiently estimate a family of hierarchical dense sets in high-dimensional distributions. Our method can be regarded as a natural extension of the one-class SVM (OCSVM) algorithm that finds multiple parallel separating hyperplanes in a reproducing kernel Hilbert space. We call our method q-OCSVM, as it can be used to estimate q quantiles of a highdimensional distribution. For this purpose, we introduce a new global convex optimization program that finds all estimated sets at once and show that it can be solved efficiently. We prove the correctness of our method and present empirical results that demonstrate its superiority over existing methods. 1
3 0.081423931 211 nips-2013-Non-Linear Domain Adaptation with Boosting
Author: Carlos J. Becker, Christos M. Christoudias, Pascal Fua
Abstract: A common assumption in machine vision is that the training and test samples are drawn from the same distribution. However, there are many problems when this assumption is grossly violated, as in bio-medical applications where different acquisitions can generate drastic variations in the appearance of the data due to changing experimental conditions. This problem is accentuated with 3D data, for which annotation is very time-consuming, limiting the amount of data that can be labeled in new acquisitions for training. In this paper we present a multitask learning algorithm for domain adaptation based on boosting. Unlike previous approaches that learn task-specific decision boundaries, our method learns a single decision boundary in a shared feature space, common to all tasks. We use the boosting-trick to learn a non-linear mapping of the observations in each task, with no need for specific a-priori knowledge of its global analytical form. This yields a more parameter-free domain adaptation approach that successfully leverages learning on new tasks where labeled data is scarce. We evaluate our approach on two challenging bio-medical datasets and achieve a significant improvement over the state of the art. 1
4 0.070973143 31 nips-2013-Adaptivity to Local Smoothness and Dimension in Kernel Regression
Author: Samory Kpotufe, Vikas Garg
Abstract: We present the first result for kernel regression where the procedure adapts locally at a point x to both the unknown local dimension of the metric space X and the unknown H¨ lder-continuity of the regression function at x. The result holds with o high probability simultaneously at all points x in a general metric space X of unknown structure. 1
5 0.066063337 75 nips-2013-Convex Two-Layer Modeling
Author: Özlem Aslan, Hao Cheng, Xinhua Zhang, Dale Schuurmans
Abstract: Latent variable prediction models, such as multi-layer networks, impose auxiliary latent variables between inputs and outputs to allow automatic inference of implicit features useful for prediction. Unfortunately, such models are difficult to train because inference over latent variables must be performed concurrently with parameter optimization—creating a highly non-convex problem. Instead of proposing another local training method, we develop a convex relaxation of hidden-layer conditional models that admits global training. Our approach extends current convex modeling approaches to handle two nested nonlinearities separated by a non-trivial adaptive latent layer. The resulting methods are able to acquire two-layer models that cannot be represented by any single-layer model over the same features, while improving training quality over local heuristics. 1
6 0.06185611 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
7 0.057228137 135 nips-2013-Heterogeneous-Neighborhood-based Multi-Task Local Learning Algorithms
8 0.055639457 318 nips-2013-Structured Learning via Logistic Regression
9 0.054314476 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles
10 0.054120891 158 nips-2013-Learning Multiple Models via Regularized Weighting
11 0.051799465 204 nips-2013-Multiscale Dictionary Learning for Estimating Conditional Distributions
12 0.049113896 144 nips-2013-Inverse Density as an Inverse Problem: the Fredholm Equation Approach
13 0.048663806 90 nips-2013-Direct 0-1 Loss Minimization and Margin Maximization with Boosting
14 0.048459835 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction
15 0.046481702 227 nips-2013-Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions
16 0.046439286 142 nips-2013-Information-theoretic lower bounds for distributed statistical estimation with communication constraints
17 0.046338011 91 nips-2013-Dirty Statistical Models
18 0.045538485 65 nips-2013-Compressive Feature Learning
19 0.045481104 271 nips-2013-Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima
20 0.045002636 178 nips-2013-Locally Adaptive Bayesian Multivariate Time Series
topicId topicWeight
[(0, 0.142), (1, 0.04), (2, 0.02), (3, -0.017), (4, 0.034), (5, 0.012), (6, -0.02), (7, 0.034), (8, -0.031), (9, 0.022), (10, 0.008), (11, -0.017), (12, -0.029), (13, -0.022), (14, 0.033), (15, 0.015), (16, 0.004), (17, 0.035), (18, -0.011), (19, -0.014), (20, -0.053), (21, 0.048), (22, 0.048), (23, 0.066), (24, -0.001), (25, -0.017), (26, 0.002), (27, -0.063), (28, 0.021), (29, -0.068), (30, -0.053), (31, 0.035), (32, -0.012), (33, 0.008), (34, 0.052), (35, 0.046), (36, -0.04), (37, -0.151), (38, -0.055), (39, 0.109), (40, 0.01), (41, 0.053), (42, -0.089), (43, 0.048), (44, -0.147), (45, -0.085), (46, 0.0), (47, -0.004), (48, 0.021), (49, 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.9088071 244 nips-2013-Parametric Task Learning
Author: Ichiro Takeuchi, Tatsuya Hongo, Masashi Sugiyama, Shinichi Nakajima
Abstract: We introduce an extended formulation of multi-task learning (MTL) called parametric task learning (PTL) that can systematically handle infinitely many tasks parameterized by a continuous parameter. Our key finding is that, for a certain class of PTL problems, the path of the optimal task-wise solutions can be represented as piecewise-linear functions of the continuous task parameter. Based on this fact, we employ a parametric programming technique to obtain the common shared representation across all the continuously parameterized tasks. We show that our PTL formulation is useful in various scenarios such as learning under non-stationarity, cost-sensitive learning, and quantile regression. We demonstrate the advantage of our approach in these scenarios.
2 0.80270451 135 nips-2013-Heterogeneous-Neighborhood-based Multi-Task Local Learning Algorithms
Author: Yu Zhang
Abstract: All the existing multi-task local learning methods are defined on homogeneous neighborhood which consists of all data points from only one task. In this paper, different from existing methods, we propose local learning methods for multitask classification and regression problems based on heterogeneous neighborhood which is defined on data points from all tasks. Specifically, we extend the knearest-neighbor classifier by formulating the decision function for each data point as a weighted voting among the neighbors from all tasks where the weights are task-specific. By defining a regularizer to enforce the task-specific weight matrix to approach a symmetric one, a regularized objective function is proposed and an efficient coordinate descent method is developed to solve it. For regression problems, we extend the kernel regression to multi-task setting in a similar way to the classification case. Experiments on some toy data and real-world datasets demonstrate the effectiveness of our proposed methods. 1
3 0.65598541 90 nips-2013-Direct 0-1 Loss Minimization and Margin Maximization with Boosting
Author: Shaodan Zhai, Tian Xia, Ming Tan, Shaojun Wang
Abstract: We propose a boosting method, DirectBoost, a greedy coordinate descent algorithm that builds an ensemble classifier of weak classifiers through directly minimizing empirical classification error over labeled training examples; once the training classification error is reduced to a local coordinatewise minimum, DirectBoost runs a greedy coordinate ascent algorithm that continuously adds weak classifiers to maximize any targeted arbitrarily defined margins until reaching a local coordinatewise maximum of the margins in a certain sense. Experimental results on a collection of machine-learning benchmark datasets show that DirectBoost gives better results than AdaBoost, LogitBoost, LPBoost with column generation and BrownBoost, and is noise tolerant when it maximizes an n′ th order bottom sample margin. 1
4 0.63817501 358 nips-2013-q-OCSVM: A q-Quantile Estimator for High-Dimensional Distributions
Author: Assaf Glazer, Michael Lindenbaum, Shaul Markovitch
Abstract: In this paper we introduce a novel method that can efficiently estimate a family of hierarchical dense sets in high-dimensional distributions. Our method can be regarded as a natural extension of the one-class SVM (OCSVM) algorithm that finds multiple parallel separating hyperplanes in a reproducing kernel Hilbert space. We call our method q-OCSVM, as it can be used to estimate q quantiles of a highdimensional distribution. For this purpose, we introduce a new global convex optimization program that finds all estimated sets at once and show that it can be solved efficiently. We prove the correctness of our method and present empirical results that demonstrate its superiority over existing methods. 1
5 0.5737896 31 nips-2013-Adaptivity to Local Smoothness and Dimension in Kernel Regression
Author: Samory Kpotufe, Vikas Garg
Abstract: We present the first result for kernel regression where the procedure adapts locally at a point x to both the unknown local dimension of the metric space X and the unknown H¨ lder-continuity of the regression function at x. The result holds with o high probability simultaneously at all points x in a general metric space X of unknown structure. 1
6 0.56644118 211 nips-2013-Non-Linear Domain Adaptation with Boosting
7 0.56161547 76 nips-2013-Correlated random features for fast semi-supervised learning
8 0.53838903 170 nips-2013-Learning with Invariance via Linear Functionals on Reproducing Kernel Hilbert Space
9 0.52805275 158 nips-2013-Learning Multiple Models via Regularized Weighting
10 0.52476555 202 nips-2013-Multiclass Total Variation Clustering
11 0.51667649 75 nips-2013-Convex Two-Layer Modeling
12 0.50711948 80 nips-2013-Data-driven Distributionally Robust Polynomial Optimization
13 0.50062418 35 nips-2013-Analyzing the Harmonic Structure in Graph-Based Learning
14 0.49778295 144 nips-2013-Inverse Density as an Inverse Problem: the Fredholm Equation Approach
15 0.48248136 176 nips-2013-Linear decision rule as aspiration for simple decision heuristics
16 0.4813579 171 nips-2013-Learning with Noisy Labels
17 0.4779855 204 nips-2013-Multiscale Dictionary Learning for Estimating Conditional Distributions
18 0.47469628 10 nips-2013-A Latent Source Model for Nonparametric Time Series Classification
19 0.47176695 65 nips-2013-Compressive Feature Learning
20 0.46177658 297 nips-2013-Sketching Structured Matrices for Faster Nonlinear Regression
topicId topicWeight
[(2, 0.014), (16, 0.023), (33, 0.158), (34, 0.099), (41, 0.043), (49, 0.027), (56, 0.081), (70, 0.036), (72, 0.292), (85, 0.024), (89, 0.034), (93, 0.061), (95, 0.011)]
simIndex simValue paperId paperTitle
1 0.81296819 126 nips-2013-Gaussian Process Conditional Copulas with Applications to Financial Time Series
Author: José Miguel Hernández-Lobato, James R. Lloyd, Daniel Hernández-Lobato
Abstract: The estimation of dependencies between multiple variables is a central problem in the analysis of financial time series. A common approach is to express these dependencies in terms of a copula function. Typically the copula function is assumed to be constant but this may be inaccurate when there are covariates that could have a large influence on the dependence structure of the data. To account for this, a Bayesian framework for the estimation of conditional copulas is proposed. In this framework the parameters of a copula are non-linearly related to some arbitrary conditioning variables. We evaluate the ability of our method to predict time-varying dependencies on several equities and currencies and observe consistent performance gains compared to static copula models and other timevarying copula methods. 1
2 0.80691403 263 nips-2013-Reasoning With Neural Tensor Networks for Knowledge Base Completion
Author: Richard Socher, Danqi Chen, Christopher D. Manning, Andrew Ng
Abstract: Knowledge bases are an important resource for question answering and other tasks but often suffer from incompleteness and lack of ability to reason over their discrete entities and relationships. In this paper we introduce an expressive neural tensor network suitable for reasoning over relationships between two entities. Previous work represented entities as either discrete atomic units or with a single entity vector representation. We show that performance can be improved when entities are represented as an average of their constituting word vectors. This allows sharing of statistical strength between, for instance, facts involving the “Sumatran tiger” and “Bengal tiger.” Lastly, we demonstrate that all models improve when these word vectors are initialized with vectors learned from unsupervised large corpora. We assess the model by considering the problem of predicting additional true relations between entities given a subset of the knowledge base. Our model outperforms previous models and can classify unseen relationships in WordNet and FreeBase with an accuracy of 86.2% and 90.0%, respectively. 1
same-paper 3 0.74737984 244 nips-2013-Parametric Task Learning
Author: Ichiro Takeuchi, Tatsuya Hongo, Masashi Sugiyama, Shinichi Nakajima
Abstract: We introduce an extended formulation of multi-task learning (MTL) called parametric task learning (PTL) that can systematically handle infinitely many tasks parameterized by a continuous parameter. Our key finding is that, for a certain class of PTL problems, the path of the optimal task-wise solutions can be represented as piecewise-linear functions of the continuous task parameter. Based on this fact, we employ a parametric programming technique to obtain the common shared representation across all the continuously parameterized tasks. We show that our PTL formulation is useful in various scenarios such as learning under non-stationarity, cost-sensitive learning, and quantile regression. We demonstrate the advantage of our approach in these scenarios.
4 0.73108643 167 nips-2013-Learning the Local Statistics of Optical Flow
Author: Dan Rosenbaum, Daniel Zoran, Yair Weiss
Abstract: Motivated by recent progress in natural image statistics, we use newly available datasets with ground truth optical flow to learn the local statistics of optical flow and compare the learned models to prior models assumed by computer vision researchers. We find that a Gaussian mixture model (GMM) with 64 components provides a significantly better model for local flow statistics when compared to commonly used models. We investigate the source of the GMM’s success and show it is related to an explicit representation of flow boundaries. We also learn a model that jointly models the local intensity pattern and the local optical flow. In accordance with the assumptions often made in computer vision, the model learns that flow boundaries are more likely at intensity boundaries. However, when evaluated on a large dataset, this dependency is very weak and the benefit of conditioning flow estimation on the local intensity pattern is marginal. 1
5 0.68764299 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking
Author: David Carlson, Vinayak Rao, Joshua T. Vogelstein, Lawrence Carin
Abstract: With simultaneous measurements from ever increasing populations of neurons, there is a growing need for sophisticated tools to recover signals from individual neurons. In electrophysiology experiments, this classically proceeds in a two-step process: (i) threshold the waveforms to detect putative spikes and (ii) cluster the waveforms into single units (neurons). We extend previous Bayesian nonparametric models of neural spiking to jointly detect and cluster neurons using a Gamma process model. Importantly, we develop an online approximate inference scheme enabling real-time analysis, with performance exceeding the previous state-of-theart. Via exploratory data analysis—using data with partial ground truth as well as two novel data sets—we find several features of our model collectively contribute to our improved performance including: (i) accounting for colored noise, (ii) detecting overlapping spikes, (iii) tracking waveform dynamics, and (iv) using multiple channels. We hope to enable novel experiments simultaneously measuring many thousands of neurons and possibly adapting stimuli dynamically to probe ever deeper into the mysteries of the brain. 1
6 0.66253018 336 nips-2013-Translating Embeddings for Modeling Multi-relational Data
7 0.60122335 22 nips-2013-Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization
8 0.60046148 201 nips-2013-Multi-Task Bayesian Optimization
9 0.60010564 114 nips-2013-Extracting regions of interest from biological images with convolutional sparse block coding
10 0.59955668 251 nips-2013-Predicting Parameters in Deep Learning
11 0.59765583 301 nips-2013-Sparse Additive Text Models with Low Rank Background
12 0.59649777 153 nips-2013-Learning Feature Selection Dependencies in Multi-task Learning
13 0.59562188 331 nips-2013-Top-Down Regularization of Deep Belief Networks
14 0.59550601 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
15 0.59514081 30 nips-2013-Adaptive dropout for training deep neural networks
16 0.5950743 99 nips-2013-Dropout Training as Adaptive Regularization
17 0.59493983 173 nips-2013-Least Informative Dimensions
18 0.59478867 190 nips-2013-Mid-level Visual Element Discovery as Discriminative Mode Seeking
19 0.59450984 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables
20 0.59406817 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles