nips nips2011 nips2011-134 knowledge-graph by maker-knowledge-mining

134 nips-2011-Infinite Latent SVM for Classification and Multi-task Learning


Source: pdf

Author: Jun Zhu, Ning Chen, Eric P. Xing

Abstract: Unlike existing nonparametric Bayesian models, which rely solely on specially conceived priors to incorporate domain knowledge for discovering improved latent representations, we study nonparametric Bayesian inference with regularization on the desired posterior distributions. While priors can indirectly affect posterior distributions through Bayes’ theorem, imposing posterior regularization is arguably more direct and in some cases can be much easier. We particularly focus on developing infinite latent support vector machines (iLSVM) and multi-task infinite latent support vector machines (MT-iLSVM), which explore the largemargin idea in combination with a nonparametric Bayesian model for discovering predictive latent features for classification and multi-task learning, respectively. We present efficient inference methods and report empirical studies on several benchmark datasets. Our results appear to demonstrate the merits inherited from both large-margin learning and Bayesian nonparametrics.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Unlike existing nonparametric Bayesian models, which rely solely on specially conceived priors to incorporate domain knowledge for discovering improved latent representations, we study nonparametric Bayesian inference with regularization on the desired posterior distributions. [sent-11, score-1.013]

2 While priors can indirectly affect posterior distributions through Bayes’ theorem, imposing posterior regularization is arguably more direct and in some cases can be much easier. [sent-12, score-0.487]

3 1 Introduction Nonparametric Bayesian latent variable models have recently gained remarkable popularity in statistics and machine learning, partly owing to their desirable “nonparametric” nature which allows practitioners to “sidestep” the difficult model selection problem, e. [sent-16, score-0.29]

4 , figuring out the unknown number of components (or classes) in a mixture model [2] or determining the unknown dimensionality of latent features [12], by using an appropriate prior distribution with a large support. [sent-18, score-0.421]

5 Among the most commonly used priors are Gaussian process (GP) [24], Dirichlet process (DP) [2] and Indian buffet process (IBP) [12]. [sent-19, score-0.206]

6 However, standard nonparametric Bayesian models are limited in that they usually make very strict and unrealistic assumptions on data, such as the observations being homogeneous or exchangeable. [sent-20, score-0.177]

7 However, all these methods rely solely on crafting a nonparametric Bayesian prior encoding some special structure, which can indirectly influence the posterior distribution of interest via trading-off with likelihood models. [sent-23, score-0.343]

8 Since it is the posterior distributions, which capture the latent structures to be learned, that are of our ultimate interest, an arguably more direct way to learn a desirable latent-variable model is to impose posterior regularization (i. [sent-24, score-0.748]

9 , regularization on posterior distributions), as we will explore in this paper. [sent-26, score-0.222]

10 Another reason for using posterior regularization is that in some cases it is more natural and easier to incorporate domain knowledge, such as the large-margin [15, 31] or manifold constraints [14], directly on posterior distributions rather than through priors, as shown in this paper. [sent-27, score-0.547]

11 Recent attempts toward learning a posterior distribution of model parameters include the “learning from measurements” [19], maximum entropy discrimination [15] and MedLDA [31]. [sent-29, score-0.166]

12 To our knowledge, very few attempts have been made to impose posterior regularization on nonparametric Bayesian latent variable models. [sent-31, score-0.725]

13 iSVM is a latent class model that assigns each data example to a single mixture component for classification and the unknown number of mixture components is automatically resolved from data. [sent-33, score-0.352]

14 In this paper, we present a general formulation of performing nonparametric Bayesian inference subject to appropriate posterior constraints. [sent-34, score-0.457]

15 Technically, although it is intuitively natural for MLE-based methods to include a regularization term on the posterior distributions of latent variables, this is not straightforward for Bayesian inference because we do not have an optimization objective to be regularized. [sent-37, score-0.659]

16 Under this optimization framework, we incorporate posterior constraints to do regularized Bayesian inference, with a penalty term that measures the violation of the constraints. [sent-39, score-0.329]

17 Both iLSVM and MT-iLSVM are special cases that explore the large-margin principle to consider supervising information for learning predictive latent features, which are good for classification or multi-task learning. [sent-40, score-0.325]

18 We use the nonparametric IBP prior to allow the models to have an unbounded number of latent features. [sent-41, score-0.508]

19 The regularized inference problem can be efficiently solved with an iterative procedure, which leverages existing high-performance convex optimization techniques. [sent-42, score-0.185]

20 Related Work: As stated above, both iLSVM and MT-iLSVM generalize the ideas of iSVM to infinite latent feature models. [sent-43, score-0.29]

21 For multi-task learning, nonparametric Bayesian models have been developed in [28, 23] for learning features shared by multiple tasks. [sent-44, score-0.277]

22 But these methods are based on standard Bayesian inference, without the ability to consider posterior regularization, such as the large-margin constraints or the manifold constraints [14]. [sent-45, score-0.384]

23 Finally, MT-iLSVM is a nonparametric Bayesian generalization of the popular multi-task learning methods [1, 16], as explained shortly. [sent-46, score-0.235]

24 2 Regularized Bayesian Inference with Posterior Constraints In this section, we present the general framework of regularized Bayesian inference with posterior constraints. [sent-47, score-0.351]

25 1 Bayesian Inference as a Learning Model Let M be a model space, containing any variables whose posterior distributions we are trying to infer. [sent-50, score-0.199]

26 Then, by Bayes’ theorem, the posterior distribution is $p(\mathcal{M}|x_1, \cdots, x_N) = \frac{\pi(\mathcal{M}) \prod_{n=1}^{N} p(x_n|\mathcal{M})}{p(x_1, \cdots, x_N)}$ (1), where $p(x_1, \cdots, x_N)$ is the marginal likelihood or evidence of the observed data. [sent-52, score-0.166]

27 Zellner [29] first showed that the posterior distribution given by Bayes’ theorem is the solution of the problem $\min_{p(\mathcal{M})}$ s. [sent-53, score-0.166]
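
The extracted sentence truncates the optimization problem at "s."; a hedged reconstruction of the standard statement of Zellner's result (the feasible set $\mathcal{P}_{\mathrm{prob}}$ of valid distributions over $\mathcal{M}$ is an assumption inferred from the paper's terminology, not quoted from it) reads:

```latex
% Hedged reconstruction (not verbatim from the paper): Zellner's variational
% form of Bayes' theorem; P_prob denotes the set of valid distributions over M.
\min_{p(\mathcal{M}) \in \mathcal{P}_{\mathrm{prob}}}\;
  \mathrm{KL}\big(p(\mathcal{M}) \,\|\, \pi(\mathcal{M})\big)
  \;-\; \mathbb{E}_{p(\mathcal{M})}\Big[\textstyle\sum_{n=1}^{N} \log p(x_n \mid \mathcal{M})\Big]
```

Its optimum recovers the Bayesian posterior of Eq. (1).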

28 Below, we study how to extend the basic results to incorporate posterior constraints in Bayesian inference. [sent-60, score-0.258]

29 In general, regularized Bayesian inference solves the constrained optimization problem min p(M),ξ s. [sent-64, score-0.185]
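
The formula is again cut off in extraction; combining the descriptions in sentences 16, 27, and 29, the regularized problem can plausibly be written as the following sketch, where $U(\xi)$ is the slack penalty and $\mathcal{P}_{\mathrm{post}}(\xi)$ the constrained set of posteriors defined later (both names follow the paper's notation; the exact layout of problem (3) is assumed):

```latex
% Hedged sketch of problem (3): regularized Bayesian inference with posterior constraints.
\min_{p(\mathcal{M}),\, \xi}\;
  \mathrm{KL}\big(p(\mathcal{M}) \,\|\, \pi(\mathcal{M})\big)
  \;-\; \mathbb{E}_{p(\mathcal{M})}\Big[\textstyle\sum_{n=1}^{N} \log p(x_n \mid \mathcal{M})\Big]
  \;+\; U(\xi)
  \quad \text{s.t.}\quad p(\mathcal{M}) \in \mathcal{P}_{\mathrm{post}}(\xi)
```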

30 We can use an iterative procedure to do the regularized Bayesian inference based on convex optimization techniques. [sent-71, score-0.185]

31 The basic setup is that we project each data example x ∈ X ⊂ R^D to a latent feature vector z. [sent-77, score-0.29]

32 Instead of pre-specifying a fixed dimension of z, we resort to the nonparametric Bayesian methods and let z have an infinite number of dimensions. [sent-80, score-0.177]

33 To make the expected number of active latent features finite, we put the well-studied IBP prior on the binary feature matrix Z. [sent-81, score-0.448]

34 We focus on its stick-breaking construction [25], which is good for developing efficient inference methods. [sent-84, score-0.157]
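
As a concrete illustration of the stick-breaking construction mentioned above, the following is a minimal sketch (not the paper's implementation; the truncation level K and concentration alpha are illustrative assumptions) of drawing a binary feature matrix Z from an IBP prior:

```python
import numpy as np

def sample_ibp_stick_breaking(num_items, alpha, K, rng=None):
    """Draw a binary feature matrix Z via the IBP stick-breaking construction,
    truncated at K latent features:
    nu_k ~ Beta(alpha, 1); pi_k = prod_{j<=k} nu_j; Z[n, k] ~ Bernoulli(pi_k)."""
    rng = np.random.default_rng() if rng is None else rng
    nu = rng.beta(alpha, 1.0, size=K)   # stick-breaking weights
    pi = np.cumprod(nu)                 # decreasing feature probabilities
    Z = (rng.random((num_items, K)) < pi).astype(int)
    return Z, pi

# Example: 10 data points, concentration alpha = 2, truncation K = 20.
Z, pi = sample_ibp_stick_breaking(num_items=10, alpha=2.0, K=20)
print(Z.sum(axis=0))  # feature usage decays with the feature index k
```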

35 2 Infinite Latent Support Vector Machines We consider multi-way classification, where each training example is provided with a categorical label y, where y ∈ Y ≜ {1, · · · , L}. [sent-91, score-0.188]

36 For binary classification and regression, a similar procedure can be applied to impose large-margin constraints on posterior distributions. [sent-92, score-0.325]

37 Suppose that the latent features z are given; then we can define the latent discriminant function as $f(y, x, z; \eta) \overset{\text{def}}{=} \eta^{\top} g(y, x, z)$ (5), where g(y, x, z) is a vector stacking of L subvectors, of which the y-th is $z^{\top}$ and all the others are zero. [sent-93, score-0.893]

38 We can consider the input features x, or certain statistics of them, in combination with the latent features z to define a classifier boundary, simply by concatenating them in the subvectors. [sent-95, score-0.49]
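
To make the stacked construction of g(y, x, z) concrete, here is a minimal sketch of the latent discriminant function of Eq. (5); the dimensions and the optional concatenation of x are illustrative assumptions, not the paper's code:

```python
import numpy as np

def stack_features(y, x, z, num_labels, use_input_features=True):
    """Build g(y, x, z): L stacked subvectors, the y-th holding [z; x] (or just z),
    all other subvectors zero, as described around Eq. (5)."""
    sub = np.concatenate([z, x]) if use_input_features else z
    g = np.zeros(num_labels * sub.shape[0])
    g[y * sub.shape[0]:(y + 1) * sub.shape[0]] = sub
    return g

def latent_discriminant(y, x, z, eta, num_labels):
    """f(y, x, z; eta) = eta^T g(y, x, z)."""
    return eta @ stack_features(y, x, z, num_labels)

# Toy usage with made-up sizes: 3 classes, 4 input dims, 5 (truncated) latent features.
L_classes, D, K = 3, 4, 5
x, z = np.random.randn(D), (np.random.rand(K) < 0.3).astype(float)
eta = np.random.randn(L_classes * (K + D))
scores = [latent_discriminant(y, x, z, eta, L_classes) for y in range(L_classes)]
```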

39 , a weighted average considering all possible values of Z) of the latent discriminant function. [sent-100, score-0.359]

40 To make the model fully Bayesian, we also treat η as random and aim to infer the posterior distribution p(Z, η) from given data. [sent-101, score-0.202]

41 More formally, the effective discriminant function $f : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ is $f(y, x; p(Z, \eta)) \overset{\text{def}}{=} \mathbb{E}_{p(Z,\eta)}[f(y, x, z; \eta)] = \mathbb{E}_{p(Z,\eta)}[\eta^{\top} g(y, x, z)]$ (6). [sent-102, score-0.213]

42 Note that although the number of latent features is allowed to be infinite, with probability one, the number of non-zero features is finite when only a finite number of data are observed, under the IBP prior. [sent-103, score-0.49]

43 With the above definitions, we define the $\mathcal{P}_{\text{post}}(\xi)$ in problem (3) using large-margin constraints as $\mathcal{P}^{c}_{\text{post}}(\xi) \overset{\text{def}}{=} \left\{ p(Z, \eta) \;\middle|\; \forall n \in \mathcal{I}_{tr}: f(y_n, x_n; p(Z, \eta)) - f(y, x_n; p(Z, \eta)) \ge \ell(y, y_n) - \xi_n,\ \forall y;\ \xi_n \ge 0 \right\}$ (7), and define the penalty function as $U^{c}(\xi) \overset{\text{def}}{=} C \sum_{n \in \mathcal{I}_{tr}} \xi_n^{p}$, where p ≥ 1. [sent-108, score-0.58]

44 If p is 1, minimizing $U^{c}(\xi)$ is equivalent to minimizing the hinge loss (or ℓ1-loss) $R^{c}_{h}$ of the prediction rule (9), where $R^{c}_{h} = C \sum_{n \in \mathcal{I}_{tr}} \max_{y} \big( f(y, x_n; p(Z, \eta)) + \ell(y, y_n) - f(y_n, x_n; p(Z, \eta)) \big)$; if p is 2, the surrogate loss is the ℓ2-loss. [sent-109, score-0.2]
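
A small sketch of this ℓ1 surrogate (hinge) loss and the corresponding argmax prediction rule, written in terms of precomputed expected discriminant values f(y, x_n; p(Z, η)); the matrix F, the choice of a 0/1 label cost for ℓ, and the constant C are assumptions for illustration:

```python
import numpy as np

def multiclass_hinge_loss(F, y_true, C=1.0):
    """R_h^c = C * sum_n max_y ( F[n, y] + loss(y, y_n) - F[n, y_n] ),
    where F[n, y] stores the expected discriminant value f(y, x_n; p(Z, eta))
    and loss(y, y_n) = 1[y != y_n] is used as the label cost here."""
    n_examples, n_labels = F.shape
    total = 0.0
    for n in range(n_examples):
        label_cost = (np.arange(n_labels) != y_true[n]).astype(float)
        margin_cost = F[n] + label_cost - F[n, y_true[n]]
        total += margin_cost.max()   # >= 0, since y = y_n gives cost 0
    return C * total

def predict(F):
    """Prediction rule y* = argmax_y f(y, x; p(Z, eta)) for each example."""
    return F.argmax(axis=1)
```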

45 In order to robustly estimate the latent matrix Z, we need a reasonable amount of data. [sent-115, score-0.29]

46 Testing: to make predictions on test examples, we put both training and test data together to do the regularized Bayesian inference. [sent-125, score-0.196]

47 For training data, we impose the above large-margin constraints because their true labels are known, while for test data, we do the inference without the large-margin constraints since their true labels are unknown. [sent-126, score-0.405]

48 After inference, we make the prediction via the rule $y^{*} \overset{\text{def}}{=} \arg\max_{y} f(y, x; p(Z, \eta))$. [sent-127, score-0.144]

49 We can also cast the problem as a transductive inference problem by imposing additional constraints on test data [17]. [sent-129, score-0.266]

50 In particular, learning a common latent representation shared by all the related tasks has proven to be an effective way to capture task relationships [1, 3, 23]. [sent-134, score-0.407]

51 Below, we present the multi-task infinite latent SVM (MT-iLSVM) for learning a common binary projection matrix Z to capture the relationships among multiple tasks. [sent-135, score-0.391]

52 Figure 1: Graphical structures of (a) infinite latent SVM (iLSVM); and (b) multi-task infinite latent SVM (MT-iLSVM). [sent-139, score-0.614]

53 Let Dm = {(xmn , ymn )}n∈Itr be the training data for task m. [sent-145, score-0.206]

54 If the latent matrix Z is given, we define the latent discriminant function for task m as def fm (x, Z; η m ) = (Zη m )⊤ x = η ⊤ (Z⊤ x). [sent-148, score-0.865]

55 If we let ςm = Zη m , then ςm are the actual parameters of task m and all ςm in different tasks are coupled by sharing the same latent matrix Z. [sent-150, score-0.377]

56 Another view is that each task m has its own parameters η m , but all the tasks share the same latent features Z⊤ x, which is a projection of the input features x and Z is the latent projection matrix. [sent-151, score-0.947]
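
A minimal sketch of the two equivalent views described in sentences 54-56 (task-specific weights ς_m = Z η_m versus shared projected features Z^⊤ x); all shapes and the toy data are illustrative assumptions:

```python
import numpy as np

def task_discriminant(x, Z, eta_m):
    """f_m(x, Z; eta_m) = (Z eta_m)^T x = eta_m^T (Z^T x)."""
    varsigma_m = Z @ eta_m        # per-task weight vector in input space
    shared_features = Z.T @ x     # projected features shared across all tasks
    assert np.allclose(varsigma_m @ x, eta_m @ shared_features)
    return eta_m @ shared_features

# Toy usage: D = 6 input dims, K = 4 latent projection dims.
D, K = 6, 4
Z = (np.random.rand(D, K) < 0.3).astype(float)   # binary projection matrix
x, eta_m = np.random.randn(D), np.random.randn(K)
score = task_discriminant(x, Z, eta_m)
```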

57 As such, our method can be viewed as a nonparametric Bayesian treatment of alternating structure optimization (ASO) [1], which learns a single projection matrix with a pre-specified latent dimension. [sent-152, score-0.507]

58 Moreover, different from [16], which learns a binary vector with known dimensionality to select features or kernels on x, we learn an unbounded projection matrix Z using nonparametric Bayesian techniques. [sent-153, score-0.389]

59 , η m are also random variables) and define the effective discriminant function for task m as the expectation def fm (x; p(Z, η)) = Ep(Z,η) [fm (x, Z; η m )] = Ep(Z,η) [Zη m ]⊤ x. [sent-156, score-0.317]

60 Then, the prediction rule for task m follows naturally. Similarly, we do regularized Bayesian inference by imposing the following constraints and defining $U^{MT}(\xi) \overset{\text{def}}{=} C \sum_{m, n \in \mathcal{I}^{m}_{tr}} \xi_{mn}$: $\mathcal{P}^{MT}_{\text{post}}(\xi) \overset{\text{def}}{=} \left\{ p(Z, \eta) \;\middle|\; \forall m, \forall n \in \mathcal{I}^{m}_{tr}: y_{mn}\, \mathbb{E}_{p(Z,\eta)}[Z \eta_m]^{\top} x_{mn} \ge 1 - \xi_{mn};\ \xi_{mn} \ge 0 \right\}$ (12). [sent-158, score-0.891]

61 As in iLSVM, minimizing $U^{MT}(\xi)$ is equivalent to minimizing the hinge loss $R^{MT}_{h}$ of the multiple binary prediction rules, where $R^{MT}_{h} = C \sum_{m, n \in \mathcal{I}^{m}_{tr}} \max(0,\ 1 - y_{mn}\, \mathbb{E}_{p(Z,\eta)}[Z \eta_m]^{\top} x_{mn})$. [sent-159, score-0.162]

62 Finally, to obtain more data to estimate the latent Z, we also relate it to the observed data by defining the likelihood model $p(x_{mn} | \mathbf{w}_{mn}, Z, \lambda^{2}_{mn}) = \mathcal{N}(x_{mn} | Z \mathbf{w}_{mn}, \lambda^{2}_{mn} I)$ (13), where $\mathbf{w}_{mn}$ is a vector. [sent-160, score-1.113]

63 We assume W has an independent prior $\pi(\mathbf{W}) = \prod_{mn} \mathcal{N}(\mathbf{w}_{mn} | 0, \sigma^{2}_{m0} I)$. [sent-161, score-0.337]
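
A hedged sketch of the data-coupling likelihood in Eq. (13) and the Gaussian prior on W, as log-densities only; variable shapes and hyperparameter names are assumptions made for illustration:

```python
import numpy as np

def log_likelihood_xmn(x_mn, Z, w_mn, lam2_mn):
    """log N(x_mn | Z w_mn, lam2_mn * I): couples the observed data to Z."""
    resid = x_mn - Z @ w_mn
    D = x_mn.shape[0]
    return -0.5 * (D * np.log(2 * np.pi * lam2_mn) + resid @ resid / lam2_mn)

def log_prior_wmn(w_mn, sigma2_m0):
    """One factor of the independent prior pi(W) = prod_mn N(w_mn | 0, sigma2_m0 * I)."""
    K = w_mn.shape[0]
    return -0.5 * (K * np.log(2 * np.pi * sigma2_m0) + w_mn @ w_mn / sigma2_m0)
```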

64 For testing, we use the same strategy as in iLSVM to do Bayesian inference on both training and test data. [sent-164, score-0.185]

65 4 Inference with Truncated Mean-Field Constraints We briefly discuss how to do regularized Bayesian inference (3) with the large-margin constraints for MT-iLSVM. [sent-169, score-0.277]

66 To make the problem easier to solve, we use the stick-breaking representation of IBP, which includes the auxiliary variables ν, and infer the posterior p(ν, W, Z, η). [sent-171, score-0.229]

67 Now, we focus on p(Z) and provide insights on how the large-margin constraints regularize the procedure of inferring the latent matrix Z. [sent-181, score-0.382]

68 The last term of $\vartheta_{dk}$ is due to the large-margin posterior constraints as defined in Eq. [sent-184, score-0.258]

69 Infer p(η) and solve for ω and ξ: We optimize $\mathcal{L}$ over $p(\eta)$ and get $p(\eta) = \prod_{m} p(\eta_m)$, where $p(\eta_m) \propto \pi(\eta_m) \exp\{\eta_m^{\top} \mu_m\}$ and $\mu_m = \sum_{n \in \mathcal{I}^{m}_{tr}} y_{mn}\, \omega_{mn}\, (\psi^{\top} x_{mn})$. [sent-186, score-0.259]

70 Our results demonstrate the merits inherited from both Bayesian nonparametrics and large-margin learning. [sent-194, score-0.152]

71 1 Multi-way Classification We evaluate the infinite latent SVM (iLSVM) for classification on the real TRECVID2003 and Flickr image datasets, which have been extensively evaluated in the context of learning finite latent feature models [8]. [sent-196, score-0.58]

72 TRECVID2003 consists of 1078 video key-frames, and each example has two types of features – a 1894-dimensional binary vector of text features and a 165-dimensional HSV color histogram. [sent-197, score-0.231]

73 Here, we consider the real-valued features only by using normal distributions for x. [sent-205, score-0.133]

74 We compare iLSVM with the large-margin Harmonium (MMH) [8], which was shown to outperform many other latent feature models [8], and two decoupled approaches – EFH+SVM and IBP+SVM. [sent-206, score-0.369]

75 EFH+SVM uses the exponential family Harmonium (EFH) [27] to discover latent features and then learns a multi-way SVM classifier. [sent-207, score-0.39]

76 IBP+SVM is similar, but uses an IBP factor analysis model [12] to discover latent features. [sent-208, score-0.29]

77 As finite models, both MMH and EFH+SVM need to pre-specify the dimensionality of latent features. [sent-209, score-0.29]

78 We can see that iLSVM can achieve comparable performance with the nearly optimal MMH, without needing to pre-specify the latent feature dimension, and is much better than the decoupled approaches (i. [sent-218, score-0.369]

79 The Scene dataset consists of 1211 training and 1196 test examples, each having 294 features, and the number of labels (or tasks) per example for this dataset is 6. [sent-304, score-0.129]

80 In order to compare with the above methods, we follow the same setup described in [3, 4] and similarly create dummy variables for those features that are categorical, forming a total of 19 student-dependent features and 8 school-dependent features. [sent-309, score-0.2]

81 We use the same 10 random splits of the data, so that 75% of the examples from each school (task) belong to the training set and 25% to the test set. [sent-310, score-0.174]

82 On average, the training set includes about 80 students per school and the test set about 30 students per school. [sent-311, score-0.306]

83 We can see that the large-margin MT-iLSVM performs much better than other nonparametric Bayesian methods and MT-IBP+SVM, which separates the inference of latent features from learning the classifiers. [sent-320, score-0.681]

84 School Data: We use the percentage of explained variance [4] as the measure of the regression performance, which is defined as the total variance of the data minus the sum-squared error on the test set as a percentage of the total variance. [sent-321, score-0.253]
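
For reference, a one-function sketch of the explained-variance measure exactly as described in the sentence above (whether the total variance is computed per task or pooled over all schools is not specified here and is an assumption):

```python
import numpy as np

def percent_explained_variance(y_true, y_pred):
    """100 * (total variance - sum-squared error) / total variance,
    the regression measure used for the School data."""
    total_variance = np.sum((y_true - y_true.mean()) ** 2)
    sse = np.sum((y_true - y_pred) ** 2)
    return 100.0 * (total_variance - sse) / total_variance
```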

85 For MT-iLSVM and MT-IBP+SVM, we also report the results achieved by using both the latent features (i. [sent-323, score-0.39]

86 This decoupled approach is in fact a one-iteration MT-iLSVM, where we first infer the shared latent matrix Z and then learn an SVM classifier for each task. [sent-329, score-0.405]

87 Table 4: Percentage of explained variance and running time by MT-iLSVM with various training sizes. [sent-347, score-0.128]

88 From the results in Table 3, we can see that the multi-task latent SVM (i. [sent-372, score-0.152]

89 Again, the joint MTiLSVM performs much better than the decoupled method MT-IBP+SVM, which separates the latent feature inference from the training of large-margin classifiers. [sent-375, score-0.527]

90 Finally, using both latent features and the original input features can boost the performance slightly for MT-iLSVM, while much more significantly for the decoupled MT-IBP+SVM. [sent-376, score-0.569]

91 Figure 2: Sensitivity study of MT-iLSVM: (a) classification accuracy with different α; (b) classification accuracy with different C; and (c) percentage of explained variance with different C. [sent-392, score-0.202]

92 We use the first b% (b = 50, 60, 70, 80, 90, 100) of the training data in each of the 10 random splits as training set and use the corresponding test data as test set. [sent-400, score-0.142]

93 If we further estimate them by maximizing the objective function, the performance does not change much (±0. [sent-405, score-0.674]

94 5 Conclusions and Future Work We first present a general framework for doing regularized Bayesian inference subject to appropriate constraints, which are imposed directly on the posterior distributions. [sent-408, score-0.351]

95 Then, we particularly concentrate on developing two nonparametric Bayesian models to learn predictive latent features for classification and multi-task learning, respectively, by exploring the large-margin principle to define posterior constraints. [sent-409, score-0.811]

96 Both models allow the latent dimension to be automatically resolved from the data. [sent-410, score-0.29]

97 Regularized Bayesian inference offers a general framework for considering posterior regularization in performing nonparametric Bayesian inference. [sent-412, score-0.513]

98 For future work, we plan to study other posterior regularization beyond the large-margin constraints, such as posterior constraints defined on manifold structures [14], and investigate how posterior regularization can be used in other interesting nonparametric Bayesian models [5, 26]. [sent-413, score-0.947]

99 Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. [sent-428, score-0.204]

100 Infinite latent feature models and the Indian buffet process. [sent-494, score-0.382]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mn', 0.337), ('ilsvm', 0.32), ('latent', 0.29), ('ibp', 0.274), ('itr', 0.256), ('bayesian', 0.178), ('nonparametric', 0.177), ('posterior', 0.166), ('svm', 0.157), ('wmn', 0.149), ('def', 0.144), ('xmn', 0.131), ('efh', 0.128), ('ymn', 0.128), ('yeast', 0.127), ('inference', 0.114), ('ppost', 0.107), ('school', 0.103), ('features', 0.1), ('flickr', 0.094), ('constraints', 0.092), ('buffet', 0.092), ('ep', 0.087), ('mmh', 0.085), ('classi', 0.083), ('indian', 0.079), ('decoupled', 0.079), ('xn', 0.074), ('dk', 0.074), ('regularized', 0.071), ('discriminant', 0.069), ('nonparametrics', 0.069), ('students', 0.066), ('isvm', 0.064), ('pprob', 0.064), ('zellner', 0.064), ('nite', 0.064), ('appendix', 0.061), ('explained', 0.058), ('percentage', 0.058), ('regularization', 0.056), ('tasks', 0.053), ('yn', 0.052), ('merits', 0.052), ('lagrangian', 0.051), ('truncation', 0.048), ('bayes', 0.045), ('machines', 0.044), ('china', 0.044), ('training', 0.044), ('developing', 0.043), ('bmtl', 0.043), ('harmonium', 0.043), ('jz', 0.043), ('medlda', 0.043), ('mtgp', 0.043), ('mtilsvm', 0.043), ('mtrl', 0.043), ('svmf', 0.043), ('zdk', 0.043), ('scene', 0.042), ('unbounded', 0.041), ('projection', 0.04), ('zhu', 0.04), ('fm', 0.038), ('exam', 0.037), ('impose', 0.036), ('infer', 0.036), ('predictive', 0.035), ('cation', 0.035), ('dirichlet', 0.035), ('sqrt', 0.034), ('manifold', 0.034), ('structures', 0.034), ('task', 0.034), ('imposing', 0.033), ('priors', 0.033), ('dm', 0.033), ('distributions', 0.033), ('expectation', 0.032), ('acc', 0.032), ('stl', 0.032), ('kl', 0.032), ('zn', 0.032), ('binary', 0.031), ('multitask', 0.031), ('jmlr', 0.031), ('inherited', 0.031), ('mixture', 0.031), ('accuracy', 0.03), ('relationships', 0.03), ('vr', 0.029), ('dataset', 0.029), ('band', 0.028), ('auxiliary', 0.027), ('put', 0.027), ('test', 0.027), ('nips', 0.027), ('process', 0.027), ('variance', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999997 134 nips-2011-Infinite Latent SVM for Classification and Multi-task Learning

Author: Jun Zhu, Ning Chen, Eric P. Xing

Abstract: Unlike existing nonparametric Bayesian models, which rely solely on specially conceived priors to incorporate domain knowledge for discovering improved latent representations, we study nonparametric Bayesian inference with regularization on the desired posterior distributions. While priors can indirectly affect posterior distributions through Bayes’ theorem, imposing posterior regularization is arguably more direct and in some cases can be much easier. We particularly focus on developing infinite latent support vector machines (iLSVM) and multi-task infinite latent support vector machines (MT-iLSVM), which explore the largemargin idea in combination with a nonparametric Bayesian model for discovering predictive latent features for classification and multi-task learning, respectively. We present efficient inference methods and report empirical studies on several benchmark datasets. Our results appear to demonstrate the merits inherited from both large-margin learning and Bayesian nonparametrics.

2 0.20662457 258 nips-2011-Sparse Bayesian Multi-Task Learning

Author: Shengbo Guo, Onno Zoeter, Cédric Archambeau

Abstract: We propose a new sparse Bayesian model for multi-task regression and classification. The model is able to capture correlations between tasks, or more specifically a low-rank approximation of the covariance matrix, while being sparse in the features. We introduce a general family of group sparsity inducing priors based on matrix-variate Gaussian scale mixtures. We show the amount of sparsity can be learnt from the data by combining an approximate inference approach with type II maximum likelihood estimation of the hyperparameters. Empirical evaluations on data sets from biology and vision demonstrate the applicability of the model, where on both regression and classification tasks it achieves competitive predictive performance compared to previously proposed methods. 1

3 0.18281119 132 nips-2011-Inferring Interaction Networks using the IBP applied to microRNA Target Prediction

Author: Hai-son P. Le, Ziv Bar-joseph

Abstract: Determining interactions between entities and the overall organization and clustering of nodes in networks is a major challenge when analyzing biological and social network data. Here we extend the Indian Buffet Process (IBP), a nonparametric Bayesian model, to integrate noisy interaction scores with properties of individual entities for inferring interaction networks and clustering nodes within these networks. We present an application of this method to study how microRNAs regulate mRNAs in cells. Analysis of synthetic and real data indicates that the method improves upon prior methods, correctly recovers interactions and clusters, and provides accurate biological predictions. 1

4 0.12799825 301 nips-2011-Variational Gaussian Process Dynamical Systems

Author: Neil D. Lawrence, Michalis K. Titsias, Andreas Damianou

Abstract: High dimensional time series are endemic in applications of machine learning such as robotics (sensor data), computational biology (gene expression data), vision (video sequences) and graphics (motion capture data). Practical nonlinear probabilistic approaches to this data are required. In this paper we introduce the variational Gaussian process dynamical system. Our work builds on recent variational approximations for Gaussian process latent variable models to allow for nonlinear dimensionality reduction simultaneously with learning a dynamical prior in the latent space. The approach also allows for the appropriate dimensionality of the latent space to be automatically determined. We demonstrate the model on a human motion capture data set and a series of high resolution video sequences. 1

5 0.12624249 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models

Author: Le Song, Eric P. Xing, Ankur P. Parikh

Abstract: Latent tree graphical models are natural tools for expressing long range and hierarchical dependencies among many variables which are common in computer vision, bioinformatics and natural language processing problems. However, existing models are largely restricted to discrete and Gaussian variables due to computational constraints; furthermore, algorithms for estimating the latent tree structure and learning the model parameters are largely restricted to heuristic local search. We present a method based on kernel embeddings of distributions for latent tree graphical models with continuous and non-Gaussian variables. Our method can recover the latent tree structures with provable guarantees and perform local-minimum free parameter learning and efficient inference. Experiments on simulated and real data show the advantage of our proposed approach. 1

6 0.11104591 285 nips-2011-The Kernel Beta Process

7 0.097360924 302 nips-2011-Variational Learning for Recurrent Spiking Networks

8 0.092617184 60 nips-2011-Confidence Sets for Network Structure

9 0.088489778 42 nips-2011-Bayesian Bias Mitigation for Crowdsourcing

10 0.086440705 288 nips-2011-Thinning Measurement Models and Questionnaire Design

11 0.084650733 104 nips-2011-Generalized Beta Mixtures of Gaussians

12 0.080290005 240 nips-2011-Robust Multi-Class Gaussian Process Classification

13 0.079877637 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding

14 0.079683945 131 nips-2011-Inference in continuous-time change-point models

15 0.079394974 156 nips-2011-Learning to Learn with Compound HD Models

16 0.075530484 40 nips-2011-Automated Refinement of Bayes Networks' Parameters based on Test Ordering Constraints

17 0.075121328 115 nips-2011-Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices

18 0.074520677 148 nips-2011-Learning Probabilistic Non-Linear Latent Variable Models for Tracking Complex Activities

19 0.073970959 179 nips-2011-Multilinear Subspace Regression: An Orthogonal Tensor Decomposition Approach

20 0.072111487 243 nips-2011-Select and Sample - A Model of Efficient Neural Inference and Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.232), (1, 0.086), (2, -0.004), (3, -0.01), (4, -0.03), (5, -0.15), (6, 0.112), (7, -0.067), (8, 0.04), (9, 0.125), (10, -0.076), (11, -0.092), (12, 0.053), (13, -0.023), (14, -0.103), (15, 0.003), (16, -0.117), (17, -0.008), (18, 0.064), (19, -0.088), (20, 0.003), (21, -0.109), (22, -0.01), (23, 0.043), (24, -0.015), (25, -0.073), (26, 0.094), (27, 0.057), (28, -0.055), (29, 0.157), (30, 0.112), (31, 0.001), (32, -0.118), (33, -0.014), (34, 0.011), (35, -0.038), (36, -0.116), (37, -0.027), (38, 0.026), (39, -0.176), (40, -0.029), (41, 0.057), (42, 0.119), (43, 0.041), (44, 0.019), (45, 0.048), (46, -0.084), (47, 0.046), (48, -0.012), (49, -0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95165133 134 nips-2011-Infinite Latent SVM for Classification and Multi-task Learning

Author: Jun Zhu, Ning Chen, Eric P. Xing

Abstract: Unlike existing nonparametric Bayesian models, which rely solely on specially conceived priors to incorporate domain knowledge for discovering improved latent representations, we study nonparametric Bayesian inference with regularization on the desired posterior distributions. While priors can indirectly affect posterior distributions through Bayes’ theorem, imposing posterior regularization is arguably more direct and in some cases can be much easier. We particularly focus on developing infinite latent support vector machines (iLSVM) and multi-task infinite latent support vector machines (MT-iLSVM), which explore the largemargin idea in combination with a nonparametric Bayesian model for discovering predictive latent features for classification and multi-task learning, respectively. We present efficient inference methods and report empirical studies on several benchmark datasets. Our results appear to demonstrate the merits inherited from both large-margin learning and Bayesian nonparametrics.

2 0.77567625 132 nips-2011-Inferring Interaction Networks using the IBP applied to microRNA Target Prediction

Author: Hai-son P. Le, Ziv Bar-joseph

Abstract: Determining interactions between entities and the overall organization and clustering of nodes in networks is a major challenge when analyzing biological and social network data. Here we extend the Indian Buffet Process (IBP), a nonparametric Bayesian model, to integrate noisy interaction scores with properties of individual entities for inferring interaction networks and clustering nodes within these networks. We present an application of this method to study how microRNAs regulate mRNAs in cells. Analysis of synthetic and real data indicates that the method improves upon prior methods, correctly recovers interactions and clusters, and provides accurate biological predictions. 1

3 0.70132351 258 nips-2011-Sparse Bayesian Multi-Task Learning

Author: Shengbo Guo, Onno Zoeter, Cédric Archambeau

Abstract: We propose a new sparse Bayesian model for multi-task regression and classification. The model is able to capture correlations between tasks, or more specifically a low-rank approximation of the covariance matrix, while being sparse in the features. We introduce a general family of group sparsity inducing priors based on matrix-variate Gaussian scale mixtures. We show the amount of sparsity can be learnt from the data by combining an approximate inference approach with type II maximum likelihood estimation of the hyperparameters. Empirical evaluations on data sets from biology and vision demonstrate the applicability of the model, where on both regression and classification tasks it achieves competitive predictive performance compared to previously proposed methods. 1

4 0.67494231 269 nips-2011-Spike and Slab Variational Inference for Multi-Task and Multiple Kernel Learning

Author: Miguel Lázaro-gredilla, Michalis K. Titsias

Abstract: We introduce a variational Bayesian inference algorithm which can be widely applied to sparse linear models. The algorithm is based on the spike and slab prior which, from a Bayesian perspective, is the golden standard for sparse inference. We apply the method to a general multi-task and multiple kernel learning model in which a common set of Gaussian process functions is linearly combined with task-specific sparse weights, thus inducing relation between tasks. This model unifies several sparse linear models, such as generalized linear models, sparse factor analysis and matrix factorization with missing values, so that the variational algorithm can be applied to all these cases. We demonstrate our approach in multioutput Gaussian process regression, multi-class classification, image processing applications and collaborative filtering. 1

5 0.67326415 301 nips-2011-Variational Gaussian Process Dynamical Systems

Author: Neil D. Lawrence, Michalis K. Titsias, Andreas Damianou

Abstract: High dimensional time series are endemic in applications of machine learning such as robotics (sensor data), computational biology (gene expression data), vision (video sequences) and graphics (motion capture data). Practical nonlinear probabilistic approaches to this data are required. In this paper we introduce the variational Gaussian process dynamical system. Our work builds on recent variational approximations for Gaussian process latent variable models to allow for nonlinear dimensionality reduction simultaneously with learning a dynamical prior in the latent space. The approach also allows for the appropriate dimensionality of the latent space to be automatically determined. We demonstrate the model on a human motion capture data set and a series of high resolution video sequences. 1

6 0.65707248 60 nips-2011-Confidence Sets for Network Structure

7 0.64645845 285 nips-2011-The Kernel Beta Process

8 0.63398325 42 nips-2011-Bayesian Bias Mitigation for Crowdsourcing

9 0.60534507 240 nips-2011-Robust Multi-Class Gaussian Process Classification

10 0.57529718 104 nips-2011-Generalized Beta Mixtures of Gaussians

11 0.55740952 288 nips-2011-Thinning Measurement Models and Questionnaire Design

12 0.50056529 83 nips-2011-Efficient inference in matrix-variate Gaussian models with \iid observation noise

13 0.49338388 148 nips-2011-Learning Probabilistic Non-Linear Latent Variable Models for Tracking Complex Activities

14 0.48472774 217 nips-2011-Practical Variational Inference for Neural Networks

15 0.4741427 191 nips-2011-Nonnegative dictionary learning in the exponential noise model for adaptive music signal representation

16 0.46513978 116 nips-2011-Hierarchically Supervised Latent Dirichlet Allocation

17 0.46444249 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models

18 0.46101335 14 nips-2011-A concave regularization technique for sparse mixture models

19 0.45921096 243 nips-2011-Select and Sample - A Model of Efficient Neural Inference and Learning

20 0.45849842 139 nips-2011-Kernel Bayes' Rule


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.023), (3, 0.255), (4, 0.057), (20, 0.032), (26, 0.021), (31, 0.118), (33, 0.034), (43, 0.089), (45, 0.119), (57, 0.033), (74, 0.045), (83, 0.044), (84, 0.013), (99, 0.041)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.84640598 101 nips-2011-Gaussian process modulated renewal processes

Author: Yee W. Teh, Vinayak Rao

Abstract: Renewal processes are generalizations of the Poisson process on the real line whose intervals are drawn i.i.d. from some distribution. Modulated renewal processes allow these interevent distributions to vary with time, allowing the introduction of nonstationarity. In this work, we take a nonparametric Bayesian approach, modelling this nonstationarity with a Gaussian process. Our approach is based on the idea of uniformization, which allows us to draw exact samples from an otherwise intractable distribution. We develop a novel and efficient MCMC sampler for posterior inference. In our experiments, we test these on a number of synthetic and real datasets. 1

2 0.83385891 43 nips-2011-Bayesian Partitioning of Large-Scale Distance Data

Author: David Adametz, Volker Roth

Abstract: A Bayesian approach to partitioning distance matrices is presented. It is inspired by the Translation-invariant Wishart-Dirichlet process (TIWD) in [1] and shares a number of advantageous properties like the fully probabilistic nature of the inference model, automatic selection of the number of clusters and applicability in semi-supervised settings. In addition, our method (which we call fastTIWD) overcomes the main shortcoming of the original TIWD, namely its high computational costs. The fastTIWD reduces the workload in each iteration of a Gibbs sampler from O(n3 ) in the TIWD to O(n2 ). Our experiments show that the cost reduction does not compromise the quality of the inferred partitions. With this new method it is now possible to ‘mine’ large relational datasets with a probabilistic model, thereby automatically detecting new and potentially interesting clusters. 1

same-paper 3 0.78859156 134 nips-2011-Infinite Latent SVM for Classification and Multi-task Learning

Author: Jun Zhu, Ning Chen, Eric P. Xing

Abstract: Unlike existing nonparametric Bayesian models, which rely solely on specially conceived priors to incorporate domain knowledge for discovering improved latent representations, we study nonparametric Bayesian inference with regularization on the desired posterior distributions. While priors can indirectly affect posterior distributions through Bayes’ theorem, imposing posterior regularization is arguably more direct and in some cases can be much easier. We particularly focus on developing infinite latent support vector machines (iLSVM) and multi-task infinite latent support vector machines (MT-iLSVM), which explore the largemargin idea in combination with a nonparametric Bayesian model for discovering predictive latent features for classification and multi-task learning, respectively. We present efficient inference methods and report empirical studies on several benchmark datasets. Our results appear to demonstrate the merits inherited from both large-margin learning and Bayesian nonparametrics.

4 0.78468335 33 nips-2011-An Exact Algorithm for F-Measure Maximization

Author: Krzysztof J. Dembczynski, Willem Waegeman, Weiwei Cheng, Eyke Hüllermeier

Abstract: The F-measure, originally introduced in information retrieval, is nowadays routinely used as a performance metric for problems such as binary classification, multi-label classification, and structured output prediction. Optimizing this measure remains a statistically and computationally challenging problem, since no closed-form maximizer exists. Current algorithms are approximate and typically rely on additional assumptions regarding the statistical distribution of the binary response variables. In this paper, we present an algorithm which is not only computationally efficient but also exact, regardless of the underlying distribution. The algorithm requires only a quadratic number of parameters of the joint distribution (with respect to the number of binary responses). We illustrate its practical performance by means of experimental results for multi-label classification. 1

5 0.6424253 180 nips-2011-Multiple Instance Filtering

Author: Kamil A. Wnuk, Stefano Soatto

Abstract: We propose a robust filtering approach based on semi-supervised and multiple instance learning (MIL). We assume that the posterior density would be unimodal if not for the effect of outliers that we do not wish to explicitly model. Therefore, we seek for a point estimate at the outset, rather than a generic approximation of the entire posterior. Our approach can be thought of as a combination of standard finite-dimensional filtering (Extended Kalman Filter, or Unscented Filter) with multiple instance learning, whereby the initial condition comes with a putative set of inlier measurements. We show how both the state (regression) and the inlier set (classification) can be estimated iteratively and causally by processing only the current measurement. We illustrate our approach on visual tracking problems whereby the object of interest (target) moves and evolves as a result of occlusions and deformations, and partial knowledge of the target is given in the form of a bounding box (training set). 1

6 0.63985372 258 nips-2011-Sparse Bayesian Multi-Task Learning

7 0.63557106 75 nips-2011-Dynamical segmentation of single trials from population neural data

8 0.63472193 206 nips-2011-Optimal Reinforcement Learning for Gaussian Systems

9 0.63426745 301 nips-2011-Variational Gaussian Process Dynamical Systems

10 0.63362098 37 nips-2011-Analytical Results for the Error in Filtering of Gaussian Processes

11 0.63119501 273 nips-2011-Structural equations and divisive normalization for energy-dependent component analysis

12 0.6310674 92 nips-2011-Expressive Power and Approximation Errors of Restricted Boltzmann Machines

13 0.63062286 156 nips-2011-Learning to Learn with Compound HD Models

14 0.62967998 66 nips-2011-Crowdclustering

15 0.62924272 246 nips-2011-Selective Prediction of Financial Trends with Hidden Markov Models

16 0.62897748 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs

17 0.62808597 219 nips-2011-Predicting response time and error rates in visual search

18 0.62798613 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models

19 0.62794864 229 nips-2011-Query-Aware MCMC

20 0.62640798 231 nips-2011-Randomized Algorithms for Comparison-based Search