nips nips2009 nips2009-158 knowledge-graph by maker-knowledge-mining

158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA


Source: pdf

Author: Piyush Rai, Hal Daume

Abstract: Canonical Correlation Analysis (CCA) is a useful technique for modeling dependencies between two (or more) sets of variables. Building upon the recently suggested probabilistic interpretation of CCA, we propose a nonparametric, fully Bayesian framework that can automatically select the number of correlation components, and effectively capture the sparsity underlying the projections. In addition, given (partially) labeled data, our algorithm can also be used as a (semi)supervised dimensionality reduction technique, and can be applied to learn useful predictive features in the context of learning a set of related tasks. Experimental results demonstrate the efficacy of the proposed approach for both CCA as a stand-alone problem, and when applied to multi-label prediction. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Building upon the recently suggested probabilistic interpretation of CCA, we propose a nonparametric, fully Bayesian framework that can automatically select the number of correlation components, and effectively capture the sparsity underlying the projections. [sent-4, score-0.35]

2 In addition, given (partially) labeled data, our algorithm can also be used as a (semi)supervised dimensionality reduction technique, and can be applied to learn useful predictive features in the context of learning a set of related tasks. [sent-5, score-0.178]

3 The aforementioned setting is a special case of multitask learning [6] when predicting each label is a task and all the tasks share a common source of input. [sent-15, score-0.226]

4 However, such an approach ignores the label correlations and leads to sub-optimal performance [20]. [sent-18, score-0.126]

5 One important application of CCA is in supervised dimensionality reduction, albeit in the more general setting where each example has several labels. [sent-21, score-0.162]

6 In this setting, CCA on input-output pair (X, Y) can be used to project inputs X to a low-dimensional space directed by label information Y. [sent-22, score-0.099]

7 An even more crucial issue is choosing the number of correlation components, which is traditionally dealt with by using cross-validation, or model-selection [21]. [sent-26, score-0.129]

8 Another issue is the potential sparsity [18] of the underlying projections that is ignored by the standard CCA formulation. [sent-27, score-0.143]

9 Building upon the recently suggested probabilistic interpretation of CCA [3], we propose a nonparametric, fully Bayesian framework that can deal with each of these issues. [sent-28, score-0.141]

10 In particular, the proposed model can automatically select the number of correlation components, and effectively capture the sparsity underlying the projections. [sent-29, score-0.26]

11 Our framework is based on the Indian Buffet Process [9], a nonparametric Bayesian model to discover latent feature representation of a set of observations. [sent-30, score-0.148]

12 In addition, our probabilistic model allows dealing with missing data and, in the supervised dimensionality reduction case, can incorporate additional unlabeled data one may have access to, making our CCA algorithm work in a semi-supervised setting. [sent-31, score-0.355]

13 Thus, apart from being a general, nonparametric, fully Bayesian solution to the CCA problem, our framework can be readily applied for learning useful predictive features from labeled (or partially labeled) data in the context of learning a set of related tasks. [sent-32, score-0.088]

14 In particular, we describe a fully supervised setting (when the test data is not available at the time of training), and a semi-supervised setting with partial labels (when we have access to test data at the time of training). [sent-37, score-0.179]

15 2 Canonical Correlation Analysis Canonical correlation analysis (CCA) is a useful technique for modeling the relationships among a set of variables. [sent-40, score-0.103]

16 More formally, given a pair of variables x ∈ R^{D1} and y ∈ R^{D2}, CCA seeks to find linear projections u_x and u_y such that the variables are maximally correlated in the projected space. [sent-42, score-0.251]

17 The correlation coefficient between the two variables in the embedded space is given by $\rho = \frac{u_x^{T}\, x y^{T} u_y}{\sqrt{(u_x^{T}\, x x^{T} u_x)\,(u_y^{T}\, y y^{T} u_y)}}$. Since the correlation is not affected by rescaling of the projections u_x and u_y, CCA is posed as a constrained optimization problem. [sent-43, score-0.817]
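
As a concrete reference point for the discussion above, here is a minimal numerical sketch of classical CCA solved as a generalized eigenvalue problem. The function name, the small ridge term `reg` (added only for numerical stability), and the use of scipy.linalg.eigh are choices made for this illustration, not details taken from the paper.

```python
import numpy as np
from scipy.linalg import eigh

def classical_cca(X, Y, k, reg=1e-6):
    """Classical CCA: find projections u_x, u_y maximizing the correlation above.

    X: (N, D1) array, Y: (N, D2) array, k: number of components to keep.
    Returns Ux (D1, k) and Uy (D2, k), columns ordered by decreasing correlation."""
    N = X.shape[0]
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / N + reg * np.eye(X.shape[1])   # sample covariance of x (regularized)
    Syy = Yc.T @ Yc / N + reg * np.eye(Y.shape[1])   # sample covariance of y (regularized)
    Sxy = Xc.T @ Yc / N                              # cross-covariance
    # u_x solves  Sxy Syy^{-1} Syx u_x = rho^2 Sxx u_x  (generalized symmetric eigenproblem).
    M = Sxy @ np.linalg.solve(Syy, Sxy.T)
    rho2, Ux = eigh(M, Sxx)
    order = np.argsort(rho2)[::-1][:k]               # keep the k largest squared correlations
    Ux = Ux[:, order]
    Uy = np.linalg.solve(Syy, Sxy.T @ Ux)            # u_y is proportional to Syy^{-1} Syx u_x
    Uy /= np.linalg.norm(Uy, axis=0, keepdims=True)  # scale is arbitrary, so just normalize
    return Ux, Uy
```

The maximally correlated coordinates are then simply Xc @ Ux and Yc @ Uy.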

18 1 Probabilistic CCA Bach and Jordan [3] gave a probabilistic interpretation of CCA by posing it as a latent variable model. [sent-52, score-0.164]

19 Bach and Jordan [3] showed that, given the maximum likelihood solution for the model parameters, the expectations E(z|x) and E(z|y) of the latent variable z lie in the same subspace that classical CCA finds, thereby establishing the equivalence between the above probabilistic model and CCA. [sent-59, score-0.311]
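
A small sketch of that latent variable model as a data generator may help make the setup concrete. The isotropic noise here is a simplification (the Bach and Jordan model allows general noise covariances), and all names and sizes are illustrative only.

```python
import numpy as np

def sample_probabilistic_cca(N, D1, D2, K, noise=0.1, seed=0):
    """Draw synthetic (X, Y) from a probabilistic-CCA style latent variable model:
    z_n ~ N(0, I_K),  x_n = Wx z_n + eps_x,  y_n = Wy z_n + eps_y."""
    rng = np.random.default_rng(seed)
    Wx = rng.normal(size=(D1, K))            # projection for the x view
    Wy = rng.normal(size=(D2, K))            # projection for the y view
    Z = rng.normal(size=(K, N))              # shared latent variables
    X = Wx @ Z + noise * rng.normal(size=(D1, N))
    Y = Wy @ Z + noise * rng.normal(size=(D2, N))
    return X, Y, Z, Wx, Wy
```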

20 However, it still assumes an a priori fixed number of canonical correlation components. [sent-61, score-0.214]

21 In addition, another important issue is the sparsity of the underlying projection matrix which is usually ignored. [sent-62, score-0.22]

22 A crucial issue in the CCA model is choosing the number of canonical correlation components which is set to a fixed value in classical CCA (and even in the probabilistic extensions of CCA). [sent-64, score-0.393]

23 In the Bayesian formulation of CCA, one can use the Automatic Relevance Determination (ARD) prior [5] on the projection matrix W that gives a way to select this number. [sent-65, score-0.123]

24 We propose a nonparametric Bayesian model that selects the number of canonical correlation components automatically. [sent-67, score-0.314]

25 More specifically, we use the Indian Buffet Process [9] (IBP) as a nonparametric prior on the projection matrix W. [sent-68, score-0.164]

26 The IBP prior allows W to have an unbounded number of columns which gives a way to automatically determine the dimensionality K of the latent space associated with Z. [sent-69, score-0.269]

27 1 The Indian Buffet Process The Indian Buffet Process [9] defines a distribution over infinite binary matrices, originally motivated by the need to model the latent feature structure of a given set of observations. [sent-71, score-0.136]

28 In the latent feature model, each observation can be thought of as being explained by a set of latent features. [sent-73, score-0.166]

29 Given an N × D matrix X of N observations having D features each, we can consider a decomposition of the form X = ZA + E where Z is an N × K binary feature-assignment matrix describing which features are present in each observation. [sent-74, score-0.117]

30 A is a K × D matrix of feature scores, and the matrix E consists of observation specific noise. [sent-76, score-0.088]
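
A toy instance of this decomposition, with arbitrary sizes and a hand-picked feature probability, is sketched below; it only illustrates the shapes involved.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, D = 100, 5, 20
Z = (rng.random((N, K)) < 0.3).astype(float)   # N x K binary feature-assignment matrix
A = rng.normal(size=(K, D))                    # K x D feature scores
E = 0.1 * rng.normal(size=(N, D))              # observation-specific noise
X = Z @ A + E                                  # the latent feature model X = ZA + E
```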

31 A crucial issue in such models is choosing the number K of latent features. [sent-77, score-0.109]

32 The standard formulation of IBP lets us define a prior over the binary matrix Z such that it can have an unbounded number of columns and thus can be a suitable prior in problems dealing with such structures. [sent-78, score-0.163]

33 The IBP derivation starts by defining a finite model for K columns of an N × K binary matrix. [sent-79, score-0.099]

34 This equivalence can be best understood by a culinary analogy of customers coming to an Indian restaurant and selecting dishes from an infinite array of dishes. [sent-82, score-0.123]

35 In this analogy, customers represent observations and dishes represent latent features. [sent-83, score-0.206]

36 Thereafter, each incoming customer n selects an existing dish k with probability m_k/n, where m_k denotes how many previous customers chose that particular dish. [sent-85, score-0.176]

37 This process generates a binary matrix Z with rows representing customers and columns representing dishes. [sent-87, score-0.158]
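
The generative process just described can be written down directly. The sketch below follows the standard IBP recipe (the first customer takes Poisson(α) dishes, the n-th customer takes existing dish k with probability m_k/n and then Poisson(α/n) new dishes); it is not code from the paper.

```python
import numpy as np

def sample_ibp(N, alpha, seed=0):
    """Draw a binary feature-assignment matrix Z (customers x dishes) from the IBP."""
    rng = np.random.default_rng(seed)
    Z = np.zeros((0, 0), dtype=int)
    for n in range(1, N + 1):
        if Z.shape[1] > 0:
            m = Z.sum(axis=0)                                  # previous customers per dish
            existing = (rng.random(Z.shape[1]) < m / n).astype(int)
        else:
            existing = np.zeros(0, dtype=int)
        k_new = rng.poisson(alpha / n)                         # brand-new dishes for customer n
        row = np.concatenate([existing, np.ones(k_new, dtype=int)])
        Z = np.pad(Z, ((0, 0), (0, k_new)))                    # add columns for the new dishes
        Z = np.vstack([Z, row])
    return Z
```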

38 Many real world datasets have a sparseness property, which means that each observation depends only on a subset of all the K latent features. [sent-88, score-0.141]

39 (Figure 1 caption: The graphical model depicts the fully supervised case when all variables X and Y are observed; the semisupervised case can have X and/or Y consisting of missing values as well, and the graphical model structure remains the same.) [sent-89, score-0.09]

41 This means that the binary matrix Z is expected to be reasonably sparse for many datasets. [sent-91, score-0.103]

42 This makes IBP a suitable choice for also capturing the underlying sparsity in addition to automatically discovering the number of latent features. [sent-92, score-0.255]

43 2 The Infinite CCA Model In our proposed framework, the matrix W consisting of canonical correlation vectors is modeled using an IBP prior. [sent-94, score-0.292]

44 X = [x1, . . . , xN] is a D1 × N matrix consisting of N samples of D1 dimensions each, and Y = [y1, . . . , yN] is another D2 × N matrix consisting of N samples of D2 dimensions each. [sent-103, score-0.101] [sent-106, score-0.101]

46 This is particularly important in the case of supervised dimensionality reduction (i.e., X consisting of inputs and Y of associated responses) when the labels for some of the inputs are unknown, making it a model for semi-supervised dimensionality reduction with partially labeled data. [sent-110, score-0.233] [sent-112, score-0.355]

48 In addition, placing the IBP prior on the projection matrix W (via the binary matrix B) also helps in capturing the sparsity in W (see results section for evidence). [sent-113, score-0.303]

49 3 Inference We take a fully Bayesian approach by treating everything as latent variables and computing the posterior distributions over them. [sent-115, score-0.116]

50 In what follows, D denotes the data [X; Y], B = [Bx; By], and V = [Vx; Vy]. Sampling B: Sampling the binary IBP matrix B consists of sampling existing dishes, proposing new dishes, and accepting or rejecting them based on the acceptance ratio in the associated M-H step. [sent-117, score-0.186]
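
A schematic of the "sampling existing dishes" part of that step is given below, using the usual IBP conditional P(b_nk = 1 | rest) = m_{-n,k}/N. The model-specific likelihood is left as a user-supplied function, and the M-H move that proposes new dishes (and its acceptance ratio) is not reproduced here; this is a sketch, not the paper's sampler.

```python
import numpy as np

def resample_existing_entries(B, loglik, rng=None):
    """One Gibbs sweep over the non-singleton entries of a binary IBP matrix B (N x K).

    loglik(B) must return the data log-likelihood of a candidate matrix; it stands in
    for the model-specific term involving D and V."""
    rng = np.random.default_rng() if rng is None else rng
    N, K = B.shape
    for n in range(N):
        for k in range(K):
            m = B[:, k].sum() - B[n, k]           # other rows currently using column k
            if m == 0:
                continue                          # singleton column: left to the new-dish M-H move
            logp = np.empty(2)
            for v in (0, 1):
                B[n, k] = v
                prior = np.log(m / N) if v == 1 else np.log(1.0 - m / N)
                logp[v] = prior + loglik(B)
            p1 = 1.0 / (1.0 + np.exp(logp[0] - logp[1]))   # P(b_nk = 1 | everything else)
            B[n, k] = int(rng.random() < p1)
    return B
```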

51 Note that the number of columns in V is the same as the number of columns in the IBP matrix B. [sent-127, score-0.136]

52 To deal with this issue, one could sample Bx (say having Kx nonzero columns) and By (say having Ky nonzero columns) first, introduce extra dummy columns (|Kx − Ky| in number) in the matrix having the smaller number of nonzero columns, and then set all such columns to zero. [sent-133, score-0.241]
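
A tiny helper making that dummy-column trick explicit (the function name is mine, not the paper's):

```python
import numpy as np

def pad_to_common_width(Bx, By):
    """Append all-zero dummy columns to whichever matrix is narrower, so that
    Bx and By end up with the same number of columns K = max(Kx, Ky)."""
    K = max(Bx.shape[1], By.shape[1])
    Bx = np.pad(Bx, ((0, 0), (0, K - Bx.shape[1])))
    By = np.pad(By, ((0, 0), (0, K - By.shape[1])))
    return Bx, By
```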

53 4 Multitask Learning using Infinite CCA Having set up the framework for infinite CCA, we now describe its applicability for the problem of multitask learning. [sent-136, score-0.091]

54 Here predicting each individual label becomes a task to be learned. [sent-138, score-0.11]

55 Although one can individually learn a separate model for each task, doing this would ignore the label correlations. [sent-139, score-0.104]

56 With this motivation, we apply our infinite CCA model to capture the label correlations and to learn better predictive features from the data by projecting it to a subspace directed by label information. [sent-141, score-0.298]

57 It has been empirically and theoretically [25] shown that incorporating label information in dimensionality reduction indeed leads to better projections if the final goal is prediction. [sent-142, score-0.231]

58 The infinite CCA model is applied on the pair X and Y which is akin to doing supervised dimensionality reduction for the inputs X. [sent-153, score-0.298]

59 Note that the generalized eigenvalue problem posed in such a supervised setting of CCA consists of the cross-covariance matrix Σ_XY and the label covariance matrix Σ_YY. [sent-154, score-0.258]

60 Therefore the projection takes into account both the input-output correlations and the label correlations. [sent-155, score-0.158]

61 Multitask learning using the infinite CCA model can be done in two settings, supervised and semisupervised, depending on whether or not the inputs of the test data are involved in learning the shared subspace Z. [sent-157, score-0.307]

62 1 Fully supervised setting In the supervised setting, CCA is done on labeled data (X, Y) to give a single shared subspace Z ∈ RK×N that is good across all tasks. [sent-159, score-0.298]

63 A model is then learned in the Z subspace, giving M task parameters {θ_m} ∈ R^{K×1}, where m ∈ {1, . . . , M}. [sent-160, score-0.139]

64 With the second option, we can inflate each learned task parameter back to D dimensions by applying the projection matrix Wx. [sent-167, score-0.153]
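
Putting the fully supervised setting together, here is a sketch of training in the shared subspace and inflating the task parameters back to the input dimension via Wx (the second option above). Ridge regression is used only as a stand-in per-task learner; the paper does not prescribe this particular choice, and Wx here could be a MAP sample of the projection matrix, the classical_cca projection from the earlier sketch, or any other learned projection.

```python
import numpy as np

def supervised_multitask(X, Y, Wx, lam=1.0):
    """X: (N, D1) inputs, Y: (N, M) label matrix, Wx: (D1, K) learned projection.

    Fits one ridge-regression parameter vector per label in the K-dim subspace,
    then inflates each parameter back to D1 dimensions with Wx."""
    Z = X @ Wx                                                      # N x K shared representation
    K = Z.shape[1]
    theta_K = np.linalg.solve(Z.T @ Z + lam * np.eye(K), Z.T @ Y)   # K x M task parameters
    theta_D = Wx @ theta_K                                          # D1 x M, usable on raw inputs
    return theta_K, theta_D

# Prediction for task m on a new input x (shape (D1,)) is then simply x @ theta_D[:, m].
```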

65 The infinite CCA model is then applied on the pair (X, Y), and the parts of Y corresponding to Yte are treated as latent variables to be imputed. [sent-172, score-0.141]

66 With this model, we also get embeddings for the test data, and thus both training and testing take place in the K-dimensional subspace, unlike model-1, in which training is done in the K-dimensional subspace but predictions are made in the original D-dimensional space. [sent-173, score-0.191]

67 We first show our results with the infinite CCA as a stand alone algorithm for CCA by using it on a synthetic dataset demonstrating its effectiveness in capturing the canonical correlations. [sent-176, score-0.15]

68 We then also report our experiments on applying the infinite CCA model to the problem of multitask learning on two real world datasets. [sent-177, score-0.115]

69 1 Infinite CCA results on synthetic data In the first experiment, we demonstrate the effectiveness of our proposed infinite CCA model in discovering the correct number of canonical correlation components, and in capturing the sparsity pattern underlying the projection matrix. [sent-179, score-0.427]

70 The synthetic data was generated with a known ground truth (i.e., the number of components and the underlying sparsity of the projection matrix). [sent-183, score-0.15]

71 In particular, the dataset had 4 correlation components with a 63% sparsity in the true projection matrix. [sent-184, score-0.263]

72 Looking at all the correlations discovered by classical CCA, we found that it discovered 8 components having significant correlations, whereas our model correctly discovered exactly 4 components in the first place (we extract the MAP samples for W and Z output by our Gibbs sampler). [sent-186, score-0.258]

73 Thus on this small dataset, standard CCA indeed seems to be finding spurious correlations, indicating a case of overfitting (the overfitting problem of classical CCA was also observed in [15] when comparing Bayesian versus classical CCA). [sent-187, score-0.092]

74 Furthermore, as expected, the projection matrix inferred by the classical CCA had no exact zero entries and even after thresholding significantly small absolute values to zero, the uncovered sparsity was only about 25%. [sent-188, score-0.215]

75 On the other hand, the projection matrix inferred by the infinite CCA model had 57% exact zero entries and 62% zero entries after thresholding very small values, thereby demonstrating its effectiveness in also capturing the sparsity patterns. [sent-189, score-0.232]

76 This baseline ignores the label information while learning the low dimensional subspace. [sent-243, score-0.118]

77 • CCA: Apply classical CCA on the training data to extract the shared subspace, learn a separate model (i.e., task parameters) for each task in this subspace, project the task parameters back to the original D-dimensional feature space by applying the projection Wx, and do predictions on the test data in this feature space. [sent-244, score-0.136] [sent-246, score-0.179]

79 • Model-1: Use our supervised infinite CCA model to learn the shared subspace using only the training data (see section 4.1). [sent-247, score-0.238]

80 • Model-2: Use our semi-supervised infinite CCA model to simultaneously learn the shared subspace for both training and test data (see section 4.2). [sent-249, score-0.154]

81 This is possible in our probabilistic model since we could treat the unknown Y’s of the test data as latent variables to be imputed while doing the Gibbs sampling. [sent-256, score-0.155]

82 6 Related Work A number of approaches have been proposed in the recent past for the problem of supervised dimensionality reduction of multi-label data. [sent-259, score-0.233]

83 The few approaches that exist include Partial Least Squares [2], multi-label informed latent semantic indexing [24], and multi-label dimensionality reduction using dependence maximization (MDDM) [26]. [sent-260, score-0.232]

84 Somewhat similar in spirit to our approach is the work on supervised probabilistic PCA [25] that extends probabilistic PCA to the setting when we also have access to labels. [sent-262, score-0.202]

85 However, it assumes a fixed number of components and does not take into account sparsity of the projections. [sent-263, score-0.103]

86 The CCA based approach to supervised dimensionality reduction is more closely related to the notion of dimension reduction for regression (DRR), which is formally defined as finding a low dimensional representation z ∈ R^K of inputs x ∈ R^D (K ≪ D) for predicting multivariate outputs y ∈ R^M. [sent-264, score-0.403]

87 An important notion in DRR is that of sufficient dimensionality reduction (SDR) [10, 8], which states that given z, x and y are conditionally independent, i.e., x ⊥ y | z. [sent-265, score-0.149]

88 As we can see in the graphical model shown in figure-1, the probabilistic interpretation of CCA yields the same condition, with X and Y being conditionally independent given Z. [sent-268, score-0.105]

89 Among the DRR based approaches to dimensionality reduction for real-valued multilabel data, Covariance Operator Inverse Regression (COIR) exploits the covariance structures of both the inputs and outputs [14]. [sent-269, score-0.19]

90 In another recent work [13], a joint learning framework is proposed which performs dimensionality reduction and multi-label classification simultaneously. [sent-277, score-0.149]

91 In particular, sparsity improves model interpretation and has been gaining lots of attention recently. [sent-279, score-0.125]

92 Another recent solution is based on a direct greedy approach which bounds the correlation at each stage [22]. [sent-281, score-0.103]

93 Finally, multitask learning has been tackled using a variety of different approaches, primarily depending on what notion of task relatedness is assumed. [sent-286, score-0.144]

94 Some of the examples include tasks generated from an IID space [4], and learning multiple tasks using a hierarchical prior over the task space [23, 7], among others. [sent-287, score-0.101]

95 In particular, our model does not assume a fixed number of correlation components and this number is determined automatically based only on the data. [sent-290, score-0.202]

96 In addition, our model enjoys sparsity making the model more interpretable. [sent-291, score-0.116]

97 The probabilistic nature of our model also allows dealing with missing data. [sent-292, score-0.1]

98 Finally, we also demonstrate the model’s applicability to the problem of multi-label learning where our model, directed by label information, can be used to automatically extract useful predictive features from the data. [sent-293, score-0.127]

99 Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. [sent-342, score-0.155]

100 Covariance operator based dimensionality reduction with extension to semisupervised settings. [sent-384, score-0.177]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('cca', 0.827), ('ibp', 0.192), ('uy', 0.133), ('canonical', 0.111), ('correlation', 0.103), ('bx', 0.094), ('ux', 0.094), ('multitask', 0.091), ('subspace', 0.086), ('supervised', 0.084), ('latent', 0.083), ('dishes', 0.083), ('wx', 0.081), ('dimensionality', 0.078), ('reduction', 0.071), ('sparsity', 0.068), ('drr', 0.067), ('xte', 0.067), ('label', 0.058), ('projection', 0.057), ('vx', 0.05), ('vy', 0.05), ('oisson', 0.05), ('yte', 0.05), ('probabilistic', 0.048), ('nite', 0.047), ('columns', 0.046), ('classical', 0.046), ('yeast', 0.045), ('ut', 0.045), ('matrix', 0.044), ('shared', 0.044), ('correlations', 0.043), ('indian', 0.043), ('bayesian', 0.041), ('inputs', 0.041), ('pca', 0.041), ('nonparametric', 0.041), ('customers', 0.04), ('automatically', 0.04), ('labels', 0.04), ('customer', 0.039), ('capturing', 0.039), ('buffet', 0.038), ('ky', 0.038), ('daum', 0.038), ('mk', 0.037), ('kx', 0.036), ('components', 0.035), ('dimensional', 0.035), ('wt', 0.035), ('wy', 0.034), ('consisting', 0.034), ('ate', 0.033), ('bik', 0.033), ('coir', 0.033), ('xyt', 0.033), ('interpretation', 0.033), ('fully', 0.033), ('bach', 0.032), ('sampling', 0.03), ('sparse', 0.03), ('yu', 0.03), ('predictive', 0.029), ('yyt', 0.029), ('acc', 0.029), ('wwt', 0.029), ('task', 0.029), ('binary', 0.029), ('semisupervised', 0.028), ('missing', 0.028), ('eigenvalue', 0.028), ('auc', 0.028), ('option', 0.028), ('ib', 0.027), ('wz', 0.027), ('deal', 0.027), ('gibbs', 0.026), ('bottleneck', 0.026), ('partially', 0.026), ('nonzero', 0.026), ('issue', 0.026), ('underlying', 0.025), ('tasks', 0.025), ('scene', 0.025), ('ignores', 0.025), ('discovered', 0.025), ('projections', 0.024), ('relatedness', 0.024), ('rai', 0.024), ('xxt', 0.024), ('ard', 0.024), ('model', 0.024), ('predicting', 0.023), ('dish', 0.023), ('dimensions', 0.023), ('rk', 0.022), ('separate', 0.022), ('prior', 0.022), ('access', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA

Author: Piyush Rai, Hal Daume

Abstract: Canonical Correlation Analysis (CCA) is a useful technique for modeling dependencies between two (or more) sets of variables. Building upon the recently suggested probabilistic interpretation of CCA, we propose a nonparametric, fully Bayesian framework that can automatically select the number of correlation components, and effectively capture the sparsity underlying the projections. In addition, given (partially) labeled data, our algorithm can also be used as a (semi)supervised dimensionality reduction technique, and can be applied to learn useful predictive features in the context of learning a set of related tasks. Experimental results demonstrate the efficacy of the proposed approach for both CCA as a stand-alone problem, and when applied to multi-label prediction. 1

2 0.33004567 83 nips-2009-Estimating image bases for visual image reconstruction from human brain activity

Author: Yusuke Fujiwara, Yoichi Miyawaki, Yukiyasu Kamitani

Abstract: Image representation based on image bases provides a framework for understanding neural representation of visual perception. A recent fMRI study has shown that arbitrary contrast-defined visual images can be reconstructed from fMRI activity patterns using a combination of multi-scale local image bases. In the reconstruction model, the mapping from an fMRI activity pattern to the contrasts of the image bases was learned from measured fMRI responses to visual images. But the shapes of the images bases were fixed, and thus may not be optimal for reconstruction. Here, we propose a method to build a reconstruction model in which image bases are automatically extracted from the measured data. We constructed a probabilistic model that relates the fMRI activity space to the visual image space via a set of latent variables. The mapping from the latent variables to the visual image space can be regarded as a set of image bases. We found that spatially localized, multi-scale image bases were estimated near the fovea, and that the model using the estimated image bases was able to accurately reconstruct novel visual images. The proposed method provides a means to discover a novel functional mapping between stimuli and brain activity patterns.

3 0.31393927 50 nips-2009-Canonical Time Warping for Alignment of Human Behavior

Author: Feng Zhou, Fernando Torre

Abstract: Alignment of time series is an important problem to solve in many scientific disciplines. In particular, temporal alignment of two or more subjects performing similar activities is a challenging problem due to the large temporal scale difference between human actions as well as the inter/intra subject variability. In this paper we present canonical time warping (CTW), an extension of canonical correlation analysis (CCA) for spatio-temporal alignment of human motion between two subjects. CTW extends previous work on CCA in two ways: (i) it combines CCA with dynamic time warping (DTW), and (ii) it extends CCA by allowing local spatial deformations. We show CTW’s effectiveness in three experiments: alignment of synthetic data, alignment of motion capture data of two subjects performing similar actions, and alignment of similar facial expressions made by two people. Our results demonstrate that CTW provides both visually and qualitatively better alignment than state-of-the-art techniques based on DTW. 1

4 0.11254437 174 nips-2009-Nonparametric Latent Feature Models for Link Prediction

Author: Kurt Miller, Michael I. Jordan, Thomas L. Griffiths

Abstract: As the availability and importance of relational data—such as the friendships summarized on a social networking website—increases, it becomes increasingly important to have good models for such data. The kinds of latent structure that have been considered for use in predicting links in such networks have been relatively limited. In particular, the machine learning community has focused on latent class models, adapting Bayesian nonparametric methods to jointly infer how many latent classes there are while learning which entities belong to each class. We pursue a similar approach with a richer kind of latent variable—latent features—using a Bayesian nonparametric approach to simultaneously infer the number of features at the same time we learn which entities have each feature. Our model combines these inferred features with known covariates in order to perform link prediction. We demonstrate that the greater expressiveness of this approach allows us to improve performance on three datasets. 1

5 0.11206143 123 nips-2009-Large Scale Nonparametric Bayesian Inference: Data Parallelisation in the Indian Buffet Process

Author: Finale Doshi-velez, Shakir Mohamed, Zoubin Ghahramani, David A. Knowles

Abstract: Nonparametric Bayesian models provide a framework for flexible probabilistic modelling of complex datasets. Unfortunately, the high-dimensional averages required for Bayesian methods can be slow, especially with the unbounded representations used by nonparametric models. We address the challenge of scaling Bayesian inference to the increasingly large datasets found in real-world applications. We focus on parallelisation of inference in the Indian Buffet Process (IBP), which allows data points to have an unbounded number of sparse latent features. Our novel MCMC sampler divides a large data set between multiple processors and uses message passing to compute the global likelihoods and posteriors. This algorithm, the first parallel inference scheme for IBP-based models, scales to datasets orders of magnitude larger than have previously been possible. 1

6 0.10893483 29 nips-2009-An Infinite Factor Model Hierarchy Via a Noisy-Or Mechanism

7 0.10215394 114 nips-2009-Indian Buffet Processes with Power-law Behavior

8 0.074556991 217 nips-2009-Sharing Features among Dynamical Systems with Beta Processes

9 0.074076675 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs

10 0.07299269 108 nips-2009-Heterogeneous multitask learning with joint sparsity constraints

11 0.06709405 157 nips-2009-Multi-Label Prediction via Compressed Sensing

12 0.064946592 46 nips-2009-Bilinear classifiers for visual recognition

13 0.059292253 20 nips-2009-A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers

14 0.056738537 59 nips-2009-Construction of Nonparametric Bayesian Models from Parametric Bayes Equations

15 0.050602172 254 nips-2009-Variational Gaussian-process factor analysis for modeling spatio-temporal data

16 0.04656148 208 nips-2009-Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices via Convex Optimization

17 0.045651529 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models

18 0.044309378 57 nips-2009-Conditional Random Fields with High-Order Features for Sequence Labeling

19 0.043310799 140 nips-2009-Linearly constrained Bayesian matrix factorization for blind source separation

20 0.042801328 19 nips-2009-A joint maximum-entropy model for binary neural population patterns and continuous signals


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.174), (1, -0.057), (2, -0.042), (3, -0.019), (4, 0.05), (5, -0.083), (6, 0.111), (7, -0.127), (8, 0.004), (9, 0.087), (10, 0.091), (11, -0.001), (12, -0.167), (13, 0.026), (14, -0.274), (15, -0.026), (16, 0.261), (17, 0.207), (18, 0.146), (19, 0.045), (20, 0.04), (21, 0.001), (22, 0.119), (23, -0.161), (24, -0.093), (25, -0.018), (26, 0.18), (27, 0.037), (28, -0.178), (29, -0.269), (30, -0.028), (31, 0.145), (32, -0.043), (33, 0.023), (34, 0.008), (35, 0.103), (36, 0.004), (37, -0.062), (38, 0.008), (39, -0.036), (40, 0.016), (41, -0.046), (42, -0.057), (43, 0.029), (44, -0.093), (45, 0.031), (46, -0.055), (47, 0.076), (48, -0.006), (49, -0.043)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.91179484 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA

Author: Piyush Rai, Hal Daume

Abstract: Canonical Correlation Analysis (CCA) is a useful technique for modeling dependencies between two (or more) sets of variables. Building upon the recently suggested probabilistic interpretation of CCA, we propose a nonparametric, fully Bayesian framework that can automatically select the number of correlation components, and effectively capture the sparsity underlying the projections. In addition, given (partially) labeled data, our algorithm can also be used as a (semi)supervised dimensionality reduction technique, and can be applied to learn useful predictive features in the context of learning a set of related tasks. Experimental results demonstrate the efficacy of the proposed approach for both CCA as a stand-alone problem, and when applied to multi-label prediction. 1

2 0.86998028 50 nips-2009-Canonical Time Warping for Alignment of Human Behavior

Author: Feng Zhou, Fernando Torre

Abstract: Alignment of time series is an important problem to solve in many scientific disciplines. In particular, temporal alignment of two or more subjects performing similar activities is a challenging problem due to the large temporal scale difference between human actions as well as the inter/intra subject variability. In this paper we present canonical time warping (CTW), an extension of canonical correlation analysis (CCA) for spatio-temporal alignment of human motion between two subjects. CTW extends previous work on CCA in two ways: (i) it combines CCA with dynamic time warping (DTW), and (ii) it extends CCA by allowing local spatial deformations. We show CTW’s effectiveness in three experiments: alignment of synthetic data, alignment of motion capture data of two subjects performing similar actions, and alignment of similar facial expressions made by two people. Our results demonstrate that CTW provides both visually and qualitatively better alignment than state-of-the-art techniques based on DTW. 1

3 0.63864136 83 nips-2009-Estimating image bases for visual image reconstruction from human brain activity

Author: Yusuke Fujiwara, Yoichi Miyawaki, Yukiyasu Kamitani

Abstract: Image representation based on image bases provides a framework for understanding neural representation of visual perception. A recent fMRI study has shown that arbitrary contrast-defined visual images can be reconstructed from fMRI activity patterns using a combination of multi-scale local image bases. In the reconstruction model, the mapping from an fMRI activity pattern to the contrasts of the image bases was learned from measured fMRI responses to visual images. But the shapes of the images bases were fixed, and thus may not be optimal for reconstruction. Here, we propose a method to build a reconstruction model in which image bases are automatically extracted from the measured data. We constructed a probabilistic model that relates the fMRI activity space to the visual image space via a set of latent variables. The mapping from the latent variables to the visual image space can be regarded as a set of image bases. We found that spatially localized, multi-scale image bases were estimated near the fovea, and that the model using the estimated image bases was able to accurately reconstruct novel visual images. The proposed method provides a means to discover a novel functional mapping between stimuli and brain activity patterns.

4 0.44673184 114 nips-2009-Indian Buffet Processes with Power-law Behavior

Author: Yee W. Teh, Dilan Gorur

Abstract: The Indian buffet process (IBP) is an exchangeable distribution over binary matrices used in Bayesian nonparametric featural models. In this paper we propose a three-parameter generalization of the IBP exhibiting power-law behavior. We achieve this by generalizing the beta process (the de Finetti measure of the IBP) to the stable-beta process and deriving the IBP corresponding to it. We find interesting relationships between the stable-beta process and the Pitman-Yor process (another stochastic process used in Bayesian nonparametric models with interesting power-law properties). We derive a stick-breaking construction for the stable-beta process, and find that our power-law IBP is a good model for word occurrences in document corpora. 1

5 0.40037414 29 nips-2009-An Infinite Factor Model Hierarchy Via a Noisy-Or Mechanism

Author: Douglas Eck, Yoshua Bengio, Aaron C. Courville

Abstract: The Indian Buffet Process is a Bayesian nonparametric approach that models objects as arising from an infinite number of latent factors. Here we extend the latent factor model framework to two or more unbounded layers of latent factors. From a generative perspective, each layer defines a conditional factorial prior distribution over the binary latent variables of the layer below via a noisy-or mechanism. We explore the properties of the model with two empirical studies, one digit recognition task and one music tag data experiment. 1

6 0.35975385 174 nips-2009-Nonparametric Latent Feature Models for Link Prediction

7 0.34617957 46 nips-2009-Bilinear classifiers for visual recognition

8 0.33457187 123 nips-2009-Large Scale Nonparametric Bayesian Inference: Data Parallelisation in the Indian Buffet Process

9 0.26359788 217 nips-2009-Sharing Features among Dynamical Systems with Beta Processes

10 0.23912525 42 nips-2009-Bayesian Sparse Factor Models and DAGs Inference and Comparison

11 0.23368165 59 nips-2009-Construction of Nonparametric Bayesian Models from Parametric Bayes Equations

12 0.23006758 173 nips-2009-Nonparametric Greedy Algorithms for the Sparse Learning Problem

13 0.22584596 152 nips-2009-Measuring model complexity with the prior predictive

14 0.21895029 203 nips-2009-Replacing supervised classification learning by Slow Feature Analysis in spiking neural networks

15 0.21766421 49 nips-2009-Breaking Boundaries Between Induction Time and Diagnosis Time Active Information Acquisition

16 0.21682686 108 nips-2009-Heterogeneous multitask learning with joint sparsity constraints

17 0.21634668 143 nips-2009-Localizing Bugs in Program Executions with Graphical Models

18 0.20653377 62 nips-2009-Correlation Coefficients are Insufficient for Analyzing Spike Count Dependencies

19 0.20452814 157 nips-2009-Multi-Label Prediction via Compressed Sensing

20 0.197291 58 nips-2009-Constructing Topological Maps using Markov Random Fields and Loop-Closure Detection


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(1, 0.013), (7, 0.013), (21, 0.011), (24, 0.039), (25, 0.051), (35, 0.075), (36, 0.112), (39, 0.068), (55, 0.013), (58, 0.116), (61, 0.012), (66, 0.055), (71, 0.071), (81, 0.019), (86, 0.076), (91, 0.013), (94, 0.153)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.88329303 49 nips-2009-Breaking Boundaries Between Induction Time and Diagnosis Time Active Information Acquisition

Author: Ashish Kapoor, Eric Horvitz

Abstract: To date, the processes employed for active information acquisition during periods of learning and diagnosis have been considered as separate and have been applied in distinct phases of analysis. While active learning centers on the collection of information about training cases in order to build better predictive models, diagnosis uses fixed predictive models for guiding the collection of observations about a specific test case at hand. We introduce a model and inferential methods that bridge these phases of analysis into a holistic approach to information acquisition that considers simultaneously the extension of the predictive model and the probing of a case at hand. The bridging of active learning and real-time diagnostic feature acquisition leads to a new class of policies for learning and diagnosis. 1

same-paper 2 0.8746891 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA

Author: Piyush Rai, Hal Daume

Abstract: Canonical Correlation Analysis (CCA) is a useful technique for modeling dependencies between two (or more) sets of variables. Building upon the recently suggested probabilistic interpretation of CCA, we propose a nonparametric, fully Bayesian framework that can automatically select the number of correlation components, and effectively capture the sparsity underlying the projections. In addition, given (partially) labeled data, our algorithm can also be used as a (semi)supervised dimensionality reduction technique, and can be applied to learn useful predictive features in the context of learning a set of related tasks. Experimental results demonstrate the efficacy of the proposed approach for both CCA as a stand-alone problem, and when applied to multi-label prediction. 1

3 0.80877191 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling

Author: Lei Shi, Thomas L. Griffiths

Abstract: The goal of perception is to infer the hidden states in the hierarchical process by which sensory data are generated. Human behavior is consistent with the optimal statistical solution to this problem in many tasks, including cue combination and orientation detection. Understanding the neural mechanisms underlying this behavior is of particular importance, since probabilistic computations are notoriously challenging. Here we propose a simple mechanism for Bayesian inference which involves averaging over a few feature detection neurons which fire at a rate determined by their similarity to a sensory stimulus. This mechanism is based on a Monte Carlo method known as importance sampling, commonly used in computer science and statistics. Moreover, a simple extension to recursive importance sampling can be used to perform hierarchical Bayesian inference. We identify a scheme for implementing importance sampling with spiking neurons, and show that this scheme can account for human behavior in cue combination and the oblique effect. 1

4 0.80317098 187 nips-2009-Particle-based Variational Inference for Continuous Systems

Author: Andrew Frank, Padhraic Smyth, Alexander T. Ihler

Abstract: Since the development of loopy belief propagation, there has been considerable work on advancing the state of the art for approximate inference over distributions defined on discrete random variables. Improvements include guarantees of convergence, approximations that are provably more accurate, and bounds on the results of exact inference. However, extending these methods to continuous-valued systems has lagged behind. While several methods have been developed to use belief propagation on systems with continuous values, recent advances for discrete variables have not as yet been incorporated. In this context we extend a recently proposed particle-based belief propagation algorithm to provide a general framework for adapting discrete message-passing algorithms to inference in continuous systems. The resulting algorithms behave similarly to their purely discrete counterparts, extending the benefits of these more advanced inference techniques to the continuous domain. 1

5 0.7827329 174 nips-2009-Nonparametric Latent Feature Models for Link Prediction

Author: Kurt Miller, Michael I. Jordan, Thomas L. Griffiths

Abstract: As the availability and importance of relational data—such as the friendships summarized on a social networking website—increases, it becomes increasingly important to have good models for such data. The kinds of latent structure that have been considered for use in predicting links in such networks have been relatively limited. In particular, the machine learning community has focused on latent class models, adapting Bayesian nonparametric methods to jointly infer how many latent classes there are while learning which entities belong to each class. We pursue a similar approach with a richer kind of latent variable—latent features—using a Bayesian nonparametric approach to simultaneously infer the number of features at the same time we learn which entities have each feature. Our model combines these inferred features with known covariates in order to perform link prediction. We demonstrate that the greater expressiveness of this approach allows us to improve performance on three datasets. 1

6 0.78132433 101 nips-2009-Generalization Errors and Learning Curves for Regression with Multi-task Gaussian Processes

7 0.78020608 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs

8 0.77883494 41 nips-2009-Bayesian Source Localization with the Multivariate Laplace Prior

9 0.77847701 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models

10 0.77771425 217 nips-2009-Sharing Features among Dynamical Systems with Beta Processes

11 0.77640158 254 nips-2009-Variational Gaussian-process factor analysis for modeling spatio-temporal data

12 0.77434456 172 nips-2009-Nonparametric Bayesian Texture Learning and Synthesis

13 0.77303231 30 nips-2009-An Integer Projected Fixed Point Method for Graph Matching and MAP Inference

14 0.77214831 114 nips-2009-Indian Buffet Processes with Power-law Behavior

15 0.77137774 70 nips-2009-Discriminative Network Models of Schizophrenia

16 0.76963794 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization

17 0.76918781 100 nips-2009-Gaussian process regression with Student-t likelihood

18 0.768125 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms

19 0.76790369 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization

20 0.767802 62 nips-2009-Correlation Coefficients are Insufficient for Analyzing Spike Count Dependencies