nips nips2012 nips2012-54 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Lei Shi
Abstract: For modeling data matrices, this paper introduces the Probabilistic Co-Subspace Addition (PCSA) model, which simultaneously captures the dependent structures among both rows and columns. Briefly, PCSA assumes that each entry of a matrix is generated by the additive combination of the linear mappings of two low-dimensional features, which lie in the row-wise and column-wise latent subspaces respectively. As a consequence, PCSA captures the dependencies among entries intricately, and is able to handle non-Gaussian and heteroscedastic densities. By formulating the posterior updating as the task of solving Sylvester equations, we propose an efficient variational inference algorithm. Furthermore, PCSA is extended to tackle and fill missing values, to adapt model sparseness, and to model tensor data. In comparison with several state-of-the-art methods, experiments demonstrate the effectiveness and efficiency of Bayesian (sparse) PCSA on modeling matrix (tensor) data and filling missing values.
Reference: text
sentIndex sentText sentNum sentScore
1 com Abstract For modeling data matrices, this paper introduces Probabilistic Co-Subspace Addition (PCSA) model by simultaneously capturing the dependent structures among both rows and columns. [sent-3, score-0.075]
2 Briefly, PCSA assumes that each entry of a matrix is generated by the additive combination of the linear mappings of two low-dimensional features, which distribute in the row-wise and column-wise latent subspaces respectively. [sent-4, score-0.061]
3 In consequence, PCSA captures the dependencies among entries intricately, and is able to handle non-Gaussian and heteroscedastic densities. [sent-5, score-0.06]
4 By formulating the posterior updating into the task of solving Sylvester equations, we propose an efficient variational inference algorithm. [sent-6, score-0.075]
5 Furthermore, PCSA is extended to tackling and filling missing values, to adapting model sparseness, and to modelling tensor data. [sent-7, score-0.26]
6 In comparison with several state-of-art methods, experiments demonstrate the effectiveness and efficiency of Bayesian (sparse) PCSA on modeling matrix (tensor) data and filling missing values. [sent-8, score-0.192]
7 1 Introduction: This paper focuses on modeling data matrices by simultaneously capturing the dependent structures among both rows and columns, which is especially useful for filling missing values. [sent-9, score-0.253]
8 In [12, 16], Bayesian probabilistic matrix factorization (PMF) is investigated by modeling the row-wise and column-wise specific variances, with inference carried out under suitable priors. [sent-12, score-0.074]
9 Probabilistic Matrix Addition (PMA) [1] describes the covariance structures among rows and columns, showing promising results compared with GP regression, PMF and LMC. [sent-13, score-0.044]
10 For high-dimensional data, subspace structures are often built into statistical models to reduce the number of free parameters, leading to improvements in both learning efficiency and accuracy [3, 11, 24]. [sent-15, score-0.061]
11 PCSA is able to capture the dependencies among entries intricately, fit non-Gaussian and heteroscedastic densities, and extract the hidden features in the co-subspaces. [sent-18, score-0.101]
12 We propose a variational Bayesian algorithm for inferring both the parameters and the latent dimensionalities of PCSA. [sent-19, score-0.09]
13 First, missing values in data matrices are easily tackled and filled by iterating with the variational inference. [sent-22, score-0.214]
14 Second, with a Jeffreys prior, Bayesian sparse PCSA is implemented with an adaptive model sparseness [4]. [sent-23, score-0.072]
15 Third, PCSA is extended from matrix data (i.e., 2nd-order tensors) to PCSA-k for modelling tensor data with an arbitrary order k. [sent-26, score-0.107]
16 On the task of filling missing values in matrix data, we compare (sparse) PCSA with several state-of-the-art models/approaches, including PMA, Robust Bayesian PMF and Bayesian GPLVM [21]. [sent-27, score-0.176]
17 The datasets under consideration include multi-label classification data, user-item rating data for collaborative filtering, and face images. [sent-28, score-0.066]
18 Further, on tensor-structured face image data, PCSA is compared with the M2SA method [6], which uses consecutive SVDs on all modes of the tensor. [sent-29, score-0.159]
19 Letting X ∈ R^{D_1×D_2} be an observed matrix with D_1 ≤ D_2 without loss of generality, we start by outlining a generative model for X. [sent-33, score-0.054]
20 Consider two hidden variables y ∼ N(y | 0_{d_1}, I_{d_1}) and z ∼ N(z | 0_{d_2}, I_{d_2}) with d_1 < D_1 and d_2 < D_2, where 0_d denotes a d-dimensional vector of zeros and I_d denotes a d × d identity matrix. [sent-34, score-0.053]
21 Using the concatenation nomenclature of Matlab, two matrices of hidden factors Y = [y_{*1}, ..., y_{*D_2}] and Z = [z_{*1}, ..., z_{*D_1}] collect the hidden variables column-wise. [sent-35, score-0.05]
22 Through two linear mapping matrices A ∈ R^{D_1×d_1} and B ∈ R^{D_2×d_2}, each entry x_{ij} ∈ X is independent given Y and Z, with x_{ij} = a_{i*} y_{*j} + b_{j*} z_{*i} + e_{ij}, where a_{i*} is the i-th row of A. [sent-42, score-0.194]
23 Each e_{ij} ∼ N(e_{ij} | 0, 1/τ) is independently Gaussian distributed and independent of Y and Z. [sent-43, score-0.058]
24 • Draw the columns y_{*j} and z_{*i} independently from the priors above, for j = 1, ..., D_2 and i = 1, ..., D_1; • Get E ∈ R^{D_1×D_2} by independently drawing each element e_{ij} ∼ N(e_{ij} | 0, 1/τ) for all i, j; • Get X = AY + (BZ)^T + E given Y and Z. [sent-50, score-0.058]
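A minimal NumPy sketch of the generative process just described; the matrix sizes and the noise precision τ below are illustrative values, not taken from the paper.

    import numpy as np

    def sample_pcsa(A, B, tau, rng):
        """Draw one matrix X = A Y + (B Z)^T + E from the PCSA generative model."""
        D1, d1 = A.shape
        D2, d2 = B.shape
        Y = rng.standard_normal((d1, D2))                  # column-wise factors y_{*j} ~ N(0, I)
        Z = rng.standard_normal((d2, D1))                  # row-wise factors z_{*i} ~ N(0, I)
        E = rng.standard_normal((D1, D2)) / np.sqrt(tau)   # noise e_{ij} ~ N(0, 1/tau)
        X = A @ Y + (B @ Z).T + E
        return X, Y, Z

    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 5))   # D1=50, d1=5 (illustrative sizes)
    B = rng.standard_normal((80, 3))   # D2=80, d2=3
    X, Y, Z = sample_pcsa(A, B, tau=10.0, rng=rng)
    print(X.shape)  # (50, 80)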
25 Although each entry xij ∈ X is independent given Y and Z, the PCSA model captures the dependencies along rows as well as columns in the joint X. [sent-57, score-0.114]
26 Although able to describe the row-wise and column-wise covariances, PMA [1] requires estimating and inverting two (large) kernel matrices of sizes D_1 × D_1 and D_2 × D_2 respectively, which is intractable for many real-world applications. [sent-69, score-0.044]
27 Moreover, PCSA is able to extract the hidden features Y and Z simultaneously. [sent-71, score-0.041]
28 2 Variational Bayesian Inference: Given X and the hidden dimensionalities (d_1, d_2), we can estimate PCSA's parameters θ = {A, B, τ} by maximizing the likelihood p(X|θ). [sent-75, score-0.077]
29 However, capacity control is essential for generalization ability, so we proceed to develop a variational Bayesian inference procedure for PCSA. [sent-76, score-0.054]
30 Each column a_{*i} of the mapping matrix A a priori independently follows a spherical Gaussian with a precision scalar ς_i. [sent-84, score-0.053]
31 Each precision ς_i further follows a Gamma prior, completing the specification of the Bayesian model. [sent-87, score-0.048]
32 Since MCMC samplers are inefficient for high-dimensional data, this paper chooses variational inference instead [11], which introduces a distribution Q(Θ) and approximates maximizing the log marginal likelihood log p(X) by maximizing a lower bound L(Q) = ∫ Q(Θ) log [p(X, Θ)/Q(Θ)] dΘ. [sent-89, score-0.106]
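The abstract states that the posterior updates are formulated as Sylvester equations; the extract does not give the coefficient matrices that arise in PCSA, so the sketch below only shows how a generic Sylvester equation M X + X N = C of that kind can be solved with SciPy (all matrices here are placeholders).

    import numpy as np
    from scipy.linalg import solve_sylvester

    rng = np.random.default_rng(1)
    M = rng.standard_normal((5, 5))   # placeholder left coefficient
    N = rng.standard_normal((3, 3))   # placeholder right coefficient
    C = rng.standard_normal((5, 3))   # placeholder right-hand side

    X = solve_sylvester(M, N, C)      # solves M X + X N = C
    print(np.allclose(M @ X + X @ N, C))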
33 During learning, redundant columns of the posterior means of A and B are pushed toward zero, which effectively performs Bayesian model selection on the hidden dimensionalities d_1 and d_2. [sent-116, score-0.073]
34 3 Extensions / 3.1 Filling Missing Values: In many real applications, X is usually only partially observed, with some entries missing. [sent-118, score-0.19]
35 The goal here is to infer not only the PCSA model but also the missing values in X based on the model structure. [sent-119, score-0.153]
36 Similar to the settings of PMA in [1], let us begin with a full matrix X, where the missing values are randomly filled. [sent-120, score-0.176]
37 We denote M = {(i, j) : x_{ij} is missing} as the index set of the missing values therein. [sent-121, score-0.216]
38 In each iteration, we “pretend” that X is the observed matrix and update Q(Θ) by Eqs. [sent-122, score-0.039]
39 Then, given Q(Θ), the missing entries {x̃_{ij} : (i, j) ∈ M} are updated by maximizing L(Q). [sent-124, score-0.217]
40 Moreover, filling missing values in PMA [1] needs to infer the column and row factors by either Gibbs sampling or MAP. [sent-128, score-0.168]
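A minimal sketch of the iterative fill-in procedure described above, under the assumption that maximizing L(Q) over a missing entry amounts to setting it to the model's posterior-mean reconstruction; fit_variational_posterior is a hypothetical stand-in for the paper's actual update equations.

    import numpy as np

    def fill_missing(X, mask, fit_variational_posterior, n_iters=50):
        """Iteratively impute the entries where mask is True.

        fit_variational_posterior is a hypothetical callback that refits Q(Theta) on
        the pretend-complete matrix and returns the posterior-mean reconstruction
        A_bar @ Y_bar + (B_bar @ Z_bar).T; the real update equations are in the paper.
        """
        X = X.copy()
        rng = np.random.default_rng(0)
        X[mask] = rng.standard_normal(mask.sum())      # random initial fill
        for _ in range(n_iters):
            recon = fit_variational_posterior(X)       # update Q(Theta) on the full matrix
            X[mask] = recon[mask]                      # reset missing entries to the reconstruction
        return X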
41 It was shown in [9] that the NJ prior performs better than the Laplacian on sparse PPCA. [sent-139, score-0.054]
42 In this paper, we choose to adopt the NJ prior for learning a sparse PCSA model. [sent-140, score-0.054]
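The extract does not spell out where the Jeffreys prior is placed (per element or per column) in sparse PCSA; the standard normal-Jeffreys construction from the sparse Bayesian learning literature, which is the likely form here, puts an improper scale prior on each Gaussian variance and yields a heavy-tailed marginal:

    a | ς ∼ N(0, ς),    p(ς) ∝ 1/ς
    ⇒  p(a) = ∫ N(a | 0, ς) p(ς) dς ∝ 1/|a|

The 1/|a| marginal concentrates mass near zero while keeping heavy tails, which is what drives the adaptive sparseness mentioned above.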
43 Still under the variational inference framework, we now let Θ = {Z, Y, θ} and let Q(Θ) = Q(Y)Q(Z)Q(θ) take the same conjugate form as above. [sent-151, score-0.054]
44 Each dimension of a tensor is called a mode, and the order of a tensor is the number of its modes. [sent-168, score-0.214]
45 Let us denote tensors with open-face uppercase letters. [sent-169, score-0.064]
46 A kth-order tensor X can be denoted by X ∈ R^{D_1×D_2×⋯×D_k}. [sent-174, score-0.107]
47 Its dimensionalities in the k modes are D_1, D_2, ..., D_k. [sent-177, score-0.047]
48 Based on the above definitions, the PCSA model describes a kth-order tensor X ∈ R^{D_1×⋯×D_k} through the following generative process. [sent-218, score-0.107]
49 (i) For each mode i, all elements of the hidden tensor Y^(i) ∈ R^{d_i×D_{i+1}×⋯} are drawn independently from a standard Gaussian prior. [sent-221, score-0.161]
50 Shortly named PCSA-k, this model has latent tensors {Y^(i)}_{i=1}^{k} and parameters θ = {τ} ∪ {A^(i)}_{i=1}^{k}, with latent scales {d_i}_{i=1}^{k}. [sent-254, score-0.088]
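The extract truncates the exact form of the tensor generative process, so the NumPy sketch below is only a guess at the natural generalization of the matrix case X = AY + (BZ)^T + E: each mode i contributes an additive term obtained by mapping a hidden tensor Y^(i) (with d_i in mode i) through A^(i) along that mode, plus Gaussian noise.

    import numpy as np

    def mode_product(Y, A, mode):
        """Multiply tensor Y by matrix A (D_i x d_i) along the given mode."""
        Y = np.moveaxis(Y, mode, 0)                 # bring mode i to the front
        out = np.tensordot(A, Y, axes=([1], [0]))   # contract the d_i dimension
        return np.moveaxis(out, 0, mode)

    def sample_pcsa_k(A_list, tau, rng):
        """Hypothetical PCSA-k sampler: X = sum_i (Y^(i) x_i A^(i)) + E."""
        dims = [A.shape[0] for A in A_list]           # D_1, ..., D_k
        X = rng.standard_normal(dims) / np.sqrt(tau)  # noise tensor E
        for i, A in enumerate(A_list):
            shape_i = list(dims)
            shape_i[i] = A.shape[1]                   # d_i in mode i
            Y_i = rng.standard_normal(shape_i)        # hidden tensor for mode i
            X += mode_product(Y_i, A, mode=i)
        return X

    rng = np.random.default_rng(0)
    A_list = [rng.standard_normal((D, d)) for D, d in [(16, 3), (10, 2), (8, 2)]]
    X = sample_pcsa_k(A_list, tau=10.0, rng=rng)
    print(X.shape)  # (16, 10, 8)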
51 Apart from the involvement of the tensor structure and its operators, there is another difference compared with the variational posterior updating based on a matrix X. [sent-262, score-0.187]
52 Following [1], the first experiment compares PCSA with PMA in filling the missing entries of a truncated log-odds matrix in multi-label classification. [sent-271, score-0.219]
53 A truncated log-odds matrix X is constructed with x_{ij} = c if g_{ij} = 1 and x_{ij} = −c if g_{ij} = 0, where c is a nonzero constant. [sent-273, score-0.198]
54 In experiments, certain entries x_{ij} are assumed to be missing and filled as x̃_{ij} by an algorithm, and the performance is evaluated by the class-membership prediction accuracy based on sign(x̃_{ij}). [sent-274, score-0.307]
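A small sketch of the protocol just described: a binary label matrix G is mapped to a truncated log-odds matrix, a random subset of entries is hidden, and the recovered labels are read off from the sign of the filled values; the constant c and the completion routine are placeholders.

    import numpy as np

    def logodds_matrix(G, c=1.0):
        """x_ij = c if g_ij == 1 else -c."""
        return np.where(G == 1, c, -c).astype(float)

    def label_error_rate(G, X_filled, mask):
        """Fraction of hidden labels whose sign is recovered incorrectly."""
        pred = (np.sign(X_filled[mask]) > 0).astype(int)
        return np.mean(pred != G[mask])

    rng = np.random.default_rng(0)
    G = rng.integers(0, 2, size=(100, 6))          # toy label matrix (instances x labels)
    X = logodds_matrix(G, c=1.0)
    mask = rng.random(G.shape) < 0.3               # hide 30% of the labels
    # X_filled = some_matrix_completion(X, mask)   # e.g. (sparse) PCSA or PMA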
55 To test the capability of dealing with missing values, the proportion of missing labels is increased from 10% to 50% in steps of 5%. [sent-282, score-0.339]
56 Fig. 1 reports the error rates for recovering the missing labels in the truncated log-odds matrices, by Bayesian PCSA, Bayesian sparse PCSA and PMA. [sent-285, score-0.204]
57 On the relatively unbalanced Emotions data, PCSA outperforms sparse PCSA when the missing proportion is no larger than 40%, while sparse PCSA gains the advantage when too many entries are missing, due to the increasing importance of model sparsity. [sent-286, score-0.439]
58 Table 1 reports the average time cost, where sparse PCSA converges slightly faster than PCSA. [sent-289, score-0.055]
59 Both are much quicker than PMA, since they need neither to invert large covariance matrices nor to infer the factors when filling missing values (see Section 3.1). [sent-290, score-0.172]
60 Figure 1: Error rates of 10 independent runs for recovering the missing labels in Emotions (left) and CAL500 (right) data, comparing PCSA, sparse PCSA and PMA. [sent-292, score-0.22]
61 Table 1: Average time cost (in seconds) on each dataset throughout 10 independent runs and all missing proportions. [sent-299, score-0.184]
62 Particularly, the MovieLens100K dataset contains 100K ratings of 943 users on 1682 movies, which are ordinal values on the scale [1, 5]. [sent-305, score-0.048]
63 The JesterJoke3 data contains ratings of 24983 users who have rated between 15 and 35 of the total 100 jokes, where the ratings are continuous in [−10, 10]. [sent-306, score-0.096]
64 Recently in [12], Robust Bayesian Matrix Factorization (RBMF) was proposed by adopting a Student-t prior in probabilistic matrix factorization, and showed promising results on predicting entries on both MovieLens100K and JesterJoke3 data. [sent-309, score-0.089]
65 Following [12], in each run we randomly choose 70% of the ratings for training, and use the remaining ratings as the missing values for testing. [sent-310, score-0.249]
66 Given the true test ratings {r_t}_{t=1}^{T} and the predictions {r̃_t}_{t=1}^{T}, the performance is evaluated by the root mean squared error, RMSE = sqrt( (1/T) Σ_{t=1}^{T} (r_t − r̃_t)^2 ). [sent-311, score-0.048]
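The RMSE criterion, as reconstructed in the previous sentence, in a short helper:

    import numpy as np

    def rmse(r_true, r_pred):
        """Root mean squared error over the held-out ratings."""
        r_true, r_pred = np.asarray(r_true, float), np.asarray(r_pred, float)
        return np.sqrt(np.mean((r_true - r_pred) ** 2))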
67 Since PMA runs inefficiently on high-dimensional data, as shown in Table 1, it is not considered for filling the ratings in this experiment. [sent-319, score-0.093]
68 It is observed that the performance of PCSA on predicting ratings is comparable with that of RBMF. [sent-320, score-0.064]
69 4.2 Completing Partially Observed Images: We consider two greyscale face image datasets, namely Frey [15] and ORL [17]. [sent-339, score-0.057]
70 Specifically, Frey has 1965 images of size 28 × 20 taken from one person, and the data X is thus a 560 × 1965 matrix; ORL has 400 images of size 64 × 64 taken from 40 persons (10 images per person), and the data X is thus a 4096 × 400 matrix. [sent-340, score-0.189]
71 Applied to these matrices, the PCSA model is expected to extract the latent correlations among pixels and images. [sent-341, score-0.07]
72 In [13], Neil Lawrence proposed a Gaussian process latent variable model (GPLVM) for modeling and visualizing high dimensional data. [sent-342, score-0.051]
73 Recently a Bayesian GPLVM [21] was developed and showed much improved performance on filling pixels in partially observed Frey faces. [sent-343, score-0.07]
74 Thus in each run, we randomly pick n_f images as fully observed, and half of the pixels of the remaining images are further randomly chosen as missing values. [sent-346, score-0.357]
75 As in [21], Bayesian GPLVM uses the n_f images for training and then infers the missing pixels. [sent-347, score-0.261]
76 In contrast, (sparse) PCSA uses all images as a whole matrix. [sent-348, score-0.063]
77 In order to test the robustness, n_f is gradually decreased from 1000 to 200 for Frey and from 300 to 50 for ORL. [sent-349, score-0.045]
78 Figs. 2 and 3 report the CORR and MAE values of 10 independent runs by PCSA, sparse PCSA and Bayesian GPLVM. [sent-353, score-0.067]
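CORR and MAE are not defined in this extract; the sketch below assumes the usual choices, Pearson correlation between true and filled pixel values and mean absolute error over the missing pixels.

    import numpy as np

    def corr_and_mae(true_vals, filled_vals):
        """Pearson correlation and mean absolute error over the missing pixels."""
        t = np.asarray(true_vals, float).ravel()
        f = np.asarray(filled_vals, float).ravel()
        corr = np.corrcoef(t, f)[0, 1]
        mae = np.mean(np.abs(t - f))
        return corr, mae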
79 Both PCSA and sparse PCSA perform more accurately than Bayesian GPLVM in completing the missing pixels, and PCSA gives the best matching. [sent-354, score-0.219]
80 Also, (sparse) PCSA shows promising stability against the decreasing fully-observed sample size n_f, and this tendency holds even when all images are partially observed (i.e., n_f = 0). [sent-355, score-0.161]
81 The results of Bayesian GPLVM deteriorate more noticeably, because the partially observed images contribute nothing during its learning. [sent-359, score-0.1]
82 Figure 4: Reconstruction examples by PCSA when all images are partially observed: Frey (left) and ORL (right). [sent-365, score-0.084]
83 4.3 Completing Partially Observed Image Tensor: We proceed to consider modeling the face image data arranged in a tensor. [sent-368, score-0.052]
84 The dataset under consideration is a subset of the CMU PIE database [18], and contains a total of 5100 face images from 30 individuals. [sent-369, score-0.114]
85 Each person has 170 images, corresponding to 170 different pose-and-illumination combinations. [sent-370, score-0.099]
86 Each normalized image has 32 × 32 greyscale pixels, and the dataset is thus a tensor X ∈ R^{1024×30×170}, whose three modes correspond to pixel, identity, and pose/illumination, respectively. [sent-371, score-0.144]
87 The goal is to capture the latent structures in all modes (i.e., the correlations among pixels, identities, and poses/illuminations, respectively) and fill the missing values in X. [sent-382, score-0.153]
88 In [6], an M2SA method was proposed to conduct multilinear subspace analysis with missing values on the tensor data, via consecutive SVD dimension reductions on each mode. [sent-383, score-0.301]
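The details of M2SA [6] are not included in this extract; as a rough illustration of "consecutive SVDs on all modes", the sketch below truncates the SVD of each mode-wise unfolding of a tensor (essentially a truncated HOSVD), intended only to convey the idea rather than to reimplement [6].

    import numpy as np

    def mode_fold(mat, mode, shape, new_dim):
        """Inverse of the mode-`mode` unfolding, with that mode resized to new_dim."""
        rest = [shape[i] for i in range(len(shape)) if i != mode]
        moved = mat.reshape([new_dim] + rest)
        return np.moveaxis(moved, 0, mode)

    def modewise_svd_truncation(X, ranks):
        """Project each mode of X onto its top singular vectors, one mode after another."""
        core = X
        factors = []
        for mode, r in enumerate(ranks):
            unfolding = np.moveaxis(core, mode, 0).reshape(core.shape[mode], -1)
            U, _, _ = np.linalg.svd(unfolding, full_matrices=False)
            U_r = U[:, :r]                      # top-r mode subspace
            factors.append(U_r)
            core = mode_fold(U_r.T @ unfolding, mode, core.shape, r)
        return core, factors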
89 Figure 5: Typical normalized face images from the CMU PIE database. [sent-384, score-0.099]
90 Figure 6: Typical missing images filled by PCSA-3. [sent-385, score-0.216]
91 Original images (in the odd rows) are randomly picked and removed, and PCSA-3 fills the images in the even rows. [sent-386, score-0.126]
92 Here, the randomly drawn missing values are not pixels as in Section 4.2, but whole images. [sent-400, score-0.186]
93 The goodness of the filled missing images, compared with the true missing images, is again evaluated by CORR and MAE. [sent-402, score-0.369]
94 Again, to test the capability of dealing with missing values, the proportion of missing images is set to 10%, 20% and 30%, respectively. [sent-403, score-0.402]
95 After 10 independent runs for each proportion, the average CORR and average MAE of filling the missing images by PCSA-3 and M2SA are compared in Table 3. [sent-404, score-0.247]
96 Fig. 6 shows some filled missing images when the missing proportion is 20%; they match the original images well. [sent-410, score-0.465]
97 5 Concluding Remarks: We have introduced the Probabilistic Co-Subspace Addition (PCSA) model, which simultaneously captures the dependent structures among both rows and columns in data matrices (tensors). [sent-411, score-0.099]
98 Capable of filling missing values, PCSA is extended not only to sparse PCSA with the help of a Jeffreys prior, but also to PCSA-k, which models arbitrary kth-order tensor data. [sent-413, score-0.296]
99 Although the model is somewhat simple and not designed for any particular application, the experiments demonstrate the effectiveness and efficiency of PCSA on modeling matrix (tensor) data and filling missing values. [sent-414, score-0.192]
100 Face image modeling by multilinear subspace analysis with missing values. [sent-460, score-0.21]
wordName wordTfidf (topN-words)
[('pcsa', 0.891), ('pma', 0.178), ('missing', 0.153), ('tensor', 0.107), ('mae', 0.106), ('sylvester', 0.077), ('sa', 0.075), ('emotions', 0.073), ('lling', 0.069), ('gplvm', 0.068), ('orl', 0.068), ('xij', 0.063), ('images', 0.063), ('frey', 0.062), ('bayesian', 0.056), ('lled', 0.056), ('dk', 0.052), ('ratings', 0.048), ('tensors', 0.046), ('nf', 0.045), ('rmse', 0.045), ('eij', 0.043), ('ay', 0.039), ('sparseness', 0.036), ('variational', 0.036), ('face', 0.036), ('rdi', 0.036), ('sparse', 0.036), ('di', 0.035), ('cmu', 0.035), ('dimensionalities', 0.033), ('pixels', 0.033), ('proportion', 0.033), ('lmc', 0.031), ('pie', 0.031), ('runs', 0.031), ('completing', 0.03), ('pmf', 0.03), ('ky', 0.029), ('entries', 0.028), ('diag', 0.028), ('corr', 0.027), ('jeffreys', 0.027), ('ppca', 0.027), ('matrices', 0.025), ('hidden', 0.025), ('structures', 0.024), ('rt', 0.024), ('gelfand', 0.024), ('intricately', 0.024), ('kz', 0.024), ('rbmf', 0.024), ('subspace', 0.023), ('matrix', 0.023), ('sb', 0.023), ('person', 0.023), ('music', 0.022), ('updating', 0.021), ('ji', 0.021), ('latent', 0.021), ('greyscale', 0.021), ('lz', 0.021), ('partially', 0.021), ('probabilistic', 0.02), ('rows', 0.02), ('quicker', 0.019), ('columnwise', 0.019), ('maximizing', 0.019), ('uppercase', 0.018), ('multilinear', 0.018), ('prior', 0.018), ('inference', 0.018), ('inef', 0.018), ('gij', 0.017), ('updated', 0.017), ('volume', 0.017), ('mappings', 0.017), ('sy', 0.017), ('extract', 0.016), ('modeling', 0.016), ('dependencies', 0.016), ('heteroscedastic', 0.016), ('modes', 0.016), ('observed', 0.016), ('ma', 0.016), ('consideration', 0.015), ('gamma', 0.015), ('variances', 0.015), ('columns', 0.015), ('numeric', 0.015), ('independently', 0.015), ('column', 0.015), ('collaborative', 0.015), ('dependent', 0.015), ('generative', 0.015), ('truncated', 0.015), ('dimensional', 0.014), ('mode', 0.014), ('jmlr', 0.014), ('nj', 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 54 nips-2012-Bayesian Probabilistic Co-Subspace Addition
Author: Lei Shi
Abstract: For modeling data matrices, this paper introduces Probabilistic Co-Subspace Addition (PCSA) model by simultaneously capturing the dependent structures among both rows and columns. Briefly, PCSA assumes that each entry of a matrix is generated by the additive combination of the linear mappings of two low-dimensional features, which distribute in the row-wise and column-wise latent subspaces respectively. In consequence, PCSA captures the dependencies among entries intricately, and is able to handle non-Gaussian and heteroscedastic densities. By formulating the posterior updating into the task of solving Sylvester equations, we propose an efficient variational inference algorithm. Furthermore, PCSA is extended to tackling and filling missing values, to adapting model sparseness, and to modelling tensor data. In comparison with several state-of-art methods, experiments demonstrate the effectiveness and efficiency of Bayesian (sparse) PCSA on modeling matrix (tensor) data and filling missing values.
2 0.089736462 334 nips-2012-Tensor Decomposition for Fast Parsing with Latent-Variable PCFGs
Author: Michael Collins, Shay B. Cohen
Abstract: We describe an approach to speed-up inference with latent-variable PCFGs, which have been shown to be highly effective for natural language parsing. Our approach is based on a tensor formulation recently introduced for spectral estimation of latent-variable PCFGs coupled with a tensor decomposition algorithm well-known in the multilinear algebra literature. We also describe an error bound for this approximation, which gives guarantees showing that if the underlying tensors are well approximated, then the probability distribution over trees will also be well approximated. Empirical evaluation on real-world natural language parsing data demonstrates a significant speed-up at minimal cost for parsing performance. 1
3 0.064502358 287 nips-2012-Random function priors for exchangeable arrays with applications to graphs and relational data
Author: James Lloyd, Peter Orbanz, Zoubin Ghahramani, Daniel M. Roy
Abstract: A fundamental problem in the analysis of structured relational data like graphs, networks, databases, and matrices is to extract a summary of the common structure underlying relations between individual entities. Relational data are typically encoded in the form of arrays; invariance to the ordering of rows and columns corresponds to exchangeable arrays. Results in probability theory due to Aldous, Hoover and Kallenberg show that exchangeable arrays can be represented in terms of a random measurable function which constitutes the natural model parameter in a Bayesian model. We obtain a flexible yet simple Bayesian nonparametric model by placing a Gaussian process prior on the parameter function. Efficient inference utilises elliptical slice sampling combined with a random sparse approximation to the Gaussian process. We demonstrate applications of the model to network data and clarify its relation to models in the literature, several of which emerge as special cases. 1
4 0.058553156 277 nips-2012-Probabilistic Low-Rank Subspace Clustering
Author: S. D. Babacan, Shinichi Nakajima, Minh Do
Abstract: In this paper, we consider the problem of clustering data points into lowdimensional subspaces in the presence of outliers. We pose the problem using a density estimation formulation with an associated generative model. Based on this probability model, we first develop an iterative expectation-maximization (EM) algorithm and then derive its global solution. In addition, we develop two Bayesian methods based on variational Bayesian (VB) approximation, which are capable of automatic dimensionality selection. While the first method is based on an alternating optimization scheme for all unknowns, the second method makes use of recent results in VB matrix factorization leading to fast and effective estimation. Both methods are extended to handle sparse outliers for robustness and can handle missing values. Experimental results suggest that proposed methods are very effective in subspace clustering and identifying outliers. 1
5 0.05441308 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction
Author: Minjie Xu, Jun Zhu, Bo Zhang
Abstract: We present a probabilistic formulation of max-margin matrix factorization and build accordingly a nonparametric Bayesian model which automatically resolves the unknown number of latent factors. Our work demonstrates a successful example that integrates Bayesian nonparametrics and max-margin learning, which are conventionally two separate paradigms and enjoy complementary advantages. We develop an efficient variational algorithm for posterior inference, and our extensive empirical studies on large-scale MovieLens and EachMovie data sets appear to justify the aforementioned dual advantages. 1
6 0.053773474 228 nips-2012-Multilabel Classification using Bayesian Compressed Sensing
7 0.051755786 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC
8 0.046854954 250 nips-2012-On-line Reinforcement Learning Using Incremental Kernel-Based Stochastic Factorization
9 0.044962112 103 nips-2012-Distributed Probabilistic Learning for Camera Networks with Missing Data
10 0.044941112 9 nips-2012-A Geometric take on Metric Learning
11 0.03629297 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model
12 0.0360546 58 nips-2012-Bayesian models for Large-scale Hierarchical Classification
13 0.035956074 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines
14 0.033936962 187 nips-2012-Learning curves for multi-task Gaussian process regression
15 0.033819336 75 nips-2012-Collaborative Ranking With 17 Parameters
16 0.032198142 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models
17 0.031835411 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses
18 0.031290378 127 nips-2012-Fast Bayesian Inference for Non-Conjugate Gaussian Process Regression
19 0.029549971 166 nips-2012-Joint Modeling of a Matrix with Associated Text via Latent Binary Features
20 0.029325293 8 nips-2012-A Generative Model for Parts-based Object Segmentation
topicId topicWeight
[(0, 0.095), (1, 0.037), (2, -0.021), (3, -0.021), (4, -0.038), (5, -0.021), (6, -0.011), (7, 0.017), (8, 0.003), (9, -0.03), (10, -0.012), (11, -0.006), (12, -0.018), (13, -0.011), (14, -0.004), (15, 0.056), (16, 0.023), (17, -0.0), (18, 0.079), (19, -0.048), (20, 0.014), (21, -0.005), (22, -0.003), (23, 0.001), (24, -0.013), (25, 0.018), (26, 0.025), (27, -0.035), (28, -0.017), (29, 0.028), (30, -0.092), (31, 0.006), (32, 0.018), (33, 0.018), (34, -0.011), (35, -0.045), (36, 0.035), (37, -0.034), (38, 0.016), (39, -0.011), (40, -0.0), (41, -0.012), (42, 0.051), (43, -0.082), (44, -0.003), (45, -0.011), (46, -0.05), (47, -0.013), (48, -0.012), (49, 0.088)]
simIndex simValue paperId paperTitle
same-paper 1 0.88487637 54 nips-2012-Bayesian Probabilistic Co-Subspace Addition
Author: Lei Shi
Abstract: For modeling data matrices, this paper introduces Probabilistic Co-Subspace Addition (PCSA) model by simultaneously capturing the dependent structures among both rows and columns. Briefly, PCSA assumes that each entry of a matrix is generated by the additive combination of the linear mappings of two low-dimensional features, which distribute in the row-wise and column-wise latent subspaces respectively. In consequence, PCSA captures the dependencies among entries intricately, and is able to handle non-Gaussian and heteroscedastic densities. By formulating the posterior updating into the task of solving Sylvester equations, we propose an efficient variational inference algorithm. Furthermore, PCSA is extended to tackling and filling missing values, to adapting model sparseness, and to modelling tensor data. In comparison with several state-of-art methods, experiments demonstrate the effectiveness and efficiency of Bayesian (sparse) PCSA on modeling matrix (tensor) data and filling missing values.
2 0.63610321 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction
Author: Minjie Xu, Jun Zhu, Bo Zhang
Abstract: We present a probabilistic formulation of max-margin matrix factorization and build accordingly a nonparametric Bayesian model which automatically resolves the unknown number of latent factors. Our work demonstrates a successful example that integrates Bayesian nonparametrics and max-margin learning, which are conventionally two separate paradigms and enjoy complementary advantages. We develop an efficient variational algorithm for posterior inference, and our extensive empirical studies on large-scale MovieLens and EachMovie data sets appear to justify the aforementioned dual advantages. 1
3 0.63584983 287 nips-2012-Random function priors for exchangeable arrays with applications to graphs and relational data
Author: James Lloyd, Peter Orbanz, Zoubin Ghahramani, Daniel M. Roy
Abstract: A fundamental problem in the analysis of structured relational data like graphs, networks, databases, and matrices is to extract a summary of the common structure underlying relations between individual entities. Relational data are typically encoded in the form of arrays; invariance to the ordering of rows and columns corresponds to exchangeable arrays. Results in probability theory due to Aldous, Hoover and Kallenberg show that exchangeable arrays can be represented in terms of a random measurable function which constitutes the natural model parameter in a Bayesian model. We obtain a flexible yet simple Bayesian nonparametric model by placing a Gaussian process prior on the parameter function. Efficient inference utilises elliptical slice sampling combined with a random sparse approximation to the Gaussian process. We demonstrate applications of the model to network data and clarify its relation to models in the literature, several of which emerge as special cases. 1
4 0.62722373 103 nips-2012-Distributed Probabilistic Learning for Camera Networks with Missing Data
Author: Sejong Yoon, Vladimir Pavlovic
Abstract: Probabilistic approaches to computer vision typically assume a centralized setting, with the algorithm granted access to all observed data points. However, many problems in wide-area surveillance can benefit from distributed modeling, either because of physical or computational constraints. Most distributed models to date use algebraic approaches (such as distributed SVD) and as a result cannot explicitly deal with missing data. In this work we present an approach to estimation and learning of generative probabilistic models in a distributed context where certain sensor data can be missing. In particular, we show how traditional centralized models, such as probabilistic PCA and missing-data PPCA, can be learned when the data is distributed across a network of sensors. We demonstrate the utility of this approach on the problem of distributed affine structure from motion. Our experiments suggest that the accuracy of the learned probabilistic structure and motion models rivals that of traditional centralized factorization methods while being able to handle challenging situations such as missing or noisy observations. 1
5 0.56794447 22 nips-2012-A latent factor model for highly multi-relational data
Author: Rodolphe Jenatton, Nicolas L. Roux, Antoine Bordes, Guillaume R. Obozinski
Abstract: Many data such as social networks, movie preferences or knowledge bases are multi-relational, in that they describe multiple relations between entities. While there is a large body of work focused on modeling these data, modeling these multiple types of relations jointly remains challenging. Further, existing approaches tend to breakdown when the number of these types grows. In this paper, we propose a method for modeling large multi-relational datasets, with possibly thousands of relations. Our model is based on a bilinear structure, which captures various orders of interaction of the data, and also shares sparse latent factors across different relations. We illustrate the performance of our approach on standard tensor-factorization datasets where we attain, or outperform, state-of-the-art results. Finally, a NLP application demonstrates our scalability and the ability of our model to learn efficient and semantically meaningful verb representations. 1
6 0.55748814 192 nips-2012-Learning the Dependency Structure of Latent Factors
7 0.55400461 277 nips-2012-Probabilistic Low-Rank Subspace Clustering
8 0.54843956 365 nips-2012-Why MCA? Nonlinear sparse coding with spike-and-slab prior for neurally plausible image encoding
9 0.53976089 225 nips-2012-Multi-task Vector Field Learning
10 0.53577954 334 nips-2012-Tensor Decomposition for Fast Parsing with Latent-Variable PCFGs
11 0.5047847 59 nips-2012-Bayesian nonparametric models for bipartite graphs
12 0.49239731 301 nips-2012-Scaled Gradients on Grassmann Manifolds for Matrix Completion
13 0.48154137 52 nips-2012-Bayesian Nonparametric Modeling of Suicide Attempts
14 0.47139749 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC
15 0.45511031 58 nips-2012-Bayesian models for Large-scale Hierarchical Classification
16 0.4531602 177 nips-2012-Learning Invariant Representations of Molecules for Atomization Energy Prediction
17 0.44339192 127 nips-2012-Fast Bayesian Inference for Non-Conjugate Gaussian Process Regression
18 0.4423123 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model
19 0.43666223 268 nips-2012-Perfect Dimensionality Recovery by Variational Bayesian PCA
20 0.43339539 37 nips-2012-Affine Independent Variational Inference
topicId topicWeight
[(0, 0.045), (9, 0.015), (17, 0.012), (21, 0.017), (38, 0.102), (42, 0.019), (54, 0.033), (55, 0.013), (72, 0.037), (74, 0.054), (76, 0.133), (80, 0.096), (82, 0.244), (92, 0.047)]
simIndex simValue paperId paperTitle
same-paper 1 0.76230401 54 nips-2012-Bayesian Probabilistic Co-Subspace Addition
Author: Lei Shi
Abstract: For modeling data matrices, this paper introduces Probabilistic Co-Subspace Addition (PCSA) model by simultaneously capturing the dependent structures among both rows and columns. Briefly, PCSA assumes that each entry of a matrix is generated by the additive combination of the linear mappings of two low-dimensional features, which distribute in the row-wise and column-wise latent subspaces respectively. In consequence, PCSA captures the dependencies among entries intricately, and is able to handle non-Gaussian and heteroscedastic densities. By formulating the posterior updating into the task of solving Sylvester equations, we propose an efficient variational inference algorithm. Furthermore, PCSA is extended to tackling and filling missing values, to adapting model sparseness, and to modelling tensor data. In comparison with several state-of-art methods, experiments demonstrate the effectiveness and efficiency of Bayesian (sparse) PCSA on modeling matrix (tensor) data and filling missing values.
2 0.73993313 176 nips-2012-Learning Image Descriptors with the Boosting-Trick
Author: Tomasz Trzcinski, Mario Christoudias, Vincent Lepetit, Pascal Fua
Abstract: In this paper we apply boosting to learn complex non-linear local visual feature representations, drawing inspiration from its successful application to visual object detection. The main goal of local feature descriptors is to distinctively represent a salient image region while remaining invariant to viewpoint and illumination changes. This representation can be improved using machine learning, however, past approaches have been mostly limited to learning linear feature mappings in either the original input or a kernelized input feature space. While kernelized methods have proven somewhat effective for learning non-linear local feature descriptors, they rely heavily on the choice of an appropriate kernel function whose selection is often difficult and non-intuitive. We propose to use the boosting-trick to obtain a non-linear mapping of the input to a high-dimensional feature space. The non-linear feature mapping obtained with the boosting-trick is highly intuitive. We employ gradient-based weak learners resulting in a learned descriptor that closely resembles the well-known SIFT. As demonstrated in our experiments, the resulting descriptor can be learned directly from intensity patches achieving state-of-the-art performance. 1
3 0.73241794 343 nips-2012-Tight Bounds on Profile Redundancy and Distinguishability
Author: Jayadev Acharya, Hirakendu Das, Alon Orlitsky
Abstract: The minimax KL-divergence of any distribution from all distributions in a collection P has several practical implications. In compression, it is called redundancy and represents the least additional number of bits over the entropy needed to encode the output of any distribution in P. In online estimation and learning, it is the lowest expected log-loss regret when guessing a sequence of random values generated by a distribution in P. In hypothesis testing, it upper bounds the largest number of distinguishable distributions in P. Motivated by problems ranging from population estimation to text classification and speech recognition, several machine-learning and information-theory researchers have recently considered label-invariant observations and properties induced by i.i.d. distributions. A sufficient statistic for all these properties is the data’s profile, the multiset of the number of times each data element appears. Improving on a sequence of previous works, we show that the redundancy of the collection of distributions induced over profiles by length-n i.i.d. sequences is between 0.3 · n1/3 and n1/3 log2 n, in particular, establishing its exact growth power. 1
4 0.64564896 258 nips-2012-Online L1-Dictionary Learning with Application to Novel Document Detection
Author: Shiva P. Kasiviswanathan, Huahua Wang, Arindam Banerjee, Prem Melville
Abstract: Given their pervasive use, social media, such as Twitter, have become a leading source of breaking news. A key task in the automated identification of such news is the detection of novel documents from a voluminous stream of text documents in a scalable manner. Motivated by this challenge, we introduce the problem of online 1 -dictionary learning where unlike traditional dictionary learning, which uses squared loss, the 1 -penalty is used for measuring the reconstruction error. We present an efficient online algorithm for this problem based on alternating directions method of multipliers, and establish a sublinear regret bound for this algorithm. Empirical results on news-stream and Twitter data, shows that this online 1 -dictionary learning algorithm for novel document detection gives more than an order of magnitude speedup over the previously known batch algorithm, without any significant loss in quality of results. 1
5 0.6444605 197 nips-2012-Learning with Recursive Perceptual Representations
Author: Oriol Vinyals, Yangqing Jia, Li Deng, Trevor Darrell
Abstract: Linear Support Vector Machines (SVMs) have become very popular in vision as part of state-of-the-art object recognition and other classification tasks but require high dimensional feature spaces for good performance. Deep learning methods can find more compact representations but current methods employ multilayer perceptrons that require solving a difficult, non-convex optimization problem. We propose a deep non-linear classifier whose layers are SVMs and which incorporates random projection as its core stacking element. Our method learns layers of linear SVMs recursively transforming the original data manifold through a random projection of the weak prediction computed from each layer. Our method scales as linear SVMs, does not rely on any kernel computations or nonconvex optimization, and exhibits better generalization ability than kernel-based SVMs. This is especially true when the number of training samples is smaller than the dimensionality of data, a common scenario in many real-world applications. The use of random projections is key to our method, as we show in the experiments section, in which we observe a consistent improvement over previous –often more complicated– methods on several vision and speech benchmarks. 1
6 0.64361775 107 nips-2012-Effective Split-Merge Monte Carlo Methods for Nonparametric Models of Sequential Data
7 0.64316291 9 nips-2012-A Geometric take on Metric Learning
8 0.64179724 250 nips-2012-On-line Reinforcement Learning Using Incremental Kernel-Based Stochastic Factorization
9 0.64133441 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models
10 0.64097458 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration
11 0.63999873 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models
12 0.639476 168 nips-2012-Kernel Latent SVM for Visual Recognition
13 0.63920844 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines
14 0.63875496 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs
15 0.63869464 279 nips-2012-Projection Retrieval for Classification
16 0.63715112 101 nips-2012-Discriminatively Trained Sparse Code Gradients for Contour Detection
17 0.63605809 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization
18 0.63605589 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model
19 0.6355741 188 nips-2012-Learning from Distributions via Support Measure Machines
20 0.63513273 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC