nips nips2007 nips2007-158 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Andriy Mnih, Ruslan Salakhutdinov
Abstract: Many existing approaches to collaborative filtering can neither handle very large datasets nor easily deal with users who have very few ratings. In this paper we present the Probabilistic Matrix Factorization (PMF) model which scales linearly with the number of observations and, more importantly, performs well on the large, sparse, and very imbalanced Netflix dataset. We further extend the PMF model to include an adaptive prior on the model parameters and show how the model capacity can be controlled automatically. Finally, we introduce a constrained version of the PMF model that is based on the assumption that users who have rated similar sets of movies are likely to have similar preferences. The resulting model is able to generalize considerably better for users with very few ratings. When the predictions of multiple PMF models are linearly combined with the predictions of Restricted Boltzmann Machines models, we achieve an error rate of 0.8861, that is nearly 7% better than the score of Netflix’s own system.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: Many existing approaches to collaborative filtering can neither handle very large datasets nor easily deal with users who have very few ratings. [sent-3, score-0.204]
2 We further extend the PMF model to include an adaptive prior on the model parameters and show how the model capacity can be controlled automatically. [sent-5, score-0.129]
3 Finally, we introduce a constrained version of the PMF model that is based on the assumption that users who have rated similar sets of movies are likely to have similar preferences. [sent-6, score-0.373]
4 The resulting model is able to generalize considerably better for users with very few ratings. [sent-7, score-0.181]
5 When the predictions of multiple PMF models are linearly combined with the predictions of Restricted Boltzmann Machines models, we achieve an error rate of 0.8861, that is nearly 7% better than the score of Netflix’s own system. [sent-8, score-0.095]
6 The idea behind such models is that attitudes or preferences of a user are determined by a small number of unobserved factors. [sent-11, score-0.138]
7 For example, for N users and M movies, the N × M preference matrix R is given by the product of an N × D user coefficient matrix U^T and a D × M factor matrix V [7]. [sent-13, score-0.32]
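To make the dimensions concrete, the factorization can be sketched in a few lines of NumPy (an illustration of the setup, not the paper's code; the toy sizes and random initialization are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

N, M, D = 6, 8, 3             # users, movies, latent dimensions (toy sizes)
U = rng.normal(size=(D, N))   # user coefficient matrix; column i is user i
V = rng.normal(size=(D, M))   # movie factor matrix; column j is movie j

R_hat = U.T @ V               # N x M matrix of predicted preferences
assert R_hat.shape == (N, M)
```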
8 All these models can be viewed as graphical models in which hidden factor variables have directed connections to variables that represent user ratings. [sent-16, score-0.15]
9 Figure 1: The left panel shows the graphical model for Probabilistic Matrix Factorization (PMF). [sent-42, score-0.078]
10 The right panel shows the graphical model for constrained PMF. [sent-43, score-0.179]
11 Many of the collaborative filtering algorithms mentioned above have been applied to modelling user ratings on the Netflix Prize dataset that contains 480,189 users, 17,770 movies, and over 100 million observations (user/movie/rating triples). [sent-44, score-0.309]
12 Second, most of the existing algorithms have trouble making accurate predictions for users who have very few ratings. [sent-47, score-0.167]
13 A common practice in the collaborative filtering community is to remove all users with fewer than some minimal number of ratings. [sent-48, score-0.187]
14 For example, the Netflix dataset is very imbalanced, with “infrequent” users rating fewer than 5 movies and “frequent” users rating over 10,000 movies. [sent-50, score-0.442]
15 However, since the standardized test set includes the complete range of users, the Netflix dataset provides a much more realistic and useful benchmark for collaborative filtering algorithms. [sent-51, score-0.081]
16 The goal of this paper is to present probabilistic algorithms that scale linearly with the number of observations and perform well on very sparse and imbalanced datasets, such as the Netflix dataset. [sent-52, score-0.086]
17 In Section 2 we present the Probabilistic Matrix Factorization (PMF) model that models the user preference matrix as a product of two lower-rank user and movie matrices. [sent-53, score-0.429]
18 In Section 3, we extend the PMF model to include adaptive priors over the movie and user feature vectors and show how these priors can be used to control model complexity automatically. [sent-54, score-0.547]
19 In Section 4 we introduce a constrained version of the PMF model that is based on the assumption that users who rate similar sets of movies have similar preferences. [sent-55, score-0.33]
20 We also show that constrained PMF and PMF with learnable priors improve model performance significantly. [sent-57, score-0.212]
21 Our results demonstrate that constrained PMF is especially effective at making better predictions for users with few ratings. [sent-58, score-0.268]
22 Let Rij represent the rating of user i for movie j, and let U ∈ R^{D×N} and V ∈ R^{D×M} be latent user and movie feature matrices, with column vectors Ui and Vj representing user-specific and movie-specific latent feature vectors, respectively. [sent-60, score-0.777]
23 Since model performance is measured by computing the root mean squared error (RMSE) on the test set, we first adopt a probabilistic linear model with Gaussian observation noise (see Fig. 1, left panel). [sent-61, score-0.088]
24 We also place zero-mean spherical Gaussian priors [1, 11] on the user and movie feature vectors: $p(U \mid \sigma_U^2) = \prod_{i=1}^{N} \mathcal{N}(U_i \mid 0, \sigma_U^2 I)$ and $p(V \mid \sigma_V^2) = \prod_{j=1}^{M} \mathcal{N}(V_j \mid 0, \sigma_V^2 I)$. [sent-65, score-0.426]
25 Maximizing the log-posterior over movie and user features with hyperparameters (i.e., the observation noise variance and the prior variances) kept fixed is equivalent to minimizing the sum-of-squared-errors objective function with quadratic regularization terms. [sent-67, score-0.319]
26 Note that this model can be viewed as a probabilistic extension of the SVD model, since if all ratings have been observed, the objective above reduces to the SVD objective in the limit of prior variances going to infinity. [sent-72, score-0.175]
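A minimal sketch of one gradient step on this regularized squared-error objective (our own illustrative code, not the authors' implementation; the learning rate is a placeholder):

```python
import numpy as np

def pmf_gradient_step(R, mask, U, V, lam_U, lam_V, lr=0.005):
    # Objective: 0.5 * sum over observed (i, j) of (R_ij - U_i^T V_j)^2
    #            + 0.5 * lam_U * ||U||_F^2 + 0.5 * lam_V * ||V||_F^2.
    # U is D x N, V is D x M, mask[i, j] = 1 where R_ij is observed.
    err = mask * (R - U.T @ V)           # residuals on observed entries only
    grad_U = -(V @ err.T) + lam_U * U    # gradient w.r.t. user features
    grad_V = -(U @ err) + lam_V * V      # gradient w.r.t. movie features
    U -= lr * grad_U
    V -= lr * grad_V
    return U, V
```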
27 We map the ratings 1, . . . , K to the interval [0, 1] using the function t(x) = (x − 1)/(K − 1), so that the range of valid rating values matches the range of predictions our model makes. [sent-78, score-0.116]
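For concreteness, the rescaling t and the function g used to bound predictions might look as follows (a sketch; the logistic form of g follows the paper's convention, and K = 5 matches Netflix's 1-to-5 star scale):

```python
import numpy as np

K = 5  # ratings take values 1, ..., K

def t(x):
    # Map a rating in {1, ..., K} to the interval [0, 1].
    return (x - 1.0) / (K - 1.0)

def g(x):
    # Logistic function; keeps predictions inside (0, 1).
    return 1.0 / (1.0 + np.exp(-x))
```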
28 When the number of observations differs significantly among different rows or columns, this approach fails, since any single number of feature dimensions will be too high for some feature vectors and too low for others. [sent-86, score-0.121]
29 As shown above, the problem of approximating a matrix in the L2 sense by a product of two low-rank matrices that are regularized by penalizing their Frobenius norm can be viewed as MAP estimation in a probabilistic model with spherical Gaussian priors on the rows of the low-rank matrices. [sent-91, score-0.244]
30 The complexity of the model is controlled by the hyperparameters: the noise variance σ² and the parameters of the priors (σ_U² and σ_V² above). [sent-92, score-0.102]
31 Introducing priors for the hyperparameters and maximizing the log-posterior of the model over both parameters and hyperparameters as suggested in [6] allows model complexity to be controlled automatically based on the training data. [sent-93, score-0.26]
32 Using spherical priors for user and movie feature vectors in this framework leads to the standard form of PMF with λU and λV chosen automatically. [sent-94, score-0.467]
33 For example, we can use priors with diagonal or even full covariance matrices as well as adjustable means for the feature vectors. [sent-96, score-0.168]
34 When the prior is Gaussian, the optimal hyperparameters can be found in closed form if the movie and user feature vectors are kept fixed. [sent-99, score-0.417]
35 Thus to simplify learning we alternate between optimizing the hyperparameters and updating the feature vectors using steepest ascent with the values of hyperparameters fixed. [sent-100, score-0.211]
36 In all of our experiments we used improper priors for the hyperparameters, but it is easy to extend the closed-form updates to handle conjugate priors. [sent-102, score-0.138]
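For a zero-mean spherical Gaussian prior, the closed-form variance update given fixed feature vectors is just the mean squared entry; a sketch of the two alternating updates (our illustration, assuming the improper hyperpriors mentioned in the text, which make the updates plain maximum-likelihood estimates):

```python
import numpy as np

def update_prior_variance(U):
    # ML update for a zero-mean spherical Gaussian prior on the columns
    # of U (D x N): sigma_U^2 = average squared entry of U.
    return np.mean(U ** 2)

def update_noise_variance(R, mask, U, V):
    # ML update for the observation noise variance, averaged over
    # observed entries only.
    err = mask * (R - U.T @ V)
    return np.sum(err ** 2) / np.sum(mask)
```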
37 4 Constrained PMF. Once a PMF model has been fitted, users with very few ratings will have feature vectors that are close to the prior mean, or the average user, so the predicted ratings for those users will be close to the movie average ratings. [sent-103, score-0.81]
38 In this section we introduce an additional way of constraining user-specific feature vectors that has a strong effect on infrequent users. [sent-104, score-0.132]
39 We define the feature vector for user i as $U_i = Y_i + \frac{\sum_{k=1}^{M} I_{ik} W_k}{\sum_{k=1}^{M} I_{ik}}$ (7), [sent-106, score-0.147]
40 where I is the observed indicator matrix, with $I_{ij}$ taking the value 1 if user i rated movie j and 0 otherwise. [sent-107, score-0.341]
41 Intuitively, the ith column of the W matrix captures the effect that a user’s having rated a particular movie has on the prior mean of the user’s feature vector. [sent-108, score-0.386]
42 As a result, users that have seen the same (or similar) movies will have similar prior distributions for their feature vectors. [sent-109, score-0.266]
43 Note that Yi can be seen as the offset added to the mean of the prior distribution to get the feature vector Ui for user i. [sent-110, score-0.175]
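A sketch of Eq. (7) in NumPy (illustrative; the guard for users who have rated nothing is our assumption about how the edge case in the footnote is handled):

```python
import numpy as np

def constrained_user_features(Y, W, I):
    # Eq. (7): U_i = Y_i + (sum_k I_ik * W_k) / (sum_k I_ik).
    # Y: D x N user offsets, W: D x M latent similarity matrix,
    # I: N x M binary indicator of which movies each user rated.
    counts = I.sum(axis=1)
    counts = np.maximum(counts, 1)   # assumed guard for users with no ratings
    return Y + (W @ I.T) / counts    # D x N matrix whose column i is U_i
```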
44 We now define the conditional distribution over the observed ratings as $p(R \mid Y, V, W, \sigma^2) = \prod_{i=1}^{N} \prod_{j=1}^{M} \Big[ \mathcal{N}\big(R_{ij} \mid g\big(\big[Y_i + \tfrac{\sum_{k=1}^{M} I_{ik} W_k}{\sum_{k=1}^{M} I_{ik}}\big]^T V_j\big), \sigma^2\big) \Big]^{I_{ij}}$ (8). [sent-113, score-0.133]
45 We regularize the latent similarity constraint matrix W by placing a zero-mean spherical Gaussian prior on it: $p(W \mid \sigma_W^2) = \prod_{k=1}^{M} \mathcal{N}(W_k \mid 0, \sigma_W^2 I)$ (9). [sent-114, score-0.111]
46 If no rating information is available about some user i, i.e. [sent-115, score-0.127]
47 Figure 2: Left panel: Performance of SVD, PMF, and PMF with adaptive priors, using 10D feature vectors, on the full Netflix validation data. [sent-134, score-0.102]
48 Right panel: Performance of SVD, Probabilistic Matrix Factorization (PMF) and constrained PMF, using 30D feature vectors, on the validation data. [sent-135, score-0.176]
49 The training time for the constrained PMF model scales linearly with the number of observations, which allows for a fast and simple implementation. [sent-140, score-0.164]
50 As we show in our experimental results section, this model performs considerably better than a simple unconstrained PMF model, especially on infrequent users. [sent-141, score-0.095]
51 5.1 Description of the Netflix Data. According to Netflix, the data were collected between October 1998 and December 2005 and represent the distribution of all ratings Netflix obtained during this period. [sent-143, score-0.121]
52 The training dataset consists of 100,480,507 ratings from 480,189 randomly-chosen, anonymous users on 17,770 movie titles. [sent-144, score-0.467]
53 In addition to the training and validation data, Netflix also provides a test set containing 2,817,131 user/movie pairs with the ratings withheld. [sent-146, score-0.188]
54 The pairs were selected from the most recent ratings for a subset of the users in the training dataset. [sent-147, score-0.281]
55 To reduce the unintentional overfitting to the test set that plagues many empirical comparisons in the machine learning literature, performance is assessed by submitting predicted ratings to Netflix, who then posts the root mean squared error (RMSE) on an unknown half of the test set. [sent-148, score-0.146]
56 To provide additional insight into the performance of different algorithms we created a smaller and much more difficult dataset from the Netflix data by randomly selecting 50,000 users and 1850 movies. [sent-151, score-0.17]
57 The toy dataset contains 1,082,982 training and 2,462 validation user/movie pairs. [sent-152, score-0.11]
58 Over 50% of the users in the training dataset have fewer than 10 ratings. [sent-153, score-0.192]
59 5.2 Details of Training. To speed up training, instead of performing batch learning we subdivided the Netflix data into mini-batches of size 100,000 (user/movie/rating triples) and updated the feature vectors after each mini-batch. [sent-155, score-0.081]
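A sketch of the mini-batch loop described here (illustrative only; the batch size follows the text, while the learning rate, regularization constant, and epoch count are placeholders):

```python
import numpy as np

def train_minibatch(triples, U, V, lam=0.01, lr=0.005,
                    batch_size=100_000, epochs=30, seed=0):
    # triples: array with rows (user, movie, rating);
    # U: N x D user features, V: M x D movie features, updated in place.
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        rng.shuffle(triples)
        for start in range(0, len(triples), batch_size):
            batch = triples[start:start + batch_size]
            u = batch[:, 0].astype(int)
            m = batch[:, 1].astype(int)
            r = batch[:, 2]
            err = r - np.sum(U[u] * V[m], axis=1)   # per-example residuals
            grad_U = -err[:, None] * V[m] + lam * U[u]
            grad_V = -err[:, None] * U[u] + lam * V[m]
            np.add.at(U, u, -lr * grad_U)           # handles repeated users
            np.add.at(V, m, -lr * grad_V)           # handles repeated movies
    return U, V
```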
60 5.3 Results for PMF with Adaptive Priors. To evaluate the performance of PMF models with adaptive priors, we used models with 10D features. [sent-160, score-0.128]
61 The feature vectors of the SVD model were not regularized in any way. [sent-164, score-0.113]
62 The first PMF model with adaptive priors (PMFA1) had Gaussian priors with spherical covariance matrices on user and movie feature vectors, while the second model (PMFA2) had diagonal covariance matrices. [sent-170, score-0.618]
63 In both cases, the adaptive priors had adjustable means. [sent-171, score-0.11]
64 Prior parameters and noise covariances were updated after every 10 and 100 feature matrix updates respectively. [sent-172, score-0.093]
65 Note that the curve for the PMF model with spherical covariances is not shown since it is virtually identical to the curve for the model with diagonal covariances. [sent-175, score-0.154]
66 The models with adaptive priors clearly outperform the competing models, achieving the RMSE of 0. [sent-182, score-0.112]
67 These results suggest that automatic regularization through adaptive priors works well in practice. [sent-185, score-0.118]
68 Moreover, our preliminary results for models with higher-dimensional feature vectors suggest that the gap in performance due to the use of adaptive priors is likely to grow as the dimensionality of feature vectors increases. [sent-186, score-0.274]
69 5.4 Results for Constrained PMF. For experiments involving constrained PMF models, we used 30D features (D = 30), since this choice resulted in the best model performance on the validation set. [sent-189, score-0.156]
70 Performance results of SVD, PMF, and constrained PMF on the toy dataset are shown in Figure 3. [sent-191, score-0.154]
71 The feature vectors were initialized to the same values in all three models. [sent-192, score-0.081]
72 For both PMF and constrained PMF models the regularization parameters were set to λU = λY = λV = λW = 0. [sent-193, score-0.139]
73 The constrained PMF model performs much better and converges considerably faster than the unconstrained PMF model. [sent-196, score-0.161]
74 Figure 3 (right panel) shows the effect of constraining user-specific features on the predictions for infrequent users. [sent-197, score-0.08]
75 Performance of the PMF model for a group of users that have fewer than 5 ratings in the training dataset is virtually identical to that of the movie average algorithm that always predicts the average rating of each movie. [sent-198, score-0.571]
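The movie average baseline referenced here is essentially a one-liner (a sketch):

```python
import numpy as np

def movie_average_baseline(R, mask):
    # Predict every movie's mean observed rating, regardless of the user.
    counts = np.maximum(mask.sum(axis=0), 1)   # ratings per movie
    return (R * mask).sum(axis=0) / counts     # length-M vector of means
```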
76 The constrained PMF model, however, performs considerably better on users with few ratings. [sent-199, score-0.262]
77 As the number of ratings increases, both PMF and constrained PMF exhibit similar performance. [sent-200, score-0.222]
78 One other interesting aspect of the constrained PMF model is that even if we know only what movies the user has rated, but do not know the values of the ratings, the model can make better predictions than the movie average model. [sent-201, score-0.512]
79 For the toy dataset, we randomly sampled an additional 50,000 users, and for each of these users compiled a list of the movies the user had rated, then discarded the actual ratings. [sent-202, score-0.38]
80 This experiment strongly suggests that knowing only which movies a user rated, but not the actual ratings, can still help us to model that user’s preferences better. [sent-206, score-0.213]
81 Figure 3: Left panel: Performance of SVD, Probabilistic Matrix Factorization (PMF), and constrained PMF on the validation data. [sent-230, score-0.136]
82 Right panel: Performance of constrained PMF, PMF, and the movie average algorithm that always predicts the average rating of each movie. [sent-232, score-0.342]
83 The users were grouped by the number of observed ratings in the training data. [sent-233, score-0.304]
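The grouped evaluation behind these plots can be reproduced along these lines (our code; the bin edges mirror the figure's x-axis groups, and predict is any hypothetical function returning predicted ratings for user/movie index arrays):

```python
import numpy as np

def rmse_by_rating_count(val_triples, train_counts, predict):
    # Group validation triples by how many training ratings each user has
    # and report the RMSE within each group.
    bins = [(1, 5), (6, 10), (11, 20), (21, 40),
            (41, 80), (81, 160), (161, np.inf)]
    u = val_triples[:, 0].astype(int)
    m = val_triples[:, 1].astype(int)
    r = val_triples[:, 2]
    errs = predict(u, m) - r
    counts = train_counts[u]                 # training ratings per val user
    out = {}
    for lo, hi in bins:
        sel = (counts >= lo) & (counts <= hi)
        if sel.any():
            out[(lo, hi)] = float(np.sqrt(np.mean(errs[sel] ** 2)))
    return out
```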
84 Figure 4: Left panel: Performance of constrained PMF, PMF, and the movie average algorithm that always predicts the average rating of each movie. [sent-253, score-0.342]
85 The users were grouped by the number of observed ratings in the training data, with the x-axis showing those groups and the y-axis displaying the RMSE on the full Netflix validation data for each such group. [sent-254, score-0.285]
86 Middle panel: Distribution of users in the training dataset. [sent-255, score-0.16]
87 Right panel: Performance of constrained PMF and of constrained PMF that makes use of additional rated/unrated information obtained from the test dataset. [sent-256, score-0.202]
88 For both the PMF and constrained PMF models the regularization parameters were set to λU = λY = λV = λW = 0. [sent-258, score-0.139]
89 Figure 2 (right panel) shows that constrained PMF significantly outperforms the unconstrained PMF model, achieving a RMSE of 0. [sent-260, score-0.118]
90 Figure 4 (left panel) shows that the constrained PMF model is able to generalize considerably better for users with very few ratings. [sent-264, score-0.282]
91 Note that over 10% of users in the training dataset have fewer than 20 ratings. [sent-265, score-0.192]
92 As the number of ratings increases, the effect from the offset in Eq. (7) diminishes, and both PMF and constrained PMF achieve similar performance. [sent-266, score-0.132]
94 Netflix tells us in advance which user/movie pairs occur in the test set, so we have an additional category: movies that were viewed but for which the rating is unknown. [sent-269, score-0.149]
95 This is a valuable source of information about users who occur several times in the test set, especially if they have only a small number of ratings in the training set. [sent-270, score-0.281]
96 The constrained PMF model can easily take this information into account. [sent-271, score-0.121]
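Concretely, taking the rated/unrated information into account only requires adding the test-set pairs to the indicator matrix of Eq. (7), since no rating values enter that equation (a sketch):

```python
import numpy as np

def augment_indicator(I_train, test_pairs):
    # Mark "viewed but rating unknown" test pairs in the indicator matrix
    # used by Eq. (7); the ratings themselves never appear there.
    I = I_train.copy()
    for u, m in test_pairs:
        I[u, m] = 1
    return I
```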
97 When we linearly combine the predictions of PMF, PMF with a learnable prior, and constrained PMF, we achieve an error rate of 0. [sent-273, score-0.173]
98 When the predictions of multiple PMF models are linearly combined with the predictions of multiple RBM models, recently introduced by [8], we achieve an error rate of 0.8861, that is nearly 7% better than the score of Netflix’s own system. [sent-275, score-0.095]
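Linearly combining predictions amounts to a weighted average; a minimal sketch (the weights and the least-squares fit on held-out data are our illustrative choices, not the paper's procedure):

```python
import numpy as np

def blend(predictions, weights):
    # predictions: list of arrays, one per model, each of shape (n_examples,).
    P = np.stack(predictions)        # (n_models, n_examples)
    return np.asarray(weights) @ P   # weighted sum over models

def fit_blend_weights(predictions, targets):
    # Least-squares blending weights fit on held-out ratings.
    P = np.stack(predictions).T      # (n_examples, n_models)
    w, *_ = np.linalg.lstsq(P, targets, rcond=None)
    return w
```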
99 6 Summary and Discussion. In this paper we presented Probabilistic Matrix Factorization (PMF) and its two derivatives: PMF with a learnable prior and constrained PMF. [sent-277, score-0.14]
100 We also demonstrated that these models can be efficiently trained and successfully applied to a large dataset containing over 100 million movie ratings. [sent-278, score-0.212]
wordName wordTfidf (topN-words)
[('pmf', 0.865), ('ix', 0.184), ('movie', 0.154), ('users', 0.138), ('svd', 0.127), ('ratings', 0.121), ('net', 0.118), ('rmse', 0.109), ('user', 0.107), ('constrained', 0.101), ('iij', 0.079), ('movies', 0.071), ('vj', 0.07), ('priors', 0.069), ('rating', 0.067), ('iik', 0.06), ('panel', 0.058), ('hyperparameters', 0.058), ('spherical', 0.056), ('ro', 0.053), ('rij', 0.049), ('collaborative', 0.049), ('rated', 0.043), ('epochs', 0.042), ('vectors', 0.041), ('uit', 0.04), ('factorization', 0.04), ('feature', 0.04), ('ui', 0.039), ('validation', 0.035), ('infrequent', 0.035), ('ln', 0.033), ('wk', 0.033), ('dataset', 0.032), ('imbalanced', 0.03), ('predictions', 0.029), ('covariances', 0.028), ('adaptive', 0.027), ('matrix', 0.025), ('nathan', 0.024), ('probabilistic', 0.023), ('considerably', 0.023), ('learnable', 0.022), ('regularization', 0.022), ('training', 0.022), ('toy', 0.021), ('linearly', 0.021), ('andriy', 0.02), ('mnih', 0.02), ('netflix', 0.02), ('frobenius', 0.02), ('model', 0.02), ('geoffrey', 0.019), ('toronto', 0.018), ('diagonal', 0.018), ('ruslan', 0.018), ('unconstrained', 0.017), ('ltering', 0.017), ('prior', 0.017), ('datasets', 0.017), ('score', 0.016), ('yi', 0.016), ('regularizing', 0.016), ('momentum', 0.016), ('matrices', 0.016), ('constraining', 0.016), ('models', 0.016), ('preferences', 0.015), ('salakhutdinov', 0.015), ('tommi', 0.015), ('benjamin', 0.015), ('jason', 0.014), ('adjustable', 0.014), ('steepest', 0.014), ('root', 0.014), ('triples', 0.013), ('rennie', 0.013), ('srebro', 0.013), ('latent', 0.013), ('rd', 0.013), ('controlled', 0.013), ('virtually', 0.012), ('penalizing', 0.012), ('capacity', 0.012), ('observed', 0.012), ('august', 0.012), ('entries', 0.012), ('regularized', 0.012), ('sparse', 0.012), ('grouped', 0.011), ('covariance', 0.011), ('squared', 0.011), ('passes', 0.011), ('viewed', 0.011), ('icml', 0.011), ('september', 0.011), ('boltzmann', 0.011), ('offset', 0.011), ('average', 0.01), ('containing', 0.01)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 158 nips-2007-Probabilistic Matrix Factorization
Author: Andriy Mnih, Ruslan Salakhutdinov
Abstract: Many existing approaches to collaborative filtering can neither handle very large datasets nor easily deal with users who have very few ratings. In this paper we present the Probabilistic Matrix Factorization (PMF) model which scales linearly with the number of observations and, more importantly, performs well on the large, sparse, and very imbalanced Netflix dataset. We further extend the PMF model to include an adaptive prior on the model parameters and show how the model capacity can be controlled automatically. Finally, we introduce a constrained version of the PMF model that is based on the assumption that users who have rated similar sets of movies are likely to have similar preferences. The resulting model is able to generalize considerably better for users with very few ratings. When the predictions of multiple PMF models are linearly combined with the predictions of Restricted Boltzmann Machines models, we achieve an error rate of 0.8861, that is nearly 7% better than the score of Netflix’s own system.
2 0.10404777 41 nips-2007-COFI RANK - Maximum Margin Matrix Factorization for Collaborative Ranking
Author: Markus Weimer, Alexandros Karatzoglou, Quoc V. Le, Alex J. Smola
Abstract: In this paper, we consider collaborative filtering as a ranking problem. We present a method which uses Maximum Margin Matrix Factorization and optimizes ranking instead of rating. We employ structured output prediction to optimize directly for ranking scores. Experimental results show that our method gives very good ranking scores and scales well on collaborative filtering tasks. 1
3 0.076410271 156 nips-2007-Predictive Matrix-Variate t Models
Author: Shenghuo Zhu, Kai Yu, Yihong Gong
Abstract: It is becoming increasingly important to learn from a partially-observed random matrix and predict its missing elements. We assume that the entire matrix is a single sample drawn from a matrix-variate t distribution and suggest a matrixvariate t model (MVTM) to predict those missing elements. We show that MVTM generalizes a range of known probabilistic models, and automatically performs model selection to encourage sparse predictive models. Due to the non-conjugacy of its prior, it is difficult to make predictions by computing the mode or mean of the posterior distribution. We suggest an optimization method that sequentially minimizes a convex upper-bound of the log-likelihood, which is very efficient and scalable. The experiments on a toy data and EachMovie dataset show a good predictive accuracy of the model. 1
4 0.068527408 211 nips-2007-Unsupervised Feature Selection for Accurate Recommendation of High-Dimensional Image Data
Author: Sabri Boutemedjet, Djemel Ziou, Nizar Bouguila
Abstract: Content-based image suggestion (CBIS) targets the recommendation of products based on user preferences on the visual content of images. In this paper, we motivate both feature selection and model order identification as two key issues for a successful CBIS. We propose a generative model in which the visual features and users are clustered into separate classes. We identify the number of both user and image classes with the simultaneous selection of relevant visual features using the message length approach. The goal is to ensure an accurate prediction of ratings for multidimensional non-Gaussian and continuous image descriptors. Experiments on a collected data have demonstrated the merits of our approach.
5 0.062897295 154 nips-2007-Predicting Brain States from fMRI Data: Incremental Functional Principal Component Regression
Author: Sennay Ghebreab, Arnold Smeulders, Pieter Adriaans
Abstract: We propose a method for reconstruction of human brain states directly from functional neuroimaging data. The method extends the traditional multivariate regression analysis of discretized fMRI data to the domain of stochastic functional measurements, facilitating evaluation of brain responses to complex stimuli and boosting the power of functional imaging. The method searches for sets of voxel time courses that optimize a multivariate functional linear model in terms of R2 statistic. Population based incremental learning is used to identify spatially distributed brain responses to complex stimuli without attempting to localize function first. Variation in hemodynamic lag across brain areas and among subjects is taken into account by voxel-wise non-linear registration of stimulus pattern to fMRI data. Application of the method on an international test benchmark for prediction of naturalistic stimuli from new and unknown fMRI data shows that the method successfully uncovers spatially distributed parts of the brain that are highly predictive of a given stimulus. 1
6 0.061530065 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning
7 0.051762603 19 nips-2007-Active Preference Learning with Discrete Choice Data
8 0.047041677 189 nips-2007-Supervised Topic Models
9 0.043307882 79 nips-2007-Efficient multiple hyperparameter learning for log-linear models
10 0.043118738 96 nips-2007-Heterogeneous Component Analysis
11 0.03708506 180 nips-2007-Sparse Feature Learning for Deep Belief Networks
12 0.034235548 87 nips-2007-Fast Variational Inference for Large-scale Internet Diagnosis
13 0.033532754 105 nips-2007-Infinite State Bayes-Nets for Structured Domains
14 0.033015344 212 nips-2007-Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes
15 0.028023116 131 nips-2007-Modeling homophily and stochastic equivalence in symmetric relational data
16 0.027708828 135 nips-2007-Multi-task Gaussian Process Prediction
17 0.02676685 61 nips-2007-Convex Clustering with Exemplar-Based Models
18 0.025232268 4 nips-2007-A Constraint Generation Approach to Learning Stable Linear Dynamical Systems
19 0.025070205 12 nips-2007-A Spectral Regularization Framework for Multi-Task Structure Learning
20 0.024681063 65 nips-2007-DIFFRAC: a discriminative and flexible framework for clustering
topicId topicWeight
[(0, -0.092), (1, 0.028), (2, -0.029), (3, -0.014), (4, 0.017), (5, 0.035), (6, -0.037), (7, -0.034), (8, -0.008), (9, -0.043), (10, -0.02), (11, -0.037), (12, 0.107), (13, 0.061), (14, -0.004), (15, 0.017), (16, -0.041), (17, -0.031), (18, -0.005), (19, -0.018), (20, -0.048), (21, -0.028), (22, -0.022), (23, -0.025), (24, -0.054), (25, 0.168), (26, -0.065), (27, -0.098), (28, -0.026), (29, 0.008), (30, -0.003), (31, -0.092), (32, -0.029), (33, -0.007), (34, -0.048), (35, -0.061), (36, 0.061), (37, 0.02), (38, -0.031), (39, -0.082), (40, -0.074), (41, -0.128), (42, 0.167), (43, 0.141), (44, -0.121), (45, -0.21), (46, -0.062), (47, -0.095), (48, 0.015), (49, -0.08)]
simIndex simValue paperId paperTitle
same-paper 1 0.93544412 158 nips-2007-Probabilistic Matrix Factorization
Author: Andriy Mnih, Ruslan Salakhutdinov
Abstract: Many existing approaches to collaborative filtering can neither handle very large datasets nor easily deal with users who have very few ratings. In this paper we present the Probabilistic Matrix Factorization (PMF) model which scales linearly with the number of observations and, more importantly, performs well on the large, sparse, and very imbalanced Netflix dataset. We further extend the PMF model to include an adaptive prior on the model parameters and show how the model capacity can be controlled automatically. Finally, we introduce a constrained version of the PMF model that is based on the assumption that users who have rated similar sets of movies are likely to have similar preferences. The resulting model is able to generalize considerably better for users with very few ratings. When the predictions of multiple PMF models are linearly combined with the predictions of Restricted Boltzmann Machines models, we achieve an error rate of 0.8861, that is nearly 7% better than the score of Netflix’s own system.
2 0.68557644 156 nips-2007-Predictive Matrix-Variate t Models
Author: Shenghuo Zhu, Kai Yu, Yihong Gong
Abstract: It is becoming increasingly important to learn from a partially-observed random matrix and predict its missing elements. We assume that the entire matrix is a single sample drawn from a matrix-variate t distribution and suggest a matrixvariate t model (MVTM) to predict those missing elements. We show that MVTM generalizes a range of known probabilistic models, and automatically performs model selection to encourage sparse predictive models. Due to the non-conjugacy of its prior, it is difficult to make predictions by computing the mode or mean of the posterior distribution. We suggest an optimization method that sequentially minimizes a convex upper-bound of the log-likelihood, which is very efficient and scalable. The experiments on a toy data and EachMovie dataset show a good predictive accuracy of the model. 1
3 0.55001843 41 nips-2007-COFI RANK - Maximum Margin Matrix Factorization for Collaborative Ranking
Author: Markus Weimer, Alexandros Karatzoglou, Quoc V. Le, Alex J. Smola
Abstract: In this paper, we consider collaborative filtering as a ranking problem. We present a method which uses Maximum Margin Matrix Factorization and optimizes ranking instead of rating. We employ structured output prediction to optimize directly for ranking scores. Experimental results show that our method gives very good ranking scores and scales well on collaborative filtering tasks. 1
4 0.54098219 211 nips-2007-Unsupervised Feature Selection for Accurate Recommendation of High-Dimensional Image Data
Author: Sabri Boutemedjet, Djemel Ziou, Nizar Bouguila
Abstract: Content-based image suggestion (CBIS) targets the recommendation of products based on user preferences on the visual content of images. In this paper, we motivate both feature selection and model order identification as two key issues for a successful CBIS. We propose a generative model in which the visual features and users are clustered into separate classes. We identify the number of both user and image classes with the simultaneous selection of relevant visual features using the message length approach. The goal is to ensure an accurate prediction of ratings for multidimensional non-Gaussian and continuous image descriptors. Experiments on a collected data have demonstrated the merits of our approach.
5 0.5378027 96 nips-2007-Heterogeneous Component Analysis
Author: Shigeyuki Oba, Motoaki Kawanabe, Klaus-Robert Müller, Shin Ishii
Abstract: In bioinformatics it is often desirable to combine data from various measurement sources and thus structured feature vectors are to be analyzed that possess different intrinsic blocking characteristics (e.g., different patterns of missing values, observation noise levels, effective intrinsic dimensionalities). We propose a new machine learning tool, heterogeneous component analysis (HCA), for feature extraction in order to better understand the factors that underlie such complex structured heterogeneous data. HCA is a linear block-wise sparse Bayesian PCA based not only on a probabilistic model with block-wise residual variance terms but also on a Bayesian treatment of a block-wise sparse factor-loading matrix. We study various algorithms that implement our HCA concept extracting sparse heterogeneous structure by obtaining common components for the blocks and specific components within each block. Simulations on toy and bioinformatics data underline the usefulness of the proposed structured matrix factorization concept. 1
6 0.42367578 19 nips-2007-Active Preference Learning with Discrete Choice Data
7 0.41401634 131 nips-2007-Modeling homophily and stochastic equivalence in symmetric relational data
8 0.3457751 79 nips-2007-Efficient multiple hyperparameter learning for log-linear models
9 0.31051481 28 nips-2007-Augmented Functional Time Series Representation and Forecasting with Gaussian Processes
10 0.2951808 97 nips-2007-Hidden Common Cause Relations in Relational Learning
11 0.29108256 154 nips-2007-Predicting Brain States from fMRI Data: Incremental Functional Principal Component Regression
12 0.27724442 105 nips-2007-Infinite State Bayes-Nets for Structured Domains
13 0.27189651 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning
14 0.26776588 87 nips-2007-Fast Variational Inference for Large-scale Internet Diagnosis
15 0.25353017 206 nips-2007-Topmoumoute Online Natural Gradient Algorithm
16 0.24235609 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data
17 0.23648953 122 nips-2007-Locality and low-dimensions in the prediction of natural experience from fMRI
18 0.23443808 8 nips-2007-A New View of Automatic Relevance Determination
19 0.2298144 4 nips-2007-A Constraint Generation Approach to Learning Stable Linear Dynamical Systems
20 0.22526813 193 nips-2007-The Distribution Family of Similarity Distances
topicId topicWeight
[(5, 0.044), (9, 0.036), (13, 0.03), (16, 0.046), (18, 0.011), (19, 0.02), (21, 0.08), (31, 0.017), (34, 0.016), (35, 0.043), (47, 0.098), (82, 0.236), (83, 0.089), (85, 0.019), (88, 0.035), (90, 0.056)]
simIndex simValue paperId paperTitle
1 0.77632535 155 nips-2007-Predicting human gaze using low-level saliency combined with face detection
Author: Moran Cerf, Jonathan Harel, Wolfgang Einhaeuser, Christof Koch
Abstract: Under natural viewing conditions, human observers shift their gaze to allocate processing resources to subsets of the visual input. Many computational models try to predict such voluntary eye and attentional shifts. Although the important role of high level stimulus properties (e.g., semantic information) in search stands undisputed, most models are based on low-level image properties. We here demonstrate that a combined model of face detection and low-level saliency significantly outperforms a low-level model in predicting locations humans fixate on, based on eye-movement recordings of humans observing photographs of natural scenes, most of which contained at least one person. Observers, even when not instructed to look for anything particular, fixate on a face with a probability of over 80% within their first two fixations; furthermore, they exhibit more similar scanpaths when faces are present. Remarkably, our model’s predictive performance in images that do not contain faces is not impaired, and is even improved in some cases by spurious face detector responses. 1
same-paper 2 0.73835438 158 nips-2007-Probabilistic Matrix Factorization
Author: Andriy Mnih, Ruslan Salakhutdinov
Abstract: Many existing approaches to collaborative filtering can neither handle very large datasets nor easily deal with users who have very few ratings. In this paper we present the Probabilistic Matrix Factorization (PMF) model which scales linearly with the number of observations and, more importantly, performs well on the large, sparse, and very imbalanced Netflix dataset. We further extend the PMF model to include an adaptive prior on the model parameters and show how the model capacity can be controlled automatically. Finally, we introduce a constrained version of the PMF model that is based on the assumption that users who have rated similar sets of movies are likely to have similar preferences. The resulting model is able to generalize considerably better for users with very few ratings. When the predictions of multiple PMF models are linearly combined with the predictions of Restricted Boltzmann Machines models, we achieve an error rate of 0.8861, that is nearly 7% better than the score of Netflix’s own system.
3 0.67525935 84 nips-2007-Expectation Maximization and Posterior Constraints
Author: Kuzman Ganchev, Ben Taskar, João Gama
Abstract: The expectation maximization (EM) algorithm is a widely used maximum likelihood estimation procedure for statistical models when the values of some of the variables in the model are not observed. Very often, however, our aim is primarily to find a model that assigns values to the latent variables that have intended meaning for our data and maximizing expected likelihood only sometimes accomplishes this. Unfortunately, it is typically difficult to add even simple a-priori information about latent variables in graphical models without making the models overly complex or intractable. In this paper, we present an efficient, principled way to inject rich constraints on the posteriors of latent variables into the EM algorithm. Our method can be used to learn tractable graphical models that satisfy additional, otherwise intractable constraints. Focusing on clustering and the alignment problem for statistical machine translation, we show that simple, intuitive posterior constraints can greatly improve the performance over standard baselines and be competitive with more complex, intractable models. 1
4 0.58850312 96 nips-2007-Heterogeneous Component Analysis
Author: Shigeyuki Oba, Motoaki Kawanabe, Klaus-Robert Müller, Shin Ishii
Abstract: In bioinformatics it is often desirable to combine data from various measurement sources and thus structured feature vectors are to be analyzed that possess different intrinsic blocking characteristics (e.g., different patterns of missing values, observation noise levels, effective intrinsic dimensionalities). We propose a new machine learning tool, heterogeneous component analysis (HCA), for feature extraction in order to better understand the factors that underlie such complex structured heterogeneous data. HCA is a linear block-wise sparse Bayesian PCA based not only on a probabilistic model with block-wise residual variance terms but also on a Bayesian treatment of a block-wise sparse factor-loading matrix. We study various algorithms that implement our HCA concept extracting sparse heterogeneous structure by obtaining common components for the blocks and specific components within each block. Simulations on toy and bioinformatics data underline the usefulness of the proposed structured matrix factorization concept. 1
5 0.57307494 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
Author: Matthias Bethge, Philipp Berens
Abstract: Maximum entropy analysis of binary variables provides an elegant way for studying the role of pairwise correlations in neural populations. Unfortunately, these approaches suffer from their poor scalability to high dimensions. In sensory coding, however, high-dimensional data is ubiquitous. Here, we introduce a new approach using a near-maximum entropy model, that makes this type of analysis feasible for very high-dimensional data—the model parameters can be derived in closed form and sampling is easy. Therefore, our NearMaxEnt approach can serve as a tool for testing predictions from a pairwise maximum entropy model not only for low-dimensional marginals, but also for high dimensional measurements of more than thousand units. We demonstrate its usefulness by studying natural images with dichotomized pixel intensities. Our results indicate that the statistics of such higher-dimensional measurements exhibit additional structure that are not predicted by pairwise correlations, despite the fact that pairwise correlations explain the lower-dimensional marginal statistics surprisingly well up to the limit of dimensionality where estimation of the full joint distribution is feasible. 1
6 0.57141209 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data
7 0.56937265 41 nips-2007-COFI RANK - Maximum Margin Matrix Factorization for Collaborative Ranking
8 0.56599426 195 nips-2007-The Generalized FITC Approximation
9 0.56494969 140 nips-2007-Neural characterization in partially observed populations of spiking neurons
10 0.56311011 164 nips-2007-Receptive Fields without Spike-Triggering
11 0.56307828 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning
12 0.56236786 104 nips-2007-Inferring Neural Firing Rates from Spike Trains Using Gaussian Processes
13 0.56186676 156 nips-2007-Predictive Matrix-Variate t Models
14 0.56091702 100 nips-2007-Hippocampal Contributions to Control: The Third Way
15 0.56062627 79 nips-2007-Efficient multiple hyperparameter learning for log-linear models
16 0.56061417 86 nips-2007-Exponential Family Predictive Representations of State
17 0.55998504 18 nips-2007-A probabilistic model for generating realistic lip movements from speech
18 0.55976039 28 nips-2007-Augmented Functional Time Series Representation and Forecasting with Gaussian Processes
19 0.55944037 174 nips-2007-Selecting Observations against Adversarial Objectives
20 0.559331 209 nips-2007-Ultrafast Monte Carlo for Statistical Summations