nips nips2012 nips2012-75 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Maksims Volkovs, Richard S. Zemel
Abstract: The primary application of collaborate filtering (CF) is to recommend a small set of items to a user, which entails ranking. Most approaches, however, formulate the CF problem as rating prediction, overlooking the ranking perspective. In this work we present a method for collaborative ranking that leverages the strengths of the two main CF approaches, neighborhood- and model-based. Our novel method is highly efficient, with only seventeen parameters to optimize and a single hyperparameter to tune, and beats the state-of-the-art collaborative ranking methods. We also show that parameters learned on datasets from one item domain yield excellent results on a dataset from very different item domain, without any retraining. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Most approaches, however, formulate the CF problem as rating prediction, overlooking the ranking perspective. [sent-8, score-0.399]
2 In this work we present a method for collaborative ranking that leverages the strengths of the two main CF approaches, neighborhood- and model-based. [sent-9, score-0.371]
3 Our novel method is highly efficient, with only seventeen parameters to optimize and a single hyperparameter to tune, and beats the state-of-the-art collaborative ranking methods. [sent-10, score-0.399]
4 We also show that parameters learned on datasets from one item domain yield excellent results on a dataset from very different item domain, without any retraining. [sent-11, score-0.414]
5 Similarly, Netflix recommends top-T movies it predicts a user will like based on his/her rating and viewing history. [sent-16, score-0.377]
6 However, while recommending a small ordered list of items is a ranking problem, ranking in CF has gained relatively little attention from the learning-to-rank community. [sent-17, score-0.625]
7 However, recent work [23, 15, 2] has shown that by optimizing a ranking objective just given the known ratings a significantly higher ranking accuracy can be achieved as compared to models that optimize rating prediction. [sent-28, score-0.906]
8 Inspired by these results we propose a new ranking framework where we show how the observed ratings can be used to extract effective feature descriptors for every user-item pair. [sent-29, score-0.514]
9 The features do not require any external information and make it it possible to apply any learning-to-rank method to optimize the parameters of the ranking function for the target metric. [sent-30, score-0.334]
10 datasets show that our model outperforms existing rating and ranking approaches to CF. [sent-32, score-0.46]
11 2 Collaborative Ranking Framework In a typical collaborative filtering (CF) problem we are given a set of N users U = {u1 , . [sent-34, score-0.317]
12 The users’ ratings of the items can be represented by an N ×M matrix R where R(un , vm ) is the rating assigned by user un to item vm and R(un , vm ) = 0 if vm is not rated by un . [sent-41, score-3.628]
13 We use U(vm ) to denote the set of all users that have rated vm and V(un ) to denote the set of items that have been rated by un . [sent-42, score-1.37]
14 We use vector notation: R(un , :) denotes the n’th row of R (1 × M vector), and R(:, vm ) denotes the m’th column (N × 1 vector). [sent-43, score-0.444]
15 As mentioned above, most research has concentrated on the rating prediction problem in CF where the aim is to accurately predict the ratings for the unrated items for each user. [sent-44, score-0.729]
16 However, most applications that use CF typically aim to recommend only a small ranked set of items to each user. [sent-45, score-0.296]
17 Thus rather than concentrating on rating prediction we instead approach this problem from the ranking viewpoint and refer to it as Collaborative Ranking (CR). [sent-46, score-0.399]
18 In CR the goal is to rank the unrated items in the order of relevance to the user. [sent-47, score-0.354]
19 A ranking of the items V can be represented as a permutation π : {1, . [sent-48, score-0.417]
20 , M } where π(m) = l denotes the rank of the item vm and m = π −1 (l). [sent-54, score-0.643]
21 T is typically set to a small value to emphasize that the user will only be shown the top-T ranked items and the items below the top-T are not evaluated. [sent-58, score-0.636]
22 1 Neighborhood-Based Approaches Neighborhood-based CF approaches estimate the unknown ratings for a target user based on the ratings from the set of neighborhood users that tend to rate similarly to the target user. [sent-62, score-1.002]
23 Formally, given the target user un and item vm the neighborhood-based methods find a subset of K neighbor users who are most similar to un and have rated vm , i. [sent-63, score-2.305]
24 We use K(un , vm ) ⊆ U(vm ) \ un to denote the set of K neighboring users. [sent-66, score-0.841]
25 Once the K neighbors are found the rating is predicted by taking the weighted average of the neighbors’ ratings. [sent-69, score-0.311]
26 One problem with the neighborhood-based approaches is that the raw ratings often contain user bias. [sent-71, score-0.475]
27 For instance, some users tend to give high ratings while others tend to give low ones. [sent-72, score-0.447]
28 To correct for these biases various methods have been proposed to normalize or center the ratings [4, 20] before computing the predictions. [sent-73, score-0.275]
29 Another major problem with the neighborhood-based approaches arises from the fact that the observed rating matrix R is typically highly sparse, making it very difficult to find similar neighbors reliably. [sent-74, score-0.363]
30 To addresss this sparsity, most methods employ dimensionality reduction [9] and data smoothing [24] to fill in some of the unknown ratings, or to cluster users before computing user 2 similarity. [sent-75, score-0.34]
31 Instead of predicting ratings, this method uses the neighbors of un to fill in the missing entries in the M ×M pairwise preference matrix Yn , where Yn (vm , vl ) is the preference strength for vm over vl by un . [sent-78, score-1.676]
32 2 Model-Based Approaches In contrast to the neighborhood-based approaches, the model-based approaches use the observed ratings to create a compact model of the data which is then used to predict the unobserved ratings. [sent-84, score-0.307]
33 In PMF every user un and item vm are represented by latent vectors φ(un ) and φ(vm ) of length D. [sent-87, score-1.186]
34 For a given user-item pair (un , vm ) the dot product of the corresponding latent vectors gives the rating prediction: R(un , vm ) ≈ φ(un ) · φ(vm ). [sent-88, score-1.115]
35 The latent representations are learned by minimizing the squared error between the observed ratings and the predicted ones. [sent-89, score-0.383]
36 Latent models have more expressive power and typically perform better than the neighborhoodbased models when the number of observed ratings is small because they are able to learn preference correlations that extend beyond the simple neighborhood similarity. [sent-90, score-0.504]
37 The PMF-based approach uses the latent representations produced by PMF as user-item features and learns a ranking model on these features. [sent-99, score-0.318]
38 The authors of that work also note that the PMF representations might not be optimal for ranking since they are learned using a squared error objective which is very different from most ranking metric. [sent-100, score-0.488]
39 To account for this they propose an extension where both user-item features and the weights of the ranking function are optimized during learning. [sent-101, score-0.281]
40 The documents are represented as query dependent feature vectors and the goal is to learn a feature-based ranking function to rank the documents in the order of relevance to the query. [sent-108, score-0.35]
41 (1) Top-3 closest neighbors {u1 , u5 , u6 } are selected from U(v4 ) = {u1 , u2 , u5 , u6 } (all users who rated v4 ). [sent-110, score-0.366]
42 Note that u2 is not selected because the ratings for u2 deviate significantly from those for u3 . [sent-111, score-0.275]
43 4 Our Approach The main idea behind our approach is to transform the CR problem into a learning-to-rank one and then utilize one of the many developed ranking methods to learn the ranking function. [sent-120, score-0.433]
44 CR can be placed into the learning-to-rank framework by noting that the users correspond to queries and items to documents. [sent-121, score-0.381]
45 For each user the observed ratings indicate the relevance of the corresponding items to that user and can be used to train the ranking function. [sent-122, score-1.061]
46 In this work we bridge this gap and develop a robust feature extraction approach which does not require any external user or item information and is based only on the available training ratings. [sent-124, score-0.352]
47 1 Feature Extraction The PMF-based ranking approach [2] extracts user-item features by concatenating together the latent representations learned by the PMF model. [sent-126, score-0.376]
48 The model thus requires the user-item representations to be learned before the items can be ranked and hence suffers from the main disadvantages of the model-based approaches: the large number of parameters, complex optimization, and expensive inference for new users and items. [sent-127, score-0.486]
49 Formally, given a user-item pair (un , vm ) and a similarity function ψ, we use ψ to extract a subset of the K most similar users to un that rated vm , i. [sent-130, score-1.588]
50 This step is identical to the standard neighborhood-based model, and ψ can be any rating or preference based similarity function. [sent-133, score-0.343]
51 However, recent work in preference aggregation [8, 13] has shown that additional gain can be achieved by taking the relative rating magnitude into account by using either the normalized rating or log rating difference. [sent-136, score-0.732]
52 All three versions of g address the user bias problem mentioned above by using 4 relative comparisons rather than the absolute rating magnitude. [sent-137, score-0.359]
53 In this form WINnm (k) corresponds to the net positive preference for vm by neighbor uk . [sent-138, score-0.777]
54 Together the three matrices thus describe the relative preferences for vm across all the neighbors of un . [sent-140, score-1.04]
55 Normalization by |V(uk ) \ vm | (number of observed ratings for uk excluding vm ), ensures that the entries are comparable across neighbors with different numbers of ratings. [sent-141, score-1.406]
56 For unpopular items vm that do not have many ratings with |U(vm )| < K, the number of neighbors will be less than K, i. [sent-142, score-1.048]
57 When such an item is encountered we shrink the preference matrices to be the same size as |K(un , vm )|. [sent-145, score-0.735]
58 Figure 1 shows an example rating matrix R together with the preference matrices computed for the user-item pair (u3 , v4 ). [sent-146, score-0.341]
59 The last statistic counts the number of neighbors (out of K) that express any positive preference towards vm , and together with σ summarizes the overall confidence of the preference. [sent-148, score-0.69]
60 During training our aim is now to use the observed training ratings to learn a scoring function f : R|γ| → R which maximizes the target IR metric, such as NDCG, across all users. [sent-152, score-0.452]
61 , vM }, we (1) extract features for each item vm using the neighbors of (u, vm ); (2) apply the learned scoring function to get the score for every item; and (3) sort the scores to produce the ranking. [sent-156, score-1.33]
62 It is important to note here that, first, a single scoring function Figure 2: The flow diagram for is learned for all users and items so the number of parameters is WLT, our feature-based CR model. [sent-158, score-0.48]
63 independent of the number of users or items and only depends on the size of γ. [sent-159, score-0.381]
64 Second, given a new user u no optimization is necessary to produce a ranking of the items for u. [sent-161, score-0.585]
65 Similarly to neighborhoodbased methods, our approach only requires computing the neighbors to extract the features and apply the learned scoring function to get the ranking. [sent-162, score-0.347]
66 This is also a significant advantage over most userbased approaches where it is typically necessary to learn a new model for every user not present in the training data before predictions can be made. [sent-163, score-0.29]
67 Moreover, the extracted features incorporate preference confidence information such as the variance across the neighbors and the fraction of the neighbors that generated each preference type (positive, negative and tie). [sent-165, score-0.579]
68 Note that an analogous item-based approach can be taken here by similarly summarizing the preferences of un for items that are closest to vm , we leave this for future work. [sent-168, score-1.086]
69 A modified version of this approach adapted to binary ratings recently placed second in the Million Song Dataset Challenge [18] ran by Kaggle. [sent-169, score-0.275]
70 2 Learning the Scoring Function Given the user-item features extracted based on the neighbors our goal is to use the observed training ratings for each user to optimize the parameters of the scoring function for the target IR metric. [sent-171, score-0.757]
71 If a given training item vm is not ranked by any other user except un the feature vector is set to zero (γ(un , vm ) ≡ 0). [sent-173, score-1.646]
72 One way to avoid missing features is to learn only with those items that have at least ratings in the training set. [sent-174, score-0.574]
73 We take a different approach, modifying the conventional linear scoring function to include an additional bias term b0 : f (γ(un , vm ), W) = w · γ(un , vm ) + b + I[U(vm ) \ un = ∅]b0 (4) where W = {w, b, b0 } is the set of free parameters to be learned. [sent-176, score-1.368]
74 The bias term b0 provides a base score for vm if vm does not have enough ratings in the training data. [sent-178, score-1.185]
75 Second, user information can be incorporated into the model by learning user specific weights. [sent-181, score-0.336]
76 To incorporate user information we can learn a separate set of weights wn for each user un or group of users. [sent-182, score-0.789]
77 The weights will provide user specific information and are then applied to rank the unrated items for the corresponding user(s). [sent-183, score-0.511]
78 for items, can be incorporated by simply concatenating it with γ(un , vm ) and expanding the dimensionality of W. [sent-186, score-0.47]
79 Note that if these additional features can be extracted efficiently, incorporating them will not add significant overhead to either learning or inference and the model can still be applied to new users and items very efficiently. [sent-187, score-0.466]
80 , which we subsampled by first selecting the 10,000 most popular items and then selecting the 100,000 users with the most ratings. [sent-196, score-0.381]
81 In addition to subsampling we rescaled user ratings from 0-100 to the 1-5 interval to make the data consistent with the other two datasets. [sent-198, score-0.466]
82 The user, item and rating statistics are summarized in Table 1. [sent-200, score-0.332]
83 To investigate the effect that the number of ratings has on accuracy we follow the framework of [23, 2]. [sent-201, score-0.275]
84 For each dataset we randomly select 10, 20, 30, 40 ratings Table 1: Dataset statistics. [sent-202, score-0.299]
85 Users with less than 30, 40, 50, MovieLens-1 1000 1700 100,000 60 ratings were removed to ensure that we could evaluate MovieLens-2 72,000 10,000 10,000,000 on at least 10 ratings for each user. [sent-204, score-0.55]
86 100,000 10,000 45,729,723 of test items varies significantly across users with many users having more test ratings than training ones. [sent-206, score-0.869]
87 This simulates the real life CR scenario where the set of unrated items from which the recommendations are generated is typically much larger than the rated item set for each user. [sent-207, score-0.498]
88 We also compare with two collaborative ranking models: PMF-based ranker [2] (PMF-R) and CofiRank [23] (CO). [sent-213, score-0.353]
89 In all experiments only ratings in the training set were used to select the neighbors, and make predictions for the validation and test set items. [sent-372, score-0.297]
90 Across the datasets the gains are especially large at lower truncations N@1 and N@3, which is important since those items will most likely be the ones viewed by the user. [sent-377, score-0.263]
91 First, when the number of users and ratings is small (MovieLens-1) the performance of the UB approach significantly drops. [sent-379, score-0.447]
92 This is likely due to the fact that neighbors cannot be found reliably in this setting since users have little overlap in ratings. [sent-380, score-0.292]
93 Thus both users and items can be taken from a different, unseen during training, set. [sent-390, score-0.381]
94 In strong generalization the models are evaluated on users not present at training time while keeping the item set fixed, while here the item set also changes. [sent-392, score-0.476]
95 has over 5 times more items and 72 times more users than MovieLens-1, majority of which the WLTM1 model has not seen during training. [sent-519, score-0.381]
96 The WLT-Y model trained on musical artist data achieves state-of-the-art performance on MovieLens-2 movie data, performing better than all the baselines when 20, 30 and 40 ratings are used for training. [sent-521, score-0.362]
97 Mean preferences and the number of neighbors features have the highest absolute weights which indicates that they are the most useful for predicting the item scores. [sent-528, score-0.37]
98 The features allow us to apply any learning-to-rank approach to learn the ranking function. [sent-531, score-0.276]
99 Experimental results show that by using these features state-of-the art ranking results can be achieved. [sent-532, score-0.259]
100 Going forward, the strong transfer results call into question whether the complex machinery developed for CF is appropriate when the true goal is recommendation, as the required information for finding the best items to recommend can be obtained from basic neighborhood statistics. [sent-533, score-0.295]
wordName wordTfidf (topN-words)
[('vm', 0.444), ('un', 0.397), ('ratings', 0.275), ('wlt', 0.244), ('items', 0.209), ('ranking', 0.208), ('rating', 0.191), ('users', 0.172), ('user', 0.168), ('winnm', 0.168), ('collaborative', 0.145), ('item', 0.141), ('yahoo', 0.14), ('cf', 0.135), ('co', 0.128), ('preference', 0.126), ('cr', 0.121), ('neighbors', 0.12), ('ub', 0.108), ('uk', 0.104), ('ndcg', 0.096), ('pmf', 0.077), ('rated', 0.074), ('scoring', 0.067), ('net', 0.065), ('lambdarank', 0.061), ('lossnm', 0.061), ('tienm', 0.061), ('rank', 0.058), ('ir', 0.054), ('unrated', 0.054), ('ix', 0.052), ('features', 0.051), ('ltering', 0.048), ('neighborhoodbased', 0.046), ('neighbor', 0.038), ('recommend', 0.037), ('preferences', 0.036), ('latent', 0.036), ('trained', 0.035), ('relevance', 0.033), ('learned', 0.032), ('approaches', 0.032), ('extract', 0.031), ('artist', 0.031), ('userbased', 0.031), ('useritem', 0.031), ('ranked', 0.03), ('target', 0.03), ('transfer', 0.029), ('datasets', 0.029), ('million', 0.029), ('win', 0.029), ('retrieval', 0.028), ('tie', 0.027), ('konstan', 0.027), ('excellent', 0.027), ('similarity', 0.026), ('concatenating', 0.026), ('sigir', 0.025), ('truncations', 0.025), ('dataset', 0.024), ('matrices', 0.024), ('optimize', 0.024), ('vl', 0.023), ('subsampling', 0.023), ('dcg', 0.023), ('representations', 0.023), ('beats', 0.022), ('training', 0.022), ('weights', 0.022), ('musical', 0.021), ('external', 0.021), ('disadvantages', 0.02), ('neighborhood', 0.02), ('challenge', 0.02), ('typically', 0.02), ('pairwise', 0.02), ('prone', 0.02), ('brackets', 0.02), ('lot', 0.02), ('domain', 0.02), ('across', 0.019), ('toronto', 0.019), ('overhead', 0.018), ('amazon', 0.018), ('absence', 0.018), ('dence', 0.018), ('movies', 0.018), ('www', 0.018), ('leverages', 0.018), ('squared', 0.017), ('outperforming', 0.017), ('learn', 0.017), ('documents', 0.017), ('cosine', 0.017), ('recommendation', 0.017), ('incorporate', 0.017), ('normalized', 0.017), ('additional', 0.016)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999952 75 nips-2012-Collaborative Ranking With 17 Parameters
Author: Maksims Volkovs, Richard S. Zemel
Abstract: The primary application of collaborate filtering (CF) is to recommend a small set of items to a user, which entails ranking. Most approaches, however, formulate the CF problem as rating prediction, overlooking the ranking perspective. In this work we present a method for collaborative ranking that leverages the strengths of the two main CF approaches, neighborhood- and model-based. Our novel method is highly efficient, with only seventeen parameters to optimize and a single hyperparameter to tune, and beats the state-of-the-art collaborative ranking methods. We also show that parameters learned on datasets from one item domain yield excellent results on a dataset from very different item domain, without any retraining. 1
2 0.25746033 49 nips-2012-Automatic Feature Induction for Stagewise Collaborative Filtering
Author: Joonseok Lee, Mingxuan Sun, Guy Lebanon, Seung-jean Kim
Abstract: Recent approaches to collaborative filtering have concentrated on estimating an algebraic or statistical model, and using the model for predicting missing ratings. In this paper we observe that different models have relative advantages in different regions of the input space. This motivates our approach of using stagewise linear combinations of collaborative filtering algorithms, with non-constant combination coefficients based on kernel smoothing. The resulting stagewise model is computationally scalable and outperforms a wide selection of state-of-the-art collaborative filtering algorithms. 1
3 0.24840783 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback
Author: Andriy Mnih, Yee W. Teh
Abstract: User preferences for items can be inferred from either explicit feedback, such as item ratings, or implicit feedback, such as rental histories. Research in collaborative filtering has concentrated on explicit feedback, resulting in the development of accurate and scalable models. However, since explicit feedback is often difficult to collect it is important to develop effective models that take advantage of the more widely available implicit feedback. We introduce a probabilistic approach to collaborative filtering with implicit feedback based on modelling the user’s item selection process. In the interests of scalability, we restrict our attention to treestructured distributions over items and develop a principled and efficient algorithm for learning item trees from data. We also identify a problem with a widely used protocol for evaluating implicit feedback models and propose a way of addressing it using a small quantity of explicit feedback data. 1
4 0.19681916 74 nips-2012-Collaborative Gaussian Processes for Preference Learning
Author: Neil Houlsby, Ferenc Huszar, Zoubin Ghahramani, Jose M. Hernández-lobato
Abstract: We present a new model based on Gaussian processes (GPs) for learning pairwise preferences expressed by multiple users. Inference is simplified by using a preference kernel for GPs which allows us to combine supervised GP learning of user preferences with unsupervised dimensionality reduction for multi-user systems. The model not only exploits collaborative information from the shared structure in user behavior, but may also incorporate user features if they are available. Approximate inference is implemented using a combination of expectation propagation and variational Bayes. Finally, we present an efficient active learning strategy for querying preferences. The proposed technique performs favorably on real-world data against state-of-the-art multi-user preference learning algorithms. 1
5 0.18512814 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction
Author: Minjie Xu, Jun Zhu, Bo Zhang
Abstract: We present a probabilistic formulation of max-margin matrix factorization and build accordingly a nonparametric Bayesian model which automatically resolves the unknown number of latent factors. Our work demonstrates a successful example that integrates Bayesian nonparametrics and max-margin learning, which are conventionally two separate paradigms and enjoy complementary advantages. We develop an efficient variational algorithm for posterior inference, and our extensive empirical studies on large-scale MovieLens and EachMovie data sets appear to justify the aforementioned dual advantages. 1
6 0.17315157 323 nips-2012-Statistical Consistency of Ranking Methods in A Rank-Differentiable Probability Space
7 0.15095222 111 nips-2012-Efficient Sampling for Bipartite Matching Problems
8 0.14854427 165 nips-2012-Iterative ranking from pair-wise comparisons
9 0.14752643 169 nips-2012-Label Ranking with Partial Abstention based on Thresholded Probabilistic Models
10 0.1391118 30 nips-2012-Accuracy at the Top
11 0.12296988 278 nips-2012-Probabilistic n-Choose-k Models for Classification and Ranking
12 0.12271948 60 nips-2012-Bayesian nonparametric models for ranked data
13 0.12154428 141 nips-2012-GenDeR: A Generic Diversified Ranking Algorithm
14 0.091410227 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines
15 0.084798545 148 nips-2012-Hamming Distance Metric Learning
16 0.076891527 59 nips-2012-Bayesian nonparametric models for bipartite graphs
17 0.075720489 194 nips-2012-Learning to Discover Social Circles in Ego Networks
18 0.070024922 286 nips-2012-Random Utility Theory for Social Choice
19 0.066732921 199 nips-2012-Link Prediction in Graphs with Autoregressive Features
20 0.066145875 362 nips-2012-Waveform Driven Plasticity in BiFeO3 Memristive Devices: Model and Implementation
topicId topicWeight
[(0, 0.151), (1, 0.041), (2, -0.037), (3, -0.052), (4, -0.047), (5, -0.133), (6, -0.001), (7, 0.236), (8, 0.067), (9, 0.29), (10, -0.252), (11, 0.12), (12, -0.221), (13, 0.084), (14, 0.107), (15, 0.063), (16, 0.04), (17, 0.063), (18, 0.05), (19, 0.072), (20, 0.042), (21, -0.076), (22, 0.046), (23, -0.078), (24, -0.008), (25, 0.018), (26, 0.051), (27, -0.022), (28, 0.043), (29, 0.024), (30, -0.05), (31, -0.035), (32, 0.011), (33, -0.016), (34, 0.016), (35, 0.004), (36, -0.01), (37, 0.037), (38, 0.01), (39, 0.044), (40, 0.031), (41, 0.06), (42, 0.074), (43, 0.017), (44, 0.019), (45, -0.031), (46, -0.007), (47, 0.001), (48, 0.036), (49, -0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.96909398 75 nips-2012-Collaborative Ranking With 17 Parameters
Author: Maksims Volkovs, Richard S. Zemel
Abstract: The primary application of collaborate filtering (CF) is to recommend a small set of items to a user, which entails ranking. Most approaches, however, formulate the CF problem as rating prediction, overlooking the ranking perspective. In this work we present a method for collaborative ranking that leverages the strengths of the two main CF approaches, neighborhood- and model-based. Our novel method is highly efficient, with only seventeen parameters to optimize and a single hyperparameter to tune, and beats the state-of-the-art collaborative ranking methods. We also show that parameters learned on datasets from one item domain yield excellent results on a dataset from very different item domain, without any retraining. 1
2 0.79414022 49 nips-2012-Automatic Feature Induction for Stagewise Collaborative Filtering
Author: Joonseok Lee, Mingxuan Sun, Guy Lebanon, Seung-jean Kim
Abstract: Recent approaches to collaborative filtering have concentrated on estimating an algebraic or statistical model, and using the model for predicting missing ratings. In this paper we observe that different models have relative advantages in different regions of the input space. This motivates our approach of using stagewise linear combinations of collaborative filtering algorithms, with non-constant combination coefficients based on kernel smoothing. The resulting stagewise model is computationally scalable and outperforms a wide selection of state-of-the-art collaborative filtering algorithms. 1
3 0.77422494 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback
Author: Andriy Mnih, Yee W. Teh
Abstract: User preferences for items can be inferred from either explicit feedback, such as item ratings, or implicit feedback, such as rental histories. Research in collaborative filtering has concentrated on explicit feedback, resulting in the development of accurate and scalable models. However, since explicit feedback is often difficult to collect it is important to develop effective models that take advantage of the more widely available implicit feedback. We introduce a probabilistic approach to collaborative filtering with implicit feedback based on modelling the user’s item selection process. In the interests of scalability, we restrict our attention to treestructured distributions over items and develop a principled and efficient algorithm for learning item trees from data. We also identify a problem with a widely used protocol for evaluating implicit feedback models and propose a way of addressing it using a small quantity of explicit feedback data. 1
4 0.6137374 165 nips-2012-Iterative ranking from pair-wise comparisons
Author: Sahand Negahban, Sewoong Oh, Devavrat Shah
Abstract: The question of aggregating pairwise comparisons to obtain a global ranking over a collection of objects has been of interest for a very long time: be it ranking of online gamers (e.g. MSR’s TrueSkill system) and chess players, aggregating social opinions, or deciding which product to sell based on transactions. In most settings, in addition to obtaining ranking, finding ‘scores’ for each object (e.g. player’s rating) is of interest to understanding the intensity of the preferences. In this paper, we propose a novel iterative rank aggregation algorithm for discovering scores for objects from pairwise comparisons. The algorithm has a natural random walk interpretation over the graph of objects with edges present between two objects if they are compared; the scores turn out to be the stationary probability of this random walk. The algorithm is model independent. To establish the efficacy of our method, however, we consider the popular Bradley-Terry-Luce (BTL) model in which each object has an associated score which determines the probabilistic outcomes of pairwise comparisons between objects. We bound the finite sample error rates between the scores assumed by the BTL model and those estimated by our algorithm. This, in essence, leads to order-optimal dependence on the number of samples required to learn the scores well by our algorithm. Indeed, the experimental evaluation shows that our (model independent) algorithm performs as well as the Maximum Likelihood Estimator of the BTL model and outperforms a recently proposed algorithm by Ammar and Shah [1]. 1
5 0.59465927 74 nips-2012-Collaborative Gaussian Processes for Preference Learning
Author: Neil Houlsby, Ferenc Huszar, Zoubin Ghahramani, Jose M. Hernández-lobato
Abstract: We present a new model based on Gaussian processes (GPs) for learning pairwise preferences expressed by multiple users. Inference is simplified by using a preference kernel for GPs which allows us to combine supervised GP learning of user preferences with unsupervised dimensionality reduction for multi-user systems. The model not only exploits collaborative information from the shared structure in user behavior, but may also incorporate user features if they are available. Approximate inference is implemented using a combination of expectation propagation and variational Bayes. Finally, we present an efficient active learning strategy for querying preferences. The proposed technique performs favorably on real-world data against state-of-the-art multi-user preference learning algorithms. 1
6 0.56257015 30 nips-2012-Accuracy at the Top
7 0.54474002 111 nips-2012-Efficient Sampling for Bipartite Matching Problems
8 0.54323632 189 nips-2012-Learning from the Wisdom of Crowds by Minimax Entropy
9 0.5420872 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction
10 0.51661098 141 nips-2012-GenDeR: A Generic Diversified Ranking Algorithm
11 0.51360404 288 nips-2012-Rational inference of relative preferences
12 0.50587863 323 nips-2012-Statistical Consistency of Ranking Methods in A Rank-Differentiable Probability Space
13 0.47579634 60 nips-2012-Bayesian nonparametric models for ranked data
14 0.47163823 169 nips-2012-Label Ranking with Partial Abstention based on Thresholded Probabilistic Models
15 0.44851011 278 nips-2012-Probabilistic n-Choose-k Models for Classification and Ranking
16 0.41340977 194 nips-2012-Learning to Discover Social Circles in Ego Networks
17 0.40144587 155 nips-2012-Human memory search as a random walk in a semantic network
18 0.38033307 22 nips-2012-A latent factor model for highly multi-relational data
19 0.35511938 286 nips-2012-Random Utility Theory for Social Choice
20 0.33816922 59 nips-2012-Bayesian nonparametric models for bipartite graphs
topicId topicWeight
[(0, 0.027), (9, 0.06), (21, 0.022), (37, 0.153), (38, 0.174), (39, 0.029), (42, 0.033), (51, 0.029), (53, 0.01), (54, 0.017), (55, 0.038), (74, 0.042), (76, 0.195), (80, 0.058), (92, 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 0.91056657 75 nips-2012-Collaborative Ranking With 17 Parameters
Author: Maksims Volkovs, Richard S. Zemel
Abstract: The primary application of collaborate filtering (CF) is to recommend a small set of items to a user, which entails ranking. Most approaches, however, formulate the CF problem as rating prediction, overlooking the ranking perspective. In this work we present a method for collaborative ranking that leverages the strengths of the two main CF approaches, neighborhood- and model-based. Our novel method is highly efficient, with only seventeen parameters to optimize and a single hyperparameter to tune, and beats the state-of-the-art collaborative ranking methods. We also show that parameters learned on datasets from one item domain yield excellent results on a dataset from very different item domain, without any retraining. 1
2 0.89014709 136 nips-2012-Forward-Backward Activation Algorithm for Hierarchical Hidden Markov Models
Author: Kei Wakabayashi, Takao Miura
Abstract: Hierarchical Hidden Markov Models (HHMMs) are sophisticated stochastic models that enable us to capture a hierarchical context characterization of sequence data. However, existing HHMM parameter estimation methods require large computations of time complexity O(T N 2D ) at least for model inference, where D is the depth of the hierarchy, N is the number of states in each level, and T is the sequence length. In this paper, we propose a new inference method of HHMMs for which the time complexity is O(T N D+1 ). A key idea of our algorithm is application of the forward-backward algorithm to state activation probabilities. The notion of a state activation, which offers a simple formalization of the hierarchical transition behavior of HHMMs, enables us to conduct model inference efficiently. We present some experiments to demonstrate that our proposed method works more efficiently to estimate HHMM parameters than do some existing methods such as the flattening method and Gibbs sampling method. 1
3 0.86363226 185 nips-2012-Learning about Canonical Views from Internet Image Collections
Author: Elad Mezuman, Yair Weiss
Abstract: Although human object recognition is supposedly robust to viewpoint, much research on human perception indicates that there is a preferred or “canonical” view of objects. This phenomenon was discovered more than 30 years ago but the canonical view of only a small number of categories has been validated experimentally. Moreover, the explanation for why humans prefer the canonical view over other views remains elusive. In this paper we ask: Can we use Internet image collections to learn more about canonical views? We start by manually finding the most common view in the results returned by Internet search engines when queried with the objects used in psychophysical experiments. Our results clearly show that the most likely view in the search engine corresponds to the same view preferred by human subjects in experiments. We also present a simple method to find the most likely view in an image collection and apply it to hundreds of categories. Using the new data we have collected we present strong evidence against the two most prominent formal theories of canonical views and provide novel constraints for new theories. 1
4 0.8579883 49 nips-2012-Automatic Feature Induction for Stagewise Collaborative Filtering
Author: Joonseok Lee, Mingxuan Sun, Guy Lebanon, Seung-jean Kim
Abstract: Recent approaches to collaborative filtering have concentrated on estimating an algebraic or statistical model, and using the model for predicting missing ratings. In this paper we observe that different models have relative advantages in different regions of the input space. This motivates our approach of using stagewise linear combinations of collaborative filtering algorithms, with non-constant combination coefficients based on kernel smoothing. The resulting stagewise model is computationally scalable and outperforms a wide selection of state-of-the-art collaborative filtering algorithms. 1
5 0.84742934 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance
Author: Arnak Dalalyan, Yin Chen
Abstract: In this paper, we develop a novel approach to the problem of learning sparse representations in the context of fused sparsity and unknown noise level. We propose an algorithm, termed Scaled Fused Dantzig Selector (SFDS), that accomplishes the aforementioned learning task by means of a second-order cone program. A special emphasize is put on the particular instance of fused sparsity corresponding to the learning in presence of outliers. We establish finite sample risk bounds and carry out an experimental evaluation on both synthetic and real data. 1
6 0.84505838 99 nips-2012-Dip-means: an incremental clustering method for estimating the number of clusters
7 0.84473491 156 nips-2012-Identifiability and Unmixing of Latent Parse Trees
8 0.84430218 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction
9 0.84368223 304 nips-2012-Selecting Diverse Features via Spectral Regularization
10 0.84339762 264 nips-2012-Optimal kernel choice for large-scale two-sample tests
11 0.84290195 319 nips-2012-Sparse Prediction with the $k$-Support Norm
12 0.8419987 221 nips-2012-Multi-Stage Multi-Task Feature Learning
13 0.84153348 254 nips-2012-On the Sample Complexity of Robust PCA
14 0.84144789 117 nips-2012-Ensemble weighted kernel estimators for multivariate entropy estimation
15 0.84129375 325 nips-2012-Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions
16 0.84120774 36 nips-2012-Adaptive Stratified Sampling for Monte-Carlo integration of Differentiable functions
17 0.84101164 76 nips-2012-Communication-Efficient Algorithms for Statistical Optimization
18 0.84078181 142 nips-2012-Generalization Bounds for Domain Adaptation
19 0.83991325 20 nips-2012-A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets
20 0.83945405 222 nips-2012-Multi-Task Averaging