nips-2012-49 knowledge-graph by maker-knowledge-mining

49 nips-2012-Automatic Feature Induction for Stagewise Collaborative Filtering


Source: pdf

Author: Joonseok Lee, Mingxuan Sun, Guy Lebanon, Seung-jean Kim

Abstract: Recent approaches to collaborative filtering have concentrated on estimating an algebraic or statistical model, and using the model for predicting missing ratings. In this paper we observe that different models have relative advantages in different regions of the input space. This motivates our approach of using stagewise linear combinations of collaborative filtering algorithms, with non-constant combination coefficients based on kernel smoothing. The resulting stagewise model is computationally scalable and outperforms a wide selection of state-of-the-art collaborative filtering algorithms. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Recent approaches to collaborative filtering have concentrated on estimating an algebraic or statistical model, and using the model for predicting missing ratings. [sent-5, score-0.238]

2 This motivates our approach of using stagewise linear combinations of collaborative filtering algorithms, with non-constant combination coefficients based on kernel smoothing. [sent-7, score-0.749]

3 The resulting stagewise model is computationally scalable and outperforms a wide selection of state-of-the-art collaborative filtering algorithms. [sent-8, score-0.68]

4 1 Introduction Recent approaches to collaborative filtering (CF) have concentrated on estimating an algebraic or statistical model, and using the model for predicting the missing rating of user u on item i. [sent-9, score-0.839]

5 Linear combinations of K models, F^(K)(x) = Σ_{k=1}^{K} αk fk(x) (1), where α1, …, αK ∈ R, [sent-12, score-0.184]

6 and f1, …, fK ∈ F, such as boosting or stagewise linear regression and stagewise logistic regression, enjoy a significant performance boost over the single top-performing model. [sent-18, score-0.963]

7 Stagewise models are greedy incremental models of the form (αk, fk) = arg min Risk(F^(k−1) + αk fk), k = 1, …, K. [sent-20, score-0.294]
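
The following is a minimal sketch of such a greedy stagewise fit under squared-error risk. The interface is hypothetical (base models are assumed to expose a `.predict(X)` method); it is not the authors' implementation.

```python
import numpy as np

def stagewise_fit(base_models, X, y, K=20):
    """Greedy stagewise fit of F_K(x) = sum_k alpha_k f_k(x) under squared error.

    At each stage, pick the (alpha, f) pair that most reduces the residual sum
    of squares of the current ensemble, as in Eq. (2)-style greedy selection.
    """
    residual = y.astype(float).copy()   # residual of the current ensemble F^(k-1)
    ensemble = []                       # chosen (alpha_k, f_k) pairs
    for _ in range(K):
        best = None
        for f in base_models:
            p = f.predict(X)
            denom = np.dot(p, p)
            if denom == 0.0:
                continue
            alpha = np.dot(p, residual) / denom          # 1-D least squares for alpha
            rss = np.sum((residual - alpha * p) ** 2)
            if best is None or rss < best[0]:
                best = (rss, alpha, f, p)
        _, alpha, f, p = best
        ensemble.append((alpha, f))
        residual -= alpha * p            # F^(k) = F^(k-1) + alpha * f
    return ensemble

def predict(ensemble, X):
    return sum(alpha * f.predict(X) for alpha, f in ensemble)
```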

8 It is somewhat surprising that ensemble methods have had relatively little success in the collaborative filtering literature. [sent-25, score-0.254]

9 Generally speaking, ensemble or combination methods have shown only a minor improvement over the top-performing CF methods. [sent-26, score-0.161]

10 The cases where ensemble methods did show an improvement (for example, the Netflix prize winner [10] and runner-up) relied heavily on manual feature engineering, manual parameter setting, and other tinkering. [sent-27, score-0.306]

11 This paper follows up on an experimental discovery: different recommendation systems perform better than others for some users and items but not for others. [sent-28, score-0.274]

12 In other words, the relative strengths of two distinct CF models f1(u, i), f2(u, i) ∈ F depend on the user u and the item i whose rating is being predicted. [sent-29, score-0.653]

13 Figure 1: Test set loss (mean absolute error) of two simple algorithms (user average and item average) on items with different numbers of ratings (x-axis: number of available ratings for the target item). [sent-35, score-0.758]

14 One example of two such systems appears in Figure 1, which graphs the test-set loss of two recommendation rules (user average and item average) as a function of the number of available ratings for the recommended item i. [sent-37, score-0.822]

15 The two recommendation rules outperform each other, depending on whether the item in question has few or many ratings in the training data. [sent-38, score-0.507]

16 The inescapable conclusion is that the weights αk in the combination should be functions of u and i rather than constants: F^(K)(u, i) = Σ_{k=1}^{K} αk(u, i) fk(u, i) (3), where αk(u, i) ∈ R and fk ∈ F for k = 1, …, K. [sent-40, score-0.284]

17 In this paper we explore the use of such models for collaborative filtering, where the weight functions αk(u, i) are learned from data. [sent-44, score-0.235]

18 A major part of our contribution is a feature induction strategy to identify feature functions expressing useful locality information. [sent-45, score-0.284]

19 2 Related Work Many memory-based CF methods predict the rating of items based on the similarity of the test user and the training users [21, 3, 6]. [sent-47, score-0.477]

20 Model-based CF includes user and item clustering [3, 29, 32], Bayesian networks [3], dependence network [5] and probabilistic latent variable models [19, 17, 33]. [sent-50, score-0.552]

21 In many cases, there is significant manual intervention such as setting the combination weights manually. [sent-58, score-0.193]

22 The primary difference is the manual selection of features in [26], as opposed to the automatic induction of local features in our paper, which leads to a significant improvement in prediction quality. [sent-60, score-0.4]

23 Model combination based on locality has been proposed in other machine learning topics, such as classification [31, 18] or sensitivity estimation [2]. [sent-61, score-0.133]

24 3 Combination of CF Methods with Non-Constant Weights Recalling the linear combination (3) from Section 1, we define non-constant combination weights αk(u, i) that are functions of the user and item that are being predicted. [sent-62, score-0.744]

25 We propose the following algebraic form αk (u, i) = βk hk (u, i), βk ∈ R, hk ∈ H (4) where βk is a parameter and hk is a function selected from a family H of candidate feature functions. [sent-63, score-0.567]

26 The combination (3) with non-constant weights (4) enables some CF methods fk to be emphasized for some user-item combinations through an appropriate selection of the βk parameters. [sent-64, score-0.324]

27 Substituting (4) into (3) we get F^(K)(u, i) = Σ_{k=1}^{K} βk hk(u, i) fk(u, i), βk ∈ R, hk ∈ H, fk ∈ F. (5) [sent-66, score-0.544]

28 Note that since hk and fk are selected from the sets of feature functions and CF methods, respectively, we may have fj = fl or hj = hl for j ≠ l. [sent-67, score-0.365]
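
A minimal sketch of evaluating the combined model (5), assuming each CF method f and each feature function h is given as a Python callable on a (user, item) pair; all names below are hypothetical.

```python
def combined_predict(terms, u, i):
    """Evaluate F(u, i) = sum_k beta_k * h_k(u, i) * f_k(u, i)  -- Eq. (5).

    `terms` is a list of (beta, h, f) triples; h and f are callables on (u, i).
    The same f or h may appear in several triples, mirroring the stagewise
    setting where a base learner can be chosen more than once.
    """
    return sum(beta * h(u, i) * f(u, i) for beta, h, f in terms)

# Hypothetical usage: a constant feature recovers the ordinary combination (3).
const = lambda u, i: 1.0
# terms = [(0.6, const, user_average), (0.4, const, item_average)]
# rating_hat = combined_predict(terms, u=42, i=7)
```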

29 This is similar to boosting and other stagewise algorithms where one feature or base learner may be chosen multiple times, effectively updating its associate feature functions and parameters. [sent-68, score-0.598]

30 The total weight function associated with a particular f ∈ F is Σ_{k: fk = f} βk hk(u, i). [sent-69, score-0.151]

31 A natural criterion for fitting the parameters β = (β1, …, βK) is least squares: β* = arg min_{β∈C} Σ_{(u,i)} ( F^(K)(u, i) − R_{u,i} )² (6), where R_{u,i} denotes the rating of user u on item i in the training data and the summation ranges over all ratings in the training set. [sent-73, score-0.691]
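
A short sketch of the unconstrained version of (6), assuming the (hk, fk) pairs are already fixed and exposed as callables (a hypothetical interface), solved with ordinary least squares via NumPy.

```python
import numpy as np

def fit_betas(pairs, ratings):
    """Least-squares estimate of beta for fixed (h_k, f_k) pairs, as in Eq. (6).

    pairs   : list of (h, f) callables on (u, i)      -- hypothetical interface
    ratings : list of (u, i, r) training triples
    Builds the design matrix X[n, k] = h_k(u_n, i_n) * f_k(u_n, i_n)
    and solves min_beta ||X beta - r||^2 (no simplex constraint).
    """
    X = np.array([[h(u, i) * f(u, i) for h, f in pairs] for u, i, _ in ratings])
    r = np.array([rating for _, _, rating in ratings])
    beta, *_ = np.linalg.lstsq(X, r, rcond=None)
    return beta
```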

32 A variation of (6), where β is constrained such that αk(u, i) ≥ 0 and Σ_{k=1}^{K} αk(u, i) = 1, endows F with the following probabilistic interpretation: F(u, i) = E_p{f | u, i} (7), where f represents a random draw from F, with probabilities p(f | u, i) proportional to Σ_{k: fk = f} βk hk(u, i). [sent-74, score-0.125]

33 4 Inducing Local Features In contrast to [26], which manually defined 25 features, we induce the features hk from data. [sent-76, score-0.244]

34 The features hk(u, i) should emphasize users u and items i that are likely to lead to variations in the relative strength of f1, …, fK. [sent-77, score-0.372]

35 We consider below two issues: (i) defining the set H of candidate features, and (ii) a strategy for selecting features from H to add to the combination F . [sent-81, score-0.217]

36 4.1 Candidate Feature Families H We denote the sets of users and items by U and I respectively, and the domain of f ∈ F and h ∈ H as Ω = U × I. [sent-83, score-0.148]

37 There are several possible choices for the distance functions in (8) between users and between items. [sent-89, score-0.103]

38 For simplicity, we use in our experiments the angular distance d(x, y) = arccos( ⟨x, y⟩ / (‖x‖ ‖y‖) ) (9), where the inner products above are computed based on the user-item rating matrix expressing the training set (ignoring entries not present in both arguments). [sent-90, score-0.13]
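
A sketch of the angular distance (9) between two rating vectors, restricting the inner products to entries rated in both vectors. Representing sparse rating rows as `{item_id: rating}` dicts and the fallback value for empty overlap are assumptions of this sketch.

```python
import math

def angular_distance(x, y):
    """Angular distance d(x, y) = arccos(<x, y> / (||x|| ||y||)), Eq. (9).

    x, y: {item_id: rating} dicts (one possible sparse representation).
    All inner products use only items rated in both vectors.
    """
    common = set(x) & set(y)
    if not common:
        return math.pi / 2.0            # no overlap: treat as maximally dissimilar (assumption)
    dot = sum(x[j] * y[j] for j in common)
    nx = math.sqrt(sum(x[j] ** 2 for j in common))
    ny = math.sqrt(sum(y[j] ** 2 for j in common))
    if nx == 0.0 or ny == 0.0:
        return math.pi / 2.0
    cos = max(-1.0, min(1.0, dot / (nx * ny)))   # clamp for numerical safety
    return math.acos(cos)
```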

39 Their values decay linearly with the distance from their mode (truncated at zero), and feature a bandwidth parameter h, controlling the rate of decay. [sent-92, score-0.103]

40 The unimodal feature functions (8) capture locality in the Ω space by measuring proximity to a mode, representing a user u∗ , an item i∗ , or a user-item pair. [sent-94, score-0.725]

41 We define the family of candidate features H as all possible additive mixtures or max-mixtures of the functions (8), parameterized by a set of multiple modes ω* = {ω1*, …, ωr*}. [sent-95, score-0.183]

42 Using this definition, feature functions hk(u, i) ∈ H are able to express a wide variety of locality information involving multiple potential modes. [sent-102, score-0.301]
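
A minimal sketch of such a candidate feature: triangular kernels (linear decay from the mode, truncated at zero, bandwidth h) around r sampled modes, combined as an additive mixture or a max-mixture. The sketch is unimodal in the user only; the distance function and interface are assumptions, not the paper's code.

```python
def triangular_kernel(dist, h):
    """Discrete analog of a triangular smoothing kernel: decays linearly with
    distance from the mode and is truncated at zero; h is the bandwidth."""
    return max(0.0, 1.0 - dist / h)

def make_feature(modes, dist, h, combine="add"):
    """Build a candidate feature h(u, i) from a set of sampled modes.

    modes   : list of mode users u*_1, ..., u*_r drawn from the training data
    dist    : callable dist(u, u*), e.g. the angular distance above
    combine : 'add' for an additive mixture, 'max' for a max-mixture
    (names and interface are hypothetical).
    """
    def feature(u, i):
        vals = [triangular_kernel(dist(u, m), h) for m in modes]
        return sum(vals) / len(vals) if combine == "add" else max(vals)
    return feature
```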

43 We discuss next the strategy for identifying useful features from H and adding them to the model F in a stagewise manner. [sent-103, score-0.522]

44 4.2 Feature Induction Strategy Adapting the stagewise learning approach to the model (5), we have F^(K)(u, i) = Σ_{k=1}^{K} βk hk(u, i) fk(u, i) (12), where (βk, hk, fk) = arg min_{βk ∈ R, hk ∈ H, fk ∈ F} Σ_{(u,i)∈R} ( F^(k−1)(u, i) + βk hk(u, i) fk(u, i) − R_{u,i} )². [sent-105, score-1.113]

45 It is a well-known fact that stagewise algorithms sometimes outperform non-greedy algorithms due to resistance to overfitting (see [22], for example). [sent-106, score-0.471]

46 We address this difficulty by restricting H to a finite collection of additive or max-mixture kernels with r modes, randomly sampled from the users or items present in the training data. [sent-113, score-0.169]
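
A sketch of this stagewise induction loop with a fresh random sample of candidate features at each iteration, assuming the CF predictions and candidate feature values are precomputed as arrays over the training triples; all interfaces are hypothetical.

```python
import numpy as np

def stagewise_induction(cf_preds, sample_candidates, ratings, K=100, n_cand=200):
    """Greedy stagewise selection of (beta_k, h_k, f_k) triples, Eq. (12) style.

    cf_preds          : dict name -> np.array of CF predictions on the training triples
    sample_candidates : callable returning `n_cand` arrays of feature values
                        (a fresh random sample of H per iteration)
    ratings           : np.array of training ratings R_{u,i}
    """
    residual = ratings.astype(float).copy()
    model = []                                    # chosen (beta, feature_values, cf_name)
    for _ in range(K):
        best = None
        for h_vals in sample_candidates(n_cand):
            for name, f_vals in cf_preds.items():
                p = h_vals * f_vals               # candidate basis h(u,i) * f(u,i)
                denom = np.dot(p, p)
                if denom == 0.0:
                    continue
                beta = np.dot(p, residual) / denom
                rss = np.sum((residual - beta * p) ** 2)
                if best is None or rss < best[0]:
                    best = (rss, beta, h_vals, name, p)
        if best is None:
            break
        _, beta, h_vals, name, p = best
        model.append((beta, h_vals, name))
        residual -= beta * p
    return model
```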

47 Our experiments conclude that it is possible to find useful features from a surprisingly small number of randomly-chosen samples. [sent-114, score-0.099]

48 CONST combines all candidate algorithms with constant weights as in (1). [sent-122, score-0.127]

49 FWLS combines all candidate algorithms with non-constant weights as in (3) [26]. [sent-123, score-0.127]

50 FEAT applies the feature induction techniques discussed in Section 4. [sent-126, score-0.138]

51 To evaluate whether the automatic feature induction in FEAT works better or worse than manually constructed features, we used, in FWLS and STAGE, manual features similar to the ones in [26] (excluding features requiring temporal data). [sent-127, score-0.426]

52 Examples include number of movies rated per user, number of users rating each movie, standard deviation of the users’ ratings, and standard deviation of the item’s ratings. [sent-128, score-0.183]

53 The feature induction in FEAT used a feature space H with additive multi-mode smoothing kernels as described in Section 4 (for simplicity we avoided kernels unimodal in both u and i). [sent-129, score-0.258]

54 The family H included 200 randomly sampled features (a new sample was taken for each of the iterations in the stagewise algorithms). [sent-130, score-0.547]

55 The r in (11) was set to 5% of user or item count, and bandwidth h values of 0. [sent-131, score-0.56]

56 The stagewise algorithm continues until either five consecutive trials fail to improve the RMSE on the validation set, or the number of iterations reaches 100, which occurs only in a few cases. [sent-134, score-0.444]
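
A sketch of this stopping rule (patience of five non-improving iterations, hard cap of 100); `step` and `eval_rmse` are hypothetical callables standing in for one stagewise addition and a validation-set evaluation.

```python
def run_with_early_stopping(step, eval_rmse, max_iter=100, patience=5):
    """Stop after `patience` consecutive iterations without validation-RMSE
    improvement, or after `max_iter` iterations, whichever comes first."""
    best_rmse, fails = float("inf"), 0
    for _ in range(max_iter):
        step()                       # one stagewise addition
        rmse = eval_rmse()           # RMSE on the held-out validation set
        if rmse < best_rmse:
            best_rmse, fails = rmse, 0
        else:
            fails += 1
            if fails >= patience:
                break
    return best_rmse
```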

57 We used similar L2 regularization for all methods (both stagewise and non-stagewise), where the regularization parameter was selected among 5 different values based on a validation set. [sent-135, score-0.469]

58 More specifically, we sub-sampled from the most active M users and the most often rated N items to obtain pre-specified data density levels |R|/|Ω|. [sent-138, score-0.21]
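
A small sketch of this sub-sampling step, assuming ratings are held as (user, item, rating) triples; the representation and function name are hypothetical.

```python
from collections import Counter

def subsample(ratings, M, N):
    """Keep only ratings by the M most active users on the N most-rated items,
    then report the resulting density |R| / (M * N)."""
    user_counts = Counter(u for u, _, _ in ratings)
    item_counts = Counter(i for _, i, _ in ratings)
    top_users = {u for u, _ in user_counts.most_common(M)}
    top_items = {i for i, _ in item_counts.most_common(N)}
    sub = [(u, i, r) for u, i, r in ratings if u in top_users and i in top_items]
    return sub, len(sub) / float(M * N)
```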

59 As shown in Table 2, we varied either the user or item count in the set {1000, 1500, 2000, 2500, 3000}, holding the other variable fixed at 1000 and the density at 1%, which is comparable to the density of the original Netflix dataset. [sent-139, score-0.732]

60 Figure 2: Performance trend with varied user count (left), item count (middle), and density (right) on the Netflix dataset. [sent-340, score-0.821]

61 For stagewise methods, the 80% train set was divided into a 60% training set and a 20% validation set, which was used to determine when to stop the stagewise addition process. [sent-341, score-0.909]

62 10% of the training set is used to select the regularization parameter for both stagewise and non-stagewise methods. [sent-343, score-0.465]

63 • STAGE vs. FWLS: Figure 2 indicates that stagewise combinations, where features are chosen with replacement, are more accurate. [sent-351, score-0.577]

64 Selection with replacement allows certain features to be selected more than once, correcting a previously inaccurate parameter setting. [sent-352, score-0.141]

65 • FEAT vs. STAGE: Making use of induced features further improves prediction accuracy over stagewise optimization with manually designed features. [sent-353, score-0.573]

66 Overall, our experiments indicate that the combination with non-constant weights and feature induction (FEAT) outperforms three baselines (the best single method, standard combinations with constant weight, and the FWLS method using manually constructed features [26]). [sent-354, score-0.456]

[Figure 3 axis residue: the three panels plot per-item weights, with items sorted by algorithms 1, 2, and 8, respectively.]

70 We conclude that our proposed combination outperforms state-of-the-art methods, and several previously proposed combination methods. [sent-402, score-0.173]

71 To see how feature induction works in detail, we illustrate an example in which both the user count and the item count equal 1000. [sent-403, score-0.881]

72 Figure 3 shows the average weight distribution that each user or item receives under three CF methods: user average, item average, and NLPMF. [sent-404, score-1.112]

73 We focused on these three methods since they are frequently selected by the stagewise algorithm. [sent-405, score-0.469]

74 The x-axis variables in the three panels are sorted in order of decreasing weight for the selected algorithm. [sent-406, score-0.155]

75 2 Trend Analysis Figure 2 graphs the RMSE of the different combination methods as a function of the user count, item count, and density. [sent-412, score-0.607]

76 • As expected, prediction accuracy for all combination methods and for the top single method improves with the user count, item count, and density. [sent-414, score-0.648]

77 • The performance gap between the best single algorithm and the combinations tends to decrease with larger user and item count. [sent-415, score-0.586]

78 This is a manifestation of the law of diminishing returns, and the fact that the size of a suitable family H capturing locality information increases with the user and item count. [sent-416, score-0.613]

79 Thus, the stagewise procedure becomes more computationally challenging, and less accurate, since in our experiments we sampled the same number of compositions from H rather than increasing it for larger data. [sent-417, score-0.444]

80 • We note that all combination methods and the single best CF method improve performance as the density increases. [sent-418, score-0.129]

81 • Comparing the left and middle panels of Figure 2 implies that having more users is more informative than having more items. [sent-420, score-0.124]

82 3 Scalability Our proposed stagewise algorithm is very efficient compared to other feature selection algorithms such as stepwise or subset selection. [sent-423, score-0.515]

83 In our experiments, we sampled a small subset of candidate features to consider for addition (the random subset differs in each iteration of the stagewise algorithm). [sent-425, score-0.663]

84 In the limit K → ∞, such a sampling scheme would recover the optimal ensemble, as each feature would be selected for consideration infinitely often. [sent-426, score-0.138]

85 Our experiments show that this scheme also works well in practice and yields a significant improvement over the state of the art, even for a relatively small sample of feature candidates such as 200. [sent-427, score-0.141]

86 Viewed from another perspective, this implies that randomly selecting such a small subset of features each iteration ensures the selection of useful features. [sent-428, score-0.098]

87 In fact, the features induced in this manner were found to be more useful than the manually crafted features in the FWLS algorithm [26]. [sent-429, score-0.225]

88 7 Summary We started from an observation that the relative performance of different candidate recommendation systems f (u, i) depends on u and i, for example on the activity level of user u and popularity of item i. [sent-430, score-0.741]

89 This motivated the development of combinations of recommendation systems with non-constant weights that emphasize different candidates based on their relative strengths in the feature space. [sent-431, score-0.395]

90 In contrast to the FWLS method, which focused on manual construction of features, we developed a feature induction algorithm that works in conjunction with stagewise least squares. [sent-432, score-0.655]

91 We formulate a family of feature functions based on the discrete analog of triangular kernel smoothing. [sent-433, score-0.103]

92 This family captures a wide variety of local information and is thus able to model the relative strengths of the different CF methods and how they change across Ω. [sent-434, score-0.101]

93 The combination with induced features outperformed all of the base candidates as well as other combination methods in the literature. [sent-435, score-0.304]

94 This includes the recently proposed FWLS method, which uses manually constructed feature functions. [sent-436, score-0.113]

95 As our candidates included many of the recently proposed state-of-the-art recommendation systems, our conclusions are significant for the engineering community as well as for recommendation-system scientists. [sent-437, score-0.298]

96 Dependency networks for inference, collaborative filtering, and data visualization. [sent-467, score-0.192]

97 Factorization meets the neighborhood: a multifaceted collaborative filtering model. [sent-488, score-0.192]

98 Factor in the neighbors: Scalable and accurate collaborative filtering. [sent-493, score-0.192]

99 Grouplens: an open architecture for collaborative filtering of netnews. [sent-570, score-0.192]

100 Hybrid collaborative filtering algorithms using a mixture of experts. [sent-613, score-0.215]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('stagewise', 0.444), ('alg', 0.348), ('cf', 0.332), ('item', 0.312), ('fwls', 0.252), ('feat', 0.232), ('user', 0.219), ('collaborative', 0.192), ('fk', 0.147), ('recommendation', 0.126), ('hk', 0.125), ('count', 0.106), ('ix', 0.087), ('induction', 0.087), ('users', 0.086), ('ltering', 0.085), ('const', 0.085), ('rmse', 0.084), ('features', 0.078), ('combination', 0.076), ('manual', 0.073), ('rating', 0.07), ('unimodal', 0.069), ('kh', 0.068), ('net', 0.066), ('sorted', 0.065), ('candidate', 0.063), ('items', 0.062), ('ensemble', 0.062), ('pmf', 0.061), ('nlpmf', 0.058), ('locality', 0.057), ('feature', 0.051), ('stage', 0.049), ('ratings', 0.048), ('candidates', 0.046), ('weights', 0.044), ('manually', 0.041), ('factorization', 0.039), ('prea', 0.039), ('sef', 0.039), ('combinations', 0.037), ('density', 0.035), ('boosting', 0.035), ('recommender', 0.034), ('konstan', 0.034), ('koren', 0.034), ('sorting', 0.033), ('sigir', 0.032), ('strengths', 0.031), ('heckerman', 0.03), ('lebanon', 0.03), ('bandwidth', 0.029), ('algebraic', 0.028), ('induced', 0.028), ('acm', 0.028), ('resistance', 0.027), ('rated', 0.027), ('triangular', 0.027), ('weight', 0.026), ('nmf', 0.026), ('sun', 0.026), ('family', 0.025), ('varied', 0.025), ('selected', 0.025), ('conference', 0.024), ('sigkdd', 0.024), ('baselines', 0.024), ('average', 0.024), ('wide', 0.024), ('movielens', 0.024), ('winner', 0.024), ('bell', 0.024), ('mode', 0.023), ('mixture', 0.023), ('prediction', 0.023), ('ensembles', 0.023), ('improvement', 0.023), ('boost', 0.022), ('margin', 0.022), ('conclude', 0.021), ('panels', 0.021), ('expressing', 0.021), ('includes', 0.021), ('relative', 0.021), ('training', 0.021), ('centered', 0.02), ('selection', 0.02), ('combines', 0.02), ('cant', 0.019), ('similarity', 0.019), ('replacement', 0.018), ('single', 0.018), ('matrix', 0.018), ('concentrated', 0.018), ('automatic', 0.018), ('intelligence', 0.018), ('trend', 0.018), ('functions', 0.017), ('middle', 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999997 49 nips-2012-Automatic Feature Induction for Stagewise Collaborative Filtering

Author: Joonseok Lee, Mingxuan Sun, Guy Lebanon, Seung-jean Kim

Abstract: Recent approaches to collaborative filtering have concentrated on estimating an algebraic or statistical model, and using the model for predicting missing ratings. In this paper we observe that different models have relative advantages in different regions of the input space. This motivates our approach of using stagewise linear combinations of collaborative filtering algorithms, with non-constant combination coefficients based on kernel smoothing. The resulting stagewise model is computationally scalable and outperforms a wide selection of state-of-the-art collaborative filtering algorithms. 1

2 0.25746033 75 nips-2012-Collaborative Ranking With 17 Parameters

Author: Maksims Volkovs, Richard S. Zemel

Abstract: The primary application of collaborative filtering (CF) is to recommend a small set of items to a user, which entails ranking. Most approaches, however, formulate the CF problem as rating prediction, overlooking the ranking perspective. In this work we present a method for collaborative ranking that leverages the strengths of the two main CF approaches, neighborhood- and model-based. Our novel method is highly efficient, with only seventeen parameters to optimize and a single hyperparameter to tune, and beats the state-of-the-art collaborative ranking methods. We also show that parameters learned on datasets from one item domain yield excellent results on a dataset from a very different item domain, without any retraining. 1

3 0.24619544 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback

Author: Andriy Mnih, Yee W. Teh

Abstract: User preferences for items can be inferred from either explicit feedback, such as item ratings, or implicit feedback, such as rental histories. Research in collaborative filtering has concentrated on explicit feedback, resulting in the development of accurate and scalable models. However, since explicit feedback is often difficult to collect it is important to develop effective models that take advantage of the more widely available implicit feedback. We introduce a probabilistic approach to collaborative filtering with implicit feedback based on modelling the user’s item selection process. In the interests of scalability, we restrict our attention to treestructured distributions over items and develop a principled and efficient algorithm for learning item trees from data. We also identify a problem with a widely used protocol for evaluating implicit feedback models and propose a way of addressing it using a small quantity of explicit feedback data. 1

4 0.17551726 74 nips-2012-Collaborative Gaussian Processes for Preference Learning

Author: Neil Houlsby, Ferenc Huszar, Zoubin Ghahramani, Jose M. Hernández-lobato

Abstract: We present a new model based on Gaussian processes (GPs) for learning pairwise preferences expressed by multiple users. Inference is simplified by using a preference kernel for GPs which allows us to combine supervised GP learning of user preferences with unsupervised dimensionality reduction for multi-user systems. The model not only exploits collaborative information from the shared structure in user behavior, but may also incorporate user features if they are available. Approximate inference is implemented using a combination of expectation propagation and variational Bayes. Finally, we present an efficient active learning strategy for querying preferences. The proposed technique performs favorably on real-world data against state-of-the-art multi-user preference learning algorithms. 1

5 0.12774211 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction

Author: Minjie Xu, Jun Zhu, Bo Zhang

Abstract: We present a probabilistic formulation of max-margin matrix factorization and build accordingly a nonparametric Bayesian model which automatically resolves the unknown number of latent factors. Our work demonstrates a successful example that integrates Bayesian nonparametrics and max-margin learning, which are conventionally two separate paradigms and enjoy complementary advantages. We develop an efficient variational algorithm for posterior inference, and our extensive empirical studies on large-scale MovieLens and EachMovie data sets appear to justify the aforementioned dual advantages. 1

6 0.1239037 111 nips-2012-Efficient Sampling for Bipartite Matching Problems

7 0.099233314 165 nips-2012-Iterative ranking from pair-wise comparisons

8 0.095938839 60 nips-2012-Bayesian nonparametric models for ranked data

9 0.071921282 194 nips-2012-Learning to Discover Social Circles in Ego Networks

10 0.070849486 107 nips-2012-Effective Split-Merge Monte Carlo Methods for Nonparametric Models of Sequential Data

11 0.064691529 189 nips-2012-Learning from the Wisdom of Crowds by Minimax Entropy

12 0.063196152 264 nips-2012-Optimal kernel choice for large-scale two-sample tests

13 0.061549656 278 nips-2012-Probabilistic n-Choose-k Models for Classification and Ranking

14 0.061244309 208 nips-2012-Matrix reconstruction with the local max norm

15 0.057862949 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

16 0.056198306 341 nips-2012-The topographic unsupervised learning of natural sounds in the auditory cortex

17 0.055824 285 nips-2012-Query Complexity of Derivative-Free Optimization

18 0.054020155 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

19 0.052749157 55 nips-2012-Bayesian Warped Gaussian Processes

20 0.05128748 181 nips-2012-Learning Multiple Tasks using Shared Hypotheses


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.147), (1, 0.039), (2, -0.027), (3, -0.044), (4, -0.028), (5, -0.09), (6, 0.002), (7, 0.152), (8, 0.056), (9, 0.169), (10, -0.197), (11, 0.081), (12, -0.157), (13, 0.03), (14, 0.09), (15, 0.006), (16, 0.004), (17, 0.084), (18, -0.005), (19, 0.019), (20, 0.002), (21, -0.086), (22, 0.084), (23, -0.145), (24, -0.05), (25, -0.015), (26, 0.039), (27, -0.076), (28, 0.108), (29, 0.01), (30, -0.082), (31, 0.1), (32, -0.049), (33, -0.037), (34, -0.018), (35, 0.025), (36, -0.009), (37, 0.113), (38, -0.029), (39, 0.024), (40, 0.083), (41, 0.041), (42, 0.028), (43, -0.016), (44, 0.047), (45, -0.003), (46, -0.001), (47, 0.06), (48, 0.076), (49, -0.034)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93743545 49 nips-2012-Automatic Feature Induction for Stagewise Collaborative Filtering

Author: Joonseok Lee, Mingxuan Sun, Guy Lebanon, Seung-jean Kim

Abstract: Recent approaches to collaborative filtering have concentrated on estimating an algebraic or statistical model, and using the model for predicting missing ratings. In this paper we observe that different models have relative advantages in different regions of the input space. This motivates our approach of using stagewise linear combinations of collaborative filtering algorithms, with non-constant combination coefficients based on kernel smoothing. The resulting stagewise model is computationally scalable and outperforms a wide selection of state-of-the-art collaborative filtering algorithms. 1

2 0.84340495 75 nips-2012-Collaborative Ranking With 17 Parameters

Author: Maksims Volkovs, Richard S. Zemel

Abstract: The primary application of collaborative filtering (CF) is to recommend a small set of items to a user, which entails ranking. Most approaches, however, formulate the CF problem as rating prediction, overlooking the ranking perspective. In this work we present a method for collaborative ranking that leverages the strengths of the two main CF approaches, neighborhood- and model-based. Our novel method is highly efficient, with only seventeen parameters to optimize and a single hyperparameter to tune, and beats the state-of-the-art collaborative ranking methods. We also show that parameters learned on datasets from one item domain yield excellent results on a dataset from a very different item domain, without any retraining. 1

3 0.84142959 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback

Author: Andriy Mnih, Yee W. Teh

Abstract: User preferences for items can be inferred from either explicit feedback, such as item ratings, or implicit feedback, such as rental histories. Research in collaborative filtering has concentrated on explicit feedback, resulting in the development of accurate and scalable models. However, since explicit feedback is often difficult to collect it is important to develop effective models that take advantage of the more widely available implicit feedback. We introduce a probabilistic approach to collaborative filtering with implicit feedback based on modelling the user’s item selection process. In the interests of scalability, we restrict our attention to treestructured distributions over items and develop a principled and efficient algorithm for learning item trees from data. We also identify a problem with a widely used protocol for evaluating implicit feedback models and propose a way of addressing it using a small quantity of explicit feedback data. 1

4 0.6411134 74 nips-2012-Collaborative Gaussian Processes for Preference Learning

Author: Neil Houlsby, Ferenc Huszar, Zoubin Ghahramani, Jose M. Hernández-lobato

Abstract: We present a new model based on Gaussian processes (GPs) for learning pairwise preferences expressed by multiple users. Inference is simplified by using a preference kernel for GPs which allows us to combine supervised GP learning of user preferences with unsupervised dimensionality reduction for multi-user systems. The model not only exploits collaborative information from the shared structure in user behavior, but may also incorporate user features if they are available. Approximate inference is implemented using a combination of expectation propagation and variational Bayes. Finally, we present an efficient active learning strategy for querying preferences. The proposed technique performs favorably on real-world data against state-of-the-art multi-user preference learning algorithms. 1

5 0.62628651 189 nips-2012-Learning from the Wisdom of Crowds by Minimax Entropy

Author: Dengyong Zhou, Sumit Basu, Yi Mao, John C. Platt

Abstract: An important way to make large training sets is to gather noisy labels from crowds of nonexperts. We propose a minimax entropy principle to improve the quality of these labels. Our method assumes that labels are generated by a probability distribution over workers, items, and labels. By maximizing the entropy of this distribution, the method naturally infers item confusability and worker expertise. We infer the ground truth by minimizing the entropy of this distribution, which we show minimizes the Kullback-Leibler (KL) divergence between the probability distribution and the unknown truth. We show that a simple coordinate descent scheme can optimize minimax entropy. Empirically, our results are substantially better than previously published methods for the same problem. 1

6 0.60842746 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction

7 0.55377322 111 nips-2012-Efficient Sampling for Bipartite Matching Problems

8 0.54013711 30 nips-2012-Accuracy at the Top

9 0.53569198 165 nips-2012-Iterative ranking from pair-wise comparisons

10 0.46496326 194 nips-2012-Learning to Discover Social Circles in Ego Networks

11 0.44667789 107 nips-2012-Effective Split-Merge Monte Carlo Methods for Nonparametric Models of Sequential Data

12 0.44643933 155 nips-2012-Human memory search as a random walk in a semantic network

13 0.41409108 60 nips-2012-Bayesian nonparametric models for ranked data

14 0.40177834 223 nips-2012-Multi-criteria Anomaly Detection using Pareto Depth Analysis

15 0.38974515 288 nips-2012-Rational inference of relative preferences

16 0.37702385 125 nips-2012-Factoring nonnegative matrices with linear programs

17 0.31692389 52 nips-2012-Bayesian Nonparametric Modeling of Suicide Attempts

18 0.30751294 59 nips-2012-Bayesian nonparametric models for bipartite graphs

19 0.29082349 146 nips-2012-Graphical Gaussian Vector for Image Categorization

20 0.28290471 279 nips-2012-Projection Retrieval for Classification


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.029), (9, 0.029), (21, 0.039), (38, 0.188), (39, 0.013), (42, 0.037), (51, 0.237), (53, 0.011), (54, 0.019), (55, 0.025), (74, 0.054), (76, 0.136), (80, 0.06), (92, 0.036)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.81374335 49 nips-2012-Automatic Feature Induction for Stagewise Collaborative Filtering

Author: Joonseok Lee, Mingxuan Sun, Guy Lebanon, Seung-jean Kim

Abstract: Recent approaches to collaborative filtering have concentrated on estimating an algebraic or statistical model, and using the model for predicting missing ratings. In this paper we observe that different models have relative advantages in different regions of the input space. This motivates our approach of using stagewise linear combinations of collaborative filtering algorithms, with non-constant combination coefficients based on kernel smoothing. The resulting stagewise model is computationally scalable and outperforms a wide selection of state-of-the-art collaborative filtering algorithms. 1

2 0.81266254 36 nips-2012-Adaptive Stratified Sampling for Monte-Carlo integration of Differentiable functions

Author: Alexandra Carpentier, Rémi Munos

Abstract: We consider the problem of adaptive stratified sampling for Monte Carlo integration of a differentiable function given a finite number of evaluations to the function. We construct a sampling scheme that samples more often in regions where the function oscillates more, while allocating the samples such that they are well spread on the domain (this notion shares similitude with low discrepancy). We prove that the estimate returned by the algorithm is almost similarly accurate as the estimate that an optimal oracle strategy (that would know the variations of the function everywhere) would return, and provide a finite-sample analysis. 1

3 0.77208358 353 nips-2012-Transferring Expectations in Model-based Reinforcement Learning

Author: Trung Nguyen, Tomi Silander, Tze Y. Leong

Abstract: We study how to automatically select and adapt multiple abstractions or representations of the world to support model-based reinforcement learning. We address the challenges of transfer learning in heterogeneous environments with varying tasks. We present an efficient, online framework that, through a sequence of tasks, learns a set of relevant representations to be used in future tasks. Without predefined mapping strategies, we introduce a general approach to support transfer learning across different state spaces. We demonstrate the potential impact of our system through improved jumpstart and faster convergence to near optimum policy in two benchmark domains. 1

4 0.72161084 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback

Author: Andriy Mnih, Yee W. Teh

Abstract: User preferences for items can be inferred from either explicit feedback, such as item ratings, or implicit feedback, such as rental histories. Research in collaborative filtering has concentrated on explicit feedback, resulting in the development of accurate and scalable models. However, since explicit feedback is often difficult to collect it is important to develop effective models that take advantage of the more widely available implicit feedback. We introduce a probabilistic approach to collaborative filtering with implicit feedback based on modelling the user’s item selection process. In the interests of scalability, we restrict our attention to treestructured distributions over items and develop a principled and efficient algorithm for learning item trees from data. We also identify a problem with a widely used protocol for evaluating implicit feedback models and propose a way of addressing it using a small quantity of explicit feedback data. 1

5 0.71641934 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction

Author: Minjie Xu, Jun Zhu, Bo Zhang

Abstract: We present a probabilistic formulation of max-margin matrix factorization and build accordingly a nonparametric Bayesian model which automatically resolves the unknown number of latent factors. Our work demonstrates a successful example that integrates Bayesian nonparametrics and max-margin learning, which are conventionally two separate paradigms and enjoy complementary advantages. We develop an efficient variational algorithm for posterior inference, and our extensive empirical studies on large-scale MovieLens and EachMovie data sets appear to justify the aforementioned dual advantages. 1

6 0.7162329 187 nips-2012-Learning curves for multi-task Gaussian process regression

7 0.71593249 114 nips-2012-Efficient coding provides a direct link between prior and likelihood in perceptual Bayesian inference

8 0.71532291 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

9 0.71276295 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

10 0.71241438 230 nips-2012-Multiple Choice Learning: Learning to Produce Multiple Structured Outputs

11 0.71236354 304 nips-2012-Selecting Diverse Features via Spectral Regularization

12 0.71201944 236 nips-2012-Near-Optimal MAP Inference for Determinantal Point Processes

13 0.71178174 149 nips-2012-Hierarchical Optimistic Region Selection driven by Curiosity

14 0.71161652 325 nips-2012-Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions

15 0.71161437 86 nips-2012-Convex Multi-view Subspace Learning

16 0.71129775 120 nips-2012-Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential 1-Minimization

17 0.71090418 125 nips-2012-Factoring nonnegative matrices with linear programs

18 0.70994902 18 nips-2012-A Simple and Practical Algorithm for Differentially Private Data Release

19 0.70960158 117 nips-2012-Ensemble weighted kernel estimators for multivariate entropy estimation

20 0.70958751 15 nips-2012-A Polylog Pivot Steps Simplex Algorithm for Classification