nips nips2012 nips2012-246 knowledge-graph by maker-knowledge-mining

246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction


Source: pdf

Author: Minjie Xu, Jun Zhu, Bo Zhang

Abstract: We present a probabilistic formulation of max-margin matrix factorization and build accordingly a nonparametric Bayesian model which automatically resolves the unknown number of latent factors. Our work demonstrates a successful example that integrates Bayesian nonparametrics and max-margin learning, which are conventionally two separate paradigms and enjoy complementary advantages. We develop an efficient variational algorithm for posterior inference, and our extensive empirical studies on large-scale MovieLens and EachMovie data sets appear to justify the aforementioned dual advantages. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 We present a probabilistic formulation of max-margin matrix factorization and build accordingly a nonparametric Bayesian model which automatically resolves the unknown number of latent factors. [sent-5, score-0.72]

2 We develop an efficient variational algorithm for posterior inference, and our extensive empirical studies on large-scale MovieLens and EachMovie data sets appear to justify the aforementioned dual advantages. [sent-7, score-0.118]

3 1 Introduction Collaborative prediction is a task of predicting users’ potential preferences on currently unrated items (e. [sent-8, score-0.08]

4 One typical setting formalizes it as a matrix completion problem, i. [sent-11, score-0.078]

5 Among other popular approaches, factor-based models have been used extensively in collaborative prediction. [sent-18, score-0.075]

6 The underlying idea behind such models is that there is only a small number of latent factors influencing the preferences. [sent-19, score-0.147]

7 In a linear factor model, a user’s rating of an item is modeled as a linear combination of these factors, with user-specific coefficients and item-specific factor values. [sent-20, score-0.204]

8 Thus, given an N × M preference matrix for N users and M items, a K-factor model fits it with an N × K coefficient matrix U and an M × K factor matrix V as UV^T. [sent-21, score-0.467]
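
To make the factor-model idea concrete, here is a minimal NumPy sketch (our own toy illustration; the sizes and variable names are not from the paper):

```python
import numpy as np

N, M, K = 6, 4, 2              # users, items, latent factors (toy sizes)
rng = np.random.default_rng(0)

U = rng.normal(size=(N, K))    # user-specific coefficients
V = rng.normal(size=(M, K))    # item-specific factor values

X = U @ V.T                    # N x M reconstruction of the preference matrix
print(X.shape)                 # (6, 4)
```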

9 Various computational methods have been successfully developed to implement such an idea, including probabilistic matrix factorization (PMF) [13, 12] and deterministic reconstruction/approximation error minimization, e. [sent-22, score-0.386]

10 , max-margin matrix factorization (M3 F) with hinge loss [14, 11, 16]. [sent-24, score-0.397]

11 One common problem in latent factor models is how to determine the number of factors, which is unknown a priori. [sent-25, score-0.137]

12 On the other hand, probabilistic matrix factorization models have lend themselves naturally to leverage recent advances in Bayesian nonparametrics to bypass explicit model selection [17, 1]. [sent-29, score-0.473]

13 However, it remains largely unexplored how to borrow such advantages into deterministic max-margin matrix factorization models, particularly the very successful M3 F. [sent-30, score-0.354]

14 To address the above problem, this paper presents infinite probabilistic max-margin matrix factorization (iPM3 F), a nonparametric Bayesian-style M3 F model that utilizes nonparametric Bayesian techniques to automatically resolve the unknown number of latent factors in M3 F models. [sent-31, score-0.834]

15 The first key step towards iPM3 F is a general probabilistic formulation of the standard M3 F, which is based on the maximum entropy discrimination principle [4]. [sent-32, score-0.216]

16 We can then principally extend it to a nonparametric model, which in theory has an unbounded number of latent factors. [sent-33, score-0.093]

17 To avoid overfitting we impose a sparsity-inducing Indian buffet process prior on the latent coefficient matrix, selecting only an appropriate number of active factors. [sent-34, score-0.243]

18 We develop an efficient variational method to infer posterior distributions and learn parameters (if ever exist) and our extensive empirical results on MovieLens and EachMovie demonstrate appealing performances. [sent-35, score-0.165]

19 2 Max-margin matrix factorization Given a preference matrix Y ∈ RN ×M , which is partially observed and usually sparse, we denote the observed entry indices by I. [sent-38, score-0.391]

20 The task of traditional matrix factorization is to find a low-rank matrix X ∈ RN ×M to approximate Y under some loss measure, e. [sent-39, score-0.43]

21 , the commonly used squared error, and use X_ij as the reconstruction of the missing entries Y_ij wherever ij ∉ I. [sent-41, score-0.09]

22 Max-margin matrix factorization (M³F) [14] extends the model by using a sparsity-inducing norm regularizer for a low-norm factorization and adopting hinge loss for the error measure, which is applicable to binary, discrete ordinal, or categorical data. [sent-42, score-0.664]

23 For the binary case where Y_ij ∈ {±1} and one predicts by Y_ij = sign(X_ij), the optimization problem of M³F is defined as min_X ‖X‖_* + C Σ_{ij∈I} h(Y_ij X_ij) (1), where h(x) = max(0, 1 − x) is the hinge loss and ‖X‖_* is the nuclear norm of X. [sent-43, score-0.16]
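
As a rough sketch of how objective (1) would be evaluated (our own illustration, not the authors' code; the helper names are ours):

```python
import numpy as np

def hinge(x):
    """Hinge loss h(x) = max(0, 1 - x), applied element-wise."""
    return np.maximum(0.0, 1.0 - x)

def m3f_binary_objective(X, Y, observed, C):
    """Problem (1): nuclear norm of X plus C times the hinge loss
    over the observed entries of the +/-1 matrix Y."""
    nuclear_norm = np.linalg.svd(X, compute_uv=False).sum()
    rows, cols = observed
    return nuclear_norm + C * hinge(Y[rows, cols] * X[rows, cols]).sum()

# toy data, purely for illustration
rng = np.random.default_rng(1)
Y = rng.choice([-1, 1], size=(5, 4))
observed = np.where(rng.random((5, 4)) < 0.6)   # indices of observed entries
X = rng.normal(size=(5, 4))
print(m3f_binary_objective(X, Y, observed, C=1.0))
```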

24 M³F can be equivalently reformulated as a semi-definite program (SDP) and thus learned using standard SDP solvers, but it is unfortunately very slow and can only scale up to thousands of users and items. [sent-44, score-0.189]

25 The fast M3 F model can scale up to millions of users and items. [sent-48, score-0.216]

26 But one unaddressed resulting problem is that it needs to specify the unknown number of latent factors, K, a priori. [sent-49, score-0.093]

27 Below we present a nonparametric Bayesian approach, which effectively bypasses the model selection problem and produces very robust prediction. [sent-50, score-0.105]

28 We also design a blockwise coordinate descent algorithm that directly solves problem (3) rather than working on a smoothing relaxation [11], and it turns out to be as efficient and accurate. [sent-51, score-0.151]

29 3 Nonparametric Bayesian max-margin matrix factorization Now we present the nonparametric Bayesian max-margin matrix factorization models. [sent-53, score-0.731]

30 1 Maximum entropy discrimination We consider the binary classification setting since it suffices for our model. [sent-56, score-0.158]

31 Accordingly, MED takes expectation over the original discriminant function with respect to p(η) and has the new prediction rule y = sign (Ep [F (x; η)]) . [sent-58, score-0.133]

32 In fact, MED subsumes SVM as a special case and has been extended to incorporate latent variables [5, 18] and perform structured output prediction [21]. [sent-61, score-0.162]

33 Recent work has further extended MED to unite Bayesian nonparametrics and max-margin learning [20, 19], which have been largely treated as isolated topics, for learning better classification models. [sent-62, score-0.093]

34 The present work contributes by introducing a novel generalization of MED to handle the challenging matrix factorization problems. [sent-63, score-0.313]

35 2 Probabilistic max-margin matrix factorization Like PMF [12], we treat U and V as random variables, whose joint prior distribution is denoted by p0 (U, V ). [sent-65, score-0.372]

36 Then, our goal is to infer their posterior distribution p(U, V) after a set of observations have been provided. [sent-66, score-0.099]

37 (6) Furthermore, since both U and V are random variables, we need to resolve the uncertainty in order to derive a prediction rule. [sent-69, score-0.074]

38 Hence, substituting the discriminant function (6) into (4), we have the prediction rule Y_ij = sign(E_p[U_i V_j^T]) (7). [sent-71, score-0.133]

39 Then, following the principle of MED learning, we define probabilistic max-margin matrix factorization (PM³F) as solving the optimization problem min_{p(U,V)} KL(p(U,V) ‖ p0(U,V)) + C Σ_{ij∈I} h(Y_ij E_p[U_i V_j^T]) (8). [sent-72, score-0.419]

40 (8) ij∈I Note that our probabilistic formulation is strictly more general than the original M3 F model, which is in fact a special case of PM3 F under a standard Gaussian prior and a mean-field assumption on p(U, V ). [sent-73, score-0.16]

41 Specifically, if we assume p0(U,V) = ∏_i N(U_i | 0, I) ∏_j N(V_j | 0, I) and p(U,V) = p(U)p(V), then one can prove p(U) = ∏_i N(U_i | Φ_i, I), p(V) = ∏_j N(V_j | Ψ_j, I), and PM³F reduces accordingly to an M³F problem (3), namely min_{Φ,Ψ} (1/2)(‖Φ‖²_F + ‖Ψ‖²_F) + C Σ_{ij∈I} h(Y_ij Φ_i Ψ_j^T) (9). [sent-74, score-0.11]

42 Ratings: For ordinal ratings Y_ij ∈ {1, 2, …, L}. [sent-75, score-0.278]

43 Specifically, we introduce thresholds θ0 ≤ θ1 ≤ · · · ≤ θL , where θ0 = −∞ and θL = +∞, to discretize R into L intervals. [sent-79, score-0.099]

44 The prediction rule is changed accordingly to Y_ij = max{r | E_p[U_i V_j^T] ≥ θ_r} + 1 (10). [sent-80, score-0.151]
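
A small sketch of this threshold-based prediction rule (our own illustration, assuming user-common thresholds and L = 5 rating levels):

```python
import numpy as np

def predict_ordinal(x, thetas):
    """Prediction rule (10): x plays the role of E_p[U_i V_j^T]; thetas are
    the finite thresholds theta_1 <= ... <= theta_{L-1} (theta_0 = -inf and
    theta_L = +inf are implicit).  Returns a rating in {1, ..., L}."""
    # number of finite thresholds that x reaches or exceeds, plus 1
    return int(np.sum(x >= np.asarray(thetas))) + 1

thetas = [-1.0, 0.0, 1.0, 2.0]             # toy thresholds for L = 5
for x in (-2.0, 0.3, 5.0):
    print(x, predict_ordinal(x, thetas))    # -> 1, 3, 5
```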

45 The loss thus defined, with T^r_ij = −1 for r < Y_ij and +1 otherwise, is an upper bound on the sum of absolute differences between the predicted ratings and the true ratings, a loss measure closely related to Normalized Mean Absolute Error (NMAE) [7, 14]. [sent-85, score-0.261]
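
For reference, NMAE is commonly computed as the mean absolute error divided by the expected error of uniformly random predictions over {1, …, L}, which equals (L² − 1)/(3L); the sketch below is ours, and the normalization constant is our assumption of the convention used in [7, 14]:

```python
import numpy as np

def nmae(y_true, y_pred, L):
    """Normalized MAE: MAE divided by the expected MAE of uniformly random
    predictions over the L rating levels, (L^2 - 1) / (3L).
    For L = 5 (as in MovieLens) the normalizer is 1.6."""
    mae = np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))
    return mae / ((L**2 - 1) / (3 * L))

print(nmae([5, 3, 1, 4], [4, 3, 2, 4], L=5))   # 0.5 / 1.6 = 0.3125
```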

46 Furthermore, we can learn a more flexible model to capture users’ diverse rating criteria by replacing user-common thresholds θr in the prediction rule (10) and the loss (12) with user-specific ones θir . [sent-86, score-0.28]

47 Finally, we may as well treat the additionally introduced thresholds θ_ir as random variables and infer their posterior distribution, hereby giving the full PM³F model as an optimization problem (13) over p(U, V, θ). [sent-87, score-0.231]

48 Infinite PM³F (iPM³F): As we have stated, one common problem with finite factor-based models, including PM³F, is that we need to explicitly select the number of latent factors. [sent-89, score-0.093]

49 In this section, we present an infinite PM3 F model which, through Bayesian nonparametric techniques, automatically adapts and selects the number of latent factors during learning. [sent-92, score-0.283]

50 Without loss of generality, we consider learning a binary coefficient matrix Z ∈ {0, 1}^{N×∞}. [sent-93, score-0.117]

51 For finite-sized binary matrices, we may define their prior as given by a Beta-Bernoulli process [8]. [sent-94, score-0.129]

52 Similar to the nonparametric matrix factorization model [17], we adopt IBP prior over unbounded binary matrices as previously established in [3] and furthermore, we focus on its stick-breaking construction [15], which facilitates the development of efficient inference algorithms. [sent-96, score-0.552]

53 Then the IBP prior can be described as given by the following generative process i. [sent-98, score-0.086]

54 Specifically, given a finite data set (N < +∞), the probability of seeing the kth factor decreases exponentially with k and the number of active factors K+ follows a Poisson(αH_N) distribution, where H_N is the Nth harmonic number. [sent-112, score-0.098]
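
The stick-breaking construction referred to above can be sampled directly; this is a rough sketch of ours, truncated at a fixed number of columns purely for illustration:

```python
import numpy as np

def sample_ibp_stick_breaking(N, alpha, K_trunc, rng):
    """Draw Z ~ IBP(alpha) via its stick-breaking construction, truncated at
    K_trunc columns: nu_k ~ Beta(alpha, 1), pi_k = prod_{l<=k} nu_l,
    Z_ik ~ Bernoulli(pi_k)."""
    nu = rng.beta(alpha, 1.0, size=K_trunc)
    pi = np.cumprod(nu)                        # decreasing factor probabilities
    Z = (rng.random((N, K_trunc)) < pi).astype(int)
    return Z, pi

rng = np.random.default_rng(0)
Z, pi = sample_ibp_stick_breaking(N=100, alpha=3.0, K_trunc=50, rng=rng)
print("active factors:", (Z.sum(axis=0) > 0).sum())   # around alpha * H_N (about 15 here)
```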

55 Alternatively, we can use a Beta process prior over Z as in [9]. [sent-113, score-0.086]

56 As for the counterpart, we place an isotropic Gaussian prior over the item factor matrix V . [sent-114, score-0.286]

57 Prior specified, we may follow the above probabilistic framework to perform max-margin training, with U replaced by Z. [sent-115, score-0.099]

58 For ordinal ratings, we augment the iPM3 F problem from (13) likewise and, apart from adopting the same prior assumptions for ν, Z and V , assume p0 (θ) = p0 (θ|ν, Z, V ) with θir ∼ N (ρr , ς 2 ) i. [sent-138, score-0.186]

59 Learning real-valued coefficients can be easily done as in [3] by defining U = Z ◦ W, where W is a real-valued matrix and ◦ denotes the Hadamard (element-wise) product. [sent-148, score-0.078]
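
A two-line illustration of that construction (toy sizes, our own example):

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.integers(0, 2, size=(4, 3))   # binary factor selections (which factors a user uses)
W = rng.normal(size=(4, 3))           # real-valued weights for the selected factors
U = Z * W                             # U = Z o W, the element-wise (Hadamard) product
print(U)
```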

60 This approach naturally fits in our MEDbased model thanks to the adoption of the expectation operator when defining prediction rule (7) and (10). [sent-151, score-0.074]

61 And note that exactly the same process applies as well to the full model for ordinal ratings. [sent-162, score-0.122]

62 4 Learning and inference under truncated mean-field assumptions Now, we briefly discuss how to perform learning and inference in iPM3 F. [sent-163, score-0.122]

63 Specifically, we introduce a simple variational inference method to approximate the optimal posterior, which turns out to perform well in practice. [sent-166, score-0.097]

64 We make the truncated mean-field assumption p(ν, Z, V) = p(ν)p(Z)p(V) = ∏_{k=1}^{K} p(ν_k) · ∏_{i=1}^{N} ∏_{k=1}^{K} p(Z_ik) · p(V), where K is the truncation level. [sent-167, score-0.087]

65 For ordinal ratings, we make an additional mean-field assumption (21) p(ν, Z, V, θ) = p(ν, Z, V )p(θ), where p(ν, Z, V ) is treated exactly the same as for binary data and p(θ) is left in free forms. [sent-188, score-0.138]

66 One noteworthy point is that given p(Z), we may calculate the expectation of the posterior effective dimensionality of the latent factor space as E_p[K+] = Σ_{k=1}^{K} (1 − ∏_{i=1}^{N} (1 − ψ_ik)). [sent-189, score-0.224]
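
In code this reduces to a one-liner over the variational Bernoulli parameters (a sketch of ours; psi is assumed to be the N × K matrix with ψ_ik = p(Z_ik = 1)):

```python
import numpy as np

def expected_active_factors(psi):
    """E_p[K+] = sum_k (1 - prod_i (1 - psi_ik)): a column contributes to the
    effective dimensionality unless every user switches it off."""
    return np.sum(1.0 - np.prod(1.0 - psi, axis=0))

psi = np.full((1000, 50), 0.02)          # toy variational parameters
print(expected_active_factors(psi))      # close to 50, since 1 - 0.98**1000 is about 1
```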

67 Infer p(V): the linear discriminant function and the isotropic Gaussian prior on V lead to an isotropic Gaussian posterior p(V) = ∏_{j=1}^{M} N(V_j | Λ_j, σ²I), while the M mean vectors Λ_j can be obtained by solving M independent binary SVMs, min_{Λ_j} (1/(2σ²))‖Λ_j‖² + C Σ_{i: ij∈I} h(Y_ij Λ_j ψ_i^T). [sent-191, score-0.36]
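
Each of these M subproblems is an ordinary hinge-loss SVM over the users who rated item j, with the expected latent features ψ_i as inputs. The sketch below is ours, not the authors' solver; it maps each subproblem onto scikit-learn's LinearSVC by rescaling the regularization constant to C·σ², which is our own bookkeeping:

```python
import numpy as np
from sklearn.svm import LinearSVC

def update_item_means(Y, observed, psi, C, sigma2):
    """Solve the M decoupled subproblems for the item mean vectors Lambda_j.
    Multiplying the objective by sigma^2 turns each one into a standard
    hinge-loss SVM with regularization constant C * sigma^2, fitted on the
    users i who rated item j (features psi_i, labels Y_ij in {-1, +1})."""
    N, M = Y.shape
    K = psi.shape[1]
    Lam = np.zeros((M, K))
    for j in range(M):
        idx = np.where(observed[:, j])[0]
        if idx.size == 0 or np.unique(Y[idx, j]).size < 2:
            continue  # toy guard: LinearSVC needs both classes present
        svm = LinearSVC(C=C * sigma2, loss="hinge", fit_intercept=False, dual=True)
        svm.fit(psi[idx], Y[idx, j])
        Lam[j] = svm.coef_.ravel()
    return Lam
```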

68 We update ψi via coordinate descent, with each conditional optimal ψik sought by binary search. [sent-195, score-0.087]

69 Note that as ς → +∞, the Gaussian distribution regresses to a uniform distribution and problem (25) reduces accordingly to the corresponding conditional subproblem for θ in the original M3 F (Appendix B. [sent-200, score-0.117]

70 5 Experiments and discussions We conduct experiments on the MovieLens 1M and EachMovie data sets, and compare our results with fast M3 F [11] and two probabilistic matrix factorization methods, PMF [13] and BPMF [12]. [sent-202, score-0.386]

71 Data sets: The MovieLens data set contains 1,000,209 anonymous ratings (ranging from 1 to 5) of 3,952 movies made by 6,040 users, among which 3,706 movies are actually rated and every user has at least 20 ratings. [sent-203, score-0.472]

72 The EachMovie data set contains 2,811,983 ratings of 1,628 movies made by 72,916 users, among which 1,623 movies are actually rated and 36,656 users have at least 20 ratings. [sent-204, score-0.609]

73 As in [7, 11], we discarded users with fewer than 20 ratings, leaving us with 2,579,985 ratings. [sent-205, score-0.189]

74 Protocol: As in [7, 11], we test our method in a pure collaborative prediction setting, neglecting any external information other than the user-item-rating triplets in the data sets. [sent-214, score-0.118]

75 We adopt as well the all-but-one protocol to partition the data set into a training set and a test set, that is, we randomly withhold one of the observed ratings from each user for the test set and use the rest for training. [sent-215, score-0.345]
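
A minimal sketch of that protocol (our own illustration over (user, item, rating) triplets; details such as which rating gets held out are ours):

```python
import numpy as np
from collections import defaultdict

def all_but_one_split(triplets, rng):
    """Withhold one observed rating per user for testing; keep the rest
    for training.  `triplets` is a list of (user, item, rating) tuples."""
    by_user = defaultdict(list)
    for t in triplets:
        by_user[t[0]].append(t)
    train, test = [], []
    for user, items in by_user.items():
        held_out = rng.integers(len(items))    # index of the withheld rating
        for i, t in enumerate(items):
            (test if i == held_out else train).append(t)
    return train, test

rng = np.random.default_rng(0)
triplets = [(0, 1, 5), (0, 2, 3), (1, 0, 4), (1, 3, 2), (1, 2, 1)]
train, test = all_but_one_split(triplets, rng)
print(len(train), len(test))   # 3 2
```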

76 We partition the users accordingly as in [7, 11], namely 5,000 and 1,040 users for weak and strong respectively in MovieLens, and 30,000 and 6,565 in EachMovie. [sent-219, score-0.558]

77 Moreover, we find that the fully Bayesian formulation of iPM³F achieves performance comparable to iPM³F in most cases, and that our coordinate descent algorithm for M³F (M³F∗) performs quite similarly to the original gradient descent algorithm for M³F. [sent-288, score-0.172]

78 In summary, the effect of endowing M³F models with a probabilistic formulation is intriguing: not only is the performance of the model largely improved, but with the help of Bayesian nonparametric techniques the effort of selecting the number of latent factors is saved as well. [sent-289, score-0.394]

79 Almost all models perform worse on EachMovie than on MovieLens. [sent-291, score-0.075]

80 … a user has rated an item as zero star, he might either … [sent-301, score-0.155]

81 We should treat such a declaration as less authoritative than a regular rating of zero star and hence omit it from the data set. [sent-318, score-0.098]

82 Again, the coordinate descent M³F performs comparably with fast M³F; iPM³F performs better than all the other methods; and iBPM³F performs comparably with iPM³F. [sent-321, score-0.16]

83 (Figure: NMAE and training time across data partitions #1–#3.) [sent-325, score-0.162]

84 Using (22), we may calculate the expectation of the effective dimensionality K+ of the latent factor space to roughly have a sense of how the iPM³F model automatically chooses the latent dimensionality. [sent-344, score-0.296]

85 Since we take α = 3 in the IBP prior (15) and N ∼ 10^4, the expected prior dimensionality αH_N is about 30. [sent-345, score-0.153]

86 …60 or 80, the expected posterior dimensionality very quickly saturates. Note that M³F models output discretized ordinal ratings while PMF models output real-valued ratings. [sent-348, score-0.365]

87 After discarding users with less than 20 normal ratings, we are left with 35,281 users and 2,315,060 ratings. [sent-349, score-0.378]

88 Table 3: Performance of iPM³F with and without probabilistic treatment of θ. [sent-350, score-0.107]

89 (For each truncation level, we rerun our model and perform cross-validation to select the best regularization constant C.) [sent-373, score-0.081]

90 Treating thresholds θ: When predicting ordinal ratings, the introduced thresholds θ are very important since they underpin the large-margin principle of max-margin matrix factorization models. [sent-378, score-0.606]

91 Nevertheless without a proper probabilistic treatment, the subproblems on thresholds (25) are not strictly convex, very often giving rise to a section of candidate thresholds that are “equally optimal” for the solution. [sent-379, score-0.271]

92 Under our probabilistic model however, we can easily get rid of this non-strict convexity by introducing for them a Gaussian prior as stated above in section 3. [sent-380, score-0.132]

93 We compare performances of iPM3 F both with and without the probabilistic treatment of θ and as shown in Table 3, the improvement is outstanding. [sent-382, score-0.107]

94 …CPU and about 15h on EachMovie, which are fairly acceptable for factorizing a matrix with millions of entries. [sent-387, score-0.105]

95 Furthermore, the blockwise coordinate descent algorithm can naturally be parallelized, since the sub-problems of learning different Ui (or Vj ) are not coupled. [sent-399, score-0.151]

96 6 Conclusions We've presented an infinite probabilistic max-margin matrix factorization method, which utilizes the advantages of nonparametric Bayesian techniques to bypass the model selection problem of max-margin matrix factorization methods. [sent-401, score-0.839]

97 We’ve also developed efficient blockwise coordinate descent algorithms for variational inference and performed extensive evaluation on two large benchmark data sets. [sent-402, score-0.249]

98 Infinite latent feature models and the Indian buffet process. [sent-427, score-0.157]

99 Bayesian matrix factorization with side information and Dirichlet process mixtures. [sent-468, score-0.34]

100 Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. [sent-480, score-0.386]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('yij', 0.344), ('eachmovie', 0.313), ('factorization', 0.235), ('nmae', 0.234), ('vj', 0.222), ('movielens', 0.207), ('zik', 0.191), ('users', 0.189), ('ratings', 0.183), ('ir', 0.155), ('med', 0.147), ('pmf', 0.147), ('ep', 0.142), ('ui', 0.135), ('ik', 0.133), ('nonparametric', 0.105), ('thresholds', 0.099), ('ordinal', 0.095), ('latent', 0.093), ('bpmf', 0.092), ('tij', 0.092), ('movies', 0.091), ('ij', 0.09), ('bayesian', 0.085), ('matrix', 0.078), ('accordingly', 0.077), ('collaborative', 0.075), ('probabilistic', 0.073), ('ibp', 0.07), ('rating', 0.068), ('buffet', 0.064), ('discrimination', 0.063), ('tsinghua', 0.06), ('prior', 0.059), ('discriminant', 0.059), ('indian', 0.058), ('beta', 0.058), ('blockwise', 0.057), ('hn', 0.057), ('isotropic', 0.057), ('truncation', 0.055), ('rated', 0.055), ('defer', 0.055), ('partition', 0.054), ('factors', 0.054), ('nonparametrics', 0.052), ('appendix', 0.052), ('user', 0.052), ('svm', 0.052), ('entropy', 0.052), ('posterior', 0.052), ('kl', 0.051), ('zhu', 0.051), ('descent', 0.05), ('yd', 0.049), ('weak', 0.049), ('item', 0.048), ('infer', 0.047), ('hinge', 0.045), ('coordinate', 0.044), ('margin', 0.044), ('factor', 0.044), ('prediction', 0.043), ('binary', 0.043), ('largely', 0.041), ('subproblem', 0.04), ('loss', 0.039), ('variational', 0.039), ('china', 0.039), ('xij', 0.038), ('preferences', 0.037), ('bernoulli', 0.037), ('bypass', 0.035), ('dimensionality', 0.035), ('coef', 0.035), ('treatment', 0.034), ('comparably', 0.033), ('min', 0.033), ('inference', 0.032), ('truncated', 0.032), ('adopting', 0.032), ('rule', 0.031), ('automatically', 0.031), ('ez', 0.031), ('resolve', 0.031), ('sdp', 0.031), ('rennie', 0.031), ('eld', 0.03), ('aaai', 0.03), ('star', 0.03), ('presents', 0.029), ('formulation', 0.028), ('training', 0.028), ('investigation', 0.027), ('millions', 0.027), ('process', 0.027), ('extensive', 0.027), ('nite', 0.027), ('perform', 0.026), ('closer', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999997 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction

Author: Minjie Xu, Jun Zhu, Bo Zhang

Abstract: We present a probabilistic formulation of max-margin matrix factorization and build accordingly a nonparametric Bayesian model which automatically resolves the unknown number of latent factors. Our work demonstrates a successful example that integrates Bayesian nonparametrics and max-margin learning, which are conventionally two separate paradigms and enjoy complementary advantages. We develop an efficient variational algorithm for posterior inference, and our extensive empirical studies on large-scale MovieLens and EachMovie data sets appear to justify the aforementioned dual advantages. 1

2 0.18512814 75 nips-2012-Collaborative Ranking With 17 Parameters

Author: Maksims Volkovs, Richard S. Zemel

Abstract: The primary application of collaborate filtering (CF) is to recommend a small set of items to a user, which entails ranking. Most approaches, however, formulate the CF problem as rating prediction, overlooking the ranking perspective. In this work we present a method for collaborative ranking that leverages the strengths of the two main CF approaches, neighborhood- and model-based. Our novel method is highly efficient, with only seventeen parameters to optimize and a single hyperparameter to tune, and beats the state-of-the-art collaborative ranking methods. We also show that parameters learned on datasets from one item domain yield excellent results on a dataset from very different item domain, without any retraining. 1

3 0.15784934 74 nips-2012-Collaborative Gaussian Processes for Preference Learning

Author: Neil Houlsby, Ferenc Huszar, Zoubin Ghahramani, Jose M. Hernández-lobato

Abstract: We present a new model based on Gaussian processes (GPs) for learning pairwise preferences expressed by multiple users. Inference is simplified by using a preference kernel for GPs which allows us to combine supervised GP learning of user preferences with unsupervised dimensionality reduction for multi-user systems. The model not only exploits collaborative information from the shared structure in user behavior, but may also incorporate user features if they are available. Approximate inference is implemented using a combination of expectation propagation and variational Bayes. Finally, we present an efficient active learning strategy for querying preferences. The proposed technique performs favorably on real-world data against state-of-the-art multi-user preference learning algorithms. 1

4 0.13335387 287 nips-2012-Random function priors for exchangeable arrays with applications to graphs and relational data

Author: James Lloyd, Peter Orbanz, Zoubin Ghahramani, Daniel M. Roy

Abstract: A fundamental problem in the analysis of structured relational data like graphs, networks, databases, and matrices is to extract a summary of the common structure underlying relations between individual entities. Relational data are typically encoded in the form of arrays; invariance to the ordering of rows and columns corresponds to exchangeable arrays. Results in probability theory due to Aldous, Hoover and Kallenberg show that exchangeable arrays can be represented in terms of a random measurable function which constitutes the natural model parameter in a Bayesian model. We obtain a flexible yet simple Bayesian nonparametric model by placing a Gaussian process prior on the parameter function. Efficient inference utilises elliptical slice sampling combined with a random sparse approximation to the Gaussian process. We demonstrate applications of the model to network data and clarify its relation to models in the literature, several of which emerge as special cases. 1

5 0.12774211 49 nips-2012-Automatic Feature Induction for Stagewise Collaborative Filtering

Author: Joonseok Lee, Mingxuan Sun, Guy Lebanon, Seung-jean Kim

Abstract: Recent approaches to collaborative filtering have concentrated on estimating an algebraic or statistical model, and using the model for predicting missing ratings. In this paper we observe that different models have relative advantages in different regions of the input space. This motivates our approach of using stagewise linear combinations of collaborative filtering algorithms, with non-constant combination coefficients based on kernel smoothing. The resulting stagewise model is computationally scalable and outperforms a wide selection of state-of-the-art collaborative filtering algorithms. 1

6 0.12238095 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback

7 0.12020247 29 nips-2012-Accelerated Training for Matrix-norm Regularization: A Boosting Approach

8 0.11773591 208 nips-2012-Matrix reconstruction with the local max norm

9 0.11185487 111 nips-2012-Efficient Sampling for Bipartite Matching Problems

10 0.10731132 278 nips-2012-Probabilistic n-Choose-k Models for Classification and Ranking

11 0.097220913 166 nips-2012-Joint Modeling of a Matrix with Associated Text via Latent Binary Features

12 0.088157415 301 nips-2012-Scaled Gradients on Grassmann Manifolds for Matrix Completion

13 0.084488787 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models

14 0.07888554 228 nips-2012-Multilabel Classification using Bayesian Compressed Sensing

15 0.078746051 125 nips-2012-Factoring nonnegative matrices with linear programs

16 0.077548876 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

17 0.076453172 343 nips-2012-Tight Bounds on Profile Redundancy and Distinguishability

18 0.076302074 127 nips-2012-Fast Bayesian Inference for Non-Conjugate Gaussian Process Regression

19 0.075290933 70 nips-2012-Clustering by Nonnegative Matrix Factorization Using Graph Random Walk

20 0.07524994 69 nips-2012-Clustering Sparse Graphs


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.224), (1, 0.059), (2, 0.019), (3, -0.05), (4, -0.102), (5, -0.067), (6, -0.005), (7, 0.11), (8, 0.058), (9, 0.055), (10, -0.119), (11, 0.116), (12, -0.119), (13, 0.044), (14, 0.102), (15, 0.066), (16, 0.039), (17, 0.032), (18, 0.007), (19, 0.023), (20, -0.032), (21, -0.041), (22, 0.03), (23, -0.078), (24, -0.128), (25, -0.018), (26, 0.016), (27, -0.025), (28, 0.052), (29, 0.059), (30, -0.11), (31, 0.1), (32, 0.006), (33, 0.067), (34, -0.036), (35, -0.053), (36, 0.001), (37, 0.024), (38, 0.003), (39, 0.026), (40, 0.035), (41, -0.001), (42, 0.15), (43, -0.092), (44, 0.111), (45, 0.031), (46, -0.069), (47, 0.047), (48, 0.063), (49, 0.102)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9266957 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction

Author: Minjie Xu, Jun Zhu, Bo Zhang

Abstract: We present a probabilistic formulation of max-margin matrix factorization and build accordingly a nonparametric Bayesian model which automatically resolves the unknown number of latent factors. Our work demonstrates a successful example that integrates Bayesian nonparametrics and max-margin learning, which are conventionally two separate paradigms and enjoy complementary advantages. We develop an efficient variational algorithm for posterior inference, and our extensive empirical studies on large-scale MovieLens and EachMovie data sets appear to justify the aforementioned dual advantages. 1

2 0.76185632 49 nips-2012-Automatic Feature Induction for Stagewise Collaborative Filtering

Author: Joonseok Lee, Mingxuan Sun, Guy Lebanon, Seung-jean Kim

Abstract: Recent approaches to collaborative filtering have concentrated on estimating an algebraic or statistical model, and using the model for predicting missing ratings. In this paper we observe that different models have relative advantages in different regions of the input space. This motivates our approach of using stagewise linear combinations of collaborative filtering algorithms, with non-constant combination coefficients based on kernel smoothing. The resulting stagewise model is computationally scalable and outperforms a wide selection of state-of-the-art collaborative filtering algorithms. 1

3 0.69809932 75 nips-2012-Collaborative Ranking With 17 Parameters

Author: Maksims Volkovs, Richard S. Zemel

Abstract: The primary application of collaborate filtering (CF) is to recommend a small set of items to a user, which entails ranking. Most approaches, however, formulate the CF problem as rating prediction, overlooking the ranking perspective. In this work we present a method for collaborative ranking that leverages the strengths of the two main CF approaches, neighborhood- and model-based. Our novel method is highly efficient, with only seventeen parameters to optimize and a single hyperparameter to tune, and beats the state-of-the-art collaborative ranking methods. We also show that parameters learned on datasets from one item domain yield excellent results on a dataset from very different item domain, without any retraining. 1

4 0.64640987 125 nips-2012-Factoring nonnegative matrices with linear programs

Author: Ben Recht, Christopher Re, Joel Tropp, Victor Bittorf

Abstract: This paper describes a new approach, based on linear programming, for computing nonnegative matrix factorizations (NMFs). The key idea is a data-driven model for the factorization where the most salient features in the data are used to express the remaining features. More precisely, given a data matrix X, the algorithm identifies a matrix C that satisfies X ≈ CX and some linear constraints. The constraints are chosen to ensure that the matrix C selects features; these features can then be used to find a low-rank NMF of X. A theoretical analysis demonstrates that this approach has guarantees similar to those of the recent NMF algorithm of Arora et al. (2012). In contrast with this earlier work, the proposed method extends to more general noise models and leads to efficient, scalable algorithms. Experiments with synthetic and real datasets provide evidence that the new approach is also superior in practice. An optimized C++ implementation can factor a multigigabyte matrix in a matter of minutes. 1

5 0.64261448 74 nips-2012-Collaborative Gaussian Processes for Preference Learning

Author: Neil Houlsby, Ferenc Huszar, Zoubin Ghahramani, Jose M. Hernández-lobato

Abstract: We present a new model based on Gaussian processes (GPs) for learning pairwise preferences expressed by multiple users. Inference is simplified by using a preference kernel for GPs which allows us to combine supervised GP learning of user preferences with unsupervised dimensionality reduction for multi-user systems. The model not only exploits collaborative information from the shared structure in user behavior, but may also incorporate user features if they are available. Approximate inference is implemented using a combination of expectation propagation and variational Bayes. Finally, we present an efficient active learning strategy for querying preferences. The proposed technique performs favorably on real-world data against state-of-the-art multi-user preference learning algorithms. 1

6 0.62863761 54 nips-2012-Bayesian Probabilistic Co-Subspace Addition

7 0.62750131 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback

8 0.60634553 287 nips-2012-Random function priors for exchangeable arrays with applications to graphs and relational data

9 0.55768102 52 nips-2012-Bayesian Nonparametric Modeling of Suicide Attempts

10 0.54661357 22 nips-2012-A latent factor model for highly multi-relational data

11 0.54605591 189 nips-2012-Learning from the Wisdom of Crowds by Minimax Entropy

12 0.54444039 192 nips-2012-Learning the Dependency Structure of Latent Factors

13 0.54375529 166 nips-2012-Joint Modeling of a Matrix with Associated Text via Latent Binary Features

14 0.5121268 29 nips-2012-Accelerated Training for Matrix-norm Regularization: A Boosting Approach

15 0.4990195 154 nips-2012-How They Vote: Issue-Adjusted Models of Legislative Behavior

16 0.49532691 111 nips-2012-Efficient Sampling for Bipartite Matching Problems

17 0.4885366 59 nips-2012-Bayesian nonparametric models for bipartite graphs

18 0.48305222 30 nips-2012-Accuracy at the Top

19 0.48103476 301 nips-2012-Scaled Gradients on Grassmann Manifolds for Matrix Completion

20 0.46394441 128 nips-2012-Fast Resampling Weighted v-Statistics


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.042), (9, 0.215), (21, 0.025), (38, 0.177), (39, 0.042), (42, 0.049), (54, 0.051), (55, 0.025), (74, 0.051), (76, 0.115), (80, 0.095), (92, 0.033)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.86117041 299 nips-2012-Scalable imputation of genetic data with a discrete fragmentation-coagulation process

Author: Lloyd Elliott, Yee W. Teh

Abstract: We present a Bayesian nonparametric model for genetic sequence data in which a set of genetic sequences is modelled using a Markov model of partitions. The partitions at consecutive locations in the genome are related by the splitting and merging of their clusters. Our model can be thought of as a discrete analogue of the continuous fragmentation-coagulation process [Teh et al 2011], preserving the important properties of projectivity, exchangeability and reversibility, while being more scalable. We apply this model to the problem of genotype imputation, showing improved computational efficiency while maintaining accuracies comparable to other state-of-the-art genotype imputation methods. 1

2 0.84661591 156 nips-2012-Identifiability and Unmixing of Latent Parse Trees

Author: Percy Liang, Daniel J. Hsu, Sham M. Kakade

Abstract: This paper explores unsupervised learning of parsing models along two directions. First, which models are identifiable from infinite data? We use a general technique for numerically checking identifiability based on the rank of a Jacobian matrix, and apply it to several standard constituency and dependency parsing models. Second, for identifiable models, how do we estimate the parameters efficiently? EM suffers from local optima, while recent work using spectral methods [1] cannot be directly applied since the topology of the parse tree varies across sentences. We develop a strategy, unmixing, which deals with this additional complexity for restricted classes of parsing models. 1

same-paper 3 0.8465088 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction

Author: Minjie Xu, Jun Zhu, Bo Zhang

Abstract: We present a probabilistic formulation of max-margin matrix factorization and build accordingly a nonparametric Bayesian model which automatically resolves the unknown number of latent factors. Our work demonstrates a successful example that integrates Bayesian nonparametrics and max-margin learning, which are conventionally two separate paradigms and enjoy complementary advantages. We develop an efficient variational algorithm for posterior inference, and our extensive empirical studies on large-scale MovieLens and EachMovie data sets appear to justify the aforementioned dual advantages. 1

4 0.83588481 230 nips-2012-Multiple Choice Learning: Learning to Produce Multiple Structured Outputs

Author: Abner Guzmán-rivera, Dhruv Batra, Pushmeet Kohli

Abstract: We address the problem of generating multiple hypotheses for structured prediction tasks that involve interaction with users or successive components in a cascaded architecture. Given a set of multiple hypotheses, such components/users typically have the ability to retrieve the best (or approximately the best) solution in this set. The standard approach for handling such a scenario is to first learn a single-output model and then produce M -Best Maximum a Posteriori (MAP) hypotheses from this model. In contrast, we learn to produce multiple outputs by formulating this task as a multiple-output structured-output prediction problem with a loss-function that effectively captures the setup of the problem. We present a max-margin formulation that minimizes an upper-bound on this lossfunction. Experimental results on image segmentation and protein side-chain prediction show that our method outperforms conventional approaches used for this type of scenario and leads to substantial improvements in prediction accuracy. 1

5 0.79810077 99 nips-2012-Dip-means: an incremental clustering method for estimating the number of clusters

Author: Argyris Kalogeratos, Aristidis Likas

Abstract: Learning the number of clusters is a key problem in data clustering. We present dip-means, a novel robust incremental method to learn the number of data clusters that can be used as a wrapper around any iterative clustering algorithm of k-means family. In contrast to many popular methods which make assumptions about the underlying cluster distributions, dip-means only assumes a fundamental cluster property: each cluster to admit a unimodal distribution. The proposed algorithm considers each cluster member as an individual ‘viewer’ and applies a univariate statistic hypothesis test for unimodality (dip-test) on the distribution of distances between the viewer and the cluster members. Important advantages are: i) the unimodality test is applied on univariate distance vectors, ii) it can be directly applied with kernel-based methods, since only the pairwise distances are involved in the computations. Experimental results on artificial and real datasets indicate the effectiveness of our method and its superiority over analogous approaches.

6 0.74189281 162 nips-2012-Inverse Reinforcement Learning through Structured Classification

7 0.73851925 75 nips-2012-Collaborative Ranking With 17 Parameters

8 0.73721761 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

9 0.73572803 255 nips-2012-On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes

10 0.73415303 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback

11 0.73386782 98 nips-2012-Dimensionality Dependent PAC-Bayes Margin Bound

12 0.73294163 358 nips-2012-Value Pursuit Iteration

13 0.73278636 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

14 0.73246819 216 nips-2012-Mirror Descent Meets Fixed Share (and feels no regret)

15 0.73234421 236 nips-2012-Near-Optimal MAP Inference for Determinantal Point Processes

16 0.73194391 227 nips-2012-Multiclass Learning with Simplex Coding

17 0.73189396 15 nips-2012-A Polylog Pivot Steps Simplex Algorithm for Classification

18 0.7312181 292 nips-2012-Regularized Off-Policy TD-Learning

19 0.73065728 241 nips-2012-No-Regret Algorithms for Unconstrained Online Convex Optimization

20 0.73053306 187 nips-2012-Learning curves for multi-task Gaussian process regression