nips nips2012 nips2012-30 knowledge-graph by maker-knowledge-mining

30 nips-2012-Accuracy at the Top


Source: pdf

Author: Stephen Boyd, Corinna Cortes, Mehryar Mohri, Ana Radovanovic

Abstract: We introduce a new notion of classification accuracy based on the top τ-quantile values of a scoring function, a relevant criterion in a number of problems arising for search engines. We define an algorithm optimizing a convex surrogate of the corresponding loss, and discuss its solution in terms of a set of convex optimization problems. We also present margin-based guarantees for this algorithm based on the top τ-quantile value of the scores of the functions in the hypothesis set. Finally, we report the results of several experiments in the bipartite setting evaluating the performance of our solution and comparing the results to several other algorithms seeking high precision at the top. In most examples, our solution achieves a better performance in precision at the top.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract. We introduce a new notion of classification accuracy based on the top τ-quantile values of a scoring function, a relevant criterion in a number of problems arising for search engines. [sent-6, score-0.679]

2 We define an algorithm optimizing a convex surrogate of the corresponding loss, and discuss its solution in terms of a set of convex optimization problems. [sent-7, score-0.485]

3 We also present margin-based guarantees for this algorithm based on the top τ-quantile value of the scores of the functions in the hypothesis set. [sent-8, score-0.576]

4 Finally, we report the results of several experiments in the bipartite setting evaluating the performance of our solution and comparing the results to several other algorithms seeking high precision at the top. [sent-9, score-0.606]

5 In most examples, our solution achieves a better performance in precision at the top. [sent-10, score-0.249]

6 1 Introduction. The accuracy of the items placed near the top is crucial for many information retrieval systems such as search engines or recommendation systems, since most users of these systems browse or consider only the first k items. [sent-11, score-0.813]

7 Different criteria have been introduced in the past to measure this quality, including the precision at k (Precision@k), the normalized discounted cumulative gain (NDCG) and other variants of DCG, or the mean reciprocal rank (MRR) when the rank of the most relevant document is critical. [sent-12, score-0.501]
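
For readers who want these criteria pinned down, here is a minimal sketch for a single query with binary relevance labels, ordered by decreasing score. It is illustrative only: the function names are ours, and the log2 discount is one common DCG variant among several.

```python
import numpy as np

def precision_at_k(relevance, k):
    """Fraction of relevant items among the k highest-ranked ones.

    `relevance` is a 0/1 list ordered by decreasing score."""
    rel = np.asarray(relevance)[:k]
    return float(rel.mean()) if len(rel) else 0.0

def dcg_at_k(relevance, k):
    """Discounted cumulative gain with the common log2 discount."""
    rel = np.asarray(relevance, dtype=float)[:k]
    discounts = np.log2(np.arange(2, len(rel) + 2))
    return float(np.sum(rel / discounts))

def ndcg_at_k(relevance, k):
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0

def mrr(relevance):
    """Reciprocal rank of the first relevant item (0 if none)."""
    for i, r in enumerate(relevance, start=1):
        if r:
            return 1.0 / i
    return 0.0

# Toy ranking: 1 = relevant, 0 = irrelevant, ordered by decreasing score.
ranking = [1, 0, 1, 1, 0, 0, 1]
print(precision_at_k(ranking, 5), ndcg_at_k(ranking, 5), mrr(ranking))
```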

8 A somewhat different but also related criterion adopted by [1] is based on the position of the top irrelevant item. [sent-13, score-0.486]

9 Several machine learning algorithms have been recently designed to optimize these criteria and other related ones [6, 12, 11, 21, 7, 14, 13]. [sent-14, score-0.211]

10 A general algorithm inspired by the structured prediction technique SVMStruct [22] was incorporated in an algorithm by [15], which can be used to optimize a convex upper bound on the number of errors among the top k items. [sent-15, score-0.544]

11 The algorithm seeks to solve a convex problem with exponentially many constraints via several rounds of optimization with a smaller number of constraints, augmenting the set of constraints at each round with the most violating one. [sent-16, score-0.516]
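
The procedure just described is an instance of a generic cutting-plane, or constraint-generation, loop. The sketch below shows only that generic pattern, not the actual algorithm of [15]; `solve_restricted`, `find_most_violated`, and the tolerance are hypothetical placeholders.

```python
def cutting_plane(solve_restricted, find_most_violated, tol=1e-4, max_rounds=100):
    """Generic constraint-generation loop.

    solve_restricted(constraints) -> candidate solution for the current working set.
    find_most_violated(solution)  -> (constraint, violation) for the worst constraint.
    """
    working_set = []
    solution = solve_restricted(working_set)
    for _ in range(max_rounds):
        constraint, violation = find_most_violated(solution)
        if violation <= tol:            # all constraints are (approximately) satisfied
            break
        working_set.append(constraint)  # augment the working set with the worst offender
        solution = solve_restricted(working_set)
    return solution
```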

12 Another algorithm, also based on structured prediction ideas, is proposed in an unpublished manuscript of [19] and covers several criteria, including Precision@k and NDCG. [sent-17, score-0.241]

13 A regression-based solution is suggested by [10] for DCG in the case of large sample sizes. [sent-18, score-0.051]

14 Some other methods have also been proposed to optimize a smooth version of a non-convex cost function in this context [8]. [sent-19, score-0.074]

15 [1] discusses an optimization solution for an algorithm seeking to minimize the position of the top irrelevant item. [sent-20, score-0.673]

16 However, one obvious shortcoming of all these algorithms is that the notion of top k does not generalize to new data. [sent-21, score-0.443]

17 In fact, no generalization guarantee is available for such Precision@k optimization algorithms. [sent-23, score-0.08]

18 A more principled approach in all the applications already mentioned consists of designing algorithms that optimize accuracy in some top fraction of the scores returned by a real-valued hypothesis. [sent-24, score-0.707]

19 The desired objective is to learn a scoring function that is as accurate as possible for the items whose scores are above the top τ-quantile. [sent-26, score-0.855]

20 To be more specific, when applied to a set of size n, the number of top items is k = τn for a τ-quantile, while for a different set of size n' ≠ n, this would correspond to k' = τn' ≠ k. [sent-27, score-0.551]
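
A quick numeric illustration of this point, with made-up numbers: a fixed k is tied to the sample size, while a quantile τ induces a different k on each set.

```python
tau = 0.05  # target the top 5% of scored items
for n in (1_000, 2_000, 10_000):
    k = int(tau * n)  # the "top k" induced by the same quantile on a set of size n
    print(f"n = {n:6d} -> k = {k}")
# n =   1000 -> k = 50
# n =   2000 -> k = 100
# n =  10000 -> k = 500
```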

21 The implementation of the Precision@k algorithm in [15] indirectly acknowledges the problem that the notion of top k does not generalize since the command-line flag requires k to be specified as a fraction of the positive samples. [sent-28, score-0.562]

22 Nevertheless, the formulation of the problem as well as the solution are still in terms of the top k items of the training set. [sent-29, score-0.641]

23 A study of various statistical questions related to the problem of accuracy at the top is discussed by [9]. [sent-30, score-0.368]

24 The authors also present generalization bounds for the specific case of empirical risk minimization (ERM) under some assumptions about the hypothesis set and the distribution. [sent-31, score-0.136]

25 But, to our knowledge, no previous publication has given general learning guarantees for the problem of accuracy among the top-quantile scoring items or carefully addressed the corresponding algorithmic problem. [sent-32, score-1.149]

26-27 We discuss the formulation of this problem (Section 3.1) and define an algorithm optimizing a convex surrogate of the corresponding loss in the case of linear scoring functions. [sent-33, score-0.089] [sent-34, score-0.38]

28 We discuss the solution of this problem in terms of several simple convex optimization problems and show that these problems can be extended to the case where positive semi-definite kernels are used (Section 3.2). [sent-35, score-0.296]

29 In Section 4, we present a Rademacher complexity analysis of the problem and give margin-based guarantees for our algorithm based on the τ-quantile value of the functions in the hypothesis set. [sent-37, score-0.139]

30 In Section 5, we also report the results of several experiments evaluating the performance of our algorithm. [sent-38, score-0.089]

31 In a comparison in a bipartite setting with several algorithms seeking high precision at the top, our algorithm achieves a better performance in precision at the top. [sent-39, score-0.664]

32 We start with a presentation of notions and notation useful for the discussion in the following sections. [sent-40, score-0.071]

33 2 Preliminaries. Let X denote the input space and D a distribution over X × X. [sent-41, score-0.035]

34 We interpret the presence of a pair (x, x') in the support of D as the preference of x' over x. [sent-42, score-0.035]

35 The sample S is drawn according to D^m, and we denote by D̂ the corresponding empirical distribution. [sent-49, score-0.035]

36 D induces a marginal distribution over X that we denote by D_0, which in the discrete case can be defined via D_0(x) = (1/2) Σ_{x'∈X} [D(x, x') + D(x', x)]. [sent-50, score-0.035]

37 We also denote by D̂_0 the empirical distribution associated to D_0 based on the sample S. [sent-51, score-0.035]
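
A minimal sketch of the empirical marginal D̂_0 induced by a sample of preference pairs under the convention above (each observed pair (x, x') contributes half of its 1/m mass to x and half to x'); the data and function name are illustrative, not taken from the paper.

```python
from collections import defaultdict

def empirical_marginal(pairs):
    """Empirical marginal over points induced by a sample of preference pairs (x, x').

    Each observed pair contributes mass 1/(2m) to x and 1/(2m) to x'."""
    m = len(pairs)
    d0 = defaultdict(float)
    for x, x_prime in pairs:
        d0[x] += 1.0 / (2 * m)
        d0[x_prime] += 1.0 / (2 * m)
    return dict(d0)

# Toy sample: in each pair, x' is preferred to x.
sample = [("a", "b"), ("a", "c"), ("b", "c")]
print(empirical_marginal(sample))  # {'a': 1/3, 'b': 1/3, 'c': 1/3}
```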

38 The learning problems we are studying are defined in terms of the top τ-quantile of the values taken by a function h : X → R. [sent-52, score-0.317]

39 In general, q is not unique and this equality may hold for all q in an interval [q_min, q_max]. [sent-54, score-0.307]

40 We will be particularly interested in the properties of the set of points x whose scores are above a quantile, that is, s_q = {x : h(x) > q}. [sent-55, score-0.269]

41 Since for any (q, q') ∈ [q_min, q_max]^2, s_q and s_{q'} differ only by a set of measure zero, the particular choice of q in that interval has no significant consequence. [sent-56, score-0.383]

42 Thus, in what follows, when it is not unique, we will choose the quantile value to be the maximum, q_max. [sent-57, score-0.447]

43 For any τ ∈ [0, 1], let ρ_τ denote the function defined for all u ∈ R by ρ_τ(u) = -τ(u)_- + (1 - τ)(u)_+, where (u)_+ = max(u, 0) and (u)_- = min(u, 0) (see Figure 1(b)). [sent-58, score-0.035]

44 ρ_τ is convex as a sum of two convex functions, since u ↦ -τ(u)_- and u ↦ (1 - τ)(u)_+ are both convex. [sent-59, score-0.234]

45 We will denote by argmin_u f(u) the largest minimizer of the function f. [sent-62, score-0.035]

46 Figure 1: (a) Illustration of the τ-quantile (top τ fraction of the scores). (b) Illustration of ρ_τ. [sent-67, score-0.569]

47 The top τ-quantile q̂ of (u_1, ..., u_n) ∈ R^n can be given by q̂ = argmin_{u∈R} F_τ(u), where F_τ is the convex function defined for all u ∈ R by F_τ(u) = (1/n) Σ_{i=1}^n ρ_τ(u_i - u). [sent-73, score-0.203]
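
A small numerical check of this characterization, using the reconstruction of ρ_τ given above: the largest minimizer of F_τ over the sample points leaves roughly a fraction τ of the scores above it. The brute-force grid search below is purely for illustration and is not the optimization procedure used in the paper.

```python
import numpy as np

def rho_tau(u, tau):
    """Reconstructed rho_tau(u) = -tau * min(u, 0) + (1 - tau) * max(u, 0)."""
    return -tau * np.minimum(u, 0.0) + (1.0 - tau) * np.maximum(u, 0.0)

def top_quantile(scores, tau):
    """Top tau-quantile as the largest minimizer of F_tau over the sample values."""
    scores = np.asarray(scores, dtype=float)
    candidates = np.unique(scores)                     # F_tau is piecewise linear; data points suffice
    objective = [np.mean(rho_tau(scores - q, tau)) for q in candidates]
    best = np.min(objective)
    # take the largest minimizer, matching the q_max convention in the text
    return float(candidates[np.isclose(objective, best)].max())

rng = np.random.default_rng(0)
s = rng.normal(size=1000)
q = top_quantile(s, tau=0.1)
print(q, np.mean(s > q))  # roughly a fraction 0.1 of the scores lie above q
```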

48 Accuracy at the top (AATP). Problem formulation and algorithm. The learning problem we consider is that of accuracy at the top (AATP), which consists of achieving an ordering of all items so that items whose scores are among the top τ-quantile are as relevant as possible. [sent-75, score-1.718]

49 Ideally, all preferred items are ranked above the quantile and non-preferred ones ranked below. [sent-76, score-0.866]

50 Thus, the loss or generalization error of a hypothesis h : X → R ... [sent-77, score-0.136]
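
To make the evaluation criterion concrete, the sketch below computes one natural reading of precision at the top in the bipartite setting: score all items, take the empirical top τ-quantile of the scores, and measure the fraction of positives among the items scored strictly above it. The function name, the use of np.quantile, and the tie-breaking convention are our illustrative choices, not code from the paper.

```python
import numpy as np

def precision_at_top_quantile(scores, labels, tau):
    """Fraction of positive items among those scored above the empirical top tau-quantile.

    scores: real-valued scores h(x); labels: 1 for relevant/preferred items, 0 otherwise."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    q = np.quantile(scores, 1.0 - tau)   # empirical top tau-quantile of the scores
    top = scores > q                     # items strictly above the quantile
    if not top.any():
        return 0.0
    return float(labels[top].mean())

# Toy example: higher scores should correspond to relevant items.
scores = [0.9, 0.8, 0.75, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1,   1,   0,    1,   0,   0,   0,   0]
print(precision_at_top_quantile(scores, labels, tau=0.25))  # 1.0: the top 25% are both relevant
```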


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('top', 0.281), ('items', 0.27), ('quantile', 0.225), ('qh', 0.222), ('qmax', 0.222), ('courant', 0.205), ('mehryar', 0.205), ('mohri', 0.199), ('precision', 0.198), ('aatp', 0.168), ('google', 0.161), ('scores', 0.156), ('scoring', 0.148), ('corinna', 0.148), ('qmin', 0.148), ('seeking', 0.142), ('dcg', 0.129), ('ranked', 0.124), ('ninth', 0.122), ('convex', 0.117), ('sq', 0.113), ('avenue', 0.091), ('hypothesis', 0.091), ('accuracy', 0.087), ('un', 0.086), ('bipartite', 0.083), ('ny', 0.082), ('criteria', 0.082), ('optimize', 0.074), ('irrelevant', 0.074), ('surrogate', 0.074), ('ana', 0.074), ('svmstruct', 0.074), ('fraction', 0.072), ('boyd', 0.07), ('stanford', 0.07), ('preferred', 0.068), ('mrr', 0.068), ('street', 0.064), ('unpublished', 0.064), ('violating', 0.064), ('notion', 0.063), ('engines', 0.061), ('mercer', 0.061), ('page', 0.06), ('ndcg', 0.058), ('york', 0.057), ('ones', 0.055), ('publication', 0.054), ('erm', 0.054), ('relevant', 0.053), ('augmenting', 0.053), ('manuscript', 0.053), ('generalize', 0.052), ('cortes', 0.051), ('solution', 0.051), ('discuss', 0.05), ('reciprocal', 0.05), ('position', 0.049), ('acknowledges', 0.049), ('guarantees', 0.048), ('interval', 0.048), ('deals', 0.047), ('shortcoming', 0.047), ('criterion', 0.047), ('ag', 0.046), ('evaluating', 0.046), ('ranks', 0.045), ('recommendation', 0.045), ('indirectly', 0.045), ('generalization', 0.045), ('rademacher', 0.045), ('covers', 0.044), ('rounds', 0.044), ('several', 0.043), ('constraints', 0.043), ('discusses', 0.041), ('optimizing', 0.041), ('rank', 0.04), ('stephen', 0.039), ('xm', 0.039), ('formulation', 0.039), ('seeks', 0.038), ('discounted', 0.038), ('structured', 0.037), ('unique', 0.037), ('returned', 0.037), ('carefully', 0.036), ('notions', 0.036), ('studying', 0.036), ('round', 0.036), ('denote', 0.035), ('adopted', 0.035), ('placed', 0.035), ('optimization', 0.035), ('preference', 0.035), ('preliminaries', 0.035), ('presentation', 0.035), ('incorporated', 0.035), ('users', 0.034)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 30 nips-2012-Accuracy at the Top

Author: Stephen Boyd, Corinna Cortes, Mehryar Mohri, Ana Radovanovic

Abstract: We introduce a new notion of classification accuracy based on the top τ-quantile values of a scoring function, a relevant criterion in a number of problems arising for search engines. We define an algorithm optimizing a convex surrogate of the corresponding loss, and discuss its solution in terms of a set of convex optimization problems. We also present margin-based guarantees for this algorithm based on the top τ-quantile value of the scores of the functions in the hypothesis set. Finally, we report the results of several experiments in the bipartite setting evaluating the performance of our solution and comparing the results to several other algorithms seeking high precision at the top. In most examples, our solution achieves a better performance in precision at the top.

2 0.1391118 75 nips-2012-Collaborative Ranking With 17 Parameters

Author: Maksims Volkovs, Richard S. Zemel

Abstract: The primary application of collaborate filtering (CF) is to recommend a small set of items to a user, which entails ranking. Most approaches, however, formulate the CF problem as rating prediction, overlooking the ranking perspective. In this work we present a method for collaborative ranking that leverages the strengths of the two main CF approaches, neighborhood- and model-based. Our novel method is highly efficient, with only seventeen parameters to optimize and a single hyperparameter to tune, and beats the state-of-the-art collaborative ranking methods. We also show that parameters learned on datasets from one item domain yield excellent results on a dataset from very different item domain, without any retraining. 1

3 0.13718361 165 nips-2012-Iterative ranking from pair-wise comparisons

Author: Sahand Negahban, Sewoong Oh, Devavrat Shah

Abstract: The question of aggregating pairwise comparisons to obtain a global ranking over a collection of objects has been of interest for a very long time: be it ranking of online gamers (e.g. MSR’s TrueSkill system) and chess players, aggregating social opinions, or deciding which product to sell based on transactions. In most settings, in addition to obtaining ranking, finding ‘scores’ for each object (e.g. player’s rating) is of interest to understanding the intensity of the preferences. In this paper, we propose a novel iterative rank aggregation algorithm for discovering scores for objects from pairwise comparisons. The algorithm has a natural random walk interpretation over the graph of objects with edges present between two objects if they are compared; the scores turn out to be the stationary probability of this random walk. The algorithm is model independent. To establish the efficacy of our method, however, we consider the popular Bradley-Terry-Luce (BTL) model in which each object has an associated score which determines the probabilistic outcomes of pairwise comparisons between objects. We bound the finite sample error rates between the scores assumed by the BTL model and those estimated by our algorithm. This, in essence, leads to order-optimal dependence on the number of samples required to learn the scores well by our algorithm. Indeed, the experimental evaluation shows that our (model independent) algorithm performs as well as the Maximum Likelihood Estimator of the BTL model and outperforms a recently proposed algorithm by Ammar and Shah [1]. 1

4 0.12565444 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback

Author: Andriy Mnih, Yee W. Teh

Abstract: User preferences for items can be inferred from either explicit feedback, such as item ratings, or implicit feedback, such as rental histories. Research in collaborative filtering has concentrated on explicit feedback, resulting in the development of accurate and scalable models. However, since explicit feedback is often difficult to collect it is important to develop effective models that take advantage of the more widely available implicit feedback. We introduce a probabilistic approach to collaborative filtering with implicit feedback based on modelling the user’s item selection process. In the interests of scalability, we restrict our attention to treestructured distributions over items and develop a principled and efficient algorithm for learning item trees from data. We also identify a problem with a widely used protocol for evaluating implicit feedback models and propose a way of addressing it using a small quantity of explicit feedback data. 1

5 0.11997414 111 nips-2012-Efficient Sampling for Bipartite Matching Problems

Author: Maksims Volkovs, Richard S. Zemel

Abstract: Bipartite matching problems characterize many situations, ranging from ranking in information retrieval to correspondence in vision. Exact inference in realworld applications of these problems is intractable, making efficient approximation methods essential for learning and inference. In this paper we propose a novel sequential matching sampler based on a generalization of the PlackettLuce model, which can effectively make large moves in the space of matchings. This allows the sampler to match the difficult target distributions common in these problems: highly multimodal distributions with well separated modes. We present experimental results with bipartite matching problems—ranking and image correspondence—which show that the sequential matching sampler efficiently approximates the target distribution, significantly outperforming other sampling approaches. 1

6 0.095218368 278 nips-2012-Probabilistic n-Choose-k Models for Classification and Ranking

7 0.07714323 60 nips-2012-Bayesian nonparametric models for ranked data

8 0.069853008 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

9 0.068502843 67 nips-2012-Classification Calibration Dimension for General Multiclass Losses

10 0.065845013 358 nips-2012-Value Pursuit Iteration

11 0.06340427 228 nips-2012-Multilabel Classification using Bayesian Compressed Sensing

12 0.061684296 13 nips-2012-A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function

13 0.056241136 16 nips-2012-A Polynomial-time Form of Robust Regression

14 0.056003705 148 nips-2012-Hamming Distance Metric Learning

15 0.054449048 271 nips-2012-Pointwise Tracking the Optimal Regression Function

16 0.051154226 74 nips-2012-Collaborative Gaussian Processes for Preference Learning

17 0.050726227 290 nips-2012-Recovery of Sparse Probability Measures via Convex Programming

18 0.049865782 269 nips-2012-Persistent Homology for Learning Densities with Bounded Support

19 0.049533561 311 nips-2012-Shifting Weights: Adapting Object Detectors from Image to Video

20 0.048360918 194 nips-2012-Learning to Discover Social Circles in Ego Networks


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.148), (1, 0.009), (2, 0.022), (3, -0.046), (4, 0.033), (5, -0.053), (6, -0.005), (7, 0.114), (8, 0.014), (9, 0.132), (10, -0.098), (11, 0.079), (12, -0.079), (13, 0.008), (14, 0.077), (15, 0.01), (16, 0.0), (17, -0.009), (18, 0.005), (19, 0.023), (20, 0.049), (21, -0.016), (22, -0.027), (23, -0.039), (24, -0.016), (25, 0.042), (26, 0.017), (27, 0.011), (28, 0.023), (29, 0.005), (30, -0.057), (31, 0.0), (32, -0.076), (33, -0.037), (34, -0.0), (35, 0.025), (36, -0.0), (37, 0.038), (38, 0.02), (39, -0.011), (40, -0.002), (41, 0.06), (42, -0.127), (43, -0.084), (44, -0.048), (45, -0.048), (46, -0.058), (47, 0.033), (48, -0.022), (49, 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.91792488 30 nips-2012-Accuracy at the Top

Author: Stephen Boyd, Corinna Cortes, Mehryar Mohri, Ana Radovanovic

Abstract: We introduce a new notion of classification accuracy based on the top τ-quantile values of a scoring function, a relevant criterion in a number of problems arising for search engines. We define an algorithm optimizing a convex surrogate of the corresponding loss, and discuss its solution in terms of a set of convex optimization problems. We also present margin-based guarantees for this algorithm based on the top τ-quantile value of the scores of the functions in the hypothesis set. Finally, we report the results of several experiments in the bipartite setting evaluating the performance of our solution and comparing the results to several other algorithms seeking high precision at the top. In most examples, our solution achieves a better performance in precision at the top.

2 0.73814046 165 nips-2012-Iterative ranking from pair-wise comparisons

Author: Sahand Negahban, Sewoong Oh, Devavrat Shah

Abstract: The question of aggregating pairwise comparisons to obtain a global ranking over a collection of objects has been of interest for a very long time: be it ranking of online gamers (e.g. MSR’s TrueSkill system) and chess players, aggregating social opinions, or deciding which product to sell based on transactions. In most settings, in addition to obtaining ranking, finding ‘scores’ for each object (e.g. player’s rating) is of interest to understanding the intensity of the preferences. In this paper, we propose a novel iterative rank aggregation algorithm for discovering scores for objects from pairwise comparisons. The algorithm has a natural random walk interpretation over the graph of objects with edges present between two objects if they are compared; the scores turn out to be the stationary probability of this random walk. The algorithm is model independent. To establish the efficacy of our method, however, we consider the popular Bradley-Terry-Luce (BTL) model in which each object has an associated score which determines the probabilistic outcomes of pairwise comparisons between objects. We bound the finite sample error rates between the scores assumed by the BTL model and those estimated by our algorithm. This, in essence, leads to order-optimal dependence on the number of samples required to learn the scores well by our algorithm. Indeed, the experimental evaluation shows that our (model independent) algorithm performs as well as the Maximum Likelihood Estimator of the BTL model and outperforms a recently proposed algorithm by Ammar and Shah [1]. 1

3 0.71941304 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback

Author: Andriy Mnih, Yee W. Teh

Abstract: User preferences for items can be inferred from either explicit feedback, such as item ratings, or implicit feedback, such as rental histories. Research in collaborative filtering has concentrated on explicit feedback, resulting in the development of accurate and scalable models. However, since explicit feedback is often difficult to collect it is important to develop effective models that take advantage of the more widely available implicit feedback. We introduce a probabilistic approach to collaborative filtering with implicit feedback based on modelling the user’s item selection process. In the interests of scalability, we restrict our attention to treestructured distributions over items and develop a principled and efficient algorithm for learning item trees from data. We also identify a problem with a widely used protocol for evaluating implicit feedback models and propose a way of addressing it using a small quantity of explicit feedback data. 1

4 0.70131165 75 nips-2012-Collaborative Ranking With 17 Parameters

Author: Maksims Volkovs, Richard S. Zemel

Abstract: The primary application of collaborative filtering (CF) is to recommend a small set of items to a user, which entails ranking. Most approaches, however, formulate the CF problem as rating prediction, overlooking the ranking perspective. In this work we present a method for collaborative ranking that leverages the strengths of the two main CF approaches, neighborhood- and model-based. Our novel method is highly efficient, with only seventeen parameters to optimize and a single hyperparameter to tune, and beats the state-of-the-art collaborative ranking methods. We also show that parameters learned on datasets from one item domain yield excellent results on a dataset from a very different item domain, without any retraining.

5 0.65663868 111 nips-2012-Efficient Sampling for Bipartite Matching Problems

Author: Maksims Volkovs, Richard S. Zemel

Abstract: Bipartite matching problems characterize many situations, ranging from ranking in information retrieval to correspondence in vision. Exact inference in realworld applications of these problems is intractable, making efficient approximation methods essential for learning and inference. In this paper we propose a novel sequential matching sampler based on a generalization of the PlackettLuce model, which can effectively make large moves in the space of matchings. This allows the sampler to match the difficult target distributions common in these problems: highly multimodal distributions with well separated modes. We present experimental results with bipartite matching problems—ranking and image correspondence—which show that the sequential matching sampler efficiently approximates the target distribution, significantly outperforming other sampling approaches. 1

6 0.65350348 49 nips-2012-Automatic Feature Induction for Stagewise Collaborative Filtering

7 0.5998354 189 nips-2012-Learning from the Wisdom of Crowds by Minimax Entropy

8 0.52078402 107 nips-2012-Effective Split-Merge Monte Carlo Methods for Nonparametric Models of Sequential Data

9 0.51639199 223 nips-2012-Multi-criteria Anomaly Detection using Pareto Depth Analysis

10 0.46345001 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction

11 0.45729434 278 nips-2012-Probabilistic n-Choose-k Models for Classification and Ranking

12 0.44617143 271 nips-2012-Pointwise Tracking the Optimal Regression Function

13 0.43980205 323 nips-2012-Statistical Consistency of Ranking Methods in A Rank-Differentiable Probability Space

14 0.43327433 74 nips-2012-Collaborative Gaussian Processes for Preference Learning

15 0.42744741 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

16 0.42740244 358 nips-2012-Value Pursuit Iteration

17 0.42567956 59 nips-2012-Bayesian nonparametric models for bipartite graphs

18 0.42143387 141 nips-2012-GenDeR: A Generic Diversified Ranking Algorithm

19 0.41870111 288 nips-2012-Rational inference of relative preferences

20 0.41789949 60 nips-2012-Bayesian nonparametric models for ranked data


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.031), (21, 0.012), (38, 0.202), (42, 0.024), (53, 0.372), (55, 0.017), (74, 0.033), (76, 0.124), (80, 0.08), (92, 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.87513494 328 nips-2012-Submodular-Bregman and the Lovász-Bregman Divergences with Applications

Author: Rishabh Iyer, Jeff A. Bilmes

Abstract: We introduce a class of discrete divergences on sets (equivalently binary vectors) that we call the submodular-Bregman divergences. We consider two kinds, defined either from tight modular upper or tight modular lower bounds of a submodular function. We show that the properties of these divergences are analogous to the (standard continuous) Bregman divergence. We demonstrate how they generalize many useful divergences, including the weighted Hamming distance, squared weighted Hamming, weighted precision, recall, conditional mutual information, and a generalized KL-divergence on sets. We also show that the generalized Bregman divergence on the Lov´ sz extension of a submodular function, which we a call the Lov´ sz-Bregman divergence, is a continuous extension of a submodular a Bregman divergence. We point out a number of applications, and in particular show that a proximal algorithm defined through the submodular Bregman divergence provides a framework for many mirror-descent style algorithms related to submodular function optimization. We also show that a generalization of the k-means algorithm using the Lov´ sz Bregman divergence is natural in clustering scenarios where a ordering is important. A unique property of this algorithm is that computing the mean ordering is extremely efficient unlike other order based distance measures. 1

2 0.82954329 359 nips-2012-Variational Inference for Crowdsourcing

Author: Qiang Liu, Jian Peng, Alex Ihler

Abstract: Crowdsourcing has become a popular paradigm for labeling large datasets. However, it has given rise to the computational task of aggregating the crowdsourced labels provided by a collection of unreliable annotators. We approach this problem by transforming it into a standard inference problem in graphical models, and applying approximate variational methods, including belief propagation (BP) and mean field (MF). We show that our BP algorithm generalizes both majority voting and a recent algorithm by Karger et al. [1], while our MF method is closely related to a commonly used EM algorithm. In both cases, we find that the performance of the algorithms critically depends on the choice of a prior distribution on the workers’ reliability; by choosing the prior properly, both BP and MF (and EM) perform surprisingly well on both simulated and real-world datasets, competitive with state-of-the-art algorithms based on more complicated modeling assumptions. 1

3 0.80912745 10 nips-2012-A Linear Time Active Learning Algorithm for Link Classification

Author: Nicolò Cesa-bianchi, Claudio Gentile, Fabio Vitale, Giovanni Zappella

Abstract: We present very efficient active learning algorithms for link classification in signed networks. Our algorithms are motivated by a stochastic model in which edge labels are obtained through perturbations of a initial sign assignment consistent with a two-clustering of the nodes. We provide a theoretical analysis within this model, showing that we can achieve an optimal (to whithin a constant factor) number of mistakes on any graph G = (V, E) such that |E| = Ω(|V |3/2 ) by querying O(|V |3/2 ) edge labels. More generally, we show an algorithm that achieves optimality to within a factor of O(k) by querying at most order of |V | + (|V |/k)3/2 edge labels. The running time of this algorithm is at most of order |E| + |V | log |V |. 1

4 0.75682485 313 nips-2012-Sketch-Based Linear Value Function Approximation

Author: Marc Bellemare, Joel Veness, Michael Bowling

Abstract: Hashing is a common method to reduce large, potentially infinite feature vectors to a fixed-size table. In reinforcement learning, hashing is often used in conjunction with tile coding to represent states in continuous spaces. Hashing is also a promising approach to value function approximation in large discrete domains such as Go and Hearts, where feature vectors can be constructed by exhaustively combining a set of atomic features. Unfortunately, the typical use of hashing in value function approximation results in biased value estimates due to the possibility of collisions. Recent work in data stream summaries has led to the development of the tug-of-war sketch, an unbiased estimator for approximating inner products. Our work investigates the application of this new data structure to linear value function approximation. Although in the reinforcement learning setting the use of the tug-of-war sketch leads to biased value estimates, we show that this bias can be orders of magnitude less than that of standard hashing. We provide empirical results on two RL benchmark domains and fifty-five Atari 2600 games to highlight the superior learning performance obtained when using tug-of-war hashing. 1

same-paper 5 0.75355858 30 nips-2012-Accuracy at the Top

Author: Stephen Boyd, Corinna Cortes, Mehryar Mohri, Ana Radovanovic

Abstract: We introduce a new notion of classification accuracy based on the top τ-quantile values of a scoring function, a relevant criterion in a number of problems arising for search engines. We define an algorithm optimizing a convex surrogate of the corresponding loss, and discuss its solution in terms of a set of convex optimization problems. We also present margin-based guarantees for this algorithm based on the top τ-quantile value of the scores of the functions in the hypothesis set. Finally, we report the results of several experiments in the bipartite setting evaluating the performance of our solution and comparing the results to several other algorithms seeking high precision at the top. In most examples, our solution achieves a better performance in precision at the top.

6 0.73362917 12 nips-2012-A Neural Autoregressive Topic Model

7 0.68785393 21 nips-2012-A Unifying Perspective of Parametric Policy Search Methods for Markov Decision Processes

8 0.59140342 236 nips-2012-Near-Optimal MAP Inference for Determinantal Point Processes

9 0.58395338 214 nips-2012-Minimizing Sparse High-Order Energies by Submodular Vertex-Cover

10 0.57299101 129 nips-2012-Fast Variational Inference in the Conjugate Exponential Family

11 0.57067692 293 nips-2012-Relax and Randomize : From Value to Algorithms

12 0.56936508 216 nips-2012-Mirror Descent Meets Fixed Share (and feels no regret)

13 0.56797922 160 nips-2012-Imitation Learning by Coaching

14 0.56594956 174 nips-2012-Learning Halfspaces with the Zero-One Loss: Time-Accuracy Tradeoffs

15 0.56445569 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

16 0.56283134 304 nips-2012-Selecting Diverse Features via Spectral Regularization

17 0.55664587 86 nips-2012-Convex Multi-view Subspace Learning

18 0.55451268 187 nips-2012-Learning curves for multi-task Gaussian process regression

19 0.55435121 278 nips-2012-Probabilistic n-Choose-k Models for Classification and Ranking

20 0.55427694 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models