nips nips2012 nips2012-141 knowledge-graph by maker-knowledge-mining

141 nips-2012-GenDeR: A Generic Diversified Ranking Algorithm


Source: pdf

Author: Jingrui He, Hanghang Tong, Qiaozhu Mei, Boleslaw Szymanski

Abstract: Diversified ranking is a fundamental task in machine learning. It is broadly applicable in many real world problems, e.g., information retrieval, team assembling, product search, etc. In this paper, we consider a generic setting where we aim to diversify the top-k ranking list based on an arbitrary relevance function and an arbitrary similarity function among all the examples. We formulate it as an optimization problem and show that in general it is NP-hard. Then, we show that for a large volume of the parameter space, the proposed objective function enjoys the diminishing returns property, which enables us to design a scalable, greedy algorithm to find the (1 − 1/e) near-optimal solution. Experimental results on real data sets demonstrate the effectiveness of the proposed algorithm.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: Diversified ranking is a fundamental task in machine learning. [sent-12, score-0.426]

2 In this paper, we consider a generic setting where we aim to diversify the top-k ranking list based on an arbitrary relevance function and an arbitrary similarity function among all the examples. [sent-16, score-1.022]

3 Then, we show that for a large volume of the parameter space, the proposed objective function enjoys the diminishing returns property, which enables us to design a scalable, greedy algorithm to find the (1 − 1/e) near-optimal solution. [sent-18, score-0.158]

4 Introduction: Many real applications can be reduced to a ranking problem. [sent-20, score-0.426]

5 While traditional ranking tasks mainly focus on relevance, it has been widely recognized that diversity is another highly desirable property. [sent-21, score-0.629]

6 On one hand, each team member should have some relevant skills. [sent-25, score-0.108]

7 On the other hand, the whole team should somehow be diversified, so that we can cover all the required skills for the task and different team members can benefit from each other’s diversified, complementary knowledge and social capital. [sent-26, score-0.214]

8 To date, many diversified ranking algorithms have been proposed. [sent-30, score-0.426]

9 Early works mainly focus on text data [5, 23] where the goal is to improve the coverage of (sub-)topics in the retrieval result. [sent-31, score-0.209]

10 For example, if a query bears multiple meanings (such as the key word ‘jaguar’, which could refer to either cars or cats), we would like to have each meaning (e.g., the car brand as well as the animal) represented in the retrieval result. [sent-33, score-0.117]

11 Another recent trend is to diversify PageRank-type of algorithms for graph data [24, 11, 16]. [sent-36, score-0.123]

12 It is worth pointing out that almost all the existing diversified ranking algorithms hinge on the specific choice of the relevance function and/or the similarity function. [sent-37, score-0.765]

13 In this paper, we shift the problem to a more generic setting and ask: given an arbitrary relevance function wrt an implicit or explicit query, and an arbitrary similarity function among all the available examples, how can we diversify the resulting top-k ranking list? [sent-39, score-0.919]

14 First, we propose an objective function that admits any non-negative relevance function and any non-negative, symmetric similarity function. [sent-41, score-0.366]

15 It naturally captures both the relevance with regard to the query and the diversity of the ranking list, with a regularization parameter that balances between them. [sent-42, score-1.003]

16 Then, we show that while such an optimization problem is NP-hard in general, for a large volume of the parameter space, the objective function exhibits the diminishing returns property, including submodurality, monotonicity, etc. [sent-43, score-0.162]

17 We present our optimization formulation for diversified ranking in Section 2, followed by the analysis of its hardness and properties. [sent-46, score-0.523]

18 In this paper, we use normal lower-case letters to denote scalars or functions, bold-face lower-case letters to denote vectors, bold-face upper-case letters to denote matrices, and calligraphic upper-case letters to denote sets. [sent-54, score-0.116]

19 Given a set of examples X = {x1, . . . , xn}, let S denote the n × n similarity matrix, which is both symmetric and non-negative. [sent-58, score-0.082]

20 For any ranking function r(·), which returns the non-negative relevance score for each example in X with respect to an implicit or explicit query, our goal is to find a subset T of k examples, which are relevant to the query and diversified among themselves. [sent-63, score-0.941]

21 Here the positive integer k is the budget of the ranking list size, and the ranking function r(·) generates an n × 1 vector r, whose ith element r i = r(xi ). [sent-64, score-1.044]

22 When we describe the objective function as well as the proposed optimization algorithm, it is convenient to introduce the following n × 1 reference vector q = S · r. [sent-65, score-0.077]

23 Intuitively, q_i is large when x_i is similar (high S_{i,j}) to many examples x_j that are relevant to the query (high r_j, j = 1, 2, ..., n). [sent-71, score-0.16]

24 For example, if xi is close to the center of a big cluster relevant to the query, the value of q i is large. [sent-75, score-0.076]

25 Objective Function: With the above notation, our goal is to find a subset T of k examples which are both relevant to the query and diversified among themselves. [sent-77, score-0.191]

26 arg max_{|T|=k} g(T) = w Σ_{i∈T} q_i r_i − Σ_{i,j∈T} r_i S_{i,j} r_j (1), where w is a positive regularization parameter that defines the trade-off between the two terms, and T consists of the indices of the k examples that will be returned in the ranking list. [sent-79, score-0.576]

27 Intuitively, in the goodness function g(T ), the first term measures the weighted overall relevance of T with respect to the query, and q i is the weight for xi . [sent-80, score-0.392]

28 In other words, if two examples are equally relevant to the query, one from a big cluster and the other isolated, by using the weighted relevance, we prefer the former. [sent-82, score-0.107]

29 The second term measures 2 the similarity among the examples within T . [sent-83, score-0.148]

30 By including this term in the objective function, we seek a set of examples which are relevant to the query, but also dissimilar to each other. [sent-85, score-0.101]
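
To make the two terms in g(T) concrete, here is a minimal NumPy sketch of the goodness function in Equation (1); the function name goodness and the dense-matrix representation are illustrative assumptions rather than code from the paper.

import numpy as np

def goodness(S, r, T, w=2.0):
    # Sketch of g(T) from Equation (1): weighted relevance minus within-set similarity.
    S = np.asarray(S, dtype=float)          # n x n non-negative, symmetric similarity
    r = np.asarray(r, dtype=float)          # n non-negative relevance scores
    T = np.asarray(sorted(set(T)), dtype=int)
    q = S.dot(r)                            # reference vector q = S r
    relevance_term = w * np.sum(q[T] * r[T])                 # w * sum_{i in T} q_i r_i
    similarity_term = r[T].dot(S[np.ix_(T, T)]).dot(r[T])    # sum_{i,j in T} r_i S_{i,j} r_j
    return relevance_term - similarity_term

Larger w shifts the balance toward relevance; w ≥ 2 is the regime assumed by the algorithm and the monotonicity result below.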

31 The Hardness of Equation (1): In the optimization problem in Equation (1), we want to find a subset T of k examples that collectively maximize the goodness function g(T). [sent-88, score-0.136]

32 Under such settings, the objective function in Equation (1) becomes g(T) = 2 Σ_{i∈T} q_i r_i − Σ_{i,j∈T} r_i W̄_{i,j} r_j = 2 Σ_{i∈T} Σ_{j=1}^{|V|} r_i W̄_{i,j} r_j − Σ_{i,j∈T} r_i W̄_{i,j} r_j (by the definition of q). [sent-103, score-0.622]

33 By the symmetry of W̄, this further equals Σ_{i,j∈T} r_i W̄_{i,j} r_j + 2 Σ_{i∈Q} Σ_{j∈T} r_i W̄_{i,j} r_j, where Q denotes the complement of T. [sent-104, score-0.119]

34 To this end, we present the so-called diminishing returns property of the goodness function g(T ) defined in Equation (1), which is summarized in the following theorem. [sent-112, score-0.186]

35 By Theorem 2.2, if we add more examples into an existing top-k ranking list, the goodness of the overall ranking list is non-decreasing (P2). [sent-114, score-1.141]

36 However, the marginal benefit of adding additional examples into the ranking list decreases wrt the size of the existing ranking list (P1). [sent-115, score-1.254]

37 The goodness function g(T ) defined in Equation (1) has the following properties: (P1) submodularity. [sent-119, score-0.078]

38 For any w > 0, the objective function g(T ) is submodular wrt T ; (P2) monotonicity. [sent-120, score-0.082]

39 For any w ≥ 2, the objective function g(T) is monotonically non-decreasing wrt T. [sent-121, score-0.082]

40 Therefore, we have (g(T1 ∪ x) − g(T1)) − (g(T2 ∪ x) − g(T2)) = 2r_x Σ_{j∈T2} S_{x,j} r_j − 2r_x Σ_{j∈T1} S_{x,j} r_j = 2r_x Σ_{j∈T2\T1} S_{x,j} r_j ≥ 0 (7), which completes the proof of (P1). [sent-125, score-0.437]
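
Spelling out the marginal gain that this argument relies on (a restatement derived from Equation (1), not an additional result of the paper): for any example x not in T,

\[ g(T \cup \{x\}) - g(T) \;=\; w\, q_x r_x \;-\; S_{x,x} r_x^2 \;-\; 2\, r_x \sum_{j \in T} S_{x,j} r_j . \]

The first two terms do not depend on T, so the difference of marginal gains for T1 ⊆ T2 collapses to the expression in Equation (7), which gives (P1). Moreover, since q_x = Σ_{j=1..n} S_{x,j} r_j sums over all n examples (including x itself and every j ∈ T), taking w ≥ 2 is enough to keep the marginal gain non-negative, which is where the condition in (P2) comes from.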

41 Algorithm Description: Based on the diminishing returns property of the goodness function g(T), we propose the following greedy algorithm to find a diversified top-k ranking list. [sent-130, score-0.635]

42 In Algorithm 1, after we calculate the reference vector q (Step 1) and initialize the ranking list T (Step 2), we expand the ranking list T one example at a time (Steps 4-8). [sent-132, score-1.191]

43 At each iteration, we add the example with the highest score s_i to the current ranking list T (Step 5). [sent-133, score-0.625]

44 Each time we expand the current ranking list, we update the score vector s based on the newly added example i (Step 7). [sent-134, score-0.467]

45 In Algorithm 1, ‘⊗’ denotes element-wise multiplication, and diag(S) returns an n × 1 vector whose elements are the diagonal elements of the similarity matrix S. [sent-136, score-0.139]

46 Algorithm 1 (GenDeR). Input: the similarity matrix S (n × n), the relevance vector r (n × 1), the weight w ≥ 2, and the budget k; Output: a subset T of k nodes. [sent-137, score-0.373]

47 1: Compute the reference vector q: q = Sr; 2: Initialize T as an empty set; 3: Initialize the score vector s = w × (q ⊗ r) − diag(S) ⊗ r ⊗ r; 4: for iter = 1 : k do; 5: Find i = argmax_j (s_j | j = 1, ..., n; j ∉ T); [sent-138, score-0.087]

48 6: Add i to T; 7: Update the score vector s ← s − 2 r_i S_{:,i} ⊗ r; 8: end for; 9: Return the subset T as the ranking list (earlier selected examples are ranked higher). [sent-141, score-0.711]
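
As a concrete illustration of Algorithm 1, here is a minimal NumPy sketch; it is not the authors' code, and the function name gender_greedy, the dense-matrix inputs, and the tie-breaking of argmax are assumptions made for the example.

import numpy as np

def gender_greedy(S, r, w=2.0, k=10):
    # Greedy diversified top-k selection following Steps 1-9 of Algorithm 1 (GenDeR).
    S = np.asarray(S, dtype=float)            # non-negative, symmetric similarity matrix
    r = np.asarray(r, dtype=float)            # non-negative relevance scores
    n = r.shape[0]
    q = S.dot(r)                              # Step 1: reference vector q = S r
    T = []                                    # Step 2: empty ranking list
    s = w * (q * r) - np.diag(S) * r * r      # Step 3: s_j = g({j}) - g(empty set)
    selected = np.zeros(n, dtype=bool)
    for _ in range(min(k, n)):                # Steps 4-8
        i = int(np.argmax(np.where(selected, -np.inf, s)))   # Step 5: best remaining example
        T.append(i)                           # Step 6
        selected[i] = True
        s = s - 2.0 * r[i] * S[:, i] * r      # Step 7: discount examples similar to i
    return T                                  # Step 9: earlier selections rank higher

Each iteration costs O(n) for the argmax plus one column of S for the score update, which matches the complexity discussion below.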

49 The key of the proof is to verify that for any example x_j ∉ T, s_j = g(T ∪ x_j) − g(T), where s is the score vector we calculate in Step 3 or update in Step 7, and T is the initial empty ranking list or the current ranking list in Step 6. [sent-153, score-1.232]

50 The remaining part of the proof directly follows from the diminishing returns property of the goodness function in Theorem 2.2. [sent-154, score-0.186]
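
A quick numerical sanity check of this claim, assuming the goodness and gender_greedy sketches above and purely synthetic random data, is to re-run the greedy selection while recomputing every marginal gain from scratch and confirm that the two selections agree:

import numpy as np

rng = np.random.default_rng(0)
n, k, w = 30, 5, 2.0
A = rng.random((n, n))
S = (A + A.T) / 2.0                        # symmetric, non-negative similarity
r = rng.random(n)                          # non-negative relevance scores

T_fast = gender_greedy(S, r, w=w, k=k)     # incremental scores (Steps 3 and 7)

T_slow = []                                # brute-force greedy: recompute g(T u {j}) - g(T)
for _ in range(k):
    gains = [(-np.inf if j in T_slow
              else goodness(S, r, T_slow + [j], w) - goodness(S, r, T_slow, w))
             for j in range(n)]
    T_slow.append(int(np.argmax(gains)))

assert T_fast == T_slow                    # maintained scores equal the true marginal gains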

51 The quadratic term in the time complexity is the cost of computing the reference vector (i.e., q = Sr); and the quadratic term in the space complexity is the cost to store the similarity matrix S. [sent-161, score-0.082]

52 If the similarity matrix S is sparse, say we have m non-zero elements in S, we can reduce the time complexity to O(m + nk); and reduce the space complexity to O(m + n + k). [sent-162, score-0.082]
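
A sketch of how the sparse case might be handled with SciPy (the CSC layout, variable names, and synthetic data below are assumptions, not prescribed by the paper): storing S in compressed-sparse-column form lets the Step 7 update touch only the non-zero entries of one column.

import numpy as np
from scipy.sparse import random as sparse_random

n, w = 1000, 2.0
S = sparse_random(n, n, density=0.01, format="csc", random_state=0)
S = ((S + S.T) * 0.5).tocsc()              # symmetric, non-negative, m non-zeros
r = np.random.default_rng(0).random(n)

q = S.dot(r)                               # Step 1: q = S r costs O(m)
s = w * (q * r) - S.diagonal() * r * r     # Step 3

i = int(np.argmax(s))                      # Step 5, first iteration
start, end = S.indptr[i], S.indptr[i + 1]
rows, vals = S.indices[start:end], S.data[start:end]   # non-zeros of column i
s[rows] -= 2.0 * r[i] * vals * r[rows]     # Step 7 in time proportional to nnz of column i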

53 As all these methods aim to improve the diversity of PageRank-type of algorithms, we also present the results by PageRank [13] itself as the baseline. [sent-170, score-0.203]

54 We use two real data sets, including an IMDB actor professional network and an academic citation data set. [sent-171, score-0.358]

55 The actor professional network is constructed from the Internet Movie Database (IMDB), where the nodes are the actors/actresses and the edges are the numbers of co-starred movies between two actors/actresses. [sent-177, score-0.237]

56 For the inputs of GenDeR, we use the adjacency matrix of the co-starred network as the similarity function S; and the ranking results by ‘DR’ as the relevance vector r. [sent-178, score-0.795]

57 Given a top-k ranking list, we use the density of the subgraph of S induced by the k nodes as an inverse measure of diversity (lower density means higher diversity). [sent-179, score-0.739]

58 We also measure the diversity of the ranking list by the so-called ‘country coverage’ as well as ‘movie coverage’ (higher coverage means higher diversity), which are defined in [24]. [sent-180, score-1.023]
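
For completeness, an assumed (not from the paper) sketch of these evaluation measures; the exact normalization of the density and the form of the attribute lists are illustrative choices.

import numpy as np

def induced_density(S, T):
    # Average pairwise similarity among the selected nodes (lower means more diverse).
    T = np.asarray(sorted(set(T)), dtype=int)
    k = len(T)
    if k < 2:
        return 0.0
    sub = np.asarray(S, dtype=float)[np.ix_(T, T)]
    return (sub.sum() - np.trace(sub)) / (k * (k - 1))

def coverage(attributes, T):
    # Number of distinct attributes (e.g., countries or movies) covered by the ranking list T.
    covered = set()
    for i in T:
        covered.update(attributes[i])      # attributes[i]: iterable of attributes of example i
    return len(covered)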

59 Notice that a good top-k diversified ranking list often requires a balance between diversity and relevance in order to fulfill the user’s information need. [sent-181, score-0.886]

60 Therefore, we also present the relevance score (measured by PageRank) captured by the entire top-k ranking list. [sent-182, score-0.724]

61 In this application, such a relevance score measures the overall prestige of the actors/actresses in the ranking list. [sent-183, score-0.837]

62 From Fig. 1(d), we can see that our GenDeR is as good as ‘PageRank’ in terms of capturing the relevance of the entire top-k ranking list (notice that the two curves almost overlap with each other). [sent-189, score-0.841]

63 On the other hand, GenDeR outperforms ‘PageRank’ in terms of the diversity by all the three measures (Fig. 1(a-c)). [sent-190, score-0.238]

64 Since GenDeR uses the ranking results by ‘DR’ as its input, ‘DR’ can be viewed as another baseline method. [sent-192, score-0.426]

65 For example, when k ≥ 300, GenDeR returns both higher ‘country-coverage’ and higher ‘movie-coverage’ than ‘DR’. [sent-196, score-0.084]

66 In Fig. 1(d), our GenDeR captures higher relevance scores than ‘DR’, indicating that the actors/actresses in our ranking list might be more prestigious than those returned by ‘DR’. [sent-200, score-0.868]

67 Based on these results, we conclude that our GenDeR indeed improves ‘DR’ in terms of both diversity and relevance. [sent-201, score-0.203]

68 Therefore, we conclude that ‘MF’ performs comparably with or slightly better than our GenDeR in terms of diversity, at the cost of sacrificing the relevance of the entire ranking list. [sent-210, score-0.683]

69 As for ‘GCD’, although it leads to the lowest density, it performs poorly in terms of balancing between the diversity and the relevance (Fig. 1(d)). [sent-211, score-0.46]

70 It also performs poorly in terms of the coverage of countries/movies. [sent-212, score-0.209]

71 It consists of a paper citation network and a researcher citation network. [sent-216, score-0.385]

72 Here, the nodes are papers or researchers; and the edges indicate the citation relationship. [sent-217, score-0.217]

73 Overall, we have 11,609 papers and 54,208 edges in the paper citation network; 9,641 researchers and 229,719 edges in the researcher citation network. [sent-218, score-0.444]

74 For the inputs of GenDeR, we use the symmetrized adjacency matrix as the similarity function S; and the ranking results by ‘DR’ as the relevance vector r. [sent-219, score-0.765]

75 We use the same measure as in [11] (referred to as ‘coverage’), which is the total number of unique papers/researchers that cite the top-k papers/researchers in the ranking list. [sent-220, score-0.426]

76 As pointed out in [11], the ‘coverage’ might provide a better measure of the overall quality of the top-k ranking list than traditional measures. [sent-221, score-0.641]

77 Traditional measures (e.g., the h-index) ignore the diversity of the ranking list. [sent-223, score-0.629]

78 (c) Density (lower is better), (d) Relevance (higher is better); Figure 1: The evaluations on the actor professional network (x-axis: k from 0 to 500). [sent-246, score-0.148]

79 (a-c) are different diversity measures and (d) measures the relevance of the entire ranking list. [sent-247, score-0.956]

80 For example, with k = 50, GenDeR improves the ‘coverage’ of the next best method by 416 and 157 on the two citation networks, respectively. [sent-249, score-0.158]

81 Related Work: Carbonell et al. [5] are among the first to study diversified ranking in the context of text retrieval and summarization. [sent-252, score-0.491]

82 To this end, they propose the Maximal Marginal Relevance (MMR) criterion, a linear combination of relevance and novelty, to reduce redundancy while maintaining query relevance. [sent-253, score-0.374]

83 In [23], Zhai et al address this problem from a different perspective by explicitly modeling the subtopics associated with a query, and proposing a framework to evaluate subtopic retrieval. [sent-254, score-0.139]

84 With the prevalence of graph data, such as social networks, author/paper citation networks, actor professional networks, etc, researchers have started to study the problem of diversified ranking in the presence of relationships among the examples. [sent-257, score-0.844]

85 For instance, in [24], Zhu et al propose the GRASSHOPPER algorithm by constructing random walks on the input graph, and iteratively turning the ranked nodes into absorbing states. [sent-258, score-0.21]

86 In [11], Mei et al propose the DivRank algorithm based on a reinforced random walk defined on the input graph, which automatically balances the prestige and the diversity among the top ranked nodes due to the fact that adjacent nodes compete for their ranking scores. [sent-259, score-0.904]

87 In [16], Tong et al propose a scalable algorithm to find the near-optimal solution to diversify the top-k ranking list for PageRank. [sent-260, score-0.748]

88 On a higher level, the method in [16] can be roughly viewed as an instantiation of our proposed formulation with specific choices in the optimization problem. [sent-262, score-0.08]

89 These specific choices include the relevance function, the similarity function, the regularization parameter, etc. [sent-263, score-0.339]

90 In [25], Zhu et al leverage the stopping points in the manifold ranking algorithms to diversify the results. [sent-264, score-0.59]

91 All these works aim to diversify the results of one specific type of ranking function (i.e., PageRank-type algorithms on graph data). [sent-265, score-0.525]

92 There are also many works on learning ranking functions or distance metrics with other objectives (e.g., improving the F-score in the ranking task, improving the classification accuracy in metric learning, etc.) without the consideration of diversity. [sent-274, score-0.482]

93 Nonetheless, thanks to the generality of our formulation, the learned ranking functions and metric functions from most of these works can be naturally admitted into our optimization objective function. [sent-275, score-0.48]

94 In other words, our formulation brings the possibility to take advantage of these existing research results in the diversified ranking setting. [sent-276, score-0.452]

95 For example, such knowledge can be reflected in the (learnt) ranking function and/or the (learnt) similarity function, which can in turn serve as the input of our method. [sent-280, score-0.508]

96 The key feature of our formulation lies in its generality: it admits any non-negative relevance function and any non-negative, symmetric similarity function as input, and outputs a top-k ranking list that enjoys both relevance and diversity. [sent-282, score-1.206]

97 Topic-sensitive pagerank: A context-sensitive ranking algorithm for web search. [sent-348, score-0.426]

98 Divrank: the interplay of prestige and diversity in information networks. [sent-370, score-0.259]

99 The PageRank citation ranking: Bringing order to the web. [sent-386, score-0.158]

100 Social network effects on performance and layoffs: Evidence from the adoption of a social networking tool. [sent-453, score-0.085]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ranking', 0.426), ('gender', 0.376), ('diversi', 0.347), ('relevance', 0.257), ('coverage', 0.209), ('diversity', 0.203), ('pagerank', 0.188), ('gcd', 0.169), ('list', 0.158), ('citation', 0.158), ('dr', 0.155), ('mf', 0.151), ('rj', 0.136), ('ri', 0.119), ('query', 0.117), ('diversify', 0.099), ('professional', 0.083), ('similarity', 0.082), ('goodness', 0.078), ('al', 0.065), ('actor', 0.065), ('team', 0.065), ('returns', 0.057), ('etc', 0.057), ('divrank', 0.056), ('prestige', 0.056), ('ranked', 0.055), ('social', 0.055), ('country', 0.055), ('wrt', 0.055), ('diminishing', 0.051), ('movie', 0.045), ('hardness', 0.044), ('relevant', 0.043), ('wq', 0.043), ('sigir', 0.041), ('equation', 0.041), ('score', 0.041), ('mei', 0.041), ('researcher', 0.039), ('abdelzaher', 0.037), ('agrawal', 0.037), ('assembling', 0.037), ('capannini', 0.037), ('diversifying', 0.037), ('dks', 0.037), ('jaguar', 0.037), ('reinforced', 0.037), ('specializations', 0.037), ('subtopic', 0.037), ('subtopics', 0.037), ('szymanski', 0.037), ('rx', 0.036), ('measures', 0.035), ('budget', 0.034), ('walks', 0.033), ('imdb', 0.033), ('yorktown', 0.033), ('carbonell', 0.033), ('mmr', 0.033), ('resistive', 0.033), ('researchers', 0.033), ('big', 0.033), ('www', 0.033), ('tong', 0.032), ('examples', 0.031), ('nodes', 0.031), ('welch', 0.03), ('diminish', 0.03), ('heights', 0.03), ('network', 0.03), ('zhu', 0.029), ('letters', 0.029), ('completes', 0.029), ('skills', 0.029), ('zhai', 0.029), ('edges', 0.028), ('improving', 0.028), ('watson', 0.027), ('documents', 0.027), ('higher', 0.027), ('objective', 0.027), ('optimization', 0.027), ('density', 0.026), ('formulation', 0.026), ('absorbing', 0.026), ('notice', 0.026), ('cats', 0.025), ('bennett', 0.025), ('user', 0.025), ('job', 0.024), ('graph', 0.024), ('army', 0.024), ('greedy', 0.023), ('kdd', 0.023), ('empty', 0.023), ('reference', 0.023), ('overall', 0.022), ('academic', 0.022), ('sr', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000008 141 nips-2012-GenDeR: A Generic Diversified Ranking Algorithm

Author: Jingrui He, Hanghang Tong, Qiaozhu Mei, Boleslaw Szymanski

Abstract: Diversified ranking is a fundamental task in machine learning. It is broadly applicable in many real world problems, e.g., information retrieval, team assembling, product search, etc. In this paper, we consider a generic setting where we aim to diversify the top-k ranking list based on an arbitrary relevance function and an arbitrary similarity function among all the examples. We formulate it as an optimization problem and show that in general it is NP-hard. Then, we show that for a large volume of the parameter space, the proposed objective function enjoys the diminishing returns property, which enables us to design a scalable, greedy algorithm to find the (1 − 1/e) near-optimal solution. Experimental results on real data sets demonstrate the effectiveness of the proposed algorithm.

2 0.24984808 323 nips-2012-Statistical Consistency of Ranking Methods in A Rank-Differentiable Probability Space

Author: Yanyan Lan, Jiafeng Guo, Xueqi Cheng, Tie-yan Liu

Abstract: This paper is concerned with the statistical consistency of ranking methods. Recently, it was proven that many commonly used pairwise ranking methods are inconsistent with the weighted pairwise disagreement loss (WPDL), which can be viewed as the true loss of ranking, even in a low-noise setting. This result is interesting but also surprising, given that the pairwise ranking methods have been shown very effective in practice. In this paper, we argue that the aforementioned result might not be conclusive, depending on what kind of assumptions are used. We give a new assumption that the labels of objects to rank lie in a rank-differentiable probability space (RDPS), and prove that the pairwise ranking methods become consistent with WPDL under this assumption. What is especially inspiring is that RDPS is actually not stronger than but similar to the low-noise setting. Our studies provide theoretical justifications of some empirical findings on pairwise ranking methods that are unexplained before, which bridge the gap between theory and applications.

3 0.13971069 278 nips-2012-Probabilistic n-Choose-k Models for Classification and Ranking

Author: Kevin Swersky, Brendan J. Frey, Daniel Tarlow, Richard S. Zemel, Ryan P. Adams

Abstract: In categorical data there is often structure in the number of variables that take on each label. For example, the total number of objects in an image and the number of highly relevant documents per query in web search both tend to follow a structured distribution. In this paper, we study a probabilistic model that explicitly includes a prior distribution over such counts, along with a count-conditional likelihood that defines probabilities over all subsets of a given size. When labels are binary and the prior over counts is a Poisson-Binomial distribution, a standard logistic regression model is recovered, but for other count distributions, such priors induce global dependencies and combinatorics that appear to complicate learning and inference. However, we demonstrate that simple, efficient learning procedures can be derived for more general forms of this model. We illustrate the utility of the formulation by exploring applications to multi-object classification, learning to rank, and top-K classification. 1

4 0.12653299 169 nips-2012-Label Ranking with Partial Abstention based on Thresholded Probabilistic Models

Author: Weiwei Cheng, Willem Waegeman, Volkmar Welker, Eyke Hüllermeier

Abstract: Several machine learning methods allow for abstaining from uncertain predictions. While being common for settings like conventional classification, abstention has been studied much less in learning to rank. We address abstention for the label ranking setting, allowing the learner to declare certain pairs of labels as being incomparable and, thus, to predict partial instead of total orders. In our method, such predictions are produced via thresholding the probabilities of pairwise preferences between labels, as induced by a predicted probability distribution on the set of all rankings. We formally analyze this approach for the Mallows and the Plackett-Luce model, showing that it produces proper partial orders as predictions and characterizing the expressiveness of the induced class of partial orders. These theoretical results are complemented by experiments demonstrating the practical usefulness of the approach. 1

5 0.12154428 75 nips-2012-Collaborative Ranking With 17 Parameters

Author: Maksims Volkovs, Richard S. Zemel

Abstract: The primary application of collaborate filtering (CF) is to recommend a small set of items to a user, which entails ranking. Most approaches, however, formulate the CF problem as rating prediction, overlooking the ranking perspective. In this work we present a method for collaborative ranking that leverages the strengths of the two main CF approaches, neighborhood- and model-based. Our novel method is highly efficient, with only seventeen parameters to optimize and a single hyperparameter to tune, and beats the state-of-the-art collaborative ranking methods. We also show that parameters learned on datasets from one item domain yield excellent results on a dataset from very different item domain, without any retraining. 1

6 0.098421797 196 nips-2012-Learning with Partially Absorbing Random Walks

7 0.09416797 165 nips-2012-Iterative ranking from pair-wise comparisons

8 0.090344943 286 nips-2012-Random Utility Theory for Social Choice

9 0.084749252 304 nips-2012-Selecting Diverse Features via Spectral Regularization

10 0.08469703 148 nips-2012-Hamming Distance Metric Learning

11 0.08205761 252 nips-2012-On Multilabel Classification and Ranking with Partial Feedback

12 0.078785881 274 nips-2012-Priors for Diversity in Generative Latent Variable Models

13 0.076242171 22 nips-2012-A latent factor model for highly multi-relational data

14 0.071371913 330 nips-2012-Supervised Learning with Similarity Functions

15 0.069987252 271 nips-2012-Pointwise Tracking the Optimal Regression Function

16 0.064920679 60 nips-2012-Bayesian nonparametric models for ranked data

17 0.061499603 31 nips-2012-Action-Model Based Multi-agent Plan Recognition

18 0.057587411 276 nips-2012-Probabilistic Event Cascades for Alzheimer's disease

19 0.054928437 153 nips-2012-How Prior Probability Influences Decision Making: A Unifying Probabilistic Model

20 0.053643152 67 nips-2012-Classification Calibration Dimension for General Multiclass Losses


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.134), (1, 0.017), (2, -0.0), (3, -0.048), (4, 0.002), (5, -0.076), (6, -0.002), (7, 0.134), (8, 0.035), (9, 0.196), (10, -0.074), (11, 0.02), (12, -0.076), (13, 0.064), (14, 0.091), (15, 0.031), (16, 0.055), (17, 0.028), (18, -0.017), (19, 0.037), (20, 0.067), (21, 0.02), (22, -0.009), (23, 0.094), (24, 0.134), (25, 0.024), (26, 0.025), (27, 0.128), (28, -0.128), (29, -0.061), (30, 0.136), (31, -0.173), (32, 0.068), (33, -0.005), (34, 0.045), (35, -0.046), (36, -0.027), (37, -0.077), (38, 0.113), (39, 0.028), (40, 0.011), (41, -0.063), (42, 0.006), (43, 0.071), (44, -0.09), (45, 0.002), (46, -0.026), (47, -0.028), (48, -0.063), (49, 0.001)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96214026 141 nips-2012-GenDeR: A Generic Diversified Ranking Algorithm

Author: Jingrui He, Hanghang Tong, Qiaozhu Mei, Boleslaw Szymanski

Abstract: Diversified ranking is a fundamental task in machine learning. It is broadly applicable in many real world problems, e.g., information retrieval, team assembling, product search, etc. In this paper, we consider a generic setting where we aim to diversify the top-k ranking list based on an arbitrary relevance function and an arbitrary similarity function among all the examples. We formulate it as an optimization problem and show that in general it is NP-hard. Then, we show that for a large volume of the parameter space, the proposed objective function enjoys the diminishing returns property, which enables us to design a scalable, greedy algorithm to find the (1 − 1/e) near-optimal solution. Experimental results on real data sets demonstrate the effectiveness of the proposed algorithm.

2 0.88963693 323 nips-2012-Statistical Consistency of Ranking Methods in A Rank-Differentiable Probability Space

Author: Yanyan Lan, Jiafeng Guo, Xueqi Cheng, Tie-yan Liu

Abstract: This paper is concerned with the statistical consistency of ranking methods. Recently, it was proven that many commonly used pairwise ranking methods are inconsistent with the weighted pairwise disagreement loss (WPDL), which can be viewed as the true loss of ranking, even in a low-noise setting. This result is interesting but also surprising, given that the pairwise ranking methods have been shown very effective in practice. In this paper, we argue that the aforementioned result might not be conclusive, depending on what kind of assumptions are used. We give a new assumption that the labels of objects to rank lie in a rank-differentiable probability space (RDPS), and prove that the pairwise ranking methods become consistent with WPDL under this assumption. What is especially inspiring is that RDPS is actually not stronger than but similar to the low-noise setting. Our studies provide theoretical justifications of some empirical findings on pairwise ranking methods that are unexplained before, which bridge the gap between theory and applications.

3 0.78937179 169 nips-2012-Label Ranking with Partial Abstention based on Thresholded Probabilistic Models

Author: Weiwei Cheng, Willem Waegeman, Volkmar Welker, Eyke Hüllermeier

Abstract: Several machine learning methods allow for abstaining from uncertain predictions. While being common for settings like conventional classification, abstention has been studied much less in learning to rank. We address abstention for the label ranking setting, allowing the learner to declare certain pairs of labels as being incomparable and, thus, to predict partial instead of total orders. In our method, such predictions are produced via thresholding the probabilities of pairwise preferences between labels, as induced by a predicted probability distribution on the set of all rankings. We formally analyze this approach for the Mallows and the Plackett-Luce model, showing that it produces proper partial orders as predictions and characterizing the expressiveness of the induced class of partial orders. These theoretical results are complemented by experiments demonstrating the practical usefulness of the approach. 1

4 0.60876983 278 nips-2012-Probabilistic n-Choose-k Models for Classification and Ranking

Author: Kevin Swersky, Brendan J. Frey, Daniel Tarlow, Richard S. Zemel, Ryan P. Adams

Abstract: In categorical data there is often structure in the number of variables that take on each label. For example, the total number of objects in an image and the number of highly relevant documents per query in web search both tend to follow a structured distribution. In this paper, we study a probabilistic model that explicitly includes a prior distribution over such counts, along with a count-conditional likelihood that defines probabilities over all subsets of a given size. When labels are binary and the prior over counts is a Poisson-Binomial distribution, a standard logistic regression model is recovered, but for other count distributions, such priors induce global dependencies and combinatorics that appear to complicate learning and inference. However, we demonstrate that simple, efficient learning procedures can be derived for more general forms of this model. We illustrate the utility of the formulation by exploring applications to multi-object classification, learning to rank, and top-K classification. 1

5 0.57506937 196 nips-2012-Learning with Partially Absorbing Random Walks

Author: Xiao-ming Wu, Zhenguo Li, Anthony M. So, John Wright, Shih-fu Chang

Abstract: We propose a novel stochastic process that is with probability αi being absorbed at current state i, and with probability 1 − αi follows a random edge out of it. We analyze its properties and show its potential for exploring graph structures. We prove that under proper absorption rates, a random walk starting from a set S of low conductance will be mostly absorbed in S. Moreover, the absorption probabilities vary slowly inside S, while dropping sharply outside, thus implementing the desirable cluster assumption for graph-based learning. Remarkably, the partially absorbing process unifies many popular models arising in a variety of contexts, provides new insights into them, and makes it possible for transferring findings from one paradigm to another. Simulation results demonstrate its promising applications in retrieval and classification.

6 0.55092663 286 nips-2012-Random Utility Theory for Social Choice

7 0.51880705 165 nips-2012-Iterative ranking from pair-wise comparisons

8 0.51861346 75 nips-2012-Collaborative Ranking With 17 Parameters

9 0.44959199 22 nips-2012-A latent factor model for highly multi-relational data

10 0.4446134 338 nips-2012-The Perturbed Variation

11 0.43038878 330 nips-2012-Supervised Learning with Similarity Functions

12 0.4043552 148 nips-2012-Hamming Distance Metric Learning

13 0.38124198 67 nips-2012-Classification Calibration Dimension for General Multiclass Losses

14 0.38035527 30 nips-2012-Accuracy at the Top

15 0.37120998 288 nips-2012-Rational inference of relative preferences

16 0.36361301 230 nips-2012-Multiple Choice Learning: Learning to Produce Multiple Structured Outputs

17 0.33958322 276 nips-2012-Probabilistic Event Cascades for Alzheimer's disease

18 0.31939459 46 nips-2012-Assessing Blinding in Clinical Trials

19 0.3170813 31 nips-2012-Action-Model Based Multi-agent Plan Recognition

20 0.31472632 155 nips-2012-Human memory search as a random walk in a semantic network


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.056), (17, 0.014), (19, 0.262), (21, 0.035), (38, 0.129), (39, 0.012), (42, 0.024), (53, 0.015), (54, 0.027), (55, 0.021), (74, 0.078), (76, 0.148), (80, 0.061), (92, 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.79449081 141 nips-2012-GenDeR: A Generic Diversified Ranking Algorithm

Author: Jingrui He, Hanghang Tong, Qiaozhu Mei, Boleslaw Szymanski

Abstract: Diversified ranking is a fundamental task in machine learning. It is broadly applicable in many real world problems, e.g., information retrieval, team assembling, product search, etc. In this paper, we consider a generic setting where we aim to diversify the top-k ranking list based on an arbitrary relevance function and an arbitrary similarity function among all the examples. We formulate it as an optimization problem and show that in general it is NP-hard. Then, we show that for a large volume of the parameter space, the proposed objective function enjoys the diminishing returns property, which enables us to design a scalable, greedy algorithm to find the (1 − 1/e) near-optimal solution. Experimental results on real data sets demonstrate the effectiveness of the proposed algorithm.

2 0.7374779 25 nips-2012-A new metric on the manifold of kernel matrices with application to matrix geometric means

Author: Suvrit Sra

Abstract: Symmetric positive definite (spd) matrices pervade numerous scientific disciplines, including machine learning and optimization. We consider the key task of measuring distances between two spd matrices; a task that is often nontrivial whenever the distance function must respect the non-Euclidean geometry of spd matrices. Typical non-Euclidean distance measures such as the Riemannian metric δR (X, Y ) = log(Y −1/2 XY −1/2 ) F , are computationally demanding and also complicated to use. To allay some of these difficulties, we introduce a new metric on spd matrices, which not only respects non-Euclidean geometry but also offers faster computation than δR while being less complicated to use. We support our claims theoretically by listing a set of theorems that relate our metric to δR (X, Y ), and experimentally by studying the nonconvex problem of computing matrix geometric means based on squared distances. 1

3 0.6455906 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

Author: Du Tran, Junsong Yuan

Abstract: Structured output learning has been successfully applied to object localization, where the mapping between an image and an object bounding box can be well captured. Its extension to action localization in videos, however, is much more challenging, because we need to predict the locations of the action patterns both spatially and temporally, i.e., identifying a sequence of bounding boxes that track the action in video. The problem becomes intractable due to the exponentially large size of the structured video space where actions could occur. We propose a novel structured learning approach for spatio-temporal action localization. The mapping between a video and a spatio-temporal action trajectory is learned. The intractable inference and learning problems are addressed by leveraging an efficient Max-Path search method, thus making it feasible to optimize the model over the whole structured space. Experiments on two challenging benchmark datasets show that our proposed method outperforms the state-of-the-art methods. 1

4 0.64543641 68 nips-2012-Clustering Aggregation as Maximum-Weight Independent Set

Author: Nan Li, Longin J. Latecki

Abstract: We formulate clustering aggregation as a special instance of Maximum-Weight Independent Set (MWIS) problem. For a given dataset, an attributed graph is constructed from the union of the input clusterings generated by different underlying clustering algorithms with different parameters. The vertices, which represent the distinct clusters, are weighted by an internal index measuring both cohesion and separation. The edges connect the vertices whose corresponding clusters overlap. Intuitively, an optimal aggregated clustering can be obtained by selecting an optimal subset of non-overlapping clusters partitioning the dataset together. We formalize this intuition as the MWIS problem on the attributed graph, i.e., finding the heaviest subset of mutually non-adjacent vertices. This MWIS problem exhibits a special structure. Since the clusters of each input clustering form a partition of the dataset, the vertices corresponding to each clustering form a maximal independent set (MIS) in the attributed graph. We propose a variant of simulated annealing method that takes advantage of this special structure. Our algorithm starts from each MIS, which is close to a distinct local optimum of the MWIS problem, and utilizes a local search heuristic to explore its neighborhood in order to find the MWIS. Extensive experiments on many challenging datasets show that: 1. our approach to clustering aggregation automatically decides the optimal number of clusters; 2. it does not require any parameter tuning for the underlying clustering algorithms; 3. it can combine the advantages of different underlying clustering algorithms to achieve superior performance; 4. it is robust against moderate or even bad input clusterings. 1

5 0.64463186 210 nips-2012-Memorability of Image Regions

Author: Aditya Khosla, Jianxiong Xiao, Antonio Torralba, Aude Oliva

Abstract: While long term human visual memory can store a remarkable amount of visual information, it tends to degrade over time. Recent works have shown that image memorability is an intrinsic property of an image that can be reliably estimated using state-of-the-art image features and machine learning algorithms. However, the class of features and image information that is forgotten has not been explored yet. In this work, we propose a probabilistic framework that models how and which local regions from an image may be forgotten using a data-driven approach that combines local and global images features. The model automatically discovers memorability maps of individual images without any human annotation. We incorporate multiple image region attributes in our algorithm, leading to improved memorability prediction of images as compared to previous works. 1

6 0.64454705 112 nips-2012-Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model

7 0.64420813 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes

8 0.64375764 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

9 0.64355856 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

10 0.64332032 69 nips-2012-Clustering Sparse Graphs

11 0.64307588 260 nips-2012-Online Sum-Product Computation Over Trees

12 0.64291942 149 nips-2012-Hierarchical Optimistic Region Selection driven by Curiosity

13 0.64232606 111 nips-2012-Efficient Sampling for Bipartite Matching Problems

14 0.64181268 180 nips-2012-Learning Mixtures of Tree Graphical Models

15 0.64181155 38 nips-2012-Algorithms for Learning Markov Field Policies

16 0.6408391 304 nips-2012-Selecting Diverse Features via Spectral Regularization

17 0.64034647 6 nips-2012-A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation

18 0.64010519 303 nips-2012-Searching for objects driven by context

19 0.63929194 176 nips-2012-Learning Image Descriptors with the Boosting-Trick

20 0.63898993 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models