
142 nips-2007-Non-parametric Modeling of Partially Ranked Data


Source: pdf

Author: Guy Lebanon, Yi Mao

Abstract: Statistical models on full and partial rankings of n items are often of limited practical use for large n due to computational considerations. We explore the use of non-parametric models for partially ranked data and derive efficient procedures for their use for large n. The derivations are largely possible through combinatorial and algebraic manipulations based on the lattice of partial rankings. In particular, we demonstrate for the first time a non-parametric coherent and consistent model capable of efficiently aggregating partially ranked data of different types.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Statistical models on full and partial rankings of n items are often of limited practical use for large n due to computational considerations. [sent-7, score-0.957]

2 We explore the use of non-parametric models for partially ranked data and derive efficient procedures for their use for large n. [sent-8, score-0.215]

3 The derivations are largely possible through combinatorial and algebraic manipulations based on the lattice of partial rankings. [sent-9, score-0.424]

4 In particular, we demonstrate for the first time a non-parametric coherent and consistent model capable of efficiently aggregating partially ranked data of different types. [sent-10, score-0.275]

5 1 Introduction Rankers such as humans, search engines, and classifiers output full or partial rankings representing preference relations over n items. [sent-11, score-0.757]

6 The absence of numeric scores or the lack of calibration between existing numeric scores output by the rankers necessitates modeling rankings rather than numeric scores. [sent-12, score-0.618]

7 (1) Handle efficiently a very large number of items n by reverting to partial rather than full rankings. [sent-14, score-0.55]

8 (2) Probability assignment to full and partial rankings should be coherent and contradiction-free. [sent-15, score-0.79]

9 (3) Conduct inference based on training data consisting of partial rankings of different types. [sent-16, score-0.757]

10 (5) In the case of large n, convergence of the estimator to the underlying process can be extremely slow for fully ranked data, but should be much faster when restricted to simpler partial rankings. [sent-18, score-0.496]

11 By considering partial rankings as censored data we are able to define the model on both full and partial rankings in a coherent and contradiction-free manner. [sent-21, score-1.686]

12 Furthermore, we are able to estimate the underlying structure based on data containing partial rankings of different types. [sent-22, score-0.757]

13 We demonstrate computational efficiency for partial rankings, even in the case of a very large n, by exploiting the combinatorial and algebraic structure of the lattice of partial rankings. [sent-23, score-0.736]

14 We start below by reviewing basic concepts concerning partially ranked data (see [1] for further details) and the Mallows model and then proceed to define our non-parametric estimator. [sent-24, score-0.246]

15 We denote a permutation π using the following vertical bar notation: $\pi^{-1}(1)|\pi^{-1}(2)|\cdots|\pi^{-1}(n)$. [sent-40, score-0.289]

16 In this notation the numbers correspond to items and the locations of the items in their corresponding compartments correspond to their ranks. [sent-42, score-0.475]

17 The collection of all permutations of n items forms the non-Abelian symmetric group of order n, denoted by S n , using function composition as the group operation πσ = π ◦ σ. [sent-43, score-0.536]

18 The concept of inversions and the result below, taken from [7], will be of great use later on. [sent-45, score-0.174]

19 The inversion set of a permutation π is the set of pairs $U(\pi) \stackrel{def}{=} \{(i, j) : i < j,\ \pi(i) > \pi(j)\} \subset \{1, \ldots, n\} \times \{1, \ldots, n\}$. [sent-47, score-0.291]
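
To make the notation concrete, here is a minimal Python sketch (ours, not the authors' code; the function names are hypothetical) that renders a permutation in vertical bar notation and computes its inversion set U(π):

```python
def vertical_bar(pi):
    """Render a permutation pi (a tuple mapping item -> rank, 1-indexed)
    in the vertical bar notation pi^{-1}(1)|pi^{-1}(2)|...|pi^{-1}(n)."""
    n = len(pi)
    rank_to_item = {pi[i]: i + 1 for i in range(n)}  # pi^{-1}
    return "|".join(str(rank_to_item[r]) for r in range(1, n + 1))

def inversion_set(pi):
    """U(pi) = {(i, j) : i < j, pi(i) > pi(j)}: item pairs ranked out of order."""
    n = len(pi)
    return {(i + 1, j + 1) for i in range(n) for j in range(i + 1, n)
            if pi[i] > pi[j]}

pi = (2, 3, 1)                    # pi(1)=2, pi(2)=3, pi(3)=1
print(vertical_bar(pi))           # 3|1|2
print(sorted(inversion_set(pi)))  # [(1, 3), (2, 3)], so i(pi) = 2
```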

20 When n is large, the enormous number of permutations raises difficulties in using the symmetric group for modeling rankings. [sent-59, score-0.215]

21 A reasonable solution is achieved by considering partial rankings which correspond to cosets of the symmetric group. [sent-60, score-0.875]

22 For example, the subgroup of $S_n$ consisting of all permutations that fix the top k positions is denoted $S_{1,\ldots,1,n-k}$. [sent-61, score-0.248]

23 The coset $S_{1,\ldots,1,n-k}\pi$ is the set of permutations consistent with the ordering of π on the k top-ranked items. [sent-74, score-0.217]

24 It may thus be interpreted as a partial ranking of the top k items, which does not contain any information concerning the relative ranking of the bottom n−k items. [sent-75, score-0.933]

25 The set of all such partial rankings forms the quotient space $S_n/S_{1,\ldots,1,n-k}$. [sent-76, score-0.79]

26 Figure 1 (left) displays the set of permutations that corresponds to a partial ranking of the top 2 out of 4 items. [sent-80, score-0.777]

27 We generalize this concept to arbitrary partial rankings using the concept of composition. [sent-81, score-0.809]

28 A composition $\gamma = (\gamma_1, \ldots, \gamma_r)$ corresponds to a partial ranking with γ1 items in the first position, γ2 items in the second position, and so on. [sent-91, score-0.988]

29 For such a partial ranking it is known that the first set of γ1 items are to be ranked before the second set of γ2 items, etc. [sent-92, score-1.143]

30 The partial ranking $S_{1,\ldots,1,n-k}\pi$ of the top k items is a special case corresponding to $\gamma = (1, \ldots, 1, n-k)$. [sent-97, score-0.238]

31 Then the subgroup Sγ contains all permutations π for which the set equalities π(Ni) = Ni, ∀i hold (all permutations that only permute within each Ni). [sent-111, score-0.364]

32 A partial ranking of type γ is equivalent to a coset $S_\gamma\pi = \{\sigma\pi : \sigma \in S_\gamma\}$, $\pi \in S_n$, and the set of such partial rankings forms the quotient space $S_n/S_\gamma$. [sent-112, score-1.529]
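
For small n a coset can be enumerated explicitly by permuting items within each compartment. The sketch below (our own illustration, assuming a compartment-list representation of partial rankings) recovers the two permutations of the partial ranking 3|1|2,4 from Figure 1 (left):

```python
from itertools import permutations, product

def coset(compartments):
    """Enumerate the coset S_gamma * pi given a compartment list,
    e.g. [[3], [1], [2, 4]] for the partial ranking 3|1|2,4.
    Yields full rankings as tuples of items, most preferred first."""
    for choice in product(*[permutations(block) for block in compartments]):
        yield tuple(item for block in choice for item in block)

for ranking in coset([[3], [1], [2, 4]]):
    print("|".join(map(str, ranking)))
# 3|1|2|4
# 3|1|4|2
```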

33 The vertical bar notation described above is particularly convenient for denoting partial rankings. [sent-113, score-0.485]

34 A partial ranking is denoted by a list of the items 1, . . . , n separated by vertical bars, indicating that items on the left side of each vertical bar are preferred to (ranked higher than) items on the right side of the bar. [sent-117, score-0.748]

35 For example, the partial ranking displayed in Figure 1 (left) is denoted by 3|1|2, 4. [sent-118, score-0.614]

36 In the notation above, the ordering of items not separated by a vertical line is meaningless, and for consistency we use the conventional ordering. [sent-119, score-0.427]

37 The set of all partial rankings $W_n \stackrel{def}{=} \{S_\gamma\pi : \pi \in S_n, \forall\gamma\}$ (1), which includes all full rankings $\pi \in S_n$, is a subset of all possible partial orders on $\{1, \ldots, n\}$. [sent-122, score-1.589]

38 While the formalism of partial rankings in Wn cannot realize all partial orderings, it is sufficiently powerful to include many useful naturally occurring orderings as special cases. [sent-126, score-1.129]

39 For example, 1|3|2, 4 corresponds to a top-k ordering, such as the ranked list of the top k webpages output by search engines. [sent-140, score-0.27]

40 In constructing a statistical model on permutations or cosets, it is essential to relate one permutation to another. [sent-147, score-0.291]

41 We do this using a distance function on permutations $d : S_n \times S_n \to \mathbb{R}$ that satisfies the usual metric function properties, and in addition is invariant under item relabeling or right action of the symmetric group [1]: $d(\pi, \sigma) = d(\pi\tau, \sigma\tau)$ for all $\pi, \sigma, \tau \in S_n$. [sent-148, score-0.243]

42 Kendall’s tau $d(\pi, \sigma)$ can be interpreted as the number of pairs of items for which π and σ have opposing orderings (called discordant pairs), or the minimum number of adjacent transpositions needed to bring $\pi^{-1}$ to $\sigma^{-1}$ (an adjacent transposition flips a pair of items having adjacent ranks). [sent-150, score-0.794]

43 By right invariance, $d(\pi, \sigma) = d(\pi\sigma^{-1}, e)$, which for Kendall’s tau equals the number of inversions $i(\pi\sigma^{-1})$. [sent-151, score-0.287]
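
The sketch below (helper names ours) computes Kendall's tau as the number of discordant pairs and numerically checks the right-invariance identity $d(\pi, \sigma) = i(\pi\sigma^{-1})$ on one example:

```python
def num_inversions(tau):
    """i(tau): pairs (i, j) with i < j and tau(i) > tau(j)."""
    n = len(tau)
    return sum(1 for i in range(n) for j in range(i + 1, n) if tau[i] > tau[j])

def kendall_tau(pi, sigma):
    """Number of discordant item pairs between rank maps pi and sigma."""
    n = len(pi)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if (pi[i] - pi[j]) * (sigma[i] - sigma[j]) < 0)

def compose(pi, sigma):
    """(pi o sigma)(i) = pi(sigma(i)), with permutations as 1-indexed tuples."""
    return tuple(pi[sigma[i] - 1] for i in range(len(sigma)))

def inverse(pi):
    inv = [0] * len(pi)
    for i, r in enumerate(pi):
        inv[r - 1] = i + 1
    return tuple(inv)

pi, sigma = (2, 3, 1, 4), (1, 4, 2, 3)
# d(pi, sigma) equals the number of inversions of pi * sigma^{-1}
assert kendall_tau(pi, sigma) == num_inversions(compose(pi, inverse(sigma)))
print(kendall_tau(pi, sigma))  # 2
```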

44 This is an important observation that will allow us to simplify many expressions concerning Kendall’s tau using the theory of permutation inversions from the combinatorics literature. [sent-152, score-0.448]

45 3 The Mallows Model and its Extension to Partial Rankings The Mallows model [5] is a simple model on permutations based on Kendall’s tau distance, using a location parameter κ and a spread parameter c (which we often treat as a constant): $p_\kappa(\pi) = \exp(-c\, d(\pi, \kappa) - \log\psi(c))$, $\pi, \kappa \in S_n$, $c \in \mathbb{R}_+$. [sent-153, score-0.292]

46 A natural extension to partially ranked data is to consider a partial ranking as censored data equivalent to the set of permutations in its related coset: $p_\kappa(S_\gamma\pi) \stackrel{def}{=} \sum_{\tau \in S_\gamma\pi} p_\kappa(\tau) = \psi^{-1}(c) \sum_{\tau \in S_\gamma\pi} \exp(-c\, d(\tau, \kappa))$. [sent-160, score-1.17]
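
For small n this censored extension can be evaluated by brute force, which serves as a reference point for the closed forms derived later. A self-contained sketch (ours; the normalizer ψ(c) is computed by exhaustive enumeration over $S_n$, so it is only feasible for small n, and all names are our own):

```python
from itertools import permutations, product
from math import exp

def kendall(pi, sigma):
    """Kendall's tau between item->rank maps pi and sigma."""
    n = len(pi)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if (pi[i] - pi[j]) * (sigma[i] - sigma[j]) < 0)

def mallows(pi, kappa, c):
    """p_kappa(pi) = exp(-c d(pi, kappa)) / psi(c), psi by exhaustive sum."""
    n = len(kappa)
    psi = sum(exp(-c * kendall(tau, kappa)) for tau in permutations(range(1, n + 1)))
    return exp(-c * kendall(pi, kappa)) / psi

def mallows_coset(compartments, kappa, c):
    """Censored extension p_kappa(S_gamma pi): sum the model over the coset."""
    n = len(kappa)
    total = 0.0
    for choice in product(*[permutations(block) for block in compartments]):
        order = [item for block in choice for item in block]  # best first
        pi = [0] * n
        for rank, item in enumerate(order, start=1):
            pi[item - 1] = rank
        total += mallows(tuple(pi), kappa, c)
    return total

kappa = (1, 2, 3, 4)  # location parameter: the identity permutation
print(mallows_coset([[3], [1], [2, 4]], kappa, c=1.0))  # p_kappa(3|1|2,4)
```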

47 However, the apparent absence of a closed form formula for more general partial rankings prevented the widespread use of the above model for large n and encouraged more ad-hoc and heuristic models [1, 6]. [sent-165, score-0.784]

48 This has become especially noticeable due to a new surge of interest, particularly in the computer science community, in partial ranking models for large n. [sent-166, score-0.588]

49 The ranking lattice presented next enables extending Fligner and Verducci’s closed form to a more general setting which is critical to the practicality of our non-parametric estimator. [sent-167, score-0.391]

50 4 The Ranking Lattice Partial rankings Sγπ relate to each other in a natural way by expressing more general, more specific, or inconsistent orderings. [sent-168, score-0.514]

51 We define below the concepts of partially ordered sets and lattices and then relate them to partial rankings by considering the set of partial rankings Wn as a lattice. [sent-169, score-1.671]

52 Figure 1: A partial ranking, e.g. $S_{1,1,2}\pi = \{\sigma_1\pi, \sigma_2\pi\} = 3|1|2,4$, corresponds to a coset or a set of permutations (left). [sent-172, score-0.94]

53 A partially ordered set or poset $(Q, \preceq)$ is a set Q endowed with a binary relation $\preceq$ satisfying ∀x, y, z ∈ Q: (i) reflexivity: $x \preceq x$; (ii) anti-symmetry: $x \preceq y$ and $y \preceq x \Rightarrow x = y$; (iii) transitivity: $x \preceq y$ and $y \preceq z \Rightarrow x \preceq z$. [sent-176, score-0.22]

54 The set of partial rankings Wn defined in (1) is naturally endowed with the partial order of ranking refinement. [sent-183, score-1.373]

55 A lower bound z of two elements x, y in a poset satisfies $z \preceq x$ and $z \preceq y$. [sent-187, score-0.183]

56 Using the vertical bar notation, two elements x, y are inconsistent iff there exist two items i, j that appear on opposing sides of a vertical bar in x and y, i.e., as i|j in x and j|i in y. [sent-199, score-0.624]

57 While the ranking poset is not a lattice, it may be turned into one by augmenting it with a minimum element $\hat{0}$. [sent-204, score-0.408]

58 The union $\tilde{W}_n \stackrel{def}{=} W_n \cup \{\hat{0}\}$ of the ranking poset and a minimum element is a lattice. [sent-205, score-0.483]

59 Since x and y are consistent, we do not have a pair of items i, j appearing as i|j in x and j|i in y. [sent-212, score-0.23]

60 As a result we can form a lower bound z to x, y by starting with a list of numbers and adding the vertical bars that are in either x or y; for example, for x = 3|1, 2, 5|4 and y = 3|2|1, 4, 5 we have z = 3|2|1, 5|4. [sent-213, score-0.214]
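
One way to realize this construction in code (our sketch; the excerpt gives no algorithm, so treat the greedy layering below as an assumption, as are all names) is to take the union of the precedence relations of x and y and re-group the items. It reproduces the paper's example, z = 3|2|1,5|4:

```python
def precedes(pr, i, j):
    """True if item i is in an earlier compartment than item j in the
    partial ranking pr, given as a compartment list, e.g. [[3], [1, 2, 5], [4]]."""
    pos = {item: k for k, block in enumerate(pr) for item in block}
    return pos[i] < pos[j]

def consistent(x, y):
    """x, y are inconsistent iff some pair appears as i|j in x and j|i in y."""
    items = [i for block in x for i in block]
    return not any(precedes(x, i, j) and precedes(y, j, i)
                   for i in items for j in items if i != j)

def meet(x, y):
    """Lower bound of consistent x, y: i precedes j iff it does in x or in y
    (the vertical bars of both are kept). Assumes consistent(x, y)."""
    items = sorted(i for block in x for i in block)
    before = {(i, j) for i in items for j in items
              if i != j and (precedes(x, i, j) or precedes(y, i, j))}
    z, remaining = [], set(items)
    while remaining:
        # next compartment: items not preceded by any remaining item
        block = sorted(i for i in remaining
                       if not any((j, i) in before for j in remaining))
        if not block:
            raise ValueError("x and y are inconsistent")
        z.append(block)
        remaining -= set(block)
    return z

x = [[3], [1, 2, 5], [4]]  # 3|1,2,5|4
y = [[3], [2], [1, 4, 5]]  # 3|2|1,4,5
print(meet(x, y))          # [[3], [2], [1, 5], [4]], i.e. 3|2|1,5|4
```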

61 If z′ is comparable to z, then z′ ⪯ z, since removing any vertical bar from z results in an element that is not a lower bound. [sent-216, score-0.199]

62 If z′ is not comparable to z, then both z and z′ contain the vertical bars in x and the vertical bars in y, possibly with some additional ones. [sent-217, score-0.326]

63 By construction z contains only the essential vertical bars needed to make it a lower bound, and hence z′ ⪯ z, contradicting the assumption that z and z′ are non-comparable. [sent-218, score-0.276]

64 Figure 2: Censored data in the Hasse diagram of $\tilde{W}_n$ corresponding to two partial rankings with the same (left) and different (right) number of vertical bars. [sent-232, score-1.029]

65 The two big triangles correspond to the Hasse diagram of Figure 1 (right) with permutations occupying the bottom level. [sent-233, score-0.2]

66 5 Non-Parametric Models on the Ranking Lattice The censored data approach to partial ranking described by Equation (5) may be generalized to arbitrary probability models p on $S_n$. [sent-234, score-0.703]

67 Extending a probability model p on $S_n$ to $\tilde{W}_n$ by defining it to be zero on $\tilde{W}_n \setminus S_n$, we consider the partial ranking model $g(S_\gamma\pi) = \sum_{\sigma \in S_\gamma\pi} p(\sigma)$ (6). [sent-235, score-0.612]

68 The function g, when restricted to partial rankings of the same type $G = \{S_\gamma\pi : \pi \in S_n\}$, constitutes a distribution over G. [sent-236, score-0.757]

69 For large n, modeling partial, rather than full, rankings is a computational necessity. [sent-238, score-0.48]

70 It is tempting to construct a statistical model on partial rankings directly, without reference to an underlying permutation model. [sent-239, score-0.873]

71 Figure 2 illustrates this problem for partial rankings with the same (left) and different (right) number of vertical bars. [sent-245, score-0.88]

72 The alternative we present, of starting with a permutation model p : Sn → R and extending it to g via the Möbius inversion, is a simple and effective way of avoiding such lack of coherence. [sent-247, score-0.244]

73 Identifying partially ranked training data $D = \{S_{\gamma_i}\pi_i : i = 1, \ldots, m\}$ with censored data leads to the non-parametric estimator (8) and its partial ranking extension (9). [sent-248, score-0.215]

74 Model (8) and its partial ranking extension (9) satisfy requirement 3 in Section 1, since D contains partial rankings of possibly different types. [sent-252, score-1.419]
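
The exact form of (8) and (9) is not recoverable from this excerpt; the sketch below implements one natural reading consistent with the text, namely a Parzen-style average of Mallows kernels in which each partially ranked training example is spread uniformly over its coset. Treat the weighting and all function names as assumptions, not the authors' implementation. The extension (9) would additionally sum the estimate over the permutations of a test coset $S_\lambda\pi$.

```python
from itertools import permutations, product
from math import exp

def kendall(pi, sigma):
    """Kendall's tau between item->rank maps pi and sigma."""
    n = len(pi)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if (pi[i] - pi[j]) * (sigma[i] - sigma[j]) < 0)

def rank_map(order):
    """Preference order (items, best first) -> item->rank tuple."""
    pi = [0] * len(order)
    for rank, item in enumerate(order, start=1):
        pi[item - 1] = rank
    return tuple(pi)

def coset_members(compartments):
    for choice in product(*[permutations(block) for block in compartments]):
        yield rank_map([item for block in choice for item in block])

def parzen_estimate(pi, data, c, n):
    """hat-p(pi): average of Mallows kernels, one per training example,
    with each censored example spread uniformly over its coset."""
    full = list(permutations(range(1, n + 1)))
    identity = tuple(range(1, n + 1))
    # by right invariance, psi(c) is the same for every kernel center
    psi = sum(exp(-c * kendall(tau, identity)) for tau in full)
    total = 0.0
    for example in data:
        members = list(coset_members(example))
        total += sum(exp(-c * kendall(pi, kappa))
                     for kappa in members) / (psi * len(members))
    return total / len(data)

data = [[[3], [1], [2, 4]],  # training example 3|1|2,4
        [[1, 2], [3, 4]]]    # training example 1,2|3,4
print(parzen_estimate(rank_map([3, 1, 2, 4]), data, c=1.0, n=4))
```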

75 Similarly, by the censored data interpretation of partial rankings, they satisfy requirement 2. [sent-253, score-0.472]

76 In the case of a very large number of items, reverting to partial rankings of type γ is a crucial element. [sent-257, score-0.826]

77 The coherence between $\hat{p}$, $\hat{g}$ and the nature of D are important factors in modeling partially ranked data. [sent-258, score-0.25]

78 In the next section we show that even for n → ∞ (as is nearly the case for web-search), the required computation is feasible as it depends only on the complexity of the composition γ characterizing the data D and the partial rankings on which g is evaluated. [sent-259, score-0.858]

79 The set appearing in the definition of $a_k^\gamma(\tau)$ contains all label pairs (s, t) that are inversions of $\tau^{-1}$ and that appear in the k-th compartment of the decomposition γ. [sent-266, score-0.26]

80 The set appearing in the definition of $b_{kl}^\gamma(\tau)$ contains label pairs (s, t) that are inversions of $\tau^{-1}$ and for which s and t appear in the l-th and k-th compartments of γ respectively. [sent-267, score-0.349]

81 The cross-compartment inversions are (4, 3), (4, 2), making $b_{12}^\gamma(\tau) = 2$. [sent-273, score-0.214]

82 The significance of (10) is that as we sum over all representatives of the coset $\tau \in S_\gamma\pi$, the cross-compartmental inversions $b_{kl}^\gamma(\tau)$ remain constant while the within-compartmental inversions $a_k^\gamma(\tau)$ vary over all possible combinations. [sent-274, score-0.543]

83 For $\pi \in S_n$, $q > 0$, and a composition γ we have $\sum_{\tau \in S_\gamma\pi} q^{i(\tau)} = q^{\sum_{k=1}^{r}\sum_{l=k+1}^{r} b_{kl}^\gamma(\pi)} \prod_{s=1}^{r}\prod_{j=1}^{\gamma_s - 1}\sum_{k=0}^{j} q^k$ (13). [sent-277, score-0.223]

84 $\sum_{\tau \in S_\gamma\pi} q^{i(\tau)} = \sum_{\tau \in S_\gamma\pi} q^{\sum_{k=1}^{r} a_k^\gamma(\tau) + \sum_{k=1}^{r}\sum_{l=k+1}^{r} b_{kl}^\gamma(\tau)} = q^{\sum_{k=1}^{r}\sum_{l=k+1}^{r} b_{kl}^\gamma(\pi)} \sum_{\tau \in S_\gamma\pi} q^{\sum_{k=1}^{r} a_k^\gamma(\tau)} = q^{\sum_{k=1}^{r}\sum_{l=k+1}^{r} b_{kl}^\gamma(\pi)} \prod_{s=1}^{r} \sum_{\tau \in S_{\gamma_s}} q^{i(\tau)} = q^{\sum_{k=1}^{r}\sum_{l=k+1}^{r} b_{kl}^\gamma(\pi)} \prod_{s=1}^{r}\prod_{j=1}^{\gamma_s - 1}\sum_{k=0}^{j} q^k$. [sent-279, score-0.41]
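
The identity can be checked numerically by brute force on small examples. In the sketch below (ours; names hypothetical), `b_sum` computes the constant cross-compartment inversion count from its definition and `closed_form` evaluates the right-hand side of (13); substituting $q = e^{-c}$ gives the Mallows quantities used in the corollary below.

```python
from itertools import permutations, product

def compartment_of(rank, gamma):
    """0-based index of the compartment of gamma containing a given rank."""
    upper = 0
    for k, size in enumerate(gamma):
        upper += size
        if rank <= upper:
            return k

def inversions(pi):
    n = len(pi)
    return sum(1 for s in range(n) for t in range(s + 1, n) if pi[s] > pi[t])

def b_sum(pi, gamma):
    """Total cross-compartment inversions sum_{k<l} b_kl(pi) of a rank map pi."""
    n = len(pi)
    return sum(1 for s in range(n) for t in range(s + 1, n)
               if pi[s] > pi[t]
               and compartment_of(pi[s], gamma) != compartment_of(pi[t], gamma))

def coset_rank_maps(pi, gamma):
    """All rank maps tau in S_gamma pi: permute items within each rank block."""
    by_rank = sorted(range(1, len(pi) + 1), key=lambda item: pi[item - 1])
    blocks, start = [], 0
    for size in gamma:
        blocks.append(by_rank[start:start + size])
        start += size
    for choice in product(*[permutations(block) for block in blocks]):
        tau = [0] * len(pi)
        for rank, item in enumerate((i for b in choice for i in b), start=1):
            tau[item - 1] = rank
        yield tuple(tau)

def closed_form(pi, gamma, q):
    """Right-hand side of (13)."""
    value = q ** b_sum(pi, gamma)
    for size in gamma:
        for j in range(1, size):
            value *= sum(q ** k for k in range(j + 1))
    return value

pi, gamma, q = (2, 4, 1, 3), (1, 3), 0.7  # pi is the ranking 3|1,4,2
brute = sum(q ** inversions(tau) for tau in coset_rank_maps(pi, gamma))
print(abs(brute - closed_form(pi, gamma, q)) < 1e-9)  # True
```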

85 The remaining terms depend only on the partial ranking type γ and thus may be pre-computed and tabulated for efficient computation. [sent-282, score-0.588]

86 The partial ranking extension corresponding to the Mallows model $p_\kappa$ is $p_\kappa(S_\gamma\pi) = \frac{\prod_{s=1}^{r}\prod_{j=1}^{\gamma_s - 1}\sum_{k=0}^{j} e^{-kc}}{\prod_{j=1}^{n-1}\sum_{k=0}^{j} e^{-kc}}\, e^{-c\sum_{k=1}^{r}\sum_{l=k+1}^{r} b_{kl}^\gamma(\pi\kappa^{-1})} \propto e^{-c\sum_{k=1}^{r}\sum_{l=k+1}^{r} b_{kl}^\gamma(\pi\kappa^{-1})}$. Proof. [sent-293, score-0.78]

87 The complexity of computing (14) and (8), (9) for some popular partial ranking types appears in Table 1. [sent-298, score-0.588]

88 We use fully ranked APA election data (rankings are ballots for five APA presidential candidates), and during each iteration, 30% of the examples are randomly selected for testing. [sent-302, score-0.185]

89 To visualize the advantage of the non-parametric model over the Mallows model we display in Figure 3 (bottom row) their estimated probabilities by scaling the vertices of the permutation polytope proportionally. [sent-306, score-0.169]

90 The displayed polytope has vertices corresponding to rankings of 4 items and edges corresponding to adjacent transpositions (Kendall’s tau distance is the shortest path between two vertices). [sent-307, score-0.882]

91 In this case the four ranked items are movies no. 357, 1356, 440, and 25 from the EachMovie dataset, which contains rankings of 1628 movies. [sent-308, score-0.379] [sent-309, score-0.445]

93 Figure 3 (top right) demonstrates modeling partial rankings of a much larger n. [sent-311, score-0.792]

94 We used 10043 rankings from the Jester dataset, which contains user rankings of n = 100 jokes. [sent-312, score-0.919]

95 We kept the partial ranking type of the testing data fixed at (5, n − 5) and experimented with different censoring of the training data. [sent-313, score-0.637]

96 The figure illustrates the slower consistency rate for fully ranked training data and the statistical benefit in censoring full rankings in the training data. [sent-314, score-0.675]

97 non-parametric model for APA election data (left) and non-parametric model with different partial ranking types for Jester data (right). [sent-325, score-0.618]

98 The key component is our ability to efficiently compute (14) for simple partial ranking types and large n. [sent-328, score-0.588]

99 Table 1 indicates that the resulting complexity scales up with the complexity of the composition k but is independent of n, which is critical for modeling practical situations of k ≪ n partial rankings. [sent-329, score-0.448]

100 Experiments show the statistical advantage of the non-parametric partial ranking modeling in addition to its computational feasibility. [sent-330, score-0.623]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('rankings', 0.445), ('partial', 0.312), ('mallows', 0.302), ('ranking', 0.276), ('sn', 0.214), ('items', 0.2), ('wn', 0.187), ('ranked', 0.155), ('coset', 0.151), ('permutations', 0.151), ('inversions', 0.148), ('kc', 0.132), ('poset', 0.132), ('vertical', 0.123), ('permutation', 0.116), ('tau', 0.115), ('censored', 0.115), ('composition', 0.101), ('kl', 0.096), ('cosets', 0.094), ('hasse', 0.094), ('kendall', 0.093), ('lattice', 0.088), ('proposition', 0.083), ('compartments', 0.075), ('def', 0.075), ('inversion', 0.071), ('psfrag', 0.066), ('compartment', 0.066), ('orderings', 0.06), ('partially', 0.06), ('mum', 0.058), ('apa', 0.057), ('bius', 0.057), ('disconcordant', 0.057), ('fligner', 0.057), ('lebanon', 0.057), ('replacements', 0.05), ('bar', 0.05), ('lattices', 0.049), ('censoring', 0.049), ('diagram', 0.049), ('requirement', 0.045), ('inconsistent', 0.045), ('bars', 0.04), ('parzen', 0.04), ('cd', 0.04), ('item', 0.039), ('ordering', 0.039), ('corollary', 0.039), ('top', 0.038), ('bkl', 0.038), ('jester', 0.038), ('lafayette', 0.038), ('purdue', 0.038), ('reverting', 0.038), ('transposition', 0.038), ('verducci', 0.038), ('combinatorics', 0.038), ('numeric', 0.036), ('modeling', 0.035), ('coherent', 0.033), ('quotient', 0.033), ('contradicting', 0.033), ('opposing', 0.033), ('subgroup', 0.033), ('adjacent', 0.031), ('ni', 0.031), ('concerning', 0.031), ('appearing', 0.03), ('west', 0.03), ('rankers', 0.03), ('election', 0.03), ('pairs', 0.029), ('group', 0.029), ('estimator', 0.029), ('contains', 0.029), ('preferred', 0.028), ('endowed', 0.028), ('polytope', 0.028), ('consistent', 0.027), ('closed', 0.027), ('eachmovie', 0.026), ('qk', 0.026), ('exp', 0.026), ('concept', 0.026), ('lower', 0.026), ('rank', 0.026), ('denoted', 0.026), ('consistency', 0.026), ('bound', 0.025), ('vertices', 0.025), ('right', 0.024), ('considering', 0.024), ('ranks', 0.024), ('movies', 0.024), ('decomposition', 0.024), ('combinatorial', 0.024), ('relate', 0.024), ('window', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000005 142 nips-2007-Non-parametric Modeling of Partially Ranked Data

Author: Guy Lebanon, Yi Mao

Abstract: Statistical models on full and partial rankings of n items are often of limited practical use for large n due to computational considerations. We explore the use of non-parametric models for partially ranked data and derive efficient procedures for their use for large n. The derivations are largely possible through combinatorial and algebraic manipulations based on the lattice of partial rankings. In particular, we demonstrate for the first time a non-parametric coherent and consistent model capable of efficiently aggregating partially ranked data of different types.

2 0.13277359 20 nips-2007-Adaptive Embedded Subgraph Algorithms using Walk-Sum Analysis

Author: Venkat Chandrasekaran, Alan S. Willsky, Jason K. Johnson

Abstract: We consider the estimation problem in Gaussian graphical models with arbitrary structure. We analyze the Embedded Trees algorithm, which solves a sequence of problems on tractable subgraphs thereby leading to the solution of the estimation problem on an intractable graph. Our analysis is based on the recently developed walk-sum interpretation of Gaussian estimation. We show that non-stationary iterations of the Embedded Trees algorithm using any sequence of subgraphs converge in walk-summable models. Based on walk-sum calculations, we develop adaptive methods that optimize the choice of subgraphs used at each iteration with a view to achieving maximum reduction in error. These adaptive procedures provide a significant speedup in convergence over stationary iterative methods, and also appear to converge in a larger class of models. 1

3 0.12127723 41 nips-2007-COFI RANK - Maximum Margin Matrix Factorization for Collaborative Ranking

Author: Markus Weimer, Alexandros Karatzoglou, Quoc V. Le, Alex J. Smola

Abstract: In this paper, we consider collaborative filtering as a ranking problem. We present a method which uses Maximum Margin Matrix Factorization and optimizes ranking instead of rating. We employ structured output prediction to optimize directly for ranking scores. Experimental results show that our method gives very good ranking scores and scales well on collaborative filtering tasks. 1

4 0.11466832 144 nips-2007-On Ranking in Survival Analysis: Bounds on the Concordance Index

Author: Harald Steck, Balaji Krishnapuram, Cary Dehing-oberije, Philippe Lambin, Vikas C. Raykar

Abstract: In this paper, we show that classical survival analysis involving censored data can naturally be cast as a ranking problem. The concordance index (CI), which quantifies the quality of rankings, is the standard performance measure for model assessment in survival analysis. In contrast, the standard approach to learning the popular proportional hazard (PH) model is based on Cox’s partial likelihood. We devise two bounds on CI–one of which emerges directly from the properties of PH models–and optimize them directly. Our experimental results suggest that all three methods perform about equally well, with our new approach giving slightly better results. We also explain why a method designed to maximize the Cox’s partial likelihood also ends up (approximately) maximizing the CI. 1

5 0.10475378 126 nips-2007-McRank: Learning to Rank Using Multiple Classification and Gradient Boosting

Author: Ping Li, Qiang Wu, Christopher J. Burges

Abstract: We cast the ranking problem as (1) multiple classification (“Mc”) (2) multiple ordinal classification, which lead to computationally tractable learning algorithms for relevance ranking in Web search. We consider the DCG criterion (discounted cumulative gain), a standard quality measure in information retrieval. Our approach is motivated by the fact that perfect classifications result in perfect DCG scores and the DCG errors are bounded by classification errors. We propose using the Expected Relevance to convert class probabilities into ranking scores. The class probabilities are learned using a gradient boosting tree algorithm. Evaluations on large-scale datasets show that our approach can improve LambdaRank [5] and the regressions-based ranker [6], in terms of the (normalized) DCG scores. An efficient implementation of the boosting tree algorithm is also presented.

6 0.10174575 6 nips-2007-A General Boosting Method and its Application to Learning Ranking Functions for Web Search

7 0.09108945 39 nips-2007-Boosting the Area under the ROC Curve

8 0.081394888 77 nips-2007-Efficient Inference for Distributions on Permutations

9 0.076421618 71 nips-2007-Discriminative Keyword Selection Using Support Vector Machines

10 0.075346626 83 nips-2007-Evaluating Search Engines by Modeling the Relationship Between Relevance and Clicks

11 0.074627705 169 nips-2007-Retrieved context and the discovery of semantic structure

12 0.067854404 11 nips-2007-A Risk Minimization Principle for a Class of Parzen Estimators

13 0.052512079 15 nips-2007-A general agnostic active learning algorithm

14 0.046516083 192 nips-2007-Testing for Homogeneity with Kernel Fisher Discriminant Analysis

15 0.043236285 105 nips-2007-Infinite State Bayes-Nets for Structured Domains

16 0.03916784 159 nips-2007-Progressive mixture rules are deviation suboptimal

17 0.036579277 91 nips-2007-Fitted Q-iteration in continuous action-space MDPs

18 0.03460484 196 nips-2007-The Infinite Gamma-Poisson Feature Model

19 0.034569152 82 nips-2007-Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization

20 0.033955496 199 nips-2007-The Price of Bandit Information for Online Optimization


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.129), (1, 0.018), (2, -0.065), (3, 0.025), (4, 0.04), (5, 0.007), (6, 0.099), (7, -0.064), (8, 0.031), (9, -0.091), (10, -0.056), (11, 0.043), (12, 0.254), (13, 0.081), (14, -0.011), (15, -0.094), (16, -0.013), (17, 0.029), (18, -0.065), (19, 0.01), (20, -0.015), (21, 0.01), (22, 0.046), (23, -0.115), (24, 0.014), (25, -0.033), (26, -0.002), (27, -0.037), (28, 0.024), (29, -0.091), (30, -0.015), (31, 0.175), (32, 0.054), (33, 0.177), (34, -0.033), (35, 0.143), (36, 0.195), (37, 0.082), (38, 0.067), (39, 0.158), (40, -0.026), (41, 0.065), (42, -0.026), (43, 0.164), (44, -0.021), (45, 0.066), (46, 0.046), (47, 0.017), (48, -0.042), (49, 0.16)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97323149 142 nips-2007-Non-parametric Modeling of Partially Ranked Data

Author: Guy Lebanon, Yi Mao

Abstract: Statistical models on full and partial rankings of n items are often of limited practical use for large n due to computational considerations. We explore the use of non-parametric models for partially ranked data and derive efficient procedures for their use for large n. The derivations are largely possible through combinatorial and algebraic manipulations based on the lattice of partial rankings. In particular, we demonstrate for the first time a non-parametric coherent and consistent model capable of efficiently aggregating partially ranked data of different types.

2 0.56518489 20 nips-2007-Adaptive Embedded Subgraph Algorithms using Walk-Sum Analysis

Author: Venkat Chandrasekaran, Alan S. Willsky, Jason K. Johnson

Abstract: We consider the estimation problem in Gaussian graphical models with arbitrary structure. We analyze the Embedded Trees algorithm, which solves a sequence of problems on tractable subgraphs thereby leading to the solution of the estimation problem on an intractable graph. Our analysis is based on the recently developed walk-sum interpretation of Gaussian estimation. We show that non-stationary iterations of the Embedded Trees algorithm using any sequence of subgraphs converge in walk-summable models. Based on walk-sum calculations, we develop adaptive methods that optimize the choice of subgraphs used at each iteration with a view to achieving maximum reduction in error. These adaptive procedures provide a significant speedup in convergence over stationary iterative methods, and also appear to converge in a larger class of models. 1

3 0.5562982 144 nips-2007-On Ranking in Survival Analysis: Bounds on the Concordance Index

Author: Harald Steck, Balaji Krishnapuram, Cary Dehing-oberije, Philippe Lambin, Vikas C. Raykar

Abstract: In this paper, we show that classical survival analysis involving censored data can naturally be cast as a ranking problem. The concordance index (CI), which quantifies the quality of rankings, is the standard performance measure for model assessment in survival analysis. In contrast, the standard approach to learning the popular proportional hazard (PH) model is based on Cox’s partial likelihood. We devise two bounds on CI–one of which emerges directly from the properties of PH models–and optimize them directly. Our experimental results suggest that all three methods perform about equally well, with our new approach giving slightly better results. We also explain why a method designed to maximize the Cox’s partial likelihood also ends up (approximately) maximizing the CI. 1

4 0.51863009 41 nips-2007-COFI RANK - Maximum Margin Matrix Factorization for Collaborative Ranking

Author: Markus Weimer, Alexandros Karatzoglou, Quoc V. Le, Alex J. Smola

Abstract: In this paper, we consider collaborative filtering as a ranking problem. We present a method which uses Maximum Margin Matrix Factorization and optimizes ranking instead of rating. We employ structured output prediction to optimize directly for ranking scores. Experimental results show that our method gives very good ranking scores and scales well on collaborative filtering tasks. 1

5 0.43831441 77 nips-2007-Efficient Inference for Distributions on Permutations

Author: Jonathan Huang, Carlos Guestrin, Leonidas Guibas

Abstract: Permutations are ubiquitous in many real world problems, such as voting, rankings and data association. Representing uncertainty over permutations is challenging, since there are n! possibilities, and typical compact representations such as graphical models cannot efficiently capture the mutual exclusivity constraints associated with permutations. In this paper, we use the “low-frequency” terms of a Fourier decomposition to represent such distributions compactly. We present Kronecker conditioning, a general and efficient approach for maintaining these distributions directly in the Fourier domain. Low order Fourier-based approximations can lead to functions that do not correspond to valid distributions. To address this problem, we present an efficient quadratic program defined directly in the Fourier domain to project the approximation onto a relaxed form of the marginal polytope. We demonstrate the effectiveness of our approach on a real camera-based multi-people tracking setting. 1

6 0.36780056 15 nips-2007-A general agnostic active learning algorithm

7 0.36474177 159 nips-2007-Progressive mixture rules are deviation suboptimal

8 0.35943681 126 nips-2007-McRank: Learning to Rank Using Multiple Classification and Gradient Boosting

9 0.3499563 71 nips-2007-Discriminative Keyword Selection Using Support Vector Machines

10 0.34597802 169 nips-2007-Retrieved context and the discovery of semantic structure

11 0.34532049 196 nips-2007-The Infinite Gamma-Poisson Feature Model

12 0.33575818 6 nips-2007-A General Boosting Method and its Application to Learning Ranking Functions for Web Search

13 0.33067319 83 nips-2007-Evaluating Search Engines by Modeling the Relationship Between Relevance and Clicks

14 0.31574365 39 nips-2007-Boosting the Area under the ROC Curve

15 0.3063156 119 nips-2007-Learning with Tree-Averaged Densities and Distributions

16 0.24665755 82 nips-2007-Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization

17 0.24642038 12 nips-2007-A Spectral Regularization Framework for Multi-Task Structure Learning

18 0.2460755 49 nips-2007-Colored Maximum Variance Unfolding

19 0.23364769 43 nips-2007-Catching Change-points with Lasso

20 0.22609973 26 nips-2007-An online Hebbian learning rule that performs Independent Component Analysis


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.026), (13, 0.041), (16, 0.014), (18, 0.016), (21, 0.049), (31, 0.014), (34, 0.055), (35, 0.029), (36, 0.33), (47, 0.056), (49, 0.027), (83, 0.132), (85, 0.042), (87, 0.02), (90, 0.071)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.77378315 142 nips-2007-Non-parametric Modeling of Partially Ranked Data

Author: Guy Lebanon, Yi Mao

Abstract: Statistical models on full and partial rankings of n items are often of limited practical use for large n due to computational considerations. We explore the use of non-parametric models for partially ranked data and derive efficient procedures for their use for large n. The derivations are largely possible through combinatorial and algebraic manipulations based on the lattice of partial rankings. In particular, we demonstrate for the first time a non-parametric coherent and consistent model capable of efficiently aggregating partially ranked data of different types.

2 0.69640642 164 nips-2007-Receptive Fields without Spike-Triggering

Author: Guenther Zeck, Matthias Bethge, Jakob H. Macke

Abstract: S timulus selectivity of sensory neurons is often characterized by estimating their receptive field properties such as orientation selectivity. Receptive fields are usually derived from the mean (or covariance) of the spike-triggered stimulus ensemble. This approach treats each spike as an independent message but does not take into account that information might be conveyed through patterns of neural activity that are distributed across space or time. Can we find a concise description for the processing of a whole population of neurons analogous to the receptive field for single neurons? Here, we present a generalization of the linear receptive field which is not bound to be triggered on individual spikes but can be meaningfully linked to distributed response patterns. More precisely, we seek to identify those stimulus features and the corresponding patterns of neural activity that are most reliably coupled. We use an extension of reverse-correlation methods based on canonical correlation analysis. The resulting population receptive fields span the subspace of stimuli that is most informative about the population response. We evaluate our approach using both neuronal models and multi-electrode recordings from rabbit retinal ganglion cells. We show how the model can be extended to capture nonlinear stimulus-response relationships using kernel canonical correlation analysis, which makes it possible to test different coding mechanisms. Our technique can also be used to calculate receptive fields from multi-dimensional neural measurements such as those obtained from dynamic imaging methods. 1

3 0.65481472 107 nips-2007-Iterative Non-linear Dimensionality Reduction with Manifold Sculpting

Author: Michael Gashler, Dan Ventura, Tony Martinez

Abstract: Many algorithms have been recently developed for reducing dimensionality by projecting data onto an intrinsic non-linear manifold. Unfortunately, existing algorithms often lose significant precision in this transformation. Manifold Sculpting is a new algorithm that iteratively reduces dimensionality by simulating surface tension in local neighborhoods. We present several experiments that show Manifold Sculpting yields more accurate results than existing algorithms with both generated and natural data-sets. Manifold Sculpting is also able to benefit from both prior dimensionality reduction efforts. 1

4 0.58628792 177 nips-2007-Simplified Rules and Theoretical Analysis for Information Bottleneck Optimization and PCA with Spiking Neurons

Author: Lars Buesing, Wolfgang Maass

Abstract: We show that under suitable assumptions (primarily linearization) a simple and perspicuous online learning rule for Information Bottleneck optimization with spiking neurons can be derived. This rule performs on common benchmark tasks as well as a rather complex rule that has previously been proposed [1]. Furthermore, the transparency of this new learning rule makes a theoretical analysis of its convergence properties feasible. A variation of this learning rule (with sign changes) provides a theoretically founded method for performing Principal Component Analysis (PCA) with spiking neurons. By applying this rule to an ensemble of neurons, different principal components of the input can be extracted. In addition, it is possible to preferentially extract those principal components from incoming signals X that are related or are not related to some additional target signal YT . In a biological interpretation, this target signal YT (also called relevance variable) could represent proprioceptive feedback, input from other sensory modalities, or top-down signals. 1

5 0.47645926 63 nips-2007-Convex Relaxations of Latent Variable Training

Author: Yuhong Guo, Dale Schuurmans

Abstract: We investigate a new, convex relaxation of an expectation-maximization (EM) variant that approximates a standard objective while eliminating local minima. First, a cautionary result is presented, showing that any convex relaxation of EM over hidden variables must give trivial results if any dependence on the missing values is retained. Although this appears to be a strong negative outcome, we then demonstrate how the problem can be bypassed by using equivalence relations instead of value assignments over hidden variables. In particular, we develop new algorithms for estimating exponential conditional models that only require equivalence relation information over the variable values. This reformulation leads to an exact expression for EM variants in a wide range of problems. We then develop a semidefinite relaxation that yields global training by eliminating local minima. 1

6 0.47275314 128 nips-2007-Message Passing for Max-weight Independent Set

7 0.4691304 66 nips-2007-Density Estimation under Independent Similarly Distributed Sampling Assumptions

8 0.4686563 92 nips-2007-Fixing Max-Product: Convergent Message Passing Algorithms for MAP LP-Relaxations

9 0.46837041 141 nips-2007-New Outer Bounds on the Marginal Polytope

10 0.4680016 156 nips-2007-Predictive Matrix-Variate t Models

11 0.46619892 187 nips-2007-Structured Learning with Approximate Inference

12 0.46612537 49 nips-2007-Colored Maximum Variance Unfolding

13 0.46549767 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning

14 0.46199316 186 nips-2007-Statistical Analysis of Semi-Supervised Regression

15 0.46161288 134 nips-2007-Multi-Task Learning via Conic Programming

16 0.4612608 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)

17 0.4609175 44 nips-2007-Catching Up Faster in Bayesian Model Selection and Model Averaging

18 0.46034357 6 nips-2007-A General Boosting Method and its Application to Learning Ranking Functions for Web Search

19 0.46031123 185 nips-2007-Stable Dual Dynamic Programming

20 0.46024469 146 nips-2007-On higher-order perceptron algorithms