jmlr jmlr2010 jmlr2010-10 knowledge-graph by maker-knowledge-mining

10 jmlr-2010-An Exponential Model for Infinite Rankings


Source: pdf

Author: Marina Meilă, Le Bao

Abstract: This paper presents a statistical model for expressing preferences through rankings, when the number of alternatives (items to rank) is large. A human ranker will then typically rank only the most preferred items, and may not even examine the whole set of items, or know how many they are. Similarly, a user presented with the ranked output of a search engine, will only consider the highest ranked items. We model such situations by introducing a stagewise ranking model that operates with finite ordered lists called top-t orderings over an infinite space of items. We give algorithms to estimate this model from data, and demonstrate that it has sufficient statistics, being thus an exponential family model with continuous and discrete parameters. We describe its conjugate prior and other statistical properties. Then, we extend the estimation problem to multimodal data by introducing an Exponential-Blurring-Mean-Shift nonparametric clustering algorithm. The experiments highlight the properties of our model and demonstrate that infinite models over permutations can be simple, elegant and practical. Keywords: permutations, partial orderings, Mallows model, distance based ranking model, exponential family, non-parametric clustering, branch-and-bound

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We model such situations by introducing a stagewise ranking model that operates with finite ordered lists called top-t orderings over an infinite space of items. [sent-8, score-0.361]

2 This model assigns a permutation π over n items a probability that decays exponentially with its distance to a central permutation σ. [sent-15, score-0.735]

3 Here we study this class of models in the limit n → ∞, with the assumption that out of the infinitely many items ordered, one only observes those occupying the first t ranks. [sent-16, score-0.289]

4 Ordering an infinite number of items is common in retrieval tasks: search engines, programs that match a face, or a fingerprint, or a biological sequence against a database, all output the first t items in an ordering over virtually infinitely many objects. [sent-17, score-0.771]

5 An even more realistic scenario, which we will not tackle for now, is one where a voter (ranker) does not have access to the whole population of items (e. [sent-26, score-0.289]

6 1 Permutations, Infinite Permutations and top-t Orderings A permutation σ is a function from a set of items { i1 , i2 , . [sent-38, score-0.449]

7 the set of items can be considered to be the set 1 : n. [sent-46, score-0.289]

8 Therefore σ(i) denotes the rank of item i and σ−1 ( j) denotes the item at rank j in σ. [sent-47, score-0.39]
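To make the two notations above concrete, here is a minimal illustrative sketch in Python (not taken from the paper); the item names and ranks are invented for the example.

```python
# Two equivalent views of an ordering: sigma maps item -> rank,
# while the ranked list plays the role of sigma^{-1} (rank -> item).
ranked_list = ["b", "a", "d", "c"]  # sigma^{-1}: item at rank 1 is "b", rank 2 is "a", ...
sigma = {item: rank for rank, item in enumerate(ranked_list, start=1)}

assert sigma["a"] == 2            # sigma(a): the rank of item "a"
assert ranked_list[3 - 1] == "d"  # sigma^{-1}(3): the item at rank 3
```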

9 This will be the case with the other definitions; hence, from now on, we will always consider that the set of items is P. [sent-57, score-0.289]

10 Any permutation σ can be represented by the (infinite) ranked list (σ−1 (1)|σ−1 (2)| . [sent-58, score-0.269]

11 Then σ(1) = 3 means that item 1 has rank 3 in this permutation; σ(2) = 1 means that item 2 has rank 1, etc. [sent-72, score-0.39]

12 Conversely, σ−1 (1) = 2 and σ−1 (3 j) = 3 j − 2 mean that the first item in the list representation of σ is item 2, and that the item at rank 3 j is item 3 j − 2, respectively. [sent-73, score-0.36]

13 A top-t ordering can be seen as defining a set consisting of those infinite orderings which start with the prefix π. [sent-80, score-0.334]

14 If we denote by SP the set of all permutations over P and by SP−t = {σ ∈ SP | σ(i) = i, for i = 1 : t} the subgroup of all permutations that leave the top-t ranks unchanged, then a top-t ordering π corresponds to a unique element of the right coset SP /SP−t . [sent-81, score-0.544]

15 We will use Greek letters like π and σ for both full permutations and for top-t orderings to keep the notation light. [sent-82, score-0.312]

16 The matrix Π of a top-t ordering π is a truncation of some infinite permutation matrix Σ. [sent-90, score-0.315]

17 For a permutation σ and a top-t ordering π, the matrix ΣT Π corresponds to the list of ranks in σ of the items in π. [sent-135, score-0.768]

18 The inversion table of a permutation σ, with respect to the identity permutation id is an infinite sequence of non-negative integers (s1 , s2 , . [sent-137, score-0.426]

19 If π is the top-t ordering of an infinite permutation σ, then the inversion table of π is the t-prefix of the inversion table of σ. [sent-161, score-0.527]
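As an illustration of how the inversion table of a top-t ordering can be computed against a central permutation, here is a hedged Python sketch. It assumes σ is supplied as a finite list (a prefix of the infinite permutation long enough to contain every item of π); the function name and example values are ours, not the paper's.

```python
def inversion_table(pi, sigma_list):
    """s_j(pi|sigma) for j = 1..t: at each stage, the number of still-available
    items that sigma ranks before the item pi places at rank j."""
    remaining = list(sigma_list)    # items not yet placed, kept in sigma's order
    s = []
    for item in pi:
        k = remaining.index(item)   # 0-based position among the remaining items
        s.append(k)
        remaining.pop(k)            # the placed item is deleted from the list
    return s

# Example: sigma = (1|2|3|4|5), pi = (2|4|1)  ->  s = [1, 2, 0]
print(inversion_table([2, 4, 1], [1, 2, 3, 4, 5]))
```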

20 Another property of the inversion table is that it can be defined with respect to any infinite permutation σ0 , by letting the ordered list corresponding to σ0 replace the list (1|2|3| . [sent-169, score-0.348]

21 Once π−1 (1) is picked, this item is deleted from the ordering σ. [sent-196, score-0.279]

22 From this new list of available items, the second rank in π is picked by skipping the first s2 ranks, and choosing the item in the (s2 + 1)-th rank. [sent-197, score-0.236]
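The stagewise construction in the last two sentences can also be run in reverse: given an inversion table and the central ordering, rebuild the top-t ordering. A small sketch under the same assumptions as above (σ truncated to a finite list):

```python
def decode_inversion_table(s, sigma_list):
    """Stagewise construction: at stage j, skip s[j] of the still-available items
    (taken in sigma's order) and pick the (s[j]+1)-th one; the pick is then deleted."""
    remaining = list(sigma_list)
    pi = []
    for sj in s:
        pi.append(remaining.pop(sj))
    return pi

# Round-trip of the earlier example: s = [1, 2, 0] w.r.t. sigma = (1|2|3|4|5)
print(decode_inversion_table([1, 2, 0], [1, 2, 3, 4, 5]))  # -> [2, 4, 1]
```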

23 The “V” form of the inversion table is closely related to the inversion table we use here. [sent-231, score-0.212]

24 3 Kendall Type Divergences For finite permutations of n items π and σ, dK (π, σ) = ∑_{j=1}^{n−1} s_j (π|σ) = ∑_{j=1}^{n−1} s_j (σ|π) denotes the Kendall distance (Mallows, 1957) (or inversion distance), which is a metric. [sent-234, score-0.553]

25 When θ j are all equal, dθ (π, σ) is proportional to the Kendall distance between σ and the set of orderings compatible with π and counts the number of inversions needed to make σ compatible with π. [sent-244, score-0.307]

26 In general, this “distance” between a top-t ordering and an infinite ordering is a set distance. [sent-245, score-0.31]

27 The permutation σ is called the central permutation of Pθ,σ . [sent-271, score-0.364]

28 For any π, let tπ be its length and denote tmax = max_{π∈SN} tπ , T = ∑_{π∈SN} tπ , and by n the number of distinct items observed in the data. [sent-317, score-0.436]

29 As we shall see, tmax is the dimension of the concentration parameter θ, n is the order of the estimated central permutation σ, and T counts the total number of items in the data, playing a role akin to the sample size. [sent-318, score-0.599]

30 Then, ln Pθ,σ (SN ) = − ∑_{j≥1} [θ j Lσ (R j ) + N j ln ψ(θ j )] with R j = q j 1^T − Q j (8). To prove this result, we first introduce an alternative expression for the inversion table s j (π|σ). [sent-320, score-0.28]

31 The log-likelihood of SN is given by ln Pθ,σ (SN ) = − ∑_{π∈SN} ∑_{j=1}^{tπ} [θ j s j (π|σ) + ln ψ(θ j )] = − ∑_{j≥1} [θ j ∑_{π∈SN} s j (π|σ) + N j ln ψ(θ j )]. [sent-333, score-0.261]

32 Note that qi and Qii′ represent, respectively, the number of times item i is observed in the data and the number of times item i′ precedes i in the data. [sent-342, score-0.317]
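A hedged sketch of how the counts q_i and Q_ii′ described in this sentence could be accumulated from a sample of top-t orderings. The paper's statistics R_j are indexed by rank j; this sketch collapses over ranks just to show the counting, and all names are ours.

```python
from collections import defaultdict

def sufficient_counts(sample):
    """q[i]  : number of orderings in which item i is observed.
       Q[i][k]: number of orderings in which item k precedes item i."""
    q = defaultdict(int)
    Q = defaultdict(lambda: defaultdict(int))
    for pi in sample:
        for pos, item in enumerate(pi):
            q[item] += 1
            for earlier in pi[:pos]:
                Q[item][earlier] += 1
    return q, Q

q, Q = sufficient_counts([[2, 4, 1], [2, 1], [4, 2, 5]])
print(q[2], Q[1][2])  # item 2 observed 3 times; item 2 precedes item 1 twice
```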

33 However, for any practically observed data set, tmax will be finite and the total number of items observed will be finite. [sent-347, score-0.477]

34 Thus, N j , R j will be 0 for any j > tmax and R = ∑ j∈P R j will have non-zero entries only for the items observed in some π ∈ SN . [sent-348, score-0.436]

35 This is obviously the ordering that sorts the items in descending order of their frequency q. [sent-431, score-0.444]
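The frequency-sorting idea in this sentence is the intuition behind the SortR heuristic mentioned later; a one-line illustrative sketch, which could take the q counts from the sketch above:

```python
def sort_by_frequency(q):
    """Order the observed items by decreasing frequency q (ties broken arbitrarily),
    giving a simple candidate for the observed part of the central permutation."""
    return sorted(q, key=lambda item: -q[item])

print(sort_by_frequency({"a": 5, "b": 9, "c": 2}))  # ['b', 'a', 'c']
```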

36 Let n be the number of distinct items observed in the data. [sent-481, score-0.33]

37 From σ, we can estimate at most its restriction to the items observed, that is, the restriction of σ to the set ∪_{π∈SN} {π−1 (1), π−1 (2), . [sent-482, score-0.289]

38 The next proposition shows that the ML estimate will always be a permutation which puts the observed items before any unobserved items. [sent-486, score-0.561]

39 Proposition 4 Let SN be a sample of top-t orderings, and let σ be a permutation over P that ranks at least one unobserved item i0 before an item observed in SN . [sent-487, score-0.572]

40 Then there exists another permutation σ̃ which ranks all observed items before any unobserved items, so that for any parameter vector (θ1:t ), Pθ,σ (SN ) < Pθ,σ̃ (SN ). [sent-488, score-0.613]

41 Proof For an item i0 not observed in the data, qi0 ,j , Qi0 i,j and Ri0 i,j are 0, for any j = 1 : t and any observed item i. [sent-489, score-0.33]

42 Also note that if we switch unobserved items among each other, there is no effect on the likelihood. [sent-491, score-0.289]

43 The idea of the proof is to show that if we switch items i0 and i, the lower triangle of any R j will not increase, and for at least one R j it will strictly decrease. [sent-507, score-0.289]

44 By examining the likelihood expression in Equation (8), we can see that for any positive parameters θ1: j we have ln Pθ,σ (SN ) < ln Pθ,σ′ (SN ). [sent-514, score-0.208]

45 By successive switches like the one described here, we can move all observed items before the unobserved items in a finite number of steps. [sent-515, score-0.619]

46 In other words σML is a permutation of the observed items, followed by the unobserved items in any order. [sent-519, score-0.49]

47 Hence the ordering of the unobserved items is completely non-identifiable (naturally so). [sent-520, score-0.444]

48 But not even the restriction of σ to the observed items is always completely determined. [sent-521, score-0.33]

49 t the ranking of two items c, d, that is, when Rcd = Rdc > 0. [sent-530, score-0.38]

50 Also, since more observations typically increase the counts more for the first items in σ, this type of indeterminacy is also more likely to occur for the later ranks of σML . [sent-533, score-0.412]

51 Thus, in general, there is a finite set of permutations of the observed items which have equal likelihood. [sent-534, score-0.463]

52 We expect that these permutations will agree more on their first ranks and less in the last ranks. [sent-535, score-0.256]

53 The IGM model has t real parameters θ j , j = 1 : t and a discrete and infinite dimensional parameter, the central permutation σ. [sent-538, score-0.234]

54 We give partial results on the consistency of the ML estimators, under the assumption that the true model is an IGM model and that t, the length of the observed permutations, is fixed or bounded. [sent-539, score-0.234]

55 that the central permutation of the true model is the identity permutation, this proposition implies that for any IGM, with single or multiple parameters, and for any σ, the statistics Lσ (R̂ j ) are consistent when t is constant. [sent-554, score-0.305]

56 Proposition 6 If the true model is Pid,θ (multiple or single parameter) and t is fixed, then for any infinite permutation σ, denote by θ̂ j (σ) (or θ̂(σ)) the ML estimate of θ j (or θ) given that the estimate of the central permutation is σ. [sent-555, score-0.394]

57 Non-parametric Clustering The above estimation algorithms can be thought of as finding a consensus ordering for the observed data. [sent-572, score-0.232]

58 For instance, the extensions of the K-means and EM algorithms to infinite orderings are immediate, and so are extensions to other distance-based clustering methods. [sent-575, score-0.252]

59 Reduce the data set by counting only the distinct permutations to obtain a reduced S̃N and counts Ni ≥ 1 for each ordering πi ∈ S̃N . [sent-581, score-0.288]
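A small sketch of this reduction step (illustrative; the function names are ours): duplicate orderings are collapsed into one representative with a multiplicity N_i.

```python
from collections import Counter

def reduce_sample(sample):
    """Collapse duplicate orderings into a reduced set with counts N_i >= 1."""
    counts = Counter(tuple(pi) for pi in sample)
    reduced = [list(pi) for pi in counts]
    weights = [counts[tuple(pi)] for pi in reduced]
    return reduced, weights

reduced, weights = reduce_sample([[1, 2], [2, 1], [1, 2], [1, 2]])
print(reduced, weights)  # e.g. [[1, 2], [2, 1]] with weights [3, 1]
```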

60 For ranked data, this is not the case: since at each iteration step we round the local estimator into the nearest permutation, the algorithm will stop in a finite number of steps, when no ordering moves from its current position. [sent-605, score-0.223]

61 As soon as two or more orderings become identical, we replace this cluster with a single ordering with a weight proportional to the cluster’s number of members. [sent-607, score-0.374]

62 Critchlow (1985) studied them, and here we adopt for d(π1 , π2 ) what is called the set distance, that is, the distance between the sets of infinite orderings compatible with π1 and π2 , respectively. [sent-611, score-0.269]

63 Figure 5: An example of obtaining two partial orderings π̃1 , π̃2 compatible respectively with π1 , π2 that achieve the set distance. [sent-615, score-0.217]

64 Empty spaces represent the common items, while the B and C symbols mark the items in B and C, respectively. [sent-616, score-0.289]

65 We extend π1 and π2 to two longer orderings π̃1 , π̃2 , so that: (i) π̃1 , π̃2 have identical sets of items, (ii) π̃1 is the closest ordering to π̃2 which is compatible with π1 , and (iii) reciprocally, π̃2 is the closest ordering to π̃1 which is compatible with π2 . [sent-619, score-0.565]

66 We obtain π̃1 by taking all items in π2 but not in π1 and appending them at the end of π1 while preserving their relative order. [sent-620, score-0.289]
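Putting the last few sentences together, here is a hedged sketch of the extension construction and the resulting distance between two top-t orderings. It assumes the set distance is attained by the Kendall distance between the two extended lists, as Figure 5 suggests; whether this matches Equation (15) term by term is not verified here, and all names are ours.

```python
def extend(pi_a, pi_b):
    """pi~a: append to pi_a the items of pi_b missing from pi_a,
    preserving their relative order in pi_b."""
    return pi_a + [x for x in pi_b if x not in pi_a]

def kendall(p, q):
    """Number of item pairs ordered differently by p and q (same item sets)."""
    pos = {item: r for r, item in enumerate(q)}
    return sum(1 for i in range(len(p)) for j in range(i + 1, len(p))
               if pos[p[i]] > pos[p[j]])

def set_distance(pi1, pi2):
    return kendall(extend(pi1, pi2), extend(pi2, pi1))

print(set_distance([1, 2, 3], [2, 4]))  # distance between two partial orderings
```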

67 Because S1 + 2 is the sum of ranks of π−1_{1,2} (1) in σ, we can easily infer that the first two items in σ∗ must be either (1|2) or (2|1), since any other σ will have S1 = 1. [sent-691, score-0.412]

68 The conjugate prior is given by ν > 0 and a single matrix R0 corresponding to the normalized sufficient statistics matrix R: Pν,R0 (θ, σ) ∝ e^{−[θL(Σ^T νR0 Σ) + tν ln ψ(θ)]}. The posterior is Pν,R0 (θ, σ | π1:N ) ∝ e^{−[L(θΣ^T (νR0 +R)Σ) + t(ν+N) ln ψ(θ)]} (19). [sent-700, score-0.242]

69 Estimation Experiments, Single θ: In these experiments we generated data from an infinite GM model with constant θ j = ln 2 or ln 4 and estimated the central permutation and the parameter θ. [sent-710, score-0.408]
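A hedged sketch of how such data could be generated. For the infinite single-θ model, each entry s_j of the inversion table is geometrically distributed, P(s_j = k) ∝ e^{−θk}, which is what the sampler below assumes; the finite item list is only a stand-in for the infinite item set, and all names are ours.

```python
import math
import random

def sample_top_t(theta, t, rng=random):
    """Draw one top-t ordering from an infinite GM with constant theta and
    central permutation id = (1|2|3|...): each s_j ~ Geometric(1 - exp(-theta))."""
    p = 1.0 - math.exp(-theta)
    remaining = list(range(1, 1001))   # finite stand-in for the infinite item list
    pi = []
    for _ in range(t):
        s = 0
        while rng.random() > p:        # geometric count of skipped ranks
            s += 1
        pi.append(remaining.pop(s))
    return pi

print(sample_top_t(math.log(2), t=5))
```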

70 This is due to the fact that, as either N or t increase, n, the number of items to be ranked, increases. [sent-714, score-0.289]

71 The least frequent items will have less support from the data and will be those misranked. [sent-716, score-0.289]

72 [Table fragment: number of observed items n (mean and standard deviation) for sample sizes N = 200, 500, 1000.] [sent-824, score-0.45]

73 Bottom: number of observed items n (mean and standard deviation). [sent-889, score-0.33]

74 Now we consider the recovery of the central permutation σ. [sent-899, score-0.204]

75 From our previous remarks, we expect to see more ordering errors in the bottom ranks of σML , where the distribution is less concentrated (smaller θ j ) and there is less data available. [sent-900, score-0.309]

76 When t is small, 2, 4 or 8, the total number of items to be ranked (Figure 8) is several times larger than t for our experiments. [sent-902, score-0.357]

77 Figure 10 gives a summary view of the number of samples for each rank N j , j = 1 : t, the number of distinct items n and the cumulative number of ranks observed (i. [sent-920, score-0.524]

78 In other words, ranks 1 : r − 1 have distinct parameters, while the following ranks share parameter θr . [sent-930, score-0.246]

79 [Figure residue: axis ticks and theta axis labels from the single-θ estimation plots, n = 500.] [sent-955, score-0.516]

80 For each query and each model size, we computed the rank of the true university home page, that is, the target, under the estimated central permutation σML . [sent-997, score-0.305]

81 The rank is set to tmax + 1 = 31 if the target is not among the items returned by the search engines. [sent-1001, score-0.504]
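A tiny sketch of this evaluation convention (the URLs below are made up; t_max = 30 as in the experiment):

```python
def target_rank(sigma_ml, target, t_max=30):
    """Rank of the target under the estimated central permutation, or
    t_max + 1 (= 31 here) if the target was never returned."""
    return sigma_ml.index(target) + 1 if target in sigma_ml else t_max + 1

print(target_rank(["uw.edu", "mit.edu"], "mit.edu"))  # 2
print(target_rank(["uw.edu", "mit.edu"], "cmu.edu"))  # 31
```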

82 [Figure residue: panel titles θ1 = ln 2, θ1 = ln 4, N = 200, with theta axis labels.] [sent-1008, score-0.69]

83 Each box plot represents the distribution of the number of items over 50 random samples. [sent-1063, score-0.289]

84 Figure 10: Summaries of the universities data: boxplots of the number of samples per rank, (a), histogram of total items observed T (b), histogram of the number n of distinct items observed (c), over all 147 queries. [sent-1079, score-0.705]

85 [Table fragment: mean and median rank of the target (over good queries and all queries) for each model size.] [sent-1092, score-0.284]

86 We generated sample orderings with 3 clusters of 150 rankings each. [sent-1154, score-0.323]

87 Note that the error rate in Table 3 is computed including the outliers, that is, we compared a true clustering with 53 clusters (3 clusters and 50 singletons) to the clustering obtained when the algorithm converged. [sent-1172, score-0.246]

88 Since the largest cluster contains about three quarters of the data, we run the EBMS clustering again for the applicants within this cluster and find four major subgroups in terms of vocational callings: Law, Business, Drama, and English. [sent-1191, score-0.227]

89 For a finite number of items n, j ranges in 1 : n − 1 and s j in 0 : n − j + 1. [sent-1230, score-0.289]

90 Note that while the heuristics SortR and GreedyR are simple, they could not have been applied before to the GM model over top-t orderings because it was not known that this model has sufficient statistics for σ. [sent-1249, score-0.239]

91 A greedy algorithm for consensus ordering with partially observed data is introduced in Cohen et al. [sent-1290, score-0.232]

92 They note that it is sufficient to search for the optimal permutation in each strongly connected component of the resulting directed graph, which can sometimes greatly reduce the dimension of the search space. [sent-1298, score-0.236]

93 For instance, in the university web sites ranking experiment, our model assumed that there is a “true” central permutation from which the observations were generated as random perturbations. [sent-1314, score-0.325]

94 But we also have the liberty to assume that the observations are very long orderings which are close to the central permutation only in their highest ranks, and which can diverge arbitrarily far from it in the later ranks. [sent-1316, score-0.383]

95 One of the interesting contributions of this paper is an algorithm for averaging the d(π̃1 , π̃2 ) over all (infinite) permutations π̃1 , π̃2 compatible with given partial orderings π1 , π2 . [sent-1321, score-0.35]

96 One important advantage of having a model with multiple parameters, with n finite or infinite, is that each rank can be modeled by a separate parameter θ j (the standard GM/IGM) or one can tie the parameters of different ranks (the way we did in Section 6). [sent-1336, score-0.224]

97 This way, one can use larger θ j values to penalize the errors in the first ranks more, and smaller θ j values for the lower ranks, where presumably more noise in the orderings is permissible. [sent-1337, score-0.302]

98 In many instances the number of items to rank is very large. [sent-1349, score-0.36]

99 An IGM Pσ,θ has complete consensus if for any two items i, i′ with i ≺σ i′ , we have that P[i ≺π i′ ] > P[i′ ≺π i]. [sent-1377, score-0.325]

100 Moreover, condition (22) entails that for any fixed t, any j = 1 : t, and any items i, i′ with σ(i) < σ(i′ ), we have Rii′ ,j < Ri′ i,j . [sent-1380, score-0.289]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('items', 0.289), ('igm', 0.266), ('theta', 0.258), ('gm', 0.219), ('fligner', 0.192), ('orderings', 0.179), ('eil', 0.174), ('ankings', 0.165), ('ebms', 0.165), ('nfinite', 0.165), ('verducci', 0.165), ('xponential', 0.165), ('bao', 0.164), ('ml', 0.164), ('permutation', 0.16), ('ordering', 0.155), ('meil', 0.134), ('permutations', 0.133), ('sj', 0.127), ('mallows', 0.125), ('item', 0.124), ('ranks', 0.123), ('dn', 0.115), ('sn', 0.109), ('tmax', 0.106), ('odel', 0.098), ('rankings', 0.094), ('dk', 0.094), ('ranking', 0.091), ('ln', 0.087), ('blurring', 0.082), ('igma', 0.082), ('bic', 0.081), ('inversion', 0.079), ('tr', 0.079), ('clustering', 0.073), ('proposition', 0.071), ('rank', 0.071), ('ranked', 0.068), ('gormley', 0.064), ('pid', 0.064), ('tj', 0.06), ('kendall', 0.056), ('lebanon', 0.056), ('lid', 0.055), ('arts', 0.055), ('beta', 0.052), ('distance', 0.052), ('clusters', 0.05), ('reedy', 0.049), ('applicants', 0.047), ('stimate', 0.047), ('heta', 0.046), ('igmat', 0.046), ('universities', 0.045), ('central', 0.044), ('observed', 0.041), ('sp', 0.041), ('list', 0.041), ('cluster', 0.04), ('std', 0.04), ('dublin', 0.039), ('ort', 0.039), ('qii', 0.039), ('search', 0.038), ('compatible', 0.038), ('conjugate', 0.038), ('nite', 0.037), ('jester', 0.037), ('physiotherapy', 0.037), ('rli', 0.037), ('consensus', 0.036), ('mao', 0.036), ('murphy', 0.036), ('pre', 0.035), ('college', 0.035), ('law', 0.035), ('likelihood', 0.034), ('business', 0.033), ('stagewise', 0.031), ('concentrated', 0.031), ('prior', 0.03), ('delete', 0.03), ('engines', 0.03), ('model', 0.03), ('cohen', 0.03), ('courses', 0.028), ('rii', 0.028), ('medicine', 0.028), ('qi', 0.028), ('bess', 0.027), ('critchlow', 0.027), ('ctitious', 0.027), ('females', 0.027), ('trinity', 0.027), ('vocational', 0.027), ('outliers', 0.027), ('exponential', 0.027), ('female', 0.027), ('table', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 10 jmlr-2010-An Exponential Model for Infinite Rankings

Author: Marina Meilă, Le Bao

Abstract: This paper presents a statistical model for expressing preferences through rankings, when the number of alternatives (items to rank) is large. A human ranker will then typically rank only the most preferred items, and may not even examine the whole set of items, or know how many they are. Similarly, a user presented with the ranked output of a search engine, will only consider the highest ranked items. We model such situations by introducing a stagewise ranking model that operates with finite ordered lists called top-t orderings over an infinite space of items. We give algorithms to estimate this model from data, and demonstrate that it has sufficient statistics, being thus an exponential family model with continuous and discrete parameters. We describe its conjugate prior and other statistical properties. Then, we extend the estimation problem to multimodal data by introducing an Exponential-Blurring-Mean-Shift nonparametric clustering algorithm. The experiments highlight the properties of our model and demonstrate that infinite models over permutations can be simple, elegant and practical. Keywords: permutations, partial orderings, Mallows model, distance based ranking model, exponential family, non-parametric clustering, branch-and-bound

2 0.11867193 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes

Author: Jitkomut Songsiri, Lieven Vandenberghe

Abstract: An algorithm is presented for topology selection in graphical models of autoregressive Gaussian time series. The graph topology of the model represents the sparsity pattern of the inverse spectrum of the time series and characterizes conditional independence relations between the variables. The method proposed in the paper is based on an ℓ1 -type nonsmooth regularization of the conditional maximum likelihood estimation problem. We show that this reduces to a convex optimization problem and describe a large-scale algorithm that solves the dual problem via the gradient projection method. Results of experiments with randomly generated and real data sets are also included. Keywords: graphical models, time series, topology selection, convex optimization

3 0.098807044 80 jmlr-2010-On-Line Sequential Bin Packing

Author: András György, Gábor Lugosi, György Ottucsàk

Abstract: We consider a sequential version of the classical bin packing problem in which items are received one by one. Before the size of the next item is revealed, the decision maker needs to decide whether the next item is packed in the currently open bin or the bin is closed and a new bin is opened. If the new item does not fit, it is lost. If a bin is closed, the remaining free space in the bin accounts for a loss. The goal of the decision maker is to minimize the loss accumulated over n periods. We present an algorithm that has a cumulative loss not much larger than any strategy in a finite class of reference strategies for any sequence of items. Special attention is payed to reference strategies that use a fixed threshold at each step to decide whether a new bin is opened. Some positive and negative results are presented for this case. Keywords: bin packing, on-line learning, prediction with expert advice

4 0.090828449 73 jmlr-2010-Maximum Likelihood in Cost-Sensitive Learning: Model Specification, Approximations, and Upper Bounds

Author: Jacek P. Dmochowski, Paul Sajda, Lucas C. Parra

Abstract: The presence of asymmetry in the misclassification costs or class prevalences is a common occurrence in the pattern classification domain. While much interest has been devoted to the study of cost-sensitive learning techniques, the relationship between cost-sensitive learning and the specification of the model set in a parametric estimation framework remains somewhat unclear. To that end, we differentiate between the case of the model including the true posterior, and that in which the model is misspecified. In the former case, it is shown that thresholding the maximum likelihood (ML) estimate is an asymptotically optimal solution to the risk minimization problem. On the other hand, under model misspecification, it is demonstrated that thresholded ML is suboptimal and that the risk-minimizing solution varies with the misclassification cost ratio. Moreover, we analytically show that the negative weighted log likelihood (Elkan, 2001) is a tight, convex upper bound of the empirical loss. Coupled with empirical results on several real-world data sets, we argue that weighted ML is the preferred cost-sensitive technique. Keywords: empirical risk minimization, loss function, cost-sensitive learning, imbalanced data sets

5 0.081946298 55 jmlr-2010-Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance

Author: Nguyen Xuan Vinh, Julien Epps, James Bailey

Abstract: Information theoretic measures form a fundamental class of measures for comparing clusterings, and have recently received increasing interest. Nevertheless, a number of questions concerning their properties and inter-relationships remain unresolved. In this paper, we perform an organized study of information theoretic measures for clustering comparison, including several existing popular measures in the literature, as well as some newly proposed ones. We discuss and prove their important properties, such as the metric property and the normalization property. We then highlight to the clustering community the importance of correcting information theoretic measures for chance, especially when the data size is small compared to the number of clusters present therein. Of the available information theoretic based measures, we advocate the normalized information distance (NID) as a general measure of choice, for it possesses concurrently several important properties, such as being both a metric and a normalized measure, admitting an exact analytical adjusted-for-chance form, and using the nominal [0, 1] range better than other normalized variants. Keywords: clustering comparison, information theory, adjustment for chance, normalized information distance

6 0.078952335 33 jmlr-2010-Efficient Heuristics for Discriminative Structure Learning of Bayesian Network Classifiers

7 0.067150004 64 jmlr-2010-Learning Non-Stationary Dynamic Bayesian Networks

8 0.061762299 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond

9 0.04952563 78 jmlr-2010-Model Selection: Beyond the Bayesian Frequentist Divide

10 0.049217325 90 jmlr-2010-Permutation Tests for Studying Classifier Performance

11 0.04821099 52 jmlr-2010-Incremental Sigmoid Belief Networks for Grammar Learning

12 0.04394282 58 jmlr-2010-Kronecker Graphs: An Approach to Modeling Networks

13 0.042903546 83 jmlr-2010-On Over-fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation

14 0.041525405 19 jmlr-2010-Characterization, Stability and Convergence of Hierarchical Clustering Methods

15 0.041107953 114 jmlr-2010-Unsupervised Supervised Learning I: Estimating Classification and Regression Errors without Labels

16 0.039966408 81 jmlr-2010-On Finding Predictors for Arbitrary Families of Processes

17 0.038483713 109 jmlr-2010-Stochastic Composite Likelihood

18 0.037167396 75 jmlr-2010-Mean Field Variational Approximation for Continuous-Time Bayesian Networks

19 0.035844482 41 jmlr-2010-Gaussian Processes for Machine Learning (GPML) Toolbox

20 0.0351489 88 jmlr-2010-Optimal Search on Clustered Structural Constraint for Learning Bayesian Network Structure


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.193), (1, 0.011), (2, -0.02), (3, -0.012), (4, -0.068), (5, 0.045), (6, 0.061), (7, 0.083), (8, -0.086), (9, 0.127), (10, -0.077), (11, 0.144), (12, 0.035), (13, -0.206), (14, -0.089), (15, -0.26), (16, 0.082), (17, 0.13), (18, 0.083), (19, -0.058), (20, -0.119), (21, 0.082), (22, 0.064), (23, 0.008), (24, 0.058), (25, 0.099), (26, -0.042), (27, 0.042), (28, 0.088), (29, -0.151), (30, -0.094), (31, -0.092), (32, 0.102), (33, -0.027), (34, -0.066), (35, 0.031), (36, 0.209), (37, 0.104), (38, 0.108), (39, 0.069), (40, -0.083), (41, -0.099), (42, 0.004), (43, -0.036), (44, 0.067), (45, -0.008), (46, 0.142), (47, 0.058), (48, 0.073), (49, -0.01)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93331814 10 jmlr-2010-An Exponential Model for Infinite Rankings

Author: Marina Meilă, Le Bao

Abstract: This paper presents a statistical model for expressing preferences through rankings, when the number of alternatives (items to rank) is large. A human ranker will then typically rank only the most preferred items, and may not even examine the whole set of items, or know how many they are. Similarly, a user presented with the ranked output of a search engine, will only consider the highest ranked items. We model such situations by introducing a stagewise ranking model that operates with finite ordered lists called top-t orderings over an infinite space of items. We give algorithms to estimate this model from data, and demonstrate that it has sufficient statistics, being thus an exponential family model with continuous and discrete parameters. We describe its conjugate prior and other statistical properties. Then, we extend the estimation problem to multimodal data by introducing an Exponential-Blurring-Mean-Shift nonparametric clustering algorithm. The experiments highlight the properties of our model and demonstrate that infinite models over permutations can be simple, elegant and practical. Keywords: permutations, partial orderings, Mallows model, distance based ranking model, exponential family, non-parametric clustering, branch-and-bound

2 0.51257867 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes

Author: Jitkomut Songsiri, Lieven Vandenberghe

Abstract: An algorithm is presented for topology selection in graphical models of autoregressive Gaussian time series. The graph topology of the model represents the sparsity pattern of the inverse spectrum of the time series and characterizes conditional independence relations between the variables. The method proposed in the paper is based on an ℓ1 -type nonsmooth regularization of the conditional maximum likelihood estimation problem. We show that this reduces to a convex optimization problem and describe a large-scale algorithm that solves the dual problem via the gradient projection method. Results of experiments with randomly generated and real data sets are also included. Keywords: graphical models, time series, topology selection, convex optimization

3 0.49477458 80 jmlr-2010-On-Line Sequential Bin Packing

Author: András György, Gábor Lugosi, György Ottucsàk

Abstract: We consider a sequential version of the classical bin packing problem in which items are received one by one. Before the size of the next item is revealed, the decision maker needs to decide whether the next item is packed in the currently open bin or the bin is closed and a new bin is opened. If the new item does not fit, it is lost. If a bin is closed, the remaining free space in the bin accounts for a loss. The goal of the decision maker is to minimize the loss accumulated over n periods. We present an algorithm that has a cumulative loss not much larger than any strategy in a finite class of reference strategies for any sequence of items. Special attention is payed to reference strategies that use a fixed threshold at each step to decide whether a new bin is opened. Some positive and negative results are presented for this case. Keywords: bin packing, on-line learning, prediction with expert advice

4 0.40094054 55 jmlr-2010-Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance

Author: Nguyen Xuan Vinh, Julien Epps, James Bailey

Abstract: Information theoretic measures form a fundamental class of measures for comparing clusterings, and have recently received increasing interest. Nevertheless, a number of questions concerning their properties and inter-relationships remain unresolved. In this paper, we perform an organized study of information theoretic measures for clustering comparison, including several existing popular measures in the literature, as well as some newly proposed ones. We discuss and prove their important properties, such as the metric property and the normalization property. We then highlight to the clustering community the importance of correcting information theoretic measures for chance, especially when the data size is small compared to the number of clusters present therein. Of the available information theoretic based measures, we advocate the normalized information distance (NID) as a general measure of choice, for it possesses concurrently several important properties, such as being both a metric and a normalized measure, admitting an exact analytical adjusted-for-chance form, and using the nominal [0, 1] range better than other normalized variants. Keywords: clustering comparison, information theory, adjustment for chance, normalized information distance

5 0.39754167 73 jmlr-2010-Maximum Likelihood in Cost-Sensitive Learning: Model Specification, Approximations, and Upper Bounds

Author: Jacek P. Dmochowski, Paul Sajda, Lucas C. Parra

Abstract: The presence of asymmetry in the misclassification costs or class prevalences is a common occurrence in the pattern classification domain. While much interest has been devoted to the study of cost-sensitive learning techniques, the relationship between cost-sensitive learning and the specification of the model set in a parametric estimation framework remains somewhat unclear. To that end, we differentiate between the case of the model including the true posterior, and that in which the model is misspecified. In the former case, it is shown that thresholding the maximum likelihood (ML) estimate is an asymptotically optimal solution to the risk minimization problem. On the other hand, under model misspecification, it is demonstrated that thresholded ML is suboptimal and that the risk-minimizing solution varies with the misclassification cost ratio. Moreover, we analytically show that the negative weighted log likelihood (Elkan, 2001) is a tight, convex upper bound of the empirical loss. Coupled with empirical results on several real-world data sets, we argue that weighted ML is the preferred cost-sensitive technique. Keywords: empirical risk minimization, loss function, cost-sensitive learning, imbalanced data sets

6 0.36855382 19 jmlr-2010-Characterization, Stability and Convergence of Hierarchical Clustering Methods

7 0.36845466 33 jmlr-2010-Efficient Heuristics for Discriminative Structure Learning of Bayesian Network Classifiers

8 0.32418466 64 jmlr-2010-Learning Non-Stationary Dynamic Bayesian Networks

9 0.2722896 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond

10 0.25403601 52 jmlr-2010-Incremental Sigmoid Belief Networks for Grammar Learning

11 0.24005461 36 jmlr-2010-Estimation of a Structural Vector Autoregression Model Using Non-Gaussianity

12 0.2335161 88 jmlr-2010-Optimal Search on Clustered Structural Constraint for Learning Bayesian Network Structure

13 0.2202905 78 jmlr-2010-Model Selection: Beyond the Bayesian Frequentist Divide

14 0.21362479 63 jmlr-2010-Learning Instance-Specific Predictive Models

15 0.19802594 83 jmlr-2010-On Over-fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation

16 0.19469102 58 jmlr-2010-Kronecker Graphs: An Approach to Modeling Networks

17 0.17737409 61 jmlr-2010-Learning From Crowds

18 0.17710103 90 jmlr-2010-Permutation Tests for Studying Classifier Performance

19 0.17036283 114 jmlr-2010-Unsupervised Supervised Learning I: Estimating Classification and Regression Errors without Labels

20 0.17010441 86 jmlr-2010-On the Rate of Convergence of the Bagged Nearest Neighbor Estimate


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.019), (4, 0.013), (8, 0.018), (21, 0.026), (24, 0.012), (32, 0.059), (36, 0.038), (37, 0.048), (75, 0.118), (81, 0.492), (85, 0.051), (96, 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.77789658 30 jmlr-2010-Dimensionality Estimation, Manifold Learning and Function Approximation using Tensor Voting

Author: Philippos Mordohai, Gérard Medioni

Abstract: We address instance-based learning from a perceptual organization standpoint and present methods for dimensionality estimation, manifold learning and function approximation. Under our approach, manifolds in high-dimensional spaces are inferred by estimating geometric relationships among the input instances. Unlike conventional manifold learning, we do not perform dimensionality reduction, but instead perform all operations in the original input space. For this purpose we employ a novel formulation of tensor voting, which allows an N-D implementation. Tensor voting is a perceptual organization framework that has mostly been applied to computer vision problems. Analyzing the estimated local structure at the inputs, we are able to obtain reliable dimensionality estimates at each instance, instead of a global estimate for the entire data set. Moreover, these local dimensionality and structure estimates enable us to measure geodesic distances and perform nonlinear interpolation for data sets with varying density, outliers, perturbation and intersections, that cannot be handled by state-of-the-art methods. Quantitative results on the estimation of local manifold structure using ground truth data are presented. In addition, we compare our approach with several leading methods for manifold learning at the task of measuring geodesic distances. Finally, we show competitive function approximation results on real data. Keywords: dimensionality estimation, manifold learning, geodesic distance, function approximation, high-dimensional processing, tensor voting

same-paper 2 0.74859577 10 jmlr-2010-An Exponential Model for Infinite Rankings

Author: Marina Meilă, Le Bao

Abstract: This paper presents a statistical model for expressing preferences through rankings, when the number of alternatives (items to rank) is large. A human ranker will then typically rank only the most preferred items, and may not even examine the whole set of items, or know how many they are. Similarly, a user presented with the ranked output of a search engine, will only consider the highest ranked items. We model such situations by introducing a stagewise ranking model that operates with finite ordered lists called top-t orderings over an infinite space of items. We give algorithms to estimate this model from data, and demonstrate that it has sufficient statistics, being thus an exponential family model with continuous and discrete parameters. We describe its conjugate prior and other statistical properties. Then, we extend the estimation problem to multimodal data by introducing an Exponential-Blurring-Mean-Shift nonparametric clustering algorithm. The experiments highlight the properties of our model and demonstrate that infinite models over permutations can be simple, elegant and practical. Keywords: permutations, partial orderings, Mallows model, distance based ranking model, exponential family, non-parametric clustering, branch-and-bound

3 0.44132981 49 jmlr-2010-Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data

Author: Miloš Radovanović, Alexandros Nanopoulos, Mirjana Ivanović

Abstract: Different aspects of the curse of dimensionality are known to present serious challenges to various machine-learning methods and tasks. This paper explores a new aspect of the dimensionality curse, referred to as hubness, that affects the distribution of k-occurrences: the number of times a point appears among the k nearest neighbors of other points in a data set. Through theoretical and empirical analysis involving synthetic and real data sets we show that under commonly used assumptions this distribution becomes considerably skewed as dimensionality increases, causing the emergence of hubs, that is, points with very high k-occurrences which effectively represent “popular” nearest neighbors. We examine the origins of this phenomenon, showing that it is an inherent property of data distributions in high-dimensional vector space, discuss its interaction with dimensionality reduction, and explore its influence on a wide range of machine-learning tasks directly or indirectly based on measuring distances, belonging to supervised, semi-supervised, and unsupervised learning families. Keywords: nearest neighbors, curse of dimensionality, classification, semi-supervised learning, clustering

4 0.40990654 55 jmlr-2010-Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance

Author: Nguyen Xuan Vinh, Julien Epps, James Bailey

Abstract: Information theoretic measures form a fundamental class of measures for comparing clusterings, and have recently received increasing interest. Nevertheless, a number of questions concerning their properties and inter-relationships remain unresolved. In this paper, we perform an organized study of information theoretic measures for clustering comparison, including several existing popular measures in the literature, as well as some newly proposed ones. We discuss and prove their important properties, such as the metric property and the normalization property. We then highlight to the clustering community the importance of correcting information theoretic measures for chance, especially when the data size is small compared to the number of clusters present therein. Of the available information theoretic based measures, we advocate the normalized information distance (NID) as a general measure of choice, for it possesses concurrently several important properties, such as being both a metric and a normalized measure, admitting an exact analytical adjusted-for-chance form, and using the nominal [0, 1] range better than other normalized variants. Keywords: clustering comparison, information theory, adjustment for chance, normalized information distance

5 0.40200827 54 jmlr-2010-Information Retrieval Perspective to Nonlinear Dimensionality Reduction for Data Visualization

Author: Jarkko Venna, Jaakko Peltonen, Kristian Nybo, Helena Aidos, Samuel Kaski

Abstract: Nonlinear dimensionality reduction methods are often used to visualize high-dimensional data, although the existing methods have been designed for other related tasks such as manifold learning. It has been difficult to assess the quality of visualizations since the task has not been well-defined. We give a rigorous definition for a specific visualization task, resulting in quantifiable goodness measures and new visualization methods. The task is information retrieval given the visualization: to find similar data based on the similarities shown on the display. The fundamental tradeoff between precision and recall of information retrieval can then be quantified in visualizations as well. The user needs to give the relative cost of missing similar points vs. retrieving dissimilar points, after which the total cost can be measured. We then introduce a new method NeRV (neighbor retrieval visualizer) which produces an optimal visualization by minimizing the cost. We further derive a variant for supervised visualization; class information is taken rigorously into account when computing the similarity relationships. We show empirically that the unsupervised version outperforms existing unsupervised dimensionality reduction methods in the visualization task, and the supervised version outperforms existing supervised methods. Keywords: information retrieval, manifold learning, multidimensional scaling, nonlinear dimensionality reduction, visualization

6 0.37132862 59 jmlr-2010-Large Scale Online Learning of Image Similarity Through Ranking

7 0.36500877 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond

8 0.36122331 114 jmlr-2010-Unsupervised Supervised Learning I: Estimating Classification and Regression Errors without Labels

9 0.3575829 46 jmlr-2010-High Dimensional Inverse Covariance Matrix Estimation via Linear Programming

10 0.34623003 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes

11 0.34370711 109 jmlr-2010-Stochastic Composite Likelihood

12 0.34119666 105 jmlr-2010-Spectral Regularization Algorithms for Learning Large Incomplete Matrices

13 0.33775315 19 jmlr-2010-Characterization, Stability and Convergence of Hierarchical Clustering Methods

14 0.33455521 22 jmlr-2010-Classification Using Geometric Level Sets

15 0.33385909 74 jmlr-2010-Maximum Relative Margin and Data-Dependent Regularization

16 0.33296311 69 jmlr-2010-Lp-Nested Symmetric Distributions

17 0.33128849 92 jmlr-2010-Practical Approaches to Principal Component Analysis in the Presence of Missing Values

18 0.33088246 17 jmlr-2010-Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing

19 0.32952219 61 jmlr-2010-Learning From Crowds

20 0.32778233 23 jmlr-2010-Classification with Incomplete Data Using Dirichlet Process Priors