nips nips2009 nips2009-206 knowledge-graph by maker-knowledge-mining

206 nips-2009-Riffled Independence for Ranked Data


Source: pdf

Author: Jonathan Huang, Carlos Guestrin

Abstract: Representing distributions over permutations can be a daunting task due to the fact that the number of permutations of n objects scales factorially in n. One recent way that has been used to reduce storage complexity has been to exploit probabilistic independence, but as we argue, full independence assumptions impose strong sparsity constraints on distributions and are unsuitable for modeling rankings. We identify a novel class of independence structures, called riffled independence, which encompasses a more expressive family of distributions while retaining many of the properties necessary for performing efficient inference and reducing sample complexity. In riffled independence, one draws two permutations independently, then performs the riffle shuffle, common in card games, to combine the two permutations to form a single permutation. In ranking, riffled independence corresponds to ranking disjoint sets of objects independently, then interleaving those rankings. We provide a formal introduction and present algorithms for using riffled independence within Fourier-theoretic frameworks which have been explored by a number of recent papers. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Representing distributions over permutations can be a daunting task due to the fact that the number of permutations of n objects scales factorially in n. [sent-3, score-0.181]

2 One recent way that has been used to reduce storage complexity has been to exploit probabilistic independence, but as we argue, full independence assumptions impose strong sparsity constraints on distributions and are unsuitable for modeling rankings. [sent-4, score-0.203]

3 We identify a novel class of independence structures, called riffled independence, which encompasses a more expressive family of distributions while retaining many of the properties necessary for performing efficient inference and reducing sample complexity. [sent-5, score-0.168]

4 In riffled independence, one draws two permutations independently, then performs the riffle shuffle, common in card games, to combine the two permutations to form a single permutation. [sent-6, score-0.166]

5 In ranking, riffled independence corresponds to ranking disjoint sets of objects independently, then interleaving those rankings. [sent-7, score-0.222]

6 We provide a formal introduction and present algorithms for using riffled independence within Fourier-theoretic frameworks which have been explored by a number of recent papers. [sent-8, score-0.142]

7 1 Introduction Distributions over permutations play an important role in applications such as multi-object tracking, visual feature matching, and ranking. [sent-9, score-0.07]

8 In tracking, for example, permutations represent joint assignments of individual identities to track positions, and in ranking, permutations represent the preference orderings of a list of items. [sent-10, score-0.152]

9 Representing distributions over permutations is a notoriously difficult problem since there are n! permutations of n objects. [sent-11, score-0.096]

10 The quest for exploitable problem structure has led researchers to consider a number of possibilities including distribution sparsity [17, 9], exponential family parameterizations [15, 5, 14, 16], algebraic/Fourier structure [13, 12, 6, 7], and probabilistic independence [8]. [sent-13, score-0.177]

11 While sparse distributions have been successfully applied in certain tracking domains, we argue that they are less suitable in ranking problems where it might be necessary to model indifference over a number of objects. [sent-14, score-0.151]

12 In contrast, Fourier methods handle smooth distributions well but are not easily scalable without making aggressive independence assumptions [8]. [sent-15, score-0.168]

13 In this paper, we argue that while probabilistic independence might be useful in tracking, it is a poor assumption in ranking. [sent-16, score-0.179]

14 We propose a novel generalization of independence, called riffled independence, which we argue to be far more suitable for modeling distributions over rankings, and develop algorithms for working with riffled independence in the Fourier domain. [sent-17, score-0.18]

15 • We introduce an intuitive generalization of independence on permutations, which we call riffled independence, and show it to be a more appropriate notion of independence for ranked data, offering possibilities for efficient inference and reduced sample complexity. [sent-19, score-0.33]

16 • We introduce a novel family of distributions, called biased riffle shuffles, that are useful for riffled independence and propose an algorithm for computing its Fourier transform. [sent-20, score-0.191]

17 By rearranging rows, one sees that independence imposes block-diagonal structure on the matrices. [sent-24, score-0.142]

18 2 Distributions on permutations and independence relations In the context of ranking, a permutation σ = [σ1, . . . , σn] assigns a rank σj to each of n items.

19 If we are ranking a list of fruits/vegetables enumerated as (1) Artichoke, (2) Broccoli, (3) Cherry, and (4) Dates, then the permutation σ = [σA σB σC σD ] = [2 3 1 4] ranks Cherry first, Artichoke second, Broccoli third, Dates last. [sent-29, score-0.108]
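
As a quick aside, the rank-vector convention above can be checked mechanically. The following is a minimal sketch of our own (not code from the paper; the variable names are just for illustration), in Python:

    # sigma[i] is the rank assigned to item i (1-indexed ranks).
    items = ["Artichoke", "Broccoli", "Cherry", "Dates"]
    sigma = [2, 3, 1, 4]  # [sigma_A, sigma_B, sigma_C, sigma_D]

    # Recover the ordered list: the item with rank 1 comes first, rank 2 second, ...
    by_rank = sorted(range(len(items)), key=lambda i: sigma[i])
    print([items[i] for i in by_rank])  # ['Cherry', 'Artichoke', 'Broccoli', 'Dates']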

20 There have been a variety of methods proposed for summarizing distributions over permutations ranging from older ad-hoc methods such as maintaining k-best hypotheses [17] to the more recent Fourier-based methods which maintain a set of low-order summary statistics [18, 2, 11, 7]. [sent-44, score-0.096]

21 Such independence assumptions can be reasonable, e.g., for tracking problems, since confusion in data association only occurs over small independent subgroups of objects in many problems. [sent-55, score-0.073]

22 Probabilistic independence assumptions on the symmetric group can simply be stated as follows. [sent-57, score-0.142]

23 Typically, one decomposes the distribution recursively, storing small enough factors exactly, or compresses factors using Fourier coefficients (but using higher frequency terms than would be possible without the independence assumption). [sent-89, score-0.22]

24 In order to exploit probabilistic independence in the Fourier domain, Huang et al. [sent-90, score-0.153]

25 [8] proposed algorithms for joining factors and splitting distributions into independent components in the Fourier domain. [sent-91, score-0.071]

26 Despite its utility for many tracking problems, however, we argue that the independence assumption on permutations implies a rather restrictive constraint on distributions, rendering independence highly unrealistic in ranking applications. [sent-93, score-0.443]

27 For some fixed subset Y ⊆ {1, . . . , n}, σX is a permutation of elements in Y and σX̄ is a permutation of its complement, Ȳ, with probability 1. [sent-97, score-0.078]

28 In our ranking example, however, the first-order condition forces the probability of any vegetable being in third place to be zero, even though both vegetables will, in general, have nonzero marginal probability of being in second place, which seems quite unrealistic. [sent-103, score-0.158]

29 3 Going beyond full independence: Riffled independence The riffle (or dovetail) shuffle [1] is perhaps the most popular method of card shuffling, in which one cuts a deck of n cards into two piles, X = {1, . . . , p} and X̄ = {p + 1, . . . , n}.

30 Rankings that are riffle independent are formed by independently selecting rankings for two disjoint subsets of objects, then interleaving the rankings using a riffle shuffle to form a ranking over all objects. [sent-114, score-0.193]

31 For example, we might first 'cut the deck' into two piles, vegetables (X) and fruits (X̄), independently decide that Broccoli is preferred over Artichoke (σB < σA) and that Dates is preferred over Cherry (σD < σC), then interleave the fruit and vegetable rankings to form σB < σD < σA < σC (i.e., σ = [3 1 4 2]). [sent-115, score-0.305]

32 Intuitively, riffled independence models complex relationships within each of the sets X and X̄ while allowing correlations between sets to be modeled only by a constrained form of shuffling (cf. Figure 2, which depicts riffle shuffling). [sent-118, score-0.142]

33 The ranking σ′ after a shuffle is generated from the ranking prior to that shuffle, σ, by drawing a permutation τ from a shuffling distribution m(τ) and setting σ′ = τσ. [sent-122, score-0.126]

34 To answer this question, we use the distinguishing property of the riffle shuffle: after cutting the deck into two piles of size p and q = n − p, the relative ranking relations within each pile are preserved. [sent-126, score-0.201]

35 Thus, if the ith card lies above the jth in one of the piles, then after shuffling, the ith card remains above the jth. [sent-127, score-0.08]

36 Any allowable riffle shuffling distribution must therefore assign zero probability to permutations which do not preserve relative ranking relations. [sent-129, score-0.131]

37 The set of permutations which do preserve these relations has a simple description. [sent-130, score-0.084]

38 The (p, q)-interleavings can be shown to preserve the relative ranking relations within each of the subsets X = {1, . . . , p} and X̄ = {p + 1, . . . , n}.

39 One possible riffle shuffling distribution on S4 might, for example, assign uniform probability (munif2,2(σ) = 1/6) to each permutation in Ω2,2 and zero probability to everything else, reflecting indifference between vegetables and fruits. [sent-155, score-0.155]

40 We now formally define our generalization of independence, in which a distribution that factors fully independently is allowed to undergo a single riffle shuffle. [sent-156, score-0.19]

41 We denote riffled independence by: h = f ⊥mp,q g, and refer to f, g as riffled factors. [sent-165, score-0.142]

42 To draw from h, one independently draws a permutation σp of cards {1, . . . , p}, a permutation σq of cards {p + 1, . . . , n}, and an interleaving τ from mp,q.

43 In our example, the rankings σp = [2 1] (Broccoli preferred to Artichoke) and σq = [4 3] (Dates preferred to Cherry) are selected, then shuffled (multiplied by τ{1,3} = [1 3 2 4]) to obtain σ = [3 1 4 2]. [sent-173, score-0.096]
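
To trace the composition σ = τσ concretely, here is a small Python sketch of our own following the worked example above (not the authors' code):

    sigma_p = [2, 1]        # Broccoli above Artichoke (cards 1, 2)
    sigma_q = [4, 3]        # Dates above Cherry (cards 3, 4, using ranks {3, 4})
    tau     = [1, 3, 2, 4]  # the (2,2)-interleaving tau_{1,3}

    sigma0 = sigma_p + sigma_q             # provisional ranking [2, 1, 4, 3]
    sigma  = [tau[r - 1] for r in sigma0]  # apply tau to each provisional rank
    print(sigma)                           # -> [3, 1, 4, 2]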

44 We remark that setting mp,q to be the delta distribution on any of the (p, q)-interleavings in Ωp,q recovers the definition of ordinary probabilistic independence, and thus riffled independence is a strict generalization thereof. [sent-174, score-0.18]

45 Just as in the full independence regime, where the distributions f and g are marginal distributions of rankings of X and X̄, in the riffled independence regime they can be thought of as marginal distributions of the relative rankings of X and X̄. [sent-175, score-0.468]

46 There is, in the general case, a significant increase in storage required for riffled independence over full independence. [sent-177, score-0.166]

47 In addition to the O(p! + q!) storage required for distributions f and g, we now require O((n choose p)) storage for the nonzero terms of the riffle shuffling distribution mp,q. [sent-180, score-0.065]
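
A back-of-envelope comparison (our own sketch; the sizes here are arbitrary) shows how much smaller this is than storing a full distribution on Sn:

    from math import comb, factorial

    n, p = 10, 5
    q = n - p
    full    = factorial(n)                              # 3,628,800 entries for all of S10
    riffled = factorial(p) + factorial(q) + comb(n, p)  # 120 + 120 + 252 = 492
    print(full, riffled)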

48 We might, for example, have complex preference relations amongst vegetables and amongst fruits, but be completely indifferent with respect to the subsets, vegetables and fruits, as a whole. [sent-184, score-0.192]

49-50 Starting with a deck of n cards cut into a left pile ({1, . . . , p}) and a right pile ({p + 1, . . . , n}), pick one of the piles with probability proportional to its size (p/n for the left pile, q/n for the right) and drop the bottommost card, thus mapping either card p or card n to rank n. [sent-192, score-0.119]

51 Then recurse on the n − 1 remaining undropped cards, drawing a (p − 1, q)-interleaving if the left pile was picked, or a (p, q − 1)-interleaving if the right pile was picked. [sent-193, score-0.129]

52 We model this bias using a simple one-parameter family of distributions in which cards from the left and right piles drop with probability proportional to αp and (1 − α)q, respectively, instead of p and q. [sent-198, score-0.136]
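
The recursive drop procedure above translates directly into a sampler. The sketch below is our own hedged reading of it (the function name and interface are made up, not the authors' implementation):

    import random

    def biased_riffle_interleaving(p, q, alpha, rng=random):
        # Sample a (p, q)-interleaving tau, where tau[i - 1] is the final rank of card i.
        left  = list(range(1, p + 1))           # left pile {1, ..., p}
        right = list(range(p + 1, p + q + 1))   # right pile {p+1, ..., n}
        tau = [0] * (p + q)
        for rank in range(p + q, 0, -1):        # fill ranks n, n-1, ..., 1
            w_left  = alpha * len(left)         # drop weights proportional to
            w_right = (1 - alpha) * len(right)  # alpha*p and (1-alpha)*q
            if right and (not left or rng.random() < w_right / (w_left + w_right)):
                card = right.pop()              # drop the bottommost right card
            else:
                card = left.pop()               # drop the bottommost left card
            tau[card - 1] = rank
        return tau

    print(biased_riffle_interleaving(2, 2, 0.5))  # e.g. [1, 3, 2, 4]

With alpha = 0 or 1, one pile drains completely before the other, matching the full independence endpoints discussed below; alpha = 1/2 recovers the unbiased recursion.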

53 We will refer to α as the bias parameter, and the family of distributions parameterized by α as the biased riffle shuffles. [sent-199, score-0.086]

54 The bias parameter α can be thought of as a knob controlling the preference for one subset over the other, and might reflect, for example, a preference for fruits over vegetables, or perhaps indifference between the two subsets. [sent-201, score-0.135]

55 Setting α = 0 or 1 recovers the full independence assumption, preferring objects in X (vegetables) over objects in X̄ (fruits) with probability one (or vice-versa), and setting α = .5 reflects indifference between the two subsets.

56 For example, α might depend on the number of cards that have been dropped from each pile (allowing, perhaps, for distributions to prefer crunchy fruits over crunchy vegetables, but soft vegetables over soft fruits). [sent-206, score-0.313]

57 We are the first to (1) use the recurrence to Fourier transform mp,q and (2) consider biased versions. [sent-209, score-0.097]

58 The biased riffle shuffles in [4] are not similar to our biased riffle shuffles. [sent-210, score-0.098]

59 Setting α = 0 or 1 recovers full independence, where a subset of objects is preferred over the other with probability one. [sent-224, score-0.065]

60 4 Between independence and conditional independence We have presented riffle independent distributions as fully independent distributions which have been convolved with a certain class of shuffling distributions. [sent-225, score-0.376]
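
For tiny n, this "fully independent, then convolved" view can be checked by brute force. A sketch of our own (exponential in n, purely illustrative):

    from itertools import permutations

    def compose(t, s):
        # Group operation on rank vectors: (t o s)(i) = t[s(i)].
        return tuple(t[x - 1] for x in s)

    def convolve(m, f0, n):
        # h(sigma) = sum over tau of m(tau) * f0(tau^{-1} sigma),
        # i.e. the law of tau * sigma0 with tau ~ m and sigma0 ~ f0 independent.
        h = {s: 0.0 for s in permutations(range(1, n + 1))}
        for tau, p_tau in m.items():
            for s0, p_s0 in f0.items():
                h[compose(tau, s0)] += p_tau * p_s0
        return h

Taking m uniform over the (p, q)-interleavings and f0 a fully factored distribution reproduces a riffle independent h on small examples.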

61 In this section, we provide an alternative view of riffled independence based on conditional independence, showing that the notion of riffled independence lies somewhere between full and conditional independence. [sent-226, score-0.294]

62 In Section 3, we formed a ranking by first independently drawing permutations πp and πq of the object sets {1, . . . , p} and {p + 1, . . . , n}, then interleaving them.

63 An equivalent way to form the same σ, however, is to first draw an interleaving τY ∈ Ωp,q, then, conditioned on the choice of Y, draw independent permutations of the sets Y and Ȳ. [sent-243, score-0.127]

64 In our example, we might first draw the (2,2)-interleaving [1 3 2 4] (so that after shuffling, we would obtain σVeg < σFruit < σVeg < σFruit). [sent-244, score-0.063]

65 Then we would draw a permutation of the vegetable ranks (Y = {1, 3}), say, [3 1], and a permutation of the fruit ranks (Ȳ = {2, 4}), [4 2], to obtain a final ranking over all items: σ = [3 1 4 2], or σB < σD < σA < σC. [sent-245, score-0.213]

66 It is tempting to think that riffled independence is exactly the conditional independence assumption, in which case the distribution would factor as h(σ) = h(Y) · h(σX | Y) · h(σX̄ | Y). [sent-246, score-0.295]

67 Conditional independence would require O((n choose p)(p! + q! + 1)) parameters, while riffled independence requires only O((n choose p) + p! + q!).

68 We now provide a simple correspondence between the conditional independence view of riffled independence presented in this section and the shuffle-theoretic definition from Section 3 (Def. 2).

69 Define the map φ which, given a permutation of Y (or Ȳ), returns the permutation σp ∈ Sp (or σq ∈ Sq) such that [σp]i is the rank of [σX]i relative to the set Y. [sent-254, score-0.089]

70 For example, if the permutation of the vegetable ranks is σX = [3 1] (with Artichoke ranked third, Broccoli first), then φ(σX ) = [2 1] since, relative to the set of vegetables, Artichoke is ranked second, and Broccoli first. [sent-255, score-0.149]
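
The map φ is easy to state as code; a minimal sketch of our own:

    def phi(ranks):
        # Relative ranks of a list of absolute ranks, e.g. phi([3, 1]) -> [2, 1].
        order = sorted(ranks)
        return [order.index(r) + 1 for r in ranks]

    print(phi([3, 1]))  # Artichoke second among vegetables, Broccoli first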

71 Proposition 3 is useful because it shows that the probability of a single ranking can be computed without summing over the entire symmetric group (a convolution), a fact that might not be obvious from Definition 2. [sent-259, score-0.064]

72 The factorization h(σ) = m(τY) f(φ(σX)) g(φ(σX̄)) also suggests that riffled independence behaves essentially like full independence (without the first-order condition), where, in addition to the independent variables σX and σX̄, we also independently randomize over the subset Y. [sent-260, score-0.326]
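
Proposition 3's factorization gives the probability of one ranking directly. A hedged sketch of our own (reusing the phi sketch above; f, g, m are hypothetical factor tables keyed by tuples, with m keyed by the rank set Y, which is equivalent to the interleaving tau_Y):

    def riffle_prob(sigma, X, f, g, m):
        # sigma: full rank vector; X: 0-based indices of the first object subset.
        Xbar = [i for i in range(len(sigma)) if i not in X]
        s_X, s_Xbar = [sigma[i] for i in X], [sigma[i] for i in Xbar]
        Y = tuple(sorted(s_X))  # the ranks occupied by objects in X
        return m[Y] * f[tuple(phi(s_X))] * g[tuple(phi(s_Xbar))]

On the running example, riffle_prob([3, 1, 4, 2], [0, 1], f, g, m) evaluates m at Y = (1, 3), f at (2, 1), and g at (2, 1).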

73 An immediate consequence is that, just as in the full independence regime, conditioning operations on certain observations and MAP (maximum a posteriori) assignment problems decompose according to riffled independence structure. [sent-261, score-0.294]

74 Consider riffle independent prior and likelihood functions, hprior and hlike, on Sn which factor as hprior = fprior ⊥mprior gprior and hlike = flike ⊥mlike glike, respectively. [sent-263, score-0.124]

75 The posterior distribution under Bayes rule can be written as the riffle independent distribution hpost ∝ (fprior ⊙ flike) ⊥mprior⊙mlike (gprior ⊙ glike), where the symbol ⊙ denotes the pointwise product operation. [sent-264, score-0.07]

76 As a corollary, it follows that conditioning on simple pairwise ranking likelihood functions (that depend only on whether object i is preferred to object j) decomposes along riffled independence structures. [sent-266, score-0.216]

77 We begin with a brief introduction to Fourier-theoretic inference on permutations (see [11, 7] for a detailed exposition). [sent-268, score-0.082]

78 • (Convolution) The Fourier transform of a convolution is a product of Fourier transforms: [f̂∗g]i = f̂i · ĝi, for each frequency level i, where the operation · is matrix multiplication. [sent-281, score-0.066]

79 A number of papers in recent years ([13, 6, 8, 7]) have considered approximating distributions over permutations using a truncated (bandlimited) set of Fourier coefficients and have proposed inference algorithms that operate on these Fourier coefficient matrices. [sent-283, score-0.096]

80 [8] introduced Fourier domain algorithms, Join and Split, for combining independent factors to form joints and for extracting the factors from a joint distribution, respectively. [sent-286, score-0.07]

81 However, for the class of biased riffle shuffles from Section 3, one can efficiently compute the low-frequency terms of mαp,q by employing the recurrence relation in Alg. 1.

82 Alg. 1 expresses a biased riffle shuffle on Sn as a linear combination of biased riffle shuffles on Sn−1.

83 Instead, our RiffleSplit algorithm left-multiplies each ĥi term by [m̂unifp,q]i, which can be shown to be equivalent to convolving the distribution h by the 'dual shuffle' m∗, defined as m∗(σ) = munifp,q(σ−1). [sent-315, score-0.111]

84 As with RiffleJoin, it is necessary to Fourier transform munifp,q, which we can again accomplish via the recurrence in Alg. 1.

85 Given enough Fourier terms to reconstruct the kth-order marginals of f and g, RiffleJoin returns enough Fourier terms to exactly reconstruct the kth-order marginals of h. [sent-326, score-0.132]

86 Likewise, given enough Fourier terms to reconstruct the kth-order marginals of h, RiffleSplit returns enough Fourier terms to exactly reconstruct the kth-order marginals of both f and g. [sent-327, score-0.132]

87 6 Experiments In this section, we validate our algorithms and show that riffled independence exists in real data. [sent-331, score-0.142]

88 We first perform an exhaustive search for subsets X and X̄ that are closest to riffle independent (with respect to DKL), and find that candidate 2 is nearly riffle independent of the remaining candidates. [sent-334, score-0.064]

89 In Fig. 4(a) we plot the true vote distribution and the best approximation by a distribution in which candidate 2 is riffle independent of the rest. [sent-336, score-0.066]

90 We are now able to verify this conjecture (attributed to Diaconis) in a riffled independence sense. [sent-341, score-0.142]

91 We find that X = {1, 3} and X̄ = {4, 5} are very nearly riffle independent and thus are able to verify that candidate sets {2}, {1, 3}, {4, 5} are indeed grouped in a riffle independent sense in the APA data. [sent-343, score-0.064]

92 The sushi dataset [10] consists of 5000 full rankings of ten types of sushi. [sent-350, score-0.083]

93 Fig. 4(b) plots test-set log-likelihood as a function of training set size; we see that riffle independence assumptions can help significantly to lower the sample complexity of learning. [sent-354, score-0.142]

94 This can also be seen in Fig. 4(c), which shows the first-order marginals of Uni (Sea Urchin) rankings and the biased riffle approximation. [sent-369, score-0.085]

95 To understand the behavior of RiffleSplit in approximately riffle independent situations, we draw sample sets of varying sizes from a riffle independent distribution on S8 (with a fixed bias parameter α).

96 For example, several papers note that graphical models cannot compactly represent distributions over permutations due to mutual exclusivity. [sent-386, score-0.096]

97 An interesting question that our paper opens is whether it is possible to use something similar to graphical models by substituting conditional generalizations of riffled independence for ordinary conditional independence. [sent-387, score-0.154]

98 Other possibilities include going beyond the algebraic approach and studying riffled independence in non-Fourier frameworks and developing statistical (riffled) independence tests. [sent-388, score-0.297]

99 In summary, we have introduced riffled independence and discussed how to exploit such structure in a Fourier-theoretic framework. [sent-389, score-0.142]

100 Riffled independence is a new tool for analyzing ranked data and has the potential to offer novel insights into datasets both new and old. [sent-390, score-0.175]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('rif', 0.883), ('shuf', 0.286), ('independence', 0.142), ('fourier', 0.138), ('esplit', 0.089), ('vegetables', 0.083), ('ing', 0.076), ('ejoin', 0.076), ('permutations', 0.07), ('fruits', 0.064), ('pile', 0.057), ('sn', 0.054), ('broccoli', 0.051), ('ranking', 0.05), ('biased', 0.049), ('rankings', 0.048), ('apa', 0.044), ('munif', 0.044), ('piles', 0.044), ('cards', 0.043), ('permutation', 0.039), ('artichoke', 0.039), ('iffle', 0.038), ('marginals', 0.036), ('deck', 0.036), ('coef', 0.035), ('ranked', 0.033), ('tracking', 0.027), ('distributions', 0.026), ('convolution', 0.026), ('card', 0.026), ('recurrence', 0.025), ('cherry', 0.025), ('sushi', 0.025), ('vegetable', 0.025), ('factors', 0.025), ('cients', 0.025), ('preferred', 0.024), ('candidate', 0.024), ('transform', 0.023), ('indifference', 0.022), ('huang', 0.022), ('dkl', 0.021), ('independent', 0.02), ('ranks', 0.019), ('foreach', 0.019), ('plit', 0.019), ('riffle', 0.019), ('ruit', 0.019), ('urchin', 0.019), ('split', 0.017), ('frequency', 0.017), ('nif', 0.017), ('reconstruct', 0.016), ('recovers', 0.016), ('dates', 0.016), ('objects', 0.015), ('interleaving', 0.015), ('exclusivity', 0.015), ('drawing', 0.015), ('guestrin', 0.014), ('relations', 0.014), ('favorite', 0.014), ('join', 0.014), ('might', 0.014), ('th', 0.014), ('storage', 0.014), ('possibilities', 0.013), ('crunchy', 0.013), ('dovetail', 0.013), ('flike', 0.013), ('fprior', 0.013), ('glike', 0.013), ('gprior', 0.013), ('hlike', 0.013), ('hprior', 0.013), ('mlike', 0.013), ('mprior', 0.013), ('oin', 0.013), ('preference', 0.012), ('hi', 0.012), ('ed', 0.012), ('independently', 0.012), ('theoretic', 0.012), ('argue', 0.012), ('generalizations', 0.012), ('drop', 0.012), ('distribution', 0.011), ('rank', 0.011), ('confusion', 0.011), ('diaconis', 0.011), ('fruit', 0.011), ('draw', 0.011), ('bias', 0.011), ('probabilistic', 0.011), ('full', 0.01), ('candidates', 0.01), ('bb', 0.01), ('team', 0.01), ('sth', 0.01)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 206 nips-2009-Riffled Independence for Ranked Data

Author: Jonathan Huang, Carlos Guestrin

Abstract: Representing distributions over permutations can be a daunting task due to the fact that the number of permutations of n objects scales factorially in n. One recent way that has been used to reduce storage complexity has been to exploit probabilistic independence, but as we argue, full independence assumptions impose strong sparsity constraints on distributions and are unsuitable for modeling rankings. We identify a novel class of independence structures, called riffled independence, which encompasses a more expressive family of distributions while retaining many of the properties necessary for performing efficient inference and reducing sample complexity. In riffled independence, one draws two permutations independently, then performs the riffle shuffle, common in card games, to combine the two permutations to form a single permutation. In ranking, riffled independence corresponds to ranking disjoint sets of objects independently, then interleaving those rankings. We provide a formal introduction and present algorithms for using riffled independence within Fourier-theoretic frameworks which have been explored by a number of recent papers. 1

2 0.056160159 62 nips-2009-Correlation Coefficients are Insufficient for Analyzing Spike Count Dependencies

Author: Arno Onken, Steffen Grünewälder, Klaus Obermayer

Abstract: The linear correlation coefficient is typically used to characterize and analyze dependencies of neural spike counts. Here, we show that the correlation coefficient is in general insufficient to characterize these dependencies. We construct two neuron spike count models with Poisson-like marginals and vary their dependence structure using copulas. To this end, we construct a copula that allows to keep the spike counts uncorrelated while varying their dependence strength. Moreover, we employ a network of leaky integrate-and-fire neurons to investigate whether weakly correlated spike counts with strong dependencies are likely to occur in real networks. We find that the entropy of uncorrelated but dependent spike count distributions can deviate from the corresponding distribution with independent components by more than 25 % and that weakly correlated but strongly dependent spike counts are very likely to occur in biological networks. Finally, we introduce a test for deciding whether the dependence structure of distributions with Poissonlike marginals is well characterized by the linear correlation coefficient and verify it for different copula-based models. 1

3 0.043118425 78 nips-2009-Efficient Moments-based Permutation Tests

Author: Chunxiao Zhou, Huixia J. Wang, Yongmei M. Wang

Abstract: In this paper, we develop an efficient moments-based permutation test approach to improve the test's computational efficiency by approximating the permutation distribution of the test statistic with Pearson distribution series. This approach involves the calculation of the first four moments of the permutation distribution. We propose a novel recursive method to derive these moments theoretically and analytically without any permutation. Experimental results using different test statistics are demonstrated using simulated data and real data. The proposed strategy takes advantage of nonparametric permutation tests and parametric Pearson distribution approximation to achieve both accuracy and efficiency. 1 Introduction Permutation tests are flexible nonparametric alternatives to parametric tests in small samples, or when the distribution of a test statistic is unknown or mathematically intractable. In permutation tests, except exchangeability, no other statistical assumptions are required. The p-values can be obtained by using the permutation distribution. Permutation tests are appealing in many biomedical studies, which often have limited observations with unknown distribution. They have been used successfully in structural MR image analysis [1, 2, 3], in functional MR image analysis [4], and in 3D face analysis [5]. There are three common approaches to construct the permutation distribution [6, 7, 8]: (1) exact permutation enumerating all possible arrangements; (2) approximate permutation based on random sampling from all possible permutations; (3) approximate permutation using the analytical moments of the exact permutation distribution under the null hypothesis. The main disadvantage of the exact permutation is the computational cost, due to the factorial increase in the number of permutations with the increasing number of subjects. The second technique often gives inflated type I errors caused by random sampling. When a large number of repeated tests are needed, the random permutation strategy is also computationally expensive to achieve satisfactory accuracy. Regarding the third approach, the exact permutation distribution may not have moments or moments with tractability. In most applications, it is not the existence but the derivation of moments that limits the third approach. To the best of our knowledge, there is no systematic and efficient way to derive the moments of the permutation distribution. Recently, Zhou [3] proposed a solution by converting the permutation of data to that of the statistic coefficients that are symmetric to the permutation. Since the test statistic coefficients usually have simple presentations, it is easier to track the permutation of the test statistic coefficients than that of data. However, this method requires the derivation of the permutation for each specific test statistic, which is not accessible to practical users. In this paper, we propose a novel strategy by employing a general theoretical method to derive the moments of the permutation distribution of any weighted v-statistics, for both univariate and multivariate data. We note that any moments of the permutation distribution for weighted v-statistics [9] can be considered as a summation of the product of data function term and index function term over a high dimensional index set and all possible permutations.
Our key idea is to divide the whole index set into several permutation equivalent (see Definition 2) index subsets such that the summation of the data/index function term over all permutations is invariant within each subset and can be calculated without conducting any permutation. Then we can obtain the moments by summing up several subtotals. The proposed method can be extended to equivalent weighted v-statistics by replacing them with monotonic weighted v-statistics. This is due to the fact that only the order of test statistics of all permutations matters for obtaining the p-values, so that the monotonic weighted v-statistics shares the same p-value with the original test statistic. Given the first four moments, the permutation distribution can be well fitted by Pearson distribution series. The p-values are then obtained without conducting any real permutation. For multiple comparison of two-group difference, given the sample sizes n1 = 21 and n2 = 21 and the number of tests m = 2,000, we need to conduct m × (n1 + n2)!/(n1! n2!) ≈ 1.1×10^15 permutations for the exact permutation test. Even for 20,000 random permutations per test, we still need m × 20,000 = 4×10^7 permutations. Alternatively, our moments-based permutation method using Pearson distribution approximation only involves the calculation of the first four analytically-derived moments of exact permutation distributions to achieve high accuracy (see section 3). Instead of calculating test statistics in factorial scale with exact permutation, our moments-based permutation only requires computation of polynomial order. For example, the computational costs for the univariate mean difference test statistic and the modified multivariate Hotelling's T^2 test statistic [8] are O(n) and O(n^3), respectively, where n = n1 + n2. 2 Methodology In this section, we shall mainly discuss how to calculate the moments of the permutation distribution for weighted v-statistics. For other test statistics, a possible solution is to replace them with their equivalent weighted v-statistics by monotonic transforms. The detailed discussion about equivalent test statistics can be found in [7, 8, 10]. 2.1 Computational challenge Let us first look at a toy example. Suppose we have two-group univariate data x = (x1, …, xn1, xn1+1, …, xn1+n2), where the first n1 elements are in group A and the rest, n2, are in group B. For comparison of the two groups, the hypothesis is typically constructed as H0: μA = μB vs. Ha: μA ≠ μB, where μA, μB are the population means of groups A and B, respectively. Define x̄A = (1/n1) Σ_{i=1}^{n1} xi and x̄B = (1/n2) Σ_{i=n1+1}^{n} xi as the sample means of the two groups, where n = n1 + n2. We choose the univariate group mean difference statistic T(x) = x̄A − x̄B = Σ_{i=1}^{n} w(i) xi as the test statistic, where the index function is w(i) = 1/n1 if i ∈ {1, …, n1} and w(i) = −1/n2 if i ∈ {n1 + 1, …, n}. Then the total number of all possible permutations of {1, …, n} is n!. To calculate the fourth moment of the permutation distribution, Eπ(T^4(x)) = (1/n!) Σ_{π∈Sn} (Σ_{i=1}^{n} w(i) x_{π(i)})^4 = (1/n!) Σ_{π∈Sn} Σ_{i1=1}^{n} Σ_{i2=1}^{n} Σ_{i3=1}^{n} Σ_{i4=1}^{n} w(i1) w(i2) w(i3) w(i4) x_{π(i1)} x_{π(i2)} x_{π(i3)} x_{π(i4)}, where π is the permutation operator and the symmetric group Sn [11] includes all distinct permutations. The above example shows that the moment calculation can be considered as a summation over all possible permutations and a large index set.
It is noticeable that the computational challenge here is to go through the factorial-level permutations and polynomial-level indices. 2.2 Partition the index set In this paper, we assume that the test statistic T can be expressed as a weighted v-statistic of degree d [9], that is, T(x) = Σ_{i1=1}^{n} … Σ_{id=1}^{n} w(i1, …, id) h(xi1, …, xid), where x = (x1, x2, …, xn)^T is a data vector with n observations, w is a symmetric index function, and h is a symmetric data function, i.e., invariant under permutation of (i1, …, id). Though the symmetry property is not required for our method, it helps reduce the computational cost. Here, each observation xk can be either univariate or multivariate. In the above toy example, d = 1 and h is the identity function. Therefore, the r-th moment of the test statistic from the permuted data is: Eπ(T^r(x)) = Eπ((Σ_{i1,…,id} w(i1, …, id) h(x_{π(i1)}, …, x_{π(id)}))^r) = Eπ[Σ_{i1^(1),…,id^(1)} … Σ_{i1^(r),…,id^(r)} {Π_{k=1}^{r} w(i1^(k), …, id^(k)) Π_{k=1}^{r} h(x_{π(i1^(k))}, …, x_{π(id^(k))})}]. Then we can exchange the summation order of permutations and that of indices: Eπ(T^r(x)) = Σ_{i1^(1),…,id^(1)} … Σ_{i1^(r),…,id^(r)} {(Π_{k=1}^{r} w(i1^(k), …, id^(k))) Eπ(Π_{k=1}^{r} h(x_{π(i1^(k))}, …, x_{π(id^(k))}))}. Thus any moment of the permutation distribution can be considered as a summation of the product of a data function term and an index function term over a high-dimensional index set and all possible permutations. Since all possible permutations map any index value between 1 and n to all possible index values from 1 to n with equal probability, Eπ(Π_{k=1}^{r} h(x_{π(i1^(k))}, …, x_{π(id^(k))})), the summation of the data function over all permutations, is only related to the equal/unequal relationship among indices. It is natural to divide the whole index set U = {i1, …, id}^r = {(i1^(1), …, id^(1)), …, (i1^(r), …, id^(r))} into the union of disjoint index subsets, in which Eπ(Π_{k=1}^{r} h(x_{π(i1^(k))}, …, x_{π(id^(k))})) is invariant. Definition 1. Since h is a symmetric function, two index elements (i1, …, id) and (j1, …, jd) are said to be equivalent if they are the same up to the order. For example, for d = 3, (1, 4, 5) = (1, 5, 4) = (4, 1, 5) = (4, 5, 1) = (5, 1, 4) = (5, 4, 1). Definition 2. Two indices {(i1^(1), …, id^(1)), …, (i1^(r), …, id^(r))} and {(j1^(1), …, jd^(1)), …, (j1^(r), …, jd^(r))} are said to be permutation equivalent if there exists a permutation π ∈ Sn such that {(π(i1^(1)), …, π(id^(1))), …, (π(i1^(r)), …, π(id^(r)))} = {(j1^(1), …, jd^(1)), …, (j1^(r), …, jd^(r))}. Here

4 0.039862346 199 nips-2009-Ranking Measures and Loss Functions in Learning to Rank

Author: Wei Chen, Tie-yan Liu, Yanyan Lan, Zhi-ming Ma, Hang Li

Abstract: Learning to rank has become an important research topic in machine learning. While most learning-to-rank methods learn the ranking functions by minimizing loss functions, it is the ranking measures (such as NDCG and MAP) that are used to evaluate the performance of the learned ranking functions. In this work, we reveal the relationship between ranking measures and loss functions in learningto-rank methods, such as Ranking SVM, RankBoost, RankNet, and ListMLE. We show that the loss functions of these methods are upper bounds of the measurebased ranking errors. As a result, the minimization of these loss functions will lead to the maximization of the ranking measures. The key to obtaining this result is to model ranking as a sequence of classification tasks, and define a so-called essential loss for ranking as the weighted sum of the classification errors of individual tasks in the sequence. We have proved that the essential loss is both an upper bound of the measure-based ranking errors, and a lower bound of the loss functions in the aforementioned methods. Our proof technique also suggests a way to modify existing loss functions to make them tighter bounds of the measure-based ranking errors. Experimental results on benchmark datasets show that the modifications can lead to better ranking performances, demonstrating the correctness of our theoretical analysis. 1

5 0.036644895 230 nips-2009-Statistical Consistency of Top-k Ranking

Author: Fen Xia, Tie-yan Liu, Hang Li

Abstract: This paper is concerned with the consistency analysis on listwise ranking methods. Among various ranking methods, the listwise methods have competitive performances on benchmark datasets and are regarded as one of the state-of-the-art approaches. Most listwise ranking methods manage to optimize ranking on the whole list (permutation) of objects, however, in practical applications such as information retrieval, correct ranking at the top k positions is much more important. This paper aims to analyze whether existing listwise ranking methods are statistically consistent in the top-k setting. For this purpose, we define a top-k ranking framework, where the true loss (and thus the risks) are defined on the basis of top-k subgroup of permutations. This framework can include the permutationlevel ranking framework proposed in previous work as a special case. Based on the new framework, we derive sufficient conditions for a listwise ranking method to be consistent with the top-k true loss, and show an effective way of modifying the surrogate loss functions in existing methods to satisfy these conditions. Experimental results show that after the modifications, the methods can work significantly better than their original versions. 1

6 0.026701314 85 nips-2009-Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model

7 0.026665768 59 nips-2009-Construction of Nonparametric Bayesian Models from Parametric Bayes Equations

8 0.025530649 7 nips-2009-A Data-Driven Approach to Modeling Choice

9 0.023733007 87 nips-2009-Exponential Family Graph Matching and Ranking

10 0.021891547 136 nips-2009-Learning to Rank by Optimizing NDCG Measure

11 0.020765223 53 nips-2009-Complexity of Decentralized Control: Special Cases

12 0.020359566 77 nips-2009-Efficient Match Kernel between Sets of Features for Visual Recognition

13 0.019856272 190 nips-2009-Polynomial Semantic Indexing

14 0.019204322 142 nips-2009-Locality-sensitive binary codes from shift-invariant kernels

15 0.018494485 248 nips-2009-Toward Provably Correct Feature Selection in Arbitrary Domains

16 0.01775826 47 nips-2009-Boosting with Spatial Regularization

17 0.017752709 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models

18 0.016543582 67 nips-2009-Directed Regression

19 0.016389297 3 nips-2009-AUC optimization and the two-sample problem

20 0.016166447 222 nips-2009-Sparse Estimation Using General Likelihoods and Non-Factorial Priors


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.055), (1, -0.0), (2, -0.001), (3, -0.009), (4, -0.003), (5, -0.02), (6, -0.04), (7, -0.037), (8, 0.02), (9, -0.048), (10, 0.012), (11, -0.009), (12, -0.015), (13, -0.015), (14, 0.005), (15, 0.009), (16, -0.013), (17, 0.028), (18, 0.008), (19, 0.008), (20, -0.042), (21, 0.016), (22, -0.035), (23, 0.023), (24, -0.04), (25, 0.029), (26, -0.027), (27, -0.02), (28, -0.038), (29, -0.015), (30, -0.027), (31, -0.04), (32, -0.052), (33, -0.017), (34, 0.029), (35, 0.008), (36, -0.045), (37, 0.023), (38, -0.066), (39, -0.012), (40, 0.029), (41, -0.027), (42, 0.028), (43, -0.003), (44, -0.018), (45, -0.028), (46, 0.078), (47, -0.046), (48, 0.001), (49, -0.009)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.89724457 206 nips-2009-Riffled Independence for Ranked Data

Author: Jonathan Huang, Carlos Guestrin

Abstract: Representing distributions over permutations can be a daunting task due to the fact that the number of permutations of n objects scales factorially in n. One recent way that has been used to reduce storage complexity has been to exploit probabilistic independence, but as we argue, full independence assumptions impose strong sparsity constraints on distributions and are unsuitable for modeling rankings. We identify a novel class of independence structures, called riffled independence, which encompasses a more expressive family of distributions while retaining many of the properties necessary for performing efficient inference and reducing sample complexity. In riffled independence, one draws two permutations independently, then performs the riffle shuffle, common in card games, to combine the two permutations to form a single permutation. In ranking, riffled independence corresponds to ranking disjoint sets of objects independently, then interleaving those rankings. We provide a formal introduction and present algorithms for using riffled independence within Fourier-theoretic frameworks which have been explored by a number of recent papers. 1

2 0.63754898 78 nips-2009-Efficient Moments-based Permutation Tests

Author: Chunxiao Zhou, Huixia J. Wang, Yongmei M. Wang

Abstract: In this paper, we develop an efficient moments-based permutation test approach to improve the test's computational efficiency by approximating the permutation distribution of the test statistic with Pearson distribution series. This approach involves the calculation of the first four moments of the permutation distribution. We propose a novel recursive method to derive these moments theoretically and analytically without any permutation. Experimental results using different test statistics are demonstrated using simulated data and real data. The proposed strategy takes advantage of nonparametric permutation tests and parametric Pearson distribution approximation to achieve both accuracy and efficiency. 1 Introduction Permutation tests are flexible nonparametric alternatives to parametric tests in small samples, or when the distribution of a test statistic is unknown or mathematically intractable. In permutation tests, except exchangeability, no other statistical assumptions are required. The p-values can be obtained by using the permutation distribution. Permutation tests are appealing in many biomedical studies, which often have limited observations with unknown distribution. They have been used successfully in structural MR image analysis [1, 2, 3], in functional MR image analysis [4], and in 3D face analysis [5]. There are three common approaches to construct the permutation distribution [6, 7, 8]: (1) exact permutation enumerating all possible arrangements; (2) approximate permutation based on random sampling from all possible permutations; (3) approximate permutation using the analytical moments of the exact permutation distribution under the null hypothesis. The main disadvantage of the exact permutation is the computational cost, due to the factorial increase in the number of permutations with the increasing number of subjects. The second technique often gives inflated type I errors caused by random sampling. When a large number of repeated tests are needed, the random permutation strategy is also computationally expensive to achieve satisfactory accuracy. Regarding the third approach, the exact permutation distribution may not have moments or moments with tractability. In most applications, it is not the existence but the derivation of moments that limits the third approach. To the best of our knowledge, there is no systematic and efficient way to derive the moments of the permutation distribution. Recently, Zhou [3] proposed a solution by converting the permutation of data to that of the statistic coefficients that are symmetric to the permutation. Since the test statistic coefficients usually have simple presentations, it is easier to track the permutation of the test statistic coefficients than that of data. However, this method requires the derivation of the permutation for each specific test statistic, which is not accessible to practical users. In this paper, we propose a novel strategy by employing a general theoretical method to derive the moments of the permutation distribution of any weighted v-statistics, for both univariate and multivariate data. We note that any moments of the permutation distribution for weighted v-statistics [9] can be considered as a summation of the product of data function term and index function term over a high dimensional index set and all possible permutations.
Our key idea is to divide the whole index set into several permutation equivalent (see Definition 2) index subsets such that the summation of the data/index function term over all permutations is invariant within each subset and can be calculated without conducting any permutation. Then we can obtain the moments by summing up several subtotals. The proposed method can be extended to equivalent weighted v-statistics by replacing them with monotonic weighted v-statistics. This is due to the fact that only the order of test statistics of all permutations matters for obtaining the p-values, so that the monotonic weighted v-statistics shares the same p-value with the original test statistic. Given the first four moments, the permutation distribution can be well fitted by Pearson distribution series. The p-values are then obtained without conducting any real permutation. For multiple comparison of two-group difference, given the sample sizes n1 = 21 and n2 = 21 and the number of tests m = 2,000, we need to conduct m × (n1 + n2)!/(n1! n2!) ≈ 1.1×10^15 permutations for the exact permutation test. Even for 20,000 random permutations per test, we still need m × 20,000 = 4×10^7 permutations. Alternatively, our moments-based permutation method using Pearson distribution approximation only involves the calculation of the first four analytically-derived moments of exact permutation distributions to achieve high accuracy (see section 3). Instead of calculating test statistics in factorial scale with exact permutation, our moments-based permutation only requires computation of polynomial order. For example, the computational costs for the univariate mean difference test statistic and the modified multivariate Hotelling's T^2 test statistic [8] are O(n) and O(n^3), respectively, where n = n1 + n2. 2 Methodology In this section, we shall mainly discuss how to calculate the moments of the permutation distribution for weighted v-statistics. For other test statistics, a possible solution is to replace them with their equivalent weighted v-statistics by monotonic transforms. The detailed discussion about equivalent test statistics can be found in [7, 8, 10]. 2.1 Computational challenge Let us first look at a toy example. Suppose we have two-group univariate data x = (x1, …, xn1, xn1+1, …, xn1+n2), where the first n1 elements are in group A and the rest, n2, are in group B. For comparison of the two groups, the hypothesis is typically constructed as H0: μA = μB vs. Ha: μA ≠ μB, where μA, μB are the population means of groups A and B, respectively. Define x̄A = (1/n1) Σ_{i=1}^{n1} xi and x̄B = (1/n2) Σ_{i=n1+1}^{n} xi as the sample means of the two groups, where n = n1 + n2. We choose the univariate group mean difference statistic T(x) = x̄A − x̄B = Σ_{i=1}^{n} w(i) xi as the test statistic, where the index function is w(i) = 1/n1 if i ∈ {1, …, n1} and w(i) = −1/n2 if i ∈ {n1 + 1, …, n}. Then the total number of all possible permutations of {1, …, n} is n!. To calculate the fourth moment of the permutation distribution, Eπ(T^4(x)) = (1/n!) Σ_{π∈Sn} (Σ_{i=1}^{n} w(i) x_{π(i)})^4 = (1/n!) Σ_{π∈Sn} Σ_{i1=1}^{n} Σ_{i2=1}^{n} Σ_{i3=1}^{n} Σ_{i4=1}^{n} w(i1) w(i2) w(i3) w(i4) x_{π(i1)} x_{π(i2)} x_{π(i3)} x_{π(i4)}, where π is the permutation operator and the symmetric group Sn [11] includes all distinct permutations. The above example shows that the moment calculation can be considered as a summation over all possible permutations and a large index set.
It is noticeable that the computational challenge here is to go through the factorial-level permutations and polynomial-level indices. 2.2 Partition the index set In this paper, we assume that the test statistic T can be expressed as a weighted v-statistic of degree d [9], that is, T(x) = Σ_{i1=1}^{n} … Σ_{id=1}^{n} w(i1, …, id) h(xi1, …, xid), where x = (x1, x2, …, xn)^T is a data vector with n observations, w is a symmetric index function, and h is a symmetric data function, i.e., invariant under permutation of (i1, …, id). Though the symmetry property is not required for our method, it helps reduce the computational cost. Here, each observation xk can be either univariate or multivariate. In the above toy example, d = 1 and h is the identity function. Therefore, the r-th moment of the test statistic from the permuted data is: Eπ(T^r(x)) = Eπ((Σ_{i1,…,id} w(i1, …, id) h(x_{π(i1)}, …, x_{π(id)}))^r) = Eπ[Σ_{i1^(1),…,id^(1)} … Σ_{i1^(r),…,id^(r)} {Π_{k=1}^{r} w(i1^(k), …, id^(k)) Π_{k=1}^{r} h(x_{π(i1^(k))}, …, x_{π(id^(k))})}]. Then we can exchange the summation order of permutations and that of indices: Eπ(T^r(x)) = Σ_{i1^(1),…,id^(1)} … Σ_{i1^(r),…,id^(r)} {(Π_{k=1}^{r} w(i1^(k), …, id^(k))) Eπ(Π_{k=1}^{r} h(x_{π(i1^(k))}, …, x_{π(id^(k))}))}. Thus any moment of the permutation distribution can be considered as a summation of the product of a data function term and an index function term over a high-dimensional index set and all possible permutations. Since all possible permutations map any index value between 1 and n to all possible index values from 1 to n with equal probability, Eπ(Π_{k=1}^{r} h(x_{π(i1^(k))}, …, x_{π(id^(k))})), the summation of the data function over all permutations, is only related to the equal/unequal relationship among indices. It is natural to divide the whole index set U = {i1, …, id}^r = {(i1^(1), …, id^(1)), …, (i1^(r), …, id^(r))} into the union of disjoint index subsets, in which Eπ(Π_{k=1}^{r} h(x_{π(i1^(k))}, …, x_{π(id^(k))})) is invariant. Definition 1. Since h is a symmetric function, two index elements (i1, …, id) and (j1, …, jd) are said to be equivalent if they are the same up to the order. For example, for d = 3, (1, 4, 5) = (1, 5, 4) = (4, 1, 5) = (4, 5, 1) = (5, 1, 4) = (5, 4, 1). Definition 2. Two indices {(i1^(1), …, id^(1)), …, (i1^(r), …, id^(r))} and {(j1^(1), …, jd^(1)), …, (j1^(r), …, jd^(r))} are said to be permutation equivalent if there exists a permutation π ∈ Sn such that {(π(i1^(1)), …, π(id^(1))), …, (π(i1^(r)), …, π(id^(r)))} = {(j1^(1), …, jd^(1)), …, (j1^(r), …, jd^(r))}. Here

3 0.54336053 7 nips-2009-A Data-Driven Approach to Modeling Choice

Author: Vivek Farias, Srikanth Jagabathula, Devavrat Shah

Abstract: We visit the following fundamental problem: For a ‘generic’ model of consumer choice (namely, distributions over preference lists) and a limited amount of data on how consumers actually make decisions (such as marginal preference information), how may one predict revenues from offering a particular assortment of choices? This problem is central to areas within operations research, marketing and econometrics. We present a framework to answer such questions and design a number of tractable algorithms (from a data and computational standpoint) for the same. 1

4 0.52410352 230 nips-2009-Statistical Consistency of Top-k Ranking

Author: Fen Xia, Tie-yan Liu, Hang Li

Abstract: This paper is concerned with the consistency analysis on listwise ranking methods. Among various ranking methods, the listwise methods have competitive performances on benchmark datasets and are regarded as one of the state-of-the-art approaches. Most listwise ranking methods manage to optimize ranking on the whole list (permutation) of objects, however, in practical applications such as information retrieval, correct ranking at the top k positions is much more important. This paper aims to analyze whether existing listwise ranking methods are statistically consistent in the top-k setting. For this purpose, we define a top-k ranking framework, where the true loss (and thus the risks) are defined on the basis of top-k subgroup of permutations. This framework can include the permutationlevel ranking framework proposed in previous work as a special case. Based on the new framework, we derive sufficient conditions for a listwise ranking method to be consistent with the top-k true loss, and show an effective way of modifying the surrogate loss functions in existing methods to satisfy these conditions. Experimental results show that after the modifications, the methods can work significantly better than their original versions. 1

5 0.47744063 199 nips-2009-Ranking Measures and Loss Functions in Learning to Rank

Author: Wei Chen, Tie-yan Liu, Yanyan Lan, Zhi-ming Ma, Hang Li

Abstract: Learning to rank has become an important research topic in machine learning. While most learning-to-rank methods learn the ranking functions by minimizing loss functions, it is the ranking measures (such as NDCG and MAP) that are used to evaluate the performance of the learned ranking functions. In this work, we reveal the relationship between ranking measures and loss functions in learningto-rank methods, such as Ranking SVM, RankBoost, RankNet, and ListMLE. We show that the loss functions of these methods are upper bounds of the measurebased ranking errors. As a result, the minimization of these loss functions will lead to the maximization of the ranking measures. The key to obtaining this result is to model ranking as a sequence of classification tasks, and define a so-called essential loss for ranking as the weighted sum of the classification errors of individual tasks in the sequence. We have proved that the essential loss is both an upper bound of the measure-based ranking errors, and a lower bound of the loss functions in the aforementioned methods. Our proof technique also suggests a way to modify existing loss functions to make them tighter bounds of the measure-based ranking errors. Experimental results on benchmark datasets show that the modifications can lead to better ranking performances, demonstrating the correctness of our theoretical analysis. 1

6 0.44927832 59 nips-2009-Construction of Nonparametric Bayesian Models from Parametric Bayes Equations

7 0.38921249 11 nips-2009-A General Projection Property for Distribution Families

8 0.38080674 136 nips-2009-Learning to Rank by Optimizing NDCG Measure

9 0.37684694 8 nips-2009-A Fast, Consistent Kernel Two-Sample Test

10 0.36724317 138 nips-2009-Learning with Compressible Priors

11 0.3527211 248 nips-2009-Toward Provably Correct Feature Selection in Arbitrary Domains

12 0.34799641 173 nips-2009-Nonparametric Greedy Algorithms for the Sparse Learning Problem

13 0.34050742 175 nips-2009-Occlusive Components Analysis

14 0.33464345 42 nips-2009-Bayesian Sparse Factor Models and DAGs Inference and Comparison

15 0.3315616 62 nips-2009-Correlation Coefficients are Insufficient for Analyzing Spike Count Dependencies

16 0.32581934 244 nips-2009-The Wisdom of Crowds in the Recollection of Order Information

17 0.32431442 87 nips-2009-Exponential Family Graph Matching and Ranking

18 0.31879911 3 nips-2009-AUC optimization and the two-sample problem

19 0.30922773 170 nips-2009-Nonlinear directed acyclic structure learning with weakly additive noise models

20 0.30259329 228 nips-2009-Speeding up Magnetic Resonance Image Acquisition by Bayesian Multi-Slice Adaptive Compressed Sensing


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(24, 0.035), (25, 0.045), (35, 0.039), (36, 0.094), (39, 0.038), (58, 0.06), (61, 0.015), (71, 0.081), (81, 0.015), (86, 0.049), (91, 0.017), (98, 0.356)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.75287712 206 nips-2009-Riffled Independence for Ranked Data

Author: Jonathan Huang, Carlos Guestrin

Abstract: Representing distributions over permutations can be a daunting task due to the fact that the number of permutations of n objects scales factorially in n. One recent way that has been used to reduce storage complexity has been to exploit probabilistic independence, but as we argue, full independence assumptions impose strong sparsity constraints on distributions and are unsuitable for modeling rankings. We identify a novel class of independence structures, called riffled independence, which encompasses a more expressive family of distributions while retaining many of the properties necessary for performing efficient inference and reducing sample complexity. In riffled independence, one draws two permutations independently, then performs the riffle shuffle, common in card games, to combine the two permutations to form a single permutation. In ranking, riffled independence corresponds to ranking disjoint sets of objects independently, then interleaving those rankings. We provide a formal introduction and present algorithms for using riffled independence within Fourier-theoretic frameworks which have been explored by a number of recent papers. 1
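
Following the abstract's description, a minimal sampler for a riffled-independent ranking looks like the sketch below: rank the two disjoint item sets independently, then interleave the two rankings with a riffle shuffle. Sampling the interleaving uniformly at random is a simplifying assumption of ours; in general the model places its own distribution over interleavings.

    # Illustrative sketch, not the paper's implementation.

    import random

    def riffle_sample(set_a, set_b, rank_a, rank_b):
        pi_a = rank_a(set_a)                 # independent ranking of set A
        pi_b = rank_b(set_b)                 # independent ranking of set B
        n = len(pi_a) + len(pi_b)
        # choose which of the n positions receive items from A
        slots_a = set(random.sample(range(n), len(pi_a)))
        out, ia, ib = [], iter(pi_a), iter(pi_b)
        for pos in range(n):
            out.append(next(ia) if pos in slots_a else next(ib))
        return out

    # toy 'rankers': a uniformly random ranking of each set
    uniform = lambda items: random.sample(items, len(items))
    print(riffle_sample(['x', 'y'], ['u', 'v', 'w'], uniform, uniform))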

2 0.62693983 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation

Author: Lan Du, Lu Ren, Lawrence Carin, David B. Dunson

Abstract: A non-parametric Bayesian model is proposed for processing multiple images. The analysis employs image features and, when present, the words associated with accompanying annotations. The model clusters the images into classes, and each image is segmented into a set of objects, with the option of assigning a word to each object (localized labeling). Each object is assumed to be represented as a heterogeneous mix of components, realized via mixture models linking image features to object types. The number of image classes, the number of object types, and the characteristics of the object-feature mixture models are inferred nonparametrically. To constitute spatially contiguous objects, a new logistic stick-breaking process is developed. Inference is performed efficiently via variational Bayesian analysis, with example results presented on two image databases.
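
For orientation, the sketch below shows the generic truncated stick-breaking construction that the paper's logistic stick-breaking process builds on; in the paper, each break probability additionally depends on spatial location through a logistic function, so nearby pixels tend to land in the same object. That spatial dependence is omitted here, and all names are ours.

    import numpy as np

    def stick_breaking_weights(alpha, truncation, rng):
        v = rng.beta(1.0, alpha, size=truncation)                 # break proportions
        remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
        return v * remaining                                       # mixture weights

    rng = np.random.default_rng(0)
    w = stick_breaking_weights(alpha=2.0, truncation=20, rng=rng)
    print(w[:5], w.sum())  # weights decay; the sum approaches 1 as truncation grows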

3 0.60132229 25 nips-2009-Adaptive Design Optimization in Experiments with People

Author: Daniel Cavagnaro, Jay Myung, Mark A. Pitt

Abstract: In cognitive science, empirical data collected from participants are the arbiters in model selection. Model discrimination thus depends on designing maximally informative experiments. It has been shown that adaptive design optimization (ADO) allows one to discriminate models as efficiently as possible in simulation experiments. In this paper we use ADO in a series of experiments with people to discriminate the Power, Exponential, and Hyperbolic models of memory retention, a long-standing problem in cognitive science and an ideal setting in which to test the application of ADO to questions about human cognition. Using an optimality criterion based on mutual information, ADO finds designs that are maximally likely to increase our certainty about the true model upon observation of the experiment outcomes. Results demonstrate the usefulness of ADO and also reveal some challenges in its implementation.
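
A hedged sketch of the mutual-information criterion: pick the design whose outcome is most informative about which model is true. Models are represented here as discrete outcome distributions per design, a simplification of ours; the retention models in the paper are continuous and richer.

    import numpy as np

    def mutual_information(prior, outcome_probs):
        # prior: p(m); outcome_probs[m][y]: p(y | m, design)
        joint = prior[:, None] * outcome_probs          # p(m, y)
        p_y = joint.sum(axis=0)                         # p(y)
        with np.errstate(divide='ignore', invalid='ignore'):
            terms = joint * np.log(joint / (prior[:, None] * p_y[None, :]))
        return np.nansum(terms)

    prior = np.array([0.5, 0.5])                        # two candidate models
    designs = {
        'short lag': np.array([[0.9, 0.1], [0.8, 0.2]]),   # models predict similarly
        'long lag':  np.array([[0.7, 0.3], [0.2, 0.8]]),   # models disagree
    }
    best = max(designs, key=lambda d: mutual_information(prior, designs[d]))
    print(best)  # the design where the models disagree is most informative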

4 0.53251231 225 nips-2009-Sparsistent Learning of Varying-coefficient Models with Structural Changes

Author: Mladen Kolar, Le Song, Eric P. Xing

Abstract: Estimating the changing structure of a varying-coefficient varying-structure (VCVS) model remains an important open problem in dynamic system modelling, with applications such as learning trajectories of stock prices or uncovering the topology of an evolving gene network. In this paper, we investigate sparsistent learning of a sub-family of this model: piecewise constant VCVS models. We analyze two main issues in this problem: inferring the time points where structural changes occur, and estimating the model structure (i.e., model selection) on each of the constant segments. We propose a two-stage adaptive procedure, which first identifies the jump points of structural changes and then identifies the covariates relevant to the response on each of the segments. We provide an asymptotic analysis of the procedure, showing that as the sample size, the number of structural changes, and the number of variables increase, the true model can be consistently selected. We demonstrate the performance of the method on synthetic data and apply it to a brain-computer interface dataset. We also consider how this applies to structure estimation of time-varying probabilistic graphical models.
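
A schematic sketch of the second stage only: given estimated jump points (taken as known below for brevity; the paper's first stage estimates them), a sparse lasso fit is run on each constant segment to recover the segment-specific relevant covariates. The lasso choice and all names here are illustrative assumptions.

    import numpy as np
    from sklearn.linear_model import Lasso

    def fit_segments(X, y, jumps, alpha=0.1):
        bounds = [0] + sorted(jumps) + [len(y)]
        models = []
        for lo, hi in zip(bounds[:-1], bounds[1:]):
            m = Lasso(alpha=alpha).fit(X[lo:hi], y[lo:hi])
            models.append((lo, hi, np.flatnonzero(m.coef_)))  # selected covariates
        return models

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    beta1, beta2 = np.array([2, 0, 0, 0, 0]), np.array([0, 0, 3, 0, 0])
    y = np.concatenate([X[:100] @ beta1, X[100:] @ beta2]) + 0.1 * rng.normal(size=200)

    print(fit_segments(X, y, jumps=[100]))  # distinct supports on each segment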

5 0.41542572 260 nips-2009-Zero-shot Learning with Semantic Output Codes

Author: Mark Palatucci, Dean Pomerleau, Geoffrey E. Hinton, Tom M. Mitchell

Abstract: We consider the problem of zero-shot learning, where the goal is to learn a classifier f : X → Y that must predict novel values of Y that were omitted from the training set. To achieve this, we define the notion of a semantic output code classifier (SOC) which utilizes a knowledge base of semantic properties of Y to extrapolate to novel classes. We provide a formalism for this type of classifier and study its theoretical properties in a PAC framework, showing conditions under which the classifier can accurately predict novel classes. As a case study, we build a SOC classifier for a neural decoding task and show that it can often predict words that people are thinking about from functional magnetic resonance images (fMRI) of their neural activity, even without training examples for those words.
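
A minimal sketch of the semantic output code idea: learn a map from inputs to semantic features, then decode by nearest neighbour in the semantic knowledge base, which may contain classes never seen in training. Ridge regression and 1-NN decoding are our illustrative choices, as is the toy knowledge base.

    import numpy as np

    knowledge_base = {                       # semantic properties per class
        'dog':  np.array([1.0, 1.0, 0.0]),   # furry, four-legged, not winged
        'bird': np.array([0.0, 0.0, 1.0]),   # novel class: no training images
    }

    def fit_semantic_map(X, S, lam=1e-2):
        # ridge regression from inputs to semantic features
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ S)

    def predict(W, x):
        s_hat = x @ W                        # predicted semantic code
        return min(knowledge_base,
                   key=lambda c: np.linalg.norm(knowledge_base[c] - s_hat))

    # train only on 'dog' examples, then decode inputs
    X = np.array([[1.0, 0.9], [0.9, 1.1]])
    S = np.vstack([knowledge_base['dog']] * 2)
    W = fit_semantic_map(X, S)
    print(predict(W, np.array([1.0, 1.0])))    # decodes to 'dog'
    print(predict(W, np.array([-1.0, -1.0])))  # decodes to the unseen 'bird' class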

6 0.41467521 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs

7 0.41108739 132 nips-2009-Learning in Markov Random Fields using Tempered Transitions

8 0.41060063 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference

9 0.41046196 41 nips-2009-Bayesian Source Localization with the Multivariate Laplace Prior

10 0.41028622 18 nips-2009-A Stochastic approximation method for inference in probabilistic graphical models

11 0.40979362 217 nips-2009-Sharing Features among Dynamical Systems with Beta Processes

12 0.40944576 97 nips-2009-Free energy score space

13 0.4087567 129 nips-2009-Learning a Small Mixture of Trees

14 0.40834525 150 nips-2009-Maximum likelihood trajectories for continuous-time Markov chains

15 0.40781885 174 nips-2009-Nonparametric Latent Feature Models for Link Prediction

16 0.40772477 226 nips-2009-Spatial Normalized Gamma Processes

17 0.40768334 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization

18 0.40582451 56 nips-2009-Conditional Neural Fields

19 0.4049266 204 nips-2009-Replicated Softmax: an Undirected Topic Model

20 0.40473726 192 nips-2009-Posterior vs Parameter Sparsity in Latent Variable Models