jmlr jmlr2009 jmlr2009-36 knowledge-graph by maker-knowledge-mining

36 jmlr-2009-Fourier Theoretic Probabilistic Inference over Permutations


Source: pdf

Author: Jonathan Huang, Carlos Guestrin, Leonidas Guibas

Abstract: Permutations are ubiquitous in many real-world problems, such as voting, ranking, and data association. Representing uncertainty over permutations is challenging, since there are n! possibilities, and typical compact and factorized probability distribution representations, such as graphical models, cannot capture the mutual exclusivity constraints associated with permutations. In this paper, we use the “low-frequency” terms of a Fourier decomposition to represent distributions over permutations compactly. We present Kronecker conditioning, a novel approach for maintaining and updating these distributions directly in the Fourier domain, allowing for polynomial time bandlimited approximations. Low order Fourier-based approximations, however, may lead to functions that do not correspond to valid distributions. To address this problem, we present a quadratic program defined directly in the Fourier domain for projecting the approximation onto a relaxation of the polytope of legal marginal distributions. We demonstrate the effectiveness of our approach on a real camera-based multi-person tracking scenario. Keywords: identity management, permutations, approximate inference, group theoretical methods, sensor networks

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 This task of maintaining a belief state for the correct association between object tracks and object identities, while accounting for local mixing events and sensor observations, was introduced in Shin et al. [sent-29, score-0.356]

2 The identity management problem poses a challenge for probabilistic inference because it needs to address the fundamental combinatorial challenge that there is a factorial number of associations to maintain between tracks and identities. [sent-31, score-0.31]

3 In this example, an internal tracker declares that two tracks have ‘mixed’ whenever they get too close to each other and announces the identity of any track that enters or exits the room. [sent-69, score-0.368]

4 In this example, track 1 crosses with track 2, then with track 3, then leaves the room, at which point it is observed that the identity at Track 1 is in fact Bob. [sent-74, score-0.451]

5 In identity management, a permutation σ represents a joint assignment of identities to internal tracks, with σ(i) being the track belonging to the ith identity. [sent-79, score-0.364]

6 The conditional probability distribution P(σ(t) | σ(t−1)) is called the transition model, and might reflect, for example, that the identities belonging to two tracks were swapped with some probability by a mixing event. [sent-94, score-0.331]

7 Strictly speaking, a map from identities to tracks is not a permutation since a permutation always maps a set into itself. [sent-178, score-0.44]

8 The simplest example of a representation is called the trivial representation ρ(n) : Sn → R^{1×1}, which maps each element of the symmetric group to 1, the multiplicative identity on the real numbers. [sent-227, score-0.33]

9 The first-order permutation representation of Sn, which we alluded to in Example 1, is the degree n representation, τ(n−1,1) (we explain the terminology in Section 5), which maps a permutation σ to its corresponding permutation matrix given by [τ(n−1,1)(σ)]_{ij} = 1{σ(j) = i}. [sent-230, score-0.389]
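
As an illustration of these two definitions, here is a minimal Python sketch (our own, not from the paper; permutations are 0-indexed tuples, so the helper names and conventions are assumptions):

```python
import numpy as np

def tau_first_order(sigma):
    """First-order permutation representation: [tau(sigma)]_ij = 1{sigma(j) = i}.

    sigma is a tuple giving the image of each point, 0-indexed, e.g.
    sigma = (1, 2, 0) means sigma(0) = 1, sigma(1) = 2, sigma(2) = 0.
    """
    n = len(sigma)
    M = np.zeros((n, n))
    for j in range(n):
        M[sigma[j], j] = 1.0
    return M

def trivial(sigma):
    """Trivial representation: every permutation maps to the 1x1 matrix [1]."""
    return np.array([[1.0]])

# The permutation matrix of the 3-cycle 0 -> 1 -> 2 -> 0:
print(tau_first_order((1, 2, 0)))
```

One can check that tau_first_order is a homomorphism: the matrix of a composition of two permutations is the product of their matrices.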

10 These representations are referred to as the irreducibles of a group, and they are defined simply to be the collection of representations (up to equivalence) which are not reducible. [sent-277, score-0.358]

11 As it happens, there are only three irreducible representations of S3 (Diaconis, 1988), up to equivalence: the trivial representation ρ(3) , the degree 2 representation ρ(2,1) , and the alternating representation ρ(1,1,1) . [sent-279, score-0.625]

12 The complete set of irreducible representation matrices of S3 is shown in Table 2. [sent-280, score-0.444]

13 Unfortunately, the analysis of the irreducible representations for n > 3 is far more complicated, and we postpone this more general discussion until Section 5. [sent-281, score-0.412]

14 (The Fourier Transform) The link between group representation theory and Fourier analysis is given by the celebrated Peter-Weyl theorem (Diaconis, 1988; Terras, 1999; Sagan, 2001), which says that the matrix entries of the irreducibles of G form a complete set of orthogonal basis functions on G. [sent-283, score-0.471]

15 The collection of Fourier transforms at all irreducible representations of G forms the Fourier transform of f. [sent-296, score-0.412]
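
A minimal sketch of this definition (our own helper, using the conventions of the block above): the Fourier transform of a distribution P at a representation ρ is the matrix-valued sum ∑σ P(σ)ρ(σ).

```python
import itertools
import numpy as np

def fourier_transform(P, rho):
    """Fourier transform of P at the representation rho:
    the matrix sum over sigma of P(sigma) * rho(sigma).

    P: dict mapping permutation tuples to probabilities.
    rho: function mapping a permutation tuple to a d x d numpy array.
    """
    d = rho(next(iter(P))).shape[0]
    P_hat = np.zeros((d, d))
    for sigma, prob in P.items():
        P_hat += prob * rho(sigma)
    return P_hat

# At the trivial representation the transform is the total probability mass,
# so it equals 1 for any normalized distribution (cf. sentence 16 below).
uniform = {s: 1.0 / 6 for s in itertools.permutations(range(3))}
print(fourier_transform(uniform, lambda sigma: np.array([[1.0]])))  # [[1.]]
```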

16 If P were the uniform distribution, then P̂ρ = 0 at every irreducible ρ except at the trivial representation. [sent-305, score-0.326]

17 The Fourier transform of the same distribution at all irreducibles is: P̂ρ(3) = 1, P̂ρ(2,1) = [1/4, √3/4; √3/4, 1/4] (a 2 × 2 matrix), P̂ρ(1,1,1) = 0. [sent-318, score-0.306]

18 Unfortunately, as mentioned in Section 3, the Fourier transform at the first-order permutation representation cannot capture more complicated statements like: P(Alice and Bob occupy Tracks 1 and 2) = 0. [sent-320, score-0.306]

19 We define the second-order unordered permutation representation by: [τ(n−2,2)(σ)]_{{i,j},{k,ℓ}} = 1{σ({k, ℓ}) = {i, j}}, where we index the matrix rows and columns by unordered pairs {i, j}. [sent-322, score-0.313]
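
The same construction in code, as a sketch extending the helpers above (the index conventions are ours):

```python
import itertools
import numpy as np

def tau_second_order_unordered(sigma):
    """Second-order unordered permutation representation of S_n: rows and
    columns are indexed by unordered pairs {i, j}, and the entry at
    ({i, j}, {k, l}) is 1 exactly when sigma({k, l}) = {i, j}."""
    n = len(sigma)
    pairs = list(itertools.combinations(range(n), 2))
    index = {p: a for a, p in enumerate(pairs)}
    M = np.zeros((len(pairs), len(pairs)))
    for (k, l) in pairs:
        image = tuple(sorted((sigma[k], sigma[l])))
        M[index[image], index[(k, l)]] = 1.0
    return M

print(tau_second_order_unordered((1, 0, 2)).shape)  # (3, 3) for n = 3
```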

20 In general however, the permutation representations as they have been defined above are reducible, intuitively due to the fact that it is possible to recover lower order marginal probabilities from higher order marginal probabilities. [sent-331, score-0.333]

21 Rather than giving a fully rigorous treatment of the subject, our goal is to give some intuition about the kind of information which can be captured by the irreducible representations of Sn. [sent-336, score-0.412]

22 In general, we will be able to use the Fourier coefficients at irreducible representations to compute the marginal probabilities of Young tabloids. [sent-354, score-0.505]

23 If P(σ) is a probability distribution, then [P̂τλ]_{ij} = ∑_{σ∈Sn} P(σ)[τλ(σ)]_{ij} = ∑_{σ : σ({t_j}) = {t_i}} P(σ) = P(σ({t_j}) = {t_i}), and therefore the matrix of marginals corresponding to Young tabloids of shape λ is given exactly by the Fourier transform at the representation τλ. [sent-395, score-0.448]
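
A quick numerical check of this identity, reusing the tau_first_order and fourier_transform sketches above on a toy distribution (the numbers are hypothetical):

```python
import itertools

# Mass 0.5 on the identity and 0.5 on the swap of tracks 0 and 1. Entry
# (i, j) of the transform at tau_first_order is the marginal P(sigma(j) = i).
P = {s: 0.0 for s in itertools.permutations(range(3))}
P[(0, 1, 2)] = 0.5
P[(1, 0, 2)] = 0.5
marginals = fourier_transform(P, tau_first_order)
print(marginals[0, 0])  # P(sigma(0) = 0) = 0.5
print(marginals[2, 2])  # P(sigma(2) = 2) = 1.0
```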

24 As we showed earlier, the simplest marginals (the zeroth-order normalization constant) correspond to the Fourier transform at τ(n), while the first-order marginals correspond to τ(n−1,1), and the second-order unordered marginals correspond to τ(n−2,2). [sent-396, score-0.562]

25 Intuitively, representations corresponding to partitions which are high in the dominance ordering are ‘low frequency’, while representations corresponding to partitions which are low in the dominance ordering are ‘high frequency’. [sent-411, score-0.364]

26 The corresponding Fourier coefficient matrices for each partition (at irreducible representations) are shown in the right diagram (b). [sent-419, score-0.401]

27 It can be shown that, for each permutation representation τλ, there exists a corresponding irreducible representation ρλ which, loosely, captures all of the information at the ‘frequency’ λ which was not already captured at lower frequency irreducibles. [sent-422, score-0.558]

28 Moreover, it can be shown that there exist no irreducible representations besides those indexed by the partitions of n. [sent-423, score-0.458]

29 (Uniqueness) For each partition, λ, of n, there exists an irreducible representation, ρλ , which is unique up to equivalence. [sent-426, score-0.326]

30 (Completeness) Every irreducible representation of Sn corresponds to some partition of n. [sent-428, score-0.425]

31 In other words, the irreducibles at λ form a basis for functions that have “interesting” λth -order marginal probabilities, but uniform marginals at all partitions µ such that µ ⊲ λ. [sent-438, score-0.477]

32 Example 7 As an example, we demonstrate a “preference” function which is “purely” second-order (unordered) in the sense that its Fourier coefficients are equal to zero at all irreducible representations except ρ(n−2,2) (and the trivial representation). [sent-439, score-0.412]

33 On the other hand, Alice and Bob … Table 3 (Dimensions of low-order irreducible representation matrices): dim ρ(n) = 1; dim ρ(n−1,1) = n − 1; dim ρ(n−2,2) = n(n−3)/2; dim ρ(n−2,1,1) = (n−1)(n−2)/2; dim ρ(n−3,3) = n(n−1)(n−5)/6; dim ρ(n−3,2,1) = n(n−2)(n−4)/3. [sent-444, score-0.397]

34 The sizes of the irreducible representation matrices are typically much smaller than their corresponding permutation representation matrices. [sent-448, score-0.605]

35 In practice, since the irreducible representation matrices are determined only up to equivalence, it is necessary to choose a basis for the irreducible representations in order to explicitly construct the representation matrices. [sent-462, score-0.963]

36 See Appendix B for details on constructing irreducible matrix representations with respect to the Gel’fand-Tsetlin basis. [sent-465, score-0.46]

37 In our identity management example, π(t) represents a random identity permutation that might occur among tracks when they get close to each other (what we call a mixing event). [sent-484, score-0.495]

38 Had we defined π to map from tracks to identities (instead of identities to tracks), then π would be multiplied from the right. [sent-492, score-0.343]

39 (Complexity of Prediction/Rollup) We will discuss complexity in terms of the dimension of the largest maintained irreducible Fourier coefficient matrix, which we will denote by dmax (see Table 3 for irreducible dimensions). [sent-510, score-0.711]

40 Performing a single prediction/rollup step in the Fourier domain involves performing a single matrix multiplication for each irreducible, and thus requires O(dmax^3) time using the naive multiplication algorithm. [sent-512, score-0.374]
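
Concretely, the update touches each maintained coefficient matrix once; a sketch (the multiplication order below is one convention and flips if the random permutation acts on the other side, cf. sentence 38):

```python
def prediction_rollup(P_hat, Q_hat):
    """Fourier-domain prediction/rollup: convolving the prior P with the
    mixing model Q becomes a pointwise matrix product per irreducible.

    P_hat, Q_hat: dicts mapping a partition label to the Fourier coefficient
    matrix of P and Q at that irreducible.
    """
    return {lam: Q_hat[lam] @ P_hat[lam] for lam in P_hat}
```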

41 If we use the Gel’fand-Tsetlin basis adapted to the subgroup chain S1 ⊂ · · · ⊂ Sn (see Appendix B), then we also know that the irreducible representation matrices evaluated at adjacent transpositions are sparse, with no more than O(dmax^2) nonzero entries. [sent-519, score-0.577]

42 We present a convolution-based conditioning algorithm which we call Kronecker Conditioning, which, in contrast to the pointwise nature of the Fourier Domain prediction/rollup step, and much like convolution, smears the information at an irreducible ρν to other irreducibles. [sent-576, score-0.477]

43 In particular, if ρλ and ρµ are any two irreducibles of G, there exists a similarity transform Cλµ such that, for any σ ∈ G, Cλµ^{−1} · [ρλ ⊗ ρµ](σ) · Cλµ = ⊕_ν ⊕_{ℓ=1}^{zλµν} ρν(σ). (10) [sent-622, score-0.306]

44 The ⊕ symbols here refer to a matrix direct sum as in Equation 2; ν indexes over all irreducible representations of Sn, while ℓ indexes over the copies of ρν which appear in the decomposition. [sent-623, score-0.46]

45 The Kronecker Product Decomposition problem is that of finding the irreducible components of the Kronecker product representation, and thus of finding the Clebsch-Gordan series/coefficients for each pair of irreducible representations (ρλ, ρµ). [sent-628, score-0.738]

46 Here we consider conditioning only on the symmetric group of order n with the assumption that the number of irreducibles maintained is very small (and in particular, not allowed to grow with respect to n). [sent-662, score-0.419]

47 Specifically, we fix a bandlimit λMIN and maintain the Fourier transform of P only at irreducibles which are at λMIN or above in the dominance ordering: Λ = {ρλ : λ ⊵ λMIN}. [sent-699, score-0.341]

48 And so instead of considering the approximation accuracy of the bandlimited Fourier transform to the true joint distribution, we consider the accuracy only at the marginals which are maintained by our method. [sent-723, score-0.322]

49 In multiobject tracking, a mixing model might account for the fact that two tracks may have swapped identities with some probability. [sent-786, score-0.331]

50 In marginal based constructions, we first compute the low-order ‘marginals’ of some probabilistic model, then project the result onto the irreducible Fourier basis. [sent-793, score-0.39]

51 In many of these cases, computing the appropriate Fourier transform reduces to computing the Fourier transform of the indicator function of some related subgroup of Sn , and so we also mention the relevant subgroup in the second column. [sent-796, score-0.363]

52 (Pairwise Mixing) The simplest mixing model for identity management assumes that with probability p, nothing happens, and that with probability (1 − p), the identities for tracks i and j are swapped. [sent-806, score-0.427]
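
Because this mixing model is supported on only two permutations, its Fourier transform at any representation ρ collapses to p·ρ(identity) + (1 − p)·ρ((i, j)) = p·I + (1 − p)·ρ((i, j)); a sketch using the conventions of the earlier blocks:

```python
import numpy as np

def pairwise_mixing_fourier(rho, i, j, p, n):
    """Fourier transform of the pairwise mixing model at representation rho:
    identity with probability p, swap of tracks i and j with probability 1 - p.
    rho maps a 0-indexed permutation tuple to a matrix, with rho(identity) = I."""
    swap = list(range(n))
    swap[i], swap[j] = swap[j], swap[i]
    d = rho(tuple(range(n))).shape[0]
    return p * np.eye(d) + (1 - p) * rho(tuple(swap))
```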

53 (Marginal Based Construction) In marginal based constructions, we first compute the low-order ‘marginals’ of some probabilistic model, then project the result onto the irreducible Fourier basis. [sent-826, score-0.39]

54 To obtain the Fourier coefficients at irreducibles, we proceed by computing the first-order Fourier coefficients at the first-order permutation representation (the first-order “marginals”), then transforming to irreducible coefficients. [sent-840, score-0.487]

55 [Matrix of first-order ‘marginals’ [L̂]_{ij}, rows indexed by identities and columns by tracks; the legible rows are Bob: (1/4, 1/2, 3/4) and Cathy: (1, 1/2, 0).] The corresponding coefficients at the irreducible representations are: L̂(3) = 1. [sent-852, score-0.412]

56 In sports, we might observe that the first k tracks belong to the red team and that the last n − k tracks belong to the blue team. [sent-864, score-0.354]

57 Intuitively, Equation 28 fixes all of the tracks outside of X and says that, with uniform probability, the set of tracks in X experience some permutation of their respective identities. [sent-950, score-0.444]

58 Both problems need to address the fundamental combinatorial challenge that there is a factorial or exponential number of associations to maintain between tracks and identities, or between tracks and observations respectively. [sent-1056, score-0.354]

59 In the case of n tracks and n identities, the belief matrix B is an n × n doubly stochastic matrix of non-negative entries b_{ij}, where b_{ij} is the probability that identity i is associated with track j. [sent-1072, score-0.464]

60 [Figure captions] (a) Kronecker Conditioning Accuracy: we measure the accuracy of a single Kronecker conditioning operation after some number of mixing events. (b) HMM Accuracy: we measure the average accuracy of posterior marginals over 250 timesteps, varying the fraction of observation events. [sent-1139, score-0.332]

61 The task after conditioning on each observation is to predict identities for all tracks which are inside the room, and the evaluation metric is the fraction of accurate predictions. [sent-1165, score-0.366]

62 Definition 19 (Subgroup) If G is a group (with group operation ·), a subset H ⊂ G is called a subgroup if it is itself a group with respect to the same group operation. [sent-1263, score-0.416]

63 (Constructing Irreducible Representation Matrices) In this section, we present (without proof) some standard algorithms for constructing the irreducible representation matrices with respect to the Gel’fand-Tsetlin (GZ) basis (constructed with respect to the subgroup chain S1 ⊂ S2 ⊂ · · · ⊂ Sn). [sent-1271, score-0.524]

64 There are several properties which make the irreducible representation matrices, written with respect to the GZ basis, fairly useful in practice. [sent-1274, score-0.397]

65 The significance of the standard tableaux is that the set of all standard tableaux of shape λ can be used to index the set of GZ basis vectors for the irreducible representation ρλ . [sent-1284, score-0.689]

66 The irreducible representation matrices in this Appendix are also sometimes referred to as Young’s Orthogonal Representation (YOR). [sent-1286, score-0.444]

67 Since there are five total standard tableaux of shape (3, 2), we see, for example, that the irreducible corresponding to the partition (3, 2) is 5-dimensional. [sent-1287, score-0.496]
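
The count can be reproduced with a short recursion on removable corners (our own sketch, mirroring the branching rule discussed below):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def num_standard_tableaux(shape):
    """Count standard Young tableaux of the partition `shape` (a tuple of
    weakly decreasing row lengths) by recursing on which removable corner
    holds the largest entry."""
    if sum(shape) <= 1:
        return 1
    total = 0
    for r in range(len(shape)):
        # The last cell of row r is a removable corner when the next row,
        # if any, is strictly shorter.
        if r == len(shape) - 1 or shape[r] > shape[r + 1]:
            smaller = shape[:r] + (shape[r] - 1,) + shape[r + 1:]
            if smaller[-1] == 0:
                smaller = smaller[:-1]
            total += num_standard_tableaux(smaller)
    return total

print(num_standard_tableaux((3, 2)))  # 5, the dimension of rho_(3,2) for S_5
```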

68 The pseudocode for constructing the irreducible representation matrices for adjacent swaps is summarized in Algorithm 3. [sent-1332, score-0.444]

69 Algorithm 3: Pseudocode for computing irreducible representation matrices with respect to the Gel’fand-Tsetlin basis at adjacent transpositions. [sent-1334, score-0.495]

70 For example, the permutation (1, 2, 5) can be factored into: (1, 2, 5) = (4, 5)(3, 4)(1, 2)(2, 3)(3, 4)(4, 5). Algorithm 4: Pseudocode for computing irreducible representation matrices for arbitrary permutations. [sent-1345, score-0.534]
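
One standard way to obtain such a factorization is to bubble-sort the one-line form of the permutation; the sketch below (our own routine and composition convention, not necessarily the paper's) includes a self-check:

```python
def adjacent_factorization(sigma):
    """Factor sigma into adjacent transpositions (k, k+1) by bubble-sorting
    its one-line form (0-indexed) to the identity. With our convention,
    applying the returned factors in list order, first to last, recovers sigma."""
    perm, factors = list(sigma), []
    for _ in range(len(perm)):
        for k in range(len(perm) - 1):
            if perm[k] > perm[k + 1]:
                perm[k], perm[k + 1] = perm[k + 1], perm[k]
                factors.append((k, k + 1))
    return factors

def compose(factors, n):
    """Apply factors[0] first, then factors[1], and so on."""
    out = []
    for x in range(n):
        for k, l in factors:
            x = l if x == k else k if x == l else x
        out.append(x)
    return tuple(out)

sigma = (2, 0, 1)  # the 3-cycle sending 0 -> 2, 1 -> 0, 2 -> 1
factors = adjacent_factorization(sigma)
assert compose(factors, 3) == sigma
print(factors)  # [(0, 1), (1, 2)]
```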

71 (Standard Tableaux and the Gel’fand-Tsetlin Basis) It is straightforward to see that any irreducible representation ρλ of Sn can also be seen as a representation of Sn−1 when restricted to permutations in Sn−1 (the set of elements which fix n). [sent-1364, score-0.576]

72 However, the irreducibility property may not be preserved by restriction, which is to say that, as a representation of Sn−1, ρλ is not necessarily irreducible and might decompose as Equation 2 would dictate. [sent-1366, score-0.397]

73 Theorem 21 (Branching Rule, see Vershik and Okounkov (2006) for a proof) For each irreducible representation ρλ of Sn, there exists a matrix Cλ such that Cλ^{−1} · ρλ(σ) · Cλ = ⊕_{λ−} ρλ−(σ) holds for any σ ∈ Sn−1. [sent-1370, score-0.445]

74 The branching rule states that, given an irreducible matrix representation ρ(3,2) of S5, there is a matrix C(3,2) such that, for any permutation σ ∈ S5 with σ(5) = 5, C(3,2)^{−1} · ρ(3,2)(σ) · C(3,2) = ρ(3,1)(σ) ⊕ ρ(2,2)(σ). [sent-1373, score-0.666]

75 Thus the irreducible representation matrices constructed with respect to the GZ basis have the property that the equation ρλ(σ) = ⊕_{λ−} ρλ−(σ) holds identically for all σ ∈ Sn−1. [sent-1375, score-0.48]

76 First observe that the branching rule allows us to associate each column of the irreducible ρλ with some partition of n − 1. [sent-1377, score-0.437]

77 Proposition 22 Let ρλ be an irreducible matrix representation of Sn (constructed with respect to the Gel’fand-Tsetlin basis). [sent-1393, score-0.445]

78 Since δSk is supported on Sk ⊂ Sn, the Fourier transform of δSk at the irreducible ρλ can be written as a direct sum of Fourier coefficient matrices at the irreducibles which appear in the (n − k)th level of the branching tree corresponding to λ. [sent-1413, score-0.762]

79 Our discussion has been focused on the indicator function δSk, but computing δS_{X,Y} with |X| = |Y| = n − k can be accomplished by first constructing the Fourier coefficient matrices for δSk, then relabeling the tracks and identities using a change of basis. [sent-1435, score-0.342]

80 (Decomposing the Tensor Product Representation) We now turn to the Tensor Product Decomposition problem, which is that of finding the irreducible components of the typically reducible tensor product representation. [sent-1449, score-0.374]

81 If ρλ and ρµ are irreducible representations of Sn, then there exists an intertwining operator Cλµ such that: Cλµ^{−1} · (ρλ ⊗ ρµ)(σ) · Cλµ = ⊕_ν ⊕_{ℓ=1}^{zλµν} ρν(σ). (32) [sent-1450, score-0.492]

82 In this section, we will present a set of numerical methods for computing the Clebsch-Gordan series (zλµν) and Clebsch-Gordan coefficients (Cλµ) for a pair of irreducible representations (ρλ, ρµ). [sent-1451, score-0.412]

83 If ρ is a representation of a group G, then the character of the representation ρ is defined simply to be the trace of the representation at each element σ ∈ G: χρ(σ) = Tr(ρ(σ)). [sent-1461, score-0.306]

84 Even more surprising is that the space of possible group characters is orthogonally spanned by the characters of the irreducible representations. [sent-1463, score-0.487]

85 Proposition 27 Let χρ1 and χρ2 be characters corresponding to irreducible representations. [sent-1467, score-0.36]

86 ⟨χρ1 , χρ2 ⟩ = 1 if ρ1 and ρ2 are equivalent, and 0 otherwise. Proposition 27 shows that the irreducible characters form an orthonormal set of functions. [sent-1469, score-0.36]

87 The next proposition says that the irreducible characters span the space of all possible characters. [sent-1470, score-0.36]

88 Proposition 28 Suppose ρ is any representation of G which decomposes into irreducibles as ρ ≡ ⊕_λ ⊕_{ℓ=1}^{zλ} ρλ, where λ indexes over all irreducibles of G. [sent-1471, score-0.443]

89 The character of ρ is a linear combination of irreducible characters (χρ = ∑λ zλ χρλ). [sent-1473, score-0.36]

90 A simple way to decompose any group representation ρ is given by Proposition 28, which says that we can take inner products of χρ against the basis of irreducible characters to obtain the irreducible multiplicities zλ. [sent-1475, score-0.92]

91 Theorem 29 Let ρλ and ρµ be irreducible representations with characters χλ , χµ respectively. [sent-1477, score-0.446]
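
For S3 this can be carried out directly from the character table; the sketch below hard-codes the standard S3 character values (stated as assumptions rather than taken from the paper) and recovers the multiplicities by inner products:

```python
import numpy as np

CLASS_SIZES = np.array([1, 3, 2])          # e, transpositions, 3-cycles in S3
CHARACTERS = {
    (3,):      np.array([1,  1,  1]),      # trivial
    (2, 1):    np.array([2,  0, -1]),      # standard, degree 2
    (1, 1, 1): np.array([1, -1,  1]),      # alternating (sign)
}

def clebsch_gordan_series(lam, mu):
    """Multiplicity of each irreducible nu in rho_lam (tensor) rho_mu,
    via the character inner product <chi_lam * chi_mu, chi_nu>."""
    product = CHARACTERS[lam] * CHARACTERS[mu]
    order = CLASS_SIZES.sum()              # |S3| = 6
    return {nu: int(round((CLASS_SIZES * product * chi).sum() / order))
            for nu, chi in CHARACTERS.items()}

print(clebsch_gordan_series((2, 1), (2, 1)))
# {(3,): 1, (2, 1): 1, (1, 1, 1): 1}: one copy of each irreducible of S3,
# consistent with the dimension count 2 * 2 = 1 + 2 + 1.
```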

92 Unlike the Clebsch-Gordan series, which are basis-independent, intertwining operators must be recomputed if we change the underlying basis by which the irreducible representation matrices are constructed. [sent-1522, score-0.56]

93 Let X be any degree d group representation of G, and let Y be an equivalent direct sum of irreducibles, for example, Y(σ) = ⊕_ν ⊕_{ℓ=1}^{zν} ρν(σ), (34) where each irreducible ρν has degree dν. [sent-1524, score-0.49]

94 To find the matrix which relates marginal probabilities to irreducible coefficients, we would set X = τλ , and the multiplicities would be given by the Kostka numbers (Equation 6). [sent-1528, score-0.501]

95 Though the fundamental ideas in this section hold for a general finite group, we will continue to index irreducibles by partitions and think of representations as being real-valued. [sent-1532, score-0.473]

96 Algorithm 6: Pseudocode for computing an orthogonal intertwining operator IntXY. Input: a degree d orthogonal matrix representation X evaluated at the permutations (1, 2) and (1, . . . , n), . . . [sent-1613, score-0.381]

97 . . . and the multiplicity zν of the irreducible ρν in X. Output: a matrix Cν with orthogonal rows such that Cν^T · (⊕^{zν} ρν) · Cν = X. Steps: K1 ← I_{d×d} ⊗ (⊕^{zν} ρν(1, 2)) − X(1, 2) ⊗ I_{d×d}; K2 ← I_{d×d} ⊗ (⊕^{zν} ρν(1, . . . , n)) − X(1, . . . , n) ⊗ I_{d×d}. [sent-1616, score-0.411]
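
The linear algebra behind K1 and K2 can be sketched with a generic nullspace computation (this is not Algorithm 6 itself; the vectorization convention, helper names, and the scipy dependency are our own choices):

```python
import numpy as np
from scipy.linalg import null_space

def intertwiner(X_gens, R_gens):
    """Solve X(g) @ C = C @ R(g) simultaneously for all generators g.

    X_gens, R_gens: lists of matrices for the same group generators in two
    equivalent representations. With row-major vec, vec(X C - C R) equals
    (kron(X, I) - kron(I, R.T)) @ vec(C), so C spans the common nullspace.
    For multiplicity one the nullspace is one-dimensional and any nonzero
    vector is an intertwiner, by Schur's lemma.
    """
    d, e = X_gens[0].shape[0], R_gens[0].shape[0]
    blocks = [np.kron(X, np.eye(e)) - np.kron(np.eye(d), R.T)
              for X, R in zip(X_gens, R_gens)]
    basis = null_space(np.vstack(blocks))
    return basis[:, 0].reshape(d, e)  # assumes a nontrivial nullspace exists
```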

98 Theorem 38 Let X be any orthogonal group representation of G and Y an equivalent orthogonal irreducible decomposition (as in Equation 34). [sent-1627, score-0.564]

99 An algorithm for the machine calculation of complex Fourier series. [sent-1665, score-0.513]

100 The analysis of the Kronecker product of irreducible representations of the symmetric group. [sent-1746, score-0.519]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('fourier', 0.513), ('irreducible', 0.326), ('sn', 0.199), ('irreducibles', 0.186), ('tracks', 0.177), ('ourier', 0.157), ('uang', 0.157), ('uestrin', 0.157), ('uibas', 0.157), ('marginals', 0.13), ('track', 0.13), ('ermutations', 0.129), ('heoretic', 0.129), ('robabilistic', 0.129), ('transform', 0.12), ('nference', 0.116), ('tableaux', 0.114), ('alice', 0.111), ('permutations', 0.108), ('conditioning', 0.106), ('group', 0.093), ('permutation', 0.09), ('int', 0.09), ('representations', 0.086), ('tableau', 0.085), ('sk', 0.084), ('bob', 0.083), ('identities', 0.083), ('branching', 0.083), ('intertwining', 0.08), ('coef', 0.074), ('kronecker', 0.073), ('bandlimited', 0.072), ('young', 0.071), ('mixing', 0.071), ('representation', 0.071), ('cients', 0.066), ('marginal', 0.064), ('identity', 0.061), ('partitions', 0.061), ('dmax', 0.059), ('comy', 0.055), ('unordered', 0.052), ('diaconis', 0.052), ('kondor', 0.052), ('shin', 0.051), ('tabloids', 0.051), ('tensor', 0.048), ('matrix', 0.048), ('convolution', 0.047), ('matrices', 0.047), ('pointwise', 0.045), ('transforms', 0.044), ('subgroup', 0.044), ('gel', 0.042), ('tr', 0.041), ('cathy', 0.04), ('polytope', 0.039), ('bluetooth', 0.038), ('noncommutative', 0.038), ('nullspace', 0.038), ('rt', 0.037), ('inference', 0.037), ('orthogonal', 0.037), ('basis', 0.036), ('dominance', 0.035), ('indicator', 0.035), ('management', 0.035), ('ferrers', 0.034), ('fft', 0.034), ('multiplicities', 0.034), ('symmetric', 0.034), ('characters', 0.034), ('room', 0.033), ('walk', 0.03), ('invertible', 0.03), ('axial', 0.03), ('bandlimiting', 0.03), ('col', 0.03), ('deck', 0.03), ('leonidas', 0.03), ('mz', 0.03), ('probabilities', 0.029), ('tracking', 0.028), ('shape', 0.028), ('nonzero', 0.028), ('partition', 0.028), ('vec', 0.028), ('shuf', 0.027), ('groups', 0.026), ('projection', 0.026), ('compactly', 0.026), ('commutant', 0.025), ('obs', 0.025), ('occupy', 0.025), ('sagan', 0.025), ('schumitsch', 0.025), ('transpositions', 0.025), ('events', 0.025), ('foreach', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999893 36 jmlr-2009-Fourier Theoretic Probabilistic Inference over Permutations

Author: Jonathan Huang, Carlos Guestrin, Leonidas Guibas

Abstract: Permutations are ubiquitous in many real-world problems, such as voting, ranking, and data association. Representing uncertainty over permutations is challenging, since there are n! possibilities, and typical compact and factorized probability distribution representations, such as graphical models, cannot capture the mutual exclusivity constraints associated with permutations. In this paper, we use the “low-frequency” terms of a Fourier decomposition to represent distributions over permutations compactly. We present Kronecker conditioning, a novel approach for maintaining and updating these distributions directly in the Fourier domain, allowing for polynomial time bandlimited approximations. Low order Fourier-based approximations, however, may lead to functions that do not correspond to valid distributions. To address this problem, we present a quadratic program defined directly in the Fourier domain for projecting the approximation onto a relaxation of the polytope of legal marginal distributions. We demonstrate the effectiveness of our approach on a real camera-based multi-person tracking scenario. Keywords: identity management, permutations, approximate inference, group theoretical methods, sensor networks

2 0.10922517 49 jmlr-2009-Learning Permutations with Exponential Weights

Author: David P. Helmbold, Manfred K. Warmuth

Abstract: We give an algorithm for the on-line learning of permutations. The algorithm maintains its uncertainty about the target permutation as a doubly stochastic weight matrix, and makes predictions using an efficient method for decomposing the weight matrix into a convex combination of permutations. The weight matrix is updated by multiplying the current matrix entries by exponential factors, and an iterative procedure is needed to restore double stochasticity. Even though the result of this procedure does not have a closed form, a new analysis approach allows us to prove an optimal (up to small constant factors) bound on the regret of our algorithm. This regret bound is significantly better than that of either Kalai and Vempala’s more efficient Follow the Perturbed Leader algorithm or the computationally expensive method of explicitly representing each permutation as an expert. Keywords: permutation, ranking, on-line learning, Hedge algorithm, doubly stochastic matrix, relative entropy projection, Sinkhorn balancing

3 0.10522851 18 jmlr-2009-Consistency and Localizability

Author: Alon Zakai, Ya'acov Ritov

Abstract: We show that all consistent learning methods—that is, that asymptotically achieve the lowest possible expected loss for any distribution on (X,Y)—are necessarily localizable, by which we mean that they do not significantly change their response at a particular point when we show them only the part of the training set that is close to that point. This is true in particular for methods that appear to be defined in a non-local manner, such as support vector machines in classification and least-squares estimators in regression. Aside from showing that consistency implies a specific form of localizability, we also show that consistency is logically equivalent to the combination of two properties: (1) a form of localizability, and (2) that the method's global mean (over the entire X distribution) correctly estimates the true mean. Consistency can therefore be seen as comprised of two aspects, one local and one global. Keywords: consistency, local learning, regression, classification

4 0.090514623 32 jmlr-2009-Exploiting Product Distributions to Identify Relevant Variables of Correlation Immune Functions

Author: Lisa Hellerstein, Bernard Rosell, Eric Bach, Soumya Ray, David Page

Abstract: A Boolean function f is correlation immune if each input variable is independent of the output, under the uniform distribution on inputs. For example, the parity function is correlation immune. We consider the problem of identifying relevant variables of a correlation immune function, in the presence of irrelevant variables. We address this problem in two different contexts. First, we analyze Skewing, a heuristic method that was developed to improve the ability of greedy decision tree algorithms to identify relevant variables of correlation immune Boolean functions, given examples drawn from the uniform distribution (Page and Ray, 2003). We present theoretical results revealing both the capabilities and limitations of skewing. Second, we explore the problem of identifying relevant variables in the Product Distribution Choice (PDC) learning model, a model in which the learner can choose product distributions and obtain examples from them. We prove a lemma establishing a property of Boolean functions that may be of independent interest. Using this lemma, we give two new algorithms for finding relevant variables of correlation immune functions in the PDC model. Keywords: correlation immune functions, skewing, relevant variables, Boolean functions, product distributions c 2009 Lisa Hellerstein, Bernard Rosell, Eric Bach, Soumya Ray and David Page. H ELLERSTEIN , ROSELL , BACH , R AY AND PAGE

5 0.052241687 53 jmlr-2009-Marginal Likelihood Integrals for Mixtures of Independence Models

Author: Shaowei Lin, Bernd Sturmfels, Zhiqiang Xu

Abstract: Inference in Bayesian statistics involves the evaluation of marginal likelihood integrals. We present algebraic algorithms for computing such integrals exactly for discrete data of small sample size. Our methods apply to both uniform priors and Dirichlet priors. The underlying statistical models are mixtures of independent distributions, or, in geometric language, secant varieties of SegreVeronese varieties. Keywords: marginal likelihood, exact integration, mixture of independence model, computational algebra

6 0.047257666 100 jmlr-2009-When Is There a Representer Theorem? Vector Versus Matrix Regularizers

7 0.045560513 71 jmlr-2009-Perturbation Corrections in Approximate Inference: Mixture Modelling Applications

8 0.045218188 64 jmlr-2009-On The Power of Membership Queries in Agnostic Learning

9 0.043048248 22 jmlr-2009-Deterministic Error Analysis of Support Vector Regression and Related Regularized Kernel Methods

10 0.041741427 78 jmlr-2009-Refinement of Reproducing Kernels

11 0.040159818 93 jmlr-2009-The Hidden Life of Latent Variables: Bayesian Learning with Mixed Graph Models

12 0.039033383 2 jmlr-2009-A New Approach to Collaborative Filtering: Operator Estimation with Spectral Regularization

13 0.035766136 90 jmlr-2009-Structure Spaces

14 0.035258248 92 jmlr-2009-Supervised Descriptive Rule Discovery: A Unifying Survey of Contrast Set, Emerging Pattern and Subgroup Mining

15 0.034758501 91 jmlr-2009-Subgroup Analysis via Recursive Partitioning

16 0.032779202 28 jmlr-2009-Entropy Inference and the James-Stein Estimator, with Application to Nonlinear Gene Association Networks

17 0.032686282 29 jmlr-2009-Estimating Labels from Label Proportions

18 0.032550178 51 jmlr-2009-Low-Rank Kernel Learning with Bregman Matrix Divergences

19 0.031868011 72 jmlr-2009-Polynomial-Delay Enumeration of Monotonic Graph Classes

20 0.031765696 50 jmlr-2009-Learning When Concepts Abound


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.175), (1, -0.05), (2, 0.077), (3, -0.006), (4, -0.123), (5, 0.055), (6, -0.125), (7, -0.036), (8, -0.0), (9, 0.031), (10, 0.121), (11, -0.093), (12, -0.09), (13, 0.201), (14, -0.298), (15, 0.083), (16, -0.009), (17, -0.14), (18, 0.164), (19, -0.02), (20, 0.079), (21, -0.035), (22, -0.288), (23, -0.104), (24, -0.117), (25, -0.145), (26, -0.058), (27, -0.208), (28, -0.148), (29, -0.09), (30, 0.099), (31, -0.088), (32, 0.047), (33, 0.063), (34, -0.126), (35, 0.002), (36, -0.026), (37, 0.102), (38, -0.101), (39, -0.019), (40, -0.047), (41, 0.108), (42, -0.066), (43, -0.001), (44, 0.049), (45, 0.037), (46, 0.101), (47, 0.103), (48, -0.111), (49, 0.056)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9703691 36 jmlr-2009-Fourier Theoretic Probabilistic Inference over Permutations

Author: Jonathan Huang, Carlos Guestrin, Leonidas Guibas

Abstract: Permutations are ubiquitous in many real-world problems, such as voting, ranking, and data association. Representing uncertainty over permutations is challenging, since there are n! possibilities, and typical compact and factorized probability distribution representations, such as graphical models, cannot capture the mutual exclusivity constraints associated with permutations. In this paper, we use the “low-frequency” terms of a Fourier decomposition to represent distributions over permutations compactly. We present Kronecker conditioning, a novel approach for maintaining and updating these distributions directly in the Fourier domain, allowing for polynomial time bandlimited approximations. Low order Fourier-based approximations, however, may lead to functions that do not correspond to valid distributions. To address this problem, we present a quadratic program defined directly in the Fourier domain for projecting the approximation onto a relaxation of the polytope of legal marginal distributions. We demonstrate the effectiveness of our approach on a real camera-based multi-person tracking scenario. Keywords: identity management, permutations, approximate inference, group theoretical methods, sensor networks

2 0.50509703 49 jmlr-2009-Learning Permutations with Exponential Weights

Author: David P. Helmbold, Manfred K. Warmuth

Abstract: We give an algorithm for the on-line learning of permutations. The algorithm maintains its uncertainty about the target permutation as a doubly stochastic weight matrix, and makes predictions using an efficient method for decomposing the weight matrix into a convex combination of permutations. The weight matrix is updated by multiplying the current matrix entries by exponential factors, and an iterative procedure is needed to restore double stochasticity. Even though the result of this procedure does not have a closed form, a new analysis approach allows us to prove an optimal (up to small constant factors) bound on the regret of our algorithm. This regret bound is significantly better than that of either Kalai and Vempala’s more efficient Follow the Perturbed Leader algorithm or the computationally expensive method of explicitly representing each permutation as an expert. Keywords: permutation, ranking, on-line learning, Hedge algorithm, doubly stochastic matrix, relative entropy projection, Sinkhorn balancing

3 0.49599174 18 jmlr-2009-Consistency and Localizability

Author: Alon Zakai, Ya'acov Ritov

Abstract: We show that all consistent learning methods—that is, that asymptotically achieve the lowest possible expected loss for any distribution on (X,Y)—are necessarily localizable, by which we mean that they do not significantly change their response at a particular point when we show them only the part of the training set that is close to that point. This is true in particular for methods that appear to be defined in a non-local manner, such as support vector machines in classification and least-squares estimators in regression. Aside from showing that consistency implies a specific form of localizability, we also show that consistency is logically equivalent to the combination of two properties: (1) a form of localizability, and (2) that the method's global mean (over the entire X distribution) correctly estimates the true mean. Consistency can therefore be seen as comprised of two aspects, one local and one global. Keywords: consistency, local learning, regression, classification

4 0.47256485 32 jmlr-2009-Exploiting Product Distributions to Identify Relevant Variables of Correlation Immune Functions

Author: Lisa Hellerstein, Bernard Rosell, Eric Bach, Soumya Ray, David Page

Abstract: A Boolean function f is correlation immune if each input variable is independent of the output, under the uniform distribution on inputs. For example, the parity function is correlation immune. We consider the problem of identifying relevant variables of a correlation immune function, in the presence of irrelevant variables. We address this problem in two different contexts. First, we analyze Skewing, a heuristic method that was developed to improve the ability of greedy decision tree algorithms to identify relevant variables of correlation immune Boolean functions, given examples drawn from the uniform distribution (Page and Ray, 2003). We present theoretical results revealing both the capabilities and limitations of skewing. Second, we explore the problem of identifying relevant variables in the Product Distribution Choice (PDC) learning model, a model in which the learner can choose product distributions and obtain examples from them. We prove a lemma establishing a property of Boolean functions that may be of independent interest. Using this lemma, we give two new algorithms for finding relevant variables of correlation immune functions in the PDC model. Keywords: correlation immune functions, skewing, relevant variables, Boolean functions, product distributions c 2009 Lisa Hellerstein, Bernard Rosell, Eric Bach, Soumya Ray and David Page. H ELLERSTEIN , ROSELL , BACH , R AY AND PAGE

5 0.23435661 53 jmlr-2009-Marginal Likelihood Integrals for Mixtures of Independence Models

Author: Shaowei Lin, Bernd Sturmfels, Zhiqiang Xu

Abstract: Inference in Bayesian statistics involves the evaluation of marginal likelihood integrals. We present algebraic algorithms for computing such integrals exactly for discrete data of small sample size. Our methods apply to both uniform priors and Dirichlet priors. The underlying statistical models are mixtures of independent distributions, or, in geometric language, secant varieties of SegreVeronese varieties. Keywords: marginal likelihood, exact integration, mixture of independence model, computational algebra

6 0.2276167 100 jmlr-2009-When Is There a Representer Theorem? Vector Versus Matrix Regularizers

7 0.20896049 93 jmlr-2009-The Hidden Life of Latent Variables: Bayesian Learning with Mixed Graph Models

8 0.19927809 51 jmlr-2009-Low-Rank Kernel Learning with Bregman Matrix Divergences

9 0.18843867 57 jmlr-2009-Multi-task Reinforcement Learning in Partially Observable Stochastic Environments

10 0.1786851 72 jmlr-2009-Polynomial-Delay Enumeration of Monotonic Graph Classes

11 0.17724495 64 jmlr-2009-On The Power of Membership Queries in Agnostic Learning

12 0.1691221 86 jmlr-2009-Similarity-based Classification: Concepts and Algorithms

13 0.1650406 71 jmlr-2009-Perturbation Corrections in Approximate Inference: Mixture Modelling Applications

14 0.16210936 29 jmlr-2009-Estimating Labels from Label Proportions

15 0.15963963 2 jmlr-2009-A New Approach to Collaborative Filtering: Operator Estimation with Spectral Regularization

16 0.15680002 22 jmlr-2009-Deterministic Error Analysis of Support Vector Regression and Related Regularized Kernel Methods

17 0.15237471 17 jmlr-2009-Computing Maximum Likelihood Estimates in Recursive Linear Models with Correlated Errors

18 0.15161356 90 jmlr-2009-Structure Spaces

19 0.13786167 39 jmlr-2009-Hybrid MPI OpenMP Parallel Linear Support Vector Machine Training

20 0.13500215 28 jmlr-2009-Entropy Inference and the James-Stein Estimator, with Application to Nonlinear Gene Association Networks


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(8, 0.018), (11, 0.032), (19, 0.028), (21, 0.02), (26, 0.014), (37, 0.386), (38, 0.029), (47, 0.022), (52, 0.038), (55, 0.042), (58, 0.032), (66, 0.129), (68, 0.021), (90, 0.078), (96, 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.74204743 36 jmlr-2009-Fourier Theoretic Probabilistic Inference over Permutations

Author: Jonathan Huang, Carlos Guestrin, Leonidas Guibas

Abstract: Permutations are ubiquitous in many real-world problems, such as voting, ranking, and data association. Representing uncertainty over permutations is challenging, since there are n! possibilities, and typical compact and factorized probability distribution representations, such as graphical models, cannot capture the mutual exclusivity constraints associated with permutations. In this paper, we use the “low-frequency” terms of a Fourier decomposition to represent distributions over permutations compactly. We present Kronecker conditioning, a novel approach for maintaining and updating these distributions directly in the Fourier domain, allowing for polynomial time bandlimited approximations. Low order Fourier-based approximations, however, may lead to functions that do not correspond to valid distributions. To address this problem, we present a quadratic program defined directly in the Fourier domain for projecting the approximation onto a relaxation of the polytope of legal marginal distributions. We demonstrate the effectiveness of our approach on a real camera-based multi-person tracking scenario. Keywords: identity management, permutations, approximate inference, group theoretical methods, sensor networks

2 0.39501482 80 jmlr-2009-Reproducing Kernel Banach Spaces for Machine Learning

Author: Haizhang Zhang, Yuesheng Xu, Jun Zhang

Abstract: We introduce the notion of reproducing kernel Banach spaces (RKBS) and study special semiinner-product RKBS by making use of semi-inner-products and the duality mapping. Properties of an RKBS and its reproducing kernel are investigated. As applications, we develop in the framework of RKBS standard learning schemes including minimal norm interpolation, regularization network, support vector machines, and kernel principal component analysis. In particular, existence, uniqueness and representer theorems are established. Keywords: reproducing kernel Banach spaces, reproducing kernels, learning theory, semi-innerproducts, representer theorems

3 0.39466798 82 jmlr-2009-Robustness and Regularization of Support Vector Machines

Author: Huan Xu, Constantine Caramanis, Shie Mannor

Abstract: We consider regularized support vector machines (SVMs) and show that they are precisely equivalent to a new robust optimization formulation. We show that this equivalence of robust optimization and regularization has implications for both algorithms, and analysis. In terms of algorithms, the equivalence suggests more general SVM-like algorithms for classification that explicitly build in protection to noise, and at the same time control overfitting. On the analysis front, the equivalence of robustness and regularization provides a robust optimization interpretation for the success of regularized SVMs. We use this new robustness interpretation of SVMs to give a new proof of consistency of (kernelized) SVMs, thus establishing robustness as the reason regularized SVMs generalize well. Keywords: robustness, regularization, generalization, kernel, support vector machine

4 0.39171508 52 jmlr-2009-Margin-based Ranking and an Equivalence between AdaBoost and RankBoost

Author: Cynthia Rudin, Robert E. Schapire

Abstract: We study boosting algorithms for learning to rank. We give a general margin-based bound for ranking based on covering numbers for the hypothesis space. Our bound suggests that algorithms that maximize the ranking margin will generalize well. We then describe a new algorithm, smooth margin ranking, that precisely converges to a maximum ranking-margin solution. The algorithm is a modification of RankBoost, analogous to “approximate coordinate ascent boosting.” Finally, we prove that AdaBoost and RankBoost are equally good for the problems of bipartite ranking and classification in terms of their asymptotic behavior on the training set. Under natural conditions, AdaBoost achieves an area under the ROC curve that is equally as good as RankBoost’s; furthermore, RankBoost, when given a specific intercept, achieves a misclassification error that is as good as AdaBoost’s. This may help to explain the empirical observations made by Cortes and Mohri, and Caruana and Niculescu-Mizil, about the excellent performance of AdaBoost as a bipartite ranking algorithm, as measured by the area under the ROC curve. Keywords: ranking, RankBoost, generalization bounds, AdaBoost, area under the ROC curve

5 0.38880295 27 jmlr-2009-Efficient Online and Batch Learning Using Forward Backward Splitting

Author: John Duchi, Yoram Singer

Abstract: We describe, analyze, and experiment with a framework for empirical loss minimization with regularization. Our algorithmic framework alternates between two phases. On each iteration we first perform an unconstrained gradient descent step. We then cast and solve an instantaneous optimization problem that trades off minimization of a regularization term while keeping close proximity to the result of the first phase. This view yields a simple yet effective algorithm that can be used for batch penalized risk minimization and online learning. Furthermore, the two phase approach enables sparse solutions when used in conjunction with regularization functions that promote sparsity, such as ℓ1 . We derive concrete and very simple algorithms for minimization of loss functions with ℓ1 , ℓ2 , ℓ2 , and ℓ∞ regularization. We also show how to construct ef2 ficient algorithms for mixed-norm ℓ1 /ℓq regularization. We further extend the algorithms and give efficient implementations for very high-dimensional data with sparsity. We demonstrate the potential of the proposed framework in a series of experiments with synthetic and natural data sets. Keywords: subgradient methods, group sparsity, online learning, convex optimization

6 0.38828602 22 jmlr-2009-Deterministic Error Analysis of Support Vector Regression and Related Regularized Kernel Methods

7 0.38754374 78 jmlr-2009-Refinement of Reproducing Kernels

8 0.38699302 19 jmlr-2009-Controlling the False Discovery Rate of the Association Causality Structure Learned with the PC Algorithm    (Special Topic on Mining and Learning with Graphs and Relations)

9 0.3866021 9 jmlr-2009-Analysis of Perceptron-Based Active Learning

10 0.38635936 32 jmlr-2009-Exploiting Product Distributions to Identify Relevant Variables of Correlation Immune Functions

11 0.38476381 18 jmlr-2009-Consistency and Localizability

12 0.38462585 93 jmlr-2009-The Hidden Life of Latent Variables: Bayesian Learning with Mixed Graph Models

13 0.38224414 75 jmlr-2009-Provably Efficient Learning with Typed Parametric Models

14 0.38126734 66 jmlr-2009-On the Consistency of Feature Selection using Greedy Least Squares Regression

15 0.38090995 87 jmlr-2009-Sparse Online Learning via Truncated Gradient

16 0.38085845 51 jmlr-2009-Low-Rank Kernel Learning with Bregman Matrix Divergences

17 0.3806268 94 jmlr-2009-The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs

18 0.38058326 79 jmlr-2009-Reinforcement Learning in Finite MDPs: PAC Analysis

19 0.38026297 100 jmlr-2009-When Is There a Representer Theorem? Vector Versus Matrix Regularizers

20 0.37971631 47 jmlr-2009-Learning Linear Ranking Functions for Beam Search with Application to Planning