nips nips2007 nips2007-77 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jonathan Huang, Carlos Guestrin, Leonidas Guibas
Abstract: Permutations are ubiquitous in many real world problems, such as voting, rankings and data association. Representing uncertainty over permutations is challenging, since there are n! possibilities, and typical compact representations such as graphical models cannot efficiently capture the mutual exclusivity constraints associated with permutations. In this paper, we use the “low-frequency” terms of a Fourier decomposition to represent such distributions compactly. We present Kronecker conditioning, a general and efficient approach for maintaining these distributions directly in the Fourier domain. Low order Fourier-based approximations can lead to functions that do not correspond to valid distributions. To address this problem, we present an efficient quadratic program defined directly in the Fourier domain to project the approximation onto a relaxed form of the marginal polytope. We demonstrate the effectiveness of our approach on a real camera-based multi-people tracking setting. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Representing uncertainty over permutations is challenging, since there are n! [sent-8, score-0.135]
2 possibilities, and typical compact representations such as graphical models cannot efficiently capture the mutual exclusivity constraints associated with permutations. [sent-9, score-0.122]
3 To address this problem, we present an efficient quadratic program defined directly in the Fourier domain to project the approximation onto a relaxed form of the marginal polytope. [sent-13, score-0.119]
4 As an example, consider a sensor network that tracks the positions of n people, but can only gather identity information when they walk near certain sensors. [sent-16, score-0.221]
5 A typical tracking system maintains tracks of n people and the identity of the person corresponding to each track. [sent-19, score-0.235]
6 What makes the problem difficult is that identities can be confused when tracks cross in what we call mixing events. [sent-20, score-0.25]
7 Permutations pose a challenge for probabilistic inference, because distributions on the group of permutations on n elements require storing at least n! [sent-22, score-0.242]
8 • We show that approximate conditioning can sometimes yield Fourier coefficients which do not correspond to any valid distribution, and present a method for projecting the result back onto a relaxation of the marginal polytope. [sent-32, score-0.281]
9 2 Filtering over permutations. In identity management, a permutation σ represents a joint assignment of identities to internal tracks, with σ(i) being the track belonging to the ith identity. [sent-34, score-0.39]
10 When people walk too closely together, their identities can be confused, leading to uncertainty over σ. [sent-35, score-0.15]
11 . . . , z^(T)) = P(z^(1) | σ^(1)) ∏_t P(z^(t) | σ^(t)) · P(σ^(t) | σ^(t−1)), where the σ^(t) are latent permutations and the z^(t) denote observed variables. [sent-48, score-0.135]
12 The conditional probability distribution P(σ^(t) | σ^(t−1)) is called the transition model, and might reflect, for example, that the identities belonging to two tracks were swapped with some probability. [sent-49, score-0.202]
13 . . . , z^(t+1)) in two steps: a prediction/rollup step and a conditioning step. [sent-58, score-0.137]
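To make the two steps concrete, the following is a minimal illustrative sketch (not the paper's algorithm) of exact filtering over Sn by brute-force enumeration of all n! permutations for a small n. The transition models a single mixing event in which the identities on tracks 0 and 1 swap with probability 0.5; the 0.9/0.1 observation model and all names are hypothetical.

import itertools
import numpy as np

n = 4
perms = list(itertools.permutations(range(n)))            # sigma[i] = track assigned to identity i
index = {s: k for k, s in enumerate(perms)}

def predict(prior, T):
    # prediction/rollup: P(sigma_{t+1}) = sum_sigma_t P(sigma_{t+1} | sigma_t) P(sigma_t)
    return T.T @ prior

def condition(prior, likelihood):
    # conditioning: Bayes rule followed by normalization
    post = likelihood * prior
    return post / post.sum()

# Mixing-event transition: with probability 0.5 the identities on tracks 0 and 1 swap.
T = np.zeros((len(perms), len(perms)))
for a, s in enumerate(perms):
    swapped = tuple({0: 1, 1: 0}.get(t, t) for t in s)
    T[a, index[s]] += 0.5
    T[a, index[swapped]] += 0.5

prior = np.full(len(perms), 1.0 / len(perms))
prior = predict(prior, T)
lik = np.array([0.9 if s[0] == 0 else 0.1 for s in perms])   # evidence: identity 0 seen near track 0
posterior = condition(prior, lik)
print(sum(p for p, s in zip(posterior, perms) if s[0] == 0)) # posterior P(sigma(0) = 0)

The n!-sized state vector and transition matrix in this enumeration are exactly what the Fourier-domain machinery described below is designed to avoid.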
14 Perhaps less familiar is its group-theoretic generalization, which we review in this section with an eye towards approximating functions on the group of permutations, the Symmetric Group. [sent-80, score-0.203]
15 For permutations on n objects, the Symmetric Group will be abbreviated by Sn . [sent-81, score-0.135]
16 The formal definition of the Fourier Transform relies on the theory of group representations, which we briefly discuss first. [sent-82, score-0.109]
17 Our goal in this section is to motivate the idea that the Fourier transform of a distribution P is related to certain marginals of P . [sent-83, score-0.237]
18 A representation of a group G is a map ρ from G to a set of invertible dρ × dρ matrix operators which preserves algebraic structure in the sense that for all σ1 , σ2 ∈ G, ρ(σ1 σ2 ) = ρ(σ1 ) · ρ(σ2 ). [sent-86, score-0.211]
19 The matrices which lie in the image of this map are called the representation matrices, and we will refer to dρ as the degree of the representation. [sent-87, score-0.125]
20 The simplest basis functions are constant functions — and our first example of a representation is the trivial representation ρ0 : G → R which maps every element of G to 1. [sent-89, score-0.223]
21 As a more pertinent example, we define the 1st order permutation representation of Sn to be the degree n representation, τ1 , which maps a permutation σ to its corresponding permutation matrix given by: [τ1 (σ)]ij = 1 {σ(j) = i}. [sent-90, score-0.465]
22 For example, the permutation in S3 which swaps the second and third elements maps to: τ1(1 → 1, 2 → 3, 3 → 2) = [ 1 0 0 ; 0 0 1 ; 0 1 0 ]. [sent-91, score-0.18]
23 The τ1 representation can be thought of as a collection of n^2 functions at once, one for each matrix entry, [τ1(σ)]ij . [sent-92, score-0.135]
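As a small illustration (a sketch, not from the paper), the first-order permutation representation can be built directly from its definition [τ1(σ)]_ij = 1{σ(j) = i}; the example below reproduces the matrix for the swap of the second and third elements (0-indexed).

import numpy as np

def tau1(sigma):
    # [tau1(sigma)]_{ij} = 1 if sigma(j) = i, else 0
    n = len(sigma)
    M = np.zeros((n, n))
    for j in range(n):
        M[sigma[j], j] = 1.0
    return M

sigma = (0, 2, 1)                  # swaps the second and third elements
print(tau1(sigma))                 # [[1,0,0],[0,0,1],[0,1,0]]
# tau1 is a homomorphism: tau1(s1 composed with s2) equals tau1(s1) @ tau1(s2)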
24 There are other possible permutation representations - for example, the 2nd order unordered permutation representation, τ2 , is defined by the action of a permutation on unordered pairs of objects, ([ρ(σ)]_{{i,j},{ℓ,k}} = 1{σ({ℓ, k}) = {i, j}}), and is a degree n(n−1)/2 representation. [sent-93, score-0.5]
25 It is useful to think of two representations as being the same if the representation matrices are equal up to some consistent change of basis. [sent-95, score-0.195]
26 This idea is formalized by declaring two representations ρ and τ to be equivalent if there exists an invertible matrix C such that C −1 · ρ(σ) · C = τ (σ) for all σ ∈ G. [sent-96, score-0.138]
27 Most representations can be seen as having been built up by smaller representations. [sent-98, score-0.092]
28 We say that a representation ρ is reducible if there exist smaller representations ρ1 , ρ2 such that ρ ≡ ρ1 ⊕ ρ2, where ⊕ is defined to be the direct sum representation: (ρ1 ⊕ ρ2)(g) = [ ρ1(g) 0 ; 0 ρ2(g) ]. [sent-99, score-0.207]
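For concreteness, the direct sum in Equation 1 is just a block-diagonal stacking of the two representation matrices; a minimal sketch:

import numpy as np

def direct_sum(A, B):
    # (rho1 ⊕ rho2)(g): place rho1(g) and rho2(g) on the block diagonal
    out = np.zeros((A.shape[0] + B.shape[0], A.shape[1] + B.shape[1]))
    out[:A.shape[0], :A.shape[1]] = A
    out[A.shape[0]:, A.shape[1]:] = B
    return out

Applying direct_sum elementwise to two representations again yields a representation, since block-diagonal matrices multiply block by block.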
29 However, for any finite group, there is always a finite collection of atomic representations which can be used to build up any other representation using direct sums. [sent-101, score-0.202]
30 These representations are referred to as the irreducibles of a group, and they are simply the collection of representations which are not reducible. [sent-102, score-0.569]
31 It can be shown that any representation of a finite group G is equivalent to a direct sum of irreducibles [3], and hence, for any representation τ , there exists a matrix C for which C^−1 · τ · C = ⊕_{ρi∈R} ⊕ ρi , where the inner ⊕ refers to some finite number of copies of the irreducible ρi . [sent-104, score-0.744]
32 Describing the irreducibles of Sn up to equivalence is a subject unto itself; we will simply say that there is a natural way to order the irreducibles of Sn that corresponds to ‘simplicity’ in the same way that low frequency sinusoids are simpler than higher frequency ones. [sent-105, score-0.75]
33 We will refer to the irreducibles in this order as ρ0 , ρ1 , . [sent-106, score-0.378]
34 For example, the first two irreducibles form the first order permutation representation (τ1 ≡ ρ0 ⊕ ρ1 ), and the second order permutation representation can be formed by the first 3 irreducibles. [sent-110, score-0.712]
35 Irreducible representation matrices are not always orthogonal, but they can always be chosen to be so (up to equivalence). [sent-111, score-0.103]
36 For notational convenience, the irreducible representations in this paper will always be assumed to be orthogonal. [sent-112, score-0.18]
37 1 The Fourier transform. On the real line, the Fourier Transform corresponds to computing inner products of a function with sines and cosines at varying frequencies. [sent-114, score-0.174]
38 The analogous definition for finite groups replaces the sinusoids by group representations. [sent-115, score-0.126]
39 Let f : G → R be any function on a group G and let ρ be any representation on G. [sent-117, score-0.146]
40 There are two important points which distinguish this Fourier Transform from the familiar version on the real line — it is matrix-valued, and instead of real numbers, the inputs to f̂ are representations of G. [sent-119, score-0.203]
41 The collection of Fourier Transforms of f at all irreducibles forms the Fourier Transform of f . [sent-120, score-0.385]
42 As in the familiar case, there is an inverse transform given by: f(σ) = (1/|G|) Σ_k d_ρk Tr[ f̂_ρk^T · ρk(σ) ], (2) where k indexes over the collection of irreducibles of G. [sent-121, score-0.585]
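The following sketch works the forward transform f̂_ρ = Σ_σ f(σ) ρ(σ) and the inverse formula (Equation 2) out explicitly for S3, using one concrete choice of orthogonal irreducibles: trivial, a 2-dimensional "standard" representation built here by restricting τ1 to the orthogonal complement of the all-ones vector, and the sign representation. These constructions are ours for illustration, not the paper's.

import itertools
import numpy as np

perms = list(itertools.permutations(range(3)))              # the 6 elements of S3

def tau1(sigma):
    M = np.zeros((3, 3))
    for j in range(3):
        M[sigma[j], j] = 1.0
    return M

def sign_rep(sigma):
    s = 1.0
    for a in range(3):
        for b in range(a + 1, 3):
            if sigma[a] > sigma[b]:
                s = -s                                       # parity via inversion count
    return np.array([[s]])

B = np.array([[1.0, 1.0], [-1.0, 1.0], [0.0, -2.0]]) / np.sqrt([2.0, 6.0])  # orthonormal complement of (1,1,1)

def standard(sigma):
    return B.T @ tau1(sigma) @ B                             # 2-d orthogonal irreducible

irreps = [lambda s: np.array([[1.0]]), standard, sign_rep]   # trivial, standard, sign
dims = [1, 2, 1]

rng = np.random.default_rng(0)
f = dict(zip(perms, rng.random(len(perms))))                 # an arbitrary function on S3

f_hat = [sum(f[s] * rho(s) for s in perms) for rho in irreps]    # forward transform at each irreducible

def inverse(sigma):                                          # Equation 2
    return sum(d * np.trace(fh.T @ rho(sigma))
               for d, fh, rho in zip(dims, f_hat, irreps)) / len(perms)

assert all(np.isclose(inverse(s), f[s]) for s in perms)      # the inverse transform recovers f exactly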
43 This is also true for functions on a group; if f : G → R is any function, then the Fourier Transform of f at the trivial representation is constant, with f̂_ρ0 = Σ_σ f(σ). [sent-124, score-0.112]
44 If P were the uniform distribution, then P̂_ρ = 0 at all irreducibles except at the trivial representation. [sent-126, score-0.383]
45 [P̂_τ1]_ij = Σ_{σ:σ(j)=i} P(σ). Thus, if P is a distribution, then P̂_τ1 is a matrix of marginal probabilities, where the ij-th element is the marginal probability that a random permutation drawn from P maps element j to i. [sent-128, score-0.277]
46 Similarly, the Fourier transform of P at the second order permutation representation is a matrix of marginal probabilities of the form P (σ({i, j}) = {k, ℓ}). [sent-129, score-0.398]
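A small sketch confirming this reading of the coefficients: for a random distribution on S4, the transform at τ1 coincides entry by entry with the first-order marginals computed by enumeration. This is an illustrative check, not code from the paper.

import itertools
import numpy as np

n = 4
perms = list(itertools.permutations(range(n)))
rng = np.random.default_rng(1)
w = rng.random(len(perms)); w /= w.sum()
P = dict(zip(perms, w))                                     # a random distribution on S4

def tau1(sigma):
    M = np.zeros((n, n))
    for j in range(n):
        M[sigma[j], j] = 1.0
    return M

P_hat = sum(P[s] * tau1(s) for s in perms)                  # Fourier transform at tau1

marginals = np.zeros((n, n))
for s, ps in P.items():
    for j in range(n):
        marginals[s[j], j] += ps                            # P(sigma(j) = i), by enumeration

assert np.allclose(P_hat, marginals)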
47 4 Inference in the Fourier domain. Bandlimiting allows for compactly storing a distribution over permutations, but the idea is rather moot if it becomes necessary to transform back to the primal domain each time an inference operation is called. [sent-131, score-0.306]
48 Fourier coefficients, but it will allow us to discuss the bandlimiting approximation in the next section. [sent-138, score-0.106]
49 This model assumes that σ (t+1) is generated from σ (t) by drawing a random permutation τ (t) from some distribution Q(t) and setting σ (t+1) = τ (t) σ (t) . [sent-144, score-0.12]
50 In our identity management example, τ (t) represents a random identity permutation that might occur among tracks when they get close to each other (a mixing event), but the random walk model appears in other applications such as modeling card shuffles [3]. [sent-145, score-0.482]
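A sketch of this transition model as a generative step (our own toy parameterization, not the paper's): τ^(t) is the identity with probability 1 − p, and a single pairwise swap of tracks a and b with probability p.

import numpy as np

rng = np.random.default_rng(2)

def mix(sigma, a, b, p):
    # sigma_{t+1} = tau_t . sigma_t, where tau_t swaps tracks a and b with probability p
    sigma = list(sigma)
    if rng.random() < p:
        for i, track in enumerate(sigma):
            if track == a:
                sigma[i] = b
            elif track == b:
                sigma[i] = a
    return tuple(sigma)

sigma = (0, 1, 2, 3)               # identity i is on track sigma[i]
print(mix(sigma, a=0, b=1, p=0.5))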
51 Then for any representation ρ, the Fourier transform of Q ∗ P at ρ is given by Q̂_ρ · P̂_ρ , where the operation on the right side is matrix multiplication. [sent-149, score-0.103]
52 Furthermore, the update is pointwise in the Fourier domain in the sense that the coefficients at the representation ρ affect P̂^(t+1) only at ρ. [sent-153, score-0.247]
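The sketch below checks this pointwise property at a single representation (τ1, which equals ρ0 ⊕ ρ1 up to a change of basis): the primal-domain convolution Q ∗ P, computed by enumeration over S3, has Fourier coefficient Q̂_τ1 · P̂_τ1 at τ1. The construction is illustrative, not the paper's code.

import itertools
import numpy as np

n = 3
perms = list(itertools.permutations(range(n)))
rng = np.random.default_rng(3)

def rand_dist():
    v = rng.random(len(perms)); v /= v.sum()
    return dict(zip(perms, v))

def compose(a, b):
    return tuple(a[b[i]] for i in range(n))                 # (a b)(i) = a(b(i))

def tau1(sigma):
    M = np.zeros((n, n))
    for j in range(n):
        M[sigma[j], j] = 1.0
    return M

def hat(f):
    return sum(f[s] * tau1(s) for s in perms)               # Fourier transform at tau1

Q, P = rand_dist(), rand_dist()
QP = {s: 0.0 for s in perms}
for t, q in Q.items():
    for s, p in P.items():
        QP[compose(t, s)] += q * p                          # primal-domain convolution (Q * P)

assert np.allclose(hat(Q) @ hat(P), hat(QP))                # prediction/rollup as a matrix product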
53 2 Fourier conditioning. An application of Bayes rule to find a posterior distribution P(σ|z) after observing some evidence z requires a pointwise product of likelihood L(z|σ) and prior P(σ), followed by a normalization step. [sent-155, score-0.336]
54 We showed earlier that the normalization constant Σ_σ L(z|σ) · P(σ) is given by the Fourier transform of L^(t) P^(t) at the trivial representation — and therefore the normalization step of conditioning can be implemented by simply dividing each Fourier coefficient by the scalar [L^(t) P^(t)]_{ρ0}. [sent-156, score-0.366]
55 The pointwise product of two functions f and g, however, is trickier to formulate in the Fourier domain. [sent-157, score-0.226]
56 For functions on the real line, the pointwise product of functions can be implemented by convolving the Fourier coefficients f̂ and ĝ, and so a natural question is: can we apply a similar operation for functions over other groups? [sent-158, score-0.334]
57 We present a convolution-based conditioning algorithm which we call Kronecker Conditioning, which, in contrast to the pointwise nature of the Fourier Domain prediction/rollup step, and much like convolution, smears the information at an irreducible ρk to other irreducibles. [sent-160, score-0.372]
58 Fourier transforming the pointwise product. Our approach to Fourier transforming the pointwise product in terms of f̂ and ĝ is to manipulate the function f(σ)g(σ) so that it can be seen as the result of an inverse Fourier Transform. [sent-161, score-0.297]
59 For any σ ∈ G we can write the pointwise product in terms of f̂ and ĝ using the inverse Fourier Transform (Equation 2): f(σ) · g(σ) = [ (1/|G|) Σ_i d_ρi Tr( f̂_ρi^T · ρi(σ) ) ] · [ (1/|G|) Σ_j d_ρj Tr( ĝ_ρj^T · ρj(σ) ) ] = (1/|G|)^2 Σ_{i,j} d_ρi d_ρj [ Tr( f̂_ρi^T · ρi(σ) ) · Tr( ĝ_ρj^T · ρj(σ) ) ]. (4) [sent-163, score-0.199]
60 Now we want to manipulate this product of traces in the last line to be just one trace (as in Equation 3), by appealing to some properties of the matrix Kronecker product. [sent-164, score-0.121]
61 The connection to the pointwise product (first observed in [8]) lies in the property that for any matrices U, V , Tr(U ⊗ V) = (Tr U) · (Tr V). [sent-165, score-0.244]
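A two-line check of this trace identity (an illustrative aside, not part of the paper):

import numpy as np

rng = np.random.default_rng(4)
U, V = rng.random((3, 3)), rng.random((2, 2))
assert np.isclose(np.trace(np.kron(U, V)), np.trace(U) * np.trace(V))   # Tr(U ⊗ V) = Tr(U) Tr(V)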
62 This means that if ρi and ρj are any two irreducibles of G, there exists a similarity transform Cij such that for any σ ∈ G, Cij^−1 · [ρi ⊗ ρj](σ) · Cij = ⊕_k ⊕_{ℓ=1}^{z_ijk} ρk(σ). [sent-169, score-0.619]
63 The ⊕ symbols here refer to a matrix direct sum as in Equation 1, k indexes over all irreducible representations of Sn , while ℓ indexes over the copies of ρk which appear in the decomposition. [sent-170, score-0.33]
64 The number of copies of each ρk is denoted by the integer z_ijk; the collection of these, taken over all triples (i, j, k), is commonly referred to as the Clebsch-Gordan series. [sent-172, score-0.176]
65 Note that we allow the zijk to be zero, in which case ρk does not contribute to the direct sum. [sent-173, score-0.142]
66 The Kronecker Product Decomposition problem is that of finding the irreducible components of the Kronecker product representation, and thus the Clebsch-Gordan series/coefficients for each pair of representations (ρi , ρj ). [sent-175, score-0.232]
67 Let f̂, ĝ be the Fourier Transforms of functions f and g respectively, and for each ordered pair of irreducibles (ρi , ρj ), define the matrix: Aij = Cij^−1 · ( f̂_ρi ⊗ ĝ_ρj ) · Cij . [sent-177, score-0.383]
68 Then the Fourier transform of the pointwise product fg is: [fg]_{ρk} = 1/(d_ρk |G|) Σ_{ij} d_ρi d_ρj Σ_{ℓ=1}^{z_ijk} Aij^{kℓ}, (6) where Aij^{kℓ} is the block of Aij corresponding to the (k, ℓ) block in ⊕_k ⊕_{ℓ=1}^{z_ijk} ρk . [sent-178, score-0.454]
69 The Clebsch-Gordan series, z_ijk , plays an important role in Equation 6, which says that the (ρi , ρj ) cross-term contributes to the pointwise product at ρk only when z_ijk > 0. [sent-180, score-0.437]
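The sketch below implements Equation 6 under two assumptions that we state explicitly: the Clebsch-Gordan data are supplied as inputs (C[i][j] is the change-of-basis matrix Cij, z[i][j][k] the multiplicity z_ijk), and the blocks of the decomposed Kronecker product appear in order of increasing k, with the z_ijk copies of ρk adjacent. Computing the Clebsch-Gordan data themselves is a separate and nontrivial step, so this is a hypothetical scaffold rather than a complete implementation of the paper's algorithm.

import numpy as np

def kronecker_condition(f_hat, g_hat, dims, C, z, group_order):
    # Equation 6: [fg]_{rho_k} = 1/(d_k |G|) sum_{ij} d_i d_j sum_l A_ij^{kl},
    # where A_ij = C_ij^{-1} (f_hat_i kron g_hat_j) C_ij and A_ij^{kl} is its (k, l) block.
    K = len(dims)
    fg_hat = [np.zeros((dims[k], dims[k])) for k in range(K)]
    for i in range(K):
        for j in range(K):
            Cij = C[i][j]
            A = np.linalg.inv(Cij) @ np.kron(f_hat[i], g_hat[j]) @ Cij
            off = 0
            for k in range(K):
                for _ in range(z[i][j][k]):                  # z_ijk copies of rho_k
                    fg_hat[k] += dims[i] * dims[j] * A[off:off + dims[k], off:off + dims[k]]
                    off += dims[k]
    for k in range(K):
        fg_hat[k] /= dims[k] * group_order
    return fg_hat

In the bandlimited setting discussed next, the sums over i, j, and k would be restricted to irreducibles below the bandlimit, which is exactly where the approximation error enters.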
70 5 Approximate inference by bandlimiting. We approximate the probability distribution P(σ) by fixing a bandlimit B and maintaining the Fourier transform of P only at the irreducibles ρ0 , . . . [sent-188, score-0.68]
71 For example, when B = 3, B is the set ρ0 , ρ1 , ρ2 , and ρ3 , which corresponds to maintaining marginal probabilities of the form P (σ((i, j)) = (k, ℓ)). [sent-194, score-0.115]
72 Pseudocode for bandlimited prediction/rollup and Kronecker conditioning is given in Figures 1 and 2. [sent-196, score-0.205]
73 Since the Prediction/Rollup step is pointwise in the Fourier domain, the update is exact for the maintained irreducibles because higher order irreducibles cannot affect those below the bandlimit. [sent-197, score-0.859]
74 As in [5], we find that the error from bandlimiting creeps in through the conditioning step. [sent-198, score-0.222]
75 For example, Equation 7 shows that if B = 1 (so that we maintain first-order marginals), then the pointwise product spreads information to second-order marginals. [sent-199, score-0.224]
76 Conversely, pairs of higher-order irreducibles may propagate information to lower-order irreducibles. [sent-200, score-0.356]
77 However, when the distribution is sharply concentrated at a small subset of permutations, the low-order Fourier projection is unable to faithfully approximate the distribution, in many circumstances resulting in a bandlimited Fourier Transform with negative “marginal probabilities”! [sent-202, score-0.138]
78 Projecting to a relaxed marginal polytope. The marginal polytope, M, is the set of marginals which are consistent with some joint distribution over permutations. [sent-204, score-0.241]
79 We project our approximation onto a relaxation of the marginal polytope, M′ , defined by linear inequality constraints that marginals be nonnegative, and linear equality constraints that they correspond to some legal Fourier transform. [sent-205, score-0.203]
80 Intuitively, our relaxation produces matrices of marginals which are doubly stochastic (rows and columns sum to one and all entries are nonnegative), and satisfy lower-order marginal consistency (different high-order marginals are consistent at lower orders). [sent-206, score-0.355]
81 After each conditioning step, we apply a ‘correction’ to the approximate posterior P (t) by finding the bandlimited function in M′ which is closest to P (t) in an L2 sense. [sent-207, score-0.205]
82 We remark that even though the projection will always produce a Fourier transform corresponding to nonnegative marginals, there might not necessarily exist a joint probability distribution on Sn consistent with those marginals. [sent-212, score-0.237]
83 In the case of first-order marginals, however, the existence of a consistent joint distribution is guaranteed by the Birkhoff-von Neumann theorem [10], which states that a matrix is doubly stochastic if and only if it can be written as a convex combination of permutation matrices. [sent-213, score-0.177]
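For the first-order case, the relaxed polytope is exactly the set of doubly stochastic matrices, and the L2 projection can be sketched with Dykstra's alternating projections. This is a simple stand-in for the paper's quadratic program, which is instead posed directly on the Fourier coefficients; the example matrix is made up.

import numpy as np

def project_doubly_stochastic(M, iters=500):
    # L2 projection onto {row sums = 1} ∩ {column sums = 1} ∩ {entries >= 0} via Dykstra
    n = M.shape[0]
    projections = [
        lambda X: X - (X.sum(axis=1, keepdims=True) - 1.0) / n,   # fix row sums
        lambda X: X - (X.sum(axis=0, keepdims=True) - 1.0) / n,   # fix column sums
        lambda X: np.maximum(X, 0.0),                             # nonnegativity
    ]
    increments = [np.zeros_like(M, dtype=float) for _ in projections]
    X = M.astype(float)
    for _ in range(iters):
        for s, proj in enumerate(projections):
            Y = proj(X + increments[s])
            increments[s] = X + increments[s] - Y
            X = Y
    return X

# Example: first-order "marginals" with a negative entry, as bandlimited conditioning can produce.
bad = np.array([[0.7, 0.4, -0.1],
                [0.2, 0.3,  0.5],
                [0.1, 0.3,  0.6]])
print(project_doubly_stochastic(bad))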
84 [Pseudocode residue from Figure 2 (Kronecker conditioning): Aij^{kℓ} is the (k, ℓ) block of Aij; Z ← [L^(t) P^(t)]_{ρ0}; // Normalization: foreach ρk ∈ B do [L^(t) P^(t)]_{ρk} ← (1/Z) [L^(t) P^(t)]_{ρk}.] Kondor et al. [sent-221, score-0.183]
85 Their FFT method saves a factor of D due to the fact that certain representation matrices can be shown to be sparse. [sent-226, score-0.103]
86 Willsky [8] was the first to formulate a nonabelian version of the FFT algorithm (for Metacyclic groups) as well as to note the connection between pointwise products and Kronecker product decompositions for general finite groups. [sent-230, score-0.199]
87 7 Experimental results. For small n, we compared our algorithm to exact inference on synthetic datasets in which tracks are drawn at random to be observed or swapped. [sent-233, score-0.143]
88 In Fig. 3(b) we allow for consecutive conditioning steps, and we see that the projection step is fundamental, especially when mixing events are rare, reducing the error dramatically. [sent-239, score-0.292]
89 In the data, there are n = 11 people walking around a room in fairly close proximity. [sent-244, score-0.092]
90 To handle the fact that people can freely leave and enter the room, we maintain a list of the tracks which are external to the room. [sent-245, score-0.175]
91 Each time a new track leaves the room, it is added to the list and a mixing event is called to allow for m2 pairwise swaps amongst the m external tracks. [sent-246, score-0.109]
92 The task after conditioning on each observation is to predict identities for all tracks inside the room, and the evaluation metric is the fraction of accurate predictions. [sent-249, score-0.337]
93 [Figure residue: plot of L1 error at first-order marginals, averaged over 250 timesteps, for bandlimits b = 1, 2, 3.] [sent-261, score-0.093]
94 To illustrate the difficulty of predicting based on appearance alone, the rightmost bar reflects the performance of an omniscient tracker who knows the result of each mixing event and is therefore left only with the task of distinguishing between appearances. [sent-271, score-0.123]
95 In particular, we developed the Kronecker Conditioning algorithm which performs a convolution-like operation on Fourier coefficients to find the Fourier transform of the posterior distribution. [sent-273, score-0.168]
96 We argued that bandlimited conditioning can result in Fourier coefficients which correspond to no distribution, but that the problem can be remedied by projecting to a relaxation of the marginal polytope. [sent-274, score-0.327]
97 Our evaluation on data from a camera network shows that our methods perform well when compared to the optimal solution in small problems, or to an omniscient tracker in large problems. [sent-275, score-0.114]
98 Furthermore, we demonstrated that our projection step is fundamental to obtaining these high-quality results. [sent-276, score-0.095]
99 In fact, both the prediction/rollup and conditioning formulations hold over any finite group, providing a principled method for approximate inference for problems with underlying group structure. [sent-278, score-0.26]
100 The analysis of the kronecker product of irreducible representations of the symmetric group. [sent-333, score-0.421]
wordName wordTfidf (topN-words)
[('fourier', 0.697), ('irreducibles', 0.356), ('kronecker', 0.16), ('pointwise', 0.147), ('transform', 0.144), ('conditioning', 0.137), ('permutations', 0.135), ('permutation', 0.12), ('zijk', 0.119), ('tracks', 0.108), ('sn', 0.104), ('marginals', 0.093), ('cij', 0.093), ('representations', 0.092), ('cients', 0.091), ('irreducible', 0.088), ('group', 0.088), ('bandlimiting', 0.085), ('coef', 0.083), ('tr', 0.081), ('projection', 0.07), ('bandlimited', 0.068), ('foreach', 0.067), ('identities', 0.065), ('aij', 0.063), ('maintaining', 0.06), ('representation', 0.058), ('marginal', 0.055), ('fft', 0.054), ('mixing', 0.052), ('product', 0.052), ('guibas', 0.051), ('schumitsch', 0.051), ('room', 0.05), ('ij', 0.048), ('kondor', 0.047), ('identity', 0.047), ('matrices', 0.045), ('omniscient', 0.044), ('transforms', 0.044), ('convolution', 0.044), ('camera', 0.043), ('walk', 0.043), ('people', 0.042), ('domain', 0.042), ('management', 0.04), ('tracking', 0.038), ('polytope', 0.038), ('sinusoids', 0.038), ('ak', 0.038), ('doubly', 0.036), ('pseudocode', 0.036), ('inference', 0.035), ('reducible', 0.034), ('swaps', 0.034), ('projecting', 0.034), ('events', 0.033), ('relaxation', 0.033), ('guestrin', 0.032), ('real', 0.03), ('exclusivity', 0.03), ('collection', 0.029), ('symmetric', 0.029), ('transition', 0.029), ('equation', 0.028), ('copies', 0.028), ('familiar', 0.028), ('indexes', 0.028), ('trivial', 0.027), ('proposition', 0.027), ('tracker', 0.027), ('functions', 0.027), ('observation', 0.027), ('maps', 0.026), ('maintain', 0.025), ('invertible', 0.025), ('manipulate', 0.025), ('card', 0.025), ('confused', 0.025), ('fundamental', 0.025), ('unordered', 0.024), ('decomposing', 0.024), ('formulas', 0.024), ('operation', 0.024), ('track', 0.023), ('nonnegative', 0.023), ('direct', 0.023), ('sensor', 0.023), ('line', 0.023), ('carlos', 0.023), ('timestep', 0.023), ('running', 0.022), ('onto', 0.022), ('refer', 0.022), ('matrix', 0.021), ('transforming', 0.021), ('discuss', 0.021), ('block', 0.02), ('storing', 0.019), ('algebraic', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 77 nips-2007-Efficient Inference for Distributions on Permutations
Author: Jonathan Huang, Carlos Guestrin, Leonidas Guibas
Abstract: Permutations are ubiquitous in many real world problems, such as voting, rankings and data association. Representing uncertainty over permutations is challenging, since there are n! possibilities, and typical compact representations such as graphical models cannot efficiently capture the mutual exclusivity constraints associated with permutations. In this paper, we use the “low-frequency” terms of a Fourier decomposition to represent such distributions compactly. We present Kronecker conditioning, a general and efficient approach for maintaining these distributions directly in the Fourier domain. Low order Fourier-based approximations can lead to functions that do not correspond to valid distributions. To address this problem, we present an efficient quadratic program defined directly in the Fourier domain to project the approximation onto a relaxed form of the marginal polytope. We demonstrate the effectiveness of our approach on a real camera-based multi-people tracking setting. 1
2 0.19241922 160 nips-2007-Random Features for Large-Scale Kernel Machines
Author: Ali Rahimi, Benjamin Recht
Abstract: To accelerate the training of kernel machines, we propose to map the input data to a randomized low-dimensional feature space and then apply existing fast linear methods. The features are designed so that the inner products of the transformed data are approximately equal to those in the feature space of a user specified shiftinvariant kernel. We explore two sets of random features, provide convergence bounds on their ability to approximate various radial basis kernels, and show that in large-scale classification and regression tasks linear machine learning algorithms applied to these features outperform state-of-the-art large-scale kernel machines. 1
3 0.08975286 173 nips-2007-Second Order Bilinear Discriminant Analysis for single trial EEG analysis
Author: Christoforos Christoforou, Paul Sajda, Lucas C. Parra
Abstract: Traditional analysis methods for single-trial classification of electroencephalography (EEG) focus on two types of paradigms: phase locked methods, in which the amplitude of the signal is used as the feature for classification, e.g. event related potentials; and second order methods, in which the feature of interest is the power of the signal, e.g. event related (de)synchronization. The procedure for deciding which paradigm to use is ad hoc and is typically driven by knowledge of the underlying neurophysiology. Here we propose a principled method, based on a bilinear model, in which the algorithm simultaneously learns the best first and second order spatial and temporal features for classification of EEG. The method is demonstrated on simulated data as well as on EEG taken from a benchmark data used to test classification algorithms for brain computer interfaces. 1 1.1
4 0.081394888 142 nips-2007-Non-parametric Modeling of Partially Ranked Data
Author: Guy Lebanon, Yi Mao
Abstract: Statistical models on full and partial rankings of n items are often of limited practical use for large n due to computational consideration. We explore the use of non-parametric models for partially ranked data and derive efficient procedures for their use for large n. The derivations are largely possible through combinatorial and algebraic manipulations based on the lattice of partial rankings. In particular, we demonstrate for the first time a non-parametric coherent and consistent model capable of efficiently aggregating partially ranked data of different types. 1
5 0.07455761 141 nips-2007-New Outer Bounds on the Marginal Polytope
Author: David Sontag, Tommi S. Jaakkola
Abstract: We give a new class of outer bounds on the marginal polytope, and propose a cutting-plane algorithm for efficiently optimizing over these constraints. When combined with a concave upper bound on the entropy, this gives a new variational inference algorithm for probabilistic inference in discrete Markov Random Fields (MRFs). Valid constraints on the marginal polytope are derived through a series of projections onto the cut polytope. As a result, we obtain tighter upper bounds on the log-partition function. We also show empirically that the approximations of the marginals are significantly more accurate when using the tighter outer bounds. Finally, we demonstrate the advantage of the new constraints for finding the MAP assignment in protein structure prediction. 1
6 0.058768965 20 nips-2007-Adaptive Embedded Subgraph Algorithms using Walk-Sum Analysis
7 0.058619037 111 nips-2007-Learning Horizontal Connections in a Sparse Coding Model of Natural Images
8 0.058570087 130 nips-2007-Modeling Natural Sounds with Modulation Cascade Processes
9 0.049253754 63 nips-2007-Convex Relaxations of Latent Variable Training
10 0.047356922 153 nips-2007-People Tracking with the Laplacian Eigenmaps Latent Variable Model
11 0.046907213 65 nips-2007-DIFFRAC: a discriminative and flexible framework for clustering
12 0.043541852 185 nips-2007-Stable Dual Dynamic Programming
13 0.043464139 203 nips-2007-The rat as particle filter
14 0.041404944 86 nips-2007-Exponential Family Predictive Representations of State
15 0.041087545 80 nips-2007-Ensemble Clustering using Semidefinite Programming
16 0.041007742 97 nips-2007-Hidden Common Cause Relations in Relational Learning
17 0.039924119 112 nips-2007-Learning Monotonic Transformations for Classification
18 0.039526664 68 nips-2007-Discovering Weakly-Interacting Factors in a Complex Stochastic Process
19 0.038908426 17 nips-2007-A neural network implementing optimal state estimation based on dynamic spike train decoding
20 0.038217831 49 nips-2007-Colored Maximum Variance Unfolding
topicId topicWeight
[(0, -0.143), (1, 0.003), (2, -0.031), (3, 0.015), (4, -0.027), (5, -0.031), (6, 0.001), (7, 0.069), (8, -0.062), (9, -0.012), (10, 0.015), (11, -0.006), (12, 0.079), (13, 0.005), (14, 0.004), (15, -0.092), (16, -0.001), (17, -0.006), (18, -0.034), (19, -0.001), (20, -0.015), (21, 0.022), (22, 0.094), (23, 0.1), (24, 0.022), (25, 0.013), (26, 0.023), (27, 0.003), (28, 0.087), (29, -0.103), (30, 0.042), (31, 0.177), (32, 0.136), (33, 0.135), (34, 0.001), (35, 0.035), (36, 0.276), (37, 0.069), (38, -0.014), (39, 0.042), (40, 0.084), (41, -0.194), (42, 0.087), (43, -0.065), (44, -0.214), (45, 0.178), (46, -0.14), (47, 0.026), (48, 0.015), (49, 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.96316427 77 nips-2007-Efficient Inference for Distributions on Permutations
Author: Jonathan Huang, Carlos Guestrin, Leonidas Guibas
Abstract: Permutations are ubiquitous in many real world problems, such as voting, rankings and data association. Representing uncertainty over permutations is challenging, since there are n! possibilities, and typical compact representations such as graphical models cannot efficiently capture the mutual exclusivity constraints associated with permutations. In this paper, we use the “low-frequency” terms of a Fourier decomposition to represent such distributions compactly. We present Kronecker conditioning, a general and efficient approach for maintaining these distributions directly in the Fourier domain. Low order Fourier-based approximations can lead to functions that do not correspond to valid distributions. To address this problem, we present an efficient quadratic program defined directly in the Fourier domain to project the approximation onto a relaxed form of the marginal polytope. We demonstrate the effectiveness of our approach on a real camera-based multi-people tracking setting. 1
2 0.62129343 160 nips-2007-Random Features for Large-Scale Kernel Machines
Author: Ali Rahimi, Benjamin Recht
Abstract: To accelerate the training of kernel machines, we propose to map the input data to a randomized low-dimensional feature space and then apply existing fast linear methods. The features are designed so that the inner products of the transformed data are approximately equal to those in the feature space of a user specified shiftinvariant kernel. We explore two sets of random features, provide convergence bounds on their ability to approximate various radial basis kernels, and show that in large-scale classification and regression tasks linear machine learning algorithms applied to these features outperform state-of-the-art large-scale kernel machines. 1
3 0.44625813 142 nips-2007-Non-parametric Modeling of Partially Ranked Data
Author: Guy Lebanon, Yi Mao
Abstract: Statistical models on full and partial rankings of n items are often of limited practical use for large n due to computational consideration. We explore the use of non-parametric models for partially ranked data and derive efficient procedures for their use for large n. The derivations are largely possible through combinatorial and algebraic manipulations based on the lattice of partial rankings. In particular, we demonstrate for the first time a non-parametric coherent and consistent model capable of efficiently aggregating partially ranked data of different types. 1
4 0.41469744 130 nips-2007-Modeling Natural Sounds with Modulation Cascade Processes
Author: Richard Turner, Maneesh Sahani
Abstract: Natural sounds are structured on many time-scales. A typical segment of speech, for example, contains features that span four orders of magnitude: Sentences (∼ 1 s); phonemes (∼ 10−1 s); glottal pulses (∼ 10−2 s); and formants ( 10−3 s). The auditory system uses information from each of these time-scales to solve complicated tasks such as auditory scene analysis [1]. One route toward understanding how auditory processing accomplishes this analysis is to build neuroscienceinspired algorithms which solve similar tasks and to compare the properties of these algorithms with properties of auditory processing. There is however a discord: Current machine-audition algorithms largely concentrate on the shorter time-scale structures in sounds, and the longer structures are ignored. The reason for this is two-fold. Firstly, it is a difficult technical problem to construct an algorithm that utilises both sorts of information. Secondly, it is computationally demanding to simultaneously process data both at high resolution (to extract short temporal information) and for long duration (to extract long temporal information). The contribution of this work is to develop a new statistical model for natural sounds that captures structure across a wide range of time-scales, and to provide efficient learning and inference algorithms. We demonstrate the success of this approach on a missing data task. 1
5 0.35729527 141 nips-2007-New Outer Bounds on the Marginal Polytope
Author: David Sontag, Tommi S. Jaakkola
Abstract: We give a new class of outer bounds on the marginal polytope, and propose a cutting-plane algorithm for efficiently optimizing over these constraints. When combined with a concave upper bound on the entropy, this gives a new variational inference algorithm for probabilistic inference in discrete Markov Random Fields (MRFs). Valid constraints on the marginal polytope are derived through a series of projections onto the cut polytope. As a result, we obtain tighter upper bounds on the log-partition function. We also show empirically that the approximations of the marginals are significantly more accurate when using the tighter outer bounds. Finally, we demonstrate the advantage of the new constraints for finding the MAP assignment in protein structure prediction. 1
6 0.32154945 196 nips-2007-The Infinite Gamma-Poisson Feature Model
7 0.31985143 35 nips-2007-Bayesian binning beats approximate alternatives: estimating peri-stimulus time histograms
8 0.31152406 20 nips-2007-Adaptive Embedded Subgraph Algorithms using Walk-Sum Analysis
9 0.30495694 152 nips-2007-Parallelizing Support Vector Machines on Distributed Computers
10 0.30332965 71 nips-2007-Discriminative Keyword Selection Using Support Vector Machines
11 0.30161989 15 nips-2007-A general agnostic active learning algorithm
12 0.28498933 121 nips-2007-Local Algorithms for Approximate Inference in Minor-Excluded Graphs
13 0.2785311 190 nips-2007-Support Vector Machine Classification with Indefinite Kernels
14 0.27739707 173 nips-2007-Second Order Bilinear Discriminant Analysis for single trial EEG analysis
15 0.25010598 25 nips-2007-An in-silico Neural Model of Dynamic Routing through Neuronal Coherence
17 0.23604585 99 nips-2007-Hierarchical Penalization
18 0.23444319 187 nips-2007-Structured Learning with Approximate Inference
19 0.22971241 131 nips-2007-Modeling homophily and stochastic equivalence in symmetric relational data
20 0.22711846 86 nips-2007-Exponential Family Predictive Representations of State
topicId topicWeight
[(5, 0.056), (13, 0.033), (16, 0.022), (18, 0.019), (21, 0.053), (34, 0.033), (35, 0.042), (47, 0.091), (49, 0.014), (83, 0.101), (85, 0.378), (87, 0.024), (90, 0.051)]
simIndex simValue paperId paperTitle
1 0.90852696 197 nips-2007-The Infinite Markov Model
Author: Daichi Mochihashi, Eiichiro Sumita
Abstract: We present a nonparametric Bayesian method of estimating variable order Markov processes up to a theoretically infinite order. By extending a stick-breaking prior, which is usually defined on a unit interval, “vertically” to the trees of infinite depth associated with a hierarchical Chinese restaurant process, our model directly infers the hidden orders of Markov dependencies from which each symbol originated. Experiments on character and word sequences in natural language showed that the model has a comparative performance with an exponentially large full-order model, while computationally much efficient in both time and space. We expect that this basic model will also extend to the variable order hierarchical clustering of general data. 1
same-paper 2 0.85976231 77 nips-2007-Efficient Inference for Distributions on Permutations
Author: Jonathan Huang, Carlos Guestrin, Leonidas Guibas
Abstract: Permutations are ubiquitous in many real world problems, such as voting, rankings and data association. Representing uncertainty over permutations is challenging, since there are n! possibilities, and typical compact representations such as graphical models cannot efficiently capture the mutual exclusivity constraints associated with permutations. In this paper, we use the “low-frequency” terms of a Fourier decomposition to represent such distributions compactly. We present Kronecker conditioning, a general and efficient approach for maintaining these distributions directly in the Fourier domain. Low order Fourier-based approximations can lead to functions that do not correspond to valid distributions. To address this problem, we present an efficient quadratic program defined directly in the Fourier domain to project the approximation onto a relaxed form of the marginal polytope. We demonstrate the effectiveness of our approach on a real camera-based multi-people tracking setting. 1
3 0.83817643 135 nips-2007-Multi-task Gaussian Process Prediction
Author: Edwin V. Bonilla, Kian M. Chai, Christopher Williams
Abstract: In this paper we investigate multi-task learning in the context of Gaussian Processes (GP). We propose a model that learns a shared covariance function on input-dependent features and a “free-form” covariance matrix over tasks. This allows for good flexibility when modelling inter-task dependencies while avoiding the need for large amounts of data for training. We show that under the assumption of noise-free observations and a block design, predictions for a given task only depend on its target values and therefore a cancellation of inter-task transfer occurs. We evaluate the benefits of our model on two practical applications: a compiler performance prediction problem and an exam score prediction task. Additionally, we make use of GP approximations and properties of our model in order to provide scalability to large data sets. 1
4 0.78816944 112 nips-2007-Learning Monotonic Transformations for Classification
Author: Andrew Howard, Tony Jebara
Abstract: A discriminative method is proposed for learning monotonic transformations of the training data while jointly estimating a large-margin classifier. In many domains such as document classification, image histogram classification and gene microarray experiments, fixed monotonic transformations can be useful as a preprocessing step. However, most classifiers only explore these transformations through manual trial and error or via prior domain knowledge. The proposed method learns monotonic transformations automatically while training a large-margin classifier without any prior knowledge of the domain. A monotonic piecewise linear function is learned which transforms data for subsequent processing by a linear hyperplane classifier. Two algorithmic implementations of the method are formalized. The first solves a convergent alternating sequence of quadratic and linear programs until it obtains a locally optimal solution. An improved algorithm is then derived using a convex semidefinite relaxation that overcomes initialization issues in the greedy optimization problem. The effectiveness of these learned transformations on synthetic problems, text data and image data is demonstrated. 1
5 0.57295424 141 nips-2007-New Outer Bounds on the Marginal Polytope
Author: David Sontag, Tommi S. Jaakkola
Abstract: We give a new class of outer bounds on the marginal polytope, and propose a cutting-plane algorithm for efficiently optimizing over these constraints. When combined with a concave upper bound on the entropy, this gives a new variational inference algorithm for probabilistic inference in discrete Markov Random Fields (MRFs). Valid constraints on the marginal polytope are derived through a series of projections onto the cut polytope. As a result, we obtain tighter upper bounds on the log-partition function. We also show empirically that the approximations of the marginals are significantly more accurate when using the tighter outer bounds. Finally, we demonstrate the advantage of the new constraints for finding the MAP assignment in protein structure prediction. 1
6 0.52586997 105 nips-2007-Infinite State Bayes-Nets for Structured Domains
7 0.51429874 23 nips-2007-An Analysis of Convex Relaxations for MAP Estimation
8 0.5092327 63 nips-2007-Convex Relaxations of Latent Variable Training
9 0.50484383 134 nips-2007-Multi-Task Learning via Conic Programming
10 0.50079274 4 nips-2007-A Constraint Generation Approach to Learning Stable Linear Dynamical Systems
11 0.49976164 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning
12 0.49921077 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
13 0.4982309 97 nips-2007-Hidden Common Cause Relations in Relational Learning
14 0.49475616 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs
15 0.4896411 120 nips-2007-Linear programming analysis of loopy belief propagation for weighted matching
16 0.48687962 2 nips-2007-A Bayesian LDA-based model for semi-supervised part-of-speech tagging
17 0.48541233 207 nips-2007-Transfer Learning using Kolmogorov Complexity: Basic Theory and Empirical Evaluations
18 0.48532227 48 nips-2007-Collective Inference on Markov Models for Modeling Bird Migration
19 0.4826895 73 nips-2007-Distributed Inference for Latent Dirichlet Allocation
20 0.48162812 156 nips-2007-Predictive Matrix-Variate t Models