nips nips2012 nips2012-111 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Maksims Volkovs, Richard S. Zemel
Abstract: Bipartite matching problems characterize many situations, ranging from ranking in information retrieval to correspondence in vision. Exact inference in real-world applications of these problems is intractable, making efficient approximation methods essential for learning and inference. In this paper we propose a novel sequential matching sampler based on a generalization of the Plackett-Luce model, which can effectively make large moves in the space of matchings. This allows the sampler to match the difficult target distributions common in these problems: highly multimodal distributions with well separated modes. We present experimental results with bipartite matching problems—ranking and image correspondence—which show that the sequential matching sampler efficiently approximates the target distribution, significantly outperforming other sampling approaches.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Bipartite matching problems characterize many situations, ranging from ranking in information retrieval to correspondence in vision. [sent-7, score-0.435]
2 In this paper we propose a novel sequential matching sampler based on a generalization of the Plackett-Luce model, which can effectively make large moves in the space of matchings. [sent-9, score-0.589]
3 This allows the sampler to match the difficult target distributions common in these problems: highly multimodal distributions with well separated modes. [sent-10, score-0.51]
4 We present experimental results with bipartite matching problems—ranking and image correspondence—which show that the sequential matching sampler efficiently approximates the target distribution, significantly outperforming other sampling approaches. [sent-11, score-1.083]
5 1 Introduction Bipartite matching problems (BMPs), which involve mapping one set of items to another, are ubiquitous, with applications ranging from computational biology to information retrieval to computer vision. [sent-12, score-0.636]
6 The features for any two items do not provide a natural measure of compatibility between the items, i.e. [sent-15, score-0.326]
7 Consequently the goal of learning is to create a mapping from the item features to the target matches such that when an unlabeled instance is presented the same mapping can be applied to accurately infer the matches. [sent-18, score-0.294]
8 Recently there has been a flurry of new sampling methods for bipartite matching problems. [sent-26, score-0.428]
9 However, to the best of our knowledge, even for simple versions of bipartite matching problems, no efficient sampler exists. [sent-28, score-0.631]
10 We compare the efficiency and performance of our sampler to others on two applications. [sent-30, score-0.25]
11 2 Problem Formulation A standard BMP consists of two sets of N items, U = {u1 , . . . , uN } and V = {v1 , . . . , vN }. [sent-31, score-0.298]
12 The goal is to find an assignment of the items so that every item in U is matched to exactly one item in V and no two items share the same match. [sent-38, score-1.171]
13 In this problem an assignment corresponds to a permutation π, where π is a bijection from {1, . . . , N } to {1, . . . , N }, [sent-39, score-0.366]
14 mapping each item in U to its match in V ; we use the terms assignment and permutation interchangeably. [sent-45, score-0.637]
15 We define π(i) = j to denote the index of a match vπ(i) = vj for item ui in π and use π −1 (j) = i to denote the reverse. [sent-46, score-0.663]
16 Permutations have the useful property that any subset of the permutation also constitutes a valid permutation with respect to the items in the subset. [sent-47, score-0.778]
17 Given a full permutation π we define π1:t (π1:0 = ∅) as a partial permutation of only the first t items in U . [sent-49, score-0.759]
18 The energy of a given assignment is typically formulated as a combination of ranks and the model’s output from the query-document features. [sent-55, score-0.416]
19 For example, in [12] the energy is defined as: E(π, θ) = −(1/N) Σ_{i=1}^{N} θi (N − π(i) + 1) (2), where θi is a score assigned by the model to ui. [sent-56, score-0.278]
20 For example, in [17] the energy is given by: E(π, θ) = (1/|ψ|) Σ_{i=1}^{N} ⟨θ, (ψi^u − ψπ(i)^v)^2⟩ (3), where ψi^u and ψπ(i)^v are feature descriptors for points ui and vπ(i). [sent-59, score-0.307]
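To make the additive form concrete, here is a minimal NumPy sketch of both energies; the array layout, the 1-indexed rank convention in Equation 2, and reading |ψ| as the descriptor dimension in Equation 3 are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def ranking_energy(pi, theta):
    # Equation 2: E(pi, theta) = -(1/N) * sum_i theta_i * (N - pi(i) + 1),
    # where pi[i] is the (1-indexed) rank assigned to item u_i and
    # theta[i] is the model's score for u_i; pi and theta are 1-D arrays.
    N = len(pi)
    return -np.dot(theta, N - pi + 1) / N

def matching_energy(pi, theta, psi_u, psi_v):
    # Equation 3: E(pi, theta) = (1/|psi|) * sum_i <theta, (psi_u_i - psi_v_pi(i))^2>,
    # with pi 0-indexed here and |psi| taken to be the descriptor dimension.
    sq_diffs = (psi_u - psi_v[pi]) ** 2        # (N, d) squared descriptor differences
    return sq_diffs.dot(theta).sum() / psi_u.shape[1]
```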
21 It is important to note here that for all models where the energy is additive we can compute the energy E(π1:t , θ) for any partial permutation π1:t by summing the potentials only over the t assignments in π1:t . [sent-61, score-0.74]
22 A particular instance of BMP that has been studied extensively is the maximum weight bipartite matching problem (WBMP). [sent-69, score-0.381]
23 Finding the assignment with the maximum energy is tractable and can be solved in O(N^3) [16]. [sent-71, score-0.343]
24 The majority of the proposed samplers are designed for WBMPs and cannot be applied to the more general BMPs where the energy includes higher order potentials. [sent-73, score-0.315]
25 There is thus an evident need to develop an effective sampler applicable to any BMP distribution. [sent-75, score-0.25]
26 3 Related Approaches In this section we briefly describe existing sampling approaches, some of which have been developed specifically for bipartite matching problems while others come from matrix permanent research. [sent-76, score-0.573]
27 To do that we start with some initial assignment π and consider a subset of items in U ; for illustration purposes we will use two items ui and uj . [sent-79, score-0.882]
28 Given the selected subset of items the Gibbs sampler considers all possible assignment swaps within this subset. [sent-80, score-0.804]
29 In our example there are only two possibilities: leave π unchanged or swap π(i) with π(j) to produce a new permutation π′. [sent-81, score-0.276]
30 Conditioned on the assignment of all the other items in U that were not selected, the probability of each permutation is: p(π′ | π\{i,j}) = exp(−E(π′, θ)) / (exp(−E(π, θ)) + exp(−E(π′, θ))) and p(π | π\{i,j}) = 1 − p(π′ | π\{i,j}), where π\{i,j} is permutation π with ui and uj removed. [sent-82, score-0.986]
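A minimal sketch of this pairwise Gibbs move, assuming an `energy` callable implementing E(π, θ) for a full permutation and a NumPy random generator (the names are placeholders, not from the paper):

```python
import numpy as np

def gibbs_swap_step(pi, energy, rng):
    # Pick two items at random and resample their matches conditioned on
    # everything else: either keep pi or swap pi(i) and pi(j), with
    # p(swap) = exp(-E(pi')) / (exp(-E(pi)) + exp(-E(pi'))).
    i, j = rng.choice(len(pi), size=2, replace=False)
    pi_swap = pi.copy()
    pi_swap[i], pi_swap[j] = pi[j], pi[i]
    p_swap = 1.0 / (1.0 + np.exp(energy(pi_swap) - energy(pi)))
    return pi_swap if rng.random() < p_swap else pi
```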
31 The main reason for this is that the path from one probable assignment to another using only pairwise swaps is likely to go through regions that have very low probability [5]. [sent-86, score-0.258]
32 This makes it very unlikely that those moves will be accepted, which typically traps the sampler in one mode. [sent-87, score-0.301]
33 Thus, the local structure of the Gibbs sampler is likely to be inadequate for problems of the type considered here, in which several probable assignments will produce well-separated modes. [sent-88, score-0.416]
34 2 Chain-Based Approaches Chain-based methods extend the assignment swap idea behind the Gibbs sampler to generate samples more efficiently from WBMP distributions. [sent-90, score-0.537]
35 Instead of randomly choosing subsets of items to swap, chain-based methods generate a sequence (chain) of interdependent swaps. [sent-91, score-0.333]
36 Given a (random) starting permutation π, an item ui (currently matched with vπ(i) ) is selected at random and a new match vj is proposed with probability p(ui , vj |θ) where p depends on the unary potential φ(ui , vj , θ) in the WBMP energy (see Equation 4). [sent-92, score-1.781]
37 Now, assuming that the match {ui , vj } is selected, matches {ui , vπ(i) } and {uπ−1 (j) , vj } are removed from π and {ui , vj } is added to make π′. [sent-93, score-1.046]
38 After this change uπ−1 (j) and vπ(i) are no longer matched to any item, so π′ is a partial assignment. [sent-94, score-0.325]
39 This chain-like match sampling is repeated either until π′ is a complete assignment or a termination criterion is reached. [sent-96, score-0.317]
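A sketch of this chain of interdependent swaps, under the assumptions that matches are stored as a u-to-v dictionary and that a hypothetical `propose(u, partial)` callable draws vj with probability p(ui, vj | θ) from the unary potentials; this is one possible implementation, not the authors' code.

```python
def chain_flipping_move(pi, propose, rng, max_steps=1000):
    # pi: list where pi[u] is the item of V currently matched to item u of U.
    match = dict(enumerate(pi))
    u = int(rng.choice(len(pi)))          # break one match to start the chain
    del match[u]
    for _ in range(max_steps):
        v = propose(u, match)             # propose a new partner for u
        evicted = [w for w, m in match.items() if m == v]
        match[u] = v                      # {u, v} added; any old owner of v evicted
        if not evicted:                   # the assignment is complete again
            return [match[i] for i in range(len(pi))]
        u = evicted[0]
        del match[u]
    return None                           # run time is non-deterministic; may not finish
```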
40 Dellaert et al. [5] empirically demonstrated that the chain flipping sampler can mix better than the Gibbs sampler when applied to multimodal distributions. [sent-99, score-0.618]
41 First, unlike the Gibbs sampler, which always maintains a valid assignment, the intermediate assignments π′ in chain-based methods are incomplete. [sent-101, score-0.421]
42 This means that the chain either has to be run until a valid assignment is generated [5] or terminated early, producing an incomplete assignment [11]. [sent-102, score-0.528]
43 In the first case the sampler has a non-deterministic run-time, whereas in the second case the incomplete assignment cannot be taken as a valid sample from the model. [sent-103, score-0.493]
44 (Figure 1 caption) Items are U = {u1 , u2 , u3 } and V = {v1 , v2 , v3 }; the reference permutation is σ = {2, 3, 1}. [sent-107, score-0.311]
45 Under this model a permutation π is generated by first selecting item vπ(1) from the set of N items and placing it in the first position, then selecting vπ(2) from the remaining N − 1 items and placing it in the second position, and so on until all N items are placed. [sent-133, score-1.321]
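A small sketch of this sequential generative process, assuming positive Plackett-Luce scores; the function and argument names are illustrative.

```python
import numpy as np

def sample_plackett_luce(scores, rng):
    # Fill positions 1..N in order: at each step, choose one of the
    # remaining items with probability proportional to its score.
    remaining = list(range(len(scores)))
    pi = []
    while remaining:
        w = np.array([scores[j] for j in remaining])
        k = rng.choice(len(remaining), p=w / w.sum())
        pi.append(remaining.pop(k))
    return pi  # pi[t] is the item placed in position t + 1
```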
46 Our approach is based on the observation that the sequential procedure behind the Plackett-Luce model can also be extended to generate matches between item sets. [sent-142, score-0.341]
47 Instead of placing items into ranked positions we can think of the Plackett-Luce generative process as sequentially matching ranks to the items in V , as illustrated in the top row of Figure 1. [sent-143, score-0.933]
48 To generate the permutation π = {3, 1, 2} the Plackett-Luce model first matches rank 1 with vπ(1) = v2 then rank 2 with vπ(2) = v3 and finally rank 3 with vπ(3) = v1 . [sent-144, score-0.511]
49 Unlike ranks, items in U do not have a natural order so we use a reference permutation σ, which specifies the order in which items in U are matched. [sent-146, score-0.907]
50 Formally the sequential matching process proceeds as follows: given some reference permutation σ, we start with an empty assignment π1:0 = ∅. [sent-149, score-0.788]
51 where V \ π1:t−1 = {vj1 , . . . , vjN−t+1 } denotes the set of items not matched in π1:t−1 . [sent-156, score-0.376]
52 Note that, similarly to the Plackett-Luce model, |V \ π1:t−1 | = N − t + 1, so at each iteration uσ(t) will have N − t + 1 leftover items in V \ π1:t−1 to match with. [sent-157, score-0.403]
53 We define the conditional probability of each such match to be p(vj | uσ(t) , π1:t−1 ), with Σ_{vj ∈ V\π1:t−1} p(vj | uσ(t) , π1:t−1 ) = 1. [sent-158, score-0.397]
54 After N iterations the permutation π1:N = π is produced with probability: Q(π|σ) = Π_{t=1}^{N} p(vπ(σ(t)) | uσ(t) , π1:t−1 ) (6), where vπ(σ(t)) is the match for uσ(t) in π. [sent-159, score-0.339]
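A sketch of drawing π ∼ Q(·|σ) while accumulating the product in Equation 6; `cond_logp` is a hypothetical callable returning an unnormalized log match probability for p(vj | uσ(t), π1:t−1), normalized here over the items of V that remain unmatched. Neither the callable nor the bookkeeping is taken from the paper.

```python
import numpy as np

def sample_from_Q(sigma, cond_logp, rng):
    N = len(sigma)
    pi = [None] * N                 # pi[u] will hold u's match in V
    unmatched = list(range(N))      # items of V not yet matched
    q = 1.0                         # running product of Equation 6
    for t in range(N):
        u = sigma[t]
        logits = np.array([cond_logp(u, v, pi) for v in unmatched])
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        k = rng.choice(len(unmatched), p=probs)
        q *= probs[k]
        pi[u] = unmatched.pop(k)
    return pi, q                    # q == Q(pi | sigma)
```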
55 The conditional match probabilities depend both on the current item uσ(t) and on the partial assignment π1:t−1 . [sent-160, score-0.558]
56 Introducing this dependency generalizes the Plackett-Luce model, which only takes into account that the items in π1:t−1 are already matched but not how they are matched. [sent-161, score-0.697]
57 This dependency becomes very important when the energy contains pairwise and/or higher order potentials as it allows us to compute the change in energy for each new match, in turn allowing for close approximations to the target BMP distribution. [sent-162, score-0.516]
58 The important consequence of this proposition is that it allows us to work with a very rich class of matching probabilities with arbitrary dependencies and still obtain a valid distribution over assignments, with a simple way to generate exact samples from it. [sent-164, score-0.564]
59 1 Proposal Distribution Given the general matching probabilities the goal is to define them so that the resulting proposal distribution Q matches the target distribution as closely as possible. [sent-168, score-0.586]
60 The partial energy ignores all the items that are not matched in π1:t and thus provides an estimate of the "current" energy at each iteration t. [sent-170, score-0.791]
61 Using partial energies we can also find the changes in energy when a given item is matched. [sent-171, score-0.403]
62 However, in this form we see that p(vj | uσ(t) , π1:t−1 ) is directly related to the change in the partial energy. (Footnote 1: The proof is in the supplementary material.) [sent-175, score-0.259]
63 Thus, the matching choices will be made solely based on the changes in the partial energy. [sent-177, score-0.296]
64 × = ∗ ∗ Z1 (uσ(1) , π1:0 ) ZN (uσ(N ) , π1:N −1 ) Z ∗ (π, σ) Here Z ∗ (π, σ) is the normalization factor which depends both on the reference permutation σ and the generated assignment π. [sent-181, score-0.476]
65 The numerator remains the exponent of the energy but the denominator is no longer a constant; rather it is a function which depends on the generated assignment and the reference permutation. [sent-183, score-0.453]
66 Note that the proposal distribution defined above can be used to generate samples for any target distribution with arbitrary energy consisting of single and/or higher order potentials. [sent-184, score-0.49]
67 To the best of our knowledge, aside from the Gibbs sampler, this is the only sampling procedure that can be applied to arbitrary BMP distributions. [sent-185, score-0.297]
68 To achieve this effect with the sequential matching model we introduce an additional parameter ρ, which we refer to as the temperature: p(vj |uσ(t) , π1:t−1 , ρ) ∝ exp(−E(H(vj , uσ(t) , π1:t−1 ), θ)/ρ). [sent-188, score-0.312]
69 Decreasing ρ leads to sharp proposal distributions, typically highly skewed towards one specific assignment, while increasing ρ makes the proposal distribution approach the uniform distribution. [sent-189, score-0.306]
70 To ensure that the SM sampler converges to the required distribution we demonstrate that it satisfies the three requisite properties: detailed balance, ergodicity, and aperiodicity [15]. [sent-191, score-0.318]
71 3 Reference Permutation Fixing the reference permutation σ yields a state independent sampler. [sent-197, score-0.311]
72 However, for the general energy based distributions considered here finding the MAP state can be very expensive and in many cases intractable. [sent-199, score-0.575]
73 Moreover, even if MAP can be found efficiently there is still no guarantee that using it as the reference permutation will lead to a good sampler. [sent-200, score-0.603]
74 To avoid these problems we use a state dependent sampler where the reference permutation σ is updated every time a sample gets accepted. [sent-201, score-0.386]
75 In the matching example (bottom row of Figure 1) if the new match at t = 3 is accepted then σ would be updated to {3, 1, 2}. [sent-202, score-0.363]
Algorithm 1 Sequential Matching (SM):
Initialize π1:0 = ∅.
for t = 1 to N do {generate sample from Q(·|σ)}: find a match vj for uσ(t) using p(vj | uσ(t) , π1:t−1 , ρ); add {uσ(t) , vj } to π1:t−1 to get π1:t. end for.
Calculate forward probability: Q(π|σ) = Π_{t=1}^{N} p(vπ(σ(t)) | uσ(t) , π1:t−1 , ρ).
Calculate backward probability: Q(σ|π) = Π_{t=1}^{N} p(vσ(π(t)) | uπ(t) , σ1:t−1 , ρ).
if Uniform(0, 1) < [exp(−E(π, θ)) Q(σ|π)] / [exp(−E(σ, θ)) Q(π|σ)] then σ ← π. end if.
76 Algorithm 1 summarizes the Metropolis-Hastings procedure for the state dependent sequential matching sampler. [sent-204, score-0.312]
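A sketch of one Metropolis-Hastings step of Algorithm 1, reusing the hypothetical `sample_from_Q` sketch above and assuming a `Q_prob(a, b)` helper that evaluates the backward probability Q(a|b) by replaying Equation 6; neither helper is the authors' implementation.

```python
import numpy as np

def sm_mh_step(sigma, energy, cond_logp, Q_prob, rng):
    # Propose pi ~ Q(.|sigma), then accept with probability
    # min(1, exp(-E(pi)) Q(sigma|pi) / (exp(-E(sigma)) Q(pi|sigma))).
    pi, q_forward = sample_from_Q(sigma, cond_logp, rng)
    q_backward = Q_prob(sigma, pi)              # Q(sigma | pi)
    log_ratio = (-energy(pi) + np.log(q_backward)
                 - (-energy(sigma) + np.log(q_forward)))
    if np.log(rng.random()) < log_ratio:
        return pi                               # accepted: pi is the new reference
    return sigma
```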
77 5 Experiments To test the sequential matching sampling approach we conducted extensive experiments. [sent-205, score-0.359]
78 We considered document ranking and image matching, two popular applications of BMP; and for the sake of … (Table 1 caption: Average Hellinger distances for learning to rank (left half) and image matching (right half) problems.) [sent-206, score-0.449]
79 For N = 50 we were unable to get a single sample from the RP sampler for any c in the allocated time limit (over 5 minutes). [sent-209, score-0.335]
80 When comparing the samplers we concentrated on evaluating how well the Monte Carlo estimates of probabilities produced by the samplers approximate the true distribution P . [sent-325, score-0.351]
81 When target probabilities are known this method of evaluation provides a good estimate of performance since the ultimate goal of any sampler is to approximate P as closely as possible. [sent-326, score-0.376]
82 Computing D exactly quickly becomes intractable as the number of items grows. [sent-332, score-0.298]
83 To overcome this problem we note that if a given permutation π is not generated by any of the samplers then the term P (π)Q(π) is 0 and does not affect the resulting estimate of D for any sampler. [sent-333, score-0.317]
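Assuming the standard Hellinger form D(P, Q) = sqrt(1 − Σπ sqrt(P(π) Q(π))), this observation yields the following restricted estimate; the dictionary representation and names are illustrative assumptions.

```python
import numpy as np

def hellinger_estimate(P, Q_hat, sampled):
    # P: true probabilities; Q_hat: Monte Carlo estimates; both map
    # permutation tuples to probabilities.  Permutations never generated
    # by any sampler contribute sqrt(P * Q) = 0, so summing over `sampled`
    # (the union of all samplers' outputs) is enough.
    bc = sum(np.sqrt(P.get(pi, 0.0) * Q_hat.get(pi, 0.0)) for pi in sampled)
    return np.sqrt(max(0.0, 1.0 - bc))
```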
84 Since any valid sampler will eventually produce samples from the target distribution, we tested the methods with short chain lengths. [sent-341, score-0.519]
85 Furthermore, to make comparisons fair we used the block GB sampler with a block size of 7 (the largest computationally feasible size) as the reference point. [sent-344, score-0.36]
86 For each query the distribution over assignments was parametrized by the energy given in Equation 2. [sent-353, score-0.325]
87 From the table it is seen that all the samplers perform equally well when the number of items is small (N = 8). [sent-359, score-0.414]
88 However, as the number of items increases SM significantly outperforms all other samplers. [sent-360, score-0.298]
89 For N = 50 we were unable to get a single sample from the RP sampler after running it for over 5 minutes. [sent-362, score-0.291]
90 Consequently the total rejection probability increases linearly with the number of items N . [sent-367, score-0.325]
91 2 Image Matching For an image matching task we followed the framework of Petterson et al. [sent-372, score-0.267]
92 The target distribution over matchings was parametrized by the energy given by Equation 3 where ψ’s are the SIFT feature descriptors. [sent-377, score-0.295]
93 This is the likely cause of the poor performance of the GB and CF samplers as both samplers propose new assignments through local moves. [sent-392, score-0.325]
94 As in the learning to rank experiments, we found the rejection rate for the RP sampler to increase significantly for N ≥ 25. [sent-393, score-0.347]
95 We were unable to obtain any samples in the allocated time (over 5 mins) from the RP sampler for N = 50. [sent-394, score-0.37]
96 6 Conclusion In this paper we introduced a new sampling approach for bipartite matching problems based on a generalization of the Plackett-Luce model. [sent-396, score-0.454]
97 In this approach the matching probabilities at each stage are conditioned on the partial assignment made up to that point. [sent-397, score-0.524]
98 We also plan to investigate the relationship between the proposal distribution produced by sequential matching and the target one. [sent-401, score-0.543]
99 Protein structure comparison using bipartite graph matching and its application to protein structure classification. [sent-561, score-0.417]
100 A bipartite graph matching framework for finding correspondences between structural elements in two proteins. [sent-579, score-0.381]
wordName wordTfidf (topN-words)
[('bmp', 0.337), ('items', 0.298), ('vj', 0.292), ('sampler', 0.25), ('wbmp', 0.247), ('matching', 0.237), ('permutation', 0.201), ('energy', 0.178), ('item', 0.166), ('assignment', 0.165), ('sm', 0.155), ('bipartite', 0.144), ('hellinger', 0.139), ('gb', 0.137), ('rp', 0.126), ('permanent', 0.119), ('samplers', 0.116), ('proposal', 0.112), ('cf', 0.112), ('reference', 0.11), ('match', 0.105), ('ui', 0.1), ('assignments', 0.093), ('matched', 0.078), ('valid', 0.078), ('gibbs', 0.076), ('exp', 0.075), ('sequential', 0.075), ('chain', 0.07), ('rank', 0.07), ('swaps', 0.069), ('bmps', 0.067), ('matches', 0.065), ('target', 0.063), ('probabilities', 0.063), ('ranking', 0.06), ('partial', 0.059), ('zt', 0.056), ('unary', 0.055), ('ipping', 0.053), ('partitioning', 0.052), ('swap', 0.052), ('ranks', 0.049), ('sampling', 0.047), ('aperiodicity', 0.045), ('dellaert', 0.045), ('insertion', 0.045), ('ramos', 0.045), ('allocated', 0.044), ('unable', 0.041), ('recursive', 0.041), ('retrieval', 0.041), ('modes', 0.04), ('volkovs', 0.04), ('acceptance', 0.037), ('correspondence', 0.037), ('protein', 0.036), ('temperature', 0.036), ('samples', 0.035), ('distributions', 0.035), ('generate', 0.035), ('ranging', 0.034), ('petterson', 0.034), ('produced', 0.033), ('caetano', 0.033), ('parametrized', 0.031), ('potentials', 0.031), ('placing', 0.03), ('image', 0.03), ('descriptors', 0.029), ('compatibility', 0.028), ('ergodicity', 0.028), ('toronto', 0.028), ('half', 0.028), ('rejection', 0.027), ('moves', 0.027), ('run', 0.027), ('problems', 0.026), ('mix', 0.026), ('combinatorial', 0.024), ('typically', 0.024), ('yahoo', 0.024), ('vn', 0.024), ('zemel', 0.024), ('aggregation', 0.024), ('probable', 0.024), ('accepted', 0.024), ('distribution', 0.023), ('dependency', 0.023), ('equation', 0.023), ('produce', 0.023), ('mcmc', 0.022), ('distances', 0.022), ('selected', 0.022), ('change', 0.022), ('multimodal', 0.022), ('higher', 0.021), ('row', 0.021), ('robotics', 0.021), ('uj', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999946 111 nips-2012-Efficient Sampling for Bipartite Matching Problems
Author: Maksims Volkovs, Richard S. Zemel
Abstract: Bipartite matching problems characterize many situations, ranging from ranking in information retrieval to correspondence in vision. Exact inference in real-world applications of these problems is intractable, making efficient approximation methods essential for learning and inference. In this paper we propose a novel sequential matching sampler based on a generalization of the Plackett-Luce model, which can effectively make large moves in the space of matchings. This allows the sampler to match the difficult target distributions common in these problems: highly multimodal distributions with well separated modes. We present experimental results with bipartite matching problems—ranking and image correspondence—which show that the sequential matching sampler efficiently approximates the target distribution, significantly outperforming other sampling approaches.
2 0.20044534 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback
Author: Andriy Mnih, Yee W. Teh
Abstract: User preferences for items can be inferred from either explicit feedback, such as item ratings, or implicit feedback, such as rental histories. Research in collaborative filtering has concentrated on explicit feedback, resulting in the development of accurate and scalable models. However, since explicit feedback is often difficult to collect it is important to develop effective models that take advantage of the more widely available implicit feedback. We introduce a probabilistic approach to collaborative filtering with implicit feedback based on modelling the user’s item selection process. In the interests of scalability, we restrict our attention to treestructured distributions over items and develop a principled and efficient algorithm for learning item trees from data. We also identify a problem with a widely used protocol for evaluating implicit feedback models and propose a way of addressing it using a small quantity of explicit feedback data. 1
3 0.18126838 107 nips-2012-Effective Split-Merge Monte Carlo Methods for Nonparametric Models of Sequential Data
Author: Michael C. Hughes, Erik B. Sudderth, Emily B. Fox
Abstract: Applications of Bayesian nonparametric methods require learning and inference algorithms which efficiently explore models of unbounded complexity. We develop new Markov chain Monte Carlo methods for the beta process hidden Markov model (BP-HMM), enabling discovery of shared activity patterns in large video and motion capture databases. By introducing split-merge moves based on sequential allocation, we allow large global changes in the shared feature structure. We also develop data-driven reversible jump moves which more reliably discover rare or unique behaviors. Our proposals apply to any choice of conjugate likelihood for observed data, and we show success with multinomial, Gaussian, and autoregressive emission models. Together, these innovations allow tractable analysis of hundreds of time series, where previous inference required clever initialization and lengthy burn-in periods for just six sequences. 1
4 0.15809371 60 nips-2012-Bayesian nonparametric models for ranked data
Author: Francois Caron, Yee W. Teh
Abstract: We develop a Bayesian nonparametric extension of the popular Plackett-Luce choice model that can handle an infinite number of choice items. Our framework is based on the theory of random atomic measures, with the prior specified by a gamma process. We derive a posterior characterization and a simple and effective Gibbs sampler for posterior simulation. We develop a time-varying extension of our model, and apply it to the New York Times lists of weekly bestselling books. 1
5 0.15750103 165 nips-2012-Iterative ranking from pair-wise comparisons
Author: Sahand Negahban, Sewoong Oh, Devavrat Shah
Abstract: The question of aggregating pairwise comparisons to obtain a global ranking over a collection of objects has been of interest for a very long time: be it ranking of online gamers (e.g. MSR’s TrueSkill system) and chess players, aggregating social opinions, or deciding which product to sell based on transactions. In most settings, in addition to obtaining ranking, finding ‘scores’ for each object (e.g. player’s rating) is of interest to understanding the intensity of the preferences. In this paper, we propose a novel iterative rank aggregation algorithm for discovering scores for objects from pairwise comparisons. The algorithm has a natural random walk interpretation over the graph of objects with edges present between two objects if they are compared; the scores turn out to be the stationary probability of this random walk. The algorithm is model independent. To establish the efficacy of our method, however, we consider the popular Bradley-Terry-Luce (BTL) model in which each object has an associated score which determines the probabilistic outcomes of pairwise comparisons between objects. We bound the finite sample error rates between the scores assumed by the BTL model and those estimated by our algorithm. This, in essence, leads to order-optimal dependence on the number of samples required to learn the scores well by our algorithm. Indeed, the experimental evaluation shows that our (model independent) algorithm performs as well as the Maximum Likelihood Estimator of the BTL model and outperforms a recently proposed algorithm by Ammar and Shah [1]. 1
6 0.15095222 75 nips-2012-Collaborative Ranking With 17 Parameters
7 0.12886012 126 nips-2012-FastEx: Hash Clustering with Exponential Families
8 0.1239037 49 nips-2012-Automatic Feature Induction for Stagewise Collaborative Filtering
9 0.11997414 30 nips-2012-Accuracy at the Top
10 0.11706315 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo
11 0.11385672 349 nips-2012-Training sparse natural image models with a fast Gibbs sampler of an extended state space
12 0.11185487 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction
13 0.1102744 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models
14 0.10839583 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function
15 0.098345503 41 nips-2012-Ancestor Sampling for Particle Gibbs
16 0.094267823 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction
17 0.093233608 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models
18 0.091293335 278 nips-2012-Probabilistic n-Choose-k Models for Classification and Ranking
19 0.079220161 205 nips-2012-MCMC for continuous-time discrete-state systems
20 0.075783253 280 nips-2012-Proper losses for learning from partial labels
topicId topicWeight
[(0, 0.207), (1, 0.048), (2, -0.025), (3, -0.032), (4, -0.13), (5, -0.127), (6, -0.009), (7, 0.089), (8, 0.101), (9, 0.209), (10, -0.161), (11, 0.033), (12, -0.157), (13, 0.004), (14, 0.015), (15, -0.078), (16, -0.038), (17, -0.104), (18, 0.042), (19, -0.006), (20, 0.15), (21, -0.058), (22, -0.069), (23, -0.062), (24, -0.081), (25, 0.019), (26, -0.061), (27, 0.027), (28, 0.036), (29, -0.007), (30, -0.053), (31, 0.098), (32, -0.024), (33, 0.005), (34, 0.014), (35, 0.024), (36, -0.104), (37, 0.013), (38, -0.044), (39, -0.04), (40, 0.097), (41, -0.033), (42, -0.154), (43, -0.047), (44, 0.033), (45, -0.017), (46, -0.013), (47, 0.019), (48, 0.015), (49, -0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.95554644 111 nips-2012-Efficient Sampling for Bipartite Matching Problems
Author: Maksims Volkovs, Richard S. Zemel
Abstract: Bipartite matching problems characterize many situations, ranging from ranking in information retrieval to correspondence in vision. Exact inference in real-world applications of these problems is intractable, making efficient approximation methods essential for learning and inference. In this paper we propose a novel sequential matching sampler based on a generalization of the Plackett-Luce model, which can effectively make large moves in the space of matchings. This allows the sampler to match the difficult target distributions common in these problems: highly multimodal distributions with well separated modes. We present experimental results with bipartite matching problems—ranking and image correspondence—which show that the sequential matching sampler efficiently approximates the target distribution, significantly outperforming other sampling approaches.
2 0.74137032 107 nips-2012-Effective Split-Merge Monte Carlo Methods for Nonparametric Models of Sequential Data
Author: Michael C. Hughes, Erik B. Sudderth, Emily B. Fox
Abstract: Applications of Bayesian nonparametric methods require learning and inference algorithms which efficiently explore models of unbounded complexity. We develop new Markov chain Monte Carlo methods for the beta process hidden Markov model (BP-HMM), enabling discovery of shared activity patterns in large video and motion capture databases. By introducing split-merge moves based on sequential allocation, we allow large global changes in the shared feature structure. We also develop data-driven reversible jump moves which more reliably discover rare or unique behaviors. Our proposals apply to any choice of conjugate likelihood for observed data, and we show success with multinomial, Gaussian, and autoregressive emission models. Together, these innovations allow tractable analysis of hundreds of time series, where previous inference required clever initialization and lengthy burn-in periods for just six sequences. 1
3 0.66935289 165 nips-2012-Iterative ranking from pair-wise comparisons
Author: Sahand Negahban, Sewoong Oh, Devavrat Shah
Abstract: The question of aggregating pairwise comparisons to obtain a global ranking over a collection of objects has been of interest for a very long time: be it ranking of online gamers (e.g. MSR’s TrueSkill system) and chess players, aggregating social opinions, or deciding which product to sell based on transactions. In most settings, in addition to obtaining ranking, finding ‘scores’ for each object (e.g. player’s rating) is of interest to understanding the intensity of the preferences. In this paper, we propose a novel iterative rank aggregation algorithm for discovering scores for objects from pairwise comparisons. The algorithm has a natural random walk interpretation over the graph of objects with edges present between two objects if they are compared; the scores turn out to be the stationary probability of this random walk. The algorithm is model independent. To establish the efficacy of our method, however, we consider the popular Bradley-Terry-Luce (BTL) model in which each object has an associated score which determines the probabilistic outcomes of pairwise comparisons between objects. We bound the finite sample error rates between the scores assumed by the BTL model and those estimated by our algorithm. This, in essence, leads to order-optimal dependence on the number of samples required to learn the scores well by our algorithm. Indeed, the experimental evaluation shows that our (model independent) algorithm performs as well as the Maximum Likelihood Estimator of the BTL model and outperforms a recently proposed algorithm by Ammar and Shah [1]. 1
4 0.66681373 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback
Author: Andriy Mnih, Yee W. Teh
Abstract: User preferences for items can be inferred from either explicit feedback, such as item ratings, or implicit feedback, such as rental histories. Research in collaborative filtering has concentrated on explicit feedback, resulting in the development of accurate and scalable models. However, since explicit feedback is often difficult to collect it is important to develop effective models that take advantage of the more widely available implicit feedback. We introduce a probabilistic approach to collaborative filtering with implicit feedback based on modelling the user’s item selection process. In the interests of scalability, we restrict our attention to treestructured distributions over items and develop a principled and efficient algorithm for learning item trees from data. We also identify a problem with a widely used protocol for evaluating implicit feedback models and propose a way of addressing it using a small quantity of explicit feedback data. 1
5 0.63088107 49 nips-2012-Automatic Feature Induction for Stagewise Collaborative Filtering
Author: Joonseok Lee, Mingxuan Sun, Guy Lebanon, Seung-jean Kim
Abstract: Recent approaches to collaborative filtering have concentrated on estimating an algebraic or statistical model, and using the model for predicting missing ratings. In this paper we observe that different models have relative advantages in different regions of the input space. This motivates our approach of using stagewise linear combinations of collaborative filtering algorithms, with non-constant combination coefficients based on kernel smoothing. The resulting stagewise model is computationally scalable and outperforms a wide selection of state-of-the-art collaborative filtering algorithms. 1
6 0.63072646 75 nips-2012-Collaborative Ranking With 17 Parameters
7 0.58053142 30 nips-2012-Accuracy at the Top
8 0.55531055 60 nips-2012-Bayesian nonparametric models for ranked data
9 0.54852664 349 nips-2012-Training sparse natural image models with a fast Gibbs sampler of an extended state space
10 0.54489583 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo
11 0.51389921 205 nips-2012-MCMC for continuous-time discrete-state systems
12 0.5026772 189 nips-2012-Learning from the Wisdom of Crowds by Minimax Entropy
13 0.48526067 251 nips-2012-On Lifting the Gibbs Sampling Algorithm
14 0.48237506 52 nips-2012-Bayesian Nonparametric Modeling of Suicide Attempts
15 0.47211993 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction
16 0.46927628 136 nips-2012-Forward-Backward Activation Algorithm for Hierarchical Hidden Markov Models
17 0.46522292 41 nips-2012-Ancestor Sampling for Particle Gibbs
18 0.46016324 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models
19 0.45635805 155 nips-2012-Human memory search as a random walk in a semantic network
20 0.44755888 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction
topicId topicWeight
[(0, 0.037), (21, 0.024), (38, 0.161), (39, 0.013), (42, 0.016), (54, 0.027), (55, 0.016), (74, 0.1), (76, 0.191), (80, 0.103), (85, 0.163), (92, 0.072)]
simIndex simValue paperId paperTitle
1 0.91644514 113 nips-2012-Efficient and direct estimation of a neural subunit model for sensory coding
Author: Brett Vintch, Andrew Zaharia, J. Anthony Movshon, Eero P. Simoncelli
Abstract: Many visual and auditory neurons have response properties that are well explained by pooling the rectified responses of a set of spatially shifted linear filters. These filters cannot be estimated using spike-triggered averaging (STA). Subspace methods such as spike-triggered covariance (STC) can recover multiple filters, but require substantial amounts of data, and recover an orthogonal basis for the subspace in which the filters reside rather than the filters themselves. Here, we assume a linear-nonlinear–linear-nonlinear (LN-LN) cascade model in which the first linear stage is a set of shifted (‘convolutional’) copies of a common filter, and the first nonlinear stage consists of rectifying scalar nonlinearities that are identical for all filter outputs. We refer to these initial LN elements as the ‘subunits’ of the receptive field. The second linear stage then computes a weighted sum of the responses of the rectified subunits. We present a method for directly fitting this model to spike data, and apply it to both simulated and real neuronal data from primate V1. The subunit model significantly outperforms STA and STC in terms of cross-validated accuracy and efficiency. 1
same-paper 2 0.90615773 111 nips-2012-Efficient Sampling for Bipartite Matching Problems
Author: Maksims Volkovs, Richard S. Zemel
Abstract: Bipartite matching problems characterize many situations, ranging from ranking in information retrieval to correspondence in vision. Exact inference in real-world applications of these problems is intractable, making efficient approximation methods essential for learning and inference. In this paper we propose a novel sequential matching sampler based on a generalization of the Plackett-Luce model, which can effectively make large moves in the space of matchings. This allows the sampler to match the difficult target distributions common in these problems: highly multimodal distributions with well separated modes. We present experimental results with bipartite matching problems—ranking and image correspondence—which show that the sequential matching sampler efficiently approximates the target distribution, significantly outperforming other sampling approaches.
3 0.86588353 260 nips-2012-Online Sum-Product Computation Over Trees
Author: Mark Herbster, Stephen Pasteris, Fabio Vitale
Abstract: We consider the problem of performing efficient sum-product computations in an online setting over a tree. A natural application of our methods is to compute the marginal distribution at a vertex in a tree-structured Markov random field. Belief propagation can be used to solve this problem, but requires time linear in the size of the tree, and is therefore too slow in an online setting where we are continuously receiving new data and computing individual marginals. With our method we aim to update the data and compute marginals in time that is no more than logarithmic in the size of the tree, and is often significantly less. We accomplish this via a hierarchical covering structure that caches previous local sum-product computations. Our contribution is three-fold: we i) give a linear time algorithm to find an optimal hierarchical cover of a tree; ii) give a sum-productlike algorithm to efficiently compute marginals with respect to this cover; and iii) apply “i” and “ii” to find an efficient algorithm with a regret bound for the online allocation problem in a multi-task setting. 1
4 0.86063844 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration
Author: Vasiliy Karasev, Alessandro Chiuso, Stefano Soatto
Abstract: We describe the tradeoff between the performance in a visual recognition problem and the control authority that the agent can exercise on the sensing process. We focus on the problem of “visual search” of an object in an otherwise known and static scene, propose a measure of control authority, and relate it to the expected risk and its proxy (conditional entropy of the posterior density). We show this analytically, as well as empirically by simulation using the simplest known model that captures the phenomenology of image formation, including scaling and occlusions. We show that a “passive” agent given a training set can provide no guarantees on performance beyond what is afforded by the priors, and that an “omnipotent” agent, capable of infinite control authority, can achieve arbitrarily good performance (asymptotically). In between these limiting cases, the tradeoff can be characterized empirically. 1
5 0.85969198 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation
Author: Ryan Kiros, Csaba Szepesvári
Abstract: The task of image auto-annotation, namely assigning a set of relevant tags to an image, is challenging due to the size and variability of tag vocabularies. Consequently, most existing algorithms focus on tag assignment and fix an often large number of hand-crafted features to describe image characteristics. In this paper we introduce a hierarchical model for learning representations of standard sized color images from the pixel level, removing the need for engineered feature representations and subsequent feature selection for annotation. We benchmark our model on the STL-10 recognition dataset, achieving state-of-the-art performance. When our features are combined with TagProp (Guillaumin et al.), we compete with or outperform existing annotation approaches that use over a dozen distinct handcrafted image descriptors. Furthermore, using 256-bit codes and Hamming distance for training TagProp, we exchange only a small reduction in performance for efficient storage and fast comparisons. Self-taught learning is used in all of our experiments and deeper architectures always outperform shallow ones. 1
6 0.85699987 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization
7 0.85695589 112 nips-2012-Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model
8 0.85575557 168 nips-2012-Kernel Latent SVM for Visual Recognition
9 0.85573727 176 nips-2012-Learning Image Descriptors with the Boosting-Trick
10 0.85476983 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models
11 0.8538028 38 nips-2012-Algorithms for Learning Markov Field Policies
12 0.85243315 149 nips-2012-Hierarchical Optimistic Region Selection driven by Curiosity
13 0.85225368 148 nips-2012-Hamming Distance Metric Learning
14 0.8518855 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models
15 0.85177267 180 nips-2012-Learning Mixtures of Tree Graphical Models
16 0.85152221 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs
17 0.85088003 197 nips-2012-Learning with Recursive Perceptual Representations
18 0.85086679 277 nips-2012-Probabilistic Low-Rank Subspace Clustering
19 0.85071105 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines
20 0.85059142 199 nips-2012-Link Prediction in Graphs with Autoregressive Features