nips nips2011 nips2011-229 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Michael L. Wick, Andrew McCallum
Abstract: Traditional approaches to probabilistic inference such as loopy belief propagation and Gibbs sampling typically compute marginals for all the unobserved variables in a graphical model. However, in many real-world applications the user’s interests are focused on a subset of the variables, specified by a query. In this case it would be wasteful to uniformly sample, say, one million variables when the query concerns only ten. In this paper we propose a query-specific approach to MCMC that accounts for the query variables and their generalized mutual information with neighboring variables in order to achieve higher computational efficiency. Surprisingly there has been almost no previous work on query-aware MCMC. We demonstrate the success of our approach with positive experimental results on a wide range of graphical models. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Traditional approaches to probabilistic inference such as loopy belief propagation and Gibbs sampling typically compute marginals for all the unobserved variables in a graphical model. [sent-5, score-0.634]
2 In this case it would be wasteful to uniformly sample, say, one million variables when the query concerns only ten. [sent-7, score-0.558]
3 In this paper we propose a query-specific approach to MCMC that accounts for the query variables and their generalized mutual information with neighboring variables in order to achieve higher computational efficiency. [sent-8, score-0.763]
4 1 Introduction Graphical models are useful for representing relationships between large numbers of random variables in probabilistic models spanning a wide range of applications, including information extraction and data integration. [sent-11, score-0.23]
5 Exact inference in these models is often computationally intractable due to the dense dependency structures required in many real-world problems; thus there exists a large body of work on both variational and sampling approximations to inference that help manage large treewidth. [sent-12, score-0.265]
6 The proliferation of interconnected data and the desire to model it has given rise to graphical models with millions or even billions of random variables. [sent-14, score-0.194]
7 Unfortunately, there has been little research devoted to approximate inference in graphical models that are large in terms of their number of variables. [sent-15, score-0.245]
8 Fortunately, many inference needs are instigated by queries issued by users interested in particular random variables. [sent-17, score-0.201]
9 In these situations not all variables are of equal relevance to the user’s query; some variables become observed given the query, others become statistically independent given the query, and the remaining variables are typically marginalized. [sent-22, score-0.326]
10 Thus, a user-generated query provides a tremendous amount of information that can be exploited by an intelligent inference procedure. [sent-23, score-0.542]
11 Unfortunately, traditional approaches to inference such as loopy belief propagation (BP) and Gibbs sampling are query agnostic in the sense that they fail to take advantage of this knowledge and treat each variable as equally relevant. [sent-24, score-0.869]
12 Surprisingly, there has been little research on query specific inference and the only existing approaches focus on loopy BP [3, 4]. [sent-25, score-0.591]
13 In this paper we propose a query-aware approach to Markov chain Monte Carlo (QAM) that exploits the dependency structure of the graph and the query to achieve faster convergence to the answer. [sent-26, score-0.696]
14 Our method selects variables for sampling in proportion to their influence on the query variables. [sent-27, score-0.633]
15 We determine this influence using a computationally tractable generalization of mutual information between the query variables and each variable in the graph. [sent-28, score-0.743]
16 Because our query-specific approach to inference is based on MCMC, we can provide arbitrarily close approximations to the query answer while also scaling to graphs whose structure and unrolled factor density would ordinarily preclude both exact and belief propagation inference methods. [sent-29, score-0.777]
17 This is essential for the method to be deployable in real-world probabilistic databases where even a seemingly innocuous relational algebra query over a simple fully independent structure can produce an inference problem that is #P-hard [5]. [sent-30, score-0.749]
18 We demonstrate dramatic improvements over traditional Markov chain Monte Carlo sampling methods across a wide array of models of diverse structure. [sent-31, score-0.277]
19 A factor graph G := ⟨x, ψ⟩ is a bipartite graph consisting of n random variables x = {x_i}_{i=1}^n and m factors ψ = {ψ_i}_{i=1}^m. [sent-34, score-0.182]
20 Each variable x_i has a domain X_i, and we notate the entire domain space of the random variables x as X with associated σ-algebra Ω. [sent-35, score-0.24]
21 2 Queries on Graphical Models Informally, a query on a graphical model is a request for some quantity of interest that the graphical model is capable of providing. [sent-40, score-0.734]
22 That is, a query is a function mapping the graphical model to an answer set. [sent-41, score-0.669]
23 While in the general case, a query may contain arbitrary functions over the support of a graphical model, for this work we consider queries of the marginal form. [sent-43, score-0.776]
24 That is, a query Q consists of three parts: Q = ⟨xq, xl, xe⟩. [sent-44, score-1.21]
25 Here xq is the set of query variables whose marginal distributions (or MAP configuration) are the answer to the query, xe is a set of evidence variables whose values are observed, and xl is the set of latent variables over which one typically marginalizes to obtain the statistically sound answer. [sent-45, score-1.709]
26 Note that this class of queries is remarkably general and includes queries that require expectations over arbitrary functions. [sent-46, score-0.238]
27 We can see this because a function over the graphical model (or a subset of the graphical model) is itself a random variable, and can therefore be included in xq . [sent-47, score-0.666]
28 More precisely, a query over a graphical model is: Q(xq, xl, xe, π) = π(xq | xe = ve) = Σ_{vl} π(xq, xl = vl | xe = ve) (2), where we assume that Ω is well defined with respect to marginalization over arbitrary subsets of variables. [sent-48, score-1.417]
29 3 Markov Chain Monte Carlo Markov chain Monte Carlo (MCMC) is an important inference method for graphical models where computing the normalization constant Z is intractable. [sent-50, score-0.377]
30 However, many of the results ... (Footnote 1: Research in probabilistic databases has demonstrated that a large class of relational algebra queries can be represented as graphical models and answered using statistical queries of this form [6, 7].) [sent-54, score-0.578]
31 Markov chain Monte Carlo produces a sequence of states {s_i}_{i=1}^∞ in a state space S according to a transition kernel K : S × S → R+, which in the discrete case is a stochastic matrix: for all s ∈ S, K(s, ·) is a valid probability measure, and for all s ∈ S, K(·, s) is a measurable function. [sent-56, score-0.25]
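For illustration only (not part of the paper), a minimal Python sketch of a discrete transition kernel represented as a row-stochastic matrix and of the time-specific distribution it induces; the three-state kernel below is invented.

```python
import numpy as np

# Hypothetical 3-state kernel: rows index the current state s, columns the next
# state, so each row K[s, :] is a valid probability measure K(s, .).
K = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])
assert np.allclose(K.sum(axis=1), 1.0)   # row-stochastic check

mu = np.array([1.0, 0.0, 0.0])           # time-0 distribution over states
for t in range(50):
    mu = mu @ K                          # time-specific distribution after t+1 steps

print(mu)  # for an ergodic chain this approaches the stationary distribution pi = pi K
```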
32 Since we are concerned with MCMC for inference in graphical models, we will from now on let S:=X, and use X instead. [sent-57, score-0.219]
33 Under certain conditions the Markov chain is said to be ergodic, in which case the chain exhibits two types of convergence. [sent-58, score-0.264]
34 At each time step, the Markov chain is in a time-specific distribution over the state space (encoding the probability of being in a particular state at time t). [sent-61, score-0.251]
35 Under certain conditions and regardless of the initial distribution, the Markov chain will converge to the stationary (invariant) distribution π. [sent-64, score-0.211]
36 4 MCMC Inference in Graphical Models MCMC is used for inference in graphical models by constructing a Markov chain with invariant distribution π (given by the graphical model). [sent-67, score-0.617]
37 As a result, generating samples from graphical models with Metropolis-Hastings is usually inexpensive. [sent-71, score-0.163]
38 3 Query Specific MCMC Given a query Q = ⟨xq, xl, xe⟩ and a probability distribution π encoded by a graphical model G with factors ψ and random variables x, the problem of query specific inference is to return the highest fidelity answer to Q given a possible time budget. [sent-72, score-2.128]
39 We can make this statement more precise by defining “highest fidelity” as closest to the truth in total variation distance. [sent-73, score-0.154]
40 Our approach for query specific inference is based on the Metropolis Hastings algorithm described in Section 2. [sent-74, score-0.542]
41 2: Sample a new value for x_i according to some distribution q(X_i) over that variable's domain, leave all other variables unchanged, and return the new state s′. [sent-77, score-0.24]
42 In brief, this strategy arrives at a new state s′ from a current state s by simply updating the value of one variable at a time. [sent-78, score-0.16]
43 In traditional MCMC inference, where the marginal distributions of all variables are of equal interest, the variables are usually sampled in a deterministic order, or selected uniformly at random; that is, p(i) = 1/n induces a uniform distribution over the integers 1, 2, · · · , n. [sent-79, score-0.394]
44 However, given a query Q, it is reasonable to choose a p that more frequently selects the query variables for sampling. [sent-80, score-1.018]
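As a rough, illustrative sketch of this single-site Metropolis-Hastings strategy with a non-uniform variable selection distribution p (this sketch is ours; the chain-structured binary model, the coupling weight, and the particular decaying choice of p are all invented, whereas the paper derives p from an influence score):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5                                   # binary variables x_0 .. x_{n-1}; x_0 is the query variable
coupling = 1.0                          # made-up pairwise weight on a chain-structured model

def log_score(x):
    # Unnormalized log pi(x) for a simple chain model (illustration only).
    return coupling * sum(1.0 if x[i] == x[i + 1] else -1.0 for i in range(n - 1))

# Variable selection distribution p: here we simply up-weight the query variable
# and down-weight variables far from it (the paper instead uses influence scores).
p = np.array([2.0 ** -i for i in range(n)])
p = p / p.sum()

x = rng.integers(0, 2, size=n)          # burned-in / initial state
samples = []
for step in range(10000):
    i = rng.choice(n, p=p)              # pick which variable to resample
    proposal = x.copy()
    proposal[i] = 1 - proposal[i]       # symmetric single-site proposal q(X_i)
    if np.log(rng.random()) < log_score(proposal) - log_score(x):
        x = proposal                    # Metropolis-Hastings accept
    samples.append(x[0])

print("estimated marginal P(x_0 = 1):", np.mean(samples))
```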
45 Clearly, the query variable marginals depend on the remaining latent variables, so we must trade off sampling between query and non-query variables. [sent-81, score-1.179]
46 A key observation is that not all latent variables influence the query variables equally. [sent-82, score-0.697]
47 A fundamental question raised and addressed in this paper is: how do we pick a variable selection distribution p for a query Q to obtain the highest fidelity answer under a finite time budget? [sent-83, score-0.647]
48 We propose to select variables based on their influence on the query variable according to the graphical model. [sent-84, score-0.773]
49 The mutual information I(x, y) = Σ_{x,y} π(x, y) log(π(x, y) / (π(x)π(y))) between two random variables measures the strength of their dependence. [sent-86, score-0.205]
50 It is easy to check that this quantity is the KL divergence between the joint distribution of the variables and the product of the marginals: I(x, y) = KL(π(x, y)||π(x)π(y)). [sent-87, score-0.163]
51 Let x and y be two random variables with joint and marginal distributions π(x, y), π(x), π(y). [sent-92, score-0.158]
52 The influence ι(x, y) between x and y is ι(x, y) := f(π(x, y), π(x)π(y)) (8). If we let f be the KL divergence then ι becomes the mutual information; however, because MCMC convergence is more commonly assessed with the total variation norm, we define an influence metric based on this choice for f. [sent-94, score-0.367]
53 In particular we define ι_tv(x, y) := ||π(x, y) − π(x)π(y)||_tv. [sent-95, score-0.261]
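For concreteness, a small sketch (ours) that computes both instantiations of f named above — the KL-based mutual information and the total-variation influence ι_tv — from an explicitly enumerated joint distribution; the 2×2 table is made up.

```python
import numpy as np

# Made-up joint distribution pi(x, y) over two binary variables.
joint = np.array([[0.40, 0.10],
                  [0.15, 0.35]])
px = joint.sum(axis=1)                 # marginal pi(x)
py = joint.sum(axis=0)                 # marginal pi(y)
prod = np.outer(px, py)                # product of marginals pi(x) pi(y)

# Mutual information: KL(pi(x, y) || pi(x) pi(y)).
mi = np.sum(joint * np.log(joint / prod))

# Total variation influence: half the L1 distance between joint and product of marginals.
iota_tv = 0.5 * np.abs(joint - prod).sum()

print("I(x, y) =", mi, "  iota_tv(x, y) =", iota_tv)
```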
54 As we will now show, the total variation influence (between the query variable and the latent variables) has the important property that it is exactly the error incurred from ignoring a single latent variable when sampling values for xq . [sent-96, score-1.351]
55 For example, suppose we design an approximate query specific sampler that saves computational resources by ignoring a particular random variable xl . [sent-97, score-0.823]
56 Then, the variable xl will remain at its burned-in value xl =vl for the duration of query specific sampling. [sent-98, score-1.056]
57 As a result, the chain will converge to the invariant distribution π(·|xl =vl ). [sent-99, score-0.235]
58 If we use this conditional distribution to approximate the marginal, then the expected error we incur is exactly the influence score under total variation distance. [sent-100, score-0.221]
59 If p(i) = 1(i ≠ l)/(n − 1) induces an MH kernel that neglects variable x_l, then the expected total variation error ξ_tv of the resulting MH sampling procedure under the model is the total variation influence ι_tv. [sent-102, score-0.757]
60 Proof: The resulting chain has stationary distribution π(xq | xl = vl). [sent-103, score-0.414]
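A small numerical check (ours, with an invented 2×2 joint table) of the claim in Proposition 1: the expected total variation error from freezing x_l at a burned-in value drawn from π(x_l) coincides with ι_tv(x_q, x_l).

```python
import numpy as np

# Made-up joint pi(x_q, x_l) over two binary variables (rows: x_q, cols: x_l).
joint = np.array([[0.40, 0.10],
                  [0.15, 0.35]])
p_xq = joint.sum(axis=1)
p_xl = joint.sum(axis=0)

# Expected error from freezing x_l at its burned-in value v_l ~ pi(x_l):
# E_{v_l}[ || pi(x_q | x_l = v_l) - pi(x_q) ||_tv ].
expected_err = sum(
    p_xl[vl] * 0.5 * np.abs(joint[:, vl] / p_xl[vl] - p_xq).sum()
    for vl in range(2)
)

# Total variation influence iota_tv(x_q, x_l).
iota_tv = 0.5 * np.abs(joint - np.outer(p_xq, p_xl)).sum()

print(expected_err, iota_tv)   # the two quantities coincide, as in Proposition 1
```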
61 We are now justified in selecting variables in proportion to their influence to reduce the error they exert on the query marginal. [sent-105, score-0.558]
62 Note, however, that computing either ιtv or the mutual information is as difficult as inference itself. [sent-107, score-0.189]
63 Thus, we define a computationally efficient variant of influence that we term the influence trail score. [sent-108, score-0.162]
64 The idea is to approximate the true influence as a product of factors along an active trail in the graph. [sent-109, score-0.194]
65 Let ρ = (x0 , x1 , · · · , xr ) be an active trail between the query variable xq and xi where x0 = xq and xr = xi . [sent-111, score-1.612]
66 Let φ(xi , xj ) be the approximate joint distribution between xi and xj according only to the mutual factors in their scopes. [sent-112, score-0.24]
67 The influence trail score with respect to an active trail ρ is τ_ρ(xq, xi) := ∏_{i=1}^{r−1} f(φ_i(x_i, x_{i+1}), φ_i(x_i)φ_i(x_{i+1})) (9). The influence trail score is efficient to compute because all factors and variables outside the mutual scopes of each variable pair are ignored. [sent-114, score-0.925]
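A minimal sketch (ours) of the influence trail computation: given pairwise joint tables φ_i along an active trail from x_q to x_i (the tables below are invented), the per-edge influences are multiplied together, here with the total-variation choice of f.

```python
import numpy as np

def edge_influence(phi):
    # f(phi(a, b), phi(a) phi(b)) with f = total variation distance.
    pa, pb = phi.sum(axis=1), phi.sum(axis=0)
    return 0.5 * np.abs(phi - np.outer(pa, pb)).sum()

# Hypothetical local joints along a trail x_q = x^0 - x^1 - x^2 = x_i,
# each built only from the factors shared by the adjacent pair of variables.
trail_joints = [
    np.array([[0.45, 0.05], [0.05, 0.45]]),   # phi(x^0, x^1)
    np.array([[0.30, 0.20], [0.20, 0.30]]),   # phi(x^1, x^2)
]

trail_score = 1.0
for phi in trail_joints:
    trail_score *= edge_influence(phi)        # product over the edges of the trail

print(trail_score)
```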
68 In the experimental results we evaluate both the influence and the influence trail and find that they perform similarly and outperform competing graph-based heuristics for determining p. [sent-115, score-0.162]
69 While in general it is difficult to uniformly state that one choice of p converges faster than another for all models and queries, we present the following analysis showing that even an approximate query aware sampler can exhibit faster finite time convergence progress than an exact sampler. [sent-116, score-0.703]
70 Let K be an exact MCMC kernel that converges to the correct stationary distribution and let L be an approximate kernel that exclusively samples the query variable and thus converges to the conditional distribution of the query variable. [sent-117, score-1.254]
71 Extrapolating Proposition 1, we know that the error incurred from neglecting to sample the latent variables is the influence ιtv between the joint distribution of the latent variables and the query variable. [sent-119, score-0.84]
72 Observe that L is simultaneously making progress towards two distributions: its own invariant distribution and the invariant distribution of K plus an error term. [sent-120, score-0.206]
73 The amount of time that L (the query-only kernel) is closer to K's stationary distribution π_k can be determined by solving for t, yielding the fixed point iteration: t = log(γ_l^t + ι_tv) / log(γ_k) (13). The one-step approximation yields a non-trivial, but conservative bound: t ≥ log(γ_l + ι_tv) / log(γ_k). [sent-122, score-0.8]
74 This implies that the strategy of exclusively sampling the query variables can achieve faster short-term convergence to the correct invariant distribution even though asymptotic convergence is to the incorrect invariant distribution. [sent-124, score-0.958]
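To illustrate the fixed point in (13), a short sketch (ours, with invented rates γ_k, γ_l and influence ι_tv) that iterates the relation to convergence and compares it with the conservative one-step bound.

```python
import math

# Invented values for illustration: gamma_k, gamma_l are the geometric convergence
# rates of the exact kernel K and the query-only kernel L; iota_tv is the influence error.
gamma_k, gamma_l, iota_tv = 0.9, 0.5, 0.05

# One-step conservative bound: t >= log(gamma_l + iota_tv) / log(gamma_k).
t_lower = math.log(gamma_l + iota_tv) / math.log(gamma_k)

# Fixed-point iteration of t = log(gamma_l**t + iota_tv) / log(gamma_k), eq. (13).
t = 1.0
for _ in range(100):
    t = math.log(gamma_l ** t + iota_tv) / math.log(gamma_k)

# Roughly how long L stays closer than K to K's stationary distribution.
print(t_lower, t)
```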
75 The decayed MCMC algorithm for filtering [12] can be thought of as a special case of our method where the model is a linear chain, and the query is for the last variable in the sequence. [sent-129, score-0.571]
76 MCMC has also recently been deployed in probabilistic databases [13] where it is possible to incorporate the deterministic constraints of a relational algebra query directly into a Metropolis-Hastings proposal distribution to obtain quicker answers [14, 15]. [sent-132, score-0.702]
77 A related idea from statistics is data augmentation (or auxiliary variable) approaches to sampling where latent variables are artificially introduced into the model to improve convergence of the original variables (e. [sent-133, score-0.39]
78 In this setting, we see QAM as a way of determining a more sophisticated variable selection strategy that can balance sampling efforts between the original and auxiliary variables. [sent-136, score-0.153]
79 5 Experiments In this section we demonstrate the effectiveness and broad applicability of query aware MCMC (QAM) by showing superior convergence rates to the query marginals across a diverse range of graphical models that vary widely in structure. [sent-137, score-1.265]
80 In our experiments, we generate a wide range of graphical models and evaluate the convergence of each chain exactly, avoiding noisy empirical sampling error by performing exact computations with full transition kernels. [sent-138, score-0.488]
81 Query-only Metropolis-Hastings (qo): p(xi ) = 1(xq = xi ); on six different graphical models with varying parameters generated from a Beta(2,2) distribution (this ensures an interesting dynamic range over the event space). [sent-145, score-0.305]
82 For our experiments we randomly generate ten parameter settings for each of the six model types and measure convergence of the six chains to the single-variable marginal query π(xq) for each variable in each of the sixty realized models. [sent-154, score-0.824]
83 Convergence is measured using the total variation norm: ||π(xq) − π^(t)(xq)||_tv. [sent-155, score-0.415]
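A sketch (ours) of this exact evaluation protocol on a tiny invented state space: the full kernel is applied repeatedly to the initial distribution, and at each step the time-t distribution is marginalized onto the query variable and compared to the stationary marginal in total variation.

```python
import numpy as np

# States enumerate joint assignments (x_q, x_l) of two binary variables: 00, 01, 10, 11.
states = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Made-up row-stochastic transition kernel over the four joint states.
K = np.array([[0.70, 0.10, 0.10, 0.10],
              [0.10, 0.60, 0.10, 0.20],
              [0.15, 0.05, 0.65, 0.15],
              [0.05, 0.25, 0.10, 0.60]])

# Stationary distribution pi: the left eigenvector of K with eigenvalue 1.
w, v = np.linalg.eig(K.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

def query_marginal(dist):
    # Marginalize a distribution over joint states onto the query variable x_q.
    m = np.zeros(2)
    for s, (xq, _) in enumerate(states):
        m[xq] += dist[s]
    return m

mu = np.array([1.0, 0.0, 0.0, 0.0])     # start from a point mass
for t in range(1, 31):
    mu = mu @ K
    tv = 0.5 * np.abs(query_marginal(pi) - query_marginal(mu)).sum()
    print(t, tv)                        # ||pi(x_q) - pi^(t)(x_q)||_tv at each step
```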
84 Generally, all the query specific sampling chains converge more quickly than the uniform baseline in the early iterations across every model. [sent-158, score-0.688]
85 The query-only and mutual information chains exhibit the most rapid convergence in the early stages, with the query-only chain converging to an incorrect distribution and the mutual information chain slowly converging during the later time stages. [sent-160, score-0.778]
86 Finally, notice that the influence-trail variant of total variation influence converges at a similar rate to the actual total variation influence, and in some cases converges more quickly (e. [sent-162, score-0.404]
87 In the next experiment, we demonstrate how the size of the graphical model affects convergence of the various chains. [sent-165, score-0.215]
88 In particular, we plot the convergence of all chains on six different hoop-structured models containing three, four, six, eight, ten, and twelve variables (Figure 2). [sent-166, score-0.309]
89 That is, we measure the difference in convergence rates ||π − π_0 K^t_Unif||_tv − ||π − π_0 K^t_QAM||_tv, so that points above the line x = 0 mean that QAM is closer to the answer than the uniform baseline and points below the line mean that QAM is further from the answer. [sent-168, score-0.729]
90 As expected, increasing the number of variables in the graph increases the opportunities for query specific sampling and thus increases QAM’s advantage over traditional MCMC. [sent-169, score-0.703]
91 6 Conclusion In this paper we presented a query-aware approach to MCMC, motivated by the need to answer queries over large scale graphical models. [sent-170, score-0.328]
92 We found that the query-aware sampling methods outperform the traditional Metropolis Hastings sampler across all models in the early time steps. [sent-171, score-0.171]
93 Further, as the number of variables in the models increases, the query aware samplers not only outperform the baseline for longer periods of time, but also exhibit more dramatic convergence rate improvements. [sent-172, score-0.755]
94 Thus, query specific sampling is a promising approach for approximately answering queries on real-world probabilistic databases (and relational models) that contain billions of variables. [sent-173, score-0.832]
95 An exciting area of future work is to combine query specific sampling with adaptive MCMC techniques allowing the kernel to evolve in response to the underlying distribution. [sent-175, score-0.572]
96 There has been little theoretical work on analyzing marginal convergence of MCMC chains and future work can help develop these tools. [sent-177, score-0.204]
97 Figure 1: Convergence to the query marginals of the stationary distribution from an initial uniform distribution. [sent-203, score-0.661]
98 Figure 2: Improvement over uniform p as the number of variables increases. [sent-222, score-0.212]
99 As the number of variables increases, the improvements of the query specific techniques increase. [sent-224, score-0.254]
100 BayesStore: Managing large, uncertain data repositories with probabilistic graphical models. [sent-252, score-0.184]
wordName wordTfidf (topN-words)
[('query', 0.46), ('xq', 0.392), ('tv', 0.261), ('xl', 0.259), ('vl', 0.203), ('mcmc', 0.194), ('qam', 0.164), ('uence', 0.163), ('trail', 0.162), ('graphical', 0.137), ('chain', 0.132), ('queries', 0.119), ('variation', 0.116), ('mutual', 0.107), ('xe', 0.099), ('variables', 0.098), ('inference', 0.082), ('wick', 0.082), ('variable', 0.078), ('convergence', 0.078), ('sampling', 0.075), ('answer', 0.072), ('chains', 0.066), ('hastings', 0.066), ('invariant', 0.066), ('marginals', 0.065), ('xi', 0.064), ('marginal', 0.06), ('databases', 0.06), ('uniform', 0.057), ('samplers', 0.054), ('metropolis', 0.054), ('delity', 0.054), ('loopy', 0.049), ('markov', 0.047), ('probabilistic', 0.047), ('vq', 0.047), ('joseph', 0.045), ('propagation', 0.044), ('traditional', 0.044), ('lt', 0.042), ('vldb', 0.042), ('stationary', 0.042), ('six', 0.041), ('daisy', 0.041), ('garofalakis', 0.041), ('gerome', 0.041), ('hoop', 0.041), ('kmh', 0.041), ('minos', 0.041), ('zhe', 0.041), ('state', 0.041), ('mh', 0.041), ('mccallum', 0.041), ('latent', 0.041), ('transition', 0.04), ('relational', 0.04), ('stages', 0.04), ('aware', 0.039), ('connected', 0.039), ('michael', 0.039), ('andrew', 0.038), ('total', 0.038), ('kernel', 0.037), ('distribution', 0.037), ('belief', 0.037), ('stats', 0.036), ('coreference', 0.036), ('advancement', 0.036), ('iarpa', 0.036), ('carlo', 0.035), ('improvement', 0.034), ('monte', 0.034), ('neglecting', 0.033), ('decayed', 0.033), ('converges', 0.033), ('extraction', 0.033), ('incurred', 0.032), ('factors', 0.032), ('statistically', 0.032), ('distance', 0.031), ('billions', 0.031), ('kl', 0.031), ('algebra', 0.03), ('quickly', 0.03), ('score', 0.03), ('fully', 0.03), ('ergodic', 0.03), ('bases', 0.029), ('proposal', 0.028), ('cancels', 0.028), ('divergence', 0.028), ('massachusetts', 0.028), ('amherst', 0.027), ('contract', 0.026), ('pw', 0.026), ('sampler', 0.026), ('graph', 0.026), ('models', 0.026), ('converging', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999952 229 nips-2011-Query-Aware MCMC
Author: Michael L. Wick, Andrew McCallum
Abstract: Traditional approaches to probabilistic inference such as loopy belief propagation and Gibbs sampling typically compute marginals for all the unobserved variables in a graphical model. However, in many real-world applications the user’s interests are focused on a subset of the variables, specified by a query. In this case it would be wasteful to uniformly sample, say, one million variables when the query concerns only ten. In this paper we propose a query-specific approach to MCMC that accounts for the query variables and their generalized mutual information with neighboring variables in order to achieve higher computational efficiency. Surprisingly there has been almost no previous work on query-aware MCMC. We demonstrate the success of our approach with positive experimental results on a wide range of graphical models. 1
2 0.1573371 231 nips-2011-Randomized Algorithms for Comparison-based Search
Author: Dominique Tschopp, Suhas Diggavi, Payam Delgosha, Soheil Mohajer
Abstract: This paper addresses the problem of finding the nearest neighbor (or one of the R-nearest neighbors) of a query object q in a database of n objects, when we can only use a comparison oracle. The comparison oracle, given two reference objects and a query object, returns the reference object most similar to the query object. The main problem we study is how to search the database for the nearest neighbor (NN) of a query, while minimizing the questions. The difficulty of this problem depends on properties of the underlying database. We show the importance of a characterization: combinatorial disorder D which defines approximate triangle n inequalities on ranks. We present a lower bound of Ω(D log D + D2 ) average number of questions in the search phase for any randomized algorithm, which demonstrates the fundamental role of D for worst case behavior. We develop 3 a randomized scheme for NN retrieval in O(D3 log2 n + D log2 n log log nD ) 3 questions. The learning requires asking O(nD3 log2 n + D log2 n log log nD ) questions and O(n log2 n/ log(2D)) bits to store.
3 0.15614508 22 nips-2011-Active Ranking using Pairwise Comparisons
Author: Kevin G. Jamieson, Robert Nowak
Abstract: This paper examines the problem of ranking a collection of objects using pairwise comparisons (rankings of two objects). In general, the ranking of n objects can be identified by standard sorting methods using n log2 n pairwise comparisons. We are interested in natural situations in which relationships among the objects may allow for ranking using far fewer pairwise comparisons. Specifically, we assume that the objects can be embedded into a d-dimensional Euclidean space and that the rankings reflect their relative distances from a common reference point in Rd . We show that under this assumption the number of possible rankings grows like n2d and demonstrate an algorithm that can identify a randomly selected ranking using just slightly more than d log n adaptively selected pairwise comparisons, on average. If instead the comparisons are chosen at random, then almost all pairwise comparisons must be made in order to identify any ranking. In addition, we propose a robust, error-tolerant algorithm that only requires that the pairwise comparisons are probably correct. Experimental studies with synthetic and real datasets support the conclusions of our theoretical analysis. 1
4 0.13594514 301 nips-2011-Variational Gaussian Process Dynamical Systems
Author: Neil D. Lawrence, Michalis K. Titsias, Andreas Damianou
Abstract: High dimensional time series are endemic in applications of machine learning such as robotics (sensor data), computational biology (gene expression data), vision (video sequences) and graphics (motion capture data). Practical nonlinear probabilistic approaches to this data are required. In this paper we introduce the variational Gaussian process dynamical system. Our work builds on recent variational approximations for Gaussian process latent variable models to allow for nonlinear dimensionality reduction simultaneously with learning a dynamical prior in the latent space. The approach also allows for the appropriate dimensionality of the latent space to be automatically determined. We demonstrate the model on a human motion capture data set and a series of high resolution video sequences. 1
5 0.13386175 226 nips-2011-Projection onto A Nonnegative Max-Heap
Author: Jun Liu, Liang Sun, Jieping Ye
Abstract: We consider the problem of computing the Euclidean projection of a vector of length p onto a non-negative max-heap—an ordered tree where the values of the nodes are all nonnegative and the value of any parent node is no less than the value(s) of its child node(s). This Euclidean projection plays a building block role in the optimization problem with a non-negative maxheap constraint. Such a constraint is desirable when the features follow an ordered tree structure, that is, a given feature is selected for the given regression/classification task only if its parent node is selected. In this paper, we show that such Euclidean projection problem admits an analytical solution and we develop a top-down algorithm where the key operation is to find the so-called maximal root-tree of the subtree rooted at each node. A naive approach for finding the maximal root-tree is to enumerate all the possible root-trees, which, however, does not scale well. We reveal several important properties of the maximal root-tree, based on which we design a bottom-up algorithm with merge for efficiently finding the maximal roottree. The proposed algorithm has a (worst-case) linear time complexity for a sequential list, and O(p2 ) for a general tree. We report simulation results showing the effectiveness of the max-heap for regression with an ordered tree structure. Empirical results show that the proposed algorithm has an expected linear time complexity for many special cases including a sequential list, a full binary tree, and a tree with depth 1. 1
6 0.12135402 20 nips-2011-Active Learning Ranking from Pairwise Preferences with Almost Optimal Query Complexity
7 0.11456724 126 nips-2011-Im2Text: Describing Images Using 1 Million Captioned Photographs
8 0.11006974 157 nips-2011-Learning to Search Efficiently in High Dimensions
9 0.093419813 272 nips-2011-Stochastic convex optimization with bandit feedback
10 0.08960247 228 nips-2011-Quasi-Newton Methods for Markov Chain Monte Carlo
11 0.08458893 146 nips-2011-Learning Higher-Order Graph Structure with Features by Structure Penalty
12 0.081212521 158 nips-2011-Learning unbelievable probabilities
13 0.077314079 255 nips-2011-Simultaneous Sampling and Multi-Structure Fitting with Adaptive Reversible Jump MCMC
14 0.075969502 76 nips-2011-Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials
15 0.074549213 8 nips-2011-A Model for Temporal Dependencies in Event Streams
16 0.073730856 55 nips-2011-Collective Graphical Models
17 0.073012874 17 nips-2011-Accelerated Adaptive Markov Chain for Partition Function Computation
18 0.072185941 168 nips-2011-Maximum Margin Multi-Instance Learning
19 0.070918195 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models
20 0.069335632 42 nips-2011-Bayesian Bias Mitigation for Crowdsourcing
topicId topicWeight
[(0, 0.193), (1, 0.011), (2, -0.034), (3, 0.013), (4, -0.05), (5, -0.097), (6, -0.064), (7, -0.137), (8, 0.083), (9, -0.039), (10, -0.057), (11, -0.081), (12, -0.032), (13, 0.025), (14, 0.154), (15, -0.037), (16, 0.082), (17, -0.001), (18, -0.123), (19, 0.069), (20, 0.037), (21, 0.031), (22, 0.202), (23, -0.1), (24, -0.153), (25, 0.037), (26, 0.189), (27, 0.107), (28, 0.022), (29, -0.089), (30, 0.024), (31, -0.187), (32, -0.125), (33, -0.025), (34, -0.129), (35, 0.077), (36, -0.052), (37, -0.106), (38, 0.055), (39, -0.039), (40, -0.023), (41, 0.029), (42, -0.029), (43, -0.036), (44, -0.104), (45, 0.063), (46, 0.029), (47, -0.041), (48, 0.022), (49, -0.109)]
simIndex simValue paperId paperTitle
same-paper 1 0.95719969 229 nips-2011-Query-Aware MCMC
Author: Michael L. Wick, Andrew McCallum
Abstract: Traditional approaches to probabilistic inference such as loopy belief propagation and Gibbs sampling typically compute marginals for all the unobserved variables in a graphical model. However, in many real-world applications the user’s interests are focused on a subset of the variables, specified by a query. In this case it would be wasteful to uniformly sample, say, one million variables when the query concerns only ten. In this paper we propose a query-specific approach to MCMC that accounts for the query variables and their generalized mutual information with neighboring variables in order to achieve higher computational efficiency. Surprisingly there has been almost no previous work on query-aware MCMC. We demonstrate the success of our approach with positive experimental results on a wide range of graphical models. 1
2 0.65968513 22 nips-2011-Active Ranking using Pairwise Comparisons
Author: Kevin G. Jamieson, Robert Nowak
Abstract: This paper examines the problem of ranking a collection of objects using pairwise comparisons (rankings of two objects). In general, the ranking of n objects can be identified by standard sorting methods using n log2 n pairwise comparisons. We are interested in natural situations in which relationships among the objects may allow for ranking using far fewer pairwise comparisons. Specifically, we assume that the objects can be embedded into a d-dimensional Euclidean space and that the rankings reflect their relative distances from a common reference point in Rd . We show that under this assumption the number of possible rankings grows like n2d and demonstrate an algorithm that can identify a randomly selected ranking using just slightly more than d log n adaptively selected pairwise comparisons, on average. If instead the comparisons are chosen at random, then almost all pairwise comparisons must be made in order to identify any ranking. In addition, we propose a robust, error-tolerant algorithm that only requires that the pairwise comparisons are probably correct. Experimental studies with synthetic and real datasets support the conclusions of our theoretical analysis. 1
3 0.65163755 20 nips-2011-Active Learning Ranking from Pairwise Preferences with Almost Optimal Query Complexity
Author: Nir Ailon
Abstract: Given a set V of n elements we wish to linearly order them using pairwise preference labels which may be non-transitive (due to irrationality or arbitrary noise). The goal is to linearly order the elements while disagreeing with as few pairwise preference labels as possible. Our performance is measured by two parameters: The number of disagreements (loss) and the query complexity (number of pairwise preference labels). Our algorithm adaptively queries at most O(n poly(log n, ε−1 )) preference labels for a regret of ε times the optimal loss. This is strictly better, and often significantly better than what non-adaptive sampling could achieve. Our main result helps settle an open problem posed by learning-to-rank (from pairwise information) theoreticians and practitioners: What is a provably correct way to sample preference labels? 1
4 0.58060038 231 nips-2011-Randomized Algorithms for Comparison-based Search
Author: Dominique Tschopp, Suhas Diggavi, Payam Delgosha, Soheil Mohajer
Abstract: This paper addresses the problem of finding the nearest neighbor (or one of the R-nearest neighbors) of a query object q in a database of n objects, when we can only use a comparison oracle. The comparison oracle, given two reference objects and a query object, returns the reference object most similar to the query object. The main problem we study is how to search the database for the nearest neighbor (NN) of a query, while minimizing the questions. The difficulty of this problem depends on properties of the underlying database. We show the importance of a characterization: combinatorial disorder D which defines approximate triangle n inequalities on ranks. We present a lower bound of Ω(D log D + D2 ) average number of questions in the search phase for any randomized algorithm, which demonstrates the fundamental role of D for worst case behavior. We develop 3 a randomized scheme for NN retrieval in O(D3 log2 n + D log2 n log log nD ) 3 questions. The learning requires asking O(nD3 log2 n + D log2 n log log nD ) questions and O(n log2 n/ log(2D)) bits to store.
5 0.52187067 8 nips-2011-A Model for Temporal Dependencies in Event Streams
Author: Asela Gunawardana, Christopher Meek, Puyang Xu
Abstract: We introduce the Piecewise-Constant Conditional Intensity Model, a model for learning temporal dependencies in event streams. We describe a closed-form Bayesian approach to learning these models, and describe an importance sampling algorithm for forecasting future events using these models, using a proposal distribution based on Poisson superposition. We then use synthetic data, supercomputer event logs, and web search query logs to illustrate that our learning algorithm can efficiently learn nonlinear temporal dependencies, and that our importance sampling algorithm can effectively forecast future events. 1
6 0.50988162 55 nips-2011-Collective Graphical Models
7 0.50752789 228 nips-2011-Quasi-Newton Methods for Markov Chain Monte Carlo
8 0.48601338 192 nips-2011-Nonstandard Interpretations of Probabilistic Programs for Efficient Inference
9 0.45910701 272 nips-2011-Stochastic convex optimization with bandit feedback
10 0.45204434 42 nips-2011-Bayesian Bias Mitigation for Crowdsourcing
11 0.44652116 17 nips-2011-Accelerated Adaptive Markov Chain for Partition Function Computation
12 0.44586658 243 nips-2011-Select and Sample - A Model of Efficient Neural Inference and Learning
13 0.42390686 157 nips-2011-Learning to Search Efficiently in High Dimensions
14 0.41110498 201 nips-2011-On the Completeness of First-Order Knowledge Compilation for Lifted Probabilistic Inference
15 0.40851593 255 nips-2011-Simultaneous Sampling and Multi-Structure Fitting with Adaptive Reversible Jump MCMC
16 0.39118898 7 nips-2011-A Machine Learning Approach to Predict Chemical Reactions
17 0.38852739 64 nips-2011-Convergent Bounds on the Euclidean Distance
18 0.38656044 221 nips-2011-Priors over Recurrent Continuous Time Processes
19 0.36555561 301 nips-2011-Variational Gaussian Process Dynamical Systems
20 0.35670465 76 nips-2011-Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials
topicId topicWeight
[(0, 0.034), (4, 0.055), (20, 0.032), (26, 0.084), (31, 0.121), (33, 0.032), (43, 0.045), (44, 0.183), (45, 0.116), (56, 0.012), (57, 0.045), (63, 0.014), (65, 0.015), (74, 0.049), (83, 0.048), (99, 0.042)]
simIndex simValue paperId paperTitle
1 0.96152073 7 nips-2011-A Machine Learning Approach to Predict Chemical Reactions
Author: Matthew A. Kayala, Pierre F. Baldi
Abstract: Being able to predict the course of arbitrary chemical reactions is essential to the theory and applications of organic chemistry. Previous approaches are not highthroughput, are not generalizable or scalable, or lack sufficient data to be effective. We describe single mechanistic reactions as concerted electron movements from an electron orbital source to an electron orbital sink. We use an existing rule-based expert system to derive a dataset consisting of 2,989 productive mechanistic steps and 6.14 million non-productive mechanistic steps. We then pose identifying productive mechanistic steps as a ranking problem: rank potential orbital interactions such that the top ranked interactions yield the major products. The machine learning implementation follows a two-stage approach, in which we first train atom level reactivity filters to prune 94.0% of non-productive reactions with less than a 0.1% false negative rate. Then, we train an ensemble of ranking models on pairs of interacting orbitals to learn a relative productivity function over single mechanistic reactions in a given system. Without the use of explicit transformation patterns, the ensemble perfectly ranks the productive mechanisms at the top 89.1% of the time, rising to 99.9% of the time when top ranked lists with at most four nonproductive reactions are considered. The final system allows multi-step reaction prediction. Furthermore, it is generalizable, making reasonable predictions over reactants and conditions which the rule-based expert system does not handle.
Author: Cristina Savin, Peter Dayan, Máté Lengyel
Abstract: Storing a new pattern in a palimpsest memory system comes at the cost of interfering with the memory traces of previously stored items. Knowing the age of a pattern thus becomes critical for recalling it faithfully. This implies that there should be a tight coupling between estimates of age, as a form of familiarity, and the neural dynamics of recollection, something which current theories omit. Using a normative model of autoassociative memory, we show that a dual memory system, consisting of two interacting modules for familiarity and recollection, has best performance for both recollection and recognition. This finding provides a new window onto actively contentious psychological and neural aspects of recognition memory. 1
same-paper 3 0.83556914 229 nips-2011-Query-Aware MCMC
Author: Michael L. Wick, Andrew McCallum
Abstract: Traditional approaches to probabilistic inference such as loopy belief propagation and Gibbs sampling typically compute marginals for all the unobserved variables in a graphical model. However, in many real-world applications the user’s interests are focused on a subset of the variables, specified by a query. In this case it would be wasteful to uniformly sample, say, one million variables when the query concerns only ten. In this paper we propose a query-specific approach to MCMC that accounts for the query variables and their generalized mutual information with neighboring variables in order to achieve higher computational efficiency. Surprisingly there has been almost no previous work on query-aware MCMC. We demonstrate the success of our approach with positive experimental results on a wide range of graphical models. 1
4 0.73856074 17 nips-2011-Accelerated Adaptive Markov Chain for Partition Function Computation
Author: Stefano Ermon, Carla P. Gomes, Ashish Sabharwal, Bart Selman
Abstract: We propose a novel Adaptive Markov Chain Monte Carlo algorithm to compute the partition function. In particular, we show how to accelerate a flat histogram sampling technique by significantly reducing the number of “null moves” in the chain, while maintaining asymptotic convergence properties. Our experiments show that our method converges quickly to highly accurate solutions on a range of benchmark instances, outperforming other state-of-the-art methods such as IJGP, TRW, and Gibbs sampling both in run-time and accuracy. We also show how obtaining a so-called density of states distribution allows for efficient weight learning in Markov Logic theories. 1
5 0.73810065 22 nips-2011-Active Ranking using Pairwise Comparisons
Author: Kevin G. Jamieson, Robert Nowak
Abstract: This paper examines the problem of ranking a collection of objects using pairwise comparisons (rankings of two objects). In general, the ranking of n objects can be identified by standard sorting methods using n log2 n pairwise comparisons. We are interested in natural situations in which relationships among the objects may allow for ranking using far fewer pairwise comparisons. Specifically, we assume that the objects can be embedded into a d-dimensional Euclidean space and that the rankings reflect their relative distances from a common reference point in Rd . We show that under this assumption the number of possible rankings grows like n2d and demonstrate an algorithm that can identify a randomly selected ranking using just slightly more than d log n adaptively selected pairwise comparisons, on average. If instead the comparisons are chosen at random, then almost all pairwise comparisons must be made in order to identify any ranking. In addition, we propose a robust, error-tolerant algorithm that only requires that the pairwise comparisons are probably correct. Experimental studies with synthetic and real datasets support the conclusions of our theoretical analysis. 1
6 0.73532307 231 nips-2011-Randomized Algorithms for Comparison-based Search
7 0.7278583 197 nips-2011-On Tracking The Partition Function
8 0.72767651 180 nips-2011-Multiple Instance Filtering
9 0.72745818 84 nips-2011-EigenNet: A Bayesian hybrid of generative and conditional models for sparse learning
10 0.72647119 21 nips-2011-Active Learning with a Drifting Distribution
11 0.72563112 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries
12 0.72340721 206 nips-2011-Optimal Reinforcement Learning for Gaussian Systems
13 0.7231881 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs
14 0.72247601 303 nips-2011-Video Annotation and Tracking with Active Learning
15 0.72210485 179 nips-2011-Multilinear Subspace Regression: An Orthogonal Tensor Decomposition Approach
16 0.72158611 64 nips-2011-Convergent Bounds on the Euclidean Distance
17 0.71932292 42 nips-2011-Bayesian Bias Mitigation for Crowdsourcing
18 0.71880484 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions
19 0.71769392 158 nips-2011-Learning unbelievable probabilities
20 0.7167806 283 nips-2011-The Fixed Points of Off-Policy TD