nips nips2011 nips2011-229 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Michael L. Wick, Andrew McCallum
Abstract: Traditional approaches to probabilistic inference such as loopy belief propagation and Gibbs sampling typically compute marginals for all the unobserved variables in a graphical model. However, in many real-world applications the user’s interests are focused on a subset of the variables, specified by a query. In this case it would be wasteful to uniformly sample, say, one million variables when the query concerns only ten. In this paper we propose a query-specific approach to MCMC that accounts for the query variables and their generalized mutual information with neighboring variables in order to achieve higher computational efficiency. Surprisingly there has been almost no previous work on query-aware MCMC. We demonstrate the success of our approach with positive experimental results on a wide range of graphical models. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Traditional approaches to probabilistic inference such as loopy belief propagation and Gibbs sampling typically compute marginals for all the unobserved variables in a graphical model. [sent-5, score-0.634]
2 In this case it would be wasteful to uniformly sample, say, one million variables when the query concerns only ten. [sent-7, score-0.558]
3 In this paper we propose a query-specific approach to MCMC that accounts for the query variables and their generalized mutual information with neighboring variables in order to achieve higher computational efficiency. [sent-8, score-0.763]
4 1 Introduction Graphical models are useful for representing relationships between large numbers of random variables in probabilistic models spanning a wide range of applications, including information extraction and data integration. [sent-11, score-0.23]
5 Exact inference in these models is often computationally intractable due to the dense dependency structures required in many real-world problems; thus there exists a large body of work on both variational and sampling approximations to inference that help manage large treewidth. [sent-12, score-0.265]
6 The proliferation of interconnected data and the desire to model it has given rise to graphical models with millions or even billions of random variables. [sent-14, score-0.194]
7 Unfortunately, there has been little research devoted to approximate inference in graphical models that are large in terms of their number of variables. [sent-15, score-0.245]
8 Fortunately, many inference needs are instigated by queries issued by users interested in particular random variables. [sent-17, score-0.201]
9 In these situations not all variables are of equal relevance to the user’s query; some variables become observed given the query, others become statistically independent given the query, and the remaining variables are typically marginalized. [sent-22, score-0.326]
10 Thus, a user-generated query provides a tremendous amount of information that can be exploited by an intelligent inference procedure. [sent-23, score-0.542]
11 Unfortunately, traditional approaches to inference such as loopy belief propagation (BP) and Gibbs sampling are query agnostic in the sense that they fail to take advantage of this knowledge and treat each variable as equally relevant. [sent-24, score-0.869]
12 Surprisingly, there has been little research on query specific inference and the only existing approaches focus on loopy BP [3, 4]. [sent-25, score-0.591]
13 In this paper we propose a query-aware approach to Markov chain Monte Carlo (QAM) that exploits the dependency structure of the graph and the query to achieve faster convergence to the answer. [sent-26, score-0.696]
14 Our method selects variables for sampling in proportion to their influence on the query variables. [sent-27, score-0.633]
15 We determine this influence using a computationally tractable generalization of mutual information between the query variables and each variable in the graph. [sent-28, score-0.743]
16 Because our query-specific approach to inference is based on MCMC, we can provide arbitrarily close approximations to the query answer while also scaling to graphs whose structure and unrolled factor density would ordinarily preclude both exact and belief propagation inference methods. [sent-29, score-0.777]
17 This is essential for the method to be deployable in real-world probabilistic databases where even a seemingly innocuous relational algebra query over a simple fully independent structure can produce an inference problem that is #P-hard [5]. [sent-30, score-0.749]
18 We demonstrate dramatic improvements over traditional Markov chain Monte Carlo sampling methods across a wide array of models of diverse structure. [sent-31, score-0.277]
19 A factor graph G := ⟨x, ψ⟩ is a bipartite graph consisting of n random variables x = {x_i}_{i=1}^n and m factors ψ = {ψ_i}_{i=1}^m. [sent-34, score-0.182]
20 Each variable x_i has a domain X_i, and we notate the entire domain space of the random variables x as X with associated σ-algebra Ω. [sent-35, score-0.24]
21 2 Queries on Graphical Models Informally, a query on a graphical model is a request for some quantity of interest that the graphical model is capable of providing. [sent-40, score-0.734]
22 That is, a query is a function mapping the graphical model to an answer set. [sent-41, score-0.669]
23 While in the general case, a query may contain arbitrary functions over the support of a graphical model, for this work we consider queries of the marginal form. [sent-43, score-0.776]
24 That is, a query Q consists of three parts: Q = ⟨xq, xl, xe⟩. [sent-44, score-1.21]
25 Here xq is the set of query variables whose marginal distributions (or MAP configuration) are the answer to the query, xe is a set of evidence variables whose values are observed, and xl is the set of latent variables over which one typically marginalizes to obtain the statistically sound answer. [sent-45, score-1.709]
26 Note that this class of queries is remarkably general and includes queries that require expectations over arbitrary functions. [sent-46, score-0.238]
27 We can see this because a function over the graphical model (or a subset of the graphical model) is itself a random variable, and can therefore be included in xq . [sent-47, score-0.666]
28 More precisely, a query over a graphical model is: Q(xq, xl, xe, π) = π(xq | xe = ve) = Σ_{vl} π(xq, xl = vl | xe = ve) (2), where we assume that Ω is well defined with respect to marginalization over arbitrary subsets of variables. [sent-48, score-1.417]
29 3 Markov Chain Monte Carlo Markov chain Monte Carlo (MCMC) is an important inference method for graphical models where computing the normalization constant Z is intractable. [sent-50, score-0.377]
30 However, many of the results ... (Footnote 1: Research in probabilistic databases has demonstrated that a large class of relational algebra queries can be represented as graphical models and answered using statistical queries of this form [6, 7].) [sent-54, score-0.578]
31 Markov chain Monte Carlo produces a sequence of states {s_i}_{i=1}^∞ in a state space S according to a transition kernel K : S × S → R+, which in the discrete case is a stochastic matrix: for all s ∈ S, K(s, ·) is a valid probability measure, and for all s ∈ S, K(·, s) is a measurable function. [sent-56, score-0.25]
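For illustration only (not part of the paper), a minimal Python sketch of a discrete transition kernel represented as a row-stochastic matrix and of the time-specific distribution it induces; the three-state kernel below is invented.

```python
import numpy as np

# Hypothetical 3-state kernel: rows index the current state s, columns the next
# state, so each row K[s, :] is a valid probability measure K(s, .).
K = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])
assert np.allclose(K.sum(axis=1), 1.0)   # row-stochastic check

mu = np.array([1.0, 0.0, 0.0])           # time-0 distribution over states
for t in range(50):
    mu = mu @ K                          # time-specific distribution after t+1 steps

print(mu)  # for an ergodic chain this approaches the stationary distribution pi = pi K
```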
32 Since we are concerned with MCMC for inference in graphical models, we will from now on let S:=X, and use X instead. [sent-57, score-0.219]
33 Under certain conditions the Markov chain is said to be ergodic, in which case the chain exhibits two types of convergence. [sent-58, score-0.264]
34 At each time step, the Markov chain is in a time-specific distribution over the state space (encoding the probability of being in a particular state at time t). [sent-61, score-0.251]
35 Under certain conditions and regardless of the initial distribution, the Markov chain will converge to the stationary (invariant) distribution π. [sent-64, score-0.211]
36 4 MCMC Inference in Graphical Models MCMC is used for inference in graphical models by constructing a Markov chain with invariant distribution π (given by the graphical model). [sent-67, score-0.617]
37 As a result, generating samples from graphical models with Metropolis-Hastings is usually inexpensive. [sent-71, score-0.163]
38 3 Query Specific MCMC Given a query Q = ⟨xq, xl, xe⟩ and a probability distribution π encoded by a graphical model G with factors ψ and random variables x, the problem of query specific inference is to return the highest fidelity answer to Q given a possible time budget. [sent-72, score-2.128]
39 We can make this statement more precise by defining “highest fidelity” as closest to the truth in total variation distance. [sent-73, score-0.154]
40 Our approach for query specific inference is based on the Metropolis Hastings algorithm described in Section 2. [sent-74, score-0.542]
41 2: Sample a new value for x_i according to some distribution q(X_i) over that variable's domain, leave all other variables unchanged, and return the new state s′. [sent-77, score-0.24]
42 In brief, this strategy arrives at a new state s′ from a current state s by simply updating the value of one variable at a time. [sent-78, score-0.16]
43 In traditional MCMC inference, where the marginal distributions of all variables are of equal interest, the variables are usually sampled in a deterministic order, or selected uniformly at random; that is, p(i) = 1/n induces a uniform distribution over the integers 1, 2, · · · , n. [sent-79, score-0.394]
44 However, given a query Q, it is reasonable to choose a p that more frequently selects the query variables for sampling. [sent-80, score-1.018]
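As a rough, illustrative sketch of this single-site Metropolis-Hastings strategy with a non-uniform variable selection distribution p (this sketch is ours; the chain-structured binary model, the coupling weight, and the particular decaying choice of p are all invented, whereas the paper derives p from an influence score):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5                                   # binary variables x_0 .. x_{n-1}; x_0 is the query variable
coupling = 1.0                          # made-up pairwise weight on a chain-structured model

def log_score(x):
    # Unnormalized log pi(x) for a simple chain model (illustration only).
    return coupling * sum(1.0 if x[i] == x[i + 1] else -1.0 for i in range(n - 1))

# Variable selection distribution p: here we simply up-weight the query variable
# and down-weight variables far from it (the paper instead uses influence scores).
p = np.array([2.0 ** -i for i in range(n)])
p = p / p.sum()

x = rng.integers(0, 2, size=n)          # burned-in / initial state
samples = []
for step in range(10000):
    i = rng.choice(n, p=p)              # pick which variable to resample
    proposal = x.copy()
    proposal[i] = 1 - proposal[i]       # symmetric single-site proposal q(X_i)
    if np.log(rng.random()) < log_score(proposal) - log_score(x):
        x = proposal                    # Metropolis-Hastings accept
    samples.append(x[0])

print("estimated marginal P(x_0 = 1):", np.mean(samples))
```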
45 Clearly, the query variable marginals depend on the remaining latent variables, so we must trade off sampling between query and non-query variables. [sent-81, score-1.179]
46 A key observation is that not all latent variables influence the query variables equally. [sent-82, score-0.697]
47 A fundamental question raised and addressed in this paper is: how do we pick a variable selection distribution p for a query Q to obtain the highest fidelity answer under a finite time budget? [sent-83, score-0.647]
48 We propose to select variables based on their influence on the query variable according to the graphical model. [sent-84, score-0.773]
49 The mutual information I(x, y) = Σ_{x,y} π(x, y) log(π(x, y) / (π(x)π(y))) between two random variables measures the strength of their dependence. [sent-86, score-0.205]
50 It is easy to check that this quantity is the KL divergence between the joint distribution of the variables and the product of the marginals: I(x, y) = KL(π(x, y)||π(x)π(y)). [sent-87, score-0.163]
51 Let x and y be two random variables with joint and marginal distributions π(x, y), π(x), π(y). [sent-92, score-0.158]
52 The influence ι(x, y) between x and y is ι(x, y) := f(π(x, y), π(x)π(y)) (8). If we let f be the KL divergence then ι becomes the mutual information; however, because MCMC convergence is more commonly assessed with the total variation norm, we define an influence metric based on this choice for f. [sent-94, score-0.367]
53 In particular we define ι_tv(x, y) := ||π(x, y) − π(x)π(y)||_tv. [sent-95, score-0.261]
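For concreteness, a small sketch (ours) that computes both instantiations of f named above — the KL-based mutual information and the total-variation influence ι_tv — from an explicitly enumerated joint distribution; the 2×2 table is made up.

```python
import numpy as np

# Made-up joint distribution pi(x, y) over two binary variables.
joint = np.array([[0.40, 0.10],
                  [0.15, 0.35]])
px = joint.sum(axis=1)                 # marginal pi(x)
py = joint.sum(axis=0)                 # marginal pi(y)
prod = np.outer(px, py)                # product of marginals pi(x) pi(y)

# Mutual information: KL(pi(x, y) || pi(x) pi(y)).
mi = np.sum(joint * np.log(joint / prod))

# Total variation influence: half the L1 distance between joint and product of marginals.
iota_tv = 0.5 * np.abs(joint - prod).sum()

print("I(x, y) =", mi, "  iota_tv(x, y) =", iota_tv)
```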
54 As we will now show, the total variation influence (between the query variable and the latent variables) has the important property that it is exactly the error incurred from ignoring a single latent variable when sampling values for xq . [sent-96, score-1.351]
55 For example, suppose we design an approximate query specific sampler that saves computational resources by ignoring a particular random variable xl . [sent-97, score-0.823]
56 Then, the variable xl will remain at its burned-in value xl =vl for the duration of query specific sampling. [sent-98, score-1.056]
57 As a result, the chain will converge to the invariant distribution π(·|xl =vl ). [sent-99, score-0.235]
58 If we use this conditional distribution to approximate the marginal, then the expected error we incur is exactly the influence score under total variation distance. [sent-100, score-0.221]
59 If p(i) = 1(i ≠ l)/(n − 1) induces an MH kernel that neglects variable x_l, then the expected total variation error ξ_tv of the resulting MH sampling procedure under the model is the total variation influence ι_tv. [sent-102, score-0.757]
60 Proof: The resulting chain has stationary distribution π(xq | xl = vl). [sent-103, score-0.414]
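A small numerical check (ours, with an invented 2×2 joint table) of the claim in Proposition 1: the expected total variation error from freezing x_l at a burned-in value drawn from π(x_l) coincides with ι_tv(x_q, x_l).

```python
import numpy as np

# Made-up joint pi(x_q, x_l) over two binary variables (rows: x_q, cols: x_l).
joint = np.array([[0.40, 0.10],
                  [0.15, 0.35]])
p_xq = joint.sum(axis=1)
p_xl = joint.sum(axis=0)

# Expected error from freezing x_l at its burned-in value v_l ~ pi(x_l):
# E_{v_l}[ || pi(x_q | x_l = v_l) - pi(x_q) ||_tv ].
expected_err = sum(
    p_xl[vl] * 0.5 * np.abs(joint[:, vl] / p_xl[vl] - p_xq).sum()
    for vl in range(2)
)

# Total variation influence iota_tv(x_q, x_l).
iota_tv = 0.5 * np.abs(joint - np.outer(p_xq, p_xl)).sum()

print(expected_err, iota_tv)   # the two quantities coincide, as in Proposition 1
```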
61 We are now justified in selecting variables in proportion to their influence to reduce the error they exert on the query marginal. [sent-105, score-0.558]
62 Note, however, that computing either ιtv or the mutual information is as difficult as inference itself. [sent-107, score-0.189]
63 Thus, we define a computationally efficient variant of influence that we term the influence trail score. [sent-108, score-0.162]
64 The idea is to approximate the true influence as a product of factors along an active trail in the graph. [sent-109, score-0.194]
65 Let ρ = (x0 , x1 , · · · , xr ) be an active trail between the query variable xq and xi where x0 = xq and xr = xi . [sent-111, score-1.612]
66 Let φ(xi , xj ) be the approximate joint distribution between xi and xj according only to the mutual factors in their scopes. [sent-112, score-0.24]
67 The influence trail score with respect to an active trail ρ is τ_ρ(xq, xi) := ∏_{i=1}^{r−1} f(φ_i(x_i, x_{i+1}), φ_i(x_i)φ_i(x_{i+1})) (9). The influence trail score is efficient to compute because all factors and variables outside the mutual scopes of each variable pair are ignored. [sent-114, score-0.925]
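A minimal sketch (ours) of the influence trail computation: given pairwise joint tables φ_i along an active trail from x_q to x_i (the tables below are invented), the per-edge influences are multiplied together, here with the total-variation choice of f.

```python
import numpy as np

def edge_influence(phi):
    # f(phi(a, b), phi(a) phi(b)) with f = total variation distance.
    pa, pb = phi.sum(axis=1), phi.sum(axis=0)
    return 0.5 * np.abs(phi - np.outer(pa, pb)).sum()

# Hypothetical local joints along a trail x_q = x^0 - x^1 - x^2 = x_i,
# each built only from the factors shared by the adjacent pair of variables.
trail_joints = [
    np.array([[0.45, 0.05], [0.05, 0.45]]),   # phi(x^0, x^1)
    np.array([[0.30, 0.20], [0.20, 0.30]]),   # phi(x^1, x^2)
]

trail_score = 1.0
for phi in trail_joints:
    trail_score *= edge_influence(phi)        # product over the edges of the trail

print(trail_score)
```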
68 In the experimental results we evaluate both the influence and the influence trail and find that they perform similarly and outperform competing graph-based heuristics for determining p. [sent-115, score-0.162]
69 While in general it is difficult to uniformly state that one choice of p converges faster than another for all models and queries, we present the following analysis showing that even an approximate query aware sampler can exhibit faster finite time convergence progress than an exact sampler. [sent-116, score-0.703]
70 Let K be an exact MCMC kernel that converges to the correct stationary distribution and let L be an approximate kernel that exclusively samples the query variable and thus converges to the conditional distribution of the query variable. [sent-117, score-1.254]
71 Extrapolating Proposition 1, we know that the error incurred from neglecting to sample the latent variables is the influence ιtv between the joint distribution of the latent variables and the query variable. [sent-119, score-0.84]
72 Observe that L is simultaneously making progress towards two distributions: its own invariant distribution and the invariant distribution of K plus an error term. [sent-120, score-0.206]
73 The amount of time that L (the query-only kernel) is closer to K's stationary distribution π_k can be determined by solving for t, yielding the fixed point iteration: t = log(γ_l^t + ι_tv) / log(γ_k) (13). The one-step approximation yields a non-trivial, but conservative bound: t ≥ log(γ_l + ι_tv) / log(γ_k). [sent-122, score-0.8]
74 This implies that the strategy of exclusively sampling the query variables can achieve faster short-term convergence to the correct invariant distribution even though asymptotic convergence is to the incorrect invariant distribution. [sent-124, score-0.958]
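To illustrate the fixed point in (13), a short sketch (ours, with invented rates γ_k, γ_l and influence ι_tv) that iterates the relation to convergence and compares it with the conservative one-step bound.

```python
import math

# Invented values for illustration: gamma_k, gamma_l are the geometric convergence
# rates of the exact kernel K and the query-only kernel L; iota_tv is the influence error.
gamma_k, gamma_l, iota_tv = 0.9, 0.5, 0.05

# One-step conservative bound: t >= log(gamma_l + iota_tv) / log(gamma_k).
t_lower = math.log(gamma_l + iota_tv) / math.log(gamma_k)

# Fixed-point iteration of t = log(gamma_l**t + iota_tv) / log(gamma_k), eq. (13).
t = 1.0
for _ in range(100):
    t = math.log(gamma_l ** t + iota_tv) / math.log(gamma_k)

# Roughly how long L stays closer than K to K's stationary distribution.
print(t_lower, t)
```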
75 The decayed MCMC algorithm for filtering [12] can be thought of as a special case of our method where the model is a linear chain, and the query is for the last variable in the sequence. [sent-129, score-0.571]
76 MCMC has also recently been deployed in probabilistic databases [13] where it is possible to incorporate the deterministic constraints of a relational algebra query directly into a Metropolis-Hastings proposal distribution to obtain quicker answers [14, 15]. [sent-132, score-0.702]
77 A related idea from statistics is data augmentation (or auxiliary variable) approaches to sampling where latent variables are artificially introduced into the model to improve convergence of the original variables (e. [sent-133, score-0.39]
78 In this setting, we see QAM as a way of determining a more sophisticated variable selection strategy that can balance sampling efforts between the original and auxiliary variables. [sent-136, score-0.153]
79 5 Experiments In this section we demonstrate the effectiveness and broad applicability of query aware MCMC (QAM) by showing superior convergence rates to the query marginals across a diverse range of graphical models that vary widely in structure. [sent-137, score-1.265]
80 In our experiments, we generate a wide range of graphical models and evaluate the convergence of each chain exactly, avoiding noisy empirical sampling error by performing exact computations with full transition kernels. [sent-138, score-0.488]
81 Query-only Metropolis-Hastings (qo): p(xi ) = 1(xq = xi ); on six different graphical models with varying parameters generated from a Beta(2,2) distribution (this ensures an interesting dynamic range over the event space). [sent-145, score-0.305]
82 For our experiments we randomly generate ten parameter settings for each of the six model types and measure convergence of the six chains to the single-variable marginal query π(xq) for each variable in each of the sixty realized models. [sent-154, score-0.824]
83 Convergence is measured using the total variation norm: ||π(xq) − π^(t)(xq)||_tv. [sent-155, score-0.415]
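A sketch (ours) of this exact evaluation protocol on a tiny invented state space: the full kernel is applied repeatedly to the initial distribution, and at each step the time-t distribution is marginalized onto the query variable and compared to the stationary marginal in total variation.

```python
import numpy as np

# States enumerate joint assignments (x_q, x_l) of two binary variables: 00, 01, 10, 11.
states = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Made-up row-stochastic transition kernel over the four joint states.
K = np.array([[0.70, 0.10, 0.10, 0.10],
              [0.10, 0.60, 0.10, 0.20],
              [0.15, 0.05, 0.65, 0.15],
              [0.05, 0.25, 0.10, 0.60]])

# Stationary distribution pi: the left eigenvector of K with eigenvalue 1.
w, v = np.linalg.eig(K.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

def query_marginal(dist):
    # Marginalize a distribution over joint states onto the query variable x_q.
    m = np.zeros(2)
    for s, (xq, _) in enumerate(states):
        m[xq] += dist[s]
    return m

mu = np.array([1.0, 0.0, 0.0, 0.0])     # start from a point mass
for t in range(1, 31):
    mu = mu @ K
    tv = 0.5 * np.abs(query_marginal(pi) - query_marginal(mu)).sum()
    print(t, tv)                        # ||pi(x_q) - pi^(t)(x_q)||_tv at each step
```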
84 Generally, all the query specific sampling chains converge more quickly than the uniform baseline in the early iterations across every model. [sent-158, score-0.688]
85 The query-only and mutual information chains exhibit the most rapid convergence in the early stages, with the query-only chain converging to an incorrect distribution and the mutual information chain slowly converging during the later time stages. [sent-160, score-0.778]
86 Finally, notice that the influence-trail variant of total variation influence converges at a similar rate to the actual total variation influence, and in some cases converges more quickly (e. [sent-162, score-0.404]
87 In the next experiment, we demonstrate how the size of the graphical model affects convergence of the various chains. [sent-165, score-0.215]
88 In particular, we plot the convergence of all chains on six different hoop-structured models containing three, four, six, eight, ten, and twelve variables (Figure 2). [sent-166, score-0.309]
89 That is, we measure the difference in convergence rates ||π − π_0 K^t_Unif||_tv − ||π − π_0 K^t_QAM||_tv, so that points above the line x = 0 mean that QAM is closer to the answer than the uniform baseline and points below the line mean that QAM is further from the answer. [sent-168, score-0.729]
90 As expected, increasing the number of variables in the graph increases the opportunities for query specific sampling and thus increases QAM’s advantage over traditional MCMC. [sent-169, score-0.703]
91 6 Conclusion In this paper we presented a query-aware approach to MCMC, motivated by the need to answer queries over large scale graphical models. [sent-170, score-0.328]
92 We found that the query-aware sampling methods outperform the traditional Metropolis Hastings sampler across all models in the early time steps. [sent-171, score-0.171]
93 Further, as the number of variables in the models increases, the query aware samplers not only outperform the baseline for longer periods of time, but also exhibit more dramatic convergence rate improvements. [sent-172, score-0.755]
94 Thus, query specific sampling is a promising approach for approximately answering queries on real-world probabilistic databases (and relational models) that contain billions of variables. [sent-173, score-0.832]
95 An exciting area of future work is to combine query specific sampling with adaptive MCMC techniques allowing the kernel to evolve in response to the underlying distribution. [sent-175, score-0.572]
96 There has been little theoretical work on analyzing marginal convergence of MCMC chains and future work can help develop these tools. [sent-177, score-0.204]
97 Figure 1: Convergence to the query marginals of the stationary distribution from an initial uniform distribution. [sent-203, score-0.661]
98 Figure 2: Improvement over uniform p as the number of variables increases. [sent-222, score-0.212]
99 As the number of variables increases, the improvements of the query specific techniques increase. [sent-224, score-0.254]
100 BayesStore: Managing large, uncertain data repositories with probabilistic graphical models. [sent-252, score-0.184]
wordName wordTfidf (topN-words)
[('query', 0.46), ('xq', 0.392), ('tv', 0.261), ('xl', 0.259), ('vl', 0.203), ('mcmc', 0.194), ('qam', 0.164), ('uence', 0.163), ('trail', 0.162), ('graphical', 0.137), ('chain', 0.132), ('queries', 0.119), ('variation', 0.116), ('mutual', 0.107), ('xe', 0.099), ('variables', 0.098), ('inference', 0.082), ('wick', 0.082), ('variable', 0.078), ('convergence', 0.078), ('sampling', 0.075), ('answer', 0.072), ('chains', 0.066), ('hastings', 0.066), ('invariant', 0.066), ('marginals', 0.065), ('xi', 0.064), ('marginal', 0.06), ('databases', 0.06), ('uniform', 0.057), ('samplers', 0.054), ('metropolis', 0.054), ('delity', 0.054), ('loopy', 0.049), ('markov', 0.047), ('probabilistic', 0.047), ('vq', 0.047), ('joseph', 0.045), ('propagation', 0.044), ('traditional', 0.044), ('lt', 0.042), ('vldb', 0.042), ('stationary', 0.042), ('six', 0.041), ('daisy', 0.041), ('garofalakis', 0.041), ('gerome', 0.041), ('hoop', 0.041), ('kmh', 0.041), ('minos', 0.041), ('zhe', 0.041), ('state', 0.041), ('mh', 0.041), ('mccallum', 0.041), ('latent', 0.041), ('transition', 0.04), ('relational', 0.04), ('stages', 0.04), ('aware', 0.039), ('connected', 0.039), ('michael', 0.039), ('andrew', 0.038), ('total', 0.038), ('kernel', 0.037), ('distribution', 0.037), ('belief', 0.037), ('stats', 0.036), ('coreference', 0.036), ('advancement', 0.036), ('iarpa', 0.036), ('carlo', 0.035), ('improvement', 0.034), ('monte', 0.034), ('neglecting', 0.033), ('decayed', 0.033), ('converges', 0.033), ('extraction', 0.033), ('incurred', 0.032), ('factors', 0.032), ('statistically', 0.032), ('distance', 0.031), ('billions', 0.031), ('kl', 0.031), ('algebra', 0.03), ('quickly', 0.03), ('score', 0.03), ('fully', 0.03), ('ergodic', 0.03), ('bases', 0.029), ('proposal', 0.028), ('cancels', 0.028), ('divergence', 0.028), ('massachusetts', 0.028), ('amherst', 0.027), ('contract', 0.026), ('pw', 0.026), ('sampler', 0.026), ('graph', 0.026), ('models', 0.026), ('converging', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999952 229 nips-2011-Query-Aware MCMC
Author: Michael L. Wick, Andrew McCallum
Abstract: Traditional approaches to probabilistic inference such as loopy belief propagation and Gibbs sampling typically compute marginals for all the unobserved variables in a graphical model. However, in many real-world applications the user’s interests are focused on a subset of the variables, specified by a query. In this case it would be wasteful to uniformly sample, say, one million variables when the query concerns only ten. In this paper we propose a query-specific approach to MCMC that accounts for the query variables and their generalized mutual information with neighboring variables in order to achieve higher computational efficiency. Surprisingly there has been almost no previous work on query-aware MCMC. We demonstrate the success of our approach with positive experimental results on a wide range of graphical models. 1
2 0.1573371 231 nips-2011-Randomized Algorithms for Comparison-based Search
Author: Dominique Tschopp, Suhas Diggavi, Payam Delgosha, Soheil Mohajer
Abstract: This paper addresses the problem of finding the nearest neighbor (or one of the R-nearest neighbors) of a query object q in a database of n objects, when we can only use a comparison oracle. The comparison oracle, given two reference objects and a query object, returns the reference object most similar to the query object. The main problem we study is how to search the database for the nearest neighbor (NN) of a query, while minimizing the questions. The difficulty of this problem depends on properties of the underlying database. We show the importance of a characterization: combinatorial disorder D which defines approximate triangle n inequalities on ranks. We present a lower bound of Ω(D log D + D2 ) average number of questions in the search phase for any randomized algorithm, which demonstrates the fundamental role of D for worst case behavior. We develop 3 a randomized scheme for NN retrieval in O(D3 log2 n + D log2 n log log nD ) 3 questions. The learning requires asking O(nD3 log2 n + D log2 n log log nD ) questions and O(n log2 n/ log(2D)) bits to store.
3 0.15614508 22 nips-2011-Active Ranking using Pairwise Comparisons
Author: Kevin G. Jamieson, Robert Nowak
Abstract: This paper examines the problem of ranking a collection of objects using pairwise comparisons (rankings of two objects). In general, the ranking of n objects can be identified by standard sorting methods using n log2 n pairwise comparisons. We are interested in natural situations in which relationships among the objects may allow for ranking using far fewer pairwise comparisons. Specifically, we assume that the objects can be embedded into a d-dimensional Euclidean space and that the rankings reflect their relative distances from a common reference point in Rd . We show that under this assumption the number of possible rankings grows like n2d and demonstrate an algorithm that can identify a randomly selected ranking using just slightly more than d log n adaptively selected pairwise comparisons, on average. If instead the comparisons are chosen at random, then almost all pairwise comparisons must be made in order to identify any ranking. In addition, we propose a robust, error-tolerant algorithm that only requires that the pairwise comparisons are probably correct. Experimental studies with synthetic and real datasets support the conclusions of our theoretical analysis. 1
4 0.13594514 301 nips-2011-Variational Gaussian Process Dynamical Systems
Author: Neil D. Lawrence, Michalis K. Titsias, Andreas Damianou
Abstract: High dimensional time series are endemic in applications of machine learning such as robotics (sensor data), computational biology (gene expression data), vision (video sequences) and graphics (motion capture data). Practical nonlinear probabilistic approaches to this data are required. In this paper we introduce the variational Gaussian process dynamical system. Our work builds on recent variational approximations for Gaussian process latent variable models to allow for nonlinear dimensionality reduction simultaneously with learning a dynamical prior in the latent space. The approach also allows for the appropriate dimensionality of the latent space to be automatically determined. We demonstrate the model on a human motion capture data set and a series of high resolution video sequences. 1
5 0.13386175 226 nips-2011-Projection onto A Nonnegative Max-Heap
Author: Jun Liu, Liang Sun, Jieping Ye
Abstract: We consider the problem of computing the Euclidean projection of a vector of length p onto a non-negative max-heap—an ordered tree where the values of the nodes are all nonnegative and the value of any parent node is no less than the value(s) of its child node(s). This Euclidean projection plays a building block role in the optimization problem with a non-negative maxheap constraint. Such a constraint is desirable when the features follow an ordered tree structure, that is, a given feature is selected for the given regression/classification task only if its parent node is selected. In this paper, we show that such Euclidean projection problem admits an analytical solution and we develop a top-down algorithm where the key operation is to find the so-called maximal root-tree of the subtree rooted at each node. A naive approach for finding the maximal root-tree is to enumerate all the possible root-trees, which, however, does not scale well. We reveal several important properties of the maximal root-tree, based on which we design a bottom-up algorithm with merge for efficiently finding the maximal roottree. The proposed algorithm has a (worst-case) linear time complexity for a sequential list, and O(p2 ) for a general tree. We report simulation results showing the effectiveness of the max-heap for regression with an ordered tree structure. Empirical results show that the proposed algorithm has an expected linear time complexity for many special cases including a sequential list, a full binary tree, and a tree with depth 1. 1
6 0.12135402 20 nips-2011-Active Learning Ranking from Pairwise Preferences with Almost Optimal Query Complexity
7 0.11456724 126 nips-2011-Im2Text: Describing Images Using 1 Million Captioned Photographs
8 0.11006974 157 nips-2011-Learning to Search Efficiently in High Dimensions
9 0.093419813 272 nips-2011-Stochastic convex optimization with bandit feedback
10 0.08960247 228 nips-2011-Quasi-Newton Methods for Markov Chain Monte Carlo
11 0.08458893 146 nips-2011-Learning Higher-Order Graph Structure with Features by Structure Penalty
12 0.081212521 158 nips-2011-Learning unbelievable probabilities
13 0.077314079 255 nips-2011-Simultaneous Sampling and Multi-Structure Fitting with Adaptive Reversible Jump MCMC
14 0.075969502 76 nips-2011-Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials
15 0.074549213 8 nips-2011-A Model for Temporal Dependencies in Event Streams
16 0.073730856 55 nips-2011-Collective Graphical Models
17 0.073012874 17 nips-2011-Accelerated Adaptive Markov Chain for Partition Function Computation
18 0.072185941 168 nips-2011-Maximum Margin Multi-Instance Learning
19 0.070918195 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models
20 0.069335632 42 nips-2011-Bayesian Bias Mitigation for Crowdsourcing
topicId topicWeight
[(0, 0.193), (1, 0.011), (2, -0.034), (3, 0.013), (4, -0.05), (5, -0.097), (6, -0.064), (7, -0.137), (8, 0.083), (9, -0.039), (10, -0.057), (11, -0.081), (12, -0.032), (13, 0.025), (14, 0.154), (15, -0.037), (16, 0.082), (17, -0.001), (18, -0.123), (19, 0.069), (20, 0.037), (21, 0.031), (22, 0.202), (23, -0.1), (24, -0.153), (25, 0.037), (26, 0.189), (27, 0.107), (28, 0.022), (29, -0.089), (30, 0.024), (31, -0.187), (32, -0.125), (33, -0.025), (34, -0.129), (35, 0.077), (36, -0.052), (37, -0.106), (38, 0.055), (39, -0.039), (40, -0.023), (41, 0.029), (42, -0.029), (43, -0.036), (44, -0.104), (45, 0.063), (46, 0.029), (47, -0.041), (48, 0.022), (49, -0.109)]
simIndex simValue paperId paperTitle
same-paper 1 0.95719969 229 nips-2011-Query-Aware MCMC
Author: Michael L. Wick, Andrew McCallum
Abstract: Traditional approaches to probabilistic inference such as loopy belief propagation and Gibbs sampling typically compute marginals for all the unobserved variables in a graphical model. However, in many real-world applications the user’s interests are focused on a subset of the variables, specified by a query. In this case it would be wasteful to uniformly sample, say, one million variables when the query concerns only ten. In this paper we propose a query-specific approach to MCMC that accounts for the query variables and their generalized mutual information with neighboring variables in order to achieve higher computational efficiency. Surprisingly there has been almost no previous work on query-aware MCMC. We demonstrate the success of our approach with positive experimental results on a wide range of graphical models. 1
2 0.65968513 22 nips-2011-Active Ranking using Pairwise Comparisons
Author: Kevin G. Jamieson, Robert Nowak
Abstract: This paper examines the problem of ranking a collection of objects using pairwise comparisons (rankings of two objects). In general, the ranking of n objects can be identified by standard sorting methods using n log2 n pairwise comparisons. We are interested in natural situations in which relationships among the objects may allow for ranking using far fewer pairwise comparisons. Specifically, we assume that the objects can be embedded into a d-dimensional Euclidean space and that the rankings reflect their relative distances from a common reference point in Rd . We show that under this assumption the number of possible rankings grows like n2d and demonstrate an algorithm that can identify a randomly selected ranking using just slightly more than d log n adaptively selected pairwise comparisons, on average. If instead the comparisons are chosen at random, then almost all pairwise comparisons must be made in order to identify any ranking. In addition, we propose a robust, error-tolerant algorithm that only requires that the pairwise comparisons are probably correct. Experimental studies with synthetic and real datasets support the conclusions of our theoretical analysis. 1
3 0.65163755 20 nips-2011-Active Learning Ranking from Pairwise Preferences with Almost Optimal Query Complexity
Author: Nir Ailon
Abstract: Given a set V of n elements we wish to linearly order them using pairwise preference labels which may be non-transitive (due to irrationality or arbitrary noise). The goal is to linearly order the elements while disagreeing with as few pairwise preference labels as possible. Our performance is measured by two parameters: The number of disagreements (loss) and the query complexity (number of pairwise preference labels). Our algorithm adaptively queries at most O(n poly(log n, ε−1 )) preference labels for a regret of ε times the optimal loss. This is strictly better, and often significantly better than what non-adaptive sampling could achieve. Our main result helps settle an open problem posed by learning-to-rank (from pairwise information) theoreticians and practitioners: What is a provably correct way to sample preference labels? 1
4 0.58060038 231 nips-2011-Randomized Algorithms for Comparison-based Search
Author: Dominique Tschopp, Suhas Diggavi, Payam Delgosha, Soheil Mohajer
Abstract: This paper addresses the problem of finding the nearest neighbor (or one of the R-nearest neighbors) of a query object q in a database of n objects, when we can only use a comparison oracle. The comparison oracle, given two reference objects and a query object, returns the reference object most similar to the query object. The main problem we study is how to search the database for the nearest neighbor (NN) of a query, while minimizing the questions. The difficulty of this problem depends on properties of the underlying database. We show the importance of a characterization: combinatorial disorder D which defines approximate triangle n inequalities on ranks. We present a lower bound of Ω(D log D + D2 ) average number of questions in the search phase for any randomized algorithm, which demonstrates the fundamental role of D for worst case behavior. We develop 3 a randomized scheme for NN retrieval in O(D3 log2 n + D log2 n log log nD ) 3 questions. The learning requires asking O(nD3 log2 n + D log2 n log log nD ) questions and O(n log2 n/ log(2D)) bits to store.
5 0.52187067 8 nips-2011-A Model for Temporal Dependencies in Event Streams
Author: Asela Gunawardana, Christopher Meek, Puyang Xu
Abstract: We introduce the Piecewise-Constant Conditional Intensity Model, a model for learning temporal dependencies in event streams. We describe a closed-form Bayesian approach to learning these models, and describe an importance sampling algorithm for forecasting future events using these models, using a proposal distribution based on Poisson superposition. We then use synthetic data, supercomputer event logs, and web search query logs to illustrate that our learning algorithm can efficiently learn nonlinear temporal dependencies, and that our importance sampling algorithm can effectively forecast future events. 1
6 0.50988162 55 nips-2011-Collective Graphical Models
7 0.50752789 228 nips-2011-Quasi-Newton Methods for Markov Chain Monte Carlo
8 0.48601338 192 nips-2011-Nonstandard Interpretations of Probabilistic Programs for Efficient Inference
9 0.45910701 272 nips-2011-Stochastic convex optimization with bandit feedback
10 0.45204434 42 nips-2011-Bayesian Bias Mitigation for Crowdsourcing
11 0.44652116 17 nips-2011-Accelerated Adaptive Markov Chain for Partition Function Computation
12 0.44586658 243 nips-2011-Select and Sample - A Model of Efficient Neural Inference and Learning
13 0.42390686 157 nips-2011-Learning to Search Efficiently in High Dimensions
14 0.41110498 201 nips-2011-On the Completeness of First-Order Knowledge Compilation for Lifted Probabilistic Inference
15 0.40851593 255 nips-2011-Simultaneous Sampling and Multi-Structure Fitting with Adaptive Reversible Jump MCMC
16 0.39118898 7 nips-2011-A Machine Learning Approach to Predict Chemical Reactions
17 0.38852739 64 nips-2011-Convergent Bounds on the Euclidean Distance
18 0.38656044 221 nips-2011-Priors over Recurrent Continuous Time Processes
19 0.36555561 301 nips-2011-Variational Gaussian Process Dynamical Systems
20 0.35670465 76 nips-2011-Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials
topicId topicWeight
[(0, 0.034), (4, 0.055), (20, 0.032), (26, 0.084), (31, 0.121), (33, 0.032), (43, 0.045), (44, 0.183), (45, 0.116), (56, 0.012), (57, 0.045), (63, 0.014), (65, 0.015), (74, 0.049), (83, 0.048), (99, 0.042)]
simIndex simValue paperId paperTitle
1 0.96152073 7 nips-2011-A Machine Learning Approach to Predict Chemical Reactions
Author: Matthew A. Kayala, Pierre F. Baldi
Abstract: Being able to predict the course of arbitrary chemical reactions is essential to the theory and applications of organic chemistry. Previous approaches are not highthroughput, are not generalizable or scalable, or lack sufficient data to be effective. We describe single mechanistic reactions as concerted electron movements from an electron orbital source to an electron orbital sink. We use an existing rule-based expert system to derive a dataset consisting of 2,989 productive mechanistic steps and 6.14 million non-productive mechanistic steps. We then pose identifying productive mechanistic steps as a ranking problem: rank potential orbital interactions such that the top ranked interactions yield the major products. The machine learning implementation follows a two-stage approach, in which we first train atom level reactivity filters to prune 94.0% of non-productive reactions with less than a 0.1% false negative rate. Then, we train an ensemble of ranking models on pairs of interacting orbitals to learn a relative productivity function over single mechanistic reactions in a given system. Without the use of explicit transformation patterns, the ensemble perfectly ranks the productive mechanisms at the top 89.1% of the time, rising to 99.9% of the time when top ranked lists with at most four nonproductive reactions are considered. The final system allows multi-step reaction prediction. Furthermore, it is generalizable, making reasonable predictions over reactants and conditions which the rule-based expert system does not handle.
Author: Cristina Savin, Peter Dayan, Máté Lengyel
Abstract: Storing a new pattern in a palimpsest memory system comes at the cost of interfering with the memory traces of previously stored items. Knowing the age of a pattern thus becomes critical for recalling it faithfully. This implies that there should be a tight coupling between estimates of age, as a form of familiarity, and the neural dynamics of recollection, something which current theories omit. Using a normative model of autoassociative memory, we show that a dual memory system, consisting of two interacting modules for familiarity and recollection, has best performance for both recollection and recognition. This finding provides a new window onto actively contentious psychological and neural aspects of recognition memory. 1
same-paper 3 0.83556914 229 nips-2011-Query-Aware MCMC
Author: Michael L. Wick, Andrew McCallum
Abstract: Traditional approaches to probabilistic inference such as loopy belief propagation and Gibbs sampling typically compute marginals for all the unobserved variables in a graphical model. However, in many real-world applications the user’s interests are focused on a subset of the variables, specified by a query. In this case it would be wasteful to uniformly sample, say, one million variables when the query concerns only ten. In this paper we propose a query-specific approach to MCMC that accounts for the query variables and their generalized mutual information with neighboring variables in order to achieve higher computational efficiency. Surprisingly there has been almost no previous work on query-aware MCMC. We demonstrate the success of our approach with positive experimental results on a wide range of graphical models. 1
4 0.73856074 17 nips-2011-Accelerated Adaptive Markov Chain for Partition Function Computation
Author: Stefano Ermon, Carla P. Gomes, Ashish Sabharwal, Bart Selman
Abstract: We propose a novel Adaptive Markov Chain Monte Carlo algorithm to compute the partition function. In particular, we show how to accelerate a flat histogram sampling technique by significantly reducing the number of “null moves” in the chain, while maintaining asymptotic convergence properties. Our experiments show that our method converges quickly to highly accurate solutions on a range of benchmark instances, outperforming other state-of-the-art methods such as IJGP, TRW, and Gibbs sampling both in run-time and accuracy. We also show how obtaining a so-called density of states distribution allows for efficient weight learning in Markov Logic theories. 1
5 0.73810065 22 nips-2011-Active Ranking using Pairwise Comparisons
Author: Kevin G. Jamieson, Robert Nowak
Abstract: This paper examines the problem of ranking a collection of objects using pairwise comparisons (rankings of two objects). In general, the ranking of n objects can be identified by standard sorting methods using n log2 n pairwise comparisons. We are interested in natural situations in which relationships among the objects may allow for ranking using far fewer pairwise comparisons. Specifically, we assume that the objects can be embedded into a d-dimensional Euclidean space and that the rankings reflect their relative distances from a common reference point in Rd . We show that under this assumption the number of possible rankings grows like n2d and demonstrate an algorithm that can identify a randomly selected ranking using just slightly more than d log n adaptively selected pairwise comparisons, on average. If instead the comparisons are chosen at random, then almost all pairwise comparisons must be made in order to identify any ranking. In addition, we propose a robust, error-tolerant algorithm that only requires that the pairwise comparisons are probably correct. Experimental studies with synthetic and real datasets support the conclusions of our theoretical analysis. 1
6 0.73532307 231 nips-2011-Randomized Algorithms for Comparison-based Search
7 0.7278583 197 nips-2011-On Tracking The Partition Function
8 0.72767651 180 nips-2011-Multiple Instance Filtering
9 0.72745818 84 nips-2011-EigenNet: A Bayesian hybrid of generative and conditional models for sparse learning
10 0.72647119 21 nips-2011-Active Learning with a Drifting Distribution
11 0.72563112 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries
12 0.72340721 206 nips-2011-Optimal Reinforcement Learning for Gaussian Systems
13 0.7231881 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs
14 0.72247601 303 nips-2011-Video Annotation and Tracking with Active Learning
15 0.72210485 179 nips-2011-Multilinear Subspace Regression: An Orthogonal Tensor Decomposition Approach
16 0.72158611 64 nips-2011-Convergent Bounds on the Euclidean Distance
17 0.71932292 42 nips-2011-Bayesian Bias Mitigation for Crowdsourcing
18 0.71880484 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions
19 0.71769392 158 nips-2011-Learning unbelievable probabilities
20 0.7167806 283 nips-2011-The Fixed Points of Off-Policy TD