nips nips2011 nips2011-192 knowledge-graph by maker-knowledge-mining

192 nips-2011-Nonstandard Interpretations of Probabilistic Programs for Efficient Inference


Source: pdf

Author: David Wingate, Noah Goodman, Andreas Stuhlmueller, Jeffrey M. Siskind

Abstract: Probabilistic programming languages allow modelers to specify a stochastic process using syntax that resembles modern programming languages. Because the program is in machine-readable format, a variety of techniques from compiler design and program analysis can be used to examine the structure of the distribution represented by the probabilistic program. We show how nonstandard interpretations of probabilistic programs can be used to craft efficient inference algorithms: information about the structure of a distribution (such as gradients or dependencies) is generated as a monad-like side computation while executing the program. These interpretations can be easily coded using special-purpose objects and operator overloading. We implement two examples of nonstandard interpretations in two different languages, and use them as building blocks to construct inference algorithms: automatic differentiation, which enables gradient based methods, and provenance tracking, which enables efficient construction of global proposals. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Probabilistic programming languages allow modelers to specify a stochastic process using syntax that resembles modern programming languages. [sent-7, score-0.366]

2 Because the program is in machine-readable format, a variety of techniques from compiler design and program analysis can be used to examine the structure of the distribution represented by the probabilistic program. [sent-8, score-0.3]

3 We show how nonstandard interpretations of probabilistic programs can be used to craft efficient inference algorithms: information about the structure of a distribution (such as gradients or dependencies) is generated as a monad-like side computation while executing the program. [sent-9, score-0.689]

4 These interpretations can be easily coded using special-purpose objects and operator overloading. [sent-10, score-0.14]

5 We implement two examples of nonstandard interpretations in two different languages, and use them as building blocks to construct inference algorithms: automatic differentiation, which enables gradient based methods, and provenance tracking, which enables efficient construction of global proposals. [sent-11, score-0.84]

6 1 Introduction: Probabilistic programming simplifies the development of probabilistic models by allowing modelers to specify a stochastic process using syntax that resembles modern programming languages. [sent-12, score-0.323]

7 These languages permit arbitrary mixing of deterministic and stochastic elements, resulting in tremendous modeling flexibility. [sent-13, score-0.15]

8 The resulting programs define probabilistic models that serve as prior distributions: running the (unconditional) program forward many times results in a distribution over execution traces, with each trace being a sample from the prior. [sent-14, score-0.297]

9 The primary challenge in developing such languages is scalable inference. [sent-16, score-0.124]

10 But in probabilistic modeling more generally, efficient inference algorithms are designed by taking advantage of structure in distributions. [sent-19, score-0.137]

11 How can we find structure in a distribution defined by a probabilistic program? [sent-20, score-0.081]

12 Here, we show how nonstandard interpretations of probabilistic programs can help craft efficient inference algorithms. [sent-23, score-0.597]

13 We focus on two such interpretations: automatic differentiation and provenance tracking, and show how they can be used as building blocks to construct efficient inference algorithms. [sent-26, score-0.488]

14 We implement nonstandard interpretations in two different languages (Church and Stochastic MATLAB), and experimentally demonstrate that while they typically incur some additional execution overhead, they dramatically improve inference performance. [sent-27, score-0.594]

15 We define an unconditioned probabilistic program to be a parameterless function f with an arbitrary mix of stochastic and deterministic elements (hereafter, we will use the term function and program interchangeably). [sent-30, score-0.21]

16 (Figure code interleaved with sentences 15 and 16, recovered: 1: for i=1:1000  2: if ( rand > 0.5 )  3: X(i) = randn;  4: else ...) [sent-31, score-0.086]

17 The stochastic elements of f must come from a set of known, fixed elementary random primitives, or ERPs. [sent-35, score-0.072]

18 In MATLAB, ERPs may be functions such as rand (sample uniformly from [0,1]) or randn (sample from a standard normal). [sent-37, score-0.110]
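
As a hedged illustration (Python stand-ins for rand and randn; the else-branch of the recovered figure fragment above is not visible, so a placeholder is assumed), the interleaved example program could read:

    # Hedged sketch: an unconditioned probabilistic program as a parameterless
    # Python function mixing deterministic control flow with ERPs (rand, randn).
    import random

    def f():
        X = []
        for i in range(1000):
            if random.random() > 0.5:         # ERP: rand, uniform on [0, 1]
                X.append(random.gauss(0, 1))  # ERP: randn, standard normal
            else:
                X.append(0.0)                 # placeholder for the unrecovered else-branch
        return X

    trace = f()   # one forward run = one sample (execution trace) from the prior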

19 Let f_{k | x_1, ..., x_{k-1}} be the k'th ERP encountered while executing f, and let x_k be the value it returns. [sent-48, score-0.095]

20 By f_k, it should therefore be understood that we mean f_{k | x_1, ..., x_{k-1}}, and by p_{t_k}(x_k | θ_{t_k}) we mean p_{t_k}(x_k | θ_{t_k}, x_1, ..., x_{k-1}). [sent-54, score-0.07]
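
Taken together with the later statement that p(x) is the product of the individual choices, these definitions give the trace probability (a reconstruction from this summary, not a quoted sentence):

    p(x) = ∏_k p_{t_k}(x_k | θ_{t_k}, x_1, ..., x_{k-1})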

21 Nonstandard Interpretations of Probabilistic Programs: With an outline of probabilistic programming in hand, we now turn to nonstandard interpretations. [sent-60, score-0.414]

22 The idea of nonstandard interpretations originated in model theory and mathematical logic, where it was proposed that a set of axioms could be interpreted by different models. [sent-61, score-0.373]

23 For example, differential geometry can be considered a nonstandard interpretation of classical arithmetic. [sent-62, score-0.284]

24 In programming, a nonstandard interpretation replaces the domain of the variables in the program with a new domain, and redefines the semantics of the operators in the program to be consistent with the new domain. [sent-63, score-0.485]

25 This allows reuse of program syntax while implementing new functionality. [sent-64, score-0.121]

26 Practically, many useful nonstandard interpretations can be implemented with operator overloading: variables are redefined to be objects with operators that implement special functionality, such as tracing, reference counting, or profiling. [sent-66, score-0.425]

27 For the purposes of inference in probabilistic programs, we will augment each random choice x_k with additional side information s_k, and replace each x_k with the tuple (x_k, s_k). [sent-67, score-0.302]

28 The native interpreter for the probabilistic program can then interpret the source code as a sequence of operations on these augmented data types. [sent-68, score-0.189]

29 3 Automatic Differentiation: For probabilistic models with many continuous-valued random variables, the gradient of the likelihood ∇_x p(x) provides local information that can significantly improve the properties of Monte Carlo inference algorithms. [sent-70, score-0.209]

30 We use automatic differentiation (AD) [3, 7], a nonstandard interpretation that automatically constructs ∇_x p(x). [sent-74, score-0.391]

31 The automatic nature of AD is critical because it relieves the programmer from hand-computing derivatives for each model; moreover, some probabilistic programs dynamically create or delete random variables, making simple closed-form expressions for the gradient very difficult to find. [sent-75, score-0.312]

32 To do this, AD relies on the chain rule to decompose the derivative of f into derivatives of its sub-functions: ultimately, known derivatives of elementary functions are composed together to yield the derivative of the compound function. [sent-77, score-0.218]

33 This composition can be computed as a nonstandard interpretation of the underlying elementary functions. [sent-78, score-0.33]

34 The derivative computation as a composition of the derivatives of the elementary functions can be performed in different orders. [sent-79, score-0.132]

35 In forward mode AD [27], computation of the derivative proceeds by propagating perturbations of the input toward the output. [sent-80, score-0.097]

36 This can be done by a nonstandard interpretation that extends each real value to the first two terms of its Taylor expansion [26], overloading each elementary function to operate on these real “polynomials”. [sent-81, score-0.424]
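
As an illustrative sketch (Python, not the paper's Scheme implementation; names are my own), forward-mode AD can be realized by overloading operators on a (value, perturbation) pair, i.e., the first two Taylor terms:

    # Hedged sketch of forward-mode AD via operator overloading (dual numbers).
    class Dual:
        def __init__(self, val, dot=0.0):
            self.val = val   # primal value
            self.dot = dot   # perturbation (derivative w.r.t. the chosen input)

        def __add__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.val + other.val, self.dot + other.dot)

        __radd__ = __add__

        def __mul__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            # product rule
            return Dual(self.val * other.val,
                        self.dot * other.val + self.val * other.dot)

        __rmul__ = __mul__

    # d/dx of f(x) = x*x + 3*x at x = 2: seed the input's perturbation with 1.
    x = Dual(2.0, 1.0)
    y = x * x + 3 * x
    print(y.val, y.dot)   # 10.0 7.0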

37 In reverse mode AD [25], computation of the derivative proceeds by propagating sensitivities of the output toward the input. [sent-83, score-0.106]

38 One way this can be done is by a nonstandard interpretation that extends each real value into a “tape” that captures the trace of the real computation which led to that value from the inputs, overloading each elementary function to incrementally construct these tapes. [sent-84, score-0.453]
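
Similarly, a hedged Python sketch of tape-based reverse mode: overloaded operations record the computation that produced each value, and sensitivities are then pushed backwards along that record (the global TAPE list is a simplification for illustration):

    # Hedged sketch of reverse-mode AD: each overloaded operation appends to a
    # global tape; sensitivities are propagated by walking the tape backwards.
    TAPE = []

    class Var:
        def __init__(self, val, parents=()):
            self.val = val
            self.parents = parents   # tuples of (parent Var, local partial derivative)
            self.grad = 0.0
            TAPE.append(self)        # record creation order

        def __add__(self, other):
            return Var(self.val + other.val, ((self, 1.0), (other, 1.0)))

        def __mul__(self, other):
            return Var(self.val * other.val, ((self, other.val), (other, self.val)))

    def backward(output):
        output.grad = 1.0
        for node in reversed(TAPE):          # reverse topological order
            for parent, local in node.parents:
                parent.grad += local * node.grad

    a, b = Var(2.0), Var(3.0)
    c = a * b + a                            # c = a*b + a
    backward(c)
    print(a.grad, b.grad)                    # 4.0 2.0  (dc/da = b + 1, dc/db = a)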

39 Additionally, overloading and AD are well-established techniques that have been applied to machine learning, and even to application-specific programming languages for machine learning, e.g. [sent-103, score-0.295]

40 In particular, DYNA applies a nonstandard interpretation for ∧ and ∨ as a semiring (× and +, + and max, ...). [sent-106, score-0.284]

41 and uses AD to derive the outside algorithm from the inside algorithm and support parameter estimation, but unlike probabilistic programming, it does not model general stochastic processes and does not do general inference over such. [sent-110, score-0.163]

42 Our use of overloading and AD differs in that it facilitates inference in complicated models of general stochastic processes formulated as probabilistic programs. [sent-111, score-0.257]

43 Probabilistic programming provides a powerful and convenient framework for formulating complicated models and, more importantly, separating such models from orthogonal inference mechanisms. [sent-112, score-0.133]

44 Moreover, overloading provides a convenient mechanism for implementing many such inference mechanisms (e.g. [sent-113, score-0.15]

45 Langevin MC, Hamiltonian MCMC, Provenance Tracking, as demonstrated below) in a probabilistic programming language. [sent-115, score-0.158]

46 Figure 1: Code for the structured Perlin noise generator (Church/Scheme fragment ending in a map over grid coordinates via perlin-pt). [sent-118, score-0.094]

47 For example, Hessian matrices can be used to construct blocked Metropolis moves [9] or proposals based on Newton’s method [19], or as part of Riemannian manifold methods [5]. [sent-132, score-0.123]

48 We used an implementation of AD based on [17] that uses hygienic operator overloading to do both forward and reverse mode AD for Scheme (the target language of the Bher compiler). [sent-135, score-0.255]

49 p(x) is the product of the individual choices made by each x_i (though each probability can depend on previous choices, through the program evaluation). [sent-138, score-0.086]

50 The gradient is then computed in reverse mode, by “back-propagating” along this graph. [sent-141, score-0.079]

51 Since program states may contain a combination of discrete and continuous ERPs, we use an overall cycle kernel which alternates between a standard MH kernel for individual discrete random variables and an HMC kernel for all continuous random choices. [sent-143, score-0.086]
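
For concreteness, a hedged sketch of the HMC part of such a cycle kernel in Python (step size, trajectory length, and the log_p / grad_log_p callables are illustrative assumptions; grad_log_p would come from the AD interpretation):

    # Hedged sketch of one HMC update over the continuous ERPs, using an
    # AD-supplied gradient of the log-likelihood. All parameters are illustrative.
    import math, random

    def hmc_step(x, log_p, grad_log_p, eps=0.01, n_steps=20):
        p = [random.gauss(0, 1) for _ in x]                  # sample momentum
        x_new, p_new = list(x), list(p)
        g = grad_log_p(x_new)
        p_new = [pi + 0.5 * eps * gi for pi, gi in zip(p_new, g)]   # half step
        for _ in range(n_steps):                             # leapfrog integration
            x_new = [xi + eps * pi for xi, pi in zip(x_new, p_new)]
            g = grad_log_p(x_new)
            p_new = [pi + eps * gi for pi, gi in zip(p_new, g)]
        p_new = [pi - 0.5 * eps * gi for pi, gi in zip(p_new, g)]   # undo extra half step
        # Metropolis accept/reject on the Hamiltonian
        H_old = -log_p(x) + 0.5 * sum(pi * pi for pi in p)
        H_new = -log_p(x_new) + 0.5 * sum(pi * pi for pi in p_new)
        return x_new if random.random() < math.exp(min(0.0, H_old - H_new)) else x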

52 This experiment illustrates how the automatic nature of the gradients is most helpful, as it would be time-consuming to compute these gradients by hand, particularly since we are free to condition using any function of the image. [sent-158, score-0.146]

53 4 Provenance Tracking for Fine-Grained Dynamic Dependency Analysis: One reason gradient-based inference algorithms are effective is that the chain rule of derivatives propagates information backwards from the data up to the proposal variables. [sent-174, score-0.19]

54 We now introduce a new nonstandard interpretation based on provenance tracking (PT). [sent-177, score-0.676]

55 In programming language theory, the provenance of a variable is the history of variables and computations that combined to form its value. [sent-178, score-0.418]

56 We then use this provenance information to construct good global proposals for discrete variables as part of a novel factored multiple-try MCMC algorithm. [sent-180, score-0.453]

57 Because provenance information is much coarser than gradient information, the operators in PT objects have a particularly simple form; most program expressions can be covered by considering a few cases. [sent-183, score-0.49]

58 Let R(x) ⊂ X define the provenance of a variable x. [sent-185, score-0.296]

59 Given R(x), the provenance of expressions involving x can be computed by breaking down expressions into a sequence of unary operations, binary operations, and function applications. [sent-186, score-0.366]

60 Let x and y be expressions in the program (consisting of an arbitrary mix of variables, constants, functions and operators). [sent-188, score-0.121]

61 For a binary operation x ⊙ y, the provenance R(x ⊙ y) of the result is defined to be R(x ⊙ y) = R(x) ∪ R(y). [sent-189, score-0.296]

62 Similarly, for a unary operation, the provenance R(⊙x) = R(x). [sent-190, score-0.296]

63 Strictly speaking, the previous rules track a superset of provenance information because some functions and operations are constant for certain inputs. [sent-200, score-0.296]

64 In the case of probabilistic programming, recall that random variables (or ERPs) are represented as stochastic functions f_i that accept parameters θ_i. [sent-203, score-0.187]

65 Whenever a random variable is conditioned, the output of the corresponding f_i is fixed; thus, while the likelihood of a particular output of f_i depends on θ_i, the specific output of f_i does not. [sent-204, score-0.115]
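
A hedged Python sketch of these provenance rules (illustrative; the paper works with MATLAB objects): each PT value carries its provenance set R(x), and overloaded operators take unions:

    # Hedged sketch of provenance tracking (PT) via operator overloading.
    # Each PT value carries R(x): the set of ERP indices that influenced it.
    class PT:
        def __init__(self, val, prov=frozenset()):
            self.val = val
            self.prov = frozenset(prov)   # R(x)

        def _lift(self, other):
            # constants carry empty provenance
            return other if isinstance(other, PT) else PT(other)

        def __add__(self, other):                  # binary op: R(x + y) = R(x) ∪ R(y)
            other = self._lift(other)
            return PT(self.val + other.val, self.prov | other.prov)

        def __mul__(self, other):                  # binary op: R(x * y) = R(x) ∪ R(y)
            other = self._lift(other)
            return PT(self.val * other.val, self.prov | other.prov)

        def __neg__(self):                         # unary op: R(-x) = R(x)
            return PT(-self.val, self.prov)

    # Two ERP outputs tagged with their indices; a downstream expression then
    # records which random choices it depends on.
    x1 = PT(0.7, {1})
    x2 = PT(-1.3, {2})
    y = x1 * 2.0 + (-x2)
    print(y.val, y.prov)   # ≈ 2.7, frozenset({1, 2})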

66 Here, we illustrate one use: to help construct good block proposals for MH inference. [sent-208, score-0.123]

67 Our basic idea is to construct a good global proposal by starting with a random global proposal (which is unlikely to be good) and then inhibiting the bad parts. [sent-209, score-0.115]

68 We do this by allowing each element of the likelihood to “vote” on which proposals seemed good. [sent-210, score-0.151]

69 In step (3), we accept or reject each element of the proposal based on a function α. [sent-217, score-0.117]

70 In step (4) we construct a new proposal x_M by "mixing" two states: we set the variables in the accepted set A to the values of x'_O, and we leave the variables in the rejected set R at their original values in x_O. [sent-220, score-0.155]

71 Finally, in step (9) we accept or reject the overall proposal. [sent-223, score-0.074]

72 We use α(x_O, x'_O) to allow the likelihood to "vote" in a fine-grained way for which proposals seemed good and which seemed bad. [sent-224, score-0.18]

73 We also use PT to compute p(x'_O), again tracking dependents D(i; x'_O), and let D(i) be the joint set of ERPs that x_i influences in either state x_O or x'_O. [sent-227, score-0.12]

74 We assign "credit" to each i as if it were the only proposal – that is, we assume that if, for example, the likelihood went up, it was entirely due to the change in x_{O_i}. [sent-229, score-0.071]

75 3: Decide to accept or reject each element of x'_O. [sent-242, score-0.023]

76 Let A be the set of indices of accepted proposals, and R the set of rejected ones. [sent-244, score-0.083]

77 4: Construct a new state x_M that mixes new values for the ERPs from the accepted set A and old values for the ERPs in the rejected set R. [sent-246, score-0.107]

78 Propose new values for all of the rejected ERPs using xM as the start state, but leave ERPs in the accepted set at their original value. [sent-250, score-0.083]
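
The flow of steps (1)-(9) described around here can be sketched as follows (a heavily simplified Python illustration; propose_all, alpha_vote, log_p and the final acceptance rule are stand-ins and do not reproduce the paper's exact factored multiple-try correction):

    # Hedged, schematic sketch of a factored global proposal step.
    import math, random

    def factored_global_step(x, propose_all, alpha_vote, log_p):
        x_prop = propose_all(x)                       # (1) random global proposal x'_O
        A, R = set(), set()
        for i in x:                                   # (3) per-element vote via α and D(i)
            if random.random() < alpha_vote(i, x, x_prop):
                A.add(i)                              # accepted elements
            else:
                R.add(i)                              # rejected elements
        # (4) mixed state x_M: proposed values for A, original values for R
        x_M = {i: (x_prop[i] if i in A else x[i]) for i in x}
        # (9) overall accept/reject of the mixed state (simplified symmetric ratio)
        log_ratio = log_p(x_M) - log_p(x)
        return x_M if random.random() < math.exp(min(0.0, log_ratio)) else x

    # Here x is a dict mapping ERP indices to values; alpha_vote would use the
    # provenance-derived dependency sets D(i) to assign per-element credit.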

79 3 Experiments and Results: We implemented provenance tracking in Stochastic MATLAB [28] by leveraging MATLAB's object-oriented capabilities, which provide full operator overloading. [sent-259, score-0.415]

80 We tested on four tasks: a Bayesian “mesh induction” task, a small QMR problem, probabilistic matrix factorization [23] and an integer-valued variant of PMF. [sent-260, score-0.081]

81 We measured performance by examining likelihood as a function of wallclock time; an important property of the provenance tracking algorithm is that it can help mitigate constant factors affecting inference performance. [sent-261, score-0.476]

82 5: Bayesian Mesh Induction: given a prior distribution over meshes and a target image, sample a mesh which, when rendered, looks like the target image. [sent-264, score-0.472]

83 The prior is a Gaussian centered around a "mean mesh," which is a perfect sphere; Gaussian noise is added to each vertex to deform the mesh. (Figure code interleaved with these sentences, recovered: 1: function X = bmi( base mesh )  2: mesh = base mesh + randn;  3: img = render( mesh );  4: X = img + randn;) [sent-265, score-0.172]
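
The recovered bmi fragment above corresponds to a generative program along these lines (a hedged Python sketch; render and the vertex-list mesh representation are illustrative stand-ins, not the paper's renderer):

    # Hedged sketch of the Bayesian mesh induction generative program.
    import random

    def render(mesh):
        # placeholder renderer: each "pixel" is a simple function of one vertex
        return [v * 0.5 for v in mesh]

    def bmi(base_mesh):
        # Gaussian prior around the "mean mesh": perturb each vertex
        mesh = [v + random.gauss(0, 1) for v in base_mesh]
        img = render(mesh)
        # per-pixel Gaussian observation noise
        X = [p + random.gauss(0, 1) for p in img]
        return X

    sample_image = bmi([0.0] * 12)   # forward sample from the prior over rendered images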

84 No gradients are available for this renderer, but it is reasonably easy to augment it with provenance information recording vertices of the triangle that were responsible for each pixel. [sent-269, score-0.348]

85 This allows us to make proposals to mesh vertices, while assigning credit based on pixel likelihoods. [sent-270, score-0.234]

86 Even though the renderer is quite fast, MCMC with simple proposals fails: after proposing a change to a single variable, it must re-render the image in order to compute the likelihood. [sent-273, score-0.134]

87 In contrast, making large, global proposals is very effective. [sent-274, score-0.094]

88 Here, MCMC with provenance tracking is effective: it finds high-likelihood solutions quickly, again outperforming naive MCMC. [sent-288, score-0.392]

89 In Fig. 4, we see that MCMC with provenance tracking is able to find regions of much higher likelihood much more quickly than naive MCMC. [sent-294, score-0.42]

90 We also compared to an efficient hand-coded MCMC sampler which is capable of making, scoring and accepting/rejecting about 20,000 proposals per second. [sent-295, score-0.117]

91 Interestingly, MCMC with provenance tracking is more efficient than the hand-coded sampler, presumably because of the economies of scale that come with making global proposals. [sent-296, score-0.392]

92 5 Conclusions: We have shown how nonstandard interpretations of probabilistic programs can be used to extract structural information about a distribution, and how this information can be used as part of a variety of inference algorithms. [sent-302, score-0.573]

93 Empirically, we have implemented two such interpretations and demonstrated how this information can be used to find regions of high likelihood quickly, and how it can be used to generate samples with improved statistical properties versus random-walk style MCMC. [sent-304, score-0.145]

94 There are other types of interpretations which could provide additional information. [sent-305, score-0.117]

95 Each of these interpretations can be used alone or in concert with each other; one of the advantages of the probabilistic programming framework is the clean separation of models and inference algorithms, making it easy to explore combinations of inference algorithms for complex models. [sent-307, score-0.387]

96 More generally, this work begins to illuminate the close connections between probabilistic inference and programming language theory. [sent-308, score-0.259]

97 It is likely that other techniques from compiler design and program analysis could be fruitfully applied to inference problems in probabilistic programs. [sent-309, score-0.27]

98 Compiling comp ling: Weighted dynamic programming and the Dyna language. [sent-346, score-0.077]

99 ADOL-C, a package for the automatic differentiation of algorithms written in C/C++. [sent-374, score-0.107]

100 Lightweight implementations of probabilistic programming languages via transformational compilation. [sent-511, score-0.282]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('xo', 0.622), ('provenance', 0.296), ('erps', 0.256), ('nonstandard', 0.256), ('hmc', 0.177), ('xm', 0.155), ('languages', 0.124), ('mcmc', 0.122), ('mesh', 0.118), ('interpretations', 0.117), ('hamiltonian', 0.109), ('atlab', 0.106), ('ad', 0.099), ('qo', 0.098), ('tracking', 0.096), ('proposals', 0.094), ('keypt', 0.094), ('overloading', 0.094), ('program', 0.086), ('probabilistic', 0.081), ('perlin', 0.081), ('programming', 0.077), ('pmf', 0.071), ('randn', 0.067), ('erp', 0.065), ('differentiation', 0.065), ('programs', 0.063), ('mh', 0.062), ('pa', 0.058), ('inference', 0.056), ('xk', 0.055), ('hurch', 0.054), ('qmr', 0.054), ('rejected', 0.052), ('gradients', 0.052), ('accept', 0.051), ('pr', 0.049), ('compiler', 0.047), ('derivatives', 0.047), ('qm', 0.046), ('elementary', 0.046), ('language', 0.045), ('gradient', 0.044), ('metropolis', 0.044), ('proposal', 0.043), ('rand', 0.043), ('pt', 0.042), ('automatic', 0.042), ('lambda', 0.041), ('execution', 0.041), ('dyna', 0.04), ('griewank', 0.04), ('renderer', 0.04), ('tape', 0.04), ('executing', 0.04), ('derivative', 0.039), ('ptk', 0.035), ('syntax', 0.035), ('expressions', 0.035), ('reverse', 0.035), ('factored', 0.034), ('langevin', 0.033), ('mode', 0.032), ('accepted', 0.031), ('seemed', 0.029), ('operators', 0.029), ('construct', 0.029), ('fi', 0.029), ('likelihood', 0.028), ('interpretation', 0.028), ('momentum', 0.028), ('adifor', 0.027), ('bcs', 0.027), ('corliss', 0.027), ('denmark', 0.027), ('fadbad', 0.027), ('ibal', 0.027), ('img', 0.027), ('intlab', 0.027), ('modelers', 0.027), ('noisily', 0.027), ('popl', 0.027), ('pow', 0.027), ('primitives', 0.027), ('siskind', 0.027), ('wingate', 0.027), ('stochastic', 0.026), ('forward', 0.026), ('logic', 0.025), ('craft', 0.024), ('leapfrog', 0.024), ('state', 0.024), ('integer', 0.023), ('operator', 0.023), ('reject', 0.023), ('sampler', 0.023), ('floor', 0.022), ('native', 0.022), ('credit', 0.022), ('lightweight', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000007 192 nips-2011-Nonstandard Interpretations of Probabilistic Programs for Efficient Inference

Author: David Wingate, Noah Goodman, Andreas Stuhlmueller, Jeffrey M. Siskind

Abstract: Probabilistic programming languages allow modelers to specify a stochastic process using syntax that resembles modern programming languages. Because the program is in machine-readable format, a variety of techniques from compiler design and program analysis can be used to examine the structure of the distribution represented by the probabilistic program. We show how nonstandard interpretations of probabilistic programs can be used to craft efficient inference algorithms: information about the structure of a distribution (such as gradients or dependencies) is generated as a monad-like side computation while executing the program. These interpretations can be easily coded using special-purpose objects and operator overloading. We implement two examples of nonstandard interpretations in two different languages, and use them as building blocks to construct inference algorithms: automatic differentiation, which enables gradient based methods, and provenance tracking, which enables efficient construction of global proposals. 1

2 0.15586004 228 nips-2011-Quasi-Newton Methods for Markov Chain Monte Carlo

Author: Yichuan Zhang, Charles A. Sutton

Abstract: The performance of Markov chain Monte Carlo methods is often sensitive to the scaling and correlations between the random variables of interest. An important source of information about the local correlation and scale is given by the Hessian matrix of the target distribution, but this is often either computationally expensive or infeasible. In this paper we propose MCMC samplers that make use of quasiNewton approximations, which approximate the Hessian of the target distribution from previous samples and gradients generated by the sampler. A key issue is that MCMC samplers that depend on the history of previous states are in general not valid. We address this problem by using limited memory quasi-Newton methods, which depend only on a fixed window of previous samples. On several real world datasets, we show that the quasi-Newton sampler is more effective than standard Hamiltonian Monte Carlo at a fraction of the cost of MCMC methods that require higher-order derivatives. 1

3 0.095990732 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models

Author: Le Song, Eric P. Xing, Ankur P. Parikh

Abstract: Latent tree graphical models are natural tools for expressing long range and hierarchical dependencies among many variables which are common in computer vision, bioinformatics and natural language processing problems. However, existing models are largely restricted to discrete and Gaussian variables due to computational constraints; furthermore, algorithms for estimating the latent tree structure and learning the model parameters are largely restricted to heuristic local search. We present a method based on kernel embeddings of distributions for latent tree graphical models with continuous and non-Gaussian variables. Our method can recover the latent tree structures with provable guarantees and perform local-minimum free parameter learning and efficient inference. Experiments on simulated and real data show the advantage of our proposed approach. 1

4 0.063417949 86 nips-2011-Empirical models of spiking in neural populations

Author: Jakob H. Macke, Lars Buesing, John P. Cunningham, Byron M. Yu, Krishna V. Shenoy, Maneesh Sahani

Abstract: Neurons in the neocortex code and compute as part of a locally interconnected population. Large-scale multi-electrode recording makes it possible to access these population processes empirically by fitting statistical models to unaveraged data. What statistical structure best describes the concurrent spiking of cells within a local network? We argue that in the cortex, where firing exhibits extensive correlations in both time and space and where a typical sample of neurons still reflects only a very small fraction of the local population, the most appropriate model captures shared variability by a low-dimensional latent process evolving with smooth dynamics, rather than by putative direct coupling. We test this claim by comparing a latent dynamical model with realistic spiking observations to coupled generalised linear spike-response models (GLMs) using cortical recordings. We find that the latent dynamical approach outperforms the GLM in terms of goodness-offit, and reproduces the temporal correlations in the data more accurately. We also compare models whose observations models are either derived from a Gaussian or point-process models, finding that the non-Gaussian model provides slightly better goodness-of-fit and more realistic population spike counts. 1

5 0.062335327 229 nips-2011-Query-Aware MCMC

Author: Michael L. Wick, Andrew McCallum

Abstract: Traditional approaches to probabilistic inference such as loopy belief propagation and Gibbs sampling typically compute marginals for all the unobserved variables in a graphical model. However, in many real-world applications the user’s interests are focused on a subset of the variables, specified by a query. In this case it would be wasteful to uniformly sample, say, one million variables when the query concerns only ten. In this paper we propose a query-specific approach to MCMC that accounts for the query variables and their generalized mutual information with neighboring variables in order to achieve higher computational efficiency. Surprisingly there has been almost no previous work on query-aware MCMC. We demonstrate the success of our approach with positive experimental results on a wide range of graphical models. 1

6 0.057668343 255 nips-2011-Simultaneous Sampling and Multi-Structure Fitting with Adaptive Reversible Jump MCMC

7 0.055515785 148 nips-2011-Learning Probabilistic Non-Linear Latent Variable Models for Tracking Complex Activities

8 0.051858496 275 nips-2011-Structured Learning for Cell Tracking

9 0.050157838 131 nips-2011-Inference in continuous-time change-point models

10 0.04662744 184 nips-2011-Neuronal Adaptation for Sampling-Based Probabilistic Inference in Perceptual Bistability

11 0.045091029 197 nips-2011-On Tracking The Partition Function

12 0.041153274 180 nips-2011-Multiple Instance Filtering

13 0.040449172 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity

14 0.03955701 243 nips-2011-Select and Sample - A Model of Efficient Neural Inference and Learning

15 0.039396312 221 nips-2011-Priors over Recurrent Continuous Time Processes

16 0.037698202 55 nips-2011-Collective Graphical Models

17 0.037109822 40 nips-2011-Automated Refinement of Bayes Networks' Parameters based on Test Ordering Constraints

18 0.036834154 222 nips-2011-Prismatic Algorithm for Discrete D.C. Programming Problem

19 0.036706783 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs

20 0.036355209 217 nips-2011-Practical Variational Inference for Neural Networks


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.125), (1, 0.011), (2, 0.007), (3, -0.004), (4, -0.033), (5, -0.061), (6, -0.021), (7, -0.071), (8, 0.03), (9, 0.013), (10, 0.007), (11, -0.085), (12, 0.004), (13, -0.022), (14, -0.001), (15, -0.056), (16, -0.005), (17, 0.027), (18, -0.072), (19, 0.023), (20, -0.049), (21, 0.032), (22, 0.009), (23, -0.065), (24, -0.046), (25, 0.033), (26, -0.067), (27, 0.047), (28, 0.033), (29, 0.027), (30, -0.087), (31, -0.019), (32, -0.03), (33, -0.04), (34, -0.091), (35, 0.019), (36, -0.018), (37, -0.049), (38, 0.062), (39, -0.054), (40, -0.007), (41, 0.004), (42, -0.149), (43, -0.008), (44, -0.022), (45, -0.057), (46, 0.064), (47, -0.066), (48, -0.003), (49, 0.122)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.90352696 192 nips-2011-Nonstandard Interpretations of Probabilistic Programs for Efficient Inference

Author: David Wingate, Noah Goodman, Andreas Stuhlmueller, Jeffrey M. Siskind

Abstract: Probabilistic programming languages allow modelers to specify a stochastic process using syntax that resembles modern programming languages. Because the program is in machine-readable format, a variety of techniques from compiler design and program analysis can be used to examine the structure of the distribution represented by the probabilistic program. We show how nonstandard interpretations of probabilistic programs can be used to craft efficient inference algorithms: information about the structure of a distribution (such as gradients or dependencies) is generated as a monad-like side computation while executing the program. These interpretations can be easily coded using special-purpose objects and operator overloading. We implement two examples of nonstandard interpretations in two different languages, and use them as building blocks to construct inference algorithms: automatic differentiation, which enables gradient based methods, and provenance tracking, which enables efficient construction of global proposals. 1

2 0.71242106 228 nips-2011-Quasi-Newton Methods for Markov Chain Monte Carlo

Author: Yichuan Zhang, Charles A. Sutton

Abstract: The performance of Markov chain Monte Carlo methods is often sensitive to the scaling and correlations between the random variables of interest. An important source of information about the local correlation and scale is given by the Hessian matrix of the target distribution, but this is often either computationally expensive or infeasible. In this paper we propose MCMC samplers that make use of quasiNewton approximations, which approximate the Hessian of the target distribution from previous samples and gradients generated by the sampler. A key issue is that MCMC samplers that depend on the history of previous states are in general not valid. We address this problem by using limited memory quasi-Newton methods, which depend only on a fixed window of previous samples. On several real world datasets, we show that the quasi-Newton sampler is more effective than standard Hamiltonian Monte Carlo at a fraction of the cost of MCMC methods that require higher-order derivatives. 1

3 0.57498324 255 nips-2011-Simultaneous Sampling and Multi-Structure Fitting with Adaptive Reversible Jump MCMC

Author: Trung T. Pham, Tat-jun Chin, Jin Yu, David Suter

Abstract: Multi-structure model fitting has traditionally taken a two-stage approach: First, sample a (large) number of model hypotheses, then select the subset of hypotheses that optimise a joint fitting and model selection criterion. This disjoint two-stage approach is arguably suboptimal and inefficient — if the random sampling did not retrieve a good set of hypotheses, the optimised outcome will not represent a good fit. To overcome this weakness we propose a new multi-structure fitting approach based on Reversible Jump MCMC. Instrumental in raising the effectiveness of our method is an adaptive hypothesis generator, whose proposal distribution is learned incrementally and online. We prove that this adaptive proposal satisfies the diminishing adaptation property crucial for ensuring ergodicity in MCMC. Our method effectively conducts hypothesis sampling and optimisation simultaneously, and yields superior computational efficiency over previous two-stage methods. 1

4 0.57420218 55 nips-2011-Collective Graphical Models

Author: Daniel R. Sheldon, Thomas G. Dietterich

Abstract: There are many settings in which we wish to fit a model of the behavior of individuals but where our data consist only of aggregate information (counts or low-dimensional contingency tables). This paper introduces Collective Graphical Models—a framework for modeling and probabilistic inference that operates directly on the sufficient statistics of the individual model. We derive a highlyefficient Gibbs sampling algorithm for sampling from the posterior distribution of the sufficient statistics conditioned on noisy aggregate observations, prove its correctness, and demonstrate its effectiveness experimentally. 1

5 0.53854859 148 nips-2011-Learning Probabilistic Non-Linear Latent Variable Models for Tracking Complex Activities

Author: Angela Yao, Juergen Gall, Luc V. Gool, Raquel Urtasun

Abstract: A common approach for handling the complexity and inherent ambiguities of 3D human pose estimation is to use pose priors learned from training data. Existing approaches however, are either too simplistic (linear), too complex to learn, or can only learn latent spaces from “simple data”, i.e., single activities such as walking or running. In this paper, we present an efficient stochastic gradient descent algorithm that is able to learn probabilistic non-linear latent spaces composed of multiple activities. Furthermore, we derive an incremental algorithm for the online setting which can update the latent space without extensive relearning. We demonstrate the effectiveness of our approach on the task of monocular and multi-view tracking and show that our approach outperforms the state-of-the-art. 1

6 0.50230896 197 nips-2011-On Tracking The Partition Function

7 0.48957694 229 nips-2011-Query-Aware MCMC

8 0.46209058 17 nips-2011-Accelerated Adaptive Markov Chain for Partition Function Computation

9 0.45426151 63 nips-2011-Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization

10 0.44948706 161 nips-2011-Linearized Alternating Direction Method with Adaptive Penalty for Low-Rank Representation

11 0.44055334 8 nips-2011-A Model for Temporal Dependencies in Event Streams

12 0.43397865 243 nips-2011-Select and Sample - A Model of Efficient Neural Inference and Learning

13 0.42631444 221 nips-2011-Priors over Recurrent Continuous Time Processes

14 0.42498201 201 nips-2011-On the Completeness of First-Order Knowledge Compilation for Lifted Probabilistic Inference

15 0.42173257 131 nips-2011-Inference in continuous-time change-point models

16 0.4211753 121 nips-2011-Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent

17 0.42088968 14 nips-2011-A concave regularization technique for sparse mixture models

18 0.40471733 40 nips-2011-Automated Refinement of Bayes Networks' Parameters based on Test Ordering Constraints

19 0.40279976 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models

20 0.38193592 301 nips-2011-Variational Gaussian Process Dynamical Systems


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.024), (4, 0.06), (20, 0.032), (26, 0.049), (27, 0.304), (31, 0.09), (33, 0.024), (43, 0.054), (45, 0.073), (57, 0.022), (63, 0.038), (74, 0.047), (83, 0.045), (99, 0.033)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.76492882 192 nips-2011-Nonstandard Interpretations of Probabilistic Programs for Efficient Inference

Author: David Wingate, Noah Goodman, Andreas Stuhlmueller, Jeffrey M. Siskind

Abstract: Probabilistic programming languages allow modelers to specify a stochastic process using syntax that resembles modern programming languages. Because the program is in machine-readable format, a variety of techniques from compiler design and program analysis can be used to examine the structure of the distribution represented by the probabilistic program. We show how nonstandard interpretations of probabilistic programs can be used to craft efficient inference algorithms: information about the structure of a distribution (such as gradients or dependencies) is generated as a monad-like side computation while executing the program. These interpretations can be easily coded using special-purpose objects and operator overloading. We implement two examples of nonstandard interpretations in two different languages, and use them as building blocks to construct inference algorithms: automatic differentiation, which enables gradient based methods, and provenance tracking, which enables efficient construction of global proposals. 1

2 0.7595045 6 nips-2011-A Global Structural EM Algorithm for a Model of Cancer Progression

Author: Ali Tofigh, Erik Sj̦lund, Mattias H̦glund, Jens Lagergren

Abstract: Cancer has complex patterns of progression that include converging as well as diverging progressional pathways. Vogelstein’s path model of colon cancer was a pioneering contribution to cancer research. Since then, several attempts have been made at obtaining mathematical models of cancer progression, devising learning algorithms, and applying these to cross-sectional data. Beerenwinkel et al. provided, what they coined, EM-like algorithms for Oncogenetic Trees (OTs) and mixtures of such. Given the small size of current and future data sets, it is important to minimize the number of parameters of a model. For this reason, we too focus on tree-based models and introduce Hidden-variable Oncogenetic Trees (HOTs). In contrast to OTs, HOTs allow for errors in the data and thereby provide more realistic modeling. We also design global structural EM algorithms for learning HOTs and mixtures of HOTs (HOT-mixtures). The algorithms are global in the sense that, during the M-step, they find a structure that yields a global maximum of the expected complete log-likelihood rather than merely one that improves it. The algorithm for single HOTs performs very well on reasonable-sized data sets, while that for HOT-mixtures requires data sets of sizes obtainable only with tomorrow’s more cost-efficient technologies. 1

3 0.7003926 304 nips-2011-Why The Brain Separates Face Recognition From Object Recognition

Author: Joel Z. Leibo, Jim Mutch, Tomaso Poggio

Abstract: Many studies have uncovered evidence that visual cortex contains specialized regions involved in processing faces but not other object classes. Recent electrophysiology studies of cells in several of these specialized regions revealed that at least some of these regions are organized in a hierarchical manner with viewpointspecific cells projecting to downstream viewpoint-invariant identity-specific cells [1]. A separate computational line of reasoning leads to the claim that some transformations of visual inputs that preserve viewed object identity are class-specific. In particular, the 2D images evoked by a face undergoing a 3D rotation are not produced by the same image transformation (2D) that would produce the images evoked by an object of another class undergoing the same 3D rotation. However, within the class of faces, knowledge of the image transformation evoked by 3D rotation can be reliably transferred from previously viewed faces to help identify a novel face at a new viewpoint. We show, through computational simulations, that an architecture which applies this method of gaining invariance to class-specific transformations is effective when restricted to faces and fails spectacularly when applied to other object classes. We argue here that in order to accomplish viewpoint-invariant face identification from a single example view, visual cortex must separate the circuitry involved in discounting 3D rotations of faces from the generic circuitry involved in processing other objects. The resulting model of the ventral stream of visual cortex is consistent with the recent physiology results showing the hierarchical organization of the face processing network. 1

4 0.61197537 280 nips-2011-Testing a Bayesian Measure of Representativeness Using a Large Image Database

Author: Joshua T. Abbott, Katherine A. Heller, Zoubin Ghahramani, Thomas L. Griffiths

Abstract: How do people determine which elements of a set are most representative of that set? We extend an existing Bayesian measure of representativeness, which indicates the representativeness of a sample from a distribution, to define a measure of the representativeness of an item to a set. We show that this measure is formally related to a machine learning method known as Bayesian Sets. Building on this connection, we derive an analytic expression for the representativeness of objects described by a sparse vector of binary features. We then apply this measure to a large database of images, using it to determine which images are the most representative members of different sets. Comparing the resulting predictions to human judgments of representativeness provides a test of this measure with naturalistic stimuli, and illustrates how databases that are more commonly used in computer vision and machine learning can be used to evaluate psychological theories. 1

5 0.47206271 221 nips-2011-Priors over Recurrent Continuous Time Processes

Author: Ardavan Saeedi, Alexandre Bouchard-côté

Abstract: We introduce the Gamma-Exponential Process (GEP), a prior over a large family of continuous time stochastic processes. A hierarchical version of this prior (HGEP; the Hierarchical GEP) yields a useful model for analyzing complex time series. Models based on HGEPs display many attractive properties: conjugacy, exchangeability and closed-form predictive distribution for the waiting times, and exact Gibbs updates for the time scale parameters. After establishing these properties, we show how posterior inference can be carried efficiently using Particle MCMC methods [1]. This yields a MCMC algorithm that can resample entire sequences atomically while avoiding the complications of introducing slice and stick auxiliary variables of the beam sampler [2]. We applied our model to the problem of estimating the disease progression in multiple sclerosis [3], and to RNA evolutionary modeling [4]. In both domains, we found that our model outperformed the standard rate matrix estimation approach. 1

6 0.46840119 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries

7 0.46797377 228 nips-2011-Quasi-Newton Methods for Markov Chain Monte Carlo

8 0.46575198 229 nips-2011-Query-Aware MCMC

9 0.46529701 2 nips-2011-A Brain-Machine Interface Operating with a Real-Time Spiking Neural Network Control Algorithm

10 0.46492606 75 nips-2011-Dynamical segmentation of single trials from population neural data

11 0.46231201 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs

12 0.46098933 180 nips-2011-Multiple Instance Filtering

13 0.45944497 231 nips-2011-Randomized Algorithms for Comparison-based Search

14 0.45852867 22 nips-2011-Active Ranking using Pairwise Comparisons

15 0.45822343 158 nips-2011-Learning unbelievable probabilities

16 0.45741674 303 nips-2011-Video Annotation and Tracking with Active Learning

17 0.45727751 206 nips-2011-Optimal Reinforcement Learning for Gaussian Systems

18 0.45589462 258 nips-2011-Sparse Bayesian Multi-Task Learning

19 0.45569095 55 nips-2011-Collective Graphical Models

20 0.45531857 17 nips-2011-Accelerated Adaptive Markov Chain for Partition Function Computation