Gustavo writes: Tempering should always be done in the spirit of *searching* for important modes of the distribution. If we assume that we know where they are, then there is no point to tempering. Now, tempering is actually a *bad* way of searching for important modes; it just happens to be easy to program. As always, my [Gustavo's] prescription is to FIRST find the important modes (as a pre-processing step); THEN sample from each mode independently; and FINALLY weight the samples appropriately, based on the estimated probability mass of each mode, though things might get messy if you end up jumping between modes.

My reply:

1. Parallel tempering has always seemed like a great idea, but I have to admit that the only time I tried it (with Matt2 on the tree-ring example), it didn't work for us.

2. You say you'd rather sample from the modes and then average over them. But that won't work if you have a zillion modes. Also, if you know where the modes are, the quickest way to estimate their relative masses might well be an MCMC algorithm that jumps through them.

3. Finally, pre-processing to find modes is fine, but if pre-processing is so important, it probably needs its own serious algorithm too. I think some work has been done here but I'm not up on the latest.
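To make the FIRST / THEN / FINALLY prescription concrete, here is a minimal sketch on a toy one-dimensional target. Everything in it is an assumption made for illustration and none of it comes from the post: the two-component mixture in `log_target`, the grid search in `find_modes`, and the use of a Laplace (Gaussian) approximation both to sample within each mode and to estimate its mass. A real problem would need a serious mode finder and a proper within-mode sampler.

```python
import numpy as np

def log_target(x):
    """Unnormalized log density of an assumed toy target:
    0.3 * N(-4, 1^2) + 0.7 * N(4, 0.5^2), dropping the shared 1/sqrt(2*pi)."""
    left = np.log(0.3) - 0.5 * (x + 4.0) ** 2
    right = np.log(0.7) - np.log(0.5) - 0.5 * ((x - 4.0) / 0.5) ** 2
    return np.logaddexp(left, right)

def find_modes(log_p, grid):
    """FIRST: locate local maxima of log_p on a 1-D grid (crude pre-processing)."""
    lp = log_p(grid)
    is_peak = (lp[1:-1] > lp[:-2]) & (lp[1:-1] > lp[2:])
    return grid[1:-1][is_peak]

def laplace_scale(log_p, mode, h=1e-3):
    """Std. dev. of the Gaussian matched to the curvature of log_p at a mode."""
    curvature = (log_p(mode + h) - 2.0 * log_p(mode) + log_p(mode - h)) / h**2
    return 1.0 / np.sqrt(-curvature)

rng = np.random.default_rng(0)
modes = find_modes(log_target, np.linspace(-10.0, 10.0, 2001))
scales = np.array([laplace_scale(log_target, m) for m in modes])

# THEN: sample from each mode independently (here, from its Laplace approximation).
draws = [rng.normal(m, s, size=5000) for m, s in zip(modes, scales)]

# FINALLY: weight each mode by its estimated mass, p(mode) * scale * sqrt(2*pi).
log_mass = log_target(modes) + np.log(scales) + 0.5 * np.log(2.0 * np.pi)
weights = np.exp(log_mass - np.logaddexp.reduce(log_mass))

# Combine the per-mode averages using the estimated mode weights.
posterior_mean = sum(w * d.mean() for w, d in zip(weights, draws))
print(modes, weights, posterior_mean)
```

The grid search is only feasible in very low dimension, which is the "zillion modes" objection in miniature: the weighting step is easy once the modes are in hand, and the hard part is the pre-processing.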

1 Gustavo writes: Tempering should always be done in the spirit of *searching* for important modes of the distribution. [sent-1, score-1.016]

2 If we assume that we know where they are, then there is no point to tempering. [sent-2, score-0.095]

3 Now, tempering is actually a *bad* way of searching for important modes; it just happens to be easy to program. [sent-3, score-0.87]

4 Parallel tempering has always seemed like a great idea, but I have to admit that the only time I tried it (with Matt2 on the tree-ring example), it didn’t work for us. [sent-6, score-0.764]

5 You say you’d rather sample from the modes and then average over them. [sent-8, score-0.733]

6 But that won’t work if you have a zillion modes. [sent-9, score-0.173]

7 Also, if you know where the modes are, the quickest way to estimate their relative masses might well be an MCMC algorithm that jumps through them. [sent-10, score-1.362]

8 Finally, pre-processing to find modes is fine, but if pre-processing is so important, it probably needs its own serious algorithm too. [sent-12, score-0.973]

9 I think some work has been done here but I’m not up on the latest. [sent-13, score-0.157]
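For readers who have not seen it, the parallel tempering mentioned in these sentences can be sketched roughly as follows. This is a generic textbook-style version under assumed settings, not the code used on the tree-ring example: the bimodal `log_target`, the temperature ladder `betas`, the step size, and the function name `parallel_tempering` are all hypothetical choices for illustration.

```python
import numpy as np

def log_target(x):
    """Assumed toy target: 0.3 * N(-4, 1^2) + 0.7 * N(4, 0.5^2), unnormalized."""
    left = np.log(0.3) - 0.5 * (x + 4.0) ** 2
    right = np.log(0.7) - np.log(0.5) - 0.5 * ((x - 4.0) / 0.5) ** 2
    return np.logaddexp(left, right)

def parallel_tempering(log_p, betas, n_iter, step=1.0, seed=0):
    """Run one chain per inverse temperature beta, each targeting p(x)**beta,
    and return the draws of the beta = betas[0] chain."""
    rng = np.random.default_rng(seed)
    betas = np.asarray(betas, dtype=float)
    n_chains = len(betas)
    x = np.zeros(n_chains)
    lp = np.array([log_p(xi) for xi in x])
    cold = np.empty(n_iter)
    for t in range(n_iter):
        # Within-chain update: random-walk Metropolis, wider steps when hotter.
        for k in range(n_chains):
            prop = x[k] + step * rng.normal() / np.sqrt(betas[k])
            lp_prop = log_p(prop)
            if np.log(rng.uniform()) < betas[k] * (lp_prop - lp[k]):
                x[k], lp[k] = prop, lp_prop
        # Between-chain update: propose swapping the states of an adjacent pair.
        k = rng.integers(n_chains - 1)
        log_accept = (betas[k] - betas[k + 1]) * (lp[k + 1] - lp[k])
        if np.log(rng.uniform()) < log_accept:
            x[[k, k + 1]] = x[[k + 1, k]]
            lp[[k, k + 1]] = lp[[k + 1, k]]
        cold[t] = x[0]
    return cold

cold_draws = parallel_tempering(log_target, betas=[1.0, 0.3, 0.1, 0.03], n_iter=20000)
print(np.mean(cold_draws > 0.0))  # share of cold-chain draws in the right-hand mode
```

On a toy target like this the hot chains cross between modes easily and pass those states down the ladder by swaps. Whether that carries over to a hard, high-dimensional posterior is exactly what the exchange above is arguing about.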

same-blog 1 1.0 1018 andrew gelman stats-2011-11-19-Tempering and modes

2 0.2407003 777 andrew gelman stats-2011-06-23-Combining survey data obtained using different modes of sampling

Introduction: I’m involved (with Irv Garfinkel and others) in a planned survey of New York City residents. It’s hard to reach people in the city–not everyone will answer their mail or phone, and you can’t send an interviewer door-to-door in a locked apartment building. (I think it violates IRB to have a plan of pushing all the buzzers by the entrance and hoping someone will let you in.) So the plan is to use multiple modes, including phone, in-person household, random street intercepts and mail. The question then is how to combine these samples. My suggested approach is to divide the population into poststrata based on various factors (age, ethnicity, family type, housing type, etc), then to pool responses within each poststratum, then to run some regressions including poststrata and also indicators for mode, to understand how respondents from different modes differ, after controlling for the demographic/geographic adjustments. Maybe this has already been done and written up somewhere? P.

3 0.11239074 861 andrew gelman stats-2011-08-19-Will Stan work well with 40×40 matrices?

Introduction: Tomas Iesmantas writes: I’m dealing with a high-dimensional (40-50 parameters) hierarchical Bayesian model applied to a nonlinear Poisson regression problem. Now I’m using an adaptive version of the Metropolis-adjusted Langevin algorithm with a truncated drift (Yves F. Atchade, 2003) to obtain samples from the posterior. But this algorithm is not very efficient in my case; it needs several million iterations as a burn-in period. And simulation takes quite a long time, since the algorithm has to work with 40×40 matrices. Maybe you know another MCMC algorithm that would need fewer burn-in samples and would be able to deal with nonlinear regression? In a non-hierarchical nonlinear regression model an adaptive Metropolis algorithm is enough, but in the hierarchical case I could use something more effective. My reply: Try fitting the model in Stan. If that doesn’t work, let me know.

4 0.099964477 2340 andrew gelman stats-2014-05-20-Thermodynamic Monte Carlo: Michael Betancourt’s new method for simulating from difficult distributions and evaluating normalizing constants

Introduction: I hate to keep bumping our scheduled posts but this is just too important and too exciting to wait. So it’s time to jump the queue. The news is a paper from Michael Betancourt that presents a super-cool new way to compute normalizing constants: A common strategy for inference in complex models is the relaxation of a simple model into the more complex target model, for example the prior into the posterior in Bayesian inference. Existing approaches that attempt to generate such transformations, however, are sensitive to the pathologies of complex distributions and can be difficult to implement in practice. Leveraging the geometry of thermodynamic processes I introduce a principled and robust approach to deforming measures that presents a powerful new tool for inference. The idea is to generalize Hamiltonian Monte Carlo so that it moves through a family of distributions (that is, it transitions through an “inverse temperature” variable called beta that indexes the family) a

5 0.091303512 2151 andrew gelman stats-2013-12-27-Should statistics have a Nobel prize?

Introduction: Xiao-Li says yes: The most compelling reason for having highly visible awards in any field is to enhance its ability to attract future talent. Virtually all the media and public attention our profession received in recent years has been on the utility of statistics in all walks of life. We are extremely happy for and proud of this recognition—it is long overdue. However, the media and public have given much more attention to the Fields Medal than to the COPSS Award, even though the former has hardly been about direct or even indirect impact on everyday life. Why this difference? . . . these awards arouse media and public interest by featuring how ingenious the awardees are and how difficult the problems they solved, much like how conquering Everest bestows admiration not because the admirers care or even know much about Everest itself but because it represents the ultimate physical feat. In this sense, the biggest winner of the Fields Medal is mathematics itself: enticing the brig

6 0.072395571 2041 andrew gelman stats-2013-09-27-Setting up Jitts online

7 0.06867975 1287 andrew gelman stats-2012-04-28-Understanding simulations in terms of predictive inference?

8 0.067585476 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization

9 0.066658929 2244 andrew gelman stats-2014-03-11-What if I were to stop publishing in journals?

10 0.064663723 976 andrew gelman stats-2011-10-27-Geophysicist Discovers Modeling Error (in Economics)

11 0.06450101 1735 andrew gelman stats-2013-02-24-F-f-f-fake data

12 0.062437098 2006 andrew gelman stats-2013-09-03-Evaluating evidence from published research

13 0.057985842 1443 andrew gelman stats-2012-08-04-Bayesian Learning via Stochastic Gradient Langevin Dynamics

14 0.05566876 2231 andrew gelman stats-2014-03-03-Running into a Stan Reference by Accident

15 0.055547412 1861 andrew gelman stats-2013-05-17-Where do theories come from?

16 0.055476926 695 andrew gelman stats-2011-05-04-Statistics ethics question

17 0.053988777 788 andrew gelman stats-2011-07-06-Early stopping and penalized likelihood

18 0.051779136 1428 andrew gelman stats-2012-07-25-The problem with realistic advice?

19 0.051503181 535 andrew gelman stats-2011-01-24-Bleg: Automatic Differentiation for Log Prob Gradients?

20 0.050747208 1252 andrew gelman stats-2012-04-08-Jagdish Bhagwati’s definition of feminist sincerity

