andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-1018 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Gustavo writes: Tempering should always be done in the spirit of *searching* for important modes of the distribution. If we assume that we know where they are, then there is no point to tempering. Now, tempering is actually a *bad* way of searching for important modes; it just happens to be easy to program. As always, my [Gustavo's] prescription is to FIRST find the important modes (as a pre-processing step); THEN sample from each mode independently; and FINALLY weight the samples appropriately, based on the estimated probability mass of each mode, though things might get messy if you end up jumping between modes. My reply: 1. Parallel tempering has always seemed like a great idea, but I have to admit that the only time I tried it (with Matt2 on the tree-ring example), it didn’t work for us. 2. You say you’d rather sample from the modes and then average over them. But that won’t work if you have a zillion modes. Also, if you know where the modes are, the quickest way to estimate their relative masses might well be an MCMC algorithm that jumps through them. Finally, pre-processing to find modes is fine, but if pre-processing is so important, it probably needs its own serious algorithm too. I think some work has been done here but I’m not up on the latest.
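To make Gustavo's three-step recipe concrete, here is a minimal sketch on a toy 1-D two-mode target, with Laplace approximations standing in for the per-mode mass estimates. Everything here (the target, the modes, the numbers) is invented for illustration, not taken from the post:

```python
# Toy version of: find modes, sample each independently, reweight by mass.
import numpy as np

rng = np.random.default_rng(0)

def log_p(x):
    # Unnormalized two-mode target: N(-3, 0.5^2) plus 0.5 * N(4, 1^2).
    return np.logaddexp(-0.5 * ((x + 3) / 0.5) ** 2,
                        np.log(0.5) - 0.5 * ((x - 4) / 1.0) ** 2)

# Step 1: find the modes. Assumed known here; in practice this is the
# pre-processing step (e.g., many optimizer restarts).
modes = np.array([-3.0, 4.0])

# Step 2: Laplace approximation at each mode: the curvature of log p
# gives a local Gaussian sd, and height * sd * sqrt(2*pi) estimates the
# probability mass of that mode.
def laplace(m, h=1e-4):
    curv = -(log_p(m + h) - 2 * log_p(m) + log_p(m - h)) / h ** 2
    sd = 1.0 / np.sqrt(curv)
    return sd, np.exp(log_p(m)) * sd * np.sqrt(2 * np.pi)

sds, masses = map(np.array, zip(*(laplace(m) for m in modes)))
weights = masses / masses.sum()

# Step 3: sample each mode independently, then mix with the weights.
which = rng.choice(len(modes), size=10_000, p=weights)
draws = rng.normal(modes[which], sds[which])
print(weights, draws.mean())
```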
sentIndex sentText sentNum sentScore
1 Gustavo writes: Tempering should always be done in the spirit of *searching* for important modes of the distribution. [sent-1, score-1.016]
2 If we assume that we know where they are, then there is no point to tempering. [sent-2, score-0.095]
3 Now, tempering is actually a *bad* way of searching for important modes, it just happens to be easy to program. [sent-3, score-0.87]
4 Parallel tempering has always seemed like a great idea, but I have to admit that the only time I tried it (with Matt2 on the tree-ring example), it didn’t work for us. [sent-6, score-0.764]
5 You say you’d rather sample from the modes and then average over them. [sent-8, score-0.733]
6 But that won’t work if you have a zillion modes. [sent-9, score-0.173]
7 Also, if you know where the modes are, the quickest way to estimate their relative masses might well be an MCMC algorithm that jumps through them. [sent-10, score-1.362]
8 Finally, pre-processing to find modes is fine, but if pre-processing is so important, it probably needs its own serious algorithm too. [sent-12, score-0.973]
9 I think some work has been done here but I’m not up on the latest. [sent-13, score-0.157]
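For reference, here is a minimal parallel tempering (replica exchange) sampler of the kind point 1 refers to, on a toy bimodal target. It is illustrative only, not the algorithm from the tree-ring application:

```python
# Minimal parallel tempering: one random-walk chain per temperature,
# plus occasional state swaps between adjacent temperatures.
import numpy as np

rng = np.random.default_rng(1)

def log_p(x):
    # Two well-separated modes at -5 and +5.
    return np.logaddexp(-0.5 * (x + 5) ** 2, -0.5 * (x - 5) ** 2)

betas = np.array([1.0, 0.5, 0.25, 0.1])   # inverse temperatures
x = np.zeros(len(betas))                  # one chain per temperature
kept = []

for it in range(20_000):
    # Metropolis within each tempered target p(x)**beta; hotter
    # (small-beta) chains take wider steps and cross between modes.
    for k, beta in enumerate(betas):
        prop = x[k] + rng.normal(0, 1 / np.sqrt(beta))
        if np.log(rng.uniform()) < beta * (log_p(prop) - log_p(x[k])):
            x[k] = prop
    # Propose swapping the states of a random adjacent pair of chains.
    k = rng.integers(len(betas) - 1)
    log_a = (betas[k] - betas[k + 1]) * (log_p(x[k + 1]) - log_p(x[k]))
    if np.log(rng.uniform()) < log_a:
        x[k], x[k + 1] = x[k + 1], x[k]
    kept.append(x[0])                     # keep only the beta = 1 chain

print(np.mean(np.array(kept) > 0))        # ~0.5 once both modes are visited
```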
wordName wordTfidf (topN-words)
[('modes', 0.59), ('tempering', 0.399), ('gustavo', 0.309), ('mode', 0.178), ('searching', 0.175), ('algorithm', 0.148), ('important', 0.143), ('quickest', 0.133), ('masses', 0.127), ('prescription', 0.123), ('appropriately', 0.116), ('jumps', 0.113), ('finally', 0.113), ('always', 0.109), ('messy', 0.105), ('zillion', 0.101), ('independently', 0.1), ('jumping', 0.098), ('sample', 0.095), ('parallel', 0.091), ('spirit', 0.089), ('done', 0.085), ('mcmc', 0.085), ('mass', 0.079), ('weight', 0.077), ('samples', 0.073), ('work', 0.072), ('latest', 0.072), ('admit', 0.069), ('find', 0.068), ('relative', 0.068), ('needs', 0.068), ('estimated', 0.062), ('tried', 0.06), ('happens', 0.06), ('step', 0.057), ('seemed', 0.055), ('serious', 0.054), ('won', 0.051), ('assume', 0.05), ('might', 0.05), ('easy', 0.049), ('fine', 0.048), ('average', 0.048), ('end', 0.048), ('know', 0.045), ('probably', 0.045), ('probability', 0.045), ('way', 0.044), ('estimate', 0.044)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 1018 andrew gelman stats-2011-11-19-Tempering and modes
2 0.2407003 777 andrew gelman stats-2011-06-23-Combining survey data obtained using different modes of sampling
Introduction: I’m involved (with Irv Garfinkel and others) in a planned survey of New York City residents. It’s hard to reach people in the city–not everyone will answer their mail or phone, and you can’t send an interviewer door-to-door in a locked apartment building. (I think it violates IRB rules to have a plan of pushing all the buzzers by the entrance and hoping someone will let you in.) So the plan is to use multiple modes, including phone, in-person household interviews, random street intercepts, and mail. The question then is how to combine these samples. My suggested approach is to divide the population into poststrata based on various factors (age, ethnicity, family type, housing type, etc.), then to pool responses within each poststratum, then to run some regressions including poststrata and also indicators for mode, to understand how respondents from different modes differ, after controlling for the demographic/geographic adjustments. Maybe this has already been done and written up somewhere? P.
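A schematic of the pooling-and-weighting step described above, with invented strata, modes, and population shares (the regression with mode indicators would be a further step):

```python
# Pool respondents within each poststratum across modes, then weight
# poststratum means by known population shares. All data are made up.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 1000
df = pd.DataFrame({
    "poststratum": rng.choice(["young_renter", "old_renter", "owner"], n),
    "mode": rng.choice(["phone", "mail", "in_person", "intercept"], n),
    "y": rng.normal(0, 1, n),
})

# Population share of each poststratum (in practice, from census data).
pop_share = {"young_renter": 0.3, "old_renter": 0.2, "owner": 0.5}

cell_means = df.groupby("poststratum")["y"].mean()
estimate = sum(pop_share[s] * cell_means[s] for s in pop_share)
print(estimate)

# The regression step in the post would go further, e.g.
# y ~ poststratum + mode, to see how modes differ after the adjustment.
```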
3 0.11239074 861 andrew gelman stats-2011-08-19-Will Stan work well with 40×40 matrices?
Introduction: Tomas Iesmantas writes: I’m dealing with a high-dimensional (40-50 parameters) hierarchical Bayesian model applied to a nonlinear Poisson regression problem. Now I’m using an adaptive version of the Metropolis-adjusted Langevin algorithm with a truncated drift (Yves F. Atchade, 2003) to obtain samples from the posterior. But this algorithm is not very efficient in my case; it needs several million iterations as a burn-in period. And simulation takes quite a long time, since the algorithm has to work with 40×40 matrices. Maybe you know another MCMC algorithm which could take not so many burn-in samples and would be able to deal with nonlinear regression? In a non-hierarchical nonlinear regression model an adaptive Metropolis algorithm is enough, but in the hierarchical case I could use something more effective. My reply: Try fitting the model in Stan. If that doesn’t work, let me know.
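For readers unfamiliar with the method the letter names, here is a minimal, non-adaptive sketch of a Metropolis-adjusted Langevin (MALA) step with a truncated drift, on a toy 2-D Gaussian target; this shows the general idea only, not Atchade's adaptive version:

```python
# One MALA step: propose along the gradient (drift truncated for
# stability), then Metropolis-correct with the asymmetric q-ratio.
import numpy as np

rng = np.random.default_rng(3)

def log_p(x):
    return -0.5 * np.sum(x ** 2)          # standard normal target

def grad_log_p(x):
    return -x

def drift(x, eps, max_norm=10.0):
    # Truncate the drift so a huge gradient cannot fling proposals away.
    g = 0.5 * eps ** 2 * grad_log_p(x)
    norm = np.linalg.norm(g)
    return g if norm <= max_norm else g * (max_norm / norm)

def mala_step(x, eps=0.5):
    prop = x + drift(x, eps) + eps * rng.normal(size=x.shape)
    def log_q(a, b):                      # log q(a | b), asymmetric proposal
        return -np.sum((a - b - drift(b, eps)) ** 2) / (2 * eps ** 2)
    log_a = log_p(prop) - log_p(x) + log_q(x, prop) - log_q(prop, x)
    return prop if np.log(rng.uniform()) < log_a else x

x, draws = np.zeros(2), []
for _ in range(5_000):
    x = mala_step(x)
    draws.append(x)
draws = np.array(draws)
print(draws.mean(axis=0), draws.var(axis=0))   # should be near 0 and 1
```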
Introduction: I hate to keep bumping our scheduled posts but this is just too important and too exciting to wait. So it’s time to jump the queue. The news is a paper from Michael Betancourt that presents a super-cool new way to compute normalizing constants: A common strategy for inference in complex models is the relaxation of a simple model into the more complex target model, for example the prior into the posterior in Bayesian inference. Existing approaches that attempt to generate such transformations, however, are sensitive to the pathologies of complex distributions and can be difficult to implement in practice. Leveraging the geometry of thermodynamic processes I introduce a principled and robust approach to deforming measures that presents a powerful new tool for inference. The idea is to generalize Hamiltonian Monte Carlo so that it moves through a family of distributions (that is, it transitions through an “inverse temperature” variable called beta that indexes the family) a
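The classic, simpler relative of this idea is thermodynamic integration over the same kind of inverse-temperature family: with p_beta(x) proportional to prior(x) * lik(x)^beta, we have log Z(1) - log Z(0) = \int_0^1 E_{p_beta}[log lik(x)] d beta. A sketch of that identity (not the paper's method) on a conjugate toy example where each tempered distribution can be sampled exactly:

```python
# Thermodynamic integration for a log marginal likelihood, 1-D toy case.
import numpy as np

rng = np.random.default_rng(4)

y = 2.0                                   # one observation, y ~ N(x, 1)
def log_lik(x):
    return -0.5 * (y - x) ** 2 - 0.5 * np.log(2 * np.pi)

betas = np.linspace(0, 1, 21)
e = []
for beta in betas:
    # With a N(0, 1) prior, p_beta is N(beta*y/(1+beta), 1/(1+beta)),
    # so we can sample it directly instead of running MCMC per rung.
    xs = rng.normal(beta * y / (1 + beta), np.sqrt(1 / (1 + beta)), 20_000)
    e.append(log_lik(xs).mean())

e = np.array(e)
log_Z = np.sum(np.diff(betas) * (e[:-1] + e[1:]) / 2)   # trapezoid rule
exact = -0.25 * y ** 2 - 0.5 * np.log(4 * np.pi)        # log N(y; 0, 2)
print(log_Z, exact)
```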
5 0.091303512 2151 andrew gelman stats-2013-12-27-Should statistics have a Nobel prize?
Introduction: Xiao-Li says yes: The most compelling reason for having highly visible awards in any field is to enhance its ability to attract future talent. Virtually all the media and public attention our profession received in recent years has been on the utility of statistics in all walks of life. We are extremely happy for and proud of this recognition—it is long overdue. However, the media and public have given much more attention to the Fields Medal than to the COPSS Award, even though the former has hardly been about direct or even indirect impact on everyday life. Why this difference? . . . these awards arouse media and public interest by featuring how ingenious the awardees are and how difficult the problems they solved, much like how conquering Everest bestows admiration not because the admirers care or even know much about Everest itself but because it represents the ultimate physical feat. In this sense, the biggest winner of the Fields Medal is mathematics itself: enticing the brig
6 0.072395571 2041 andrew gelman stats-2013-09-27-Setting up Jitts online
7 0.06867975 1287 andrew gelman stats-2012-04-28-Understanding simulations in terms of predictive inference?
8 0.067585476 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization
9 0.066658929 2244 andrew gelman stats-2014-03-11-What if I were to stop publishing in journals?
10 0.064663723 976 andrew gelman stats-2011-10-27-Geophysicist Discovers Modeling Error (in Economics)
11 0.06450101 1735 andrew gelman stats-2013-02-24-F-f-f-fake data
12 0.062437098 2006 andrew gelman stats-2013-09-03-Evaluating evidence from published research
13 0.057985842 1443 andrew gelman stats-2012-08-04-Bayesian Learning via Stochastic Gradient Langevin Dynamics
14 0.05566876 2231 andrew gelman stats-2014-03-03-Running into a Stan Reference by Accident
15 0.055547412 1861 andrew gelman stats-2013-05-17-Where do theories come from?
16 0.055476926 695 andrew gelman stats-2011-05-04-Statistics ethics question
17 0.053988777 788 andrew gelman stats-2011-07-06-Early stopping and penalized likelihood
18 0.051779136 1428 andrew gelman stats-2012-07-25-The problem with realistic advice?
19 0.051503181 535 andrew gelman stats-2011-01-24-Bleg: Automatic Differentiation for Log Prob Gradients?
20 0.050747208 1252 andrew gelman stats-2012-04-08-Jagdish Bhagwati’s definition of feminist sincerity
topicId topicWeight
[(0, 0.104), (1, 0.009), (2, 0.015), (3, -0.005), (4, 0.029), (5, 0.007), (6, 0.026), (7, -0.002), (8, 0.012), (9, -0.028), (10, 0.006), (11, -0.016), (12, -0.015), (13, -0.002), (14, -0.028), (15, -0.023), (16, -0.019), (17, 0.003), (18, 0.002), (19, -0.005), (20, -0.015), (21, -0.013), (22, 0.005), (23, 0.033), (24, -0.005), (25, -0.003), (26, 0.005), (27, 0.005), (28, 0.02), (29, 0.0), (30, 0.026), (31, 0.01), (32, 0.005), (33, 0.012), (34, -0.005), (35, -0.032), (36, -0.016), (37, -0.004), (38, -0.022), (39, -0.011), (40, -0.0), (41, 0.007), (42, -0.036), (43, -0.024), (44, 0.014), (45, -0.033), (46, -0.023), (47, 0.025), (48, 0.032), (49, -0.007)]
simIndex simValue blogId blogTitle
same-blog 1 0.94399476 1018 andrew gelman stats-2011-11-19-Tempering and modes
2 0.73898715 51 andrew gelman stats-2010-05-26-If statistics is so significantly great, why don’t statisticians use statistics?
Introduction: I’ve recently decided that statistics lies at the intersection of measurement, variation, and comparison. (I need to use some cool Venn-diagram-drawing software to show this.) I’ll argue this one another time–my claim is that, to be “statistics,” you need all three of these elements; no two will suffice. My point here, though, is that as statisticians, we teach all three of these things and talk about how important they are (and often criticize/mock others for selection bias and other problems that arise from not recognizing the difficulties of good measurement, attention to variation, and focused comparisons), but in our own lives (in deciding how to teach and do research, administration, and service–not to mention our personal lives), we think about these issues almost not at all. In our classes, we almost never use standardized tests, let alone the sort of before-after measurements we recommend to others. We do not evaluate our plans systematically nor do we typically e
3 0.71896768 535 andrew gelman stats-2011-01-24-Bleg: Automatic Differentiation for Log Prob Gradients?
Introduction: We need help picking out an automatic differentiation package for Hamiltonian Monte Carlo sampling from the posterior of a generalized linear model with deep interactions. Specifically, we need to compute gradients for log probability functions with thousands of parameters that involve matrix (determinants, eigenvalues, inverses), stats (distributions), and math (log gamma) functions. Any suggestions? The Application: Hybrid Monte Carlo for Posteriors. We’re getting serious about implementing posterior sampling using Hamiltonian Monte Carlo. HMC speeds up mixing by including gradient information to help guide the Metropolis proposals toward areas of high probability. In practice, the algorithm requires a handful of gradient calculations per sample, but there are many dimensions and the functions are hairy enough that we don’t want to compute derivatives by hand. Auto Diff: Perhaps not What you Think. It may not have been clear to readers of this blog that automatic diffe
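To make the "perhaps not what you think" point concrete: automatic differentiation is neither finite differences nor symbolic algebra. A toy forward-mode differentiator via dual numbers follows; real packages use reverse mode to get the full gradient of a scalar log posterior in thousands of parameters at once, so this is only a sketch of the flavor:

```python
# Forward-mode AD with dual numbers: each value carries its derivative,
# and the arithmetic operators apply the usual calculus rules exactly.
import math

class Dual:
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def _lift(self, o):
        return o if isinstance(o, Dual) else Dual(o)
    def __add__(self, o):
        o = self._lift(o)
        return Dual(self.val + o.val, self.der + o.der)
    def __sub__(self, o):
        o = self._lift(o)
        return Dual(self.val - o.val, self.der - o.der)
    def __mul__(self, o):                # product rule
        o = self._lift(o)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)
    __rmul__ = __mul__

def log(x):                              # chain rule for log
    return Dual(math.log(x.val), x.der / x.val)

# d/dx [x*x - 3*x + log(x)] at x = 2 should be 2*2 - 3 + 1/2 = 1.5.
x = Dual(2.0, 1.0)                       # seed derivative 1 in x
y = x * x - 3 * x + log(x)
print(y.val, y.der)                      # -1.3069..., 1.5
```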
4 0.71342546 107 andrew gelman stats-2010-06-24-PPS in Georgia
Introduction: Lucy Flynn writes: I’m working at a non-profit organization called CRRC in the Republic of Georgia. I’m having a methodological problem and I saw the syllabus for your sampling class online and thought I might be able to ask you about it? We do a lot of complex surveys nationwide; our typical sample design is as follows: - stratify by rural/urban/capital - sub-stratify the rural and urban strata into NE/NW/SE/SW geographic quadrants - select voting precincts as PSUs - select households as SSUs - select individual respondents as TSUs I’m relatively new here, and past practice has been to sample voting precincts with probability proportional to size. It’s desirable because it’s not logistically feasible for us to vary the number of interviews per precinct with precinct size, so it makes the selection probabilities for households more even across precinct sizes. However, I have a complex sampling textbook (Lohr 1999), and it explains how complex it is to calculate sel
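For concreteness, here is the textbook systematic PPS (probability proportional to size) selection procedure the letter alludes to, with invented precinct sizes; this is a sketch of the standard method, not CRRC's actual design:

```python
# Systematic PPS selection: lay the precincts end to end by size, then
# take equally spaced hits from a random start.
import numpy as np

rng = np.random.default_rng(5)

sizes = np.array([120, 840, 300, 95, 560, 410, 900, 75])  # precinct sizes
n = 3                                                     # precincts to draw

cum = np.cumsum(sizes)
step = cum[-1] / n
start = rng.uniform(0, step)
hits = start + step * np.arange(n)        # systematic hits on [0, total)
chosen = np.searchsorted(cum, hits)       # precinct containing each hit
print(chosen, sizes[chosen])

# Precinct i is selected with probability n * sizes[i] / sizes.sum()
# (no precinct here exceeds the step), which is what makes household
# probabilities roughly even with a fixed number of interviews per
# precinct.
```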
5 0.70646721 1403 andrew gelman stats-2012-07-02-Moving beyond hopeless graphics
Introduction: I was at a talk a while ago where the speaker presented tables with 4, 5, 6, even 8 significant digits, even though, as is usual, only the first or second digit of each number conveyed any useful information. A graph would be better, but even if you’re too lazy to make a plot, a bit of rounding would seem to be required. I mentioned this to a colleague, who responded: I don’t know how to stop this practice. Logic doesn’t work. Maybe ridicule? Best hope is the departure from the field of those who do it. (Theories don’t die, but the people who follow those theories retire.) Another possibility, I think, is helpful software defaults. If we can get to the people who write the software, maybe we could have some impact. Once the software is written, however, it’s probably too late. I’m not far from the center of the R universe, but I don’t know if I’ll ever succeed in my goals of increasing the default number of histogram bars or reducing the default number of decimal places in regression
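A helper of the sort such software defaults could build in, rounding to a fixed number of significant digits for display (purely illustrative):

```python
# Round x to `digits` significant figures by shifting the decimal place
# according to the magnitude of x.
import math

def signif(x, digits=2):
    if x == 0:
        return 0.0
    return round(x, digits - 1 - math.floor(math.log10(abs(x))))

print([signif(v) for v in [0.048351, 1234567.0, -0.00021994]])
# -> [0.048, 1200000.0, -0.00022]
```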
6 0.70428139 984 andrew gelman stats-2011-11-01-David MacKay sez . . . 12??
7 0.69898975 459 andrew gelman stats-2010-12-09-Solve mazes by starting at the exit
8 0.69789064 1691 andrew gelman stats-2013-01-25-Extreem p-values!
9 0.69741279 2337 andrew gelman stats-2014-05-18-Never back down: The culture of poverty and the culture of journalism
10 0.69224477 818 andrew gelman stats-2011-07-23-Parallel JAGS RNGs
11 0.68711764 931 andrew gelman stats-2011-09-29-Hamiltonian Monte Carlo stories
12 0.68390733 650 andrew gelman stats-2011-04-05-Monitor the efficiency of your Markov chain sampler using expected squared jumped distance!
13 0.67380971 2257 andrew gelman stats-2014-03-20-The candy weighing demonstration, or, the unwisdom of crowds
14 0.6727103 2089 andrew gelman stats-2013-11-04-Shlemiel the Software Developer and Unknown Unknowns
15 0.67203808 945 andrew gelman stats-2011-10-06-W’man < W’pedia, again
16 0.67155367 1520 andrew gelman stats-2012-10-03-Advice that’s so eminently sensible but so difficult to follow
17 0.67136306 1134 andrew gelman stats-2012-01-21-Lessons learned from a recent R package submission
18 0.66852456 938 andrew gelman stats-2011-10-03-Comparing prediction errors
19 0.66603154 1933 andrew gelman stats-2013-07-10-Please send all comments to -dev-ripley
topicId topicWeight
[(0, 0.016), (6, 0.033), (16, 0.02), (20, 0.017), (21, 0.022), (24, 0.088), (53, 0.013), (57, 0.318), (86, 0.032), (99, 0.309)]
simIndex simValue blogId blogTitle
1 0.93733716 1542 andrew gelman stats-2012-10-20-A statistical model for underdispersion
Introduction: We have lots of models for overdispersed count data but we rarely see underdispersed data. But now I know what example I’ll be giving when this next comes up in class. From a book review by Theo Tait: A number of shark species go in for oophagy, or uterine cannibalism. Sand tiger foetuses ‘eat each other in utero, acting out the harshest form of sibling rivalry imaginable’. Only two babies emerge, one from each of the mother shark’s uteruses: the survivors have eaten everything else. ‘A female sand tiger gives birth to a baby that’s already a metre long and an experienced killer,’ explains Demian Chapman, an expert on the subject. That’s what I call underdispersion. E(y)=2, var(y)=0. Take that, M. Poisson!
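To put the punchline in code: the dispersion index var(y)/E(y) is 1 for the Poisson, above 1 for overdispersed counts, and below 1 for underdispersed ones; the sand tiger litter is the extreme case with index 0. A quick check with simulated data:

```python
# Compare the dispersion index of Poisson counts to the degenerate
# "every litter is exactly 2" distribution.
import numpy as np

rng = np.random.default_rng(6)
poisson_counts = rng.poisson(2.0, 100_000)
shark_litters = np.full(100_000, 2)        # every litter is exactly 2

for name, yy in [("Poisson(2)", poisson_counts),
                 ("sand tiger", shark_litters)]:
    print(name, yy.mean(), yy.var() / yy.mean())   # index ~1, then 0
```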
Introduction: That’s ok, Krugman earlier slammed Galbraith. (I wonder if Krugman is as big a fan of “tough choices” now as he was in 1996.) Given Krugman’s politicization in recent years, I’m surprised he’s so dismissive of the political (rather than technical-economic) nature of Hayek’s influence. (I don’t know if he’s changed his views on Galbraith in recent years.) P.S. Greg Mankiw, in contrast, labels Galbraith and Hayek as “two of the great economists of the 20th century” and writes, “even though their most famous works were written many decades ago, they are still well worth reading today.”
3 0.91630161 1146 andrew gelman stats-2012-01-30-Convenient page of data sources from the Washington Post
Introduction: Wayne Folta points us to this list .
same-blog 4 0.89719707 1018 andrew gelman stats-2011-11-19-Tempering and modes
5 0.87616974 861 andrew gelman stats-2011-08-19-Will Stan work well with 40×40 matrices?
7 0.85674584 1101 andrew gelman stats-2012-01-05-What are the standards for reliability in experimental psychology?
8 0.85153687 215 andrew gelman stats-2010-08-18-DataMarket
9 0.83549052 891 andrew gelman stats-2011-09-05-World Bank data now online
10 0.83051163 1120 andrew gelman stats-2012-01-15-Fun fight over the Grover search algorithm
11 0.82644987 306 andrew gelman stats-2010-09-29-Statistics and the end of time
12 0.82605875 1460 andrew gelman stats-2012-08-16-“Real data can be a pain”
14 0.81606364 989 andrew gelman stats-2011-11-03-This post does not mention Wegman
15 0.79138947 1036 andrew gelman stats-2011-11-30-Stan uses Nuts!
16 0.78949267 2318 andrew gelman stats-2014-05-04-Stan (& JAGS) Tutorial on Linear Mixed Models
17 0.78351128 1044 andrew gelman stats-2011-12-06-The K Foundation burns Cosma’s turkey
19 0.77107978 816 andrew gelman stats-2011-07-22-“Information visualization” vs. “Statistical graphics”
20 0.76957971 1108 andrew gelman stats-2012-01-09-Blogging, polemical and otherwise