andrew_gelman_stats-2012-1521 knowledge-graph by maker-knowledge-mining

1521 andrew gelman stats-2012-10-04-Columbo does posterior predictive checks


meta info for this blog

Source: html

Introduction: I’m already on record as saying that Ronald Reagan was a statistician so I think this is ok too . . . Here’s what Columbo does. He hears the killer’s story and he takes it very seriously (it’s murder, and Columbo never jokes about murder), examines all its implications, and finds where it doesn’t fit the data. Then Columbo carefully examines the discrepancies, tries some model expansion, and eventually concludes that he’s proved there’s a problem. OK, now you’re saying: Yeah, yeah, sure, but how does that differ from any other fictional detective? The difference, I think, is that the tradition is for the detective to find clues and use these to come up with hypotheses, or to trap the killer via internal contradictions in his or her statement. I see Columbo is different—and more in keeping with chapter 6 of Bayesian Data Analysis—in that he is taking the killer’s story seriously and exploring all its implications. That’s the essence of predictive model checking: you take advantage of the fact that you’re working with a generative model, and you generate anything and everything you can.


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 I’m already on record as saying that Ronald Reagan was a statistician so I think this is ok too . [sent-1, score-0.327]

2 He hears the killer’s story and he takes it very seriously (it’s murder, and Columbo never jokes about murder), examines all its implications, and finds where it doesn’t fit the data. [sent-5, score-0.785]

3 Then Columbo carefully examines the discrepancies, tries some model expansion, and eventually concludes that he’s proved there’s a problem. [sent-6, score-0.763]

4 OK, now you’re saying: Yeah, yeah, sure, but how does that differ from any other fictional detective? [sent-7, score-0.219]

5 The difference, I think, is that the tradition is for the detective to find clues and use these to come up with hypotheses, or to trap the killer via internal contradictions in his or her statement. [sent-8, score-1.22]

6 I see Columbo is different—and more in keeping with chapter 6 of Bayesian Data Analysis—in that he is taking the killer’s story seriously and exploring all its implications. [sent-9, score-0.518]

7 That’s the essence of predictive model checking: you take advantage of the fact that you’re working with a generative model, and you generate anything and everything you can. [sent-10, score-0.643]


similar blogs computed by the tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('columbo', 0.559), ('killer', 0.314), ('detective', 0.279), ('examines', 0.25), ('murder', 0.209), ('yeah', 0.162), ('clues', 0.14), ('seriously', 0.139), ('fictional', 0.134), ('trap', 0.117), ('essence', 0.114), ('discrepancies', 0.114), ('generative', 0.112), ('contradictions', 0.109), ('ronald', 0.109), ('jokes', 0.109), ('reagan', 0.102), ('ok', 0.101), ('internal', 0.099), ('tradition', 0.097), ('expansion', 0.097), ('exploring', 0.096), ('finds', 0.094), ('tries', 0.094), ('proved', 0.094), ('saying', 0.09), ('concludes', 0.088), ('model', 0.088), ('keeping', 0.085), ('differ', 0.085), ('hypotheses', 0.085), ('generate', 0.084), ('story', 0.082), ('eventually', 0.081), ('implications', 0.078), ('record', 0.078), ('advantage', 0.074), ('checking', 0.069), ('carefully', 0.068), ('predictive', 0.067), ('via', 0.065), ('takes', 0.063), ('chapter', 0.063), ('statistician', 0.058), ('everything', 0.056), ('taking', 0.053), ('re', 0.053), ('difference', 0.051), ('fact', 0.048), ('fit', 0.048)]
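The list above pairs each term with its tfidf weight for this post. A minimal sketch of how such weights can be computed (the mining pipeline's actual tokenizer and weighting scheme are not shown, so this uses the standard tf * log(N/df) formula over a toy corpus):

```python
import math
from collections import Counter

# Toy corpus: each "document" is a bag of words (the real pipeline would use
# every blog post in the archive).
docs = [
    "columbo hears the killer story and columbo examines the data".split(),
    "the detective finds clues and traps the killer".split(),
    "model checking and the essence of a generative model".split(),
]

n_docs = len(docs)
df = Counter()              # document frequency: how many docs contain each term
for doc in docs:
    df.update(set(doc))

def tfidf(doc):
    """tf * log(N / df) weight for every term in one document."""
    tf = Counter(doc)
    n = len(doc)
    return {w: (tf[w] / n) * math.log(n_docs / df[w]) for w in tf}

weights = tfidf(docs[0])
top = sorted(weights.items(), key=lambda kv: -kv[1])[:5]
print(top)
```

Terms appearing in every document (like "the" and "and" here) get weight 0, which is why the topN-words list above is dominated by post-specific terms like 'columbo' and 'killer'.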

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 1521 andrew gelman stats-2012-10-04-Columbo does posterior predictive checks


2 0.10062143 1626 andrew gelman stats-2012-12-16-The lamest, grudgingest, non-retraction retraction ever

Introduction: In politics we’re familiar with the non-apology apology (well described in Wikipedia as “a statement that has the form of an apology but does not express the expected contrition”). Here’s the scientific equivalent: the non-retraction retraction. Sanjay Srivastava points to an amusing yet barfable story of a pair of researchers who (inadvertently, I assume) made a data coding error and were eventually moved to issue a correction notice, but even then refused to fully admit their error. As Srivastava puts it, the story “ended up with Lew [Goldberg] and colleagues [Kibeom Lee and Michael Ashton] publishing a comment on an erratum – the only time I’ve ever heard of that happening in a scientific journal.” From the comment on the erratum: In their “erratum and addendum,” Anderson and Ones (this issue) explained that we had brought their attention to the “potential” of a “possible” misalignment and described the results computed from re-aligned data as being based on a “post-ho

3 0.093819223 1152 andrew gelman stats-2012-02-03-Web equation

Introduction: Aleks sends along this app which, while cute, is not quite “killer” for me. I find it more difficult to write the equation using the trackpad than to simply type it in using Latex! But I suppose it could be useful to beginners who want their papers to look more like science.

4 0.086745091 1972 andrew gelman stats-2013-08-07-When you’re planning on fitting a model, build up to it by fitting simpler models first. Then, once you have a model you like, check the hell out of it

Introduction: In response to my remarks on his online book, Think Bayes, Allen Downey wrote: I [Downey] have a question about one of your comments: My [Gelman's] main criticism with both books is that they talk a lot about inference but not so much about model building or model checking (recall the three steps of Bayesian data analysis). I think it’s ok for an introductory book to focus on inference, which of course is central to the data-analytic process—but I’d like them to at least mention that Bayesian ideas arise in model building and model checking as well. This sounds like something I agree with, and one of the things I tried to do in the book is to put modeling decisions front and center. But the word “modeling” is used in lots of ways, so I want to see if we are talking about the same thing. For example, in many chapters, I start with a simple model of the scenario, do some analysis, then check whether the model is good enough, and iterate. Here’s the discussion of modeling

5 0.086606666 1255 andrew gelman stats-2012-04-10-Amtrak sucks

Introduction: Couldn’t they at least let me buy my tickets from Amazon so I wouldn’t have to re-enter the credit card information each time? Yeah, yeah, I know it’s no big deal. It just seems so silly.

6 0.077955753 1482 andrew gelman stats-2012-09-04-Model checking and model understanding in machine learning

7 0.073816679 754 andrew gelman stats-2011-06-09-Difficulties with Bayesian model averaging

8 0.071861207 472 andrew gelman stats-2010-12-17-So-called fixed and random effects

9 0.071655847 1431 andrew gelman stats-2012-07-27-Overfitting

10 0.070535913 1522 andrew gelman stats-2012-10-05-High temperatures cause violent crime and implications for climate change, also some suggestions about how to better summarize these claims

11 0.069993079 1568 andrew gelman stats-2012-11-07-That last satisfaction at the end of the career

12 0.063125916 524 andrew gelman stats-2011-01-19-Data exploration and multiple comparisons

13 0.062975824 2263 andrew gelman stats-2014-03-24-Empirical implications of Empirical Implications of Theoretical Models

14 0.062708899 2119 andrew gelman stats-2013-12-01-Separated by a common blah blah blah

15 0.055822298 331 andrew gelman stats-2010-10-10-Bayes jumps the shark

16 0.055141203 197 andrew gelman stats-2010-08-10-The last great essayist?

17 0.054901954 2007 andrew gelman stats-2013-09-03-Popper and Jaynes

18 0.054865841 1542 andrew gelman stats-2012-10-20-A statistical model for underdispersion

19 0.053598914 852 andrew gelman stats-2011-08-13-Checking your model using fake data

20 0.053065322 408 andrew gelman stats-2010-11-11-Incumbency advantage in 2010


similar blogs computed by the lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.09), (1, 0.037), (2, -0.008), (3, 0.031), (4, -0.019), (5, -0.002), (6, 0.002), (7, -0.002), (8, 0.066), (9, 0.012), (10, -0.009), (11, 0.023), (12, -0.031), (13, -0.013), (14, -0.019), (15, -0.015), (16, 0.029), (17, -0.02), (18, -0.007), (19, -0.012), (20, -0.008), (21, -0.007), (22, -0.033), (23, -0.028), (24, -0.029), (25, -0.012), (26, -0.029), (27, -0.011), (28, 0.009), (29, -0.0), (30, -0.003), (31, 0.013), (32, -0.001), (33, 0.018), (34, 0.014), (35, 0.027), (36, -0.01), (37, -0.026), (38, 0.005), (39, -0.003), (40, 0.021), (41, -0.003), (42, 0.001), (43, 0.015), (44, 0.012), (45, -0.017), (46, 0.019), (47, -0.012), (48, -0.02), (49, 0.01)]
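The bracketed list above is a sparse (topicId, topicWeight) vector representing this post in the LSI space, and the simValue column in the lists that follow is presumably the cosine similarity between such vectors (an assumption about the pipeline; the vectors below are hypothetical truncations). A minimal sketch:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse {topicId: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical truncated topic vectors for two posts.
this_post = {0: 0.09, 1: 0.037, 3: 0.031, 8: 0.066}
other_post = {0: 0.08, 1: 0.04, 3: 0.02, 8: 0.07}

print(f"similarity = {cosine(this_post, other_post):.3f}")
print(f"self-similarity = {cosine(this_post, this_post):.3f}")
```

A vector's self-similarity is exactly 1.0; same-blog simValues slightly below 1 (as in the lsi table) would be consistent with the stored vectors being truncated to the top topics.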

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97458607 1521 andrew gelman stats-2012-10-04-Columbo does posterior predictive checks


2 0.83715302 1972 andrew gelman stats-2013-08-07-When you’re planning on fitting a model, build up to it by fitting simpler models first. Then, once you have a model you like, check the hell out of it

Introduction: In response to my remarks on his online book, Think Bayes, Allen Downey wrote: I [Downey] have a question about one of your comments: My [Gelman's] main criticism with both books is that they talk a lot about inference but not so much about model building or model checking (recall the three steps of Bayesian data analysis). I think it’s ok for an introductory book to focus on inference, which of course is central to the data-analytic process—but I’d like them to at least mention that Bayesian ideas arise in model building and model checking as well. This sounds like something I agree with, and one of the things I tried to do in the book is to put modeling decisions front and center. But the word “modeling” is used in lots of ways, so I want to see if we are talking about the same thing. For example, in many chapters, I start with a simple model of the scenario, do some analysis, then check whether the model is good enough, and iterate. Here’s the discussion of modeling

3 0.82429677 72 andrew gelman stats-2010-06-07-Valencia: Summer of 1991

Introduction: With the completion of the last edition of Jose Bernardo’s Valencia (Spain) conference on Bayesian statistics–I didn’t attend, but many of my friends were there–I thought I’d share my strongest memory of the Valencia conference that I attended in 1991. I contributed a poster and a discussion, both on the topic of inference from iterative simulation, but what I remember most vividly, and what bothered me the most, was how little interest there was in checking model fit. Not only had people mostly not checked the fit of their models to data, and not only did they seem uninterested in such checks, even worse was that many of these Bayesians felt that it was basically illegal to check model fit. I don’t want to get too down on Bayesians for this. Lots of non-Bayesian statisticians go around not checking their models too. With Bayes, though, model checking seems particularly important because Bayesians rely on their models so strongly, not just as a way of getting point estimates bu

4 0.81804425 320 andrew gelman stats-2010-10-05-Does posterior predictive model checking fit with the operational subjective approach?

Introduction: David Rohde writes: I have been thinking a lot lately about your Bayesian model checking approach. This is in part because I have been working on exploratory data analysis and wishing to avoid controversy and mathematical statistics we omitted model checking from our discussion. This is something that the refereeing process picked us up on and we ultimately added a critical discussion of null-hypothesis testing to our paper . The exploratory technique we discussed was essentially a 2D histogram approach, but we used Polya models as a formal model for the histogram. We are currently working on a new paper, and we are thinking through how or if we should do “confirmatory analysis” or model checking in the paper. What I find most admirable about your statistical work is that you clearly use the Bayesian approach to do useful applied statistical analysis. My own attempts at applied Bayesian analysis makes me greatly admire your applied successes. On the other hand it may be t

5 0.81493855 1392 andrew gelman stats-2012-06-26-Occam

Introduction: Cosma Shalizi and Larry Wasserman discuss some papers from a conference on Ockham’s Razor. I don’t have anything new to add on this so let me link to past blog entries on the topic and repost the following from 2004 : A lot has been written in statistics about “parsimony”—that is, the desire to explain phenomena using fewer parameters–but I’ve never seen any good general justification for parsimony. (I don’t count “Occam’s Razor,” or “Ockham’s Razor,” or whatever, as a justification. You gotta do better than digging up a 700-year-old quote.) Maybe it’s because I work in social science, but my feeling is: if you can approximate reality with just a few parameters, fine. If you can use more parameters to fold in more information, that’s even better. In practice, I often use simple models—because they are less effort to fit and, especially, to understand. But I don’t kid myself that they’re better than more complicated efforts! My favorite quote on this comes from Rad

6 0.80714136 2007 andrew gelman stats-2013-09-03-Popper and Jaynes

7 0.80448633 1041 andrew gelman stats-2011-12-04-David MacKay and Occam’s Razor

8 0.80280972 614 andrew gelman stats-2011-03-15-Induction within a model, deductive inference for model evaluation

9 0.80144674 2133 andrew gelman stats-2013-12-13-Flexibility is good

10 0.80100137 448 andrew gelman stats-2010-12-03-This is a footnote in one of my papers

11 0.80030686 754 andrew gelman stats-2011-06-09-Difficulties with Bayesian model averaging

12 0.79839569 217 andrew gelman stats-2010-08-19-The “either-or” fallacy of believing in discrete models: an example of folk statistics

13 0.79396105 1510 andrew gelman stats-2012-09-25-Incoherence of Bayesian data analysis

14 0.78298223 1004 andrew gelman stats-2011-11-11-Kaiser Fung on how not to critique models

15 0.78205317 1431 andrew gelman stats-2012-07-27-Overfitting

16 0.78187513 1406 andrew gelman stats-2012-07-05-Xiao-Li Meng and Xianchao Xie rethink asymptotics

17 0.77457547 1141 andrew gelman stats-2012-01-28-Using predator-prey models on the Canadian lynx series

18 0.77265364 780 andrew gelman stats-2011-06-27-Bridges between deterministic and probabilistic models for binary data

19 0.77096707 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model

20 0.76816225 24 andrew gelman stats-2010-05-09-Special journal issue on statistical methods for the social sciences


similar blogs computed by the lda model

lda for this blog:

topicId topicWeight

[(2, 0.028), (15, 0.028), (16, 0.09), (21, 0.028), (23, 0.016), (24, 0.123), (28, 0.035), (51, 0.031), (53, 0.017), (63, 0.045), (64, 0.229), (72, 0.013), (81, 0.026), (99, 0.172)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.87037778 1521 andrew gelman stats-2012-10-04-Columbo does posterior predictive checks


2 0.85568964 1109 andrew gelman stats-2012-01-09-Google correlate links statistics with minorities

Introduction: John Eppley asks what I make of this : Eppley is guessing the negative spikes are searches getting swamped by holiday season shoppers.

3 0.85032421 1653 andrew gelman stats-2013-01-04-Census dotmap

Introduction: Andrew Vande Moere points to this impressive interactive map from Brandon Martin-Anderson showing the locations of all the residents of the United States and Canada. It says, “The map has 341,817,095 dots – one for each person.” Not quite . . . I was hoping to zoom into my building (approximately 10 people live on our floor, I say approximately because two of the apartments are split between two floors and I’m not sure how they would assign the residents), but unfortunately our entire block is just a solid mass of black. Also, they put a few dots in the park and in the river by accident (presumably because the borders of the census blocks were specified only approximately). But, hey, no algorithm is perfect. It’s hard to know what to do about this. The idea of mapping every person is cool, but you’ll always run into trouble displaying densely populated areas. Smaller dots might work, but then that might depend on the screen being used for display.

4 0.8500098 985 andrew gelman stats-2011-11-01-Doug Schoen has 2 poll reports

Introduction: According to Chris Wilson , there are two versions of the report of the Occupy Wall Street poll from so-called hack pollster Doug Schoen. Here’s the report that Azi Paybarah says that Schoen sent to him, and here’s the final question from the poll: And here’s what’s on Schoen’s own website: Very similar, except for that last phrase, “no matter what the cost.” I have no idea which was actually asked to the survey participants, but it’s a reminder of the difficulties of public opinion research—sometimes you don’t even know what question was asked! I’m not implying anything sinister on Schoen’s part, it’s just interesting to see these two documents floating around. P.S. More here from Kaiser Fung on fundamental flaws with Schoen’s poll.

5 0.82174879 595 andrew gelman stats-2011-02-28-What Zombies see in Scatterplots

Introduction: This video caught my interest – news video clip (from this post2 ) http://www.stat.columbia.edu/~cook/movabletype/archives/2011/02/on_summarizing.html The news commentator did seem to be trying to point out what a couple of states had to say about the claimed relationship – almost on their own. Some methods have been worked out for zombies to do just this! So I grabbed the data as close as I quickly could, modified the code slightly and here’s the zombie view of it. PoliticInt.pdf North Carolina is the bolded red curve, Idaho the bolded green curve. Mississippi and New York are the bolded blue. As ugly as it is this is the Bayesian marginal picture – exactly (given MCMC error). K? p.s. you will get a very confusing picture if you forget to centre the x (i.e. see chapter 4 of Gelman and Hill book)

6 0.80460441 118 andrew gelman stats-2010-06-30-Question & Answer Communities

7 0.77906054 724 andrew gelman stats-2011-05-21-New search engine for data & statistics

8 0.760077 1058 andrew gelman stats-2011-12-14-Higgs bozos: Rosencrantz and Guildenstern are spinning in their graves

9 0.74817443 11 andrew gelman stats-2010-04-29-Auto-Gladwell, or Can fractals be used to predict human history?

10 0.74498433 1637 andrew gelman stats-2012-12-24-Textbook for data visualization?

11 0.72534209 2249 andrew gelman stats-2014-03-15-Recently in the sister blog

12 0.70053023 870 andrew gelman stats-2011-08-25-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests

13 0.70040566 1913 andrew gelman stats-2013-06-24-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests

14 0.69185871 977 andrew gelman stats-2011-10-27-Hack pollster Doug Schoen illustrates a general point: The #1 way to lie with statistics is . . . to just lie!

15 0.69143689 599 andrew gelman stats-2011-03-03-Two interesting posts elsewhere on graphics

16 0.68446004 799 andrew gelman stats-2011-07-13-Hypothesis testing with multiple imputations

17 0.68207884 747 andrew gelman stats-2011-06-06-Research Directions for Machine Learning and Algorithms

18 0.68105096 586 andrew gelman stats-2011-02-23-A statistical version of Arrow’s paradox

19 0.68071872 2179 andrew gelman stats-2014-01-20-The AAA Tranche of Subprime Science

20 0.68039405 411 andrew gelman stats-2010-11-13-Ethical concerns in medical trials