andrew_gelman_stats andrew_gelman_stats-2014 andrew_gelman_stats-2014-2182 knowledge-graph by maker-knowledge-mining

2182 andrew gelman stats-2014-01-22-Spell-checking example demonstrates key aspects of Bayesian data analysis


meta info for this blog

Source: html

Introduction: One of the new examples for the third edition of Bayesian Data Analysis is a spell-checking story. Here it is (just start at 2/3 down on the first page, with “Spelling correction”). I like this example—it demonstrates the Bayesian algebra, also gives a sense of the way that probability models (both “likelihood” and “prior”) are constructed from existing assumptions and data. The models aren’t just specified as a mathematical exercise, they represent some statement about reality. And the problem is close enough to our experience that we can consider ways in which the model can be criticized and improved, all in a simple example that has only three possibilities.
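The Bayesian algebra behind the spelling-correction example can be sketched in a few lines. The numbers below are illustrative stand-ins in the spirit of the book's example (typed string "radom", three candidate intended words), not the exact values from Bayesian Data Analysis: the prior plays the role of corpus word frequencies and the likelihood the role of an error model for producing the observed typo.

```python
# Bayesian spelling correction via Bayes' rule:
#   p(word | typed) ∝ p(typed | word) * p(word)
# All probabilities here are hypothetical, for illustration only.

def posterior(prior, likelihood):
    """Normalize prior * likelihood into a posterior over candidates."""
    unnorm = {w: prior[w] * likelihood[w] for w in prior}
    z = sum(unnorm.values())
    return {w: p / z for w, p in unnorm.items()}

# Hypothetical prior from word frequencies in a corpus.
prior = {"random": 7.6e-5, "radon": 6.1e-6, "radom": 3.1e-7}
# Hypothetical error model: p(typed "radom" | intended word).
likelihood = {"random": 0.00193, "radon": 0.000143, "radom": 0.975}

post = posterior(prior, likelihood)
best = max(post, key=post.get)
```

With these made-up numbers the rare but exactly-matching string "radom" dominates the posterior, which is precisely the kind of result that invites the model criticism the post describes: the error model and prior are both open to improvement.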


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 One of the new examples for the third edition of Bayesian Data Analysis is a spell-checking story. [sent-1, score-0.526]

2 Here it is (just start at 2/3 down on the first page, with “Spelling correction”). [sent-2, score-0.173]

3 I like this example—it demonstrates the Bayesian algebra, also gives a sense of the way that probability models (both “likelihood” and “prior”) are constructed from existing assumptions and data. [sent-3, score-1.376]

4 The models aren’t just specified as a mathematical exercise, they represent some statement about reality. [sent-4, score-0.831]

5 And the problem is close enough to our experience that we can consider ways in which the model can be criticized and improved, all in a simple example that has only three possibilities. [sent-5, score-1.228]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('spelling', 0.312), ('algebra', 0.231), ('constructed', 0.228), ('demonstrates', 0.228), ('possibilities', 0.215), ('criticized', 0.213), ('specified', 0.213), ('correction', 0.211), ('exercise', 0.198), ('edition', 0.196), ('improved', 0.183), ('bayesian', 0.172), ('existing', 0.166), ('models', 0.165), ('represent', 0.158), ('third', 0.158), ('mathematical', 0.15), ('likelihood', 0.146), ('statement', 0.145), ('assumptions', 0.144), ('aren', 0.137), ('experience', 0.13), ('gives', 0.13), ('close', 0.127), ('page', 0.123), ('ways', 0.117), ('examples', 0.113), ('prior', 0.112), ('example', 0.11), ('start', 0.109), ('three', 0.107), ('probability', 0.105), ('simple', 0.103), ('consider', 0.102), ('enough', 0.082), ('sense', 0.079), ('problem', 0.072), ('analysis', 0.071), ('model', 0.065), ('first', 0.064), ('new', 0.059), ('way', 0.052), ('data', 0.045), ('also', 0.043), ('like', 0.036), ('one', 0.033)]
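The word weights above presumably come from a standard tf-idf scheme; a minimal sketch on a toy corpus follows. The exact variant (term-frequency scaling, idf smoothing, normalization) used by the mining pipeline is not stated, so this uses raw term frequency with a smoothed idf as one common choice.

```python
import math
from collections import Counter

def tfidf(docs):
    """Per-document tf-idf weights for a list of token lists.

    Uses raw term frequency and smoothed idf: log(N / (1 + df)).
    """
    n = len(docs)
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n / (1 + df[t])) for t in tf})
    return weights

# Toy corpus standing in for the tokenized blog posts.
docs = [["bayesian", "spelling", "model"],
        ["bayesian", "prior", "model"],
        ["football", "exercise"]]
w = tfidf(docs)
```

Terms concentrated in one document ("spelling") get higher weight than terms spread across many ("bayesian"), matching the pattern in the list above, where topic-specific words like "spelling" outrank common ones like "data".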

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 2182 andrew gelman stats-2014-01-22-Spell-checking example demonstrates key aspects of Bayesian data analysis


2 0.15928936 781 andrew gelman stats-2011-06-28-The holes in my philosophy of Bayesian data analysis

Introduction: I’ve been writing a lot about my philosophy of Bayesian statistics and how it fits into Popper’s ideas about falsification and Kuhn’s ideas about scientific revolutions. Here’s my long, somewhat technical paper with Cosma Shalizi. Here’s our shorter overview for the volume on the philosophy of social science. Here’s my latest try (for an online symposium), focusing on the key issues. I’m pretty happy with my approach–the familiar idea that Bayesian data analysis iterates the three steps of model building, inference, and model checking–but it does have some unresolved (maybe unresolvable) problems. Here are a couple mentioned in the third of the above links. Consider a simple model with independent data y_1, y_2, .., y_10 ~ N(θ,σ^2), with a prior distribution θ ~ N(0,10^2) and σ known and taking on some value of approximately 10. Inference about θ is straightforward, as is model checking, whether based on graphs or numerical summaries such as the sample variance and skewn

3 0.15322463 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability

Introduction: I received the following email: I have an interesting thought on a prior for a logistic regression, and would love your input on how to make it “work.” Some of my research, two published papers, are on mathematical models of **. Along those lines, I’m interested in developing more models for **. . . . Empirical studies show that the public is rather smart and that the wisdom-of-the-crowd is fairly accurate. So, my thought would be to treat the public’s probability of the event as a prior, and then see how adding data, through a model, would change or perturb our inferred probability of **. (Similarly, I could envision using previously published epidemiological research as a prior probability of a disease, and then seeing how the addition of new testing protocols would update that belief.) However, everything I learned about hierarchical Bayesian models has a prior as a distribution on the coefficients. I don’t know how to start with a prior point estimate for the probabili

4 0.14732835 2022 andrew gelman stats-2013-09-13-You heard it here first: Intense exercise can suppress appetite

Introduction: This post is by Phil Price. The New York Times recently ran an article entitled “How Exercise Can Help Us Eat Less,” which begins with this: “Strenuous exercise seems to dull the urge to eat afterward better than gentler workouts, several new studies show, adding to a growing body of science suggesting that intense exercise may have unique benefits.” The article is based on a couple of recent studies in which moderately overweight volunteers participated in different types of exercise, and had their food intake monitored at a subsequent meal. The article also says “[The volunteers] also displayed significantly lower levels of the hormone ghrelin, which is known to stimulate appetite, and elevated levels of both blood lactate and blood sugar, which have been shown to lessen the drive to eat, after the most vigorous interval session than after the other workouts. And the appetite-suppressing effect of the highly intense intervals lingered into the next day, according to food diarie

5 0.14479467 1554 andrew gelman stats-2012-10-31-It not necessary that Bayesian methods conform to the likelihood principle

Introduction: Bayesian inference, conditional on the model and data, conforms to the likelihood principle. But there is more to Bayesian methods than Bayesian inference. See chapters 6 and 7 of Bayesian Data Analysis for much discussion of this point. It saddens me to see that people are still confused on this issue.

6 0.14391389 1719 andrew gelman stats-2013-02-11-Why waste time philosophizing?

7 0.1404734 1469 andrew gelman stats-2012-08-25-Ways of knowing

8 0.13674982 291 andrew gelman stats-2010-09-22-Philosophy of Bayes and non-Bayes: A dialogue with Deborah Mayo

9 0.1343147 1418 andrew gelman stats-2012-07-16-Long discussion about causal inference and the use of hierarchical models to bridge between different inferential settings

10 0.13149163 582 andrew gelman stats-2011-02-20-Statisticians vs. everybody else

11 0.12688887 1941 andrew gelman stats-2013-07-16-Priors

12 0.12609529 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

13 0.12508696 1155 andrew gelman stats-2012-02-05-What is a prior distribution?

14 0.12323073 1459 andrew gelman stats-2012-08-15-How I think about mixture models

15 0.11962494 442 andrew gelman stats-2010-12-01-bayesglm in Stata?

16 0.11883818 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors

17 0.11883514 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization

18 0.11734347 1779 andrew gelman stats-2013-03-27-“Two Dogmas of Strong Objective Bayesianism”

19 0.11534686 1529 andrew gelman stats-2012-10-11-Bayesian brains?

20 0.11420569 2368 andrew gelman stats-2014-06-11-Bayes in the research conversation


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.17), (1, 0.18), (2, -0.046), (3, 0.075), (4, -0.068), (5, -0.009), (6, 0.011), (7, 0.042), (8, 0.006), (9, -0.014), (10, -0.005), (11, -0.027), (12, -0.019), (13, -0.003), (14, 0.011), (15, 0.04), (16, 0.048), (17, 0.008), (18, 0.01), (19, 0.022), (20, -0.007), (21, 0.026), (22, 0.002), (23, -0.01), (24, -0.02), (25, 0.01), (26, 0.034), (27, -0.024), (28, 0.026), (29, -0.027), (30, -0.082), (31, 0.002), (32, -0.045), (33, 0.025), (34, 0.036), (35, -0.028), (36, -0.046), (37, -0.029), (38, -0.008), (39, 0.002), (40, 0.011), (41, 0.011), (42, 0.009), (43, 0.018), (44, 0.027), (45, 0.011), (46, -0.001), (47, 0.01), (48, -0.047), (49, -0.024)]
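The simValue entries in these lists are presumably cosine similarities between posts represented as vectors of topic weights like those above; a sketch under that assumption, with hypothetical 5-topic vectors:

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length topic-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical truncated LSI topic vectors for two posts.
post_a = [0.17, 0.18, -0.046, 0.075, -0.068]
post_b = [0.16, 0.17, -0.050, 0.080, -0.060]

sim = cosine(post_a, post_b)
```

A post compared with itself scores 1 (up to floating-point rounding), which is consistent with the near-1 same-blog entries at the top of each list.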

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97844464 2182 andrew gelman stats-2014-01-22-Spell-checking example demonstrates key aspects of Bayesian data analysis


2 0.84527308 1182 andrew gelman stats-2012-02-24-Untangling the Jeffreys-Lindley paradox

Introduction: Ryan Ickert writes: I was wondering if you’d seen this post , by a particle physicist with some degree of influence. Dr. Dorigo works at CERN and Fermilab. The penultimate paragraph is: From the above expression, the Frequentist researcher concludes that the tracker is indeed biased, and rejects the null hypothesis H0, since there is a less-than-2% probability (P’<α) that a result as the one observed could arise by chance! A Frequentist thus draws, strongly, the opposite conclusion than a Bayesian from the same set of data. How to solve the riddle? He goes on to not solve the riddle. Perhaps you can? Surely with the large sample size they have (n=10^6), the precision on the frequentist p-value is pretty good, is it not? My reply: The first comment on the site (by Anonymous [who, just to be clear, is not me; I have no idea who wrote that comment], 22 Feb 2012, 21:27pm) pretty much nails it: In setting up the Bayesian model, Dorigo assumed a silly distribution on th

3 0.82355803 811 andrew gelman stats-2011-07-20-Kind of Bayesian

Introduction: Astrophysicist Andrew Jaffe pointed me to this and discussion of my philosophy of statistics (which is, in turn, my rational reconstruction of the statistical practice of Bayesians such as Rubin and Jaynes). Jaffe’s summary is fair enough and I only disagree in a few points: 1. Jaffe writes: Subjective probability, at least the way it is actually used by practicing scientists, is a sort of “as-if” subjectivity — how would an agent reason if her beliefs were reflected in a certain set of probability distributions? This is why when I discuss probability I try to make the pedantic point that all probabilities are conditional, at least on some background prior information or context. I agree, and my problem with the usual procedures used for Bayesian model comparison and Bayesian model averaging is not that these approaches are subjective but that the particular models being considered don’t make sense. I’m thinking of the sorts of models that say the truth is either A or

4 0.80896837 1208 andrew gelman stats-2012-03-11-Gelman on Hennig on Gelman on Bayes

Introduction: Deborah Mayo pointed me to this discussion by Christian Hennig of my recent article on Induction and Deduction in Bayesian Data Analysis. A couple days ago I responded to comments by Mayo, Stephen Senn, and Larry Wasserman. I will respond to Hennig by pulling out paragraphs from his discussion and then replying. Hennig: for me the terms “frequentist” and “subjective Bayes” point to interpretations of probability, and not to specific methods of inference. The frequentist one refers to the idea that there is an underlying data generating process that repeatedly throws out data and would approximate the assumed distribution if one could only repeat it infinitely often. Hennig makes the good point that, if this is the way you would define “frequentist” (it’s not how I’d define the term myself, but I’ll use Hennig’s definition here), then it makes sense to be a frequentist in some settings but not others. Dice really can be rolled over and over again; a sample survey of 15

5 0.80703729 781 andrew gelman stats-2011-06-28-The holes in my philosophy of Bayesian data analysis

Introduction: I’ve been writing a lot about my philosophy of Bayesian statistics and how it fits into Popper’s ideas about falsification and Kuhn’s ideas about scientific revolutions. Here’s my long, somewhat technical paper with Cosma Shalizi. Here’s our shorter overview for the volume on the philosophy of social science. Here’s my latest try (for an online symposium), focusing on the key issues. I’m pretty happy with my approach–the familiar idea that Bayesian data analysis iterates the three steps of model building, inference, and model checking–but it does have some unresolved (maybe unresolvable) problems. Here are a couple mentioned in the third of the above links. Consider a simple model with independent data y_1, y_2, .., y_10 ~ N(θ,σ^2), with a prior distribution θ ~ N(0,10^2) and σ known and taking on some value of approximately 10. Inference about θ is straightforward, as is model checking, whether based on graphs or numerical summaries such as the sample variance and skewn

6 0.79902947 291 andrew gelman stats-2010-09-22-Philosophy of Bayes and non-Bayes: A dialogue with Deborah Mayo

7 0.79689509 1779 andrew gelman stats-2013-03-27-“Two Dogmas of Strong Objective Bayesianism”

8 0.79560661 1438 andrew gelman stats-2012-07-31-What is a Bayesian?

9 0.79165882 1510 andrew gelman stats-2012-09-25-Incoherence of Bayesian data analysis

10 0.78789437 1719 andrew gelman stats-2013-02-11-Why waste time philosophizing?

11 0.78575671 2027 andrew gelman stats-2013-09-17-Christian Robert on the Jeffreys-Lindley paradox; more generally, it’s good news when philosophical arguments can be transformed into technical modeling issues

12 0.78355736 1157 andrew gelman stats-2012-02-07-Philosophy of Bayesian statistics: my reactions to Hendry

13 0.77842182 1898 andrew gelman stats-2013-06-14-Progress! (on the understanding of the role of randomization in Bayesian inference)

14 0.77472395 1571 andrew gelman stats-2012-11-09-The anti-Bayesian moment and its passing

15 0.77369338 1529 andrew gelman stats-2012-10-11-Bayesian brains?

16 0.77104628 1041 andrew gelman stats-2011-12-04-David MacKay and Occam’s Razor

17 0.77099371 1332 andrew gelman stats-2012-05-20-Problemen met het boek

18 0.77064669 1247 andrew gelman stats-2012-04-05-More philosophy of Bayes

19 0.76645553 320 andrew gelman stats-2010-10-05-Does posterior predictive model checking fit with the operational subjective approach?

20 0.76620966 342 andrew gelman stats-2010-10-14-Trying to be precise about vagueness


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(16, 0.163), (18, 0.051), (24, 0.119), (40, 0.034), (53, 0.047), (86, 0.139), (94, 0.045), (99, 0.273)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.95627993 185 andrew gelman stats-2010-08-04-Why does anyone support private macroeconomic forecasts?

Introduction: Tyler Cowen asks the above question. I don’t have a full answer, but, in the Economics section of A Quantitative Tour of the Social Sciences , Richard Clarida discusses in detail the ways that researchers have tried to estimate the extent to which government or private forecasts supply additional information.

same-blog 2 0.95082372 2182 andrew gelman stats-2014-01-22-Spell-checking example demonstrates key aspects of Bayesian data analysis


3 0.92983043 1547 andrew gelman stats-2012-10-25-College football, voting, and the law of large numbers

Introduction: In an article provocatively entitled, “Will Ohio State’s football team decide who wins the White House?”, Tyler Cowen and Kevin Grier report : It is statistically possible that the outcome of a handful of college football games in the right battleground states could determine the race for the White House. Economists Andrew Healy, Neil Malhotra, and Cecilia Mo make this argument in a fascinating article in the Proceedings of the National Academy of Science. They examined whether the outcomes of college football games on the eve of elections for presidents, senators, and governors affected the choices voters made. They found that a win by the local team, in the week before an election, raises the vote going to the incumbent by around 1.5 percentage points. When it comes to the 20 highest attendance teams—big athletic programs like the University of Michigan, Oklahoma, and Southern Cal—a victory on the eve of an election pushes the vote for the incumbent up by 3 percentage points. T

4 0.92594576 1278 andrew gelman stats-2012-04-23-“Any old map will do” meets “God is in every leaf of every tree”

Introduction: As a statistician I am particularly worried about the rhetorical power of anecdotes (even though I use them in my own reasoning; see discussion below). But much can be learned from a true anecdote. The rough edges—the places where the anecdote doesn’t fit your thesis—these are where you learn. We have recently had a discussion ( here and here ) of Karl Weick, a prominent scholar of business management who plagiarized a story and then went on to draw different lessons from the pilfered anecdote in several different publications published over many years. Setting aside any issues of plagiarism and rulebreaking, I argue that, by hiding the source of the story and changing its form, Weick and his management-science audience are losing their ability to get anything out of it beyond empty confirmation. A full discussion follows. 1. The lost Hungarian soldiers Thomas Basbøll (who has the unusual (to me) job of “writing consultant” at the Copenhagen Business School) has been

5 0.92368257 2082 andrew gelman stats-2013-10-30-Berri Gladwell Loken football update

Introduction: Sports researcher Dave Berri had a disagreement with a remark in our recent discussion of Malcolm Gladwell. Berri writes: This post [from Gelman] contains the following paragraph: Similarly, when Gladwell claimed that NFL quarterback performance is unrelated to the order they were drafted out of college, he appears to have been wrong. But if you take his writing as stone soup, maybe it’s valuable: just retreat to the statement that there’s only a weak relationship between draft order and NFL performance. That alone is interesting. It’s too bad that Gladwell sometimes has to make false general statements in order to get our attention, but maybe that’s what is needed to shake people out of their mental complacency. The above paragraph links to a blog post by Eric Loken. This is something you have linked to before. And when you linked to it before I tried to explain why Loken’s work is not very good. Since you still think this work shows that Gladwell – and therefore Rob

6 0.91974556 1266 andrew gelman stats-2012-04-16-Another day, another plagiarist

7 0.91954827 1718 andrew gelman stats-2013-02-11-Toward a framework for automatic model building

8 0.91919804 722 andrew gelman stats-2011-05-20-Why no Wegmania?

9 0.91689312 615 andrew gelman stats-2011-03-16-Chess vs. checkers

10 0.91669816 253 andrew gelman stats-2010-09-03-Gladwell vs Pinker

11 0.91606557 1980 andrew gelman stats-2013-08-13-Test scores and grades predict job performance (but maybe not at Google)

12 0.91444635 1019 andrew gelman stats-2011-11-19-Validation of Software for Bayesian Models Using Posterior Quantiles

13 0.91356218 1016 andrew gelman stats-2011-11-17-I got 99 comparisons but multiplicity ain’t one

14 0.91308916 1971 andrew gelman stats-2013-08-07-I doubt they cheated

15 0.91131216 873 andrew gelman stats-2011-08-26-Luck or knowledge?

16 0.91112655 1586 andrew gelman stats-2012-11-21-Readings for a two-week segment on Bayesian modeling?

17 0.91040844 2058 andrew gelman stats-2013-10-11-Gladwell and Chabris, David and Goliath, and science writing as stone soup

18 0.90882945 1327 andrew gelman stats-2012-05-18-Comments on “A Bayesian approach to complex clinical diagnoses: a case-study in child abuse”

19 0.9085263 154 andrew gelman stats-2010-07-18-Predictive checks for hierarchical models

20 0.90783042 187 andrew gelman stats-2010-08-05-Update on state size and governors’ popularity