2184 andrew gelman stats-2014-01-24-Parables vs. stories


meta info for this blog

Source: html

Introduction: God is in every leaf of every tree, but he is not in every leaf of every parable. Let me explain with a story. A few months ago I read the new book, Doing Data Science, by Rachel Schutt and Cathy O’Neil, and I came across the following motivation for comprehensive integration of data sources, a story that is reminiscent of the parables we sometimes see in business books: By some estimates, one or two patients died per week in a certain smallish town because of the lack of information flow between the hospital’s emergency room and the nearby mental health clinic. In other words, if the records had been easier to match, they’d have been able to save more lives. On the other hand, if it had been easy to match records, other breaches of confidence might also have occurred. Of course it’s hard to know exactly how many lives are at stake, but it’s nontrivial. The moral: We can assume we think privacy is a generally good thing. . . . But privacy takes lives too, as we see from


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 God is in every leaf of every tree, but he is not in every leaf of every parable. [sent-1, score-0.811]

2 But privacy takes lives too, as we see from this story of emergency room deaths. [sent-11, score-0.544]

3 Here’s Wikipedia: “A town is a human settlement larger than a village but smaller than a city. [sent-17, score-0.607]

4 In the United States of America, the term “town” refers to an area of population distinct from others in some meaningful dimension, typically population or type of government. [sent-22, score-0.405]

5 In some instances, the term “town” refers to a small incorporated municipality of less than 10,000 people, while in others a town can be significantly larger. [sent-26, score-0.907]

6 Some states do not use the term ‘town’ at all, while in others the term has no official meaning and is used informally to refer to a populated place, of any size, whether incorporated or unincorporated. [sent-27, score-0.39]

7 If approximately 1/70 of the population is dying every year, that’s 140 deaths a year. [sent-34, score-0.386]

8 So that can’t be right—there’s no way that half the deaths in this town are caused by poor record-keeping in a hospital. [sent-35, score-0.7]

9 If the town had 20,000 people (which would seem to be near the upper limit of the population of a town that one would call “smallish,” at least in the United States), then we’re talking 1/4 of the deaths, which still seems way too large a proportion. [sent-36, score-1.233]

10 Even if it is a town with lots of old people, so that much more than 1/70 of the population is dropping off each year, the numbers just don’t seem to add up. [sent-37, score-0.678]

11 Based on my calculations, I feel like there is something missing in the story that was told about the hospital records. [sent-41, score-0.426]

12 It’s hard for me to know, though, because the story is not sourced. [sent-44, score-0.241]

13 My point is that, if we want to truly learn from a story, in the “God is in every leaf of every tree” sense, we can’t just relax and soak in the message, we need to push push push. [sent-47, score-0.503]

14 I don’t know what’s going on here, whether the story is entirely made up or maybe the numbers got garbled and are off by one or two orders of magnitude or maybe two stories got mashed up and something was lost in translation, or maybe there’s some key aspect I’m not understanding. [sent-48, score-0.518]

15 Without sourcing, without any way to get more information, we don’t have a story at all, we have a parable. [sent-51, score-0.241]

16 Here’s Webster: par·a·ble noun \ˈpa-rə-bəl\ : a short story that teaches a moral or spiritual lesson; especially : one of the stories told by Jesus Christ and recorded in the Bible. [sent-52, score-0.579]

17 specifically : a usually short fictitious story that illustrates a moral attitude or a religious principle. OK, so there’s nothing religious in this parable (nothing about Bayes or Emacs or Linux, yuk yuk yuk) but, yes, it illustrates a moral attitude. [sent-55, score-1.309]

18 If deeper investigation into the parable were to change or complexify its message, that would defeat the purpose. [sent-59, score-0.304]

19 Indeed, for the purpose of writing a general-interest book, maybe a parable works better than an endlessly-complexifying story. [sent-64, score-0.355]

20 But I think it’s been a good real-life story to illustrate the distinction between parables and real-life stories. [sent-66, score-0.35]
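
The back-of-the-envelope arithmetic in sentences 7 to 10 above can be checked with a short sketch. It is only an illustration under stated assumptions: the 1/70 annual death rate and the 20,000 upper limit come from the post, the 10,000 population is inferred from the "140 deaths a year" figure, a year is taken as 52 weeks, and the function names are hypothetical.

```python
# Rough check of the "one or two deaths per week" claim against a town's
# total annual deaths. All inputs are assumptions taken from, or inferred
# from, the post; none of them are measured data.

WEEKS_PER_YEAR = 52


def annual_deaths(population, death_rate=1 / 70):
    """Expected deaths per year if a fixed fraction of the town dies annually."""
    return population * death_rate


def attributed_share(population, deaths_per_week, death_rate=1 / 70):
    """Fraction of all annual deaths that the weekly claim would account for."""
    claimed_per_year = deaths_per_week * WEEKS_PER_YEAR
    return claimed_per_year / annual_deaths(population, death_rate)


if __name__ == "__main__":
    for population in (10_000, 20_000):
        total = annual_deaths(population)
        low = attributed_share(population, 1)
        high = attributed_share(population, 2)
        print(
            f"population {population}: about {total:.0f} deaths a year, "
            f"claimed share between {low:.0%} and {high:.0%}"
        )
```

Taking the midpoint of 1.5 deaths per week, the share comes to a little over half of all deaths for a town of 10,000 and a little over a quarter for a town of 20,000, which matches the "half" and "1/4" figures in the post.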


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('town', 0.555), ('story', 0.241), ('parable', 0.238), ('smallish', 0.19), ('yuk', 0.164), ('deaths', 0.145), ('leaf', 0.135), ('rachel', 0.135), ('population', 0.123), ('moral', 0.12), ('hospital', 0.119), ('every', 0.118), ('crawshaw', 0.109), ('municipality', 0.109), ('parables', 0.109), ('stories', 0.097), ('privacy', 0.095), ('cathy', 0.095), ('term', 0.093), ('emergency', 0.092), ('picky', 0.09), ('incorporated', 0.084), ('god', 0.075), ('patients', 0.072), ('tree', 0.069), ('records', 0.069), ('illustrates', 0.068), ('told', 0.066), ('push', 0.066), ('refers', 0.066), ('deeper', 0.066), ('states', 0.065), ('religious', 0.063), ('match', 0.062), ('maybe', 0.06), ('wikipedia', 0.059), ('room', 0.059), ('city', 0.058), ('purpose', 0.057), ('lives', 0.057), ('fractality', 0.055), ('noun', 0.055), ('webster', 0.055), ('informally', 0.055), ('united', 0.054), ('breaches', 0.052), ('addendum', 0.052), ('reminiscent', 0.052), ('sourcing', 0.052), ('settlement', 0.052)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999964 2184 andrew gelman stats-2014-01-24-Parables vs. stories

2 0.69306129 2084 andrew gelman stats-2013-11-01-Doing Data Science: What’s it all about?

Introduction: Rachel Schutt and Cathy O’Neil just came out with a wonderfully readable book on doing data science, based on a course Rachel taught last year at Columbia. Rachel is a former Ph.D. student of mine and so I’m inclined to have a positive view of her work; on the other hand, I did actually look at the book and I did find it readable! What do I claim is the least important part of data science? Here’s what Schutt and O’Neil say regarding the title: “Data science is not just a rebranding of statistics or machine learning but rather a field unto itself.” I agree. There’s so much that goes on with data that is about computing, not statistics. I do think it would be fair to consider statistics (which includes sampling, experimental design, and data collection as well as data analysis (which itself includes model building, visualization, and model checking as well as inference)) as a subset of data science. The question then arises: why do descriptions of data science focus so

3 0.2187233 195 andrew gelman stats-2010-08-09-President Carter

Introduction: This assessment by Tyler Cowen reminded me that, in 1980, I and just about all my friends hated Jimmy Carter. Most of us much preferred him to Reagan but still hated Carter. I wouldn’t associate this with any particular ideological feeling—it’s not that we thought he was too liberal, or too conservative. He just seemed completely ineffectual. I remember feeling at the time that he had no principles, that he’d do anything to get elected. In retrospect, I think of this as an instance of uniform partisan swing: the president was unpopular nationally, and attitudes about him were negative, relatively speaking, among just about every group. My other Carter story comes from a conversation I had a couple years ago with an economist who’s about my age, a man who said that one reason he and his family moved from town A to town B in his metropolitan area was that, in town B, they didn’t feel like they were the only Republicans on their block. Anyway, this guy described himself as a “

4 0.14686349 1278 andrew gelman stats-2012-04-23-“Any old map will do” meets “God is in every leaf of every tree”

Introduction: As a statistician I am particularly worried about the rhetorical power of anecdotes (even though I use them in my own reasoning; see discussion below). But much can be learned from a true anecdote. The rough edges—the places where the anecdote doesn’t fit your thesis—these are where you learn. We have recently had a discussion (here and here) of Karl Weick, a prominent scholar of business management who plagiarized a story and then went on to draw different lessons from the pilfered anecdote in several different publications published over many years. Setting aside any issues of plagiarism and rulebreaking, I argue that, by hiding the source of the story and changing its form, Weick and his management-science audience are losing their ability to get anything out of it beyond empty confirmation. A full discussion follows. 1. The lost Hungarian soldiers Thomas Basbøll (who has the unusual (to me) job of “writing consultant” at the Copenhagen Business School) has been

5 0.14461258 408 andrew gelman stats-2010-11-11-Incumbency advantage in 2010

Introduction: See here for the full story.

6 0.11498825 2284 andrew gelman stats-2014-04-07-How literature is like statistical reasoning: Kosara on stories. Gelman and Basbøll on stories.

7 0.1126826 87 andrew gelman stats-2010-06-15-Statistical analysis and visualization of the drug war in Mexico

8 0.10413805 624 andrew gelman stats-2011-03-22-A question about the economic benefits of universities

9 0.10031901 972 andrew gelman stats-2011-10-25-How do you interpret standard errors from a regression fit to the entire population?

10 0.097358868 2255 andrew gelman stats-2014-03-19-How Americans vote

11 0.085830376 395 andrew gelman stats-2010-11-05-Consulting: how do you figure out what to charge?

12 0.082207836 1269 andrew gelman stats-2012-04-19-Believe your models (up to the point that you abandon them)

13 0.082160048 2106 andrew gelman stats-2013-11-19-More on “data science” and “statistics”

14 0.082118012 1750 andrew gelman stats-2013-03-05-Watership Down, thick description, applied statistics, immutability of stories, and playing tennis with a net

15 0.078900576 1527 andrew gelman stats-2012-10-10-Another reason why you can get good inferences from a bad model

16 0.077957809 719 andrew gelman stats-2011-05-19-Everything is Obvious (once you know the answer)

17 0.077246964 1410 andrew gelman stats-2012-07-09-Experimental work on market-based or non-market-based incentives

18 0.075920314 1517 andrew gelman stats-2012-10-01-“On Inspiring Students and Being Human”

19 0.075136304 715 andrew gelman stats-2011-05-16-“It doesn’t matter if you believe in God. What matters is if God believes in you.”

20 0.073372833 258 andrew gelman stats-2010-09-05-A review of a review of a review of a decade


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.184), (1, -0.063), (2, 0.01), (3, 0.017), (4, 0.017), (5, -0.007), (6, 0.042), (7, 0.028), (8, 0.047), (9, -0.019), (10, -0.042), (11, -0.012), (12, 0.019), (13, 0.012), (14, 0.032), (15, 0.016), (16, 0.002), (17, -0.021), (18, 0.075), (19, -0.027), (20, -0.025), (21, -0.017), (22, -0.042), (23, -0.024), (24, -0.011), (25, 0.034), (26, -0.054), (27, 0.036), (28, 0.005), (29, 0.07), (30, 0.049), (31, -0.022), (32, -0.043), (33, 0.035), (34, 0.084), (35, 0.054), (36, -0.04), (37, -0.02), (38, 0.019), (39, 0.019), (40, -0.034), (41, -0.053), (42, 0.005), (43, -0.033), (44, 0.011), (45, 0.005), (46, -0.002), (47, 0.047), (48, -0.046), (49, 0.023)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96252149 2184 andrew gelman stats-2014-01-24-Parables vs. stories

2 0.76570725 1266 andrew gelman stats-2012-04-16-Another day, another plagiarist

Introduction: This one isn’t actually new, but it’s new to me. It involves University of Michigan business school professor Karl Weick. Here’s the relevant paragraph of Weick’s Wikipedia entry (as of 13 Apr 2012): In several published articles, Weick related a story that originally appeared in a poem by Miroslav Holub that was published in the Times Literary Supplement. Weick plagiarized Holub in that he republished the poem (with some minor differences, including removing line breaks and making small changes in a few words) without quotation or attribution. Some of Weick’s articles included the material with no reference to Holub; others referred to Holub but without indicating that Weick had essentially done a direct copy of Holub’s writing. The plagiarism was detailed in an article by Thomas Basbøll and Henrik Graham. [5] In a response, Weick disputed the claim of plagiarism, writing, “By the time I began to see the Alps story as an example of cognition in the path of the action, I had lo

3 0.76056528 1278 andrew gelman stats-2012-04-23-“Any old map will do” meets “God is in every leaf of every tree”

4 0.75943238 197 andrew gelman stats-2010-08-10-The last great essayist?

Introduction: I recently read a bizarre article by Janet Malcolm on a murder trial in NYC. What threw me about the article was that the story was utterly commonplace (by the standards of today’s headlines): divorced mom kills ex-husband in a custody dispute over their four-year-old daughter. The only interesting features were (a) the wife was a doctor and the husband was a dentist, the sort of people you’d expect to sue rather than slay, and (b) the wife hired a hitman from within the insular immigrant community that she (and her husband) belonged to. But, really, neither of these was much of a twist. To add to the non-storyness of it all, there were no other suspects, the evidence against the wife and the hitman was overwhelming, and even the high-paid defense lawyers didn’t seem to be making much of an effort to convince anyone of their client’s innocence. (One of the closing arguments was that one aspect of the wife’s story was so ridiculous that it had to be true. In the lawyer’s wo

5 0.7498883 2084 andrew gelman stats-2013-11-01-Doing Data Science: What’s it all about?

6 0.73686469 400 andrew gelman stats-2010-11-08-Poli sci plagiarism update, and a note about the benefits of not caring

7 0.73634213 1457 andrew gelman stats-2012-08-13-Retro ethnic slurs

8 0.72980082 504 andrew gelman stats-2011-01-05-For those of you in the U.K., also an amusing paradox involving the infamous hookah story

9 0.72399348 1789 andrew gelman stats-2013-04-05-Elites have alcohol problems too!

10 0.71752131 203 andrew gelman stats-2010-08-12-John McPhee, the Anti-Malcolm

11 0.71489859 1524 andrew gelman stats-2012-10-07-An (impressive) increase in survival rate from 50% to 60% corresponds to an R-squared of (only) 1%. Counterintuitive, huh?

12 0.71333051 1442 andrew gelman stats-2012-08-03-Double standard? Plagiarizing journos get slammed, plagiarizing profs just shrug it off

13 0.70733702 17 andrew gelman stats-2010-05-05-Taking philosophical arguments literally

14 0.70640796 1653 andrew gelman stats-2013-01-04-Census dotmap

15 0.7058686 2119 andrew gelman stats-2013-12-01-Separated by a common blah blah blah

16 0.70478082 335 andrew gelman stats-2010-10-11-How to think about Lou Dobbs

17 0.70360476 1615 andrew gelman stats-2012-12-10-A defense of Tom Wolfe based on the impossibility of the law of small numbers in network structure

18 0.7008599 2341 andrew gelman stats-2014-05-20-plus ça change, plus c’est la même chose

19 0.70003676 2229 andrew gelman stats-2014-02-28-God-leaf-tree

20 0.69861788 2026 andrew gelman stats-2013-09-16-He’s adult entertainer, Child educator, King of the crossfader, He’s the greatest of the greater, He’s a big bad wolf in your neighborhood, Not bad meaning bad but bad meaning good


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(9, 0.022), (10, 0.018), (16, 0.087), (21, 0.026), (22, 0.018), (24, 0.111), (27, 0.033), (44, 0.016), (46, 0.057), (48, 0.013), (52, 0.011), (63, 0.028), (69, 0.016), (73, 0.013), (81, 0.026), (86, 0.106), (99, 0.268)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.9580512 1278 andrew gelman stats-2012-04-23-“Any old map will do” meets “God is in every leaf of every tree”

same-blog 2 0.95626932 2184 andrew gelman stats-2014-01-24-Parables vs. stories

3 0.9538433 1547 andrew gelman stats-2012-10-25-College football, voting, and the law of large numbers

Introduction: In an article provocatively entitled, “Will Ohio State’s football team decide who wins the White House?”, Tyler Cowen and Kevin Grier report: It is statistically possible that the outcome of a handful of college football games in the right battleground states could determine the race for the White House. Economists Andrew Healy, Neil Malhotra, and Cecilia Mo make this argument in a fascinating article in the Proceedings of the National Academy of Science. They examined whether the outcomes of college football games on the eve of elections for presidents, senators, and governors affected the choices voters made. They found that a win by the local team, in the week before an election, raises the vote going to the incumbent by around 1.5 percentage points. When it comes to the 20 highest attendance teams—big athletic programs like the University of Michigan, Oklahoma, and Southern Cal—a victory on the eve of an election pushes the vote for the incumbent up by 3 percentage points. T

4 0.95257288 2260 andrew gelman stats-2014-03-22-Postdoc at Rennes on multilevel missing data imputation

Introduction: Julie Josse sends along this job announcement: A post-doctoral position is available in the applied mathematics department of Agrocampus Rennes. The postdoc will be funded by the Henri Lebesgue Center (see http://www.lebesgue.fr/) if the application is selected. Applicants are expected to send their application before 31 March 2014. The research focus is on development of new methods to deal with missing values and their implementation in the free R software to make them available. We study new multiple imputation methods based on principal component methods. Different aspects are expected to be covered: dealing with missing values in multi-blocks, multi-groups data (groups of individuals and variables); regularization in this framework using a Bayesian approach, dealing with different types of data (continuous, categoricals, etc.). Fields of application are wide and include biological data as well as socio-economic data. Key words: missing values, matrix completion, PCA, B

5 0.95203972 1266 andrew gelman stats-2012-04-16-Another day, another plagiarist

6 0.95082951 2182 andrew gelman stats-2014-01-22-Spell-checking example demonstrates key aspects of Bayesian data analysis

7 0.95038152 2058 andrew gelman stats-2013-10-11-Gladwell and Chabris, David and Goliath, and science writing as stone soup

8 0.94928038 1971 andrew gelman stats-2013-08-07-I doubt they cheated

9 0.94672942 276 andrew gelman stats-2010-09-14-Don’t look at just one poll number–unless you really know what you’re doing!

10 0.94638801 1518 andrew gelman stats-2012-10-02-Fighting a losing battle

11 0.9460724 2082 andrew gelman stats-2013-10-30-Berri Gladwell Loken football update

12 0.9450171 1586 andrew gelman stats-2012-11-21-Readings for a two-week segment on Bayesian modeling?

13 0.94490552 2057 andrew gelman stats-2013-10-10-Chris Chabris is irritated by Malcolm Gladwell

14 0.94490206 866 andrew gelman stats-2011-08-23-Participate in a research project on combining information for prediction

15 0.94395459 1327 andrew gelman stats-2012-05-18-Comments on “A Bayesian approach to complex clinical diagnoses: a case-study in child abuse”

16 0.94335967 1777 andrew gelman stats-2013-03-26-Data Science for Social Good summer fellowship program

17 0.942532 781 andrew gelman stats-2011-06-28-The holes in my philosophy of Bayesian data analysis

18 0.9424504 1983 andrew gelman stats-2013-08-15-More on AIC, WAIC, etc

19 0.94074965 769 andrew gelman stats-2011-06-15-Mr. P by another name . . . is still great!

20 0.94006157 759 andrew gelman stats-2011-06-11-“2 level logit with 2 REs & large sample. computational nightmare – please help”