andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-940 knowledge-graph by maker-knowledge-mining

940 andrew gelman stats-2011-10-03-It depends upon what the meaning of the word “firm” is.


meta info for this blog

Source: html

Introduction: David Hogg pointed me to this news article by Angela Saini: It’s not often that the quiet world of mathematics is rocked by a murder case. But last summer saw a trial that sent academics into a tailspin, and has since swollen into a fevered clash between science and the law. At its heart, this is a story about chance. And it begins with a convicted killer, “T”, who took his case to the court of appeal in 2010. Among the evidence against him was a shoeprint from a pair of Nike trainers, which seemed to match a pair found at his home. While appeals often unmask shaky evidence, this was different. This time, a mathematical formula was thrown out of court. The footwear expert made what the judge believed were poor calculations about the likelihood of the match, compounded by a bad explanation of how he reached his opinion. The conviction was quashed. . . . “The impact will be quite shattering,” says Professor Norman Fenton, a mathematician at Queen Mary, University of London.


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 David Hogg pointed me to this news article by Angela Saini: It’s not often that the quiet world of mathematics is rocked by a murder case. [sent-1, score-0.246]

2 And it begins with a convicted killer, “T”, who took his case to the court of appeal in 2010. [sent-4, score-0.304]

3 Among the evidence against him was a shoeprint from a pair of Nike trainers, which seemed to match a pair found at his home. [sent-5, score-0.799]

4 While appeals often unmask shaky evidence, this was different. [sent-6, score-0.14]

5 The footwear expert made what the judge believed were poor calculations about the likelihood of the match, compounded by a bad explanation of how he reached his opinion. [sent-8, score-0.51]

6 In the last four years he has been an expert witness in six cases . [sent-14, score-0.181]

7 He claims that the decision in the shoeprint case threatens to damage trials now coming to court because experts like him can no longer use the maths they need. [sent-17, score-0.764]

8 Between 1996 and 2006, for example, Nike distributed 786,000 pairs of trainers. [sent-23, score-0.104]

9 This might suggest a match doesn’t mean very much. [sent-24, score-0.133]

10 But if you take into account that there are 1,200 different sole patterns of Nike trainers and around 42 million pairs of sports shoes sold every year, a matching pair becomes more significant. [sent-25, score-0.986]

11 The data needed to run these kinds of calculations, though, isn’t always available. [sent-26, score-0.093]

12 And this is where the expert in this case came under fire. [sent-27, score-0.197]

13 The judge complained that he couldn’t say exactly how many of one particular type of Nike trainer there are in the country. [sent-28, score-0.379]

14 National sales figures for sports shoes are just rough estimates. [sent-29, score-0.361]

15 And so he decided that Bayes’ theorem shouldn’t again be used unless the underlying statistics are “firm”. [sent-30, score-0.239]

16 It seems reasonable to require that numbers be “firm” if they are to be used in a court case. [sent-32, score-0.213]

17 But, what does the judge recommend doing if the numbers aren’t firm? [sent-33, score-0.229]

18 What if the base rates are firm but the likelihoods are not? [sent-36, score-0.523]

19 Cosma Shalizi writes: In so far as the judge said that Bayes’s rule “shouldn’t … be used unless the underlying statistics are ‘firm’,” he was being entirely reasonable. [sent-39, score-0.468]

20 I agree, as long as the judge recognized the problems with using any statistical method when the underlying statistics are not “firm.” [sent-40, score-0.335]
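The arithmetic in sentences 8 to 10, and the worry in sentences 17 and 18 about numbers that are not firm, can be illustrated with a small sensitivity check. The sketch below is not the footwear expert's calculation, which the excerpt does not give. It assumes, purely for illustration, that the 786,000 pairs can be compared against 42 million sports shoes per year over a ten-year window, that a shoe of the right type always leaves a matching print, and that the prior probability is a free input. All function names are hypothetical.

```python
# Illustrative only: the ten-year window, the prior, and the
# "a matching shoe always matches" assumption are NOT from the source.

PAIRS_OF_THIS_TYPE = 786_000         # Nike pairs distributed 1996-2006 (from the article)
SPORTS_SHOES_PER_YEAR = 42_000_000   # pairs of sports shoes sold per year (from the article)
YEARS = 10                           # assumed length of the comparison window


def match_frequency(pairs_of_type, shoes_per_year, years):
    """Rough share of sports shoes in circulation that are of the matching type."""
    return pairs_of_type / (shoes_per_year * years)


def likelihood_ratio(frequency):
    """P(match | suspect's shoe) / P(match | random shoe), taking the numerator as 1."""
    return 1.0 / frequency


def posterior_probability(prior_prob, lr):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    post_odds = prior_odds * lr
    return post_odds / (1.0 + post_odds)


def sensitivity(prior_prob, base_frequency, factors):
    """Posterior probability when the frequency estimate is off by each factor."""
    return [
        (factor, posterior_probability(prior_prob, likelihood_ratio(base_frequency * factor)))
        for factor in factors
    ]


if __name__ == "__main__":
    f = match_frequency(PAIRS_OF_THIS_TYPE, SPORTS_SHOES_PER_YEAR, YEARS)
    # A hypothetical prior of 1 in 1,000; the point is how much the answer moves.
    for factor, post in sensitivity(0.001, f, [0.25, 0.5, 1.0, 2.0, 4.0]):
        print(f"frequency x{factor}: posterior = {post:.3f}")
```

Running the loop with a few different priors shows how strongly the conclusion depends on a frequency that the article itself describes as a rough estimate, which is the issue the post raises about “firm” numbers.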


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('nike', 0.433), ('shoeprint', 0.26), ('trainers', 0.26), ('firm', 0.245), ('judge', 0.229), ('shoes', 0.214), ('pair', 0.203), ('court', 0.156), ('bayes', 0.139), ('match', 0.133), ('sole', 0.116), ('likelihoods', 0.116), ('damage', 0.113), ('expert', 0.112), ('murder', 0.111), ('underlying', 0.106), ('pairs', 0.104), ('kinds', 0.093), ('calculations', 0.09), ('base', 0.09), ('sports', 0.089), ('case', 0.085), ('compounded', 0.079), ('shattering', 0.079), ('queen', 0.079), ('maths', 0.079), ('worn', 0.079), ('trainer', 0.079), ('unless', 0.076), ('rocked', 0.074), ('shouldn', 0.073), ('rates', 0.072), ('complained', 0.071), ('threatens', 0.071), ('norman', 0.071), ('appeals', 0.071), ('witness', 0.069), ('conviction', 0.069), ('clash', 0.069), ('shaky', 0.069), ('convicted', 0.063), ('figuring', 0.062), ('scene', 0.061), ('quiet', 0.061), ('mary', 0.058), ('rough', 0.058), ('mathematician', 0.057), ('used', 0.057), ('hogg', 0.056), ('killer', 0.056)]
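The page does not say how these weights were computed. As a minimal sketch of the general idea, assuming a plain term-frequency times log inverse-document-frequency weighting (the real pipeline may normalize differently, so the numbers will not match the list above), word weights could be derived like this. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {word: weight} dict per tokenized document.

    Weight = (count / document length) * log(N / document frequency).
    This is one common variant, assumed here for illustration.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each word once per document

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {w: (c / total) * math.log(n_docs / doc_freq[w]) for w, c in counts.items()}
        )
    return weights


def top_words(weight_map, n):
    """Highest-weighted words first; ties broken alphabetically."""
    return sorted(weight_map.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    toy_docs = [
        "nike trainers shoeprint judge nike firm".split(),
        "judge court term limits".split(),
        "bayes theorem mammogram".split(),
    ]
    for word, weight in top_words(tfidf(toy_docs)[0], 3):
        print(word, round(weight, 3))
```

Words that appear in every document get weight zero under this variant, which is why distinctive terms such as 'nike' and 'shoeprint' dominate a list like the one above.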

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9999997 940 andrew gelman stats-2011-10-03-It depends upon what the meaning of the word “firm” is.

2 0.094940349 589 andrew gelman stats-2011-02-24-On summarizing a noisy scatterplot with a single comparison of two points

Introduction: John Sides discusses how his scatterplot of unionization rates and budget deficits made it onto cable TV news: It’s also interesting to see how he [journalist Chris Hayes] chooses to explain a scatterplot, especially given the evidence that people don’t always understand scatterplots. He compares pairs of cases that don’t illustrate the basic hypothesis of Brooks, Scott Walker, et al. Obviously, such comparisons could be misleading, but given that there was no systematic relationship depicted in that graph, these particular comparisons are not. This idea, summarizing a bivariate pattern by comparing pairs of points, reminds me of a well-known statistical identity which I refer to in a paper with David Park: John Sides is certainly correct that if you can pick your pair of points, you can make extremely misleading comparisons. But if you pick every pair of points, and average over them appropriately, you end up with the least-squares regression slope. Pretty cool, and

3 0.090969548 417 andrew gelman stats-2010-11-17-Clutering and variance components

Introduction: Raymond Lim writes: Do you have any recommendations on clustering and binary models? My particular problem is I’m running a firm fixed effect logit and want to cluster by industry-year (every combination of industry-year). My control variable of interest is measured by industry-year and when I cluster by industry-year, the standard errors are 300x larger than when I don’t cluster. Strangely, this problem only occurs when doing logit and not OLS (linear probability). Also, clustering just by field doesn’t blow up the errors. My hunch is it has something to do with the non-nested structure of year, but I don’t understand why this is only problematic under logit and not OLS. My reply: I’d recommend including four multilevel variance parameters, one for firm, one for industry, one for year, and one for industry-year. (In lmer, that’s (1 | firm) + (1 | industry) + (1 | year) + (1 | industry.year)). No need to include (1 | firm.year) since in your data this is the error term. Try

4 0.088214137 1488 andrew gelman stats-2012-09-08-Annals of spam

Introduction: I have to go through the inbox to approve new comments. When I set to auto-approve, I get overwhelmed with spam. As is, I still get spam but it’s manageable. Usually the spam is uninteresting but this one caught my eye: At first this seemed reasonable enough: law firm is desperate for business, spams blogs to raise its Google ranking. But what’s with the writing in the actual comment? It’s incoherent but it doesn’t look computer-generated. My guess is that the law firm in Massachusetts hired a company that promised to raise their Google rankings, and that this company hired some non-English-speaking foreigners to search through the web and write some spam comments. If anyone actually reads the comments, they might get the impression that this law firm is staffed by illiterates . . . but, as we all know, nobody reads blog comments! P.S. I followed the link (sorry!) and came across this: I guess if they’re going to use a tragedy as an excuse to troll for Faceb

5 0.0879509 3 andrew gelman stats-2010-04-26-Bayes in the news…in a somewhat frustrating way

Introduction: I’m not sure how the New York Times defines a blog versus an article, so perhaps this post should be called “Bayes in the blogs.” Whatever. A recent NY Times article/blog post discusses a classic Bayes’ Theorem application — probability that the patient has cancer, given a “positive” mammogram — and purports to give a solution that is easy for students to understand because it doesn’t require Bayes’ Theorem, which is of course complicated and confusing. You can see my comment (#17) here.

6 0.083380118 527 andrew gelman stats-2011-01-20-Cars vs. trucks

7 0.082945153 375 andrew gelman stats-2010-10-28-Matching for preprocessing data for causal inference

8 0.079702765 754 andrew gelman stats-2011-06-09-Difficulties with Bayesian model averaging

9 0.078787953 235 andrew gelman stats-2010-08-25-Term Limits for the Supreme Court?

10 0.077633135 330 andrew gelman stats-2010-10-09-What joker put seven dog lice in my Iraqi fez box?

11 0.074003644 213 andrew gelman stats-2010-08-17-Matching at two levels

12 0.073817521 1173 andrew gelman stats-2012-02-17-Sports examples in class

13 0.071213149 1560 andrew gelman stats-2012-11-03-Statistical methods that work in some settings but not others

14 0.070381895 370 andrew gelman stats-2010-10-25-Who gets wedding announcements in the Times?

15 0.069360867 197 andrew gelman stats-2010-08-10-The last great essayist?

16 0.068231851 1522 andrew gelman stats-2012-10-05-High temperatures cause violent crime and implications for climate change, also some suggestions about how to better summarize these claims

17 0.06283728 1422 andrew gelman stats-2012-07-20-Likelihood thresholds and decisions

18 0.06222843 538 andrew gelman stats-2011-01-25-Postdoc Position #2: Hierarchical Modeling and Statistical Graphics

19 0.06176376 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

20 0.060991749 146 andrew gelman stats-2010-07-14-The statistics and the science
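The page does not state how simValue is computed. A common choice for comparing tf-idf vectors is cosine similarity, and the same-blog value of 0.9999997 is consistent with comparing a vector with itself in floating point, but that reading is an assumption. The sketch below ranks posts by cosine similarity over sparse word-weight dictionaries; the function names and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as {word: weight}."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def rank_similar(query, corpus):
    """Return (similarity, doc_id) pairs, most similar first."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in corpus.items()]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


if __name__ == "__main__":
    # Toy vectors; only the first borrows weights from the tf-idf list on this page.
    corpus = {
        "940": {"nike": 0.433, "firm": 0.245, "judge": 0.229},
        "toy-a": {"firm": 0.5, "logit": 0.4},
        "toy-b": {"stan": 0.6, "hmc": 0.3},
    }
    for score, doc_id in rank_similar(corpus["940"], corpus):
        print(doc_id, round(score, 3))
```

Under this reading, a post shares a nonzero score with post 940 only if the two have weighted words in common, which would explain why posts mentioning 'firm' or 'Bayes' rank near the top of the tf-idf list.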


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.127), (1, -0.011), (2, -0.003), (3, -0.005), (4, -0.012), (5, -0.002), (6, 0.005), (7, 0.02), (8, 0.008), (9, -0.01), (10, -0.031), (11, -0.014), (12, -0.001), (13, -0.023), (14, -0.024), (15, 0.024), (16, 0.024), (17, 0.013), (18, 0.009), (19, -0.015), (20, -0.002), (21, 0.016), (22, -0.024), (23, 0.007), (24, 0.011), (25, 0.019), (26, -0.035), (27, 0.011), (28, 0.004), (29, -0.016), (30, 0.003), (31, 0.039), (32, 0.009), (33, 0.002), (34, 0.003), (35, 0.006), (36, -0.008), (37, -0.021), (38, 0.016), (39, 0.042), (40, -0.02), (41, -0.034), (42, -0.025), (43, -0.012), (44, -0.004), (45, -0.006), (46, -0.014), (47, -0.012), (48, 0.026), (49, 0.025)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9648267 940 andrew gelman stats-2011-10-03-It depends upon what the meaning of the word “firm” is.

2 0.74447125 526 andrew gelman stats-2011-01-19-“If it saves the life of a single child…” and other nonsense

Introduction: This post is by Phil Price. An Oregon legislator, Mitch Greenlick, has proposed to make it illegal in Oregon to carry a child under six years old on one’s bike (including in a child seat) or in a bike trailer. The guy says “We’ve just done a study showing that 30 percent of riders biking to work at least three days a week have some sort of crash that leads to an injury… When that’s going on out there, what happens when you have a four year old on the back of a bike?” The study is from Oregon Health Sciences University, at which the legislator is a professor. Greenlick also says “If it’s true that it’s unsafe, we have an obligation to protect people. If I thought a law would save one child’s life, I would step in and do it. Wouldn’t you?” There are two statistical issues here. The first is in the category of “lies, damn lies, and statistics,” and involves the statement about how many riders have injuries. As quoted on a blog, the author of the study in question says th

3 0.7387408 1942 andrew gelman stats-2013-07-17-“Stop and frisk” statistics

Introduction: Washington Post columnist Richard Cohen brings up one of my research topics: In New York City, blacks make up a quarter of the population, yet they represent 78 percent of all shooting suspects — almost all of them young men. We know them from the nightly news. Those statistics represent the justification for New York City’s controversial stop-and-frisk program, which amounts to racial profiling writ large. After all, if young black males are your shooters, then it ought to be young black males whom the police stop and frisk. I have two comments on this. First, my research with Jeff Fagan and Alex Kiss (based on data from the late 1990s, so maybe things have changed) found that the NYPD was stopping blacks and hispanics at a rate higher than their previous arrest rates: To briefly summarize our findings, blacks and Hispanics represented 51% and 33% of the stops while representing only 26% and 24% of the New York City population. Compared with the number of arrests of

4 0.73530567 69 andrew gelman stats-2010-06-04-A Wikipedia whitewash

Introduction: After hearing a few times about the divorce predictions of researchers John Gottman and James Murray (work that was featured in Blink with a claim that they could predict with 83 percent accuracy whether a couple would be divorced, after meeting with them for 15 minutes) and feeling some skepticism, I decided to do the Lord’s work and amend Gottman’s wikipedia entry, which had a paragraph saying: Gottman found his methodology predicts with 90% accuracy which newlywed couples will remain married and which will divorce four to six years later. It is also 81% percent accurate in predicting which marriages will survive after seven to nine years. I added the following: Gottman’s claim of 81% or 90% accuracy is misleading, however, because the accuracy is measured only after fitting a model to his data. There is no evidence that he can predict the outcome of a marriage with high accuracy in advance. As Laurie Abraham writes, “For the 1998 study, which focused on videotapes of 57

5 0.73481828 1058 andrew gelman stats-2011-12-14-Higgs bozos: Rosencrantz and Guildenstern are spinning in their graves

Introduction: David Hogg sends in this bizarre bit of news reporting by Robert Evans: Until now, in the four decades since it was first posited, no one has convincingly claimed to have glimpsed the Higgs Boson, let alone proved that it actually exists. At an eagerly awaited briefing on Tuesday at the CERN research centre near Geneva, two independent teams of “Higgs Hunters” – a term they themselves hate – were widely expected to suggest they were fairly confident they had spotted it. But not confident enough, in the physics world of ultra-precision where certainty has to be measured at nothing less than 100 percent, to announce “a discovery.” In the jargon, this level is described as 5 sigma . . . So far, so good. But then comes this doozy: As one scientist explained, that level of accuracy would equate to the 17th-century discoverer of gravity, Isaac Newton, sitting under his apple tree and a million apples one after another falling on his head without one missing. Huh? A free

6 0.7318998 2174 andrew gelman stats-2014-01-17-How to think about the statistical evidence when the statistical evidence can’t be conclusive?

7 0.73152357 791 andrew gelman stats-2011-07-08-Censoring on one end, “outliers” on the other, what can we do with the middle?

8 0.72100031 1187 andrew gelman stats-2012-02-27-“Apple confronts the law of large numbers” . . . huh?

9 0.7183522 2354 andrew gelman stats-2014-05-30-Mmm, statistical significance . . . Evilicious!

10 0.71259916 1541 andrew gelman stats-2012-10-19-Statistical discrimination again

11 0.7094121 137 andrew gelman stats-2010-07-10-Cost of communicating numbers

12 0.70896578 1838 andrew gelman stats-2013-05-03-Setting aside the politics, the debate over the new health-care study reveals that we’re moving to a new high standard of statistical journalism

13 0.70646858 549 andrew gelman stats-2011-02-01-“Roughly 90% of the increase in . . .” Hey, wait a minute!

14 0.70570999 504 andrew gelman stats-2011-01-05-For those of you in the U.K., also an amusing paradox involving the infamous hookah story

15 0.70557886 1906 andrew gelman stats-2013-06-19-“Behind a cancer-treatment firm’s rosy survival claims”

16 0.70146447 1789 andrew gelman stats-2013-04-05-Elites have alcohol problems too!

17 0.69745106 1525 andrew gelman stats-2012-10-08-Ethical standards in different data communities

18 0.6966067 1822 andrew gelman stats-2013-04-24-Samurai sword-wielding Mormon bishop pharmaceutical statistician stops mugger

19 0.69093704 2053 andrew gelman stats-2013-10-06-Ideas that spread fast and slow

20 0.69067502 235 andrew gelman stats-2010-08-25-Term Limits for the Supreme Court?


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(15, 0.018), (16, 0.068), (21, 0.012), (24, 0.123), (51, 0.011), (63, 0.025), (65, 0.012), (82, 0.304), (84, 0.02), (85, 0.013), (86, 0.01), (98, 0.021), (99, 0.227)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.95038861 1772 andrew gelman stats-2013-03-20-Stan at Google this Thurs and at Berkeley this Fri noon

Introduction: Michael Betancourt will be speaking at Google and at the University of California, Berkeley. The Google talk is closed to outsiders (but if you work at Google, you should go!); the Berkeley talk is open to all: Friday March 22, 12:10 pm, Evans Hall 1011. Title of talk: Stan : Practical Bayesian Inference with Hamiltonian Monte Carlo Abstract: Practical implementations of Bayesian inference are often limited to approximation methods that only slowly explore the posterior distribution. By taking advantage of the curvature of the posterior, however, Hamiltonian Monte Carlo (HMC) efficiently explores even the most highly contorted distributions. In this talk I will review the foundations of and recent developments within HMC, concluding with a discussion of Stan, a powerful inference engine that utilizes HMC, automatic differentiation, and adaptive methods to minimize user input. This is cool stuff. And he’ll be showing the whirlpool movie!

same-blog 2 0.91975313 940 andrew gelman stats-2011-10-03-It depends upon what the meaning of the word “firm” is.

3 0.91625583 1749 andrew gelman stats-2013-03-04-Stan in L.A. this Wed 3:30pm

Introduction: Michael Betancourt will be speaking at UCLA: The location for refreshment is in room 51-254 CHS at 3:00 PM. The place for the seminar is at CHS 33-105A at 3:30pm – 4:30pm, Wed 6 Mar. ["CHS" stands for Center for Health Sciences, the building of the UCLA schools of medicine and public health. Here's a map with directions .] Title of talk: Stan : Practical Bayesian Inference with Hamiltonian Monte Carlo Abstract: Practical implementations of Bayesian inference are often limited to approximation methods that only slowly explore the posterior distribution. By taking advantage of the curvature of the posterior, however, Hamiltonian Monte Carlo (HMC) efficiently explores even the most highly contorted distributions. In this talk I will review the foundations of and recent developments within HMC, concluding with a discussion of Stan, a powerful inference engine that utilizes HMC, automatic differentiation, and adaptive methods to minimize user input. This is cool stuff.

4 0.90967894 335 andrew gelman stats-2010-10-11-How to think about Lou Dobbs

Introduction: I was unsurprised to read that Lou Dobbs, the former CNN host who crusaded against illegal immigrants, had actually hired a bunch of them himself to maintain his large house and his horse farm. (OK, I have to admit I was surprised by the part about the horse farm.) But I think most of the reactions to this story missed the point. Isabel Macdonald’s article that broke the story was entitled, “Lou Dobbs, American Hypocrite,” and most of the discussion went from there, with some commenters piling on Dobbs and others defending him by saying that Dobbs hired his laborers through contractors and may not have known they were in the country illegally. To me, though, the key issue is slightly different. And Macdonald’s story is relevant whether or not Dobbs knew he was hiring illegals. My point is not that Dobbs is a bad guy, or a hypocrite, or whatever. My point is that, in his setting, it would take an extraordinary effort to not hire illegal immigrants to take care of his house

5 0.90036339 178 andrew gelman stats-2010-08-03-(Partisan) visualization of health care legislation

Introduction: Congressman Kevin Brady from Texas distributes this visualization of reformed health care in the US (click for a bigger picture): Here’s a PDF at Brady’s page, and a local copy of it. Complexity has its costs. Beyond the cost of writing it, learning it, following it, there’s also the cost of checking it. John Walker has some funny examples of what’s hidden in the almost 8000 pages of IRS code. Text mining and applied statistics will solve all that, hopefully. Anyone interested in developing a pork detection system for the legislation? Or an analysis of how much entropy to the legal code did each congressman contribute? There are already spin detectors , that help you detect whether the writer is a Democrat (“stimulus”, “health care”) or a Republican (“deficit spending”, “ObamaCare”). D+0.1: Jared Lander points to versions by Rep. Boehner and Robert Palmer .

6 0.89001405 359 andrew gelman stats-2010-10-21-Applied Statistics Center miniconference: Statistical sampling in developing countries

7 0.85689461 340 andrew gelman stats-2010-10-13-Randomized experiments, non-randomized experiments, and observational studies

8 0.84831142 699 andrew gelman stats-2011-05-06-Another stereotype demolished

9 0.84794772 1958 andrew gelman stats-2013-07-27-Teaching is hard

10 0.84323621 1440 andrew gelman stats-2012-08-02-“A Christmas Carol” as applied to plagiarism

11 0.8321023 1488 andrew gelman stats-2012-09-08-Annals of spam

12 0.83090883 1094 andrew gelman stats-2011-12-31-Using factor analysis or principal components analysis or measurement-error models for biological measurements in archaeology?

13 0.81029695 193 andrew gelman stats-2010-08-09-Besag

14 0.80244666 2003 andrew gelman stats-2013-08-30-Stan Project: Continuous Relaxations for Discrete MRFs

15 0.79755354 366 andrew gelman stats-2010-10-24-Mankiw tax update

16 0.79179657 67 andrew gelman stats-2010-06-03-More on that Dartmouth health care study

17 0.79148149 1553 andrew gelman stats-2012-10-30-Real rothko, fake rothko

18 0.79009187 326 andrew gelman stats-2010-10-07-Peer pressure, selection, and educational reform

19 0.7872299 1134 andrew gelman stats-2012-01-21-Lessons learned from a recent R package submission

20 0.78272778 1963 andrew gelman stats-2013-07-31-Response by Jessica Tracy and Alec Beall to my critique of the methods in their paper, “Women Are More Likely to Wear Red or Pink at Peak Fertility”