andrew_gelman_stats andrew_gelman_stats-2014 andrew_gelman_stats-2014-2249 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: This would make Jean Piaget very happy: CenturyLink Arena in Boise, also home to the Idaho Stampede of the NBA’s D-League, is facing a potential class-action lawsuit from four fans, alleging that the arena management company defrauded fans by offering taller-but-thinner large-size cups that hold the same 16 ounces as the shorter, wider small. “While different shapes, both cup sizes hold substantially the same amount of liquid and are not large versus small in actual capacity,” the group’s attorney, Wyatt Johnson, wrote in the lawsuit. . . .
sentIndex sentText sentNum sentScore
1 “While different shapes, both cup sizes hold substantially the same amount of liquid and are not large versus small in actual capacity,” the group’s attorney, Wyatt Johnson, wrote in the lawsuit. [sent-2, score-1.344]
wordName wordTfidf (topN-words)
[('arena', 0.352), ('fans', 0.302), ('stampede', 0.208), ('liquid', 0.208), ('alleging', 0.208), ('boise', 0.208), ('hold', 0.208), ('cups', 0.197), ('attorney', 0.197), ('idaho', 0.188), ('shapes', 0.181), ('lawsuit', 0.181), ('nba', 0.181), ('wyatt', 0.181), ('jean', 0.176), ('cup', 0.172), ('facing', 0.156), ('shorter', 0.151), ('capacity', 0.151), ('wider', 0.139), ('offering', 0.133), ('substantially', 0.132), ('johnson', 0.13), ('management', 0.124), ('versus', 0.121), ('sizes', 0.107), ('company', 0.105), ('home', 0.101), ('amount', 0.095), ('four', 0.092), ('happy', 0.084), ('actual', 0.084), ('potential', 0.083), ('group', 0.075), ('large', 0.062), ('small', 0.062), ('wrote', 0.052), ('different', 0.041), ('make', 0.034), ('also', 0.027), ('would', 0.023)]
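As a rough illustration of how a ranked word-weight list like the one above could be produced, here is a minimal sketch of one common TF-IDF variant (raw term frequency times log inverse document frequency, L2-normalised per document). The page does not say which variant or library generated its scores, so the toy corpus, the tokenizer, and the function names below are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_top_n(docs, doc_index, n=5):
    """Return the n highest-weighted (word, weight) pairs for one document."""
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: number of documents that contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    tf = Counter(tokenized[doc_index])
    num_docs = len(docs)
    weights = {w: count * math.log(num_docs / df[w]) for w, count in tf.items()}
    # L2-normalise so weights are comparable across documents of different length.
    norm = math.sqrt(sum(v * v for v in weights.values())) or 1.0
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(w, round(v / norm, 3)) for w, v in ranked[:n]]


# Hypothetical toy corpus, not the blog's real text.
toy_docs = [
    "arena fans sue arena over cup sizes",
    "fans watch the game",
    "cup of coffee and the game",
]
top_words = tfidf_top_n(toy_docs, doc_index=0, n=3)
print(top_words)
```

In this toy corpus "arena" ranks first because it occurs twice in the first document and in no other, which is the same pattern that puts rare, repeated words at the head of the list above.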
simIndex simValue blogId blogTitle
same-blog 1 0.99999988 2249 andrew gelman stats-2014-03-15-Recently in the sister blog
2 0.082111731 703 andrew gelman stats-2011-05-10-Bringing Causal Models Into the Mainstream
Introduction: John Johnson writes at the Statistics Forum.
3 0.067724489 963 andrew gelman stats-2011-10-18-Question on Type M errors
Introduction: Inti Pedroso writes: Today during the group meeting at my new job we were revising a paper whose main conclusions were sustained by an ANOVA. One of the first observations is that the experiment had a small sample size. Interestingly (may not so), some of the reported effects (most of them interactions) were quite large. One of the experience group members said that “there is a common wisdom that one should not believe effects from small sample sizes but [he thinks] if they [the effects] are large enough to be picked on a small study they must be real large effects”. I argued that if the sample size is small one could incur on a M-type error in which the magnitude of the effect is being over-estimated and that if larger samples are evaluated the magnitude may become smaller and also the confidence intervals. The concept of M-type error is completely new to all other members of the group (on which I am in my second week) and I was given the job of finding a suitable ref to explain
4 0.061771646 2262 andrew gelman stats-2014-03-23-Win probabilities during a sporting event
Introduction: Todd Schneider writes: Apropos of your recent blog post about modeling score differential of basketball games, I thought you might enjoy a site I built, gambletron2000.com, that gathers real-time win probabilities from betting markets for most major sports (including NBA and college basketball). My original goal was to use the variance of changes in win probabilities to quantify which games were the most exciting, but I got a bit carried away and ended up pursuing a bunch of other ideas, which you can read about in the full writeup here. This particular passage from the anonymous someone in your post: My idea is for each timestep in a game (a second, 5 seconds, etc), use the Vegas line, the current score differential, who has the ball, and the number of possessions played already (to account for differences in pace) to create a point estimate probability of the home team winning. reminded me of a graph I made, which shows the mean-reverting tendency of N
Introduction: Thomas Lumley writes: The Herald has a story about hazards of coffee. The picture caption says Men who drink more than four cups a day are 56 per cent more likely to die. which is obviously not true: deaths, as we’ve observed before, are fixed at one per customer. The story says It’s not that people are dying at a rapid rate. But men who drink more than four cups a day are 56 per cent more likely to die and women have double the chance compared with moderate drinkers, according to the The University of Queensland and the University of South Carolina study. What the study actually reported was rates of death: over an average of 17 years, men who drink more than four cups a day died at about a 21% higher rate, with little evidence of any difference in men. After they considered only men and women under 55 (which they don’t say was something they had planned to do), and attempted to control for a whole bunch of other factors, the rate increase went to 56% for me
6 0.058423229 2140 andrew gelman stats-2013-12-19-Revised evidence for statistical standards
7 0.053408906 1804 andrew gelman stats-2013-04-15-How effective are football coaches?
8 0.049923077 668 andrew gelman stats-2011-04-19-The free cup and the extra dollar: A speculation in philosophy
9 0.048574038 137 andrew gelman stats-2010-07-10-Cost of communicating numbers
10 0.047547411 1289 andrew gelman stats-2012-04-29-We go to war with the data we have, not the data we want
11 0.043894053 1761 andrew gelman stats-2013-03-13-Lame Statistics Patents
12 0.043588687 1723 andrew gelman stats-2013-02-15-Wacky priors can work well?
13 0.041154183 29 andrew gelman stats-2010-05-12-Probability of successive wins in baseball
14 0.04082739 1790 andrew gelman stats-2013-04-06-Calling Jenny Davidson . . .
15 0.040477987 2274 andrew gelman stats-2014-03-30-Adjudicating between alternative interpretations of a statistical interaction?
16 0.0400635 1885 andrew gelman stats-2013-06-06-Leahy Versus Albedoman and the Moneygoround, Part One
17 0.04001696 1541 andrew gelman stats-2012-10-19-Statistical discrimination again
18 0.038926963 1099 andrew gelman stats-2012-01-05-Approaching harmonic convergence
19 0.037803642 1559 andrew gelman stats-2012-11-02-The blog is back
20 0.037469193 1492 andrew gelman stats-2012-09-11-Using the “instrumental variables” or “potential outcomes” approach to clarify causal thinking
topicId topicWeight
[(0, 0.041), (1, -0.008), (2, 0.009), (3, -0.018), (4, 0.007), (5, 0.004), (6, 0.005), (7, 0.002), (8, -0.012), (9, 0.003), (10, -0.031), (11, -0.011), (12, 0.009), (13, -0.008), (14, -0.003), (15, 0.018), (16, -0.002), (17, 0.008), (18, 0.016), (19, 0.014), (20, -0.017), (21, 0.021), (22, -0.002), (23, -0.007), (24, -0.002), (25, 0.01), (26, -0.006), (27, 0.001), (28, 0.007), (29, -0.017), (30, 0.005), (31, 0.004), (32, 0.005), (33, 0.002), (34, -0.007), (35, 0.019), (36, 0.006), (37, 0.005), (38, 0.002), (39, 0.012), (40, -0.004), (41, -0.02), (42, -0.018), (43, 0.01), (44, -0.014), (45, -0.024), (46, -0.001), (47, -0.01), (48, 0.011), (49, 0.005)]
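The page does not state how the simValue column is derived from topic-weight vectors like the one above. Cosine similarity is a common choice for this kind of comparison, so the sketch below assumes it; the function name and the second vector are hypothetical, and only the first four (topicId, topicWeight) pairs are taken from the list above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors given as (topicId, weight) pairs."""
    da, db = dict(a), dict(b)
    # Topics missing from one vector contribute zero to the dot product.
    dot = sum(w * db.get(t, 0.0) for t, w in da.items())
    norm_a = math.sqrt(sum(w * w for w in da.values()))
    norm_b = math.sqrt(sum(w * w for w in db.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for empty or all-zero vectors
    return dot / (norm_a * norm_b)


# First four entries of the topic-weight list above.
post = [(0, 0.041), (1, -0.008), (2, 0.009), (3, -0.018)]
# Hypothetical second post, invented for illustration.
other = [(0, 0.030), (2, 0.012), (3, -0.020), (7, 0.015)]

self_sim = cosine_similarity(post, post)
cross_sim = cosine_similarity(post, other)
print(self_sim, cross_sim)
```

Ranking every other post by a score of this kind and keeping the top 20 would give a list shaped like the ones on this page, although the actual pipeline may differ.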
simIndex simValue blogId blogTitle
same-blog 1 0.9272753 2249 andrew gelman stats-2014-03-15-Recently in the sister blog
2 0.68324655 2049 andrew gelman stats-2013-10-03-On house arrest for p-hacking
Introduction: People keep pointing me to this excellent news article by David Brown, about a scientist who was convicted of data manipulation: In all, 330 patients were randomly assigned to get either interferon gamma-1b or placebo injections. Disease progression or death occurred in 46 percent of those on the drug and 52 percent of those on placebo. That was not a significant difference, statistically speaking. When only survival was considered, however, the drug looked better: 10 percent of people getting the drug died, compared with 17 percent of those on placebo. However, that difference wasn’t “statistically significant,” either. Specifically, the so-called P value — a mathematical measure of the strength of the evidence that there’s a true difference between a treatment and placebo — was 0.08. . . . Technically, the study was a bust, although the results leaned toward a benefit from interferon gamma-1b. Was there a group of patients in which the results tipped? Harkonen asked the statis
Introduction: Thomas Lumley writes: The Herald has a story about hazards of coffee. The picture caption says Men who drink more than four cups a day are 56 per cent more likely to die. which is obviously not true: deaths, as we’ve observed before, are fixed at one per customer. The story says It’s not that people are dying at a rapid rate. But men who drink more than four cups a day are 56 per cent more likely to die and women have double the chance compared with moderate drinkers, according to the The University of Queensland and the University of South Carolina study. What the study actually reported was rates of death: over an average of 17 years, men who drink more than four cups a day died at about a 21% higher rate, with little evidence of any difference in men. After they considered only men and women under 55 (which they don’t say was something they had planned to do), and attempted to control for a whole bunch of other factors, the rate increase went to 56% for me
4 0.6327371 1086 andrew gelman stats-2011-12-27-The most dangerous jobs in America
Introduction: Robin Hanson writes: On the criteria of potential to help people avoid death, this would seem to be among the most important news I’ve ever heard. [In his recent Ph.D. thesis, Ken Lee finds that] death rates depend on job details more than on race, gender, marriage status, rural vs. urban, education, and income combined! Now for the details. The US Department of Labor has described each of 807 occupations with over 200 detailed features on how jobs are done, skills required, etc. Lee looked at seven domains of such features, each containing 16 to 57 features, and for each domain Lee did a factor analysis of those features to find the top 2-4 factors. This gave Lee a total of 22 domain factors. Lee also found four overall factors to describe his total set of 225 job and 9 demographic features. (These four factors explain 32%, 15%, 7%, and 4% of total variance.) Lee then tried to use these 26 job factors, along with his other standard predictors (age, race, gender, m
5 0.62979192 940 andrew gelman stats-2011-10-03-It depends upon what the meaning of the word “firm” is.
Introduction: David Hogg pointed me to this news article by Angela Saini: It’s not often that the quiet world of mathematics is rocked by a murder case. But last summer saw a trial that sent academics into a tailspin, and has since swollen into a fevered clash between science and the law. At its heart, this is a story about chance. And it begins with a convicted killer, “T”, who took his case to the court of appeal in 2010. Among the evidence against him was a shoeprint from a pair of Nike trainers, which seemed to match a pair found at his home. While appeals often unmask shaky evidence, this was different. This time, a mathematical formula was thrown out of court. The footwear expert made what the judge believed were poor calculations about the likelihood of the match, compounded by a bad explanation of how he reached his opinion. The conviction was quashed. . . . “The impact will be quite shattering,” says Professor Norman Fenton, a mathematician at Queen Mary, University of London.
6 0.62194413 1906 andrew gelman stats-2013-06-19-“Behind a cancer-treatment firm’s rosy survival claims”
7 0.61798352 1910 andrew gelman stats-2013-06-22-Struggles over the criticism of the “cannabis users and IQ change” paper
9 0.61350018 490 andrew gelman stats-2010-12-29-Brain Structure and the Big Five
10 0.60137159 1427 andrew gelman stats-2012-07-24-More from the sister blog
12 0.5971998 156 andrew gelman stats-2010-07-20-Burglars are local
13 0.59159726 791 andrew gelman stats-2011-07-08-Censoring on one end, “outliers” on the other, what can we do with the middle?
14 0.5892784 527 andrew gelman stats-2011-01-20-Cars vs. trucks
16 0.58199269 549 andrew gelman stats-2011-02-01-“Roughly 90% of the increase in . . .” Hey, wait a minute!
17 0.57889438 1893 andrew gelman stats-2013-06-11-Folic acid and autism
18 0.56789333 2159 andrew gelman stats-2014-01-04-“Dogs are sensitive to small variations of the Earth’s magnetic field”
19 0.56380725 716 andrew gelman stats-2011-05-17-Is the internet causing half the rapes in Norway? I wanna see the scatterplot.
20 0.56336182 137 andrew gelman stats-2010-07-10-Cost of communicating numbers
topicId topicWeight
[(6, 0.071), (10, 0.031), (16, 0.076), (24, 0.064), (30, 0.029), (32, 0.029), (34, 0.021), (45, 0.031), (51, 0.033), (58, 0.029), (62, 0.027), (63, 0.13), (64, 0.057), (73, 0.059), (81, 0.062), (99, 0.13)]
simIndex simValue blogId blogTitle
same-blog 1 0.94343859 2249 andrew gelman stats-2014-03-15-Recently in the sister blog
2 0.78880227 739 andrew gelman stats-2011-05-31-When Did Girls Start Wearing Pink?
Introduction: That cute picture is of toddler FDR in a dress, from 1884. Jeanne Maglaty writes: A Ladies’ Home Journal article [or maybe from a different source, according to a commenter] in June 1918 said, “The generally accepted rule is pink for the boys, and blue for the girls. The reason is that pink, being a more decided and stronger color, is more suitable for the boy, while blue, which is more delicate and dainty, is prettier for the girl.” Other sources said blue was flattering for blonds, pink for brunettes; or blue was for blue-eyed babies, pink for brown-eyed babies, according to Paoletti. In 1927, Time magazine printed a chart showing sex-appropriate colors for girls and boys according to leading U.S. stores. In Boston, Filene’s told parents to dress boys in pink. So did Best & Co. in New York City, Halle’s in Cleveland and Marshall Field in Chicago. Today’s color dictate wasn’t established until the 1940s . . . When the women’s liberation movement arrived in the mid-1960s, w
3 0.77684855 313 andrew gelman stats-2010-10-03-A question for psychometricians
Introduction: Don Coffin writes: A colleague of mine and I are doing a presentation for new faculty on a number of topics related to teaching. Our charge is to identify interesting issues and to find research-based information for them about how to approach things. So, what I wondered is, do you know of any published research dealing with the sort of issues about structuring a course and final exam in the ways you talk about in this blog post? Some poking around in the usual places hasn’t turned anything up yet. I don’t really know the psychometrics literature but I imagine that some good stuff has been written on principles of test design. There are probably some good papers from back in the 1920s. Can anyone supply some references?
4 0.76032615 1621 andrew gelman stats-2012-12-13-Puzzles of criminal justice
Introduction: Four recent news stories about crime and punishment made me realize, yet again, how little I understand all this. 1. “HSBC to Pay $1.92 Billion to Settle Charges of Money Laundering”: State and federal authorities decided against indicting HSBC in a money-laundering case over concerns that criminal charges could jeopardize one of the world’s largest banks and ultimately destabilize the global financial system. Instead, HSBC announced on Tuesday that it had agreed to a record $1.92 billion settlement with authorities. . . . I don’t understand this idea of punishing the institution. I have the same problem when the NCAA punishes a college football program. These are individual people breaking the law (or the rules), right? So why not punish them directly? Giving 40 lashes to a bunch of HSBC executives and garnisheeing their salaries for life, say, that wouldn’t destabilize the global financial system would it? From the article: “A money-laundering indictment, or a guilt
Introduction: When predicting 0/1 data we can use logit (or probit or robit or some other robust model such as invlogit (0.01 + 0.98*X*beta)). Logit is simple enough and we can use bayesglm to regularize and avoid the problem of separation. What if there are more than 2 categories? If they’re ordered (1, 2, 3, etc), we can do ordered logit (and use bayespolr() to avoid separation). If the categories are unordered (vanilla, chocolate, strawberry), there are unordered multinomial logit and probit models out there. But it’s not so easy to fit these multinomial models in a multilevel setting (with coefficients that vary by group), especially if the computation is embedded in an iterative routine such as mi where you have real time constraints at each step. So this got me wondering whether we could kluge it with logits. Here’s the basic idea (in the ordered and unordered forms): - If you have a variable that goes 1, 2, 3, etc., set up a series of logits: 1 vs. 2,3,…; 2 vs. 3,…; and so forth
6 0.74902469 628 andrew gelman stats-2011-03-25-100-year floods
7 0.74617577 684 andrew gelman stats-2011-04-28-Hierarchical ordered logit or probit
8 0.73672688 1484 andrew gelman stats-2012-09-05-Two exciting movie ideas: “Second Chance U” and “The New Dirty Dozen”
9 0.73551667 126 andrew gelman stats-2010-07-03-Graphical presentation of risk ratios
11 0.73296571 293 andrew gelman stats-2010-09-23-Lowess is great
12 0.73042727 1480 andrew gelman stats-2012-09-02-“If our product is harmful . . . we’ll stop making it.”
13 0.72926581 102 andrew gelman stats-2010-06-21-Why modern art is all in the mind
14 0.72809297 1078 andrew gelman stats-2011-12-22-Tables as graphs: The Ramanujan principle
15 0.72137022 745 andrew gelman stats-2011-06-04-High-level intellectual discussions in the Columbia statistics department
16 0.69689655 568 andrew gelman stats-2011-02-11-Calibration in chess
17 0.69543958 1316 andrew gelman stats-2012-05-12-black and Black, white and White
18 0.69037151 221 andrew gelman stats-2010-08-21-Busted!
20 0.6884222 286 andrew gelman stats-2010-09-20-Are the Democrats avoiding a national campaign?