andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-584 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Amy Cohen points me to this blog by Jim Manzi, who writes: Ezra Klein and a variety of other thoughtful liberal bloggers have been pointing to an Economic Policy Institute analysis that they claim demonstrates that Wisconsin’s public employees, even after adjusting for benefits and hours worked, face a “compensation penalty of 5% for choosing to work in the public sector.” Unfortunately, when you get under the hood, the study shows no such thing. . . . reading the actual paper by Jeffrey H. Keefe is instructive. Keefe took a representative sample of Wisconsin workers, and built a regression model that relates “fundamental personal characteristics and labor market skills” to compensation, and then compared public to private sector employees, after “controlling” for these factors. As far as I can see, the factors adjusted for were: years of education; years of experience; gender; race; ethnicity; disability; size of organization where the employee works; and, hours worked per
sentIndex sentText sentNum sentScore
1 Keefe took a representative sample of Wisconsin workers, and built a regression model that relates “fundamental personal characteristics and labor market skills” to compensation, and then compared public to private sector employees, after “controlling” for these factors. [sent-8, score-0.592]
2 As far as I can see, the factors adjusted for were: years of education; years of experience; gender; race; ethnicity; disability; size of organization where the employee works; and, hours worked per year. [sent-9, score-0.311]
3 Stripped of jargon, what Keefe asserts is that, on average, any two individuals with identical scores on each of these listed characteristics “should” be paid the same amount. [sent-10, score-0.182]
4 Manzi concludes: The whole question – as is obvious even to untrained observers – is whether or not there are material systematic differences between the public and private employee that are not captured by the list of coefficients in his regression model. [sent-15, score-0.534]
5 I don’t know if Wisconsin’s public employees are underpaid, overpaid, or paid just right. [sent-17, score-0.587]
6 But I don’t think this sort of study is completely useless either. [sent-20, score-0.147]
7 (Just to be clear: I haven’t actually followed the link to read Keefe’s report, so in writing about this study, I’m really writing about this study as described by Manzi.) [sent-21, score-0.249]
8 From one perspective, sure, I agree that a statistical analysis of the sort described above based on observational data can never be a true direct comparison. [sent-22, score-0.168]
9 (Not to mention the difficulty of classifying people like me who work in the quasi-public sector.) [sent-23, score-0.108]
10 But if you take things from the other direction, this sort of study can be valuable. [sent-24, score-0.147]
11 I mean, suppose you start, as people do, with raw numbers: Salary plus benefits = X% of the state budget. [sent-26, score-0.204]
12 Then you start adjusting for hours worked, ages of the employees, etc etc, and . [sent-29, score-0.239]
13 And once you start to compare, it makes sense to try to compare comparable cases. [sent-35, score-0.224]
14 Taking Manzi’s criticism too strongly would leave us in the position of allowing raw numbers, and allowing pure unblemished randomized experiments, but nothing in between. [sent-36, score-0.293]
15 Regressions of observational data can be a good way of going beyond raw comparisons and averages. [sent-40, score-0.158]
16 Some of this discussion reminds me of the literature on the wage premium for risk, where people run regressions on salaries for comparable jobs in order to estimate how much people need to be paid to risk death or injury. [sent-41, score-0.581]
17 With care, you can get those regressions to give reasonable coefficients in the range of $1 million per life, but I don’t really see these numbers as meaning anything at all; they’re just the results of fiddling with the models until something reasonable comes out. [sent-46, score-0.455]
18 I’m not saying that the people who do these analyses are cheating, just that they want reasonable results but the models seem too open-ended to be a good measure of risk premiums. [sent-47, score-0.193]
19 Ezra Klein replies, agreeing with Manzi’s statistical critique but writing that “the burden of proof is on those who say Wisconsin’s public employees make too much money.” [sent-50, score-0.686]
20 I’m sure people can disagree about where the burden of proof should fall, but I think Klein’s point is similar to mine, that if you want to claim that public employees are overpaid, that claim will start with a comparison of some sort, and then you have to go from there. [sent-51, score-0.882]
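The contrast between a raw comparison and a regression-adjusted one can be made concrete with a small simulation. The sketch below is only an illustration under invented assumptions: the data are simulated, not Keefe's, and the coefficients (a 0.08 return per year of education, a 0.05 log-pay penalty for public-sector work, public workers averaging two more years of education) are made up for the example. The helper names `ols` and `simulate` are hypothetical.

```python
import random


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def simulate(n=5000, seed=0):
    """Simulated workers as (public, educ, log_pay); all numbers are invented."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        public = 1 if rng.random() < 0.3 else 0
        # Assumption: public-sector workers average two more years of education.
        educ = rng.gauss(14.0 + 2.0 * public, 2.0)
        # Assumption: the true public-sector effect on log pay is -0.05.
        log_pay = 9.0 + 0.08 * educ - 0.05 * public + rng.gauss(0.0, 0.1)
        data.append((public, educ, log_pay))
    return data


data = simulate()
y = [d[2] for d in data]

# Raw comparison: regress log pay on the public-sector indicator alone.
raw_gap = ols([[1.0, d[0]] for d in data], y)[1]
# Adjusted comparison: add years of education as a control.
adjusted_gap = ols([[1.0, d[0], d[1]] for d in data], y)[1]

print(f"raw gap:      {raw_gap:+.3f}")
print(f"adjusted gap: {adjusted_gap:+.3f}")
```

In this setup the raw gap comes out positive (public workers look better paid) while the adjusted gap is negative and close to the simulated penalty. The same mechanism cuts the other way, which is Manzi's point: if some factor that differs between sectors is left out of the model, the "adjusted" coefficient is biased in just the way the raw gap is here.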
wordName wordTfidf (topN-words)
[('manzi', 0.437), ('keefe', 0.374), ('employees', 0.329), ('wisconsin', 0.219), ('sector', 0.187), ('klein', 0.161), ('public', 0.146), ('compensation', 0.137), ('private', 0.134), ('overpaid', 0.114), ('paid', 0.112), ('ezra', 0.102), ('comparable', 0.097), ('study', 0.095), ('regressions', 0.094), ('raw', 0.094), ('jobs', 0.09), ('burden', 0.086), ('hours', 0.086), ('employee', 0.085), ('risk', 0.08), ('adjusting', 0.078), ('penalty', 0.077), ('start', 0.075), ('proof', 0.074), ('workers', 0.071), ('allowing', 0.071), ('characteristics', 0.07), ('per', 0.07), ('worked', 0.07), ('observational', 0.064), ('reasonable', 0.059), ('claim', 0.059), ('numbers', 0.059), ('average', 0.058), ('coefficients', 0.057), ('fiddling', 0.057), ('hood', 0.057), ('pasting', 0.057), ('unblemished', 0.057), ('untrained', 0.057), ('benefits', 0.056), ('regression', 0.055), ('people', 0.054), ('difficulty', 0.054), ('disability', 0.054), ('described', 0.052), ('compare', 0.052), ('sort', 0.052), ('writing', 0.051)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000004 584 andrew gelman stats-2011-02-22-“Are Wisconsin Public Employees Underpaid?”
2 0.17781399 732 andrew gelman stats-2011-05-26-What Do We Learn from Narrow Randomized Studies?
Introduction: Under the headline, “A Raise Won’t Make You Work Harder,” Ray Fisman writes : To understand why it might be a bad idea to cut wages in recessions, it’s useful to know how workers respond to changes in pay–both positive and negative changes. Discussion on the topic goes back at least as far as Henry Ford’s “5 dollars a day,” which he paid to assembly line workers in 1914. The policy was revolutionary at the time, as the wages were more than double what his competitors were paying. This wasn’t charity. Higher-paid workers were efficient workers–Ford attracted the best mechanics to his plant, and the high pay ensured that employees worked hard throughout their eight-hour shifts, knowing that if their pace slackened, they’d be out of a job. Raising salaries to boost productivity became known as “efficiency wages.” So far, so good. Fisman then moves from history and theory to recent research: How much gift exchange really matters to American bosses and workers remained largely a
3 0.13046366 1385 andrew gelman stats-2012-06-20-Reconciling different claims about working-class voters
Introduction: After our discussions of psychologist Jonathan Haidt’s opinions about working-class voters (see here and here ), a question arose on how to reconcile the analyses of Alan Abramowitz and Tom Edsall (showing an increase in Republican voting among low-education working white southerners), with Larry Bartels’s finding that “there has been no discernible trend in presidential voting behavior among the ‘working white working class.’” Here is my resolution: All the statistics that have been posted seem reasonable to me. Also relevant to the discussion, I believe, are Figures 3.1, 4.2b, 10.1, and 10.2 of Red State Blue State. In short: Republicans continue to do about 20 percentage points better among upper-income voters compared to lower-income, but the compositions of these coalitions have changed over time. As has been noted, low-education white workers have moved toward the Republican party over the past few decades, and at the same time there have been compositional changes
4 0.12524013 140 andrew gelman stats-2010-07-10-SeeThroughNY
Introduction: From Ira Stoll, a link to this cool data site, courtesy of the Manhattan Institute, with all sorts of state budget information including the salaries of all city and state employees.
Introduction: John Kastellec points me to this blog by Ezra Klein criticizing the following graph from a recent Republican Party report: Klein (following Alexander Hart ) slams the graph for not going all the way to zero on the y-axis, thus making the projected change seem bigger than it really is. I agree with Klein and Hart that, if you’re gonna do a bar chart, you want the bars to go down to 0. On the other hand, a projected change from 19% to 23% is actually pretty big, and I don’t see the point of using a graphical display that hides it. The solution: Ditch the bar graph entirely and replace it by a lineplot , in particular, a time series with year-by-year data. The time series would have several advantages: 1. Data are placed in context. You’d see every year, instead of discrete averages, and you’d get to see the changes in the context of year-to-year variation. 2. With the time series, you can use whatever y-axis works with the data. No need to go to zero. P.S. I l
6 0.099229813 962 andrew gelman stats-2011-10-17-Death!
7 0.092122436 1980 andrew gelman stats-2013-08-13-Test scores and grades predict job performance (but maybe not at Google)
8 0.087637119 740 andrew gelman stats-2011-06-01-The “cushy life” of a University of Illinois sociology professor
9 0.085470781 2255 andrew gelman stats-2014-03-19-How Americans vote
10 0.084985934 1086 andrew gelman stats-2011-12-27-The most dangerous jobs in America
11 0.083027862 1656 andrew gelman stats-2013-01-05-Understanding regression models and regression coefficients
12 0.082455739 1605 andrew gelman stats-2012-12-04-Write This Book
13 0.077495404 93 andrew gelman stats-2010-06-17-My proposal for making college admissions fairer
14 0.075768068 957 andrew gelman stats-2011-10-14-Questions about a study of charter schools
15 0.075512864 2220 andrew gelman stats-2014-02-22-Quickies
16 0.075278059 284 andrew gelman stats-2010-09-18-Continuing efforts to justify false “death panels” claim
17 0.074480884 411 andrew gelman stats-2010-11-13-Ethical concerns in medical trials
18 0.074323803 164 andrew gelman stats-2010-07-26-A very short story
20 0.072889999 2222 andrew gelman stats-2014-02-24-On deck this week
topicId topicWeight
[(0, 0.178), (1, -0.043), (2, 0.055), (3, -0.032), (4, 0.023), (5, 0.0), (6, 0.012), (7, 0.015), (8, -0.002), (9, 0.01), (10, -0.033), (11, 0.003), (12, 0.007), (13, 0.017), (14, 0.015), (15, 0.029), (16, 0.031), (17, 0.006), (18, 0.002), (19, -0.004), (20, 0.006), (21, 0.027), (22, -0.023), (23, -0.001), (24, 0.006), (25, 0.01), (26, 0.006), (27, -0.021), (28, -0.005), (29, 0.038), (30, -0.042), (31, -0.01), (32, 0.047), (33, 0.016), (34, 0.007), (35, -0.003), (36, 0.037), (37, 0.015), (38, -0.005), (39, -0.024), (40, 0.055), (41, -0.01), (42, -0.057), (43, 0.012), (44, 0.022), (45, -0.037), (46, -0.037), (47, 0.019), (48, -0.043), (49, -0.023)]
simIndex simValue blogId blogTitle
same-blog 1 0.96053153 584 andrew gelman stats-2011-02-22-“Are Wisconsin Public Employees Underpaid?”
2 0.81411153 646 andrew gelman stats-2011-04-04-Graphical insights into the safety of cycling.
Introduction: This article by Thomas Crag, at Copenhagenize, is marred by reliance on old data, but it’s so full of informative graphical displays — most of them not made by the author, I think — that it’s hard to pick just one. But here ya go. This figure shows fatalities (among cyclists) versus distance cycled, with a point for each year…unfortunately ending way back in 1998, but still: This is a good alternative to the more common choice for this sort of plot, which would be overlaying curves of fatalities vs time and distance cycled vs time. The article also explicitly discusses the fact, previously discussed on this blog, that it’s misleading, to the point of being wrong in most contexts, to compare the safety of walking vs cycling vs driving by looking at the casualty or fatality rate per kilometer. Often, as in this article, the question of interest is something like, if more people switched from driving to cycling, how many more or fewer people would die? Obviously, if peo
3 0.81372756 67 andrew gelman stats-2010-06-03-More on that Dartmouth health care study
Introduction: Hank Aaron at the Brookings Institution, who knows a lot more about policy than I do, had some interesting comments on the recent New York Times article about problems with the Dartmouth health care atlas, which I discussed a few hours ago. Aaron writes that much of the criticism in that newspaper article was off-base, but that there are real difficulties in translating the Dartmouth results (finding little relation between spending and quality of care) to cost savings in the real world. Aaron writes: The Dartmouth research, showing huge variation in the use of various medical procedures and large variations in per patient spending under Medicare, has been a revelation and a useful one. There is no way to explain such variation on medical grounds and it is problematic. But readers, including my former colleague Orszag, have taken an oversimplistic view of what the numbers mean and what to do about them. There are three really big problems with the common interpreta
Introduction: Reed Abelson and Gardiner Harris report in the New York Times that some serious statistical questions have been raised about the Dartmouth Atlas of Health Care, an influential project that reports huge differences in health care costs and practices in different places in the United States, suggesting large potential cost savings if more efficient practices are used. (A claim that is certainly plausible to me, given this notorious graph ; see here for background.) Here’s an example of a claim from the Dartmouth Atlas (just picking something that happens to be featured on their webpage right now): Medicare beneficiaries who move to some regions receive many more diagnostic tests and new diagnoses than those who move to other regions. This study, published in the New England Journal of Medicine, raises important questions about whether being given more diagnoses is beneficial to patients and may help to explain recent controversies about regional differences in spending. A
5 0.79288352 179 andrew gelman stats-2010-08-03-An Olympic size swimming pool full of lithium water
Introduction: As part of his continuing plan to sap etc etc., Aleks pointed me to an article by Max Miller reporting on a recommendation from Jacob Appel: Adding trace amounts of lithium to the drinking water could limit suicides. . . . Communities with higher than average amounts of lithium in their drinking water had significantly lower suicide rates than communities with lower levels. Regions of Texas with lower lithium concentrations had an average suicide rate of 14.2 per 100,000 people, whereas those areas with naturally higher lithium levels had a dramatically lower suicide rate of 8.7 per 100,000. The highest levels in Texas (150 micrograms of lithium per liter of water) are only a thousandth of the minimum pharmaceutical dose, and have no known deleterious effects. I don’t know anything about this and am offering no judgment on it; I’m just passing it on. The research studies are here and here . I am skeptical, though, about this part of the argument: We are not talking a
6 0.785276 1086 andrew gelman stats-2011-12-27-The most dangerous jobs in America
8 0.77230203 732 andrew gelman stats-2011-05-26-What Do We Learn from Narrow Randomized Studies?
9 0.76492673 526 andrew gelman stats-2011-01-19-“If it saves the life of a single child…” and other nonsense
11 0.75728607 2114 andrew gelman stats-2013-11-26-“Please make fun of this claim”
12 0.75728089 988 andrew gelman stats-2011-11-02-Roads, traffic, and the importance in decision analysis of carefully examining your goals
13 0.75306267 1397 andrew gelman stats-2012-06-27-Stand Your Ground laws and homicides
14 0.75265884 2049 andrew gelman stats-2013-10-03-On house arrest for p-hacking
16 0.75062126 411 andrew gelman stats-2010-11-13-Ethical concerns in medical trials
17 0.75009876 1623 andrew gelman stats-2012-12-14-GiveWell charity recommendations
18 0.74660569 864 andrew gelman stats-2011-08-21-Going viral — not!
19 0.74517655 284 andrew gelman stats-2010-09-18-Continuing efforts to justify false “death panels” claim
20 0.74369287 94 andrew gelman stats-2010-06-17-SAT stories
topicId topicWeight
[(9, 0.014), (15, 0.04), (16, 0.11), (21, 0.036), (24, 0.117), (77, 0.017), (85, 0.141), (86, 0.014), (87, 0.014), (95, 0.018), (99, 0.318)]
simIndex simValue blogId blogTitle
1 0.9764747 734 andrew gelman stats-2011-05-28-Funniest comment ever
Introduction: Here (scroll down to the bottom; for some reason the link doesn’t go directly to the comment itself). I’ve never actually seen a Kaypro but I remember the ads. (Background here.)
2 0.97523606 2300 andrew gelman stats-2014-04-21-Ticket to Baaaath
Introduction: Ooooooh, I never ever thought I’d have a legitimate excuse to tell this story, and now I do! The story took place many years ago, but first I have to tell you what made me think of it: Rasmus Bååth posted the following comment last month: On airplane tickets a Swedish “å” is written as “aa” resulting in Rasmus Baaaath. Once I bought a ticket online and five minutes later a guy from Lufthansa calls me and asks if I misspelled my name… OK, now here’s my story (which is not nearly as good). A long time ago (but when I was already an adult), I was in England for some reason, and I thought I’d take a day trip from London to Bath. So here I am on line, trying to think of what to say at the ticket counter. I remember that in England, they call Bath, Bahth. So, should I ask for “a ticket to Bahth”? I’m not sure, I’m afraid that it will sound silly, like I’m trying to fake an English accent. So, when I get to the front of the line, I say, hesitantly, “I’d like a ticket to Bath?
3 0.96700323 1374 andrew gelman stats-2012-06-11-Convergence Monitoring for Non-Identifiable and Non-Parametric Models
Introduction: Becky Passonneau and colleagues at the Center for Computational Learning Systems (CCLS) at Columbia have been working on a project for ConEd (New York’s major electric utility) to rank structures based on vulnerability to secondary events (e.g., transformer explosions, cable meltdowns, electrical fires). They’ve been using the R implementation BayesTree of Chipman, George and McCulloch’s Bayesian Additive Regression Trees (BART). BART is a Bayesian non-parametric method that is non-identifiable in two ways. Firstly, it is an additive tree model with a fixed number of trees, the indexes of which aren’t identified (you get the same predictions in a model swapping the order of the trees). This is the same kind of non-identifiability you get with any mixture model (additive or interpolated) with an exchangeable prior on the mixture components. Secondly, the trees themselves have varying structure over samples in terms of number of nodes and their topology (depth, branching, etc
4 0.96557719 1187 andrew gelman stats-2012-02-27-“Apple confronts the law of large numbers” . . . huh?
Introduction: I was reading this news article by famed business reporter James Stewart: Measured by market capitalization, Apple is the world’s biggest public company. . . . Sales for the quarter that ended Dec. 31 . . . totaled $46.33 billion, up 73 percent from the year before. Earnings more than doubled. . . . Here is the rub: Apple is so big, it’s running up against the law of large numbers. Huh? At this point I sat up, curious. Stewart continued: Also known as the golden theorem, with a proof attributed to the 17th-century Swiss mathematician Jacob Bernoulli, the law states that a variable will revert to a mean over a large sample of results. In the case of the largest companies, it suggests that high earnings growth and a rapid rise in share price will slow as those companies grow ever larger. If Apple’s share price grew even 20 percent a year for the next decade, which is far below its current blistering pace, its $500 billion market capitalization would be more than $3 tri
5 0.96410078 1790 andrew gelman stats-2013-04-06-Calling Jenny Davidson . . .
Introduction: Now that you have some free time again, you’ll have to check out these books and tell us if they’re worth reading. Claire Kirch reports : Lizzie Skurnick Books launches in September with the release of Debutante Hill by Lois Duncan. The novel, which was originally published by Dodd, Mead, in 1958, has been out of print for about three decades. The other books on the initial list, all reissues, are A Long Day in November by Ernest J. Gaines (originally published in 1971), Happy Endings Are All Alike by Sandra Scoppettone (1979), I’ll Love You When You’re More Like Me by M.E. Kerr (1977), Secret Lives by Berthe Amoss (1979), To All My Fans, With Love, From Sylvie by Ellen Conford (1982), and Me and Fat Glenda by Lila Perl (1972). . . . Noting that many of the books of that era beloved by teen boys are still in print – such as Isaac Asimov’s novels and The Chocolate War by Robert Cormier – Skurnick pointed out that, in contrast, many of the books that were embraced by teen gir
6 0.95977896 58 andrew gelman stats-2010-05-29-Stupid legal crap
same-blog 7 0.95804071 584 andrew gelman stats-2011-02-22-“Are Wisconsin Public Employees Underpaid?”
9 0.9566533 417 andrew gelman stats-2010-11-17-Clutering and variance components
10 0.95576894 912 andrew gelman stats-2011-09-15-n = 2
12 0.95052689 375 andrew gelman stats-2010-10-28-Matching for preprocessing data for causal inference
13 0.93565702 167 andrew gelman stats-2010-07-27-Why don’t more medical discoveries become cures?
14 0.93473226 796 andrew gelman stats-2011-07-10-Matching and regression: two great tastes etc etc
15 0.9345032 1175 andrew gelman stats-2012-02-19-Factual – a new place to find data
16 0.93152905 1318 andrew gelman stats-2012-05-13-Stolen jokes
17 0.93135893 2281 andrew gelman stats-2014-04-04-The Notorious N.H.S.T. presents: Mo P-values Mo Problems
18 0.9295814 843 andrew gelman stats-2011-08-07-Non-rant
19 0.92846912 2216 andrew gelman stats-2014-02-18-Florida backlash