1808 andrew gelman stats-2013-04-17-Excel-bashing


meta info for this blog

Source: html

Introduction: In response to the latest controversy, a statistics professor writes: It’s somewhat surprising to see Very Serious Researchers (apologies to Paul Krugman) using Excel. Some years ago, I was consulting on a trademark infringement case and was trying (unsuccessfully) to replicate another expert’s regression analysis. It wasn’t until I had the brainstorm to use Excel that I was able to reproduce his results – it may be better now, but at the time, Excel could propagate round-off error and catastrophically cancel like no other software! Microsoft has lots of top researchers so it’s hard for me to understand how Excel can remain so crappy. I mean, sure, I understand in some general way that they have a large user base, it’s hard to maintain backward compatibility, there’s feature creep, and, besides all that, lots of people have different preferences in data analysis than I do. But still, it’s such a joke. Word has problems too, but I can see how these problems arise from its d


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 In response to the latest controversy, a statistics professor writes: It’s somewhat surprising to see Very Serious Researchers (apologies to Paul Krugman) using Excel. [sent-1, score-0.576]

2 Some years ago, I was consulting on a trademark infringement case and was trying (unsuccessfully) to replicate another expert’s regression analysis. [sent-2, score-0.551]

3 It wasn’t until I had the brainstorm to use Excel that I was able to reproduce his results – it may be better now, but at the time, Excel could propagate round-off error and catastrophically cancel like no other software! [sent-3, score-0.613]

4 Microsoft has lots of top researchers so it’s hard for me to understand how Excel can remain so crappy. [sent-4, score-0.631]

5 I mean, sure, I understand in some general way that they have a large user base, it’s hard to maintain backward compatibility, there’s feature creep, and, besides all that, lots of people have different preferences in data analysis than I do. [sent-5, score-1.146]

6 Word has problems too, but I can see how these problems arise from its desirable features. [sent-7, score-0.452]

7 The disaster that is Excel seems like more of a mystery. [sent-8, score-0.148]
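
Sentence 3 above mentions round-off error and catastrophic cancellation. The following is a minimal sketch of how that kind of error can arise in a variance calculation, which sits underneath any regression fit. It is not a claim about what Excel did internally; the one-pass formula, the function names, and the data are illustrative assumptions.

```python
def naive_variance(xs):
    """One-pass formula: (sum(x^2) - (sum(x))^2 / n) / (n - 1).

    Subtracts two huge, nearly equal numbers, so almost all
    significant digits cancel when the mean is large.
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x * sum_x / n) / (n - 1)


def two_pass_variance(xs):
    """Two-pass formula: centre the data first, then square."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


# Hypothetical data: small spread around a very large offset.
# The exact sample variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

print("two-pass:", two_pass_variance(data))
print("one-pass:", naive_variance(data))
```

With these numbers the squares are around 1e18, well beyond the range where double precision can represent every integer, so the one-pass result cannot come out as 30, while the two-pass version stays exact.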


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('excel', 0.531), ('infringement', 0.184), ('creep', 0.184), ('unsuccessfully', 0.184), ('propagate', 0.176), ('cancel', 0.165), ('compatibility', 0.165), ('apologies', 0.153), ('mystery', 0.151), ('backward', 0.148), ('disaster', 0.148), ('desirable', 0.139), ('reproduce', 0.136), ('besides', 0.136), ('consulting', 0.129), ('maintain', 0.129), ('controversy', 0.126), ('microsoft', 0.126), ('preferences', 0.119), ('replicate', 0.119), ('researchers', 0.118), ('krugman', 0.117), ('understand', 0.115), ('lots', 0.114), ('hard', 0.112), ('base', 0.112), ('user', 0.111), ('surprising', 0.11), ('problems', 0.107), ('feature', 0.104), ('arise', 0.099), ('remain', 0.099), ('latest', 0.099), ('paul', 0.094), ('software', 0.094), ('expert', 0.092), ('somewhat', 0.092), ('word', 0.089), ('wasn', 0.084), ('professor', 0.083), ('serious', 0.074), ('top', 0.073), ('able', 0.068), ('error', 0.068), ('response', 0.066), ('regression', 0.062), ('large', 0.058), ('trying', 0.057), ('ago', 0.056), ('mean', 0.055)]
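
The page does not say how these weights were computed. As a rough sketch only, one common scheme multiplies a word's count in a post by the log of its inverse document frequency and then scales each post's vector to unit length. The tokenisation, the idf formula, and the toy posts below are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {word: weight} dict per document, L2-normalised."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Number of documents each word appears in.
    doc_freq = Counter(word for tokens in tokenised for word in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        raw = {w: c * math.log(n_docs / doc_freq[w]) for w, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in raw.values())) or 1.0
        vectors.append({w: v / norm for w, v in raw.items()})
    return vectors


def top_words(vector, n):
    """The n highest-weighted (word, weight) pairs, largest first."""
    return sorted(vector.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical toy corpus, not the real blog posts.
toy_docs = [
    "excel excel error software",
    "r software backward compatibility",
    "microsoft software modeling language",
]
vectors = tfidf_vectors(toy_docs)
print(top_words(vectors[0], 2))
```

A word that occurs in every post ("software" here) gets weight zero under this scheme, which is one reason a distinctive word such as "excel" would dominate a list like the one above.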

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999988 1808 andrew gelman stats-2013-04-17-Excel-bashing

2 0.18837619 1919 andrew gelman stats-2013-06-29-R sucks

Introduction: I was trying to make some new graphs using 5-year-old R code and I got all these problems because I was reading in files with variable names such as “co.fipsid” and now R is automatically changing them to “co_fipsid”. Or maybe the names had underbars all along, and the old R had changed them into dots. Whatever. I understand that backward compatibility can be hard to maintain, but this is just annoying.

3 0.12863277 530 andrew gelman stats-2011-01-22-MS-Bayes?

Introduction: I received the following email: Did you know that it looks like Microsoft is entering the modeling game? I mean, outside of Excel. I recently received an email at work from a MS research contractor looking for ppl that program in R, SAS, Matlab, Excel, and Mathematica. . . . So far I [the person who sent me this email] haven’t seen anything about applying any actual models. Only stuff about assigning variables, deleting rows, merging tables, etc. I don’t know how common knowledge this all is within the statistical community. I did a quick google search for the name of the programming language and didn’t come up with anything. That sounds cool. Working with anything from Microsoft sounds pretty horrible, but it would be useful to have another modeling language out there, just for checking our answers if nothing else.

4 0.12429141 1807 andrew gelman stats-2013-04-17-Data problems, coding errors…what can be done?

Introduction: This post is by Phil A recent post on this blog discusses a prominent case of an Excel error leading to substantially wrong results from a statistical analysis. Excel is notorious for this because it is easy to add a row or column of data (or intermediate results) but forget to update equations so that they correctly use the new data. That particular error is less common in a language like R because R programmers usually refer to data by variable name (or by applying functions to a named variable), so the same code works even if you add or remove data. Still, there is plenty of opportunity for errors no matter what language one uses. Andrew ran into problems fairly recently, and also blogged about another instance. I’ve never had to retract a paper, but that’s partly because I haven’t published a whole lot of papers. Certainly I have found plenty of substantial errors pretty late in some of my data analyses, and I obviously don’t have sufficient mechanisms in place to be sure

5 0.1131258 1844 andrew gelman stats-2013-05-06-Against optimism about social science

Introduction: Social science research has been getting pretty bad press recently, what with the Excel buccaneers who didn’t know how to handle data with different numbers of observations per country, and the psychologist who published dozens of papers based on fabricated data, and the Evilicious guy who wouldn’t let people review his data tapes, etc etc. And that’s not even considering Dr. Anil Potti. On the other hand, the revelation of all these problems can be taken as evidence that things are getting better. Psychology researcher Gary Marcus writes : There is something positive that has come out of the crisis of replicability—something vitally important for all experimental sciences. For years, it was extremely difficult to publish a direct replication, or a failure to replicate an experiment, in a good journal. . . . Now, happily, the scientific culture has changed. . . . The Reproducibility Project, from the Center for Open Science is now underway . . . And sociologist Fabio Rojas

6 0.10811995 1283 andrew gelman stats-2012-04-26-Let’s play “Guess the smoother”!

7 0.1036907 1721 andrew gelman stats-2013-02-13-A must-read paper on statistical analysis of experimental data

8 0.10358464 252 andrew gelman stats-2010-09-02-R needs a good function to make line plots

9 0.10013173 1661 andrew gelman stats-2013-01-08-Software is as software does

10 0.093426012 1596 andrew gelman stats-2012-11-29-More consulting experiences, this time in computational linguistics

11 0.087593555 318 andrew gelman stats-2010-10-04-U-Haul statistics

12 0.085890472 2137 andrew gelman stats-2013-12-17-Replication backlash

13 0.078763209 2054 andrew gelman stats-2013-10-07-Bing is preferred to Google by people who aren’t like me

14 0.073968843 1597 andrew gelman stats-2012-11-29-What is expected of a consultant

15 0.070335835 192 andrew gelman stats-2010-08-08-Turning pages into data

16 0.069856949 2235 andrew gelman stats-2014-03-06-How much time (if any) should we spend criticizing research that’s fraudulent, crappy, or just plain pointless?

17 0.068616517 1447 andrew gelman stats-2012-08-07-Reproducible science FAIL (so far): What’s stoppin people from sharin data and code?

18 0.066381171 1435 andrew gelman stats-2012-07-30-Retracted articles and unethical behavior in economics journals?

19 0.066160627 1722 andrew gelman stats-2013-02-14-Statistics for firefighters: update

20 0.065877222 124 andrew gelman stats-2010-07-02-Note to the quals
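
The simValue column looks like a cosine similarity between tfidf vectors (the same-blog entry scores almost exactly 1), though the page does not confirm this. Under that assumption, a minimal sketch of the comparison, using made-up word weights:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {word: weight} vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical weights, chosen only to make the arithmetic easy to follow.
post_a = {"excel": 0.8, "error": 0.6}
post_b = {"excel": 0.6, "backward": 0.8}
post_c = {"bayesian": 1.0}

print(cosine_similarity(post_a, post_a))  # a post compared with itself
print(cosine_similarity(post_a, post_b))  # one shared word
print(cosine_similarity(post_a, post_c))  # no shared words
```

Only words that appear in both posts contribute to the score, so two posts that share little beyond "excel" or "backward" would land in the low range seen in most of the list above.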


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.127), (1, -0.012), (2, -0.014), (3, -0.013), (4, 0.034), (5, -0.005), (6, 0.001), (7, -0.024), (8, 0.011), (9, 0.007), (10, -0.023), (11, -0.026), (12, -0.014), (13, -0.011), (14, -0.021), (15, 0.015), (16, -0.021), (17, -0.028), (18, -0.002), (19, -0.007), (20, 0.008), (21, 0.045), (22, -0.027), (23, -0.005), (24, -0.035), (25, -0.012), (26, 0.012), (27, -0.001), (28, 0.002), (29, -0.006), (30, 0.012), (31, 0.032), (32, 0.008), (33, -0.011), (34, -0.024), (35, -0.011), (36, -0.021), (37, 0.055), (38, -0.02), (39, 0.021), (40, 0.022), (41, 0.0), (42, -0.008), (43, 0.027), (44, -0.002), (45, 0.026), (46, 0.013), (47, 0.0), (48, 0.055), (49, 0.003)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.94644284 1808 andrew gelman stats-2013-04-17-Excel-bashing

2 0.79166442 1807 andrew gelman stats-2013-04-17-Data problems, coding errors…what can be done?

3 0.74854594 272 andrew gelman stats-2010-09-13-Ross Ihaka to R: Drop Dead

Introduction: Christian Robert posts these thoughts: I [Ross Ihaka] have been worried for some time that R isn’t going to provide the base that we’re going to need for statistical computation in the future. (It may well be that the future is already upon us.) There are certainly efficiency problems (speed and memory use), but there are more fundamental issues too. Some of these were inherited from S and some are peculiar to R. One of the worst problems is scoping. Consider the following little gem. f = function() { if (runif(1) > .5) x = 10; x } The x being returned by this function is randomly local or global. There are other examples where variables alternate between local and non-local throughout the body of a function. No sensible language would allow this. It’s ugly and it makes optimisation really difficult. This isn’t the only problem, even weirder things happen because of interactions between scoping and lazy evaluation. In light of this, I [Ihaka] have come to the c

4 0.7478506 1525 andrew gelman stats-2012-10-08-Ethical standards in different data communities

Introduction: I opened the paper today and saw this from Paul Krugman, on Jack Welch, the former chairman of General Electric, who posted an assertion on Twitter that the [recent unemployment data] had been cooked to help President Obama’s re-election campaign. His claim was quickly picked up by right-wing pundits and media personalities. It was nonsense, of course. Job numbers are prepared by professional civil servants, at an agency that currently has no political appointees. But then maybe Mr. Welch — under whose leadership G.E. reported remarkably smooth earnings growth, with none of the short-term fluctuations you might have expected (fluctuations that reappeared under his successor) — doesn’t know how hard it would be to cook the jobs data. I was curious so I googled *general electric historical earnings*. It was surprisingly difficult to find the numbers! Most of the links just went back to 2011, or to 2008. Eventually I came across this blog by Barry Ritholtz that showed this

5 0.72658283 910 andrew gelman stats-2011-09-15-Google Refine

Introduction: Tools worth knowing about: Google Refine is a power tool for working with messy data, cleaning it up, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase. A recent discussion on the Polmeth list about the ANES Cumulative File is a setting where I think Refine might help (admittedly 49760×951 is bigger than I’d really like to deal with in the browser with js… but on a subset yes). [I might write this example up later.] Go watch the screencast videos for Refine. Data-entry problems are rampant in stuff we all use — leading or trailing spaces; mixed decimal-indicators; different units or transformations used in the same column; mixed lettercase leading to false duplicates; that’s only the beginning. Refine certainly would help find duplicates, and it counts things for you too. Just counting rows is too much for researchers sometimes (see yesterday’s post )! Refine 2.0 adds some data-collection tools for

6 0.71692681 2337 andrew gelman stats-2014-05-18-Never back down: The culture of poverty and the culture of journalism

7 0.70839494 907 andrew gelman stats-2011-09-14-Reproducibility in Practice

8 0.69885677 266 andrew gelman stats-2010-09-09-The future of R

9 0.69359207 1369 andrew gelman stats-2012-06-06-Your conclusion is only as good as your data

10 0.69296962 1640 andrew gelman stats-2012-12-26-What do people do wrong? WSJ columnist is looking for examples!

11 0.69218552 527 andrew gelman stats-2011-01-20-Cars vs. trucks

12 0.69204605 1142 andrew gelman stats-2012-01-29-Difficulties with the 1-4-power transformation

13 0.68666154 563 andrew gelman stats-2011-02-07-Evaluating predictions of political events

14 0.68614554 2307 andrew gelman stats-2014-04-27-Big Data…Big Deal? Maybe, if Used with Caution.

15 0.68580002 597 andrew gelman stats-2011-03-02-RStudio – new cross-platform IDE for R

16 0.68126768 818 andrew gelman stats-2011-07-23-Parallel JAGS RNGs

17 0.68069088 1884 andrew gelman stats-2013-06-05-A story of fake-data checking being used to shoot down a flawed analysis at the Farm Credit Agency

18 0.67906231 1212 andrew gelman stats-2012-03-14-Controversy about a ranking of philosophy departments, or How should we think about statistical results when we can’t see the raw data?

19 0.67832434 360 andrew gelman stats-2010-10-21-Forensic bioinformatics, or, Don’t believe everything you read in the (scientific) papers

20 0.67831182 1722 andrew gelman stats-2013-02-14-Statistics for firefighters: update


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(9, 0.016), (13, 0.015), (16, 0.043), (21, 0.065), (24, 0.093), (27, 0.017), (44, 0.034), (57, 0.016), (75, 0.205), (77, 0.017), (89, 0.037), (94, 0.015), (95, 0.041), (97, 0.018), (99, 0.26)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.96200371 893 andrew gelman stats-2011-09-06-Julian Symons on Frances Newman

Introduction: “She was forty years old when she died. It is possible that her art might have developed to include a wider area of human experience, just as possible that the chilling climate of the thirties might have withered it altogether. But what she actually wrote was greatly talented. She deserves a place, although obviously not a foremost one, in any literary history of the years between the wars. The last letter she wrote, or rather dictated, to the printer of the Laforgue translations shows the invariable fastidiousness of her talent, a fastidiousness which is often infuriating but just as often impressive, and is in any case rare enough to be worth remembrance: To the Printer of Six Moral Tales This book is to be spelled and its words are to be hyphenated according to the usage of the Concise Oxford Dictionary. Page introduction continuously with the tales. Do not put brackets around the numbers of the pages. All the ‘todays’ and all the ‘tomorrows’ should be spelled w

2 0.93463266 1067 andrew gelman stats-2011-12-18-Christopher Hitchens was a Bayesian

Introduction: 1. We Bayesian statisticians like to say there are three kinds of statisticians: a. Bayesians; b. People who are Bayesians but don’t realize it (that is, they act in coherence with some unstated probability); c. Failed Bayesians (that is, people whose inference could be improved by some attention to coherence). So, if a statistician does great work, we are inclined to claim this person for the Bayesian cause, even if he or she vehemently denies any Bayesian leanings. 2. In his autobiography, Bertrand Russell tells the story of when he went to prison for opposing World War 1: I [Russell] was much cheered on my arrival by the warden at the gate, who had to take particulars about me. He asked my religion, and I replied ‘agnostic.’ He asked how to spell it, and remarked with a sigh: “Well, there are many religions, but I suppose they all worship the same God.” This remark kept me cheerful for about a week. 3. In an op-ed today, Ross Douthat argues that celebrated a

3 0.93209159 522 andrew gelman stats-2011-01-18-Problems with Haiti elections?

Introduction: Mark Weisbrot points me to this report trashing a recent OAS report on Haiti’s elections. Weisbrot writes: The two simplest things that are wrong with the OAS analysis are: (1) By looking only at a sample of the tally sheets and not using any statistical test, they have no idea how many other tally sheets would also be thrown out by the same criteria that they used, and how that would change the result and (2) The missing/quarantined tally sheets are much greater in number than the ones that they threw out; our analysis indicates that if these votes had been counted, the result would go the other way. I have not had a chance to take a look at this myself but I’m posting it here so that experts on election irregularities can see this and give their judgments. P.S. Weisbrot updates: We [Weisbrot et al.] published our actual paper on the OAS Mission’s Report today. The press release is here and gives a very good summary of the major problems with the OAS Mission rep

4 0.92904711 1396 andrew gelman stats-2012-06-27-Recently in the sister blog

Introduction: If Paul Krugman is right and it’s 1931, what happens next? What’s with Niall Ferguson? Hey, this reminds me of the Democrats in the U.S. . . . Would President Romney contract the economy? Inconsistency with prior knowledge triggers children’s causal explanatory reasoning

same-blog 5 0.91302013 1808 andrew gelman stats-2013-04-17-Excel-bashing

6 0.89536893 28 andrew gelman stats-2010-05-12-Alert: Incompetent colleague wastes time of hardworking Wolfram Research publicist

7 0.87290549 1003 andrew gelman stats-2011-11-11-$

8 0.85466313 946 andrew gelman stats-2011-10-07-Analysis of Power Law of Participation

9 0.84525585 1309 andrew gelman stats-2012-05-09-The first version of my “inference from iterative simulation using parallel sequences” paper!

10 0.83636165 2157 andrew gelman stats-2014-01-02-2013

11 0.82948488 2034 andrew gelman stats-2013-09-23-My talk Tues 24 Sept at 12h30 at Université de Technologie de Compiègne

12 0.82038844 2081 andrew gelman stats-2013-10-29-My talk in Amsterdam tomorrow (Wed 29 Oct): Can we use Bayesian methods to resolve the current crisis of statistically-significant research findings that don’t hold up?

13 0.81228614 8 andrew gelman stats-2010-04-28-Advice to help the rich get richer

14 0.79613781 967 andrew gelman stats-2011-10-20-Picking on Gregg Easterbrook

15 0.79455173 2228 andrew gelman stats-2014-02-28-Combining two of my interests

16 0.79351383 670 andrew gelman stats-2011-04-20-Attractive but hard-to-read graph could be made much much better

17 0.79316157 2159 andrew gelman stats-2014-01-04-“Dogs are sensitive to small variations of the Earth’s magnetic field”

18 0.79279852 147 andrew gelman stats-2010-07-15-Quote of the day: statisticians and defaults

19 0.79176718 486 andrew gelman stats-2010-12-26-Age and happiness: The pattern isn’t as clear as you might think

20 0.7913534 256 andrew gelman stats-2010-09-04-Noooooooooooooooooooooooooooooooooooooooooooooooo!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!