andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-1919 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: I was trying to make some new graphs using 5-year-old R code and I got all these problems because I was reading in files with variable names such as “co.fipsid” and now R is automatically changing them to “co_fipsid”. Or maybe the names had underbars all along, and the old R had changed them into dots. Whatever. I understand that backward compatibility can be hard to maintain, but this is just annoying.
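A minimal sketch of one way to shield old analysis code from this kind of renaming, assuming the only difference between the names the code expects and the names that arrive is dot versus underscore. It is written in Python purely for illustration. The helper names `canonical` and `reconcile_names` and every column name other than `co.fipsid` are hypothetical, and nothing here reproduces R's actual name-checking rules.

```python
def canonical(name):
    """Treat dots and underscores as the same separator."""
    return name.replace("_", ".")


def reconcile_names(expected, actual):
    """Map each column name found in the file to the spelling the old code expects.

    Names with no counterpart are left unchanged; two expected names that
    differ only by dot versus underscore raise an error.
    """
    by_key = {}
    for name in expected:
        key = canonical(name)
        if key in by_key:
            raise ValueError(
                f"ambiguous expected names: {by_key[key]!r} and {name!r}"
            )
        by_key[key] = name
    return {name: by_key.get(canonical(name), name) for name in actual}


# Hypothetical example: the old script refers to "co.fipsid",
# but the freshly read file now yields "co_fipsid".
mapping = reconcile_names(
    expected=["co.fipsid", "st.name"],
    actual=["co_fipsid", "st_name", "year"],
)
print(mapping)
```

The mapping can then be applied once, right after the file is read, so the rest of the old script runs unchanged. In R itself, the first thing to check is probably the `check.names` argument of `read.table` and `read.csv`, which controls whether column names are adjusted on input.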
sentIndex sentText sentNum sentScore
1 I was trying to make some new graphs using 5-year-old R code and I got all these problems because I was reading in files with variable names such as “co. [sent-1, score-1.925]
2 fipsid” and now R is automatically changing them to “co_fipsid”. [sent-2, score-0.453]
3 Or maybe the names had underbars all along, and the old R had changed them into dots. [sent-3, score-0.865]
4 I understand that backward compatibility can be hard to maintain, but this is just annoying. [sent-5, score-0.901]
wordName wordTfidf (topN-words)
[('names', 0.427), ('compatibility', 0.348), ('backward', 0.313), ('files', 0.284), ('maintain', 0.273), ('annoying', 0.253), ('automatically', 0.234), ('changing', 0.219), ('changed', 0.191), ('code', 0.18), ('variable', 0.168), ('old', 0.156), ('graphs', 0.152), ('along', 0.135), ('reading', 0.133), ('understand', 0.121), ('got', 0.121), ('trying', 0.12), ('hard', 0.119), ('problems', 0.113), ('maybe', 0.091), ('using', 0.086), ('new', 0.074), ('make', 0.067)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 1919 andrew gelman stats-2013-06-29-R sucks
2 0.25807148 2211 andrew gelman stats-2014-02-14-The popularity of certain baby names is falling off the clifffffffffffff
Introduction: Ubs writes: I was looking at baby name data last night and I stumbled upon something curious. I follow the baby names blog occasionally but not regularly, so I’m not sure if it’s been noticed before. Let me present it like this: Take the statement… Of the top 100 boys and top 100 girls names, only ___% contain the letter __. I’m using the SSA baby names page, so that’s U.S. births, and I’m looking at the decade of 2000-2009 (so kids currently aged 4 to 13). Which letters would you expect to have the lowest rate of occurrence? As expected, the lowest score is for Q, which appears zero times. (Jacqueline ranks #104 for girls.) It’s the second lowest that surprised me. (… You can pause and try to guess now. Spoilers to follow.) Of the other big-point Scrabble letters, Z appears in four names (Elizabeth, Zachary, Mackenzie, Zoe) and X in six, of which five are closely related (Alexis, Alexander, Alexandra, Alexa, Alex, Xavier). J is heavily overrepresented, especial
3 0.2219597 2212 andrew gelman stats-2014-02-15-Mary, Mary, why ya buggin
Introduction: In our Cliff thread from yesterday, sociologist Philip Cohen pointed to his discussions in the decline in the popularity of the name Mary. One thing that came up was the traditional trendiness of girls’ names. So I thought I’d share my thoughts from a couple of years ago, as reported by David Leonhardt: Andrew Gelman, a statistics professor at Columbia and an amateur name-ologist, argues that many parents want their boys to seem mature and so pick classic names. William, David, Joseph and James, all longtime stalwarts, remain in the Top 20. With girls, Gelman says, parents are attracted to names that convey youth even into adulthood and choose names that seem to be on the upswing. By the 1990s, of course, not many girls from the 1880s were still around, and that era’s names could seem fresh again. This search for youthfulness makes girls’ names more volatile — and increasingly so, as more statistics about names become available and parents grow more willing to experiment
4 0.18837619 1808 andrew gelman stats-2013-04-17-Excel-bashing
Introduction: In response to the latest controversy, a statistics professor writes: It’s somewhat surprising to see Very Serious Researchers (apologies to Paul Krugman) using Excel. Some years ago, I was consulting on a trademark infringement case and was trying (unsuccessfully) to replicate another expert’s regression analysis. It wasn’t until I had the brainstorm to use Excel that I was able to reproduce his results – it may be better now, but at the time, Excel could propagate round-off error and catastrophically cancel like no other software! Microsoft has lots of top researchers so it’s hard for me to understand how Excel can remain so crappy. I mean, sure, I understand in some general way that they have a large user base, it’s hard to maintain backward compatibility, there’s feature creep, and, besides all that, lots of people have different preferences in data analysis than I do. But still, it’s such a joke. Word has problems too, but I can see how these problems arise from its d
5 0.16180396 41 andrew gelman stats-2010-05-19-Updated R code and data for ARM
Introduction: Patricia and I have cleaned up some of the R and Bugs code and collected the data for almost all the examples in ARM. See here for links to zip files with the code and data.
6 0.15718332 925 andrew gelman stats-2011-09-26-Ethnicity and Population Structure in Personal Naming Networks
7 0.15066609 1472 andrew gelman stats-2012-08-28-Migrating from dot to underscore
8 0.14117536 2071 andrew gelman stats-2013-10-21-Most Popular Girl Names by State over Time
9 0.11714001 1627 andrew gelman stats-2012-12-17-Stan and RStan 1.1.0
10 0.11636905 1807 andrew gelman stats-2013-04-17-Data problems, coding errors…what can be done?
11 0.10160537 1172 andrew gelman stats-2012-02-17-Rare name analysis and wealth convergence
12 0.095124625 733 andrew gelman stats-2011-05-27-Another silly graph
13 0.095029257 1701 andrew gelman stats-2013-01-31-The name that fell off a cliff
14 0.089604214 2166 andrew gelman stats-2014-01-10-3 years out of date on the whole Dennis the dentist thing!
15 0.087505147 99 andrew gelman stats-2010-06-19-Paired comparisons
16 0.087222643 319 andrew gelman stats-2010-10-04-“Who owns Congress”
17 0.084635392 305 andrew gelman stats-2010-09-29-Decision science vs. social psychology
18 0.083611071 790 andrew gelman stats-2011-07-08-Blog in motion
19 0.082454205 372 andrew gelman stats-2010-10-27-A use for tables (really)
20 0.081979252 845 andrew gelman stats-2011-08-08-How adoption speed affects the abandonment of cultural tastes
topicId topicWeight
[(0, 0.085), (1, -0.019), (2, -0.022), (3, 0.038), (4, 0.097), (5, -0.036), (6, -0.004), (7, -0.032), (8, 0.005), (9, -0.012), (10, -0.003), (11, 0.023), (12, -0.021), (13, 0.023), (14, 0.03), (15, 0.078), (16, 0.011), (17, -0.036), (18, -0.008), (19, -0.013), (20, 0.017), (21, 0.037), (22, -0.014), (23, 0.006), (24, -0.018), (25, -0.042), (26, -0.007), (27, 0.029), (28, 0.016), (29, -0.022), (30, -0.001), (31, 0.035), (32, -0.04), (33, 0.025), (34, -0.0), (35, -0.037), (36, -0.002), (37, 0.051), (38, -0.027), (39, 0.019), (40, -0.035), (41, 0.009), (42, 0.045), (43, 0.026), (44, -0.017), (45, 0.069), (46, -0.027), (47, 0.103), (48, 0.09), (49, 0.073)]
simIndex simValue blogId blogTitle
same-blog 1 0.98371512 1919 andrew gelman stats-2013-06-29-R sucks
2 0.72606772 2211 andrew gelman stats-2014-02-14-The popularity of certain baby names is falling off the clifffffffffffff
3 0.62062681 1716 andrew gelman stats-2013-02-09-iPython Notebook
Introduction: Burak Bayramli writes: I wanted to inform you on iPython Notebook technology – allowing markup, Python code to reside in one document. Someone ported one of your examples from ARM. iPynb file is actually a live document, can be downloaded and reran locally, hence change of code on document means change of images, results. Graphs (as well as text output) which are generated by the code, are placed inside the document automatically. No more referencing image files separately. For now running notebooks locally require a notebook server, but that part can live “on the cloud” as part of an educational software. Viewers, such as nbviewer.ipython.org, do not even need that much, since all recent results of a notebook are embedded in the notebook itself. A lot of people are excited about this; Also out of nowhere, Alfred P. Sloan Foundation dropped a $1.15 million grant on the developers of ipython which provided some extra energy on the project. Cool. We’ll have to do that ex
4 0.6144321 2212 andrew gelman stats-2014-02-15-Mary, Mary, why ya buggin
5 0.56018078 1807 andrew gelman stats-2013-04-17-Data problems, coding errors…what can be done?
Introduction: This post is by Phil A recent post on this blog discusses a prominent case of an Excel error leading to substantially wrong results from a statistical analysis. Excel is notorious for this because it is easy to add a row or column of data (or intermediate results) but forget to update equations so that they correctly use the new data. That particular error is less common in a language like R because R programmers usually refer to data by variable name (or by applying functions to a named variable), so the same code works even if you add or remove data. Still, there is plenty of opportunity for errors no matter what language one uses. Andrew ran into problems fairly recently, and also blogged about another instance. I’ve never had to retract a paper, but that’s partly because I haven’t published a whole lot of papers. Certainly I have found plenty of substantial errors pretty late in some of my data analyses, and I obviously don’t have sufficient mechanisms in place to be sure
6 0.55596328 1249 andrew gelman stats-2012-04-06-Thinking seriously about social science research
7 0.5557577 305 andrew gelman stats-2010-09-29-Decision science vs. social psychology
8 0.55520558 266 andrew gelman stats-2010-09-09-The future of R
9 0.54432571 2190 andrew gelman stats-2014-01-29-Stupid R Tricks: Random Scope
10 0.54188246 1283 andrew gelman stats-2012-04-26-Let’s play “Guess the smoother”!
11 0.53430593 2333 andrew gelman stats-2014-05-13-Personally, I’d rather go with Teragram
12 0.53111577 1701 andrew gelman stats-2013-01-31-The name that fell off a cliff
13 0.52814311 1655 andrew gelman stats-2013-01-05-The statistics software signal
14 0.51649654 1472 andrew gelman stats-2012-08-28-Migrating from dot to underscore
15 0.5162186 2160 andrew gelman stats-2014-01-06-Spam names
16 0.51144999 1764 andrew gelman stats-2013-03-15-How do I make my graphs?
17 0.50817424 832 andrew gelman stats-2011-07-31-Even a good data display can sometimes be improved
18 0.50751495 672 andrew gelman stats-2011-04-20-The R code for those time-use graphs
19 0.50548327 1470 andrew gelman stats-2012-08-26-Graphs showing regression uncertainty: the code!
20 0.50392079 1235 andrew gelman stats-2012-03-29-I’m looking for a quadrille notebook with faint lines
topicId topicWeight
[(4, 0.228), (16, 0.055), (24, 0.134), (44, 0.067), (99, 0.34)]
simIndex simValue blogId blogTitle
same-blog 1 0.96854758 1919 andrew gelman stats-2013-06-29-R sucks
Introduction: I was trying to make some new graphs using 5-year-old R code and I got all these problems because I was reading in files with variable names such as “co.fipsid” and now R is automatically changing them to “co_fipsid”. Or maybe the names had underbars all along, and the old R had changed them into dots. Whatever. I understand that backward compatibility can be hard to maintain, but this is just annoying.
Introduction: Alexander at GiveWell writes: The Disease Control Priorities in Developing Countries (DCP2), a major report funded by the Gates Foundation . . . provides an estimate of $3.41 per disability-adjusted life-year (DALY) for the cost-effectiveness of soil-transmitted-helminth (STH) treatment, implying that STH treatment is one of the most cost-effective interventions for global health. In investigating this figure, we have corresponded, over a period of months, with six scholars who had been directly or indirectly involved in the production of the estimate. Eventually, we were able to obtain the spreadsheet that was used to generate the $3.41/DALY estimate. That spreadsheet contains five separate errors that, when corrected, shift the estimated cost effectiveness of deworming from $3.41 to $326.43. [I think they mean to say $300 -- ed.] We came to this conclusion a year after learning that the DCP2’s published cost-effectiveness estimate for schistosomiasis treatment – another kind of
3 0.95244575 1618 andrew gelman stats-2012-12-11-The consulting biz
Introduction: I received the following (unsolicited) email: Hello, *** LLC, a ***-based market research company, has a financial client who is interested in speaking with a statistician who has done research in the field of Alzheimer’s Disease and preferably familiar with the SOLA and BAPI trials. We offer an honorarium of $200 for a 30 minute telephone interview. Please advise us if you have an employment or consulting agreement with any organization or operate professionally pursuant to an organization’s code of conduct or employee manual that may control activities by you outside of your regular present and former employment, such as participating in this consulting project for MedPanel. If there are such contracts or other documents that do apply to you, please forward MedPanel a copy of each such document asap as we are obligated to review such documents to determine if you are permitted to participate as a consultant for MedPanel on a project with this particular client. If you are
4 0.93899131 1918 andrew gelman stats-2013-06-29-Going negative
Introduction: Troels Ring writes: I have measured total phosphorus, TP, on a number of dialysis patients, and also measured conventional phosphate, Pi. Now P is exchanged with the environment as Pi, so in principle a correlation between TP and Pi could perhaps be expected. I’m really most interested in the fraction of TP which is not Pi, that is TP-Pi. I would also expect that to be positively correlated with Pi. However, looking at the data using a mixed model an insignificant negative correlation is obtained. Then I thought, that since TP-Pi is bound to be small if Pi is large a negative correlation is almost dictated by the math even if the biology would have it otherwise in so far as the TP-Pi, likely organic P, must someday have been Pi. Hence I thought about correcting the slight negative correlation between TP-Pi and Pi for the expected large negative correlation due to the math – to eventually recover what I came from: a positive correlation. People seem to agree that this thinki
5 0.93644291 1801 andrew gelman stats-2013-04-13-Can you write a program to determine the causal order?
Introduction: Mike Zyphur writes: Kaggle.com has launched a competition to determine what’s an effect and what’s a cause. They’ve got correlated variables, they’re deprived of context, and you’re asked to determine the causal order. $5,000 prizes. I followed the link and the example they gave didn’t make much sense to me (the two variables were temperature and altitude of cities in Germany, and they said that altitude causes temperature). It has the feeling to me of one of those weird standardized tests we used to see sometimes in school, where there’s no real correct answer so the goal is to figure out what the test-writer wanted you to say. Nonetheless, this might be of interest, so I’m passing it along to you.
6 0.9224838 907 andrew gelman stats-2011-09-14-Reproducibility in Practice
7 0.92043447 238 andrew gelman stats-2010-08-27-No radon lobby
8 0.91890979 113 andrew gelman stats-2010-06-28-Advocacy in the form of a “deliberative forum”
9 0.91273707 1829 andrew gelman stats-2013-04-28-Plain old everyday Bayesianism!
10 0.90786874 419 andrew gelman stats-2010-11-18-Derivative-based MCMC as a breakthrough technique for implementing Bayesian statistics
11 0.90746272 1997 andrew gelman stats-2013-08-24-Measurement error in monkey studies
12 0.8966819 2000 andrew gelman stats-2013-08-28-Why during the 1950-1960′s did Jerry Cornfield become a Bayesian?
13 0.89396179 2078 andrew gelman stats-2013-10-26-“The Bayesian approach to forensic evidence”
14 0.89292645 2212 andrew gelman stats-2014-02-15-Mary, Mary, why ya buggin
15 0.88948572 2211 andrew gelman stats-2014-02-14-The popularity of certain baby names is falling off the clifffffffffffff
16 0.88688892 1470 andrew gelman stats-2012-08-26-Graphs showing regression uncertainty: the code!
17 0.8732661 1350 andrew gelman stats-2012-05-28-Value-added assessment: What went wrong?
18 0.8637169 1996 andrew gelman stats-2013-08-24-All inference is about generalizing from sample to population
19 0.86124253 1605 andrew gelman stats-2012-12-04-Write This Book