2117 andrew gelman stats-2013-11-29-The gradual transition to replicable science


meta info for this blog

Source: html

Introduction: Somebody emailed me: I am a researcher at ** University and I have recently read your article on average predictive comparisons for statistical models published 2007 in the journal “Sociological Methodology”. Gelman, Andrew/Iain Pardoe. 2007. “Average Predictive Comparisons for Models with Nonlinearity, Interactions, and Variance Components”. Sociological Methodology 37: 23-51. Currently I am working with multilevel models and find your approach very interesting and useful. May I ask you whether replication materials (e.g. R Code) for this article are available? I had to reply: Hi—I’m embarrassed to say that our R files are a mess! I had ideas of programming the approach more generally as an R package but this has not yet happened yet.


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Somebody emailed me: I am a researcher at ** University and I have recently read your article on average predictive comparisons for statistical models published 2007 in the journal “Sociological Methodology”. [sent-1, score-1.562]

2 Currently I am working with multilevel models and find your approach very interesting and useful. [sent-6, score-0.67]

3 May I ask you whether replication materials (e. [sent-7, score-0.482]

4 I had to reply: Hi—I’m embarrassed to say that our R files are a mess! [sent-10, score-0.414]

5 I had ideas of programming the approach more generally as an R package but this has not yet happened yet. [sent-11, score-0.915]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('sociological', 0.338), ('methodology', 0.296), ('predictive', 0.234), ('nonlinearity', 0.233), ('comparisons', 0.218), ('models', 0.194), ('yet', 0.193), ('embarrassed', 0.187), ('hi', 0.184), ('files', 0.178), ('average', 0.175), ('mess', 0.174), ('emailed', 0.174), ('components', 0.166), ('approach', 0.164), ('materials', 0.163), ('replication', 0.147), ('programming', 0.143), ('package', 0.135), ('interactions', 0.133), ('somebody', 0.127), ('currently', 0.12), ('variance', 0.116), ('gelman', 0.114), ('code', 0.112), ('researcher', 0.111), ('multilevel', 0.107), ('happened', 0.105), ('available', 0.104), ('article', 0.1), ('ask', 0.098), ('university', 0.092), ('generally', 0.09), ('ideas', 0.085), ('journal', 0.083), ('recently', 0.082), ('working', 0.077), ('reply', 0.075), ('published', 0.074), ('whether', 0.074), ('may', 0.068), ('read', 0.067), ('interesting', 0.066), ('find', 0.062), ('statistical', 0.05), ('say', 0.049)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 2117 andrew gelman stats-2013-11-29-The gradual transition to replicable science

2 0.17875426 772 andrew gelman stats-2011-06-17-Graphical tools for understanding multilevel models

Introduction: There are a few things I want to do: 1. Understand a fitted model using tools such as average predictive comparisons, R-squared, and partial pooling factors. In defining these concepts, Iain and I came up with some clever tricks, including (but not limited to): - Separating the inputs and averaging over all possible values of the input not being altered (for average predictive comparisons); - Defining partial pooling without referring to a raw-data or maximum-likelihood or no-pooling estimate (these don’t necessarily exist when you’re fitting logistic regression with sparse data); - Defining an R-squared for each level of a multilevel model. The methods get pretty complicated, though, and they have some loose ends–in particular, for average predictive comparisons with continuous input variables. So now we want to implement these in R and put them into arm along with bglmer etc. 2. Setting up coefplot so it works more generally (that is, so the graphics look nice

3 0.15242738 1682 andrew gelman stats-2013-01-19-R package for Bayes factors

Introduction: Richard Morey writes: You and your blog readers may be interested to know that we’ve released a major new version of the BayesFactor package to CRAN. The package computes Bayes factors for linear mixed models and regression models. Of course, I’m aware you don’t like point-null model comparisons, but the package does more than that; it also allows sampling from posterior distributions of the compared models, in much the same way that your arm package does with lmer objects. The sampling (both for the Bayes factors and posteriors) is quite fast, since the back end is written in C. Some basic examples using the package can be found here, and the CRAN page is here. Indeed I don’t like point-null model comparisons . . . but maybe this will be useful to some of you!

4 0.13756433 1814 andrew gelman stats-2013-04-20-A mess with which I am comfortable

Introduction: Having established that survey weighting is a mess, I should also acknowledge that, by this standard, regression modeling is also a mess, involving many arbitrary choices of variable selection, transformations and modeling of interaction. Nonetheless, regression modeling is a mess with which I am comfortable and, perhaps more relevant to the discussion, can be extended using multilevel models to get inference for small cross-classifications or small areas. We’re working on it.

5 0.13620764 288 andrew gelman stats-2010-09-21-Discussion of the paper by Girolami and Calderhead on Bayesian computation

Introduction: Here’s my discussion of this article for the Journal of the Royal Statistical Society: I will comment on this paper in my role as applied statistician and consumer of Bayesian computation. In the last few years, my colleagues and I have felt the need to fit predictive survey responses given multiple discrete predictors, for example estimating voting given ethnicity and income within each of the fifty states, or estimating public opinion about gay marriage given age, sex, ethnicity, education, and state. We would like to be able to fit such models with ten or more predictors–for example, religion, religious attendance, marital status, and urban/rural/suburban residence in addition to the factors mentioned above. There are (at least) three reasons for fitting a model with many predictive factors and potentially a huge number of interactions among them: 1. Deep interactions can be of substantive interest. For example, Gelman et al. (2009) discuss the importance of interaction

6 0.13333307 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?

7 0.12930119 1445 andrew gelman stats-2012-08-06-Slow progress

8 0.112028 1363 andrew gelman stats-2012-06-03-Question about predictive checks

9 0.1026811 1535 andrew gelman stats-2012-10-16-Bayesian analogue to stepwise regression?

10 0.10248697 383 andrew gelman stats-2010-10-31-Analyzing the entire population rather than a sample

11 0.10233328 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?

12 0.1011372 41 andrew gelman stats-2010-05-19-Updated R code and data for ARM

13 0.097361043 653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects

14 0.096848935 1586 andrew gelman stats-2012-11-21-Readings for a two-week segment on Bayesian modeling?

15 0.094791442 2294 andrew gelman stats-2014-04-17-If you get to the point of asking, just do it. But some difficulties do arise . . .

16 0.094479308 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values

17 0.094310716 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters

18 0.094183832 1948 andrew gelman stats-2013-07-21-Bayes related

19 0.093879461 1989 andrew gelman stats-2013-08-20-Correcting for multiple comparisons in a Bayesian regression model

20 0.093647547 1346 andrew gelman stats-2012-05-27-Average predictive comparisons when changing a pair of variables
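
The simValue column in the list above is presumably a cosine similarity between tf-idf vectors of the posts, although the page does not say how it was computed. The following is a minimal sketch under that assumption, using only the Python standard library. The function names, the toy corpus, and the particular tf-idf weighting (term frequency times log inverse document frequency) are illustrative choices, not the page's actual pipeline.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {word: weight} dict per tokenized document.

    Assumed weighting: (count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            {w: (c / len(doc)) * math.log(n_docs / df[w]) for w, c in counts.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus standing in for three blog posts.
docs = [
    "replication materials r code package".split(),
    "r package bayes factors models".split(),
    "baseball fielders book".split(),
]
vecs = tfidf_vectors(docs)
# Similarity of the first post to every post, itself included.
sims = [cosine(vecs[0], v) for v in vecs]
```

In this sketch a post compared with itself scores essentially 1, which matches the "same-blog" row at the top of each list, and posts that share no words score 0.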


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.139), (1, 0.075), (2, -0.024), (3, -0.043), (4, 0.034), (5, 0.032), (6, -0.039), (7, -0.095), (8, 0.024), (9, 0.082), (10, 0.063), (11, -0.003), (12, 0.002), (13, 0.029), (14, 0.043), (15, 0.001), (16, -0.045), (17, -0.027), (18, -0.011), (19, 0.001), (20, 0.001), (21, 0.019), (22, 0.04), (23, 0.035), (24, -0.02), (25, -0.084), (26, -0.052), (27, 0.06), (28, 0.027), (29, -0.044), (30, -0.037), (31, 0.006), (32, 0.018), (33, 0.008), (34, 0.025), (35, -0.029), (36, 0.042), (37, 0.012), (38, -0.018), (39, -0.04), (40, 0.018), (41, 0.06), (42, 0.048), (43, -0.028), (44, -0.031), (45, -0.017), (46, -0.059), (47, 0.019), (48, 0.021), (49, -0.137)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98723865 2117 andrew gelman stats-2013-11-29-The gradual transition to replicable science

2 0.69985223 501 andrew gelman stats-2011-01-04-A new R package for fititng multilevel models

Introduction: Joscha Legewie points to this article by Lars Ronnegard, Xia Shen, and Moudud Alam, “hglm: A Package for Fitting Hierarchical Generalized Linear Models,” which just appeared in the R journal. This new package has the advantage, compared to lmer(), of allowing non-normal distributions for the varying coefficients. On the downside, they seem to have reverted to the ugly lme-style syntax (for example, “fixed = y ~ week, random = ~ 1|ID” rather than “y ~ week + (1|ID)”). The old-style syntax has difficulties handling non-nested grouping factors. They also say they can estimate models with correlated random effects, but isn’t that just the same as varying-intercept, varying-slope models, which lmer (or Stata alternatives such as gllamm) can already do? There’s also a bunch of stuff on H-likelihood theory, which seems pretty pointless to me (although probably it won’t do much harm either). In any case, this package might be useful to some of you, hence this note.

3 0.68741822 1682 andrew gelman stats-2013-01-19-R package for Bayes factors

Introduction: Richard Morey writes: You and your blog readers may be interested to know that we’ve released a major new version of the BayesFactor package to CRAN. The package computes Bayes factors for linear mixed models and regression models. Of course, I’m aware you don’t like point-null model comparisons, but the package does more than that; it also allows sampling from posterior distributions of the compared models, in much the same way that your arm package does with lmer objects. The sampling (both for the Bayes factors and posteriors) is quite fast, since the back end is written in C. Some basic examples using the package can be found here, and the CRAN page is here. Indeed I don’t like point-null model comparisons . . . but maybe this will be useful to some of you!

4 0.68180346 1445 andrew gelman stats-2012-08-06-Slow progress

Introduction: I received the following message: I am a Psychology postgraduate at the University of Glasgow and am writing for an article request. I’ve just read your 2008 published article titled “A weakly informative default prior distribution for logistic and other regression models” and found from it that your group also wrote a report on applying the Bayesian logistic regression approach to multilevel model, which is titled “An approximate EM algorithm for multilevel generalized linear models”. I have been looking for it online but did not find it, and was wondering if I may request this report from you? My first thought is that this is a good sign that psychology undergraduates are reading papers like this. Unfortunately I had to reply as follows: Hi, we actually programmed this up but never debugged it! So no actual paper . . . I think I could’ve done it if I had ever focused on the problem. Between the messiness of the algebra and the messiness of the R code, I never got it all to

5 0.67381054 848 andrew gelman stats-2011-08-11-That xkcd cartoon on multiple comparisons that all of you were sending me a couple months ago

Introduction: John Transue sent it in with the following thoughtful comment: I’d imagine you’ve already received this, but just in case, here’s a cartoon you’d like. At first blush it seems to go against your advice (more nuanced than what I’m about to say by quoting the paper title) to not worry about multiple comparisons. However, if I understand correctly your argument about multiple comparisons in multilevel models, the situation in this comic might have been avoided if shrinkage toward the grand mean (of all colors) had prevented the greens from clearing the .05 threshold. Is that right?

6 0.67246765 704 andrew gelman stats-2011-05-10-Multiple imputation and multilevel analysis

7 0.65337789 952 andrew gelman stats-2011-10-11-More reason to like Sims besides just his name

8 0.64354485 1102 andrew gelman stats-2012-01-06-Bayesian Anova found useful in ecology

9 0.62945986 772 andrew gelman stats-2011-06-17-Graphical tools for understanding multilevel models

10 0.6286158 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?

11 0.62297094 2277 andrew gelman stats-2014-03-31-The most-cited statistics papers ever

12 0.6208908 575 andrew gelman stats-2011-02-15-What are the trickiest models to fit?

13 0.62079608 269 andrew gelman stats-2010-09-10-R vs. Stata, or, Different ways to estimate multilevel models

14 0.61283261 555 andrew gelman stats-2011-02-04-Handy Matrix Cheat Sheet, with Gradients

15 0.61232984 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?

16 0.60601163 2033 andrew gelman stats-2013-09-23-More on Bayesian methods and multilevel modeling

17 0.59895641 653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects

18 0.5983237 419 andrew gelman stats-2010-11-18-Derivative-based MCMC as a breakthrough technique for implementing Bayesian statistics

19 0.59810537 1934 andrew gelman stats-2013-07-11-Yes, worry about generalizing from data to population. But multilevel modeling is the solution, not the problem

20 0.59684432 1880 andrew gelman stats-2013-06-02-Flame bait


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(0, 0.024), (7, 0.026), (14, 0.061), (16, 0.116), (21, 0.022), (24, 0.066), (55, 0.035), (79, 0.025), (86, 0.027), (89, 0.082), (99, 0.391)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98947442 2117 andrew gelman stats-2013-11-29-The gradual transition to replicable science

2 0.96705484 859 andrew gelman stats-2011-08-18-Misunderstanding analysis of covariance

Introduction: Jeremy Miles writes: Are you familiar with Miller and Chapman’s (2001) article: Misunderstanding Analysis of Covariance saying that ANCOVA (and therefore, I suppose regression) should not be used when groups differ on a covariate. It has caused a moderate splash in psychology circles. I wondered if you had any thoughts on it. I had not heard of the article so I followed the link . . . ugh! Already on the very first column of the very first page they confuse nonadditivity with nonlinearity. I could probably continue with, “and it gets worse,” but since nobody’s paying me to read this one, I’ll stop reading right there on the first page! I prefer when people point me to good papers to read. . . .

3 0.96553659 901 andrew gelman stats-2011-09-12-Some thoughts on academic cheating, inspired by Frey, Wegman, Fischer, Hauser, Stapel

Introduction: As regular readers of this blog are aware, I am fascinated by academic and scientific cheating and the excuses people give for it. Bruno Frey and colleagues published a single article (with only minor variants) in five different major journals, and these articles did not cite each other. And there have been several other cases of his self-plagiarism (see this review from Olaf Storbeck). I do not mind the general practice of repeating oneself for different audiences—in the social sciences, we call this Arrow’s Theorem —but in this case Frey seems to have gone a bit too far. Blogger Economic Logic has looked into this and concluded that this sort of common practice is standard in “the context of the German(-speaking) academic environment,” and what sets Frey apart is not his self-plagiarism or even his brazenness but rather his practice of doing it in high-visibility journals. Economic Logic writes that “[Frey's] contribution is pedagogical, he found a good and interesting

4 0.96515977 623 andrew gelman stats-2011-03-21-Baseball’s greatest fielders

Introduction: Someone just stopped by and dropped off a copy of the book Wizardry: Baseball’s All-time Greatest Fielders Revealed, by Michael Humphreys. I don’t have much to say about the topic–I did see Brooks Robinson play, but I don’t remember any fancy plays. I must have seen Mark Belanger but I don’t really recall. Ozzie Smith was cool but I saw only him on TV. The most impressive thing I ever saw live was Rickey Henderson stealing a base. The best thing about that was that everyone was expecting him to steal the base, and he still was able to do it. But that wasn’t fielding either. Anyway, Humphreys was nice enough to give me a copy of his book, and since I can’t say much (I didn’t have it in me to study the formulas in detail, nor do I know enough to be able to evaluate them), I might as well say what I can say right away. (Note: Humphreys replies to some of these questions in a comment .) 1. Near the beginning, Humphreys says that 10 runs are worth about 1 win. I’ve always b

5 0.96332896 1783 andrew gelman stats-2013-03-31-He’s getting ready to write a book

Introduction: Eric Novik does some open-source planning : My co-author, Jacki Buros, and I [Novik] have just signed a contract with Apress to write a book tentatively entitled “Predictive Analytics with R”, which will cover programming best practices, data munging, data exploration, and single and multi-level models with case studies in social media, healthcare, politics, marketing, and the stock market. Why does the world need another R book? We think there is a shortage of books that deal with the complete and programmer centric analysis of real, dirty, and sometimes unstructured data. Our target audience are people who have some familiarity with statistics, but do not have much experience with programming. . . . The book is projected to be about 300 pages across 8 chapters. This is my first experience with writing a book and everything I heard about the process tells me that this is going to be a long and arduous endeavor lasting anywhere from 6 to 8 months. Novik emailed me and wrot

6 0.96263003 1917 andrew gelman stats-2013-06-28-Econ coauthorship update

7 0.96050799 1839 andrew gelman stats-2013-05-04-Jesus historian Niall Ferguson and the improving standards of public discourse

8 0.95974058 1461 andrew gelman stats-2012-08-17-Graphs showing uncertainty using lighter intensities for the lines that go further from the center, to de-emphasize the edges

9 0.95960289 1688 andrew gelman stats-2013-01-22-That claim that students whose parents pay for more of college get worse grades

10 0.95762038 430 andrew gelman stats-2010-11-25-The von Neumann paradox

11 0.95739722 2066 andrew gelman stats-2013-10-17-G+ hangout for test run of BDA course

12 0.95733947 2107 andrew gelman stats-2013-11-20-NYT (non)-retraction watch

13 0.95656466 54 andrew gelman stats-2010-05-27-Hype about conditional probability puzzles

14 0.95621419 1038 andrew gelman stats-2011-12-02-Donate Your Data to Science!

15 0.95607853 255 andrew gelman stats-2010-09-04-How does multilevel modeling affect the estimate of the grand mean?

16 0.9552874 2106 andrew gelman stats-2013-11-19-More on “data science” and “statistics”

17 0.95506948 452 andrew gelman stats-2010-12-06-Followup questions

18 0.95504558 2130 andrew gelman stats-2013-12-11-Multilevel marketing as a way of liquidating participants’ social networks

19 0.95503831 1243 andrew gelman stats-2012-04-03-Don’t do the King’s Gambit

20 0.95386517 2152 andrew gelman stats-2013-12-28-Using randomized incentives as an instrument for survey nonresponse?