andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-918 knowledge-graph by maker-knowledge-mining

918 andrew gelman stats-2011-09-21-Avoiding boundary estimates in linear mixed models


meta info for this blog

Source: html

Introduction: Pablo Verde sends in this letter he and Daniel Curcio just published in the Journal of Antimicrobial Chemotherapy. They had published a meta-analysis with a boundary estimate which, he said, gave nonsense results. Here’s Curcio and Verde’s key paragraph: The authors [of the study they are criticizing] performed a test of heterogeneity between studies. Given that the test result was not significant at 5%, they decided to pool all the RRs by using a fixed-effect meta-analysis model. Unfortunately, this is a common practice in meta-analysis, which usually leads to very misleading results. First of all, the pooled RR as well as its standard error are sensitive to the estimation of the between-studies standard deviation (SD). SD is difficult to estimate with a small number of studies. On the other hand, it is very well known that the significance test of heterogeneity lacks statistical power to detect values of SD greater than zero. In addition, the statistically non-significant results of this test cannot be interpreted as evidence of the homogeneity of the results among all RCTs included.
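To make the boundary issue concrete, here is a minimal R sketch (the effect estimates below are hypothetical, not taken from the paper under discussion) of the DerSimonian-Laird moment estimator of the between-studies variance that a random-effects meta-analysis needs; the max(0, ...) truncation is exactly where a boundary estimate of zero can appear, and the pooled estimate's standard error changes with it.

yi <- c(0.10, 0.35, -0.05, 0.60, 0.20)   # hypothetical log relative risks from 5 studies
si <- c(0.25, 0.30, 0.20, 0.40, 0.35)    # their standard errors

w <- 1 / si^2                                                  # fixed-effect (inverse-variance) weights
Q <- sum(w * (yi - sum(w * yi) / sum(w))^2)                    # Cochran's Q heterogeneity statistic
k <- length(yi)
tau2 <- max(0, (Q - (k - 1)) / (sum(w) - sum(w^2) / sum(w)))   # DerSimonian-Laird estimate, truncated at zero

w.re <- 1 / (si^2 + tau2)                                      # random-effects weights
c(pooled.logRR = sum(w.re * yi) / sum(w.re),                   # pooled estimate
  se           = sqrt(1 / sum(w.re)))                          # its standard error depends on tau2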


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Pablo Verde sends in this letter he and Daniel Curcio just published in the Journal of Antimicrobial Chemotherapy. [sent-1, score-0.269]

2 They had published a meta-analysis with a boundary estimate which, he said, gave nonsense results. [sent-2, score-0.643]

3 Here’s Curcio and Verde’s key paragraph: The authors [of the study they are criticizing] performed a test of heterogeneity between studies. [sent-3, score-0.456]

4 Given that the test result was not significant at 5%, they decided to pool all the RRs by using a fixed-effect meta-analysis model. [sent-4, score-0.619]

5 Unfortunately, this is a common practice in meta-analysis, which usually leads to very misleading results. [sent-5, score-0.169]

6 First of all, the pooled RR as well as its standard error are sensitive to the estimation of the between-studies standard deviation (SD). [sent-6, score-0.574]

7 SD is difficult to estimate with a small number of studies. [sent-7, score-0.101]

8 On the other hand, it is very well known that the significance test of heterogeneity lacks statistical power to detect values of SD greater than zero. [sent-8, score-0.976]

9 In addition, the statistically non-significant results of this test cannot be interpreted as evidence of the homogeneity of the results among all RCTs included. [sent-9, score-0.587]

10 How can you generally avoid boundary estimates of multilevel variance parameters? [sent-10, score-0.45]

11 Using our cute little trick, implemented in blmer/bglmer in the blme package in R. [sent-11, score-0.548]
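A minimal sketch of that trick, assuming a data frame d with outcome y, predictor x, and grouping factor g (all hypothetical names): blmer() takes the same formula as lmer() but adds a default prior on the group-level covariance (see the blme documentation for its exact form), which keeps the variance estimate off the zero boundary.

library(lme4)
library(blme)

m0 <- lmer(y ~ x + (1 | g), data = d)    # maximum marginal likelihood: group-level sd can be exactly 0
m1 <- blmer(y ~ x + (1 | g), data = d)   # same model with blme's default covariance prior

VarCorr(m0)   # may report a group-level sd of 0 (a boundary estimate)
VarCorr(m1)   # the prior pulls the estimate to a positive value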


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('curcio', 0.358), ('verde', 0.358), ('sd', 0.337), ('test', 0.242), ('boundary', 0.233), ('rcts', 0.154), ('blme', 0.154), ('pablo', 0.154), ('nonsense', 0.147), ('rr', 0.147), ('lacks', 0.142), ('pooled', 0.128), ('significant', 0.123), ('heterogeneity', 0.122), ('cute', 0.109), ('detect', 0.108), ('pool', 0.106), ('standard', 0.103), ('implemented', 0.102), ('estimate', 0.101), ('sensitive', 0.101), ('trick', 0.098), ('interpreted', 0.097), ('deviation', 0.097), ('published', 0.094), ('criticizing', 0.092), ('performed', 0.092), ('greater', 0.091), ('letter', 0.09), ('results', 0.09), ('daniel', 0.09), ('misleading', 0.087), ('package', 0.085), ('sends', 0.085), ('leads', 0.082), ('paragraph', 0.081), ('decided', 0.08), ('estimation', 0.077), ('addition', 0.076), ('avoid', 0.075), ('variance', 0.074), ('unfortunately', 0.072), ('well', 0.068), ('using', 0.068), ('gave', 0.068), ('multilevel', 0.068), ('power', 0.068), ('statistically', 0.068), ('values', 0.068), ('known', 0.066)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 918 andrew gelman stats-2011-09-21-Avoiding boundary estimates in linear mixed models


2 0.1987163 1686 andrew gelman stats-2013-01-21-Finite-population Anova calculations for models with interactions

Introduction: Jim Thomson writes: I wonder if you could provide some clarification on the correct way to calculate the finite-population standard deviations for interaction terms in your Bayesian approach to ANOVA (as explained in your 2005 paper, and Gelman and Hill 2007). I understand that it is the SD of the constrained batch coefficients that is of interest, but in most WinBUGS examples I have seen, the SDs are all calculated directly as sd.fin<-sd(beta.main[]) for main effects and sd(beta.int[,]) for interaction effects, where beta.main and beta.int are the unconstrained coefficients, e.g. beta.int[i,j]~dnorm(0,tau). For main effects, I can see that it makes no difference, since the constrained value is calculated by subtracting the mean, and sd(B[]) = sd(B[]-mean(B[])). But the conventional sum-to-zero constraint for interaction terms in linear models is more complicated than subtracting the mean (there are only (n1-1)*(n2-1) free coefficients for an interaction b/w factors with n1 a
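To illustrate the distinction being asked about, here is a small R sketch (simulated coefficients standing in for one posterior draw of beta.int; in practice this would be repeated for every saved draw): the sum-to-zero-constrained interaction coefficients come from double-centering the unconstrained matrix, and their SD is in general not the same as sd(beta.int[,]).

set.seed(1)
n1 <- 4; n2 <- 3
b <- matrix(rnorm(n1 * n2), n1, n2)      # unconstrained interaction coefficients, b[i,j] ~ N(0, tau)

# sum-to-zero constraint for an interaction = double-centering
b.con <- b - outer(rowMeans(b), rep(1, n2)) - outer(rep(1, n1), colMeans(b)) + mean(b)

sd(b)       # what sd(beta.int[,]) computes
sd(b.con)   # finite-population sd of the constrained coefficients: typically smaller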

3 0.16212389 1524 andrew gelman stats-2012-10-07-An (impressive) increase in survival rate from 50% to 60% corresponds to an R-squared of (only) 1%. Counterintuitive, huh?

Introduction: I was just reading an old post and came across this example which I’d like to share with you again: Here’s a story of R-squared = 1%. Consider a 0/1 outcome with about half the people in each category. For example, half the people with some disease die in a year and half live. Now suppose there’s a treatment that increases survival rate from 50% to 60%. The unexplained sd is 0.5 and the explained sd is 0.05, hence R-squared is 0.01.
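The arithmetic checks out; here it is in R (just plugging in the 50% and 60% from the example), followed by a quick simulation as a sanity check.

p <- c(0.5, 0.6)                                  # survival probabilities, half the sample in each arm
explained.var   <- mean((p - mean(p))^2)          # variance of the fitted values: 0.05^2
unexplained.var <- mean(p * (1 - p))              # average Bernoulli variance: about 0.5^2
explained.var / (explained.var + unexplained.var) # R-squared, about 0.01

# simulation check
n <- 1e6
x <- rbinom(n, 1, 0.5)                            # treatment indicator
y <- rbinom(n, 1, ifelse(x == 1, 0.6, 0.5))       # survival
summary(lm(y ~ x))$r.squared                      # also about 0.01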

4 0.11107472 1891 andrew gelman stats-2013-06-09-“Heterogeneity of variance in experimental studies: A challenge to conventional interpretations”

Introduction: Avi sent along this old paper from Bryk and Raudenbush, who write: The presence of heterogeneity of variance across groups indicates that the standard statistical model for treatment effects no longer applies. Specifically, the assumption that treatments add a constant to each subject’s development fails. An alternative model is required to represent how treatment effects are distributed across individuals. We develop in this article a simple statistical model to demonstrate the link between heterogeneity of variance and random treatment effects. Next, we illustrate with results from two previously published studies how a failure to recognize the substantive importance of heterogeneity of variance obscured significant results present in these data. The article concludes with a review and synthesis of techniques for modeling variances. Although these methods have been well established in the statistical literature, they are not widely known by social and behavioral scientists. T

5 0.10143404 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization

Introduction: For awhile I’ve been fitting most of my multilevel models using lmer/glmer, which gives point estimates of the group-level variance parameters (maximum marginal likelihood estimate for lmer and an approximation for glmer). I’m usually satisfied with this–sure, point estimation understates the uncertainty in model fitting, but that’s typically the least of our worries. Sometimes, though, lmer/glmer estimates group-level variances at 0 or estimates group-level correlation parameters at +/- 1. Typically, when this happens, it’s not that we’re so sure the variance is close to zero or that the correlation is close to 1 or -1; rather, the marginal likelihood does not provide a lot of information about these parameters of the group-level error distribution. I don’t want point estimates on the boundary. I don’t want to say that the unexplained variance in some dimension is exactly zero. One way to handle this problem is full Bayes: slap a prior on sigma, do your Gibbs and Metropolis
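A minimal simulation of that failure mode (the setup is made up here for illustration): with only a few groups and a small true group-level standard deviation, lmer's point estimate of that standard deviation frequently lands exactly on the zero boundary; lme4 will print "boundary (singular) fit" messages for those replications.

library(lme4)
set.seed(1)

n.boundary <- 0
for (s in 1:100) {
  J <- 5; n <- 10
  a <- rnorm(J, 0, 0.3)                        # true group-level sd = 0.3
  g <- factor(rep(1:J, each = n))
  y <- a[as.integer(g)] + rnorm(J * n, 0, 1)
  fit <- lmer(y ~ 1 + (1 | g))
  sd.hat <- attr(VarCorr(fit)$g, "stddev")     # estimated group-level sd
  n.boundary <- n.boundary + (sd.hat < 1e-6)
}
n.boundary   # how many of the 100 replications put the estimate on the zero boundary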

6 0.096026108 401 andrew gelman stats-2010-11-08-Silly old chi-square!

7 0.09515658 351 andrew gelman stats-2010-10-18-“I was finding the test so irritating and boring that I just started to click through as fast as I could”

8 0.092817254 1462 andrew gelman stats-2012-08-18-Standardizing regression inputs

9 0.088698521 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?

10 0.083699644 1072 andrew gelman stats-2011-12-19-“The difference between . . .”: It’s not just p=.05 vs. p=.06

11 0.083317488 2247 andrew gelman stats-2014-03-14-The maximal information coefficient

12 0.082393147 2099 andrew gelman stats-2013-11-13-“What are some situations in which the classical approach (or a naive implementation of it, based on cookbook recipes) gives worse results than a Bayesian approach, results that actually impeded the science?”

13 0.081482425 187 andrew gelman stats-2010-08-05-Update on state size and governors’ popularity

14 0.080858208 246 andrew gelman stats-2010-08-31-Somewhat Bayesian multilevel modeling

15 0.079385839 1605 andrew gelman stats-2012-12-04-Write This Book

16 0.078246318 1966 andrew gelman stats-2013-08-03-Uncertainty in parameter estimates using multilevel models

17 0.077453867 1315 andrew gelman stats-2012-05-12-Question 2 of my final exam for Design and Analysis of Sample Surveys

18 0.077146605 899 andrew gelman stats-2011-09-10-The statistical significance filter

19 0.076705337 1980 andrew gelman stats-2013-08-13-Test scores and grades predict job performance (but maybe not at Google)

20 0.074489906 2294 andrew gelman stats-2014-04-17-If you get to the point of asking, just do it. But some difficulties do arise . . .


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.118), (1, 0.057), (2, 0.061), (3, -0.123), (4, 0.045), (5, -0.02), (6, 0.021), (7, -0.026), (8, -0.015), (9, -0.01), (10, 0.008), (11, -0.016), (12, 0.023), (13, -0.027), (14, 0.024), (15, -0.022), (16, -0.04), (17, 0.007), (18, 0.019), (19, -0.033), (20, 0.013), (21, 0.01), (22, 0.028), (23, -0.001), (24, 0.02), (25, -0.042), (26, -0.037), (27, -0.023), (28, 0.018), (29, -0.056), (30, 0.022), (31, 0.03), (32, 0.022), (33, -0.009), (34, 0.051), (35, 0.009), (36, -0.014), (37, 0.004), (38, 0.058), (39, -0.015), (40, -0.016), (41, -0.037), (42, -0.016), (43, 0.058), (44, -0.024), (45, -0.008), (46, -0.021), (47, 0.015), (48, 0.007), (49, -0.05)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99005938 918 andrew gelman stats-2011-09-21-Avoiding boundary estimates in linear mixed models


2 0.71451086 1966 andrew gelman stats-2013-08-03-Uncertainty in parameter estimates using multilevel models

Introduction: David Hsu writes: I have a (perhaps) simple question about uncertainty in parameter estimates using multilevel models — what is an appropriate threshold for measuring parameter uncertainty in a multilevel model? The reason why I ask is that I set out to do a crossed two-way model with two varying intercepts, similar to your flight simulator example in your 2007 book. The difference is that I have a lot of predictors specific to each cell (I think equivalent to airport and pilot in your example), and I find after modeling this in JAGS, I happily find that the predictors are much less important than the variability by cell (airport and pilot effects). Happily because this is what I am writing a paper about. However, I then went to check subsets of predictors using lm() and lmer(). I understand that they all use different estimation methods, but what I can’t figure out is why the errors on all of the coefficient estimates are *so* different. For example, using JAGS, and th
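A sketch of the kind of comparison being described (the data frame and variable names are hypothetical, standing in for the airport/pilot setup): fit the cell-level predictor with complete pooling via lm() and with crossed varying intercepts via lmer(), then put the coefficient standard errors side by side.

library(lme4)

# d: hypothetical data with outcome y, a cell-level predictor x,
# and two crossed grouping factors, airport and pilot
f.lm   <- lm(y ~ x, data = d)
f.lmer <- lmer(y ~ x + (1 | airport) + (1 | pilot), data = d)

cbind(lm   = summary(f.lm)$coefficients[, "Std. Error"],
      lmer = coef(summary(f.lmer))[, "Std. Error"])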

3 0.64615011 2315 andrew gelman stats-2014-05-02-Discovering general multidimensional associations

Introduction: Continuing our discussion of general measures of correlations, Ben Murrell sends along this paper (with corresponding R package), which begins: When two variables are related by a known function, the coefficient of determination (denoted R-squared) measures the proportion of the total variance in the observations that is explained by that function. This quantifies the strength of the relationship between variables by describing what proportion of the variance is signal as opposed to noise. For linear relationships, this is equal to the square of the correlation coefficient, ρ. When the parametric form of the relationship is unknown, however, it is unclear how to estimate the proportion of explained variance equitably – assigning similar values to equally noisy relationships. Here we demonstrate how to directly estimate a generalized R-squared when the form of the relationship is unknown, and we question the performance of the Maximal Information Coefficient (MIC) – a recently pr
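A quick check of the linear case mentioned at the start of that abstract (simulated data, purely illustrative): for a linear relationship, the regression R-squared is exactly the square of the correlation coefficient.

set.seed(2)
x <- rnorm(200)
y <- 2 * x + rnorm(200, sd = 1.5)

summary(lm(y ~ x))$r.squared   # proportion of variance explained
cor(x, y)^2                    # squared correlation: the same number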

4 0.64045924 2042 andrew gelman stats-2013-09-28-Difficulties of using statistical significance (or lack thereof) to sift through and compare research hypotheses

Introduction: Dean Eckles writes: Thought you might be interested in an example that touches on a couple recurring topics: 1. The difference between a statistically significant finding and one that is non-significant need not be itself statistically significant (thus highlighting the problems of using NHST to declare whether an effect exists or not). 2. Continued issues with the credibility of high profile studies of “social contagion”, especially by Christakis and Fowler. A new paper in Archives of Sexual Behavior produces observational estimates of peer effects in sexual behavior and same-sex attraction. In the text, the authors (who include C&F) make repeated comparisons of the results for peer effects in sexual intercourse and those for peer effects in same-sex attraction. However, the 95% CI for the latter actually includes the point estimate for the former! This is most clear in Figure 2, as highlighted by Real Clear Science’s blog post about the study. (Now because there is som
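A numerical illustration of point 1 (the numbers are invented, not taken from the paper): one estimate clears the 5% threshold, the other does not, yet the difference between them is far from statistically significant.

est1 <- 0.25; se1 <- 0.10             # z = 2.5: "significant"
est2 <- 0.10; se2 <- 0.10             # z = 1.0: "not significant"

(est1 - est2) / sqrt(se1^2 + se2^2)   # z for the difference: about 1.06, not significant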

5 0.63572586 1404 andrew gelman stats-2012-07-03-Counting gays

Introduction: Gary Gates writes: In a recent study, the author of this article estimated that the self-identified lesbian, gay, bisexual, and transgender (LGBT) community makes up 3.8 percent of the American population. The author’s estimate was far lower than many scholars and activists had contended, and it included a relatively high proportion of persons self-identifying as bisexuals. This article responds to two of the central criticisms that arose in the controversy that followed. First, in response to claims that his estimate did not account for people who are in the closet, the author describes how demographers might measure the size of the closet. Second, in response to those who either ignored the reported large incidence of bisexuality or misconstrued the meaning of that incidence, the author considers how varying frameworks for conceptualizing sexual orientation might alter the ratio of lesbian or gay individuals to bisexuals. This article goes on to offer observations about the ch

6 0.63485682 310 andrew gelman stats-2010-10-02-The winner’s curse

7 0.6339193 899 andrew gelman stats-2011-09-10-The statistical significance filter

8 0.63372809 1441 andrew gelman stats-2012-08-02-“Based on my experiences, I think you could make general progress by constructing a solution to your specific problem.”

9 0.6213178 963 andrew gelman stats-2011-10-18-Question on Type M errors

10 0.60949898 269 andrew gelman stats-2010-09-10-R vs. Stata, or, Different ways to estimate multilevel models

11 0.60301149 1171 andrew gelman stats-2012-02-16-“False-positive psychology”

12 0.59813696 897 andrew gelman stats-2011-09-09-The difference between significant and not significant…

13 0.59797084 156 andrew gelman stats-2010-07-20-Burglars are local

14 0.59700775 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?

15 0.59678936 1893 andrew gelman stats-2013-06-11-Folic acid and autism

16 0.59597623 490 andrew gelman stats-2010-12-29-Brain Structure and the Big Five

17 0.59473175 226 andrew gelman stats-2010-08-23-More on those L.A. Times estimates of teacher effectiveness

18 0.58997118 2223 andrew gelman stats-2014-02-24-“Edlin’s rule” for routinely scaling down published estimates

19 0.58679038 1072 andrew gelman stats-2011-12-19-“The difference between . . .”: It’s not just p=.05 vs. p=.06

20 0.58572084 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(13, 0.035), (16, 0.059), (24, 0.196), (35, 0.03), (36, 0.024), (52, 0.043), (82, 0.025), (85, 0.013), (86, 0.029), (87, 0.205), (95, 0.02), (99, 0.212)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.94588834 152 andrew gelman stats-2010-07-17-Distorting the Electoral Connection? Partisan Representation in Confirmation Politics

Introduction: John Kastellec, Jeff Lax, and Justin Phillips write: Do senators respond to the preferences of their states’ median voters or only to the preferences of their co-partisans? We [Kastellec et al.] study responsiveness using roll call votes on ten recent Supreme Court nominations. We develop a method for estimating state-level public opinion broken down by partisanship. We find that senators respond more powerfully to their partisan base when casting such roll call votes. Indeed, when their state median voter and party median voter disagree, senators strongly favor the latter. [emphasis added] This has significant implications for the study of legislative responsiveness, the role of public opinion in shaping the personnel of the nation’s highest court, and the degree to which we should expect the Supreme Court to be counter-majoritarian. Our method can be applied elsewhere to estimate opinion by state and partisan group, or by many other typologies, so as to study other important qu

2 0.9437412 225 andrew gelman stats-2010-08-23-Getting into hot water over hot graphics

Introduction: I like what Antony Unwin has to say here (start on page 5).

same-blog 3 0.92389405 918 andrew gelman stats-2011-09-21-Avoiding boundary estimates in linear mixed models


4 0.89972615 355 andrew gelman stats-2010-10-20-Andy vs. the Ideal Point Model of Voting

Introduction: Last week, as I walked into Andrew’s office for a meeting, he was formulating some misgivings about applying an ideal-point model to budgetary bills in the U.S. Senate. Andrew didn’t like that the model of a senator’s position was an indifference point rather than at their optimal point, and that the effect of moving away from a position was automatically modeled as increasing in one direction and decreasing in the other. Executive Summary The monotonicity of inverse logit entails that the expected vote for a bill among any fixed collection of senators’ ideal points is monotonically increasing (or decreasing) with the bill’s position, with direction determined by the outcome coding. The Ideal-Point Model The ideal-point model’s easy to write down, but hard to reason about because of all the polarity shifting going on. To recapitulate from Gelman and Hill’s Regression book (p. 317), using the U.S. Senate instead of the Supreme Court, and ignoring the dis

5 0.88546157 233 andrew gelman stats-2010-08-25-Lauryn Hill update

Introduction: Juli thought this might answer some of my questions. To me, though, it seemed a bit of a softball interview, didn’t really go into the theory that the reason she’s stopped recording is that she didn’t really write most of the material herself.

6 0.8811295 1773 andrew gelman stats-2013-03-21-2.15

7 0.86385459 1868 andrew gelman stats-2013-05-23-Validation of Software for Bayesian Models Using Posterior Quantiles

8 0.86119401 294 andrew gelman stats-2010-09-23-Thinking outside the (graphical) box: Instead of arguing about how best to fix a bar chart, graph it as a time series lineplot instead

9 0.85951102 2087 andrew gelman stats-2013-11-03-The Employment Nondiscrimination Act is overwhelmingly popular in nearly every one of the 50 states

10 0.85455406 183 andrew gelman stats-2010-08-04-Bayesian models for simultaneous equation systems?

11 0.85413063 783 andrew gelman stats-2011-06-30-Don’t stop being a statistician once the analysis is done

12 0.8539927 127 andrew gelman stats-2010-07-04-Inequality and health

13 0.84319246 548 andrew gelman stats-2011-02-01-What goes around . . .

14 0.81773901 546 andrew gelman stats-2011-01-31-Infovis vs. statistical graphics: My talk tomorrow (Tues) 1pm at Columbia

15 0.81356287 2074 andrew gelman stats-2013-10-23-Can’t Stop Won’t Stop Mister P Beatdown

16 0.80964464 1788 andrew gelman stats-2013-04-04-When is there “hidden structure in data” to be discovered?

17 0.80778551 1875 andrew gelman stats-2013-05-28-Simplify until your fake-data check works, then add complications until you can figure out where the problem is coming from

18 0.80424643 2351 andrew gelman stats-2014-05-28-Bayesian nonparametric weighted sampling inference

19 0.80370951 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?

20 0.80347109 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics