andrew_gelman_stats-2010-464: knowledge graph by maker-knowledge-mining

464 andrew gelman stats-2010-12-12-Finite-population standard deviation in a hierarchical model


meta info for this blog

Source: html

Introduction: Karri Seppa writes: My topic is regional variation in the cause-specific survival of breast cancer patients across the 21 hospital districts in Finland, this component being modeled by random effects. I am interested mainly in the district-specific effects, and with a hierarchical model I can get reasonable estimates also for sparsely populated districts. Based on the recommendation given in the book by yourself and Dr. Hill (2007), I tend to think that the finite-population variance would be an appropriate measure to summarize the overall variation across the 21 districts. However, I feel it is somewhat incoherent first to assume a Normal distribution for the district effects, involving a “superpopulation” variance parameter, and then to compute the finite-population variance from the estimated district-specific parameters. I wonder whether the finite-population variance would be more appropriate in the context of a model with fixed district effects? My reply: I agree that these points can be confusing, as can be seen by the 5 different definitions of fixed/random effects that I discuss in the Anova paper.
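For reference, the two summaries being contrasted can be written out explicitly. This is the standard finite-population Anova formulation; the notation below is added for clarity and is not Karri’s:

```latex
% District effects alpha_1, ..., alpha_J (here J = 21), modeled as
% draws from a superpopulation: alpha_j ~ N(mu, sigma_alpha^2).
% The superpopulation variance is the parameter sigma_alpha^2 itself;
% the finite-population variance of the J districts in hand is
s_\alpha^2 \;=\; \frac{1}{J-1} \sum_{j=1}^{J} \left( \alpha_j - \bar{\alpha} \right)^2,
\qquad
\bar{\alpha} \;=\; \frac{1}{J} \sum_{j=1}^{J} \alpha_j .
```

The superpopulation variance refers to hypothetical new districts drawn from the same distribution; the finite-population variance summarizes the 21 districts actually observed.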


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Karri Seppa writes: My topic is regional variation in the cause-specific survival of breast cancer patients across the 21 hospital districts in Finland, this component being modeled by random effects. [sent-1, score-1.264]

2 I am interested mainly in the district-specific effects, and with a hierarchical model I can get reasonable estimates also for sparsely populated districts. [sent-2, score-0.463]

3 Based on the recommendation given in the book by yourself and Dr. Hill (2007), I tend to think that the finite-population variance would be an appropriate measure to summarize the overall variation across the 21 districts. [sent-4, score-0.682]

4 However, I feel it is somewhat incoherent first to assume a Normal distribution for the district effects, involving a “superpopulation” variance parameter, and then to compute the finite-population variance from the estimated district-specific parameters. [sent-5, score-0.931]

5 I wonder whether the finite-population variance would be more appropriate in the context of a model with fixed district effects? [sent-6, score-0.506]

6 My reply: I agree that these points can be confusing, as can be seen by the 5 different definitions of fixed/random effects that I discuss in the Anova paper. [sent-7, score-0.256]

7 Here’s what I would say: Your goal is to estimate what’s going on in these 21 districts. [sent-9, score-0.18]

8 To the extent there is a “true” superpopulation, it could be thought of as representing variation over time as well as space. [sent-10, score-0.256]

9 But, mathematically, the superpop and the associated normal (or whatever) distribution can be viewed as a tool for getting statistically efficient estimates for the 21 districts that you have. [sent-11, score-1.244]

10 Now that you have simultaneously estimated parameters for these 21 districts, you might also be interested in ensemble properties, for example the maximum, minimum, interquartile range, or even–gasp–standard deviation of these 21 numbers. [sent-12, score-0.723]

11 It’s well known that no point estimate in high-dimensional space can capture ensemble properties–the key paper here is a 1984 article by Tom Louis, which is referred to in one of my books (BDA or ARM). [sent-13, score-0.48]

12 I guess what I’m saying is that you make it clear that your goal is the 21 districts and that the Bayesian inference and superpop is a tool for getting there. [sent-14, score-0.967]
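A minimal sketch of that workflow: given posterior simulation draws of the 21 district effects, compute each ensemble summary within each draw. Everything below (the conjugate normal model standing in for the fitted hierarchical model, the numbers, the variable names) is an illustrative assumption, not the model from the letter:

```python
import numpy as np

rng = np.random.default_rng(0)

J = 21          # hospital districts
n_sims = 4000   # posterior simulation draws

# Fake "truth" and data: district effects drawn from a superpopulation,
# observed with sampling error (all values are placeholders).
mu, sigma_alpha, se = 0.0, 0.5, 0.4
alpha_true = rng.normal(mu, sigma_alpha, size=J)
y = alpha_true + rng.normal(0.0, se, size=J)

# Conjugate normal-normal posterior per district, standing in for
# simulation draws from a real fitted model.
B = se**2 / (se**2 + sigma_alpha**2)                 # shrinkage factor
post_mean = B * mu + (1 - B) * y                     # pulled toward mu
post_sd = np.sqrt(1.0 / (1.0 / sigma_alpha**2 + 1.0 / se**2))
alpha_draws = post_mean + post_sd * rng.standard_normal((n_sims, J))

# Finite-population SD: the SD across the 21 districts *within each
# draw*, which yields a full posterior distribution for it.
finite_sd_draws = alpha_draws.std(axis=1, ddof=1)

# Other ensemble summaries work the same way, one value per draw.
max_draws = alpha_draws.max(axis=1)
q75, q25 = np.percentile(alpha_draws, [75, 25], axis=1)
iqr_draws = q75 - q25

# The Louis (1984) point: the SD of the 21 point estimates understates
# the ensemble spread, because shrinkage pulls the posterior means
# together; the per-draw SD does not have this problem.
print("SD of the 21 posterior means:", post_mean.std(ddof=1))
print("posterior mean finite-pop SD:", finite_sd_draws.mean())
print("posterior mean maximum:      ", max_draws.mean())
print("posterior mean IQR:          ", iqr_draws.mean())
```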


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('districts', 0.377), ('superpop', 0.281), ('ensemble', 0.242), ('superpopulation', 0.242), ('variance', 0.231), ('variation', 0.173), ('district', 0.168), ('properties', 0.162), ('effects', 0.159), ('tool', 0.132), ('interquartile', 0.128), ('normal', 0.119), ('sparsely', 0.116), ('estimated', 0.113), ('appropriate', 0.107), ('breast', 0.101), ('goal', 0.1), ('incoherent', 0.099), ('populated', 0.097), ('finland', 0.097), ('definitions', 0.097), ('louis', 0.096), ('hospital', 0.093), ('anova', 0.092), ('across', 0.091), ('survival', 0.091), ('distribution', 0.089), ('simultaneously', 0.088), ('regional', 0.088), ('estimates', 0.088), ('mainly', 0.086), ('patients', 0.085), ('minimum', 0.084), ('confusing', 0.084), ('component', 0.083), ('mathematically', 0.083), ('representing', 0.083), ('modeled', 0.082), ('viewed', 0.081), ('estimate', 0.08), ('summarize', 0.08), ('bda', 0.08), ('referred', 0.079), ('hill', 0.079), ('capture', 0.079), ('getting', 0.077), ('deviation', 0.076), ('interested', 0.076), ('arm', 0.075), ('tom', 0.075)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 464 andrew gelman stats-2010-12-12-Finite-population standard deviation in a hierarchical model


2 0.17543837 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters

Introduction: John Lawson writes: I have been experimenting using Bayesian methods to estimate variance components, and I have noticed that even when I use a noninformative prior, my estimates are never close to the method-of-moments or REML estimates. In every case I have tried, the sum of the Bayesian estimated variance components is always larger than the sum of the estimates obtained by method of moments or REML. For data sets I have used that arise from a simple one-way random effects model, the Bayesian estimate of the between-groups variance component is usually larger than the method-of-moments or REML estimates. When I use a uniform prior on the between standard deviation (as you recommended in your 2006 paper) rather than an inverse-gamma prior on the between variance component, the between variance component is usually reduced. However, for the dyestuff data in Davies (1949, p74), the opposite appears to be the case. I am worried that the Bayesian estimators of the varian
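For readers who want John’s benchmark: the method-of-moments estimate for the balanced one-way random-effects model takes a few lines. A sketch with simulated placeholder data (not the dyestuff data), assuming equal group sizes:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical balanced one-way layout: J groups, n observations each.
J, n = 6, 5
sigma_between, sigma_within = 1.0, 2.0
group_effects = rng.normal(0.0, sigma_between, size=J)
y = group_effects[:, None] + rng.normal(0.0, sigma_within, size=(J, n))

# Classical one-way Anova mean squares.
grand_mean = y.mean()
msb = n * ((y.mean(axis=1) - grand_mean) ** 2).sum() / (J - 1)
msw = ((y - y.mean(axis=1, keepdims=True)) ** 2).sum() / (J * (n - 1))

# Method of moments: E[MSB] = n * sigma2_between + sigma2_within and
# E[MSW] = sigma2_within, so solve and truncate at zero if negative
# (this truncation is one reason Bayesian posterior means sit higher).
sigma2_between_mom = max((msb - msw) / n, 0.0)

print("MoM sigma^2 between:", sigma2_between_mom)
print("MoM sigma^2 within: ", msw)
```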

3 0.161919 2145 andrew gelman stats-2013-12-24-Estimating and summarizing inference for hierarchical variance parameters when the number of groups is small

Introduction: Chris Che-Castaldo writes: I am trying to compute variance components for a hierarchical model where the group level has two binary predictors and their interaction. When I model each of these three predictors as N(0, tau) the model will not converge, perhaps because the number of coefficients in each batch is so small (2 for the main effects and 4 for the interaction). Although I could simply leave all these predictors as unmodeled fixed effects, the last sentence of section 21.2 on page 462 of Gelman and Hill (2007) suggests this would not be a wise course of action: For example, it is not clear how to define the (finite) standard deviation of variables that are included in interactions. I am curious – is there still no clear-cut way to directly compute the finite standard deviation for binary unmodeled variables that are also part of an interaction, as well as for the interaction itself? My reply: I’d recommend including these in your model (it’s probably easiest to do so

4 0.15020739 1644 andrew gelman stats-2012-12-30-Fixed effects, followed by Bayes shrinkage?

Introduction: Stuart Buck writes: I have a question about fixed effects vs. random effects. Amongst economists who study teacher value-added, it has become common to see people saying that they estimated teacher fixed effects (via least squares dummy variables, so that there is a parameter for each teacher), but that they then applied empirical Bayes shrinkage so that the teacher effects are brought closer to the mean. (See this paper by Jacob and Lefgren, for example.) Can that really be what they are doing? Why wouldn’t they just run random (modeled) effects in the first place? I feel like there’s something I’m missing. My reply: I don’t know the full story here, but I’m thinking there are two goals, first to get an unbiased estimate of an overall treatment effect (and there the econometricians prefer so-called fixed effects; I disagree with them on this but I know where they’re coming from) and second to estimate individual teacher effects (and there it makes sense to use so-called
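In stylized form, the two-step procedure amounts to the following sketch. The estimates, standard errors, and the assumption that the between-teacher variance tau2 has already been estimated are all hypothetical; this is the generic empirical-Bayes shrinkage formula, not necessarily the exact estimator in the paper Stuart cites:

```python
import numpy as np

# Step 1 (hypothetical output): per-teacher least-squares dummy-variable
# estimates, each with its own sampling standard error.
theta_hat = np.array([0.30, -0.10, 0.55, 0.05, -0.40])
se = np.array([0.20, 0.15, 0.30, 0.10, 0.25])

# Step 2: empirical Bayes shrinkage toward a precision-weighted grand
# mean, with tau2 = between-teacher variance (assumed already estimated).
tau2 = 0.04
grand_mean = np.average(theta_hat, weights=1.0 / (tau2 + se**2))
shrinkage = se**2 / (se**2 + tau2)      # noisier estimates shrink more
theta_eb = shrinkage * grand_mean + (1 - shrinkage) * theta_hat

print(theta_eb)  # each estimate pulled toward the grand mean
```

In the simplest normal case with known variances, this two-step answer coincides with what a random-effects (multilevel) fit would give directly, which is why the question “why not just fit random effects in the first place?” is a natural one.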

5 0.14635079 810 andrew gelman stats-2011-07-20-Adding more information can make the variance go up (depending on your model)

Introduction: Andy McKenzie writes: In their March 9 “counterpoint” in nature biotech to the prospect that we should try to integrate more sources of data in clinical practice (see “point” arguing for this), Isaac Kohane and David Margulies claim that, “Finally, how much better is our new knowledge than older knowledge? When is the incremental benefit of a genomic variant(s) or gene expression profile relative to a family history or classic histopathology insufficient and when does it add rather than subtract variance?” Perhaps I am mistaken (thus this email), but it seems that this claim runs contra to the definition of conditional probability. That is, if you have a hierarchical model, and the family history / classical histopathology already suggests a parameter estimate with some variance, how could the new genomic info possibly increase the variance of that parameter estimate? Surely the question is how much variance the new genomic info reduces and whether it therefore justifies t
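Andy’s intuition can be stated precisely with the law of total variance; writing x for the family history / histopathology and y for the new genomic information,

```latex
\operatorname{var}(\theta \mid x)
  \;=\; \mathbb{E}\bigl[\operatorname{var}(\theta \mid x, y) \,\big|\, x\bigr]
  \;+\; \operatorname{var}\bigl(\mathbb{E}[\theta \mid x, y] \,\big|\, x\bigr)
  \;\ge\; \mathbb{E}\bigl[\operatorname{var}(\theta \mid x, y) \,\big|\, x\bigr].
```

So the expected posterior variance after seeing y can never be larger, but for particular realizations of y (and particular models, e.g. with heavy-tailed distributions) var(theta | x, y) can exceed var(theta | x), which is where the “depending on your model” in the title comes in.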

6 0.13648428 377 andrew gelman stats-2010-10-28-The incoming moderate Republican congressmembers

7 0.13431656 472 andrew gelman stats-2010-12-17-So-called fixed and random effects

8 0.12981232 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?

9 0.12942354 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?

10 0.12876923 1891 andrew gelman stats-2013-06-09-“Heterogeneity of variance in experimental studies: A challenge to conventional interpretations”

11 0.12818645 1786 andrew gelman stats-2013-04-03-Hierarchical array priors for ANOVA decompositions

12 0.12689057 972 andrew gelman stats-2011-10-25-How do you interpret standard errors from a regression fit to the entire population?

13 0.12462052 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization

14 0.12270335 960 andrew gelman stats-2011-10-15-The bias-variance tradeoff

15 0.12256307 1241 andrew gelman stats-2012-04-02-Fixed effects and identification

16 0.11482206 773 andrew gelman stats-2011-06-18-Should we always be using the t and robit instead of the normal and logit?

17 0.1136221 846 andrew gelman stats-2011-08-09-Default priors update?

18 0.11354622 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?

19 0.10728025 1267 andrew gelman stats-2012-04-17-Hierarchical-multilevel modeling with “big data”

20 0.10632322 669 andrew gelman stats-2011-04-19-The mysterious Gamma (1.4, 0.4)


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.173), (1, 0.121), (2, 0.091), (3, -0.04), (4, 0.023), (5, -0.007), (6, 0.047), (7, -0.023), (8, 0.002), (9, 0.041), (10, 0.015), (11, -0.011), (12, 0.046), (13, -0.034), (14, 0.054), (15, 0.004), (16, -0.058), (17, 0.031), (18, -0.008), (19, 0.032), (20, -0.02), (21, -0.006), (22, 0.045), (23, 0.008), (24, 0.031), (25, -0.026), (26, -0.083), (27, 0.092), (28, -0.017), (29, 0.016), (30, -0.022), (31, 0.008), (32, -0.028), (33, -0.056), (34, 0.036), (35, -0.001), (36, -0.013), (37, -0.039), (38, -0.009), (39, -0.005), (40, 0.015), (41, -0.01), (42, -0.038), (43, 0.05), (44, -0.055), (45, 0.021), (46, 0.049), (47, -0.011), (48, 0.006), (49, 0.005)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97308803 464 andrew gelman stats-2010-12-12-Finite-population standard deviation in a hierarchical model


2 0.83382261 1644 andrew gelman stats-2012-12-30-Fixed effects, followed by Bayes shrinkage?


3 0.81586748 269 andrew gelman stats-2010-09-10-R vs. Stata, or, Different ways to estimate multilevel models

Introduction: Cyrus writes: I [Cyrus] was teaching a class on multilevel modeling, and we were playing around with different methods to fit a random-effects logit model with 2 random intercepts—one corresponding to “family” and another corresponding to “community” (labeled “mom” and “cluster” in the data, respectively). There are also a few regressors at the individual, family, and community level. We were replicating in part some of the results from the following paper: Improved estimation procedures for multilevel models with binary response: a case-study, by G. Rodriguez and N. Goldman. (I say “replicating in part” because we didn’t include all the regressors that they use, only a subset.) We were looking at the performance of estimation via glmer in R’s lme4 package, glmmPQL in R’s MASS package, and Stata’s xtmelogit. We wanted to study the performance of various estimation methods, including adaptive quadrature methods and penalized quasi-likelihood. I was shocked to discover that glmer

4 0.80711514 1194 andrew gelman stats-2012-03-04-Multilevel modeling even when you’re not interested in predictions for new groups

Introduction: Fred Wu writes: I work at National Prescribing Services in Australia. I have a database representing, say, antidiabetic drug utilisation for the entire Australia in the past few years. I planned to do a longitudinal analysis across GP Division Networks (112 divisions in AUS) using mixed-effects models (or, as you call them in your book, varying intercepts and varying slopes) on this data. The problem here is: as the data actually represent the population who use antidiabetic drugs in AUS, should I use 112 fixed dummy variables to capture the variation, or use varying intercepts and varying slopes for the model? Because someone may argue that divisions in AUS, or states in the USA, can hardly be considered draws from a “superpopulation”, so fixed dummies should be used. What I think is that the population is those who use the drugs; what will happen when the rest need to use them? In terms of exchangeability, using varying intercepts and varying slopes can be justified. Also you provided in y

5 0.80558681 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?

Introduction: I received the following email from someone who wishes to remain anonymous: My colleague and I are trying to understand the best way to approach a problem involving measuring a group of individuals’ abilities across time, and are hoping you can offer some guidance. We are trying to analyze the combined effect of two distinct groups of people (A and B, with no overlap between A and B) who collaborate to produce a binary outcome, using a mixed logistic regression along the lines of the following. Outcome ~ (1 | A) + (1 | B) + Other variables What we’re interested in testing is whether the observed A random effects in period 1 are predictive of the A random effects in the following period 2. Our idea is to create two models, each using a different period’s worth of data, to create two sets of A coefficients, and then observe the relationship between the two. If the A’s have a persistent ability across periods, the coefficients should be correlated or show a linear-ish relationshi
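The last step of that plan is simple once the two period-specific fits exist. A toy sketch with simulated stand-ins for the two periods’ estimated A effects (the persistent-ability structure here is assumed purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical estimated A random effects from two separately fitted
# period models, aligned so row i is the same A group in both periods.
n_groups = 50
ability = rng.normal(0.0, 1.0, size=n_groups)              # persistent part
a_period1 = ability + rng.normal(0.0, 0.5, size=n_groups)  # estimation noise
a_period2 = ability + rng.normal(0.0, 0.5, size=n_groups)

# If ability persists, the two coefficient sets should correlate; note
# that shrinkage and estimation noise both attenuate this correlation
# relative to the true persistence.
r = np.corrcoef(a_period1, a_period2)[0, 1]
print(f"cross-period correlation of A effects: {r:.2f}")
```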

6 0.77987361 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?

7 0.76749629 1786 andrew gelman stats-2013-04-03-Hierarchical array priors for ANOVA decompositions

8 0.76246017 1966 andrew gelman stats-2013-08-03-Uncertainty in parameter estimates using multilevel models

9 0.76020002 810 andrew gelman stats-2011-07-20-Adding more information can make the variance go up (depending on your model)

10 0.75729448 960 andrew gelman stats-2011-10-15-The bias-variance tradeoff

11 0.75106287 2145 andrew gelman stats-2013-12-24-Estimating and summarizing inference for hierarchical variance parameters when the number of groups is small

12 0.7489661 1686 andrew gelman stats-2013-01-21-Finite-population Anova calculations for models with interactions

13 0.74553436 1241 andrew gelman stats-2012-04-02-Fixed effects and identification

14 0.73412931 1102 andrew gelman stats-2012-01-06-Bayesian Anova found useful in ecology

15 0.72429603 472 andrew gelman stats-2010-12-17-So-called fixed and random effects

16 0.72037965 1267 andrew gelman stats-2012-04-17-Hierarchical-multilevel modeling with “big data”

17 0.71850145 246 andrew gelman stats-2010-08-31-Somewhat Bayesian multilevel modeling

18 0.71430331 653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects

19 0.70739543 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters

20 0.70259959 184 andrew gelman stats-2010-08-04-That half-Cauchy prior


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(0, 0.011), (2, 0.045), (9, 0.028), (16, 0.039), (20, 0.02), (21, 0.012), (24, 0.182), (27, 0.028), (42, 0.017), (43, 0.022), (48, 0.143), (53, 0.019), (76, 0.019), (79, 0.014), (86, 0.026), (89, 0.017), (99, 0.264)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.95738292 212 andrew gelman stats-2010-08-17-Futures contracts, Granger causality, and my preference for estimation to testing

Introduction: José Iparraguirre writes: There’s a letter in the latest issue of The Economist (July 31st) signed by Sir Richard Branson (Virgin), Michael Masters (Masters Capital Management) and David Frenk (Better Markets) about an OECD report on speculation and the prices of commodities, which includes the following: “The report uses a Granger causality test to measure the relationship between the level of commodities futures contracts held by swap dealers, and the prices of those commodities. Granger tests, however, are of dubious applicability to extremely volatile variables like commodities prices.” The report says: Granger causality is a standard statistical technique for determining whether one time series is useful in forecasting another. It is important to bear in mind that the term causality is used in a statistical sense, and not in a philosophical one of structural causation. More precisely, a variable A is said to Granger-cause B if knowing the time paths of B and A toge
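For concreteness, here is what a Granger test looks like in practice. The series below are simulated placeholders (positions driving prices by construction), not commodity data; the test asks whether the second column helps forecast the first beyond the first column’s own lags:

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(3)

# Placeholder series: positions "Granger-cause" prices here by
# construction, through a one-period lag.
T = 200
positions = rng.standard_normal(T)
prices = np.zeros(T)
for t in range(1, T):
    prices[t] = (0.5 * prices[t - 1] + 0.8 * positions[t - 1]
                 + rng.standard_normal())

# Column order matters: the test checks whether column 2 (positions)
# helps forecast column 1 (prices).
data = np.column_stack([prices, positions])
results = grangercausalitytests(data, maxlag=2)
```

As the report itself notes, a rejection here is statistical forecastability, not structural causation; and with very volatile series the test’s regression assumptions can be shaky, which is the letter writers’ complaint.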

2 0.95372462 848 andrew gelman stats-2011-08-11-That xkcd cartoon on multiple comparisons that all of you were sending me a couple months ago

Introduction: John Transue sent it in with the following thoughtful comment: I’d imagine you’ve already received this, but just in case, here’s a cartoon you’d like. At first blush it seems to go against your advice (more nuanced than what I’m about to say by quoting the paper title) to not worry about multiple comparisons. However, if I understand correctly your argument about multiple comparisons in multilevel models, the situation in this comic might have been avoided if shrinkage toward the grand mean (of all colors) had prevented the greens from clearing the .05 threshold. Is that right?
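John’s intuition about shrinkage can be checked in a toy simulation. Everything here is hypothetical (20 null effects with known sampling variance, a simple moment estimate of the between-color variance); it is the partial-pooling logic, not the cartoon’s data:

```python
import numpy as np

rng = np.random.default_rng(4)

# 20 jelly-bean colors, all with true effect zero, each estimated with
# known sampling variance s2: pure noise, so on average one color will
# clear the 5% line by luck.
n_colors, s2 = 20, 0.01
theta_hat = rng.normal(0.0, np.sqrt(s2), size=n_colors)
classical = np.abs(theta_hat) > 1.96 * np.sqrt(s2)
print("classically 'significant':", classical.sum())

# Partial pooling: moment estimate of the between-color variance, then
# shrink toward the grand mean. With no real variation, tau2_hat is
# near zero, everything shrinks hard, and the lucky color is pulled
# back inside the threshold.
tau2_hat = max(theta_hat.var(ddof=1) - s2, 0.0)
B = s2 / (s2 + tau2_hat)                        # shrinkage factor
theta_pooled = B * theta_hat.mean() + (1 - B) * theta_hat
post_sd = np.sqrt((1 - B) * s2)                 # stylized posterior sd
pooled = (np.abs(theta_pooled) > 1.96 * post_sd
          if post_sd > 0 else np.zeros(n_colors, dtype=bool))
print("'significant' after pooling:", pooled.sum())
```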

same-blog 3 0.94980466 464 andrew gelman stats-2010-12-12-Finite-population standard deviation in a hierarchical model


4 0.94979239 332 andrew gelman stats-2010-10-10-Proposed new section of the American Statistical Association on Imaging Sciences

Introduction: Martin Lindquist writes that he and others are trying to start a new ASA section on statistics in imaging. If you’re interested in being a signatory to its formation, please send him an email.

5 0.93849397 181 andrew gelman stats-2010-08-03-MCMC in Python

Introduction: John Salvatier forwards a note from Anand Patil that a paper on PyMC has appeared in the Journal of Statistical Software. We’ll have to check this out.

6 0.91814989 681 andrew gelman stats-2011-04-26-Worst statistical graphic I have seen this year

7 0.91576385 605 andrew gelman stats-2011-03-09-Does it feel like cheating when I do this? Variation in ethical standards and expectations

8 0.91553938 823 andrew gelman stats-2011-07-26-Including interactions or not

9 0.91549838 1088 andrew gelman stats-2011-12-28-Argument in favor of Ddulites

10 0.90815175 1240 andrew gelman stats-2012-04-02-Blogads update

11 0.90695018 870 andrew gelman stats-2011-08-25-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests

12 0.90610641 1518 andrew gelman stats-2012-10-02-Fighting a losing battle

13 0.90582919 1913 andrew gelman stats-2013-06-24-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests

14 0.90190846 1234 andrew gelman stats-2012-03-28-The Supreme Court’s Many Median Justices

15 0.90053326 1496 andrew gelman stats-2012-09-14-Sides and Vavreck on the 2012 election

16 0.89686763 2118 andrew gelman stats-2013-11-30-???

17 0.89609593 1941 andrew gelman stats-2013-07-16-Priors

18 0.89600903 2340 andrew gelman stats-2014-05-20-Thermodynamic Monte Carlo: Michael Betancourt’s new method for simulating from difficult distributions and evaluating normalizing constants

19 0.89567471 2358 andrew gelman stats-2014-06-03-Did you buy laundry detergent on their most recent trip to the store? Also comments on scientific publication and yet another suggestion to do a study that allows within-person comparisons

20 0.8955844 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values