andrew gelman stats-2011-08-12-851: year + (1|year)
Source: html (knowledge-graph by maker-knowledge-mining)
Introduction: Ana Sequeira writes:

I am using a temporal data series and I am trying specifically to understand whether there is a temporal trend in the occurrence of a species, for which I need to use "Year" in my models (and what I understood from pages 244-246 [in ARM] is that factors should always be used as random effects). I believe that in your book the closest example to my situation is the one shown in Figure 14.3: I also have 4 different regions in my study, the states in your example are replaced by years in my study, and the x axis is a specific value for a climatic factor I am using in my analysis (IOD). The reason I am writing you is that I am having trouble understanding whether my variable "Year" (a factor) should only be added as a random effect, (1|Year), or whether I should also include "Year" (not as a factor) in my models (Species ~ …Years + (1|Year)). My doubt lies in the fact that I am looking for a trend, and if I do not include "Year" as a variable, I believe the variance shown in the resulting random coefficients is conditional on the variables and effects used in the model; i.e., if I am not specifically accounting for a possible trend (linear or polynomial), would my model still give me a trustworthy answer regarding yearly trends? Also, some of my factors include only 4 and 5 levels (seasons and regions, respectively), in which case I understood that lmer()'s approximate inference is not reliable.

My reply: Yes, you can include year + (1|year). Also, you could fit the model using blmer/bglmer to get more stable estimates of the group-level variances.
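The "year + (1|year)" idea can be sketched in R as follows. This is a minimal illustration, not the original analysis: the data frame `d` and the column names (Species, Year, IOD, Region, Season) are hypothetical stand-ins for Ana's variables.

```r
library(lme4)   # lmer
library(blme)   # blmer: lmer with weakly informative priors

# Hypothetical data frame `d` with columns Species (outcome),
# Year (numeric), IOD, Region, Season.
d$year_c <- d$Year - mean(d$Year)   # centered year, for the linear trend
d$year_f <- factor(d$Year)          # same variable as a grouping factor

# Year enters twice: as a continuous predictor (the trend of interest)
# and as a grouping factor (year-to-year fluctuations around that trend).
m1 <- lmer(Species ~ year_c + IOD + (1 | year_f), data = d)

# With only 4-5 levels for season and region, the maximum-likelihood
# variance estimates can collapse to zero; blmer's default prior on the
# variance parameters keeps them away from the boundary.
m2 <- blmer(Species ~ year_c + IOD +
              (1 | year_f) + (1 | Region) + (1 | Season),
            data = d)
```

The point of the double entry is that the fixed coefficient on `year_c` captures the systematic linear trend, while the (1 | year_f) intercepts absorb whatever year-specific variation remains, so the trend estimate is not contaminated by treating each year's bump as noise-free.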