andrew_gelman_stats andrew_gelman_stats-2010 andrew_gelman_stats-2010-397 knowledge-graph by maker-knowledge-mining
Introduction: Ryan Seals writes: I’m an epidemiologist at Emory University, and I’m working on a project of release patterns in jails (basically trying to model how long individuals are in jail before they’re released, for purposes of designing short-term health interventions, i.e. HIV testing, drug counseling, etc…). The question lends itself to quantile regression; we’re interested in the # of days it takes for 50% and 75% of inmates to be released. But being a clustered/nested data structure, it also obviously lends itself to multilevel modeling, with the group-level being individual jails. So: do you know of any work on multilevel quantile regression? My quick lit search didn’t yield much, and I don’t see any preprogrammed way to do it in SAS. My reply: To start with, I’m putting in the R keyword here, on the hope that some readers might be able to refer you to an R function that does what you want. Beyond this, I think it should be possible to program something in Bugs. In ARM we hav
sentIndex sentText sentNum sentScore
1 Ryan Seals writes: I’m an epidemiologist at Emory University, and I’m working on a project of release patterns in jails (basically trying to model how long individuals are in jail before they’re released, for purposes of designing short-term health interventions, i.e. HIV testing, drug counseling, etc…). [sent-1, score-1.01]
2 The question lends itself to quantile regression; we’re interested in the # of days it takes for 50% and 75% of inmates to be released. [sent-4, score-0.945]
3 But being a clustered/nested data structure, it also obviously lends itself to multilevel modeling, with the group-level being individual jails. [sent-5, score-0.667]
4 So: do you know of any work on multilevel quantile regression? [sent-6, score-0.779]
5 My quick lit search didn’t yield much, and I don’t see any preprogrammed way to do it in SAS. [sent-7, score-0.243]
6 My reply: To start with, I’m putting in the R keyword here, on the hope that some readers might be able to refer you to an R function that does what you want. [sent-8, score-0.425]
7 Beyond this, I think it should be possible to program something in Bugs. [sent-9, score-0.072]
8 In ARM we have an example of a multilevel ordered logit, which doesn’t sound so different from what you’re doing. [sent-10, score-0.547]
9 I’ve never done a full quantile regression, but I imagine that you have to take some care in setting up the distributional form. [sent-11, score-0.837]
10 To start, you could fit some multilevel logistic regressions using different quantiles as cut-off points and plot your inferences to see generally what’s going on. [sent-12, score-0.822]
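The last suggestion in the reply (dichotomize at different quantiles, then look at the group-level inferences) can be sketched roughly as follows. This is a minimal illustration, not the method from the post: the data are simulated, the function names and the `prior_strength` value are hypothetical, and a simple beta-binomial-style shrinkage toward the pooled rate stands in for a real multilevel logistic regression, which in practice one would fit in Bugs or with something like `glmer` from R's lme4 package.

```python
import random
import statistics


def simulate_stays(n_jails=6, seed=1):
    """Hypothetical data: days in jail for the inmates of several jails."""
    rng = random.Random(seed)
    stays = {}
    for jail in range(n_jails):
        mean_days = rng.uniform(3.0, 15.0)   # made-up jail-specific mean stay
        n_inmates = rng.randint(5, 60)       # deliberately unequal group sizes
        stays[jail] = [rng.expovariate(1.0 / mean_days) for _ in range(n_inmates)]
    return stays


def pooled_cutoffs(stays, probs=(0.50, 0.75)):
    """Cut-off points taken as pooled sample quantiles of days in jail."""
    all_days = [d for days in stays.values() for d in days]
    cuts = statistics.quantiles(all_days, n=100, method="inclusive")
    # cuts[i - 1] is the i-th percentile
    return {p: cuts[round(p * 100) - 1] for p in probs}


def partial_pool(stays, cutoff, prior_strength=10.0):
    """Per-jail estimate of Pr(released within `cutoff` days).

    Each jail's raw proportion is shrunk toward the pooled proportion; this is
    a crude stand-in for the partial pooling a multilevel logistic model gives.
    `prior_strength` (pseudo-observations) is an arbitrary assumption.
    """
    n_total = sum(len(days) for days in stays.values())
    k_total = sum(1 for days in stays.values() for d in days if d <= cutoff)
    pooled = k_total / n_total
    estimates = {}
    for jail, days in stays.items():
        k = sum(1 for d in days if d <= cutoff)
        estimates[jail] = (k + prior_strength * pooled) / (len(days) + prior_strength)
    return pooled, estimates


stays = simulate_stays()
cutoffs = pooled_cutoffs(stays)
for prob, cutoff in cutoffs.items():
    pooled, estimates = partial_pool(stays, cutoff)
    print(f"cut-off at pooled {prob:.0%} quantile: {cutoff:.1f} days (pooled rate {pooled:.2f})")
    for jail, p_hat in estimates.items():
        raw = sum(d <= cutoff for d in stays[jail]) / len(stays[jail])
        print(f"  jail {jail}: n={len(stays[jail]):2d}  raw={raw:.2f}  shrunk={p_hat:.2f}")
```

Jails with few inmates are pulled more strongly toward the pooled rate, which is the qualitative behavior one would expect from the multilevel fit. Plotting the per-jail estimates against the cut-off would be the natural next step, as the reply suggests.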
wordName wordTfidf (topN-words)
[('quantile', 0.501), ('lends', 0.301), ('multilevel', 0.278), ('release', 0.183), ('counseling', 0.167), ('emory', 0.167), ('epidemiologist', 0.167), ('regression', 0.16), ('hiv', 0.151), ('jail', 0.138), ('quantiles', 0.138), ('distributional', 0.132), ('designing', 0.12), ('interventions', 0.118), ('ordered', 0.117), ('ryan', 0.115), ('start', 0.111), ('logit', 0.106), ('purposes', 0.106), ('yield', 0.1), ('arm', 0.098), ('refer', 0.095), ('drug', 0.094), ('regressions', 0.093), ('re', 0.09), ('obviously', 0.088), ('sound', 0.086), ('logistic', 0.086), ('structure', 0.082), ('putting', 0.081), ('inferences', 0.081), ('plot', 0.08), ('individuals', 0.08), ('basically', 0.078), ('testing', 0.076), ('patterns', 0.076), ('search', 0.075), ('function', 0.073), ('setting', 0.073), ('days', 0.072), ('program', 0.072), ('takes', 0.071), ('project', 0.071), ('etc', 0.071), ('health', 0.069), ('quick', 0.068), ('different', 0.066), ('imagine', 0.066), ('care', 0.065), ('hope', 0.065)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 397 andrew gelman stats-2010-11-06-Multilevel quantile regression
2 0.19825132 1807 andrew gelman stats-2013-04-17-Data problems, coding errors…what can be done?
Introduction: This post is by Phil. A recent post on this blog discusses a prominent case of an Excel error leading to substantially wrong results from a statistical analysis. Excel is notorious for this because it is easy to add a row or column of data (or intermediate results) but forget to update equations so that they correctly use the new data. That particular error is less common in a language like R because R programmers usually refer to data by variable name (or by applying functions to a named variable), so the same code works even if you add or remove data. Still, there is plenty of opportunity for errors no matter what language one uses. Andrew ran into problems fairly recently, and also blogged about another instance. I’ve never had to retract a paper, but that’s partly because I haven’t published a whole lot of papers. Certainly I have found plenty of substantial errors pretty late in some of my data analyses, and I obviously don’t have sufficient mechanisms in place to be sure
3 0.1630263 295 andrew gelman stats-2010-09-25-Clusters with very small numbers of observations
Introduction: James O’Brien writes: How would you explain, to a “classically-trained” hypothesis-tester, that “It’s OK to fit a multilevel model even if some groups have only one observation each”? I [O'Brien] think I understand the logic and the statistical principles at work in this, but I’m having trouble being clear and persuasive. I also feel like I’m contending with some methodological conventional wisdom here. My reply: I’m so used to this idea that I find it difficult to defend it in some sort of general conceptual way. So let me retreat to a more functional defense, which is that multilevel modeling gives good estimates, especially when the number of observations per group is small. One way to see this in any particular example is through cross-validation. Another way is to consider the alternatives. If you try really hard you can come up with a “classical hypothesis testing” approach which will do as well as the multilevel model. It would just take a lot of work. I’d r
4 0.14158684 25 andrew gelman stats-2010-05-10-Two great tastes that taste great together
Introduction: Vlad Kogan writes: I’ve been using your book on regression and multilevel modeling and have a quick R question for you. Do you happen to know if there is any R package that can estimate a two-stage (instrumental variable) multi-level model? My reply: I don’t know. I’ll post on the blog and maybe there will be a response. You could also try the R help list.
Introduction: A sociologist writes in: Samuel Lucas has just published a paper in Quality and Quantity arguing that anything less than a full probability sample of higher levels in HLMs yields biased and unusable results. If I follow him correctly, he is arguing that not only are the SEs too small, but the parameter estimates themselves are biased and we cannot say in advance whether the bias is positive or negative. Lucas has thrown down a big gauntlet, advising us to throw away our data unless the sample of macro units is right and ignore the published results that fail this standard. Extreme. Is there another conclusion to be drawn? Other advice to be given? A Bayesian path out of the valley? Here’s the abstract to Lucas’s paper: The multilevel model has become a staple of social research. I textually and formally explicate sample design features that, I contend, are required for unbiased estimation of macro-level multilevel model parameters and the use of tools for statistical infe
6 0.13077308 684 andrew gelman stats-2011-04-28-Hierarchical ordered logit or probit
7 0.12158617 383 andrew gelman stats-2010-10-31-Analyzing the entire population rather than a sample
8 0.12107798 782 andrew gelman stats-2011-06-29-Putting together multinomial discrete regressions by combining simple logits
9 0.12030633 772 andrew gelman stats-2011-06-17-Graphical tools for understanding multilevel models
10 0.11988106 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?
11 0.11851668 2294 andrew gelman stats-2014-04-17-If you get to the point of asking, just do it. But some difficulties do arise . . .
12 0.11635977 1445 andrew gelman stats-2012-08-06-Slow progress
13 0.11414736 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?
14 0.11297067 1900 andrew gelman stats-2013-06-15-Exploratory multilevel analysis when group-level variables are of importance
15 0.10843283 77 andrew gelman stats-2010-06-09-Sof[t]
16 0.10802684 653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects
17 0.10653953 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?
18 0.10572682 2124 andrew gelman stats-2013-12-05-Stan (quietly) passes 512 people on the users list
19 0.10461494 704 andrew gelman stats-2011-05-10-Multiple imputation and multilevel analysis
20 0.10456684 2033 andrew gelman stats-2013-09-23-More on Bayesian methods and multilevel modeling
topicId topicWeight
[(0, 0.15), (1, 0.06), (2, 0.039), (3, 0.001), (4, 0.12), (5, 0.046), (6, -0.008), (7, -0.075), (8, 0.056), (9, 0.094), (10, 0.013), (11, -0.001), (12, 0.024), (13, -0.006), (14, 0.03), (15, 0.031), (16, -0.04), (17, -0.026), (18, 0.003), (19, 0.018), (20, 0.021), (21, 0.025), (22, 0.001), (23, -0.005), (24, -0.052), (25, -0.112), (26, 0.008), (27, -0.062), (28, -0.074), (29, 0.006), (30, 0.008), (31, 0.025), (32, -0.001), (33, -0.009), (34, -0.032), (35, -0.072), (36, 0.0), (37, 0.038), (38, -0.0), (39, 0.013), (40, 0.019), (41, 0.025), (42, -0.015), (43, -0.093), (44, 0.041), (45, 0.025), (46, 0.008), (47, 0.014), (48, -0.084), (49, 0.01)]
simIndex simValue blogId blogTitle
same-blog 1 0.97309107 397 andrew gelman stats-2010-11-06-Multilevel quantile regression
2 0.82849866 1814 andrew gelman stats-2013-04-20-A mess with which I am comfortable
Introduction: Having established that survey weighting is a mess, I should also acknowledge that, by this standard, regression modeling is also a mess, involving many arbitrary choices of variable selection, transformations and modeling of interaction. Nonetheless, regression modeling is a mess with which I am comfortable and, perhaps more relevant to the discussion, can be extended using multilevel models to get inference for small cross-classifications or small areas. We’re working on it.
3 0.77975112 25 andrew gelman stats-2010-05-10-Two great tastes that taste great together
Introduction: Vlad Kogan writes: I’ve been using your book on regression and multilevel modeling and have a quick R question for you. Do you happen to know if there is any R package that can estimate a two-stage (instrumental variable) multi-level model? My reply: I don’t know. I’ll post on the blog and maybe there will be a response. You could also try the R help list.
Introduction: Steve Miller writes: Much of what I do is cross-national analyses of survey data (largely World Values Survey). . . . My big question pertains to (what I would call) exploratory analysis of multilevel data, especially when the group-level predictors are of theoretical importance. A lot of what I do involves analyzing cross-national survey items of citizen attitudes, typically of political leadership. These survey items are usually yes/no responses, or four-part responses indicating a level of agreement (strongly agree, agree, disagree, strongly disagree) that can be condensed into a binary variable. I believe these can be explained by reference to country-level factors. Much of the group-level variables of interest are count variables with a modal value of 0, which can be quite messy. How would you recommend exploring the variation in the dependent variable as it could be explained by the group-level count variable of interest, before fitting the multilevel model itself? When
5 0.74297076 295 andrew gelman stats-2010-09-25-Clusters with very small numbers of observations
Introduction: James O’Brien writes: How would you explain, to a “classically-trained” hypothesis-tester, that “It’s OK to fit a multilevel model even if some groups have only one observation each”? I [O'Brien] think I understand the logic and the statistical principles at work in this, but I’m having trouble being clear and persuasive. I also feel like I’m contending with some methodological conventional wisdom here. My reply: I’m so used to this idea that I find it difficult to defend it in some sort of general conceptual way. So let me retreat to a more functional defense, which is that multilevel modeling gives good estimates, especially when the number of observations per group is small. One way to see this in any particular example is through cross-validation. Another way is to consider the alternatives. If you try really hard you can come up with a “classical hypothesis testing” approach which will do as well as the multilevel model. It would just take a lot of work. I’d r
6 0.73610753 704 andrew gelman stats-2011-05-10-Multiple imputation and multilevel analysis
7 0.73247385 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?
8 0.7202661 772 andrew gelman stats-2011-06-17-Graphical tools for understanding multilevel models
9 0.71509618 2357 andrew gelman stats-2014-06-02-Why we hate stepwise regression
10 0.70325494 2294 andrew gelman stats-2014-04-17-If you get to the point of asking, just do it. But some difficulties do arise . . .
11 0.70225161 2296 andrew gelman stats-2014-04-19-Index or indicator variables
12 0.69391865 1445 andrew gelman stats-2012-08-06-Slow progress
13 0.69060862 753 andrew gelman stats-2011-06-09-Allowing interaction terms to vary
14 0.68472272 726 andrew gelman stats-2011-05-22-Handling multiple versions of an outcome variable
17 0.67889833 417 andrew gelman stats-2010-11-17-Clutering and variance components
18 0.67491835 684 andrew gelman stats-2011-04-28-Hierarchical ordered logit or probit
19 0.67312282 770 andrew gelman stats-2011-06-15-Still more Mr. P in public health
20 0.67046595 948 andrew gelman stats-2011-10-10-Combining data from many sources
topicId topicWeight
[(4, 0.015), (7, 0.021), (15, 0.027), (16, 0.054), (17, 0.165), (21, 0.03), (24, 0.128), (31, 0.013), (41, 0.023), (55, 0.011), (59, 0.018), (61, 0.018), (63, 0.014), (77, 0.025), (83, 0.015), (86, 0.011), (99, 0.312)]
simIndex simValue blogId blogTitle
1 0.96838677 2314 andrew gelman stats-2014-05-01-Heller, Heller, and Gorfine on univariate and multivariate information measures
Introduction: Malka Gorfine writes: We noticed that the important topic of association measures and tests came up again in your blog, and we have a few comments in this regard. It is useful to distinguish between the univariate and multivariate methods. A consistent multivariate method can recognise dependence between two vectors of random variables, while a univariate method can only loop over pairs of components and check for dependency between them. There are very few consistent multivariate methods. To the best of our knowledge there are three practical methods: 1) HSIC by Gretton et al. (http://www.gatsby.ucl.ac.uk/~gretton/papers/GreBouSmoSch05.pdf) 2) dcov by Szekely et al. (http://projecteuclid.org/euclid.aoas/1267453933) 3) A method we introduced in Heller et al. (Biometrika, 2013, 503–510, http://biomet.oxfordjournals.org/content/early/2012/12/04/biomet.ass070.full.pdf+html, and an R package, HHG, is available as well http://cran.r-project.org/web/packages/HHG/index.html). A
2 0.96466839 309 andrew gelman stats-2010-10-01-Why Development Economics Needs Theory?
Introduction: Robert Neumann writes: In the JEP 24(3), page 18, Daron Acemoglu states: Why Development Economics Needs Theory There is no general agreement on how much we should rely on economic theory in motivating empirical work and whether we should try to formulate and estimate “structural parameters.” I (Acemoglu) argue that the answer is largely “yes” because otherwise econometric estimates would lack external validity, in which case they can neither inform us about whether a particular model or theory is a useful approximation to reality, nor would they be useful in providing us guidance on what the effects of similar shocks and policies would be in different circumstances or if implemented in different scales. I therefore define “structural parameters” as those that provide external validity and would thus be useful in testing theories or in policy analysis beyond the specific environment and sample from which they are derived. External validity becomes a particularly challenging t
3 0.96212626 1230 andrew gelman stats-2012-03-26-Further thoughts on nonparametric correlation measures
Introduction: Malka Gorfine, Ruth Heller, and Yair Heller write a comment on the paper of Reshef et al. that we discussed a few months ago. Just to remind you what’s going on here, here’s my quick summary from December: Reshef et al. propose a new nonlinear R-squared-like measure. Unlike R-squared, this new method depends on a tuning parameter that controls the level of discretization, in a “How long is the coast of Britain” sort of way. The dependence on scale is inevitable for such a general method. Just consider: if you sample 1000 points from the unit bivariate normal distribution, (x,y) ~ N(0,I), you’ll be able to fit them perfectly by a 999-degree polynomial fit to the data. So the scale of the fit matters. The clever idea of the paper is that, instead of going for an absolute measure (which, as we’ve seen, will be scale-dependent), they focus on the problem of summarizing the grid of pairwise dependences in a large set of variables. As they put it: “Imagine a data set with hundreds
4 0.95381689 1616 andrew gelman stats-2012-12-10-John McAfee is a Heinlein hero
Introduction: “A small group of mathematicians” Jenny Davidson points to this article by Krugman on Asimov’s Foundation Trilogy. Given the silliness of the topic, Krugman’s piece is disappointingly serious (“Maybe the first thing to say about Foundation is that it’s not exactly science fiction – not really. Yes, it’s set in the future, there’s interstellar travel, people shoot each other with blasters instead of pistols and so on. But these are superficial details . . . the story can sound arid and didactic. . . . you’ll also be disappointed if you’re looking for shoot-em-up action scenes, in which Han Solo and Luke Skywalker destroy the Death Star in the nick of time. . . .”). What really jumped out at me from Krugman’s piece, though, was this line: In Foundation, we learn that a small group of mathematicians have developed “psychohistory”, the aforementioned rigorous science of society. Like Davidson (and Krugman), I read the Foundation books as a child. I remember the “psychohisto
5 0.95249617 1136 andrew gelman stats-2012-01-23-Fight! (also a bit of reminiscence at the end)
Introduction: Martin Lindquist and Michael Sobel published a fun little article in Neuroimage on models and assumptions for causal inference with intermediate outcomes. As their subtitle indicates (“A response to the comments on our comment”), this is a topic of some controversy. Lindquist and Sobel write: Our original comment (Lindquist and Sobel, 2011) made explicit the types of assumptions neuroimaging researchers are making when directed graphical models (DGMs), which include certain types of structural equation models (SEMs), are used to estimate causal effects. When these assumptions, which many researchers are not aware of, are not met, parameters of these models should not be interpreted as effects. . . . [Judea] Pearl does not disagree with anything we stated. However, he takes exception to our use of potential outcomes notation, which is the standard notation used in the statistical literature on causal inference, and his comment is devoted to promoting his alternative conventions. [C
6 0.95180178 1557 andrew gelman stats-2012-11-01-‘Researcher Degrees of Freedom’
7 0.94884574 705 andrew gelman stats-2011-05-10-Some interesting unpublished ideas on survey weighting
same-blog 8 0.94484377 397 andrew gelman stats-2010-11-06-Multilevel quantile regression
9 0.94085145 1362 andrew gelman stats-2012-06-03-Question 24 of my final exam for Design and Analysis of Sample Surveys
10 0.93613791 1076 andrew gelman stats-2011-12-21-Derman, Rodrik and the nature of statistical models
11 0.93248373 2324 andrew gelman stats-2014-05-07-Once more on nonparametric measures of mutual information
12 0.91891485 2359 andrew gelman stats-2014-06-04-All the Assumptions That Are My Life
13 0.91719013 1591 andrew gelman stats-2012-11-26-Politics as an escape hatch
14 0.91649723 1383 andrew gelman stats-2012-06-18-Hierarchical modeling as a framework for extrapolation
15 0.91545355 1422 andrew gelman stats-2012-07-20-Likelihood thresholds and decisions
16 0.91076851 1272 andrew gelman stats-2012-04-20-More proposals to reform the peer-review system
17 0.90867233 1467 andrew gelman stats-2012-08-23-The pinch-hitter syndrome again
18 0.90843523 2136 andrew gelman stats-2013-12-16-Whither the “bet on sparsity principle” in a nonsparse world?
19 0.90700269 1228 andrew gelman stats-2012-03-25-Continuous variables in Bayesian networks
20 0.90653908 216 andrew gelman stats-2010-08-18-More forecasting competitions