andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-772 knowledge-graph by maker-knowledge-mining

772 andrew gelman stats-2011-06-17-Graphical tools for understanding multilevel models


meta info for this blog

Source: html

Introduction: There are a few things I want to do: 1. Understand a fitted model using tools such as average predictive comparisons, R-squared, and partial pooling factors. In defining these concepts, Iain and I came up with some clever tricks, including (but not limited to): - Separating the inputs and averaging over all possible values of the input not being altered (for average predictive comparisons); - Defining partial pooling without referring to a raw-data or maximum-likelihood or no-pooling estimate (these don’t necessarily exist when you’re fitting logistic regression with sparse data); - Defining an R-squared for each level of a multilevel model. The methods get pretty complicated, though, and they have some loose ends–in particular, for average predictive comparisons with continuous input variables. So now we want to implement these in R and put them into arm along with bglmer etc. 2. Setting up coefplot so it works more generally (that is, so the graphics look nice
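The average-predictive-comparison idea above can be sketched in a few lines: pick one input, move it between two values, and average the change in prediction over the observed values of the inputs held fixed. A minimal illustration (the toy `predict` function and its coefficients are made up for this sketch, not the arm implementation):

```python
# Sketch of an average predictive comparison for one input u:
# average, over observed values of the other input v, the change
# in prediction when u moves from lo to hi. The logistic model
# here is a toy stand-in for any fitted model's predict function.
import math

def predict(u, v):
    """Toy fitted model: inverse-logit of a linear predictor."""
    eta = -1.0 + 0.8 * u + 0.5 * v
    return 1.0 / (1.0 + math.exp(-eta))

def avg_predictive_comparison(predict, v_values, lo, hi):
    """Average change in prediction as u goes lo -> hi,
    averaging over the observed values of the other input v."""
    diffs = [predict(hi, v) - predict(lo, v) for v in v_values]
    return sum(diffs) / len(diffs)

v_obs = [-1.0, 0.0, 0.5, 1.2, 2.0]  # observed values of the input held fixed
apc = avg_predictive_comparison(predict, v_obs, lo=0.0, hi=1.0)
print(round(apc, 3))
```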


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Understand a fitted model using tools such as average predictive comparisons, R-squared, and partial pooling factors. [sent-2, score-1.385]

2 The methods get pretty complicated, though, and they have some loose ends–in particular, for average predictive comparisons with continuous input variables. [sent-4, score-0.787]

3 So now we want to implement these in R and put them into arm along with bglmer etc. [sent-5, score-0.161]

4 Setting up coefplot so it works more generally (that is, so the graphics look nice for models with one predictor, two predictors, or twenty predictors). [sent-7, score-0.345]

5 Also a bunch of expansions to coefplot: - Defining coefplot for multilevel models - Also displaying average predictive comparisons for nonlinear models - Setting it up to automatically display several regressions in a large “table” 3. [sent-8, score-1.492]

6 Automatic plots showing data and fitted regression lines/curves. [sent-9, score-0.39]

7 With multiple inputs, you hold all the inputs but one to fixed values–it’s sort of like an average predictive comparison, but graphical. [sent-10, score-0.624]

8 We also have to handle interactions and multilevel models. [sent-11, score-0.256]

9 Generalizing R-squared and partial pooling factors for multivariate (varying-intercept, varying-slope) models. [sent-13, score-0.471]

10 Graphs showing what happens as you add a multilevel component to a model. [sent-15, score-0.417]

11 This is something I’ve been thinking about for awhile, ever since doing the police stop and frisk model with Jeff Fagan and Alex Kiss. [sent-16, score-0.169]

12 I wanted a graph that showed how the key estimates were changing when we went multilevel, and what in the data was making the change. [sent-17, score-0.071]

13 We’re always giving these data-and-model stories of why when we control for variable X, our estimate changes on variable Y. [sent-22, score-0.232]

14 Or why our multilevel estimates are a compromise between something and something else. [sent-23, score-0.404]

15 What I’d like to do is to formalize and automate these explanations. [sent-24, score-0.268]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('coefplot', 0.268), ('defining', 0.264), ('multilevel', 0.256), ('predictive', 0.234), ('fitted', 0.23), ('inputs', 0.215), ('pooling', 0.2), ('automate', 0.185), ('partial', 0.178), ('average', 0.175), ('comparisons', 0.174), ('automated', 0.162), ('formulated', 0.159), ('input', 0.126), ('iain', 0.103), ('model', 0.101), ('predictors', 0.097), ('bglmer', 0.097), ('expansions', 0.097), ('fagan', 0.097), ('showing', 0.094), ('factors', 0.093), ('setting', 0.09), ('separating', 0.087), ('altered', 0.087), ('values', 0.085), ('variable', 0.084), ('formalize', 0.083), ('loose', 0.078), ('formulate', 0.078), ('generalizing', 0.078), ('models', 0.077), ('compromise', 0.077), ('understanding', 0.074), ('tricks', 0.073), ('sparse', 0.071), ('estimates', 0.071), ('automatic', 0.07), ('police', 0.068), ('displaying', 0.067), ('component', 0.067), ('nonlinear', 0.067), ('alternatives', 0.066), ('regression', 0.066), ('alex', 0.066), ('estimate', 0.064), ('averaging', 0.064), ('concepts', 0.064), ('implement', 0.064), ('clever', 0.063)]
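The word weights above, and the sentence scores in the summary, come from a tfidf model: rare words get high weights, ubiquitous words get low ones, and a sentence scores by the weights of its words. A minimal pure-Python sketch of the idea (the documents and scoring here are illustrative, not the pipeline that produced these exact numbers):

```python
# Minimal tfidf sketch: weight each word by term frequency times
# log inverse document frequency, then score a sentence by summing
# its word weights, as in the extractive summary above.
import math
from collections import Counter

docs = [
    "average predictive comparisons for multilevel models",
    "partial pooling factors for multilevel models",
    "graphs showing fitted regression lines",
]

def tfidf_weights(docs):
    n = len(docs)
    df = Counter()  # in how many documents each word appears
    for d in docs:
        df.update(set(d.split()))
    weights = []
    for d in docs:
        tf = Counter(d.split())
        total = sum(tf.values())
        weights.append({w: (c / total) * math.log(n / df[w])
                        for w, c in tf.items()})
    return weights

def score(sentence_weights):
    return sum(sentence_weights.values())

w = tfidf_weights(docs)
ranked = sorted(range(len(docs)), key=lambda i: score(w[i]), reverse=True)
print(ranked)
```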

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 772 andrew gelman stats-2011-06-17-Graphical tools for understanding multilevel models


2 0.18147771 324 andrew gelman stats-2010-10-07-Contest for developing an R package recommendation system

Introduction: After I spoke tonight at the NYC R meetup, John Myles White and Drew Conway told me about this competition they’re administering for developing a recommendation system for R packages. They seem to have already done some work laying out the network of R packages–which packages refer to which others, and so forth. I just hope they set up their system so that my own packages (“R2WinBUGS”, “r2jags”, “arm”, and “mi”) get recommended automatically. I really hate to think that there are people out there running regressions in R and not using display() and coefplot() to look at the output. P.S. Ajay Shah asks what I mean by that last sentence. My quick answer is that it’s good to be able to visualize the coefficients and the uncertainty about them. The default options of print(), summary(), and plot() in R don’t do that: - print() doesn’t give enough information - summary() gives everything to a zillion decimal places and gives useless things like p-values - plot() gives a bunch

3 0.17875426 2117 andrew gelman stats-2013-11-29-The gradual transition to replicable science

Introduction: Somebody emailed me: I am a researcher at ** University and I have recently read your article on average predictive comparisons for statistical models published 2007 in the journal “Sociological Methodology”. Gelman, Andrew/Iain Pardoe. 2007. “Average Predictive Comparisons for Models with Nonlinearity, Interactions, and Variance Components”. Sociological Methodology 37: 23-51. Currently I am working with multilevel models and find your approach very interesting and useful. May I ask you whether replication materials (e.g. R Code) for this article are available? I had to reply: Hi—I’m embarrassed to say that our R files are a mess! I had ideas of programming the approach more generally as an R package but this has not happened yet.

4 0.16817093 255 andrew gelman stats-2010-09-04-How does multilevel modeling affect the estimate of the grand mean?

Introduction: Subhadeep Mukhopadhyay writes: I am convinced of the power of hierarchical modeling and individual parameter pooling concept. I was wondering how multi-level modeling could influence the estimate of the grand mean (NOT individual label). My reply: Multilevel modeling will affect the estimate of the grand mean in two ways: 1. If the group-level mean is correlated with group size, then the partial pooling will change the estimate of the grand mean (and, indeed, you might want to include group size or some similar variable as a group-level predictor). 2. In any case, the extra error term(s) in a multilevel model will typically affect the standard error of everything, including the estimate of the grand mean.
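The reply above rests on partial pooling: each group estimate is a precision-weighted compromise between the group mean and the grand mean, with small groups pulled in more. A sketch of the textbook shrinkage formula (the numbers are made up for illustration):

```python
# Partial pooling sketch: each group mean is shrunk toward the
# grand mean, with smaller groups shrunk more. Textbook
# precision-weighted formula, shown with made-up values.
def pooled_estimate(ybar_j, n_j, mu, sigma_y2, sigma_a2):
    """Multilevel (partial-pooling) estimate for one group:
    precision-weighted average of the group mean and grand mean."""
    w_data = n_j / sigma_y2    # precision of the group mean
    w_prior = 1.0 / sigma_a2   # precision contributed by the grand mean
    return (w_data * ybar_j + w_prior * mu) / (w_data + w_prior)

mu = 0.0        # grand mean
sigma_y2 = 1.0  # within-group variance
sigma_a2 = 0.25 # between-group variance

# A small group is pulled toward mu more than a large one:
small = pooled_estimate(ybar_j=1.0, n_j=2,   mu=mu, sigma_y2=sigma_y2, sigma_a2=sigma_a2)
large = pooled_estimate(ybar_j=1.0, n_j=200, mu=mu, sigma_y2=sigma_y2, sigma_a2=sigma_a2)
print(small, large)
```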

5 0.15568332 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?

Introduction: Yi-Chun Ou writes: I am using a multilevel model with three levels. I read that you wrote a book about multilevel models, and wonder if you can solve the following question. The data structure is like this: Level one: customer (8444 customers) Level two: companies (90 companies) Level three: industry (17 industries) I use 6 level-three variables (i.e. industry characteristics) to explain the variance of the level-one effect across industries. The question here is whether there is an over-fitting problem since there are only 17 industries. I understand that this must be a problem for non-multilevel models, but is it also a problem for multilevel models? My reply: Yes, this could be a problem. I’d suggest combining some of your variables into a common score, or using only some of the variables, or using strong priors to control the inferences. This is an interesting and important area of statistics research, to do this sort of thing systematically. There’s lots o

6 0.15105774 653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects

7 0.15033802 1346 andrew gelman stats-2012-05-27-Average predictive comparisons when changing a pair of variables

8 0.14942257 1506 andrew gelman stats-2012-09-21-Building a regression model . . . with only 27 data points

9 0.14831406 2294 andrew gelman stats-2014-04-17-If you get to the point of asking, just do it. But some difficulties do arise . . .

10 0.14230445 1934 andrew gelman stats-2013-07-11-Yes, worry about generalizing from data to population. But multilevel modeling is the solution, not the problem

11 0.13878308 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?

12 0.13857521 1989 andrew gelman stats-2013-08-20-Correcting for multiple comparisons in a Bayesian regression model

13 0.13813356 295 andrew gelman stats-2010-09-25-Clusters with very small numbers of observations

14 0.1380938 1363 andrew gelman stats-2012-06-03-Question about predictive checks

15 0.13679206 383 andrew gelman stats-2010-10-31-Analyzing the entire population rather than a sample

16 0.1334568 524 andrew gelman stats-2011-01-19-Data exploration and multiple comparisons

17 0.13313554 1656 andrew gelman stats-2013-01-05-Understanding regression models and regression coefficients

18 0.13291049 1900 andrew gelman stats-2013-06-15-Exploratory multilevel analysis when group-level variables are of importance

19 0.1318993 1216 andrew gelman stats-2012-03-17-Modeling group-level predictors in a multilevel regression

20 0.13165145 1763 andrew gelman stats-2013-03-14-Everyone’s trading bias for variance at some point, it’s just done at different places in the analyses


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.19), (1, 0.147), (2, 0.058), (3, 0.023), (4, 0.146), (5, -0.022), (6, -0.061), (7, -0.048), (8, 0.096), (9, 0.116), (10, 0.037), (11, 0.038), (12, -0.04), (13, 0.013), (14, 0.022), (15, 0.0), (16, -0.016), (17, -0.04), (18, -0.006), (19, 0.009), (20, 0.0), (21, 0.008), (22, 0.011), (23, -0.014), (24, -0.029), (25, -0.086), (26, -0.034), (27, -0.022), (28, -0.035), (29, -0.016), (30, 0.006), (31, 0.027), (32, 0.033), (33, -0.019), (34, 0.03), (35, 0.001), (36, 0.051), (37, 0.012), (38, 0.017), (39, -0.029), (40, 0.007), (41, 0.041), (42, -0.012), (43, -0.05), (44, -0.012), (45, -0.018), (46, -0.028), (47, 0.066), (48, -0.028), (49, -0.064)]
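The simValue column in these lists is presumably a similarity between documents in the reduced lsi topic space; the record does not say which measure was used, so cosine similarity is assumed in this sketch, with truncated topic weights as illustrative data:

```python
# Cosine similarity between dense topic-weight vectors, the usual
# way lsi similarity queries are scored. The vectors here are
# truncated/made-up topic weights, for illustration only.
import math

def cosine(u, v):
    """Cosine similarity of two dense topic-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

doc_a = [0.19, 0.147, 0.058, 0.023]
doc_b = [0.21, 0.150, 0.050, 0.020]
print(round(cosine(doc_a, doc_b), 4))

# A document compared with itself scores ~1, like the same-blog
# entry at the top of each similarity list in this record.
assert abs(cosine(doc_a, doc_a) - 1.0) < 1e-9
```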

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97430193 772 andrew gelman stats-2011-06-17-Graphical tools for understanding multilevel models


2 0.81199354 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?


3 0.81007236 704 andrew gelman stats-2011-05-10-Multiple imputation and multilevel analysis

Introduction: Robert Birkelbach: I am writing my Bachelor Thesis in which I want to assess the reading competencies of German elementary school children using the PIRLS2006 data. My levels are classrooms and the individuals. However, my dependent variable is a multiple imputed (m=5) reading test. The problem I have is, that I do not know, whether I can just calculate 5 linear multilevel models and then average all the results (the coefficients, standard deviation, bic, intra class correlation, R2, t-statistics, p-values etc) or if I need different formulas for integrating the results of the five models into one because it is a multilevel analysis? Do you think there’s a better way in solving my problem? I would greatly appreciate if you could help me with a problem regarding my analysis — I am quite a newbie to multilevel modeling and especially to multiple imputation. Also: Is it okay to use frequentist models when the multiple imputation was done bayesian? Would the different philosophies of sc
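The question above, whether one can just average results across the m=5 imputed datasets, is what Rubin's combining rules address: the point estimates are indeed averaged, but the total variance must add the between-imputation spread (with a 1 + 1/m correction) to the average within-imputation variance. A sketch of the standard formulas (the numbers are made up, not from the post):

```python
# Rubin's rules for combining an estimate across m imputed datasets:
# pooled estimate = mean of the m estimates; total variance =
# within-imputation variance + (1 + 1/m) * between-imputation
# variance. Standard formulas, illustrated with made-up numbers.
import math

estimates = [0.52, 0.48, 0.55, 0.50, 0.47]       # coefficient in each imputed fit
variances = [0.010, 0.012, 0.011, 0.009, 0.010]  # its squared s.e. in each fit

m = len(estimates)
qbar = sum(estimates) / m                        # pooled point estimate
w = sum(variances) / m                           # within-imputation variance
b = sum((q - qbar) ** 2 for q in estimates) / (m - 1)  # between-imputation
total_var = w + (1 + 1 / m) * b
se = math.sqrt(total_var)
print(round(qbar, 3), round(se, 3))
```

Note that simply averaging the five standard errors would understate the uncertainty, since it ignores the between-imputation term.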

4 0.80403829 948 andrew gelman stats-2011-10-10-Combining data from many sources

Introduction: Mark Grote writes: I’d like to request general feedback and references for a problem of combining disparate data sources in a regression model. We’d like to model log crop yield as a function of environmental predictors, but the observations come from many data sources and are peculiarly structured. Among the issues are: 1. Measurement precision in predictors and outcome varies widely with data sources. Some observations are in very coarse units of measurement, due to rounding or even observer guesswork. 2. There are obvious clusters of observations arising from studies in which crop yields were monitored over successive years in spatially proximate communities. Thus some variables may be constant within clusters–this is true even for log yield, probably due to rounding of similar yields. 3. Cluster size and intra-cluster association structure (temporal, spatial or both) vary widely across the dataset. My [Grote's] intuition is that we can learn about central tendency

5 0.79347533 1468 andrew gelman stats-2012-08-24-Multilevel modeling and instrumental variables

Introduction: Terence Teo writes: I was wondering if multilevel models can be used as an alternative to 2SLS or IV models to deal with (i) endogeneity and (ii) selection problems. More concretely, I am trying to assess the impact of investment treaties on foreign investment. Aside from the fact that foreign investment is correlated over time, it may be the case that countries that already receive sufficient amounts of foreign investment need not sign treaties, and countries that sign treaties are those that need foreign investment in the first place. Countries thus “select” into treatment; treaty signing is non-random. As such, I argue that to properly estimate the impact of treaties on investment, we must model the determinants of treaty signing. I [Teo] am currently modeling this as two separate models: (1) regress predictors on likelihood of treaty signing, (2) regress treaty (with interactions, etc) on investment (I’ve thought of using propensity score matching for this part of the model)

6 0.77925813 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?

7 0.77341348 397 andrew gelman stats-2010-11-06-Multilevel quantile regression

8 0.76399201 1395 andrew gelman stats-2012-06-27-Cross-validation (What is it good for?)

9 0.74544352 1934 andrew gelman stats-2013-07-11-Yes, worry about generalizing from data to population. But multilevel modeling is the solution, not the problem

10 0.73844934 2294 andrew gelman stats-2014-04-17-If you get to the point of asking, just do it. But some difficulties do arise . . .

11 0.73797494 295 andrew gelman stats-2010-09-25-Clusters with very small numbers of observations

12 0.73374164 1094 andrew gelman stats-2011-12-31-Using factor analysis or principal components analysis or measurement-error models for biological measurements in archaeology?

13 0.72761297 1814 andrew gelman stats-2013-04-20-A mess with which I am comfortable

14 0.72459739 2033 andrew gelman stats-2013-09-23-More on Bayesian methods and multilevel modeling

15 0.72230148 726 andrew gelman stats-2011-05-22-Handling multiple versions of an outcome variable

16 0.71981484 653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects

17 0.71533966 346 andrew gelman stats-2010-10-16-Mandelbrot and Akaike: from taxonomy to smooth runways (pioneering work in fractals and self-similarity)

18 0.71502346 417 andrew gelman stats-2010-11-17-Clutering and variance components

19 0.7137844 1981 andrew gelman stats-2013-08-14-The robust beauty of improper linear models in decision making

20 0.71215767 851 andrew gelman stats-2011-08-12-year + (1|year)


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(16, 0.03), (21, 0.013), (24, 0.056), (84, 0.014), (86, 0.025), (99, 0.73)]
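The lda weights above are sparse (topicId, weight) pairs rather than a dense vector. A sketch of comparing two documents stored that way (cosine similarity is an assumption here; the record does not state which measure produced its simValue numbers):

```python
# Similarity over sparse {topicId: weight} maps, matching the
# sparse (topicId, topicWeight) representation printed above.
# Topics missing from a document contribute weight 0.
import math

def sparse_cosine(u, v):
    """Cosine similarity of two sparse {topicId: weight} maps."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv)

doc = dict([(16, 0.03), (21, 0.013), (24, 0.056), (84, 0.014), (86, 0.025), (99, 0.73)])
other = {24: 0.05, 99: 0.70, 3: 0.20}   # made-up comparison document
print(round(sparse_cosine(doc, other), 4))
assert abs(sparse_cosine(doc, doc) - 1.0) < 1e-9
```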

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99958992 772 andrew gelman stats-2011-06-17-Graphical tools for understanding multilevel models


2 0.99920326 1434 andrew gelman stats-2012-07-29-FindTheData.org

Introduction: I received the following (unsolicited) email: Hi Andrew, I work on the business development team of FindTheData.org, an unbiased comparison engine founded by Kevin O’Connor (founder and former CEO of DoubleClick) and backed by Kleiner Perkins with ~10M unique visitors per month. We are working with large online publishers including Golf Digest, Huffington Post, Under30CEO, and offer a variety of options to integrate our highly engaging content with your site.  I believe our un-biased and reliable data resources would be of interest to you and your readers. I’d like to set up a quick call to discuss similar partnership ideas with you and would greatly appreciate 10 minutes of your time. Please suggest a couple times that work best for you or let me know if you would like me to send some more information before you make time for a call. Looking forward to hearing from you, Jonny – JONNY KINTZELE Business Development, FindThe Data mobile: 619-307-097

3 0.99873108 521 andrew gelman stats-2011-01-17-“the Tea Party’s ire, directed at Democrats and Republicans alike”

Introduction: Mark Lilla recalls some recent Barack Obama quotes and then writes : If this is the way the president and his party think about human psychology, it’s little wonder they’ve taken such a beating. In the spirit of that old line, “That and $4.95 will get you a tall latte,” let me agree with Lilla and attribute the Democrats’ losses in 2010 to the following three factors: 1. A poor understanding of human psychology; 2. The Democrats holding unified control of the presidency and congress with a large majority in both houses (factors that are historically associated with big midterm losses); and 3. A terrible economy. I will let you, the readers, make your best guesses as to the relative importance of factors 1, 2, and 3 above. Don’t get me wrong: I think psychology is important, as is the history of ideas (the main subject of Lilla’s article), and I’d hope that Obama (and also his colleagues in both parties in congress) can become better acquainted with psychology, moti

4 0.99866766 1315 andrew gelman stats-2012-05-12-Question 2 of my final exam for Design and Analysis of Sample Surveys

Introduction: 2. Which of the following are useful goals in a pilot study? (Indicate all that apply.) (a) You can search for statistical significance, then from that decide what to look for in a confirmatory analysis of your full dataset. (b) You can see if you find statistical significance in a pre-chosen comparison of interest. (c) You can examine the direction (positive or negative, even if not statistically significant) of comparisons of interest. (d) With a small sample size, you cannot hope to learn anything conclusive, but you can get a crude estimate of effect size and standard deviation which will be useful in a power analysis to help you decide how large your full study needs to be. (e) You can talk with survey respondents and get a sense of how they perceived your questions. (f) You get a chance to learn about practical difficulties with sampling, nonresponse, and question wording. (g) You can check if your sample is approximately representative of your population. Soluti

5 0.99862963 1813 andrew gelman stats-2013-04-19-Grad students: Participate in an online survey on statistics education

Introduction: Joan Garfield, a leading researcher in statistics education, is conducting a survey of graduate students who teach or assist with the teaching of statistics. She writes: We want to invite them to take a short survey that will enable us to collect some baseline data that we may use in a grant proposal we are developing. The project would provide summer workshops and ongoing support for graduate students who will be teaching or assisting with teaching introductory statistics classes. If the grant is funded, we would invite up to 40 students from around the country who are entering graduate programs in statistics to participate in a three-year training and support program. The goal of this program is to help these students become expert and flexible teachers of statistics, and to support them as they move through their teaching experiences as graduate students. Here’s the the online survey . Garfield writes, “Your responses are completely voluntary and anonymous. Results w

6 0.99805802 589 andrew gelman stats-2011-02-24-On summarizing a noisy scatterplot with a single comparison of two points

7 0.99790537 1483 andrew gelman stats-2012-09-04-“Bestselling Author Caught Posting Positive Reviews of His Own Work on Amazon”

8 0.99774241 174 andrew gelman stats-2010-08-01-Literature and life

9 0.99772722 1288 andrew gelman stats-2012-04-29-Clueless Americans think they’ll never get sick

10 0.99761003 756 andrew gelman stats-2011-06-10-Christakis-Fowler update

11 0.99760514 726 andrew gelman stats-2011-05-22-Handling multiple versions of an outcome variable

12 0.99759388 180 andrew gelman stats-2010-08-03-Climate Change News

13 0.99755526 1431 andrew gelman stats-2012-07-27-Overfitting

14 0.99700034 23 andrew gelman stats-2010-05-09-Popper’s great, but don’t bother with his theory of probability

15 0.99687499 1425 andrew gelman stats-2012-07-23-Examples of the use of hierarchical modeling to generalize to new settings

16 0.99660897 740 andrew gelman stats-2011-06-01-The “cushy life” of a University of Illinois sociology professor

17 0.99646664 809 andrew gelman stats-2011-07-19-“One of the easiest ways to differentiate an economist from almost anyone else in society”

18 0.99616301 1952 andrew gelman stats-2013-07-23-Christakis response to my comment on his comments on social science (or just skip to the P.P.P.S. at the end)

19 0.99610806 638 andrew gelman stats-2011-03-30-More on the correlation between statistical and political ideology

20 0.99596423 860 andrew gelman stats-2011-08-18-Trolls!