andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-1815 knowledge-graph by maker-knowledge-mining
Introduction: David Williams writes: I am completing my doctoral dissertation dealing with modeling adverse birth outcomes. The models are complex with 9 risk factors, 5 area level variables and 4 individual level variables. I used hierarchical logistic regression (SAS glimmix) to analyze the data. I am now faced with reporting the results. Can you please recommend any references and/or examples that would suggest what results to report in what format? I have found no references and scant examples of reporting such results in tables. My reply: I think graphs are the way to go. I don’t have any immediate ideas beyond what’s in the book with Jennifer. I think this is an important area of research.
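One way to act on the "graphs are the way to go" suggestion is to show each coefficient as a point estimate with an interval on a shared axis, instead of listing the numbers in a table. The sketch below is only an illustration: the predictor names, estimates, and standard errors are invented, and the helper functions `odds_ratio_interval` and `text_interval_plot` are hypothetical names, not part of SAS or any other package. It assumes the model output is available as log-odds estimates with standard errors.

```python
import math

# Hypothetical log-odds estimates and standard errors. These numbers are
# made up for illustration and do not come from any fitted model.
estimates = {
    "risk_factor_a": (0.40, 0.10),
    "risk_factor_b": (-0.15, 0.08),
    "area_poverty": (0.25, 0.20),
}


def odds_ratio_interval(beta, se, z=1.96):
    """Return (odds ratio, lower, upper) for a log-odds estimate and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def text_interval_plot(estimates, lo=-1.0, hi=1.0, width=41):
    """Draw each estimate +/- 1.96 SE on a shared log-odds axis as text."""

    def col(x):
        # Map a log-odds value to a column index, clamped to the plot range.
        frac = (min(max(x, lo), hi) - lo) / (hi - lo)
        return round(frac * (width - 1))

    lines = []
    for name, (beta, se) in estimates.items():
        row = [" "] * width
        row[col(0.0)] = "|"  # reference line at log-odds 0 (odds ratio 1)
        for c in range(col(beta - 1.96 * se), col(beta + 1.96 * se) + 1):
            row[c] = "-"  # interval bar (may cover the reference line)
        row[col(beta)] = "o"  # point estimate
        lines.append(f"{name:<15}{''.join(row)}")
    return lines


for line in text_interval_plot(estimates):
    print(line)
for name, (beta, se) in estimates.items():
    ratio, lower, upper = odds_ratio_interval(beta, se)
    print(f"{name}: OR {ratio:.2f} (95% interval {lower:.2f} to {upper:.2f})")
```

Putting every predictor on the same axis makes it easy to see which intervals exclude the reference line. In a real report the same quantities would normally be drawn with a plotting library, with area-level and individual-level predictors grouped separately.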
sentIndex sentText sentNum sentScore
1 David Williams writes: I am completing my doctoral dissertation dealing with modeling adverse birth outcomes. [sent-1, score-1.355]
2 The models are complex with 9 risk factors, 5 area level variables and 4 individual level variables. [sent-2, score-1.093]
3 I used hierarchical logistic regression (SAS glimmix) to analyze the data. [sent-3, score-0.55]
4 Can you please recommend any references and/or examples that would suggest what results to report in what format? [sent-5, score-1.084]
5 I have found no references and scant examples of reporting such results in tables. [sent-6, score-0.943]
6 I don’t have any immediate ideas beyond what’s in the book with Jennifer. [sent-8, score-0.428]
wordName wordTfidf (topN-words)
[('references', 0.29), ('completing', 0.268), ('reporting', 0.249), ('area', 0.233), ('doctoral', 0.22), ('adverse', 0.215), ('dissertation', 0.215), ('williams', 0.207), ('sas', 0.203), ('faced', 0.194), ('examples', 0.183), ('birth', 0.179), ('format', 0.177), ('level', 0.174), ('immediate', 0.172), ('dealing', 0.165), ('results', 0.148), ('analyze', 0.148), ('logistic', 0.137), ('risk', 0.125), ('complex', 0.124), ('factors', 0.122), ('recommend', 0.116), ('hierarchical', 0.115), ('suggest', 0.113), ('please', 0.11), ('graphs', 0.099), ('beyond', 0.098), ('individual', 0.098), ('david', 0.098), ('variables', 0.098), ('report', 0.094), ('modeling', 0.093), ('ideas', 0.088), ('regression', 0.086), ('reply', 0.078), ('found', 0.073), ('book', 0.07), ('important', 0.068), ('models', 0.067), ('used', 0.064), ('think', 0.058), ('research', 0.053), ('way', 0.042), ('writes', 0.04), ('would', 0.03)]
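Word weights like those above, and document-to-document similarity scores, can be produced by a tf-idf weighting followed by cosine similarity. The sketch below is an assumption about the general technique, not a reconstruction of the pipeline that generated this page: the smoothed idf formula, the whitespace tokenizer, the toy documents, and the function names `tfidf` and `cosine` are all illustrative choices.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {word: weight} dict per document, L2-normalized."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Term count times a smoothed inverse document frequency (assumed form).
        raw = {
            w: c * (math.log((1 + n_docs) / (1 + df[w])) + 1)
            for w, c in counts.items()
        }
        norm = math.sqrt(sum(v * v for v in raw.values())) or 1.0
        vectors.append({w: v / norm for w, v in raw.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two L2-normalized sparse vectors."""
    return sum(weight * v.get(word, 0.0) for word, weight in u.items())


# Toy documents, invented for illustration only.
docs = [
    "hierarchical logistic regression results",
    "logistic regression graphs",
    "election forecasts graphs",
]
vectors = tfidf(docs)
top_words = sorted(vectors[0].items(), key=lambda kv: -kv[1])
print(top_words)
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

Words that occur in few documents get larger weights, and two documents score higher the more weighted vocabulary they share. A document compared with itself scores 1 up to rounding.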
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 1815 andrew gelman stats-2013-04-20-Displaying inferences from complex models
2 0.13789958 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?
Introduction: Yi-Chun Ou writes: I am using a multilevel model with three levels. I read that you wrote a book about multilevel models, and wonder if you can solve the following question. The data structure is like this: Level one: customer (8444 customers) Level two: company (90 companies) Level three: industry (17 industries) I use 6 level-three variables (i.e. industry characteristics) to explain the variance of the level-one effect across industries. The question here is whether there is an over-fitting problem since there are only 17 industries. I understand that this must be a problem for non-multilevel models, but is it also a problem for multilevel models? My reply: Yes, this could be a problem. I’d suggest combining some of your variables into a common score, or using only some of the variables, or using strong priors to control the inferences. This is an interesting and important area of statistics research, to do this sort of thing systematically. There’s lots o
3 0.13428955 373 andrew gelman stats-2010-10-27-It’s better than being forwarded the latest works of you-know-who
Introduction: In the inbox today: From Jimmy. From Kieran. The relevant references are here and, of course, here.
4 0.10761113 83 andrew gelman stats-2010-06-13-Silly Sas lays out old-fashioned statistical thinking
Introduction: People keep telling me that Sas isn’t as bad as everybody says, but then I see (from Christian Robert) this listing from the Sas website of “disadvantages in using Bayesian analysis”: There is no correct way to choose a prior. Bayesian inferences require skills to translate prior beliefs into a mathematically formulated prior. If you do not proceed with caution, you can generate misleading results. . . . From a practical point of view, it might sometimes be difficult to convince subject matter experts who do not agree with the validity of the chosen prior. That is so tacky! As if least squares, logistic regressions, Cox models, and all those other likelihoods mentioned in the Sas documentation are so automatically convincing to subject matter experts. P.S. For some more serious objections to Bayesian statistics, see here and here. P.P.S. In case you’re wondering why I’m commenting on month-old blog entries . . . I have a monthlong backlog of entries, and I’m spooling
5 0.10546523 1096 andrew gelman stats-2012-01-02-Graphical communication for legal scholarship
Introduction: Following my talk on infovis and statistical graphics at the Empirical Legal Studies conference, Dan Kahan writes: The legal academy, which is making strides toward sensible integration of a variety of empirical methods into its scholarship, is horribly ignorant of the utility of graphic reporting of data, a likely influence of the formative influence that econometric methods has exerted on expectations and habits of mind among legal scholars. Lee Epstein has written a pair of wonderful articles on graphic reporting – 1. Epstein, L., Martin, A. & Boyd, C. On the Effective Communication of the Results of Empirical Studies, Part II. Vand. L. Rev. 60, 798-846 (2007). 2. Epstein, L., Martin, A. & Schneider, M. On the Effective Communication of the Results of Empirical Studies, Part I. Vand. L. Rev. 59, 1811-1871 (2007). – but her efforts haven’t gotten the attention they deserve, and reinforcement, particularly at a venue like CELS is very important. But the main issue there
7 0.10292824 249 andrew gelman stats-2010-09-01-References on predicting elections
11 0.09589155 1655 andrew gelman stats-2013-01-05-The statistics software signal
12 0.09438628 1661 andrew gelman stats-2013-01-08-Software is as software does
15 0.08800292 447 andrew gelman stats-2010-12-03-Reinventing the wheel, only more so.
16 0.086507492 1364 andrew gelman stats-2012-06-04-Massive confusion about a study that purports to show that exercise may increase heart risk
17 0.085513286 1486 andrew gelman stats-2012-09-07-Prior distributions for regression coefficients
18 0.08493343 2328 andrew gelman stats-2014-05-10-What property is important in a risk prediction model? Discrimination or calibration?
19 0.083794817 313 andrew gelman stats-2010-10-03-A question for psychometricians
20 0.078440011 1351 andrew gelman stats-2012-05-29-A Ph.D. thesis is not really a marathon
topicId topicWeight
[(0, 0.125), (1, 0.044), (2, 0.003), (3, -0.014), (4, 0.067), (5, 0.014), (6, -0.041), (7, -0.035), (8, 0.012), (9, 0.103), (10, 0.034), (11, -0.009), (12, 0.036), (13, 0.014), (14, 0.057), (15, 0.03), (16, -0.021), (17, -0.002), (18, 0.04), (19, -0.013), (20, -0.002), (21, 0.051), (22, 0.015), (23, 0.005), (24, 0.008), (25, -0.021), (26, 0.056), (27, -0.044), (28, -0.004), (29, -0.011), (30, -0.031), (31, 0.033), (32, 0.011), (33, 0.008), (34, -0.02), (35, -0.01), (36, 0.007), (37, 0.031), (38, 0.009), (39, 0.006), (40, -0.035), (41, -0.02), (42, 0.006), (43, 0.015), (44, 0.015), (45, -0.016), (46, -0.032), (47, 0.037), (48, -0.017), (49, -0.039)]
simIndex simValue blogId blogTitle
same-blog 1 0.97837812 1815 andrew gelman stats-2013-04-20-Displaying inferences from complex models
2 0.76201272 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?
Introduction: Denis Cote writes: Just read this today and my unsophisticated statistical mind is confused. “Initial bivariate analyses suggest that union membership is actually associated with worse health. This association disappears when controlling for demographics, then reverses and becomes significant when controlling for labor market characteristics.” From my education about statistics, I remember to be suspicious about multiple regression coefficients that are in the opposite direction of the bivariate coefficients. What am I missing? I vaguely remember something about the suppression effect. My reply: There’s a long literature on this from many decades ago. My general feeling about such situations is that, when the coefficient changes a lot after controlling for other variables, it is important to visualize this change, to understand what is the interaction among variables that is associated with the change in the coefficients. This is what we did in our Red State Blue State
4 0.73850173 1121 andrew gelman stats-2012-01-15-R-squared for multilevel models
Introduction: Fred Schiff writes: I’m writing to you to ask about the “R-squared” approximation procedure you suggest in your 2004 book with Dr. Hill. [See also this paper with Pardoe---ed.] I’m a media sociologist at the University of Houston. I’ve been using HLM3 for about two years. Briefly about my data. It’s a content analysis of news stories with a continuous scale dependent variable, story prominence. I have 6090 news stories, 114 newspapers, and 59 newspaper group owners. All the Level-1, Level-2 and dependent variables have been standardized. Since the means were zero anyway, we left the variables uncentered. All the Level-3 ownership groups and characteristics are dichotomous scales that were left uncentered. PROBLEM: The single most important result I am looking for is to compare the strength of nine competing Level-1 variables in their ability to predict and explain the outcome variable, story prominence. We are trying to use the residuals to calculate a “R-squ
Introduction: Greg Campbell writes: I am a Canadian archaeologist (BSc in Chemistry) researching the past human use of European Atlantic shellfish. After two decades of practice I am finally getting a MA in archaeology at Reading. I am seeing if the habitat or size of harvested mussels (Mytilus edulis) can be reconstructed from measurements of the umbo (the pointy end, and the only bit that survives well in archaeological deposits) using log-transformed measurements (or allometry; relationships between dimensions are more likely exponential than linear). Of course multivariate regressions in most statistics packages (Minitab, SPSS, SAS) assume you are trying to predict one variable from all the others (a Model I regression), and use ordinary least squares to fit the regression line. For organismal dimensions this makes little sense, since all the dimensions are (at least in theory) free to change their mutual proportions during growth. So there is no predictor and predicted, mutual variation of
6 0.71428102 1656 andrew gelman stats-2013-01-05-Understanding regression models and regression coefficients
7 0.70503163 2357 andrew gelman stats-2014-06-02-Why we hate stepwise regression
8 0.70074052 704 andrew gelman stats-2011-05-10-Multiple imputation and multilevel analysis
9 0.69972342 1900 andrew gelman stats-2013-06-15-Exploratory multilevel analysis when group-level variables are of importance
10 0.69551927 796 andrew gelman stats-2011-07-10-Matching and regression: two great tastes etc etc
12 0.68050784 2296 andrew gelman stats-2014-04-19-Index or indicator variables
13 0.67767733 271 andrew gelman stats-2010-09-12-GLM – exposure
14 0.67522001 1814 andrew gelman stats-2013-04-20-A mess with which I am comfortable
15 0.66378731 451 andrew gelman stats-2010-12-05-What do practitioners need to know about regression?
16 0.65739536 14 andrew gelman stats-2010-05-01-Imputing count data
17 0.65693414 25 andrew gelman stats-2010-05-10-Two great tastes that taste great together
18 0.65470284 1445 andrew gelman stats-2012-08-06-Slow progress
19 0.64902711 1967 andrew gelman stats-2013-08-04-What are the key assumptions of linear regression?
20 0.64323741 948 andrew gelman stats-2011-10-10-Combining data from many sources
topicId topicWeight
[(11, 0.019), (13, 0.013), (15, 0.029), (16, 0.07), (24, 0.128), (32, 0.026), (43, 0.119), (76, 0.032), (82, 0.03), (99, 0.408)]
simIndex simValue blogId blogTitle
1 0.98537505 1253 andrew gelman stats-2012-04-08-Technology speedup graph
Introduction: Dan Kahan sends along this awesome graph (click on the image to see the whole thing): and writes: I [Kahan] saw it at http://www.theatlantic.com/technology/archive/2012/04/the-100-year-march-of-technology-in-1-graph/255573/, which misidentified the source (not “visual economics”; visualizingeconomics.com, which attributes it to Nicholas Felton, who apparently condensed this version, which I worry could cause a stroke). But it did have a good write-up that (I’m glad) caught my attention. It made me [Kahan] start to wonder about what sorts of qualities of a technology will influence its dissemination & also about the availability of benchmarks for proliferation of various sorts of things (e.g., fads & trends, health-promoting behaviors, knowledge of a scientific discovery) that one could use to gauge how meaningful the apparent increase in rates of proliferation of these technologies has been over time. That in turn made me wonder whether — indeed, suspect th
2 0.98353434 1347 andrew gelman stats-2012-05-27-Macromuddle
Introduction: More and more I feel like economics reporting is based on crude principles of adding up “good news” and “bad news.” Sometimes this makes sense: by almost any measure, an unemployment rate of 10% is bad news compared to an unemployment rate of 5%. Other times, though, the good/bad news framework seems so tangled. For example: house prices up is considered good news but inflation is considered bad news. A strong dollar is considered good news but it’s also an unfavorable exchange rate, which is bad news. When facebook shares go down, that’s bad news, but if they automatically go up, that means they were underpriced which doesn’t seem so good either. Pundits are torn between rooting for the euro to fail (which means our team (the U.S.) is better than Europe (their team)) and rooting for it to survive (because a collapse in Europe is bad news for the U.S. economy). China’s economy doing well is bad news—but if their economy slips, that’s bad news too. I think you get the picture
same-blog 3 0.98298824 1815 andrew gelman stats-2013-04-20-Displaying inferences from complex models
Introduction: Responding to a proposal to move the journal Political Analysis from double-blind to single-blind reviewing (that is, authors would not know who is reviewing their papers but reviewers would know the authors’ names), Tom Palfrey writes: I agree with the editors’ recommendation. I have served on quite a few editorial boards of journals with different blinding policies, and have seen no evidence that double blind procedures are a useful way to improve the quality of articles published in a journal. Aside from the obvious administrative nuisance and the fact that authorship anonymity is a thing of the past in our discipline, the theoretical and empirical arguments in both directions lead to an ambiguous conclusion. Also keep in mind that the editors know the identity of the authors (they need to know for practical reasons), their identity is not hidden from authors, and ultimately it is they who make the accept/reject decision, and also lobby their friends and colleagues to submit “the
5 0.97479683 857 andrew gelman stats-2011-08-17-Bayes pays
Introduction: George Leckie writes: The Centre for Multilevel Modelling at the University of Bristol is seeking to appoint an applied statistician to work on a new ESRC-funded project, Longitudinal Effects, Multilevel Modelling and Applications (LEMMA 3). LEMMA 3 is one of six Nodes of the National Centre for Research Methods (NCRM). The LEMMA 3 Node will focus on methods for the analysis of longitudinal data. The appointment, at Research Assistant or Research Associate level, will be for 2.5 years with likelihood of extension to the end of September 2014. For further details, including information on how to apply online, please go to http://www.bris.ac.uk/boris/jobs/feeds/ads?ID=100571 By “modelling,” I think he means “modeling.” And by “centre,” I think he means “center.” But I think you get the basic idea. It looks like a great place to do research.
6 0.97417438 70 andrew gelman stats-2010-06-07-Mister P goes on a date
8 0.97159141 1920 andrew gelman stats-2013-06-30-“Non-statistical” statistics tools
9 0.97123045 1860 andrew gelman stats-2013-05-17-How can statisticians help psychologists do their research better?
10 0.96985656 1882 andrew gelman stats-2013-06-03-The statistical properties of smart chains (and referral chains more generally)
11 0.96982354 75 andrew gelman stats-2010-06-08-“Is the cyber mob a threat to freedom?”
14 0.96430218 2330 andrew gelman stats-2014-05-12-Historical Arc of Universities
16 0.95955789 110 andrew gelman stats-2010-06-26-Philosophy and the practice of Bayesian statistics
17 0.95809996 2006 andrew gelman stats-2013-09-03-Evaluating evidence from published research
18 0.95790225 2236 andrew gelman stats-2014-03-07-Selection bias in the reporting of shaky research
19 0.95762807 1807 andrew gelman stats-2013-04-17-Data problems, coding errors…what can be done?
20 0.95683384 770 andrew gelman stats-2011-06-15-Still more Mr. P in public health