andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-1815 knowledge-graph by maker-knowledge-mining

1815 andrew gelman stats-2013-04-20-Displaying inferences from complex models

meta information for this blog

Source: html

Introduction: David Williams writes: I am completing my doctoral dissertation dealing with modeling adverse birth outcomes. The models are complex with 9 risk factors, 5 area level variables and 4 individual level variables. I used hierarchical logistic regression (SAS glimmix) to analyze the data. I am now faced with reporting the results. Can you please recommend any references and/or examples that would suggest what results to report in what format? I have found no references and scant examples of reporting such results in tables. My reply: I think graphs are the way to go. I don’t have any immediate ideas beyond what’s in the book with Jennifer. I think this is an important area of research.
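One way to act on the "graphs are the way to go" advice is a coefficient display: each logit-scale estimate drawn as a point with an interval, rather than a row in a table. The sketch below is only an illustration under stated assumptions, not the method from the post or the book. It assumes each coefficient comes with an estimate and a standard error on the logit scale, uses a normal-approximation 95% interval (z = 1.96), and the function names and the values in the example are hypothetical placeholders, not results from the dissertation.

```python
import math


def odds_ratio_interval(estimate, std_error, z=1.96):
    """Convert a logit-scale estimate and standard error to (odds ratio, lower, upper)."""
    lower = estimate - z * std_error
    upper = estimate + z * std_error
    return math.exp(estimate), math.exp(lower), math.exp(upper)


def interval_row(label, estimate, std_error, lo=-2.0, hi=2.0, width=41, z=1.96):
    """Draw one coefficient as a text dot-and-whisker line on the logit scale."""

    def col(x):
        # Clamp to the plotting range, then map to a character column.
        x = min(max(x, lo), hi)
        return round((x - lo) / (hi - lo) * (width - 1))

    cells = [" "] * width
    for i in range(col(estimate - z * std_error), col(estimate + z * std_error) + 1):
        cells[i] = "-"          # interval whisker
    cells[col(0.0)] = "|"       # reference line at zero (odds ratio of 1)
    cells[col(estimate)] = "o"  # point estimate
    return f"{label:<12}" + "".join(cells)


if __name__ == "__main__":
    # Made-up placeholder values, for illustration only.
    hypothetical = [("factor_a", 0.40, 0.15), ("factor_b", -0.25, 0.30)]
    for label, est, se in hypothetical:
        print(interval_row(label, est, se))
```

Sorting the rows by estimate and grouping area-level and individual-level predictors into separate panels are common follow-up choices; a plotting library would replace the text rendering in practice.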

Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 David Williams writes: I am completing my doctoral dissertation dealing with modeling adverse birth outcomes. [sent-1, score-1.355]

2 The models are complex with 9 risk factors, 5 area level variables and 4 individual level variables. [sent-2, score-1.093]

3 I used hierarchical logistic regression (SAS glimmix) to analyze the data. [sent-3, score-0.55]

4 Can you please recommend any references and/or examples that would suggest what results to report in what format? [sent-5, score-1.084]

5 I have found no references and scant examples of reporting such results in tables. [sent-6, score-0.943]

6 I don’t have any immediate ideas beyond what’s in the book with Jennifer. [sent-8, score-0.428]

similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('references', 0.29), ('completing', 0.268), ('reporting', 0.249), ('area', 0.233), ('doctoral', 0.22), ('adverse', 0.215), ('dissertation', 0.215), ('williams', 0.207), ('sas', 0.203), ('faced', 0.194), ('examples', 0.183), ('birth', 0.179), ('format', 0.177), ('level', 0.174), ('immediate', 0.172), ('dealing', 0.165), ('results', 0.148), ('analyze', 0.148), ('logistic', 0.137), ('risk', 0.125), ('complex', 0.124), ('factors', 0.122), ('recommend', 0.116), ('hierarchical', 0.115), ('suggest', 0.113), ('please', 0.11), ('graphs', 0.099), ('beyond', 0.098), ('individual', 0.098), ('david', 0.098), ('variables', 0.098), ('report', 0.094), ('modeling', 0.093), ('ideas', 0.088), ('regression', 0.086), ('reply', 0.078), ('found', 0.073), ('book', 0.07), ('important', 0.068), ('models', 0.067), ('used', 0.064), ('think', 0.058), ('research', 0.053), ('way', 0.042), ('writes', 0.04), ('would', 0.03)]
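A term-weight vector like the one above is typically compared with other posts' vectors to produce a similarity ranking. The sketch below shows one common way to do that, cosine similarity over sparse {term: weight} dictionaries. It is an assumption that this page used cosine similarity (the page does not say which measure or tf-idf weighting it applied), and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank_similar(query, corpus, top_n=20):
    """Return up to top_n (score, doc_id) pairs, most similar first."""
    scores = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in corpus.items()]
    return sorted(scores, key=lambda pair: pair[0], reverse=True)[:top_n]
```

Under this measure a post compared with itself scores 1 up to floating-point rounding, which would be consistent with the "same-blog" entry in the list that follows.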

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 1815 andrew gelman stats-2013-04-20-Displaying inferences from complex models

2 0.13789958 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?

Introduction: Yi-Chun Ou writes: I am using a multilevel model with three levels. I read that you wrote a book about multilevel models, and wonder if you can solve the following question. The data structure is like this: Level one: customer (8444 customers); Level two: company (90 companies); Level three: industry (17 industries). I use 6 level-three variables (i.e. industry characteristics) to explain the variance of the level-one effect across industries. The question here is whether there is an over-fitting problem since there are only 17 industries. I understand that this must be a problem for non-multilevel models, but is it also a problem for multilevel models? My reply: Yes, this could be a problem. I’d suggest combining some of your variables into a common score, or using only some of the variables, or using strong priors to control the inferences. This is an interesting and important area of statistics research, to do this sort of thing systematically. There’s lots o

3 0.13428955 373 andrew gelman stats-2010-10-27-It’s better than being forwarded the latest works of you-know-who

Introduction: In the inbox today: From Jimmy. From Kieran. The relevant references are here and, of course, here .

4 0.10761113 83 andrew gelman stats-2010-06-13-Silly Sas lays out old-fashioned statistical thinking

Introduction: People keep telling me that Sas isn’t as bad as everybody says, but then I see (from Christian Robert ) this listing from the Sas website of “disadvantages in using Bayesian analysis”: There is no correct way to choose a prior. Bayesian inferences require skills to translate prior beliefs into a mathematically formulated prior. If you do not proceed with caution, you can generate misleading results. . . . From a practical point of view, it might sometimes be difficult to convince subject matter experts who do not agree with the validity of the chosen prior. That is so tacky! As if least squares, logistic regressions, Cox models, and all those other likelihoods mentioned in the Sas documentation are so automatically convincing to subject matter experts. P.S. For some more serious objections to Bayesian statistics, see here and here . P.P.S. In case you’re wondering why I’m commenting on month-old blog entries . . . I have a monthlong backlog of entries, and I’m spooling

5 0.10546523 1096 andrew gelman stats-2012-01-02-Graphical communication for legal scholarship

Introduction: Following my talk on infovis and statistical graphics at the Empirical Legal Studies conference , Dan Kahan writes: The legal academy, which is making strides toward sensible integration of a variety of empirical methods into its scholarship, is horribly ignorant of the utility of graphic reporting of data, a likely influence of the formative influence that econometric methods has exerted on expectations and habits of mind among legal scholars. Lee Epstein has written a pair of wonderful articles on graphic reporting – 1. Epstein, L., Martin, A. & Boyd, C. On the Effective Communication of the Results of Empirical Studies, Part II. Vand. L. Rev. 60, 798-846 (2007). 2. Epstein, L., Martin, A. & Schneider, M. On the Effective Communication of the Results of Empirical Studies, Part I. Vand. L. Rev. 59, 1811-1871 (2007). – but her efforts haven’t gotten the attention they deserve, and reinforcement, particularly at a venue like CELS is very important. But the main issue there

6 0.1035248 1425 andrew gelman stats-2012-07-23-Examples of the use of hierarchical modeling to generalize to new settings

7 0.10292824 249 andrew gelman stats-2010-09-01-References on predicting elections

8 0.10167956 33 andrew gelman stats-2010-05-14-Felix Salmon wins the American Statistical Association’s Excellence in Statistical Reporting Award

9 0.10094103 2294 andrew gelman stats-2014-04-17-If you get to the point of asking, just do it. But some difficulties do arise . . .

10 0.099379569 1986 andrew gelman stats-2013-08-17-Somebody’s looking for a book on time series analysis in the style of Angrist and Pischke, or Gelman and Hill

11 0.09589155 1655 andrew gelman stats-2013-01-05-The statistics software signal

12 0.09438628 1661 andrew gelman stats-2013-01-08-Software is as software does

13 0.088838145 2145 andrew gelman stats-2013-12-24-Estimating and summarizing inference for hierarchical variance parameters when the number of groups is small

14 0.088008121 2273 andrew gelman stats-2014-03-29-References (with code) for Bayesian hierarchical (multilevel) modeling and structural equation modeling

15 0.08800292 447 andrew gelman stats-2010-12-03-Reinventing the wheel, only more so.

16 0.086507492 1364 andrew gelman stats-2012-06-04-Massive confusion about a study that purports to show that exercise may increase heart risk

17 0.085513286 1486 andrew gelman stats-2012-09-07-Prior distributions for regression coefficients

18 0.08493343 2328 andrew gelman stats-2014-05-10-What property is important in a risk prediction model? Discrimination or calibration?

19 0.083794817 313 andrew gelman stats-2010-10-03-A question for psychometricians

20 0.078440011 1351 andrew gelman stats-2012-05-29-A Ph.D. thesis is not really a marathon

similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.125), (1, 0.044), (2, 0.003), (3, -0.014), (4, 0.067), (5, 0.014), (6, -0.041), (7, -0.035), (8, 0.012), (9, 0.103), (10, 0.034), (11, -0.009), (12, 0.036), (13, 0.014), (14, 0.057), (15, 0.03), (16, -0.021), (17, -0.002), (18, 0.04), (19, -0.013), (20, -0.002), (21, 0.051), (22, 0.015), (23, 0.005), (24, 0.008), (25, -0.021), (26, 0.056), (27, -0.044), (28, -0.004), (29, -0.011), (30, -0.031), (31, 0.033), (32, 0.011), (33, 0.008), (34, -0.02), (35, -0.01), (36, 0.007), (37, 0.031), (38, 0.009), (39, 0.006), (40, -0.035), (41, -0.02), (42, 0.006), (43, 0.015), (44, 0.015), (45, -0.016), (46, -0.032), (47, 0.037), (48, -0.017), (49, -0.039)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97837812 1815 andrew gelman stats-2013-04-20-Displaying inferences from complex models

2 0.76201272 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?

3 0.74210793 1870 andrew gelman stats-2013-05-26-How to understand coefficients that reverse sign when you start controlling for things?

Introduction: Denis Cote writes: Just read this today and my unsophisticated statistical mind is confused. “Initial bivariate analyses suggest that union membership is actually associated with worse health. This association disappears when controlling for demographics, then reverses and becomes significant when controlling for labor market characteristics.” From my education about statistics, I remember to be suspicious about multiple regression coefficients that are in the opposite direction of the bivariate coefficients. What am I missing? I vaguely remember something about the suppression effect. My reply: There’s a long literature on this from many decades ago. My general feeling about such situations is that, when the coefficient changes a lot after controlling for other variables, it is important to visualize this change, to understand what is the interaction among variables that is associated with the change in the coefficients. This is what we did in our Red State Blue State

4 0.73850173 1121 andrew gelman stats-2012-01-15-R-squared for multilevel models

Introduction: Fred Schiff writes: I’m writing to you to ask about the “R-squared” approximation procedure you suggest in your 2004 book with Dr. Hill. [See also this paper with Pardoe---ed.] I’m a media sociologist at the University of Houston. I’ve been using HLM3 for about two years. Briefly about my data. It’s a content analysis of news stories with a continuous scale dependent variable, story prominence. I have 6090 news stories, 114 newspapers, and 59 newspaper group owners. All the Level-1, Level-2 and dependent variables have been standardized. Since the means were zero anyway, we left the variables uncentered. All the Level-3 ownership groups and characteristics are dichotomous scales that were left uncentered. PROBLEM: The single most important result I am looking for is to compare the strength of nine competing Level-1 variables in their ability to predict and explain the outcome variable, story prominence. We are trying to use the residuals to calculate a “R-squ

5 0.72461456 1094 andrew gelman stats-2011-12-31-Using factor analysis or principal components analysis or measurement-error models for biological measurements in archaeology?

Introduction: Greg Campbell writes: I am a Canadian archaeologist (BSc in Chemistry) researching the past human use of European Atlantic shellfish. After two decades of practice I am finally getting a MA in archaeology at Reading. I am seeing if the habitat or size of harvested mussels (Mytilus edulis) can be reconstructed from measurements of the umbo (the pointy end, and the only bit that survives well in archaeological deposits) using log-transformed measurements (or allometry; relationships between dimensions are more likely exponential than linear). Of course multivariate regressions in most statistics packages (Minitab, SPSS, SAS) assume you are trying to predict one variable from all the others (a Model I regression), and use ordinary least squares to fit the regression line. For organismal dimensions this makes little sense, since all the dimensions are (at least in theory) free to change their mutual proportions during growth. So there is no predictor and predicted, mutual variation of

6 0.71428102 1656 andrew gelman stats-2013-01-05-Understanding regression models and regression coefficients

7 0.70503163 2357 andrew gelman stats-2014-06-02-Why we hate stepwise regression

8 0.70074052 704 andrew gelman stats-2011-05-10-Multiple imputation and multilevel analysis

9 0.69972342 1900 andrew gelman stats-2013-06-15-Exploratory multilevel analysis when group-level variables are of importance

10 0.69551927 796 andrew gelman stats-2011-07-10-Matching and regression: two great tastes etc etc

11 0.68559581 1703 andrew gelman stats-2013-02-02-Interaction-based feature selection and classification for high-dimensional biological data

12 0.68050784 2296 andrew gelman stats-2014-04-19-Index or indicator variables

13 0.67767733 271 andrew gelman stats-2010-09-12-GLM – exposure

14 0.67522001 1814 andrew gelman stats-2013-04-20-A mess with which I am comfortable

15 0.66378731 451 andrew gelman stats-2010-12-05-What do practitioners need to know about regression?

16 0.65739536 14 andrew gelman stats-2010-05-01-Imputing count data

17 0.65693414 25 andrew gelman stats-2010-05-10-Two great tastes that taste great together

18 0.65470284 1445 andrew gelman stats-2012-08-06-Slow progress

19 0.64902711 1967 andrew gelman stats-2013-08-04-What are the key assumptions of linear regression?

20 0.64323741 948 andrew gelman stats-2011-10-10-Combining data from many sources

similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(11, 0.019), (13, 0.013), (15, 0.029), (16, 0.07), (24, 0.128), (32, 0.026), (43, 0.119), (76, 0.032), (82, 0.03), (99, 0.408)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.98537505 1253 andrew gelman stats-2012-04-08-Technology speedup graph

Introduction: Dan Kahan sends along this awesome graph (click on the image to see the whole thing): and writes: I [Kahan] saw it at http://www.theatlantic.com/technology/archive/2012/04/the-100-year-march-of-technology-in-1-graph/255573/ , which misidentified the source (not “visual economics”; visualizingeconomics .com , which attributes it to Nicholas Felton , who apparently condensed this version , which I worry could cause a stroke). But it did have a good write-up that (I’m glad) caught my attention. It made me [Kahan] start to wonder about what sorts of qualities of a technology will influence its dissemination & also about the availability of benchmarks for proliferation of various sorts of things (e.g, fads & trends, health-promoting behaviors, knowledge of a scientific discovery) that one could use to gauge how meaningful the apparent increase in rates of proliferation of these technologies has been over time. That in turn made me wonder whether — indeed, suspect th

2 0.98353434 1347 andrew gelman stats-2012-05-27-Macromuddle

Introduction: More and more I feel like economics reporting is based on crude principles of adding up “good news” and “bad news.” Sometimes this makes sense: by almost any measure, an unemployment rate of 10% is bad news compared to an unemployment rate of 5%. Other times, though, the good/bad news framework seems so tangled. For example: house prices up is considered good news but inflation is considered bad news. A strong dollar is considered good news but it’s also an unfavorable exchange rate, which is bad news. When facebook shares go down, that’s bad news, but if they automatically go up, that means they were underpriced which doesn’t seem so good either. Pundits are torn between rooting for the euro to fail (which means our team (the U.S.) is better than Europe (their team)) and rooting for it to survive (because a collapse in Europe is bad news for the U.S. economy). China’s economy doing well is bad news—but if their economy slips, that’s bad news too. I think you get the picture

same-blog 3 0.98298824 1815 andrew gelman stats-2013-04-20-Displaying inferences from complex models

4 0.97737288 601 andrew gelman stats-2011-03-05-Against double-blind reviewing: Political science and statistics are not like biology and physics

Introduction: Responding to a proposal to move the journal Political Analysis from double-blind to single-blind reviewing (that is, authors would not know who is reviewing their papers but reviewers would know the authors’ names), Tom Palfrey writes: I agree with the editors’ recommendation. I have served on quite a few editorial boards of journals with different blinding policies, and have seen no evidence that double blind procedures are a useful way to improve the quality of articles published in a journal. Aside from the obvious administrative nuisance and the fact that authorship anonymity is a thing of the past in our discipline, the theoretical and empirical arguments in both directions lead to an ambiguous conclusion. Also keep in mind that the editors know the identity of the authors (they need to know for practical reasons), their identity is not hidden from authors, and ultimately it is they who make the accept/reject decision, and also lobby their friends and colleagues to submit “the

5 0.97479683 857 andrew gelman stats-2011-08-17-Bayes pays

Introduction: George Leckie writes: The Centre for Multilevel Modelling at the University of Bristol is seeking to appoint an applied statistician to work on a new ESRC-funded project, Longitudinal Effects, Multilevel Modelling and Applications (LEMMA 3). LEMMA 3 is one of six Nodes of the National Centre for Research Methods (NCRM). The LEMMA 3 Node will focus on methods for the analysis of longitudinal data. The appointment, at Research Assistant or Research Associate level, will be for 2.5 years with likelihood of extension to the end of September 2014. For further details, including information on how to apply online, please go to http://www.bris.ac.uk/boris/jobs/feeds/ads?ID=100571 By “modelling,” I think he means “modeling.” And by “centre,” I think he means “center.” But I think you get the basic idea. It looks like a great place to do research.

6 0.97417438 70 andrew gelman stats-2010-06-07-Mister P goes on a date

7 0.97204757 1707 andrew gelman stats-2013-02-05-Glenn Hubbard and I were on opposite sides of a court case and I didn’t even know it!

8 0.97159141 1920 andrew gelman stats-2013-06-30-“Non-statistical” statistics tools

9 0.97123045 1860 andrew gelman stats-2013-05-17-How can statisticians help psychologists do their research better?

10 0.96985656 1882 andrew gelman stats-2013-06-03-The statistical properties of smart chains (and referral chains more generally)

11 0.96982354 75 andrew gelman stats-2010-06-08-“Is the cyber mob a threat to freedom?”

12 0.96655566 1458 andrew gelman stats-2012-08-14-1.5 million people were told that extreme conservatives are happier than political moderates. Approximately .0001 million Americans learned that the opposite is true.

13 0.96446049 2235 andrew gelman stats-2014-03-06-How much time (if any) should we spend criticizing research that’s fraudulent, crappy, or just plain pointless?

14 0.96430218 2330 andrew gelman stats-2014-05-12-Historical Arc of Universities

15 0.96057791 22 andrew gelman stats-2010-05-07-Jenny Davidson wins Mark Van Doren Award, also some reflections on the continuity of work within literary criticism or statistics

16 0.95955789 110 andrew gelman stats-2010-06-26-Philosophy and the practice of Bayesian statistics

17 0.95809996 2006 andrew gelman stats-2013-09-03-Evaluating evidence from published research

18 0.95790225 2236 andrew gelman stats-2014-03-07-Selection bias in the reporting of shaky research

19 0.95762807 1807 andrew gelman stats-2013-04-17-Data problems, coding errors…what can be done?

20 0.95683384 770 andrew gelman stats-2011-06-15-Still more Mr. P in public health