andrew_gelman_stats-2012-1462 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Andy Flies, Ph.D. candidate in zoology, writes: After reading your paper about scaling regression inputs by two standard deviations I found your blog post stating that you wished you had scaled by 1 sd and coded the binary inputs as -1 and 1. Here is my question: If you code the binary input as -1 and 1, do you then standardize it? This makes sense to me because the mean of the standardized input is then zero and the sd is 1, which is what the mean and sd are for all of the other standardized inputs. I know that if you code the binary input as 0 and 1 it should not be standardized. Also, I am not interested in the actual units (i.e. mg/ml) of my response variable and I would like to compare a couple of different response variables that are on different scales. Would it make sense to standardize the response variable also? My reply: No, I don’t standardize the binary input. The point of standardizing inputs is to make the coefs directly interpretable, but with binary inputs the interpretation is already clear, since there is only one possible comparison. … Unless the analysis is on the log scale (as in an elasticity model), in which case, again, the coefs are already directly interpretable.
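For concreteness, here is a minimal R sketch of the advice above (the data and variable names are made up for illustration; this is not code from the post or from the arm package): continuous inputs are centered and divided by two standard deviations, as in the paper mentioned in the question, while the binary input is left unstandardized and only recoded from 0/1 to -1/+1.

# Minimal sketch with made-up data: rescale continuous inputs by 2 sd,
# leave the binary input unstandardized (recode 0/1 to -1/+1 only).
rescale_2sd <- function(x) (x - mean(x)) / (2 * sd(x))

set.seed(1)
dat <- data.frame(
  y     = rnorm(100),            # response, kept on its original scale
  age   = rnorm(100, 40, 10),    # continuous input
  conc  = rlnorm(100, 2, 0.5),   # continuous input (e.g., a concentration in mg/ml)
  treat = rbinom(100, 1, 0.5)    # binary input, coded 0/1
)

dat$age_z    <- rescale_2sd(dat$age)
dat$conc_z   <- rescale_2sd(dat$conc)
dat$treat_pm <- ifelse(dat$treat == 1, 1, -1)   # -1/+1 coding, not standardized

fit <- lm(y ~ age_z + conc_z + treat_pm, data = dat)
summary(fit)

Dividing by one standard deviation instead of two (the later preference mentioned in the question) only changes the rescale_2sd helper; the binary input is left alone either way.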
simIndex simValue blogId blogTitle
same-blog 1 1.0 1462 andrew gelman stats-2012-08-18-Standardizing regression inputs
2 0.17525181 1686 andrew gelman stats-2013-01-21-Finite-population Anova calculations for models with interactions
Introduction: Jim Thomson writes: I wonder if you could provide some clarification on the correct way to calculate the finite-population standard deviations for interaction terms in your Bayesian approach to ANOVA (as explained in your 2005 paper, and Gelman and Hill 2007). I understand that it is the SD of the constrained batch coefficients that is of interest, but in most WinBUGS examples I have seen, the SDs are all calculated directly as sd.fin<-sd(beta.main[]) for main effects and sd(beta.int[,]) for interaction effects, where beta.main and beta.int are the unconstrained coefficients, e.g. beta.int[i,j]~dnorm(0,tau). For main effects, I can see that it makes no difference, since the constrained value is calculated by subtracting the mean, and sd(B[]) = sd(B[]-mean(B[])). But the conventional sum-to-zero constraint for interaction terms in linear models is more complicated than subtracting the mean (there are only (n1-1)*(n2-1) free coefficients for an interaction b/w factors with n1 a
Introduction: I was just reading an old post and came across this example which I’d like to share with you again: Here’s a story of R-squared = 1%. Consider a 0/1 outcome with about half the people in each category. For example, half the people with some disease die in a year and half live. Now suppose there’s a treatment that increases survival rate from 50% to 60%. The unexplained sd is 0.5 and the explained sd is 0.05, hence R-squared is 0.01.
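A quick numerical check of this R-squared calculation (a hypothetical simulation, not from the post being previewed):

set.seed(2)
n <- 1e5                                              # large sample so the estimate is stable
treatment <- rbinom(n, 1, 0.5)
y <- rbinom(n, 1, ifelse(treatment == 1, 0.6, 0.5))   # survival 50% vs 60%
summary(lm(y ~ treatment))$r.squared                  # roughly 0.01, as described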
4 0.1321805 328 andrew gelman stats-2010-10-08-Displaying a fitted multilevel model
Introduction: Elissa Brown writes: I’m working on some data using a multinomial model (3 categories for the response & 2 predictors-1 continuous and 1 binary), and I’ve been looking and looking for some sort of nice graphical way to show my model at work. Something like a predicted probabilities plot. I know you can do this for the levels of Y with just one covariate, but is this still a valid way to describe the multinomial model (just doing a pred plot for each covariate)? What’s the deal, is there really no way to graphically represent a successful multinomial model? Also, is it unreasonable to break down your model into a binary response just to get some ROC curves? This seems like cheating. From what I’ve found so far, it seems that people just avoid graphical support when discussing their fitted multinomial models. My reply: It’s hard for me to think about this sort of thing in the abstract with no context. We do have one example in chapter 6 of ARM where we display data and fitted m
5 0.12855703 772 andrew gelman stats-2011-06-17-Graphical tools for understanding multilevel models
Introduction: There are a few things I want to do: 1. Understand a fitted model using tools such as average predictive comparisons , R-squared, and partial pooling factors . In defining these concepts, Iain and I came up with some clever tricks, including (but not limited to): - Separating the inputs and averaging over all possible values of the input not being altered (for average predictive comparisons); - Defining partial pooling without referring to a raw-data or maximum-likelihood or no-pooling estimate (these don’t necessarily exist when you’re fitting logistic regression with sparse data); - Defining an R-squared for each level of a multilevel model. The methods get pretty complicated, though, and they have some loose ends–in particular, for average predictive comparisons with continuous input variables. So now we want to implement these in R and put them into arm along with bglmer etc. 2. Setting up coefplot so it works more generally (that is, so the graphics look nice
6 0.11326243 1346 andrew gelman stats-2012-05-27-Average predictive comparisons when changing a pair of variables
8 0.10696086 1228 andrew gelman stats-2012-03-25-Continuous variables in Bayesian networks
9 0.10504217 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?
10 0.096885934 2163 andrew gelman stats-2014-01-08-How to display multinominal logit results graphically?
11 0.092817254 918 andrew gelman stats-2011-09-21-Avoiding boundary estimates in linear mixed models
12 0.086421706 1047 andrew gelman stats-2011-12-08-I Am Too Absolutely Heteroskedastic for This Probit Model
13 0.085405029 14 andrew gelman stats-2010-05-01-Imputing count data
14 0.084697433 1121 andrew gelman stats-2012-01-15-R-squared for multilevel models
15 0.084681734 1466 andrew gelman stats-2012-08-22-The scaled inverse Wishart prior distribution for a covariance matrix in a hierarchical model
16 0.083810858 1900 andrew gelman stats-2013-06-15-Exploratory multilevel analysis when group-level variables are of importance
17 0.083556592 1506 andrew gelman stats-2012-09-21-Building a regression model . . . with only 27 data points
18 0.083101951 957 andrew gelman stats-2011-10-14-Questions about a study of charter schools
20 0.080939993 753 andrew gelman stats-2011-06-09-Allowing interaction terms to vary
simIndex simValue blogId blogTitle
same-blog 1 0.96011138 1462 andrew gelman stats-2012-08-18-Standardizing regression inputs
2 0.78160429 14 andrew gelman stats-2010-05-01-Imputing count data
Introduction: Guy asks: I am analyzing an original survey of farmers in Uganda. I am hoping to use a battery of welfare proxy variables to create a single welfare index using PCA. I have quick question which I hope you can find time to address: How do you recommend treating count data? (for example # of rooms, # of chickens, # of cows, # of radios)? In my dataset these variables are highly skewed with many responses at zero (which makes taking the natural log problematic). In the case of # of cows or chickens several obs have values in the hundreds. My response: Here’s what we do in our mi package in R. We split a variable into two parts: an indicator for whether it is positive, and the positive part. That is, y = u*v. Then u is binary and can be modeled using logistic regression, and v can be modeled on the log scale. At the end you can round to the nearest integer if you want to avoid fractional values.
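A minimal sketch of the split described in that reply, with made-up data (an illustration of the y = u*v idea, not the mi package's actual code):

set.seed(3)
n <- 200
wealth <- rnorm(n)                                  # made-up predictor
n_cows <- rpois(n, lambda = exp(-1 + 2 * wealth))   # skewed count with many zeros

pos   <- n_cows > 0                                 # u: indicator for a positive count
fit_u <- glm(pos ~ wealth, family = binomial)       # logistic regression for the indicator
fit_v <- lm(log(n_cows[pos]) ~ wealth[pos])         # v: positive part, modeled on the log scale
# A predicted count for units predicted to be positive can be reassembled as
# round(exp(fitted(fit_v))), rounding to avoid fractional values.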
3 0.7695545 257 andrew gelman stats-2010-09-04-Question about standard range for social science correlations
Introduction: Andrew Eppig writes: I’m a physicist by training who is transitioning to the social sciences. I recently came across a reference in the Economist to a paper on IQ and parasites which I read as I have more than a passing interest in IQ research (having read much that you and others (e.g., Shalizi, Wicherts) have written). In this paper I note that the authors find a very high correlation between national IQ and parasite prevalence. The strength of the correlation (-0.76 to -0.82) surprised me, as I’m used to much weaker correlations in the social sciences. To me, it’s a bit too high, suggesting that there are other factors at play or that one of the variables is merely a proxy for a large number of other variables. But I have no basis for this other than a gut feeling and a memory of a plot on Language Log about the distribution of correlation coefficients in social psychology. So my question is this: Is a correlation in the range of (-0.82,-0.76) more likely to be a correlatio
4 0.76511937 1121 andrew gelman stats-2012-01-15-R-squared for multilevel models
Introduction: Fred Schiff writes: I’m writing to you to ask about the “R-squared” approximation procedure you suggest in your 2004 book with Dr. Hill. [See also this paper with Pardoe---ed.] I’m a media sociologist at the University of Houston. I’ve been using HLM3 for about two years. Briefly about my data. It’s a content analysis of news stories with a continuous scale dependent variable, story prominence. I have 6090 news stories, 114 newspapers, and 59 newspaper group owners. All the Level-1, Level-2 and dependent variables have been standardized. Since the means were zero anyway, we left the variables uncentered. All the Level-3 ownership groups and characteristics are dichotomous scales that were left uncentered. PROBLEM: The single most important result I am looking for is to compare the strength of nine competing Level-1 variables in their ability to predict and explain the outcome variable, story prominence. We are trying to use the residuals to calculate a “R-squ
5 0.76362628 1686 andrew gelman stats-2013-01-21-Finite-population Anova calculations for models with interactions
6 0.76269859 1908 andrew gelman stats-2013-06-21-Interpreting interactions in discrete-data regression
7 0.75031149 770 andrew gelman stats-2011-06-15-Still more Mr. P in public health
8 0.74381483 1967 andrew gelman stats-2013-08-04-What are the key assumptions of linear regression?
10 0.73767054 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?
11 0.73163581 1218 andrew gelman stats-2012-03-18-Check your missing-data imputations using cross-validation
12 0.71241832 1981 andrew gelman stats-2013-08-14-The robust beauty of improper linear models in decision making
14 0.70158958 684 andrew gelman stats-2011-04-28-Hierarchical ordered logit or probit
15 0.69358057 251 andrew gelman stats-2010-09-02-Interactions of predictors in a causal model
16 0.69163126 327 andrew gelman stats-2010-10-07-There are never 70 distinct parameters
17 0.68904328 753 andrew gelman stats-2011-06-09-Allowing interaction terms to vary
18 0.68496603 375 andrew gelman stats-2010-10-28-Matching for preprocessing data for causal inference
19 0.681431 1656 andrew gelman stats-2013-01-05-Understanding regression models and regression coefficients
20 0.68128645 1294 andrew gelman stats-2012-05-01-Modeling y = a + b + c
simIndex simValue blogId blogTitle
same-blog 1 0.96062142 1462 andrew gelman stats-2012-08-18-Standardizing regression inputs
2 0.93575096 1311 andrew gelman stats-2012-05-10-My final exam for Design and Analysis of Sample Surveys
Introduction: We had 28 class periods, so I wrote an exam with an approximate correspondence of one question per class. Rather than dumping the exam in your lap all at once, I’ll post the questions once per day. Then each day I’ll post the answer to yesterday’s questions. So it will be 29 days in all. I’ll post them to appear late in the day so as not to interfere with our main daily posts (which are currently backed up to early June). The course was offered in the political science department and covered a mix of statistical and political topics. Followers of our recent discussion on test questions won’t be surprised to learn that some of the questions are ambiguous. This wasn’t on purpose. I tried my best, but good questions are hard to write. Question 1 will appear tomorrow.
Introduction: Since we’re talking about the scaled inverse Wishart . . . here’s a recent message from Chris Chatham: I have been reading your book on Bayesian Hierarchical/Multilevel Modeling but have been struggling a bit with deciding whether to model my multivariate normal distribution using the scaled inverse Wishart approach you advocate given the arguments at this blog post [entitled "Why an inverse-Wishart prior may not be such a good idea"]. My reply: We discuss this in our book. We know the inverse-Wishart has problems, that’s why we recommend the scaled inverse-Wishart, which is a more general class of models. Here’s an old blog post on the topic. And also of course there’s the description in our book. Chris pointed me to the following comment by Simon Barthelmé: Using the scaled inverse Wishart doesn’t change anything, the standard deviations of the individual coefficients and their covariance are still dependent. My answer would be to use a prior that models the stan
4 0.93137366 458 andrew gelman stats-2010-12-08-Blogging: Is it “fair use”?
Introduction: Dave Kane writes: I [Kane] am involved in a dispute relating to whether or not a blog can be considered part of one’s academic writing. Williams College restricts the use of undergraduate theses as follows: Non-commercial, academic use within the scope of “Fair Use” standards is acceptable. Otherwise, you may not copy or distribute any content without the permission of the copyright holder. Seems obvious enough. Yet some folks think that my use of thesis material in a blog post fails this test because it is not “academic.” See this post for the gory details. Parenthetically, your readers might be interested in the substantive discovery here, the details of the Williams admissions process (which is probably very similar to Columbia’s). Williams places students into academic rating (AR) categories as follows:

        verbal    math      composite   SAT II    ACT     AP
AR 1:   770-800   750-800   1520-1600   750-800   35-36   mostly 5s
AR 2:   730-770   720-750   1450-1520   720-770   33-34   4s an
5 0.91992766 382 andrew gelman stats-2010-10-30-“Presidential Election Outcomes Directly Influence Suicide Rates”
Introduction: This came in the spam the other day: College Station, TX–August 16, 2010–Change and hope were central themes to the November 2008 U.S. presidential election. A new longitudinal study published in the September issue of Social Science Quarterly analyzes suicide rates at a state level from 1981–2005 and determines that presidential election outcomes directly influence suicide rates among voters. In states where the majority of voters supported the national election winner suicide rates decreased. However, counter-intuitively, suicide rates decreased even more dramatically in states where the majority of voters supported the election loser (4.6 percent lower for males and 5.3 lower for females). This article is the first in its field to focus on candidate and state-specific outcomes in relation to suicide rates. Prior research on this topic focused on whether the election process itself influenced suicide rates, and found that suicide rates fell during the election season. Ric
6 0.91753978 378 andrew gelman stats-2010-10-28-World Economic Forum Data Visualization Challenge
7 0.91737944 1386 andrew gelman stats-2012-06-21-Belief in hell is associated with lower crime rates
8 0.91250134 1620 andrew gelman stats-2012-12-12-“Teaching effectiveness” as another dimension in cognitive ability
10 0.90732062 598 andrew gelman stats-2011-03-03-Is Harvard hurting poor kids by cutting tuition for the upper middle class?
11 0.90395224 2175 andrew gelman stats-2014-01-18-A course in sample surveys for political science
12 0.89861906 2328 andrew gelman stats-2014-05-10-What property is important in a risk prediction model? Discrimination or calibration?
14 0.89488125 1610 andrew gelman stats-2012-12-06-Yes, checking calibration of probability forecasts is part of Bayesian statistics
15 0.89369684 1193 andrew gelman stats-2012-03-03-“Do you guys pay your bills?”
16 0.8936196 2320 andrew gelman stats-2014-05-05-On deck this month
17 0.89285541 1815 andrew gelman stats-2013-04-20-Displaying inferences from complex models
18 0.89194989 2066 andrew gelman stats-2013-10-17-G+ hangout for test run of BDA course
19 0.89167833 82 andrew gelman stats-2010-06-12-UnConMax – uncertainty consideration maxims 7 ± 2
20 0.8908304 1917 andrew gelman stats-2013-06-28-Econ coauthorship update