andrew_gelman_stats andrew_gelman_stats-2012 andrew_gelman_stats-2012-1346 knowledge-graph by maker-knowledge-mining

1346 andrew gelman stats-2012-05-27-Average predictive comparisons when changing a pair of variables


meta info for this blog

Source: html

Introduction: Jay Jones writes: I recently came across your paper on average predictive comparisons ( Gelman and Pardoe, 2007 ) and can see many applications for this in my work (I’m an applied statistician working for Weyerhaeuser Company at our R&D center near Seattle). At the moment, I am using APCs to help describe the results of a hierarchical multi-species model we fit to bird occupancy (presence/absence) data collected in the Oregon Coast Range. A question that came up in our study led me to consider whether the APC framework can be used for post-hoc combinations of inputs. For example, let’s say that after calculating the APC for each individual input in our model, we would like to look at some linear function f of two inputs of interest, u1 and u2. Naively, I would like to be able to plug this into the APC framework. For example, equation 5 in your paper might look something like this (for brevity, I’m omitting the summations): Numerator: w_ij * (E(y|u1_j, u2_j, v_i, the
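The naive extension Jones describes can be sketched concretely. The following is not Gelman and Pardoe's published equation 5, just a minimal illustration under simplifying assumptions: the weights w_ij are set to 1, `predict` stands in for E(y | u, v, theta), and all names (`apc_linear_combo`, the toy data) are hypothetical:

```python
import numpy as np

def apc_linear_combo(predict, U, V, delta, f):
    """Sketch of an average predictive comparison (APC) for a linear
    function f of two inputs of interest (u1, u2), in the spirit of
    equation 5 of Gelman and Pardoe (2007). The weights w_ij are set
    to 1 for brevity; the paper weights by how plausible each
    transition u -> u + delta is given the other inputs.

    U: observed (u1, u2) pairs; V: settings of the other inputs;
    delta: joint shift applied to (u1, u2); f: the linear function.
    """
    num = den = 0.0
    for v in V:
        for u in U:
            w = 1.0                              # placeholder for w_ij
            num += w * (predict(u + delta, v) - predict(u, v))
            den += w * (f(u + delta) - f(u))
    return num / den

# Toy check: for a linear model the APC recovers the combined coefficient.
pred = lambda u, v: 2.0 * u[0] + 3.0 * u[1] + 0.5 * v
f = lambda u: u[0] + u[1]
U = [np.array([0.0, 0.0]), np.array([1.0, 2.0])]
V = [0.0, 1.0]
print(apc_linear_combo(pred, U, V, np.array([1.0, 1.0]), f))  # 2.5
```

For the linear predictor 2*u1 + 3*u2 and f = u1 + u2, each unit joint shift changes the prediction by 5 and f by 2, so the APC is 2.5 per unit of f, matching the intuition behind the question.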


Summary: the most important sentences, generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Jay Jones writes: I recently came across your paper on average predictive comparisons ( Gelman and Pardoe, 2007 ) and can see many applications for this in my work (I’m an applied statistician working for Weyerhaeuser Company at our R&D center near Seattle). [sent-1, score-0.41]

2 At the moment, I am using APCs to help describe the results of a hierarchical multi-species model we fit to bird occupancy (presence/absence) data collected in the Oregon Coast Range. [sent-2, score-0.34]

3 A question that came up in our study led me to consider whether the APC framework can be used for post-hoc combinations of inputs. [sent-3, score-0.327]

4 For example, let’s say that after calculating the APC for each individual input in our model, we would like to look at some linear function f of two inputs of interest, u1 and u2. [sent-4, score-0.532]

5 Naively, I would like to be able to plug this into the APC framework. [sent-5, score-0.094]

6 My questions are – Is such an approach valid within the APC framework? [sent-8, score-0.131]

7 Do you see any obvious (technical) issues that this naïve approach ignores? [sent-9, score-0.071]

8 Would this be expected to work for a general function f with multiple inputs of interest? [sent-10, score-0.329]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('apc', 0.627), ('sum', 0.293), ('inputs', 0.232), ('theta', 0.202), ('average', 0.188), ('sign', 0.116), ('interest', 0.114), ('framework', 0.113), ('brevity', 0.111), ('occupancy', 0.111), ('association', 0.107), ('pardoe', 0.1), ('omitting', 0.1), ('function', 0.097), ('bird', 0.097), ('comparisons', 0.094), ('plug', 0.094), ('numerator', 0.091), ('oregon', 0.091), ('seattle', 0.089), ('forest', 0.089), ('ignores', 0.087), ('jones', 0.087), ('combinations', 0.082), ('coast', 0.082), ('naively', 0.082), ('calculating', 0.078), ('denominator', 0.077), ('paths', 0.075), ('approach', 0.071), ('equation', 0.07), ('came', 0.07), ('jay', 0.069), ('input', 0.068), ('collected', 0.066), ('model', 0.066), ('intended', 0.062), ('study', 0.062), ('na', 0.062), ('cover', 0.061), ('valid', 0.06), ('moment', 0.06), ('steps', 0.059), ('types', 0.058), ('near', 0.058), ('dataset', 0.058), ('interpret', 0.058), ('look', 0.057), ('depends', 0.057), ('continuous', 0.056)]
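The simValue scores in the lists that follow are similarities between sparse tf-idf vectors like the one above. A minimal sketch of how such a score might be computed, assuming cosine similarity (typical for tf-idf pipelines, though the dump does not say which measure it used):

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two sparse tf-idf vectors stored as
    {word: weight} dicts. Returns 0.0 if either vector is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# A few of the (word, weight) pairs listed above for this blog:
this_blog = {'apc': 0.627, 'sum': 0.293, 'inputs': 0.232}
other = {'inputs': 0.232, 'theta': 0.202, 'model': 0.066}
print(round(cosine_sim(this_blog, this_blog), 6))  # 1.0
```

Self-similarity of 1.0 matches the near-1 "same-blog" simValue entries below.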

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 1346 andrew gelman stats-2012-05-27-Average predictive comparisons when changing a pair of variables


2 0.15033802 772 andrew gelman stats-2011-06-17-Graphical tools for understanding multilevel models

Introduction: There are a few things I want to do: 1. Understand a fitted model using tools such as average predictive comparisons , R-squared, and partial pooling factors . In defining these concepts, Iain and I came up with some clever tricks, including (but not limited to): - Separating the inputs and averaging over all possible values of the input not being altered (for average predictive comparisons); - Defining partial pooling without referring to a raw-data or maximum-likelihood or no-pooling estimate (these don’t necessarily exist when you’re fitting logistic regression with sparse data); - Defining an R-squared for each level of a multilevel model. The methods get pretty complicated, though, and they have some loose ends–in particular, for average predictive comparisons with continuous input variables. So now we want to implement these in R and put them into arm along with bglmer etc. 2. Setting up coefplot so it works more generally (that is, so the graphics look nice

3 0.13668701 1089 andrew gelman stats-2011-12-28-Path sampling for models of varying dimension

Introduction: Somebody asks: I’m reading your paper on path sampling. It essentially solves the problem of computing the ratio \int q0(omega)d omega/\int q1(omega) d omega. I.e., the arguments in q0() and q1() are the same. But this assumption is not always true in Bayesian model selection using Bayes factor. In general (for BF), we have this problem, t1 and t2 may have no relation at all. \int f1(y|t1)p1(t1) d t1 / \int f2(y|t2)p2(t2) d t2 As an example, suppose that we want to compare two sets of normally distributed data with known variance whether they have the same mean (H0) or they do not necessarily have the same mean (H1). Then the dummy variable should be mu in H0 (which is the common mean of both sets of samples), and should be (mu1, mu2) (which are the means for each set of samples). One straightforward method to address my problem is to perform path integration for the numerator and the denominator, as both the numerator and the denominator are integrals. Each integral can be rewrit
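The path-sampling idea referenced here can be sketched on a toy problem. This follows the general recipe of Gelman and Meng (1998), not the reader's Bayes-factor setup: connect two unnormalized densities q0 and q1 by a geometric path q_t = q0^(1-t) * q1^t and integrate E_t[d/dt log q_t(x)] over t in [0, 1] to estimate log(Z1/Z0). The densities below are unnormalized N(0,1) and N(3,1), so Z0 = Z1 and the true log ratio is 0:

```python
import numpy as np

rng = np.random.default_rng(3)

def u_dot(x):
    # For q0 = exp(-x^2/2), q1 = exp(-(x-3)^2/2), the path derivative is
    # d/dt log q_t(x) = (x^2 - (x - 3)^2) / 2 = 3x - 4.5, independent of t.
    return 3.0 * x - 4.5

# Along this geometric path, q_t is proportional to N(3t, 1), so we can
# draw exact samples at each grid point t instead of running MCMC.
ts = np.linspace(0.0, 1.0, 21)
means = np.array([u_dot(rng.normal(3.0 * t, 1.0, size=20_000)).mean()
                  for t in ts])
# Trapezoid rule over the grid estimates log(Z1/Z0), whose true value is 0.
log_ratio = np.sum(np.diff(ts) * (means[:-1] + means[1:]) / 2)
print(log_ratio)
```

The estimate comes out near zero up to Monte Carlo error, confirming the identity on a case where the answer is known.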

4 0.12919596 1941 andrew gelman stats-2013-07-16-Priors

Introduction: Nick Firoozye writes: While I am absolutely sympathetic to the Bayesian agenda I am often troubled by the requirement of having priors. We must have priors on the parameter of an infinite number of model we have never seen before and I find this troubling. There is a similarly troubling problem in economics of utility theory. Utility is on consumables. To be complete a consumer must assign utility to all sorts of things they never would have encountered. More recent versions of utility theory instead make consumption goods a portfolio of attributes. Cadillacs are x many units of luxury y of transport etc etc. And we can automatically have personal utilities to all these attributes. I don’t ever see parameters. Some model have few and some have hundreds. Instead, I see data. So I don’t know how to have an opinion on parameters themselves. Rather I think it far more natural to have opinions on the behavior of models. The prior predictive density is a good and sensible notion. Also

5 0.11866073 899 andrew gelman stats-2011-09-10-The statistical significance filter

Introduction: I’ve talked about this a bit but it’s never had its own blog entry (until now). Statistically significant findings tend to overestimate the magnitude of effects. This holds in general (because E(|x|) > |E(x)|) but even more so if you restrict to statistically significant results. Here’s an example. Suppose a true effect of theta is unbiasedly estimated by y ~ N(theta, 1). Further suppose that we will only consider statistically significant results, that is, cases in which |y| > 2. The estimate “|y| conditional on |y| > 2” is clearly an overestimate of |theta|. First off, if |theta| < 2, the estimate |y| conditional on statistical significance is not only too high in expectation, it’s always too high. This is a problem, given that in reality |theta| is probably less than 2. (The low-hanging fruit have already been picked, remember?) But even if |theta| > 2, the estimate |y| conditional on statistical significance will still be too high in expectation. For a discussion o
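The significance-filter claim is easy to check by simulation. A minimal sketch, with an illustrative theta = 1 (the specific values are ours, not from the post):

```python
import numpy as np

# y ~ N(theta, 1); keep only "significant" results |y| > 2 and compare
# E(|y| given |y| > 2) with |theta|. Since every retained |y| exceeds 2,
# the filtered estimate must overshoot any |theta| < 2.
rng = np.random.default_rng(0)
theta = 1.0
y = rng.normal(theta, 1.0, size=1_000_000)
significant = y[np.abs(y) > 2]
print(np.mean(np.abs(significant)))  # well above |theta| = 1
```

Rerunning with theta above 2 still shows an upward bias in expectation, as the passage says, though no longer on every draw.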

6 0.11326243 1462 andrew gelman stats-2012-08-18-Standardizing regression inputs

7 0.10833534 549 andrew gelman stats-2011-02-01-“Roughly 90% of the increase in . . .” Hey, wait a minute!

8 0.10770792 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model

9 0.10655305 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution

10 0.10494737 1062 andrew gelman stats-2011-12-16-Mr. Pearson, meet Mr. Mandelbrot: Detecting Novel Associations in Large Data Sets

11 0.098185644 1868 andrew gelman stats-2013-05-23-Validation of Software for Bayesian Models Using Posterior Quantiles

12 0.095573165 961 andrew gelman stats-2011-10-16-The “Washington read” and the algebra of conditional distributions

13 0.093647547 2117 andrew gelman stats-2013-11-29-The gradual transition to replicable science

14 0.091402233 2287 andrew gelman stats-2014-04-09-Advice: positive-sum, zero-sum, or negative-sum

15 0.090737388 858 andrew gelman stats-2011-08-17-Jumping off the edge of the world

16 0.088373765 1989 andrew gelman stats-2013-08-20-Correcting for multiple comparisons in a Bayesian regression model

17 0.086059034 99 andrew gelman stats-2010-06-19-Paired comparisons

18 0.08530321 1130 andrew gelman stats-2012-01-20-Prior beliefs about locations of decision boundaries

19 0.084519394 870 andrew gelman stats-2011-08-25-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests

20 0.082807943 1476 andrew gelman stats-2012-08-30-Stan is fast


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.139), (1, 0.08), (2, 0.036), (3, -0.021), (4, 0.036), (5, -0.003), (6, 0.003), (7, -0.012), (8, 0.009), (9, 0.006), (10, -0.009), (11, 0.015), (12, -0.013), (13, -0.021), (14, -0.025), (15, 0.005), (16, 0.016), (17, -0.02), (18, -0.003), (19, -0.013), (20, 0.033), (21, 0.027), (22, 0.041), (23, -0.046), (24, 0.037), (25, 0.003), (26, -0.027), (27, -0.01), (28, 0.04), (29, 0.012), (30, 0.014), (31, 0.015), (32, 0.003), (33, 0.011), (34, 0.027), (35, 0.018), (36, 0.043), (37, 0.021), (38, -0.039), (39, 0.004), (40, 0.061), (41, 0.036), (42, -0.035), (43, -0.019), (44, -0.054), (45, -0.029), (46, 0.036), (47, 0.042), (48, -0.003), (49, 0.007)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.93939817 1346 andrew gelman stats-2012-05-27-Average predictive comparisons when changing a pair of variables


2 0.79412353 1089 andrew gelman stats-2011-12-28-Path sampling for models of varying dimension


3 0.74872786 1062 andrew gelman stats-2011-12-16-Mr. Pearson, meet Mr. Mandelbrot: Detecting Novel Associations in Large Data Sets

Introduction: Jeremy Fox asks what I think about this paper by David N. Reshef, Yakir Reshef, Hilary Finucane, Sharon Grossman, Gilean McVean, Peter Turnbaugh, Eric Lander, Michael Mitzenmacher, and Pardis Sabeti which proposes a new nonlinear R-squared-like measure. My quick answer is that it looks really cool! From my quick reading of the paper, it appears that the method reduces on average to the usual R-squared when fit to data of the form y = a + bx + error, and that it also has a similar interpretation when “a + bx” is replaced by other continuous functions. Unlike R-squared, the method of Reshef et al. depends on a tuning parameter that controls the level of discretization, in a “How long is the coast of Britain” sort of way. The dependence on scale is inevitable for such a general method. Just consider: if you sample 1000 points from the unit bivariate normal distribution, (x,y) ~ N(0,I), you’ll be able to fit them perfectly by a 999-degree polynomial fit to the data. So the sca
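The 999-degree-polynomial argument scales down to a size we can run directly. A sketch with hypothetical data, n = 10 instead of 1000, doing exact interpolation via a Vandermonde solve:

```python
import numpy as np

# Any n points with distinct x values can be interpolated exactly by a
# degree n-1 polynomial, so a "perfect fit" is meaningless without
# controlling model complexity or discretization scale.
rng = np.random.default_rng(1)
n = 10
x, y = rng.normal(size=n), rng.normal(size=n)
V = np.vander(x, n)               # Vandermonde matrix, columns x^(n-1)..x^0
coefs = np.linalg.solve(V, y)     # exact interpolation, not regression
residuals = y - V @ coefs
print(np.max(np.abs(residuals)))  # numerically ~0
```

This is why any general dependence measure, like the one of Reshef et al., has to carry a tuning parameter that caps the effective resolution.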

4 0.73134232 996 andrew gelman stats-2011-11-07-Chi-square FAIL when many cells have small expected values

Introduction: William Perkins, Mark Tygert, and Rachel Ward write : If a discrete probability distribution in a model being tested for goodness-of-fit is not close to uniform, then forming the Pearson χ2 statistic can involve division by nearly zero. This often leads to serious trouble in practice — even in the absence of round-off errors . . . The problem is not merely that the chi-squared statistic doesn’t have the advertised chi-squared distribution — a reference distribution can always be computed via simulation, either using the posterior predictive distribution or by conditioning on a point estimate of the cell expectations and then making a degrees-of-freedom sort of adjustment. Rather, the problem is that, when there are lots of cells with near-zero expectation, the chi-squared test is mostly noise. And this is not merely a theoretical problem. It comes up in real examples. Here’s one, taken from the classic 1992 genetics paper of Guo and Thompson: And here are the e
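The simulation-based reference distribution mentioned in the passage can be sketched directly, conditioning on a point estimate of the cell expectations. The probabilities below are made up, chosen so that several cells have near-zero expected counts:

```python
import numpy as np

def pearson_x2(obs, exp):
    # Pearson chi-square statistic; tiny `exp` cells make this mostly noise.
    return np.sum((obs - exp) ** 2 / exp)

rng = np.random.default_rng(2)
probs = np.array([0.50, 0.30, 0.10, 0.05, 0.03, 0.02])  # skewed cells
n = 50
exp = n * probs                      # expectations as small as 1
obs = rng.multinomial(n, probs)
stat = pearson_x2(obs, exp)

# Null reference distribution by simulation, no chi-squared table needed.
sims = np.array([pearson_x2(rng.multinomial(n, probs), exp)
                 for _ in range(5000)])
p_value = np.mean(sims >= stat)
print(p_value)
```

Comparing a histogram of `sims` to the nominal chi-squared density with 5 degrees of freedom shows how far the advertised distribution drifts when small cells dominate, which is the authors' point about noise.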

5 0.72781116 2176 andrew gelman stats-2014-01-19-Transformations for non-normal data

Introduction: Steve Peterson writes: I recently submitted a proposal on applying a Bayesian analysis to gender comparisons on motivational constructs. I had an idea on how to improve the model I used and was hoping you could give me some feedback. The data come from a survey based on 5-point Likert scales. Different constructs are measured for each student as scores derived from averaging a student’s responses on particular subsets of survey questions. (I suppose it is not uncontroversial to treat these scores as interval measures and would be interested to hear if you have any objections.) I am comparing genders on each construct. Researchers typically use t-tests to do so. To use a Bayesian approach I applied the programs written in R and JAGS by John Kruschke for estimating the difference of means: http://www.indiana.edu/~kruschke/BEST/ An issue in that analysis is that the distributions of student scores are not normal. There was skewness in some of the distributions and not always in

6 0.69685048 1527 andrew gelman stats-2012-10-10-Another reason why you can get good inferences from a bad model

7 0.69625229 1047 andrew gelman stats-2011-12-08-I Am Too Absolutely Heteroskedastic for This Probit Model

8 0.69073421 1918 andrew gelman stats-2013-06-29-Going negative

9 0.68852586 1881 andrew gelman stats-2013-06-03-Boot

10 0.68142217 858 andrew gelman stats-2011-08-17-Jumping off the edge of the world

11 0.67969173 2128 andrew gelman stats-2013-12-09-How to model distributions that have outliers in one direction

12 0.67838335 2311 andrew gelman stats-2014-04-29-Bayesian Uncertainty Quantification for Differential Equations!

13 0.67641687 1230 andrew gelman stats-2012-03-26-Further thoughts on nonparametric correlation measures

14 0.67511815 1221 andrew gelman stats-2012-03-19-Whassup with deviance having a high posterior correlation with a parameter in the model?

15 0.67158616 729 andrew gelman stats-2011-05-24-Deviance as a difference

16 0.66777253 1178 andrew gelman stats-2012-02-21-How many data points do you really have?

17 0.66765302 804 andrew gelman stats-2011-07-15-Static sensitivity analysis

18 0.66629678 775 andrew gelman stats-2011-06-21-Fundamental difficulty of inference for a ratio when the denominator could be positive or negative

19 0.66565752 938 andrew gelman stats-2011-10-03-Comparing prediction errors

20 0.66357338 303 andrew gelman stats-2010-09-28-“Genomics” vs. genetics


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.022), (5, 0.013), (15, 0.046), (16, 0.07), (21, 0.014), (24, 0.132), (39, 0.012), (41, 0.011), (52, 0.013), (53, 0.012), (55, 0.012), (70, 0.125), (79, 0.01), (86, 0.03), (96, 0.012), (99, 0.343)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.98448896 1979 andrew gelman stats-2013-08-13-Convincing Evidence

Introduction: Keith O’Rourke and I wrote an article that begins: Textbooks on statistics emphasize care and precision, via concepts such as reliability and validity in measurement, random sampling and treatment assignment in data collection, and causal identification and bias in estimation. But how do researchers decide what to believe and what to trust when choosing which statistical methods to use? How do they decide the credibility of methods? Statisticians and statistical practitioners seem to rely on a sense of anecdotal evidence based on personal experience and on the attitudes of trusted colleagues. Authorship, reputation, and past experience are thus central to decisions about statistical procedures. It’s for a volume on theoretical or methodological research on authorship, functional roles, reputation, and credibility in social media, edited by Sorin Matei and Elisa Bertino.

2 0.97943074 1329 andrew gelman stats-2012-05-18-Those mean psychologists, making fun of dodgy research!

Introduction: Two people separately sent me this amusing mock-research paper by Brian A. Nosek (I assume that’s what’s meant by “Arina K. Bones”). The article is pretty funny, but this poster (by Nosek and Samuel Gosling) is even better! Check it out: I remarked that this was almost as good as my zombies paper, and my correspondent pointed me to this page of (I assume) Nosek’s research on aliens. P.S. I clicked through to take the test to see if I’m dead or alive, but I got bored after a few minutes. I gotta say, if Gosling can come up with a 10-item measure of the Big Five, this crew should be able to come up with a reasonably valid alive-or-dead test that doesn’t require dozens and dozens of questions!

3 0.97011095 1657 andrew gelman stats-2013-01-06-Lee Nguyen Tran Kim Song Shimazaki

Introduction: Andrew Lee writes: I am a recent M.A. graduate in sociology. I am primarily qualitative in method but have been moving in a more mixed-methods direction ever since I discovered sports analytics (Moneyball, Football Outsiders, Wages of Wins, etc.). For my thesis I studied Korean-Americans in education in the health professions through a comparison of Asian ethnic representation in Los Angeles-area medical and dental schools. I did this by counting up different Asian ethnic groups at UC Irvine, USC and Loma Linda University’s medical/dental schools using surnames as an identifier (I coded for ethnicity using an algorithm from the North American Association of Central Cancer Registries which correlated surnames with ethnicity: http://www.naaccr.org/Research/DataAnalysisTools.aspx). The coding was mostly easy, since “Nguyen” and “Tran” is always Vietnamese, “Kim” and “Song” is Korean, “Shimazaki” is Japanese, etc. Now, the first time around I found that Chinese-Americans and

4 0.96920508 116 andrew gelman stats-2010-06-29-How to grab power in a democracy – in 5 easy non-violent steps

Introduction: In the past decades violent means of grabbing power have been discredited and internationally regulated. Still, grabbing power is as desired as it has always been, and I’d like to introduce some new methods used today: Establish your base of power by achieving a critical mass (75%+) within a group with a high barrier to entry . Examples of barriers to entry: genetics (familiar ties, skin, eye color, hair type – takes 2+ generations to enter), religion (takes 2-10 years to enter), language (very hard to enter after the age of 10). Encourage your followers to have many children – because of common ethical concerns, other groups will help you bring them up. Control the system of indoctrination , such as religious schooling, government-based educational system, entertainment, popular culture – limiting the loss of children to out-group (only needed for non-genetic barriers to entry). Wait 18 years for your followers’ children to become eligible to vote. Win elections by

same-blog 5 0.96847194 1346 andrew gelman stats-2012-05-27-Average predictive comparisons when changing a pair of variables


6 0.96259081 1097 andrew gelman stats-2012-01-03-Libertarians in Space

7 0.95490366 982 andrew gelman stats-2011-10-30-“There’s at least as much as an 80 percent chance . . .”

8 0.94668537 2244 andrew gelman stats-2014-03-11-What if I were to stop publishing in journals?

9 0.94664866 1266 andrew gelman stats-2012-04-16-Another day, another plagiarist

10 0.94628108 1061 andrew gelman stats-2011-12-16-CrossValidated: A place to post your statistics questions

11 0.94489098 106 andrew gelman stats-2010-06-23-Scientists can read your mind . . . as long as the’re allowed to look at more than one place in your brain and then make a prediction after seeing what you actually did

12 0.94453406 2232 andrew gelman stats-2014-03-03-What is the appropriate time scale for blogging—the day or the week?

13 0.94414175 2006 andrew gelman stats-2013-09-03-Evaluating evidence from published research

14 0.94370019 1163 andrew gelman stats-2012-02-12-Meta-analysis, game theory, and incentives to do replicable research

15 0.94216454 2245 andrew gelman stats-2014-03-12-More on publishing in journals

16 0.94204581 1848 andrew gelman stats-2013-05-09-A tale of two discussion papers

17 0.94201809 1435 andrew gelman stats-2012-07-30-Retracted articles and unethical behavior in economics journals?

18 0.94152993 1750 andrew gelman stats-2013-03-05-Watership Down, thick description, applied statistics, immutability of stories, and playing tennis with a net

19 0.9415279 1588 andrew gelman stats-2012-11-23-No one knows what it’s like to be the bad man

20 0.94146121 1529 andrew gelman stats-2012-10-11-Bayesian brains?