
547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution


meta info for this blog

Source: html

Introduction: Mike McLaughlin writes: Consider the Seeds example in vol. 1 of the BUGS examples. There, a binomial likelihood has a p parameter constructed, via logit, from two covariates. What I am wondering is: Would it be legitimate, in a binomial + logit problem like this, to allow binomial p[i] to be a function of the corresponding n[i] or would that amount to using the data in the prior? In other words, in the context of the Seeds example, is r[] the only data or is n[] data as well and therefore not permissible in a prior formulation? I [McLaughlin] currently have a model with a common beta prior for all p[i] but would like to mitigate this commonality (a kind of James-Stein effect) when there are lots of observations for some i. But this seems to feed the data back into the prior. Does it really? It also occurs to me [McLaughlin] that, perhaps, a binomial likelihood is not the one to use here (not flexible enough). My reply: Strictly speaking, “n” is data, and so what you want is a likelihood function p(y,n|theta), where theta represents all the parameters in the model. In a binomial-type example, it would make sense to factor the likelihood as p(y|n,theta)*p(n|theta). Or, to make this even clearer: p(y|n,theta_1)*p(n|theta_2), where theta_1 are the parameters of the binomial distribution (or whatever generalization you’re using) and theta_2 are the parameters involving n. In any case, the next step is the prior distribution, p(theta_1,theta_2). Prior dependence between theta_1 and theta_2 induces a model of the form that you’re talking about. In practice, I think it can be reasonable to simplify a bit and write p(y|n,theta) and then use a prior of the form p(theta|n). We discuss this sort of thing in the first or second section of the regression chapter in BDA. Whether you treat n as data to be modeled or data to be conditioned on, either way you can put dependence with theta into the model.
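
To make the reply concrete, here is a minimal sketch in Python of the factored likelihood p(y,n|theta) = p(y|n,theta_1)*p(n|theta_2). The Poisson model for n, the covariate structure, and all numerical values are illustrative assumptions; the post itself does not specify a model for n.

```python
# A minimal sketch of the factored likelihood
#   p(y, n | theta) = p(y | n, theta_1) * p(n | theta_2).
# The Poisson model for n and every number below are assumptions
# made for illustration; the post does not specify a model for n.
import numpy as np
from scipy.special import expit  # inverse logit
from scipy.stats import binom, poisson

def joint_log_lik(y, n, X, beta, lam):
    """log p(y, n | theta_1, theta_2) with theta_1 = beta, theta_2 = lam."""
    p = expit(X @ beta)                    # binomial p via logit link
    ll_y = binom.logpmf(y, n, p).sum()     # log p(y | n, theta_1)
    ll_n = poisson.logpmf(n, lam).sum()    # log p(n | theta_2)
    return ll_y + ll_n

# Toy data shaped like the Seeds example: y[i] successes out of n[i],
# with p[i] built from two covariates through a logit link.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(10), rng.normal(size=10)])
n = rng.poisson(40.0, size=10)
y = rng.binomial(n, expit(X @ np.array([-0.5, 1.0])))
print(joint_log_lik(y, n, X, beta=np.array([-0.5, 1.0]), lam=40.0))
```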


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Mike McLaughlin writes: Consider the Seeds example in vol. 1 of the BUGS examples. [sent-1, score-0.06]

2 There, a binomial likelihood has a p parameter constructed, via logit, from two covariates. [sent-3, score-0.747]

3 What I am wondering is: Would it be legitimate, in a binomial + logit problem like this, to allow binomial p[i] to be a function of the corresponding n[i] or would that amount to using the data in the prior? [sent-4, score-1.522]

4 In other words, in the context of the Seeds example, is r[] the only data or is n[] data as well and therefore not permissible in a prior formulation? [sent-5, score-0.538]

5 I [McLaughlin] currently have a model with a common beta prior for all p[i] but would like to mitigate this commonality (a kind of James-Stein effect) when there are lots of observations for some i. [sent-6, score-0.812]

6 But this seems to feed the data back into the prior. [sent-7, score-0.204]

7 It also occurs to me [McLaughlin] that, perhaps, a binomial likelihood is not the one to use here (not flexible enough). [sent-9, score-0.8]

8 My reply: Strictly speaking, “n” is data, and so what you want is a likelihood function p(y,n|theta), where theta represents all the parameters in the model. [sent-10, score-0.9]

9 In a binomial-type example, it would make sense to factor the likelihood as p(y|n,theta)*p(n|theta). [sent-11, score-0.324]

10 Or, to make this even clearer: p(y|n,theta_1)*p(n|theta_2), where theta_1 are the parameters of the binomial distribution (or whatever generalization you’re using) and theta_2 are the parameters involving n. [sent-12, score-0.951]

11 In any case, the next step is the prior distribution, p(theta_1,theta_2). [sent-14, score-0.247]

12 Prior dependence between theta_1 and theta_2 induces a model of the form that you’re talking about. [sent-15, score-0.445]

13 In practice, I think it can be reasonable to simplify a bit and write p(y|n,theta) and then use a prior of the form p(theta|n). [sent-16, score-0.443]

14 We discuss this sort of thing in the first or second section of the regression chapter in BDA. [sent-17, score-0.053]

15 Whether you treat n as data to be modeled or data to be conditioned on, either way you can put dependence with theta into the model. [sent-18, score-1.018]
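
Sentences 12-13 above point to two ways of putting n into the prior: dependence within a joint prior p(theta_1,theta_2), or the simpler conditional form p(theta|n). The sketch below illustrates the second route with a hypothetical scale function tau(n): the prior on each group's logit-scale parameter widens with n[i], relaxing the James-Stein-style shrinkage that McLaughlin wants to mitigate. The functional form of tau and all constants are assumptions for illustration, not from the post or BDA.

```python
# A hypothetical conditional prior p(theta | n): each group-level
# parameter theta[i] gets a Normal prior whose scale widens with
# log n[i], so heavily observed groups are shrunk less toward the
# common mean mu. tau0 and c are assumed illustrative constants.
import numpy as np
from scipy.stats import norm

def log_prior_theta_given_n(theta, n, mu=0.0, tau0=0.5, c=0.25):
    """log p(theta | n) = sum_i log Normal(theta[i] | mu, tau(n[i]))."""
    tau = tau0 * (1.0 + c * np.log(n))  # wider prior when n[i] is large
    return norm.logpdf(theta, loc=mu, scale=tau).sum()

# Adding this to log p(y | n, theta) from a binomial-logit likelihood
# gives the log posterior (up to a constant), conditioning on n rather
# than modeling it.
```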


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('binomial', 0.428), ('theta', 0.369), ('mclaughlin', 0.328), ('seeds', 0.266), ('prior', 0.247), ('likelihood', 0.214), ('dependence', 0.169), ('logit', 0.153), ('parameters', 0.146), ('commonality', 0.121), ('data', 0.115), ('mitigate', 0.114), ('induces', 0.114), ('function', 0.107), ('simplify', 0.106), ('conditioned', 0.106), ('vectors', 0.1), ('form', 0.09), ('feed', 0.089), ('formulation', 0.087), ('distribution', 0.084), ('constructed', 0.084), ('generalization', 0.084), ('strictly', 0.082), ('beta', 0.082), ('flexible', 0.079), ('occurs', 0.079), ('modeled', 0.077), ('clearer', 0.077), ('legitimate', 0.073), ('bugs', 0.073), ('model', 0.072), ('mike', 0.071), ('corresponding', 0.069), ('treat', 0.067), ('observations', 0.067), ('represents', 0.064), ('involving', 0.063), ('therefore', 0.061), ('example', 0.06), ('speaking', 0.06), ('factor', 0.057), ('wondering', 0.057), ('currently', 0.056), ('allow', 0.056), ('amount', 0.056), ('would', 0.053), ('via', 0.053), ('section', 0.053), ('parameter', 0.052)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution


2 0.31629992 1941 andrew gelman stats-2013-07-16-Priors

Introduction: Nick Firoozye writes: While I am absolutely sympathetic to the Bayesian agenda, I am often troubled by the requirement of having priors. We must have priors on the parameters of an infinite number of models we have never seen before, and I find this troubling. There is a similarly troubling problem in economics with utility theory. Utility is on consumables. To be complete, a consumer must assign utility to all sorts of things they never would have encountered. More recent versions of utility theory instead make consumption goods a portfolio of attributes: Cadillacs are x many units of luxury, y of transport, etc. And we can automatically have personal utilities for all these attributes. I don’t ever see parameters. Some models have few and some have hundreds. Instead, I see data. So I don’t know how to have an opinion on parameters themselves. Rather, I think it far more natural to have opinions on the behavior of models. The prior predictive density is a good and sensible notion. Also…

3 0.25436729 1089 andrew gelman stats-2011-12-28-Path sampling for models of varying dimension

Introduction: Somebody asks: I’m reading your paper on path sampling. It essentially solves the problem of computing the ratio $\int q_0(\omega)\,d\omega \,/\, \int q_1(\omega)\,d\omega$, i.e., the arguments of q0() and q1() are the same. But this assumption is not always true in Bayesian model selection using the Bayes factor. In general (for a Bayes factor) we have the problem that t1 and t2 may have no relation at all: $\int f_1(y|t_1)\,p_1(t_1)\,dt_1 \,/\, \int f_2(y|t_2)\,p_2(t_2)\,dt_2$. As an example, suppose we want to compare two sets of normally distributed data with known variance, testing whether they have the same mean (H0) or do not necessarily have the same mean (H1). Then the dummy variable should be mu in H0 (the common mean of both sets of samples), and (mu1, mu2) in H1 (the means of each set of samples). One straightforward method to address my problem is to perform path integration for the numerator and the denominator, as both are integrals. Each integral can be rewritten…

4 0.22466742 1130 andrew gelman stats-2012-01-20-Prior beliefs about locations of decision boundaries

Introduction: Forest Gregg writes: I want to incorporate a prior belief into an estimation of a logistic regression classifier of points distributed in a 2d space. My prior belief is a funny kind of prior though. It’s a belief about where the decision boundary between classes should fall. Over the 2d space, I lay a grid, and I believe that a decision boundary that separates any two classes should fall along any of the grid lines with some probability, and that the decision boundary should fall anywhere except a gridline with a much lower probability. For the two-class case, and a logistic regression model parameterized by W and data X, my prior could perhaps be expressed Pr(W) = (normalizing constant)/exp(d), where d = f(grid, W, X) such that when logistic(W^T X) = 0.5 and X is ‘far’ from grid lines, then d is large. Have you ever seen a model like this, or do you have any notions about a good avenue to pursue? My real data consist of geocoded Craigslist postings that are labeled with the…

5 0.22426358 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence

Introduction: I’ve had a couple of email conversations in the past couple days on dependence in multivariate prior distributions. Modeling the degrees of freedom and scale parameters in the t distribution: First, in our Stan group we’ve been discussing the choice of priors for the degrees-of-freedom parameter in the t distribution. I wrote that there’s also the question of parameterization. It does not necessarily make sense to have independent priors on the df and scale parameters. In some sense, the meaning of the scale parameter changes with the df. Prior dependence between correlation and scale parameters in the scaled inverse-Wishart model: The second case of parameterization in prior distributions arose from an email I received from Chris Chatham pointing me to this exploration by Matt Simpson of the scaled inverse-Wishart prior distribution for hierarchical covariance matrices. Simpson writes: A popular prior for Σ is the inverse-Wishart distribution [not the same as the…

6 0.21642388 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability

7 0.19914573 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors

8 0.19747934 961 andrew gelman stats-2011-10-16-The “Washington read” and the algebra of conditional distributions

9 0.18168378 696 andrew gelman stats-2011-05-04-Whassup with glm()?

10 0.18123178 398 andrew gelman stats-2010-11-06-Quote of the day

11 0.18072797 899 andrew gelman stats-2011-09-10-The statistical significance filter

12 0.18053688 1046 andrew gelman stats-2011-12-07-Neutral noninformative and informative conjugate beta and gamma prior distributions

13 0.17923489 1868 andrew gelman stats-2013-05-23-Validation of Software for Bayesian Models Using Posterior Quantiles

14 0.17623186 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

15 0.16892265 1155 andrew gelman stats-2012-02-05-What is a prior distribution?

16 0.16605541 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization

17 0.15907985 833 andrew gelman stats-2011-07-31-Untunable Metropolis

18 0.15127753 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model

19 0.15121856 1516 andrew gelman stats-2012-09-30-Computational problems with glm etc.

20 0.14953956 246 andrew gelman stats-2010-08-31-Somewhat Bayesian multilevel modeling


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.168), (1, 0.209), (2, 0.034), (3, 0.068), (4, 0.007), (5, -0.068), (6, 0.157), (7, -0.003), (8, -0.112), (9, 0.028), (10, -0.002), (11, 0.017), (12, 0.03), (13, -0.055), (14, -0.053), (15, -0.005), (16, -0.029), (17, -0.028), (18, 0.033), (19, -0.051), (20, 0.074), (21, -0.004), (22, 0.022), (23, -0.06), (24, -0.001), (25, 0.025), (26, 0.006), (27, -0.025), (28, 0.073), (29, 0.044), (30, -0.019), (31, 0.012), (32, -0.051), (33, 0.04), (34, -0.011), (35, 0.046), (36, -0.022), (37, 0.001), (38, -0.061), (39, 0.069), (40, 0.073), (41, 0.068), (42, -0.091), (43, -0.025), (44, 0.014), (45, 0.011), (46, 0.096), (47, 0.018), (48, -0.014), (49, 0.06)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.94845021 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution


2 0.90257215 1089 andrew gelman stats-2011-12-28-Path sampling for models of varying dimension


3 0.88478845 1130 andrew gelman stats-2012-01-20-Prior beliefs about locations of decision boundaries


4 0.81845915 669 andrew gelman stats-2011-04-19-The mysterious Gamma (1.4, 0.4)

Introduction: A student writes: I have a question about an earlier recommendation of yours on the selection of the prior distribution for the precision hyperparameter of a normal distribution, and a reference for the recommendation. If I recall correctly, I have read that you suggested using Gamma(1.4, 0.4) instead of Gamma(0.01, 0.01) for the prior distribution of the precision hyperparameter of a normal distribution. I would very much appreciate it if you had the time to point me to this publication of yours. The reason is that I have used this prior distribution (Gamma(1.4, 0.4)) in a study which we are now revising for publication, and where a reviewer questioned the choice of the distribution (claiming that it is too informative!). I am well aware that in recent publications (Prior distributions for variance parameters in hierarchical models, Bayesian Analysis; Data Analysis Using Regression and Multilevel/Hierarchical Models) you suggest modeling the precision as pow(standard deviation…

5 0.81727874 1941 andrew gelman stats-2013-07-16-Priors


6 0.79577351 858 andrew gelman stats-2011-08-17-Jumping off the edge of the world

7 0.78190309 160 andrew gelman stats-2010-07-23-Unhappy with improvement by a factor of 10^29

8 0.78164482 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors

9 0.75967646 442 andrew gelman stats-2010-12-01-bayesglm in Stata?

10 0.75948226 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence

11 0.75600201 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability

12 0.75409079 2128 andrew gelman stats-2013-12-09-How to model distributions that have outliers in one direction

13 0.74537951 1465 andrew gelman stats-2012-08-21-D. Buggin

14 0.72347307 1221 andrew gelman stats-2012-03-19-Whassup with deviance having a high posterior correlation with a parameter in the model?

15 0.72126919 1460 andrew gelman stats-2012-08-16-“Real data can be a pain”

16 0.71786624 398 andrew gelman stats-2010-11-06-Quote of the day

17 0.70849425 519 andrew gelman stats-2011-01-16-Update on the generalized method of moments

18 0.681027 2208 andrew gelman stats-2014-02-12-How to think about “identifiability” in Bayesian inference?

19 0.67616159 184 andrew gelman stats-2010-08-04-That half-Cauchy prior

20 0.67414898 1946 andrew gelman stats-2013-07-19-Prior distributions on derived quantities rather than on parameters themselves


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(15, 0.016), (16, 0.076), (21, 0.02), (24, 0.269), (30, 0.01), (41, 0.012), (53, 0.253), (74, 0.014), (86, 0.022), (89, 0.038), (99, 0.154)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.87425339 1589 andrew gelman stats-2012-11-25-Life as a blogger: the emails just get weirder and weirder

Introduction: In the email the other day, subject line “Casting blogger, writer, journalist to host cable series”: Hi there Andrew, I’m casting a male journalist, writer, blogger, documentary filmmaker or comedian with a certain type of personality for a television pilot, along with production company Pipeline39. See below: A certain type of character – no cockiness, no ego, a person who is smart, savvy, dry humor, but someone who isn’t imposing, who can infiltrate these organizations. This person will be hosting his own show and covering alternative lifestyles and secret societies around the world. If you’re interested in hearing more or would like to be considered for this project, please email me a photo and a bio of yourself, along with contact information. I’ll respond to you ASAP. I’m looking forward to hearing from you. *** Casting Producer (646) ***.**** ***@gmail.com I was with them until I got to the “no ego” part. . . . Also, I don’t think I could infiltrate any org…

same-blog 2 0.87334824 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution


3 0.84917754 1856 andrew gelman stats-2013-05-14-GPstuff: Bayesian Modeling with Gaussian Processes

Introduction: I think it’s part of my duty as a blogger to intersperse, along with the steady flow of jokes, rants, and literary criticism, some material that will actually be useful to you. So here goes. Jarno Vanhatalo, Jaakko Riihimäki, Jouni Hartikainen, Pasi Jylänki, Ville Tolvanen, and Aki Vehtari write: The GPstuff toolbox is a versatile collection of Gaussian process models and computational tools required for Bayesian inference. The tools include, among others, various inference methods, sparse approximations and model assessment methods. We can actually now fit Gaussian processes in Stan. But for big problems (or even moderately-sized problems), full Bayes can be slow. GPstuff uses EP, which is faster. At some point we’d like to implement EP in Stan. (Right now we’re working with Dave Blei to implement VB.) GPstuff really works. I saw Aki use it to fit a nonparametric version of the Bangladesh well-switching example in ARM. He was sitting in his office and just whip…

4 0.82012737 1047 andrew gelman stats-2011-12-08-I Am Too Absolutely Heteroskedastic for This Probit Model

Introduction: Soren Lorensen wrote: I’m working on a project that uses a binary choice model on panel data. Since I have panel data and am using MLE, I’m concerned about heteroskedasticity making my estimates inconsistent and biased. Are you familiar with any statistical packages with pre-built tests for heteroskedasticity in binary choice ML models? If not, is there value in cutting my data into groups over which I guess the error variance might vary and eyeballing residual plots? Have you other suggestions about how I might resolve this concern? I replied that I wouldn’t worry so much about heteroskedasticity. Breaking up the data into pieces might make sense, but for the purpose of estimating how the coefficients might vary—that is, nonlinearity and interactions. Soren shot back: I’m somewhat puzzled, however: homoskedasticity is an identifying assumption in estimating a probit model; if we don’t have it, all sorts of bad things can happen to our parameter estimates. Do you suggest n…

5 0.80664903 991 andrew gelman stats-2011-11-04-Insecure researchers aren’t sharing their data

Introduction: Jelte Wicherts writes: I thought you might be interested in reading this paper that is to appear this week in PLoS ONE. In it we [Wicherts, Marjan Bakker, and Dylan Molenaar] show that the willingness to share data from published psychological research is associated both with “the strength of the evidence” (against H0) and with the prevalence of errors in the reporting of p-values. The issue of data archiving will likely be put on the agenda of granting bodies and the APA/APS because of what Diederik Stapel did. I hate hate hate hate hate when people don’t share their data. In fact, that’s the subject of my very first column on ethics for Chance magazine. I have a story from 22 years ago, when I contacted some scientists and showed them how I could reanalyze their data more efficiently (based on a preliminary analysis of their published summary statistics). They seemed to feel threatened by the suggestion and refused to send me their raw data. (It was an animal experiment…

6 0.80571449 298 andrew gelman stats-2010-09-27-Who is that masked person: The use of face masks on Mexico City public transportation during the Influenza A (H1N1) outbreak

7 0.79990792 1905 andrew gelman stats-2013-06-18-There are no fat sprinters

8 0.799649 1677 andrew gelman stats-2013-01-16-Greenland is one tough town

9 0.79341495 687 andrew gelman stats-2011-04-29-Zero is zero

10 0.78965712 1555 andrew gelman stats-2012-10-31-Social scientists who use medical analogies to explain causal inference are, I think, implicitly trying to borrow some of the scientific and cultural authority of that field for our own purposes

11 0.78536367 1851 andrew gelman stats-2013-05-11-Actually, I have no problem with this graph

12 0.78446376 413 andrew gelman stats-2010-11-14-Statistics of food consumption

13 0.78203619 1258 andrew gelman stats-2012-04-10-Why display 6 years instead of 30?

14 0.78085989 1468 andrew gelman stats-2012-08-24-Multilevel modeling and instrumental variables

15 0.77721548 1858 andrew gelman stats-2013-05-15-Reputations changeable, situations tolerable

16 0.7746762 2076 andrew gelman stats-2013-10-24-Chasing the noise: W. Edwards Deming would be spinning in his grave

17 0.77383411 795 andrew gelman stats-2011-07-10-Aleks says this is the future of visualization

18 0.77026373 241 andrew gelman stats-2010-08-29-Ethics and statistics in development research

19 0.76873827 446 andrew gelman stats-2010-12-03-Is 0.05 too strict as a p-value threshold?

20 0.76765692 482 andrew gelman stats-2010-12-23-Capitalism as a form of voluntarism