
1130 andrew gelman stats-2012-01-20-Prior beliefs about locations of decision boundaries


meta info for this blog

Source: html

Introduction: Forest Gregg writes: I want to incorporate a prior belief into an estimation of a logistic regression classifier of points distributed in a 2d space. My prior belief is a funny kind of prior, though. It’s a belief about where the decision boundary between classes should fall. Over the 2d space, I lay a grid, and I believe that a decision boundary that separates any two classes should fall along one of the grid lines with some probability, and that the decision boundary should fall anywhere except a grid line with a much lower probability. For the two-class case, and a logistic regression model parameterized by W and data X, my prior could perhaps be expressed Pr(W) = (normalizing constant)/exp(d), where d = f(grid,W,X) such that when logistic(W^TX) = .5 and X is ‘far’ from grid lines, then d is large. Have you ever seen a model like this, or do you have any notions about a good avenue to pursue? My real data consist of geocoded Craigslist postings that are labeled with the neighborhood claimed by the poster, and I have a strong belief that decision boundaries separating neighborhoods (as classes) should fall along streets, railroad embankments, parks, and the river.
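The distance function f is left abstract in the question. As a concrete illustration only, here is a minimal Python sketch of one way d = f(grid, W, X) could be computed, assuming an axis-aligned grid with uniform spacing and a decision boundary sampled over a fixed window; all function names here are hypothetical, not from the original post.

import numpy as np

def boundary_points(w, n=200, xlim=(0.0, 10.0)):
    # The decision boundary of a 2d logistic regression, logistic(W^T X) = 0.5,
    # is the line w[0] + w[1]*x + w[2]*y = 0 (w[0] is the intercept).
    x = np.linspace(xlim[0], xlim[1], n)
    y = -(w[0] + w[1] * x) / w[2]          # assumes w[2] != 0
    return np.column_stack([x, y])

def grid_distance(points, spacing=1.0):
    # Distance from each point to the nearest grid line, with grid lines
    # at integer multiples of `spacing` in each coordinate.
    r = np.remainder(points, spacing)
    d = np.minimum(r, spacing - r)
    return d.min(axis=1)

def grid_penalty(w, spacing=1.0):
    # d = f(grid, W, X): large when the boundary sits far from grid lines,
    # so Pr(W) = (normalizing constant)/exp(d) is correspondingly small.
    return grid_distance(boundary_points(w), spacing).mean()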


Summary: the most important sentences, generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Forest Gregg writes: I want to incorporate a prior belief into an estimation of a logistic regression classifier of points distributed in a 2d space. [sent-1, score-0.993]

2 It’s a belief about where the decision boundary between classes should fall. [sent-3, score-0.88]

3 Over the 2d space, I lay a grid, and I believe that a decision boundary that separates any two classes should fall along one of the grid lines with some probability, and that the decision boundary should fall anywhere except a grid line with a much lower probability. [sent-4, score-2.036]

4–5 For the two-class case, and a logistic regression model parameterized by W and data X, my prior could perhaps be expressed Pr(W) = (normalizing constant)/exp(d), where d = f(grid,W,X) such that when logistic(W^TX) = .5 and X is ‘far’ from grid lines, then d is large. [sent-5, score-0.662] [sent-6, score-0.246]

6 Have you ever seen a model like this, or do you have any notions about a good avenue to pursue? [sent-7, score-0.204]

7 My real data consist of geocoded Craigslist postings that are labeled with the neighborhood claimed by the poster, and I have a strong belief that decision boundaries separating neighborhoods (as classes) should fall along streets, railroad embankments, parks, and the river. [sent-8, score-1.376]

8 My reply: This reminds me of some models in spatial statistics such as conditional autoregressions, where you can specify a reasonable prior distribution over the space of possible parameter values, but the prior doesn’t have any simple normalized form. [sent-9, score-1.004]

9 Essentially, you’re setting up a prior by penalizing certain configurations. [sent-10, score-0.431]

10 If you label the configurations as theta and set up a penalty function g(theta), then your prior is p(theta) proportional to exp(-beta*g(theta)), where beta is a hyperparameter that expresses how strongly the penalty is applied. [sent-11, score-1.291]
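To make the reply concrete: for fixed beta, the normalizing constant of p(theta) proportional to exp(-beta*g(theta)) does not depend on theta, so the unnormalized log prior can simply be added to the log likelihood. A minimal sketch, assuming the hypothetical grid_penalty function from the earlier sketch as g and a generic optimizer for the MAP estimate:

import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def neg_log_posterior(w, X, y, beta, spacing=1.0):
    # Logistic regression log likelihood plus the penalized log prior
    # log p(w) = -beta * g(w) + const; the constant is dropped since
    # it does not involve w.
    p = expit(X @ w)
    eps = 1e-12                              # guard against log(0)
    log_lik = np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    return -(log_lik - beta * grid_penalty(w, spacing))

# MAP estimate, with X holding an intercept column plus the two coordinates:
# w_map = minimize(neg_log_posterior, np.array([0.0, 1.0, 1.0]),
#                  args=(X, y, 2.0)).x

Estimating beta itself, rather than fixing it, brings the normalizing constant (a function of beta) back into play, which is where methods such as path sampling become relevant.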


similar blogs computed by the tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('prior', 0.315), ('theta', 0.282), ('boundary', 0.249), ('grid', 0.246), ('belief', 0.244), ('decision', 0.199), ('classes', 0.188), ('fall', 0.178), ('logistic', 0.178), ('penalty', 0.156), ('tx', 0.116), ('separates', 0.116), ('autoregressions', 0.116), ('geocoded', 0.116), ('penalizing', 0.116), ('hyperprior', 0.109), ('avenue', 0.109), ('craigslist', 0.109), ('normalized', 0.109), ('space', 0.109), ('normalizing', 0.105), ('classifier', 0.105), ('configurations', 0.105), ('function', 0.102), ('consist', 0.101), ('streets', 0.098), ('postings', 0.098), ('separating', 0.098), ('parks', 0.098), ('exp', 0.098), ('notions', 0.095), ('parameterized', 0.095), ('forest', 0.093), ('expresses', 0.093), ('boundaries', 0.091), ('poster', 0.09), ('neighborhoods', 0.09), ('gregg', 0.087), ('neighborhood', 0.085), ('pursue', 0.085), ('lay', 0.082), ('specify', 0.079), ('beta', 0.078), ('spatial', 0.077), ('incorporate', 0.077), ('along', 0.076), ('pr', 0.076), ('anywhere', 0.076), ('proportional', 0.075), ('regression', 0.074)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 1130 andrew gelman stats-2012-01-20-Prior beliefs about locations of decision boundaries


2 0.29608259 1941 andrew gelman stats-2013-07-16-Priors

Introduction: Nick Firoozye writes: While I am absolutely sympathetic to the Bayesian agenda, I am often troubled by the requirement of having priors. We must have priors on the parameters of an infinite number of models we have never seen before, and I find this troubling. There is a similarly troubling problem in economics of utility theory. Utility is on consumables. To be complete, a consumer must assign utility to all sorts of things they never would have encountered. More recent versions of utility theory instead make consumption goods a portfolio of attributes. Cadillacs are x many units of luxury, y of transport, etc. And we can automatically have personal utilities to all these attributes. I don’t ever see parameters. Some models have few and some have hundreds. Instead, I see data. So I don’t know how to have an opinion on parameters themselves. Rather I think it far more natural to have opinions on the behavior of models. The prior predictive density is a good and sensible notion. Also

3 0.24271917 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability

Introduction: I received the following email: I have an interesting thought on a prior for a logistic regression, and would love your input on how to make it “work.” Some of my research, two published papers, are on mathematical models of **. Along those lines, I’m interested in developing more models for **. . . . Empirical studies show that the public is rather smart and that the wisdom-of-the-crowd is fairly accurate. So, my thought would be to treat the public’s probability of the event as a prior, and then see how adding data, through a model, would change or perturb our inferred probability of **. (Similarly, I could envision using previously published epidemiological research as a prior probability of a disease, and then seeing how the addition of new testing protocols would update that belief.) However, everything I learned about hierarchical Bayesian models has a prior as a distribution on the coefficients. I don’t know how to start with a prior point estimate for the probabili
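One standard way to encode a prior point estimate for a predicted probability, sketched here purely as a general illustration (not necessarily the approach recommended in that post), is to place a normal prior on the logit scale and check what it implies for the probability:

import numpy as np
from scipy.special import expit, logit

p0 = 0.3        # hypothetical crowd estimate of the event probability
mu = logit(p0)  # center the intercept prior at logit(p0)
sigma = 0.5     # smaller sigma = more trust in the crowd estimate
# Intercept prior alpha ~ Normal(mu, sigma); implied prior on the
# predicted probability at the covariate baseline:
draws = np.random.default_rng(0).normal(mu, sigma, 100_000)
print(np.percentile(expit(draws), [2.5, 50, 97.5]))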

4 0.23023266 1155 andrew gelman stats-2012-02-05-What is a prior distribution?

Introduction: Some recent blog discussion revealed some confusion that I’ll try to resolve here. I wrote that I’m not a big fan of subjective priors. Various commenters had difficulty with this point, and I think the issue was most clearly stated by Bill Jefferys, who wrote: It seems to me that your prior has to reflect your subjective information before you look at the data. How can it not? But this does not mean that the (subjective) prior that you choose is irrefutable; surely a prior that reflects prior information just does not have to be inconsistent with that information. But that still leaves a range of priors that are consistent with it, the sort of priors that one would use in a sensitivity analysis, for example. I think I see what Bill is getting at. A prior represents your subjective belief, or some approximation to your subjective belief, even if it’s not perfect. That sounds reasonable but I don’t think it works. Or, at least, it often doesn’t work. Let’s start

5 0.22466742 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution

Introduction: Mike McLaughlin writes: Consider the Seeds example in vol. 1 of the BUGS examples. There, a binomial likelihood has a p parameter constructed, via logit, from two covariates. What I am wondering is: Would it be legitimate, in a binomial + logit problem like this, to allow binomial p[i] to be a function of the corresponding n[i] or would that amount to using the data in the prior? In other words, in the context of the Seeds example, is r[] the only data or is n[] data as well and therefore not permissible in a prior formulation? I [McLaughlin] currently have a model with a common beta prior for all p[i] but would like to mitigate this commonality (a kind of James-Stein effect) when there are lots of observations for some i. But this seems to feed the data back into the prior. Does it really? It also occurs to me [McLaughlin] that, perhaps, a binomial likelihood is not the one to use here (not flexible enough). My reply: Strictly speaking, “n” is data, and so what you wa
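For reference, a minimal sketch of the Seeds-style likelihood under discussion (variable and function names hypothetical): r[i] successes out of n[i] trials, with a logit link on two covariates. McLaughlin’s question is whether p[i] may additionally be a function of n[i] without effectively feeding the data back into the prior.

import numpy as np
from scipy.special import expit
from scipy.stats import binom

def log_lik(params, x1, x2, n, r):
    # r[i] ~ Binomial(n[i], p[i]),  logit(p[i]) = a0 + a1*x1[i] + a2*x2[i]
    a0, a1, a2 = params
    p = expit(a0 + a1 * np.asarray(x1) + a2 * np.asarray(x2))
    return binom.logpmf(r, n, p).sum()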

6 0.19947655 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors

7 0.19677442 1089 andrew gelman stats-2011-12-28-Path sampling for models of varying dimension

8 0.18360917 2017 andrew gelman stats-2013-09-11-“Informative g-Priors for Logistic Regression”

9 0.16987744 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence

10 0.16760486 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

11 0.16486976 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors

12 0.15724953 247 andrew gelman stats-2010-09-01-How does Bayes do it?

13 0.14461356 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model

14 0.14427963 961 andrew gelman stats-2011-10-16-The “Washington read” and the algebra of conditional distributions

15 0.1430555 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization

16 0.13890378 1046 andrew gelman stats-2011-12-07-Neutral noninformative and informative conjugate beta and gamma prior distributions

17 0.13617662 1149 andrew gelman stats-2012-02-01-Philosophy of Bayesian statistics: my reactions to Cox and Mayo

18 0.13579439 899 andrew gelman stats-2011-09-10-The statistical significance filter

19 0.13246539 1868 andrew gelman stats-2013-05-23-Validation of Software for Bayesian Models Using Posterior Quantiles

20 0.12699585 363 andrew gelman stats-2010-10-22-Graphing Likert scale responses


similar blogs computed by the lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.13), (1, 0.173), (2, 0.027), (3, 0.069), (4, -0.019), (5, -0.072), (6, 0.141), (7, 0.043), (8, -0.156), (9, 0.055), (10, -0.006), (11, 0.015), (12, 0.044), (13, -0.001), (14, -0.036), (15, -0.03), (16, -0.02), (17, -0.015), (18, 0.033), (19, -0.036), (20, 0.041), (21, -0.004), (22, 0.012), (23, -0.056), (24, 0.023), (25, 0.006), (26, 0.049), (27, -0.062), (28, 0.003), (29, 0.012), (30, 0.017), (31, 0.001), (32, -0.045), (33, -0.0), (34, -0.028), (35, -0.022), (36, 0.018), (37, 0.031), (38, -0.055), (39, 0.036), (40, 0.039), (41, 0.069), (42, -0.092), (43, -0.024), (44, 0.015), (45, 0.016), (46, 0.044), (47, 0.033), (48, -0.03), (49, 0.022)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97757357 1130 andrew gelman stats-2012-01-20-Prior beliefs about locations of decision boundaries


2 0.82687902 1941 andrew gelman stats-2013-07-16-Priors


3 0.81870943 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution


4 0.81168866 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability


5 0.80787814 669 andrew gelman stats-2011-04-19-The mysterious Gamma (1.4, 0.4)

Introduction: A student writes: I have a question about an earlier recommendation of yours on the selection of the prior distribution for the precision hyperparameter of a normal distribution, and a reference for the recommendation. If I recall correctly I have read that you have suggested to use Gamma(1.4, 0.4) instead of Gamma(0.01, 0.01) for the prior distribution of the precision hyperparameter of a normal distribution. I would very much appreciate it if you would have the time to point me to this publication of yours. The reason is that I have used the prior distribution (Gamma(1.4, 0.4)) in a study which we now revise for publication, and where a reviewer questions the choice of the distribution (claiming that it is too informative!). I am well aware that you in recent publications (Prior distributions for variance parameters in hierarchical models. Bayesian Analysis; Data Analysis using regression and multilevel/hierarchical models) suggest to model the precision as pow(standard deviatio
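For intuition, the two priors mentioned imply very different distributions for the standard deviation sigma = 1/sqrt(tau). A quick simulation sketch, assuming the BUGS shape/rate parameterization (numpy’s scale argument is 1/rate):

import numpy as np

rng = np.random.default_rng(0)
for a, b in [(1.4, 0.4), (0.01, 0.01)]:
    tau = rng.gamma(shape=a, scale=1.0 / b, size=100_000)  # precision draws
    sigma = 1.0 / np.sqrt(tau)                             # implied sd
    print((a, b), np.round(np.percentile(sigma, [25, 50, 75]), 3))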

6 0.80581796 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors

7 0.80422568 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence

8 0.79465336 442 andrew gelman stats-2010-12-01-bayesglm in Stata?

9 0.78954464 1089 andrew gelman stats-2011-12-28-Path sampling for models of varying dimension

10 0.78622174 1155 andrew gelman stats-2012-02-05-What is a prior distribution?

11 0.77253771 1858 andrew gelman stats-2013-05-15-Reputations changeable, situations tolerable

12 0.75430346 1046 andrew gelman stats-2011-12-07-Neutral noninformative and informative conjugate beta and gamma prior distributions

13 0.75303715 2017 andrew gelman stats-2013-09-11-“Informative g-Priors for Logistic Regression”

14 0.7454707 160 andrew gelman stats-2010-07-23-Unhappy with improvement by a factor of 10^29

15 0.73048031 1465 andrew gelman stats-2012-08-21-D. Buggin

16 0.72686678 247 andrew gelman stats-2010-09-01-How does Bayes do it?

17 0.72668552 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors

18 0.71852601 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization

19 0.7163409 1946 andrew gelman stats-2013-07-19-Prior distributions on derived quantities rather than on parameters themselves

20 0.70901924 1792 andrew gelman stats-2013-04-07-X on JLP


similar blogs computed by the lda model

lda for this blog:

topicId topicWeight

[(3, 0.01), (16, 0.052), (21, 0.058), (24, 0.246), (28, 0.01), (31, 0.01), (32, 0.012), (35, 0.169), (36, 0.012), (37, 0.012), (40, 0.018), (42, 0.012), (43, 0.011), (55, 0.011), (58, 0.011), (65, 0.017), (68, 0.024), (70, 0.011), (77, 0.012), (80, 0.011), (99, 0.162)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.92950153 1130 andrew gelman stats-2012-01-20-Prior beliefs about locations of decision boundaries


2 0.88533378 388 andrew gelman stats-2010-11-01-The placebo effect in pharma

Introduction: Bruce McCullough writes: The Sept 2009 issue of Wired had a big article on the increase in the placebo effect, and why it’s been getting bigger. Kaiser Fung has a synopsis. As if you don’t have enough to do, I thought you might be interested in blogging on this. My reply: I thought Kaiser’s discussion was good, especially this point: effect on treatment group = effect of the drug + effect of belief in being treated; effect on placebo group = effect of belief in being treated. Thus, the difference between the two groups = effect of the drug, since the effect of belief in being treated affects both groups of patients. Thus, as Kaiser puts it, if the treatment isn’t doing better than placebo, it doesn’t say that the placebo effect is big (let alone “too big”) but that the treatment isn’t showing any additional effect. It’s “treatment + placebo” vs. placebo, not treatment vs. placebo. That said, I’d prefer for Kaiser to make it clear that the additivity he’s assu

3 0.85317487 1283 andrew gelman stats-2012-04-26-Let’s play “Guess the smoother”!

Introduction: Andre de Boer writes: In my profession as a risk manager I encountered this graph: I can’t figure out what kind of regression this is, would you be so kind as to enlighten me? The points represent (maturity, yield) of bonds. My reply: That’s a fun problem, reverse-engineering a curve fit! My first guess is lowess, although it seems too flat and asymptote-y on the right side of the graph to be lowess. Maybe a Gaussian process? Looks too smooth to be a spline. I guess I’ll go with my original guess, on the theory that lowess is the most accessible smoother out there, and if someone fit something much more complicated they’d make more of a big deal about it. On the other hand, if the curve is an automatic output of some software (Excel? Stata?) then it could be just about anything. Does anyone have any ideas?
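Since lowess is the post’s first guess, here is a minimal sketch of overlaying a lowess fit on (maturity, yield) points for comparison with a mystery curve; the data below is simulated and purely illustrative:

import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(1)
maturity = np.sort(rng.uniform(0, 30, 80))
yld = 2 + 1.5 * (1 - np.exp(-maturity / 5)) + rng.normal(0, 0.2, 80)
fit = lowess(yld, maturity, frac=0.5)  # returns sorted (x, smoothed y) pairs
# Plot fit[:, 0] vs fit[:, 1] over the scatter to compare shapes.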

4 0.85235775 2049 andrew gelman stats-2013-10-03-On house arrest for p-hacking

Introduction: People keep pointing me to this excellent news article by David Brown, about a scientist who was convicted of data manipulation: In all, 330 patients were randomly assigned to get either interferon gamma-1b or placebo injections. Disease progression or death occurred in 46 percent of those on the drug and 52 percent of those on placebo. That was not a significant difference, statistically speaking. When only survival was considered, however, the drug looked better: 10 percent of people getting the drug died, compared with 17 percent of those on placebo. However, that difference wasn’t “statistically significant,” either. Specifically, the so-called P value — a mathematical measure of the strength of the evidence that there’s a true difference between a treatment and placebo — was 0.08. . . . Technically, the study was a bust, although the results leaned toward a benefit from interferon gamma-1b. Was there a group of patients in which the results tipped? Harkonen asked the statis

5 0.84956884 433 andrew gelman stats-2010-11-27-One way that psychology research is different than medical research

Introduction: Medical researchers care about main effects, psychologists care about interactions. In psychology, the main effects are typically obvious, and it’s only the interactions that are worth studying.

6 0.84667754 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies

7 0.8466506 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors

8 0.84584832 1706 andrew gelman stats-2013-02-04-Too many MC’s not enough MIC’s, or What principles should govern attempts to summarize bivariate associations in large multivariate datasets?

9 0.84511977 1479 andrew gelman stats-2012-09-01-Mothers and Moms

10 0.8450433 482 andrew gelman stats-2010-12-23-Capitalism as a form of voluntarism

11 0.84392381 2143 andrew gelman stats-2013-12-22-The kluges of today are the textbook solutions of tomorrow.

12 0.84353983 743 andrew gelman stats-2011-06-03-An argument that can’t possibly make sense

13 0.84268433 938 andrew gelman stats-2011-10-03-Comparing prediction errors

14 0.8421936 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics

15 0.84182489 241 andrew gelman stats-2010-08-29-Ethics and statistics in development research

16 0.84136331 1757 andrew gelman stats-2013-03-11-My problem with the Lindley paradox

17 0.84065121 1455 andrew gelman stats-2012-08-12-Probabilistic screening to get an approximate self-weighted sample

18 0.8403281 1787 andrew gelman stats-2013-04-04-Wanna be the next Tyler Cowen? It’s not as easy as you might think!

19 0.84027177 278 andrew gelman stats-2010-09-15-Advice that might make sense for individuals but is negative-sum overall

20 0.83738756 1978 andrew gelman stats-2013-08-12-Fixing the race, ethnicity, and national origin questions on the U.S. Census