Modeling probability data
Andrew Gelman, April 26, 2012
Rafael Huber writes:

I conducted an experiment in which subjects were asked to estimate the probability of a certain event given several pieces of information (like a weather forecaster or a stock market trader). These probability estimates are the dependent variable of my experiment. My goal is to model the data with a (hierarchical) Bayesian regression. A linear equation in all the presented information (quantified as log odds) defines the mu of a normal likelihood. The precision tau is another free parameter:

    y[r] ~ dnorm( mu[r] , tau[ subj[r] ] )
    mu[r] <- b0[ subj[r] ] + b1[ subj[r] ] * x1[r] + b2[ subj[r] ] * x2[r] + b3[ subj[r] ] * x3[r]

My problem is that I do not believe that the normal is the correct probability distribution for modeling probability data (because the error is bounded). However, until now nobody has been able to tell me how I can correctly model probability data.

My reply: You can take the logit of the data before analyzing them. That is assuming there are no stated probabilities of 0 and 1. In any case you should graph the data and the fitted model to see if there are problems.
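To make the suggestion concrete, here is a minimal sketch of the logit transformation written as a JAGS data block; the names z, y, and N are illustrative assumptions, and the same transformation could just as easily be done in R before the data are passed in:

    data {
      for (r in 1:N) {
        # logit of the stated probability; requires 0 < y[r] < 1
        z[r] <- log(y[r] / (1 - y[r]))
      }
    }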
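And here is a sketch of what the full hierarchical regression might look like on the logit scale. Huber's original snippet specifies only the likelihood and the linear predictor, so the subject-level priors, the hyperpriors, and the loop bounds N and S below are assumptions filled in for illustration, not part of the original model:

    model {
      for (r in 1:N) {
        z[r] ~ dnorm( mu[r] , tau[ subj[r] ] )
        mu[r] <- b0[ subj[r] ] + b1[ subj[r] ] * x1[r] + b2[ subj[r] ] * x2[r] + b3[ subj[r] ] * x3[r]
      }
      for (s in 1:S) {
        # subject-level coefficients drawn from common population distributions
        b0[s] ~ dnorm( mu.b[1] , tau.b[1] )
        b1[s] ~ dnorm( mu.b[2] , tau.b[2] )
        b2[s] ~ dnorm( mu.b[3] , tau.b[3] )
        b3[s] ~ dnorm( mu.b[4] , tau.b[4] )
        # per-subject residual precision, parameterized via the standard deviation
        tau[s] <- pow(sigma[s], -2)
        sigma[s] ~ dunif(0, 10)
      }
      # weakly informative hyperpriors on the population means and scales
      for (k in 1:4) {
        mu.b[k] ~ dnorm(0, 0.001)
        tau.b[k] <- pow(sigma.b[k], -2)
        sigma.b[k] ~ dunif(0, 10)
      }
    }

Fitted values and predictions can be mapped back to the probability scale with the inverse logit, 1/(1 + exp(-z)), which makes the graphical check suggested above straightforward.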