andrew_gelman_stats andrew_gelman_stats-2012 andrew_gelman_stats-2012-1156 knowledge-graph by maker-knowledge-mining

1156 andrew gelman stats-2012-02-06-Bayesian model-building by pure thought: Some principles and examples


meta info for this blog

Source: html

Introduction: This is one of my favorite papers: In applications, statistical models are often restricted to what produces reasonable estimates based on the data at hand. In many cases, however, the principles that allow a model to be restricted can be derived theoretically, in the absence of any data and with minimal applied context. We illustrate this point with three well-known theoretical examples from spatial statistics and time series. First, we show that an autoregressive model for local averages violates a principle of invariance under scaling. Second, we show how the Bayesian estimate of a strictly-increasing time series, using a uniform prior distribution, depends on the scale of estimation. Third, we interpret local smoothing of spatial lattice data as Bayesian estimation and show why uniform local smoothing does not make sense. In various forms, the results presented here have been derived in previous work; our contribution is to draw out some principles that can be derived theoretically, even though in the past they may have been presented in detail in the context of specific examples.


Summary: the most important sentences generated by the tfidf model (a scoring sketch follows the list below)

sentIndex sentText sentNum sentScore

1 This is one of my favorite papers: In applications, statistical models are often restricted to what produces reasonable estimates based on the data at hand. [sent-1, score-0.5]

2 In many cases, however, the principles that allow a model to be restricted can be derived theoretically, in the absence of any data and with minimal applied context. [sent-2, score-1.102]

3 We illustrate this point with three well-known theoretical examples from spatial statistics and time series. [sent-3, score-0.505]

4 First, we show that an autoregressive model for local averages violates a principle of invariance under scaling. [sent-4, score-1.044]

5 Second, we show how the Bayesian estimate of a strictly-increasing time series, using a uniform prior distribution, depends on the scale of estimation. [sent-5, score-0.537]

6 Third, we interpret local smoothing of spatial lattice data as Bayesian estimation and show why uniform local smoothing does not make sense. [sent-6, score-1.981]

7 In various forms, the results presented here have been derived in previous work; our contribution is to draw out some principles that can be derived theoretically, even though in the past they may have been presented in detail in the context of specific examples. [sent-7, score-1.613]

8 But it’s only been cited 17 times (and four of those were by me), so I must have done something wrong. [sent-9, score-0.169]

9 In retrospect I think it would’ve made more sense to write it as three separate papers; then each might have had its own impact. [sent-10, score-0.297]

10 In any case, I hope the article provides some enjoyment and insight to those of you who click through. [sent-11, score-0.413]
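The sentence scores above come from a tfidf ranking step. The original tooling is not included here; the following is a minimal sketch that assumes scikit-learn and assumes a sentence's score is the sum of the tfidf weights of its terms. Names such as rank_sentences and abstract_sentences are illustrative, not from the pipeline.

```python
# Minimal sketch of tfidf-based sentence scoring for an extractive summary
# (assumes scikit-learn; the pipeline's exact scoring rule is not specified here).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def rank_sentences(sentences):
    """Score each sentence by the sum of its tfidf term weights; rank descending."""
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(sentences)        # one row per sentence
    scores = np.asarray(matrix.sum(axis=1)).ravel()     # total tfidf weight per sentence
    order = scores.argsort()[::-1]
    return [(int(i), sentences[i], float(scores[i])) for i in order]

# Hypothetical usage on the abstract's sentences:
# for idx, text, score in rank_sentences(abstract_sentences):
#     print(f"[sent-{idx}, score-{score:.3f}] {text[:60]}")
```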


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('derived', 0.322), ('smoothing', 0.254), ('theoretically', 0.25), ('local', 0.242), ('restricted', 0.237), ('spatial', 0.227), ('uniform', 0.201), ('show', 0.176), ('principles', 0.173), ('invariance', 0.16), ('presented', 0.157), ('autoregressive', 0.153), ('lattice', 0.148), ('enjoyment', 0.14), ('violates', 0.129), ('papers', 0.115), ('absence', 0.115), ('three', 0.11), ('produces', 0.109), ('minimal', 0.107), ('retrospect', 0.105), ('averages', 0.104), ('insight', 0.101), ('forms', 0.097), ('illustrate', 0.096), ('contribution', 0.095), ('cited', 0.094), ('click', 0.09), ('bayesian', 0.088), ('interpret', 0.088), ('detail', 0.088), ('depends', 0.087), ('draw', 0.086), ('favorite', 0.085), ('provides', 0.082), ('separate', 0.082), ('third', 0.081), ('principle', 0.08), ('applications', 0.08), ('estimation', 0.08), ('allow', 0.079), ('four', 0.075), ('previous', 0.075), ('scale', 0.073), ('theoretical', 0.072), ('series', 0.072), ('specific', 0.071), ('love', 0.069), ('data', 0.069), ('context', 0.067)]
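The (word, weight) pairs above are the post's top terms by tfidf. A minimal sketch of producing such a list, assuming scikit-learn and a hypothetical list of blog-post strings called docs (the pipeline's actual code is not shown):

```python
# Minimal sketch: top-N tfidf terms for one document in a corpus
# (assumes scikit-learn; 'docs' and 'doc_index' are illustrative names).
from sklearn.feature_extraction.text import TfidfVectorizer

def top_tfidf_terms(docs, doc_index, n=50):
    """Return the n highest-weighted (term, tfidf) pairs for docs[doc_index]."""
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(docs)              # documents x vocabulary
    terms = vectorizer.get_feature_names_out()
    row = matrix[doc_index].toarray().ravel()
    top = row.argsort()[::-1][:n]
    return [(terms[i], round(float(row[i]), 3)) for i in top if row[i] > 0]

# Hypothetical usage:
# print(top_tfidf_terms(docs, doc_index=0, n=50))
```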

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999988 1156 andrew gelman stats-2012-02-06-Bayesian model-building by pure thought: Some principles and examples

Introduction: This is one of my favorite papers: In applications, statistical models are often restricted to what produces reasonable estimates based on the data at hand. In many cases, however, the principles that allow a model to be restricted can be derived theoretically, in the absence of any data and with minimal applied context. We illustrate this point with three well-known theoretical examples from spatial statistics and time series. First, we show that an autoregressive model for local averages violates a principle of invariance under scaling. Second, we show how the Bayesian estimate of a strictly-increasing time series, using a uniform prior distribution, depends on the scale of estimation. Third, we interpret local smoothing of spatial lattice data as Bayesian estimation and show why uniform local smoothing does not make sense. In various forms, the results presented here have been derived in previous work; our contribution is to draw out some principles that can be derived theoretic

2 0.13600442 2201 andrew gelman stats-2014-02-06-Bootstrap averaging: Examples where it works and where it doesn’t work

Introduction: Aki and I write : The very generality of the bootstrap creates both opportunity and peril, allowing researchers to solve otherwise intractable problems but also sometimes leading to an answer with an inappropriately high level of certainty. We demonstrate with two examples from our own research: one problem where bootstrap smoothing was effective and led us to an improved method, and another case where bootstrap smoothing would not solve the underlying problem. Our point in these examples is not to disparage bootstrapping but rather to gain insight into where it will be more or less effective as a smoothing tool. An example where bootstrap smoothing works well Bayesian posterior distributions are commonly summarized using Monte Carlo simulations, and inferences for scalar parameters or quantities of interest can be summarized using 50% or 95% intervals. An interval for a continuous quantity is typically constructed either as a central probability interval (with probabili

3 0.12337342 1124 andrew gelman stats-2012-01-17-How to map geographically-detailed survey responses?

Introduction: David Sparks writes: I am experimenting with the mapping/visualization of survey response data, with a particular focus on using transparency to convey uncertainty. See some examples here . Do you think the examples are successful at communicating both local values of the variable of interest, as well as the lack of information in certain places? Also, do you have any general advice for choosing an approach to spatially smoothing the data in a way that preserves local features, but prevents individual respondents from standing out? I have experimented a lot with smoothing in these maps, and the cost of preventing the Midwest and West from looking “spotty” is the oversmoothing of the Northeast. My quick impression is that the graphs are more pretty than they are informative. But “pretty” is not such a bad thing! The conveying-information part is more difficult: to me, the graphs seem to be displaying a somewhat confusing mix of opinion level and population density. Consider

4 0.10928909 1527 andrew gelman stats-2012-10-10-Another reason why you can get good inferences from a bad model

Introduction: John Cook considers how people justify probability distribution assumptions: Sometimes distribution assumptions are not justified. Sometimes distributions can be derived from fundamental principles [or] . . . on theoretical grounds. For example, large samples and the central limit theorem together may justify assuming that something is normally distributed. Often the choice of distribution is somewhat arbitrary, chosen by intuition or for convenience, and then empirically shown to work well enough. Sometimes a distribution can be a bad fit and still work well, depending on what you’re asking of it. Cook continues: The last point is particularly interesting. It’s not hard to imagine that a poor fit would produce poor results. It’s surprising when a poor fit produces good results. And then he gives an example of an effective but inaccurate model used to model survival times in a clinical trial. Cook explains: The [poorly-fitting] method works well because of the q

5 0.10870104 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence

Introduction: I’ve had a couple of email conversations in the past couple days on dependence in multivariate prior distributions. Modeling the degrees of freedom and scale parameters in the t distribution First, in our Stan group we’ve been discussing the choice of priors for the degrees-of-freedom parameter in the t distribution. I wrote that also there’s the question of parameterization. It does not necessarily make sense to have independent priors on the df and scale parameters. In some sense, the meaning of the scale parameter changes with the df. Prior dependence between correlation and scale parameters in the scaled inverse-Wishart model The second case of parameterization in prior distribution arose from an email I received from Chris Chatham pointing me to this exploration by Matt Simpson of the scaled inverse-Wishart prior distribution for hierarchical covariance matrices. Simpson writes: A popular prior for Σ is the inverse-Wishart distribution [ not the same as the

6 0.10660194 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

7 0.1061359 781 andrew gelman stats-2011-06-28-The holes in my philosophy of Bayesian data analysis

8 0.10239626 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors

9 0.10217573 1750 andrew gelman stats-2013-03-05-Watership Down, thick description, applied statistics, immutability of stories, and playing tennis with a net

10 0.096708991 508 andrew gelman stats-2011-01-08-More evidence of growing nationalization of congressional elections

11 0.096209683 1469 andrew gelman stats-2012-08-25-Ways of knowing

12 0.095188826 1941 andrew gelman stats-2013-07-16-Priors

13 0.095095649 1406 andrew gelman stats-2012-07-05-Xiao-Li Meng and Xianchao Xie rethink asymptotics

14 0.093591049 1418 andrew gelman stats-2012-07-16-Long discussion about causal inference and the use of hierarchical models to bridge between different inferential settings

15 0.092088118 193 andrew gelman stats-2010-08-09-Besag

16 0.091606274 1431 andrew gelman stats-2012-07-27-Overfitting

17 0.091589533 1779 andrew gelman stats-2013-03-27-“Two Dogmas of Strong Objective Bayesianism”

18 0.089394137 1392 andrew gelman stats-2012-06-26-Occam

19 0.088350296 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability

20 0.088334456 1986 andrew gelman stats-2013-08-17-Somebody’s looking for a book on time series analysis in the style of Angrist and Pischke, or Gelman and Hill
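A ranked list like the one above is commonly produced by cosine similarity between tfidf vectors. The sketch below makes that assumption; the original similarity computation is not included here, and most_similar, docs, and query_index are illustrative names.

```python
# Minimal sketch: rank posts by cosine similarity of their tfidf vectors
# (an assumption about the pipeline; scikit-learn is used for illustration).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def most_similar(docs, query_index, top_k=20):
    """Return (index, similarity) pairs for the posts most similar to docs[query_index]."""
    matrix = TfidfVectorizer(stop_words="english").fit_transform(docs)
    sims = cosine_similarity(matrix[query_index], matrix).ravel()
    order = sims.argsort()[::-1][:top_k + 1]            # the query itself ranks first (~1.0)
    return [(int(i), float(sims[i])) for i in order]

# Hypothetical usage:
# for idx, value in most_similar(docs, query_index=0):
#     print(idx, round(value, 8))
```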


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.176), (1, 0.1), (2, -0.019), (3, 0.027), (4, -0.013), (5, -0.024), (6, -0.021), (7, 0.014), (8, -0.016), (9, 0.014), (10, 0.045), (11, -0.014), (12, -0.028), (13, 0.012), (14, -0.018), (15, 0.007), (16, 0.038), (17, -0.002), (18, -0.009), (19, -0.015), (20, 0.002), (21, 0.021), (22, -0.012), (23, 0.018), (24, 0.029), (25, 0.016), (26, -0.015), (27, -0.022), (28, 0.013), (29, 0.013), (30, 0.004), (31, -0.029), (32, -0.028), (33, -0.024), (34, 0.02), (35, 0.002), (36, -0.032), (37, -0.006), (38, 0.027), (39, 0.012), (40, 0.026), (41, 0.031), (42, -0.069), (43, -0.009), (44, -0.013), (45, -0.012), (46, 0.022), (47, -0.04), (48, 0.009), (49, 0.021)]
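The (topicId, topicWeight) pairs above are the post's coordinates in a latent semantic indexing (LSI) space. A minimal sketch, assuming an LSI projection via truncated SVD of the tfidf matrix in scikit-learn (an assumption; a tool such as gensim's LsiModel would produce analogous output):

```python
# Minimal sketch: project tfidf vectors into a low-dimensional LSI topic space
# (assumes scikit-learn's TruncatedSVD; the original pipeline is not shown).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

def lsi_topic_weights(docs, doc_index, num_topics=50):
    """Return (topic_id, weight) pairs for one document in the LSI space."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
    lsi = TruncatedSVD(n_components=num_topics, random_state=0)
    topic_matrix = lsi.fit_transform(tfidf)             # documents x topics
    weights = topic_matrix[doc_index]
    return [(t, round(float(w), 3)) for t, w in enumerate(weights)]

# Similar blogs under the lsi model can then be ranked by cosine similarity
# between rows of topic_matrix instead of rows of the raw tfidf matrix.
```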

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96741086 1156 andrew gelman stats-2012-02-06-Bayesian model-building by pure thought: Some principles and examples

Introduction: This is one of my favorite papers: In applications, statistical models are often restricted to what produces reasonable estimates based on the data at hand. In many cases, however, the principles that allow a model to be restricted can be derived theoretically, in the absence of any data and with minimal applied context. We illustrate this point with three well-known theoretical examples from spatial statistics and time series. First, we show that an autoregressive model for local averages violates a principle of invariance under scaling. Second, we show how the Bayesian estimate of a strictly-increasing time series, using a uniform prior distribution, depends on the scale of estimation. Third, we interpret local smoothing of spatial lattice data as Bayesian estimation and show why uniform local smoothing does not make sense. In various forms, the results presented here have been derived in previous work; our contribution is to draw out some principles that can be derived theoretic

2 0.79085392 1208 andrew gelman stats-2012-03-11-Gelman on Hennig on Gelman on Bayes

Introduction: Deborah Mayo pointed me to this discussion by Christian Hennig of my recent article on Induction and Deduction in Bayesian Data Analysis. A couple days ago I responded to comments by Mayo, Stephen Senn, and Larry Wasserman. I will respond to Hennig by pulling out paragraphs from his discussion and then replying. Hennig: for me the terms “frequentist” and “subjective Bayes” point to interpretations of probability, and not to specific methods of inference. The frequentist one refers to the idea that there is an underlying data generating process that repeatedly throws out data and would approximate the assumed distribution if one could only repeat it infinitely often. Hennig makes the good point that, if this is the way you would define “frequentist” (it’s not how I’d define the term myself, but I’ll use Hennig’s definition here), then it makes sense to be a frequentist in some settings but not others. Dice really can be rolled over and over again; a sample survey of 15

3 0.78824145 244 andrew gelman stats-2010-08-30-Useful models, model checking, and external validation: a mini-discussion

Introduction: I sent a copy of my paper (coauthored with Cosma Shalizi) on Philosophy and the practice of Bayesian statistics in the social sciences to Richard Berk , who wrote: I read your paper this morning. I think we are pretty much on the same page about all models being wrong. I like very much the way you handle this in the paper. Yes, Newton’s work is wrong, but surely useful. I also like your twist on Bayesian methods. Makes good sense to me. Perhaps most important, your paper raises some difficult issues I have been trying to think more carefully about. 1. If the goal of a model is to be useful, surely we need to explore that “useful” means. At the very least, usefulness will depend on use. So a model that is useful for forecasting may or may not be useful for causal inference. 2. Usefulness will be a matter of degree. So that for each use we will need one or more metrics to represent how useful the model is. In what looks at first to be simple example, if the use is forecasting,

4 0.77408701 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

Introduction: Robert Bell pointed me to this post by Brad De Long on Bayesian statistics, and then I also noticed this from Noah Smith, who wrote: My impression is that although the Bayesian/Frequentist debate is interesting and intellectually fun, there’s really not much “there” there… despite being so-hip-right-now, Bayesian is not the Statistical Jesus. I’m happy to see the discussion going in this direction. Twenty-five years ago or so, when I got into this biz, there were some serious anti-Bayesian attitudes floating around in mainstream statistics. Discussions in the journals sometimes devolved into debates of the form, “Bayesians: knaves or fools?”. You’d get all sorts of free-floating skepticism about any prior distribution at all, even while people were accepting without question (and doing theory on) logistic regressions, proportional hazards models, and all sorts of strong strong models. (In the subfield of survey sampling, various prominent researchers would refuse to mode

5 0.76862109 2176 andrew gelman stats-2014-01-19-Transformations for non-normal data

Introduction: Steve Peterson writes: I recently submitted a proposal on applying a Bayesian analysis to gender comparisons on motivational constructs. I had an idea on how to improve the model I used and was hoping you could give me some feedback. The data come from a survey based on 5-point Likert scales. Different constructs are measured for each student as scores derived from averaging a student’s responses on particular subsets of survey questions. (I suppose it is not uncontroversial to treat these scores as interval measures and would be interested to hear if you have any objections.) I am comparing genders on each construct. Researchers typically use t-tests to do so. To use a Bayesian approach I applied the programs written in R and JAGS by John Kruschke for estimating the difference of means: http://www.indiana.edu/~kruschke/BEST/ An issue in that analysis is that the distributions of student scores are not normal. There was skewness in some of the distributions and not always in

6 0.76619649 2135 andrew gelman stats-2013-12-15-The UN Plot to Force Bayesianism on Unsuspecting Americans (penalized B-Spline edition)

7 0.76480901 1287 andrew gelman stats-2012-04-28-Understanding simulations in terms of predictive inference?

8 0.76382536 1955 andrew gelman stats-2013-07-25-Bayes-respecting experimental design and other things

9 0.75724167 962 andrew gelman stats-2011-10-17-Death!

10 0.75724155 112 andrew gelman stats-2010-06-27-Sampling rate of human-scaled time series

11 0.75566918 1157 andrew gelman stats-2012-02-07-Philosophy of Bayesian statistics: my reactions to Hendry

12 0.75452471 1374 andrew gelman stats-2012-06-11-Convergence Monitoring for Non-Identifiable and Non-Parametric Models

13 0.75334185 1868 andrew gelman stats-2013-05-23-Validation of Software for Bayesian Models Using Posterior Quantiles

14 0.75229496 781 andrew gelman stats-2011-06-28-The holes in my philosophy of Bayesian data analysis

15 0.7518847 1469 andrew gelman stats-2012-08-25-Ways of knowing

16 0.75106549 690 andrew gelman stats-2011-05-01-Peter Huber’s reflections on data analysis

17 0.75049919 1406 andrew gelman stats-2012-07-05-Xiao-Li Meng and Xianchao Xie rethink asymptotics

18 0.7482208 1091 andrew gelman stats-2011-12-29-Bayes in astronomy

19 0.74798453 20 andrew gelman stats-2010-05-07-Bayesian hierarchical model for the prediction of soccer results

20 0.74479651 2180 andrew gelman stats-2014-01-21-Everything I need to know about Bayesian statistics, I learned in eight schools.


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(16, 0.518), (24, 0.046), (79, 0.015), (89, 0.018), (98, 0.013), (99, 0.278)]
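The (topicId, topicWeight) pairs above are the post's LDA topic distribution, with only non-negligible topics listed. A minimal sketch assuming scikit-learn's LatentDirichletAllocation over raw term counts (an assumption; the original tooling is not specified), with docs again a hypothetical list of blog-post strings:

```python
# Minimal sketch: per-document LDA topic distribution from raw term counts
# (assumes scikit-learn; 'docs', 'doc_index', and thresholds are illustrative).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def lda_topic_weights(docs, doc_index, num_topics=100, min_weight=0.01):
    """Return (topic_id, probability) pairs above min_weight for docs[doc_index]."""
    counts = CountVectorizer(stop_words="english").fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=num_topics, random_state=0)
    doc_topics = lda.fit_transform(counts)              # documents x topics, rows sum to 1
    row = doc_topics[doc_index]
    return [(t, round(float(p), 3)) for t, p in enumerate(row) if p >= min_weight]

# Similar blogs under the lda model can then be ranked by a distance between
# these topic distributions (e.g. cosine or Hellinger distance).
```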

similar blogs list:

simIndex simValue blogId blogTitle

1 0.99566931 1115 andrew gelman stats-2012-01-12-Where are the larger-than-life athletes?

Introduction: Jonathan Cantor points to this poll estimating rifle-armed QB Tim Tebow as America’s favorite pro athlete: In an ESPN survey of 1,502 Americans age 12 or older, three percent identified Tebow as their favorite professional athlete. Tebow finished in front of Kobe Bryant (2 percent), Aaron Rodgers (1.9 percent), Peyton Manning (1.8 percent), and Tom Brady (1.5 percent). Amusing. What this survey says to me is that there are no super-popular athletes who are active in America today. Which actually sounds about right. No Tiger Woods, no Magic Johnson, Muhammed Ali, John Elway, Pete Rose, Billie Jean King, etc etc. Tebow is an amusing choice, people might as well pick him now while he’s still on top. As a sports celeb, he’s like Bill Lee or the Refrigerator: colorful and a solid pro athlete, but no superstar. When you think about all the colorful superstar athletes of times gone by, it’s perhaps surprising that there’s nobody out there right now to play the role. I supp

2 0.99394673 1014 andrew gelman stats-2011-11-16-Visualizations of NYPD stop-and-frisk data

Introduction: Cathy O’Neil organized this visualization project with NYPD stop-and-frisk data. It’s part of the Data Without Borders project. Unfortunately, because of legal restrictions I couldn’t send them the data Jeff, Alex, and I used in our project several years ago.

3 0.9931981 528 andrew gelman stats-2011-01-21-Elevator shame is a two-way street

Introduction: Tyler Cowen links a blog by Samuel Arbesman mocking people who are so lazy that they take the elevator from 1 to 2. This reminds me of my own annoyance about a guy who worked in my building and did not take the elevator. (For the full story, go here and search on “elevator.”)

4 0.98885989 1659 andrew gelman stats-2013-01-07-Some silly things you (didn’t) miss by not reading the sister blog

Introduction: 1. I have the least stressful job in America (duh) 2. B-school prof in a parody of short-term thinking 3. The academic clock 4. I guessed wrong 5. 2012 Conceptual Development Lab Newsletter

5 0.98849022 572 andrew gelman stats-2011-02-14-Desecration of valuable real estate

Introduction: Malecki asks: Is this the worst infographic ever to appear in NYT? USA Today is not something to aspire to. To connect to some of our recent themes , I agree this is a pretty horrible data display. But it’s not bad as a series of images. Considering the competition to be a cartoon or series of photos, these images aren’t so bad. One issue, I think, is that designers get credit for creativity and originality (unusual color combinations! Histogram bars shaped like mosques!) , which is often the opposite of what we want in a clear graph. It’s Martin Amis vs. George Orwell all over again.

6 0.98816913 1304 andrew gelman stats-2012-05-06-Picking on Stephen Wolfram

7 0.98327589 1180 andrew gelman stats-2012-02-22-I’m officially no longer a “rogue”

8 0.97730958 1366 andrew gelman stats-2012-06-05-How do segregation measures change when you change the level of aggregation?

9 0.97304261 1279 andrew gelman stats-2012-04-24-ESPN is looking to hire a research analyst

10 0.97085905 1487 andrew gelman stats-2012-09-08-Animated drought maps

11 0.96377981 1330 andrew gelman stats-2012-05-19-Cross-validation to check missing-data imputation

12 0.95297968 1598 andrew gelman stats-2012-11-30-A graphics talk with no visuals!

13 0.95293838 1025 andrew gelman stats-2011-11-24-Always check your evidence

14 0.9505055 445 andrew gelman stats-2010-12-03-Getting a job in pro sports… as a statistician

15 0.93792325 398 andrew gelman stats-2010-11-06-Quote of the day

16 0.93728703 700 andrew gelman stats-2011-05-06-Suspicious pattern of too-strong replications of medical research

17 0.93507427 1026 andrew gelman stats-2011-11-25-Bayes wikipedia update

same-blog 18 0.92898536 1156 andrew gelman stats-2012-02-06-Bayesian model-building by pure thought: Some principles and examples

19 0.92442513 1168 andrew gelman stats-2012-02-14-The tabloids strike again

20 0.92162591 387 andrew gelman stats-2010-11-01-Do you own anything that was manufactured in the 1950s and still is in regular, active use in your life?