andrew_gelman_stats andrew_gelman_stats-2010 andrew_gelman_stats-2010-24 knowledge-graph by maker-knowledge-mining

24 andrew gelman stats-2010-05-09-Special journal issue on statistical methods for the social sciences


meta info for this blog

Source: html

Introduction: Last year I spoke at a conference celebrating the 10th anniversary of the University of Washington’s Center for Statistics and the Social Sciences, and just today a special issue of the journal Statistical Methodology came out in honor of the center’s anniversary. My article in the special issue actually has nothing to do with my talk at the conference; rather, it’s an exploration of an idea that Iven Van Mechelen and I had for understanding deterministic models probabilistically: For the analysis of binary data, various deterministic models have been proposed, which are generally simpler to fit and easier to understand than probabilistic models. We claim that corresponding to any deterministic model is an implicit stochastic model in which the deterministic model fits imperfectly, with errors occurring at random. In the context of binary data, we consider a model in which the probability of error depends on the model prediction. We show how to fit this model using a stochastic modification of deterministic optimization schemes.
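The idea in the abstract above can be illustrated with a minimal sketch: wrap a deterministic binary classifier in a stochastic model where errors occur at random, with an error probability that depends on the model's prediction. This is only an illustration, not the paper's actual fitting procedure; the data and the two-parameter error model (`eps0` when the prediction is 0, `eps1` when it is 1) are invented for the example.

```python
# Implicit stochastic model around a deterministic binary model:
# P(y_i != yhat_i) = eps0 if yhat_i == 0, else eps1.
# The data below are made up for illustration.
import math

def fit_error_rates(y, yhat):
    """MLE of the two error rates: the observed error fraction
    within each prediction class."""
    rates = {}
    for c in (0, 1):
        idx = [i for i, p in enumerate(yhat) if p == c]
        errors = sum(1 for i in idx if y[i] != c)
        rates[c] = errors / len(idx) if idx else 0.0
    return rates

def log_likelihood(y, yhat, rates):
    """Log-likelihood of the data under the stochastic wrapper
    (assumes both error rates are strictly between 0 and 1)."""
    ll = 0.0
    for yi, pi in zip(y, yhat):
        eps = rates[pi]
        ll += math.log(eps if yi != pi else 1.0 - eps)
    return ll

y    = [0, 0, 1, 1, 1, 0, 1, 0]   # observed binary data
yhat = [0, 0, 1, 1, 0, 0, 1, 1]   # deterministic model's predictions
rates = fit_error_rates(y, yhat)
ll = log_likelihood(y, yhat, rates)
```

Fitting the error rates explicitly, rather than just counting misclassifications, is what gives the likelihood used for uncertainty quantification and model checking.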


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Last year I spoke at a conference celebrating the 10th anniversary of the University of Washington’s Center for Statistics and the Social Sciences, and just today a special issue of the journal Statistical Methodology came out in honor of the center’s anniversary. [sent-1, score-0.78]

2 We claim that corresponding to any deterministic model is an implicit stochastic model in which the deterministic model fits imperfectly, with errors occurring at random. [sent-3, score-2.252]

3 In the context of binary data, we consider a model in which the probability of error depends on the model prediction. [sent-4, score-0.717]

4 We show how to fit this model using a stochastic modification of deterministic optimization schemes. [sent-5, score-1.321]

5 We illustrate this with a simple theoretical example of item response data and with empirical examples from archeology and the psychology of choice. [sent-7, score-0.184]

6 It says that the article was “Received 10 April 2009, Received in revised form 20 August 2009, Accepted 20 August 2009,” but the backstory is that we wrote and submitted the first version of this article around 1997 or so. [sent-8, score-0.172]

7 It got rejected by several journals along the way, occasionally with reports that were encouraging enough to motivate us to add examples and clean it up in various ways. [sent-9, score-0.36]

8 I still like the paper: even though it’s 13 years old, I think it has some important ideas which still are not fully understood, most notably the bit about checking the fit of the model nontrivially. [sent-10, score-0.472]

9 The special issue of the journal also features articles by Adrian Raftery (the founding director of the CSSS), Steve Fienberg, Don Rubin, Jon Wakefield, and many others. [sent-11, score-0.494]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('deterministic', 0.547), ('model', 0.259), ('stochastic', 0.221), ('august', 0.156), ('fit', 0.141), ('special', 0.141), ('conference', 0.131), ('binary', 0.122), ('archeology', 0.109), ('fienberg', 0.109), ('wakefield', 0.109), ('issue', 0.108), ('center', 0.104), ('fitting', 0.102), ('backstory', 0.099), ('founding', 0.099), ('probabilistically', 0.099), ('received', 0.092), ('imperfectly', 0.092), ('iven', 0.092), ('mechelen', 0.092), ('occurrence', 0.092), ('quantification', 0.092), ('anniversary', 0.09), ('raftery', 0.09), ('celebrating', 0.088), ('errors', 0.088), ('adrian', 0.086), ('modification', 0.084), ('jon', 0.083), ('honor', 0.082), ('error', 0.077), ('encouraging', 0.076), ('director', 0.076), ('van', 0.076), ('examples', 0.075), ('revised', 0.073), ('notably', 0.072), ('occurring', 0.072), ('rejected', 0.071), ('accepting', 0.071), ('april', 0.07), ('spoke', 0.07), ('journal', 0.07), ('advantages', 0.07), ('motivate', 0.07), ('optimization', 0.069), ('probabilistic', 0.069), ('various', 0.068), ('exploration', 0.067)]
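The weights above come from tf-idf scoring: each word is weighted by its frequency in this post times the inverse of how many posts contain it, and posts are then compared by cosine similarity of their weight vectors. A minimal sketch with an invented toy corpus:

```python
# tf-idf weights and cosine similarity over a tiny invented corpus.
import math
from collections import Counter

def tfidf(doc, corpus):
    """Term frequency times log inverse document frequency."""
    tf = Counter(doc)
    n = len(corpus)
    return {w: tf[w] * math.log(n / sum(1 for d in corpus if w in d))
            for w in tf}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [["deterministic", "model", "stochastic"],
        ["deterministic", "model", "error"],
        ["economy", "gallup"]]
vecs = [tfidf(d, docs) for d in docs]
```

Rare words ("deterministic" here, relative to the whole blog corpus) get large weights, which is why they dominate the similarity rankings below.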

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 24 andrew gelman stats-2010-05-09-Special journal issue on statistical methods for the social sciences

Introduction: Last year I spoke at a conference celebrating the 10th anniversary of the University of Washington’s Center for Statistics and the Social Sciences, and just today a special issue of the journal Statistical Methodology came out in honor of the center’s anniversary. My article in the special issue actually has nothing to do with my talk at the conference; rather, it’s an exploration of an idea that Iven Van Mechelen and I had for understanding deterministic models probabilistically: For the analysis of binary data, various deterministic models have been proposed, which are generally simpler to fit and easier to understand than probabilistic models. We claim that corresponding to any deterministic model is an implicit stochastic model in which the deterministic model fits imperfectly, with errors occurring at random. In the context of binary data, we consider a model in which the probability of error depends on the model prediction. We show how to fit this model using a stochastic modification of deterministic optimization schemes.

2 0.77122879 780 andrew gelman stats-2011-06-27-Bridges between deterministic and probabilistic models for binary data

Introduction: For the analysis of binary data, various deterministic models have been proposed, which are generally simpler to fit and easier to understand than probabilistic models. We claim that corresponding to any deterministic model is an implicit stochastic model in which the deterministic model fits imperfectly, with errors occurring at random. In the context of binary data, we consider a model in which the probability of error depends on the model prediction. We show how to fit this model using a stochastic modification of deterministic optimization schemes. The advantages of fitting the stochastic model explicitly (rather than implicitly, by simply fitting a deterministic model and accepting the occurrence of errors) include quantification of uncertainty in the deterministic model’s parameter estimates, better estimation of the true model error rate, and the ability to check the fit of the model nontrivially. We illustrate this with a simple theoretical example of item response data and with empirical examples from archeology and the psychology of choice.

3 0.39096999 1162 andrew gelman stats-2012-02-11-Adding an error model to a deterministic model

Introduction: Daniel Lakeland asks, “Where do likelihoods come from?” He describes a class of problems where you have a deterministic dynamic model that you want to fit to data. The data won’t fit perfectly so, if you want to do Bayesian inference, you need to introduce an error model. This looks a little bit different from the usual way that models are presented in statistics textbooks, where the focus is typically on the random error process, not on the deterministic part of the model. A focus on the error process makes sense in some applications that have inherent randomness or variation (for example, genetics, psychology, and survey sampling) but not so much in the physical sciences, where the deterministic model can be complicated and is typically the essence of the study. Often in these sorts of studies, the starting point (and sometimes the ending point) is what the physicists call “nonlinear least squares” or what we would call normally-distributed errors. That’s what we did for our

4 0.1798256 214 andrew gelman stats-2010-08-17-Probability-processing hardware

Introduction: Lyric Semiconductor posted: For over 60 years, computers have been based on digital computing principles. Data is represented as bits (0s and 1s). Boolean logic gates perform operations on these bits. A processor steps through many of these operations serially in order to perform a function. However, today’s most interesting problems are not at all suited to this approach. Here at Lyric Semiconductor, we are redesigning information processing circuits from the ground up to natively process probabilities: from the gate circuits to the processor architecture to the programming language. As a result, many applications that today require a thousand conventional processors will soon run in just one Lyric processor, providing 1,000x efficiencies in cost, power, and size. Om Malik has some more information, also relating to the team and the business. The fundamental idea is that computing architectures work deterministically, even though the world is fundamentally stochastic.

5 0.17070433 1972 andrew gelman stats-2013-08-07-When you’re planning on fitting a model, build up to it by fitting simpler models first. Then, once you have a model you like, check the hell out of it

Introduction: In response to my remarks on his online book, Think Bayes, Allen Downey wrote: I [Downey] have a question about one of your comments: My [Gelman's] main criticism with both books is that they talk a lot about inference but not so much about model building or model checking (recall the three steps of Bayesian data analysis). I think it’s ok for an introductory book to focus on inference, which of course is central to the data-analytic process—but I’d like them to at least mention that Bayesian ideas arise in model building and model checking as well. This sounds like something I agree with, and one of the things I tried to do in the book is to put modeling decisions front and center. But the word “modeling” is used in lots of ways, so I want to see if we are talking about the same thing. For example, in many chapters, I start with a simple model of the scenario, do some analysis, then check whether the model is good enough, and iterate. Here’s the discussion of modeling

6 0.15370676 1392 andrew gelman stats-2012-06-26-Occam

7 0.13663408 1431 andrew gelman stats-2012-07-27-Overfitting

8 0.13235174 1141 andrew gelman stats-2012-01-28-Using predator-prey models on the Canadian lynx series

9 0.13205217 781 andrew gelman stats-2011-06-28-The holes in my philosophy of Bayesian data analysis

10 0.12701778 320 andrew gelman stats-2010-10-05-Does posterior predictive model checking fit with the operational subjective approach?

11 0.12568295 929 andrew gelman stats-2011-09-27-Visual diagnostics for discrete-data regressions

12 0.12518893 1406 andrew gelman stats-2012-07-05-Xiao-Li Meng and Xianchao Xie rethink asymptotics

13 0.12361172 754 andrew gelman stats-2011-06-09-Difficulties with Bayesian model averaging

14 0.12143529 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model

15 0.12071382 2133 andrew gelman stats-2013-12-13-Flexibility is good

16 0.11993922 1735 andrew gelman stats-2013-02-24-F-f-f-fake data

17 0.11824013 72 andrew gelman stats-2010-06-07-Valencia: Summer of 1991

18 0.11626564 773 andrew gelman stats-2011-06-18-Should we always be using the t and robit instead of the normal and logit?

19 0.11469099 1165 andrew gelman stats-2012-02-13-Philosophy of Bayesian statistics: my reactions to Wasserman

20 0.11310717 1817 andrew gelman stats-2013-04-21-More on Bayesian model selection in high-dimensional settings


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.192), (1, 0.144), (2, -0.018), (3, 0.02), (4, 0.001), (5, 0.005), (6, -0.05), (7, -0.087), (8, 0.11), (9, 0.058), (10, 0.064), (11, 0.07), (12, -0.165), (13, -0.009), (14, -0.149), (15, -0.039), (16, 0.083), (17, -0.049), (18, -0.045), (19, -0.019), (20, 0.036), (21, -0.066), (22, 0.016), (23, -0.124), (24, -0.04), (25, 0.019), (26, -0.118), (27, -0.034), (28, 0.047), (29, -0.082), (30, -0.099), (31, 0.006), (32, -0.057), (33, -0.021), (34, -0.008), (35, -0.002), (36, -0.034), (37, -0.048), (38, 0.049), (39, -0.014), (40, -0.012), (41, -0.017), (42, 0.013), (43, 0.074), (44, 0.025), (45, 0.034), (46, -0.034), (47, -0.03), (48, -0.023), (49, 0.035)]
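The topic weights above are a post's coordinates in the reduced space of a latent semantic indexing (lsi) model: a truncated SVD of the term-document matrix. A rough sketch of how such coordinates arise, using a tiny invented count matrix (real systems use thousands of terms):

```python
# Latent semantic indexing sketch: truncated SVD of a term-document
# matrix gives each document a short vector of topic weights.
# The 5-term x 4-document count matrix is invented.
import numpy as np

X = np.array([
    [2, 1, 0, 0],   # "model"
    [1, 2, 0, 0],   # "deterministic"
    [1, 1, 0, 1],   # "error"
    [0, 0, 2, 1],   # "economy"
    [0, 0, 1, 2],   # "gallup"
], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2                                       # latent topics kept
doc_topics = (np.diag(s[:k]) @ Vt[:k]).T    # one k-vector per document

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Documents 0 and 1 share a vocabulary block, so their topic vectors end up closer to each other than to document 2, which lives in the other block; that is the mechanism behind the simValue rankings below.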

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98006332 24 andrew gelman stats-2010-05-09-Special journal issue on statistical methods for the social sciences

Introduction: Last year I spoke at a conference celebrating the 10th anniversary of the University of Washington’s Center for Statistics and the Social Sciences, and just today a special issue of the journal Statistical Methodology came out in honor of the center’s anniversary. My article in the special issue actually has nothing to do with my talk at the conference; rather, it’s an exploration of an idea that Iven Van Mechelen and I had for understanding deterministic models probabilistically: For the analysis of binary data, various deterministic models have been proposed, which are generally simpler to fit and easier to understand than probabilistic models. We claim that corresponding to any deterministic model is an implicit stochastic model in which the deterministic model fits imperfectly, with errors occurring at random. In the context of binary data, we consider a model in which the probability of error depends on the model prediction. We show how to fit this model using a stochastic modification of deterministic optimization schemes.

2 0.94261044 780 andrew gelman stats-2011-06-27-Bridges between deterministic and probabilistic models for binary data

Introduction: For the analysis of binary data, various deterministic models have been proposed, which are generally simpler to fit and easier to understand than probabilistic models. We claim that corresponding to any deterministic model is an implicit stochastic model in which the deterministic model fits imperfectly, with errors occurring at random. In the context of binary data, we consider a model in which the probability of error depends on the model prediction. We show how to fit this model using a stochastic modification of deterministic optimization schemes. The advantages of fitting the stochastic model explicitly (rather than implicitly, by simply fitting a deterministic model and accepting the occurrence of errors) include quantification of uncertainty in the deterministic model’s parameter estimates, better estimation of the true model error rate, and the ability to check the fit of the model nontrivially. We illustrate this with a simple theoretical example of item response data and with empirical examples from archeology and the psychology of choice.

3 0.86643833 1162 andrew gelman stats-2012-02-11-Adding an error model to a deterministic model

Introduction: Daniel Lakeland asks, “Where do likelihoods come from?” He describes a class of problems where you have a deterministic dynamic model that you want to fit to data. The data won’t fit perfectly so, if you want to do Bayesian inference, you need to introduce an error model. This looks a little bit different from the usual way that models are presented in statistics textbooks, where the focus is typically on the random error process, not on the deterministic part of the model. A focus on the error process makes sense in some applications that have inherent randomness or variation (for example, genetics, psychology, and survey sampling) but not so much in the physical sciences, where the deterministic model can be complicated and is typically the essence of the study. Often in these sorts of studies, the starting point (and sometimes the ending point) is what the physicists call “nonlinear least squares” or what we would call normally-distributed errors. That’s what we did for our

4 0.85681069 1141 andrew gelman stats-2012-01-28-Using predator-prey models on the Canadian lynx series

Introduction: The “Canadian lynx data” is one of the famous examples used in time series analysis. And the usual models that are fit to these data in the statistics time-series literature don’t work well. Cavan Reilly and Angelique Zeringue write: Reilly and Zeringue then present their analysis. Their simple little predator-prey model with a weakly informative prior way outperforms the standard big-ass autoregression models. Check this out: Or, to put it into numbers, when they fit their model to the first 80 years and predict to the next 34, their root mean square out-of-sample error is 1480 (see scale of data above). In contrast, the standard model fit to these data (the SETAR model of Tong, 1990) has more than twice as many parameters but gets a worse-performing root mean square error of 1600, even when that model is fit to the entire dataset. (If you fit the SETAR or any similar autoregressive model to the first 80 years and use it to predict the next 34, the predictions

5 0.8415612 2133 andrew gelman stats-2013-12-13-Flexibility is good

Introduction: If I made a separate post for each interesting blog discussion, we’d get overwhelmed. That’s why I often leave detailed responses in the comments section, even though I’m pretty sure that most readers don’t look in the comments at all. Sometimes, though, I think it’s good to bring such discussions to light. Here’s a recent example. Michael wrote: Poor predictive performance usually indicates that the model isn’t sufficiently flexible to explain the data, and my understanding of the proper Bayesian strategy is to feed that back into your original model and try again until you achieve better performance. Corey replied: It was my impression that — in ML at least — poor predictive performance is more often due to the model being too flexible and fitting noise. And Rahul agreed: Good point. A very flexible model will describe your training data perfectly and then go bonkers when unleashed on wild data. But I wrote: Overfitting comes from a model being flex

6 0.81537098 1004 andrew gelman stats-2011-11-11-Kaiser Fung on how not to critique models

7 0.80899996 265 andrew gelman stats-2010-09-09-Removing the blindfold: visualising statistical models

8 0.80242044 1817 andrew gelman stats-2013-04-21-More on Bayesian model selection in high-dimensional settings

9 0.80010533 448 andrew gelman stats-2010-12-03-This is a footnote in one of my papers

10 0.79079533 1197 andrew gelman stats-2012-03-04-“All Models are Right, Most are Useless”

11 0.78206402 964 andrew gelman stats-2011-10-19-An interweaving-transformation strategy for boosting MCMC efficiency

12 0.78169119 2007 andrew gelman stats-2013-09-03-Popper and Jaynes

13 0.77458757 1234 andrew gelman stats-2012-03-28-The Supreme Court’s Many Median Justices

14 0.77087504 1392 andrew gelman stats-2012-06-26-Occam

15 0.76838797 1406 andrew gelman stats-2012-07-05-Xiao-Li Meng and Xianchao Xie rethink asymptotics

16 0.76332325 1431 andrew gelman stats-2012-07-27-Overfitting

17 0.76265919 346 andrew gelman stats-2010-10-16-Mandelbrot and Akaike: from taxonomy to smooth runways (pioneering work in fractals and self-similarity)

18 0.76168114 1972 andrew gelman stats-2013-08-07-When you’re planning on fitting a model, build up to it by fitting simpler models first. Then, once you have a model you like, check the hell out of it

19 0.76092058 823 andrew gelman stats-2011-07-26-Including interactions or not

20 0.75647408 552 andrew gelman stats-2011-02-03-Model Makers’ Hippocratic Oath


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(0, 0.012), (6, 0.01), (14, 0.011), (15, 0.046), (16, 0.084), (21, 0.038), (24, 0.126), (34, 0.039), (55, 0.019), (56, 0.129), (70, 0.01), (76, 0.014), (81, 0.011), (86, 0.029), (97, 0.015), (99, 0.297)]
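The lda weights above are sparse (topicId, topicWeight) pairs: only topics with nonzero weight are listed. Similarity between two posts can be computed directly on such sparse vectors; a minimal sketch with invented weights (the second post's weights are not from this document):

```python
# Cosine similarity between sparse (topicId, topicWeight) vectors
# of the form shown above.  post_b's weights are invented.
import math

def sparse_cosine(a, b):
    """Cosine similarity of two lists of (topic_id, weight) pairs;
    topics absent from a vector are treated as weight 0."""
    da, db = dict(a), dict(b)
    dot = sum(w * db.get(t, 0.0) for t, w in da.items())
    na = math.sqrt(sum(w * w for w in da.values()))
    nb = math.sqrt(sum(w * w for w in db.values()))
    return dot / (na * nb) if na and nb else 0.0

post_a = [(0, 0.012), (16, 0.084), (24, 0.126), (99, 0.297)]
post_b = [(16, 0.050), (24, 0.100), (56, 0.129), (99, 0.300)]
sim = sparse_cosine(post_a, post_b)
```

Values like the simValue column below are plausibly produced this way: a post compared against itself scores 1, and unrelated posts with disjoint topics score 0.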

similar blogs list:

simIndex simValue blogId blogTitle

1 0.9721725 933 andrew gelman stats-2011-09-30-More bad news: The (mis)reporting of statistical results in psychology journals

Introduction: Another entry in the growing literature on systematic flaws in the scientific research literature. This time the bad tidings come from Marjan Bakker and Jelte Wicherts, who write: Around 18% of statistical results in the psychological literature are incorrectly reported. Inconsistencies were more common in low-impact journals than in high-impact journals. Moreover, around 15% of the articles contained at least one statistical conclusion that proved, upon recalculation, to be incorrect; that is, recalculation rendered the previously significant result insignificant, or vice versa. These errors were often in line with researchers’ expectations. Their research also had a qualitative component: To obtain a better understanding of the origins of the errors made in the reporting of statistics, we contacted the authors of the articles with errors in the second study and asked them to send us the raw data. Regrettably, only 24% of the authors shared their data, despite our request

2 0.96709561 267 andrew gelman stats-2010-09-09-This Friday afternoon: Applied Statistics Center mini-conference on risk perception

Introduction: We’re doing a new thing here at the Applied Statistics Center, throwing monthly Friday afternoon mini-conferences in the Playroom (inspired by our successful miniconference on statistical consulting a couple years ago). This Friday (10 Sept), 1-5pm : Come join us this Friday, September 10th for an engaging interdisciplinary discussion of risk perception at the individual and societal level, and the role it plays in current environmental, social, and health policy debates. All are welcome! “Risk Perception in Environmental Decision-Making” Elke Weber, Columbia Business School “Cultural Cognition and the Problem of Science Communication” Dan Kahan, Yale Law School Discussants include: Michael Gerrard, Columbia Law School David Epstein, Department of Political Science, Columbia University Andrew Gelman, Department of Statistics, Columbia University

same-blog 3 0.96335918 24 andrew gelman stats-2010-05-09-Special journal issue on statistical methods for the social sciences

Introduction: Last year I spoke at a conference celebrating the 10th anniversary of the University of Washington’s Center for Statistics and the Social Sciences, and just today a special issue of the journal Statistical Methodology came out in honor of the center’s anniversary. My article in the special issue actually has nothing to do with my talk at the conference; rather, it’s an exploration of an idea that Iven Van Mechelen and I had for understanding deterministic models probabilistically: For the analysis of binary data, various deterministic models have been proposed, which are generally simpler to fit and easier to understand than probabilistic models. We claim that corresponding to any deterministic model is an implicit stochastic model in which the deterministic model fits imperfectly, with errors occurring at random. In the context of binary data, we consider a model in which the probability of error depends on the model prediction. We show how to fit this model using a stochastic modification of deterministic optimization schemes.

4 0.96226263 14 andrew gelman stats-2010-05-01-Imputing count data

Introduction: Guy asks: I am analyzing an original survey of farmers in Uganda. I am hoping to use a battery of welfare proxy variables to create a single welfare index using PCA. I have quick question which I hope you can find time to address: How do you recommend treating count data? (for example # of rooms, # of chickens, # of cows, # of radios)? In my dataset these variables are highly skewed with many responses at zero (which makes taking the natural log problematic). In the case of # of cows or chickens several obs have values in the hundreds. My response: Here’s what we do in our mi package in R. We split a variable into two parts: an indicator for whether it is positive, and the positive part. That is, y = u*v. Then u is binary and can be modeled using logisitc regression, and v can be modeled on the log scale. At the end you can round to the nearest integer if you want to avoid fractional values.
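The split described in that reply (y = u*v, with u binary and v modeled on the log scale) can be sketched as below. This only shows the decomposition and its inverse; the logistic and log-scale regression models themselves are omitted, and the counts are invented.

```python
# Split a skewed count y into u = 1{y > 0} (for logistic regression)
# and log(y) for the positive part (for a log-scale model).
import math

def split_count(y):
    u = 1 if y > 0 else 0
    log_v = math.log(y) if y > 0 else None   # log only defined for positives
    return u, log_v

def recombine(u, log_v):
    if u == 0:
        return 0
    return int(round(math.exp(log_v)))       # round back to a whole count

counts = [0, 0, 3, 150, 0, 7]                # e.g. number of cows per household
parts = [split_count(y) for y in counts]
```

The point of the split is that the many exact zeros go into the binary part, so the log transform is only ever applied to strictly positive values.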

5 0.96226251 1388 andrew gelman stats-2012-06-22-Americans think economy isn’t so bad in their city but is crappy nationally and globally

Introduction: Frank Newport of Gallup reports (link from Jay Livingston): Americans become progressively less positive about economic conditions the farther away from home they look. Forty-nine percent rate economic conditions in their local area as excellent or good, but that drops to 25% when rating the U.S. economy, and to 13% when assessing the world as a whole. This is really wack: I can see how it might make sense for Americans to think conditions are worse in other countries, but it’s hard to see a rational reason for the Lake-Wobegon-like pattern of people thinking things are ok locally but not nationally. Gallup highlights the partisan breakdown, which I graphed here: Unsurprisingly (given that their party controls the presidency and one house of Congress), Democrats are more optimistic than Republicans. This is just the flip side of University of Chicago economist Casey Mulligan claiming in October 2008 that the economy is not that bad because “the current unem

6 0.96212924 780 andrew gelman stats-2011-06-27-Bridges between deterministic and probabilistic models for binary data

7 0.96117628 1054 andrew gelman stats-2011-12-12-More frustrations trying to replicate an analysis published in a reputable journal

8 0.95879531 1158 andrew gelman stats-2012-02-07-The more likely it is to be X, the more likely it is to be Not X?

9 0.95132196 984 andrew gelman stats-2011-11-01-David MacKay sez . . . 12??

10 0.94088942 1045 andrew gelman stats-2011-12-07-Martyn Plummer’s Secret JAGS Blog

11 0.93927586 1162 andrew gelman stats-2012-02-11-Adding an error model to a deterministic model

12 0.93281329 534 andrew gelman stats-2011-01-24-Bayes at the end

13 0.93035638 1842 andrew gelman stats-2013-05-05-Cleaning up science

14 0.92957288 1011 andrew gelman stats-2011-11-15-World record running times vs. distance

15 0.92210627 2248 andrew gelman stats-2014-03-15-Problematic interpretations of confidence intervals

16 0.92125821 2220 andrew gelman stats-2014-02-22-Quickies

17 0.92026544 2137 andrew gelman stats-2013-12-17-Replication backlash

18 0.92007393 1878 andrew gelman stats-2013-05-31-How to fix the tabloids? Toward replicable social science research

19 0.92003822 2227 andrew gelman stats-2014-02-27-“What Can we Learn from the Many Labs Replication Project?”

20 0.91928208 886 andrew gelman stats-2011-09-02-The new Helen DeWitt novel