andrew_gelman_stats-2011-780 knowledge-graph by maker-knowledge-mining

780 andrew gelman stats-2011-06-27-Bridges between deterministic and probabilistic models for binary data


meta info for this blog

Source: html

Introduction: For the analysis of binary data, various deterministic models have been proposed, which are generally simpler to fit and easier to understand than probabilistic models. We claim that corresponding to any deterministic model is an implicit stochastic model in which the deterministic model fits imperfectly, with errors occurring at random. In the context of binary data, we consider a model in which the probability of error depends on the model prediction. We show how to fit this model using a stochastic modification of deterministic optimization schemes. The advantages of fitting the stochastic model explicitly (rather than implicitly, by simply fitting a deterministic model and accepting the occurrence of errors) include quantification of uncertainty in the deterministic model’s parameter estimates, better estimation of the true model error rate, and the ability to check the fit of the model nontrivially. We illustrate this with a simple theoretical example of item response data and with empirical examples from archeology and the psychology of choice.
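
To make the abstract’s idea concrete, here is a minimal sketch in Python, not the paper’s actual code: a toy deterministic linear classifier is wrapped in an explicit error model whose error probability depends on the prediction (one rate for predicted 0s, another for predicted 1s), and everything is fit by maximum likelihood via a crude random local search standing in for the paper’s “stochastic modification of deterministic optimization schemes.” The model, data, and all names are invented for illustration.

```python
# A minimal sketch (not the paper's code): a deterministic linear rule,
# wrapped in an error model whose error probability depends on the
# prediction, fit by maximum likelihood with a random local search.
import numpy as np

rng = np.random.default_rng(0)

def predict(theta, x):
    """Toy deterministic rule: predict 1 when the linear score is positive."""
    return (x @ theta > 0).astype(int)

def error_rates(pred, y):
    """Closed-form MLEs of the two error rates, given the predictions."""
    e0 = y[pred == 0].mean() if (pred == 0).any() else 0.0       # errors among predicted 0s
    e1 = (1 - y[pred == 1]).mean() if (pred == 1).any() else 0.0  # errors among predicted 1s
    return np.clip(e0, 1e-3, 0.499), np.clip(e1, 1e-3, 0.499)

def neg_log_lik(theta, x, y):
    pred = predict(theta, x)
    e0, e1 = error_rates(pred, y)
    eps = np.where(pred == 1, e1, e0)        # error probability depends on prediction
    p1 = np.where(pred == 1, 1 - eps, eps)   # implied P(y = 1)
    return -np.sum(np.where(y == 1, np.log(p1), np.log(1 - p1)))

# Simulate binary data from a noisy version of the deterministic rule.
n, d = 500, 3
x = rng.normal(size=(n, d))
ideal = predict(np.array([1.0, -2.0, 0.5]), x)
y = np.where(rng.random(n) < 0.1, 1 - ideal, ideal)  # 10% of labels flipped

# Random local search: a stand-in for a stochastic modification of a
# deterministic optimization scheme.
theta, best = rng.normal(size=d), np.inf
for _ in range(5000):
    cand = theta + 0.2 * rng.normal(size=d)
    nll = neg_log_lik(cand, x, y)
    if nll < best:
        theta, best = cand, nll

print(theta / np.linalg.norm(theta))       # direction of the fitted rule
print(error_rates(predict(theta, x), y))   # estimated error rates (both near 0.1)
```

Because the likelihood treats misclassifications as random errors rather than pretending the deterministic fit is exact, the fitted error rates deliver the error-rate estimation and model checking that the abstract advertises.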


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 For the analysis of binary data, various deterministic models have been proposed, which are generally simpler to fit and easier to understand than probabilistic models. [sent-1, score-1.3]

2 We claim that corresponding to any deterministic model is an implicit stochastic model in which the deterministic model fits imperfectly, with errors occurring at random. [sent-2, score-2.873]

3 In the context of binary data, we consider a model in which the probability of error depends on the model prediction. [sent-3, score-0.909]

4 We show how to fit this model using a stochastic modification of deterministic optimization schemes. [sent-4, score-1.526]

5 We illustrate this with a simple theoretical example of item response data and with empirical examples from archeology and the psychology of choice. [sent-6, score-0.296]

6 It didn’t get a lot of attention when it came out, but I still think it’s an excellent and widely applicable idea. [sent-8, score-0.21]

7 Lots of people are running around out there fitting deterministic prediction models, and our paper describes a method for taking such models and interpreting them probabilistically. [sent-9, score-1.104]

8 By turning a predictive model into a generative probabilistic model, we can check model fit and point toward potential improvements (which might be implementable in the original deterministic framework). [sent-10, score-1.924]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('deterministic', 0.662), ('model', 0.319), ('stochastic', 0.234), ('fitting', 0.162), ('fit', 0.149), ('probabilistic', 0.145), ('binary', 0.13), ('archeology', 0.116), ('jeroen', 0.116), ('imperfectly', 0.098), ('iven', 0.098), ('mechelen', 0.098), ('occurrence', 0.098), ('quantification', 0.098), ('applicable', 0.095), ('errors', 0.093), ('check', 0.092), ('modification', 0.089), ('generative', 0.088), ('models', 0.087), ('error', 0.081), ('van', 0.081), ('turning', 0.076), ('interpreting', 0.076), ('occurring', 0.076), ('accepting', 0.075), ('advantages', 0.074), ('improvements', 0.074), ('optimization', 0.073), ('simpler', 0.07), ('de', 0.069), ('implicitly', 0.066), ('illustrate', 0.066), ('corresponding', 0.066), ('proposed', 0.063), ('explicitly', 0.063), ('implicit', 0.062), ('widely', 0.062), ('fits', 0.061), ('item', 0.061), ('describes', 0.061), ('depends', 0.06), ('framework', 0.059), ('easier', 0.057), ('paul', 0.056), ('prediction', 0.056), ('estimation', 0.054), ('ability', 0.054), ('attention', 0.053), ('empirical', 0.053)]
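
A sketch of the kind of pipeline that produces a (word, weight) list like the one above and the simValue scores below: tf-idf vectors per document, compared by cosine similarity. The scikit-learn tooling and the toy documents are assumptions for illustration, not what this page actually used.

```python
# Tf-idf weights per document, plus cosine similarity between documents.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "deterministic models for binary data with an implicit stochastic model",
    "adding an error model to a deterministic dynamic model",
    "probability processing hardware computes with stochastic circuits",
]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(docs)

# Top-weighted words for the first document, like the wordTfidf list above.
weights = X[0].toarray().ravel()
terms = vec.get_feature_names_out()
print(sorted(zip(terms, weights), key=lambda t: -t[1])[:5])

# Similarity of every document to the first, like the simValue column.
print(cosine_similarity(X[0], X))
```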

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 780 andrew gelman stats-2011-06-27-Bridges between deterministic and probabilistic models for binary data


2 0.77122879 24 andrew gelman stats-2010-05-09-Special journal issue on statistical methods for the social sciences

Introduction: Last year I spoke at a conference celebrating the 10th anniversary of the University of Washington’s Center for Statistics and the Social Sciences, and just today a special issue of the journal Statistical Methodology came out in honor of the center’s anniversary. My article in the special issue actually has nothing to do with my talk at the conference; rather, it’s an exploration of an idea that Iven Van Mechelen and I had for understanding deterministic models probabilistically: For the analysis of binary data, various deterministic models have been proposed, which are generally simpler to fit and easier to understand than probabilistic models. We claim that corresponding to any deterministic model is an implicit stochastic model in which the deterministic model fits imperfectly, with errors occurring at random. In the context of binary data, we consider a model in which the probability of error depends on the model prediction. We show how to fit this model using a stochastic modification of deterministic optimization schemes. …

3 0.45810711 1162 andrew gelman stats-2012-02-11-Adding an error model to a deterministic model

Introduction: Daniel Lakeland asks, “Where do likelihoods come from?” He describes a class of problems where you have a deterministic dynamic model that you want to fit to data. The data won’t fit perfectly so, if you want to do Bayesian inference, you need to introduce an error model. This looks a little bit different from the usual way that models are presented in statistics textbooks, where the focus is typically on the random error process, not on the deterministic part of the model. A focus on the error process makes sense in some applications that have inherent randomness or variation (for example, genetics, psychology, and survey sampling) but not so much in the physical sciences, where the deterministic model can be complicated and is typically the essence of the study. Often in these sorts of studies, the starting point (and sometimes the ending point) is what the physicists call “nonlinear least squares” or what we would call normally-distributed errors. That’s what we did for our …
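
A minimal sketch of the point in this excerpt: “nonlinear least squares” on a deterministic model is the same as maximum likelihood under i.i.d. normal errors. The exponential-decay model and the data here are invented for the example.

```python
# Nonlinear least squares = normal-errors maximum likelihood, on toy data.
import numpy as np
from scipy.optimize import curve_fit

def decay(t, a, k):
    """Toy deterministic model: exponential decay."""
    return a * np.exp(-k * t)

rng = np.random.default_rng(1)
t = np.linspace(0.0, 5.0, 40)
y = decay(t, 2.0, 0.7) + rng.normal(scale=0.1, size=t.size)  # add normal errors

(a_hat, k_hat), _ = curve_fit(decay, t, y, p0=(1.0, 1.0))        # least squares fit
sigma_hat = np.sqrt(np.mean((y - decay(t, a_hat, k_hat)) ** 2))  # MLE of error sd
print(a_hat, k_hat, sigma_hat)
```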

4 0.19894508 214 andrew gelman stats-2010-08-17-Probability-processing hardware

Introduction: Lyric Semiconductor posted: For over 60 years, computers have been based on digital computing principles. Data is represented as bits (0s and 1s). Boolean logic gates perform operations on these bits. A processor steps through many of these operations serially in order to perform a function. However, today’s most interesting problems are not at all suited to this approach. Here at Lyric Semiconductor, we are redesigning information processing circuits from the ground up to natively process probabilities: from the gate circuits to the processor architecture to the programming language. As a result, many applications that today require a thousand conventional processors will soon run in just one Lyric processor, providing 1,000x efficiencies in cost, power, and size. Om Malik has some more information, also relating to the team and the business. The fundamental idea is that computing architectures work deterministically, even though the world is fundamentally stochastic.

5 0.18853104 1972 andrew gelman stats-2013-08-07-When you’re planning on fitting a model, build up to it by fitting simpler models first. Then, once you have a model you like, check the hell out of it

Introduction: In response to my remarks on his online book, Think Bayes, Allen Downey wrote: I [Downey] have a question about one of your comments: My [Gelman's] main criticism with both books is that they talk a lot about inference but not so much about model building or model checking (recall the three steps of Bayesian data analysis). I think it’s ok for an introductory book to focus on inference, which of course is central to the data-analytic process—but I’d like them to at least mention that Bayesian ideas arise in model building and model checking as well. This sounds like something I agree with, and one of the things I tried to do in the book is to put modeling decisions front and center. But the word “modeling” is used in lots of ways, so I want to see if we are talking about the same thing. For example, in many chapters, I start with a simple model of the scenario, do some analysis, then check whether the model is good enough, and iterate. Here’s the discussion of modeling …

6 0.1572741 1141 andrew gelman stats-2012-01-28-Using predator-prey models on the Canadian lynx series

7 0.15281923 1392 andrew gelman stats-2012-06-26-Occam

8 0.15138176 1401 andrew gelman stats-2012-06-30-David Hogg on statistics

9 0.14874534 2133 andrew gelman stats-2013-12-13-Flexibility is good

10 0.13770115 1431 andrew gelman stats-2012-07-27-Overfitting

11 0.13756596 1735 andrew gelman stats-2013-02-24-F-f-f-fake data

12 0.13686207 781 andrew gelman stats-2011-06-28-The holes in my philosophy of Bayesian data analysis

13 0.13586977 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model

14 0.13503893 773 andrew gelman stats-2011-06-18-Should we always be using the t and robit instead of the normal and logit?

15 0.13356097 774 andrew gelman stats-2011-06-20-The pervasive twoishness of statistics; in particular, the “sampling distribution” and the “likelihood” are two different models, and that’s a good thing

16 0.13287094 320 andrew gelman stats-2010-10-05-Does posterior predictive model checking fit with the operational subjective approach?

17 0.13164526 1482 andrew gelman stats-2012-09-04-Model checking and model understanding in machine learning

18 0.12908533 1004 andrew gelman stats-2011-11-11-Kaiser Fung on how not to critique models

19 0.12631062 929 andrew gelman stats-2011-09-27-Visual diagnostics for discrete-data regressions

20 0.12587842 754 andrew gelman stats-2011-06-09-Difficulties with Bayesian model averaging


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.158), (1, 0.2), (2, 0.029), (3, 0.044), (4, 0.025), (5, 0.013), (6, -0.037), (7, -0.076), (8, 0.153), (9, 0.058), (10, 0.05), (11, 0.089), (12, -0.175), (13, -0.027), (14, -0.167), (15, -0.031), (16, 0.084), (17, -0.053), (18, -0.05), (19, -0.023), (20, 0.045), (21, -0.076), (22, -0.007), (23, -0.131), (24, -0.052), (25, 0.024), (26, -0.098), (27, -0.019), (28, 0.057), (29, -0.071), (30, -0.099), (31, 0.018), (32, -0.044), (33, -0.023), (34, 0.01), (35, 0.021), (36, -0.043), (37, -0.071), (38, 0.039), (39, -0.037), (40, -0.002), (41, -0.025), (42, 0.011), (43, 0.069), (44, 0.021), (45, 0.025), (46, -0.042), (47, -0.028), (48, -0.016), (49, 0.044)]
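
These topic weights are the document’s coordinates in a latent semantic space. A minimal sketch of the usual LSI pipeline, using scikit-learn’s TruncatedSVD as a stand-in assumption for whatever code generated this page: tf-idf, then a truncated SVD, then cosine similarity in the latent space.

```python
# LSI as tf-idf followed by truncated SVD; similarity in the latent space.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "deterministic and probabilistic models for binary data",
    "adding an error model to a deterministic model",
    "predator prey models on the canadian lynx series",
    "bayesian model averaging or fitting a larger model",
]

X = TfidfVectorizer(stop_words="english").fit_transform(docs)
Z = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)

print(Z[0])                         # per-topic weights for the first document
print(cosine_similarity(Z[:1], Z))  # simValue-style scores against document 0
```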

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98214734 780 andrew gelman stats-2011-06-27-Bridges between deterministic and probabilistic models for binary data


2 0.96715713 24 andrew gelman stats-2010-05-09-Special journal issue on statistical methods for the social sciences


3 0.89722282 1162 andrew gelman stats-2012-02-11-Adding an error model to a deterministic model


4 0.88912296 1141 andrew gelman stats-2012-01-28-Using predator-prey models on the Canadian lynx series

Introduction: The “Canadian lynx data” is one of the famous examples used in time series analysis. And the usual models that are fit to these data in the statistics time-series literature don’t work well. Cavan Reilly and Angelique Zeringue write: Reilly and Zeringue then present their analysis. Their simple little predator-prey model with a weakly informative prior way outperforms the standard big-ass autoregression models. Check this out: Or, to put it into numbers, when they fit their model to the first 80 years and predict to the next 34, their root mean square out-of-sample error is 1480 (see scale of data above). In contrast, the standard model fit to these data (the SETAR model of Tong, 1990) has more than twice as many parameters but gets a worse-performing root mean square error of 1600, even when that model is fit to the entire dataset. (If you fit the SETAR or any similar autoregressive model to the first 80 years and use it to predict the next 34, the predictions …
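
A minimal sketch of the evaluation described in the excerpt: fit on the first 80 observations, forecast the remaining 34, and score by root mean square error. The AR(1)-on-logs model and the simulated series are toy stand-ins, not Reilly and Zeringue’s predator-prey model.

```python
# Out-of-sample RMSE for a toy time-series forecast.
import numpy as np

rng = np.random.default_rng(3)
n = 114
z = np.zeros(n)
for t in range(1, n):                   # simulate a toy autoregressive series
    z[t] = 0.8 * z[t - 1] + rng.normal(scale=0.5)
series = np.exp(z + 7.0)                # positive values on a lynx-like scale

train, test = series[:80], series[80:]
lz = np.log(train)
slope, intercept = np.polyfit(lz[:-1], lz[1:], 1)  # AR(1) fit on the log scale

preds, last = [], lz[-1]
for _ in range(len(test)):              # iterate the fitted recursion forward
    last = slope * last + intercept
    preds.append(np.exp(last))

rmse = np.sqrt(np.mean((np.array(preds) - test) ** 2))
print(rmse)                             # out-of-sample RMSE, as in the comparison
```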

5 0.8841604 2133 andrew gelman stats-2013-12-13-Flexibility is good

Introduction: If I made a separate post for each interesting blog discussion, we’d get overwhelmed. That’s why I often leave detailed responses in the comments section, even though I’m pretty sure that most readers don’t look in the comments at all. Sometimes, though, I think it’s good to bring such discussions to light. Here’s a recent example. Michael wrote: Poor predictive performance usually indicates that the model isn’t sufficiently flexible to explain the data, and my understanding of the proper Bayesian strategy is to feed that back into your original model and try again until you achieve better performance. Corey replied: It was my impression that — in ML at least — poor predictive performance is more often due to the model being too flexible and fitting noise. And Rahul agreed: Good point. A very flexible model will describe your training data perfectly and then go bonkers when unleashed on wild data. But I wrote: Overfitting comes from a model being flexible …
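
A small simulation of the point Corey and Rahul make in the excerpt: a very flexible model (a high-degree polynomial) fits the training data almost perfectly and then falls apart out of sample. All data here are simulated for illustration.

```python
# Train vs. test error as model flexibility (polynomial degree) grows.
import numpy as np

rng = np.random.default_rng(4)
x_train = np.sort(rng.uniform(-1, 1, 15))
y_train = np.sin(3 * x_train) + rng.normal(scale=0.2, size=15)
x_test = np.sort(rng.uniform(-1, 1, 200))
y_test = np.sin(3 * x_test) + rng.normal(scale=0.2, size=200)

for degree in (2, 5, 14):
    coefs = np.polyfit(x_train, y_train, degree)
    rmse = lambda xx, yy: np.sqrt(np.mean((np.polyval(coefs, xx) - yy) ** 2))
    # Train error shrinks with degree; test error eventually blows up.
    print(degree, round(rmse(x_train, y_train), 3), round(rmse(x_test, y_test), 3))
```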

6 0.86478645 1004 andrew gelman stats-2011-11-11-Kaiser Fung on how not to critique models

7 0.83446974 1817 andrew gelman stats-2013-04-21-More on Bayesian model selection in high-dimensional settings

8 0.81656826 448 andrew gelman stats-2010-12-03-This is a footnote in one of my papers

9 0.80413568 265 andrew gelman stats-2010-09-09-Removing the blindfold: visualising statistical models

10 0.79123676 1197 andrew gelman stats-2012-03-04-“All Models are Right, Most are Useless”

11 0.79025191 1431 andrew gelman stats-2012-07-27-Overfitting

12 0.78929818 1972 andrew gelman stats-2013-08-07-When you’re planning on fitting a model, build up to it by fitting simpler models first. Then, once you have a model you like, check the hell out of it

13 0.78920567 823 andrew gelman stats-2011-07-26-Including interactions or not

14 0.78258413 1392 andrew gelman stats-2012-06-26-Occam

15 0.77721411 1406 andrew gelman stats-2012-07-05-Xiao-Li Meng and Xianchao Xie rethink asymptotics

16 0.77203381 1401 andrew gelman stats-2012-06-30-David Hogg on statistics

17 0.76846576 1234 andrew gelman stats-2012-03-28-The Supreme Court’s Many Median Justices

18 0.76699483 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model

19 0.76669127 2007 andrew gelman stats-2013-09-03-Popper and Jaynes

20 0.76168478 964 andrew gelman stats-2011-10-19-An interweaving-transformation strategy for boosting MCMC efficiency


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(6, 0.014), (16, 0.076), (21, 0.031), (24, 0.153), (34, 0.037), (35, 0.019), (41, 0.014), (56, 0.156), (65, 0.01), (70, 0.029), (76, 0.024), (86, 0.043), (97, 0.011), (99, 0.262)]
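
A sketch of where sparse (topicId, topicWeight) pairs like those above come from: a fitted LDA model’s posterior topic mixture for one document. gensim and the toy corpus are assumptions about tooling, not this page’s actual code.

```python
# LDA topic mixtures for documents, shown with gensim on a toy corpus.
from gensim import corpora, models

texts = [
    ["deterministic", "probabilistic", "model", "binary", "data"],
    ["error", "model", "deterministic", "likelihood", "bayesian"],
    ["survey", "count", "data", "imputation", "regression"],
]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=0)

# Sparse list of (topicId, topicWeight) for the first document.
print(lda.get_document_topics(corpus[0]))
```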

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95851761 780 andrew gelman stats-2011-06-27-Bridges between deterministic and probabilistic models for binary data

Introduction: For the analysis of binary data, various deterministic models have been proposed, which are generally simpler to fit and easier to understand than probabilistic models. We claim that corresponding to any deterministic model is an implicit stochastic model in which the deterministic model fits imperfectly, with errors occurring at random. In the context of binary data, we consider a model in which the probability of error depends on the model prediction. We show how to fit this model using a stochastic modification of deterministic optimization schemes. The advantages of fitting the stochastic model explicitly (rather than implicitly, by simply fitting a deterministic model and accepting the occurrence of errors) include quantification of uncertainty in the deterministic model’s parameter estimates, better estimation of the true model error rate, and the ability to check the fit of the model nontrivially. We illustrate this with a simple theoretical example of item response data and w

2 0.94605082 14 andrew gelman stats-2010-05-01-Imputing count data

Introduction: Guy asks: I am analyzing an original survey of farmers in Uganda. I am hoping to use a battery of welfare proxy variables to create a single welfare index using PCA. I have a quick question which I hope you can find time to address: How do you recommend treating count data? (for example # of rooms, # of chickens, # of cows, # of radios)? In my dataset these variables are highly skewed with many responses at zero (which makes taking the natural log problematic). In the case of # of cows or chickens several obs have values in the hundreds. My response: Here’s what we do in our mi package in R. We split a variable into two parts: an indicator for whether it is positive, and the positive part. That is, y = u*v. Then u is binary and can be modeled using logistic regression, and v can be modeled on the log scale. At the end you can round to the nearest integer if you want to avoid fractional values.
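
A minimal sketch of the two-part trick in the reply, written in Python with statsmodels rather than the mi package in R. The covariate x and the simulated zero-inflated counts are invented for illustration.

```python
# Two-part model for skewed counts: y = u * v, with logistic regression
# for u = 1{y > 0} and a log-scale linear model for the positive part v.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 300
x = rng.normal(size=n)
positive = rng.random(n) < 1 / (1 + np.exp(-(0.5 + x)))
counts = np.round(np.exp(1.0 + 0.8 * x + rng.normal(scale=0.5, size=n)))
y = np.where(positive, counts, 0)

X = sm.add_constant(x)
u_fit = sm.Logit((y > 0).astype(int), X).fit(disp=0)  # model for the indicator u
pos = y > 0
v_fit = sm.OLS(np.log(y[pos]), X[pos]).fit()          # model for log(v)

# Impute y = u * v: draw the indicator, then the positive part on the
# log scale, and round to the nearest integer as the reply suggests.
u_new = rng.random(n) < u_fit.predict(X)
v_new = np.exp(v_fit.predict(X) + rng.normal(scale=np.sqrt(v_fit.scale), size=n))
print(np.where(u_new, np.round(v_new), 0).astype(int)[:10])
```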

3 0.94403923 1011 andrew gelman stats-2011-11-15-World record running times vs. distance

Introduction: Julyan Arbel plots world record running times vs. distance (on the log-log scale): The line has a slope of 1.1. I think it would be clearer to plot speed vs. distance—then you’d get a slope of -0.1, and the numbers would be more directly interpretable. Indeed, this paper by Sandra Savaglio and Vincenzo Carbone (referred to in the comments on Julyan’s blog) plots speed vs. time. Graphing by speed gives more resolution: The upper-left graph in the grid corresponds to the human running records plotted by Arbel. It’s funny that Arbel sees only one line whereas Savaglio and Carbone see two—but if you remove the 100m record at one end and the 100km at the other end, you can see two lines in Arbel’s graph as well. The bottom two graphs show swimming records. Knut would probably have something to say about all this.
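
A quick numerical check of the slopes mentioned here, using approximate recent men’s world records (rounded, illustrative only, and later than the records Arbel plotted in 2011).

```python
# Log-log regression of record time on distance, then speed on distance.
import numpy as np

distance_m = np.array([100, 200, 400, 800, 1500, 5000, 10000, 42195])
time_s = np.array([9.58, 19.19, 43.03, 100.9, 206.0, 755.4, 1571.0, 7235.0])

slope, _ = np.polyfit(np.log(distance_m), np.log(time_s), 1)
print(slope)   # close to 1.1, as in Arbel's log-log plot of time vs. distance

# Speed vs. distance: since log(speed) = log(distance) - log(time), the
# fitted slope is 1 - 1.1, i.e. about -0.1.
speed = distance_m / time_s
print(np.polyfit(np.log(distance_m), np.log(speed), 1)[0])
```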

4 0.94334918 933 andrew gelman stats-2011-09-30-More bad news: The (mis)reporting of statistical results in psychology journals

Introduction: Another entry in the growing literature on systematic flaws in published scientific research. This time the bad tidings come from Marjan Bakker and Jelte Wicherts, who write: Around 18% of statistical results in the psychological literature are incorrectly reported. Inconsistencies were more common in low-impact journals than in high-impact journals. Moreover, around 15% of the articles contained at least one statistical conclusion that proved, upon recalculation, to be incorrect; that is, recalculation rendered the previously significant result insignificant, or vice versa. These errors were often in line with researchers’ expectations. Their research also had a qualitative component: To obtain a better understanding of the origins of the errors made in the reporting of statistics, we contacted the authors of the articles with errors in the second study and asked them to send us the raw data. Regrettably, only 24% of the authors shared their data, despite our request …

5 0.93700409 1054 andrew gelman stats-2011-12-12-More frustrations trying to replicate an analysis published in a reputable journal

Introduction: The story starts in September, when psychology professor Fred Oswald wrote me: I [Oswald] wanted to point out this paper in Science (Ramirez & Beilock, 2010) examining how students’ emotional writing improves their test performance in high-pressure situations. Although replication is viewed as the hallmark of research, this paper replicates implausibly large d-values and correlations across studies, leading me to be more suspicious of the findings (not less, as is generally the case). He also pointed me to this paper: Experimental disclosure and its moderators: A meta-analysis. Frattaroli, Joanne. Psychological Bulletin, Vol 132(6), Nov 2006, 823-865. Disclosing information, thoughts, and feelings about personal and meaningful topics (experimental disclosure) is purported to have various health and psychological consequences (e.g., J. W. Pennebaker, 1993). Although the results of 2 small meta-analyses (P. G. Frisina, J. C. Borod, & S. J. Lepore, 2004; J. M. Smyth …

6 0.93240476 1045 andrew gelman stats-2011-12-07-Martyn Plummer’s Secret JAGS Blog

7 0.92954803 1158 andrew gelman stats-2012-02-07-The more likely it is to be X, the more likely it is to be Not X?

8 0.92753398 1388 andrew gelman stats-2012-06-22-Americans think economy isn’t so bad in their city but is crappy nationally and globally

9 0.92570078 1929 andrew gelman stats-2013-07-07-Stereotype threat!

10 0.91958439 24 andrew gelman stats-2010-05-09-Special journal issue on statistical methods for the social sciences

11 0.9183144 984 andrew gelman stats-2011-11-01-David MacKay sez . . . 12??

12 0.91544193 267 andrew gelman stats-2010-09-09-This Friday afternoon: Applied Statistics Center mini-conference on risk perception

13 0.90287149 1162 andrew gelman stats-2012-02-11-Adding an error model to a deterministic model

14 0.90052021 534 andrew gelman stats-2011-01-24-Bayes at the end

15 0.88533437 2248 andrew gelman stats-2014-03-15-Problematic interpretations of confidence intervals

16 0.88229406 2140 andrew gelman stats-2013-12-19-Revised evidence for statistical standards

17 0.88212115 1842 andrew gelman stats-2013-05-05-Cleaning up science

18 0.87971115 426 andrew gelman stats-2010-11-22-Postdoc opportunity here at Columbia — deadline soon!

19 0.87862694 1792 andrew gelman stats-2013-04-07-X on JLP

20 0.87781096 777 andrew gelman stats-2011-06-23-Combining survey data obtained using different modes of sampling