andrew_gelman_stats-2010-265 knowledge-graph by maker-knowledge-mining

265 andrew gelman stats-2010-09-09-Removing the blindfold: visualising statistical models


meta info for this blog

Source: html

Introduction: Hadley Wickham’s talk for Monday 13 Sept at noon in the statistics dept: As the volume of data increases, so too does the complexity of our models. Visualisation is a powerful tool for both understanding how models work, and what they say about a particular dataset. There are very many well-known techniques for visualising data, but far fewer for visualising models. In this talk I [Wickham] will discuss three broad strategies for model visualisation: display the model in the data space; look at all members of a collection; and explore the process of model fitting, not just the end result. I will demonstrate these techniques with two examples: neural networks, and ensembles of linear models. Hey–this is one of my favorite topics!


Summary: the most important sentences, generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Hadley Wickham’s talk for Monday 13 Sept at noon in the statistics dept: As the volume of data increases, so too does the complexity of our models. [sent-1, score-0.711]

2 Visualisation is a powerful tool for both understanding how models work, and what they say about a particular dataset. [sent-2, score-0.472]

3 There are very many well-known techniques for visualising data, but far fewer for visualising models. [sent-3, score-1.352]

4 In this talk I [Wickham] will discuss three broad strategies for model visualisation: display the model in the data space; look at all members of a collection; and explore the process of model fitting, not just the end result. [sent-4, score-1.513]

5 I will demonstrate these techniques with two examples: neural networks, and ensembles of linear models. [sent-5, score-0.815]
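The sentence scores above come from a tfidf-style extractive summary: each sentence is scored by the summed tfidf weight of its words. A minimal sketch of that idea (assuming plain whitespace tokenisation; this is an illustration, not the actual pipeline used to build this page):

```python
import math
from collections import Counter

def tfidf_sentence_scores(sentences, corpus_docs):
    """Score each sentence by the summed tfidf weight of its words."""
    n_docs = len(corpus_docs)
    # document frequency of each term across the corpus
    df = Counter()
    for doc in corpus_docs:
        df.update(set(doc.lower().split()))

    def idf(term):
        # smoothed inverse document frequency
        return math.log(n_docs / (1 + df[term])) + 1

    scores = []
    for sent in sentences:
        words = sent.lower().split()
        tf = Counter(words)
        # term frequency times idf, summed over distinct words
        scores.append(sum((tf[w] / len(words)) * idf(w) for w in tf))
    return scores

sentences = [
    "Visualisation is a powerful tool for understanding models.",
    "There are many techniques for visualising data but fewer for visualising models.",
]
print(tfidf_sentence_scores(sentences, sentences))
```

Sentences rich in rare, corpus-distinctive words (here, terms like "visualising") score higher, which is why the summary above keeps the whole abstract nearly intact.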


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('visualising', 0.454), ('visualisation', 0.413), ('techniques', 0.225), ('ensembles', 0.195), ('noon', 0.195), ('sept', 0.18), ('dept', 0.166), ('neural', 0.163), ('wickham', 0.16), ('hadley', 0.152), ('monday', 0.146), ('talk', 0.138), ('strategies', 0.134), ('complexity', 0.128), ('volume', 0.123), ('model', 0.122), ('networks', 0.12), ('powerful', 0.12), ('broad', 0.119), ('fewer', 0.114), ('collection', 0.114), ('explore', 0.112), ('increases', 0.109), ('members', 0.108), ('tool', 0.106), ('display', 0.105), ('favorite', 0.104), ('demonstrate', 0.101), ('space', 0.097), ('fitting', 0.097), ('topics', 0.096), ('linear', 0.092), ('hey', 0.09), ('data', 0.084), ('discuss', 0.079), ('particularly', 0.079), ('process', 0.078), ('understanding', 0.075), ('examples', 0.071), ('end', 0.07), ('three', 0.067), ('far', 0.067), ('look', 0.053), ('models', 0.052), ('statistics', 0.043), ('say', 0.04), ('two', 0.039), ('many', 0.038), ('work', 0.035), ('one', 0.02)]
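Given tfidf vectors like the one above, the "similar blogs" ranking that follows is typically computed by cosine similarity between the query post's vector and every other post's vector. A toy sketch (the blog vectors and weights here are made up for illustration):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term->weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# toy tfidf vectors for three "blogs"
blogs = {
    "a": {"visualising": 0.45, "model": 0.12, "talk": 0.14},
    "b": {"visualising": 0.30, "model": 0.20, "prior": 0.40},
    "c": {"election": 0.50, "voting": 0.35},
}
query = blogs["a"]
ranked = sorted(blogs, key=lambda b: cosine(query, blogs[b]), reverse=True)
print(ranked)  # the query blog itself ranks first with similarity 1.0
```

This also explains the "same-blog" entry at the top of each list below: a post's similarity to itself is exactly 1.0 under tfidf (and near 1.0 under the approximate lsi/lda representations).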

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 265 andrew gelman stats-2010-09-09-Removing the blindfold: visualising statistical models

Introduction: Hadley Wickham’s talk for Monday 13 Sept at noon in the statistics dept: As the volume of data increases, so too does the complexity of our models. Visualisation is a powerful tool for both understanding how models work, and what they say about a particular dataset. There are very many well-known techniques for visualising data, but far fewer for visualising models. In this talk I [Wickham] will discuss three broad strategies for model visualisation: display the model in the data space; look at all members of a collection; and explore the process of model fitting, not just the end result. I will demonstrate these techniques with two examples: neural networks, and ensembles of linear models. Hey–this is one of my favorite topics!

2 0.15831441 1499 andrew gelman stats-2012-09-16-Uri Simonsohn is speaking at Columbia tomorrow (Mon)

Introduction: Noon in the stat dept (room 903 School of Social Work, at 122/Amsterdam). He’ll be talking about ways of finding fishy p-values. See here and here for background. This stuff is cool and important.

3 0.10823748 1066 andrew gelman stats-2011-12-17-Ripley on model selection, and some links on exploratory model analysis

Introduction: This is really fun. I love how Ripley thinks, with just about every concept considered in broad generality while being connected to real-data examples. He’s a great statistical storyteller as well. . . . and Wickham on exploratory model analysis I came across Ripley’s slides in a reference from Hadley Wickham’s article on exploratory model analysis . I’ve been interested for a while in statistical graphics for understanding fitted models (which is different than the usual use of graphics to visualize data or to understand discrepancies of data from models). Recently I’ve started using the term “exploratory model analysis,” and it seemed like such a natural phrase that I thought I’d google it and see what’s up. I found the above-linked paper by Hadley, which in turn refers to a paper by Antony Unwin, Chris Volinsky, and Sylvia Winkler that defines “exploratory modelling analysis” as “the evaluation and comparison of many models simultaneously.” That’s not exactly what I h

4 0.10470606 1392 andrew gelman stats-2012-06-26-Occam

Introduction: Cosma Shalizi and Larry Wasserman discuss some papers from a conference on Ockham’s Razor. I don’t have anything new to add on this so let me link to past blog entries on the topic and repost the following from 2004 : A lot has been written in statistics about “parsimony”—that is, the desire to explain phenomena using fewer parameters–but I’ve never seen any good general justification for parsimony. (I don’t count “Occam’s Razor,” or “Ockham’s Razor,” or whatever, as a justification. You gotta do better than digging up a 700-year-old quote.) Maybe it’s because I work in social science, but my feeling is: if you can approximate reality with just a few parameters, fine. If you can use more parameters to fold in more information, that’s even better. In practice, I often use simple models—because they are less effort to fit and, especially, to understand. But I don’t kid myself that they’re better than more complicated efforts! My favorite quote on this comes from Rad

5 0.10387743 1482 andrew gelman stats-2012-09-04-Model checking and model understanding in machine learning

Introduction: Last month I wrote : Computer scientists are often brilliant but they can be unfamiliar with what is done in the worlds of data collection and analysis. This goes the other way too: statisticians such as myself can look pretty awkward, reinventing (or failing to reinvent) various wheels when we write computer programs or, even worse, try to design software. Andrew MacNamara followed up with some thoughts: I [MacNamara] had some basic statistics training through my MBA program, after having completed an undergrad degree in computer science. Since then I’ve been very interested in learning more about statistical techniques, including things like GLM and censored data analyses as well as machine learning topics like neural nets, SVMs, etc. I began following your blog after some research into Bayesian analysis topics and I am trying to dig deeper on that side of things. One thing I have noticed is that there seems to be a distinction between data analysi

6 0.089149781 1431 andrew gelman stats-2012-07-27-Overfitting

7 0.089017309 1454 andrew gelman stats-2012-08-11-Weakly informative priors for Bayesian nonparametric models?

8 0.084931366 1972 andrew gelman stats-2013-08-07-When you’re planning on fitting a model, build up to it by fitting simpler models first. Then, once you have a model you like, check the hell out of it

9 0.079032443 2133 andrew gelman stats-2013-12-13-Flexibility is good

10 0.078560129 822 andrew gelman stats-2011-07-26-Any good articles on the use of error bars?

11 0.078019552 548 andrew gelman stats-2011-02-01-What goes around . . .

12 0.07737191 534 andrew gelman stats-2011-01-24-Bayes at the end

13 0.074684151 847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics

14 0.073530115 423 andrew gelman stats-2010-11-20-How to schedule projects in an introductory statistics course?

15 0.07056094 781 andrew gelman stats-2011-06-28-The holes in my philosophy of Bayesian data analysis

16 0.066703185 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model

17 0.066490814 1197 andrew gelman stats-2012-03-04-“All Models are Right, Most are Useless”

18 0.066430651 1669 andrew gelman stats-2013-01-12-The power of the puzzlegraph

19 0.066063315 699 andrew gelman stats-2011-05-06-Another stereotype demolished

20 0.06601882 1009 andrew gelman stats-2011-11-14-Wickham R short course


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.098), (1, 0.06), (2, -0.029), (3, 0.043), (4, 0.041), (5, 0.011), (6, -0.065), (7, -0.002), (8, 0.029), (9, 0.045), (10, 0.002), (11, 0.024), (12, -0.034), (13, -0.016), (14, -0.049), (15, -0.026), (16, 0.017), (17, -0.028), (18, 0.013), (19, -0.0), (20, -0.023), (21, -0.056), (22, -0.005), (23, -0.038), (24, -0.039), (25, 0.025), (26, -0.064), (27, -0.045), (28, 0.027), (29, -0.045), (30, -0.025), (31, -0.011), (32, -0.018), (33, 0.002), (34, 0.022), (35, 0.043), (36, 0.001), (37, -0.038), (38, 0.03), (39, 0.007), (40, -0.02), (41, -0.025), (42, -0.018), (43, 0.047), (44, -0.006), (45, 0.034), (46, -0.0), (47, -0.022), (48, -0.019), (49, -0.001)]
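The fifty numbers above are this post's coordinates in an lsi (latent semantic indexing) space: a truncated SVD of the term-document matrix, with similarity then computed between documents in the reduced space. A minimal sketch with numpy (the toy matrix and vocabulary are invented for illustration; the real pipeline's dimensions and weighting are unknown):

```python
import numpy as np

def lsi_embed(term_doc, k):
    """Project documents into a k-dimensional latent semantic space
    via a truncated SVD of the term-document matrix."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # each row of the result is one document's k topic weights
    return (s[:k, None] * Vt[:k]).T

# toy term-document matrix: rows = terms, columns = 4 documents
X = np.array([
    [2.0, 1.0, 0.0, 0.0],   # "visualising"
    [1.0, 2.0, 0.0, 0.0],   # "model"
    [0.0, 0.0, 2.0, 1.0],   # "election"
    [0.0, 0.0, 1.0, 2.0],   # "voting"
])
docs = lsi_embed(X, k=2)
print(docs.shape)  # (4, 2): one 2-d topic-weight vector per document
```

Documents about the same terms (the first two columns) land close together in the reduced space, while unrelated documents end up orthogonal, which is what makes lsi similarity less brittle than raw word overlap.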

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96196824 265 andrew gelman stats-2010-09-09-Removing the blindfold: visualising statistical models

Introduction: Hadley Wickham’s talk for Monday 13 Sept at noon in the statistics dept: As the volume of data increases, so too does the complexity of our models. Visualisation is a powerful tool for both understanding how models work, and what they say about a particular dataset. There are very many well-known techniques for visualising data, but far fewer for visualising models. In this talk I [Wickham] will discuss three broad strategies for model visualisation: display the model in the data space; look at all members of a collection; and explore the process of model fitting, not just the end result. I will demonstrate these techniques with two examples: neural networks, and ensembles of linear models. Hey–this is one of my favorite topics!

2 0.84692103 1197 andrew gelman stats-2012-03-04-“All Models are Right, Most are Useless”

Introduction: The above is the title of a talk that Thad Tarpey gave at the Joint Statistical Meetings in 2009. Here’s the abstract: Students of statistics are often introduced to George Box’s famous quote: “all models are wrong, some are useful.” In this talk I [Tarpey] argue that this quote, although useful, is wrong. A different and more positive perspective is to acknowledge that a model is simply a means of extracting information of interest from data. The truth is infinitely complex and a model is merely an approximation to the truth. If the approximation is poor or misleading, then the model is useless. In this talk I give examples of correct models that are not true models. I illustrate how the notion of a “wrong” model can lead to wrong conclusions. I’m curious what he had to say—maybe he could post the slides? P.S. And here they are !

3 0.80535662 24 andrew gelman stats-2010-05-09-Special journal issue on statistical methods for the social sciences

Introduction: Last year I spoke at a conference celebrating the 10th anniversary of the University of Washington’s Center for Statistics and the Social Sciences, and just today a special issue of the journal Statistical Methodology came out in honor of the center’s anniversary. My article in the special issue actually has nothing to do with my talk at the conference; rather, it’s an exploration of an idea that Iven Van Mechelen and I had for understanding deterministic models probabilistically: For the analysis of binary data, various deterministic models have been proposed, which are generally simpler to fit and easier to understand than probabilistic models. We claim that corresponding to any deterministic model is an implicit stochastic model in which the deterministic model fits imperfectly, with errors occurring at random. In the context of binary data, we consider a model in which the probability of error depends on the model prediction. We show how to fit this model using a stocha

4 0.78381181 1141 andrew gelman stats-2012-01-28-Using predator-prey models on the Canadian lynx series

Introduction: The “Canadian lynx data” is one of the famous examples used in time series analysis. And the usual models that are fit to these data in the statistics time-series literature, don’t work well. Cavan Reilly and Angelique Zeringue write : Reilly and Zeringue then present their analysis. Their simple little predator-prey model with a weakly informative prior way outperforms the standard big-ass autoregression models. Check this out: Or, to put it into numbers, when they fit their model to the first 80 years and predict to the next 34, their root mean square out-of-sample error is 1480 (see scale of data above). In contrast, the standard model fit to these data (the SETAR model of Tong, 1990) has more than twice as many parameters but gets a worse-performing root mean square error of 1600, even when that model is fit to the entire dataset. (If you fit the SETAR or any similar autoregressive model to the first 80 years and use it to predict the next 34, the predictions

5 0.76975286 780 andrew gelman stats-2011-06-27-Bridges between deterministic and probabilistic models for binary data

Introduction: For the analysis of binary data, various deterministic models have been proposed, which are generally simpler to fit and easier to understand than probabilistic models. We claim that corresponding to any deterministic model is an implicit stochastic model in which the deterministic model fits imperfectly, with errors occurring at random. In the context of binary data, we consider a model in which the probability of error depends on the model prediction. We show how to fit this model using a stochastic modification of deterministic optimization schemes. The advantages of fitting the stochastic model explicitly (rather than implicitly, by simply fitting a deterministic model and accepting the occurrence of errors) include quantification of uncertainty in the deterministic model’s parameter estimates, better estimation of the true model error rate, and the ability to check the fit of the model nontrivially. We illustrate this with a simple theoretical example of item response data and w

6 0.76365167 1066 andrew gelman stats-2011-12-17-Ripley on model selection, and some links on exploratory model analysis

7 0.75872105 1431 andrew gelman stats-2012-07-27-Overfitting

8 0.74901271 448 andrew gelman stats-2010-12-03-This is a footnote in one of my papers

9 0.74682826 1162 andrew gelman stats-2012-02-11-Adding an error model to a deterministic model

10 0.74106687 2007 andrew gelman stats-2013-09-03-Popper and Jaynes

11 0.73865306 1482 andrew gelman stats-2012-09-04-Model checking and model understanding in machine learning

12 0.72269702 1972 andrew gelman stats-2013-08-07-When you’re planning on fitting a model, build up to it by fitting simpler models first. Then, once you have a model you like, check the hell out of it

13 0.7192542 1406 andrew gelman stats-2012-07-05-Xiao-Li Meng and Xianchao Xie rethink asymptotics

14 0.71860516 2133 andrew gelman stats-2013-12-13-Flexibility is good

15 0.69452417 1788 andrew gelman stats-2013-04-04-When is there “hidden structure in data” to be discovered?

16 0.69173473 1521 andrew gelman stats-2012-10-04-Columbo does posterior predictive checks

17 0.69139588 774 andrew gelman stats-2011-06-20-The pervasive twoishness of statistics; in particular, the “sampling distribution” and the “likelihood” are two different models, and that’s a good thing

18 0.6835317 1392 andrew gelman stats-2012-06-26-Occam

19 0.67926788 320 andrew gelman stats-2010-10-05-Does posterior predictive model checking fit with the operational subjective approach?

20 0.67854476 1004 andrew gelman stats-2011-11-11-Kaiser Fung on how not to critique models


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(16, 0.103), (24, 0.097), (26, 0.054), (65, 0.025), (69, 0.208), (84, 0.065), (86, 0.015), (90, 0.03), (99, 0.272)]
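Here the weights are a document-topic distribution from lda (latent Dirichlet allocation): each post is a mixture over topics, and similar posts share topic mass. A toy collapsed Gibbs sampler conveys the idea; this is a didactic sketch (invented corpus, arbitrary hyperparameters), not the model that produced the weights above:

```python
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Tiny collapsed Gibbs sampler for LDA; returns per-document
    topic weights (normalised doc-topic counts)."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})            # vocabulary size
    n_dk = [[0] * n_topics for _ in docs]            # doc-topic counts
    n_kw = [defaultdict(int) for _ in range(n_topics)]  # topic-word counts
    n_k = [0] * n_topics                             # topic totals
    z = []                                           # token topic assignments
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1
        z.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # remove the token, sample a new topic, add it back
                n_dk[d][k] -= 1; n_kw[k][w] -= 1; n_k[k] -= 1
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                           / (n_k[t] + V * beta) for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1
    return [[c / len(doc) for c in n_dk[d]] for d, doc in enumerate(docs)]

docs = [
    "model prior posterior model".split(),
    "prior posterior model bayes".split(),
    "election voting turnout election".split(),
]
theta = lda_gibbs(docs, n_topics=2)
```

Each row of `theta` sums to 1, and similarity between posts can then be measured between these topic mixtures, which is why topically related posts (the first two toy documents) rank near each other in the lda list below.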

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.90918338 265 andrew gelman stats-2010-09-09-Removing the blindfold: visualising statistical models

Introduction: Hadley Wickham’s talk for Monday 13 Sept at noon in the statistics dept: As the volume of data increases, so too does the complexity of our models. Visualisation is a powerful tool for both understanding how models work, and what they say about a particular dataset. There are very many well-known techniques for visualising data, but far fewer for visualising models. In this talk I [Wickham] will discuss three broad strategies for model visualisation: display the model in the data space; look at all members of a collection; and explore the process of model fitting, not just the end result. I will demonstrate these techniques with two examples: neural networks, and ensembles of linear models. Hey–this is one of my favorite topics!

2 0.90766811 89 andrew gelman stats-2010-06-16-A historical perspective on financial bailouts

Introduction: Thomas Ferguson and Robert Johnson write : Financial crises are staggeringly costly. Only major wars rival them in the burdens they place on public finances. Taxpayers typically transfer enormous resources to banks, their stockholders, and creditors, while public debt explodes and the economy runs below full employment for years. This paper compares how relatively large, developed countries have handled bailouts over time. It analyzes why some have done better than others at containing costs and protecting taxpayers. The paper argues that political variables – the nature of competition within party systems and voting turnout – help explain why some countries do more than others to limit the moral hazards of bailouts. I know next to nothing about this topic, so I’ll just recommend you click through and read the article yourself. Here’s a bit more: Many recent papers have analyzed financial crises using large data bases filled with cases from all over the world. Our [Ferguson

3 0.88119221 406 andrew gelman stats-2010-11-10-Translating into Votes: The Electoral Impact of Spanish-Language Ballots

Introduction: Dan Hopkins sends along this article : [Hopkins] uses regression discontinuity design to estimate the turnout and election impacts of Spanish-language assistance provided under Section 203 of the Voting Rights Act. Analyses of two different data sets – the Latino National Survey and California 1998 primary election returns – show that Spanish-language assistance increased turnout for citizens who speak little English. The California results also demonstrate that election procedures can influence outcomes, as support for ending bilingual education dropped markedly in heavily Spanish-speaking neighborhoods with Spanish-language assistance. The California analyses find hints of backlash among non-Hispanic white precincts, but not with the same size or certainty. Small changes in election procedures can influence who votes as well as what wins. Beyond the direct relevance of these results, I find this paper interesting as an example of research that is fundamentally quantitative. Th

4 0.87953222 1759 andrew gelman stats-2013-03-12-How tall is Jon Lee Anderson?

Introduction: The second best thing about this story (from Tom Scocca) is that Anderson spells “Tweets” with a capital T. But the best thing is that Scocca is numerate—he compares numbers on the logarithmic scale: Reminding Lake that he only had 169 Twitter followers was the saddest gambit of all. Jon Lee Anderson has 17,866 followers. And Kim Kardashian has, as I write this, 17,489,892 followers. That is: Jon Lee Anderson is 1/1,000 as important on Twitter, by his own standard, as Kim Kardashian. He is 10 times closer to Mitch Lake than he is to Kim Kardashian. How often do we see a popular journalist who understands orders of magnitude? Good job, Tom Scocca! P.S. Based on his “little twerp” comment, I also wonder if Anderson suffers from tall person syndrome—that’s the problem that some people of above-average height have, that they think they’re more important than other people because they literally look down on them. Don’t get me wrong—I have lots of tall friends who are complete

5 0.87623239 158 andrew gelman stats-2010-07-22-Tenants and landlords

Introduction: Matthew Yglesias and Megan McArdle argue about the economics of landlord/tenant laws in D.C., a topic I know nothing about. But it did remind me of a few stories . . . 1. In grad school, I shared half of a two-family house with three other students. At some point, our landlord (who lived in the other half of the house) decided he wanted to sell the place, so he had a real estate agent coming by occasionally to show the house to people. She was just a flat-out liar (which I guess fits my impression based on screenings of Glengarry Glen Ross). I could never decide, when I was around and she was lying to a prospective buyer, whether to call her on it. Sometimes I did, sometimes I didn’t. 2. A year after I graduated, the landlord actually did sell the place but then, when my friends moved out, he refused to pay back their security deposit. There was some debate about getting the place repainted, I don’t remember the details. So they sued the landlord in Mass. housing court

6 0.86479008 923 andrew gelman stats-2011-09-24-What is the normal range of values in a medical test?

7 0.86055279 656 andrew gelman stats-2011-04-11-Jonathan Chait and I agree about the importance of the fundamentals in determining presidential elections

8 0.85110164 1909 andrew gelman stats-2013-06-21-Job openings at conservative political analytics firm!

9 0.84080046 856 andrew gelman stats-2011-08-16-Our new improved blog! Thanks to Cord Blomquist

10 0.83981115 1769 andrew gelman stats-2013-03-18-Tibshirani announces new research result: A significance test for the lasso

11 0.83755487 749 andrew gelman stats-2011-06-06-“Sampling: Design and Analysis”: a course for political science graduate students

12 0.83229494 1357 andrew gelman stats-2012-06-01-Halloween-Valentine’s update

13 0.82661456 898 andrew gelman stats-2011-09-10-Fourteen magic words: an update

14 0.8218562 198 andrew gelman stats-2010-08-11-Multilevel modeling in R on a Mac

15 0.82118762 2063 andrew gelman stats-2013-10-16-My talk 19h this evening

16 0.82102442 1310 andrew gelman stats-2012-05-09-Varying treatment effects, again

17 0.82019556 1167 andrew gelman stats-2012-02-14-Extra babies on Valentine’s Day, fewer on Halloween?

18 0.81906122 261 andrew gelman stats-2010-09-07-The $900 kindergarten teacher

19 0.81668925 1267 andrew gelman stats-2012-04-17-Hierarchical-multilevel modeling with “big data”

20 0.81599212 867 andrew gelman stats-2011-08-23-The economics of the mac? A paradox of competition