andrew_gelman_stats andrew_gelman_stats-2010 andrew_gelman_stats-2010-363 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Alex Hoffman writes: I am reviewing an article with a whole bunch of tables with Likert scale responses. You know, the standard thing with each question on its own line, followed by 5 columns of numbers. Is there a good way to display this data graphically? OK, there’s no one best way, but can you point your readers to a few good examples? My reply: Some sort of small multiples. I’m thinking of lineplots. Maybe a grid of plots, each with three colored and labeled lines. For example, it might be a grid with 10 rows and 5 columns. To really know what to do, I’d have to have more sense of what’s being plotted. Feel free to contribute your ideas in the comments.
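One way the small-multiples idea in the reply might be sketched, assuming Python with matplotlib (the post names no tool) and a hypothetical data layout in which each question maps to a few groups, each with counts for the 5 response categories. The names to_percentages and plot_small_multiples, the group labels, and the grid width are illustrative assumptions, not anything from the original post.

```python
import math
from typing import Dict, List

# Hypothetical layout: question -> group -> counts for response categories 1..5.
Counts = Dict[str, Dict[str, List[int]]]
Percentages = Dict[str, Dict[str, List[float]]]


def to_percentages(counts: Counts) -> Percentages:
    """Convert raw counts to percentages within each (question, group) row."""
    out: Percentages = {}
    for question, groups in counts.items():
        out[question] = {}
        for group, row in groups.items():
            total = sum(row)
            # An empty row becomes all zeros rather than dividing by zero.
            out[question][group] = [100.0 * c / total if total else 0.0 for c in row]
    return out


def plot_small_multiples(pct: Percentages, ncols: int = 5):
    """Draw one small line plot per question on shared axes (assumes matplotlib is installed)."""
    import matplotlib.pyplot as plt  # imported lazily so the reshaping code has no dependencies

    questions = list(pct)
    nrows = math.ceil(len(questions) / ncols)
    fig, axes = plt.subplots(nrows, ncols, sharex=True, sharey=True,
                             squeeze=False, figsize=(2.2 * ncols, 1.8 * nrows))
    panels = axes.ravel()
    for ax, question in zip(panels, questions):
        # One colored, labeled line per group, as the reply suggests.
        for group, series in pct[question].items():
            ax.plot(range(1, len(series) + 1), series, label=group)
        ax.set_title(question, fontsize=8)
    # Hide any unused panels at the end of the grid.
    for ax in panels[len(questions):]:
        ax.set_visible(False)
    panels[0].legend(fontsize=6)
    return fig
```

With 50 questions and ncols=5 this would give the 10-by-5 grid mentioned in the reply; calling plot_small_multiples(to_percentages(counts)).savefig("likert.png") would write the figure to a file. Shared axes are used so that panels can be compared directly, which is a design choice of this sketch rather than something the post specifies.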
sentIndex sentText sentNum sentScore
1 Alex Hoffman writes: I am reviewing an article with a whole bunch of tables with Likert scale responses. [sent-1, score-1.059]
2 You know, the standard thing with each question on its own line, followed by 5 columns of numbers. [sent-2, score-0.571]
3 Is there a good way to display this data graphically? [sent-3, score-0.381]
4 OK, there’s no one best way, but can you point your readers to a few good examples? [sent-4, score-0.367]
5 Maybe a grid of plots, each with three colored and labeled lines. [sent-7, score-0.924]
6 For example, it might be a grid with 10 rows and 5 columns. [sent-8, score-0.694]
7 To really know what to do, I’d have to have more sense of what’s being plotted. [sent-9, score-0.216]
8 Feel free to contribute your ideas in the comments. [sent-10, score-0.392]
wordName wordTfidf (topN-words)
[('grid', 0.419), ('likert', 0.297), ('graphically', 0.244), ('hoffman', 0.244), ('colored', 0.229), ('rows', 0.222), ('columns', 0.21), ('alex', 0.189), ('reviewing', 0.188), ('contribute', 0.186), ('labeled', 0.18), ('plots', 0.164), ('tables', 0.163), ('display', 0.15), ('followed', 0.131), ('scale', 0.127), ('bunch', 0.117), ('whole', 0.109), ('free', 0.109), ('line', 0.107), ('readers', 0.106), ('examples', 0.101), ('ok', 0.101), ('good', 0.098), ('ideas', 0.097), ('three', 0.096), ('know', 0.095), ('comments', 0.094), ('standard', 0.093), ('way', 0.093), ('feel', 0.092), ('thinking', 0.089), ('small', 0.088), ('reply', 0.086), ('best', 0.08), ('sense', 0.071), ('question', 0.07), ('sort', 0.068), ('thing', 0.067), ('maybe', 0.065), ('article', 0.058), ('point', 0.054), ('might', 0.053), ('really', 0.05), ('example', 0.049), ('writes', 0.045), ('data', 0.04), ('one', 0.029)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 363 andrew gelman stats-2010-10-22-Graphing Likert scale responses
2 0.13836667 61 andrew gelman stats-2010-05-31-A data visualization manifesto
Introduction: Details matter (at least, they do for me), but we don’t yet have a systematic way of going back and forth between the structure of a graph, its details, and the underlying questions that motivate our visualizations. (Cleveland, Wilkinson, and others have written a bit on how to formalize these connections, and I’ve thought about it too, but we have a ways to go.) I was thinking about this difficulty after reading an article on graphics by some computer scientists that was well-written but to me lacked a feeling for the linkages between substantive/statistical goals and graphical details. I have problems with these issues too, and my point here is not to criticize but to move the discussion forward. When thinking about visualization, how important are the details? Aleks pointed me to this article by Jeffrey Heer, Michael Bostock, and Vadim Ogievetsky, “A Tour through the Visualization Zoo: A survey of powerful visualization techniques, from the obvious to the obscure.” Th
3 0.13132235 252 andrew gelman stats-2010-09-02-R needs a good function to make line plots
Introduction: More and more I’m thinking that line plots are great. More specifically, two-way grids of line plots on common scales, with one, two, or three lines per plot (enough to show comparisons but not so many that you can’t tell the lines apart). Also dot plots, of the sort that have been masterfully used by Lax and Phillips to show comparisons and trends in support for gay rights. There’s a big step missing, though, and that is to be able to make these graphs as a default. We have to figure out the right way to structure the data so these graphs come naturally. Then when it’s all working, we can talk the Excel people into implementing our ideas. I’m not asking to be paid here; all our ideas are in the public domain and I’m happy for Microsoft or Google or whoever to copy us. P.S. Drew Conway writes: This could be accomplished with ggplot2 using various combinations of the grammar. If I am understanding what you mean by line plots, here are some examples with code. In fact,
4 0.12699585 1130 andrew gelman stats-2012-01-20-Prior beliefs about locations of decision boundaries
Introduction: Forest Gregg writes: I want to incorporate a prior belief into an estimation of a logistic regression classifier of points distributed in a 2d space. My prior belief is a funny kind of prior though. It’s a belief about where the decision boundary between classes should fall. Over the 2d space, I lay a grid, and I believe that a decision boundary that separates any two classes should fall along any of the grid lines with some probability, and that the decision boundary should fall anywhere except a gridline with a much lower probability. For the two class case, and a logistic regression model parameterized by W and data X, my prior could perhaps be expressed Pr(W) = (normalizing constant)/exp(d) where d = f(grid,W,X) such that when logistic(W^TX)= .5 and X is ‘far’ from grid lines, then d is large. Have you ever seen a model like this, or do you have any notions about a good avenue to pursue? My real data consist of geocoded Craigslist’s postings that are labeled with the
5 0.11206189 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?
Introduction: Alex Hoffman points me to this interview by Dylan Matthews of education researcher Thomas Kane, who at one point says, Once you corrected for measurement error, a teacher’s score on their chosen videos and on their unchosen videos were correlated at 1. They were perfectly correlated. Hoffman asks, “What do you think? Do you think that just maybe, perhaps, it’s possible we ought to consider, I’m just throwing out the possibility that it might be that the procedure for correcting measurement error might, you know, be a little too strong?” I don’t know exactly what’s happening here, but it might be something that I’ve seen on occasion when fitting multilevel models using a point estimate for the group-level variance. It goes like this: measurement-error models are multilevel models, they involve the estimation of a distribution of a latent variable. When fitting multilevel models, it is possible to estimate the group-level variance to be zero, even though the group-level varia
7 0.10347816 2288 andrew gelman stats-2014-04-10-Small multiples of lineplots > maps (ok, not always, but yes in this case)
8 0.10172644 1116 andrew gelman stats-2012-01-13-Infographic on the economy
9 0.094665959 1011 andrew gelman stats-2011-11-15-World record running times vs. distance
10 0.089599021 1807 andrew gelman stats-2013-04-17-Data problems, coding errors…what can be done?
11 0.089563966 302 andrew gelman stats-2010-09-28-This is a link to a news article about a scientific paper
12 0.088529438 1062 andrew gelman stats-2011-12-16-Mr. Pearson, meet Mr. Mandelbrot: Detecting Novel Associations in Large Data Sets
13 0.086903714 1176 andrew gelman stats-2012-02-19-Standardized writing styles and standardized graphing styles
14 0.07955844 467 andrew gelman stats-2010-12-14-Do we need an integrated Bayesian-likelihood inference?
15 0.079387851 2176 andrew gelman stats-2014-01-19-Transformations for non-normal data
16 0.078550875 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back
19 0.077072203 328 andrew gelman stats-2010-10-08-Displaying a fitted multilevel model
20 0.075833887 1403 andrew gelman stats-2012-07-02-Moving beyond hopeless graphics
topicId topicWeight
[(0, 0.131), (1, -0.015), (2, -0.02), (3, 0.021), (4, 0.075), (5, -0.071), (6, 0.01), (7, 0.013), (8, 0.013), (9, 0.013), (10, 0.034), (11, -0.017), (12, 0.005), (13, -0.009), (14, 0.0), (15, -0.017), (16, -0.013), (17, 0.004), (18, 0.005), (19, 0.006), (20, 0.038), (21, -0.009), (22, 0.015), (23, 0.002), (24, 0.005), (25, -0.018), (26, 0.048), (27, 0.017), (28, -0.014), (29, 0.021), (30, 0.052), (31, -0.0), (32, 0.004), (33, -0.008), (34, -0.013), (35, -0.005), (36, 0.041), (37, 0.036), (38, -0.023), (39, -0.027), (40, 0.041), (41, 0.019), (42, -0.031), (43, 0.025), (44, -0.038), (45, 0.035), (46, -0.009), (47, -0.013), (48, -0.018), (49, -0.001)]
simIndex simValue blogId blogTitle
same-blog 1 0.94822824 363 andrew gelman stats-2010-10-22-Graphing Likert scale responses
2 0.79478025 324 andrew gelman stats-2010-10-07-Contest for developing an R package recommendation system
Introduction: After I spoke tonight at the NYC R meetup, John Myles White and Drew Conway told me about this competition they’re administering for developing a recommendation system for R packages. They seem to have already done some work laying out the network of R packages–which packages refer to which others, and so forth. I just hope they set up their system so that my own packages (“R2WinBUGS”, “r2jags”, “arm”, and “mi”) get recommended automatically. I really hate to think that there are people out there running regressions in R and not using display() and coefplot() to look at the output. P.S. Ajay Shah asks what I mean by that last sentence. My quick answer is that it’s good to be able to visualize the coefficients and the uncertainty about them. The default options of print(), summary(), and plot() in R don’t do that: - print() doesn’t give enough information - summary() gives everything to a zillion decimal places and gives useless things like p-values - plot() gives a bunch
3 0.77147645 61 andrew gelman stats-2010-05-31-A data visualization manifesto
4 0.7671743 1176 andrew gelman stats-2012-02-19-Standardized writing styles and standardized graphing styles
Introduction: Back in the 1700s—JennyD can correct me if I’m wrong here—there was no standard style for writing. You could be discursive, you could be descriptive, flowery, or terse. Direct or indirect, serious or funny. You could construct a novel out of letters or write a philosophical treatise in the form of a novel. Nowadays there are rules. You can break the rules, but then you’re Breaking. The. Rules. Which is a distinctive choice all its own. Consider academic writing. Serious works of economics or statistics tend to be written in a serious style in some version of plain academic English. The few exceptions (for example, by Tukey, Tufte, Mandelbrot, and Jaynes) are clearly exceptions, written in styles that are much celebrated but not so commonly followed. A serious work of statistics, or economics, or political science could be written in a highly unconventional form (consider, for example, Wallace Shawn’s plays), but academic writers in these fields tend to stick with the sta
Introduction: Jonathan Robinson writes: I’m a survey researcher who mostly does political work, but I also have a strong interest in economics. I have a question about this graph you commonly see in the economics literature. It is of a concept called the Beveridge Curve [recently in the newspaper here]. It is one of the more interesting concepts in labor economics, relating the vacancy rate in jobs to the unemployment rate. A good primer is here. However, despite being one of the more interesting concepts in economics, the way it is displayed visually is nothing short of atrocious: These graphs are nothing short of unreadable and pretty much the standard (Brad Delong has linked to this graph above and it can appear like this in publication as well). I’ve only really seen one representation of the curve that is more clear than this and it is at this link: Do you have any ideas of any way of making these graphs more readable? I like the second Cleveland Fed graph, but I ha
6 0.7570582 1764 andrew gelman stats-2013-03-15-How do I make my graphs?
8 0.74681729 2246 andrew gelman stats-2014-03-13-An Economist’s Guide to Visualizing Data
9 0.74364585 296 andrew gelman stats-2010-09-26-A simple semigraphic display
10 0.7430442 1413 andrew gelman stats-2012-07-11-News flash: Probability and statistics are hard to understand
11 0.73405641 1124 andrew gelman stats-2012-01-17-How to map geographically-detailed survey responses?
12 0.73230606 37 andrew gelman stats-2010-05-17-Is chartjunk really “more useful” than plain graphs? I don’t think so.
13 0.73109615 2076 andrew gelman stats-2013-10-24-Chasing the noise: W. Edwards Deming would be spinning in his grave
14 0.72433925 1439 andrew gelman stats-2012-08-01-A book with a bunch of simple graphs
15 0.72303361 134 andrew gelman stats-2010-07-08-“What do you think about curved lines connecting discrete data-points?”
16 0.71926004 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back
18 0.70697004 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year
19 0.70555931 798 andrew gelman stats-2011-07-12-Sometimes a graph really is just ugly
20 0.70485711 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly
topicId topicWeight
[(4, 0.029), (16, 0.017), (24, 0.16), (37, 0.08), (42, 0.076), (47, 0.029), (76, 0.033), (77, 0.062), (95, 0.06), (99, 0.324)]
simIndex simValue blogId blogTitle
same-blog 1 0.94746697 363 andrew gelman stats-2010-10-22-Graphing Likert scale responses
2 0.93529826 157 andrew gelman stats-2010-07-21-Roller coasters, charity, profit, hmmm
Introduction: Dan Kahan writes: Here is a very interesting article from Science that reports result of experiment that looked at whether people bought a product (picture of themselves screaming or vomiting on roller coaster) or paid more for it when told “1/2 to charity.” Answer was “buy more” but “pay lots less” than when alternative was fixed price w/ or w/o charity; and “buy more” & “pay more” if consumer could name own price & 1/2 went to charity than if none went to charity. Pretty interesting. But . . . What’s odd, I [Kahan] think, is the measure used to report the result. The paper (written by some really amazingly good social psychologists; I know this from other studies) goes on & on, w/ figures & tables, about how the amusement park’s “revenue,” “revenue per ride” & “profit” went up by large amount when it used “name your own price & 1/2 to charity.” Yet that result is dominated by random effects — the marginal cost & volume of sales are peculiar to the product being sold &
3 0.93526351 1535 andrew gelman stats-2012-10-16-Bayesian analogue to stepwise regression?
Introduction: Bill Harris writes: On pp. 250-251 of BDA second edition, you write about multiple comparisons, and you write about stepwise regression on p. 405. How would you look at stepwise regression analyses in light of the multiple comparisons problem? Is there an issue? My reply: In this case I think the right approach is to keep all the coefs but partially pool them toward 0 (after suitable transformation). But then the challenge is coming up with a general way to construct good prior distributions. I’m still thinking about that one! Yet another approach is to put something together purely nonparametrically as with Bart.
4 0.93277341 1996 andrew gelman stats-2013-08-24-All inference is about generalizing from sample to population
Introduction: Jeff Walker writes: Your blog has skirted around the value of observational studies and chided folks for using causal language when they only have associations but I sense that you ultimately find value in these associations. I would love for you to expand this thought in a blog. Specifically: Does a measured association “suggest” a causal relationship? Are measured associations a good and efficient way to narrow the field of things that should be studied? Of all the things we should pursue, should we start with the stuff that has some largish measured association? Certainly many associations are not directly causal but due to joint association. Similarly, there must be many variables that are directly causally associated ( A -> B) but the effect, measured as an association, is masked by confounders. So if we took the “measured associations are worthwhile” approach, we’d never or rarely find the masked effects. But I’d also like to know if one is more likely to find a large causal
5 0.93215084 1645 andrew gelman stats-2012-12-31-Statistical modeling, causal inference, and social science
Introduction: Interesting discussion by Berk Ozler (which I found following links from Tyler Cowen) of a study by Erwin Bulte, Lei Pan, Joseph Hella, Gonne Beekman, and Salvatore di Falco that compares two agricultural experiments, one blinded and one unblinded. Bulte et al. find much different results in the two experiments and attribute the difference to expectation effects (when people know they’re receiving an experiment they behave differently); Ozler is skeptical and attributes the different outcomes to various practical differences in implementation of the two experiments. I’m reminded somehow of the notorious sham experiment on the dead chickens, a story that was good for endless discussion in my Bayesian statistics class last semester. I think we can all agree that dead chickens won’t exhibit a placebo effect. Live farmers, though, that’s another story. I don’t have any stake in this particular fight, but on quick reading I’m sympathetic to Ozler’s argument that this all is wel
6 0.93212163 2195 andrew gelman stats-2014-02-02-Microfoundations of macroeconomics
8 0.93131375 1231 andrew gelman stats-2012-03-27-Attention pollution
9 0.92981124 1692 andrew gelman stats-2013-01-25-Freakonomics Experiments
10 0.92970127 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability
11 0.92917824 5 andrew gelman stats-2010-04-27-Ethical and data-integrity problems in a study of mortality in Iraq
12 0.9288947 259 andrew gelman stats-2010-09-06-Inbox zero. Really.
13 0.92884737 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?
14 0.9287945 1363 andrew gelman stats-2012-06-03-Question about predictive checks
15 0.9284333 2018 andrew gelman stats-2013-09-12-Do you ever have that I-just-fit-a-model feeling?
16 0.92829847 2295 andrew gelman stats-2014-04-18-One-tailed or two-tailed?
18 0.92823184 1400 andrew gelman stats-2012-06-29-Decline Effect in Linguistics?
20 0.9281112 117 andrew gelman stats-2010-06-29-Ya don’t know Bayes, Jack