
1764 andrew gelman stats-2013-03-15-How do I make my graphs?

meta info for this blog

Source: html

Introduction: Someone who wishes to remain anonymous writes: I’ve been following your blog a long time and enjoy your posts on visualization/statistical graphics matters. I don’t recall however you ever describing the details of your setup for plotting. I’m a new R user (convert from matplotlib) and would love to know your thoughts on the ideal setup: do you use mainly the R base? Do you use lattice? What do you think of ggplot2? etc. I found ggplot2 nearly indecipherable until a recent eureka moment, and I think its default theme is a tremendous waste of ink (all those silly grey backgrounds and grids are really unnecessary), but if you customize that away it can be made to look like ordinary, pretty statistical graphs. Feel free to respond on your blog, but if you do, please remove my name from the post (my colleagues already make fun of me for thinking about visualization too much.) I love that last bit! Anyway, my response is that I do everything in base graphics (using my

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Someone who wishes to remain anonymous writes: I’ve been following your blog a long time and enjoy your posts on visualization/statistical graphics matters. [sent-1, score-0.405]

2 I don’t recall however you ever describing the details of your setup for plotting. [sent-2, score-0.404]

3 I’m a new R user (convert from matplotlib) and would love to know your thoughts on the ideal setup: do you use mainly the R base? [sent-3, score-0.43]

4 I found ggplot2 nearly indecipherable until a recent eureka moment, and I think its default theme is a tremendous waste of ink (all those silly grey backgrounds and grids are really unnecessary), but if you customize that away it can be made to look like ordinary, pretty statistical graphs. [sent-7, score-1.428]

5 Feel free to respond on your blog, but if you do, please remove my name from the post (my colleagues already make fun of me for thinking about visualization too much. [sent-8, score-0.18]

6 Anyway, my response is that I do everything in base graphics (using my own defaults), and usually I make a graph by using some previous graph as a template. [sent-10, score-0.859]

7 But the beautiful grids of maps (see here, for example), those I did by asking Yu-Sung, Daniel, Yair, etc. [sent-13, score-0.49]

8 , to make them, and then going back and forth making them better. [sent-14, score-0.177]

9 That reminds me: I have to finish a paper I’m writing with Yair about the details of what makes these graphs work. [sent-15, score-0.376]

10 We’ve talked about having these maps made by default in the “mrp” package but I don’t think we’re quite there yet. [sent-16, score-0.501]

11 If I could start over, maybe I’d use lattice or ggplot2. [sent-17, score-0.384]

12 I use what I’m comfortable with, but it’s not always so great. [sent-18, score-0.227]

13 I don’t know that I can make any general recommendations, except that once you have a graph you like, you can use it as a starting point for your next plot. [sent-19, score-0.51]

14 My own graphs have gradually improved over the decades (as I discuss in this presentation). [sent-20, score-0.33]

15 I think someone starting out now should be able to do much better. [sent-21, score-0.213]

16 I’ve purposely included some not-so-fancy graphs here (and some made by others working with me) to emphasize that statistical graphics is an everyday job, it’s not just about creating showcase pieces. [sent-24, score-0.878]
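The page does not say how the tfidf model turns word weights into the sentence scores listed above. As a rough sketch only, assuming a sentence's score is the sum of tf-idf weights over its distinct words, the ranking could be computed as follows. The function names `tokenize`, `idf_table`, and `score_sentences`, and the three-post toy corpus, are hypothetical and not taken from the original pipeline.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_table(documents):
    """Map each word to log(N / document frequency) over the corpus."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def score_sentences(sentences, idf):
    """Score each sentence as the sum of tf * idf over its distinct words.

    Assumption: term frequency is counted over the whole post, and words
    missing from the idf table contribute nothing.
    """
    tf = Counter(word for s in sentences for word in tokenize(s))
    scores = []
    for sentence in sentences:
        words = set(tokenize(sentence))
        scores.append(sum(tf[w] * idf.get(w, 0.0) for w in words))
    return scores


# Hypothetical toy corpus: three very short "posts".
corpus = [
    "I make graphs in base graphics. I reuse old graphs as templates.",
    "The election forecast changed this week.",
    "A new book about regression arrived.",
]
idf = idf_table(corpus)
post_sentences = [
    "I make graphs in base graphics.",
    "I reuse old graphs as templates.",
]
scores = score_sentences(post_sentences, idf)
```

Sorting the sentences by these scores in descending order would give a summary ordering; the real model may normalize by sentence length or weight terms differently.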

similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('lattice', 0.247), ('grids', 0.24), ('setup', 0.215), ('graphics', 0.195), ('yair', 0.189), ('maps', 0.165), ('base', 0.163), ('graphs', 0.157), ('default', 0.147), ('graph', 0.147), ('eureka', 0.142), ('grey', 0.142), ('use', 0.137), ('customize', 0.134), ('starting', 0.133), ('showcase', 0.128), ('ink', 0.128), ('wishes', 0.12), ('love', 0.116), ('defaults', 0.114), ('details', 0.111), ('purposely', 0.11), ('tremendous', 0.11), ('made', 0.109), ('finish', 0.108), ('unnecessary', 0.104), ('convert', 0.103), ('mrp', 0.1), ('backgrounds', 0.098), ('ordinary', 0.097), ('mainly', 0.096), ('everyday', 0.096), ('gradually', 0.094), ('make', 0.093), ('theme', 0.09), ('anonymous', 0.09), ('comfortable', 0.09), ('waste', 0.088), ('remove', 0.087), ('beautiful', 0.085), ('forth', 0.084), ('creating', 0.083), ('recommendations', 0.083), ('pieces', 0.082), ('user', 0.081), ('talked', 0.08), ('someone', 0.08), ('improved', 0.079), ('daniel', 0.078), ('describing', 0.078)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999982 1764 andrew gelman stats-2013-03-15-How do I make my graphs?

2 0.20172839 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice

Introduction: Dean Eckles writes: Some of my coworkers at Facebook and I have worked with Udacity to create an online course on exploratory data analysis, including using data visualizations in R as part of EDA. The course has now launched at https://www.udacity.com/course/ud651 so anyone can take it for free. And Kaiser Fung has reviewed it. So definitely feel free to promote it! Criticism is also welcome (we are still fine-tuning things and adding more notes throughout). I wrote some more comments about the course here, including highlighting the interviews with my great coworkers. I didn’t have a chance to look at the course so instead I responded with some generic comments about eda and visualization (in no particular order): - Think of a graph as a comparison. All graphs are comparisons (indeed, all statistical analyses are comparisons). If you already have the graph in mind, think of what comparisons it’s enabling. Or if you haven’t settled on the graph yet, think of what

3 0.18794318 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other

4 0.17025113 1661 andrew gelman stats-2013-01-08-Software is as software does

Introduction: We had a recent discussion about statistics packages where people talked about the structure and capabilities of different computer languages. One thing I wanted to add to this discussion is some sociology. To me, a statistics package is not just its code, it’s also its community, it’s what people do with it. R, for example, is nothing special for graphics (again, I think in retrospect my graphs would be better if I’d been making them in Fortran all these years); what makes R graphics work so well is that there’s a clear path from the numbers to the graphs, there’s a tradition in R of postprocessing. In comparison, consider Sas. I’ve never directly used Sas but whenever I’ve seen it used, whether by people working for me or with me or just people down the hall who left Sas output sitting in the printer, in all these cases there’s no postprocessing. It doesn’t look interactive at all. The user runs some procedure and then there are pages and pages and pages of output. The po

5 0.16458355 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update

Introduction: To continue our discussion from last week, consider three positions regarding the display of information: (a) The traditional tabular approach. This is how most statisticians, econometricians, political scientists, sociologists, etc., seem to operate. They understand the appeal of a pretty graph, and they’re willing to plot some data as part of an exploratory data analysis, but they see their serious research as leading to numerical estimates, p-values, tables of numbers. These people might use a graph to illustrate their points but they don’t see them as necessary in their research. (b) Statistical graphics as performed by Howard Wainer, Bill Cleveland, Dianne Cook, etc. They–we–see graphics as central to the process of statistical modeling and data analysis and are interested in graphs (static and dynamic) that display every data point as transparently as possible. (c) Information visualization or infographics, as performed by graphics designers and statisticians who are

6 0.16387108 847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics

7 0.16245027 61 andrew gelman stats-2010-05-31-A data visualization manifesto

8 0.14871953 1668 andrew gelman stats-2013-01-11-My talk at the NY data visualization meetup this Monday!

9 0.14564888 1450 andrew gelman stats-2012-08-08-My upcoming talk for the data visualization meetup

10 0.1395826 1308 andrew gelman stats-2012-05-08-chartsnthings !

11 0.13791305 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics

12 0.13517612 492 andrew gelman stats-2010-12-30-That puzzle-solving feeling

13 0.13409825 1564 andrew gelman stats-2012-11-06-Choose your default, or your default will choose you (election forecasting edition)

14 0.12703864 1604 andrew gelman stats-2012-12-04-An epithet I can live with

15 0.11873598 2279 andrew gelman stats-2014-04-02-Am I too negative?

16 0.11498006 502 andrew gelman stats-2011-01-04-Cash in, cash out graph

17 0.11203987 1439 andrew gelman stats-2012-08-01-A book with a bunch of simple graphs

18 0.11118901 676 andrew gelman stats-2011-04-23-The payoff: $650. The odds: 1 in 500,000.

19 0.10767397 319 andrew gelman stats-2010-10-04-“Who owns Congress”

20 0.10654274 2056 andrew gelman stats-2013-10-09-Mister P: What’s its secret sauce?
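The simValue column above is not defined on the page. A common way to compare tf-idf vectors is cosine similarity, and the sketch below assumes that choice; it is not confirmed as the method used here. The first vector reuses four of the word weights listed for this post, while `other_post` is made up for illustration.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(vec_a) & set(vec_b)
    dot = sum(vec_a[w] * vec_b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Four of the top tf-idf weights reported for this post.
this_post = {"lattice": 0.247, "grids": 0.240, "setup": 0.215, "graphics": 0.195}
# Hypothetical weights for some other post.
other_post = {"graphics": 0.30, "course": 0.25, "graph": 0.20}

self_sim = cosine_similarity(this_post, this_post)
cross_sim = cosine_similarity(this_post, other_post)
```

Under this assumption a post compared with itself scores 1 up to floating-point rounding, which would be consistent with the same-blog row sitting just below 1.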

similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.192), (1, -0.054), (2, -0.057), (3, 0.102), (4, 0.161), (5, -0.148), (6, -0.077), (7, 0.014), (8, -0.041), (9, -0.024), (10, 0.029), (11, -0.008), (12, 0.022), (13, 0.019), (14, 0.013), (15, -0.023), (16, -0.03), (17, -0.046), (18, -0.021), (19, 0.047), (20, 0.01), (21, -0.011), (22, -0.005), (23, 0.044), (24, -0.005), (25, -0.024), (26, 0.002), (27, 0.031), (28, -0.023), (29, -0.018), (30, 0.038), (31, -0.002), (32, 0.011), (33, -0.003), (34, -0.007), (35, -0.028), (36, 0.028), (37, 0.044), (38, 0.028), (39, 0.014), (40, -0.01), (41, -0.004), (42, 0.013), (43, -0.016), (44, -0.027), (45, -0.026), (46, 0.015), (47, 0.032), (48, 0.007), (49, 0.039)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96486884 1764 andrew gelman stats-2013-03-15-How do I make my graphs?

2 0.87791854 1308 andrew gelman stats-2012-05-08-chartsnthings !

Introduction: Yair pointed me to this awesome blog of how the NYT people make their graphs. This blows away all other stat graphics blogs (including this one). Lots of examples from mockup to first tries to final version. I recognize a lot of what they’re doing from my own experience. Also from my experience it’s hard to get all these details down: once you have the final graph, it’s easy to forget how you got there.

3 0.86713696 319 andrew gelman stats-2010-10-04-“Who owns Congress”

Introduction: Curt Yeske pointed me to this. Wow–these graphs are really hard to read! The old me would’ve said that each of these graphs would be better replaced by a dotplot (or, better still, a series of lineplots showing time trends). The new me would still like the dotplots and lineplots, but I’d say it’s fine to have the eye-grabbing but hard-to-read graphs as is, and then to have the more informative statistical graphics underneath, as it were. The idea is, you’d click on the pretty but hard-to-read “infovis” graphs, and this would then reveal informative “full Cleveland” graphs. And then if you click again you’d get a spreadsheet with the raw numbers. That I’d like to see, as a new model for graphical presentation.

4 0.85836244 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

5 0.85055685 61 andrew gelman stats-2010-05-31-A data visualization manifesto

Introduction: Details matter (at least, they do for me), but we don’t yet have a systematic way of going back and forth between the structure of a graph, its details, and the underlying questions that motivate our visualizations. (Cleveland, Wilkinson, and others have written a bit on how to formalize these connections, and I’ve thought about it too, but we have a ways to go.) I was thinking about this difficulty after reading an article on graphics by some computer scientists that was well-written but to me lacked a feeling for the linkages between substantive/statistical goals and graphical details. I have problems with these issues too, and my point here is not to criticize but to move the discussion forward. When thinking about visualization, how important are the details? Aleks pointed me to this article by Jeffrey Heer, Michael Bostock, and Vadim Ogievetsky, “A Tour through the Visualization Zoo: A survey of powerful visualization techniques, from the obvious to the obscure.” Th

6 0.83077621 847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics

7 0.82653928 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update

8 0.82531142 37 andrew gelman stats-2010-05-17-Is chartjunk really “more useful” than plain graphs? I don’t think so.

9 0.82029307 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics

10 0.81965399 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice

11 0.81955504 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly

12 0.81870222 1661 andrew gelman stats-2013-01-08-Software is as software does

13 0.81471288 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back

14 0.8094039 1604 andrew gelman stats-2012-12-04-An epithet I can live with

15 0.80698055 2038 andrew gelman stats-2013-09-25-Great graphs of names

16 0.78777188 1439 andrew gelman stats-2012-08-01-A book with a bunch of simple graphs

17 0.78241587 1894 andrew gelman stats-2013-06-12-How to best graph the Beveridge curve, relating the vacancy rate in jobs to the unemployment rate?

18 0.78215146 2319 andrew gelman stats-2014-05-05-Can we make better graphs of global temperature history?

19 0.78112257 1609 andrew gelman stats-2012-12-06-Stephen Kosslyn’s principles of graphics and one more: There’s no need to cram everything into a single plot

20 0.78005362 1896 andrew gelman stats-2013-06-13-Against the myth of the heroic visualization

similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(14, 0.023), (16, 0.095), (24, 0.127), (36, 0.013), (42, 0.037), (55, 0.038), (59, 0.186), (65, 0.01), (66, 0.028), (86, 0.033), (90, 0.011), (95, 0.027), (99, 0.271)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.96449876 1716 andrew gelman stats-2013-02-09-iPython Notebook

Introduction: Burak Bayramli writes: I wanted to inform you on iPython Notebook technology – allowing markup, Python code to reside in one document. Someone ported one of your examples from ARM. iPynb file is actually a live document, can be downloaded and rerun locally, hence change of code on document means change of images, results. Graphs (as well as text output) which are generated by the code, are placed inside the document automatically. No more referencing image files separately. For now running notebooks locally requires a notebook server, but that part can live “on the cloud” as part of an educational software. Viewers, such as nbviewer.ipython.org, do not even need that much, since all recent results of a notebook are embedded in the notebook itself. A lot of people are excited about this; also out of nowhere, Alfred P. Sloan Foundation dropped a $1.15 million grant on the developers of ipython which provided some extra energy on the project. Cool. We’ll have to do that ex

2 0.9620378 214 andrew gelman stats-2010-08-17-Probability-processing hardware

Introduction: Lyric Semiconductor posted: For over 60 years, computers have been based on digital computing principles. Data is represented as bits (0s and 1s). Boolean logic gates perform operations on these bits. A processor steps through many of these operations serially in order to perform a function. However, today’s most interesting problems are not at all suited to this approach. Here at Lyric Semiconductor, we are redesigning information processing circuits from the ground up to natively process probabilities: from the gate circuits to the processor architecture to the programming language. As a result, many applications that today require a thousand conventional processors will soon run in just one Lyric processor, providing 1,000x efficiencies in cost, power, and size. Om Malik has some more information, also relating to the team and the business. The fundamental idea is that computing architectures work deterministically, even though the world is fundamentally stochastic.

3 0.94336998 763 andrew gelman stats-2011-06-13-Inventor of Connect Four dies at 91

Introduction: Obit here. I think I have a cousin with the same last name as this guy, so maybe we’re related by marriage in some way. (By that standard we’re also related to Marge Simpson and, I seem to recall, the guy who wrote the scripts for Dark Shadows.)

4 0.92582285 1599 andrew gelman stats-2012-11-30-“The scientific literature must be cleansed of everything that is fraudulent, especially if it involves the work of a leading academic”

Introduction: Someone points me to this report from Tilburg University on disgraced psychology researcher Diederik Stapel. The report includes bits like this: When the fraud was first discovered, limiting the harm it caused for the victims was a matter of urgency. This was particularly the case for Mr Stapel’s former PhD students and postdoctoral researchers . . . However, the Committees were of the opinion that the main bulk of the work had not yet even started. . . . Journal publications can often leave traces that reach far into and even beyond scientific disciplines. The self-cleansing character of science calls for fraudulent publications to be withdrawn and no longer to proliferate within the literature. In addition, based on their initial impressions, the Committees believed that there were other serious issues within Mr Stapel’s publications . . . This brought into the spotlight a research culture in which this sloppy science, alongside out-and-out fraud, was able to remain undetected

same-blog 5 0.92536664 1764 andrew gelman stats-2013-03-15-How do I make my graphs?

6 0.92427188 34 andrew gelman stats-2010-05-14-Non-academic writings on literature

7 0.9213202 853 andrew gelman stats-2011-08-14-Preferential admissions for children of elite colleges

8 0.91552365 965 andrew gelman stats-2011-10-19-Web-friendly visualizations in R

9 0.90886152 580 andrew gelman stats-2011-02-19-Weather visualization with WeatherSpark

10 0.90818757 229 andrew gelman stats-2010-08-24-Bizarre twisty argument about medical diagnostic tests

11 0.90639412 517 andrew gelman stats-2011-01-14-Bayes in China update

12 0.89673072 1380 andrew gelman stats-2012-06-15-Coaching, teaching, and writing

13 0.89554417 1408 andrew gelman stats-2012-07-07-Not much difference between communicating to self and communicating to others

14 0.88644838 1000 andrew gelman stats-2011-11-10-Forecasting 2012: How much does ideology matter?

15 0.88354605 766 andrew gelman stats-2011-06-14-Last Wegman post (for now)

16 0.88257498 199 andrew gelman stats-2010-08-11-Note to semi-spammers

17 0.88077623 1415 andrew gelman stats-2012-07-13-Retractions, retractions: “left-wing enough to not care about truth if it confirms their social theories, right-wing enough to not care as long as they’re getting paid enough”

18 0.88040459 1377 andrew gelman stats-2012-06-13-A question about AIC

19 0.87969989 771 andrew gelman stats-2011-06-16-30 days of statistics

20 0.87396806 1190 andrew gelman stats-2012-02-29-Why “Why”?