252 andrew gelman stats-2010-09-02-R needs a good function to make line plots


meta info for this blog

Source: html

Introduction: More and more I’m thinking that line plots are great. More specifically, two-way grids of line plots on common scales, with one, two, or three lines per plot (enough to show comparisons but not so many that you can’t tell the lines apart). Also dot plots, of the sort that have been masterfully used by Lax and Phillips to show comparisons and trends in support for gay rights. There’s a big step missing, though, and that is to be able to make these graphs as a default. We have to figure out the right way to structure the data so these graphs come naturally. Then when it’s all working, we can talk the Excel people into implementing our ideas. I’m not asking to be paid here; all our ideas are in the public domain and I’m happy for Microsoft or Google or whoever to copy us. P.S. Drew Conway writes: This could be accomplished with ggplot2 using various combinations of the grammar. If I am understanding what you mean by line plots, here are some examples with code. In fact,


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 More and more I’m thinking that line plots are great. [sent-1, score-0.661]

2 More specifically, two-way grids of line plots on common scales, with one, two, or three lines per plot (enough to show comparisons but not so many that you can’t tell the lines apart). [sent-2, score-1.9]

3 Also dot plots, of the sort that have been masterfully used by Lax and Phillips to show comparisons and trends in support for gay rights. [sent-3, score-0.757]

4 There’s a big step missing, though, and that is to be able to make these graphs as a default. [sent-4, score-0.299]

5 We have to figure out the right way to structure the data so these graphs come naturally. [sent-5, score-0.324]

6 Then when it’s all working, we can talk the Excel people into implementing our ideas. [sent-6, score-0.207]

7 I’m not asking to be paid here; all our ideas are in the public domain and I’m happy for Microsoft or Google or whoever to copy us. [sent-7, score-0.752]

8 Drew Conway writes: This could be accomplished with ggplot2 using various combinations of the grammar. [sent-10, score-0.297]

9 If I am understanding what you mean by line plots, here are some examples with code. [sent-11, score-0.444]

10 In fact, that website is a tremendous resource for all things data viz in R. [sent-12, score-0.595]
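
The post asks for a data structure that makes two-way grids of line plots on common scales come naturally. As a rough illustration only, here is a minimal base-R sketch, assuming the data arrive in "long" format with one row per point. The function name line_grid, the column names (row, col, line, x, y), and the toy data are all hypothetical; none of them come from the post or from any existing package.

```r
# Hypothetical helper (not from the post): draw a two-way grid of line plots
# from a long-format data frame with columns row, col, line, x, y.
line_grid <- function(d) {
  rows <- unique(d$row)
  cols <- unique(d$col)
  xlim <- range(d$x)  # common scales: every panel shares these limits
  ylim <- range(d$y)
  op <- par(mfrow = c(length(rows), length(cols)), mar = c(3, 3, 2, 0.5))
  on.exit(par(op))    # restore the caller's graphics settings
  for (r in rows) {
    for (cl in cols) {
      panel <- d[d$row == r & d$col == cl, ]
      # Empty frame first, so each panel uses the shared limits
      plot(xlim, ylim, type = "n", xlab = "", ylab = "", bty = "l",
           main = paste(r, cl, sep = " / "))
      groups <- split(panel, panel$line, drop = TRUE)
      for (k in seq_along(groups)) {
        g <- groups[[k]]
        g <- g[order(g$x), ]        # draw each line left to right
        lines(g$x, g$y, col = k)    # one color per line within a panel
      }
    }
  }
  invisible(list(rows = rows, cols = cols, xlim = xlim, ylim = ylim))
}

# Made-up toy data: a 2 x 2 grid with two lines per panel
d <- expand.grid(x = 1:5, line = c("a", "b"), col = c("men", "women"),
                 row = c("young", "old"), stringsAsFactors = FALSE)
d$y <- d$x * ifelse(d$line == "a", 1, 0.5)
info <- line_grid(d)
```

The design choice in this sketch is that the two grid dimensions and the within-panel grouping are ordinary columns, so the grid falls out of the data rather than out of hand-written plotting loops for each panel. In ggplot2, which Drew Conway mentions, the same layout would presumably be written with geom_line() and facet_grid() on the same long-format columns.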


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('plots', 0.444), ('line', 0.217), ('viz', 0.189), ('lines', 0.183), ('conway', 0.181), ('comparisons', 0.169), ('grids', 0.169), ('domain', 0.158), ('tremendous', 0.155), ('resource', 0.155), ('accomplished', 0.15), ('graphs', 0.148), ('combinations', 0.147), ('dot', 0.147), ('scales', 0.147), ('drew', 0.145), ('phillips', 0.143), ('whoever', 0.142), ('implementing', 0.14), ('show', 0.139), ('lax', 0.138), ('excel', 0.137), ('microsoft', 0.13), ('gay', 0.119), ('apart', 0.119), ('copy', 0.111), ('trends', 0.108), ('specifically', 0.102), ('structure', 0.099), ('paid', 0.099), ('website', 0.096), ('plot', 0.096), ('asking', 0.095), ('google', 0.09), ('code', 0.087), ('missing', 0.084), ('per', 0.082), ('step', 0.081), ('happy', 0.081), ('common', 0.078), ('figure', 0.077), ('support', 0.075), ('tell', 0.075), ('understanding', 0.072), ('able', 0.07), ('examples', 0.068), ('talk', 0.067), ('ideas', 0.066), ('fact', 0.065), ('three', 0.065)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999988 252 andrew gelman stats-2010-09-02-R needs a good function to make line plots

2 0.21962497 61 andrew gelman stats-2010-05-31-A data visualization manifesto

Introduction: Details matter (at least, they do for me), but we don’t yet have a systematic way of going back and forth between the structure of a graph, its details, and the underlying questions that motivate our visualizations. (Cleveland, Wilkinson, and others have written a bit on how to formalize these connections, and I’ve thought about it too, but we have a ways to go.) I was thinking about this difficulty after reading an article on graphics by some computer scientists that was well-written but to me lacked a feeling for the linkages between substantive/statistical goals and graphical details. I have problems with these issues too, and my point here is not to criticize but to move the discussion forward. When thinking about visualization, how important are the details? Aleks pointed me to this article by Jeffrey Heer, Michael Bostock, and Vadim Ogievetsky, “A Tour through the Visualization Zoo: A survey of powerful visualization techniques, from the obvious to the obscure.” Th

3 0.18184896 324 andrew gelman stats-2010-10-07-Contest for developing an R package recommendation system

Introduction: After I spoke tonight at the NYC R meetup, John Myles White and Drew Conway told me about this competition they’re administering for developing a recommendation system for R packages. They seem to have already done some work laying out the network of R packages–which packages refer to which others, and so forth. I just hope they set up their system so that my own packages (“R2WinBUGS”, “r2jags”, “arm”, and “mi”) get recommended automatically. I really hate to think that there are people out there running regressions in R and not using display() and coefplot() to look at the output. P.S. Ajay Shah asks what I mean by that last sentence. My quick answer is that it’s good to be able to visualize the coefficients and the uncertainty about them. The default options of print(), summary(), and plot() in R don’t do that: - print() doesn’t give enough information - summary() gives everything to a zillion decimal places and gives useless things like p-values - plot() gives a bunch

4 0.16600323 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other

5 0.16066885 800 andrew gelman stats-2011-07-13-I like lineplots

Introduction: These particular lineplots are called parallel coordinate plots.

6 0.14910835 2059 andrew gelman stats-2013-10-12-Visualization, “big data”, and EDA

7 0.14214659 1531 andrew gelman stats-2012-10-12-Elderpedia

8 0.13281499 1011 andrew gelman stats-2011-11-15-World record running times vs. distance

9 0.13132235 363 andrew gelman stats-2010-10-22-Graphing Likert scale responses

10 0.13033512 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice

11 0.12703967 1609 andrew gelman stats-2012-12-06-Stephen Kosslyn’s principles of graphics and one more: There’s no need to cram everything into a single plot

12 0.12385246 1800 andrew gelman stats-2013-04-12-Too tired to mock

13 0.12023096 574 andrew gelman stats-2011-02-14-“The best data visualizations should stand on their own”? I don’t think so.

14 0.11592511 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update

15 0.11135892 965 andrew gelman stats-2011-10-19-Web-friendly visualizations in R

16 0.10441569 2087 andrew gelman stats-2013-11-03-The Employment Nondiscrimination Act is overwhelmingly popular in nearly every one of the 50 states

17 0.10358464 1808 andrew gelman stats-2013-04-17-Excel-bashing

18 0.10247018 1723 andrew gelman stats-2013-02-15-Wacky priors can work well?

19 0.10243453 1764 andrew gelman stats-2013-03-15-How do I make my graphs?

20 0.099609457 2288 andrew gelman stats-2014-04-10-Small multiples of lineplots > maps (ok, not always, but yes in this case)


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.135), (1, -0.027), (2, -0.005), (3, 0.062), (4, 0.135), (5, -0.096), (6, -0.069), (7, 0.005), (8, -0.036), (9, 0.007), (10, -0.026), (11, -0.02), (12, -0.021), (13, 0.008), (14, 0.006), (15, -0.005), (16, 0.025), (17, -0.05), (18, -0.005), (19, -0.0), (20, -0.006), (21, 0.036), (22, -0.014), (23, -0.008), (24, -0.008), (25, -0.027), (26, 0.025), (27, -0.026), (28, 0.043), (29, 0.035), (30, 0.051), (31, -0.029), (32, -0.012), (33, -0.008), (34, -0.004), (35, -0.026), (36, 0.044), (37, 0.037), (38, 0.019), (39, -0.042), (40, 0.052), (41, 0.049), (42, -0.034), (43, 0.066), (44, -0.032), (45, -0.01), (46, -0.052), (47, 0.03), (48, 0.043), (49, -0.085)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97327566 252 andrew gelman stats-2010-09-02-R needs a good function to make line plots

2 0.83274144 672 andrew gelman stats-2011-04-20-The R code for those time-use graphs

Introduction: By popular demand, here’s my R script for the time-use graphs:

# The data
a1 <- c(4.2,3.2,11.1,1.3,2.2,2.0)
a2 <- c(3.9,3.2,10.0,0.8,3.1,3.1)
a3 <- c(6.3,2.5,9.8,0.9,2.2,2.4)
a4 <- c(4.4,3.1,9.8,0.8,3.3,2.7)
a5 <- c(4.8,3.0,9.9,0.7,3.3,2.4)
a6 <- c(4.0,3.4,10.5,0.7,3.3,2.1)
a <- rbind(a1,a2,a3,a4,a5,a6)
avg <- colMeans (a)
avg.array <- t (array (avg, rev(dim(a))))
diff <- a - avg.array
country.name <- c("France", "Germany", "Japan", "Britain", "USA", "Turkey")

# The line plots
par (mfrow=c(2,3), mar=c(4,4,2,.5), mgp=c(2,.7,0), tck=-.02, oma=c(3,0,4,0), bg="gray96", fg="gray30")
for (i in 1:6){
  plot (c(1,6), c(-1,1.7), xlab="", ylab="", xaxt="n", yaxt="n", bty="l", type="n")
  lines (1:6, diff[i,], col="blue")
  points (1:6, diff[i,], pch=19, col="black")
  if (i>3){
    axis (1, c(1,3,5), c ("Work,\nstudy", "Eat,\nsleep", "Leisure"), mgp=c(2,1.5,0), tck=0, cex.axis=1.2)
    axis (1, c(2,4,6), c ("Unpaid\nwork", "Personal\nCare", "Other"), mgp=c(2,1.5,0),

3 0.76915687 324 andrew gelman stats-2010-10-07-Contest for developing an R package recommendation system

Introduction: After I spoke tonight at the NYC R meetup, John Myles White and Drew Conway told me about this competition they’re administering for developing a recommendation system for R packages. They seem to have already done some work laying out the network of R packages–which packages refer to which others, and so forth. I just hope they set up their system so that my own packages (“R2WinBUGS”, “r2jags”, “arm”, and “mi”) get recommended automatically. I really hate to think that there are people out there running regressions in R and not using display() and coefplot() to look at the output. P.S. Ajay Shah asks what I mean by that last sentence. My quick answer is that it’s good to be able to visualize the coefficients and the uncertainty about them. The default options of print(), summary(), and plot() in R don’t do that: - print() doesn’t give enough information - summary() gives everything to a zillion decimal places and gives useless things like p-values - plot() gives a bunch

4 0.75531483 800 andrew gelman stats-2011-07-13-I like lineplots

Introduction: These particular lineplots are called parallel coordinate plots.

5 0.73321187 372 andrew gelman stats-2010-10-27-A use for tables (really)

Introduction: After our recent discussion of semigraphic displays, Jay Ulfelder sent along a semigraphic table from his recent book. He notes, “When countries are the units of analysis, it’s nice that you can use three-letter codes, so all the proper names have the same visual weight.” Ultimately I think that graphs win over tables for display. However in our work we spend a lot of time looking at raw data, often simply to understand what data we have. This use of tables has, I think, been forgotten in the statistical graphics literature. So I’d like to refocus the eternal tables vs. graphs discussion. If the goal is to present information, comparisons, relationships, models, data, etc etc, graphs win. Forget about tables. But . . . when you’re looking at your data, it can often help to see the raw numbers. Once you’re looking at numbers, it makes sense to organize them. Even a displayed matrix in R is a form of table, after all. And once you’re making a table, it can be sensible to

6 0.72941011 37 andrew gelman stats-2010-05-17-Is chartjunk really “more useful” than plain graphs? I don’t think so.

7 0.72214347 1439 andrew gelman stats-2012-08-01-A book with a bunch of simple graphs

8 0.72176611 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice

9 0.71525121 61 andrew gelman stats-2010-05-31-A data visualization manifesto

10 0.70047396 1609 andrew gelman stats-2012-12-06-Stephen Kosslyn’s principles of graphics and one more: There’s no need to cram everything into a single plot

11 0.69517046 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly

12 0.69249862 1461 andrew gelman stats-2012-08-17-Graphs showing uncertainty using lighter intensities for the lines that go further from the center, to de-emphasize the edges

13 0.684605 488 andrew gelman stats-2010-12-27-Graph of the year

14 0.6831575 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

15 0.67903084 1116 andrew gelman stats-2012-01-13-Infographic on the economy

16 0.67793304 670 andrew gelman stats-2011-04-20-Attractive but hard-to-read graph could be made much much better

17 0.67700326 1275 andrew gelman stats-2012-04-22-Please stop me before I barf again

18 0.67244273 1896 andrew gelman stats-2013-06-13-Against the myth of the heroic visualization

19 0.67054254 736 andrew gelman stats-2011-05-29-Response to “Why Tables Are Really Much Better Than Graphs”

20 0.66889262 2288 andrew gelman stats-2014-04-10-Small multiples of lineplots > maps (ok, not always, but yes in this case)


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(14, 0.063), (16, 0.128), (21, 0.066), (23, 0.02), (24, 0.185), (55, 0.057), (58, 0.023), (63, 0.019), (75, 0.018), (77, 0.029), (87, 0.035), (88, 0.019), (99, 0.232)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96035558 252 andrew gelman stats-2010-09-02-R needs a good function to make line plots

2 0.93140113 1471 andrew gelman stats-2012-08-27-Why do we never see a full decision analysis for a clinical trial?

Introduction: Peter Thall writes: Some years ago, after I gave a talk at Columbia that you attended, you told me that you would like to see a decision-theoretic analysis formulated and carried out to completion by Donald Berry, who had been quite vocal for some time about the importance of such a “fully Bayesian” analysis. I do not work with Berry. But, in recent years I have begun to do utility-based clinical trial design. The trial described in the attached paper [by Thall and Hoang Nguyen] enrolled its first child very recently. While the methodology is not terribly sophisticated, I consider this to be one of the most ethical trials that I have designed. The utility was elicited from the two trial PIs. When we had the pre-trial start-up meeting, a third oncologist looked hard at the utility table, and he said that he agreed completely with the numerical utilities. I doubt that this trial will cure this type of brain tumor, but I do think that the design gives the children a better

3 0.92731464 1080 andrew gelman stats-2011-12-24-Latest in blog advertising

Introduction: I received the following message from “Patricia Lopez” of “Premium Link Ads”: Hello, I am interested in placing a text link on your page: http://andrewgelman.com/2011/07/super_sam_fuld/. The link would point to a page on a website that is relevant to your page and may be useful to your site visitors. We would be happy to compensate you for your time if it is something we are able to work out. The best way to reach me is through a direct response to this email. This will help me get back to you about the right link request. Please let me know if you are interested, and if not thanks for your time. Thanks. Usually I just ignore these, but after our recent discussion I decided to reply. I wrote: How much do you pay? But no answer. I wonder what’s going on? I mean, why bother sending the email in the first place if you’re not going to follow up?

4 0.92688549 799 andrew gelman stats-2011-07-13-Hypothesis testing with multiple imputations

Introduction: Vincent Yip writes: I have read your paper [with Kobi Abayomi and Marc Levy] regarding multiple imputation application. In order to diagnostic my imputed data, I used Kolmogorov-Smirnov (K-S) tests to compare the distribution differences between the imputed and observed values of a single attribute as mentioned in your paper. My question is: For example I have this attribute X with the following data: (NA = missing) Original dataset: 1, NA, 3, 4, 1, 5, NA Imputed dataset: 1, 2 , 3, 4, 1, 5, 6 a) in order to run the KS test, will I treat the observed data as 1, 3, 4,1, 5? b) and for the observed data, will I treat 1, 2 , 3, 4, 1, 5, 6 as the imputed dataset for the K-S test? or just 2 ,6? c) if I used m=5, I will have 5 set of imputed data sets. How would I apply K-S test to 5 of them and compare to the single observed distribution? Do I combine the 5 imputed data set into one by averaging each imputed values so I get one single imputed data and compare with the ob

5 0.92626643 586 andrew gelman stats-2011-02-23-A statistical version of Arrow’s paradox

Introduction: Unfortunately, when we deal with scientists, statisticians are often put in a setting reminiscent of Arrow’s paradox, where we are asked to provide estimates that are informative and unbiased and confidence statements that are correct conditional on the data and also on the underlying true parameter. [It's not generally possible for an estimate to do all these things at the same time -- ed.] Larry Wasserman feels that scientists are truly frequentist, and Don Rubin has told me how he feels that scientists interpret all statistical estimates Bayesianly. I have no doubt that both Larry and Don are correct. Voters want lower taxes and more services, and scientists want both Bayesian and frequency coverage; as the saying goes, everybody wants to go to heaven but nobody wants to die.

6 0.92568922 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics

7 0.92460489 1871 andrew gelman stats-2013-05-27-Annals of spam

8 0.92386132 898 andrew gelman stats-2011-09-10-Fourteen magic words: an update

9 0.92222488 488 andrew gelman stats-2010-12-27-Graph of the year

10 0.92145646 2112 andrew gelman stats-2013-11-25-An interesting but flawed attempt to apply general forecasting principles to contextualize attitudes toward risks of global warming

11 0.92115438 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values

12 0.92095304 807 andrew gelman stats-2011-07-17-Macro causality

13 0.92021412 503 andrew gelman stats-2011-01-04-Clarity on my email policy

14 0.91884685 1755 andrew gelman stats-2013-03-09-Plaig

15 0.91868252 2179 andrew gelman stats-2014-01-20-The AAA Tranche of Subprime Science

16 0.91853583 2174 andrew gelman stats-2014-01-17-How to think about the statistical evidence when the statistical evidence can’t be conclusive?

17 0.91747528 1881 andrew gelman stats-2013-06-03-Boot

18 0.91722947 1959 andrew gelman stats-2013-07-28-50 shades of gray: A research story

19 0.91717881 1206 andrew gelman stats-2012-03-10-95% intervals that I don’t believe, because they’re from a flat prior I don’t believe

20 0.91632712 1283 andrew gelman stats-2012-04-26-Let’s play “Guess the smoother”!