andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-1669 knowledge-graph by maker-knowledge-mining

1669 andrew gelman stats-2013-01-12-The power of the puzzlegraph


meta info for this blog

Source: html

Introduction: The Organisation for Economic Co-operation and Development reports that the following project from Krisztina Szucs and Mate Cziner has won their visualization challenge, “launched in September 2012 to solicit visualisations based on the OECD’s data-rich Education at a Glance report”: (The graph is interactive. Click on the above image and click again to see the full version.) From the press release: Entries from around the world focused on data related to the economic costs and return on investment in education . . . [The winning entry] takes a detailed look at public vs. private and men vs. women for selected countries . . . The judges were particularly impressed by the angled slope format of the visualisation, which encourages comparison between the upper-secondary and tertiary benefits of education. Szucs and Cziner were also lauded for their striking visual design, which draws users into exploring their piece [emphasis added]. I used boldface to highlight a p


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Click on the above image and click again to see the full version. [sent-2, score-0.113]

2 From the press release: Entries from around the world focused on data related to the economic costs and return on investment in education . [sent-3, score-0.45]

3 [The winning entry] takes a detailed look at public vs. [sent-6, score-0.296]

4 The judges were particularly impressed by the angled slope format of the visualisation, which encourages comparison between the upper-secondary and tertiary benefits of education. [sent-11, score-0.312]

5 Szucs and Cziner were also lauded for their striking visual design, which draws users into exploring their piece [emphasis added]. [sent-12, score-0.106]

6 To be frank, I don’t think many people will actually learn much at all about education or education statistics from playing with the Szucs and Cziner visualization, but it does a good job of selling the topic in the sense of making the numbers look potentially interesting and surprising. [sent-15, score-0.667]

7 More here from Patrick Love at the OECD, who writes: They chose Krisztina and Maté’s graph from over 30 entries because it “does a great job of breaking down the complex interplay between costs and returns into a form that is easy to compare”. [sent-16, score-0.794]

8 And also because “instead of the many-country approach used by most entries, the project takes a detailed look at public vs. [sent-17, score-0.387]

9 women for three selected countries (which you can change)”. [sent-19, score-0.442]

10 That I can’t see the benefit of, given that the graph is interactive. [sent-23, score-0.11]
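The page does not say how sentScore is computed, but the listed values are consistent with a simple rule: split the sentence on whitespace and add up the tfidf weights (from the topN-words list further down the page) of every token found in that list, matching case-sensitively and without stripping punctuation. For the third listed sentence (sent-6), that gives takes + detailed + look = 0.091 + 0.123 + 0.082 = 0.296. A minimal Python sketch of this inferred rule follows; the function name and the reduced weight table are illustrative assumptions, not the tool's actual code.

```python
# Subset of the (word, tfidf) pairs listed under "tfidf for this blog".
WEIGHTS = {
    "education": 0.18, "detailed": 0.123, "click": 0.113,
    "takes": 0.091, "look": 0.082, "job": 0.078,
    "numbers": 0.074, "selling": 0.073,
}

def sentence_score(sentence, weights=WEIGHTS):
    """Sum the weights of whitespace-separated tokens (inferred rule).

    Tokens are matched as-is, so "Click" and "version." do not match
    "click" or "version". Repeated tokens are counted each time.
    """
    return round(sum(weights.get(tok, 0.0) for tok in sentence.split()), 3)

print(sentence_score("[The winning entry] takes a detailed look at public vs."))
# 0.296, matching the listed score for sent-6
```

Under the same rule, a sentence that uses "education" twice counts its weight twice, which is consistent with the 0.667 listed for sent-15.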


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('cziner', 0.35), ('szucs', 0.35), ('krisztina', 0.234), ('oecd', 0.213), ('entries', 0.185), ('education', 0.18), ('selected', 0.18), ('countries', 0.158), ('private', 0.125), ('detailed', 0.123), ('costs', 0.115), ('click', 0.113), ('visualization', 0.111), ('graph', 0.11), ('men', 0.109), ('mat', 0.106), ('visualisation', 0.106), ('sucking', 0.106), ('lauded', 0.106), ('women', 0.104), ('obscurity', 0.1), ('mate', 0.096), ('boldface', 0.096), ('organisation', 0.096), ('interplay', 0.096), ('takes', 0.091), ('project', 0.091), ('solicit', 0.09), ('chew', 0.09), ('judges', 0.088), ('look', 0.082), ('patrick', 0.081), ('glance', 0.079), ('encourages', 0.079), ('launched', 0.078), ('economic', 0.078), ('job', 0.078), ('investment', 0.077), ('paradoxically', 0.076), ('slope', 0.075), ('highlight', 0.074), ('numbers', 0.074), ('selling', 0.073), ('interactive', 0.072), ('antony', 0.072), ('chose', 0.07), ('returns', 0.07), ('format', 0.07), ('puzzle', 0.07), ('breaking', 0.07)]
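The weights above are tf-idf scores for this post's most distinctive words. The page does not document the exact weighting variant, so the following is only a sketch of one common form (raw term frequency times log inverse document frequency, L2-normalized per document). The function name and the idea of passing pre-tokenized documents are assumptions made for illustration.

```python
import math
from collections import Counter

def tfidf_top_words(docs, doc_index, top_n=5):
    """Return the top_n (word, weight) pairs for one tokenized document.

    docs is a list of token lists; doc_index selects the document to score.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents each word appears in.
    df = Counter(word for doc in docs for word in set(doc))
    tf = Counter(docs[doc_index])
    weights = {
        word: count * math.log(n_docs / df[word])
        for word, count in tf.items()
    }
    # L2-normalize so weights are comparable across documents.
    norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
    # Sort by descending weight, breaking ties alphabetically.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(word, round(w / norm, 3)) for word, w in ranked[:top_n]]
```

Words that occur in few other documents (here, names such as "cziner" and "szucs") end up with the largest weights, while words shared by every document get a weight of zero.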

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9999997 1669 andrew gelman stats-2013-01-12-The power of the puzzlegraph

2 0.11975062 709 andrew gelman stats-2011-05-13-D. Kahneman serves up a wacky counterfactual

Introduction: I followed a link from Tyler Cowen to this bit by Daniel Kahneman: Education is an important determinant of income — one of the most important — but it is less important than most people think. If everyone had the same education, the inequality of income would be reduced by less than 10%. When you focus on education you neglect the myriad other factors that determine income. The differences of income among people who have the same education are huge. I think I know what he’s saying–if you regress income on education and other factors, and then you take education out of the model, R-squared decreases by 10%. Or something like that. Not necessarily R-squared, maybe you fit the big model, then get predictions for everyone putting in the mean value for education and look at the sd of incomes or the Gini index or whatever. Or something else along those lines. My problem is with the counterfactual: “If everyone had the same education . . .” I have a couple problems with this

3 0.11406711 829 andrew gelman stats-2011-07-29-Infovis vs. statgraphics: A clear example of their different goals

Introduction: I recently came across a data visualization that perfectly demonstrates the difference between the “infovis” and “statgraphics” perspectives. Here’s the image (link from Tyler Cowen): That’s the infovis. The statgraphic version would simply be a dotplot, something like this: (I purposely used the default settings in R with only minor modifications here to demonstrate what happens if you just want to plot the data with minimal effort.) Let’s compare the two graphs: From a statistical graphics perspective, the second graph dominates. The countries are directly comparable and the numbers are indicated by positions rather than area. The first graph is full of distracting color and gives the misleading visual impression that the total GDP of countries 5-10 is about equal to that of countries 1-4. If the goal is to get attention, though, it’s another story. There’s nothing special about the top graph above except how it looks. It represents neither a dat

4 0.10048159 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other

5 0.10026252 583 andrew gelman stats-2011-02-21-An interesting assignment for statistical graphics

Introduction: Antony Unwin writes: I [Unwin] find it an interesting exercise for students to ask them to write headlines (and subheadlines) for graphics, both for ones they have drawn themselves and for published ones. The results are sometimes depressing, often thought-provoking and occasionally highly entertaining. This seems like a great idea, both for teaching students how to read a graph and also for teaching how to make a graph. I’ve long said that when making a graph (or, for that matter, a table), you want to think about what message the reader will get out of it. “Displaying a bunch of numbers” doesn’t cut it.

6 0.098720364 670 andrew gelman stats-2011-04-20-Attractive but hard-to-read graph could be made much much better

7 0.094860688 91 andrew gelman stats-2010-06-16-RSS mess

8 0.088276088 1811 andrew gelman stats-2013-04-18-Psychology experiments to understand what’s going on with data graphics?

9 0.086536273 225 andrew gelman stats-2010-08-23-Getting into hot water over hot graphics

10 0.086040512 1125 andrew gelman stats-2012-01-18-Beautiful Line Charts

11 0.085781828 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year

12 0.082808167 847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics

13 0.081768662 2236 andrew gelman stats-2014-03-07-Selection bias in the reporting of shaky research

14 0.08124543 1114 andrew gelman stats-2012-01-12-Controversy about average personality differences between men and women

15 0.080447122 816 andrew gelman stats-2011-07-22-“Information visualization” vs. “Statistical graphics”

16 0.080083959 794 andrew gelman stats-2011-07-09-The quest for the holy graph

17 0.078798652 790 andrew gelman stats-2011-07-08-Blog in motion

18 0.078762047 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update

19 0.077925868 196 andrew gelman stats-2010-08-10-The U.S. as welfare state

20 0.076666489 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice
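Each simValue above scores how close another post's tf-idf vector is to this one. The page does not name the similarity measure; cosine similarity is a common choice and would be consistent with the post matching itself at almost exactly 1. The sketch below assumes cosine similarity over sparse word-weight dictionaries. The function names are hypothetical, not taken from the tool that produced this page.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_u * norm_v)

def rank_similar(query_id, vectors, top_n=20):
    """Rank all documents (including the query itself) by similarity."""
    query = vectors[query_id]
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in vectors.items()]
    # Highest similarity first; ties broken by document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_n]
```

Because the query document is left in the candidate set, it appears at the top of its own ranking, which mirrors the "same-blog" row in the list above.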


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.123), (1, -0.059), (2, 0.011), (3, 0.03), (4, 0.087), (5, -0.036), (6, -0.068), (7, 0.029), (8, -0.057), (9, 0.001), (10, -0.021), (11, -0.02), (12, -0.029), (13, 0.027), (14, -0.005), (15, 0.006), (16, 0.053), (17, 0.004), (18, -0.013), (19, 0.012), (20, -0.006), (21, 0.008), (22, -0.027), (23, 0.001), (24, 0.028), (25, 0.01), (26, -0.013), (27, 0.019), (28, -0.01), (29, -0.002), (30, -0.019), (31, -0.009), (32, -0.013), (33, -0.042), (34, 0.0), (35, 0.005), (36, 0.003), (37, -0.01), (38, 0.029), (39, 0.023), (40, -0.046), (41, -0.043), (42, 0.003), (43, -0.001), (44, 0.018), (45, 0.011), (46, 0.037), (47, 0.004), (48, -0.063), (49, 0.001)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9631471 1669 andrew gelman stats-2013-01-12-The power of the puzzlegraph

2 0.7768147 829 andrew gelman stats-2011-07-29-Infovis vs. statgraphics: A clear example of their different goals

Introduction: I recently came across a data visualization that perfectly demonstrates the difference between the “infovis” and “statgraphics” perspectives. Here’s the image (link from Tyler Cowen): That’s the infovis. The statgraphic version would simply be a dotplot, something like this: (I purposely used the default settings in R with only minor modifications here to demonstrate what happens if you just want to plot the data with minimal effort.) Let’s compare the two graphs: From a statistical graphics perspective, the second graph dominates. The countries are directly comparable and the numbers are indicated by positions rather than area. The first graph is full of distracting color and gives the misleading visual impression that the total GDP of countries 5-10 is about equal to that of countries 1-4. If the goal is to get attention, though, it’s another story. There’s nothing special about the top graph above except how it looks. It represents neither a dat

3 0.74867851 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year

Introduction: Under the subject line “Blog bait!”, Brendan Nyhan points me to this post at the Washington Post blog: For 2013, we asked some of the year’s most interesting, important and influential thinkers to name their favorite graph of the year — and why they chose it. Here’s Bill Gates’s. Infographic by Thomas Porostocky for WIRED. “I love this graph because it shows that while the number of people dying from communicable diseases is still far too high, those numbers continue to come down. . . .” As Brendan is aware, this is not my favorite sort of graph, it’s a bit of a puzzle to read and figure out where all the pieces fit in, also weird stuff going on like 3-D effects and the big space taken up by those yellow and green borders, as well as tricky things like understanding what some of those little blocks are, and perhaps the biggest question, what is the definition of an “untimely death.” But, as often is the case, the defects of the graph from a statistical perspective can

4 0.74582541 1811 andrew gelman stats-2013-04-18-Psychology experiments to understand what’s going on with data graphics?

Introduction: Ricardo Pietrobon writes, regarding my post from last year on attitudes toward data graphics, Wouldn’t it be the case to start formally studying the usability of graphics from a cognitive perspective? with platforms such as the mechanical turk it should be fairly straightforward to test alternative methods and come to some conclusions about what might be more informative and what might better assist in supporting decisions. btw, my guess is that these two constructs might not necessarily agree with each other. And Jessica Hullman provides some background: Measuring success for the different goals that you hint at in your article is indeed challenging, and I don’t think that most visualization researchers would claim to have met this challenge (myself included). Visualization researchers may know the user psychology well when it comes to certain dimensions of a graph’s effectiveness (such as quick and accurate responses), but I wouldn’t agree with this statement as a gene

5 0.73289162 2205 andrew gelman stats-2014-02-10-More on US health care overkill

Introduction: Paul Alper writes: You recently posted my moving and widening the goalposts contention. In it, I mentioned “how diagnoses increase markedly while deaths are flatlined” indicating that we are being overdiagnosed and overtreated. Above are 5 frightening graphs which illustrate the phenomenon. Defenders of the system might (ludicrously) contend that it is precisely the aggressive medical care that is responsible for keeping the cancers under control. The prostate cancer graph is particularly interesting because it shows the peaking of the PSA-driven cause of treatment in the 1990s which then falls off as the evidence accumulates that the PSA was far from a perfect indicator. In contrast is the thyroid cancer which zooms skyward even as the death rate is absolutely (dead) flat. And of course here’s the famous cross-country comparison that some find “schlocky” but which I (and many others) find compelling:

6 0.73110682 488 andrew gelman stats-2010-12-27-Graph of the year

7 0.72484654 502 andrew gelman stats-2011-01-04-Cash in, cash out graph

8 0.69441551 915 andrew gelman stats-2011-09-17-(Worst) graph of the year

9 0.68898332 787 andrew gelman stats-2011-07-05-Different goals, different looks: Infovis and the Chris Rock effect

10 0.68780941 2065 andrew gelman stats-2013-10-17-Cool dynamic demographic maps provide beautiful illustration of Chris Rock effect

11 0.68704003 12 andrew gelman stats-2010-04-30-More on problems with surveys estimating deaths in war zones

12 0.68284923 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

13 0.67965186 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies

14 0.67626792 1613 andrew gelman stats-2012-12-09-Hey—here’s a photo of me making fun of a silly infographic (from last year)

15 0.67487442 2146 andrew gelman stats-2013-12-24-NYT version of birthday graph

16 0.6711275 1125 andrew gelman stats-2012-01-18-Beautiful Line Charts

17 0.6577934 1896 andrew gelman stats-2013-06-13-Against the myth of the heroic visualization

18 0.65737408 671 andrew gelman stats-2011-04-20-One more time-use graph

19 0.65594506 1253 andrew gelman stats-2012-04-08-Technology speedup graph

20 0.65446615 289 andrew gelman stats-2010-09-21-“How segregated is your city?”: A story of why every graph, no matter how clear it seems to be, needs a caption to anchor the reader in some numbers


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(5, 0.043), (9, 0.012), (16, 0.017), (24, 0.104), (41, 0.237), (53, 0.031), (69, 0.011), (72, 0.042), (84, 0.015), (86, 0.011), (87, 0.028), (95, 0.042), (99, 0.276)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.92097211 303 andrew gelman stats-2010-09-28-“Genomics” vs. genetics

Introduction: John Cook and Joseph Delaney point to an article by Yurii Aulchenko et al., who write: 54 loci showing strong statistical evidence for association to human height were described, providing us with potential genomic means of human height prediction. In a population-based study of 5748 people, we find that a 54-loci genomic profile explained 4-6% of the sex- and age-adjusted height variance, and had limited ability to discriminate tall/short people. . . . In a family-based study of 550 people, with both parents having height measurements, we find that the Galtonian mid-parental prediction method explained 40% of the sex- and age-adjusted height variance, and showed high discriminative accuracy. . . . The message is that the simple approach of predicting child’s height using a regression model given parents’ average height performs much better than the method they have based on combining 54 genes. They also find that, if you start with the prediction based on parents’ heigh

2 0.90452945 685 andrew gelman stats-2011-04-29-Data mining and allergies

Introduction: With all this data floating around, there are some interesting analyses one can do. I came across “The Association of Tree Pollen Concentration Peaks and Allergy Medication Sales in New York City: 2003-2008” by Perry Sheffield. There they correlate pollen counts with anti-allergy medicine sales – and indeed find that two days after high pollen counts, the medicine sales are the highest. Of course, it would be interesting to play with the data to see *what* tree is actually causing the sales to increase the most. Perhaps this would help the arborists what trees to plant. At the moment they seem to be following a rather sexist approach to tree planting: Ogren says the city could solve the problem by planting only female trees, which don’t produce pollen like male trees do. City arborists shy away from females because many produce messy – or in the case of ginkgos, smelly – fruit that litters sidewalks. In Ogren’s opinion, that’s a mistake. He says the females only pro

same-blog 3 0.9005481 1669 andrew gelman stats-2013-01-12-The power of the puzzlegraph

4 0.8924979 1626 andrew gelman stats-2012-12-16-The lamest, grudgingest, non-retraction retraction ever

Introduction: In politics we’re familiar with the non-apology apology (well described in Wikipedia as “a statement that has the form of an apology but does not express the expected contrition”). Here’s the scientific equivalent: the non-retraction retraction. Sanjay Srivastava points to an amusing yet barfable story of a pair of researchers who (inadvertently, I assume) made a data coding error and were eventually moved to issue a correction notice, but even then refused to fully admit their error. As Srivastava puts it, the story “ended up with Lew [Goldberg] and colleagues [Kibeom Lee and Michael Ashton] publishing a comment on an erratum – the only time I’ve ever heard of that happening in a scientific journal.” From the comment on the erratum: In their “erratum and addendum,” Anderson and Ones (this issue) explained that we had brought their attention to the “potential” of a “possible” misalignment and described the results computed from re-aligned data as being based on a “post-ho

5 0.88146222 454 andrew gelman stats-2010-12-07-Diabetes stops at the state line?

Introduction: From Discover: Razib Khan asks: But follow the gradient from El Paso to the Illinois-Missouri border. The differences are small across state lines, but the consistent differences along the borders really don’t make. Are there state-level policies or regulations causing this? Or, are there state-level differences in measurement? This weird pattern shows up in other CDC data I’ve seen. Turns out that CDC isn’t providing data, they’re providing model. Frank Howland answered: I suspect the answer has to do with the manner in which the county estimates are produced. I went to the original data source, the CDC, and then to the relevant FAQ. There they say that the diabetes prevalence estimates come from the “CDC’s Behavioral Risk Factor Surveillance System (BRFSS) and data from the U.S. Census Bureau’s Population Estimates Program. The BRFSS is an ongoing, monthly, state-based telephone survey of the adult population. The survey provides state-specific informati

6 0.87579978 1214 andrew gelman stats-2012-03-15-Of forecasts and graph theory and characterizing a statistical method by the information it uses

7 0.87343025 516 andrew gelman stats-2011-01-14-A new idea for a science core course based entirely on computer simulation

8 0.87042201 2185 andrew gelman stats-2014-01-25-Xihong Lin on sparsity and density

9 0.85928464 1300 andrew gelman stats-2012-05-05-Recently in the sister blog

10 0.85439897 1013 andrew gelman stats-2011-11-16-My talk at Math for America on Saturday

11 0.85007244 2204 andrew gelman stats-2014-02-09-Keli Liu and Xiao-Li Meng on Simpson’s paradox

12 0.84244001 1895 andrew gelman stats-2013-06-12-Peter Thiel is writing another book!

13 0.83693719 1816 andrew gelman stats-2013-04-21-Exponential increase in the number of stat majors

14 0.83295327 2311 andrew gelman stats-2014-04-29-Bayesian Uncertainty Quantification for Differential Equations!

15 0.82685167 2202 andrew gelman stats-2014-02-07-Outrage of the week

16 0.82206905 778 andrew gelman stats-2011-06-24-New ideas on DIC from Martyn Plummer and Sumio Watanabe

17 0.81295013 830 andrew gelman stats-2011-07-29-Introductory overview lectures at the Joint Statistical Meetings in Miami this coming week

18 0.81199872 2226 andrew gelman stats-2014-02-26-Econometrics, political science, epidemiology, etc.: Don’t model the probability of a discrete outcome, model the underlying continuous variable

19 0.80383456 447 andrew gelman stats-2010-12-03-Reinventing the wheel, only more so.

20 0.80304712 2222 andrew gelman stats-2014-02-24-On deck this week