andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-2059 knowledge-graph by maker-knowledge-mining

2059 andrew gelman stats-2013-10-12-Visualization, “big data”, and EDA


meta info for this blog

Source: html

Introduction: Dean Eckles writes:

Given your ongoing discussion of info viz for different goals, you might be interested in Sinan Aral’s new article: This touches on several info viz themes:

- Viz for yourself (or your team) vs. visualizations to share the final conclusions
- Viz for identifying promising features for use in modeling
- Viz and statistical significance, especially when the data has plenty of dependence structure

Also, these cascade visualizations are perhaps worth comparing with some of very large cascades on Facebook made by my colleagues Alex Dow, Lada Adamic, and Adrien Friggeri.

I like those graphs but on the international maps I would make the country boundaries thinner and I would get rid of Greenland and Antarctica, they’re distracting. (I think that’s what Bob would call a “bike shed” comment.)


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Dean Eckles writes: Given your ongoing discussion of info viz for different goals, you might be interested in Sinan Aral’s new article : This touches on several info viz themes: - Viz for yourself (or your team) vs. [sent-1, score-2.127]

2 I like those graphs but on the international maps I would make the country boundaries thinner and I would get rid of Greenland and Antarctica, they’re distracting. [sent-3, score-0.601]

3 (I think that’s what Bob would call a “bike shed” comment. [sent-4, score-0.104]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('viz', 0.693), ('info', 0.213), ('visualizations', 0.191), ('dow', 0.147), ('cascade', 0.147), ('aral', 0.147), ('sinan', 0.147), ('antarctica', 0.139), ('cascades', 0.139), ('adrien', 0.139), ('touches', 0.128), ('eckles', 0.128), ('shed', 0.121), ('boundaries', 0.116), ('promising', 0.112), ('greenland', 0.11), ('bike', 0.107), ('dependence', 0.103), ('facebook', 0.103), ('rid', 0.101), ('dean', 0.1), ('plenty', 0.099), ('ongoing', 0.098), ('identifying', 0.095), ('themes', 0.094), ('alex', 0.094), ('maps', 0.085), ('bob', 0.081), ('international', 0.08), ('final', 0.08), ('features', 0.075), ('goals', 0.073), ('structure', 0.073), ('conclusions', 0.07), ('comparing', 0.069), ('team', 0.067), ('country', 0.067), ('share', 0.067), ('significance', 0.065), ('colleagues', 0.061), ('call', 0.055), ('graphs', 0.054), ('worth', 0.054), ('especially', 0.053), ('modeling', 0.051), ('would', 0.049), ('several', 0.046), ('large', 0.044), ('interested', 0.043), ('made', 0.038)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 2059 andrew gelman stats-2013-10-12-Visualization, “big data”, and EDA


2 0.14910835 252 andrew gelman stats-2010-09-02-R needs a good function to make line plots

Introduction: More and more I’m thinking that line plots are great. More specifically, two-way grids of line plots on common scales, with one, two, or three lines per plot (enough to show comparisons but not so many that you can’t tell the lines apart). Also dot plots, of the sort that have been masterfully used by Lax and Phillips to show comparisons and trends in support for gay rights. There’s a big step missing, though, and that is to be able to make these graphs as a default. We have to figure out the right way to structure the data so these graphs come naturally. Then when it’s all working, we can talk the Excel people into implementing our ideas. I’m not asking to be paid here; all our ideas are in the public domain and I’m happy for Microsoft or Google or whoever to copy us. P.S. Drew Conway writes: This could be accomplished with ggplot2 using various combinations of the grammar. If I am understanding what you mean by line plots, here are some examples with code. In fact,

3 0.096200973 1604 andrew gelman stats-2012-12-04-An epithet I can live with

Introduction: Here. Indeed, I’d much rather be a legend than a myth. I just want to clarify one thing. Walter Hickey writes: [Antony Unwin and Andrew Gelman] collaborated on this presentation where they take a hard look at what’s wrong with the recent trends of data visualization and infographics. The takeaway is that while there have been great leaps in visualization technology, some of the visualizations that have garnered the highest praises have actually been lacking in a number of key areas. Specifically, the pair does a takedown of the top visualizations of 2008 as decided by the popular statistics blog Flowing Data. This is a fair summary, but I want to emphasize that, although our dislike of some award-winning visualizations is central to our argument, it is only the first part of our story. As Antony and I worked more on our paper, and especially after seeing the discussions by Robert Kosara, Stephen Few, Hadley Wickham, and Paul Murrell (all to appear in Journal of Computati

4 0.085490711 1896 andrew gelman stats-2013-06-13-Against the myth of the heroic visualization

Introduction: Alberto Cairo tells a fascinating story about John Snow, H. W. Acland, and the Mythmaking Problem: Every human community—nations, ethnic and cultural groups, professional guilds—inevitably raises a few of its members to the status of heroes and weaves myths around them. . . . The visual display of information is no stranger to heroes and myth. In fact, being a set of disciplines with a relatively small amount of practitioners and researchers, it has generated a staggering number of heroes, perhaps as a morale-enhancing mechanism. Most of us have heard of the wonders of William Playfair’s Commercial and Political Atlas, Florence Nightingale’s coxcomb charts, Charles Joseph Minard’s Napoleon’s march diagram, and Henry Beck’s 1933 redesign of the London Underground map. . . . Cairo’s goal, I think, is not to disparage these great pioneers of graphics but rather to put their work in perspective, recognizing the work of their excellent contemporaries. I would like to echo Cairo’

5 0.076630354 1811 andrew gelman stats-2013-04-18-Psychology experiments to understand what’s going on with data graphics?

Introduction: Ricardo Pietrobon writes, regarding my post from last year on attitudes toward data graphics, Wouldn’t it be the case to start formally studying the usability of graphics from a cognitive perspective? with platforms such as the mechanical turk it should be fairly straightforward to test alternative methods and come to some conclusions about what might be more informative and what might better assist in supporting decisions. btw, my guess is that these two constructs might not necessarily agree with each other. And Jessica Hullman provides some background: Measuring success for the different goals that you hint at in your article is indeed challenging, and I don’t think that most visualization researchers would claim to have met this challenge (myself included). Visualization researchers may know the user psychology well when it comes to certain dimensions of a graph’s effectiveness (such as quick and accurate responses), but I wouldn’t agree with this statement as a gene

6 0.068406351 2092 andrew gelman stats-2013-11-07-Data visualizations gone beautifully wrong

7 0.067432605 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

8 0.065696843 1536 andrew gelman stats-2012-10-16-Using economics to reduce bike theft

9 0.063891694 1848 andrew gelman stats-2013-05-09-A tale of two discussion papers

10 0.063452981 2042 andrew gelman stats-2013-09-28-Difficulties of using statistical significance (or lack thereof) to sift through and compare research hypotheses

11 0.061613858 1713 andrew gelman stats-2013-02-08-P-values and statistical practice

12 0.059781481 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice

13 0.056889087 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update

14 0.056598108 2038 andrew gelman stats-2013-09-25-Great graphs of names

15 0.055433318 1481 andrew gelman stats-2012-09-04-Cool one-day miniconference at Columbia Fri 12 Oct on computational and online social science

16 0.054176569 847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics

17 0.053502075 1685 andrew gelman stats-2013-01-21-Class on computational social science this semester, Fridays, 1:00-3:40pm

18 0.0510673 1056 andrew gelman stats-2011-12-13-Drawing to Learn in Science

19 0.047307454 349 andrew gelman stats-2010-10-18-Bike shelf

20 0.046919793 951 andrew gelman stats-2011-10-11-Data mining efforts for Obama’s campaign
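
The listing does not say how the simValue scores above were produced. As a rough illustration only, the sketch below shows one common way to get such scores: weight each word by tf-idf, normalize each document vector, and take the cosine similarity between documents. The whitespace tokenizer, the smoothed idf formula, the toy documents, and the function names tfidf_vectors and cosine are all assumptions made for this example, not details taken from the source.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalized {word: weight} dict per document."""
    # Assumption: naive lowercase whitespace tokenization.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Assumption: smoothed idf, log((1 + N) / (1 + df)) + 1.
        vec = {
            word: count * (math.log((1 + n_docs) / (1 + df[word])) + 1)
            for word, count in tf.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({word: w / norm for word, w in vec.items()} if norm else {})
    return vectors


def cosine(u, v):
    """Cosine similarity of two L2-normalized sparse vectors (a dot product)."""
    return sum(weight * v.get(word, 0.0) for word, weight in u.items())


# Hypothetical toy documents, loosely echoing the posts listed above.
docs = [
    "viz for yourself versus viz to share final conclusions",
    "line plots on common scales show comparisons",
    "info viz and statistical graphics have different goals",
]
vecs = tfidf_vectors(docs)
sims = [cosine(vecs[0], v) for v in vecs]
```

Under this scheme a document compared with itself scores 1.0, which would be consistent with the same-blog row in the tfidf list, and documents with no words in common score 0.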


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.071), (1, -0.009), (2, -0.01), (3, 0.007), (4, 0.045), (5, -0.034), (6, -0.048), (7, 0.0), (8, -0.032), (9, 0.005), (10, -0.012), (11, -0.01), (12, 0.002), (13, -0.015), (14, 0.006), (15, 0.009), (16, -0.009), (17, -0.026), (18, 0.006), (19, 0.015), (20, 0.001), (21, 0.003), (22, 0.005), (23, 0.019), (24, -0.019), (25, -0.017), (26, 0.017), (27, 0.011), (28, 0.005), (29, -0.002), (30, 0.007), (31, 0.004), (32, -0.012), (33, 0.003), (34, 0.002), (35, 0.011), (36, 0.021), (37, 0.014), (38, -0.002), (39, 0.016), (40, -0.004), (41, 0.003), (42, -0.014), (43, 0.009), (44, -0.0), (45, 0.005), (46, 0.007), (47, -0.003), (48, -0.022), (49, -0.01)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.93274301 2059 andrew gelman stats-2013-10-12-Visualization, “big data”, and EDA


2 0.81029111 1811 andrew gelman stats-2013-04-18-Psychology experiments to understand what’s going on with data graphics?

Introduction: Ricardo Pietrobon writes, regarding my post from last year on attitudes toward data graphics, Wouldn’t it be the case to start formally studying the usability of graphics from a cognitive perspective? with platforms such as the mechanical turk it should be fairly straightforward to test alternative methods and come to some conclusions about what might be more informative and what might better assist in supporting decisions. btw, my guess is that these two constructs might not necessarily agree with each other. And Jessica Hullman provides some background: Measuring success for the different goals that you hint at in your article is indeed challenging, and I don’t think that most visualization researchers would claim to have met this challenge (myself included). Visualization researchers may know the user psychology well when it comes to certain dimensions of a graph’s effectiveness (such as quick and accurate responses), but I wouldn’t agree with this statement as a gene

3 0.79883122 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other

4 0.78645068 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update

Introduction: To continue our discussion from last week, consider three positions regarding the display of information: (a) The traditional tabular approach. This is how most statisticians, econometricians, political scientists, sociologists, etc., seem to operate. They understand the appeal of a pretty graph, and they’re willing to plot some data as part of an exploratory data analysis, but they see their serious research as leading to numerical estimates, p-values, tables of numbers. These people might use a graph to illustrate their points but they don’t see them as necessary in their research. (b) Statistical graphics as performed by Howard Wainer, Bill Cleveland, Dianne Cook, etc. They–we–see graphics as central to the process of statistical modeling and data analysis and are interested in graphs (static and dynamic) that display every data point as transparently as possible. (c) Information visualization or infographics, as performed by graphics designers and statisticians who are

5 0.7832464 372 andrew gelman stats-2010-10-27-A use for tables (really)

Introduction: After our recent discussion of semigraphic displays, Jay Ulfelder sent along a semigraphic table from his recent book. He notes, “When countries are the units of analysis, it’s nice that you can use three-letter codes, so all the proper names have the same visual weight.” Ultimately I think that graphs win over tables for display. However in our work we spend a lot of time looking at raw data, often simply to understand what data we have. This use of tables has, I think, been forgotten in the statistical graphics literature. So I’d like to refocus the eternal tables vs. graphs discussion. If the goal is to present information, comparisons, relationships, models, data, etc etc, graphs win. Forget about tables. But . . . when you’re looking at your data, it can often help to see the raw numbers. Once you’re looking at numbers, it makes sense to organize them. Even a displayed matrix in R is a form of table, after all. And once you’re making a table, it can be sensible to

6 0.77107567 1896 andrew gelman stats-2013-06-13-Against the myth of the heroic visualization

7 0.77046043 1604 andrew gelman stats-2012-12-04-An epithet I can live with

8 0.7636354 847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics

9 0.76259083 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics

10 0.75939947 1775 andrew gelman stats-2013-03-23-In which I disagree with John Maynard Keynes

11 0.75743216 404 andrew gelman stats-2010-11-09-“Much of the recent reported drop in interstate migration is a statistical artifact”

12 0.75595903 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice

13 0.74951673 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back

14 0.74701548 1764 andrew gelman stats-2013-03-15-How do I make my graphs?

15 0.74210566 736 andrew gelman stats-2011-05-29-Response to “Why Tables Are Really Much Better Than Graphs”

16 0.73886138 829 andrew gelman stats-2011-07-29-Infovis vs. statgraphics: A clear example of their different goals

17 0.72132522 252 andrew gelman stats-2010-09-02-R needs a good function to make line plots

18 0.71737748 1096 andrew gelman stats-2012-01-02-Graphical communication for legal scholarship

19 0.7049222 1308 andrew gelman stats-2012-05-08-chartsnthings !

20 0.7017951 863 andrew gelman stats-2011-08-21-Bad graph


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(3, 0.019), (14, 0.104), (16, 0.025), (23, 0.02), (24, 0.065), (40, 0.023), (44, 0.059), (52, 0.035), (58, 0.019), (77, 0.15), (89, 0.063), (99, 0.287)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95223284 2059 andrew gelman stats-2013-10-12-Visualization, “big data”, and EDA


2 0.90833801 1481 andrew gelman stats-2012-09-04-Cool one-day miniconference at Columbia Fri 12 Oct on computational and online social science

Introduction: One thing we do here at the Applied Statistics Center is hold mini-conferences. The next one looks really cool. It’s organized by Sharad Goel and Jake Hofman (Microsoft Research, formerly at Yahoo Research), David Park (Columbia University), and Sergei Vassilvitskii (Google). As with our other conferences, one of our goals is to mix the academic and nonacademic research communities. Here’s the website for the workshop, and here’s the announcement from the organizers: With an explosion of data on every aspect of our everyday existence — from what we buy, to where we travel, to who we know — we are able to observe human behavior with granularity largely thought impossible just a decade ago. The growth of such online activity has further facilitated the design of web-based experiments, enhancing both the scale and efficiency of traditional methods. Together these advances have created an unprecedented opportunity to address longstanding questions in the social sciences, rang

3 0.89781368 57 andrew gelman stats-2010-05-29-Roth and Amsterdam

Introduction: I used to think that fiction is about making up stories, but in recent years I’ve decided that fiction is really more of a method of telling true stories. One thing fiction allows you to do is explore what-if scenarios. I recently read two books that made me think about this: The Counterlife by Philip Roth and Things We Didn’t See Coming by Steven Amsterdam. Both books are explicitly about contingencies and possibilities: Roth’s tells a sequence of related but contradictory stories involving his Philip Roth-like (of course) protagonist, and Amsterdam’s is based on an alternative present/future. (I picture Amsterdam’s book as being set in Australia, but maybe I’m just imagining this based on my knowledge that the book was written and published in that country.) I found both books fascinating, partly because of the characters’ voices but especially because they both seemed to exemplify George Box’s dictum that to understand a system you have to perturb it. So, yes, literature an

4 0.89592493 978 andrew gelman stats-2011-10-28-Cool job opening with brilliant researchers at Yahoo

Introduction: Duncan Watts writes: The Human Social Dynamics Group in Yahoo Research is seeking highly qualified candidates for a post-doctoral research scientist position. The Human and Social Dynamics group is devoted to understanding the interplay between individual-level behavior (e.g. how people make decisions about what music they like, which dates to go on, or which groups to join) and the social environment in which individual behavior necessarily plays itself out. In particular, we are interested in: * Structure and evolution of social groups and networks * Decision making, social influence, diffusion, and collective decisions * Networking and collaborative problem solving. The intrinsically multi-disciplinary and cross-cutting nature of the subject demands an eclectic range of researchers, both in terms of domain-expertise (e.g. decision sciences, social psychology, sociology) and technical skills (e.g. statistical analysis, mathematical modeling, computer simulations, design o

5 0.89388573 1124 andrew gelman stats-2012-01-17-How to map geographically-detailed survey responses?

Introduction: David Sparks writes: I am experimenting with the mapping/visualization of survey response data, with a particular focus on using transparency to convey uncertainty. See some examples here. Do you think the examples are successful at communicating both local values of the variable of interest, as well as the lack of information in certain places? Also, do you have any general advice for choosing an approach to spatially smoothing the data in a way that preserves local features, but prevents individual respondents from standing out? I have experimented a lot with smoothing in these maps, and the cost of preventing the Midwest and West from looking “spotty” is the oversmoothing of the Northeast. My quick impression is that the graphs are more pretty than they are informative. But “pretty” is not such a bad thing! The conveying-information part is more difficult: to me, the graphs seem to be displaying a somewhat confusing mix of opinion level and population density. Consider

6 0.89077431 1784 andrew gelman stats-2013-04-01-Wolfram on Mandelbrot

7 0.8866756 1561 andrew gelman stats-2012-11-04-Someone is wrong on the internet

8 0.88650477 911 andrew gelman stats-2011-09-15-More data tools worth using from Google

9 0.88574743 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly

10 0.88003623 1373 andrew gelman stats-2012-06-09-Cognitive psychology research helps us understand confusion of Jonathan Haidt and others about working-class voters

11 0.86765277 1071 andrew gelman stats-2011-12-19-“NYU Professor Claims He Was Fired for Giving James Franco a D”

12 0.86701328 93 andrew gelman stats-2010-06-17-My proposal for making college admissions fairer

13 0.8641119 1461 andrew gelman stats-2012-08-17-Graphs showing uncertainty using lighter intensities for the lines that go further from the center, to de-emphasize the edges

14 0.8632642 1809 andrew gelman stats-2013-04-17-NUTS discussed on Xi’an’s Og

15 0.86228985 128 andrew gelman stats-2010-07-05-The greatest works of statistics never published

16 0.86012018 2054 andrew gelman stats-2013-10-07-Bing is preferred to Google by people who aren’t like me

17 0.85968828 1948 andrew gelman stats-2013-07-21-Bayes related

18 0.85968006 216 andrew gelman stats-2010-08-18-More forecasting competitions

19 0.85874605 1604 andrew gelman stats-2012-12-04-An epithet I can live with

20 0.85736966 562 andrew gelman stats-2011-02-06-Statistician cracks Toronto lottery