andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-2065 knowledge-graph by maker-knowledge-mining

2065 andrew gelman stats-2013-10-17-Cool dynamic demographic maps provide beautiful illustration of Chris Rock effect


meta info for this blog

Source: html

Introduction: Robert Gonzalez reports on some beautiful graphs from John Nelson. Here’s Nelson:   The sexes start out homogenous, go super segregated in the teen years, segregate for business in the twenty-somethings, and re-couple for co-habitation years.  Then the lights fade into faint pockets of pink.   I [Nelson] am using simple tract-level population/gender counts from the US Census Bureau. Because their tract boundaries extend into the water and vacant area, I used NYC’s Bytes of the Big Apple zoning shapes to clip the census tracts to residentially zoned areas -giving me a more realistic (and more recognizable) definition of populated areas. The census breaks out their population counts by gender for five-year age spans ranging from teeny tiny infants through esteemed 85+ year-olds. And here’s Gonzalez: Between ages 0 and 14, the entire map is more or less an evenly mixed purple landscape; newborns, children and adolescents, after all, can’t really choose where the


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Robert Gonzalez reports on some beautiful graphs from John Nelson. [sent-1, score-0.087]

2 Here’s Nelson:   The sexes start out homogenous, go super segregated in the teen years, segregate for business in the twenty-somethings, and re-couple for co-habitation years. [sent-2, score-0.146]

3 Then the lights fade into faint pockets of pink. [sent-3, score-0.208]

4 I [Nelson] am using simple tract-level population/gender counts from the US Census Bureau. [sent-4, score-0.09]

5 Because their tract boundaries extend into the water and vacant area, I used NYC’s Bytes of the Big Apple zoning shapes to clip the census tracts to residentially zoned areas -giving me a more realistic (and more recognizable) definition of populated areas. [sent-5, score-0.426]

6 The census breaks out their population counts by gender for five-year age spans ranging from teeny tiny infants through esteemed 85+ year-olds. [sent-6, score-0.366]

7 And here’s Gonzalez: Between ages 0 and 14, the entire map is more or less an evenly mixed purple landscape; newborns, children and adolescents, after all, can’t really choose where they live – let alone where they’re born. [sent-7, score-0.348]

8 But between the ages of 15 and 19, something interesting happens. [sent-8, score-0.105]

9 As Nelson writes on his blog: We are in the age-span where teens/young adults can choose where to live. [sent-9, score-0.097]

10 Immediately we see clusters of females and, to a lesser extent, clusters of males. [sent-11, score-0.242]

11 We also start to see the filling of Rikers Island with green dots as young men begin to populate the jail complex. [sent-17, score-0.113]

12 In their early twenties, for example, professional women tend to gather in Midtown Manhattan, while swaths of early-twenty-something masculinity emerge in places like the SUNY Maritime College, and Yeshiva University. [sent-22, score-0.107]

13 The forties and fifties are characterized by a re-segregation of genders, and a thinning population. [sent-26, score-0.212]

14 Women outnumber the remaining men at a rate of better than two to one. [sent-31, score-0.182]

15 Various retirement communities popular with women become apparent, almost as strongly as their geographic preferences in their teens and twenties. [sent-32, score-0.286]

16 These maps are a beautiful illustration of the Chris Rock effect. [sent-36, score-0.172]

17 But he says it so well that we get a shock of recognition, the joy of relearning what we already know, but hearing it in a new way that makes us think more deeply about all sorts of related topics. [sent-38, score-0.143]

18 Statisticians, following John Tukey and Bill Cleveland, emphasize the ability of graphical data displays to reveal things that we have never thought of before. [sent-40, score-0.085]

19 In contrast, graphics designers celebrate innovative designs and visual juxtapositions that reveal interesting aspects of data but without highlighting any particular comparisons. [sent-41, score-0.152]

20 It is way more truthful a means of presenting relative geographic dispersion and affiliation than, say, choropleth mapping, which will be the carto-whipping-post of 2013. [sent-48, score-0.318]
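The scores in the table above look consistent with a simple rule: split each sentence on whitespace and add up the tfidf weights (from the topN-words list in the tfidf section of this page) of the tokens that match a listed word exactly, including case and attached punctuation. Sentence 3, for example, scores 0.208 = fade 0.066 + faint 0.073 + pockets 0.069. The tool's own code is not shown on this page, so the Python below is only a sketch of that inferred rule; the function name `sentence_score` is hypothetical.

```python
# Sketch of an inferred scoring rule; not the tool's confirmed implementation.
# Weights are copied from the topN-words list on this page (subset only).
WEIGHTS = {
    "clusters": 0.121, "segregated": 0.073, "teen": 0.073,
    "homogenous": 0.073, "fade": 0.066, "faint": 0.073, "pockets": 0.069,
}

def sentence_score(sentence, weights):
    """Sum the weights of whitespace-separated tokens that match exactly."""
    # No lowercasing or punctuation stripping: "homogenous," does not match.
    return sum(weights.get(token, 0.0) for token in sentence.split())

s3 = "Then the lights fade into faint pockets of pink."
s10 = ("Immediately we see clusters of females and, to a lesser extent, "
       "clusters of males.")
print(round(sentence_score(s3, WEIGHTS), 3))   # 0.208
print(round(sentence_score(s10, WEIGHTS), 3))  # 0.242
```

Under this rule a repeated word counts each time it occurs (sentence 10 counts "clusters" twice), and capitalized names such as "Nelson" add nothing because the listed words are all lowercase.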


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('nelson', 0.431), ('rock', 0.3), ('gonzalez', 0.241), ('chris', 0.219), ('census', 0.134), ('clusters', 0.121), ('men', 0.113), ('women', 0.107), ('geographic', 0.106), ('ages', 0.105), ('choose', 0.097), ('counts', 0.09), ('beautiful', 0.087), ('maps', 0.085), ('reveal', 0.085), ('map', 0.08), ('segregated', 0.073), ('tract', 0.073), ('dispersion', 0.073), ('eras', 0.073), ('maritime', 0.073), ('adolescents', 0.073), ('teens', 0.073), ('vacant', 0.073), ('fifties', 0.073), ('relearning', 0.073), ('spans', 0.073), ('choropleth', 0.073), ('faint', 0.073), ('cartographers', 0.073), ('zoning', 0.073), ('thinning', 0.073), ('homogenous', 0.073), ('teen', 0.073), ('tracts', 0.073), ('recognizable', 0.073), ('continues', 0.071), ('already', 0.07), ('pockets', 0.069), ('morningside', 0.069), ('outnumber', 0.069), ('infants', 0.069), ('genders', 0.069), ('graphics', 0.067), ('evenly', 0.066), ('forties', 0.066), ('fade', 0.066), ('twenties', 0.066), ('affiliation', 0.066), ('midtown', 0.066)]
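For readers unfamiliar with how a word list like this is produced, here is a minimal sketch of one common tf-idf variant: raw term count times log(N / document frequency), followed by L2 normalization. The exact weighting and tokenization used by the tool are not stated on this page, so the formula, the toy corpus, and the function names `tfidf_vectors` and `top_words` are all assumptions for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {word: weight} dict per document, L2-normalized."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word occurs.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        raw = {w: c * math.log(n_docs / doc_freq[w]) for w, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in raw.values()))
        # Guard against an all-zero vector (every word occurs in every doc).
        vectors.append({w: v / norm for w, v in raw.items()} if norm else raw)
    return vectors

def top_words(vector, n):
    """The n highest-weighted (word, weight) pairs, like the list above."""
    return sorted(vector.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Made-up three-document corpus.
docs = [
    "census maps of census tracts",
    "maps of election results",
    "graphs of survey results",
]
vectors = tfidf_vectors(docs)
print(top_words(vectors[0], 2))
```

In the toy corpus, "of" occurs in every document and so gets weight 0, while "census" ranks first for the first document because it is both frequent there and absent elsewhere. The same logic would explain why proper names like "nelson" and "rock" lead the list for this post.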

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999976 2065 andrew gelman stats-2013-10-17-Cool dynamic demographic maps provide beautiful illustration of Chris Rock effect


2 0.19028433 1495 andrew gelman stats-2012-09-13-Win $5000 in the Economist’s data visualization competition

Introduction: Michael Nelson points me to this. OK, $5,000 isn’t a lot of money (I’m not expecting Niall Ferguson in the competition), but I’m still glad to see this, given that the Economist is known for its excellent graphics.

3 0.16833539 787 andrew gelman stats-2011-07-05-Different goals, different looks: Infovis and the Chris Rock effect

Introduction: Seth writes: Here’s my candidate for bad graphic of the year: I [Seth] studied it and learned nothing. I have no idea how they assigned colors to locations. I already knew that there were more within-city calls than calls to individual distant locations — for example that there are more SF-SF calls than SF-LA calls. The researchers took a huge rich database and boiled it down to nothing (in terms of information value) — and I have a funny feeling they don’t realize how awful this is and what a waste. I send it to you because it isn’t obvious how to do better — at least not obvious to them. My reply: My first reaction is to agree–I don’t get anything out of this graph either! But let me step back. I think it’s best to understand this using the framework of my paper with Antony Unwin, by thinking of the goals that are satisfied by different sorts of graphs. What does this graph convey? It doesn’t tell us much about phone calls, but it does tell us that some peop

4 0.14731064 200 andrew gelman stats-2010-08-11-Separating national and state swings in voting and public opinion, or, How I avoided blogorific embarrassment: An agony in four acts

Introduction: I dodged a bullet the other day, blogorifically speaking. This is a (moderately) long story but there’s a payoff at the end for those of you who are interested in forecasting or understanding voting and public opinion at the state level. Act 1 It started when Jeff Lax made this comment on his recent blog entry: Nebraska Is All That Counts for a Party-Bucking Nelson Dem Senator On Blowback From His Opposition To Kagan: ‘Are They From Nebraska? Then I Don’t Care’ Fine, but 62% of Nebraskans with an opinion favor confirmation… 91% of Democrats, 39% of Republicans, and 61% of Independents. So I guess he only cares about Republican Nebraskans… I conferred with Jeff and then wrote the following entry for fivethirtyeight.com. There was a backlog of posts at 538 at the time, so I set it on delay to appear the following morning. Here’s my post (which I ended up deleting before it ever appeared): Party-Bucking Nelson May Be Nebraska-Bucking as Well Under the head

5 0.12309863 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update

Introduction: To continue our discussion from last week, consider three positions regarding the display of information: (a) The traditional tabular approach. This is how most statisticians, econometricians, political scientists, sociologists, etc., seem to operate. They understand the appeal of a pretty graph, and they’re willing to plot some data as part of an exploratory data analysis, but they see their serious research as leading to numerical estimates, p-values, tables of numbers. These people might use a graph to illustrate their points but they don’t see them as necessary in their research. (b) Statistical graphics as performed by Howard Wainer, Bill Cleveland, Dianne Cook, etc. They–we–see graphics as central to the process of statistical modeling and data analysis and are interested in graphs (static and dynamic) that display every data point as transparently as possible. (c) Information visualization or infographics, as performed by graphics designers and statisticians who are

6 0.12211645 492 andrew gelman stats-2010-12-30-That puzzle-solving feeling

7 0.11968613 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies

8 0.11563788 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

9 0.10654234 847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics

10 0.10334952 2288 andrew gelman stats-2014-04-10-Small multiples of lineplots > maps (ok, not always, but yes in this case)

11 0.10110289 730 andrew gelman stats-2011-05-25-Rechecking the census

12 0.10051773 1171 andrew gelman stats-2012-02-16-“False-positive psychology”

13 0.098506428 1810 andrew gelman stats-2013-04-17-Subway series

14 0.097190663 2091 andrew gelman stats-2013-11-06-“Marginally significant”

15 0.093873367 766 andrew gelman stats-2011-06-14-Last Wegman post (for now)

16 0.093329042 469 andrew gelman stats-2010-12-16-2500 people living in a park in Chicago?

17 0.09161932 1114 andrew gelman stats-2012-01-12-Controversy about average personality differences between men and women

18 0.091104724 2008 andrew gelman stats-2013-09-04-Does it matter that a sample is unrepresentative? It depends on the size of the treatment interactions

19 0.086917087 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics

20 0.083670452 1534 andrew gelman stats-2012-10-15-The strange reappearance of Matthew Klam
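A ranked list like the one above is typically produced by comparing the query post's vector with every other post's vector and sorting by the result. The sketch below assumes cosine similarity, which would fit the same-blog value of about 1 in the first row, although the page does not name the measure. Post ids 9001 and 9002 and their weights are made up, and `cosine` and `rank_similar` are hypothetical names.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse {word: weight} vectors."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_similar(query_id, vectors):
    """Return (simValue, blogId) pairs sorted from most to least similar."""
    query = vectors[query_id]
    scored = [(cosine(query, vec), blog_id) for blog_id, vec in vectors.items()]
    return sorted(scored, reverse=True)

# Toy vectors: post 2065 uses a few weights from this page; the other two
# posts and their weights are made up for illustration.
vectors = {
    2065: {"nelson": 0.431, "rock": 0.3, "census": 0.134, "maps": 0.085},
    9001: {"census": 0.5, "maps": 0.4, "dots": 0.3},
    9002: {"email": 0.9, "policy": 0.4},
}
for sim, blog_id in rank_similar(2065, vectors):
    print(f"{sim:.4f} {blog_id}")
```

The query post always ranks first against itself, posts sharing weighted words come next, and posts with no vocabulary in common score 0.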


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.141), (1, -0.047), (2, 0.011), (3, 0.023), (4, 0.067), (5, -0.064), (6, -0.046), (7, 0.042), (8, -0.018), (9, 0.011), (10, -0.034), (11, -0.011), (12, 0.003), (13, 0.016), (14, 0.004), (15, 0.009), (16, 0.033), (17, -0.011), (18, 0.015), (19, -0.012), (20, -0.034), (21, -0.034), (22, -0.031), (23, -0.003), (24, 0.011), (25, -0.02), (26, -0.031), (27, 0.026), (28, 0.021), (29, 0.007), (30, 0.009), (31, 0.023), (32, 0.019), (33, 0.005), (34, 0.023), (35, 0.068), (36, 0.009), (37, 0.004), (38, -0.014), (39, 0.013), (40, -0.065), (41, -0.018), (42, 0.021), (43, -0.046), (44, 0.034), (45, 0.038), (46, 0.007), (47, 0.048), (48, 0.023), (49, 0.022)]
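The list above gives the post's coordinates on 50 latent dimensions (ids 0 to 49), some of them negative. LSI obtains such coordinates from a truncated singular value decomposition of the document-by-term matrix. As a rough illustration only, the sketch below finds just the first latent direction by power iteration on a toy count matrix and projects each document onto it. The matrix, the function names, and the choice of power iteration are assumptions made to keep the example dependency-free; nothing here is taken from the tool.

```python
import math

def top_latent_direction(rows, iterations=100):
    """Approximate the leading right singular vector of a doc-by-term matrix."""
    n_terms = len(rows[0])
    v = [1.0 / math.sqrt(n_terms)] * n_terms
    for _ in range(iterations):
        # One power-iteration step on A^T A: v <- A^T (A v), then renormalize.
        doc_proj = [sum(a * b for a, b in zip(row, v)) for row in rows]
        w = [sum(p * row[j] for p, row in zip(doc_proj, rows))
             for j in range(n_terms)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(rows, direction):
    """Coordinate of each document along one latent direction."""
    return [sum(a * b for a, b in zip(row, direction)) for row in rows]

# Toy doc-by-term counts; columns: census, maps, email, policy.
rows = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
direction = top_latent_direction(rows)
print([round(c, 3) for c in project(rows, direction)])  # [1.414, 1.414, 0.0]
```

The two documents about census maps land at the same coordinate and the unrelated one at zero, which is the sense in which nearby LSI vectors indicate similar posts. A full LSI model would keep many such directions, and this sketch does not handle an all-zero matrix.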

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.94790614 2065 andrew gelman stats-2013-10-17-Cool dynamic demographic maps provide beautiful illustration of Chris Rock effect


2 0.76389456 1125 andrew gelman stats-2012-01-18-Beautiful Line Charts

Introduction: I stumbled across a chart that’s in my opinion the best way to express a comparison of quantities through time: It compares the new PC companies, such as Apple, to traditional PC companies like IBM and Compaq, but on the same scale. If you’d like to see how iPads and other novelties compare, see here. I’ve tried to use the same type of visualization in my old work on legal data visualization. It comes from a new market research firm Asymco that also produced a very clean income vs expenses visualization (click to enlarge): While the first figure is pure perfection, Tufte purists might find the second one too colorful. But to a busy person, color helps tell things apart: when I know that pink means interest, it takes a fraction of the second to assess the situation. We live in 2012, not in 1712 to have to think black and white. Finally, they have a few other interesting uses of interactive visualization, such as cellular-broadband infrastructure around

3 0.72929162 1653 andrew gelman stats-2013-01-04-Census dotmap

Introduction: Andrew Vande Moere points to this impressive interactive map from Brandon Martin-Anderson showing the locations of all the residents of the United States and Canada. It says, “The map has 341,817,095 dots – one for each person.” Not quite . . . I was hoping to zoom into my building (approximately 10 people live on our floor, I say approximately because two of the apartments are split between two floors and I’m not sure how they would assign the residents), but unfortunately our entire block is just a solid mass of black. Also, they put a few dots in the park and in the river by accident (presumably because the borders of the census blocks were specified only approximately). But, hey, no algorithm is perfect. It’s hard to know what to do about this. The idea of mapping every person is cool, but you’ll always run into trouble displaying densely populated areas. Smaller dots might work, but then that might depend on the screen being used for display.

4 0.72218037 1811 andrew gelman stats-2013-04-18-Psychology experiments to understand what’s going on with data graphics?

Introduction: Ricardo Pietrobon writes, regarding my post from last year on attitudes toward data graphics, Wouldn’t it be the case to start formally studying the usability of graphics from a cognitive perspective? with platforms such as the mechanical turk it should be fairly straightforward to test alternative methods and come to some conclusions about what might be more informative and what might better assist in supporting decisions. btw, my guess is that these two constructs might not necessarily agree with each other. And Jessica Hullman provides some background: Measuring success for the different goals that you hint at in your article is indeed challenging, and I don’t think that most visualization researchers would claim to have met this challenge (myself included). Visualization researchers may know the user psychology well when it comes to certain dimensions of a graph’s effectiveness (such as quick and accurate responses), but I wouldn’t agree with this statement as a gene

5 0.71824896 794 andrew gelman stats-2011-07-09-The quest for the holy graph

Introduction: Eytan Adar writes: I was just going through the latest draft of your paper with Anthony Unwin. I heard part of it at the talk you gave (remotely) here at UMich. I’m curious about your discussion of the Baby Name Voyager. The tool in itself is simple, attractive, and useful. No argument from me there. It’s an awesome demonstration of how subtle interactions can be very helpful (click and it zooms, type and it filters… falls perfectly into the Shneiderman visualization mantra). It satisfies a very common use case: finding appropriate names for children. That said, I can’t help but feeling that what you are really excited about is the very static analysis on last letters (you spend most of your time on this). This analysis, incidentally, is not possible to infer from the interactive application (which doesn’t support this type of filtering and pivoting). In a sense, the two visualizations don’t have anything to do with each other (other than a shared context/dataset).

6 0.70286679 492 andrew gelman stats-2010-12-30-That puzzle-solving feeling

7 0.70014697 289 andrew gelman stats-2010-09-21-“How segregated is your city?”: A story of why every graph, no matter how clear it seems to be, needs a caption to anchor the reader in some numbers

8 0.69703996 1669 andrew gelman stats-2013-01-12-The power of the puzzlegraph

9 0.6962738 1604 andrew gelman stats-2012-12-04-An epithet I can live with

10 0.69248551 182 andrew gelman stats-2010-08-03-Nebraska never looked so appealing: anatomy of a zombie attack. Oops, I mean a recession.

11 0.69135916 787 andrew gelman stats-2011-07-05-Different goals, different looks: Infovis and the Chris Rock effect

12 0.68930125 2038 andrew gelman stats-2013-09-25-Great graphs of names

13 0.68762493 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

14 0.67237878 847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics

15 0.67215091 1896 andrew gelman stats-2013-06-13-Against the myth of the heroic visualization

16 0.67209095 1124 andrew gelman stats-2012-01-17-How to map geographically-detailed survey responses?

17 0.66678941 624 andrew gelman stats-2011-03-22-A question about the economic benefits of universities

18 0.66626143 396 andrew gelman stats-2010-11-05-Journalism in the age of data

19 0.66621876 685 andrew gelman stats-2011-04-29-Data mining and allergies

20 0.66450649 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(1, 0.01), (2, 0.048), (4, 0.024), (9, 0.022), (10, 0.028), (13, 0.026), (16, 0.109), (21, 0.018), (24, 0.135), (27, 0.011), (32, 0.011), (41, 0.012), (48, 0.012), (52, 0.019), (53, 0.013), (61, 0.022), (64, 0.01), (67, 0.011), (72, 0.01), (79, 0.028), (80, 0.015), (85, 0.014), (96, 0.057), (97, 0.015), (99, 0.199)]
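The lda list above is sparse: it names only 25 topics, and their weights add up to about 0.88, not 1. That would be consistent with topics below some small cutoff being omitted, though the page does not say so. The sketch below expands the list into a dense vector and summarizes it. The topic count of 100 is an assumption based only on the largest id shown (99), and `to_dense` is a hypothetical helper.

```python
# Topic weights copied from the lda list on this page.
LDA_TOPICS = [
    (1, 0.01), (2, 0.048), (4, 0.024), (9, 0.022), (10, 0.028), (13, 0.026),
    (16, 0.109), (21, 0.018), (24, 0.135), (27, 0.011), (32, 0.011),
    (41, 0.012), (48, 0.012), (52, 0.019), (53, 0.013), (61, 0.022),
    (64, 0.01), (67, 0.011), (72, 0.01), (79, 0.028), (80, 0.015),
    (85, 0.014), (96, 0.057), (97, 0.015), (99, 0.199),
]

def to_dense(sparse_pairs, num_topics):
    """Expand (topicId, weight) pairs into a list with zeros elsewhere."""
    dense = [0.0] * num_topics
    for topic_id, weight in sparse_pairs:
        dense[topic_id] = weight
    return dense

NUM_TOPICS = 100  # assumption: the largest topic id shown is 99
dense = to_dense(LDA_TOPICS, NUM_TOPICS)
listed_mass = sum(weight for _, weight in LDA_TOPICS)
top3 = sorted(LDA_TOPICS, key=lambda pair: pair[1], reverse=True)[:3]

print(round(listed_mass, 3))  # 0.879
print(top3)                   # [(99, 0.199), (24, 0.135), (16, 0.109)]
```

Three topics (99, 24, and 16) carry roughly half of the listed weight, so comparisons with other posts are likely to be driven mostly by those.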

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.94786739 2065 andrew gelman stats-2013-10-17-Cool dynamic demographic maps provide beautiful illustration of Chris Rock effect


2 0.92108434 503 andrew gelman stats-2011-01-04-Clarity on my email policy

Introduction: I never read email before 4. That doesn’t mean I never send email before 4.

3 0.9175005 586 andrew gelman stats-2011-02-23-A statistical version of Arrow’s paradox

Introduction: Unfortunately, when we deal with scientists, statisticians are often put in a setting reminiscent of Arrow’s paradox, where we are asked to provide estimates that are informative and unbiased and confidence statements that are correct conditional on the data and also on the underlying true parameter. [It's not generally possible for an estimate to do all these things at the same time -- ed.] Larry Wasserman feels that scientists are truly frequentist, and Don Rubin has told me how he feels that scientists interpret all statistical estimates Bayesianly. I have no doubt that both Larry and Don are correct. Voters want lower taxes and more services, and scientists want both Bayesian and frequency coverage; as the saying goes, everybody wants to go to heaven but nobody wants to die.

4 0.9171651 1171 andrew gelman stats-2012-02-16-“False-positive psychology”

Introduction: Everybody’s talkin bout this paper by Joseph Simmons, Leif Nelson and Uri Simonsohn, who write: Despite empirical psychologists’ nominal endorsement of a low rate of false-positive findings (≤ .05), flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not. We [Simmons, Nelson, and Simonsohn] present computer simulations and a pair of actual experiments that demonstrate how unacceptably easy it is to accumulate (and report) statistically significant evidence for a false hypothesis. Second, we suggest a simple, low-cost, and straightforwardly effective disclosure-based solution to this problem. The solution involves six concrete requirements for authors and four guidelines for reviewers, all of which impose a minimal burden on the publication process. Whatever you think about these recommend

5 0.91680896 799 andrew gelman stats-2011-07-13-Hypothesis testing with multiple imputations

Introduction: Vincent Yip writes: I have read your paper [with Kobi Abayomi and Marc Levy] regarding multiple imputation application. In order to diagnostic my imputed data, I used Kolmogorov-Smirnov (K-S) tests to compare the distribution differences between the imputed and observed values of a single attribute as mentioned in your paper. My question is: For example I have this attribute X with the following data: (NA = missing) Original dataset: 1, NA, 3, 4, 1, 5, NA Imputed dataset: 1, 2 , 3, 4, 1, 5, 6 a) in order to run the KS test, will I treat the observed data as 1, 3, 4,1, 5? b) and for the observed data, will I treat 1, 2 , 3, 4, 1, 5, 6 as the imputed dataset for the K-S test? or just 2 ,6? c) if I used m=5, I will have 5 set of imputed data sets. How would I apply K-S test to 5 of them and compare to the single observed distribution? Do I combine the 5 imputed data set into one by averaging each imputed values so I get one single imputed data and compare with the ob

6 0.91599458 681 andrew gelman stats-2011-04-26-Worst statistical graphic I have seen this year

7 0.91585225 411 andrew gelman stats-2010-11-13-Ethical concerns in medical trials

8 0.91557646 807 andrew gelman stats-2011-07-17-Macro causality

9 0.91095459 1871 andrew gelman stats-2013-05-27-Annals of spam

10 0.90961218 488 andrew gelman stats-2010-12-27-Graph of the year

11 0.90823066 898 andrew gelman stats-2011-09-10-Fourteen magic words: an update

12 0.90815723 1016 andrew gelman stats-2011-11-17-I got 99 comparisons but multiplicity ain’t one

13 0.9059028 1019 andrew gelman stats-2011-11-19-Validation of Software for Bayesian Models Using Posterior Quantiles

14 0.9050622 2288 andrew gelman stats-2014-04-10-Small multiples of lineplots > maps (ok, not always, but yes in this case)

15 0.90467423 399 andrew gelman stats-2010-11-07-Challenges of experimental design; also another rant on the practice of mentioning the publication of an article but not naming its author

16 0.90422094 447 andrew gelman stats-2010-12-03-Reinventing the wheel, only more so.

17 0.90368497 2179 andrew gelman stats-2014-01-20-The AAA Tranche of Subprime Science

18 0.90118837 2121 andrew gelman stats-2013-12-02-Should personal genetic testing be regulated? Battle of the blogroll

19 0.90107906 2296 andrew gelman stats-2014-04-19-Index or indicator variables

20 0.9009825 639 andrew gelman stats-2011-03-31-Bayes: radical, liberal, or conservative?