andrew_gelman_stats andrew_gelman_stats-2012 andrew_gelman_stats-2012-1366 knowledge-graph by maker-knowledge-mining

1366 andrew gelman stats-2012-06-05-How do segregation measures change when you change the level of aggregation?


meta info for this blog

Source: html

Introduction: In a discussion of workplace segregation, Philip Cohen posts some graphs that led me to a statistical question. I’ll pose my question below, but first the graphs: In a world of zero segregation of jobs by sex, the top graph above would have a spike at 50% (or, whatever the actual percentage is of women in the labor force) and, in the bottom graph, the pink and blue lines would be in the same place and would look like very steep S curves. The difference between the pink and blue lines represents segregation by job. One thing I wonder is how these graphs would change if we redefine occupation. (For example, is my occupation “mathematical scientist,” “statistician,” “teacher,” “university professor,” “statistics professor,” or “tenured statistics professor”?) Finer or coarser classification would give different results, and I wonder how this would work. This is not at all meant as a criticism of Cohen’s claims, it’s just a statistical question. I’m guessing that
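The post asks how finer or coarser occupation codes would change the picture. The following is a minimal sketch of one part of the answer. It assumes the Duncan index of dissimilarity D as the segregation measure (the post does not name one), and all occupation counts are made up. D is half the sum, over occupations, of the absolute difference between each occupation's share of all women and its share of all men. Because |(a + b) - (c + d)| ≤ |a - c| + |b - d|, merging occupations can only lower D or leave it unchanged. If the bottom graph plots the cumulative distributions of women and men by the percent female of their occupation, D is also the largest vertical gap between the pink and blue lines.

```python
"""Sketch: how the index of dissimilarity responds to coarser occupation codes.

Assumptions (not from the post): the segregation measure is the Duncan
index of dissimilarity, and every count below is hypothetical.
"""


def dissimilarity(counts):
    """Duncan index D = 1/2 * sum_i |w_i/W - m_i/M| over occupations i.

    `counts` maps occupation -> (women, men).
    """
    total_w = sum(w for w, _ in counts.values())
    total_m = sum(m for _, m in counts.values())
    return 0.5 * sum(abs(w / total_w - m / total_m) for w, m in counts.values())


def aggregate(counts, mapping):
    """Merge fine occupations into coarse groups using a fine -> coarse mapping."""
    merged = {}
    for occ, (w, m) in counts.items():
        group = mapping[occ]
        cw, cm = merged.get(group, (0, 0))
        merged[group] = (cw + w, cm + m)
    return merged


# Hypothetical (women, men) counts in four fine-grained occupations.
fine = {
    "statistician": (40, 60),
    "mathematician": (20, 80),
    "elementary teacher": (90, 10),
    "university professor": (45, 55),
}
# Hypothetical coarser scheme that lumps those occupations into two groups.
coarse_map = {
    "statistician": "mathematical scientist",
    "mathematician": "mathematical scientist",
    "elementary teacher": "teacher",
    "university professor": "teacher",
}

coarse = aggregate(fine, coarse_map)
print(f"fine D   = {dissimilarity(fine):.3f}")
print(f"coarse D = {dissimilarity(coarse):.3f}")
```

With these invented counts, D drops from about 0.413 to about 0.375 once statisticians are lumped with mathematicians and professors with elementary teachers. The segregation within each merged group disappears from the measure.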


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 In a discussion of workplace segregation, Philip Cohen posts some graphs that led me to a statistical question. [sent-1, score-0.574]

2 The difference between the pink and blue lines represents segregation by job. [sent-3, score-1.243]

3 One thing I wonder is how these graphs would change if we redefine occupation. [sent-4, score-0.646]

4 (For example, is my occupation “mathematical scientist,” “statistician,” “teacher,” “university professor,” “statistics professor,” or “tenured statistics professor”?) [sent-5, score-0.224]

5 Finer or coarser classification would give different results, and I wonder how this would work. [sent-6, score-0.656]

6 This is not at all meant as a criticism of Cohen’s claims, it’s just a statistical question. [sent-7, score-0.234]

7 I’m guessing that someone’s looked into this already and that there’s some research literature on the topic. [sent-8, score-0.228]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('segregation', 0.52), ('pink', 0.245), ('cohen', 0.239), ('professor', 0.222), ('graphs', 0.191), ('coarser', 0.173), ('blue', 0.169), ('steep', 0.163), ('lines', 0.158), ('finer', 0.151), ('redefine', 0.151), ('occupation', 0.151), ('workplace', 0.146), ('spike', 0.137), ('wonder', 0.129), ('tenured', 0.127), ('classification', 0.124), ('graph', 0.12), ('philip', 0.117), ('would', 0.115), ('force', 0.108), ('labor', 0.105), ('teacher', 0.1), ('bottom', 0.095), ('guessing', 0.093), ('jobs', 0.092), ('represents', 0.091), ('meant', 0.09), ('sex', 0.088), ('led', 0.086), ('women', 0.085), ('posts', 0.084), ('percentage', 0.081), ('mathematical', 0.079), ('criticism', 0.077), ('scientist', 0.075), ('statistics', 0.073), ('looked', 0.071), ('actual', 0.069), ('zero', 0.069), ('statistician', 0.068), ('statistical', 0.067), ('top', 0.065), ('literature', 0.064), ('claims', 0.064), ('place', 0.063), ('whatever', 0.062), ('university', 0.062), ('change', 0.06), ('difference', 0.06)]
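The report does not say how simValue is computed. The following is a minimal sketch that assumes simValue is the cosine similarity between two posts' tfidf vectors. This is consistent with the same-blog entry scoring exactly 1.0, but it is not confirmed. The weights below are the first seven entries of this post's list. The second post's words and weights are invented, so the output will not reproduce any simValue in the list.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {word: weight} dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# First seven (word, weight) pairs from this post's tfidf list.
this_post = dict([('segregation', 0.52), ('pink', 0.245), ('cohen', 0.239),
                  ('professor', 0.222), ('graphs', 0.191), ('coarser', 0.173),
                  ('blue', 0.169)])
# Hypothetical second post: words and weights invented for illustration.
other_post = {'pink': 0.6, 'blue': 0.3, 'dress': 0.4}

print(round(cosine(this_post, this_post), 6))  # identical vectors give 1.0
print(round(cosine(this_post, other_post), 3))
```

Cosine similarity divides by both vector lengths, so a long post and a short post that use the same words in the same proportions can still score close to 1.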

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 1366 andrew gelman stats-2012-06-05-How do segregation measures change when you change the level of aggregation?

2 0.16593796 739 andrew gelman stats-2011-05-31-When Did Girls Start Wearing Pink?

Introduction: That cute picture is of toddler FDR in a dress, from 1884. Jeanne Maglaty writes: A Ladies’ Home Journal article [or maybe from a different source, according to a commenter] in June 1918 said, “The generally accepted rule is pink for the boys, and blue for the girls. The reason is that pink, being a more decided and stronger color, is more suitable for the boy, while blue, which is more delicate and dainty, is prettier for the girl.” Other sources said blue was flattering for blonds, pink for brunettes; or blue was for blue-eyed babies, pink for brown-eyed babies, according to Paoletti. In 1927, Time magazine printed a chart showing sex-appropriate colors for girls and boys according to leading U.S. stores. In Boston, Filene’s told parents to dress boys in pink. So did Best & Co. in New York City, Halle’s in Cleveland and Marshall Field in Chicago. Today’s color dictate wasn’t established until the 1940s . . . When the women’s liberation movement arrived in the mid-1960s, w

3 0.12200461 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other

4 0.10319942 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice

Introduction: Dean Eckles writes: Some of my coworkers at Facebook and I have worked with Udacity to create an online course on exploratory data analysis, including using data visualizations in R as part of EDA. The course has now launched at https://www.udacity.com/course/ud651 so anyone can take it for free. And Kaiser Fung has reviewed it. So definitely feel free to promote it! Criticism is also welcome (we are still fine-tuning things and adding more notes throughout). I wrote some more comments about the course here, including highlighting the interviews with my great coworkers. I didn’t have a chance to look at the course so instead I responded with some generic comments about EDA and visualization (in no particular order): - Think of a graph as a comparison. All graphs are comparisons (indeed, all statistical analyses are comparisons). If you already have the graph in mind, think of what comparisons it’s enabling. Or if you haven’t settled on the graph yet, think of what

5 0.093634695 2187 andrew gelman stats-2014-01-26-Twitter sucks, and people are gullible as f…

Introduction: Hey, and I did it in less than 140 characters! The above was my response to this item which David Hogg forwarded to me. The next thing you know, people are going to claim that women are three times as likely to wear red pink when . . . Naaah, forget about it, that would never happen. Hmmm, I think the above is not so savvy of me, to just go around insulting a whole bunch of people. So let me just say that becoming numerate is not as easy as it might seem. All of us can be gullible in areas outside of our expertise. Indeed, I’ve fallen for the occasional April Fool’s gag myself. And, maybe it’s not really right for me to say that “Twitter sucks.” Sure, the downside of Twitter is that people can just pass along a silly joke, not realizing it’s a joke at all. But the upside is, I hope, that once people have committed themselves and then realize they were mistaken, they’ll think harder the next time they see something like that. I hope the same thing goes with the “women

6 0.093608417 1894 andrew gelman stats-2013-06-12-How to best graph the Beveridge curve, relating the vacancy rate in jobs to the unemployment rate?

7 0.093380153 2174 andrew gelman stats-2014-01-17-How to think about the statistical evidence when the statistical evidence can’t be conclusive?

8 0.091644824 61 andrew gelman stats-2010-05-31-A data visualization manifesto

9 0.091643497 1963 andrew gelman stats-2013-07-31-Response by Jessica Tracy and Alec Beall to my critique of the methods in their paper, “Women Are More Likely to Wear Red or Pink at Peak Fertility”

10 0.091369286 1489 andrew gelman stats-2012-09-09-Commercial Bayesian inference software is popping up all over

11 0.088744283 319 andrew gelman stats-2010-10-04-“Who owns Congress”

12 0.086592853 1059 andrew gelman stats-2011-12-14-Looking at many comparisons may increase the risk of finding something statistically significant by epidemiologists, a population with relatively low multilevel modeling consumption

13 0.082831509 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update

14 0.082255468 536 andrew gelman stats-2011-01-24-Trends in partisanship by state

15 0.077379569 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back

16 0.077033766 262 andrew gelman stats-2010-09-08-Here’s how rumors get started: Lineplots, dotplots, and nonfunctional modernist architecture

17 0.074139111 1581 andrew gelman stats-2012-11-17-Horrible but harmless?

18 0.072827093 361 andrew gelman stats-2010-10-21-Tenure-track statistics job at Teachers College, here at Columbia!

19 0.072795562 2008 andrew gelman stats-2013-09-04-Does it matter that a sample is unrepresentative? It depends on the size of the treatment interactions

20 0.07144285 2288 andrew gelman stats-2014-04-10-Small multiples of lineplots > maps (ok, not always, but yes in this case)


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.125), (1, -0.056), (2, -0.019), (3, 0.017), (4, 0.061), (5, -0.077), (6, -0.071), (7, 0.061), (8, -0.055), (9, 0.012), (10, 0.001), (11, -0.001), (12, -0.001), (13, 0.02), (14, 0.024), (15, -0.001), (16, -0.011), (17, 0.016), (18, 0.002), (19, 0.003), (20, 0.018), (21, 0.029), (22, -0.016), (23, -0.028), (24, 0.035), (25, -0.006), (26, -0.039), (27, 0.028), (28, -0.018), (29, -0.013), (30, 0.031), (31, 0.011), (32, -0.068), (33, -0.004), (34, -0.03), (35, -0.003), (36, 0.008), (37, 0.004), (38, -0.057), (39, -0.013), (40, 0.013), (41, 0.005), (42, 0.024), (43, 0.03), (44, -0.028), (45, 0.0), (46, -0.017), (47, 0.005), (48, 0.022), (49, 0.019)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95831865 1366 andrew gelman stats-2012-06-05-How do segregation measures change when you change the level of aggregation?

2 0.78605509 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year

Introduction: Under the subject line “Blog bait!”, Brendan Nyhan points me to this post at the Washington Post blog: For 2013, we asked some of the year’s most interesting, important and influential thinkers to name their favorite graph of the year — and why they chose it. Here’s Bill Gates’s. Infographic by Thomas Porostocky for WIRED. “I love this graph because it shows that while the number of people dying from communicable diseases is still far too high, those numbers continue to come down. . . .” As Brendan is aware, this is not my favorite sort of graph, it’s a bit of a puzzle to read and figure out where all the pieces fit in, also weird stuff going on like 3-D effects and the big space taken up by those yellow and green borders, as well as tricky things like understanding what some of those little blocks are, and perhaps the biggest question, what is the definition of an “untimely death.” But, as often is the case, the defects of the graph from a statistical perspective can

3 0.76653022 37 andrew gelman stats-2010-05-17-Is chartjunk really “more useful” than plain graphs? I don’t think so.

Introduction: Helen DeWitt links to this blog that reports on a study by Scott Bateman, Carl Gutwin, David McDine, Regan Mandryk, Aaron Genest, and Christopher Brooks that claims the following: Guidelines for designing information charts often state that the presentation should reduce ‘chart junk’–visual embellishments that are not essential to understanding the data. . . . we conducted an experiment that compared embellished charts with plain ones, and measured both interpretation accuracy and long-term recall. We found that people’s accuracy in describing the embellished charts was no worse than for plain charts, and that their recall after a two-to-three-week gap was significantly better. As the above-linked blogger puts it, “chartjunk is more useful than plain graphs. . . . Tufte is not going to like this.” I can’t speak for Ed Tufte, but I’m not gonna take this claim about chartjunk lying down. I have two points to make which I hope can stop the above-linked study from being sla

4 0.7644105 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back

Introduction: Wayne Folta writes: In keeping with your interest in graphs, this might interest or inspire you, if you haven’t seen it already, which features 20 scientific graphs that Wired likes, ranging from drawn illustrations to trajectory plots. My reaction: I looked at the first 10. I liked 1, 3, and 5, I didn’t like 2, 7, 8, 9, and 10. I have neutral feelings about 4 and 6. I won’t explain all these feelings, but, just for example, from my perspective, image 9 fails as a statistical graphic (although it might be fine as an infovis) by trying to cram too much into a single image. I don’t think it works to have all the colors on the single wheels; instead I’d prefer some sort of grid of images. Also, I don’t see the point of the circular display. That makes no sense at all; it’s a misleading feature. That said, the graphs I dislike can still be fine for their purpose. A graph in a journal such as Science or Nature is meant to grab the eye of a busy reader (or to go viral on

5 0.76005936 488 andrew gelman stats-2010-12-27-Graph of the year

Introduction: From blogger Matthew Yglesias: There are lots of great graphs all over the web (see, for example, here and here for some snappy pictures of unemployment trends from blogger “Geoff”). There’s nothing special about Yglesias’s graph. In fact, the reason I’m singling it out as “graph of the year” is because it’s not special. It’s a display of three numbers, with no subtlety or artistry in its presentation. True, it has some good features: - Clear title - Clearly labeled axes - Vertical axis goes to zero - The cities are in a sensible order (not, for example, alphabetical) - The graph is readable; none of that 3-D “data visualization” crap that looks cool but distances the reader from the numbers being displayed. What’s impressive about the above graph, what makes it a landmark to me, is that it was made at all. As noted in the text immediately below the image, it’s a display of exactly three numbers which can with little effort be completely presented and e

6 0.7556172 1609 andrew gelman stats-2012-12-06-Stephen Kosslyn’s principles of graphics and one more: There’s no need to cram everything into a single plot

7 0.75474083 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly

8 0.74264419 672 andrew gelman stats-2011-04-20-The R code for those time-use graphs

9 0.74106097 915 andrew gelman stats-2011-09-17-(Worst) graph of the year

10 0.73992699 1011 andrew gelman stats-2011-11-15-World record running times vs. distance

11 0.73425603 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

12 0.73315591 502 andrew gelman stats-2011-01-04-Cash in, cash out graph

13 0.73174232 61 andrew gelman stats-2010-05-31-A data visualization manifesto

14 0.73113132 829 andrew gelman stats-2011-07-29-Infovis vs. statgraphics: A clear example of their different goals

15 0.7285642 1894 andrew gelman stats-2013-06-12-How to best graph the Beveridge curve, relating the vacancy rate in jobs to the unemployment rate?

16 0.72584444 1439 andrew gelman stats-2012-08-01-A book with a bunch of simple graphs

17 0.72411948 2246 andrew gelman stats-2014-03-13-An Economist’s Guide to Visualizing Data

18 0.71971542 671 andrew gelman stats-2011-04-20-One more time-use graph

19 0.71757931 2038 andrew gelman stats-2013-09-25-Great graphs of names

20 0.71201968 262 andrew gelman stats-2010-09-08-Here’s how rumors get started: Lineplots, dotplots, and nonfunctional modernist architecture


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(16, 0.605), (24, 0.101), (47, 0.019), (99, 0.148)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.98756331 572 andrew gelman stats-2011-02-14-Desecration of valuable real estate

Introduction: Malecki asks: Is this the worst infographic ever to appear in NYT? USA Today is not something to aspire to. To connect to some of our recent themes, I agree this is a pretty horrible data display. But it’s not bad as a series of images. Considering the competition to be a cartoon or series of photos, these images aren’t so bad. One issue, I think, is that designers get credit for creativity and originality (unusual color combinations! Histogram bars shaped like mosques!), which is often the opposite of what we want in a clear graph. It’s Martin Amis vs. George Orwell all over again.

2 0.98193711 1026 andrew gelman stats-2011-11-25-Bayes wikipedia update

Introduction: I checked and somebody went in and screwed up my fixes to the wikipedia page on Bayesian inference. I give up.

3 0.98023134 398 andrew gelman stats-2010-11-06-Quote of the day

Introduction: “A statistical model is usually taken to be summarized by a likelihood, or a likelihood and a prior distribution, but we go an extra step by noting that the parameters of a model are typically batched, and we take this batching as an essential part of the model.”

4 0.96888494 1014 andrew gelman stats-2011-11-16-Visualizations of NYPD stop-and-frisk data

Introduction: Cathy O’Neil organized this visualization project with NYPD stop-and-frisk data. It’s part of the Data Without Borders project. Unfortunately, because of legal restrictions I couldn’t send them the data Jeff, Alex, and I used in our project several years ago.

5 0.95850837 1745 andrew gelman stats-2013-03-02-Classification error

Introduction: 15-2040 != 19-3010 (and, for that matter, 25-1022 != 25-1063).

6 0.95617563 1115 andrew gelman stats-2012-01-12-Where are the larger-than-life athletes?

7 0.94929194 528 andrew gelman stats-2011-01-21-Elevator shame is a two-way street

8 0.94314301 1279 andrew gelman stats-2012-04-24-ESPN is looking to hire a research analyst

same-blog 9 0.93378586 1366 andrew gelman stats-2012-06-05-How do segregation measures change when you change the level of aggregation?

10 0.93244284 1659 andrew gelman stats-2013-01-07-Some silly things you (didn’t) miss by not reading the sister blog

11 0.91730189 1304 andrew gelman stats-2012-05-06-Picking on Stephen Wolfram

12 0.90852189 1697 andrew gelman stats-2013-01-29-Where 36% of all boys end up nowadays

13 0.90704197 1180 andrew gelman stats-2012-02-22-I’m officially no longer a “rogue”

14 0.88909727 1487 andrew gelman stats-2012-09-08-Animated drought maps

15 0.87232721 1330 andrew gelman stats-2012-05-19-Cross-validation to check missing-data imputation

16 0.86482382 445 andrew gelman stats-2010-12-03-Getting a job in pro sports… as a statistician

17 0.85916245 1598 andrew gelman stats-2012-11-30-A graphics talk with no visuals!

18 0.85848284 1025 andrew gelman stats-2011-11-24-Always check your evidence

19 0.84229165 700 andrew gelman stats-2011-05-06-Suspicious pattern of too-strong replications of medical research

20 0.82494748 1156 andrew gelman stats-2012-02-06-Bayesian model-building by pure thought: Some principles and examples