404 andrew gelman stats-2010-11-09-“Much of the recent reported drop in interstate migration is a statistical artifact”


meta info for this blog

Source: html

Introduction: Greg Kaplan writes: I noticed that you have blogged a little about interstate migration trends in the US, and thought that you might be interested in a new working paper of mine (joint with Sam Schulhofer-Wohl from the Minneapolis Fed) which I have attached. Briefly, we show that much of the recent reported drop in interstate migration is a statistical artifact: The Census Bureau made an undocumented change in its imputation procedures for missing data in 2006, and this change significantly reduced the number of imputed interstate moves. The change in imputation procedures — not any actual change in migration behavior — explains 90 percent of the reported decrease in interstate migration between the 2005 and 2006 Current Population Surveys, and 42 percent of the decrease between 2000 and 2010. I haven’t had a chance to give a serious look so could only make the quick suggestion to make the graphs smaller and put multiple graphs on a page. This would allow the reader to bett


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Greg Kaplan writes: I noticed that you have blogged a little about interstate migration trends in the US, and thought that you might be interested in a new working paper of mine (joint with Sam Schulhofer-Wohl from the Minneapolis Fed) which I have attached. [sent-1, score-1.413]

2 Briefly, we show that much of the recent reported drop in interstate migration is a statistical artifact: The Census Bureau made an undocumented change in its imputation procedures for missing data in 2006, and this change significantly reduced the number of imputed interstate moves. [sent-2, score-2.766]

3 The change in imputation procedures — not any actual change in migration behavior — explains 90 percent of the reported decrease in interstate migration between the 2005 and 2006 Current Population Surveys, and 42 percent of the decrease between 2000 and 2010. [sent-3, score-2.974]

4 I haven’t had a chance to give a serious look so could only make the quick suggestion to make the graphs smaller and put multiple graphs on a page. This would allow the reader to better follow the logic in your reasoning. [sent-4, score-0.616]

5 But some of you might be interested in the substance of the paper. [sent-5, score-0.165]

6 In any case, it’s pretty scary how a statistical adjustment can have such a large effect. [sent-6, score-0.253]

7 As Little and Rubin have pointed out, lack of any apparent adjustment itself corresponds to some strong and probably horrible assumptions. [sent-8, score-0.524]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('migration', 0.486), ('interstate', 0.469), ('change', 0.188), ('decrease', 0.173), ('imputation', 0.169), ('adjustment', 0.165), ('census', 0.164), ('procedures', 0.147), ('undocumented', 0.127), ('percent', 0.123), ('minneapolis', 0.121), ('reported', 0.118), ('unadjusted', 0.114), ('artifact', 0.111), ('kaplan', 0.106), ('fed', 0.104), ('imputed', 0.101), ('graphs', 0.099), ('bureau', 0.098), ('greg', 0.092), ('sam', 0.09), ('blogged', 0.09), ('scary', 0.088), ('apparent', 0.087), ('logic', 0.087), ('substance', 0.085), ('significantly', 0.084), ('reduced', 0.084), ('corresponds', 0.083), ('interested', 0.08), ('drop', 0.079), ('little', 0.079), ('briefly', 0.079), ('joint', 0.076), ('suggestion', 0.075), ('mine', 0.074), ('explains', 0.073), ('trends', 0.072), ('horrible', 0.072), ('rubin', 0.071), ('reader', 0.069), ('smaller', 0.069), ('surveys', 0.068), ('noticed', 0.063), ('allow', 0.063), ('lack', 0.062), ('behavior', 0.058), ('missing', 0.057), ('pointed', 0.055), ('quick', 0.055)]
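
The page does not say how sentScore is computed. The listed numbers do appear consistent with a simple rule: split each sentence on whitespace and add up the top-N tfidf weight of every token that exactly matches a listed word (so capitalized or punctuation-attached tokens such as "Rubin" or "(joint" contribute nothing). The sketch below is an assumption inferred from the figures on this page, not the generator's documented method; the helper name sentence_score is hypothetical, and only the weights needed for three sentences are copied from the list above.

```python
# Subset of the top-N tfidf weights listed above (copied from the page).
weights = {
    "interested": 0.08, "substance": 0.085,
    "scary": 0.088, "adjustment": 0.165,
    "quick": 0.055, "suggestion": 0.075, "graphs": 0.099,
    "smaller": 0.069, "allow": 0.063, "reader": 0.069, "logic": 0.087,
}

def sentence_score(sentence, weights):
    """Sum the weight of every whitespace token that exactly matches a listed word.

    Assumed rule: case-sensitive, no punctuation stripping, repeats counted each time.
    """
    return sum(weights.get(token, 0.0) for token in sentence.split())

s4 = ("I haven’t had a chance to give a serious look so could only make the "
      "quick suggestion to make the graphs smaller and put multiple graphs on "
      "a page, This would allow the reader to better follow the logic in your "
      "reasoning.")
s5 = "But some of you might be interested in the substance of the paper."
s6 = ("In any case, it’s pretty scary how a statistical adjustment can have "
      "such a large effect.")

for label, sentence in (("sent-4", s4), ("sent-5", s5), ("sent-6", s6)):
    print(label, round(sentence_score(sentence, weights), 3))
# sent-4 0.616
# sent-5 0.165
# sent-6 0.253
```

Under this rule "graphs" is counted twice in sentence 4, which is what makes the total reach the listed 0.616.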

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 404 andrew gelman stats-2010-11-09-“Much of the recent reported drop in interstate migration is a statistical artifact”

2 0.12553284 935 andrew gelman stats-2011-10-01-When should you worry about imputed data?

Introduction: Majid Ezzati writes: My research group is increasingly focusing on a series of problems that involve data that either have missingness or measurements that may have bias/error. We have at times developed our own approaches to imputation (as simple as interpolating a missing unit and as sophisticated as a problem-specific Bayesian hierarchical model) and at other times, other groups impute the data. The outputs are being used to investigate the basic associations between pairs of variables, Xs and Ys, in regressions; we may or may not interpret these as causal. I am contacting colleagues with relevant expertise to suggest good references on whether having imputed X and/or Y in a subsequent regression is correct or if it could somehow lead to biased/spurious associations. Thinking about this, we can have at least the following situations (these could all be Bayesian or not): 1) X and Y both measured (perhaps with error) 2) Y imputed using some data and a model and X measur

3 0.12240751 730 andrew gelman stats-2011-05-25-Rechecking the census

Introduction: Sam Roberts writes : The Census Bureau [reported] that though New York City’s population reached a record high of 8,175,133 in 2010, the gain of 2 percent, or 166,855 people, since 2000 fell about 200,000 short of what the bureau itself had estimated. Public officials were incredulous that a city that lures tens of thousands of immigrants each year and where a forest of new buildings has sprouted could really have recorded such a puny increase. How, they wondered, could Queens have grown by only one-tenth of 1 percent since 2000? How, even with a surge in foreclosures, could the number of vacant apartments have soared by nearly 60 percent in Queens and by 66 percent in Brooklyn? That does seem a bit suspicious. So the newspaper did its own survey: Now, a house-to-house New York Times survey of three representative square blocks where the Census Bureau said vacancies had increased and the population had declined since 2000 suggests that the city’s outrage is somewhat ju

4 0.11489321 608 andrew gelman stats-2011-03-12-Single or multiple imputation?

Introduction: Vishnu Ganglani writes: It appears that multiple imputation appears to be the best way to impute missing data because of the more accurate quantification of variance. However, when imputing missing data for income values in national household surveys, would you recommend it would be practical to maintain the multiple datasets associated with multiple imputations, or a single imputation method would suffice. I have worked on household survey projects (in Scotland) and in the past gone with suggesting single methods for ease of implementation, but with the availability of open source R software I am think of performing multiple imputation methodologies, but a bit apprehensive because of the complexity and also the need to maintain multiple datasets (ease of implementation). My reply: In many applications I’ve just used a single random imputation to avoid the awkwardness of working with multiple datasets. But if there’s any concern, I’d recommend doing parallel analyses on multipl

5 0.1128867 784 andrew gelman stats-2011-07-01-Weighting and prediction in sample surveys

Introduction: A couple years ago Rod Little was invited to write an article for the diamond jubilee of the Calcutta Statistical Association Bulletin. His article was published with discussions from Danny Pfefferman, J. N. K. Rao, Don Rubin, and myself. Here it all is . I’ll paste my discussion below, but it’s worth reading the others’ perspectives too. Especially the part in Rod’s rejoinder where he points out a mistake I made. Survey weights, like sausage and legislation, are designed and best appreciated by those who are placed a respectable distance from their manufacture. For those of us working inside the factory, vigorous discussion of methods is appreciated. I enjoyed Rod Little’s review of the connections between modeling and survey weighting and have just a few comments. I like Little’s discussion of model-based shrinkage of post-stratum averages, which, as he notes, can be seen to correspond to shrinkage of weights. I would only add one thing to his formula at the end of his

6 0.10105198 799 andrew gelman stats-2011-07-13-Hypothesis testing with multiple imputations

7 0.093987599 1330 andrew gelman stats-2012-05-19-Cross-validation to check missing-data imputation

8 0.086849347 1218 andrew gelman stats-2012-03-18-Check your missing-data imputations using cross-validation

9 0.081531979 1016 andrew gelman stats-2011-11-17-I got 99 comparisons but multiplicity ain’t one

10 0.077034213 1767 andrew gelman stats-2013-03-17-The disappearing or non-disappearing middle class

11 0.07264547 1195 andrew gelman stats-2012-03-04-Multiple comparisons dispute in the tabloids

12 0.071247503 405 andrew gelman stats-2010-11-10-Estimation from an out-of-date census

13 0.068181202 529 andrew gelman stats-2011-01-21-“City Opens Inquiry on Grading Practices at a Top-Scoring Bronx School”

14 0.066785917 1029 andrew gelman stats-2011-11-26-“To Rethink Sprawl, Start With Offices”

15 0.064942136 319 andrew gelman stats-2010-10-04-“Who owns Congress”

16 0.062717065 1430 andrew gelman stats-2012-07-26-Some thoughts on survey weighting

17 0.061288416 150 andrew gelman stats-2010-07-16-Gaydar update: Additional research on estimating small fractions of the population

18 0.059946183 2170 andrew gelman stats-2014-01-13-Judea Pearl overview on causal inference, and more general thoughts on the reexpression of existing methods by considering their implicit assumptions

19 0.059566699 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update

20 0.059198827 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.114), (1, -0.012), (2, 0.021), (3, -0.003), (4, 0.046), (5, -0.033), (6, -0.031), (7, 0.013), (8, -0.003), (9, -0.017), (10, -0.004), (11, -0.016), (12, 0.004), (13, 0.001), (14, 0.02), (15, 0.042), (16, 0.006), (17, -0.006), (18, 0.01), (19, 0.001), (20, -0.011), (21, 0.061), (22, 0.0), (23, 0.014), (24, -0.011), (25, -0.015), (26, -0.0), (27, -0.008), (28, 0.053), (29, 0.013), (30, 0.027), (31, 0.006), (32, 0.006), (33, 0.044), (34, -0.042), (35, 0.017), (36, 0.059), (37, 0.011), (38, 0.002), (39, 0.026), (40, -0.027), (41, 0.012), (42, -0.006), (43, -0.014), (44, -0.014), (45, -0.007), (46, -0.018), (47, 0.013), (48, 0.007), (49, 0.016)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.94185698 404 andrew gelman stats-2010-11-09-“Much of the recent reported drop in interstate migration is a statistical artifact”

2 0.77495164 608 andrew gelman stats-2011-03-12-Single or multiple imputation?

Introduction: Vishnu Ganglani writes: It appears that multiple imputation appears to be the best way to impute missing data because of the more accurate quantification of variance. However, when imputing missing data for income values in national household surveys, would you recommend it would be practical to maintain the multiple datasets associated with multiple imputations, or a single imputation method would suffice. I have worked on household survey projects (in Scotland) and in the past gone with suggesting single methods for ease of implementation, but with the availability of open source R software I am think of performing multiple imputation methodologies, but a bit apprehensive because of the complexity and also the need to maintain multiple datasets (ease of implementation). My reply: In many applications I’ve just used a single random imputation to avoid the awkwardness of working with multiple datasets. But if there’s any concern, I’d recommend doing parallel analyses on multipl

3 0.70354861 1330 andrew gelman stats-2012-05-19-Cross-validation to check missing-data imputation

Introduction: Aureliano Crameri writes: I have questions regarding one technique you and your colleagues described in your papers: the cross validation (Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box, with reference to Gelman, King, and Liu, 1998). I think this is the technique I need for my purpose, but I am not sure I understand it right. I want to use the multiple imputation to estimate the outcome of psychotherapies based on longitudinal data. First I have to demonstrate that I am able to get unbiased estimates with the multiple imputation. The expected bias is the overestimation of the outcome of dropouts. I will test my imputation strategies by means of a series of simulations (delete values, impute, compare with the original). Due to the complexity of the statistical analyses I think I need at least 200 cases. Now I don’t have so many cases without any missings. My data have missing values in different variables. The proportion of missing values is

4 0.68220574 2059 andrew gelman stats-2013-10-12-Visualization, “big data”, and EDA

Introduction: Dean Eckles writes: Given your ongoing discussion of info viz for different goals, you might be interested in Sinan Aral’s new article : This touches on several info viz themes: - Viz for yourself (or your team) vs. visualizations to share the final conclusions - Viz for identifying promising features for use in modeling - Viz and statistical significance, especially when the data has plenty of dependence structure Also, these cascade visualizations are perhaps worth comparing with some of very large cascades on Facebook made by my colleagues Alex Dow, Lada Adamic, and Adrien Friggeri. I like those graphs but on the international maps I would make the country boundaries thinner and I would get rid of Greenland and Antarctica, they’re distracting. (I think that’s what Bob would call a “bike shed” comment.)

5 0.67975312 677 andrew gelman stats-2011-04-24-My NOAA story

Introduction: I recently learned we have some readers at the National Oceanic and Atmospheric Administration so I thought I’d share an old story. About 35 years ago my brother worked briefly as a clerk at NOAA in their D.C. (or maybe it was D.C.-area) office. His job was to enter the weather numbers that came in. He had a boss who was very orderly. At one point there was a hurricane that wiped out some weather station in the Caribbean, and his boss told him to put in the numbers anyway. My brother protested that they didn’t have the data, to which his boss replied: “I know what the numbers are.” Nowadays we call this sort of thing “imputation” and we like it. But not in the raw data! I bet nowadays they have an NA code.

6 0.6794036 527 andrew gelman stats-2011-01-20-Cars vs. trucks

7 0.67518902 730 andrew gelman stats-2011-05-25-Rechecking the census

8 0.67121446 1124 andrew gelman stats-2012-01-17-How to map geographically-detailed survey responses?

9 0.66992885 1500 andrew gelman stats-2012-09-17-“2% per degree Celsius . . . the magic number for how worker productivity responds to warm-hot temperatures”

10 0.66979861 1511 andrew gelman stats-2012-09-26-What do statistical p-values mean when the sample = the population?

11 0.66972196 2159 andrew gelman stats-2014-01-04-“Dogs are sensitive to small variations of the Earth’s magnetic field”

12 0.66965181 549 andrew gelman stats-2011-02-01-“Roughly 90% of the increase in . . .” Hey, wait a minute!

13 0.66915429 849 andrew gelman stats-2011-08-11-The Reliability of Cluster Surveys of Conflict Mortality: Violent Deaths and Non-Violent Deaths

14 0.66332847 1522 andrew gelman stats-2012-10-05-High temperatures cause violent crime and implications for climate change, also some suggestions about how to better summarize these claims

15 0.65450794 1212 andrew gelman stats-2012-03-14-Controversy about a ranking of philosophy departments, or How should we think about statistical results when we can’t see the raw data?

16 0.65370363 12 andrew gelman stats-2010-04-30-More on problems with surveys estimating deaths in war zones

17 0.65070969 1691 andrew gelman stats-2013-01-25-Extreem p-values!

18 0.64397687 159 andrew gelman stats-2010-07-23-Popular governor, small state

19 0.63763624 544 andrew gelman stats-2011-01-29-Splitting the data

20 0.63506711 2319 andrew gelman stats-2014-05-05-Can we make better graphs of global temperature history?


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(15, 0.012), (16, 0.121), (21, 0.019), (24, 0.123), (25, 0.016), (35, 0.013), (55, 0.02), (86, 0.02), (93, 0.014), (95, 0.336), (96, 0.014), (99, 0.18)]
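
The page does not state which measure produces simValue. One common choice for comparing topic-weight vectors like the one above is cosine similarity, sketched here for the sparse (topicId, topicWeight) format. This is an illustration under that assumption, not the generator's confirmed method; the function name cosine_sparse and the second vector other_post are hypothetical.

```python
import math

def cosine_sparse(a, b):
    """Cosine similarity between two sparse vectors given as (topicId, weight) pairs."""
    da, db = dict(a), dict(b)
    # Only topics present in both vectors contribute to the dot product.
    dot = sum(w * db[t] for t, w in da.items() if t in db)
    norm_a = math.sqrt(sum(w * w for w in da.values()))
    norm_b = math.sqrt(sum(w * w for w in db.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# LDA topic weights for this post, copied from the line above.
this_post = [(15, 0.012), (16, 0.121), (21, 0.019), (24, 0.123), (25, 0.016),
             (35, 0.013), (55, 0.02), (86, 0.02), (93, 0.014), (95, 0.336),
             (96, 0.014), (99, 0.18)]

# Hypothetical topic weights for some other post (made up for illustration).
other_post = [(16, 0.2), (40, 0.3), (95, 0.4), (99, 0.1)]

print(cosine_sparse(this_post, this_post))   # a vector compared with itself
print(cosine_sparse(this_post, other_post))  # partial topic overlap
```

A vector compared with itself scores 1 up to floating-point rounding, and two vectors with no shared topics score 0.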

similar blogs list:

simIndex simValue blogId blogTitle

1 0.95242935 832 andrew gelman stats-2011-07-31-Even a good data display can sometimes be improved

Introduction: When I first saw this graphic, I thought “boy, that’s great, sometimes the graphic practically makes itself.” Normally it’s hard to use lots of different colors to differentiate items of interest, because there’s usually not an intuitive mapping between color and item (e.g. for countries, or states, or whatever). But the colors of crayons, what could be more perfect? So this graphic seemed awesome. But, as they discovered after some experimentation at datapointed.net there is an even BETTER possibility here. Click the link to see. Crayola Crayon colors by year

2 0.94692516 876 andrew gelman stats-2011-08-28-Vaguely related to the coke-dumping story

Introduction: Underground norms from Jay Livingston. P.S. The Coke story is here (and is followed up in the comments).

3 0.94339204 1820 andrew gelman stats-2013-04-23-Foundation for Open Access Statistics

Introduction: Now here’s a foundation I (Bob) can get behind: Foundation for Open Access Statistics (FOAS) Their mission is to “promote free software, open access publishing, and reproducible research in statistics.” To me, that’s like supporting motherhood and apple pie ! FOAS spun out of and is partially designed to support the Journal of Statistical Software (aka JSS , aka JStatSoft ). I adore JSS because it (a) is open access, (b) publishes systems papers on statistical software, (c) has fast reviewing turnaround times, and (d) is free for authors and readers. One of the next items on my to-do list is to write up the Stan modeling language and submit it to JSS . As a not-for-profit with no visible source of income, they are quite sensibly asking for donations (don’t complain — it beats $3K author fees or not being able to read papers).

same-blog 4 0.91592985 404 andrew gelman stats-2010-11-09-“Much of the recent reported drop in interstate migration is a statistical artifact”

5 0.89262599 520 andrew gelman stats-2011-01-17-R Advertised

Introduction: The R language is definitely going mainstream:

6 0.89096272 1973 andrew gelman stats-2013-08-08-For chrissake, just make up an analysis already! We have a lab here to run, y’know?

7 0.87311107 1862 andrew gelman stats-2013-05-18-uuuuuuuuuuuuugly

8 0.8517164 12 andrew gelman stats-2010-04-30-More on problems with surveys estimating deaths in war zones

9 0.84534138 2101 andrew gelman stats-2013-11-15-BDA class 4 G+ hangout on air is on air

10 0.8188225 1164 andrew gelman stats-2012-02-13-Help with this problem, win valuable prizes

11 0.79818523 1086 andrew gelman stats-2011-12-27-The most dangerous jobs in America

12 0.79113144 1308 andrew gelman stats-2012-05-08-chartsnthings !

13 0.78476727 519 andrew gelman stats-2011-01-16-Update on the generalized method of moments

14 0.78330612 266 andrew gelman stats-2010-09-09-The future of R

15 0.7644282 627 andrew gelman stats-2011-03-24-How few respondents are reasonable to use when calculating the average by county?

16 0.7521193 2135 andrew gelman stats-2013-12-15-The UN Plot to Force Bayesianism on Unsuspecting Americans (penalized B-Spline edition)

17 0.75048244 1595 andrew gelman stats-2012-11-28-Should Harvard start admitting kids at random?

18 0.75022757 1667 andrew gelman stats-2013-01-10-When you SHARE poorly researched infographics…

19 0.74493343 829 andrew gelman stats-2011-07-29-Infovis vs. statgraphics: A clear example of their different goals

20 0.71836388 1758 andrew gelman stats-2013-03-11-Yes, the decision to try (or not) to have a child can be made rationally