andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-730 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Sam Roberts writes: The Census Bureau [reported] that though New York City’s population reached a record high of 8,175,133 in 2010, the gain of 2 percent, or 166,855 people, since 2000 fell about 200,000 short of what the bureau itself had estimated. Public officials were incredulous that a city that lures tens of thousands of immigrants each year and where a forest of new buildings has sprouted could really have recorded such a puny increase. How, they wondered, could Queens have grown by only one-tenth of 1 percent since 2000? How, even with a surge in foreclosures, could the number of vacant apartments have soared by nearly 60 percent in Queens and by 66 percent in Brooklyn? That does seem a bit suspicious. So the newspaper did its own survey: Now, a house-to-house New York Times survey of three representative square blocks where the Census Bureau said vacancies had increased and the population had declined since 2000 suggests that the city’s outrage is somewhat ju
sentIndex sentText sentNum sentScore
1 Sam Roberts writes: The Census Bureau [reported] that though New York City’s population reached a record high of 8,175,133 in 2010, the gain of 2 percent, or 166,855 people, since 2000 fell about 200,000 short of what the bureau itself had estimated. [sent-1, score-0.309]
2 Public officials were incredulous that a city that lures tens of thousands of immigrants each year and where a forest of new buildings has sprouted could really have recorded such a puny increase. [sent-2, score-0.617]
3 How, they wondered, could Queens have grown by only one-tenth of 1 percent since 2000? [sent-3, score-0.103]
4 How, even with a surge in foreclosures, could the number of vacant apartments have soared by nearly 60 percent in Queens and by 66 percent in Brooklyn? [sent-4, score-0.948]
5 So the newspaper did its own survey: Now, a house-to-house New York Times survey of three representative square blocks where the Census Bureau said vacancies had increased and the population had declined since 2000 suggests that the city’s outrage is somewhat justified. [sent-6, score-0.753]
6 In those blocks alone, census takers appear to have missed dozens of New Yorkers and to have overestimated the number of vacant apartments. [sent-7, score-0.971]
7 In Brooklyn, on a block near Ocean Parkway between Midwood and Gravesend, where the census said nearly half of the 148 homes were vacant, a resident said the only vacancies were in a new 33-unit apartment building that is partially occupied. [sent-8, score-1.347]
8 On another block in Sheepshead Bay, Brooklyn, the number of vacancies on the block recorded by the Census Bureau far exceeds the number of unsold condominiums in a new apartment building. [sent-12, score-1.089]
9 Superintendents of other nearby buildings say those had few vacant apartments when the census was conducted. [sent-13, score-0.978]
10 Very impressive of the NYT to do their own survey rather than just reporting it as an amusing controversy. [sent-15, score-0.079]
11 The Times survey did not replicate the methods the Census Bureau uses, including mailing questionnaires and making up to five visits to addresses that have not returned the forms. [sent-17, score-0.079]
12 As a last resort, a census worker will consult with a landlord or neighbors and make a best guess about whether a home is occupied. [sent-18, score-0.467]
13 Often, though, owners of illegally divided houses are reluctant to disclose the number of tenants, who tend to include people who are in the country illegally and are leery of providing any information to the government. [sent-19, score-0.604]
14 City officials say as many as 80,000 residents appear to have been systematically overlooked in crowded immigrant neighborhoods like East Elmhurst and Jackson Heights in Queens and Sunset Park, Bay Ridge and Bensonhurst in Brooklyn. [sent-21, score-0.395]
15 Classrooms in those neighborhoods are overcrowded and “for rent” signs are rare. [sent-22, score-0.131]
16 Some demographers say the number of vacancies was not all that anomalous, given some overbuilding before the recession and a surge in foreclosures. [sent-23, score-0.57]
17 But of 500 houses or apartments on the three blocks surveyed by The Times, only four were in foreclosure or had been seized by the mortgage holder, according to an analysis conducted at The Times’s request by the Furman Center for Real Estate and Urban Policy at New York University. [sent-24, score-0.405]
18 Relying on earlier census surveys and evidence from the Postal Service and other sources, the city plans to formally ask the Census Bureau next month to review its findings. [sent-26, score-0.562]
19 Census officials have acknowledged that a processing glitch is one possibility for any pattern of population declines and increased vacancies in specific neighborhoods. [sent-27, score-0.618]
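As a quick sanity check on the figures quoted in sentence 1, the arithmetic can be sketched in a few lines of Python. This is only an illustrative calculation, not part of the original post: the variable and function names are made up for the example, and the 200,000 shortfall is the article's rounded figure, so the implied prior estimate is approximate.

```python
# Figures quoted in the article: 2010 Census count and reported gain since 2000.
POP_2010 = 8_175_133
GAIN_SINCE_2000 = 166_855
# Assumption: "about 200,000" is a rounded shortfall against the bureau's own estimate.
APPROX_SHORTFALL = 200_000


def implied_2000_population(pop_2010: int, gain: int) -> int:
    """Back out the 2000 count from the 2010 count and the reported gain."""
    return pop_2010 - gain


def percent_growth(pop_2010: int, gain: int) -> float:
    """Growth over the decade, as a percentage of the implied 2000 count."""
    base = implied_2000_population(pop_2010, gain)
    return 100.0 * gain / base


base_2000 = implied_2000_population(POP_2010, GAIN_SINCE_2000)
growth_pct = percent_growth(POP_2010, GAIN_SINCE_2000)
# What the bureau's earlier estimate for 2010 would have been, roughly.
implied_estimate = POP_2010 + APPROX_SHORTFALL

print(f"Implied 2000 population: {base_2000:,}")
print(f"Growth 2000-2010: {growth_pct:.2f}%")
print(f"Bureau's implied prior estimate for 2010: about {implied_estimate:,}")
```

The computed growth comes out slightly above 2 percent, consistent with the article's rounded "gain of 2 percent."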
wordName wordTfidf (topN-words)
[('census', 0.411), ('vacancies', 0.309), ('vacant', 0.282), ('bureau', 0.245), ('queens', 0.212), ('apartments', 0.203), ('brooklyn', 0.152), ('illegally', 0.152), ('city', 0.151), ('block', 0.149), ('neighborhoods', 0.131), ('officials', 0.131), ('recorded', 0.126), ('elmhurst', 0.124), ('blocks', 0.123), ('surge', 0.106), ('percent', 0.103), ('number', 0.099), ('times', 0.09), ('homes', 0.087), ('apartment', 0.087), ('estate', 0.086), ('bay', 0.086), ('roberts', 0.084), ('residents', 0.082), ('buildings', 0.082), ('survey', 0.079), ('houses', 0.079), ('east', 0.077), ('new', 0.071), ('york', 0.07), ('divided', 0.069), ('said', 0.064), ('population', 0.064), ('increased', 0.058), ('landlord', 0.056), ('demographers', 0.056), ('glitch', 0.056), ('holder', 0.056), ('incredulous', 0.056), ('outrage', 0.056), ('postal', 0.056), ('takers', 0.056), ('yorkers', 0.056), ('ocean', 0.053), ('disclose', 0.053), ('occupied', 0.053), ('resident', 0.053), ('nearly', 0.052), ('overlooked', 0.051)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 730 andrew gelman stats-2011-05-25-Rechecking the census
2 0.1488575 405 andrew gelman stats-2010-11-10-Estimation from an out-of-date census
Introduction: Suguru Mizunoya writes: When we estimate the number of people from a national sampling survey (such as labor force survey) using sampling weights, don’t we obtain underestimated number of people, if the country’s population is growing and the sampling frame is based on an old census data? In countries with increasing populations, the probability of inclusion changes over time, but the weights can’t be adjusted frequently because census takes place only once every five or ten years. I am currently working for UNICEF for a project on estimating number of out-of-school children in developing countries. The project leader is comfortable to use estimates of number of people from DHS and other surveys. But, I am concerned that we may need to adjust the estimated number of people by the population projection, otherwise the estimates will be underestimated. I googled around on this issue, but I could not find a right article or paper on this. My reply: I don’t know if there’s a pa
3 0.14815839 1653 andrew gelman stats-2013-01-04-Census dotmap
Introduction: Andrew Vande Moere points to this impressive interactive map from Brandon Martin-Anderson showing the locations of all the residents of the United States and Canada. It says, “The map has 341,817,095 dots – one for each person.” Not quite . . . I was hoping to zoom into my building (approximately 10 people live on our floor, I say approximately because two of the apartments are split between two floors and I’m not sure how they would assign the residents), but unfortunately our entire block is just a solid mass of black. Also, they put a few dots in the park and in the river by accident (presumably because the borders of the census blocks were specified only approximately). But, hey, no algorithm is perfect. It’s hard to know what to do about this. The idea of mapping every person is cool, but you’ll always run into trouble displaying densely populated areas. Smaller dots might work, but then that might depend on the screen being used for display.
Introduction: Greg Kaplan writes: I noticed that you have blogged a little about interstate migration trends in the US, and thought that you might be interested in a new working paper of mine (joint with Sam Schulhofer-Wohl from the Minneapolis Fed) which I have attached. Briefly, we show that much of the recent reported drop in interstate migration is a statistical artifact: The Census Bureau made an undocumented change in its imputation procedures for missing data in 2006, and this change significantly reduced the number of imputed interstate moves. The change in imputation procedures — not any actual change in migration behavior — explains 90 percent of the reported decrease in interstate migration between the 2005 and 2006 Current Population Surveys, and 42 percent of the decrease between 2000 and 2010. I haven’t had a chance to give a serious look so could only make the quick suggestion to make the graphs smaller and put multiple graphs on a page. This would allow the reader to bett
5 0.11961973 1978 andrew gelman stats-2013-08-12-Fixing the race, ethnicity, and national origin questions on the U.S. Census
Introduction: In his new book, “What is Your Race? The Census and Our Flawed Efforts to Classify Americans,” former Census Bureau director Ken Prewitt recommends taking the race question off the decennial census: He recommends gradual changes, integrating the race and national origin questions while improving both. In particular, he would replace the main “race” question by a “race or origin” question, with the instruction to “Mark one or more” of the following boxes: “White,” “Black, African Am., or Negro,” “Hispanic, Latino, or Spanish origin,” “American Indian or Alaska Native,” “Asian”, “Native Hawaiian or Other Pacific Islander,” and “Some other race or origin.” Then the next question is to write in “specific race, origin, or enrolled or principal tribe.” Prewitt writes: His suggestion is to go with these questions in 2020 and 2030, then in 2040 “drop the race question and use only the national origin question.” He’s also relying on the American Community Survey to gather a lo
6 0.11528709 150 andrew gelman stats-2010-07-16-Gaydar update: Additional research on estimating small fractions of the population
7 0.11280442 469 andrew gelman stats-2010-12-16-2500 people living in a park in Chicago?
8 0.10110289 2065 andrew gelman stats-2013-10-17-Cool dynamic demographic maps provide beautiful illustration of Chris Rock effect
9 0.091966234 665 andrew gelman stats-2011-04-17-Yes, your wish shall be granted (in 25 years)
11 0.081636779 1437 andrew gelman stats-2012-07-31-Paying survey respondents
12 0.077721559 1832 andrew gelman stats-2013-04-29-The blogroll
13 0.077143833 784 andrew gelman stats-2011-07-01-Weighting and prediction in sample surveys
14 0.07683862 673 andrew gelman stats-2011-04-20-Upper-income people still don’t realize they’re upper-income
15 0.074370682 454 andrew gelman stats-2010-12-07-Diabetes stops at the state line?
16 0.072122015 1810 andrew gelman stats-2013-04-17-Subway series
17 0.071507946 624 andrew gelman stats-2011-03-22-A question about the economic benefits of universities
18 0.071358114 1649 andrew gelman stats-2013-01-02-Back when 50 miles was a long way
19 0.069344223 1297 andrew gelman stats-2012-05-03-New New York data research organizations
20 0.06924364 529 andrew gelman stats-2011-01-21-“City Opens Inquiry on Grading Practices at a Top-Scoring Bronx School”
topicId topicWeight
[(0, 0.087), (1, -0.037), (2, 0.052), (3, -0.009), (4, 0.02), (5, 0.028), (6, -0.002), (7, 0.011), (8, -0.01), (9, -0.037), (10, -0.013), (11, -0.046), (12, -0.004), (13, 0.057), (14, -0.009), (15, 0.04), (16, 0.022), (17, 0.007), (18, 0.038), (19, 0.005), (20, -0.04), (21, 0.017), (22, -0.041), (23, 0.027), (24, -0.015), (25, -0.012), (26, -0.04), (27, 0.029), (28, 0.07), (29, 0.008), (30, 0.014), (31, -0.015), (32, 0.007), (33, 0.025), (34, -0.018), (35, 0.02), (36, 0.018), (37, -0.005), (38, 0.003), (39, 0.031), (40, -0.035), (41, 0.004), (42, 0.016), (43, 0.001), (44, -0.009), (45, 0.003), (46, -0.029), (47, 0.023), (48, -0.0), (49, 0.01)]
simIndex simValue blogId blogTitle
same-blog 1 0.97468084 730 andrew gelman stats-2011-05-25-Rechecking the census
Introduction: A reporter emailed me the other day with a question about a case I’d never heard of before, a company called Herbalife that is being accused of being a pyramid scheme. The reporter pointed me to this document which describes a survey conducted by “a third party firm called Lieberman Research”: Two independent studies took place using real time (aka “river”) sampling, in which respondents were intercepted across a wide array of websites. Sample size of 2,000 adults 18+ matched to U.S. census on age, gender, income, region and ethnicity. “River sampling” in this case appears to mean, according to the reporter, that “people were invited into it through online ads.” The survey found that 5% of U.S. households had purchased Herbalife products during the past three months (with a “0.8% margin of error,” ha ha ha). Then they did a multiplication and a division to estimate that only 8% of households who bought these products were Herbalife distributors: 480,000 active distributor
Introduction: Earlier today, Nate criticized a U.S. military survey that asks troops the question, “Do you currently serve with a male or female Service member you believe to be homosexual.” [emphasis added] As Nate points out, by asking this question in such a speculative way, “it would seem that you’ll be picking up a tremendous number of false positives–soldiers who are believed to be gay, but aren’t–and that these false positives will swamp any instances in which soldiers (in spite of DADT) are actually somewhat open about their same-sex attractions.” This is a general problem in survey research. In an article in Chance magazine in 1997, “The myth of millions of annual self-defense gun uses: a case study of survey overestimates of rare events” [see here for related references], David Hemenway uses the false-positive, false-negative reasoning to explain this bias in terms of probability theory. Misclassifications that induce seemingly minor biases in estimates of certain small probab
4 0.73128712 1288 andrew gelman stats-2012-04-29-Clueless Americans think they’ll never get sick
Introduction: Cassie Murdoch points to a report from a corporate survey: Sixty-two percent of U.S. employees say it’s not likely they or a family member will be diagnosed with a serious illness like cancer, a survey indicates. The Aflac WorkForces Report, a survey of nearly 1,900 benefits decision-makers and more than 6,100 U.S. workers, also indicated 55 percent said they were not very or not at all likely to be diagnosed with a chronic illness, such as heart disease or diabetes. Here are some actual statistics: The American Cancer Society, Cancer Facts & Figures 2012, said 1-in-3 women and 1-in-2 men will be diagnosed with cancer at some point in their lives, and the National Safety Council, Injury Facts 2011 edition, says more than 38.9 million injuries occur in a year requiring medical treatment. The American Heart Association, Heart Disease & Stroke Statistics 2012, said 1-in-6 U.S. deaths were caused by coronary heart disease, Tillman said. And some details on the survey:
5 0.72673547 405 andrew gelman stats-2010-11-10-Estimation from an out-of-date census
Introduction: Suguru Mizunoya writes: When we estimate the number of people from a national sampling survey (such as labor force survey) using sampling weights, don’t we obtain underestimated number of people, if the country’s population is growing and the sampling frame is based on an old census data? In countries with increasing populations, the probability of inclusion changes over time, but the weights can’t be adjusted frequently because census takes place only once every five or ten years. I am currently working for UNICEF for a project on estimating number of out-of-school children in developing countries. The project leader is comfortable to use estimates of number of people from DHS and other surveys. But, I am concerned that we may need to adjust the estimated number of people by the population projection, otherwise the estimates will be underestimated. I googled around on this issue, but I could not find a right article or paper on this. My reply: I don’t know if there’s a pa
6 0.71239376 1978 andrew gelman stats-2013-08-12-Fixing the race, ethnicity, and national origin questions on the U.S. Census
7 0.70253247 5 andrew gelman stats-2010-04-27-Ethical and data-integrity problems in a study of mortality in Iraq
9 0.68753749 1115 andrew gelman stats-2012-01-12-Where are the larger-than-life athletes?
10 0.67302668 12 andrew gelman stats-2010-04-30-More on problems with surveys estimating deaths in war zones
11 0.66875696 1845 andrew gelman stats-2013-05-07-Is Felix Salmon wrong on free TV?
12 0.65745109 469 andrew gelman stats-2010-12-16-2500 people living in a park in Chicago?
13 0.64666885 1371 andrew gelman stats-2012-06-07-Question 28 of my final exam for Design and Analysis of Sample Surveys
14 0.6376642 385 andrew gelman stats-2010-10-31-Wacky surveys where they don’t tell you the questions they asked
15 0.63417804 1906 andrew gelman stats-2013-06-19-“Behind a cancer-treatment firm’s rosy survival claims”
16 0.63176626 1940 andrew gelman stats-2013-07-16-A poll that throws away data???
17 0.62098747 107 andrew gelman stats-2010-06-24-PPS in Georgia
18 0.61237115 944 andrew gelman stats-2011-10-05-How accurate is your gaydar?
19 0.60964954 1455 andrew gelman stats-2012-08-12-Probabilistic screening to get an approximate self-weighted sample
20 0.60396188 381 andrew gelman stats-2010-10-30-Sorry, Senator DeMint: Most Americans Don’t Want to Ban Gays from the Classroom
topicId topicWeight
[(2, 0.014), (9, 0.027), (15, 0.012), (16, 0.051), (21, 0.028), (24, 0.087), (35, 0.024), (43, 0.013), (45, 0.015), (55, 0.015), (63, 0.043), (64, 0.02), (80, 0.203), (82, 0.012), (95, 0.039), (98, 0.012), (99, 0.191)]
simIndex simValue blogId blogTitle
1 0.91128957 964 andrew gelman stats-2011-10-19-An interweaving-transformation strategy for boosting MCMC efficiency
Introduction: Yaming Yu and Xiao-Li Meng write in with a cool new idea for improving the efficiency of Gibbs and Metropolis in multilevel models: For a broad class of multilevel models, there exist two well-known competing parameterizations, the centered parameterization (CP) and the non-centered parameterization (NCP), for effective MCMC implementation. Much literature has been devoted to the questions of when to use which and how to compromise between them via partial CP/NCP. This article introduces an alternative strategy for boosting MCMC efficiency via simply interweaving—but not alternating—the two parameterizations. This strategy has the surprising property that failure of both the CP and NCP chains to converge geometrically does not prevent the interweaving algorithm from doing so. It achieves this seemingly magical property by taking advantage of the discordance of the two parameterizations, namely, the sufficiency of CP and the ancillarity of NCP, to substantially reduce the Markovian
2 0.9063285 1029 andrew gelman stats-2011-11-26-“To Rethink Sprawl, Start With Offices”
Introduction: According to this op-ed by Louise Mozingo, the fashion for suburban corporate parks is seventy years old: In 1942 the AT&T Bell Telephone Laboratories moved from its offices in Lower Manhattan to a new, custom-designed facility on 213 acres outside Summit, N.J. The location provided space for laboratories and quiet for acoustical research, and new features: parking lots that allowed scientists and engineers to drive from their nearby suburban homes, a spacious cafeteria and lounge and, most surprisingly, views from every window of a carefully tended pastoral landscape designed by the Olmsted brothers, sons of the designer of Central Park. Corporate management never saw the city center in the same way again. Bell Labs initiated a tide of migration of white-collar workers, especially as state and federal governments conveniently extended highways into the rural edge. Just to throw some Richard Florida in the mix: Back in 1990, I turned down a job offer from Bell Labs, larg
same-blog 3 0.8867929 730 andrew gelman stats-2011-05-25-Rechecking the census
Introduction: The title of this blog post quotes the second line of the abstract of Goldstein et al.’s much ballyhooed 2008 tech report, Do More Expensive Wines Taste Better? Evidence from a Large Sample of Blind Tastings . The first sentence of the abstract is Individuals who are unaware of the price do not derive more enjoyment from more expensive wine. Perhaps not surprisingly, given the easy target wine snobs make, the popular press has picked up on the first sentence of the tech report. For example, the Freakonomics blog/radio entry of the same name quotes the first line, ignores the qualification, then concludes Wishing you the happiest of holiday seasons, and urging you to spend $15 instead of $50 on your next bottle of wine. Go ahead, take the money you save and blow it on the lottery. In case you’re wondering about whether to buy me a cheap or expensive bottle of wine, keep in mind I’ve had classical “wine training”. After ten minutes of training with some side by
5 0.83888745 1027 andrew gelman stats-2011-11-25-Note to student journalists: Google is your friend
Introduction: A student journalist called me with some questions about when the U.S. would have a female president. At one point she asked if there were any surveys of whether people would vote for a woman. I suggested she try Google. I was by my computer anyway so typed “what percentage of americans would vote for a woman president” (without the quotation marks), and the very first hit was this from Gallup, from 2007: The Feb. 9-11, 2007, poll asked Americans whether they would vote for “a generally well-qualified” presidential candidate nominated by their party with each of the following characteristics: Jewish, Catholic, Mormon, an atheist, a woman, black, Hispanic, homosexual, 72 years of age, and someone married for the third time. Between now and the 2008 political conventions, there will be discussion about the qualifications of presidential candidates — their education, age, religion, race, and so on. If your party nominated a generally well-qualified person for president who happene
6 0.83217728 138 andrew gelman stats-2010-07-10-Creating a good wager based on probability estimates
7 0.82539821 1494 andrew gelman stats-2012-09-13-Watching the sharks jump
8 0.80704337 1747 andrew gelman stats-2013-03-03-More research on the role of puzzles in processing data graphics
9 0.80662751 642 andrew gelman stats-2011-04-02-Bill James and the base-rate fallacy
10 0.79412353 137 andrew gelman stats-2010-07-10-Cost of communicating numbers
11 0.76085663 384 andrew gelman stats-2010-10-31-Two stories about the election that I don’t believe
13 0.74646139 1430 andrew gelman stats-2012-07-26-Some thoughts on survey weighting
14 0.74412161 937 andrew gelman stats-2011-10-02-That advice not to work so hard
15 0.74255425 428 andrew gelman stats-2010-11-24-Flawed visualization of U.S. voting maybe has some good features
16 0.74153376 2103 andrew gelman stats-2013-11-16-Objects of the class “Objects of the class”
17 0.73899835 1402 andrew gelman stats-2012-07-01-Ice cream! and temperature
18 0.73783487 140 andrew gelman stats-2010-07-10-SeeThroughNY
19 0.73771691 461 andrew gelman stats-2010-12-09-“‘Why work?’”
20 0.73746407 281 andrew gelman stats-2010-09-16-NSF crowdsourcing