andrew_gelman_stats andrew_gelman_stats-2010 andrew_gelman_stats-2010-374 knowledge-graph by maker-knowledge-mining

374 andrew gelman stats-2010-10-27-No matter how famous you are, billions of people have never heard of you.


meta info for this blog

Source: html

Introduction: I was recently speaking with a member of the U.S. House of Representatives, a Californian in a tight race this year. I mentioned the fivethirtyeight.com prediction for him, and he said “fivethirtyeight.com? What’s that?”


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 House of Representatives, a Californian in a tight race this year. [sent-3, score-0.774]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('representatives', 0.459), ('tight', 0.45), ('member', 0.334), ('race', 0.324), ('house', 0.288), ('speaking', 0.281), ('prediction', 0.275), ('mentioned', 0.273), ('recently', 0.181), ('said', 0.162)]
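The page does not say how the simValue scores in the lists that follow are computed. A common choice for comparing tfidf vectors is cosine similarity, and the sketch below assumes that choice. The `this_post` weights are the top-10 values listed above; the `other_post` vector and the `cosine_similarity` helper are hypothetical and exist only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Convention chosen here: an empty vector is similar to nothing.
    return dot / (norm_a * norm_b)


# Top-10 tfidf weights listed above for this post.
this_post = {
    "representatives": 0.459, "tight": 0.45, "member": 0.334, "race": 0.324,
    "house": 0.288, "speaking": 0.281, "prediction": 0.275, "mentioned": 0.273,
    "recently": 0.181, "said": 0.162,
}

# Hypothetical vector for another post; these weights are invented, not taken from the page.
other_post = {"race": 0.5, "census": 0.6, "question": 0.4}

print(cosine_similarity(this_post, this_post))   # a post compared with itself
print(cosine_similarity(this_post, other_post))  # overlap comes only from the shared term "race"
```

Under this assumption a post compared with itself scores 1 (up to floating-point rounding), which would be consistent with the same-blog row, and posts that share few weighted terms score near 0.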

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 374 andrew gelman stats-2010-10-27-No matter how famous you are, billions of people have never heard of you.

2 0.17103444 1978 andrew gelman stats-2013-08-12-Fixing the race, ethnicity, and national origin questions on the U.S. Census

Introduction: In his new book, “What is Your Race? The Census and Our Flawed Efforts to Classify Americans,” former Census Bureau director Ken Prewitt recommends taking the race question off the decennial census: He recommends gradual changes, integrating the race and national origin questions while improving both. In particular, he would replace the main “race” question by a “race or origin” question, with the instruction to “Mark one or more” of the following boxes: “White,” “Black, African Am., or Negro,” “Hispanic, Latino, or Spanish origin,” “American Indian or Alaska Native,” “Asian”, “Native Hawaiian or Other Pacific Islander,” and “Some other race or origin.” Then the next question is to write in “specific race, origin, or enrolled or principal tribe.” Prewitt writes: His suggestion is to go with these questions in 2020 and 2030, then in 2040 “drop the race question and use only the national origin question.” He’s also relying on the American Community Survey to gather a lo

3 0.10630821 292 andrew gelman stats-2010-09-23-Doug Hibbs on the fundamentals in 2010

Introduction: Hibbs, one of the original economy-and-elections guys, writes: The number of House seats won by the president’s party at midterm elections is well explained by three pre-determined or exogenous variables: (1) the number of House seats won by the in-party at the previous on-year election, (2) the vote margin of the in-party’s candidate at the previous presidential election, and (3) the average growth rate of per capita real disposable personal income during the congressional term. Given the partisan division of House seats following the 2008 on-year election, President Obama’s margin of victory in 2008, and the weak growth of per capita real income during the first 6 quarters of the 111th Congress, the Democrats’ chances of holding on to a House majority by winning at least 218 seats at the 2010 midterm election will depend on real income growth in the 3rd quarter of 2010. The data available at this writing indicate that Democrats will win 211 seats, a loss of 45 from the 2008 o

4 0.10464676 237 andrew gelman stats-2010-08-27-Bafumi-Erikson-Wlezien predict a 50-seat loss for Democrats in November

Introduction: They write: How many House seats will the Republicans gain in 2010? . . . Our methodology replicates that for our ultimately successful forecast of the 2006 midterm. Two weeks before Election Day in 2006, we posted a prediction that the Democrats would gain 32 seats and recapture the House majority. The Democrats gained 30 seats in 2006. Our current forecast for 2010 shows that the Republicans are likely to regain the House majority. . . . the most likely scenario is a Republican majority in the neighborhood of 229 seats versus 206 for the Democrats for a 50-seat loss for the Democrats. How do they do it? First, they predict the national two-party vote using the generic polls (asking voters which party they plan to vote for in the November congressional elections). Then they apply the national vote swing on a district-by-district level to predict the outcome in each district. They account for uncertainty in their predictions (I assume by using a model similar to what Gar

5 0.10112116 250 andrew gelman stats-2010-09-02-Blending results from two relatively independent multi-level models

Introduction: David Shor writes: I [Shor] am working on a Bayesian forecasting model for the midterm elections that has two components: 1) A poll aggregation system with pooled and hierarchical house and design effects across every race with polls (average standard error for House-seat-level vote share ~.055) 2) A Bafumi-style regression that applies national swing to individual seats. (Average standard error for House-seat-level vote share ~.06) Since these two estimates are essentially independent, estimates can probably be made more accurate by pooling them together. But if a house effect changes in one draw, that changes estimates in every race. Changes in regression coefficients and national swing have a similar effect. In the face of high and possibly differing seat-to-seat correlations from each method, I’m not sure what the correct way to “blend” these models would be, either for individual or top-line seat estimates. In the meantime, I’m just creating variance-weighted avera

6 0.09567254 158 andrew gelman stats-2010-07-22-Tenants and landlords

7 0.090272419 245 andrew gelman stats-2010-08-31-Predicting marathon times

8 0.08456897 1164 andrew gelman stats-2012-02-13-Help with this problem, win valuable prizes

9 0.083933339 911 andrew gelman stats-2011-09-15-More data tools worth using from Google

10 0.080812171 1238 andrew gelman stats-2012-03-31-Dispute about ethics of data sharing

11 0.074083269 1771 andrew gelman stats-2013-03-19-“Ronald Reagan is a Statistician and Other Examples of Learning From Diverse Sources of Information”

12 0.07044211 312 andrew gelman stats-2010-10-02-“Regression to the mean” is fine. But what’s the “mean”?

13 0.069204979 1562 andrew gelman stats-2012-11-05-Let’s try this: Instead of saying, “The probability is 75%,” say “There’s a 25% chance I’m wrong”

14 0.065164596 1288 andrew gelman stats-2012-04-29-Clueless Americans think they’ll never get sick

15 0.063767105 1742 andrew gelman stats-2013-02-27-What is “explanation”?

16 0.063227721 1540 andrew gelman stats-2012-10-18-“Intrade to the 57th power”

17 0.062619507 1983 andrew gelman stats-2013-08-15-More on AIC, WAIC, etc

18 0.061668061 580 andrew gelman stats-2011-02-19-Weather visualization with WeatherSpark

19 0.06096616 1937 andrew gelman stats-2013-07-13-Meritocracy rerun

20 0.05913239 377 andrew gelman stats-2010-10-28-The incoming moderate Republican congressmembers


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.033), (1, -0.015), (2, 0.032), (3, 0.022), (4, -0.02), (5, 0.01), (6, -0.005), (7, -0.017), (8, 0.012), (9, -0.009), (10, 0.013), (11, 0.014), (12, 0.016), (13, -0.007), (14, -0.038), (15, 0.011), (16, -0.003), (17, 0.002), (18, 0.01), (19, 0.011), (20, -0.043), (21, 0.045), (22, -0.017), (23, 0.036), (24, 0.018), (25, 0.023), (26, -0.022), (27, -0.004), (28, 0.031), (29, 0.013), (30, 0.011), (31, 0.019), (32, 0.002), (33, 0.031), (34, 0.009), (35, -0.008), (36, 0.034), (37, -0.026), (38, 0.021), (39, 0.033), (40, -0.06), (41, -0.002), (42, 0.004), (43, 0.036), (44, -0.021), (45, -0.013), (46, 0.003), (47, -0.021), (48, 0.014), (49, 0.001)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98888439 374 andrew gelman stats-2010-10-27-No matter how famous you are, billions of people have never heard of you.

2 0.57193214 237 andrew gelman stats-2010-08-27-Bafumi-Erikson-Wlezien predict a 50-seat loss for Democrats in November

Introduction: They write: How many House seats will the Republicans gain in 2010? . . . Our methodology replicates that for our ultimately successful forecast of the 2006 midterm. Two weeks before Election Day in 2006, we posted a prediction that the Democrats would gain 32 seats and recapture the House majority. The Democrats gained 30 seats in 2006. Our current forecast for 2010 shows that the Republicans are likely to regain the House majority. . . . the most likely scenario is a Republican majority in the neighborhood of 229 seats versus 206 for the Democrats for a 50-seat loss for the Democrats. How do they do it? First, they predict the national two-party vote using the generic polls (asking voters which party they plan to vote for in the November congressional elections). Then they apply the national vote swing on a district-by-district level to predict the outcome in each district. They account for uncertainty in their predictions (I assume by using a model similar to what Gar

3 0.57151383 270 andrew gelman stats-2010-09-12-Comparison of forecasts for the 2010 congressional elections

Introduction: Yesterday at the sister blog, Nate Silver forecast that the Republicans have a two-thirds chance of regaining the House of Representatives in the upcoming election, with an expected gain of 45 House seats. Last month, Bafumi, Erikson, and Wlezien released their forecast that gives the Republicans an 80% chance of takeover and an expected gain of 50 seats. As all the above writers emphasize, these forecasts are full of uncertainty, so I treat the two predictions–a 45-seat swing or a 50-seat swing–as essentially identical at the national level. And, as regular readers know, as far back as a year ago, the generic Congressional ballot (those questions of the form, “Which party do you plan to vote for in November?”) was also pointing to big Republican gains. As Bafumi et al. point out, early generic polls are strongly predictive of the election outcome, but they need to be interpreted carefully. The polls move in a generally predictable manner during the year leading up to an

4 0.5563103 292 andrew gelman stats-2010-09-23-Doug Hibbs on the fundamentals in 2010

Introduction: Hibbs, one of the original economy-and-elections guys, writes: The number of House seats won by the president’s party at midterm elections is well explained by three pre-determined or exogenous variables: (1) the number of House seats won by the in-party at the previous on-year election, (2) the vote margin of the in-party’s candidate at the previous presidential election, and (3) the average growth rate of per capita real disposable personal income during the congressional term. Given the partisan division of House seats following the 2008 on-year election, President Obama’s margin of victory in 2008, and the weak growth of per capita real income during the first 6 quarters of the 111th Congress, the Democrats’ chances of holding on to a House majority by winning at least 218 seats at the 2010 midterm election will depend on real income growth in the 3rd quarter of 2010. The data available at this writing indicate that Democrats will win 211 seats, a loss of 45 from the 2008 o

5 0.53623229 2005 andrew gelman stats-2013-09-02-“Il y a beaucoup de candidats démocrates, et leurs idéologies ne sont pas très différentes. Et la participation est imprévisible.”

Introduction: As I wrote a couple years ago: Even though statistical analysis has demonstrated that presidential elections are predictable given economic conditions and previous votes in the states . . . it certainly doesn’t mean that every election can be accurately predicted ahead of time. Presidential general election campaigns have several distinct features that distinguish them from most other elections: 1. Two major candidates; 2. The candidates clearly differ in their political ideologies and in their positions on economic issues; 3. The two sides have roughly equal financial and organizational resources; 4. The current election is the latest in a long series of similar contests (every four years); 5. A long campaign, giving candidates a long time to present their case and giving voters a long time to make up their minds. Other elections look different. . . . Or, as I said in reference to the current NYC mayoral election: Et selon Andrew Gelman, expert de l’universi

6 0.52917075 1570 andrew gelman stats-2012-11-08-Poll aggregation and election forecasting

7 0.50650012 1567 andrew gelman stats-2012-11-07-Election reports

8 0.48217544 1512 andrew gelman stats-2012-09-27-A Non-random Walk Down Campaign Street

9 0.48054257 1000 andrew gelman stats-2011-11-10-Forecasting 2012: How much does ideology matter?

10 0.47966352 300 andrew gelman stats-2010-09-28-A calibrated Cook gives Dems the edge in Nov, sez Sandy

11 0.47882181 210 andrew gelman stats-2010-08-16-What I learned from those tough 538 commenters

12 0.46648034 364 andrew gelman stats-2010-10-22-Politics is not a random walk: Momentum and mean reversion in polling

13 0.45680749 158 andrew gelman stats-2010-07-22-Tenants and landlords

14 0.4556022 730 andrew gelman stats-2011-05-25-Rechecking the census

15 0.45446032 1556 andrew gelman stats-2012-11-01-Recently in the sister blogs: special pre-election edition!

16 0.45435721 912 andrew gelman stats-2011-09-15-n = 2

17 0.44990981 665 andrew gelman stats-2011-04-17-Yes, your wish shall be granted (in 25 years)

18 0.43920767 43 andrew gelman stats-2010-05-19-What do Tuesday’s elections tell us about November?

19 0.42732474 1407 andrew gelman stats-2012-07-06-Statistical inference and the secret ballot

20 0.42631775 1540 andrew gelman stats-2012-10-18-“Intrade to the 57th power”


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(24, 0.151), (50, 0.3), (77, 0.072), (99, 0.239)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.90700114 374 andrew gelman stats-2010-10-27-No matter how famous you are, billions of people have never heard of you.

2 0.88205409 818 andrew gelman stats-2011-07-23-Parallel JAGS RNGs

Introduction: As a matter of convention, we usually run 3 or 4 chains in JAGS. By default, this gives rise to chains that draw samples from 3 or 4 distinct pseudorandom number generators. I didn’t go and check whether it does things 111,222,333 or 123,123,123, but in any event the “parallel chains” in JAGS are samples drawn from distinct RNGs computed on a single processor core. But we all have multiple cores now, or we’re computing on a cluster or the cloud! So the behavior we’d like from rjags is to use the foreach package with each JAGS chain using a parallel-safe RNG. The default behavior with n.chain=1 will be that each parallel instance will use .RNG.name[1], the Wichmann-Hill RNG. JAGS 2.2.0 includes a new lecuyer module (along with the glm module, which everyone should probably always use, and doesn’t have many undocumented tricks that I know of). But lecuyer is completely undocumented! I tried .RNG.name="lecuyer::Lecuyer", .RNG.name="lecuyer::lecuyer", and .RNG.name=

3 0.80363274 729 andrew gelman stats-2011-05-24-Deviance as a difference

Introduction: Peng Yu writes: On page 180 of BDA2, deviance is defined as D(y,\theta) = -2 \log p(y|\theta). However, according to GLM 2/e by McCullagh and Nelder, deviance is the difference of the log-likelihood of the full model and the base model (times 2) (see the equation on the wiki webpage). The English word ‘deviance’ implies the difference from a standard (in this case, the base model). I’m wondering what the rationale is for your definition of deviance, which consists of only 1 term rather than 2 terms. My reply: Deviance is typically computed as a relative quantity; that is, people look at the difference in deviance. So the two definitions are equivalent.

4 0.77910292 1793 andrew gelman stats-2013-04-08-The Supreme Court meets the fallacy of the one-sided bet

Introduction: Doug Hartmann writes (link from Jay Livingston): Justice Antonin Scalia’s comment in the Supreme Court hearings on the U.S. law defining marriage that “there’s considerable disagreement among sociologists as to what the consequences of raising a child in a single-sex family, whether that is harmful to the child or not.” Hartmann argues that Scalia is factually incorrect—there is not actually “considerable disagreement among sociologists” on this issue—and quotes a recent report from the American Sociological Association to this effect. Assuming there’s no other considerable group of sociologists (Hartmann knows of only one small group) arguing otherwise, it seems that Hartmann has a point. Scalia would’ve been better off omitting the phrase “among sociologists”—then he’d have been on safe ground, because you can always find somebody to take a position on the issue. Jerry Falwell’s no longer around but there’s a lot more where he came from. Even among scientists, there’s

5 0.76971018 1805 andrew gelman stats-2013-04-16-Memo to Reinhart and Rogoff: I think it’s best to admit your errors and go on from there

Introduction: Jeff Ratto points me to this news article by Dean Baker reporting the work of three economists, Thomas Herndon, Michael Ash, and Robert Pollin, who found errors in a much-cited article by Carmen Reinhart and Kenneth Rogoff analyzing historical statistics of economic growth and public debt. Mike Konczal provides a clear summary; that’s where I got the above image. Errors in data processing and data analysis It turns out that Reinhart and Rogoff flubbed it. Herndon et al. write of “spreadsheet errors, omission of available data, weighting, and transcription.” The spreadsheet errors are the most embarrassing, but the other choices in data analysis seem pretty bad too. It can be tough to work with small datasets, so I have sympathy for Reinhart and Rogoff, but it does look like they were jumping to conclusions in their paper. Perhaps the urgency of the topic moved them to publish as fast as possible rather than carefully considering the impact of their data-analytic choi

6 0.74003458 707 andrew gelman stats-2011-05-12-Human nature can’t be changed (except when it can)

7 0.73801017 328 andrew gelman stats-2010-10-08-Displaying a fitted multilevel model

8 0.73687434 194 andrew gelman stats-2010-08-09-Data Visualization

9 0.73052239 1636 andrew gelman stats-2012-12-23-Peter Bartlett on model complexity and sample size

10 0.72589934 232 andrew gelman stats-2010-08-25-Dodging the diplomats

11 0.72426558 1140 andrew gelman stats-2012-01-27-Educational monoculture

12 0.71796012 1981 andrew gelman stats-2013-08-14-The robust beauty of improper linear models in decision making

13 0.71711683 1112 andrew gelman stats-2012-01-11-A blog full of examples for your statistics class

14 0.71085596 541 andrew gelman stats-2011-01-27-Why can’t I be more like Bill James, or, The use of default and default-like models

15 0.69971943 210 andrew gelman stats-2010-08-16-What I learned from those tough 538 commenters

16 0.69574428 61 andrew gelman stats-2010-05-31-A data visualization manifesto

17 0.6922183 1792 andrew gelman stats-2013-04-07-X on JLP

18 0.69038105 1757 andrew gelman stats-2013-03-11-My problem with the Lindley paradox

19 0.68467611 1946 andrew gelman stats-2013-07-19-Prior distributions on derived quantities rather than on parameters themselves

20 0.68466407 401 andrew gelman stats-2010-11-08-Silly old chi-square!