358 andrew gelman stats-2010-10-20-When Kerry Met Sally: Politics and Perceptions in the Demand for Movies


meta info for this blog

Source: html

Introduction: Jason Roos sends along this article : On election days many of us see a colorful map of the U.S. where each tiny county has a color on the continuum between red and blue. So far we have not used such data to improve the effectiveness of marketing models. In this study, we show that we should. We demonstrate the usefulness of political data via an interesting application–the demand for movies. Using boxoffice data from 25 counties in the U.S. Midwest (21 quarters between 2000 and 2005) we show that by including political data one can improve out-of-sample predictions significantly. Specifically, we estimate the improvement in forecasts due to the addition of political data to be around $43 million per year for the entire U.S. theatrical market. Furthermore, when it comes to movies we depart from previous work in another way. While previous studies have relied on pre-determined movie genres, we estimate perceived movie attributes in a latent space and formulate viewers’ tastes as


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Jason Roos sends along this article : On election days many of us see a colorful map of the U. [sent-1, score-0.097]

2 where each tiny county has a color on the continuum between red and blue. [sent-3, score-0.385]

3 So far we have not used such data to improve the effectiveness of marketing models. [sent-4, score-0.299]

4 We demonstrate the usefulness of political data via an interesting application–the demand for movies. [sent-6, score-0.303]

5 Midwest (21 quarters between 2000 and 2005) we show that by including political data one can improve out-of-sample predictions significantly. [sent-9, score-0.68]

6 Specifically, we estimate the improvement in forecasts due to the addition of political data to be around $43 million per year for the entire U. [sent-10, score-0.417]

7 Furthermore, when it comes to movies we depart from previous work in another way. [sent-13, score-0.312]

8 While previous studies have relied on pre-determined movie genres, we estimate perceived movie attributes in a latent space and formulate viewers’ tastes as ideal points. [sent-14, score-1.335]

9 Using perceived attributes improves the out-of-sample predictions even further (by around $93 million per year). [sent-15, score-0.841]

10 Furthermore, the latent dimensions that we identify are not only effective in improving predictions, they are also quite insightful about the nature of movies. [sent-16, score-0.255]

11 Let’s start by replacing Tables 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10 by graphs. [sent-20, score-0.088]

12 I mean, why not just print out a core dump in hex and cut out the middleman? [sent-24, score-0.181]

13 In all seriousness, the paper looks interesting and I’m sure would hugely benefit by some plots of data and fitted models. [sent-25, score-0.188]

14 On a more specific note, I wonder if the authors can shed any light on the controversial question of the Brokeback Mountain’s popularity in Republican-voting areas in “red states” (search Kaus Brokeback Mountain for more than you could possibly want to read on this topic). [sent-26, score-0.186]

15 I never saw the movie, nor have I followed the debates about its box office, but I recall there being some controversy on the topic. [sent-27, score-0.08]
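The sentence scores above appear consistent with a simple rule: split the sentence on whitespace and sum the tfidf weights of the tokens that exactly match a listed top-N word. This is an inference from the numbers on this page, not a documented description of the tool; the function name and the exact-match tokenization are assumptions. A minimal sketch using a subset of the listed weights:

```python
# Subset of the (word, tfidf) pairs listed for this post.
WORD_TFIDF = {
    "tiny": 0.079, "county": 0.084, "continuum": 0.111, "red": 0.111,
    "data": 0.1, "improve": 0.121, "marketing": 0.078,
    "print": 0.08, "dump": 0.101,
}

def sentence_score(sentence, weights=WORD_TFIDF):
    """Sum the weights of whitespace tokens that exactly match a listed word.

    Hypothetical helper: tokens with attached punctuation (e.g. "middleman?")
    or different capitalisation (e.g. "Tables") do not match.
    """
    return round(sum(weights.get(token, 0.0) for token in sentence.split()), 3)

s2 = "where each tiny county has a color on the continuum between red and blue."
s3 = "So far we have not used such data to improve the effectiveness of marketing models."
s12 = "I mean, why not just print out a core dump in hex and cut out the middleman?"

print(sentence_score(s2))   # 0.385, the score listed for sentence 2
print(sentence_score(s3))   # 0.299, the score listed for sentence 3
print(sentence_score(s12))  # 0.181, the score listed for sentence 12
```

Under this reading, long sentences that contain many high-weight words (such as sentence 8) rank highest, and words followed by punctuation contribute nothing.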


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('brokeback', 0.27), ('movie', 0.217), ('mountain', 0.214), ('attributes', 0.186), ('predictions', 0.178), ('furthermore', 0.173), ('perceived', 0.162), ('latent', 0.148), ('tables', 0.135), ('genres', 0.123), ('improve', 0.121), ('million', 0.118), ('depart', 0.116), ('middleman', 0.116), ('continuum', 0.111), ('red', 0.111), ('previous', 0.108), ('insightful', 0.107), ('tastes', 0.107), ('whassup', 0.107), ('kaus', 0.104), ('usefulness', 0.104), ('midwest', 0.101), ('dump', 0.101), ('shed', 0.101), ('per', 0.1), ('data', 0.1), ('political', 0.099), ('improves', 0.097), ('quarters', 0.097), ('relied', 0.097), ('colorful', 0.097), ('jason', 0.093), ('formulate', 0.093), ('viewers', 0.093), ('counties', 0.092), ('midterm', 0.092), ('turnout', 0.088), ('hugely', 0.088), ('replacing', 0.088), ('movies', 0.088), ('seriousness', 0.087), ('show', 0.085), ('popularity', 0.085), ('county', 0.084), ('topic', 0.082), ('controversy', 0.08), ('print', 0.08), ('tiny', 0.079), ('marketing', 0.078)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 358 andrew gelman stats-2010-10-20-When Kerry Met Sally: Politics and Perceptions in the Demand for Movies

2 0.091624402 1287 andrew gelman stats-2012-04-28-Understanding simulations in terms of predictive inference?

Introduction: David Hogg writes: My (now deceased) collaborator and guru in all things inference, Sam Roweis, used to emphasize to me that we should evaluate models in the data space — not the parameter space — because models are always effectively “effective” and not really, fundamentally true. Or, in other words, models should be compared in the space of their predictions, not in the space of their parameters (the parameters didn’t really “exist” at all for Sam). In that spirit, when we estimate the effectiveness of a MCMC method or tuning — by autocorrelation time or ESJD or anything else — shouldn’t we be looking at the changes in the model predictions over time, rather than the changes in the parameters over time? That is, the autocorrelation time should be the autocorrelation time in what the model (at the walker position) predicts for the data, and the ESJD should be the expected squared jump distance in what the model predicts for the data? This might resolve the concern I expressed a

3 0.089384422 1289 andrew gelman stats-2012-04-29-We go to war with the data we have, not the data we want

Introduction: This post is by Phil. Psychologists perform experiments on Canadian undergraduate psychology students and draw conclusions that (they believe) apply to humans in general; they publish in Science. A drug company decides to embark on additional trials that will cost tens of millions of dollars based on the results of a careful double-blind study…whose patients are all volunteers from two hospitals. A movie studio holds 9 screenings of a new movie for volunteer viewers and, based on their survey responses, decides to spend another $8 million to re-shoot the ending. A researcher interested in the effect of ventilation on worker performance conducts a months-long study in which ventilation levels are varied and worker performance is monitored…in a single building. In almost all fields of research, most studies are based on convenience samples, or on random samples from a larger population that is itself a convenience sample. The paragraph above gives just a few examples. The benefit

4 0.083738618 563 andrew gelman stats-2011-02-07-Evaluating predictions of political events

Introduction: Mike Cohen writes: The recent events in Egypt raise an interesting statistical question. It is of course common for news stations like CNN to interview various officials and policy experts to find out what is likely to happen next. The obvious response of people like us is why ask such people when they didn’t foresee a month ago that these dynamic events were about to happen. One would instead like to hear from those experts that did predict that something was about to happen in Tunisia, and Egypt, and Jordan, and maybe Yemen, etc. Well, are there such people? My friend Bob Burton says that of course one can find such people in the sense that they made such predictions, but that is like finding counties that have voted for the President in the last five elections, big deal, or psychics that predicted the last assassination, again big deal. There is a good deal of truth in that. However, it seems like we do a little better. There are two points to make. First, there is an i

5 0.082670502 300 andrew gelman stats-2010-09-28-A calibrated Cook gives Dems the edge in Nov, sez Sandy

Introduction: Sandy Gordon sends along this fun little paper forecasting the 2010 midterm election using expert predictions (the Cook and Rothenberg Political Reports). Gordon’s gimmick is that he uses past performance to calibrate the reports’ judgments based on “solid,” “likely,” “leaning,” and “toss-up” categories, and then he uses the calibrated versions of the current predictions to make his forecast. As I wrote a few weeks ago in response to Nate’s forecasts, I think the right way to go, if you really want to forecast the election outcome, is to use national information to predict the national swing and then do regional, state, and district-level adjustments using whatever local information is available. I don’t see the point of using only the expert forecasts and no other data. Still, Gordon is bringing new information (his calibrations) to the table, so I wanted to share it with you. Ultimately I like the throw-in-everything approach that Nate uses (although I think Nate’s descr

6 0.082237415 2288 andrew gelman stats-2014-04-10-Small multiples of lineplots > maps (ok, not always, but yes in this case)

7 0.081987731 536 andrew gelman stats-2011-01-24-Trends in partisanship by state

8 0.081125133 845 andrew gelman stats-2011-08-08-How adoption speed affects the abandonment of cultural tastes

9 0.080105059 2180 andrew gelman stats-2014-01-21-Everything I need to know about Bayesian statistics, I learned in eight schools.

10 0.079234481 428 andrew gelman stats-2010-11-24-Flawed visualization of U.S. voting maybe has some good features

11 0.079024121 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

12 0.077842399 770 andrew gelman stats-2011-06-15-Still more Mr. P in public health

13 0.075547561 2197 andrew gelman stats-2014-02-04-Peabody here.

14 0.07332667 252 andrew gelman stats-2010-09-02-R needs a good function to make line plots

15 0.073219232 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update

16 0.07254982 474 andrew gelman stats-2010-12-18-The kind of frustration we could all use more of

17 0.071611375 898 andrew gelman stats-2011-09-10-Fourteen magic words: an update

18 0.070987351 389 andrew gelman stats-2010-11-01-Why it can be rational to vote

19 0.070987351 1565 andrew gelman stats-2012-11-06-Why it can be rational to vote

20 0.069478199 2369 andrew gelman stats-2014-06-11-“I can’t drive home now. Not just yet. First I need to go to Utrecht.”
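The page does not say how simValue is computed. A common choice for tfidf vectors is cosine similarity, which would also explain why the post scores 1.0 against itself. The sketch below assumes that choice; the smoothed idf formula, the whitespace tokenizer, and the toy documents are all illustrative assumptions rather than the tool's actual settings.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one unit-length {word: weight} dict per document."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Number of documents in which each word occurs at least once.
    doc_freq = Counter(word for tokens in tokenised for word in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        # Term count times a smoothed inverse document frequency (assumed form).
        vec = {
            word: count * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0)
            for word, count in counts.items()
        }
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({w: v / norm for w, v in vec.items()} if norm else vec)
    return vectors

def cosine(a, b):
    """Dot product of two unit-length sparse vectors, i.e. their cosine."""
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())

# Toy stand-ins for blog posts (hypothetical text, not the real corpus).
docs = [
    "political data improve movie demand predictions",
    "movie attributes in a latent space",
    "chess and checkers complexity",
]
vecs = tfidf_vectors(docs)
sim_self = cosine(vecs[0], vecs[0])       # a document compared with itself
sim_related = cosine(vecs[0], vecs[1])    # shares the word "movie"
sim_unrelated = cosine(vecs[0], vecs[2])  # no words in common
```

Ranking every other post by this value and keeping the top 20 would produce a list shaped like the one above.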


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.157), (1, -0.022), (2, 0.058), (3, 0.011), (4, 0.043), (5, -0.018), (6, -0.043), (7, -0.017), (8, -0.022), (9, 0.002), (10, 0.024), (11, -0.035), (12, 0.003), (13, -0.019), (14, -0.027), (15, 0.035), (16, 0.026), (17, -0.01), (18, 0.013), (19, -0.023), (20, -0.021), (21, 0.024), (22, -0.014), (23, 0.002), (24, 0.022), (25, -0.007), (26, -0.002), (27, -0.003), (28, 0.021), (29, 0.03), (30, 0.007), (31, -0.019), (32, -0.022), (33, -0.084), (34, 0.009), (35, -0.007), (36, -0.001), (37, -0.022), (38, 0.02), (39, -0.013), (40, -0.02), (41, -0.004), (42, -0.049), (43, 0.007), (44, 0.013), (45, 0.012), (46, -0.001), (47, 0.005), (48, 0.018), (49, -0.033)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.94670433 358 andrew gelman stats-2010-10-20-When Kerry Met Sally: Politics and Perceptions in the Demand for Movies

2 0.78166074 406 andrew gelman stats-2010-11-10-Translating into Votes: The Electoral Impact of Spanish-Language Ballots

Introduction: Dan Hopkins sends along this article : [Hopkins] uses regression discontinuity design to estimate the turnout and election impacts of Spanish-language assistance provided under Section 203 of the Voting Rights Act. Analyses of two different data sets – the Latino National Survey and California 1998 primary election returns – show that Spanish-language assistance increased turnout for citizens who speak little English. The California results also demonstrate that election procedures can influence outcomes, as support for ending bilingual education dropped markedly in heavily Spanish-speaking neighborhoods with Spanish-language assistance. The California analyses find hints of backlash among non-Hispanic white precincts, but not with the same size or certainty. Small changes in election procedures can influence who votes as well as what wins. Beyond the direct relevance of these results, I find this paper interesting as an example of research that is fundamentally quantitative. Th

3 0.77753878 513 andrew gelman stats-2011-01-12-“Tied for Warmest Year On Record”

Introduction: The National Climatic Data Center has tentatively announced that 2010 is, get this, “tied” for warmest on record. Presumably they mean it’s tied to the precision that they quote (1.12 F above the 20th-century average). The uncertainty in the measurements, as well as some fuzziness about exactly what is being measured (how much of the atmosphere, and the oceans) makes these global-average things really suspect. For instance, if there’s more oceanic turnover one year, that can warm the deep ocean but cool the shallow ocean and atmosphere, so even though the heat content of the atmosphere-ocean system goes up, some of these “global-average” estimates can go down. The reverse can happen too. And of course there are various sources of natural variability that are not, these days, what most people are most interested in. So everybody who knows about the climate professes to hate the emphasis on climate records. And yet, they’re irresistible. I’m sure we’ll see the usual clamor of som

4 0.76541626 2181 andrew gelman stats-2014-01-21-The Commissar for Traffic presents the latest Five-Year Plan

Introduction: What do Paul Samuelson and the U.S. Department of Transportation have in common? Phil Price points us to this news article by Clark Williams-Derry: As the State Smart Transportation Initiative at the University of Wisconsin points out, the US Department of Transportation has been making the virtually identical vehicle travel forecasts for well over a decade. All of those forecasts project rapid and incessant growth in vehicle travel for as far as the eye can see. Meanwhile, actual traffic volumes have flattened out, and may actually be falling. Each of the rising colored lines represents a forecast from a different year. The black line represents actual traffic trends on US roads—which never rose as quickly as the forecasters had predicted, and actually started a modest decline in 2007. I’d like to see a label on the y-axis, and I’d recommend labeling the x-axis at 5-year intervals rather than every year, but the point seems pretty clear. Williams-Derry continues:

5 0.74867034 1500 andrew gelman stats-2012-09-17-“2% per degree Celsius . . . the magic number for how worker productivity responds to warm-hot temperatures”

Introduction: Solomon Hsiang shares some bad news: Persistently reduced labor productivity may be one of the largest economic impacts of anthropogenic climate change. . . . Two percent per degree Celsius . . . That’s the magic number for how worker productivity responds to warm/hot temperatures. In my 2010 PNAS paper, I [Hsiang] found that labor-intensive sectors of national economies decreased output by roughly 2.4% per degree C and argued that this looked suspiciously like it came from reductions in worker output. Using a totally different method and dataset, Matt Neidell and Josh Graff Zivin found that labor supply in micro data fell by 1.8% per degree C. Both responses kicked in at around 26C. Chris Sheehan just sent me this NYT article on air conditioning, where they mention this neat natural experiment: [I]n the past year, [Japan] became an unwitting laboratory to study even more extreme air-conditioning abstinence, and the results have not been encouraging. After th

6 0.74222845 500 andrew gelman stats-2011-01-03-Bribing statistics

7 0.72914273 12 andrew gelman stats-2010-04-30-More on problems with surveys estimating deaths in war zones

8 0.72768044 1823 andrew gelman stats-2013-04-24-The Tweets-Votes Curve

9 0.72324717 68 andrew gelman stats-2010-06-03-…pretty soon you’re talking real money.

10 0.71571738 200 andrew gelman stats-2010-08-11-Separating national and state swings in voting and public opinion, or, How I avoided blogorific embarrassment: An agony in four acts

11 0.71534842 1522 andrew gelman stats-2012-10-05-High temperatures cause violent crime and implications for climate change, also some suggestions about how to better summarize these claims

12 0.71025658 1669 andrew gelman stats-2013-01-12-The power of the puzzlegraph

13 0.70693398 228 andrew gelman stats-2010-08-24-A new efficient lossless compression algorithm

14 0.70242143 2308 andrew gelman stats-2014-04-27-White stripes and dead armadillos

15 0.69743633 685 andrew gelman stats-2011-04-29-Data mining and allergies

16 0.69725639 245 andrew gelman stats-2010-08-31-Predicting marathon times

17 0.69528973 1201 andrew gelman stats-2012-03-07-Inference = data + model

18 0.69271886 1397 andrew gelman stats-2012-06-27-Stand Your Ground laws and homicides

19 0.68409228 364 andrew gelman stats-2010-10-22-Politics is not a random walk: Momentum and mean reversion in polling

20 0.68307424 925 andrew gelman stats-2011-09-26-Ethnicity and Population Structure in Personal Naming Networks


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(5, 0.013), (9, 0.053), (15, 0.019), (16, 0.096), (21, 0.063), (24, 0.091), (54, 0.149), (63, 0.043), (84, 0.021), (86, 0.039), (92, 0.01), (99, 0.255)]
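Unlike the lsi vector, the lda vector is sparse: only 12 topic ids are listed. The sketch below sums the listed weights, finds the dominant topic, and expands the list into a dense vector for comparison with other posts. The topic count of 100 is an assumption based on the largest id shown (99), and the idea that low-weight topics were dropped by a threshold is a guess, not something the page states.

```python
# (topicId, topicWeight) pairs copied from the lda listing for this post.
lda_topics = [
    (5, 0.013), (9, 0.053), (15, 0.019), (16, 0.096), (21, 0.063),
    (24, 0.091), (54, 0.149), (63, 0.043), (84, 0.021), (86, 0.039),
    (92, 0.01), (99, 0.255),
]

N_TOPICS = 100  # assumption: ids run from 0 to 99

def to_dense(sparse, n_topics=N_TOPICS):
    """Expand (topicId, weight) pairs into a list with zeros for unlisted topics."""
    dense = [0.0] * n_topics
    for topic_id, weight in sparse:
        dense[topic_id] = weight
    return dense

listed_mass = round(sum(weight for _, weight in lda_topics), 3)
dominant_topic, dominant_weight = max(lda_topics, key=lambda pair: pair[1])
dense = to_dense(lda_topics)

print(listed_mass)                      # 0.852
print(dominant_topic, dominant_weight)  # 99 0.255
```

The listed weights sum to 0.852, so roughly 15% of the topic mass is not shown, presumably spread over the unlisted topics.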

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.94611448 358 andrew gelman stats-2010-10-20-When Kerry Met Sally: Politics and Perceptions in the Demand for Movies

2 0.93553901 322 andrew gelman stats-2010-10-06-More on the differences between drugs and medical devices

Introduction: Someone who works in statistics in the pharmaceutical industry (but prefers to remain anonymous) sent me this update to our discussion on the differences between approvals of drugs and medical devices: The ‘substantial equivalence’ threshold is very outdated. Basically the FDA has to follow federal law and the law is antiquated and leads to two extraordinarily different paths for device approval. You could have a very simple but first-in-kind device with an easy to understand physiological mechanism of action (e.g. the FDA approved a simple tiny stent that would relieve pressure from a glaucoma patient’s eye this summer). This device would require a standard (likely controlled) trial at the one-sided 0.025 level. Even after the trial it would likely go to a panel where outside experts (e.g. practicing & academic MDs and statisticians) hear evidence from the company and FDA and vote on its safety and efficacy. FDA would then rule, considering the panel’s vote, on whether to appro

3 0.92372668 1938 andrew gelman stats-2013-07-14-Learning how to speak

Introduction: I’ve been trying to reduce my American accent when speaking French. I tried taping my voice and playing it back, but that didn’t help. I couldn’t actually tell that I had a strong accent by listening to myself. My own voice is just too familiar to me. Then Malecki told me about the international phonetic alphabet, which is just great. And there’s even a convenient website that translates. For example, le loup est revenu -> lə lu ε ʀəvny I stared at Malecki’s mouth while he said the phrase, and I finally understood the difference between the two different “oo” sounds. That evening at home I tried it out on the local expert and he laughed at my attempts but grudgingly admitted I was getting better. On about the 10th try, after watching him say it over and over and staring at his mouth, I was finally able to do it! I know this is going to sound stupid to all you linguistics experts out there, but I had no idea that you could figure out how to speak better by staring at s

4 0.91558397 615 andrew gelman stats-2011-03-16-Chess vs. checkers

Introduction: Mark Palko writes: Chess derives most of its complexity through differentiated pieces; with checkers the complexity comes from the interaction between pieces. The result is a series of elegant graph problems where the viable paths change with each move of your opponent. To draw an analogy with chess, imagine if moving your knight could allow your opponent’s bishop to move like a rook. Add to that the potential for traps and manipulation that come with forced capture and you have one of the most remarkable games of all time. . . . It’s not unusual to hear masters of both chess and checkers (draughts) admit that they prefer the latter. So why does chess get all the respect? Why do you never see a criminal mastermind or a Bond villain playing in a checkers tournament? Part of the problem is that we learn the game as children so we tend to think of it as a children’s game. We focus on how simple the rules are and miss how much complexity and subtlety you can get out of those ru

5 0.90225893 1676 andrew gelman stats-2013-01-16-Detecting cheating in chess

Introduction: Three different people have pointed me to this post by Ken Regan on statistical evaluation of claims of cheating in chess. So I figured I have to satisfy demand and post something on this. But I have nothing to say. All these topics interest me, but I somehow had difficulty reading through the entire post. I scanned through but what I really wanted to see was some data. Show me a scatterplot, then I’ll get interested. P.S. This is meant as no disparagement of Regan or his blog. I just couldn’t quite get into this particular example.

6 0.89582121 839 andrew gelman stats-2011-08-04-To commenters who are trying to sell something

7 0.89420116 1889 andrew gelman stats-2013-06-08-Using trends in R-squared to measure progress in criminology??

8 0.89221185 94 andrew gelman stats-2010-06-17-SAT stories

9 0.88483626 1105 andrew gelman stats-2012-01-08-Econ debate about prices at a fancy restaurant

10 0.88007295 1083 andrew gelman stats-2011-12-26-The quals and the quants

11 0.87850392 1656 andrew gelman stats-2013-01-05-Understanding regression models and regression coefficients

12 0.87704879 1237 andrew gelman stats-2012-03-30-Statisticians: When We Teach, We Don’t Practice What We Preach

13 0.87345243 1721 andrew gelman stats-2013-02-13-A must-read paper on statistical analysis of experimental data

14 0.86990368 2137 andrew gelman stats-2013-12-17-Replication backlash

15 0.86989522 2281 andrew gelman stats-2014-04-04-The Notorious N.H.S.T. presents: Mo P-values Mo Problems

16 0.8694194 571 andrew gelman stats-2011-02-13-A departmental wiki page?

17 0.86898041 867 andrew gelman stats-2011-08-23-The economics of the mac? A paradox of competition

18 0.86797774 537 andrew gelman stats-2011-01-25-Postdoc Position #1: Missing-Data Imputation, Diagnostics, and Applications

19 0.86586267 1473 andrew gelman stats-2012-08-28-Turing chess run update

20 0.8642987 675 andrew gelman stats-2011-04-22-Arrow’s other theorem