
752 andrew gelman stats-2011-06-08-Traffic Prediction

meta information for this blog

Source: html

Introduction: I always thought predicting traffic for a particular day and time would be something easily predicted from historic data with regression. Google Maps now has this feature: It would be good to actually include season, holiday and similar information: the predictions would be better. I wonder if one can find this data easily, or if others have done this work before.
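A minimal sketch of the regression idea in the post, assuming hourly travel-time records are available. The features (weekend, rush hour, summer, and holiday flags), the function names, and the small Gauss-Jordan solver are all illustrative assumptions; nothing here reflects how Google Maps actually produces its predictions.

```python
from datetime import date

def design_row(day: date, hour: int, holidays: set) -> list:
    """Hypothetical features: intercept, weekend, rush hour, summer, holiday."""
    return [
        1.0,
        1.0 if day.weekday() >= 5 else 0.0,
        1.0 if hour in (7, 8, 9, 16, 17, 18) else 0.0,
        1.0 if day.month in (6, 7, 8) else 0.0,
        1.0 if day in holidays else 0.0,
    ]

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(rows, ys):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)

def predict(beta, row):
    """Predicted travel time for one feature row."""
    return sum(b * x for b, x in zip(beta, row))
```

The season and holiday flags are the extra information the post asks for: with them in the design matrix, the fitted holiday coefficient estimates how much a holiday shifts the prediction relative to an ordinary day at the same hour. The normal equations are only solvable when the feature columns are not collinear, so the historic data must contain both holiday and non-holiday observations.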

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 I always thought predicting traffic for a particular day and time would be something easily predicted from historic data with regression. [sent-1, score-2.26]

2 Google Maps now has this feature: It would be good to actually include season, holiday and similar information: the predictions would be better. [sent-2, score-1.21]

3 I wonder if one can find this data easily, or if others have done this work before. [sent-3, score-0.701]
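The sentence scores above appear consistent with a simple rule: split each sentence on whitespace and sum the tf-idf weight of every token that exactly matches a word in this page's tfidf word list. That rule is inferred from the numbers, not documented by the source, and the names `WEIGHTS` and `sentence_score` are hypothetical. A sketch under that assumption:

```python
# Top-N tf-idf weights copied from the word list on this page.
WEIGHTS = {
    'easily': 0.362, 'historic': 0.34, 'holiday': 0.317, 'traffic': 0.305,
    'season': 0.284, 'maps': 0.233, 'feature': 0.215, 'predicted': 0.215,
    'predicting': 0.211, 'predictions': 0.195, 'google': 0.182,
    'wonder': 0.15, 'include': 0.143, 'similar': 0.135, 'would': 0.133,
    'day': 0.132, 'done': 0.122, 'others': 0.114, 'data': 0.109,
    'information': 0.106, 'particular': 0.105, 'always': 0.104,
    'thought': 0.1, 'find': 0.097, 'actually': 0.088, 'something': 0.078,
    'work': 0.069, 'good': 0.066, 'time': 0.066, 'one': 0.04,
}

SENTENCES = [
    "I always thought predicting traffic for a particular day and time "
    "would be something easily predicted from historic data with regression.",
    "Google Maps now has this feature: It would be good to actually include "
    "season, holiday and similar information: the predictions would be better.",
    "I wonder if one can find this data easily, or if others have done "
    "this work before.",
]

def sentence_score(sentence: str, weights: dict) -> float:
    # Whitespace split only: case and attached punctuation are kept, so
    # "Google" and "easily," do not match the keys "google" and "easily".
    # Repeated tokens (e.g. "would" twice) are counted each time.
    return sum(weights.get(token, 0.0) for token in sentence.split())

for s in SENTENCES:
    print(round(sentence_score(s, WEIGHTS), 3))
```

Under this rule the three sentences score 2.26, 1.21 and 0.701, matching the listed values. It also suggests the tokenizer does not strip punctuation or normalize case when scoring, which is why "Google", "Maps", "season," and "easily," contribute nothing.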

similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('easily', 0.362), ('historic', 0.34), ('holiday', 0.317), ('traffic', 0.305), ('season', 0.284), ('maps', 0.233), ('feature', 0.215), ('predicted', 0.215), ('predicting', 0.211), ('predictions', 0.195), ('google', 0.182), ('wonder', 0.15), ('include', 0.143), ('similar', 0.135), ('would', 0.133), ('day', 0.132), ('done', 0.122), ('others', 0.114), ('data', 0.109), ('information', 0.106), ('particular', 0.105), ('always', 0.104), ('thought', 0.1), ('find', 0.097), ('actually', 0.088), ('something', 0.078), ('work', 0.069), ('good', 0.066), ('time', 0.066), ('one', 0.04)]
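The page does not say how `simValue` is computed from these weights. Cosine similarity between sparse tf-idf vectors is a common choice and is consistent with the same-blog value of 1.0, so the sketch below assumes it. `POST` is a subset of the weights above; `OTHER` is a made-up vector for illustration only.

```python
import math

def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[term] for term, w in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Subset of this post's tf-idf weights.
POST = {'easily': 0.362, 'historic': 0.34, 'holiday': 0.317, 'traffic': 0.305}

# Hypothetical weights for another post (not taken from the source).
OTHER = {'holiday': 0.4, 'season': 0.3, 'google': 0.2}

print(cosine(POST, POST))   # a vector compared with itself
print(cosine(POST, OTHER))  # only 'holiday' is shared
```

Only shared terms contribute to the dot product, so two posts with no top words in common score 0 regardless of length.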

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 752 andrew gelman stats-2011-06-08-Traffic Prediction

2 0.14523004 1109 andrew gelman stats-2012-01-09-Google correlate links statistics with minorities

Introduction: John Eppley asks what I make of this: Eppley is guessing the negative spikes are searches getting swamped by holiday season shoppers.

3 0.14107378 580 andrew gelman stats-2011-02-19-Weather visualization with WeatherSpark

Introduction: WeatherSpark: prediction and observation quantiles, historic data, multiple predictors, zoomable, draggable, colorful, wonderful: Via Jure Cuhalev.

4 0.1243095 1697 andrew gelman stats-2013-01-29-Where 36% of all boys end up nowadays

Introduction: My Take a Number feature appears in today’s Times. And here are the graphs that I wish they’d had space to include! Original story here.

5 0.119518 988 andrew gelman stats-2011-11-02-Roads, traffic, and the importance in decision analysis of carefully examining your goals

Introduction: Sandeep Baliga writes: [In a recent study, Gilles Duranton and Matthew Turner write:] For interstate highways in metropolitan areas we [Duranton and Turner] find that VKT (vehicle kilometers traveled) increases one for one with interstate highways, confirming the fundamental law of highway congestion.’ Provision of public transit also simply leads to the people taking public transport being replaced by drivers on the road. Therefore: These findings suggest that both road capacity expansions and extensions to public transit are not appropriate policies with which to combat traffic congestion. This leaves congestion pricing as the main candidate tool to curb traffic congestion. To which I reply: Sure, if your goal is to curb traffic congestion. But what sort of goal is that? Thinking like a microeconomist, my policy goal is to increase people’s utility. Sure, traffic congestion is annoying, but there must be some advantages to driving on that crowded road or pe

6 0.11763766 911 andrew gelman stats-2011-09-15-More data tools worth using from Google

7 0.1171422 563 andrew gelman stats-2011-02-07-Evaluating predictions of political events

8 0.10251337 492 andrew gelman stats-2010-12-30-That puzzle-solving feeling

9 0.099413678 1508 andrew gelman stats-2012-09-23-Speaking frankly

10 0.098209873 1649 andrew gelman stats-2013-01-02-Back when 50 miles was a long way

11 0.096212842 2308 andrew gelman stats-2014-04-27-White stripes and dead armadillos

12 0.088909313 1287 andrew gelman stats-2012-04-28-Understanding simulations in terms of predictive inference?

13 0.085296579 1750 andrew gelman stats-2013-03-05-Watership Down, thick description, applied statistics, immutability of stories, and playing tennis with a net

14 0.082679726 737 andrew gelman stats-2011-05-30-Memorial Day question

15 0.081928357 1167 andrew gelman stats-2012-02-14-Extra babies on Valentine’s Day, fewer on Halloween?

16 0.081136644 1980 andrew gelman stats-2013-08-13-Test scores and grades predict job performance (but maybe not at Google)

17 0.079057775 162 andrew gelman stats-2010-07-25-Darn that Lindsey Graham! (or, “Mr. P Predicts the Kagan vote”)

18 0.076663867 315 andrew gelman stats-2010-10-03-He doesn’t trust the fit . . . r=.999

19 0.076589808 207 andrew gelman stats-2010-08-14-Pourquoi Google search est devenu plus raisonnable?

20 0.075322084 2181 andrew gelman stats-2014-01-21-The Commissar for Traffic presents the latest Five-Year Plan

similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.119), (1, -0.008), (2, -0.005), (3, 0.031), (4, 0.051), (5, -0.017), (6, 0.002), (7, -0.014), (8, 0.005), (9, 0.003), (10, 0.029), (11, -0.011), (12, 0.021), (13, -0.032), (14, -0.054), (15, 0.038), (16, 0.038), (17, -0.018), (18, 0.025), (19, -0.001), (20, -0.028), (21, 0.025), (22, -0.01), (23, 0.018), (24, -0.015), (25, 0.001), (26, 0.016), (27, -0.024), (28, -0.002), (29, 0.015), (30, 0.05), (31, -0.043), (32, 0.002), (33, -0.007), (34, 0.016), (35, 0.02), (36, -0.011), (37, -0.013), (38, 0.007), (39, -0.006), (40, 0.02), (41, -0.001), (42, 0.019), (43, 0.027), (44, -0.032), (45, 0.007), (46, 0.048), (47, -0.038), (48, 0.03), (49, -0.082)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.94537038 752 andrew gelman stats-2011-06-08-Traffic Prediction

2 0.74171728 228 andrew gelman stats-2010-08-24-A new efficient lossless compression algorithm

Introduction: Frank Wood and Nick Bartlett write: Deplump works the same as all probabilistic lossless compressors. A datastream is fed one observation at a time into a predictor which emits both the data stream and predictions about what the next observation in the stream should be for every observation. An encoder takes this output and produces a compressed stream which can be piped over a network or to a file. A receiver then takes this stream and decompresses it by doing everything in reverse. In order to ensure that the decoder has the same information available to it that the encoder had when compressing the stream, the decoded datastream is both emitted and directed to another predictor. This second predictor’s job is to produce exactly the same predictions as the initial predictor so that the decoder has the same information at every step of the process as the encoder did. The difference between probabilistic lossless compressors is in the prediction engine, encoding and decoding bein

3 0.72315776 910 andrew gelman stats-2011-09-15-Google Refine

Introduction: Tools worth knowing about: Google Refine is a power tool for working with messy data, cleaning it up, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase. A recent discussion on the Polmeth list about the ANES Cumulative File is a setting where I think Refine might help (admittedly 49760×951 is bigger than I’d really like to deal with in the browser with js… but on a subset yes). [I might write this example up later.] Go watch the screencast videos for Refine. Data-entry problems are rampant in stuff we all use — leading or trailing spaces; mixed decimal-indicators; different units or transformations used in the same column; mixed lettercase leading to false duplicates; that’s only the beginning. Refine certainly would help find duplicates, and it counts things for you too. Just counting rows is too much for researchers sometimes (see yesterday’s post)! Refine 2.0 adds some data-collection tools for

4 0.72070426 911 andrew gelman stats-2011-09-15-More data tools worth using from Google

Introduction: Speaking of open data and google tools, see this post from Revolution R: How to use a Google Spreadsheet as data in R.

5 0.7165947 118 andrew gelman stats-2010-06-30-Question & Answer Communities

Introduction: StackOverflow has been a popular community where software developers would help one another. Recently they raised some VC funding, and to make profits they are selling job postings and expanding the model to other areas. Metaoptimize LLC has started a similar website, using the open-source OSQA framework for such as statistics and machine learning. Here’s a description: You and other data geeks can ask and answer questions on machine learning, natural language processing, artificial intelligence, text analysis, information retrieval, search, data mining, statistical modeling, and data visualization. Here you can ask and answer questions, comment and vote for the questions of others and their answers. Both questions and answers can be revised and improved. Questions can be tagged with the relevant keywords to simplify future access and organize the accumulated material. If you work very hard on your questions and answers, you will receive badges like “Guru”, “Studen

6 0.7163583 358 andrew gelman stats-2010-10-20-When Kerry Met Sally: Politics and Perceptions in the Demand for Movies

7 0.71551132 1559 andrew gelman stats-2012-11-02-The blog is back

8 0.70016551 1434 andrew gelman stats-2012-07-29-FindTheData.org

9 0.69890344 192 andrew gelman stats-2010-08-08-Turning pages into data

10 0.69456565 544 andrew gelman stats-2011-01-29-Splitting the data

11 0.69378263 253 andrew gelman stats-2010-09-03-Gladwell vs Pinker

12 0.68829161 1447 andrew gelman stats-2012-08-07-Reproducible science FAIL (so far): What’s stoppin people from sharin data and code?

13 0.68757546 1823 andrew gelman stats-2013-04-24-The Tweets-Votes Curve

14 0.68637002 724 andrew gelman stats-2011-05-21-New search engine for data & statistics

15 0.68147492 563 andrew gelman stats-2011-02-07-Evaluating predictions of political events

16 0.68068004 1357 andrew gelman stats-2012-06-01-Halloween-Valentine’s update

17 0.67866534 677 andrew gelman stats-2011-04-24-My NOAA story

18 0.67738068 1175 andrew gelman stats-2012-02-19-Factual – a new place to find data

19 0.67639291 569 andrew gelman stats-2011-02-12-Get the Data

20 0.66863847 1212 andrew gelman stats-2012-03-14-Controversy about a ranking of philosophy departments, or How should we think about statistical results when we can’t see the raw data?

similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(12, 0.136), (21, 0.06), (24, 0.275), (27, 0.051), (76, 0.054), (77, 0.052), (86, 0.038), (99, 0.177)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.88037789 241 andrew gelman stats-2010-08-29-Ethics and statistics in development research

Introduction: From Bannerjee and Duflo, “The Experimental Approach to Development Economics,” Annual Review of Economics (2009): One issue with the explicit acknowledgment of randomization as a fair way to allocate the program is that implementers may find that the easiest way to present it to the community is to say that an expansion of the program is planned for the control areas in the future (especially when such is indeed the case, as in phased-in design). I can’t quite figure out whether Bannerjee and Duflo are saying that they would lie and tell people that an expansion is planned when it isn’t, or whether they’re deploring that other people do it. I’m not bothered by a lot of the deception in experimental research–for example, I think the Milgram obedience experiment was just fine–but somehow the above deception bothers me. It just seems wrong to tell people that an expansion is planned if it’s not. P.S. Overall the article is pretty good. My only real problem with it is that

2 0.87769222 1706 andrew gelman stats-2013-02-04-Too many MC’s not enough MIC’s, or What principles should govern attempts to summarize bivariate associations in large multivariate datasets?

Introduction: Justin Kinney writes: Since your blog has discussed the “maximal information coefficient” (MIC) of Reshef et al., I figured you might want to see the critique that Gurinder Atwal and I have posted. In short, Reshef et al.’s central claim that MIC is “equitable” is incorrect. We [Kinney and Atwal] offer mathematical proof that the definition of “equitability” Reshef et al. propose is unsatisfiable—no nontrivial dependence measure, including MIC, has this property. Replicating the simulations in their paper with modestly larger data sets validates this finding. The heuristic notion of equitability, however, can be formalized instead as a self-consistency condition closely related to the Data Processing Inequality. Mutual information satisfies this new definition of equitability but MIC does not. We therefore propose that simply estimating mutual information will, in many cases, provide the sort of dependence measure Reshef et al. seek. For background, here are my two p

3 0.8776325 1978 andrew gelman stats-2013-08-12-Fixing the race, ethnicity, and national origin questions on the U.S. Census

Introduction: In his new book, “What is Your Race? The Census and Our Flawed Efforts to Classify Americans,” former Census Bureau director Ken Prewitt recommends taking the race question off the decennial census: He recommends gradual changes, integrating the race and national origin questions while improving both. In particular, he would replace the main “race” question by a “race or origin” question, with the instruction to “Mark one or more” of the following boxes: “White,” “Black, African Am., or Negro,” “Hispanic, Latino, or Spanish origin,” “American Indian or Alaska Native,” “Asian”, “Native Hawaiian or Other Pacific Islander,” and “Some other race or origin.” Then the next question is to write in “specific race, origin, or enrolled or principal tribe.” Prewitt writes: His suggestion is to go with these questions in 2020 and 2030, then in 2040 “drop the race question and use only the national origin question.” He’s also relying on the American Community Survey to gather a lo

4 0.87742138 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors

Introduction: A couple days ago we discussed some remarks by Tony O’Hagan and Jim Berger on weakly informative priors. Jim followed up on Deborah Mayo’s blog with this: Objective Bayesian priors are often improper (i.e., have infinite total mass), but this is not a problem when they are developed correctly. But not every improper prior is satisfactory. For instance, the constant prior is known to be unsatisfactory in many situations. The ‘solution’ pseudo-Bayesians often use is to choose a constant prior over a large but bounded set (a ‘weakly informative’ prior), saying it is now proper and so all is well. This is not true; if the constant prior on the whole parameter space is bad, so will be the constant prior over the bounded set. The problem is, in part, that some people confuse proper priors with subjective priors and, having learned that true subjective priors are fine, incorrectly presume that weakly informative proper priors are fine. I have a few reactions to this: 1. I agree

5 0.8758595 482 andrew gelman stats-2010-12-23-Capitalism as a form of voluntarism

Introduction: Interesting discussion by Alex Tabarrok (following up on an article by Rebecca Solnit) on the continuum between voluntarism (or, more generally, non-cash transactions) and markets with monetary exchange. I just have a few comments of my own: 1. Solnit writes of “the iceberg economy,” which she characterizes as “based on gift economies, barter, mutual aid, and giving without hope of return . . . the relations between friends, between family members, the activities of volunteers or those who have chosen their vocation on principle rather than for profit.” I just wonder whether “barter” completely fits in here. Maybe it depends on context. Sometimes barter is an informal way of keeping track (you help me and I help you), but in settings of low liquidity I could imagine barter being simply an inefficient way of performing an economic transaction. 2. I am no expert on capitalism but my impression is that it’s not just about “competition and selfishness” but also is related to the

6 0.87568283 1479 andrew gelman stats-2012-09-01-Mothers and Moms

7 0.87536198 1891 andrew gelman stats-2013-06-09-“Heterogeneity of variance in experimental studies: A challenge to conventional interpretations”

8 0.87477446 38 andrew gelman stats-2010-05-18-Breastfeeding, infant hyperbilirubinemia, statistical graphics, and modern medicine

9 0.87466472 938 andrew gelman stats-2011-10-03-Comparing prediction errors

10 0.87412024 1869 andrew gelman stats-2013-05-24-In which I side with Neyman over Fisher

11 0.8731091 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies

12 0.87184322 2231 andrew gelman stats-2014-03-03-Running into a Stan Reference by Accident

13 0.8716898 2143 andrew gelman stats-2013-12-22-The kluges of today are the textbook solutions of tomorrow.

14 0.87168723 643 andrew gelman stats-2011-04-02-So-called Bayesian hypothesis testing is just as bad as regular hypothesis testing

15 0.87087083 743 andrew gelman stats-2011-06-03-An argument that can’t possibly make sense

16 0.86905396 433 andrew gelman stats-2010-11-27-One way that psychology research is different than medical research

17 0.86877126 1757 andrew gelman stats-2013-03-11-My problem with the Lindley paradox

18 0.86871827 278 andrew gelman stats-2010-09-15-Advice that might make sense for individuals but is negative-sum overall

19 0.86864185 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model

20 0.86693096 1455 andrew gelman stats-2012-08-12-Probabilistic screening to get an approximate self-weighted sample