andrew_gelman_stats andrew_gelman_stats-2010 andrew_gelman_stats-2010-491 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Malecki’s right, this is very cool indeed. P.S. Is it really true that “4.5 million Parisians” ride the Metro every day? Even setting aside that not all the riders are Parisians, I’m guessing that 4.5 million is the number of rides, not the number of people who ride.
sentIndex sentText sentNum sentScore
1 Even setting aside that not all the riders are Parisians, I’m guessing that 4. [sent-6, score-0.663]
2 5 million is the number of rides, not the number of people who ride. [sent-7, score-0.644]
wordName wordTfidf (topN-words)
[('parisians', 0.622), ('ride', 0.4), ('million', 0.273), ('metro', 0.255), ('riders', 0.255), ('rides', 0.247), ('malecki', 0.228), ('number', 0.166), ('guessing', 0.152), ('cool', 0.135), ('aside', 0.132), ('setting', 0.124), ('day', 0.093), ('every', 0.088), ('true', 0.088), ('right', 0.064), ('really', 0.048), ('even', 0.043), ('people', 0.039)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 491 andrew gelman stats-2010-12-29-Don’t try this at home
2 0.12221757 474 andrew gelman stats-2010-12-18-The kind of frustration we could all use more of
Introduction: Nate writes: The Yankees have offered Jeter $45 million over three years — or $15 million per year. . . But that doesn’t mean that the process won’t be frustrating for Jeter, or that there won’t be a few hurt feelings along the way. . . . $45 million, huh? Even after taxes, that’s a lot of money!
3 0.10728284 1006 andrew gelman stats-2011-11-12-Val’s Number Scroll: Helping kids visualize math
Introduction: This looks cool.
4 0.092205547 1845 andrew gelman stats-2013-05-07-Is Felix Salmon wrong on free TV?
Introduction: Mark Palko writes: Salmon is dismissive of the claim that there are fifty million over-the-air television viewers: The 50 million number, by the way, should not be considered particularly reliable: it’s Aereo’s guess as to the number of people who ever watch free-to-air TV, even if they mainly watch cable or satellite. (Maybe they have a hut somewhere with an old rabbit-ear TV in it.) And he strongly suggests the number is not only smaller but shrinking. By comparison, here’s a story from the broadcasting news site TV News Check from June of last year (if anyone has more recent numbers please let me know): According to new research by GfK Media, the number of Americans now relying solely on over-the-air (OTA) television reception increased to almost 54 million, up from 46 million just a year ago. The recently completed survey also found that the demographics of broadcast-only households skew towards younger adults, minorities and lower-income families. As Palko says,
5 0.086023003 1356 andrew gelman stats-2012-05-31-Question 21 of my final exam for Design and Analysis of Sample Surveys
Introduction: 21. A country is divided into three regions with populations of 2 million, 2 million, and 0.5 million, respectively. A survey is done asking about foreign policy opinions. Somebody proposes taking a sample of 50 people from each region. Give a reason why this non-proportional sample would not usually be done, and also a reason why it might actually be a good idea. Solution to question 20 From yesterday: 20. Explain in two sentences why we expect survey respondents to be honest about vote preferences but possibly dishonest about reporting unhealthy behaviors. Solution: Respondents tend to be sincere about vote preferences because this affects the outcome of the poll, and people are motivated to have their candidate poll well. This motivation is typically not present in reporting behaviors; you have no particular reason for wanting to affect the average survey response.
6 0.082978718 2219 andrew gelman stats-2014-02-21-The world’s most popular languages that the Mac documentation hasn’t been translated into
7 0.075849339 526 andrew gelman stats-2011-01-19-“If it saves the life of a single child…” and other nonsense
8 0.074716404 1147 andrew gelman stats-2012-01-30-Statistical Murder
9 0.074592672 1358 andrew gelman stats-2012-06-01-Question 22 of my final exam for Design and Analysis of Sample Surveys
10 0.072853707 1064 andrew gelman stats-2011-12-16-The benefit of the continuous color scale
11 0.072005667 1546 andrew gelman stats-2012-10-24-Hey—has anybody done this study yet?
12 0.071915247 975 andrew gelman stats-2011-10-27-Caffeine keeps your Mac awake
13 0.067145757 1905 andrew gelman stats-2013-06-18-There are no fat sprinters
14 0.066650853 1938 andrew gelman stats-2013-07-14-Learning how to speak
15 0.064356312 448 andrew gelman stats-2010-12-03-This is a footnote in one of my papers
16 0.063948885 924 andrew gelman stats-2011-09-24-“Income can’t be used to predict political opinion”
17 0.063463151 925 andrew gelman stats-2011-09-26-Ethnicity and Population Structure in Personal Naming Networks
18 0.061025057 153 andrew gelman stats-2010-07-17-Tenure-track position at U. North Carolina in survey methods and social statistics
19 0.0587634 894 andrew gelman stats-2011-09-07-Hipmunk FAIL: Graphics without content is not enough
20 0.057498448 2270 andrew gelman stats-2014-03-28-Creating a Lenin-style democracy
topicId topicWeight
[(0, 0.04), (1, -0.02), (2, 0.019), (3, 0.01), (4, 0.01), (5, 0.003), (6, 0.021), (7, -0.003), (8, -0.004), (9, -0.027), (10, -0.013), (11, -0.025), (12, 0.006), (13, 0.007), (14, -0.022), (15, 0.007), (16, 0.032), (17, -0.01), (18, 0.03), (19, 0.0), (20, -0.012), (21, -0.001), (22, -0.006), (23, 0.002), (24, -0.034), (25, -0.008), (26, -0.02), (27, 0.011), (28, 0.001), (29, 0.011), (30, -0.014), (31, -0.044), (32, 0.036), (33, -0.051), (34, 0.016), (35, -0.024), (36, -0.015), (37, -0.001), (38, -0.025), (39, 0.003), (40, -0.027), (41, -0.028), (42, -0.023), (43, -0.023), (44, 0.006), (45, -0.002), (46, -0.02), (47, -0.013), (48, 0.013), (49, -0.005)]
simIndex simValue blogId blogTitle
same-blog 1 0.96365821 491 andrew gelman stats-2010-12-29-Don’t try this at home
2 0.71813554 474 andrew gelman stats-2010-12-18-The kind of frustration we could all use more of
Introduction: Nate writes: The Yankees have offered Jeter $45 million over three years — or $15 million per year. . . But that doesn’t mean that the process won’t be frustrating for Jeter, or that there won’t be a few hurt feelings along the way. . . . $45 million, huh? Even after taxes, that’s a lot of money!
3 0.63166487 68 andrew gelman stats-2010-06-03-…pretty soon you’re talking real money.
Introduction: A New York Times article reports the opening of a half-mile section of bike path, recently built along the west side of Manhattan at a cost of $16M, or roughly $30 million per mile. That’s about $5700 per linear foot. Kinda sounds like a lot, doesn’t it? Well, $30 million per mile for about one car-lane mile is a lot, but it’s not out of line compared to other urban highway construction costs. The Doyle Drive project in San Francisco — a freeway to replace the current old and deteriorating freeway approach to the Golden Gate Bridge — is currently under way at $1 billion for 1.6 miles…but hey, it will have six lanes each way, so that isn’t so bad, at $50 million per lane-mile. And there are other components to the project, too, not just building the highway (there will also be bike paths, landscaping, on- and off-ramps, and so on). All in all it seems roughly in line with the New York bike lane project. Speaking of the Doyle Drive project, one expense was the cost of movin
4 0.63050187 1342 andrew gelman stats-2012-05-24-The Used TV Price is Too Damn High
Introduction: Rohin Dhar points me to this post: At Priceonomics, we’ve learned that our users don’t want to buy used products. Rather, they want to buy inexpensive products, and used items happen to be inexpensive. Let someone else eat the initial depreciation, Priceonomics users will swoop in later and get a good deal. . . . But if you want to buy a used television, you are in for a world of hurt. As you peruse through the Craigslist listings for used TVs, you may notice something surprising – the prices are kind of high. Do a quick check on Amazon and your suspicions will be confirmed; lots of people try to sell their used television for more than that same TV would cost brand new. . . . To test our suspicions that something was amiss in the used television market, we compared used TV prices to the prices of buying them new instead. . . . It turns out, people have very inflated expectations for how much they can sell their used TV. Only 3 of the 26 televisions we analyzed were discounte
Introduction: I was updating my Mac and noticed the following: Lots of obscure European languages there. That got me wondering: what’s the least obscure language not on the above list? Igbo? Swahili? Or maybe Tagalog? I did a quick google and found this list of languages by number of native speakers. Once you see the list, the answer is obvious: Hindi, first language of 295 million people, is not on Apple’s list. The next most popular languages not included: Bengali, Punjabi, Javanese, Wu, Telugu, Marathi, Tamil, Urdu. Wow: most of these are Indian! Then comes Persian and a bunch of others. It turns out that Tagalog, Igbo, and Swahili are way down on this list with 28 million, 24 million, and 26 million native speakers, respectively. Only 26 million for Swahili? This made me want to check the list of languages by total number of speakers. The ranking of most of the languages isn’t much different, but Swahili is now #10, at 140 million. Hindi and Bengali are still th
6 0.60790771 1546 andrew gelman stats-2012-10-24-Hey—has anybody done this study yet?
7 0.60682291 1038 andrew gelman stats-2011-12-02-Donate Your Data to Science!
8 0.59691173 1845 andrew gelman stats-2013-05-07-Is Felix Salmon wrong on free TV?
9 0.58922768 1245 andrew gelman stats-2012-04-03-Redundancy and efficiency: In praise of Penn Station
10 0.58517617 1731 andrew gelman stats-2013-02-21-If a lottery is encouraging addictive gambling, don’t expand it!
11 0.58001906 1127 andrew gelman stats-2012-01-18-The Fixie Bike Index
12 0.5712958 1536 andrew gelman stats-2012-10-16-Using economics to reduce bike theft
13 0.56964302 513 andrew gelman stats-2011-01-12-“Tied for Warmest Year On Record”
14 0.55209225 894 andrew gelman stats-2011-09-07-Hipmunk FAIL: Graphics without content is not enough
15 0.55194587 1057 andrew gelman stats-2011-12-14-Hey—I didn’t know that!
16 0.54415107 737 andrew gelman stats-2011-05-30-Memorial Day question
17 0.54409963 489 andrew gelman stats-2010-12-28-Brow inflation
18 0.53483975 1147 andrew gelman stats-2012-01-30-Statistical Murder
19 0.53033894 2238 andrew gelman stats-2014-03-09-Hipmunk worked
topicId topicWeight
[(9, 0.064), (16, 0.088), (24, 0.127), (53, 0.032), (86, 0.04), (92, 0.402), (99, 0.039)]
simIndex simValue blogId blogTitle
same-blog 1 0.89128613 491 andrew gelman stats-2010-12-29-Don’t try this at home
2 0.6278832 1166 andrew gelman stats-2012-02-13-Recently in the sister blog
Introduction: Lingsanity! / What the sophisticates thought in September 2008 / Political opinions of U.S. military / The origin of essentialist reasoning
Introduction: Last year we discussed an important challenge in causal inference: The standard advice (given in many books, including ours) for causal inference is to control for relevant pre-treatment variables as much as possible. But, as Judea Pearl has pointed out, instruments (as in “instrumental variables”) are pre-treatment variables that we would not want to “control for” in a matching or regression sense. At first, this seems like a minor modification, with the new recommendation being to apply instrumental variables estimation using all pre-treatment instruments, and to control for all other pre-treatment variables. But that can’t really work as general advice. What about weak instruments or covariates that have some instrumental aspects? I asked Paul Rosenbaum for his thoughts on the matter, and he wrote the following: In section 18.2 of Design of Observational Studies (DOS), I [Rosenbaum] discuss “seemingly innocuous confounding” defined to be a covariate that predicts a su
4 0.47621644 2024 andrew gelman stats-2013-09-15-Swiss Jonah Lehrer update
Introduction: Nassim Taleb adds this link to the Dobelli story. I’m confused. I thought Swiss dudes were supposed to plagiarize their own stuff, not rip off other people’s. Whassup with that?
Introduction: A few months ago we discussed Ron Unz’s claim that Jews are massively overrepresented in Ivy League college admissions, not just in comparison to the general population of college-age Americans, but even in comparison to other white kids with comparable academic ability and preparation. Most of Unz’s article concerns admissions of Asian-Americans, and he also has a proposal to admit certain students at random (see my discussion in the link above). In the present post, I concentrate on the statistics about Jewish students, because this is where I have learned that his statistics are particularly suspect, with various numbers being off by factors of 2 or 4 or more. Unz’s article was discussed, largely favorably, by academic bloggers Tyler Cowen, Steve Hsu, and . . . me! Hsu writes: “Don’t miss the statistical supplement.” But a lot of our trust in those statistics seems to be misplaced. Some people have sent me some information showing serious problems with Unz’s methods
6 0.42674014 1563 andrew gelman stats-2012-11-05-Someone is wrong on the internet, part 2
7 0.40590209 1004 andrew gelman stats-2011-11-11-Kaiser Fung on how not to critique models
8 0.35562143 2073 andrew gelman stats-2013-10-22-Ivy Jew update
9 0.34849879 20 andrew gelman stats-2010-05-07-Bayesian hierarchical model for the prediction of soccer results
10 0.3475543 1697 andrew gelman stats-2013-01-29-Where 36% of all boys end up nowadays
11 0.34422445 1108 andrew gelman stats-2012-01-09-Blogging, polemical and otherwise
12 0.33815426 442 andrew gelman stats-2010-12-01-bayesglm in Stata?
13 0.33743361 1751 andrew gelman stats-2013-03-06-Janet Mertz’s response to “The Myth of American Meritocracy”
14 0.33562535 1785 andrew gelman stats-2013-04-02-So much artistic talent
15 0.33029297 2231 andrew gelman stats-2014-03-03-Running into a Stan Reference by Accident
16 0.32392284 1258 andrew gelman stats-2012-04-10-Why display 6 years instead of 30?
17 0.32361653 1008 andrew gelman stats-2011-11-13-Student project competition
18 0.31910408 1293 andrew gelman stats-2012-05-01-Huff the Magic Dragon
19 0.31687808 1046 andrew gelman stats-2011-12-07-Neutral noninformative and informative conjugate beta and gamma prior distributions
20 0.31620273 2225 andrew gelman stats-2014-02-26-A good comment on one of my papers