andrew_gelman_stats andrew_gelman_stats-2012 andrew_gelman_stats-2012-1250 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Jeff pointed me to this article by Nick Berry. It’s kind of fun but of course if you know your opponent will be following this strategy you can figure out how to outwit it. Also, Berry writes that ETAOIN SHRDLU CMFWYP VBGKQJ XZ is the “ordering of letter frequency in English language.” Indeed this is the conventional ordering but nobody thinks it’s right anymore. See here (with further discussion here). I wonder what corpus he’s using. P.S. Klutz was my personal standby.
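The question of which corpus Berry used matters because a "letter frequency ordering" is just a sort of letter counts over whatever text you feed in. The sketch below is a minimal illustration, not Berry's method: the function name `letter_ordering` is hypothetical, and it assumes "frequency" means raw letter counts over running text. Counting over a list of distinct dictionary words instead can give a different ordering.

```python
import string
from collections import Counter


def letter_ordering(text: str) -> str:
    """Return A-Z sorted by descending count in `text` (hypothetical helper).

    Assumes frequency = raw letter occurrences in running text.
    Ties, including letters that never occur, are broken alphabetically
    so the result is deterministic.
    """
    counts = Counter(ch for ch in text.upper() if ch in string.ascii_uppercase)
    return "".join(sorted(string.ascii_uppercase, key=lambda c: (-counts[c], c)))


if __name__ == "__main__":
    # Different input corpora can yield different orderings.
    print(letter_ordering("Jeff pointed me to this article by Nick Berry."))
```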
sentIndex sentText sentNum sentScore
1 It’s kind of fun but of course if you know your opponent will be following this strategy you can figure out how to outwit it. [sent-2, score-0.988]
2 Also, Berry writes that ETAOIN SHRDLU CMFWYP VBGKQJ XZ is the “ordering of letter frequency in English language.” [sent-3, score-0.416]
3 Indeed this is the conventional ordering but nobody thinks it’s right anymore. [sent-4, score-0.996]
wordName wordTfidf (topN-words)
[('ordering', 0.453), ('etaoin', 0.312), ('shrdlu', 0.312), ('corpus', 0.264), ('berry', 0.246), ('opponent', 0.241), ('nick', 0.215), ('frequency', 0.196), ('english', 0.176), ('letter', 0.173), ('thinks', 0.168), ('conventional', 0.165), ('jeff', 0.161), ('strategy', 0.154), ('nobody', 0.139), ('fun', 0.135), ('personal', 0.133), ('pointed', 0.128), ('figure', 0.12), ('kind', 0.117), ('wonder', 0.116), ('indeed', 0.101), ('course', 0.089), ('following', 0.082), ('discussion', 0.076), ('right', 0.071), ('article', 0.061), ('know', 0.05), ('writes', 0.047), ('see', 0.041), ('also', 0.041)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 1250 andrew gelman stats-2012-04-07-Hangman tips
2 0.12042627 2211 andrew gelman stats-2014-02-14-The popularity of certain baby names is falling off the clifffffffffffff
Introduction: Ubs writes: I was looking at baby name data last night and I stumbled upon something curious. I follow the baby names blog occasionally but not regularly, so I’m not sure if it’s been noticed before. Let me present it like this: Take the statement… Of the top 100 boys and top 100 girls names, only ___% contain the letter __. I’m using the SSA baby names page, so that’s U.S. births, and I’m looking at the decade of 2000-2009 (so kids currently aged 4 to 13). Which letters would you expect to have the lowest rate of occurrence? As expected, the lowest score is for Q, which appears zero times. (Jacqueline ranks #104 for girls.) It’s the second lowest that surprised me. (… You can pause and try to guess now. Spoilers to follow.) Of the other big-point Scrabble letters, Z appears in four names (Elizabeth, Zachary, Mackenzie, Zoe) and X in six, of which five are closely related (Alexis, Alexander, Alexandra, Alexa, Alex, Xavier). J is heavily overrepresented, especial
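The statistic in this post ("only ___% contain the letter __") is a presence count: a name counts once if the letter appears anywhere in it, however many times. A minimal sketch, with a hypothetical helper `percent_containing` and no real SSA data:

```python
def percent_containing(names, letter):
    """Percentage of `names` containing `letter` at least once (case-insensitive).

    Hypothetical helper for illustration; each name counts once,
    no matter how many times the letter appears in it.
    """
    if not names:
        raise ValueError("names must be non-empty")
    letter = letter.lower()
    hits = sum(1 for name in names if letter in name.lower())
    return 100.0 * hits / len(names)


if __name__ == "__main__":
    print(percent_containing(["Alexis", "Xavier", "Zoe", "Emma"], "x"))
```

Using the counts quoted in the post, Z appears in four of the 200 names (top 100 boys plus top 100 girls), so the blank would read 4/200 = 2%.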
3 0.10101911 429 andrew gelman stats-2010-11-24-“But you and I don’t learn in isolation either”
Introduction: Indeed.
5 0.092461534 61 andrew gelman stats-2010-05-31-A data visualization manifesto
Introduction: Details matter (at least, they do for me), but we don’t yet have a systematic way of going back and forth between the structure of a graph, its details, and the underlying questions that motivate our visualizations. (Cleveland, Wilkinson, and others have written a bit on how to formalize these connections, and I’ve thought about it too, but we have a ways to go.) I was thinking about this difficulty after reading an article on graphics by some computer scientists that was well-written but to me lacked a feeling for the linkages between substantive/statistical goals and graphical details. I have problems with these issues too, and my point here is not to criticize but to move the discussion forward. When thinking about visualization, how important are the details? Aleks pointed me to this article by Jeffrey Heer, Michael Bostock, and Vadim Ogievetsky, “A Tour through the Visualization Zoo: A survey of powerful visualization techniques, from the obvious to the obscure.” Th
6 0.081692278 2119 andrew gelman stats-2013-12-01-Separated by a common blah blah blah
7 0.080677986 2177 andrew gelman stats-2014-01-19-“The British amateur who debunked the mathematics of happiness”
9 0.076277532 2356 andrew gelman stats-2014-06-02-On deck this week
10 0.072213233 87 andrew gelman stats-2010-06-15-Statistical analysis and visualization of the drug war in Mexico
11 0.071241923 227 andrew gelman stats-2010-08-23-Visualization magazine
12 0.061859142 2353 andrew gelman stats-2014-05-30-I posted this as a comment on a sociology blog
13 0.058948409 1318 andrew gelman stats-2012-05-13-Stolen jokes
14 0.057451472 688 andrew gelman stats-2011-04-30-Why it’s so relaxing to think about social issues
15 0.057184421 670 andrew gelman stats-2011-04-20-Attractive but hard-to-read graph could be made much much better
16 0.057070848 1610 andrew gelman stats-2012-12-06-Yes, checking calibration of probability forecasts is part of Bayesian statistics
17 0.056990139 538 andrew gelman stats-2011-01-25-Postdoc Position #2: Hierarchical Modeling and Statistical Graphics
18 0.056616247 1473 andrew gelman stats-2012-08-28-Turing chess run update
19 0.055962451 280 andrew gelman stats-2010-09-16-Meet Hipmunk, a really cool flight-finder that doesn’t actually work
20 0.055900272 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly
topicId topicWeight
[(0, 0.064), (1, -0.027), (2, -0.021), (3, 0.01), (4, 0.006), (5, -0.019), (6, 0.015), (7, 0.001), (8, 0.009), (9, -0.004), (10, 0.014), (11, 0.001), (12, 0.03), (13, 0.024), (14, 0.007), (15, -0.001), (16, 0.016), (17, 0.008), (18, -0.029), (19, -0.034), (20, 0.001), (21, 0.024), (22, -0.003), (23, -0.008), (24, -0.02), (25, 0.017), (26, -0.018), (27, -0.011), (28, 0.004), (29, -0.013), (30, 0.025), (31, -0.017), (32, -0.019), (33, -0.029), (34, -0.001), (35, -0.006), (36, -0.017), (37, -0.028), (38, 0.011), (39, 0.012), (40, 0.009), (41, 0.006), (42, 0.02), (43, -0.054), (44, -0.016), (45, -0.013), (46, -0.01), (47, 0.007), (48, 0.001), (49, -0.012)]
simIndex simValue blogId blogTitle
same-blog 1 0.95303524 1250 andrew gelman stats-2012-04-07-Hangman tips
2 0.63697809 841 andrew gelman stats-2011-08-06-Twitteo killed the bloggio star . . . Not!
Introduction: Alex Braunstein writes: Thanks for the post . You drove >800 pageviews to my site. That’s >90% of what Robert Scoble’s tweet generated with 184k followers, which I find incredibly impressive. 800 doesn’t sound like so much to me, but I suppose if it’s the right 800 . . .
3 0.60939455 2068 andrew gelman stats-2013-10-18-G+ hangout for Bayesian Data Analysis course now! (actually, in 5 minutes)
Introduction: Here’s the link. When you’re on the hangout, please mute your own microphone! I’ll have the computer point at the blackboard. You can follow along with the slides: for the first hour for the second hour P.S. Apparently there is some limit on number of hangout participants (see comments). I didn’t know about that! Maybe next time I’ll try an “on air” hangout; I will have to learn more about this. Next week the teaching asst will do the course so no hangout, then in two weeks there is no class because it’s the day after Halloween and that’s a holiday around here. So we’ll resume this on Fri 8 Nov. See you then! P.P.S. Those of you who were able to join the hangout: Could you please let me know how the visual and sound quality were? Thanks.
4 0.60556775 1798 andrew gelman stats-2013-04-11-Continuing conflict over conflict statistics
Introduction: Mike Spagat sends along a serious presentation with an ironic title: 18.7 MILLION ANNIHILATED SAYS LEADING EXPERT IN PEER–REVIEWED JOURNAL: AN APPROVED, AUTHORITATIVE, SCIENTIFIC PRESENTATION MADE BY AN EXPERT He’ll be speaking on it at tomorrow’s meeting of the Catastrophes and Conflict Forum of the Royal Society of Medicine in London. All I can say is, it’s a long time since I’ve seen a slide presentation in portrait form. It brings me back to the days of transparency sheets.
5 0.59349108 1660 andrew gelman stats-2013-01-08-Bayesian, Permutable Symmetries
Introduction: Mike Betancourt sends along this paper . Could be interesting, no? Note the heavy tail on the CDF in Figure 3, exhibiting weakened median time since 1999. And, as you can see from the bibliography, the work draws on a variety of sources:
6 0.58815408 2237 andrew gelman stats-2014-03-08-Disagreeing to disagree
7 0.57447916 260 andrew gelman stats-2010-09-07-QB2
8 0.57073921 1676 andrew gelman stats-2013-01-16-Detecting cheating in chess
9 0.56756997 2082 andrew gelman stats-2013-10-30-Berri Gladwell Loken football update
10 0.56521666 1982 andrew gelman stats-2013-08-15-Blaming scientific fraud on the Kuhnians
11 0.56408358 263 andrew gelman stats-2010-09-08-The China Study: fact or fallacy?
12 0.56386989 915 andrew gelman stats-2011-09-17-(Worst) graph of the year
14 0.56166059 1573 andrew gelman stats-2012-11-11-Incredibly strange spam
15 0.55968314 514 andrew gelman stats-2011-01-13-News coverage of statistical issues…how did I do?
16 0.5573414 1503 andrew gelman stats-2012-09-19-“Poor Smokers in New York State Spend 25% of Income on Cigarettes, Study Finds”
17 0.5572226 2203 andrew gelman stats-2014-02-08-“Guys who do more housework get less sex”
19 0.55602914 685 andrew gelman stats-2011-04-29-Data mining and allergies
20 0.5541811 1283 andrew gelman stats-2012-04-26-Let’s play “Guess the smoother”!
topicId topicWeight
[(5, 0.437), (16, 0.055), (24, 0.117), (60, 0.032), (63, 0.025), (99, 0.171)]
simIndex simValue blogId blogTitle
1 0.93772864 224 andrew gelman stats-2010-08-22-Mister P gets married
Introduction: Jeff, Justin, and I write : Gay marriage is not going away as a highly emotional, contested issue. Proposition 8, the California ballot measure that bans same-sex marriage, has seen to that, as it winds its way through the federal courts. But perhaps the public has reached a turning point. And check out the (mildly) dynamic graphics. The picture below is ok but for the full effect you have to click through and play the movie.
2 0.89251649 228 andrew gelman stats-2010-08-24-A new efficient lossless compression algorithm
Introduction: Frank Wood and Nick Bartlett write : Deplump works the same as all probabilistic lossless compressors. A datastream is fed one observation at a time into a predictor which emits both the data stream and predictions about what the next observation in the stream should be for every observation. An encoder takes this output and produces a compressed stream which can be piped over a network or to a file. A receiver then takes this stream and decompresses it by doing everything in reverse. In order to ensure that the decoder has the same information available to it that the encoder had when compressing the stream, the decoded datastream is both emitted and directed to another predictor. This second predictor’s job is to produce exactly the same predictions as the initial predictor so that the decoder has the same information at every step of the process as the encoder did. The difference between probabilistic lossless compressors is in the prediction engine, encoding and decoding bein
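The key property described here is that the decoder runs its own copy of the predictor, fed the same already-decoded history, so it makes exactly the same predictions the encoder did. The sketch below illustrates that lockstep idea under stated assumptions, and is not Deplump's implementation: a toy order-0 adaptive counter (`AdaptivePredictor`, hypothetical) stands in for Deplump's far richer predictor, the encoder is arithmetic coding done with exact `Fraction` arithmetic rather than the finite-precision integer arithmetic a practical coder would use, and the message length `n` is assumed to be sent separately.

```python
import math
from fractions import Fraction


class AdaptivePredictor:
    """Toy order-0 model: P(s) is proportional to (times s was seen + 1)."""

    def __init__(self, alphabet):
        self.alphabet = list(alphabet)
        self.counts = {s: 1 for s in self.alphabet}

    def interval(self, symbol):
        """Return the [low, high) slice of [0, 1) assigned to `symbol`."""
        total = sum(self.counts.values())
        low = Fraction(0)
        for s in self.alphabet:
            p = Fraction(self.counts[s], total)
            if s == symbol:
                return low, low + p
            low += p
        raise KeyError(symbol)

    def symbol_at(self, target):
        """Inverse of interval(): the symbol whose slice contains `target`."""
        for s in self.alphabet:
            low, high = self.interval(s)
            if low <= target < high:
                return s
        raise ValueError("target outside [0, 1)")

    def update(self, symbol):
        self.counts[symbol] += 1


def compress(data, alphabet):
    """Arithmetic-code `data`; return the shortest bit string identifying it."""
    model = AdaptivePredictor(alphabet)
    low, width = Fraction(0), Fraction(1)
    for sym in data:
        a, b = model.interval(sym)          # prediction made before seeing sym
        low, width = low + width * a, width * (b - a)
        model.update(sym)                   # then learn from the observation
    k = 0
    while True:                              # shortest m / 2**k in [low, low + width)
        m = math.ceil(low * 2**k)
        if Fraction(m, 2**k) < low + width:
            return format(m, f"0{k}b") if k else ""
        k += 1


def decompress(bits, n, alphabet):
    """Rebuild `n` symbols with a second predictor kept in lockstep."""
    model = AdaptivePredictor(alphabet)      # same starting state as the encoder's
    code = Fraction(int(bits, 2), 2 ** len(bits)) if bits else Fraction(0)
    low, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(n):
        sym = model.symbol_at((code - low) / width)
        a, b = model.interval(sym)
        low, width = low + width * a, width * (b - a)
        model.update(sym)                    # identical update to the encoder's
        out.append(sym)
    return "".join(out)
```

If the decoder's predictor ever diverged from the encoder's (different initial counts, or updating before rather than after decoding a symbol), the intervals would no longer match and decoding would fail; that is the symmetry the passage stresses.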
3 0.88222206 422 andrew gelman stats-2010-11-20-A Gapminder-like data visualization package
Introduction: Ossama Hamed writes in with a new dynamic graphing software: I have the pleasure to brief you on our Data Visualization software “Trend Compass”. TC is a new concept in viewing statistics and trends in an animated way by displaying in one chart 5 axis (X, Y, Time, Bubble size & Bubble color) instead of just the traditional X and Y axis. . . .
same-blog 4 0.84926939 1250 andrew gelman stats-2012-04-07-Hangman tips
5 0.84426326 665 andrew gelman stats-2011-04-17-Yes, your wish shall be granted (in 25 years)
Introduction: This one was so beautiful I just had to repost it: From the New York Times, 9 Sept 1981: IF I COULD CHANGE PARK SLOPE If I could change Park Slope I would turn it into a palace with queens and kings and princesses to dance the night away at the ball. The trees would look like garden stalks. The lights would look like silver pearls and the dresses would look like soft silver silk. You should see the ball. It looks so luxurious to me. The Park Slope ball is great. Can you guess what street it’s on? “Yes. My street. That’s Carroll Street.” – Jennifer Chatmon, second grade, P.S. 321 This was a few years before my sister told me that she felt safer having a crack house down the block because the cops were surveilling it all the time.
7 0.78980017 87 andrew gelman stats-2010-06-15-Statistical analysis and visualization of the drug war in Mexico
8 0.78152806 513 andrew gelman stats-2011-01-12-“Tied for Warmest Year On Record”
9 0.76392257 164 andrew gelman stats-2010-07-26-A very short story
10 0.73233098 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back
11 0.71008265 1512 andrew gelman stats-2012-09-27-A Non-random Walk Down Campaign Street
12 0.70760238 1286 andrew gelman stats-2012-04-28-Agreement Groups in US Senate and Dynamic Clustering
14 0.68734187 1841 andrew gelman stats-2013-05-04-The Folk Theorem of Statistical Computing
15 0.66889435 2194 andrew gelman stats-2014-02-01-Recently in the sister blog
16 0.65402901 123 andrew gelman stats-2010-07-01-Truth in headlines
17 0.65298307 364 andrew gelman stats-2010-10-22-Politics is not a random walk: Momentum and mean reversion in polling
18 0.64646697 1052 andrew gelman stats-2011-12-11-Rational Turbulence
19 0.61668986 764 andrew gelman stats-2011-06-14-Examining US Legislative process with “Many Bills”
20 0.613846 951 andrew gelman stats-2011-10-11-Data mining efforts for Obama’s campaign