andrew_gelman_stats andrew_gelman_stats-2012 andrew_gelman_stats-2012-1531 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: It’s good to remember that wikis aren’t just for looking up Dylan lyrics and the plots of old Three’s Company episodes.
sentIndex sentText sentNum sentScore
1 It’s good to remember that wikis aren’t just for looking up Dylan lyrics and the plots of old Three’s Company episodes. [sent-1, score-1.477]
wordName wordTfidf (topN-words)
[('lyrics', 0.502), ('dylan', 0.439), ('episodes', 0.439), ('plots', 0.295), ('company', 0.267), ('aren', 0.222), ('remember', 0.219), ('old', 0.202), ('three', 0.172), ('looking', 0.171), ('good', 0.088)]
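The page does not state how the word weights above were computed. As a minimal sketch, assuming a standard smoothed tf-idf with L2 normalisation and cosine similarity (the function names `tfidf_vectors`, `cosine`, and `top_words` are hypothetical, not taken from the source), top-N word lists and post-to-post similarity scores of this general kind could be produced as follows:

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised {word: weight} dict per document."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for tokens in tokenised for word in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        # Smoothed idf is an assumption; the page gives no formula.
        weights = {
            word: count * (math.log((1 + n_docs) / (1 + df[word])) + 1)
            for word, count in tf.items()
        }
        norm = math.sqrt(sum(v * v for v in weights.values()))
        vectors.append({w: v / norm for w, v in weights.items()} if norm else {})
    return vectors


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors."""
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


def top_words(vector, n):
    """Return the n highest-weighted (word, weight) pairs."""
    return sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

Under these assumptions a post compared with itself scores 1.0, and a word that appears in few other posts ranks highest in a post's top-N list.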
simIndex simValue blogId blogTitle
same-blog 1 1.0 1531 andrew gelman stats-2012-10-12-Elderpedia
Introduction: It’s good to remember that wikis aren’t just for looking up Dylan lyrics and the plots of old Three’s Company episodes.
2 0.14844155 1579 andrew gelman stats-2012-11-16-Hacks, maps, and moon rocks: Recent items in the sister blog
Introduction: 1. Oh no . . . Obama is doooooomed!!!!!!!!!!! (Don’t worry, it’s just Pat Caddell and Doug Schoen talking) 2. Red-blue maps for different slices of the population 3. Picasso paintings, moon rocks, and hand-written Beatles lyrics
3 0.14214659 252 andrew gelman stats-2010-09-02-R needs a good function to make line plots
Introduction: More and more I’m thinking that line plots are great. More specifically, two-way grids of line plots on common scales, with one, two, or three lines per plot (enough to show comparisons but not so many that you can’t tell the lines apart). Also dot plots, of the sort that have been masterfully used by Lax and Phillips to show comparisons and trends in support for gay rights. There’s a big step missing, though, and that is to be able to make these graphs as a default. We have to figure out the right way to structure the data so these graphs come naturally. Then when it’s all working, we can talk the Excel people into implementing our ideas. I’m not asking to be paid here; all our ideas are in the public domain and I’m happy for Microsoft or Google or whoever to copy us. P.S. Drew Conway writes: This could be accomplished with ggplot2 using various combinations of the grammar. If I am understanding what you mean by line plots, here are some examples with code. In fact,
4 0.10681077 800 andrew gelman stats-2011-07-13-I like lineplots
Introduction: These particular lineplots are called parallel coordinate plots.
5 0.095242791 111 andrew gelman stats-2010-06-26-Tough love as a style of writing
Introduction: Helen DeWitt links to an interview with Seth Godin, who makes some commonplace but useful observations on jobs and careers. It’s fine, but whenever I read this sort of thing, I get annoyed by the super-aggressive writing style. These internet guys–Seth Godin, Clay Shirky, Philip Greenspun, Jeff Jarvis, and so on–are always getting in your face, telling you how everything you thought was true was wrong. Some of the things these guys say are just silly (for example, Godin’s implication that Bob Dylan is more of a success than the Monkees because Dylan sells more tickets), other times they have interesting insights, but reading any of them for a while just sets me on edge. I can’t take being shouted at, and I get a little tired of hearing over and over again that various people, industries, etc., are dinosaurs. Where does this aggressive style come from? My guess is that it’s coming from the vast supply of “business books” out there. These are books that are supposed to grab yo
6 0.079847969 395 andrew gelman stats-2010-11-05-Consulting: how do you figure out what to charge?
7 0.077498212 2268 andrew gelman stats-2014-03-26-New research journal on observational studies
8 0.075663142 725 andrew gelman stats-2011-05-21-People kept emailing me this one so I think I have to blog something
9 0.073549539 363 andrew gelman stats-2010-10-22-Graphing Likert scale responses
10 0.072965883 1207 andrew gelman stats-2012-03-10-A quick suggestion
11 0.071558684 61 andrew gelman stats-2010-05-31-A data visualization manifesto
12 0.069781475 393 andrew gelman stats-2010-11-04-Estimating the effect of A on B, and also the effect of B on A
13 0.068350084 434 andrew gelman stats-2010-11-28-When Small Numbers Lead to Big Errors
14 0.067950636 174 andrew gelman stats-2010-08-01-Literature and life
15 0.064767115 2057 andrew gelman stats-2013-10-10-Chris Chabris is irritated by Malcolm Gladwell
16 0.062600769 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?
17 0.057700548 324 andrew gelman stats-2010-10-07-Contest for developing an R package recommendation system
18 0.056716077 1451 andrew gelman stats-2012-08-08-Robert Kosara reviews Ed Tufte’s short course
19 0.054884568 991 andrew gelman stats-2011-11-04-Insecure researchers aren’t sharing their data
20 0.054199692 2014 andrew gelman stats-2013-09-09-False memories and statistical analysis
topicId topicWeight
[(0, 0.038), (1, -0.023), (2, -0.012), (3, 0.021), (4, 0.027), (5, -0.014), (6, 0.001), (7, 0.01), (8, 0.01), (9, 0.005), (10, -0.009), (11, -0.004), (12, 0.004), (13, 0.004), (14, -0.016), (15, 0.008), (16, 0.012), (17, -0.016), (18, 0.006), (19, 0.0), (20, 0.002), (21, -0.008), (22, 0.001), (23, 0.016), (24, -0.012), (25, -0.011), (26, -0.022), (27, 0.014), (28, 0.0), (29, 0.019), (30, 0.009), (31, 0.002), (32, 0.01), (33, -0.015), (34, 0.013), (35, 0.009), (36, 0.001), (37, 0.009), (38, -0.012), (39, -0.006), (40, 0.018), (41, 0.006), (42, -0.005), (43, 0.063), (44, -0.03), (45, 0.005), (46, -0.038), (47, 0.008), (48, 0.005), (49, -0.021)]
simIndex simValue blogId blogTitle
same-blog 1 0.94908369 1531 andrew gelman stats-2012-10-12-Elderpedia
Introduction: It’s good to remember that wikis aren’t just for looking up Dylan lyrics and the plots of old Three’s Company episodes.
2 0.64819139 1596 andrew gelman stats-2012-11-29-More consulting experiences, this time in computational linguistics
Introduction: Bob wrote this long comment that I think is worth posting: I [Bob] have done a fair bit of consulting for my small natural language processing company over the past ten years. Like statistics, natural language processing is something many companies think they want, but have no idea how to do themselves. We almost always handed out “free” consulting. Usually on the phone to people who called us out of the blue. Our blog and tutorials Google ranking was pretty much our only approach to marketing other than occasionally going to business-oriented conferences. Our goal was to sell software licenses (because consulting doesn’t scale nor does it provide continuing royalty income), but since so few people knew how to use toolkits like ours, we had to help them along the way. We even provided “free” consulting with our startup license package. We were brutally honest with customers, both about our goals and their goals. Their goals were often incompatible with ours (use company X’
3 0.64640146 1342 andrew gelman stats-2012-05-24-The Used TV Price is Too Damn High
Introduction: Rohin Dhar points me to this post: At Priceonomics, we’ve learned that our users don’t want to buy used products. Rather, they want to buy inexpensive products, and used items happen to be inexpensive. Let someone else eat the initial depreciation, Priceonomics users will swoop in later and get a good deal. . . . But if you want to buy a used television, you are in for a world of hurt. As you peruse through the Craigslist listings for used TVs, you may notice something surprising – the prices are kind of high. Do a quick check on Amazon and your suspicions will be confirmed; lots of people try to sell their used television for more than that same TV would cost brand new. . . . To test our suspicions that something was amiss in the used television market, we compared used TV prices to the prices of buying them new instead. . . . It turns out, people have very inflated expectations for how much they can sell their used TV. Only 3 of the 26 televisions we analyzed were discounte
4 0.62575513 395 andrew gelman stats-2010-11-05-Consulting: how do you figure out what to charge?
Introduction: I’m a physicist by training, statistical data analyst by trade. Although some of my work is pretty standard statistical analysis, more often I work somewhere in a gray area that includes physics, engineering, and statistics. I have very little formal statistics training but I do study in an academic-like way to learn techniques from the literature when I need to. I do some things well but there are big gaps in my stats knowledge compared to anyone who has gone to grad school in statistics. On the other hand, there are big gaps in most statisticians’ physics and engineering knowledge compared to anyone who has gone to grad school in physics. Generally my breadth and depth of knowledge is about right for the kind of work that I do, I think. But last week I was offered a consulting job that might be better done by someone with more conventional stats knowledge than I have. The job involves gene expression in different types of tumors, so it’s “biostatistics” by definition, but the
5 0.62370008 252 andrew gelman stats-2010-09-02-R needs a good function to make line plots
Introduction: More and more I’m thinking that line plots are great. More specifically, two-way grids of line plots on common scales, with one, two, or three lines per plot (enough to show comparisons but not so many that you can’t tell the lines apart). Also dot plots, of the sort that have been masterfully used by Lax and Phillips to show comparisons and trends in support for gay rights. There’s a big step missing, though, and that is to be able to make these graphs as a default. We have to figure out the right way to structure the data so these graphs come naturally. Then when it’s all working, we can talk the Excel people into implementing our ideas. I’m not asking to be paid here; all our ideas are in the public domain and I’m happy for Microsoft or Google or whoever to copy us. P.S. Drew Conway writes: This could be accomplished with ggplot2 using various combinations of the grammar. If I am understanding what you mean by line plots, here are some examples with code. In fact,
6 0.61336988 835 andrew gelman stats-2011-08-02-“The sky is the limit” isn’t such a good thing
8 0.58007896 800 andrew gelman stats-2011-07-13-I like lineplots
9 0.57897902 137 andrew gelman stats-2010-07-10-Cost of communicating numbers
10 0.57464468 1003 andrew gelman stats-2011-11-11-$
11 0.56988335 324 andrew gelman stats-2010-10-07-Contest for developing an R package recommendation system
12 0.56572753 1759 andrew gelman stats-2013-03-12-How tall is Jon Lee Anderson?
13 0.56258935 2080 andrew gelman stats-2013-10-28-Writing for free
14 0.561459 582 andrew gelman stats-2011-02-20-Statisticians vs. everybody else
15 0.55644375 1597 andrew gelman stats-2012-11-29-What is expected of a consultant
16 0.55466139 1559 andrew gelman stats-2012-11-02-The blog is back
17 0.55088091 1127 andrew gelman stats-2012-01-18-The Fixie Bike Index
18 0.54937541 1655 andrew gelman stats-2013-01-05-The statistics software signal
19 0.54800951 1530 andrew gelman stats-2012-10-11-Migrating your blog from Movable Type to WordPress
20 0.54377449 1154 andrew gelman stats-2012-02-04-“Turn a Boring Bar Graph into a 3D Masterpiece”
topicId topicWeight
[(24, 0.133), (52, 0.303), (99, 0.322)]
simIndex simValue blogId blogTitle
1 0.93108273 914 andrew gelman stats-2011-09-16-meta-infographic
Introduction: “Most Popular Infographics you can find around the web” by designer and illustrator Alberto Antoniazzi.
same-blog 2 0.90927649 1531 andrew gelman stats-2012-10-12-Elderpedia
Introduction: It’s good to remember that wikis aren’t just for looking up Dylan lyrics and the plots of old Three’s Company episodes.
3 0.89677668 485 andrew gelman stats-2010-12-25-Unlogging
Introduction: Catherine Bueker writes: I [Bueker] am analyzing the effect of various contextual factors on the voter turnout of naturalized Latino citizens. I have included the natural log of the number of Spanish Language ads run in each state during the election cycle to predict voter turnout. I now want to calculate the predicted probabilities of turnout for those in states with 0 ads, 500 ads, 1000 ads, etc. The problem is that I do not know how to handle the beta coefficient of the LN(Spanish language ads). Is there some way to “unlog” the coefficient? My reply: Calculate these probabilities for specific values of predictors, then graph the predictions of interest. Also, you can average over the other inputs in your model to get summaries. See this article with Pardoe for further discussion.
4 0.89522517 223 andrew gelman stats-2010-08-21-Statoverflow
Introduction: Skirant Vadali writes: I am writing to seek your help in building a community driven Q&A website tentatively called ‘Statistics Analysis’. I am neither a founder of this website nor do I have any financial stake in its success. By way of background to this website, please see Stackoverflow (http://stackoverflow.com/) and Mathoverflow (http://mathoverflow.net/). Stackoverflow is a Q&A website targeted at software developers and is designed to help them ask questions and get answers from other developers. Mathoverflow is a Q&A website targeted at research mathematicians and is designed to help them ask and answer questions from other mathematicians across the world. The success of both these sites in helping their respective communities is a strong indicator that sites designed along these lines are very useful. The company that runs Stackoverflow (who also host Mathoverflow.net) has recently decided to develop other community driven websites for various other topic are
Introduction: Mark Palko points me to a news article by Zack Beauchamp on Jason Richwine, the recent Ph.D. graduate from Harvard’s policy school who left the conservative Heritage Foundation after it came out that his Ph.D. thesis was said to be all about the low IQ’s of Hispanic immigrants. Heritage and others apparently thought this association could discredit their anti-immigration-reform position. Richwine’s mentor Charles Murray was unhappy about the whole episode. Beauchamp’s article is worth reading in that it provides some interesting background, in particular by getting into the details of the Ph.D. review process. In a sense, Beauchamp is too harsh. Flawed Ph.D. theses get published all the time. I’d say that most Ph.D. theses I’ve seen are flawed: usually the plan is to get the papers into shape later, when submitting them to journals. If a student doesn’t go into academia, the thesis typically just sits there and is rarely followed up on. I don’t know the statistics o
6 0.89209211 1301 andrew gelman stats-2012-05-05-Related to z-statistics
7 0.87938523 1686 andrew gelman stats-2013-01-21-Finite-population Anova calculations for models with interactions
8 0.87054795 104 andrew gelman stats-2010-06-22-Seeking balance
9 0.86771536 1246 andrew gelman stats-2012-04-04-Data visualization panel at the New York Public Library this evening!
10 0.86607152 889 andrew gelman stats-2011-09-04-The acupuncture paradox
11 0.84248328 546 andrew gelman stats-2011-01-31-Infovis vs. statistical graphics: My talk tomorrow (Tues) 1pm at Columbia
12 0.83987474 1020 andrew gelman stats-2011-11-20-No no no no no
13 0.8294735 786 andrew gelman stats-2011-07-04-Questions about quantum computing
14 0.82723451 1369 andrew gelman stats-2012-06-06-Your conclusion is only as good as your data
15 0.82413185 948 andrew gelman stats-2011-10-10-Combining data from many sources
16 0.82319331 82 andrew gelman stats-2010-06-12-UnConMax – uncertainty consideration maxims 7 +-- 2
18 0.805511 653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects
19 0.79387593 2265 andrew gelman stats-2014-03-24-On deck this week
20 0.78227943 875 andrew gelman stats-2011-08-28-Better than Dennis the dentist or Laura the lawyer