andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-554 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Yesterday Aleks posted a proposal for a model makers’ Hippocratic Oath. I’d like to add two more items: 1. From Mark Palko: “Our model only describes the data we used to build it; if you go outside of that range, you do so at your own risk.” 2. In case you like to think of your methods as nonparametric or non-model-based: “Our method, just like any model, relies on assumptions which we have the duty to state and to check.” (Observant readers will see that I use “we” rather than “I” in these two items. Modeling is an inherently collaborative endeavor.)
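To make item 1 concrete, here is a minimal sketch in Python of a fitted model that remembers the range of the data used to build it and flags any prediction outside that range. Everything in it is an assumption made for illustration: the class name `RangeGuardedLine`, its `fit` and `predict` methods, and the choice of a one-predictor least-squares line are hypothetical and do not come from the post or from Palko.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class RangeGuardedLine:
    """Least-squares line that records the x-range it was fitted on.

    Hypothetical illustration only; not taken from the original post.
    """
    slope: float
    intercept: float
    x_min: float
    x_max: float

    @classmethod
    def fit(cls, xs: Sequence[float], ys: Sequence[float]) -> "RangeGuardedLine":
        n = len(xs)
        if n < 2 or n != len(ys):
            raise ValueError("need at least two (x, y) pairs of equal length")
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        if sxx == 0:
            raise ValueError("all x values are identical; slope is undefined")
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        slope = sxy / sxx
        # Store the observed range alongside the coefficients.
        return cls(slope, mean_y - slope * mean_x, min(xs), max(xs))

    def predict(self, x: float) -> Tuple[float, bool]:
        """Return (prediction, extrapolating).

        `extrapolating` is True when x lies outside the fitted range,
        so the caller knows they are proceeding at their own risk.
        """
        extrapolating = not (self.x_min <= x <= self.x_max)
        return self.slope * x + self.intercept, extrapolating


if __name__ == "__main__":
    model = RangeGuardedLine.fit([0, 1, 2, 3], [1, 3, 5, 7])
    for x in (2.5, 10):
        value, risky = model.predict(x)
        note = "outside fitted range" if risky else "inside fitted range"
        print(f"x={x}: prediction={value:.1f} ({note})")
```

The sketch returns the flag instead of raising an error on purpose: the oath item does not forbid extrapolation, it only asks that whoever does it knows they are doing it.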
sentIndex sentText sentNum sentScore
1 Yesterday Aleks posted a proposal for a model makers’ Hippocratic Oath. [sent-1, score-0.528]
2 From Mark Palko: “Our model only describes the data we used to build it; if you go outside of that range, you do so at your own risk.” [sent-3, score-0.886]
3 In case you like to think of your methods as nonparametric or non-model-based: “Our method, just like any model, relies on assumptions which we have the duty to state and to check.” [sent-5, score-1.319]
4 (Observant readers will see that I use “we” rather than “I” in these two items. [sent-6, score-0.408]
wordName wordTfidf (topN-words)
[('hippocratic', 0.323), ('makers', 0.266), ('endeavor', 0.266), ('collaborative', 0.25), ('relies', 0.231), ('duty', 0.225), ('inherently', 0.214), ('nonparametric', 0.206), ('aleks', 0.198), ('proposal', 0.196), ('model', 0.191), ('palko', 0.184), ('build', 0.18), ('items', 0.175), ('describes', 0.169), ('yesterday', 0.157), ('range', 0.151), ('outside', 0.151), ('assumptions', 0.141), ('posted', 0.141), ('mark', 0.138), ('add', 0.125), ('two', 0.121), ('method', 0.118), ('readers', 0.116), ('modeling', 0.112), ('state', 0.108), ('like', 0.105), ('methods', 0.094), ('used', 0.078), ('go', 0.073), ('case', 0.069), ('rather', 0.066), ('use', 0.062), ('data', 0.044), ('see', 0.043), ('think', 0.035)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 554 andrew gelman stats-2011-02-04-An addition to the model-makers’ oath
2 0.16937192 552 andrew gelman stats-2011-02-03-Model Makers’ Hippocratic Oath
Introduction: Emanuel Derman and Paul Wilmott wonder how to get their fellow modelers to give up their fantasy of perfection. In a Business Week article they proposed, not entirely in jest, a model makers’ Hippocratic Oath: I will remember that I didn’t make the world and that it doesn’t satisfy my equations. Though I will use models boldly to estimate value, I will not be overly impressed by mathematics. I will never sacrifice reality for elegance without explaining why I have done so. Nor will I give the people who use my model false comfort about its accuracy. Instead, I will make explicit its assumptions and oversights. I understand that my work may have enormous effects on society and the economy, many of them beyond my comprehension. Found via Abductive Intelligence .
3 0.16444728 227 andrew gelman stats-2010-08-23-Visualization magazine
Introduction: Aleks pointed me to this .
4 0.16410065 1318 andrew gelman stats-2012-05-13-Stolen jokes
Introduction: Fun stories here (from Kliph Nesteroff, link from Mark Palko).
5 0.15873945 1063 andrew gelman stats-2011-12-16-Suspicious histogram bars
Introduction: Aleks sent me this (I’m not sure from where):
6 0.14993902 602 andrew gelman stats-2011-03-06-Assumptions vs. conditions
8 0.12176157 281 andrew gelman stats-2010-09-16-NSF crowdsourcing
9 0.11602065 533 andrew gelman stats-2011-01-23-The scalarization of America
10 0.1154122 1302 andrew gelman stats-2012-05-06-Fun with google autocomplete
11 0.10984483 781 andrew gelman stats-2011-06-28-The holes in my philosophy of Bayesian data analysis
12 0.10866502 1858 andrew gelman stats-2013-05-15-Reputations changeable, situations tolerable
13 0.10704986 441 andrew gelman stats-2010-12-01-Mapmaking software
15 0.098045096 99 andrew gelman stats-2010-06-19-Paired comparisons
16 0.094385393 858 andrew gelman stats-2011-08-17-Jumping off the edge of the world
17 0.08853671 244 andrew gelman stats-2010-08-30-Useful models, model checking, and external validation: a mini-discussion
18 0.087760225 1723 andrew gelman stats-2013-02-15-Wacky priors can work well?
19 0.085594609 1425 andrew gelman stats-2012-07-23-Examples of the use of hierarchical modeling to generalize to new settings
topicId topicWeight
[(0, 0.126), (1, 0.085), (2, 0.011), (3, 0.043), (4, 0.018), (5, 0.027), (6, -0.026), (7, 0.007), (8, 0.069), (9, 0.02), (10, 0.039), (11, 0.008), (12, -0.024), (13, 0.008), (14, -0.097), (15, 0.065), (16, 0.028), (17, 0.044), (18, -0.129), (19, -0.109), (20, -0.026), (21, -0.081), (22, 0.003), (23, -0.003), (24, -0.02), (25, -0.01), (26, -0.017), (27, -0.002), (28, -0.011), (29, 0.03), (30, -0.011), (31, -0.022), (32, 0.019), (33, 0.054), (34, 0.01), (35, 0.042), (36, -0.022), (37, 0.022), (38, 0.013), (39, 0.018), (40, -0.017), (41, 0.051), (42, 0.008), (43, -0.001), (44, 0.003), (45, 0.003), (46, -0.049), (47, -0.034), (48, -0.044), (49, 0.027)]
simIndex simValue blogId blogTitle
same-blog 1 0.90376222 554 andrew gelman stats-2011-02-04-An addition to the model-makers’ oath
2 0.66824371 441 andrew gelman stats-2010-12-01-Mapmaking software
Introduction: I can’t use this on my PC, but the link comes from Aleks, so maybe it’s something good!
3 0.64400476 1302 andrew gelman stats-2012-05-06-Fun with google autocomplete
Introduction: Aleks points us to this idea of labeling for news.
4 0.63921684 964 andrew gelman stats-2011-10-19-An interweaving-transformation strategy for boosting MCMC efficiency
Introduction: Yaming Yu and Xiao-Li Meng write in with a cool new idea for improving the efficiency of Gibbs and Metropolis in multilevel models: For a broad class of multilevel models, there exist two well-known competing parameterizations, the centered parameterization (CP) and the non-centered parameterization (NCP), for effective MCMC implementation. Much literature has been devoted to the questions of when to use which and how to compromise between them via partial CP/NCP. This article introduces an alternative strategy for boosting MCMC efficiency via simply interweaving—but not alternating—the two parameterizations. This strategy has the surprising property that failure of both the CP and NCP chains to converge geometrically does not prevent the interweaving algorithm from doing so. It achieves this seemingly magical property by taking advantage of the discordance of the two parameterizations, namely, the sufficiency of CP and the ancillarity of NCP, to substantially reduce the Markovian
5 0.63526618 1406 andrew gelman stats-2012-07-05-Xiao-Li Meng and Xianchao Xie rethink asymptotics
Introduction: In an article catchily entitled, “I got more data, my model is more refined, but my estimator is getting worse! Am I just dumb?”, Meng and Xie write: Possibly, but more likely you are merely a victim of conventional wisdom. More data or better models by no means guarantee better estimators (e.g., with a smaller mean squared error), when you are not following probabilistically principled methods such as MLE (for large samples) or Bayesian approaches. Estimating equations are particularly vulnerable in this regard, almost a necessary price for their robustness. These points will be demonstrated via common tasks of estimating regression parameters and correlations, under simple models such as bivariate normal and ARCH(1). Some general strategies for detecting and avoiding such pitfalls are suggested, including checking for self-efficiency (Meng, 1994, Statistical Science) and adopting a guiding working model. Using the example of estimating the autocorrelation ρ under a statio
6 0.63020372 1064 andrew gelman stats-2011-12-16-The benefit of the continuous color scale
7 0.62963867 1063 andrew gelman stats-2011-12-16-Suspicious histogram bars
8 0.62332612 909 andrew gelman stats-2011-09-15-7 steps to successful infographics
9 0.61867225 281 andrew gelman stats-2010-09-16-NSF crowdsourcing
10 0.61633629 496 andrew gelman stats-2011-01-01-Tukey’s philosophy
11 0.61560923 1004 andrew gelman stats-2011-11-11-Kaiser Fung on how not to critique models
12 0.61421382 227 andrew gelman stats-2010-08-23-Visualization magazine
13 0.61347395 1958 andrew gelman stats-2013-07-27-Teaching is hard
14 0.60645527 1141 andrew gelman stats-2012-01-28-Using predator-prey models on the Canadian lynx series
15 0.60115993 2133 andrew gelman stats-2013-12-13-Flexibility is good
16 0.60092545 1392 andrew gelman stats-2012-06-26-Occam
18 0.59576505 2136 andrew gelman stats-2013-12-16-Whither the “bet on sparsity principle” in a nonsparse world?
19 0.59380591 448 andrew gelman stats-2010-12-03-This is a footnote in one of my papers
20 0.59360641 328 andrew gelman stats-2010-10-08-Displaying a fitted multilevel model
topicId topicWeight
[(15, 0.064), (16, 0.081), (53, 0.066), (81, 0.042), (83, 0.193), (85, 0.03), (86, 0.029), (99, 0.353)]
simIndex simValue blogId blogTitle
1 0.90337545 926 andrew gelman stats-2011-09-26-NYC
Introduction: Our downstairs neighbor hates us. She looks away from us when we see her on the street, if we’re coming into the building at the same time she doesn’t hold open the door, and if we’re in the elevator when it stops on her floor, she refuses to get on. On the other hand, if you’re a sociology professor in Chicago, one of your colleagues might try to run you over in a parking lot. So I guess I’m getting off easy.
Introduction: Thomas Basbøll points to this ten-year-old article from Anne-Wil Harzing on the consequences of sloppy citations. Harzing tells the story of an unsupported claim that is contradicted by published data but has been presented as fact in a particular area of the academic literature. She writes that “high expatriate failure rates [with "expatriate failure" defined as "the expatriate returning home before his/her contractual period of employment abroad expires"] were in fact a myth created by massive misquotations and careless copying of references.” Many papers claimed an expatriate failure rate of 25-40% (according to Harzing, this is much higher than the actual rate as estimated from empirical data), with this overly-high rate supported by a complicated link of references leading to . . . no real data. Harzing reports the following published claims: Harvey (1996: 103): `The rate of failure of expatriate managers relocating overseas from United States based MNCs has been estima
3 0.89523339 1307 andrew gelman stats-2012-05-07-The hare, the pineapple, and Ed Wegman
Introduction: Commenters here are occasionally bothered that I spend so much time attacking frauds and plagiarists. See, for example, here and here . Why go on and on about these losers, given that there are more important problems in the world such as war, pestilence, hunger, and graphs where the y-axis doesn’t go all the way down to zero? Part of the story is that I do research for a living so I resent people who devalue research through misattribution or fraud, in the same way that rich people don’t like counterfeiters. What really bugs me, though, is when cheaters get caught and still don’t admit it. People like Hauser, Wegman, Fischer, and Weick get under my skin because they have the chutzpah to just deny deny deny. The grainy time-stamped videotape with their hand in the cookie jar is right there, and they’ll still talk around the problem. Makes me want to scream. This happens all the time . All. Over. The. Place. Everybody makes mistakes, and just about everybody does thing
same-blog 4 0.89419079 554 andrew gelman stats-2011-02-04-An addition to the model-makers’ oath
5 0.8928681 1294 andrew gelman stats-2012-05-01-Modeling y = a + b + c
Introduction: Brandon Behlendorf writes: I [Behlendorf] am replicating some previous research using OLS [he's talking about what we call "linear regression"---ed.] to regress a logged rate (to reduce skew) of Y on a number of predictors (Xs). Y is the count of a phenomena divided by the population of the unit of the analysis. The problem that I am encountering is that Y is composite count of a number of distinct phenomena [A+B+C], and these phenomena are not uniformly distributed across the sample. Most of the research in this area has conducted regressions either with Y or with individual phenomena [A or B or C] as the dependent variable. Yet it seems that if [A, B, C] are not uniformly distributed across the sample of units in the same proportion, then the use of Y would be biased, since as a count of [A+B+C] divided by the population, it would treat as equivalent units both [2+0.5+1.5] and [4+0+0]. My goal is trying to find a methodology which allows a researcher to regress Y on a
6 0.89044738 1456 andrew gelman stats-2012-08-13-Macro, micro, and conflicts of interest
7 0.8782354 649 andrew gelman stats-2011-04-05-Internal and external forecasting
8 0.86815822 645 andrew gelman stats-2011-04-04-Do you have any idea what you’re talking about?
9 0.86474055 1313 andrew gelman stats-2012-05-11-Question 1 of my final exam for Design and Analysis of Sample Surveys
10 0.86166453 282 andrew gelman stats-2010-09-17-I can’t escape it
11 0.85799378 1554 andrew gelman stats-2012-10-31-It not necessary that Bayesian methods conform to the likelihood principle
12 0.85770494 1389 andrew gelman stats-2012-06-23-Larry Wasserman’s statistics blog
13 0.85739195 1704 andrew gelman stats-2013-02-03-Heuristics for identifying ecological fallacies?
14 0.8560037 1861 andrew gelman stats-2013-05-17-Where do theories come from?
15 0.85331863 1681 andrew gelman stats-2013-01-19-Participate in a short survey about the weight of evidence provided by statistics
16 0.85323811 108 andrew gelman stats-2010-06-24-Sometimes the raw numbers are better than a percentage
17 0.85293186 2329 andrew gelman stats-2014-05-11-“What should you talk about?”
18 0.85283589 1042 andrew gelman stats-2011-12-05-Timing is everything!
19 0.85082954 339 andrew gelman stats-2010-10-13-Battle of the NYT opinion-page economists
20 0.85020304 2070 andrew gelman stats-2013-10-20-The institution of tenure