andrew_gelman_stats andrew_gelman_stats-2010 andrew_gelman_stats-2010-447 knowledge-graph by maker-knowledge-mining

447 andrew gelman stats-2010-12-03-Reinventing the wheel, only more so.


meta info for this blog

Source: html

Introduction: Posted by Phil Price: A blogger (can’t find his name anywhere on his blog) points to an article in the medical literature in 1994 that is…well, it’s shocking, is what it is. This is from the abstract: In Tai’s Model, the total area under a curve is computed by dividing the area under the curve between two designated values on the X-axis (abscissas) into small segments (rectangles and triangles) whose areas can be accurately calculated from their respective geometrical formulas. The total sum of these individual areas thus represents the total area under the curve. Validity of the model is established by comparing total areas obtained from this model to these same areas obtained from graphic method (less than +/- 0.4%). Other formulas widely applied by researchers under- or overestimated total area under a metabolic curve by a great margin Yes, that’s right, this guy has rediscovered the trapezoidal rule. You know, that thing most readers of this blog were taught back in 1
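The "rectangles and triangles" of the abstract are, taken together, just trapezoids, which is why this is the trapezoidal rule. A minimal sketch of that rule follows; the function name is illustrative, and this is the standard textbook method, not Tai's actual code:

```python
def trapezoid_area(xs, ys):
    """Approximate the area under a curve sampled at points (xs, ys).

    Each adjacent pair of points contributes a rectangle (the lower
    height) plus a triangle (the height difference) -- i.e. a trapezoid.
    """
    area = 0.0
    for i in range(len(xs) - 1):
        width = xs[i + 1] - xs[i]
        area += width * (ys[i] + ys[i + 1]) / 2.0
    return area
```

For example, sampling y = x^2 on [0, 1] at 101 evenly spaced points gives an area within about 2e-5 of the exact value 1/3.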


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Posted by Phil Price: A blogger (can’t find his name anywhere on his blog) points to an article in the medical literature in 1994 that is…well, it’s shocking, is what it is. [sent-1, score-0.594]

2 The total sum of these individual areas thus represents the total area under the curve. [sent-3, score-1.191]

3 Validity of the model is established by comparing total areas obtained from this model to these same areas obtained from graphic method (less than +/- 0. [sent-4, score-1.319]

4 Other formulas widely applied by researchers under- or overestimated total area under a metabolic curve by a great margin Yes, that’s right, this guy has rediscovered the trapezoidal rule. [sent-6, score-1.608]

5 You know, that thing most readers of this blog were taught back in 11th or 12th grade, and all med students were taught by freshman year in college. [sent-7, score-0.689]

6 The blogger finds this amusing, but I find it mostly upsetting and sad. [sent-8, score-0.334]

7 Which is sadder: (1) That this paper got past the referees, (2) that it has been cited dozens of times in the medical literature, including this year, (3) that, if the abstract is to be believed, many medical researchers DON’T use an accurate method to calculate the area under a curve. [sent-9, score-1.226]

8 I, too, have published results that I’ve later found were previously published by someone else. [sent-11, score-0.074]

9 But I’ve never done it with something that is taught in high school calculus. [sent-12, score-0.2]

10 And — I’m practically spluttering with indignation — if I wanted to calculate something like the area under a curve, I would at least first see if there is already a known way to do it! [sent-13, score-0.773]

11 I wouldn’t invent an obvious method, name it after myself, and send it to a journal, without it ever occurring to me that, gee, maybe someone else has thought about this already! [sent-14, score-0.287]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('area', 0.321), ('curve', 0.297), ('total', 0.285), ('areas', 0.219), ('taught', 0.2), ('medical', 0.171), ('calculate', 0.158), ('obtained', 0.153), ('blogger', 0.152), ('method', 0.134), ('triangles', 0.123), ('gee', 0.123), ('geometrical', 0.123), ('indignation', 0.123), ('trapezoidal', 0.123), ('abstract', 0.118), ('respective', 0.116), ('med', 0.116), ('metabolic', 0.116), ('rediscovered', 0.111), ('invent', 0.107), ('segments', 0.107), ('shocking', 0.107), ('designated', 0.107), ('upsetting', 0.104), ('dividing', 0.099), ('overestimated', 0.099), ('name', 0.099), ('freshman', 0.097), ('formulas', 0.097), ('practically', 0.093), ('literature', 0.091), ('calculated', 0.087), ('margin', 0.084), ('sum', 0.081), ('referees', 0.081), ('occurring', 0.081), ('anywhere', 0.081), ('computed', 0.08), ('finds', 0.078), ('graphic', 0.078), ('dozens', 0.078), ('established', 0.078), ('already', 0.078), ('accurately', 0.077), ('year', 0.076), ('researchers', 0.075), ('previously', 0.074), ('amusing', 0.073), ('believed', 0.073)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000002 447 andrew gelman stats-2010-12-03-Reinventing the wheel, only more so.


2 0.15743043 1543 andrew gelman stats-2012-10-21-Model complexity as a function of sample size

Introduction: As we get more data, we can fit more model. But at some point we become so overwhelmed by data that, for computational reasons, we can barely do anything at all. Thus, the curve above could be thought of as the product of two curves: a steadily increasing curve showing the statistical ability to fit more complex models with more data, and a steadily decreasing curve showing the computational feasibility of doing so.

3 0.12074624 519 andrew gelman stats-2011-01-16-Update on the generalized method of moments

Introduction: After reading all the comments here I remembered that I’ve actually written a paper on the generalized method of moments–including the bit about maximum likelihood being a special case. The basic idea is simple enough that it must have been rediscovered dozens of times by different people (sort of like the trapezoidal rule ). In our case, we were motivated to (independently) develop the (well-known, but not by me) generalized method of moments as a way of specifying an indirectly-parameterized prior distribution, rather than as a way of estimating parameters from direct data. But the math is the same.

4 0.11811539 549 andrew gelman stats-2011-02-01-“Roughly 90% of the increase in . . .” Hey, wait a minute!

Introduction: Matthew Yglesias links approvingly to the following statement by Michael Mandel: Homeland Security accounts for roughly 90% of the increase in federal regulatory employment over the past ten years. Roughly 90%, huh? That sounds pretty impressive. But wait a minute . . . what if total federal regulatory employment had increased a bit less. Then Homeland Security could’ve accounted for 105% of the increase, or 500% of the increase, or whatever. The point is the change in total employment is the sum of a bunch of pluses and minuses. It happens that, if you don’t count Homeland Security, the total hasn’t changed much–I’m assuming Mandel’s numbers are correct here–and that could be interesting. The “roughly 90%” figure is misleading because, when written as a percent of the total increase, it’s natural to quickly envision it as a percentage that is bounded by 100%. There is a total increase in regulatory employment that the individual agencies sum to, but some margins are p
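The arithmetic trap described above is easy to see with toy numbers (hypothetical figures, not Mandel's actual data): when the net change is the sum of pluses and minuses, one component's share of that net change is not bounded by 100%.

```python
# Hypothetical agency-by-agency employment changes that sum to a
# small net increase of +20.
changes = {
    "Homeland Security": +90,
    "Agency A": -40,
    "Agency B": -30,
    "Agency C": 0,
}

net = sum(changes.values())                    # net increase: +20
share = changes["Homeland Security"] / net     # 4.5, i.e. 450% of the net
```

Had the other agencies shrunk slightly more, the net could approach zero and the "share" would blow up arbitrarily, which is exactly why "roughly 90% of the increase" is a misleading framing.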

5 0.10431424 2299 andrew gelman stats-2014-04-21-Stan Model of the Week: Hierarchical Modeling of Supernovas

Introduction: The Stan Model of the Week showcases research using Stan to push the limits of applied statistics.  If you have a model that you would like to submit for a future post then send us an email . Our inaugural post comes from Nathan Sanders, a graduate student finishing up his thesis on astrophysics at Harvard. Nathan writes, “Core-collapse supernovae, the luminous explosions of massive stars, exhibit an expansive and meaningful diversity of behavior in their brightness evolution over time (their “light curves”). Our group discovers and monitors these events using the Pan-STARRS1 telescope in Hawaii, and we’ve collected a dataset of about 20,000 individual photometric observations of about 80 Type IIP supernovae, the class my work has focused on. While this dataset provides one of the best available tools to infer the explosion properties of these supernovae, due to the nature of extragalactic astronomy (observing from distances 1 billion light years), these light curves typicall

6 0.10252961 1452 andrew gelman stats-2012-08-09-Visually weighting regression displays

7 0.10096172 1283 andrew gelman stats-2012-04-26-Let’s play “Guess the smoother”!

8 0.1000281 2151 andrew gelman stats-2013-12-27-Should statistics have a Nobel prize?

9 0.091097422 723 andrew gelman stats-2011-05-21-Literary blurb translation guide

10 0.088863067 957 andrew gelman stats-2011-10-14-Questions about a study of charter schools

11 0.08800292 1815 andrew gelman stats-2013-04-20-Displaying inferences from complex models

12 0.083694011 712 andrew gelman stats-2011-05-14-The joys of working in the public domain

13 0.082689114 2220 andrew gelman stats-2014-02-22-Quickies

14 0.08173959 596 andrew gelman stats-2011-03-01-Looking for a textbook for a two-semester course in probability and (theoretical) statistics

15 0.080478944 1894 andrew gelman stats-2013-06-12-How to best graph the Beveridge curve, relating the vacancy rate in jobs to the unemployment rate?

16 0.079281166 73 andrew gelman stats-2010-06-08-Observational Epidemiology

17 0.078808613 2245 andrew gelman stats-2014-03-12-More on publishing in journals

18 0.077517577 658 andrew gelman stats-2011-04-11-Statistics in high schools: Towards more accessible conceptions of statistical inference

19 0.073668763 1321 andrew gelman stats-2012-05-15-A statistical research project: Weeding out the fraudulent citations

20 0.071655735 1881 andrew gelman stats-2013-06-03-Boot


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.152), (1, -0.015), (2, -0.016), (3, -0.032), (4, 0.031), (5, 0.004), (6, 0.03), (7, -0.009), (8, 0.003), (9, 0.01), (10, 0.056), (11, 0.016), (12, -0.035), (13, -0.006), (14, -0.033), (15, 0.001), (16, 0.034), (17, 0.004), (18, 0.0), (19, 0.003), (20, 0.02), (21, 0.019), (22, -0.002), (23, -0.028), (24, 0.033), (25, 0.005), (26, -0.036), (27, 0.008), (28, 0.016), (29, 0.013), (30, -0.001), (31, 0.04), (32, 0.011), (33, 0.002), (34, -0.019), (35, -0.003), (36, 0.042), (37, 0.047), (38, -0.004), (39, 0.005), (40, -0.002), (41, -0.048), (42, 0.019), (43, 0.018), (44, 0.04), (45, -0.037), (46, -0.037), (47, 0.043), (48, -0.012), (49, 0.013)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96285605 447 andrew gelman stats-2010-12-03-Reinventing the wheel, only more so.


2 0.72259682 1623 andrew gelman stats-2012-12-14-GiveWell charity recommendations

Introduction: In a rare Christmas-themed post here, I pass along this note from Alexander Berger at GiveWell : We just published a  blog post  following up on the *other* famous piece of evidence for deworming, the Miguel and Kremer experiment from Kenya. They shared data and code from their working paper (!) follow-up finding that deworming increases incomes ten years later, and we came out of the re-analysis feeling more confident in, though not wholly convinced by, the results. We’ve also just released  our new list of top charities  for giving season this year, which I think might be a good fit for your audience. We wrote a  blog post explaining our choices , and have also published extensive reviews of the top charities and the interventions on which they work. Perhaps the most interesting change since last year is the addition of GiveDirectly in the #2 spot; they do direct unconditional cash transfers to people living on less than a dollar a day in Kenya. We think it’s a remarkable mode

3 0.69073999 976 andrew gelman stats-2011-10-27-Geophysicist Discovers Modeling Error (in Economics)

Introduction: Continuing “heckle the press” month here at the blog, I (Bob) found the following “discovery” a little overplayed by David H. Freedman , who was writing for Scientific American in the following article and blog post: Blog: Why Economic Models are Always Wrong Article: A Formula for Economic Calamity The article’s paywalled, but the blog entry isn’t. Apparently, a geophysicist named Jonathan Carter (good luck finding him on the web given only that information) found that when he simulated from a complicated model, then fit the model to the simulated data, he sometimes got different results. What’s more, these differing estimates fit the data equally well but made different predictions on new data. Now we don’t know if the model was identifiable, had different local optima (i.e., multiple modes), how he fit the data, or really anything, but it doesn’t really matter. Reading the comments and article is a depressing exercise in the sociology of science, with clueles

4 0.6734885 1657 andrew gelman stats-2013-01-06-Lee Nguyen Tran Kim Song Shimazaki

Introduction: Andrew Lee writes: I am a recent M.A. graduate in sociology. I am primarily qualitative in method but have been moving in a more mixed-methods direction ever since I discovered sports analytics (Moneyball, Football Outsiders, Wages of Wins, etc.). For my thesis I studied Korean-Americans in education in the health professions through a comparison of Asian ethnic representation in Los Angeles-area medical and dental schools. I did this by counting up different Asian ethnic groups at UC Irvine, USC and Loma Linda University’s medical/dental schools using surnames as an identifier (I coded for ethnicity using an algorithm from the North American Association of Central Cancer Registries which correlated surnames with ethnicity: http://www.naaccr.org/Research/DataAnalysisTools.aspx). The coding was mostly easy, since “Nguyen” and “Tran” is always Vietnamese, “Kim” and “Song” is Korean, “Shimazaki” is Japanese, etc. Now, the first time around I found that Chinese-Americans and

5 0.67229933 945 andrew gelman stats-2011-10-06-W’man < W’pedia, again

Introduction: Blogger Deep Climate looks at another paper by the 2002 recipient of the American Statistical Association’s Founders award. This time it’s not funny, it’s just sad. Here’s Wikipedia on simulated annealing: By analogy with this physical process, each step of the SA algorithm replaces the current solution by a random “nearby” solution, chosen with a probability that depends on the difference between the corresponding function values and on a global parameter T (called the temperature), that is gradually decreased during the process. The dependency is such that the current solution changes almost randomly when T is large, but increasingly “downhill” as T goes to zero. The allowance for “uphill” moves saves the method from becoming stuck at local minima—which are the bane of greedier methods. And here’s Wegman: During each step of the algorithm, the variable that will eventually represent the minimum is replaced by a random solution that is chosen according to a temperature
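The Wikipedia description quoted above can be sketched in a few lines. This is a minimal illustration under assumed defaults (objective, step size, and cooling schedule are all arbitrary choices for demonstration), not any particular published implementation:

```python
import math
import random

def anneal(f, x0, step=0.5, t0=2.0, cooling=0.995, iters=5000, seed=0):
    """Minimize f by simulated annealing: propose a random 'nearby'
    solution, accept downhill moves always and uphill moves with
    probability exp(-delta / T), while the temperature T gradually
    decreases toward zero."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    t = t0
    for _ in range(iters):
        cand = x + rng.uniform(-step, step)  # random nearby solution
        fc = f(cand)
        if fc < fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = cand, fc
        t *= cooling  # cool: behave greedily as T approaches zero
    return x, fx
```

The allowance for uphill moves at high temperature is what lets the walk escape local minima; as T shrinks, the acceptance rule reduces to plain greedy descent.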

6 0.66987604 2311 andrew gelman stats-2014-04-29-Bayesian Uncertainty Quantification for Differential Equations!

7 0.66973138 1412 andrew gelman stats-2012-07-10-More questions on the contagion of obesity, height, etc.

8 0.66894597 1239 andrew gelman stats-2012-04-01-A randomized trial of the set-point diet

9 0.66847062 1585 andrew gelman stats-2012-11-20-“I know you aren’t the plagiarism police, but . . .”

10 0.66517997 675 andrew gelman stats-2011-04-22-Arrow’s other theorem

11 0.66400474 69 andrew gelman stats-2010-06-04-A Wikipedia whitewash

12 0.66102004 2137 andrew gelman stats-2013-12-17-Replication backlash

13 0.66056597 457 andrew gelman stats-2010-12-07-Whassup with phantom-limb treatment?

14 0.65751803 2220 andrew gelman stats-2014-02-22-Quickies

15 0.65381968 1683 andrew gelman stats-2013-01-19-“Confirmation, on the other hand, is not sexy”

16 0.65344226 2191 andrew gelman stats-2014-01-29-“Questioning The Lancet, PLOS, And Other Surveys On Iraqi Deaths, An Interview With Univ. of London Professor Michael Spagat”

17 0.65171981 257 andrew gelman stats-2010-09-04-Question about standard range for social science correlations

18 0.65092558 1254 andrew gelman stats-2012-04-09-In the future, everyone will publish everything.

19 0.64963692 1917 andrew gelman stats-2013-06-28-Econ coauthorship update

20 0.64619046 1387 andrew gelman stats-2012-06-21-Will Tiger Woods catch Jack Nicklaus? And a discussion of the virtues of using continuous data even if your goal is discrete prediction


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(15, 0.025), (16, 0.121), (21, 0.035), (24, 0.176), (35, 0.013), (41, 0.115), (42, 0.014), (52, 0.017), (53, 0.016), (61, 0.012), (62, 0.011), (86, 0.017), (95, 0.053), (99, 0.274)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96331215 447 andrew gelman stats-2010-12-03-Reinventing the wheel, only more so.


2 0.96106124 1214 andrew gelman stats-2012-03-15-Of forecasts and graph theory and characterizing a statistical method by the information it uses

Introduction: Wayne Folta points me to “EigenBracket 2012: Using Graph Theory to Predict NCAA March Madness Basketball” and writes, “I [Folta] have got to believe that he’s simply re-invented a statistical method in a graph-ish context, but don’t know enough to judge.” I have not looked in detail at the method being presented here—I’m not much of college basketball fan—but I’d like to use this as an excuse to make one of my favorite general point, which is that a good way to characterize any statistical method is by what information it uses. The basketball ranking method here uses score differentials between teams in the past season. On the plus side, that is better than simply using one-loss records (which (a) discards score differentials and (b) discards information on who played whom). On the minus side, the method appears to be discretizing the scores (thus throwing away information on the exact score differential) and doesn’t use any external information such as external ratings. A

3 0.95865655 1626 andrew gelman stats-2012-12-16-The lamest, grudgingest, non-retraction retraction ever

Introduction: In politics we’re familiar with the non-apology apology (well described in Wikipedia as “a statement that has the form of an apology but does not express the expected contrition”). Here’s the scientific equivalent: the non-retraction retraction. Sanjay Srivastava points to an amusing yet barfable story of a pair of researchers who (inadvertently, I assume) made a data coding error and were eventually moved to issue a correction notice, but even then refused to fully admit their error. As Srivastava puts it, the story “ended up with Lew [Goldberg] and colleagues [Kibeom Lee and Michael Ashton] publishing a comment on an erratum – the only time I’ve ever heard of that happening in a scientific journal.” From the comment on the erratum: In their “erratum and addendum,” Anderson and Ones (this issue) explained that we had brought their attention to the “potential” of a “possible” misalignment and described the results computed from re-aligned data as being based on a “post-ho

4 0.94970691 1019 andrew gelman stats-2011-11-19-Validation of Software for Bayesian Models Using Posterior Quantiles

Introduction: I love this stuff : This article presents a simulation-based method designed to establish the computational correctness of software developed to fit a specific Bayesian model, capitalizing on properties of Bayesian posterior distributions. We illustrate the validation technique with two examples. The validation method is shown to find errors in software when they exist and, moreover, the validation output can be informative about the nature and location of such errors. We also compare our method with that of an earlier approach. I hope we can put it into Stan.

5 0.94374847 1300 andrew gelman stats-2012-05-05-Recently in the sister blog

Introduction: Culture war: The rules You can only accept capital punishment if you’re willing to have innocent people executed every now and then The politics of America’s increasing economic inequality

6 0.93501031 2311 andrew gelman stats-2014-04-29-Bayesian Uncertainty Quantification for Differential Equations!

7 0.93476379 1923 andrew gelman stats-2013-07-03-Bayes pays!

8 0.93344402 303 andrew gelman stats-2010-09-28-“Genomics” vs. genetics

9 0.93004894 2262 andrew gelman stats-2014-03-23-Win probabilities during a sporting event

10 0.92841458 1422 andrew gelman stats-2012-07-20-Likelihood thresholds and decisions

11 0.92811573 2288 andrew gelman stats-2014-04-10-Small multiples of lineplots > maps (ok, not always, but yes in this case)

12 0.92775428 2204 andrew gelman stats-2014-02-09-Keli Liu and Xiao-Li Meng on Simpson’s paradox

13 0.92703795 2224 andrew gelman stats-2014-02-25-Basketball Stats: Don’t model the probability of win, model the expected score differential.

14 0.92691952 586 andrew gelman stats-2011-02-23-A statistical version of Arrow’s paradox

15 0.92538786 807 andrew gelman stats-2011-07-17-Macro causality

16 0.92517918 1895 andrew gelman stats-2013-06-12-Peter Thiel is writing another book!

17 0.92418939 639 andrew gelman stats-2011-03-31-Bayes: radical, liberal, or conservative?

18 0.92387128 778 andrew gelman stats-2011-06-24-New ideas on DIC from Martyn Plummer and Sumio Watanabe

19 0.923545 503 andrew gelman stats-2011-01-04-Clarity on my email policy

20 0.92351949 2179 andrew gelman stats-2014-01-20-The AAA Tranche of Subprime Science