andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-923 knowledge-graph by maker-knowledge-mining

923 andrew gelman stats-2011-09-24-What is the normal range of values in a medical test?


meta info for this blog

Source: html

Introduction: Geoffrey Sheean writes: I am having trouble thinking Bayesianly about the so-called ‘normal’ or ‘reference’ values that I am supposed to use in some of the tests I perform. These values are obtained from purportedly healthy people. Setting aside concerns about ascertainment bias, non-parametric distributions, and the like, the values are usually obtained by setting the limits at ± 2SD from the mean. In some cases, supposedly because of a non-normal distribution, the third highest and lowest value observed in the healthy group sets the limits, on the assumption that no more than 2 results (out of 20 samples) are allowed to exceed these values: if there are 3 or more, then the test is assumed to be abnormal and the reference range is said to reflect the 90th percentile. The results are binary – normal, abnormal. The relevance to the diseased state is this. People who are known unequivocally to have condition X show Y abnormalities in these tests. Therefore, when people suspected
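The two range-setting rules described above can be sketched in a few lines. This is a minimal illustration with invented reference data, not real clinical values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical reference sample: 20 purportedly healthy subjects
# (illustrative values only, not real clinical data).
healthy = rng.normal(loc=50, scale=10, size=20)

# Rule 1: mean +/- 2 SD, covering ~95% of a normal reference population.
mean, sd = healthy.mean(), healthy.std(ddof=1)
lo, hi = mean - 2 * sd, mean + 2 * sd

# Rule 2: third-lowest / third-highest observed value, allowing 2 of 20
# results (10% per tail) to fall outside the limits.
srt = np.sort(healthy)
lo3, hi3 = srt[2], srt[-3]

print(f"±2SD range:        [{lo:.1f}, {hi:.1f}]")
print(f"3rd-extreme range: [{lo3:.1f}, {hi3:.1f}]")
```

Note that Rule 2 says nothing about how far beyond the limits the two allowed extremes may fall, which is exactly the problem raised below with the limit of 55 and observed values over 400.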


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Geoffrey Sheean writes: I am having trouble thinking Bayesianly about the so-called ‘normal’ or ‘reference’ values that I am supposed to use in some of the tests I perform. [sent-1, score-0.335]

2 Setting aside concerns about ascertainment bias, non-parametric distributions, and the like, the values are usually obtained by setting the limits at ± 2SD from the mean. [sent-3, score-0.511]

3 People who are known unequivocally to have condition X show Y abnormalities in these tests. [sent-7, score-0.542]

4 Therefore, when people suspected of having condition X (or something kind of like it) are found to have Y abnormalities (similar or milder), the test is said to support the diagnosis of condition X (or something kind of like it). [sent-8, score-0.779]

5 The problem is that there is no true sensitivity and specificity information because of the lack of a gold standard for many diseases, and in part because condition X and abnormality Y are actually broad categories, rather than specific diseases or states. [sent-9, score-0.821]

6 The findings observed in one situation are extrapolated to broadly similar situations. [sent-10, score-0.207]

7 People who are known to have suffered partial nerve damage, trauma, polio, etc. [sent-13, score-0.326]

8 develop large sized electrical signals in muscles that are connected to the nerves. [sent-14, score-0.292]

9 So, when we see similar large signals in other people, we conclude that they too have suffered nerve injury, even if it is not due to trauma, or polio, or any of the other known circumstances in which large signals were originally seen. [sent-15, score-1.02]

10 If I find a result that is greater than 2SD away from the reference population mean, classical statistics suggests I should think that there is only a 5% chance of finding this result (or worse) in a normal person. [sent-17, score-0.377]

11 From this, I am trained to conclude that this person is abnormal (95% probability). [sent-18, score-0.63]
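The jump from "5% of healthy people fall outside 2SD" to "95% probability this person is abnormal" is the step Bayes' rule exposes. The 2SD cutoff only fixes specificity at roughly 95%; the posterior probability of abnormality also depends on sensitivity and the prevalence of disease, both of which are made-up numbers in this sketch:

```python
# A quick Bayes check on "outside 2SD => 95% probability abnormal".
# The 2SD cutoff fixes specificity at ~95%; the sensitivity and
# prevalence values below are invented for illustration.
def posterior_abnormal(prevalence, sensitivity, specificity):
    """P(abnormal | positive test) via Bayes' rule."""
    p_pos_given_abnormal = sensitivity
    p_pos_given_normal = 1 - specificity
    num = p_pos_given_abnormal * prevalence
    den = num + p_pos_given_normal * (1 - prevalence)
    return num / den

for prev in (0.01, 0.10, 0.50):
    print(prev, round(posterior_abnormal(prev, 0.80, 0.95), 3))
```

With a prevalence of 10% and an assumed sensitivity of 80%, the posterior is 0.64, not 0.95; at 1% prevalence it drops to about 0.14. The "95%" figure is a statement about healthy people, not about the patient in front of you.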

12 Similarly, if I find 3 abnormalities in a sample of 20 from a patient that exceed a certain value, I am to conclude that this person is abnormal (how probable? [sent-19, score-0.942]
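The "how probable?" question for the 3-of-20 rule has a rough answer under a simplifying assumption: if each of a healthy person's 20 samples independently exceeds the cutoff with probability 0.10 (the 2-of-20-per-tail construction), the chance of seeing 3 or more exceedances is given by the binomial distribution:

```python
from math import comb

# Chance that a perfectly normal person shows k or more exceedances
# out of n samples, assuming each sample independently exceeds the
# cutoff with probability p (independence is a strong assumption here).
def p_at_least(k, n, p):
    return 1 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

print(round(p_at_least(3, 20, 0.10), 3))  # roughly 0.32
```

So under these assumptions, roughly a third of healthy people would be flagged by the 3-of-20 criterion, which makes the binary normal/abnormal call look much weaker than it is usually presented.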

13 Isn’t this way of thinking akin to interpreting a really, really low p value as stronger evidence than just a low p value? [sent-23, score-0.449]

14 That is, confusing the probability of finding the evidence with the strength of the evidence? [sent-24, score-0.133]

15 With the second type of reference range, there is no such guidance. [sent-26, score-0.18]

16 For example, the upper limit of a value (3rd highest) for a test was set at 55 but the raw data show some normal subjects had values well over 100 and in one case, over 400, just not more than 2 over 55. [sent-27, score-0.827]

17 So, what does it mean if I see 5 out of 20 values over 55 compared with just 3 out of 20? [sent-28, score-0.267]

18 Am I to adjust the number of abnormal values needed to conclude abnormality to 4 (based on upper limit is 2/20 = 10%, so 10% of 30 is 3)? [sent-30, score-1.222]

19 Is a person with 3/20 values of 126, 89, and 279 more abnormal than someone with 3 values of 58, 63, and 74? [sent-31, score-1.002]

20 My reply: I’d prefer to either use the continuous variable as is, or else create a transformed scale using some clinical understanding. [sent-37, score-0.087]
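One way to act on this advice (a sketch of the general idea, not a prescription from the post) is to report each measurement as a z-score against the reference distribution instead of counting exceedances, using the two patients from sentence 19; the reference mean and SD here are invented:

```python
import numpy as np

# Sketch of using the continuous variable rather than a binary
# normal/abnormal call: express each patient measurement as a z-score
# against the reference distribution. Reference mean/SD are invented.
ref_mean, ref_sd = 40.0, 7.5
patient_a = np.array([126.0, 89.0, 279.0])   # far beyond the cutoff
patient_b = np.array([58.0, 63.0, 74.0])     # just beyond the cutoff

z_a = (patient_a - ref_mean) / ref_sd
z_b = (patient_b - ref_mean) / ref_sd

# Both patients have "3 abnormal values", but the continuous scale
# preserves how extreme those values are.
print("mean z, patient A:", round(z_a.mean(), 1))
print("mean z, patient B:", round(z_b.mean(), 1))
```

Both patients are identical under the 3-of-20 counting rule, but on the continuous scale patient A is dramatically more extreme, which is exactly the information the binary classification throws away.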


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('abnormal', 0.393), ('values', 0.267), ('abnormality', 0.236), ('abnormalities', 0.203), ('value', 0.183), ('reference', 0.18), ('condition', 0.18), ('range', 0.168), ('signals', 0.166), ('conclude', 0.162), ('polio', 0.157), ('trauma', 0.143), ('nerve', 0.135), ('normal', 0.133), ('specificity', 0.129), ('samples', 0.111), ('diseases', 0.111), ('exceed', 0.109), ('suffered', 0.104), ('sensitivity', 0.099), ('healthy', 0.097), ('limits', 0.092), ('obtained', 0.089), ('variable', 0.087), ('known', 0.087), ('limit', 0.082), ('upper', 0.082), ('stronger', 0.081), ('test', 0.08), ('highest', 0.079), ('person', 0.075), ('similar', 0.072), ('diseased', 0.072), ('geoffrey', 0.072), ('unequivocally', 0.072), ('evidence', 0.069), ('trouble', 0.068), ('bayesianly', 0.068), ('diagnosis', 0.068), ('extrapolated', 0.068), ('suspected', 0.068), ('observed', 0.067), ('true', 0.066), ('purportedly', 0.065), ('large', 0.064), ('finding', 0.064), ('setting', 0.063), ('sized', 0.062), ('supposedly', 0.06), ('low', 0.058)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 923 andrew gelman stats-2011-09-24-What is the normal range of values in a medical test?


2 0.13800712 1672 andrew gelman stats-2013-01-14-How do you think about the values in a confidence interval?

Introduction: Philip Jones writes: As an interested reader of your blog, I wondered if you might consider a blog entry sometime on the following question I posed on CrossValidated (StackExchange). I originally posed the question based on my uncertainty about 95% CIs: “Are all values within the 95% CI equally likely (probable), or are the values at the “tails” of the 95% CI less likely than those in the middle of the CI closer to the point estimate?” I posed this question based on discordant information I found at a couple of different web sources (I posted these sources in the body of the question). I received some interesting replies, and the replies were not unanimous, in fact there is some serious disagreement there! After seeing this disagreement, I naturally thought of you, and whether you might be able to clear this up. Please note I am not referring to credible intervals, but rather to the common medical journal reporting standard of confidence intervals. My response: First

3 0.13339524 1955 andrew gelman stats-2013-07-25-Bayes-respecting experimental design and other things

Introduction: Dan Lakeland writes: I have some questions about some basic statistical ideas and would like your opinion on them: 1) Parameters that manifestly DON’T exist: It makes good sense to me to think about Bayesian statistics as narrowing in on the value of parameters based on a model and some data. But there are cases where “the parameter” simply doesn’t make sense as an actual thing. Yet, it’s not really a complete fiction, like unicorns either, it’s some kind of “effective” thing maybe. Here’s an example of what I mean. I did a simple toy experiment where we dropped crumpled up balls of paper and timed their fall times. (see here: http://models.street-artists.org/?s=falling+ball ) It was pretty instructive actually, and I did it to figure out how to in a practical way use an ODE to get a likelihood in MCMC procedures. One of the parameters in the model is the radius of the spherical ball of paper. But the ball of paper isn’t a sphere, not even approximately. There’s no single valu

4 0.11530397 1713 andrew gelman stats-2013-02-08-P-values and statistical practice

Introduction: From my new article in the journal Epidemiology: Sander Greenland and Charles Poole accept that P values are here to stay but recognize that some of their most common interpretations have problems. The casual view of the P value as posterior probability of the truth of the null hypothesis is false and not even close to valid under any reasonable model, yet this misunderstanding persists even in high-stakes settings (as discussed, for example, by Greenland in 2011). The formal view of the P value as a probability conditional on the null is mathematically correct but typically irrelevant to research goals (hence, the popularity of alternative—if wrong—interpretations). A Bayesian interpretation based on a spike-and-slab model makes little sense in applied contexts in epidemiology, political science, and other fields in which true effects are typically nonzero and bounded (thus violating both the “spike” and the “slab” parts of the model). I find Greenland and Poole’s perspective t

5 0.10672116 494 andrew gelman stats-2010-12-31-Type S error rates for classical and Bayesian single and multiple comparison procedures

Introduction: Type S error: When your estimate is the wrong sign, compared to the true value of the parameter Type M error: When the magnitude of your estimate is far off, compared to the true value of the parameter More here.

6 0.10165884 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?

7 0.09555915 1527 andrew gelman stats-2012-10-10-Another reason why you can get good inferences from a bad model

8 0.095429897 1218 andrew gelman stats-2012-03-18-Check your missing-data imputations using cross-validation

9 0.094054148 1605 andrew gelman stats-2012-12-04-Write This Book

10 0.091360323 1900 andrew gelman stats-2013-06-15-Exploratory multilevel analysis when group-level variables are of importance

11 0.088520341 1142 andrew gelman stats-2012-01-29-Difficulties with the 1-4-power transformation

12 0.086700186 1941 andrew gelman stats-2013-07-16-Priors

13 0.086685993 341 andrew gelman stats-2010-10-14-Confusion about continuous probability densities

14 0.086467206 2295 andrew gelman stats-2014-04-18-One-tailed or two-tailed?

15 0.085586712 1228 andrew gelman stats-2012-03-25-Continuous variables in Bayesian networks

16 0.083360434 1760 andrew gelman stats-2013-03-12-Misunderstanding the p-value

17 0.083265886 2176 andrew gelman stats-2014-01-19-Transformations for non-normal data

18 0.083137244 1691 andrew gelman stats-2013-01-25-Extreem p-values!

19 0.082999572 2258 andrew gelman stats-2014-03-21-Random matrices in the news

20 0.081295848 315 andrew gelman stats-2010-10-03-He doesn’t trust the fit . . . r=.999


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.164), (1, 0.049), (2, 0.065), (3, -0.058), (4, 0.022), (5, -0.027), (6, 0.047), (7, 0.016), (8, 0.007), (9, -0.023), (10, -0.054), (11, -0.01), (12, 0.006), (13, -0.048), (14, -0.007), (15, 0.003), (16, 0.018), (17, -0.014), (18, 0.026), (19, -0.047), (20, 0.025), (21, 0.015), (22, 0.001), (23, -0.019), (24, 0.03), (25, 0.017), (26, 0.004), (27, -0.014), (28, -0.007), (29, 0.033), (30, 0.042), (31, 0.029), (32, 0.027), (33, 0.05), (34, -0.007), (35, -0.016), (36, 0.045), (37, 0.039), (38, -0.005), (39, 0.02), (40, -0.001), (41, -0.07), (42, 0.001), (43, 0.007), (44, -0.022), (45, -0.022), (46, 0.03), (47, 0.056), (48, 0.03), (49, 0.032)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96908504 923 andrew gelman stats-2011-09-24-What is the normal range of values in a medical test?


2 0.80351084 791 andrew gelman stats-2011-07-08-Censoring on one end, “outliers” on the other, what can we do with the middle?

Introduction: This post was written by Phil. A medical company is testing a cancer drug. They get 16 genetically identical (or nearly identical) rats that all have the same kind of tumor, give 8 of them the drug and leave 8 untreated…or maybe they give them a placebo, I don’t know; is there a placebo effect in rats? Anyway, after a while the rats are killed and examined. If the tumors in the treated rats are smaller than the tumors in the untreated rats, then all of the rats have their blood tested for dozens of different proteins that are known to be associated with tumor growth or suppression. If there is a “significant” difference in one of the protein levels, then the working assumption is that the drug increases or decreases levels of that protein and that may be the mechanism by which the drug affects cancer. All of the above is done on many different cancer types and possibly several different types of rats. It’s just the initial screening: if things look promising, many more tests an

3 0.73617387 1330 andrew gelman stats-2012-05-19-Cross-validation to check missing-data imputation

Introduction: Aureliano Crameri writes: I have questions regarding one technique you and your colleagues described in your papers: the cross validation (Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box, with reference to Gelman, King, and Liu, 1998). I think this is the technique I need for my purpose, but I am not sure I understand it right. I want to use the multiple imputation to estimate the outcome of psychotherapies based on longitudinal data. First I have to demonstrate that I am able to get unbiased estimates with the multiple imputation. The expected bias is the overestimation of the outcome of dropouts. I will test my imputation strategies by means of a series of simulations (delete values, impute, compare with the original). Due to the complexity of the statistical analyses I think I need at least 200 cases. Now I don’t have so many cases without any missings. My data have missing values in different variables. The proportion of missing values is

4 0.73193347 1918 andrew gelman stats-2013-06-29-Going negative

Introduction: Troels Ring writes: I have measured total phosphorus, TP, on a number of dialysis patients, and also measured conventional phosphate, Pi. Now P is exchanged with the environment as Pi, so in principle a correlation between TP and Pi could perhaps be expected. I’m really most interested in the fraction of TP which is not Pi, that is TP-Pi. I would also expect that to be positively correlated with Pi. However, looking at the data using a mixed model an insignificant negative correlation is obtained. Then I thought, that since TP-Pi is bound to be small if Pi is large a negative correlation is almost dictated by the math even if the biology would have it otherwise in so far as the the TP-Pi, likely organic P, must someday have been Pi. Hence I thought about correcting the slight negative correlation between TP-Pi and Pi for the expected large negative correlation due to the math – to eventually recover what I came from: a positive correlation. People seems to agree that this thinki

5 0.72687626 56 andrew gelman stats-2010-05-28-Another argument in favor of expressing conditional probability statements using the population distribution

Introduction: Yesterday we had a spirited discussion of the following conditional probability puzzle: “I have two children. One is a boy born on a Tuesday. What is the probability I have two boys?” This reminded me of the principle, familiar from statistics instruction and the cognitive psychology literature, that the best way to teach these sorts of examples is through integers rather than fractions. For example, consider this classic problem: “10% of persons have disease X. You are tested for the disease and test positive, and the test has 80% accuracy. What is the probability that you have the disease?” This can be solved directly using conditional probability but it appears to be clearer to do it using integers: Start with 100 people. 10 will have the disease and 90 will not. Of the 10 with the disease, 8 will test positive and 2 will test negative. Of the 90 without the disease, 18 will test positive and 72 will test negative. (72 = 0.8*90.) So, out of the origin

6 0.72445339 53 andrew gelman stats-2010-05-26-Tumors, on the left, or on the right?

7 0.72037017 314 andrew gelman stats-2010-10-03-Disconnect between drug and medical device approval

8 0.71980906 708 andrew gelman stats-2011-05-12-Improvement of 5 MPG: how many more auto deaths?

9 0.70178044 777 andrew gelman stats-2011-06-23-Combining survey data obtained using different modes of sampling

10 0.69104421 212 andrew gelman stats-2010-08-17-Futures contracts, Granger causality, and my preference for estimation to testing

11 0.68224627 996 andrew gelman stats-2011-11-07-Chi-square FAIL when many cells have small expected values

12 0.67720568 1881 andrew gelman stats-2013-06-03-Boot

13 0.67686039 2204 andrew gelman stats-2014-02-09-Keli Liu and Xiao-Li Meng on Simpson’s paradox

14 0.67590874 1114 andrew gelman stats-2012-01-12-Controversy about average personality differences between men and women

15 0.67497432 341 andrew gelman stats-2010-10-14-Confusion about continuous probability densities

16 0.6735152 775 andrew gelman stats-2011-06-21-Fundamental difficulty of inference for a ratio when the denominator could be positive or negative

17 0.67147005 1142 andrew gelman stats-2012-01-29-Difficulties with the 1-4-power transformation

18 0.6701194 1403 andrew gelman stats-2012-07-02-Moving beyond hopeless graphics

19 0.66756999 248 andrew gelman stats-2010-09-01-Ratios where the numerator and denominator both change signs

20 0.66631252 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(9, 0.018), (16, 0.071), (17, 0.011), (21, 0.029), (24, 0.133), (47, 0.011), (63, 0.017), (65, 0.03), (69, 0.181), (73, 0.011), (84, 0.015), (86, 0.036), (89, 0.027), (97, 0.016), (99, 0.294)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.95323336 89 andrew gelman stats-2010-06-16-A historical perspective on financial bailouts

Introduction: Thomas Ferguson and Robert Johnson write : Financial crises are staggeringly costly. Only major wars rival them in the burdens they place on public finances. Taxpayers typically transfer enormous resources to banks, their stockholders, and creditors, while public debt explodes and the economy runs below full employment for years. This paper compares how relatively large, developed countries have handled bailouts over time. It analyzes why some have done better than others at containing costs and protecting taxpayers. The paper argues that political variables – the nature of competition within party systems and voting turnout – help explain why some countries do more than others to limit the moral hazards of bailouts. I know next to nothing about this topic, so I’ll just recommend you click through and read the article yourself. Here’s a bit more: Many recent papers have analyzed financial crises using large data bases filled with cases from all over the world. Our [Ferguson

2 0.9494797 406 andrew gelman stats-2010-11-10-Translating into Votes: The Electoral Impact of Spanish-Language Ballots

Introduction: Dan Hopkins sends along this article : [Hopkins] uses regression discontinuity design to estimate the turnout and election impacts of Spanish-language assistance provided under Section 203 of the Voting Rights Act. Analyses of two different data sets – the Latino National Survey and California 1998 primary election returns – show that Spanish-language assistance increased turnout for citizens who speak little English. The California results also demonstrate that election procedures can influence outcomes, as support for ending bilingual education dropped markedly in heavily Spanish-speaking neighborhoods with Spanish-language assistance. The California analyses find hints of backlash among non-Hispanic white precincts, but not with the same size or certainty. Small changes in election procedures can influence who votes as well as what wins. Beyond the direct relevance of these results, I find this paper interesting as an example of research that is fundamentally quantitative. Th

same-blog 3 0.938658 923 andrew gelman stats-2011-09-24-What is the normal range of values in a medical test?


4 0.93619508 158 andrew gelman stats-2010-07-22-Tenants and landlords

Introduction: Matthew Yglesias and Megan McArdle argue about the economics of landlord/tenant laws in D.C., a topic I know nothing about. But it did remind me of a few stories . . . 1. In grad school, I shared half of a two-family house with three other students. At some point, our landlord (who lived in the other half of the house) decided he wanted to sell the place, so he had a real estate agent coming by occasionally to show the house to people. She was just a flat-out liar (which I guess fits my impression based on screenings of Glengarry Glen Ross). I could never decide, when I was around and she was lying to a prospective buyer, whether to call her on it. Sometimes I did, sometimes I didn’t. 2. A year after I graduated, the landlord actually did sell the place but then, when my friends moved out, he refused to pay back their security deposit. There was some debate about getting the place repainted, I don’t remember the details. So they sued the landlord in Mass. housing court

5 0.92394406 265 andrew gelman stats-2010-09-09-Removing the blindfold: visualising statistical models

Introduction: Hadley Wickham’s talk for Monday 13 Sept at noon in the statistics dept: As the volume of data increases, so too does the complexity of our models. Visualisation is a powerful tool for both understanding how models work, and what they say about a particular dataset. There are very many well-known techniques for visualising data, but far fewer for visualising models. In this talk I [Wickham] will discuss three broad strategies for model visualisation: display the model in the data space; look at all members of a collection; and explore the process of model fitting, not just the end result. I will demonstrate these techniques with two examples: neural networks, and ensembles of linear models. Hey–this is one of my favorite topics!

6 0.92238253 1909 andrew gelman stats-2013-06-21-Job openings at conservative political analytics firm!

7 0.91977072 1759 andrew gelman stats-2013-03-12-How tall is Jon Lee Anderson?

8 0.91768581 656 andrew gelman stats-2011-04-11-Jonathan Chait and I agree about the importance of the fundamentals in determining presidential elections

9 0.91093969 749 andrew gelman stats-2011-06-06-“Sampling: Design and Analysis”: a course for political science graduate students

10 0.91000611 1769 andrew gelman stats-2013-03-18-Tibshirani announces new research result: A significance test for the lasso

11 0.90846431 1357 andrew gelman stats-2012-06-01-Halloween-Valentine’s update

12 0.90519077 1310 andrew gelman stats-2012-05-09-Varying treatment effects, again

13 0.90427434 856 andrew gelman stats-2011-08-16-Our new improved blog! Thanks to Cord Blomquist

14 0.90226686 898 andrew gelman stats-2011-09-10-Fourteen magic words: an update

15 0.89968514 198 andrew gelman stats-2010-08-11-Multilevel modeling in R on a Mac

16 0.89911413 1267 andrew gelman stats-2012-04-17-Hierarchical-multilevel modeling with “big data”

17 0.89489591 1167 andrew gelman stats-2012-02-14-Extra babies on Valentine’s Day, fewer on Halloween?

18 0.89096189 1337 andrew gelman stats-2012-05-22-Question 12 of my final exam for Design and Analysis of Sample Surveys

19 0.88885534 2367 andrew gelman stats-2014-06-10-Spring forward, fall back, drop dead?

20 0.88465071 518 andrew gelman stats-2011-01-15-Regression discontinuity designs: looking for the keys under the lamppost?