andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-939 knowledge-graph by maker-knowledge-mining

939 andrew gelman stats-2011-10-03-DBQQ rounding for labeling charts and communicating tolerances

meta information for this blog post

Source: html

Introduction: This is a mini research note, not deserving of a paper, but perhaps useful to others. It reinvents what has already appeared on this blog. Let’s say we have a line chart with numbers between 152.134 and 210.823, with a mean of 183.463. How should we label the chart with about 3 tics? Perhaps 152.134, 181.4785 and 210.823 (the minimum, the midpoint and the maximum)? Don’t do it! The objective is to fit about 3 to 7 tics at the optimal level of rounding. I use the following sequence:
- decimal: fit an integer power and a single-digit decimal i, and round to i * 10^power (example: 100 200 300)
- binary: for a given power, fit a single-digit decimal i and a binary digit b (0, 1), and round to (2*i + b)/2 * 10^power (150 200 250)
- (optional) quaternary: for a given power, fit a single-digit decimal i and a quaternary digit q (0, 1, 2, 3), and round to (4*i + q)/4 * 10^power (150 175 200)
- quinary: for a given power, fit a single-digit decimal i and a quinary digit f (0, 1, 2, 3, 4), and round to (5*i + f)/5 * 10^power (160 180 200)
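The sequence above can be sketched in Python as a search over tick spacings from coarse to fine. This is a minimal sketch under stated assumptions, not the author's code. The following are all choices made here for illustration:
- the hypothetical function name `dbqq_ticks`;
- the `min_ticks`/`max_ticks` defaults of 3 and 7;
- placing tics only at multiples of the step inside the data range;
- the coarse-to-fine search order.

Exact `Fraction` arithmetic keeps tick values such as 0.3 from coming out as 0.30000000000000004.

```python
import math
from fractions import Fraction

# Step mantissas within one power of ten, coarsest first (ordering assumed here):
#   decimal    i          * 10^power  -> step 1   * 10^power
#   binary     (2i + b)/2 * 10^power  -> step 1/2 * 10^power
#   quaternary (4i + q)/4 * 10^power  -> step 1/4 * 10^power
#   quinary    (5i + f)/5 * 10^power  -> step 1/5 * 10^power
SCHEMES = [
    ("decimal", Fraction(1)),
    ("binary", Fraction(1, 2)),
    ("quaternary", Fraction(1, 4)),
    ("quinary", Fraction(1, 5)),
]


def dbqq_ticks(lo, hi, min_ticks=3, max_ticks=7, use_quaternary=True):
    """Hypothetical helper: return (scheme, step, ticks) for the coarsest
    DBQQ grid that puts between min_ticks and max_ticks values in [lo, hi]."""
    if not lo < hi:
        raise ValueError("need lo < hi")
    lo_q, hi_q = Fraction(lo), Fraction(hi)  # exact arithmetic, no float drift
    top = math.floor(math.log10(hi - lo)) + 1  # start one power above the range
    for power in range(top, top - 12, -1):  # walk down at most 12 powers of ten
        for name, mantissa in SCHEMES:
            if name == "quaternary" and not use_quaternary:
                continue  # the quaternary step is optional
            step = mantissa * Fraction(10) ** power
            first = math.ceil(lo_q / step)  # smallest multiple of step >= lo
            last = math.floor(hi_q / step)  # largest multiple of step <= hi
            if min_ticks <= last - first + 1 <= max_ticks:
                ticks = [float(k * step) for k in range(first, last + 1)]
                return name, float(step), ticks
    raise ValueError("no DBQQ grid fits the requested tick count")


if __name__ == "__main__":
    # The range from the example above.
    print(dbqq_ticks(152.134, 210.823))  # → ('quinary', 20.0, [160.0, 180.0, 200.0])
```

On the example range, the sketch rejects the decimal (100), binary (50) and quaternary (25) grids because each yields fewer than 3 tics. It then lands on the quinary labels 160 180 200.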

Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 This is a mini research note, not deserving of a paper, but perhaps useful to others. [sent-1, score-0.191]

2 It reinvents what has already appeared on this blog. [sent-2, score-0.039]

3 Let’s say we have a line chart with numbers between 152.134 and 210.823, with a mean of 183.463. [sent-3, score-0.212]

4 The objective is to fit about 3 to 7 tics at the optimal level of rounding. [sent-13, score-0.206]

5 Rounding can be adapted to ensure sufficient spacing between labels. [sent-15, score-0.246]

6 This rounding reduces the cognitive cost of interpretation and memorization of a chart, along with the linguistic cost of communication of findings. [sent-16, score-0.962]

7 Another application of rounding is communication of measurement tolerance or prediction error. [sent-17, score-0.88]

8 3434 mm, I’m indicating that the measurement was very precise. [sent-19, score-0.169]

9 But if I’m not so accurate, telling you that my measurement was 50 mm indicates binary rounding, with the truth being somewhere between 25 and 75 mm. [sent-20, score-0.702]

10 Telling you it was 75 mm indicates quaternary rounding, with the truth being somewhere between 60 and 90 mm. [sent-21, score-1.259]

11 If I told you it was 80, you’d know the truth is somewhere between 70 and 90. [sent-22, score-0.301]

12 If I told you it was 85, well, then the ‘5’ is subject to binary, quaternary or quinary rounding at the last digit. [sent-23, score-1.247]

13 If the plot is on a nonlinear (for example, logarithmic) scale, one can use exponential rounding to 10^i (10 100 1000). [sent-24, score-0.646]

14 [Edit 10/3/2011] Added a link kindly provided by Brian Diggs. [sent-25, score-0.108]
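Sentences 9 to 13 above describe two further ideas: reading a reported value back as a tolerance, and labeling a nonlinear axis with powers of ten. Below is a minimal Python sketch of both. It rests on two assumptions that the note does not spell out:
- A reported value signals the coarsest decimal, binary, quaternary or quinary grid it lies on.
- The implied tolerance is plus or minus half of that grid's step.

The function names `implied_interval` and `log_ticks` are hypothetical.

```python
import math
from fractions import Fraction

# Step mantissas within one power of ten, coarsest first:
# decimal (1), binary (1/2), quaternary (1/4), quinary (1/5).
MANTISSAS = [
    ("decimal", Fraction(1)),
    ("binary", Fraction(1, 2)),
    ("quaternary", Fraction(1, 4)),
    ("quinary", Fraction(1, 5)),
]


def implied_interval(value, max_power=6, min_power=-6):
    """Return (scheme, low, high) for a reported value.

    Assumption of this sketch: the value lies on the coarsest DBQQ grid
    that contains it, and the truth is within half a step of it.
    """
    v = Fraction(str(value))  # use the written digits, not the float's binary expansion
    for power in range(max_power, min_power - 1, -1):
        for name, mantissa in MANTISSAS:
            step = mantissa * Fraction(10) ** power
            if v % step == 0:  # value is a multiple of this grid's step
                half = step / 2
                return name, float(v - half), float(v + half)
    raise ValueError("no grid down to 10**min_power contains this value")


def log_ticks(lo, hi):
    """Exponential rounding: the powers of ten 10^i inside [lo, hi], lo > 0."""
    k = math.floor(math.log10(lo))
    ticks = []
    while 10.0 ** k <= hi:
        if 10.0 ** k >= lo:
            ticks.append(10.0 ** k)
        k += 1
    return ticks


if __name__ == "__main__":
    for reported in (50, 75, 80):
        print(reported, implied_interval(reported))
    print(log_ticks(5, 2000))
```

Under these assumptions, 50 maps to binary rounding with 25 to 75, and 80 maps to quinary rounding with 70 to 90, as in the note. For 75 the exact half-step interval is 62.5 to 87.5, which the note states more loosely as 60 to 90. `log_ticks(5, 2000)` returns [10.0, 100.0, 1000.0].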

similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('rounding', 0.584), ('quaternary', 0.343), ('decimal', 0.314), ('power', 0.259), ('quinary', 0.257), ('binary', 0.175), ('chart', 0.158), ('tics', 0.156), ('fitting', 0.146), ('truth', 0.119), ('somewhere', 0.119), ('measurement', 0.116), ('round', 0.108), ('indicates', 0.094), ('communication', 0.08), ('telling', 0.079), ('integer', 0.078), ('mini', 0.078), ('mm', 0.078), ('optional', 0.074), ('deserving', 0.074), ('spacing', 0.071), ('adapted', 0.071), ('cost', 0.069), ('kindly', 0.068), ('told', 0.063), ('linguistic', 0.062), ('edit', 0.062), ('exponential', 0.062), ('tolerance', 0.062), ('width', 0.058), ('ensure', 0.057), ('sequence', 0.055), ('reduces', 0.055), ('numbers', 0.054), ('brian', 0.053), ('indicating', 0.053), ('nonlinear', 0.051), ('optimal', 0.05), ('label', 0.049), ('sufficient', 0.047), ('objective', 0.045), ('cognitive', 0.043), ('act', 0.041), ('accurate', 0.04), ('provided', 0.04), ('reference', 0.039), ('perhaps', 0.039), ('appeared', 0.039), ('application', 0.038)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 939 andrew gelman stats-2011-10-03-DBQQ rounding for labeling charts and communicating tolerances

2 0.12762345 1403 andrew gelman stats-2012-07-02-Moving beyond hopeless graphics

Introduction: I was at a talk awhile ago where the speaker presented tables with 4, 5, 6, even 8 significant digits even though, as is usual, only the first or second digit of each number conveyed any useful information. A graph would be better, but even if you’re too lazy to make a plot, a bit of rounding would seem to be required. I mentioned this to a colleague, who responded: I don’t know how to stop this practice. Logic doesn’t work. Maybe ridicule? Best hope is the departure from field who do it. (Theories don’t die, but the people who follow those theories retire.) Another possibility, I think, is helpful software defaults. If we can get to the people who write the software, maybe we could have some impact. Once the software is written, however, it’s probably too late. I’m not far from the center of the R universe, but I don’t know if I’ll ever succeed in my goals of increasing the default number of histogram bars or reducing the default number of decimal places in regression

3 0.12638265 948 andrew gelman stats-2011-10-10-Combining data from many sources

Introduction: Mark Grote writes: I’d like to request general feedback and references for a problem of combining disparate data sources in a regression model. We’d like to model log crop yield as a function of environmental predictors, but the observations come from many data sources and are peculiarly structured. Among the issues are: 1. Measurement precision in predictors and outcome varies widely with data sources. Some observations are in very coarse units of measurement, due to rounding or even observer guesswork. 2. There are obvious clusters of observations arising from studies in which crop yields were monitored over successive years in spatially proximate communities. Thus some variables may be constant within clusters–this is true even for log yield, probably due to rounding of similar yields. 3. Cluster size and intra-cluster association structure (temporal, spatial or both) vary widely across the dataset. My [Grote's] intuition is that we can learn about central tendency

4 0.091682151 135 andrew gelman stats-2010-07-09-Rasmussen sez: “108% of Respondents Say . . .”

Introduction: The recent discussion of pollsters reminded me of a story from a couple years ago that perhaps is still relevant . . . I was looking up the governors’ popularity numbers on the web, and came across this page from Rasmussen Reports which shows Sarah Palin as the 3rd-most-popular governor. But then I looked more carefully. Janet Napolitano of Arizona was viewed as Excellent by 28% of respondents, Good by 27%, Fair by 26%, and Poor by 27%. That adds up to 108%! What’s going on? I’d think they would have a computer program to pipe the survey results directly into the spreadsheet. But I guess not, someone must be typing in these numbers one at a time. Another possibility is that they are altering their numbers by hand, and someone made a mistake with the Napolitano numbers, adding a few percent in one place and forgetting to subtract elsewhere. Or maybe there’s another explanation? P.S. Here are some thoughts from Mark Blumenthal P.P.S. I checked the Rasmussen link toda

5 0.083273701 1944 andrew gelman stats-2013-07-18-You’ll get a high Type S error rate if you use classical statistical methods to analyze data from underpowered studies

Introduction: Brendan Nyhan sends me this article from the research-methods all-star team of Katherine Button, John Ioannidis, Claire Mokrysz, Brian Nosek, Jonathan Flint, Emma Robinson, and Marcus Munafo: A study with low statistical power has a reduced chance of detecting a true effect, but it is less well appreciated that low power also reduces the likelihood that a statistically significant result reflects a true effect. Here, we show that the average statistical power of studies in the neurosciences is very low. The consequences of this include overestimates of effect size and low reproducibility of results. There are also ethical dimensions to this problem, as unreliable research is inefficient and wasteful. Improving reproducibility in neuroscience is a key priority and requires attention to well-established but often ignored methodological principles. I agree completely. In my terminology, with small sample size, the classical approach of looking for statistical significance leads

6 0.077094406 986 andrew gelman stats-2011-11-01-MacKay update: where 12 comes from

7 0.06957005 137 andrew gelman stats-2010-07-10-Cost of communicating numbers

8 0.069212139 1462 andrew gelman stats-2012-08-18-Standardizing regression inputs

9 0.068629593 1460 andrew gelman stats-2012-08-16-“Real data can be a pain”

10 0.06633202 1090 andrew gelman stats-2011-12-28-“. . . extending for dozens of pages”

11 0.064550191 2036 andrew gelman stats-2013-09-24-“Instead of the intended message that being poor is hard, the takeaway is that rich people aren’t very good with money.”

12 0.060236577 1833 andrew gelman stats-2013-04-30-“Tragedy of the science-communication commons”

13 0.058093436 1949 andrew gelman stats-2013-07-21-Defensive political science responds defensively to an attack on social science

14 0.057895385 324 andrew gelman stats-2010-10-07-Contest for developing an R package recommendation system

15 0.057340588 116 andrew gelman stats-2010-06-29-How to grab power in a democracy – in 5 easy non-violent steps

16 0.05688341 446 andrew gelman stats-2010-12-03-Is 0.05 too strict as a p-value threshold?

17 0.055391204 721 andrew gelman stats-2011-05-20-Non-statistical thinking in the US foreign policy establishment

18 0.054826465 328 andrew gelman stats-2010-10-08-Displaying a fitted multilevel model

19 0.054584216 232 andrew gelman stats-2010-08-25-Dodging the diplomats

20 0.053080834 780 andrew gelman stats-2011-06-27-Bridges between deterministic and probabilistic models for binary data

similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.059), (1, 0.007), (2, 0.007), (3, -0.006), (4, 0.032), (5, -0.017), (6, -0.003), (7, -0.008), (8, 0.012), (9, 0.018), (10, -0.004), (11, -0.005), (12, -0.014), (13, -0.007), (14, -0.015), (15, 0.007), (16, 0.002), (17, -0.004), (18, 0.004), (19, -0.009), (20, 0.007), (21, -0.009), (22, 0.014), (23, -0.015), (24, -0.005), (25, 0.011), (26, -0.02), (27, 0.007), (28, 0.007), (29, -0.024), (30, 0.006), (31, 0.043), (32, 0.003), (33, -0.022), (34, 0.014), (35, -0.01), (36, -0.005), (37, -0.009), (38, -0.006), (39, -0.016), (40, 0.035), (41, 0.008), (42, 0.008), (43, 0.019), (44, -0.004), (45, -0.023), (46, -0.015), (47, 0.014), (48, -0.003), (49, -0.003)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96190643 939 andrew gelman stats-2011-10-03-DBQQ rounding for labeling charts and communicating tolerances

2 0.62008017 1346 andrew gelman stats-2012-05-27-Average predictive comparisons when changing a pair of variables

Introduction: Jay Jones writes: I recently came across your paper on average predictive comparisons (Gelman and Pardoe, 2007) and can see many applications for this in my work (I’m an applied statistician working for Weyerhaeuser Company at our R&D center near Seattle). At the moment, I am using APC’s to help describe the results of a hierarchical multi-species model we fit to bird occupancy (presence/absence) data collected in the Oregon Coast Range. A question that came up in our study led me to consider whether the APC framework can be used for post-hoc combinations of inputs. For example, let’s say that after calculating the APC for each individual input in our model, we would like to look at some linear function f of two inputs of interest, u1 and u2. Naively, I would like to be able to plug this into the APC framework. For example, equation 5 in your paper might look something like this (for brevity, I’m omitting the summations): Numerator: w_ij * (E(y|u1_j, u2_j, v_i, the

3 0.61356169 929 andrew gelman stats-2011-09-27-Visual diagnostics for discrete-data regressions

Introduction: Jeff asked me what I thought of this recent AJPS article by Brian Greenhill, Michael Ward, and Audrey Sacks, “The Separation Plot: A New Visual Method for Evaluating the Fit of Binary Models.” It’s similar to a graph of observed vs. predicted values, but using color rather than the y-axis to display the observed values. It seems like it could be useful, also could be applied more generally to discrete-data regressions with more than two categories. When it comes to checking the model fit, I recommend binned residual plots, as discussed in this 2000 article with Yuri Goegebeur, Francis Tuerlinckx, and Iven Van Mechelen.

4 0.61208361 303 andrew gelman stats-2010-09-28-“Genomics” vs. genetics

Introduction: John Cook and Joseph Delaney point to an article by Yurii Aulchenko et al., who write: 54 loci showing strong statistical evidence for association to human height were described, providing us with potential genomic means of human height prediction. In a population-based study of 5748 people, we find that a 54-loci genomic profile explained 4-6% of the sex- and age-adjusted height variance, and had limited ability to discriminate tall/short people. . . . In a family-based study of 550 people, with both parents having height measurements, we find that the Galtonian mid-parental prediction method explained 40% of the sex- and age-adjusted height variance, and showed high discriminative accuracy. . . . The message is that the simple approach of predicting child’s height using a regression model given parents’ average height performs much better than the method they have based on combining 54 genes. They also find that, if you start with the prediction based on parents’ heigh

5 0.61132634 1412 andrew gelman stats-2012-07-10-More questions on the contagion of obesity, height, etc.

Introduction: AT discusses [link broken; see P.P.S. below] a new paper of his that casts doubt on the robustness of the controversial Christakis and Fowler papers. AT writes that he ran some simulations of contagion on social networks and found that (a) in a simple model assuming the contagion of the sort hypothesized by Christakis and Fowler, their procedure would indeed give the sorts of estimates they found in their papers, but (b) in another simple model assuming a different sort of contagion, the C&F estimation would give indistinguishable estimates. Thus, if you believe AT’s simulation model, C&F’s procedure cannot statistically distinguish between two sorts of contagion (directional and simultaneous). I have not looked at AT’s paper so I can’t fully comment, but I don’t fully understand his method for simulating network connections. AT uses what he calls a “rewiring” model. This makes sense: as time progresses, we make new friends and lose old ones—but I am confused by the details

6 0.60091436 2258 andrew gelman stats-2014-03-21-Random matrices in the news

7 0.59763098 1047 andrew gelman stats-2011-12-08-I Am Too Absolutely Heteroskedastic for This Probit Model

8 0.59477735 1403 andrew gelman stats-2012-07-02-Moving beyond hopeless graphics

9 0.59233367 1114 andrew gelman stats-2012-01-12-Controversy about average personality differences between men and women

10 0.59197265 2156 andrew gelman stats-2014-01-01-“Though They May Be Unaware, Newlyweds Implicitly Know Whether Their Marriage Will Be Satisfying”

11 0.59175801 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?

12 0.5916431 1747 andrew gelman stats-2013-03-03-More research on the role of puzzles in processing data graphics

13 0.59050471 706 andrew gelman stats-2011-05-11-The happiness gene: My bottom line (for now)

14 0.58999252 490 andrew gelman stats-2010-12-29-Brain Structure and the Big Five

15 0.58926111 106 andrew gelman stats-2010-06-23-Scientists can read your mind . . . as long as the’re allowed to look at more than one place in your brain and then make a prediction after seeing what you actually did

16 0.58706254 552 andrew gelman stats-2011-02-03-Model Makers’ Hippocratic Oath

17 0.58597952 507 andrew gelman stats-2011-01-07-Small world: MIT, asymptotic behavior of differential-difference equations, Susan Assmann, subgroup analysis, multilevel modeling

18 0.5854218 1690 andrew gelman stats-2013-01-23-When are complicated models helpful in psychology research and when are they overkill?

19 0.58442545 1413 andrew gelman stats-2012-07-11-News flash: Probability and statistics are hard to understand

20 0.58437604 1395 andrew gelman stats-2012-06-27-Cross-validation (What is it good for?)

similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(0, 0.012), (4, 0.019), (5, 0.056), (15, 0.048), (21, 0.045), (24, 0.038), (30, 0.024), (56, 0.024), (59, 0.025), (68, 0.011), (79, 0.325), (99, 0.227)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.89155471 939 andrew gelman stats-2011-10-03-DBQQ rounding for labeling charts and communicating tolerances

2 0.83520883 1538 andrew gelman stats-2012-10-17-Rust

Introduction: I happened to be referring to the path sampling paper today and took a look at Appendix A.2: I’m sure I could reconstruct all of this if I had to, but I certainly can’t read this sort of thing cold anymore.

3 0.8087734 469 andrew gelman stats-2010-12-16-2500 people living in a park in Chicago?

Introduction: Frank Hansen writes: Columbus Park is on Chicago’s west side, in the Austin neighborhood. The park is a big green area which includes a golf course. Here is the google satellite view. Here is the nytimes page. Go to Chicago, and zoom over to the census tract 2521, which is just north of the horizontal gray line (Eisenhower Expressway, aka I290) and just east of Oak Park. The park is labeled on the nytimes map. The census data have around 50 dots (they say 50 people per dot) in the park which has no residential buildings. Congressional district is Danny Davis, IL7. Here’s a map of the district. So, how do we explain the map showing ~50 dots worth of people living in the park. What’s up with the algorithm to place the dots? I dunno. I leave this one to you, the readers.

4 0.80574334 1515 andrew gelman stats-2012-09-29-Jost Haidt

Introduction: Research psychologist John Jost reviews the recent book, “The Righteous Mind,” by research psychologist Jonathan Haidt. Some of my thoughts on Haidt’s book are here . And here’s some of Jost’s review: Haidt’s book is creative, interesting, and provocative. . . . The book shines a new light on moral psychology and presents a bold, confrontational message. From a scientific perspective, however, I worry that his theory raises more questions than it answers. Why do some individuals feel that it is morally good (or necessary) to obey authority, favor the ingroup, and maintain purity, whereas others are skeptical? (Perhaps parenting style is relevant after all.) Why do some people think that it is morally acceptable to judge or even mistreat others such as gay or lesbian couples or, only a generation ago, interracial couples because they dislike or feel disgusted by them, whereas others do not? Why does the present generation “care about violence toward many more classes of victims

5 0.79908371 845 andrew gelman stats-2011-08-08-How adoption speed affects the abandonment of cultural tastes

Introduction: Interesting article by Jonah Berger and Gael Le Mens: Products, styles, and social movements often catch on and become popular, but little is known about why such identity-relevant cultural tastes and practices die out. We demonstrate that the velocity of adoption may affect abandonment: Analysis of over 100 years of data on first-name adoption in both France and the United States illustrates that cultural tastes that have been adopted quickly die faster (i.e., are less likely to persist). Mirroring this aggregate pattern, at the individual level, expecting parents are more hesitant to adopt names that recently experienced sharper increases in adoption. Further analysis indicate that these effects are driven by concerns about symbolic value: Fads are perceived negatively, so people avoid identity-relevant items with sharply increasing popularity because they believe that they will be short lived. Ancillary analyses also indicate that, in contrast to conventional wisdom, identity-r

6 0.78633428 1126 andrew gelman stats-2012-01-18-Bob on Stan

7 0.74389869 1048 andrew gelman stats-2011-12-09-Maze generation algorithms!

8 0.74099165 863 andrew gelman stats-2011-08-21-Bad graph

9 0.72861266 2139 andrew gelman stats-2013-12-19-Happy birthday

10 0.72465062 1379 andrew gelman stats-2012-06-14-Cool-ass signal processing using Gaussian processes (birthdays again)

11 0.7093423 1825 andrew gelman stats-2013-04-25-It’s binless! A program for computing normalizing functions

12 0.7075938 1172 andrew gelman stats-2012-02-17-Rare name analysis and wealth convergence

13 0.67336005 1786 andrew gelman stats-2013-04-03-Hierarchical array priors for ANOVA decompositions

14 0.67112237 1884 andrew gelman stats-2013-06-05-A story of fake-data checking being used to shoot down a flawed analysis at the Farm Credit Agency

15 0.64134276 1229 andrew gelman stats-2012-03-25-Same old story

16 0.62765729 1384 andrew gelman stats-2012-06-19-Slick time series decomposition of the birthdays data

17 0.61875701 2145 andrew gelman stats-2013-12-24-Estimating and summarizing inference for hierarchical variance parameters when the number of groups is small

18 0.61533368 1041 andrew gelman stats-2011-12-04-David MacKay and Occam’s Razor

19 0.59951711 1880 andrew gelman stats-2013-06-02-Flame bait

20 0.59617567 1714 andrew gelman stats-2013-02-09-Partial least squares path analysis