
1879 andrew gelman stats-2013-06-01-Benford’s law and addresses


Meta information for this blog post

Source: html

Introduction: One example we give to illustrate Benford’s law is the first digits of addresses. Javier Marquez Pena had a survey and, just for laffs, he looked at the distribution of first digits: Cool, it really works! P.S. The y-axis shouldn’t go below zero, and I’d much prefer an L-type graphics box (par(bty="l")) rather than the square, but those are familiar problems with R defaults.
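For readers who want to try the comparison themselves, here is a minimal sketch in Python. Benford’s law says a leading digit d (from 1 to 9) turns up with probability log10(1 + 1/d), so about 30% of values should start with 1. The helper names and the list of street numbers below are made up for illustration; they are not the survey data from the post, and with only ten values the observed shares will be rough.

```python
import math
from collections import Counter

def benford_expected():
    """Benford's law: P(d) = log10(1 + 1/d) for leading digit d in 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(value):
    """Return the first nonzero digit of a street number, or None if there is none."""
    for ch in str(value):
        if ch.isdigit() and ch != "0":
            return int(ch)
    return None

def first_digit_shares(values):
    """Observed share of each leading digit 1..9 among the given values."""
    digits = [d for d in map(first_digit, values) if d is not None]
    counts = Counter(digits)
    total = len(digits)
    return {d: counts.get(d, 0) / total for d in range(1, 10)} if total else {}

# Hypothetical street numbers, made up for illustration (not the survey data).
addresses = [12, 145, 1879, 23, 204, 31, 47, 5, 118, 96]
expected = benford_expected()
observed = first_digit_shares(addresses)

# Print digit, Benford share, observed share side by side.
for d in range(1, 10):
    print(d, round(expected[d], 3), round(observed[d], 3))
```

With a real address file, the same two dictionaries could be handed to any plotting tool to draw the expected and observed shares next to each other.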


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 One example we give to illustrate Benford’s law is the first digits of addresses. [sent-1, score-1.038]

2 Javier Marquez Pena had a survey and, just for laffs, he looked at the distribution of first digits: Cool, it really works! [sent-2, score-0.498]

3 The y-axis shouldn’t go below zero, and I’d much prefer an L-type graphics box (par(bty="l")) rather than the square, but those are familiar problems with R defaults. [sent-5, score-0.842]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('digits', 0.482), ('pena', 0.3), ('benford', 0.283), ('laffs', 0.283), ('bty', 0.27), ('par', 0.247), ('defaults', 0.241), ('square', 0.207), ('box', 0.177), ('illustrate', 0.17), ('familiar', 0.147), ('cool', 0.143), ('law', 0.141), ('shouldn', 0.138), ('graphics', 0.138), ('looked', 0.123), ('prefer', 0.122), ('works', 0.121), ('zero', 0.119), ('first', 0.116), ('survey', 0.105), ('distribution', 0.104), ('problems', 0.082), ('give', 0.079), ('go', 0.068), ('rather', 0.062), ('really', 0.05), ('example', 0.05), ('much', 0.046), ('one', 0.03)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 1879 andrew gelman stats-2013-06-01-Benford’s law and addresses


2 0.32365879 954 andrew gelman stats-2011-10-12-Benford’s Law suggests lots of financial fraud

Introduction: This post is by Phil. I love this post by Jialan Wang. Wang “downloaded quarterly accounting data for all firms in Compustat, the most widely-used dataset in corporate finance that contains data on over 20,000 firms from SEC filings” and looked at the statistical distribution of leading digits in various pieces of financial information. As expected, the distribution is very close to what is predicted by Benford’s Law. Very close, but not identical. But does that mean anything? Benford’s “Law” isn’t really a law, it’s more of a rule or principle: it’s certainly possible for the distribution of leading digits in financial data (even a massive corpus of it) to deviate from the rule without this indicating massive fraud or error. But, aha, Wang also looks at how the deviation from Benford’s Law has changed with time, and looks at it by industry, and this is where things get really interesting and suggestive. I really can’t summarize any better than Wang did, so click on the first

3 0.11353359 1006 andrew gelman stats-2011-11-12-Val’s Number Scroll: Helping kids visualize math

Introduction: This looks cool.

4 0.11095616 379 andrew gelman stats-2010-10-29-Could someone please set this as the new R default in base graphics?

Introduction: par (mar=c(3,3,2,1), mgp=c(2,.7,0), tck=-.01) Thank you.

5 0.10658462 1403 andrew gelman stats-2012-07-02-Moving beyond hopeless graphics

Introduction: I was at a talk awhile ago where the speaker presented tables with 4, 5, 6, even 8 significant digits even though, as is usual, only the first or second digit of each number conveyed any useful information. A graph would be better, but even if you’re too lazy to make a plot, a bit of rounding would seem to be required. I mentioned this to a colleague, who responded: I don’t know how to stop this practice. Logic doesn’t work. Maybe ridicule? Best hope is the departure from field who do it. (Theories don’t die, but the people who follow those theories retire.) Another possibility, I think, is helpful software defaults. If we can get to the people who write the software, maybe we could have some impact. Once the software is written, however, it’s probably too late. I’m not far from the center of the R universe, but I don’t know if I’ll ever succeed in my goals of increasing the default number of histogram bars or reducing the default number of decimal places in regression

6 0.10566536 1357 andrew gelman stats-2012-06-01-Halloween-Valentine’s update

7 0.080003962 304 andrew gelman stats-2010-09-29-Data visualization marathon

8 0.07508938 2128 andrew gelman stats-2013-12-09-How to model distributions that have outliers in one direction

9 0.073468179 1498 andrew gelman stats-2012-09-16-Choices in graphing parallel time series

10 0.073161259 1522 andrew gelman stats-2012-10-05-High temperatures cause violent crime and implications for climate change, also some suggestions about how to better summarize these claims

11 0.070388027 96 andrew gelman stats-2010-06-18-Course proposal: Bayesian and advanced likelihood statistical methods for zombies.

12 0.06813103 705 andrew gelman stats-2011-05-10-Some interesting unpublished ideas on survey weighting

13 0.067480721 20 andrew gelman stats-2010-05-07-Bayesian hierarchical model for the prediction of soccer results

14 0.065807626 1764 andrew gelman stats-2013-03-15-How do I make my graphs?

15 0.064580634 153 andrew gelman stats-2010-07-17-Tenure-track position at U. North Carolina in survey methods and social statistics

16 0.062002178 2072 andrew gelman stats-2013-10-21-The future (and past) of statistical sciences

17 0.061638184 1564 andrew gelman stats-2012-11-06-Choose your default, or your default will choose you (election forecasting edition)

18 0.061166964 2172 andrew gelman stats-2014-01-14-Advice on writing research articles

19 0.060902312 488 andrew gelman stats-2010-12-27-Graph of the year

20 0.057441026 996 andrew gelman stats-2011-11-07-Chi-square FAIL when many cells have small expected values


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.071), (1, 0.01), (2, 0.008), (3, 0.019), (4, 0.054), (5, -0.041), (6, 0.002), (7, 0.036), (8, -0.02), (9, -0.03), (10, 0.018), (11, -0.035), (12, 0.003), (13, 0.015), (14, -0.03), (15, -0.037), (16, 0.001), (17, -0.028), (18, 0.036), (19, 0.004), (20, 0.006), (21, -0.026), (22, -0.002), (23, 0.025), (24, -0.004), (25, 0.01), (26, -0.009), (27, 0.039), (28, 0.009), (29, 0.012), (30, 0.001), (31, 0.027), (32, 0.034), (33, -0.008), (34, -0.005), (35, -0.031), (36, -0.023), (37, -0.012), (38, 0.006), (39, -0.018), (40, 0.015), (41, 0.019), (42, 0.003), (43, -0.018), (44, 0.011), (45, 0.038), (46, 0.017), (47, -0.013), (48, 0.033), (49, 0.017)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95557094 1879 andrew gelman stats-2013-06-01-Benford’s law and addresses


2 0.57536954 954 andrew gelman stats-2011-10-12-Benford’s Law suggests lots of financial fraud


3 0.57050675 1851 andrew gelman stats-2013-05-11-Actually, I have no problem with this graph

Introduction: Tom Salvesen asks, is this the worst info-graphic of the year? I say, no. Nobody really cares about these numbers. It’s an amusing feature. The alternative would not be a better display of these data, the alternative would be some photo or cartoon. They’re just having fun. I wouldn’t give it any design awards but it’s fine, it is what it is.

4 0.56433696 847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics

Introduction: Our discussion on data visualization continues. On one side are three statisticians: Antony Unwin, Kaiser Fung, and myself. We have been writing about the different goals served by information visualization and statistical graphics. On the other side are graphics experts (sorry for the imprecision, I don’t know exactly what these people do in their day jobs or how they are trained, and I don’t want to mislabel them) such as Robert Kosara and Jen Lowe, who seem a bit annoyed at how my colleagues and myself seem to follow the Tufte strategy of criticizing what we don’t understand. And on the third side are many (most?) academic statisticians, econometricians, etc., who don’t understand or respect graphs and seem to think of visualization as a toy that is unrelated to serious science or statistics. I’m not so interested in the third group right now (I tried to communicate with them in my big articles from 2003 and 2004), but I am concerned that our dialogue with the graphic

5 0.56280154 1896 andrew gelman stats-2013-06-13-Against the myth of the heroic visualization

Introduction: Alberto Cairo tells a fascinating story about John Snow, H. W. Acland, and the Mythmaking Problem: Every human community—nations, ethnic and cultural groups, professional guilds—inevitably raises a few of its members to the status of heroes and weaves myths around them. . . . The visual display of information is no stranger to heroes and myth. In fact, being a set of disciplines with a relatively small amount of practitioners and researchers, it has generated a staggering number of heroes, perhaps as a morale-enhancing mechanism. Most of us have heard of the wonders of William Playfair’s Commercial and Political Atlas, Florence Nightingale’s coxcomb charts, Charles Joseph Minard’s Napoleon’s march diagram, and Henry Beck’s 1933 redesign of the London Underground map. . . . Cairo’s goal, I think, is not to disparage these great pioneers of graphics but rather to put their work in perspective, recognizing the work of their excellent contemporaries. I would like to echo Cairo’

6 0.55930841 1848 andrew gelman stats-2013-05-09-A tale of two discussion papers

7 0.55464959 1661 andrew gelman stats-2013-01-08-Software is as software does

8 0.55063975 2038 andrew gelman stats-2013-09-25-Great graphs of names

9 0.5505901 266 andrew gelman stats-2010-09-09-The future of R

10 0.54985052 794 andrew gelman stats-2011-07-09-The quest for the holy graph

11 0.54681152 738 andrew gelman stats-2011-05-30-Works well versus well understood

12 0.54371619 319 andrew gelman stats-2010-10-04-“Who owns Congress”

13 0.54021907 1455 andrew gelman stats-2012-08-12-Probabilistic screening to get an approximate self-weighted sample

14 0.53338772 2319 andrew gelman stats-2014-05-05-Can we make better graphs of global temperature history?

15 0.5329383 61 andrew gelman stats-2010-05-31-A data visualization manifesto

16 0.53151006 1764 andrew gelman stats-2013-03-15-How do I make my graphs?

17 0.52988845 324 andrew gelman stats-2010-10-07-Contest for developing an R package recommendation system

18 0.52941668 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update

19 0.52844656 304 andrew gelman stats-2010-09-29-Data visualization marathon

20 0.52527761 597 andrew gelman stats-2011-03-02-RStudio – new cross-platform IDE for R


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(16, 0.058), (21, 0.038), (22, 0.023), (24, 0.149), (26, 0.153), (44, 0.137), (73, 0.046), (86, 0.028), (90, 0.045), (99, 0.163)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.8855204 1879 andrew gelman stats-2013-06-01-Benford’s law and addresses


2 0.82570839 1627 andrew gelman stats-2012-12-17-Stan and RStan 1.1.0

Introduction: We’re happy to announce the availability of Stan and RStan versions 1.1.0, which are general tools for performing model-based Bayesian inference using the no-U-turn sampler, an adaptive form of Hamiltonian Monte Carlo. Information on downloading and installing and using them is available as always from Stan Home Page: http://mc-stan.org/ Let us know if you have any problems on the mailing lists or at the e-mails linked on the home page (please don’t use this web page). The full release notes follow.

(R)Stan Version 1.1.0 Release Notes
===================================
-- Backward Compatibility Issue
* Categorical distribution recoded to match documentation; it now has support {1,...,K} rather than {0,...,K-1}.
* (RStan) change default value of permuted flag from FALSE to TRUE for Stan fit S4 extract() method
-- New Features
* Conditional (if-then-else) statements
* While statements
-- New Functions
* generalized multiply_lower_tri

3 0.81843758 559 andrew gelman stats-2011-02-06-Bidding for the kickoff

Introduction: Steven Brams and James Jorasch propose a system for reducing the advantage that comes from winning the coin flip in overtime: Dispensing with a coin toss, the teams would bid on where the ball is kicked from by the kicking team. In the NFL, it’s now the 30-yard line. Under Brams and Jorasch’s rule, the kicking team would be the team that bids the lower number, because it is willing to put itself at a disadvantage by kicking from farther back. However, it would not kick from the number it bids, but from the average of the two bids. To illustrate, assume team A bids to kick from the 38-yard line, while team B bids its 32-yard line. Team B would win the bidding and, therefore, be designated as the kick-off team. But B wouldn’t kick from 32, but instead from the average of 38 and 32, its 35-yard line. This is better for B by 3 yards than the 32-yard line that it proposed, because it’s closer to the end zone it is kicking towards. It’s also better for A by 3 yards to have B kick fr

4 0.80944604 693 andrew gelman stats-2011-05-04-Don’t any statisticians work for the IRS?

Introduction: A friend asks the above question and writes: This article left me thinking – how could the IRS not notice that this guy didn’t file taxes for several years? Don’t they run checks and notice if you miss a year? If I write a check out of order, there’s an asterisk next to the check number in my next bank statement showing that there was a gap in the sequence. If you ran the IRS, wouldn’t you do this: SSNs are issued sequentially. Once a SSN reaches 18, expect it to file a return. If it doesn’t, mail out a postage paid letter asking why not with check boxes such as Student, Unemployed, etc. Follow up at reasonable intervals. Eventually every SSN should be filing a return, or have an international address. Yes this is intrusive, but my goal is only to maximize tax revenue. Surely people who do this for a living could come up with something more elegant. My response: I dunno, maybe some confidentiality rules? The other thing is that I’m guessing that IRS gets lots of pushback w

5 0.80423367 2209 andrew gelman stats-2014-02-13-CmdStan, RStan, PyStan v2.2.0

Introduction: The Stan Development Team is happy to announce CmdStan, RStan, and PyStan v2.2.0. As usual, more info is available on the Stan Home Page. This is a minor release with a mix of bug fixes and features. For a full list of changes, please see the v2.2.0 milestone on stan-dev/stan’s issue tracker. Some of the bug fixes and issues are listed below.

Bug Fixes
* increment_log_prob is now vectorized and compiles with vector arguments
* multinomial random number generator used the wrong size for the return value
* fixed memory leaks in auto-diff implementation
* variables can start with the prefix ‘inf’
* fixed parameter output order for arrays when using optimization
* RStan compatibility issue with latest Rcpp 0.11.0

Features
* suppress command line output with refresh <= 0
* added 1 to treedepth to match usual definition of treedepth
* added distance, squared_distance, diag_pre_multiply, diag_pre_multiply to Stan modeling language
* added a ‘fixed_param’ sampler for

6 0.78544414 2150 andrew gelman stats-2013-12-27-(R-Py-Cmd)Stan 2.1.0

7 0.78242451 141 andrew gelman stats-2010-07-12-Dispute over counts of child deaths in Iraq due to sanctions

8 0.78008938 864 andrew gelman stats-2011-08-21-Going viral — not!

9 0.77147669 1219 andrew gelman stats-2012-03-18-Tips on “great design” from . . . Microsoft!

10 0.74663574 1837 andrew gelman stats-2013-05-03-NYC Data Skeptics Meetup

11 0.74287623 748 andrew gelman stats-2011-06-06-Why your Klout score is meaningless

12 0.74194175 788 andrew gelman stats-2011-07-06-Early stopping and penalized likelihood

13 0.73676103 111 andrew gelman stats-2010-06-26-Tough love as a style of writing

14 0.73632002 2017 andrew gelman stats-2013-09-11-“Informative g-Priors for Logistic Regression”

15 0.73480546 444 andrew gelman stats-2010-12-02-Rational addiction

16 0.733679 954 andrew gelman stats-2011-10-12-Benford’s Law suggests lots of financial fraud

17 0.73274249 1799 andrew gelman stats-2013-04-12-Stan 1.3.0 and RStan 1.3.0 Ready for Action

18 0.72720194 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values

19 0.72684908 1150 andrew gelman stats-2012-02-02-The inevitable problems with statistical significance and 95% intervals

20 0.72592735 2112 andrew gelman stats-2013-11-25-An interesting but flawed attempt to apply general forecasting principles to contextualize attitudes toward risks of global warming