andrew_gelman_stats andrew_gelman_stats-2014 andrew_gelman_stats-2014-2178 knowledge-graph by maker-knowledge-mining

2178 andrew gelman stats-2014-01-20-Mailing List Degree-of-Difficulty Difficulty


meta info for this blog

Source: html

Introduction: The Difficulty with Difficult Questions Andrew has commented during our Stan meetings that when a user sends an easy question to a mailing list, it gets answered right away, whereas difficult questions often languish with no answers. These difficult questions usually come from power users with real issues, whereas the simple questions are often ill-formulated or already answered in the top-level doc. So we’re arguably devoting our energy to the wrong users by adopting this strategy. Of course, this is related to Andrew’s suggestion that this whole blog be called “tl;dr” (i.e., too long, didn’t read). An Example On the Stan Users Group, we often get very complex models with a simple accompanying question such as “How can I make my model faster or mix better?” An example is this recent query, which involves a difficult multivariate multilevel model. Such questions require a lot of work on our part to answer. Model fitting is hard and often very problem specific.


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 The Difficulty with Difficult Questions Andrew’s commented during our Stan meetings that he’s observed that when a user sends an easy question to a mailing list, it gets answered right away, whereas difficult questions often languish with no answers. [sent-1, score-1.563]

2 These difficult questions usually come from power users with real issues, whereas the simple questions are often ill-formulated or already answered in the top-level doc. [sent-2, score-1.262]

3 So we’re arguably devoting our energy to the wrong users by adopting this strategy. [sent-3, score-0.446]

4 Of course, this is related to Andrew’s suggestion that this whole blog be called “tl;dr” (i.e., too long, didn’t read). [sent-4, score-0.081]

5 An Example On the Stan Users Group , we often get very complex models with a simple accompanying question such as “How can I make my model faster or mix better? [sent-7, score-0.386]

6 ” An example is this recent query , which involves a difficult multivariate multilevel model. [sent-8, score-0.242]

7 Such questions require a lot of work on our part to answer. [sent-9, score-0.215]

8 Model fitting is hard and often very problem specific. [sent-10, score-0.339]

9 And it varies by platform — the tweaks you need to do for BUGS/JAGS are Gibbs sampling specific whereas those required for Stan are HMC specific. [sent-11, score-0.295]

10 Mitigating the Problem The degree-of-difficulty difficulty can be mitigated somewhat by breaking questions down into simpler, digestible bits. [sent-12, score-0.914]

11 Everyone likes a short question they can understand and answer. [sent-13, score-0.176]

12 But breaking a problem down is often impossible for a user: if a user could isolate the problem, they could probably solve it. [sent-15, score-0.998]

13 I often find myself struggling to express a problem to mailing lists such as the Boost Spirit or Rcpp list in terms other than “I tried this and it didn’t work.” [sent-16, score-0.756]

14 Sometimes the mere act of breaking a question down into digestible bits leads me to the answer and I can spare the mailing list. [sent-18, score-1.248]

15 This is closely related to the “rubber ducky” phenomenon in debugging, namely that the mere act of explaining a problem clearly often leads to a solution. [sent-19, score-1.379]

16 Maybe Andrew can come up with a better name for this phenomenon than “degree-of-difficulty difficulty” and drop it into his Handy Statistical Lexicon for posterity. [sent-21, score-0.128]
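Sentence 9's point, that the tweaks Stan needs are HMC specific, can be made concrete. One standard HMC-specific tweak (my example; the post does not name it) is the non-centered parameterization of a hierarchical effect: instead of sampling theta ~ normal(mu, tau) directly, sample a standard normal and shift and scale it. The sketch below only verifies in plain numpy that the two parameterizations imply the same marginal distribution; the actual payoff shows up inside an HMC sampler, where the non-centered form avoids the "funnel" geometry when tau is itself a parameter.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, tau, n = 1.5, 2.0, 200_000

# Centered parameterization: draw theta directly.
theta_centered = rng.normal(mu, tau, size=n)

# Non-centered parameterization: draw a standard normal, then shift and scale.
# In Stan, HMC typically mixes far better on this form when tau is a
# parameter, because the posterior geometry is closer to an isotropic normal.
theta_raw = rng.normal(0.0, 1.0, size=n)
theta_noncentered = mu + tau * theta_raw

# Both parameterizations imply the same marginal distribution for theta.
print(abs(theta_centered.mean() - theta_noncentered.mean()))  # near 0
print(abs(theta_centered.std() - theta_noncentered.std()))    # near 0
```

A Gibbs sampler (BUGS/JAGS) conditions on one parameter at a time, so this particular change of variables buys it little; that asymmetry is exactly why the post says tuning advice doesn't transfer across platforms.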
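Sentences 10 through 14 argue that breaking a question into digestible bits helps, but that isolating the problem is most of the work. That isolation step can even be partly mechanized. Here is a toy sketch, entirely mine rather than anything from the post, in the spirit of delta debugging: given a hypothetical `fails` predicate, it shrinks a failing input by repeatedly discarding chunks the failure doesn't depend on, leaving a small example fit for a mailing list.

```python
def minimize(items, fails):
    """Greedily shrink a failing input: repeatedly try dropping a chunk
    and keep any smaller input that still triggers the failure."""
    chunk = len(items) // 2
    while chunk >= 1:
        i = 0
        while i < len(items):
            candidate = items[:i] + items[i + chunk:]
            if candidate and fails(candidate):
                items = candidate   # smaller input still fails; keep it
            else:
                i += chunk          # this chunk is needed; move past it
        chunk //= 2
    return items

# Toy failure condition: the input "fails" whenever it contains both 3 and 7.
fails = lambda xs: 3 in xs and 7 in xs
print(minimize(list(range(10)), fails))  # → [3, 7]
```

The point of the sketch is the workflow, not the algorithm: as sentence 14 notes, the act of constructing the minimal failing case often reveals the answer before any email gets sent.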


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('mailing', 0.258), ('digestible', 0.235), ('questions', 0.215), ('ducky', 0.214), ('breaking', 0.21), ('often', 0.199), ('user', 0.183), ('rubber', 0.181), ('users', 0.173), ('lexicon', 0.172), ('difficult', 0.159), ('answered', 0.157), ('difficulty', 0.153), ('whereas', 0.144), ('mere', 0.143), ('problem', 0.14), ('stan', 0.14), ('andrew', 0.138), ('phenomenon', 0.128), ('act', 0.112), ('leads', 0.108), ('tl', 0.107), ('question', 0.101), ('mitigating', 0.101), ('adopting', 0.101), ('mitigated', 0.101), ('devoting', 0.101), ('rcpp', 0.101), ('dr', 0.097), ('debugging', 0.097), ('everyone', 0.086), ('accompanying', 0.086), ('list', 0.084), ('handy', 0.083), ('isolate', 0.083), ('query', 0.083), ('spare', 0.081), ('related', 0.081), ('platform', 0.078), ('boost', 0.078), ('worthy', 0.077), ('meetings', 0.076), ('hmc', 0.076), ('struggling', 0.075), ('likes', 0.075), ('namely', 0.073), ('varies', 0.073), ('gibbs', 0.072), ('commented', 0.071), ('arguably', 0.071)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 2178 andrew gelman stats-2014-01-20-Mailing List Degree-of-Difficulty Difficulty


2 0.13997747 2124 andrew gelman stats-2013-12-05-Stan (quietly) passes 512 people on the users list

Introduction: Stan is alive and well. We’re up to 523 people on the users list . [We're sure there are many more than 523 actual users, since it's easy to download and use Stan directly without joining the list.] We’re working on a v2.1.0 release now and we hope to release it within the next couple of weeks.

3 0.13934845 1368 andrew gelman stats-2012-06-06-Question 27 of my final exam for Design and Analysis of Sample Surveys

Introduction: 27. Which of the following problems were identified with the Burnham et al. survey of Iraq mortality? (Indicate all that apply.) (a) The survey used cluster sampling, which is inappropriate for estimating individual outcomes such as death. (b) In their report, Burnham et al. did not identify their primary sampling units. (c) The second-stage sampling was not a probability sample. (d) Survey materials supplied by the authors are incomplete and inconsistent with published descriptions of the survey. Solution to question 26 From yesterday : 26. You have just graded an exam with 28 questions and 15 students. You fit a logistic item-response model estimating ability, difficulty, and discrimination parameters. Which of the following statements are basically true? (Indicate all that apply.) (a) If a question is answered correctly by students with very low and very high ability, but is missed by students in the middle, it will have a high value for its discrimination parameter.

4 0.11952639 1475 andrew gelman stats-2012-08-30-A Stan is Born

Introduction: Stan 1.0.0 and RStan 1.0.0 It’s official. The Stan Development Team is happy to announce the first stable versions of Stan and RStan. What is (R)Stan? Stan is an open-source package for obtaining Bayesian inference using the No-U-Turn sampler, a variant of Hamiltonian Monte Carlo. It’s sort of like BUGS, but with a different language for expressing models and a different sampler for sampling from their posteriors. RStan is the R interface to Stan. Stan Home Page Stan’s home page is: http://mc-stan.org/ It links everything you need to get started running Stan from the command line, from R, or from C++, including full step-by-step install instructions, a detailed user’s guide and reference manual for the modeling language, and tested ports of most of the BUGS examples. Peruse the Manual If you’d like to learn more, the Stan User’s Guide and Reference Manual is the place to start.

5 0.11662473 1367 andrew gelman stats-2012-06-05-Question 26 of my final exam for Design and Analysis of Sample Surveys

Introduction: 26. You have just graded an exam with 28 questions and 15 students. You fit a logistic item-response model estimating ability, difficulty, and discrimination parameters. Which of the following statements are basically true? (Indicate all that apply.) (a) If a question is answered correctly by students with very low and very high ability, but is missed by students in the middle, it will have a high value for its discrimination parameter. (b) It is not possible to fit an item-response model when you have more questions than students. In order to fit the model, you either need to reduce the number of questions (for example, by discarding some questions or by putting together some questions into a combined score) or increase the number of students in the dataset. (c) To keep the model identified, you can set one of the difficulty parameters or one of the ability parameters to zero and set one of the discrimination parameters to 1. (d) If two students answer the same number of q

6 0.1131954 1933 andrew gelman stats-2013-07-10-Please send all comments to -dev-ripley

7 0.11250255 2291 andrew gelman stats-2014-04-14-Transitioning to Stan

8 0.10950469 1972 andrew gelman stats-2013-08-07-When you’re planning on fitting a model, build up to it by fitting simpler models first. Then, once you have a model you like, check the hell out of it

9 0.10909192 1748 andrew gelman stats-2013-03-04-PyStan!

10 0.10479768 2161 andrew gelman stats-2014-01-07-My recent debugging experience

11 0.1036622 1392 andrew gelman stats-2012-06-26-Occam

12 0.1001599 1311 andrew gelman stats-2012-05-10-My final exam for Design and Analysis of Sample Surveys

13 0.098561242 2325 andrew gelman stats-2014-05-07-Stan users meetup next week

14 0.097945035 1296 andrew gelman stats-2012-05-03-Google Translate for code, and an R help-list bot

15 0.097224265 2126 andrew gelman stats-2013-12-07-If I could’ve done it all over again

16 0.09407106 1036 andrew gelman stats-2011-11-30-Stan uses Nuts!

17 0.091127209 774 andrew gelman stats-2011-06-20-The pervasive twoishness of statistics; in particular, the “sampling distribution” and the “likelihood” are two different models, and that’s a good thing

18 0.085231751 2150 andrew gelman stats-2013-12-27-(R-Py-Cmd)Stan 2.1.0

19 0.082487486 1611 andrew gelman stats-2012-12-07-Feedback on my Bayesian Data Analysis class at Columbia

20 0.081782661 505 andrew gelman stats-2011-01-05-Wacky interview questions: An exploration into the nature of evidence on the internet


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.147), (1, 0.032), (2, -0.016), (3, 0.025), (4, 0.071), (5, 0.067), (6, 0.005), (7, -0.102), (8, -0.009), (9, -0.071), (10, -0.043), (11, -0.001), (12, -0.045), (13, -0.005), (14, 0.0), (15, -0.04), (16, -0.006), (17, 0.0), (18, -0.017), (19, 0.042), (20, -0.034), (21, -0.072), (22, -0.043), (23, 0.007), (24, -0.018), (25, -0.009), (26, 0.029), (27, -0.03), (28, -0.006), (29, -0.014), (30, 0.023), (31, -0.009), (32, -0.0), (33, 0.033), (34, -0.038), (35, 0.016), (36, 0.036), (37, 0.021), (38, 0.011), (39, -0.057), (40, 0.012), (41, -0.041), (42, -0.01), (43, -0.005), (44, -0.026), (45, 0.021), (46, -0.029), (47, -0.017), (48, 0.015), (49, 0.002)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96482986 2178 andrew gelman stats-2014-01-20-Mailing List Degree-of-Difficulty Difficulty


2 0.80770671 2124 andrew gelman stats-2013-12-05-Stan (quietly) passes 512 people on the users list

Introduction: Stan is alive and well. We’re up to 523 people on the users list . [We're sure there are many more than 523 actual users, since it's easy to download and use Stan directly without joining the list.] We’re working on a v2.1.0 release now and we hope to release it within the next couple of weeks.

3 0.80743098 2291 andrew gelman stats-2014-04-14-Transitioning to Stan

Introduction: Kevin Cartier writes: I’ve been happily using R for a number of years now and recently came across Stan. Looks big and powerful, so I’d like to pick an appropriate project and try it out. I wondered if you could point me to a link or document that goes into the motivation for this tool (aside from the Stan user doc)? What I’d like to understand is, at what point might you look at an emergent R project and advise, “You know, that thing you’re trying to do would be a whole lot easier/simpler/more straightforward to implement with Stan.” (or words to that effect). My reply: For my collaborators in political science, Stan has been most useful for models where the data set is not huge (e.g., we might have 10,000 data points or 50,000 data points but not 10 million) but where the model is somewhat complex (for example, a model with latent time series structure). The point is that the model has enough parameters and uncertainty that you’ll want to do full Bayes (rather than some sort

4 0.79113328 2161 andrew gelman stats-2014-01-07-My recent debugging experience

Introduction: OK, so this sort of thing happens sometimes. I was working on a new idea (still working on it; if it ultimately works out—or if it doesn’t—I’ll let you know) and as part of it I was fitting little models in Stan, in a loop. I thought it would make sense to start with linear regression with normal priors and known data variance, because then the exact solution is Gaussian and I can also work with the problem analytically. So I programmed up the algorithm and, no surprise, it didn’t work. I went through my R code, put in print statements here and there, and cleared out bug after bug until at least it stopped crashing. But the algorithm still wasn’t doing what it was supposed to do. So I decided to do something simpler, and just check that the Stan linear regression gave the same answer as the analytic posterior distribution: I ran Stan for tons of iterations, then computed the sample mean and variance of the simulations. It was an example with two coefficients—I’d originally cho

5 0.77517849 1580 andrew gelman stats-2012-11-16-Stantastic!

Introduction: Richard McElreath writes: I’ve been translating a few ongoing data analysis projects into Stan code, mostly with success. The most important for me right now has been a hierarchical zero-inflated gamma problem. This is a “hurdle” model, in which a bernoulli GLM produces zeros/nonzeros, and then a gamma GLM produces the nonzero values, using varying effects correlated with those in the bernoulli process. The data are 20 years of human foraging returns from a subsistence hunting population in Paraguay (the Ache), comprising about 15k hunts in total (Hill & Kintigh. 2009. Current Anthropology 50:369-377). Observed values are kilograms of meat returned to camp. The more complex models contain a 147-by-9 matrix of varying effects (147 unique hunters), as well as imputation of missing values. Originally, I had written the sampler myself in raw R code. It was very slow, but I knew what it was doing at least. Just before Stan version 1.0 was released, I had managed to get JAGS to do it a

6 0.75854874 1748 andrew gelman stats-2013-03-04-PyStan!

7 0.7582438 2020 andrew gelman stats-2013-09-12-Samplers for Big Science: emcee and BAT

8 0.74914622 2242 andrew gelman stats-2014-03-10-Stan Model of the Week: PK Calculation of IV and Oral Dosing

9 0.7311455 2325 andrew gelman stats-2014-05-07-Stan users meetup next week

10 0.72940463 2318 andrew gelman stats-2014-05-04-Stan (& JAGS) Tutorial on Linear Mixed Models

11 0.72042286 1036 andrew gelman stats-2011-11-30-Stan uses Nuts!

12 0.72006774 1472 andrew gelman stats-2012-08-28-Migrating from dot to underscore

13 0.70993537 2003 andrew gelman stats-2013-08-30-Stan Project: Continuous Relaxations for Discrete MRFs

14 0.70344257 2150 andrew gelman stats-2013-12-27-(R-Py-Cmd)Stan 2.1.0

15 0.70194209 2035 andrew gelman stats-2013-09-23-Scalable Stan

16 0.70166487 1475 andrew gelman stats-2012-08-30-A Stan is Born

17 0.69679981 2299 andrew gelman stats-2014-04-21-Stan Model of the Week: Hierarchical Modeling of Supernovas

18 0.69344795 1711 andrew gelman stats-2013-02-07-How Open Should Academic Papers Be?

19 0.69113868 2096 andrew gelman stats-2013-11-10-Schiminovich is on The Simpsons

20 0.68506187 712 andrew gelman stats-2011-05-14-The joys of working in the public domain


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(0, 0.019), (3, 0.024), (9, 0.062), (13, 0.08), (16, 0.037), (18, 0.048), (24, 0.137), (34, 0.018), (36, 0.01), (42, 0.015), (46, 0.011), (48, 0.011), (55, 0.019), (63, 0.079), (71, 0.011), (73, 0.018), (82, 0.017), (86, 0.03), (99, 0.248)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96518862 2178 andrew gelman stats-2014-01-20-Mailing List Degree-of-Difficulty Difficulty


2 0.92870605 428 andrew gelman stats-2010-11-24-Flawed visualization of U.S. voting maybe has some good features

Introduction: Aleks points me to this attractive visualization by David Sparks of U.S. voting. On the plus side, the pictures and associated movie (showing an oddly horizontally-stretched-out United States) are pretty and seem to have gotten a bit of attention–the maps have received 31 comments, which is more than we get on almost all our blog entries here. On the minus side, the movie is misleading. In many years it shows the whole U.S. as a single color, even when candidates from both parties won some votes. The text has errors too, for example the false claim that the South favored a Democratic candidate in 1980. The southern states that Jimmy Carter carried in 1980 were Georgia and . . . that’s it. But, as Aleks says, once this tool is out there, maybe people can use it to do better. It’s in that spirit that I’m linking. Ya gotta start somewhere. Also, this is a good example of a general principle: When you make a graph, look at it carefully to see if it makes sense!

3 0.92545092 2148 andrew gelman stats-2013-12-25-Spam!

Introduction: This one totally faked me out at first. It was an email from “Nick Bagnall” that began: Dear Dr. Gelman, I made contact last year regarding your work in the CMG: Reconstructing Climate from Tree Ring Data project. We are about to start producing the 2014 edition and I wanted to discuss this with you as we still remain keen to feature your work. Research Media are producing a special publication in February of 2014, within this report we will be working with a small selected number of PI’s with a focus on geosciences, atmospheric and geospace sciences and earth Sciences.. At this point, I’m thinking: Hmmm, I don’t remember this guy, is this some sort of collaborative project that I’d forgotten about? The message then continues: The publication is called International Innovation . . . Huh? This doesn’t sound so good. The email then goes on with some very long lists, and then finally the kicker: The total cost for each article produced in this report is fixed a

4 0.921242 1942 andrew gelman stats-2013-07-17-“Stop and frisk” statistics

Introduction: Washington Post columnist Richard Cohen brings up one of my research topics: In New York City, blacks make up a quarter of the population, yet they represent 78 percent of all shooting suspects — almost all of them young men. We know them from the nightly news. Those statistics represent the justification for New York City’s controversial stop-and-frisk program, which amounts to racial profiling writ large. After all, if young black males are your shooters, then it ought to be young black males whom the police stop and frisk. I have two comments on this. First, my research with Jeff Fagan and Alex Kiss (based on data from the late 1990s, so maybe things have changed) found that the NYPD was stopping blacks and hispanics at a rate higher than their previous arrest rates: To briefly summarize our findings, blacks and Hispanics represented 51% and 33% of the stops while representing only 26% and 24% of the New York City population. Compared with the number of arrests of

5 0.92081523 980 andrew gelman stats-2011-10-29-When people meet this guy, can they resist the temptation to ask him what he’s doing for breakfast??

Introduction: This is hilarious ( link from a completely deadpan Tyler Cowen). I’d call it “unintentionally hilarious” but I’m pretty sure that rms knew this was funny when he was writing it. It’s sort of like when you write a top 10 list—it’s hard to resist getting silly and going over the top. It’s only near the end that we get to the bit about the parrots. All joking aside, the most interesting part of the email was this: I [rms] have to spend 6 to 8 hours *every day* doing my usual work, which is responding to email about the GNU Project and the Free Software Movement. I’d wondered for awhile what is it that Richard Stallman actually does, that is how does he spend his time (aside from giving lectures to promote his ideas and pay the bills). Emailing –> Blogging I too spend a lot of time on email, but a few years ago I consciously tried to shift a bunch of my email exchanges to the blog. I found that I was sending out a lot of information to an audience of one, information

6 0.91480887 1509 andrew gelman stats-2012-09-24-Analyzing photon counts

7 0.91438079 1648 andrew gelman stats-2013-01-02-A important new survey of Bayesian predictive methods for model assessment, selection and comparison

8 0.91341406 102 andrew gelman stats-2010-06-21-Why modern art is all in the mind

9 0.91334575 437 andrew gelman stats-2010-11-29-The mystery of the U-shaped relationship between happiness and age

10 0.9123314 1227 andrew gelman stats-2012-03-23-Voting patterns of America’s whites, from the masses to the elites

11 0.91102374 1621 andrew gelman stats-2012-12-13-Puzzles of criminal justice

12 0.91067624 1480 andrew gelman stats-2012-09-02-“If our product is harmful . . . we’ll stop making it.”

13 0.91006494 782 andrew gelman stats-2011-06-29-Putting together multinomial discrete regressions by combining simple logits

14 0.90973258 1484 andrew gelman stats-2012-09-05-Two exciting movie ideas: “Second Chance U” and “The New Dirty Dozen”

15 0.90775549 678 andrew gelman stats-2011-04-25-Democrats do better among the most and least educated groups

16 0.90705913 1933 andrew gelman stats-2013-07-10-Please send all comments to -dev-ripley

17 0.90611434 1506 andrew gelman stats-2012-09-21-Building a regression model . . . with only 27 data points

18 0.9055655 286 andrew gelman stats-2010-09-20-Are the Democrats avoiding a national campaign?

19 0.90492576 1371 andrew gelman stats-2012-06-07-Question 28 of my final exam for Design and Analysis of Sample Surveys

20 0.90481126 2110 andrew gelman stats-2013-11-22-A Bayesian model for an increasing function, in Stan!