andrew_gelman_stats andrew_gelman_stats-2010 andrew_gelman_stats-2010-101 knowledge-graph by maker-knowledge-mining

101 andrew gelman stats-2010-06-20-“People with an itch to scratch”


meta info for this blog

Source: html

Introduction: Derek Sonderegger writes: I have just finished my Ph.D. in statistics and am currently working in applied statistics (plant ecology) using Bayesian statistics. As the statistician in the group I only ever get the ‘hard analysis’ problems that don’t readily fit into standard models. As I delve into the computational aspects of Bayesian analysis, I find myself increasingly frustrated with the current set of tools. I was delighted to see JAGS 2.0 just came out and spent yesterday happily playing with it. My question is, where do you see the short-term future of Bayesian computing going and what can we do to steer it in a particular direction? In your book with Dr. Hill, you mention that you expect BUGS (or its successor) to become increasingly sophisticated and, for example, re-parameterizations that increase convergence rates would be handled automatically. Just as R has been successful because users can extend it, I think progress here also will be made by input from ‘people with an itch to scratch.’


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Derek Sonderegger writes: I have just finished my Ph.D. [sent-1, score-0.086]

2 in statistics and am currently working in applied statistics (plant ecology) using Bayesian statistics. [sent-3, score-0.162]

3 As the statistician in the group I only ever get the ‘hard analysis’ problems that don’t readily fit into standard models. [sent-4, score-0.166]

4 As I delve into the computational aspects of Bayesian analysis, I find myself increasingly frustrated with the current set of tools. [sent-5, score-0.376]

5 JAGS 2.0 just came out and I spent yesterday happily playing with it. [sent-7, score-0.101]

6 My question is, where do you see the short-term future of Bayesian computing going and what can we do to steer it in a particular direction? [sent-8, score-0.234]

7 In your book with Dr Hill, you mention that you expect BUGS (or its successor) to become increasingly sophisticated and, for example, re-parameterizations that increase convergence rates would be handled automatically. [sent-9, score-0.444]
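A standard example of the kind of reparameterization sentence 7 alludes to is the "non-centered" form of a hierarchical normal model: instead of sampling theta ~ Normal(mu, tau) directly, sample a standard-normal auxiliary variable and rescale it, which tends to mix much better when tau is small. A minimal stdlib-Python sketch (the variable names are illustrative, not any real Bugs/JAGS syntax):

```python
import random
import statistics

# Non-centered parameterization: draw eta ~ Normal(0, 1), then set
# theta = mu + tau * eta. Both forms imply the same marginal for theta,
# but the non-centered form avoids the "funnel" geometry when tau is small.
rng = random.Random(0)
mu, tau = 5.0, 0.1
eta = [rng.gauss(0.0, 1.0) for _ in range(10_000)]  # auxiliary draws
theta = [mu + tau * e for e in eta]                 # implied group effects

print(statistics.fmean(theta), statistics.stdev(theta))
```

With 10,000 draws the sample mean and standard deviation should be very close to mu = 5.0 and tau = 0.1.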

8 Just as R has been successful because users can extend it, I think progress here also will be made by input from ‘people with an itch to scratch. [sent-10, score-0.317]

9 raw[i] - mean(alpha[]) } I would love to write something that hides that from me. [sent-17, score-0.121]
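The Bugs fragment in sentence 9 is the usual sum-to-zero centering trick: subtract the mean of the raw group effects so the centered effects are identified. A small stdlib-Python sketch of the same idea (values here are made up for illustration):

```python
import random

# Center raw group effects so they sum to zero, mirroring the Bugs idiom
# alpha[i] <- alpha.raw[i] - mean(alpha.raw[]) quoted above.
rng = random.Random(1)
alpha_raw = [rng.gauss(3.0, 1.0) for _ in range(8)]  # unconstrained draws
mean_raw = sum(alpha_raw) / len(alpha_raw)
alpha = [a - mean_raw for a in alpha_raw]            # sum-to-zero effects

print(abs(sum(alpha)) < 1e-9)  # prints True: the constraint holds
```

This is exactly the bookkeeping Sonderegger says he would love the software to hide from him.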

10 Here is my hope/expectation: There should be a greater decoupling of the BUGS interface that builds a graph structure from the back-end engine that takes a graph and runs the MCMC using whatever samplers it deems appropriate. [sent-18, score-1.259]

11 By separating the two steps, people can modify the input to make it easier to build a specific graph without worrying about the MCMC engine. [sent-19, score-0.962]

12 Re-parameterization problems will lie firmly in this sphere. [sent-20, score-0.269]

13 People doing research into different samplers can just worry about a particular graph structure and not how the structure was created. [sent-21, score-0.799]

14 This would make it easier for *both* types of developers to debug and test their code and make it easier to add new functionality. [sent-22, score-0.599]
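The decoupling proposed in sentences 10-14 can be sketched in a few lines: a front end builds a graph of stochastic nodes, and a separate back end runs MCMC over any such graph without knowing how it was built. All names below (Node, metropolis) are hypothetical illustrations, not any real Bugs/JAGS API:

```python
import math
import random

class Node:
    """Front-end piece: one stochastic node in the model graph."""
    def __init__(self, name, logpdf, init):
        self.name = name
        self.logpdf = logpdf  # log density as a function of (value, state)
        self.value = init

def log_joint(nodes, state):
    # The graph exposes only a log joint density to the sampler.
    return sum(n.logpdf(state[n.name], state) for n in nodes)

def metropolis(nodes, n_iter=5000, scale=1.0, seed=0):
    """Back-end piece: generic random-walk sampler that only sees the graph."""
    rng = random.Random(seed)
    state = {n.name: n.value for n in nodes}
    lp = log_joint(nodes, state)
    draws = []
    for _ in range(n_iter):
        for n in nodes:
            old = state[n.name]
            state[n.name] = old + rng.gauss(0.0, scale)
            new_lp = log_joint(nodes, state)
            if rng.random() < math.exp(min(0.0, new_lp - lp)):
                lp = new_lp          # accept the proposal
            else:
                state[n.name] = old  # reject and restore
        draws.append(dict(state))
    return draws

# Front end: build a one-node "graph" (a standard normal) without ever
# touching the sampler internals; the back end could be swapped freely.
std_normal = Node("x", lambda v, state: -0.5 * v * v, init=0.0)
xs = [d["x"] for d in metropolis([std_normal])]
print(sum(xs) / len(xs))
```

The point of the separation is the one made in sentence 13: a researcher writing a better sampler only needs the graph interface (here, `log_joint`), not the model-building language that produced it.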

15 My answer: I agree with you and I do think that future versions of Bugs will be more modular. [sent-23, score-0.186]

16 As it is, relatively simple hierarchical regression models can take up several pages of Bugs code. [sent-24, score-0.069]

17 The resulting models are likely to have errors and will typically run slowly. [sent-25, score-0.075]


similar blogs computed by the tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('bugs', 0.308), ('samplers', 0.242), ('dnorm', 0.217), ('structure', 0.19), ('easier', 0.189), ('alpha', 0.179), ('graph', 0.177), ('increasingly', 0.167), ('input', 0.157), ('mcmc', 0.154), ('build', 0.143), ('debug', 0.128), ('derek', 0.128), ('steer', 0.128), ('successor', 0.128), ('hides', 0.121), ('delighted', 0.116), ('delve', 0.116), ('dr', 0.116), ('handled', 0.112), ('firmly', 0.112), ('separating', 0.108), ('future', 0.106), ('happily', 0.101), ('plant', 0.101), ('bayesian', 0.1), ('interface', 0.097), ('ecology', 0.096), ('readily', 0.096), ('worrying', 0.094), ('modify', 0.094), ('jags', 0.093), ('frustrated', 0.093), ('developers', 0.093), ('extend', 0.091), ('lie', 0.087), ('engine', 0.086), ('convergence', 0.086), ('finished', 0.086), ('statistics', 0.081), ('versions', 0.08), ('hill', 0.079), ('sophisticated', 0.079), ('resulting', 0.075), ('runs', 0.075), ('medicine', 0.072), ('greater', 0.072), ('problems', 0.07), ('users', 0.069), ('relatively', 0.069)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999982 101 andrew gelman stats-2010-06-20-“People with an itch to scratch”


2 0.16314118 2273 andrew gelman stats-2014-03-29-References (with code) for Bayesian hierarchical (multilevel) modeling and structural equation modeling

Introduction: A student writes: I am new to Bayesian methods. While I am reading your book, I have some questions for you. I am interested in doing Bayesian hierarchical (multi-level) linear regression (e.g., random-intercept model) and Bayesian structural equation modeling (SEM)—for causality. Do you happen to know if I could find some articles, where authors could provide data w/ R and/or BUGS codes that I could replicate them? My reply: For Bayesian hierarchical (multi-level) linear regression and causal inference, see my book with Jennifer Hill. For Bayesian structural equation modeling, try google and you’ll find some good stuff. Also, I recommend Stan (http://mc-stan.org/) rather than Bugs.

3 0.12891671 55 andrew gelman stats-2010-05-27-In Linux, use jags() to call Jags instead of using bugs() to call OpenBugs

Introduction: Douglas Anderton informed us that, in a Linux system, you can’t call OpenBugs from R using bugs() from the R2Winbugs package. Instead, you should call Jags using jags() from the R2jags package. P.S. Not the Rotter’s Club guy.

4 0.12364195 427 andrew gelman stats-2010-11-23-Bayesian adaptive methods for clinical trials

Introduction: Scott Berry, Brad Carlin, Jack Lee, and Peter Muller recently came out with a book with the above title. The book packs a lot into its 280 pages and is fun to read as well (even if they do use the word “modalities” in their first paragraph, and later on they use the phrase “DIC criterion,” which upsets my tidy, logical mind). The book starts off fast on page 1 and never lets go. Clinical trials are a big part of statistics and it’s cool to see the topic taken seriously and being treated rigorously. (Here I’m not talking about empty mathematical rigor (or, should I say, “rigor”), so-called optimal designs and all that, but rather the rigor of applied statistics, mapping models to reality.) Also I have a few technical suggestions. 1. The authors fit a lot of models in Bugs, which is fine, but they go overboard on the WinBUGS thing. There’s WinBUGS, OpenBUGS, JAGS: they’re all Bugs recommend running Bugs from R using the clunky BRugs interface rather than the smoother bugs(

5 0.12089378 1948 andrew gelman stats-2013-07-21-Bayes related

Introduction: Dave Decker writes: I’ve seen some Bayes related things recently that might make for interesting fodder on your blog. There are two books, teaching Bayesian analysis from a programming perspective. And also a “web application for data analysis using powerful Bayesian statistical methods.” I took a look. The first book is Think Bayes: Bayesian Statistics Made Simple, by Allen B. Downey . It’s super readable and, amazingly, has approximately zero overlap with Bayesian Data Analysis. Downey discusses lots of little problems in a conversational way. In some ways it’s like an old-style math stat textbook (although with a programming rather than mathematical flavor) in that the examples are designed for simplicity rather than realism. I like it! Our book already exists; it’s good to have something else for people to read, coming from an entirely different perspective. The second book is Probabilistic Programming and Bayesian Methods for Hackers , by Cameron Davidson-P

6 0.11640553 1735 andrew gelman stats-2013-02-24-F-f-f-fake data

7 0.11373033 234 andrew gelman stats-2010-08-25-Modeling constrained parameters

8 0.11341203 1580 andrew gelman stats-2012-11-16-Stantastic!

9 0.11200487 1188 andrew gelman stats-2012-02-28-Reference on longitudinal models?

10 0.10910082 41 andrew gelman stats-2010-05-19-Updated R code and data for ARM

11 0.10826394 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

12 0.10163623 1661 andrew gelman stats-2013-01-08-Software is as software does

13 0.098121732 1469 andrew gelman stats-2012-08-25-Ways of knowing

14 0.096968643 1497 andrew gelman stats-2012-09-15-Our blog makes connections!

15 0.095904544 154 andrew gelman stats-2010-07-18-Predictive checks for hierarchical models

16 0.094775528 1489 andrew gelman stats-2012-09-09-Commercial Bayesian inference software is popping up all over

17 0.09424597 61 andrew gelman stats-2010-05-31-A data visualization manifesto

18 0.093887255 451 andrew gelman stats-2010-12-05-What do practitioners need to know about regression?

19 0.093426287 1719 andrew gelman stats-2013-02-11-Why waste time philosophizing?

20 0.091319725 1788 andrew gelman stats-2013-04-04-When is there “hidden structure in data” to be discovered?


similar blogs computed by the lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.185), (1, 0.048), (2, -0.047), (3, 0.067), (4, 0.063), (5, 0.002), (6, -0.063), (7, -0.017), (8, 0.013), (9, -0.012), (10, -0.003), (11, -0.029), (12, -0.028), (13, 0.001), (14, 0.06), (15, 0.028), (16, 0.008), (17, 0.015), (18, -0.03), (19, -0.005), (20, 0.03), (21, 0.063), (22, -0.031), (23, -0.011), (24, -0.008), (25, -0.006), (26, 0.018), (27, -0.018), (28, -0.046), (29, -0.01), (30, 0.009), (31, -0.022), (32, -0.009), (33, -0.014), (34, -0.003), (35, -0.014), (36, -0.035), (37, -0.022), (38, -0.018), (39, 0.028), (40, -0.011), (41, -0.004), (42, 0.002), (43, 0.014), (44, -0.029), (45, -0.029), (46, -0.015), (47, 0.017), (48, 0.032), (49, -0.04)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96531016 101 andrew gelman stats-2010-06-20-“People with an itch to scratch”


2 0.7958377 1489 andrew gelman stats-2012-09-09-Commercial Bayesian inference software is popping up all over

Introduction: Steve Cohen writes: As someone who has been working with Bayesian statistical models for the past several years, I [Cohen] have been challenged recently to describe the difference between Bayesian Networks (as implemented in BayesiaLab software) and modeling and inference using MCMC methods. I hope you have the time to give me (or to write on your blog) and relatively simple explanation that an advanced layman could understand. My reply: I skimmed the above website but I couldn’t quite see what they do. My guess is that they use MCMC and also various parametric approximations such as variational Bayes. They also seem to have something set up for decision analysis. My guess is that, compared to a general-purpose tool such as Stan, this Bayesia software is more accessible to non-academics in particular application areas (in this case, it looks like business marketing). But I can’t be sure. I’ve also heard about another company that looks to be doing something similar: h

3 0.70429116 421 andrew gelman stats-2010-11-19-Just chaid

Introduction: Reading somebody else’s statistics rant made me realize the inherent contradictions in much of my own statistical advice. Jeff Lax sent along this article by Philip Schrodt, along with the cryptic comment: Perhaps of interest to you. perhaps not. Not meant to be an excuse for you to rant against hypothesis testing again. In his article, Schrodt makes a reasonable and entertaining argument against the overfitting of data and the overuse of linear models. He states that his article is motivated by the quantitative papers he has been sent to review for journals or conferences, and he explicitly excludes “studies of United States voting behavior,” so at least I think Mister P is off the hook. I notice a bit of incoherence in Schrodt’s position–on one hand, he criticizes “kitchen-sink models” for overfitting and he criticizes “using complex methods without understanding the underlying assumptions” . . . but then later on he suggests that political scientists in this countr

4 0.70337051 690 andrew gelman stats-2011-05-01-Peter Huber’s reflections on data analysis

Introduction: Peter Huber’s most famous work derives from his paper on robust statistics published nearly fifty years ago in which he introduced the concept of M-estimation (a generalization of maximum likelihood) to unify some ideas of Tukey and others for estimation procedures that were relatively insensitive to small departures from the assumed model. Huber has in many ways been ahead of his time. While remaining connected to the theoretical ideas from the early part of his career, his interests have shifted to computational and graphical statistics. I never took Huber’s class on data analysis–he left Harvard while I was still in graduate school–but fortunately I have an opportunity to learn his lessons now, as he has just released a book, “Data Analysis: What Can Be Learned from the Past 50 Years.” The book puts together a few articles published in the past 15 years, along with some new material. Many of the examples are decades old, which is appropriate given that Huber is reviewing f

5 0.70221889 427 andrew gelman stats-2010-11-23-Bayesian adaptive methods for clinical trials


6 0.70161366 10 andrew gelman stats-2010-04-29-Alternatives to regression for social science predictions

7 0.70088655 134 andrew gelman stats-2010-07-08-“What do you think about curved lines connecting discrete data-points?”

8 0.70084542 1948 andrew gelman stats-2013-07-21-Bayes related

9 0.70021504 1718 andrew gelman stats-2013-02-11-Toward a framework for automatic model building

10 0.68853158 1253 andrew gelman stats-2012-04-08-Technology speedup graph

11 0.68836838 1609 andrew gelman stats-2012-12-06-Stephen Kosslyn’s principles of graphics and one more: There’s no need to cram everything into a single plot

12 0.6852811 2273 andrew gelman stats-2014-03-29-References (with code) for Bayesian hierarchical (multilevel) modeling and structural equation modeling

13 0.68327677 575 andrew gelman stats-2011-02-15-What are the trickiest models to fit?

14 0.68087876 1283 andrew gelman stats-2012-04-26-Let’s play “Guess the smoother”!

15 0.67883998 2254 andrew gelman stats-2014-03-18-Those wacky anti-Bayesians used to be intimidating, but now they’re just pathetic

16 0.67705452 1094 andrew gelman stats-2011-12-31-Using factor analysis or principal components analysis or measurement-error models for biological measurements in archaeology?

17 0.67332482 1739 andrew gelman stats-2013-02-26-An AI can build and try out statistical models using an open-ended generative grammar

18 0.67243958 1443 andrew gelman stats-2012-08-04-Bayesian Learning via Stochastic Gradient Langevin Dynamics

19 0.66914821 1808 andrew gelman stats-2013-04-17-Excel-bashing

20 0.66780996 244 andrew gelman stats-2010-08-30-Useful models, model checking, and external validation: a mini-discussion


similar blogs computed by the lda model

lda for this blog:

topicId topicWeight

[(6, 0.01), (15, 0.02), (16, 0.088), (21, 0.014), (24, 0.105), (30, 0.016), (36, 0.235), (53, 0.025), (58, 0.01), (77, 0.055), (86, 0.06), (89, 0.013), (99, 0.259)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.96313512 176 andrew gelman stats-2010-08-02-Information is good

Introduction: Washington Post and Slate reporter Anne Applebaum wrote a dismissive column about Wikileaks, saying that they “offer nothing more than raw data.” Applebaum argues that “The notion that the Internet can replace traditional news-gathering has just been revealed to be a myth. . . . without more journalism, more investigation, more work, these documents just don’t matter that much.” Fine. But don’t undervalue the role of mere data! The usual story is that we don’t get to see the raw data underlying newspaper stories. Wikileaks and other crowdsourced data can be extremely useful, whether or not they replace “traditional news-gathering.”

2 0.9578917 1797 andrew gelman stats-2013-04-10-“Proposition and experiment”

Introduction: Anna Lena Phillips writes : I. Many people will not, of their own accord, look at a poem. II. Millions of people will, of their own accord, spend lots and lots of time looking at photographs of cats. III. Therefore, earlier this year, I concluded that the best strategy for increasing the number of viewers for poems would be to print them on top of photographs of cats. IV. I happen to like looking at both poems and cats. V. So this is, for me, a win-win situation. VI. Fortunately, my own cat is a patient model, and (if I am to be believed) quite photogenic. VII. The aforementioned cat is Tisko Tansi, small hero. VII. Thus I present to you (albeit in digital rather than physical form) an Endearments broadside, featuring a poem that originally appeared in BlazeVOX spring 2011. VIII. If you want to share a copy of this image, please ask first. If you want a real copy, you can ask about that too. She follows up with an image of a cat, on which is superimposed a short

3 0.93552816 1476 andrew gelman stats-2012-08-30-Stan is fast

Introduction: 10,000 iterations for 4 chains on the (precompiled) efficiently-parameterized 8-schools model: > date () [1] "Thu Aug 30 22:12:53 2012" > fit3 <- stan (fit=fit2, data = schools_dat, iter = 1e4, n_chains = 4) SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1). Iteration: 10000 / 10000 [100%] (Sampling) SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2). Iteration: 10000 / 10000 [100%] (Sampling) SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3). Iteration: 10000 / 10000 [100%] (Sampling) SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4). Iteration: 10000 / 10000 [100%] (Sampling) > date () [1] "Thu Aug 30 22:12:55 2012" > print (fit3) Inference for Stan model: anon_model. 4 chains: each with iter=10000; warmup=5000; thin=1; 10000 iterations saved. mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat mu 8.0 0.1 5.1 -2.0 4.7 8.0 11.3 18.4 4032 1 tau 6.7 0.1 5.6 0.3 2.5 5.4 9.3 21.2 2958 1 eta[1] 0.4 0.0 0.9 -1.5 -0

4 0.93471676 1478 andrew gelman stats-2012-08-31-Watercolor regression

Introduction: Solomon Hsiang writes: Two small follow-ups based on the discussion (the second/bigger one is to address your comment about the 95% CI edges). 1. I realized that if we plot the confidence intervals as a solid color that fades (eg. using the “fixed ink” scheme from before) we can make sure the regression line also has heightened visual weight where confidence is high by plotting the line white. This makes the contrast (and thus visual weight) between the regression line and the CI highest when the CI is narrow and dark. As the CI fade near the edges, so does the contrast with the regression line. This is a small adjustment, but I like it because it is so simple and it makes the graph much nicer. (see “visually_weighted_fill_reverse” attached). My posted code has been updated to do this automatically. 2. You and your readers didn’t like that the edges of the filled CI were so sharp and arbitrary. But I didn’t like that the contrast between the spaghetti lines and the background

5 0.93142331 551 andrew gelman stats-2011-02-02-Obama and Reagan, sitting in a tree, etc.

Introduction: I saw this picture staring at me from the newsstand the other day: Here’s the accompanying article, by Michael Scherer and Michael Duffy, which echoes some of the points I made a few months ago , following the midterm election: Why didn’t Obama do a better job of leveling with the American people? In his first months in office, why didn’t he anticipate the example of the incoming British government and warn people of economic blood, sweat, and tears? Why did his economic team release overly-optimistic graphs such as shown here? Wouldn’t it have been better to have set low expectations and then exceed them, rather than the reverse? I don’t know, but here’s my theory. When Obama came into office, I imagine one of his major goals was to avoid repeating the experiences of Bill Clinton and Jimmy Carter in their first two years. Clinton, you may recall, was elected with less then 50% of the vote, was never given the respect of a “mandate” by congressional Republicans, wasted

6 0.92986965 2242 andrew gelman stats-2014-03-10-Stan Model of the Week: PK Calculation of IV and Oral Dosing

same-blog 7 0.90429968 101 andrew gelman stats-2010-06-20-“People with an itch to scratch”

8 0.89538103 370 andrew gelman stats-2010-10-25-Who gets wedding announcements in the Times?

9 0.87705255 1470 andrew gelman stats-2012-08-26-Graphs showing regression uncertainty: the code!

10 0.87460703 883 andrew gelman stats-2011-09-01-Arrow’s theorem update

11 0.87135786 1847 andrew gelman stats-2013-05-08-Of parsing and chess

12 0.86852527 1217 andrew gelman stats-2012-03-17-NSF program “to support analytic and methodological research in support of its surveys”

13 0.86558366 415 andrew gelman stats-2010-11-15-The two faces of Erving Goffman: Subtle observer of human interactions, and Smug organzation man

14 0.86168802 2105 andrew gelman stats-2013-11-18-What’s my Kasparov number?

15 0.85970402 1898 andrew gelman stats-2013-06-14-Progress! (on the understanding of the role of randomization in Bayesian inference)

16 0.84595996 55 andrew gelman stats-2010-05-27-In Linux, use jags() to call Jags instead of using bugs() to call OpenBugs

17 0.83840674 1666 andrew gelman stats-2013-01-10-They’d rather be rigorous than right

18 0.82091188 998 andrew gelman stats-2011-11-08-Bayes-Godel

19 0.81373525 619 andrew gelman stats-2011-03-19-If a comment is flagged as spam, it will disappear forever

20 0.81043541 1900 andrew gelman stats-2013-06-15-Exploratory multilevel analysis when group-level variables are of importance