andrew_gelman_stats-2011-964 knowledge-graph by maker-knowledge-mining

964 andrew gelman stats-2011-10-19-An interweaving-transformation strategy for boosting MCMC efficiency


meta information for this blog

Source: html

Introduction: Yaming Yu and Xiao-Li Meng write in with a cool new idea for improving the efficiency of Gibbs and Metropolis in multilevel models: For a broad class of multilevel models, there exist two well-known competing parameterizations, the centered parameterization (CP) and the non-centered parameterization (NCP), for effective MCMC implementation. Much literature has been devoted to the questions of when to use which and how to compromise between them via partial CP/NCP. This article introduces an alternative strategy for boosting MCMC efficiency via simply interweaving—but not alternating—the two parameterizations. This strategy has the surprising property that failure of both the CP and NCP chains to converge geometrically does not prevent the interweaving algorithm from doing so. It achieves this seemingly magical property by taking advantage of the discordance of the two parameterizations, namely, the sufficiency of CP and the ancillarity of NCP, to substantially reduce the Markovian dependence, especially when the original CP and NCP form a “beauty and beast” pair (i.e., when one chain mixes far more rapidly than the other). …
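The abstract is compact, so a concrete toy may help. Below is a minimal sketch, not Yu and Meng's code, of one interweaving sweep for the simplest hierarchical normal model: y_j | theta_j ~ N(theta_j, sigma^2), theta_j | mu ~ N(mu, tau^2), with sigma and tau known and a flat prior on mu. Each iteration draws mu twice, once against the sufficient augmentation theta (CP) and once against the ancillary augmentation eta = (theta - mu) / tau (NCP), linked by the deterministic transformation; all variable names and settings here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
J, sigma, tau = 8, 1.0, 0.5
y = rng.normal(1.0, np.hypot(sigma, tau), size=J)   # fake data with true mu = 1

def interweaving_gibbs(y, n_iter=2000):
    J = len(y)
    mu = 0.0
    draws = np.empty(n_iter)
    prec = 1 / sigma**2 + 1 / tau**2   # conditional precision of each theta_j
    for t in range(n_iter):
        # CP half-step: draw the "sufficient" augmentation theta, then mu | theta
        theta = rng.normal((y / sigma**2 + mu / tau**2) / prec, np.sqrt(1 / prec))
        mu = rng.normal(theta.mean(), tau / np.sqrt(J))
        # interweave: re-express the current theta in NCP ("ancillary") coordinates
        eta = (theta - mu) / tau
        # NCP half-step: redraw mu | eta, y, then map back to CP coordinates
        mu = rng.normal((y - tau * eta).mean(), sigma / np.sqrt(J))
        theta = mu + tau * eta
        draws[t] = mu
    return draws

draws = interweaving_gibbs(y)
print(draws[500:].mean(), draws[500:].std())   # posterior mean and sd of mu
```

Dropping the NCP half-step recovers the plain centered Gibbs sampler; the interwoven chain tends to mix well across values of tau/sigma where one parameterization or the other struggles on its own.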


Summary: the most important sentences, as generated by the tf-idf model

sentIndex sentText sentNum sentScore

1 Much literature has been devoted to the questions of when to use which and how to compromise between them via partial CP/NCP. [sent-2, score-0.173]

2 This article introduces an alternative strategy for boosting MCMC efficiency via simply interweaving—but not alternating—the two parameterizations. [sent-3, score-0.489]

3 This strategy has the surprising property that failure of both the CP and NCP chains to converge geometrically does not prevent the interweaving algorithm from doing so. [sent-4, score-0.604]

4 It achieves this seemingly magical property by taking advantage of the discordance of the two parameterizations, namely, the sufficiency of CP and the ancillarity of NCP, to substantially reduce the Markovian dependence, especially when the original CP and NCP form a “beauty and beast” pair (i.e. [sent-5, score-0.725]

5 when one chain mixes far more rapidly than the other). [sent-7, score-0.059]

6 The ancillarity–sufficiency reformulation of the CP–NCP dichotomy allows us to borrow insight from the well-known Basu’s theorem on the independence of (complete) sufficient and ancillary statistics, albeit a Bayesian version of Basu’s theorem is currently lacking. [sent-8, score-0.578]

7 A bevy of open questions are presented, from the mysterious but exceedingly suggestive connections between ASIS and fiducial/structural inferences to nested ASIS for further boosting MCMC efficiency. [sent-10, score-0.469]

8 I’m reminded of the folk theorem and the Pinocchio principle. [sent-12, score-0.126]
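For readers curious how per-sentence scores like [sent-3, score-0.489] might be produced, here is a hedged sketch: vectorize the post's sentences with tf-idf and rank them by total weight. The actual pipeline behind this page is unspecified, so the vectorizer settings and the score definition are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np

sentences = [   # in practice, the full post split into sentences
    "Much literature has been devoted to the questions of when to use which.",
    "This article introduces an alternative strategy for boosting MCMC efficiency.",
    "I am reminded of the folk theorem and the Pinocchio principle.",
]
X = TfidfVectorizer(stop_words="english").fit_transform(sentences)
scores = np.asarray(X.sum(axis=1)).ravel()   # one crude importance score per sentence
for s, text in sorted(zip(scores, sentences), reverse=True):
    print(f"{s:.3f}  {text}")
```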


similar blogs computed by the tf-idf model

tf-idf weights for this blog:

wordName wordTfidf (topN-words)

[('ncp', 0.406), ('cp', 0.349), ('asis', 0.325), ('ancillarity', 0.244), ('interweaving', 0.244), ('sufficiency', 0.244), ('basu', 0.148), ('strategy', 0.146), ('boosting', 0.133), ('mcmc', 0.133), ('yu', 0.129), ('theorem', 0.126), ('parameterizations', 0.125), ('parameterization', 0.109), ('meng', 0.104), ('property', 0.103), ('efficiency', 0.089), ('bevy', 0.074), ('suggestive', 0.074), ('jcgs', 0.074), ('alternating', 0.074), ('competitiveness', 0.074), ('photon', 0.074), ('pinocchio', 0.074), ('reformulation', 0.074), ('ancillary', 0.07), ('achieves', 0.07), ('telescope', 0.07), ('beast', 0.07), ('exceedingly', 0.07), ('dichotomy', 0.067), ('via', 0.065), ('magical', 0.064), ('xl', 0.064), ('multilevel', 0.062), ('mysterious', 0.059), ('nested', 0.059), ('mixes', 0.059), ('borrow', 0.059), ('probit', 0.058), ('lifetime', 0.057), ('albeit', 0.056), ('converge', 0.056), ('introduces', 0.056), ('intensity', 0.055), ('prevent', 0.055), ('compromise', 0.055), ('detecting', 0.054), ('cox', 0.054), ('devoted', 0.053)]
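The (wordName, wordTfidf) pairs above, and the simValue numbers in the list that follows, look like tf-idf weights and cosine similarities between document vectors. A sketch under that assumption, with a toy three-post corpus standing in for the full blog archive:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

posts = {                      # toy stand-ins for the full blog archive
    964:  "interweaving cp ncp asis mcmc efficiency multilevel",
    1841: "folk theorem statistical computing convergence mixing",
    1735: "fake data simulation mcmc two stage regression",
}
ids = list(posts)
vec = TfidfVectorizer()
X = vec.fit_transform(posts[i] for i in ids)
terms = vec.get_feature_names_out()

# (wordName, wordTfidf) pairs for post 964, as in the list above
row = X[ids.index(964)].toarray().ravel()
print(sorted(zip(terms, row.round(3)), key=lambda p: -p[1])[:5])

# simValue-style scores: cosine similarity of post 964 against every post
print(dict(zip(ids, cosine_similarity(X[ids.index(964)], X).ravel().round(3))))
```

Cosine similarity of a document with itself is 1 up to floating-point error, which would explain the same-blog score of 1.0000001 at the top of the list below.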

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 964 andrew gelman stats-2011-10-19-An interweaving-transformation strategy for boosting MCMC efficiency


2 0.079401828 1841 andrew gelman stats-2013-05-04-The Folk Theorem of Statistical Computing

Introduction: From an email I received the other day: Things are going much better now — it’s interesting, it feels like with both of my models, parameters are slow to converge or get “stuck” and have trouble mixing when the model is somehow misspecified. See here for a statement of the folk theorem.

3 0.075183101 1735 andrew gelman stats-2013-02-24-F-f-f-fake data

Introduction: Tiago Fragoso writes: Suppose I fit a two-stage regression model: Y = a + bx + e, with a = cw + d + e1. I could fit it all in one step by using MCMC, for example (my model is more complicated than that, so I’ll have to do it by MCMC). However, I could fit the first regression only using MCMC, because those estimates are hard to obtain, and perform the second regression using least squares or a separate MCMC. So there’s a ‘one step’ inference based on doing it all at the same time and a ‘two step’ inference by fitting one and using the estimates in the further steps. What is gained or lost between the two? Has anything been done on this question? My response: Rather than answering your particular question, I’ll give you my generic answer, which is to simulate fake data from your model, then fit your model both ways and see how the results differ. Repeat the simulation a few thousand times and you can make all the statistical comparisons you like. (A minimal sketch of this check appears after this list.)

4 0.07258974 992 andrew gelman stats-2011-11-05-Deadwood in the math curriculum

Introduction: Mark Palko asks : What are the worst examples of curriculum dead wood? Here’s the background: One of the first things that hit me [Palko] when I started teaching high school math was how much material there was to cover. . . . The most annoying part, though, was the number of topics that could easily have been cut, thus giving the students the time to master the important skills and concepts. The example that really stuck with me was synthetic division, a more concise but less intuitive way of performing polynomial long division. Both of these topics are pretty much useless in daily life but polynomial long division does, at least, give the student some insight into the relationship between polynomials and familiar base-ten numbers. Synthetic division has no such value; it’s just a faster but less interesting way of doing something you’ll never have to do. I started asking hardcore math people — mathematicians, statisticians, physicists, rocket scientists — if they’d ever u…

5 0.067172565 1443 andrew gelman stats-2012-08-04-Bayesian Learning via Stochastic Gradient Langevin Dynamics

Introduction: Burak Bayramli writes: In this paper by Sungjin Ahn, Anoop Korattikara, and Max Welling and this paper by Welling and Yee Whye Teh, there are some arguments on big data and the use of MCMC. Both papers have suggested improvements to speed up MCMC computations. I was wondering what your thoughts were, especially on this paragraph: When a dataset has a billion data-cases (as is not uncommon these days) MCMC algorithms will not even have generated a single (burn-in) sample when a clever learning algorithm based on stochastic gradients may already be making fairly good predictions. In fact, the intriguing results of Bottou and Bousquet (2008) seem to indicate that in terms of “number of bits learned per unit of computation”, an algorithm as simple as stochastic gradient descent is almost optimally efficient. We therefore argue that for Bayesian methods to remain useful in an age when the datasets grow at an exponential rate, they need to embrace the ideas of the stochastic optimiz… (A minimal SGLD sketch appears after this list.)

6 0.064719662 419 andrew gelman stats-2010-11-18-Derivative-based MCMC as a breakthrough technique for implementing Bayesian statistics

7 0.064709768 1406 andrew gelman stats-2012-07-05-Xiao-Li Meng and Xianchao Xie rethink asymptotics

8 0.060379546 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence

9 0.057709582 773 andrew gelman stats-2011-06-18-Should we always be using the t and robit instead of the normal and logit?

10 0.055934846 861 andrew gelman stats-2011-08-19-Will Stan work well with 40×40 matrices?

11 0.055254746 1527 andrew gelman stats-2012-10-10-Another reason why you can get good inferences from a bad model

12 0.0546223 1309 andrew gelman stats-2012-05-09-The first version of my “inference from iterative simulation using parallel sequences” paper!

13 0.053178407 1287 andrew gelman stats-2012-04-28-Understanding simulations in terms of predictive inference?

14 0.049444251 746 andrew gelman stats-2011-06-05-An unexpected benefit of Arrow’s other theorem

15 0.048981354 1469 andrew gelman stats-2012-08-25-Ways of knowing

16 0.047693849 3 andrew gelman stats-2010-04-26-Bayes in the news…in a somewhat frustrating way

17 0.047027402 2231 andrew gelman stats-2014-03-03-Running into a Stan Reference by Accident

18 0.046880096 1848 andrew gelman stats-2013-05-09-A tale of two discussion papers

19 0.046822846 109 andrew gelman stats-2010-06-25-Classics of statistics

20 0.045510016 1520 andrew gelman stats-2012-10-03-Advice that’s so eminently sensible but so difficult to follow
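Two of the excerpts above describe procedures concrete enough to sketch. First, the fake-data check Gelman recommends in entry 3: simulate from a known two-stage model, fit it both ways, and compare how each recovers the truth. The model follows the email's notation (y = a + bx + e with a = cw + d + e1, where a varies by group); the group sizes, error scales, and the pooled-OLS stand-in for "one step" fitting are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
b, c, d = 2.0, 1.5, 0.5          # true parameters
J, n = 30, 20                    # groups, observations per group

def one_rep():
    w = rng.normal(size=J)
    a = d + c * w + rng.normal(0, 0.3, size=J)                 # a = cw + d + e1
    x = rng.normal(size=(J, n))
    y = a[:, None] + b * x + rng.normal(0, 1.0, size=(J, n))   # y = a + bx + e

    # "two step": per-group OLS intercepts, then regress those estimates on w
    a_hat = np.array([np.polyfit(x[j], y[j], 1)[1] for j in range(J)])
    c_two = np.polyfit(w, a_hat, 1)[0]

    # "one step" stand-in: pooled least squares of y on an intercept, w, and x
    W = np.column_stack([np.ones(J * n), np.repeat(w, n), x.ravel()])
    c_one = np.linalg.lstsq(W, y.ravel(), rcond=None)[0][1]
    return c_one, c_two

reps = np.array([one_rep() for _ in range(500)])
print("bias (one step, two step):", (reps.mean(axis=0) - c).round(3))
print("sd   (one step, two step):", reps.std(axis=0).round(3))
```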
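Second, the stochastic-gradient MCMC discussed in entry 5. The Welling and Teh update adds appropriately scaled Gaussian noise to a minibatch stochastic-gradient step, so the iterates approximately sample the posterior instead of collapsing to a point estimate. This minimal sketch estimates a Gaussian mean; the fixed small step size stands in for the decreasing schedule the paper uses, and all settings are arbitrary choices rather than values from the papers cited.

```python
import numpy as np

rng = np.random.default_rng(2)
N, m, eps = 100_000, 100, 1e-5     # data size, minibatch size, step size
x = rng.normal(3.0, 1.0, size=N)   # data; model x_i ~ N(theta, 1), prior theta ~ N(0, 1)

theta, draws = 0.0, []
for t in range(2000):
    batch = x[rng.integers(0, N, size=m)]
    # unbiased estimate of grad log posterior: prior term + rescaled minibatch term
    grad = -theta + (N / m) * np.sum(batch - theta)
    theta += 0.5 * eps * grad + rng.normal(0.0, np.sqrt(eps))   # Langevin noise
    draws.append(theta)

print(np.mean(draws[1000:]), np.std(draws[1000:]))   # approx N(x.mean(), 1/N)
```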


similar blogs computed by the LSI model

LSI topic weights for this blog:

topicId topicWeight

[(0, 0.076), (1, 0.052), (2, -0.013), (3, 0.015), (4, 0.013), (5, 0.028), (6, -0.017), (7, -0.025), (8, 0.015), (9, 0.017), (10, 0.013), (11, 0.008), (12, -0.028), (13, 0.003), (14, -0.007), (15, -0.014), (16, 0.015), (17, -0.003), (18, -0.013), (19, -0.002), (20, 0.005), (21, -0.007), (22, 0.003), (23, -0.013), (24, -0.002), (25, -0.006), (26, -0.012), (27, 0.013), (28, 0.017), (29, -0.005), (30, -0.022), (31, 0.007), (32, 0.011), (33, -0.001), (34, 0.005), (35, -0.008), (36, -0.02), (37, 0.002), (38, 0.008), (39, 0.002), (40, -0.006), (41, 0.019), (42, -0.007), (43, -0.016), (44, 0.02), (45, -0.024), (46, -0.013), (47, -0.001), (48, -0.003), (49, -0.017)]
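A 50-component vector like the one above is consistent with latent semantic indexing: truncated SVD applied to the tf-idf document-term matrix, with each document represented by its coordinates on the singular directions. A sketch under that assumption (two components here, for a toy corpus):

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "interweaving cp ncp mcmc efficiency",
    "folk theorem statistical computing",
    "fake data simulation mcmc",
]
X = TfidfVectorizer().fit_transform(docs)
lsi = TruncatedSVD(n_components=2, random_state=0)   # the list above uses 50
Z = lsi.fit_transform(X)                             # rows = documents, cols = topics
print(list(enumerate(Z[0].round(3))))                # (topicId, topicWeight) pairs
```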

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95867658 964 andrew gelman stats-2011-10-19-An interweaving-transformation strategy for boosting MCMC efficiency


2 0.85021114 1739 andrew gelman stats-2013-02-26-An AI can build and try out statistical models using an open-ended generative grammar

Introduction: David Duvenaud writes: I’ve been following your recent discussions about how an AI could do statistics [see also here]. I was especially excited about your suggestion for new statistical methods using “a language-like approach to recursively creating new models from a specified list of distributions and transformations, and an automatic approach to checking model fit.” Your discussion of these ideas was exciting to me and my colleagues because we recently did some work taking a step in this direction, automatically searching through a grammar over Gaussian process regression models. Roger Grosse previously did the same thing, but over matrix decomposition models, using held-out predictive likelihood to check model fit. These are both examples of automatic Bayesian model-building by a search over more and more complex models, as you suggested. One nice thing is that both grammars include lots of standard models for free, and they seem to work pretty well, although the…

3 0.80584359 1406 andrew gelman stats-2012-07-05-Xiao-Li Meng and Xianchao Xie rethink asymptotics

Introduction: In an article catchily entitled, “I got more data, my model is more refined, but my estimator is getting worse! Am I just dumb?”, Meng and Xie write: Possibly, but more likely you are merely a victim of conventional wisdom. More data or better models by no means guarantee better estimators (e.g., with a smaller mean squared error), when you are not following probabilistically principled methods such as MLE (for large samples) or Bayesian approaches. Estimating equations are particularly vulnerable in this regard, almost a necessary price for their robustness. These points will be demonstrated via common tasks of estimating regression parameters and correlations, under simple models such as bivariate normal and ARCH(1). Some general strategies for detecting and avoiding such pitfalls are suggested, including checking for self-efficiency (Meng, 1994, Statistical Science) and adopting a guiding working model. Using the example of estimating the autocorrelation ρ under a statio…

4 0.79884309 1856 andrew gelman stats-2013-05-14-GPstuff: Bayesian Modeling with Gaussian Processes

Introduction: I think it’s part of my duty as a blogger to intersperse, along with the steady flow of jokes, rants, and literary criticism, some material that will actually be useful to you. So here goes. Jarno Vanhatalo, Jaakko Riihimäki, Jouni Hartikainen, Pasi Jylänki, Ville Tolvanen, and Aki Vehtari write: The GPstuff toolbox is a versatile collection of Gaussian process models and computational tools required for Bayesian inference. The tools include, among others, various inference methods, sparse approximations and model assessment methods. We can actually now fit Gaussian processes in Stan. But for big problems (or even moderately-sized problems), full Bayes can be slow. GPstuff uses EP, which is faster. At some point we’d like to implement EP in Stan. (Right now we’re working with Dave Blei to implement VB.) GPstuff really works. I saw Aki use it to fit a nonparametric version of the Bangladesh well-switching example in ARM. He was sitting in his office and just whip…

5 0.78697115 1459 andrew gelman stats-2012-08-15-How I think about mixture models

Introduction: Larry Wasserman refers to finite mixture models as “beasts” and jokes that they “should be avoided at all costs.” I’ve thought a lot about mixture models, ever since using them in an analysis of voting patterns that was published in 1990. First off, I’d like to say that our model was useful, so I’d prefer not to pay the cost of avoiding it. (For a quick description of our mixture model and its context, see pp. 379-380 of my article in the Jim Berger volume.) Actually, our case was particularly difficult because we were not even fitting a mixture model to data; we were fitting it to latent data and using the model to perform partial pooling. My difficulties in trying to fit this model inspired our discussion of mixture models in Bayesian Data Analysis (page 109 in the second edition, in the section on “Counterexamples to the theorems”). I agree with Larry that if you’re fitting a mixture model, it’s good to be aware of the problems that arise if you try to estimate…

6 0.78424573 575 andrew gelman stats-2011-02-15-What are the trickiest models to fit?

7 0.78085411 1374 andrew gelman stats-2012-06-11-Convergence Monitoring for Non-Identifiable and Non-Parametric Models

8 0.77940339 1682 andrew gelman stats-2013-01-19-R package for Bayes factors

9 0.77423364 346 andrew gelman stats-2010-10-16-Mandelbrot and Akaike: from taxonomy to smooth runways (pioneering work in fractals and self-similarity)

10 0.76816732 1468 andrew gelman stats-2012-08-24-Multilevel modeling and instrumental variables

11 0.75159049 861 andrew gelman stats-2011-08-19-Will Stan work well with 40×40 matrices?

12 0.74754202 1041 andrew gelman stats-2011-12-04-David MacKay and Occam’s Razor

13 0.74621189 1991 andrew gelman stats-2013-08-21-BDA3 table of contents (also a new paper on visualization)

14 0.74546373 778 andrew gelman stats-2011-06-24-New ideas on DIC from Martyn Plummer and Sumio Watanabe

15 0.74510574 1392 andrew gelman stats-2012-06-26-Occam

16 0.74316502 773 andrew gelman stats-2011-06-18-Should we always be using the t and robit instead of the normal and logit?

17 0.7415033 776 andrew gelman stats-2011-06-22-Deviance, DIC, AIC, cross-validation, etc

18 0.73996001 1983 andrew gelman stats-2013-08-15-More on AIC, WAIC, etc

19 0.73978478 288 andrew gelman stats-2010-09-21-Discussion of the paper by Girolami and Calderhead on Bayesian computation

20 0.73595297 1482 andrew gelman stats-2012-09-04-Model checking and model understanding in machine learning


similar blogs computed by the LDA model

LDA topic weights for this blog:

topicId topicWeight

[(5, 0.02), (9, 0.025), (13, 0.015), (15, 0.011), (16, 0.046), (24, 0.122), (30, 0.014), (36, 0.048), (39, 0.012), (43, 0.013), (45, 0.017), (53, 0.013), (63, 0.014), (76, 0.019), (80, 0.31), (86, 0.037), (97, 0.012), (99, 0.15)]
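The LDA weights above are sparser than the LSI ones because LDA assigns each document a proper distribution over topics, most of which are near zero and apparently omitted from the list. A sketch of how such per-document topic proportions might be computed; the topic count and hyperparameters are assumptions:

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "interweaving cp ncp mcmc efficiency",
    "folk theorem statistical computing",
    "fake data simulation mcmc",
]
counts = CountVectorizer().fit_transform(docs)       # LDA wants raw counts, not tf-idf
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(counts)                    # per-document topic proportions
print([(k, round(w, 3)) for k, w in enumerate(theta[0]) if w > 0.01])
```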

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.87506157 964 andrew gelman stats-2011-10-19-An interweaving-transformation strategy for boosting MCMC efficiency


2 0.80000597 1029 andrew gelman stats-2011-11-26-“To Rethink Sprawl, Start With Offices”

Introduction: According to this op-ed by Louise Mozingo, the fashion for suburban corporate parks is seventy years old: In 1942 the AT&T Bell Telephone Laboratories moved from its offices in Lower Manhattan to a new, custom-designed facility on 213 acres outside Summit, N.J. The location provided space for laboratories and quiet for acoustical research, and new features: parking lots that allowed scientists and engineers to drive from their nearby suburban homes, a spacious cafeteria and lounge and, most surprisingly, views from every window of a carefully tended pastoral landscape designed by the Olmsted brothers, sons of the designer of Central Park. Corporate management never saw the city center in the same way again. Bell Labs initiated a tide of migration of white-collar workers, especially as state and federal governments conveniently extended highways into the rural edge. Just to throw some Richard Florida in the mix: Back in 1990, I turned down a job offer from Bell Labs, larg…

3 0.71310818 470 andrew gelman stats-2010-12-16-“For individuals with wine training, however, we find indications of a positive relationship between price and enjoyment”

Introduction: The title of this blog post quotes the second line of the abstract of Goldstein et al.’s much ballyhooed 2008 tech report, Do More Expensive Wines Taste Better? Evidence from a Large Sample of Blind Tastings . The first sentence of the abstract is Individuals who are unaware of the price do not derive more enjoyment from more expensive wine. Perhaps not surprisingly, given the easy target wine snobs make, the popular press has picked up on the first sentence of the tech report. For example, the Freakonomics blog/radio entry of the same name quotes the first line, ignores the qualification, then concludes Wishing you the happiest of holiday seasons, and urging you to spend $15 instead of $50 on your next bottle of wine. Go ahead, take the money you save and blow it on the lottery. In case you’re wondering about whether to buy me a cheap or expensive bottle of wine, keep in mind I’ve had classical “wine training”. After ten minutes of training with some side by…

4 0.70405424 730 andrew gelman stats-2011-05-25-Rechecking the census

Introduction: Sam Roberts writes : The Census Bureau [reported] that though New York City’s population reached a record high of 8,175,133 in 2010, the gain of 2 percent, or 166,855 people, since 2000 fell about 200,000 short of what the bureau itself had estimated. Public officials were incredulous that a city that lures tens of thousands of immigrants each year and where a forest of new buildings has sprouted could really have recorded such a puny increase. How, they wondered, could Queens have grown by only one-tenth of 1 percent since 2000? How, even with a surge in foreclosures, could the number of vacant apartments have soared by nearly 60 percent in Queens and by 66 percent in Brooklyn? That does seem a bit suspicious. So the newspaper did its own survey: Now, a house-to-house New York Times survey of three representative square blocks where the Census Bureau said vacancies had increased and the population had declined since 2000 suggests that the city’s outrage is somewhat ju…

5 0.68960655 1027 andrew gelman stats-2011-11-25-Note to student journalists: Google is your friend

Introduction: A student journalist called me with some questions about when the U.S. would have a female president. At one point she asked if there were any surveys of whether people would vote for a woman. I suggested she try Google. I was by my computer anyway so typed “what percentage of americans would vote for a woman president” (without the quotation marks), and the very first hit was this from Gallup, from 2007: The Feb. 9-11, 2007, poll asked Americans whether they would vote for “a generally well-qualified” presidential candidate nominated by their party with each of the following characteristics: Jewish, Catholic, Mormon, an atheist, a woman, black, Hispanic, homosexual, 72 years of age, and someone married for the third time. Between now and the 2008 political conventions, there will be discussion about the qualifications of presidential candidates — their education, age, religion, race, and so on. If your party nominated a generally well-qualified person for president who happene…

6 0.67700934 1494 andrew gelman stats-2012-09-13-Watching the sharks jump

7 0.66962457 138 andrew gelman stats-2010-07-10-Creating a good wager based on probability estimates

8 0.65848911 1063 andrew gelman stats-2011-12-16-Suspicious histogram bars

9 0.64434093 1747 andrew gelman stats-2013-03-03-More research on the role of puzzles in processing data graphics

10 0.63288814 642 andrew gelman stats-2011-04-02-Bill James and the base-rate fallacy

11 0.61711395 137 andrew gelman stats-2010-07-10-Cost of communicating numbers

12 0.59885693 2119 andrew gelman stats-2013-12-01-Separated by a common blah blah blah

13 0.58554995 1985 andrew gelman stats-2013-08-16-Learning about correlations using cross-sectional and over-time comparisons between and within countries

14 0.58110619 227 andrew gelman stats-2010-08-23-Visualization magazine

15 0.5806244 384 andrew gelman stats-2010-10-31-Two stories about the election that I don’t believe

16 0.57636118 1430 andrew gelman stats-2012-07-26-Some thoughts on survey weighting

17 0.57428324 795 andrew gelman stats-2011-07-10-Aleks says this is the future of visualization

18 0.57361174 140 andrew gelman stats-2010-07-10-SeeThroughNY

19 0.57187617 1402 andrew gelman stats-2012-07-01-Ice cream! and temperature

20 0.55364466 2076 andrew gelman stats-2013-10-24-Chasing the noise: W. Edwards Deming would be spinning in his grave