andrew_gelman_stats andrew_gelman_stats-2012 andrew_gelman_stats-2012-1459 knowledge-graph by maker-knowledge-mining

1459 andrew gelman stats-2012-08-15-How I think about mixture models


meta info for this blog

Source: html

Introduction: Larry Wasserman refers to finite mixture models as “beasts” and jokingly writes that they “should be avoided at all costs.” I’ve thought a lot about mixture models, ever since using them in an analysis of voting patterns that was published in 1990. First off, I’d like to say that our model was useful, so I’d prefer not to pay the cost of avoiding it. For a quick description of our mixture model and its context, see pp. 379-380 of my article in the Jim Berger volume. Actually, our case was particularly difficult because we were not even fitting a mixture model to data; we were fitting it to latent data and using the model to perform partial pooling. My difficulties in trying to fit this model inspired our discussion of mixture models in Bayesian Data Analysis (page 109 in the second edition, in the section on “Counterexamples to the theorems”). I agree with Larry that if you’re fitting a mixture model, it’s good to be aware of the problems that arise if you try to estimate its parameters using maximum likelihood or Bayes with flat priors.


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Larry Wasserman refers to finite mixture models as “beasts” and jokingly writes that they “should be avoided at all costs.” [sent-1, score-1.219]

2 I’ve thought a lot about mixture models, ever since using them in an analysis of voting patterns that was published in 1990. [sent-2, score-0.748]

3 First off, I’d like to say that our model was useful so I’d prefer not to pay the cost of avoiding it. [sent-3, score-0.211]

4 For a quick description of our mixture model and its context, see pp. [sent-4, score-0.835]

5 Actually, our case was particularly difficult because we were not even fitting a mixture model to data; we were fitting it to latent data and using the model to perform partial pooling. [sent-6, score-1.484]

6 My difficulties in trying to fit this model inspired our discussion of mixture models in Bayesian Data Analysis (page 109 in the second edition, in the section on “Counterexamples to the theorems”). [sent-7, score-1.261]

7 I agree with Larry that if you’re fitting a mixture model, it’s good to be aware of the problems that arise if you try to estimate its parameters using maximum likelihood or Bayes with flat priors. [sent-8, score-0.933]
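
To spell out the standard degeneracy behind that warning (a textbook illustration, not anything specific to the 1990 application): for data y_1, ..., y_n, the two-component normal mixture likelihood is

p(y | \lambda, \mu, \sigma) = \prod_{i=1}^{n} \big[ \lambda \, N(y_i | \mu_1, \sigma_1^2) + (1 - \lambda) \, N(y_i | \mu_2, \sigma_2^2) \big].

Set \mu_1 = y_1 and let \sigma_1 \to 0: the i = 1 factor grows like \lambda / (\sqrt{2\pi}\,\sigma_1) while every other factor stays at least (1 - \lambda) \, N(y_i | \mu_2, \sigma_2^2) > 0, so the likelihood is unbounded, and a posterior under flat priors inherits the same spike at every data point.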

8 The trouble with mixture models, in some sense, is that the natural mathematical formulation is broader than what we typically want to fit. [sent-12, score-0.813]

9 Some prior constraints, particularly on the ratio of the mixture variances, control the estimates and also make sense, in that in any application I’ve seen I have some idea about a reasonable range of these variances. [sent-13, score-1.095]
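
To show the kind of constraint this sentence has in mind, here is a minimal sketch. The simulated data, the two-component normal mixture, and the normal prior on the log ratio of component scales are all assumptions made only for this illustration (this is not the 1990 model); the point is simply that a prior on the ratio keeps either component from collapsing onto a single data point.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
# Simulated data from a two-component normal mixture (assumed for this sketch).
y = np.concatenate([rng.normal(-1.0, 0.5, 150), rng.normal(2.0, 1.0, 50)])

def neg_log_post(theta, y, ratio_scale=1.0):
    """Negative log posterior for a two-component normal mixture.
    theta = (logit of lambda, mu1, mu2, log sigma1, log sigma2).
    The only prior used here is normal(0, ratio_scale) on log(sigma1/sigma2),
    a constraint on the ratio of the component scales, so neither variance
    can collapse to zero relative to the other."""
    a, mu1, mu2, ls1, ls2 = theta
    lam = 1.0 / (1.0 + np.exp(-a))
    s1, s2 = np.exp(ls1), np.exp(ls2)
    dens = lam * norm.pdf(y, mu1, s1) + (1.0 - lam) * norm.pdf(y, mu2, s2)
    loglik = np.sum(np.log(dens))
    log_prior = norm.logpdf(ls1 - ls2, 0.0, ratio_scale)
    return -(loglik + log_prior)

start = np.array([0.0, np.quantile(y, 0.25), np.quantile(y, 0.75), 0.0, 0.0])
fit = minimize(neg_log_post, start, args=(y,), method="Nelder-Mead",
               options={"maxiter": 5000})
lam_hat = 1.0 / (1.0 + np.exp(-fit.x[0]))
print("lambda, mu1, mu2, sigma1, sigma2:",
      lam_hat, fit.x[1], fit.x[2], np.exp(fit.x[3]), np.exp(fit.x[4]))
```

In a full Bayesian fit the same idea would appear as a prior on the scale ratio rather than a penalty handed to an optimizer, but the mechanics are the same: the constraint rules out the degenerate spikes while leaving the reasonable range of variances alone.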

10 What’s confusing, I think, is that we have developed some complacent intuitions based on various simple models with which we are familiar. [sent-14, score-0.411]

11 If you fit a normal or binomial or Poisson model with direct data, you’ll usually get a simple reasonable answer (except for some known tough cases such as estimating a rate when the number of events in the data is zero). [sent-15, score-0.514]
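
As a concrete worked version of that tough case (standard material, spelled out here for completeness): with exposure T and y = 0 observed events, the Poisson likelihood is p(y = 0 | \lambda) = e^{-\lambda T}, which is maximized at \hat{\lambda} = 0, a boundary estimate with no useful uncertainty attached. A Gamma(\alpha, \beta) prior gives the posterior \lambda | y = 0 \sim Gamma(\alpha, \beta + T), which keeps the rate off the boundary; the mixture-variance problem is the same phenomenon in a harder setting.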

12 So we just start to assume that this is the way it should always be, that we can write down a mathematically convenient class of models and go fit it to data. [sent-16, score-0.493]

13 We’ve seen this for logistic regression with complete separation, and it happens for mixture models too. [sent-18, score-1.12]
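
For readers who haven’t run into complete separation, here is a small self-contained sketch; the toy data and the normal(0, 2.5) prior are assumptions made only for this example, not recommendations from the post. It shows the log-likelihood increasing without bound in the slope, so the maximum likelihood estimate is at infinity, while a weakly informative prior pulls the estimate back to something finite.

```python
import numpy as np

# Toy, completely separated data (assumed for this sketch): x < 0 -> y = 0, x > 0 -> y = 1.
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0, 0, 0, 1, 1, 1])

def loglik(beta):
    """Log-likelihood of logit Pr(y = 1) = beta * x (no intercept, for simplicity)."""
    eta = beta * x
    return np.sum(y * eta - np.logaddexp(0.0, eta))  # stable log(1 + exp(eta))

for b in [1.0, 5.0, 10.0, 50.0]:
    print(b, loglik(b))  # keeps increasing toward 0: the ML estimate is at +infinity

def log_post(beta, prior_scale=2.5):
    """Add a weakly informative normal(0, prior_scale) prior on beta
    (an assumption of this sketch)."""
    return loglik(beta) - 0.5 * (beta / prior_scale) ** 2

grid = np.linspace(0.0, 20.0, 2001)
print("posterior mode near beta =", grid[np.argmax([log_post(b) for b in grid])])
```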

14 The class of mixture models is general enough that we always have the equivalent of complete separation, and we need to constrain the parameter space to ensure reasonable estimates. [sent-19, score-1.642]

15 In summary, yes, a mixture model can be a “beast” (as Larry puts it), but this beast can be tamed with a good prior distribution. [sent-20, score-1.133]

16 More generally, I think prior distributions for mixture models can be expressed hierarchically, which connects my sort of old-fashioned models to more advanced mixture models that have potentially infinite dimension. [sent-21, score-2.524]
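
One standard way to make that connection explicit (a generic sketch; the kernel F, base measure G_0, and symmetric Dirichlet prior are conventions of the illustration rather than anything stated in the post) is to write the K-component mixture hierarchically as

y_i | z_i, \theta \sim F(\theta_{z_i}), \quad z_i | \pi \sim \mathrm{Categorical}(\pi), \quad \pi \sim \mathrm{Dirichlet}(\alpha/K, \dots, \alpha/K), \quad \theta_k \sim G_0, \; k = 1, \dots, K.

Hyperpriors on \alpha and on the parameters of G_0 carry the kind of prior information about component locations and scales discussed above, and as K \to \infty this finite hierarchical prior converges to a Dirichlet-process mixture with concentration \alpha and base measure G_0, which is one route to the potentially infinite-dimensional models mentioned here.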


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('mixture', 0.69), ('models', 0.276), ('beast', 0.173), ('larry', 0.155), ('model', 0.145), ('separation', 0.132), ('fitting', 0.129), ('prior', 0.125), ('reasonable', 0.096), ('hierarchically', 0.092), ('complete', 0.089), ('fit', 0.089), ('counterexamples', 0.083), ('constrain', 0.08), ('intuitions', 0.078), ('avoided', 0.074), ('theorems', 0.071), ('particularly', 0.07), ('berger', 0.069), ('infinite', 0.069), ('class', 0.068), ('jokes', 0.068), ('ensure', 0.067), ('connects', 0.067), ('avoiding', 0.066), ('formulation', 0.066), ('wasserman', 0.066), ('binomial', 0.065), ('seen', 0.065), ('variances', 0.064), ('poisson', 0.063), ('data', 0.062), ('inspired', 0.061), ('confusing', 0.06), ('mathematically', 0.06), ('jim', 0.06), ('controls', 0.059), ('dimension', 0.058), ('using', 0.058), ('simple', 0.057), ('broader', 0.057), ('weakly', 0.057), ('finite', 0.056), ('flat', 0.056), ('latent', 0.056), ('refers', 0.055), ('constraints', 0.055), ('ratio', 0.055), ('volume', 0.055), ('advanced', 0.055)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 1459 andrew gelman stats-2012-08-15-How I think about mixture models


2 0.28036854 1460 andrew gelman stats-2012-08-16-“Real data can be a pain”

Introduction: Michael McLaughlin sent me the following query with the above title. Some time ago, I [McLaughlin] was handed a dataset that needed to be modeled. It was generated as follows: 1. Random navigation errors, historically a binary mixture of normal and Laplace with a common mean, were collected by observation. 2. Sadly, these data were recorded with too few decimal places so that the resulting quantization is clearly visible in a scatterplot. 3. The quantized data were then interpolated (to an unobserved location). The final result looks like fuzzy points (small scale jitter) at quantized intervals spanning a much larger scale (the parent mixture distribution). This fuzziness, likely ~normal or ~Laplace, results from the interpolation. Otherwise, the data would look like a discrete analogue of the normal/Laplace mixture. I would like to characterize the latent normal/Laplace mixture distribution but the quantization is “getting in the way”. When I tried MCMC on this proble

3 0.19373715 773 andrew gelman stats-2011-06-18-Should we always be using the t and robit instead of the normal and logit?

Introduction: My (coauthored) books on Bayesian data analysis and applied regression are like almost all the other statistics textbooks out there, in that we spend most of our time on the basic distributions such as normal and logistic and then, only as an aside, discuss robust models such as t and robit. Why aren’t the t and robit front and center? Sure, I can see starting with the normal (at least in the Bayesian book, where we actually work out all the algebra), but then why don’t we move on immediately to the real stuff? This isn’t just (or mainly) a question of textbooks or teaching; I’m really thinking here about statistical practice. My statistical practice. Should t and robit be the default? If not, why not? Some possible answers: 10. Estimating the degrees of freedom in the error distribution isn’t so easy, and throwing this extra parameter into the model could make inference unstable. 9. Real data usually don’t have outliers. In practice, fitting a robust model costs you

4 0.17270313 781 andrew gelman stats-2011-06-28-The holes in my philosophy of Bayesian data analysis

Introduction: I’ve been writing a lot about my philosophy of Bayesian statistics and how it fits into Popper’s ideas about falsification and Kuhn’s ideas about scientific revolutions. Here’s my long, somewhat technical paper with Cosma Shalizi. Here’s our shorter overview for the volume on the philosophy of social science. Here’s my latest try (for an online symposium), focusing on the key issues. I’m pretty happy with my approach–the familiar idea that Bayesian data analysis iterates the three steps of model building, inference, and model checking–but it does have some unresolved (maybe unresolvable) problems. Here are a couple mentioned in the third of the above links. Consider a simple model with independent data y_1, y_2, .., y_10 ~ N(θ,σ^2), with a prior distribution θ ~ N(0,10^2) and σ known and taking on some value of approximately 10. Inference about θ is straightforward, as is model checking, whether based on graphs or numerical summaries such as the sample variance and skewn

5 0.16982818 2176 andrew gelman stats-2014-01-19-Transformations for non-normal data

Introduction: Steve Peterson writes: I recently submitted a proposal on applying a Bayesian analysis to gender comparisons on motivational constructs. I had an idea on how to improve the model I used and was hoping you could give me some feedback. The data come from a survey based on 5-point Likert scales. Different constructs are measured for each student as scores derived from averaging a student’s responses on particular subsets of survey questions. (I suppose it is not uncontroversial to treat these scores as interval measures and would be interested to hear if you have any objections.) I am comparing genders on each construct. Researchers typically use t-tests to do so. To use a Bayesian approach I applied the programs written in R and JAGS by John Kruschke for estimating the difference of means: http://www.indiana.edu/~kruschke/BEST/ An issue in that analysis is that the distributions of student scores are not normal. There was skewness in some of the distributions and not always in

6 0.16649182 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization

7 0.16386898 1392 andrew gelman stats-2012-06-26-Occam

8 0.16375695 1941 andrew gelman stats-2013-07-16-Priors

9 0.16218883 1991 andrew gelman stats-2013-08-21-BDA3 table of contents (also a new paper on visualization)

10 0.16043083 2143 andrew gelman stats-2013-12-22-The kluges of today are the textbook solutions of tomorrow.

11 0.15688887 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors

12 0.14522679 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability

13 0.14261803 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors

14 0.14021465 244 andrew gelman stats-2010-08-30-Useful models, model checking, and external validation: a mini-discussion

15 0.13886251 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence

16 0.13710086 1788 andrew gelman stats-2013-04-04-When is there “hidden structure in data” to be discovered?

17 0.13551171 1739 andrew gelman stats-2013-02-26-An AI can build and try out statistical models using an open-ended generative grammar

18 0.13547897 1858 andrew gelman stats-2013-05-15-Reputations changeable, situations tolerable

19 0.1344509 1087 andrew gelman stats-2011-12-27-“Keeping things unridiculous”: Berger, O’Hagan, and me on weakly informative priors

20 0.131194 1972 andrew gelman stats-2013-08-07-When you’re planning on fitting a model, build up to it by fitting simpler models first. Then, once you have a model you like, check the hell out of it


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.204), (1, 0.223), (2, 0.014), (3, 0.086), (4, -0.007), (5, -0.001), (6, 0.07), (7, -0.027), (8, -0.002), (9, 0.061), (10, 0.046), (11, 0.034), (12, -0.044), (13, 0.007), (14, -0.037), (15, -0.027), (16, 0.016), (17, -0.01), (18, -0.01), (19, -0.011), (20, 0.005), (21, -0.064), (22, -0.012), (23, -0.022), (24, -0.03), (25, -0.005), (26, -0.019), (27, -0.021), (28, 0.026), (29, 0.003), (30, -0.071), (31, 0.001), (32, 0.006), (33, -0.033), (34, 0.018), (35, -0.012), (36, -0.044), (37, -0.037), (38, -0.008), (39, 0.021), (40, 0.024), (41, 0.049), (42, -0.004), (43, 0.025), (44, 0.043), (45, 0.021), (46, -0.043), (47, 0.003), (48, -0.021), (49, -0.046)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9714334 1459 andrew gelman stats-2012-08-15-How I think about mixture models


2 0.86587739 1287 andrew gelman stats-2012-04-28-Understanding simulations in terms of predictive inference?

Introduction: David Hogg writes: My (now deceased) collaborator and guru in all things inference, Sam Roweis, used to emphasize to me that we should evaluate models in the data space — not the parameter space — because models are always effectively “effective” and not really, fundamentally true. Or, in other words, models should be compared in the space of their predictions, not in the space of their parameters (the parameters didn’t really “exist” at all for Sam). In that spirit, when we estimate the effectiveness of a MCMC method or tuning — by autocorrelation time or ESJD or anything else — shouldn’t we be looking at the changes in the model predictions over time, rather than the changes in the parameters over time? That is, the autocorrelation time should be the autocorrelation time in what the model (at the walker position) predicts for the data, and the ESJD should be the expected squared jump distance in what the model predicts for the data? This might resolve the concern I expressed a

3 0.86180949 1216 andrew gelman stats-2012-03-17-Modeling group-level predictors in a multilevel regression

Introduction: Trey Causey writes: Do you have suggestions as to model selection strategies akin to Bayesian model averaging for multilevel models when level-2 inputs are of substantive interest? I [Causey] have seen plenty of R packages and procedures for non-multilevel models, and tried the glmulti package but found that it did not perform well with more than a few level-2 variables. My quick answer is: with a name like that, you should really be fitting three-level models! My longer answer is: regular readers will be unsurprised to hear that I’m no fan of Bayesian model averaging . Instead I’d prefer to bite the bullet and assign an informative prior distribution on these coefficients. I don’t have a great example of such an analysis but I’m more and more thinking that this is the way to go. I don’t see the point in aiming for the intermediate goal of pruning the predictors; I’d rather have a procedure that includes prior information on the predictors and their interactions.

4 0.85349488 1999 andrew gelman stats-2013-08-27-Bayesian model averaging or fitting a larger model

Introduction: Nick Firoozye writes: I had a question about BMA [Bayesian model averaging] and model combinations in general, and direct it to you since they are a basic form of hierarchical model, albeit in the simplest of forms. I wanted to ask what the underlying assumptions are that could lead to BMA improving on a larger model. I know model combination is a topic of interest in the (frequentist) econometrics community (e.g., Bates & Granger, http://www.jstor.org/discover/10.2307/3008764?uid=3738032&uid;=2&uid;=4&sid;=21101948653381) but at the time it was considered a bit of a puzzle. Perhaps small models combined outperform a big model due to standard errors, insufficient data, etc. But I haven’t seen much in way of Bayesian justification. In simplest terms, you might have a joint density P(Y,theta_1,theta_2) from which you could use the two marginals P(Y,theta_1) and P(Y,theta_2) to derive two separate forecasts. A BMA-er would do a weighted average of the two forecast densities, having p

5 0.84744078 773 andrew gelman stats-2011-06-18-Should we always be using the t and robit instead of the normal and logit?

Introduction: My (coauthored) books on Bayesian data analysis and applied regression are like almost all the other statistics textbooks out there, in that we spend most of our time on the basic distributions such as normal and logistic and then, only as an aside, discuss robust models such as t and robit. Why aren’t the t and robit front and center? Sure, I can see starting with the normal (at least in the Bayesian book, where we actually work out all the algebra), but then why don’t we move on immediately to the real stuff? This isn’t just (or mainly) a question of textbooks or teaching; I’m really thinking here about statistical practice. My statistical practice. Should t and robit be the default? If not, why not? Some possible answers: 10. Estimating the degrees of freedom in the error distribution isn’t so easy, and throwing this extra parameter into the model could make inference unstable. 9. Real data usually don’t have outliers. In practice, fitting a robust model costs you

6 0.84342939 1392 andrew gelman stats-2012-06-26-Occam

7 0.83742064 398 andrew gelman stats-2010-11-06-Quote of the day

8 0.82287729 754 andrew gelman stats-2011-06-09-Difficulties with Bayesian model averaging

9 0.8207134 1674 andrew gelman stats-2013-01-15-Prior Selection for Vector Autoregressions

10 0.81804353 1983 andrew gelman stats-2013-08-15-More on AIC, WAIC, etc

11 0.81491035 916 andrew gelman stats-2011-09-18-Multimodality in hierarchical models

12 0.81486481 1406 andrew gelman stats-2012-07-05-Xiao-Li Meng and Xianchao Xie rethink asymptotics

13 0.81397575 1431 andrew gelman stats-2012-07-27-Overfitting

14 0.8121655 1041 andrew gelman stats-2011-12-04-David MacKay and Occam’s Razor

15 0.8025344 964 andrew gelman stats-2011-10-19-An interweaving-transformation strategy for boosting MCMC efficiency

16 0.80115002 1374 andrew gelman stats-2012-06-11-Convergence Monitoring for Non-Identifiable and Non-Parametric Models

17 0.79824448 1877 andrew gelman stats-2013-05-30-Infill asymptotics and sprawl asymptotics

18 0.79532576 1460 andrew gelman stats-2012-08-16-“Real data can be a pain”

19 0.79440719 781 andrew gelman stats-2011-06-28-The holes in my philosophy of Bayesian data analysis

20 0.79181975 1466 andrew gelman stats-2012-08-22-The scaled inverse Wishart prior distribution for a covariance matrix in a hierarchical model


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(15, 0.026), (16, 0.033), (20, 0.013), (21, 0.133), (24, 0.195), (45, 0.029), (48, 0.01), (53, 0.029), (76, 0.013), (84, 0.036), (86, 0.039), (99, 0.293)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.97800422 810 andrew gelman stats-2011-07-20-Adding more information can make the variance go up (depending on your model)

Introduction: Andy McKenzie writes: In their March 9 “ counterpoint ” in nature biotech to the prospect that we should try to integrate more sources of data in clinical practice (see “ point ” arguing for this), Isaac Kohane and David Margulies claim that, “Finally, how much better is our new knowledge than older knowledge? When is the incremental benefit of a genomic variant(s) or gene expression profile relative to a family history or classic histopathology insufficient and when does it add rather than subtract variance?” Perhaps I am mistaken (thus this email), but it seems that this claim runs contra to the definition of conditional probability. That is, if you have a hierarchical model, and the family history / classical histopathology already suggests a parameter estimate with some variance, how could the new genomic info possibly increase the variance of that parameter estimate? Surely the question is how much variance the new genomic info reduces and whether it therefore justifies t

2 0.97731984 514 andrew gelman stats-2011-01-13-News coverage of statistical issues…how did I do?

Introduction: This post is by Phil Price. A reporter once told me that the worst-kept secret of journalism is that every story has errors. And it’s true that just about every time I know about something first-hand, the news stories about it have some mistakes. Reporters aren’t subject-matter experts, they have limited time, and they generally can’t keep revisiting the things they are saying and checking them for accuracy. Many of us have published papers with errors — my most recent paper has an incorrect figure — and that’s after working on them carefully for weeks! One way that reporters can try to get things right is by quoting experts. Even then, there are problems with taking quotes out of context, or with making poor choices about what material to include or exclude, or, of course, with making a poor selection of experts. Yesterday, I was interviewed by an NPR reporter about the risks of breathing radon (a naturally occurring radioactive gas): who should test for it, how dangerous

3 0.97682202 1401 andrew gelman stats-2012-06-30-David Hogg on statistics

Introduction: Data analysis recipes: Fitting a model to data : We go through the many considerations involved in fitting a model to data, using as an example the fit of a straight line to a set of points in a two-dimensional plane. Standard weighted least-squares fitting is only appropriate when there is a dimension along which the data points have negligible uncertainties, and another along which all the uncertainties can be described by Gaussians of known variance; these conditions are rarely met in practice. We consider cases of general, heterogeneous, and arbitrarily covariant two-dimensional uncertainties, and situations in which there are bad data (large outliers), unknown uncertainties, and unknown but expected intrinsic scatter in the linear relationship being fit. Above all we emphasize the importance of having a “generative model” for the data, even an approximate one. Once there is a generative model, the subsequent fitting is non-arbitrary because the model permits direct computation

4 0.97654879 1675 andrew gelman stats-2013-01-15-“10 Things You Need to Know About Causal Effects”

Introduction: Macartan Humphreys pointed me to this excellent guide . Here are the 10 items: 1. A causal claim is a statement about what didn’t happen. 2. There is a fundamental problem of causal inference. 3. You can estimate average causal effects even if you cannot observe any individual causal effects. 4. If you know that, on average, A causes B and that B causes C, this does not mean that you know that A causes C. 5. The counterfactual model is all about contribution, not attribution. 6. X can cause Y even if there is no “causal path” connecting X and Y. 7. Correlation is not causation. 8. X can cause Y even if X is not a necessary condition or a sufficient condition for Y. 9. Estimating average causal effects does not require that treatment and control groups are identical. 10. There is no causation without manipulation. The article follows with crisp discussions of each point. My favorite is item #6, not because it’s the most important but because it brings in some real s

same-blog 5 0.97584486 1459 andrew gelman stats-2012-08-15-How I think about mixture models


6 0.97546178 1615 andrew gelman stats-2012-12-10-A defense of Tom Wolfe based on the impossibility of the law of small numbers in network structure

7 0.96639144 2037 andrew gelman stats-2013-09-25-Classical probability does not apply to quantum systems (causal inference edition)

8 0.96547711 659 andrew gelman stats-2011-04-13-Jim Campbell argues that Larry Bartels’s “Unequal Democracy” findings are not robust

9 0.96517646 789 andrew gelman stats-2011-07-07-Descriptive statistics, causal inference, and story time

10 0.96368289 1826 andrew gelman stats-2013-04-26-“A Vast Graveyard of Undead Theories: Publication Bias and Psychological Science’s Aversion to the Null”

11 0.96206307 486 andrew gelman stats-2010-12-26-Age and happiness: The pattern isn’t as clear as you might think

12 0.96180117 432 andrew gelman stats-2010-11-27-Neumann update

13 0.95970207 897 andrew gelman stats-2011-09-09-The difference between significant and not significant…

14 0.95942926 147 andrew gelman stats-2010-07-15-Quote of the day: statisticians and defaults

15 0.9584372 2112 andrew gelman stats-2013-11-25-An interesting but flawed attempt to apply general forecasting principles to contextualize attitudes toward risks of global warming

16 0.9581455 1989 andrew gelman stats-2013-08-20-Correcting for multiple comparisons in a Bayesian regression model

17 0.95791095 2306 andrew gelman stats-2014-04-26-Sleazy sock puppet can’t stop spamming our discussion of compressed sensing and promoting the work of Xiteng Liu

18 0.95715195 2159 andrew gelman stats-2014-01-04-“Dogs are sensitive to small variations of the Earth’s magnetic field”

19 0.95588613 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values

20 0.95348716 1355 andrew gelman stats-2012-05-31-Lindley’s paradox