andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-1094 knowledge-graph by maker-knowledge-mining

1094 andrew gelman stats-2011-12-31-Using factor analysis or principal components analysis or measurement-error models for biological measurements in archaeology?


meta info for this blog

Source: html

Introduction: Greg Campbell writes: I am a Canadian archaeologist (BSc in Chemistry) researching the past human use of European Atlantic shellfish. After two decades of practice I am finally getting a MA in archaeology at Reading. I am seeing if the habitat or size of harvested mussels (Mytilus edulis) can be reconstructed from measurements of the umbo (the pointy end, and the only bit that survives well in archaeological deposits) using log-transformed measurements (or allometry; relationships between dimensions are more likely exponential than linear). Of course multivariate regressions in most statistics packages (Minitab, SPSS, SAS) assume you are trying to predict one variable from all the others (a Model I regression), and use ordinary least squares to fit the regression line. For organismal dimensions this makes little sense, since all the dimensions are (at least in theory) free to change their mutual proportions during growth. So there is no predictor and predicted, mutual variation of


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Greg Campbell writes: I am a Canadian archaeologist (BSc in Chemistry) researching the past human use of European Atlantic shellfish. [sent-1, score-0.106]

2 Of course multivariate regressions in most statistics packages (Minitab, SPSS, SAS) assume you are trying to predict one variable from all the others (a Model I regression), and use ordinary least squares to fit the regression line. [sent-4, score-0.782]

3 For organismal dimensions this makes little sense, since all the dimensions are (at least in theory) free to change their mutual proportions during growth. [sent-5, score-1.112]

4 I see that you literally wrote the book on regression. [sent-7, score-0.079]

5 Do you know if it is possible to carry out major-axis or reduced-major-axis fitting in multiple linear regressions in SPSS, SAS or Systat (I know that it can’t be done in Minitab)? [sent-8, score-0.531]

6 Do you know if there are applications in R that carry out this type of analysis? [sent-9, score-0.281]

7 My reply: I’m a sucker for any email that begins, “I am a Canadian archaeologist.” [sent-10, score-0.125]

8 I think there are various models out there that could work here, including factor analysis and measurement-error models. [sent-11, score-0.063]

9 I’m no expert on this particular set of models, but they get used in psychometrics when there are many variable measurements. [sent-12, score-0.252]
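
The major-axis and reduced-major-axis fits that Campbell asks about can be sketched for the two-variable case. This is a minimal illustration in Python with NumPy, not a method taken from the post: the function names and the example numbers are hypothetical, and the inputs are assumed to be already log-transformed. The major axis is taken as the leading eigenvector of the sample covariance matrix (the line minimizing perpendicular distances), and the reduced major axis uses the ratio of standard deviations, signed by the covariance.

```python
import numpy as np


def major_axis_fit(x, y):
    """Major-axis line through (x, y): minimizes perpendicular distances.

    Returns (slope, intercept). Assumes the fitted line is not vertical.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.cov(x, y)                      # 2x2 sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    v = eigvecs[:, -1]                      # direction of greatest variance
    slope = v[1] / v[0]
    intercept = y.mean() - slope * x.mean() # line passes through the centroid
    return slope, intercept


def reduced_major_axis_fit(x, y):
    """Reduced-major-axis line: slope is sd(y)/sd(x), signed by the covariance."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sign = np.sign(np.cov(x, y)[0, 1])
    slope = sign * y.std(ddof=1) / x.std(ddof=1)
    intercept = y.mean() - slope * x.mean()
    return slope, intercept


if __name__ == "__main__":
    # Hypothetical log-transformed measurements, simulated for illustration only.
    rng = np.random.default_rng(0)
    log_a = rng.normal(3.0, 0.3, size=200)
    log_b = 0.8 * log_a - 1.0 + rng.normal(0.0, 0.05, size=200)
    print("MA :", major_axis_fit(log_a, log_b))
    print("RMA:", reduced_major_axis_fit(log_a, log_b))
```

Neither fit treats one variable as the predictor, which is the property Campbell wants for organismal dimensions. With more than two dimensions, the same eigenvector idea amounts to principal components analysis.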


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('dimensions', 0.377), ('minitab', 0.274), ('perpendicular', 0.25), ('spss', 0.218), ('distances', 0.218), ('canadian', 0.201), ('mutual', 0.197), ('sas', 0.19), ('carry', 0.162), ('regression', 0.16), ('measurements', 0.144), ('fitted', 0.14), ('regressions', 0.138), ('line', 0.135), ('sucker', 0.125), ('deposits', 0.125), ('survives', 0.113), ('linear', 0.111), ('reconstructed', 0.109), ('ma', 0.109), ('researching', 0.106), ('campbell', 0.103), ('variable', 0.102), ('exponential', 0.098), ('atlantic', 0.095), ('proportions', 0.093), ('chemistry', 0.091), ('psychometrics', 0.091), ('greg', 0.085), ('ordinary', 0.085), ('european', 0.083), ('ii', 0.08), ('axis', 0.08), ('literally', 0.079), ('relationships', 0.079), ('squares', 0.078), ('reduced', 0.078), ('packages', 0.077), ('multivariate', 0.074), ('predictor', 0.072), ('begins', 0.069), ('weight', 0.068), ('least', 0.068), ('equal', 0.068), ('predicted', 0.067), ('points', 0.065), ('models', 0.063), ('know', 0.06), ('expert', 0.059), ('applications', 0.059)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 1094 andrew gelman stats-2011-12-31-Using factor analysis or principal components analysis or measurement-error models for biological measurements in archaeology?

2 0.13522419 533 andrew gelman stats-2011-01-23-The scalarization of America

Introduction: Mark Palko writes: You lose information when you go from a vector to a scalar. But what about this trick, which they told me about in high school? Combine two dimensions into one by interleaving the decimals. For example, if a=.11111 and b=.22222, then (a,b) = .1212121212.

3 0.12188485 83 andrew gelman stats-2010-06-13-Silly Sas lays out old-fashioned statistical thinking

Introduction: People keep telling me that Sas isn’t as bad as everybody says, but then I see (from Christian Robert ) this listing from the Sas website of “disadvantages in using Bayesian analysis”: There is no correct way to choose a prior. Bayesian inferences require skills to translate prior beliefs into a mathematically formulated prior. If you do not proceed with caution, you can generate misleading results. . . . From a practical point of view, it might sometimes be difficult to convince subject matter experts who do not agree with the validity of the chosen prior. That is so tacky! As if least squares, logistic regressions, Cox models, and all those other likelihoods mentioned in the Sas documentation are so automatically convincing to subject matter experts. P.S. For some more serious objections to Bayesian statistics, see here and here . P.P.S. In case you’re wondering why I’m commenting on month-old blog entries . . . I have a monthlong backlog of entries, and I’m spooling

4 0.12042864 1114 andrew gelman stats-2012-01-12-Controversy about average personality differences between men and women

Introduction: Blogger Echidne pointed me to a recent article, “The Distance Between Mars and Venus: Measuring Global Sex Differences in Personality,” by Marco Del Giudice, Tom Booth, and Paul Irwing, who find: Sex differences in personality are believed to be comparatively small. However, research in this area has suffered from significant methodological limitations. We advance a set of guidelines for overcoming those limitations: (a) measure personality with a higher resolution than that afforded by the Big Five; (b) estimate sex differences on latent factors; and (c) assess global sex differences with multivariate effect sizes. . . . We found a global effect size D = 2.71, corresponding to an overlap of only 10% between the male and female distributions. Even excluding the factor showing the largest univariate ES [effect size], the global effect size was D = 1.71 (24% overlap). Echidne quotes a news article in which one of the study’s authors goes overboard: “Psychologically, men a

5 0.11431195 1165 andrew gelman stats-2012-02-13-Philosophy of Bayesian statistics: my reactions to Wasserman

Introduction: Continuing with my discussion of the articles in the special issue of the journal Rationality, Markets and Morals on the philosophy of Bayesian statistics: Larry Wasserman, “Low Assumptions, High Dimensions”: This article was refreshing to me because it was so different from anything I’ve seen before. Larry works in a statistics department and I work in a statistics department but there’s so little overlap in what we do. Larry and I both work in high dimensions (maybe his dimensions are higher than mine, but a few thousand dimensions seems like a lot to me!), but there the similarity ends. His article is all about using few to no assumptions, while I use assumptions all the time. Here’s an example. Larry writes: P. Laurie Davies (and his co-workers) have written several interesting papers where probability models, at least in the sense that we usually use them, are eliminated. Data are treated as deterministic. One then looks for adequate models rather than true mode

6 0.10802736 1656 andrew gelman stats-2013-01-05-Understanding regression models and regression coefficients

7 0.10511362 1661 andrew gelman stats-2013-01-08-Software is as software does

8 0.099713564 1735 andrew gelman stats-2013-02-24-F-f-f-fake data

9 0.098587222 1981 andrew gelman stats-2013-08-14-The robust beauty of improper linear models in decision making

10 0.097136781 315 andrew gelman stats-2010-10-03-He doesn’t trust the fit . . . r=.999

11 0.095128484 1859 andrew gelman stats-2013-05-16-How do we choose our default methods?

12 0.090613708 772 andrew gelman stats-2011-06-17-Graphical tools for understanding multilevel models

13 0.089916639 1726 andrew gelman stats-2013-02-18-What to read to catch up on multivariate statistics?

14 0.089288183 1655 andrew gelman stats-2013-01-05-The statistics software signal

15 0.087435745 852 andrew gelman stats-2011-08-13-Checking your model using fake data

16 0.086579062 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?

17 0.085468233 25 andrew gelman stats-2010-05-10-Two great tastes that taste great together

18 0.084552713 1196 andrew gelman stats-2012-03-04-Piss-poor monocausal social science

19 0.083834328 796 andrew gelman stats-2011-07-10-Matching and regression: two great tastes etc etc

20 0.083788753 144 andrew gelman stats-2010-07-13-Hey! Here’s a referee report for you!


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.148), (1, 0.069), (2, 0.025), (3, 0.008), (4, 0.084), (5, 0.029), (6, -0.009), (7, -0.031), (8, 0.058), (9, 0.053), (10, 0.014), (11, -0.002), (12, -0.016), (13, -0.012), (14, 0.008), (15, 0.031), (16, 0.008), (17, -0.0), (18, -0.008), (19, -0.023), (20, 0.016), (21, 0.053), (22, 0.004), (23, -0.015), (24, 0.007), (25, 0.031), (26, 0.05), (27, -0.043), (28, -0.011), (29, -0.023), (30, 0.045), (31, 0.055), (32, 0.017), (33, -0.004), (34, -0.031), (35, -0.04), (36, 0.024), (37, 0.019), (38, -0.026), (39, -0.022), (40, -0.004), (41, 0.002), (42, -0.003), (43, -0.007), (44, 0.04), (45, 0.002), (46, -0.013), (47, 0.021), (48, 0.013), (49, -0.041)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96976465 1094 andrew gelman stats-2011-12-31-Using factor analysis or principal components analysis or measurement-error models for biological measurements in archaeology?

2 0.85793245 2357 andrew gelman stats-2014-06-02-Why we hate stepwise regression

Introduction: Haynes Goddard writes: I have been slowly working my way through the grad program in stats here, and the latest course was a biostats course on categorical and survival analysis. I noticed in the semi-parametric and parametric material (Wang and Lee is the text) that they use stepwise regression a lot. I learned in econometrics that stepwise is poor practice, as it defaults to the “theory of the regression line”, that is no theory at all, just the variation in the data. I don’t find the topic on your blog, and wonder if you have addressed the issue. My reply: Stepwise regression is one of these things, like outlier detection and pie charts, which appear to be popular among non-statisticians but are considered by statisticians to be a bit of a joke. For example, Jennifer and I don’t mention stepwise regression in our book, not even once. To address the issue more directly: the motivation behind stepwise regression is that you have a lot of potential predictors but not e

3 0.85564297 1656 andrew gelman stats-2013-01-05-Understanding regression models and regression coefficients

Introduction: David Hoaglin writes: After seeing it cited, I just read your paper in Technometrics. The home radon levels provide an interesting and instructive example. I [Hoaglin] have a different take on the difficulty of interpreting the estimated coefficient of the county-level basement proportion (gamma-sub-2) on page 434. An important part of the difficulty involves “other things being equal.” That sounds like the widespread interpretation of a regression coefficient as telling how the dependent variable responds to change in that predictor when the other predictors are held constant. Unfortunately, as a general interpretation, that language is oversimplified; it doesn’t reflect how regression actually works. The appropriate general interpretation is that the coefficient tells how the dependent variable responds to change in that predictor after allowing for simultaneous change in the other predictors in the data at hand. Thus, in the county-level regression gamma-sub-2 summarize

4 0.80543268 796 andrew gelman stats-2011-07-10-Matching and regression: two great tastes etc etc

Introduction: Matthew Bogard writes: Regarding the book Mostly Harmless Econometrics, you state : A casual reader of the book might be left with the unfortunate impression that matching is a competitor to regression rather than a tool for making regression more effective. But in fact isn’t that what they are arguing, that, in a ‘mostly harmless way’ regression is in fact a matching estimator itself? “Our view is that regression can be motivated as a particular sort of weighted matching estimator, and therefore the differences between regression and matching estimates are unlikely to be of major empirical importance” (Chapter 3 p. 70) They seem to be distinguishing regression (without prior matching) from all other types of matching techniques, and therefore implying that regression can be a ‘mostly harmless’ substitute or competitor to matching. My previous understanding, before starting this book was as you say, that matching is a tool that makes regression more effective. I have n

5 0.80442113 1870 andrew gelman stats-2013-05-26-How to understand coefficients that reverse sign when you start controlling for things?

Introduction: Denis Cote writes: Just read this today and my unsophisticated statistical mind is confused. “Initial bivariate analyses suggest that union membership is actually associated with worse health. This association disappears when controlling for demographics, then reverses and becomes significant when controlling for labor market characteristics.” From my education about statistics, I remember to be suspicious about multiple regression coefficients that are in the opposite direction of the bivariate coefficients. What am I missing? I vaguely remember something about the suppression effect. My reply: There’s a long literature on this from many decades ago. My general feeling about such situations is that, when the coefficient changes a lot after controlling for other variables, it is important to visualize this change, to understand what is the interaction among variables that is associated with the change in the coefficients. This is what we did in our Red State Blue State

6 0.80354166 1967 andrew gelman stats-2013-08-04-What are the key assumptions of linear regression?

7 0.78927195 257 andrew gelman stats-2010-09-04-Question about standard range for social science correlations

8 0.78803796 14 andrew gelman stats-2010-05-01-Imputing count data

9 0.78357893 451 andrew gelman stats-2010-12-05-What do practitioners need to know about regression?

10 0.77910382 1981 andrew gelman stats-2013-08-14-The robust beauty of improper linear models in decision making

11 0.77485603 1462 andrew gelman stats-2012-08-18-Standardizing regression inputs

12 0.77288371 375 andrew gelman stats-2010-10-28-Matching for preprocessing data for causal inference

13 0.77230364 770 andrew gelman stats-2011-06-15-Still more Mr. P in public health

14 0.77083659 1121 andrew gelman stats-2012-01-15-R-squared for multilevel models

15 0.75851351 1908 andrew gelman stats-2013-06-21-Interpreting interactions in discrete-data regression

16 0.7570287 861 andrew gelman stats-2011-08-19-Will Stan work well with 40×40 matrices?

17 0.75003994 1294 andrew gelman stats-2012-05-01-Modeling y = a + b + c

18 0.74636012 1218 andrew gelman stats-2012-03-18-Check your missing-data imputations using cross-validation

19 0.74502254 1815 andrew gelman stats-2013-04-20-Displaying inferences from complex models

20 0.73665094 1900 andrew gelman stats-2013-06-15-Exploratory multilevel analysis when group-level variables are of importance


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(16, 0.016), (18, 0.011), (21, 0.042), (24, 0.162), (34, 0.011), (35, 0.01), (82, 0.268), (86, 0.05), (95, 0.042), (99, 0.269)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.95762825 1772 andrew gelman stats-2013-03-20-Stan at Google this Thurs and at Berkeley this Fri noon

Introduction: Michael Betancourt will be speaking at Google and at the University of California, Berkeley. The Google talk is closed to outsiders (but if you work at Google, you should go!); the Berkeley talk is open to all: Friday March 22, 12:10 pm, Evans Hall 1011. Title of talk: Stan : Practical Bayesian Inference with Hamiltonian Monte Carlo Abstract: Practical implementations of Bayesian inference are often limited to approximation methods that only slowly explore the posterior distribution. By taking advantage of the curvature of the posterior, however, Hamiltonian Monte Carlo (HMC) efficiently explores even the most highly contorted distributions. In this talk I will review the foundations of and recent developments within HMC, concluding with a discussion of Stan, a powerful inference engine that utilizes HMC, automatic differentiation, and adaptive methods to minimize user input. This is cool stuff. And he’ll be showing the whirlpool movie!

2 0.94524264 1749 andrew gelman stats-2013-03-04-Stan in L.A. this Wed 3:30pm

Introduction: Michael Betancourt will be speaking at UCLA: The location for refreshment is in room 51-254 CHS at 3:00 PM. The place for the seminar is at CHS 33-105A at 3:30pm – 4:30pm, Wed 6 Mar. ["CHS" stands for Center for Health Sciences, the building of the UCLA schools of medicine and public health. Here's a map with directions .] Title of talk: Stan : Practical Bayesian Inference with Hamiltonian Monte Carlo Abstract: Practical implementations of Bayesian inference are often limited to approximation methods that only slowly explore the posterior distribution. By taking advantage of the curvature of the posterior, however, Hamiltonian Monte Carlo (HMC) efficiently explores even the most highly contorted distributions. In this talk I will review the foundations of and recent developments within HMC, concluding with a discussion of Stan, a powerful inference engine that utilizes HMC, automatic differentiation, and adaptive methods to minimize user input. This is cool stuff.

3 0.94443011 940 andrew gelman stats-2011-10-03-It depends upon what the meaning of the word “firm” is.

Introduction: David Hogg pointed me to this news article by Angela Saini: It’s not often that the quiet world of mathematics is rocked by a murder case. But last summer saw a trial that sent academics into a tailspin, and has since swollen into a fevered clash between science and the law. At its heart, this is a story about chance. And it begins with a convicted killer, “T”, who took his case to the court of appeal in 2010. Among the evidence against him was a shoeprint from a pair of Nike trainers, which seemed to match a pair found at his home. While appeals often unmask shaky evidence, this was different. This time, a mathematical formula was thrown out of court. The footwear expert made what the judge believed were poor calculations about the likelihood of the match, compounded by a bad explanation of how he reached his opinion. The conviction was quashed. . . . “The impact will be quite shattering,” says Professor Norman Fenton, a mathematician at Queen Mary, University of London.

4 0.94301641 335 andrew gelman stats-2010-10-11-How to think about Lou Dobbs

Introduction: I was unsurprised to read that Lou Dobbs, the former CNN host who crusaded against illegal immigrants, had actually hired a bunch of them himself to maintain his large house and his horse farm. (OK, I have to admit I was surprised by the part about the horse farm.) But I think most of the reactions to this story missed the point. Isabel Macdonald’s article that broke the story was entitled, “Lou Dobbs, American Hypocrite,” and most of the discussion went from there, with some commenters piling on Dobbs and others defending him by saying that Dobbs hired his laborers through contractors and may not have known they were in the country illegally. To me, though, the key issue is slightly different. And Macdonald’s story is relevant whether or not Dobbs knew he was hiring illegals. My point is not that Dobbs is a bad guy, or a hypocrite, or whatever. My point is that, in his setting, it would take an extraordinary effort to not hire illegal immigrants to take care of his house

5 0.91610146 340 andrew gelman stats-2010-10-13-Randomized experiments, non-randomized experiments, and observational studies

Introduction: In the spirit of Dehejia and Wahba: Three Conditions under Which Experiments and Observational Studies Produce Comparable Causal Estimates: New Findings from Within-Study Comparisons , by Cook, Shadish, and Wong. Can Nonrandomized Experiments Yield Accurate Answers? A Randomized Experiment Comparing Random and Nonrandom Assignments, by Shadish, Clark, and Steiner. I just talk about causal inference. These people do it. The second link above is particularly interesting because it includes discussions by some causal inference heavyweights. WWJD and all that.

6 0.91139674 178 andrew gelman stats-2010-08-03-(Partisan) visualization of health care legislation

same-blog 7 0.90500474 1094 andrew gelman stats-2011-12-31-Using factor analysis or principal components analysis or measurement-error models for biological measurements in archaeology?

8 0.87828147 699 andrew gelman stats-2011-05-06-Another stereotype demolished

9 0.87715805 1488 andrew gelman stats-2012-09-08-Annals of spam

10 0.87497306 359 andrew gelman stats-2010-10-21-Applied Statistics Center miniconference: Statistical sampling in developing countries

11 0.86774266 2003 andrew gelman stats-2013-08-30-Stan Project: Continuous Relaxations for Discrete MRFs

12 0.86528879 1134 andrew gelman stats-2012-01-21-Lessons learned from a recent R package submission

13 0.86351568 1440 andrew gelman stats-2012-08-02-“A Christmas Carol” as applied to plagiarism

14 0.85861611 326 andrew gelman stats-2010-10-07-Peer pressure, selection, and educational reform

15 0.85410792 931 andrew gelman stats-2011-09-29-Hamiltonian Monte Carlo stories

16 0.8512733 2162 andrew gelman stats-2014-01-08-Belief aggregation

17 0.85039473 1963 andrew gelman stats-2013-07-31-Response by Jessica Tracy and Alec Beall to my critique of the methods in their paper, “Women Are More Likely to Wear Red or Pink at Peak Fertility”

18 0.85019886 1958 andrew gelman stats-2013-07-27-Teaching is hard

19 0.84870172 357 andrew gelman stats-2010-10-20-Sas and R

20 0.84393704 67 andrew gelman stats-2010-06-03-More on that Dartmouth health care study