andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-1714 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Wayne Folta writes: I [Folta] was looking for R packages to address a project I’m working on and stumbled onto a package called ‘plspm’. It seems to be a nice package, but the thing I wanted to pass on is the PDF that Gaston Sanchez, its author, wrote that describes PLS Path Analysis in general and shows how to use plspm in particular. It’s like a 200-page R vignette that’s really informative and fun to read. I’d recommend it to you and your readers: even if you don’t want to delve into PLS and plspm deeply, the first seven pages and the Appendix A provide a great read about a grad student, PLS Path Analysis, and the history of the field. It’s written at a more popular level than you might like. For example, he says at one point: “A moderating effect is the fancy term that some authors use to say that there is a nosy variable M influencing the effect between an independent variable X and a dependent variable Y.” You would obviously never write anything like that [yup --- AG]
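The quoted definition says that a moderator M changes the effect of X on Y. A common way to make this precise is a regression with an interaction term, Y = b0 + b1·X + b2·M + b3·X·M, under which the slope of Y on X is b1 + b3·M. This is a general textbook formulation, not something taken from plspm or from Sanchez's text. The sketch below simulates data with hypothetical coefficients and recovers them by ordinary least squares with NumPy. The function names are illustrative only.

```python
import numpy as np


def fit_moderation(x, m, y):
    """OLS fit of Y = b0 + b1*X + b2*M + b3*X*M; returns [b0, b1, b2, b3]."""
    design = np.column_stack([np.ones_like(x), x, m, x * m])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def conditional_slope(coef, m_value):
    """Slope of Y on X when the moderator M equals m_value: b1 + b3 * m_value."""
    return coef[1] + coef[3] * m_value


# Simulated data with hypothetical coefficients: the effect of X on Y is 1.0 + 0.5 * M.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
m = rng.normal(size=n)
y = 2.0 + 1.0 * x - 0.3 * m + 0.5 * x * m + rng.normal(scale=0.5, size=n)

coef = fit_moderation(x, m, y)
for m_value in (-1.0, 0.0, 1.0):
    print(f"M = {m_value:+.1f}: estimated slope of Y on X = {conditional_slope(coef, m_value):.2f}")
```

With these simulated coefficients, the estimated slope should come out near 0.5, 1.0, and 1.5 at M = -1, 0, and 1. That is the sense in which M "moderates" the effect of X on Y.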
sentIndex sentText sentNum sentScore
1 Wayne Folta writes: I [Folta] was looking for R packages to address a project I’m working on and stumbled onto a package called ‘plspm’. [sent-1, score-0.402]
2 It seems to be a nice package, but the thing I wanted to pass on is the PDF that Gaston Sanchez, its author, wrote that describes PLS Path Analysis in general and shows how to use plspm in particular. [sent-2, score-0.715]
3 It’s like a 200-page R vignette that’s really informative and fun to read. [sent-3, score-0.199]
4 I’d recommend it to you and your readers: even if you don’t want to delve into PLS and plspm deeply, the first seven pages and the Appendix A provide a great read about a grad student, PLS Path Analysis, and the history of the field. [sent-4, score-0.819]
5 For example, he says at one point: “A moderating effect is the fancy term that some authors use to say that there is a nosy variable M influencing the effect between an independent variable X and a dependent variable Y.” [sent-6, score-0.91]
6 You would obviously never write anything like that [yup --- AG], and most of your blog readers are pretty sophisticated. [sent-7, score-0.26]
7 It appears to me that PLS Path Analysis is an interesting alternative to SEM, based on partial least squares rather than ML. [sent-8, score-0.156]
8 Same diagrams, similar results, similar procedures, different underlying mechanism/philosophy. [sent-9, score-0.198]
9 And Gaston gives an interesting history of things and obviously put a lot of work into a 200+ page document and R package. [sent-10, score-0.344]
10 I don’t know anything about PLS path analysis but I thought I’d pass this on for the benefit of those of you who use these methods. [sent-11, score-0.663]
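Sentence 7 describes PLS path analysis as an alternative to SEM that is estimated by partial least squares rather than maximum likelihood. To convey roughly what that means, here is a bare-bones sketch of the classic iterative PLS path-modeling procedure (in the style of Wold and Lohmöller) for the simplest case. It assumes two blocks of reflective indicators and a single path LV1 -> LV2, with Mode A outer weights and the centroid inner scheme. This is a simplified illustration under those assumptions, written with NumPy. It is not plspm's implementation, and the function names, tolerance, iteration cap, and simulated data are all hypothetical.

```python
import numpy as np


def standardize(a):
    """Center each column and scale it to unit (population) variance."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0)


def unit_variance(v):
    """Rescale a centered score vector to unit (population) variance."""
    return v / v.std()


def pls_path_two_blocks(block1, block2, tol=1e-10, max_iter=500):
    """Simplified PLS path model: two reflective blocks, one path LV1 -> LV2.

    Mode A outer weights, centroid inner scheme.
    Returns (weights1, weights2, scores1, scores2, path_coefficient).
    """
    x1, x2 = standardize(block1), standardize(block2)
    n = x1.shape[0]
    w1, w2 = np.ones(x1.shape[1]), np.ones(x2.shape[1])
    for _ in range(max_iter):
        # Outer estimation: each latent variable is a unit-variance weighted sum of its indicators.
        y1, y2 = unit_variance(x1 @ w1), unit_variance(x2 @ w2)
        # Inner estimation (centroid scheme): each LV's proxy is its neighbour, signed by their correlation.
        sign = np.sign(np.corrcoef(y1, y2)[0, 1])
        z1, z2 = sign * y2, sign * y1
        # Mode A: each new outer weight is the covariance of an indicator with the inner proxy.
        new_w1, new_w2 = x1.T @ z1 / n, x2.T @ z2 / n
        # Rescale so the implied scores have unit variance, making successive weights comparable.
        new_w1 /= (x1 @ new_w1).std()
        new_w2 /= (x2 @ new_w2).std()
        converged = max(np.abs(new_w1 - w1).max(), np.abs(new_w2 - w2).max()) < tol
        w1, w2 = new_w1, new_w2
        if converged:
            break
    y1, y2 = x1 @ w1, x2 @ w2
    # With one standardized predictor, the OLS path coefficient equals the correlation.
    path = np.corrcoef(y1, y2)[0, 1]
    return w1, w2, y1, y2, path


# Hypothetical data: latent eta = 0.6 * xi + noise, three noisy indicators per latent variable.
rng = np.random.default_rng(1)
n = 2000
xi = rng.normal(size=n)
eta = 0.6 * xi + rng.normal(scale=0.8, size=n)
block1 = np.column_stack([xi + rng.normal(scale=0.5, size=n) for _ in range(3)])
block2 = np.column_stack([eta + rng.normal(scale=0.5, size=n) for _ in range(3)])

w1, w2, y1, y2, path = pls_path_two_blocks(block1, block2)
print("outer weights, block 1:", np.round(w1, 3))
print("outer weights, block 2:", np.round(w2, 3))
print(f"path coefficient LV1 -> LV2: {path:.3f}")
```

Instead of maximizing a likelihood for the whole covariance structure, the procedure alternates between local least-squares steps. The latent scores are always explicit weighted sums of their indicators, and the path coefficient is an ordinary regression on those scores. Because the composites still contain some measurement noise, the estimated path here should come out somewhat below the latent value of 0.6.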
wordName wordTfidf (topN-words)
[('pls', 0.61), ('plspm', 0.366), ('path', 0.261), ('gaston', 0.244), ('folta', 0.183), ('variable', 0.136), ('pass', 0.127), ('obviously', 0.117), ('package', 0.116), ('influencing', 0.105), ('delve', 0.1), ('vignette', 0.1), ('yup', 0.1), ('history', 0.1), ('sem', 0.097), ('diagrams', 0.097), ('analysis', 0.095), ('wayne', 0.091), ('stumbled', 0.087), ('ag', 0.086), ('appendix', 0.083), ('seven', 0.082), ('readers', 0.08), ('fancy', 0.078), ('pdf', 0.075), ('similar', 0.074), ('onto', 0.073), ('deeply', 0.071), ('dependent', 0.07), ('document', 0.07), ('packages', 0.069), ('grad', 0.067), ('effect', 0.066), ('use', 0.064), ('anything', 0.063), ('procedures', 0.061), ('describes', 0.058), ('address', 0.057), ('interesting', 0.057), ('pages', 0.056), ('independent', 0.053), ('benefit', 0.053), ('nice', 0.052), ('alternative', 0.051), ('informative', 0.051), ('underlying', 0.05), ('recommend', 0.048), ('fun', 0.048), ('shows', 0.048), ('appears', 0.048)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 1714 andrew gelman stats-2013-02-09-Partial least squares path analysis
2 0.17625713 1146 andrew gelman stats-2012-01-30-Convenient page of data sources from the Washington Post
Introduction: Wayne Folta points us to this list.
3 0.14344497 891 andrew gelman stats-2011-09-05-World Bank data now online
Introduction: Wayne Folta writes that the World Bank is opening up some of its data for researchers.
4 0.11732646 306 andrew gelman stats-2010-09-29-Statistics and the end of time
Introduction: Wayne Folta sends in this. It seems nuts to me (although I was happy to see that no mention was made of this horrible argument of a related sort). But I know nothing about theoretical physics so I suppose it’s all possible. I certainly have no sense of confidence in anything I’d say about the topic.
5 0.10964565 1538 andrew gelman stats-2012-10-17-Rust
Introduction: I happened to be referring to the path sampling paper today and took a look at Appendix A.2: I’m sure I could reconstruct all of this if I had to, but I certainly can’t read this sort of thing cold anymore.
6 0.10261263 1134 andrew gelman stats-2012-01-21-Lessons learned from a recent R package submission
8 0.088660643 1682 andrew gelman stats-2013-01-19-R package for Bayes factors
9 0.081883945 2345 andrew gelman stats-2014-05-24-An interesting mosaic of a data programming course
10 0.074159436 1214 andrew gelman stats-2012-03-15-Of forecasts and graph theory and characterizing a statistical method by the information it uses
11 0.071328461 1661 andrew gelman stats-2013-01-08-Software is as software does
12 0.067795575 1900 andrew gelman stats-2013-06-15-Exploratory multilevel analysis when group-level variables are of importance
13 0.06629394 25 andrew gelman stats-2010-05-10-Two great tastes that taste great together
14 0.064786546 2274 andrew gelman stats-2014-03-30-Adjudicating between alternative interpretations of a statistical interaction?
15 0.064759277 2284 andrew gelman stats-2014-04-07-How literature is like statistical reasoning: Kosara on stories. Gelman and Basbøll on stories.
16 0.057501785 1472 andrew gelman stats-2012-08-28-Migrating from dot to underscore
17 0.057308991 2251 andrew gelman stats-2014-03-17-In the best alternative histories, the real world is what’s ultimately real
18 0.057293829 726 andrew gelman stats-2011-05-22-Handling multiple versions of an outcome variable
19 0.056875031 2069 andrew gelman stats-2013-10-19-R package for effect size calculations for psychology researchers
20 0.056045491 1089 andrew gelman stats-2011-12-28-Path sampling for models of varying dimension
topicId topicWeight
[(0, 0.107), (1, 0.003), (2, -0.012), (3, -0.002), (4, 0.043), (5, -0.005), (6, 0.016), (7, -0.013), (8, 0.031), (9, 0.007), (10, -0.002), (11, -0.007), (12, 0.02), (13, -0.024), (14, 0.032), (15, 0.005), (16, -0.012), (17, 0.012), (18, 0.0), (19, 0.006), (20, -0.011), (21, 0.042), (22, -0.0), (23, 0.025), (24, 0.02), (25, 0.011), (26, 0.031), (27, 0.017), (28, -0.006), (29, 0.018), (30, -0.013), (31, 0.029), (32, 0.008), (33, 0.024), (34, 0.004), (35, 0.003), (36, 0.02), (37, -0.005), (38, -0.003), (39, 0.02), (40, 0.006), (41, -0.008), (42, 0.033), (43, 0.01), (44, 0.001), (45, -0.002), (46, -0.027), (47, 0.018), (48, -0.015), (49, 0.018)]
simIndex simValue blogId blogTitle
same-blog 1 0.94778061 1714 andrew gelman stats-2013-02-09-Partial least squares path analysis
Introduction: Wayne Folta writes: I [Folta] was looking for R packages to address a project I’m working on and stumbled onto a package called ‘plspm’. It seems to be a nice package, but the thing I wanted to pass on is the PDF that Gaston Sanchez, its author, wrote that describes PLS Path Analysis in general and shows how to use plspm in particular. It’s like a 200-page R vignette that’s really informative and fun to read. I’d recommend it to you and your readers: even if you don’t want to delve into PLS and plspm deeply, the first seven pages and the Appendix A provide a great read about a grad student, PLS Path Analysis, and the history of the field. It’s written at a more popular level than you might like. For example, he says at one point: “A moderating effect is the fancy term that some authors use to say that there is a nosy variable M influencing the effect between an independent variable X and a dependent variable Y.” You would obviously never write anything like that [yup --- AG]
Introduction: Ilya Esteban writes: In your blog your advice for performing regression in the presence of large numbers of correlated features, has been to use composite scores and hierarchical modeling. Unfortunately, many problems don’t provide an obvious and unambiguous way of grouping features together (e.g. gene expression data). Are there any techniques that you would recommend that automatically pool correlated features together based on the data, without requiring the researcher to manually define composite scores or feature hierarchies? I don’t know the answer to this but I imagine something is possible . . . any ideas? In the meantime I’m reminded of this recent article by Shaw-Hwa Lo, Haitian Wang, Tian Zheng, and Inchi Hu: Recent high-throughput biological studies successfully identified thousands of risk factors associated with common human diseases. Most of these studies used single-variable method and each variable is analyzed individually. The risk factors so identi
3 0.71616524 2296 andrew gelman stats-2014-04-19-Index or indicator variables
Introduction: Someone who doesn’t want his name shared (for the perhaps reasonable reason that he’ll “one day not be confused, and would rather my confusion not live on online forever”) writes: I’m exploring HLMs and Stan, using your book with Jennifer Hill as my field guide to this new territory. I think I have a generally clear grasp on the material, but wanted to be sure I haven’t gone astray. The problem I’m working on involves a multi-nation survey of students, and I’m especially interested in understanding the effects of country, religion, and sex, and the interactions among those factors (using IRT to estimate individual-level ability, then estimating individual, school, and country effects). Following the basic approach laid out in chapter 13 for such interactions between levels, I think I need to create a matrix of indicator variables for religion and sex. Elsewhere in the book, you recommend against indicator variables in favor of a single index variable. Am I right in thinking t
4 0.71094841 1121 andrew gelman stats-2012-01-15-R-squared for multilevel models
Introduction: Fred Schiff writes: I’m writing to you to ask about the “R-squared” approximation procedure you suggest in your 2004 book with Dr. Hill. [See also this paper with Pardoe---ed.] I’m a media sociologist at the University of Houston. I’ve been using HLM3 for about two years. Briefly about my data. It’s a content analysis of news stories with a continuous scale dependent variable, story prominence. I have 6090 news stories, 114 newspapers, and 59 newspaper group owners. All the Level-1, Level-2 and dependent variables have been standardized. Since the means were zero anyway, we left the variables uncentered. All the Level-3 ownership groups and characteristics are dichotomous scales that were left uncentered. PROBLEM: The single most important result I am looking for is to compare the strength of nine competing Level-1 variables in their ability to predict and explain the outcome variable, story prominence. We are trying to use the residuals to calculate a “R-squ
5 0.70424587 2199 andrew gelman stats-2014-02-04-Widening the goalposts in medical trials
Introduction: Paul Alper writes: I do not believe your blog has ever dealt with the following phenomenon which might be called “(widening) moving the goalposts.” Drug companies and the medical world at large often create powerful drugs and procedures for people who are far (many standard deviations) from the norm (mean) and via randomized clinical trials, the relevant authorities approve. But there aren’t enough of those people to be truly profitable so the next step is to ask for approval to prescribe the same for people who aren’t that far (fewer standard deviations) from the norm. Or, just move the norm (center) so as to pick up a much larger number of patients. Afflictions include hypertension, cholesterol, overweight, osteoporosis. The result is what is often called “the worried well,” who receive little or no benefit but suffer harms from the treatment. H. Gilbert Welch has written extensively on this “goalpost” issue. He is the author of http://www.amazon.com/Overdiagnosed-Ma
7 0.70042521 418 andrew gelman stats-2010-11-17-ff
8 0.68415278 2297 andrew gelman stats-2014-04-20-Fooled by randomness
10 0.67965698 1815 andrew gelman stats-2013-04-20-Displaying inferences from complex models
12 0.66868377 1382 andrew gelman stats-2012-06-17-How to make a good fig?
13 0.6665107 258 andrew gelman stats-2010-09-05-A review of a review of a review of a decade
14 0.66537106 1918 andrew gelman stats-2013-06-29-Going negative
15 0.66483068 272 andrew gelman stats-2010-09-13-Ross Ihaka to R: Drop Dead
16 0.66432542 401 andrew gelman stats-2010-11-08-Silly old chi-square!
17 0.66420883 76 andrew gelman stats-2010-06-09-Both R and Stata
18 0.66189367 404 andrew gelman stats-2010-11-09-“Much of the recent reported drop in interstate migration is a statistical artifact”
19 0.65856159 1462 andrew gelman stats-2012-08-18-Standardizing regression inputs
20 0.6576575 818 andrew gelman stats-2011-07-23-Parallel JAGS RNGs
topicId topicWeight
[(8, 0.014), (16, 0.036), (19, 0.013), (24, 0.083), (25, 0.015), (27, 0.011), (49, 0.013), (57, 0.043), (59, 0.011), (61, 0.235), (69, 0.01), (79, 0.057), (82, 0.017), (86, 0.031), (95, 0.022), (99, 0.278)]
simIndex simValue blogId blogTitle
1 0.9399122 1558 andrew gelman stats-2012-11-02-Not so fast on levees and seawalls for NY harbor?
Introduction: I was talking with June Williamson and mentioned offhand that I’d seen something in the paper saying that if only we’d invested a few billion dollars in levees we would’ve saved zillions in economic damage from the flood. (A quick search also revealed this eerily prescient article from last month and, more recently, this online discussion.) June said, No, no, no: levees are not the way to go: Here and here are the articles on “soft infrastructure” for the New York-New Jersey Harbor I was mentioning, summarizing work that is more extensively published in two books, “Rising Currents” and “On the Water: Palisade Bay”: The hazards posed by climate change, sea level rise, and severe storm surges make this the time to transform our coastal cities through adaptive design. The conventional response to flooding, in recent history, has been hard engineering — fortifying the coastal infrastructure with seawalls and bulkheads to protect real estate at the expense of natural t
2 0.91422176 1028 andrew gelman stats-2011-11-26-Tenure lets you handle students who cheat
Introduction: The other day, a friend of mine who is an untenured professor (not in statistics or political science) was telling me about a class where many of the students seemed to be resubmitting papers that they had already written for previous classes. (The supposition was based on internal evidence of the topics of the submitted papers.) It would be possible to check this and then kick the cheating students out of the program—but why do it? It would be a lot of work, also some of the students who are caught might complain, then word would get around that my friend is a troublemaker. And nobody likes a troublemaker. Once my friend has tenure it would be possible to do the right thing. But . . . here’s the hitch: most college instructors do not have tenure, and one result, I suspect, is a decline in ethical standards. This is something I hadn’t thought of in our earlier discussion of job security for teachers: tenure gives you the freedom to kick out cheating students.
3 0.90864575 9 andrew gelman stats-2010-04-28-But it all goes to pay for gas, car insurance, and tolls on the turnpike
Introduction: As a New Yorker I think I’m obliged to pass on the occasional Jersey joke (most recently, this one, which annoyingly continues to attract spam comments). I’ll let the above title be my comment on this entry from Tyler Cowen entitled, “Which Americans are ‘best off’?”: If you consult human development indices the answer is Asians living in New Jersey. The standard is: The index factors in life expectancy at birth, educational degree attainment among adults 25-years or older, school enrollment for people at least three years old and median annual gross personal earnings. More generally, these sorts of rankings and indexes seem to be cheap ways of grabbing headlines. This has always irritated me but really maybe I should go with the flow and invent a few of these indexes myself.
4 0.90511197 2349 andrew gelman stats-2014-05-26-WAIC and cross-validation in Stan!
Introduction: Aki and I write: The Watanabe-Akaike information criterion (WAIC) and cross-validation are methods for estimating pointwise out-of-sample prediction accuracy from a fitted Bayesian model. WAIC is based on the series expansion of leave-one-out cross-validation (LOO), and asymptotically they are equal. With finite data, WAIC and cross-validation address different predictive questions and thus it is useful to be able to compute both. WAIC and an importance-sampling approximated LOO can be estimated directly using the log-likelihood evaluated at the posterior simulations of the parameter values. We show how to compute WAIC, IS-LOO, K-fold cross-validation, and related diagnostic quantities in the Bayesian inference package Stan as called from R. This is important, I think. One reason the deviance information criterion (DIC) has been so popular is its implementation in Bugs. We think WAIC and cross-validation make more sense than DIC, especially from a Bayesian perspective in whic
5 0.90106225 16 andrew gelman stats-2010-05-04-Burgess on Kipling
Introduction: This is my last entry derived from Anthony Burgess’s book reviews , and it’ll be short. His review of Angus Wilson’s “The Strange Ride of Rudyard Kipling: His Life and Works” is a wonderfully balanced little thing. Nothing incredibly deep–like most items in the collection, the review is only two pages long–but I give it credit for being a rare piece of Kipling criticism I’ve seen that (a) seriously engages with the politics, without (b) congratulating itself on bravely going against the fashions of the politically incorrect chattering classes by celebrating Kipling’s magnificent achievement blah blah blah. Instead, Burgess shows respect for Kipling’s work and puts it in historical, biographical, and literary context. Burgess concludes that Wilson’s book “reminds us, in John Gross’s words, that Kipling ‘remains a haunting, unsettling presence, with whom we still have to come to terms.’ Still.” Well put, and generous of Burgess to end his review with another’s quote. Other cri
6 0.89470541 1370 andrew gelman stats-2012-06-07-Duncan Watts and the Titanic
7 0.88973522 1975 andrew gelman stats-2013-08-09-Understanding predictive information criteria for Bayesian models
8 0.88834918 714 andrew gelman stats-2011-05-16-NYT Labs releases Openpaths, a utility for saving your iphone data
same-blog 9 0.88536447 1714 andrew gelman stats-2013-02-09-Partial least squares path analysis
10 0.88507843 21 andrew gelman stats-2010-05-07-Environmentally induced cancer “grossly underestimated”? Doubtful.
12 0.86452955 827 andrew gelman stats-2011-07-28-Amusing case of self-defeating science writing
13 0.84673864 561 andrew gelman stats-2011-02-06-Poverty, educational performance – and can be done about it
14 0.84185374 776 andrew gelman stats-2011-06-22-Deviance, DIC, AIC, cross-validation, etc
15 0.83182037 1739 andrew gelman stats-2013-02-26-An AI can build and try out statistical models using an open-ended generative grammar
16 0.83132458 1433 andrew gelman stats-2012-07-28-LOL without the CATS
17 0.83023417 2134 andrew gelman stats-2013-12-14-Oswald evidence
18 0.82240754 2033 andrew gelman stats-2013-09-23-More on Bayesian methods and multilevel modeling
19 0.82140195 72 andrew gelman stats-2010-06-07-Valencia: Summer of 1991