Introduction: Peter Huber’s most famous work derives from his paper on robust statistics published nearly fifty years ago in which he introduced the concept of M-estimation (a generalization of maximum likelihood) to unify some ideas of Tukey and others for estimation procedures that were relatively insensitive to small departures from the assumed model. Huber has in many ways been ahead of his time. While remaining connected to the theoretical ideas from the early part of his career, his interests have shifted to computational and graphical statistics. I never took Huber’s class on data analysis–he left Harvard while I was still in graduate school–but fortunately I have an opportunity to learn his lessons now, as he has just released a book, “Data Analysis: What Can Be Learned from the Past 50 Years.” The book puts together a few articles published in the past 15 years, along with some new material. Many of the examples are decades old, which is appropriate given that Huber is reviewing f

1 I never took Huber’s class on data analysis–he left Harvard while I was still in graduate school–but fortunately I have an opportunity to learn his lessons now, as he has just released a book, “Data Analysis: What Can Be Learned from the Past 50 Years.” [sent-4, score-0.193]

2 The radon study is 15 years old, the data from the redistricting study are from the 1960s and 1970s, and so on. [sent-11, score-0.287]

3 So at this point in my career I’d like to make a virtue of necessity and say that it’s just fine to work with old examples that we really understand. [sent-13, score-0.239]

4 He also has worked on various graphical methods for data exploration and dimension reduction; although I have not used these programs myself, I view them as close in spirit to the graphical tools that we now use to explore our data in the context of our fitted models. [sent-17, score-0.426]

5 Right now, data analysis seems dominated by three approaches: machine learning, Bayes, and graphical exploratory data analysis, with some overlap, of course. [sent-18, score-0.532]

6 I like Huber’s pluralistic perspective, which ranges from contamination models to object-oriented programming, from geophysics to data cleaning. [sent-22, score-0.201]

7 His is not a book to turn to for specific advice; rather, I enjoyed reading his thoughts on a variety of statistical issues and reflecting upon the connections between Huber’s strategies for data analysis and his better-known theoretical work. [sent-23, score-0.413]

8 Within orthodox Bayesian statistics, we cannot even address the question whether a model Mi, under consideration at stage i of the investigation, is consonant with the data y. [sent-38, score-0.249]

9 Also please see chapter 6 of Bayesian Data Analysis and my article, “A Bayesian formulation of exploratory data analysis and goodness-of-fit testing,” which appeared in the International Statistical Review in 2003. [sent-41, score-0.407]

10 (Huber’s chapter 5 was written in 2000, so too soon for my 2003 paper, but the first edition of our book and our paper on posterior predictive checks had already appeared several years before.) [sent-42, score-0.192]
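To make the idea of checking whether a model is consonant with the data concrete, here is a minimal sketch of a posterior predictive check. It is an illustration under stated assumptions, not the procedure from the book or the paper: it assumes a simple normal model with the noninformative prior p(mu, sigma^2) proportional to 1/sigma^2, the data are simulated, and the function name `posterior_predictive_pvalue` is hypothetical.

```python
import numpy as np


def posterior_predictive_pvalue(y, stat, n_draws=2000, seed=0):
    """Posterior predictive p-value Pr(T(y_rep) >= T(y) | y).

    Assumes y_i ~ Normal(mu, sigma^2) with prior p(mu, sigma^2) ~ 1/sigma^2
    (an illustrative choice, not taken from the source).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = y.size
    ybar = y.mean()
    s2 = y.var(ddof=1)

    # sigma^2 | y follows a scaled inverse chi-square(n - 1, s^2) distribution.
    sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=n_draws)
    # mu | sigma^2, y follows Normal(ybar, sigma^2 / n).
    mu = rng.normal(ybar, np.sqrt(sigma2 / n))

    # Simulate one replicated data set of size n per posterior draw.
    y_rep = rng.normal(mu[:, None], np.sqrt(sigma2)[:, None], size=(n_draws, n))

    # Compare the test statistic on replicated data with the observed value.
    t_rep = np.apply_along_axis(stat, 1, y_rep)
    return float(np.mean(t_rep >= stat(y)))


# Simulated example: 50 standard normal points plus one gross outlier.
rng = np.random.default_rng(1)
y = np.append(rng.normal(0.0, 1.0, size=50), 8.0)

p_max = posterior_predictive_pvalue(y, np.max)    # sensitive to the outlier
p_mean = posterior_predictive_pvalue(y, np.mean)  # the model fits the mean by construction
print(p_max, p_mean)
```

A p-value near 0 or 1 for a chosen statistic signals that the fitted model does not reproduce that aspect of the data. Here the maximum exposes the misfit, while the mean cannot, because the model was fit to it.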

11 I like what Huber writes about approximately specified models, and I think he’d be very comfortable with our formulation of Bayesian data analysis, from the very first page of our book, as comprising three steps: (1) Model building, (2) Inference, (3) Model checking. [sent-46, score-0.232]

12 For example, in the schizophrenics’ reaction-time example (featured in the mixture-modeling chapter of Bayesian Data Analysis), we used the model Don recommended: a mixture of normal distributions with a fixed lag between them. [sent-57, score-0.199]

13 Looking at the data and thinking about the phenomenon, a fixed lag didn’t make sense to me, but Don emphasized that the psychology researchers were interested in an average difference and so it didn’t make sense in his perspective to try to do any further modeling on these data. [sent-58, score-0.412]

14 He said that if we wanted to model the variation of the lag, that would be fine but it would make sense to gather more data rather than knocking ourselves out on this particular small data set. [sent-59, score-0.677]

15 We’re often torn between modeling the raw data and modeling the processed data. [sent-79, score-0.435]

16 The latter choice can throw away important information but has the advantage not only of computational convenience but also, sometimes, of conceptual simplicity: processed data are typically closer to the form of the scientific concepts being modeled. [sent-80, score-0.197]

17 For example, an economist might prefer to analyze some sort of preprocessed price data rather than data on individual transactions. [sent-81, score-0.359]

18 8, Huber writes: We found (through exploratory data analysis of a large environmental data set) that very high radon levels were tightly localized and occurred in houses sitting on the locations of old mine shafts. [sent-87, score-0.66]

19 Random samples would have been useless, too: either one would have missed the exceptional values altogether, or one would have thrown them out as outliers. [sent-92, score-0.192]

20 More generally, methods such as singular value decomposition and principal components analysis have their limitations–they can work fine for balanced data such as in this example, but in more complicated problems I’d go with item-response or ideal-point models. [sent-109, score-0.322]

