andrew_gelman_stats andrew_gelman_stats-2012 andrew_gelman_stats-2012-1276 knowledge-graph by maker-knowledge-mining

1276 andrew gelman stats-2012-04-22-“Gross misuse of statistics” can be a good thing, if it indicates the acceptance of the importance of statistical reasoning


meta infos for this blog

Source: html

Introduction: Rick Lightburn writes: I [Lightburn] am also a member of the group Business Analytics on LinkedIn. I am struck by what I perceive as the gross misuse of statistics by the members of this group, including things that (I thought) were taught in Introductory Statistics courses in business schools. I want to suggest to you that you look at the discussions there if you want examples of such abuse. The discussions there support me in my belief that Analytics is data manipulation in the support of previously developed conclusions. My reply: I don’t think it’s such a bad thing. I like when people make statistical arguments, even bad statistical arguments. Once you accept the concept of arguing from logic and data, maybe you’ll be open to learning something new


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Rick Lightburn writes: I [Lightburn] am also a member of the group Business Analytics on LinkedIn. [sent-1, score-0.328]

2 I am struck by what I perceive as the gross misuse of statistics by the members of this group, including things that (I thought) were taught in Introductory Statistics courses in business schools. [sent-2, score-1.462]

3 I want to suggest to you that you look at the discussions there if you want examples of such abuse. [sent-3, score-0.635]

4 The discussions there support me in my belief that Analytics is data manipulation in the support of previously developed conclusions. [sent-4, score-1.158]

5 My reply: I don’t think it’s such a bad thing. [sent-5, score-0.159]

6 I like when people make statistical arguments, even bad statistical arguments. [sent-6, score-0.437]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('lightburn', 0.5), ('analytics', 0.34), ('discussions', 0.214), ('business', 0.193), ('misuse', 0.188), ('perceive', 0.176), ('rick', 0.173), ('support', 0.172), ('group', 0.165), ('gross', 0.161), ('manipulation', 0.159), ('logic', 0.147), ('introductory', 0.144), ('struck', 0.143), ('courses', 0.14), ('previously', 0.138), ('bad', 0.134), ('member', 0.133), ('concept', 0.128), ('taught', 0.124), ('arguing', 0.124), ('developed', 0.121), ('belief', 0.12), ('members', 0.119), ('accept', 0.11), ('arguments', 0.108), ('learning', 0.097), ('suggest', 0.096), ('statistics', 0.095), ('want', 0.094), ('open', 0.089), ('statistical', 0.088), ('examples', 0.078), ('including', 0.072), ('reply', 0.066), ('data', 0.062), ('look', 0.059), ('thought', 0.056), ('ll', 0.052), ('things', 0.051), ('maybe', 0.05), ('something', 0.044), ('new', 0.041), ('make', 0.037), ('even', 0.034), ('writes', 0.034), ('people', 0.031), ('also', 0.03), ('like', 0.025), ('think', 0.025)]
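The page does not say how sentScore is derived from these word weights. The listed scores do, however, match a simple rule: sum the weight of every whitespace-separated token that matches a vocabulary word exactly, case-sensitive and with punctuation left attached. The sketch below is a reconstruction under that assumption, not the generator's actual code; `sentence_score`, `WEIGHTS` and `LISTED` are names introduced here for illustration.

```python
# Word weights copied from the wordTfidf list above.
WEIGHTS = dict([
    ('lightburn', 0.5), ('analytics', 0.34), ('discussions', 0.214),
    ('business', 0.193), ('misuse', 0.188), ('perceive', 0.176),
    ('rick', 0.173), ('support', 0.172), ('group', 0.165), ('gross', 0.161),
    ('manipulation', 0.159), ('logic', 0.147), ('introductory', 0.144),
    ('struck', 0.143), ('courses', 0.14), ('previously', 0.138),
    ('bad', 0.134), ('member', 0.133), ('concept', 0.128), ('taught', 0.124),
    ('arguing', 0.124), ('developed', 0.121), ('belief', 0.12),
    ('members', 0.119), ('accept', 0.11), ('arguments', 0.108),
    ('learning', 0.097), ('suggest', 0.096), ('statistics', 0.095),
    ('want', 0.094), ('open', 0.089), ('statistical', 0.088),
    ('examples', 0.078), ('including', 0.072), ('reply', 0.066),
    ('data', 0.062), ('look', 0.059), ('thought', 0.056), ('ll', 0.052),
    ('things', 0.051), ('maybe', 0.05), ('something', 0.044), ('new', 0.041),
    ('make', 0.037), ('even', 0.034), ('writes', 0.034), ('people', 0.031),
    ('also', 0.03), ('like', 0.025), ('think', 0.025),
])

# Sentences and their scores as listed in the summary section above.
LISTED = [
    ("Rick Lightburn writes: I [Lightburn] am also a member of the group "
     "Business Analytics on LinkedIn.", 0.328),
    ("I am struck by what I perceive as the gross misuse of statistics by the "
     "members of this group, including things that (I thought) were taught in "
     "Introductory Statistics courses in business schools.", 1.462),
    ("I want to suggest to you that you look at the discussions there if you "
     "want examples of such abuse.", 0.635),
    ("The discussions there support me in my belief that Analytics is data "
     "manipulation in the support of previously developed conclusions.", 1.158),
    ("My reply: I don’t think it’s such a bad thing.", 0.159),
    ("I like when people make statistical arguments, even bad statistical "
     "arguments.", 0.437),
]


def sentence_score(sentence, weights):
    """Sum the weights of whitespace-split tokens found verbatim in `weights`.

    Matching is case-sensitive and punctuation is not stripped, so tokens
    such as "Rick", "reply:" and "arguments," contribute nothing, while a
    repeated word such as "statistical" is counted each time it appears.
    """
    return sum(weights.get(token, 0.0) for token in sentence.split())


for text, listed in LISTED:
    computed = sentence_score(text, WEIGHTS)
    print(f"computed {computed:.3f}  listed {listed:.3f}  {text[:40]}")
```

One consequence of this rule, if it is indeed the one used, is that the highest-weighted words ('lightburn', 'analytics') never contribute to any sentence score here, because they appear in the text only capitalized or bracketed.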

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 1276 andrew gelman stats-2012-04-22-“Gross misuse of statistics” can be a good thing, if it indicates the acceptance of the importance of statistical reasoning


2 0.18999138 1902 andrew gelman stats-2013-06-17-Job opening at new “big data” consulting firm!

Introduction: David Shor sends along a job announcement for Civis Analytics, which he describes as “basically Obama’s Analytics team reconstituted as a company”: Data Scientist Position Overview Data Scientists are responsible for providing the fundamental data science that powers our work – including predictive analytics, data mining, experimental design and ad-hoc statistical analysis. As a Data Scientist, you will join our Chicago-based data science team, working closely and collaboratively with analysts and engineers to identify, quantify and solve big, meaningful problems. Data Scientists will have the opportunity to dive deeply into big problems and work in a variety of areas. Civis Analytics has opportunities for applicants who are seasoned professionals, brilliant newcomers, and anywhere in between. Qualifications · Master’s degree in statistics, machine learning, computer science with heavy quant focus, a related subject, or a Bachelor’s degree and significant work ex

3 0.15662213 635 andrew gelman stats-2011-03-29-Bayesian spam!

Introduction: Cool! I know Bayes has reached the big time when I receive spam like this: Bayesian networks are rapidly emerging as a new research paradigm . . . With this monthly newsletter, we’ll keep you up to date . . . Financial Analytics Webinar . . . will exhibit at this year’s INFORMS Analytics Conference in downtown Chicago. Please join us for our Bayesian networks technology workshop on April 10 . . . a powerful desktop application (Windows/Mac/Unix) for knowledge discovery, data mining, analytics, predictive modeling and simulation . . . the world’s only comprehensive software package for learning, editing and analyzing Bayesian networks . . . If you no longer wish to receive these emails, please reply to this message with “Unsubscribe” in the subject line . . . You know the saying, “It’s not real unless it’s on TV”? My saying is: It’s not real until it’s on spam.

4 0.11118143 465 andrew gelman stats-2010-12-13-$3M health care prediction challenge

Introduction: I received the following press release from the Heritage Provider Network, “the largest limited Knox-Keene licensed managed care organization in California.” I have no idea what this means, but I assume it’s some sort of HMO. In any case, this looks like it could be interesting: Participants in the Health Prize challenge will be given a data set comprised of the de-identified medical records of 100,000 individuals who are members of HPN. The teams will then need to predict the hospitalization of a set percentage of those members who went to the hospital during the year following the start date, and do so with a defined accuracy rate. The winners will receive the $3 million prize. . . . the contest is designed to spur involvement by others involved in analytics, such as those involved in data mining and predictive modeling who may not currently be working in health care. “We believe that doing so will bring innovative thinking to health analytics and may allow us to solve at

5 0.10252052 1405 andrew gelman stats-2012-07-04-“Titanic Thompson: The Man Who Would Bet on Everything”

Introduction: I just finished reading this book by Kevin Cook. Nothing surprising, but it’s got almost all the stories, including many that I’d never previously read. Excellent if you like that sort of thing. It’s just too bad Thompson wasn’t around to golf-hustle Michael Jordan, back when that was big business.

6 0.09144564 1651 andrew gelman stats-2013-01-03-Faculty Position in Visualization, Visual Analytics, Imaging, and Human Centered Computing

7 0.083161138 963 andrew gelman stats-2011-10-18-Question on Type M errors

8 0.082031503 1837 andrew gelman stats-2013-05-03-NYC Data Skeptics Meetup

9 0.073492177 1909 andrew gelman stats-2013-06-21-Job openings at conservative political analytics firm!

10 0.071109712 2235 andrew gelman stats-2014-03-06-How much time (if any) should we spend criticizing research that’s fraudulent, crappy, or just plain pointless?

11 0.07042744 1927 andrew gelman stats-2013-07-05-“Numbersense: How to use big data to your advantage”

12 0.069394924 658 andrew gelman stats-2011-04-11-Statistics in high schools: Towards more accessible conceptions of statistical inference

13 0.068889104 1561 andrew gelman stats-2012-11-04-Someone is wrong on the internet

14 0.068303406 596 andrew gelman stats-2011-03-01-Looking for a textbook for a two-semester course in probability and (theoretical) statistics

15 0.068273202 2345 andrew gelman stats-2014-05-24-An interesting mosaic of a data programming course

16 0.066424534 1414 andrew gelman stats-2012-07-12-Steven Pinker’s unconvincing debunking of group selection

17 0.066292949 295 andrew gelman stats-2010-09-25-Clusters with very small numbers of observations

18 0.06412641 2046 andrew gelman stats-2013-10-01-I’ll say it again

19 0.06402953 1269 andrew gelman stats-2012-04-19-Believe your models (up to the point that you abandon them)

20 0.063710436 153 andrew gelman stats-2010-07-17-Tenure-track position at U. North Carolina in survey methods and social statistics


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.116), (1, -0.024), (2, -0.045), (3, 0.007), (4, 0.003), (5, 0.039), (6, -0.049), (7, 0.019), (8, -0.016), (9, 0.03), (10, -0.017), (11, -0.019), (12, 0.034), (13, 0.002), (14, -0.021), (15, 0.013), (16, -0.034), (17, -0.008), (18, 0.014), (19, -0.019), (20, 0.024), (21, -0.007), (22, 0.001), (23, -0.022), (24, -0.054), (25, 0.018), (26, -0.0), (27, 0.01), (28, 0.007), (29, -0.002), (30, 0.029), (31, -0.012), (32, -0.017), (33, 0.012), (34, -0.003), (35, 0.057), (36, -0.021), (37, 0.017), (38, 0.008), (39, 0.015), (40, 0.012), (41, -0.034), (42, -0.037), (43, 0.014), (44, -0.003), (45, 0.05), (46, 0.014), (47, 0.017), (48, -0.013), (49, -0.019)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9582876 1276 andrew gelman stats-2012-04-22-“Gross misuse of statistics” can be a good thing, if it indicates the acceptance of the importance of statistical reasoning


2 0.8031081 1837 andrew gelman stats-2013-05-03-NYC Data Skeptics Meetup

Introduction: Rachel Schutt writes: The hype surrounding Big Data and Data Science is at a fever pitch with promises to solve the world’s business and social problems, large and small. How accurate or misleading is this message? How is it helping or damaging people, and which people? What opportunities exist for data nerds and entrepreneurs that examine the larger issues with a skeptical view? This Meetup focuses on mathematical, ethical, and business aspects of data from a skeptical perspective. Guest speakers will discuss the misuse of and best practices with data, common mistakes people make with data and ways to avoid them, how to deal with intentional gaming and politics surrounding mathematical modeling, and taking into account the feedback loops and wider consequences of modeling. We will take deep dives into models in the fields of Data Science, statistics, financial engineering, and economics. This is an independent forum and open to anyone sharing an interest in the larger use of

3 0.79225308 1902 andrew gelman stats-2013-06-17-Job opening at new “big data” consulting firm!

Introduction: David Shor sends along a job announcement for Civis Analytics, which he describes as “basically Obama’s Analytics team reconstituted as a company”: Data Scientist Position Overview Data Scientists are responsible for providing the fundamental data science that powers our work – including predictive analytics, data mining, experimental design and ad-hoc statistical analysis. As a Data Scientist, you will join our Chicago-based data science team, working closely and collaboratively with analysts and engineers to identify, quantify and solve big, meaningful problems. Data Scientists will have the opportunity to dive deeply into big problems and work in a variety of areas. Civis Analytics has opportunities for applicants who are seasoned professionals, brilliant newcomers, and anywhere in between. Qualifications · Master’s degree in statistics, machine learning, computer science with heavy quant focus, a related subject, or a Bachelor’s degree and significant work ex

4 0.78874052 2084 andrew gelman stats-2013-11-01-Doing Data Science: What’s it all about?

Introduction: Rachel Schutt and Cathy O’Neil just came out with a wonderfully readable book on doing data science, based on a course Rachel taught last year at Columbia. Rachel is a former Ph.D. student of mine and so I’m inclined to have a positive view of her work; on the other hand, I did actually look at the book and I did find it readable! What do I claim is the least important part of data science? Here’s what Schutt and O’Neil say regarding the title: “Data science is not just a rebranding of statistics or machine learning but rather a field unto itself.” I agree. There’s so much that goes on with data that is about computing, not statistics. I do think it would be fair to consider statistics (which includes sampling, experimental design, and data collection as well as data analysis (which itself includes model building, visualization, and model checking as well as inference)) as a subset of data science. The question then arises: why do descriptions of data science focus so

5 0.78232545 2345 andrew gelman stats-2014-05-24-An interesting mosaic of a data programming course

Introduction: Rajit Dasgupta writes: I have been working on a website, SlideRule, that in its present state is a catalog of online courses aggregated from over 35 providers. One of the products we are building on top of this is something called Learning Paths, which are essentially a sequence of Online Courses designed to help learners gain mastery over a certain subject. We have recently released a Learning Path on Data Analysis, contributed by Claudia Gold, an early data scientist at Airbnb. We’d love it if you could look at it and tell us what you think. We are always looking for constructive feedback. I clicked through and took a look. It’s pretty cool. I haven’t tried to assess the actual teaching materials (they’re mostly about programming, not statistics) but I like how it’s structured based on pointers to existing resources, which seems like an excellent compromise between (a) someone trying to write the material all himself or herself (which would require either limiting the sco

6 0.76992708 2106 andrew gelman stats-2013-11-19-More on “data science” and “statistics”

7 0.76822776 223 andrew gelman stats-2010-08-21-Statoverflow

8 0.75665963 1990 andrew gelman stats-2013-08-20-Job opening at an organization that promotes reproducible research!

9 0.74468285 1119 andrew gelman stats-2012-01-15-Excellence in Statistical Reporting Award

10 0.74337232 1777 andrew gelman stats-2013-03-26-Data Science for Social Good summer fellowship program

11 0.73892003 1297 andrew gelman stats-2012-05-03-New New York data research organizations

12 0.73302609 1909 andrew gelman stats-2013-06-21-Job openings at conservative political analytics firm!

13 0.73150176 1590 andrew gelman stats-2012-11-26-I need a title for my book on ethics and statistics!!

14 0.71862608 2016 andrew gelman stats-2013-09-11-Zipfian Academy, A School for Data Science

15 0.71557879 275 andrew gelman stats-2010-09-14-Data visualization at the American Evaluation Association

16 0.71214169 830 andrew gelman stats-2011-07-29-Introductory overview lectures at the Joint Statistical Meetings in Miami this coming week

17 0.70635289 1541 andrew gelman stats-2012-10-19-Statistical discrimination again

18 0.70464915 33 andrew gelman stats-2010-05-14-Felix Salmon wins the American Statistical Association’s Excellence in Statistical Reporting Award

19 0.69845021 1722 andrew gelman stats-2013-02-14-Statistics for firefighters: update

20 0.69625998 2307 andrew gelman stats-2014-04-27-Big Data…Big Deal? Maybe, if Used with Caution.


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(11, 0.124), (16, 0.055), (22, 0.019), (24, 0.131), (44, 0.029), (53, 0.052), (86, 0.061), (89, 0.027), (99, 0.366)]
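The page does not state which measure produces simValue. A common choice for comparing sparse topic vectors like the one above is cosine similarity, so the sketch below assumes that measure. `cosine_similarity` is a helper written for this example, and `other_post` is an invented vector, not data from this page.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as {topic_id: weight}."""
    # Only topics present in both vectors contribute to the dot product.
    dot = sum(weight * b.get(topic, 0.0) for topic, weight in a.items())
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for an all-zero vector
    return dot / (norm_a * norm_b)


# LDA topic weights for this post, copied from the list above.
this_post = dict([(11, 0.124), (16, 0.055), (22, 0.019), (24, 0.131),
                  (44, 0.029), (53, 0.052), (86, 0.061), (89, 0.027),
                  (99, 0.366)])

# Hypothetical topic vector for another post, invented purely for illustration.
other_post = {7: 0.3, 24: 0.2, 99: 0.5}

print(cosine_similarity(this_post, this_post))
print(cosine_similarity(this_post, other_post))
```

Under this assumption a post compared with its own vector scores 1 up to floating-point rounding, and two posts that share no topics score 0.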

similar blogs list:

simIndex simValue blogId blogTitle

1 0.97284013 1466 andrew gelman stats-2012-08-22-The scaled inverse Wishart prior distribution for a covariance matrix in a hierarchical model

Introduction: Since we’re talking about the scaled inverse Wishart . . . here’s a recent message from Chris Chatham: I have been reading your book on Bayesian Hierarchical/Multilevel Modeling but have been struggling a bit with deciding whether to model my multivariate normal distribution using the scaled inverse Wishart approach you advocate given the arguments at this blog post [entitled "Why an inverse-Wishart prior may not be such a good idea"]. My reply: We discuss this in our book. We know the inverse-Wishart has problems, that’s why we recommend the scaled inverse-Wishart, which is a more general class of models. Here’s an old blog post on the topic. And also of course there’s the description in our book. Chris pointed me to the following comment by Simon Barthelmé: Using the scaled inverse Wishart doesn’t change anything, the standard deviations of the individual coefficients and their covariance are still dependent. My answer would be to use a prior that models the stan

same-blog 2 0.96169555 1276 andrew gelman stats-2012-04-22-“Gross misuse of statistics” can be a good thing, if it indicates the acceptance of the importance of statistical reasoning


3 0.95864463 1311 andrew gelman stats-2012-05-10-My final exam for Design and Analysis of Sample Surveys

Introduction: We had 28 class periods, so I wrote an exam with an approximate correspondence of one question per class. Rather than dumping the exam in your lap all at once, I’ll post the questions once per day. Then each day I’ll post the answer to yesterday’s questions. So it will be 29 days in all. I’ll post them to appear late in the day so as not to interfere with our main daily posts (which are currently backed up to early June). The course was offered in the political science department and covered a mix of statistical and political topics. Followers of our recent discussion on test questions won’t be surprised to learn that some of the questions are ambiguous. This wasn’t on purpose. I tried my best, but good questions are hard to write. Question 1 will appear tomorrow.

4 0.95848835 1462 andrew gelman stats-2012-08-18-Standardizing regression inputs

Introduction: Andy Flies, Ph.D. candidate in zoology, writes: After reading your paper about scaling regression inputs by two standard deviations I found your blog post stating that you wished you had scaled by 1 sd and coded the binary inputs as -1 and 1. Here is my question: If you code the binary input as -1 and 1, do you then standardize it? This makes sense to me because the mean of the standardized input is then zero and the sd is 1, which is what the mean and sd are for all of the other standardized inputs. I know that if you code the binary input as 0 and 1 it should not be standardized. Also, I am not interested in the actual units (i.e. mg/ml) of my response variable and I would like to compare a couple of different response variables that are on different scales. Would it make sense to standardize the response variable also? My reply: No, I don’t standardize the binary input. The point of standardizing inputs is to make the coefs directly interpretable, but with binary i

5 0.9584468 1386 andrew gelman stats-2012-06-21-Belief in hell is associated with lower crime rates

Introduction: I remember attending a talk a few years ago by my political science colleague John Huber in which he discussed cross-national comparisons of religious attitudes. One thing I remember is that the U.S. is highly religious, another thing I remembered is that lots more Americans believe in heaven than believe in hell. Some of this went into Red State Blue State—not the heaven/hell thing, but the graph of religiosity vs. GDP: and the corresponding graph of religious attendance vs. GDP for U.S. states: Also we learned that, at the individual level, the correlation of religious attendance with income is zero (according to survey reports, rich Americans are neither more nor less likely than poor Americans to go to church regularly): while the correlation of prayer with income is strongly negative (poor Americans are much more likely than rich Americans to regularly pray): Anyway, with all this, I was primed to be interested in a recent study by psychologist

6 0.95808327 1387 andrew gelman stats-2012-06-21-Will Tiger Woods catch Jack Nicklaus? And a discussion of the virtues of using continuous data even if your goal is discrete prediction

7 0.95647293 458 andrew gelman stats-2010-12-08-Blogging: Is it “fair use”?

8 0.95278162 1620 andrew gelman stats-2012-12-12-“Teaching effectiveness” as another dimension in cognitive ability

9 0.94695807 2210 andrew gelman stats-2014-02-13-Stopping rules and Bayesian analysis

10 0.94593894 382 andrew gelman stats-2010-10-30-“Presidential Election Outcomes Directly Influence Suicide Rates”

11 0.94464499 1225 andrew gelman stats-2012-03-22-Procrastination as a positive productivity strategy

12 0.9439792 1610 andrew gelman stats-2012-12-06-Yes, checking calibration of probability forecasts is part of Bayesian statistics

13 0.94138241 1956 andrew gelman stats-2013-07-25-What should be in a machine learning course?

14 0.94008434 2313 andrew gelman stats-2014-04-30-Seth Roberts

15 0.93922395 378 andrew gelman stats-2010-10-28-World Economic Forum Data Visualization Challenge

16 0.93913412 1799 andrew gelman stats-2013-04-12-Stan 1.3.0 and RStan 1.3.0 Ready for Action

17 0.93893337 1960 andrew gelman stats-2013-07-28-More on that machine learning course

18 0.9384197 2058 andrew gelman stats-2013-10-11-Gladwell and Chabris, David and Goliath, and science writing as stone soup

19 0.93819904 731 andrew gelman stats-2011-05-26-Lottery probability update

20 0.93802899 1722 andrew gelman stats-2013-02-14-Statistics for firefighters: update