1837 andrew gelman stats-2013-05-03-NYC Data Skeptics Meetup


meta information for this blog

Source: html

Introduction: Rachel Schutt writes: The hype surrounding Big Data and Data Science is at a fever pitch with promises to solve the world’s business and social problems, large and small. How accurate or misleading is this message? How is it helping or damaging people, and which people? What opportunities exist for data nerds and entrepreneurs that examine the larger issues with a skeptical view? This Meetup focuses on mathematical, ethical, and business aspects of data from a skeptical perspective. Guest speakers will discuss the misuse of and best practices with data, common mistakes people make with data and ways to avoid them, how to deal with intentional gaming and politics surrounding mathematical modeling, and taking into account the feedback loops and wider consequences of modeling. We will take deep dives into models in the fields of Data Science, statistics, financial engineering, and economics. This is an independent forum and open to anyone sharing an interest in the larger use of


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Rachel Schutt writes: The hype surrounding Big Data and Data Science is at a fever pitch with promises to solve the world’s business and social problems, large and small. [sent-1, score-0.955]

2 How is it helping or damaging people, and which people? [sent-3, score-0.268]

3 What opportunities exist for data nerds and entrepreneurs that examine the larger issues with a skeptical view? [sent-4, score-1.175]

4 This Meetup focuses on mathematical, ethical, and business aspects of data from a skeptical perspective. [sent-5, score-0.811]

5 Guest speakers will discuss the misuse of and best practices with data, common mistakes people make with data and ways to avoid them, how to deal with intentional gaming and politics surrounding mathematical modeling, and taking into account the feedback loops and wider consequences of modeling. [sent-6, score-2.143]

6 We will take deep dives into models in the fields of Data Science, statistics, financial engineering, and economics. [sent-7, score-0.273]

7 This is an independent forum and open to anyone sharing an interest in the larger use of data. [sent-8, score-0.448]

8 Technical aspects will be discussed, but attendees do not need to have a technical background. [sent-9, score-0.499]
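
The page does not say how sentScore is computed. The listed scores are, however, consistent with summing the tfidf weights (from the topN-words list for this blog) of each whitespace-separated token, matched exactly, with no lowercasing and no punctuation stripping. The sketch below illustrates that reading on sentences 2 and 4; the function name `sentence_score` and the tokenization rule are assumptions inferred from the numbers, not a documented method.

```python
# Subset of the tfidf weights listed for this blog (only the words needed here).
WEIGHTS = {
    "helping": 0.111, "damaging": 0.157,
    "focuses": 0.113, "business": 0.152, "aspects": 0.176,
    "data": 0.171, "skeptical": 0.199,
    "mathematical": 0.163, "ethical": 0.11,
}

def sentence_score(sentence, weights):
    """Sum the weights of whitespace-separated tokens, matched exactly.

    Assumption: no lowercasing and no punctuation stripping, so tokens such
    as "ethical," (trailing comma) or "Meetup" (capitalized) contribute nothing.
    """
    return round(sum(weights.get(tok, 0.0) for tok in sentence.split()), 3)

s2 = "How is it helping or damaging people, and which people?"
s4 = ("This Meetup focuses on mathematical, ethical, and business "
      "aspects of data from a skeptical perspective.")

print(sentence_score(s2, WEIGHTS))  # 0.268, matching the listed score
print(sentence_score(s4, WEIGHTS))  # 0.811, matching the listed score
```

Under this reading, "mathematical," and "ethical," add nothing to sentence 4 because of their trailing commas, which is why its score is 0.811 rather than a larger value.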


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('surrounding', 0.279), ('skeptical', 0.199), ('nerds', 0.18), ('aspects', 0.176), ('data', 0.171), ('technical', 0.171), ('mathematical', 0.163), ('promises', 0.163), ('entrepreneurs', 0.163), ('damaging', 0.157), ('schutt', 0.157), ('business', 0.152), ('intentional', 0.152), ('attendees', 0.152), ('gaming', 0.148), ('meetup', 0.148), ('loops', 0.148), ('misuse', 0.148), ('rachel', 0.148), ('guest', 0.145), ('pitch', 0.142), ('larger', 0.137), ('speakers', 0.137), ('hype', 0.131), ('wider', 0.12), ('feedback', 0.117), ('examine', 0.114), ('focuses', 0.113), ('forum', 0.112), ('sharing', 0.112), ('helping', 0.111), ('ethical', 0.11), ('opportunities', 0.109), ('consequences', 0.108), ('engineering', 0.108), ('practices', 0.108), ('exist', 0.102), ('deep', 0.098), ('misleading', 0.096), ('mistakes', 0.094), ('accurate', 0.092), ('fields', 0.089), ('account', 0.089), ('solve', 0.088), ('science', 0.087), ('independent', 0.087), ('financial', 0.086), ('message', 0.084), ('avoid', 0.082), ('politics', 0.079)]
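
The exact weighting scheme behind the list above (smoothing, normalization, stop-word handling) is not stated on this page. As a rough illustration only, the sketch below computes top-N tfidf words for one document using term count times log(N / document frequency), followed by L2 normalization. The toy corpus `DOCS` and the function name `tfidf_top_words` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus; not the real blog posts.
DOCS = [
    "data skeptics meetup data",
    "data science book",
    "stan users meetup",
]

def tfidf_top_words(docs, doc_index, top_n=5):
    """Return the top_n (word, weight) pairs for docs[doc_index].

    Assumed scheme: weight = term count * log(n_docs / document frequency),
    then L2-normalized over the document's words.
    """
    tokenized = [d.split() for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each word.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    tf = Counter(tokenized[doc_index])
    raw = {w: c * math.log(n_docs / df[w]) for w, c in tf.items()}

    # L2 normalization; fall back to 1.0 if every weight is zero.
    norm = math.sqrt(sum(v * v for v in raw.values())) or 1.0
    weights = {w: v / norm for w, v in raw.items()}

    # Sort by descending weight, breaking ties alphabetically.
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

for word, weight in tfidf_top_words(DOCS, 0, top_n=3):
    print(word, round(weight, 3))
```

A word that appears in only one document ("skeptics" here) outranks a word that is frequent in the document but common across the corpus ("data"), which is the general behavior the word list above reflects.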

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999976 1837 andrew gelman stats-2013-05-03-NYC Data Skeptics Meetup

2 0.13800278 2084 andrew gelman stats-2013-11-01-Doing Data Science: What’s it all about?

Introduction: Rachel Schutt and Cathy O’Neil just came out with a wonderfully readable book on doing data science, based on a course Rachel taught last year at Columbia. Rachel is a former Ph.D. student of mine and so I’m inclined to have a positive view of her work; on the other hand, I did actually look at the book and I did find it readable! What do I claim is the least important part of data science? Here’s what Schutt and O’Neil say regarding the title: “Data science is not just a rebranding of statistics or machine learning but rather a field unto itself.” I agree. There’s so much that goes on with data that is about computing, not statistics. I do think it would be fair to consider statistics (which includes sampling, experimental design, and data collection as well as data analysis (which itself includes model building, visualization, and model checking as well as inference)) as a subset of data science. The question then arises: why do descriptions of data science focus so

3 0.12841836 2106 andrew gelman stats-2013-11-19-More on “data science” and “statistics”

Introduction: After reading Rachel and Cathy’s book, I wrote that “Statistics is the least important part of data science . . . I think it would be fair to consider statistics as a subset of data science. . . . it’s not the most important part of data science, or even close.” But then I received “Data Science for Business,” by Foster Provost and Tom Fawcett, in the mail. I might not have opened the book at all (as I’m hardly in the target audience) but for seeing a blurb by Chris Volinsky, a statistician whom I respect a lot. So I flipped through the book and it indeed looked pretty good. It moves slowly but that’s appropriate for an intro book. But what surprised me, given the book’s title and our recent discussion on the nature of data science, was that the book was 100% statistics! It had some math (for example, definitions of various distance measures), some simple algebra, some conceptual graphs such as ROC curve, some tables and graphs of low-dimensional data summaries—but almost

4 0.11700431 1517 andrew gelman stats-2012-10-01-“On Inspiring Students and Being Human”

Introduction: Rachel Schutt (the author of the Taxonomy of Confusion) has a blog! for the course she’s teaching at Columbia, “Introduction to Data Science.” It sounds like a great course—I wish I could take it! Her latest post is “On Inspiring Students and Being Human”: Of course one hopes as a teacher that one will inspire students . . . But what I actually mean by “inspiring students” is that you are inspiring me; you are students who inspire: “inspiring students”. This is one of the happy unintended consequences of this course so far for me. She then gives examples of some of the students in her class and some of their interesting ideas: Phillip is a PhD student in the sociology department . . . He’s in the process of developing his thesis topic around some of the themes we’ve been discussing in this class, such as the emerging data science community. Arvi works at the College Board and is a part time student . . . He analyzes user-level data of students who have signed up f

5 0.11566307 1687 andrew gelman stats-2013-01-21-Workshop on science communication for graduate students

Introduction: Nathan Sanders writes: Applications are now open for the Communicating Science 2013 workshop (http://workshop.astrobites.com/), to be held in Cambridge, MA on June 13-15th, 2013. Graduate students at US institutions in all fields of science and engineering are encouraged to apply – funding is available for travel expenses and accommodations. The application can be found here: http://workshop.astrobites.org/application Participants will build the communication skills that technical professionals need to express complex ideas to their peers, experts in other fields, and the general public. There will be panel discussions on the following topics: * Engaging Non-Scientific Audiences * Science Writing for a Cause * Communicating Science Through Fiction * Sharing Science with Scientists * The World of Non-Academic Publishing * Communicating using Multimedia and the Web In addition to these discussions, ample time is allotted for interacting with the experts and with att

6 0.10683577 717 andrew gelman stats-2011-05-17-Statistics plagiarism scandal

7 0.10651425 1447 andrew gelman stats-2012-08-07-Reproducible science FAIL (so far): What’s stoppin people from sharin data and code?

8 0.085546307 2325 andrew gelman stats-2014-05-07-Stan users meetup next week

9 0.085065164 2307 andrew gelman stats-2014-04-27-Big Data…Big Deal? Maybe, if Used with Caution.

10 0.083235137 1131 andrew gelman stats-2012-01-20-Stan: A (Bayesian) Directed Graphical Model Compiler

11 0.083227977 703 andrew gelman stats-2011-05-10-Bringing Causal Models Into the Mainstream

12 0.082031503 1276 andrew gelman stats-2012-04-22-“Gross misuse of statistics” can be a good thing, if it indicates the acceptance of the importance of statistical reasoning

13 0.081041999 1844 andrew gelman stats-2013-05-06-Against optimism about social science

14 0.07961151 1506 andrew gelman stats-2012-09-21-Building a regression model . . . with only 27 data points

15 0.076790258 1289 andrew gelman stats-2012-04-29-We go to war with the data we have, not the data we want

16 0.07524541 1212 andrew gelman stats-2012-03-14-Controversy about a ranking of philosophy departments, or How should we think about statistical results when we can’t see the raw data?

17 0.073853709 648 andrew gelman stats-2011-04-04-The Case for More False Positives in Anti-doping Testing

18 0.073194548 1032 andrew gelman stats-2011-11-28-Does Avastin work on breast cancer? Should Medicare be paying for it?

19 0.072883911 757 andrew gelman stats-2011-06-10-Controversy over the Christakis-Fowler findings on the contagion of obesity

20 0.071793132 1238 andrew gelman stats-2012-03-31-Dispute about ethics of data sharing
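
The simValue column is not defined on this page. A same-blog value just below 1 (0.99999976) is consistent with cosine similarity between tfidf vectors, so the sketch below ranks documents that way. This is an assumption, and the vectors for `post_a` and `post_b` are invented for illustration; only the three query weights come from the word list for this blog.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_similar(query, corpus):
    """Return (similarity, doc_id) pairs sorted from most to least similar."""
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in corpus.items()]
    return sorted(scored, reverse=True)

# Three of the listed weights for this blog; the other documents are hypothetical.
query = {"surrounding": 0.279, "skeptical": 0.199, "data": 0.171}
corpus = {
    "this_post": dict(query),
    "post_a": {"data": 0.3, "science": 0.4},
    "post_b": {"stan": 0.5, "users": 0.2},
}

for sim, doc_id in rank_similar(query, corpus):
    print(doc_id, round(sim, 3))
```

As in the list above, the document itself ranks first with a similarity of essentially 1, and documents sharing no weighted words score 0.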


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.126), (1, -0.001), (2, -0.042), (3, -0.013), (4, -0.003), (5, 0.05), (6, -0.079), (7, 0.006), (8, -0.023), (9, 0.057), (10, -0.031), (11, -0.009), (12, -0.01), (13, -0.013), (14, -0.053), (15, 0.009), (16, -0.047), (17, 0.002), (18, 0.037), (19, -0.037), (20, 0.016), (21, -0.002), (22, -0.018), (23, -0.021), (24, -0.086), (25, 0.046), (26, 0.037), (27, -0.022), (28, 0.016), (29, 0.015), (30, 0.002), (31, -0.01), (32, -0.018), (33, -0.011), (34, -0.015), (35, 0.071), (36, -0.018), (37, 0.016), (38, 0.018), (39, 0.051), (40, 0.025), (41, -0.027), (42, -0.049), (43, 0.014), (44, -0.039), (45, 0.028), (46, 0.002), (47, -0.025), (48, 0.006), (49, -0.014)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95306849 1837 andrew gelman stats-2013-05-03-NYC Data Skeptics Meetup

2 0.84428847 1276 andrew gelman stats-2012-04-22-“Gross misuse of statistics” can be a good thing, if it indicates the acceptance of the importance of statistical reasoning

Introduction: Rick Lightburn writes: I [Lightburn] am also a member of the group Business Analytics on LinkedIn. I am struck by what I perceive as the gross misuse of statistics by the members of this group, including things that (I thought) were taught in Introductory Statistics courses in business schools. I want to suggest to you that you look at the discussions there if you want examples of such abuse. The discussions there support me in my belief that Analytics is data manipulation in the support of previously developed conclusions. My reply: I don’t think it’s such a bad thing. I like when people make statistical arguments, even bad statistical arguments. Once you accept the concept of arguing from logic and data, maybe you’ll be open to learning something new

3 0.81887734 2106 andrew gelman stats-2013-11-19-More on “data science” and “statistics”

Introduction: After reading Rachel and Cathy’s book, I wrote that “Statistics is the least important part of data science . . . I think it would be fair to consider statistics as a subset of data science. . . . it’s not the most important part of data science, or even close.” But then I received “Data Science for Business,” by Foster Provost and Tom Fawcett, in the mail. I might not have opened the book at all (as I’m hardly in the target audience) but for seeing a blurb by Chris Volinsky, a statistician whom I respect a lot. So I flipped through the book and it indeed looked pretty good. It moves slowly but that’s appropriate for an intro book. But what surprised me, given the book’s title and our recent discussion on the nature of data science, was that the book was 100% statistics! It had some math (for example, definitions of various distance measures), some simple algebra, some conceptual graphs such as ROC curve, some tables and graphs of low-dimensional data summaries—but almost

4 0.81129152 1990 andrew gelman stats-2013-08-20-Job opening at an organization that promotes reproducible research!

Introduction: I was told about an organization called Reproducibility Initiative. They tell me they are trying to make what was described in our “50 shades of gray” post standard across all of science, particularly areas like cancer research. I don’t know anything else about them, but that sounds like a good start! Here’s the ad: Data Scientist: Science Exchange, Palo Alto, CA Science Exchange is an innovative start-up with a mission to improve the efficiency and quality of scientific research. This Data Science position is critical to our mission. Our ideal candidate has the ability to collect and normalize data from multiple sources. This information will be used to drive marketing and product decisions, as well as fuel many of the features of Science Exchange. Desired Skills & Experience Experience with text mining, entity extraction and natural language processing is essential Experience scripting with either Python or R Experience running complex statistical analyses on l

5 0.79270566 1902 andrew gelman stats-2013-06-17-Job opening at new “big data” consulting firm!

Introduction: David Shor sends along a job announcement for Civis Analytics, which he describes as “basically Obama’s Analytics team reconstituted as a company”: Data Scientist Position Overview Data Scientists are responsible for providing the fundamental data science that powers our work – including predictive analytics, data mining, experimental design and ad-hoc statistical analysis. As a Data Scientist, you will join our Chicago-based data science team, working closely and collaboratively with analysts and engineers to identify, quantify and solve big, meaningful problems. Data Scientists will have the opportunity to dive deeply into big problems and work in a variety of areas. Civis Analytics has opportunities for applicants who are seasoned professionals, brilliant new comers, and anywhere in between. Qualifications · Master’s degree in statistics, machine learning, computer science with heavy quant focus, a related subject, or a Bachelor’s degree and significant work ex

6 0.79102296 378 andrew gelman stats-2010-10-28-World Economic Forum Data Visualization Challenge

7 0.77287561 1920 andrew gelman stats-2013-06-30-“Non-statistical” statistics tools

8 0.75359118 2307 andrew gelman stats-2014-04-27-Big Data…Big Deal? Maybe, if Used with Caution.

9 0.74771166 1777 andrew gelman stats-2013-03-26-Data Science for Social Good summer fellowship program

10 0.7476601 2221 andrew gelman stats-2014-02-23-Postdoc with Huffpost Pollster to do Bayesian poll tracking

11 0.73803425 1853 andrew gelman stats-2013-05-12-OpenData Latinoamerica

12 0.73801482 2084 andrew gelman stats-2013-11-01-Doing Data Science: What’s it all about?

13 0.73365593 192 andrew gelman stats-2010-08-08-Turning pages into data

14 0.73000449 2345 andrew gelman stats-2014-05-24-An interesting mosaic of a data programming course

15 0.72610581 1212 andrew gelman stats-2012-03-14-Controversy about a ranking of philosophy departments, or How should we think about statistical results when we can’t see the raw data?

16 0.72316855 830 andrew gelman stats-2011-07-29-Introductory overview lectures at the Joint Statistical Meetings in Miami this coming week

17 0.72250628 2016 andrew gelman stats-2013-09-11-Zipfian Academy, A School for Data Science

18 0.72038531 714 andrew gelman stats-2011-05-16-NYT Labs releases Openpaths, a utility for saving your iphone data

19 0.71816349 176 andrew gelman stats-2010-08-02-Information is good

20 0.71302938 33 andrew gelman stats-2010-05-14-Felix Salmon wins the American Statistical Association’s Excellence in Statistical Reporting Award


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(15, 0.011), (16, 0.057), (24, 0.107), (27, 0.042), (29, 0.043), (38, 0.014), (39, 0.018), (44, 0.231), (47, 0.013), (53, 0.017), (55, 0.018), (72, 0.013), (86, 0.051), (99, 0.27)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.9397707 864 andrew gelman stats-2011-08-21-Going viral — not!

Introduction: Sharad explains: HIV/AIDS, like many other contagious diseases, exemplifies the common view of so-called viral propagation, growing from a few initial cases to millions through close person-to-person interactions. (Ironically, not all viruses in fact exhibit “viral” transmission patterns. For example, Hepatitis A often spreads through contaminated drinking water.[1]) By analogy to such biological epidemics, the diffusion of products and ideas is conventionally assumed to occur “virally” as well, as evidenced by prevailing theoretical frameworks (e.g., the cascade and threshold models) and an obsession in the marketing world for all things social. . . . Despite hundreds of papers written about diffusion, there is surprisingly little work addressing this fundamental empirical question. In a recent study, Duncan Watts, Dan Goldstein, and I [Goel] examined the adoption patterns of several different types of products diffusing over various online platforms — including Twitter, Face

2 0.93216658 559 andrew gelman stats-2011-02-06-Bidding for the kickoff

Introduction: Steven Brams and James Jorasch propose a system for reducing the advantage that comes from winning the coin flip in overtime: Dispensing with a coin toss, the teams would bid on where the ball is kicked from by the kicking team. In the NFL, it’s now the 30-yard line. Under Brams and Jorasch’s rule, the kicking team would be the team that bids the lower number, because it is willing to put itself at a disadvantage by kicking from farther back. However, it would not kick from the number it bids, but from the average of the two bids. To illustrate, assume team A bids to kick from the 38-yard line, while team B bids its 32-yard line. Team B would win the bidding and, therefore, be designated as the kick-off team. But B wouldn’t kick from 32, but instead from the average of 38 and 32–its 35-yard line. This is better for B by 3 yards than the 32-yard line that it proposed, because it’s closer to the end zone it is kicking towards. It’s also better for A by 3 yards to have B kick fr

same-blog 3 0.92352808 1837 andrew gelman stats-2013-05-03-NYC Data Skeptics Meetup

4 0.91183734 444 andrew gelman stats-2010-12-02-Rational addiction

Introduction: Ole Rogeberg sends in this: and writes: No idea if this is amusing to non-economists, but I tried my hand at the xtranormal-trend. It’s an attempt to spoof the many standard “incantations” I’ve encountered over the years from economists who don’t want to agree that rational addiction theory lacks justification for some of the claims it makes. More specifically, the claims that the theory can be used to conduct welfare analysis of alternative policies. See here (scroll to Rational Addiction) and here for background.

5 0.90806383 1798 andrew gelman stats-2013-04-11-Continuing conflict over conflict statistics

Introduction: Mike Spagat sends along a serious presentation with an ironic title: 18.7 MILLION ANNIHILATED SAYS LEADING EXPERT IN PEER–REVIEWED JOURNAL: AN APPROVED, AUTHORITATIVE, SCIENTIFIC PRESENTATION MADE BY AN EXPERT He’ll be speaking on it at tomorrow’s meeting of the Catastrophes and Conflict Forum of the Royal Society of Medicine in London. All I can say is, it’s a long time since I’ve seen a slide presentation in portrait form. It brings me back to the days of transparency sheets.

6 0.90052211 1627 andrew gelman stats-2012-12-17-Stan and RStan 1.1.0

7 0.89832926 954 andrew gelman stats-2011-10-12-Benford’s Law suggests lots of financial fraud

8 0.89407951 748 andrew gelman stats-2011-06-06-Why your Klout score is meaningless

9 0.86918759 2150 andrew gelman stats-2013-12-27-(R-Py-Cmd)Stan 2.1.0

10 0.8669607 1436 andrew gelman stats-2012-07-31-A book on presenting numbers from spreadsheets

11 0.86603004 1145 andrew gelman stats-2012-01-30-A tax on inequality, or a tax to keep inequality at the current level?

12 0.86504304 617 andrew gelman stats-2011-03-17-“Why Preschool Shouldn’t Be Like School”?

13 0.86305988 111 andrew gelman stats-2010-06-26-Tough love as a style of writing

14 0.86111188 693 andrew gelman stats-2011-05-04-Don’t any statisticians work for the IRS?

15 0.84744394 1879 andrew gelman stats-2013-06-01-Benford’s law and addresses

16 0.84677887 2209 andrew gelman stats-2014-02-13-CmdStan, RStan, PyStan v2.2.0

17 0.831352 30 andrew gelman stats-2010-05-13-Trips to Cleveland

18 0.82909048 865 andrew gelman stats-2011-08-22-Blogging is “destroying the business model for quality”?

19 0.82144302 788 andrew gelman stats-2011-07-06-Early stopping and penalized likelihood

20 0.82021022 2210 andrew gelman stats-2014-02-13-Stopping rules and Bayesian analysis