
1014 andrew gelman stats-2011-11-16-Visualizations of NYPD stop-and-frisk data


meta information for this blog

Source: html

Introduction: Cathy O’Neil organized this visualization project with NYPD stop-and-frisk data. It’s part of the Data Without Borders project. Unfortunately, because of legal restrictions I couldn’t send them the data Jeff, Alex, and I used in our project several years ago.


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Cathy O’Neil organized this visualization project with NYPD stop-and-frisk data. [sent-1, score-0.769]

2 Unfortunately, because of legal restrictions I couldn’t send them the data Jeff, Alex, and I used in our project several years ago. [sent-3, score-1.404]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('nypd', 0.343), ('cathy', 0.331), ('project', 0.324), ('borders', 0.321), ('restrictions', 0.279), ('neil', 0.275), ('organized', 0.247), ('alex', 0.242), ('legal', 0.233), ('visualization', 0.198), ('jeff', 0.196), ('send', 0.173), ('couldn', 0.172), ('unfortunately', 0.169), ('several', 0.119), ('ago', 0.109), ('data', 0.103), ('part', 0.103), ('without', 0.099), ('used', 0.091), ('years', 0.082)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 1014 andrew gelman stats-2011-11-16-Visualizations of NYPD stop-and-frisk data


2 0.11865681 2084 andrew gelman stats-2013-11-01-Doing Data Science: What’s it all about?

Introduction: Rachel Schutt and Cathy O’Neil just came out with a wonderfully readable book on doing data science, based on a course Rachel taught last year at Columbia. Rachel is a former Ph.D. student of mine and so I’m inclined to have a positive view of her work; on the other hand, I did actually look at the book and I did find it readable! What do I claim is the least important part of data science? Here’s what Schutt and O’Neil say regarding the title: “Data science is not just a rebranding of statistics or machine learning but rather a field unto itself.” I agree. There’s so much that goes on with data that is about computing, not statistics. I do think it would be fair to consider statistics (which includes sampling, experimental design, and data collection as well as data analysis (which itself includes model building, visualization, and model checking as well as inference)) as a subset of data science. The question then arises: why do descriptions of data science focus so

3 0.11694599 2106 andrew gelman stats-2013-11-19-More on “data science” and “statistics”

Introduction: After reading Rachel and Cathy’s book, I wrote that “Statistics is the least important part of data science . . . I think it would be fair to consider statistics as a subset of data science. . . . it’s not the most important part of data science, or even close.” But then I received “Data Science for Business,” by Foster Provost and Tom Fawcett, in the mail. I might not have opened the book at all (as I’m hardly in the target audience) but for seeing a blurb by Chris Volinsky, a statistician whom I respect a lot. So I flipped through the book and it indeed looked pretty good. It moves slowly but that’s appropriate for an intro book. But what surprised me, given the book’s title and our recent discussion on the nature of data science, was that the book was 100% statistics! It had some math (for example, definitions of various distance measures), some simple algebra, some conceptual graphs such as ROC curve, some tables and graphs of low-dimensional data summaries—but almost

4 0.10472084 1125 andrew gelman stats-2012-01-18-Beautiful Line Charts

Introduction: I stumbled across a chart that’s in my opinion the best way to express a comparison of quantities through time: It compares the new PC companies, such as Apple, to traditional PC companies like IBM and Compaq, but on the same scale. If you’d like to see how iPads and other novelties compare, see here. I’ve tried to use the same type of visualization in my old work on legal data visualization. It comes from a new market research firm Asymco that also produced a very clean income vs expenses visualization (click to enlarge): While the first figure is pure perfection, Tufte purists might find the second one too colorful. But to a busy person, color helps tell things apart: when I know that pink means interest, it takes a fraction of the second to assess the situation. We live in 2012, not in 1712 to have to think black and white. Finally, they have a few other interesting uses of interactive visualization, such as cellular-broadband infrastructure around

5 0.1018772 1532 andrew gelman stats-2012-10-13-A real-life dollar auction game!

Introduction: Actually, $100,000 auction. I learned about it after seeing the following email which was broadcast to a couple of mailing lists: Dear all, I am now writing about something completely different! I need your help “voting” for our project, and sending this e-mail to others so that they can also vote for our project. As you will see from the video, the project would fund *** Project: I am a finalist for a $100,000 prize from Brigham and Women’s Hospital. My project is to understand how ***. Ultimately, we want to develop a ***. We expect that this ** can be used to *** Here are the instructions: 1. Go to the web page: http://brighamandwomens.org/research/BFF/default.aspx 2. scroll to the bottom and follow the link to “Vote” 3. select project #** 4. FORWARD THIS E-MAIL TO AS MANY PEOPLE AS YOU CAN. Best regards, ** I love that step 4 is in ALL CAPS, just to give it that genuine chain-letter aura. Isn’t this weird? First, that this foundation would give ou

6 0.08734633 1246 andrew gelman stats-2012-04-04-Data visualization panel at the New York Public Library this evening!

7 0.084680475 1251 andrew gelman stats-2012-04-07-Mathematical model of vote operations

8 0.082324378 423 andrew gelman stats-2010-11-20-How to schedule projects in an introductory statistics course?

9 0.079415262 1634 andrew gelman stats-2012-12-21-Two reviews of Nate Silver’s new book, from Kaiser Fung and Cathy O’Neil

10 0.079119883 1256 andrew gelman stats-2012-04-10-Our data visualization panel at the New York Public Library

11 0.078361146 1077 andrew gelman stats-2011-12-21-In which I compare “POLITICO’s chief political columnist” unfavorably to a cranky old dead guy and one of the funniest writers who’s ever lived

12 0.078070894 194 andrew gelman stats-2010-08-09-Data Visualization

13 0.077053078 222 andrew gelman stats-2010-08-21-Estimating and reporting teacher effectivenss: Newspaper researchers do things that academic researchers never could

14 0.076748848 40 andrew gelman stats-2010-05-18-What visualization is best?

15 0.075138308 1811 andrew gelman stats-2013-04-18-Psychology experiments to understand what’s going on with data graphics?

16 0.073323436 816 andrew gelman stats-2011-07-22-“Information visualization” vs. “Statistical graphics”

17 0.073305525 537 andrew gelman stats-2011-01-25-Postdoc Position #1: Missing-Data Imputation, Diagnostics, and Applications

18 0.073256023 2153 andrew gelman stats-2013-12-29-“Statistics Done Wrong”

19 0.072506674 2309 andrew gelman stats-2014-04-28-Crowdstorming a dataset

20 0.07219594 2028 andrew gelman stats-2013-09-17-Online conference for young statistics researchers
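
The page does not say how the simValue column is computed. A common choice for comparing tf-idf vectors is cosine similarity, and the value of 1.0 for the same-blog row is consistent with it, so the sketch below assumes that measure. The function name `cosine_similarity` and the weights in `other_post` are hypothetical and only illustrate the idea; the weights in `this_post` are the first five entries of the word list for this post.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as {word: weight} dicts."""
    # Only words present in both posts contribute to the dot product.
    dot = sum(weight * b[word] for word, weight in a.items() if word in b)
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # An empty vector is treated as similar to nothing.
    return dot / (norm_a * norm_b)


# First five tf-idf weights listed for this post (the full list has 21 words).
this_post = {
    "nypd": 0.343,
    "cathy": 0.331,
    "project": 0.324,
    "borders": 0.321,
    "restrictions": 0.279,
}

# Hypothetical weights for some other post; these numbers are made up.
other_post = {"cathy": 0.4, "data": 0.3, "science": 0.5}

print(cosine_similarity(this_post, this_post))
print(cosine_similarity(this_post, other_post))
```

Under this assumption a post compared with itself scores 1 (up to floating-point rounding), and two posts that share only a few weighted words score close to 0, which matches the general shape of the list above.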


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.064), (1, -0.019), (2, -0.033), (3, 0.015), (4, 0.049), (5, 0.003), (6, -0.042), (7, -0.007), (8, -0.028), (9, 0.012), (10, 0.006), (11, -0.021), (12, 0.016), (13, -0.018), (14, -0.006), (15, 0.025), (16, 0.013), (17, -0.014), (18, 0.04), (19, -0.011), (20, -0.008), (21, 0.008), (22, -0.005), (23, -0.03), (24, -0.037), (25, -0.009), (26, -0.022), (27, -0.006), (28, 0.04), (29, 0.021), (30, -0.024), (31, -0.04), (32, 0.02), (33, -0.004), (34, 0.013), (35, 0.042), (36, 0.014), (37, 0.007), (38, 0.043), (39, 0.0), (40, -0.012), (41, 0.026), (42, -0.002), (43, -0.086), (44, -0.007), (45, 0.004), (46, -0.005), (47, -0.014), (48, 0.015), (49, -0.015)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.93865031 1014 andrew gelman stats-2011-11-16-Visualizations of NYPD stop-and-frisk data


2 0.68081081 275 andrew gelman stats-2010-09-14-Data visualization at the American Evaluation Association

Introduction: Stephanie Evergreen writes: Media, web design, and marketing have all created an environment where stakeholders – clients, program participants, funders – all expect high quality graphics and reporting that effectively conveys the valuable insights from evaluation work. Some in statistics and mathematics have used data visualization strategies to support more useful reporting of complex ideas. Global growing interest in improving communications has begun to take root in the evaluation field as well. But as anyone who has sat through a day’s worth of a conference or had to endure a dissertation-worthy evaluation report knows, evaluators still have a long way to go. To support the development of researchers and evaluators, some members of the American Evaluation Association are proposing a new TIG (Topical Interest Group) on Data Visualization and Reporting. If you are a member of AEA (or want to be) and you are interested in joining this TIG, contact Stephanie Evergreen.

3 0.64885575 714 andrew gelman stats-2011-05-16-NYT Labs releases Openpaths, a utility for saving your iphone data

Introduction: Jake Porway writes: We launched Openpaths the other week. It’s a site where people can privately upload and view their iPhone location data (at least until an Apple update wipes it out) and also download their data for their own use. More than just giving people a neat tool to view their data with, however, we’re also creating an option for them to donate their data to research projects at varying levels of anonymity. We’re still working out the terms for that, but we’d love any input and to get in touch with anyone who might want to use the data. I don’t have any use for this personally but maybe it will interest some of you. From the webpage: Openpaths is an anonymous, user-contributed database for the personal location data files recorded by iOS devices. Users securely store, explore, and manage their personal location data, and grant researchers access to portions of that data as they choose. All location data stored in openpaths is kept separate from user profi

4 0.63296235 2221 andrew gelman stats-2014-02-23-Postdoc with Huffpost Pollster to do Bayesian poll tracking

Introduction: Mark Blumenthal writes: HuffPost Pollster has an immediate opening for a social and data scientist to join us full time, preferably in our Washington D.C. bureau, to work on development and improvement of our poll tracking models and political forecasts. You are someone who has: * A passion for electoral politics, * Advanced training in statistics and dynamic Bayesian data analysis, * A Ph.D. in statistics, political science, economics or the social sciences or comparable high level training or experience, * A desire to make a lasting contribution in the way the news media cover polls and elections. We are: * The award-winning website formerly known as Pollster.com, which joined the Huffington Post in 2010 and remains the internet’s premier source for uniquely interactive polling charts and electorate forecasts and a running daily commentary that explains, demystifies and critiques political polling. * Home to the open source Pollster API, which provides academic

5 0.61836082 1853 andrew gelman stats-2013-05-12-OpenData Latinoamerica

Introduction: Miguel Paz writes: Poderomedia Foundation and PinLatam are launching OpenDataLatinoamerica.org, a regional data repository to free data and use it on Hackathons and other activities by HacksHackers chapters and other organizations. We are doing this because the road to the future of news has been littered with lost datasets. A day or so after every hackathon and meeting where a group has come together to analyze, compare and understand a particular set of data, someone tries to remember where the successful files were stored. Too often, no one is certain. Therefore with Mariano Blejman we realized that we need a central repository where you can share the data that you have proved to be reliable: OpenData Latinoamerica, which we are leading as ICFJ Knight International Journalism Fellows. If you work in Latin America or Central America your organization can take part in OpenDataLatinoamerica.org. To apply, go to the website and answer a simple form agreeing to meet the standard

6 0.61014116 2307 andrew gelman stats-2014-04-27-Big Data…Big Deal? Maybe, if Used with Caution.

7 0.6095295 211 andrew gelman stats-2010-08-17-Deducer update

8 0.60872626 215 andrew gelman stats-2010-08-18-DataMarket

9 0.59416157 396 andrew gelman stats-2010-11-05-Journalism in the age of data

10 0.58160681 1689 andrew gelman stats-2013-01-23-MLB Hall of Fame Voting Trajectories

11 0.579422 1343 andrew gelman stats-2012-05-25-And now, here’s something we hope you’ll really like

12 0.57142925 1447 andrew gelman stats-2012-08-07-Reproducible science FAIL (so far): What’s stoppin people from sharin data and code?

13 0.56654435 802 andrew gelman stats-2011-07-13-Super Sam Fuld Needs Your Help (with Foul Ball stats)

14 0.56291384 1990 andrew gelman stats-2013-08-20-Job opening at an organization that promotes reproducible research!

15 0.56227618 2016 andrew gelman stats-2013-09-11-Zipfian Academy, A School for Data Science

16 0.5583095 685 andrew gelman stats-2011-04-29-Data mining and allergies

17 0.55706352 1175 andrew gelman stats-2012-02-19-Factual – a new place to find data

18 0.55586797 580 andrew gelman stats-2011-02-19-Weather visualization with WeatherSpark

19 0.55572063 1212 andrew gelman stats-2012-03-14-Controversy about a ranking of philosophy departments, or How should we think about statistical results when we can’t see the raw data?

20 0.55540437 954 andrew gelman stats-2011-10-12-Benford’s Law suggests lots of financial fraud


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(16, 0.71), (99, 0.102)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.99999899 1026 andrew gelman stats-2011-11-25-Bayes wikipedia update

Introduction: I checked and somebody went in and screwed up my fixes to the wikipedia page on Bayesian inference. I give up.

2 0.98984516 1745 andrew gelman stats-2013-03-02-Classification error

Introduction: 15-2040 != 19-3010 (and, for that matter, 25-1022 != 25-1063).

same-blog 3 0.95896512 1014 andrew gelman stats-2011-11-16-Visualizations of NYPD stop-and-frisk data


4 0.95826006 572 andrew gelman stats-2011-02-14-Desecration of valuable real estate

Introduction: Malecki asks: Is this the worst infographic ever to appear in NYT? USA Today is not something to aspire to. To connect to some of our recent themes, I agree this is a pretty horrible data display. But it’s not bad as a series of images. Considering the competition to be a cartoon or series of photos, these images aren’t so bad. One issue, I think, is that designers get credit for creativity and originality (unusual color combinations! Histogram bars shaped like mosques!), which is often the opposite of what we want in a clear graph. It’s Martin Amis vs. George Orwell all over again.

5 0.93204361 528 andrew gelman stats-2011-01-21-Elevator shame is a two-way street

Introduction: Tyler Cowen links a blog by Samuel Arbesman mocking people who are so lazy that they take the elevator from 1 to 2. This reminds me of my own annoyance about a guy who worked in my building and did not take the elevator. (For the full story, go here and search on “elevator.”)

6 0.93158191 1115 andrew gelman stats-2012-01-12-Where are the larger-than-life athletes?

7 0.93007129 398 andrew gelman stats-2010-11-06-Quote of the day

8 0.91032976 1659 andrew gelman stats-2013-01-07-Some silly things you (didn’t) miss by not reading the sister blog

9 0.88956922 1304 andrew gelman stats-2012-05-06-Picking on Stephen Wolfram

10 0.88182586 1279 andrew gelman stats-2012-04-24-ESPN is looking to hire a research analyst

11 0.86869031 1180 andrew gelman stats-2012-02-22-I’m officially no longer a “rogue”

12 0.86741149 1366 andrew gelman stats-2012-06-05-How do segregation measures change when you change the level of aggregation?

13 0.8328681 1697 andrew gelman stats-2013-01-29-Where 36% of all boys end up nowadays

14 0.83040982 1487 andrew gelman stats-2012-09-08-Animated drought maps

15 0.81368023 1330 andrew gelman stats-2012-05-19-Cross-validation to check missing-data imputation

16 0.80973226 445 andrew gelman stats-2010-12-03-Getting a job in pro sports… as a statistician

17 0.78953165 1025 andrew gelman stats-2011-11-24-Always check your evidence

18 0.78923863 1598 andrew gelman stats-2012-11-30-A graphics talk with no visuals!

19 0.76288646 700 andrew gelman stats-2011-05-06-Suspicious pattern of too-strong replications of medical research

20 0.74452621 1168 andrew gelman stats-2012-02-14-The tabloids strike again