andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-1853 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Miguel Paz writes: Poderomedia Foundation and PinLatam are launching OpenDataLatinoamerica.org, a regional data repository to free data and use it in hackathons and other activities by HacksHackers chapters and other organizations. We are doing this because the road to the future of news has been littered with lost datasets. A day or so after every hackathon and meeting where a group has come together to analyze, compare and understand a particular set of data, someone tries to remember where the successful files were stored. Too often, no one is certain. Therefore, with Mariano Blejman, we realized that we need a central repository where you can share the data that you have proved to be reliable: OpenData Latinoamerica, which we are leading as ICFJ Knight International Journalism Fellows. If you work in Latin America or Central America, your organization can take part in OpenDataLatinoamerica.org. To apply, go to the website and answer a simple form agreeing to meet the standard criteria for open data.
sentIndex sentText sentNum sentScore
1 Miguel Paz writes: Poderomedia Foundation and PinLatam are launching OpenDataLatinoamerica.org, [sent-1, score-0.192]
2 a regional data repository to free data and use it in hackathons and other activities by HacksHackers chapters and other organizations. [sent-2, score-1.009]
3 We are doing this because the road to the future of news has been littered with lost datasets. [sent-3, score-0.318]
4 A day or so after every hackathon and meeting where a group has come together to analyze, compare and understand a particular set of data, someone tries to remember where the successful files were stored. [sent-4, score-0.905]
5 Therefore, with Mariano Blejman, we realized that we need a central repository where you can share the data that you have proved to be reliable: OpenData Latinoamerica, which we are leading as ICFJ Knight International Journalism Fellows. [sent-6, score-1.089]
6 If you work in Latin America or Central America, your organization can take part in OpenDataLatinoamerica.org. [sent-7, score-0.213]
7 To apply, go to the website and answer a simple form agreeing to meet the standard criteria for open data. [sent-9, score-0.698]
8 Once the application is approved, you will receive an account to start running and managing open data, becoming part of the community. [sent-10, score-0.924]
wordName wordTfidf (topN-words)
[('repository', 0.384), ('paz', 0.192), ('knight', 0.192), ('hackathon', 0.192), ('miguel', 0.192), ('launching', 0.192), ('america', 0.186), ('central', 0.172), ('managing', 0.167), ('latin', 0.162), ('open', 0.151), ('approved', 0.148), ('agreeing', 0.143), ('road', 0.137), ('files', 0.132), ('regional', 0.132), ('data', 0.13), ('reliable', 0.129), ('meet', 0.123), ('tries', 0.122), ('activities', 0.122), ('proved', 0.121), ('criteria', 0.118), ('meeting', 0.117), ('journalism', 0.117), ('becoming', 0.116), ('receive', 0.114), ('chapters', 0.111), ('foundation', 0.109), ('organization', 0.109), ('realized', 0.106), ('analyze', 0.106), ('international', 0.104), ('part', 0.104), ('lost', 0.102), ('successful', 0.099), ('community', 0.097), ('therefore', 0.097), ('account', 0.095), ('application', 0.093), ('website', 0.092), ('leading', 0.089), ('compare', 0.088), ('share', 0.087), ('apply', 0.085), ('running', 0.084), ('future', 0.079), ('remember', 0.079), ('together', 0.076), ('form', 0.071)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999982 1853 andrew gelman stats-2013-05-12-OpenData Latinoamerica
2 0.14981672 192 andrew gelman stats-2010-08-08-Turning pages into data
Introduction: There is a lot of data on the web, meant to be looked at by people, but how do you turn it into a spreadsheet people could actually analyze statistically? The technique to turn web pages intended for people into structured data sets intended for computers is called “screen scraping.” It has just been made easier with a wiki/community, http://scraperwiki.com/. They provide libraries to extract information from PDF and Excel files, to automatically fill in forms, and similar. Moreover, the community aspect of it should allow researchers doing similar things to get connected. It’s very good. Here’s an example of scraping road accident data or port of London ship arrivals. You can already find collections of structured data online; examples are Infochimps (“find the world’s data”) and Freebase (“An entity graph of people, places and things, built by a community that loves open data.”). There’s also a repository system for data, TheData (“An open-source application for pub
3 0.11005074 999 andrew gelman stats-2011-11-09-I was at a meeting a couple months ago . . .
Introduction: . . . and I decided to amuse myself by writing down all the management-speak words I heard: “grappling” “early prototypes” “technology platform” “building block” “machine learning” “your team” “workspace” “tagging” “data exhaust” “monitoring a particular population” “collective intelligence” “communities of practice” “hackathon” “human resources . . . technologies” Any one or two or three of these phrases might be fine, but put them all together and what you have is a festival of jargon. A hackathon, indeed.
4 0.099887297 2016 andrew gelman stats-2013-09-11-Zipfian Academy, A School for Data Science
Introduction: Katie Kent writes: I’m with Zipfian Academy – we’re launching next week as the first 12-week immersive program to teach data science. The program combines the hard and soft skills of data science with introductions to the data science community out here in San Francisco. The launch will be covered by a couple big tech blogs, but we’d love to offer the opportunity to blog about it to some smaller and well-respected data science blogs like yours. I don’t know anything about this but I took a look at the website and it looks pretty cool. Maybe in a future iteration of their course, they can teach Stan, once it has a few more useful features such as VB and EP.
5 0.078062735 1447 andrew gelman stats-2012-08-07-Reproducible science FAIL (so far): What’s stoppin people from sharin data and code?
Introduction: David Karger writes: Your recent post on sharing data was of great interest to me, as my own research in computer science asks how to incentivize and lower barriers to data sharing. I was particularly curious about your highlighting of effort as the major disincentive to sharing. I would love to hear more, as this question of effort is one we specifically target in our development of tools for data authoring and publishing. As a straw man, let me point out that sharing data technically requires no more than posting an Excel spreadsheet online. And that you likely already produced that spreadsheet during your own analytic work. So, in what way does such low-tech publishing fail to meet your data sharing objectives? Our own hypothesis has been that the effort is really quite low, with the problem being a lack of *immediate/tangible* benefits (as opposed to the long-term values you accurately describe). To attack this problem, we’re developing tools (and, since it appear
6 0.074283578 2239 andrew gelman stats-2014-03-09-Reviewing the peer review process?
8 0.071427107 1948 andrew gelman stats-2013-07-21-Bayes related
9 0.068017282 1640 andrew gelman stats-2012-12-26-What do people do wrong? WSJ columnist is looking for examples!
10 0.066600271 18 andrew gelman stats-2010-05-06-$63,000 worth of abusive research . . . or just a really stupid waste of time?
11 0.066388063 58 andrew gelman stats-2010-05-29-Stupid legal crap
12 0.066052809 352 andrew gelman stats-2010-10-19-Analysis of survey data: Design based models vs. hierarchical modeling?
13 0.062381588 113 andrew gelman stats-2010-06-28-Advocacy in the form of a “deliberative forum”
14 0.061761044 911 andrew gelman stats-2011-09-15-More data tools worth using from Google
15 0.060371045 1135 andrew gelman stats-2012-01-22-Advice on do-it-yourself stats education?
16 0.059895165 223 andrew gelman stats-2010-08-21-Statoverflow
17 0.059803076 1527 andrew gelman stats-2012-10-10-Another reason why you can get good inferences from a bad model
18 0.059758827 1583 andrew gelman stats-2012-11-19-I can’t read this interview with me
19 0.05916103 1451 andrew gelman stats-2012-08-08-Robert Kosara reviews Ed Tufte’s short course
20 0.057967521 41 andrew gelman stats-2010-05-19-Updated R code and data for ARM
topicId topicWeight
[(0, 0.108), (1, -0.012), (2, -0.02), (3, 0.009), (4, 0.029), (5, 0.029), (6, -0.032), (7, -0.019), (8, -0.022), (9, 0.022), (10, -0.024), (11, -0.016), (12, 0.005), (13, 0.007), (14, -0.031), (15, 0.033), (16, 0.022), (17, -0.037), (18, 0.038), (19, 0.007), (20, 0.029), (21, 0.022), (22, -0.007), (23, -0.013), (24, -0.028), (25, 0.017), (26, 0.029), (27, -0.02), (28, 0.03), (29, 0.022), (30, -0.001), (31, -0.058), (32, -0.001), (33, 0.059), (34, 0.027), (35, 0.031), (36, -0.013), (37, 0.011), (38, 0.011), (39, 0.029), (40, 0.018), (41, 0.018), (42, -0.01), (43, 0.01), (44, -0.001), (45, 0.037), (46, -0.01), (47, -0.044), (48, 0.016), (49, 0.015)]
simIndex simValue blogId blogTitle
same-blog 1 0.94706464 1853 andrew gelman stats-2013-05-12-OpenData Latinoamerica
2 0.84110194 714 andrew gelman stats-2011-05-16-NYT Labs releases Openpaths, a utility for saving your iphone data
Introduction: Jake Porway writes: We launched Openpaths the other week. It’s a site where people can privately upload and view their iPhone location data (at least until an Apple update wipes it out) and also download their data for their own use. More than just giving people a neat tool to view their data with, however, we’re also creating an option for them to donate their data to research projects at varying levels of anonymity. We’re still working out the terms for that, but we’d love any input and to get in touch with anyone who might want to use the data. I don’t have any use for this personally but maybe it will interest some of you. From the webpage: Openpaths is an anonymous, user-contributed database for the personal location data files recorded by iOS devices. Users securely store, explore, and manage their personal location data, and grant researchers access to portions of that data as they choose. All location data stored in openpaths is kept separate from user profi
3 0.81610757 192 andrew gelman stats-2010-08-08-Turning pages into data
4 0.81449986 1175 andrew gelman stats-2012-02-19-Factual – a new place to find data
Introduction: Factual collects data on a variety of topics, organizes them, and allows easy access. If you ever wanted to do a histogram of calorie content in Starbucks coffees or plot warnings with a live feed of earthquake data, your life should be a bit simpler now. Also see DataMarket, InfoChimps, and a few older links in The Future of Data Analysis. If you access the data through the API, you can build live visualizations like this: Of course, you could just go to the source. Roy Mendelssohn writes (with minor edits): Since you are both interested in data access, please look at our service ERDDAP: http://coastwatch.pfel.noaa.gov/erddap/index.html http://upwell.pfeg.noaa.gov/erddap/index.html Please do not be fooled by the web pages. Everything is a service (including search and graphics) and the URL completely defines the request, and response formats are easily changed just by changing the “file extension”. The web pages are just html and javascript that u
Introduction: David Karger writes: Your recent post on sharing data was of great interest to me, as my own research in computer science asks how to incentivize and lower barriers to data sharing. I was particularly curious about your highlighting of effort as the major disincentive to sharing. I would love to hear more, as this question of effort is one we specifically target in our development of tools for data authoring and publishing. As a straw man, let me point out that sharing data technically requires no more than posting an Excel spreadsheet online. And that you likely already produced that spreadsheet during your own analytic work. So, in what way does such low-tech publishing fail to meet your data sharing objectives? Our own hypothesis has been that the effort is really quite low, with the problem being a lack of *immediate/tangible* benefits (as opposed to the long-term values you accurately describe). To attack this problem, we’re developing tools (and, since it appear
6 0.77953005 118 andrew gelman stats-2010-06-30-Question & Answer Communities
7 0.77793241 2307 andrew gelman stats-2014-04-27-Big Data…Big Deal? Maybe, if Used with Caution.
9 0.77330917 1990 andrew gelman stats-2013-08-20-Job opening at an organization that promotes reproducible research!
10 0.74686277 211 andrew gelman stats-2010-08-17-Deducer update
11 0.739977 1238 andrew gelman stats-2012-03-31-Dispute about ethics of data sharing
12 0.73991746 1434 andrew gelman stats-2012-07-29-FindTheData.org
13 0.7357946 1923 andrew gelman stats-2013-07-03-Bayes pays!
14 0.73437816 1837 andrew gelman stats-2013-05-03-NYC Data Skeptics Meetup
15 0.73231399 465 andrew gelman stats-2010-12-13-$3M health care prediction challenge
16 0.72337234 907 andrew gelman stats-2011-09-14-Reproducibility in Practice
17 0.72289217 946 andrew gelman stats-2011-10-07-Analysis of Power Law of Participation
18 0.72085434 1920 andrew gelman stats-2013-06-30-“Non-statistical” statistics tools
19 0.71918172 215 andrew gelman stats-2010-08-18-DataMarket
20 0.71770132 275 andrew gelman stats-2010-09-14-Data visualization at the American Evaluation Association
topicId topicWeight
[(16, 0.097), (21, 0.027), (24, 0.141), (45, 0.036), (53, 0.019), (84, 0.011), (86, 0.033), (98, 0.307), (99, 0.223)]
simIndex simValue blogId blogTitle
1 0.96351576 1249 andrew gelman stats-2012-04-06-Thinking seriously about social science research
Introduction: I haven’t linked to the Baby Name Wizard in a while. . . . Laura Wattenberg takes a look at the question, “Does a hard-to-pronounce baby name hurt you?” Critical thinking without “debunking”: this is the way to go.
2 0.93462402 26 andrew gelman stats-2010-05-11-Update on religious affiliations of Supreme Court justices
Introduction: When Sonia Sotomayor was nominated for the Supreme Court, and there was some discussion of having 6 Roman Catholics on the court at the same time, I posted the following historical graph: It’s time for an update: It’s still gonna take a while for the Catholics to catch up. . . . And this one might be relevant too: It looks as if Jews and men have been overrepresented, also Episcopalians (which, as I noted earlier, are not necessarily considered Protestant in terms of religious doctrine but which I counted as such for the ethnic categorization). Religion is an interesting political variable because it’s nominally about religious belief but typically seems to be more about ethnicity.
3 0.92627984 1239 andrew gelman stats-2012-04-01-A randomized trial of the set-point diet
Introduction: Someone pointed me to this forthcoming article in the journal Nutrition by J. F. Lee et al. It looks pretty cool. I’m glad that someone went to the effort of performing this careful study. Regular readers will know that I’ve been waiting for this one for a while. In case you can’t read the article through the paywall, here’s the abstract: Background: Under a widely accepted theory of caloric balance, any individual has a set-point weight and will find it uncomfortable and typically unsustainable to keep his or her weight below that point. Set-points have evidently been increasing over the past few decades in the United States and other countries, leading to a public-health crisis of obesity. In an n=1 study, Roberts (2004, 2006) proposed an intervention to lower the set-point via daily consumption of unflavored sugar water or vegetable oil. Objective: To evaluate weight-loss outcomes under the diet proposed by Roberts (2004, 2006). Design: Randomized clinica
4 0.90622616 420 andrew gelman stats-2010-11-18-Prison terms for financial fraud?
Introduction: My econ dept colleague Joseph Stiglitz suggests that financial fraudsters be sent to prison. He points out that the usual penalty (million-dollar fines) just isn’t enough for crimes whose rewards can be in the hundreds of millions of dollars. That all makes sense, but why do the options have to be: 1. No punishment 2. A fine with little punishment or deterrent value 3. Prison. What’s the point of putting nonviolent criminals in prison? As I’ve said before, I’d prefer if the government just took all these convicted thieves’ assets along with 95% of their salary for several years, made them do community service (sorting bottles and cans at the local dump, perhaps; a financier should be good at this sort of thing, no?), etc. If restriction of personal freedom is judged to be part of the sentence, they could be given some sort of electronic tag that would send a message to the police if you are ever more than 3 miles from your home. And a curfew so you have to stay home bet
same-blog 5 0.90386152 1853 andrew gelman stats-2013-05-12-OpenData Latinoamerica
7 0.87902212 196 andrew gelman stats-2010-08-10-The U.S. as welfare state
8 0.87703705 425 andrew gelman stats-2010-11-21-If your comment didn’t get through . . .
9 0.86866933 1399 andrew gelman stats-2012-06-28-Life imitates blog
10 0.86359107 376 andrew gelman stats-2010-10-28-My talk at American University
11 0.85578901 1867 andrew gelman stats-2013-05-22-To Throw Away Data: Plagiarism as a Statistical Crime
12 0.85227311 1 andrew gelman stats-2010-04-22-Political Belief Networks: Socio-cognitive Heterogeneity in American Public Opinion
13 0.84571654 208 andrew gelman stats-2010-08-15-When Does a Name Become Androgynous?
14 0.83477604 742 andrew gelman stats-2011-06-02-Grouponomics, counterfactuals, and opportunity cost
15 0.82683754 132 andrew gelman stats-2010-07-07-Note to “Cigarettes”
16 0.81040907 1556 andrew gelman stats-2012-11-01-Recently in the sister blogs: special pre-election edition!
17 0.80093235 1701 andrew gelman stats-2013-01-31-The name that fell off a cliff
18 0.79457748 955 andrew gelman stats-2011-10-12-Why it doesn’t make sense to chew people out for not reading the help page
19 0.78975475 1806 andrew gelman stats-2013-04-16-My talk in Chicago this Thurs 6:30pm
20 0.78154647 2333 andrew gelman stats-2014-05-13-Personally, I’d rather go with Teragram