andrew_gelman_stats andrew_gelman_stats-2010 andrew_gelman_stats-2010-118 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: StackOverflow has been a popular community where software developers would help one another. Recently they raised some VC funding, and to make profits they are selling job postings and expanding the model to other areas. Metaoptimize LLC has started a similar website, using the open-source OSQA framework for areas such as statistics and machine learning. Here’s a description: You and other data geeks can ask and answer questions on machine learning, natural language processing, artificial intelligence, text analysis, information retrieval, search, data mining, statistical modeling, and data visualization. Here you can ask and answer questions, comment and vote for the questions of others and their answers. Both questions and answers can be revised and improved. Questions can be tagged with the relevant keywords to simplify future access and organize the accumulated material. If you work very hard on your questions and answers, you will receive badges like “Guru”, “Studen
sentIndex sentText sentNum sentScore
1 StackOverflow has been a popular community where software developers would help one another. [sent-1, score-0.223]
2 Recently they raised some VC funding, and to make profits they are selling job postings and expanding the model to other areas. [sent-2, score-0.49]
3 Metaoptimize LLC has started a similar website, using the open-source OSQA framework for areas such as statistics and machine learning. [sent-3, score-0.138]
4 Here’s a description: You and other data geeks can ask and answer questions on machine learning, natural language processing, artificial intelligence, text analysis, information retrieval, search, data mining, statistical modeling, and data visualization. [sent-4, score-0.94]
5 Here you can ask and answer questions, comment and vote for the questions of others and their answers. [sent-5, score-0.479]
6 Both questions and answers can be revised and improved. [sent-6, score-0.481]
7 Questions can be tagged with the relevant keywords to simplify future access and organize the accumulated material. [sent-7, score-0.583]
8 If you work very hard on your questions and answers, you will receive badges like “Guru”, “Student” or “Good answer”. [sent-8, score-0.388]
9 In return, well-meaning question answerers will be helping feed Google and numerous other companies with good information they will offer the public along with sponsored information that someone is paying for. [sent-10, score-0.582]
10 I’ll join the party myself when they introduce the “Rent,” “Mortgage Payment,” “Medical Bill”, and “Grocery” badges. [sent-11, score-0.165]
11 Until then, I’ll be spending time and money, and someone else will be saving time and earning money. [sent-12, score-0.199]
12 [9:15pm: Included Ryan Shaw's correction to my post, pointing out that MetaOptimize is based on OSQA and not on the StackOverflow platform.] [sent-14, score-0.082]
13 [D+1, 7:30am: Igor Carron points to an initiative that's actually based on the StackOverflow. [sent-15, score-0.098]
wordName wordTfidf (topN-words)
[('metaoptimize', 0.283), ('osqa', 0.283), ('questions', 0.259), ('stackoverflow', 0.258), ('machine', 0.138), ('answers', 0.136), ('community', 0.13), ('accumulated', 0.129), ('badges', 0.129), ('geeks', 0.129), ('llc', 0.129), ('retrieval', 0.129), ('tagged', 0.129), ('vc', 0.129), ('answer', 0.122), ('carron', 0.121), ('grocery', 0.121), ('igor', 0.121), ('guru', 0.116), ('keywords', 0.112), ('profits', 0.112), ('simplify', 0.112), ('sponsored', 0.112), ('shaw', 0.109), ('postings', 0.109), ('payment', 0.106), ('rent', 0.104), ('information', 0.102), ('organize', 0.101), ('earning', 0.101), ('expanding', 0.1), ('ask', 0.098), ('initiative', 0.098), ('saving', 0.098), ('mortgage', 0.096), ('feed', 0.095), ('developers', 0.093), ('artificial', 0.092), ('numerous', 0.091), ('ryan', 0.089), ('selling', 0.089), ('mining', 0.088), ('revised', 0.086), ('introduce', 0.084), ('intelligence', 0.084), ('correction', 0.082), ('processing', 0.081), ('join', 0.081), ('helping', 0.08), ('funding', 0.08)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 118 andrew gelman stats-2010-06-30-Question & Answer Communities
2 0.2367544 223 andrew gelman stats-2010-08-21-Statoverflow
Introduction: Skirant Vadali writes: I am writing to seek your help in building a community driven Q&A website tentatively called ‘Statistics Analysis’. I am neither a founder of this website nor do I have any financial stake in its success. By way of background to this website, please see Stackoverflow (http://stackoverflow.com/) and Mathoverflow (http://mathoverflow.net/). Stackoverflow is a Q&A website targeted at software developers and is designed to help them ask questions and get answers from other developers. Mathoverflow is a Q&A website targeted at research mathematicians and is designed to help them ask and answer questions from other mathematicians across the world. The success of both these sites in helping their respective communities is a strong indicator that sites designed along these lines are very useful. The company that runs Stackoverflow (who also host Mathoverflow.net) has recently decided to develop other community driven websites for various other topic are
3 0.11476108 1311 andrew gelman stats-2012-05-10-My final exam for Design and Analysis of Sample Surveys
Introduction: We had 28 class periods, so I wrote an exam with an approximate correspondence of one question per class. Rather than dumping the exam in your lap all at once, I’ll post the questions once per day. Then each day I’ll post the answer to yesterday’s questions. So it will be 29 days in all. I’ll post them to appear late in the day so as not to interfere with our main daily posts (which are currently backed up to early June). The course was offered in the political science department and covered a mix of statistical and political topics. Followers of our recent discussion on test questions won’t be surprised to learn that some of the questions are ambiguous. This wasn’t on purpose. I tried my best, but good questions are hard to write. Question 1 will appear tomorrow.
4 0.098929152 505 andrew gelman stats-2011-01-05-Wacky interview questions: An exploration into the nature of evidence on the internet
Introduction: Gayle Laackmann reports (link from Felix Salmon) that Microsoft, Google, etc. don’t actually ask brain-teasers in their job interviews. They actually ask a lot of questions about programming. (I looked here and was relieved to see that the questions aren’t very hard. I could probably get a job as an entry-level programmer if I needed to.) Laackmann writes: Let’s look at the very widely circulated “15 Google Interview Questions that will make you feel stupid” list [here's the original list, I think, from Lewis Lin] . . . these questions are fake. Fake fake fake. How can you tell that they’re fake? Because one of them is “Why are manhole covers round?” This is an infamous Microsoft interview question that has since been so very, very banned at both companies. I find it very hard to believe that a Google interviewer asked such a question. We’ll get back to the manhole question in a bit. Laackmann reports that she never saw any IQ tests in three years of interviewi
5 0.096435383 1740 andrew gelman stats-2013-02-26-“Is machine learning a subset of statistics?”
Introduction: Following up on our previous post, Andrew Wilson writes: I agree we are in a really exciting time for statistics and machine learning. There has been a lot of talk lately comparing machine learning with statistics. I am curious whether you think there are many fundamental differences between the fields, or just superficial differences — different popular approximate inference methods, slightly different popular application areas, etc. Is machine learning a subset of statistics? In the paper we discuss how we think machine learning is fundamentally about pattern discovery, and ultimately, fully automating the learning and decision making process. In other words, whatever a human does when he or she uses tools to analyze data, can be written down algorithmically and automated on a computer. I am not sure if the ambitions are similar in statistics — and I don’t have any conventional statistics background, which makes it harder to tell. I think it’s an interesting discussion.
7 0.092730798 1368 andrew gelman stats-2012-06-06-Question 27 of my final exam for Design and Analysis of Sample Surveys
8 0.092306904 1367 andrew gelman stats-2012-06-05-Question 26 of my final exam for Design and Analysis of Sample Surveys
9 0.090499192 1482 andrew gelman stats-2012-09-04-Model checking and model understanding in machine learning
10 0.090378568 1933 andrew gelman stats-2013-07-10-Please send all comments to -dev-ripley
11 0.082332321 1117 andrew gelman stats-2012-01-13-What are the important issues in ethics and statistics? I’m looking for your input!
12 0.079098552 481 andrew gelman stats-2010-12-22-The Jumpstart financial literacy survey and the different purposes of tests
13 0.078044578 1630 andrew gelman stats-2012-12-18-Postdoc positions at Microsoft Research – NYC
14 0.076895051 1902 andrew gelman stats-2013-06-17-Job opening at new “big data” consulting firm!
15 0.07403896 2097 andrew gelman stats-2013-11-11-Why ask why? Forward causal inference and reverse causal questions
16 0.073367134 2270 andrew gelman stats-2014-03-28-Creating a Lenin-style democracy
17 0.070467234 645 andrew gelman stats-2011-04-04-Do you have any idea what you’re talking about?
18 0.069930397 1297 andrew gelman stats-2012-05-03-New New York data research organizations
19 0.069722831 987 andrew gelman stats-2011-11-02-How Khan Academy is using Machine Learning to Assess Student Mastery
20 0.069533803 1061 andrew gelman stats-2011-12-16-CrossValidated: A place to post your statistics questions
topicId topicWeight
[(0, 0.127), (1, -0.02), (2, -0.016), (3, 0.035), (4, 0.033), (5, 0.082), (6, -0.037), (7, -0.019), (8, -0.016), (9, -0.0), (10, 0.003), (11, -0.006), (12, 0.043), (13, -0.004), (14, -0.074), (15, 0.013), (16, 0.027), (17, -0.042), (18, 0.008), (19, 0.016), (20, 0.009), (21, -0.019), (22, 0.021), (23, 0.006), (24, -0.013), (25, 0.051), (26, 0.047), (27, -0.043), (28, 0.005), (29, -0.006), (30, 0.015), (31, -0.077), (32, 0.036), (33, 0.036), (34, -0.007), (35, 0.042), (36, 0.006), (37, -0.012), (38, 0.008), (39, -0.021), (40, 0.039), (41, -0.017), (42, -0.018), (43, 0.086), (44, -0.036), (45, 0.062), (46, 0.022), (47, -0.021), (48, 0.016), (49, -0.02)]
simIndex simValue blogId blogTitle
same-blog 1 0.95565283 118 andrew gelman stats-2010-06-30-Question & Answer Communities
2 0.75408757 223 andrew gelman stats-2010-08-21-Statoverflow
Introduction: Skirant Vadali writes: I am writing to seek your help in building a community driven Q&A website tentatively called ‘Statistics Analysis’. I am neither a founder of this website nor do I have any financial stake in its success. By way of background to this website, please see Stackoverflow (http://stackoverflow.com/) and Mathoverflow (http://mathoverflow.net/). Stackoverflow is a Q&A website targeted at software developers and is designed to help them ask questions and get answers from other developers. Mathoverflow is a Q&A website targeted at research mathematicians and is designed to help them ask and answer questions from other mathematicians across the world. The success of both these sites in helping their respective communities is a strong indicator that sites designed along these lines are very useful. The company that runs Stackoverflow (who also host Mathoverflow.net) has recently decided to develop other community driven websites for various other topic are
3 0.70528209 1434 andrew gelman stats-2012-07-29-FindTheData.org
Introduction: I received the following (unsolicited) email: Hi Andrew, I work on the business development team of FindTheData.org, an unbiased comparison engine founded by Kevin O’Connor (founder and former CEO of DoubleClick) and backed by Kleiner Perkins with ~10M unique visitors per month. We are working with large online publishers including Golf Digest, Huffington Post, Under30CEO, and offer a variety of options to integrate our highly engaging content with your site. I believe our un-biased and reliable data resources would be of interest to you and your readers. I’d like to set up a quick call to discuss similar partnership ideas with you and would greatly appreciate 10 minutes of your time. Please suggest a couple times that work best for you or let me know if you would like me to send some more information before you make time for a call. Looking forward to hearing from you, Jonny – JONNY KINTZELE Business Development, FindThe Data mobile: 619-307-097
4 0.68343246 999 andrew gelman stats-2011-11-09-I was at a meeting a couple months ago . . .
Introduction: . . . and I decided to amuse myself by writing down all the management-speak words I heard: “grappling” “early prototypes” “technology platform” “building block” “machine learning” “your team” “workspace” “tagging” “data exhaust” “monitoring a particular population” “collective intelligence” “communities of practice” “hackathon” “human resources . . . technologies” Any one or two or three of these phrases might be fine, but put them all together and what you have is a festival of jargon. A hackathon, indeed.
5 0.67578673 1923 andrew gelman stats-2013-07-03-Bayes pays!
Introduction: Jason Rosenfeld, who has the amazing title of “Manager of Basketball Analytics” at the Charlotte Bobcats, announces the following jobs: Basketball Operations: Statistics Basketball Operations Systems Developer – Charlotte Bobcats (Charlotte, NC) POSITION OVERVIEW The Basketball Operations System Developer will collect and import data to our database, check data, and field requests from the Basketball Operations staff. This position will be instrumental in molding and improving our database to assist the staff in player personnel and coaching efforts. ESSENTIAL DUTIES AND RESPONSIBILITIES • Respond to data and database requests from the front office. • Build user-friendly software tools for use by the basketball operations staff. • Accumulate data from various sources to input and organize into our system to assist the basketball operations staff with decisions. • Check and clean data for accuracy and import to our database. • Provide ideas and play a key ro
6 0.67031258 1853 andrew gelman stats-2013-05-12-OpenData Latinoamerica
7 0.66035128 1175 andrew gelman stats-2012-02-19-Factual – a new place to find data
8 0.65241045 2345 andrew gelman stats-2014-05-24-An interesting mosaic of a data programming course
9 0.65140378 752 andrew gelman stats-2011-06-08-Traffic Prediction
10 0.64745164 1297 andrew gelman stats-2012-05-03-New New York data research organizations
11 0.64526415 1990 andrew gelman stats-2013-08-20-Job opening at an organization that promotes reproducible research!
12 0.63880891 951 andrew gelman stats-2011-10-11-Data mining efforts for Obama’s campaign
13 0.63726914 1777 andrew gelman stats-2013-03-26-Data Science for Social Good summer fellowship program
14 0.63655448 927 andrew gelman stats-2011-09-26-R and Google Visualization
15 0.62750959 569 andrew gelman stats-2011-02-12-Get the Data
16 0.62513632 1279 andrew gelman stats-2012-04-24-ESPN is looking to hire a research analyst
17 0.6248368 1530 andrew gelman stats-2012-10-11-Migrating your blog from Movable Type to WordPress
18 0.62482882 714 andrew gelman stats-2011-05-16-NYT Labs releases Openpaths, a utility for saving your iphone data
19 0.62307423 1902 andrew gelman stats-2013-06-17-Job opening at new “big data” consulting firm!
20 0.61534488 1909 andrew gelman stats-2013-06-21-Job openings at conservative political analytics firm!
topicId topicWeight
[(2, 0.012), (9, 0.039), (15, 0.018), (16, 0.029), (21, 0.017), (24, 0.146), (34, 0.018), (35, 0.011), (42, 0.014), (49, 0.012), (53, 0.02), (59, 0.012), (64, 0.228), (71, 0.011), (76, 0.012), (77, 0.026), (86, 0.035), (88, 0.011), (89, 0.02), (99, 0.215)]
simIndex simValue blogId blogTitle
same-blog 1 0.89916921 118 andrew gelman stats-2010-06-30-Question & Answer Communities
2 0.89304423 985 andrew gelman stats-2011-11-01-Doug Schoen has 2 poll reports
Introduction: According to Chris Wilson , there are two versions of the report of the Occupy Wall Street poll from so-called hack pollster Doug Schoen. Here’s the report that Azi Paybarah says that Schoen sent to him, and here’s the final question from the poll: And here’s what’s on Schoen’s own website: Very similar, except for that last phrase, “no matter what the cost.” I have no idea which was actually asked to the survey participants, but it’s a reminder of the difficulties of public opinion research—sometimes you don’t even know what question was asked! I’m not implying anything sinister on Schoen’s part, it’s just interesting to see these two documents floating around. P.S. More here from Kaiser Fung on fundamental flaws with Schoen’s poll.
3 0.88442147 1653 andrew gelman stats-2013-01-04-Census dotmap
Introduction: Andrew Vande Moere points to this impressive interactive map from Brandon Martin-Anderson showing the locations of all the residents of the United States and Canada. It says, “The map has 341,817,095 dots – one for each person.” Not quite . . . I was hoping to zoom into my building (approximately 10 people live on our floor, I say approximately because two of the apartments are split between two floors and I’m not sure how they would assign the residents), but unfortunately our entire block is just a solid mass of black. Also, they put a few dots in the park and in the river by accident (presumably because the borders of the census blocks were specified only approximately). But, hey, no algorithm is perfect. It’s hard to know what to do about this. The idea of mapping every person is cool, but you’ll always run into trouble displaying densely populated areas. Smaller dots might work, but then that might depend on the screen being used for display.
4 0.86889827 1521 andrew gelman stats-2012-10-04-Columbo does posterior predictive checks
Introduction: I’m already on record as saying that Ronald Reagan was a statistician so I think this is ok too . . . Here’s what Columbo does. He hears the killer’s story and he takes it very seriously (it’s murder, and Columbo never jokes about murder), examines all its implications, and finds where it doesn’t fit the data. Then Columbo carefully examines the discrepancies, tries some model expansion, and eventually concludes that he’s proved there’s a problem. OK, now you’re saying: Yeah, yeah, sure, but how does that differ from any other fictional detective? The difference, I think, is that the tradition is for the detective to find clues and use these to come up with hypotheses, or to trap the killer via internal contradictions in his or her statement. I see Columbo is different—and more in keeping with chapter 6 of Bayesian Data Analysis—in that he is taking the killer’s story seriously and exploring all its implications. That’s the essence of predictive model checking: you t
5 0.86815131 1109 andrew gelman stats-2012-01-09-Google correlate links statistics with minorities
Introduction: John Eppley asks what I make of this: Eppley is guessing the negative spikes are searches getting swamped by holiday season shoppers.
6 0.84219062 595 andrew gelman stats-2011-02-28-What Zombies see in Scatterplots
7 0.83588302 724 andrew gelman stats-2011-05-21-New search engine for data & statistics
8 0.83371723 1058 andrew gelman stats-2011-12-14-Higgs bozos: Rosencrantz and Guildenstern are spinning in their graves
9 0.81397045 11 andrew gelman stats-2010-04-29-Auto-Gladwell, or Can fractals be used to predict human history?
10 0.80669576 1637 andrew gelman stats-2012-12-24-Textbook for data visualization?
13 0.76438749 2097 andrew gelman stats-2013-11-11-Why ask why? Forward causal inference and reverse causal questions
14 0.76167226 304 andrew gelman stats-2010-09-29-Data visualization marathon
18 0.74440467 899 andrew gelman stats-2011-09-10-The statistical significance filter
19 0.74390566 970 andrew gelman stats-2011-10-24-Bell Labs
20 0.74369347 2149 andrew gelman stats-2013-12-26-Statistical evidence for revised standards