hunch_net hunch_net-2006 hunch_net-2006-159 knowledge-graph by maker-knowledge-mining

159 hunch net-2006-02-27-The Peekaboom Dataset


meta info for this blog

Source: html

Introduction: Luis von Ahn's Peekaboom project has yielded data (830MB). Peekaboom is the second attempt (after Espgame) to produce a dataset useful for learning to solve vision problems based on voluntary game play. As a second attempt, it is meant to address the shortcomings of the first. In particular: the locations of specific objects are provided by the data, and the data collection is far more complete and extensive. The data consists of: the source images (1 file per image, just short of 60K images); the in-game events (1 file per image, in a lispy syntax); and a description of the event language. There is a great deal of very specific and relevant data here, so the hope that this will help solve vision problems seems quite reasonable.
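The in-game event files use a lispy (s-expression) syntax, described by the dataset's own event-language file. As a rough illustration of how such files might be read, here is a minimal s-expression parser; the event names (`game`, `ping`, `click`) are hypothetical stand-ins, not the actual event language:

```python
# Minimal s-expression parser, sketched for Peekaboom-style event files.
# The event names below are hypothetical; the real event language is
# documented in the dataset's description file.

def tokenize(text):
    # Surround parentheses with spaces, then split on whitespace.
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    # Recursively build nested lists from a token stream.
    token = tokens.pop(0)
    if token == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # discard the closing ")"
        return node
    return token

events = parse(tokenize("(game (ping 12 34) (click 56 78))"))
# events == ['game', ['ping', '12', '34'], ['click', '56', '78']]
```

A real loader would map each parsed event onto the semantics given in the dataset's event-language description rather than leaving tokens as strings.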


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Luis von Ahn ‘s Peekaboom project has yielded data (830MB). [sent-1, score-0.636]

2 Peekaboom is the second attempt (after Espgame ) to produce a dataset which is useful for learning to solve vision problems based on voluntary game play. [sent-2, score-1.413]

3 As a second attempt, it is meant to address all of the shortcomings of the first attempt. [sent-3, score-0.607]

4 In particular: The locations of specific objects are provided by the data. [sent-4, score-0.622]

5 The data collection is far more complete and extensive. [sent-5, score-0.507]

6 There is a great deal of very specific and relevant data here so the hope that this will help solve vision problems seems quite reasonable. [sent-11, score-1.272]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('peekaboom', 0.344), ('file', 0.31), ('image', 0.245), ('vision', 0.223), ('specific', 0.204), ('attempt', 0.201), ('data', 0.19), ('shortcomings', 0.186), ('voluntary', 0.186), ('yielded', 0.186), ('per', 0.173), ('ahn', 0.172), ('espgame', 0.162), ('locations', 0.162), ('luis', 0.155), ('von', 0.155), ('consists', 0.155), ('second', 0.146), ('collection', 0.139), ('solve', 0.13), ('objects', 0.128), ('provided', 0.128), ('meant', 0.12), ('game', 0.116), ('description', 0.114), ('address', 0.11), ('event', 0.107), ('complete', 0.105), ('project', 0.105), ('produce', 0.103), ('dataset', 0.095), ('short', 0.088), ('problems', 0.081), ('source', 0.081), ('relevant', 0.081), ('help', 0.075), ('deal', 0.075), ('far', 0.073), ('hope', 0.067), ('reasonable', 0.061), ('based', 0.061), ('useful', 0.06), ('particular', 0.059), ('great', 0.057), ('quite', 0.051), ('first', 0.045), ('seems', 0.038), ('learning', 0.011)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999982 159 hunch net-2006-02-27-The Peekaboom Dataset


2 0.33896834 99 hunch net-2005-08-01-Peekaboom

Introduction: Luis has released Peekaboom, a successor to ESPgame (game site). The purpose of the game is similar—using the actions of people playing a game to gather data helpful in solving AI. Peekaboom gathers more detailed, and perhaps more useful, data about vision. For ESPgame, the byproduct of the game was mutually agreed upon labels for common images. For Peekaboom, the location of the subimage generating the label is revealed by the game as well. Given knowledge about what portion of the image is related to a label, it may be more feasible to learn to recognize the appropriate parts. There isn’t a dataset yet available for this game as there is for ESPgame, but hopefully a significant number of people will play and we’ll have one to work with soon.
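Given the region Peekaboom reveals for a label, the natural preprocessing step is to crop that region out before training a recognizer on it. A toy sketch follows; the (x, y, width, height) bounding-box format and the pixel values are assumptions for illustration, not Peekaboom's actual output format:

```python
# Crop a labeled subimage out of a full image, represented here as a
# 2D list of pixel values. The (x, y, width, height) box format is an
# assumption for illustration, not Peekaboom's actual output format.

def crop(image, box):
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]

# A synthetic 10x10 "image" where pixel (r, c) has value r*10 + c.
image = [[r * 10 + c for c in range(10)] for r in range(10)]
patch = crop(image, (2, 3, 4, 2))  # 4 pixels wide, 2 tall, offset (2, 3)
# patch == [[32, 33, 34, 35], [42, 43, 44, 45]]
```

Training a recognizer on such patches, rather than whole images, is exactly the advantage the revealed locations provide.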

3 0.25318003 209 hunch net-2006-09-19-Luis von Ahn is awarded a MacArthur fellowship.

Introduction: For his work on the subject of human computation, including ESPGame, Peekaboom, and Phetch. The new MacArthur fellows.

4 0.21841133 20 hunch net-2005-02-15-ESPgame and image labeling

Introduction: Luis von Ahn has been running the espgame for a while now. The espgame provides a picture to two randomly paired people across the web, and asks them to agree on a label. It hasn’t managed to label the web yet, but it has produced a large dataset of (image, label) pairs. I organized the dataset so you could explore the implied bipartite graph (requires much bandwidth). Relative to other image datasets, this one is quite large—67000 images, 358,000 labels (average of 5/image with variation from 1 to 19), and 22,000 unique labels (one every 3 images). The dataset is also very ‘natural’, consisting of images spidered from the internet. The multiple label characteristic is intriguing because ‘learning to learn’ and metalearning techniques may be applicable. The ‘natural’ quality means that this dataset varies greatly in difficulty from easy (predicting “red”) to hard (predicting “funny”) and potentially more rewarding to tackle. The open problem here is, of course, to make
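The implied bipartite graph is just the (image, label) pair list viewed from both sides. A minimal sketch of indexing such pairs and recovering a statistic like the average labels per image; the pairs below are invented, not drawn from the ESPgame data:

```python
from collections import defaultdict

# Index (image, label) pairs as a bipartite graph: each image maps to
# its labels and each label maps to its images. The pairs are invented
# for illustration, not drawn from the ESPgame data.
pairs = [("img1", "red"), ("img1", "car"), ("img2", "red"),
         ("img2", "sky"), ("img2", "funny"), ("img3", "car")]

labels_of = defaultdict(set)
images_of = defaultdict(set)
for image, label in pairs:
    labels_of[image].add(label)
    images_of[label].add(image)

avg_labels = sum(len(v) for v in labels_of.values()) / len(labels_of)
# 6 labels across 3 images -> average of 2.0 labels per image
```

The same two indexes support the multi-label view mentioned above: `labels_of` gives each image's label set for metalearning, while `images_of` gives the per-label training sets.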

5 0.079695165 260 hunch net-2007-08-25-The Privacy Problem

Introduction: Machine Learning is rising in importance because data is being collected for all sorts of tasks where it either wasn’t previously collected, or for tasks that did not previously exist. While this is great for Machine Learning, it has a downside—the massive data collection which is so useful can also lead to substantial privacy problems. It’s important to understand that this is a much harder problem than many people appreciate. The AOL data release is a good example. To those doing machine learning, the following strategies might be obvious: Just delete any names or other obviously personally identifiable information. The logic here seems to be “if I can’t easily find the person then no one can”. That doesn’t work as demonstrated by the people who were found circumstantially from the AOL data. … then just hash all the search terms! The logic here is “if I can’t read it, then no one can”. It’s also trivially broken by a dictionary attack—just hash all the strings
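The dictionary attack is easy to make concrete: hashing does not hide values drawn from a small, guessable set, because an attacker can hash every guess and compare. A minimal sketch, with an invented word list and invented "released" log:

```python
import hashlib

# Hashing search terms does not anonymize them: anyone can hash a
# dictionary of likely terms and match against the "protected" values.
# The released log and the dictionary below are invented.
def h(term):
    return hashlib.sha256(term.encode()).hexdigest()

released = [h("pizza"), h("divorce lawyer")]  # the "anonymized" log
dictionary = ["weather", "pizza", "divorce lawyer", "news"]

# Precompute hash -> plaintext for every dictionary word, then invert
# each released value with a single lookup.
lookup = {h(word): word for word in dictionary}
recovered = [lookup.get(value) for value in released]
# recovered == ["pizza", "divorce lawyer"]
```

Because search terms come from natural language, the effective dictionary is small enough that this inversion is cheap, which is exactly why "if I can't read it, no one can" fails.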

6 0.078907199 287 hunch net-2008-01-28-Sufficient Computation

7 0.076687433 168 hunch net-2006-04-02-Mad (Neuro)science

8 0.075616181 349 hunch net-2009-04-21-Interesting Presentations at Snowbird

9 0.074571773 365 hunch net-2009-07-31-Vowpal Wabbit Open Source Project

10 0.07032714 12 hunch net-2005-02-03-Learning Theory, by assumption

11 0.069833413 223 hunch net-2006-12-06-The Spam Problem

12 0.067419283 210 hunch net-2006-09-28-Programming Languages for Machine Learning Implementations

13 0.066849001 477 hunch net-2013-01-01-Deep Learning 2012

14 0.065262578 160 hunch net-2006-03-02-Why do people count for learning?

15 0.064321622 444 hunch net-2011-09-07-KDD and MUCMD 2011

16 0.063507073 276 hunch net-2007-12-10-Learning Track of International Planning Competition

17 0.062309697 281 hunch net-2007-12-21-Vowpal Wabbit Code Release

18 0.061973207 77 hunch net-2005-05-29-Maximum Margin Mismatch?

19 0.06191003 393 hunch net-2010-04-14-MLcomp: a website for objectively comparing ML algorithms

20 0.061607391 68 hunch net-2005-05-10-Learning Reductions are Reductionist


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.114), (1, 0.027), (2, -0.066), (3, 0.027), (4, 0.044), (5, -0.004), (6, -0.046), (7, 0.032), (8, 0.028), (9, -0.075), (10, -0.137), (11, -0.038), (12, -0.095), (13, -0.002), (14, -0.18), (15, -0.14), (16, -0.04), (17, -0.107), (18, -0.077), (19, 0.126), (20, 0.13), (21, -0.038), (22, 0.051), (23, -0.068), (24, 0.241), (25, 0.135), (26, -0.155), (27, 0.037), (28, 0.043), (29, -0.163), (30, 0.064), (31, 0.039), (32, -0.127), (33, 0.174), (34, -0.08), (35, 0.03), (36, -0.065), (37, -0.069), (38, 0.045), (39, -0.035), (40, -0.028), (41, 0.105), (42, -0.066), (43, 0.078), (44, -0.119), (45, -0.033), (46, -0.067), (47, 0.02), (48, 0.009), (49, 0.081)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97374582 159 hunch net-2006-02-27-The Peekaboom Dataset


2 0.95887411 99 hunch net-2005-08-01-Peekaboom


3 0.83643842 209 hunch net-2006-09-19-Luis von Ahn is awarded a MacArthur fellowship.


4 0.82825679 20 hunch net-2005-02-15-ESPgame and image labeling


5 0.44967982 223 hunch net-2006-12-06-The Spam Problem

Introduction: The New York Times has an article on the growth of spam. Interesting facts include: 9/10 of all email is spam, spam source identification is nearly useless due to botnet spam senders, and image based spam (emails which consist of an image only) is on the rise. Estimates of the cost of spam are almost certainly far too low, because they do not account for the cost in time lost by people. The image based spam which is currently penetrating many filters should be catchable with a more sophisticated application of machine learning technology. For the spam I see, the rendered images come in only a few formats, which would be easy to recognize via a support vector machine (with RBF kernel), neural network, or even nearest-neighbor architecture. The mechanics of setting this up to run efficiently is the only real challenge. This is the next step in the spam war. The response to this system is to make the image based spam even more random. We should (essentially) expect to see
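Of the architectures mentioned, nearest-neighbor is the simplest to sketch. Assuming each rendered spam image has been reduced to a small feature vector (the features, examples, and labels below are invented for illustration):

```python
# 1-nearest-neighbor spam/ham classifier over image feature vectors.
# The feature vectors and labels are invented for illustration; a real
# system would extract features from the rendered spam images.

def distance(a, b):
    # Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

train = [([0.9, 0.1, 0.8], "spam"), ([0.8, 0.2, 0.9], "spam"),
         ([0.1, 0.9, 0.2], "ham"), ([0.2, 0.8, 0.1], "ham")]

def classify(features):
    # Return the label of the closest training example.
    return min(train, key=lambda ex: distance(ex[0], features))[1]

prediction = classify([0.85, 0.15, 0.85])
# prediction == "spam"
```

Since the post notes the rendered images come in only a few formats, even this naive scheme should separate them well; randomizing the images, as predicted, is precisely what breaks the few-formats assumption.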

6 0.40344912 408 hunch net-2010-08-24-Alex Smola starts a blog

7 0.33961949 444 hunch net-2011-09-07-KDD and MUCMD 2011

8 0.33100209 3 hunch net-2005-01-24-The Humanloop Spectrum of Machine Learning

9 0.3265385 349 hunch net-2009-04-21-Interesting Presentations at Snowbird

10 0.31716594 136 hunch net-2005-12-07-Is the Google way the way for machine learning?

11 0.30866161 260 hunch net-2007-08-25-The Privacy Problem

12 0.29862031 277 hunch net-2007-12-12-Workshop Summary—Principles of Learning Problem Design

13 0.27505541 300 hunch net-2008-04-30-Concerns about the Large Scale Learning Challenge

14 0.27283946 218 hunch net-2006-11-20-Context and the calculation misperception

15 0.26759017 155 hunch net-2006-02-07-Pittsburgh Mind Reading Competition

16 0.26753205 152 hunch net-2006-01-30-Should the Input Representation be a Vector?

17 0.26645949 102 hunch net-2005-08-11-Why Manifold-Based Dimension Reduction Techniques?

18 0.25993678 143 hunch net-2005-12-27-Automated Labeling

19 0.2584914 471 hunch net-2012-08-24-Patterns for research in machine learning

20 0.25656423 455 hunch net-2012-02-20-Berkeley Streaming Data Workshop


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(27, 0.047), (53, 0.108), (55, 0.305), (60, 0.325), (95, 0.082)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95453686 159 hunch net-2006-02-27-The Peekaboom Dataset


2 0.81543106 198 hunch net-2006-07-25-Upcoming conference

Introduction: The Workshop for Women in Machine Learning will be held in San Diego on October 4, 2006. For details see the workshop website: http://www.seas.upenn.edu/~wiml/

3 0.68662667 331 hunch net-2008-12-12-Summer Conferences

Introduction: Here’s a handy table for the summer conferences.

Conference         Deadline      Reviewer Targeting  Double Blind  Author Feedback  Location          Date
ICML (wrong ICML)  January 26    Yes                 Yes           Yes              Montreal, Canada  June 14-17
COLT               February 13   No                  No            Yes              Montreal          June 19-21
UAI                March 13      No                  Yes           No               Montreal          June 19-21
KDD                February 2/6  No                  No            No               Paris, France     June 28-July 1

Reviewer targeting is new this year. The idea is that many poor decisions happen because the papers go to reviewers who are unqualified, and the hope is that allowing authors to point out who is qualified results in better decisions. In my experience, this is a reasonable idea to test. Both UAI and COLT are experimenting this year as well with double blind and author feedback, respectively. Of the two, I believe author feedback is more important, as I’ve seen it make a difference. However, I still consider double blind reviewing a net wi

4 0.67254299 453 hunch net-2012-01-28-Why COLT?

Introduction: By Shie and Nati Following John’s advertisement for submitting to ICML, we thought it appropriate to highlight the advantages of COLT, and the reasons it is often the best place for theory papers. We would like to emphasize that we both respect ICML, and are active in ICML, both as authors and as area chairs, and certainly are not arguing that ICML is a bad place for your papers. For many papers, ICML is the best venue. But for many theory papers, COLT is a better and more appropriate place. Why should you submit to COLT? By-and-large, theory papers go to COLT. This is the tradition of the field and most theory papers are sent to COLT. This is the place to present your ground-breaking theorems and new models that will shape the theory of machine learning. COLT is more focused then ICML with a single track session. Unlike ICML, the norm in COLT is for people to sit through most sessions, and hear most of the talks presented. There is also often a lively discussion followi

5 0.66869271 472 hunch net-2012-08-27-NYAS ML 2012 and ICML 2013

Introduction: The New York Machine Learning Symposium is October 19 with a 2 page abstract deadline due September 13 via email with subject “Machine Learning Poster Submission” sent to physicalscience@nyas.org. Everyone is welcome to submit. Last year’s attendance was 246 and I expect more this year. The primary experiment for ICML 2013 is multiple paper submission deadlines with rolling review cycles. The key dates are October 1, December 15, and February 15. This is an attempt to shift ICML further towards a journal style review process and reduce peak load. The “not for proceedings” experiment from this year’s ICML is not continuing. Edit: Fixed second ICML deadline.

6 0.66196561 90 hunch net-2005-07-07-The Limits of Learning Theory

7 0.66107959 271 hunch net-2007-11-05-CMU wins DARPA Urban Challenge

8 0.6590628 20 hunch net-2005-02-15-ESPgame and image labeling

9 0.6573779 448 hunch net-2011-10-24-2011 ML symposium and the bears

10 0.65311003 302 hunch net-2008-05-25-Inappropriate Mathematics for Machine Learning

11 0.65284073 446 hunch net-2011-10-03-Monday announcements

12 0.6513021 326 hunch net-2008-11-11-COLT CFP

13 0.6513021 465 hunch net-2012-05-12-ICML accepted papers and early registration

14 0.64904571 387 hunch net-2010-01-19-Deadline Season, 2010

15 0.62582111 422 hunch net-2011-01-16-2011 Summer Conference Deadline Season

16 0.62261248 270 hunch net-2007-11-02-The Machine Learning Award goes to …

17 0.61761093 395 hunch net-2010-04-26-Compassionate Reviewing

18 0.61487532 226 hunch net-2007-01-04-2007 Summer Machine Learning Conferences

19 0.60852158 356 hunch net-2009-05-24-2009 ICML discussion site

20 0.60381609 46 hunch net-2005-03-24-The Role of Workshops