hunch_net hunch_net-2005 hunch_net-2005-20 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Luis von Ahn has been running the ESPgame for a while now. The ESPgame provides a picture to two randomly paired people across the web, and asks them to agree on a label. It hasn’t managed to label the web yet, but it has produced a large dataset of (image, label) pairs. I organized the dataset so you could explore the implied bipartite graph (requires much bandwidth). Relative to other image datasets, this one is quite large: 67,000 images, 358,000 labels (an average of 5 per image, with variation from 1 to 19), and 22,000 unique labels (one for every 3 images). The dataset is also very ‘natural’, consisting of images spidered from the internet. The multiple-label characteristic is intriguing because ‘learning to learn’ and metalearning techniques may be applicable. The ‘natural’ quality means that this dataset varies greatly in difficulty, from easy (predicting “red”) to hard (predicting “funny”), making it potentially more rewarding to tackle. The open problem here is, of course, to make an internet image labeling program. At a minimum this might be useful for blind people and image search. Solving this problem well seems likely to require new learning methods.
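The (image, label) pairs form a bipartite graph with images on one side and labels on the other. As a minimal sketch of working with such data (assuming a hypothetical tab-separated file of image-id/label pairs, not the actual distribution format), the graph and the summary statistics quoted above could be computed as follows:

from collections import defaultdict

# Hypothetical input: one "image_id<TAB>label" pair per line; the real
# ESPgame dump may be organized differently.
labels_per_image = defaultdict(set)   # image -> set of labels
images_per_label = defaultdict(set)   # label -> set of images

with open("espgame_pairs.tsv") as f:
    for line in f:
        image_id, label = line.rstrip("\n").split("\t")
        labels_per_image[image_id].add(label)
        images_per_label[label].add(image_id)

num_images = len(labels_per_image)                          # ~67,000 above
num_pairs = sum(len(s) for s in labels_per_image.values())  # ~358,000 above
num_labels = len(images_per_label)                          # ~22,000 above

print("images:", num_images)
print("(image, label) pairs:", num_pairs)
print("average labels per image:", num_pairs / num_images)  # ~5 above
print("unique labels:", num_labels)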
sentIndex sentText sentNum sentScore
1 Luis von Ahn has been running the espgame for awhile now. [sent-1, score-0.519]
2 The espgame provides a picture to two randomly paired people across the web, and asks them to agree on a label. [sent-2, score-0.874]
3 It hasn’t managed to label the web yet, but it has produced a large dataset of (image, label) pairs. [sent-3, score-0.842]
4 I organized the dataset so you could explore the implied bipartite graph (requires much bandwidth). [sent-4, score-0.843]
5 Relative to other image datasets, this one is quite large—67000 images, 358,000 labels (average of 5/image with variation from 1 to 19), and 22,000 unique labels (one every 3 images). [sent-5, score-0.878]
6 The dataset is also very ‘natural’, consisting of images spidered from the internet. [sent-6, score-0.717]
7 The multiple label characteristic is intriguing because ‘learning to learn’ and metalearning techniques may be applicable. [sent-7, score-0.589]
8 The ‘natural’ quality means that this dataset varies greatly in difficulty from easy (predicting “red”) to hard (predicting “funny”) and potentially more rewarding to tackle. [sent-8, score-0.571]
9 The open problem here is, of course, to make an internet image labeling program. [sent-9, score-0.532]
10 At a minimum this might be useful for blind people and image search. [sent-10, score-0.526]
11 Solving this problem well seems likely to require new learning methods. [sent-11, score-0.134]
wordName wordTfidf (topN-words)
[('image', 0.364), ('images', 0.309), ('dataset', 0.28), ('espgame', 0.241), ('label', 0.203), ('web', 0.172), ('labels', 0.153), ('bipartite', 0.138), ('funny', 0.138), ('intriguing', 0.138), ('paired', 0.138), ('predicting', 0.137), ('ahn', 0.128), ('consisting', 0.128), ('explore', 0.128), ('metalearning', 0.128), ('red', 0.128), ('characteristic', 0.12), ('rewarding', 0.12), ('variation', 0.115), ('luis', 0.115), ('picture', 0.115), ('von', 0.115), ('asks', 0.11), ('bandwidth', 0.11), ('implied', 0.11), ('natural', 0.104), ('hasn', 0.103), ('randomly', 0.103), ('produced', 0.1), ('organized', 0.098), ('awhile', 0.095), ('labeling', 0.095), ('varies', 0.095), ('agree', 0.093), ('unique', 0.093), ('graph', 0.089), ('managed', 0.087), ('minimum', 0.083), ('blind', 0.079), ('relative', 0.077), ('potentially', 0.076), ('across', 0.074), ('internet', 0.073), ('datasets', 0.071), ('requires', 0.07), ('average', 0.069), ('running', 0.068), ('likely', 0.068), ('require', 0.066)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 20 hunch net-2005-02-15-ESPgame and image labeling
2 0.22093433 99 hunch net-2005-08-01-Peekaboom
Introduction: Luis has released Peekaboom, a successor to ESPgame (game site). The purpose of the game is similar: using the actions of people playing a game to gather data helpful in solving AI. Peekaboom gathers more detailed, and perhaps more useful, data about vision. For ESPgame, the byproduct of the game was mutually agreed-upon labels for common images. For Peekaboom, the location of the subimage generating the label is revealed by the game as well. Given knowledge of which portion of the image is related to a label, it may be more feasible to learn to recognize the appropriate parts. There isn’t a dataset yet available for this game as there is for ESPgame, but hopefully a significant number of people will play and we’ll have one to work with soon.
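To make the reasoning concrete: knowing the bounding box of the subimage that generated a label lets a recognizer train on just the relevant region instead of the whole image. A minimal sketch, with a hypothetical record layout and bounding-box convention (the real Peekaboom data is not distributed in this form):

from PIL import Image

def crop_labeled_region(image_path, bbox):
    # bbox = (left, upper, right, lower) in pixels (an assumed convention).
    return Image.open(image_path).crop(bbox)

# Hypothetical example record, purely for illustration.
record = {"image": "dog.jpg", "label": "dog", "bbox": (40, 30, 200, 180)}
patch = crop_labeled_region(record["image"], record["bbox"])
# 'patch' can now be featurized and used as a positive example for the
# label "dog", rather than treating the entire image as the example.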
3 0.21841133 159 hunch net-2006-02-27-The Peekaboom Dataset
Introduction: Luis von Ahn’s Peekaboom project has yielded data (830MB). Peekaboom is the second attempt (after ESPgame) to produce a dataset which is useful for learning to solve vision problems based on voluntary game play. As a second attempt, it is meant to address the shortcomings of the first. In particular, the locations of specific objects are provided by the data, and the data collection is far more complete and extensive. The data consists of: the source images (1 file per image, just short of 60K images), the in-game events (1 file per image, in a lispy syntax), and a description of the event language. There is a great deal of very specific and relevant data here, so the hope that this will help solve vision problems seems quite reasonable.
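Since the data arrives as one image file and one event file per image, a natural first step is pairing them by filename. A minimal sketch, assuming a hypothetical directory layout with matching stems (the real archive may be organized differently):

import os

# Hypothetical layout: peekaboom/images/<id>.jpg and peekaboom/events/<id>.txt
image_dir, event_dir = "peekaboom/images", "peekaboom/events"

pairs = []
for name in sorted(os.listdir(image_dir)):
    stem, _ = os.path.splitext(name)
    event_path = os.path.join(event_dir, stem + ".txt")
    if os.path.exists(event_path):   # keep only images with recorded events
        pairs.append((os.path.join(image_dir, name), event_path))

print(len(pairs), "image/event pairs found")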
4 0.11805701 209 hunch net-2006-09-19-Luis von Ahn is awarded a MacArthur fellowship.
Introduction: For his work on the subject of human computation including ESPGame, Peekaboom, and Phetch. The new MacArthur fellows.
5 0.10751364 277 hunch net-2007-12-12-Workshop Summary—Principles of Learning Problem Design
Introduction: This is a summary of the workshop on Learning Problem Design which Alina and I ran at NIPS this year. The first question many people have is “What is learning problem design?” This workshop is about admitting that solving learning problems does not start with labeled data, but rather somewhere before. When humans are hired to produce labels, this is usually not a serious problem because you can tell them precisely what semantics you want the labels to have and can fix some set of features in advance. However, when other methods are used, this becomes more problematic. This focus is important for Machine Learning because there are very large quantities of data which are not labeled by a hired human. The title of the workshop was a bit ambitious, because a workshop is not long enough to synthesize a diversity of approaches into a coherent set of principles. For me, the posters at the end of the workshop were quite helpful in getting approaches to gel. Here are some an
6 0.10539781 300 hunch net-2008-04-30-Concerns about the Large Scale Learning Challenge
7 0.10373066 349 hunch net-2009-04-21-Interesting Presentations at Snowbird
8 0.10323001 102 hunch net-2005-08-11-Why Manifold-Based Dimension Reduction Techniques?
9 0.10043955 393 hunch net-2010-04-14-MLcomp: a website for objectively comparing ML algorithms
10 0.091107845 223 hunch net-2006-12-06-The Spam Problem
11 0.085198298 454 hunch net-2012-01-30-ICML Posters and Scope
12 0.080997214 143 hunch net-2005-12-27-Automated Labeling
13 0.075133994 45 hunch net-2005-03-22-Active learning
14 0.074584983 138 hunch net-2005-12-09-Some NIPS papers
15 0.073563717 426 hunch net-2011-03-19-The Ideal Large Scale Learning Class
16 0.06970343 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms
17 0.06865447 477 hunch net-2013-01-01-Deep Learning 2012
18 0.068465352 446 hunch net-2011-10-03-Monday announcements
19 0.068445772 70 hunch net-2005-05-12-Math on the Web
20 0.066574261 264 hunch net-2007-09-30-NIPS workshops are out.
topicId topicWeight
[(0, 0.137), (1, 0.024), (2, -0.027), (3, 0.026), (4, 0.022), (5, 0.001), (6, -0.034), (7, 0.017), (8, 0.007), (9, -0.017), (10, -0.134), (11, 0.013), (12, -0.041), (13, 0.031), (14, -0.154), (15, -0.073), (16, -0.033), (17, -0.036), (18, -0.099), (19, 0.166), (20, 0.067), (21, -0.023), (22, 0.081), (23, -0.058), (24, 0.226), (25, 0.08), (26, -0.115), (27, 0.009), (28, -0.073), (29, -0.147), (30, 0.059), (31, 0.033), (32, -0.024), (33, 0.096), (34, -0.055), (35, 0.019), (36, -0.08), (37, -0.037), (38, 0.015), (39, -0.041), (40, -0.015), (41, 0.016), (42, -0.044), (43, 0.081), (44, -0.051), (45, -0.013), (46, -0.037), (47, -0.024), (48, -0.043), (49, 0.031)]
simIndex simValue blogId blogTitle
same-blog 1 0.96879852 20 hunch net-2005-02-15-ESPgame and image labeling
2 0.91444534 99 hunch net-2005-08-01-Peekaboom
3 0.86316544 159 hunch net-2006-02-27-The Peekaboom Dataset
4 0.71049482 209 hunch net-2006-09-19-Luis von Ahn is awarded a MacArthur fellowship.
5 0.47475985 223 hunch net-2006-12-06-The Spam Problem
Introduction: The New York Times has an article on the growth of spam. Interesting facts include: 9/10 of all email is spam, spam source identification is nearly useless due to botnet spam senders, and image-based spam (emails which consist of an image only) is growing. Estimates of the cost of spam are almost certainly far too low, because they do not account for the cost in time lost by people. The image-based spam currently penetrating many filters should be catchable with a more sophisticated application of machine learning technology. For the spam I see, the rendered images come in only a few formats, which would be easy to recognize via a support vector machine (with RBF kernel), neural network, or even nearest-neighbor architecture. The mechanics of setting this up to run efficiently are the only real challenge. This is the next step in the spam war. The response to this system is to make the image-based spam even more random. We should (essentially) expect to see
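A minimal sketch of the kind of classifier the post suggests (assuming rendered spam and non-spam images have already been converted to fixed-length feature vectors; scikit-learn and the random placeholder data are used here purely for illustration):

import numpy as np
from sklearn.svm import SVC

# X: one row per rendered email image, flattened to a fixed-length feature
# vector (e.g. a downscaled grayscale bitmap); y: 1 for spam, 0 otherwise.
# Random placeholder data stands in for real features below.
rng = np.random.default_rng(0)
X = rng.random((200, 32 * 32))
y = rng.integers(0, 2, size=200)

clf = SVC(kernel="rbf", gamma="scale", C=1.0)  # RBF-kernel SVM, as suggested
clf.fit(X, y)
print(clf.predict(X[:5]))  # predicted spam/non-spam for the first few images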
6 0.44085628 349 hunch net-2009-04-21-Interesting Presentations at Snowbird
7 0.39828521 300 hunch net-2008-04-30-Concerns about the Large Scale Learning Challenge
8 0.38477364 102 hunch net-2005-08-11-Why Manifold-Based Dimension Reduction Techniques?
9 0.37740812 143 hunch net-2005-12-27-Automated Labeling
10 0.36662728 218 hunch net-2006-11-20-Context and the calculation misperception
11 0.35790482 70 hunch net-2005-05-12-Math on the Web
12 0.34782562 277 hunch net-2007-12-12-Workshop Summary—Principles of Learning Problem Design
13 0.34518516 152 hunch net-2006-01-30-Should the Input Representation be a Vector?
14 0.34001064 393 hunch net-2010-04-14-MLcomp: a website for objectively comparing ML algorithms
15 0.33685619 444 hunch net-2011-09-07-KDD and MUCMD 2011
16 0.334993 161 hunch net-2006-03-05-“Structural” Learning
17 0.316641 471 hunch net-2012-08-24-Patterns for research in machine learning
18 0.31558678 81 hunch net-2005-06-13-Wikis for Summer Schools and Workshops
19 0.30987012 281 hunch net-2007-12-21-Vowpal Wabbit Code Release
20 0.30699468 446 hunch net-2011-10-03-Monday announcements
topicId topicWeight
[(27, 0.124), (53, 0.021), (55, 0.741)]
simIndex simValue blogId blogTitle
1 0.9969278 472 hunch net-2012-08-27-NYAS ML 2012 and ICML 2013
Introduction: The New York Machine Learning Symposium is October 19, with a 2-page abstract deadline of September 13; submit via email with subject “Machine Learning Poster Submission” to physicalscience@nyas.org. Everyone is welcome to submit. Last year’s attendance was 246 and I expect more this year. The primary experiment for ICML 2013 is multiple paper submission deadlines with rolling review cycles. The key dates are October 1, December 15, and February 15. This is an attempt to shift ICML further towards a journal-style review process and reduce peak load. The “not for proceedings” experiment from this year’s ICML is not continuing. Edit: Fixed second ICML deadline.
2 0.99306667 271 hunch net-2007-11-05-CMU wins DARPA Urban Challenge
Introduction: The results have been posted, with CMU first, Stanford second, and Virginia Tech third. Considering that this was an open event (at least for people in the US), this was a very strong showing for research at universities (instead of defense contractors, for example). Some details should become public at the NIPS workshops. Slashdot has a post with many comments.
3 0.99196714 446 hunch net-2011-10-03-Monday announcements
Introduction: Various people want to use hunch.net to announce things. I’ve generally resisted this because I feared hunch becoming a pure announcement zone, while I am personally much more interested in contentful posts and discussion. Nevertheless there is clearly some value and announcements are easy, so I’m planning to summarize announcements on Mondays. D. Sculley points out an interesting Semisupervised feature learning competition, with a deadline of October 17. Lihong Li points out the webscope user interaction dataset, which is the first high-quality exploration dataset I’m aware of that is publicly available. Seth Rogers points out CrossValidated, which looks similar in conception to metaoptimize, but directly using the stackoverflow interface and with a bit more of a statistics twist.
4 0.98591411 326 hunch net-2008-11-11-COLT CFP
Introduction: Adam Klivans points out the COLT call for papers. The important points are: Due Feb 13. Montreal, June 18-21. This year, there is author feedback.
5 0.98591411 465 hunch net-2012-05-12-ICML accepted papers and early registration
Introduction: The accepted papers are up in full detail. We are still struggling with the precise program itself, but that’s coming along. Also note the May 13 deadline for early registration and room booking.
same-blog 6 0.98509204 20 hunch net-2005-02-15-ESPgame and image labeling
7 0.98483789 302 hunch net-2008-05-25-Inappropriate Mathematics for Machine Learning
8 0.9803074 448 hunch net-2011-10-24-2011 ML symposium and the bears
9 0.95399445 90 hunch net-2005-07-07-The Limits of Learning Theory
10 0.93762308 331 hunch net-2008-12-12-Summer Conferences
11 0.90141028 270 hunch net-2007-11-02-The Machine Learning Award goes to …
12 0.90012944 387 hunch net-2010-01-19-Deadline Season, 2010
13 0.88060594 395 hunch net-2010-04-26-Compassionate Reviewing
14 0.86505806 453 hunch net-2012-01-28-Why COLT?
15 0.83608317 65 hunch net-2005-05-02-Reviewing techniques for conferences
16 0.8279379 356 hunch net-2009-05-24-2009 ICML discussion site
17 0.81309187 457 hunch net-2012-02-29-Key Scientific Challenges and the Franklin Symposium
18 0.80155271 216 hunch net-2006-11-02-2006 NIPS workshops
19 0.79066855 46 hunch net-2005-03-24-The Role of Workshops
20 0.78448838 71 hunch net-2005-05-14-NIPS