hunch_net hunch_net-2005 hunch_net-2005-99 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Luis has released Peekaboom, a successor to ESPgame (game site). The purpose of the game is similar: using the actions of people playing a game to gather data helpful in solving AI. Peekaboom gathers more detailed, and perhaps more useful, data about vision. For ESPgame, the byproduct of the game was mutually agreed upon labels for common images. For Peekaboom, the location of the subimage generating the label is revealed by the game as well. Given knowledge about what portion of the image is related to a label, it may be more feasible to learn to recognize the appropriate parts. There isn’t a dataset yet available for this game as there is for ESPgame, but hopefully a significant number of people will play and we’ll have one to work with soon.
sentIndex sentText sentNum sentScore
1 Luis has released Peekaboom, a successor to ESPgame (game site). [sent-1, score-0.747]
2 The purpose of the game is similar: using the actions of people playing a game to gather data helpful in solving AI. [sent-2, score-1.779]
3 Peekaboom gathers more detailed, and perhaps more useful, data about vision. [sent-3, score-0.122]
4 For ESPgame, the byproduct of the game was mutually agreed upon labels for common images. [sent-4, score-1.175]
5 For Peekaboom, the location of the subimage generating the label is revealed by the game as well. [sent-5, score-1.035]
6 Given knowledge about what portion of the image is related to a label, it may be more feasible to learn to recognize the appropriate parts. [sent-6, score-0.874]
7 There isn’t a dataset yet available for this game as there is for ESPgame, but hopefully a significant number of people will play and we’ll have one to work with soon. [sent-7, score-1.097]
wordName wordTfidf (topN-words)
[('game', 0.558), ('peekaboom', 0.415), ('espgame', 0.391), ('mutually', 0.149), ('label', 0.147), ('byproduct', 0.138), ('revealed', 0.138), ('gather', 0.13), ('luis', 0.124), ('agreed', 0.124), ('portion', 0.115), ('recognize', 0.112), ('feasible', 0.109), ('playing', 0.109), ('detailed', 0.106), ('released', 0.101), ('generating', 0.101), ('play', 0.101), ('image', 0.099), ('soon', 0.099), ('actions', 0.097), ('location', 0.091), ('purpose', 0.088), ('site', 0.088), ('hopefully', 0.086), ('appropriate', 0.083), ('labels', 0.083), ('upon', 0.079), ('data', 0.076), ('dataset', 0.076), ('knowledge', 0.073), ('available', 0.065), ('helpful', 0.062), ('ll', 0.059), ('related', 0.057), ('similar', 0.056), ('solving', 0.055), ('isn', 0.053), ('significant', 0.049), ('learn', 0.049), ('useful', 0.048), ('yet', 0.048), ('perhaps', 0.046), ('people', 0.046), ('common', 0.044), ('given', 0.041), ('using', 0.037), ('number', 0.036), ('work', 0.032), ('may', 0.03)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 99 hunch net-2005-08-01-Peekaboom
2 0.40266708 209 hunch net-2006-09-19-Luis von Ahn is awarded a MacArthur fellowship.
Introduction: For his work on the subject of human computation including ESPGame, Peekaboom, and Phetch. The new MacArthur fellows.
3 0.33896834 159 hunch net-2006-02-27-The Peekaboom Dataset
Introduction: Luis von Ahn’s Peekaboom project has yielded data (830MB). Peekaboom is the second attempt (after Espgame) to produce a dataset which is useful for learning to solve vision problems based on voluntary game play. As a second attempt, it is meant to address all of the shortcomings of the first attempt. In particular: the locations of specific objects are provided by the data, and the data collection is far more complete and extensive. The data consists of: the source images (1 file per image, just short of 60K images); the in-game events (1 file per image, in a lispy syntax); and a description of the event language. There is a great deal of very specific and relevant data here, so the hope that this will help solve vision problems seems quite reasonable.
4 0.22093433 20 hunch net-2005-02-15-ESPgame and image labeling
Introduction: Luis von Ahn has been running the espgame for a while now. The espgame provides a picture to two randomly paired people across the web, and asks them to agree on a label. It hasn’t managed to label the web yet, but it has produced a large dataset of (image, label) pairs. I organized the dataset so you could explore the implied bipartite graph (requires much bandwidth). Relative to other image datasets, this one is quite large: 67,000 images, 358,000 labels (average of 5/image with variation from 1 to 19), and 22,000 unique labels (one every 3 images). The dataset is also very ‘natural’, consisting of images spidered from the internet. The multiple label characteristic is intriguing because ‘learning to learn’ and metalearning techniques may be applicable. The ‘natural’ quality means that this dataset varies greatly in difficulty from easy (predicting “red”) to hard (predicting “funny”) and is potentially more rewarding to tackle. The open problem here is, of course, to make
5 0.090189219 456 hunch net-2012-02-24-ICML+50%
Introduction: The ICML paper deadline has passed. Joelle and I were surprised to see the number of submissions jump from last year by about 50% to around 900 submissions. A tiny portion of these are immediate rejects(*), so this is a much larger set of papers than expected. The number of workshop submissions also doubled compared to last year, so ICML may grow significantly this year, if we can manage to handle the load well. The prospect of making 900 good decisions is fundamentally daunting, and success will rely heavily on the program committee and area chairs at this point. For those who want to rubberneck a bit more, here’s a breakdown of submissions by primary topic of submitted papers: 66 Reinforcement Learning 52 Supervised Learning 51 Clustering 46 Kernel Methods 40 Optimization Algorithms 39 Feature Selection and Dimensionality Reduction 33 Learning Theory 33 Graphical Models 33 Applications 29 Probabilistic Models 29 NN & Deep Learning 26 Transfer and Multi-Ta
6 0.07844907 193 hunch net-2006-07-09-The Stock Prediction Machine Learning Problem
7 0.074372523 277 hunch net-2007-12-12-Workshop Summary—Principles of Learning Problem Design
8 0.054156166 317 hunch net-2008-09-12-How do we get weak action dependence for learning with partial observations?
9 0.051465306 48 hunch net-2005-03-29-Academic Mechanism Design
10 0.050528891 345 hunch net-2009-03-08-Prediction Science
11 0.049947504 183 hunch net-2006-06-14-Explorations of Exploration
12 0.047624797 218 hunch net-2006-11-20-Context and the calculation misperception
13 0.045925304 207 hunch net-2006-09-12-Incentive Compatible Reviewing
14 0.045478702 45 hunch net-2005-03-22-Active learning
15 0.045303807 200 hunch net-2006-08-03-AOL’s data drop
16 0.044400591 220 hunch net-2006-11-27-Continuizing Solutions
17 0.044136669 306 hunch net-2008-07-02-Proprietary Data in Academic Research?
18 0.043419283 315 hunch net-2008-09-03-Bidding Problems
19 0.042837854 426 hunch net-2011-03-19-The Ideal Large Scale Learning Class
20 0.041014139 353 hunch net-2009-05-08-Computability in Artificial Intelligence
topicId topicWeight
[(0, 0.091), (1, 0.017), (2, -0.03), (3, 0.024), (4, 0.035), (5, -0.005), (6, -0.027), (7, 0.039), (8, 0.022), (9, -0.063), (10, -0.107), (11, -0.027), (12, -0.065), (13, 0.006), (14, -0.206), (15, -0.143), (16, -0.034), (17, -0.104), (18, -0.063), (19, 0.157), (20, 0.143), (21, -0.096), (22, 0.058), (23, -0.045), (24, 0.27), (25, 0.149), (26, -0.195), (27, 0.023), (28, 0.021), (29, -0.189), (30, 0.063), (31, 0.018), (32, -0.121), (33, 0.129), (34, -0.069), (35, 0.062), (36, -0.113), (37, -0.047), (38, 0.042), (39, -0.051), (40, -0.056), (41, 0.064), (42, -0.095), (43, 0.065), (44, -0.129), (45, -0.01), (46, -0.042), (47, -0.042), (48, 0.052), (49, 0.102)]
simIndex simValue blogId blogTitle
same-blog 1 0.98712277 99 hunch net-2005-08-01-Peekaboom
2 0.91315466 159 hunch net-2006-02-27-The Peekaboom Dataset
3 0.86243057 209 hunch net-2006-09-19-Luis von Ahn is awarded a MacArthur fellowship.
4 0.81400859 20 hunch net-2005-02-15-ESPgame and image labeling
5 0.36048588 223 hunch net-2006-12-06-The Spam Problem
Introduction: The New York Times has an article on the growth of spam. Interesting facts include: 9/10 of all email is spam, spam source identification is nearly useless due to botnet spam senders, and image based spam (emails which consist of an image only) is growing. Estimates of the cost of spam are almost certainly far too low, because they do not account for the cost in time lost by people. The image based spam which is currently penetrating many filters should be catchable with a more sophisticated application of machine learning technology. For the spam I see, the rendered images come in only a few formats, which would be easy to recognize via a support vector machine (with RBF kernel), neural network, or even nearest-neighbor architecture. The mechanics of setting this up to run efficiently are the only real challenge. This is the next step in the spam war. The response to this system is to make the image based spam even more random. We should (essentially) expect to see
6 0.33694449 408 hunch net-2010-08-24-Alex Smola starts a blog
7 0.30116546 3 hunch net-2005-01-24-The Humanloop Spectrum of Machine Learning
8 0.27004749 349 hunch net-2009-04-21-Interesting Presentations at Snowbird
9 0.26092187 287 hunch net-2008-01-28-Sufficient Computation
10 0.25373641 444 hunch net-2011-09-07-KDD and MUCMD 2011
11 0.25032291 277 hunch net-2007-12-12-Workshop Summary—Principles of Learning Problem Design
12 0.24680465 155 hunch net-2006-02-07-Pittsburgh Mind Reading Competition
13 0.24378292 218 hunch net-2006-11-20-Context and the calculation misperception
14 0.2387816 260 hunch net-2007-08-25-The Privacy Problem
15 0.23406124 143 hunch net-2005-12-27-Automated Labeling
16 0.22829974 136 hunch net-2005-12-07-Is the Google way the way for machine learning?
17 0.22752458 102 hunch net-2005-08-11-Why Manifold-Based Dimension Reduction Techniques?
18 0.22279601 455 hunch net-2012-02-20-Berkeley Streaming Data Workshop
19 0.2226343 300 hunch net-2008-04-30-Concerns about the Large Scale Learning Challenge
20 0.22237085 487 hunch net-2013-07-24-ICML 2012 videos lost
topicId topicWeight
[(27, 0.145), (53, 0.095), (55, 0.134), (63, 0.4), (77, 0.051), (95, 0.016)]
simIndex simValue blogId blogTitle
same-blog 1 0.92987812 99 hunch net-2005-08-01-Peekaboom
2 0.75309515 150 hunch net-2006-01-23-On Coding via Mutual Information & Bayes Nets
Introduction: Say we have two random variables X,Y with mutual information I(X,Y). Let’s say we want to represent them with a bayes net of the form X <- M -> Y, such that the entropy of M equals the mutual information, i.e. H(M) = I(X,Y). Intuitively, we would like our hidden state to be as simple as possible (entropy wise). The data processing inequality means that H(M) >= I(X,Y), so the mutual information is a lower bound on how simple the M could be. Furthermore, if such a construction existed it would have a nice coding interpretation: one could jointly code X and Y by first coding the mutual information, then coding X with this mutual info (without Y) and coding Y with this mutual info (without X). It turns out that such a construction does not exist in general (thanks to Alina Beygelzimer for a counterexample; see below for the sketch). What are the implications of this? Well, it’s hard for me to say, but it does suggest to me that the ‘generative’ model philosophy might be
3 0.73051667 102 hunch net-2005-08-11-Why Manifold-Based Dimension Reduction Techniques?
Introduction: Manifold based dimension-reduction algorithms share the following general outline. Given: a metric d() and a set of points S. Construct a graph with a point in every node and every edge connecting to the node of one of the k-nearest neighbors. Associate with the edge a weight which is the distance between the points in the connected nodes. Digest the graph. This might include computing the shortest path between all points or figuring out how to linearly interpolate the point from its neighbors. Find a set of points in a low dimensional space which preserve the digested properties. Examples include LLE, Isomap (which I worked on), Hessian-LLE, SDE, and many others. The hope with these algorithms is that they can recover the low dimensional structure of point sets in high dimensional spaces. Many of them can be shown to work in interesting ways producing various compelling pictures. Despite doing some early work in this direction, I suffer from a motivational
4 0.66929942 260 hunch net-2007-08-25-The Privacy Problem
Introduction: Machine Learning is rising in importance because data is being collected for all sorts of tasks where it either wasn’t previously collected, or for tasks that did not previously exist. While this is great for Machine Learning, it has a downside—the massive data collection which is so useful can also lead to substantial privacy problems. It’s important to understand that this is a much harder problem than many people appreciate. The AOL data release is a good example. To those doing machine learning, the following strategies might be obvious: Just delete any names or other obviously personally identifiable information. The logic here seems to be “if I can’t easily find the person then no one can”. That doesn’t work as demonstrated by the people who were found circumstantially from the AOL data. … then just hash all the search terms! The logic here is “if I can’t read it, then no one can”. It’s also trivially broken by a dictionary attack—just hash all the strings
5 0.45492527 452 hunch net-2012-01-04-Why ICML? and the summer conferences
Introduction: Here’s a quick reference for summer ML-related conferences sorted by due date: Conference Due date Location Reviewing KDD Feb 10 August 12-16, Beijing, China Single Blind COLT Feb 14 June 25-June 27, Edinburgh, Scotland Single Blind? (historically) ICML Feb 24 June 26-July 1, Edinburgh, Scotland Double Blind, author response, zero SPOF UAI March 30 August 15-17, Catalina Islands, California Double Blind, author response Geographically, this is greatly dispersed and the UAI/KDD conflict is unfortunate. Machine Learning conferences are triannual now, between NIPS , AIStat , and ICML . This has not always been the case: the academic default is annual summer conferences, then NIPS started with a December conference, and now AIStat has grown into an April conference. However, the first claim is not quite correct. NIPS and AIStat have few competing venues while ICML implicitly competes with many other conf
6 0.45421785 116 hunch net-2005-09-30-Research in conferences
7 0.44562209 151 hunch net-2006-01-25-1 year
8 0.44439119 458 hunch net-2012-03-06-COLT-ICML Open Questions and ICML Instructions
9 0.44251716 437 hunch net-2011-07-10-ICML 2011 and the future
10 0.44166026 395 hunch net-2010-04-26-Compassionate Reviewing
11 0.44040033 225 hunch net-2007-01-02-Retrospective
12 0.4403272 270 hunch net-2007-11-02-The Machine Learning Award goes to …
13 0.4367294 484 hunch net-2013-06-16-Representative Reviewing
14 0.43611488 466 hunch net-2012-06-05-ICML acceptance statistics
15 0.43512946 89 hunch net-2005-07-04-The Health of COLT
16 0.43472424 375 hunch net-2009-10-26-NIPS workshops
17 0.43217781 40 hunch net-2005-03-13-Avoiding Bad Reviewing
18 0.43213817 454 hunch net-2012-01-30-ICML Posters and Scope
19 0.4267385 403 hunch net-2010-07-18-ICML & COLT 2010
20 0.4258385 77 hunch net-2005-05-29-Maximum Margin Mismatch?