hunch_net-2009-367: Centmail comments (knowledge-graph by maker-knowledge-mining)
Source: html
Introduction: Centmail is a scheme which makes charity donations have a secondary value, as a stamp for email. When discussed on newscientist, slashdot, and others, some of the comments make the academic review process appear thoughtful. Some prominent fallacies are:

Costing money fallacy. Some commenters appear to believe the system charges money per email. Instead, the basic idea is that users get an extra benefit from donations to a charity, and participation is strictly voluntary. The solution to this fallacy is simply reading the details.
Single solution fallacy. Some commenters seem to think this is proposed as a complete solution to spam, and since not everyone will opt to participate, it won’t work. But a complete solution is not at all necessary or even possible given the flag-day problem. Deployed machine learning systems for fighting spam are great at taking advantage of a partial solution. The solution to this fallacy is learning about machine learning. In the current state of affairs, informed comment about spam fighting without knowing machine learning is difficult to imagine.
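To make the partial-solution point concrete, here is a minimal sketch of how a learned spam filter can treat a voluntary stamp as just one feature among many. The feature names, weights, and the has_valid_stamp flag are invented for illustration; this is not Centmail’s implementation or any deployed filter. The point is that mail from non-participants is still scored on its other features, so even partial adoption shifts the filter in the right direction.

```python
import math

# Hypothetical feature extractor: the "has_valid_stamp" feature is just one
# signal among many, so mail from non-participants is still scored on the rest.
def features(msg):
    text = msg["body"].lower()
    return {
        "has_valid_stamp": 1.0 if msg.get("valid_stamp") else 0.0,
        "mentions_viagra": 1.0 if "viagra" in text else 0.0,
        "many_links": 1.0 if text.count("http") > 3 else 0.0,
        "known_sender": 1.0 if msg.get("sender_in_addressbook") else 0.0,
    }

# Illustrative weights; a deployed filter would learn these from labeled mail.
WEIGHTS = {
    "has_valid_stamp": -3.0,   # a valid stamp is strong evidence of ham
    "mentions_viagra": 2.5,
    "many_links": 1.5,
    "known_sender": -2.0,
}
BIAS = 0.0

def spam_probability(msg):
    score = BIAS + sum(WEIGHTS[k] * v for k, v in features(msg).items())
    return 1.0 / (1.0 + math.exp(-score))   # logistic squashing

if __name__ == "__main__":
    stamped = {"body": "lunch on friday?", "valid_stamp": True}
    unstamped_spam = {"body": "cheap viagra http://a http://b http://c http://d"}
    print(spam_probability(stamped))         # low: the stamp pushes toward ham
    print(spam_probability(unstamped_spam))  # high: other features still fire
```

With a learned weight on the stamp feature, the filter benefits from whatever fraction of senders participate, which is exactly why a complete solution is unnecessary.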
Some commenters seem to think that stamps can be reused arbitrarily on emails. The solution to this fallacy is simply checking the details and possibly learning about cryptographic hashes.
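As a rough illustration of why arbitrary reuse is not a real concern, the sketch below binds a stamp to a specific sender, recipient, and message body using a keyed hash. This is an assumed toy protocol (the issue_stamp/verify_stamp functions and the issuer secret are made up), not Centmail’s actual design. The idea is simply that a stamp replayed on different mail fails verification, and a stamp the checker has already seen is rejected.

```python
import hashlib
import hmac

ISSUER_SECRET = b"demo-secret"  # hypothetical issuer key, for illustration only

def issue_stamp(sender, recipient, body):
    """Issuer binds the stamp to this exact message with an HMAC."""
    msg = f"{sender}|{recipient}|{hashlib.sha256(body.encode()).hexdigest()}"
    return hmac.new(ISSUER_SECRET, msg.encode(), hashlib.sha256).hexdigest()

def verify_stamp(stamp, sender, recipient, body, seen):
    """Check that the stamp matches this message and has not been used before."""
    expected = issue_stamp(sender, recipient, body)
    ok = hmac.compare_digest(stamp, expected) and stamp not in seen
    if ok:
        seen.add(stamp)
    return ok

if __name__ == "__main__":
    seen = set()
    s = issue_stamp("alice@example.com", "bob@example.com", "hi bob")
    print(verify_stamp(s, "alice@example.com", "bob@example.com", "hi bob", seen))      # True
    print(verify_stamp(s, "alice@example.com", "carol@example.com", "buy now", seen))   # False: different mail
    print(verify_stamp(s, "alice@example.com", "bob@example.com", "hi bob", seen))      # False: already used
```

In practice the issuer, not the recipient, would track which stamps have been redeemed, but the idea of binding a stamp to a specific message is the same.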
Dan Reeves made a very detailed FAQ trying to address all the failure modes seen in comments, and there is a bit more discussion at messy matters. My personal opinion is that Centmail is an interesting idea that might work, avoids the failure modes of many other ideas, hasn’t failed yet, and hence is worth trying. It’s a better approach than my earlier thoughts on economic solutions to spam.
simIndex simValue blogId blogTitle
same-blog 1 0.99999982 367 hunch net-2009-08-16-Centmail comments
2 0.27777895 223 hunch net-2006-12-06-The Spam Problem
Introduction: The New York Times has an article on the growth of spam. Interesting facts include: 9/10 of all email is spam, spam source identification is nearly useless due to botnet spam senders, and image based spam (emails which consist of an image only) is on the rise. Estimates of the cost of spam are almost certainly far too low, because they do not account for the cost in time lost by people. The image based spam which is currently penetrating many filters should be catchable with a more sophisticated application of machine learning technology. For the spam I see, the rendered images come in only a few formats, which would be easy to recognize via a support vector machine (with RBF kernel), neural network, or even nearest-neighbor architecture. The mechanics of setting this up to run efficiently is the only real challenge. This is the next step in the spam war. The response to this system is to make the image based spam even more random. We should (essentially) expect to see
3 0.15044774 401 hunch net-2010-06-20-2010 ICML discussion site
Introduction: A substantial difficulty with the 2009 and 2008 ICML discussion system was a communication vacuum, where authors were not informed of comments, and commenters were not informed of responses to their comments without explicit monitoring. Mark Reid has setup a new discussion system for 2010 with the goal of addressing this. Mark didn’t want to make it too intrusive, so you must opt-in. As an author, find your paper and “Subscribe by email” to the comments. As a commenter, you have the option of providing an email for follow-up notification.
4 0.13099357 297 hunch net-2008-04-22-Taking the next step
Introduction: At the last ICML, Tom Dietterich asked me to look into systems for commenting on papers. I’ve been slow getting to this, but it’s relevant now. The essential observation is that we now have many tools for online collaboration, but they are not yet much used in academic research. If we can find the right way to use them, then perhaps great things might happen, with extra kudos to the first conference that manages to really create an online community. Various conferences have been poking at this. For example, UAI has setup a wiki, COLT has started using Joomla, with some dynamic content, and AAAI has been setting up a “student blog”. Similarly, Dinoj Surendran setup a twiki for the Chicago Machine Learning Summer School, which was quite useful for coordinating events and other things. I believe the most important thing is a willingness to experiment. A good place to start seems to be enhancing existing conference websites. For example, the ICML 2007 papers pag
5 0.12849604 429 hunch net-2011-04-06-COLT open questions
Introduction: Alina and Jake point out the COLT Call for Open Questions due May 11. In general, this is cool, and worth doing if you can come up with a crisp question. In my case, I particularly enjoyed crafting an open question with precisely a form such that a critic targeting my papers would be forced to confront their fallacy or make a case for the reward. But less esoterically, this is a way to get the attention of some very smart people focused on a problem that really matters, which is the real value.
6 0.11230032 25 hunch net-2005-02-20-At One Month
7 0.098745972 132 hunch net-2005-11-26-The Design of an Optimal Research Environment
8 0.079770789 370 hunch net-2009-09-18-Necessary and Sufficient Research
9 0.072594248 256 hunch net-2007-07-20-Motivation should be the Responsibility of the Reviewer
10 0.070876457 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms
11 0.070191406 484 hunch net-2013-06-16-Representative Reviewing
12 0.066933125 271 hunch net-2007-11-05-CMU wins DARPA Urban Challenge
13 0.064919755 116 hunch net-2005-09-30-Research in conferences
14 0.063788667 380 hunch net-2009-11-29-AI Safety
15 0.063651145 344 hunch net-2009-02-22-Effective Research Funding
16 0.06213278 288 hunch net-2008-02-10-Complexity Illness
17 0.061435204 351 hunch net-2009-05-02-Wielding a New Abstraction
18 0.060912095 98 hunch net-2005-07-27-Not goal metrics
19 0.057841346 29 hunch net-2005-02-25-Solution: Reinforcement Learning with Classification
20 0.05746635 358 hunch net-2009-06-01-Multitask Poisoning