hunch_net hunch_net-2005 hunch_net-2005-25 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: This is near the one month point, so it seems appropriate to consider meta-issues for the moment. The number of posts is a bit over 20. The number of people speaking up in discussions is about 10. The number of people viewing the site is somewhat more than 100. I am (naturally) dissatisfied with many things. Many of the potential uses haven’t been realized. This is partly a matter of opportunity (no conferences in the last month), partly a matter of will (no open problems because it’s hard to give them up), and partly a matter of tradition. In academia, there is a strong tradition of trying to get everything perfectly right before presentation. This is somewhat contradictory to the nature of making many posts, and it’s definitely contradictory to the idea of doing “public research”. If that sort of idea is to pay off, it must be significantly more successful than previous methods. In an effort to continue experimenting, I’m going to use the next week as “open problems we
sentIndex sentText sentNum sentScore
1 This is near the one month point, so it seems appropriate to consider meta-issues for the moment. [sent-1, score-0.366]
2 This is partly a matter of opportunity (no conferences in the last month), partly a matter of will (no open problems because it’s hard to give them up), and partly a matter of tradition. [sent-7, score-0.94]
3 This is somewhat contradictory to the nature of making many posts, and it’s definitely contradictory to the idea of doing “public research”. [sent-9, score-0.549]
4 In an effort to continue experimenting, I’m going to use the next week as “open problems week”. [sent-11, score-0.213]
5 WordPress allows you to block specific posts by match, but there seems to be some minor bug (or maybe a misuse) in how it matches. [sent-13, score-0.604]
6 This resulted in everything being blocked pending approval, which is highly unnatural for any conversation. [sent-14, score-0.633]
7 I approved all posts by real people, and I think the ‘everything blocked pending approval’ problem has been solved. [sent-15, score-0.882]
8 A site discussing learning ought to have a better system for coping with what is spam and what is not. [sent-16, score-0.425]
9 (It’s not clear this is research instead of just engineering, but it is clear that it would be very valuable here and in many other places.) [sent-18, score-0.32]
10 Threading would be helpful in comments because it would help localize discussion to particular contexts. [sent-21, score-0.445]
11 Tagging of posts with categories seems inadequate because it’s hard to anticipate all the ways something might be thought about. [sent-22, score-0.83]
12 Ideally, the sequence of posts would create a well-organized virtual site. [sent-24, score-0.643]
13 In many cases there are very good comments and it seems altering the post to summarize the comments is appropriate, but doing so leaves the comments out of context. [sent-25, score-0.903]
14 Some mechanism of refinement which avoids this problem would be great. [sent-26, score-0.265]
15 Many comments develop into something that should (essentially) be their own post on a new topic. [sent-27, score-0.207]
16 Doing so is currently cumbersome, and a mechanism for making that shift would be helpful. [sent-28, score-0.272]
17 Making a stream of good posts is hard and takes a while. [sent-30, score-0.671]
18 Naturally, some were (and even still are) stored up, but that store is finite, and eventually will be exhausted. [sent-31, score-0.251]
19 Since I’m unwilling to compromise quality, this means the rate of posts may eventually fall. [sent-32, score-0.78]
20 Several of the discussions have been quite interesting, and I often find that the process of writing posts helps clarify my understanding. [sent-39, score-0.701]
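Sentences 5 through 7 describe match-based blocking that ended up holding every comment for approval. The post does not say what the actual bug was, so the following is only a minimal sketch of one way such a rule can misfire: a substring matcher with a stray empty pattern matches everything. The function `held_for_approval` and both pattern lists are hypothetical and are not WordPress code.

```python
def held_for_approval(comment, patterns):
    """Return True if any pattern occurs as a substring of the comment.

    Hypothetical helper for illustration only; matching is case-insensitive.
    """
    text = comment.lower()
    return any(p.lower() in text for p in patterns)


# Hypothetical block lists. The second one contains a stray empty entry,
# and the empty string is a substring of every string.
strict_patterns = ["casino", "poker"]
buggy_patterns = ["casino", "poker", ""]

comments = [
    "Nice post on learning theory.",
    "Cheap poker chips here!",
]

flags_strict = [held_for_approval(c, strict_patterns) for c in comments]
flags_buggy = [held_for_approval(c, buggy_patterns) for c in comments]
```

With the strict list only the second comment is held; with the stray empty entry both are held, which is the "everything blocked pending approval" symptom. Skipping empty patterns before matching would be one possible guard.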
wordName wordTfidf (topN-words)
[('posts', 0.524), ('comments', 0.207), ('approval', 0.179), ('blocked', 0.179), ('pending', 0.179), ('spam', 0.178), ('contradictory', 0.159), ('partly', 0.139), ('everything', 0.127), ('matter', 0.126), ('would', 0.119), ('month', 0.115), ('week', 0.112), ('eventually', 0.103), ('discussions', 0.103), ('continue', 0.101), ('site', 0.094), ('appropriate', 0.088), ('making', 0.086), ('consider', 0.083), ('seems', 0.08), ('archives', 0.079), ('misuse', 0.079), ('compromise', 0.079), ('cumbersome', 0.079), ('dissatisfied', 0.079), ('ought', 0.079), ('refinement', 0.079), ('threading', 0.079), ('naturally', 0.079), ('hard', 0.078), ('somewhat', 0.078), ('anticipate', 0.074), ('clarify', 0.074), ('coping', 0.074), ('inadequate', 0.074), ('resulted', 0.074), ('store', 0.074), ('stored', 0.074), ('unnatural', 0.074), ('unwilling', 0.074), ('altering', 0.069), ('stream', 0.069), ('many', 0.067), ('mechanism', 0.067), ('open', 0.067), ('clear', 0.067), ('commitment', 0.066), ('jl', 0.066), ('summarize', 0.066)]
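The page does not state how the word weights above were computed. As a rough sketch only, assuming a plain term-frequency times inverse-document-frequency score over whitespace tokens, a ranked list of this shape could be produced as follows. The function `tfidf_top_words` and the three toy documents are hypothetical, and the scores are unnormalized, so they will not reproduce the values listed.

```python
import math
from collections import Counter


def tfidf_top_words(docs, doc_index, top_n=5):
    """Rank the words of docs[doc_index] by tf * log(N / df).

    Assumed scoring for illustration; the source does not specify its formula.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each word.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Term frequency within the chosen document, as a fraction of its length.
    tokens = tokenized[doc_index]
    tf = Counter(tokens)
    scores = {
        word: (count / len(tokens)) * math.log(n_docs / df[word])
        for word, count in tf.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Toy corpus, invented for this example.
docs = [
    "posts comments spam posts",
    "spam email filters",
    "conference dates locations",
]
top = tfidf_top_words(docs, 0, top_n=2)
```

In this toy corpus, "spam" appears in two documents and is therefore down-weighted relative to "posts" and "comments", which appear only in the first.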
simIndex simValue blogId blogTitle
same-blog 1 0.99999988 25 hunch net-2005-02-20-At One Month
Introduction: This is near the one month point, so it seems appropriate to consider meta-issues for the moment. The number of posts is a bit over 20. The number of people speaking up in discussions is about 10. The number of people viewing the site is somewhat more than 100. I am (naturally) dissatisfied with many things. Many of the potential uses haven’t been realized. This is partly a matter of opportunity (no conferences in the last month), partly a matter of will (no open problems because it’s hard to give them up), and partly a matter of tradition. In academia, there is a strong tradition of trying to get everything perfectly right before presentation. This is somewhat contradictory to the nature of making many posts, and it’s definitely contradictory to the idea of doing “public research”. If that sort of idea is to pay off, it must be significantly more successful than previous methods. In an effort to continue experimenting, I’m going to use the next week as “open problems we
2 0.26563826 225 hunch net-2007-01-02-Retrospective
Introduction: It’s been almost two years since this blog began. In that time, I’ve learned enough to shift my expectations in several ways. Initially, the idea was for a general purpose ML blog where different people could contribute posts. What has actually happened is most posts come from me, with a few guest posts that I greatly value. There are a few reasons I see for this. Overload. A couple years ago, I had not fully appreciated just how busy life gets for a researcher. Making a post is not simply a matter of getting to it, but rather of prioritizing between {writing a grant, finishing an overdue review, writing a paper, teaching a class, writing a program, etc…}. This is a substantial transition away from what life as a graduate student is like. At some point the question is not “when will I get to it?” but rather “will I get to it?” and the answer starts to become “no” most of the time. Feedback failure. This blog currently receives about 3K unique visitors per day from
3 0.24841744 151 hunch net-2006-01-25-1 year
Introduction: At the one year (+5 days) anniversary, the natural question is: “Was it helpful for research?” Answer: Yes, and so it shall continue. Some evidence is provided by noticing that I am about a factor of 2 more overloaded with paper ideas than I’ve ever previously been. It is always hard to estimate counterfactual worlds, but I expect that this is also a factor of 2 more than “What if I had not started the blog?” As for “Why?”, there seem to be two primary effects. A blog is a mechanism for connecting with people who either think like you or are interested in the same problems. This allows for concentration of thinking which is very helpful in solving problems. The process of stating things you don’t understand publicly is very helpful in understanding them. Sometimes you are simply forced to express them in a way which aids understanding. Sometimes someone else says something which helps. And sometimes you discover that someone else has already solved the problem. The
4 0.21030556 137 hunch net-2005-12-09-Machine Learning Thoughts
Introduction: I added a link to Olivier Bousquet’s machine learning thoughts blog. Several of the posts may be of interest.
5 0.16365603 223 hunch net-2006-12-06-The Spam Problem
Introduction: The New York Times has an article on the growth of spam. Interesting facts include: 9/10 of all email is spam, spam source identification is nearly useless due to botnet spam senders, and image-based spam (emails which consist of an image only) is on the rise. Estimates of the cost of spam are almost certainly far too low, because they do not account for the cost in time lost by people. The image-based spam which is currently penetrating many filters should be catchable with a more sophisticated application of machine learning technology. For the spam I see, the rendered images come in only a few formats, which would be easy to recognize via a support vector machine (with RBF kernel), neural network, or even nearest-neighbor architecture. The mechanics of setting this up to run efficiently is the only real challenge. This is the next step in the spam war. The response to this system is to make the image-based spam even more random. We should (essentially) expect to see
6 0.14081539 96 hunch net-2005-07-21-Six Months
7 0.13595201 383 hunch net-2009-12-09-Inherent Uncertainty
8 0.12869179 142 hunch net-2005-12-22-Yes , I am applying
9 0.12781303 132 hunch net-2005-11-26-The Design of an Optimal Research Environment
10 0.12660706 354 hunch net-2009-05-17-Server Update
11 0.12001063 297 hunch net-2008-04-22-Taking the next step
12 0.11688511 107 hunch net-2005-09-05-Site Update
13 0.11230032 367 hunch net-2009-08-16-Centmail comments
14 0.095967218 22 hunch net-2005-02-18-What it means to do research.
15 0.095533878 454 hunch net-2012-01-30-ICML Posters and Scope
16 0.093163081 446 hunch net-2011-10-03-Monday announcements
17 0.090856783 122 hunch net-2005-10-13-Site tweak
18 0.08694575 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”
19 0.08496958 401 hunch net-2010-06-20-2010 ICML discussion site
20 0.084444068 256 hunch net-2007-07-20-Motivation should be the Responsibility of the Reviewer
topicId topicWeight
[(0, 0.2), (1, -0.058), (2, -0.055), (3, 0.123), (4, -0.093), (5, 0.006), (6, 0.053), (7, -0.17), (8, 0.066), (9, 0.01), (10, -0.063), (11, -0.033), (12, -0.061), (13, 0.07), (14, 0.06), (15, -0.024), (16, -0.105), (17, -0.118), (18, -0.025), (19, 0.129), (20, -0.109), (21, 0.052), (22, -0.083), (23, -0.146), (24, -0.062), (25, 0.032), (26, 0.071), (27, 0.056), (28, 0.01), (29, -0.094), (30, -0.073), (31, 0.008), (32, -0.116), (33, -0.097), (34, 0.123), (35, 0.033), (36, 0.075), (37, -0.031), (38, -0.158), (39, -0.048), (40, -0.036), (41, 0.012), (42, 0.103), (43, -0.018), (44, -0.026), (45, -0.045), (46, 0.036), (47, 0.038), (48, -0.127), (49, 0.08)]
simIndex simValue blogId blogTitle
same-blog 1 0.96462792 25 hunch net-2005-02-20-At One Month
Introduction: This is near the one month point, so it seems appropriate to consider meta-issues for the moment. The number of posts is a bit over 20. The number of people speaking up in discussions is about 10. The number of people viewing the site is somewhat more than 100. I am (naturally) dissatisfied with many things. Many of the potential uses haven’t been realized. This is partly a matter of opportunity (no conferences in the last month), partly a matter of will (no open problems because it’s hard to give them up), and partly a matter of tradition. In academia, there is a strong tradition of trying to get everything perfectly right before presentation. This is somewhat contradictory to the nature of making many posts, and it’s definitely contradictory to the idea of doing “public research”. If that sort of idea is to pay off, it must be significantly more successful than previous methods. In an effort to continue experimenting, I’m going to use the next week as “open problems we
2 0.80658507 151 hunch net-2006-01-25-1 year
Introduction: At the one year (+5 days) anniversary, the natural question is: “Was it helpful for research?” Answer: Yes, and so it shall continue. Some evidence is provided by noticing that I am about a factor of 2 more overloaded with paper ideas than I’ve ever previously been. It is always hard to estimate counterfactual worlds, but I expect that this is also a factor of 2 more than “What if I had not started the blog?” As for “Why?”, there seem to be two primary effects. A blog is a mechanism for connecting with people who either think like you or are interested in the same problems. This allows for concentration of thinking which is very helpful in solving problems. The process of stating things you don’t understand publicly is very helpful in understanding them. Sometimes you are simply forced to express them in a way which aids understanding. Sometimes someone else says something which helps. And sometimes you discover that someone else has already solved the problem. The
3 0.67705637 354 hunch net-2009-05-17-Server Update
Introduction: The hunch.net server has been updated. I’ve taken the opportunity to upgrade the version of WordPress, which caused cascading changes. Old threaded comments are now flattened. The system we used to use (Brian’s threaded comments) appears incompatible with the new threading system built into WordPress. I haven’t yet figured out a workaround. I set up a FeedBurner account. I added an RSS aggregator for both Machine Learning and other research blogs that I like to follow. This is something that I’ve wanted to do for a while. Many other minor changes in font and format, with some help from Alina. If you have any suggestions for site tweaks, please speak up.
4 0.66986418 367 hunch net-2009-08-16-Centmail comments
Introduction: Centmail is a scheme which makes charity donations have a secondary value, as a stamp for email. When discussed on New Scientist, Slashdot, and others, some of the comments make the academic review process appear thoughtful. Some prominent fallacies are: Costing money fallacy. Some commenters appear to believe the system charges money per email. Instead, the basic idea is that users get an extra benefit from donations to a charity and participation is strictly voluntary. The solution to this fallacy is simply reading the details. Single solution fallacy. Some commenters seem to think this is proposed as a complete solution to spam, and since not everyone will opt to participate, it won’t work. But a complete solution is not at all necessary or even possible given the flag-day problem. Deployed machine learning systems for fighting spam are great at taking advantage of a partial solution. The solution to this fallacy is learning about machine learning. In the
5 0.66966879 137 hunch net-2005-12-09-Machine Learning Thoughts
Introduction: I added a link to Olivier Bousquet’s machine learning thoughts blog. Several of the posts may be of interest.
6 0.63065511 225 hunch net-2007-01-02-Retrospective
7 0.60331607 223 hunch net-2006-12-06-The Spam Problem
8 0.60111022 107 hunch net-2005-09-05-Site Update
9 0.57776046 96 hunch net-2005-07-21-Six Months
10 0.54937381 142 hunch net-2005-12-22-Yes , I am applying
11 0.53736961 122 hunch net-2005-10-13-Site tweak
12 0.53469896 297 hunch net-2008-04-22-Taking the next step
13 0.48975334 383 hunch net-2009-12-09-Inherent Uncertainty
14 0.48521206 246 hunch net-2007-06-13-Not Posting
15 0.4645879 294 hunch net-2008-04-12-Blog compromised
16 0.41717213 91 hunch net-2005-07-10-Thinking the Unthought
17 0.40293282 358 hunch net-2009-06-01-Multitask Poisoning
18 0.4001283 195 hunch net-2006-07-12-Who is having visa problems reaching US conferences?
19 0.39661032 182 hunch net-2006-06-05-Server Shift, Site Tweaks, Suggestions?
20 0.39056823 257 hunch net-2007-07-28-Asking questions
topicId topicWeight
[(10, 0.015), (26, 0.304), (27, 0.179), (37, 0.012), (38, 0.021), (53, 0.129), (55, 0.089), (77, 0.01), (79, 0.014), (94, 0.06), (95, 0.061)]
simIndex simValue blogId blogTitle
1 0.95695442 171 hunch net-2006-04-09-Progress in Machine Translation
Introduction: I just visited ISI where Daniel Marcu and others are working on machine translation. Apparently, machine translation is rapidly improving. A particularly dramatic year was 2002->2003 when systems switched from word-based translation to phrase-based translation. From a (now famous) slide by Charles Wayne at DARPA (which funds much of the work on machine translation) here is some anecdotal evidence: 2002 2003 insistent Wednesday may recurred her trips to Libya tomorrow for flying. Cairo 6-4 ( AFP ) – An official announced today in the Egyptian lines company for flying Tuesday is a company “insistent for flying” may resumed a consideration of a day Wednesday tomorrow her trips to Libya of Security Council decision trace international the imposed ban comment. And said the official “the institution sent a speech to Ministry of Foreign Affairs of lifting on Libya air, a situation her recieving replying are so a trip will pull to Libya a morning Wednesday.” E
2 0.94478899 413 hunch net-2010-10-08-An easy proof of the Chernoff-Hoeffding bound
Introduction: Textbooks invariably seem to carry the proof that uses Markov’s inequality, moment-generating functions, and Taylor approximations. Here’s an easier way. For , let be the KL divergence between a coin of bias and one of bias : Theorem: Suppose you do independent tosses of a coin of bias . The probability of seeing heads or more, for , is at most . So is the probability of seeing heads or less, for . Remark: By Pinsker’s inequality, . Proof Let’s do the case; the other is identical. Let be the distribution over induced by a coin of bias , and likewise for a coin of bias . Let be the set of all sequences of tosses which contain heads or more. We’d like to show that is unlikely under . Pick any , with say heads. Then: Since for every , we have and we’re done.
3 0.88366389 305 hunch net-2008-06-30-ICML has a comment system
Introduction: Mark Reid has stepped up and created a comment system for ICML papers which Greger Linden has tightly integrated. My understanding is that Mark spent quite a bit of time on the details, and there are some cool features like working LaTeX math mode. This is an excellent chance for the ICML community to experiment with making ICML year-round, so I hope it works out. Please do consider experimenting with it.
4 0.87475806 97 hunch net-2005-07-23-Interesting papers at ACL
Introduction: A recent discussion indicated that one goal of this blog might be to allow people to post comments about recent papers that they liked. I think this could potentially be very useful, especially for those with diverse interests but only finite time to read through conference proceedings. ACL 2005 recently completed, and here are four papers from that conference that I thought were either good or perhaps of interest to a machine learning audience. David Chiang, A Hierarchical Phrase-Based Model for Statistical Machine Translation. (Best paper award.) This paper takes the standard phrase-based MT model that is popular in our field (basically, translate a sentence by individually translating phrases and reordering them according to a complicated statistical model) and extends it to take into account hierarchy in phrases, so that you can learn things like “X ‘s Y” -> “Y de X” in Chinese, where X and Y are arbitrary phrases. This takes a step toward linguistic syntax for MT, whic
5 0.87296849 17 hunch net-2005-02-10-Conferences, Dates, Locations
Introduction: Conference (location, date): COLT (Bertinoro, Italy, June 27-30); AAAI (Pittsburgh, PA, USA, July 9-13); UAI (Edinburgh, Scotland, July 26-29); IJCAI (Edinburgh, Scotland, July 30 – August 5); ICML (Bonn, Germany, August 7-11); KDD (Chicago, IL, USA, August 21-24). The big winner this year is Europe. This is partly a coincidence, and partly due to the general internationalization of science over the last few years. With cuts to basic science in the US and increased hassle for visitors, conferences outside the US become more attractive. Europe and Australia/New Zealand are the immediate winners because they have the science, infrastructure, and English in place. China and India are possible future winners.
same-blog 6 0.86641294 25 hunch net-2005-02-20-At One Month
7 0.73243362 43 hunch net-2005-03-18-Binomial Weighting
8 0.61677158 141 hunch net-2005-12-17-Workshops as Franchise Conferences
9 0.61347884 478 hunch net-2013-01-07-NYU Large Scale Machine Learning Class
10 0.61338544 151 hunch net-2006-01-25-1 year
11 0.60488403 134 hunch net-2005-12-01-The Webscience Future
12 0.60473126 201 hunch net-2006-08-07-The Call of the Deep
13 0.60458899 370 hunch net-2009-09-18-Necessary and Sufficient Research
14 0.60323381 202 hunch net-2006-08-10-Precision is not accuracy
15 0.60208899 207 hunch net-2006-09-12-Incentive Compatible Reviewing
16 0.59994018 297 hunch net-2008-04-22-Taking the next step
17 0.59895009 22 hunch net-2005-02-18-What it means to do research.
18 0.59877372 225 hunch net-2007-01-02-Retrospective
19 0.59763724 437 hunch net-2011-07-10-ICML 2011 and the future
20 0.59533048 358 hunch net-2009-06-01-Multitask Poisoning