hunch_net hunch_net-2005 hunch_net-2005-1 knowledge-graph by maker-knowledge-mining
Introduction: I have decided to run a weblog on machine learning and learning theory research. Here are some reasons: 1) Weblogs enable new functionality: Public comment on papers. No mechanism for this exists at conferences and most journals. I have encountered it once for a science paper. Some communities have mailing lists supporting this, but not machine learning or learning theory. I have often read papers and found myself wishing there was some method to consider others’ questions and read the replies. Conference shortlists. One of the most common conversations at a conference is “what did you find interesting?” There is no explicit mechanism for sharing this information at conferences, and it’s easy to imagine that it would be handy to do so. Evaluation and comment on research directions. Papers are almost exclusively about new research, rather than evaluation (and consideration) of research directions. This last role is satisfied by funding agencies to some extent, but
sentIndex sentText sentNum sentScore
1 I have decided to run a weblog on machine learning and learning theory research. [sent-1, score-0.284]
2 Some communities have mailing lists supporting this, but not machine learning or learning theory. [sent-5, score-0.338]
3 I have often read papers and found myself wishing there was some method to consider others’ questions and read the replies. [sent-6, score-0.213]
4 There is no explicit mechanism for sharing this information at conferences, and it’s easy to imagine that it would be handy to do so. [sent-9, score-0.436]
5 Papers are almost exclusively about new research, rather than evaluation (and consideration) of research directions. [sent-11, score-0.236]
6 It’s easy to imagine that a public debate would be more thorough and thoughtful, producing better decisions. [sent-13, score-0.338]
7 It may be feasible to use a weblog as a mechanism for public research on a scale less than a paper. [sent-15, score-0.626]
8 Weblogs provide a natural generalization where anyone who is interested may be able to contribute. [sent-17, score-0.181]
9 Weblogs provide new capabilities, and it is natural to miss the impact of these capabilities until a number of people have thought about and used them. [sent-19, score-0.296]
10 mechanism speed scope permanency information filtration journal papers 6 months to years. [sent-23, score-0.918]
11 Very permanent reviewed conference papers 4-6 months Attendees (and often any with interest). [sent-25, score-0.669]
12 Permanent reviewed workshops 1-6 months Attendees Typically Transient inspected mailing lists a few days Anyone subscribed (or reading archives). [sent-26, score-0.794]
13 Semipermanent (with archives) inspected personal discussion thought speed Whoever is there then. [sent-27, score-0.4]
14 Transient not reviewed weblog thought speed Anyone with interest Semipermanent not reviewed Weblogs achieve “best we can imagine” in every category except permanency and quality control. [sent-28, score-1.166]
15 Permalinks are the equivalent of a citation, providing a semipermanent pointer to a piece of content. [sent-30, score-0.278]
16 This is only ‘semi’ because the _author_ of the content can typically revise the content at any moment in the future and the pointer is only permanent up to the permanence of the website. [sent-31, score-0.34]
17 Trackback is an explicit method for creating the reverse lookup table of citations: who cites this? [sent-32, score-0.225]
18 In addition, there are several mechanisms for information filtration such as “post is reposted in another weblog” and experimental moderation schemes. [sent-33, score-0.379]
19 The same forces driving academia into desiring permanent indelible records and very careful information filtration exist for blogs. [sent-34, score-0.677]
20 These forces may produce the ‘missing pieces’, making weblogs very compelling for academic purposes. [sent-35, score-0.567]
wordName wordTfidf (topN-words)
[('weblogs', 0.425), ('weblog', 0.284), ('filtration', 0.213), ('reviewed', 0.201), ('permanent', 0.177), ('inspected', 0.16), ('permanency', 0.16), ('semipermanent', 0.16), ('transient', 0.16), ('months', 0.147), ('public', 0.144), ('archives', 0.142), ('forces', 0.142), ('speed', 0.122), ('mechanism', 0.12), ('pointer', 0.118), ('thought', 0.118), ('anyone', 0.117), ('capabilities', 0.114), ('attendees', 0.11), ('lists', 0.11), ('mailing', 0.11), ('debate', 0.103), ('imagine', 0.091), ('evaluation', 0.087), ('information', 0.083), ('explicit', 0.083), ('mechanisms', 0.083), ('content', 0.08), ('interest', 0.08), ('research', 0.078), ('comment', 0.077), ('papers', 0.073), ('conference', 0.071), ('exclusively', 0.071), ('fortnow', 0.071), ('lookup', 0.071), ('reverse', 0.071), ('read', 0.07), ('subscribed', 0.066), ('whoever', 0.066), ('provide', 0.064), ('records', 0.062), ('revise', 0.062), ('weaknesses', 0.062), ('communicating', 0.059), ('communities', 0.059), ('sharing', 0.059), ('supporting', 0.059), ('thoughtful', 0.059)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999988 1 hunch net-2005-01-19-Why I decided to run a weblog.
2 0.11793315 437 hunch net-2011-07-10-ICML 2011 and the future
Introduction: Unfortunately, I ended up sick for much of this ICML. I did manage to catch one interesting paper: Richard Socher, Cliff Lin, Andrew Y. Ng, and Christopher D. Manning, Parsing Natural Scenes and Natural Language with Recursive Neural Networks. I invited Richard to share his list of interesting papers, so hopefully we’ll hear from him soon. In the meantime, Paul and Hal have posted some lists. the future Joelle and I are program chairs for ICML 2012 in Edinburgh, which I previously enjoyed visiting in 2005. This is a huge responsibility that we hope to accomplish well. A part of this (perhaps the most fun part) is imagining how we can make ICML better. A key and critical constraint is choosing things that can be accomplished. So far we have: Colocation. The first thing we looked into was potential colocations. We quickly discovered that many other conferences precommitted their location. For the future, getting a colocation with ACL or SIGI
3 0.11159392 29 hunch net-2005-02-25-Solution: Reinforcement Learning with Classification
Introduction: I realized that the tools needed to solve the problem just posted were just created. I tried to sketch out the solution here (also in .lyx and .tex ). It is still quite sketchy (and probably only the few people who understand reductions well can follow). One of the reasons why I started this weblog was to experiment with “research in the open”, and this is an opportunity to do so. Over the next few days, I’ll be filling in details and trying to get things to make sense. If you have additions or ideas, please propose them.
4 0.10597612 22 hunch net-2005-02-18-What it means to do research.
Introduction: I want to try to describe what doing research means, especially from the point of view of an undergraduate. The shift from a class-taking mentality to a research mentality is very significant and not easy. Problem Posing Posing the right problems is often as important as solving them. Many people can get by in research by solving problems others have posed, but that’s not sufficient for really inspiring research. For learning in particular, there is a strong feeling that we just haven’t figured out which questions are the right ones to ask. You can see this, because the answers we have do not seem convincing. Gambling your life When you do research, you think very hard about new ways of solving problems, new problems, and new solutions. Many conversations are of the form “I wonder what would happen if…” These processes can be short (days or weeks) or years-long endeavours. The worst part is that you’ll only know if you were successful at the end of the process (and some
5 0.096406303 116 hunch net-2005-09-30-Research in conferences
Introduction: Conferences exist as part of the process of doing research. They provide many roles including “announcing research”, “meeting people”, and “point of reference”. Not all conferences are alike so a basic question is: “to what extent do individual conferences attempt to aid research?” This question is very difficult to answer in any satisfying way. What we can do is compare details of the process across multiple conferences. Comments The average quality of comments across conferences can vary dramatically. At one extreme, the tradition in CS theory conferences is to provide essentially zero feedback. At the other extreme, some conferences have a strong tradition of providing detailed constructive feedback. Detailed feedback can give authors significant guidance about how to improve research. This is the most subjective entry. Blind Virtually all conferences offer single blind review where authors do not know reviewers. Some also provide double blind review where rev
6 0.095410451 30 hunch net-2005-02-25-Why Papers?
7 0.094155602 134 hunch net-2005-12-01-The Webscience Future
8 0.093142457 296 hunch net-2008-04-21-The Science 2.0 article
9 0.085451066 98 hunch net-2005-07-27-Not goal metrics
10 0.085070267 342 hunch net-2009-02-16-KDNuggets
11 0.084844172 345 hunch net-2009-03-08-Prediction Science
12 0.079720482 132 hunch net-2005-11-26-The Design of an Optimal Research Environment
13 0.079699256 343 hunch net-2009-02-18-Decision by Vetocracy
14 0.079623379 454 hunch net-2012-01-30-ICML Posters and Scope
15 0.07742741 208 hunch net-2006-09-18-What is missing for online collaborative research?
16 0.077032708 325 hunch net-2008-11-10-ICML Reviewing Criteria
17 0.07561969 48 hunch net-2005-03-29-Academic Mechanism Design
18 0.073084421 468 hunch net-2012-06-29-ICML survey and comments
19 0.071938314 36 hunch net-2005-03-05-Funding Research
20 0.071725465 51 hunch net-2005-04-01-The Producer-Consumer Model of Research
topicId topicWeight
[(0, 0.169), (1, -0.089), (2, -0.04), (3, 0.061), (4, -0.045), (5, 0.042), (6, 0.062), (7, -0.008), (8, 0.022), (9, 0.056), (10, -0.025), (11, 0.012), (12, 0.002), (13, -0.006), (14, 0.022), (15, 0.008), (16, -0.019), (17, 0.0), (18, 0.007), (19, -0.004), (20, 0.032), (21, -0.009), (22, -0.022), (23, -0.031), (24, -0.021), (25, -0.052), (26, -0.009), (27, 0.082), (28, -0.046), (29, -0.051), (30, 0.001), (31, 0.077), (32, 0.058), (33, -0.019), (34, -0.005), (35, -0.014), (36, 0.002), (37, 0.037), (38, 0.01), (39, 0.026), (40, -0.041), (41, 0.042), (42, 0.034), (43, -0.024), (44, 0.012), (45, -0.062), (46, -0.06), (47, 0.014), (48, 0.052), (49, 0.01)]
simIndex simValue blogId blogTitle
same-blog 1 0.95451623 1 hunch net-2005-01-19-Why I decided to run a weblog.
2 0.7475161 134 hunch net-2005-12-01-The Webscience Future
Introduction: The internet has significantly affected the way we do research but its capabilities have not yet been fully realized. First, let’s acknowledge some known effects. Self-publishing By default, all researchers in machine learning (and more generally computer science and physics) place their papers online for anyone to download. The exact mechanism differs: physicists tend to use a central repository (Arxiv) while computer scientists tend to place the papers on their webpage. Arxiv has been slowly growing in subject breadth so it is now sometimes used by computer scientists. Collaboration Email has enabled working remotely with coauthors. This has allowed collaborations which would not otherwise have been possible and generally speeds research. Now, let’s look at attempts to go further. Blogs (like this one) allow public discussion about topics which are not easily categorized as “a new idea in machine learning” (like this topic). Organization of some subfield
3 0.69138312 208 hunch net-2006-09-18-What is missing for online collaborative research?
Introduction: The internet has recently made the research process much smoother: papers are easy to obtain, citations are easy to follow, and unpublished “tutorials” are often available. Yet, new research fields can look very complicated to outsiders or newcomers. Every paper is like a small piece of an unfinished jigsaw puzzle: to understand just one publication, a researcher without experience in the field will typically have to follow several layers of citations, and many of the papers he encounters have a great deal of repeated information. Furthermore, from one publication to the next, notation and terminology may not be consistent which can further confuse the reader. But the internet is now proving to be an extremely useful medium for collaboration and knowledge aggregation. Online forums allow users to ask and answer questions and to share ideas. The recent phenomenon of Wikipedia provides a proof-of-concept for the “anyone can edit” system. Can such models be used to facilitate research a
4 0.6589973 30 hunch net-2005-02-25-Why Papers?
Introduction: Makc asked a good question in comments: “Why bother to make a paper, at all?” There are several reasons for writing papers which may not be immediately obvious to people not in academia. The basic idea is that papers have considerably more utility than the obvious “present an idea”. Papers are a formalized unit of work. Academics (especially young ones) are often judged on the number of papers they produce. Papers have a formalized method of citing and crediting others: the bibliography. Academics (especially older ones) are often judged on the number of citations they receive. Papers enable a “more fair” anonymous review. Conferences receive many papers, from which a subset are selected. Discussion forums are inherently not anonymous for anyone who wants to build a reputation for good work. Papers are an excuse to meet your friends. Papers are the content of conferences, but much of what you do is talk to friends about interesting problems while there. Sometimes yo
5 0.65594596 233 hunch net-2007-02-16-The Forgetting
Introduction: How many papers do you remember from 2006? 2005? 2002? 1997? 1987? 1967? One way to judge this would be to look at the citations of the papers you write—how many came from which year? For myself, the answers on recent papers are: year 2006 2005 2002 1997 1987 1967 count 4 10 5 1 0 0 This spectrum is fairly typical of papers in general. There are many reasons that citations are focused on recent papers. The number of papers being published continues to grow. This is not a very significant effect, because the rate of publication has not grown nearly as fast. Dead men don’t reject your papers for not citing them. This reason seems lame, because it’s a distortion from the ideal of science. Nevertheless, it must be stated because the effect can be significant. In 1997, I started as a PhD student. Naturally, papers after 1997 are better remembered because they were absorbed in real time. A large fraction of people writing papers and a
6 0.65455723 98 hunch net-2005-07-27-Not goal metrics
7 0.61972225 296 hunch net-2008-04-21-The Science 2.0 article
8 0.59994435 48 hunch net-2005-03-29-Academic Mechanism Design
9 0.59768355 288 hunch net-2008-02-10-Complexity Illness
10 0.59282559 231 hunch net-2007-02-10-Best Practices for Collaboration
11 0.59134007 146 hunch net-2006-01-06-MLTV
12 0.58854097 297 hunch net-2008-04-22-Taking the next step
13 0.58220589 270 hunch net-2007-11-02-The Machine Learning Award goes to …
14 0.57830638 333 hunch net-2008-12-27-Adversarial Academia
15 0.57449096 106 hunch net-2005-09-04-Science in the Government
16 0.57341844 22 hunch net-2005-02-18-What it means to do research.
17 0.56962317 202 hunch net-2006-08-10-Precision is not accuracy
18 0.56623203 358 hunch net-2009-06-01-Multitask Poisoning
19 0.56322515 132 hunch net-2005-11-26-The Design of an Optimal Research Environment
20 0.56181628 42 hunch net-2005-03-17-Going all the Way, Sometimes
topicId topicWeight
[(3, 0.03), (13, 0.027), (27, 0.122), (29, 0.098), (37, 0.187), (48, 0.022), (53, 0.074), (55, 0.087), (58, 0.021), (68, 0.021), (89, 0.01), (94, 0.095), (95, 0.079)]
simIndex simValue blogId blogTitle
same-blog 1 0.91748959 1 hunch net-2005-01-19-Why I decided to run a weblog.
2 0.88113725 431 hunch net-2011-04-18-A paper not at Snowbird
Introduction: Unfortunately, a scheduling failure meant I missed all of AIStat and most of the learning workshop, otherwise known as Snowbird, when it’s at Snowbird. At Snowbird, the talk on Sum-Product networks by Hoifung Poon stood out to me (Pedro Domingos is a coauthor). The basic point was that by appropriately constructing networks based on sums and products, the normalization problem in probabilistic models is eliminated, yielding a highly tractable yet flexible representation+learning algorithm. As an algorithm, this is noticeably cleaner than deep belief networks with a claim to being an order of magnitude faster and working better on an image completion task. Snowbird doesn’t have real papers, just the abstract above. I look forward to seeing the paper. (added: Rodrigo points out the deep learning workshop draft.)
3 0.85813165 63 hunch net-2005-04-27-DARPA project: LAGR
Introduction: Larry Jackal has set up the LAGR (“Learning Applied to Ground Robotics”) project (and competition) which seems to be quite well designed. Features include: Many participants (8 going on 12?) Standardized hardware. In the DARPA grand challenge contestants entering with motorcycles are at a severe disadvantage to those entering with a Hummer. Similarly, contestants using more powerful sensors can gain huge advantages. Monthly contests, with full feedback (but since the hardware is standardized, only code is shipped). One of the premises of the program is that robust systems are desired. Monthly evaluations at different locations can help measure this and provide data. Attacks a known hard problem. (cross country driving)
4 0.78342408 138 hunch net-2005-12-09-Some NIPS papers
Introduction: Here is a set of papers that I found interesting (and why). A PAC-Bayes approach to the Set Covering Machine improves the set covering machine. The set covering machine approach is a new way to do classification characterized by a very close connection between theory and algorithm. At this point, the approach seems to be competing well with SVMs in about all dimensions: similar computational speed, similar accuracy, stronger learning theory guarantees, more general information source (a kernel has strictly more structure than a metric), and more sparsity. Developing a classification algorithm is not very easy, but the results so far are encouraging. Off-Road Obstacle Avoidance through End-to-End Learning and Learning Depth from Single Monocular Images both effectively showed that depth information can be predicted from camera images (using notably different techniques). This ability is strongly enabling because cameras are cheap, tiny, light, and potentially provider lo
5 0.75301939 368 hunch net-2009-08-26-Another 10-year paper in Machine Learning
Introduction: When I was thinking about the best “10 year paper” for ICML, I also took a look at a few other conferences. Here is one from 10 years ago that interested me: David McAllester, PAC-Bayesian Model Averaging, COLT 1999. 2001 Journal Draft. Prior to this paper, the only mechanism known for controlling or estimating the necessary sample complexity for learning over continuously parameterized predictors was VC theory and variants, all of which suffered from a basic problem: they were incredibly pessimistic in practice. This meant that only very gross guidance could be provided for learning algorithm design. The PAC-Bayes bound provided an alternative approach to sample complexity bounds which was radically tighter, quantitatively. It also imported and explained many of the motivations for Bayesian learning in a way that learning theory and perhaps optimization people might appreciate. Since this paper came out, there have been a number of moderately successful attempts t
6 0.70045519 21 hunch net-2005-02-17-Learning Research Programs
7 0.68359554 194 hunch net-2006-07-11-New Models
8 0.66103679 357 hunch net-2009-05-30-Many ways to Learn this summer
9 0.65841073 141 hunch net-2005-12-17-Workshops as Franchise Conferences
10 0.6528396 28 hunch net-2005-02-25-Problem: Online Learning
11 0.64268416 445 hunch net-2011-09-28-Somebody’s Eating Your Lunch
12 0.64132965 423 hunch net-2011-02-02-User preferences for search engines
13 0.64113724 132 hunch net-2005-11-26-The Design of an Optimal Research Environment
14 0.63988847 105 hunch net-2005-08-23-(Dis)similarities between academia and open source programmers
15 0.63846749 416 hunch net-2010-10-29-To Vidoelecture or not
16 0.63739401 286 hunch net-2008-01-25-Turing’s Club for Machine Learning
17 0.63456875 146 hunch net-2006-01-06-MLTV
18 0.63415754 75 hunch net-2005-05-28-Running A Machine Learning Summer School
19 0.63377517 461 hunch net-2012-04-09-ICML author feedback is open
20 0.63251877 151 hunch net-2006-01-25-1 year