hunch_net hunch_net-2008 hunch_net-2008-297 knowledge-graph by maker-knowledge-mining
Introduction: At the last ICML, Tom Dietterich asked me to look into systems for commenting on papers. I’ve been slow getting to this, but it’s relevant now. The essential observation is that we now have many tools for online collaboration, but they are not yet much used in academic research. If we can find the right way to use them, then perhaps great things might happen, with extra kudos to the first conference that manages to really create an online community. Various conferences have been poking at this. For example, UAI has set up a wiki, COLT has started using Joomla, with some dynamic content, and AAAI has been setting up a “student blog”. Similarly, Dinoj Surendran set up a twiki for the Chicago Machine Learning Summer School, which was quite useful for coordinating events and other things. I believe the most important thing is a willingness to experiment. A good place to start seems to be enhancing existing conference websites. For example, the ICML 2007 papers pag
sentIndex sentText sentNum sentScore
1 If we can find the right way to use them, then perhaps great things might happen, with extra kudos to the first conference that manages to really create an online community. [sent-4, score-0.183]
2 For example, UAI has set up a wiki, COLT has started using Joomla, with some dynamic content, and AAAI has been setting up a “student blog”. [sent-6, score-0.507]
3 Similarly, Dinoj Surendran set up a twiki for the Chicago Machine Learning Summer School, which was quite useful for coordinating events and other things. [sent-7, score-0.461]
4 For example, the ICML 2007 papers page is basically only useful via grep. [sent-10, score-0.438]
5 A much more human-readable version of the page would organize the papers by topic. [sent-11, score-0.207]
6 If the page were wiki-editable, this would almost happen automatically. [sent-12, score-0.207]
7 Adding the ability for people to comment on the papers might make the website more useful beyond the time of the conference itself. [sent-13, score-0.571]
8 Here are various concerns I have: Mandate: An official mandate is a must-have. [sent-16, score-0.313]
9 Permissive Comments: Allowing anyone to comment on a website is somewhat scary to academics, because we are used to peer-reviewing papers before publishing. [sent-18, score-0.565]
10 net is allowing comments from anyone exhibiting evidence of intelligence—i. [sent-22, score-0.459]
11 Spam: Spam is a serious issue for dynamic websites, because it adds substantially to the maintenance load. [sent-26, score-0.345]
12 There are basically two tacks to take here: Issue a userid/passwd to every conference registrant (and maybe others that request it), then just allow comments from them. [sent-27, score-0.628]
13 Allow comments from anyone, but use automated filters. [sent-28, score-0.204]
14 Open Source: I have a strong preference for open source solutions, of which there appear to be several reasonable choices. [sent-36, score-0.37]
15 The reason is that open source applications leave you free (or at least freer) to switch and change things, which seems essential when experimenting. [sent-37, score-0.269]
16 Large User Base: When going with an open source solution, something with a large user base is likely to have fewer rough edges. [sent-38, score-0.568]
17 I have some preference for systems using flat files for data storage rather than a database because they are easier to maintain or (if necessary) operate on. [sent-39, score-0.423]
18 This is partly due to a bad experience I had with the twiki setup for MLSS: basically, an attempt to transfer data to an upgraded MySQL failed because of schema issues I failed to resolve. [sent-40, score-0.791]
19 I’m sure there are many with more experience using wiki and comment systems—perhaps they can comment on exact software choices. [sent-41, score-0.808]
20 Wikimatrix seems to provide frighteningly detailed comparisons of different wiki software. [sent-42, score-0.266]
wordName wordTfidf (topN-words)
[('comments', 0.204), ('software', 0.203), ('wiki', 0.188), ('mandate', 0.176), ('permissive', 0.176), ('twiki', 0.176), ('comment', 0.17), ('maintenance', 0.157), ('basically', 0.155), ('website', 0.139), ('official', 0.137), ('source', 0.136), ('open', 0.133), ('setup', 0.131), ('page', 0.126), ('failed', 0.125), ('spam', 0.117), ('user', 0.111), ('dynamic', 0.111), ('filtering', 0.108), ('conference', 0.105), ('slow', 0.101), ('preference', 0.101), ('anyone', 0.097), ('systems', 0.094), ('base', 0.094), ('allow', 0.091), ('experiment', 0.088), ('allowing', 0.085), ('papers', 0.081), ('happen', 0.081), ('mysql', 0.078), ('commenting', 0.078), ('manages', 0.078), ('dietterich', 0.078), ('frighteningly', 0.078), ('flat', 0.078), ('preferences', 0.078), ('schema', 0.078), ('upgraded', 0.078), ('coordinating', 0.078), ('scary', 0.078), ('witness', 0.078), ('issue', 0.077), ('using', 0.077), ('useful', 0.076), ('request', 0.073), ('exhibiting', 0.073), ('files', 0.073), ('guidelines', 0.073)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 297 hunch net-2008-04-22-Taking the next step
2 0.15168729 132 hunch net-2005-11-26-The Design of an Optimal Research Environment
Introduction: How do you create an optimal environment for research? Here are some essential ingredients that I see. Stability. University-based research is relatively good at this. On any particular day, researchers face choices in what they will work on. A very common tradeoff is between “easy, small” and “difficult, big”. For researchers without stability, the ‘easy small’ option wins. This is often “ok”: a series of incremental improvements on the state of the art can add up to something very beneficial. However, it misses one of the big potentials of research: finding entirely new and better ways of doing things. Stability comes in many forms. The prototypical example is tenure at a university: a tenured professor is almost impossible to fire, which means that the professor has the freedom to consider far horizon activities. An iron-clad guarantee of a paycheck is not necessary; industrial research labs have succeeded well with research positions of indefinite duration. Atnt rese
3 0.14702578 105 hunch net-2005-08-23-(Dis)similarities between academia and open source programmers
Introduction: Martin Pool and I recently discussed the similarities and differences between academia and open source programming. Similarities: Cost profile: Research and programming share approximately the same cost profile: A large upfront effort is required to produce something useful, and then “anyone” can use it. (The “anyone” is not quite right for either group because only sufficiently technical people could use it.) Wealth profile: A “wealthy” academic or open source programmer is someone who has contributed a lot to other people in research or programs. Much of academia is a “gift culture”: whoever gives the most is most respected. Problems: Both academia and open source programming suffer from similar problems. Whether or not (and which) open source program is used is perhaps too often personality-driven rather than driven by capability or usefulness. Similar phenomena can happen in academia with respect to directions of research. Funding is often a problem for
4 0.1412634 437 hunch net-2011-07-10-ICML 2011 and the future
Introduction: Unfortunately, I ended up sick for much of this ICML. I did manage to catch one interesting paper: Richard Socher, Cliff Lin, Andrew Y. Ng, and Christopher D. Manning, Parsing Natural Scenes and Natural Language with Recursive Neural Networks. I invited Richard to share his list of interesting papers, so hopefully we’ll hear from him soon. In the meantime, Paul and Hal have posted some lists. The future: Joelle and I are program chairs for ICML 2012 in Edinburgh, which I previously enjoyed visiting in 2005. This is a huge responsibility that we hope to accomplish well. A part of this (perhaps the most fun part) is imagining how we can make ICML better. A key and critical constraint is choosing things that can be accomplished. So far we have: Colocation. The first thing we looked into was potential colocations. We quickly discovered that many other conferences precommitted their location. For the future, getting a colocation with ACL or SIGI
5 0.13856503 81 hunch net-2005-06-13-Wikis for Summer Schools and Workshops
Introduction: Chicago ’05 ended a couple of weeks ago. This was the sixth Machine Learning Summer School, and the second one that used a wiki. (The first was Berder ’04, thanks to Gunnar Raetsch.) Wikis are relatively easy to set up, greatly aid social interaction, and should be used a lot more at summer schools and workshops. They can even be used as the meeting’s webpage, as a permanent record of its participants’ collaborations; see for example the wiki/website for last year’s NVO Summer School. A basic wiki is a collection of editable webpages, maintained by software called a wiki engine. The engine used at both Berder and Chicago was TikiWiki; it is well documented and gets you something running fast. It uses PHP and MySQL, but doesn’t require you to know either. TikiWiki has far more features than most wikis, as it is really a full Content Management System. (My thanks to Sebastian Stark for pointing this out.) Here are the features we found most useful: Bulletin boa
6 0.13421284 122 hunch net-2005-10-13-Site tweak
7 0.13099357 367 hunch net-2009-08-16-Centmail comments
8 0.1266275 223 hunch net-2006-12-06-The Spam Problem
9 0.12446942 225 hunch net-2007-01-02-Retrospective
10 0.12155269 452 hunch net-2012-01-04-Why ICML? and the summer conferences
11 0.12001063 25 hunch net-2005-02-20-At One Month
12 0.11524309 403 hunch net-2010-07-18-ICML & COLT 2010
13 0.1106477 393 hunch net-2010-04-14-MLcomp: a website for objectively comparing ML algorithms
14 0.10825942 454 hunch net-2012-01-30-ICML Posters and Scope
15 0.10662314 141 hunch net-2005-12-17-Workshops as Franchise Conferences
16 0.10555229 288 hunch net-2008-02-10-Complexity Illness
17 0.10437208 92 hunch net-2005-07-11-AAAI blog
18 0.10304727 208 hunch net-2006-09-18-What is missing for online collaborative research?
19 0.1024079 424 hunch net-2011-02-17-What does Watson mean?
20 0.10198189 423 hunch net-2011-02-02-User preferences for search engines
topicId topicWeight
[(0, 0.266), (1, -0.109), (2, -0.058), (3, 0.05), (4, -0.045), (5, -0.023), (6, -0.014), (7, -0.104), (8, -0.003), (9, 0.038), (10, -0.101), (11, -0.002), (12, -0.056), (13, 0.077), (14, 0.065), (15, -0.02), (16, -0.122), (17, -0.02), (18, 0.043), (19, 0.17), (20, 0.004), (21, 0.013), (22, -0.022), (23, -0.048), (24, -0.036), (25, 0.048), (26, -0.007), (27, 0.081), (28, 0.038), (29, 0.019), (30, 0.058), (31, 0.106), (32, 0.017), (33, -0.047), (34, 0.035), (35, 0.026), (36, 0.038), (37, -0.088), (38, 0.112), (39, 0.054), (40, -0.013), (41, -0.002), (42, 0.094), (43, 0.038), (44, -0.087), (45, -0.075), (46, 0.084), (47, 0.057), (48, -0.004), (49, -0.059)]
simIndex simValue blogId blogTitle
same-blog 1 0.97261739 297 hunch net-2008-04-22-Taking the next step
2 0.66034234 367 hunch net-2009-08-16-Centmail comments
Introduction: Centmail is a scheme which makes charity donations have a secondary value, as a stamp for email. When discussed on newscientist, slashdot, and others, some of the comments make the academic review process appear thoughtful. Some prominent fallacies are: Costing money fallacy. Some commenters appear to believe the system charges money per email. Instead, the basic idea is that users get an extra benefit from donations to a charity and participation is strictly voluntary. The solution to this fallacy is simply reading the details. Single solution fallacy. Some commenters seem to think this is proposed as a complete solution to spam, and since not everyone will opt to participate, it won’t work. But a complete solution is not at all necessary or even possible given the flag-day problem. Deployed machine learning systems for fighting spam are great at taking advantage of a partial solution. The solution to this fallacy is learning about machine learning. In the
3 0.65792978 354 hunch net-2009-05-17-Server Update
Introduction: The hunch.net server has been updated. I’ve taken the opportunity to upgrade the version of wordpress, which caused cascading changes. Old threaded comments are now flattened. The system we used to use (Brian’s threaded comments) appears incompatible with the new threading system built into wordpress. I haven’t yet figured out a workaround. I set up a feedburner account. I added an RSS aggregator for both Machine Learning and other research blogs that I like to follow. This is something that I’ve wanted to do for a while. Many other minor changes in font and format, with some help from Alina. If you have any suggestions for site tweaks, please speak up.
4 0.6518954 122 hunch net-2005-10-13-Site tweak
Introduction: Several people have had difficulty with comments which seem to have an allowed language significantly poorer than posts. The set of allowed html tags has been increased and the markdown filter has been put in place to try to make commenting easier. I’ll put some examples into the comments of this post.
5 0.60977668 294 hunch net-2008-04-12-Blog compromised
Introduction: Iain noticed that hunch.net had zero width divs hiding spammy URLs. Some investigation reveals that the wordpress version being used (2.0.3) had security flaws. I’ve upgraded to the latest, rotated passwords, and removed the spammy URLs. I don’t believe any content was lost. You can check your own and other sites for a similar problem by grepping for “width:0” or “width: 0” in the delivered html source.
6 0.60157174 25 hunch net-2005-02-20-At One Month
7 0.58809263 223 hunch net-2006-12-06-The Spam Problem
8 0.58784223 81 hunch net-2005-06-13-Wikis for Summer Schools and Workshops
9 0.56653988 146 hunch net-2006-01-06-MLTV
10 0.56096774 363 hunch net-2009-07-09-The Machine Learning Forum
11 0.55345005 271 hunch net-2007-11-05-CMU wins DARPA Urban Challenge
12 0.5489198 105 hunch net-2005-08-23-(Dis)similarities between academia and open source programmers
13 0.54624188 208 hunch net-2006-09-18-What is missing for online collaborative research?
14 0.53447467 134 hunch net-2005-12-01-The Webscience Future
15 0.53303748 107 hunch net-2005-09-05-Site Update
16 0.52928084 1 hunch net-2005-01-19-Why I decided to run a weblog.
17 0.51708823 232 hunch net-2007-02-11-24
18 0.51593041 424 hunch net-2011-02-17-What does Watson mean?
19 0.50439155 93 hunch net-2005-07-13-“Sister Conference” presentations
20 0.498779 288 hunch net-2008-02-10-Complexity Illness
topicId topicWeight
[(10, 0.026), (27, 0.196), (38, 0.059), (48, 0.026), (49, 0.016), (53, 0.116), (55, 0.115), (78, 0.2), (83, 0.013), (94, 0.11), (95, 0.041)]
simIndex simValue blogId blogTitle
same-blog 1 0.91721761 297 hunch net-2008-04-22-Taking the next step
2 0.89156145 441 hunch net-2011-08-15-Vowpal Wabbit 6.0
Introduction: I just released Vowpal Wabbit 6.0. Since the last version: VW is now 2-3 orders of magnitude faster at linear learning, primarily thanks to Alekh. Given the baseline, this is loads of fun, allowing us to easily deal with terafeature datasets, and dwarfing the scale of any other open source projects. The core improvement here comes from effective parallelization over kilonode clusters (either Hadoop or not). This code is highly scalable, so it even helps with clusters of size 2 (and doesn’t hurt for clusters of size 1). The core allreduce technique appears widely and easily reused; we’ve already used it to parallelize Conjugate Gradient, LBFGS, and two variants of online learning. We’ll be documenting how to do this more thoroughly, but for now “README_cluster” and associated scripts should provide a good starting point. The new LBFGS code from Miro seems to commonly dominate the existing conjugate gradient code in time/quality tradeoffs. The new matrix factoriz
3 0.88342375 316 hunch net-2008-09-04-Fall ML Conferences
Introduction: If you are in the New York area and interested in machine learning, consider submitting a 2-page abstract to the ML symposium by tomorrow (Sept 5th) midnight. It’s a fun one-day affair on October 10 in an awesome location overlooking the World Trade Center site. A bit further off (but a real conference) is the AI and Stats deadline on November 5, to be held in Florida April 16-19.
4 0.865861 164 hunch net-2006-03-17-Multitask learning is Black-Boxable
Introduction: Multitask learning is the problem of jointly predicting multiple labels simultaneously with one system. A basic question is whether or not multitask learning can be decomposed into one (or more) single prediction problems. It seems the answer to this is “yes”, in a fairly straightforward manner. The basic idea is that a controlled input feature is equivalent to an extra output. Suppose we have some process generating examples $(x, y_1, y_2) \in S$ where $y_1$ and $y_2$ are labels for two different tasks. Then, we could reprocess the data to the form $S_b(S) = \{((x, i), y_i) : (x, y_1, y_2) \in S,\ i \in \{1, 2\}\}$ and then learn a classifier $c : X \times \{1, 2\} \to Y$. Note that $(x, i)$ is the (composite) input. At testing time, given an input $x$, we can query $c$ for the predicted values of $y_1$ and $y_2$ using $(x, 1)$ and $(x, 2)$. A strong form of equivalence can be stated between these tasks. In particular, suppose we have a multitask learning algorithm ML which learns a multitask
5 0.84790367 60 hunch net-2005-04-23-Advantages and Disadvantages of Bayesian Learning
Introduction: I don’t consider myself a “Bayesian”, but I do try hard to understand why Bayesian learning works. For the purposes of this post, Bayesian learning is a simple process of: Specify a prior over world models. Integrate using Bayes’ law with respect to all observed information to compute a posterior over world models. Predict according to the posterior. Bayesian learning has many advantages over other learning programs: Interpolation: Bayesian learning methods interpolate all the way to pure engineering. When faced with any learning problem, there is a choice of how much time and effort a human vs. a computer puts in. (For example, the Mars rover pathfinding algorithms are almost entirely engineered.) When creating an engineered system, you build a model of the world and then find a good controller in that model. Bayesian methods interpolate to this extreme because the Bayesian prior can be a delta function on one model of the world. What this means is that a recipe
6 0.78656155 286 hunch net-2008-01-25-Turing’s Club for Machine Learning
7 0.78638369 437 hunch net-2011-07-10-ICML 2011 and the future
8 0.78070027 141 hunch net-2005-12-17-Workshops as Franchise Conferences
9 0.78048557 95 hunch net-2005-07-14-What Learning Theory might do
10 0.78011954 207 hunch net-2006-09-12-Incentive Compatible Reviewing
11 0.77859378 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models
12 0.77628946 256 hunch net-2007-07-20-Motivation should be the Responsibility of the Reviewer
13 0.77338833 98 hunch net-2005-07-27-Not goal metrics
14 0.77316248 204 hunch net-2006-08-28-Learning Theory standards for NIPS 2006
15 0.77232671 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms
16 0.7710014 423 hunch net-2011-02-02-User preferences for search engines
17 0.76958644 134 hunch net-2005-12-01-The Webscience Future
18 0.76953548 419 hunch net-2010-12-04-Vowpal Wabbit, version 5.0, and the second heresy
19 0.76893085 370 hunch net-2009-09-18-Necessary and Sufficient Research
20 0.76875132 382 hunch net-2009-12-09-Future Publication Models @ NIPS
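The reduction described in the excerpt for post 164 above (Multitask learning is Black-Boxable) can be illustrated with a minimal Python sketch. It is not code from that post: the function names `to_single_task` and `predict_both`, the memorizing learner `train_lookup`, and the toy data are all assumptions made up for illustration. The sketch only shows the data transformation, in which each example (x, y1, y2) becomes two single-task examples ((x, 1), y1) and ((x, 2), y2), and how a single classifier is then queried once per task.

```python
from typing import Callable, Hashable, List, Tuple

Example = Tuple[Hashable, object, object]         # (x, y1, y2)
SingleTask = Tuple[Tuple[Hashable, int], object]  # ((x, i), y_i)


def to_single_task(samples: List[Example]) -> List[SingleTask]:
    """Turn each (x, y1, y2) into ((x, 1), y1) and ((x, 2), y2)."""
    out: List[SingleTask] = []
    for x, y1, y2 in samples:
        out.append(((x, 1), y1))
        out.append(((x, 2), y2))
    return out


def predict_both(c: Callable, x: Hashable) -> Tuple[object, object]:
    """Query one classifier c twice, once per task index."""
    return c((x, 1)), c((x, 2))


def train_lookup(data: List[SingleTask]) -> Callable:
    """Hypothetical stand-in learner: it just memorizes the training set."""
    table = dict(data)
    return lambda query: table[query]


# Toy data, invented for this sketch.
S = [("a", 0, 1), ("b", 1, 1)]
S_b = to_single_task(S)
c = train_lookup(S_b)
```

Any single-task learner that accepts the composite input (x, i) could replace `train_lookup`; the memorizing learner is used here only so the sketch stays self-contained.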