hunch_net-2011-439: Interesting papers at COLT 2011 (hunch.net, 2011-08-01). Knowledge graph by maker-knowledge-mining.
Source: html
Introduction: Since John did not attend COLT this year, I have been volunteered to report back on the hot stuff at this year’s meeting. The conference seemed to have pretty high-quality stuff this year, and I found plenty of interesting papers on all three days. I’m gonna pick some of my favorites, going through the program in chronological order. The first session on matrices seemed interesting for two reasons. First, the papers were quite nice. But more interestingly, this is a topic that has had a lot of presence in the statistics and compressed sensing literature recently. So it was good to see high-dimensional matrices finally make their entry at COLT. The paper by Ohad and Shai on Collaborative Filtering with the Trace Norm: Learning, Bounding, and Transducing provides non-trivial guarantees on trace norm regularization in an agnostic setup, while Rina and Nati show how Rademacher averages can be used to get sharper results for matrix completion problems in their paper Concentr…
Both papers seemed to share the flavor of a learning theorist’s take on compressed sensing, which I enjoyed seeing.
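For readers outside this area, here is a minimal sketch of the kind of estimator both papers analyze; the notation is mine, not taken from either paper. The idea is to complete a partially observed matrix M by trading off fit on the observed entries Omega against the trace (nuclear) norm, a convex surrogate for low rank:

% Trace-norm regularized matrix completion (generic form; Omega is the set of
% observed entries and ||X||_* is the sum of singular values of X).
\[
\hat{X} \;=\; \arg\min_{X \in \mathbb{R}^{n \times m}}\;
\frac{1}{|\Omega|} \sum_{(i,j) \in \Omega} \big(X_{ij} - M_{ij}\big)^{2}
\;+\; \lambda\, \lVert X \rVert_{*},
\qquad
\lVert X \rVert_{*} \;=\; \sum_{k} \sigma_k(X).
\]

Generalization guarantees for estimators of this form are typically obtained by controlling the capacity of a trace-norm ball, e.g. via Rademacher averages, which is roughly the route the second paper sharpens; the first paper works in an agnostic, transductive setup.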
The best student paper, by Amit, Sivan and the two Shais, on Multiclass Learnability and the ERM principle showed a crucial distinction between binary and multiclass classification.
Not every ERM procedure is equally good for multiclass classification. In particular, there are multiclass problems where some ERM learners succeed while others are inconsistent, in sharp contrast to the binary case. They also present some intuition on what characterizes a good ERM procedure for the multiclass setting.
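To fix notation (the standard definition, in my notation rather than the paper’s), an ERM learner is any rule that returns a minimizer of the empirical 0-1 loss over the hypothesis class:

% Empirical Risk Minimization over a class H on a sample (x_1,y_1),...,(x_n,y_n).
\[
\hat{h} \;\in\; \arg\min_{h \in \mathcal{H}}\;
\frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\big[\, h(x_i) \neq y_i \,\big].
\]

The argmin is generally not unique, so different ERM learners differ in which empirical minimizer they output; the result above says that in the multiclass case this choice can make the difference between consistency and inconsistency, while in the binary case it cannot.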
I enjoyed all three papers in the online learning session quite a bit. Jake, Elad and Peter show the equivalence of Blackwell approachability and low regret in their paper Blackwell Approachability and No-Regret Learning are Equivalent, with applications to efficient algorithms for calibration.
Their paper with Dean on Complexity-Based Approach to Calibration with Checking Rules showed a nice application of these techniques to the calibration problem.
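For context, here is a rough sketch of the notions involved; these are the standard definitions in my own notation, not anything taken from the papers. External regret compares the learner to the best fixed action in hindsight, Blackwell approachability asks that an average vector payoff approach a convex target set against any opponent, and calibration asks that forecasts match empirical frequencies.

% External regret after T rounds with loss functions l_t and actions a_t.
\[
R_T \;=\; \sum_{t=1}^{T} \ell_t(a_t) \;-\; \min_{a \in \mathcal{A}} \sum_{t=1}^{T} \ell_t(a).
\]
% Blackwell approachability of a convex set S under a vector payoff u(a_t, b_t):
% the average payoff must approach S against every opponent strategy.
\[
\operatorname{dist}\!\left( \frac{1}{T} \sum_{t=1}^{T} u(a_t, b_t),\; S \right) \;\longrightarrow\; 0
\qquad \text{as } T \to \infty .
\]
% l1 calibration error of probability forecasts p_t (taking finitely many values)
% for binary outcomes y_t; a forecaster is calibrated if C_T -> 0.
\[
C_T \;=\; \sum_{p} \left| \frac{1}{T} \sum_{t=1}^{T} \mathbf{1}[\, p_t = p \,]\,\big(y_t - p\big) \right| .
\]

The equivalence means an algorithm for either of the first two problems can be converted into one for the other, and calibration is a setting where that conversion yields efficient algorithms.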
The impromptu session was quite a hit this year, with ~15 talks. I was quite disappointed to see none of them turn up on my NIPS review stack. Rob managed to save some money by solving his open problem from last COLT, together with Indraneel and Cynthia.
Their paper The Rate of Convergence of AdaBoost was interesting, as it made me realize how much difference the boundedness of rates can make to the theoretical properties of an algorithm. The way this paper gets around these issues, and the penalty it pays for doing so, seemed quite interesting.
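For reference, the object whose convergence rate is being studied is the classical AdaBoost update, written here in the edge parameterization; this is the textbook algorithm, not anything taken from the paper’s analysis.

% One AdaBoost round: D_t is the current distribution over the m examples and
% the weak learner returns h_t with edge gamma_t = sum_i D_t(i) y_i h_t(x_i).
\[
\alpha_t \;=\; \tfrac{1}{2}\,\ln\frac{1+\gamma_t}{1-\gamma_t},
\qquad
D_{t+1}(i) \;\propto\; D_t(i)\,\exp\!\big(-\alpha_t\, y_i\, h_t(x_i)\big).
\]
% After T rounds the combined score is f_T(x) = sum_t alpha_t h_t(x), and the
% quantity whose convergence is analyzed is the exponential loss
\[
L(f_T) \;=\; \frac{1}{m} \sum_{i=1}^{m} \exp\!\big(-y_i f_T(x_i)\big).
\]

The rate-of-convergence question is how quickly L(f_T) approaches its infimum as T grows.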
I also liked the paper by Gabor, David and Csaba on Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments.
This paper shows a tight characterization of which partial-monitoring games have an optimal regret of 0, T^{1/2}, T^{2/3} or T. While some sufficient conditions one way or the other were known before, theirs is the first complete characterization to my knowledge.
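To pin down what "optimal regret" means here (the standard minimax regret, in my notation): for a fixed game G, the quantity being classified is

% Minimax expected regret of game G over T rounds: the best any learner can
% guarantee against the worst (here, stochastic) environment, measured against
% the best single action in hindsight.
\[
R_T(G) \;=\; \inf_{\text{learner}}\; \sup_{\text{environment}}\;
\mathbb{E}\!\left[ \sum_{t=1}^{T} \ell(a_t, b_t) \;-\; \min_{a} \sum_{t=1}^{T} \ell(a, b_t) \right],
\]
% and the characterization sorts finite games into the four growth classes
% 0, Theta(T^{1/2}), Theta(T^{2/3}) and Theta(T).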
Related posts, ranked by similarity (columns: simIndex, simValue, blogId, blogTitle):
2 0.18433957 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions
3 0.18006451 361 hunch net-2009-06-24-Interesting papers at UAICMOLT 2009
4 0.16193612 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?
5 0.15399297 243 hunch net-2007-05-08-Conditional Tournaments for Multiclass to Binary
6 0.150489 88 hunch net-2005-07-01-The Role of Impromptu Talks
7 0.12865801 14 hunch net-2005-02-07-The State of the Reduction
8 0.11310364 453 hunch net-2012-01-28-Why COLT?
9 0.11246498 186 hunch net-2006-06-24-Online convex optimization at COLT
10 0.11211126 309 hunch net-2008-07-10-Interesting papers, ICML 2008
11 0.10533655 406 hunch net-2010-08-22-KDD 2010
12 0.10301027 251 hunch net-2007-06-24-Interesting Papers at ICML 2007
13 0.10057335 279 hunch net-2007-12-19-Cool and interesting things seen at NIPS
14 0.098034941 304 hunch net-2008-06-27-Reviewing Horror Stories
15 0.093582571 437 hunch net-2011-07-10-ICML 2011 and the future
16 0.092691414 318 hunch net-2008-09-26-The SODA Program Committee
17 0.091400459 454 hunch net-2012-01-30-ICML Posters and Scope
18 0.091214359 403 hunch net-2010-07-18-ICML & COLT 2010
19 0.090133257 343 hunch net-2009-02-18-Decision by Vetocracy
20 0.089218706 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design