hunch_net hunch_net-2006 hunch_net-2006-186 knowledge-graph by maker-knowledge-mining

186 hunch net-2006-06-24-Online convex optimization at COLT


meta info for this blog

Source: html

Introduction: At ICML 2003, Marty Zinkevich proposed the online convex optimization setting and showed that a particular gradient descent algorithm has regret O(T^0.5) with respect to the best predictor, where T is the number of rounds. This seems to be a nice model for online learning, and there has been some significant follow-up work. At COLT 2006, Elad Hazan, Adam Kalai, Satyen Kale, and Amit Agarwal presented a modification which takes a Newton step, guaranteeing O(log T) regret when the first and second derivatives are bounded. Then they applied these algorithms to portfolio management at ICML 2006 (with Robert Schapire), yielding some very fun graphs.
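Zinkevich's gradient descent algorithm is simple enough to sketch in a few lines. The projection radius, the step-size schedule eta_t = radius / sqrt(t), and the quadratic test losses used below are illustrative assumptions for this sketch, not the paper's exact constants:

```python
import numpy as np

def online_gradient_descent(grad_fns, x0, radius):
    """Minimal online gradient descent sketch (after Zinkevich, ICML 2003).

    grad_fns: one gradient oracle per round; radius: feasible Euclidean ball.
    With bounded gradients, the illustrative eta_t = radius / sqrt(t)
    schedule gives O(T^0.5) regret against the best fixed predictor.
    """
    x = np.asarray(x0, dtype=float)
    iterates = []
    for t, grad in enumerate(grad_fns, start=1):
        iterates.append(x.copy())        # play x_t, then observe the loss
        x = x - (radius / np.sqrt(t)) * grad(x)
        norm = np.linalg.norm(x)         # project back onto the feasible ball
        if norm > radius:
            x *= radius / norm
    return iterates
```

The Hazan-Kalai-Kale-Agarwal modification replaces the scalar step size with a Newton-style step preconditioned by accumulated second-order information, which is what buys the improvement from O(T^0.5) to O(log T) regret.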


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 At ICML 2003, Marty Zinkevich proposed the online convex optimization setting and showed that a particular gradient descent algorithm has regret O(T^0.5) with respect to the best predictor where T is the number of rounds. [sent-1, score-1.335] [sent-2, score-0.302]

3 This seems to be a nice model for online learning, and there has been some significant follow-up work. [sent-3, score-0.461]

4 At COLT 2006, Elad Hazan, Adam Kalai, Satyen Kale, and Amit Agarwal presented a modification which takes a Newton step guaranteeing O(log T) regret when the first and second derivatives are bounded. [sent-4, score-1.085]

5 Then they applied these algorithms to portfolio management at ICML 2006 (with Robert Schapire) yielding some very fun graphs. [sent-5, score-0.806]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('marty', 0.217), ('newton', 0.217), ('zinkevich', 0.217), ('regret', 0.213), ('modification', 0.201), ('amit', 0.201), ('kale', 0.201), ('satyen', 0.201), ('management', 0.19), ('guaranteeing', 0.19), ('agarwal', 0.19), ('portfolio', 0.19), ('proposed', 0.174), ('kalai', 0.168), ('elad', 0.168), ('hazan', 0.168), ('graphs', 0.163), ('adam', 0.15), ('yielding', 0.15), ('schapire', 0.147), ('showed', 0.147), ('robert', 0.143), ('online', 0.138), ('convex', 0.138), ('fun', 0.135), ('icml', 0.135), ('presented', 0.129), ('nice', 0.125), ('descent', 0.119), ('takes', 0.115), ('optimization', 0.104), ('predictor', 0.103), ('gradient', 0.103), ('log', 0.099), ('step', 0.099), ('applied', 0.092), ('colt', 0.09), ('respect', 0.085), ('second', 0.085), ('model', 0.083), ('setting', 0.081), ('significant', 0.071), ('particular', 0.069), ('best', 0.062), ('first', 0.053), ('number', 0.052), ('algorithms', 0.049), ('algorithm', 0.049), ('seems', 0.044), ('learning', 0.013)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 186 hunch net-2006-06-24-Online convex optimization at COLT


2 0.19300878 385 hunch net-2009-12-27-Interesting things at NIPS 2009

Introduction: Several papers at NIPS caught my attention. Elad Hazan and Satyen Kale, Online Submodular Optimization They define an algorithm for online optimization of submodular functions with regret guarantees. This places submodular optimization roughly on par with online convex optimization as tractable settings for online learning. Elad Hazan and Satyen Kale On Stochastic and Worst-Case Models of Investing. At its core, this is yet another example of modifying worst-case online learning to deal with variance, but the application to financial models is particularly cool and it seems plausibly superior to other common approaches for financial modeling. Mark Palatucci, Dean Pomerleau, Tom Mitchell, and Geoff Hinton Zero Shot Learning with Semantic Output Codes The goal here is predicting a label in a multiclass supervised setting where the label never occurs in the training data. They have some basic analysis and also a nice application to FMRI brain reading. Sh

3 0.14412577 252 hunch net-2007-07-01-Watchword: Online Learning

Introduction: It turns out that many different people use the term “Online Learning”, and often they don’t have the same definition in mind. Here’s a list of the possibilities I know of. Online Information Setting Online learning refers to a problem in which unlabeled data comes, a prediction is made, and then feedback is acquired. Online Adversarial Setting Online learning refers to algorithms in the Online Information Setting which satisfy guarantees of the form: “For all possible sequences of observations, the algorithm has regret at most log ( number of strategies) with respect to the best strategy in a set.” This is sometimes called online learning with experts. Online Optimization Constraint Online learning refers to optimizing a predictor via a learning algorithm that tunes parameters on a per-example basis. This may or may not be applied in the Online Information Setting, and the strategy may or may not satisfy Adversarial setting theory. Online Computational Constra

4 0.14265883 111 hunch net-2005-09-12-Fast Gradient Descent

Introduction: Nicol Schraudolph has been developing a fast gradient descent algorithm called Stochastic Meta-Descent (SMD). Gradient descent is currently untrendy in the machine learning community, but there remains a large number of people using gradient descent on neural networks or other architectures from when it was trendy in the early 1990s. There are three problems with gradient descent. Gradient descent does not necessarily produce easily reproduced results. Typical algorithms start with “set the initial parameters to small random values”. The design of the representation that gradient descent is applied to is often nontrivial. In particular, knowing exactly how to build a large neural network so that it will perform well requires knowledge which has not been made easily applicable. Gradient descent can be slow. Obviously, taking infinitesimal steps in the direction of the gradient would take forever, so some finite step size must be used. What exactly this step size should be

5 0.13134663 258 hunch net-2007-08-12-Exponentiated Gradient

Introduction: The Exponentiated Gradient algorithm by Manfred Warmuth and Jyrki Kivinen came out just as I was starting graduate school, so I missed it both at a conference and in class. It’s a fine algorithm which has a remarkable theoretical statement accompanying it. The essential statement holds in the “online learning with an adversary” setting. Initially, there is a set of n weights, which might have values (1/n,…,1/n) (or any other values from a probability distribution). Everything happens in a round-by-round fashion. On each round, the following happens: The world reveals a set of features x in {0,1} n . In the online learning with an adversary literature, the features are called “experts” and thought of as subpredictors, but this interpretation isn’t necessary—you can just use feature values as experts (or maybe the feature value and the negation of the feature value as two experts). EG makes a prediction according to y’ = w . x (dot product). The world reve
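The round structure above can be sketched as follows. The learning rate eta and the squared-loss form of the multiplicative update are my illustrative choices for this sketch, not Warmuth and Kivinen's exact statement:

```python
import numpy as np

def eg_round(w, x, y, eta=0.5):
    """One round of an Exponentiated Gradient-style update (a sketch).

    w: current weights over n experts (a probability distribution);
    x: revealed expert/feature values; y: revealed outcome.
    """
    y_pred = float(np.dot(w, x))              # predict y' = w . x
    # multiplicative update from the gradient of the squared loss (y' - y)^2
    w = w * np.exp(-2.0 * eta * (y_pred - y) * x)
    return w / w.sum(), y_pred                # renormalize to a distribution
```

Run on a stream where expert 0 is always right and expert 1 always wrong, the weight mass concentrates on expert 0 over the rounds.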

6 0.13054883 8 hunch net-2005-02-01-NIPS: Online Bayes

7 0.12169927 403 hunch net-2010-07-18-ICML & COLT 2010

8 0.11699794 279 hunch net-2007-12-19-Cool and interesting things seen at NIPS

9 0.11372234 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?

10 0.11246498 439 hunch net-2011-08-01-Interesting papers at COLT 2011

11 0.10536056 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions

12 0.098699123 188 hunch net-2006-06-30-ICML papers

13 0.096333086 267 hunch net-2007-10-17-Online as the new adjective

14 0.095094889 334 hunch net-2009-01-07-Interesting Papers at SODA 2009

15 0.094958581 309 hunch net-2008-07-10-Interesting papers, ICML 2008

16 0.094829366 388 hunch net-2010-01-24-Specializations of the Master Problem

17 0.089389406 14 hunch net-2005-02-07-The State of the Reduction

18 0.085661434 453 hunch net-2012-01-28-Why COLT?

19 0.085617401 490 hunch net-2013-11-09-Graduates and Postdocs

20 0.083321214 192 hunch net-2006-07-08-Some recent papers


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.136), (1, 0.047), (2, 0.038), (3, -0.123), (4, 0.089), (5, -0.033), (6, -0.045), (7, -0.122), (8, -0.12), (9, 0.105), (10, 0.088), (11, 0.01), (12, -0.059), (13, -0.08), (14, 0.17), (15, 0.104), (16, 0.065), (17, -0.079), (18, 0.063), (19, -0.013), (20, -0.098), (21, -0.052), (22, 0.032), (23, 0.045), (24, 0.121), (25, 0.102), (26, 0.039), (27, 0.043), (28, -0.039), (29, -0.014), (30, -0.012), (31, 0.031), (32, -0.074), (33, 0.093), (34, -0.054), (35, -0.011), (36, -0.088), (37, -0.089), (38, -0.041), (39, -0.087), (40, -0.016), (41, -0.056), (42, 0.026), (43, -0.01), (44, 0.042), (45, 0.042), (46, 0.022), (47, -0.035), (48, 0.033), (49, -0.012)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9803378 186 hunch net-2006-06-24-Online convex optimization at COLT


2 0.61959034 385 hunch net-2009-12-27-Interesting things at NIPS 2009


3 0.56849259 258 hunch net-2007-08-12-Exponentiated Gradient


4 0.55265814 252 hunch net-2007-07-01-Watchword: Online Learning


5 0.52686709 267 hunch net-2007-10-17-Online as the new adjective

Introduction: Online learning is in vogue, which means we should expect to see in the near future: Online boosting. Online decision trees. Online SVMs. (actually, we’ve already seen) Online deep learning. Online parallel learning. etc… There are three fundamental drivers of this trend. Increasing size of datasets makes online algorithms attractive. Online learning can simply be more efficient than batch learning. Here is a picture from a class on online learning: The point of this picture is that even in 3 dimensions and even with linear constraints, finding the minima of a set in an online fashion can be typically faster than finding the minima in a batch fashion. To see this, note that there is a minimal number of gradient updates (i.e. 2) required in order to reach the minima in the typical case. Given this, it’s best to do these updates as quickly as possible, which implies doing the first update online (i.e. before seeing all the examples) is preferred. Note
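The claim in the excerpt above, that a couple of online updates can already land near the minimum while a single batch gradient step must touch every example, can be illustrated with a toy one-dimensional least-squares objective. The data distribution, step sizes, and example count below are all made-up illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
# minimize the mean of (x - a_i)^2; the optimum is the mean, roughly 3.0
examples = rng.normal(loc=3.0, scale=0.1, size=1000)

# batch: one gradient step costs a full pass over all 1000 examples
x_batch = 0.0
x_batch -= 0.5 * np.mean(2 * (x_batch - examples))

# online: update after each example; two updates already land near the optimum
x_online = 0.0
for a in examples[:2]:
    x_online -= 0.5 * 2 * (x_online - a)
```

Both iterates end up close to 3, but the online run looked at only two examples before its second update, while the batch run paid for 1000 gradient evaluations to take a single step.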

6 0.52022934 439 hunch net-2011-08-01-Interesting papers at COLT 2011

7 0.48596823 308 hunch net-2008-07-06-To Dual or Not

8 0.48322427 8 hunch net-2005-02-01-NIPS: Online Bayes

9 0.48158714 109 hunch net-2005-09-08-Online Learning as the Mathematics of Accountability

10 0.48136047 309 hunch net-2008-07-10-Interesting papers, ICML 2008

11 0.47121364 167 hunch net-2006-03-27-Gradients everywhere

12 0.45806444 111 hunch net-2005-09-12-Fast Gradient Descent

13 0.44196901 361 hunch net-2009-06-24-Interesting papers at UAICMOLT 2009

14 0.43843079 78 hunch net-2005-06-06-Exact Online Learning for Classification

15 0.43017039 403 hunch net-2010-07-18-ICML & COLT 2010

16 0.42446935 281 hunch net-2007-12-21-Vowpal Wabbit Code Release

17 0.42429775 279 hunch net-2007-12-19-Cool and interesting things seen at NIPS

18 0.42074463 28 hunch net-2005-02-25-Problem: Online Learning

19 0.39703041 346 hunch net-2009-03-18-Parallel ML primitives

20 0.38934714 378 hunch net-2009-11-15-The Other Online Learning


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(27, 0.183), (45, 0.507), (55, 0.091), (94, 0.091)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.90522093 186 hunch net-2006-06-24-Online convex optimization at COLT


2 0.61774874 218 hunch net-2006-11-20-Context and the calculation misperception

Introduction: This post is really for people not in machine learning (or related fields). It is about a common misperception which affects people who have not thought about the process of trying to predict something. Hopefully, by precisely stating it, we can remove it. Suppose we have a set of events, each described by a vector of features.

0 1 0 1 1
1 0 1 0 1
1 1 0 1 0
0 0 1 1 1
1 1 0 0 1
1 0 0 0 1
0 1 1 1 0

Suppose we want to predict the value of the first feature given the others. One approach is to bin the data by one feature. For the above example, we might partition the data according to feature 2, then observe that when feature 2 is 0 the label (feature 1) is mostly 1. On the other hand, when feature 2 is 1, the label (feature 1) is mostly 0. Using this simple rule we get an observed error rate of 3/7. There are two issues here. The first is that this is really a training
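The 3/7 calculation can be checked mechanically. Reading the flattened digits as 7 events with 5 binary features each is my assumption about the intended table; the rule from the excerpt predicts label 1 when feature 2 is 0 and label 0 when feature 2 is 1:

```python
# assumed reconstruction: 7 events (rows), 5 binary features per event;
# column 0 is "feature 1" (the label), column 1 is "feature 2"
data = [
    [0, 1, 0, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 1, 1, 1, 0],
]
rule = {0: 1, 1: 0}  # predict feature 1 from feature 2, as in the post
errors = sum(rule[row[1]] != row[0] for row in data)
print(f"observed error rate: {errors}/{len(data)}")  # observed error rate: 3/7
```

This reproduces the excerpt's 3/7 observed error rate on the (assumed) training events, which is exactly the overfitting trap the post goes on to discuss.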

3 0.40559444 286 hunch net-2008-01-25-Turing’s Club for Machine Learning

Introduction: Many people in Machine Learning don’t fully understand the impact of computation, as demonstrated by a lack of big-O analysis of new learning algorithms. This is important—some current active research programs are fundamentally flawed w.r.t. computation, and other research programs are directly motivated by it. When considering a learning algorithm, I think about the following questions: How does the learning algorithm scale with the number of examples m? Any algorithm using all of the data is at least O(m), but in many cases this is O(m^2) (naive nearest neighbor for self-prediction) or unknown (k-means or many other optimization algorithms). The unknown case is very common, and it can mean (for example) that the algorithm isn’t convergent or simply that the amount of computation isn’t controlled. The above question can also be asked for test cases. In some applications, test-time performance is of great importance. How does the algorithm scale with the number of

4 0.38893205 379 hunch net-2009-11-23-ICML 2009 Workshops (and Tutorials)

Introduction: I’m the workshops chair for ICML this year. As such, I would like to personally encourage people to consider running a workshop. My general view of workshops is that they are excellent as opportunities to discuss and develop research directions—some of my best work has come from collaborations at workshops and several workshops have substantially altered my thinking about various problems. My experience running workshops is that setting them up and making them fly often appears much harder than it actually is, and the workshops often come off much better than expected in the end. Submissions are due January 18, two weeks before papers. Similarly, Ben Taskar is looking for good tutorials , which is complementary. Workshops are about exploring a subject, while a tutorial is about distilling it down into an easily taught essence, a vital part of the research process. Tutorials are due February 13, two weeks after papers.

5 0.38580614 325 hunch net-2008-11-10-ICML Reviewing Criteria

Introduction: Michael Littman and Leon Bottou have decided to use a franchise program chair approach to reviewing at ICML this year. I’ll be one of the area chairs, so I wanted to mention a few things if you are thinking about naming me. I take reviewing seriously. That means papers to be reviewed are read, the implications are considered, and decisions are only made after that. I do my best to be fair, and there are zero subjects that I consider categorical rejects. I don’t consider several arguments for rejection-not-on-the-merits reasonable . I am generally interested in papers that (a) analyze new models of machine learning, (b) provide new algorithms, and (c) show that they work empirically on plausibly real problems. If a paper has the trifecta, I’m particularly interested. With 2 out of 3, I might be interested. I often find papers with only one element harder to accept, including papers with just (a). I’m a bit tough. I rarely jump-up-and-down about a paper, because I b

6 0.38288698 51 hunch net-2005-04-01-The Producer-Consumer Model of Research

7 0.38202947 315 hunch net-2008-09-03-Bidding Problems

8 0.37952808 343 hunch net-2009-02-18-Decision by Vetocracy

9 0.37921131 320 hunch net-2008-10-14-Who is Responsible for a Bad Review?

10 0.37858981 95 hunch net-2005-07-14-What Learning Theory might do

11 0.3781957 44 hunch net-2005-03-21-Research Styles in Machine Learning

12 0.37808818 252 hunch net-2007-07-01-Watchword: Online Learning

13 0.3775413 96 hunch net-2005-07-21-Six Months

14 0.37679917 221 hunch net-2006-12-04-Structural Problems in NIPS Decision Making

15 0.37670568 424 hunch net-2011-02-17-What does Watson mean?

16 0.37623936 337 hunch net-2009-01-21-Nearly all natural problems require nonlinearity

17 0.37589243 345 hunch net-2009-03-08-Prediction Science

18 0.37552974 281 hunch net-2007-12-21-Vowpal Wabbit Code Release

19 0.37549344 371 hunch net-2009-09-21-Netflix finishes (and starts)

20 0.37467524 304 hunch net-2008-06-27-Reviewing Horror Stories