hunch_net hunch_net-2005 hunch_net-2005-28 knowledge-graph by maker-knowledge-mining

28 hunch net-2005-02-25-Problem: Online Learning


meta info for this blog

Source: html

Introduction: Despite my best intentions, this is not a fully specified problem, but rather a research direction. Competitive online learning is one of the more compelling pieces of learning theory because typical statements of the form “this algorithm will perform almost as well as a large set of other algorithms” rely only on fully-observable quantities, and are therefore applicable in many situations. Examples include Winnow, Weighted Majority, and Binomial Weighting. Algorithms with this property haven’t taken over the world yet. Here are some possible reasons: Lack of caring. Many people working on learning theory don’t care about particular applications much. This means constants in the algorithm are not optimized, usable code is often not produced, and empirical studies aren’t done. Inefficiency. Viewed from the perspective of other learning algorithms, online learning is terribly inefficient. It requires that every hypothesis (called an expert in the online learning setting) be enumerated and tested on every example.
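
The post names Weighted Majority as one of its examples; as a concrete illustration of the expert-enumeration loop just described, here is a minimal sketch of the deterministic Weighted Majority algorithm. The function name, the {0,1} prediction encoding, and the default penalty beta=0.5 are choices made for this sketch, not details taken from the post.

```python
# Minimal sketch of the Weighted Majority algorithm (illustration only).
# Experts make {0,1} predictions; each expert's weight is multiplied by
# beta whenever that expert errs.

def weighted_majority(expert_predictions, labels, beta=0.5):
    """expert_predictions: one list of {0,1} expert predictions per round;
    labels: the correct {0,1} label for each round.
    Returns the master algorithm's mistake count."""
    weights = [1.0] * len(expert_predictions[0])
    mistakes = 0
    for preds, y in zip(expert_predictions, labels):
        # The master predicts by weighted vote -- note that every expert
        # is consulted on every example, the inefficiency noted above.
        vote_1 = sum(w for w, p in zip(weights, preds) if p == 1)
        vote_0 = sum(w for w, p in zip(weights, preds) if p == 0)
        y_hat = 1 if vote_1 >= vote_0 else 0
        mistakes += int(y_hat != y)
        # Multiplicatively penalize every expert that erred this round.
        weights = [w * beta if p != y else w for w, p in zip(weights, preds)]
    return mistakes
```

The standard guarantee is that this master's mistake count is bounded by a constant times the best single expert's, plus a term logarithmic in the number of experts, while the loop makes the enumeration cost plain.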


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Despite my best intentions, this is not a fully specified problem, but rather a research direction. [sent-1, score-0.233]

2 Competitive online learning is one of the more compelling pieces of learning theory because typical statements of the form “this algorithm will perform almost as well as a large set of other algorithms” rely only on fully-observable quantities, and are therefore applicable in many situations. [sent-2, score-0.866]

3 Many people working on learning theory don’t care about particular applications much. [sent-6, score-0.218]

4 This means constants in the algorithm are not optimized, usable code is often not produced, and empirical studies aren’t done. [sent-7, score-0.448]

5 Viewed from the perspective of other learning algorithms, online learning is terribly inefficient. [sent-9, score-0.343]

6 It requires that every hypothesis (called an expert in the online learning setting) be enumerated and tested on every example. [sent-10, score-0.489]

7 (This is similar to the inefficiency of using Bayes’ law as an algorithm directly, and there are strong similarities in the algorithms. [sent-11, score-0.576]

8 ) For an example of (1), there is a mysterious factor of 2 in the Binomial Weighting algorithm which has not been fully resolved. [sent-12, score-0.386]

9 Some successful applications also exist, such as those based on SNoW. [sent-13, score-0.183]

10 The way to combat problem (2) is to introduce more structure into the hypotheses/experts. [sent-14, score-0.326]

11 Some efforts have already been made in this direction. [sent-15, score-0.089]

12 For example, it’s generally feasible to introduce an arbitrary bias or “prior” over the experts in the form of some probability distribution, and perform well with respect to that bias. [sent-16, score-0.535]

13 Another nice piece of work by Adam Kalai and Santosh Vempala discusses how to efficiently handle several forms of structured experts. [sent-17, score-0.512]

14 At an intuitive level, further work on discovering how to efficiently handle new forms of structure might pay off well. [sent-18, score-0.939]

15 The basic research direction here is to address the gap between theory and practice, possibly by solving the above problems and possibly by discovering and addressing other problems. [sent-19, score-0.83]
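
Sentence 13 points to the Kalai-Vempala line of work on structured experts. A hedged sketch of their Follow the Perturbed Leader idea follows; the names `follow_the_perturbed_leader` and `oracle`, and the uniform perturbation scale, are assumptions of this sketch rather than details from the post. The point is that an efficient offline optimizer over the structured decision set replaces enumeration of the exponentially many experts.

```python
import random

# Hedged sketch of Follow the Perturbed Leader (after Kalai & Vempala).
# Assumptions of this sketch: each round's loss is linear in a cost
# vector, and oracle(costs) is an efficient offline optimizer over the
# structured decision set (e.g., a shortest-path solver) returning the
# best decision for the given costs.

def follow_the_perturbed_leader(cost_vectors, oracle, epsilon=1.0):
    """Yield one decision per round for an online linear problem."""
    d = len(cost_vectors[0])
    cumulative = [0.0] * d
    for costs in cost_vectors:
        # Perturb the cumulative costs, then let the offline oracle pick
        # the decision that looks best in (perturbed) hindsight.
        perturbed = [c + random.uniform(0.0, 1.0 / epsilon) for c in cumulative]
        yield oracle(perturbed)
        cumulative = [a + b for a, b in zip(cumulative, costs)]
```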


similar blogs computed by the tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('inefficiency', 0.258), ('binomial', 0.226), ('introduce', 0.207), ('weighting', 0.193), ('discovering', 0.183), ('possibly', 0.174), ('forms', 0.161), ('efficiently', 0.161), ('perform', 0.153), ('fully', 0.142), ('mysterious', 0.129), ('payoff', 0.129), ('terribly', 0.129), ('online', 0.123), ('similarities', 0.12), ('usable', 0.12), ('structure', 0.119), ('theory', 0.117), ('algorithm', 0.115), ('constants', 0.113), ('winnow', 0.113), ('discusses', 0.108), ('intuitive', 0.103), ('applications', 0.101), ('kalai', 0.1), ('studies', 0.1), ('applicable', 0.097), ('every', 0.094), ('feasible', 0.094), ('produced', 0.094), ('specified', 0.091), ('rely', 0.091), ('gap', 0.091), ('addressing', 0.091), ('perspective', 0.091), ('tested', 0.091), ('adam', 0.089), ('efforts', 0.089), ('quantities', 0.089), ('therefore', 0.089), ('algorithms', 0.088), ('competitive', 0.087), ('expert', 0.087), ('law', 0.083), ('optimized', 0.083), ('development', 0.083), ('piece', 0.082), ('succesful', 0.082), ('majority', 0.082), ('form', 0.081)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 28 hunch net-2005-02-25-Problem: Online Learning

2 0.21092767 43 hunch net-2005-03-18-Binomial Weighting

Introduction: Suppose we have a set of classifiers c making binary predictions from an input x and we see examples in an online fashion. In particular, we repeatedly see an unlabeled example x, make a prediction y’ (possibly based on the classifiers c), and then see the correct label y. When one of these classifiers is perfect, there is a great algorithm available: predict according to the majority vote over every classifier consistent with every previous example. This is called the Halving algorithm. It makes at most log_2 |c| mistakes since on any mistake, at least half of the classifiers are eliminated. Obviously, we can’t generally hope that there exists a classifier which never errs. The Binomial Weighting algorithm is an elegant technique allowing a variant Halving algorithm to cope with errors by creating a set of virtual classifiers for every classifier which occasionally disagree with the original classifier. The Halving algorithm on this set of virtual classifiers
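
A minimal sketch of the Halving algorithm described above, under the assumption that `classifiers` is a finite list of functions mapping x to {0,1} and that one of them is perfect; the names and the generator structure are choices for this sketch:

```python
# Minimal sketch of the Halving algorithm (illustration only). If some
# classifier is perfect, each master mistake eliminates at least half of
# the surviving classifiers, so at most log2(len(classifiers)) mistakes.

def halving(classifiers, stream):
    consistent = list(classifiers)
    for x, y in stream:  # stream yields (example, correct_label) pairs
        votes = [c(x) for c in consistent]
        y_hat = 1 if 2 * sum(votes) >= len(votes) else 0  # majority vote
        yield y_hat
        # Keep only the classifiers consistent with this example too.
        consistent = [c for c, v in zip(consistent, votes) if v == y]
```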

3 0.16548215 332 hunch net-2008-12-23-Use of Learning Theory

Introduction: I’ve had serious conversations with several people who believe that the theory in machine learning is “only useful for getting papers published”. That’s a compelling statement, as I’ve seen many papers where the algorithm clearly came first, and the theoretical justification for it came second, purely as a perceived means to improve the chance of publication. Naturally, I disagree and believe that learning theory has much more substantial applications. Even in core learning algorithm design, I’ve found learning theory to be useful, although its application is more subtle than many realize. The most straightforward applications can fail, because (as expectation suggests) worst case bounds tend to be loose in practice (*). In my experience, considering learning theory when designing an algorithm has two important effects in practice: It can help make your algorithm behave right at a crude level of analysis, leaving finer details to tuning or common sense. The best example

4 0.16121265 109 hunch net-2005-09-08-Online Learning as the Mathematics of Accountability

Introduction: Accountability is a social problem. When someone screws up, do you fire them? Or do you accept the error and let them continue? This is a very difficult problem and we all know of stories where the wrong decision was made. Online learning (as meant here) is a subfield of learning theory which analyzes the online learning model. In the online learning model, there are a set of hypotheses or “experts”. On any instance x, each expert makes a prediction y. A master algorithm A uses these predictions to form its own prediction y_A and then learns the correct prediction y^*. This process repeats. The goal of online learning is to find a master algorithm A which uses the advice of the experts to make good predictions. In particular, we typically want to guarantee that the master algorithm performs almost as well as the best expert. If L(e) is the loss of expert e and L(A) is the loss of the master algorithm, it is often possible to prove: L(A) les
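
The inequality at the end of this excerpt is cut off by the extraction. For N experts over T rounds, the typical form of such a guarantee (a reconstruction in standard notation, not necessarily the exact bound the post goes on to state) is:

```latex
% Typical experts-setting regret guarantee (reconstruction):
L(A) \;\le\; \min_{e} L(e) + O\!\left(\sqrt{T \log N}\right)
```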

5 0.13749221 8 hunch net-2005-02-01-NIPS: Online Bayes

Introduction: One nice use for this blog is to consider and discuss papers that have appeared at recent conferences. I really enjoyed Andrew Ng and Sham Kakade’s paper Online Bounds for Bayesian Algorithms . From the paper: The philosophy taken in the Bayesian methodology is often at odds with that in the online learning community…. the online learning setting makes rather minimal assumptions on the conditions under which the data are being presented to the learner — usually, Nature could provide examples in an adversarial manner. We study the performance of Bayesian algorithms in a more adversarial setting… We provide competitive bounds when the cost function is the log loss, and we compare our performance to the best model in our model class (as in the experts setting). It’s a very nice analysis of some of my favorite algorithms that all hinges around a beautiful theorem: Let Q be any distribution over parameters theta. Then for all sequences S: L_{Bayes}(S) leq L_Q(S)
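
The theorem statement above is truncated. The standard form of the Kakade-Ng bound, which I believe is what the excerpt is quoting (so treat the right-hand side as a reconstruction), compares the Bayesian mixture’s cumulative log loss to the Q-averaged loss plus a KL term against the prior P:

```latex
% Standard form of the online Bayes bound (reconstruction of the
% truncated statement): for any distribution Q over parameters theta,
% any prior P, and all sequences S,
L_{\mathrm{Bayes}}(S) \;\le\; \mathbb{E}_{\theta \sim Q}\!\left[L_{\theta}(S)\right] + \mathrm{KL}(Q \,\|\, P)
```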

6 0.13147677 12 hunch net-2005-02-03-Learning Theory, by assumption

7 0.1270885 388 hunch net-2010-01-24-Specializations of the Master Problem

8 0.11866391 95 hunch net-2005-07-14-What Learning Theory might do

9 0.11610167 131 hunch net-2005-11-16-The Everything Ensemble Edge

10 0.11149801 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

11 0.1074748 454 hunch net-2012-01-30-ICML Posters and Scope

12 0.10598285 252 hunch net-2007-07-01-Watchword: Online Learning

13 0.10213345 235 hunch net-2007-03-03-All Models of Learning have Flaws

14 0.10110869 22 hunch net-2005-02-18-What it means to do research.

15 0.10109165 34 hunch net-2005-03-02-Prior, “Prior” and Bias

16 0.10064296 378 hunch net-2009-11-15-The Other Online Learning

17 0.098649532 68 hunch net-2005-05-10-Learning Reductions are Reductionist

18 0.097076021 267 hunch net-2007-10-17-Online as the new adjective

19 0.095091082 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design

20 0.094562091 286 hunch net-2008-01-25-Turing’s Club for Machine Learning


similar blogs computed by the lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.236), (1, 0.126), (2, -0.021), (3, 0.01), (4, 0.039), (5, -0.054), (6, 0.007), (7, 0.023), (8, 0.009), (9, 0.078), (10, 0.055), (11, -0.002), (12, 0.088), (13, -0.039), (14, 0.109), (15, 0.018), (16, 0.053), (17, -0.099), (18, -0.021), (19, -0.015), (20, -0.021), (21, -0.07), (22, 0.051), (23, -0.024), (24, 0.028), (25, -0.011), (26, 0.019), (27, 0.051), (28, 0.019), (29, 0.03), (30, 0.002), (31, 0.016), (32, -0.006), (33, 0.052), (34, 0.024), (35, 0.046), (36, 0.002), (37, -0.06), (38, 0.088), (39, -0.027), (40, -0.067), (41, 0.058), (42, -0.068), (43, -0.045), (44, 0.054), (45, 0.032), (46, -0.046), (47, 0.021), (48, -0.026), (49, 0.079)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95730728 28 hunch net-2005-02-25-Problem: Online Learning

2 0.80647641 109 hunch net-2005-09-08-Online Learning as the Mathematics of Accountability

3 0.71150297 163 hunch net-2006-03-12-Online learning or online preservation of learning?

Introduction: In the online learning with experts setting, you observe a set of predictions, make a decision, and then observe the truth. This process repeats indefinitely. In this setting, it is possible to prove theorems of the sort: master algorithm error count <= k * (best predictor error count) + c * log(number of predictors) Is this a statement about learning or about preservation of learning? We did some experiments to analyze the new Binning algorithm which works in this setting. For several UCI datasets, we reprocessed them so that features could be used as predictors and then applied several master algorithms. The first graph confirms that Binning is indeed a better algorithm according to the tightness of the upper bound. Here, “Best” is the performance of the best expert. “V. Bound” is the bound for Vovk’s algorithm (the previous best). “Bound” is the bound for the Binning algorithm. “Binning” is the performance of the Binning algorithm. The Binning algorithm clearly h

4 0.70535243 397 hunch net-2010-05-02-What’s the difference between gambling and rewarding good prediction?

Introduction: After a major financial crisis, there is much discussion about how finance has become a casino: gambling with others’ money, keeping the winnings, and walking away when the money is lost. When thinking about financial reform, all the many losers in the above scenario are apt to take the view that this activity should be completely, or nearly completely, curtailed. But a more thoughtful view is that sometimes there is a real sense in which there are right and wrong decisions, and we as a society would really prefer that the people most likely to make right decisions are making them. A crucial question then is: “What is the difference between gambling and rewarding good prediction?” We discussed this before the financial crisis. The cheat-sheet sketch is that the online learning against an adversary problem, algorithm, and theorems provide a good mathematical model for thinking about this question. What I would like to do here is map this onto various types of financial t

5 0.69317824 43 hunch net-2005-03-18-Binomial Weighting

6 0.66961426 235 hunch net-2007-03-03-All Models of Learning have Flaws

7 0.66696364 332 hunch net-2008-12-23-Use of Learning Theory

8 0.66189277 104 hunch net-2005-08-22-Do you believe in induction?

9 0.64364195 347 hunch net-2009-03-26-Machine Learning is too easy

10 0.64170146 8 hunch net-2005-02-01-NIPS: Online Bayes

11 0.63472283 133 hunch net-2005-11-28-A question of quantification

12 0.63393611 126 hunch net-2005-10-26-Fallback Analysis is a Secret to Useful Algorithms

13 0.63385063 351 hunch net-2009-05-02-Wielding a New Abstraction

14 0.63230366 12 hunch net-2005-02-03-Learning Theory, by assumption

15 0.62246901 388 hunch net-2010-01-24-Specializations of the Master Problem

16 0.61834162 148 hunch net-2006-01-13-Benchmarks for RL

17 0.61132258 158 hunch net-2006-02-24-A Fundamentalist Organization of Machine Learning

18 0.6092732 286 hunch net-2008-01-25-Turing’s Club for Machine Learning

19 0.60872161 378 hunch net-2009-11-15-The Other Online Learning

20 0.60379231 252 hunch net-2007-07-01-Watchword: Online Learning


similar blogs computed by the lda model

lda for this blog:

topicId topicWeight

[(0, 0.023), (3, 0.049), (27, 0.218), (29, 0.282), (38, 0.068), (53, 0.114), (55, 0.075), (94, 0.079)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.87730336 28 hunch net-2005-02-25-Problem: Online Learning

2 0.83686388 357 hunch net-2009-05-30-Many ways to Learn this summer

Introduction: There are at least 3 summer schools related to machine learning this summer. The first is at University of Chicago June 1-11 organized by Misha Belkin , Partha Niyogi , and Steve Smale . Registration is closed for this one, meaning they met their capacity limit. The format is essentially an extended Tutorial/Workshop. I was particularly interested to see Valiant amongst the speakers. I’m also presenting Saturday June 6, on logarithmic time prediction. Praveen Srinivasan points out the second at Peking University in Beijing, China, July 20-27. This one differs substantially, as it is about vision, machine learning, and their intersection. The deadline for applications is June 10 or 15. This is also another example of the growth of research in China, with active support from NSF . The third one is at Cambridge , England, August 29-September 10. It’s in the MLSS series . Compared to the Chicago one, this one is more about the Bayesian side of ML, alth

3 0.74479985 409 hunch net-2010-09-13-AIStats

Introduction: Geoff Gordon points out AIStats 2011 in Ft. Lauderdale, Florida. The call for papers is now out, due Nov. 1. The plan is to experiment with the review process to encourage quality in several ways. I expect to submit a paper and would encourage others with good research to do likewise.

4 0.69602823 1 hunch net-2005-01-19-Why I decided to run a weblog.

Introduction: I have decided to run a weblog on machine learning and learning theory research. Here are some reasons: 1) Weblogs enable new functionality: Public comment on papers. No mechanism for this exists at conferences and most journals. I have encountered it once for a science paper. Some communities have mailing lists supporting this, but not machine learning or learning theory. I have often read papers and found myself wishing there was some method to consider other’s questions and read the replies. Conference shortlists. One of the most common conversations at a conference is “what did you find interesting?” There is no explicit mechanism for sharing this information at conferences, and it’s easy to imagine that it would be handy to do so. Evaluation and comment on research directions. Papers are almost exclusively about new research, rather than evaluation (and consideration) of research directions. This last role is satisfied by funding agencies to some extent, but

5 0.68908358 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem

Introduction: “Deep learning” is used to describe learning architectures which have significant depth (as a circuit). One claim is that shallow architectures (one or two layers) cannot concisely represent some functions while a circuit with more depth can concisely represent these same functions. Proving lower bounds on the size of a circuit is substantially harder than upper bounds (which are constructive), but some results are known. Luca Trevisan’s class notes detail how XOR is not concisely representable by “AC0” (= constant depth unbounded fan-in AND, OR, NOT gates). This doesn’t quite prove that depth is necessary for the representations commonly used in learning (such as a thresholded weighted sum), but it is strongly suggestive that this is so. Examples like this are a bit disheartening because existing algorithms for deep learning (deep belief nets, gradient descent on deep neural networks, and perhaps decision trees depending on who you ask) can’t learn XOR very easily.
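
As a small concrete companion to the depth claim above: no single linear threshold unit can compute XOR, but a two-layer threshold network represents it exactly. The weights below are one standard choice, an illustration rather than anything from the post.

```python
# XOR with a depth-2 threshold network (illustration). A single linear
# threshold unit cannot compute XOR, but two layers suffice.

def step(z):
    return 1 if z >= 0 else 0

def xor_two_layer(x1, x2):
    h1 = step(x1 + x2 - 0.5)    # fires on OR(x1, x2)
    h2 = step(1.5 - x1 - x2)    # fires on NAND(x1, x2)
    return step(h1 + h2 - 1.5)  # AND of the two hidden units

assert [xor_two_layer(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 0]
```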

6 0.6829381 131 hunch net-2005-11-16-The Everything Ensemble Edge

7 0.67620504 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

8 0.67619151 19 hunch net-2005-02-14-Clever Methods of Overfitting

9 0.67130446 201 hunch net-2006-08-07-The Call of the Deep

10 0.66971421 41 hunch net-2005-03-15-The State of Tight Bounds

11 0.66912514 351 hunch net-2009-05-02-Wielding a New Abstraction

12 0.66908318 347 hunch net-2009-03-26-Machine Learning is too easy

13 0.66810435 95 hunch net-2005-07-14-What Learning Theory might do

14 0.66744149 286 hunch net-2008-01-25-Turing’s Club for Machine Learning

15 0.6670627 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models

16 0.66684389 370 hunch net-2009-09-18-Necessary and Sufficient Research

17 0.66610783 358 hunch net-2009-06-01-Multitask Poisoning

18 0.66519773 79 hunch net-2005-06-08-Question: “When is the right time to insert the loss function?”

19 0.66511607 152 hunch net-2006-01-30-Should the Input Representation be a Vector?

20 0.66497248 177 hunch net-2006-05-05-An ICML reject