hunch_net hunch_net-2007 hunch_net-2007-267 knowledge-graph by maker-knowledge-mining

267 hunch net-2007-10-17-Online as the new adjective


meta info for this blog

Source: html

Introduction: Online learning is in vogue, which means we should expect to see in the near future: Online boosting. Online decision trees. Online SVMs. (actually, we’ve already seen) Online deep learning. Online parallel learning. etc… There are three fundamental drivers of this trend. Increasing size of datasets makes online algorithms attractive. Online learning can simply be more efficient than batch learning. Here is a picture from a class on online learning: The point of this picture is that even in 3 dimensions and even with linear constraints, finding the minima of a set in an online fashion can be typically faster than finding the minima in a batch fashion. To see this, note that there is a minimal number of gradient updates (i.e. 2) required in order to reach the minima in the typical case. Given this, it’s best to do these updates as quickly as possible, which implies doing the first update online (i.e. before seeing all the examples) is preferred. Note
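To make the update-counting argument concrete, here is a minimal sketch (not from the post; the data, step sizes, and squared loss are invented for illustration) contrasting batch gradient descent, which pays one full pass over the data per update, with a single online pass of stochastic gradient descent:

```python
import numpy as np

# Toy regression data: y = <w*, x> + noise. Entirely illustrative.
rng = np.random.default_rng(0)
n, d = 10000, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=n)

def loss(w):
    return 0.5 * np.mean((X @ w - y) ** 2)

# Batch gradient descent: one update per full pass over all n examples.
w_batch = np.zeros(d)
for _ in range(2):                       # 2 updates = 2 full passes
    grad = X.T @ (X @ w_batch - y) / n
    w_batch -= 0.5 * grad

# Online (SGD): one update per example, i.e. updates begin
# before all examples have been seen.
w_online = np.zeros(d)
for t in range(n):
    x_t, y_t = X[t], y[t]
    grad_t = (x_t @ w_online - y_t) * x_t
    w_online -= 0.1 / np.sqrt(t + 1) * grad_t

print("batch loss after 2 passes:", loss(w_batch))
print("online loss after 1 pass :", loss(w_online))
```

In runs like this, the single online pass typically ends at a comparable or lower loss than two full batch passes, which is the sense in which doing the first update before seeing all the examples is preferred.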


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Online learning is in vogue, which means we should expect to see in the near future: Online boosting. [sent-1, score-0.139]

2 (actually, we’ve already seen) Online deep learning. [sent-4, score-0.127]

3 etc… There are three fundamental drivers of this trend. [sent-6, score-0.077]

4 Increasing size of datasets makes online algorithms attractive. [sent-7, score-0.698]

5 Online learning can simply be more efficient than batch learning. [sent-8, score-0.239]

6 Here is a picture from a class on online learning: The point of this picture is that even in 3 dimensions and even with linear constraints, finding the minima of a set in an online fashion can be typically faster than finding the minima in a batch fashion. [sent-9, score-3.073]

7 To see this, note that there is a minimal number of gradient updates (i.e. 2) required in order to reach the minima in the typical case. [sent-10, score-0.554] [sent-12, score-0.55]

8 Given this, it’s best to do these updates as quickly as possible, which implies doing the first update online (i.e. before seeing all the examples) is preferred. [sent-13, score-0.92]

9 Note that this is the simplest possible setting—making the constraints nonlinear, increasing the dimensionality, introducing noise, etc… all make online learning preferred. [sent-16, score-1.213]

10 Online learning is conceptually harder, and hence more researchable. [sent-17, score-0.201]

11 The essential difficulty is that there is no well-defined static global optimum, as exists for many other optimization-based learning algorithms. [sent-18, score-0.487]

12 Grappling with this lack is of critical importance. [sent-19, score-0.138]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('online', 0.562), ('minima', 0.328), ('picture', 0.227), ('increasing', 0.204), ('updates', 0.198), ('batch', 0.173), ('constraints', 0.154), ('static', 0.136), ('introducing', 0.126), ('optima', 0.126), ('finding', 0.124), ('note', 0.122), ('conceptually', 0.119), ('etc', 0.113), ('dimensions', 0.105), ('fashion', 0.105), ('seeing', 0.099), ('dimensionality', 0.099), ('minimal', 0.097), ('nonlinear', 0.097), ('reach', 0.094), ('global', 0.086), ('possible', 0.084), ('simplest', 0.083), ('hence', 0.082), ('noise', 0.082), ('quickly', 0.082), ('parallel', 0.082), ('defined', 0.078), ('update', 0.078), ('three', 0.077), ('importance', 0.076), ('faster', 0.074), ('see', 0.072), ('harder', 0.072), ('critical', 0.071), ('datasets', 0.07), ('even', 0.067), ('lack', 0.067), ('near', 0.067), ('typical', 0.066), ('size', 0.066), ('efficient', 0.066), ('gradient', 0.065), ('already', 0.065), ('exists', 0.063), ('difficulty', 0.062), ('deep', 0.062), ('required', 0.062), ('essential', 0.062)]
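The exact tokenization and weighting behind these scores is not stated on this page; a standard recipe for producing such per-word tfidf weights and the document similarities below looks roughly like this sketch (scikit-learn and the toy documents are assumptions):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical corpus: each blog post is one document.
posts = [
    "Online learning is in vogue ... online boosting, online SVMs ...",
    "Watchword: online learning has several distinct definitions ...",
    "Interesting papers at ICML 2007 on active learning and UCT ...",
]

vec = TfidfVectorizer(stop_words="english")
tfidf = vec.fit_transform(posts)          # shape: (n_posts, n_terms)

# Top-weighted terms for the first post, analogous to the list above.
row = tfidf[0].toarray().ravel()
terms = vec.get_feature_names_out()
top = sorted(zip(terms, row), key=lambda p: -p[1])[:5]
print([(t, round(w, 3)) for t, w in top])

# Cosine similarity between tfidf rows gives "similar blogs" scores.
print(cosine_similarity(tfidf[0], tfidf[1]))
```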

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 267 hunch net-2007-10-17-Online as the new adjective

Introduction: Online learning is in vogue, which means we should expect to see in the near future: Online boosting. Online decision trees. Online SVMs. (actually, we’ve already seen) Online deep learning. Online parallel learning. etc… There are three fundamental drivers of this trend. Increasing size of datasets makes online algorithms attractive. Online learning can simply be more efficient than batch learning. Here is a picture from a class on online learning: The point of this picture is that even in 3 dimensions and even with linear constraints, finding the minima of a set in an online fashion can be typically faster than finding the minima in a batch fashion. To see this, note that there is a minimal number of gradient updates (i.e. 2) required in order to reach the minima in the typical case. Given this, it’s best to do these updates as quickly as possible, which implies doing the first update online (i.e. before seeing all the examples) is preferred. Note

2 0.30820951 252 hunch net-2007-07-01-Watchword: Online Learning

Introduction: It turns out that many different people use the term “Online Learning”, and often they don’t have the same definition in mind. Here’s a list of the possibilities I know of. Online Information Setting Online learning refers to a problem in which unlabeled data comes, a prediction is made, and then feedback is acquired. Online Adversarial Setting Online learning refers to algorithms in the Online Information Setting which satisfy guarantees of the form: “For all possible sequences of observations, the algorithm has regret at most log(number of strategies) with respect to the best strategy in a set.” This is sometimes called online learning with experts. Online Optimization Constraint Online learning refers to optimizing a predictor via a learning algorithm that tunes parameters on a per-example basis. This may or may not be applied in the Online Information Setting, and the strategy may or may not satisfy Adversarial setting theory. Online Computational Constra
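A minimal sketch of the adversarial experts setting described above, using the multiplicative-weights (Hedge) update; the number of experts, the losses, and the learning rate are invented, and the logarithmic regret quoted above holds only for particular losses, so this only shows the general mechanics:

```python
import numpy as np

rng = np.random.default_rng(1)
T, N = 1000, 10                 # rounds and number of experts (strategies)
eta = np.sqrt(np.log(N) / T)    # a standard learning-rate choice

weights = np.ones(N)
alg_loss, expert_loss = 0.0, np.zeros(N)

for t in range(T):
    p = weights / weights.sum()            # distribution over experts
    losses = rng.uniform(size=N)           # an adversary could pick these arbitrarily
    alg_loss += p @ losses                 # expected loss of the master algorithm
    expert_loss += losses
    weights *= np.exp(-eta * losses)       # multiplicative-weights update

regret = alg_loss - expert_loss.min()
print(f"regret vs best expert: {regret:.1f}  (order sqrt(T log N) = {np.sqrt(T*np.log(N)):.1f})")
```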

3 0.19450614 378 hunch net-2009-11-15-The Other Online Learning

Introduction: If you search for “online learning” with any major search engine, it’s interesting to note that zero of the results are for online machine learning. This may not be a mistake if you are committed to a global ordering. In other words, the number of people specifically interested in the least interesting top-10 online human learning result might exceed the number of people interested in online machine learning, even given the presence of the other 9 results. The essential observation here is that the process of human learning is a big business (around 5% of GDP) affecting virtually everyone. The internet is changing this dramatically, by altering the economics of teaching. Consider two possibilities: The classroom-style teaching environment continues as is, with many teachers for the same subject. All the teachers for one subject get together, along with perhaps a factor of 2 more people who are experts in online delivery. They spend a factor of 4 more time designing

4 0.17756985 251 hunch net-2007-06-24-Interesting Papers at ICML 2007

Introduction: Here are a few of the papers I enjoyed at ICML. Steffen Bickel, Michael Brückner, Tobias Scheffer, Discriminative Learning for Differing Training and Test Distributions. There is a nice trick in this paper: they predict the probability that an unlabeled sample is in the training set vs. the test set, and then use this prediction to importance weight labeled samples in the training set. This paper uses a specific parametric model, but the approach is easily generalized. Steve Hanneke, A Bound on the Label Complexity of Agnostic Active Learning. This paper bounds the number of labels required by the A^2 algorithm for active learning in the agnostic case. Last year we figured out agnostic active learning was possible. This year, it’s quantified. Hopefully soon, it will be practical. Sylvain Gelly, David Silver, Combining Online and Offline Knowledge in UCT. This paper is about techniques for improving MoGo with various sorts of learning. MoGo has a fair
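The Bickel et al. trick can be sketched generically as follows (this uses off-the-shelf logistic regression rather than their specific parametric model; the data and the train/test split are invented):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Covariate shift: training and test inputs come from different distributions.
X_train = rng.normal(loc=0.0, size=(2000, 2))
X_test  = rng.normal(loc=1.0, size=(2000, 2))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

# Step 1: predict "is this point from the test set?" from x alone.
X_all = np.vstack([X_train, X_test])
is_test = np.r_[np.zeros(len(X_train)), np.ones(len(X_test))]
domain_clf = LogisticRegression().fit(X_all, is_test)

# Step 2: weight each labeled training point by p(test|x) / p(train|x).
p_test = domain_clf.predict_proba(X_train)[:, 1]
weights = p_test / (1.0 - p_test)

# Step 3: train the actual predictor with these importance weights.
clf = LogisticRegression().fit(X_train, y_train, sample_weight=weights)
```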

5 0.16700713 8 hunch net-2005-02-01-NIPS: Online Bayes

Introduction: One nice use for this blog is to consider and discuss papers that have appeared at recent conferences. I really enjoyed Andrew Ng and Sham Kakade’s paper Online Bounds for Bayesian Algorithms. From the paper: The philosophy taken in the Bayesian methodology is often at odds with that in the online learning community…. the online learning setting makes rather minimal assumptions on the conditions under which the data are being presented to the learner —usually, Nature could provide examples in an adversarial manner. We study the performance of Bayesian algorithms in a more adversarial setting… We provide competitive bounds when the cost function is the log loss, and we compare our performance to the best model in our model class (as in the experts setting). It’s a very nice analysis of some of my favorite algorithms that all hinges around a beautiful theorem: Let Q be any distribution over parameters theta. Then for all sequences S: L_{Bayes}(S) ≤ L_Q(S)
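To make the flavor of that theorem concrete: for log loss with a uniform prior over a finite model class, the Bayes mixture is within log(number of models) of the best model on any sequence. A toy sketch (Bernoulli experts; the KL form and constants of the actual bound are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(3)
thetas = np.linspace(0.05, 0.95, 10)      # model class: Bernoulli(theta) experts
prior = np.full(len(thetas), 1.0 / len(thetas))

seq = rng.binomial(1, 0.7, size=500)      # an arbitrary binary sequence

post = prior.copy()
bayes_loss = 0.0
expert_loss = np.zeros(len(thetas))

for y in seq:
    p_y = thetas if y == 1 else 1.0 - thetas     # each expert's probability of y
    bayes_p = post @ p_y                         # Bayes mixture probability of y
    bayes_loss += -np.log(bayes_p)               # log loss of the mixture
    expert_loss += -np.log(p_y)                  # log loss of each expert
    post = post * p_y                            # Bayesian posterior update
    post /= post.sum()

# Holds for every sequence: mixture loss <= best expert loss + log(number of experts).
print(bayes_loss, expert_loss.min() + np.log(len(thetas)))
```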

6 0.16082762 109 hunch net-2005-09-08-Online Learning as the Mathematics of Accountability

7 0.14822035 419 hunch net-2010-12-04-Vowpal Wabbit, version 5.0, and the second heresy

8 0.13953672 346 hunch net-2009-03-18-Parallel ML primitives

9 0.1390802 451 hunch net-2011-12-13-Vowpal Wabbit version 6.1 & the NIPS tutorial

10 0.12671365 258 hunch net-2007-08-12-Exponentiated Gradient

11 0.1238158 388 hunch net-2010-01-24-Specializations of the Master Problem

12 0.11756741 385 hunch net-2009-12-27-Interesting things at NIPS 2009

13 0.11165247 134 hunch net-2005-12-01-The Webscience Future

14 0.10968354 365 hunch net-2009-07-31-Vowpal Wabbit Open Source Project

15 0.10502458 281 hunch net-2007-12-21-Vowpal Wabbit Code Release

16 0.10292365 308 hunch net-2008-07-06-To Dual or Not

17 0.10275127 373 hunch net-2009-10-03-Static vs. Dynamic multiclass prediction

18 0.10200346 309 hunch net-2008-07-10-Interesting papers, ICML 2008

19 0.099042587 337 hunch net-2009-01-21-Nearly all natural problems require nonlinearity

20 0.097817086 43 hunch net-2005-03-18-Binomial Weighting


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.189), (1, 0.103), (2, -0.043), (3, -0.021), (4, 0.144), (5, -0.001), (6, -0.146), (7, -0.082), (8, -0.092), (9, 0.23), (10, 0.108), (11, -0.009), (12, 0.142), (13, -0.142), (14, 0.114), (15, 0.013), (16, 0.043), (17, -0.084), (18, 0.098), (19, 0.03), (20, -0.049), (21, -0.079), (22, -0.014), (23, 0.028), (24, 0.123), (25, 0.056), (26, 0.081), (27, 0.003), (28, 0.004), (29, 0.061), (30, 0.081), (31, 0.087), (32, -0.026), (33, -0.033), (34, -0.02), (35, 0.058), (36, 0.107), (37, 0.032), (38, -0.1), (39, -0.039), (40, 0.025), (41, 0.123), (42, -0.022), (43, -0.019), (44, -0.068), (45, -0.006), (46, -0.087), (47, 0.078), (48, -0.011), (49, -0.054)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98994756 267 hunch net-2007-10-17-Online as the new adjective

Introduction: Online learning is in vogue, which means we should expect to see in the near future: Online boosting. Online decision trees. Online SVMs. (actually, we’ve already seen) Online deep learning. Online parallel learning. etc… There are three fundamental drivers of this trend. Increasing size of datasets makes online algorithms attractive. Online learning can simply be more efficient than batch learning. Here is a picture from a class on online learning: The point of this picture is that even in 3 dimensions and even with linear constraints, finding the minima of a set in an online fashion can be typically faster than finding the minima in a batch fashion. To see this, note that there is a minimal number of gradient updates (i.e. 2) required in order to reach the minima in the typical case. Given this, it’s best to do these updates as quickly as possible, which implies doing the first update online (i.e. before seeing all the examples) is preferred. Note

2 0.92167652 252 hunch net-2007-07-01-Watchword: Online Learning

Introduction: It turns out that many different people use the term “Online Learning”, and often they don’t have the same definition in mind. Here’s a list of the possibilities I know of. Online Information Setting Online learning refers to a problem in which unlabeled data comes, a prediction is made, and then feedback is acquired. Online Adversarial Setting Online learning refers to algorithms in the Online Information Setting which satisfy guarantees of the form: “For all possible sequences of observations, the algorithm has regret at most log(number of strategies) with respect to the best strategy in a set.” This is sometimes called online learning with experts. Online Optimization Constraint Online learning refers to optimizing a predictor via a learning algorithm that tunes parameters on a per-example basis. This may or may not be applied in the Online Information Setting, and the strategy may or may not satisfy Adversarial setting theory. Online Computational Constra

3 0.74586874 308 hunch net-2008-07-06-To Dual or Not

Introduction: Yoram and Shai’s online learning tutorial at ICML brings up a question for me, “Why use the dual?” The basic setting is learning a weight vector w_i so that the function f(x) = sum_i w_i x_i optimizes some convex loss function. The functional view of the dual is that instead of (or in addition to) keeping track of w_i over the feature space, you keep track of a vector a_j over the examples and define w_i = sum_j a_j x_ji. The above view of duality makes operating in the dual appear unnecessary, because in the end a weight vector is always used. The tutorial suggests that thinking about the dual gives a unified algorithmic font for deriving online learning algorithms. I haven’t worked with the dual representation much myself, but I have seen a few examples where it appears helpful. Noise When doing online optimization (i.e. online learning where you are allowed to look at individual examples multiple times), the dual representation may be helpfu
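A minimal sketch of that primal/dual correspondence using the perceptron (the data is invented and this is not from the tutorial): keeping dual coefficients a_j over examples and defining w_i = sum_j a_j x_ji gives exactly the same predictions and updates as the primal form, and swapping the inner product for a kernel is what makes the dual view pay off.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 5))
y = np.sign(X @ rng.normal(size=5))

w = np.zeros(5)        # primal weight vector w_i
a = np.zeros(200)      # dual coefficients a_j, one per example

for j in range(len(X)):
    # Primal and dual predictions agree: <w, x_j> = sum_k a_k <x_k, x_j>
    if y[j] * (X[j] @ w) <= 0:
        w += y[j] * X[j]       # primal perceptron update
        a[j] += y[j]           # the same update expressed in the dual

# w is recoverable from the dual coefficients: w_i = sum_j a_j x_ji
assert np.allclose(w, a @ X)
```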

4 0.74315333 378 hunch net-2009-11-15-The Other Online Learning

Introduction: If you search for “online learning” with any major search engine, it’s interesting to note that zero of the results are for online machine learning. This may not be a mistake if you are committed to a global ordering. In other words, the number of people specifically interested in the least interesting top-10 online human learning result might exceed the number of people interested in online machine learning, even given the presence of the other 9 results. The essential observation here is that the process of human learning is a big business (around 5% of GDP) affecting virtually everyone. The internet is changing this dramatically, by altering the economics of teaching. Consider two possibilities: The classroom-style teaching environment continues as is, with many teachers for the same subject. All the teachers for one subject get together, along with perhaps a factor of 2 more people who are experts in online delivery. They spend a factor of 4 more time designing

5 0.71898878 109 hunch net-2005-09-08-Online Learning as the Mathematics of Accountability

Introduction: Accountability is a social problem. When someone screws up, do you fire them? Or do you accept the error and let them continue? This is a very difficult problem and we all know of stories where the wrong decision was made. Online learning (as meant here) is a subfield of learning theory which analyzes the online learning model. In the online learning model, there are a set of hypotheses or “experts”. On any instance x, each expert makes a prediction y. A master algorithm A uses these predictions to form its own prediction y_A and then learns the correct prediction y^*. This process repeats. The goal of online learning is to find a master algorithm A which uses the advice of the experts to make good predictions. In particular, we typically want to guarantee that the master algorithm performs almost as well as the best expert. If L(e) is the loss of expert e and L(A) is the loss of the master algorithm, it is often possible to prove: L(A) ≤

6 0.66114521 385 hunch net-2009-12-27-Interesting things at NIPS 2009

7 0.66031712 8 hunch net-2005-02-01-NIPS: Online Bayes

8 0.63889116 451 hunch net-2011-12-13-Vowpal Wabbit version 6.1 & the NIPS tutorial

9 0.60658163 436 hunch net-2011-06-22-Ultra LDA

10 0.60445774 186 hunch net-2006-06-24-Online convex optimization at COLT

11 0.59356117 419 hunch net-2010-12-04-Vowpal Wabbit, version 5.0, and the second heresy

12 0.5889886 251 hunch net-2007-06-24-Interesting Papers at ICML 2007

13 0.57148075 346 hunch net-2009-03-18-Parallel ML primitives

14 0.55218732 258 hunch net-2007-08-12-Exponentiated Gradient

15 0.54234505 28 hunch net-2005-02-25-Problem: Online Learning

16 0.53180009 365 hunch net-2009-07-31-Vowpal Wabbit Open Source Project

17 0.5266614 298 hunch net-2008-04-26-Eliminating the Birthday Paradox for Universal Features

18 0.52600861 445 hunch net-2011-09-28-Somebody’s Eating Your Lunch

19 0.51916105 163 hunch net-2006-03-12-Online learning or online preservation of learning?

20 0.51292711 281 hunch net-2007-12-21-Vowpal Wabbit Code Release


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(3, 0.023), (21, 0.245), (27, 0.251), (38, 0.027), (53, 0.074), (55, 0.049), (94, 0.113), (95, 0.1)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.9707337 58 hunch net-2005-04-21-Dynamic Programming Generalizations and Their Use

Introduction: David McAllester gave a talk about this paper (with Pedro Felzenszwalb). I’ll try to give a high level summary of why it’s interesting. Dynamic programming is most familiar as instantiated by Viterbi decoding in a hidden Markov model. It is a general paradigm for problem solving where subproblems are solved and used to solve larger problems. In the Viterbi decoding example, the subproblem is “What is the most probable path ending at each state at timestep t?”, and the larger problem is the same except at timestep t+1. There are a few optimizations you can do here: Dynamic Programming -> queued Dynamic Programming. Keep track of the “cost so far” (or “most probable path”) and (carefully) only look at extensions to paths likely to yield the shortest path. “Carefully” here is defined by Dijkstra’s shortest path algorithm. queued Dynamic programming -> A*. Add a lower bound on the cost to complete a path (or an upper bound on the probability of a completion) for
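A minimal sketch of the baseline being generalized here, Viterbi decoding in an HMM via dynamic programming (the transition and emission tables are invented for illustration):

```python
import numpy as np

# Toy HMM: 2 hidden states, 2 observation symbols. All numbers illustrative.
start = np.log(np.array([0.6, 0.4]))
trans = np.log(np.array([[0.7, 0.3],
                         [0.4, 0.6]]))
emit  = np.log(np.array([[0.9, 0.1],
                         [0.2, 0.8]]))
obs = [0, 1, 1, 0]

# delta[s] = log-prob of the most probable path ending in state s at time t.
delta = start + emit[:, obs[0]]
back = []
for o in obs[1:]:
    scores = delta[:, None] + trans           # extend every partial path by one step
    back.append(scores.argmax(axis=0))        # remember the best predecessor of each state
    delta = scores.max(axis=0) + emit[:, o]

# Backtrack to recover the most probable state sequence.
path = [int(delta.argmax())]
for bp in reversed(back):
    path.append(int(bp[path[-1]]))
path.reverse()
print(path, delta.max())
```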

2 0.92778838 126 hunch net-2005-10-26-Fallback Analysis is a Secret to Useful Algorithms

Introduction: The ideal of theoretical algorithm analysis is to construct an algorithm with accompanying optimality theorems proving that it is a useful algorithm. This ideal often fails, particularly for learning algorithms and theory. The general form of a theorem is: If preconditions Then postconditions When we design learning algorithms it is very common to come up with precondition assumptions such as “the data is IID”, “the learning problem is drawn from a known distribution over learning problems”, or “there is a perfect classifier”. All of these example preconditions can be false for real-world problems in ways that are not easily detectable. This means that algorithms derived and justified by these very common forms of analysis may be prone to catastrophic failure in routine (mis)application. We can hope for better. Several different kinds of learning algorithm analysis have been developed some of which have fewer preconditions. Simply demanding that these forms of analysi

same-blog 3 0.90412283 267 hunch net-2007-10-17-Online as the new adjective

Introduction: Online learning is in vogue, which means we should expect to see in the near future: Online boosting. Online decision trees. Online SVMs. (actually, we’ve already seen) Online deep learning. Online parallel learning. etc… There are three fundamental drivers of this trend. Increasing size of datasets makes online algorithms attractive. Online learning can simply be more efficient than batch learning. Here is a picture from a class on online learning: The point of this picture is that even in 3 dimensions and even with linear constraints, finding the minima of a set in an online fashion can be typically faster than finding the minima in a batch fashion. To see this, note that there is a minimal number of gradient updates (i.e. 2) required in order to reach the minima in the typical case. Given this, it’s best to do these updates as quickly as possible, which implies doing the first update online (i.e. before seeing all the examples) is preferred. Note

4 0.89059216 3 hunch net-2005-01-24-The Humanloop Spectrum of Machine Learning

Introduction: All branches of machine learning seem to be united in the idea of using data to make predictions. However, people disagree to some extent about what this means. One way to categorize these different goals is on an axis, where one extreme is “tools to aid a human in using data to do prediction” and the other extreme is “tools to do prediction with no human intervention”. Here is my estimate of where various elements of machine learning fall on this spectrum. Human Necessary: Clustering, data visualization. Human partially necessary: Bayesian Learning, Probabilistic Models, Graphical Models. Human unnecessary: Kernel Learning (SVM’s, etc..), Decision Trees?, Reinforcement Learning. The exact position of each element is of course debatable. My reasoning is that clustering and data visualization are nearly useless for prediction without a human in the loop. Bayesian/probabilistic models/graphical models generally require a human to sit and think about what

5 0.87895453 369 hunch net-2009-08-27-New York Area Machine Learning Events

Introduction: Several events are happening in the NY area. Barriers in Computational Learning Theory Workshop, Aug 28. That’s tomorrow near Princeton. I’m looking forward to speaking at this one on “Getting around Barriers in Learning Theory”, but several other talks are of interest, particularly to the CS theory inclined. Claudia Perlich is running the INFORMS Data Mining Contest with a deadline of Sept. 25. This is a contest using real health record data (they partnered with HealthCare Intelligence) to predict transfers and mortality. In the current US health care reform debate, the case studies of high costs we hear strongly suggest machine learning & statistics can save many billions. The Singularity Summit October 3&4. This is for the AIists out there. Several of the talks look interesting, although unfortunately I’ll miss it for ALT. Predictive Analytics World, Oct 20-21. This is stretching the definition of “New York Area” a bit, but the train to DC is reasonable.

6 0.83285296 378 hunch net-2009-11-15-The Other Online Learning

7 0.77547729 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models

8 0.76702315 194 hunch net-2006-07-11-New Models

9 0.76388723 360 hunch net-2009-06-15-In Active Learning, the question changes

10 0.75665784 478 hunch net-2013-01-07-NYU Large Scale Machine Learning Class

11 0.75585175 132 hunch net-2005-11-26-The Design of an Optimal Research Environment

12 0.75440025 370 hunch net-2009-09-18-Necessary and Sufficient Research

13 0.75387162 351 hunch net-2009-05-02-Wielding a New Abstraction

14 0.75368816 235 hunch net-2007-03-03-All Models of Learning have Flaws

15 0.7534979 286 hunch net-2008-01-25-Turing’s Club for Machine Learning

16 0.75287455 371 hunch net-2009-09-21-Netflix finishes (and starts)

17 0.75082362 347 hunch net-2009-03-26-Machine Learning is too easy

18 0.75055587 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

19 0.75042957 12 hunch net-2005-02-03-Learning Theory, by assumption

20 0.74726111 95 hunch net-2005-07-14-What Learning Theory might do