hunch_net hunch_net-2006 hunch_net-2006-178 knowledge-graph by maker-knowledge-mining

178 hunch net-2006-05-08-Big machine learning


meta info for this blog

Source: html

Introduction: According to the New York Times, Yahoo is releasing Project Panama shortly. Project Panama is about better predicting which advertisements are relevant to a search, implying a higher click-through rate, implying larger income for Yahoo. There are two things that seem interesting here: A significant portion of that improved accuracy is almost certainly machine learning at work. The quantitative effect is huge—the estimate in the article is $600*10^6. Google already has such improvements and Microsoft Search is surely working on them, which suggests this is (perhaps) a $10^9 per year machine learning problem. The exact methodology in use is unlikely to be publicly discussed in the near future because of the competitive environment. Hopefully we’ll have some public “war stories” at some point in the future when this information becomes less sensitive. For now, it’s reassuring to simply note that machine learning is having a big impact.
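The post cannot describe Project Panama's actual method (it was never made public), but click-through-rate prediction of this kind is commonly posed as probabilistic binary classification. As a purely illustrative sketch under that assumption — the features, weights, and data below are all hypothetical — logistic regression trained by stochastic gradient descent:

```python
import math

# Purely illustrative CTR sketch: NOT Project Panama's method.
# Feature encodings and data below are hypothetical.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_ctr(weights, features):
    """Predicted click probability for one (query, ad) pair."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))

def sgd_step(weights, features, clicked, lr=0.1):
    """One stochastic-gradient update on the logistic (log) loss."""
    p = predict_ctr(weights, features)
    return [w + lr * (clicked - p) * x for w, x in zip(weights, features)]

# Toy features might encode (bias, query-ad word overlap, ad staleness);
# label 1 means the ad was clicked.
data = [([1.0, 0.9, 0.2], 1), ([1.0, 0.1, 0.8], 0), ([1.0, 0.8, 0.3], 1)]
weights = [0.0, 0.0, 0.0]
for _ in range(100):
    for x, y in data:
        weights = sgd_step(weights, x, y)
```

Multiplying a predicted CTR by an advertiser's bid gives the expected revenue of showing that ad, which is one plausible route from "better prediction" to the dollar figures the post quotes.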


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 According to the New York Times, Yahoo is releasing Project Panama shortly. [sent-1, score-0.311]

2 Project Panama is about better predicting which advertisements are relevant to a search, implying a higher click-through rate, implying larger income for Yahoo. [sent-2, score-1.221]

3 There are two things that seem interesting here: A significant portion of that improved accuracy is almost certainly machine learning at work. [sent-3, score-0.525]

4 The quantitative effect is huge—the estimate in the article is $600*10^6. [sent-4, score-0.479]

5 Google already has such improvements and Microsoft Search is surely working on them, which suggests this is (perhaps) a $10^9 per year machine learning problem. [sent-5, score-0.564]

6 The exact methodology in use is unlikely to be publicly discussed in the near future because of the competitive environment. [sent-6, score-0.957]

7 Hopefully we’ll have some public “war stories” at some point in the future when this information becomes less sensitive. [sent-7, score-0.312]

8 For now, it’s reassuring to simply note that machine learning is having a big impact. [sent-8, score-0.325]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('panama', 0.405), ('yahoo', 0.209), ('project', 0.203), ('search', 0.192), ('implying', 0.19), ('income', 0.18), ('shortly', 0.167), ('quantitative', 0.167), ('reassuring', 0.167), ('war', 0.157), ('advertisements', 0.157), ('click', 0.157), ('releasing', 0.144), ('stories', 0.144), ('methodology', 0.139), ('portion', 0.139), ('future', 0.135), ('unlikely', 0.135), ('publicly', 0.131), ('microsoft', 0.131), ('exact', 0.121), ('competitive', 0.121), ('article', 0.119), ('accuracy', 0.116), ('surely', 0.112), ('estimate', 0.106), ('suggest', 0.105), ('higher', 0.103), ('hopefully', 0.103), ('google', 0.102), ('improved', 0.102), ('improvements', 0.1), ('york', 0.099), ('public', 0.091), ('times', 0.091), ('certainly', 0.09), ('huge', 0.09), ('predicting', 0.089), ('impact', 0.088), ('near', 0.088), ('discussed', 0.087), ('effect', 0.087), ('becomes', 0.086), ('already', 0.085), ('according', 0.085), ('per', 0.084), ('note', 0.08), ('machine', 0.078), ('relevant', 0.078), ('larger', 0.077)]
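The word weights above are this post's tf-idf vector, and the similarity scores in the lists that follow are plausibly cosine similarities between such vectors. A minimal sketch of that computation, assuming sparse term→weight dictionaries (the second post's weights below are invented for illustration):

```python
import math

# Illustrative only: `other_post` weights are invented, not taken
# from any real entry in the similarity lists.

def cosine(a, b):
    """Cosine similarity between two sparse term->weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

this_post = {'panama': 0.405, 'yahoo': 0.209, 'search': 0.192}
other_post = {'yahoo': 0.31, 'search': 0.25, 'ranking': 0.4}  # hypothetical
score = cosine(this_post, other_post)
```

A post compared with itself scores 1.0, which matches the near-unit "same-blog" entries below.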

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 178 hunch net-2006-05-08-Big machine learning

Introduction: According to the New York Times, Yahoo is releasing Project Panama shortly. Project Panama is about better predicting which advertisements are relevant to a search, implying a higher click-through rate, implying larger income for Yahoo. There are two things that seem interesting here: A significant portion of that improved accuracy is almost certainly machine learning at work. The quantitative effect is huge—the estimate in the article is $600*10^6. Google already has such improvements and Microsoft Search is surely working on them, which suggests this is (perhaps) a $10^9 per year machine learning problem. The exact methodology in use is unlikely to be publicly discussed in the near future because of the competitive environment. Hopefully we’ll have some public “war stories” at some point in the future when this information becomes less sensitive. For now, it’s reassuring to simply note that machine learning is having a big impact.

2 0.24615458 156 hunch net-2006-02-11-Yahoo’s Learning Problems.

Introduction: I just visited Yahoo Research which has several fundamental learning problems near to (or beyond) the set of problems we know how to solve well. Here are 3 of them. Ranking This is the canonical problem of all search engines. It is made extra difficult for several reasons. There is relatively little “good” supervised learning data and a great deal of data with some signal (such as click through rates). The learning must occur in a partially adversarial environment. Many people very actively attempt to place themselves at the top of rankings. It is not even quite clear whether the problem should be posed as ‘ranking’ or as ‘regression’ which is then used to produce a ranking. Collaborative filtering Yahoo has a large number of recommendation systems for music, movies, etc… In these sorts of systems, users specify how they liked a set of things, and then the system can (hopefully) find some more examples of things they might like by reasoning across multiple
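The "ranking vs. regression" question in the excerpt above has a standard concrete form: reduce ranking to pairwise classification, where each training example asks whether one document should rank above another. A minimal, hypothetical sketch of generating such pairs from relevance labels (the labels below are invented; a classifier would then be trained on the resulting triples):

```python
# Hypothetical sketch: reduce ranking to pairwise classification.
# Relevance labels are invented for illustration.

def pairs_from_relevance(labels):
    """Yield (a, b, 1) when document a should rank above b, and
    (a, b, 0) when b should rank above a; ties produce no pair."""
    items = list(labels.items())
    pairs = []
    for i, (a, ra) in enumerate(items):
        for b, rb in items[i + 1:]:
            if ra != rb:
                pairs.append((a, b, 1 if ra > rb else 0))
    return pairs

pairs = pairs_from_relevance({'d1': 2, 'd2': 0, 'd3': 1})
```

The regression alternative the excerpt mentions would instead fit relevance scores directly and sort by them; the pairwise reduction sidesteps calibrating absolute scores.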

3 0.16417003 175 hunch net-2006-04-30-John Langford –> Yahoo Research, NY

Introduction: I will join Yahoo Research (in New York) after my contract ends at TTI-Chicago . The deciding reasons are: Yahoo is running into many hard learning problems. This is precisely the situation where basic research might hope to have the greatest impact. Yahoo Research understands research including publishing, conferences, etc… Yahoo Research is growing, so there is a chance I can help it grow well. Yahoo understands the internet, including (but not at all limited to) experimenting with research blogs. In the end, Yahoo Research seems like the place where I might have a chance to make the greatest difference. Yahoo (as a company) has made a strong bet on Yahoo Research. We-the-researchers all hope that bet will pay off, and this seems plausible. I’ll certainly have fun trying.

4 0.14611343 423 hunch net-2011-02-02-User preferences for search engines

Introduction: I want to comment on the “Bing copies Google” discussion here , here , and here , because there are data-related issues which the general public may not understand, and some of the framing seems substantially misleading to me. As a not-distant-outsider, let me mention the sources of bias I may have. I work at Yahoo! , which has started using Bing . This might predispose me towards Bing, but on the other hand I’m still at Yahoo!, and have been using Linux exclusively as an OS for many years, including even a couple minor kernel patches. And, on the gripping hand , I’ve spent quite a bit of time thinking about the basic principles of incorporating user feedback in machine learning . Also note, this post is not related to official Yahoo! policy, it’s just my personal view. The issue Google engineers inserted synthetic responses to synthetic queries on google.com, then executed the synthetic searches on google.com using Internet Explorer with the Bing toolbar and later

5 0.13278544 464 hunch net-2012-05-03-Microsoft Research, New York City

Introduction: Yahoo! laid off people. Unlike every previous time there have been layoffs, this is serious for Yahoo! Research. We had advanced warning from Prabhakar through the simple act of leaving. Yahoo! Research was a world class organization that Prabhakar recruited much of personally, so it is deeply implausible that he would spontaneously decide to leave. My first thought when I saw the news was “Uhoh, Rob said that he knew it was serious when the head of AT&T Research left.” In this case it was even more significant, because Prabhakar recruited me on the premise that Y!R was an experiment in how research should be done: via a combination of high quality people and high engagement with the company. Prabhakar’s departure is a clear end to that experiment. The result is ambiguous from a business perspective. Y!R clearly was not capable of saving the company from its illnesses. I’m not privy to the internal accounting of impact and this is the kind of subject where there c

6 0.10816333 120 hunch net-2005-10-10-Predictive Search is Coming

7 0.091951631 345 hunch net-2009-03-08-Prediction Science

8 0.087060466 281 hunch net-2007-12-21-Vowpal Wabbit Code Release

9 0.083955199 425 hunch net-2011-02-25-Yahoo! Machine Learning grant due March 11

10 0.079982616 193 hunch net-2006-07-09-The Stock Prediction Machine Learning Problem

11 0.077892177 424 hunch net-2011-02-17-What does Watson mean?

12 0.076582417 400 hunch net-2010-06-13-The Good News on Exploration and Learning

13 0.076240852 200 hunch net-2006-08-03-AOL’s data drop

14 0.075772032 389 hunch net-2010-02-26-Yahoo! ML events

15 0.072804093 10 hunch net-2005-02-02-Kolmogorov Complexity and Googling

16 0.072658494 51 hunch net-2005-04-01-The Producer-Consumer Model of Research

17 0.071898483 325 hunch net-2008-11-10-ICML Reviewing Criteria

18 0.069911972 490 hunch net-2013-11-09-Graduates and Postdocs

19 0.069892339 287 hunch net-2008-01-28-Sufficient Computation

20 0.067855626 475 hunch net-2012-10-26-ML Symposium and Strata-Hadoop World


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.143), (1, -0.013), (2, -0.101), (3, 0.054), (4, -0.051), (5, -0.011), (6, -0.037), (7, 0.046), (8, -0.113), (9, -0.043), (10, -0.012), (11, 0.078), (12, 0.005), (13, -0.023), (14, -0.112), (15, 0.072), (16, -0.087), (17, -0.048), (18, 0.047), (19, -0.124), (20, 0.155), (21, 0.007), (22, 0.079), (23, 0.051), (24, -0.023), (25, 0.14), (26, 0.123), (27, 0.038), (28, 0.002), (29, -0.048), (30, -0.039), (31, 0.036), (32, -0.047), (33, 0.0), (34, 0.086), (35, 0.094), (36, 0.028), (37, 0.002), (38, -0.059), (39, -0.015), (40, 0.013), (41, -0.05), (42, -0.088), (43, -0.03), (44, 0.079), (45, 0.04), (46, 0.096), (47, 0.129), (48, 0.071), (49, 0.032)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95883679 178 hunch net-2006-05-08-Big machine learning

Introduction: According to the New York Times, Yahoo is releasing Project Panama shortly. Project Panama is about better predicting which advertisements are relevant to a search, implying a higher click-through rate, implying larger income for Yahoo. There are two things that seem interesting here: A significant portion of that improved accuracy is almost certainly machine learning at work. The quantitative effect is huge—the estimate in the article is $600*10^6. Google already has such improvements and Microsoft Search is surely working on them, which suggests this is (perhaps) a $10^9 per year machine learning problem. The exact methodology in use is unlikely to be publicly discussed in the near future because of the competitive environment. Hopefully we’ll have some public “war stories” at some point in the future when this information becomes less sensitive. For now, it’s reassuring to simply note that machine learning is having a big impact.

2 0.77099401 156 hunch net-2006-02-11-Yahoo’s Learning Problems.

Introduction: I just visited Yahoo Research which has several fundamental learning problems near to (or beyond) the set of problems we know how to solve well. Here are 3 of them. Ranking This is the canonical problem of all search engines. It is made extra difficult for several reasons. There is relatively little “good” supervised learning data and a great deal of data with some signal (such as click through rates). The learning must occur in a partially adversarial environment. Many people very actively attempt to place themselves at the top of rankings. It is not even quite clear whether the problem should be posed as ‘ranking’ or as ‘regression’ which is then used to produce a ranking. Collaborative filtering Yahoo has a large number of recommendation systems for music, movies, etc… In these sorts of systems, users specify how they liked a set of things, and then the system can (hopefully) find some more examples of things they might like by reasoning across multiple

3 0.70139998 423 hunch net-2011-02-02-User preferences for search engines

Introduction: I want to comment on the “Bing copies Google” discussion here , here , and here , because there are data-related issues which the general public may not understand, and some of the framing seems substantially misleading to me. As a not-distant-outsider, let me mention the sources of bias I may have. I work at Yahoo! , which has started using Bing . This might predispose me towards Bing, but on the other hand I’m still at Yahoo!, and have been using Linux exclusively as an OS for many years, including even a couple minor kernel patches. And, on the gripping hand , I’ve spent quite a bit of time thinking about the basic principles of incorporating user feedback in machine learning . Also note, this post is not related to official Yahoo! policy, it’s just my personal view. The issue Google engineers inserted synthetic responses to synthetic queries on google.com, then executed the synthetic searches on google.com using Internet Explorer with the Bing toolbar and later

4 0.67560798 175 hunch net-2006-04-30-John Langford –> Yahoo Research, NY

Introduction: I will join Yahoo Research (in New York) after my contract ends at TTI-Chicago . The deciding reasons are: Yahoo is running into many hard learning problems. This is precisely the situation where basic research might hope to have the greatest impact. Yahoo Research understands research including publishing, conferences, etc… Yahoo Research is growing, so there is a chance I can help it grow well. Yahoo understands the internet, including (but not at all limited to) experimenting with research blogs. In the end, Yahoo Research seems like the place where I might have a chance to make the greatest difference. Yahoo (as a company) has made a strong bet on Yahoo Research. We-the-researchers all hope that bet will pay off, and this seems plausible. I’ll certainly have fun trying.

5 0.60702455 464 hunch net-2012-05-03-Microsoft Research, New York City

Introduction: Yahoo! laid off people. Unlike every previous time there have been layoffs, this is serious for Yahoo! Research. We had advanced warning from Prabhakar through the simple act of leaving. Yahoo! Research was a world class organization that Prabhakar recruited much of personally, so it is deeply implausible that he would spontaneously decide to leave. My first thought when I saw the news was “Uhoh, Rob said that he knew it was serious when the head of AT&T Research left.” In this case it was even more significant, because Prabhakar recruited me on the premise that Y!R was an experiment in how research should be done: via a combination of high quality people and high engagement with the company. Prabhakar’s departure is a clear end to that experiment. The result is ambiguous from a business perspective. Y!R clearly was not capable of saving the company from its illnesses. I’m not privy to the internal accounting of impact and this is the kind of subject where there c

6 0.58520955 200 hunch net-2006-08-03-AOL’s data drop

7 0.57141095 120 hunch net-2005-10-10-Predictive Search is Coming

8 0.45923463 125 hunch net-2005-10-20-Machine Learning in the News

9 0.45777687 171 hunch net-2006-04-09-Progress in Machine Translation

10 0.43314803 173 hunch net-2006-04-17-Rexa is live

11 0.42220345 10 hunch net-2005-02-02-Kolmogorov Complexity and Googling

12 0.41958705 193 hunch net-2006-07-09-The Stock Prediction Machine Learning Problem

13 0.40150368 424 hunch net-2011-02-17-What does Watson mean?

14 0.39941916 457 hunch net-2012-02-29-Key Scientific Challenges and the Franklin Symposium

15 0.39424539 339 hunch net-2009-01-27-Key Scientific Challenges

16 0.38625196 490 hunch net-2013-11-09-Graduates and Postdocs

17 0.37549981 345 hunch net-2009-03-08-Prediction Science

18 0.37213677 117 hunch net-2005-10-03-Not ICML

19 0.36702681 128 hunch net-2005-11-05-The design of a computing cluster

20 0.36386347 32 hunch net-2005-02-27-Antilearning: When proximity goes bad


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(25, 0.334), (27, 0.179), (38, 0.045), (53, 0.068), (55, 0.097), (94, 0.165)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.89660138 178 hunch net-2006-05-08-Big machine learning

Introduction: According to the New York Times, Yahoo is releasing Project Panama shortly. Project Panama is about better predicting which advertisements are relevant to a search, implying a higher click-through rate, implying larger income for Yahoo. There are two things that seem interesting here: A significant portion of that improved accuracy is almost certainly machine learning at work. The quantitative effect is huge—the estimate in the article is $600*10^6. Google already has such improvements and Microsoft Search is surely working on them, which suggests this is (perhaps) a $10^9 per year machine learning problem. The exact methodology in use is unlikely to be publicly discussed in the near future because of the competitive environment. Hopefully we’ll have some public “war stories” at some point in the future when this information becomes less sensitive. For now, it’s reassuring to simply note that machine learning is having a big impact.

2 0.81446034 163 hunch net-2006-03-12-Online learning or online preservation of learning?

Introduction: In the online learning with experts setting, you observe a set of predictions, make a decision, and then observe the truth. This process repeats indefinitely. In this setting, it is possible to prove theorems of the sort: master algorithm error count <= k * (best predictor error count) + c * log(number of predictors) Is this a statement about learning or about preservation of learning? We did some experiments to analyze the new Binning algorithm which works in this setting. For several UCI datasets, we reprocessed them so that features could be used as predictors and then applied several master algorithms. The first graph confirms that Binning is indeed a better algorithm according to the tightness of the upper bound. Here, “Best” is the performance of the best expert. “V. Bound” is the bound for Vovk’s algorithm (the previous best). “Bound” is the bound for the Binning algorithm. “Binning” is the performance of the Binning algorithm. The Binning algorithm clearly h
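The Binning algorithm itself is not given here, but the "master algorithm" setting it improves on can be sketched with the classic exponential-weights (weighted-majority) scheme. This is a generic illustration of the setting only — not Vovk's algorithm and not Binning — and the toy experts are hypothetical:

```python
import math

def exp_weights(predictions, truths, eta=0.5):
    """Exponential-weights master for binary prediction with experts.

    Each round: take a weighted vote over expert predictions, then
    multiplicatively downweight the experts that were wrong. Schemes
    of this kind yield bounds of the form
      master mistakes <= k * (best expert mistakes) + c * log(#experts).
    """
    weights = [1.0] * len(predictions[0])
    mistakes = 0
    for preds, truth in zip(predictions, truths):
        vote_for_1 = sum(w for w, p in zip(weights, preds) if p == 1)
        guess = 1 if vote_for_1 >= sum(weights) / 2 else 0
        if guess != truth:
            mistakes += 1
        # Penalize each expert that predicted incorrectly this round.
        weights = [w * math.exp(-eta) if p != truth else w
                   for w, p in zip(weights, preds)]
    return mistakes

# Toy run: expert 0 is always right, experts 1 and 2 always wrong.
truths = [1, 0, 1, 0, 1]
preds = [(t, 1 - t, 1 - t) for t in truths]
master_mistakes = exp_weights(preds, truths)
```

The master errs only until the bad experts' weights decay, after which it tracks the best expert — the "preservation of learning" behavior the post asks about.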

3 0.79681963 148 hunch net-2006-01-13-Benchmarks for RL

Introduction: A couple years ago, Drew Bagnell and I started the RLBench project to set up a suite of reinforcement learning benchmark problems. We haven’t been able to touch it (due to lack of time) for a year, so the project is on hold. Luckily, there are several other projects such as CLSquare and RL-Glue with a similar goal, and we strongly endorse their continued development. I would like to explain why, especially in the context of criticism of other learning benchmarks. For example, sometimes the UCI Machine Learning Repository is criticized. There are two criticisms I know of: Learning algorithms have overfit to the problems in the repository. It is easy to imagine a mechanism for this happening unintentionally. Strong evidence of this would be provided by learning algorithms which perform great on the UCI machine learning repository but very badly (relative to other learning algorithms) on non-UCI learning problems. I have seen little evidence of this but it remains a po

4 0.72814023 134 hunch net-2005-12-01-The Webscience Future

Introduction: The internet has significantly affected the way we do research, but its capabilities have not yet been fully realized. First, let’s acknowledge some known effects. Self-publishing By default, all researchers in machine learning (and more generally computer science and physics) place their papers online for anyone to download. The exact mechanism differs—physicists tend to use a central repository (Arxiv) while computer scientists tend to place the papers on their webpage. Arxiv has been slowly growing in subject breadth so it is now sometimes used by computer scientists. Collaboration Email has enabled working remotely with coauthors. This has allowed collaborations which would not otherwise have been possible and generally speeds research. Now, let’s look at attempts to go further. Blogs (like this one) allow public discussion about topics which are not easily categorized as “a new idea in machine learning” (like this topic). Organization of some subfield

5 0.62366486 221 hunch net-2006-12-04-Structural Problems in NIPS Decision Making

Introduction: This is a very difficult post to write, because it is about a perennially touchy subject. Nevertheless, it is an important one which needs to be thought about carefully. There are a few things which should be understood: The system is changing and responsive. We-the-authors are we-the-reviewers, we-the-PC, and even we-the-NIPS-board. NIPS has implemented ‘secondary program chairs’, ‘author response’, and ‘double blind reviewing’ in the last few years to help with the decision process, and more changes may happen in the future. Agreement creates a perception of correctness. When any PC meets and makes a group decision about a paper, there is a strong tendency for the reinforcement inherent in a group decision to create the perception of correctness. For the many people who have been on the NIPS PC it’s reasonable to entertain a healthy skepticism in the face of this reinforcing certainty. This post is about structural problems. What problems arise because of the structure

6 0.61443508 286 hunch net-2008-01-25-Turing’s Club for Machine Learning

7 0.61233097 276 hunch net-2007-12-10-Learning Track of International Planning Competition

8 0.60163069 229 hunch net-2007-01-26-Parallel Machine Learning Problems

9 0.6009981 120 hunch net-2005-10-10-Predictive Search is Coming

10 0.59578872 95 hunch net-2005-07-14-What Learning Theory might do

11 0.59308761 96 hunch net-2005-07-21-Six Months

12 0.59188497 306 hunch net-2008-07-02-Proprietary Data in Academic Research?

13 0.59144258 253 hunch net-2007-07-06-Idempotent-capable Predictors

14 0.59093535 423 hunch net-2011-02-02-User preferences for search engines

15 0.59081292 237 hunch net-2007-04-02-Contextual Scaling

16 0.58938146 136 hunch net-2005-12-07-Is the Google way the way for machine learning?

17 0.58907753 419 hunch net-2010-12-04-Vowpal Wabbit, version 5.0, and the second heresy

18 0.58844078 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models

19 0.58568501 297 hunch net-2008-04-22-Taking the next step

20 0.5843125 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning