hunch_net hunch_net-2005 hunch_net-2005-120 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: “Search” is the other branch of AI research which has been successful. Concrete examples include Deep Blue, which beat the world chess champion, and Chinook, the champion checkers program. A set of core search techniques exists, including A*, alpha-beta pruning, and others that can be applied to any of many different search problems. Given this, it may be surprising to learn that there has been relatively little successful work on combining prediction and search. Given also that humans typically solve search problems using a number of predictive heuristics to narrow in on a solution, we might be surprised again. However, the big successful search-based systems have typically not used “smart” search algorithms. Instead they have optimized for very fast search. This is not for lack of trying… many people have tried to synthesize search and prediction to various degrees of success. For example, Knightcap achieves good-but-not-stellar chess playing performance, and TD-gammon
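The introduction names alpha-beta pruning among the core search techniques. As a point of reference, here is a minimal sketch of alpha-beta pruning over a hypothetical game-tree interface; the `children` and `evaluate` callbacks are illustrative assumptions, not anything specified in the post, and the sketch is not how Deep Blue or Chinook were actually implemented.

```python
# Minimal alpha-beta pruning sketch. `children(state)` and `evaluate(state)`
# are hypothetical callbacks a concrete game (chess, checkers, ...) would supply.

def alphabeta(state, depth, alpha, beta, maximizing, children, evaluate):
    kids = children(state)
    if depth == 0 or not kids:
        return evaluate(state)  # heuristic value at the search frontier
    if maximizing:
        value = float("-inf")
        for child in kids:
            value = max(value, alphabeta(child, depth - 1, alpha, beta,
                                         False, children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:   # cutoff: the minimizing player avoids this branch
                break
        return value
    value = float("inf")
    for child in kids:
        value = min(value, alphabeta(child, depth - 1, alpha, beta,
                                     True, children, evaluate))
        beta = min(beta, value)
        if beta <= alpha:       # cutoff: the maximizing player avoids this branch
            break
    return value

# Toy usage on a hand-built two-level tree.
tree = {"root": ["a", "b"], "a": [], "b": []}
leaf_value = {"a": 3, "b": 5}
print(alphabeta("root", 2, float("-inf"), float("inf"), True,
                children=lambda s: tree[s],
                evaluate=lambda s: leaf_value.get(s, 0)))  # -> 5
```

A cutoff lets the search skip whole subtrees, which, combined with very fast node evaluation, is the kind of optimization the "pure" search systems relied on.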
sentIndex sentText sentNum sentScore
1 A set of core search techniques exists, including A*, alpha-beta pruning, and others that can be applied to any of many different search problems. [sent-3, score-0.907]
2 Given also that humans typically solve search problems using a number of predictive heuristics to narrow in on a solution, we might be surprised again. [sent-5, score-0.664]
3 However, the big successful search-based systems have typically not used “smart” search algorithms. [sent-6, score-0.471]
4 This is not for lack of trying… many people have tried to synthesize search and prediction to various degrees of success. [sent-8, score-0.537]
5 The basic claim of this post is that we will see more and stronger successes of the sort exemplified by Knightcap and TD-gammon and less of those exemplified by the “pure” search techniques. [sent-10, score-0.831]
6 Here are two reasons: The strengths of computers are changing. [sent-11, score-0.321]
7 Computers have long held a huge advantage over humans in two areas: Short term memory. [sent-12, score-0.588]
8 What I mean by ‘sequential execution speed’ is simply time-ordered instruction execution with arbitrary data dependencies. [sent-17, score-0.682]
9 Neurons in a human brain might be able to fire at a rate of 10^2 or 10^3 cycles/second (depending on who you ask), while computers can run at 10^9 cycles/second. [sent-18, score-0.343]
10 Sequential execution speed growth appears stalled for the moment due to heat issues. [sent-19, score-0.448]
11 These two points of excellence are precisely what search algorithms typically require for good performance which partially explains why the human-preferred method of search is so different from the computer-preferred method. [sent-20, score-0.951]
12 The advantages of humans have traditionally been: significant long term memory. For example, humans might have a significant advantage using a smarter search technique because they can use previous search-based successes and failures to bias search on new problems (a sketch of search guided by a learned score follows this sentence list). [sent-21, score-1.664]
13 Chip designers, who are stymied in the production of sequential execution speed, are parallelizing via speculative execution, special instruction sets, “hyperthreading”, dual core processors, SMP machines, and clusters. [sent-25, score-0.79]
14 It will be a while before computers can execute as many parallel instructions as a human brain, but not as long as you might expect—the Cell processor claims 0. [sent-26, score-0.414]
15 Putting these observations together, we see that computers are becoming more ‘rounded’ in their strengths, which suggests that highly parallelizable algorithms using large quantities of memory may become more viable solutions for search problems. [sent-28, score-1.063]
16 Evidence of this can be seen in the UCI machine learning repository, where most learning problems are ‘projected’ (by throwing away extra information) into binary or multiclass prediction even when the original problem was significantly different (a sketch of such a projection also follows this sentence list). [sent-32, score-0.31]
17 At the theoretical level, it is difficult to even think about learning problems without considering some natural distribution over the instances we need to predict. [sent-36, score-0.332]
18 We now have a number of techniques for exposing this information to learning machines and can use much richer side information. [sent-42, score-0.345]
19 The basic claim here is that understanding these two concepts is key to mixing search and prediction. [sent-43, score-0.47]
20 Considering the ‘natural distribution’ is important because search systems create their own distribution over instances. [sent-44, score-0.53]
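The closing sentences claim that prediction should bias search and that a search system creates its own distribution over the instances its predictor sees. Below is one assumed way to make that concrete: a best-first search whose expansion order is driven by a scoring function standing in for a learned predictor, and which logs every state it expands — that log is exactly the search-induced distribution the post refers to. All names here (`predictive_best_first`, the toy integer domain) are hypothetical illustrations, not anything taken from Knightcap or TD-gammon.

```python
import heapq

# Best-first search guided by a stand-in "learned" scoring function.
# `visited_log` records every expanded state: the distribution of instances
# that the search itself induces for its predictor.
def predictive_best_first(start, is_goal, children, score, max_expansions=10000):
    frontier = [(-score(start), 0, start)]   # max-heap via negated scores
    tie = 1                                  # tie-breaker so states need not be comparable
    seen = {start}
    visited_log = []
    while frontier and len(visited_log) < max_expansions:
        _, _, state = heapq.heappop(frontier)
        visited_log.append(state)            # the search-induced distribution
        if is_goal(state):
            return state, visited_log
        for child in children(state):
            if child not in seen:
                seen.add(child)
                heapq.heappush(frontier, (-score(child), tie, child))
                tie += 1
    return None, visited_log

# Toy usage: reach 20 from 0 by adding 1 or 2; the "learned" score prefers
# states closer to the goal, so the search heads straight for it instead of
# fanning out over the whole state space.
goal = 20
found, trace = predictive_best_first(
    0,
    is_goal=lambda s: s == goal,
    children=lambda s: [s + 1, s + 2],
    score=lambda s: -abs(goal - s),
)
print(found, len(trace))
```

If the scoring function were actually learned, the natural training set would be drawn from the `visited_log` of earlier searches, which is why the ‘natural distribution’ over instances is determined by the search system itself.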
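Sentence 16 above describes how learning problems get ‘projected’ into binary or multiclass prediction by throwing away information. A minimal, assumed illustration of such a projection — flattening a small ranking problem into pairwise binary examples — is sketched below; the feature vectors and relevance scores are made up for the example.

```python
# Hypothetical illustration of 'projecting' a richer problem (ranking) into
# binary classification: each ordered pair of items with different relevance
# becomes one binary example, and the ranking structure is thrown away.
def ranking_to_binary(items):
    """items: list of (feature_vector, relevance) pairs for one query."""
    examples = []
    for xi, ri in items:
        for xj, rj in items:
            if ri == rj:
                continue
            diff = [a - b for a, b in zip(xi, xj)]        # feature difference
            examples.append((diff, 1 if ri > rj else 0))  # "is i preferred to j?"
    return examples

# Toy usage with three made-up documents for a single query.
query_items = [([1.0, 0.2], 3), ([0.4, 0.9], 1), ([0.7, 0.7], 2)]
for x, y in ranking_to_binary(query_items):
    print(x, y)
```

The projection is convenient for off-the-shelf classifiers, but it discards the query grouping and the magnitude of the relevance differences — the ‘extra information’ the sentence mentions.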
wordName wordTfidf (topN-words)
[('search', 0.414), ('execution', 0.29), ('computers', 0.21), ('sequential', 0.161), ('speed', 0.158), ('memory', 0.157), ('humans', 0.133), ('exemplified', 0.125), ('knightcap', 0.125), ('distribution', 0.116), ('classification', 0.113), ('exposing', 0.111), ('richer', 0.111), ('parallelizable', 0.111), ('strengths', 0.111), ('successes', 0.111), ('champion', 0.102), ('chess', 0.102), ('instruction', 0.102), ('processors', 0.097), ('multiclass', 0.092), ('advantage', 0.084), ('binary', 0.084), ('natural', 0.083), ('term', 0.08), ('core', 0.079), ('long', 0.078), ('instances', 0.073), ('brain', 0.073), ('level', 0.07), ('remember', 0.069), ('prediction', 0.068), ('away', 0.066), ('parallel', 0.066), ('performance', 0.066), ('machines', 0.064), ('direction', 0.061), ('might', 0.06), ('highly', 0.06), ('considering', 0.06), ('side', 0.059), ('typically', 0.057), ('claim', 0.056), ('huge', 0.056), ('pruning', 0.055), ('projected', 0.055), ('projecting', 0.055), ('qualities', 0.055), ('degrees', 0.055), ('exemplar', 0.055)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999928 120 hunch net-2005-10-10-Predictive Search is Coming
2 0.17184018 156 hunch net-2006-02-11-Yahoo’s Learning Problems.
Introduction: I just visited Yahoo Research which has several fundamental learning problems near to (or beyond) the set of problems we know how to solve well. Here are 3 of them. Ranking: This is the canonical problem of all search engines. It is made extra difficult for several reasons. There is relatively little “good” supervised learning data and a great deal of data with some signal (such as click through rates). The learning must occur in a partially adversarial environment. Many people very actively attempt to place themselves at the top of rankings. It is not even quite clear whether the problem should be posed as ‘ranking’ or as ‘regression’ which is then used to produce a ranking. Collaborative filtering: Yahoo has a large number of recommendation systems for music, movies, etc… In these sorts of systems, users specify how they liked a set of things, and then the system can (hopefully) find some more examples of things they might like by reasoning across multiple
3 0.17137507 345 hunch net-2009-03-08-Prediction Science
Introduction: One view of machine learning is that it’s about how to program computers to predict well. This suggests a broader research program centered around the more pervasive goal of simply predicting well. There are many distinct strands of this broader research program which are only partially unified. Here are the ones that I know of: Learning Theory. Learning theory focuses on several topics related to the dynamics and process of prediction. Convergence bounds like the VC bound give an intellectual foundation to many learning algorithms. Online learning algorithms like Weighted Majority provide an alternate purely game theoretic foundation for learning. Boosting algorithms yield algorithms for purifying prediction ability. Reduction algorithms provide means for changing esoteric problems into well known ones. Machine Learning. A great deal of experience has accumulated in practical algorithm design from a mixture of paradigms, including bayesian, biological, opt
4 0.16370241 423 hunch net-2011-02-02-User preferences for search engines
Introduction: I want to comment on the “Bing copies Google” discussion here, here, and here, because there are data-related issues which the general public may not understand, and some of the framing seems substantially misleading to me. As a not-distant-outsider, let me mention the sources of bias I may have. I work at Yahoo!, which has started using Bing. This might predispose me towards Bing, but on the other hand I’m still at Yahoo!, and have been using Linux exclusively as an OS for many years, including even a couple minor kernel patches. And, on the gripping hand, I’ve spent quite a bit of time thinking about the basic principles of incorporating user feedback in machine learning. Also note, this post is not related to official Yahoo! policy, it’s just my personal view. The issue: Google engineers inserted synthetic responses to synthetic queries on google.com, then executed the synthetic searches on google.com using Internet Explorer with the Bing toolbar and later
5 0.15537947 200 hunch net-2006-08-03-AOL’s data drop
Introduction: AOL has released several large search engine related datasets. This looks like a pretty impressive data release, and it is a big opportunity for people everywhere to worry about search engine related learning problems, if they want.
6 0.14409606 229 hunch net-2007-01-26-Parallel Machine Learning Problems
7 0.14219536 424 hunch net-2011-02-17-What does Watson mean?
8 0.11363708 352 hunch net-2009-05-06-Machine Learning to AI
9 0.10961454 14 hunch net-2005-02-07-The State of the Reduction
10 0.10816333 178 hunch net-2006-05-08-Big machine learning
11 0.10497274 378 hunch net-2009-11-15-The Other Online Learning
12 0.1047308 420 hunch net-2010-12-26-NIPS 2010
13 0.10445993 351 hunch net-2009-05-02-Wielding a New Abstraction
14 0.10349647 300 hunch net-2008-04-30-Concerns about the Large Scale Learning Challenge
15 0.10299149 286 hunch net-2008-01-25-Turing’s Club for Machine Learning
16 0.10290128 243 hunch net-2007-05-08-Conditional Tournaments for Multiclass to Binary
17 0.10061527 269 hunch net-2007-10-24-Contextual Bandits
18 0.10014088 190 hunch net-2006-07-06-Branch Prediction Competition
19 0.099507727 132 hunch net-2005-11-26-The Design of an Optimal Research Environment
20 0.098012246 235 hunch net-2007-03-03-All Models of Learning have Flaws
topicId topicWeight
[(0, 0.264), (1, 0.098), (2, -0.062), (3, 0.033), (4, -0.019), (5, -0.023), (6, 0.032), (7, 0.058), (8, -0.047), (9, -0.01), (10, -0.135), (11, 0.047), (12, -0.006), (13, 0.033), (14, -0.1), (15, 0.033), (16, 0.054), (17, -0.035), (18, 0.059), (19, -0.071), (20, 0.088), (21, -0.002), (22, -0.036), (23, 0.107), (24, -0.014), (25, 0.115), (26, 0.153), (27, -0.03), (28, 0.06), (29, 0.048), (30, -0.029), (31, 0.014), (32, -0.018), (33, -0.071), (34, 0.048), (35, -0.036), (36, -0.03), (37, 0.023), (38, -0.005), (39, -0.028), (40, 0.029), (41, -0.042), (42, -0.037), (43, -0.061), (44, 0.064), (45, -0.048), (46, 0.046), (47, 0.014), (48, 0.086), (49, 0.029)]
simIndex simValue blogId blogTitle
same-blog 1 0.95726192 120 hunch net-2005-10-10-Predictive Search is Coming
2 0.73520684 423 hunch net-2011-02-02-User preferences for search engines
3 0.68733954 178 hunch net-2006-05-08-Big machine learning
Introduction: According to the New York Times, Yahoo is releasing Project Panama shortly. Project Panama is about better predicting which advertisements are relevant to a search, implying a higher click through rate, implying larger income for Yahoo. There are two things that seem interesting here: A significant portion of that improved accuracy is almost certainly machine learning at work. The quantitative effect is huge—the estimate in the article is $600*10^6. Google already has such improvements and Microsoft Search is surely working on them, which suggests this is (perhaps) a $10^9 per year machine learning problem. The exact methodology under use is unlikely to be publicly discussed in the near future because of the competitive environment. Hopefully we’ll have some public “war stories” at some point in the future when this information becomes less sensitive. For now, it’s reassuring to simply note that machine learning is having a big impact.
4 0.66928625 424 hunch net-2011-02-17-What does Watson mean?
Introduction: Watson convincingly beat the best champion Jeopardy! players. The apparent significance of this varies hugely, depending on your background knowledge about the related machine learning, NLP, and search technology. For a random person, this might seem evidence of serious machine intelligence, while for people working on the system itself, it probably seems like a reasonably good assemblage of existing technologies with several twists to make the entire system work. Above all, I think we should congratulate the people who managed to put together and execute this project—many years of effort by a diverse set of highly skilled people were needed to make this happen. In academia, it’s pretty difficult for one professor to assemble that quantity of talent, and in industry it’s rarely the case that such a capable group has both a worthwhile project and the support needed to pursue something like this for several years before success. Alina invited me to the Jeopardy watching party
5 0.6634717 156 hunch net-2006-02-11-Yahoo’s Learning Problems.
6 0.66009158 287 hunch net-2008-01-28-Sufficient Computation
7 0.63795507 200 hunch net-2006-08-03-AOL’s data drop
8 0.63731086 128 hunch net-2005-11-05-The design of a computing cluster
9 0.63628256 345 hunch net-2009-03-08-Prediction Science
10 0.60639435 352 hunch net-2009-05-06-Machine Learning to AI
11 0.55827266 58 hunch net-2005-04-21-Dynamic Programming Generalizations and Their Use
12 0.55724269 314 hunch net-2008-08-24-Mass Customized Medicine in the Future?
13 0.55215371 237 hunch net-2007-04-02-Contextual Scaling
14 0.53951252 112 hunch net-2005-09-14-The Predictionist Viewpoint
15 0.52662885 366 hunch net-2009-08-03-Carbon in Computer Science Research
16 0.52508235 276 hunch net-2007-12-10-Learning Track of International Planning Competition
17 0.52343154 411 hunch net-2010-09-21-Regretting the dead
18 0.52098632 153 hunch net-2006-02-02-Introspectionism as a Disease
19 0.52082586 190 hunch net-2006-07-06-Branch Prediction Competition
20 0.51864988 193 hunch net-2006-07-09-The Stock Prediction Machine Learning Problem
topicId topicWeight
[(0, 0.011), (1, 0.029), (3, 0.032), (10, 0.022), (27, 0.184), (37, 0.011), (38, 0.048), (49, 0.012), (53, 0.028), (55, 0.06), (94, 0.45), (95, 0.02)]
simIndex simValue blogId blogTitle
1 0.9880715 42 hunch net-2005-03-17-Going all the Way, Sometimes
Introduction: At many points in research, you face a choice: should I keep on improving some old piece of technology or should I do something new? For example: Should I refine bounds to make them tighter? Should I take some learning theory and turn it into a learning algorithm? Should I implement the learning algorithm? Should I test the learning algorithm widely? Should I release the algorithm as source code? Should I go see what problems people actually need to solve? The universal temptation of people attracted to research is doing something new. That is sometimes the right decision, but is also often not. I’d like to discuss some reasons why not. Expertise: Once expertise is developed on some subject, you are the right person to refine it. What is the real problem? Continually improving a piece of technology is a mechanism forcing you to confront this question. In many cases, this confrontation is uncomfortable because you discover that your method has fundamen
2 0.98469549 81 hunch net-2005-06-13-Wikis for Summer Schools and Workshops
Introduction: Chicago ’05 ended a couple of weeks ago. This was the sixth Machine Learning Summer School , and the second one that used a wiki . (The first was Berder ’04, thanks to Gunnar Raetsch.) Wikis are relatively easy to set up, greatly aid social interaction, and should be used a lot more at summer schools and workshops. They can even be used as the meeting’s webpage, as a permanent record of its participants’ collaborations — see for example the wiki/website for last year’s NVO Summer School . A basic wiki is a collection of editable webpages, maintained by software called a wiki engine . The engine used at both Berder and Chicago was TikiWiki — it is well documented and gets you something running fast. It uses PHP and MySQL, but doesn’t require you to know either. Tikiwiki has far more features than most wikis, as it is really a full Content Management System . (My thanks to Sebastian Stark for pointing this out.) Here are the features we found most useful: Bulletin boa
3 0.98414236 115 hunch net-2005-09-26-Prediction Bounds as the Mathematics of Science
Introduction: “Science” has many meanings, but one common meaning is “the scientific method”, which is a principled method for investigating the world using the following steps: Form a hypothesis about the world. Use the hypothesis to make predictions. Run experiments to confirm or disprove the predictions. The ordering of these steps is very important to the scientific method. In particular, predictions must be made before experiments are run. Given that we all believe in the scientific method of investigation, it may be surprising to learn that cheating is very common. This happens for many reasons, some innocent and some not. Drug studies. Pharmaceutical companies make predictions about the effects of their drugs and then conduct blind clinical studies to determine their effect. Unfortunately, they have also been caught using some of the more advanced techniques for cheating here, including “reprobleming”, “data set selection”, and probably “overfitting by review”
4 0.9800449 346 hunch net-2009-03-18-Parallel ML primitives
Introduction: Previously, we discussed parallel machine learning a bit. As parallel ML is rather difficult, I’d like to describe my thinking at the moment, and ask for advice from the rest of the world. This is particularly relevant right now, as I’m attending a workshop tomorrow on parallel ML. Parallelizing slow algorithms seems uncompelling. Parallelizing many algorithms also seems uncompelling, because the effort required to parallelize is substantial. This leaves the question: Which one fast algorithm is the best to parallelize? What is a substantially different second? One compellingly fast simple algorithm is online gradient descent on a linear representation. This is the core of Leon’s sgd code and Vowpal Wabbit . Antoine Bordes showed a variant was competitive in the large scale learning challenge . It’s also a decades old primitive which has been reused in many algorithms, and continues to be reused. It also applies to online learning rather than just online optimiz
5 0.97705835 35 hunch net-2005-03-04-The Big O and Constants in Learning
Introduction: The notation g(n) = O(f(n)) means that in the limit as n approaches infinity there exists a constant C such that g(n) is less than Cf(n). In learning theory, there are many statements about learning algorithms of the form “under assumptions x, y, and z, the classifier learned has an error rate of at most O(f(m))”. There is one very good reason to use O(): it helps you understand the big picture and neglect the minor details which are not important in the big picture. However, there are some important reasons not to do this as well. Unspeedup: In algorithm analysis, the use of O() for time complexity is pervasive and well-justified. Determining the exact value of C is inherently computer architecture dependent. (The “C” for x86 processors might differ from the “C” on PowerPC processors.) Since many learning theorists come from a CS theory background, the O() notation is applied to generalization error. The O() abstraction breaks here—you can not genera
same-blog 6 0.95999348 120 hunch net-2005-10-10-Predictive Search is Coming
7 0.92817283 276 hunch net-2007-12-10-Learning Track of International Planning Competition
8 0.89164841 221 hunch net-2006-12-04-Structural Problems in NIPS Decision Making
9 0.83121628 136 hunch net-2005-12-07-Is the Google way the way for machine learning?
10 0.82661676 229 hunch net-2007-01-26-Parallel Machine Learning Problems
11 0.78178185 286 hunch net-2008-01-25-Turing’s Club for Machine Learning
12 0.76497197 441 hunch net-2011-08-15-Vowpal Wabbit 6.0
13 0.75455755 73 hunch net-2005-05-17-A Short Guide to PhD Graduate Study
14 0.75305474 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning
15 0.74998057 253 hunch net-2007-07-06-Idempotent-capable Predictors
16 0.74711025 306 hunch net-2008-07-02-Proprietary Data in Academic Research?
17 0.74155414 162 hunch net-2006-03-09-Use of Notation
18 0.73728156 419 hunch net-2010-12-04-Vowpal Wabbit, version 5.0, and the second heresy
19 0.73674297 178 hunch net-2006-05-08-Big machine learning
20 0.73604316 193 hunch net-2006-07-09-The Stock Prediction Machine Learning Problem