hunch_net hunch_net-2011 hunch_net-2011-423 knowledge-graph by maker-knowledge-mining

423 hunch net-2011-02-02-User preferences for search engines


meta info for this blog

Source: html

Introduction: I want to comment on the “Bing copies Google” discussion here , here , and here , because there are data-related issues which the general public may not understand, and some of the framing seems substantially misleading to me. As a not-distant-outsider, let me mention the sources of bias I may have. I work at Yahoo! , which has started using Bing . This might predispose me towards Bing, but on the other hand I’m still at Yahoo!, and have been using Linux exclusively as an OS for many years, including even a couple minor kernel patches. And, on the gripping hand , I’ve spent quite a bit of time thinking about the basic principles of incorporating user feedback in machine learning . Also note, this post is not related to official Yahoo! policy, it’s just my personal view. The issue Google engineers inserted synthetic responses to synthetic queries on google.com, then executed the synthetic searches on google.com using Internet Explorer with the Bing toolbar and later


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 And, on the gripping hand , I’ve spent quite a bit of time thinking about the basic principles of incorporating user feedback in machine learning . [sent-7, score-0.47]

2 The issue: Google engineers inserted synthetic responses to synthetic queries on google.com. [sent-10, score-0.631]

3 They then executed the synthetic searches on google.com using Internet Explorer with the Bing toolbar and later noticed some synthetic responses from Bing with the synthetic queries. [sent-12, score-0.55]

4 One is the privacy disagreement “ Big Brother Microsoft is looking at what I search and using it”. [sent-14, score-0.595]

5 In the end, I think companies should simply do their best to accept a user’s wishes, so those who want privacy can have it, and those who want to contribute their data towards improving a search engine can do so. [sent-16, score-0.808]

6 What I believe happened was a user feedback process, where users queried Google, clicked on a result, informed Microsoft/Bing of the query and clicked result, and their preference was used to promote the search result within Bing. [sent-21, score-1.222]

7 Should a user be allowed to: Reveal to their chosen search engine their preferred result? [sent-23, score-0.83]

8 Reveal to a competitor’s search engine their preferred result? [sent-24, score-0.497]

9 If you answer ‘no’ to the first, you are deeply against user freedom in a manner I can’t sympathize with. [sent-25, score-0.442]

10 If you answer ‘yes’ to the first, and ‘no’ to the second, then you are still somewhat against user freedom. [sent-26, score-0.462]

11 However, in all instances I’m aware of, users knowingly agree to a contract providing access to the information with limitations. [sent-29, score-0.449]

12 You could argue that it’s ok for Microsoft to take advantage of revealed user interaction, but it’s still a matter of following rather than leading. [sent-33, score-0.497]

13 A basic truth seen in many ways, is that the proper incorporation of new sources of information always improves results. [sent-35, score-0.308]

14 More generally, it’s true in basic knowledge engineering, where people fuse sources of information to create a better system, and I’m virtually certain it’s true of the ranking algorithms behind Google and Bing, which are surely complex beasts taking into account many sources of information. [sent-37, score-0.64]

15 If that’s the case, Google will either follow Microsoft’s lead taking into account user feedback as Microsoft does, or risk becoming obsolete. [sent-39, score-0.497]

16 A basic truth, is that building a successful search engine is extraordinarily difficult. [sent-41, score-0.427]

17 This is revealed by search market share, but also by simply thinking about the logistics involved. [sent-42, score-0.363]

18 If we prefer a future where there is a healthy competition amongst search engines, then it’s important to lower these barriers to entry so new people with new ideas can more easily test them out. [sent-44, score-0.405]

19 One way to lower the barrier to entry is to accept that users can share their interaction, even with a competitor’s search engine. [sent-45, score-0.673]

20 I would be more sympathetic to a position for allowing users of Internet Explorer a built-in means to choose to share their search behavior with Google or other search engines on an equal footing. [sent-48, score-1.038]
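Sentence 6 above describes the mechanism the author believes was at work: a browser toolbar reports which result a user clicked for a query, and the aggregated clicks are used to promote that result. The toy sketch below illustrates such a feedback loop; every name, URL, and weight is hypothetical, and this is not any search engine's actual pipeline.

```python
# Toy click-feedback loop: a toolbar reports (query, clicked url) pairs, and the
# aggregate clicks become one extra ranking signal. All identifiers are illustrative.
from collections import Counter, defaultdict

click_counts = defaultdict(Counter)  # query -> Counter of clicked urls


def record_click(query: str, url: str) -> None:
    """Log one user-reported (query, clicked result) pair."""
    click_counts[query.lower()][url] += 1


def rerank(query: str, candidates: list[str], weight: float = 1.0) -> list[str]:
    """Promote candidates in proportion to how often users clicked them for this query."""
    clicks = click_counts[query.lower()]
    prior = {url: -i for i, url in enumerate(candidates)}  # original order as a weak prior
    return sorted(candidates, key=lambda u: prior[u] + weight * clicks[u], reverse=True)


for _ in range(3):
    record_click("some query", "http://example.com/b")
print(rerank("some query", ["http://example.com/a", "http://example.com/b"]))
# -> ['http://example.com/b', 'http://example.com/a']
```

In a real system, click counts would more likely enter a learned ranker as one feature among many rather than as a direct score adjustment.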
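The summary above is labeled as coming from a tfidf model, with a sentScore attached to each extracted sentence. A plausible reading, shown as a minimal sketch, is that each sentence is scored by the total tfidf weight of its words; the vectorizer settings and scoring rule here are assumptions, not the pipeline actually used to generate this page.

```python
# Minimal sketch of tfidf-based extractive summarization:
# score each sentence by the sum of its term weights, keep the top k.
from sklearn.feature_extraction.text import TfidfVectorizer


def top_sentences(sentences, k=3):
    """Return the k highest-scoring (index, score, text) triples."""
    X = TfidfVectorizer(stop_words="english").fit_transform(sentences)  # rows = sentences
    scores = X.sum(axis=1).A1  # sentence score = sum of its term weights
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [(i, round(float(scores[i]), 3), sentences[i]) for i in ranked[:k]]


sents = [
    "Google engineers inserted synthetic responses to synthetic queries.",
    "Users can share their interaction with a competitor's search engine.",
    "This post is not related to official Yahoo! policy.",
]
for idx, score, text in top_sentences(sents, k=2):
    print(idx, score, text)
```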


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('user', 0.333), ('bing', 0.329), ('search', 0.276), ('google', 0.239), ('synthetic', 0.205), ('users', 0.19), ('microsoft', 0.171), ('privacy', 0.152), ('engine', 0.151), ('issue', 0.139), ('sources', 0.135), ('sympathetic', 0.131), ('information', 0.11), ('disagreement', 0.109), ('internet', 0.099), ('result', 0.098), ('explorer', 0.094), ('clicked', 0.094), ('competitor', 0.094), ('revealed', 0.087), ('contract', 0.087), ('reveal', 0.087), ('share', 0.083), ('engines', 0.082), ('responses', 0.082), ('yahoo', 0.082), ('true', 0.08), ('still', 0.077), ('yes', 0.075), ('dealt', 0.075), ('incorporating', 0.073), ('informed', 0.073), ('entry', 0.073), ('preferred', 0.07), ('towards', 0.068), ('interaction', 0.065), ('feedback', 0.064), ('truth', 0.063), ('instances', 0.062), ('using', 0.058), ('manner', 0.057), ('discussion', 0.056), ('competition', 0.056), ('want', 0.055), ('entirely', 0.054), ('argument', 0.053), ('answer', 0.052), ('accept', 0.051), ('account', 0.05), ('taking', 0.05)]
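The weights above form a sparse tfidf vector for this post, and the simValue numbers in the list below are presumably cosine similarities between such vectors. A minimal sketch of that comparison follows; the function names and vectorizer settings are assumptions rather than the pipeline actually used.

```python
# Minimal sketch: rank documents by cosine similarity of tfidf vectors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def similar_posts(query_idx, docs, top_n=5):
    """Rank docs by cosine similarity of their tfidf vectors to docs[query_idx]."""
    X = TfidfVectorizer(stop_words="english").fit_transform(docs)
    sims = cosine_similarity(X[query_idx], X).ravel()  # the post scores ~1.0 against itself
    order = sims.argsort()[::-1]
    return [(round(float(sims[i]), 7), i) for i in order[:top_n]]


docs = [
    "user preferences for search engines and user feedback",
    "AOL has released several large search engine related datasets",
    "the second Netflix prize is canceled due to privacy problems",
]
print(similar_posts(0, docs, top_n=3))
```

The same-blog entry in the list scores near 1.0 because a document is maximally similar to itself.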

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9999994 423 hunch net-2011-02-02-User preferences for search engines

Introduction: I want to comment on the “Bing copies Google” discussion here , here , and here , because there are data-related issues which the general public may not understand, and some of the framing seems substantially misleading to me. As a not-distant-outsider, let me mention the sources of bias I may have. I work at Yahoo! , which has started using Bing . This might predispose me towards Bing, but on the other hand I’m still at Yahoo!, and have been using Linux exclusively as an OS for many years, including even a couple minor kernel patches. And, on the gripping hand , I’ve spent quite a bit of time thinking about the basic principles of incorporating user feedback in machine learning . Also note, this post is not related to official Yahoo! policy, it’s just my personal view. The issue Google engineers inserted synthetic responses to synthetic queries on google.com, then executed the synthetic searches on google.com using Internet Explorer with the Bing toolbar and later

2 0.1919615 200 hunch net-2006-08-03-AOL’s data drop

Introduction: AOL has released several large search engine related datasets. This looks like a pretty impressive data release, and it is a big opportunity for people everywhere to worry about search engine related learning problems, if they want.

3 0.19041228 390 hunch net-2010-03-12-Netflix Challenge 2 Canceled

Introduction: The second Netflix prize is canceled due to privacy problems . I continue to believe my original assessment of this paper, that the privacy break was somewhat overstated. I still haven’t seen any serious privacy failures on the scale of the AOL search log release . I expect privacy concerns to continue to be a big issue when dealing with data releases by companies or governments. The theory of maintaining privacy while using data is improving, but it is not yet in a state where the limits of what’s possible are clear let alone how to achieve these limits in a manner friendly to a prediction competition.

4 0.17477106 156 hunch net-2006-02-11-Yahoo’s Learning Problems.

Introduction: I just visited Yahoo Research which has several fundamental learning problems near to (or beyond) the set of problems we know how to solve well. Here are 3 of them. Ranking This is the canonical problem of all search engines. It is made extra difficult for several reasons. There is relatively little “good” supervised learning data and a great deal of data with some signal (such as click through rates). The learning must occur in a partially adversarial environment. Many people very actively attempt to place themselves at the top of rankings. It is not even quite clear whether the problem should be posed as ‘ranking’ or as ‘regression’ which is then used to produce a ranking. Collaborative filtering Yahoo has a large number of recommendation systems for music, movies, etc… In these sorts of systems, users specify how they liked a set of things, and then the system can (hopefully) find some more examples of things they might like by reasoning across multiple

5 0.16370241 120 hunch net-2005-10-10-Predictive Search is Coming

Introduction: “Search” is the other branch of AI research which has been successful. Concrete examples include Deep Blue which beat the world chess champion and Chinook the champion checkers program. A set of core search techniques exist including A*, alpha-beta pruning, and others that can be applied to any of many different search problems. Given this, it may be surprising to learn that there has been relatively little successful work on combining prediction and search. Given also that humans typically solve search problems using a number of predictive heuristics to narrow in on a solution, we might be surprised again. However, the big successful search-based systems have typically not used “smart” search algorithms. Instead, they have optimized for very fast search. This is not for lack of trying… many people have tried to synthesize search and prediction to various degrees of success. For example, Knightcap achieves good-but-not-stellar chess playing performance, and TD-gammon

6 0.15757462 260 hunch net-2007-08-25-The Privacy Problem

7 0.14611343 178 hunch net-2006-05-08-Big machine learning

8 0.12764443 345 hunch net-2009-03-08-Prediction Science

9 0.12219543 393 hunch net-2010-04-14-MLcomp: a website for objectively comparing ML algorithms

10 0.11565717 269 hunch net-2007-10-24-Contextual Bandits

11 0.11110725 464 hunch net-2012-05-03-Microsoft Research, New York City

12 0.10435653 10 hunch net-2005-02-02-Kolmogorov Complexity and Googling

13 0.10198189 297 hunch net-2008-04-22-Taking the next step

14 0.099962905 420 hunch net-2010-12-26-NIPS 2010

15 0.099425681 364 hunch net-2009-07-11-Interesting papers at KDD

16 0.098948769 371 hunch net-2009-09-21-Netflix finishes (and starts)

17 0.09843912 132 hunch net-2005-11-26-The Design of an Optimal Research Environment

18 0.091675811 446 hunch net-2011-10-03-Monday announcements

19 0.0893379 16 hunch net-2005-02-09-Intuitions from applied learning

20 0.089014344 237 hunch net-2007-04-02-Contextual Scaling
20 0.089014344 237 hunch net-2007-04-02-Contextual Scaling


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.207), (1, -0.012), (2, -0.063), (3, 0.077), (4, -0.031), (5, -0.015), (6, -0.056), (7, 0.036), (8, -0.034), (9, -0.033), (10, -0.074), (11, 0.152), (12, -0.066), (13, 0.1), (14, -0.162), (15, 0.084), (16, -0.123), (17, -0.065), (18, 0.118), (19, -0.075), (20, 0.105), (21, -0.06), (22, 0.056), (23, 0.055), (24, -0.075), (25, 0.161), (26, 0.129), (27, 0.086), (28, -0.013), (29, 0.016), (30, -0.075), (31, -0.014), (32, -0.031), (33, -0.072), (34, -0.039), (35, -0.039), (36, 0.073), (37, 0.004), (38, 0.057), (39, -0.026), (40, -0.026), (41, 0.032), (42, -0.093), (43, -0.108), (44, 0.081), (45, -0.049), (46, 0.079), (47, 0.032), (48, 0.037), (49, -0.022)]
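The (topicId, topicWeight) pairs above look like a 50-dimensional LSI representation of the post. A standard way to produce such weights is a truncated SVD of the tfidf term-document matrix; the sketch below shows that construction under the assumption of 50 topics, with all other names and settings illustrative.

```python
# Minimal sketch of LSI: truncated SVD of the tfidf matrix, 50 topics per document.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD


def lsi_topic_weights(docs, n_topics=50):
    """Return, for each document, a list of (topicId, topicWeight) pairs."""
    X = TfidfVectorizer(stop_words="english").fit_transform(docs)  # needs more terms than topics
    Z = TruncatedSVD(n_components=n_topics, random_state=0).fit_transform(X)
    return [[(t, round(float(w), 3)) for t, w in enumerate(row)] for row in Z]
```

Similarity in the list that follows would then be cosine similarity between these 50-dimensional topic rows rather than between raw tfidf vectors.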

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97852379 423 hunch net-2011-02-02-User preferences for search engines

Introduction: I want to comment on the “Bing copies Google” discussion here , here , and here , because there are data-related issues which the general public may not understand, and some of the framing seems substantially misleading to me. As a not-distant-outsider, let me mention the sources of bias I may have. I work at Yahoo! , which has started using Bing . This might predispose me towards Bing, but on the other hand I’m still at Yahoo!, and have been using Linux exclusively as an OS for many years, including even a couple minor kernel patches. And, on the gripping hand , I’ve spent quite a bit of time thinking about the basic principles of incorporating user feedback in machine learning . Also note, this post is not related to official Yahoo! policy, it’s just my personal view. The issue Google engineers inserted synthetic responses to synthetic queries on google.com, then executed the synthetic searches on google.com using Internet Explorer with the Bing toolbar and later

2 0.80873102 200 hunch net-2006-08-03-AOL’s data drop

Introduction: AOL has released several large search engine related datasets. This looks like a pretty impressive data release, and it is a big opportunity for people everywhere to worry about search engine related learning problems, if they want.

3 0.71716803 156 hunch net-2006-02-11-Yahoo’s Learning Problems.

Introduction: I just visited Yahoo Research which has several fundamental learning problems near to (or beyond) the set of problems we know how to solve well. Here are 3 of them. Ranking This is the canonical problem of all search engines. It is made extra difficult for several reasons. There is relatively little “good” supervised learning data and a great deal of data with some signal (such as click through rates). The learning must occur in a partially adversarial environment. Many people very actively attempt to place themselves at the top of rankings. It is not even quite clear whether the problem should be posed as ‘ranking’ or as ‘regression’ which is then used to produce a ranking. Collaborative filtering Yahoo has a large number of recommendation systems for music, movies, etc… In these sorts of systems, users specify how they liked a set of things, and then the system can (hopefully) find some more examples of things they might like by reasoning across multiple

4 0.70940328 178 hunch net-2006-05-08-Big machine learning

Introduction: According to the New York Times , Yahoo is releasing Project Panama shortly . Project Panama is about better predicting which advertisements are relevant to a search, implying a higher click through rate, implying larger income for Yahoo . There are two things that seem interesting here: A significant portion of that improved accuracy is almost certainly machine learning at work. The quantitative effect is huge—the estimate in the article is $600*10^6 . Google already has such improvements and Microsoft Search is surely working on them, which suggests this is (perhaps) a $10^9 per year machine learning problem. The exact methodology under use is unlikely to be publicly discussed in the near future because of the competitive environment. Hopefully we’ll have some public “war stories” at some point in the future when this information becomes less sensitive. For now, it’s reassuring to simply note that machine learning is having a big impact.

5 0.69889468 390 hunch net-2010-03-12-Netflix Challenge 2 Canceled

Introduction: The second Netflix prize is canceled due to privacy problems . I continue to believe my original assessment of this paper, that the privacy break was somewhat overstated. I still haven’t seen any serious privacy failures on the scale of the AOL search log release . I expect privacy concerns to continue to be a big issue when dealing with data releases by companies or governments. The theory of maintaining privacy while using data is improving, but it is not yet in a state where the limits of what’s possible are clear let alone how to achieve these limits in a manner friendly to a prediction competition.

6 0.65033907 120 hunch net-2005-10-10-Predictive Search is Coming

7 0.63598937 260 hunch net-2007-08-25-The Privacy Problem

8 0.55529124 364 hunch net-2009-07-11-Interesting papers at KDD

9 0.55262929 312 hunch net-2008-08-04-Electoralmarkets.com

10 0.50956202 345 hunch net-2009-03-08-Prediction Science

11 0.50149864 125 hunch net-2005-10-20-Machine Learning in the News

12 0.48800534 406 hunch net-2010-08-22-KDD 2010

13 0.46221405 420 hunch net-2010-12-26-NIPS 2010

14 0.46201116 464 hunch net-2012-05-03-Microsoft Research, New York City

15 0.44520754 10 hunch net-2005-02-02-Kolmogorov Complexity and Googling

16 0.42738676 269 hunch net-2007-10-24-Contextual Bandits

17 0.4242028 424 hunch net-2011-02-17-What does Watson mean?

18 0.42342386 217 hunch net-2006-11-06-Data Linkage Problems

19 0.41956323 193 hunch net-2006-07-09-The Stock Prediction Machine Learning Problem

20 0.40926725 241 hunch net-2007-04-28-The Coming Patent Apocalypse


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(3, 0.034), (27, 0.14), (38, 0.056), (48, 0.012), (53, 0.063), (55, 0.136), (64, 0.025), (68, 0.016), (77, 0.023), (79, 0.24), (94, 0.12), (95, 0.051)]
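The sparse (topicId, topicWeight) list above is consistent with a per-document LDA topic distribution in which only topics above a small threshold are kept. Below is a minimal sketch of that computation, assuming roughly 100 topics (the ids shown run up to 95) and fitting on raw word counts, as LDA expects; the threshold and all names are assumptions.

```python
# Minimal sketch of per-document LDA topic weights, keeping only non-negligible topics.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation


def lda_topic_weights(docs, n_topics=100, threshold=0.01):
    """Return, for each document, the (topicId, topicWeight) pairs with weight > threshold."""
    counts = CountVectorizer(stop_words="english").fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    theta = lda.fit_transform(counts)  # rows are topic distributions summing to 1
    return [[(t, round(float(w), 3)) for t, w in enumerate(row) if w > threshold]
            for row in theta]
```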

similar blogs list:

simIndex simValue blogId blogTitle

1 0.89569974 162 hunch net-2006-03-09-Use of Notation

Introduction: For most people, a mathematical notation is like a language: you learn it and stick with it. For people doing mathematical research, however, this is not enough: they must design new notations for new problems. The design of good notation is both hard and worthwhile since a bad initial notation can retard a line of research greatly. Before we had mathematical notation, equations were all written out in language. Since words have multiple meanings and variable precedences, long equations written out in language can be extraordinarily difficult and sometimes fundamentally ambiguous. A good representative example of this is the legalese in the tax code. Since we want greater precision and clarity, we adopt mathematical notation. One fundamental thing to understand about mathematical notation is that humans, as logic verifiers, are barely capable. This is the fundamental reason why one notation can be much better than another. This observation is easier to miss than you might

2 0.86706555 248 hunch net-2007-06-19-How is Compressed Sensing going to change Machine Learning ?

Introduction: Compressed Sensing (CS) is a new framework developed by Emmanuel Candes , Terry Tao and David Donoho . To summarize, if you acquire a signal in some basis that is incoherent with the basis in which you know the signal to be sparse, it is very likely you will be able to reconstruct the signal from these incoherent projections. Terry Tao, the recent Fields medalist , does a very nice job at explaining the framework here . He goes further in the theory description in this post where he mentions the central issue of the Uniform Uncertainty Principle. It so happens that random projections are on average incoherent, within the UUP meaning, with most known bases (sines, polynomials, splines, wavelets, curvelets …) and are therefore an ideal basis for Compressed Sensing. [ For more in-depth information on the subject, the Rice group has done a very good job at providing a central library of papers relevant to the growing subject: http://www.dsp.ece.rice.edu/cs/ ] The Machine

same-blog 3 0.84858483 423 hunch net-2011-02-02-User preferences for search engines

Introduction: I want to comment on the “Bing copies Google” discussion here , here , and here , because there are data-related issues which the general public may not understand, and some of the framing seems substantially misleading to me. As a not-distant-outsider, let me mention the sources of bias I may have. I work at Yahoo! , which has started using Bing . This might predispose me towards Bing, but on the other hand I’m still at Yahoo!, and have been using Linux exclusively as an OS for many years, including even a couple minor kernel patches. And, on the gripping hand , I’ve spent quite a bit of time thinking about the basic principles of incorporating user feedback in machine learning . Also note, this post is not related to official Yahoo! policy, it’s just my personal view. The issue Google engineers inserted synthetic responses to synthetic queries on google.com, then executed the synthetic searches on google.com using Internet Explorer with the Bing toolbar and later

4 0.81702077 204 hunch net-2006-08-28-Learning Theory standards for NIPS 2006

Introduction: Bob Williamson and I are the learning theory PC members at NIPS this year. This is some attempt to state the standards and tests I applied to the papers. I think it is a good idea to talk about this for two reasons: Making community standards a matter of public record seems healthy. It gives us a chance to debate what is and is not the right standard. It might even give us a bit more consistency across the years. It may save us all time. There are a number of papers submitted which just aren’t there yet. Avoiding submitting is the right decision in this case. There are several criteria for judging a paper. All of these were active this year. Some criteria are uncontroversial while others may not be so. The paper must have a theorem establishing something new for which it is possible to derive high confidence in the correctness of the results. A surprising number of papers fail this test. This criterion seems essential to the definition of “theory”. Missing theo

5 0.78613579 354 hunch net-2009-05-17-Server Update

Introduction: The hunch.net server has been updated. I’ve taken the opportunity to upgrade the version of wordpress which caused cascading changes. Old threaded comments are now flattened. The system we used to use ( Brian’s threaded comments ) appears incompatible with the new threading system built into wordpress. I haven’t yet figured out a workaround. I setup a feedburner account . I added an RSS aggregator for both Machine Learning and other research blogs that I like to follow. This is something that I’ve wanted to do for awhile. Many other minor changes in font and format, with some help from Alina . If you have any suggestions for site tweaks, please speak up.

6 0.77844572 254 hunch net-2007-07-12-ICML Trends

7 0.7386421 27 hunch net-2005-02-23-Problem: Reinforcement Learning with Classification

8 0.68276864 40 hunch net-2005-03-13-Avoiding Bad Reviewing

9 0.67619932 221 hunch net-2006-12-04-Structural Problems in NIPS Decision Making

10 0.67544842 437 hunch net-2011-07-10-ICML 2011 and the future

11 0.66522908 75 hunch net-2005-05-28-Running A Machine Learning Summer School

12 0.66476136 169 hunch net-2006-04-05-What is state?

13 0.664258 116 hunch net-2005-09-30-Research in conferences

14 0.66417003 452 hunch net-2012-01-04-Why ICML? and the summer conferences

15 0.66341686 286 hunch net-2008-01-25-Turing’s Club for Machine Learning

16 0.6629256 461 hunch net-2012-04-09-ICML author feedback is open

17 0.66269779 146 hunch net-2006-01-06-MLTV

18 0.65963483 96 hunch net-2005-07-21-Six Months

19 0.65739566 141 hunch net-2005-12-17-Workshops as Franchise Conferences

20 0.65728009 419 hunch net-2010-12-04-Vowpal Wabbit, version 5.0, and the second heresy