hunch_net hunch_net-2008 hunch_net-2008-312 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Lance reminded me about electoralmarkets today, which is cool enough that I want to point it out explicitly here. Most people still use polls to predict who wins, while electoralmarkets uses people betting real money. The bettors might use polling information, but any other sources of information are implicitly also allowed. A side-by-side comparison of polls and prediction markets might be fun in a few months.
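The post leaves the mechanics of such a comparison open. One minimal way to score it would be to reduce both polls and market prices to per-state win probabilities and compare their Brier scores; the sketch below does that on made-up numbers (nothing here is real 2008 data, and the state abbreviations and probabilities are purely hypothetical).

```python
# Minimal sketch of a poll vs. prediction-market comparison, assuming both
# sources are reduced to per-state win probabilities for the same candidate.
# All numbers below are hypothetical.

def brier_score(forecasts, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(forecasts)

# state: (poll-based probability, market price, actual outcome 1=win, 0=loss)
hypothetical = {
    "OH": (0.55, 0.62, 1),
    "FL": (0.48, 0.51, 1),
    "MO": (0.50, 0.45, 0),
}

poll_probs   = [v[0] for v in hypothetical.values()]
market_probs = [v[1] for v in hypothetical.values()]
outcomes     = [v[2] for v in hypothetical.values()]

print("poll Brier:  ", brier_score(poll_probs, outcomes))
print("market Brier:", brier_score(market_probs, outcomes))
```

Lower Brier score means better-calibrated forecasts, so a side-by-side run over all states would give one concrete version of the comparison the post suggests.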
Sentence-level scores (sentIndex, sentText, sentNum, sentScore):
1 Lance reminded me about electoralmarkets today, which is cool enough that I want to point it out explicitly here. [sent-1, score-1.195]
2 Most people still use polls to predict who wins, while electoralmarkets uses people betting real money. [sent-2, score-1.827]
3 They might use polling information, but any other sources of information are implicitly also allowed. [sent-3, score-0.659]
4 A side-by-side comparison of how polls compare to prediction markets might be fun in a few months. [sent-4, score-1.281]
Top TF-IDF word weights (wordName, wordTfidf):
[('electoralmarkets', 0.51), ('polls', 0.51), ('betting', 0.227), ('reminded', 0.21), ('wins', 0.198), ('lance', 0.181), ('markets', 0.175), ('implicitly', 0.16), ('today', 0.16), ('months', 0.156), ('compare', 0.147), ('fun', 0.141), ('comparison', 0.139), ('cool', 0.136), ('information', 0.133), ('sources', 0.13), ('explicitly', 0.121), ('uses', 0.101), ('might', 0.099), ('use', 0.096), ('still', 0.093), ('enough', 0.085), ('predict', 0.084), ('people', 0.07), ('prediction', 0.07), ('point', 0.067), ('real', 0.066), ('want', 0.066), ('also', 0.041)]
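The weighting scheme behind these numbers is not documented in this dump. The sketch below shows one common way TF-IDF word weights are computed, and how a sentence score like those in the table above could plausibly be an aggregate of its words' weights; both the toy corpus and the aggregation rule are assumptions, not the pipeline's actual behavior.

```python
# Rough sketch of per-word TF-IDF weights, plus a sentence score formed by
# summing word weights. The actual pipeline behind the numbers above is not
# documented here; this is only one plausible reconstruction.
import math
from collections import Counter

def tfidf(doc_tokens, corpus):
    """Return {word: tf * idf} for one document against a small corpus."""
    tf = Counter(doc_tokens)
    n_docs = len(corpus)
    weights = {}
    for word, count in tf.items():
        df = sum(1 for d in corpus if word in d)      # document frequency
        idf = math.log((1 + n_docs) / (1 + df)) + 1   # smoothed idf
        weights[word] = (count / len(doc_tokens)) * idf
    return weights

corpus = [
    ["polls", "predict", "wins"],                     # toy documents
    ["electoralmarkets", "betting", "real", "money"],
    ["polls", "prediction", "markets", "comparison"],
]
weights = tfidf(corpus[2], corpus)
print(weights)

# A sentence score could then be the sum of its words' weights (assumption).
sentence = ["polls", "prediction", "markets"]
print(sum(weights.get(w, 0.0) for w in sentence))
```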
Most similar posts (simIndex, simValue, blogId, blogTitle); see the similarity sketch after this list:
same-blog 1 1.0 312 hunch net-2008-08-04-Electoralmarkets.com
2 0.14867298 214 hunch net-2006-10-13-David Pennock starts Oddhead
Introduction: his blog on information markets and other research topics.
3 0.088206559 345 hunch net-2009-03-08-Prediction Science
Introduction: One view of machine learning is that it’s about how to program computers to predict well. This suggests a broader research program centered around the more pervasive goal of simply predicting well. There are many distinct strands of this broader research program which are only partially unified. Here are the ones that I know of: Learning Theory. Learning theory focuses on several topics related to the dynamics and process of prediction. Convergence bounds like the VC bound give an intellectual foundation to many learning algorithms. Online learning algorithms like Weighted Majority provide an alternate purely game theoretic foundation for learning. Boosting algorithms yield algorithms for purifying prediction ability. Reduction algorithms provide means for changing esoteric problems into well known ones. Machine Learning. A great deal of experience has accumulated in practical algorithm design from a mixture of paradigms, including bayesian, biological, opt
4 0.067496613 237 hunch net-2007-04-02-Contextual Scaling
Introduction: Machine learning has a new kind of “scaling to larger problems” to worry about: scaling with the amount of contextual information. The standard development path for a machine learning application in practice seems to be the following: Marginal. In the beginning, there was “majority vote”. At this stage, it isn’t necessary to understand that you have a prediction problem. People just realize that one answer is right sometimes and another answer other times. In machine learning terms, this corresponds to making a prediction without side information. First context. A clever person realizes that some bit of information x_1 could be helpful. If x_1 is discrete, they condition on it and make a predictor h(x_1), typically by counting. If they are clever, then they also do some smoothing. If x_1 is some real valued parameter, it’s very common to make a threshold cutoff. Often, these tasks are simply done by hand. Second. Another clever person (or perhaps the s
5 0.05581636 92 hunch net-2005-07-11-AAAI blog
Introduction: The AAAI conference is running a student blog which looks like a fun experiment.
6 0.053447019 217 hunch net-2006-11-06-Data Linkage Problems
7 0.052589662 1 hunch net-2005-01-19-Why I decided to run a weblog.
8 0.05150523 423 hunch net-2011-02-02-User preferences for search engines
9 0.04899076 16 hunch net-2005-02-09-Intuitions from applied learning
10 0.047825541 108 hunch net-2005-09-06-A link
11 0.047666863 454 hunch net-2012-01-30-ICML Posters and Scope
12 0.043122403 277 hunch net-2007-12-12-Workshop Summary—Principles of Learning Problem Design
13 0.042018257 22 hunch net-2005-02-18-What it means to do research.
14 0.040584333 389 hunch net-2010-02-26-Yahoo! ML events
15 0.040576641 427 hunch net-2011-03-20-KDD Cup 2011
16 0.03958961 452 hunch net-2012-01-04-Why ICML? and the summer conferences
17 0.039538473 483 hunch net-2013-06-10-The Large Scale Learning class notes
18 0.039474078 112 hunch net-2005-09-14-The Predictionist Viewpoint
19 0.039463017 193 hunch net-2006-07-09-The Stock Prediction Machine Learning Problem
20 0.038871035 429 hunch net-2011-04-06-COLT open questions
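The simValue column behaves like a vector-space similarity: in this ranking the post scores 1.0 against itself, consistent with a normalized measure. Assuming something like cosine similarity over sparse word-weight vectors (the actual representation used by the pipeline is not documented here), a minimal sketch looks like this, with toy vectors rather than the real ones.

```python
# Cosine similarity over sparse {feature: weight} vectors, as one plausible
# source of simValue-style scores. The vectors below are toy examples.
import math

def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {feature: weight}."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

post_a = {"polls": 0.51, "electoralmarkets": 0.51, "betting": 0.23}
post_b = {"markets": 0.40, "information": 0.30, "betting": 0.10}
print(cosine(post_a, post_a))   # a post is maximally similar to itself (1.0)
print(cosine(post_a, post_b))
```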
Topic weights (topicId, topicWeight):
[(0, 0.074), (1, 0.013), (2, -0.016), (3, 0.024), (4, -0.016), (5, -0.01), (6, -0.014), (7, -0.026), (8, 0.028), (9, 0.003), (10, -0.024), (11, 0.032), (12, 0.007), (13, -0.007), (14, -0.0), (15, -0.004), (16, -0.012), (17, -0.025), (18, 0.013), (19, -0.037), (20, 0.005), (21, -0.032), (22, 0.073), (23, 0.015), (24, -0.009), (25, 0.054), (26, 0.054), (27, 0.015), (28, -0.007), (29, -0.014), (30, 0.004), (31, 0.016), (32, 0.054), (33, -0.037), (34, -0.039), (35, -0.054), (36, -0.028), (37, 0.009), (38, -0.009), (39, 0.017), (40, 0.031), (41, 0.102), (42, -0.001), (43, -0.025), (44, 0.069), (45, -0.01), (46, -0.05), (47, 0.037), (48, 0.068), (49, -0.066)]
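These (topicId, topicWeight) pairs read like coordinates of the post in a low-dimensional latent-topic space. One plausible source for such weights is latent semantic analysis, i.e. truncated SVD of a TF-IDF matrix; that is an assumption about the pipeline, not something documented here. A toy sketch:

```python
# Toy latent-topic weights via LSA (TF-IDF followed by truncated SVD). This is
# only one plausible way topicWeight-style coordinates could arise; the real
# pipeline, corpus, and number of components are unknown here.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

posts = [
    "polls predict who wins",
    "electoralmarkets uses people betting real money",
    "prediction markets and polling information",
    "machine learning and prediction science",
]
X = TfidfVectorizer().fit_transform(posts)
lsa = TruncatedSVD(n_components=2, random_state=0)
topic_weights = lsa.fit_transform(X)
print(topic_weights[1])   # the electoralmarkets toy post's topic coordinates
```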
Most similar posts (simIndex, simValue, blogId, blogTitle):
same-blog 1 0.95518136 312 hunch net-2008-08-04-Electoralmarkets.com
2 0.55189484 150 hunch net-2006-01-23-On Coding via Mutual Information & Bayes Nets
Introduction: Say we have two random variables X,Y with mutual information I(X,Y). Let’s say we want to represent them with a bayes net of the form X <- M -> Y, such that the entropy of M equals the mutual information, i.e. H(M)=I(X,Y). Intuitively, we would like our hidden state to be as simple as possible (entropy wise). The data processing inequality means that H(M)>=I(X,Y), so the mutual information is a lower bound on how simple the M could be. Furthermore, if such a construction existed it would have a nice coding interpretation: one could jointly code X and Y by first coding the mutual information, then coding X with this mutual info (without Y) and coding Y with this mutual info (without X). It turns out that such a construction does not exist in general (Thx Alina Beygelzimer for a counterexample! see below for the sketch). What are the implications of this? Well, it’s hard for me to say, but it does suggest to me that the ‘generative’ model philosophy might be
3 0.51823539 237 hunch net-2007-04-02-Contextual Scaling
Introduction: Machine learning has a new kind of “scaling to larger problems” to worry about: scaling with the amount of contextual information. The standard development path for a machine learning application in practice seems to be the following: Marginal. In the beginning, there was “majority vote”. At this stage, it isn’t necessary to understand that you have a prediction problem. People just realize that one answer is right sometimes and another answer other times. In machine learning terms, this corresponds to making a prediction without side information. First context. A clever person realizes that some bit of information x_1 could be helpful. If x_1 is discrete, they condition on it and make a predictor h(x_1), typically by counting. If they are clever, then they also do some smoothing. If x_1 is some real valued parameter, it’s very common to make a threshold cutoff. Often, these tasks are simply done by hand. Second. Another clever person (or perhaps the s
4 0.51210058 423 hunch net-2011-02-02-User preferences for search engines
Introduction: I want to comment on the “Bing copies Google” discussion here, here, and here, because there are data-related issues which the general public may not understand, and some of the framing seems substantially misleading to me. As a not-distant-outsider, let me mention the sources of bias I may have. I work at Yahoo!, which has started using Bing. This might predispose me towards Bing, but on the other hand I’m still at Yahoo!, and have been using Linux exclusively as an OS for many years, including even a couple minor kernel patches. And, on the gripping hand, I’ve spent quite a bit of time thinking about the basic principles of incorporating user feedback in machine learning. Also note, this post is not related to official Yahoo! policy, it’s just my personal view. The issue: Google engineers inserted synthetic responses to synthetic queries on google.com, then executed the synthetic searches on google.com using Internet Explorer with the Bing toolbar and later
5 0.47521549 217 hunch net-2006-11-06-Data Linkage Problems
Introduction: Data linkage is a problem which seems to come up in various applied machine learning problems. I have heard it mentioned in various data mining contexts, but it seems relatively less studied for systemic reasons. A very simple version of the data linkage problem is a cross hospital patient record merge. Suppose a patient (John Doe) is admitted to a hospital (General Health), treated, and released. Later, John Doe is admitted to a second hospital (Health General), treated, and released. Given a large number of records of this sort, it becomes very tempting to try and predict the outcomes of treatments. This is reasonably straightforward as a machine learning problem if there is a shared unique identifier for John Doe used by General Health and Health General along with time stamps. We can merge the records and create examples of the form “Given symptoms and treatment, did the patient come back to a hospital within the next year?” These examples could be fed into a learning algo
6 0.46505982 200 hunch net-2006-08-03-AOL’s data drop
7 0.44962782 214 hunch net-2006-10-13-David Pennock starts Oddhead
8 0.43846142 205 hunch net-2006-09-07-Objective and subjective interpretations of probability
9 0.43046257 345 hunch net-2009-03-08-Prediction Science
10 0.40837815 120 hunch net-2005-10-10-Predictive Search is Coming
11 0.39316559 298 hunch net-2008-04-26-Eliminating the Birthday Paradox for Universal Features
12 0.38498783 190 hunch net-2006-07-06-Branch Prediction Competition
13 0.37728262 411 hunch net-2010-09-21-Regretting the dead
14 0.37652707 296 hunch net-2008-04-21-The Science 2.0 article
15 0.37085649 102 hunch net-2005-08-11-Why Manifold-Based Dimension Reduction Techniques?
16 0.35915765 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models
17 0.35703972 446 hunch net-2011-10-03-Monday announcements
18 0.35418084 1 hunch net-2005-01-19-Why I decided to run a weblog.
19 0.35414612 459 hunch net-2012-03-13-The Submodularity workshop and Lucca Professorship
20 0.34707773 208 hunch net-2006-09-18-What is missing for online collaborative research?
Topic weights (topicId, topicWeight):
[(10, 0.028), (13, 0.112), (15, 0.353), (27, 0.175), (53, 0.058), (55, 0.071), (95, 0.031)]
Most similar posts (simIndex, simValue, blogId, blogTitle):
same-blog 1 0.87070435 312 hunch net-2008-08-04-Electoralmarkets.com
2 0.52341586 386 hunch net-2010-01-13-Sam Roweis died
Introduction: and I can’t help but remember him. I first met Sam as an undergraduate at Caltech where he was TA for Hopfield’s class, and again when I visited Gatsby, when he invited me to visit Toronto, and at too many conferences to recount. His personality was a combination of enthusiastic and thoughtful, with a great ability to phrase a problem so its solution must be understood. With respect to my own work, Sam was the one who advised me to make my first tutorial, leading to others, and to other things, all of which I’m grateful to him for. In fact, my every interaction with Sam was positive, and that was his way. His death is being called a suicide which is so incompatible with my understanding of Sam that it strains my credibility. But we know that his many responsibilities were great, and it is well understood that basically all sane researchers have legions of inner doubts. Having been depressed now and then myself, it’s helpful to understand at least intellectually
3 0.50641763 9 hunch net-2005-02-01-Watchword: Loss
Introduction: A loss function is some function which, for any example, takes a prediction and the correct prediction, and determines how much loss is incurred. (People sometimes attempt to optimize functions of more than one example such as “area under the ROC curve” or “harmonic mean of precision and recall”.) Typically we try to find predictors that minimize loss. There seems to be a strong dichotomy between two views of what “loss” means in learning. Loss is determined by the problem. Loss is a part of the specification of the learning problem. Examples of problems specified by the loss function include “binary classification”, “multiclass classification”, “importance weighted classification”, “l_2 regression”, etc… This is the decision theory view of what loss means, and the view that I prefer. Loss is determined by the solution. To solve a problem, you optimize some particular loss function not given by the problem. Examples of these loss functions are “hinge loss” (for SV
4 0.4980514 406 hunch net-2010-08-22-KDD 2010
Introduction: There were several papers that seemed fairly interesting at KDD this year. The ones that caught my attention are: Xin Jin, Mingyang Zhang, Nan Zhang, and Gautam Das, Versatile Publishing For Privacy Preservation. This paper provides a conservative method for safely determining which data is publishable from any complete source of information (for example, a hospital) such that it does not violate privacy rules in a natural language. It is not differentially private, so no external sources of join information can exist. However, it is a mechanism for publishing data rather than (say) the output of a learning algorithm. Arik Friedman, Assaf Schuster, Data Mining with Differential Privacy. This paper shows how to create effective differentially private decision trees. Progress in differentially private datamining is pretty impressive, as it was defined in 2006. David Chan, Rong Ge, Ori Gershony, Tim Hesterberg, Diane Lambert, Evaluating Online Ad Camp
5 0.49200639 492 hunch net-2013-12-01-NIPS tutorials and Vowpal Wabbit 7.4
Introduction: At NIPS I’m giving a tutorial on Learning to Interact. In essence this is about dealing with causality in a contextual bandit framework. Relative to previous tutorials, I’ll be covering several new results that changed my understanding of the nature of the problem. Note that Judea Pearl and Elias Bareinboim have a tutorial on causality. This might appear similar, but is quite different in practice. Pearl and Bareinboim’s tutorial will be about the general concepts while mine will be about total mastery of the simplest nontrivial case, including code. Luckily, they have the right order. I recommend going to both. I also just released version 7.4 of Vowpal Wabbit. When I was a frustrated learning theorist, I did not understand why people were not using learning reductions to solve problems. I’ve been slowly discovering why with VW, and addressing the issues. One of the issues is that machine learning itself was not automatic enough, while another is that creatin
6 0.48900762 51 hunch net-2005-04-01-The Producer-Consumer Model of Research
7 0.48769641 137 hunch net-2005-12-09-Machine Learning Thoughts
8 0.48593596 212 hunch net-2006-10-04-Health of Conferences Wiki
9 0.47734937 360 hunch net-2009-06-15-In Active Learning, the question changes
10 0.46298972 225 hunch net-2007-01-02-Retrospective
11 0.45573533 117 hunch net-2005-10-03-Not ICML
12 0.45233241 304 hunch net-2008-06-27-Reviewing Horror Stories
13 0.45224991 194 hunch net-2006-07-11-New Models
14 0.45213214 220 hunch net-2006-11-27-Continuizing Solutions
15 0.45209771 478 hunch net-2013-01-07-NYU Large Scale Machine Learning Class
16 0.45126322 343 hunch net-2009-02-18-Decision by Vetocracy
17 0.45103803 309 hunch net-2008-07-10-Interesting papers, ICML 2008
18 0.44980088 483 hunch net-2013-06-10-The Large Scale Learning class notes
19 0.44871491 279 hunch net-2007-12-19-Cool and interesting things seen at NIPS
20 0.44836998 149 hunch net-2006-01-18-Is Multitask Learning Black-Boxable?