hunch_net hunch_net-2005 hunch_net-2005-39 knowledge-graph by maker-knowledge-mining

39 hunch net-2005-03-10-Breaking Abstractions


meta info for this blog

Source: html

Introduction: Sam Roweis’s comment reminds me of a more general issue that comes up in doing research: abstractions always break. Real numbers aren’t. Most real numbers cannot be represented with any machine. One implication of this is that many real-number based algorithms have difficulties when implemented with floating point numbers. The box on your desk is not a Turing machine. A Turing machine can compute anything computable, given sufficient time. A typical computer fails terribly when the state required for the computation exceeds some limit. Nash equilibria aren’t equilibria. This comes up when trying to predict human behavior based on the result of the equilibria computation. Often, it doesn’t work. The probability isn’t. Probability is an abstraction expressing either our lack of knowledge (the Bayesian viewpoint) or fundamental randomization (the frequentist viewpoint). From the frequentist viewpoint, the lack of knowledge typically precludes actually knowing the fundamental randomization.
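The floating-point breakage mentioned above is easy to demonstrate. The following sketch (in Python, chosen only for illustration; it is not part of the original post) shows a decimal fraction that has no exact binary representation:

```python
# IEEE 754 floats cannot represent most decimal fractions exactly,
# so "real number" arithmetic silently breaks under equality tests.
a = 0.1 + 0.2
print(a)                     # 0.30000000000000004, not 0.3
print(a == 0.3)              # False: the abstraction broke
print(abs(a - 0.3) < 1e-9)   # True: tolerance-based comparison survives

# The error compounds when many small values are accumulated.
total = sum(0.1 for _ in range(10))
print(total == 1.0)          # False
```

Numerical algorithms therefore typically compare against a tolerance (or use exact rational arithmetic) rather than testing floats for equality.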


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Sam Roweis’s comment reminds me of a more general issue that comes up in doing research: abstractions always break. [sent-1, score-0.358]

2 Most real numbers cannot be represented with any machine. [sent-3, score-0.159]

3 One implication of this is that many real-number based algorithms have difficulties when implemented with floating point numbers. [sent-4, score-0.338]

4 A Turing machine can compute anything computable, given sufficient time. [sent-6, score-0.308]

5 A typical computer fails terribly when the state required for the computation exceeds some limit. [sent-7, score-0.284]

6 This comes up when trying to predict human behavior based on the result of the equilibria computation. [sent-9, score-0.517]

7 Probability is an abstraction expressing either our lack of knowledge (the Bayesian viewpoint) or fundamental randomization (the frequentist viewpoint). [sent-12, score-1.14]

8 From the frequentist viewpoint the lack of knowledge typically precludes actually knowing the fundamental randomization. [sent-13, score-1.104]

9 From the Bayesian viewpoint, precisely specifying our lack of knowledge is extremely difficult and typically not done. [sent-14, score-0.687]

10 So, what should we do when we learn that our basic tools can break? [sent-15, score-0.277]

11 The answer, of course is to keep using them until something better comes along. [sent-16, score-0.218]

12 However, the uncomfortable knowledge that our tools break is necessary in a few ways: When considering a new abstraction, the existence of a break does not imply that it is a useless abstraction. [sent-17, score-1.717]

13 (Just as the existence of the breaks above does not imply that they are useless abstractions. [sent-18, score-0.566]

14 ) When using an abstraction in some new way, we must generally consider “is this a reasonable use?” [sent-19, score-0.338]

15 We should actively consider the “rate of breakage” when deciding amongst tools. [sent-21, score-0.231]
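The Turing-machine breakage in sentences 4 and 5 can also be shown concretely. This is an illustrative sketch (Python is an assumption of mine, not part of the post; any language would do): a computation that is trivially well-defined on a Turing machine fails on a real machine once it exhausts bounded state, here the call stack.

```python
import sys

# A Turing machine has unbounded tape; a real computer does not.
# This recursion is mathematically well-defined for any n, but the
# physical machine fails once the required stack state exceeds a limit.
sys.setrecursionlimit(1000)

def count_down(n):
    if n == 0:
        return 0
    return count_down(n - 1)

print(count_down(500))        # works: within the machine's limits
try:
    count_down(100_000)       # same computation, larger state: it breaks
except RecursionError:
    print("abstraction broke: RecursionError")
```

This is the "rate of breakage" question in miniature: the recursive formulation is a fine abstraction at depth 500 and a broken one at depth 100,000.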


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('break', 0.326), ('tools', 0.277), ('abstraction', 0.258), ('viewpoint', 0.253), ('knowledge', 0.226), ('equilibria', 0.213), ('frequentist', 0.201), ('existence', 0.184), ('lack', 0.169), ('turing', 0.167), ('useless', 0.163), ('comes', 0.15), ('imply', 0.123), ('terribly', 0.115), ('desk', 0.115), ('roweis', 0.115), ('nash', 0.107), ('reminds', 0.107), ('aren', 0.106), ('computable', 0.101), ('abstractions', 0.101), ('floating', 0.101), ('probability', 0.099), ('fundamental', 0.098), ('box', 0.096), ('expressing', 0.096), ('breaks', 0.096), ('exceeds', 0.096), ('sam', 0.096), ('bayesian', 0.096), ('randomization', 0.092), ('represented', 0.092), ('uncomfortable', 0.092), ('implication', 0.086), ('deciding', 0.081), ('consider', 0.08), ('behavior', 0.079), ('typically', 0.079), ('knowing', 0.078), ('implemented', 0.076), ('specifying', 0.076), ('based', 0.075), ('fails', 0.073), ('anything', 0.072), ('precisely', 0.072), ('actively', 0.07), ('compute', 0.069), ('keep', 0.068), ('real', 0.067), ('extremely', 0.065)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 39 hunch net-2005-03-10-Breaking Abstractions


2 0.21799894 5 hunch net-2005-01-26-Watchword: Probability

Introduction: Probability is one of the most confusingly used words in machine learning. There are at least 3 distinct ways the word is used. Bayesian The Bayesian notion of probability is a ‘degree of belief’. The degree of belief that some event (i.e. “stock goes up” or “stock goes down”) occurs can be measured by asking a sequence of questions of the form “Would you bet the stock goes up or down at Y to 1 odds?” A consistent better will switch from ‘for’ to ‘against’ at some single value of Y . The probability is then Y/(Y+1) . Bayesian probabilities express lack of knowledge rather than randomization. They are useful in learning because we often lack knowledge and expressing that lack flexibly makes the learning algorithms work better. Bayesian Learning uses ‘probability’ in this way exclusively. Frequentist The Frequentist notion of probability is a rate of occurrence. A rate of occurrence can be measured by doing an experiment many times. If an event occurs k times in

3 0.19296248 351 hunch net-2009-05-02-Wielding a New Abstraction

Introduction: This post is partly meant as an advertisement for the reductions tutorial Alina , Bianca , and I are planning to do at ICML . Please come, if you are interested. Many research programs can be thought of as finding and building new useful abstractions. The running example I’ll use is learning reductions where I have experience. The basic abstraction here is that we can build a learning algorithm capable of solving classification problems up to a small expected regret. This is used repeatedly to solve more complex problems. In working on a new abstraction, I think you typically run into many substantial problems of understanding, which make publishing particularly difficult. It is difficult to seriously discuss the reason behind or mechanism for abstraction in a conference paper with small page limits. People rarely see such discussions and hence have little basis on which to think about new abstractions. Another difficulty is that when building an abstraction, yo

4 0.12799132 333 hunch net-2008-12-27-Adversarial Academia

Introduction: One viewpoint on academia is that it is inherently adversarial: there are finite research dollars, positions, and students to work with, implying a zero-sum game between different participants. This is not a viewpoint that I want to promote, as I consider it flawed. However, I know several people believe strongly in this viewpoint, and I have found it to have substantial explanatory power. For example: It explains why your paper was rejected based on poor logic. The reviewer wasn’t concerned with research quality, but rather with rejecting a competitor. It explains why professors rarely work together. The goal of a non-tenured professor (at least) is to get tenure, and a case for tenure comes from a portfolio of work that is undisputably yours. It explains why new research programs are not quickly adopted. Adopting a competitor’s program is impossible, if your career is based on the competitor being wrong. Different academic groups subscribe to the adversarial viewp

5 0.11817691 3 hunch net-2005-01-24-The Humanloop Spectrum of Machine Learning

Introduction: All branches of machine learning seem to be united in the idea of using data to make predictions. However, people disagree to some extent about what this means. One way to categorize these different goals is on an axis, where one extreme is “tools to aid a human in using data to do prediction” and the other extreme is “tools to do prediction with no human intervention”. Here is my estimate of where various elements of machine learning fall on this spectrum. Human Necessary Human partially necessary Human unnecessary Clustering, data visualization Bayesian Learning, Probabilistic Models, Graphical Models Kernel Learning (SVM’s, etc..) Decision Trees? Reinforcement Learning The exact position of each element is of course debatable. My reasoning is that clustering and data visualization are nearly useless for prediction without a human in the loop. Bayesian/probabilistic models/graphical models generally require a human to sit and think about what

6 0.10998069 60 hunch net-2005-04-23-Advantages and Disadvantages of Bayesian Learning

7 0.09674041 6 hunch net-2005-01-27-Learning Complete Problems

8 0.089459352 286 hunch net-2008-01-25-Turing’s Club for Machine Learning

9 0.087917566 345 hunch net-2009-03-08-Prediction Science

10 0.082992755 183 hunch net-2006-06-14-Explorations of Exploration

11 0.08146549 235 hunch net-2007-03-03-All Models of Learning have Flaws

12 0.077622622 295 hunch net-2008-04-12-It Doesn’t Stop

13 0.077034764 140 hunch net-2005-12-14-More NIPS Papers II

14 0.074528217 35 hunch net-2005-03-04-The Big O and Constants in Learning

15 0.074343696 386 hunch net-2010-01-13-Sam Roweis died

16 0.074063227 282 hunch net-2008-01-06-Research Political Issues

17 0.071910314 215 hunch net-2006-10-22-Exemplar programming

18 0.069908515 330 hunch net-2008-12-07-A NIPS paper

19 0.069161452 246 hunch net-2007-06-13-Not Posting

20 0.067058511 21 hunch net-2005-02-17-Learning Research Programs


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.16), (1, 0.059), (2, -0.022), (3, 0.069), (4, -0.023), (5, -0.031), (6, 0.005), (7, 0.075), (8, 0.088), (9, 0.001), (10, -0.056), (11, -0.055), (12, 0.03), (13, -0.002), (14, 0.02), (15, -0.049), (16, -0.066), (17, -0.074), (18, 0.07), (19, -0.002), (20, -0.041), (21, 0.036), (22, 0.031), (23, 0.107), (24, -0.009), (25, 0.015), (26, -0.016), (27, -0.04), (28, 0.015), (29, 0.071), (30, 0.032), (31, -0.008), (32, 0.042), (33, -0.03), (34, -0.071), (35, 0.003), (36, -0.003), (37, -0.046), (38, 0.007), (39, 0.136), (40, -0.044), (41, 0.015), (42, 0.035), (43, 0.078), (44, 0.069), (45, 0.066), (46, 0.046), (47, -0.03), (48, -0.043), (49, 0.144)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97539234 39 hunch net-2005-03-10-Breaking Abstractions


2 0.78863472 5 hunch net-2005-01-26-Watchword: Probability


3 0.72498608 123 hunch net-2005-10-16-Complexity: It’s all in your head

Introduction: One of the central concerns of learning is to understand and to prevent overfitting. Various notions of “function complexity” often arise: VC dimension, Rademacher complexity, comparison classes of experts, and program length are just a few. The term “complexity” to me seems somehow misleading; the terms never capture something that meets my intuitive notion of complexity. The Bayesian notion clearly captures what’s going on. Functions aren’t “complex”– they’re just “surprising”: we assign to them low probability. Most (all?) complexity notions I know boil down to some (generally loose) bound on the prior probability of the function. In a sense, “complexity” fundamentally arises because probability distributions must sum to one. You can’t believe in all possibilities at the same time, or at least not equally. Rather you have to carefully spread the probability mass over the options you’d like to consider. Large complexity classes means that beliefs are spread thinly. In

4 0.61198032 191 hunch net-2006-07-08-MaxEnt contradicts Bayes Rule?

Introduction: A few weeks ago I read this . David Blei and I spent some time thinking hard about this a few years back (thanks to Kary Myers for pointing us to it): In short I was thinking that “bayesian belief updating” and “maximum entropy” were two orthogonal principles. But it appears that they are not, and that they can even be in conflict ! Example (from Kass 1996); consider a Die (6 sides), consider prior knowledge E[X]=3.5. Maximum entropy leads to P(X)= (1/6, 1/6, 1/6, 1/6, 1/6, 1/6). Now consider a new piece of evidence A=”X is an odd number” Bayesian posterior P(X|A)= P(A|X) P(X) = (1/3, 0, 1/3, 0, 1/3, 0). But MaxEnt with the constraints E[X]=3.5 and E[Indicator function of A]=1 leads to (.22, 0, .32, 0, .47, 0) !! (note that E[Indicator function of A]=P(A)) Indeed, for MaxEnt, because there is no more ‘6′, big numbers must be more probable to ensure an average of 3.5. For bayesian updating, P(X|A) doesn’t have to have a 3.5

5 0.60682595 60 hunch net-2005-04-23-Advantages and Disadvantages of Bayesian Learning

Introduction: I don’t consider myself a “Bayesian”, but I do try hard to understand why Bayesian learning works. For the purposes of this post, Bayesian learning is a simple process of: Specify a prior over world models. Integrate using Bayes law with respect to all observed information to compute a posterior over world models. Predict according to the posterior. Bayesian learning has many advantages over other learning programs: Interpolation Bayesian learning methods interpolate all the way to pure engineering. When faced with any learning problem, there is a choice of how much time and effort a human vs. a computer puts in. (For example, the mars rover pathfinding algorithms are almost entirely engineered.) When creating an engineered system, you build a model of the world and then find a good controller in that model. Bayesian methods interpolate to this extreme because the Bayesian prior can be a delta function on one model of the world. What this means is that a recipe

6 0.58174396 205 hunch net-2006-09-07-Objective and subjective interpretations of probability

7 0.58010674 62 hunch net-2005-04-26-To calibrate or not?

8 0.52562487 33 hunch net-2005-02-28-Regularization

9 0.4912858 165 hunch net-2006-03-23-The Approximation Argument

10 0.49074429 330 hunch net-2008-12-07-A NIPS paper

11 0.48891538 303 hunch net-2008-06-09-The Minimum Sample Complexity of Importance Weighting

12 0.47744223 34 hunch net-2005-03-02-Prior, “Prior” and Bias

13 0.46970481 118 hunch net-2005-10-07-On-line learning of regular decision rules

14 0.46344566 491 hunch net-2013-11-21-Ben Taskar is gone

15 0.46140411 351 hunch net-2009-05-02-Wielding a New Abstraction

16 0.45030823 282 hunch net-2008-01-06-Research Political Issues

17 0.44839665 218 hunch net-2006-11-20-Context and the calculation misperception

18 0.44828039 263 hunch net-2007-09-18-It’s MDL Jim, but not as we know it…(on Bayes, MDL and consistency)

19 0.43596438 68 hunch net-2005-05-10-Learning Reductions are Reductionist

20 0.43237683 153 hunch net-2006-02-02-Introspectionism as a Disease


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(1, 0.339), (3, 0.042), (27, 0.186), (38, 0.013), (53, 0.129), (55, 0.091), (94, 0.097)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.9435485 205 hunch net-2006-09-07-Objective and subjective interpretations of probability

Introduction: An amusing tidbit (reproduced without permission) from Herman Chernoff’s delightful monograph, “Sequential analysis and optimal design”: The use of randomization raises a philosophical question which is articulated by the following probably apocryphal anecdote. The metallurgist told his friend the statistician how he planned to test the effect of heat on the strength of a metal bar by sawing the bar into six pieces. The first two would go into the hot oven, the next two into the medium oven, and the last two into the cool oven. The statistician, horrified, explained how he should randomize to avoid the effect of a possible gradient of strength in the metal bar. The method of randomization was applied, and it turned out that the randomized experiment called for putting the first two pieces into the hot oven, the next two into the medium oven, and the last two into the cool oven. “Obviously, we can’t do that,” said the metallurgist. “On the contrary, you have to do that,” said the st

same-blog 2 0.89909041 39 hunch net-2005-03-10-Breaking Abstractions


3 0.85002613 76 hunch net-2005-05-29-Bad ideas

Introduction: I found these two essays on bad ideas interesting. Neither of these is written from the viewpoint of research, but they are both highly relevant. Why smart people have bad ideas by Paul Graham Why smart people defend bad ideas by Scott Berkun (which appeared on slashdot ) In my experience, bad ideas are common and overconfidence in ideas is common. This overconfidence can take either the form of excessive condemnation or excessive praise. Some of this is necessary to the process of research. For example, some overconfidence in the value of your own research is expected and probably necessary to motivate your own investigation. Since research is a rather risky business, much of it does not pan out. Learning to accept when something does not pan out is a critical skill which is sometimes never acquired. Excessive condemnation can be a real ill when it’s encountered. This has two effects: When the penalty for being wrong is too large, it means people have a

4 0.82459944 314 hunch net-2008-08-24-Mass Customized Medicine in the Future?

Introduction: This post is about a technology which could develop in the future. Right now, a new drug might be tested by finding patients with some diagnosis and giving or not giving them a drug according to a secret randomization. The outcome is observed, and if the average outcome for those treated is measurably better than the average outcome for those not treated, the drug might become a standard treatment. Generalizing this, a filter F sorts people into two groups: those for treatment A and those not for treatment B based upon observations x . To measure the outcome, you randomize between treatment and nontreatment of group A and measure the relative performance of the treatment. A problem often arises: in many cases the treated group does not do better than the nontreated group. A basic question is: does this mean the treatment is bad? With respect to the filter F it may mean that, but with respect to another filter F’ , the treatment might be very effective. For exampl

5 0.78559536 486 hunch net-2013-07-10-Thoughts on Artificial Intelligence

Introduction: David McAllester starts a blog .

6 0.65330195 8 hunch net-2005-02-01-NIPS: Online Bayes

7 0.62928128 140 hunch net-2005-12-14-More NIPS Papers II

8 0.59885317 5 hunch net-2005-01-26-Watchword: Probability

9 0.58899146 40 hunch net-2005-03-13-Avoiding Bad Reviewing

10 0.58560514 96 hunch net-2005-07-21-Six Months

11 0.58384818 286 hunch net-2008-01-25-Turing’s Club for Machine Learning

12 0.58154017 201 hunch net-2006-08-07-The Call of the Deep

13 0.58071673 141 hunch net-2005-12-17-Workshops as Franchise Conferences

14 0.57851458 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

15 0.57808602 297 hunch net-2008-04-22-Taking the next step

16 0.57725871 358 hunch net-2009-06-01-Multitask Poisoning

17 0.57680577 134 hunch net-2005-12-01-The Webscience Future

18 0.57660246 152 hunch net-2006-01-30-Should the Input Representation be a Vector?

19 0.5759517 207 hunch net-2006-09-12-Incentive Compatible Reviewing

20 0.57367778 347 hunch net-2009-03-26-Machine Learning is too easy