hunch_net hunch_net-2006 hunch_net-2006-205 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: An amusing tidbit (reproduced without permission) from Herman Chernoff’s delightful monograph, “Sequential analysis and optimal design”: The use of randomization raises a philosophical question which is articulated by the following probably apocryphal anecdote. The metallurgist told his friend the statistician how he planned to test the effect of heat on the strength of a metal bar by sawing the bar into six pieces. The first two would go into the hot oven, the next two into the medium oven, and the last two into the cool oven. The statistician, horrified, explained how he should randomize to avoid the effect of a possible gradient of strength in the metal bar. The method of randomization was applied, and it turned out that the randomized experiment called for putting the first two pieces into the hot oven, the next two into the medium oven, and the last two into the cool oven. “Obviously, we can’t do that,” said the metallurgist. “On the contrary, you have to do that,” said the statistician.
sentIndex sentText sentNum sentScore
1 An amusing tidbit (reproduced without permission) from Herman Chernoff’s delightful monograph, “Sequential analysis and optimal design”: The use of randomization raises a philosophical question which is articulated by the following probably apocryphal anecdote. [sent-1, score-0.444]
2 The metallurgist told his friend the statistician how he planned to test the effect of heat on the strength of a metal bar by sawing the bar into six pieces. [sent-2, score-1.564]
3 The first two would go into the hot oven, the next two into the medium oven, and the last two into the cool oven. [sent-3, score-0.621]
4 The statistician, horrified, explained how he should randomize to avoid the effect of a possible gradient of strength in the metal bar. [sent-4, score-0.586]
5 The method of randomization was applied, and it turned out that the randomized experiment called for putting the first two pieces into the hot oven, the next two into the medium oven, and the last two into the cool oven. [sent-5, score-0.971]
6 “Obviously, we can’t do that,” said the metallurgist. “On the contrary, you have to do that,” said the statistician. [sent-7, score-0.09]
7 In a “larger” design or sample, the effect of a reasonable randomization scheme could be such that this obvious difficulty would almost certainly not happen. [sent-9, score-0.808]
8 In this small problem, the effect may not be cancelled out, but the statistician still has a right to close his eyes to the design actually selected if he is satisfied with “playing fair”. [sent-11, score-0.939]
9 That is, if he instructs an agent to select the design and he analyzes the results, assuming there are no gradients, his conclusions will be unbiased in the sense that a tendency to overestimate is balanced on the average by a tendency to underestimate the desired quantities. [sent-12, score-0.767]
10 However, this tendency may be substantial as measured by the variability of the estimates which will be affected by substantial gradients. [sent-13, score-0.314]
11 On the other hand, following the natural inclination to reject an obviously unsatisfactory design resulting from randomization puts the statistician in the position of not “playing fair”. [sent-14, score-1.35]
12 What is worse for an objective statistician, he has no way of evaluating in advance how good his procedure is if he can change the rules in the middle of the experiment. [sent-15, score-0.061]
13 The Bayesian statistician, who uses subjective probability and must consider all information, is unsatisfied to simply play fair. [sent-16, score-0.441]
14 When randomization leads to the original unsatisfactory design, he is aware of this information and unwilling to accept the design. [sent-17, score-0.585]
15 In general, the religious Bayesian states that no good and only harm can come from randomized experiments. [sent-18, score-0.101]
16 In principle, he is opposed even to random sampling in opinion polling. [sent-19, score-0.061]
17 However, this principle puts him in untenable computational positions, and a pragmatic Bayesian will often ignore what seems useless design information if there are no obvious quirks in a randomly selected sample. [sent-20, score-0.675]
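Two of Chernoff’s quantitative points above lend themselves to a quick check: the systematic assignment is only one of many equally likely randomized designs, and randomization gives unbiased (if noisy) estimates even when a strength gradient is present. A minimal simulation sketch — the gradient and effect size below are invented for illustration, not taken from the monograph:

```python
import itertools
import random
import statistics

# 1. Count the distinct ways to split 6 labeled pieces into hot/medium/cool
#    pairs; the "systematic" design is just one of them.
designs = {
    (tuple(sorted(p[0:2])), tuple(sorted(p[2:4])), tuple(sorted(p[4:6])))
    for p in itertools.permutations(range(6))
}
print(len(designs))              # 90, so randomization draws it with chance 1/90

# 2. Unbiasedness under a gradient (invented numbers): piece i has baseline
#    strength i, and the hot oven shifts strength by a true effect of -3.0.
TRUE_EFFECT = -3.0

def randomized_experiment(rng):
    pieces = list(range(6))
    rng.shuffle(pieces)                    # the statistician's randomized design
    hot, cool = pieces[:2], pieces[4:]     # the middle pair goes to the medium oven
    hot_mean = statistics.mean(i + TRUE_EFFECT for i in hot)
    cool_mean = statistics.mean(float(i) for i in cool)
    return hot_mean - cool_mean            # estimate of the hot-vs-cool effect

rng = random.Random(0)
estimates = [randomized_experiment(rng) for _ in range(100_000)]
print(statistics.mean(estimates))   # ~ -3.0: over- and under-estimates cancel on average
print(statistics.stdev(estimates))  # substantial spread: the gradient inflates variance
```

The mean estimate lands on the true effect while the spread stays wide, which is exactly the “playing fair” trade-off described above.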
wordName wordTfidf (topN-words)
[('statistician', 0.441), ('randomization', 0.314), ('oven', 0.294), ('strength', 0.218), ('design', 0.208), ('bar', 0.171), ('effect', 0.159), ('metal', 0.147), ('tendency', 0.139), ('heat', 0.131), ('selected', 0.131), ('unsatisfactory', 0.131), ('puts', 0.121), ('estimates', 0.114), ('hot', 0.109), ('medium', 0.101), ('randomized', 0.101), ('playing', 0.095), ('principle', 0.092), ('assuming', 0.09), ('said', 0.09), ('two', 0.089), ('bayesian', 0.081), ('fair', 0.08), ('original', 0.079), ('cool', 0.079), ('obviously', 0.074), ('amusing', 0.065), ('underestimate', 0.065), ('reproduced', 0.065), ('raises', 0.065), ('gradients', 0.065), ('cancel', 0.065), ('monograph', 0.065), ('overestimate', 0.065), ('six', 0.065), ('would', 0.065), ('obvious', 0.062), ('gradient', 0.062), ('unwilling', 0.061), ('permission', 0.061), ('planned', 0.061), ('conclusions', 0.061), ('middle', 0.061), ('affected', 0.061), ('chernoff', 0.061), ('herman', 0.061), ('inclination', 0.061), ('opposed', 0.061), ('pragmatic', 0.061)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999982 205 hunch net-2006-09-07-Objective and subjective interpretations of probability
2 0.12384157 219 hunch net-2006-11-22-Explicit Randomization in Learning algorithms
Introduction: There are a number of learning algorithms which explicitly incorporate randomness into their execution. This includes, amongst others: Neural Networks. Neural networks use randomization to assign initial weights. Boltzmann Machines/Deep Belief Networks. Boltzmann machines are something like a stochastic version of multinode logistic regression. The use of randomness is more essential in Boltzmann machines, because the predicted value at test time also uses randomness. Bagging. Bagging is a process where a learning algorithm is run several different times on several different datasets, creating a final predictor which makes a majority vote. Policy descent. Several algorithms in reinforcement learning such as Conservative Policy Iteration use random bits to create stochastic policies. Experts algorithms. Randomized weighted majority uses random bits as a part of the prediction process to achieve better theoretical guarantees. A basic question is: “Should there
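As a concrete illustration of the last item in the blurb above — random bits inside the prediction step itself — here is a minimal randomized-weighted-majority sketch. It is not code from the post; the binary-advice format and the penalty rate eta are assumptions for the example:

```python
import math
import random

# Minimal randomized-weighted-majority sketch (illustrative only): binary
# expert advice per round, multiplicative penalty with assumed rate eta.

def randomized_weighted_majority(advice_rounds, labels, eta, rng):
    n = len(advice_rounds[0])            # number of experts
    weights = [1.0] * n
    mistakes = 0
    for advice, y in zip(advice_rounds, labels):
        # prediction uses random bits: follow expert i with probability
        # proportional to its current weight
        i = rng.choices(range(n), weights=weights)[0]
        mistakes += int(advice[i] != y)
        # multiplicatively penalize every expert that erred this round
        weights = [w * math.exp(-eta) if a != y else w
                   for w, a in zip(weights, advice)]
    return mistakes

# Usage: expert 0 is always right, expert 1 always wrong; few mistakes result.
rng = random.Random(0)
labels = [0, 1] * 50
rounds = [(y, 1 - y) for y in labels]
print(randomized_weighted_majority(rounds, labels, eta=0.5, rng=rng))
```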
3 0.083599009 411 hunch net-2010-09-21-Regretting the dead
Introduction: Nikos pointed out this New York Times article about poor clinical design killing people. For those of us who study learning from exploration information, this is a reminder that low regret algorithms are particularly important, as regret in clinical trials is measured by patient deaths. Two obvious improvements on the experimental design are: With reasonable record keeping of existing outcomes for the standard treatments, there is no need to explicitly assign people to a control group with the standard treatment, as that approach is effectively explored with great certainty. Asserting otherwise would imply that the nature of effective treatments for cancer has changed between now and a year ago, which denies the value of any clinical trial. An optimal experimental design will smoothly phase between exploration and exploitation as evidence for a new treatment shows that it can be effective. This is old tech, for example in the EXP3.P algorithm (page 12 aka 59) although
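The blurb cites EXP3.P; the sketch below implements the simpler plain-EXP3 update to show the explore/exploit phasing it refers to. Rewards in [0, 1], the mixing parameter gamma, and the `pull` callback (standing in for observing a patient outcome) are assumptions of the example:

```python
import math
import random

# Minimal EXP3 sketch (the post points to the stronger EXP3.P variant; this
# is the plain EXP3 update, assuming rewards in [0, 1]).

def exp3(n_arms, gamma, pull, rounds, rng):
    weights = [1.0] * n_arms
    for _ in range(rounds):
        total = sum(weights)
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = pull(arm)                  # only the chosen arm's outcome is seen
        estimate = reward / probs[arm]      # importance-weighted reward estimate
        weights[arm] *= math.exp(gamma * estimate / n_arms)
    return weights

# Usage: two treatments with unknown success rates; weight shifts to the better one.
rng = random.Random(0)
rates = [0.3, 0.6]
weights = exp3(2, 0.1, lambda a: float(rng.random() < rates[a]), 2000, rng)
total = sum(weights)
print([round(w / total, 3) for w in weights])   # most weight on the better treatment
```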
4 0.080312379 300 hunch net-2008-04-30-Concerns about the Large Scale Learning Challenge
Introduction: The large scale learning challenge for ICML interests me a great deal, although I have concerns about the way it is structured. From the instructions page, several issues come up: Large Definition My personal definition of dataset size is: small A dataset is small if a human could look at the dataset and plausibly find a good solution. medium A dataset is medium-sized if it fits in the RAM of a reasonably priced computer. large A large dataset does not fit in the RAM of a reasonably priced computer. By this definition, all of the datasets are medium-sized. This might sound like a pissing match over dataset size, but I believe it is more than that. The fundamental reason for these definitions is that they correspond to transitions in the sorts of approaches which are feasible. From small to medium, the ability to use a human as the learning algorithm degrades. From medium to large, it becomes essential to have learning algorithms that don’t require ran
5 0.075046867 60 hunch net-2005-04-23-Advantages and Disadvantages of Bayesian Learning
Introduction: I don’t consider myself a “Bayesian”, but I do try hard to understand why Bayesian learning works. For the purposes of this post, Bayesian learning is a simple process of: Specify a prior over world models. Integrate using Bayes law with respect to all observed information to compute a posterior over world models. Predict according to the posterior. Bayesian learning has many advantages over other learning programs: Interpolation Bayesian learning methods interpolate all the way to pure engineering. When faced with any learning problem, there is a choice of how much time and effort a human vs. a computer puts in. (For example, the Mars rover pathfinding algorithms are almost entirely engineered.) When creating an engineered system, you build a model of the world and then find a good controller in that model. Bayesian methods interpolate to this extreme because the Bayesian prior can be a delta function on one model of the world. What this means is that a recipe
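The three-step recipe in the blurb above is easy to make concrete. A toy sketch with a discrete prior over coin-flip “world models” — the candidate models and the flip data are invented for illustration:

```python
# Toy sketch of the three-step Bayesian recipe: a discrete prior over the
# heads-probability of a coin (models and data are illustrative assumptions).

models = [0.1, 0.3, 0.5, 0.7, 0.9]                 # candidate "world models"
posterior = {m: 1 / len(models) for m in models}   # 1. specify a prior

def bayes_update(post, flip):                      # 2. integrate via Bayes law
    lik = {m: (m if flip == "H" else 1 - m) for m in post}
    z = sum(post[m] * lik[m] for m in post)
    return {m: post[m] * lik[m] / z for m in post}

for flip in "HHTH":
    posterior = bayes_update(posterior, flip)

# 3. predict according to the posterior
p_heads = sum(m * p for m, p in posterior.items())
print(round(p_heads, 3))
```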
6 0.070389837 345 hunch net-2009-03-08-Prediction Science
7 0.069125406 265 hunch net-2007-10-14-NIPS workshp: Learning Problem Design
8 0.061457679 48 hunch net-2005-03-29-Academic Mechanism Design
9 0.060888361 111 hunch net-2005-09-12-Fast Gradient Descent
10 0.059163354 314 hunch net-2008-08-24-Mass Customized Medicine in the Future?
11 0.05883462 120 hunch net-2005-10-10-Predictive Search is Coming
12 0.057477489 39 hunch net-2005-03-10-Breaking Abstractions
13 0.055727258 298 hunch net-2008-04-26-Eliminating the Birthday Paradox for Universal Features
14 0.055635646 95 hunch net-2005-07-14-What Learning Theory might do
15 0.054208055 167 hunch net-2006-03-27-Gradients everywhere
16 0.054179154 332 hunch net-2008-12-23-Use of Learning Theory
17 0.053899717 5 hunch net-2005-01-26-Watchword: Probability
18 0.053875409 40 hunch net-2005-03-13-Avoiding Bad Reviewing
19 0.053490944 131 hunch net-2005-11-16-The Everything Ensemble Edge
20 0.05327604 375 hunch net-2009-10-26-NIPS workshops
topicId topicWeight
[(0, 0.122), (1, 0.022), (2, 0.001), (3, 0.02), (4, 0.011), (5, -0.002), (6, 0.005), (7, 0.041), (8, 0.04), (9, -0.011), (10, -0.02), (11, -0.007), (12, 0.002), (13, -0.033), (14, -0.003), (15, 0.015), (16, -0.039), (17, -0.002), (18, -0.027), (19, -0.02), (20, -0.045), (21, 0.016), (22, 0.038), (23, 0.059), (24, -0.038), (25, 0.072), (26, 0.054), (27, -0.034), (28, 0.071), (29, -0.048), (30, -0.031), (31, -0.028), (32, 0.021), (33, -0.07), (34, -0.083), (35, -0.051), (36, -0.135), (37, -0.037), (38, -0.047), (39, 0.074), (40, 0.024), (41, 0.056), (42, 0.038), (43, -0.0), (44, 0.006), (45, 0.031), (46, 0.051), (47, -0.058), (48, -0.012), (49, 0.056)]
simIndex simValue blogId blogTitle
same-blog 1 0.97274876 205 hunch net-2006-09-07-Objective and subjective interpretations of probability
2 0.60872656 167 hunch net-2006-03-27-Gradients everywhere
Introduction: One of the basic observations from the atomic learning workshop is that gradient-based optimization is pervasive. For example, at least 7 (of 12) speakers used the word ‘gradient’ in their talk and several others may be approximating a gradient. The essential useful quality of a gradient is that it decouples local updates from global optimization. Restated: Given a gradient, we can determine how to change individual parameters of the system so as to improve overall performance. It’s easy to feel depressed about this and think “nothing has happened”, but that appears untrue. Many of the talks were about clever techniques for computing gradients where your calculus textbook breaks down. Sometimes there are clever approximations of the gradient. (Simon Osindero) Sometimes we can compute constrained gradients via iterated gradient/project steps. (Ben Taskar) Sometimes we can compute gradients anyways over mildly nondifferentiable functions. (Drew Bagnell) Even give
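A minimal sketch of the decoupling described in the blurb above — each parameter is adjusted locally using only its own partial derivative; the toy objective and step size are chosen purely for illustration:

```python
# Gradient descent decouples local updates from global optimization: each
# parameter moves along its own partial derivative (toy example, assumed lr).

def grad_step(params, grad_fn, lr=0.1):
    return [p - lr * g for p, g in zip(params, grad_fn(params))]

# toy objective f(x, y) = (x - 3)^2 + (y + 1)^2, minimized at (3, -1)
grad = lambda p: [2 * (p[0] - 3), 2 * (p[1] + 1)]

params = [0.0, 0.0]
for _ in range(100):
    params = grad_step(params, grad)
print([round(p, 3) for p in params])   # -> [3.0, -1.0]
```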
3 0.59141707 411 hunch net-2010-09-21-Regretting the dead
4 0.55583966 39 hunch net-2005-03-10-Breaking Abstractions
Introduction: Sam Roweis’s comment reminds me of a more general issue that comes up in doing research: abstractions always break. Real numbers aren’t. Most real numbers can not be represented with any machine. One implication of this is that many real-number based algorithms have difficulties when implemented with floating point numbers. The box on your desk is not a Turing machine. A Turing machine can compute anything computable, given sufficient time. A typical computer fails terribly when the state required for the computation exceeds some limit. Nash equilibria aren’t equilibria. This comes up when trying to predict human behavior based on the result of the equilibria computation. Often, it doesn’t work. The probability isn’t. Probability is an abstraction expressing either our lack of knowledge (the Bayesian viewpoint) or fundamental randomization (the frequentist viewpoint). From the frequentist viewpoint the lack of knowledge typically precludes actually knowing the fu
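A two-line illustration of the “real numbers aren’t” point in any language with IEEE doubles — nothing here is specific to the post:

```python
# Floating point breaks identities that hold for real numbers.
print(0.1 + 0.2 == 0.3)        # False: 0.1 and 0.2 have no exact binary representation
print((1e16 + 1.0) - 1e16)     # 0.0, not 1.0: addition loses information at this scale
```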
5 0.52182448 241 hunch net-2007-04-28-The Coming Patent Apocalypse
Introduction: Many people in computer science believe that patents are problematic. The truth is even worse—the patent system in the US is fundamentally broken in ways that will require much more significant reform than is being considered now. The myth of the patent is the following: Patents are a mechanism for inventors to be compensated according to the value of their inventions while making the invention available to all. This myth sounds pretty desirable, but the reality is a strange distortion slowly leading towards collapse. There are many problems associated with patents, but I would like to focus on just two of them: Patent Trolls The way that patents have generally worked over the last several decades is that they were a tool of large companies. Large companies would amass a large number of patents and then cross-license each other’s patents—in effect saying “we agree to owe each other nothing”. Smaller companies would sometimes lose in this game, essentially because they
6 0.47926489 197 hunch net-2006-07-17-A Winner
7 0.47484556 179 hunch net-2006-05-16-The value of the orthodox view of Boosting
8 0.46195507 111 hunch net-2005-09-12-Fast Gradient Descent
9 0.45267028 312 hunch net-2008-08-04-Electoralmarkets.com
10 0.44826761 345 hunch net-2009-03-08-Prediction Science
11 0.43569148 191 hunch net-2006-07-08-MaxEnt contradicts Bayes Rule?
12 0.41153941 298 hunch net-2008-04-26-Eliminating the Birthday Paradox for Universal Features
13 0.41069636 219 hunch net-2006-11-22-Explicit Randomization in Learning algorithms
14 0.40920269 48 hunch net-2005-03-29-Academic Mechanism Design
15 0.40886775 5 hunch net-2005-01-26-Watchword: Probability
16 0.4045082 102 hunch net-2005-08-11-Why Manifold-Based Dimension Reduction Techniques?
17 0.40406221 60 hunch net-2005-04-23-Advantages and Disadvantages of Bayesian Learning
18 0.40109491 348 hunch net-2009-04-02-Asymmophobia
19 0.39466497 165 hunch net-2006-03-23-The Approximation Argument
20 0.39164945 237 hunch net-2007-04-02-Contextual Scaling
topicId topicWeight
[(1, 0.469), (3, 0.023), (27, 0.087), (38, 0.067), (51, 0.027), (53, 0.067), (55, 0.058), (77, 0.015), (94, 0.065), (95, 0.024), (98, 0.011)]
simIndex simValue blogId blogTitle
1 0.94526196 486 hunch net-2013-07-10-Thoughts on Artificial Intelligence
Introduction: David McAllester starts a blog.
same-blog 2 0.90197843 205 hunch net-2006-09-07-Objective and subjective interpretations of probability
3 0.6790309 39 hunch net-2005-03-10-Breaking Abstractions
4 0.64438766 76 hunch net-2005-05-29-Bad ideas
Introduction: I found these two essays on bad ideas interesting. Neither of these is written from the viewpoint of research, but they are both highly relevant. Why smart people have bad ideas by Paul Graham Why smart people defend bad ideas by Scott Berkun (which appeared on slashdot) In my experience, bad ideas are common and overconfidence in ideas is common. This overconfidence can take either the form of excessive condemnation or excessive praise. Some of this is necessary to the process of research. For example, some overconfidence in the value of your own research is expected and probably necessary to motivate your own investigation. Since research is a rather risky business, much of it does not pan out. Learning to accept when something does not pan out is a critical skill which is sometimes never acquired. Excessive condemnation can be a real ill when it’s encountered. This has two effects: When the penalty for being wrong is too large, it means people have a
5 0.59270269 314 hunch net-2008-08-24-Mass Customized Medicine in the Future?
Introduction: This post is about a technology which could develop in the future. Right now, a new drug might be tested by finding patients with some diagnosis and giving or not giving them a drug according to a secret randomization. The outcome is observed, and if the average outcome for those treated is measurably better than the average outcome for those not treated, the drug might become a standard treatment. Generalizing this, a filter F sorts people into two groups: those for treatment A and those not for treatment B based upon observations x. To measure the outcome, you randomize between treatment and nontreatment of group A and measure the relative performance of the treatment. A problem often arises: in many cases the treated group does not do better than the nontreated group. A basic question is: does this mean the treatment is bad? With respect to the filter F it may mean that, but with respect to another filter F’, the treatment might be very effective. For exampl
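A minimal sketch of the generalized protocol just described — the post gives no code, so `filter_f` (playing the role of F) and `outcome` are assumed stand-ins:

```python
import random
import statistics

# Illustrative only: filter_f selects who is considered for treatment from
# observations x; treat vs. no-treat is then randomized within that group
# and average outcomes are compared.

def run_trial(people, filter_f, outcome, rng):
    treated, untreated = [], []
    for x in filter(filter_f, people):
        if rng.random() < 0.5:             # the secret randomization
            treated.append(outcome(x, treated=True))
        else:
            untreated.append(outcome(x, treated=False))
    return statistics.mean(treated) - statistics.mean(untreated)
```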
6 0.41151443 8 hunch net-2005-02-01-NIPS: Online Bayes
7 0.37914696 140 hunch net-2005-12-14-More NIPS Papers II
8 0.31720042 5 hunch net-2005-01-26-Watchword: Probability
9 0.31419447 40 hunch net-2005-03-13-Avoiding Bad Reviewing
10 0.31325296 263 hunch net-2007-09-18-It’s MDL Jim, but not as we know it…(on Bayes, MDL and consistency)
11 0.30283418 196 hunch net-2006-07-13-Regression vs. Classification as a Primitive
12 0.29777578 233 hunch net-2007-02-16-The Forgetting
13 0.29460993 19 hunch net-2005-02-14-Clever Methods of Overfitting
14 0.29377735 439 hunch net-2011-08-01-Interesting papers at COLT 2011
15 0.29085678 286 hunch net-2008-01-25-Turing’s Club for Machine Learning
16 0.29039356 147 hunch net-2006-01-08-Debugging Your Brain
17 0.28991449 423 hunch net-2011-02-02-User preferences for search engines
18 0.2894831 204 hunch net-2006-08-28-Learning Theory standards for NIPS 2006
19 0.28727826 393 hunch net-2010-04-14-MLcomp: a website for objectively comparing ML algorithms
20 0.28653273 131 hunch net-2005-11-16-The Everything Ensemble Edge