hunch_net hunch_net-2005 hunch_net-2005-115 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: “Science” has many meanings, but one common meaning is “the scientific method,” which is a principled method for investigating the world using the following steps: Form a hypothesis about the world. Use the hypothesis to make predictions. Run experiments to confirm or disprove the predictions. The ordering of these steps is very important to the scientific method. In particular, predictions must be made before experiments are run. Given that we all believe in the scientific method of investigation, it may be surprising to learn that cheating is very common. This happens for many reasons, some innocent and some not. Drug studies. Pharmaceutical companies make predictions about the effects of their drugs and then conduct blind clinical studies to determine their effect. Unfortunately, they have also been caught using some of the more advanced techniques for cheating here: including “reprobleming”, “data set selection”, and probably “overfitting by review”
sentIndex sentText sentNum sentScore
1 “Science” has many meanings, but one common meaning is “the scientific method,” which is a principled method for investigating the world using the following steps: Form a hypothesis about the world. [sent-1, score-1.008]
2 The ordering of these steps is very important to the scientific method. [sent-4, score-0.544]
3 In particular, predictions must be made before experiments are run. [sent-5, score-0.327]
4 Given that we all believe in the scientific method of investigation, it may be surprising to learn that cheating is very common. [sent-6, score-0.897]
5 Pharmaceutical companies make predictions about the effects of their drugs and then conduct blind clinical studies to determine their effect. [sent-9, score-0.548]
6 It isn’t too surprising to observe this: when the testers of a drug have $10^9 or more riding on the outcome, the temptation to make the outcome “right” is extreme. [sent-11, score-0.679]
7 When conducting experiments on some new phenomenon, it is common for the experimental apparatus to simply not work right. [sent-13, score-0.583]
8 (Of course, the more common outcome is that the drug’s effectiveness is just overstated.) [sent-18, score-0.422]
9 Each prediction bound has a number of things in common: They assume that the data is independently and identically drawn. [sent-21, score-0.411]
10 This is well suited to experimental situations where experimenters work very hard to make different experiments be independent. [sent-22, score-0.692]
11 This is important for experimental testing of predictions because the distribution that observations are expected to come from is a part of the theory under test. [sent-25, score-0.31]
12 These two properties above form an ‘equivalence class’ over different mathematical bounds where each bound can be trusted to an equivalent degree. [sent-26, score-0.548]
13 Inside of this equivalence class there are several that may be helpful in determining whether deviations from the scientific method are reasonable or not. [sent-27, score-0.845]
14 The most basic test set bound corresponds to the scientific method above. [sent-28, score-0.822]
15 The Occam’s Razor bound allows a careful reordering of steps (1), (2), and (3). [sent-29, score-0.664]
16 More “interesting” bounds like the VC-bound and the PAC-Bayes bound allow more radical alterations of these steps. [sent-30, score-0.62]
17 The Sample Compression bound allows careful disposal of some datapoints. [sent-32, score-0.509]
18 Progressive Validation bounds (such as here, here, or here) allow hypotheses to be safely reformulated in arbitrary ways as experiments progress. [sent-33, score-0.439]
19 Scientific experimenters looking for a little extra flexibility in the scientific method may find these approaches useful. [sent-34, score-0.821]
20 (And if they don’t, maybe there is another bound in this equivalence class that needs to be worked out.) [sent-35, score-0.485]
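To make the basic test set bound mentioned above concrete, here is a minimal sketch under a simple IID, binary-loss setup; the Hoeffding-style form, the function name, and the numbers in the example are illustrative assumptions rather than the exact statement from the post.

import math

def test_set_bound(test_mistakes, n, delta=0.05):
    # Hoeffding-style upper bound on the true error rate, valid when the
    # hypothesis was fixed before the n IID test examples were examined
    # (predictions made before the experiment is run).
    empirical = test_mistakes / n
    return empirical + math.sqrt(math.log(1.0 / delta) / (2.0 * n))

# Example: 12 mistakes on 1000 held-out examples.
# With probability at least 0.95, the true error rate is at most ~0.051.
print(test_set_bound(12, 1000))

The ordering is the whole content of the bound: the hypothesis must be chosen without looking at the test set, which is exactly the “predictions before experiments” requirement of the scientific method.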
wordName wordTfidf (topN-words)
[('scientific', 0.394), ('bound', 0.247), ('cheating', 0.215), ('experiments', 0.21), ('method', 0.181), ('drugs', 0.174), ('experimenters', 0.174), ('outcome', 0.157), ('drug', 0.155), ('steps', 0.15), ('bounds', 0.139), ('equivalence', 0.135), ('experimental', 0.128), ('predictions', 0.117), ('surprising', 0.107), ('class', 0.103), ('make', 0.103), ('data', 0.099), ('careful', 0.098), ('hypothesis', 0.093), ('allows', 0.092), ('common', 0.091), ('equivalent', 0.09), ('allow', 0.09), ('reprobleming', 0.077), ('reordering', 0.077), ('suited', 0.077), ('apparatus', 0.077), ('cleaner', 0.077), ('clinical', 0.077), ('conduct', 0.077), ('conducting', 0.077), ('deviations', 0.077), ('oil', 0.077), ('compression', 0.072), ('flexibility', 0.072), ('alterations', 0.072), ('disposal', 0.072), ('radical', 0.072), ('trusted', 0.072), ('wrong', 0.069), ('right', 0.068), ('meanings', 0.068), ('attacks', 0.068), ('heart', 0.068), ('investigating', 0.068), ('science', 0.066), ('distribution', 0.065), ('identically', 0.065), ('phenomena', 0.065)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 115 hunch net-2005-09-26-Prediction Bounds as the Mathematics of Science
2 0.24440098 41 hunch net-2005-03-15-The State of Tight Bounds
Introduction: What? Bounds are mathematical formulas relating observations to future error rates assuming that data is drawn independently. In classical statistics, they are called confidence intervals. Why? Good Judgement. In many applications of learning, it is desirable to know how well the learned predictor works in the future. This helps you decide if the problem is solved or not. Learning Essence. The form of some of these bounds helps you understand what the essence of learning is. Algorithm Design. Some of these bounds suggest, motivate, or even directly imply learning algorithms. What We Know Now. There are several families of bounds, based on how information is used. Testing Bounds. These are methods which use labeled data not used in training to estimate the future error rate. Examples include the test set bound, progressive validation (also here and here), train and test bounds, and cross-validation (but see the big open problem). These tec
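Since progressive validation is listed among the testing bounds, here is a rough sketch of the evaluate-then-train loop it rests on; the predict/update learner interface and the toy majority-vote learner are hypothetical illustrations, not code from the linked papers.

class MajorityLearner:
    # Toy hypothetical learner: predicts the most frequent label seen so far.
    def __init__(self):
        self.counts = {}
    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else 0
    def update(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1

def progressive_validation_error(learner, stream):
    # Each example is predicted on *before* its label is used for training,
    # so every prediction is made on data the learner has not yet seen,
    # while all of the data still contributes to learning.
    mistakes, total = 0, 0
    for x, y in stream:
        if learner.predict(x) != y:
            mistakes += 1
        learner.update(x, y)
        total += 1
    return mistakes / max(total, 1)

data = [(None, 1)] * 7 + [(None, 0)] * 3
print(progressive_validation_error(MajorityLearner(), data))  # 0.4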
3 0.14832127 85 hunch net-2005-06-28-A COLT paper
Introduction: I found Tong Zhang’s paper on Data Dependent Concentration Bounds for Sequential Prediction Algorithms interesting. Roughly speaking, it states a tight bound on the future error rate for online learning algorithms assuming that samples are drawn independently. This bound is easily computed and will make the progressive validation approaches used here significantly more practical.
4 0.1460541 244 hunch net-2007-05-09-The Missing Bound
Introduction: Sham Kakade points out that we are missing a bound. Suppose we have m samples x drawn IID from some distribution D. Through the magic of the exponential moment method we know that: If the range of x is bounded by an interval of size I, a Chernoff/Hoeffding style bound gives us a bound on the deviations like O(I/m^0.5) (at least in crude form). A proof is on page 9 here. If the range of x is bounded, and the variance (or a bound on the variance) is known, then Bennett’s bound can give tighter results (*). This can be a huge improvement when the true variance is small. What’s missing here is a bound that depends on the observed variance rather than a bound on the variance. This means that many people attempt to use Bennett’s bound (incorrectly) by plugging the observed variance in as the true variance, invalidating the bound application. Most of the time, they get away with it, but this is a dangerous move when doing machine learning. In machine learning,
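As a rough numeric illustration of why a variance-sensitive bound matters, the sketch below compares a Hoeffding-style deviation with a Bernstein-style one (a close relative of Bennett’s bound that uses a known bound on the variance); the constants are standard textbook ones, not taken from the post or the linked proof.

import math

def hoeffding_dev(m, delta, interval=1.0):
    # Deviation term for m IID samples from an interval of size `interval`.
    return interval * math.sqrt(math.log(1.0 / delta) / (2.0 * m))

def bernstein_dev(m, delta, variance, interval=1.0):
    # Bernstein-style deviation term using a known variance bound.
    log_term = math.log(1.0 / delta)
    return math.sqrt(2.0 * variance * log_term / m) + 2.0 * interval * log_term / (3.0 * m)

m, delta = 10000, 0.05
print(hoeffding_dev(m, delta))         # ~0.012, ignores the variance entirely
print(bernstein_dev(m, delta, 0.001))  # ~0.001, much tighter when the true variance is small

The second computation is only valid with a genuine bound on the variance; plugging in the observed variance, as the post warns, is exactly the step that invalidates it.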
5 0.13134158 425 hunch net-2011-02-25-Yahoo! Machine Learning grant due March 11
Introduction: Yahoo!’s Key Scientific Challenges for Machine Learning grant applications are due March 11. If you are a student working on relevant research, please consider applying. It’s for $5K of unrestricted funding.
6 0.1263624 314 hunch net-2008-08-24-Mass Customized Medicine in the Future?
7 0.11425486 306 hunch net-2008-07-02-Proprietary Data in Academic Research?
8 0.11388376 170 hunch net-2006-04-06-Bounds greater than 1
9 0.11239773 247 hunch net-2007-06-14-Interesting Papers at COLT 2007
10 0.10931051 163 hunch net-2006-03-12-Online learning or online preservation of learning?
11 0.10729883 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions
12 0.1059165 6 hunch net-2005-01-27-Learning Complete Problems
13 0.10373084 235 hunch net-2007-03-03-All Models of Learning have Flaws
14 0.10042904 44 hunch net-2005-03-21-Research Styles in Machine Learning
15 0.096893661 19 hunch net-2005-02-14-Clever Methods of Overfitting
16 0.09628927 485 hunch net-2013-06-29-The Benefits of Double-Blind Review
17 0.094769321 332 hunch net-2008-12-23-Use of Learning Theory
18 0.094694227 351 hunch net-2009-05-02-Wielding a New Abstraction
19 0.093027368 91 hunch net-2005-07-10-Thinking the Unthought
20 0.09055195 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”
topicId topicWeight
[(0, 0.214), (1, 0.067), (2, 0.017), (3, 0.032), (4, 0.021), (5, -0.083), (6, 0.067), (7, 0.061), (8, 0.032), (9, -0.059), (10, 0.07), (11, 0.168), (12, 0.096), (13, -0.05), (14, 0.021), (15, -0.056), (16, -0.019), (17, 0.004), (18, -0.132), (19, -0.032), (20, 0.121), (21, 0.017), (22, -0.171), (23, -0.042), (24, 0.051), (25, 0.025), (26, -0.012), (27, 0.038), (28, 0.024), (29, 0.068), (30, -0.094), (31, -0.022), (32, -0.063), (33, -0.021), (34, -0.042), (35, -0.025), (36, -0.044), (37, 0.049), (38, -0.113), (39, 0.023), (40, -0.074), (41, 0.037), (42, -0.004), (43, 0.085), (44, -0.011), (45, 0.015), (46, 0.054), (47, -0.086), (48, 0.007), (49, -0.081)]
simIndex simValue blogId blogTitle
same-blog 1 0.97813755 115 hunch net-2005-09-26-Prediction Bounds as the Mathematics of Science
2 0.80954492 244 hunch net-2007-05-09-The Missing Bound
3 0.77190077 41 hunch net-2005-03-15-The State of Tight Bounds
4 0.75666839 85 hunch net-2005-06-28-A COLT paper
5 0.6781832 170 hunch net-2006-04-06-Bounds greater than 1
Introduction: Nati Srebro and Shai Ben-David have a paper at COLT which, in the appendix, proves something very striking: several previous error bounds are always greater than 1. Background. One branch of learning theory focuses on theorems which: Assume samples are drawn IID from an unknown distribution D. Fix a set of classifiers. Find a high probability bound on the maximum true error rate (with respect to D) as a function of the empirical error rate on the training set. Many of these bounds become extremely complex and hairy. Current. Everyone working on this subject wants “tighter bounds”; however, there are different definitions of “tighter”. Some groups focus on “functional tightness” (getting the right functional dependency between the size of the training set and a parameterization of the hypothesis space) while others focus on “practical tightness” (finding bounds which work well on practical problems). (I am definitely in the second camp.) One of the da
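A tiny numeric illustration of how a perfectly valid bound can still be greater than 1: the generic finite-class (Occam’s-razor-style) form below is chosen for simplicity and is not one of the specific bounds analyzed in the paper.

import math

def occam_bound(train_error, num_hypotheses, m, delta=0.05):
    # Generic finite-class bound:
    # true error <= train error + sqrt((ln|H| + ln(1/delta)) / (2m))
    return train_error + math.sqrt(
        (math.log(num_hypotheses) + math.log(1.0 / delta)) / (2.0 * m)
    )

print(occam_bound(0.10, 10**9, 10))     # ~1.19: greater than 1, hence vacuous
print(occam_bound(0.10, 10**9, 10000))  # ~0.13: informative once m is large enough

Both statements hold with high probability; the first simply says nothing, since every error rate is at most 1.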
6 0.60958904 163 hunch net-2006-03-12-Online learning or online preservation of learning?
7 0.5951423 392 hunch net-2010-03-26-A Variance only Deviation Bound
8 0.56911594 314 hunch net-2008-08-24-Mass Customized Medicine in the Future?
9 0.54849374 247 hunch net-2007-06-14-Interesting Papers at COLT 2007
10 0.51736975 55 hunch net-2005-04-10-Is the Goal Understanding or Prediction?
11 0.51461864 368 hunch net-2009-08-26-Another 10-year paper in Machine Learning
12 0.47780362 26 hunch net-2005-02-21-Problem: Cross Validation
13 0.4662447 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions
14 0.46223503 58 hunch net-2005-04-21-Dynamic Programming Generalizations and Their Use
15 0.45639637 6 hunch net-2005-01-27-Learning Complete Problems
16 0.4527494 57 hunch net-2005-04-16-Which Assumptions are Reasonable?
17 0.45237228 306 hunch net-2008-07-02-Proprietary Data in Academic Research?
18 0.44099697 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”
19 0.43811885 270 hunch net-2007-11-02-The Machine Learning Award goes to …
20 0.43643582 126 hunch net-2005-10-26-Fallback Analysis is a Secret to Useful Algorithms
topicId topicWeight
[(10, 0.011), (21, 0.01), (27, 0.172), (30, 0.011), (38, 0.026), (53, 0.058), (55, 0.053), (94, 0.549), (95, 0.024)]
simIndex simValue blogId blogTitle
1 0.99304694 42 hunch net-2005-03-17-Going all the Way, Sometimes
Introduction: At many points in research, you face a choice: should I keep on improving some old piece of technology or should I do something new? For example: Should I refine bounds to make them tighter? Should I take some learning theory and turn it into a learning algorithm? Should I implement the learning algorithm? Should I test the learning algorithm widely? Should I release the algorithm as source code? Should I go see what problems people actually need to solve? The universal temptation of people attracted to research is doing something new. That is sometimes the right decision, but is also often not. I’d like to discuss some reasons why not. Expertise. Once expertise is developed on some subject, you are the right person to refine it. What is the real problem? Continually improving a piece of technology is a mechanism forcing you to confront this question. In many cases, this confrontation is uncomfortable because you discover that your method has fundamen
2 0.97985148 81 hunch net-2005-06-13-Wikis for Summer Schools and Workshops
Introduction: Chicago ’05 ended a couple of weeks ago. This was the sixth Machine Learning Summer School, and the second one that used a wiki. (The first was Berder ’04, thanks to Gunnar Raetsch.) Wikis are relatively easy to set up, greatly aid social interaction, and should be used a lot more at summer schools and workshops. They can even be used as the meeting’s webpage, as a permanent record of its participants’ collaborations — see for example the wiki/website for last year’s NVO Summer School. A basic wiki is a collection of editable webpages, maintained by software called a wiki engine. The engine used at both Berder and Chicago was TikiWiki — it is well documented and gets you something running fast. It uses PHP and MySQL, but doesn’t require you to know either. Tikiwiki has far more features than most wikis, as it is really a full Content Management System. (My thanks to Sebastian Stark for pointing this out.) Here are the features we found most useful: Bulletin boa
same-blog 3 0.97889084 115 hunch net-2005-09-26-Prediction Bounds as the Mathematics of Science
4 0.97446978 35 hunch net-2005-03-04-The Big O and Constants in Learning
Introduction: The notation g(n) = O(f(n)) means that in the limit as n approaches infinity there exists a constant C such that g(n) is less than Cf(n). In learning theory, there are many statements about learning algorithms of the form “under assumptions x, y, and z, the classifier learned has an error rate of at most O(f(m))”. There is one very good reason to use O(): it helps you understand the big picture and neglect the minor details which are not important in the big picture. However, there are some important reasons not to do this as well. Unspeedup. In algorithm analysis, the use of O() for time complexity is pervasive and well-justified. Determining the exact value of C is inherently computer architecture dependent. (The “C” for x86 processors might differ from the “C” on PowerPC processors.) Since many learning theorists come from a CS theory background, the O() notation is applied to generalization error. The O() abstraction breaks here—you can not genera
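A small numeric illustration of the constants issue: the two formulas below are made-up stand-ins with the same O(1/m^0.5) rate, not bounds from the literature.

import math

# Two hypothetical bounds with the same big-O rate but different constants.
bound_a = lambda m: 1.0 * math.sqrt(1.0 / m)
bound_b = lambda m: 50.0 * math.sqrt(1.0 / m)

for m in (100, 10000, 1000000):
    print(m, round(bound_a(m), 4), round(bound_b(m), 4))
# m = 100:     0.1    5.0   (the second is vacuous)
# m = 10000:   0.01   0.5
# m = 1000000: 0.001  0.05  (only now does the second become informative)

At realistic sample sizes, the hidden constant, which O() deliberately discards, decides whether a generalization bound says anything at all.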
5 0.9688105 346 hunch net-2009-03-18-Parallel ML primitives
Introduction: Previously, we discussed parallel machine learning a bit. As parallel ML is rather difficult, I’d like to describe my thinking at the moment, and ask for advice from the rest of the world. This is particularly relevant right now, as I’m attending a workshop tomorrow on parallel ML. Parallelizing slow algorithms seems uncompelling. Parallelizing many algorithms also seems uncompelling, because the effort required to parallelize is substantial. This leaves the question: Which one fast algorithm is the best to parallelize? What is a substantially different second? One compellingly fast simple algorithm is online gradient descent on a linear representation. This is the core of Leon’s sgd code and Vowpal Wabbit. Antoine Bordes showed a variant was competitive in the large scale learning challenge. It’s also a decades old primitive which has been reused in many algorithms, and continues to be reused. It also applies to online learning rather than just online optimiz
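A minimal sketch of the primitive described above, online gradient descent on a linear representation with squared loss; this is plain illustrative Python, not the sgd or Vowpal Wabbit implementation, and the learning rate is an arbitrary choice.

def online_gradient_descent(stream, num_features, learning_rate=0.1):
    # One pass of online updates: predict with the current linear weights,
    # then move the weights against the gradient of 0.5 * (prediction - y)^2.
    w = [0.0] * num_features
    for x, y in stream:
        prediction = sum(wi * xi for wi, xi in zip(w, x))
        gradient_scale = prediction - y
        for i in range(num_features):
            w[i] -= learning_rate * gradient_scale * x[i]
    return w

# Toy usage: learn y = 2*x0 - x1 from a repeated handful of examples.
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)] * 200
print(online_gradient_descent(data, 2))  # approaches [2.0, -1.0]

The per-example update is what makes this primitive both fast and a natural target for parallelization.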
6 0.93079937 120 hunch net-2005-10-10-Predictive Search is Coming
7 0.89852488 276 hunch net-2007-12-10-Learning Track of International Planning Competition
8 0.85177612 221 hunch net-2006-12-04-Structural Problems in NIPS Decision Making
9 0.78590655 136 hunch net-2005-12-07-Is the Google way the way for machine learning?
10 0.78326005 229 hunch net-2007-01-26-Parallel Machine Learning Problems
11 0.73808998 441 hunch net-2011-08-15-Vowpal Wabbit 6.0
12 0.73434025 286 hunch net-2008-01-25-Turing’s Club for Machine Learning
13 0.71243745 73 hunch net-2005-05-17-A Short Guide to PhD Graduate Study
14 0.70716661 253 hunch net-2007-07-06-Idempotent-capable Predictors
15 0.70413208 13 hunch net-2005-02-04-JMLG
16 0.69923043 178 hunch net-2006-05-08-Big machine learning
17 0.69833106 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning
18 0.69309074 471 hunch net-2012-08-24-Patterns for research in machine learning
19 0.69108474 162 hunch net-2006-03-09-Use of Notation
20 0.69058388 306 hunch net-2008-07-02-Proprietary Data in Academic Research?