hunch_net hunch_net-2008 hunch_net-2008-289 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: In many machine learning papers experiments are done and little confidence bars are reported for the results. This often seems quite clear, until you actually try to figure out what it means. There are several different kinds of ‘confidence’ being used, and it’s easy to become confused. Confidence = Probability . For those who haven’t worried about confidence for a long time, confidence is simply the probability of some event. You are confident about events which have a large probability. This meaning of confidence is inadequate in many applications because we want to reason about how much more information we have, how much more is needed, and where to get it. As an example, a learning algorithm might predict that the probability of an event is 0.5 , but it’s unclear if the probability is 0.5 because no examples have been provided or 0.5 because many examples have been provided and the event is simply fundamentally uncertain. Classical Confidence Intervals . These a
sentIndex sentText sentNum sentScore
1 For those who haven’t worried about confidence for a long time, confidence is simply the probability of some event. [sent-5, score-1.231]
2 This meaning of confidence is inadequate in many applications because we want to reason about how much more information we have, how much more is needed, and where to get it. [sent-7, score-0.611]
3 Given observations from the world (such as err-or-not on examples), an interval is constructed around the hidden value. [sent-15, score-0.696]
4 The semantics of the classical confidence interval is: the (random) interval contains the (deterministic but unknown) value, with high probability. [sent-16, score-1.762]
5 Classical confidence intervals (as applied in machine learning) typically require that observations are independent. [sent-17, score-1.159]
6 One drawback of concern is that classical confidence intervals break down rapidly when conditioning on information. [sent-19, score-1.337]
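The classical-interval sentences above describe an interval, built from observations, that contains a hidden deterministic value with high probability. A minimal sketch for a test-set error rate, assuming IID 0/1 errors and using Hoeffding's inequality (the function name, the choice of delta, and the example numbers are illustrative, not from the post):

```python
import math

def hoeffding_interval(errors, n, delta=0.05):
    """Classical (distribution-free) confidence interval for a true error rate.

    Assumes the n observed err-or-not outcomes are IID.  By Hoeffding's
    inequality, the (random) interval below contains the (deterministic but
    unknown) true error rate with probability at least 1 - delta.
    """
    p_hat = errors / n
    width = math.sqrt(math.log(2 / delta) / (2 * n))
    return max(0.0, p_hat - width), min(1.0, p_hat + width)

# 12 errors observed on a test set of 1000 examples.
print(hoeffding_interval(12, 1000))  # roughly (0.0, 0.055)
```

The guarantee is over the random draw of the test set, not over the unknown error rate, matching the semantics stated above.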
7 If you have a prior distribution over the way the world creates observations, then you can use Bayes law to construct a posterior distribution over the way the world creates observations. [sent-22, score-0.737]
8 With respect to this posterior distribution, you construct an interval containing the truth with high probability. [sent-23, score-0.695]
9 The semantics of a Bayesian confidence interval is “If the world is drawn from the prior the interval contains the truth with high probability”. [sent-24, score-1.89]
10 Unlike classical confidence intervals, it’s easy to have a statement conditioned on features. [sent-26, score-0.815]
11 My principal source of uneasiness with respect to Bayesian confidence intervals is the “If the world is drawn from the prior” clause—I believe it is difficult to know and specify a correct prior distribution. [sent-29, score-1.309]
12 Many Bayesians aren’t bothered by this, but the meaning of a Bayesian confidence interval becomes unclear if you work with an incorrect (or subjective) prior. [sent-30, score-1.054]
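The Bayesian sentences above construct an interval from a posterior. A minimal sketch for a Bernoulli error rate with a Beta prior, using scipy.stats; the prior parameters and delta are illustrative assumptions, not from the post:

```python
from scipy.stats import beta

def bayesian_interval(errors, n, delta=0.05, prior_a=1.0, prior_b=1.0):
    """Bayesian confidence (credible) interval for a Bernoulli error rate.

    Prior: Beta(prior_a, prior_b) over the unknown rate.  After observing
    `errors` errors in `n` trials, the posterior is
    Beta(prior_a + errors, prior_b + n - errors).  The returned interval
    contains the truth with probability 1 - delta *with respect to this
    posterior*, i.e. under the assumption that the world was drawn from
    the prior.
    """
    a = prior_a + errors
    b = prior_b + n - errors
    return beta.ppf(delta / 2, a, b), beta.ppf(1 - delta / 2, a, b)

print(bayesian_interval(12, 1000))  # roughly (0.007, 0.021)
```

Conditioning on features just means maintaining a posterior per feature value, which is why such statements are easy to condition; the cost is the "if the world is drawn from the prior" clause noted above.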
13 The basic line of reasoning seems to be: “Someone once told me that if observations are IID, then their average converges to a normal distribution, so let’s use an unbiased estimate of the mean and variance, assume convergence, and then construct a confidence interval for the mean of a gaussian”. [sent-33, score-1.203]
14 Asymptotic intervals are asymptotically equivalent to classical confidence intervals, but they can differ spectacularly with finite sample sizes. [sent-34, score-1.262]
15 A classical confidence interval for the error rate is [0, log(1/d)/n] where n is the size of the test set and the interval contains the truth with probability at least 1-d. [sent-36, score-1.884]
16 For asymptotic intervals you get [0,0] which is bogus in all applications I’ve encountered. [sent-37, score-0.664]
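The asymptotic-interval sentences above can be reproduced directly: with zero observed errors the normal-approximation interval collapses to [0, 0], while the classical interval is [0, log(1/d)/n]. A sketch with illustrative n and delta:

```python
import math

def asymptotic_interval(errors, n, z=1.96):
    """Normal-approximation interval: sample mean +/- z * sample std / sqrt(n).

    With zero observed errors the sample variance is zero, so the interval
    collapses to the bogus [0, 0]."""
    p_hat = errors / n
    half = z * math.sqrt(p_hat * (1 - p_hat)) / math.sqrt(n)
    return max(0.0, p_hat - half), min(1.0, p_hat + half)

def classical_zero_error_interval(n, delta=0.05):
    """Classical interval when no errors are observed: [0, log(1/delta)/n],
    containing the true error rate with probability at least 1 - delta."""
    return 0.0, math.log(1 / delta) / n

print(asymptotic_interval(0, 100))          # (0.0, 0.0)
print(classical_zero_error_interval(100))   # (0.0, ~0.03)
```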
17 The essential idea is that we cease to make intervals about the world, and instead make intervals around our predictions of the world. [sent-40, score-1.187]
18 A basic question is: can this notion of internal confidence guide other forms of exploration? [sent-44, score-0.641]
19 In this setting, a confidence interval is (roughly) a set of predictions output by an adaptive rule with the property that it contains the true observation a large fraction of the time. [sent-47, score-1.08]
20 This approach has yet to catch on, but it is interesting because it provides a feature dependent confidence interval without making strong assumptions about the world. [sent-48, score-0.927]
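The last sentences above describe intervals around predictions produced by an adaptive rule, containing the true observation a large fraction of the time. The post does not give a construction; a split-conformal-style sketch is one concrete instance under an exchangeability assumption (the calibration data and names here are made up for illustration, not the construction the post has in mind):

```python
import numpy as np

def conformal_interval(calibration_residuals, new_prediction, delta=0.1):
    """Interval around a prediction, calibrated so that it covers the true
    observation roughly a 1 - delta fraction of the time, assuming the
    calibration examples and the new example are exchangeable."""
    q = np.quantile(np.asarray(calibration_residuals), 1 - delta)
    return new_prediction - q, new_prediction + q

# Toy usage: |y - prediction| residuals on 200 held-out calibration examples.
rng = np.random.default_rng(0)
calibration_residuals = np.abs(rng.normal(size=200))
print(conformal_interval(calibration_residuals, new_prediction=2.0))
```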
wordName wordTfidf (topN-words)
[('confidence', 0.528), ('intervals', 0.518), ('interval', 0.399), ('classical', 0.216), ('world', 0.149), ('probability', 0.138), ('contains', 0.113), ('observations', 0.113), ('asymptotic', 0.111), ('truth', 0.081), ('construct', 0.077), ('prior', 0.072), ('distribution', 0.067), ('bayesian', 0.066), ('semantics', 0.06), ('label', 0.059), ('guide', 0.058), ('posterior', 0.056), ('internal', 0.055), ('provided', 0.055), ('creates', 0.05), ('high', 0.047), ('meaning', 0.046), ('error', 0.046), ('event', 0.046), ('unclear', 0.046), ('rate', 0.045), ('given', 0.044), ('mean', 0.043), ('drawn', 0.042), ('predictions', 0.04), ('reported', 0.04), ('conditioning', 0.04), ('cease', 0.04), ('clause', 0.04), ('inadequate', 0.037), ('confident', 0.037), ('observing', 0.037), ('bayesians', 0.037), ('worried', 0.037), ('essential', 0.036), ('easy', 0.036), ('around', 0.035), ('common', 0.035), ('actually', 0.035), ('incorrect', 0.035), ('bogus', 0.035), ('containing', 0.035), ('conditioned', 0.035), ('breakdown', 0.035)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 289 hunch net-2008-02-17-The Meaning of Confidence
Introduction: In many machine learning papers experiments are done and little confidence bars are reported for the results. This often seems quite clear, until you actually try to figure out what it means. There are several different kinds of ‘confidence’ being used, and it’s easy to become confused. Confidence = Probability . For those who haven’t worried about confidence for a long time, confidence is simply the probability of some event. You are confident about events which have a large probability. This meaning of confidence is inadequate in many applications because we want to reason about how much more information we have, how much more is needed, and where to get it. As an example, a learning algorithm might predict that the probability of an event is 0.5 , but it’s unclear if the probability is 0.5 because no examples have been provided or 0.5 because many examples have been provided and the event is simply fundamentally uncertain. Classical Confidence Intervals . These a
2 0.75520468 213 hunch net-2006-10-08-Incompatibilities between classical confidence intervals and learning.
Introduction: Classical confidence intervals satisfy a theorem of the form: For some data sources D , Pr_{S ~ D}(f(D) > g(S)) > 1-d where f is some function of the distribution (such as the mean) and g is some function of the observed sample S . The constraints on D can vary between “Independent and identically distributed (IID) samples from a gaussian with an unknown mean” to “IID samples from an arbitrary distribution D ”. There are even some confidence intervals which do not require IID samples. Classical confidence intervals often confuse people. They do not say “with high probability, for my observed sample, the bound holds”. Instead, they tell you that if you reason according to the confidence interval in the future (and the constraints on D are satisfied), then you are not often wrong. Restated, they tell you something about what a safe procedure is in a stochastic world where d is the safety parameter. There are a number of results in theoretical machine learn
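The guarantee form quoted above ("if you reason according to the confidence interval in the future, you are not often wrong") can be checked by simulation. A sketch for one concrete instance of a guarantee of this form, where f(D) is the true mean of a Bernoulli source and g(S) is a Hoeffding upper bound computed from the sample, so the event checked is f(D) <= g(S); the parameters are illustrative:

```python
import math
import random

def hoeffding_upper(sample_mean, n, delta):
    """g(S): sample mean plus a one-sided Hoeffding slack."""
    return sample_mean + math.sqrt(math.log(1 / delta) / (2 * n))

# Estimate Pr_{S ~ D}( f(D) <= g(S) ) for f(D) = true mean; should be >= 1 - delta.
true_mean, n, delta, trials = 0.3, 50, 0.05, 10000
covered = 0
for _ in range(trials):
    sample_mean = sum(random.random() < true_mean for _ in range(n)) / n
    if true_mean <= hoeffding_upper(sample_mean, n, delta):
        covered += 1
print(covered / trials)  # typically well above 0.95
```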
3 0.13572675 5 hunch net-2005-01-26-Watchword: Probability
Introduction: Probability is one of the most confusingly used words in machine learning. There are at least 3 distinct ways the word is used. Bayesian The Bayesian notion of probability is a ‘degree of belief’. The degree of belief that some event (i.e. “stock goes up” or “stock goes down”) occurs can be measured by asking a sequence of questions of the form “Would you bet the stock goes up or down at Y to 1 odds?” A consistent bettor will switch from ‘for’ to ‘against’ at some single value of Y . The probability is then Y/(Y+1) . Bayesian probabilities express lack of knowledge rather than randomization. They are useful in learning because we often lack knowledge and expressing that lack flexibly makes the learning algorithms work better. Bayesian Learning uses ‘probability’ in this way exclusively. Frequentist The Frequentist notion of probability is a rate of occurrence. A rate of occurrence can be measured by doing an experiment many times. If an event occurs k times in
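The betting-odds definition above reduces to simple arithmetic; a tiny sketch (the function name is just for illustration):

```python
def odds_to_probability(y):
    """Convert 'Y to 1' odds, at which a consistent bettor switches from
    'for' to 'against', into a Bayesian degree of belief: Y / (Y + 1)."""
    return y / (y + 1.0)

print(odds_to_probability(3.0))  # 3-to-1 odds -> 0.75
```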
4 0.13271001 145 hunch net-2005-12-29-Deadline Season
Introduction: Many different paper deadlines are coming up soon so I made a little reference table. Out of curiosity, I also computed the interval between submission deadline and conference.
Conference   Location      Date           Deadline                Interval (days)
COLT         Pittsburgh    June 22-25     January 21              152
ICML         Pittsburgh    June 26-28     January 30/February 6   140
UAI          MIT           July 13-16     March 9/March 16        119
AAAI         Boston        July 16-20     February 16/21          145
KDD          Philadelphia  August 23-26   March 3/March 10        166
It looks like the northeastern US is the big winner as far as location this year.
5 0.12937094 309 hunch net-2008-07-10-Interesting papers, ICML 2008
Introduction: Here are some papers from ICML 2008 that I found interesting. Risi Kondor and Karsten Borgwardt , The Skew Spectrum of Graphs . This paper is about a new family of functions on graphs which is invariant under node label permutation. They show that these quantities appear to yield good features for learning. Sanjoy Dasgupta and Daniel Hsu . Hierarchical sampling for active learning. This is the first published practical consistent active learning algorithm. The abstract is also pretty impressive. Lihong Li , Michael Littman , and Thomas Walsh Knows What It Knows: A Framework For Self-Aware Learning. This is an attempt to create learning algorithms that know when they err, (other work includes Vovk ). It’s not yet clear to me what the right model for feature-dependent confidence intervals is. Novi Quadrianto , Alex Smola , TIberio Caetano , and Quoc Viet Le Estimating Labels from Label Proportions . This is an example of learning in a speciali
6 0.11081854 60 hunch net-2005-04-23-Advantages and Disadvantages of Bayesian Learning
7 0.10969181 282 hunch net-2008-01-06-Research Political Issues
8 0.10173588 235 hunch net-2007-03-03-All Models of Learning have Flaws
9 0.10092787 34 hunch net-2005-03-02-Prior, “Prior” and Bias
10 0.1008238 2 hunch net-2005-01-24-Holy grails of machine learning?
11 0.09917599 165 hunch net-2006-03-23-The Approximation Argument
12 0.095200151 41 hunch net-2005-03-15-The State of Tight Bounds
13 0.087082326 204 hunch net-2006-08-28-Learning Theory standards for NIPS 2006
14 0.085758872 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem
15 0.085421138 19 hunch net-2005-02-14-Clever Methods of Overfitting
16 0.08239194 157 hunch net-2006-02-18-Multiplication of Learned Probabilities is Dangerous
17 0.079603791 259 hunch net-2007-08-19-Choice of Metrics
18 0.075574607 244 hunch net-2007-05-09-The Missing Bound
19 0.071309336 218 hunch net-2006-11-20-Context and the calculation misperception
20 0.070610672 258 hunch net-2007-08-12-Exponentiated Gradient
topicId topicWeight
[(0, 0.162), (1, 0.112), (2, 0.032), (3, -0.043), (4, -0.001), (5, -0.085), (6, 0.038), (7, 0.057), (8, 0.125), (9, -0.07), (10, 0.142), (11, 0.075), (12, 0.137), (13, -0.065), (14, 0.018), (15, -0.177), (16, -0.274), (17, -0.007), (18, -0.029), (19, 0.035), (20, -0.012), (21, 0.21), (22, 0.035), (23, 0.176), (24, -0.147), (25, 0.099), (26, -0.236), (27, -0.16), (28, -0.176), (29, 0.172), (30, 0.105), (31, 0.042), (32, -0.113), (33, -0.094), (34, 0.059), (35, -0.012), (36, -0.138), (37, -0.126), (38, -0.022), (39, -0.191), (40, 0.185), (41, 0.134), (42, 0.021), (43, -0.136), (44, 0.054), (45, -0.084), (46, -0.025), (47, -0.027), (48, 0.072), (49, 0.021)]
simIndex simValue blogId blogTitle
same-blog 1 0.97921413 289 hunch net-2008-02-17-The Meaning of Confidence
Introduction: In many machine learning papers experiments are done and little confidence bars are reported for the results. This often seems quite clear, until you actually try to figure out what it means. There are several different kinds of ‘confidence’ being used, and it’s easy to become confused. Confidence = Probability . For those who haven’t worried about confidence for a long time, confidence is simply the probability of some event. You are confident about events which have a large probability. This meaning of confidence is inadequate in many applications because we want to reason about how much more information we have, how much more is needed, and where to get it. As an example, a learning algorithm might predict that the probability of an event is 0.5 , but it’s unclear if the probability is 0.5 because no examples have been provided or 0.5 because many examples have been provided and the event is simply fundamentally uncertain. Classical Confidence Intervals . These a
2 0.96323806 213 hunch net-2006-10-08-Incompatibilities between classical confidence intervals and learning.
Introduction: Classical confidence intervals satisfy a theorem of the form: For some data sources D , Pr_{S ~ D}(f(D) > g(S)) > 1-d where f is some function of the distribution (such as the mean) and g is some function of the observed sample S . The constraints on D can vary between “Independent and identically distributed (IID) samples from a gaussian with an unknown mean” to “IID samples from an arbitrary distribution D ”. There are even some confidence intervals which do not require IID samples. Classical confidence intervals often confuse people. They do not say “with high probability, for my observed sample, the bound holds”. Instead, they tell you that if you reason according to the confidence interval in the future (and the constraints on D are satisfied), then you are not often wrong. Restated, they tell you something about what a safe procedure is in a stochastic world where d is the safety parameter. There are a number of results in theoretical machine learn
3 0.40523297 5 hunch net-2005-01-26-Watchword: Probability
Introduction: Probability is one of the most confusingly used words in machine learning. There are at least 3 distinct ways the word is used. Bayesian The Bayesian notion of probability is a ‘degree of belief’. The degree of belief that some event (i.e. “stock goes up” or “stock goes down”) occurs can be measured by asking a sequence of questions of the form “Would you bet the stock goes up or down at Y to 1 odds?” A consistent bettor will switch from ‘for’ to ‘against’ at some single value of Y . The probability is then Y/(Y+1) . Bayesian probabilities express lack of knowledge rather than randomization. They are useful in learning because we often lack knowledge and expressing that lack flexibly makes the learning algorithms work better. Bayesian Learning uses ‘probability’ in this way exclusively. Frequentist The Frequentist notion of probability is a rate of occurrence. A rate of occurrence can be measured by doing an experiment many times. If an event occurs k times in
4 0.39286894 169 hunch net-2006-04-05-What is state?
Introduction: In reinforcement learning (and sometimes other settings), there is a notion of “state”. Based upon the state various predictions are made such as “Which action should be taken next?” or “How much cumulative reward do I expect if I take some action from this state?” Given the importance of state, it is important to examine the meaning. There are actually several distinct options and it turns out the definition variation is very important in motivating different pieces of work. Newtonian State. State is the physical pose of the world. Under this definition, there are very many states, often too many for explicit representation. This is also the definition typically used in games. Abstracted State. State is an abstracted physical state of the world. “Is the door open or closed?” “Are you in room A or not?” The number of states is much smaller here. A basic issue here is: “How do you compute the state from observations?” Mathematical State. State is a sufficient stati
5 0.34979984 282 hunch net-2008-01-06-Research Political Issues
Introduction: I’ve avoided discussing politics here, although not for lack of interest. The problem with discussing politics is that it’s customary for people to say much based upon little information. Nevertheless, politics can have a substantial impact on science (and we might hope for the vice-versa). It’s primary election time in the United States, so the topic is timely, although the issues are not. There are several policy decisions which substantially effect development of science and technology in the US. Education The US has great contrasts in education. The top universities are very good places, yet the grade school education system produces mediocre results. For me, the contrast between a public education and Caltech was bracing. For many others attending Caltech, it clearly was not. Upgrading the k-12 education system in the US is a long-standing chronic problem which I know relatively little about. My own experience is that a basic attitude of “no child unrealized” i
6 0.3310689 176 hunch net-2006-05-01-A conversation between Theo and Pat
7 0.30203941 244 hunch net-2007-05-09-The Missing Bound
8 0.2979846 157 hunch net-2006-02-18-Multiplication of Learned Probabilities is Dangerous
9 0.29045227 183 hunch net-2006-06-14-Explorations of Exploration
10 0.28685227 123 hunch net-2005-10-16-Complexity: It’s all in your head
11 0.27376974 34 hunch net-2005-03-02-Prior, “Prior” and Bias
12 0.26201004 76 hunch net-2005-05-29-Bad ideas
13 0.2486888 39 hunch net-2005-03-10-Breaking Abstractions
14 0.24765207 2 hunch net-2005-01-24-Holy grails of machine learning?
15 0.24558003 191 hunch net-2006-07-08-MaxEnt contradicts Bayes Rule?
16 0.24481861 218 hunch net-2006-11-20-Context and the calculation misperception
17 0.2391261 41 hunch net-2005-03-15-The State of Tight Bounds
18 0.23824753 118 hunch net-2005-10-07-On-line learning of regular decision rules
19 0.23788375 150 hunch net-2006-01-23-On Coding via Mutual Information & Bayes Nets
20 0.23607914 309 hunch net-2008-07-10-Interesting papers, ICML 2008
topicId topicWeight
[(3, 0.216), (8, 0.018), (27, 0.258), (31, 0.012), (38, 0.064), (46, 0.018), (48, 0.018), (53, 0.037), (55, 0.07), (77, 0.029), (94, 0.062), (95, 0.043)]
simIndex simValue blogId blogTitle
1 0.96181965 213 hunch net-2006-10-08-Incompatibilities between classical confidence intervals and learning.
Introduction: Classical confidence intervals satisfy a theorem of the form: For some data sources D , Pr_{S ~ D}(f(D) > g(S)) > 1-d where f is some function of the distribution (such as the mean) and g is some function of the observed sample S . The constraints on D can vary between “Independent and identically distributed (IID) samples from a gaussian with an unknown mean” to “IID samples from an arbitrary distribution D ”. There are even some confidence intervals which do not require IID samples. Classical confidence intervals often confuse people. They do not say “with high probability, for my observed sample, the bound holds”. Instead, they tell you that if you reason according to the confidence interval in the future (and the constraints on D are satisfied), then you are not often wrong. Restated, they tell you something about what a safe procedure is in a stochastic world where d is the safety parameter. There are a number of results in theoretical machine learn
2 0.95574278 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem
Introduction: I’m offering a reward of $1000 for a solution to this problem. This joins the cross validation problem which I’m offering a $500 reward for. I believe both of these problems are hard but plausibly solvable, and plausibly with a solution of substantial practical value. While it’s unlikely these rewards are worth your time on an hourly wage basis, the recognition for solving them definitely should be The Problem The problem is finding a general, robust, and efficient mechanism for estimating a conditional probability P(y|x) where robustness and efficiency are measured using techniques from learning reductions. In particular, suppose we have access to a binary regression oracle B which has two interfaces—one for specifying training information and one for testing. Training information is specified as B(x’,y’) where x’ is a feature vector and y’ is a scalar in [0,1] with no value returned. Testing is done according to B(x’) with a value in [0,1] returned.
3 0.95258737 298 hunch net-2008-04-26-Eliminating the Birthday Paradox for Universal Features
Introduction: I want to expand on this post which describes one of the core tricks for making Vowpal Wabbit fast and easy to use when learning from text. The central trick is converting a word (or any other parseable quantity) into a number via a hash function. Kishore tells me this is a relatively old trick in NLP land, but it has some added advantages when doing online learning, because you can learn directly from the existing data without preprocessing the data to create features (destroying the online property) or using an expensive hashtable lookup (slowing things down). A central concern for this approach is collisions, which create a loss of information. If you use m features in an index space of size n the birthday paradox suggests a collision if m > n^0.5, essentially because there are m^2 pairs. This is pretty bad, because it says that with a vocabulary of 10^5 features, you might need to have 10^10 entries in your table. It turns out that redundancy is gr
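The hashing trick and the birthday-paradox collision estimate described above can be sketched in a few lines (the bin count and tokens are illustrative; Python's built-in hash stands in for the fixed hash function a real system such as Vowpal Wabbit would use):

```python
def hashed_feature_counts(tokens, n_bins=2**18):
    """Map each token directly to an index with a hash function instead of
    keeping a feature dictionary, accepting occasional collisions."""
    counts = {}
    for tok in tokens:
        idx = hash(tok) % n_bins  # illustrative; a real system uses a fixed hash
        counts[idx] = counts.get(idx, 0) + 1
    return counts

print(hashed_feature_counts("the central trick is converting a word into a number".split()))

# Birthday-paradox estimate: with m distinct tokens and n bins, expect about
# m*(m-1)/(2*n) colliding pairs, which becomes likely once m > n^0.5.
m, n = 10**5, 2**18
print(m * (m - 1) / (2 * n))  # ~19000 expected colliding pairs
```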
4 0.94528598 243 hunch net-2007-05-08-Conditional Tournaments for Multiclass to Binary
Introduction: This problem has been cracked (but not quite completely solved) by Alina , Pradeep , and I . The problem is essentially finding a better way to reduce multiclass classification to binary classification. The solution is to use a carefully crafted tournament, the simplest version of which is a single elimination tournament where the “players” are the different classes. An example of the structure is here: For the single elimination tournament, we can prove that: For all multiclass problems D , for all learned binary classifiers c , the regret of an induced multiclass classifier is bounded by the regret of the binary classifier times log_2 k . Restated: reg_multiclass(D, Filter_tree_test(c)) <= log_2(k) * reg_binary(Filter_tree_train(D), c) Here: Filter_tree_train(D) is the induced binary classification problem Filter_tree_test(c) is the induced multiclass classifier. reg_multiclass is the multiclass regret (= difference between error rate and minim
same-blog 5 0.9374575 289 hunch net-2008-02-17-The Meaning of Confidence
Introduction: In many machine learning papers experiments are done and little confidence bars are reported for the results. This often seems quite clear, until you actually try to figure out what it means. There are several different kinds of ‘confidence’ being used, and it’s easy to become confused. Confidence = Probability . For those who haven’t worried about confidence for a long time, confidence is simply the probability of some event. You are confident about events which have a large probability. This meaning of confidence is inadequate in many applications because we want to reason about how much more information we have, how much more is needed, and where to get it. As an example, a learning algorithm might predict that the probability of an event is 0.5 , but it’s unclear if the probability is 0.5 because no examples have been provided or 0.5 because many examples have been provided and the event is simply fundamentally uncertain. Classical Confidence Intervals . These a
6 0.93619877 31 hunch net-2005-02-26-Problem: Reductions and Relative Ranking Metrics
7 0.90155685 183 hunch net-2006-06-14-Explorations of Exploration
8 0.84431052 484 hunch net-2013-06-16-Representative Reviewing
9 0.83506936 223 hunch net-2006-12-06-The Spam Problem
10 0.83292115 259 hunch net-2007-08-19-Choice of Metrics
11 0.83221841 34 hunch net-2005-03-02-Prior, “Prior” and Bias
12 0.81938654 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem
13 0.81223422 401 hunch net-2010-06-20-2010 ICML discussion site
14 0.810543 41 hunch net-2005-03-15-The State of Tight Bounds
15 0.80753821 320 hunch net-2008-10-14-Who is Responsible for a Bad Review?
16 0.80683029 194 hunch net-2006-07-11-New Models
17 0.80603987 79 hunch net-2005-06-08-Question: “When is the right time to insert the loss function?”
18 0.80598551 351 hunch net-2009-05-02-Wielding a New Abstraction
19 0.79911816 67 hunch net-2005-05-06-Don’t mix the solution into the problem
20 0.79827237 341 hunch net-2009-02-04-Optimal Proxy Loss for Classification