hunch_net hunch_net-2005 hunch_net-2005-5 knowledge-graph by maker-knowledge-mining

5 hunch net-2005-01-26-Watchword: Probability


meta info for this blog

Source: html

Introduction: Probability is one of the most confusingly used words in machine learning. There are at least 3 distinct ways the word is used. Bayesian The Bayesian notion of probability is a ‘degree of belief’. The degree of belief that some event (e.g. “stock goes up” or “stock goes down”) occurs can be measured by asking a sequence of questions of the form “Would you bet the stock goes up or down at Y to 1 odds?” A consistent bettor will switch from ‘for’ to ‘against’ at some single value of Y . The probability is then Y/(Y+1) . Bayesian probabilities express lack of knowledge rather than randomization. They are useful in learning because we often lack knowledge and expressing that lack flexibly makes the learning algorithms work better. Bayesian Learning uses ‘probability’ in this way exclusively. Frequentist The Frequentist notion of probability is a rate of occurrence. A rate of occurrence can be measured by doing an experiment many times. If an event occurs k times in
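
As a concrete illustration (not part of the original post; the function names and numbers below are hypothetical), here is a minimal Python sketch of the two measurement procedures described above: converting Y-to-1 betting odds into a Bayesian degree of belief Y/(Y+1), and estimating a frequentist probability as a rate of occurrence k/n.

import random

def probability_from_odds(y):
    # Degree of belief implied by indifference at Y-to-1 odds: Y / (Y + 1).
    return y / (y + 1.0)

def rate_of_occurrence(outcomes):
    # Frequentist estimate: k occurrences out of n repeated experiments, i.e. k / n.
    k = sum(outcomes)
    n = len(outcomes)
    return k / n

# A consistent bettor who switches sides at 3-to-1 odds holds probability 3/4 = 0.75.
print(probability_from_odds(3.0))

# Repeat an experiment whose event has true rate 0.75 and estimate k/n from the trials.
trials = [random.random() < 0.75 for _ in range(10000)]
print(rate_of_occurrence(trials))  # close to 0.75 for large n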


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Probability is one of the most confusingly used words in machine learning. [sent-1, score-0.138]

2 There are at least 3 distinct ways the word is used. [sent-2, score-0.189]

3 Bayesian The Bayesian notion of probability is a ‘degree of belief’. [sent-3, score-0.636]

4 “stock goes up” or “stock goes down”) occurs can be measured by asking a sequence of questions of the form “Would you bet the stock goes up or down at Y to 1 odds? [sent-6, score-1.325]

5 ” A consistent bettor will switch from ‘for’ to ‘against’ at some single value of Y . [sent-7, score-0.22]

6 Bayesian probabilities express lack of knowledge rather than randomization. [sent-9, score-0.607]

7 They are useful in learning because we often lack knowledge and expressing that lack flexibly makes the learning algorithms work better. [sent-10, score-0.475]

8 Frequentist The Frequentist notion of probability is a rate of occurrence. [sent-12, score-0.711]

9 A rate of occurrence can be measured by doing an experiment many times. [sent-13, score-0.333]

10 If an event occurs k times in n experiments then it has probability about k/n . [sent-14, score-0.731]

11 Frequentist probabilities can be used to measure how sure you are about something. [sent-15, score-0.313]

12 They may be appropriate in a learning context for measuring confidence in various predictors. [sent-16, score-0.216]

13 The frequentist notion of probability is common in physics, other sciences, and computer science theory. [sent-17, score-1.073]

14 Estimated The estimated notion of probability is measured by running some learning algorithm which predicts the probability of events rather than the events themselves. [sent-18, score-1.628]

15 I tend to dislike this use of the word because it confuses the world with the model of the world. [sent-19, score-0.266]

16 To avoid confusion, you should be careful to understand what other people mean by this word. [sent-20, score-0.117]

17 It is helpful to always be explicit about which variables are randomized and which are constant whenever probability is used because Bayesian and Frequentist probabilities commonly switch this role. [sent-21, score-1.299]
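
To make sentences 14 and 15 concrete, the following minimal sketch (illustrative only; the numbers are hypothetical and not from the post) contrasts a model's estimated probability with the observed rate of occurrence. The two need not agree, which is exactly the distinction between the world and the model of the world.

import random

random.seed(0)

true_rate = 0.30       # the (unknown) frequentist rate at which the event occurs
estimated_p = 0.42     # a hypothetical output of some learning algorithm trained on limited data

# Observe the world many times and measure the rate of occurrence.
n = 10000
k = sum(random.random() < true_rate for _ in range(n))
observed_rate = k / n

print("model's estimated probability:", estimated_p)
print("observed rate of occurrence:  ", round(observed_rate, 2))
# The estimate describes the model of the world; the rate describes the world itself.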


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('frequentist', 0.437), ('probability', 0.43), ('probabilities', 0.232), ('stock', 0.218), ('bayesian', 0.208), ('notion', 0.206), ('measured', 0.202), ('goes', 0.187), ('estimated', 0.167), ('switch', 0.154), ('lack', 0.147), ('occurs', 0.132), ('degree', 0.12), ('word', 0.12), ('event', 0.115), ('belief', 0.105), ('measuring', 0.1), ('knowledge', 0.098), ('sciences', 0.093), ('bet', 0.087), ('dislike', 0.087), ('whenever', 0.087), ('expressing', 0.083), ('used', 0.081), ('randomized', 0.077), ('rate', 0.075), ('odds', 0.075), ('predicts', 0.075), ('express', 0.073), ('role', 0.071), ('asking', 0.069), ('confusion', 0.069), ('distinct', 0.069), ('physics', 0.069), ('variables', 0.067), ('consistent', 0.066), ('careful', 0.063), ('events', 0.061), ('confidence', 0.06), ('tend', 0.059), ('explicit', 0.058), ('constant', 0.057), ('words', 0.057), ('rather', 0.057), ('experiment', 0.056), ('sequence', 0.056), ('appropriate', 0.056), ('commonly', 0.056), ('experiments', 0.054), ('mean', 0.054)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 5 hunch net-2005-01-26-Watchword: Probability


2 0.21799894 39 hunch net-2005-03-10-Breaking Abstractions

Introduction: Sam Roweis’s comment reminds me of a more general issue that comes up in doing research: abstractions always break. Real numbers aren’t. Most real numbers cannot be represented with any machine. One implication of this is that many real-number based algorithms have difficulties when implemented with floating point numbers. The box on your desk is not a Turing machine. A Turing machine can compute anything computable, given sufficient time. A typical computer fails terribly when the state required for the computation exceeds some limit. Nash equilibria aren’t equilibria. This comes up when trying to predict human behavior based on the result of the equilibria computation. Often, it doesn’t work. The probability isn’t. Probability is an abstraction expressing either our lack of knowledge (the Bayesian viewpoint) or fundamental randomization (the frequentist viewpoint). From the frequentist viewpoint the lack of knowledge typically precludes actually knowing the fu

3 0.17853141 62 hunch net-2005-04-26-To calibrate or not?

Introduction: A calibrated predictor is one which predicts the probability of a binary event with the property: For all predictions p , the proportion of the time that 1 is observed is p . Since there are infinitely many p , this definition must be “softened” to make sense for any finite number of samples. The standard method for “softening” is to consider all predictions in a small neighborhood about each possible p . A great deal of effort has been devoted to strategies for achieving calibrated (such as here ) prediction. With statements like: (under minimal conditions) you can always make calibrated predictions. Given the strength of these statements, we might conclude we are done, but that would be a “confusion of ends”. A confusion of ends arises in the following way: We want good probabilistic predictions. Good probabilistic predictions are calibrated. Therefore, we want calibrated predictions. The “Therefore” step misses the fact that calibration is a necessary b

4 0.1630365 123 hunch net-2005-10-16-Complexity: It’s all in your head

Introduction: One of the central concerns of learning is to understand and to prevent overfitting. Various notions of “function complexity” often arise: VC dimension, Rademacher complexity, comparison classes of experts, and program length are just a few. The term “complexity” to me seems somehow misleading; the terms never capture something that meets my intuitive notion of complexity. The Bayesian notion clearly captures what’s going on. Functions aren’t “complex”– they’re just “surprising”: we assign to them low probability. Most (all?) complexity notions I know boil down to some (generally loose) bound on the prior probability of the function. In a sense, “complexity” fundamentally arises because probability distributions must sum to one. You can’t believe in all possibilities at the same time, or at least not equally. Rather you have to carefully spread the probability mass over the options you’d like to consider. Large complexity classes mean that beliefs are spread thinly. In

5 0.16161557 140 hunch net-2005-12-14-More NIPS Papers II

Introduction: I thought this was a very good NIPS with many excellent papers. The following are a few NIPS papers which I liked and I hope to study more carefully when I get the chance. The list is not exhaustive and in no particular order… Preconditioner Approximations for Probabilistic Graphical Models. Pradeep Ravikumar and John Lafferty. I thought the use of preconditioner methods from solving linear systems in the context of approximate inference was novel and interesting. The results look good and I’d like to understand the limitations. Rodeo: Sparse nonparametric regression in high dimensions. John Lafferty and Larry Wasserman. A very interesting approach to feature selection in nonparametric regression from a frequentist framework. The use of lengthscale variables in each dimension reminds me a lot of ‘Automatic Relevance Determination’ in Gaussian process regression — it would be interesting to compare Rodeo to ARD in GPs. Interpolating between types and tokens by estimating

6 0.1567149 218 hunch net-2006-11-20-Context and the calculation misperception

7 0.15013483 34 hunch net-2005-03-02-Prior, “Prior” and Bias

8 0.13931176 330 hunch net-2008-12-07-A NIPS paper

9 0.13572675 289 hunch net-2008-02-17-The Meaning of Confidence

10 0.13470075 60 hunch net-2005-04-23-Advantages and Disadvantages of Bayesian Learning

11 0.12812044 118 hunch net-2005-10-07-On-line learning of regular decision rules

12 0.11852598 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem

13 0.1039708 302 hunch net-2008-05-25-Inappropriate Mathematics for Machine Learning

14 0.087212659 213 hunch net-2006-10-08-Incompatibilities between classical confidence intervals and learning.

15 0.084933013 79 hunch net-2005-06-08-Question: “When is the right time to insert the loss function?”

16 0.084092587 33 hunch net-2005-02-28-Regularization

17 0.081402868 112 hunch net-2005-09-14-The Predictionist Viewpoint

18 0.079260938 157 hunch net-2006-02-18-Multiplication of Learned Probabilities is Dangerous

19 0.074909851 14 hunch net-2005-02-07-The State of the Reduction

20 0.074311957 440 hunch net-2011-08-06-Interesting thing at UAI 2011


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.144), (1, 0.095), (2, 0.02), (3, -0.004), (4, -0.035), (5, -0.03), (6, 0.018), (7, 0.055), (8, 0.147), (9, -0.087), (10, 0.014), (11, -0.044), (12, 0.062), (13, -0.118), (14, 0.055), (15, -0.076), (16, -0.179), (17, -0.03), (18, 0.117), (19, -0.003), (20, -0.098), (21, 0.116), (22, 0.118), (23, 0.135), (24, -0.031), (25, 0.035), (26, -0.012), (27, -0.059), (28, -0.074), (29, 0.094), (30, 0.036), (31, 0.085), (32, -0.018), (33, -0.014), (34, -0.086), (35, 0.02), (36, -0.039), (37, 0.041), (38, 0.053), (39, 0.089), (40, -0.037), (41, -0.019), (42, -0.039), (43, 0.135), (44, 0.01), (45, 0.152), (46, 0.059), (47, 0.022), (48, -0.12), (49, 0.189)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98110473 5 hunch net-2005-01-26-Watchword: Probability


2 0.75535142 62 hunch net-2005-04-26-To calibrate or not?

Introduction: A calibrated predictor is one which predicts the probability of a binary event with the property: For all predictions p , the proportion of the time that 1 is observed is p . Since there are infinitely many p , this definition must be “softened” to make sense for any finite number of samples. The standard method for “softening” is to consider all predictions in a small neighborhood about each possible p . A great deal of effort has been devoted to strategies for achieving calibrated (such as here ) prediction. With statements like: (under minimal conditions) you can always make calibrated predictions. Given the strength of these statements, we might conclude we are done, but that would be a “confusion of ends”. A confusion of ends arises in the following way: We want good probabilistic predictions. Good probabilistic predictions are calibrated. Therefore, we want calibrated predictions. The “Therefore” step misses the fact that calibration is a necessary b

3 0.73403347 39 hunch net-2005-03-10-Breaking Abstractions

Introduction: Sam Roweis’s comment reminds me of a more general issue that comes up in doing research: abstractions always break. Real numbers aren’t. Most real numbers cannot be represented with any machine. One implication of this is that many real-number based algorithms have difficulties when implemented with floating point numbers. The box on your desk is not a Turing machine. A Turing machine can compute anything computable, given sufficient time. A typical computer fails terribly when the state required for the computation exceeds some limit. Nash equilibria aren’t equilibria. This comes up when trying to predict human behavior based on the result of the equilibria computation. Often, it doesn’t work. The probability isn’t. Probability is an abstraction expressing either our lack of knowledge (the Bayesian viewpoint) or fundamental randomization (the frequentist viewpoint). From the frequentist viewpoint the lack of knowledge typically precludes actually knowing the fu

4 0.7321524 123 hunch net-2005-10-16-Complexity: It’s all in your head

Introduction: One of the central concerns of learning is to understand and to prevent overfitting. Various notions of “function complexity” often arise: VC dimension, Rademacher complexity, comparison classes of experts, and program length are just a few. The term “complexity” to me seems somehow misleading; the terms never capture something that meets my intuitive notion of complexity. The Bayesian notion clearly captures what’s going on. Functions aren’t “complex”– they’re just “surprising”: we assign to them low probability. Most (all?) complexity notions I know boil down to some (generally loose) bound on the prior probability of the function. In a sense, “complexity” fundamentally arises because probability distributions must sum to one. You can’t believe in all possibilities at the same time, or at least not equally. Rather you have to carefully spread the probability mass over the options you’d like to consider. Large complexity classes mean that beliefs are spread thinly. In

5 0.62204212 118 hunch net-2005-10-07-On-line learning of regular decision rules

Introduction: Many decision problems can be represented in the form FOR n = 1,2,…: — Reality chooses a datum x_n. — Decision Maker chooses his decision d_n. — Reality chooses an observation y_n. — Decision Maker suffers loss L(y_n, d_n). END FOR. The observation y_n can be, for example, tomorrow’s stock price and the decision d_n the number of shares Decision Maker chooses to buy. The datum x_n ideally contains all information that might be relevant in making this decision. We do not want to assume anything about the way Reality generates the observations and data. Suppose there is a good and not too complex decision rule D mapping each datum x to a decision D(x). Can we perform as well, or almost as well, as D, without knowing it? This is essentially a special case of the problem of on-line learning. Here is a simple result of this kind. Suppose the data x_n are taken from [0,1] and L(y,d) = |y − d|. A norm ||h|| of a function h on

6 0.58302975 330 hunch net-2008-12-07-A NIPS paper

7 0.58109385 218 hunch net-2006-11-20-Context and the calculation misperception

8 0.55303824 34 hunch net-2005-03-02-Prior, “Prior” and Bias

9 0.53830951 302 hunch net-2008-05-25-Inappropriate Mathematics for Machine Learning

10 0.5273369 140 hunch net-2005-12-14-More NIPS Papers II

11 0.47251076 33 hunch net-2005-02-28-Regularization

12 0.46442753 191 hunch net-2006-07-08-MaxEnt contradicts Bayes Rule?

13 0.46183863 157 hunch net-2006-02-18-Multiplication of Learned Probabilities is Dangerous

14 0.45455033 413 hunch net-2010-10-08-An easy proof of the Chernoff-Hoeffding bound

15 0.44432926 289 hunch net-2008-02-17-The Meaning of Confidence

16 0.4350701 60 hunch net-2005-04-23-Advantages and Disadvantages of Bayesian Learning

17 0.43475404 263 hunch net-2007-09-18-It’s MDL Jim, but not as we know it…(on Bayes, MDL and consistency)

18 0.40972137 440 hunch net-2011-08-06-Interesting thing at UAI 2011

19 0.40419224 160 hunch net-2006-03-02-Why do people count for learning?

20 0.39891955 205 hunch net-2006-09-07-Objective and subjective interpretations of probability


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(1, 0.022), (3, 0.052), (10, 0.049), (27, 0.202), (52, 0.275), (53, 0.069), (55, 0.104), (94, 0.094), (95, 0.017)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.88349575 169 hunch net-2006-04-05-What is state?

Introduction: In reinforcement learning (and sometimes other settings), there is a notion of “state”. Based upon the state various predictions are made such as “Which action should be taken next?” or “How much cumulative reward do I expect if I take some action from this state?” Given the importance of state, it is important to examine the meaning. There are actually several distinct options and it turns out the definition variation is very important in motivating different pieces of work. Newtonian State. State is the physical pose of the world. Under this definition, there are very many states, often too many for explicit representation. This is also the definition typically used in games. Abstracted State. State is an abstracted physical state of the world. “Is the door open or closed?” “Are you in room A or not?” The number of states is much smaller here. A basic issue here is: “How do you compute the state from observations?” Mathematical State. State is a sufficient stati

same-blog 2 0.83268732 5 hunch net-2005-01-26-Watchword: Probability


3 0.73408514 136 hunch net-2005-12-07-Is the Google way the way for machine learning?

Introduction: Urs Hoelzle from Google gave an invited presentation at NIPS . In the presentation, he strongly advocates interacting with data in a particular scalable manner which is something like the following: Make a cluster of machines. Build a unified filesystem. (Google uses GFS, but NFS or other approaches work reasonably well for smaller clusters.) Interact with data via MapReduce . Creating a cluster of machines is, by this point, relatively straightforward. Unified filesystems are a little bit tricky—GFS is capable by design of essentially unlimited speed throughput to disk. NFS can bottleneck because all of the data has to move through one machine. Nevertheless, this may not be a limiting factor for smaller clusters. MapReduce is a programming paradigm. Essentially, it is a combination of a data element transform (map) and an aggregator/selector (reduce). These operations are highly parallelizable and the claim is that they support the forms of data interacti

4 0.66334814 40 hunch net-2005-03-13-Avoiding Bad Reviewing

Introduction: If we accept that bad reviewing often occurs and want to fix it, the question is “how”? Reviewing is done by paper writers just like yourself, so a good proxy for this question is asking “How can I be a better reviewer?” Here are a few things I’ve learned by trial (and error), as a paper writer, and as a reviewer. The secret ingredient is careful thought. There is no good substitution for a deep and careful understanding. Avoid reviewing papers that you feel competitive about. You almost certainly will be asked to review papers that feel competitive if you work on subjects of common interest. But, the feeling of competition can easily lead to bad judgement. If you feel biased for some other reason, then you should avoid reviewing. For example… Feeling angry or threatened by a paper is a form of bias. See above. Double blind yourself (avoid looking at the name even in a single-blind situation). The significant effect of a name you recognize is making you pay close a

5 0.66225868 484 hunch net-2013-06-16-Representative Reviewing

Introduction: When thinking about how best to review papers, it seems helpful to have some conception of what good reviewing is. As far as I can tell, this is almost always only discussed in the specific context of a paper (i.e. your rejected paper), or at most an area (i.e. what a “good paper” looks like for that area) rather than general principles. Neither individual papers nor areas are sufficiently general for a large conference—every paper differs in the details, and what if you want to build a new area and/or cross areas? An unavoidable reason for reviewing is that the community of research is too large. In particular, it is not possible for a researcher to read every paper which someone thinks might be of interest. This reason for reviewing exists independent of constraints on rooms or scheduling formats of individual conferences. Indeed, history suggests that physical constraints are relatively meaningless over the long term — growing conferences simply use more rooms and/or change fo

6 0.66205764 96 hunch net-2005-07-21-Six Months

7 0.66041273 207 hunch net-2006-09-12-Incentive Compatible Reviewing

8 0.66035157 437 hunch net-2011-07-10-ICML 2011 and the future

9 0.66023535 320 hunch net-2008-10-14-Who is Responsible for a Bad Review?

10 0.6599375 343 hunch net-2009-02-18-Decision by Vetocracy

11 0.65725166 95 hunch net-2005-07-14-What Learning Theory might do

12 0.65689582 286 hunch net-2008-01-25-Turing’s Club for Machine Learning

13 0.65647691 461 hunch net-2012-04-09-ICML author feedback is open

14 0.65312225 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

15 0.65257174 98 hunch net-2005-07-27-Not goal metrics

16 0.65092397 132 hunch net-2005-11-26-The Design of an Optimal Research Environment

17 0.65071911 332 hunch net-2008-12-23-Use of Learning Theory

18 0.65035552 454 hunch net-2012-01-30-ICML Posters and Scope

19 0.64996225 337 hunch net-2009-01-21-Nearly all natural problems require nonlinearity

20 0.64920646 347 hunch net-2009-03-26-Machine Learning is too easy