
195 brendan oconnor ai-2013-04-21-What inputs do Monte Carlo algorithms need?


meta info for this blog

Source: html

Introduction: Monte Carlo sampling algorithms (MCMC or otherwise) aim to draw samples from a distribution.  They can be organized by what inputs or prior knowledge about the distribution they require.  This ranges from very little, as in slice sampling (just give it an unnormalized density function), to quite a lot, as in Gibbs sampling (you have to decompose your distribution into its individual conditionals). Typical inputs include \(f(x)\), an unnormalized density or probability function for the target distribution, which returns a real number for a variable value.  \(g()\) and \(g(x)\) represent sample-generation procedures (that output a variable value); some generators require an input, some do not. Here are the required inputs for a few algorithms.  (For an overview, see e.g. Ch. 29 of MacKay.)  There are many more out there, of course.  I’m leaving off tuning parameters. Black-box samplers:  Slice sampling,  Affine-invariant ensemble - unnorm density \(f(x)\) …
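A minimal sketch (not code from the post) of what the low-input end looks like: a univariate slice sampler with stepping out and shrinkage, in the spirit of MacKay Ch. 29, whose only knowledge of the target is an unnormalized density \(f(x)\). The target function and the name slice_sample below are hypothetical choices for illustration, in R:

## Any nonnegative function works as the unnormalized density; this one is made up.
f <- function(x) exp(-abs(x)) * (1 + sin(3 * x)^2)

slice_sample <- function(f, x0, n_iter = 5000, w = 1) {
  out <- numeric(n_iter)
  cur <- x0
  for (i in 1:n_iter) {
    u <- runif(1, 0, f(cur))               # auxiliary height under the density
    left  <- cur - runif(1) * w            # step out an interval containing cur...
    right <- left + w
    while (f(left)  > u) left  <- left  - w   # ...until both endpoints fall below u
    while (f(right) > u) right <- right + w
    repeat {                               # shrink until a proposal lands in the slice
      prop <- runif(1, left, right)
      if (f(prop) > u) break
      if (prop < cur) left <- prop else right <- prop
    }
    cur <- prop
    out[i] <- cur
  }
  out
}

draws <- slice_sample(f, x0 = 0)
hist(draws, breaks = 60)   # rough check that the histogram tracks the shape of f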
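And a sketch of the high-input end: Gibbs sampling needs a way to draw from every full conditional. The target below, a bivariate normal with correlation rho (whose conditionals are normal in closed form), and the name gibbs_bvn are assumptions made purely for illustration:

gibbs_bvn <- function(n_iter = 5000, rho = 0.8) {
  draws <- matrix(0, n_iter, 2)
  x1 <- 0; x2 <- 0
  sd_cond <- sqrt(1 - rho^2)   # conditional sd for a standard bivariate normal
  for (i in 1:n_iter) {
    x1 <- rnorm(1, mean = rho * x2, sd = sd_cond)   # draw from p(x1 | x2)
    x2 <- rnorm(1, mean = rho * x1, sd = sd_cond)   # draw from p(x2 | x1)
    draws[i, ] <- c(x1, x2)
  }
  draws
}

samples <- gibbs_bvn()
cor(samples[, 1], samples[, 2])   # should land near rho = 0.8

If you could not write down these conditional samplers, you could not run Gibbs at all; that is the sense in which it demands more knowledge of the distribution than the black-box samplers above.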


Summary: the most important sentences, generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Monte Carlo sampling algorithms (MCMC or otherwise) aim to draw samples from a distribution. [sent-1, score-0.486]

2 They can be organized by what inputs or prior knowledge about the distribution they require. [sent-2, score-0.232]

3 This ranges from a low amount of knowledge, as in slice sampling (just give it an unnormalized density function), to a high amount, as in Gibbs sampling (you have to decompose your distribution into individual conditionals). [sent-3, score-1.388]

4 Typical inputs include \(f(x)\), an unnormalized density or probability function for the target distribution, which returns a real number for a variable value. [sent-4, score-0.697]

5 \(g()\) and \(g(x)\) represent sample generation procedures (that output a variable value); some generators require an input, some do not. [sent-5, score-0.296]

6 Here are the required inputs for a few algorithms. [sent-6, score-0.104]

7 I’m distinguishing a sampling procedure \(g\) from a density evaluation function \(f\) because having the latter doesn’t necessarily give you the former. [sent-14, score-0.828]

8 (For the one-dimensional case, having an inverse CDF indeed gives you a sampler, but the multidimensional case gets harder — part of why all these techniques were invented in the first place! A small inverse-CDF sketch appears just after this list.) [sent-15, score-0.115]

9 Shay points out their relationship is analogous to 3-SAT: it’s easy to evaluate a full variable setting, but hard to generate one. [sent-16, score-0.104]

10 (Or specifically, think about a 3-SAT PMF \(p(x) = 1\{x \text{ satisfies the formula}\}\) where only one \(x\) has non-zero probability; PMF evaluation is easy, but the best known sampler takes exponential time. A tiny evaluator sketch appears just after this list.) [sent-17, score-0.157]

11 It’s nice to do in order to ensure you’re actually optimizing or exploring the posterior, but strictly speaking the algorithms don’t require it. [sent-20, score-0.201]
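To make sentence 8 concrete, a small sketch (an assumed example, not from the post) of inverse-CDF sampling in one dimension: if \(U \sim \mathrm{Uniform}(0,1)\) then \(F^{-1}(U)\) is distributed with CDF \(F\). Here \(F\) is an exponential with a made-up rate:

rate <- 2
u <- runif(1e5)
x <- -log(1 - u) / rate   # inverse CDF of the Exponential(rate) distribution
c(mean(x), 1 / rate)      # sample mean vs. theoretical mean; these should agree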
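And for sentences 9-10: evaluating the 3-SAT indicator at a full assignment is cheap, while sampling from it amounts to finding a satisfying assignment. The tiny formula, its encoding, and the helper f_sat below are hypothetical:

## Clauses as literal vectors: a positive index means the variable, a negative index its negation.
clauses <- list(c(1, -2, 3), c(-1, 2, 3), c(1, 2, -3))

## Unnormalized PMF: 1 if the assignment satisfies every clause, else 0.
f_sat <- function(setting) {
  clause_ok <- vapply(clauses, function(cl) {
    any(ifelse(cl > 0, setting[abs(cl)], !setting[abs(cl)]))
  }, logical(1))
  as.numeric(all(clause_ok))
}

f_sat(c(TRUE, TRUE, TRUE))   # evaluating one full setting: trivial
## Sampling an x with f_sat(x) = 1, by contrast, means searching the 2^n assignments;
## no generally efficient procedure is known.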


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('unnorm', 0.468), ('proposal', 0.445), ('sampling', 0.346), ('generator', 0.281), ('density', 0.244), ('gradient', 0.187), ('gf', 0.14), ('objective', 0.138), ('inputs', 0.104), ('gibbs', 0.104), ('variable', 0.104), ('require', 0.098), ('asymmetric', 0.094), ('carlo', 0.094), ('generators', 0.094), ('monte', 0.094), ('optimizers', 0.094), ('pmf', 0.094), ('slice', 0.094), ('unnormalized', 0.094), ('function', 0.082), ('mcmc', 0.081), ('sgd', 0.074), ('samples', 0.074), ('distribution', 0.071), ('probability', 0.069), ('algorithms', 0.066), ('sampler', 0.065), ('give', 0.064), ('knowledge', 0.057), ('amount', 0.051), ('evaluation', 0.051), ('distinguishing', 0.041), ('tuning', 0.041), ('hastings', 0.041), ('leaving', 0.041), ('samplers', 0.041), ('exponential', 0.041), ('shay', 0.041), ('decompose', 0.041), ('metropolis', 0.041), ('cdf', 0.041), ('ensemble', 0.041), ('inverse', 0.041), ('rough', 0.037), ('techniques', 0.037), ('strictly', 0.037), ('ranges', 0.037), ('worry', 0.037), ('invented', 0.037)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 195 brendan oconnor ai-2013-04-21-What inputs do Monte Carlo algorithms need?


2 0.080992684 194 brendan oconnor ai-2013-04-16-Rise and fall of Dirichlet process clusters

Introduction: Here’s Gibbs sampling for a Dirichlet process 1-d mixture of Gaussians. On 1000 data points that look like this. I gave it a fixed variance and concentration, and over MCMC iterations it looks like this. The top is the number of points in a cluster. The bottom are the cluster means. During MCMC, clusters are created and destroyed. Every cluster has a unique color; when a cluster dies, its color is never reused. I’m showing clusters every 100 iterations. If there is a single point, that cluster was at that iteration but not before or after. If there is a line, the cluster lived for at least 100 iterations. Some clusters live long, some live short, but all eventually die. Usually the model likes to think there are about two clusters, occupying positions at the two modes in the data distribution. It also entertains the existence of several much more minor ones. Usually these are short-lived clusters that die away. But

3 0.053371586 169 brendan oconnor ai-2011-05-20-Log-normal and logistic-normal terminology

Introduction: I was cleaning my office and found a back-of-envelope diagram Shay drew me once, so I’m writing it up to not forget.  The definitions of the logistic-normal and log-normal distributions are a little confusing with regard to their relationship to the normal distribution.  If you draw samples from one, the arrows below show the transformation to make it such that you have samples from another. For example, if x ~ Normal, then transforming as y=exp(x) implies y ~ LogNormal.  The adjective terminology is inverted: the logistic function goes from normal to logistic-normal, but the log function goes from log-normal to normal (the other way!).  The log of the log-normal is normal, but it’s the logit of the logistic normal that’s normal. Here are densities of these different distributions via transformations from a standard normal. In R:   x=rnorm(1e6); hist(x); hist(exp(x)/(1+exp(x))); hist(exp(x)) Just to make things more confusing, note the logistic-normal distributi

4 0.042098463 129 brendan oconnor ai-2008-12-03-Statistics vs. Machine Learning, fight!

Introduction: 10/1/09 update — well, it’s been nearly a year, and I should say not everything in this rant is totally true, and I certainly believe much less of it now. Current take: Statistics, not machine learning, is the real deal, but unfortunately suffers from bad marketing. On the other hand, to the extent that bad marketing includes misguided undergraduate curriculums, there’s plenty of room to improve for everyone. So it’s pretty clear by now that statistics and machine learning aren’t very different fields. I was recently pointed to a very amusing comparison by the excellent statistician — and machine learning expert — Robert Tibshirani. Reproduced here, as machine-learning term vs. statistics term: network, graphs vs. model; weights vs. parameters; learning vs. fitting; generalization vs. test set performance; supervised learning vs. regression/classification; unsupervised learning vs. density estimation, clustering; large grant = $1,000,000

5 0.041016098 74 brendan oconnor ai-2007-08-08-When’s the last time you dug through 19th century English mortuary records

Introduction: Standard problem: humans lived like crap for thousands and thousands of years, then suddenly some two hundred years ago dramatic industrialization and economic growth happened, though unevenly even through today. Here’s an interesting proposal to explain all this. Gregory Clark found startling empirical evidence that, in the time around the Industrial Revolution in England, wealthier families had more children than poorer families, while middle-class social values — non-violence, literacy, work ethic, high savings rates — also became more widespread during this time. According to the article at least, he actually seems to favor the explanation that human biological evolution was at work; though he notes cultural evolution is possible too. (That is, the children of wealthier families are socialized with their values; as the children of middle-class-valued families increase in proportion in society, the prevalence of those values increases too.) In any case, the argument is tha

6 0.038042866 201 brendan oconnor ai-2013-10-31-tanh is a rescaled logistic sigmoid function

7 0.036862478 146 brendan oconnor ai-2009-07-15-Beta conjugate explorer

8 0.036606204 179 brendan oconnor ai-2012-02-02-Histograms — matplotlib vs. R

9 0.033101797 178 brendan oconnor ai-2011-11-13-Bayes update view of pointwise mutual information

10 0.032874674 199 brendan oconnor ai-2013-08-31-Probabilistic interpretation of the B3 coreference resolution metric

11 0.029176084 175 brendan oconnor ai-2011-09-25-Information theory stuff

12 0.02835574 174 brendan oconnor ai-2011-09-19-End-to-end NLP packages

13 0.026701095 157 brendan oconnor ai-2009-12-31-List of probabilistic model mini-language toolkits

14 0.026542909 198 brendan oconnor ai-2013-08-20-Some analysis of tweet shares and “predicting” election outcomes

15 0.025992138 82 brendan oconnor ai-2007-11-14-Pop cog neuro is so sigh

16 0.02480685 131 brendan oconnor ai-2008-12-27-Facebook sentiment mining predicts presidential polls

17 0.023600705 189 brendan oconnor ai-2012-11-24-Graphs for SANCL-2012 web parsing results

18 0.023535777 99 brendan oconnor ai-2008-04-02-Datawocky: More data usually beats better algorithms

19 0.022835335 139 brendan oconnor ai-2009-04-22-Performance comparison: key-value stores for language model counts

20 0.022285827 23 brendan oconnor ai-2005-08-01-Bayesian analysis of intelligent design (revised!)


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, -0.077), (1, -0.019), (2, 0.071), (3, -0.034), (4, -0.01), (5, 0.054), (6, 0.011), (7, -0.055), (8, -0.071), (9, 0.006), (10, -0.02), (11, -0.003), (12, -0.027), (13, -0.006), (14, -0.012), (15, 0.016), (16, -0.005), (17, -0.007), (18, 0.003), (19, -0.066), (20, -0.015), (21, -0.015), (22, 0.002), (23, -0.016), (24, -0.002), (25, 0.004), (26, 0.035), (27, 0.03), (28, 0.017), (29, 0.022), (30, -0.011), (31, 0.046), (32, 0.04), (33, -0.06), (34, 0.05), (35, -0.043), (36, -0.027), (37, 0.001), (38, 0.05), (39, -0.043), (40, 0.072), (41, -0.068), (42, -0.04), (43, -0.023), (44, -0.086), (45, 0.123), (46, 0.024), (47, -0.023), (48, 0.003), (49, 0.012)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98885739 195 brendan oconnor ai-2013-04-21-What inputs do Monte Carlo algorithms need?


2 0.51381224 199 brendan oconnor ai-2013-08-31-Probabilistic interpretation of the B3 coreference resolution metric

Introduction: Here is an intuitive justification for the B3 evaluation metric often used in coreference resolution, based on whether mention pairs are coreferent. If a mention from the document is chosen at random, B3-Recall is the (expected) proportion of its actual coreferents that the system thinks are coreferent with it. B3-Precision is the (expected) proportion of its system-hypothesized coreferents that are actually coreferent with it. Does this look correct to people? Details below: In B3’s basic form, it’s a clustering evaluation metric, to evaluate a gold-standard clustering of mentions against a system-produced clustering of mentions. Let \(G\) mean a gold-standard entity and \(S\) mean a system-predicted entity, where an entity is a set of mentions. \(i\) refers to a mention; there are \(n\) mentions in the document. \(G_i\) means the gold entity that contains mention \(i\); and \(S_i\) means the system entity that has \(i\). The B3 precision and recall for a document

3 0.48818073 194 brendan oconnor ai-2013-04-16-Rise and fall of Dirichlet process clusters


4 0.48242 178 brendan oconnor ai-2011-11-13-Bayes update view of pointwise mutual information

Introduction: This is fun. Pointwise Mutual Information (e.g. Church and Hanks 1990 ) between two variable outcomes \(x\) and \(y\) is \[ PMI(x,y) = \log \frac{p(x,y)}{p(x)p(y)} \] It’s called “pointwise” because Mutual Information , between two (discrete) variables X and Y, is the expectation of PMI over possible outcomes of X and Y: \( MI(X,Y) = \sum_{x,y} p(x,y) PMI(x,y) \). One interpretation of PMI is it’s measuring how much deviation from independence there is — since \(p(x,y)=p(x)p(y)\) if X and Y were independent, so the ratio is how non-independent they (the outcomes) are. You can get another interpretation of this quantity if you switch into conditional probabilities. Looking just at the ratio, apply the definition of conditional probability: \[ \frac{p(x,y)}{p(x)p(y)} = \frac{p(x|y)}{p(x)} \] Think about doing a Bayes update for your belief about \(x\). Start with the prior \(p(x)\), then learn \(y\) and you update to the posterior belief \(p(x|y)\). How much your belief

5 0.46296075 183 brendan oconnor ai-2012-04-11-F-scores, Dice, and Jaccard set similarity

Introduction: The Dice similarity is the same as F1-score ; and they are monotonic in Jaccard similarity . I worked this out recently but couldn’t find anything about it online so here’s a writeup. Let \(A\) be the set of found items, and \(B\) the set of wanted items. \(Prec=|AB|/|A|\), \(Rec=|AB|/|B|\). Their harmonic mean, the \(F1\)-measure, is the same as the Dice coefficient: \begin{align*} F1(A,B) &= \frac{2}{1/P+ 1/R} = \frac{2}{|A|/|AB| + |B|/|AB|} \\ Dice(A,B) &= \frac{2|AB|}{ |A| + |B| } \\ &= \frac{2 |AB|}{ (|AB| + |A \setminus B|) + (|AB| + |B \setminus A|)} \\ &= \frac{|AB|}{|AB| + \frac{1}{2}|A \setminus B| + \frac{1}{2} |B \setminus A|} \end{align*} It’s nice to characterize the set comparison into the three mutually exclusive partitions \(AB\), \(A \setminus B\), and \(B \setminus A\). This illustrates Dice’s close relationship to the Jaccard metric, \begin{align*} Jacc(A,B) &= \frac{|AB|}{|A \cup B|} \\ &= \frac{|AB|}{|AB| + |A \setminus B| + |B \setminus

6 0.43734664 169 brendan oconnor ai-2011-05-20-Log-normal and logistic-normal terminology

7 0.42351788 74 brendan oconnor ai-2007-08-08-When’s the last time you dug through 19th century English mortuary records

8 0.41835991 201 brendan oconnor ai-2013-10-31-tanh is a rescaled logistic sigmoid function

9 0.38832924 152 brendan oconnor ai-2009-09-08-Another R flashmob today

10 0.38135794 175 brendan oconnor ai-2011-09-25-Information theory stuff

11 0.37610152 66 brendan oconnor ai-2007-06-29-Evangelicals vs. Aquarians

12 0.36241636 68 brendan oconnor ai-2007-07-08-Game outcome graphs — prisoner’s dilemma with FUN ARROWS!!!

13 0.34276247 146 brendan oconnor ai-2009-07-15-Beta conjugate explorer

14 0.31233376 91 brendan oconnor ai-2008-01-27-Graphics! Atari Breakout and religious text NLP

15 0.30937493 190 brendan oconnor ai-2013-01-07-Perplexity as branching factor; as Shannon diversity index

16 0.29729664 154 brendan oconnor ai-2009-09-10-Don’t MAWK AWK – the fastest and most elegant big data munging language!

17 0.28320116 185 brendan oconnor ai-2012-07-17-p-values, CDF’s, NLP etc.

18 0.28187642 139 brendan oconnor ai-2009-04-22-Performance comparison: key-value stores for language model counts

19 0.27568829 157 brendan oconnor ai-2009-12-31-List of probabilistic model mini-language toolkits

20 0.27471581 182 brendan oconnor ai-2012-03-13-Cosine similarity, Pearson correlation, and OLS coefficients


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(11, 0.594), (16, 0.012), (44, 0.059), (48, 0.027), (72, 0.016), (74, 0.067), (80, 0.099)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96869159 195 brendan oconnor ai-2013-04-21-What inputs do Monte Carlo algorithms need?


2 0.21383378 194 brendan oconnor ai-2013-04-16-Rise and fall of Dirichlet process clusters


3 0.21329632 54 brendan oconnor ai-2007-03-21-Statistics is big-N logic?

Introduction: I think I believe one of these things, but I’m not quite sure. Statistics is just like logic, except with uncertainty. This would be true if statistics is Bayesian statistics and you buy the Bayesian inductive logic story — add induction to propositional logic, via a conditional credibility operator, and the Cox axioms imply standard probability theory as a consequence. (That is, probability theory is logic with uncertainty. And then a good Bayesian thinks probability theory and statistics are the same.) Links: Jaynes’ explanation ; SEP article ; also Fitelson’s article . (Though there are negative results; all I can think of right now is a Halpern article on Cox; and also interesting is Halpern and Koller .) Secondly, here is another statement. Statistics is just like logic, except with a big N. This is a more data-driven view — the world is full of things and they need to be described. Logical rules can help you describe things, but you also have to deal wit

4 0.20641349 66 brendan oconnor ai-2007-06-29-Evangelicals vs. Aquarians

Introduction: Just read an interesting analysis of the simultaneous rise of the cultural left and right (“hippies and evangelicals”) through the ’50s and ’60s. Brink Lindsey argues here that they were both reactions to post-war material prosperity: On the left gathered those who were most alive to the new possibilities created by the unprecedented mass affluence of the postwar years but at the same time were hostile to the social institutions — namely, the market and the middle-class work ethic — that created those possibilities. On the right rallied those who staunchly supported the institutions that created prosperity but who shrank from the social dynamism they were unleashing. One side denounced capitalism but gobbled its fruits; the other cursed the fruits while defending the system that bore them. Both causes were quixotic, and consequently neither fully realized its ambitions. I love neat sweeping theories of history; I can’t take it overly seriously but it is so fun. Lindsey

5 0.20353809 169 brendan oconnor ai-2011-05-20-Log-normal and logistic-normal terminology


6 0.15835321 129 brendan oconnor ai-2008-12-03-Statistics vs. Machine Learning, fight!

7 0.1479826 150 brendan oconnor ai-2009-08-08-Haghighi and Klein (2009): Simple Coreference Resolution with Rich Syntactic and Semantic Features

8 0.14452437 184 brendan oconnor ai-2012-07-04-The $60,000 cat: deep belief networks make less sense for language than vision

9 0.14108954 185 brendan oconnor ai-2012-07-17-p-values, CDF’s, NLP etc.

10 0.13848771 136 brendan oconnor ai-2009-04-01-Binary classification evaluation in R via ROCR

11 0.13798389 53 brendan oconnor ai-2007-03-15-Feminists, anarchists, computational complexity, bounded rationality, nethack, and other things to do

12 0.13729046 188 brendan oconnor ai-2012-10-02-Powerset’s natural language search system

13 0.1365965 2 brendan oconnor ai-2004-11-24-addiction & 2 problems of economics

14 0.13493817 198 brendan oconnor ai-2013-08-20-Some analysis of tweet shares and “predicting” election outcomes

15 0.1323645 111 brendan oconnor ai-2008-08-16-A better Obama vs McCain poll aggregation

16 0.1310551 123 brendan oconnor ai-2008-11-12-Disease tracking with web queries and social messaging (Google, Twitter, Facebook…)

17 0.13089308 175 brendan oconnor ai-2011-09-25-Information theory stuff

18 0.13087988 203 brendan oconnor ai-2014-02-19-What the ACL-2014 review scores mean

19 0.13039248 84 brendan oconnor ai-2007-11-26-How did Freud become a respected humanist?!

20 0.12928328 179 brendan oconnor ai-2012-02-02-Histograms — matplotlib vs. R