andrew_gelman_stats andrew_gelman_stats-2012 andrew_gelman_stats-2012-1509 knowledge-graph by maker-knowledge-mining

1509 andrew gelman stats-2012-09-24-Analyzing photon counts


meta info for this blog

Source: html

Introduction: Via Tom LaGatta, Boris Glebov writes: My labmates have a statistics problem. We are all experimentalists, but need input on a fine statistics point. The problem is as follows. The data set consists of photon counts measured at a series of coordinates. The number of input photons is known, but the system transmission (T) is not known and needs to be estimated. The number of transmitted photons at each coordinate follows a binomial distribution, not a Gaussian one. The spatial distribution of T values is then fit using a Levenberg-Marquardt method modified to use weights for each data point. At present, my labmates are not sure how to properly calculate and use the weights. The equations are designed for Gaussian distributions, not binomial ones, and this is a problem because in many cases the photon counts are near the edge (say, zero), where a Gaussian width is nonsensical. Could you recommend a source they could use to guide their calculations? My reply: I don’t know anything about this (although I assume I could figure out a good answer easily enough if I knew more about the model).
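Glebov’s difficulty disappears if the binomial model is used directly. A minimal sketch (hypothetical data and profile shape, not the lab’s actual setup): instead of a weighted Levenberg-Marquardt fit, fit a parametric transmission profile T(x; theta) by maximizing the binomial log-likelihood, which stays well defined even when a count is exactly zero.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # maps any real number into (0, 1)

rng = np.random.default_rng(0)

# Simulated experiment (hypothetical): a Gaussian-shaped transmission profile.
x = np.linspace(-3, 3, 41)           # measurement coordinates
n = np.full_like(x, 500, dtype=int)  # known input photons per coordinate
T_true = 0.8 * np.exp(-x**2 / 2)     # true (unknown) transmission profile
k = rng.binomial(n, T_true)          # transmitted photon counts

def T_model(x, theta):
    """Parametric transmission profile: peak (on logit scale), center, width."""
    amp, mu, sigma = theta
    return expit(amp) * np.exp(-(x - mu)**2 / (2 * sigma**2))

def neg_log_lik(theta):
    T = np.clip(T_model(x, theta), 1e-12, 1 - 1e-12)
    # Binomial log-likelihood (constant terms dropped); fine even when k = 0.
    return -np.sum(k * np.log(T) + (n - k) * np.log1p(-T))

fit = minimize(neg_log_lik, x0=[1.0, 0.0, 1.0], method="Nelder-Mead")
amp, mu, sigma = fit.x
print("peak transmission:", expit(amp), "center:", mu, "width:", sigma)
```

The Gaussian bump and the Nelder-Mead optimizer are illustrative choices; the point is that the likelihood replaces the per-point weights entirely, which is exactly the advice in the reply below about modeling where the weights came from.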


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Via Tom LaGatta, Boris Glebov writes: My labmates have a statistics problem. [sent-1, score-0.306]

2 We are all experimentalists, but need input on a fine statistics point. [sent-2, score-0.171]

3 The data set consists of photon counts measured at a series of coordinates. [sent-4, score-0.637]

4 The number of input photons is known, but the system transmission (T) is not known and needs to be estimated. [sent-5, score-0.86]

5 The number of transmitted photons at each coordinate follows a binomial distribution, not a Gaussian one. [sent-6, score-0.908]

6 The spatial distribution of T values is then fit using a Levenberg-Marquardt method modified to use weights for each data point. [sent-7, score-0.721]

7 At present, my labmates are not sure how to properly calculate and use the weights. [sent-8, score-0.573]

8 The equations are designed for Gaussian distributions, not binomial ones, and this is a problem because in many cases the photon counts are near the edge (say, zero), where a Gaussian width is nonsensical. [sent-9, score-1.192]

9 Could you recommend a source they could use to guide their calculations? [sent-10, score-0.234]

10 My reply: I don’t know anything about this (although I assume I could figure out a good answer easily enough if I knew more about the model). [sent-11, score-0.221]

11 I just thought this was worth sharing, partly because maybe some readers have a good answer and partly as an example of the wide variety of terminology used in different statistical applications. [sent-12, score-0.644]

12 My general advice in this sort of problem is to forget about the weights as weights and instead think about where they came from, and include in the model (typically in a likelihood function or a poststratification summary) the information that went into the weights. [sent-13, score-1.081]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('weights', 0.35), ('labmates', 0.306), ('photons', 0.306), ('photon', 0.279), ('gaussian', 0.254), ('binomial', 0.197), ('counts', 0.172), ('input', 0.171), ('partly', 0.149), ('transmitted', 0.139), ('experimentalists', 0.131), ('transmission', 0.121), ('coordinate', 0.118), ('known', 0.113), ('boris', 0.112), ('consists', 0.108), ('terminology', 0.104), ('width', 0.104), ('modified', 0.101), ('edge', 0.1), ('distribution', 0.096), ('properly', 0.096), ('equations', 0.095), ('spatial', 0.093), ('problem', 0.091), ('poststratification', 0.091), ('calculate', 0.09), ('answer', 0.088), ('sharing', 0.087), ('guide', 0.087), ('number', 0.082), ('designed', 0.082), ('tom', 0.081), ('use', 0.081), ('calculations', 0.08), ('wide', 0.079), ('measured', 0.078), ('forget', 0.077), ('variety', 0.075), ('near', 0.072), ('knew', 0.07), ('ones', 0.068), ('needs', 0.067), ('source', 0.066), ('follows', 0.066), ('easily', 0.063), ('distributions', 0.062), ('via', 0.061), ('likelihood', 0.061), ('function', 0.061)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999988 1509 andrew gelman stats-2012-09-24-Analyzing photon counts


2 0.20370899 2351 andrew gelman stats-2014-05-28-Bayesian nonparametric weighted sampling inference

Introduction: Yajuan Si, Natesh Pillai, and I write: It has historically been a challenge to perform Bayesian inference in a design-based survey context. The present paper develops a Bayesian model for sampling inference using inverse-probability weights. We use a hierarchical approach in which we model the distribution of the weights of the nonsampled units in the population and simultaneously include them as predictors in a nonparametric Gaussian process regression. We use simulation studies to evaluate the performance of our procedure and compare it to the classical design-based estimator. We apply our method to the Fragile Family Child Wellbeing Study. Our studies find the Bayesian nonparametric finite population estimator to be more robust than the classical design-based estimator without loss in efficiency. More work needs to be done for this to be a general practical tool—in particular, in the setup of this paper you only have survey weights and no direct poststratification variab

3 0.19519417 784 andrew gelman stats-2011-07-01-Weighting and prediction in sample surveys

Introduction: A couple years ago Rod Little was invited to write an article for the diamond jubilee of the Calcutta Statistical Association Bulletin. His article was published with discussions from Danny Pfefferman, J. N. K. Rao, Don Rubin, and myself. Here it all is. I’ll paste my discussion below, but it’s worth reading the others’ perspectives too. Especially the part in Rod’s rejoinder where he points out a mistake I made. Survey weights, like sausage and legislation, are designed and best appreciated by those who are placed a respectable distance from their manufacture. For those of us working inside the factory, vigorous discussion of methods is appreciated. I enjoyed Rod Little’s review of the connections between modeling and survey weighting and have just a few comments. I like Little’s discussion of model-based shrinkage of post-stratum averages, which, as he notes, can be seen to correspond to shrinkage of weights. I would only add one thing to his formula at the end of his

4 0.1367263 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution

Introduction: Mike McLaughlin writes: Consider the Seeds example in vol. 1 of the BUGS examples. There, a binomial likelihood has a p parameter constructed, via logit, from two covariates. What I am wondering is: Would it be legitimate, in a binomial + logit problem like this, to allow binomial p[i] to be a function of the corresponding n[i] or would that amount to using the data in the prior? In other words, in the context of the Seeds example, is r[] the only data or is n[] data as well and therefore not permissible in a prior formulation? I [McLaughlin] currently have a model with a common beta prior for all p[i] but would like to mitigate this commonality (a kind of James-Stein effect) when there are lots of observations for some i. But this seems to feed the data back into the prior. Does it really? It also occurs to me [McLaughlin] that, perhaps, a binomial likelihood is not the one to use here (not flexible enough). My reply: Strictly speaking, “n” is data, and so what you wa
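The structure McLaughlin describes can be sketched outside BUGS as well (hypothetical data; r and n named as in his question): p[i] is built from covariates through a logit link, and the binomial likelihood uses r[i] as outcomes and n[i] as known trial counts.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # inverse-logit link

rng = np.random.default_rng(1)

# Hypothetical Seeds-style data: two covariates, binomial outcomes per group.
n = rng.integers(20, 60, size=50)  # trials n[i] (known design data)
x1 = rng.normal(size=50)           # covariate 1
x2 = rng.normal(size=50)           # covariate 2
beta_true = np.array([0.3, 0.8, -0.5])
p_true = expit(beta_true[0] + beta_true[1] * x1 + beta_true[2] * x2)
r = rng.binomial(n, p_true)        # successes r[i]

def neg_log_lik(beta):
    # p[i] comes from the covariates via logit; n[i] enters only as trial counts.
    p = np.clip(expit(beta[0] + beta[1] * x1 + beta[2] * x2), 1e-12, 1 - 1e-12)
    return -np.sum(r * np.log(p) + (n - r) * np.log1p(-p))

fit = minimize(neg_log_lik, x0=np.zeros(3), method="BFGS")
print("estimated coefficients:", fit.x)
```

This is a plain maximum-likelihood sketch, not the BUGS Seeds model itself; it just makes concrete which quantities are outcomes (r) and which are conditioned-on design data (n).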

5 0.11969432 1431 andrew gelman stats-2012-07-27-Overfitting

Introduction: Ilya Esteban writes: In traditional machine learning and statistical learning techniques, you spend a lot of time selecting your input features, fiddling with model parameter values, etc., all of which leads to the problem of overfitting the data and producing overly optimistic estimates for how good the model really is. You can use techniques such as cross-validation and out-of-sample validation data to try to limit the damage, but they are imperfect solutions at best. While Bayesian models have the great advantage of not forcing you to manually select among the various weights and input features, you still often end up trying different priors and model structures (especially with hierarchical models), before coming up with a “final” model. When applying Bayesian modeling to real world data sets, how should you evaluate alternate priors and topologies for the model without falling into the same overfitting trap as you do with non-Bayesian models? If you try several different

6 0.11758547 405 andrew gelman stats-2010-11-10-Estimation from an out-of-date census

7 0.11586386 251 andrew gelman stats-2010-09-02-Interactions of predictors in a causal model

8 0.11040634 10 andrew gelman stats-2010-04-29-Alternatives to regression for social science predictions

9 0.10952535 1430 andrew gelman stats-2012-07-26-Some thoughts on survey weighting

10 0.10752365 833 andrew gelman stats-2011-07-31-Untunable Metropolis

11 0.1016076 352 andrew gelman stats-2010-10-19-Analysis of survey data: Design based models vs. hierarchical modeling?

12 0.088139959 2072 andrew gelman stats-2013-10-21-The future (and past) of statistical sciences

13 0.087615542 2139 andrew gelman stats-2013-12-19-Happy birthday

14 0.087513752 2099 andrew gelman stats-2013-11-13-“What are some situations in which the classical approach (or a naive implementation of it, based on cookbook recipes) gives worse results than a Bayesian approach, results that actually impeded the science?”

15 0.086968422 246 andrew gelman stats-2010-08-31-Somewhat Bayesian multilevel modeling

16 0.08515273 1247 andrew gelman stats-2012-04-05-More philosophy of Bayes

17 0.084396258 1309 andrew gelman stats-2012-05-09-The first version of my “inference from iterative simulation using parallel sequences” paper!

18 0.083684675 1881 andrew gelman stats-2013-06-03-Boot

19 0.08249601 1409 andrew gelman stats-2012-07-08-Is linear regression unethical in that it gives more weight to cases that are far from the average?

20 0.080824085 2176 andrew gelman stats-2014-01-19-Transformations for non-normal data


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.142), (1, 0.088), (2, 0.014), (3, 0.012), (4, 0.064), (5, 0.003), (6, 0.011), (7, -0.006), (8, 0.024), (9, -0.0), (10, 0.029), (11, -0.023), (12, -0.02), (13, -0.003), (14, -0.048), (15, -0.007), (16, 0.009), (17, -0.016), (18, 0.02), (19, -0.043), (20, 0.033), (21, 0.016), (22, -0.012), (23, 0.005), (24, -0.017), (25, 0.035), (26, -0.015), (27, -0.004), (28, 0.042), (29, 0.043), (30, 0.044), (31, 0.019), (32, -0.007), (33, 0.057), (34, -0.021), (35, 0.012), (36, -0.005), (37, 0.037), (38, -0.027), (39, 0.026), (40, 0.039), (41, -0.012), (42, 0.016), (43, -0.016), (44, 0.03), (45, 0.019), (46, 0.056), (47, 0.005), (48, 0.022), (49, 0.043)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95963597 1509 andrew gelman stats-2012-09-24-Analyzing photon counts


2 0.79799527 1527 andrew gelman stats-2012-10-10-Another reason why you can get good inferences from a bad model

Introduction: John Cook considers how people justify probability distribution assumptions: Sometimes distribution assumptions are not justified. Sometimes distributions can be derived from fundamental principles [or] . . . on theoretical grounds. For example, large samples and the central limit theorem together may justify assuming that something is normally distributed. Often the choice of distribution is somewhat arbitrary, chosen by intuition or for convenience, and then empirically shown to work well enough. Sometimes a distribution can be a bad fit and still work well, depending on what you’re asking of it. Cook continues: The last point is particularly interesting. It’s not hard to imagine that a poor fit would produce poor results. It’s surprising when a poor fit produces good results. And then he gives an example of an effective but inaccurate model used to model survival times in a clinical trial. Cook explains: The [poorly-fitting] method works well because of the q

3 0.79433912 2176 andrew gelman stats-2014-01-19-Transformations for non-normal data

Introduction: Steve Peterson writes: I recently submitted a proposal on applying a Bayesian analysis to gender comparisons on motivational constructs. I had an idea on how to improve the model I used and was hoping you could give me some feedback. The data come from a survey based on 5-point Likert scales. Different constructs are measured for each student as scores derived from averaging a student’s responses on particular subsets of survey questions. (I suppose it is not uncontroversial to treat these scores as interval measures and would be interested to hear if you have any objections.) I am comparing genders on each construct. Researchers typically use t-tests to do so. To use a Bayesian approach I applied the programs written in R and JAGS by John Kruschke for estimating the difference of means: http://www.indiana.edu/~kruschke/BEST/ An issue in that analysis is that the distributions of student scores are not normal. There was skewness in some of the distributions and not always in

4 0.77983081 2128 andrew gelman stats-2013-12-09-How to model distributions that have outliers in one direction

Introduction: Shravan writes: I have a problem very similar to the one presented in chapter 6 of BDA, the speed of light example. You use the distribution of the minimum scores from the posterior predictive distribution, show that it’s not realistic given the data, and suggest that an asymmetric contaminated normal distribution or a symmetric long-tailed distribution would be better. How does one use such a distribution? My reply: You can actually use a symmetric long-tailed distribution such as t with low degrees of freedom. One striking feature of symmetric long-tailed distributions is that a small random sample from such a distribution can have outliers on one side or the other and look asymmetric. Just to see this, try the following in R:

par(mfrow=c(3,3), mar=c(1,1,1,1))
for (i in 1:9) hist(rt(100, 2), xlab="", ylab="", main="")

You’ll see some skewed distributions. So that’s the message (which I learned from an offhand comment of Rubin, actually): if you want to model

5 0.75230783 1460 andrew gelman stats-2012-08-16-“Real data can be a pain”

Introduction: Michael McLaughlin sent me the following query with the above title. Some time ago, I [McLaughlin] was handed a dataset that needed to be modeled. It was generated as follows:

1. Random navigation errors, historically a binary mixture of normal and Laplace with a common mean, were collected by observation.

2. Sadly, these data were recorded with too few decimal places so that the resulting quantization is clearly visible in a scatterplot.

3. The quantized data were then interpolated (to an unobserved location).

The final result looks like fuzzy points (small scale jitter) at quantized intervals spanning a much larger scale (the parent mixture distribution). This fuzziness, likely ~normal or ~Laplace, results from the interpolation. Otherwise, the data would look like a discrete analogue of the normal/Laplace mixture. I would like to characterize the latent normal/Laplace mixture distribution but the quantization is “getting in the way”. When I tried MCMC on this proble

6 0.72695327 352 andrew gelman stats-2010-10-19-Analysis of survey data: Design based models vs. hierarchical modeling?

7 0.72606641 1940 andrew gelman stats-2013-07-16-A poll that throws away data???

8 0.72244406 2311 andrew gelman stats-2014-04-29-Bayesian Uncertainty Quantification for Differential Equations!

9 0.71849161 1346 andrew gelman stats-2012-05-27-Average predictive comparisons when changing a pair of variables

10 0.71737915 14 andrew gelman stats-2010-05-01-Imputing count data

11 0.7160244 1294 andrew gelman stats-2012-05-01-Modeling y = a + b + c

12 0.71306711 2342 andrew gelman stats-2014-05-21-Models with constraints

13 0.71213782 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution

14 0.71173435 1506 andrew gelman stats-2012-09-21-Building a regression model . . . with only 27 data points

15 0.71035194 569 andrew gelman stats-2011-02-12-Get the Data

16 0.70982534 938 andrew gelman stats-2011-10-03-Comparing prediction errors

17 0.70961767 996 andrew gelman stats-2011-11-07-Chi-square FAIL when many cells have small expected values

18 0.70828021 1881 andrew gelman stats-2013-06-03-Boot

19 0.6982109 782 andrew gelman stats-2011-06-29-Putting together multinomial discrete regressions by combining simple logits

20 0.69772708 1383 andrew gelman stats-2012-06-18-Hierarchical modeling as a framework for extrapolation


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(13, 0.215), (16, 0.07), (24, 0.134), (27, 0.011), (45, 0.011), (53, 0.052), (59, 0.013), (61, 0.013), (66, 0.012), (77, 0.01), (79, 0.013), (86, 0.017), (88, 0.051), (89, 0.022), (99, 0.241)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.94358993 234 andrew gelman stats-2010-08-25-Modeling constrained parameters

Introduction: Mike McLaughlin writes: In general, is there any way to do MCMC with a fixed constraint? E.g., suppose I measure the three internal angles of a triangle with errors ~dnorm(0, tau) where tau might be different for the three measurements. This would be an easy BUGS/WinBUGS/JAGS exercise but suppose, in addition, I wanted to include prior information to the effect that the three angles had to total 180 degrees exactly. Is this feasible? Could you point me to any BUGS model in which a constraint of this type is implemented? Note: Even in my own (non-hierarchical) code which tends to be component-wise, random-walk Metropolis with tuned Laplacian proposals, I cannot see how I could incorporate such a constraint. My reply: See page 508 of Bayesian Data Analysis (2nd edition). We have an example of such a model there (from this paper with Bois and Jiang).
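One common way to impose such an exact constraint (a sketch of the general trick, not the BDA example cited in the reply) is to reparameterize: run the random-walk Metropolis on two free angles and define the third as 180 minus their sum, so every proposed state satisfies the constraint by construction.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical measurements of the three angles (degrees) and their error sds.
y = np.array([59.0, 61.5, 60.1])
sd = np.array([0.5, 1.0, 0.7])

def log_post(ab):
    a, b = ab
    c = 180.0 - a - b                # constraint holds exactly by construction
    if a <= 0 or b <= 0 or c <= 0:   # flat prior on valid triangles
        return -np.inf
    angles = np.array([a, b, c])
    return -0.5 * np.sum(((y - angles) / sd) ** 2)

# Random-walk Metropolis on the two free parameters only.
ab = np.array([60.0, 60.0])
lp = log_post(ab)
samples = []
for _ in range(20000):
    prop = ab + rng.normal(scale=0.5, size=2)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        ab, lp = prop, lp_prop
    samples.append(ab.copy())
samples = np.array(samples)[5000:]   # drop burn-in
a_mean, b_mean = samples.mean(axis=0)
print("posterior means:", a_mean, b_mean, 180.0 - a_mean - b_mean)
```

Because the third angle is derived rather than sampled, every draw sums to 180 exactly; the three measurement sds (tau in McLaughlin’s notation) can differ freely, as in his question.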

2 0.92212737 800 andrew gelman stats-2011-07-13-I like lineplots

Introduction: These particular lineplots are called parallel coordinate plots.

3 0.92028093 1789 andrew gelman stats-2013-04-05-Elites have alcohol problems too!

Introduction: Speaking of Tyler Cowen, I’m puzzled by this paragraph of his: Guns, like alcohol, have many legitimate uses, and they are enjoyed by many people in a responsible manner. In both cases, there is an elite which has absolutely no problems handling the institution in question, but still there is the question of whether the nation really can have such bifurcated social norms, namely one set of standards for the elite and another set for everybody else. I don’t know anything about guns so I’ll set that part aside. My bafflement is with the claim that “there is an elite which has absolutely no problem handling [alcohol].” Is he kidding? Unless Cowen is circularly defining “an elite” as the subset of elites who don’t have an alcohol problem, I don’t buy this claim. And I actually think it’s a serious problem, that various “elites” are so sure that they have “absolutely no problem” that they do dangerous, dangerous things. Consider the notorious incident when Dick Cheney shot a

4 0.91373765 1559 andrew gelman stats-2012-11-02-The blog is back

Introduction: We had some security problem: not an actual virus or anything, but a potential leak which caused Google to blacklist us. Cord fixed us and now we’re fine. Good job, Google! Better to find the potential problem before there is any harm!

same-blog 5 0.91052294 1509 andrew gelman stats-2012-09-24-Analyzing photon counts


6 0.90885001 172 andrew gelman stats-2010-07-30-Why don’t we have peer reviewing for oral presentations?

7 0.89779967 437 andrew gelman stats-2010-11-29-The mystery of the U-shaped relationship between happiness and age

8 0.89256716 1514 andrew gelman stats-2012-09-28-AdviseStat 47% Campaign Ad

9 0.88015568 980 andrew gelman stats-2011-10-29-When people meet this guy, can they resist the temptation to ask him what he’s doing for breakfast??

10 0.8758558 971 andrew gelman stats-2011-10-25-Apply now for Earth Institute postdoctoral fellowships at Columbia University

11 0.87335706 597 andrew gelman stats-2011-03-02-RStudio – new cross-platform IDE for R

12 0.87271416 1137 andrew gelman stats-2012-01-24-Difficulties in publishing non-replications of implausible findings

13 0.87189525 1648 andrew gelman stats-2013-01-02-A important new survey of Bayesian predictive methods for model assessment, selection and comparison

14 0.86817658 345 andrew gelman stats-2010-10-15-Things we do on sabbatical instead of actually working

15 0.86587155 1942 andrew gelman stats-2013-07-17-“Stop and frisk” statistics

16 0.86378825 1852 andrew gelman stats-2013-05-12-Crime novels for economists

17 0.85499454 1916 andrew gelman stats-2013-06-27-The weirdest thing about the AJPH story

18 0.84134626 1907 andrew gelman stats-2013-06-20-Amazing retro gnu graphics!

19 0.83316517 2069 andrew gelman stats-2013-10-19-R package for effect size calculations for psychology researchers

20 0.82921624 424 andrew gelman stats-2010-11-21-Data cleaning tool!