andrew_gelman_stats andrew_gelman_stats-2012 andrew_gelman_stats-2012-1164 knowledge-graph by maker-knowledge-mining

1164 andrew gelman stats-2012-02-13-Help with this problem, win valuable prizes


meta information for this blog

Source: html

Introduction: Corrected equation. This post is by Phil. In the comments to an earlier post, I mentioned a problem I am struggling with right now. Several people mentioned having (and solving!) similar problems in the past, so this seems like a great way for me and a bunch of other blog readers to learn something. I will describe the problem, one or more of you will tell me how to solve it, and you will win… wait for it… my thanks, and the approval and admiration of your fellow blog readers, and a big thank-you in any publication that includes results from fitting the model. You can’t ask fairer than that! Here’s the problem. The goal is to estimate six parameters that characterize the leakiness (or air-tightness) of a house with an attached garage. We are specifically interested in the parameters that describe the connection between the house and the garage; this is of interest because of the effect on the air quality in the house if there are toxic chemic


Summary: the most important sentences, as ranked by the tfidf model

sentIndex sentText sentNum sentScore

1 We are specifically interested in the parameters that describe the connection between the house and the garage; this is of interest because of the effect on the air quality in the house if there are toxic chemicals (gasoline, car exhaust, etc. [sent-10, score-0.788]

2 The fan is ramped up in speed, or is sped up in stages, so as to gradually pressurize the house relative to the outdoors. [sent-13, score-0.364]

3 The flow rate through the fan is measured, so you know how much air is going into the house. [sent-14, score-0.407]

4 That amount has to leak out of the house, either to the outdoors or into the garage. [sent-15, score-0.448]

5 The researcher measures the pressure difference between the house and the outdoors, and between the garage and the outdoors; of course, from these s/he can determine the pressure difference between the house and the garage. [sent-16, score-1.744]
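
The house-garage difference is simple arithmetic on the two measured differences. A minimal sketch, assuming P_ij denotes the pressure in zone i minus the pressure in zone j (the post's notation suggests this but does not spell it out here); the function name and readings are hypothetical:

```python
def house_garage_pressure(p_ho: float, p_go: float) -> float:
    """Return P_hg from the two measured differences, in pascals.

    Assumes P_ij means pressure in zone i minus pressure in zone j, so
    (p_h - p_o) - (p_g - p_o) = p_h - p_g.
    """
    return p_ho - p_go


# Hypothetical readings, not data from the post.
print(house_garage_pressure(50.0, 30.0))  # 20.0
```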

6 All of the air that flows in (Q_{ho}, and don’t try to tell me it should be Q_{oh} because I know) has to flow out. [sent-17, score-0.573]

7 The first equation in the graphic is the conservation equation if you consider drawing a boundary around just the house: the air that flows into the house is the amount that flows directly to the outdoors plus the amount that flows into the garage. [sent-18, score-1.801]

8 P_ho and P_hg are the pressure differences between the house and the outdoors and between the house and the garage, which are measured. [sent-19, score-1.233]
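
The equations themselves were in a graphic that is not reproduced here. A sketch of the first one, assuming the standard blower-door power law in which each leakage path carries a flow C P^n; the coefficient and exponent names C_{ho}, n_{ho}, C_{hg}, n_{hg} are assumptions chosen to fit the six-parameter count, not the post's notation:

```latex
Q_{ho} = C_{ho}\, P_{ho}^{\,n_{ho}} + C_{hg}\, P_{hg}^{\,n_{hg}}
```

Here each P^n would be read as sign(P)|P|^n, so a reversed pressure difference gives a reversed flow.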

9 If a pressure is negative, then P^n is to be interpreted as sign(P)*abs(P)^n. [sent-24, score-0.362]
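
That sign convention can be written as a small helper. A minimal sketch; the name signed_power is hypothetical:

```python
import math


def signed_power(p: float, n: float) -> float:
    """Return sign(p) * |p|**n, the post's reading of P**n for negative P.

    A negative pressure difference then yields a negative flow, i.e. flow in
    the reverse direction, instead of an invalid fractional power.
    """
    return math.copysign(abs(p) ** n, p)


print(signed_power(4.0, 0.5))   # 2.0
print(signed_power(-4.0, 0.5))  # -2.0
```

The sign is handled explicitly because in Python 3 plain `p ** n` with a negative `p` and a fractional `n` returns a complex number rather than a real flow.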

10 The second equation shows what happens if you look at the flows through the entire house-garage boundary. [sent-25, score-0.333]

11 Again, everything that flows in has to flow out, either from the house to the outdoors or the garage to the outdoors. [sent-26, score-1.393]

12 The flow from the garage to outdoors introduces two additional parameters. [sent-27, score-0.863]
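
Under an assumed blower-door power-law form (hypothetical symbols C_{go} and n_{go} for the two new parameters), the whole-boundary balance, and the balance for the garage alone obtained by subtracting it from the house-boundary balance (which shares the house-to-outdoors term and has a house-to-garage term in place of the garage-to-outdoors term), would be:

```latex
\begin{aligned}
Q_{ho} &= C_{ho}\, P_{ho}^{\,n_{ho}} + C_{go}\, P_{go}^{\,n_{go}} \\
C_{hg}\, P_{hg}^{\,n_{hg}} &= C_{go}\, P_{go}^{\,n_{go}}
\end{aligned}
```

The second line says that, in steady state, the air entering the garage from the house equals the air leaving the garage to the outdoors.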

13 I have a bunch of measurements of pressures (P_ij) and flows through the blower door (Q_ho). [sent-29, score-0.619]
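
For each measured fan setting, both conservation equations give a prediction of Q_ho. A minimal sketch, assuming the power-law form C P^n for every leakage path and the sign(P)|P|^n convention; the parameter keys and all numbers are hypothetical:

```python
import math


def signed_power(p: float, n: float) -> float:
    # sign(P) * |P|**n, as the post specifies for negative pressures
    return math.copysign(abs(p) ** n, p)


def predicted_flows(params: dict, p_ho: float, p_go: float) -> tuple:
    """Predict Q_ho from the house balance and from the whole-envelope balance.

    params holds hypothetical keys C_ho, n_ho, C_hg, n_hg, C_go, n_go.
    """
    p_hg = p_ho - p_go  # assumes P_ij = pressure of i minus pressure of j
    to_outdoors = params["C_ho"] * signed_power(p_ho, params["n_ho"])
    house = to_outdoors + params["C_hg"] * signed_power(p_hg, params["n_hg"])
    envelope = to_outdoors + params["C_go"] * signed_power(p_go, params["n_go"])
    return house, envelope


# Hypothetical parameters and one hypothetical setting (pressures in Pa, flows in cfm).
params = {"C_ho": 10.0, "n_ho": 0.5, "C_hg": 5.0, "n_hg": 0.5,
          "C_go": 3.75, "n_go": 0.5}
print(predicted_flows(params, 25.0, 16.0))
```

With these made-up numbers both predictions come to 65 cfm, because the garage's inflow (5 × 3) equals its outflow (3.75 × 4). With real data the two predictions and the measured flow will disagree, and those disagreements are what a fit would use to pin down the six parameters.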

14 One of the main problems is that the pressure measurements can be systematically wrong: for example, if the outdoor pressure measurement is made on the lee side of the building, the house-outdoor pressure difference will tend to be overestimated. [sent-32, score-1.325]
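
On the lee side the outdoor reading is lower than the true outdoor pressure, which inflates P_ho as described. A short sketch of how such an offset propagates, assuming (the post does not say this) that P_ho and P_go are both referenced to the same outdoor tap; the helper name and pressures are hypothetical:

```python
def referenced_differences(p_h, p_g, p_o, outdoor_bias=0.0):
    """Return (P_ho, P_go, P_hg) when the outdoor reading is off by outdoor_bias (Pa).

    Hypothetical helper. Assumes both P_ho and P_go use the same outdoor tap,
    so the same bias enters both differences.
    """
    p_o_read = p_o + outdoor_bias
    p_ho = p_h - p_o_read
    p_go = p_g - p_o_read
    return p_ho, p_go, p_ho - p_go


# Hypothetical zone pressures (Pa, relative to an arbitrary reference).
print(referenced_differences(50.0, 30.0, 0.0))        # (50.0, 30.0, 20.0)
print(referenced_differences(50.0, 30.0, 0.0, -3.0))  # (53.0, 33.0, 20.0)
```

Under that shared-tap assumption the bias shifts P_ho and P_go by the same amount and cancels in P_hg, so one offset parameter per test could absorb it; with separate outdoor taps it would not cancel.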

15 Here’s what I want: The actual pressure difference between the house and outdoors P_{ho} is normally distributed about the measured pressure Pmeas_{ho}, with uncorrelated errors with standard deviation of 2 Pascals. [sent-39, score-1.84]

16 (In practice the blower door operator sets a desired pressure, and the blower door adjusts its flow automatically until the measured pressure matches the desired pressure). [sent-41, score-1.545]

17 The measured pressure difference between the garage and outdoors Pmeas_{go} is normally distributed about the actual pressure P_{go}, uncorrelated errors with s. [sent-42, score-1.87]

18 The error in the house-garage pressure difference is the difference between the P_{ho} error and the P_{go} error. [sent-45, score-0.8]
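
Because the house-garage error is a difference of the other two, its spread and its correlation with the P_ho error follow directly, assuming the two errors are independent and zero-mean. A minimal sketch; the P_go standard deviation is cut off in this summary, so 2 Pa is a hypothetical stand-in, and the function names are hypothetical:

```python
import math


def hg_error_sd(sd_ho: float, sd_go: float) -> float:
    """SD of e_ho - e_go for independent, zero-mean errors."""
    return math.hypot(sd_ho, sd_go)


def hg_ho_error_correlation(sd_ho: float, sd_go: float) -> float:
    """Correlation between e_hg = e_ho - e_go and e_ho.

    Under independence Cov(e_ho - e_go, e_ho) = Var(e_ho) = sd_ho**2,
    so the correlation is sd_ho**2 / (sd_ho * sd_hg) = sd_ho / sd_hg.
    """
    return sd_ho / hg_error_sd(sd_ho, sd_go)


SD_HO = 2.0  # Pa, from the post
SD_GO = 2.0  # Pa, hypothetical: the post's value is cut off in this summary
print(hg_error_sd(SD_HO, SD_GO), hg_ho_error_correlation(SD_HO, SD_GO))
```

With equal 2 Pa errors the P_hg error has an SD of about 2.83 Pa and a correlation of about 0.71 with the P_ho error. Treating P_hg as a third independent measurement would therefore overstate the information in the data, whereas computing it from latent P_ho and P_go keeps that correlation automatically.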

19 The measured flow Qmeas_{ho} is normally distributed about the actual Q_{ho} with error 20 cubic feet per minute. [sent-46, score-0.818]

20 The actual value of Q_{ho} that is predicted from the right side of either of the equations in the graphic has normal error with standard deviation 20 cubic feet per minute. [sent-47, score-0.588]
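
The error model above can be written as a joint log density for one fan setting, which an optimizer or MCMC sampler could then work with. A minimal sketch in plain Python under several assumptions: the power-law form C P^n for each leakage path (parameter keys are hypothetical), latent actual values P_ho, P_go and Q_ho per setting, a hypothetical 2 Pa SD for P_go because the post's value is cut off here, and no priors on the six parameters:

```python
import math

SD_P_HO = 2.0      # Pa: actual P_ho about measured Pmeas_ho (from the post)
SD_P_GO = 2.0      # Pa: hypothetical, the post's value is cut off here
SD_Q_MEAS = 20.0   # cfm: measured Qmeas_ho about actual Q_ho (from the post)
SD_Q_MODEL = 20.0  # cfm: actual Q_ho about each equation's prediction (from the post)


def normal_logpdf(x: float, mu: float, sd: float) -> float:
    """Log density of Normal(mu, sd) at x."""
    z = (x - mu) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def signed_power(p: float, n: float) -> float:
    """sign(P) * |P|**n, the post's reading of P**n for negative P."""
    return math.copysign(abs(p) ** n, p)


def log_density_one_point(params: dict, latent: dict, meas: dict) -> float:
    """Joint log density for one fan setting, excluding priors on params.

    params: hypothetical keys C_ho, n_ho, C_hg, n_hg, C_go, n_go
    latent: actual P_ho, P_go, Q_ho (unknowns to be estimated)
    meas:   Pmeas_ho, Pmeas_go, Qmeas_ho
    """
    p_ho, p_go, q_ho = latent["P_ho"], latent["P_go"], latent["Q_ho"]
    p_hg = p_ho - p_go  # its error is inherited from P_ho and P_go
    to_out = params["C_ho"] * signed_power(p_ho, params["n_ho"])
    q_house = to_out + params["C_hg"] * signed_power(p_hg, params["n_hg"])
    q_envelope = to_out + params["C_go"] * signed_power(p_go, params["n_go"])
    return (
        normal_logpdf(p_ho, meas["Pmeas_ho"], SD_P_HO)
        + normal_logpdf(meas["Pmeas_go"], p_go, SD_P_GO)
        + normal_logpdf(meas["Qmeas_ho"], q_ho, SD_Q_MEAS)
        + normal_logpdf(q_ho, q_house, SD_Q_MODEL)
        + normal_logpdf(q_ho, q_envelope, SD_Q_MODEL)
    )


# Hypothetical example values (Pa and cfm), not data from the post.
params = {"C_ho": 10.0, "n_ho": 0.5, "C_hg": 5.0, "n_hg": 0.5,
          "C_go": 3.75, "n_go": 0.5}
latent = {"P_ho": 25.0, "P_go": 16.0, "Q_ho": 65.0}
meas = {"Pmeas_ho": 25.0, "Pmeas_go": 16.0, "Qmeas_ho": 65.0}
print(log_density_one_point(params, latent, meas))
```

Summing this over all fan settings, with shared parameters and separate latent values per setting, would give the full likelihood. Because the normal density is symmetric, "actual about measured" and "measured about actual" contribute the same term; the distinction only matters for the generative story. The sketch covers only the random errors listed and does not model the systematic outdoor-tap bias.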


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('ho', 0.409), ('pressure', 0.362), ('outdoors', 0.355), ('garage', 0.26), ('house', 0.258), ('flow', 0.248), ('flows', 0.233), ('blower', 0.173), ('door', 0.145), ('jags', 0.143), ('difference', 0.122), ('measured', 0.11), ('equation', 0.1), ('error', 0.097), ('bugs', 0.095), ('air', 0.092), ('actual', 0.079), ('exponents', 0.079), ('normally', 0.078), ('desired', 0.075), ('distributed', 0.075), ('ij', 0.074), ('cubic', 0.069), ('measurements', 0.068), ('fan', 0.067), ('uncorrelated', 0.067), ('values', 0.065), ('feet', 0.062), ('describe', 0.056), ('amount', 0.054), ('equations', 0.054), ('go', 0.053), ('code', 0.052), ('graphic', 0.05), ('side', 0.049), ('parameters', 0.048), ('deviation', 0.047), ('thanks', 0.043), ('predicted', 0.042), ('post', 0.041), ('analog', 0.039), ('rms', 0.039), ('conservation', 0.039), ('hg', 0.039), ('sped', 0.039), ('toxic', 0.039), ('either', 0.039), ('sets', 0.039), ('mentioned', 0.038), ('chemicals', 0.037)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 1164 andrew gelman stats-2012-02-13-Help with this problem, win valuable prizes

2 0.13176902 1290 andrew gelman stats-2012-04-30-I suppose it’s too late to add Turing’s run-around-the-house-chess to the 2012 London Olympics?

Introduction: Daniel Murrell writes: I see you have a blog post about turing chess . . . I’ve seen another reference to it but am unable to find a definitive source. Do you know of a source where I could find out about the history of the idea? My reply: You mean the run-around-the-house thing? I don’t know where it comes from. It’s a well known story, if you google Turing chess run around the house you can find lots of references but I don’t know the definitive source. I can blog and see if anything comes up! I’ve never actually played the game. I’ll try it outdoors sometime, perhaps. When I last posted on the topic, we had a fun discussion, revealing that the rules are not as clear as one might think. It makes me wonder if anyone’s thought hard about it and come up with a good set of “official rules.” Any thoughts?

3 0.10518617 250 andrew gelman stats-2010-09-02-Blending results from two relatively independent multi-level models

Introduction: David Shor writes: I [Shor] am working on a Bayesian Forecasting model for the Mid-term elections that has two components: 1) A poll aggregation system with pooled and hierarchical house and design effects across every race with polls (Average Standard error for house seat level vote-share ~.055) 2) A Bafumi-style regression that applies national-swing to individual seats. (Average Standard error for house seat level vote-share ~.06) Since these two estimates are essentially independent, estimates can probably be made more accurate by pooling them together. But If a house effect changes in one draw, that changes estimates in every race. Changes in regression coefficients and National swing have a similar effect. In the face of high and possibly differing seat-to-seat correlations from each method, I’m not sure what the correct way to “blend” these models would be, either for individual or top-line seat estimates. In the mean-time, I’m just creating variance-weighted avera

4 0.10090533 1228 andrew gelman stats-2012-03-25-Continuous variables in Bayesian networks

Introduction: Antti Rasinen writes: I’m a former undergrad machine learning student and a current software engineer with a Bayesian hobby. Today my two worlds collided. I ask for some enlightenment. On your blog you’ve repeatedly advocated continuous distributions with Bayesian models. Today I read this article by Ricky Ho, who writes: The strength of Bayesian network is it is highly scalable and can learn incrementally because all we do is to count the observed variables and update the probability distribution table. Similar to Neural Network, Bayesian network expects all data to be binary, categorical variable will need to be transformed into multiple binary variable as described above. Numeric variable is generally not a good fit for Bayesian network. The last sentence seems to be at odds with what you’ve said. Sadly, I don’t have enough expertise to say which view of the world is correct. During my undergrad years our team wrote an implementation of the Junction Tree algorithm. We r

5 0.099780343 1045 andrew gelman stats-2011-12-07-Martyn Plummer’s Secret JAGS Blog

Introduction: Martyn Plummer, the creator of the open-source, C++, graphical-model compiler JAGS (aka “Just Another Gibbs Sampler”), runs a forum on the JAGS site that has a very similar feel to the mail-bag posts on this blog. Martyn answers general statistical computing questions (e.g., why slice sampling rather than Metropolis-Hastings?) and general modeling questions (e.g., why won’t my model converge with this prior?). Here’s the link to the top-level JAGS site, and to the forum: JAGS Forum, JAGS Home Page. The forum’s pretty active, with the stats page showing hundreds of views per day and very regular posts and answers. Martyn’s last post was today. Martyn also has a blog devoted to JAGS and other stats news: JAGS News Blog.

6 0.099469796 55 andrew gelman stats-2010-05-27-In Linux, use jags() to call Jags instead of using bugs() to call OpenBugs

7 0.09877336 1238 andrew gelman stats-2012-03-31-Dispute about ethics of data sharing

8 0.096725114 463 andrew gelman stats-2010-12-11-Compare p-values from privately funded medical trials to those in publicly funded research?

9 0.094818883 158 andrew gelman stats-2010-07-22-Tenants and landlords

10 0.088407941 237 andrew gelman stats-2010-08-27-Bafumi-Erikson-Wlezien predict a 50-seat loss for Democrats in November

11 0.08655183 1782 andrew gelman stats-2013-03-30-“Statistical Modeling: A Fresh Approach”

12 0.085715033 292 andrew gelman stats-2010-09-23-Doug Hibbs on the fundamentals in 2010

13 0.08456897 374 andrew gelman stats-2010-10-27-No matter how famous you are, billions of people have never heard of you.

14 0.076779217 1966 andrew gelman stats-2013-08-03-Uncertainty in parameter estimates using multilevel models

15 0.073012635 2273 andrew gelman stats-2014-03-29-References (with code) for Bayesian hierarchical (multilevel) modeling and structural equation modeling

16 0.069193572 822 andrew gelman stats-2011-07-26-Any good articles on the use of error bars?

17 0.067723759 1573 andrew gelman stats-2012-11-11-Incredibly strange spam

18 0.067490429 2311 andrew gelman stats-2014-04-29-Bayesian Uncertainty Quantification for Differential Equations!

19 0.06730286 2180 andrew gelman stats-2014-01-21-Everything I need to know about Bayesian statistics, I learned in eight schools.

20 0.066676699 1955 andrew gelman stats-2013-07-25-Bayes-respecting experimental design and other things


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.116), (1, 0.016), (2, 0.033), (3, 0.006), (4, 0.045), (5, 0.005), (6, 0.02), (7, -0.036), (8, 0.027), (9, -0.017), (10, 0.007), (11, 0.027), (12, 0.008), (13, -0.023), (14, -0.033), (15, 0.007), (16, -0.012), (17, 0.005), (18, 0.013), (19, 0.007), (20, -0.001), (21, 0.049), (22, 0.005), (23, -0.002), (24, 0.004), (25, 0.022), (26, 0.007), (27, -0.007), (28, -0.002), (29, -0.0), (30, 0.017), (31, 0.017), (32, -0.015), (33, -0.014), (34, 0.033), (35, -0.008), (36, -0.027), (37, -0.011), (38, 0.01), (39, -0.004), (40, -0.045), (41, -0.028), (42, -0.012), (43, 0.036), (44, -0.027), (45, 0.004), (46, 0.023), (47, 0.009), (48, 0.017), (49, 0.024)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95787573 1164 andrew gelman stats-2012-02-13-Help with this problem, win valuable prizes

2 0.77934778 245 andrew gelman stats-2010-08-31-Predicting marathon times

Introduction: Frank Hansen writes: I [Hansen] signed up for my first marathon race. Everyone asks me my predicted time. The predictors online seem geared to or are based off of elite runners. And anyway they seem a bit limited. So I decided to do some analysis of my own. I was going to put together a web page where people could get their race time predictions, maybe sell some ads for sports gps watches, but it might also be publishable. I have 2 requests which obviously I don’t want you to spend more than a few seconds on. 1. I was wondering if you knew of any sports performance researchers working on performance of not just elite athletes, but the full range of runners. 2. Can you suggest a way to do multilevel modeling of this. There are several natural subsets for the data but it’s not obvious what makes sense. I describe the data below. 3. Phil (the runner/co-blogger who posted about weight loss) might be interested. I collected race results for the Chicago marathon and 3

3 0.74051768 250 andrew gelman stats-2010-09-02-Blending results from two relatively independent multi-level models

Introduction: David Shor writes: I [Shor] am working on a Bayesian Forecasting model for the Mid-term elections that has two components: 1) A poll aggregation system with pooled and hierarchical house and design effects across every race with polls (Average Standard error for house seat level vote-share ~.055) 2) A Bafumi-style regression that applies national-swing to individual seats. (Average Standard error for house seat level vote-share ~.06) Since these two estimates are essentially independent, estimates can probably be made more accurate by pooling them together. But If a house effect changes in one draw, that changes estimates in every race. Changes in regression coefficients and National swing have a similar effect. In the face of high and possibly differing seat-to-seat correlations from each method, I’m not sure what the correct way to “blend” these models would be, either for individual or top-line seat estimates. In the mean-time, I’m just creating variance-weighted avera

4 0.71521926 1462 andrew gelman stats-2012-08-18-Standardizing regression inputs

Introduction: Andy Flies, Ph.D. candidate in zoology, writes: After reading your paper about scaling regression inputs by two standard deviations I found your blog post stating that you wished you had scaled by 1 sd and coded the binary inputs as -1 and 1. Here is my question: If you code the binary input as -1 and 1, do you then standardize it? This makes sense to me because the mean of the standardized input is then zero and the sd is 1, which is what the mean and sd are for all of the other standardized inputs. I know that if you code the binary input as 0 and 1 it should not be standardized. Also, I am not interested in the actual units (i.e. mg/ml) of my response variable and I would like to compare a couple of different response variables that are on different scales. Would it make sense to standardize the response variable also? My reply: No, I don’t standardize the binary input. The point of standardizing inputs is to make the coefs directly interpretable, but with binary i

5 0.70822388 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?

Introduction: A research psychologist writes in with a question that’s so long that I’ll put my answer first, then put the question itself below the fold. Here’s my reply: As I wrote in my Anova paper and in my book with Jennifer Hill, I do think that multilevel models can completely replace Anova. At the same time, I think the central idea of Anova should persist in our understanding of these models. To me the central idea of Anova is not F-tests or p-values or sums of squares, but rather the idea of predicting an outcome based on factors with discrete levels, and understanding these factors using variance components. The continuous or categorical response thing doesn’t really matter so much to me. I have no problem using a normal linear model for continuous outcomes (perhaps suitably transformed) and a logistic model for binary outcomes. I don’t want to throw away interactions just because they’re not statistically significant. I’d rather partially pool them toward zero using an inform

6 0.70019507 1807 andrew gelman stats-2013-04-17-Data problems, coding errors…what can be done?

7 0.69842917 246 andrew gelman stats-2010-08-31-Somewhat Bayesian multilevel modeling

8 0.69175577 938 andrew gelman stats-2011-10-03-Comparing prediction errors

9 0.68059093 2364 andrew gelman stats-2014-06-08-Regression and causality and variable ordering

10 0.67673707 770 andrew gelman stats-2011-06-15-Still more Mr. P in public health

11 0.67382145 1981 andrew gelman stats-2013-08-14-The robust beauty of improper linear models in decision making

12 0.66854042 273 andrew gelman stats-2010-09-13-Update on marathon statistics

13 0.66264135 1884 andrew gelman stats-2013-06-05-A story of fake-data checking being used to shoot down a flawed analysis at the Farm Credit Agency

14 0.6580081 401 andrew gelman stats-2010-11-08-Silly old chi-square!

15 0.65623033 2190 andrew gelman stats-2014-01-29-Stupid R Tricks: Random Scope

16 0.65596628 1387 andrew gelman stats-2012-06-21-Will Tiger Woods catch Jack Nicklaus? And a discussion of the virtues of using continuous data even if your goal is discrete prediction

17 0.65262949 818 andrew gelman stats-2011-07-23-Parallel JAGS RNGs

18 0.6498087 1686 andrew gelman stats-2013-01-21-Finite-population Anova calculations for models with interactions

19 0.64591539 1121 andrew gelman stats-2012-01-15-R-squared for multilevel models

20 0.64506239 1918 andrew gelman stats-2013-06-29-Going negative


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(15, 0.011), (16, 0.081), (24, 0.125), (36, 0.029), (40, 0.01), (65, 0.028), (76, 0.019), (86, 0.012), (87, 0.011), (95, 0.264), (99, 0.231)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.97103405 1820 andrew gelman stats-2013-04-23-Foundation for Open Access Statistics

Introduction: Now here’s a foundation I (Bob) can get behind: Foundation for Open Access Statistics (FOAS). Their mission is to “promote free software, open access publishing, and reproducible research in statistics.” To me, that’s like supporting motherhood and apple pie! FOAS spun out of and is partially designed to support the Journal of Statistical Software (aka JSS, aka JStatSoft). I adore JSS because it (a) is open access, (b) publishes systems papers on statistical software, (c) has fast reviewing turnaround times, and (d) is free for authors and readers. One of the next items on my to-do list is to write up the Stan modeling language and submit it to JSS. As a not-for-profit with no visible source of income, they are quite sensibly asking for donations (don’t complain: it beats $3K author fees or not being able to read papers).

2 0.9638592 404 andrew gelman stats-2010-11-09-“Much of the recent reported drop in interstate migration is a statistical artifact”

Introduction: Greg Kaplan writes: I noticed that you have blogged a little about interstate migration trends in the US, and thought that you might be interested in a new working paper of mine (joint with Sam Schulhofer-Wohl from the Minneapolis Fed) which I have attached. Briefly, we show that much of the recent reported drop in interstate migration is a statistical artifact: The Census Bureau made an undocumented change in its imputation procedures for missing data in 2006, and this change significantly reduced the number of imputed interstate moves. The change in imputation procedures, not any actual change in migration behavior, explains 90 percent of the reported decrease in interstate migration between the 2005 and 2006 Current Population Surveys, and 42 percent of the decrease between 2000 and 2010. I haven’t had a chance to give a serious look so could only make the quick suggestion to make the graphs smaller and put multiple graphs on a page. This would allow the reader to bett

3 0.9587934 832 andrew gelman stats-2011-07-31-Even a good data display can sometimes be improved

Introduction: When I first saw this graphic, I thought “boy, that’s great, sometimes the graphic practically makes itself.” Normally it’s hard to use lots of different colors to differentiate items of interest, because there’s usually not an intuitive mapping between color and item (e.g. for countries, or states, or whatever). But the colors of crayons, what could be more perfect? So this graphic seemed awesome. But, as they discovered after some experimentation at datapointed.net there is an even BETTER possibility here. Click the link to see. Crayola Crayon colors by year

4 0.95697266 1973 andrew gelman stats-2013-08-08-For chrissake, just make up an analysis already! We have a lab here to run, y’know?

Introduction: Ben Hyde sends along this : Stuck in the middle of the supplemental data, reporting the total workup for their compounds, was this gem: Emma, please insert NMR data here! where are they? and for this compound, just make up an elemental analysis . . . I’m reminded of our recent discussions of coauthorship, where I argued that I see real advantages to having multiple people taking responsibility for the result. Jay Verkuilen responded: “On the flipside of collaboration . . . is diffusion of responsibility, where everybody thinks someone else ‘has that problem’ and thus things don’t get solved.” That’s what seems to have happened (hilariously) here.

5 0.94986588 1862 andrew gelman stats-2013-05-18-uuuuuuuuuuuuugly

Introduction: Hamdan Azhar writes: I came across this graphic of vaccine-attributed decreases in mortality and was curious if you found it as unattractive and unintuitive as I did. Hope all is well with you! My reply: All’s well with me. And yes, that’s one horrible graph. It has all the problems with a bad infographic with none of the virtues. Compared to this monstrosity, the typical USA Today graph is a stunning, beautiful masterpiece. I don’t think I want to soil this webpage with the image. In fact, I don’t even want to link to it.

6 0.93553567 12 andrew gelman stats-2010-04-30-More on problems with surveys estimating deaths in war zones

7 0.93293542 876 andrew gelman stats-2011-08-28-Vaguely related to the coke-dumping story

same-blog 8 0.92587328 1164 andrew gelman stats-2012-02-13-Help with this problem, win valuable prizes

9 0.90599751 1086 andrew gelman stats-2011-12-27-The most dangerous jobs in America

10 0.89929897 519 andrew gelman stats-2011-01-16-Update on the generalized method of moments

11 0.89901388 1308 andrew gelman stats-2012-05-08-chartsnthings !

12 0.88930261 266 andrew gelman stats-2010-09-09-The future of R

13 0.88295102 627 andrew gelman stats-2011-03-24-How few respondents are reasonable to use when calculating the average by county?

14 0.86323118 2135 andrew gelman stats-2013-12-15-The UN Plot to Force Bayesianism on Unsuspecting Americans (penalized B-Spline edition)

15 0.85713953 1595 andrew gelman stats-2012-11-28-Should Harvard start admitting kids at random?

16 0.84614861 1758 andrew gelman stats-2013-03-11-Yes, the decision to try (or not) to have a child can be made rationally

17 0.84311545 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?

18 0.84191871 520 andrew gelman stats-2011-01-17-R Advertised

19 0.84037125 1646 andrew gelman stats-2013-01-01-Back when fifty years was a long time ago

20 0.83992064 1834 andrew gelman stats-2013-05-01-A graph at war with its caption. Also, how to visualize the same numbers without giving the display a misleading causal feel?