906 andrew gelman stats-2011-09-14-Another day, another stats postdoc


meta info for this blog

Source: html

Introduction: This post is from Phil Price.  I work in the Environmental Energy Technologies Division at Lawrence Berkeley National Laboratory, and I am looking for a postdoc who knows substantially more than I do about time-series modeling; in practice this probably means someone whose dissertation work involved that sort of thing.  The work involves developing models to predict and/or forecast the time-dependent energy use in buildings, given historical data and some covariates such as outdoor temperature.  Simple regression approaches (e.g. using time-of-week indicator variables, plus outdoor temperature) work fine for a lot of things, but we still have a variety of problems.  To give one example, sometimes building behavior changes — due to retrofits, or a change in occupant behavior — so that a single model won’t fit well over a long time period. We want to recognize these changes automatically .  We have many other issues besides: heteroskedasticity, need for good uncertainty estimates, abilit


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 I work in the Environmental Energy Technologies Division at Lawrence Berkeley National Laboratory, and I am looking for a postdoc who knows substantially more than I do about time-series modeling; in practice this probably means someone whose dissertation work involved that sort of thing. [sent-2, score-1.066]

2 The work involves developing models to predict and/or forecast the time-dependent energy use in buildings, given historical data and some covariates such as outdoor temperature. [sent-3, score-1.257]

3 Simple regression approaches (e.g. using time-of-week indicator variables, plus outdoor temperature) work fine for a lot of things, but we still have a variety of problems. [sent-6, score-0.857]

4 To give one example, sometimes building behavior changes — due to retrofits, or a change in occupant behavior — so that a single model won’t fit well over a long time period. [sent-7, score-0.647]

5 We have many other issues besides: heteroskedasticity, need for good uncertainty estimates, ability to partially pool information from different buildings, and so on. [sent-9, score-0.501]

6 Some knowledge of engineering, physics, or related fields would be a plus, but really I just need someone who knows about ARIMA and ARCH and all that jazz and is willing to learn the rest. [sent-10, score-0.723]
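
The "simple regression" baseline mentioned in the post (time-of-week indicator variables plus outdoor temperature) might look roughly like the sketch below. This is an illustration only, not Phil Price's actual model: the function names, the use of day-of-week slots rather than finer time-of-week bins, and the synthetic noise-free data are all assumptions made here.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_time_of_week_model(slots, temps, loads, n_slots):
    """Least-squares fit of load = slot_effect[slot] + temp_coef * temp."""
    rows = []
    for slot, temp in zip(slots, temps):
        row = [0.0] * n_slots
        row[slot] = 1.0   # indicator for this time-of-week slot
        row.append(temp)  # outdoor temperature covariate
        rows.append(row)
    p = n_slots + 1
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, loads)) for i in range(p)]
    coef = solve_linear_system(xtx, xty)
    return coef[:n_slots], coef[n_slots]


# Hypothetical synthetic data: four weeks of daily loads, no noise.
true_effects = [10.0, 12.0, 12.5, 12.0, 11.0, 6.0, 5.0]
true_temp_coef = 0.8
slots = [d % 7 for d in range(28)]
temps = [15.0 + (d * 37 % 11) for d in range(28)]
loads = [true_effects[s] + true_temp_coef * t for s, t in zip(slots, temps)]

effects, temp_coef = fit_time_of_week_model(slots, temps, loads, 7)
```

A single fit like this assumes the building behaves the same way over the whole period, which is exactly the limitation the post describes: after a retrofit or a change in occupant behavior, one set of coefficients no longer fits well.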


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('outdoor', 0.318), ('buildings', 0.265), ('energy', 0.203), ('plus', 0.191), ('jazz', 0.182), ('arch', 0.172), ('heteroskedasticity', 0.172), ('knows', 0.166), ('changes', 0.161), ('behavior', 0.156), ('technologies', 0.154), ('dissertation', 0.147), ('lawrence', 0.132), ('laboratory', 0.132), ('besides', 0.127), ('indicator', 0.126), ('partially', 0.126), ('work', 0.125), ('postdoc', 0.123), ('pool', 0.118), ('covariates', 0.118), ('environmental', 0.115), ('substantially', 0.115), ('division', 0.112), ('temperature', 0.112), ('berkeley', 0.112), ('engineering', 0.109), ('forecast', 0.108), ('automatically', 0.103), ('someone', 0.103), ('involves', 0.103), ('developing', 0.1), ('phil', 0.098), ('variety', 0.097), ('historical', 0.097), ('need', 0.092), ('physics', 0.09), ('willing', 0.09), ('recognize', 0.09), ('fields', 0.09), ('approaches', 0.09), ('building', 0.089), ('website', 0.088), ('due', 0.085), ('ability', 0.085), ('predict', 0.085), ('whose', 0.083), ('apply', 0.081), ('uncertainty', 0.08), ('involved', 0.079)]
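
The page does not say how the similarity values in the lists that follow were computed. A common choice for tf-idf vectors is cosine similarity, sketched below under that assumption. The weights for `this_post` are the top three from the list above; `other_post` and its weights are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


# Top three tf-idf weights for this post, taken from the list above.
this_post = {"outdoor": 0.318, "buildings": 0.265, "energy": 0.203}
# Hypothetical weights for some other post, for illustration only.
other_post = {"energy": 0.4, "building": 0.3, "outdoor": 0.2}

self_similarity = cosine_similarity(this_post, this_post)
cross_similarity = cosine_similarity(this_post, other_post)
```

Under this measure a post compared with itself scores 1 up to rounding error, and posts that share no weighted terms score 0.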

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 906 andrew gelman stats-2011-09-14-Another day, another stats postdoc

2 0.20604686 2364 andrew gelman stats-2014-06-08-Regression and causality and variable ordering

Introduction: Bill Harris wrote in with a question: David Hogg points out in one of his general articles on data modeling that regression assumptions require one to put the variable with the highest variance in the ‘y’ position and the variable you know best (lowest variance) in the ‘x’ position. As he points out, others speak of independent and dependent variables, as if causality determined the form of a regression formula. In a quick scan of ARM and BDA, I don’t see clear advice, but I do see the use of ‘independent’ and ‘dependent.’ I recently did a model over data in which we know the ‘effect’ pretty well (we measure it), while we know the ‘cause’ less well (it’s estimated by people who only need to get it approximately correct). A model of the form ‘cause ~ effect’ fit visually much better than one of the form ‘effect ~ cause’, but interpreting it seems challenging. For a simplistic example, let the effect be energy use in a building for cooling (E), and let the cause be outdoor ai

3 0.15635616 1289 andrew gelman stats-2012-04-29-We go to war with the data we have, not the data we want

Introduction: This post is by Phil. Psychologists perform experiments on Canadian undergraduate psychology students and draw conclusions that (they believe) apply to humans in general; they publish in Science. A drug company decides to embark on additional trials that will cost tens of millions of dollars based on the results of a careful double-blind study… whose patients are all volunteers from two hospitals. A movie studio holds 9 screenings of a new movie for volunteer viewers and, based on their survey responses, decides to spend another $8 million to re-shoot the ending.  A researcher interested in the effect of ventilation on worker performance conducts a months-long study in which ventilation levels are varied and worker performance is monitored… in a single building. In almost all fields of research, most studies are based on convenience samples, or on random samples from a larger population that is itself a convenience sample. The paragraph above gives just a few examples.  The benefit

4 0.14386088 1010 andrew gelman stats-2011-11-14-“Free energy” and economic resources

Introduction: By “free energy” I don’t mean perpetual motion machines, cars that run on water and get 200 mpg, or the latest cold-fusion hype. No, I’m referring to the term from physics. The free energy of a system is, roughly, the amount of energy that can be directly extracted from it. For example, a rock at room temperature is just full of energy—not just the energy locked in its nuclei, but basic thermal energy—but at room temperature you can’t extract any of it. To the physicists in the audience: Yes, I realize that free energy has a technical meaning in statistical mechanics and that my above definition is sloppy. Please bear with me. And, to the non-physicists: feel free to head to Wikipedia or a physics textbook for a more careful treatment. I was thinking about free energy the other day when hearing someone on the radio say something about China bailing out the E.U. I did a double-take. Huh? The E.U. is rich, China’s not so rich. How can a middle-income country bail out a

5 0.13773227 1383 andrew gelman stats-2012-06-18-Hierarchical modeling as a framework for extrapolation

Introduction: Phil recently posted on the challenge of extrapolation of inferences to new data. After telling the story of a colleague who flat-out refused to make predictions from his model of buildings to new data, Phil wrote, “This is an interesting problem because it is sort of outside the realm of statistics, and into some sort of meta-statistical area. How can you judge whether your results can be extrapolated to the ‘real world,’ if you can’t get a real-world sample to compare to?” In reply, I wrote: I agree that this is an important and general problem, but I don’t think it is outside the realm of statistics! I think that one useful statistical framework here is multilevel modeling. Suppose you are applying a procedure to J cases and want to predict case J+1 (in this case, the cases are buildings and J=52). Let the parameters be theta_1,…,theta_{J+1}, with data y_1,…,y_{J+1}, and case-level predictors X_1,…,X_{J+1}. The question is how to generalize from (theta_1,…,theta_J) to theta_{

6 0.11782354 2340 andrew gelman stats-2014-05-20-Thermodynamic Monte Carlo: Michael Betancourt’s new method for simulating from difficult distributions and evaluating normalizing constants

7 0.11016623 62 andrew gelman stats-2010-06-01-Two Postdoc Positions Available on Bayesian Hierarchical Modeling

8 0.10905684 1047 andrew gelman stats-2011-12-08-I Am Too Absolutely Heteroskedastic for This Probit Model

9 0.098403871 1807 andrew gelman stats-2013-04-17-Data problems, coding errors…what can be done?

10 0.094438016 1425 andrew gelman stats-2012-07-23-Examples of the use of hierarchical modeling to generalize to new settings

11 0.093733884 2296 andrew gelman stats-2014-04-19-Index or indicator variables

12 0.092235737 270 andrew gelman stats-2010-09-12-Comparison of forecasts for the 2010 congressional elections

13 0.090233162 1605 andrew gelman stats-2012-12-04-Write This Book

14 0.085700259 1961 andrew gelman stats-2013-07-29-Postdocs in probabilistic modeling! With David Blei! And Stan!

15 0.085210077 936 andrew gelman stats-2011-10-02-Covariate Adjustment in RCT - Model Overfitting in Multilevel Regression

16 0.084641367 1351 andrew gelman stats-2012-05-29-A Ph.D. thesis is not really a marathon

17 0.08199282 1295 andrew gelman stats-2012-05-02-Selection bias, or, How you can think the experts don’t check their models, if you simply don’t look at what the experts actually are doing

18 0.08167471 395 andrew gelman stats-2010-11-05-Consulting: how do you figure out what to charge?

19 0.07901679 1501 andrew gelman stats-2012-09-18-More studies on the economic effects of climate change

20 0.078873433 1076 andrew gelman stats-2011-12-21-Derman, Rodrik and the nature of statistical models


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.163), (1, 0.03), (2, 0.008), (3, 0.018), (4, 0.05), (5, 0.051), (6, -0.015), (7, -0.041), (8, 0.049), (9, 0.056), (10, 0.002), (11, 0.007), (12, 0.012), (13, -0.038), (14, -0.076), (15, 0.016), (16, 0.034), (17, -0.016), (18, 0.037), (19, -0.013), (20, 0.0), (21, 0.016), (22, -0.009), (23, 0.038), (24, 0.023), (25, -0.03), (26, 0.027), (27, -0.066), (28, 0.017), (29, 0.036), (30, 0.06), (31, 0.01), (32, 0.011), (33, -0.03), (34, 0.003), (35, -0.039), (36, 0.013), (37, 0.001), (38, 0.002), (39, -0.036), (40, -0.024), (41, 0.002), (42, -0.034), (43, -0.021), (44, -0.004), (45, 0.0), (46, -0.079), (47, 0.023), (48, -0.024), (49, 0.049)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95953077 906 andrew gelman stats-2011-09-14-Another day, another stats postdoc

2 0.75625116 2364 andrew gelman stats-2014-06-08-Regression and causality and variable ordering

Introduction: Bill Harris wrote in with a question: David Hogg points out in one of his general articles on data modeling that regression assumptions require one to put the variable with the highest variance in the ‘y’ position and the variable you know best (lowest variance) in the ‘x’ position. As he points out, others speak of independent and dependent variables, as if causality determined the form of a regression formula. In a quick scan of ARM and BDA, I don’t see clear advice, but I do see the use of ‘independent’ and ‘dependent.’ I recently did a model over data in which we know the ‘effect’ pretty well (we measure it), while we know the ‘cause’ less well (it’s estimated by people who only need to get it approximately correct). A model of the form ‘cause ~ effect’ fit visually much better than one of the form ‘effect ~ cause’, but interpreting it seems challenging. For a simplistic example, let the effect be energy use in a building for cooling (E), and let the cause be outdoor ai

3 0.7169255 245 andrew gelman stats-2010-08-31-Predicting marathon times

Introduction: Frank Hansen writes: I [Hansen] signed up for my first marathon race. Everyone asks me my predicted time. The predictors online seem geared to or are based off of elite runners. And anyway they seem a bit limited. So I decided to do some analysis of my own. I was going to put together a web page where people could get their race time predictions, maybe sell some ads for sports gps watches, but it might also be publishable. I have 2 requests which obviously I don’t want you to spend more than a few seconds on. 1. I was wondering if you knew of any sports performance researchers working on performance of not just elite athletes, but the full range of runners. 2. Can you suggest a way to do multilevel modeling of this. There are several natural subsets for the data but it’s not obvious what makes sense. I describe the data below. 3. Phil (the runner/co-blogger who posted about weight loss) might be interested. I collected race results for the Chicago marathon and 3

4 0.67897135 1094 andrew gelman stats-2011-12-31-Using factor analysis or principal components analysis or measurement-error models for biological measurements in archaeology?

Introduction: Greg Campbell writes: I am a Canadian archaeologist (BSc in Chemistry) researching the past human use of European Atlantic shellfish. After two decades of practice I am finally getting a MA in archaeology at Reading. I am seeing if the habitat or size of harvested mussels (Mytilus edulis) can be reconstructed from measurements of the umbo (the pointy end, and the only bit that survives well in archaeological deposits) using log-transformed measurements (or allometry; relationships between dimensions are more likely exponential than linear). Of course multivariate regressions in most statistics packages (Minitab, SPSS, SAS) assume you are trying to predict one variable from all the others (a Model I regression), and use ordinary least squares to fit the regression line. For organismal dimensions this makes little sense, since all the dimensions are (at least in theory) free to change their mutual proportions during growth. So there is no predictor and predicted, mutual variation of

5 0.67441249 969 andrew gelman stats-2011-10-22-Researching the cost-effectiveness of political lobbying organisations

Introduction: Sally Murray from Giving What We Can writes: We are an organisation that assesses different charitable (/fundable) interventions, to estimate which are the most cost-effective (measured in terms of the improvement of life for people in developing countries gained for every dollar invested). Our research guides and encourages greater donations to the most cost-effective charities we thus identify, and our members have so far pledged a total of $14m to these causes, with many hundreds more relying on our advice in a less formal way. I am specifically researching the cost-effectiveness of political lobbying organisations. We are initially focusing on organisations that lobby for ‘big win’ outcomes such as increased funding of the most cost-effective NTD treatments/ vaccine research, changes to global trade rules (potentially) and more obscure lobbies such as “Keep Antibiotics Working”. We’ve a great deal of respect for your work and the superbly rational way you go about it, and

6 0.67204988 228 andrew gelman stats-2010-08-24-A new efficient lossless compression algorithm

7 0.66318333 1261 andrew gelman stats-2012-04-12-The Naval Research Lab

8 0.66181153 1703 andrew gelman stats-2013-02-02-Interaction-based feature selection and classification for high-dimensional biological data

9 0.66116583 938 andrew gelman stats-2011-10-03-Comparing prediction errors

10 0.66098803 1196 andrew gelman stats-2012-03-04-Piss-poor monocausal social science

11 0.66039038 327 andrew gelman stats-2010-10-07-There are never 70 distinct parameters

12 0.65816981 1395 andrew gelman stats-2012-06-27-Cross-validation (What is it good for?)

13 0.65733075 770 andrew gelman stats-2011-06-15-Still more Mr. P in public health

14 0.65156811 324 andrew gelman stats-2010-10-07-Contest for developing an R package recommendation system

15 0.65094954 421 andrew gelman stats-2010-11-19-Just chaid

16 0.64133281 2311 andrew gelman stats-2014-04-29-Bayesian Uncertainty Quantification for Differential Equations!

17 0.63105661 268 andrew gelman stats-2010-09-10-Fighting Migraine with Multilevel Modeling

18 0.62836719 513 andrew gelman stats-2011-01-12-“Tied for Warmest Year On Record”

19 0.62754655 250 andrew gelman stats-2010-09-02-Blending results from two relatively independent multi-level models

20 0.62724435 397 andrew gelman stats-2010-11-06-Multilevel quantile regression


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(1, 0.146), (9, 0.048), (15, 0.011), (16, 0.05), (21, 0.01), (24, 0.121), (31, 0.013), (53, 0.031), (56, 0.035), (76, 0.017), (82, 0.028), (86, 0.033), (89, 0.022), (99, 0.344)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.97111017 664 andrew gelman stats-2011-04-16-Dilbert update: cartooning can give you the strength to open jars with your bare hands

Introduction: We were having so much fun on this thread that I couldn’t resist linking to this news item by Adrian Chen. The good news is that Scott Adams (creator of the Dilbert comic strip) “has a certified genius IQ” and that he “can open jars with [his] bare hands.” He is also “able to lift heavy objects.” Cool! In all seriousness, I knew nothing about this aspect of Adams when I wrote the earlier blog. I was just surprised (and remain surprised) that he was so impressed with Charlie Sheen for being good-looking and being able to remember his lines. At the time I thought it was just a matter of Adams being overly-influenced by his direct experience, along with some satisfaction in separating himself from the general mass of Sheen-haters out there. But now I wonder if something more is going on, that maybe he feels that he and Sheen are on the same side in a culture war. In any case, the ultimate topic of interest here is not Sheen or Adams but rather more general questions of what

2 0.97029233 973 andrew gelman stats-2011-10-26-Antman again courts controversy

Introduction: Commenter Zbicyclist links to a fun article by Howard French on biologist E. O. Wilson: Wilson announced that his new book may be his last. It is not limited to the discussion of evolutionary biology, but ranges provocatively through the humanities, as well. . . . Generation after generation of students have suffered trying to “puzzle out” what great thinkers like Socrates, Plato, and Descartes had to say on the great questions of man’s nature, Wilson said, but this was of little use, because philosophy has been based on “failed models of the brain.” This reminds me of my recent remarks on the use of crude folk-psychology models as microfoundations for social sciences. The article also discusses Wilson’s recent crusade against selfish-gene-style simplifications of human and animal nature. I’m with Wilson 100% on this one. “Two brothers or eight cousins” is a cute line but it doesn’t seem to come close to describing how species or societies work, and it’s always seemed a

3 0.96227288 525 andrew gelman stats-2011-01-19-Thiel update

Introduction: A year or so ago I discussed the reasoning of zillionaire financier Peter Thiel, who seems to believe his own hype and, worse, seems to be able to convince reporters of his infallibility as well. Apparently he “possesses a preternatural ability to spot patterns that others miss.” More recently, Felix Salmon commented on Thiel’s financial misadventures: Peter Thiel’s hedge fund, Clarium Capital, ain’t doing so well. Its assets under management are down 90% from their peak, and total returns from the high point are -65%. Thiel is smart, successful, rich, well-connected, and on top of all that his calls have actually been right . . . None of that, clearly, was enough for Clarium to make money on its trades: the fund was undone by volatility and weakness in risk management. There are a few lessons to learn here. Firstly, just because someone is a Silicon Valley gazillionaire, or any kind of successful entrepreneur for that matter, doesn’t mean they should be trusted with oth

same-blog 4 0.95915794 906 andrew gelman stats-2011-09-14-Another day, another stats postdoc

5 0.95905203 697 andrew gelman stats-2011-05-05-A statistician rereads Bill James

Introduction: Ben Lindbergh invited me to write an article for Baseball Prospectus. I first sent him this item on the differences between baseball and politics but he said it was too political for them. I then sent him this review of a book on baseball’s greatest fielders but he said they already had someone slotted to review that book. Then I sent him some reflections on the great Bill James and he published it! If anybody out there knows Bill James, please send this on to him: I have some questions at the end that I’m curious about. Here’s how it begins: I read my first Bill James book in 1984, took my first statistics class in 1985, and began graduate study in statistics the next year. Besides giving me the opportunity to study with the best applied statistician of the late 20th century (Don Rubin) and the best theoretical statistician of the early 21st (Xiao-Li Meng), going to graduate school at Harvard in 1986 gave me the opportunity to sit in a basement room one evening that

6 0.95835638 1154 andrew gelman stats-2012-02-04-“Turn a Boring Bar Graph into a 3D Masterpiece”

7 0.9537968 581 andrew gelman stats-2011-02-19-“The best living writer of thrillers”

8 0.94957477 657 andrew gelman stats-2011-04-11-Note to Dilbert: The difference between Charlie Sheen and Superman is that the Man of Steel protected Lois Lane, he didn’t bruise her

9 0.94570905 272 andrew gelman stats-2010-09-13-Ross Ihaka to R: Drop Dead

10 0.94402134 1665 andrew gelman stats-2013-01-10-That controversial claim that high genetic diversity, or low genetic diversity, is bad for the economy

11 0.94330442 1449 andrew gelman stats-2012-08-08-Gregor Mendel’s suspicious data

12 0.93550795 2190 andrew gelman stats-2014-01-29-Stupid R Tricks: Random Scope

13 0.93340456 1419 andrew gelman stats-2012-07-17-“Faith means belief in something concerning which doubt is theoretically possible.” — William James

14 0.93289477 611 andrew gelman stats-2011-03-14-As the saying goes, when they argue that you’re taking over, that’s when you know you’ve won

15 0.93219185 738 andrew gelman stats-2011-05-30-Works well versus well understood

16 0.93097913 541 andrew gelman stats-2011-01-27-Why can’t I be more like Bill James, or, The use of default and default-like models

17 0.92613763 1740 andrew gelman stats-2013-02-26-“Is machine learning a subset of statistics?”

18 0.92596895 2078 andrew gelman stats-2013-10-26-“The Bayesian approach to forensic evidence”

19 0.92581338 315 andrew gelman stats-2010-10-03-He doesn’t trust the fit . . . r=.999

20 0.92333132 148 andrew gelman stats-2010-07-15-“Gender Bias Still Exists in Modern Children’s Literature, Say Centre Researchers”