andrew_gelman_stats-2011-580 knowledge-graph by maker-knowledge-mining

580 andrew gelman stats-2011-02-19-Weather visualization with WeatherSpark


meta info for this blog

Source: html

Introduction: WeatherSpark: prediction and observation quantiles, historic data, multiple predictors, zoomable, draggable, colorful, wonderful. Via Jure Cuhalev.


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 WeatherSpark: prediction and observation quantiles, historic data, multiple predictors, zoomable, draggable, colorful, wonderful. Via Jure Cuhalev. [sent-1, score-1.099]
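For readers wondering how a tfidf model produces such a sentence ranking, here is a minimal sketch in Python with scikit-learn. The scoring rule (mean tf-idf weight over each sentence's terms) and all names are illustrative assumptions; the mining pipeline's actual code is not shown on this page.

```python
# Illustrative sketch only: the pipeline's real scoring rule is not
# published here, so ranking by mean tf-idf weight per sentence (with
# idf fit on the whole corpus) is an assumption.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def top_sentences(sentences, corpus, k=1):
    """Rank the sentences of one post by the mean tf-idf weight of their terms."""
    vec = TfidfVectorizer()
    vec.fit(corpus)                            # learn idf from all blog posts
    tfidf = vec.transform(sentences)           # one sparse row per sentence
    n_terms = np.maximum(tfidf.getnnz(axis=1), 1)
    scores = tfidf.sum(axis=1).A1 / n_terms    # mean weight of nonzero terms
    ranked = sorted(zip(scores, sentences), reverse=True)
    return ranked[:k]                          # [(sentScore, sentText), ...]
```

A one-sentence post, as here, trivially yields its only sentence as the summary; for longer posts this kind of rule favors sentences dense in rare, post-specific terms.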


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('jure', 0.467), ('historic', 0.395), ('quantiles', 0.385), ('colorful', 0.368), ('observation', 0.301), ('wonderful', 0.273), ('prediction', 0.225), ('predictors', 0.221), ('via', 0.206), ('multiple', 0.178), ('data', 0.063)]
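The (wordName, wordTfidf) pairs above, and the simValue column below, can be reproduced in spirit with gensim. A minimal sketch under stated assumptions: `raw_posts` is a toy stand-in for the full post collection, the tokenization is naive, and the pipeline's actual preprocessing and parameters are unknown.

```python
# Sketch under assumptions: `raw_posts` is a toy placeholder, and the real
# pipeline's tokenization and parameters are not documented on this page.
from gensim import corpora, models, similarities

raw_posts = [
    "weatherspark prediction and observation quantiles historic data multiple predictors",
    "predicting traffic from historic data with regression",
]
texts = [post.lower().split() for post in raw_posts]   # naive tokenization
dictionary = corpora.Dictionary(texts)
bow = [dictionary.doc2bow(t) for t in texts]           # bag-of-words counts
tfidf = models.TfidfModel(bow)                         # learn idf weights
corpus_tfidf = [tfidf[d] for d in bow]

# Top-weighted (wordName, wordTfidf) pairs for one post, as listed above.
top = sorted(corpus_tfidf[0], key=lambda iw: -iw[1])[:11]
print([(dictionary[i], round(w, 3)) for i, w in top])

# Cosine similarity of every post against this one: the simValue column.
index = similarities.MatrixSimilarity(corpus_tfidf, num_features=len(dictionary))
sims = index[corpus_tfidf[0]]                          # same-blog entry scores ~1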

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 580 andrew gelman stats-2011-02-19-Weather visualization with WeatherSpark

Introduction: WeatherSpark: prediction and observation quantiles, historic data, multiple predictors, zoomable, draggable, colorful, wonderful. Via Jure Cuhalev.

2 0.14107378 752 andrew gelman stats-2011-06-08-Traffic Prediction

Introduction: I always thought traffic for a particular day and time would be easy to predict from historic data with regression. Google Maps now has this feature. It would be good to actually include season, holiday, and similar information: the predictions would be better. I wonder if one can find this data easily, or if others have done this work before.

3 0.113116 1989 andrew gelman stats-2013-08-20-Correcting for multiple comparisons in a Bayesian regression model

Introduction: Joe Northrup writes: I have a question about correcting for multiple comparisons in a Bayesian regression model. I believe I understand the argument in your 2012 paper in Journal of Research on Educational Effectiveness that when you have a hierarchical model there is shrinkage of estimates towards the group-level mean and thus there is no need to add any additional penalty to correct for multiple comparisons. In my case I do not have hierarchically structured data—i.e. I have only 1 observation per group but have a categorical variable with a large number of categories. Thus, I am fitting a simple multiple regression in a Bayesian framework. Would putting a strong, mean 0, multivariate normal prior on the betas in this model accomplish the same sort of shrinkage (it seems to me that it would) and do you believe this is a valid way to address criticism of multiple comparisons in this setting? My reply: Yes, I think this makes sense. One way to address concerns of multiple com

4 0.098302886 1983 andrew gelman stats-2013-08-15-More on AIC, WAIC, etc

Introduction: Following up on our discussion from the other day, Angelika van der Linde sends along this paper from 2012 (link to journal here). And Aki pulls out this great quote from Geisser and Eddy (1979): This discussion makes clear that in the nested case this method, as Akaike’s, is not consistent; i.e., even if $M_k$ is true, it will be rejected with probability $\alpha$ as $N\to\infty$. This point is also made by Schwarz (1978). However, from the point of view of prediction, this is of no great consequence. For large numbers of observations, a prediction based on the falsely assumed $M_k$ will not differ appreciably from one based on the true $M_k$. For example, if we assert that two normal populations have different means when in fact they have the same mean, then the use of the group mean as opposed to the grand mean for predicting a future observation results in predictors which are asymptotically equivalent and whose predictive variances are $\sigma^2[1 + (1/2n)]$ and $\si

5 0.089620888 2092 andrew gelman stats-2013-11-07-Data visualizations gone beautifully wrong

Introduction: Jeremy Fox points us to this compilation of data visualizations in R that went wrong, in a way that ended up making them look like art. They are indeed wonderful.

6 0.078466132 1506 andrew gelman stats-2012-09-21-Building a regression model . . . with only 27 data points

7 0.077568993 1115 andrew gelman stats-2012-01-12-Where are the larger-than-life athletes?

8 0.075306922 608 andrew gelman stats-2011-03-12-Single or multiple imputation?

9 0.073178068 1966 andrew gelman stats-2013-08-03-Uncertainty in parameter estimates using multilevel models

10 0.070975125 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics

11 0.069677666 1016 andrew gelman stats-2011-11-17-I got 99 comparisons but multiplicity ain’t one

12 0.066300675 1954 andrew gelman stats-2013-07-24-Too Good To Be True: The Scientific Mass Production of Spurious Statistical Significance

13 0.061668061 374 andrew gelman stats-2010-10-27-No matter how famous you are, billions of people have never heard of you.

14 0.058615118 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?

15 0.058223415 502 andrew gelman stats-2011-01-04-Cash in, cash out graph

16 0.056289468 1881 andrew gelman stats-2013-06-03-Boot

17 0.055232961 77 andrew gelman stats-2010-06-09-Sof[t]

18 0.054342248 397 andrew gelman stats-2010-11-06-Multilevel quantile regression

19 0.054287232 2038 andrew gelman stats-2013-09-25-Great graphs of names

20 0.053062953 358 andrew gelman stats-2010-10-20-When Kerry Met Sally: Politics and Perceptions in the Demand for Movies


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.032), (1, 0.032), (2, 0.017), (3, -0.006), (4, 0.03), (5, -0.006), (6, -0.017), (7, -0.017), (8, -0.001), (9, 0.035), (10, 0.017), (11, 0.014), (12, 0.007), (13, -0.002), (14, -0.012), (15, 0.015), (16, -0.011), (17, -0.02), (18, 0.016), (19, -0.005), (20, -0.005), (21, 0.036), (22, 0.019), (23, 0.008), (24, -0.017), (25, -0.006), (26, -0.007), (27, -0.002), (28, 0.006), (29, 0.004), (30, 0.018), (31, 0.01), (32, 0.013), (33, 0.036), (34, 0.012), (35, 0.02), (36, 0.034), (37, 0.011), (38, 0.002), (39, -0.004), (40, -0.019), (41, 0.0), (42, 0.002), (43, -0.029), (44, -0.002), (45, 0.019), (46, -0.0), (47, 0.001), (48, -0.003), (49, -0.054)]
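The 50 (topicId, topicWeight) pairs above suggest an LSI model with 50 topics. A minimal sketch of that stage, reusing `dictionary` and `corpus_tfidf` from the tfidf sketch above; num_topics=50 is inferred from the topic list, and the rest is assumed.

```python
# Sketch: num_topics=50 is inferred from topic ids 0-49 above; the
# pipeline's actual LSI configuration is not published on this page.
from gensim import models, similarities

lsi = models.LsiModel(corpus_tfidf, id2word=dictionary, num_topics=50)
corpus_lsi = [lsi[d] for d in corpus_tfidf]   # dense (topicId, topicWeight) pairs

print(corpus_lsi[0])                          # the per-topic weights listed above

index = similarities.MatrixSimilarity(corpus_lsi, num_features=50)
sims = index[corpus_lsi[0]]                   # simValue against every post
```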

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.91531807 580 andrew gelman stats-2011-02-19-Weather visualization with WeatherSpark

Introduction: WeatherSpark: prediction and observation quantiles, historic data, multiple predictors, zoomable, draggable, colorful, wonderful. Via Jure Cuhalev.

2 0.62340975 608 andrew gelman stats-2011-03-12-Single or multiple imputation?

Introduction: Vishnu Ganglani writes: It appears that multiple imputation is the best way to impute missing data because of the more accurate quantification of variance. However, when imputing missing data for income values in national household surveys, would you recommend maintaining the multiple datasets associated with multiple imputations, or would a single imputation method suffice? I have worked on household survey projects (in Scotland) and in the past gone with suggesting single methods for ease of implementation, but with the availability of open source R software I am thinking of performing multiple imputation methodologies, but am a bit apprehensive because of the complexity and also the need to maintain multiple datasets (ease of implementation). My reply: In many applications I’ve just used a single random imputation to avoid the awkwardness of working with multiple datasets. But if there’s any concern, I’d recommend doing parallel analyses on multipl

3 0.58555448 1330 andrew gelman stats-2012-05-19-Cross-validation to check missing-data imputation

Introduction: Aureliano Crameri writes: I have questions regarding one technique you and your colleagues described in your papers: the cross-validation (Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box, with reference to Gelman, King, and Liu, 1998). I think this is the technique I need for my purpose, but I am not sure I understand it right. I want to use multiple imputation to estimate the outcome of psychotherapies based on longitudinal data. First I have to demonstrate that I am able to get unbiased estimates with the multiple imputation. The expected bias is the overestimation of the outcome of dropouts. I will test my imputation strategies by means of a series of simulations (delete values, impute, compare with the original). Due to the complexity of the statistical analyses I think I need at least 200 cases. Now I don’t have so many cases without any missing values. My data have missing values in different variables. The proportion of missing values is

4 0.58511591 948 andrew gelman stats-2011-10-10-Combining data from many sources

Introduction: Mark Grote writes: I’d like to request general feedback and references for a problem of combining disparate data sources in a regression model. We’d like to model log crop yield as a function of environmental predictors, but the observations come from many data sources and are peculiarly structured. Among the issues are: 1. Measurement precision in predictors and outcome varies widely with data sources. Some observations are in very coarse units of measurement, due to rounding or even observer guesswork. 2. There are obvious clusters of observations arising from studies in which crop yields were monitored over successive years in spatially proximate communities. Thus some variables may be constant within clusters–this is true even for log yield, probably due to rounding of similar yields. 3. Cluster size and intra-cluster association structure (temporal, spatial or both) vary widely across the dataset. My [Grote's] intuition is that we can learn about central tendency

5 0.57354063 704 andrew gelman stats-2011-05-10-Multiple imputation and multilevel analysis

Introduction: Robert Birkelbach: I am writing my Bachelor Thesis, in which I want to assess the reading competencies of German elementary school children using the PIRLS2006 data. My levels are classrooms and the individuals. However, my dependent variable is a multiply imputed (m=5) reading test. The problem I have is that I do not know whether I can just calculate 5 linear multilevel models and then average all the results (the coefficients, standard deviations, BIC, intraclass correlation, R2, t-statistics, p-values, etc.) or if I need different formulas for integrating the results of the five models into one because it is a multilevel analysis. Do you think there’s a better way of solving my problem? I would greatly appreciate it if you could help me with a problem regarding my analysis; I am quite a newbie to multilevel modeling and especially to multiple imputation. Also: Is it okay to use frequentist models when the multiple imputation was done in a Bayesian way? Would the different philosophies of sc

6 0.5727303 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?

7 0.57126892 772 andrew gelman stats-2011-06-17-Graphical tools for understanding multilevel models

8 0.55956376 799 andrew gelman stats-2011-07-13-Hypothesis testing with multiple imputations

9 0.55554092 1535 andrew gelman stats-2012-10-16-Bayesian analogue to stepwise regression?

10 0.55485827 215 andrew gelman stats-2010-08-18-DataMarket

11 0.54924822 2117 andrew gelman stats-2013-11-29-The gradual transition to replicable science

12 0.54258394 2294 andrew gelman stats-2014-04-17-If you get to the point of asking, just do it. But some difficulties do arise . . .

13 0.52471918 1870 andrew gelman stats-2013-05-26-How to understand coefficients that reverse sign when you start controlling for things?

14 0.52038801 1814 andrew gelman stats-2013-04-20-A mess with which I am comfortable

15 0.51481616 1656 andrew gelman stats-2013-01-05-Understanding regression models and regression coefficients

16 0.50941873 1815 andrew gelman stats-2013-04-20-Displaying inferences from complex models

17 0.50708801 454 andrew gelman stats-2010-12-07-Diabetes stops at the state line?

18 0.49810743 1900 andrew gelman stats-2013-06-15-Exploratory multilevel analysis when group-level variables are of importance

19 0.49398097 1989 andrew gelman stats-2013-08-20-Correcting for multiple comparisons in a Bayesian regression model

20 0.49011648 1506 andrew gelman stats-2012-09-21-Building a regression model . . . with only 27 data points


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(16, 0.137), (59, 0.499), (99, 0.126)]
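Unlike the LSI list, the LDA list is sparse: only topics with non-negligible weight survive (here 16, 59, and 99, so the model has at least 100 topics). A minimal sketch, reusing `bow` and `dictionary` from the tfidf sketch above; num_topics=100 is a guess from the highest topic id, and cosine similarity between topic mixtures is one plausible choice for simValue, not a documented fact.

```python
# Sketch: num_topics=100 is a guess from topic id 99 above. gensim's
# LdaModel drops topics below its minimum_probability cutoff, which is
# why only a few (topicId, topicWeight) pairs appear per post.
from gensim import models
from gensim.matutils import cossim

lda = models.LdaModel(bow, id2word=dictionary, num_topics=100)

doc_topics = lda[bow[0]]               # sparse pairs, e.g. [(16, 0.137), ...]

# One plausible simValue: cosine similarity between two topic mixtures.
sim = cossim(lda[bow[0]], lda[bow[1]])
```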

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.90283966 580 andrew gelman stats-2011-02-19-Weather visualization with WeatherSpark

Introduction: WeatherSpark: prediction and observation quantiles, historic data, multiple predictors, zoomable, draggable, colorful, wonderful. Via Jure Cuhalev.

2 0.85652447 403 andrew gelman stats-2010-11-09-Society for Industrial and Applied Mathematics startup-math meetup

Introduction: Chris Wiggins sends along this. It’s a meetup at Davis Auditorium, CEPSR Bldg, Columbia University, on Wed 10 Nov (that’s tomorrow! or maybe today! depending on when you’re reading this), 6-8pm.

3 0.69403827 1716 andrew gelman stats-2013-02-09-iPython Notebook

Introduction: Burak Bayramli writes: I wanted to inform you about iPython Notebook technology, which allows markup and Python code to reside in one document. Someone ported one of your examples from ARM. An iPynb file is actually a live document: it can be downloaded and rerun locally, hence a change of code in the document means a change of images and results. Graphs (as well as text output) which are generated by the code are placed inside the document automatically. No more referencing image files separately. For now, running notebooks locally requires a notebook server, but that part can live “on the cloud” as part of educational software. Viewers, such as nbviewer.ipython.org, do not even need that much, since all recent results of a notebook are embedded in the notebook itself. A lot of people are excited about this; also, out of nowhere, the Alfred P. Sloan Foundation dropped a $1.15 million grant on the developers of ipython, which provided some extra energy on the project. Cool. We’ll have to do that ex

4 0.64690411 214 andrew gelman stats-2010-08-17-Probability-processing hardware

Introduction: Lyric Semiconductor posted: For over 60 years, computers have been based on digital computing principles. Data is represented as bits (0s and 1s). Boolean logic gates perform operations on these bits. A processor steps through many of these operations serially in order to perform a function. However, today’s most interesting problems are not at all suited to this approach. Here at Lyric Semiconductor, we are redesigning information processing circuits from the ground up to natively process probabilities: from the gate circuits to the processor architecture to the programming language. As a result, many applications that today require a thousand conventional processors will soon run in just one Lyric processor, providing 1,000x efficiencies in cost, power, and size. Om Malik has some more information, also relating to the team and the business. The fundamental idea is that computing architectures work deterministically, even though the world is fundamentally stochastic.

5 0.59936243 853 andrew gelman stats-2011-08-14-Preferential admissions for children of elite colleges

Introduction: Jenny Anderson reports on a discussion of colleges’ practice of preferential admission of children of alumni: [Richard] Kahlenberg, citing research from his book “Affirmative Action for the Rich: Legacy Preferences in College Admissions,” made the case that getting into good schools matters: 12 institutions making up less than 1 percent of the U.S. population produced 42 percent of government leaders and 54 percent of corporate leaders. And being a legacy helps improve an applicant’s chances of getting in, with one study finding that being a primary legacy (the son or daughter of an undergraduate alumnus or alumna) increases one’s chance of admission by 45.1 percent. I’d call that 45 percent, but I get the basic idea. But then Jeffrey Brenzel of the Yale admissions office replied: “We turn away 80 percent of our legacies, and we feel it every day,” Mr. Brenzel said, adding that he rejected more offspring of the school’s Sterling donors than he accepted this year (

6 0.56951892 763 andrew gelman stats-2011-06-13-Inventor of Connect Four dies at 91

7 0.54278386 1599 andrew gelman stats-2012-11-30-“The scientific literature must be cleansed of everything that is fraudulent, especially if it involves the work of a leading academic”

8 0.53965664 34 andrew gelman stats-2010-05-14-Non-academic writings on literature

9 0.50712603 1000 andrew gelman stats-2011-11-10-Forecasting 2012: How much does ideology matter?

10 0.50205839 965 andrew gelman stats-2011-10-19-Web-friendly visualizations in R

11 0.48237896 1380 andrew gelman stats-2012-06-15-Coaching, teaching, and writing

12 0.47549534 229 andrew gelman stats-2010-08-24-Bizarre twisty argument about medical diagnostic tests

13 0.46138942 199 andrew gelman stats-2010-08-11-Note to semi-spammers

14 0.43723124 1764 andrew gelman stats-2013-03-15-How do I make my graphs?

15 0.43619907 1408 andrew gelman stats-2012-07-07-Not much difference between communicating to self and communicating to others

16 0.43088305 1235 andrew gelman stats-2012-03-29-I’m looking for a quadrille notebook with faint lines

17 0.42627156 517 andrew gelman stats-2011-01-14-Bayes in China update

18 0.4244248 1185 andrew gelman stats-2012-02-26-A statistician’s rants and raves

19 0.42246145 771 andrew gelman stats-2011-06-16-30 days of statistics

20 0.41533345 1415 andrew gelman stats-2012-07-13-Retractions, retractions: “left-wing enough to not care about truth if it confirms their social theories, right-wing enough to not care as long as they’re getting paid enough”