andrew_gelman_stats andrew_gelman_stats-2010 andrew_gelman_stats-2010-215 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: It seems that every day brings a better system for exploring and sharing data on the Internet. From Iceland comes DataMarket. DataMarket is very good at visualizing individual datasets – with interaction and animation, although the “market” aspect hasn’t yet been developed, and all access is free. Here’s an example of visualizing rankings of countries competing in the World Cup: And here’s a lovely example of visualizing population pyramids: In the future, the visualizations will also include state-of-the-art models for predicting and imputing missing data, and understanding the underlying mechanisms. Other posts: InfoChimps, Future of Data Analysis
sentIndex sentText sentNum sentScore
1 It seems that every day brings a better system for exploring and sharing data on the Internet. [sent-1, score-0.807]
2 DataMarket is very good at visualizing individual datasets – with interaction and animation, although the “market” aspect hasn’t yet been developed, and all access is free. [sent-3, score-1.231]
wordName wordTfidf (topN-words)
[('visualizing', 0.482), ('datamarket', 0.43), ('infochimps', 0.194), ('animation', 0.194), ('iceland', 0.187), ('lovely', 0.187), ('future', 0.178), ('rankings', 0.169), ('imputing', 0.166), ('visualizations', 0.14), ('exploring', 0.14), ('brings', 0.135), ('sharing', 0.134), ('competing', 0.133), ('hasn', 0.132), ('datasets', 0.126), ('art', 0.123), ('interaction', 0.12), ('access', 0.12), ('market', 0.116), ('developed', 0.114), ('aspect', 0.113), ('predicting', 0.113), ('countries', 0.107), ('posts', 0.105), ('underlying', 0.097), ('missing', 0.091), ('data', 0.088), ('system', 0.081), ('population', 0.081), ('yet', 0.08), ('individual', 0.079), ('understanding', 0.078), ('include', 0.076), ('although', 0.076), ('state', 0.072), ('example', 0.071), ('day', 0.071), ('comes', 0.071), ('every', 0.067), ('models', 0.054), ('better', 0.047), ('analysis', 0.046), ('seems', 0.044), ('good', 0.035), ('also', 0.028)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 215 andrew gelman stats-2010-08-18-DataMarket
2 0.28351021 1806 andrew gelman stats-2013-04-16-My talk in Chicago this Thurs 6:30pm
Introduction: Choices in Visualizing Data. This time, it’s not at the university, it’s at a data science meetup. Here are the slides. I actually prefer the term “statistical graphics” or “visualizing quantitative information” rather than “visualizing data.” I spend a lot of time graphing inferences and fitted models, understanding my fits and doing exploratory model analysis. Graphs aren’t just for raw data. P.S. Mike Stringer, who prepared the blurb for my talk at the above link, wrote that ARM “has the most understandable description of causal inference I’ve ever read.” I appreciate the compliment, but, to be fair, Jennifer deserves most of the credit for the causal chapters of that book.
3 0.15300326 1991 andrew gelman stats-2013-08-21-BDA3 table of contents (also a new paper on visualization)
Introduction: In response to our recent posting of Amazon’s offer of Bayesian Data Analysis 3rd edition at 40% off, some people asked what was in this new edition, with more information beyond the beautiful cover image and the brief paragraph I’d posted earlier. Here’s the table of contents. The following sections have all-new material: 1.4 New introduction of BDA principles using a simple spell checking example 2.9 Weakly informative prior distributions 5.7 Weakly informative priors for hierarchical variance parameters 7.1-7.4 Predictive accuracy for model evaluation and comparison 10.6 Computing environments 11.4 Split R-hat 11.5 New measure of effective number of simulation draws 13.7 Variational inference 13.8 Expectation propagation 13.9 Other approximations 14.6 Regularization for regression models C.1 Getting started with R and Stan C.2 Fitting a hierarchical model in Stan C.4 Programming Hamiltonian Monte Carlo in R And the new chapters: 20 Basis function models 2
4 0.14546852 1637 andrew gelman stats-2012-12-24-Textbook for data visualization?
Introduction: Dave Choi writes: I’m building a course called “Exploring and visualizing data,” for Heinz college in Carnegie Mellon (public policy and information systems). Do you know any books that might be good for such a course? I’m hoping to get non-statisticians to appreciate the statistician’s point of view on this subject. I immediately thought of Bill Cleveland’s 1985 classic, The Elements of Graphing Data, but I wasn’t sure of what comes next. There are a lot of books on how to make graphics in R, but I’m not quite sure that’s the point. And I’m loath to recommend Tufte since it would be kinda scary if a student were to take all of his ideas too seriously. Any suggestions?
5 0.13896164 1286 andrew gelman stats-2012-04-28-Agreement Groups in US Senate and Dynamic Clustering
Introduction: Adrien Friggeri has a lovely visualization of US Senators’ movement between clusters: You have to click the image and play with it to appreciate it. The methodology isn’t yet published – but I can see how this could be very illuminating. The dynamic clustering aspect hasn’t been researched much – one of the notable pieces is the Blei and Lafferty dynamic topic model of Science. I did a static analysis of the US Senate back in 2005 with Wray Buntine and coauthors. Some additional visualizations and the source code are here. We did a dynamic analysis of the US Supreme Court on this blog, but there’s also a paper. My knowledge on this topic is out of date, however. Who has been doing good work in this area? I’ll organize the links. [added 4/29/12, via Edo Airoldi]: Visualizing the Evolution of Community Structures in Dynamic Social Networks by Khairi Reda et al (2011) [PDF]. [added 4/29/12, via Allen Riddell] Joint Analysis of Time-Evolving Binary Matrices an
6 0.13050668 1477 andrew gelman stats-2012-08-30-Visualizing Distributions of Covariance Matrices
7 0.12052978 2240 andrew gelman stats-2014-03-10-On deck this week: Things people sent me
8 0.1156808 1175 andrew gelman stats-2012-02-19-Factual – a new place to find data
9 0.10582734 1785 andrew gelman stats-2013-04-02-So much artistic talent
10 0.07889206 1447 andrew gelman stats-2012-08-07-Reproducible science FAIL (so far): What’s stoppin people from sharin data and code?
11 0.077331021 1426 andrew gelman stats-2012-07-23-Special effects
12 0.076768637 1341 andrew gelman stats-2012-05-24-Question 14 of my final exam for Design and Analysis of Sample Surveys
13 0.072443821 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?
14 0.072398536 1811 andrew gelman stats-2013-04-18-Psychology experiments to understand what’s going on with data graphics?
15 0.069727495 608 andrew gelman stats-2011-03-12-Single or multiple imputation?
16 0.069327191 1344 andrew gelman stats-2012-05-25-Question 15 of my final exam for Design and Analysis of Sample Surveys
17 0.067640662 1604 andrew gelman stats-2012-12-04-An epithet I can live with
18 0.065013833 1003 andrew gelman stats-2011-11-11-$
19 0.064837418 192 andrew gelman stats-2010-08-08-Turning pages into data
20 0.063958727 1964 andrew gelman stats-2013-08-01-Non-topical blogging
topicId topicWeight
[(0, 0.086), (1, 0.013), (2, 0.006), (3, 0.018), (4, 0.045), (5, -0.001), (6, -0.054), (7, -0.012), (8, 0.006), (9, 0.037), (10, -0.011), (11, 0.015), (12, 0.007), (13, 0.02), (14, 0.007), (15, 0.014), (16, 0.02), (17, -0.03), (18, 0.011), (19, -0.006), (20, -0.003), (21, 0.007), (22, -0.011), (23, 0.014), (24, -0.015), (25, 0.026), (26, 0.005), (27, 0.008), (28, 0.057), (29, 0.034), (30, -0.023), (31, -0.023), (32, 0.034), (33, 0.021), (34, 0.03), (35, 0.039), (36, 0.002), (37, 0.018), (38, 0.015), (39, 0.046), (40, -0.027), (41, -0.015), (42, -0.013), (43, -0.007), (44, 0.036), (45, 0.029), (46, -0.002), (47, -0.049), (48, 0.006), (49, -0.066)]
simIndex simValue blogId blogTitle
same-blog 1 0.93125939 215 andrew gelman stats-2010-08-18-DataMarket
2 0.7077477 275 andrew gelman stats-2010-09-14-Data visualization at the American Evaluation Association
Introduction: Stephanie Evergreen writes: Media, web design, and marketing have all created an environment where stakeholders – clients, program participants, funders – all expect high quality graphics and reporting that effectively conveys the valuable insights from evaluation work. Some in statistics and mathematics have used data visualization strategies to support more useful reporting of complex ideas. Global growing interest in improving communications has begun to take root in the evaluation field as well. But as anyone who has sat through a day’s worth of a conference or had to endure a dissertation-worthy evaluation report knows, evaluators still have a long way to go. To support the development of researchers and evaluators, some members of the American Evaluation Association are proposing a new TIG (Topical Interest Group) on Data Visualization and Reporting. If you are a member of AEA (or want to be) and you are interested in joining this TIG, contact Stephanie Evergreen.
3 0.69783437 396 andrew gelman stats-2010-11-05-Journalism in the age of data
Introduction: Journalism in the age of data is a video report including interviews with many visualization people. It’s also a great example of how citations and further information can appear alongside the video, showing us the future of video content online.
4 0.69267231 1286 andrew gelman stats-2012-04-28-Agreement Groups in US Senate and Dynamic Clustering
5 0.68919969 1853 andrew gelman stats-2013-05-12-OpenData Latinoamerica
Introduction: Miguel Paz writes: Poderomedia Foundation and PinLatam are launching OpenDataLatinoamerica.org, a regional data repository to free data and use it in hackathons and other activities by HacksHackers chapters and other organizations. We are doing this because the road to the future of news has been littered with lost datasets. A day or so after every hackathon and meeting where a group has come together to analyze, compare and understand a particular set of data, someone tries to remember where the successful files were stored. Too often, no one is certain. Therefore, with Mariano Blejman, we realized that we need a central repository where you can share the data that you have proved to be reliable: OpenData Latinoamerica, which we are leading as ICFJ Knight International Journalism Fellows. If you work in Latin America or Central America your organization can take part in OpenDataLatinoamerica.org. To apply, go to the website and answer a simple form agreeing to meet the standard
6 0.68834555 580 andrew gelman stats-2011-02-19-Weather visualization with WeatherSpark
7 0.67241406 1920 andrew gelman stats-2013-06-30-“Non-statistical” statistics tools
8 0.66561449 176 andrew gelman stats-2010-08-02-Information is good
9 0.65525985 1014 andrew gelman stats-2011-11-16-Visualizations of NYPD stop-and-frisk data
10 0.65496916 211 andrew gelman stats-2010-08-17-Deducer update
11 0.65358925 946 andrew gelman stats-2011-10-07-Analysis of Power Law of Participation
12 0.65025121 1175 andrew gelman stats-2012-02-19-Factual – a new place to find data
13 0.64880931 2307 andrew gelman stats-2014-04-27-Big Data…Big Deal? Maybe, if Used with Caution.
14 0.64334863 192 andrew gelman stats-2010-08-08-Turning pages into data
15 0.63781273 714 andrew gelman stats-2011-05-16-NYT Labs releases Openpaths, a utility for saving your iphone data
16 0.63232261 1447 andrew gelman stats-2012-08-07-Reproducible science FAIL (so far): What’s stoppin people from sharin data and code?
17 0.60565954 154 andrew gelman stats-2010-07-18-Predictive checks for hierarchical models
18 0.60264522 1543 andrew gelman stats-2012-10-21-Model complexity as a function of sample size
19 0.59532499 378 andrew gelman stats-2010-10-28-World Economic Forum Data Visualization Challenge
20 0.59483957 799 andrew gelman stats-2011-07-13-Hypothesis testing with multiple imputations
topicId topicWeight
[(9, 0.05), (16, 0.013), (24, 0.078), (29, 0.027), (47, 0.022), (57, 0.237), (63, 0.05), (65, 0.031), (77, 0.023), (86, 0.037), (95, 0.018), (98, 0.101), (99, 0.18)]
simIndex simValue blogId blogTitle
1 0.87089241 1146 andrew gelman stats-2012-01-30-Convenient page of data sources from the Washington Post
Introduction: Wayne Folta points us to this list.
2 0.86587465 1542 andrew gelman stats-2012-10-20-A statistical model for underdispersion
Introduction: We have lots of models for overdispersed count data but we rarely see underdispersed data. But now I know what example I’ll be giving when this next comes up in class. From a book review by Theo Tait: A number of shark species go in for oophagy, or uterine cannibalism. Sand tiger foetuses ‘eat each other in utero, acting out the harshest form of sibling rivalry imaginable’. Only two babies emerge, one from each of the mother shark’s uteruses: the survivors have eaten everything else. ‘A female sand tiger gives birth to a baby that’s already a metre long and an experienced killer,’ explains Demian Chapman, an expert on the subject. That’s what I call underdispersion. E(y)=2, var(y)=0. Take that, M. Poisson!
same-blog 3 0.85162759 215 andrew gelman stats-2010-08-18-DataMarket
Introduction: That’s ok, Krugman earlier slammed Galbraith. (I wonder if Krugman is as big a fan of “tough choices” now as he was in 1996.) Given Krugman’s politicization in recent years, I’m surprised he’s so dismissive of the political (rather than technical-economic) nature of Hayek’s influence. (I don’t know if he’s changed his views on Galbraith in recent years.) P.S. Greg Mankiw, in contrast, labels Galbraith and Hayek as “two of the great economists of the 20th century” and writes, “even though their most famous works were written many decades ago, they are still well worth reading today.”
5 0.76840967 1018 andrew gelman stats-2011-11-19-Tempering and modes
Introduction: Gustavo writes: Tempering should always be done in the spirit of *searching* for important modes of the distribution. If we assume that we know where they are, then there is no point to tempering. Now, tempering is actually a *bad* way of searching for important modes, it just happens to be easy to program. As always, my [Gustavo's] prescription is to FIRST find the important modes (as a pre-processing step); THEN sample from each mode independently; and FINALLY weight the samples appropriately, based on the estimated probability mass of each mode, though things might get messy if you end up jumping between modes. My reply: 1. Parallel tempering has always seemed like a great idea, but I have to admit that the only time I tried it (with Matt2 on the tree-ring example), it didn’t work for us. 2. You say you’d rather sample from the modes and then average over them. But that won’t work if you have a zillion modes. Also, if you know where the modes are, the quickest w
6 0.76755619 1101 andrew gelman stats-2012-01-05-What are the standards for reliability in experimental psychology?
7 0.7605921 861 andrew gelman stats-2011-08-19-Will Stan work well with 40×40 matrices?
8 0.75783265 891 andrew gelman stats-2011-09-05-World Bank data now online
10 0.71818602 1460 andrew gelman stats-2012-08-16-“Real data can be a pain”
11 0.7123642 1120 andrew gelman stats-2012-01-15-Fun fight over the Grover search algorithm
12 0.69540429 306 andrew gelman stats-2010-09-29-Statistics and the end of time
13 0.69149494 989 andrew gelman stats-2011-11-03-This post does not mention Wegman
14 0.69025755 1870 andrew gelman stats-2013-05-26-How to understand coefficients that reverse sign when you start controlling for things?
15 0.68988448 1036 andrew gelman stats-2011-11-30-Stan uses Nuts!
16 0.68636715 2318 andrew gelman stats-2014-05-04-Stan (& JAGS) Tutorial on Linear Mixed Models
17 0.6863327 1044 andrew gelman stats-2011-12-06-The K Foundation burns Cosma’s turkey
19 0.65746623 1108 andrew gelman stats-2012-01-09-Blogging, polemical and otherwise
20 0.65737367 1 andrew gelman stats-2010-04-22-Political Belief Networks: Socio-cognitive Heterogeneity in American Public Opinion