454 andrew gelman stats-2010-12-07-Diabetes stops at the state line?


meta infos for this blog

Source: html

Introduction: From Discover: Razib Khan asks: But follow the gradient from El Paso to the Illinois-Missouri border. The differences are small across state lines, but the consistent differences along the borders really don’t make [sense]. Are there state-level policies or regulations causing this? Or, are there state-level differences in measurement? This weird pattern shows up in other CDC data I’ve seen. Turns out that CDC isn’t providing data, they’re providing model. Frank Howland answered: I suspect the answer has to do with the manner in which the county estimates are produced. I went to the original data source, the CDC, and then to the relevant FAQ. There they say that the diabetes prevalence estimates come from the “CDC’s Behavioral Risk Factor Surveillance System (BRFSS) and data from the U.S. Census Bureau’s Population Estimates Program. The BRFSS is an ongoing, monthly, state-based telephone survey of the adult population. The survey provides state-specific informati


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 From Discover: Razib Khan asks: But follow the gradient from El Paso to the Illinois-Missouri border. [sent-1, score-0.093]

2 The differences are small across state lines, but the consistent differences along the borders really don’t make [sense]. [sent-2, score-0.556]

3 This weird pattern shows up in other CDC data I’ve seen. [sent-5, score-0.171]

4 Turns out that CDC isn’t providing data, they’re providing model. [sent-6, score-0.385]

5 Frank Howland answered: I suspect the answer has to do with the manner in which the county estimates are produced. [sent-7, score-0.495]

6 I went to the original data source, the CDC, and then to the relevant FAQ. [sent-8, score-0.099]

7 There they say that the diabetes prevalence estimates come from the “CDC’s Behavioral Risk Factor Surveillance System (BRFSS) and data from the U.S. Census Bureau’s Population Estimates Program. [sent-9, score-0.578]

8 The BRFSS is an ongoing, monthly, state-based telephone survey of the adult population. [sent-12, score-0.251]

9 The survey provides state-specific information” So the CDC then uses a complicated statistical procedure (“indirect model-dependent estimates” using Bayesian techniques and multilevel Poisson regression models) to go from state to county prevalence estimates. [sent-13, score-0.938]

10 My hunch is that the state level averages thereby affect the county estimates. [sent-14, score-0.741]

11 The FAQ in fact says “State is included as a county-level covariate. [sent-15, score-0.058]
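
Howland's hunch in sentences 9-11 can be made concrete with a toy calculation. The sketch below is not the CDC's procedure (the FAQ describes multilevel Poisson regression with Bayesian techniques); it swaps in a simple weighted average to show how pulling small-sample county estimates toward a state-level value could produce a jump at a state line. All numbers, and the names `pooled_estimate` and `prior_strength`, are hypothetical.

```python
def pooled_estimate(county_rate, n_county, state_rate, prior_strength):
    """Weighted average of a county's direct rate and its state's rate.

    The county's own data get weight n / (n + prior_strength), so small
    county samples are pulled strongly toward the state-level value.
    """
    weight = n_county / (n_county + prior_strength)
    return weight * county_rate + (1 - weight) * state_rate


# Two neighbouring counties on either side of a state line (made-up values).
state_rates = {"A": 0.08, "B": 0.11}  # hypothetical state-level prevalences
direct_rate = 0.095                   # same direct survey estimate in both counties
n_respondents = 40                    # small county sample
prior_strength = 200                  # assumed strength of pooling toward the state

est_a = pooled_estimate(direct_rate, n_respondents, state_rates["A"], prior_strength)
est_b = pooled_estimate(direct_rate, n_respondents, state_rates["B"], prior_strength)
print(f"county in state A: {est_a:.4f}")
print(f"county in state B: {est_b:.4f}")
```

Although the two counties have identical direct estimates, their pooled estimates differ because each inherits most of its value from its own state, which is the kind of border discontinuity the map shows.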


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('cdc', 0.55), ('brfss', 0.268), ('county', 0.25), ('faq', 0.244), ('prevalence', 0.189), ('estimates', 0.168), ('state', 0.163), ('differences', 0.145), ('providing', 0.143), ('diabetes', 0.122), ('khan', 0.122), ('el', 0.115), ('razib', 0.115), ('surveillance', 0.115), ('borders', 0.103), ('hunch', 0.1), ('data', 0.099), ('regulations', 0.096), ('gradient', 0.093), ('stamp', 0.093), ('thereby', 0.09), ('answered', 0.09), ('bureau', 0.088), ('indirect', 0.086), ('survey', 0.086), ('tricky', 0.085), ('poisson', 0.083), ('telephone', 0.083), ('adult', 0.082), ('causing', 0.082), ('ongoing', 0.082), ('behavioral', 0.079), ('monthly', 0.079), ('thank', 0.077), ('manner', 0.077), ('discover', 0.076), ('averages', 0.075), ('census', 0.074), ('frank', 0.074), ('weird', 0.072), ('techniques', 0.067), ('policies', 0.066), ('procedure', 0.063), ('affect', 0.063), ('turns', 0.063), ('complicated', 0.061), ('measurement', 0.06), ('asks', 0.059), ('provides', 0.059), ('included', 0.058)]
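
The page does not say how its tfidf weights or similarity values are computed. As a rough sketch under that caveat, the following uses one common smoothed TF-IDF weighting with cosine similarity on three made-up documents. The function names, the idf formula, and the toy texts are all assumptions, so the numbers will not match the list above.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised {word: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vec = {
            word: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0)
            for word, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({word: w / norm for word, w in vec.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(weight * v.get(word, 0.0) for word, weight in u.items())


docs = [
    "cdc county diabetes prevalence estimates by state",  # toy stand-in for this post
    "cdc reports on drunk driving deaths",                # shares only 'cdc'
    "college football and voting",                        # shares nothing
]
vecs = tfidf_vectors(docs)
sims = [cosine(vecs[0], v) for v in vecs]
for doc, sim in zip(docs, sims):
    print(f"{sim:.3f}  {doc}")
```

A word that appears in several documents (here `cdc`) gets a lower weight than words unique to one document, and a document always has similarity 1 with itself, which matches the `same-blog` row in the list that follows.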

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 454 andrew gelman stats-2010-12-07-Diabetes stops at the state line?

2 0.18389691 993 andrew gelman stats-2011-11-05-The sort of thing that gives technocratic reasoning a bad name

Introduction: 1. Freakonomics characterizes drunk driving as an example of “the human tendency to worry about rare problems that are unlikely to happen.” 2. The CDC reports, “Alcohol-impaired drivers are involved in about 1 in 3 crash deaths, resulting in nearly 11,000 deaths in 2009.” No offense to the tenured faculty at the University of Chicago, but I’m going with the CDC on this one. P.S. The Freakonomics blog deserves to be dinged another time, not just for claiming, based on implausible assumptions and making the all-else-equal fallacy that “drunk walking is 8 times more likely to result in your death than drunk driving” but for presenting this weak inference as a fact rather than as a speculation. When doing “Freakonomics,” you can be counterintuitive, or you can be sensible, but it’s hard to be both. I mean, sure, sometimes you can be. But there’s a tradeoff, and in this case, they’re choosing to push the envelope on counterintuitiveness.

3 0.148146 770 andrew gelman stats-2011-06-15-Still more Mr. P in public health

Introduction: When it rains it pours . . . John Transue writes: I saw a post on Andrew Sullivan’s blog today about life expectancy in different US counties. With a bunch of the worst counties being in Mississippi, I thought that it might be another case of analysts getting extreme values from small counties. However, the paper (see here) includes a pretty interesting methods section. This is from page 5, “Specifically, we used a mixed-effects Poisson regression with time, geospatial, and covariate components. Poisson regression fits count outcome variables, e.g., death counts, and is preferable to a logistic model because the latter is biased when an outcome is rare (occurring in less than 1% of observations).” They have downloadable data. I believe that the data are predicted values from the model. A web appendix also gives 90% CIs for their estimates. Do you think they solved the small county problem and that the worst counties really are where their spreadsheet suggests? My re

4 0.13213503 1732 andrew gelman stats-2013-02-22-Evaluating the impacts of welfare reform?

Introduction: John Pugliese writes: I was recently in a conversation with some colleagues regarding the evaluation of recent welfare reform in California. The discussion centered around what types of design might allow us to understand the impact of the changes. Experimental designs were out, as random assignment is not feasible. Our data is pre/post, and some of my colleagues believed that the best we can do under these circumstances was a descriptive study; i.e. no causal inference. All of us were concerned with economic and population changes over the pre-to-post period; i.e. over-estimating the effects in an improving economy. I thought a quasi-experimental design was possible using MLM. Briefly, my suggestion was the following: Match our post-participants to a set of pre-participants on relevant person level factors, and treat the pre/post differences as a random effect at the county level. Next, we would adjust the pre/post differences by changes in economic and populati

5 0.1088436 182 andrew gelman stats-2010-08-03-Nebraska never looked so appealing: anatomy of a zombie attack. Oops, I mean a recession.

Introduction: One can quibble about the best way to display county-level unemployment data on a map, since a small, populous county gets much less visual weight than a large, sparsely populated one. Even so, I think we can agree that this animated map by LaToya Egwuekwe is pretty cool. It says it shows the unemployment rate by county, as a function of time, but anyone with even the slightest knowledge of what happens during a zombie attack will recognize it for what it is.

6 0.10163183 2180 andrew gelman stats-2014-01-21-Everything I need to know about Bayesian statistics, I learned in eight schools.

7 0.084979475 383 andrew gelman stats-2010-10-31-Analyzing the entire population rather than a sample

8 0.078321412 1367 andrew gelman stats-2012-06-05-Question 26 of my final exam for Design and Analysis of Sample Surveys

9 0.077946536 483 andrew gelman stats-2010-12-23-Science, ideology, and human origins

10 0.076743633 1547 andrew gelman stats-2012-10-25-College football, voting, and the law of large numbers

11 0.076046392 627 andrew gelman stats-2011-03-24-How few respondents are reasonable to use when calculating the average by county?

12 0.075658612 1725 andrew gelman stats-2013-02-17-“1.7%” ha ha ha

13 0.074370682 730 andrew gelman stats-2011-05-25-Rechecking the census

14 0.074124582 352 andrew gelman stats-2010-10-19-Analysis of survey data: Design based models vs. hierarchical modeling?

15 0.072526537 1114 andrew gelman stats-2012-01-12-Controversy about average personality differences between men and women

16 0.071246393 1365 andrew gelman stats-2012-06-04-Question 25 of my final exam for Design and Analysis of Sample Surveys

17 0.0671001 405 andrew gelman stats-2010-11-10-Estimation from an out-of-date census

18 0.06594871 144 andrew gelman stats-2010-07-13-Hey! Here’s a referee report for you!

19 0.065347865 962 andrew gelman stats-2011-10-17-Death!

20 0.063091286 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.112), (1, 0.049), (2, 0.072), (3, -0.013), (4, 0.037), (5, 0.022), (6, -0.05), (7, -0.002), (8, 0.031), (9, 0.013), (10, 0.028), (11, -0.031), (12, -0.002), (13, 0.037), (14, 0.011), (15, 0.034), (16, 0.002), (17, 0.003), (18, 0.03), (19, -0.006), (20, -0.022), (21, 0.011), (22, -0.05), (23, 0.0), (24, -0.019), (25, -0.036), (26, -0.009), (27, -0.017), (28, 0.027), (29, 0.032), (30, 0.051), (31, -0.027), (32, 0.015), (33, -0.01), (34, 0.019), (35, 0.033), (36, 0.021), (37, -0.007), (38, 0.012), (39, 0.009), (40, -0.015), (41, 0.014), (42, -0.023), (43, -0.039), (44, 0.008), (45, 0.007), (46, 0.004), (47, 0.036), (48, 0.011), (49, -0.005)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.92974013 454 andrew gelman stats-2010-12-07-Diabetes stops at the state line?

2 0.79431212 962 andrew gelman stats-2011-10-17-Death!

Introduction: This graph shows the estimate that Kenny Shirley and I have of support for the death penalty by sex and race in the U.S. since 1955: We also found that capital punishment used to be more popular in the Northeast than in the South, but now it’s the other way around. Here’s the abstract to our paper: One of the longest running questions that has been regularly included in Gallup’s national public opinion poll is “Do you favor or oppose the death penalty for persons convicted of murder?” Because the death penalty is governed by state laws rather than federal laws, it is of special interest to know how public opinion varies by state, and how it has changed over time within each state. In this paper we combine dozens of national polls taken over a fifty-year span and fit a Bayesian multilevel logistic regression model to individual response data to estimate changes in state-level public opinion over time. Such a long span of polls has not been analyzed this way before, partly

3 0.78093755 352 andrew gelman stats-2010-10-19-Analysis of survey data: Design based models vs. hierarchical modeling?

Introduction: Alban Zeber writes: Suppose I have survey data from say 10 countries whereby each country collected the data based on different sampling routines – the result of this being that each country has its own weights for the data that can be used in the analyses. If I analyse the data of each country separately then I can incorporate the survey design in the analyses, e.g. in Stata one can use svyset ….. But what happens when I want to do a pooled analysis of all the data from the 10 countries: Presumably either 1. I analyse the data from each country separately (using multiple or logistic regression, …) accounting for the survey design and then combine the estimates using a meta analysis (fixed or random) OR 2. Assume that the data from each country is a simple random sample from the population, combine the data from the 10 countries and then use multilevel or hierarchical models My question is which of the methods is likely to give better estimates? Or is the

4 0.75725925 70 andrew gelman stats-2010-06-07-Mister P goes on a date

Introduction: I recently wrote something on the much-discussed OK Cupid analysis of political attitudes of a huge sample of people in their dating database. My quick comment was that their analysis was interesting, but participants on an online dating site must certainly be far from a random sample of Americans. But suppose I want to not just criticize but also think in a positive direction. OK Cupid’s database is huge, and one thing statistical methods are good at–Bayesian methods in particular–is combining a huge amount of noisy, biased data with a smaller amount of good data. This is what we did in our radon study, using a high-quality survey of 5000 houses in 125 counties to calibrate a set of crappier surveys totaling 80,000 houses in 3000 counties. How would it work for OK Cupid? We’d want to take their data and poststratify on: Age Sex Marital/family status Education Income Partisanship Ideology Political participation Religion and religious attendance State Urban/rural/

5 0.75600672 405 andrew gelman stats-2010-11-10-Estimation from an out-of-date census

Introduction: Suguru Mizunoya writes: When we estimate the number of people from a national sampling survey (such as a labor force survey) using sampling weights, don’t we obtain an underestimated number of people, if the country’s population is growing and the sampling frame is based on old census data? In countries with increasing populations, the probability of inclusion changes over time, but the weights can’t be adjusted frequently because a census takes place only once every five or ten years. I am currently working for UNICEF on a project on estimating the number of out-of-school children in developing countries. The project leader is comfortable using estimates of the number of people from DHS and other surveys. But, I am concerned that we may need to adjust the estimated number of people by the population projection, otherwise the estimates will be underestimated. I googled around on this issue, but I could not find a right article or paper on this. My reply: I don’t know if there’s a pa

6 0.75428718 2135 andrew gelman stats-2013-12-15-The UN Plot to Force Bayesianism on Unsuspecting Americans (penalized B-Spline edition)

7 0.73199254 769 andrew gelman stats-2011-06-15-Mr. P by another name . . . is still great!

8 0.7310642 2056 andrew gelman stats-2013-10-09-Mister P: What’s its secret sauce?

9 0.7185387 159 andrew gelman stats-2010-07-23-Popular governor, small state

10 0.70032012 1934 andrew gelman stats-2013-07-11-Yes, worry about generalizing from data to population. But multilevel modeling is the solution, not the problem

11 0.69927579 1511 andrew gelman stats-2012-09-26-What do statistical p-values mean when the sample = the population?

12 0.69866657 948 andrew gelman stats-2011-10-10-Combining data from many sources

13 0.69571733 383 andrew gelman stats-2010-10-31-Analyzing the entire population rather than a sample

14 0.69167411 200 andrew gelman stats-2010-08-11-Separating national and state swings in voting and public opinion, or, How I avoided blogorific embarrassment: An agony in four acts

15 0.68979269 1900 andrew gelman stats-2013-06-15-Exploratory multilevel analysis when group-level variables are of importance

16 0.68804157 2061 andrew gelman stats-2013-10-14-More on Mister P and how it does what it does

17 0.67259967 358 andrew gelman stats-2010-10-20-When Kerry Met Sally: Politics and Perceptions in the Demand for Movies

18 0.67130113 1725 andrew gelman stats-2013-02-17-“1.7%” ha ha ha

19 0.67036772 1365 andrew gelman stats-2012-06-04-Question 25 of my final exam for Design and Analysis of Sample Surveys

20 0.66791236 1294 andrew gelman stats-2012-05-01-Modeling y = a + b + c


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(3, 0.012), (9, 0.036), (14, 0.01), (16, 0.04), (21, 0.027), (24, 0.112), (41, 0.204), (42, 0.022), (45, 0.015), (52, 0.012), (53, 0.038), (57, 0.012), (84, 0.049), (86, 0.019), (89, 0.011), (95, 0.012), (96, 0.011), (99, 0.258)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.93697667 303 andrew gelman stats-2010-09-28-“Genomics” vs. genetics

Introduction: John Cook and Joseph Delaney point to an article by Yurii Aulchenko et al., who write: 54 loci showing strong statistical evidence for association to human height were described, providing us with potential genomic means of human height prediction. In a population-based study of 5748 people, we find that a 54-loci genomic profile explained 4-6% of the sex- and age-adjusted height variance, and had limited ability to discriminate tall/short people. . . . In a family-based study of 550 people, with both parents having height measurements, we find that the Galtonian mid-parental prediction method explained 40% of the sex- and age-adjusted height variance, and showed high discriminative accuracy. . . . The message is that the simple approach of predicting child’s height using a regression model given parents’ average height performs much better than the method they have based on combining 54 genes. They also find that, if you start with the prediction based on parents’ heigh

2 0.92939556 1626 andrew gelman stats-2012-12-16-The lamest, grudgingest, non-retraction retraction ever

Introduction: In politics we’re familiar with the non-apology apology (well described in Wikipedia as “a statement that has the form of an apology but does not express the expected contrition”). Here’s the scientific equivalent: the non-retraction retraction. Sanjay Srivastava points to an amusing yet barfable story of a pair of researchers who (inadvertently, I assume) made a data coding error and were eventually moved to issue a correction notice, but even then refused to fully admit their error. As Srivastava puts it, the story “ended up with Lew [Goldberg] and colleagues [Kibeom Lee and Michael Ashton] publishing a comment on an erratum – the only time I’ve ever heard of that happening in a scientific journal.” From the comment on the erratum: In their “erratum and addendum,” Anderson and Ones (this issue) explained that we had brought their attention to the “potential” of a “possible” misalignment and described the results computed from re-aligned data as being based on a “post-ho

same-blog 3 0.91439843 454 andrew gelman stats-2010-12-07-Diabetes stops at the state line?

4 0.91369104 685 andrew gelman stats-2011-04-29-Data mining and allergies

Introduction: With all this data floating around, there are some interesting analyses one can do. I came across “The Association of Tree Pollen Concentration Peaks and Allergy Medication Sales in New York City: 2003-2008” by Perry Sheffield. There they correlate pollen counts with anti-allergy medicine sales – and indeed find that two days after high pollen counts, the medicine sales are the highest. Of course, it would be interesting to play with the data to see *what* tree is actually causing the sales to increase the most. Perhaps this would help the arborists decide what trees to plant. At the moment they seem to be following a rather sexist approach to tree planting: Ogren says the city could solve the problem by planting only female trees, which don’t produce pollen like male trees do. City arborists shy away from females because many produce messy – or in the case of ginkgos, smelly – fruit that litters sidewalks. In Ogren’s opinion, that’s a mistake. He says the females only pro

5 0.90988308 1214 andrew gelman stats-2012-03-15-Of forecasts and graph theory and characterizing a statistical method by the information it uses

Introduction: Wayne Folta points me to “EigenBracket 2012: Using Graph Theory to Predict NCAA March Madness Basketball” and writes, “I [Folta] have got to believe that he’s simply re-invented a statistical method in a graph-ish context, but don’t know enough to judge.” I have not looked in detail at the method being presented here—I’m not much of a college basketball fan—but I’d like to use this as an excuse to make one of my favorite general points, which is that a good way to characterize any statistical method is by what information it uses. The basketball ranking method here uses score differentials between teams in the past season. On the plus side, that is better than simply using one-loss records (which (a) discards score differentials and (b) discards information on who played whom). On the minus side, the method appears to be discretizing the scores (thus throwing away information on the exact score differential) and doesn’t use any external information such as external ratings. A

6 0.90304548 2185 andrew gelman stats-2014-01-25-Xihong Lin on sparsity and density

7 0.90302896 516 andrew gelman stats-2011-01-14-A new idea for a science core course based entirely on computer simulation

8 0.89624929 1669 andrew gelman stats-2013-01-12-The power of the puzzlegraph

9 0.87723333 1013 andrew gelman stats-2011-11-16-My talk at Math for America on Saturday

10 0.87389874 1300 andrew gelman stats-2012-05-05-Recently in the sister blog

11 0.87372941 2204 andrew gelman stats-2014-02-09-Keli Liu and Xiao-Li Meng on Simpson’s paradox

12 0.86594808 1895 andrew gelman stats-2013-06-12-Peter Thiel is writing another book!

13 0.86381149 2311 andrew gelman stats-2014-04-29-Bayesian Uncertainty Quantification for Differential Equations!

14 0.85970753 1816 andrew gelman stats-2013-04-21-Exponential increase in the number of stat majors

15 0.84973055 778 andrew gelman stats-2011-06-24-New ideas on DIC from Martyn Plummer and Sumio Watanabe

16 0.84854245 2202 andrew gelman stats-2014-02-07-Outrage of the week

17 0.8438769 447 andrew gelman stats-2010-12-03-Reinventing the wheel, only more so.

18 0.84256554 2226 andrew gelman stats-2014-02-26-Econometrics, political science, epidemiology, etc.: Don’t model the probability of a discrete outcome, model the underlying continuous variable

19 0.84212649 2262 andrew gelman stats-2014-03-23-Win probabilities during a sporting event

20 0.83758712 1337 andrew gelman stats-2012-05-22-Question 12 of my final exam for Design and Analysis of Sample Surveys