andrew_gelman_stats andrew_gelman_stats-2010 andrew_gelman_stats-2010-465 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: I received the following press release from the Heritage Provider Network, “the largest limited Knox-Keene licensed managed care organization in California.” I have no idea what this means, but I assume it’s some sort of HMO. In any case, this looks like it could be interesting: Participants in the Health Prize challenge will be given a data set comprised of the de-identified medical records of 100,000 individuals who are members of HPN. The teams will then need to predict the hospitalization of a set percentage of those members who went to the hospital during the year following the start date, and do so with a defined accuracy rate. The winners will receive the $3 million prize. . . . the contest is designed to spur involvement by others involved in analytics, such as those involved in data mining and predictive modeling who may not currently be working in health care. “We believe that doing so will bring innovative thinking to health analytics and may allow us to solve at
sentIndex sentText sentNum sentScore
1 I received the following press release from the Heritage Provider Network, “the largest limited Knox-Keene licensed managed care organization in California.” [sent-1, score-0.925]
2 In any case, this looks like it could be interesting: Participants in the Health Prize challenge will be given a data set comprised of the de-identified medical records of 100,000 individuals who are members of HPN. [sent-3, score-0.516]
3 The teams will then need to predict the hospitalization of a set percentage of those members who went to the hospital during the year following the start date, and do so with a defined accuracy rate. [sent-4, score-1.284]
4 The contest is designed to spur involvement by others involved in analytics, such as those involved in data mining and predictive modeling who may not currently be working in health care. [sent-9, score-1.305]
5 “We believe that doing so will bring innovative thinking to health analytics and may allow us to solve at least part of the health care cost conundrum.” [sent-10, score-1.509]
6 I don’t know enough about health policy to know if this makes sense. [sent-13, score-0.328]
7 Ultimately, the goal is not to predict hospitalization, but to avoid it. [sent-14, score-0.222]
8 But maybe if you can predict it well, it could be possible to design the system a bit better. [sent-15, score-0.342]
9 The current system–in which the doctor’s office is open about 40 hours a week, and otherwise you have to go to the emergency room–is a joke. [sent-16, score-0.372]
wordName wordTfidf (topN-words)
[('hospitalization', 0.349), ('health', 0.328), ('analytics', 0.238), ('predict', 0.222), ('members', 0.167), ('licensed', 0.159), ('spur', 0.159), ('provider', 0.15), ('conundrum', 0.15), ('contest', 0.138), ('heritage', 0.138), ('involved', 0.138), ('emergency', 0.134), ('care', 0.125), ('doctor', 0.123), ('system', 0.12), ('involvement', 0.119), ('hospital', 0.115), ('managed', 0.11), ('winners', 0.11), ('mining', 0.108), ('prize', 0.104), ('largest', 0.104), ('innovative', 0.102), ('records', 0.1), ('teams', 0.098), ('date', 0.097), ('joke', 0.096), ('receive', 0.095), ('designed', 0.093), ('network', 0.091), ('set', 0.09), ('organization', 0.09), ('release', 0.087), ('room', 0.086), ('accuracy', 0.085), ('limited', 0.085), ('may', 0.084), ('following', 0.083), ('office', 0.083), ('challenge', 0.083), ('press', 0.082), ('hours', 0.08), ('participants', 0.078), ('solve', 0.078), ('million', 0.077), ('individuals', 0.076), ('bring', 0.076), ('defined', 0.075), ('otherwise', 0.075)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 465 andrew gelman stats-2010-12-13-$3M health care prediction challenge
2 0.16157377 635 andrew gelman stats-2011-03-29-Bayesian spam!
Introduction: Cool! I know Bayes has reached the big time when I receive spam like this: Bayesian networks are rapidly emerging as a new research paradigm . . . With this monthly newsletter, we’ll keep you up to date . . . Financial Analytics Webinar . . . will exhibit at this year’s INFORMS Analytics Conference in downtown Chicago. Please join us for our Bayesian networks technology workshop on April 10 . . . a powerful desktop application (Windows/Mac/Unix) for knowledge discovery, data mining, analytics, predictive modeling and simulation . . . the world’s only comprehensive software package for learning, editing and analyzing Bayesian networks . . . If you no longer wish to receive these emails, please reply to this message with “Unsubscribe” in the subject line . . . You know the saying, “It’s not real unless it’s on TV”? My saying is: It’s not real until it’s on spam.
3 0.13644919 1902 andrew gelman stats-2013-06-17-Job opening at new “big data” consulting firm!
Introduction: David Shor sends along a job announcement for Civis Analytics, which he describes as “basically Obama’s Analytics team reconstituted as a company”: Data Scientist Position Overview Data Scientists are responsible for providing the fundamental data science that powers our work – including predictive analytics, data mining, experimental design and ad-hoc statistical analysis. As a Data Scientist, you will join our Chicago-based data science team, working closely and collaboratively with analysts and engineers to identify, quantify and solve big, meaningful problems. Data Scientists will have the opportunity to dive deeply into big problems and work in a variety of areas. Civis Analytics has opportunities for applicants who are seasoned professionals, brilliant newcomers, and anywhere in between. Qualifications · Master’s degree in statistics, machine learning, computer science with heavy quant focus, a related subject, or a Bachelor’s degree and significant work ex
4 0.12886886 585 andrew gelman stats-2011-02-22-“How has your thinking changed over the past three years?”
Introduction: Harold Pollack writes: Over the past three years, we have experienced an amazing number of political, economic, and legislative trials. I suppose it’s human nature to respond to such events by doubling down on our own prior strongly-held beliefs. Health care reform/TARP/stimulus, whatever–that proves that I am even more right than I thought I was! That’s really too bad. We’ve been through some hard trials recently, in multiple senses. We’ve been tested by difficult times. We’ve also had the opportunity to see many of our beliefs tested through real-world experiments that should challenge our ideological, strategic, and policy views. Anyone active and attentive should be thinking differently about _something_ important after having witnessed so much history being made so quickly on so many different fronts. Have your own views changed on any basic issues of domestic policy? I’m not so much interested in your assessment of particular politicians or specific political tactics. Ra
5 0.12535825 820 andrew gelman stats-2011-07-25-Design of nonrandomized cluster sample study
Introduction: Rhoderick Machekano writes: I have a design question which has been bothering me and wonder if you can clear it up for me. In my line of work, we often conveniently select health centers and from those sample patients. When I am doing sample size estimation under this design, do I account for the design effect – since I expect outcomes in patients from the same health center to be correlated? Given that I didn’t randomly sample the health facilities, is my only limitation that I cannot generalize the results and make group level comparisons in the analysis? My response: You can generalize the results even if you didn’t randomly sample the health facilities. The only thing is that your generalization applies to the implicit population of facilities for which your sample is representative. You could try to move further on this by considering facility-level predictors. Regarding sample size estimation, see chapter 20.
6 0.11124372 1341 andrew gelman stats-2012-05-24-Question 14 of my final exam for Design and Analysis of Sample Surveys
8 0.10790931 178 andrew gelman stats-2010-08-03-(Partisan) visualization of health care legislation
9 0.10783904 1344 andrew gelman stats-2012-05-25-Question 15 of my final exam for Design and Analysis of Sample Surveys
10 0.10412195 1147 andrew gelman stats-2012-01-30-Statistical Murder
12 0.10056338 15 andrew gelman stats-2010-05-03-Public Opinion on Health Care Reform
13 0.095070966 1916 andrew gelman stats-2013-06-27-The weirdest thing about the AJPH story
14 0.090906844 2114 andrew gelman stats-2013-11-26-“Please make fun of this claim”
15 0.090585694 951 andrew gelman stats-2011-10-11-Data mining efforts for Obama’s campaign
16 0.090241984 284 andrew gelman stats-2010-09-18-Continuing efforts to justify false “death panels” claim
18 0.084170058 67 andrew gelman stats-2010-06-03-More on that Dartmouth health care study
19 0.08125554 769 andrew gelman stats-2011-06-15-Mr. P by another name . . . is still great!
20 0.080229357 411 andrew gelman stats-2010-11-13-Ethical concerns in medical trials
topicId topicWeight
[(0, 0.132), (1, -0.037), (2, 0.018), (3, -0.019), (4, 0.031), (5, 0.05), (6, -0.02), (7, -0.025), (8, -0.034), (9, 0.022), (10, -0.04), (11, -0.028), (12, 0.013), (13, 0.029), (14, -0.059), (15, 0.028), (16, 0.056), (17, -0.023), (18, 0.047), (19, 0.028), (20, -0.002), (21, 0.05), (22, 0.003), (23, 0.013), (24, -0.022), (25, 0.007), (26, -0.002), (27, 0.015), (28, 0.027), (29, -0.008), (30, -0.051), (31, -0.031), (32, -0.011), (33, 0.018), (34, 0.018), (35, -0.03), (36, -0.016), (37, 0.043), (38, 0.016), (39, 0.001), (40, 0.001), (41, -0.053), (42, -0.052), (43, 0.009), (44, 0.013), (45, 0.03), (46, 0.02), (47, -0.021), (48, -0.045), (49, -0.015)]
simIndex simValue blogId blogTitle
same-blog 1 0.96502066 465 andrew gelman stats-2010-12-13-$3M health care prediction challenge
2 0.72763121 67 andrew gelman stats-2010-06-03-More on that Dartmouth health care study
Introduction: Hank Aaron at the Brookings Institution, who knows a lot more about policy than I do, had some interesting comments on the recent New York Times article about problems with the Dartmouth health care atlas, which I discussed a few hours ago. Aaron writes that much of the criticism in that newspaper article was off-base, but that there are real difficulties in translating the Dartmouth results (finding little relation between spending and quality of care) to cost savings in the real world. Aaron writes: The Dartmouth research, showing huge variation in the use of various medical procedures and large variations in per patient spending under Medicare, has been a revelation and a useful one. There is no way to explain such variation on medical grounds and it is problematic. But readers, including my former colleague Orszag, have taken an oversimplistic view of what the numbers mean and what to do about them. There are three really big problems with the common interpreta
3 0.72531176 1147 andrew gelman stats-2012-01-30-Statistical Murder
Introduction: Robert Zubrin writes in “How Much Is an Astronaut’s Life Worth?” (Reason, Feb 2012): …policy analyst John D. Graham and his colleagues at the Harvard Center for Risk Analysis found in 1997 that the median cost for lifesaving expenditures and regulations by the U.S. government in the health care, residential, transportation, and occupational areas ranges from about $1 million to $3 million spent per life saved in today’s dollars. The only marked exception to this pattern occurs in the area of environmental health protection (such as the Superfund program) which costs about $200 million per life saved. Graham and his colleagues call the latter kind of inefficiency “statistical murder,” since thousands of additional lives could be saved each year if the money were used more cost-effectively. To avoid such deadly waste, the Department of Transportation has a policy of rejecting any proposed safety expenditure that costs more than $3
4 0.69845986 645 andrew gelman stats-2011-04-04-Do you have any idea what you’re talking about?
Introduction: We all have opinions about the federal budget and how it should be spent. Infrequently, those opinions are informed by some knowledge about where the money actually goes. It turns out that most people don’t have a clue. What about you? Here, take this poll/quiz and then compare your answers to (1) what other people said, in a CNN poll that asked about these same items and (2) compare your answers to the real answers. Quiz is below the fold. The questions below are from a CNN poll. ======== Think about all the money that the federal government spent last year. I’m going to name a few federal programs and for each one, I’d like you to estimate what percentage of the federal government’s budget last year was spent on each of those programs. Medicare — the federal health program for the elderly Medicaid — the federal health program for the poor Social Security Military spending by the Department of Defense Aid to foreign countries for international development
5 0.69825035 411 andrew gelman stats-2010-11-13-Ethical concerns in medical trials
Introduction: I just read this article on the treatment of medical volunteers, written by doctor and bioethicist Carl Elliott. As a statistician who has done a small amount of consulting for pharmaceutical companies, I have a slightly different perspective. As a doctor, Elliott focuses on individual patients, whereas, as a statistician, I’ve been trained to focus on the goal of accurately estimating treatment effects. I’ll go through Elliott’s article and give my reactions. Elliott: In Miami, investigative reporters for Bloomberg Markets magazine discovered that a contract research organisation called SFBC International was testing drugs on undocumented immigrants in a rundown motel; since that report, the motel has been demolished for fire and safety violations. . . . SFBC had recently been named one of the best small businesses in America by Forbes magazine. The Holiday Inn testing facility was the largest in North America, and had been operating for nearly ten years before inspecto
6 0.69360149 178 andrew gelman stats-2010-08-03-(Partisan) visualization of health care legislation
9 0.66396964 1127 andrew gelman stats-2012-01-18-The Fixie Bike Index
10 0.66043049 284 andrew gelman stats-2010-09-18-Continuing efforts to justify false “death panels” claim
11 0.65471506 322 andrew gelman stats-2010-10-06-More on the differences between drugs and medical devices
12 0.64779735 988 andrew gelman stats-2011-11-02-Roads, traffic, and the importance in decision analysis of carefully examining your goals
13 0.64529902 1906 andrew gelman stats-2013-06-19-“Behind a cancer-treatment firm’s rosy survival claims”
14 0.64501023 1731 andrew gelman stats-2013-02-21-If a lottery is encouraging addictive gambling, don’t expand it!
16 0.64193708 179 andrew gelman stats-2010-08-03-An Olympic size swimming pool full of lithium water
17 0.63769269 951 andrew gelman stats-2011-10-11-Data mining efforts for Obama’s campaign
18 0.63611466 68 andrew gelman stats-2010-06-03-…pretty soon you’re talking real money.
19 0.63330764 1618 andrew gelman stats-2012-12-11-The consulting biz
20 0.62848049 116 andrew gelman stats-2010-06-29-How to grab power in a democracy – in 5 easy non-violent steps
topicId topicWeight
[(9, 0.048), (16, 0.059), (21, 0.023), (23, 0.012), (24, 0.143), (27, 0.27), (31, 0.014), (53, 0.034), (99, 0.3)]
simIndex simValue blogId blogTitle
1 0.97830373 802 andrew gelman stats-2011-07-13-Super Sam Fuld Needs Your Help (with Foul Ball stats)
Introduction: I was pleasantly surprised to have my recreational reading about baseball in the New Yorker interrupted by a digression on statistics. Sam Fuld, of the Tampa Bay Rays, was the subject of a Ben McGrath profile in the 4 July 2011 issue of the New Yorker, in an article titled Super Sam. After quoting a minor-league trainer who described Fuld as “a bit of a geek” (who isn’t these days?), McGrath gets into that lovely New Yorker detail: One could have pointed out the more persuasive and telling examples, such as the fact that in 2005, after his first pro season, with the Class-A Peoria Chiefs, Fuld applied for a fall internship with Stats, Inc., the research firm that supplies broadcasters with much of the data and analysis that you hear in sports telecasts. After a description of what they had him doing, reviewing footage of games and cataloguing, he said “I thought, They have a stat for everything, but they don’t have any stats regarding foul balls.” Fuld’s
Introduction: Someone passed on to me a message from his university library announcing that the journal “Wiley Interdisciplinary Reviews: Computational Statistics” is no longer free. Librarians have to decide what to do, so I thought I’d offer the following consumer guide:
Question | Wiley Computational Statistics journal | Wikipedia
Frequency | 6 issues per year | Continuously updated
Includes articles from Wikipedia? | Yes | Yes
Cites the Wikipedia sources it uses? | No | Yes
Edited by recipient of ASA Founders Award? | Yes | No
Articles are subject to rigorous review? | No | Yes
Errors, when discovered, get fixed? | No | Yes
Number of vertices in n-dimensional hypercube? | 2n | 2^n
Easy access to Brady Bunch trivia? | No | Yes
Cost (North America) | $1400-$2800 | $0
Cost (UK) | £986-£1972 | £0
Cost (Europe) | €1213-€2426 | €0
The choice seems pretty clear to me! It’s funny for the Wiley journal to start charging now
3 0.93501282 347 andrew gelman stats-2010-10-17-Getting arm and lme4 running on the Mac
Introduction: Our “arm” package in R requires Doug Bates’s “lme4” package, which fits multilevel models. lme4 is currently having some problems on the Mac. But installation on the Mac can be done; it just takes a bit of work. I have two sets of instructions below. From Yu-Sung: If you have the Mac OS DVD, you should install the Xcode developer packages from it. Otherwise, install them from here. After this, do the following in R: install.packages("lme4", type = "source") Then you will have lme4 in R and you can install arm without a problem. And, from David Ozonoff: I installed the lme4 package via the Package Installer but this didn’t work, of course. I then installed, via this link, gfortran, which seemed to put the libraries in the right place (I had earlier installed via Fink the gcc42 compiler, so I’m not sure if this is required or not). I then ran, in R, this: install.packages(c("Matrix", "lme4"), repos = "http://R-Forge.R-project.org") This does not appear to work since it wi
4 0.93085718 134 andrew gelman stats-2010-07-08-“What do you think about curved lines connecting discrete data-points?”
Introduction: John Keltz writes: What do you think about curved lines connecting discrete data-points? (For example, here.) The problem with the smoothed graph is it seems to imply that something is going on in between the discrete data points, which is false. However, the straight-line version isn’t representing actual events either; it is just helping the eye connect each point. So maybe the curved version is also just helping the eye connect each point, and looks better doing it. In my own work (value-added modeling of achievement test scores) I use straight lines, but I guess I am not too bothered when people use smoothing. I’d appreciate your input. Regular readers will be unsurprised that, yes, I have an opinion on this one, and that this opinion is connected to some more general ideas about statistical graphics. In general I’m not a fan of the curved lines. They’re ok, but I don’t really see the point. I can connect the dots just fine without the curves. The more general id
5 0.92305374 1472 andrew gelman stats-2012-08-28-Migrating from dot to underscore
Introduction: My C-oriented Stan collaborators have convinced me to use underscore (_) rather than dot (.) as much as possible in expressions in R. For example, I can name a variable n_years rather than n.years. This is fine. But I’m getting annoyed because I need to press the shift key every time I type the underscore. What do people do about this? I know that it’s easy enough to reassign keys (I could, for example, assign underscore to backslash, which I never use). I’m just wondering what C programmers actually do. Do they reassign the key or do they just get used to pressing Shift? P.S. In comments, Ben Hyde points to Google’s R style guide, which recommends that variable names use dots, not underscore or camel case, for variable names (for example, “avg.clicks” rather than “avg_Clicks” or “avgClicks”). I think they’re recommending this to be consistent with R coding conventions . I am switching to underscores in R variable names to be consistent with C. Otherwise we were run
same-blog 6 0.92263854 465 andrew gelman stats-2010-12-13-$3M health care prediction challenge
7 0.91610396 343 andrew gelman stats-2010-10-15-?
8 0.91306865 173 andrew gelman stats-2010-07-31-Editing and clutch hitting
9 0.90882444 708 andrew gelman stats-2011-05-12-Improvement of 5 MPG: how many more auto deaths?
10 0.90536565 1490 andrew gelman stats-2012-09-09-I’m still wondering . . .
11 0.89308155 1727 andrew gelman stats-2013-02-19-Beef with data
13 0.88077438 1238 andrew gelman stats-2012-03-31-Dispute about ethics of data sharing
14 0.87823415 1255 andrew gelman stats-2012-04-10-Amtrak sucks
15 0.87390947 1982 andrew gelman stats-2013-08-15-Blaming scientific fraud on the Kuhnians
16 0.87118596 804 andrew gelman stats-2011-07-15-Static sensitivity analysis
17 0.86246955 341 andrew gelman stats-2010-10-14-Confusion about continuous probability densities
19 0.85526395 2079 andrew gelman stats-2013-10-27-Uncompressing the concept of compressed sensing
20 0.85183913 1869 andrew gelman stats-2013-05-24-In which I side with Neyman over Fisher