andrew_gelman_stats andrew_gelman_stats-2012 andrew_gelman_stats-2012-1191 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Gerrit Storms reports on an interesting linguistic research project in which you can participate! Here’s the description: Over the past few weeks, we have been trying to set up a scientific study that is important for many researchers interested in words, word meaning, semantics, and cognitive science in general. It is a huge word association project, in which people are asked to participate in a small task that doesn’t last longer than 5 minutes. Our goal is to build a global word association network that contains connections between about 40,000 words, the size of the lexicon of an average adult. Setting up such a network might teach us a lot about semantic memory, how it develops, and maybe also about how it can deteriorate (like in Alzheimer’s disease). Most people enjoy doing the task, but we need thousands of participants to succeed. Up till today, we found about 53,000 participants willing to do the little task, but we need more subjects. That is why we address you. Would
sentIndex sentText sentNum sentScore
1 Gerrit Storms reports on an interesting linguistic research project in which you can participate! [sent-1, score-0.177]
2 Here’s the description: Over the past few weeks, we have been trying to set up a scientific study that is important for many researchers interested in words, word meaning, semantics, and cognitive science in general. [sent-2, score-0.55]
3 It is a huge word association project, in which people are asked to participate in a small task that doesn’t last longer than 5 minutes. [sent-3, score-0.989]
4 Our goal is to build a global word association network that contains connections between about 40,000 words, the size of the lexicon of an average adult. [sent-4, score-0.893]
5 Setting up such a network might teach us a lot about semantic memory, how it develops, and maybe also about how it can deteriorate (like in Alzheimer’s disease). [sent-5, score-0.403]
6 Most people enjoy doing the task, but we need thousands of participants to succeed. [sent-6, score-0.157]
7 Up till today, we found about 53,000 participants willing to do the little task, but we need more subjects. [sent-7, score-0.237]
8 Would it be possible to forward this call for participation to graduate and undergraduate students who are fluent in English? [sent-9, score-0.429]
9 For each word, you are supposed to give three spontaneous associations. [sent-13, score-0.236]
10 I find it difficult to give more than one, or at times two, spontaneous word associations. [sent-14, score-0.65]
11 I started to get tangled up in a concern of whether I should be giving synonyms or just related words. [sent-16, score-0.083]
12 My internet connection was slow when I was filling out the forms. [sent-18, score-0.155]
13 Sometimes I was clicking and nothing was happening, other times it whipped through words too fast for me to follow. [sent-19, score-0.431]
14 Gerrit also writes: If people would REALLY like to help us, they can forward the call to students, friends, family, etc. [sent-22, score-0.221]
15 (In this way, we succeeded in building a word association network in Dutch over the past years. [sent-24, score-0.976]
16 The network comprises about 13,000 words and was built using more than 4 million word associations, gathered from 100,000 native Dutch speakers. [sent-25, score-1.16]
17 The problem is only: who cares about Dutch data. [sent-26, score-0.075]
18 ) Any suggestion about how to reach more participants is welcome (societies that we can e-mail, local communities who want to put this on their website, . [sent-28, score-0.306]
19 ) Of course the network will be freely available to all interested language researchers when it becomes substantial enough. [sent-31, score-0.581]
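To make the idea of a word association network concrete, here is a minimal sketch, not the project's actual pipeline. It assumes each response record is a cue word plus up to three spontaneous associations, and it stores the result as a weighted directed graph. The record layout, the function names `build_association_network` and `association_strength`, and the toy responses are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical records: (cue word, up to three spontaneous responses).
# These toy responses are invented for illustration; they are not study data.
records = [
    ("dog", ["cat", "bark", "pet"]),
    ("dog", ["cat", "bone"]),
    ("cat", ["dog", "pet", "purr"]),
    ("pet", ["dog", "cat", "animal"]),
]


def build_association_network(records):
    """Return {cue: Counter({response: count})}, a weighted directed graph."""
    network = defaultdict(Counter)
    for cue, responses in records:
        for response in responses:
            network[cue][response.lower()] += 1
    return network


def association_strength(network, cue, response):
    """Share of all responses to `cue` that were `response` (0.0 for an unseen cue)."""
    # Check membership first so that looking up an unseen cue does not add it.
    total = sum(network[cue].values()) if cue in network else 0
    return network[cue][response] / total if total else 0.0


network = build_association_network(records)
print(network["dog"].most_common(1))
print(association_strength(network, "dog", "cat"))
```

Normalizing counts by the number of responses per cue is one common choice for edge weights. It is used here as an assumption; the post does not say how the real network weights its connections.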
wordName wordTfidf (topN-words)
[('word', 0.345), ('network', 0.306), ('task', 0.254), ('dutch', 0.254), ('gerrit', 0.236), ('spontaneous', 0.236), ('words', 0.164), ('participants', 0.157), ('association', 0.156), ('participate', 0.133), ('call', 0.121), ('comprises', 0.107), ('whipped', 0.107), ('alzheimer', 0.101), ('distribute', 0.101), ('longer', 0.101), ('forward', 0.1), ('semantics', 0.097), ('gathered', 0.097), ('semantic', 0.097), ('succeeded', 0.093), ('project', 0.092), ('clicking', 0.091), ('filling', 0.091), ('develops', 0.088), ('lexicon', 0.086), ('linguistic', 0.085), ('tangled', 0.083), ('freely', 0.08), ('till', 0.08), ('communities', 0.079), ('societies', 0.078), ('native', 0.078), ('associations', 0.077), ('past', 0.076), ('cares', 0.075), ('facebook', 0.075), ('twitter', 0.073), ('students', 0.071), ('memory', 0.07), ('participation', 0.07), ('welcome', 0.07), ('times', 0.069), ('undergraduate', 0.067), ('substantial', 0.066), ('researchers', 0.065), ('disease', 0.065), ('slow', 0.064), ('interested', 0.064), ('built', 0.063)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 1191 andrew gelman stats-2012-03-01-Hoe noem je?
2 0.24407907 318 andrew gelman stats-2010-10-04-U-Haul statistics
Introduction: Very freakonomic (and I mean that in the best sense of the word).
3 0.20809171 77 andrew gelman stats-2010-06-09-Sof[t]
Introduction: Joe Fruehwald writes: I’m working with linguistic data, specifically binomial hits and misses of a certain variable for certain words (specifically whether or not the “t” sound was pronounced at the end of words like “soft”). Word frequency follows a power law, with most words appearing just once, and with some words being hyperfrequent. I’m not interested in specific word effects, but I am interested in the effect of word frequency. A logistic model fit is going to be heavily influenced by the effect of the hyperfrequent words which constitute only one type. To control for the item effect, I would fit a multilevel model with a random intercept by word, but like I said, most of the words appear only once. Is there a principled approach to this problem? My response: It’s ok to fit a multilevel model even if most groups only have one observation each. You’ll want to throw in some word-level predictors too. Think of the multilevel model not as a substitute for the usual thoug
Introduction: A tall thin young man came to my office today to talk about one of my current pet topics: stories and social science. I brought up Tom Wolfe and his goal of compressing an entire city into a single novel, and how this reminded me of the psychologists Kahneman and Tversky’s concept of “the law of small numbers,” the idea that we expect any small sample to replicate all the properties of the larger population that it represents. Strictly speaking, the law of small numbers is impossible—any small sample necessarily has its own unique features—but this is even more true if we consider network properties. The average American knows about 700 people (depending on how you define “know”) and this defines a social network over the population. Now suppose you look at a few hundred people and all their connections. This mini-network will almost necessarily look much much sparser than the national network, as we’re removing the connections to the people not in the sample. Now consider how
5 0.11844701 574 andrew gelman stats-2011-02-14-“The best data visualizations should stand on their own”? I don’t think so.
Introduction: Jimmy pointed me to this blog by Drew Conway on word clouds. I don’t have much to say about Conway’s specifics–word clouds aren’t really my thing, but I’m glad that people are thinking about how to do them better–but I did notice one phrase of his that I’ll dispute. Conway writes: The best data visualizations should stand on their own . . . I disagree. I prefer the saying, “A picture plus 1000 words is better than two pictures or 2000 words.” That is, I see a positive interaction between words and pictures or, to put it another way, diminishing returns for words or pictures on their own. I don’t have any big theory for this, but I think, when expressed as a joint value function, my idea makes sense. Also, I live this suggestion in my own work. I typically accompany my graphs with long captions and I try to accompany my words with pictures (although I’m not doing it here, because with the software I use, it’s much easier to type more words than to find, scale, and insert i
6 0.11534785 1228 andrew gelman stats-2012-03-25-Continuous variables in Bayesian networks
7 0.11482868 476 andrew gelman stats-2010-12-19-Google’s word count statistics viewer
8 0.11068029 1996 andrew gelman stats-2013-08-24-All inference is about generalizing from sample to population
9 0.10374578 1502 andrew gelman stats-2012-09-19-Scalability in education
10 0.10208594 1412 andrew gelman stats-2012-07-10-More questions on the contagion of obesity, height, etc.
11 0.097707696 938 andrew gelman stats-2011-10-03-Comparing prediction errors
12 0.094916634 925 andrew gelman stats-2011-09-26-Ethnicity and Population Structure in Personal Naming Networks
13 0.084484816 756 andrew gelman stats-2011-06-10-Christakis-Fowler update
14 0.081099391 1959 andrew gelman stats-2013-07-28-50 shades of gray: A research story
15 0.08068569 742 andrew gelman stats-2011-06-02-Grouponomics, counterfactuals, and opportunity cost
16 0.080329344 1904 andrew gelman stats-2013-06-18-Job opening! Come work with us!
17 0.079199672 1225 andrew gelman stats-2012-03-22-Procrastination as a positive productivity strategy
18 0.07880298 2234 andrew gelman stats-2014-03-05-Plagiarism, Arizona style
19 0.074756056 1618 andrew gelman stats-2012-12-11-The consulting biz
20 0.072827592 1236 andrew gelman stats-2012-03-29-Resolution of Diederik Stapel case
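One of the excerpts listed above (the one about stories and social science) notes that a small sample of people, together with only the ties among them, looks much sparser than the full network. The sketch below illustrates that point under strong simplifying assumptions: ties are placed uniformly at random, and the population size, number of ties, and sample size are toy values chosen for speed, not estimates from the post.

```python
import random


def random_network(n_people, n_edges, rng):
    """Draw n_edges distinct undirected ties uniformly at random."""
    edges = set()
    while len(edges) < n_edges:
        a, b = rng.sample(range(n_people), 2)
        edges.add((min(a, b), max(a, b)))
    return edges


def mean_degree(edges, n_nodes):
    """Average number of ties per person; each tie counts for both endpoints."""
    return 2 * len(edges) / n_nodes


rng = random.Random(0)
n_people, n_edges = 2000, 20000  # toy population with mean degree 20
edges = random_network(n_people, n_edges, rng)

# Keep a 10% sample of people and only the ties with both endpoints sampled.
sample = set(rng.sample(range(n_people), 200))
induced = {(a, b) for a, b in edges if a in sample and b in sample}

full_mean = mean_degree(edges, n_people)
sample_mean = mean_degree(induced, len(sample))
print(full_mean, sample_mean)
```

Under this random-tie assumption, a tie survives only if both endpoints are sampled, so the mean degree in the sample shrinks by roughly the sampling fraction. Real social networks are clustered, so the exact amount of thinning would differ.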
topicId topicWeight
[(0, 0.145), (1, -0.046), (2, -0.016), (3, -0.033), (4, 0.033), (5, 0.056), (6, 0.021), (7, 0.009), (8, -0.021), (9, -0.003), (10, -0.014), (11, -0.011), (12, 0.022), (13, -0.007), (14, -0.031), (15, 0.015), (16, 0.039), (17, -0.033), (18, 0.011), (19, 0.011), (20, -0.007), (21, -0.019), (22, -0.002), (23, -0.003), (24, 0.002), (25, -0.013), (26, 0.032), (27, -0.003), (28, 0.028), (29, 0.016), (30, -0.029), (31, -0.011), (32, -0.016), (33, 0.005), (34, 0.012), (35, 0.018), (36, 0.003), (37, -0.0), (38, -0.015), (39, -0.033), (40, 0.032), (41, -0.023), (42, 0.009), (43, -0.006), (44, -0.04), (45, -0.017), (46, 0.008), (47, -0.005), (48, -0.031), (49, 0.01)]
simIndex simValue blogId blogTitle
same-blog 1 0.97072667 1191 andrew gelman stats-2012-03-01-Hoe noem je?
2 0.80180275 1539 andrew gelman stats-2012-10-18-IRB nightmares
Introduction: Andrew Perrin nails it: Twice a year, like clockwork, the ethics cops at the IRB [institutional review board, the group on campus that has to approve research involving human subjects] take a break from deciding whether or not radioactive isotopes can be administered to prison populations to cure restless-leg syndrome to dream up some fancy new way in which participating in an automated telephone poll might cause harm. Perrin adds: The list of exemptions to IRB review is too short and, more importantly, contains no guiding principle as to what makes exempt. . . . [and] Even exemptions require approval by the IRB. He also voices a thought I’ve had many times, which is that there are all sorts of things you or I or anyone else can do on the street (for example, go up to people and ask them personal questions, drop objects and see if people pick them up, stage fights with our friends to see the reactions of bystanders, etc etc etc) but for which we have to go through an IRB
3 0.78809875 35 andrew gelman stats-2010-05-16-Another update on the spam email study
Introduction: I think youall are probably getting sick of this by now so I’ll put it all below the fold. Akinola Modupe and Katherine Milkman responded to my email about their study: We want to clarify the reason we believe that the use of deception and a lack of informed consent were appropriate and ethical for this research study. In this project, we were studying how the timing of a decision affects discrimination based on race and/or gender. The emails all participants in our study received were identical except for a) the sender’s name (we used 20 names that pretesting revealed were strongly associated with being either Caucasian, Black, Indian, Chinese or Hispanic, as well as associated with being male or female) and b) whether the meeting requested was for today or for a week from today. Recipients were randomly selected and were randomly assigned to one of the race/gender/timing conditions. This study design will allow us to test for baseline levels of discrimination in acade
4 0.7637645 969 andrew gelman stats-2011-10-22-Researching the cost-effectiveness of political lobbying organisations
Introduction: Sally Murray from Giving What We Can writes: We are an organisation that assesses different charitable (/fundable) interventions, to estimate which are the most cost-effective (measured in terms of the improvement of life for people in developing countries gained for every dollar invested). Our research guides and encourages greater donations to the most cost-effective charities we thus identify, and our members have so far pledged a total of $14m to these causes, with many hundreds more relying on our advice in a less formal way. I am specifically researching the cost-effectiveness of political lobbying organisations. We are initially focusing on organisations that lobby for ‘big win’ outcomes such as increased funding of the most cost-effective NTD treatments/ vaccine research, changes to global trade rules (potentially) and more obscure lobbies such as “Keep Antibiotics Working”. We’ve a great deal of respect for your work and the superbly rational way you go about it, and
5 0.75052845 18 andrew gelman stats-2010-05-06-$63,000 worth of abusive research . . . or just a really stupid waste of time?
Introduction: As someone who relies strongly on survey research, it’s good for me to be reminded that some surveys are useful, some are useless, but one thing they almost all have in common is . . . they waste the respondents’ time. I thought of this after receiving the following email, which I shall reproduce here. My own comments appear after. Recently, you received an email from a student asking for 10 minutes of your time to discuss your Ph.D. program (the body of the email appears below). We are emailing you today to debrief you on the actual purpose of that email, as it was part of a research study. We sincerely hope our study did not cause you any disruption and we apologize if you were at all inconvenienced. Our hope is that this letter will provide a sufficient explanation of the purpose and design of our study to alleviate any concerns you may have about your involvement. We want to thank you for your time and for reading further if you are interested in understanding why you rece
6 0.74935758 2313 andrew gelman stats-2014-04-30-Seth Roberts
7 0.74670774 989 andrew gelman stats-2011-11-03-This post does not mention Wegman
8 0.74405777 732 andrew gelman stats-2011-05-26-What Do We Learn from Narrow Randomized Studies?
10 0.734227 1434 andrew gelman stats-2012-07-29-FindTheData.org
11 0.72530949 326 andrew gelman stats-2010-10-07-Peer pressure, selection, and educational reform
12 0.72525591 411 andrew gelman stats-2010-11-13-Ethical concerns in medical trials
13 0.72106749 866 andrew gelman stats-2011-08-23-Participate in a research project on combining information for prediction
14 0.71918315 1933 andrew gelman stats-2013-07-10-Please send all comments to -dev-ripley
15 0.71896595 1453 andrew gelman stats-2012-08-10-Quotes from me!
16 0.71766698 116 andrew gelman stats-2010-06-29-How to grab power in a democracy – in 5 easy non-violent steps
17 0.71661466 1261 andrew gelman stats-2012-04-12-The Naval Research Lab
18 0.71434242 1289 andrew gelman stats-2012-04-29-We go to war with the data we have, not the data we want
19 0.71298647 465 andrew gelman stats-2010-12-13-$3M health care prediction challenge
20 0.71289378 88 andrew gelman stats-2010-06-15-What people do vs. what they want to do
topicId topicWeight
[(4, 0.035), (9, 0.021), (14, 0.038), (16, 0.018), (21, 0.043), (24, 0.217), (32, 0.117), (63, 0.015), (71, 0.01), (77, 0.02), (86, 0.04), (87, 0.017), (89, 0.012), (99, 0.245)]
simIndex simValue blogId blogTitle
same-blog 1 0.95810127 1191 andrew gelman stats-2012-03-01-Hoe noem je?
Introduction: Ken Rice writes: In the recent discussion on stopping rules I saw a comment that I wanted to chip in on, but thought it might get a bit lost, in the already long thread. Apologies in advance if I misinterpreted what you wrote, or am trying to tell you things you already know. The comment was: “In Bayesian decision making, there is a utility function and you choose the decision with highest expected utility. Making a decision based on statistical significance does not correspond to any utility function.” … which immediately suggests this little 2010 paper: A Decision-Theoretic Formulation of Fisher’s Approach to Testing, The American Statistician, 64(4) 345-349. It contains utilities that lead to decisions that very closely mimic classical Wald tests, and provides a rationale for why this utility is not totally unconnected from how some scientists think. Some (old) slides discussing it are here. A few notes, on things not in the paper: * I know you don’t like squared-
3 0.92020881 1757 andrew gelman stats-2013-03-11-My problem with the Lindley paradox
Introduction: From a couple years ago but still relevant, I think: To me, the Lindley paradox falls apart because of its noninformative prior distribution on the parameter of interest. If you really think there’s a high probability the parameter is nearly exactly zero, I don’t see the point of the model saying that you have no prior information at all on the parameter. In short: my criticism of so-called Bayesian hypothesis testing is that it’s insufficiently Bayesian. P.S. To clarify (in response to Bill’s comment below): I’m speaking of all the examples I’ve ever worked on in social and environmental science, where in some settings I can imagine a parameter being very close to zero and in other settings I can imagine a parameter taking on just about any value in a wide range, but where I’ve never seen an example where a parameter could be either right at zero or taking on any possible value. But such examples might occur in areas of application that I haven’t worked on.
4 0.91927576 896 andrew gelman stats-2011-09-09-My homework success
Introduction: A friend writes to me: You will be amused to know that students in our Bayesian Inference paper at 4th year found solutions to exercises from your book on-line. The amazing thing was that some of them were dumb enough to copy out solutions verbatim. However, I thought you might like to know you have done well in this class! I’m happy to hear this. I worked hard on those solutions!
5 0.91923702 953 andrew gelman stats-2011-10-11-Steve Jobs’s cancer and science-based medicine
Introduction: Interesting discussion from David Gorski (which I found via this link from Joseph Delaney). I don’t have anything really to add to this discussion except to note the value of this sort of anecdote in a statistics discussion. It’s only n=1 and adds almost nothing to the literature on the effectiveness of various treatments, but a story like this can help focus one’s thoughts on the decision problems.
6 0.91871738 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values
7 0.91868979 1465 andrew gelman stats-2012-08-21-D. Buggin
8 0.9181211 1792 andrew gelman stats-2013-04-07-X on JLP
10 0.91559196 1208 andrew gelman stats-2012-03-11-Gelman on Hennig on Gelman on Bayes
11 0.91554993 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence
12 0.91522843 2247 andrew gelman stats-2014-03-14-The maximal information coefficient
13 0.91476756 1072 andrew gelman stats-2011-12-19-“The difference between . . .”: It’s not just p=.05 vs. p=.06
14 0.91462004 1970 andrew gelman stats-2013-08-06-New words of 1917
15 0.91427863 1240 andrew gelman stats-2012-04-02-Blogads update
16 0.9142015 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization
17 0.91397923 1087 andrew gelman stats-2011-12-27-“Keeping things unridiculous”: Berger, O’Hagan, and me on weakly informative priors
18 0.91365433 77 andrew gelman stats-2010-06-09-Sof[t]
19 0.91349488 846 andrew gelman stats-2011-08-09-Default priors update?