
1101 andrew gelman stats-2012-01-05-What are the standards for reliability in experimental psychology?


meta info for this blog

Source: html

Introduction: An experimental psychologist was wondering about the standards in that field for “acceptable reliability” (when looking at inter-rater reliability in coding data). He wondered, for example, if some variation on signal detectability theory might be applied to adjust for inter-rater differences in criteria for saying some code is present. What about Cohen’s kappa? The psychologist wrote: Cohen’s kappa does adjust for “guessing,” but its assumptions are not well motivated, perhaps not any more than adjustments for guessing versus the application of signal detectability theory where that can be applied. But one can’t do a straightforward application of signal detectability theory for reliability in that you don’t know whether the signal is present or not. I think measurement issues are important but I don’t have enough experience in this area to answer the question without knowing more about the problem that this researcher is working on. I’m posting it here because I imagine that some of the psychometricians out there might have some comments.
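The chance correction the psychologist refers to is easy to make concrete: Cohen’s kappa compares the raters’ observed agreement p_o with the agreement p_e expected if both raters coded independently at their own base rates, kappa = (p_o - p_e) / (1 - p_e). Below is a minimal sketch with hypothetical ratings; nothing in it comes from the psychologist’s actual data.

```python
# A minimal sketch of Cohen's kappa for two raters, on made-up labels.
# Kappa rescales observed agreement by the agreement expected if each
# rater "guessed" at their own base rates.
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    n = len(rater_a)
    # Observed agreement: fraction of items coded identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal rates, summed over codes.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum((count_a[c] / n) * (count_b[c] / n) for c in count_a | count_b)
    return (p_o - p_e) / (1 - p_e)

# Two hypothetical raters coding whether a behavior ("signal") is present.
a = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
b = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
print(cohen_kappa(a, b))  # raw agreement is 0.80; kappa drops to ~0.58
```

(scikit-learn’s sklearn.metrics.cohen_kappa_score computes the same quantity.)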


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 An experimental psychologist was wondering about the standards in that field for “acceptable reliability” (when looking at inter-rater reliability in coding data). [sent-1, score-0.963]

2 He wondered, for example, if some variation on signal detectability theory might be applied to adjust for inter-rater differences in criteria for saying some code is present. [sent-2, score-1.697]

3 The psychologist wrote: Cohen’s kappa does adjust for “guessing,” but its assumptions are not well motivated, perhaps not any more than adjustments for guessing versus the application of signal detectability theory where that can be applied. [sent-4, score-2.366]

4 But one can’t do a straightforward application of signal detectability theory for reliability in that you don’t know whether the signal is present or not. [sent-5, score-2.13]

5 I think measurement issues are important but I don’t have enough experience in this area to answer the question without knowing more about the problem that this researcher is working on. [sent-6, score-0.703]

6 I’m posting it here because I imagine that some of the psychometricians out there might have some comments. [sent-7, score-0.201]
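As a hedged guess at how the sentence scores above could be produced (the miner’s actual code isn’t shown here): one standard recipe is to score each sentence by the summed tf-idf weight of its terms and report the top-ranked sentences. A toy sketch with placeholder sentences:

```python
# Hypothetical reconstruction of the sentence-scoring step: rank sentences
# by the total tf-idf weight of their terms.
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = [
    "An experimental psychologist was wondering about acceptable reliability.",
    "Cohen's kappa does adjust for guessing.",
    "I'm posting it here for comments.",
]
X = TfidfVectorizer(stop_words="english").fit_transform(sentences)
scores = X.sum(axis=1).A1  # one score per sentence
for score, sent in sorted(zip(scores, sentences), reverse=True):
    print(round(score, 3), sent)
```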


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('detectability', 0.518), ('signal', 0.377), ('reliability', 0.322), ('kappa', 0.297), ('cohen', 0.217), ('psychologist', 0.193), ('adjust', 0.187), ('guessing', 0.169), ('theory', 0.16), ('application', 0.152), ('straightforward', 0.114), ('adjustments', 0.113), ('acceptable', 0.108), ('wondered', 0.104), ('coding', 0.101), ('criteria', 0.096), ('versus', 0.091), ('standards', 0.087), ('knowing', 0.084), ('posting', 0.083), ('motivated', 0.079), ('measurement', 0.078), ('wondering', 0.074), ('experimental', 0.073), ('variation', 0.071), ('assumptions', 0.069), ('code', 0.069), ('area', 0.068), ('researcher', 0.067), ('present', 0.065), ('differences', 0.062), ('imagine', 0.062), ('experience', 0.062), ('field', 0.062), ('issues', 0.056), ('might', 0.056), ('applied', 0.053), ('looking', 0.051), ('comments', 0.05), ('answer', 0.05), ('saying', 0.048), ('working', 0.047), ('whether', 0.045), ('without', 0.041), ('important', 0.04), ('perhaps', 0.04), ('wrote', 0.039), ('enough', 0.039), ('question', 0.037), ('problem', 0.034)]
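Both the (wordName, wordTfidf) list above and the simValue rankings below follow from a standard tf-idf pipeline: vectorize every post, read off this post’s top-weighted terms, and rank the other posts by cosine similarity. A sketch under those assumptions, with a three-document toy corpus standing in for the full blog archive:

```python
# Toy reconstruction of the tf-idf step; the texts are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = {  # blogId -> post text (hypothetical stand-ins)
    "1101": "reliability kappa signal detectability inter-rater coding",
    "500":  "bribe corruption signal inspections falsified subway",
    "2174": "statistical evidence policy conclusion reliability validity",
}
vec = TfidfVectorizer()
X = vec.fit_transform(corpus.values())

# Top-weighted terms for this post: the wordName/wordTfidf list.
weights = X[0].toarray().ravel()
terms = vec.get_feature_names_out()
print(sorted(zip(terms, weights.round(3)), key=lambda t: -t[1])[:5])

# Cosine similarity of this post to every post: the simValue column
# (the same-blog entry scores 1.0, as above).
print(cosine_similarity(X[0], X).ravel().round(3))
```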

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 1101 andrew gelman stats-2012-01-05-What are the standards for reliability in experimental psychology?


2 0.1073391 500 andrew gelman stats-2011-01-03-Bribing statistics

Introduction: I Paid a Bribe, by Janaagraha, a Bangalore-based not-for-profit, harnesses the collective energy of citizens and asks them to report on the nature, number, pattern, types, location, frequency, and values of corruption activities. These reports would be used to argue for improving governance systems and procedures, tightening law enforcement and regulation, and thereby reducing the scope for corruption. Here’s a presentation of data from the application: Transparency International could make something like this much more widely available around the world. While awareness is good, follow-up is even better. For example, it’s known that New York’s subway signal inspections were being falsified. Signal inspections are pretty serious stuff, as failures lead to disasters, such as the one in Washington. Nothing much happened after: the person responsible (making $163k a year) was merely reassigned.

3 0.094435766 2174 andrew gelman stats-2014-01-17-How to think about the statistical evidence when the statistical evidence can’t be conclusive?

Introduction: There’s a paradigm in applied statistics that goes something like this: 1. There is a scientific or policy question of some theoretical or practical importance. 2. Researchers gather data on relevant outcomes and perform a statistical analysis, ideally leading to a clear conclusion (p less than 0.05, or a strong posterior distribution, or good predictive performance, or high reliability and validity, whatever). 3. This conclusion informs policy. This paradigm has room for positive findings (for example, that a new program is statistically significantly better, or statistically significantly worse than what came before) or negative findings (data are inconclusive, further study is needed), even if negative findings seem less likely to make their way into the textbooks. But what happens when step 2 simply isn’t possible? This came up a few years ago—nearly 10 years ago, now!—with the excellent paper by Donohue and Wolfers which explained why it’s just about impossible to

4 0.090407357 1489 andrew gelman stats-2012-09-09-Commercial Bayesian inference software is popping up all over

Introduction: Steve Cohen writes: As someone who has been working with Bayesian statistical models for the past several years, I [Cohen] have been challenged recently to describe the difference between Bayesian Networks (as implemented in BayesiaLab software) and modeling and inference using MCMC methods. I hope you have the time to give me (or to write on your blog) a relatively simple explanation that an advanced layman could understand. My reply: I skimmed the above website but I couldn’t quite see what they do. My guess is that they use MCMC and also various parametric approximations such as variational Bayes. They also seem to have something set up for decision analysis. My guess is that, compared to a general-purpose tool such as Stan, this Bayesia software is more accessible to non-academics in particular application areas (in this case, it looks like business marketing). But I can’t be sure. I’ve also heard about another company that looks to be doing something similar: h

5 0.085124709 568 andrew gelman stats-2011-02-11-Calibration in chess

Introduction: Has anybody done this study yet? I’m curious about the results. Perhaps there’s some chess-playing cognitive psychologist who’d like to collaborate on this?

6 0.077317216 2326 andrew gelman stats-2014-05-08-Discussion with Steven Pinker on research that is attached to data that are so noisy as to be essentially uninformative

7 0.07547392 2127 andrew gelman stats-2013-12-08-The never-ending (and often productive) race between theory and practice

8 0.072611235 563 andrew gelman stats-2011-02-07-Evaluating predictions of political events

9 0.070128933 1366 andrew gelman stats-2012-06-05-How do segregation measures change when you change the level of aggregation?

10 0.067332789 1418 andrew gelman stats-2012-07-16-Long discussion about causal inference and the use of hierarchical models to bridge between different inferential settings

11 0.067329131 2062 andrew gelman stats-2013-10-15-Last word on Mister P (for now)

12 0.066558503 164 andrew gelman stats-2010-07-26-A very short story

13 0.06632451 1441 andrew gelman stats-2012-08-02-“Based on my experiences, I think you could make general progress by constructing a solution to your specific problem.”

14 0.06589853 1469 andrew gelman stats-2012-08-25-Ways of knowing

15 0.064973399 1979 andrew gelman stats-2013-08-13-Convincing Evidence

16 0.064428955 579 andrew gelman stats-2011-02-18-What is this, a statistics class or a dentist’s office??

17 0.063562781 1732 andrew gelman stats-2013-02-22-Evaluating the impacts of welfare reform?

18 0.062697358 1942 andrew gelman stats-2013-07-17-“Stop and frisk” statistics

19 0.061324328 1652 andrew gelman stats-2013-01-03-“The Case for Inductive Theory Building”

20 0.061216112 2151 andrew gelman stats-2013-12-27-Should statistics have a Nobel prize?


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.083), (1, 0.005), (2, -0.008), (3, -0.034), (4, -0.002), (5, 0.023), (6, -0.022), (7, 0.006), (8, 0.016), (9, 0.008), (10, -0.026), (11, -0.011), (12, 0.011), (13, -0.022), (14, -0.015), (15, 0.02), (16, -0.011), (17, -0.019), (18, 0.003), (19, 0.007), (20, 0.014), (21, -0.009), (22, -0.03), (23, -0.001), (24, -0.012), (25, 0.011), (26, 0.047), (27, 0.027), (28, 0.009), (29, -0.024), (30, 0.025), (31, 0.001), (32, 0.012), (33, -0.018), (34, -0.012), (35, -0.021), (36, -0.006), (37, 0.009), (38, -0.043), (39, -0.022), (40, -0.001), (41, -0.004), (42, -0.003), (43, -0.016), (44, -0.015), (45, -0.001), (46, 0.006), (47, -0.017), (48, 0.044), (49, 0.007)]
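The topicId/topicWeight pairs above read as coordinates in a latent semantic (LSI) space; ids 0 through 49 suggest a 50-dimensional truncated SVD of the tf-idf matrix, with similarities then computed in that space. A toy sketch under that assumption (2 components instead of 50, since the toy corpus is tiny):

```python
# Toy LSI step: truncated SVD of a tf-idf matrix, then cosine similarity
# in the latent space. Texts and component count are placeholder guesses.
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

posts = [
    "reliability kappa signal detectability inter-rater coding",
    "trust networks humans computers behavioral computational",
    "theory practice hypothesis data formal statistics",
]
X = TfidfVectorizer().fit_transform(posts)
Z = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)  # apparently 50 in the real run
print(Z[0].round(3))                                  # this post's topic weights
print(cosine_similarity(Z[[0]], Z).ravel().round(3))  # simValue analogue
```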

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96496814 1101 andrew gelman stats-2012-01-05-What are the standards for reliability in experimental psychology?


2 0.68498874 1051 andrew gelman stats-2011-12-10-Towards a Theory of Trust in Networks of Humans and Computers

Introduction: Hey, this looks cool: Towards a Theory of Trust in Networks of Humans and Computers Virgil Gligor Carnegie Mellon University We argue that a general theory of trust in networks of humans and computers must be built on both a theory of behavioral trust and a theory of computational trust. This argument is motivated by increased participation of people in social networking, crowdsourcing, human computation, and socio-economic protocols, e.g., protocols modeled by trust and gift-exchange games, norms-establishing contracts, and scams/deception. User participation in these protocols relies primarily on trust, since on-line verification of protocol compliance is often impractical; e.g., verification can lead to undecidable problems, co-NP complete test procedures, and user inconvenience. Trust is captured by participant preferences (i.e., risk and betrayal aversion) and beliefs in the trustworthiness of other protocol participants. Both preferences and beliefs can be enhanced

3 0.68392909 2127 andrew gelman stats-2013-12-08-The never-ending (and often productive) race between theory and practice

Introduction: Commenter Wonks Anonymous writes: After the recent EconNobel announcement I decided to check Dimensional’s Fama-French blog to see if it had much new content recently, and while it was disappointingly sparse it did have an interesting bit where Fama linked to the best advice he’d ever gotten, from his statistics professor Harry Roberts: With formal statistics, you say something — a hypothesis — and then you test it. Harry always said that your criterion should be not whether or not you can reject or accept the hypothesis, but what you can learn from the data. The best thing you can do is use the data to enhance your description of the world. I responded: That’s a great quote. Except that I disagree with what Fama says about “formal statistics.” Or, should I say, he has an old-fashioned view of formal statistics. (See this paper by X and me for some discussion of old-fashioned views.) Nowadays, lots of formal statistics is all about what you can learn from the data, no

4 0.67170286 1861 andrew gelman stats-2013-05-17-Where do theories come from?

Introduction: Lee Sechrest sends along this article by Brian Haig and writes that it “presents what seems to me a useful perspective on much of what scientists/statisticians do and how science works, at least in the fields in which I work.” Here’s Haig’s abstract: A broad theory of scientific method is sketched that has particular relevance for the behavioral sciences. This theory of method assembles a complex of specific strategies and methods that are used in the detection of empirical phenomena and the subsequent construction of explanatory theories. A characterization of the nature of phenomena is given, and the process of their detection is briefly described in terms of a multistage model of data analysis. The construction of explanatory theories is shown to involve their generation through abductive, or explanatory, reasoning, their development through analogical modeling, and their fuller appraisal in terms of judgments of the best of competing explanations. The nature and limits of

5 0.67147875 2326 andrew gelman stats-2014-05-08-Discussion with Steven Pinker on research that is attached to data that are so noisy as to be essentially uninformative

Introduction: I pointed Steven Pinker to my post, How much time (if any) should we spend criticizing research that’s fraudulent, crappy, or just plain pointless?, and he responded: Clearly it *is* important to call out publicized research whose conclusions are likely to be false. The only danger is that it’s so easy and fun to criticize, with all the perks of intellectual and moral superiority for so little cost, that there is a moral hazard to go overboard and become a professional slasher and snarker. (That’s a common phenomenon among literary critics, especially in the UK.) There’s also the risk of altering the incentive structure for innovative research, so that researchers stick to the safest kinds of paradigm-twiddling. I think these two considerations were what my late colleague Dan Wegner had in mind when he made the bumbler-pointer contrast — he himself was certainly a discerning critic of social science research. [Just to clarify: Wegner is the person who talked about bumblers and po

6 0.64124542 2263 andrew gelman stats-2014-03-24-Empirical implications of Empirical Implications of Theoretical Models

7 0.63769931 1575 andrew gelman stats-2012-11-12-Thinking like a statistician (continuously) rather than like a civilian (discretely)

8 0.63007325 2050 andrew gelman stats-2013-10-04-Discussion with Dan Kahan on political polarization, partisan information processing. And, more generally, the role of theory in empirical social science

9 0.6267004 2272 andrew gelman stats-2014-03-29-I agree with this comment

10 0.62474823 256 andrew gelman stats-2010-09-04-Noooooooooooooooooooooooooooooooooooooooooooooooo!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

11 0.62417018 1808 andrew gelman stats-2013-04-17-Excel-bashing

12 0.61770755 1690 andrew gelman stats-2013-01-23-When are complicated models helpful in psychology research and when are they overkill?

13 0.61608326 2151 andrew gelman stats-2013-12-27-Should statistics have a Nobel prize?

14 0.61462724 309 andrew gelman stats-2010-10-01-Why Development Economics Needs Theory?

15 0.61443615 1703 andrew gelman stats-2013-02-02-Interaction-based feature selection and classification for high-dimensional biological data

16 0.61317992 241 andrew gelman stats-2010-08-29-Ethics and statistics in development research

17 0.61073029 2281 andrew gelman stats-2014-04-04-The Notorious N.H.S.T. presents: Mo P-values Mo Problems

18 0.61048424 1883 andrew gelman stats-2013-06-04-Interrogating p-values

19 0.60939509 2037 andrew gelman stats-2013-09-25-Classical probability does not apply to quantum systems (causal inference edition)

20 0.60832411 2282 andrew gelman stats-2014-04-05-Bizarre academic spam


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(16, 0.101), (21, 0.055), (24, 0.14), (57, 0.216), (61, 0.017), (62, 0.019), (63, 0.052), (73, 0.016), (82, 0.03), (86, 0.019), (99, 0.196)]
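The weights above are this post’s topic mixture under the LDA model; topic ids running up to 99 suggest a model on the order of 100 topics. Since LDA is fit on raw term counts rather than tf-idf, the analogous sketch uses a count vectorizer (toy sizes again; the corpus and topic count are assumptions):

```python
# Toy LDA step: fit a topic model on term counts, then compare posts by
# their topic distributions. Texts and n_components are placeholders.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

posts = [
    "reliability kappa signal detectability inter-rater coding",
    "shark uterine cannibalism underdispersion count data",
    "world bank data online researchers",
]
counts = CountVectorizer().fit_transform(posts)
lda = LatentDirichletAllocation(n_components=3, random_state=0)  # ~100 in the real run?
theta = lda.fit_transform(counts)
print(theta[0].round(3))                                      # topicWeight analogue
print(cosine_similarity(theta[[0]], theta).ravel().round(3))  # simValue analogue
```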

similar blogs list:

simIndex simValue blogId blogTitle

1 0.93497849 1542 andrew gelman stats-2012-10-20-A statistical model for underdispersion

Introduction: We have lots of models for overdispersed count data but we rarely see underdispersed data. But now I know what example I’ll be giving when this next comes up in class. From a book review by Theo Tait: A number of shark species go in for oophagy, or uterine cannibalism. Sand tiger foetuses ‘eat each other in utero, acting out the harshest form of sibling rivalry imaginable’. Only two babies emerge, one from each of the mother shark’s uteruses: the survivors have eaten everything else. ‘A female sand tiger gives birth to a baby that’s already a metre long and an experienced killer,’ explains Demian Chapman, an expert on the subject. That’s what I call underdispersion. E(y)=2, var(y)=0. Take that, M. Poisson!

same-blog 2 0.90997779 1101 andrew gelman stats-2012-01-05-What are the standards for reliability in experimental psychology?


3 0.86917835 891 andrew gelman stats-2011-09-05-World Bank data now online

Introduction: Wayne Folta writes that the World Bank is opening up some of its data for researchers.

4 0.85513252 1485 andrew gelman stats-2012-09-06-One reason New York isn’t as rich as it used to be: Redistribution of federal tax money to other states

Introduction: Uberbloggers Andrew Sullivan and Matthew Yglesias were kind enough to link to my five-year-old post with graphs from Red State Blue State on time trends of average income by state. Here are the graphs: Yglesias’s take-home point: There isn’t that much change over time in states’ economic well-being. All things considered the best predictor of how rich a state was in 2000 was simply how rich it was in 1929…. Massachusetts and Connecticut have always been rich and Arkansas and Mississippi have always been poor. I’d like to point to a different feature of the graphs, which is that, although the rankings of the states haven’t changed much (as can be seen from the “2000 compared to 1929” scale), the relative values of the incomes have converged quite a bit—at least, they converged from about 1930 to 1980 before hitting some level of stability. And the rankings have changed a bit. My impression (without checking the numbers) is that New York and Connecticut were

5 0.84833014 1460 andrew gelman stats-2012-08-16-“Real data can be a pain”

Introduction: Michael McLaughlin sent me the following query with the above title. Some time ago, I [McLaughlin] was handed a dataset that needed to be modeled. It was generated as follows: 1. Random navigation errors, historically a binary mixture of normal and Laplace with a common mean, were collected by observation. 2. Sadly, these data were recorded with too few decimal places so that the resulting quantization is clearly visible in a scatterplot. 3. The quantized data were then interpolated (to an unobserved location). The final result looks like fuzzy points (small scale jitter) at quantized intervals spanning a much larger scale (the parent mixture distribution). This fuzziness, likely ~normal or ~Laplace, results from the interpolation. Otherwise, the data would look like a discrete analogue of the normal/Laplace mixture. I would like to characterize the latent normal/Laplace mixture distribution but the quantization is “getting in the way”. When I tried MCMC on this proble

6 0.84437454 1036 andrew gelman stats-2011-11-30-Stan uses Nuts!

7 0.84304416 1044 andrew gelman stats-2011-12-06-The K Foundation burns Cosma’s turkey

8 0.83899331 1043 andrew gelman stats-2011-12-06-Krugman disses Hayek as “being almost entirely about politics rather than economics”

9 0.83851272 861 andrew gelman stats-2011-08-19-Will Stan work well with 40×40 matrices?

10 0.82872665 215 andrew gelman stats-2010-08-18-DataMarket

11 0.82735634 2318 andrew gelman stats-2014-05-04-Stan (& JAGS) Tutorial on Linear Mixed Models

12 0.81757021 1120 andrew gelman stats-2012-01-15-Fun fight over the Grover search algorithm

13 0.8146081 1018 andrew gelman stats-2011-11-19-Tempering and modes

14 0.7977069 1870 andrew gelman stats-2013-05-26-How to understand coefficients that reverse sign when you start controlling for things?

15 0.79485142 816 andrew gelman stats-2011-07-22-“Information visualization” vs. “Statistical graphics”

16 0.78790206 989 andrew gelman stats-2011-11-03-This post does not mention Wegman

17 0.78553259 1876 andrew gelman stats-2013-05-29-Another one of those “Psychological Science” papers (this time on biceps size and political attitudes among college students)

18 0.78388238 1146 andrew gelman stats-2012-01-30-Convenient page of data sources from the Washington Post

19 0.78083122 2015 andrew gelman stats-2013-09-10-The ethics of lying, cheating, and stealing with data: A case study

20 0.77666795 35 andrew gelman stats-2010-05-16-Another update on the spam email study