andrew_gelman_stats andrew_gelman_stats-2012 andrew_gelman_stats-2012-1615 knowledge-graph by maker-knowledge-mining

1615 andrew gelman stats-2012-12-10-A defense of Tom Wolfe based on the impossibility of the law of small numbers in network structure


meta info for this blog

Source: html

Introduction: A tall thin young man came to my office today to talk about one of my current pet topics: stories and social science. I brought up Tom Wolfe and his goal of compressing an entire city into a single novel, and how this reminded me of the psychologists Kahneman and Tversky’s concept of “the law of small numbers,” the idea that we expect any small sample to replicate all the properties of the larger population that it represents. Strictly speaking, the law of small numbers is impossible—any small sample necessarily has its own unique features—but this is even more true if we consider network properties. The average American knows about 700 people (depending on how you define “know”) and this defines a social network over the population. Now suppose you look at a few hundred people and all their connections. This mini-network will almost necessarily look much much sparser than the national network, as we’re removing the connections to the people not in the sample. Now consider how


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 A tall thin young man came to my office today to talk about one of my current pet topics: stories and social science. [sent-1, score-0.446]

2 Strictly speaking, the law of small numbers is impossible—any small sample necessarily has its own unique features—but this is even more true if we consider network properties. [sent-3, score-1.29]

3 The average American knows about 700 people (depending on how you define “know”) and this defines a social network over the population. [sent-4, score-0.48]

4 Now suppose you look at a few hundred people and all their connections. [sent-5, score-0.091]

5 This mini-network will almost necessarily look much much sparser than the national network, as we’re removing the connections to the people not in the sample. [sent-6, score-0.665]

6 For novelistic reasons he can only have a handful of major characters and a few dozen or so minor characters. [sent-8, score-0.447]

7 If he gives them a realistic level of interconnections, the resulting network will not be a reasonable small-scale replica of society at large—it will be too sparsely connected. [sent-9, score-0.889]

8 To make his story-network realistic in a larger sense, he has to overload his characters with connections (“coincidences”) beyond what would actually arise in a group this small. [sent-10, score-1.075]

9 Thus, it’s not fair to slam Wolfe for having too many connections or coincidences in his books—these are a necessary artifice that allows him to achieve a realistic density of connections in a small group of characters. [sent-11, score-1.796]
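
The sparsity argument in sentences 3 to 5 can be checked with a little arithmetic. What follows is a minimal sketch, not anything taken from the original post. It assumes the sample is drawn uniformly at random and independently of who knows whom, and it plugs in a population of roughly 300 million, a figure the post does not state. Under those assumptions, each acquaintance of a sampled person is also in the sample with probability (n - 1)/(N - 1), so the expected within-sample degree is the mean degree times that fraction. The function name `expected_sample_degree` is hypothetical.

```python
def expected_sample_degree(population: int, sample: int, mean_degree: float) -> float:
    """Expected number of within-sample connections per sampled person.

    Assumes a uniform random sample drawn independently of the network:
    each acquaintance is retained with probability (sample - 1) / (population - 1).
    """
    if population < 2 or not 1 <= sample <= population:
        raise ValueError("need population >= 2 and 1 <= sample <= population")
    return mean_degree * (sample - 1) / (population - 1)


# Illustrative inputs: 700 acquaintances per person is from the post;
# the population (300 million) and sample size (300) are assumptions.
full_degree = 700
sampled = expected_sample_degree(population=300_000_000, sample=300, mean_degree=full_degree)
print(f"mean degree in full network: {full_degree}")
print(f"expected mean degree within a random sample of 300: {sampled:.6f}")
```

Under uniform sampling almost every tie is lost. Even a tightly clustered group of 300 caps each member at 299 within-group ties, below the 700 they have in the full network.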


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('connections', 0.309), ('network', 0.305), ('coincidences', 0.294), ('realistic', 0.258), ('wolfe', 0.22), ('small', 0.199), ('characters', 0.175), ('tom', 0.156), ('overload', 0.134), ('sparser', 0.134), ('compressing', 0.126), ('replica', 0.126), ('law', 0.126), ('necessarily', 0.125), ('sparsely', 0.121), ('handful', 0.105), ('larger', 0.102), ('thin', 0.1), ('defines', 0.098), ('tall', 0.098), ('pet', 0.098), ('tversky', 0.097), ('removing', 0.097), ('group', 0.097), ('kahneman', 0.096), ('dozen', 0.093), ('numbers', 0.093), ('hundred', 0.091), ('achieve', 0.09), ('strictly', 0.09), ('sample', 0.09), ('slam', 0.088), ('depending', 0.085), ('properties', 0.085), ('brought', 0.083), ('applies', 0.083), ('consider', 0.082), ('novel', 0.082), ('replicate', 0.081), ('density', 0.08), ('resulting', 0.079), ('social', 0.077), ('psychologists', 0.076), ('impossible', 0.076), ('concept', 0.075), ('minor', 0.074), ('young', 0.073), ('allows', 0.072), ('city', 0.071), ('unique', 0.071)]
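
The page does not say how these weights or the similarity scores below were computed. As a hedged illustration only, the sketch here uses one common tf-idf variant (raw term count times log(N / document frequency), then L2 normalization) and scores document pairs by cosine similarity. The helper names `tfidf_vectors` and `cosine` and the three toy documents are hypothetical, and the numbers will not reproduce the values listed on this page.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalized {word: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        raw = {w: c * math.log(n_docs / doc_freq[w]) for w, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in raw.values()))
        if norm > 0:
            vectors.append({w: v / norm for w, v in raw.items() if v > 0})
        else:
            vectors.append({})
    return vectors


def cosine(u, v):
    """Dot product of two sparse vectors; this is the cosine when both have unit length."""
    if len(u) > len(v):
        u, v = v, u
    return sum(weight * v.get(word, 0.0) for word, weight in u.items())


# Toy corpus, made up for illustration.
docs = [
    "small network connections coincidences".split(),
    "small sample effects".split(),
    "bayesian network variables".split(),
]
vecs = tfidf_vectors(docs)
for i, vec in enumerate(vecs):
    print(i, round(cosine(vecs[0], vec), 3))
```

With unit-length vectors a document's similarity to itself is 1 up to rounding, which is consistent with the same-blog row scoring just under 1 in the tfidf list.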

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999988 1615 andrew gelman stats-2012-12-10-A defense of Tom Wolfe based on the impossibility of the law of small numbers in network structure

2 0.13940276 963 andrew gelman stats-2011-10-18-Question on Type M errors

Introduction: Inti Pedroso writes: Today during the group meeting at my new job we were revising a paper whose main conclusions were sustained by an ANOVA. One of the first observations is that the experiment had a small sample size. Interestingly (may not so), some of the reported effects (most of them interactions) were quite large. One of the experience group members said that “there is a common wisdom that one should not believe effects from small sample sizes but [he thinks] if they [the effects] are large enough to be picked on a small study they must be real large effects”. I argued that if the sample size is small one could incur on a M-type error in which the magnitude of the effect is being over-estimated and that if larger samples are evaluated the magnitude may become smaller and also the confidence intervals. The concept of M-type error is completely new to all other members of the group (on which I am in my second week) and I was given the job of finding a suitable ref to explain

3 0.12889259 1228 andrew gelman stats-2012-03-25-Continuous variables in Bayesian networks

Introduction: Antti Rasinen writes: I’m a former undergrad machine learning student and a current software engineer with a Bayesian hobby. Today my two worlds collided. I ask for some enlightenment. On your blog you’ve repeatedly advocated continuous distributions with Bayesian models. Today I read this article by Ricky Ho, who writes: The strength of Bayesian network is it is highly scalable and can learn incrementally because all we do is to count the observed variables and update the probability distribution table. Similar to Neural Network, Bayesian network expects all data to be binary, categorical variable will need to be transformed into multiple binary variable as described above. Numeric variable is generally not a good fit for Bayesian network. The last sentence seems to be at odds with what you’ve said. Sadly, I don’t have enough expertise to say which view of the world is correct. During my undergrad years our team wrote an implementation of the Junction Tree algorithm. We r

4 0.12825772 1191 andrew gelman stats-2012-03-01-Hoe noem je?

Introduction: Gerrit Storms reports on an interesting linguistic research project in which you can participate! Here’s the description: Over the past few weeks, we have been trying to set up a scientific study that is important for many researchers interested in words, word meaning, semantics, and cognitive science in general. It is a huge word association project, in which people are asked to participate in a small task that doesn’t last longer than 5 minutes. Our goal is to build a global word association network that contains connections between about 40,000 words, the size of the lexicon of an average adult. Setting up such a network might learn us a lot about semantic memory, how it develops, and maybe also about how it can deteriorate (like in Alzheimer’s disease). Most people enjoy doing the task, but we need thousands of participants to succeed. Up till today, we found about 53,000 participants willing to do the little task, but we need more subjects. That is why we address you. Would

5 0.10509109 1412 andrew gelman stats-2012-07-10-More questions on the contagion of obesity, height, etc.

Introduction: AT discusses [link broken; see P.P.S. below] a new paper of his that casts doubt on the robustness of the controversial Christakis and Fowler papers. AT writes that he ran some simulations of contagion on social networks and found that (a) in a simple model assuming the contagion of the sort hypothesized by Christakis and Fowler, their procedure would indeed give the sorts of estimates they found in their papers, but (b) in another simple model assuming a different sort of contagion, the C&F estimation would give indistinguishable estimates. Thus, if you believe AT’s simulation model, C&F’s procedure cannot statistically distinguish between two sorts of contagion (directional and simultaneous). I have not looked at AT’s paper so I can’t fully comment, but I don’t fully understand his method for simulating network connections. AT uses what he calls a “rewiring” model. This makes sense: as time progresses, we make new friends and lose old ones—but I am confused by the details

6 0.095859066 756 andrew gelman stats-2011-06-10-Christakis-Fowler update

7 0.081348754 2251 andrew gelman stats-2014-03-17-In the best alternative histories, the real world is what’s ultimately real

8 0.080970816 137 andrew gelman stats-2010-07-10-Cost of communicating numbers

9 0.080200545 1317 andrew gelman stats-2012-05-13-Question 3 of my final exam for Design and Analysis of Sample Surveys

10 0.079350837 1381 andrew gelman stats-2012-06-16-The Art of Fielding

11 0.079220787 1904 andrew gelman stats-2013-06-18-Job opening! Come work with us!

12 0.071711421 2255 andrew gelman stats-2014-03-19-How Americans vote

13 0.071350157 148 andrew gelman stats-2010-07-15-“Gender Bias Still Exists in Modern Children’s Literature, Say Centre Researchers”

14 0.070612088 925 andrew gelman stats-2011-09-26-Ethnicity and Population Structure in Personal Naming Networks

15 0.070190348 2330 andrew gelman stats-2014-05-12-Historical Arc of Universities

16 0.067949817 1785 andrew gelman stats-2013-04-02-So much artistic talent

17 0.066620275 695 andrew gelman stats-2011-05-04-Statistics ethics question

18 0.065168694 2258 andrew gelman stats-2014-03-21-Random matrices in the news

19 0.065002806 820 andrew gelman stats-2011-07-25-Design of nonrandomized cluster sample study

20 0.064515524 1945 andrew gelman stats-2013-07-18-“How big is your chance of dying in an ordinary play?”


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.123), (1, -0.025), (2, 0.029), (3, -0.024), (4, 0.003), (5, 0.012), (6, -0.006), (7, 0.017), (8, -0.013), (9, 0.008), (10, -0.026), (11, -0.035), (12, 0.018), (13, 0.02), (14, 0.001), (15, 0.003), (16, -0.02), (17, 0.002), (18, 0.07), (19, -0.009), (20, -0.026), (21, -0.025), (22, -0.014), (23, 0.018), (24, -0.014), (25, 0.002), (26, -0.002), (27, 0.029), (28, -0.005), (29, 0.005), (30, 0.002), (31, 0.006), (32, -0.029), (33, -0.009), (34, 0.052), (35, 0.024), (36, -0.016), (37, 0.002), (38, 0.01), (39, -0.022), (40, 0.032), (41, -0.017), (42, 0.016), (43, 0.018), (44, -0.029), (45, 0.01), (46, 0.016), (47, 0.009), (48, 0.0), (49, 0.002)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97136509 1615 andrew gelman stats-2012-12-10-A defense of Tom Wolfe based on the impossibility of the law of small numbers in network structure

2 0.7485376 137 andrew gelman stats-2010-07-10-Cost of communicating numbers

Introduction: Freakonomics reports: A reader in Norway named Christian Sørensen examined the height statistics for all players in the 2010 World Cup and found an interesting anomaly: there seemed to be unnaturally few players listed at 169, 179, and 189 centimeters and an apparent surplus of players who were 170, 180, and 190 centimeters tall (roughly 5-foot-7 inches, 5-foot-11 inches, and 6-foot-3 inches, respectively). Here’s the data: It’s not costless to communicate numbers. When we compare “eighty” (6 characters) vs “seventy-nine” (12 characters) – how much information are we gaining by twice the number of characters? Do people really care about height at ±0.5 cm or is ±1 cm enough? It’s harder to communicate odd numbers (“three” vs four or two, “seven” vs “six” or “eight”, “nine” vs “ten”) than even ones. As language tends to follow our behaviors, people have been doing it for a long time. We remember the shorter description of a quantity. This is my theory why we end up wi

3 0.70645612 719 andrew gelman stats-2011-05-19-Everything is Obvious (once you know the answer)

Introduction: Duncan Watts gave his new book the above title, reflecting his irritation with those annoying people who, upon hearing of the latest social science research, reply with: Duh-I-knew-that. (I don’t know how to say Duh in Australian; maybe someone can translate that for me?) I, like Duncan, am easily irritated, and I looked forward to reading the book. I enjoyed it a lot, even though it has only one graph, and that graph has a problem with its y-axis. (OK, the book also has two diagrams and a graph of fake data, but that doesn’t count.) Before going on, let me say that I agree wholeheartedly with Duncan’s central point: social science research findings are often surprising, but the best results cause us to rethink our world in such a way that they seem completely obvious, in retrospect. (Don Rubin used to tell us that there’s no such thing as a “paradox”: once you fully understand a phenomenon, it should not seem paradoxical any more. When learning science, we sometimes speak

4 0.70493931 2184 andrew gelman stats-2014-01-24-Parables vs. stories

Introduction: God is in every leaf of every tree, but he is not in every leaf of every parable. Let me explain with a story. A few months ago I read the new book, Doing Data Science, by Rachel Schutt and Cathy O’Neil, and I came across the following motivation for comprehensive integration of data sources, a story that is reminiscent of the parables we sometimes see in business books: By some estimates, one or two patients died per week in a certain smallish town because of the lack of information flow between the hospital’s emergency room and the nearby mental health clinic. In other words, if the records had been easier to match, they’d have been able to save more lives. On the other hand, if it had been easy to match records, other breaches of confidence might also have occurred. Of course it’s hard to know exactly how many lives are at stake, but it’s nontrivial. The moral: We can assume we think privacy is a generally good thing. . . . But privacy takes lives too, as we see from

5 0.70350277 1949 andrew gelman stats-2013-07-21-Defensive political science responds defensively to an attack on social science

Introduction: Nicholas Christakis, a medical scientist perhaps best known for his controversial claim (see also here), based on joint work with James Fowler, that obesity is contagious, writes: The social sciences have stagnated. They offer essentially the same set of academic departments and disciplines that they have for nearly 100 years: sociology, economics, anthropology, psychology and political science. This is not only boring but also counterproductive, constraining engagement with the scientific cutting edge and stifling the creation of new and useful knowledge. . . . I’m not suggesting that social scientists stop teaching and investigating classic topics like monopoly power, racial profiling and health inequality. But everyone knows that monopoly power is bad for markets, that people are racially biased and that illness is unequally distributed by social class. There are diminishing returns from the continuing study of many such topics. And repeatedly observing these phenomen

6 0.68665075 1715 andrew gelman stats-2013-02-09-Thomas Hobbes would be spinning in his grave

7 0.68244076 1947 andrew gelman stats-2013-07-20-We are what we are studying

8 0.68147737 1123 andrew gelman stats-2012-01-17-Big corporations are more popular than you might realize

9 0.680848 116 andrew gelman stats-2010-06-29-How to grab power in a democracy – in 5 easy non-violent steps

10 0.67997134 1892 andrew gelman stats-2013-06-10-I don’t think we get much out of framing politics as the Tragic Vision vs. the Utopian Vision

11 0.67780668 157 andrew gelman stats-2010-07-21-Roller coasters, charity, profit, hmmm

12 0.677293 1453 andrew gelman stats-2012-08-10-Quotes from me!

13 0.67712462 2199 andrew gelman stats-2014-02-04-Widening the goalposts in medical trials

14 0.672234 2084 andrew gelman stats-2013-11-01-Doing Data Science: What’s it all about?

15 0.67185861 70 andrew gelman stats-2010-06-07-Mister P goes on a date

16 0.66366607 1191 andrew gelman stats-2012-03-01-Hoe noem je?

17 0.65655124 1952 andrew gelman stats-2013-07-23-Christakis response to my comment on his comments on social science (or just skip to the P.P.P.S. at the end)

18 0.65646952 1114 andrew gelman stats-2012-01-12-Controversy about average personality differences between men and women

19 0.65412217 2223 andrew gelman stats-2014-02-24-“Edlin’s rule” for routinely scaling down published estimates

20 0.65296656 1838 andrew gelman stats-2013-05-03-Setting aside the politics, the debate over the new health-care study reveals that we’re moving to a new high standard of statistical journalism


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(1, 0.014), (13, 0.025), (16, 0.06), (21, 0.28), (24, 0.18), (27, 0.012), (54, 0.019), (79, 0.012), (81, 0.012), (86, 0.021), (98, 0.022), (99, 0.235)]
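
One way to read the list above is as sparse (topic id, weight) pairs. The page does not state this, so treat the following as a sketch under that assumption: the weights are taken to be shares of a topic distribution, with any topic missing from the list presumed to fall below some reporting threshold. The helper names `top_topics` and `listed_mass` are hypothetical.

```python
# Sparse (topic id, weight) pairs copied from the "lda for this blog" list above.
lda_topics = [
    (1, 0.014), (13, 0.025), (16, 0.06), (21, 0.28), (24, 0.18), (27, 0.012),
    (54, 0.019), (79, 0.012), (81, 0.012), (86, 0.021), (98, 0.022), (99, 0.235),
]


def top_topics(pairs, k=3):
    """Return the k pairs with the largest weight, heaviest first."""
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:k]


def listed_mass(pairs):
    """Total weight carried by the topics that appear in the list."""
    return sum(weight for _, weight in pairs)


print("dominant topics:", top_topics(lda_topics))
print("weight covered by listed topics:", round(listed_mass(lda_topics), 3))
```

Three topics (21, 99 and 24) carry most of the listed weight. If the weights are indeed shares of a distribution, the remainder would belong to topics too small to be listed.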

similar blogs list:

simIndex simValue blogId blogTitle

1 0.97453797 672 andrew gelman stats-2011-04-20-The R code for those time-use graphs

Introduction: By popular demand, here’s my R script for the time-use graphs:

    # The data
    a1 <- c(4.2,3.2,11.1,1.3,2.2,2.0)
    a2 <- c(3.9,3.2,10.0,0.8,3.1,3.1)
    a3 <- c(6.3,2.5,9.8,0.9,2.2,2.4)
    a4 <- c(4.4,3.1,9.8,0.8,3.3,2.7)
    a5 <- c(4.8,3.0,9.9,0.7,3.3,2.4)
    a6 <- c(4.0,3.4,10.5,0.7,3.3,2.1)
    a <- rbind(a1,a2,a3,a4,a5,a6)
    avg <- colMeans (a)
    avg.array <- t (array (avg, rev(dim(a))))
    diff <- a - avg.array
    country.name <- c("France", "Germany", "Japan", "Britain", "USA", "Turkey")

    # The line plots
    par (mfrow=c(2,3), mar=c(4,4,2,.5), mgp=c(2,.7,0), tck=-.02, oma=c(3,0,4,0), bg="gray96", fg="gray30")
    for (i in 1:6){
      plot (c(1,6), c(-1,1.7), xlab="", ylab="", xaxt="n", yaxt="n", bty="l", type="n")
      lines (1:6, diff[i,], col="blue")
      points (1:6, diff[i,], pch=19, col="black")
      if (i>3){
        axis (1, c(1,3,5), c ("Work,\nstudy", "Eat,\nsleep", "Leisure"), mgp=c(2,1.5,0), tck=0, cex.axis=1.2)
        axis (1, c(2,4,6), c ("Unpaid\nwork", "Personal\nCare", "Other"), mgp=c(2,1.5,0),

same-blog 2 0.95385802 1615 andrew gelman stats-2012-12-10-A defense of Tom Wolfe based on the impossibility of the law of small numbers in network structure

3 0.95346057 2298 andrew gelman stats-2014-04-21-On deck this week

Introduction: Mon : Ticket to Baaaath Tues : Ticket to Baaaaarf Wed : Thinking of doing a list experiment? Here’s a list of reasons why you should think again Thurs : An open site for researchers to post and share papers Fri : Questions about “Too Good to Be True” Sat : Sleazy sock puppet can’t stop spamming our discussion of compressed sensing and promoting the work of Xiteng Liu Sun : White stripes and dead armadillos

4 0.94906342 151 andrew gelman stats-2010-07-16-Wanted: Probability distributions for rank orderings

Introduction: Dietrich Stoyan writes: I asked the IMS people for an expert in statistics of voting/elections and they wrote me your name. I am a statistician, but never worked in the field voting/elections. It was my son-in-law who asked me for statistical theories in that field. He posed in particular the following problem: The aim of the voting is to come to a ranking of c candidates. Every vote is a permutation of these c candidates. The problem is to have probability distributions in the set of all permutations of c elements. Are there theories for such distributions? I should be very grateful for a fast answer with hints to literature. (I confess that I do not know your books.) My reply: Rather than trying to model the ranks directly, I’d recommend modeling a latent continuous outcome which then implies a distribution on ranks, if the ranks are of interest. There are lots of distributions of c-dimensional continuous outcomes. In political science, the usual way to start is

5 0.93931174 1401 andrew gelman stats-2012-06-30-David Hogg on statistics

Introduction: Data analysis recipes: Fitting a model to data: We go through the many considerations involved in fitting a model to data, using as an example the fit of a straight line to a set of points in a two-dimensional plane. Standard weighted least-squares fitting is only appropriate when there is a dimension along which the data points have negligible uncertainties, and another along which all the uncertainties can be described by Gaussians of known variance; these conditions are rarely met in practice. We consider cases of general, heterogeneous, and arbitrarily covariant two-dimensional uncertainties, and situations in which there are bad data (large outliers), unknown uncertainties, and unknown but expected intrinsic scatter in the linear relationship being fit. Above all we emphasize the importance of having a “generative model” for the data, even an approximate one. Once there is a generative model, the subsequent fitting is non-arbitrary because the model permits direct computation

6 0.93656546 432 andrew gelman stats-2010-11-27-Neumann update

7 0.93261516 1826 andrew gelman stats-2013-04-26-“A Vast Graveyard of Undead Theories: Publication Bias and Psychological Science’s Aversion to the Null”

8 0.92061979 1675 andrew gelman stats-2013-01-15-“10 Things You Need to Know About Causal Effects”

9 0.91910017 1275 andrew gelman stats-2012-04-22-Please stop me before I barf again

10 0.91715991 894 andrew gelman stats-2011-09-07-Hipmunk FAIL: Graphics without content is not enough

11 0.91650522 1232 andrew gelman stats-2012-03-27-Banned in NYC school tests

12 0.91060233 514 andrew gelman stats-2011-01-13-News coverage of statistical issues…how did I do?

13 0.90910614 62 andrew gelman stats-2010-06-01-Two Postdoc Positions Available on Bayesian Hierarchical Modeling

14 0.89988089 2306 andrew gelman stats-2014-04-26-Sleazy sock puppet can’t stop spamming our discussion of compressed sensing and promoting the work of Xiteng Liu

15 0.89263165 1857 andrew gelman stats-2013-05-15-Does quantum uncertainty have a place in everyday applied statistics?

16 0.88960791 854 andrew gelman stats-2011-08-15-A silly paper that tries to make fun of multilevel models

17 0.88808346 810 andrew gelman stats-2011-07-20-Adding more information can make the variance go up (depending on your model)

18 0.88502371 1728 andrew gelman stats-2013-02-19-The grasshopper wins, and Greg Mankiw’s grandmother would be “shocked and appalled” all over again

19 0.87450206 659 andrew gelman stats-2011-04-13-Jim Campbell argues that Larry Bartels’s “Unequal Democracy” findings are not robust

20 0.86317265 2037 andrew gelman stats-2013-09-25-Classical probability does not apply to quantum systems (causal inference edition)