2185 andrew gelman stats-2014-01-25-Xihong Lin on sparsity and density


meta info for this blog

Source: html

Introduction: I pointed Xihong Lin to this post from last month regarding Hastie and Tibshirani’s “bet on sparsity principle.” I argued that, in the worlds in which I work, in social and environmental science, every contrast is meaningful, even if not all of them can be distinguished from noise given a particular dataset. That is, I claim that effects are dense but data can be sparse, and any apparent sparsity of effects is typically just an artifact of sparsity of data. But things might be different in other fields. Xihong had an interesting perspective in the application areas where she works: Sparsity and density both appear in genetic studies too. For example, ethnicity has effects across millions of genetic variants across the genome (dense). Disease-associated genetic variants are sparse.


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 I pointed Xihong Lin to this post from last month regarding Hastie and Tibshirani’s “bet on sparsity principle.” [sent-1, score-0.808]

2 I argued that, in the worlds in which I work, in social and environmental science, every contrast is meaningful, even if not all of them can be distinguished from noise given a particular dataset. [sent-2, score-0.741]

3 That is, I claim that effects are dense but data can be sparse—and any apparent sparsity of effects is typically just an artifact of sparsity of data. [sent-3, score-1.957]

4 Xihong had an interesting perspective in the application areas where she works: Sparsity and density both appear in genetic studies too. [sent-5, score-0.771]

5 For example, ethnicity has effects across millions of genetic variants across the genome (dense). [sent-6, score-1.284]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('sparsity', 0.52), ('xihong', 0.347), ('genetic', 0.31), ('dense', 0.275), ('variants', 0.267), ('sparse', 0.218), ('lin', 0.158), ('genome', 0.149), ('effects', 0.147), ('artifact', 0.13), ('hastie', 0.127), ('tibshirani', 0.118), ('across', 0.112), ('distinguished', 0.112), ('worlds', 0.109), ('apparent', 0.102), ('environmental', 0.1), ('ethnicity', 0.098), ('meaningful', 0.097), ('disease', 0.095), ('bet', 0.095), ('density', 0.095), ('argued', 0.089), ('millions', 0.089), ('noise', 0.088), ('application', 0.076), ('month', 0.076), ('areas', 0.07), ('associated', 0.07), ('contrast', 0.07), ('appear', 0.066), ('pointed', 0.065), ('works', 0.064), ('regarding', 0.061), ('typically', 0.061), ('perspective', 0.06), ('claim', 0.055), ('studies', 0.054), ('every', 0.049), ('social', 0.045), ('last', 0.045), ('particular', 0.041), ('post', 0.041), ('interesting', 0.04), ('science', 0.038), ('given', 0.038), ('things', 0.035), ('different', 0.031), ('might', 0.028), ('work', 0.027)]
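
How the word weights above were computed is not stated on this page. As a rough illustration only, here is a minimal sketch of one common TF-IDF variant (term count times a smoothed inverse document frequency, then L2 normalisation). The tokenisation, the smoothing formula, the normalisation, and the three toy documents are all assumptions, not the pipeline that produced the listing.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {word: weight} dict per document, L2-normalised.

    Hypothetical helper: whitespace tokenisation and smoothed idf are
    assumptions chosen for illustration.
    """
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for tokens in tokenised for word in set(tokens))
    weights = []
    for tokens in tokenised:
        tf = Counter(tokens)
        # Smoothed idf: log((1 + N) / (1 + df)) + 1, so no weight is zero.
        raw = {
            w: c * (math.log((1 + n_docs) / (1 + df[w])) + 1.0)
            for w, c in tf.items()
        }
        norm = math.sqrt(sum(v * v for v in raw.values())) or 1.0
        weights.append({w: v / norm for w, v in raw.items()})
    return weights


# Toy corpus (made up for this sketch).
docs = [
    "sparsity sparsity genetic variants dense effects",
    "social science effects noise",
    "genetic tests regulation",
]
w = tfidf_weights(docs)
# Top-N words for the first document, highest weight first.
top = sorted(w[0].items(), key=lambda kv: kv[1], reverse=True)[:3]
print(top)
```

In this toy corpus, "sparsity" ranks first for the first document because it is both repeated there and absent from the other documents, which is the same qualitative pattern as the topN list above.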

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 2185 andrew gelman stats-2014-01-25-Xihong Lin on sparsity and density

2 0.43940607 2136 andrew gelman stats-2013-12-16-Whither the “bet on sparsity principle” in a nonsparse world?

Introduction: Rob Tibshirani writes : Hastie et al. (2001) coined the informal “Bet on Sparsity” principle. The l1 methods assume that the truth is sparse, in some basis. If the assumption holds true, then the parameters can be efficiently estimated using l1 penalties. If the assumption does not hold—so that the truth is dense—then no method will be able to recover the underlying model without a large amount of data per parameter. I’ve earlier expressed my full and sincere appreciation for Hastie and Tibshirani’s work in this area. Now I’d like to briefly comment on the above snippet. The question is, how do we think about the “bet on sparsity” principle in a world where the truth is dense? I’m thinking here of social science, where no effects are clean and no coefficient is zero (see page 960 of this article or various blog discussions in the past few years), where every contrast is meaningful—but some of these contrasts might be lost in the noise with any realistic size of data.
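
To make the l1 idea in the quoted passage concrete, here is a minimal sketch under a strong simplifying assumption: with an orthonormal design, the lasso solution for the objective (1/2)||y - Xb||^2 + lam * ||b||_1 is the least-squares estimate passed through soft-thresholding. The coefficient values and the penalty `lam` below are made up for illustration.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical least-squares estimates: every effect is nonzero (a "dense" truth),
# but several are small relative to the penalty.
estimates = [3.0, -0.4, 0.2, -2.5, 0.05]
lam = 0.5  # assumed penalty level

shrunk = [soft_threshold(z, lam) for z in estimates]
print(shrunk)
```

The small estimates are set exactly to zero even though none of the inputs is zero, which is one way an l1 fit can look sparse when the underlying effects are dense but weakly estimated.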

3 0.12644647 2317 andrew gelman stats-2014-05-04-Honored oldsters write about statistics

Introduction: The new book titled: Past, Present, and Future of Statistical Science is now available for download . The official description makes the book sound pretty stuffy: Past, Present, and Future of Statistical Science, commissioned by the Committee of Presidents of Statistical Societies (COPSS) to celebrate its 50th anniversary and the International Year of Statistics, will be published in April by Taylor & Francis/CRC Press. Through the contributions of a distinguished group of 50 statisticians, the book showcases the breadth and vibrancy of statistics, describes current challenges and new opportunities, highlights the exciting future of statistical science, and provides guidance for future statisticians. Contributors are past COPSS award honorees. But it actually has lots of good stuff, including the chapter by Tibshirani which I discussed last year (in the context of the “bet on sparsity principle”), and chapters by XL and other fun people. Also my own chapter, How do we choo

4 0.1234531 2121 andrew gelman stats-2013-12-02-Should personal genetic testing be regulated? Battle of the blogroll

Introduction: On the side of less regulation is Alex Tabarrok in “Our DNA, Our Selves”: At the same time that the NSA is secretly and illegally obtaining information about Americans the FDA is making it illegal for Americans to obtain information about themselves. In a warning letter the FDA has told Anne Wojcicki, The Most Daring CEO In America, that she “must immediately discontinue” selling 23andMe’s Personal Genome Service . . . Alex clarifies: I am not offended by all regulation of genetic tests. Indeed, genetic tests are already regulated. . . . the Clinical Laboratory Improvement Amendments (CLIA) . . . requires all labs, including the labs used by 23andMe, to be inspected for quality control, record keeping and the qualifications of their personnel. . . . What the FDA wants to do is categorically different. The FDA wants to regulate genetic tests as a high-risk medical device . . . the FDA wants to judge . . . the clinical validity, whether particular identified alleles are cau

5 0.11957597 1107 andrew gelman stats-2012-01-08-More on essentialism

Introduction: Matthieu Authier writes: I just read Genetic essentialism is in our genes . Here are a few papers from Kenneth Weiss about this missing heritability problem and genetic essentialism: Evol.Ant.2011 – Weiss – Seeing the forest through the gene-trees Genetics.2011 – Weiss.&.Buchanan – Is life-law-like

6 0.10536267 2079 andrew gelman stats-2013-10-27-Uncompressing the concept of compressed sensing

7 0.096659951 1877 andrew gelman stats-2013-05-30-Infill asymptotics and sprawl asymptotics

8 0.088059165 1666 andrew gelman stats-2013-01-10-They’d rather be rigorous than right

9 0.086618997 1319 andrew gelman stats-2012-05-14-I hate to get all Gerd Gigerenzer on you here, but . . .

10 0.07746955 327 andrew gelman stats-2010-10-07-There are never 70 distinct parameters

11 0.066453323 303 andrew gelman stats-2010-09-28-“Genomics” vs. genetics

12 0.065097429 1644 andrew gelman stats-2012-12-30-Fixed effects, followed by Bayes shrinkage?

13 0.064389735 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors

14 0.062884577 1769 andrew gelman stats-2013-03-18-Tibshirani announces new research result: A significance test for the lasso

15 0.062503673 433 andrew gelman stats-2010-11-27-One way that psychology research is different than medical research

16 0.06125953 1744 andrew gelman stats-2013-03-01-Why big effects are more important than small effects

17 0.060586609 1665 andrew gelman stats-2013-01-10-That controversial claim that high genetic diversity, or low genetic diversity, is bad for the economy

18 0.059189424 56 andrew gelman stats-2010-05-28-Another argument in favor of expressing conditional probability statements using the population distribution

19 0.057069581 342 andrew gelman stats-2010-10-14-Trying to be precise about vagueness

20 0.054472528 830 andrew gelman stats-2011-07-29-Introductory overview lectures at the Joint Statistical Meetings in Miami this coming week
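
The page does not say which similarity measure produced the simValue column. The same-blog score of nearly 1 is what cosine similarity of a vector with itself would give, so the sketch below assumes cosine similarity over sparse word-weight vectors. The query weights are taken from the topN list above; the vectors for posts 2136 and 2121 are hypothetical placeholders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {word: weight}."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(query, corpus):
    """Return (score, doc_id) pairs sorted from most to least similar."""
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in corpus.items()]
    return sorted(scored, reverse=True)


# Three leading weights for post 2185, copied from the listing above.
query = {"sparsity": 0.52, "xihong": 0.347, "genetic": 0.31}
corpus = {
    "2185": dict(query),                      # the post itself
    "2136": {"sparsity": 0.6, "bet": 0.3},    # hypothetical weights
    "2121": {"genetic": 0.5, "fda": 0.7},     # hypothetical weights
}
ranked = rank_similar(query, corpus)
for score, doc_id in ranked:
    print(doc_id, round(score, 3))
```

With these placeholder vectors the post matches itself with a score of 1 (up to floating-point error), followed by the post sharing the heavily weighted word "sparsity", then the post sharing only "genetic".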


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.071), (1, 0.007), (2, 0.003), (3, -0.041), (4, -0.01), (5, -0.01), (6, -0.009), (7, -0.006), (8, 0.001), (9, 0.034), (10, -0.042), (11, 0.018), (12, 0.037), (13, -0.021), (14, -0.007), (15, 0.027), (16, -0.018), (17, 0.004), (18, -0.028), (19, 0.001), (20, -0.019), (21, -0.042), (22, -0.018), (23, 0.008), (24, 0.011), (25, 0.01), (26, -0.009), (27, 0.019), (28, 0.008), (29, 0.002), (30, -0.032), (31, -0.004), (32, -0.02), (33, -0.028), (34, 0.058), (35, -0.002), (36, 0.03), (37, -0.007), (38, 0.014), (39, -0.009), (40, -0.008), (41, 0.008), (42, 0.028), (43, -0.022), (44, -0.002), (45, 0.006), (46, -0.002), (47, 0.021), (48, -0.037), (49, 0.013)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9550572 2185 andrew gelman stats-2014-01-25-Xihong Lin on sparsity and density

2 0.63530827 1949 andrew gelman stats-2013-07-21-Defensive political science responds defensively to an attack on social science

Introduction: Nicholas Christakis, a medical scientist perhaps best known for his controversial claim (see also here ), based on joint work with James Fowler, that obesity is contagious, writes : The social sciences have stagnated. They offer essentially the same set of academic departments and disciplines that they have for nearly 100 years: sociology, economics, anthropology, psychology and political science. This is not only boring but also counterproductive, constraining engagement with the scientific cutting edge and stifling the creation of new and useful knowledge. . . . I’m not suggesting that social scientists stop teaching and investigating classic topics like monopoly power, racial profiling and health inequality. But everyone knows that monopoly power is bad for markets, that people are racially biased and that illness is unequally distributed by social class. There are diminishing returns from the continuing study of many such topics. And repeatedly observing these phenomen

3 0.63358146 1744 andrew gelman stats-2013-03-01-Why big effects are more important than small effects

Introduction: The title of this post is silly but I have an important point to make, regarding an implicit model which I think many people assume even though it does not really make sense. Following a link from Sanjay Srivastava, I came across a post from David Funder saying that it’s useful to talk about the sizes of effects (I actually prefer the term “comparisons” so as to avoid the causal baggage) rather than just their signs. I agree , and I wanted to elaborate a bit on a point that comes up in Funder’s discussion. He quotes an (unnamed) prominent social psychologist as writing: The key to our research . . . [is not] to accurately estimate effect size. If I were testing an advertisement for a marketing research firm and wanted to be sure that the cost of the ad would produce enough sales to make it worthwhile, effect size would be crucial. But when I am testing a theory about whether, say, positive mood reduces information processing in comparison with negative mood, I am worried abou

4 0.62614602 1555 andrew gelman stats-2012-10-31-Social scientists who use medical analogies to explain causal inference are, I think, implicitly trying to borrow some of the scientific and cultural authority of that field for our own purposes

Introduction: I’m sorry I don’t have any new zombie papers in time for Halloween. Instead I’d like to be a little monster by reproducing a mini-rant from this article on experimental reasoning in social science: I will restrict my discussion to social science examples. Social scientists are often tempted to illustrate their ideas with examples from medical research. When it comes to medicine, though, we are, with rare exceptions, at best ignorant laypersons (in my case, not even reaching that level), and it is my impression that by reaching for medical analogies we are implicitly trying to borrow some of the scientific and cultural authority of that field for our own purposes. Evidence-based medicine is the subject of a large literature of its own (see, for example, Lau, Ioannidis, and Schmid, 1998).

5 0.62454844 382 andrew gelman stats-2010-10-30-“Presidential Election Outcomes Directly Influence Suicide Rates”

Introduction: This came in the spam the other day: College Station, TX–August 16, 2010–Change and hope were central themes to the November 2008 U.S. presidential election. A new longitudinal study published in the September issue of Social Science Quarterly analyzes suicide rates at a state level from 1981-2005 and determines that presidential election outcomes directly influence suicide rates among voters. In states where the majority of voters supported the national election winner suicide rates decreased. However, counter-intuitively, suicide rates decreased even more dramatically in states where the majority of voters supported the election loser (4.6 percent lower for males and 5.3 lower for females). This article is the first in its field to focus on candidate and state-specific outcomes in relation to suicide rates. Prior research on this topic focused on whether the election process itself influenced suicide rates, and found that suicide rates fell during the election season. Ric

6 0.62381792 1891 andrew gelman stats-2013-06-09-“Heterogeneity of variance in experimental studies: A challenge to conventional interpretations”

7 0.61407137 1186 andrew gelman stats-2012-02-27-Confusion from illusory precision

8 0.60453314 2165 andrew gelman stats-2014-01-09-San Fernando Valley cityscapes: An example of the benefits of fractal devastation?

9 0.59959459 756 andrew gelman stats-2011-06-10-Christakis-Fowler update

10 0.59774888 1310 andrew gelman stats-2012-05-09-Varying treatment effects, again

11 0.58912927 2227 andrew gelman stats-2014-02-27-“What Can we Learn from the Many Labs Replication Project?”

12 0.58782822 2042 andrew gelman stats-2013-09-28-Difficulties of using statistical significance (or lack thereof) to sift through and compare research hypotheses

13 0.58482414 2336 andrew gelman stats-2014-05-16-How much can we learn about individual-level causal claims from state-level correlations?

14 0.58461392 1414 andrew gelman stats-2012-07-12-Steven Pinker’s unconvincing debunking of group selection

15 0.58219588 2093 andrew gelman stats-2013-11-07-I’m negative on the expression “false positives”

16 0.57985526 1492 andrew gelman stats-2012-09-11-Using the “instrumental variables” or “potential outcomes” approach to clarify causal thinking

17 0.57742786 1929 andrew gelman stats-2013-07-07-Stereotype threat!

18 0.57603627 1910 andrew gelman stats-2013-06-22-Struggles over the criticism of the “cannabis users and IQ change” paper

19 0.57459491 1400 andrew gelman stats-2012-06-29-Decline Effect in Linguistics?

20 0.57393122 2136 andrew gelman stats-2013-12-16-Whither the “bet on sparsity principle” in a nonsparse world?


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(16, 0.018), (24, 0.065), (28, 0.038), (35, 0.089), (41, 0.156), (42, 0.016), (68, 0.044), (69, 0.04), (84, 0.112), (95, 0.032), (97, 0.025), (99, 0.228)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.93675673 2185 andrew gelman stats-2014-01-25-Xihong Lin on sparsity and density

2 0.83605802 303 andrew gelman stats-2010-09-28-“Genomics” vs. genetics

Introduction: John Cook and Joseph Delaney point to an article by Yurii Aulchenko et al., who write: 54 loci showing strong statistical evidence for association to human height were described, providing us with potential genomic means of human height prediction. In a population-based study of 5748 people, we find that a 54-loci genomic profile explained 4-6% of the sex- and age-adjusted height variance, and had limited ability to discriminate tall/short people. . . . In a family-based study of 550 people, with both parents having height measurements, we find that the Galtonian mid-parental prediction method explained 40% of the sex- and age-adjusted height variance, and showed high discriminative accuracy. . . . The message is that the simple approach of predicting child’s height using a regression model given parents’ average height performs much better than the method they have based on combining 54 genes. They also find that, if you start with the prediction based on parents’ heigh

3 0.83304101 454 andrew gelman stats-2010-12-07-Diabetes stops at the state line?

Introduction: From Discover : Razib Khan asks: But follow the gradient from El Paso to the Illinois-Missouri border. The differences are small across state lines, but the consistent differences along the borders really don’t make. Are there state-level policies or regulations causing this? Or, are there state-level differences in measurement? This weird pattern shows up in other CDC data I’ve seen. Turns out that CDC isn’t providing data , they’re providing model . Frank Howland answered: I suspect the answer has to do with the manner in which the county estimates are produced. I went to the original data source, the CDC, and then to the relevant FAQ . There they say that the diabetes prevalence estimates come from the “CDC’s Behavioral Risk Factor Surveillance System (BRFSS) and data from the U.S. Census Bureau’s Population Estimates Program. The BRFSS is an ongoing, monthly, state-based telephone survey of the adult population. The survey provides state-specific informati

4 0.82034069 685 andrew gelman stats-2011-04-29-Data mining and allergies

Introduction: With all this data floating around, there are some interesting analyses one can do. I came across “The Association of Tree Pollen Concentration Peaks and Allergy Medication Sales in New York City: 2003-2008″ by Perry Sheffield . There they correlate pollen counts with anti-allergy medicine sales – and indeed find that two days after high pollen counts, the medicine sales are the highest. Of course, it would be interesting to play with the data to see *what* tree is actually causing the sales to increase the most. Perhaps this would help the arborists what trees to plant. At the moment they seem to be following a rather sexist approach to tree planting: Ogren says the city could solve the problem by planting only female trees, which don’t produce pollen like male trees do. City arborists shy away from females because many produce messy – or in the case of ginkgos, smelly – fruit that litters sidewalks. In Ogren’s opinion, that’s a mistake. He says the females only pro

5 0.81663048 1669 andrew gelman stats-2013-01-12-The power of the puzzlegraph

Introduction: The Organisation for Economic Co-operation and Development reports that the following project from Krisztina Szucs and Mate Cziner has won their visualization challenge, “launched in September 2012 to solicit visualisations based on the OECD’s data-rich Education at a Glance report”: (The graph is interactive. Click on the above image and click again to see the full version.) From the press release: Entries from around the world focused on data related to the economic costs and return on investment in education . . . [The winning entry] takes a detailed look at public vs. private and men vs. women for selected countries . . . The judges were particularly impressed by the angled slope format of the visualisation, which encourages comparison between the upper-secondary and tertiary benefits of education. Szucs and Cziner were also lauded for their striking visual design, which draws users into exploring their piece [emphasis added]. I used boldface to highlight a p

6 0.80859512 1626 andrew gelman stats-2012-12-16-The lamest, grudgingest, non-retraction retraction ever

7 0.80465865 516 andrew gelman stats-2011-01-14-A new idea for a science core course based entirely on computer simulation

8 0.80047655 1013 andrew gelman stats-2011-11-16-My talk at Math for America on Saturday

9 0.79791093 2311 andrew gelman stats-2014-04-29-Bayesian Uncertainty Quantification for Differential Equations!

10 0.79409349 1214 andrew gelman stats-2012-03-15-Of forecasts and graph theory and characterizing a statistical method by the information it uses

11 0.78841573 2202 andrew gelman stats-2014-02-07-Outrage of the week

12 0.78499281 1895 andrew gelman stats-2013-06-12-Peter Thiel is writing another book!

13 0.78257346 2204 andrew gelman stats-2014-02-09-Keli Liu and Xiao-Li Meng on Simpson’s paradox

14 0.78233826 1816 andrew gelman stats-2013-04-21-Exponential increase in the number of stat majors

15 0.78233063 235 andrew gelman stats-2010-08-25-Term Limits for the Supreme Court?

16 0.7821849 490 andrew gelman stats-2010-12-29-Brain Structure and the Big Five

17 0.77373624 1300 andrew gelman stats-2012-05-05-Recently in the sister blog

18 0.77176869 1352 andrew gelman stats-2012-05-29-Question 19 of my final exam for Design and Analysis of Sample Surveys

19 0.77079546 1181 andrew gelman stats-2012-02-23-Philosophy: Pointer to Salmon

20 0.7702167 2222 andrew gelman stats-2014-02-24-On deck this week