andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-1761 knowledge-graph by maker-knowledge-mining

1761 andrew gelman stats-2013-03-13-Lame Statistics Patents


meta infos for this blog

Source: html

Introduction: Manoel Galdino wrote in a comment off-topic on another post (which I erased): I know you commented before about patents on statistical methods. Did you know this patent (http://www.archpatent.com/patents/8032473)? Do you have any comment on patents that don’t describe mathematically how it works and how and if they’re any different from previous methods? And what about the lack of scientific validation of the claims in such a method? The patent in question, US 8032473, “Generalized reduced error logistic regression method,” begins with the following “claim”: A system for machine learning comprising: a computer including a computer-readable medium having software stored thereon that, when executed by said computer, performs a method comprising the steps of being trained to learn a logistic regression match to a target class variable so to exhibit classification learning by which: an estimated error in each variable’s moment in the logistic regression be modeled and reduce


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Manoel Galdino wrote in a comment off-topic on another post (which I erased): I know you commented before about patents on statistical methods. [sent-1, score-0.223]

2 Do you have any comment on patents that don’t describe mathematically how it works and how and if they’re any different from previous methods? [sent-5, score-0.223]

3 And what about the lack of scientific validation of the claims in such a method? [sent-6, score-0.134]

4 I have no idea what Andrew’s take on patents is, but my own experience in computer science is that you can patent just about anything with enough patience. [sent-8, score-0.764]

5 There’s no “scientific validation” component to the patent process in the sense that you need peer-reviewed citations (not that you should trust peer review any more than patent-office review). [sent-9, score-0.595]

6 There’s supposed to be a novelty component in that you’re only supposed to be able to patent something that’s not obvious to someone “skilled in the art.” [sent-10, score-0.848]

7 The problem is that they don’t assume much “skill” in this judgment given the obvious things people patent. [sent-11, score-0.077]

8 One of my personal favorites is US 6192338, an AT&T patent involving connecting a speech recognizer to a database over the network. [sent-12, score-0.679]
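The sentScore values above appear to equal the sum of the topN tfidf weights from this listing's word list over each sentence's whitespace-separated tokens. For sentence 8: favorites 0.072 + patent 0.416 + connecting 0.066 + speech 0.064 + database 0.061 = 0.679. Sentence 5 (0.595 = component + patent + review) fits the same reading only if tokens keep their punctuation, so “scientific validation” and "review)." do not match. This is an inference from the numbers, not the tool's documented method. The minimal sketch below reproduces that reading. The function names are hypothetical, and lowercasing is an assumption.

```python
# Subset of the topN tfidf weights copied from the word list in this listing.
TOP_WORDS = {
    "patent": 0.416, "patents": 0.223, "validation": 0.134, "computer": 0.125,
    "component": 0.114, "supposed": 0.086, "obvious": 0.077, "skilled": 0.077,
    "favorites": 0.072, "novelty": 0.069, "connecting": 0.066, "review": 0.065,
    "speech": 0.064, "database": 0.061,
}


def sentence_score(sentence: str, weights: dict[str, float]) -> float:
    """Sum the weights of whitespace tokens that appear in `weights`.

    Tokens are lowercased (an assumption) but not stripped of punctuation,
    so "review)." or "validation”" contribute nothing.
    """
    return sum(weights.get(tok.lower(), 0.0) for tok in sentence.split())


def top_sentences(sentences: list[str], weights: dict[str, float], k: int = 3):
    """Return (score, index) pairs for the k highest-scoring sentences."""
    scored = [(sentence_score(s, weights), i) for i, s in enumerate(sentences)]
    return sorted(scored, reverse=True)[:k]


s8 = ("One of my personal favorites is US 6192338 , an AT&T patent involving "
      "connecting a speech recognizer to a database over the network.")
print(round(sentence_score(s8, TOP_WORDS), 3))  # 0.679, matching the listing
```

Ranking by this raw sum favors longer sentences that repeat high-weight words. That fits the pattern in the table, where sentence 6 ("supposed" twice) scores highest.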


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('patent', 0.416), ('variable', 0.251), ('logistic', 0.225), ('patents', 0.223), ('substantially', 0.223), ('error', 0.216), ('estimated', 0.193), ('sum', 0.175), ('reduced', 0.164), ('comprising', 0.153), ('expected', 0.151), ('moment', 0.142), ('regression', 0.141), ('validation', 0.134), ('computer', 0.125), ('moments', 0.123), ('negative', 0.114), ('component', 0.114), ('modeled', 0.112), ('errors', 0.106), ('constraints', 0.106), ('positive', 0.104), ('method', 0.096), ('twice', 0.095), ('across', 0.093), ('require', 0.09), ('erased', 0.088), ('supposed', 0.086), ('stored', 0.083), ('obvious', 0.077), ('executed', 0.077), ('inversely', 0.077), ('skilled', 0.077), ('learning', 0.075), ('whereby', 0.074), ('favorites', 0.072), ('novelty', 0.069), ('connecting', 0.066), ('polynomial', 0.066), ('review', 0.065), ('scaling', 0.065), ('medium', 0.065), ('speech', 0.064), ('exhibit', 0.063), ('skill', 0.063), ('classification', 0.063), ('database', 0.061), ('performs', 0.061), ('yields', 0.061), ('constrained', 0.061)]
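The "similar blogs" lists rank other posts by a similarity score between document vectors. The listing does not say exactly how its tfidf vectors are weighted or compared. A common approach is tf × log(N/df) weights, L2-normalized, compared by cosine similarity, and the minimal sketch below implements that. The function names and toy documents are hypothetical. Under this approach a post is always most similar to itself. The same-blog row's 1.0000002 presumably reflects floating-point rounding of a cosine of 1.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """L2-normalized tf * log(N / df) vectors, one {term: weight} dict per doc."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # document frequency: in how many documents each term appears
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # terms present in every document get idf = log(1) = 0 and are dropped
        vec = {t: c * math.log(n_docs / df[t]) for t, c in tf.items() if df[t] < n_docs}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_similar(vectors: list[dict[str, float]], query: int):
    """(similarity, doc index) pairs, most similar first; the query ranks itself first."""
    return sorted(((cosine(vectors[query], v), i) for i, v in enumerate(vectors)),
                  reverse=True)


docs = [
    "patent logistic regression patent",
    "patent monte carlo method",
    "posterior quantiles software validation",
]
print(rank_similar(tfidf_vectors(docs), 0))
```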

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000002 1761 andrew gelman stats-2013-03-13-Lame Statistics Patents

2 0.25810364 860 andrew gelman stats-2011-08-18-Trolls!

Introduction: Christian Robert points to this absurd patent of the Monte Carlo method (which, as Christian notes, was actually invented by Stanislaw Ulam and others in the 1940s). The whole thing is pretty unreadable. I wonder if they first wrote it as a journal article and then it got rejected everywhere, so they decided to submit it as a patent instead. What’s even worse is this bit: This invention was made with government support under Grant Numbers 0612170 and 0347408 awarded by the National Science Foundation. So our tax dollars are being given to IBM so they can try to bring statistics to a halt by patenting one of our most basic tools? I’d say this is just a waste of money, but given that our country is run by lawyers, there must be some outside chance that this patent could actually succeed? Perhaps there’s room for an improvement in the patent that involves albedo in some way?

3 0.25568634 1398 andrew gelman stats-2012-06-28-Every time you take a sample, you’ll have to pay this guy a quarter

Introduction: Roy Mendelssohn pointed me to this heartwarming story of Jay Vadiveloo, an actuary who got a patent for the idea of statistical sampling. Vadiveloo writes, “the results were astounding: statistical sampling worked.” You may laugh, but wait till Albedo Man buys the patent and makes everybody do his bidding. They’re gonna dig up Laplace and make him pay retroactive royalties. And somehow Clippy will get involved in all this. P.S. Mendelssohn writes: “Yes, I felt it was a heartwarming story also. Perhaps we can get a patent for regression.” I say, forget a patent for regression. I want a patent for the sample mean. That’s where the real money is. You can’t charge a lot for each use, but consider the volume!

4 0.25384742 625 andrew gelman stats-2011-03-23-My last post on albedo, I promise

Introduction: After seeing my recent blogs on Nathan Myhrvold, a friend told me that, in the tech world, the albedo-obsessed genius is known as a patent troll. Really? Yup. My friend writes: It’s perhaps indicative that Myhrvold comes up in the top-ten hits on Google for [patent troll]. These blog posts lay it out pretty clearly: http://www.techdirt.com/articles/20100217/1853298215.shtml http://blogs.seattleweekly.com/dailyweekly/2010/12/giant_patent_troll_awakens_as.php http://bits.blogs.nytimes.com/2010/12/08/intellectual-ventures-goes-to-court Just about anyone who’s been in the tech game thinks patents are ridiculous. The lab where I used to work wanted us to create an “intellectual mine field” in our field so the company could block anyone from entering the space. Yes, we made stuff, but the patents were for totally obvious ideas that anyone would have. Even Google’s PageRank was just a simple application of standard social network analysis models of authorities in netw

5 0.23956102 1885 andrew gelman stats-2013-06-06-Leahy Versus Albedoman and the Moneygoround, Part One

Introduction: Edward Wyatt reports: Now the Obama administration is cracking down on what many call patent trolls, shell companies that exist merely for the purpose of asserting that they should be paid . . . “The United States patent system is vital for our economic growth, job creation, and technological advance,” [Senator] Leahy said in a statement. “Unfortunately, misuse of low-quality patents through patent trolling has tarnished the system’s image.” There is some opposition: But some big software companies, including Microsoft, expressed dismay at some of the proposals, saying they could themselves stifle innovation. Microsoft . . . patent trolls . . . hmmm, where have we heard this connection before? There is also some support for the bill: “These guys are terrorists,” John Boswell, chief legal officer for SAS, a business software and services company, said at a panel discussion on Tuesday. SAS was cited in the White House report as an example of a company that has

6 0.17812198 1226 andrew gelman stats-2012-03-22-Story time meets the all-else-equal fallacy and the fallacy of measurement

7 0.12981458 892 andrew gelman stats-2011-09-06-Info on patent trolls

8 0.12262027 1019 andrew gelman stats-2011-11-19-Validation of Software for Bayesian Models Using Posterior Quantiles

9 0.12242164 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters

10 0.11943498 1981 andrew gelman stats-2013-08-14-The robust beauty of improper linear models in decision making

11 0.11513145 247 andrew gelman stats-2010-09-01-How does Bayes do it?

12 0.11203314 1482 andrew gelman stats-2012-09-04-Model checking and model understanding in machine learning

13 0.10784835 2287 andrew gelman stats-2014-04-09-Advice: positive-sum, zero-sum, or negative-sum

14 0.1056861 1807 andrew gelman stats-2013-04-17-Data problems, coding errors…what can be done?

15 0.10297011 852 andrew gelman stats-2011-08-13-Checking your model using fake data

16 0.10206026 1516 andrew gelman stats-2012-09-30-Computational problems with glm etc.

17 0.10117988 1218 andrew gelman stats-2012-03-18-Check your missing-data imputations using cross-validation

18 0.099912956 822 andrew gelman stats-2011-07-26-Any good articles on the use of error bars?

19 0.09734083 1162 andrew gelman stats-2012-02-11-Adding an error model to a deterministic model

20 0.096087664 519 andrew gelman stats-2011-01-16-Update on the generalized method of moments


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.142), (1, 0.059), (2, 0.039), (3, -0.059), (4, 0.049), (5, 0.037), (6, 0.017), (7, -0.043), (8, 0.003), (9, 0.006), (10, 0.008), (11, -0.008), (12, 0.004), (13, -0.009), (14, -0.042), (15, 0.011), (16, -0.003), (17, -0.005), (18, 0.005), (19, -0.031), (20, 0.001), (21, 0.047), (22, 0.052), (23, 0.004), (24, 0.006), (25, 0.03), (26, 0.044), (27, -0.016), (28, 0.0), (29, -0.058), (30, 0.064), (31, 0.093), (32, 0.013), (33, 0.013), (34, -0.023), (35, -0.078), (36, -0.035), (37, -0.007), (38, -0.055), (39, -0.045), (40, -0.025), (41, -0.051), (42, -0.066), (43, 0.069), (44, 0.035), (45, 0.119), (46, -0.018), (47, 0.076), (48, -0.043), (49, 0.072)]
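LSI usually represents each post as a vector of weights on topics obtained from a truncated SVD of the tfidf matrix. That is why some of the weights above are negative. The listing does not document how its simValue is derived, and the same-blog row shows 0.98 rather than 1. The sketch below therefore shows only the basic ingredient, under that caveat: it parses the printed (topicId, topicWeight) pairs and takes their cosine similarity. The helper names are hypothetical, and only the first five weights are used.

```python
import ast
import math


def parse_topic_weights(text: str) -> dict[int, float]:
    """Parse a printed list like "[(0, 0.142), (1, 0.059)]" into {topic_id: weight}."""
    return {int(t): float(w) for t, w in ast.literal_eval(text)}


def cosine(a: dict[int, float], b: dict[int, float]) -> float:
    """Cosine similarity; topics missing from one vector count as weight 0."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb)


lsi = parse_topic_weights("[(0, 0.142), (1, 0.059), (2, 0.039), (3, -0.059), (4, 0.049)]")
print(cosine(lsi, lsi))                              # 1.0 up to rounding
print(cosine(lsi, {k: -w for k, w in lsi.items()}))  # -1.0 up to rounding
```

Because LSI weights can be negative, this similarity ranges over [-1, 1], not [0, 1].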

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98265254 1761 andrew gelman stats-2013-03-13-Lame Statistics Patents

2 0.66801375 1377 andrew gelman stats-2012-06-13-A question about AIC

Introduction: Jacob Oaknin asks: Akaike’s selection criterion is often justified on the basis of the empirical risk of an ML estimate being a biased estimate of the true generalization error of a parametric family, say the family, S_m, of linear regressors on an m-dimensional variable x=(x_1,...,x_m) with Gaussian noise independent of x (for instance in “Unifying the derivations for the Akaike and Corrected Akaike information criteria”, by J.E. Cavanaugh, Statistics and Probability Letters, vol. 33, 1997, pp. 201-208). On the other hand, the family S_m is known to have finite VC-dimension (VC = m+1), and this fact should guarantee that the empirical risk minimizer is asymptotically consistent regardless of the underlying probability distribution, and in particular for the assumed Gaussian distribution of noise (“An overview of statistical learning theory”, by V.N. Vapnik, IEEE Transactions on Neural Networks, vol. 10, No. 5, 1999, pp. 988-999). What am I missing? My reply: I’m no expert on AIC so

3 0.65312546 1516 andrew gelman stats-2012-09-30-Computational problems with glm etc.

Introduction: John Mount provides some useful background and follow-up on our discussion from last year on computational instability of the usual logistic regression solver. Just to refresh your memory, here’s a simple logistic regression with only a constant term and no separation, nothing pathological at all:

    > y <- rep (c(1,0),c(10,5))
    > display (glm (y ~ 1, family=binomial(link="logit")))
    glm(formula = y ~ 1, family = binomial(link = "logit"))
                coef.est coef.se
    (Intercept) 0.69     0.55
    ---
    n = 15, k = 1
    residual deviance = 19.1, null deviance = 19.1 (difference = 0.0)

And here’s what happens when we give it the not-outrageous starting value of -2:

    > display (glm (y ~ 1, family=binomial(link="logit"), start=-2))
    glm(formula = y ~ 1, family = binomial(link = "logit"), start = -2)
                coef.est coef.se
    (Intercept) 71.97    17327434.18
    ---
    n = 15, k = 1
    residual deviance = 360.4, null deviance = 19.1 (difference = -341.3)
    Warning message:
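The instability can be reproduced outside R. For the canonical logit link, iteratively reweighted least squares coincides with Newton-Raphson. For an intercept-only model with 10 ones and 5 zeros, the maximum likelihood estimate is log(10/5) ≈ 0.69. Plain Newton steps started at -2 overshoot: each step lands where the curvature p(1-p) is smaller, so the next step is larger. The minimal sketch below shows this. The step-halving safeguard is a swapped-in fix added here for illustration, not a description of what glm does internally. The function names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def loglik(b: float, n1: int, n0: int) -> float:
    """Log-likelihood of intercept b given n1 ones and n0 zeros."""
    return n1 * log_sigmoid(b) + n0 * log_sigmoid(-b)


def newton_step(b: float, n1: int, n0: int) -> float:
    p = sigmoid(b)
    n = n1 + n0
    grad = n1 - n * p            # first derivative of loglik
    hess = -n * p * (1.0 - p)    # second derivative of loglik
    return -grad / hess


def fit_intercept(start: float, n1: int = 10, n0: int = 5,
                  iters: int = 50, halving: bool = False) -> float:
    b = start
    for _ in range(iters):
        step = newton_step(b, n1, n0)
        if halving:
            # shrink the step until the log-likelihood does not decrease
            current = loglik(b, n1, n0)
            k = 0
            while loglik(b + step, n1, n0) < current and k < 30:
                step /= 2.0
                k += 1
        b += step
        if abs(step) < 1e-10:
            break
    return b


print(fit_intercept(0.0))                 # converges to log(2), about 0.693
print(fit_intercept(-2.0, iters=3))       # plain Newton: -2 -> 3.2 -> -4.7 -> about 70
print(fit_intercept(-2.0, halving=True))  # with step-halving: about 0.693 again
```

The plain run is capped at three iterations. After that, p rounds to 1 in floating point and the Newton step divides by zero.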

4 0.6251328 2364 andrew gelman stats-2014-06-08-Regression and causality and variable ordering

Introduction: Bill Harris wrote in with a question: David Hogg points out in one of his general articles on data modeling that regression assumptions require one to put the variable with the highest variance in the ‘y’ position and the variable you know best (lowest variance) in the ‘x’ position. As he points out, others speak of independent and dependent variables, as if causality determined the form of a regression formula. In a quick scan of ARM and BDA, I don’t see clear advice, but I do see the use of ‘independent’ and ‘dependent.’ I recently did a model over data in which we know the ‘effect’ pretty well (we measure it), while we know the ‘cause’ less well (it’s estimated by people who only need to get it approximately correct). A model of the form ‘cause ~ effect’ fit visually much better than one of the form ‘effect ~ cause’, but interpreting it seems challenging. For a simplistic example, let the effect be energy use in a building for cooling (E), and let the cause be outdoor ai
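The point attributed to Hogg rests on a standard least-squares fact. Measurement noise in the outcome leaves the slope unbiased, while noise in the predictor attenuates it by a factor of var(x) / (var(x) + var(noise)). The minimal simulation below shows both cases, assuming a linear relation with Gaussian noise. The names are hypothetical. It illustrates the asker's premise and is not the post's answer to the question.

```python
import random


def ols_slope(x: list[float], y: list[float]) -> float:
    """Least-squares slope of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n: int = 20000, beta: float = 2.0, noise_sd: float = 1.0, seed: int = 1):
    rng = random.Random(seed)
    cause = [rng.gauss(0.0, 1.0) for _ in range(n)]   # true x, variance 1
    effect = [beta * c for c in cause]                 # true y
    effect_noisy = [e + rng.gauss(0.0, noise_sd) for e in effect]
    cause_noisy = [c + rng.gauss(0.0, noise_sd) for c in cause]
    # noise only in the y position: slope stays near beta
    slope_noisy_y = ols_slope(cause, effect_noisy)
    # noise only in the x position: slope shrinks toward beta / (1 + noise_sd**2)
    slope_noisy_x = ols_slope(cause_noisy, effect)
    return slope_noisy_y, slope_noisy_x


print(simulate())  # roughly (2.0, 1.0)
```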

5 0.62293917 14 andrew gelman stats-2010-05-01-Imputing count data

Introduction: Guy asks: I am analyzing an original survey of farmers in Uganda. I am hoping to use a battery of welfare proxy variables to create a single welfare index using PCA. I have quick question which I hope you can find time to address: How do you recommend treating count data? (for example # of rooms, # of chickens, # of cows, # of radios)? In my dataset these variables are highly skewed with many responses at zero (which makes taking the natural log problematic). In the case of # of cows or chickens several obs have values in the hundreds. My response: Here’s what we do in our mi package in R. We split a variable into two parts: an indicator for whether it is positive, and the positive part. That is, y = u*v. Then u is binary and can be modeled using logistic regression, and v can be modeled on the log scale. At the end you can round to the nearest integer if you want to avoid fractional values.
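The split described in the reply can be sketched in a few lines. This is a minimal illustration, not the mi package's code. The logistic regression for u and the log-scale model for v are replaced by intercept-only stand-ins: a positive rate, and a normal distribution for log v. This keeps the decomposition y = u*v and the final rounding visible. The function names and the example counts are hypothetical.

```python
import math
import random
from statistics import mean, stdev


def split_two_part(y: list[int]) -> tuple[list[int], list[int]]:
    """Split counts y into u = 1[y > 0] and v = the positive values of y."""
    u = [1 if yi > 0 else 0 for yi in y]
    v = [yi for yi in y if yi > 0]
    return u, v


def impute_two_part(y: list[int], n_draws: int, rng: random.Random) -> list[int]:
    """Draw new values of y = u * v from crude intercept-only models."""
    u, v = split_two_part(y)
    p_pos = mean(u)                        # stand-in for a logistic regression of u
    log_v = [math.log(vi) for vi in v]     # v is modeled on the log scale
    mu, sigma = mean(log_v), stdev(log_v)  # needs at least two positive values
    draws = []
    for _ in range(n_draws):
        ui = 1 if rng.random() < p_pos else 0
        vi = math.exp(rng.gauss(mu, sigma))
        draws.append(round(ui * vi))       # recombine and round to an integer
    return draws


chickens = [0, 0, 3, 5, 120, 1, 0, 2]      # made-up counts with many zeros
print(split_two_part(chickens))
print(impute_two_part(chickens, 10, random.Random(0)))
```

Modeling log v only on the positive part avoids taking the log of zero, which was the asker's problem. The skewness of v (a value of 120 among single digits) is then handled on the log scale.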

6 0.61567652 822 andrew gelman stats-2011-07-26-Any good articles on the use of error bars?

7 0.60728461 1441 andrew gelman stats-2012-08-02-“Based on my experiences, I think you could make general progress by constructing a solution to your specific problem.”

8 0.60664922 775 andrew gelman stats-2011-06-21-Fundamental difficulty of inference for a ratio when the denominator could be positive or negative

9 0.57928628 1462 andrew gelman stats-2012-08-18-Standardizing regression inputs

10 0.57857132 1340 andrew gelman stats-2012-05-23-Question 13 of my final exam for Design and Analysis of Sample Surveys

11 0.57820839 1703 andrew gelman stats-2013-02-02-Interaction-based feature selection and classification for high-dimensional biological data

12 0.57776999 1164 andrew gelman stats-2012-02-13-Help with this problem, win valuable prizes

13 0.57327783 1094 andrew gelman stats-2011-12-31-Using factor analysis or principal components analysis or measurement-error models for biological measurements in archaeology?

14 0.57031399 2311 andrew gelman stats-2014-04-29-Bayesian Uncertainty Quantification for Differential Equations!

15 0.56732059 1981 andrew gelman stats-2013-08-14-The robust beauty of improper linear models in decision making

16 0.56629121 1218 andrew gelman stats-2012-03-18-Check your missing-data imputations using cross-validation

17 0.56354916 770 andrew gelman stats-2011-06-15-Still more Mr. P in public health

18 0.5601787 1967 andrew gelman stats-2013-08-04-What are the key assumptions of linear regression?

19 0.56003243 553 andrew gelman stats-2011-02-03-is it possible to “overstratify” when assigning a treatment in a randomized control trial?

20 0.55842119 494 andrew gelman stats-2010-12-31-Type S error rates for classical and Bayesian single and multiple comparison procedures


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(9, 0.043), (15, 0.036), (16, 0.087), (21, 0.013), (24, 0.078), (27, 0.011), (41, 0.017), (47, 0.013), (53, 0.011), (64, 0.051), (86, 0.036), (89, 0.013), (95, 0.041), (99, 0.413)]
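LDA weights, by contrast, are topic proportions. They are nonnegative, and this listing gives only 14 topics, which together sum to about 0.86, presumably because smaller proportions are omitted. Cosine similarity works on these too, but the Hellinger distance is a common alternative for comparing probability distributions. The sketch below renormalizes the listed weights, assuming the omitted mass is negligible, and reports 1 minus the Hellinger distance. It is not necessarily the measure behind the simValue column. The helper names are hypothetical.

```python
import ast
import math


def parse_topic_weights(text: str) -> dict[int, float]:
    """Parse a printed list of (topicId, topicWeight) pairs."""
    return {int(t): float(w) for t, w in ast.literal_eval(text)}


def normalize(p: dict[int, float]) -> dict[int, float]:
    total = sum(p.values())
    return {k: w / total for k, w in p.items()}


def hellinger_similarity(p: dict[int, float], q: dict[int, float]) -> float:
    """1 - Hellinger distance between two renormalized topic distributions (in [0, 1])."""
    p, q = normalize(p), normalize(q)
    keys = set(p) | set(q)
    sq = sum((math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2 for k in keys)
    return 1.0 - math.sqrt(sq / 2.0)


lda = parse_topic_weights(
    "[(9, 0.043), (15, 0.036), (16, 0.087), (21, 0.013), (24, 0.078), (27, 0.011), "
    "(41, 0.017), (47, 0.013), (53, 0.011), (64, 0.051), (86, 0.036), (89, 0.013), "
    "(95, 0.041), (99, 0.413)]"
)
print(hellinger_similarity(lda, lda))            # 1.0: identical distributions
print(hellinger_similarity({1: 1.0}, {2: 1.0}))  # 0.0: no topics in common
```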

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99051195 1761 andrew gelman stats-2013-03-13-Lame Statistics Patents

2 0.98311788 977 andrew gelman stats-2011-10-27-Hack pollster Doug Schoen illustrates a general point: The #1 way to lie with statistics is . . . to just lie!

Introduction: Everybody knows how you can lie with statistics by manipulating numbers, making inappropriate comparisons, misleading graphs, etc. But, as I like to remind students, the simplest way to lie with statistics is to just lie! You see this all the time, advocates who make up numbers or present numbers with such little justification that they might as well be made up (as in this purported survey of the “super-rich”). Here I’m not talking about the innumeracy of a Samantha Power or a David Runciman, or Michael Barone-style confusion or Gregg Easterbrook-style cluelessness or even Tucker Carlson-style asininity . No, I’m talking about flat-out lying by a professional who has the numbers and deliberately chooses to misrepresent them. The culprit is pollster Doug Schoen, and the catch was made by Jay Livingston. Schoen wrote the following based on a survey he took of Occupy Wall Street participants: On Oct. 10 and 11, Arielle Alter Confino, a senior researcher at my polli

3 0.9823463 361 andrew gelman stats-2010-10-21-Tenure-track statistics job at Teachers College, here at Columbia!

Introduction: See below for the job announcement. It’s for Teachers College, which is about 2 blocks from the statistics department and 2 blocks from the political science department. So even though I don’t have any official connection with Teachers College (besides occasionally working with them on research projects), I very much would like to have another exciting young applied researcher here, to complement all the people we currently have in stat, poli sci, engineering, etc. In particular, we have zillions of interesting and important social science research projects going on here, and they all need statistics work. A lot of social scientists do statistics, but it’s not so easy to find a statistician who does serious social science research. All this is to say that I hope this job gets some applicants from some people who are serious about applied statistics and the development of new models and methods. Teachers College, Columbia University Department of Human Development APPLIE

4 0.98109627 11 andrew gelman stats-2010-04-29-Auto-Gladwell, or Can fractals be used to predict human history?

Introduction: I just reviewed the book Bursts, by Albert-László Barabási, for Physics Today. But I had a lot more to say that couldn’t fit into the magazine’s 800-word limit. Here I’ll reproduce what I sent to Physics Today, followed by my additional thoughts. The back cover of Bursts book promises “a revolutionary new theory showing how we can predict human behavior.” I wasn’t fully convinced on that score, but the book does offer a well-written and thought-provoking window into author Albert-László Barabási’s research in power laws and network theory. Power laws–the mathematical pattern that little things are common and large things are rare–have been observed in many different domains, including incomes (as noted by economist Vilfredo Pareto in the nineteenth century), word frequencies (as noted by linguist George Zipf), city sizes, earthquakes, and virtually anything else that can be measured. In the mid-twentieth century, the mathematician Benoit Mandelbrot devoted an influential caree

5 0.98066211 571 andrew gelman stats-2011-02-13-A departmental wiki page?

Introduction: I was recently struggling with the Columbia University philosophy department’s webpage (to see who might be interested in this stuff). The faculty webpage was horrible: it’s just a list of names and links with no information on research interests. So I did some searching on the web and found a wonderful wikipedia page which had exactly what I wanted. Then I checked my own department’s page, and it’s even worse than what they have in philosophy! (We also have this page, which is even worse in that it omits many of our faculty and has a bunch of ridiculously technical links for some of the faculty who are included.) I don’t know about the philosophy department, but the statistics department’s webpage is an overengineered mess, designed from the outset to look pretty rather than to be easily updated. Maybe we could replace it entirely with a wiki? In the meantime, if anybody feels like setting up a wikipedia entry for the research of Columbia’s statistics faculty, that

6 0.97989237 1412 andrew gelman stats-2012-07-10-More questions on the contagion of obesity, height, etc.

7 0.97984564 793 andrew gelman stats-2011-07-09-R on the cloud

8 0.97948968 1336 andrew gelman stats-2012-05-22-Battle of the Repo Man quotes: Reid Hastie’s turn

9 0.97909147 2072 andrew gelman stats-2013-10-21-The future (and past) of statistical sciences

10 0.97901261 222 andrew gelman stats-2010-08-21-Estimating and reporting teacher effectiveness: Newspaper researchers do things that academic researchers never could

11 0.97854346 1058 andrew gelman stats-2011-12-14-Higgs bozos: Rosencrantz and Guildenstern are spinning in their graves

12 0.97847712 2255 andrew gelman stats-2014-03-19-How Americans vote

13 0.97803718 1688 andrew gelman stats-2013-01-22-That claim that students whose parents pay for more of college get worse grades

14 0.97803348 2337 andrew gelman stats-2014-05-18-Never back down: The culture of poverty and the culture of journalism

15 0.9774726 2151 andrew gelman stats-2013-12-27-Should statistics have a Nobel prize?

16 0.9773792 2084 andrew gelman stats-2013-11-01-Doing Data Science: What’s it all about?

17 0.97662503 1972 andrew gelman stats-2013-08-07-When you’re planning on fitting a model, build up to it by fitting simpler models first. Then, once you have a model you like, check the hell out of it

18 0.97644615 1917 andrew gelman stats-2013-06-28-Econ coauthorship update

19 0.97631031 578 andrew gelman stats-2011-02-17-Credentialism, elite employment, and career aspirations

20 0.97604239 1441 andrew gelman stats-2012-08-02-“Based on my experiences, I think you could make general progress by constructing a solution to your specific problem.”