
1596 andrew gelman stats-2012-11-29-More consulting experiences, this time in computational linguistics


meta infos for this blog

Source: html

Introduction: Bob wrote this long comment that I think is worth posting: I [Bob] have done a fair bit of consulting for my small natural language processing company over the past ten years. Like statistics, natural language processing is something many companies think they want, but have no idea how to do themselves. We almost always handed out “free” consulting. Usually on the phone to people who called us out of the blue. Our blog and tutorials Google ranking was pretty much our only approach to marketing other than occasionally going to business-oriented conferences. Our goal was to sell software licenses (because consulting doesn’t scale nor does it provide continuing royalty income), but since so few people knew how to use toolkits like ours, we had to help them along the way. We even provided “free” consulting with our startup license package. We were brutally honest with customers, both about our goals and their goals. Their goals were often incompatible with ours (use company X’


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Bob wrote this long comment that I think is worth posting: I [Bob] have done a fair bit of consulting for my small natural language processing company over the past ten years. [sent-1, score-0.971]

2 Like statistics, natural language processing is something many companies think they want, but have no idea how to do themselves. [sent-2, score-0.346]

3 Usually on the phone to people who called us out of the blue. [sent-4, score-0.069]

4 Our blog and tutorials Google ranking was pretty much our only approach to marketing other than occasionally going to business-oriented conferences. [sent-5, score-0.268]

5 Our goal was to sell software licenses (because consulting doesn’t scale nor does it provide continuing royalty income), but since so few people knew how to use toolkits like ours, we had to help them along the way. [sent-6, score-0.93]

6 We even provided “free” consulting with our startup license package. [sent-7, score-0.827]

7 We were brutally honest with customers, both about our goals and their goals. [sent-8, score-0.296]

8 Their goals were often incompatible with ours (use company X’s software to do Y — we didn’t take that kind of job, but would send work to other people we trusted). [sent-9, score-0.783]

9 More often, their goals were unrealistic, even if we had the big-brain count and computer power of Google, much less for a two-person company. [sent-10, score-0.295]

10 Sometimes we had a hunch about how we could do what they were asking, but weren’t certain enough to just sell it. [sent-11, score-0.229]

11 We found honesty up front often led to the company funding us to do some research or proof-of-concept studies (the advantage of supplying something with very little competition and desperate customers). [sent-12, score-0.729]

12 When we signed contracts with people and did consulting, it was a combination of technical and strategic consulting. [sent-13, score-0.272]

13 Not that we provided business strategy consulting, but we had to work with customers from their vaguely specified goals and needs (and often over-specific preconceptions about how they wanted to do it) toward a feasible project that could actually help with their business needs. [sent-14, score-1.325]

14 In my experience, the downside to hiring academics to do consulting is that they tend to be fixated on 2nd or even 3rd order details of problems that are fairly simple, while ignoring the grungy details needed to make something work in the field. [sent-15, score-1.116]

15 In the end, we built lots of cool stuff with lots of different customers and even got some of them to fund some open research and software development. [sent-16, score-0.862]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('consulting', 0.453), ('customers', 0.347), ('goals', 0.226), ('company', 0.172), ('software', 0.165), ('processing', 0.144), ('sell', 0.135), ('bob', 0.125), ('often', 0.121), ('provided', 0.115), ('grungy', 0.114), ('handed', 0.114), ('language', 0.108), ('tutorials', 0.108), ('google', 0.103), ('startup', 0.103), ('licenses', 0.103), ('honesty', 0.103), ('desperate', 0.099), ('contracts', 0.099), ('incompatible', 0.099), ('business', 0.096), ('feasible', 0.096), ('supplying', 0.096), ('natural', 0.094), ('unrealistic', 0.094), ('hunch', 0.094), ('signed', 0.092), ('details', 0.089), ('ranking', 0.087), ('license', 0.087), ('downside', 0.085), ('trusted', 0.084), ('free', 0.083), ('strategic', 0.081), ('vaguely', 0.081), ('fund', 0.08), ('hiring', 0.076), ('help', 0.074), ('specified', 0.073), ('marketing', 0.073), ('ignoring', 0.072), ('funding', 0.07), ('honest', 0.07), ('phone', 0.069), ('even', 0.069), ('academics', 0.069), ('competition', 0.068), ('lots', 0.067), ('built', 0.067)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999982 1596 andrew gelman stats-2012-11-29-More consulting experiences, this time in computational linguistics


2 0.25376984 1597 andrew gelman stats-2012-11-29-What is expected of a consultant

Introduction: Robin Hanson writes on paid expert consulting (of the sort that I do sometime, and is common among economists and statisticians). Hanson agrees with Keith Yost, who says: Fellow consultants and associates . . . [said] fifty percent of the job is nodding your head at whatever’s being said, thirty percent of it is just sort of looking good, and the other twenty percent is raising an objection but then if you meet resistance, then dropping it. On the other side is Steven Levitt, who Hanson quotes as saying: My own experience has been that even though I know nothing about an industry, if you give me a week, and you get a bunch of really smart people to explain the industry to me, and to tell me what they do, a lot of times what I’ve learned in economics, what I’ve learned in other places can actually be really helpful in changing the way that they see the world. Perhaps unsurprisingly given my Bayesian attitudes and my preference for continuity, I’m inclined to split the d

3 0.19612552 231 andrew gelman stats-2010-08-24-Yet another Bayesian job opportunity

Introduction: Steve Cohen writes: My [Cohen's] firm is looking for strong candidates to help us in developing software and analyzing data using Bayesian methods. We have been developing a suite of programs in C++ which allow us to do Bayesian hierarchical regression and logit/probit models on marketing data. These efforts have included the use of high performance computing tools like nVidia’s CUDA and the new OpenCL standard, which allow parallel processing of Bayesian models. Our software is very, very fast – even on databases that are ½ terabyte in size. The software still needs many additions and improvements and a person with the right skill set will have the chance to make a significant contribution. Here’s the job description he sent: Bayesian statistician and C++ programmer The company In4mation Insights is a marketing research, analytics, and consulting firm which operates on the leading-edge of our industry. Our clients are Fortune 500 companies and major management consul

4 0.18637763 395 andrew gelman stats-2010-11-05-Consulting: how do you figure out what to charge?

Introduction: I’m a physicist by training, statistical data analyst by trade. Although some of my work is pretty standard statistical analysis, more often I work somewhere in a gray area that includes physics, engineering, and statistics. I have very little formal statistics training but I do study in an academic-like way to learn techniques from the literature when I need to. I do some things well but there are big gaps in my stats knowledge compared to anyone who has gone to grad school in statistics. On the other hand, there are big gaps in most statisticians’ physics and engineering knowledge compared to anyone who has gone to grad school in physics. Generally my breadth and depth of knowledge is about right for the kind of work that I do, I think. But last week I was offered a consulting job that might be better done by someone with more conventional stats knowledge than I have. The job involves gene expression in different types of tumors, so it’s “biostatistics” by definition, but the

5 0.12644354 1618 andrew gelman stats-2012-12-11-The consulting biz

Introduction: I received the following (unsolicited) email: Hello, *** LLC, a ***-based market research company, has a financial client who is interested in speaking with a statistician who has done research in the field of Alzheimer’s Disease and preferably familiar with the SOLA and BAPI trials. We offer an honorarium of $200 for a 30 minute telephone interview. Please advise us if you have an employment or consulting agreement with any organization or operate professionally pursuant to an organization’s code of conduct or employee manual that may control activities by you outside of your regular present and former employment, such as participating in this consulting project for MedPanel. If there are such contracts or other documents that do apply to you, please forward MedPanel a copy of each such document asap as we are obligated to review such documents to determine if you are permitted to participate as a consultant for MedPanel on a project with this particular client. If you are

6 0.12204055 1850 andrew gelman stats-2013-05-10-The recursion of pop-econ

7 0.1183378 1909 andrew gelman stats-2013-06-21-Job openings at conservative political analytics firm!

8 0.11120936 835 andrew gelman stats-2011-08-02-“The sky is the limit” isn’t such a good thing

9 0.098047093 1848 andrew gelman stats-2013-05-09-A tale of two discussion papers

10 0.09574829 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?

11 0.094435811 1489 andrew gelman stats-2012-09-09-Commercial Bayesian inference software is popping up all over

12 0.093726821 793 andrew gelman stats-2011-07-09-R on the cloud

13 0.093426012 1808 andrew gelman stats-2013-04-17-Excel-bashing

14 0.09318915 1519 andrew gelman stats-2012-10-02-Job!

15 0.092930846 1003 andrew gelman stats-2011-11-11-$

16 0.089440577 676 andrew gelman stats-2011-04-23-The payoff: $650. The odds: 1 in 500,000.

17 0.088111445 1619 andrew gelman stats-2012-12-11-There are four ways to get fired from Caesars: (1) theft, (2) sexual harassment, (3) running an experiment without a control group, and (4) keeping a gambling addict away from the casino

18 0.086650901 1533 andrew gelman stats-2012-10-14-If x is correlated with y, then y is correlated with x

19 0.082933567 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics

20 0.082807265 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go
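
The page does not say how the simValue column above was computed. As a rough illustration only, the sketch below assumes a plain tf-idf weighting (term frequency times log inverse document frequency) followed by cosine similarity; the function names tfidf_vectors and cosine and the three toy documents are hypothetical and are not taken from the tool that generated this page.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {word: tf-idf weight} dict per tokenized document.

    Assumed weighting (not confirmed by the source page):
    tf = count / document length, idf = log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            {w: (c / len(doc)) * math.log(n_docs / df[w]) for w, c in counts.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {word: weight} vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy stand-ins for blog posts (hypothetical, not the real corpus).
docs = [
    "consulting customers goals software".split(),
    "consulting job statistics".split(),
    "bayesian software job".split(),
]
vecs = tfidf_vectors(docs)
# Similarity of the first post to every post, itself included.
sims = [cosine(vecs[0], v) for v in vecs]
```

Under these assumptions a post compared with itself scores essentially 1, which is consistent with the same-blog row sitting just below 1.0 in the list above, and posts that share only a few weighted words score much lower.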


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.157), (1, -0.059), (2, -0.048), (3, 0.024), (4, 0.033), (5, 0.032), (6, 0.0), (7, -0.024), (8, -0.019), (9, 0.016), (10, -0.049), (11, -0.063), (12, 0.032), (13, -0.014), (14, -0.06), (15, 0.025), (16, 0.02), (17, -0.031), (18, 0.013), (19, 0.023), (20, 0.025), (21, -0.004), (22, 0.007), (23, 0.031), (24, -0.041), (25, -0.001), (26, 0.007), (27, 0.034), (28, -0.022), (29, 0.022), (30, 0.007), (31, -0.034), (32, 0.073), (33, -0.006), (34, 0.001), (35, 0.022), (36, 0.015), (37, 0.045), (38, -0.042), (39, 0.02), (40, 0.034), (41, -0.021), (42, -0.028), (43, 0.09), (44, 0.004), (45, 0.005), (46, -0.035), (47, 0.008), (48, -0.017), (49, -0.0)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95543122 1596 andrew gelman stats-2012-11-29-More consulting experiences, this time in computational linguistics


2 0.79668796 1279 andrew gelman stats-2012-04-24-ESPN is looking to hire a research analyst

Introduction: This is somebody’s dream job, I’m sure . . . ESPN is looking for a statistician to join the HR department as a Research Analyst. The job will consist of analytical research and producing statistics about the people that work at ESPN. Topics of interest will include productivity, efficiency, and retention of employees, among other items. In addition to data mining and producing reports, we also field surveys and analyze results. The position is located at the headquarters in Bristol, Connecticut, the same campus where nearly all ESPN shows are produced. ESPN is a Disney company, so discounts and free admission to Disney parks are available for employees. Flexible work arrangements are available, along with working in the New York City office part-time if desired. The role is a relatively new function and will have a high impact very quickly on helping the business function. Statistical software, textbooks, and any other resource needed to get the job done will be provided. T

3 0.79310799 395 andrew gelman stats-2010-11-05-Consulting: how do you figure out what to charge?

Introduction: I’m a physicist by training, statistical data analyst by trade. Although some of my work is pretty standard statistical analysis, more often I work somewhere in a gray area that includes physics, engineering, and statistics. I have very little formal statistics training but I do study in an academic-like way to learn techniques from the literature when I need to. I do some things well but there are big gaps in my stats knowledge compared to anyone who has gone to grad school in statistics. On the other hand, there are big gaps in most statisticians’ physics and engineering knowledge compared to anyone who has gone to grad school in physics. Generally my breadth and depth of knowledge is about right for the kind of work that I do, I think. But last week I was offered a consulting job that might be better done by someone with more conventional stats knowledge than I have. The job involves gene expression in different types of tumors, so it’s “biostatistics” by definition, but the

4 0.76189756 1530 andrew gelman stats-2012-10-11-Migrating your blog from Movable Type to WordPress

Introduction: Cord Blomquist, who did a great job moving us from horrible Movable Type to nice nice WordPress, writes: I [Cord] wanted to share a little news with you related to the original work we did for you last year. When ReadyMadeWeb converted your Movable Type blog to WordPress, we got a lot of other requests for the same service, so we started thinking about a bigger market for such a product. After a bit of research, we started work on automating the data conversion, writing rules, and exceptions to the rules, on how Movable Type and TypePad data could be translated to WordPress. After many months of work, we’re getting ready to announce TP2WP.com, a service that converts Movable Type and TypePad export files to WordPress import files, so anyone who wants to migrate to WordPress can do so easily and without losing permalinks, comments, images, or other files. By automating our service, we’ve been able to drop the price to just $99. I recommend it (and, no, Cord is not paying m

5 0.7411558 1619 andrew gelman stats-2012-12-11-There are four ways to get fired from Caesars: (1) theft, (2) sexual harassment, (3) running an experiment without a control group, and (4) keeping a gambling addict away from the casino

Introduction: Ever since I got this new sound system for my bike, I’ve been listening to a lot of podcasts. This American Life is really good. I know, I know, everybody knows that, but it’s true. The only segments I don’t like are the ones that are too “writerly,” when they read a short story aloud. They don’t work for me. Most of the time, though, the show is as great as everyone says it is. Anyway, the other day I listened to program #466: Blackjack. It started with some items on card counting. That stuff is always fun. Then they get to the longer story, which is all about a moderately rich housewife from Iowa who, over a roughly ten-year period, lost her life savings, something like a million dollars, at Harrah’s casinos. Did you know they had casinos in Iowa and Indiana? I didn’t. Anyway, the lady was a gambling addict. That part’s pretty clear. You don’t lose your life savings at a casino by accident. The scary part, though, was how the casino company craftily enabled her to

6 0.73611802 835 andrew gelman stats-2011-08-02-“The sky is the limit” isn’t such a good thing

7 0.73295367 1536 andrew gelman stats-2012-10-16-Using economics to reduce bike theft

8 0.7324065 793 andrew gelman stats-2011-07-09-R on the cloud

9 0.73066956 597 andrew gelman stats-2011-03-02-RStudio – new cross-platform IDE for R

10 0.72100765 1531 andrew gelman stats-2012-10-12-Elderpedia

11 0.72036433 1003 andrew gelman stats-2011-11-11-$

12 0.72018355 1519 andrew gelman stats-2012-10-02-Job!

13 0.71974796 1342 andrew gelman stats-2012-05-24-The Used TV Price is Too Damn High

14 0.71312833 1211 andrew gelman stats-2012-03-13-A personal bit of spam, just for me!

15 0.70583469 988 andrew gelman stats-2011-11-02-Roads, traffic, and the importance in decision analysis of carefully examining your goals

16 0.6960935 223 andrew gelman stats-2010-08-21-Statoverflow

17 0.69588602 1597 andrew gelman stats-2012-11-29-What is expected of a consultant

18 0.69125301 411 andrew gelman stats-2010-11-13-Ethical concerns in medical trials

19 0.69022471 2080 andrew gelman stats-2013-10-28-Writing for free

20 0.68563455 2282 andrew gelman stats-2014-04-05-Bizarre academic spam


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(13, 0.02), (15, 0.045), (16, 0.09), (21, 0.025), (24, 0.144), (28, 0.01), (45, 0.018), (53, 0.01), (55, 0.022), (76, 0.04), (77, 0.019), (84, 0.01), (86, 0.027), (89, 0.069), (91, 0.079), (99, 0.273)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96120095 1596 andrew gelman stats-2012-11-29-More consulting experiences, this time in computational linguistics


2 0.94791174 637 andrew gelman stats-2011-03-29-Unfinished business

Introduction: This blog by J. Robert Lennon on abandoned novels made me think of the more general topic of abandoned projects. I seem to recall George V. Higgins writing that he’d written and discarded 14 novels or so before publishing The Friends of Eddie Coyle. I haven’t abandoned any novels but I’ve abandoned lots of research projects (and also have started various projects that there’s no way I’ll finish). If you think about the decisions involved, it really has to be that way. You learn while you’re working on a project whether it’s worth continuing. Sometimes I’ve put in the hard work and pushed a project to completion, published the article, and then I think . . . what was the point? The modal number of citations of our articles is zero, etc.

3 0.94724005 1572 andrew gelman stats-2012-11-10-I don’t like this cartoon

Introduction: Some people pointed me to this: I am happy to see statistical theory and methods be a topic in popular culture, and of course I’m glad that, contra Feller, the Bayesian is presented as the hero this time, but . . . . I think the lower-left panel of the cartoon unfairly misrepresents frequentist statisticians. Frequentist statisticians recognize many statistical goals. Point estimates trade off bias and variance. Interval estimates have the goal of achieving nominal coverage and the goal of being informative. Tests have the goals of calibration and power. Frequentists know that no single principle applies in all settings, and this is a setting where this particular method is clearly inappropriate. All statisticians use prior information in their statistical analysis. Non-Bayesians express their prior information not through a probability distribution on parameters but rather through their choice of methods. I think this non-Bayesian attitude is too restrictive, but in

4 0.94651729 736 andrew gelman stats-2011-05-29-Response to “Why Tables Are Really Much Better Than Graphs”

Introduction: Ellen Barnes writes, in response to my paper and the associated discussion at JCGS, I [Barnes] am an industry statistician. I will agree that a table of numbers is essential in an academic publication. The readers want to be able to sit down with the numbers, and make sure they can replicate the results. However, graphics communicate faster – especially when a group of engineers are trying to figure out what is going on. Or, there are times when I have just a couple minutes to convey a complex relationship to a director or a vice-president. One example from this week: We are putting a new subsystem into some of our vehicles – using new technology. The technical specialist leading the project wanted to double check to make sure the system was working properly and finalize the calibration procedure. He mentioned a concern that was nagging him. I plotted his data in a matrix plot (a matrix of two dimensional scatter plots). We immediately keyed in on one plot that showed s

5 0.94568157 1390 andrew gelman stats-2012-06-23-Traditionalist claims that modern art could just as well be replaced by a “paint-throwing chimp”

Introduction: Jed Dougherty points me to this opinion piece by Jacqueline Stevens, a professor of art at Northwestern University, who writes: Artists are defensive these days because in May the House passed an amendment to a bill eliminating the National Endowment for the Arts. Colleagues, especially those who have received N.E.A. grants, will loathe me for saying this, but just this once I’m sympathetic with the anti-intellectual Republicans behind this amendment. Why? The bill incited a national conversation about a subject that has troubled me for decades: the government — disproportionately — supports art that I do not like. Actually, just about nobody likes modern art. All those soup cans—what’s that all about? The stuff they have in museums nowadays, my 4-year-old could do better than that. Two-thirds of so-called modern artists are drunk and two-thirds are frauds. And, no, I didn’t get my math wrong—there’s just a lot of overlap among these categories! It’s an open secret in my

6 0.94040632 1702 andrew gelman stats-2013-02-01-Don’t let your standard errors drive your research agenda

7 0.93996525 2248 andrew gelman stats-2014-03-15-Problematic interpretations of confidence intervals

8 0.93957841 2089 andrew gelman stats-2013-11-04-Shlemiel the Software Developer and Unknown Unknowns

9 0.93933082 231 andrew gelman stats-2010-08-24-Yet another Bayesian job opportunity

10 0.93862522 777 andrew gelman stats-2011-06-23-Combining survey data obtained using different modes of sampling

11 0.93841672 32 andrew gelman stats-2010-05-14-Causal inference in economics

12 0.93798494 2179 andrew gelman stats-2014-01-20-The AAA Tranche of Subprime Science

13 0.93710005 1473 andrew gelman stats-2012-08-28-Turing chess run update

14 0.93690878 2297 andrew gelman stats-2014-04-20-Fooled by randomness

15 0.93674457 53 andrew gelman stats-2010-05-26-Tumors, on the left, or on the right?

16 0.93568504 2296 andrew gelman stats-2014-04-19-Index or indicator variables

17 0.93563139 1580 andrew gelman stats-2012-11-16-Stantastic!

18 0.93550277 1818 andrew gelman stats-2013-04-22-Goal: Rules for Turing chess

19 0.9354881 1939 andrew gelman stats-2013-07-15-Forward causal reasoning statements are about estimation; reverse causal questions are about model checking and hypothesis generation

20 0.9351089 1878 andrew gelman stats-2013-05-31-How to fix the tabloids? Toward replicable social science research