105 hunch net-2005-08-23-(Dis)similarities between academia and open source programmers


meta infos for this blog

Source: html

Introduction: Martin Pool and I recently discussed the similarities and differences between academia and open source programming. Similarities: Cost profile Research and programming share approximately the same cost profile: A large upfront effort is required to produce something useful, and then “anyone” can use it. (The “anyone” is not quite right for either group because only sufficiently technical people could use it.) Wealth profile A “wealthy” academic or open source programmer is someone who has contributed a lot to other people in research or programs. Much of academia is a “gift culture”: whoever gives the most is most respected. Problems Both academia and open source programming suffer from similar problems. Whether (and which) open source program is used is perhaps too often driven by personality rather than by capability or usefulness. Similar phenomena can happen in academia with respect to directions of research. Funding is often a problem for


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Martin Pool and I recently discussed the similarities and differences between academia and open source programming. [sent-1, score-1.393]

2 Similarities: Cost profile Research and programming share approximately the same cost profile: A large upfront effort is required to produce something useful, and then “anyone” can use it. [sent-2, score-0.441]

3 Wealth profile A “wealthy” academic or open source programmer is someone who has contributed a lot to other people in research or programs. [sent-4, score-1.163]

4 Much of academia is a “gift culture”: whoever gives the most is most respected. [sent-5, score-0.459]

5 Problems Both academia and open source programming suffer from similar problems. [sent-6, score-1.269]

6 Whether (and which) open source program is used is perhaps too often driven by personality rather than by capability or usefulness. [sent-7, score-1.071]

7 Similar phenomena can happen in academia with respect to directions of research. [sent-8, score-0.454]

8 Academics often invest many hours in writing grants, while open source programmers are often simply not paid. [sent-10, score-1.062]

9 Given the similarities, it is not too surprising that there is significant cooperation between academia and open source programming, and it is relatively common to cross over from one to the other. [sent-13, score-1.184]

10 The differences are perhaps more interesting to examine because they may point out where one group can learn from the other. [sent-14, score-0.251]

11 A few open source projects have achieved significantly larger scales than academia as far as coordination amongst many people over a long time. [sent-15, score-1.602]

12 Groups of people of this scale in academia are typically things like “the ICML community”, or “people working on Bayesian learning”, which are significantly less tightly coupled than any of the above projects. [sent-17, score-0.726]

13 Academia has managed to secure significantly more funding than open source programmers. [sent-19, score-1.128]

14 Part of the reason for better funding in academia is that it has been around longer and so been able to accomplish more. [sent-21, score-0.593]

15 Perhaps governments will start funding open source programming more seriously if they produce an equivalent (with respect to societal impact) of the atom bomb. [sent-22, score-1.236]

16 In contrast, the closest thing to a career path for open source programmers is something like “do a bunch of open source projects and become so wildly successful that some company hires you to do the same thing”. [sent-24, score-2.164]

17 This is a difficult path but perhaps it is slowly becoming easier and there is still much room for improvement. [sent-25, score-0.286]

18 Open source programmers take significantly more advantage of modern tools for communication. [sent-26, score-0.79]

19 Open source programmers have considerably more freedom of location. [sent-29, score-0.696]

20 Academic research is almost always tied to a particular university or lab, while many people who work on open source projects can choose to live essentially anywhere with reasonable internet access. [sent-30, score-0.994]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('academia', 0.403), ('source', 0.371), ('open', 0.361), ('programmers', 0.274), ('funding', 0.19), ('profile', 0.183), ('similarities', 0.169), ('significantly', 0.145), ('path', 0.141), ('programming', 0.134), ('projects', 0.133), ('education', 0.129), ('groups', 0.112), ('career', 0.102), ('martin', 0.102), ('perhaps', 0.095), ('driven', 0.094), ('differences', 0.089), ('people', 0.076), ('academic', 0.068), ('produce', 0.068), ('group', 0.067), ('coordination', 0.061), ('secure', 0.061), ('societal', 0.061), ('tuition', 0.061), ('whoever', 0.056), ('invest', 0.056), ('mozilla', 0.056), ('personality', 0.056), ('wealthy', 0.056), ('cost', 0.056), ('contributed', 0.053), ('culture', 0.053), ('mixed', 0.053), ('tied', 0.053), ('larger', 0.052), ('phenomena', 0.051), ('considerably', 0.051), ('collaborations', 0.051), ('programmer', 0.051), ('coupled', 0.051), ('governments', 0.051), ('grade', 0.051), ('tightly', 0.051), ('anyone', 0.05), ('still', 0.05), ('thing', 0.05), ('relatively', 0.049), ('academics', 0.049)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999976 105 hunch net-2005-08-23-(Dis)similarities between academia and open source programmers

2 0.21163583 36 hunch net-2005-03-05-Funding Research

Introduction: The funding of research (and machine learning research) is an issue which seems to have become more significant in the United States over the last decade. The word “research” is applied broadly here to science, mathematics, and engineering. There are two essential difficulties with funding research: Longshot Paying a researcher is often a big gamble. Most research projects don’t pan out, but a few big payoffs can make it all worthwhile. Information Only Much of research is about finding the right way to think about or do something. The Longshot difficulty means that there is high variance in payoffs. This can be compensated for by funding many different research projects, reducing variance. The Information-Only difficulty means that it’s hard to extract a profit directly from many types of research, so companies have difficulty justifying basic research. (Patents are a mechanism for doing this. They are often extraordinarily clumsy or simply not applicable.) T

3 0.17071652 428 hunch net-2011-03-27-Vowpal Wabbit, v5.1

Introduction: I just created version 5.1 of vowpal wabbit. This is almost entirely a bugfix release, so it’s an easy upgrade from v5.0. In addition: There is now a mailing list, which I and several other developers are subscribed to. The main website has shifted to the wiki on github. This means that anyone with a github account can now edit it. I’m planning to give a tutorial tomorrow on it at eHarmony / the LA machine learning meetup at 10am. Drop by if you’re interested. The status of VW amongst other open source projects has changed. When VW first came out, it was relatively unique amongst existing projects in terms of features. At this point, many other projects have started to appreciate the value of the design choices here. This includes: Mahout, which now has an SGD implementation. Shogun, where Soeren is keen on incorporating features. LibLinear, where they won the KDD best paper award for out-of-core learning. This is expected—any open sourc

4 0.14702578 297 hunch net-2008-04-22-Taking the next step

Introduction: At the last ICML, Tom Dietterich asked me to look into systems for commenting on papers. I’ve been slow getting to this, but it’s relevant now. The essential observation is that we now have many tools for online collaboration, but they are not yet much used in academic research. If we can find the right way to use them, then perhaps great things might happen, with extra kudos to the first conference that manages to really create an online community. Various conferences have been poking at this. For example, UAI has set up a wiki, COLT has started using Joomla, with some dynamic content, and AAAI has been setting up a “student blog”. Similarly, Dinoj Surendran set up a twiki for the Chicago Machine Learning Summer School, which was quite useful for coordinating events and other things. I believe the most important thing is a willingness to experiment. A good place to start seems to be enhancing existing conference websites. For example, the ICML 2007 papers pag

5 0.13804096 132 hunch net-2005-11-26-The Design of an Optimal Research Environment

Introduction: How do you create an optimal environment for research? Here are some essential ingredients that I see. Stability. University-based research is relatively good at this. On any particular day, researchers face choices in what they will work on. A very common tradeoff is between “easy small” and “difficult big”. For researchers without stability, the ‘easy small’ option wins. This is often “ok”—a series of incremental improvements on the state of the art can add up to something very beneficial. However, it misses one of the big potentials of research: finding entirely new and better ways of doing things. Stability comes in many forms. The prototypical example is tenure at a university—a tenured professor is almost impossible to fire which means that the professor has the freedom to consider far horizon activities. An iron-clad guarantee of a paycheck is not necessary—industrial research labs have succeeded well with research positions of indefinite duration. Atnt rese

6 0.13637708 281 hunch net-2007-12-21-Vowpal Wabbit Code Release

7 0.13299237 292 hunch net-2008-03-15-COLT Open Problems

8 0.11711886 344 hunch net-2009-02-22-Effective Research Funding

9 0.11403244 464 hunch net-2012-05-03-Microsoft Research, New York City

10 0.10715757 424 hunch net-2011-02-17-What does Watson mean?

11 0.10045592 449 hunch net-2011-11-26-Giving Thanks

12 0.098960035 215 hunch net-2006-10-22-Exemplar programming

13 0.098885179 429 hunch net-2011-04-06-COLT open questions

14 0.098209992 458 hunch net-2012-03-06-COLT-ICML Open Questions and ICML Instructions

15 0.096164435 48 hunch net-2005-03-29-Academic Mechanism Design

16 0.095995806 154 hunch net-2006-02-04-Research Budget Changes

17 0.094100669 225 hunch net-2007-01-02-Retrospective

18 0.091276474 110 hunch net-2005-09-10-“Failure” is an option

19 0.090796992 30 hunch net-2005-02-25-Why Papers?

20 0.089265034 98 hunch net-2005-07-27-Not goal metrics
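The page does not say how the simValue column is computed. One common choice, and only an assumption here, is the cosine similarity between TF-IDF vectors of the posts. A minimal sketch of that idea follows. The helper names (`tfidf_vectors`, `cosine`), the whitespace tokenizer, the smoothed IDF formula, and the three toy documents are all hypothetical and are not taken from the tool that generated this page.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {word: weight} dict per document (hypothetical helper)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            word: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1)
            for word, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy stand-ins for blog posts (assumed data, not the real corpus).
docs = [
    "academia and open source programming",
    "funding research in academia",
    "vowpal wabbit open source release",
]
vecs = tfidf_vectors(docs)
# Similarity of the first document to every document, itself included.
sims = [cosine(vecs[0], v) for v in vecs]
```

Under this assumption, a post compared with itself scores 1 up to floating-point rounding, which would be consistent with the same-blog row sitting just below 1 in the list above.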


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.18), (1, -0.036), (2, -0.1), (3, 0.127), (4, -0.109), (5, -0.025), (6, 0.007), (7, 0.029), (8, -0.072), (9, 0.089), (10, -0.014), (11, -0.012), (12, -0.033), (13, -0.005), (14, 0.039), (15, -0.009), (16, -0.059), (17, 0.027), (18, -0.095), (19, -0.02), (20, -0.058), (21, 0.108), (22, 0.044), (23, -0.071), (24, 0.003), (25, 0.011), (26, -0.099), (27, 0.168), (28, 0.079), (29, 0.029), (30, 0.089), (31, 0.007), (32, 0.092), (33, 0.011), (34, -0.09), (35, -0.094), (36, 0.009), (37, -0.114), (38, 0.136), (39, -0.023), (40, 0.027), (41, -0.088), (42, 0.15), (43, 0.027), (44, -0.015), (45, -0.057), (46, -0.087), (47, 0.067), (48, 0.047), (49, 0.0)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98484743 105 hunch net-2005-08-23-(Dis)similarities between academia and open source programmers

2 0.58514434 292 hunch net-2008-03-15-COLT Open Problems

Introduction: COLT has a call for open problems due March 21. I encourage anyone with a specifiable open problem to write it down and send it in. Just the effort of specifying an open problem precisely and concisely has been very helpful for my own solutions, and there is a substantial chance others will solve it. To increase the chance someone will take it up, you can even put a bounty on the solution. (Perhaps I should raise the $500 bounty on the K-fold cross-validation problem as it hasn’t yet been solved).

3 0.57810014 36 hunch net-2005-03-05-Funding Research

Introduction: The funding of research (and machine learning research) is an issue which seems to have become more significant in the United States over the last decade. The word “research” is applied broadly here to science, mathematics, and engineering. There are two essential difficulties with funding research: Longshot Paying a researcher is often a big gamble. Most research projects don’t pan out, but a few big payoffs can make it all worthwhile. Information Only Much of research is about finding the right way to think about or do something. The Longshot difficulty means that there is high variance in payoffs. This can be compensated for by funding many different research projects, reducing variance. The Information-Only difficulty means that it’s hard to extract a profit directly from many types of research, so companies have difficulty justifying basic research. (Patents are a mechanism for doing this. They are often extraordinarily clumsy or simply not applicable.) T

4 0.5580892 48 hunch net-2005-03-29-Academic Mechanism Design

Introduction: From game theory, there is a notion of “mechanism design”: setting up the structure of the world so that participants have some incentive to do sane things (rather than obviously counterproductive things). Application of this principle to academic research may be fruitful. What is misdesigned about academic research? The JMLG guides give many hints. The common nature of bad reviewing also suggests the system isn’t working optimally. There are many ways to experimentally “cheat” in machine learning. Funding Prisoner’s Dilemma. Good researchers often write grant proposals for funding rather than doing research. Since the pool of grant money is finite, this means that grant proposals are often rejected, implying that more must be written. This is essentially a “prisoner’s dilemma”: anyone not writing grant proposals loses, but the entire process of doing research is slowed by distraction. If everyone wrote 1/2 as many grant proposals, roughly the same distribution

5 0.55305183 271 hunch net-2007-11-05-CMU wins DARPA Urban Challenge

Introduction: The results have been posted, with CMU first, Stanford second, and Virginia Tech third. Considering that this was an open event (at least for people in the US), this was a very strong showing for research at universities (instead of defense contractors, for example). Some details should become public at the NIPS workshops. Slashdot has a post with many comments.

6 0.55218327 429 hunch net-2011-04-06-COLT open questions

7 0.54282945 29 hunch net-2005-02-25-Solution: Reinforcement Learning with Classification

8 0.53714335 428 hunch net-2011-03-27-Vowpal Wabbit, v5.1

9 0.52898043 154 hunch net-2006-02-04-Research Budget Changes

10 0.52532327 297 hunch net-2008-04-22-Taking the next step

11 0.52256876 132 hunch net-2005-11-26-The Design of an Optimal Research Environment

12 0.51599115 344 hunch net-2009-02-22-Effective Research Funding

13 0.49029958 119 hunch net-2005-10-08-We have a winner

14 0.46645394 449 hunch net-2011-11-26-Giving Thanks

15 0.45647705 50 hunch net-2005-04-01-Basic computer science research takes a hit

16 0.44606802 273 hunch net-2007-11-16-MLSS 2008

17 0.44036669 1 hunch net-2005-01-19-Why I decided to run a weblog.

18 0.43462905 458 hunch net-2012-03-06-COLT-ICML Open Questions and ICML Instructions

19 0.43367708 281 hunch net-2007-12-21-Vowpal Wabbit Code Release

20 0.43046814 128 hunch net-2005-11-05-The design of a computing cluster


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(0, 0.018), (27, 0.15), (34, 0.03), (38, 0.039), (48, 0.02), (53, 0.046), (55, 0.137), (64, 0.028), (68, 0.021), (94, 0.059), (95, 0.163), (96, 0.198)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.89126039 175 hunch net-2006-04-30-John Langford –> Yahoo Research, NY

Introduction: I will join Yahoo Research (in New York) after my contract ends at TTI-Chicago. The deciding reasons are: Yahoo is running into many hard learning problems. This is precisely the situation where basic research might hope to have the greatest impact. Yahoo Research understands research including publishing, conferences, etc… Yahoo Research is growing, so there is a chance I can help it grow well. Yahoo understands the internet, including (but not at all limited to) experimenting with research blogs. In the end, Yahoo Research seems like the place where I might have a chance to make the greatest difference. Yahoo (as a company) has made a strong bet on Yahoo Research. We-the-researchers all hope that bet will pay off, and this seems plausible. I’ll certainly have fun trying.

same-blog 2 0.88953966 105 hunch net-2005-08-23-(Dis)similarities between academia and open source programmers

3 0.87161744 443 hunch net-2011-09-03-Fall Machine Learning Events

Introduction: Many Machine Learning related events are coming up this fall. September 9, abstracts for the New York Machine Learning Symposium are due. Send a 2 page pdf, if interested, and note that we widened submissions to be from anybody rather than students, and set aside a larger fraction of time for contributed submissions. September 15, there is a machine learning meetup, where I’ll be discussing terascale learning at AOL. September 16, there is a CS&Econ day at New York Academy of Sciences. This is not ML focused, but it’s easy to imagine interest. September 23 and later NIPS workshop submissions start coming due. As usual, there are too many good ones, so I won’t be able to attend all those that interest me. I do hope some workshop makers consider ICML this coming summer, as we are increasing to a 2 day format for you. Here are a few that interest me: Big Learning is about dealing with lots of data. Abstracts are due September 30. The Bayes

4 0.81287521 104 hunch net-2005-08-22-Do you believe in induction?

Introduction: Foster Provost gave a talk at the ICML metalearning workshop on “metalearning” and the “no free lunch theorem” which seems worth summarizing. As a review: the no free lunch theorem is the most complicated way we know of to say that a bias is required in order to learn. The simplest way to see this is in a nonprobabilistic setting. If you are given examples of the form (x,y) and you wish to predict y from x then any prediction mechanism errs half the time in expectation over all sequences of examples. The proof of this is very simple: on every example a predictor must make some prediction and by symmetry over the set of sequences it will be wrong half the time and right half the time. The basic idea of this proof has been applied to many other settings. The simplistic interpretation of this theorem which many people jump to is “machine learning is dead” since there can be no single learning algorithm which can solve all learning problems. This is the wrong way to thi

5 0.77597737 53 hunch net-2005-04-06-Structured Regret Minimization

Introduction: Geoff Gordon made an interesting presentation at the snowbird learning workshop discussing the use of no-regret algorithms for several robot-related learning problems. There seems to be a draft here. This seems interesting in two ways: Drawback Removal One of the significant problems with these online algorithms is that they can’t cope with structure very easily. This drawback is addressed for certain structures. Experiments One criticism of such algorithms is that they are too “worst case”. Several experiments suggest that protecting yourself against this worst case does not necessarily incur a great loss.

6 0.76645827 344 hunch net-2009-02-22-Effective Research Funding

7 0.75877196 389 hunch net-2010-02-26-Yahoo! ML events

8 0.75552529 456 hunch net-2012-02-24-ICML+50%

9 0.74588323 30 hunch net-2005-02-25-Why Papers?

10 0.73572838 373 hunch net-2009-10-03-Static vs. Dynamic multiclass prediction

11 0.73530114 234 hunch net-2007-02-22-Create Your Own ICML Workshop

12 0.73428428 464 hunch net-2012-05-03-Microsoft Research, New York City

13 0.73178166 127 hunch net-2005-11-02-Progress in Active Learning

14 0.72721589 466 hunch net-2012-06-05-ICML acceptance statistics

15 0.72075254 462 hunch net-2012-04-20-Both new: STOC workshops and NEML

16 0.70755059 478 hunch net-2013-01-07-NYU Large Scale Machine Learning Class

17 0.7004025 445 hunch net-2011-09-28-Somebody’s Eating Your Lunch

18 0.69589382 132 hunch net-2005-11-26-The Design of an Optimal Research Environment

19 0.69528747 36 hunch net-2005-03-05-Funding Research

20 0.68665946 343 hunch net-2009-02-18-Decision by Vetocracy