hunch_net hunch_net-2005 hunch_net-2005-134 knowledge-graph by maker-knowledge-mining

134 hunch net-2005-12-01-The Webscience Future


meta info for this blog

Source: html

Introduction: The internet has significantly affected the way we do research, but its capabilities have not yet been fully realized. First, let’s acknowledge some known effects. Self-publishing By default, all researchers in machine learning (and more generally computer science and physics) place their papers online for anyone to download. The exact mechanism differs—physicists tend to use a central repository ( Arxiv ) while computer scientists tend to place the papers on their webpage. Arxiv has been slowly growing in subject breadth, so it is now sometimes used by computer scientists. Collaboration Email has enabled working remotely with coauthors. This has allowed collaborations which would not otherwise have been possible and generally speeds research. Now, let’s look at attempts to go further. Blogs (like this one) allow public discussion about topics which are not easily categorized as “a new idea in machine learning” (like this topic). Organization of some subfield


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Self-publishing By default, all researchers in machine learning (and more generally computer science and physics) place their papers online for anyone to download. [sent-3, score-0.583]

2 The exact mechanism differs—physicists tend to use a central repository ( Arxiv ) while computer scientists tend to place the papers on their webpage. [sent-4, score-0.438]

3 This includes Satinder Singh’s Reinforcement Learning pages, and, more generally books that have been placed online such as this one . [sent-11, score-0.373]

4 Class notes have been placed online such as Avrim’s learning theory lecture notes . [sent-13, score-0.472]

5 At some point, we-the-community realize this and begin to emphasize (and credit) information placed in wikipedia. [sent-19, score-0.366]

6 As evidence compare the machine learning page three years ago (yep, it didn’t exist), two years ago , one year ago , and today . [sent-22, score-0.559]

7 There are fundamental obstacles to the success of the wikipedia future. [sent-24, score-0.363]

8 credit Wikipedia has only very weak mechanisms for crediting editors. [sent-25, score-0.213]

9 A list of the changes done by one user account is about as much credit as is available. [sent-26, score-0.223]

10 In science, the thing to worry about is misplaced ideas of the importance of your topic of research since it is very difficult to be sufficiently interested in a research topic and simultaneously view it objectively. [sent-33, score-0.795]

11 Research is about creating new ideas, and the location of these ideas in some general organization is in dispute by default. [sent-34, score-0.454]

12 Conference Organization We realize that having a list of online papers isn’t nearly as useful as having an organized list of online papers so the conferences which already have online proceedings create an explorable topic hierarchy. [sent-36, score-1.483]

13 Time Organization We realize that the organization at one particular year’s conference is sketchy—research is a multiyear endeavor. [sent-37, score-0.439]

14 Consequently, we start adding to last year’s topic hierarchy rather than creating a new one from scratch each year. [sent-38, score-0.532]

15 Transformation We realize that it is better if papers are done in the language of the web. [sent-39, score-0.344]

16 Consolidation We realize that there is a lot of redundancy in two papers on the same or a similar topic. [sent-42, score-0.409]

17 By joining the shared pieces, the contents of both papers can be made clearer. [sent-44, score-0.264]

18 At the end of these steps, creating a paper is simply the process of creating a webpage or altering an existing webpage. [sent-46, score-0.25]

19 It’s easier to author because for most papers much of the “filler” introduction/motivation/definition can be reused from previous papers. [sent-48, score-0.197]

20 Which future comes about is dependent on many things—the decisions of community leaders, enabling ‘math-on-the-web’ technologies, etc…, so it is difficult to predict which future and when it will come about. [sent-54, score-0.208]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('wikipedia', 0.363), ('organization', 0.229), ('realize', 0.21), ('topic', 0.198), ('placed', 0.156), ('progression', 0.146), ('online', 0.144), ('credit', 0.138), ('papers', 0.134), ('article', 0.128), ('creating', 0.125), ('hierarchy', 0.12), ('arxiv', 0.113), ('proceedings', 0.113), ('ago', 0.104), ('ideas', 0.1), ('incentive', 0.097), ('organized', 0.092), ('exist', 0.091), ('let', 0.09), ('years', 0.089), ('computer', 0.086), ('counting', 0.086), ('notes', 0.086), ('research', 0.086), ('list', 0.085), ('steps', 0.084), ('science', 0.082), ('known', 0.079), ('tend', 0.077), ('mechanisms', 0.075), ('future', 0.073), ('generally', 0.073), ('page', 0.069), ('consider', 0.067), ('paths', 0.065), ('hyperlink', 0.065), ('contents', 0.065), ('speeds', 0.065), ('acknowledge', 0.065), ('enabled', 0.065), ('joining', 0.065), ('misplaced', 0.065), ('redundancy', 0.065), ('remotely', 0.065), ('potential', 0.064), ('place', 0.064), ('easier', 0.063), ('greatly', 0.062), ('difficult', 0.062)]
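The word list above is the output of a tf-idf weighting. As a rough illustration of how such term weights and the document similarities in the next section could be computed, here is a minimal pure-Python sketch; the toy corpus, the tokenization, and the exact tf-idf formula are assumptions, since the actual pipeline behind this page is not specified.

```python
# Sketch of tf-idf term weighting and cosine similarity between documents.
# The corpus and the weighting formula (tf * log(N / df)) are illustrative
# assumptions, not the pipeline that produced the numbers above.
import math
from collections import Counter

def tfidf(doc, corpus):
    """Map each term in `doc` (a list of tokens) to tf * idf."""
    tf = Counter(doc)
    n = len(corpus)
    weights = {}
    for term, count in tf.items():
        df = sum(1 for d in corpus if term in d)  # document frequency
        weights[term] = (count / len(doc)) * math.log(n / df)
    return weights

def cosine(a, b):
    """Cosine similarity between two sparse term -> weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = [
    "wikipedia organization of online papers".split(),
    "online collaborative research papers".split(),
    "reinforcement learning benchmarks".split(),
]
w0 = tfidf(corpus[0], corpus)
w1 = tfidf(corpus[1], corpus)
print(round(cosine(w0, w1), 3))  # shared rare terms drive the score up
```

Terms that appear in every document get idf = log(N/N) = 0 and drop out, which is why generic words do not dominate lists like the one above.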

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 134 hunch net-2005-12-01-The Webscience Future


2 0.30018416 208 hunch net-2006-09-18-What is missing for online collaborative research?

Introduction: The internet has recently made the research process much smoother: papers are easy to obtain, citations are easy to follow, and unpublished “tutorials” are often available. Yet, new research fields can look very complicated to outsiders or newcomers. Every paper is like a small piece of an unfinished jigsaw puzzle: to understand just one publication, a researcher without experience in the field will typically have to follow several layers of citations, and many of the papers he encounters have a great deal of repeated information. Furthermore, from one publication to the next, notation and terminology may not be consistent which can further confuse the reader. But the internet is now proving to be an extremely useful medium for collaboration and knowledge aggregation. Online forums allow users to ask and answer questions and to share ideas. The recent phenomenon of Wikipedia provides a proof-of-concept for the “anyone can edit” system. Can such models be used to facilitate research a

3 0.15925848 132 hunch net-2005-11-26-The Design of an Optimal Research Environment

Introduction: How do you create an optimal environment for research? Here are some essential ingredients that I see. Stability . University-based research is relatively good at this. On any particular day, researchers face choices in what they will work on. A very common tradeoff is between: easy small difficult big For researchers without stability, the ‘easy small’ option wins. This is often “ok”—a series of incremental improvements on the state of the art can add up to something very beneficial. However, it misses one of the big potentials of research: finding entirely new and better ways of doing things. Stability comes in many forms. The prototypical example is tenure at a university—a tenured professor is almost impossible to fire which means that the professor has the freedom to consider far horizon activities. An iron-clad guarantee of a paycheck is not necessary—industrial research labs have succeeded well with research positions of indefinite duration. Atnt rese

4 0.15232629 343 hunch net-2009-02-18-Decision by Vetocracy

Introduction: Few would mistake the process of academic paper review for a fair process, but sometimes the unfairness seems particularly striking. This is most easily seen by comparison: Paper Banditron Offset Tree Notes Problem Scope Multiclass problems where only the loss of one choice can be probed. Strictly greater: Cost sensitive multiclass problems where only the loss of one choice can be probed. Often generalizations don’t matter. That’s not the case here, since every plausible application I’ve thought of involves loss functions substantially different from 0/1. What’s new Analysis and Experiments Algorithm, Analysis, and Experiments As far as I know, the essence of the more general problem was first stated and analyzed with the EXP4 algorithm (page 16) (1998). It’s also the time horizon 1 simplification of the Reinforcement Learning setting for the random trajectory method (page 15) (2002). The Banditron algorithm itself is functionally identi

5 0.14627956 288 hunch net-2008-02-10-Complexity Illness

Introduction: One of the enduring stereotypes of academia is that people spend a great deal of intelligence, time, and effort finding complexity rather than simplicity. This is at least anecdotally true in my experience. Math++ Several people have found that adding useless math makes their paper more publishable as evidenced by a reject-add-accept sequence. 8 page minimum Who submitted a paper to ICML violating the 8 page minimum? Every author fears that the reviewers won’t take their work seriously unless the allowed length is fully used. The best minimum violation I know is Adam ‘s paper at SODA on generating random factored numbers , but this is deeply exceptional. It’s a fair bet that 90% of papers submitted are exactly at the page limit. We could imagine that this is because papers naturally take more space, but few people seem to be clamoring for more space. Journalong Has anyone been asked to review a 100 page journal paper? I have. Journal papers can be nice, becaus

6 0.13897082 116 hunch net-2005-09-30-Research in conferences

7 0.13863362 437 hunch net-2011-07-10-ICML 2011 and the future

8 0.1281677 333 hunch net-2008-12-27-Adversarial Academia

9 0.1269279 454 hunch net-2012-01-30-ICML Posters and Scope

10 0.12255856 233 hunch net-2007-02-16-The Forgetting

11 0.12116414 318 hunch net-2008-09-26-The SODA Program Committee

12 0.12096928 193 hunch net-2006-07-09-The Stock Prediction Machine Learning Problem

13 0.12022533 30 hunch net-2005-02-25-Why Papers?

14 0.12000977 296 hunch net-2008-04-21-The Science 2.0 article

15 0.11930045 452 hunch net-2012-01-04-Why ICML? and the summer conferences

16 0.11894091 207 hunch net-2006-09-12-Incentive Compatible Reviewing

17 0.11759284 378 hunch net-2009-11-15-The Other Online Learning

18 0.11693535 98 hunch net-2005-07-27-Not goal metrics

19 0.11313994 484 hunch net-2013-06-16-Representative Reviewing

20 0.11165247 267 hunch net-2007-10-17-Online as the new adjective


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.29), (1, -0.108), (2, -0.0), (3, 0.135), (4, -0.029), (5, 0.006), (6, -0.004), (7, -0.01), (8, -0.028), (9, 0.07), (10, 0.047), (11, -0.007), (12, -0.001), (13, -0.038), (14, 0.03), (15, -0.016), (16, -0.069), (17, 0.014), (18, 0.063), (19, 0.02), (20, 0.013), (21, -0.07), (22, -0.083), (23, 0.007), (24, 0.072), (25, -0.02), (26, 0.064), (27, 0.133), (28, -0.062), (29, -0.041), (30, 0.051), (31, 0.071), (32, 0.102), (33, -0.083), (34, 0.015), (35, 0.008), (36, 0.063), (37, 0.034), (38, -0.045), (39, 0.048), (40, -0.014), (41, 0.119), (42, -0.026), (43, -0.074), (44, -0.074), (45, -0.054), (46, 0.038), (47, 0.078), (48, 0.056), (49, 0.035)]
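The topic weights above come from an lsi (latent semantic indexing) model. A hedged sketch of the standard construction, a truncated SVD of a term-document matrix, is below; the toy matrix and the rank k are chosen purely for illustration and are not the actual model behind these numbers.

```python
# Sketch of LSI: factor a term-document count matrix with SVD, keep the
# top-k singular directions, and compare documents in that topic space.
# The matrix and k = 2 are illustrative assumptions.
import numpy as np

# rows = terms, columns = documents (toy counts)
td = np.array([
    [2, 0, 1],   # e.g. "wikipedia"
    [1, 2, 0],   # e.g. "papers"
    [0, 1, 2],   # e.g. "learning"
], dtype=float)

U, s, Vt = np.linalg.svd(td, full_matrices=False)
k = 2                                        # number of latent topics kept
doc_topics = (np.diag(s[:k]) @ Vt[:k]).T     # each row: a doc's topic weights

def lsi_sim(i, j):
    """Cosine similarity between documents i and j in topic space."""
    a, b = doc_topics[i], doc_topics[j]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(lsi_sim(0, 0))  # a document is maximally similar to itself
```

The simIndex/simValue pairs in the list below are exactly this kind of cosine score in a low-rank topic space, which is why the same-blog entry sits near 1.0.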

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96732628 134 hunch net-2005-12-01-The Webscience Future


2 0.87637174 208 hunch net-2006-09-18-What is missing for online collaborative research?


3 0.75815898 1 hunch net-2005-01-19-Why I decided to run a weblog.

Introduction: I have decided to run a weblog on machine learning and learning theory research. Here are some reasons: 1) Weblogs enable new functionality: Public comment on papers. No mechanism for this exists at conferences and most journals. I have encountered it once for a science paper. Some communities have mailing lists supporting this, but not machine learning or learning theory. I have often read papers and found myself wishing there was some method to consider others’ questions and read the replies. Conference shortlists. One of the most common conversations at a conference is “what did you find interesting?” There is no explicit mechanism for sharing this information at conferences, and it’s easy to imagine that it would be handy to do so. Evaluation and comment on research directions. Papers are almost exclusively about new research, rather than evaluation (and consideration) of research directions. This last role is satisfied by funding agencies to some extent, but

4 0.68871015 30 hunch net-2005-02-25-Why Papers?

Introduction: Makc asked a good question in comments—”Why bother to make a paper, at all?” There are several reasons for writing papers which may not be immediately obvious to people not in academia. The basic idea is that papers have considerably more utility than the obvious “present an idea”. Papers are formalized units of work. Academics (especially young ones) are often judged on the number of papers they produce. Papers have a formalized method of citing and crediting others—the bibliography. Academics (especially older ones) are often judged on the number of citations they receive. Papers enable a “more fair” anonymous review. Conferences receive many papers, from which a subset are selected. Discussion forums are inherently not anonymous for anyone who wants to build a reputation for good work. Papers are an excuse to meet your friends. Papers are the content of conferences, but much of what you do is talk to friends about interesting problems while there. Sometimes yo

5 0.67180651 288 hunch net-2008-02-10-Complexity Illness


6 0.62839299 233 hunch net-2007-02-16-The Forgetting

7 0.62447184 106 hunch net-2005-09-04-Science in the Government

8 0.60793579 98 hunch net-2005-07-27-Not goal metrics

9 0.6013813 333 hunch net-2008-12-27-Adversarial Academia

10 0.59528863 297 hunch net-2008-04-22-Taking the next step

11 0.58999753 231 hunch net-2007-02-10-Best Practices for Collaboration

12 0.58069599 306 hunch net-2008-07-02-Proprietary Data in Academic Research?

13 0.57518226 485 hunch net-2013-06-29-The Benefits of Double-Blind Review

14 0.57076746 193 hunch net-2006-07-09-The Stock Prediction Machine Learning Problem

15 0.56516945 296 hunch net-2008-04-21-The Science 2.0 article

16 0.56399179 323 hunch net-2008-11-04-Rise of the Machines

17 0.56209391 241 hunch net-2007-04-28-The Coming Patent Apocalypse

18 0.55174077 378 hunch net-2009-11-15-The Other Online Learning

19 0.55104929 363 hunch net-2009-07-09-The Machine Learning Forum

20 0.54684585 116 hunch net-2005-09-30-Research in conferences


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(3, 0.053), (25, 0.218), (27, 0.203), (38, 0.044), (53, 0.111), (55, 0.108), (67, 0.018), (90, 0.028), (94, 0.074), (95, 0.068)]
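The lda list ranks blogs by how similar their topic distributions are. One common way to compare two sparse topicId -> topicWeight maps like the one above is Hellinger distance; the sketch below uses illustrative toy distributions (the second one is invented), not the actual model output, and the real pipeline may use a different similarity measure.

```python
# Sketch: Hellinger distance between two sparse topic -> weight dicts,
# a standard way to compare LDA topic distributions. The `other`
# distribution is a made-up example for illustration.
import math

def hellinger(p, q):
    """Hellinger distance between two sparse topic -> weight dicts."""
    topics = sorted(set(p) | set(q))  # sorted for a deterministic sum
    s = sum((math.sqrt(p.get(t, 0.0)) - math.sqrt(q.get(t, 0.0))) ** 2
            for t in topics)
    return math.sqrt(s / 2.0)

this_blog = {3: 0.053, 25: 0.218, 27: 0.203, 53: 0.111, 55: 0.108}
other     = {3: 0.050, 25: 0.220, 27: 0.200, 55: 0.100, 94: 0.070}
print(round(hellinger(this_blog, other), 3))  # small distance => similar
```

A distance of 0 means identical topic mixtures; ranking candidate blogs by ascending distance (or by a cosine on the same vectors) yields lists like the one below.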

similar blogs list:

simIndex simValue blogId blogTitle

1 0.92186683 178 hunch net-2006-05-08-Big machine learning

Introduction: According to the New York Times , Yahoo is releasing Project Panama shortly . Project Panama is about better predicting which advertisements are relevant to a search, implying a higher click through rate, implying larger income for Yahoo . There are two things that seem interesting here: A significant portion of that improved accuracy is almost certainly machine learning at work. The quantitative effect is huge—the estimate in the article is $600*10^6 . Google already has such improvements and Microsoft Search is surely working on them, which suggest this is (perhaps) a $10^9 per year machine learning problem. The exact methodology under use is unlikely to be publicly discussed in the near future because of the competitive environment. Hopefully we’ll have some public “war stories” at some point in the future when this information becomes less sensitive. For now, it’s reassuring to simply note that machine learning is having a big impact.

same-blog 2 0.89640182 134 hunch net-2005-12-01-The Webscience Future


3 0.88280433 163 hunch net-2006-03-12-Online learning or online preservation of learning?

Introduction: In the online learning with experts setting, you observe a set of predictions, make a decision, and then observe the truth. This process repeats indefinitely. In this setting, it is possible to prove theorems of the sort: master algorithm error count <= k * best predictor error count + c * log(number of predictors) Is this a statement about learning or about preservation of learning? We did some experiments to analyze the new Binning algorithm which works in this setting. For several UCI datasets, we reprocessed them so that features could be used as predictors and then applied several master algorithms. The first graph confirms that Binning is indeed a better algorithm according to the tightness of the upper bound. Here, “Best” is the performance of the best expert. “V. Bound” is the bound for Vovk ‘s algorithm (the previous best). “Bound” is the bound for the Binning algorithm. “Binning” is the performance of the Binning algorithm. The Binning algorithm clearly h

4 0.8804521 148 hunch net-2006-01-13-Benchmarks for RL

Introduction: A couple years ago, Drew Bagnell and I started the RLBench project to set up a suite of reinforcement learning benchmark problems. We haven’t been able to touch it (due to lack of time) for a year so the project is on hold. Luckily, there are several other projects such as CLSquare and RL-Glue with a similar goal, and we strongly endorse their continued development. I would like to explain why, especially in the context of criticism of other learning benchmarks. For example, sometimes the UCI Machine Learning Repository is criticized. There are two criticisms I know of: Learning algorithms have overfit to the problems in the repository. It is easy to imagine a mechanism for this happening unintentionally. Strong evidence of this would be provided by learning algorithms which perform great on the UCI machine learning repository but very badly (relative to other learning algorithms) on non-UCI learning problems. I have seen little evidence of this but it remains a po

5 0.76014102 333 hunch net-2008-12-27-Adversarial Academia

Introduction: One viewpoint on academia is that it is inherently adversarial: there are finite research dollars, positions, and students to work with, implying a zero-sum game between different participants. This is not a viewpoint that I want to promote, as I consider it flawed. However, I know several people believe strongly in this viewpoint, and I have found it to have substantial explanatory power. For example: It explains why your paper was rejected based on poor logic. The reviewer wasn’t concerned with research quality, but rather with rejecting a competitor. It explains why professors rarely work together. The goal of a non-tenured professor (at least) is to get tenure, and a case for tenure comes from a portfolio of work that is indisputably yours. It explains why new research programs are not quickly adopted. Adopting a competitor’s program is impossible, if your career is based on the competitor being wrong. Different academic groups subscribe to the adversarial viewp

6 0.75551003 478 hunch net-2013-01-07-NYU Large Scale Machine Learning Class

7 0.75147229 437 hunch net-2011-07-10-ICML 2011 and the future

8 0.75005507 370 hunch net-2009-09-18-Necessary and Sufficient Research

9 0.74953896 141 hunch net-2005-12-17-Workshops as Franchise Conferences

10 0.74684393 194 hunch net-2006-07-11-New Models

11 0.74537897 132 hunch net-2005-11-26-The Design of an Optimal Research Environment

12 0.74474168 403 hunch net-2010-07-18-ICML & COLT 2010

13 0.74329752 466 hunch net-2012-06-05-ICML acceptance statistics

14 0.74288017 225 hunch net-2007-01-02-Retrospective

15 0.74251771 358 hunch net-2009-06-01-Multitask Poisoning

16 0.74135071 297 hunch net-2008-04-22-Taking the next step

17 0.74132347 207 hunch net-2006-09-12-Incentive Compatible Reviewing

18 0.74086547 95 hunch net-2005-07-14-What Learning Theory might do

19 0.74079424 343 hunch net-2009-02-18-Decision by Vetocracy

20 0.74006307 286 hunch net-2008-01-25-Turing’s Club for Machine Learning