hunch_net hunch_net-2009 knowledge-graph by maker-knowledge-mining

hunch_net 2009 knowledge graph



blogs list:

1 hunch net-2009-12-27-Interesting things at NIPS 2009

Introduction: Several papers at NIPS caught my attention. Elad Hazan and Satyen Kale, Online Submodular Optimization. They define an algorithm for online optimization of submodular functions with regret guarantees. This places submodular optimization roughly on par with online convex optimization as tractable settings for online learning. Elad Hazan and Satyen Kale, On Stochastic and Worst-Case Models of Investing. At its core, this is yet another example of modifying worst-case online learning to deal with variance, but the application to financial models is particularly cool and it seems plausibly superior to other common approaches for financial modeling. Mark Palatucci, Dean Pomerleau, Tom Mitchell, and Geoff Hinton, Zero Shot Learning with Semantic Output Codes. The goal here is predicting a label in a multiclass supervised setting where the label never occurs in the training data. They have some basic analysis and also a nice application to fMRI brain reading. Sh

2 hunch net-2009-12-24-Top graduates this season

Introduction: I would like to point out 3 graduates this season as having my confidence that they are capable of doing great things. Daniel Hsu has diverse papers with diverse coauthors on {active learning, multilabeling, temporal learning, …} each covering new algorithms and methods of analysis. He is also a capable programmer, having helped me with some nitty-gritty details of cluster parallel Vowpal Wabbit this summer. He has an excellent tendency to just get things done. Nicolas Lambert doesn’t nominally work in machine learning, but I’ve found his work in elicitation relevant nevertheless. In essence, elicitable properties are closely related to learnable properties, and the elicitation complexity is related to a notion of learning complexity. See the Surrogate regret bounds paper for some related discussion. Few people successfully work at such a general level that it crosses fields, but he’s one of them. Yisong Yue is deeply focused on interactive learning, which he has a

3 hunch net-2009-12-09-Inherent Uncertainty

Introduction: I’d like to point out Inherent Uncertainty, which I’ve added to the ML blog post scanner on the right. My understanding from Jake is that the intention is to have a multiauthor blog which is more specialized towards learning theory/game theory than this one. Nevertheless, several of the posts seem to be of wider interest.

4 hunch net-2009-12-09-Future Publication Models @ NIPS

Introduction: Yesterday, there was a discussion about future publication models at NIPS. Yann and Zoubin have specific detailed proposals which I’ll add links to when I get them (Yann’s proposal and Zoubin’s proposal). What struck me about the discussion is that there are many simultaneous concerns as well as many simultaneous proposals, which makes it difficult to keep all the distinctions straight in a verbal conversation. It also seemed like people were serious enough about this that we may see some real movement. Certainly, my personal experience motivates that, as I’ve posted many times about the substantial flaws in our review process, including some very poor personal experiences. Concerns include the following: (Several) Reviewers are overloaded, boosting the noise in decision making. (Yann) A new system should run with as little built-in delay and friction to the process of research as possible. (Hanna Wallach (updated)) Double-blind review is particularly impor

5 hunch net-2009-12-07-Vowpal Wabbit version 4.0, and a NIPS heresy

Introduction: I’m releasing version 4.0 (tarball) of Vowpal Wabbit. The biggest change (by far) in this release is experimental support for cluster parallelism, with notable help from Daniel Hsu. I also took advantage of the major version number to introduce some incompatible changes, including switching to murmurhash 2, and other alterations to cachefiles. You’ll need to delete and regenerate them. In addition, the precise specification for a “tag” (i.e. string that can be used to identify an example) changed: you can’t have a space between the tag and the ‘|’ at the beginning of the feature namespace. And, of course, we made it faster. For the future, I put up my todo list outlining the major future improvements I want to see in the code. I’m planning to discuss the current mechanism and results of the cluster parallel implementation at the large scale machine learning workshop at NIPS later this week. Several people have asked me to do a tutorial/walkthrough of VW, wh
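
The tag rule mentioned in this introduction can be illustrated with a small sketch. The helper name `format_vw_example`, the feature names, and the `label tag|namespace feature:value` layout are assumptions made for illustration, based on the general shape of the VW text input format; the post itself only states that no space may separate the tag from the ‘|’.

```python
def format_vw_example(label, tag, namespace, features):
    """Build one VW-style text example (hypothetical helper, not part of VW)."""
    # Render each feature as name:value, separated by single spaces.
    feats = " ".join(f"{name}:{value}" for name, value in features.items())
    # The tag is written directly against '|', with no space in between,
    # which is the rule the post describes for version 4.0.
    return f"{label} {tag}|{namespace} {feats}"


# Illustrative values only; none of these names come from the post.
line = format_vw_example(1, "example_17", "words", {"online": 1.0, "learning": 2.0})
print(line)
```

Writing the tag and the ‘|’ as one token keeps the tag from being parsed as something else, which appears to be the point of the stricter specification.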

6 hunch net-2009-11-29-AI Safety

Introduction: Dan Reeves introduced me to Michael Vassar, who ran the Singularity Summit and educated me a bit on the subject of AI safety, which the Singularity Institute has small grants for. I still believe that interstellar space travel is necessary for long term civilization survival, and that AI is necessary for interstellar space travel. On these grounds alone, we could judge that developing AI is much safer than not. Nevertheless, there is a basic reasonable fear, as expressed by some commenters, that AI could go bad. A basic scenario starts with someone inventing an AI and telling it to make as much money as possible. The AI promptly starts trading in various markets to make money. To improve, it crafts a virus that takes over most of the world’s computers, using them as a surveillance network so that it can always make the right decision. The AI also branches out into any form of distance work, taking over the entire outsourcing process for all jobs that are entirely di

7 hunch net-2009-11-23-ICML 2009 Workshops (and Tutorials)

Introduction: I’m the workshops chair for ICML this year. As such, I would like to personally encourage people to consider running a workshop. My general view of workshops is that they are excellent as opportunities to discuss and develop research directions; some of my best work has come from collaborations at workshops and several workshops have substantially altered my thinking about various problems. My experience running workshops is that setting them up and making them fly often appears much harder than it actually is, and the workshops often come off much better than expected in the end. Submissions are due January 18, two weeks before papers. Similarly, Ben Taskar is looking for good tutorials, which is complementary. Workshops are about exploring a subject, while a tutorial is about distilling it down into an easily taught essence, a vital part of the research process. Tutorials are due February 13, two weeks after papers.

8 hunch net-2009-11-15-The Other Online Learning

Introduction: If you search for “online learning” with any major search engine, it’s interesting to note that zero of the results are for online machine learning. This may not be a mistake if you are committed to a global ordering. In other words, the number of people specifically interested in the least interesting top-10 online human learning result might exceed the number of people interested in online machine learning, even given the presence of the other 9 results. The essential observation here is that the process of human learning is a big business (around 5% of GDP) affecting virtually everyone. The internet is changing this dramatically, by altering the economics of teaching. Consider two possibilities: The classroom-style teaching environment continues as is, with many teachers for the same subject. All the teachers for one subject get together, along with perhaps a factor of 2 more people who are experts in online delivery. They spend a factor of 4 more time designing

9 hunch net-2009-11-09-NYAS ML Symposium this year.

Introduction: The NYAS ML symposium grew again this year to 170 participants, despite the need to outsmart or otherwise tunnel through a crowd. Perhaps the most distinct talk was by Bob Bell on various aspects of the Netflix prize competition. I also enjoyed several student posters including Matt Hoffman’s cool examples of blind source separation for music. I’m somewhat surprised how much the workshop has grown, as it is now comparable in size to a small conference, although in style more similar to a workshop. At some point as an event grows, it becomes owned by the community rather than the organizers, so if anyone has suggestions on improving it, speak up and be heard.

10 hunch net-2009-11-06-Yisong Yue on Self-improving Systems

Introduction: I’d like to point out Yisong Yue’s post on Self-improving systems, which is a nicely readable description of the necessity and potential of interactive learning to deal with the information overload problem that is endemic to the modern internet.

11 hunch net-2009-10-26-NIPS workshops

Introduction: Many of the NIPS workshops have a deadline about now, and the NIPS early registration deadline is Nov. 6. Several interest me: Adaptive Sensing, Active Learning, and Experimental Design, due 10/27. Discrete Optimization in Machine Learning: Submodularity, Sparsity & Polyhedra, due Nov. 6. Large-Scale Machine Learning: Parallelism and Massive Datasets, due 10/23 (i.e. past). Analysis and Design of Algorithms for Interactive Machine Learning, due 10/30. And I’m sure many of the others interest others. Workshops are great as a mechanism for research, so take a look if there is any chance you might be interested.

12 hunch net-2009-10-10-ALT 2009

Introduction: I attended ALT (“Algorithmic Learning Theory”) for the first time this year. My impression is ALT = 0.5 COLT, by attendance and also by some more intangible “what do I get from it?” measure. There are many differences which can’t quite be described this way though. The program for ALT seems to be substantially more diverse than COLT, which is both a weakness and a strength. One paper that might interest people generally is: Alexey Chernov and Vladimir Vovk, Prediction with Expert Evaluators’ Advice. The basic observation here is that in the online learning with experts setting you can compete with several compatible loss functions simultaneously. Restated, debating between competing with log loss and squared loss is a waste of breath, because it’s almost free to compete with them both simultaneously. This might interest anyone who has run into “which loss function?” debates that come up periodically.

13 hunch net-2009-10-03-Static vs. Dynamic multiclass prediction

Introduction: I have had interesting discussions about the distinction between static and dynamic classes with Kishore and Hal. The distinction arises in multiclass prediction settings. A static set of classes is given by a set of labels {1,…,k} and the goal is generally to choose the most likely label given features. The static approach is the one that we typically analyze and think about in machine learning. The dynamic setting is one that is often used in practice. The basic idea is that the number of classes is not fixed, varying on a per example basis. These different classes are generally defined by a choice of features. The distinction between these two settings, as far as theory goes, appears to be very substantial. For example, in the static setting, in learning reductions land, we have techniques now for robust O(log(k)) time prediction in many multiclass setting variants. In the dynamic setting, the best techniques known are O(k), and furthermore this exponential
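
The O(log(k)) versus O(k) contrast in this introduction can be made concrete by counting predictor calls. The sketch below is only an illustration of that counting argument, not the learning-reductions techniques the post refers to; the function names, the `go_right` and `score` callables, and the toy inputs are all hypothetical, and labels are numbered 0..k-1 rather than 1..k for convenience.

```python
def predict_static_tree(x, k, go_right):
    """Choose a label in 0..k-1 by halving the label range: about log2(k) calls."""
    lo, hi = 0, k  # candidate labels are the half-open range [lo, hi)
    calls = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        calls += 1
        # go_right is a hypothetical binary predictor for this tree node.
        if go_right(x, lo, mid, hi):
            lo = mid
        else:
            hi = mid
    return lo, calls


def predict_dynamic(candidates, score):
    """Choose among per-example candidates by scoring every one: k calls."""
    best_index, best_score, calls = None, float("-inf"), 0
    for i, feats in enumerate(candidates):
        calls += 1
        s = score(feats)
        if s > best_score:
            best_index, best_score = i, s
    return best_index, calls


# Toy static case: 8 fixed labels, and a node predictor that compares x to mid.
label, static_calls = predict_static_tree(5, 8, lambda x, lo, mid, hi: x >= mid)

# Toy dynamic case: each candidate class is described by its own features.
choice, dynamic_calls = predict_dynamic([[0.1], [0.9], [0.4]], lambda f: f[0])
print(label, static_calls, choice, dynamic_calls)
```

In the static sketch the fixed label set is what allows a tree to be built in advance; in the dynamic sketch the candidates arrive with the example, so there is nothing to organize ahead of time and every candidate is scored.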

14 hunch net-2009-09-29-Machine Learning Protests at the G20

Introduction: The machine learning department at CMU turned out en masse to protest the G20 summit in Pittsburgh. Arthur Gretton uploaded some great photos covering the event.

15 hunch net-2009-09-21-Netflix finishes (and starts)

Introduction: I attended the Netflix prize ceremony this morning. The press conference part is covered fine elsewhere, with the basic outcome being that BellKor’s Pragmatic Chaos won over The Ensemble by 15-20 minutes, because they were tied in performance on the ultimate holdout set. I’m sure the individual participants will have many chances to speak about the solution. One of these is Bell at the NYAS ML symposium on Nov. 6. Several additional details may interest ML people. The degree of overfitting exhibited by the difference in performance on the leaderboard test set and the ultimate holdout set was small, but determining at .02 to .03%. A tie was possible, because the rules cut off measurements below the fourth digit based on significance concerns. In actuality, of course, the scores do differ before rounding, but everyone I spoke to claimed not to know how. The complete dataset has been released on UCI, so each team could compute their own score to whatever accu

16 hunch net-2009-09-18-Necessary and Sufficient Research

Introduction: Researchers are typically confronted with big problems that they have no idea how to solve. In trying to come up with a solution, a natural approach is to decompose the big problem into a set of subproblems whose solution yields a solution to the larger problem. This approach can go wrong in several ways. Decomposition failure. The solution to the decomposition does not in fact yield a solution to the overall problem. Artificial hardness. The subproblems created are sufficient if solved to solve the overall problem, but they are harder than necessary. As you can see, computational complexity forms a relatively new (in research-history) razor by which to judge an approach sufficient but not necessary. In my experience, the artificial hardness problem is very common. Many researchers abdicate the responsibility of choosing a problem to work on to other people. This process starts very naturally as a graduate student, when an incoming student might have relatively l

17 hunch net-2009-08-27-New York Area Machine Learning Events

Introduction: Several events are happening in the NY area. Barriers in Computational Learning Theory Workshop, Aug 28. That’s tomorrow near Princeton. I’m looking forward to speaking at this one on “Getting around Barriers in Learning Theory”, but several other talks are of interest, particularly to the CS theory inclined. Claudia Perlich is running the INFORMS Data Mining Contest with a deadline of Sept. 25. This is a contest using real health record data (they partnered with HealthCare Intelligence) to predict transfers and mortality. In the current US health care reform debate, the case studies of high costs we hear strongly suggest machine learning & statistics can save many billions. The Singularity Summit, October 3&4. This is for the AIists out there. Several of the talks look interesting, although unfortunately I’ll miss it for ALT. Predictive Analytics World, Oct 20-21. This is stretching the definition of “New York Area” a bit, but the train to DC is reasonable.

18 hunch net-2009-08-26-Another 10-year paper in Machine Learning

Introduction: When I was thinking about the best “10 year paper” for ICML, I also took a look at a few other conferences. Here is one from 10 years ago that interested me: David McAllester, PAC-Bayesian Model Averaging, COLT 1999. 2001 Journal Draft. Prior to this paper, the only mechanism known for controlling or estimating the necessary sample complexity for learning over continuously parameterized predictors was VC theory and variants, all of which suffered from a basic problem: they were incredibly pessimistic in practice. This meant that only very gross guidance could be provided for learning algorithm design. The PAC-Bayes bound provided an alternative approach to sample complexity bounds which was radically tighter, quantitatively. It also imported and explained many of the motivations for Bayesian learning in a way that learning theory and perhaps optimization people might appreciate. Since this paper came out, there have been a number of moderately successful attempts t

19 hunch net-2009-08-16-Centmail comments

Introduction: Centmail is a scheme which makes charity donations have a secondary value, as a stamp for email. When discussed on newscientist, slashdot, and others, some of the comments make the academic review process appear thoughtful. Some prominent fallacies are: Costing money fallacy. Some commenters appear to believe the system charges money per email. Instead, the basic idea is that users get an extra benefit from donations to a charity and participation is strictly voluntary. The solution to this fallacy is simply reading the details. Single solution fallacy. Some commenters seem to think this is proposed as a complete solution to spam, and since not everyone will opt to participate, it won’t work. But a complete solution is not at all necessary or even possible given the flag-day problem. Deployed machine learning systems for fighting spam are great at taking advantage of a partial solution. The solution to this fallacy is learning about machine learning. In the

20 hunch net-2009-08-03-Carbon in Computer Science Research

Introduction: Al Gore’s film and gradually more assertive and thorough science have managed to mostly shift the debate on climate change from “Is it happening?” to “What should be done?” In that context, it’s worthwhile to think a bit about what can be done within computer science research. There are two things we can think about: Doing Research. At a cartoon level, computer science research consists of some combination of commuting to and from work, writing programs, running them on computers, writing papers, and presenting them at conferences. A typical computer has a power usage on the order of 100 Watts, which works out to 2.4 kilowatt-hours/day. Looking up David MacKay’s reference on power usage per person, it becomes clear that this is a relatively minor part of the lifestyle, although it could become substantial if many more computers are required. Much larger costs are associated with commuting (which is in common with many people) and attending conferences. Since local commuti
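
The power figure in this introduction follows from a one-line unit conversion, sketched here. The function name `daily_energy_kwh` is hypothetical, and the sketch assumes the computer draws its 100 Watts continuously for all 24 hours, which is the assumption implied by the 2.4 figure.

```python
def daily_energy_kwh(power_watts, hours_per_day=24):
    """Convert a constant power draw in watts to energy per day in kWh."""
    # watts * hours gives watt-hours; dividing by 1000 gives kilowatt-hours.
    return power_watts * hours_per_day / 1000


kwh_per_day = daily_energy_kwh(100)
print(kwh_per_day)
```

A machine that is powered down part of the day would use proportionally less, so 2.4 is best read as an upper estimate for a single always-on computer at that power draw.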

21 hunch net-2009-07-31-Vowpal Wabbit Open Source Project

22 hunch net-2009-07-11-Interesting papers at KDD

23 hunch net-2009-07-09-The Machine Learning Forum

24 hunch net-2009-06-26-Netflix nearly done

25 hunch net-2009-06-24-Interesting papers at UAICMOLT 2009

26 hunch net-2009-06-15-In Active Learning, the question changes

27 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models

28 hunch net-2009-06-01-Multitask Poisoning

29 hunch net-2009-05-30-Many ways to Learn this summer

30 hunch net-2009-05-24-2009 ICML discussion site

31 hunch net-2009-05-19-CI Fellows

32 hunch net-2009-05-17-Server Update

33 hunch net-2009-05-08-Computability in Artificial Intelligence

34 hunch net-2009-05-06-Machine Learning to AI

35 hunch net-2009-05-02-Wielding a New Abstraction

36 hunch net-2009-04-23-Jonathan Chang at Slycoder

37 hunch net-2009-04-21-Interesting Presentations at Snowbird

38 hunch net-2009-04-02-Asymmophobia

39 hunch net-2009-03-26-Machine Learning is too easy

40 hunch net-2009-03-18-Parallel ML primitives

41 hunch net-2009-03-08-Prediction Science

42 hunch net-2009-02-22-Effective Research Funding

43 hunch net-2009-02-18-Decision by Vetocracy

44 hunch net-2009-02-16-KDNuggets

45 hunch net-2009-02-04-Optimal Proxy Loss for Classification

46 hunch net-2009-01-28-Nielsen’s talk

47 hunch net-2009-01-27-Key Scientific Challenges

48 hunch net-2009-01-23-An Active Learning Survey

49 hunch net-2009-01-21-Nearly all natural problems require nonlinearity

50 hunch net-2009-01-19-Netflix prize within epsilon

51 hunch net-2009-01-08-Predictive Analytics World

52 hunch net-2009-01-07-Interesting Papers at SODA 2009