hunch_net hunch_net-2005 knowledge-graph by maker-knowledge-mining
1 hunch net-2005-12-29-Deadline Season
Introduction: Many different paper deadlines are coming up soon so I made a little reference table. Out of curiosity, I also computed the interval between submission deadline and conference.

Conference   Location       Date           Deadline                 Interval (days)
COLT         Pittsburgh     June 22-25     January 21               152
ICML         Pittsburgh     June 26-28     January 30/February 6    140
UAI          MIT            July 13-16     March 9/March 16         119
AAAI         Boston         July 16-20     February 16/21           145
KDD          Philadelphia   August 23-26   March 3/March 10         166

It looks like the northeastern US is the big winner as far as location this year.
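The interval column is just the day count between two dates. As a quick check, here is a minimal sketch using Python's standard datetime module with the COLT row above (the year 2006 is assumed from context):

```python
from datetime import date

# Days between submission deadline and conference start,
# using the COLT row from the table (year 2006 assumed).
deadline = date(2006, 1, 21)
conference_start = date(2006, 6, 22)
print((conference_start - deadline).days)  # 152
```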
2 hunch net-2005-12-28-Yet more nips thoughts
Introduction: I only managed to make it out to the NIPS workshops this year so I’ll give my comments on what I saw there. The Learning and Robotics workshop lives again. I hope it continues and gets more high quality papers in the future. The most interesting talk for me was Larry Jackel’s on the LAGR program (see John’s previous post on said program). I got some ideas as to what progress has been made. Larry really explained the types of benchmarks and the tradeoffs that had to be made to make the goals achievable but challenging. Hal Daume gave a very interesting talk about structured prediction using RL techniques, something near and dear to my own heart. He achieved rather impressive results using only a very greedy search. The non-parametric Bayes workshop was great. I enjoyed the entire morning session I spent there, and particularly (the usually desultory) discussion periods. One interesting topic was the Gibbs/Variational inference divide. I won’t try to summarize espe
3 hunch net-2005-12-27-Automated Labeling
Introduction: One of the common trends in machine learning has been an emphasis on the use of unlabeled data. The argument goes something like “there aren’t many labeled web pages out there, but there are a huge number of web pages, so we must find a way to take advantage of them.” There are several standard approaches for doing this: Unsupervised Learning. You use only unlabeled data. In a typical application, you cluster the data and hope that the clusters somehow correspond to what you care about. Semisupervised Learning. You use both unlabeled and labeled data to build a predictor. The unlabeled data influences the learned predictor in some way. Active Learning. You have unlabeled data and access to a labeling oracle. You interactively choose which examples to label so as to optimize prediction accuracy. It seems there is a fourth approach worth serious investigation—automated labeling. The approach goes as follows: Identify some subset of observed values to predict
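The description above is cut off, but the gist of the fourth approach is to manufacture labels from values that are already observed in the data. Below is a hypothetical minimal sketch of that idea (an illustration, not the post's actual algorithm), assuming scikit-learn and synthetic data: one observed column is treated as the prediction target and the remaining columns as features.

```python
# Hypothetical sketch of automated labeling: treat an observed value as the
# "label" and learn to predict it from the remaining observed values, so no
# human labeling effort is needed. Synthetic data; assumes scikit-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
X[:, 3] = X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=1000)  # an observed, predictable value

y = (X[:, 3] > 0).astype(int)        # the observed value chosen as the target
features = np.delete(X, 3, axis=1)   # everything else becomes the input

X_tr, X_te, y_tr, y_te = train_test_split(features, y, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)
print("held-out accuracy:", model.score(X_te, y_te))
```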
4 hunch net-2005-12-22-Yes, I am applying
Introduction: Every year about now hundreds of applicants apply for a research/teaching job with the timing governed by the university recruitment schedule. This time, it’s my turn—the hat’s in the ring, I am a contender, etc… What I have heard is that this year is good in both directions—both an increased supply and an increased demand for machine learning expertise. I consider this post a bit of an abuse as it is neither about general research nor machine learning. Please forgive me this once. My hope is that I will learn about new places interested in funding basic research—it’s easy to imagine that I have overlooked possibilities. I am not dogmatic about where I end up in any particular way. Several earlier posts detail what I think of as a good research environment, so I will avoid a repeat. A few more details seem important: Application. There is often a tension between basic research and immediate application. This tension is not as strong as might be expected in my case. As
5 hunch net-2005-12-17-Workshops as Franchise Conferences
Introduction: Founding a successful new conference is extraordinarily difficult. As a conference founder, you must manage to attract a significant number of good papers—enough to entice the participants into participating next year and (generally) to grow the conference. For someone choosing to participate in a new conference, there is a very significant decision to make: do you send a paper to some new conference with no guarantee that the conference will work out? Or do you send it to another (possibly less related) conference that you are sure will work? The conference founding problem is a joint agreement problem with a very significant barrier. Workshops are a way around this problem, and workshops attached to conferences are a particularly effective means for this. A workshop at a conference is sure to have people available to speak and attend and is sure to have a large audience available. Presenting work at a workshop is not generally exclusive: it can also be presented at a confe
6 hunch net-2005-12-14-More NIPS Papers II
Introduction: I thought this was a very good NIPS with many excellent papers. The following are a few NIPS papers which I liked and I hope to study more carefully when I get the chance. The list is not exhaustive and in no particular order… Preconditioner Approximations for Probabilistic Graphical Models. Pradeep Ravikumar and John Lafferty. I thought the use of preconditioner methods from solving linear systems in the context of approximate inference was novel and interesting. The results look good and I’d like to understand the limitations. Rodeo: Sparse nonparametric regression in high dimensions. John Lafferty and Larry Wasserman. A very interesting approach to feature selection in nonparametric regression from a frequentist framework. The use of lengthscale variables in each dimension reminds me a lot of ‘Automatic Relevance Determination’ in Gaussian process regression — it would be interesting to compare Rodeo to ARD in GPs. Interpolating between types and tokens by estimating
7 hunch net-2005-12-11-More NIPS Papers
Introduction: Let me add to John’s post with a few of my own favourites from this year’s conference. First, let me say that Sanjoy’s talk, Coarse Sample Complexity Bounds for Active Learning, was also one of my favourites, as was the Forgettron paper. I also really enjoyed the last third of Christos’ talk on the complexity of finding Nash equilibria. And, speaking of tagging, I think the U.Mass Citeseer replacement system Rexa from the demo track is very cool. Finally, let me add my recommendations for specific papers: Z. Ghahramani, K. Heller: Bayesian Sets [no preprint] (A very elegant probabilistic information retrieval style model of which objects are “most like” a given subset of objects.) T. Griffiths, Z. Ghahramani: Infinite Latent Feature Models and the Indian Buffet Process [preprint] (A Dirichlet style prior over infinite binary matrices with beautiful exchangeability properties.) K. Weinberger, J. Blitzer, L. Saul: Distance Metric Lea
8 hunch net-2005-12-09-Some NIPS papers
Introduction: Here is a set of papers that I found interesting (and why). A PAC-Bayes approach to the Set Covering Machine improves the set covering machine. The set covering machine approach is a new way to do classification characterized by a very close connection between theory and algorithm. At this point, the approach seems to be competing well with SVMs in about all dimensions: similar computational speed, similar accuracy, stronger learning theory guarantees, more general information source (a kernel has strictly more structure than a metric), and more sparsity. Developing a classification algorithm is not very easy, but the results so far are encouraging. Off-Road Obstacle Avoidance through End-to-End Learning and Learning Depth from Single Monocular Images both effectively showed that depth information can be predicted from camera images (using notably different techniques). This ability is strongly enabling because cameras are cheap, tiny, light, and potentially provide lo
9 hunch net-2005-12-09-Machine Learning Thoughts
Introduction: I added a link to Olivier Bousquet’s machine learning thoughts blog. Several of the posts may be of interest.
10 hunch net-2005-12-07-Is the Google way the way for machine learning?
Introduction: Urs Hoelzle from Google gave an invited presentation at NIPS. In the presentation, he strongly advocates interacting with data in a particular scalable manner which is something like the following: Make a cluster of machines. Build a unified filesystem. (Google uses GFS, but NFS or other approaches work reasonably well for smaller clusters.) Interact with data via MapReduce. Creating a cluster of machines is, by this point, relatively straightforward. Unified filesystems are a little bit tricky—GFS is capable by design of essentially unlimited speed throughput to disk. NFS can bottleneck because all of the data has to move through one machine. Nevertheless, this may not be a limiting factor for smaller clusters. MapReduce is a programming paradigm. Essentially, it is a combination of a data element transform (map) and an aggregator/selector (reduce). These operations are highly parallelizable and the claim is that they support the forms of data interacti
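To make the map/reduce split concrete, here is a minimal single-process sketch in Python (word count is the usual toy example; a real deployment shards both phases across the cluster, which is not shown here):

```python
from collections import defaultdict

# Minimal single-process illustration of the MapReduce paradigm: map turns each
# record into (key, value) pairs, and reduce aggregates the values grouped by key.
def map_fn(record):
    for word in record.split():
        yield word, 1

def reduce_fn(key, values):
    return key, sum(values)

records = ["the cat sat on the mat", "the dog sat"]

grouped = defaultdict(list)
for record in records:
    for key, value in map_fn(record):
        grouped[key].append(value)

counts = dict(reduce_fn(k, vs) for k, vs in grouped.items())
print(counts)  # {'the': 3, 'cat': 1, 'sat': 2, 'on': 1, 'mat': 1, 'dog': 1}
```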
11 hunch net-2005-12-04-Watchword: model
Introduction: In everyday use, a model is a system which explains the behavior of some other system, hopefully at the level where some alteration of the model predicts some alteration of the real-world system. In machine learning “model” has several variant definitions. Everyday. The common definition is sometimes used. Parameterized. Sometimes model is a short-hand for “parameterized model”. Here, it refers to a model with unspecified free parameters. In the Bayesian learning approach, you typically have a prior over (everyday) models. Predictive. Even further from everyday use is the predictive model. Examples of this are “my model is a decision tree” or “my model is a support vector machine”. Here, there is no real sense in which an SVM explains the underlying process. For example, an SVM tells us nothing in particular about how alterations to the real-world system would create a change. Which definition is being used at any particular time is important information. For examp
12 hunch net-2005-12-01-The Webscience Future
Introduction: The internet has significantly affected the way we do research, but its capabilities have not yet been fully realized. First, let’s acknowledge some known effects. Self-publishing. By default, all researchers in machine learning (and more generally computer science and physics) place their papers online for anyone to download. The exact mechanism differs—physicists tend to use a central repository (Arxiv) while computer scientists tend to place the papers on their webpage. Arxiv has been slowly growing in subject breadth so it is now sometimes used by computer scientists. Collaboration. Email has enabled working remotely with coauthors. This has allowed collaborations which would not otherwise have been possible and generally speeds research. Now, let’s look at attempts to go further. Blogs (like this one) allow public discussion about topics which are not easily categorized as “a new idea in machine learning” (like this topic). Organization of some subfield
13 hunch net-2005-11-28-A question of quantification
Introduction: This is about methods for phrasing and thinking about the scope of some theorems in learning theory. The basic claim is that there are several different ways of quantifying the scope which sound different yet are essentially the same. For all sequences of examples. This is the standard quantification in online learning analysis. Standard theorems would say something like “for all sequences of predictions by experts, the algorithm A will perform almost as well as the best expert.” For all training sets. This is the standard quantification for boosting analysis such as adaboost or multiclass boosting. Standard theorems have the form “for all training sets the error rate inequalities … hold”. For all distributions over examples. This is the one that we have been using for reductions analysis. Standard theorem statements have the form “For all distributions over examples, the error rate inequalities … hold”. It is not quite true that each of these is equivalent. F
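Written schematically (the notation below is illustrative, not the post's), the three quantifications have the shapes:

```latex
% Online learning: for all sequences of examples,
\forall\, (x_1,y_1),\dots,(x_T,y_T):\quad
  \sum_{t=1}^{T} \ell\big(A(x_t),y_t\big) \le \min_{e \in \mathcal{E}} \sum_{t=1}^{T} \ell\big(e(x_t),y_t\big) + R(T)

% Boosting: for all training sets,
\forall\, S:\quad \mathrm{err}_S(\text{boosted classifier}) \le f\big(\text{weak learner performance on } S\big)

% Reductions: for all distributions over examples,
\forall\, D:\quad \mathrm{err}_D(\text{learned predictor}) \le g\big(\mathrm{err}_D(\text{base predictor})\big)
```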
14 hunch net-2005-11-26-The Design of an Optimal Research Environment
Introduction: How do you create an optimal environment for research? Here are some essential ingredients that I see. Stability. University-based research is relatively good at this. On any particular day, researchers face choices in what they will work on. A very common tradeoff is between: easy and small, or difficult and big. For researchers without stability, the ‘easy small’ option wins. This is often “ok”—a series of incremental improvements on the state of the art can add up to something very beneficial. However, it misses one of the big potentials of research: finding entirely new and better ways of doing things. Stability comes in many forms. The prototypical example is tenure at a university—a tenured professor is almost impossible to fire which means that the professor has the freedom to consider far horizon activities. An iron-clad guarantee of a paycheck is not necessary—industrial research labs have succeeded well with research positions of indefinite duration. AT&T rese
15 hunch net-2005-11-16-The Everything Ensemble Edge
Introduction: Rich Caruana, Alexandru Niculescu, Geoff Crew, and Alex Ksikes have done a lot of empirical testing which shows that using all methods to make a prediction is more powerful than using any single method. This is in rough agreement with the Bayesian way of solving problems, but based upon a different (essentially empirical) motivation. A rough summary is: Take all of {decision trees, boosted decision trees, bagged decision trees, boosted decision stumps, K nearest neighbors, neural networks, SVM} with all reasonable parameter settings. Run the methods on each of 8 problems with a large test set, calibrating margins using either sigmoid fitting or isotonic regression. For each loss of {accuracy, area under the ROC curve, cross entropy, squared error, etc…} evaluate the average performance of the method. A series of conclusions can be drawn from the observations. (Calibrated) boosted decision trees appear to perform best, in general, although support v
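As a concrete illustration of the calibration step mentioned above, here is a minimal sketch using scikit-learn as a modern stand-in (the post predates the library, and this is not the authors' code): a boosted-tree classifier wrapped with either sigmoid (Platt) fitting or isotonic regression.

```python
# Minimal sketch of margin calibration via sigmoid fitting or isotonic
# regression, with scikit-learn standing in for the methods named above.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = GradientBoostingClassifier(random_state=0)
# method="sigmoid" gives Platt-style sigmoid fitting; "isotonic" gives isotonic regression.
calibrated = CalibratedClassifierCV(base, method="isotonic", cv=3).fit(X_tr, y_tr)

probs = calibrated.predict_proba(X_te)[:, 1]
print("mean calibrated probability on the test set:", probs.mean())
```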
16 hunch net-2005-11-16-MLSS 2006
Introduction: There will be two machine learning summer schools in 2006. One is in Canberra, Australia from February 6 to February 17 (Aussie summer). The webpage is fully ‘live’ so you should actively consider it now. The other is in Taipei, Taiwan from July 24 to August 4. This one is still in the planning phase, but that should be settled soon. Attending an MLSS is probably the quickest and easiest way to bootstrap yourself into a reasonable initial understanding of the field of machine learning.
17 hunch net-2005-11-07-Prediction Competitions
Introduction: There are two prediction competitions currently in the air. The Performance Prediction Challenge by Isabelle Guyon. Good entries minimize a weighted 0/1 loss + the difference between a prediction of this loss and the observed truth on 5 datasets. Isabelle tells me all of the problems are “real world” and the test datasets are large enough (17K minimum) that the winner should be well determined by ability rather than luck. This is due March 1. The Predictive Uncertainty Challenge by Gavin Cawley. Good entries minimize log loss on real valued output variables for one synthetic and 3 “real” datasets related to atmospheric prediction. The use of log loss (which can be infinite and hence is never convergent) and smaller test sets of size 1K to 7K examples makes the winner of this contest more luck dependent. Nevertheless, the contest may be of some interest particularly to the branch of learning (typically Bayes learning) which prefers to optimize log loss. May the
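For reference, the log loss mentioned above, for a predicted probability (or density) p and observed outcome y, is:

```latex
\ell(p, y) = -\log p(y)
% Unbounded: as p(y) \to 0 the loss diverges, so a single overconfident
% mistake can dominate a contest score -- the "can be infinite" issue above.
```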
18 hunch net-2005-11-05-The design of a computing cluster
Introduction: This is about the design of a computing cluster from the viewpoint of applied machine learning using current technology. We just built a small one at TTI so this is some evidence of what is feasible and thoughts about the design choices. Architecture: There are several architectural choices. AMD Athlon64 based system. This seems to have the cheapest bang/buck. Maximum RAM is typically 2-3GB. AMD Opteron based system. Opterons provide the additional capability to buy an SMP motherboard with two chips, and the motherboards often support 16GB of RAM. The RAM is also the more expensive error correcting type. Intel PIV or Xeon based system. The PIV and Xeon based systems are the Intel analog of the above 2. Due to architectural design reasons, these chips tend to run a bit hotter and be a bit more expensive. Dual core chips. Both Intel and AMD have chips that actually have 2 processors embedded in them. In the end, we decided to go with option (2). Roughly speaking,
19 hunch net-2005-11-02-Progress in Active Learning
Introduction: Several bits of progress have been made since Sanjoy pointed out the significant lack of theoretical understanding of active learning. This is an update on the progress I know of. As a refresher, active learning as meant here is: There is a source of unlabeled data. There is an oracle from which labels can be requested for unlabeled data produced by the source. The goal is to perform well with minimal use of the oracle. Here is what I’ve learned: Sanjoy has developed sufficient and semi-necessary conditions for active learning given the assumptions of IID data and “realizability” (that one of the classifiers is a correct classifier). Nina, Alina, and I developed an algorithm for active learning relying on only the assumption of IID data. A draft is here. Nicolo, Claudio, and Luca showed that it is possible to do active learning in an entirely adversarial setting for linear threshold classifiers here. This was published a year or two ago and I r
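As a reminder of the protocol being analyzed (an unlabeled source, a label oracle, and a goal of minimal oracle use), here is a minimal pool-based sketch in Python. The uncertainty-sampling query rule is just an illustrative heuristic, not any of the algorithms cited above; it assumes scikit-learn and synthetic data.

```python
# Minimal pool-based active learning loop: an unlabeled pool plus a label
# oracle, querying the example the current model is least sure about.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, true_labels = make_classification(n_samples=500, random_state=0)

def oracle(i):            # stand-in for the labeling oracle
    return true_labels[i]

# Seed with a few labeled points from each class, then query interactively.
labeled = list(np.where(true_labels == 0)[0][:5]) + list(np.where(true_labels == 1)[0][:5])
unlabeled = [i for i in range(len(X)) if i not in labeled]

model = LogisticRegression()
for _ in range(20):       # budget of 20 interactive oracle calls
    model.fit(X[labeled], [oracle(i) for i in labeled])
    probs = model.predict_proba(X[unlabeled])[:, 1]
    query = unlabeled[int(np.argmin(np.abs(probs - 0.5)))]  # most uncertain point
    labeled.append(query)
    unlabeled.remove(query)

print("total labels used:", len(labeled))
```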
20 hunch net-2005-10-26-Fallback Analysis is a Secret to Useful Algorithms
Introduction: The ideal of theoretical algorithm analysis is to construct an algorithm with accompanying optimality theorems proving that it is a useful algorithm. This ideal often fails, particularly for learning algorithms and theory. The general form of a theorem is: If preconditions, then postconditions. When we design learning algorithms, it is very common to come up with precondition assumptions such as “the data is IID”, “the learning problem is drawn from a known distribution over learning problems”, or “there is a perfect classifier”. All of these example preconditions can be false for real-world problems in ways that are not easily detectable. This means that algorithms derived and justified by these very common forms of analysis may be prone to catastrophic failure in routine (mis)application. We can hope for better. Several different kinds of learning algorithm analysis have been developed, some of which have fewer preconditions. Simply demanding that these forms of analysi
21 hunch net-2005-10-20-Machine Learning in the News
22 hunch net-2005-10-19-Workshop: Atomic Learning
23 hunch net-2005-10-16-Complexity: It’s all in your head
24 hunch net-2005-10-13-Site tweak
25 hunch net-2005-10-12-The unrealized potential of the research lab
26 hunch net-2005-10-10-Predictive Search is Coming
27 hunch net-2005-10-08-We have a winner
28 hunch net-2005-10-07-On-line learning of regular decision rules
29 hunch net-2005-10-03-Not ICML
30 hunch net-2005-09-30-Research in conferences
31 hunch net-2005-09-26-Prediction Bounds as the Mathematics of Science
32 hunch net-2005-09-20-Workshop Proposal: Atomic Learning
33 hunch net-2005-09-19-NIPS Workshops
34 hunch net-2005-09-14-The Predictionist Viewpoint
35 hunch net-2005-09-12-Fast Gradient Descent
36 hunch net-2005-09-10-“Failure” is an option
37 hunch net-2005-09-08-Online Learning as the Mathematics of Accountability
38 hunch net-2005-09-06-A link
39 hunch net-2005-09-05-Site Update
40 hunch net-2005-09-04-Science in the Government
41 hunch net-2005-08-23-(Dis)similarities between academia and open source programmers
42 hunch net-2005-08-22-Do you believe in induction?
43 hunch net-2005-08-18-SVM Adaptability
44 hunch net-2005-08-11-Why Manifold-Based Dimension Reduction Techniques?
45 hunch net-2005-08-08-Apprenticeship Reinforcement Learning for Control
46 hunch net-2005-08-04-Why Reinforcement Learning is Important
47 hunch net-2005-08-01-Peekaboom
48 hunch net-2005-07-27-Not goal metrics
49 hunch net-2005-07-23-Interesting papers at ACL
50 hunch net-2005-07-21-Six Months
51 hunch net-2005-07-14-What Learning Theory might do
52 hunch net-2005-07-13-Text Entailment at AAAI
53 hunch net-2005-07-13-“Sister Conference” presentations
54 hunch net-2005-07-11-AAAI blog
55 hunch net-2005-07-10-Thinking the Unthought
56 hunch net-2005-07-07-The Limits of Learning Theory
57 hunch net-2005-07-04-The Health of COLT
58 hunch net-2005-07-01-The Role of Impromptu Talks
59 hunch net-2005-06-29-Not EM for clustering at COLT
60 hunch net-2005-06-28-The cross validation problem: cash reward
61 hunch net-2005-06-28-A COLT paper
62 hunch net-2005-06-22-Languages of Learning
63 hunch net-2005-06-18-Lower Bounds for Learning Reductions
64 hunch net-2005-06-17-Reopening RL->Classification
65 hunch net-2005-06-13-Wikis for Summer Schools and Workshops
66 hunch net-2005-06-10-Workshops are not Conferences
67 hunch net-2005-06-08-Question: “When is the right time to insert the loss function?”
68 hunch net-2005-06-06-Exact Online Learning for Classification
69 hunch net-2005-05-29-Maximum Margin Mismatch?
70 hunch net-2005-05-29-Bad ideas
71 hunch net-2005-05-28-Running A Machine Learning Summer School
72 hunch net-2005-05-21-What is the right form of modularity in structured prediction?
73 hunch net-2005-05-17-A Short Guide to PhD Graduate Study
74 hunch net-2005-05-16-Regret minimizing vs error limiting reductions
76 hunch net-2005-05-12-Math on the Web
77 hunch net-2005-05-11-Visa Casualties
78 hunch net-2005-05-10-Learning Reductions are Reductionist
79 hunch net-2005-05-06-Don’t mix the solution into the problem
80 hunch net-2005-05-03-Conference attendance is mandatory
81 hunch net-2005-05-02-Reviewing techniques for conferences
82 hunch net-2005-04-28-Science Fiction and Research
83 hunch net-2005-04-27-DARPA project: LAGR
84 hunch net-2005-04-26-To calibrate or not?
85 hunch net-2005-04-25-Embeddings: what are they good for?
86 hunch net-2005-04-23-Advantages and Disadvantages of Bayesian Learning
87 hunch net-2005-04-22-New Blog: [Lowerbounds,Upperbounds]
88 hunch net-2005-04-21-Dynamic Programming Generalizations and Their Use
89 hunch net-2005-04-16-Which Assumptions are Reasonable?
90 hunch net-2005-04-14-Families of Learning Theory Statements
91 hunch net-2005-04-10-Is the Goal Understanding or Prediction?
92 hunch net-2005-04-08-Fast SVMs
93 hunch net-2005-04-06-Structured Regret Minimization
94 hunch net-2005-04-04-Grounds for Rejection
95 hunch net-2005-04-01-The Producer-Consumer Model of Research
96 hunch net-2005-04-01-Basic computer science research takes a hit
97 hunch net-2005-03-30-What can Type Theory teach us about Machine Learning?
98 hunch net-2005-03-29-Academic Mechanism Design
99 hunch net-2005-03-28-Open Problems for Colt
100 hunch net-2005-03-24-The Role of Workshops
101 hunch net-2005-03-22-Active learning
102 hunch net-2005-03-21-Research Styles in Machine Learning
103 hunch net-2005-03-18-Binomial Weighting
104 hunch net-2005-03-17-Going all the Way, Sometimes
105 hunch net-2005-03-15-The State of Tight Bounds
106 hunch net-2005-03-13-Avoiding Bad Reviewing
107 hunch net-2005-03-10-Breaking Abstractions
108 hunch net-2005-03-09-Bad Reviewing
109 hunch net-2005-03-08-Fast Physics for Learning
110 hunch net-2005-03-05-Funding Research
111 hunch net-2005-03-04-The Big O and Constants in Learning
112 hunch net-2005-03-02-Prior, “Prior” and Bias
113 hunch net-2005-02-28-Regularization
114 hunch net-2005-02-27-Antilearning: When proximity goes bad
115 hunch net-2005-02-26-Problem: Reductions and Relative Ranking Metrics
116 hunch net-2005-02-25-Why Papers?
117 hunch net-2005-02-25-Solution: Reinforcement Learning with Classification
118 hunch net-2005-02-25-Problem: Online Learning
119 hunch net-2005-02-23-Problem: Reinforcement Learning with Classification
120 hunch net-2005-02-21-Problem: Cross Validation
121 hunch net-2005-02-20-At One Month
122 hunch net-2005-02-19-Machine learning reading groups
123 hunch net-2005-02-19-Loss Functions for Discriminative Training of Energy-Based Models
124 hunch net-2005-02-18-What it means to do research.
125 hunch net-2005-02-17-Learning Research Programs
126 hunch net-2005-02-15-ESPgame and image labeling
127 hunch net-2005-02-14-Clever Methods of Overfitting
128 hunch net-2005-02-12-ROC vs. Accuracy vs. AROC
129 hunch net-2005-02-10-Conferences, Dates, Locations
130 hunch net-2005-02-09-Intuitions from applied learning
131 hunch net-2005-02-08-Some Links
132 hunch net-2005-02-07-The State of the Reduction
134 hunch net-2005-02-03-Learning Theory, by assumption
135 hunch net-2005-02-02-Paper Deadlines
136 hunch net-2005-02-02-Kolmogorov Complexity and Googling
137 hunch net-2005-02-01-Watchword: Loss
138 hunch net-2005-02-01-NIPS: Online Bayes
139 hunch net-2005-01-31-Watchword: Assumption
140 hunch net-2005-01-27-Learning Complete Problems
141 hunch net-2005-01-26-Watchword: Probability
142 hunch net-2005-01-26-Summer Schools
143 hunch net-2005-01-24-The Humanloop Spectrum of Machine Learning
144 hunch net-2005-01-24-Holy grails of machine learning?
145 hunch net-2005-01-19-Why I decided to run a weblog.