hunch_net hunch_net-2005 hunch_net-2005-140 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: I thought this was a very good NIPS with many excellent papers. The following are a few NIPS papers which I liked and I hope to study more carefully when I get the chance. The list is not exhaustive and in no particular order… Preconditioner Approximations for Probabilistic Graphical Models. Pradeep Ravikumar and John Lafferty. I thought the use of preconditioner methods from solving linear systems in the context of approximate inference was novel and interesting. The results look good and I’d like to understand the limitations. Rodeo: Sparse nonparametric regression in high dimensions. John Lafferty and Larry Wasserman. A very interesting approach to feature selection in nonparametric regression from a frequentist framework. The use of lengthscale variables in each dimension reminds me a lot of ‘Automatic Relevance Determination’ in Gaussian process regression — it would be interesting to compare Rodeo to ARD in GPs. Interpolating between types and tokens by estimating power law generators…
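As a reminder of what a preconditioner buys you in its home territory of linear solvers, here is a minimal sketch (assuming SciPy; the badly scaled tridiagonal system below is synthetic and has nothing to do with the graphical models in the paper) showing how a simple Jacobi preconditioner cuts the iteration count of conjugate gradients:

```python
# Minimal sketch: Jacobi-preconditioned conjugate gradients vs. plain CG.
# Assumes SciPy; the system is a made-up, badly scaled SPD tridiagonal matrix.
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg, LinearOperator

n = 500
main = np.linspace(2.0, 1e4, n)                        # wildly varying diagonal
A = diags([main, -np.ones(n - 1), -np.ones(n - 1)], [0, -1, 1], format="csr")
b = np.ones(n)

plain, prec = [], []
cg(A, b, callback=lambda xk: plain.append(1))          # plain conjugate gradients
M = LinearOperator((n, n), matvec=lambda v: v / main)  # Jacobi: approximate A^-1 by 1/diag(A)
cg(A, b, M=M, callback=lambda xk: prec.append(1))

print(len(plain), len(prec))  # the preconditioned run typically needs far fewer iterations
```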
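And since the post invites comparing Rodeo to ARD in GPs, here is a minimal sketch of Automatic Relevance Determination with a Gaussian process (assuming scikit-learn; the three-dimensional toy data are invented, with only the first input actually relevant). Maximizing the marginal likelihood pushes the lengthscales of irrelevant inputs to large values, which is roughly the role the per-dimension bandwidths play in Rodeo:

```python
# Minimal ARD sketch: one RBF lengthscale per input dimension, fit by marginal likelihood.
# Assumes scikit-learn; only x0 drives y, so its learned lengthscale should stay small.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=200)

kernel = RBF(length_scale=np.ones(3)) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

print(gp.kernel_.k1.length_scale)  # expect: small for x0, large for the irrelevant x1, x2
```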
sentIndex sentText sentNum sentScore
1 I thought this was a very good NIPS with many excellent papers. [sent-1, score-0.089]
2 The following are a few NIPS papers which I liked and I hope to study more carefully when I get the chance. [sent-2, score-0.148]
3 The list is not exhaustive and in no particular order… Preconditioner Approximations for Probabilistic Graphical Models. [sent-3, score-0.107]
4 I thought the use of preconditioner methods from solving linear systems in the context of approximate inference was novel and interesting. [sent-5, score-0.43]
5 The results look good and I’d like to understand the limitations. [sent-6, score-0.073]
6 Rodeo: Sparse nonparametric regression in high dimensions. [sent-7, score-0.335]
7 A very interesting approach to feature selection in nonparametric regression from a frequentist framework. [sent-9, score-0.784]
8 The use of lengthscale variables in each dimension reminds me a lot of ‘Automatic Relevance Determination’ in Gaussian process regression — it would be interesting to compare Rodeo to ARD in GPs. [sent-10, score-0.616]
9 Interpolating between types and tokens by estimating power law generators. [sent-11, score-0.281]
10 I had wondered how Chinese restaurant processes and Pitman-Yor processes related to Zipf’s plots and power laws for word frequencies. [sent-16, score-0.794]
11 When I first learned about spatial scan statistics I wondered what a Bayesian counterpart would be. [sent-23, score-0.699]
12 I liked the fact that their method was simple, more accurate, and much faster than the usual frequentist method. [sent-24, score-0.52]
13 A very interesting application of sub-modular function optimization to clustering. [sent-30, score-0.095]
14 It’s useful for Gaussian process practitioners to know that their approaches don’t do silly things when viewed from a worst-case frequentist setting. [sent-37, score-0.573]
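On the Pitman-Yor / Zipf connection wondered about in sentence 10, here is a small simulation sketch (assuming NumPy; this is not the estimator from the paper) of the two-parameter Chinese restaurant process. The discount parameter controls the heaviness of the tail, and sorting the table sizes by rank gives a roughly straight line on a log-log plot, i.e. a Zipf-style power law:

```python
# A small simulation sketch (assumes NumPy; not the estimator from the NIPS paper):
# seat customers by a two-parameter (Pitman-Yor) Chinese restaurant process and
# inspect the resulting table sizes, which behave like Zipfian word frequencies.
import numpy as np

def pitman_yor_crp(n_customers, discount=0.7, concentration=1.0, seed=0):
    """Return table occupancy counts after seating n_customers."""
    rng = np.random.default_rng(seed)
    counts = []                                    # customers at each existing table
    for _ in range(n_customers):
        weights = np.array([c - discount for c in counts] +
                           [concentration + discount * len(counts)])
        probs = weights / weights.sum()            # normalizer is n + concentration
        choice = rng.choice(len(weights), p=probs)
        if choice == len(counts):
            counts.append(1)                       # open a new table ("new word type")
        else:
            counts[choice] += 1                    # join an existing table ("token")
    return np.array(counts)

sizes = np.sort(pitman_yor_crp(10000))[::-1]
# rank vs. size is roughly linear on a log-log scale, i.e. a power law
print(sizes[:10], len(sizes))
```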
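And on the submodular clustering paper in sentence 13: the following is not that paper's algorithm, just a generic sketch (assuming NumPy; the data and Gaussian similarity are invented) of the greedy routine that submodular approaches to exemplar selection typically build on, maximizing a monotone facility-location objective with the usual (1 - 1/e) guarantee:

```python
# Generic sketch of greedy monotone submodular maximization (facility location),
# the building block behind many submodular clustering/exemplar-selection methods.
# Assumes NumPy; the data and the Gaussian similarity are made up for illustration.
import numpy as np

def greedy_exemplars(sim, k):
    """Greedily pick k exemplar indices maximizing sum_i max_{j in S} sim[i, j]."""
    n = sim.shape[0]
    chosen, cover = [], np.zeros(n)                # best similarity seen so far per point
    for _ in range(k):
        gains = np.maximum(sim, cover[:, None]).sum(axis=0) - cover.sum()
        j = int(np.argmax(gains))                  # largest marginal gain
        chosen.append(j)
        cover = np.maximum(cover, sim[:, j])
    return chosen

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(loc=c, size=(50, 2)) for c in (-3.0, 0.0, 3.0)])
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
sim = np.exp(-d2 / 2.0)                            # nonnegative Gaussian similarity
print(greedy_exemplars(sim, k=3))                  # roughly one exemplar per cluster
```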
wordName wordTfidf (topN-words)
[('frequentist', 0.282), ('preconditioner', 0.242), ('rodeo', 0.242), ('scan', 0.242), ('spatial', 0.242), ('gaussian', 0.217), ('wondered', 0.215), ('regression', 0.174), ('nonparametric', 0.161), ('liked', 0.148), ('processes', 0.129), ('power', 0.12), ('john', 0.113), ('matthias', 0.107), ('griffiths', 0.107), ('determination', 0.107), ('exhaustive', 0.107), ('gregory', 0.107), ('narasimhan', 0.107), ('plots', 0.107), ('ravikumar', 0.107), ('seeger', 0.107), ('reminds', 0.099), ('novel', 0.099), ('chinese', 0.099), ('lafferty', 0.099), ('feels', 0.099), ('practitioners', 0.099), ('silly', 0.099), ('interesting', 0.095), ('larry', 0.094), ('laws', 0.094), ('process', 0.093), ('usual', 0.09), ('relevance', 0.09), ('hot', 0.09), ('thought', 0.089), ('bayesian', 0.089), ('nips', 0.087), ('approximations', 0.086), ('dean', 0.086), ('estimating', 0.083), ('dimension', 0.083), ('kakade', 0.078), ('types', 0.078), ('accurate', 0.074), ('results', 0.073), ('variables', 0.072), ('selection', 0.072), ('sham', 0.072)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999988 140 hunch net-2005-12-14-More NIPS Papers II
2 0.16161557 5 hunch net-2005-01-26-Watchword: Probability
Introduction: Probability is one of the most confusingly used words in machine learning. There are at least 3 distinct ways the word is used. Bayesian: The Bayesian notion of probability is a ‘degree of belief’. The degree of belief that some event (i.e. “stock goes up” or “stock goes down”) occurs can be measured by asking a sequence of questions of the form “Would you bet the stock goes up or down at Y to 1 odds?” A consistent bettor will switch from ‘for’ to ‘against’ at some single value of Y. The probability is then Y/(Y+1). Bayesian probabilities express lack of knowledge rather than randomization. They are useful in learning because we often lack knowledge and expressing that lack flexibly makes the learning algorithms work better. Bayesian Learning uses ‘probability’ in this way exclusively. Frequentist: The Frequentist notion of probability is a rate of occurrence. A rate of occurrence can be measured by doing an experiment many times. If an event occurs k times in
3 0.11935037 263 hunch net-2007-09-18-It’s MDL Jim, but not as we know it…(on Bayes, MDL and consistency)
Introduction: I have recently completed a 500+ page book on MDL, the first comprehensive overview of the field (yes, this is a sneak advertisement). Chapter 17 compares MDL to a menagerie of other methods and paradigms for learning and statistics. By far the most time (20 pages) is spent on the relation between MDL and Bayes. My two main points here are: In sharp contrast to Bayes, MDL is by definition based on designing universal codes for the data relative to some given (parametric or nonparametric) probabilistic model M. By some theorems due to Andrew Barron, MDL inference must therefore be statistically consistent, and it is immune to Bayesian inconsistency results such as those by Diaconis, Freedman and Barron (I explain what I mean by “inconsistency” further below). Hence, MDL must be different from Bayes! In contrast to what has sometimes been claimed, practical MDL algorithms do have a subjective component (which in many, but not all cases, may be implemented by somethin
4 0.097882144 361 hunch net-2009-06-24-Interesting papers at UAICMOLT 2009
Introduction: Here’s a list of papers that I found interesting at ICML / COLT / UAI in 2009. Elad Hazan and Comandur Seshadhri, Efficient learning algorithms for changing environments, at ICML. This paper shows how to adapt learning algorithms that compete with fixed predictors to compete with changing policies. The definition of regret they deal with seems particularly useful in many situations. Hal Daume, Unsupervised Search-based Structured Prediction, at ICML. This paper shows a technique for reducing unsupervised learning to supervised learning which (a) makes a fast unsupervised learning algorithm and (b) makes semisupervised learning both easy and highly effective. There were two papers with similar results on active learning in the KWIK framework for linear regression, both reducing the sample complexity to . One was Nicolo Cesa-Bianchi, Claudio Gentile, and Francesco Orabona, Robust Bounds for Classification via Selective Sampling, at ICML, and the other was Thoma
5 0.093975388 466 hunch net-2012-06-05-ICML acceptance statistics
Introduction: People are naturally interested in slicing the ICML acceptance statistics in various ways. Here’s a rundown for the top categories. 18/66 = 0.27 in (0.18,0.36) Reinforcement Learning 10/52 = 0.19 in (0.17,0.37) Supervised Learning 9/51 = 0.18 not in (0.18, 0.37) Clustering 12/46 = 0.26 in (0.17, 0.37) Kernel Methods 11/40 = 0.28 in (0.15, 0.4) Optimization Algorithms 8/33 = 0.24 in (0.15, 0.39) Learning Theory 14/33 = 0.42 not in (0.15, 0.39) Graphical Models 10/32 = 0.31 in (0.15, 0.41) Applications (+5 invited) 8/29 = 0.28 in (0.14, 0.41) Probabilistic Models 13/29 = 0.45 not in (0.14, 0.41) NN & Deep Learning 8/26 = 0.31 in (0.12, 0.42) Transfer and Multi-Task Learning 13/25 = 0.52 not in (0.12, 0.44) Online Learning 5/25 = 0.20 in (0.12, 0.44) Active Learning 6/22 = 0.27 in (0.14, 0.41) Semi-Superv
6 0.0930392 280 hunch net-2007-12-20-Cool and Interesting things at NIPS, take three
7 0.090480916 196 hunch net-2006-07-13-Regression vs. Classification as a Primitive
8 0.08808089 8 hunch net-2005-02-01-NIPS: Online Bayes
9 0.084947459 327 hunch net-2008-11-16-Observations on Linearity for Reductions to Regression
10 0.079750746 456 hunch net-2012-02-24-ICML+50%
11 0.077129096 61 hunch net-2005-04-25-Embeddings: what are they good for?
12 0.077034764 39 hunch net-2005-03-10-Breaking Abstractions
13 0.076217405 189 hunch net-2006-07-05-more icml papers
14 0.068979204 201 hunch net-2006-08-07-The Call of the Deep
15 0.068964764 95 hunch net-2005-07-14-What Learning Theory might do
16 0.068852365 144 hunch net-2005-12-28-Yet more nips thoughts
17 0.066465609 60 hunch net-2005-04-23-Advantages and Disadvantages of Bayesian Learning
18 0.066373736 77 hunch net-2005-05-29-Maximum Margin Mismatch?
19 0.065137744 337 hunch net-2009-01-21-Nearly all natural problems require nonlinearity
20 0.06333854 347 hunch net-2009-03-26-Machine Learning is too easy
topicId topicWeight
[(0, 0.141), (1, 0.032), (2, 0.013), (3, -0.028), (4, 0.061), (5, 0.041), (6, 0.012), (7, -0.008), (8, 0.079), (9, -0.059), (10, 0.023), (11, -0.055), (12, -0.083), (13, -0.07), (14, 0.023), (15, -0.031), (16, -0.096), (17, 0.063), (18, 0.086), (19, -0.107), (20, 0.017), (21, 0.027), (22, 0.0), (23, 0.025), (24, -0.011), (25, 0.015), (26, -0.022), (27, -0.021), (28, -0.047), (29, 0.03), (30, -0.043), (31, -0.01), (32, 0.017), (33, -0.03), (34, -0.013), (35, 0.065), (36, -0.013), (37, 0.029), (38, 0.026), (39, 0.068), (40, 0.04), (41, 0.036), (42, 0.071), (43, 0.07), (44, -0.052), (45, 0.073), (46, 0.015), (47, 0.076), (48, -0.087), (49, 0.052)]
simIndex simValue blogId blogTitle
same-blog 1 0.96984261 140 hunch net-2005-12-14-More NIPS Papers II
2 0.63378155 77 hunch net-2005-05-29-Maximum Margin Mismatch?
Introduction: John makes a fascinating point about structured classification (and slightly scooped my post!). Maximum Margin Markov Networks (M3N) are an interesting example of the second class of structured classifiers (where the classification of one label depends on the others), and one of my favorite papers. I’m not alone: the paper won the best student paper award at NIPS in 2003. There are some things I find odd about the paper. For instance, it says of probabilistic models “cannot handle high dimensional feature spaces and lack strong theoretical guarantees.” I’m aware of no such limitations. Also: “Unfortunately, even probabilistic graphical models that are trained discriminatively do not achieve the same level of performance as SVMs, especially when kernel features are used.” This is quite interesting and contradicts my own experience as well as that of a number of people I greatly respect. I wonder what the root cause is: perhaps there is something different abo
3 0.62690318 139 hunch net-2005-12-11-More NIPS Papers
Introduction: Let me add to John’s post with a few of my own favourites from this year’s conference. First, let me say that Sanjoy’s talk, Coarse Sample Complexity Bounds for Active Learning, was also one of my favourites, as was the Forgettron paper. I also really enjoyed the last third of Christos’ talk on the complexity of finding Nash equilibria. And, speaking of tagging, I think the U.Mass Citeseer replacement system Rexa from the demo track is very cool. Finally, let me add my recommendations for specific papers: Z. Ghahramani, K. Heller: Bayesian Sets [no preprint] (A very elegant probabilistic information retrieval style model of which objects are “most like” a given subset of objects.) T. Griffiths, Z. Ghahramani: Infinite Latent Feature Models and the Indian Buffet Process [preprint] (A Dirichlet style prior over infinite binary matrices with beautiful exchangeability properties.) K. Weinberger, J. Blitzer, L. Saul: Distance Metric Lea
4 0.62361908 280 hunch net-2007-12-20-Cool and Interesting things at NIPS, take three
Introduction: Following up on Hal Daume’s post and John’s post on cool and interesting things seen at NIPS, I’ll post my own little list of neat papers here as well. Of course it’s going to be biased towards what I think is interesting. Also, I have to say that I hadn’t been able to see many papers this year at NIPS due to myself being too busy, so please feel free to contribute the papers that you liked. 1. P. Mudigonda, V. Kolmogorov, P. Torr. An Analysis of Convex Relaxations for MAP Estimation. A surprising paper which shows that many of the more sophisticated convex relaxations that had been proposed recently turn out to be subsumed by the simplest LP relaxation. Be careful next time you try a cool new convex relaxation! 2. D. Sontag, T. Jaakkola. New Outer Bounds on the Marginal Polytope. The title says it all. The marginal polytope is the set of local marginal distributions over subsets of variables that are globally consistent in the sense that there is at least one distributio
5 0.59573233 5 hunch net-2005-01-26-Watchword: Probability
6 0.59011543 263 hunch net-2007-09-18-It’s MDL Jim, but not as we know it…(on Bayes, MDL and consistency)
7 0.58454841 189 hunch net-2006-07-05-more icml papers
8 0.56786036 144 hunch net-2005-12-28-Yet more nips thoughts
9 0.55843371 330 hunch net-2008-12-07-A NIPS paper
10 0.54097152 248 hunch net-2007-06-19-How is Compressed Sensing going to change Machine Learning ?
11 0.52248579 123 hunch net-2005-10-16-Complexity: It’s all in your head
12 0.47700387 61 hunch net-2005-04-25-Embeddings: what are they good for?
13 0.47574511 150 hunch net-2006-01-23-On Coding via Mutual Information & Bayes Nets
14 0.47387481 97 hunch net-2005-07-23-Interesting papers at ACL
15 0.46526557 440 hunch net-2011-08-06-Interesting thing at UAI 2011
16 0.46355492 185 hunch net-2006-06-16-Regularization = Robustness
17 0.44771105 361 hunch net-2009-06-24-Interesting papers at UAICMOLT 2009
18 0.44394511 118 hunch net-2005-10-07-On-line learning of regular decision rules
19 0.43553829 62 hunch net-2005-04-26-To calibrate or not?
20 0.42734134 466 hunch net-2012-06-05-ICML acceptance statistics
topicId topicWeight
[(1, 0.079), (3, 0.029), (7, 0.239), (10, 0.029), (24, 0.018), (27, 0.157), (30, 0.052), (33, 0.026), (44, 0.037), (48, 0.019), (53, 0.077), (55, 0.079), (84, 0.011), (94, 0.036), (95, 0.013)]
simIndex simValue blogId blogTitle
same-blog 1 0.90118295 140 hunch net-2005-12-14-More NIPS Papers II
2 0.86343551 355 hunch net-2009-05-19-CI Fellows
Introduction: Lev Reyzin points out the CI Fellows Project. Essentially, NSF is funding 60 postdocs in computer science for graduates from a wide array of US places to a wide array of US places. This is particularly welcome given a tough year for new hires. I expect some fraction of these postdocs will be in ML. The time frame is quite short, so those interested should look it over immediately.
3 0.78652471 184 hunch net-2006-06-15-IJCAI is out of season
Introduction: IJCAI is running January 6-12 in Hyderabad, India rather than a more traditional summer date. (Presumably, this is to avoid melting people in the Indian summer.) The paper deadlines (June 23 abstract / June 30 submission) are particularly inconvenient if you attend COLT or ICML. But on the other hand, it’s a good excuse to visit India.
4 0.74257696 209 hunch net-2006-09-19-Luis von Ahn is awarded a MacArthur fellowship.
Introduction: For his work on the subject of human computation, including ESPGame, Peekaboom, and Phetch. The new MacArthur fellows.
5 0.62521416 39 hunch net-2005-03-10-Breaking Abstractions
Introduction: Sam Roweis’s comment reminds me of a more general issue that comes up in doing research: abstractions always break. Real numbers aren’t. Most real numbers cannot be represented with any machine. One implication of this is that many real-number based algorithms have difficulties when implemented with floating point numbers. The box on your desk is not a Turing machine. A Turing machine can compute anything computable, given sufficient time. A typical computer fails terribly when the state required for the computation exceeds some limit. Nash equilibria aren’t equilibria. This comes up when trying to predict human behavior based on the result of the equilibria computation. Often, it doesn’t work. The probability isn’t. Probability is an abstraction expressing either our lack of knowledge (the Bayesian viewpoint) or fundamental randomization (the frequentist viewpoint). From the frequentist viewpoint the lack of knowledge typically precludes actually knowing the fu
6 0.62437415 314 hunch net-2008-08-24-Mass Customized Medicine in the Future?
7 0.60435683 76 hunch net-2005-05-29-Bad ideas
8 0.59053111 225 hunch net-2007-01-02-Retrospective
9 0.58374208 444 hunch net-2011-09-07-KDD and MUCMD 2011
10 0.5834291 132 hunch net-2005-11-26-The Design of an Optimal Research Environment
11 0.57819885 207 hunch net-2006-09-12-Incentive Compatible Reviewing
12 0.57811373 96 hunch net-2005-07-21-Six Months
13 0.57782745 437 hunch net-2011-07-10-ICML 2011 and the future
14 0.5764659 5 hunch net-2005-01-26-Watchword: Probability
15 0.57625771 134 hunch net-2005-12-01-The Webscience Future
16 0.57610804 484 hunch net-2013-06-16-Representative Reviewing
17 0.57471669 201 hunch net-2006-08-07-The Call of the Deep
18 0.57416809 478 hunch net-2013-01-07-NYU Large Scale Machine Learning Class
19 0.5732525 40 hunch net-2005-03-13-Avoiding Bad Reviewing
20 0.57301807 343 hunch net-2009-02-18-Decision by Vetocracy