hunch_net hunch_net-2007 hunch_net-2007-280 knowledge-graph by maker-knowledge-mining

280 hunch net-2007-12-20-Cool and Interesting things at NIPS, take three


meta info for this blog

Source: html

Introduction: Following up on Hal Daume’s post and John’s post on cool and interesting things seen at NIPS, I’ll post my own little list of neat papers here as well. Of course it’s going to be biased towards what I think is interesting. Also, I have to say that I wasn’t able to see many papers at NIPS this year because I was too busy, so please feel free to contribute the papers that you liked. 1. P. Mudigonda, V. Kolmogorov, P. Torr. An Analysis of Convex Relaxations for MAP Estimation. A surprising paper which shows that many of the more sophisticated convex relaxations that had been proposed recently turn out to be subsumed by the simplest LP relaxation. Be careful next time you try a cool new convex relaxation! 2. D. Sontag, T. Jaakkola. New Outer Bounds on the Marginal Polytope. The title says it all. The marginal polytope is the set of local marginal distributions over subsets of variables that are globally consistent in the sense that there is at least one distribution over all the variables consistent with all the local marginal distributions.
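
For context, here is the basic LP relaxation being referred to, in my own notation (a standard formulation, not taken from the paper): MAP estimation in a pairwise MRF is an integer program over indicator variables, and the simplest relaxation lets those indicators range over the local polytope:

\max_{\mu \ge 0} \;\; \sum_i \sum_{x_i} \theta_i(x_i)\,\mu_i(x_i) \;+\; \sum_{(i,j) \in E} \sum_{x_i, x_j} \theta_{ij}(x_i, x_j)\,\mu_{ij}(x_i, x_j)

\text{s.t.} \quad \sum_{x_i} \mu_i(x_i) = 1 \;\;\forall i, \qquad \sum_{x_j} \mu_{ij}(x_i, x_j) = \mu_i(x_i) \;\;\forall (i,j) \in E,\; x_i.

The paper’s surprise is that several of the more elaborate convex relaxations are provably no tighter than this one.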


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Following up on Hal Daume’s post and John’s post on cool and interesting things seen at NIPS, I’ll post my own little list of neat papers here as well. [sent-1, score-0.667]

2 Also, I have to say that I wasn’t able to see many papers at NIPS this year because I was too busy, so please feel free to contribute the papers that you liked. [sent-3, score-0.428]

3 A surprising paper which shows that many of the more sophisticated convex relaxations that had been proposed recently turn out to be subsumed by the simplest LP relaxation. [sent-9, score-0.496]

4 Be careful next time you try a cool new convex relaxation! [sent-10, score-0.327]

5 The marginal polytope is the set of local marginal distributions over subsets of variables that are globally consistent in the sense that there is at least one distribution over all the variables consistent with all the local marginal distributions. [sent-17, score-1.879]
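
In symbols (my notation, not the paper’s): for a graph G over discrete variables, the marginal polytope is

\mathbb{M}(G) \;=\; \Big\{ \mu \;:\; \exists\, p(x) \text{ with } \mu_i(x_i) = \textstyle\sum_{x \setminus x_i} p(x), \;\; \mu_{ij}(x_i, x_j) = \textstyle\sum_{x \setminus \{x_i, x_j\}} p(x) \Big\},

i.e. exactly those collections of local marginals that some single joint distribution can generate. Outer bounds replace its exponentially many constraints with a tractable superset.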

6 It is an interesting mathematical object to study, and this work builds on Martin Wainwright’s paper on upper bounding the log partition function, proposing improved outer bounds on the marginal polytope. [sent-18, score-1.135]
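
My paraphrase of the connection to the log partition function: it admits the variational representation

\log Z(\theta) \;=\; \sup_{\mu \in \mathbb{M}(G)} \big\{ \langle \theta, \mu \rangle + H(\mu) \big\},

so replacing \mathbb{M}(G) with a tractable outer bound and the entropy with a concave upper bound yields an upper bound on \log Z (the tree-reweighted idea); tighter outer bounds on the marginal polytope, as in this paper, tighten such relaxations.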

7 I think there is a little theme going on this year relating approximate inference to convex optimization. [sent-19, score-0.385]

8 Besides the above two papers there were some other papers as well. [sent-20, score-0.226]

9 A cute idea of how you can construct an experimental set-up such that people act as accept/reject modules in a Metropolis-Hastings framework, so that we can probe the prior distribution encoded in people’s brains. [sent-26, score-0.276]
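
Here is a toy sketch of that set-up in Python, with the human simulated by a function; everything in it (the prior, the proposal width, the Barker-style choice rule) is my own illustrative assumption rather than the paper’s actual protocol:

import random

def human_accepts(current, proposal):
    # Stand-in for the human subject: pick the proposal with probability
    # proportional to its weight under the (hidden) prior -- a Barker-style
    # acceptance rule, which keeps the chain's stationary distribution
    # equal to that prior when the proposal is symmetric.
    prior = lambda x: max(1e-9, 1.0 - abs(x))  # toy triangular prior on [-1, 1]
    p = prior(proposal) / (prior(proposal) + prior(current))
    return random.random() < p

def sample_prior_via_people(n_steps=5000, step=0.3):
    x, chain = 0.0, []
    for _ in range(n_steps):
        proposal = x + random.gauss(0.0, step)  # symmetric random-walk proposal
        if human_accepts(x, proposal):          # the person is the accept/reject module
            x = proposal
        chain.append(x)
    return chain

Run long enough, the empirical distribution of the chain approximates the prior that the accept/reject module (here a function, in the experiment a person) implicitly encodes.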

10 Another surprising result, that in attractive networks, if loopy belief propagation converges, the Bethe free energy is actually a LOWER bound on the log partition function. [sent-33, score-0.764]
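
Stated slightly more carefully, as I read the result (my paraphrase and notation): for an attractive pairwise model, if loopy BP converges to beliefs \tau^*, then

\log Z \;\ge\; \langle \theta, \tau^* \rangle + H_{\mathrm{Bethe}}(\tau^*) \;=\; -F_{\mathrm{Bethe}}(\tau^*),

i.e. the Bethe estimate of the log partition function (the negative Bethe free energy at the BP fixed point) lower-bounds the true value, which is notable because in general the Bethe approximation is neither an upper nor a lower bound.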

11 An interesting idea to construct Bayesian networks with an infinite number of states, using a pretty complex set-up involving hierarchical Dirichlet processes. [sent-40, score-0.418]
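
To make “an infinite number of states” concrete, here is a minimal stick-breaking sketch for a single Dirichlet process’s state weights (toy code with an illustrative concentration parameter; the paper’s hierarchical construction layers such processes, which this sketch does not attempt):

import random

def stick_breaking_weights(alpha, tol=1e-6):
    # GEM(alpha) stick-breaking: repeatedly break off a Beta(1, alpha)
    # fraction of the remaining stick; stop once the leftover mass is tiny,
    # so the notionally infinite state space is only lazily instantiated.
    remaining, weights = 1.0, []
    while remaining > tol:
        b = random.betavariate(1.0, alpha)
        weights.append(remaining * b)
        remaining *= 1.0 - b
    return weights

w = stick_breaking_weights(alpha=2.0)
print(len(w), sum(w))  # only finitely many states get non-negligible weight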

12 I am not sure if the software is out, but I think building such general frameworks for nonparametric models is quite useful for many people who want to use such models but don’t want to spend too much time coding up the sometimes involved MCMC samplers. [sent-41, score-0.081]

13 I also liked Luis von Ahn’s invited talk on Human Computation. [sent-42, score-0.121]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('marginal', 0.35), ('convex', 0.222), ('bethe', 0.197), ('outer', 0.197), ('wainwright', 0.162), ('attractive', 0.162), ('relaxations', 0.153), ('partition', 0.146), ('infinite', 0.14), ('surprising', 0.121), ('liked', 0.121), ('variables', 0.118), ('bounds', 0.117), ('consistent', 0.115), ('construct', 0.113), ('papers', 0.113), ('cool', 0.105), ('local', 0.1), ('post', 0.097), ('networks', 0.088), ('propagation', 0.087), ('modules', 0.087), ('sudderth', 0.087), ('globally', 0.087), ('loopy', 0.087), ('theme', 0.087), ('apologies', 0.087), ('bounding', 0.087), ('mcmc', 0.087), ('sontag', 0.087), ('kolmogorov', 0.081), ('ahn', 0.081), ('loop', 0.081), ('neat', 0.081), ('variational', 0.081), ('lp', 0.081), ('relaxation', 0.081), ('guest', 0.081), ('builds', 0.081), ('frameworks', 0.081), ('free', 0.081), ('log', 0.08), ('interesting', 0.077), ('relating', 0.076), ('chain', 0.076), ('encoded', 0.076), ('carlo', 0.076), ('monte', 0.076), ('subsets', 0.076), ('posting', 0.076)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 280 hunch net-2007-12-20-Cool and Interesting things at NIPS, take three


2 0.12511504 235 hunch net-2007-03-03-All Models of Learning have Flaws

Introduction: Attempts to abstract and study machine learning are within some given framework or mathematical model. It turns out that all of these models are significantly flawed for the purpose of studying machine learning. I’ve created a table (below) outlining the major flaws in some common models of machine learning. The point here is not simply “woe unto us”. There are several implications which seem important. The multitude of models is a point of continuing confusion. It is common for people to learn about machine learning within one framework which often becomes their “home framework” through which they attempt to filter all machine learning. (Have you met people who can only think in terms of kernels? Only via Bayes Law? Only via PAC Learning?) Explicitly understanding the existence of these other frameworks can help resolve the confusion. This is particularly important when reviewing and particularly important for students. Algorithms which conform to multiple approaches c

3 0.123962 189 hunch net-2006-07-05-more icml papers

Introduction: Here are a few other papers I enjoyed from ICML06. Topic Models: Dynamic Topic Models David Blei, John Lafferty A nice model for how topics in LDA type models can evolve over time, using a linear dynamical system on the natural parameters and a very clever structured variational approximation (in which the mean field parameters are pseudo-observations of a virtual LDS). Like all Blei papers, he makes it look easy, but it is extremely impressive. Pachinko Allocation Wei Li, Andrew McCallum A very elegant (but computationally challenging) model which induces correlation amongst topics using a multi-level DAG whose interior nodes are “super-topics” and “sub-topics” and whose leaves are the vocabulary words. Makes the slumbering monster of structure learning stir. Sequence Analysis (I missed these talks since I was chairing another session) Online Decoding of Markov Models with Latency Constraints Mukund Narasimhan, Paul Viola, Michael Shilman An “a

4 0.11750229 77 hunch net-2005-05-29-Maximum Margin Mismatch?

Introduction: John makes a fascinating point about structured classification (and slightly scooped my post!). Maximum Margin Markov Networks (M3N) are an interesting example of the second class of structured classifiers (where the classification of one label depends on the others), and one of my favorite papers. I’m not alone: the paper won the best student paper award at NIPS in 2003. There are some things I find odd about the paper. For instance, it says of probabilistic models “cannot handle high dimensional feature spaces and lack strong theoretical guarantees.” I’m aware of no such limitations. Also: “Unfortunately, even probabilistic graphical models that are trained discriminatively do not achieve the same level of performance as SVMs, especially when kernel features are used.” This is quite interesting and contradicts my own experience as well as that of a number of people I greatly respect. I wonder what the root cause is: perhaps there is something different abo

5 0.10676257 41 hunch net-2005-03-15-The State of Tight Bounds

Introduction: What? Bounds are mathematical formulas relating observations to future error rates assuming that data is drawn independently. In classical statistics, they are called confidence intervals. Why? Good Judgement. In many applications of learning, it is desirable to know how well the learned predictor works in the future. This helps you decide if the problem is solved or not. Learning Essence. The form of some of these bounds helps you understand what the essence of learning is. Algorithm Design. Some of these bounds suggest, motivate, or even directly imply learning algorithms. What We Know Now There are several families of bounds, based on how information is used. Testing Bounds. These are methods which use labeled data not used in training to estimate the future error rate. Examples include the test set bound, progressive validation (also here and here), train and test bounds, and cross-validation (but see the big open problem). These tec

6 0.098859653 225 hunch net-2007-01-02-Retrospective

7 0.09544275 177 hunch net-2006-05-05-An ICML reject

8 0.093075305 185 hunch net-2006-06-16-Regularization = Robustness

9 0.0930392 140 hunch net-2005-12-14-More NIPS Papers II

10 0.089738801 144 hunch net-2005-12-28-Yet more nips thoughts

11 0.087795854 139 hunch net-2005-12-11-More NIPS Papers

12 0.08730121 420 hunch net-2010-12-26-NIPS 2010

13 0.079869211 166 hunch net-2006-03-24-NLPers

14 0.079447478 263 hunch net-2007-09-18-It’s MDL Jim, but not as we know it…(on Bayes, MDL and consistency)

15 0.078738615 361 hunch net-2009-06-24-Interesting papers at UAICMOLT 2009

16 0.077185653 8 hunch net-2005-02-01-NIPS: Online Bayes

17 0.07561142 456 hunch net-2012-02-24-ICML+50%

18 0.074776076 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

19 0.074641369 438 hunch net-2011-07-11-Interesting Neural Network Papers at ICML 2011

20 0.073838688 351 hunch net-2009-05-02-Wielding a New Abstraction


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.194), (1, 0.009), (2, 0.023), (3, -0.01), (4, 0.053), (5, 0.011), (6, 0.005), (7, -0.076), (8, 0.11), (9, -0.082), (10, 0.048), (11, 0.006), (12, -0.116), (13, -0.061), (14, 0.074), (15, -0.054), (16, -0.054), (17, 0.053), (18, 0.062), (19, -0.071), (20, 0.024), (21, 0.024), (22, -0.07), (23, -0.051), (24, 0.03), (25, 0.051), (26, -0.079), (27, 0.02), (28, 0.04), (29, -0.051), (30, -0.032), (31, -0.021), (32, 0.02), (33, 0.01), (34, -0.01), (35, 0.019), (36, -0.001), (37, -0.005), (38, -0.053), (39, 0.014), (40, 0.001), (41, -0.035), (42, 0.027), (43, -0.003), (44, -0.018), (45, 0.043), (46, -0.07), (47, 0.027), (48, -0.024), (49, -0.004)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96078134 280 hunch net-2007-12-20-Cool and Interesting things at NIPS, take three


2 0.78292 139 hunch net-2005-12-11-More NIPS Papers

Introduction: Let me add to John’s post with a few of my own favourites from this year’s conference. First, let me say that Sanjoy’s talk, Coarse Sample Complexity Bounds for Active Learning was also one of my favourites, as was the Forgettron paper . I also really enjoyed the last third of Christos’ talk on the complexity of finding Nash equilibria. And, speaking of tagging, I think the U.Mass Citeseer replacement system Rexa from the demo track is very cool. Finally, let me add my recommendations for specific papers: Z. Ghahramani, K. Heller: Bayesian Sets [no preprint] (A very elegant probabilistic information retrieval style model of which objects are “most like” a given subset of objects.) T. Griffiths, Z. Ghahramani: Infinite Latent Feature Models and the Indian Buffet Process [ preprint ] (A Dirichlet style prior over infinite binary matrices with beautiful exchangeability properties.) K. Weinberger, J. Blitzer, L. Saul: Distance Metric Lea

3 0.77168131 189 hunch net-2006-07-05-more icml papers

Introduction: Here are a few other papers I enjoyed from ICML06. Topic Models: Dynamic Topic Models David Blei, John Lafferty A nice model for how topics in LDA type models can evolve over time, using a linear dynamical system on the natural parameters and a very clever structured variational approximation (in which the mean field parameters are pseudo-observations of a virtual LDS). Like all Blei papers, he makes it look easy, but it is extremely impressive. Pachinko Allocation Wei Li, Andrew McCallum A very elegant (but computationally challenging) model which induces correlation amongst topics using a multi-level DAG whose interior nodes are “super-topics” and “sub-topics” and whose leaves are the vocabulary words. Makes the slumbering monster of structure learning stir. Sequence Analysis (I missed these talks since I was chairing another session) Online Decoding of Markov Models with Latency Constraints Mukund Narasimhan, Paul Viola, Michael Shilman An “a

4 0.71768558 140 hunch net-2005-12-14-More NIPS Papers II

Introduction: I thought this was a very good NIPS with many excellent papers. The following are a few NIPS papers which I liked and I hope to study more carefully when I get the chance. The list is not exhaustive and in no particular order… Preconditioner Approximations for Probabilistic Graphical Models. Pradeep Ravikumar and John Lafferty. I thought the use of preconditioner methods from solving linear systems in the context of approximate inference was novel and interesting. The results look good and I’d like to understand the limitations. Rodeo: Sparse nonparametric regression in high dimensions. John Lafferty and Larry Wasserman. A very interesting approach to feature selection in nonparametric regression from a frequentist framework. The use of lengthscale variables in each dimension reminds me a lot of ‘Automatic Relevance Determination’ in Gaussian process regression — it would be interesting to compare Rodeo to ARD in GPs. Interpolating between types and tokens by estimating

5 0.69950265 97 hunch net-2005-07-23-Interesting papers at ACL

Introduction: A recent discussion indicated that one goal of this blog might be to allow people to post comments about recent papers that they liked. I think this could potentially be very useful, especially for those with diverse interests but only finite time to read through conference proceedings. ACL 2005 recently completed, and here are four papers from that conference that I thought were either good or perhaps of interest to a machine learning audience. David Chiang, A Hierarchical Phrase-Based Model for Statistical Machine Translation. (Best paper award.) This paper takes the standard phrase-based MT model that is popular in our field (basically, translate a sentence by individually translating phrases and reordering them according to a complicated statistical model) and extends it to take into account hierarchy in phrases, so that you can learn things like “X ‘s Y” -> “Y de X” in Chinese, where X and Y are arbitrary phrases. This takes a step toward linguistic syntax for MT, whic

6 0.69759136 77 hunch net-2005-05-29-Maximum Margin Mismatch?

7 0.66078866 144 hunch net-2005-12-28-Yet more nips thoughts

8 0.64600825 185 hunch net-2006-06-16-Regularization = Robustness

9 0.57271338 188 hunch net-2006-06-30-ICML papers

10 0.56588483 263 hunch net-2007-09-18-It’s MDL Jim, but not as we know it…(on Bayes, MDL and consistency)

11 0.54937649 440 hunch net-2011-08-06-Interesting thing at UAI 2011

12 0.54425246 361 hunch net-2009-06-24-Interesting papers at UAICMOLT 2009

13 0.52858859 368 hunch net-2009-08-26-Another 10-year paper in Machine Learning

14 0.50016308 87 hunch net-2005-06-29-Not EM for clustering at COLT

15 0.4998273 23 hunch net-2005-02-19-Loss Functions for Discriminative Training of Energy-Based Models

16 0.49185351 330 hunch net-2008-12-07-A NIPS paper

17 0.48918253 439 hunch net-2011-08-01-Interesting papers at COLT 2011

18 0.48819181 256 hunch net-2007-07-20-Motivation should be the Responsibility of the Reviewer

19 0.48691016 301 hunch net-2008-05-23-Three levels of addressing the Netflix Prize

20 0.48599499 61 hunch net-2005-04-25-Embeddings: what are they good for?


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(8, 0.355), (27, 0.22), (30, 0.019), (38, 0.03), (48, 0.034), (49, 0.037), (53, 0.056), (55, 0.079), (94, 0.052), (95, 0.036)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.86112285 280 hunch net-2007-12-20-Cool and Interesting things at NIPS, take three


2 0.85254794 310 hunch net-2008-07-15-Interesting papers at COLT (and a bit of UAI & workshops)

Introduction: Here are a few papers from COLT 2008 that I found interesting. Maria-Florina Balcan , Steve Hanneke , and Jenn Wortman , The True Sample Complexity of Active Learning . This paper shows that in an asymptotic setting, active learning is always better than supervised learning (although the gap may be small). This is evidence that the only thing in the way of universal active learning is us knowing how to do it properly. Nir Ailon and Mehryar Mohri , An Efficient Reduction of Ranking to Classification . This paper shows how to robustly rank n objects with n log(n) classifications using a quicksort based algorithm. The result is applicable to many ranking loss functions and has implications for others. Michael Kearns and Jennifer Wortman . Learning from Collective Behavior . This is about learning in a new model, where the goal is to predict how a collection of interacting agents behave. One claim is that learning in this setting can be reduced to IID lear

3 0.56421709 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models

Introduction: Suppose we have a set of observations over time x_1, x_2, …, x_t and want to predict some future event y_{t+1}. An inevitable problem arises, because learning a predictor h(x_1, …, x_t) of y_{t+1} is generically intractable due to the size of the input. To make this problem tractable, what’s necessary is a method for summarizing the relevant information in past observations for the purpose of prediction in the future. In other words, state is required. Existing approaches for deriving state have some limitations. Hidden Markov models learned with EM suffer from local minima, use tabular learning approaches which provide dubious generalization ability, and often require substantial a priori specification of the observations. Kalman Filters and Particle Filters are very parametric in the sense that substantial information must be specified up front. Dynamic Bayesian Networks (graphical models through time) require substantial a priori specification and often re

4 0.56340832 194 hunch net-2006-07-11-New Models

Introduction: How should we, as researchers in machine learning, organize ourselves? The most immediate measurable objective of computer science research is publishing a paper. The most difficult aspect of publishing a paper is having reviewers accept and recommend it for publication. The simplest mechanism for doing this is to show theoretical progress on some standard, well-known, easily understood problem. In doing this, we often fall into a local minimum of the research process. The basic problem in machine learning is that it is very unclear that the mathematical model is the right one for the (or some) real problem. A good mathematical model in machine learning should have one fundamental trait: it should aid the design of effective learning algorithms. To date, our ability to solve interesting learning problems (speech recognition, machine translation, object recognition, etc…) remains limited (although improving), so the “rightness” of our models is in doubt. If our mathematical mod

5 0.563178 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

Introduction: Muthu invited me to the workshop on algorithms in the field, with the goal of providing a sense of where near-term research should go. When the time came though, I bargained for a post instead, which provides a chance for many other people to comment. There are several things I didn’t fully understand when I went to Yahoo! about 5 years ago. I’d like to repeat them as people in academia may not yet understand them intuitively. Almost all the big impact algorithms operate in pseudo-linear or better time. Think about caching, hashing, sorting, filtering, etc… and you have a sense of what some of the most heavily used algorithms are. This matters quite a bit to Machine Learning research, because people often work with superlinear time algorithms and languages. Two very common examples of this are graphical models, where inference is often a superlinear operation—think about the n^2 dependence on the number of states in a Hidden Markov Model and Kernelized Support Vecto

6 0.56260681 132 hunch net-2005-11-26-The Design of an Optimal Research Environment

7 0.56170762 360 hunch net-2009-06-15-In Active Learning, the question changes

8 0.56134647 343 hunch net-2009-02-18-Decision by Vetocracy

9 0.56075698 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”

10 0.56055295 225 hunch net-2007-01-02-Retrospective

11 0.56050485 370 hunch net-2009-09-18-Necessary and Sufficient Research

12 0.55884987 220 hunch net-2006-11-27-Continuizing Solutions

13 0.55757195 347 hunch net-2009-03-26-Machine Learning is too easy

14 0.55751908 351 hunch net-2009-05-02-Wielding a New Abstraction

15 0.55699855 12 hunch net-2005-02-03-Learning Theory, by assumption

16 0.55638063 378 hunch net-2009-11-15-The Other Online Learning

17 0.55627954 235 hunch net-2007-03-03-All Models of Learning have Flaws

18 0.55623043 23 hunch net-2005-02-19-Loss Functions for Discriminative Training of Energy-Based Models

19 0.55546296 41 hunch net-2005-03-15-The State of Tight Bounds

20 0.55526251 478 hunch net-2013-01-07-NYU Large Scale Machine Learning Class