hunch_net-2011-440 (knowledge-graph by maker-knowledge-mining)
Introduction: I had a chance to attend UAI this year, where several papers interested me, including:

Hoifung Poon and Pedro Domingos, Sum-Product Networks: A New Deep Architecture. We’ve already discussed this one, but in a nutshell, they identify a large class of efficiently normalizable distributions and do learning with it.
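To make the “efficiently normalizable” point concrete, here is a toy sketch in Python. The network structure and weights are made up for illustration, and this is not the paper’s learning procedure; it only shows why normalization is cheap when the model is a network of sums and products over indicator leaves.

```python
# A minimal, hand-built sum-product network over two binary variables.
# Hypothetical weights, not Poon & Domingos' learning algorithm: the point
# is that the partition function comes from a single bottom-up pass with
# every leaf indicator set to 1, instead of a sum over all 2^n assignments.

def leaf(x, value):
    """Indicator leaf: 1 if variable x equals `value`; x=None marginalizes."""
    return 1.0 if (x is None or x == value) else 0.0

def spn(x1, x2, w=(2.0, 1.0)):
    """Unnormalized SPN: a weighted sum (root) of two product nodes,
    each a product of univariate mixtures over the leaves."""
    p1 = (0.6 * leaf(x1, 1) + 0.4 * leaf(x1, 0)) * \
         (0.3 * leaf(x2, 1) + 0.7 * leaf(x2, 0))
    p2 = (0.9 * leaf(x1, 1) + 0.1 * leaf(x1, 0)) * \
         (0.2 * leaf(x2, 1) + 0.8 * leaf(x2, 0))
    return w[0] * p1 + w[1] * p2

Z = spn(None, None)                     # one pass computes the normalizer
p = spn(1, 0) / Z                       # normalized P(x1=1, x2=0)
assert abs(Z - sum(spn(a, b) for a in (0, 1) for b in (0, 1))) < 1e-12
print(Z, p)
```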
Yao-Liang Yu and Dale Schuurmans, Rank/norm regularization with closed-form solutions: Application to subspace clustering. This paper is about matrices, and in particular they prove that certain matrices are the solutions of matrix optimizations. I’m not matrix-inclined enough to fully appreciate this one, but I believe many others may be, and anytime closed-form solutions come into play you get two orders of magnitude speedups, as they show experimentally.
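I won’t try to reproduce the paper’s formulation here, but a classic example of a norm-regularized matrix problem with a closed-form solution is the proximal operator of the nuclear norm, solved by soft-thresholding singular values. The sketch below uses standard numpy and made-up random data; it is an illustration of why a closed form beats a generic iterative solver, not the method of the paper.

```python
# Closed-form solution of  argmin_X  0.5 * ||X - A||_F^2 + lam * ||X||_*
# via singular value soft-thresholding. A standard example, not necessarily
# the exact problem solved by Yu & Schuurmans.
import numpy as np

def nuclear_prox(A, lam):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s = np.maximum(s - lam, 0.0)        # shrink singular values toward zero
    return (U * s) @ Vt                 # reassemble the (lower-rank) solution

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 30))
X = nuclear_prox(A, lam=2.0)
print(np.linalg.matrix_rank(A), "->", np.linalg.matrix_rank(X))
```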
Laurent Charlin, Richard Zemel, and Craig Boutilier, A Framework for Optimizing Paper Matching. This is about what works in matching papers to reviewers, as has been tested at several previous NIPS conferences.
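For a sense of the kind of optimization involved, here is a deliberately simplified sketch: one reviewer per paper, maximizing total reviewer-paper affinity with a standard assignment solver. The actual framework in the paper handles richer objectives and load constraints (e.g. several reviewers per paper); the affinity matrix below is made-up random data.

```python
# Toy paper-to-reviewer matching: maximize total affinity subject to a
# one-paper-per-reviewer, one-reviewer-per-paper constraint. Real paper
# matching (as in Charlin et al.) uses richer constraints and objectives.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(1)
affinity = rng.random((6, 6))     # affinity[r, p] for reviewer r, paper p

rows, cols = linear_sum_assignment(affinity, maximize=True)
for r, p in zip(rows, cols):
    print(f"reviewer {r} -> paper {p} (affinity {affinity[r, p]:.2f})")
print("total affinity:", affinity[rows, cols].sum())
```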
In addition, I wanted to comment on Karl Friston’s invited talk. At the outset, he made a claim that seems outlandish to me: the way the brain works is to minimize surprise as measured by a probabilistic model.
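To pin down what “surprise” means here: under a probabilistic model, the surprise of an observation is just its negative log-probability. A tiny illustration with made-up numbers:

```python
# Surprise (surprisal) of an observation under a model: -log2 p(observation).
# The probabilities are made-up numbers for illustration.
import math

model = {"sunny": 0.70, "rain": 0.25, "snow": 0.05}

for event, p in model.items():
    print(f"{event}: {-math.log2(p):.2f} bits of surprise")
# snow (~4.3 bits) is far more surprising than sunny (~0.5 bits), so an
# agent minimizing expected surprise prefers predictable observations.
```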
The majority of the talk was not actually about this claim; instead it was about how probabilistic models can plausibly do things that you might not have thought possible, such as birdsong. Nevertheless, I think several of us in the room ended up stuck on the claim during the questions afterward.
My personal belief is that world modeling (probabilistic or not) is a useful subroutine for intelligence, but it could not possibly be the entirety of intelligence. A key reason for this is the bandwidth of our senses: we simply take in too much information to model everything with equal attention. It seems critical for the efficient functioning of intelligence that only things which might plausibly matter are modeled, and only to the degree that they matter. In other words, I do not model the precise placement of items on my desk, or even the precise contents of my desk, because these details simply do not matter.
Suppose for the moment that all the brain does is probabilistic modeling. Then the primary notion of failure to model is “surprise”: the occurrence of low-probability events. Surprises (stumbles, car wrecks, and other accidents) certainly can be unpleasant, but this would be equally true if modeling were only a subroutine. The clincher is that there are many unpleasant things which are not surprises, including keeping your head under water, fasting, and self-inflicted wounds. Accounting for the unpleasantness of these events requires more than probabilistic modeling.
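A small numerical illustration of this point, with made-up probabilities and utilities: a highly predictable event can be far worse than a surprising one, so surprise alone cannot rank them.

```python
# Surprise does not track unpleasantness: a predictable event can carry a
# terrible reward while a surprising one is only mildly bad. Numbers are
# made up purely for illustration.
import math

events = {
    # event: (probability under the agent's model, reward/utility)
    "hold your head under water": (0.99, -100.0),
    "stumble unexpectedly":       (0.01,   -1.0),
}

for name, (p, reward) in events.items():
    print(f"{name}: surprise={-math.log2(p):.2f} bits, reward={reward}")
# The first event has near-zero surprise but the worst outcome; only a
# separate reward signal distinguishes them, which is the role of
# reinforcement learning.
```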
In other words, it requires rewards, which is why reinforcement learning is important. As a byproduct, rewards also naturally create a focus of attention, addressing the computational efficiency issue. Believing that intelligence is just probabilistic modeling is another example of a simple wrong answer.
Similar posts, as ranked by the maker-knowledge-mining similarity score (rank 1 is this post itself):

2. 431 hunch net-2011-04-18-A paper not at Snowbird (0.13754176)
3. 330 hunch net-2008-12-07-A NIPS paper (0.13158503)
4. 23 hunch net-2005-02-19-Loss Functions for Discriminative Training of Energy-Based Models (0.12010767)
5. 352 hunch net-2009-05-06-Machine Learning to AI (0.11272525)
6. 353 hunch net-2009-05-08-Computability in Artificial Intelligence (0.090463094)
7. 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms (0.089261755)
8. 301 hunch net-2008-05-23-Three levels of addressing the Netflix Prize (0.089196831)
9. 454 hunch net-2012-01-30-ICML Posters and Scope (0.087438934)
10. 437 hunch net-2011-07-10-ICML 2011 and the future (0.082832016)
11. 160 hunch net-2006-03-02-Why do people count for learning? (0.082625791)
12. 287 hunch net-2008-01-28-Sufficient Computation (0.078044556)
13. 194 hunch net-2006-07-11-New Models (0.076992169)
14. 297 hunch net-2008-04-22-Taking the next step (0.076291136)
15. 320 hunch net-2008-10-14-Who is Responsible for a Bad Review? (0.075493023)
16. 484 hunch net-2013-06-16-Representative Reviewing (0.074323431)
17. 5 hunch net-2005-01-26-Watchword: Probability (0.074311957)
18. 28 hunch net-2005-02-25-Problem: Online Learning (0.073570006)
19. 343 hunch net-2009-02-18-Decision by Vetocracy (0.073218867)
20. 325 hunch net-2008-11-10-ICML Reviewing Criteria (0.072368786)