hunch_net hunch_net-2005 hunch_net-2005-61 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: I’ve been looking at some recent embeddings work, and am struck by how beautiful the theory and algorithms are. It also makes me wonder, what are embeddings good for? A few things immediately come to mind: (1) For visualization of high-dimensional data sets. In this case, one would like good algorithms for embedding specifically into 2- and 3-dimensional Euclidean spaces. (2) For nonparametric modeling. The usual nonparametric models (histograms, nearest neighbor) often require resources which are exponential in the dimension. So if the data actually lie close to some low-dimensional surface, it might be a good idea to first identify this surface and embed the data before applying the model. Incidentally, for applications like these, it’s important to have a functional mapping from high to low dimension, which some techniques do not yield up. (3) As a prelude to classifier learning. The hope here is presumably that learning will be easier in the low-dimensional space, because of (i) better generalization and (ii) a more “natural” layout of the data. I’d be curious to know of other uses for embeddings.
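To make uses (2) and (3) concrete, here is a minimal sketch in Python with scikit-learn; the library choice and the synthetic data are my own assumptions for illustration, not anything from the post. PCA is used as the embedding because it supplies an explicit functional mapping from high to low dimension (the property the post asks for), and a nearest-neighbor model is then fit in the embedded space rather than the original one.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Hypothetical data: 1000 points lying near a 2-dimensional surface
# inside a 50-dimensional space, with binary labels.
latent = rng.normal(size=(1000, 2))
lift = rng.normal(size=(2, 50))
X = latent @ lift + 0.01 * rng.normal(size=(1000, 50))
y = (latent[:, 0] > 0).astype(int)

# First identify the low-dimensional surface, then model there.
# PCA gives a functional mapping (transform) usable on future points.
embedder = PCA(n_components=2).fit(X[:800])
Z_train, Z_test = embedder.transform(X[:800]), embedder.transform(X[800:])

# Nearest neighbor in 2 dimensions avoids the exponential-in-dimension
# resource requirements the same model would face in 50 dimensions.
clf = KNeighborsClassifier(n_neighbors=5).fit(Z_train, y[:800])
print("test accuracy:", clf.score(Z_test, y[800:]))
```

This is the “functional mapping” the post mentions: some embedding techniques only produce coordinates for the training points, and extending them to new points requires extra work.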
sentIndex sentText sentNum sentScore
1 I’ve been looking at some recent embeddings work, and am struck by how beautiful the theory and algorithms are. [sent-1, score-1.029]
2 It also makes me wonder, what are embeddings good for? [sent-2, score-0.562]
3 A few things immediately come to mind: (1) For visualization of high-dimensional data sets. [sent-3, score-0.373]
4 In this case, one would like good algorithms for embedding specifically into 2- and 3-dimensional Euclidean spaces. [sent-4, score-0.494]
5 The usual nonparametric models (histograms, nearest neighbor) often require resources which are exponential in the dimension. [sent-6, score-0.848]
6 So if the data actually lie close to some low-dimensional surface, it might be a good idea to first identify this surface and embed the data before applying the model. [sent-7, score-1.335]
7 Incidentally, for applications like these, it’s important to have a functional mapping from high to low dimension, which some techniques do not yield up. [sent-8, score-0.664]
8 The hope here is presumably that learning will be easier in the low-dimensional space, because of (i) better generalization and (ii) a more “natural” layout of the data. [sent-10, score-0.466]
9 I’d be curious to know of other uses for embeddings. [sent-11, score-0.207]
wordName wordTfidf (topN-words)
[('embeddings', 0.485), ('surface', 0.299), ('nonparametric', 0.242), ('embed', 0.162), ('embedding', 0.162), ('ii', 0.162), ('layout', 0.162), ('struck', 0.162), ('visualization', 0.15), ('lie', 0.15), ('euclidean', 0.141), ('usual', 0.135), ('curious', 0.135), ('beautiful', 0.129), ('wonder', 0.129), ('mapping', 0.129), ('resources', 0.129), ('dimension', 0.125), ('identify', 0.125), ('data', 0.124), ('incidentally', 0.121), ('presumably', 0.117), ('specifically', 0.117), ('functional', 0.114), ('generalization', 0.109), ('applying', 0.109), ('neighbor', 0.104), ('nearest', 0.099), ('exponential', 0.099), ('immediately', 0.099), ('mind', 0.097), ('recent', 0.096), ('close', 0.094), ('yield', 0.087), ('looking', 0.083), ('easier', 0.078), ('require', 0.078), ('good', 0.077), ('low', 0.076), ('classifier', 0.074), ('algorithms', 0.074), ('uses', 0.072), ('space', 0.072), ('actually', 0.071), ('techniques', 0.067), ('models', 0.066), ('high', 0.064), ('like', 0.064), ('applications', 0.063), ('natural', 0.061)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 61 hunch net-2005-04-25-Embeddings: what are they good for?
2 0.10825843 45 hunch net-2005-03-22-Active learning
Introduction: Often, unlabeled data is easy to come by but labels are expensive. For instance, if you’re building a speech recognizer, it’s easy enough to get raw speech samples — just walk around with a microphone — but labeling even one of these samples is a tedious process in which a human must examine the speech signal and carefully segment it into phonemes. In the field of active learning, the goal is as usual to construct an accurate classifier, but the labels of the data points are initially hidden and there is a charge for each label you want revealed. The hope is that by intelligent adaptive querying, you can get away with significantly fewer labels than you would need in a regular supervised learning framework. Here’s an example. Suppose the data lie on the real line, and the classifiers are simple thresholding functions, H = {h_w}: h_w(x) = 1 if x > w, and 0 otherwise. VC theory tells us that if the underlying distribution P can be classified perfectly by some hypothesis in H (
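The excerpt cuts off mid-argument, but the standard continuation of this threshold example is that intelligent querying reduces to a binary search: roughly log(n) label queries pin down a consistent threshold among n unlabeled points, versus labeling all n in the passive setting. Here is a minimal sketch under the stated separability assumption (noiseless labels); the function and variable names are mine, for illustration only.

```python
import numpy as np

def active_threshold_learner(xs, query):
    """Learn a threshold classifier h_w(x) = 1[x > w] by binary search,
    spending O(log n) label queries instead of n."""
    xs = np.sort(np.asarray(xs))
    if query(xs[0]) == 1:
        return xs[0] - 1.0           # every point is labeled 1
    if query(xs[-1]) == 0:
        return xs[-1]                # every point is labeled 0
    lo, hi = 0, len(xs) - 1          # invariant: label(xs[lo]) = 0, label(xs[hi]) = 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if query(xs[mid]) == 1:
            hi = mid
        else:
            lo = mid
    # Any w in [xs[lo], xs[hi]) is consistent with all n points.
    return (xs[lo] + xs[hi]) / 2.0

# Hypothetical usage: about a dozen label queries instead of 1000.
rng = np.random.default_rng(0)
xs = rng.uniform(0, 1, 1000)
w_hat = active_threshold_learner(xs, query=lambda x: int(x > 0.37))
```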
3 0.10240769 102 hunch net-2005-08-11-Why Manifold-Based Dimension Reduction Techniques?
Introduction: Manifold based dimension-reduction algorithms share the following general outline. Given: a metric d() and a set of points S. Construct a graph with a point in every node and every edge connecting to the node of one of the k-nearest neighbors. Associate with the edge a weight which is the distance between the points in the connected nodes. Digest the graph. This might include computing the shortest path between all points or figuring out how to linearly interpolate the point from its neighbors. Find a set of points in a low dimensional space which preserve the digested properties. Examples include LLE, Isomap (which I worked on), Hessian-LLE, SDE, and many others. The hope with these algorithms is that they can recover the low dimensional structure of point sets in high dimensional spaces. Many of them can be shown to work in interesting ways producing various compelling pictures. Despite doing some early work in this direction, I suffer from a motivational
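Here is a minimal sketch of this outline in its Isomap flavor; the use of numpy, scipy, and scikit-learn is my own choice rather than something from the post, and the sketch assumes the k-nearest-neighbor graph is connected (otherwise some geodesic distances come out infinite). The “digest” step is all-pairs shortest paths, and the final step is classical MDS on the digested distances.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def isomap_embedding(X, n_neighbors=10, n_components=2):
    """Follow the post's outline: k-NN graph -> distance-weighted edges ->
    digest via all-pairs shortest paths -> low-dimensional points that
    preserve the digested (geodesic) distances, via classical MDS."""
    # Steps 1-2: one node per point, edges to the k nearest neighbors,
    # each weighted by the distance between the connected points.
    graph = kneighbors_graph(X, n_neighbors=n_neighbors, mode="distance")
    # Step 3: digest the graph; shortest paths approximate geodesic distances.
    D = shortest_path(graph, method="D", directed=False)
    # Step 4: classical MDS on the geodesic distance matrix.
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)         # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))

# Hypothetical usage: points near a curve in 3 dimensions, embedded into 2.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 3 * np.pi, 400))
X = np.column_stack([np.cos(t), np.sin(t), t]) + 0.02 * rng.normal(size=(400, 3))
Z = isomap_embedding(X, n_neighbors=10, n_components=2)
```

The other algorithms named in the post (LLE, Hessian-LLE, SDE) differ mainly in how the graph is digested and which properties the low-dimensional points are asked to preserve.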
4 0.077129096 140 hunch net-2005-12-14-More NIPS Papers II
Introduction: I thought this was a very good NIPS with many excellent papers. The following are a few NIPS papers which I liked and I hope to study more carefully when I get the chance. The list is not exhaustive and in no particular order… Preconditioner Approximations for Probabilistic Graphical Models. Pradeep Ravikumar and John Lafferty. I thought the use of preconditioner methods from solving linear systems in the context of approximate inference was novel and interesting. The results look good and I’d like to understand the limitations. Rodeo: Sparse nonparametric regression in high dimensions. John Lafferty and Larry Wasserman. A very interesting approach to feature selection in nonparametric regression from a frequentist framework. The use of lengthscale variables in each dimension reminds me a lot of ‘Automatic Relevance Determination’ in Gaussian process regression — it would be interesting to compare Rodeo to ARD in GPs. Interpolating between types and tokens by estimating
5 0.075116053 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models
Introduction: Suppose we have a set of observations over time x_1, x_2, …, x_t and want to predict some future event y_{t+1}. An inevitable problem arises, because learning a predictor h(x_1, …, x_t) of y_{t+1} is generically intractable due to the size of the input. To make this problem tractable, what’s necessary is a method for summarizing the relevant information in past observations for the purpose of prediction in the future. In other words, state is required. Existing approaches for deriving state have some limitations. Hidden Markov models learned with EM suffer from local minima, use tabular learning approaches which provide dubious generalization ability, and often require substantial a priori specification of the observations. Kalman Filters and Particle Filters are very parametric in the sense that substantial information must be specified up front. Dynamic Bayesian Networks (graphical models through time) require substantial a priori specification and often re
6 0.071586087 3 hunch net-2005-01-24-The Humanloop Spectrum of Machine Learning
7 0.069397599 332 hunch net-2008-12-23-Use of Learning Theory
8 0.060687404 95 hunch net-2005-07-14-What Learning Theory might do
9 0.060011253 91 hunch net-2005-07-10-Thinking the Unthought
10 0.057890326 248 hunch net-2007-06-19-How is Compressed Sensing going to change Machine Learning ?
11 0.057863772 263 hunch net-2007-09-18-It’s MDL Jim, but not as we know it…(on Bayes, MDL and consistency)
12 0.057144411 286 hunch net-2008-01-25-Turing’s Club for Machine Learning
13 0.056782681 234 hunch net-2007-02-22-Create Your Own ICML Workshop
14 0.056047693 334 hunch net-2009-01-07-Interesting Papers at SODA 2009
15 0.05572373 34 hunch net-2005-03-02-Prior, “Prior” and Bias
16 0.055492707 47 hunch net-2005-03-28-Open Problems for Colt
17 0.053872 89 hunch net-2005-07-04-The Health of COLT
18 0.052430976 136 hunch net-2005-12-07-Is the Google way the way for machine learning?
19 0.051254086 158 hunch net-2006-02-24-A Fundamentalist Organization of Machine Learning
20 0.051004447 143 hunch net-2005-12-27-Automated Labeling
topicId topicWeight
[(0, 0.122), (1, 0.032), (2, -0.01), (3, 0.001), (4, 0.049), (5, -0.022), (6, 0.013), (7, 0.003), (8, 0.044), (9, -0.06), (10, 0.013), (11, 0.0), (12, -0.042), (13, 0.016), (14, 0.011), (15, -0.03), (16, 0.018), (17, 0.045), (18, 0.007), (19, -0.051), (20, 0.048), (21, 0.018), (22, -0.022), (23, -0.006), (24, 0.019), (25, 0.003), (26, 0.012), (27, 0.043), (28, -0.014), (29, -0.062), (30, -0.012), (31, -0.093), (32, -0.01), (33, 0.011), (34, -0.008), (35, 0.003), (36, 0.026), (37, 0.019), (38, -0.002), (39, -0.073), (40, 0.028), (41, 0.068), (42, 0.008), (43, -0.008), (44, -0.046), (45, 0.152), (46, 0.067), (47, 0.081), (48, -0.066), (49, -0.037)]
simIndex simValue blogId blogTitle
same-blog 1 0.95513427 61 hunch net-2005-04-25-Embeddings: what are they good for?
2 0.58177495 408 hunch net-2010-08-24-Alex Smola starts a blog
Introduction: Adventures in Data Land.
3 0.58143061 102 hunch net-2005-08-11-Why Manifold-Based Dimension Reduction Techniques?
4 0.52469581 444 hunch net-2011-09-07-KDD and MUCMD 2011
Introduction: At KDD I enjoyed Stephen Boyd’s invited talk about optimization quite a bit. However, the most interesting talk for me was David Haussler’s. His talk started out with a formidable load of biological complexity. About half-way through you start wondering, “can this be used to help with cancer?” And at the end he connects it directly to use with a call to arms for the audience: cure cancer. The core thesis here is that cancer is a complex set of diseases which can be disentangled via genetic assays, allowing attacks on the specific signature of individual cancers. However, the data quantity and complex dependencies within the data require systematic and relatively automatic prediction and analysis algorithms of the kind that we are best familiar with. Some of the papers which interested me are: Kai-Wei Chang and Dan Roth, Selective Block Minimization for Faster Convergence of Limited Memory Large-Scale Linear Models, which is about effectively using a hard-example
5 0.52193379 140 hunch net-2005-12-14-More NIPS Papers II
6 0.5131954 3 hunch net-2005-01-24-The Humanloop Spectrum of Machine Learning
7 0.50720185 87 hunch net-2005-06-29-Not EM for clustering at COLT
8 0.50336862 45 hunch net-2005-03-22-Active learning
9 0.48887438 280 hunch net-2007-12-20-Cool and Interesting things at NIPS, take three
10 0.48778152 263 hunch net-2007-09-18-It’s MDL Jim, but not as we know it…(on Bayes, MDL and consistency)
11 0.48604968 143 hunch net-2005-12-27-Automated Labeling
12 0.47530392 136 hunch net-2005-12-07-Is the Google way the way for machine learning?
13 0.47340214 139 hunch net-2005-12-11-More NIPS Papers
14 0.47248381 397 hunch net-2010-05-02-What’s the difference between gambling and rewarding good prediction?
15 0.46803519 150 hunch net-2006-01-23-On Coding via Mutual Information & Bayes Nets
16 0.46433586 325 hunch net-2008-11-10-ICML Reviewing Criteria
17 0.46243718 260 hunch net-2007-08-25-The Privacy Problem
18 0.44743007 77 hunch net-2005-05-29-Maximum Margin Mismatch?
19 0.44286969 49 hunch net-2005-03-30-What can Type Theory teach us about Machine Learning?
20 0.44238809 471 hunch net-2012-08-24-Patterns for research in machine learning
topicId topicWeight
[(27, 0.159), (33, 0.463), (38, 0.039), (53, 0.082), (55, 0.035), (94, 0.101)]
simIndex simValue blogId blogTitle
same-blog 1 0.89034861 61 hunch net-2005-04-25-Embeddings: what are they good for?
2 0.83340657 350 hunch net-2009-04-23-Jonathan Chang at Slycoder
Introduction: Jonathan Chang has a research blog on aspects of machine learning.
3 0.77685553 384 hunch net-2009-12-24-Top graduates this season
Introduction: I would like to point out 3 graduates this season as having my confidence they are capable of doing great things. Daniel Hsu has diverse papers with diverse coauthors on {active learning, multilabeling, temporal learning, …} each covering new algorithms and methods of analysis. He is also a capable programmer, having helped me with some nitty-gritty details of cluster parallel Vowpal Wabbit this summer. He has an excellent tendency to just get things done. Nicolas Lambert doesn’t nominally work in machine learning, but I’ve found his work in elicitation relevant nevertheless. In essence, elicitable properties are closely related to learnable properties, and the elicitation complexity is related to a notion of learning complexity. See the Surrogate regret bounds paper for some related discussion. Few people successfully work at such a general level that it crosses fields, but he’s one of them. Yisong Yue is deeply focused on interactive learning, which he has a
4 0.66258687 493 hunch net-2014-02-16-Metacademy: a package manager for knowledge
Introduction: In recent years, there’s been an explosion of free educational resources that make high-level knowledge and skills accessible to an ever-wider group of people. In your own field, you probably have a good idea of where to look for the answer to any particular question. But outside your areas of expertise, sifting through textbooks, Wikipedia articles, research papers, and online lectures can be bewildering (unless you’re fortunate enough to have a knowledgeable colleague to consult). What are the key concepts in the field, how do they relate to each other, which ones should you learn, and where should you learn them? Courses are a major vehicle for packaging educational materials for a broad audience. The trouble is that they’re typically meant to be consumed linearly, regardless of your specific background or goals. Also, unless thousands of other people have had the same background and learning goals, there may not even be a course that fits your needs. Recently, we (Roger Grosse
5 0.55273199 234 hunch net-2007-02-22-Create Your Own ICML Workshop
Introduction: As usual ICML 2007 will be hosting a workshop program to be held this year on June 24th. The success of the program depends on having researchers like you propose interesting workshop topics and then organize the workshops. I’d like to encourage all of you to consider sending a workshop proposal. The proposal deadline has been extended to March 5. See the workshop web-site for details. Organizing a workshop is a unique way to gather an international group of researchers together to focus for an entire day on a topic of your choosing. I’ve always found that the cost of organizing a workshop is not so large, and very low compared to the benefits. The topic and format of a workshop are limited only by your imagination (and the attractiveness to potential participants) and need not follow the usual model of a mini-conference on a particular ML sub-area. Hope to see some interesting proposals rolling in.
6 0.44475627 478 hunch net-2013-01-07-NYU Large Scale Machine Learning Class
7 0.42665738 248 hunch net-2007-06-19-How is Compressed Sensing going to change Machine Learning ?
8 0.40630099 286 hunch net-2008-01-25-Turing’s Club for Machine Learning
9 0.3984046 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms
10 0.39814451 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models
11 0.39666879 95 hunch net-2005-07-14-What Learning Theory might do
12 0.39609721 237 hunch net-2007-04-02-Contextual Scaling
13 0.3958264 253 hunch net-2007-07-06-Idempotent-capable Predictors
14 0.39545706 143 hunch net-2005-12-27-Automated Labeling
15 0.39532194 347 hunch net-2009-03-26-Machine Learning is too easy
16 0.39341521 163 hunch net-2006-03-12-Online learning or online preservation of learning?
17 0.39340046 131 hunch net-2005-11-16-The Everything Ensemble Edge
18 0.39337906 351 hunch net-2009-05-02-Wielding a New Abstraction
19 0.39057049 358 hunch net-2009-06-01-Multitask Poisoning
20 0.39038482 41 hunch net-2005-03-15-The State of Tight Bounds