hunch_net hunch_net-2005 hunch_net-2005-102 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Manifold-based dimension-reduction algorithms share the following general outline. Given a metric d() and a set of points S: construct a graph with a point in every node and an edge connecting each node to each of its k nearest neighbors; associate with each edge a weight which is the distance between the points in the connected nodes; digest the graph, which might include computing the shortest path between all points or figuring out how to linearly interpolate a point from its neighbors; and find a set of points in a low dimensional space which preserve the digested properties. Examples include LLE, Isomap (which I worked on), Hessian-LLE, SDE, and many others. The hope with these algorithms is that they can recover the low dimensional structure of point sets in high dimensional spaces. Many of them can be shown to work in interesting ways, producing various compelling pictures. Despite doing some early work in this direction, I suffer from a motivational problem: why do we want to recover the low dimensional structure?
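As a concrete illustration of the outline above, here is a minimal Isomap-style sketch; it is not the reference implementation of any of the named algorithms. It builds the k-nearest-neighbor graph, weights each edge by the distance between its endpoints, digests the graph with all-pairs shortest paths, and recovers low dimensional points with classical MDS. The function name, the default k, and the NumPy/SciPy details are illustrative assumptions.

import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import shortest_path

def manifold_embed(X, k=10, n_components=2):
    """Generic pipeline sketch: kNN graph -> shortest paths -> classical MDS.
    Assumes the kNN graph is connected; otherwise some geodesics are infinite."""
    n = X.shape[0]
    D = cdist(X, X)                          # the metric d(): Euclidean here
    # Graph with a point in every node; edges to the k nearest neighbors,
    # weighted by the distance between the connected points.
    W = np.full((n, n), np.inf)
    for i in range(n):
        nbrs = np.argsort(D[i])[1:k + 1]     # position 0 is the point itself
        W[i, nbrs] = D[i, nbrs]
        W[nbrs, i] = D[nbrs, i]              # symmetrize
    # "Digest" the graph: approximate geodesic distances by graph shortest paths.
    G = shortest_path(W, method="D", directed=False)
    # Classical MDS on the digested distances: double-center, take top eigenvectors.
    H = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * H @ (G ** 2) @ H
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:n_components]
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

Swapping the shortest-path digestion for local linear reconstruction weights gives the LLE variant of the same outline.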
sentIndex sentText sentNum sentScore
1 Given a metric d() and a set of points S, construct a graph with a point in every node and an edge connecting each node to each of its k nearest neighbors. [sent-2, score-0.729]
2 Associate with the edge a weight which is the distance between the points in the connected nodes. [sent-3, score-0.394]
3 This might include computing the shortest path between all points or figuring out how to linearly interpolate the point from its neighbors. [sent-5, score-0.633]
4 Find a set of points in a low dimensional space which preserve the digested properties. [sent-6, score-0.906]
5 Examples include LLE, Isomap (which I worked on), Hessian-LLE, SDE, and many others. [sent-7, score-0.153]
6 The hope with these algorithms is that they can recover the low dimensional structure of point sets in high dimensional spaces. [sent-8, score-1.165]
7 Many of them can be shown to work in interesting ways producing various compelling pictures. [sent-9, score-0.134]
8 Despite doing some early work in this direction, I suffer from a motivational problem: Why do we want to recover the low dimensional structure? [sent-10, score-0.828]
9 This is compelling if you have data visualization problems. [sent-12, score-0.28]
10 (One approximation = the projection into the low dimensional space, another approximation = the classifier learned on that space.) [sent-16, score-0.991]
11 Several people have experimented with using a vision sensor and a dimension reduction technique in an attempt to extract the manifold of pose space. [sent-18, score-0.475]
12 These attempts have not generally worked well, basically because the Euclidean distance on pixels is not particularly good at predicting which things are “nearby”. [sent-19, score-0.388]
13 Any stream S of images i_1, i_2, i_3, …, i_n can be transformed into a binary problem according to: {((i_j, i_k), 1 – I(j = k+1 or k = j+1)) : i_j, i_k in S}. [sent-22, score-0.377]
14 In unmath, “the binary problem formed by predicting whether images are adjacent in the chain of experience” (a small construction sketch appears after this list). [sent-23, score-0.455]
15 Using regression and counting numbers of transitions might provide a more conventional multibit metric. [sent-25, score-0.146]
16 This metric, if well solved, has a concrete meaning: the minimum distance in terms of actuator transitions between positions. [sent-26, score-0.488]
17 A shortest path in this space is a sequence of actuator movements leading from a position A to a position B. [sent-27, score-0.834]
18 A projection of this space into low dimensions provides some common format which both the human and the robot can understand. [sent-28, score-0.622]
19 Commanding the robot to go to some location is just a matter of pointing out that location in the low dimensional projection. [sent-29, score-0.88]
20 This is a possible use for manifold based dimension reduction techniques which I find compelling, if it works out. [sent-30, score-0.396]
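A minimal sketch of the construction in sentences 13–17, under stated assumptions: the stream is a Python list of images, the learned metric is supplied as a matrix of estimated transition costs between stored positions, and the function names are hypothetical (the post does not specify features or a learning algorithm).

import itertools
import numpy as np
from scipy.sparse.csgraph import shortest_path

def adjacency_pairs(stream):
    """Binary problem from sentence 13: each pair of images is labeled
    1 - I(j = k+1 or k = j+1), matching the set written there
    (so adjacent pairs in the chain of experience get label 0)."""
    pairs = []
    for j, k in itertools.combinations(range(len(stream)), 2):
        label = 1 - int(abs(j - k) == 1)
        pairs.append(((stream[j], stream[k]), label))
    return pairs

def plan_route(cost, start, goal):
    """Sentence 17: a shortest path under the learned metric is a sequence of
    positions (hence actuator movements) from position A to position B.
    cost[i, j] is a learned estimate of the transition cost between positions;
    assumes every stored position is reachable from every other."""
    dist, pred = shortest_path(cost, directed=False, return_predecessors=True)
    route, node = [goal], goal
    while node != start:
        node = int(pred[start, node])
        route.append(node)
    return list(reversed(route)), dist[start, goal]

Projecting the same cost matrix into two dimensions (for example with the pipeline sketched after the introduction) would give the common low dimensional format for pointing out target locations described in sentences 18 and 19.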
simIndex simValue blogId blogTitle
same-blog 1 0.99999988 102 hunch net-2005-08-11-Why Manifold-Based Dimension Reduction Techniques?
2 0.17148151 87 hunch net-2005-06-29-Not EM for clustering at COLT
Introduction: One standard approach for clustering data with a set of gaussians is using EM. Roughly speaking, you pick a set of k random gaussians and then use alternating expectation maximization to (hopefully) find a set of gaussians that “explain” the data well. This process is difficult to work with because EM can become “stuck” in local optima. There are various hacks like “rerun with t different random starting points”. One cool observation is that this can often be solved via other algorithms which do not suffer from local optima. This is an early paper which shows this. Ravi Kannan presented a new paper showing this is possible in a much more adaptive setting. A very rough summary of these papers is that by projecting into a lower dimensional space, it is computationally tractable to pick out the gross structure of the data. It is unclear how well these algorithms work in practice, but they might be effective, especially if used as a subroutine of the form: Projec
3 0.13208458 58 hunch net-2005-04-21-Dynamic Programming Generalizations and Their Use
Introduction: David Mcallester gave a talk about this paper (with Pedro Felzenszwalb). I’ll try to give a high level summary of why it’s interesting. Dynamic programming is most familiar as instantiated by Viterbi decoding in a hidden Markov model. It is a general paradigm for problem solving where subproblems are solved and used to solve larger problems. In the Viterbi decoding example, the subproblem is “What is the most probable path ending at each state at timestep t?”, and the larger problem is the same except at timestep t+1. There are a few optimizations you can do here: Dynamic Programming -> queued Dynamic Programming. Keep track of the “cost so far” (or “most probable path”) and (carefully) only look at extensions to paths likely to yield the shortest path. “Carefully” here is defined by Dijkstra’s shortest path algorithm. queued Dynamic Programming -> A*. Add a lower bound on the cost to complete a path (or an upper bound on the probability of a completion) for
4 0.11624894 351 hunch net-2009-05-02-Wielding a New Abstraction
Introduction: This post is partly meant as an advertisement for the reductions tutorial Alina , Bianca , and I are planning to do at ICML . Please come, if you are interested. Many research programs can be thought of as finding and building new useful abstractions. The running example I’ll use is learning reductions where I have experience. The basic abstraction here is that we can build a learning algorithm capable of solving classification problems up to a small expected regret. This is used repeatedly to solve more complex problems. In working on a new abstraction, I think you typically run into many substantial problems of understanding, which make publishing particularly difficult. It is difficult to seriously discuss the reason behind or mechanism for abstraction in a conference paper with small page limits. People rarely see such discussions and hence have little basis on which to think about new abstractions. Another difficulty is that when building an abstraction, yo
5 0.11225967 165 hunch net-2006-03-23-The Approximation Argument
Introduction: An argument is sometimes made that the Bayesian way is the “right” way to do machine learning. This is a serious argument which deserves a serious reply. The approximation argument is a serious reply for which I have not yet seen a reply 2. The idea for the Bayesian approach is quite simple, elegant, and general. Essentially, you first specify a prior P(D) over possible processes D producing the data, observe the data, then condition on the data according to Bayes’ law to construct a posterior: P(D|x) = P(x|D)P(D)/P(x). After this, hard decisions are made (such as “turn left” or “turn right”) by choosing the one which minimizes the expected (with respect to the posterior) loss. This basic idea is reused thousands of times with various choices of P(D) and loss functions which is unsurprising given the many nice properties: There is an extremely strong associated guarantee: If the actual distribution generating the data is drawn from P(D) there is no better method.
6 0.11153068 143 hunch net-2005-12-27-Automated Labeling
7 0.11105656 153 hunch net-2006-02-02-Introspectionism as a Disease
8 0.10917402 248 hunch net-2007-06-19-How is Compressed Sensing going to change Machine Learning ?
9 0.10323001 20 hunch net-2005-02-15-ESPgame and image labeling
10 0.10240769 61 hunch net-2005-04-25-Embeddings: what are they good for?
11 0.10036695 259 hunch net-2007-08-19-Choice of Metrics
12 0.095396027 361 hunch net-2009-06-24-Interesting papers at UAICMOLT 2009
13 0.094818421 334 hunch net-2009-01-07-Interesting Papers at SODA 2009
14 0.09479481 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?
15 0.093567505 3 hunch net-2005-01-24-The Humanloop Spectrum of Machine Learning
16 0.093024157 160 hunch net-2006-03-02-Why do people count for learning?
17 0.090589821 95 hunch net-2005-07-14-What Learning Theory might do
18 0.085419275 139 hunch net-2005-12-11-More NIPS Papers
19 0.083609521 237 hunch net-2007-04-02-Contextual Scaling
20 0.08181119 98 hunch net-2005-07-27-Not goal metrics
simIndex simValue blogId blogTitle
same-blog 1 0.96672338 102 hunch net-2005-08-11-Why Manifold-Based Dimension Reduction Techniques?
2 0.65830016 87 hunch net-2005-06-29-Not EM for clustering at COLT
3 0.619398 150 hunch net-2006-01-23-On Coding via Mutual Information & Bayes Nets
Introduction: Say we have two random variables X,Y with mutual information I(X,Y). Let’s say we want to represent them with a Bayes net of the form X <- M -> Y, such that the entropy of M equals the mutual information, i.e. H(M)=I(X,Y). Intuitively, we would like our hidden state to be as simple as possible (entropy wise). The data processing inequality means that H(M)>=I(X,Y), so the mutual information is a lower bound on how simple the M could be. Furthermore, if such a construction existed it would have a nice coding interpretation — one could jointly code X and Y by first coding the mutual information, then coding X with this mutual info (without Y) and coding Y with this mutual info (without X). It turns out that such a construction does not exist in general (Thx Alina Beygelzimer for a counterexample! see below for the sketch). What are the implications of this? Well, it’s hard for me to say, but it does suggest to me that the ‘generative’ model philosophy might be
4 0.58356524 153 hunch net-2006-02-02-Introspectionism as a Disease
Introduction: In the AI-related parts of machine learning, it is often tempting to examine how you do things in order to imagine how a machine should do things. This is introspection, and it can easily go awry. I will call introspection gone awry introspectionism. Introspectionism is almost unique to AI (and the AI-related parts of machine learning) and it can lead to huge wasted effort in research. It’s easiest to show how introspectionism arises by an example. Suppose we want to solve the problem of navigating a robot from point A to point B given a camera. Then, the following research action plan might seem natural when you examine your own capabilities: Build an edge detector for still images. Build an object recognition system given the edge detector. Build a system to predict distance and orientation to objects given the object recognition system. Build a system to plan a path through the scene you construct from {object identification, distance, orientation} predictions.
5 0.57278931 61 hunch net-2005-04-25-Embeddings: what are they good for?
Introduction: I’ve been looking at some recent embeddings work, and am struck by how beautiful the theory and algorithms are. It also makes me wonder, what are embeddings good for? A few things immediately come to mind: (1) For visualization of high-dimensional data sets. In this case, one would like good algorithms for embedding specifically into 2- and 3-dimensional Euclidean spaces. (2) For nonparametric modeling. The usual nonparametric models (histograms, nearest neighbor) often require resources which are exponential in the dimension. So if the data actually lie close to some low-dimensional surface, it might be a good idea to first identify this surface and embed the data before applying the model. Incidentally, for applications like these, it’s important to have a functional mapping from high to low dimension, which some techniques do not yield up. (3) As a prelude to classifier learning. The hope here is presumably that learning will be easier in the low-dimensional space,
6 0.55227923 248 hunch net-2007-06-19-How is Compressed Sensing going to change Machine Learning ?
7 0.53708726 206 hunch net-2006-09-09-How to solve an NP hard problem in quadratic time
8 0.53407204 217 hunch net-2006-11-06-Data Linkage Problems
9 0.52523309 334 hunch net-2009-01-07-Interesting Papers at SODA 2009
10 0.51353651 444 hunch net-2011-09-07-KDD and MUCMD 2011
11 0.50125408 143 hunch net-2005-12-27-Automated Labeling
12 0.48904178 58 hunch net-2005-04-21-Dynamic Programming Generalizations and Their Use
13 0.48561895 287 hunch net-2008-01-28-Sufficient Computation
14 0.48496616 446 hunch net-2011-10-03-Monday announcements
15 0.47528139 77 hunch net-2005-05-29-Maximum Margin Mismatch?
16 0.47516876 237 hunch net-2007-04-02-Contextual Scaling
17 0.45783022 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models
18 0.45643875 139 hunch net-2005-12-11-More NIPS Papers
19 0.45523581 373 hunch net-2009-10-03-Static vs. Dynamic multiclass prediction
20 0.4507699 312 hunch net-2008-08-04-Electoralmarkets.com
simIndex simValue blogId blogTitle
1 0.94786036 99 hunch net-2005-08-01-Peekaboom
Introduction: Luis has released Peekaboom, a successor to ESPgame (game site). The purpose of the game is similar—using the actions of people playing a game to gather data helpful in solving AI. Peekaboom gathers more detailed, and perhaps more useful, data about vision. For ESPgame, the byproduct of the game was mutually agreed upon labels for common images. For Peekaboom, the location of the subimage generating the label is revealed by the game as well. Given knowledge about what portion of the image is related to a label, it may be more feasible to learn to recognize the appropriate parts. There isn’t a dataset yet available for this game as there is for ESPgame, but hopefully a significant number of people will play and we’ll have one to work with soon.
2 0.86004102 150 hunch net-2006-01-23-On Coding via Mutual Information & Bayes Nets
same-blog 3 0.85092956 102 hunch net-2005-08-11-Why Manifold-Based Dimension Reduction Techniques?
4 0.78491586 260 hunch net-2007-08-25-The Privacy Problem
Introduction: Machine Learning is rising in importance because data is being collected for all sorts of tasks where it either wasn’t previously collected, or for tasks that did not previously exist. While this is great for Machine Learning, it has a downside—the massive data collection which is so useful can also lead to substantial privacy problems. It’s important to understand that this is a much harder problem than many people appreciate. The AOL data release is a good example. To those doing machine learning, the following strategies might be obvious: Just delete any names or other obviously personally identifiable information. The logic here seems to be “if I can’t easily find the person then no one can”. That doesn’t work as demonstrated by the people who were found circumstantially from the AOL data. … then just hash all the search terms! The logic here is “if I can’t read it, then no one can”. It’s also trivially broken by a dictionary attack—just hash all the strings
5 0.57850993 484 hunch net-2013-06-16-Representative Reviewing
Introduction: When thinking about how best to review papers, it seems helpful to have some conception of what good reviewing is. As far as I can tell, this is almost always only discussed in the specific context of a paper (i.e. your rejected paper), or at most an area (i.e. what a “good paper” looks like for that area) rather than general principles. Neither individual papers or areas are sufficiently general for a large conference—every paper differs in the details, and what if you want to build a new area and/or cross areas? An unavoidable reason for reviewing is that the community of research is too large. In particular, it is not possible for a researcher to read every paper which someone thinks might be of interest. This reason for reviewing exists independent of constraints on rooms or scheduling formats of individual conferences. Indeed, history suggests that physical constraints are relatively meaningless over the long term — growing conferences simply use more rooms and/or change fo
6 0.57655603 437 hunch net-2011-07-10-ICML 2011 and the future
7 0.56980371 194 hunch net-2006-07-11-New Models
8 0.5679931 134 hunch net-2005-12-01-The Webscience Future
9 0.56721008 116 hunch net-2005-09-30-Research in conferences
10 0.56691384 403 hunch net-2010-07-18-ICML & COLT 2010
11 0.56523043 225 hunch net-2007-01-02-Retrospective
12 0.56468737 378 hunch net-2009-11-15-The Other Online Learning
13 0.56377786 40 hunch net-2005-03-13-Avoiding Bad Reviewing
14 0.56305122 452 hunch net-2012-01-04-Why ICML? and the summer conferences
15 0.5625971 466 hunch net-2012-06-05-ICML acceptance statistics
16 0.56219357 309 hunch net-2008-07-10-Interesting papers, ICML 2008
17 0.56173176 77 hunch net-2005-05-29-Maximum Margin Mismatch?
18 0.5615176 126 hunch net-2005-10-26-Fallback Analysis is a Secret to Useful Algorithms
19 0.56033278 204 hunch net-2006-08-28-Learning Theory standards for NIPS 2006
20 0.56007832 343 hunch net-2009-02-18-Decision by Vetocracy