hunch_net hunch_net-2005 hunch_net-2005-87 knowledge-graph by maker-knowledge-mining

87 hunch net-2005-06-29-Not EM for clustering at COLT


meta info for this blog

Source: html

Introduction: One standard approach for clustering data with a set of gaussians is using EM. Roughly speaking, you pick a set of k random gaussians and then use alternating expectation maximization to (hopefully) find a set of gaussians that “explain” the data well. This process is difficult to work with because EM can become “stuck” in local optima. There are various hacks like “rerun with t different random starting points”. One cool observation is that this can often be solved via other algorithms which do not suffer from local optima. This is an early paper which shows this. Ravi Kannan presented a new paper showing this is possible in a much more adaptive setting. A very rough summary of these papers is that by projecting into a lower dimensional space, it is computationally tractable to pick out the gross structure of the data. It is unclear how well these algorithms work in practice, but they might be effective, especially if used as a subroutine of the form: Project to low dimensional space.


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 One standard approach for clustering data with a set of gaussians is using EM. [sent-1, score-0.4]

2 Roughly speaking, you pick a set of k random gaussians and then use alternating expectation maximization to (hopefully) find a set of gaussians that “explain” the data well. [sent-2, score-1.544]

3 This process is difficult to work with because EM can become “stuck” in local optima. [sent-3, score-0.307]

4 There are various hacks like “rerun with t different random starting points”. [sent-4, score-0.328]

5 One cool observation is that this can often be solved via other algorithms which do not suffer from local optima. [sent-5, score-0.577]

6 Ravi Kannan presented a new paper showing this is possible in a much more adaptive setting. [sent-7, score-0.251]

7 A very rough summary of these papers is that by projecting into a lower dimensional space, it is computationally tractable to pick out the gross structure of the data. [sent-8, score-1.523]

8 It is unclear how well these algorithms work in practice, but they might be effective, especially if used as a subroutine of the form: Project to low dimensional space. [sent-9, score-0.562]

9 Project gross structure into the high dimensional space. [sent-11, score-0.758]

10 Run EM (or some other local improvement algorithm) to find a final fit. [sent-12, score-0.549]

11 The effect of steps 1-3 is to “seed” the local optimization algorithm in a good place from which a global optimum is plausibly reachable. [sent-13, score-0.853]
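
The subroutine sketched in sentences 8-11 is easy to prototype. Below is a minimal sketch assuming scikit-learn, with PCA and k-means standing in for the spectral projections analyzed in the papers above, so it illustrates the recipe rather than any specific algorithm.

```python
# Minimal sketch of the recipe: project, find gross structure, lift it back,
# then run EM seeded at that structure. Assumes scikit-learn; PCA and k-means
# stand in for the spectral projections analyzed in the papers.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
k, d = 3, 50
# Synthetic data: k well-separated gaussians in d dimensions.
X = np.vstack([rng.normal(loc=10.0 * i, scale=1.0, size=(200, d)) for i in range(k)])

# 1. Project to low dimensional space.
pca = PCA(n_components=2).fit(X)
X_low = pca.transform(X)

# 2. Pick out the gross structure (here, k-means in the projected space).
centers_low = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_low).cluster_centers_

# 3. Project the gross structure back into the high dimensional space.
centers_high = pca.inverse_transform(centers_low)

# 4. Run EM, seeded at the recovered centers, to find the final fit.
gmm = GaussianMixture(n_components=k, means_init=centers_high, random_state=0).fit(X)
print(gmm.converged_, gmm.means_[:, 0].round(1))  # means near 0, 10, 20 in each coordinate
```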


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('gross', 0.351), ('local', 0.307), ('guassians', 0.301), ('dimensional', 0.284), ('pick', 0.238), ('em', 0.223), ('project', 0.151), ('gaussians', 0.134), ('hacks', 0.134), ('kannan', 0.134), ('projecting', 0.134), ('ravi', 0.134), ('subroutine', 0.134), ('alternating', 0.124), ('optima', 0.124), ('structure', 0.123), ('random', 0.119), ('seed', 0.117), ('stuck', 0.107), ('set', 0.106), ('suffer', 0.1), ('find', 0.096), ('explain', 0.095), ('clustering', 0.092), ('summary', 0.09), ('adaptive', 0.09), ('rough', 0.09), ('algorithm', 0.09), ('steps', 0.087), ('global', 0.085), ('effects', 0.085), ('expectation', 0.085), ('speaking', 0.082), ('showing', 0.082), ('cool', 0.08), ('presented', 0.079), ('roughly', 0.078), ('tractable', 0.077), ('unclear', 0.077), ('hopefully', 0.077), ('starting', 0.075), ('plausibly', 0.075), ('improvement', 0.073), ('early', 0.073), ('final', 0.073), ('shows', 0.071), ('lower', 0.069), ('data', 0.068), ('especially', 0.067), ('computationally', 0.067)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 87 hunch net-2005-06-29-Not EM for clustering at COLT

Introduction: One standard approach for clustering data with a set of gaussians is using EM. Roughly speaking, you pick a set of k random gaussians and then use alternating expectation maximization to (hopefully) find a set of gaussians that “explain” the data well. This process is difficult to work with because EM can become “stuck” in local optima. There are various hacks like “rerun with t different random starting points”. One cool observation is that this can often be solved via other algorithms which do not suffer from local optima. This is an early paper which shows this. Ravi Kannan presented a new paper showing this is possible in a much more adaptive setting. A very rough summary of these papers is that by projecting into a lower dimensional space, it is computationally tractable to pick out the gross structure of the data. It is unclear how well these algorithms work in practice, but they might be effective, especially if used as a subroutine of the form: Project to low dimensional space.

2 0.17148151 102 hunch net-2005-08-11-Why Manifold-Based Dimension Reduction Techniques?

Introduction: Manifold-based dimension-reduction algorithms share the following general outline. Given a metric d() and a set of points S: construct a graph with a point in every node and an edge from each node to each of its k-nearest neighbors; associate with each edge a weight equal to the distance between the points in the connected nodes; digest the graph, which might include computing the shortest path between all points or figuring out how to linearly interpolate a point from its neighbors; and find a set of points in a low dimensional space which preserve the digested properties. Examples include LLE, Isomap (which I worked on), Hessian-LLE, SDE, and many others. The hope with these algorithms is that they can recover the low dimensional structure of point sets in high dimensional spaces. Many of them can be shown to work in interesting ways, producing various compelling pictures. Despite doing some early work in this direction, I suffer from a motivational
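
As a rough illustration of that general outline (not of any particular paper), here is a minimal sketch assuming scikit-learn and scipy: build the k-nearest-neighbor graph, digest it via shortest paths, and embed into a low dimensional space; Isomap packages the same steps.

```python
# Rough sketch of the outline above (kNN graph -> digest -> low dimensional
# embedding), assuming scikit-learn and scipy. LLE, Hessian-LLE and SDE digest
# the graph differently; Isomap uses shortest-path distances as shown here.
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path
from sklearn.manifold import Isomap

rng = np.random.default_rng(0)
# Points along a noisy helix: intrinsically one dimensional, embedded in R^3.
t = np.sort(rng.uniform(0, 3 * np.pi, 400))
X = np.c_[np.cos(t), np.sin(t), 0.2 * t] + 0.01 * rng.normal(size=(400, 3))

# Construct the k-nearest-neighbor graph, weighting edges by distance.
G = kneighbors_graph(X, n_neighbors=10, mode="distance")

# Digest the graph: all-pairs shortest-path (geodesic) distances.
D = shortest_path(G, directed=False)

# Find low dimensional points preserving the digested distances.
X_low = Isomap(n_neighbors=10, n_components=1).fit_transform(X)
print(D.shape, X_low.shape)  # (400, 400) (400, 1)
```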

3 0.13605313 177 hunch net-2006-05-05-An ICML reject

Introduction: Hal, Daniel, and I have been working on the algorithm Searn for structured prediction. This was just conditionally accepted and then rejected from ICML, and we were quite surprised. By any reasonable criteria, it seems this is an interesting algorithm. Prediction performance: Searn performed better than any other algorithm on all the problems we tested against using the same feature set. This is true even using the numbers reported by authors in their papers. Theoretical underpinning: Searn is a reduction which comes with a reduction guarantee: good performance of the base classifier implies good performance for the overall system. No other theorem of this type has been made for other structured prediction algorithms, as far as we know. Speed: Searn has no problem handling much larger datasets than other algorithms we tested against. Simplicity: Given code for a binary classifier and a problem-specific search algorithm, only a few tens of lines are necessary to

4 0.10892148 394 hunch net-2010-04-24-COLT Treasurer is now Phil Long

Introduction: For about 5 years, I’ve been the treasurer of the Association for Computational Learning, otherwise known as COLT, taking over from John Case before me. A transfer of duties to Phil Long is now about complete. This probably matters to almost no one, but I wanted to describe things a bit for those interested. The immediate impetus for this decision was unhappiness over reviewing decisions at COLT 2009, one as an author and several as a member of the program committee. I seem to have disagreements fairly often about what is important work, partly because I’m focused on learning theory with practical implications, partly because I define learning theory more broadly than is typical amongst COLT members, and partly because COLT suffers a bit from insider-clique issues. The degree to which these issues come up varies substantially each year, so last year is not predictive of this one. And, it’s important to understand that COLT remains healthy with these issues not nearly so bad

5 0.097148426 388 hunch net-2010-01-24-Specializations of the Master Problem

Introduction: One thing which is clear on a little reflection is that there exists a single master learning problem capable of encoding essentially all learning problems. This problem is of course a very general sort of reinforcement learning where the world interacts with an agent as: The world announces an observation x. The agent makes a choice a. The world announces a reward r. The goal here is to maximize the sum of the rewards over the time of the agent. No particular structure relating x to a or a to r is implied by this setting, so we do not know effective general algorithms for the agent. It’s very easy to prove lower bounds showing that an agent cannot hope to succeed here—just consider the case where actions are unrelated to rewards. Nevertheless, there is a real sense in which essentially all forms of life are agents operating in this setting, somehow succeeding. The gap between these observations drives research—How can we find tractable specializations of
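
The interaction protocol itself is only a few lines; the research difficulty is entirely inside the agent. A minimal sketch follows, with a toy world and a uniformly random agent both invented purely for illustration.

```python
# Minimal sketch of the interaction loop described above. The toy world and
# the uniformly random agent are placeholders invented for illustration.
import random

ACTIONS = [0, 1, 2]

def observe(t):
    """Toy world: the observation simply cycles through the action set."""
    return t % len(ACTIONS)

def reward(x, a):
    """Toy world: reward 1 when the chosen action matches the observation."""
    return 1.0 if a == x else 0.0

def choose(x):
    """Placeholder agent: uniformly random. Real agents must exploit structure in x."""
    return random.choice(ACTIONS)

total = 0.0
for t in range(1000):
    x = observe(t)          # the world announces an observation x
    a = choose(x)           # the agent makes a choice a
    total += reward(x, a)   # the world announces a reward r
print("sum of rewards:", total)  # the goal is to maximize this sum
```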

6 0.084214084 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models

7 0.079901434 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions

8 0.078980185 361 hunch net-2009-06-24-Interesting papers at UAICMOLT 2009

9 0.078535378 248 hunch net-2007-06-19-How is Compressed Sensing going to change Machine Learning ?

10 0.077998355 347 hunch net-2009-03-26-Machine Learning is too easy

11 0.077314287 368 hunch net-2009-08-26-Another 10-year paper in Machine Learning

12 0.073787555 317 hunch net-2008-09-12-How do we get weak action dependence for learning with partial observations?

13 0.07336545 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

14 0.070109092 424 hunch net-2011-02-17-What does Watson mean?

15 0.069333561 199 hunch net-2006-07-26-Two more UAI papers of interest

16 0.068239942 235 hunch net-2007-03-03-All Models of Learning have Flaws

17 0.068018146 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning

18 0.066459462 131 hunch net-2005-11-16-The Everything Ensemble Edge

19 0.066175975 189 hunch net-2006-07-05-more icml papers

20 0.064602643 351 hunch net-2009-05-02-Wielding a New Abstraction


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.159), (1, 0.037), (2, 0.006), (3, -0.005), (4, 0.077), (5, -0.025), (6, -0.018), (7, -0.015), (8, -0.013), (9, -0.017), (10, 0.007), (11, 0.058), (12, -0.072), (13, -0.014), (14, 0.027), (15, 0.002), (16, 0.056), (17, 0.04), (18, -0.015), (19, 0.004), (20, 0.023), (21, -0.043), (22, -0.018), (23, -0.009), (24, -0.023), (25, 0.072), (26, -0.035), (27, -0.081), (28, 0.056), (29, -0.08), (30, -0.049), (31, -0.099), (32, 0.072), (33, 0.027), (34, -0.027), (35, 0.003), (36, -0.09), (37, 0.01), (38, 0.037), (39, -0.0), (40, -0.011), (41, -0.001), (42, 0.085), (43, -0.014), (44, 0.078), (45, 0.021), (46, 0.068), (47, 0.08), (48, -0.038), (49, -0.043)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95894969 87 hunch net-2005-06-29-Not EM for clustering at COLT

Introduction: One standard approach for clustering data with a set of gaussians is using EM. Roughly speaking, you pick a set of k random gaussians and then use alternating expectation maximization to (hopefully) find a set of gaussians that “explain” the data well. This process is difficult to work with because EM can become “stuck” in local optima. There are various hacks like “rerun with t different random starting points”. One cool observation is that this can often be solved via other algorithms which do not suffer from local optima. This is an early paper which shows this. Ravi Kannan presented a new paper showing this is possible in a much more adaptive setting. A very rough summary of these papers is that by projecting into a lower dimensional space, it is computationally tractable to pick out the gross structure of the data. It is unclear how well these algorithms work in practice, but they might be effective, especially if used as a subroutine of the form: Project to low dimensional space.

2 0.64564341 102 hunch net-2005-08-11-Why Manifold-Based Dimension Reduction Techniques?

Introduction: Manifold-based dimension-reduction algorithms share the following general outline. Given a metric d() and a set of points S: construct a graph with a point in every node and an edge from each node to each of its k-nearest neighbors; associate with each edge a weight equal to the distance between the points in the connected nodes; digest the graph, which might include computing the shortest path between all points or figuring out how to linearly interpolate a point from its neighbors; and find a set of points in a low dimensional space which preserve the digested properties. Examples include LLE, Isomap (which I worked on), Hessian-LLE, SDE, and many others. The hope with these algorithms is that they can recover the low dimensional structure of point sets in high dimensional spaces. Many of them can be shown to work in interesting ways, producing various compelling pictures. Despite doing some early work in this direction, I suffer from a motivational

3 0.62852627 248 hunch net-2007-06-19-How is Compressed Sensing going to change Machine Learning ?

Introduction: Compressed Sensing (CS) is a new framework developed by Emmanuel Candes, Terry Tao and David Donoho. To summarize, if you acquire a signal in some basis that is incoherent with the basis in which you know the signal to be sparse, it is very likely you will be able to reconstruct the signal from these incoherent projections. Terry Tao, the recent Fields medalist, does a very nice job at explaining the framework here. He goes further into the theory in this post, where he mentions the central issue of the Uniform Uncertainty Principle. It so happens that random projections are on average incoherent, within the UUP meaning, with most known bases (sines, polynomials, splines, wavelets, curvelets …) and are therefore an ideal basis for Compressed Sensing. [For more in-depth information on the subject, the Rice group has done a very good job at providing a central library of papers relevant to the growing subject: http://www.dsp.ece.rice.edu/cs/ ] The Machine
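
A minimal numerical illustration of the claim, assuming numpy and scikit-learn, with L1-regularized regression (Lasso) standing in for the basis-pursuit solvers used in the CS literature: a sparse signal is recovered from far fewer random projections than its dimension.

```python
# Sparse signal, random gaussian measurements, L1 recovery. Lasso is used as a
# convenient stand-in for basis-pursuit style solvers; parameters are illustrative.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, m, k = 400, 100, 8            # signal length, number of measurements, sparsity

# A k-sparse signal in the canonical basis.
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.normal(size=k)

# Random gaussian projections are, on average, incoherent with the sparsity basis.
A = rng.normal(size=(m, n)) / np.sqrt(m)
y = A @ x

# Reconstruct from the m << n incoherent projections via L1-regularized regression.
x_hat = Lasso(alpha=1e-3, max_iter=50000).fit(A, y).coef_
print("relative recovery error:", np.linalg.norm(x - x_hat) / np.linalg.norm(x))
```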

4 0.57321477 334 hunch net-2009-01-07-Interesting Papers at SODA 2009

Introduction: Several talks seem potentially interesting to ML folks at this year’s SODA. Maria-Florina Balcan, Avrim Blum, and Anupam Gupta, Approximate Clustering without the Approximation. This paper gives reasonable algorithms with provable approximation guarantees for k-median and other notions of clustering. It’s conceptually interesting, because it’s the second example I’ve seen where NP hardness is subverted by changing the problem definition in a subtle but reasonable way. Essentially, they show that if any near-approximation to an optimal solution is good, then it’s computationally easy to find a near-optimal solution. This subtle shift bears serious thought. A similar one occurred in our ranking paper with respect to minimum feedback arcset. With two known examples, it suggests that many more NP-complete problems might be finessed into irrelevance in this style. Yury Lifshits and Shengyu Zhang, Combinatorial Algorithms for Nearest Neighbors, Near-Duplicates, and Smal

5 0.56704253 189 hunch net-2006-07-05-more icml papers

Introduction: Here are a few other papers I enjoyed from ICML06. Topic Models: Dynamic Topic Models (David Blei, John Lafferty). A nice model for how topics in LDA-type models can evolve over time, using a linear dynamical system on the natural parameters and a very clever structured variational approximation (in which the mean field parameters are pseudo-observations of a virtual LDS). Like all Blei papers, he makes it look easy, but it is extremely impressive. Pachinko Allocation (Wei Li, Andrew McCallum). A very elegant (but computationally challenging) model which induces correlation amongst topics using a multi-level DAG whose interior nodes are “super-topics” and “sub-topics” and whose leaves are the vocabulary words. Makes the slumbering monster of structure learning stir. Sequence Analysis (I missed these talks since I was chairing another session): Online Decoding of Markov Models with Latency Constraints (Mukund Narasimhan, Paul Viola, Michael Shilman). An “a

6 0.51380754 177 hunch net-2006-05-05-An ICML reject

7 0.5019809 61 hunch net-2005-04-25-Embeddings: what are they good for?

8 0.49577057 398 hunch net-2010-05-10-Aggregation of estimators, sparsity in high dimension and computational feasibility

9 0.4901168 317 hunch net-2008-09-12-How do we get weak action dependence for learning with partial observations?

10 0.47633904 58 hunch net-2005-04-21-Dynamic Programming Generalizations and Their Use

11 0.47207037 150 hunch net-2006-01-23-On Coding via Mutual Information & Bayes Nets

12 0.47130123 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design

13 0.4610056 153 hunch net-2006-02-02-Introspectionism as a Disease

14 0.45083269 444 hunch net-2011-09-07-KDD and MUCMD 2011

15 0.44973549 280 hunch net-2007-12-20-Cool and Interesting things at NIPS, take three

16 0.44844824 219 hunch net-2006-11-22-Explicit Randomization in Learning algorithms

17 0.44746059 185 hunch net-2006-06-16-Regularization = Robustness

18 0.43894076 77 hunch net-2005-05-29-Maximum Margin Mismatch?

19 0.43575656 388 hunch net-2010-01-24-Specializations of the Master Problem

20 0.43463552 393 hunch net-2010-04-14-MLcomp: a website for objectively comparing ML algorithms


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(0, 0.418), (27, 0.148), (38, 0.105), (55, 0.092), (94, 0.1), (95, 0.026)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.90948731 87 hunch net-2005-06-29-Not EM for clustering at COLT

Introduction: One standard approach for clustering data with a set of gaussians is using EM. Roughly speaking, you pick a set of k random gaussians and then use alternating expectation maximization to (hopefully) find a set of gaussians that “explain” the data well. This process is difficult to work with because EM can become “stuck” in local optima. There are various hacks like “rerun with t different random starting points”. One cool observation is that this can often be solved via other algorithms which do not suffer from local optima. This is an early paper which shows this. Ravi Kannan presented a new paper showing this is possible in a much more adaptive setting. A very rough summary of these papers is that by projecting into a lower dimensional space, it is computationally tractable to pick out the gross structure of the data. It is unclear how well these algorithms work in practice, but they might be effective, especially if used as a subroutine of the form: Project to low dimensional space.

2 0.86976779 473 hunch net-2012-09-29-Vowpal Wabbit, version 7.0

Introduction: A new version of VW is out. The primary changes are: Learning Reductions: I’ve wanted to get learning reductions working and we’ve finally done it. Not everything is implemented yet, but VW now supports direct: Multiclass Classification (--oaa or --ect); Cost Sensitive Multiclass Classification (--csoaa or --wap); Contextual Bandit Classification (--cb); Sequential Structured Prediction (--searn or --dagger). In addition, it is now easy to build your own custom learning reductions for various plausible uses: feature diddling, custom structured prediction problems, or alternate learning reductions. This effort is far from done, but it is now in a generally useful state. Note that all learning reductions inherit the ability to do cluster parallel learning. Library interface: VW now has a basic library interface. The library provides most of the functionality of VW, with the limitation that it is monolithic and nonreentrant. These will be improved over
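
As a hedged usage sketch (only the --oaa flag is taken from the release notes above): driving the multiclass reduction from the command line, here via Python's subprocess; the "label | features" input format and the -d/-f/-t/-i/-p flags are standard VW conventions assumed to still apply in 7.0.

```python
# Hedged sketch of the multiclass (one-against-all) reduction from the command
# line. Only --oaa comes from the release notes above; the input format and the
# -d/-f/-t/-i/-p flags are standard VW usage, assumed unchanged in version 7.0.
import subprocess

train = "\n".join([
    "1 | color:1.0 shape:0.2",   # --oaa expects integer labels 1..k
    "2 | color:0.1 shape:0.9",
    "3 | color:0.5 shape:0.5",
]) + "\n"
with open("train.vw", "w") as f:
    f.write(train)

# Train a 3-class one-against-all model and save it.
subprocess.run(["vw", "--oaa", "3", "-d", "train.vw", "-f", "model.vw"], check=True)

# Predict in test mode with the saved model, writing predictions to a file.
subprocess.run(["vw", "-t", "-i", "model.vw", "-d", "train.vw", "-p", "preds.txt"], check=True)
print(open("preds.txt").read())
```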

3 0.80528468 193 hunch net-2006-07-09-The Stock Prediction Machine Learning Problem

Introduction: …is discussed in this nytimes article. I generally expect such approaches to become more common since computers are getting faster, machine learning is getting better, and data is becoming more plentiful. This is another example where machine learning technology may have a huge economic impact. Some side notes: We-in-research know almost nothing about how these things are done (because it is typically a corporate secret). … but the limited discussion in the article seems naive from a machine learning viewpoint. The learning process used apparently often fails to take into account transaction costs. What little of the approaches is discussed appears modeling-based. It seems plausible that more direct prediction methods can yield an edge. One difficulty with stock picking as a research topic is that it is inherently a zero sum game (for every winner, there is a loser). Much of the rest of research is positive sum (basically, everyone wins).

4 0.79928106 62 hunch net-2005-04-26-To calibrate or not?

Introduction: A calibrated predictor is one which predicts the probability of a binary event with the property: for all predictions p, the proportion of the time that 1 is observed is p. Since there are infinitely many p, this definition must be “softened” to make sense for any finite number of samples. The standard method for “softening” is to consider all predictions in a small neighborhood about each possible p. A great deal of effort has been devoted to strategies for achieving calibrated prediction (such as here). With statements like: (under minimal conditions) you can always make calibrated predictions. Given the strength of these statements, we might conclude we are done, but that would be a “confusion of ends”. A confusion of ends arises in the following way: We want good probabilistic predictions. Good probabilistic predictions are calibrated. Therefore, we want calibrated predictions. The “Therefore” step misses the fact that calibration is a necessary b
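
The definition and its finite-sample “softening” are straightforward to operationalize. A minimal sketch in numpy, with equal-width bins standing in for the “small neighborhood about each possible p”:

```python
# Minimal reliability check: within each small neighborhood (here an equal-width
# bin) of predicted probability p, compare p to the observed frequency of 1s.
import numpy as np

def reliability(preds, outcomes, n_bins=10):
    preds, outcomes = np.asarray(preds), np.asarray(outcomes)
    bins = np.clip((preds * n_bins).astype(int), 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            rows.append((preds[mask].mean(), outcomes[mask].mean(), int(mask.sum())))
    return rows

rng = np.random.default_rng(0)
p = rng.uniform(size=10_000)
y = (rng.uniform(size=10_000) < p).astype(int)    # outcomes drawn with probability p
for mean_pred, freq, count in reliability(p, y):  # calibrated by construction
    print(f"predicted ~{mean_pred:.2f}  observed {freq:.2f}  (n={count})")
```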

5 0.78838229 215 hunch net-2006-10-22-Exemplar programming

Introduction: There are many different abstractions for problem definition and solution. Here are a few examples: Functional programming: a set of functions is defined. The composed execution of these functions yields the solution. Linear programming: a set of constraints and a linear objective function are defined. An LP solver finds the constrained optimum. Quadratic programming: like linear programming, but the language is a little more flexible (and the solution slower). Convex programming: like quadratic programming, but the language is more flexible (and the solutions even slower). Dynamic programming: a recursive definition of the problem is defined and then solved efficiently via caching tricks. SAT programming: a problem is specified as a satisfiability problem over a conjunction of disjunctions of boolean variables. A general engine attempts to find a good satisfying assignment. For example, Kautz’s blackbox planner. These abstractions have different tradeoffs betw
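
As a concrete instance of one of these abstractions, here is a minimal linear programming sketch assuming scipy, with an objective and constraints invented purely for illustration: declare the linear objective and constraints, and let a general-purpose solver find the constrained optimum.

```python
# Minimal instance of the linear programming abstraction: a linear objective
# plus linear constraints handed to a general-purpose solver. The particular
# numbers are invented for illustration. Assumes scipy.
from scipy.optimize import linprog

# maximize x + 2y  (linprog minimizes, so negate the objective)
c = [-1.0, -2.0]
# subject to: x + y <= 4,  x + 3y <= 6,  x >= 0, y >= 0
A_ub = [[1.0, 1.0],
        [1.0, 3.0]]
b_ub = [4.0, 6.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print("optimum at", res.x, "objective value", -res.fun)  # (3, 1) with value 5
```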

6 0.70836514 74 hunch net-2005-05-21-What is the right form of modularity in structured prediction?

7 0.58902562 133 hunch net-2005-11-28-A question of quantification

8 0.4962965 492 hunch net-2013-12-01-NIPS tutorials and Vowpal Wabbit 7.4

9 0.47597647 49 hunch net-2005-03-30-What can Type Theory teach us about Machine Learning?

10 0.45973289 5 hunch net-2005-01-26-Watchword: Probability

11 0.45423877 79 hunch net-2005-06-08-Question: “When is the right time to insert the loss function?”

12 0.4528116 262 hunch net-2007-09-16-Optimizing Machine Learning Programs

13 0.45240378 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models

14 0.44991282 233 hunch net-2007-02-16-The Forgetting

15 0.44662845 177 hunch net-2006-05-05-An ICML reject

16 0.44640428 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning

17 0.44619724 14 hunch net-2005-02-07-The State of the Reduction

18 0.4454703 109 hunch net-2005-09-08-Online Learning as the Mathematics of Accountability

19 0.44437698 210 hunch net-2006-09-28-Programming Languages for Machine Learning Implementations

20 0.44345501 131 hunch net-2005-11-16-The Everything Ensemble Edge