hunch_net hunch_net-2009 hunch_net-2009-334 knowledge-graph by maker-knowledge-mining

334 hunch net-2009-01-07-Interesting Papers at SODA 2009


meta info for this blog

Source: html

Introduction: Several talks seem potentially interesting to ML folks at this year’s SODA. Maria-Florina Balcan, Avrim Blum, and Anupam Gupta, Approximate Clustering without the Approximation. This paper gives reasonable algorithms with provable approximation guarantees for k-median and other notions of clustering. It’s conceptually interesting, because it’s the second example I’ve seen where NP hardness is subverted by changing the problem definition in a subtle but reasonable way. Essentially, they show that if any near-approximation to an optimal solution is good, then it’s computationally easy to find a near-optimal solution. This subtle shift bears serious thought. A similar one occurred in our ranking paper with respect to minimum feedback arc set. With two known examples, it suggests that many more NP-complete problems might be finessed into irrelevance in this style. Yury Lifshits and Shengyu Zhang, Combinatorial Algorithms for Nearest Neighbors, Near-Duplicates, and Smal


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Several talks seem potentially interesting to ML folks at this year’s SODA. [sent-1, score-0.102]

2 Maria-Florina Balcan, Avrim Blum, and Anupam Gupta, Approximate Clustering without the Approximation. [sent-2, score-0.086]

3 This paper gives reasonable algorithms with provable approximation guarantees for k-median and other notions of clustering. [sent-3, score-0.568]

4 It’s conceptually interesting, because it’s the second example I’ve seen where NP hardness is subverted by changing the problem definition in a subtle but reasonable way. [sent-4, score-0.422]

5 A similar one occurred in our ranking paper with respect to minimum feedback arc set. [sent-7, score-0.305]

6 The basic idea of this paper is that actually creating a metric with a valid triangle inequality is hard for real-world problems, so it’s desirable to have a data structure which works with a relaxed notion of the triangle inequality. [sent-10, score-1.427]

7 The precise relaxation is more extreme than you might imagine, implying the associated algorithms give substantial potential speedups in incomparable applications. [sent-11, score-0.287]

8 Yury tells me that a cover tree style “true O(n) space” algorithm is possible. [sent-12, score-0.098]

9 If worked out and implemented, I could imagine substantial use. [sent-13, score-0.094]

10 The basic idea of this paper is that in real-world applications, an adversary is less powerful than is commonly supposed, so carefully taking into account the observed variance can yield an algorithm which works much better in practice, without sacrificing the worst case performance. [sent-15, score-0.64]

11 The basic point of this paper is that testing halfspaces is qualitatively easier than finding a good half space with respect to 0/1 loss. [sent-17, score-0.851]

12 Although the analysis is laughably far from practical, the result is striking, and it’s plausible that the algorithm works much better than the analysis. [sent-18, score-0.226]

13 The core algorithm is at least conceptually simple: test that two correlated random points have the same sign, with “yes” being evidence of a halfspace and “no” not. [sent-19, score-0.496]

14 Martingales are endemic to learning, especially online learning, and I suspect we can tighten and clarify several arguments using some of the techniques discussed. [sent-21, score-0.19]
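The correlated-pair sign test in sentence 13 can be sketched concretely. The snippet below is a hypothetical illustration, not the paper's actual tester: `correlated_pair`, `agreement_rate`, and the weight vector `w` are all invented for the sketch. It draws rho-correlated Gaussian pairs, queries a black-box labeler on both points, and compares the empirical agreement rate with 1 - arccos(rho)/pi, which is exact for any origin-centered halfspace (Sheppard's formula).

```python
import math
import random

random.seed(0)

def correlated_pair(dim, rho):
    # Two standard Gaussian vectors whose coordinates are rho-correlated.
    x = [random.gauss(0, 1) for _ in range(dim)]
    y = [rho * xi + math.sqrt(1 - rho ** 2) * random.gauss(0, 1) for xi in x]
    return x, y

def agreement_rate(f, dim, rho, trials=20000):
    # Estimate Pr[f(x) == f(y)] over correlated pairs; high agreement is
    # evidence for a halfspace, low agreement is evidence against.
    agree = 0
    for _ in range(trials):
        x, y = correlated_pair(dim, rho)
        agree += (f(x) == f(y))
    return agree / trials

# A genuine halfspace through the origin (weights chosen arbitrarily).
w = [1.0, -2.0, 0.5]
def halfspace(p):
    return sum(wi * pi for wi, pi in zip(w, p)) >= 0

rho = 0.9
rate = agreement_rate(halfspace, 3, rho)
# Sheppard's formula: for an origin-centered halfspace the agreement
# probability is exactly 1 - arccos(rho) / pi.
expected = 1 - math.acos(rho) / math.pi
```

A labeler far from any halfspace (say, the parity of coordinate signs) would agree noticeably less often, which is what a tester of this kind exploits.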


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('triangle', 0.22), ('halfspaces', 0.203), ('conceptually', 0.192), ('inequality', 0.176), ('subtle', 0.145), ('approximation', 0.139), ('testing', 0.137), ('paper', 0.131), ('works', 0.128), ('servedio', 0.11), ('unreasonable', 0.11), ('martingales', 0.11), ('halfspace', 0.11), ('kevin', 0.11), ('martingale', 0.11), ('qualitatively', 0.11), ('relaxed', 0.11), ('sacrificing', 0.11), ('clarify', 0.102), ('rocco', 0.102), ('kale', 0.102), ('satyen', 0.102), ('relaxation', 0.102), ('combinatorial', 0.102), ('folks', 0.102), ('provable', 0.102), ('algorithms', 0.1), ('algorithm', 0.098), ('space', 0.097), ('blum', 0.096), ('notions', 0.096), ('neighbors', 0.096), ('correlated', 0.096), ('balcan', 0.096), ('imagine', 0.094), ('supposed', 0.091), ('datastructure', 0.091), ('np', 0.091), ('striking', 0.088), ('occurred', 0.088), ('effectiveness', 0.088), ('valid', 0.088), ('bears', 0.088), ('endemic', 0.088), ('bandits', 0.088), ('basic', 0.087), ('without', 0.086), ('respect', 0.086), ('speedups', 0.085), ('hardness', 0.085)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000004 334 hunch net-2009-01-07-Interesting Papers at SODA 2009


2 0.18511805 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”

Introduction: Given John’s recent posts on CMU’s new machine learning department and “Deep Learning,” I asked for an opportunity to give a computational learning theory perspective on these issues. To my mind, the answer to the question “Are the core problems from machine learning different from the core problems of statistics?” is a clear Yes. The point of this post is to describe a core problem in machine learning that is computational in nature and will appeal to statistical learning folk (as an extreme example note that if P=NP (which, for all we know, is true) then we would suddenly find almost all of our favorite machine learning problems considerably more tractable). If the central question of statistical learning theory were crudely summarized as “given a hypothesis with a certain loss bound over a test set, how well will it generalize?” then the central question of computational learning theory might be “how can we find such a hypothesis efficiently (e.g., in polynomial-time)?” With t

3 0.13576025 385 hunch net-2009-12-27-Interesting things at NIPS 2009

Introduction: Several papers at NIPS caught my attention. Elad Hazan and Satyen Kale, Online Submodular Optimization. They define an algorithm for online optimization of submodular functions with regret guarantees. This places submodular optimization roughly on par with online convex optimization as tractable settings for online learning. Elad Hazan and Satyen Kale, On Stochastic and Worst-Case Models of Investing. At its core, this is yet another example of modifying worst-case online learning to deal with variance, but the application to financial models is particularly cool and it seems plausibly superior to other common approaches for financial modeling. Mark Palatucci, Dean Pomerlau, Tom Mitchell, and Geoff Hinton, Zero Shot Learning with Semantic Output Codes. The goal here is predicting a label in a multiclass supervised setting where the label never occurs in the training data. They have some basic analysis and also a nice application to FMRI brain reading. Sh

4 0.12950262 304 hunch net-2008-06-27-Reviewing Horror Stories

Introduction: Essentially everyone who writes research papers suffers rejections. They always sting immediately, but upon further reflection many of these rejections come to seem reasonable. Maybe the equations had too many typos or maybe the topic just isn’t as important as was originally thought. A few rejections do not come to seem acceptable, and these form the basis of reviewing horror stories, great material for conversations. I’ve decided to share three of mine, now all safely a bit distant in the past. Prediction Theory for Classification Tutorial. This is a tutorial about tight sample complexity bounds for classification that I submitted to JMLR. The first decision I heard was a reject which appeared quite unjust to me—for example one of the reviewers appeared to claim that all the content was in standard statistics books. Upon further inquiry, several citations were given, none of which actually covered the content. Later, I was shocked to hear the paper was accepted. App

5 0.12711589 332 hunch net-2008-12-23-Use of Learning Theory

Introduction: I’ve had serious conversations with several people who believe that the theory in machine learning is “only useful for getting papers published”. That’s a compelling statement, as I’ve seen many papers where the algorithm clearly came first, and the theoretical justification for it came second, purely as a perceived means to improve the chance of publication. Naturally, I disagree and believe that learning theory has much more substantial applications. Even in core learning algorithm design, I’ve found learning theory to be useful, although its application is more subtle than many realize. The most straightforward applications can fail, because (as expectation suggests) worst case bounds tend to be loose in practice (*). In my experience, considering learning theory when designing an algorithm has two important effects in practice: It can help make your algorithm behave right at a crude level of analysis, leaving finer details to tuning or common sense. The best example

6 0.11929329 165 hunch net-2006-03-23-The Approximation Argument

7 0.11876392 188 hunch net-2006-06-30-ICML papers

8 0.10571131 343 hunch net-2009-02-18-Decision by Vetocracy

9 0.10415236 126 hunch net-2005-10-26-Fallback Analysis is a Secret to Useful Algorithms

10 0.10375197 47 hunch net-2005-03-28-Open Problems for Colt

11 0.10185426 454 hunch net-2012-01-30-ICML Posters and Scope

12 0.1001787 361 hunch net-2009-06-24-Interesting papers at UAICMOLT 2009

13 0.098647162 426 hunch net-2011-03-19-The Ideal Large Scale Learning Class

14 0.095094889 186 hunch net-2006-06-24-Online convex optimization at COLT

15 0.094818421 102 hunch net-2005-08-11-Why Manifold-Based Dimension Reduction Techniques?

16 0.0934828 279 hunch net-2007-12-19-Cool and interesting things seen at NIPS

17 0.093062527 23 hunch net-2005-02-19-Loss Functions for Discriminative Training of Energy-Based Models

18 0.092794403 244 hunch net-2007-05-09-The Missing Bound

19 0.092556469 206 hunch net-2006-09-09-How to solve an NP hard problem in quadratic time

20 0.092305556 448 hunch net-2011-10-24-2011 ML symposium and the bears


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.24), (1, 0.048), (2, 0.013), (3, 0.004), (4, 0.082), (5, -0.021), (6, -0.031), (7, -0.029), (8, -0.021), (9, -0.015), (10, 0.05), (11, 0.056), (12, -0.022), (13, -0.03), (14, 0.028), (15, 0.02), (16, 0.01), (17, 0.021), (18, 0.044), (19, 0.048), (20, -0.028), (21, 0.089), (22, -0.044), (23, 0.027), (24, 0.013), (25, -0.008), (26, 0.103), (27, 0.046), (28, 0.021), (29, -0.076), (30, -0.119), (31, -0.081), (32, 0.049), (33, 0.05), (34, -0.011), (35, -0.085), (36, -0.018), (37, 0.048), (38, -0.016), (39, 0.007), (40, 0.027), (41, 0.017), (42, 0.043), (43, 0.121), (44, 0.054), (45, 0.002), (46, 0.021), (47, 0.046), (48, -0.016), (49, -0.046)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96575284 334 hunch net-2009-01-07-Interesting Papers at SODA 2009


2 0.65930057 102 hunch net-2005-08-11-Why Manifold-Based Dimension Reduction Techniques?

Introduction: Manifold based dimension-reduction algorithms share the following general outline. Given: a metric d() and a set of points S. Construct a graph with a point in every node and every edge connecting to the node of one of the k-nearest neighbors. Associate with the edge a weight which is the distance between the points in the connected nodes. Digest the graph. This might include computing the shortest path between all points or figuring out how to linearly interpolate the point from its neighbors. Find a set of points in a low dimensional space which preserve the digested properties. Examples include LLE, Isomap (which I worked on), Hessian-LLE, SDE, and many others. The hope with these algorithms is that they can recover the low dimensional structure of point sets in high dimensional spaces. Many of them can be shown to work in interesting ways producing various compelling pictures. Despite doing some early work in this direction, I suffer from a motivational
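The outline above can be sketched directly. The snippet below is a minimal illustration of the graph-construction and shortest-path "digest" steps only (the final low-dimensional embedding step, e.g. classical MDS in Isomap, is omitted); the function names `knn_graph` and `geodesic_distances` are invented for the sketch.

```python
import math

def knn_graph(points, k, d):
    # Connect each point to its k nearest neighbors under the metric d,
    # weighting each edge by the distance, and symmetrize the graph.
    n = len(points)
    inf = float("inf")
    w = [[inf] * n for _ in range(n)]
    for i in range(n):
        w[i][i] = 0.0
        for dist, j in sorted((d(points[i], points[j]), j)
                              for j in range(n) if j != i)[:k]:
            w[i][j] = min(w[i][j], dist)
            w[j][i] = min(w[j][i], dist)
    return w

def geodesic_distances(w):
    # "Digest": Floyd-Warshall all-pairs shortest paths, which approximate
    # geodesic distance along the manifold (as in Isomap).
    n = len(w)
    g = [row[:] for row in w]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if g[i][m] + g[m][j] < g[i][j]:
                    g[i][j] = g[i][m] + g[m][j]
    return g

# Points along an arc of the unit circle: the graph distance between the
# two endpoints follows the arc, exceeding the straight-line chord.
euclid = lambda a, b: math.dist(a, b)
curve = [(math.cos(t), math.sin(t)) for t in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5)]
g = geodesic_distances(knn_graph(curve, 2, euclid))
```

The final step would embed the points so the pairwise distances in `g` are approximately preserved, e.g. by eigendecomposition of the doubly-centered squared-distance matrix.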

3 0.65509623 87 hunch net-2005-06-29-Not EM for clustering at COLT

Introduction: One standard approach for clustering data with a set of gaussians is using EM. Roughly speaking, you pick a set of k random gaussians and then use alternating expectation maximization to (hopefully) find a set of gaussians that “explain” the data well. This process is difficult to work with because EM can become “stuck” in local optima. There are various hacks like “rerun with t different random starting points”. One cool observation is that this can often be solved via other algorithms which do not suffer from local optima. This is an early paper which shows this. Ravi Kannan presented a new paper showing this is possible in a much more adaptive setting. A very rough summary of these papers is that by projecting into a lower dimensional space, it is computationally tractable to pick out the gross structure of the data. It is unclear how well these algorithms work in practice, but they might be effective, especially if used as a subroutine of the form: Projec
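The “rerun with t different random starting points” hack is easy to sketch on a toy one-dimensional mixture. The snippet below is an illustrative sketch under strong simplifying assumptions (unit variances, equal mixing weights, means only), not the algorithm from either paper discussed; `em_1d` and `em_with_restarts` are names invented for the sketch.

```python
import math
import random

def em_1d(data, k, steps=50, seed=0):
    # One EM run: fit the means of k unit-variance, equal-weight Gaussians
    # to 1-D data from a random start (this is where local optima bite).
    rng = random.Random(seed)
    means = [rng.choice(data) for _ in range(k)]
    for _ in range(steps):
        # E-step: soft-assign each point across the k components.
        resp = []
        for x in data:
            ws = [math.exp(-0.5 * (x - m) ** 2) for m in means]
            s = sum(ws)
            resp.append([v / s for v in ws])
        # M-step: each mean becomes a responsibility-weighted average.
        for j in range(k):
            den = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / den
    # Unnormalized log-likelihood, used to compare runs.
    ll = sum(math.log(sum(math.exp(-0.5 * (x - m) ** 2) for m in means))
             for x in data)
    return ll, sorted(means)

def em_with_restarts(data, k, restarts=10):
    # The restart hack: rerun from several random starts, keep the best.
    return max(em_1d(data, k, seed=s) for s in range(restarts))[1]

data = [-0.2, 0.0, 0.1, 9.9, 10.0, 10.2]
means = em_with_restarts(data, 2)
```

A run whose two means start collapsed onto the same point stays collapsed at the overall mean; the restart wrapper discards such runs because their likelihood is far lower.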

4 0.61707026 256 hunch net-2007-07-20-Motivation should be the Responsibility of the Reviewer

Introduction: The prevailing wisdom in machine learning seems to be that motivating a paper is the responsibility of the author. I think this is a harmful view—instead, it’s healthier for the community to regard this as the responsibility of the reviewer. There are lots of reasons to prefer a reviewer-responsibility approach. Authors are the most biased possible source of information about the motivation of the paper. Systems which rely upon very biased sources of information are inherently unreliable. Authors are highly variable in their ability and desire to express motivation for their work. This adds greatly to variance on acceptance of an idea, and it can systematically discriminate or accentuate careers. It’s great if you have a career accentuated by awesome wording choice, but wise decision making by reviewers is important for the field. The motivation section in a paper doesn’t do anything in some sense—it’s there to get the paper in. Reading the motivation of a paper is of

5 0.57963789 185 hunch net-2006-06-16-Regularization = Robustness

Introduction: The Gibbs-Jaynes theorem is a classical result that tells us that the highest entropy distribution (most uncertain, least committed, etc.) subject to expectation constraints on a set of features is an exponential family distribution with the features as sufficient statistics. In math, argmax_p H(p) s.t. E_p[f_i] = c_i is given by e^{\sum \lambda_i f_i}/Z. (Z here is the necessary normalization constant, and the lambdas are free parameters we set to meet the expectation constraints). A great deal of statistical mechanics flows from this result, and it has proven very fruitful in learning as well. (Motivating work in models in text learning and Conditional Random Fields, for instance.) The result has been demonstrated a number of ways. One of the most elegant is the “geometric” version here. In the case when the expectation constraints come from data, this tells us that the maximum entropy distribution is exactly the maximum likelihood distribution in the expone
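For reference, the quoted formula follows from a standard Lagrange-multiplier calculation (textbook material, not taken from the post itself):

```latex
\begin{aligned}
&\max_p \; H(p) = -\sum_x p(x)\log p(x)
\quad \text{s.t.} \quad \sum_x p(x) f_i(x) = c_i, \;\; \sum_x p(x) = 1, \\[4pt]
&\mathcal{L} = -\sum_x p(x)\log p(x)
  + \sum_i \lambda_i \Big( \sum_x p(x) f_i(x) - c_i \Big)
  + \mu \Big( \sum_x p(x) - 1 \Big), \\[4pt]
&\frac{\partial \mathcal{L}}{\partial p(x)}
  = -\log p(x) - 1 + \sum_i \lambda_i f_i(x) + \mu = 0
\;\Longrightarrow\;
p(x) = \frac{e^{\sum_i \lambda_i f_i(x)}}{Z},
\qquad Z = \sum_x e^{\sum_i \lambda_i f_i(x)}.
\end{aligned}
```

The normalizer Z absorbs the multiplier \mu, recovering the e^{\sum \lambda_i f_i}/Z form quoted above.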

6 0.5791108 411 hunch net-2010-09-21-Regretting the dead

7 0.5784108 188 hunch net-2006-06-30-ICML papers

8 0.57220608 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design

9 0.55915338 206 hunch net-2006-09-09-How to solve an NP hard problem in quadratic time

10 0.55337232 361 hunch net-2009-06-24-Interesting papers at UAICMOLT 2009

11 0.5526849 439 hunch net-2011-08-01-Interesting papers at COLT 2011

12 0.55207968 348 hunch net-2009-04-02-Asymmophobia

13 0.55171287 126 hunch net-2005-10-26-Fallback Analysis is a Secret to Useful Algorithms

14 0.54922986 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”

15 0.54744256 420 hunch net-2010-12-26-NIPS 2010

16 0.54537946 368 hunch net-2009-08-26-Another 10-year paper in Machine Learning

17 0.54494655 385 hunch net-2009-12-27-Interesting things at NIPS 2009

18 0.54127955 138 hunch net-2005-12-09-Some NIPS papers

19 0.5303666 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models

20 0.52972281 440 hunch net-2011-08-06-Interesting thing at UAI 2011


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(10, 0.027), (27, 0.206), (38, 0.05), (51, 0.37), (53, 0.057), (55, 0.107), (77, 0.033), (94, 0.054), (95, 0.024)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.93501592 489 hunch net-2013-09-20-No NY ML Symposium in 2013, and some good news

Introduction: There will be no New York ML Symposium this year. The core issue is that NYAS is disorganized by people leaving, pushing back the date, with the current candidate a spring symposium on March 28. Gunnar and I were outvoted here—we were gung ho on organizing a fall symposium, but the rest of the committee wants to wait. In some good news, most of the ICML 2012 videos have been restored from a deep backup.

2 0.91622442 324 hunch net-2008-11-09-A Healthy COLT

Introduction: A while ago, we discussed the health of COLT. COLT 2008 substantially addressed my concerns. The papers were diverse and several were interesting. Attendance was up, which is particularly notable in Europe. In my opinion, the colocation with UAI and ICML was the best colocation since 1998. And, perhaps best of all, registration ended up being free for all students due to various grants from the Academy of Finland, Google, IBM, and Yahoo. A basic question is: what went right? There seem to be several answers. Cost-wise, COLT had sufficient grants to alleviate the high cost of the Euro and location at a university substantially reduces the cost compared to a hotel. Organization-wise, the Finns were great with hordes of volunteers helping set everything up. Having too many volunteers is a good failure mode. Organization-wise, it was clear that all 3 program chairs were cooperating in designing the program. Facilities-wise, proximity in time and space made

same-blog 3 0.85516596 334 hunch net-2009-01-07-Interesting Papers at SODA 2009


4 0.85243571 393 hunch net-2010-04-14-MLcomp: a website for objectively comparing ML algorithms

Introduction: Much of the success and popularity of machine learning has been driven by its practical impact. Of course, the evaluation of empirical work is an integral part of the field. But are the existing mechanisms for evaluating algorithms and comparing results good enough? We (Percy and Jake) believe there are currently a number of shortcomings: Incomplete Disclosure: You read a paper that proposes Algorithm A which is shown to outperform SVMs on two datasets.  Great.  But what about on other datasets?  How sensitive is this result?   What about compute time – does the algorithm take two seconds on a laptop or two weeks on a 100-node cluster? Lack of Standardization: Algorithm A beats Algorithm B on one version of a dataset.  Algorithm B beats Algorithm A on another version yet uses slightly different preprocessing.  Though doing a head-on comparison would be ideal, it would be tedious since the programs probably use different dataset formats and have a large array of options

5 0.80698752 179 hunch net-2006-05-16-The value of the orthodox view of Boosting

Introduction: The term “boosting” comes from the idea of using a meta-algorithm which takes “weak” learners (that may be able to only barely predict slightly better than random) and turns them into strongly capable learners (which predict very well). Adaboost in 1995 was the first widely used (and useful) boosting algorithm, although there were theoretical boosting algorithms floating around since 1990 (see the bottom of this page). Since then, many different interpretations of why boosting works have arisen. There is significant discussion about these different views in the annals of statistics, including a response by Yoav Freund and Robert Schapire. I believe there is a great deal of value to be found in the original view of boosting (meta-algorithm for creating a strong learner from a weak learner). This is not a claim that one particular viewpoint obviates the value of all others, but rather that no other viewpoint seems to really capture important properties. Comparing wit

6 0.77007723 300 hunch net-2008-04-30-Concerns about the Large Scale Learning Challenge

7 0.65873593 235 hunch net-2007-03-03-All Models of Learning have Flaws

8 0.61261743 281 hunch net-2007-12-21-Vowpal Wabbit Code Release

9 0.59945208 426 hunch net-2011-03-19-The Ideal Large Scale Learning Class

10 0.5852412 309 hunch net-2008-07-10-Interesting papers, ICML 2008

11 0.57907885 406 hunch net-2010-08-22-KDD 2010

12 0.57556963 204 hunch net-2006-08-28-Learning Theory standards for NIPS 2006

13 0.57521158 439 hunch net-2011-08-01-Interesting papers at COLT 2011

14 0.57406586 416 hunch net-2010-10-29-To Vidoelecture or not

15 0.57164139 403 hunch net-2010-07-18-ICML & COLT 2010

16 0.5694533 132 hunch net-2005-11-26-The Design of an Optimal Research Environment

17 0.5675227 259 hunch net-2007-08-19-Choice of Metrics

18 0.56582677 258 hunch net-2007-08-12-Exponentiated Gradient

19 0.5647198 454 hunch net-2012-01-30-ICML Posters and Scope

20 0.56470442 286 hunch net-2008-01-25-Turing’s Club for Machine Learning