hunch_net hunch_net-2011 hunch_net-2011-444 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: At KDD I enjoyed Stephen Boyd’s invited talk about optimization quite a bit. However, the most interesting talk for me was David Haussler’s. His talk started out with a formidable load of biological complexity. About half-way through you start wondering, “can this be used to help with cancer?” And at the end he connects it directly to use with a call to arms for the audience: cure cancer. The core thesis here is that cancer is a complex set of diseases which can be disentangled via genetic assays, allowing attacks on the specific signature of individual cancers. However, the data quantity and complex dependencies within the data require systematic and relatively automatic prediction and analysis algorithms of the kind that we are best familiar with. Some of the papers which interested me are: Kai-Wei Chang and Dan Roth, Selective Block Minimization for Faster Convergence of Limited Memory Large-Scale Linear Models, which is about effectively using a hard-example cache to speed up learning.
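To make the hard-example cache idea concrete, here is a minimal sketch of a streaming linear learner that keeps a bounded cache of margin-violating examples and periodically re-optimizes on them. This is only an illustration of the general idea, not the Selective Block Minimization algorithm from the paper; the hinge-loss update, cache size, and sweep schedule are all made-up choices.

import numpy as np

def train_with_hard_example_cache(stream, dim, cache_size=1000, sweep_every=5000, lr=0.1):
    # Toy linear learner (hinge loss) that keeps a bounded cache of
    # margin-violating ("hard") examples and periodically re-optimizes on the
    # cache while making only a single pass over the full data stream.
    # A sketch of the general idea only, not Selective Block Minimization.
    w = np.zeros(dim)
    cache = []                        # (x, y) pairs with y in {-1, +1}
    for t, (x, y) in enumerate(stream, 1):
        if y * w.dot(x) < 1.0:        # margin violation => hard example
            w += lr * y * x
            cache.append((x, y))
            if len(cache) > cache_size:
                cache.pop(0)          # keep the cache bounded
        if t % sweep_every == 0:      # cheap extra passes over cached examples
            for xc, yc in cache:
                if yc * w.dot(xc) < 1.0:
                    w += lr * yc * xc
    return w

The point is that extra optimization over already-loaded hard examples is nearly free compared to another pass over a dataset that does not fit in memory, which is where the speedup comes from.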
sentIndex sentText sentNum sentScore
1 At KDD I enjoyed Stephen Boyd’s invited talk about optimization quite a bit. [sent-1, score-0.133]
2 However, the most interesting talk for me was David Haussler’s. [sent-2, score-0.133]
3 His talk started out with a formidable load of biological complexity. [sent-3, score-0.133]
4 The core thesis here is that cancer is a complex set of diseases which can be disentangled via genetic assays, allowing attacks on the specific signature of individual cancers. [sent-6, score-0.335]
5 However, the data quantity and complex dependencies within the data require systematic and relatively automatic prediction and analysis algorithms of the kind that we are best familiar with. [sent-7, score-0.67]
6 Some of the papers which interested me are: Kai-Wei Chang and Dan Roth, Selective Block Minimization for Faster Convergence of Limited Memory Large-Scale Linear Models, which is about effectively using a hard-example cache to speed up learning. [sent-8, score-0.227]
7 The approach here uses a combination of random projection and partitioning, which appears compelling for some nonlinear and relatively computation-heavy settings. [sent-11, score-0.281]
8 The paper explores an interesting idea: having lots of weight vectors (effectively infinitely many) associated with a particular label, showing that algorithms on this representation can handle large amounts of data much like linear predictors, but with better-than-linear predictive performance. [sent-14, score-0.556]
9 The authors don’t use the hashing trick, but their representation is begging for it (a rough sketch of feature hashing appears after this sentence list). [sent-15, score-0.236]
10 This is about email spam filtering, where the authors use a theory of adversarial equilibria to construct a more robust filter, at least in some cases. [sent-17, score-0.502]
11 Demonstrating this on noninteractive data is inherently difficult. [sent-18, score-0.195]
12 This is about helping people organize their email with machine learning. [sent-25, score-0.136]
13 I also attended MUCMD, a workshop on the Meaningful Use of Complex Medical Data, shortly afterwards. [sent-29, score-0.089]
14 This workshop is about the emergent area of using data to improve medicine. [sent-30, score-0.284]
15 The combination of electronic health records, the economic importance of getting medicine right, and the relatively weak use of existing data implies there is much good work to do. [sent-31, score-0.449]
16 This finally gave us a chance to discuss radically superior medical trial designs based on work in exploration and learning. Jeff Hammerbacher’s talk was a hilariously blunt and well-stated monologue about the need to gather data in a usable way, and how to do it. [sent-32, score-1.028]
17 Amongst the talks on using medical data, Suchi Saria’s seemed the most mature. [sent-33, score-0.283]
18 They’ve constructed a noninvasive test for problem infants which is radically superior to the existing Apgar score according to leave-one-out cross-validation. [sent-34, score-0.2]
19 From the doctor’s side, there was discussion of the deep balkanization of data systems within hospitals, efforts to overcome that, and the (un)trustworthiness of data. [sent-35, score-0.272]
20 Overall, the workshop went well, with the broad cross-section of talks providing quite a bit of extra context you don’t normally see. [sent-37, score-0.171]
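Sentence 9 above mentions the hashing trick. For readers unfamiliar with it, here is a rough sketch of feature hashing under illustrative assumptions (string-named features, a fixed table size, and Python's builtin hash); it is not the representation used in the paper being discussed.

import numpy as np

def hash_features(features, num_buckets=2 ** 18):
    # Map (name, value) feature pairs into a fixed-size vector by hashing the
    # name, so no feature dictionary ever needs to be built or stored.
    # Python's builtin hash() is per-process; a fixed hash such as murmurhash
    # would normally be used for reproducibility across runs.
    x = np.zeros(num_buckets)
    for name, value in features:
        h = hash(name)
        idx = h % num_buckets
        sign = 1.0 if (h >> 1) % 2 == 0 else -1.0  # sign bit keeps collisions unbiased in expectation
        x[idx] += sign * value
    return x

# Two different raw feature sets land in the same fixed-size space, so a
# linear learner over hashed features scales to arbitrary vocabularies.
x1 = hash_features([("word=cancer", 1.0), ("word=genome", 1.0)])
x2 = hash_features([("word=spam", 1.0), ("user=123", 1.0)])

Collisions add a little noise, but with enough buckets a linear learner is essentially unaffected, which is presumably why the representation above is begging for it.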
wordName wordTfidf (topN-words)
[('medical', 0.201), ('data', 0.195), ('maarek', 0.172), ('mucmd', 0.172), ('yoelle', 0.172), ('koren', 0.153), ('cancer', 0.153), ('adversarial', 0.138), ('email', 0.136), ('talk', 0.133), ('yehuda', 0.128), ('complex', 0.111), ('nonlinear', 0.108), ('lots', 0.103), ('superior', 0.101), ('radically', 0.099), ('relatively', 0.092), ('workshop', 0.089), ('michael', 0.088), ('talks', 0.082), ('use', 0.081), ('combination', 0.081), ('effectively', 0.08), ('representation', 0.079), ('within', 0.077), ('composing', 0.076), ('wilkinson', 0.076), ('cache', 0.076), ('wang', 0.076), ('iterated', 0.076), ('roth', 0.076), ('blunt', 0.076), ('crammer', 0.076), ('detecting', 0.076), ('koby', 0.076), ('designs', 0.076), ('demonstrating', 0.076), ('monologue', 0.076), ('explores', 0.076), ('doctor', 0.076), ('chang', 0.076), ('gideon', 0.076), ('jeff', 0.076), ('believing', 0.076), ('authors', 0.076), ('creating', 0.073), ('usable', 0.071), ('speedup', 0.071), ('equilibria', 0.071), ('attacking', 0.071)]
simIndex simValue blogId blogTitle
same-blog 1 1.000001 444 hunch net-2011-09-07-KDD and MUCMD 2011
Introduction: At KDD I enjoyed Stephen Boyd ‘s invited talk about optimization quite a bit. However, the most interesting talk for me was David Haussler ‘s. His talk started out with a formidable load of biological complexity. About half-way through you start wondering, “can this be used to help with cancer?” And at the end he connects it directly to use with a call to arms for the audience: cure cancer. The core thesis here is that cancer is a complex set of diseases which can be disentangled via genetic assays, allowing attacks on the specific signature of individual cancers. However, the data quantity and complex dependencies within the data require systematic and relatively automatic prediction and analysis algorithms of the kind that we are best familiar with. Some of the papers which interested me are: Kai-Wei Chang and Dan Roth , Selective Block Minimization for Faster Convergence of Limited Memory Large-Scale Linear Models , which is about effectively using a hard-example
2 0.13643788 454 hunch net-2012-01-30-ICML Posters and Scope
Introduction: Normally, I don’t indulge in posters for ICML , but this year is naturally an exception for me. If you want one, there are a small number left here , if you sign up before February. It also seems worthwhile to give some sense of the scope and reviewing criteria for ICML for authors considering submitting papers. At ICML, the (very large) program committee does the reviewing which informs final decisions by area chairs on most papers. Program chairs setup the process, deal with exceptions or disagreements, and provide advice for the reviewing process. Providing advice is tricky (and easily misleading) because a conference is a community, and in the end the aggregate interests of the community determine the conference. Nevertheless, as a program chair this year it seems worthwhile to state the overall philosophy I have and what I plan to encourage (and occasionally discourage). At the highest level, I believe ICML exists to further research into machine learning, which I gene
3 0.13440242 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms
Introduction: Muthu invited me to the workshop on algorithms in the field , with the goal of providing a sense of where near-term research should go. When the time came though, I bargained for a post instead, which provides a chance for many other people to comment. There are several things I didn’t fully understand when I went to Yahoo! about 5 years ago. I’d like to repeat them as people in academia may not yet understand them intuitively. Almost all the big impact algorithms operate in pseudo-linear or better time. Think about caching, hashing, sorting, filtering, etc… and you have a sense of what some of the most heavily used algorithms are. This matters quite a bit to Machine Learning research, because people often work with superlinear time algorithms and languages. Two very common examples of this are graphical models, where inference is often a superlinear operation—think about the n 2 dependence on the number of states in a Hidden Markov Model and Kernelized Support Vecto
4 0.12458973 337 hunch net-2009-01-21-Nearly all natural problems require nonlinearity
Introduction: One conventional wisdom is that learning algorithms with linear representations are sufficient to solve natural learning problems. This conventional wisdom appears unsupported by empirical evidence as far as I can tell. In nearly all vision, language, robotics, and speech applications I know where machine learning is effectively applied, the approach involves either a linear representation on hand crafted features capturing substantial nonlinearities or learning directly on nonlinear representations. There are a few exceptions to this—for example, if the problem of interest to you is predicting the next word given previous words, n-gram methods have been shown effective. Viewed the right way, n-gram methods are essentially linear predictors on an enormous sparse feature space, learned from an enormous number of examples. Hal’s post here describes some of this in more detail. In contrast, if you go to a machine learning conference, a large number of the new algorithms are v
5 0.11366428 248 hunch net-2007-06-19-How is Compressed Sensing going to change Machine Learning ?
Introduction: Compressed Sensing (CS) is a new framework developed by Emmanuel Candes , Terry Tao and David Donoho . To summarize, if you acquire a signal in some basis that is incoherent with the basis in which you know the signal to be sparse in, it is very likely you will be able to reconstruct the signal from these incoherent projections. Terry Tao, the recent Fields medalist , does a very nice job at explaining the framework here . He goes further in the theory description in this post where he mentions the central issue of the Uniform Uncertainty Principle. It so happens that random projections are on average incoherent, within the UUP meaning, with most known basis (sines, polynomials, splines, wavelets, curvelets …) and are therefore an ideal basis for Compressed Sensing. [ For more in-depth information on the subject, the Rice group has done a very good job at providing a central library of papers relevant to the growing subject: http://www.dsp.ece.rice.edu/cs/ ] The Machine
6 0.11225678 470 hunch net-2012-07-17-MUCMD and BayLearn
7 0.1120394 156 hunch net-2006-02-11-Yahoo’s Learning Problems.
8 0.11030155 406 hunch net-2010-08-22-KDD 2010
9 0.11018376 277 hunch net-2007-12-12-Workshop Summary—Principles of Learning Problem Design
10 0.10870866 403 hunch net-2010-07-18-ICML & COLT 2010
11 0.10523346 420 hunch net-2010-12-26-NIPS 2010
12 0.10232482 26 hunch net-2005-02-21-Problem: Cross Validation
13 0.099457666 332 hunch net-2008-12-23-Use of Learning Theory
14 0.097831063 265 hunch net-2007-10-14-NIPS workshp: Learning Problem Design
15 0.09691859 426 hunch net-2011-03-19-The Ideal Large Scale Learning Class
16 0.096415795 385 hunch net-2009-12-27-Interesting things at NIPS 2009
17 0.093604766 364 hunch net-2009-07-11-Interesting papers at KDD
18 0.091831796 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models
19 0.091537602 127 hunch net-2005-11-02-Progress in Active Learning
20 0.091350958 437 hunch net-2011-07-10-ICML 2011 and the future
topicId topicWeight
[(0, 0.251), (1, 0.0), (2, -0.067), (3, -0.041), (4, 0.095), (5, 0.056), (6, -0.024), (7, -0.032), (8, 0.005), (9, -0.101), (10, -0.053), (11, 0.04), (12, -0.078), (13, 0.053), (14, -0.065), (15, -0.02), (16, -0.091), (17, 0.088), (18, -0.029), (19, -0.021), (20, -0.001), (21, -0.095), (22, 0.025), (23, 0.028), (24, 0.043), (25, -0.027), (26, 0.012), (27, -0.007), (28, 0.044), (29, 0.001), (30, -0.063), (31, -0.05), (32, -0.016), (33, 0.083), (34, 0.022), (35, 0.097), (36, -0.006), (37, -0.004), (38, -0.026), (39, -0.071), (40, 0.06), (41, 0.106), (42, 0.045), (43, -0.094), (44, 0.011), (45, -0.035), (46, -0.062), (47, 0.036), (48, -0.091), (49, -0.046)]
simIndex simValue blogId blogTitle
same-blog 1 0.96836001 444 hunch net-2011-09-07-KDD and MUCMD 2011
Introduction: At KDD I enjoyed Stephen Boyd ‘s invited talk about optimization quite a bit. However, the most interesting talk for me was David Haussler ‘s. His talk started out with a formidable load of biological complexity. About half-way through you start wondering, “can this be used to help with cancer?” And at the end he connects it directly to use with a call to arms for the audience: cure cancer. The core thesis here is that cancer is a complex set of diseases which can be disentangled via genetic assays, allowing attacks on the specific signature of individual cancers. However, the data quantity and complex dependencies within the data require systematic and relatively automatic prediction and analysis algorithms of the kind that we are best familiar with. Some of the papers which interested me are: Kai-Wei Chang and Dan Roth , Selective Block Minimization for Faster Convergence of Limited Memory Large-Scale Linear Models , which is about effectively using a hard-example
2 0.70737541 277 hunch net-2007-12-12-Workshop Summary—Principles of Learning Problem Design
Introduction: This is a summary of the workshop on Learning Problem Design which Alina and I ran at NIPS this year. The first question many people have is “What is learning problem design?” This workshop is about admitting that solving learning problems does not start with labeled data, but rather somewhere before. When humans are hired to produce labels, this is usually not a serious problem because you can tell them precisely what semantics you want the labels to have, and we can fix some set of features in advance. However, when other methods are used this becomes more problematic. This focus is important for Machine Learning because there are very large quantities of data which are not labeled by a hired human. The title of the workshop was a bit ambitious, because a workshop is not long enough to synthesize a diversity of approaches into a coherent set of principles. For me, the posters at the end of the workshop were quite helpful in getting approaches to gel. Here are some an
3 0.58340061 144 hunch net-2005-12-28-Yet more nips thoughts
Introduction: I only managed to make it out to the NIPS workshops this year so I’ll give my comments on what I saw there. The Learing and Robotics workshops lives again. I hope it continues and gets more high quality papers in the future. The most interesting talk for me was Larry Jackel’s on the LAGR program (see John’s previous post on said program). I got some ideas as to what progress has been made. Larry really explained the types of benchmarks and the tradeoffs that had to be made to make the goals achievable but challenging. Hal Daume gave a very interesting talk about structured prediction using RL techniques, something near and dear to my own heart. He achieved rather impressive results using only a very greedy search. The non-parametric Bayes workshop was great. I enjoyed the entire morning session I spent there, and particularly (the usually desultory) discussion periods. One interesting topic was the Gibbs/Variational inference divide. I won’t try to summarize espe
4 0.57438338 337 hunch net-2009-01-21-Nearly all natural problems require nonlinearity
Introduction: One conventional wisdom is that learning algorithms with linear representations are sufficient to solve natural learning problems. This conventional wisdom appears unsupported by empirical evidence as far as I can tell. In nearly all vision, language, robotics, and speech applications I know where machine learning is effectively applied, the approach involves either a linear representation on hand crafted features capturing substantial nonlinearities or learning directly on nonlinear representations. There are a few exceptions to this—for example, if the problem of interest to you is predicting the next word given previous words, n-gram methods have been shown effective. Viewed the right way, n-gram methods are essentially linear predictors on an enormous sparse feature space, learned from an enormous number of examples. Hal’s post here describes some of this in more detail. In contrast, if you go to a machine learning conference, a large number of the new algorithms are v
5 0.57312357 420 hunch net-2010-12-26-NIPS 2010
Introduction: I enjoyed attending NIPS this year, with several things interesting me. For the conference itself: Peter Welinder , Steve Branson , Serge Belongie , and Pietro Perona , The Multidimensional Wisdom of Crowds . This paper is about using mechanical turk to get label information, with results superior to a majority vote approach. David McAllester , Tamir Hazan , and Joseph Keshet Direct Loss Minimization for Structured Prediction . This is about another technique for directly optimizing the loss in structured prediction, with an application to speech recognition. Mohammad Saberian and Nuno Vasconcelos Boosting Classifier Cascades . This is about an algorithm for simultaneously optimizing loss and computation in a classifier cascade construction. There were several other papers on cascades which are worth looking at if interested. Alan Fern and Prasad Tadepalli , A Computational Decision Theory for Interactive Assistants . This paper carves out some
6 0.57134515 260 hunch net-2007-08-25-The Privacy Problem
7 0.56373757 298 hunch net-2008-04-26-Eliminating the Birthday Paradox for Universal Features
8 0.55720419 248 hunch net-2007-06-19-How is Compressed Sensing going to change Machine Learning ?
9 0.55309492 406 hunch net-2010-08-22-KDD 2010
10 0.53928411 408 hunch net-2010-08-24-Alex Smola starts a blog
11 0.53171253 61 hunch net-2005-04-25-Embeddings: what are they good for?
12 0.52929842 475 hunch net-2012-10-26-ML Symposium and Strata-Hadoop World
13 0.52786273 102 hunch net-2005-08-11-Why Manifold-Based Dimension Reduction Techniques?
14 0.52251577 455 hunch net-2012-02-20-Berkeley Streaming Data Workshop
15 0.5206694 77 hunch net-2005-05-29-Maximum Margin Mismatch?
16 0.50994778 319 hunch net-2008-10-01-NIPS 2008 workshop on ‘Learning over Empirical Hypothesis Spaces’
17 0.50870442 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning
18 0.5080176 143 hunch net-2005-12-27-Automated Labeling
19 0.50382531 87 hunch net-2005-06-29-Not EM for clustering at COLT
20 0.49764979 398 hunch net-2010-05-10-Aggregation of estimators, sparsity in high dimension and computational feasibility
topicId topicWeight
[(3, 0.011), (10, 0.028), (27, 0.119), (30, 0.288), (33, 0.013), (38, 0.075), (48, 0.023), (49, 0.027), (53, 0.091), (55, 0.096), (56, 0.016), (64, 0.017), (68, 0.017), (77, 0.013), (94, 0.066), (95, 0.027)]
simIndex simValue blogId blogTitle
1 0.91503841 189 hunch net-2006-07-05-more icml papers
Introduction: Here are a few other papers I enjoyed from ICML06. Topic Models: Dynamic Topic Models David Blei, John Lafferty A nice model for how topics in LDA type models can evolve over time, using a linear dynamical system on the natural parameters and a very clever structured variational approximation (in which the mean field parameters are pseudo-observations of a virtual LDS). Like all Blei papers, he makes it look easy, but it is extremely impressive. Pachinko Allocation Wei Li, Andrew McCallum A very elegant (but computationally challenging) model which induces correlation amongst topics using a multi-level DAG whose interior nodes are “super-topics” and “sub-topics” and whose leaves are the vocabulary words. Makes the slumbering monster of structure learning stir. Sequence Analysis (I missed these talks since I was chairing another session) Online Decoding of Markov Models with Latency Constraints Mukund Narasimhan, Paul Viola, Michael Shilman An “a
2 0.88759106 364 hunch net-2009-07-11-Interesting papers at KDD
Introduction: I attended KDD this year. The conference has always had a strong grounding in what works based on the KDDcup , but it has developed a halo of workshops on various subjects. It seems that KDD has become a place where the economy meets machine learning in a stronger sense than many other conferences. There were several papers that other people might like to take a look at. Yehuda Koren Collaborative Filtering with Temporal Dynamics . This paper describes how to incorporate temporal dynamics into a couple of collaborative filtering approaches. This was also a best paper award. D. Sculley , Robert Malkin, Sugato Basu , Roberto J. Bayardo , Predicting Bounce Rates in Sponsored Search Advertisements . The basic claim of this paper is that the probability people immediately leave (“bounce”) after clicking on an advertisement is predictable. Frank McSherry and Ilya Mironov Differentially Private Recommender Systems: Building Privacy into the Netflix Prize Contende
3 0.86247659 154 hunch net-2006-02-04-Research Budget Changes
Introduction: The announcement of an increase in funding for basic research in the US is encouraging. There is some discussion of this at the Computing Research Policy blog. One part of this discussion has a graph of NSF funding over time, presumably in dollar budgets. I don’t believe that dollar budgets are the right way to judge the impact of funding changes on researchers. A better way to judge seems to be in terms of dollar budget divided by GDP which provides a measure of the relative emphasis on research. This graph was assembled by dividing the NSF budget by the US GDP . For 2005 GDP, I used the current estimate and for 2006 and 2007 assumed an increase by a factor of 1.04 per year. The 2007 number also uses the requested 2007 budget which is certain to change. This graph makes it clear why researchers were upset: research funding emphasis has fallen for 3 years in a row. The reality has been significantly more severe due to DARPA decreasing funding and industrial
same-blog 4 0.86114162 444 hunch net-2011-09-07-KDD and MUCMD 2011
Introduction: At KDD I enjoyed Stephen Boyd ‘s invited talk about optimization quite a bit. However, the most interesting talk for me was David Haussler ‘s. His talk started out with a formidable load of biological complexity. About half-way through you start wondering, “can this be used to help with cancer?” And at the end he connects it directly to use with a call to arms for the audience: cure cancer. The core thesis here is that cancer is a complex set of diseases which can be disentangled via genetic assays, allowing attacks on the specific signature of individual cancers. However, the data quantity and complex dependencies within the data require systematic and relatively automatic prediction and analysis algorithms of the kind that we are best familiar with. Some of the papers which interested me are: Kai-Wei Chang and Dan Roth , Selective Block Minimization for Faster Convergence of Limited Memory Large-Scale Linear Models , which is about effectively using a hard-example
5 0.82262415 455 hunch net-2012-02-20-Berkeley Streaming Data Workshop
Introduction: The From Data to Knowledge workshop May 7-11 at Berkeley should be of interest to the many people encountering streaming data in different disciplines. It’s run by a group of astronomers who encounter streaming data all the time. I met Josh Bloom recently and he is broadly interested in a workshop covering all aspects of Machine Learning on streaming data. The hope here is that techniques developed in one area turn out useful in another which seems quite plausible. Particularly if you are in the bay area, consider checking it out.
6 0.81724346 292 hunch net-2008-03-15-COLT Open Problems
7 0.73698086 85 hunch net-2005-06-28-A COLT paper
8 0.62319034 132 hunch net-2005-11-26-The Design of an Optimal Research Environment
9 0.5601626 297 hunch net-2008-04-22-Taking the next step
10 0.54683864 466 hunch net-2012-06-05-ICML acceptance statistics
11 0.54598379 77 hunch net-2005-05-29-Maximum Margin Mismatch?
12 0.54572231 140 hunch net-2005-12-14-More NIPS Papers II
13 0.54374385 437 hunch net-2011-07-10-ICML 2011 and the future
14 0.53739369 454 hunch net-2012-01-30-ICML Posters and Scope
15 0.53732985 204 hunch net-2006-08-28-Learning Theory standards for NIPS 2006
16 0.53526068 438 hunch net-2011-07-11-Interesting Neural Network Papers at ICML 2011
17 0.53469092 151 hunch net-2006-01-25-1 year
18 0.53295583 423 hunch net-2011-02-02-User preferences for search engines
19 0.53261572 141 hunch net-2005-12-17-Workshops as Franchise Conferences
20 0.5322395 19 hunch net-2005-02-14-Clever Methods of Overfitting