hunch_net hunch_net-2010 hunch_net-2010-406 knowledge-graph by maker-knowledge-mining

406 hunch net-2010-08-22-KDD 2010


meta info for this blog

Source: html

Introduction: There were several papers that seemed fairly interesting at KDD this year. The ones that caught my attention are: Xin Jin, Mingyang Zhang, Nan Zhang, and Gautam Das, Versatile Publishing For Privacy Preservation. This paper provides a conservative method for safely determining which data is publishable from any complete source of information (for example, a hospital) such that it does not violate privacy rules specified in natural language. It is not differentially private, so no external sources of join information can exist. However, it is a mechanism for publishing data rather than (say) the output of a learning algorithm. Arik Friedman and Assaf Schuster, Data Mining with Differential Privacy. This paper shows how to create effective differentially private decision trees. Progress in differentially private datamining is pretty impressive, as it was defined in 2006. David Chan, Rong Ge, Ori Gershony, Tim Hesterberg, Diane Lambert, Evaluating Online Ad Campaigns in a Pipeline: Causal Models At Scale
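Two of the papers above concern differential privacy. The core trick behind it is adding noise calibrated to a query's sensitivity; below is a minimal sketch of the Laplace mechanism for a counting query. The epsilon value and toy data are illustrative assumptions, not drawn from either paper.

```python
import math
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) noise via inverse-CDF sampling."""
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def private_count(records, predicate, epsilon):
    """epsilon-differentially private count.

    A counting query changes by at most 1 when a single record is added
    or removed (sensitivity 1), so Laplace(1/epsilon) noise suffices."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Toy example: count hospital records with age over 60.
ages = [34, 71, 45, 68, 90, 23]
noisy = private_count(ages, lambda a: a > 60, epsilon=0.5)
```

Smaller epsilon means more noise and stronger privacy; building a whole decision tree privately, as in the Friedman and Schuster paper, roughly amounts to spending a limited epsilon budget well across the many queries a tree construction makes.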


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 There were several papers that seemed fairly interesting at KDD this year. [sent-1, score-0.176]

2 This paper provides a conservative method for safely determining which data is publishable from any complete source of information (for example, a hospital) such that it does not violate privacy rules specified in natural language. [sent-3, score-0.485]

3 It is not differentially private, so no external sources of join information can exist. [sent-4, score-0.274]

4 However, it is a mechanism for publishing data rather than (say) the output of a learning algorithm. [sent-5, score-0.26]

5 This paper shows how to create effective differentially private decision trees. [sent-7, score-0.782]

6 Progress in differentially private datamining is pretty impressive, as it was defined in 2006. [sent-8, score-0.683]

7 David Chan, Rong Ge, Ori Gershony, Tim Hesterberg, Diane Lambert, Evaluating Online Ad Campaigns in a Pipeline: Causal Models At Scale. This paper is about automated estimation of ad campaign effectiveness. [sent-9, score-0.408]

8 The double robust estimation technique seems intuitively appealing and plausibly greatly enhances effectiveness. [sent-10, score-0.206]

9 Optimizing Debt Collections Using Constrained Reinforcement Learning This is an application paper about optimizing the New York State income tax collection agency. [sent-12, score-0.416]

10 As you might expect, there are several kludgy aspects due to working within legal and organizational constraints. [sent-13, score-0.181]

11 Too bad I live in NY. Vikas C Raykar, Balaji Krishnapuram, and Shipeng Yu, Designing Efficient Cascaded Classifiers: Tradeoff between Accuracy and Cost. This paper is about a continuization-based solution to designing a cost-efficient yet accurate classifier cascade. [sent-15, score-0.246]

12 This paper shows a simple combined-loss approach to getting both, which empirically improves on either metric. [sent-20, score-0.326]

13 In addition, I enjoyed Konrad Feldman's invited talk on Quantcast's data and learning systems, which sounded pretty slick. [sent-21, score-0.218]

14 The work on empirically effective privacy-preserving algorithms and some of the stats-work is ahead of what I’ve seen at other machine learning conferences. [sent-23, score-0.268]

15 Presumably this is due to KDD being closer to the business side of machine learning and hence more aware of the real problems there. [sent-24, score-0.09]

16 An annoying aspect of KDD as a publishing venue is that they don’t put the papers on the conference website, due to ACM constraints. [sent-25, score-0.436]

17 A substantial compensation is that all talks are scheduled to appear on videolectures.net and, as you can see, most papers can be found on author webpages. [sent-26, score-0.183] [sent-27, score-0.094]

19 KDD also experimented with crowdvine again this year so people could announce which talks they were interested in and set up meetings. [sent-28, score-0.351]

20 Small changes in the interface might make a big difference—for example, just providing a ranking of papers by interest might make it pretty compelling. [sent-30, score-0.337]
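The "double robust" (more commonly "doubly robust") estimation mentioned in sentence 8 combines an outcome model with a propensity model, and stays consistent if either model is correct. Below is a minimal sketch of the AIPW form of the estimator for an average treatment effect; the toy models and data are illustrative assumptions, not from the ad-campaign paper.

```python
def doubly_robust_ate(rows, mu1, mu0, propensity):
    """AIPW (doubly robust) estimate of the average treatment effect.

    rows: (treated, outcome, covariate) triples.
    mu1 / mu0: predicted outcome under treatment / under control.
    propensity: predicted P(treated | covariate).
    The estimate is consistent if either the outcome models or the
    propensity model is correctly specified."""
    total = 0.0
    for t, y, x in rows:
        e = propensity(x)
        m1, m0 = mu1(x), mu0(x)
        # Model predictions, corrected by inverse-propensity-weighted residuals.
        corrected1 = m1 + t * (y - m1) / e
        corrected0 = m0 + (1 - t) * (y - m0) / (1.0 - e)
        total += corrected1 - corrected0
    return total / len(rows)

# Toy data where treatment adds exactly 1 to the outcome.
rows = [(1, 3.0, 2.0), (0, 2.0, 2.0), (1, 5.0, 4.0), (0, 4.0, 4.0)]
ate = doubly_robust_ate(rows, mu1=lambda x: x + 1, mu0=lambda x: x,
                        propensity=lambda x: 0.5)
```

Here the outcome models are exact, so the estimate recovers the true effect of 1 regardless of the propensity model.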
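The cascaded classifiers in sentence 11 trade accuracy against evaluation cost by running cheap stages first and only paying for expensive ones on examples that survive. A minimal sketch of cascade inference follows; the stages and thresholds are illustrative assumptions (the paper's actual contribution is choosing them jointly).

```python
def cascade_predict(x, stages, thresholds):
    """Evaluate stages in order of increasing cost; reject early when a
    stage's score falls below its threshold, skipping costlier stages."""
    for stage, threshold in zip(stages, thresholds):
        if stage(x) < threshold:
            return 0  # early rejection: later stages never pay their cost
    return 1  # survived every stage

# Toy cascade: a cheap linear score, then a pricier quadratic one.
stages = [lambda x: x, lambda x: x * x]
thresholds = [0.5, 0.5]
labels = [cascade_predict(v, stages, thresholds) for v in (0.3, 0.6, 0.9)]
```

Most negative examples are filtered by the cheap first stage, so average prediction cost stays low even when the final stage is expensive.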


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('differentially', 0.274), ('kdd', 0.232), ('private', 0.199), ('privacy', 0.177), ('publishing', 0.167), ('ad', 0.152), ('ny', 0.152), ('zhang', 0.133), ('paper', 0.13), ('estimation', 0.126), ('pretty', 0.125), ('ranking', 0.118), ('designing', 0.116), ('optimizing', 0.11), ('empirically', 0.1), ('talks', 0.098), ('shows', 0.096), ('papers', 0.094), ('data', 0.093), ('preservation', 0.091), ('income', 0.091), ('viola', 0.091), ('legal', 0.091), ('tim', 0.091), ('crowdvine', 0.091), ('causal', 0.091), ('vikas', 0.091), ('yu', 0.091), ('due', 0.09), ('state', 0.085), ('venue', 0.085), ('campaigns', 0.085), ('datamining', 0.085), ('tax', 0.085), ('ahead', 0.085), ('scheduled', 0.085), ('hospital', 0.085), ('debt', 0.085), ('publishable', 0.085), ('lambert', 0.085), ('effective', 0.083), ('year', 0.082), ('appealing', 0.08), ('pushed', 0.08), ('et', 0.08), ('announce', 0.08), ('acm', 0.08), ('sculley', 0.08), ('differential', 0.08), ('estimated', 0.076)]
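The sentence scores and word weights above come from a tfidf model. A minimal sketch of how such weights and sentence scores could be computed follows; the exact weighting and normalization used by this pipeline are unknown, so the log-scaled tf * idf form below is an assumption.

```python
import math
from collections import Counter

def tfidf(docs):
    """Per-document tf-idf weights: tf * log(N / df)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency counts each doc once
    return [{w: c * math.log(n / df[w]) for w, c in Counter(doc).items()}
            for doc in docs]

def sentence_score(sentence, weights):
    """Score a sentence by the summed tf-idf weight of its terms."""
    return sum(weights.get(w, 0.0) for w in sentence)

# Toy corpus of three tokenized posts.
docs = [["kdd", "privacy", "paper", "privacy"],
        ["icml", "paper", "learning"],
        ["privacy", "netflix", "data"]]
weights = tfidf(docs)
# Rank two candidate summary sentences from the first post.
sentences = [["kdd", "privacy"], ["paper"]]
ranked = sorted(sentences, key=lambda s: sentence_score(s, weights[0]),
                reverse=True)
```

Words that appear in every post get weight log(1) = 0, which is why generic terms are absent from the topN-words list above.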

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 406 hunch net-2010-08-22-KDD 2010


2 0.21101551 364 hunch net-2009-07-11-Interesting papers at KDD

Introduction: I attended KDD this year. The conference has always had a strong grounding in what works based on the KDDcup , but it has developed a halo of workshops on various subjects. It seems that KDD has become a place where the economy meets machine learning in a stronger sense than many other conferences. There were several papers that other people might like to take a look at. Yehuda Koren Collaborative Filtering with Temporal Dynamics . This paper describes how to incorporate temporal dynamics into a couple of collaborative filtering approaches. This was also a best paper award. D. Sculley , Robert Malkin, Sugato Basu , Roberto J. Bayardo , Predicting Bounce Rates in Sponsored Search Advertisements . The basic claim of this paper is that the probability people immediately leave (“bounce”) after clicking on an advertisement is predictable. Frank McSherry and Ilya Mironov Differentially Private Recommender Systems: Building Privacy into the Netflix Prize Contenders

3 0.16306426 452 hunch net-2012-01-04-Why ICML? and the summer conferences

Introduction: Here’s a quick reference for summer ML-related conferences sorted by due date: Conference Due date Location Reviewing KDD Feb 10 August 12-16, Beijing, China Single Blind COLT Feb 14 June 25-June 27, Edinburgh, Scotland Single Blind? (historically) ICML Feb 24 June 26-July 1, Edinburgh, Scotland Double Blind, author response, zero SPOF UAI March 30 August 15-17, Catalina Islands, California Double Blind, author response Geographically, this is greatly dispersed and the UAI/KDD conflict is unfortunate. Machine Learning conferences are triannual now, between NIPS , AIStat , and ICML . This has not always been the case: the academic default is annual summer conferences, then NIPS started with a December conference, and now AIStat has grown into an April conference. However, the first claim is not quite correct. NIPS and AIStat have few competing venues while ICML implicitly competes with many other conf

4 0.16120297 390 hunch net-2010-03-12-Netflix Challenge 2 Canceled

Introduction: The second Netflix prize is canceled due to privacy problems . I continue to believe my original assessment of this paper, that the privacy break was somewhat overstated. I still haven’t seen any serious privacy failures on the scale of the AOL search log release . I expect privacy concerns to continue to be a big issue when dealing with data releases by companies or governments. The theory of maintaining privacy while using data is improving, but it is not yet in a state where the limits of what’s possible are clear let alone how to achieve these limits in a manner friendly to a prediction competition.

5 0.13317721 260 hunch net-2007-08-25-The Privacy Problem

Introduction: Machine Learning is rising in importance because data is being collected for all sorts of tasks where it either wasn’t previously collected, or for tasks that did not previously exist. While this is great for Machine Learning, it has a downside—the massive data collection which is so useful can also lead to substantial privacy problems. It’s important to understand that this is a much harder problem than many people appreciate. The AOL data release is a good example. To those doing machine learning, the following strategies might be obvious: Just delete any names or other obviously personally identifiable information. The logic here seems to be “if I can’t easily find the person then no one can”. That doesn’t work as demonstrated by the people who were found circumstantially from the AOL data. … then just hash all the search terms! The logic here is “if I can’t read it, then no one can”. It’s also trivially broken by a dictionary attack—just hash all the strings

6 0.13161792 420 hunch net-2010-12-26-NIPS 2010

7 0.12896873 304 hunch net-2008-06-27-Reviewing Horror Stories

8 0.12327796 361 hunch net-2009-06-24-Interesting papers at UAICMOLT 2009

9 0.11708277 437 hunch net-2011-07-10-ICML 2011 and the future

10 0.11680467 454 hunch net-2012-01-30-ICML Posters and Scope

11 0.11030155 444 hunch net-2011-09-07-KDD and MUCMD 2011

12 0.10583309 310 hunch net-2008-07-15-Interesting papers at COLT (and a bit of UAI & workshops)

13 0.10533655 439 hunch net-2011-08-01-Interesting papers at COLT 2011

14 0.1029337 97 hunch net-2005-07-23-Interesting papers at ACL

15 0.10213138 430 hunch net-2011-04-11-The Heritage Health Prize

16 0.10003561 403 hunch net-2010-07-18-ICML & COLT 2010

17 0.097885437 456 hunch net-2012-02-24-ICML+50%

18 0.095534243 247 hunch net-2007-06-14-Interesting Papers at COLT 2007

19 0.095485508 416 hunch net-2010-10-29-To Vidoelecture or not

20 0.094908051 447 hunch net-2011-10-10-ML Symposium and ICML details


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.256), (1, -0.07), (2, -0.006), (3, -0.055), (4, 0.055), (5, -0.006), (6, -0.06), (7, -0.003), (8, -0.036), (9, -0.094), (10, -0.007), (11, 0.118), (12, -0.104), (13, 0.029), (14, -0.055), (15, 0.02), (16, -0.047), (17, 0.046), (18, 0.138), (19, -0.058), (20, 0.05), (21, -0.04), (22, 0.024), (23, 0.055), (24, -0.081), (25, 0.016), (26, 0.027), (27, 0.063), (28, -0.041), (29, -0.014), (30, -0.058), (31, -0.006), (32, -0.095), (33, -0.01), (34, 0.038), (35, -0.079), (36, 0.074), (37, 0.006), (38, 0.138), (39, 0.026), (40, 0.085), (41, 0.02), (42, -0.003), (43, 0.007), (44, -0.065), (45, 0.008), (46, -0.083), (47, -0.073), (48, -0.098), (49, -0.113)]
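The simValue numbers below are presumably similarities between dense topic-weight vectors like the one above; plain cosine similarity is an assumption, since the pipeline's exact measure is not stated. A minimal sketch:

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense topic-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Abbreviated lsi topic weights for two posts (illustrative values).
this_post = [0.256, -0.07, -0.006, -0.055, 0.055]
other_post = [0.21, -0.05, 0.01, -0.06, 0.04]
sim = cosine(this_post, other_post)
# Any vector compared with itself scores exactly 1.0.
```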

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96510834 406 hunch net-2010-08-22-KDD 2010


2 0.82785428 364 hunch net-2009-07-11-Interesting papers at KDD


3 0.71683145 390 hunch net-2010-03-12-Netflix Challenge 2 Canceled


4 0.66804057 260 hunch net-2007-08-25-The Privacy Problem


5 0.61941701 420 hunch net-2010-12-26-NIPS 2010

Introduction: I enjoyed attending NIPS this year, with several things interesting me. For the conference itself: Peter Welinder , Steve Branson , Serge Belongie , and Pietro Perona , The Multidimensional Wisdom of Crowds . This paper is about using mechanical turk to get label information, with results superior to a majority vote approach. David McAllester , Tamir Hazan , and Joseph Keshet Direct Loss Minimization for Structured Prediction . This is about another technique for directly optimizing the loss in structured prediction, with an application to speech recognition. Mohammad Saberian and Nuno Vasconcelos Boosting Classifier Cascades . This is about an algorithm for simultaneously optimizing loss and computation in a classifier cascade construction. There were several other papers on cascades which are worth looking at if interested. Alan Fern and Prasad Tadepalli , A Computational Decision Theory for Interactive Assistants . This paper carves out some

6 0.59548581 125 hunch net-2005-10-20-Machine Learning in the News

7 0.59353834 475 hunch net-2012-10-26-ML Symposium and Strata-Hadoop World

8 0.58916199 200 hunch net-2006-08-03-AOL’s data drop

9 0.57753348 361 hunch net-2009-06-24-Interesting papers at UAICMOLT 2009

10 0.5699681 423 hunch net-2011-02-02-User preferences for search engines

11 0.5625357 444 hunch net-2011-09-07-KDD and MUCMD 2011

12 0.53621238 199 hunch net-2006-07-26-Two more UAI papers of interest

13 0.53335708 452 hunch net-2012-01-04-Why ICML? and the summer conferences

14 0.52913713 439 hunch net-2011-08-01-Interesting papers at COLT 2011

15 0.52792817 310 hunch net-2008-07-15-Interesting papers at COLT (and a bit of UAI & workshops)

16 0.52778995 256 hunch net-2007-07-20-Motivation should be the Responsibility of the Reviewer

17 0.5184086 97 hunch net-2005-07-23-Interesting papers at ACL

18 0.51467603 188 hunch net-2006-06-30-ICML papers

19 0.51104277 335 hunch net-2009-01-08-Predictive Analytics World

20 0.5007875 403 hunch net-2010-07-18-ICML & COLT 2010


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(3, 0.014), (10, 0.026), (13, 0.227), (27, 0.229), (38, 0.074), (42, 0.036), (51, 0.032), (53, 0.044), (55, 0.107), (64, 0.016), (94, 0.037), (95, 0.08)]
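The lda representation above is sparse: only topics with non-negligible weight appear as (topicId, topicWeight) pairs. Similarity between two posts can then be computed over just the topics they share; the sketch below assumes cosine similarity over sparse dicts, which is a common but unconfirmed choice for this pipeline.

```python
import math

def sparse_cosine(p, q):
    """Cosine similarity for sparse {topic_id: weight} maps; the dot
    product only visits topics present in both documents."""
    dot = sum(w * q[t] for t, w in p.items() if t in q)
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

# Abbreviated (topicId, topicWeight) pairs, as in the listing above.
this_post = dict([(3, 0.014), (13, 0.227), (27, 0.229), (55, 0.107)])
other_post = dict([(13, 0.2), (27, 0.25), (42, 0.05)])
sim = sparse_cosine(this_post, other_post)
```

Posts sharing heavy topics (13 and 27 here) score high; posts with disjoint topic support score 0.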

similar blogs list:

simIndex simValue blogId blogTitle

1 0.94026262 386 hunch net-2010-01-13-Sam Roweis died

Introduction: and I can’t help but remember him. I first met Sam as an undergraduate at Caltech where he was TA for Hopfield ‘s class, and again when I visited Gatsby , when he invited me to visit Toronto , and at too many conferences to recount. His personality was a combination of enthusiastic and thoughtful, with a great ability to phrase a problem so it’s solution must be understood. With respect to my own work, Sam was the one who advised me to make my first tutorial , leading to others, and to other things, all of which I’m grateful to him for. In fact, my every interaction with Sam was positive, and that was his way. His death is being called a suicide which is so incompatible with my understanding of Sam that it strains my credibility. But we know that his many responsibilities were great, and it is well understood that basically all sane researchers have legions of inner doubts. Having been depressed now and then myself, it’s helpful to understand at least intellectually

2 0.92112881 212 hunch net-2006-10-04-Health of Conferences Wiki

Introduction: Aaron Hertzmann points out the health of conferences wiki , which has a great deal of information about how many different conferences function.

3 0.91501355 137 hunch net-2005-12-09-Machine Learning Thoughts

Introduction: I added a link to Olivier Bousquet’s machine learning thoughts blog. Several of the posts may be of interest.

4 0.91120696 117 hunch net-2005-10-03-Not ICML

Introduction: Alex Smola showed me this ICML 2006 webpage. This is NOT the ICML we know, but rather some people at “Enformatika”. Investigation shows that they registered with an anonymous yahoo email account from dotregistrar.com the “Home of the $6.79 wholesale domain!” and their nameservers are by Turkticaret , a Turkish internet company. It appears the website has since been altered to “ ICNL ” (the above link uses the google cache). They say that imitation is the sincerest form of flattery, so the organizers of the real ICML 2006 must feel quite flattered.

same-blog 5 0.8808679 406 hunch net-2010-08-22-KDD 2010


6 0.86502147 492 hunch net-2013-12-01-NIPS tutorials and Vowpal Wabbit 7.4

7 0.84365165 51 hunch net-2005-04-01-The Producer-Consumer Model of Research

8 0.77331823 312 hunch net-2008-08-04-Electoralmarkets.com

9 0.76956701 214 hunch net-2006-10-13-David Pennock starts Oddhead

10 0.75483835 343 hunch net-2009-02-18-Decision by Vetocracy

11 0.75061536 194 hunch net-2006-07-11-New Models

12 0.74681276 89 hunch net-2005-07-04-The Health of COLT

13 0.74452817 36 hunch net-2005-03-05-Funding Research

14 0.74296534 360 hunch net-2009-06-15-In Active Learning, the question changes

15 0.7424072 225 hunch net-2007-01-02-Retrospective

16 0.7420727 466 hunch net-2012-06-05-ICML acceptance statistics

17 0.74154419 235 hunch net-2007-03-03-All Models of Learning have Flaws

18 0.73636407 132 hunch net-2005-11-26-The Design of an Optimal Research Environment

19 0.73626369 464 hunch net-2012-05-03-Microsoft Research, New York City

20 0.73539388 309 hunch net-2008-07-10-Interesting papers, ICML 2008