hunch_net hunch_net-2005 hunch_net-2005-125 knowledge-graph by maker-knowledge-mining

125 hunch net-2005-10-20-Machine Learning in the News


meta info for this blog

Source: html

Introduction: The New York Times had a short interview about machine learning in datamining being used pervasively by the IRS and large corporations to predict who to audit and who to target for various marketing campaigns. This is a big application area of machine learning. It can be harmful (learning + databases = another way to invade privacy) or beneficial (as google demonstrates, better targeting of marketing campaigns is far less annoying). This is yet more evidence that we can not rely upon “I’m just another fish in the school” logic for our expectations about treatment by government and large corporations.


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 The New York Times had a short interview about machine learning in datamining being used pervasively by the IRS and large corporations to predict who to audit and who to target for various marketing campaigns. [sent-1, score-1.905]

2 This is a big application area of machine learning. [sent-2, score-0.313]

3 It can be harmful (learning + databases = another way to invade privacy) or beneficial (as google demonstrates, better targeting of marketing campaigns is far less annoying). [sent-3, score-1.808]

4 This is yet more evidence that we can not rely upon “I’m just another fish in the school” logic for our expectations about treatment by government and large corporations. [sent-4, score-1.259]
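The sentence scores above come from a tf-idf model. A minimal sketch of one way such an extractive summary could be computed, assuming scikit-learn and a score equal to the summed tf-idf weight of each sentence's terms (the actual pipeline behind this page is not shown), is:

# Hypothetical sketch: rank the sentences of a post by summed tf-idf weight.
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = [
    "The New York Times had a short interview about machine learning in datamining.",
    "This is a big application area of machine learning.",
    "It can be harmful or beneficial.",
]  # toy stand-in for the sentences of the post

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(sentences)   # (n_sentences, n_terms) sparse matrix

scores = tfidf.sum(axis=1).A1                 # total tf-idf mass per sentence
for rank, idx in enumerate(scores.argsort()[::-1], start=1):
    print(rank, round(float(scores[idx]), 3), sentences[idx])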


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('corporations', 0.41), ('marketing', 0.41), ('demonstrates', 0.205), ('harmful', 0.205), ('interview', 0.205), ('target', 0.205), ('databases', 0.19), ('campaigns', 0.19), ('datamining', 0.19), ('annoying', 0.171), ('expectations', 0.171), ('treatment', 0.171), ('targeting', 0.164), ('beneficial', 0.158), ('government', 0.149), ('rely', 0.145), ('logic', 0.138), ('privacy', 0.133), ('school', 0.123), ('another', 0.122), ('google', 0.116), ('york', 0.113), ('upon', 0.108), ('times', 0.104), ('short', 0.097), ('large', 0.097), ('application', 0.094), ('evidence', 0.092), ('area', 0.084), ('far', 0.081), ('predict', 0.076), ('various', 0.075), ('big', 0.075), ('less', 0.069), ('yet', 0.066), ('machine', 0.06), ('used', 0.056), ('way', 0.052), ('better', 0.051), ('new', 0.036), ('learning', 0.024)]
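A minimal sketch of how a top-words list like the one above could be produced, assuming scikit-learn and a tf-idf matrix fit over the whole blog corpus (the corpus below is a hypothetical toy stand-in):

# Hypothetical sketch: top-weighted tf-idf terms for one post.
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "machine learning used by the irs and corporations for marketing campaigns",
    "privacy problems with large scale data releases",
    "differentially private data mining and decision trees",
]  # toy stand-in for the full hunch.net post texts

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(corpus)

terms = vectorizer.get_feature_names_out()
row = tfidf[0].toarray().ravel()              # tf-idf weights for the first post
top = row.argsort()[::-1][:40]                # indices of the 40 highest weights
print([(terms[i], round(float(row[i]), 3)) for i in top])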

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 125 hunch net-2005-10-20-Machine Learning in the News

Introduction: The New York Times had a short interview about machine learning in datamining being used pervasively by the IRS and large corporations to predict who to audit and who to target for various marketing campaigns. This is a big application area of machine learning. It can be harmful (learning + databases = another way to invade privacy) or beneficial (as google demonstrates, better targeting of marketing campaigns is far less annoying). This is yet more evidence that we can not rely upon “I’m just another fish in the school” logic for our expectations about treatment by government and large corporations.

2 0.096390948 314 hunch net-2008-08-24-Mass Customized Medicine in the Future?

Introduction: This post is about a technology which could develop in the future. Right now, a new drug might be tested by finding patients with some diagnosis and giving or not giving them a drug according to a secret randomization. The outcome is observed, and if the average outcome for those treated is measurably better than the average outcome for those not treated, the drug might become a standard treatment. Generalizing this, a filter F sorts people into two groups: those for treatment A and those not for treatment B based upon observations x. To measure the outcome, you randomize between treatment and nontreatment of group A and measure the relative performance of the treatment. A problem often arises: in many cases the treated group does not do better than the nontreated group. A basic question is: does this mean the treatment is bad? With respect to the filter F it may mean that, but with respect to another filter F’, the treatment might be very effective. For exampl

3 0.093376003 390 hunch net-2010-03-12-Netflix Challenge 2 Canceled

Introduction: The second Netflix prize is canceled due to privacy problems. I continue to believe my original assessment of this paper, that the privacy break was somewhat overstated. I still haven’t seen any serious privacy failures on the scale of the AOL search log release. I expect privacy concerns to continue to be a big issue when dealing with data releases by companies or governments. The theory of maintaining privacy while using data is improving, but it is not yet in a state where the limits of what’s possible are clear let alone how to achieve these limits in a manner friendly to a prediction competition.

4 0.089444123 406 hunch net-2010-08-22-KDD 2010

Introduction: There were several papers that seemed fairly interesting at KDD this year. The ones that caught my attention are: Xin Jin, Mingyang Zhang, Nan Zhang, and Gautam Das, Versatile Publishing For Privacy Preservation. This paper provides a conservative method for safely determining which data is publishable from any complete source of information (for example, a hospital) such that it does not violate privacy rules in a natural language. It is not differentially private, so no external sources of join information can exist. However, it is a mechanism for publishing data rather than (say) the output of a learning algorithm. Arik Friedman and Assaf Schuster, Data Mining with Differential Privacy. This paper shows how to create effective differentially private decision trees. Progress in differentially private datamining is pretty impressive, as it was defined in 2006. David Chan, Rong Ge, Ori Gershony, Tim Hesterberg, Diane Lambert, Evaluating Online Ad Camp

5 0.069528386 460 hunch net-2012-03-24-David Waltz

Introduction: David Waltz has died. He lived a full life. I know him personally as a founder of the Center for Computational Learning Systems and the New York Machine Learning Symposium, both of which have sheltered and promoted the advancement of machine learning. I expect much of the New York area machine learning community will miss him, as well as many others around the world.

6 0.069248423 260 hunch net-2007-08-25-The Privacy Problem

7 0.068836108 344 hunch net-2009-02-22-Effective Research Funding

8 0.067974344 411 hunch net-2010-09-21-Regretting the dead

9 0.056097008 423 hunch net-2011-02-02-User preferences for search engines

10 0.052452981 4 hunch net-2005-01-26-Summer Schools

11 0.050374255 178 hunch net-2006-05-08-Big machine learning

12 0.049694512 256 hunch net-2007-07-20-Motivation should be the Responsibility of the Reviewer

13 0.049252637 429 hunch net-2011-04-06-COLT open questions

14 0.047934636 335 hunch net-2009-01-08-Predictive Analytics World

15 0.047165532 282 hunch net-2008-01-06-Research Political Issues

16 0.046703458 217 hunch net-2006-11-06-Data Linkage Problems

17 0.042897537 132 hunch net-2005-11-26-The Design of an Optimal Research Environment

18 0.039693136 437 hunch net-2011-07-10-ICML 2011 and the future

19 0.039221328 136 hunch net-2005-12-07-Is the Google way the way for machine learning?

20 0.039080158 456 hunch net-2012-02-24-ICML+50%
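A minimal sketch of how the simValue column above could be obtained, assuming the similarity is cosine similarity between tf-idf document vectors (an assumption; the page does not state the exact measure):

# Hypothetical sketch: rank other posts by cosine similarity of tf-idf vectors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "machine learning used by the irs and corporations for marketing campaigns",
    "mass customized medicine and randomized drug trials",
    "netflix challenge canceled due to privacy problems",
]  # toy stand-in for the full hunch.net corpus; index 0 plays the role of post 125

tfidf = TfidfVectorizer(stop_words="english").fit_transform(corpus)
sims = cosine_similarity(tfidf[0], tfidf).ravel()   # similarity of post 0 to every post

for idx in sims.argsort()[::-1]:
    print(idx, round(float(sims[idx]), 6))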


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.079), (1, -0.017), (2, -0.049), (3, 0.029), (4, -0.022), (5, -0.013), (6, -0.025), (7, 0.032), (8, -0.036), (9, -0.063), (10, -0.018), (11, 0.026), (12, 0.027), (13, -0.009), (14, -0.062), (15, 0.007), (16, -0.024), (17, 0.03), (18, 0.049), (19, -0.008), (20, 0.012), (21, -0.025), (22, 0.007), (23, 0.033), (24, -0.112), (25, 0.002), (26, 0.006), (27, 0.078), (28, 0.046), (29, -0.041), (30, -0.032), (31, -0.02), (32, -0.042), (33, 0.033), (34, -0.02), (35, 0.025), (36, 0.022), (37, -0.016), (38, 0.024), (39, -0.025), (40, 0.064), (41, -0.03), (42, -0.024), (43, -0.035), (44, -0.027), (45, 0.026), (46, 0.028), (47, -0.029), (48, -0.048), (49, -0.035)]
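A minimal sketch of how per-document topic weights like the 50-dimensional vector above could be produced, assuming the LSI model here means a truncated SVD of the tf-idf matrix (the toy corpus below only supports a couple of components):

# Hypothetical sketch: per-document LSI topic weights via truncated SVD.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

corpus = [
    "machine learning used by the irs and corporations for marketing campaigns",
    "privacy problems with large scale data releases by companies",
    "differentially private data mining and decision trees",
    "netflix prize canceled over privacy concerns",
]  # toy stand-in for the full hunch.net corpus

tfidf = TfidfVectorizer(stop_words="english").fit_transform(corpus)
lsi = TruncatedSVD(n_components=2, random_state=0)  # the page lists 50 topics
doc_topics = lsi.fit_transform(tfidf)               # (n_docs, n_topics) weights

print(list(enumerate(doc_topics[0].round(3))))      # (topicId, topicWeight) pairs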

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9142009 125 hunch net-2005-10-20-Machine Learning in the News

Introduction: The New York Times had a short interview about machine learning in datamining being used pervasively by the IRS and large corporations to predict who to audit and who to target for various marketing campaigns. This is a big application area of machine learning. It can be harmful (learning + databases = another way to invade privacy) or beneficial (as google demonstrates, better targeting of marketing campaigns is far less annoying). This is yet more evidence that we can not rely upon “I’m just another fish in the school” logic for our expectations about treatment by government and large corporations.

2 0.64499533 260 hunch net-2007-08-25-The Privacy Problem

Introduction: Machine Learning is rising in importance because data is being collected for all sorts of tasks where it either wasn’t previously collected, or for tasks that did not previously exist. While this is great for Machine Learning, it has a downside—the massive data collection which is so useful can also lead to substantial privacy problems. It’s important to understand that this is a much harder problem than many people appreciate. The AOL data release is a good example. To those doing machine learning, the following strategies might be obvious: Just delete any names or other obviously personally identifiable information. The logic here seems to be “if I can’t easily find the person then no one can”. That doesn’t work as demonstrated by the people who were found circumstantially from the AOL data. … then just hash all the search terms! The logic here is “if I can’t read it, then no one can”. It’s also trivially broken by a dictionary attack—just hash all the strings

3 0.54647833 390 hunch net-2010-03-12-Netflix Challenge 2 Canceled

Introduction: The second Netflix prize is canceled due to privacy problems. I continue to believe my original assessment of this paper, that the privacy break was somewhat overstated. I still haven’t seen any serious privacy failures on the scale of the AOL search log release. I expect privacy concerns to continue to be a big issue when dealing with data releases by companies or governments. The theory of maintaining privacy while using data is improving, but it is not yet in a state where the limits of what’s possible are clear let alone how to achieve these limits in a manner friendly to a prediction competition.

4 0.54033381 475 hunch net-2012-10-26-ML Symposium and Strata-Hadoop World

Introduction: The New York ML symposium was last Friday. There were 303 registrations, up a bit from last year. I particularly enjoyed talks by Bill Freeman on vision and ML, Jon Lenchner on strategy in Jeopardy, and Tara N. Sainath and Brian Kingsbury on deep learning for speech recognition. If anyone has suggestions or thoughts for next year, please speak up. I also attended Strata + Hadoop World for the first time. This is primarily a trade conference rather than an academic conference, but I found it pretty interesting as a first time attendee. This is ground zero for the Big data buzzword, and I see now why. It’s about data, and the word “big” is so ambiguous that everyone can lay claim to it. There were essentially zero academic talks. Instead, the focus was on war stories, product announcements, and education. The general level of education is much lower—explaining Machine Learning to the SQL educated is the primary operating point. Nevertheless that’s happening, a

5 0.52071053 423 hunch net-2011-02-02-User preferences for search engines

Introduction: I want to comment on the “Bing copies Google” discussion here, here, and here, because there are data-related issues which the general public may not understand, and some of the framing seems substantially misleading to me. As a not-distant-outsider, let me mention the sources of bias I may have. I work at Yahoo!, which has started using Bing. This might predispose me towards Bing, but on the other hand I’m still at Yahoo!, and have been using Linux exclusively as an OS for many years, including even a couple minor kernel patches. And, on the gripping hand, I’ve spent quite a bit of time thinking about the basic principles of incorporating user feedback in machine learning. Also note, this post is not related to official Yahoo! policy, it’s just my personal view. The issue: Google engineers inserted synthetic responses to synthetic queries on google.com, then executed the synthetic searches on google.com using Internet Explorer with the Bing toolbar and later

6 0.50702357 406 hunch net-2010-08-22-KDD 2010

7 0.49047682 171 hunch net-2006-04-09-Progress in Machine Translation

8 0.48704582 193 hunch net-2006-07-09-The Stock Prediction Machine Learning Problem

9 0.47623208 415 hunch net-2010-10-28-NY ML Symposium 2010

10 0.47188783 178 hunch net-2006-05-08-Big machine learning

11 0.46718454 411 hunch net-2010-09-21-Regretting the dead

12 0.4664762 314 hunch net-2008-08-24-Mass Customized Medicine in the Future?

13 0.46625108 106 hunch net-2005-09-04-Science in the Government

14 0.46440423 462 hunch net-2012-04-20-Both new: STOC workshops and NEML

15 0.45452616 241 hunch net-2007-04-28-The Coming Patent Apocalypse

16 0.43524408 460 hunch net-2012-03-24-David Waltz

17 0.41986811 455 hunch net-2012-02-20-Berkeley Streaming Data Workshop

18 0.41862354 471 hunch net-2012-08-24-Patterns for research in machine learning

19 0.41857123 322 hunch net-2008-10-20-New York’s ML Day

20 0.41665977 369 hunch net-2009-08-27-New York Area Machine Learning Events


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(27, 0.049), (38, 0.779), (94, 0.022)]
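A minimal sketch of how the sparse topic weights above could be produced, assuming the LDA model means latent Dirichlet allocation over word counts and that only topics with non-negligible weight are listed (both assumptions; the page's own model settings are not shown):

# Hypothetical sketch: per-document LDA topic proportions, keeping large weights.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

corpus = [
    "machine learning used by the irs and corporations for marketing campaigns",
    "privacy problems with large scale data releases by companies",
    "differentially private data mining and decision trees",
    "netflix prize canceled over privacy concerns",
]  # toy stand-in for the full hunch.net corpus

counts = CountVectorizer(stop_words="english").fit_transform(corpus)
lda = LatentDirichletAllocation(n_components=3, random_state=0)  # topic ids above suggest many more topics
doc_topics = lda.fit_transform(counts)              # rows are topic proportions summing to 1

weights = doc_topics[0]
print([(t, round(float(w), 3)) for t, w in enumerate(weights) if w > 0.01])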

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96709025 125 hunch net-2005-10-20-Machine Learning in the News

Introduction: The New York Times had a short interview about machine learning in datamining being used pervasively by the IRS and large corporations to predict who to audit and who to target for various marketing campaigns. This is a big application area of machine learning. It can be harmful (learning + databases = another way to invade privacy) or beneficial (as google demonstrates, better targeting of marketing campaigns is far less annoying). This is yet more evidence that we can not rely upon “I’m just another fish in the school” logic for our expectations about treatment by government and large corporations.

2 0.91746873 488 hunch net-2013-08-31-Extreme Classification workshop at NIPS

Introduction: Manik and I are organizing the extreme classification workshop at NIPS this year. We have a number of good speakers lined up, but I would further encourage anyone working in the area to submit an abstract by October 9. I believe this is an idea whose time has now come. The NIPS website doesn’t have other workshops listed yet, but I expect several others to be of significant interest.

3 0.88776189 339 hunch net-2009-01-27-Key Scientific Challenges

Introduction: Yahoo released the Key Scientific Challenges program. There is a Machine Learning list I worked on and a Statistics list which Deepak worked on. I’m hoping this is taken quite seriously by graduate students. The primary value is that it gave us a chance to sit down and publicly specify directions of research which would be valuable to make progress on. A good strategy for a beginning graduate student is to pick one of these directions, pursue it, and make substantial advances for a PhD. The directions are sufficiently general that I’m sure any serious advance has applications well beyond Yahoo. A secondary point (which I’m sure is primary for many) is that there is money for graduate students here. It’s unrestricted, so you can use it for any reasonable travel, supplies, etc…

4 0.84480524 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?

Introduction: This post is about an open problem in learning reductions. Background: A reduction might transform a multiclass prediction problem where there are k possible labels into a binary learning problem where there are only 2 possible labels. On this induced binary problem we might learn a binary classifier with some error rate e. After subtracting the minimum possible (Bayes) error rate b, we get a regret r = e – b. The PECOC (Probabilistic Error Correcting Output Code) reduction has the property that binary regret r implies multiclass regret at most 4r^0.5. The problem: This is not the “rightest” answer. Consider the k=2 case, where we reduce binary to binary. There exists a reduction (the identity) with the property that regret r implies regret r. This is substantially superior to the transform given by the PECOC reduction, which suggests that a better reduction may exist for general k. For example, we can not rule out the possibility that a reduction

5 0.83492619 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions

Introduction: Learning reductions transform a solver of one type of learning problem into a solver of another type of learning problem. When we analyze these for robustness we can make statements of the form “Reduction R has the property that regret r (or loss) on subproblems of type A implies regret at most f(r) on the original problem of type B”. A lower bound for a learning reduction would have the form “for all reductions R, there exists a learning problem of type B and learning algorithm for problems of type A where regret r on induced problems implies at least regret f(r) for B”. The pursuit of lower bounds is often questionable because, unlike upper bounds, they do not yield practical algorithms. Nevertheless, they may be helpful as a tool for thinking about what is learnable and how learnable it is. This has already come up here and here. At the moment, there is no coherent theory of lower bounds for learning reductions, and we have little understa

6 0.78306961 170 hunch net-2006-04-06-Bounds greater than 1

7 0.67269313 233 hunch net-2007-02-16-The Forgetting

8 0.662678 353 hunch net-2009-05-08-Computability in Artificial Intelligence

9 0.61800015 284 hunch net-2008-01-18-Datasets

10 0.61351597 236 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions

11 0.60429466 239 hunch net-2007-04-18-$50K Spock Challenge

12 0.59603125 251 hunch net-2007-06-24-Interesting Papers at ICML 2007

13 0.48612747 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions

14 0.42596897 26 hunch net-2005-02-21-Problem: Cross Validation

15 0.3903752 82 hunch net-2005-06-17-Reopening RL->Classification

16 0.38710085 409 hunch net-2010-09-13-AIStats

17 0.36582097 19 hunch net-2005-02-14-Clever Methods of Overfitting

18 0.36122549 49 hunch net-2005-03-30-What can Type Theory teach us about Machine Learning?

19 0.34542128 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem

20 0.3449696 131 hunch net-2005-11-16-The Everything Ensemble Edge