27 nips-2009-Adaptive Regularization of Weight Vectors


Source: pdf

Author: Koby Crammer, Alex Kulesza, Mark Dredze

Abstract: We present AROW, a new online learning algorithm that combines several useful properties: large margin training, confidence weighting, and the capacity to handle non-separable data. AROW performs adaptive regularization of the prediction function upon seeing each new instance, allowing it to perform especially well in the presence of label noise. We derive a mistake bound, similar in form to the second order perceptron bound, that does not assume separability. We also relate our algorithm to recent confidence-weighted online learning techniques and show empirically that AROW achieves state-of-the-art performance and notable robustness in the case of non-separable data.
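
The abstract does not state the update rule itself, so the following is only a minimal sketch of an AROW-style update as it is commonly described for binary labels in {-1, +1}. The class name `AROW`, the parameter name `r`, the identity initialization of the confidence matrix, and the update only on hinge loss are assumptions made for illustration, not details confirmed by this page.

```python
class AROW:
    """Hedged sketch of an AROW-style online binary classifier."""

    def __init__(self, dim, r=1.0):
        # Mean weight vector (the current linear predictor).
        self.mu = [0.0] * dim
        # Confidence matrix, assumed here to start as the identity.
        self.sigma = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        # Regularization trade-off (hypothetical name); larger r means smaller steps.
        self.r = r

    def margin(self, x):
        # Signed score of x under the mean weight vector.
        return sum(m * xi for m, xi in zip(self.mu, x))

    def predict(self, x):
        return 1 if self.margin(x) >= 0.0 else -1

    def update(self, x, y):
        # Hinge loss with unit margin; no update when the margin is already met.
        loss = 1.0 - y * self.margin(x)
        if loss <= 0.0:
            return False
        n = len(x)
        # sx = Sigma x, and v = x^T Sigma x measures uncertainty along x.
        sx = [sum(row[j] * x[j] for j in range(n)) for row in self.sigma]
        v = sum(xi * si for xi, si in zip(x, sx))
        beta = 1.0 / (v + self.r)
        alpha = loss * beta
        for i in range(n):
            # Move the mean toward the correct label, scaled by confidence.
            self.mu[i] += alpha * y * sx[i]
            for j in range(n):
                # Shrink uncertainty along the direction just observed.
                self.sigma[i][j] -= beta * sx[i] * sx[j]
        return True


# Usage: one pass over a tiny stream of (features, label) pairs.
model = AROW(dim=2, r=1.0)
for features, label in [([1.0, 0.0], 1), ([-1.0, 0.0], -1), ([0.0, 1.0], 1)]:
    model.update(features, label)
```

Because the step size shrinks as `v + r` grows, a single mislabeled instance cannot fully overwrite the predictor in this sketch, which is one way to read the robustness to label noise that the abstract claims.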


reference text

[1] John Blitzer, Mark Dredze, and Fernando Pereira. Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In ACL, 2007.

[2] Nicolò Cesa-Bianchi, Alex Conconi, and Claudio Gentile. A second-order perceptron algorithm. SIAM Journal on Computing, 34, 2005.

[3] Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, and Yoram Singer. Online passive-aggressive algorithms. Journal of Machine Learning Research, 7:551–585, 2006.

[4] Koby Crammer, Mark Dredze, and Alex Kulesza. Multi-class confidence weighted algorithms. In Empirical Methods in Natural Language Processing (EMNLP), 2009.

[5] Koby Crammer, Mark Dredze, and Fernando Pereira. Exact convex confidence-weighted learning. In Neural Information Processing Systems (NIPS), 2008.

[6] Mark Dredze, Koby Crammer, and Fernando Pereira. Confidence-weighted linear classification. In International Conference on Machine Learning, 2008.

[7] Simon Haykin. Adaptive Filter Theory. 1996.

[8] Elad Hazan. Efficient algorithms for online convex optimization and their applications. PhD thesis, Princeton University, 2006.

[9] Ralf Herbrich, Thore Graepel, and Colin Campbell. Bayes point machines. Journal of Machine Learning Research (JMLR), 1:245–279, 2001.

[10] David D. Lewis, Yiming Yang, Tony G. Rose, and Fan Li. RCV1: A new benchmark collection for text categorization research. JMLR, 5:361–397, 2004.

[11] Nick Littlestone. Learning when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2:285–318, 1988.