nips nips2012 nips2012-258 nips2012-258-reference knowledge-graph by maker-knowledge-mining

258 nips-2012-Online L1-Dictionary Learning with Application to Novel Document Detection

Source: pdf

Author: Shiva P. Kasiviswanathan, Huahua Wang, Arindam Banerjee, Prem Melville

Abstract: Given their pervasive use, social media, such as Twitter, have become a leading source of breaking news. A key task in the automated identiﬁcation of such news is the detection of novel documents from a voluminous stream of text documents in a scalable manner. Motivated by this challenge, we introduce the problem of online 1 -dictionary learning where unlike traditional dictionary learning, which uses squared loss, the 1 -penalty is used for measuring the reconstruction error. We present an efﬁcient online algorithm for this problem based on alternating directions method of multipliers, and establish a sublinear regret bound for this algorithm. Empirical results on news-stream and Twitter data, shows that this online 1 -dictionary learning algorithm for novel document detection gives more than an order of magnitude speedup over the previously known batch algorithm, without any signiﬁcant loss in quality of results. 1

reference text

[1] M. Aharon, M. Elad, and A. Bruckstein. The K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation. IEEE Transactions on Signal Processing, 54(11), 2006.

[2] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations and Trends in Machine Learning, 2011.

[3] V. Chenthamarakshan, P. Melville, V. Sindhwani, and R. D. Lawrence. Concept Labeling: Building Text Classiﬁers with Minimal Supervision. In IJCAI, pages 1225–1230, 2011.

[4] J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efﬁcient Projections onto the l1-ball for Learning in High Dimensions. In ICML, pages 272–279, 2008.

[5] J. Duchi and Y. Singer. Efﬁcient Online and Batch Learning using Forward Backward Splitting. JMLR, 10:2873–2898, 2009.

[6] J. C. Duchi, S. Shalev-Shwartz, Y. Singer, and A. Tewari. Composite Objective Mirror Descent. In COLT, pages 14–26, 2010.

[7] J. Friedman, T. Hastie, H. Hﬂing, and R. Tibshirani. Pathwise Coordinate Optimization. The Annals of Applied Statistics, 1(2):302–332, 2007.

[8] E. Hazan, A. Agarwal, and S. Kale. Logarithmic Regret Algorithms for Online Convex Optimization. Machine Learning, 69(2-3):169–192, 2007.

[9] P. O. Hoyer. Non-Negative Sparse Coding. In IEEE Workshop on Neural Networks for Signal Processing, pages 557–565, 2002.

[10] S. P. Kasiviswanathan, P. Melville, A. Banerjee, and V. Sindhwani. Emerging Topic Detection using Dictionary Learning. In CIKM, pages 745–754, 2011.

[11] S. P. Kasiviswanathan, H. Wang, A. Banerjee, and P. Melville. Online 1 -Dictionary Learning with Application to Novel Document Detection. http://www.cse.psu.edu/˜kasivisw/ fullonlinedict.pdf.

[12] J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online Learning for Matrix Factorization and Sparse Coding. JMLR, 11:19–60, 2010.

[13] C. Manning, P. Raghavan, and H. Sch¨ tze. Introduction to Information Retrieval. Cambridge University u Press, 2008.

[14] P. Melville, J. Leskovec, and F. Provost, editors. Proceedings of the First Workshop on Social Media Analytics. ACM, 2010.

[15] B. Olshausen and D. Field. Sparse Coding with an Overcomplete Basis Set: A Strategy Employed by V1? Vision Research, 37(23):3311–3325, 1997.

[16] S. Petrovi´ , M. Osborne, and V. Lavrenko. Streaming First Story Detection with Application to Twitter. c In HLT ’10, pages 181–189. ACL, 2010.

[17] A. Saha and V. Sindhwani. Learning Evolving and Emerging Topics in Social Media: A Dynamic NMF Approach with Temporal Regularization. In WSDM, pages 693–702, 2012.

[18] S. Shalev-Shwartz. Online Learning and Online Convex Optimization. Foundations and Trends in Machine Learning, 4(2), 2012.

[19] H. Wang and A. Banerjee. Online Alternating Direction Method. In ICML, 2012.

[20] J. Wright and Y. Ma. Dense Error Correction Via L1-Minimization. IEEE Transactions on Information Theory, 56(7):3540–3560, 2010.

[21] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma. Robust Face Recognition via Sparse Representation. IEEE Transactions on Pattern Analysis and Machine Intelliegence, 31(2):210–227, Feb. 2009.

[22] L. Xiao. Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization. JMLR, 11:2543–2596, 2010.

[23] A. Y. Yang, S. S. Sastry, A. Ganesh, and Y. Ma. Fast L1-minimization Algorithms and an Application in Robust Face Recognition: A Review. In International Conference on Image Processing, pages 1849– 1852, 2010.

[24] J. Yang and Y. Zhang. Alternating Direction Algorithms for L1-Problems in Compressive Sensing. SIAM Journal of Scientiﬁc Computing, 33(1):250–278, 2011.

[25] M. Zinkevich. Online Convex Programming and Generalized Inﬁnitesimal Gradient Ascent. In ICML, pages 928–936, 2003. 9