NIPS 2008, paper 62 (nips2008-62): references
Source: pdf
Author: J. A. Bagnell, David M. Bradley
Abstract: Prior work has shown that features that appear biologically plausible as well as empirically useful can be found by sparse coding with a sparsity-promoting prior such as a Laplacian (L1). We show how smoother priors can preserve the benefits of these sparse priors while adding stability to the maximum a posteriori (MAP) estimate, which makes it more useful for prediction problems. Additionally, we show how to calculate the derivative of the MAP estimate efficiently with implicit differentiation. One prior that can be differentiated this way is KL-regularization. We demonstrate its effectiveness on a wide variety of applications, and find that online optimization of the parameters of the KL-regularized model can significantly improve prediction performance.
[1] J. A. Tropp, “Algorithms for simultaneous sparse approximation: Part II: Convex relaxation,” Signal Process., vol. 86, no. 3, pp. 589–602, 2006.
[2] B. Olshausen and D. Field, “Sparse coding with an overcomplete basis set: A strategy employed by V1?” Vision Research, 1997.
[3] Y. Karklin and M. S. Lewicki, “A hierarchical Bayesian model for learning non-linear statistical regularities in non-stationary natural signals,” Neural Computation, vol. 17, no. 2, pp. 397–423, 2005.
[4] R. Raina, A. Battle, H. Lee, B. Packer, and A. Y. Ng, “Self-taught learning: Transfer learning from unlabeled data,” in ICML ’07: Proceedings of the 24th international conference on Machine learning, 2007.
[5] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, “Greedy layer-wise training of deep networks,” in Advances in Neural Information Processing Systems 19, B. Schölkopf, J. Platt, and T. Hoffman, Eds. Cambridge, MA: MIT Press, 2007, pp. 153–160.
[6] E. Rietsch, “The maximum entropy approach to inverse problems,” Journal of Geophysics, vol. 42, pp. 489–506, 1977.
[7] G. Besnerais, J. Bercher, and G. Demoment, “A new look at entropy for solving linear inverse problems,” IEEE Trans. on Information Theory, vol. 45, no. 5, pp. 1565–1578, July 1999.
[8] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh, “Clustering with Bregman divergences,” Journal of Machine Learning Research, vol. 6, pp. 1705–1749, 2005.
[9] M. Brand, “Pattern discovery via entropy minimization,” in AISTATS 99, 1999.
[10] M. Shashanka, B. Raj, and P. Smaragdis, “Sparse overcomplete latent variable decomposition of counts data,” in NIPS, 2007.
[11] J. Kivinen and M. Warmuth, “Exponentiated gradient versus gradient descent for linear predictors,” Information and Computation, pp. 1–63, 1997.
[12] N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games. Cambridge University Press, 2006.
[13] R. Rifkin and R. Lippert, “Value regularization and Fenchel duality,” Journal of Machine Learning Research, vol. 8, pp. 441–479, 2007.
[14] D. Widder, Advanced Calculus, 2nd ed. Dover Publications, 1989.
[15] R. Duda, P. Hart, and D. Stork, Pattern classification. Wiley New York, 2001.
[16] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[17] K. Nigam, J. Lafferty, and A. McCallum, “Using maximum entropy for text classification,” 1999. [Online]. Available: citeseer.ist.psu.edu/article/nigam99using.html
[18] P. Y. Simard, D. Steinkraus, and J. C. Platt, “Best practices for convolutional neural networks applied to visual document analysis,” in ICDAR ’03: Proceedings of the Seventh International Conference on Document Analysis and Recognition. Washington, DC, USA: IEEE Computer Society, 2003, p. 958.
[19] D. M. Blei and J. D. McAuliffe, “Supervised topic models,” in NIPS 19, 2007.
[20] B. Pang and L. Lee, “Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales,” in Proceedings of the ACL, 2005, pp. 115–124.
[21] R. Salakhutdinov and G. Hinton, “Semantic hashing,” in SIGIR workshop on Information Retrieval and applications of Graphical Models, 2007.