
235 nips-2010-Self-Paced Learning for Latent Variable Models

Source: pdf

Author: M. P. Kumar, Benjamin Packer, Daphne Koller

Abstract: Latent variable models are a powerful tool for addressing several tasks in machine learning. However, the algorithms for learning the parameters of latent variable models are prone to getting stuck in a bad local optimum. To alleviate this problem, we build on the intuition that, rather than considering all samples simultaneously, the algorithm should be presented with the training data in a meaningful order that facilitates learning. The order of the samples is determined by how easy they are. The main challenge is that often we are not provided with a readily computable measure of the easiness of samples. We address this issue by proposing a novel, iterative self-paced learning algorithm where each iteration simultaneously selects easy samples and learns a new parameter vector. The number of samples selected is governed by a weight that is annealed until the entire training data has been considered. We empirically demonstrate that the self-paced learning algorithm outperforms the state-of-the-art method for learning a latent structural SVM on four applications: object localization, noun phrase coreference, motif finding and handwritten digit recognition.
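The loop described in the abstract (select easy samples, refit, relax the selection until all data is used) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and not taken from the text above: the toy one-parameter regression learner, the rule that a sample is "easy" when its squared loss falls below a threshold, the geometric growth of that threshold, and the names fit_slope and self_paced_fit. It is not the latent structural SVM formulation the paper evaluates.

```python
def fit_slope(points, ridge=1e-3):
    """Ridge-regularized least-squares slope for y ~ w * x (toy learner, assumed)."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / (sxx + ridge)


def self_paced_fit(points, threshold, growth=1.5, max_rounds=100):
    """Alternate between picking easy samples and refitting on them.

    A sample counts as easy when its squared loss under the current
    parameter is below `threshold` (an assumed easiness rule). The
    threshold is multiplied by `growth` each round, so more samples are
    admitted over time until the whole training set is in use.
    """
    w = fit_slope(points)  # assumed initialization: fit on all samples
    counts = []            # number of samples selected in each round
    for _ in range(max_rounds):
        losses = [(w * x - y) ** 2 for x, y in points]
        easy = [p for p, loss in zip(points, losses) if loss < threshold]
        if not easy:
            # Always keep at least the single easiest sample.
            easy = [points[losses.index(min(losses))]]
        w = fit_slope(easy)
        counts.append(len(easy))
        if len(easy) == len(points):
            break          # entire training set has been considered
        threshold *= growth
    return w, counts


# Toy data: eight points on y = 2x plus two hard outliers (hypothetical).
data = [(float(x), 2.0 * x) for x in range(1, 9)] + [(1.0, 9.0), (2.0, -5.0)]
w, counts = self_paced_fit(data, threshold=0.5)
print(w, counts)
```

In this sketch the outliers are admitted only after the threshold has grown, so the early rounds fit the clean points alone; the final round refits on every sample, mirroring the abstract's statement that the weight is annealed until the entire training data has been considered.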

reference text

[1] E. Allgower and K. Georg. Numerical continuation methods: An introduction. Springer-Verlag, 1990.

[2] M. Bazaraa, H. Sherali, and C. Shetty. Nonlinear Programming - Theory and Algorithms. John Wiley and Sons, Inc., 1993.

[3] Y. Bengio, J. Louradour, R. Collobert, and J. Weston. Curriculum learning. In ICML, 2009.

[4] M. Berger, G. Badis, A. Gehrke, S. Talukder, et al. Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences. Cell, 27, 2008.

[5] A. Blum and T. Mitchell. Combining labeled and unlabeled data with co-training. In COLT, 1998.

[6] D. Cohn, Z. Ghahramani, and M. Jordan. Active learning with statistical models. JAIR, 4:129–145, 1996.

[7] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005.

[8] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39(1):1–38, 1977.

[9] P. Felzenszwalb, D. McAllester, and D. Ramanan. A discriminatively trained, multiscale, deformable part model. In CVPR, 2008.

[10] T. Finley and T. Joachims. Supervised clustering with support vector machines. In ICML, 2005.

[11] C. Floudas and V. Visweswaran. Primal-relaxed dual global optimization approach. Journal of Optimization Theory and Applications, 78(2):187–225, 1993.

[12] A. Gelman, J. Carlin, H. Stern, and D. Rubin. Bayesian Data Analysis. Chapman and Hall, 1995.

[13] G. Heitz, G. Elidan, B. Packer, and D. Koller. Shape-based object localization for descriptive classification. IJCV, 2009.

[14] T. Joachims, T. Finley, and C.-N. Yu. Cutting-plane training for structural SVMs. Machine Learning, 77(1):27–59, 2009.

[15] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[16] V. Ng and C. Cardie. Improving machine learning approaches to coreference resolution. In ACL, 2002.

[17] K. Nigam and R. Ghani. Analyzing the effectiveness and applicability of co-training. In CIKM, 2000.

[18] P. Simard, B. Victorri, Y. LeCun, and J. Denker. Tangent Prop - a formalism for specifying selected invariances in an adaptive network. In NIPS, 1991.

[19] B. Sriperumbudur and G. Lanckriet. On the convergence of the concave-convex procedure. In NIPS Workshop on Optimization for Machine Learning, 2009.

[20] B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks. In NIPS, 2003.

[21] S. Tong and D. Koller. Support vector machine active learning with applications to text classification. JMLR, 2:45–66, 2001.

[22] I. Tsochantaridis, T. Hofmann, Y. Altun, and T. Joachims. Support vector machine learning for interdependent and structured output spaces. In ICML, 2004.

[23] C.-N. Yu and T. Joachims. Learning structural SVMs with latent variables. In ICML, 2009.

[24] A. Yuille and A. Rangarajan. The concave-convex procedure. Neural Computation, 15, 2003.

[25] K. Zhang, I. Tsang, and J. Kwok. Maximum margin clustering made practical. In ICML, 2007.