nips nips2010 nips2010-235 nips2010-235-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: M. P. Kumar, Benjamin Packer, Daphne Koller
Abstract: Latent variable models are a powerful tool for addressing several tasks in machine learning. However, the algorithms for learning the parameters of latent variable models are prone to getting stuck in a bad local optimum. To alleviate this problem, we build on the intuition that, rather than considering all samples simultaneously, the algorithm should be presented with the training data in a meaningful order that facilitates learning. The order of the samples is determined by how easy they are. The main challenge is that often we are not provided with a readily computable measure of the easiness of samples. We address this issue by proposing a novel, iterative self-paced learning algorithm where each iteration simultaneously selects easy samples and learns a new parameter vector. The number of samples selected is governed by a weight that is annealed until the entire training data has been considered. We empirically demonstrate that the self-paced learning algorithm outperforms the state-of-the-art method for learning a latent structural SVM on four applications: object localization, noun phrase coreference, motif finding and handwritten digit recognition.
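The loop described in the abstract (select easy samples, refit the parameters, then relax the selection weight until every sample is included) can be illustrated with a minimal sketch. This is not the authors' latent structural SVM formulation. It is a toy under stated assumptions: the model is a single scalar `w`, the loss of a sample `x` is `(x - w) ** 2`, a sample counts as easy when its loss is below `threshold`, and the names `self_paced_mean`, `threshold`, `growth`, and `inner_iters` are hypothetical.

```python
from statistics import fmean


def self_paced_mean(samples, threshold, growth=2.0, inner_iters=10):
    """Toy self-paced estimate of a scalar location parameter w.

    Assumed setup (not from the paper): the loss of sample x under w is
    (x - w) ** 2, and a sample is "easy" when its loss is below `threshold`.
    Requires threshold > 0 and growth > 1 so the loop terminates.
    """
    w = sorted(samples)[len(samples) // 2]  # start from a median element
    history = []
    while True:
        for _ in range(inner_iters):
            # Step 1: select the samples that are easy under the current w.
            easy = [x for x in samples if (x - w) ** 2 < threshold]
            if not easy:
                break
            # Step 2: refit the parameter on the selected samples only.
            new_w = fmean(easy)
            if new_w == w:
                break  # selection and parameter have stopped changing
            w = new_w
        n_easy = sum((x - w) ** 2 < threshold for x in samples)
        history.append((threshold, n_easy, w))
        if n_easy == len(samples):
            return w, history
        # Anneal: relax the threshold so harder samples enter the next round.
        threshold *= growth


if __name__ == "__main__":
    data = [1.0, 1.2, 0.8, 1.1, 0.9, 10.0]  # five clustered points, one far away
    estimate, rounds = self_paced_mean(data, threshold=0.5)
    for thr, n_easy, w in rounds:
        print(f"threshold={thr:g} selected={n_easy} w={w:.3f}")
```

Because this toy objective is convex, the final estimate coincides with the ordinary mean once all samples are selected. The intermediate rounds are the point of the sketch: the far-away sample is excluded until the threshold has grown enough to admit it.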
[1] E. Allgower and K. Georg. Numerical continuation methods: An introduction. Springer-Verlag, 1990.
[2] M. Bazaraa, H. Sherali, and C. Shetty. Nonlinear Programming - Theory and Algorithms. John Wiley and Sons, Inc., 1993.
[3] Y. Bengio, J. Louradour, R. Collobert, and J. Weston. Curriculum learning. In ICML, 2009.
[4] M. Berger, G. Badis, A. Gehrke, S. Talukder, et al. Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences. Cell, 27, 2008.
[5] A. Blum and T. Mitchell. Combining labeled and unlabeled data with co-training. In COLT, 1998.
[6] D. Cohn, Z. Ghahramani, and M. Jordan. Active learning with statistical models. JAIR, 4:129–145, 1996.
[7] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005.
[8] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38, 1977.
[9] P. Felzenszwalb, D. McAllester, and D. Ramanan. A discriminatively trained, multiscale, deformable part model. In CVPR, 2008.
[10] T. Finley and T. Joachims. Supervised clustering with support vector machines. In ICML, 2005.
[11] C. Floudas and V. Visweswaran. Primal-relaxed dual global optimization approach. Journal of Optimization Theory and Applications, 78(2):187–225, 1993.
[12] A. Gelman, J. Carlin, H. Stern, and D. Rubin. Bayesian Data Analysis. Chapman and Hall, 1995.
[13] G. Heitz, G. Elidan, B. Packer, and D. Koller. Shape-based object localization for descriptive classification. IJCV, 2009.
[14] T. Joachims, T. Finley, and C.-N. Yu. Cutting-plane training of structural SVMs. Machine Learning, 77(1):27–59, 2009.
[15] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[16] V. Ng and C. Cardie. Improving machine learning approaches to coreference resolution. In ACL, 2002.
[17] K. Nigam and R. Ghani. Analyzing the effectiveness and applicability of co-training. In CIKM, 2000.
[18] P. Simard, B. Victorri, Y. LeCun, and J. Denker. Tangent Prop - a formalism for specifying selected invariances in an adaptive network. In NIPS, 1991.
[19] B. Sriperumbudur and G. Lanckriet. On the convergence of the concave-convex procedure. In NIPS Workshop on Optimization for Machine Learning, 2009.
[20] B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks. In NIPS, 2003.
[21] S. Tong and D. Koller. Support vector machine active learning with applications to text classification. JMLR, 2:45–66, 2001.
[22] I. Tsochantaridis, T. Hofmann, Y. Altun, and T. Joachims. Support vector machine learning for interdependent and structured output spaces. In ICML, 2004.
[23] C.-N. Yu and T. Joachims. Learning structural SVMs with latent variables. In ICML, 2009.
[24] A. Yuille and A. Rangarajan. The concave-convex procedure. Neural Computation, 15, 2003.
[25] K. Zhang, I. Tsang, and J. Kwok. Maximum margin clustering made practical. In ICML, 2007.