
A Convergence Analysis of Log-Linear Training (NIPS 2011)



Author: Simon Wiesler, Hermann Ney

Abstract: Log-linear models are widely used probability models for statistical pattern recognition. Typically, log-linear models are trained according to a convex criterion. In recent years, the interest in log-linear models has greatly increased. The optimization of log-linear model parameters is costly and therefore an important topic, in particular for large-scale applications. Different optimization algorithms have been evaluated empirically in many papers. In this work, we analyze the optimization problem analytically and show that the training of log-linear models can be highly ill-conditioned. We verify our findings on two handwriting tasks. By making use of our convergence analysis, we obtain good results on a large-scale continuous handwriting recognition task with a simple and generic approach.
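To make the ill-conditioning claim concrete, here is a minimal sketch (not from the paper; the synthetic data and the badly scaled feature are assumptions made for illustration). For binary logistic regression, the simplest log-linear model, the Hessian of the average negative log-likelihood is H(w) = (1/N) X^T diag(p_i(1 - p_i)) X with p_i = sigmoid(w . x_i), and the condition number of H governs the convergence rate of gradient-based training. A single feature on a much larger scale than the others inflates it:

```python
import numpy as np

# Synthetic design matrix (assumption for illustration): one feature
# lives on a scale ~100x larger than the rest.
rng = np.random.default_rng(0)
N, D = 500, 5
X = rng.normal(size=(N, D))
X[:, 0] *= 100.0

def logistic_hessian(w, X):
    """Hessian of the average negative log-likelihood of binary
    logistic regression: (1/N) X^T diag(p(1-p)) X."""
    p = 1.0 / (1.0 + np.exp(-X @ w))   # model posteriors p(y=1 | x)
    return (X * (p * (1 - p))[:, None]).T @ X / len(X)

w0 = np.zeros(D)
print("condition number:", np.linalg.cond(logistic_hessian(w0, X)))

# Standardizing the features shrinks the condition number by orders
# of magnitude -- the kind of preconditioning effect that a
# convergence analysis of log-linear training quantifies.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
print("after scaling:   ", np.linalg.cond(logistic_hessian(w0, Xs)))
```

At w = 0 the Hessian reduces to X^T X / (4N), so the first condition number is on the order of 100^2 while the second is close to 1; this is one simple mechanism, under the assumptions above, by which log-linear training can become highly ill-conditioned.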


References

[1] Bertolami, R., Bunke, H.: HMM-based Ensemble Methods for Offline Handwritten Text Line Recognition. Pattern Recogn. 41, 3452–3460 (2008)

[2] Bottou, L., Bousquet, O.: The tradeoffs of large scale learning. In: Advances in Neural Information Processing Systems. pp. 161–168 (2008)

[3] Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press (2004)

[4] Darroch, J., Ratcliff, D.: Generalized Iterative Scaling for Log-Linear Models. Ann. Math. Stat. 43(5), 1470–1480 (1972)

[5] Dreuw, P., Heigold, G., Ney, H.: Confidence- and Margin-Based MMI/MPE Discriminative Training for Off-Line Handwriting Recognition. Int. J. Doc. Anal. Recogn. pp. 1–16 (2011)

[6] España-Boquera, S., Castro-Bleda, M., Gorbe-Moya, J., Zamora-Martinez, F.: Improving Offline Handwritten Text Recognition with Hybrid HMM/ANN Models. IEEE Trans. Pattern Anal. Mach. Intell. 33(4), 767–779 (April 2011)

[7] Graves, A., Liwicki, M., Fernandez, S., Bertolami, R., Bunke, H., Schmidhuber, J.: A Novel Connectionist System for Unconstrained Handwriting Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 31(5), 855–868 (May 2009)

[8] Horn, R., Johnson, C.: Topics in Matrix Analysis. Cambridge University Press (1994)

[9] Horn, R., Johnson, C.: Matrix Analysis. Cambridge University Press (2005)

[10] Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: Proceedings of the 18th International Conference on Machine Learning. pp. 282–289 (2001)

[11] LeCun, Y., Kanter, I., Solla, S.: Second order properties of error surfaces: Learning time and generalization. In: Advances in Neural Information Processing Systems. pp. 918–924. Morgan Kaufmann Publishers Inc. (1990)

[12] Liu, D., Nocedal, J.: On the Limited Memory BFGS Method for Large-Scale Optimization. Math. Program. 45(1), 503–528 (1989)

[13] Luenberger, D., Ye, Y.: Linear and Nonlinear Programming. Springer Verlag (2008)

[14] Malouf, R.: A comparison of algorithms for maximum entropy parameter estimation. In: Proceedings of the Sixth Conference on Natural Language Learning. pp. 49–55 (2002)

[15] Marti, U., Bunke, H.: The IAM-Database: An English Sentence Database for Offline Handwriting Recognition. Int. J. Doc. Anal. Recogn. 5(1), 39–46 (2002)

[16] McCallum, A., Freitag, D., Pereira, F.: Maximum entropy Markov models for information extraction and segmentation. In: Proceedings of the 17th International Conference on Machine Learning. pp. 591–598 (2000)

[17] Minka, T.: Algorithms for maximum-likelihood logistic regression. Tech. rep., Carnegie Mellon University (2001)

[18] Nocedal, J., Wright, S.: Numerical Optimization. Springer (1999)

[19] Notay, Y.: Solving positive (semi)definite linear systems by preconditioned iterative methods. In: Preconditioned Conjugate Gradient Methods, Lecture Notes in Mathematics, vol. 1457, pp. 105–125. Springer (1990)

[20] Salakhutdinov, R., Roweis, S., Ghahramani, Z.: On the convergence of bound optimization algorithms. In: Uncertainty in Artificial Intelligence. vol. 19, pp. 509–516 (2003)

[21] Sha, F., Pereira, F.: Shallow parsing with conditional random fields. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology. pp. 134–141 (2003)

[22] Sutton, C., McCallum, A.: An introduction to conditional random fields for relational learning. In: Getoor, L., Taskar, B. (eds.) Introduction to Statistical Relational Learning. MIT Press (2007)