nips nips2010 nips2010-188 nips2010-188-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Andrew Gelfand, Yutian Chen, Laurens van der Maaten, Max Welling
Abstract: The paper develops a connection between traditional perceptron algorithms and recently introduced herding algorithms. It is shown that both algorithms can be viewed as an application of the perceptron cycling theorem. This connection strengthens some herding results and suggests new (supervised) herding algorithms that, like CRFs or discriminative RBMs, make predictions by conditioning on the input attributes. We develop and investigate variants of conditional herding, and show that conditional herding leads to practical algorithms that perform better than or on par with related classifiers such as the voted perceptron and the discriminative RBM.
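The herding dynamics the abstract refers to admit a compact statement: given target moments phi_bar (empirical feature averages), repeatedly pick the state s_t = argmax_s w_t . phi(s) and update w_{t+1} = w_t + phi_bar - phi(s_t); the perceptron cycling theorem keeps the weights bounded, which forces the sample moments to match the targets at rate O(1/T). Below is a minimal sketch of this unconditional herding loop, assuming a small enumerable state space; the function name, toy feature matrix, and target mixture are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def herd(features, target_moments, num_iters=1000):
    """A minimal sketch of unconditional herding.

    features: (num_states, dim) array, one feature vector phi(s) per state.
    target_moments: (dim,) empirical feature averages to match.
    """
    w = np.zeros(features.shape[1])          # herding weights
    samples = []
    for _ in range(num_iters):
        # Maximization step: pick the state most aligned with the weights.
        s = np.argmax(features @ w)
        samples.append(s)
        # Weight update: move toward the target moments, away from phi(s).
        # Bounded ||w|| (via the perceptron cycling theorem) implies the
        # sample moments converge to the targets at O(1/T).
        w += target_moments - features[s]
    return np.array(samples)

# Toy usage (hypothetical data): 4 states with 2-dimensional features,
# targets chosen inside the convex hull of the feature vectors.
rng = np.random.default_rng(0)
phi = rng.normal(size=(4, 2))
target = 0.6 * phi[0] + 0.4 * phi[2]
samples = herd(phi, target, num_iters=5000)
print(phi[samples].mean(axis=0), "should approach", target)
```

The conditional variants studied in the paper additionally clamp the input attributes during the maximization step, so that predictions are made conditioned on the observed features.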
[1] C.M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
[2] H.D. Block and S.A. Levin. On the boundedness of an iterative procedure for solving a system of linear inequalities. Proceedings of the American Mathematical Society, 26(2):229–235, 1970.
[3] Y. Chen and M. Welling. Parametric herding. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010.
[4] M. Collins. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1–8. Association for Computational Linguistics, 2002.
[5] Y. Freund and R.E. Schapire. Large margin classification using the perceptron algorithm. Machine Learning, 37(3):277–296, 1999.
[6] A. Goetz. Perturbations of 8-attractors and births of satellite systems. International Journal of Bifurcation and Chaos in Applied Sciences and Engineering, 8(10):1937–1956, 1998.
[7] A. Goetz. Global properties of a family of piecewise isometries. Ergodic Theory and Dynamical Systems, 29(2):545–568, 2009.
[8] G.E. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14:1771–1800, 2002.
[9] E.T. Jaynes. Information theory and statistical mechanics. Physical Review, Series II, 106(4):620–663, 1957.
[10] H. Larochelle and Y. Bengio. Classification using discriminative Restricted Boltzmann Machines. In Proceedings of the 25th International Conference on Machine Learning, pages 536–543. ACM, 2008.
[11] M.L. Minsky and S. Papert. Perceptrons: An Introduction to Computational Geometry. MIT Press, Cambridge, MA, 1969.
[12] F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386–408, 1958.
[13] T. Tieleman. Training Restricted Boltzmann Machines using approximations to the likelihood gradient. In Proceedings of the 25th International Conference on Machine Learning, pages 1064–1071, 2008.
[14] M. Welling. Herding dynamic weights for partially observed random field models. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), Montreal, Canada, 2009.
[15] M. Welling. Herding dynamical weights to learn. In Proceedings of the 26th International Conference on Machine Learning, Montreal, Canada, 2009.
[16] M. Welling and Y. Chen. Statistical inference using weak chaos and infinite memory. In Proceedings of the International Workshop on Statistical-Mechanical Informatics (IW-SMI 2010), pages 185–199, 2010.
[17] L. Younes. Parametric inference for imperfectly observed Gibbsian fields. Probability Theory and Related Fields, 82:625–645, 1989.