
50 nips-2005-Convex Neural Networks


Source: pdf

Author: Yoshua Bengio, Nicolas Le Roux, Pascal Vincent, Olivier Delalleau, Patrice Marcotte

Abstract: Convexity has recently received a lot of attention in the machine learning community, and the lack of convexity has been seen as a major disadvantage of many learning algorithms, such as multi-layer artificial neural networks. We show that training multi-layer neural networks in which the number of hidden units is learned can be viewed as a convex optimization problem. This problem involves an infinite number of variables, but can be solved by incrementally inserting a hidden unit at a time, each time finding a linear classifier that minimizes a weighted sum of errors.
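The incremental procedure named in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm as published; it rests on several assumptions that do not appear in the text above: a squared loss, sign-activation hidden units, a random search (the hypothetical `find_unit`) standing in for an exact weighted linear classifier, and an unregularized stagewise update of the output weight. All function names are illustrative.

```python
import random


def sign(z):
    """Sign activation with sign(0) taken as +1."""
    return 1.0 if z >= 0 else -1.0


def hidden(w, b, x):
    """Output of one hidden unit: sign(w . x + b)."""
    return sign(sum(wi * xi for wi, xi in zip(w, x)) + b)


def weighted_error(w, b, X, targets, weights):
    """Sum of the weights of the points the linear classifier gets wrong."""
    return sum(c for x, s, c in zip(X, targets, weights) if hidden(w, b, x) != s)


def find_unit(X, targets, weights, rng, n_candidates=200):
    """Assumed stand-in: random search for a linear classifier with low
    weighted error. An exact or better heuristic solver could replace it."""
    dim = len(X[0])
    best = None
    for _ in range(n_candidates):
        w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        b = rng.gauss(0.0, 1.0)
        err = weighted_error(w, b, X, targets, weights)
        if best is None or err < best[0]:
            best = (err, w, b)
    return best[1], best[2]


def mse(y, preds):
    return sum((yt - pt) ** 2 for yt, pt in zip(y, preds)) / len(y)


def fit_incremental(X, y, n_units=5, seed=0):
    """Insert hidden units one at a time (squared loss assumed)."""
    rng = random.Random(seed)
    n = len(X)
    units = []                      # list of (output weight, w, b)
    preds = [0.0] * n
    losses = [mse(y, preds)]
    for _ in range(n_units):
        # For squared loss, the residual is the negative gradient of the loss.
        residual = [yt - pt for yt, pt in zip(y, preds)]
        # Weighted classification problem: target sign(r_t), cost |r_t|.
        targets = [sign(r) for r in residual]
        weights = [abs(r) for r in residual]
        w, b = find_unit(X, targets, weights, rng)
        h = [hidden(w, b, x) for x in X]
        # Least-squares output weight for the new unit alone (h_t is +/-1).
        alpha = sum(r * ht for r, ht in zip(residual, h)) / n
        units.append((alpha, w, b))
        preds = [p + alpha * ht for p, ht in zip(preds, h)]
        losses.append(mse(y, preds))
    return units, losses
```

With this update rule the training loss cannot increase when a unit is added, because the new output weight is the least-squares optimum along the new unit's direction. Calling `fit_incremental([[0.0], [1.0], [2.0], [3.0]], [0.0, 1.0, 1.0, 0.0])` returns the inserted units and the loss after each insertion.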


reference text

Chvátal, V. (1983). Linear Programming. W.H. Freeman.

Delalleau, O., Bengio, Y., and Le Roux, N. (2005). Efficient non-parametric function induction in semi-supervised learning. In Cowell, R. and Ghahramani, Z., editors, Proceedings of AISTATS’2005, pages 96–103.

Freund, Y. and Schapire, R. E. (1997). A decision theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139.

Friedman, J. (2001). Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29:1180.

Hettich, R. and Kortanek, K. (1993). Semi-infinite programming: theory, methods, and applications. SIAM Review, 35(3):380–429.

Marcotte, P. and Savard, G. (1992). Novel approaches to the discrimination problem. Zeitschrift für Operations Research (Theory), 36:517–545.

Mason, L., Baxter, J., Bartlett, P. L., and Frean, M. (2000). Boosting algorithms as gradient descent. In Advances in Neural Information Processing Systems 12, pages 512–518.

Rätsch, G., Demiriz, A., and Bennett, K. P. (2002). Sparse regression ensembles in infinite and finite hypothesis spaces. Machine Learning.

Rumelhart, D., Hinton, G., and Williams, R. (1986). Learning representations by back-propagating errors. Nature, 323:533–536.