
Maximal Causes for Non-linear Component Extraction



Authors: Jörg Lücke, Maneesh Sahani

Abstract: We study a generative model in which hidden causes combine competitively to produce observations. Multiple active causes combine to determine the value of an observed variable through a max function, in the place where algorithms such as sparse coding, independent component analysis, or non-negative matrix factorization would use a sum. This max rule can represent a more realistic model of non-linear interaction between basic components in many settings, including acoustic and image data. While exact maximum-likelihood learning of the parameters of this model proves to be intractable, we show that efficient approximations to expectation-maximization (EM) can be found in the case of sparsely active hidden causes. One of these approximations can be formulated as a neural network model with a generalized softmax activation function and Hebbian learning. Thus, we show that learning in recent softmax-like neural networks may be interpreted as approximate maximization of a data likelihood. We use the bars benchmark test to numerically verify our analytical results and to demonstrate the competitiveness of the resulting algorithms. Finally, we show results of learning model parameters to fit acoustic and visual data sets in which max-like component combinations arise naturally.

Keywords: component extraction, maximum likelihood, approximate EM, competitive learning, neural networks
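To make the combination rule concrete, the following minimal sketch (not the authors' code) generates bars-test images in which the active hidden causes combine through an element-wise max instead of the sum used by linear models such as sparse coding or NMF. The grid size, the sparse Bernoulli prior on the causes, and the Gaussian observation noise are illustrative assumptions, not parameters taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

D = 5                # bars test: D x D pixel grid (assumed size)
H = 2 * D            # one hidden cause per horizontal and vertical bar
pi = 2.0 / H         # sparse prior: about two active causes per image (assumption)

# Generative field of each cause: a single horizontal or vertical bar.
W = np.zeros((H, D, D))
for i in range(D):
    W[i, i, :] = 1.0        # horizontal bars
    W[D + i, :, i] = 1.0    # vertical bars
W = W.reshape(H, D * D)

def generate(n_samples, noise_sigma=0.1):
    """Draw images whose active causes combine via an element-wise max."""
    S = rng.random((n_samples, H)) < pi                       # binary hidden causes
    # max combination: each pixel takes the largest contribution of any active cause;
    # a linear model (sparse coding / ICA / NMF) would instead use the sum S @ W.
    mean = np.where(S[:, :, None], W[None, :, :], 0.0).max(axis=1)
    return mean + noise_sigma * rng.standard_normal(mean.shape)

Y = generate(1000)
print(Y.shape)   # (1000, 25)

Because overlapping bars occlude rather than add, the max rule keeps overlapping pixels at the bar intensity instead of doubling them, which is the non-linear interaction the model is designed to capture.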


References

A. J. Bell and T. J. Sejnowski. The "independent components" of natural scenes are edge filters. Vision Research, 37(23):3327–3338, 1997.

N. J. Butko and J. Triesch. Learning sensory representations with intrinsic plasticity. Neurocomputing, 70(7-9):1130–1138, 2007.

D. Charles and C. Fyfe. Modelling multiple-cause structure using rectification constraints. Network: Computation in Neural Systems, 9:167–182, 1998.

D. Charles, C. Fyfe, D. MacDonald, and J. Koetsier. Unsupervised neural networks for the identification of minimum overcomplete basis in visual data. Neurocomputing, 47(1-4):119–143, 2002.

P. Comon. Independent component analysis, a new concept? Signal Processing, 36(3):287–314, 1994.

P. Dayan and L. F. Abbott. Theoretical Neuroscience. MIT Press, Cambridge, 2001.

P. Dayan and R. S. Zemel. Competition and multiple cause models. Neural Computation, 7:565–579, 1995.

A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society B, 39:1–38, 1977.

B. S. Everitt. An Introduction to Latent Variable Models. Chapman and Hall, 1984.

P. Földiák. Forming sparse representations by local anti-Hebbian learning. Biological Cybernetics, 64:165–170, 1990.

C. Fyfe. A neural network for PCA and beyond. Neural Processing Letters, 6:33–41, 1997.

N. M. Grzywacz and A. L. Yuille. A model for the estimate of local image velocity by cells in the visual cortex. Proceedings of the Royal Society of London B, 239:129–161, 1990.

G. F. Harpur and R. W. Prager. Development of low entropy coding in a recurrent network. Network: Computation in Neural Systems, 7:277–284, 1996.

G. E. Hinton and Z. Ghahramani. Generative models for discovering sparse distributed representations. Philosophical Transactions of the Royal Society B, 352:1177–1190, 1997.

G. E. Hinton, P. Dayan, B. J. Frey, and R. M. Neal. The 'wake-sleep' algorithm for unsupervised neural networks. Science, 268:1158–1161, 1995.

S. Hochreiter and J. Schmidhuber. Feature extraction through LOCOCODE. Neural Computation, 11:679–714, 1999.

P. O. Hoyer. Non-negative sparse coding. In Neural Networks for Signal Processing XII: Proceedings of the IEEE Workshop on Neural Networks for Signal Processing, pages 557–565, 2002.

P. O. Hoyer. Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research, 5:1457–1469, 2004.

M. I. Jordan, Z. Ghahramani, T. Jaakkola, and L. K. Saul. An introduction to variational methods for graphical models. Machine Learning, 37(2):183–233, 1999.

D. D. Lee and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401:788–791, 1999.

D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems 13, pages 556–562, 2001.

J. Lücke. Hierarchical self-organization of minicolumnar receptive fields. Neural Networks, 17(8-9):1377–1389, 2004.

J. Lücke. A dynamical model for receptive field self-organization in V1 cortical columns. In Proceedings of the International Conference on Artificial Neural Networks, LNCS 4669, pages 389–398. Springer, 2007.

J. Lücke and J. D. Bouecke. Dynamics of cortical columns – self-organization of receptive fields. In Proceedings of the International Conference on Artificial Neural Networks, LNCS 3696, pages 31–37. Springer, 2005.

J. Lücke and M. Sahani. Generalized softmax networks for non-linear component extraction. In Proceedings of the International Conference on Artificial Neural Networks, LNCS 4668, pages 657–667. Springer, 2007.

J. Lücke and C. von der Malsburg. Rapid processing and unsupervised learning in a model of the cortical macrocolumn. Neural Computation, 16:501–533, 2004.

G. McLachlan and D. Peel. Finite Mixture Models. Wiley, 2000.

R. Neal and G. Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants. In M. I. Jordan, editor, Learning in Graphical Models. Kluwer, 1998.

S. J. Nowlan. Maximum likelihood competitive learning. In Advances in Neural Information Processing Systems 2, pages 574–582, 1990.

B. A. Olshausen and D. J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607–609, 1996.

R. C. O'Reilly. Generalization in interactive networks: The benefits of inhibitory competition and Hebbian learning. Neural Computation, 13:1199–1241, 2001.

R. P. N. Rao, B. A. Olshausen, and M. S. Lewicki, editors. Probabilistic Models of the Brain: Perception and Neural Function. Neural Information Processing. The MIT Press, Cambridge, MA, 2002.

M. Riesenhuber and T. Poggio. Hierarchical models of object recognition in cortex. Nature Neuroscience, 2(11):1019–1025, 1999.

S. T. Roweis. Factorial models and refiltering for speech separation and denoising. In Proceedings of Eurospeech, volume 7, pages 1009–1012, 2003.

M. Sahani. Latent Variable Models for Neural Data Analysis. PhD thesis, California Institute of Technology, Pasadena, California, 1999. URL http://www.gatsby.ucl.ac.uk/~maneesh/thesis/.

E. Saund. A multiple cause mixture model for unsupervised learning. Neural Computation, 7:51–71, 1995.

M. W. Spratling. Learning image components for object recognition. Journal of Machine Learning Research, 7:793–815, 2006.

M. W. Spratling and M. H. Johnson. Preintegration lateral inhibition enhances unsupervised learning. Neural Computation, 14:2157–2179, 2002.

Sunsite. Sun-sounds: Phonemes. Data retrieved in 2007 through ibiblio.org from ftp://sunsite.unc.edu/pub/multimedia/sun-sounds/phonemes/, 1997.

N. Ueda and R. Nakano. Deterministic annealing EM algorithm. Neural Networks, 11(2):271–282, 1998.

J. H. van Hateren and A. van der Schaaf. Independent component filters of natural images compared with simple cells in primary visual cortex. Proceedings of the Royal Society of London B, 265:359–366, 1998.

Y. Weiss. Phase transitions and the perceptual organization of video sequences. In Advances in Neural Information Processing Systems 10, pages 850–856, 1998.

H. Wersing and E. Körner. Learning optimized features for hierarchical models of invariant object recognition. Neural Computation, 15(7):1559–1588, 2003.

A. L. Yuille and D. Geiger. Winner-take-all networks. In M. A. Arbib, editor, The Handbook of Brain Theory and Neural Networks, pages 1228–1231. MIT Press, 2003.