jmlr jmlr2010 jmlr2010-69 jmlr2010-69-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Fabian Sinz, Matthias Bethge
Abstract: In this paper, we introduce a new family of probability densities called $L_p$-nested symmetric distributions. The common property, shared by all members of the new class, is the same functional form $\rho(x) = \tilde{\rho}(f(x))$, where $f$ is a nested cascade of $L_p$-norms $\|x\|_p = (\sum |x_i|^p)^{1/p}$. $L_p$-nested symmetric distributions thereby are a special case of ν-spherical distributions, for which $f$ is only required to be positively homogeneous of degree one. While both ν-spherical and $L_p$-nested symmetric distributions contain many widely used families of probability models, such as the Gaussian, spherically and elliptically symmetric distributions, $L_p$-spherically symmetric distributions, and certain types of independent component analysis (ICA) and independent subspace analysis (ISA) models, ν-spherical distributions are usually computationally intractable. Here we demonstrate that $L_p$-nested symmetric distributions are still computationally feasible by deriving an analytic expression for their normalization constant, gradients for maximum likelihood estimation, analytic expressions for certain types of marginals, as well as an exact and efficient sampling algorithm. We discuss the tight links of $L_p$-nested symmetric distributions to well-known machine learning methods such as ICA, ISA and mixed norm regularizers, and introduce the nested radial factorization algorithm (NRF), which is a form of non-linear ICA that transforms any linearly mixed, non-factorial $L_p$-nested symmetric source into statistically independent signals. As a corollary, we also introduce the uniform distribution on the $L_p$-nested unit sphere. Keywords: parametric density model, symmetric distribution, ν-spherical distributions, non-linear independent component analysis, independent subspace analysis, robust Bayesian inference, mixed norm density model, uniform distributions on mixed norm spheres, nested radial factorization
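To make the phrase "nested cascade of $L_p$-norms" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical tree encoding in which a node is either a number (a leaf coordinate) or a pair `(p, children)`; the function names and the exponents 2 and 1 are illustrative choices only. It evaluates such a function on three coordinates and checks that it is positively homogeneous of degree one, as the abstract requires of $f$.

```python
import math


def lp_norm(values, p):
    """Plain L_p-norm (sum |v|^p)^(1/p) of a list of numbers."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def lp_nested(node):
    """Evaluate a nested cascade of L_p-norms on a tree.

    Assumed (hypothetical) encoding: a node is either a number, i.e. a
    leaf coordinate, or a pair (p, children) whose value is the L_p-norm
    of the values of its children.
    """
    if isinstance(node, (int, float)):
        return abs(float(node))
    p, children = node
    return lp_norm([lp_nested(child) for child in children], p)


def example_tree(x1, x2, x3, p_outer=2.0, p_inner=1.0):
    """Tree for f(x) = (|x1|^p0 + (|x2|^p1 + |x3|^p1)^(p0/p1))^(1/p0)."""
    return (p_outer, [x1, (p_inner, [x2, x3])])


# Inner node: |1| + |3| = 4; outer node: sqrt(3^2 + 4^2) = 5.
value = lp_nested(example_tree(3.0, 1.0, 3.0))

# Positive homogeneity of degree one: f(a * x) = a * f(x) for a > 0.
scaled = lp_nested(example_tree(6.0, 2.0, 6.0))
is_homogeneous = math.isclose(scaled, 2.0 * value)
```

Setting every exponent in the tree to the same value collapses the cascade to an ordinary $L_p$-norm, which is the sense in which the nested family extends the $L_p$-spherically symmetric case.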
References:
P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton Univ Pr, Dec 2007. ISBN 0691132984.
M. Bethge. Factorial coding of natural images: How effective are linear models in removing higher-order dependencies? J. Opt. Soc. Am. A, 23(6):1253–1268, June 2006.
A. Edelman, T. A. Arias, and S. T. Smith. The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl., 20(2):303–353, 1999. ISSN 0895-4798.
J. Eichhorn, F. Sinz, and M. Bethge. Natural image coding in V1: How much use is orientation selectivity? PLoS Comput Biol, 5(4), Apr 2009.
K. T. Fang, S. Kotz, and K. W. Ng. Symmetric Multivariate and Related Distributions. Chapman and Hall, New York, 1990.
C. Fernandez, J. Osiewalski, and M. F. J. Steel. Modeling and inference with ν-spherical distributions. Journal of the American Statistical Association, 90(432):1331–1340, Dec 1995. URL http://www.jstor.org/stable/2291523.
A. K. Gupta and D. Song. $L_p$-norm spherical distribution. Journal of Statistical Planning and Inference, 60:241–260, 1997.
A. E. Hoerl. Application of ridge analysis to regression problems. Chemical Engineering Progress, 58(3):54–59, 1962.
A. Hyvärinen and P. Hoyer. Emergence of phase and shift invariant features by decomposition of natural images into independent feature subspaces. Neural Comput., 12(7):1705–1720, 2000.
A. Hyvärinen and U. Köster. FastISA: A fast fixed-point algorithm for independent subspace analysis. In Proc. of ESANN, pages 371–376, 2006.
A. Hyvärinen and U. Köster. Complex cell pooling and the statistics of natural images. Network: Computation in Neural Systems, 18(2):81–100, 2007.
A. Hyvärinen and E. Oja. A fast fixed-point algorithm for independent component analysis. Neural Computation, 9(7):1483–1492, Oct 1997. doi: 10.1162/neco.1997.9.7.1483.
D. Kelker. Distribution theory of spherical distributions and a location-scale parameter generalization. Sankhya: The Indian Journal of Statistics, Series A, 32(4):419–430, Dec 1970. doi: 10.2307/25049690. URL http://www.jstor.org/stable/25049690.
M. Kowalski, E. Vincent, and R. Gribonval. Under-determined source separation via mixed-norm regularized minimization. In Proceedings of the European Signal Processing Conference, 2008.
T.-W. Lee and M. Lewicki. The generalized Gaussian mixture model using ICA. In P. Pajunen and J. Karhunen, editors, ICA'00, pages 239–244, Helsinki, Finland, June 2000.
M. S. Lewicki. Efficient coding of natural sounds. Nat Neurosci, 5(4):356–363, Apr 2002. doi: 10.1038/nn831.
M. S. Lewicki and B. A. Olshausen. Probabilistic framework for the adaptation and comparison of image codes. J. Opt. Soc. Am. A, 16:1587–1601, 1999.
S. Lyu and E. P. Simoncelli. Nonlinear extraction of independent components of natural images using radial Gaussianization. Neural Computation, 21(6):1485–1519, Jun 2009. doi: 10.1162/neco.2009.04-08-773.
J. H. Manton. Optimization algorithms exploiting unitary constraints. IEEE Transactions on Signal Processing, 50:635–650, 2002.
B. A. Olshausen and D. J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:560–561, 1996.
J. Osiewalski and M. F. J. Steel. Robust Bayesian inference in $l_q$-spherical models. Biometrika, 80(2):456–460, Jun 1993. URL http://www.jstor.org/stable/2337215.
M. W. Seeger. Bayesian inference and optimal design for the sparse linear model. Journal of Machine Learning Research, 9:759–813, 04 2008. URL http://www.jmlr.org/papers/volume9/seeger08a/seeger08a.pdf.
E. P. Simoncelli. Statistical models for images: compression, restoration and synthesis. In Conference Record of the Thirty-First Asilomar Conference on Signals, Systems & Computers, volume 1, pages 673–678, 1997. doi: 10.1109/ACSSC.1997.680530.
F. Sinz and M. Bethge. The conjoint effect of divisive normalization and orientation selectivity on redundancy reduction. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Twenty-Second Annual Conference on Neural Information Processing Systems, pages 1521–1528, Red Hook, NY, USA, 06 2009. Curran. URL http://nips.cc/Conferences/2008/.
F. Sinz, S. Gerwinn, and M. Bethge. Characterization of the p-generalized normal distribution. Journal of Multivariate Analysis, 100(5):817–820, May 2009a. doi: 10.1016/j.jmva.2008.07.006.
F. Sinz, E. P. Simoncelli, and M. Bethge. Hierarchical modeling of local image features through $L_p$-nested symmetric distributions. In Twenty-Third Annual Conference on Neural Information Processing Systems, pages 1–9, 12 2009b. URL http://nips.cc/Conferences/2009/.
D. Song and A. K. Gupta. $L_p$-norm uniform distribution. Proceedings of the American Mathematical Society, 125:595–601, 1997.
R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1):267–288, 1996. ISSN 00359246. URL http://www.jstor.org/stable/2346178.
M. J. Wainwright and E. P. Simoncelli. Scale mixtures of Gaussians and the statistics of natural images. In S. A. Solla, T. K. Leen, and K.-R. Müller, editors, Adv. Neural Information Processing Systems (NIPS*99), volume 12, pages 855–861, Cambridge, MA, May 2000. MIT Press.
M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society Series B, 68(1):49–67, 2006.
C. Zetzsche, B. Wegmann, and E. Barth. Nonlinear aspects of primary vision: entropy reduction beyond decorrelation. In Int'l Symposium, Soc. for Information Display, volume XXIV, pages 933–936, 1993.
L. Zhang, A. Cichocki, and S. Amari. Self-adaptive blind source separation based on activation functions adaptation. Neural Networks, IEEE Transactions on, 15:233–244, 2004.
P. Zhao, G. Rocha, and B. Yu. Grouped and hierarchical model selection through composite absolute penalties. Annals of Statistics, 2008.