
204 nips-2013-Multiscale Dictionary Learning for Estimating Conditional Distributions


Source: pdf

Author: Francesca Petralia, Joshua T. Vogelstein, David Dunson

Abstract: Nonparametric estimation of the conditional distribution of a response given high-dimensional features is a challenging problem. It is important to allow not only the mean but also the variance and shape of the response density to change flexibly with features, which are massive-dimensional. We propose a multiscale dictionary learning model, which expresses the conditional response density as a convex combination of dictionary densities, with the densities used and their weights dependent on the path through a tree decomposition of the feature space. A fast graph partitioning algorithm is applied to obtain the tree decomposition, with Bayesian methods then used to adaptively prune and average over different sub-trees in a soft probabilistic manner. The algorithm scales efficiently to approximately one million features. State-of-the-art predictive performance is demonstrated on toy examples and two neuroscience applications involving up to a million features.
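The core mechanism described in the abstract lends itself to a compact illustration. Below is a minimal sketch, not the authors' implementation: it evaluates f(y | x) as a convex combination of Gaussian dictionary densities collected along the root-to-leaf path that x follows through a tree partition of feature space, with fixed stick-breaking weights standing in for the paper's soft Bayesian pruning and averaging. The Node and conditional_density names, the Gaussian dictionary choice, and the axis-aligned splits (the paper instead builds the tree with a fast graph partitioning algorithm [17]) are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm


class Node:
    """One node of the tree partition: a dictionary density plus a stick-breaking weight."""
    def __init__(self, mu, sigma, v, split=None, left=None, right=None):
        self.mu, self.sigma = mu, sigma  # Gaussian dictionary density at this node (assumption)
        self.v = v                       # stick-breaking fraction in (0, 1)
        self.split = split               # (feature index, threshold); None marks a leaf
        self.left, self.right = left, right


def conditional_density(y, x, root):
    """f(y | x) as a convex combination of the dictionary densities on x's path."""
    density, stick = 0.0, 1.0
    node = root
    while node.split is not None:
        # each internal node on the path claims a fraction of the remaining stick,
        # acting as a soft probabilistic pruning of deeper levels
        w = node.v * stick
        density += w * norm.pdf(y, node.mu, node.sigma)
        stick *= 1.0 - node.v
        j, t = node.split
        node = node.left if x[j] <= t else node.right
    # the leaf absorbs the remaining weight, so the path weights sum to one
    return density + stick * norm.pdf(y, node.mu, node.sigma)


# Toy usage: a depth-one tree that splits on feature 0 at threshold 0.5.
leaf_lo = Node(mu=-1.0, sigma=0.5, v=1.0)
leaf_hi = Node(mu=2.0, sigma=1.0, v=1.0)
root = Node(mu=0.0, sigma=2.0, v=0.3, split=(0, 0.5), left=leaf_lo, right=leaf_hi)
print(conditional_density(1.5, np.array([0.8]), root))
```

Because the leaf absorbs the leftover stick, the weights along any path sum to one, so the mixture remains a valid density regardless of tree depth; the paper makes these weights (and hence the effective depth) random and infers them posteriorly rather than fixing them as here.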


reference text

[1] I. U. Rahman, I. Drori, V. C. Stodden, and D. L. Donoho. Multiscale representations for manifold-valued data. Multiscale Modeling & Simulation, 4:1201–1232, 2005.

[2] W.K. Allard, G. Chen, and M. Maggioni. Multiscale geometric methods for data sets II: geometric wavelets. Applied and Computational Harmonic Analysis, 32:435–462, 2012.

[3] R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton. Adaptive mixtures of local experts. Neural Computation, 3:79–87, 1991.

[4] W. X. Jiang and M. A. Tanner. Hierarchical mixtures-of-experts for exponential family regression models: approximation and maximum likelihood estimation. Annals of Statistics, 27:987–1011, 1999.

[5] J. Q. Fan, Q. W. Yao, and H. Tong. Estimation of conditional densities and sensitivity measures in nonlinear dynamical systems. Biometrika, 83:189–206, 1996.

[6] M. P. Holmes, G. A. Gray, and C. L. Isbell. Fast kernel conditional density estimation: a dual-tree Monte Carlo approach. Computational Statistics & Data Analysis, 54:1707–1718, 2010.

[7] G. Fu, F. Y. Shih, and H. Wang. A kernel-based parametric method for conditional density estimation. Pattern Recognition, 44:284–294, 2011.

[8] D. J. Nott, S. L. Tan, M. Villani, and R. Kohn. Regression density estimation with variational methods and stochastic approximation. Journal of Computational and Graphical Statistics, 21:797–820, 2012.

[9] M. N. Tran, D. J. Nott, and R. Kohn. Simultaneous variable selection and component selection for regression density estimation with mixtures of heteroscedastic experts. Electronic Journal of Statistics, 6:1170–1199, 2012.

[10] A. Norets and J. Pelenis. Bayesian modeling of joint and conditional distributions. Journal of Econometrics, 168:332–346, 2012.

[11] J. E. Griffin and M. F. J. Steel. Order-based dependent Dirichlet processes. Journal of the American Statistical Association, 101:179–194, 2006.

[12] D. B. Dunson, N. Pillai, and J. H. Park. Bayesian density regression. Journal of the Royal Statistical Society Series B-Statistical Methodology, 69:163–183, 2007.

[13] Y. Chung and D. B. Dunson. Nonparametric Bayes conditional distribution modeling with variable selection. Journal of the American Statistical Association, 104:1646–1660, 2009.

[14] S. T. Tokdar, Y. M. Zhu, and J. K. Ghosh. Bayesian density regression with logistic Gaussian process and subspace projection. Bayesian Analysis, 5:319–344, 2010.

[15] I. Mossavat and O. Amft. Sparse Bayesian hierarchical mixture of experts. In IEEE Statistical Signal Processing Workshop (SSP), 2011.

[16] Isabelle Guyon and André Elisseeff. An introduction to variable and feature selection. The Journal of Machine Learning Research, 3:1157–1182, 2003.

[17] G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20(1):359–392, 1999.

[18] G. Chen, M. Iwen, S. Chin, and M. Maggioni. A fast multiscale framework for data in high-dimensions: Measure estimation, anomaly detection, and compressive measurements. In 2012 IEEE Visual Communications and Image Processing (VCIP), 2012.

[19] Ingrid Daubechies. Ten Lectures on Wavelets (CBMS-NSF Regional Conference Series in Applied Mathematics). SIAM: Society for Industrial and Applied Mathematics, 1992.

[20] J. Sethuraman. A constructive definition of Dirichlet priors. Statistica Sinica, 4:639–650, 1994.

[21] Didier Chauveau and Jean Diebolt. An automated stopping rule for MCMC convergence assessment. Computational Statistics, 14:419–442, 1998.

[22] R. Arden, R. S. Chavez, R. Grazioplene, and R. E. Jung. Neuroimaging creativity: a psychometric view. Behavioural Brain Research, 214:143–156, 2010.

[23] R. E. Jung, R. Grazioplene, A. Caprihan, R. S. Chavez, and R. J. Haier. White matter integrity, creativity, and psychopathology: Disentangling constructs with diffusion tensor imaging. PLoS ONE, 5(3):e9818, 2010.

[24] W. R. Gray, J. A. Bogovic, J. T. Vogelstein, B. A. Landman, J. L. Prince, and R. J. Vogelstein. Magnetic resonance connectome automated pipeline: an overview. IEEE Pulse, 3(2):42–48, March 2012.

[25] Susumu Mori and Jiangyang Zhang. Principles of diffusion tensor imaging and its applications to basic neuroscience research. Neuron, 51(5):527–539, September 2006.

[26] ABIDE. http://fcon_1000.projects.nitrc.org/indi/abide/.

[27] S. Sikka, J. T. Vogelstein, and M. P. Milham. Towards Automated Analysis of Connectomes: The Configurable Pipeline for the Analysis of Connectomes (C-PAC). Neuroinformatics, 2012.

[28] Q-H. Zou, C-Z. Zhu, Y. Yang, X-N. Zuo, X-Y. Long, Q-J. Cao, Y-F. Wang, and Y-F. Zang. An improved approach to detection of amplitude of low-frequency fluctuation (ALFF) for resting-state fMRI: fractional ALFF. Journal of Neuroscience Methods, 172(1):137–141, July 2008.

[29] J. D. Power, K. A. Barnes, A. Z. Snyder, B. L. Schlaggar, and S. E. Petersen. Spurious but systematic correlations in functional connectivity MRI networks arise from subject motion. NeuroImage, 59:2142–2154, 2012.

[30] Leo Breiman. Statistical modeling: The two cultures. Statistical Science, 16(3):199–231, 2001.