
95 nips-2012-Density-Difference Estimation



Author: Masashi Sugiyama, Takafumi Kanamori, Taiji Suzuki, Marthinus D. Plessis, Song Liu, Ichiro Takeuchi

Abstract: We address the problem of estimating the difference between two probability densities. A naive approach is a two-step procedure of first estimating the two densities separately and then computing their difference. However, such a two-step procedure does not necessarily work well, because the first step is performed without regard to the second step, and thus a small estimation error incurred in the first stage can cause a large error in the second stage. In this paper, we propose a single-shot procedure for directly estimating the density difference without separately estimating the two densities. We derive a non-parametric finite-sample error bound for the proposed single-shot density-difference estimator and show that it achieves the optimal convergence rate. We then show how the proposed density-difference estimator can be utilized in L2-distance approximation. Finally, we experimentally demonstrate the usefulness of the proposed method in robust distribution-comparison tasks such as class-prior estimation and change-point detection.
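To make the single-shot idea concrete, here is a minimal NumPy sketch of least-squares density-difference fitting: a Gaussian-kernel model of f(z) = p(z) − q(z) is fit by minimizing the regularized squared L2 error, which has a closed-form solution θ = (H + λI)⁻¹h, and the same quantities give a plug-in L2-distance estimate 2hᵀθ − θᵀHθ. The function name `lsdd`, the fixed hyper-parameters `sigma` and `lam`, and the choice of kernel centers are illustrative assumptions, not the paper's exact procedure (which tunes such quantities, e.g. by cross-validation).

```python
import numpy as np

def lsdd(x, y, sigma=1.0, lam=1e-3):
    """Least-squares density-difference sketch (hyper-parameters fixed
    for illustration, not selected by cross-validation as in practice).

    Fits f(z) = sum_l theta_l * exp(-||z - c_l||^2 / (2 sigma^2)) to
    p(z) - q(z), with kernel centers c_l placed on the pooled samples.
    Returns the coefficients theta and a plug-in L2-distance estimate.
    """
    x, y = np.atleast_2d(x), np.atleast_2d(y)
    centers = np.vstack([x, y])
    d = x.shape[1]

    # H: Gram matrix of integrals of kernel products over R^d,
    # available analytically for Gaussian kernels.
    cc = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    H = (np.pi * sigma**2) ** (d / 2) * np.exp(-cc / (4 * sigma**2))

    # h: difference of mean kernel values under the two samples.
    def phi(z):
        sq = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * sigma**2))
    h = phi(x).mean(axis=0) - phi(y).mean(axis=0)

    # Closed-form regularized least-squares solution.
    theta = np.linalg.solve(H + lam * np.eye(len(centers)), h)

    # Plug-in estimate of the squared L2 distance, clipped at zero.
    l2 = max(2 * h @ theta - theta @ H @ theta, 0.0)
    return theta, l2
```

For example, two samples drawn from the same distribution should yield an L2 estimate near zero, while samples from well-separated distributions should yield a clearly larger value, which is the quantity exploited in the distribution-comparison experiments.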


References

[1] V. N. Vapnik. Statistical Learning Theory. Wiley, New York, NY, USA, 1998.

[2] M. Sugiyama, T. Suzuki, S. Nakajima, H. Kashima, P. von Bünau, and M. Kawanabe. Direct importance estimation for covariate shift adaptation. Annals of the Institute of Statistical Mathematics, 60(4):699–746, 2008.

[3] X. Nguyen, M. J. Wainwright, and M. I. Jordan. Estimating divergence functionals and the likelihood ratio by convex risk minimization. IEEE Transactions on Information Theory, 56(11):5847–5861, 2010.

[4] C. Cortes, Y. Mansour, and M. Mohri. Learning bounds for importance weighting. In Advances in Neural Information Processing Systems 23, pages 442–450, 2010.

[5] M. Saerens, P. Latinne, and C. Decaestecker. Adjusting the outputs of a classifier to new a priori probabilities: A simple procedure. Neural Computation, 14(1):21–41, 2002.

[6] Y. Kawahara and M. Sugiyama. Sequential change-point detection based on direct density-ratio estimation. Statistical Analysis and Data Mining, 5(2):114–127, 2012.

[7] N. Anderson, P. Hall, and D. Titterington. Two-sample test statistics for measuring discrepancies between two multivariate probability density functions using kernel-based density estimates. Journal of Multivariate Analysis, 50(1):41–54, 1994.

[8] A. Basu, I. R. Harris, N. L. Hjort, and M. C. Jones. Robust and efficient estimation by minimising a density power divergence. Biometrika, 85(3):549–559, 1998.

[9] P. Hall and M. P. Wand. On nonparametric discrimination using density differences. Biometrika, 75(3):541–547, 1988.

[10] M. Eberts and I. Steinwart. Optimal learning rates for least squares SVMs using Gaussian kernels. In Advances in Neural Information Processing Systems 24, pages 1539–1547, 2011.

[11] R. H. Farrell. On the best obtainable asymptotic rates of convergence in estimation of a density function at a point. The Annals of Mathematical Statistics, 43(1):170–180, 1972.

[12] B. W. Silverman. Density Estimation for Statistics and Data Analysis. Chapman and Hall, London, UK, 1986.

[13] E. Parzen. On the estimation of a probability density function and mode. The Annals of Mathematical Statistics, 33(3):1065–1076, 1962.

[14] R. T. Rockafellar. Convex Analysis. Princeton University Press, Princeton, NJ, USA, 1970.

[15] W. Härdle, M. Müller, S. Sperlich, and A. Werwatz. Nonparametric and Semiparametric Models. Springer, Berlin, Germany, 2004.

[16] O. Chapelle, B. Schölkopf, and A. Zien, editors. Semi-Supervised Learning. MIT Press, Cambridge, MA, USA, 2006.

[17] R. Rifkin, G. Yeo, and T. Poggio. Regularized least-squares classification. In Advances in Learning Theory: Methods, Models and Applications, pages 131–154. IOS Press, Amsterdam, the Netherlands, 2003.

[18] M. Sugiyama, M. Krauledat, and K.-R. Müller. Covariate shift adaptation by importance weighted cross validation. Journal of Machine Learning Research, 8:985–1005, May 2007.

[19] Y. Takeuchi and K. Yamanishi. A unifying framework for detecting outliers and change points from non-stationary time series data. IEEE Transactions on Knowledge and Data Engineering, 18(4):482–489, 2006.

[20] Y. Kawahara, T. Yairi, and K. Machida. Change-point detection in time-series data based on subspace identification. In Proceedings of the 7th IEEE International Conference on Data Mining, pages 559–564, 2007.

[21] V. Moskvina and A. A. Zhigljavsky. An algorithm based on singular spectrum analysis for change-point detection. Communications in Statistics: Simulation &amp; Computation, 32(2):319–352, 2003.

[22] F. Desobry, M. Davy, and C. Doncarli. An online kernel change detection algorithm. IEEE Transactions on Signal Processing, 53(8):2961–2974, 2005.

[23] Z. Harchaoui, F. Bach, and E. Moulines. Kernel change-point analysis. In Advances in Neural Information Processing Systems 21, pages 609–616, 2009.

[24] S. Arlot, A. Celisse, and Z. Harchaoui. Kernel change-point detection. Technical Report arXiv:1202.3878, arXiv, 2012.