
48 jmlr-2006-Learning Minimum Volume Sets


Source: pdf

Author: Clayton D. Scott, Robert D. Nowak

Abstract: Given a probability measure P and a reference measure µ, one is often interested in the minimum µ-measure set with P-measure at least α. Minimum volume sets of this type summarize the regions of greatest probability mass of P, and are useful for detecting anomalies and constructing confidence regions. This paper addresses the problem of estimating minimum volume sets based on independent samples distributed according to P. Other than these samples, no other information is available regarding P, but the reference measure µ is assumed to be known. We introduce rules for estimating minimum volume sets that parallel the empirical risk minimization and structural risk minimization principles in classification. As in classification, we show that the performance of our estimators is controlled by the rate of uniform convergence of empirical to true probabilities over the class from which the estimator is drawn. Thus we obtain finite sample size performance bounds in terms of VC dimension and related quantities. We also demonstrate strong universal consistency, an oracle inequality, and rates of convergence. The proposed estimators are illustrated with histogram and decision tree set estimation rules.

Keywords: minimum volume sets, anomaly detection, statistical learning theory, uniform deviation bounds, sample complexity, universal consistency
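To make the estimation rule concrete, below is a minimal sketch (in Python with NumPy) of a histogram-style minimum volume set estimator in the spirit the abstract describes: partition the domain into equal-volume cells and, since every cell has the same µ-measure, minimize volume subject to empirical P-mass at least α by accumulating cells in decreasing order of empirical mass. The function name histogram_mv_set, the grid parameters, and the omission of the paper's uniform-deviation tolerance term φ(n, δ) are our simplifications for illustration, not the authors' exact rule.

```python
import numpy as np

def histogram_mv_set(samples, alpha, bins_per_dim=10, lower=0.0, upper=1.0):
    """Sketch of a histogram rule for minimum volume set estimation.

    Partitions [lower, upper]^d into equal-volume cells, then greedily
    selects the cells of highest empirical mass until the selection holds
    at least an alpha fraction of the samples. Returns a boolean mask over
    cells and the total (Lebesgue) volume selected. NOTE: the paper's
    tolerance phi(n, delta) on the mass constraint is omitted here.
    """
    samples = np.asarray(samples)
    n, d = samples.shape

    # Map each sample to its cell index in the regular grid.
    scaled = (samples - lower) / (upper - lower)
    idx = np.clip((scaled * bins_per_dim).astype(int), 0, bins_per_dim - 1)
    flat = np.ravel_multi_index(idx.T, (bins_per_dim,) * d)
    counts = np.bincount(flat, minlength=bins_per_dim ** d)

    # With equal-volume cells, minimizing volume subject to empirical mass
    # >= alpha reduces to taking cells in decreasing order of count.
    order = np.argsort(counts)[::-1]
    cum_mass = np.cumsum(counts[order]) / n
    k = np.searchsorted(cum_mass, alpha) + 1  # smallest k reaching mass alpha

    mask = np.zeros(bins_per_dim ** d, dtype=bool)
    mask[order[:k]] = True
    cell_volume = ((upper - lower) / bins_per_dim) ** d
    return mask, k * cell_volume

# Toy usage: 2-D Gaussian samples clipped to the unit square.
rng = np.random.default_rng(0)
X = np.clip(rng.normal(0.5, 0.15, size=(2000, 2)), 0.0, 1.0)
mask, vol = histogram_mv_set(X, alpha=0.9)
print(f"selected {mask.sum()} cells with total volume {vol:.3f}")
```

The paper's actual rules additionally relax the mass constraint by a uniform-deviation term (yielding the finite sample bounds mentioned above) and extend from fixed histograms to structural risk minimization over families such as dyadic decision trees; this sketch shows only the basic empirical rule for one fixed partition.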


References

A. Baillo, J. A. Cuesta-Albertos, and A. Cuevas. Convergence rates in nonparametric estimation of level sets. Stat. Prob. Letters, 53:27–35, 2001.

P. Bartlett, S. Boucheron, and G. Lugosi. Model selection and error estimation. Machine Learning, 48:85–113, 2002.

S. Ben-David and M. Lindenbaum. Learning distributions by their density levels: a paradigm for learning without a teacher. J. Comp. Sys. Sci., 55:171–182, 1997.

G. Blanchard, C. Schäfer, and Y. Rozenholc. Oracle bounds and exact algorithm for dyadic classification trees. In J. Shawe-Taylor and Y. Singer, editors, Learning Theory: 17th Annual Conference on Learning Theory, COLT 2004, pages 378–392. Springer-Verlag, Heidelberg, 2004.

O. Bousquet, S. Boucheron, and G. Lugosi. Introduction to statistical learning theory. In O. Bousquet, U. von Luxburg, and G. Rätsch, editors, Advanced Lectures on Machine Learning, pages 169–207. Springer, 2004.

L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth, Belmont, CA, 1984.

A. Cannon, J. Howse, D. Hush, and C. Scovel. Learning with the Neyman-Pearson and min-max criteria. Technical Report LA-UR 02-2951, Los Alamos National Laboratory, 2002. URL http://www.c3.lanl.gov/~kelly/ml/pubs/2002 minmax/paper.pdf.

A. Cohen, W. Dahmen, I. Daubechies, and R. A. DeVore. Tree approximation and optimal encoding. Applied and Computational Harmonic Analysis, 11(2):192–226, 2001.

T. Cover and J. Thomas. Elements of Information Theory. John Wiley and Sons, New York, 1991.

A. Cuevas and R. Fraiman. A plug-in approach to support estimation. Ann. Stat., 25:2300–2312, 1997.

A. Cuevas and A. Rodriguez-Casal. Set estimation: An overview and some recent developments. Recent Advances and Trends in Nonparametric Statistics, pages 251–264, 2003.

R. A. DeVore. Nonlinear approximation. Acta Numerica, 7:51–150, 1998.

L. Devroye, L. Györfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, New York, 1996.

D. Donoho. Wedgelets: Nearly minimax estimation of edges. Ann. Stat., 27:859–897, 1999.

R. Durrett. Probability: Theory and Examples. Wadsworth & Brooks/Cole, Pacific Grove, CA, 1991.

J. Hartigan. Estimation of a convex density contour in two dimensions. J. Amer. Statist. Assoc., 82(397):267–270, 1987.

X. Huo and J. Lu. A network flow approach in finding maximum likelihood estimate of high concentration regions. Computational Statistics and Data Analysis, 46(1):33–56, 2004.

J. Klemelä. Complexity penalized support estimation. J. Multivariate Anal., 88:274–297, 2004.

V. Koltchinskii. Rademacher penalties and structural risk minimization. IEEE Trans. Inform. Theory, 47:1902–1914, 2001.

J. Langford. Tutorial on practical prediction theory for classification. J. Machine Learning Research, 6:273–306, 2005.

M. Ledoux and M. Talagrand. Probability in Banach Spaces. Springer-Verlag, Berlin, 1991.

G. Lugosi and K. Zeger. Concept learning using complexity regularization. IEEE Trans. Inform. Theory, 42(1):48–54, 1996.

G. Lugosi and K. Zeger. Nonparametric estimation using empirical risk minimization. IEEE Trans. Inform. Theory, 41(3):677–687, 1995.

D. Müller and G. Sawitzki. Excess mass estimates and tests for multimodality. J. Amer. Statist. Assoc., 86(415):738–746, 1991.

A. Muñoz and J. M. Moguerza. Estimation of high-density regions using one-class neighbor machines. IEEE Trans. Patt. Anal. Mach. Intell., 28:476–480, 2006.

D. Nolan. The excess mass ellipsoid. J. Multivariate Analysis, 39:348–371, 1991.

J. Nunez-Garcia, Z. Kutalik, K.-H. Cho, and O. Wolkenhauer. Level sets and minimum volume sets of probability density functions. Approximate Reasoning, 34:25–47, Sept. 2003.

W. Polonik. Measuring mass concentrations and estimating density contour clusters – an excess mass approach. Ann. Stat., 23(3):855–881, 1995.

W. Polonik. Minimum volume sets and generalized quantile processes. Stochastic Processes and their Applications, 69:1–24, 1997.

T. W. Sager. An iterative method for estimating a multivariate mode and isopleth. J. Amer. Statist. Assoc., 74:329–339, 1979.

B. Schölkopf, J. Platt, J. Shawe-Taylor, A. Smola, and R. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 13(7):1443–1472, 2001.

C. Scott and R. Nowak. Learning minimum volume sets. Technical Report ECE-05-2, UW-Madison, 2005a. URL http://www.stat.rice.edu/~cscott.

C. Scott and R. Nowak. A Neyman-Pearson approach to statistical learning. IEEE Trans. Inform. Theory, 51(8):3806–3819, 2005b.

C. Scott and R. Nowak. Minimax-optimal classification with dyadic decision trees. IEEE Trans. Inform. Theory, pages 1335–1353, April 2006.

I. Steinwart, D. Hush, and C. Scovel. A classification framework for anomaly detection. J. Machine Learning Research, 6:211–232, 2005.

A. B. Tsybakov. On nonparametric estimation of density level sets. Ann. Stat., 25:948–969, 1997.

V. Vapnik. Estimation of Dependencies Based on Empirical Data. Springer-Verlag, New York, 1982.

V. Vapnik. Statistical Learning Theory. Wiley, New York, 1998.

R. Vert and J.-P. Vert. Consistency and convergence rates of one-class SVM and related algorithms. Technical Report 1414, Université Paris-Sud, 2005.

G. Walther. Granulometric smoothing. Ann. Stat., 25:2273–2299, 1997.

R. Willett and R. Nowak. Minimax optimal level set estimation. Submitted to IEEE Trans. Image Proc., 2006. URL http://www.ee.duke.edu/~willett/.

R. Willett and R. Nowak. Minimax optimal level set estimation. In Proc. SPIE, Wavelets XI, 31 July – 4 August, San Diego, CA, USA, 2005.