nips2005-112: Learning Minimum Volume Sets
Source: pdf
Author: Clayton Scott, Robert Nowak
Abstract: Given a probability measure P and a reference measure µ, one is often interested in the minimum µ-measure set with P-measure at least α. Minimum volume sets of this type summarize the regions of greatest probability mass of P, and are useful for detecting anomalies and constructing confidence regions. This paper addresses the problem of estimating minimum volume sets based on independent samples distributed according to P. Aside from these samples, no information about P is available, but the reference measure µ is assumed to be known. We introduce rules for estimating minimum volume sets that parallel the empirical risk minimization and structural risk minimization principles in classification. As in classification, we show that the performance of our estimators is controlled by the rate of uniform convergence of empirical to true probabilities over the class from which the estimator is drawn. Thus we obtain finite sample size performance bounds in terms of VC dimension and related quantities. We also demonstrate strong universal consistency and an oracle inequality. Estimators based on histograms and dyadic partitions illustrate the proposed rules.
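To make the estimand concrete: a minimum volume set at mass level α is any measurable set G minimizing µ(G) subject to P(G) ≥ α. The sketch below is a minimal Python illustration, not the paper's exact procedure, of the histogram analogue of empirical risk minimization: on a regular partition of [0,1]^d every cell has the same µ-volume (taking µ to be Lebesgue measure), so minimizing volume under the empirical mass constraint reduces to keeping the highest-mass cells until their combined empirical mass reaches α. The paper's actual rules relax the constraint by a complexity-dependent tolerance, omitted here, and the function name histogram_mv_set is hypothetical.

    import numpy as np

    def histogram_mv_set(samples, alpha, bins_per_dim):
        # samples: (n, d) array of points assumed to lie in [0,1]^d;
        # with equal-volume cells, minimizing volume subject to an
        # empirical mass constraint reduces to a greedy cell selection.
        n, d = samples.shape
        counts, _ = np.histogramdd(samples, bins=[bins_per_dim] * d,
                                   range=[(0.0, 1.0)] * d)
        mass = counts.ravel() / n          # empirical P-mass of each cell
        order = np.argsort(mass)[::-1]     # cells by decreasing empirical mass
        cum = np.cumsum(mass[order])
        # Smallest number of cells whose total empirical mass reaches alpha.
        k = int(np.searchsorted(cum, alpha, side="left")) + 1
        keep = np.zeros(mass.size, dtype=bool)
        keep[order[:k]] = True
        # Boolean grid indicating the estimated minimum volume set.
        return keep.reshape((bins_per_dim,) * d)

For instance, with samples drawn from P on [0,1]^2, histogram_mv_set(samples, 0.9, 16) marks a union of cells estimated to capture 90% of the probability mass; the VC-type bounds described in the abstract control how far the true P-mass and µ-volume of such an estimate can fall from the optimum.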
[1] I. Steinwart, D. Hush, and C. Scovel, “A classification framework for anomaly detection,” J. Machine Learning Research, vol. 6, pp. 211–232, 2005.
[2] S. Ben-David and M. Lindenbaum, “Learning distributions by their density levels – a paradigm for learning without a teacher,” Journal of Computer and System Sciences, vol. 55, no. 1, pp. 171–182, 1997.
[3] W. Polonik, “Minimum volume sets and generalized quantile processes,” Stochastic Processes and their Applications, vol. 69, pp. 1–24, 1997.
[4] G. Walther, “Granulometric smoothing,” Ann. Stat., vol. 25, pp. 2273–2299, 1997.
[5] B. Schölkopf, J. Platt, J. Shawe-Taylor, A. Smola, and R. Williamson, “Estimating the support of a high-dimensional distribution,” Neural Computation, vol. 13, no. 7, pp. 1443–1472, 2001.
[6] C. Scott and R. Nowak, “Learning minimum volume sets,” UW-Madison, Tech. Rep. ECE-05-2, 2005. [Online]. Available: http://www.stat.rice.edu/~cscott
[7] A. Cannon, J. Howse, D. Hush, and C. Scovel, “Learning with the Neyman-Pearson and min-max criteria,” Los Alamos National Laboratory, Tech. Rep. LA-UR 02-2951, 2002. [Online]. Available: http://www.c3.lanl.gov/~kelly/ml/pubs/2002_minmax/paper.pdf
[8] C. Scott and R. Nowak, “A Neyman-Pearson approach to statistical learning,” IEEE Trans. Inform. Theory, 2005, (in press).
[9] V. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.
[10] L. Devroye, L. Györfi, and G. Lugosi, A Probabilistic Theory of Pattern Recognition. New York: Springer, 1996.
[11] V. Vapnik, Estimation of Dependences Based on Empirical Data. New York: Springer-Verlag, 1982.
[12] G. Lugosi and K. Zeger, “Concept learning using complexity regularization,” IEEE Trans. Inform. Theory, vol. 42, no. 1, pp. 48–54, 1996.