Title: Pattern Recognition for Conditionally Independent Data (jmlr2006-73)
Author: Daniil Ryabko
Abstract: In this work we consider the task of relaxing the i.i.d. assumption in pattern recognition (or classification), aiming to make existing learning algorithms applicable to a wider range of tasks. Pattern recognition is the task of guessing a discrete label of an object based on a set of given examples (pairs of objects and labels). We consider the case of deterministically defined labels. Traditionally, this task is studied under the assumption that examples are independent and identically distributed. However, it turns out that many results of pattern recognition theory carry over under a weaker assumption: the objects are conditionally independent and identically distributed given the labels, while the only assumption on the distribution of labels is that the rate of occurrence of each label is bounded away from zero. We identify a broad class of learning algorithms for which estimates of the probability of classification error obtained under the classical i.i.d. assumption can be generalized to similar estimates for the case of conditionally i.i.d. examples.
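To make the weakened assumption concrete, the following is a minimal formalization reconstructed from the abstract alone; the notation ($\eta$, $P_y$, $\delta$) is ours and may differ from that used in the paper.

\[
y_i = \eta(x_i) \quad \text{(labels are a deterministic function of objects),}
\]
\[
\Pr\{x_1 \in A_1, \dots, x_n \in A_n \mid y_1, \dots, y_n\} \;=\; \prod_{i=1}^{n} P_{y_i}(A_i) \quad \text{(objects are conditionally i.i.d. given the labels),}
\]
\[
\liminf_{n \to \infty} \frac{1}{n}\,\bigl|\{\, i \le n : y_i = y \,\}\bigr| \;\ge\; \delta > 0 \quad \text{for each label } y \quad \text{(each label occurs at a rate above a positive threshold).}
\]

The classical i.i.d. setting is recovered as the special case in which the pairs $(x_i, y_i)$ are drawn independently from a single joint distribution.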
References:
D. Aldous and U. Vazirani. A Markovian Extension of Valiant's Learning Model. In Proceedings of the 31st Symposium on Foundations of Computer Science, pp. 392–396, 1990.
P. Algoet. Universal Schemes for Learning the Best Nonlinear Predictor Given the Infinite Past and Side Information. IEEE Transactions on Information Theory, Vol. 45, No. 4, 1999.
P. Bartlett, S. Ben-David, and S. Kulkarni. Learning Changing Concepts by Exploiting the Structure of Change. In Proceedings of the Workshop on Computational Learning Theory, pp. 131–139, Morgan Kaufmann Publishers, 1996.
E. Baum and D. Haussler. What Size Net Gives Valid Generalization? Neural Computation, 1:151–160, 1989.
A. Blumer, A. Ehrenfeucht, D. Haussler, and M. Warmuth. Learnability and the Vapnik-Chervonenkis Dimension. Journal of the ACM, 36, pp. 929–965, 1989.
O. Bousquet and A. Elisseeff. Stability and Generalization. Journal of Machine Learning Research, 2:499–526, 2002.
A. P. Dawid. Conditional Independence in Statistical Theory. Journal of the Royal Statistical Society, Series B (Methodological), Vol. 41, No. 1, pp. 1–31, 1979.
L. Devroye. On the Asymptotic Probability of Error in Nonparametric Discrimination. Annals of Statistics, Vol. 9, No. 6, pp. 1320–1327, 1981.
L. Devroye, L. Györfi, A. Krzyżak, and G. Lugosi. On the Strong Universal Consistency of Nearest Neighbor Regression Function Estimates. Annals of Statistics, Vol. 22, pp. 1371–1385, 1994.
L. Devroye, L. Györfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. New York: Springer, 1996.
R. Duda, P. Hart, and D. Stork. Pattern Classification. Second edition, Wiley-Interscience, 2001.
D. Gamarnik. Extension of the PAC Framework to Finite and Countable Markov Chains. IEEE Transactions on Information Theory, 49(1):338–345, 2003.
L. Györfi, G. Lugosi, and G. Morvai. A Simple Randomized Algorithm for Sequential Prediction of Ergodic Time Series. IEEE Transactions on Information Theory, Vol. 45, pp. 2642–2650, 1999.
D. Helmbold and P. Long. Tracking Drifting Concepts by Minimizing Disagreements. In Proceedings of the Fourth Annual Workshop on Computational Learning Theory, Santa Cruz, USA, pp. 13–23, 1991.
M. Kearns and D. Ron. Algorithmic Stability and Sanity-Check Bounds on Leave-One-Out Cross-Validation. Neural Computation, Vol. 11, No. 6, pp. 1427–1453, 1999.
M. Kearns and U. Vazirani. An Introduction to Computational Learning Theory. The MIT Press, Cambridge, Massachusetts, 1994.
S. Kulkarni and S. Posner. Rates of Convergence of Nearest Neighbor Estimation Under Arbitrary Sampling. IEEE Transactions on Information Theory, Vol. 41, No. 10, pp. 1028–1039, 1995.
S. Kulkarni, S. Posner, and S. Sandilya. Data-Dependent k_n-NN and Kernel Estimators Consistent for Arbitrary Processes. IEEE Transactions on Information Theory, Vol. 48, No. 10, pp. 2785–2788, 2002.
G. Lugosi and A. Nobel. Consistency of Data-Driven Histogram Methods for Density Estimation and Classification. Annals of Statistics, Vol. 24, No. 2, pp. 687–706, 1996.
G. Lugosi and K. Zeger. Nonparametric Estimation via Empirical Risk Minimization. IEEE Transactions on Information Theory, Vol. 41, No. 3, pp. 677–687, 1995.
G. Morvai, S. Kulkarni, and A. B. Nobel. Regression Estimation from an Individual Stable Sequence. Statistics, Vol. 33, pp. 99–118, 1999.
G. Morvai, S. Yakowitz, and P. Algoet. Weakly Convergent Nonparametric Forecasting of Stationary Time Series. IEEE Transactions on Information Theory, Vol. 43, No. 2, 1997.
A. B. Nobel. Limits to Classification and Regression Estimation from Ergodic Processes. Annals of Statistics, Vol. 27, pp. 262–273, 1999.
W. Rogers and T. Wagner. A Finite Sample Distribution-Free Performance Bound for Local Discrimination Rules. Annals of Statistics, Vol. 6, No. 3, pp. 506–514, 1978.
B. Ryabko. Prediction of Random Sequences and Universal Coding. Problems of Information Transmission, Vol. 24, pp. 87–96, 1988.
D. Ryabko. Online Learning of Conditionally I.I.D. Data. In: C. E. Brodley (Ed.), Proceedings of the 21st International Conference on Machine Learning, Banff, Canada, pp. 727–734, 2004.
D. Ryabko. Application of Classical Nonparametric Predictors to Learning Conditionally I.I.D. Data. In: S. Ben-David, J. Case, and A. Maruoka (Eds.), Proceedings of the 15th International Conference on Algorithmic Learning Theory, Padova, Italy, pp. 171–180, 2004.
L. Valiant. A Theory of the Learnable. Communications of the ACM, 27, pp. 1134–1142, 1984.
V. Vapnik. Statistical Learning Theory. New York: John Wiley & Sons, Inc., 1998.
V. Vapnik and A. Chervonenkis. Theory of Pattern Recognition. Nauka, Moscow, 1974 (in Russian).