Paper: nips2003-170 (NIPS 2003)
Source: pdf
Authors: Vladimir Vovk, Glenn Shafer, Ilia Nouretdinov
Abstract: In the problem of probability forecasting the learner’s goal is to output, given a training set and a new object, a suitable probability measure on the possible values of the new object’s label. An on-line algorithm for probability forecasting is said to be well-calibrated if the probabilities it outputs agree with the observed frequencies. We give a natural nonasymptotic formalization of the notion of well-calibratedness, which we then study under the assumption of randomness (the object/label pairs are independent and identically distributed). It turns out that, although no probability forecasting algorithm is automatically well-calibrated in our sense, there exists a wide class of algorithms for “multiprobability forecasting” (such algorithms are allowed to output a set, ideally very narrow, of probability measures) which satisfy this property; we call the algorithms in this class “Venn probability machines”. Our experimental results demonstrate that a 1-Nearest Neighbor Venn probability machine performs reasonably well on a standard benchmark data set, and one of our theoretical results asserts that a simple Venn probability machine asymptotically approaches the true conditional probabilities regardless, and without knowledge, of the true probability measure generating the examples.
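The construction described in the abstract can be made concrete. The sketch below is not the paper’s own code; it is a minimal illustration of a Venn predictor with a 1-nearest-neighbor taxonomy, under assumed choices (NumPy, Euclidean distance, and the taxonomy that assigns each example the category given by the label of its nearest neighbor in the extended training set). All function and variable names are hypothetical.

```python
import numpy as np

def venn_1nn_forecast(X_train, y_train, x_new, labels):
    """Multiprobability forecast for x_new: one empirical label
    distribution per hypothetical label of the new object."""
    forecasts = []
    for y_hyp in labels:
        # Tentatively extend the training set with (x_new, y_hyp).
        X = np.vstack([X_train, x_new])
        y = np.append(y_train, y_hyp)
        n = len(y)

        # Taxonomy: each example's category is the label of its nearest
        # neighbor among the *other* examples in the extended set.
        categories = np.empty(n, dtype=y.dtype)
        for i in range(n):
            dists = np.linalg.norm(X - X[i], axis=1)
            dists[i] = np.inf  # exclude the example itself
            categories[i] = y[np.argmin(dists)]

        # Empirical distribution of labels within the new object's category.
        same_cat = categories == categories[-1]
        forecasts.append({lab: float(np.mean(y[same_cat] == lab))
                          for lab in labels})
    return forecasts

# Toy usage: two clusters on the real line, binary labels.
X_train = np.array([[0.0], [0.2], [1.0], [1.2]])
y_train = np.array([0, 0, 1, 1])
print(venn_1nn_forecast(X_train, y_train, np.array([1.1]), labels=[0, 1]))
```

The returned list, one label distribution per hypothetical label, is the multiprobability forecast; on a toy data set of this size the set of distributions can be wide, and it narrows as the categories accumulate examples.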
[1] A. Philip Dawid. Probability forecasting. In Samuel Kotz, Norman L. Johnson, and Campbell B. Read, editors, Encyclopedia of Statistical Sciences, volume 7, pages 210–218. Wiley, New York, 1986.
[2] Vladimir Vovk. On-line Confidence Machines are well-calibrated. In Proceedings of the Forty-Third Annual Symposium on Foundations of Computer Science, pages 187–196, Los Alamitos, CA, 2002. IEEE Computer Society.
[3] Vladimir Vovk, Ilia Nouretdinov, and Alex Gammerman. Testing exchangeability on-line. In Tom Fawcett and Nina Mishra, editors, Proceedings of the Twentieth International Conference on Machine Learning, pages 768–775, Menlo Park, CA, 2003. AAAI Press.
[4] Vladimir Vovk. Universal well-calibrated algorithm for on-line classification. In Bernhard Schölkopf and Manfred K. Warmuth, editors, Learning Theory and Kernel Machines: Sixteenth Annual Conference on Learning Theory and Seventh Kernel Workshop, volume 2777 of Lecture Notes in Artificial Intelligence, pages 358–372, Berlin, 2003. Springer.
[5] James O. Berger and Mohan Delampady. Testing precise hypotheses (with discussion). Statistical Science, 2:317–352, 1987.
[6] Vladimir Vovk, Alex Gammerman, and Glenn Shafer. Algorithmic Learning in a Random World. Springer, New York, to appear.
[7] Glenn Shafer and Vladimir Vovk. Probability and Finance: It’s Only a Game! Wiley, New York, 2001.
[8] Craig Saunders, Alex Gammerman, and Vladimir Vovk. Transduction with confidence and credibility. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, pages 722–726, 1999.
[9] Vladimir Vovk, Alex Gammerman, and Craig Saunders. Machine-learning applications of algorithmic randomness. In Proceedings of the Sixteenth International Conference on Machine Learning, pages 444–453, San Francisco, CA, 1999. Morgan Kaufmann.
[10] Berna E. Kılınç. The reception of John Venn’s philosophy of probability. In Vincent F. Hendricks, Stig Andur Pedersen, and Klaus Frovin Jørgensen, editors, Probability Theory: Philosophy, Recent History and Relations to Science, pages 97–121. Kluwer, Dordrecht, 2001.
[11] Luc Devroye, László Györfi, and Gábor Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, New York, 1996.
[12] Nick Littlestone and Manfred K. Warmuth. Relating data compression and learnability. Technical report, University of California, Santa Cruz, 1986.
[13] John Shawe-Taylor, Peter L. Bartlett, Robert C. Williamson, and Martin Anthony. Structural risk minimization over data-dependent hierarchies. IEEE Transactions on Information Theory, 44:1926–1940, 1998.
[14] David A. McAllester. Some PAC-Bayesian theorems. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory, pages 230–234, New York, 1998. Association for Computing Machinery.