
112 nips-2009-Human Rademacher Complexity


Source: pdf

Author: Xiaojin Zhu, Bryan R. Gibson, Timothy T. Rogers

Abstract: We propose to use Rademacher complexity, originally developed in computational learning theory, as a measure of human learning capacity. Rademacher complexity measures a learner’s ability to fit random labels, and can be used to bound the learner’s true error based on the observed training sample error. We first review the definition of Rademacher complexity and its generalization bound. We then describe a “learning the noise” procedure to experimentally measure human Rademacher complexities. The results from empirical studies showed that: (i) human Rademacher complexity can be successfully measured, (ii) the complexity depends on the domain and training sample size in intuitive ways, (iii) human learning respects the generalization bounds, (iv) the bounds can be useful in predicting the danger of overfitting in human learning. Finally, we discuss the potential applications of human Rademacher complexity in cognitive science.
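
As a rough illustration of the quantity involved, the sketch below estimates one common form of the empirical Rademacher complexity of a machine learner on a fixed sample, using the same “learning the noise” idea: draw uniformly random ±1 labels, let the learner fit them, and measure how well it reproduces the noise. This is a minimal sketch assuming a generic scikit-learn-style classifier; the function name and the choice of DecisionTreeClassifier are illustrative assumptions, not the paper’s human experimental procedure. The generalization bound reviewed in the paper then relates a learner’s true error to its observed training error plus a term that grows with this complexity.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def empirical_rademacher(X, make_learner, n_draws=200, seed=0):
    # Monte Carlo estimate of E_sigma[ (2/n) * sum_i sigma_i f(x_i) ], where f is
    # the hypothesis the learner produces after fitting the random labels sigma.
    rng = np.random.default_rng(seed)
    n = len(X)
    vals = []
    for _ in range(n_draws):
        sigma = rng.choice([-1, 1], size=n)   # random "noise" labels
        f = make_learner().fit(X, sigma)      # learner tries to fit the noise
        vals.append(2.0 / n * np.sum(sigma * f.predict(X)))
    return float(np.mean(vals))

# A flexible learner fits the noise almost perfectly (estimate near 2);
# a heavily restricted learner fits it much less well (smaller estimate).
X = np.random.default_rng(1).normal(size=(50, 2))
print(empirical_rademacher(X, DecisionTreeClassifier))
print(empirical_rademacher(X, lambda: DecisionTreeClassifier(max_depth=1)))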


reference text

[1] P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: risk bounds and structural results. Journal of Machine Learning Research, 3:463–482, 2002.

[2] A. Caramazza and M. McCloskey. The case for single-patient studies. Cognitive Neuropsychology, 5(5):517–527, 1988.

[3] R. Castro, C. Kalish, R. Nowak, R. Qian, T. Rogers, and X. Zhu. Human active learning. In Advances in Neural Information Processing Systems (NIPS) 22, 2008.

[4] L. J. Chapman. Illusory correlation in observational report. Journal of Verbal Learning and Verbal Behavior, 6:151–155, 1967.

[5] N. Chater and P. Vitanyi. Simplicity: A unifying principle in cognitive science? Trends in Cognitive Sciences, 7(1):19–22, 2003.

[6] N. Cowan. The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences, 24:87–185, 2000.

[7] J. Feldman. Minimization of Boolean complexity in human concept learning. Nature, 407:630–633, 2000.

[8] I. Gauthier and M. Tarr. Becoming a “Greeble” expert: Exploring mechanisms for face recognition. Vision Research, 37(12):1673–1682, 1998.

[9] N. Goodman, J. B. Tenenbaum, J. Feldman, and T. L. Griffiths. A rational analysis of rule-based concept learning. Cognitive Science, 32(1):108–133, 2008.

[10] T. L. Griffiths, B. R. Christian, and M. L. Kalish. Using category structures to test iterated learning as a method for identifying inductive biases. Cognitive Science, 32:68–107, 2008.

[11] T. L. Griffiths and J. B. Tenenbaum. From mere coincidences to meaningful discoveries. Cognition, 103(2):180–226, 2007.

[12] M. J. Kearns and U. V. Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994.

[13] V. I. Koltchinskii and D. Panchenko. Rademacher processes and bounding the risk of function learning. In E. Gine, D. Mason, and J. Wellner, editors, High Dimensional Probability II, pages 443–459. MIT Press, 2000.

[14] J. Langford. Tutorial on practical prediction theory for classification. Journal of Machine Learning Research, 6:273–306, 2005.

[15] R. L. Lewis. Interference in short-term memory: The magical number two (or three) in sentence processing. Journal of Psycholinguistic Research, 25(1):93–115, 1996.

[16] C. McDiarmid. On the method of bounded differences. In Surveys in Combinatorics 1989, pages 148–188. Cambridge University Press, 1989.

[17] S. C. McKinley and R. M. Nosofsky. Selective attention and the formation of linear decision boundaries. Journal of Experimental Psychology: Human Perception & Performance, 22(2):294–317, 1996.

[18] D. Medler, A. Arnoldussen, J. Binder, and M. Seidenberg. The Wisconsin perceptual attribute ratings database, 2005. http://www.neuro.mcw.edu/ratings/.

[19] G. Miller. The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2):81–97, 1956.

[20] R. C. O’Reilly and J. L. McClelland. Hippocampal conjunctive encoding, storage, and recall: Avoiding a tradeoff. Hippocampus, 4:661–682, 1994.

[21] J. Rissanen. Strong optimality of the normalized ML models as universal codes and information in data. IEEE Transactions on Information Theory, 47(5):1712–1717, 2001.

[22] H. L. Roediger and K. B. McDermott. Creating false memories: Remembering words not presented in lists. Journal of Experimental Psychology: Learning, Memory and Cognition, 21(4):803–814, 1995.

[23] T. Roos. Statistical and Information-Theoretic Methods for Data Analysis. PhD thesis, Department of Computer Science, University of Helsinki, 2007.

[24] J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, New York, NY, USA, 2004.

[25] V. Vapnik. Statistical Learning Theory. Wiley-Interscience, 1998.

[26] V. Vapnik, E. Levin, and Y. Le Cun. Measuring the VC-dimension of a learning machine. Neural Computation, 6:851–876, 1994.

[27] V. N. Vapnik and A. Y. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16(2):264–280, 1971.

[28] W. D. Wattenmaker. Knowledge structures and linear separability: Integrating information in object and social categorization. Cognitive Psychology, 28(3):274–328, 1995.

[29] W. D. Wattenmaker, G. I. Dewey, T. D. Murphy, and D. L. Medin. Linear separability and concept learning: Context, relational properties, and concept naturalness. Cognitive Psychology, 18(2):158–194, 1986.