Differential Use of Implicit Negative Evidence in Generative and Discriminative Language Learning (NIPS 2009, paper 66)


Author: Anne Hsu, Thomas L. Griffiths

Abstract: A classic debate in cognitive science revolves around understanding how children learn complex linguistic rules, such as those governing restrictions on verb alternations, without negative evidence. Traditionally, formal learnability arguments have been used to claim that such learning is impossible without the aid of innate language-specific knowledge. More recently, however, researchers have shown that statistical models are capable of learning complex rules from only positive evidence. These two kinds of learnability analyses differ in their assumptions about the distribution from which linguistic input is generated. The former analyses assume that learners seek to identify grammatical sentences in a way that is robust to the distribution from which the sentences are generated, analogous to discriminative approaches in machine learning. The latter assume that learners are trying to estimate a generative model, with sentences being sampled from that model. We show that these two learning approaches differ in their use of implicit negative evidence – the absence of a sentence – when learning verb alternations, and demonstrate that human learners can produce results consistent with the predictions of both approaches, depending on how the learning problem is presented.


reference text

[1] C. L. Baker. Syntactic theory and the projection problem. Linguistic Inquiry, 10:533–538, 1979.

[2] C. L. Baker and J. J. McCarthy. The logical problem of language acquisition. MIT Press, 1981.

[3] N. Chomsky. Aspects of the theory of syntax. MIT Press, 1965.

[4] S. Pinker. Learnability and Cognition: The acquisition of argument structure. MIT Press, 1989.

[5] M. Bowerman. The ‘No Negative Evidence’ Problem: How do children avoid constructing an overly general grammar? In J. Hawkins, editor, Explaining Language Universals, pages 73–101. Blackwell, New York, 1988.

[6] R. Brown and C. Hanlon. Derivational complexity and order of acquisition in child speech. Wiley, 1970.

[7] G. F. Marcus. Negative evidence in language acquisition. Cognition, 46:53–85, 1993.

[8] E. M. Gold. Language identification in the limit. Information and Control, 10:447–474, 1967.

[9] M. A. Nowak, N. L. Komarova, and P. Niyogi. Computational and evolutionary aspects of language. Nature, 417:611–617, 2002.

[10] S. Crain and D. Lillo-Martin. An introduction to linguistic theory and language acquisition. Blackwell, 1999.

[11] D. Angluin. Identifying languages from stochastic examples. Technical Report YALEU/DCS/RR-614, Yale University, Department of Computer Science, 1988.

[12] J. J. Horning. A study of grammatical inference. PhD thesis, Stanford University, 1969.

[13] N. Chater and P. Vitanyi. “Ideal learning” of natural language: Positive results about learning from positive evidence. Journal of Mathematical Psychology, 51:135–163, 2007.

[14] M. Dowman. Addressing the learnability of verb subcategorizations with Bayesian inference. In Proceedings of the 22nd Annual Conference of the Cognitive Science Society, 2000.

[15] C. Kemp, A. Perfors, and J. Tenenbaum. Learning overhypotheses with hierarchical Bayesian models. Developmental Science, 10:307–321, 2007.

[16] P. Langley and S. Stromsten. Learning context-free grammars with a simplicity bias. In Proceedings of the 11th European Conference on Machine Learning, 2000.

[17] L. Onnis, M. Roberts, and N. Chater. Simplicity: A cure for overgeneralizations in language acquisition? In Proceedings of the 24th Annual Conference of the Cognitive Science Society, pages 720–725, 2002.

[18] A. Perfors, J. Tenenbaum, and T. Regier. Poverty of the stimulus: A rational approach? In Proceedings of the 28th Annual Conference of the Cognitive Science Society, pages 664–668, 2006.

[19] A. Stolcke. Bayesian learning of probabilistic language models. PhD thesis, UC Berkeley, 1994.

[20] E. Wonnacott, E. Newport, and M. Tanenhaus. Acquiring and processing verb argument structure: Distributional learning in a miniature language. Cognitive Psychology, 56:165–209, 2008.

[21] A. Y. Ng and M. Jordan. On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In Advances in Neural Information Processing Systems 14, 2001.

[22] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian data analysis. Chapman & Hall, 2003.