
9 nips-2007-A Probabilistic Approach to Language Change


Source: pdf

Author: Alexandre Bouchard-Côté, Percy Liang, Dan Klein, Thomas L. Griffiths

Abstract: We present a probabilistic approach to language change in which word forms are represented by phoneme sequences that undergo stochastic edits along the branches of a phylogenetic tree. This framework combines the advantages of the classical comparative method with the robustness of corpus-based probabilistic models. We use this framework to explore the consequences of two different schemes for defining probabilistic models of phonological change, evaluating these schemes by reconstructing ancient word forms of Romance languages. The result is an efficient inference procedure for automatically inferring ancient word forms from modern languages, which can be generalized to support inferences about linguistic phylogenies.


reference text

[1] W. Sidney Allen. Vox Latina: The Pronunciation of Classical Latin. Cambridge University Press, 1989.

[2] W.A. Baehrens. Sprachlicher Kommentar zur vulgärlateinischen Appendix Probi. Halle (Saale): M. Niemeyer, 1922.

[3] A. Bouchard-Côté, P. Liang, T. Griffiths, and D. Klein. A Probabilistic Approach to Diachronic Phonology. In Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP/CoNLL), 2007.

[4] L. Campbell. Historical Linguistics. The MIT Press, 1998.

[5] I. Dyen, J.B. Kruskal, and P. Black. FILE IE-DATA1. Available at http://www.ntu.edu.au/education/langs/ielex/IE-DATA1, 1997.

[6] S. N. Evans, D. Ringe, and T. Warnow. Inference of divergence times as a statistical inverse problem. In P. Forster and C. Renfrew, editors, Phylogenetic Methods and the Prehistory of Languages. McDonald Institute Monographs, 2004.

[7] J. Felsenstein. Inferring Phylogenies. Sinauer Associates, 2003.

[8] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721–741, 1984.

[9] S. Goldwater and M. Johnson. Learning OT constraint rankings using a maximum entropy model. Proceedings of the Workshop on Variation within Optimality Theory, 2003.

[10] R. D. Gray and Q. Atkinson. Language-tree divergence times support the Anatolian theory of Indo-European origins. Nature, 2003.

[11] J. P. Huelsenbeck, F. Ronquist, R. Nielsen, and J. P. Bollback. Bayesian inference of phylogeny and its impact on evolutionary biology. Science, 2001.

[12] G. Kondrak. Algorithms for Language Reconstruction. PhD thesis, University of Toronto, 2002.

[13] L. Nakhleh, D. Ringe, and T. Warnow. Perfect phylogenetic networks: A new methodology for reconstructing the evolutionary history of natural languages. Language, 81:382–420, 2005.

[14] D. Ringe, T. Warnow, and A. Taylor. Indo-European and computational cladistics. Transactions of the Philological Society, 100:59–129, 2002.

[15] M. Swadesh. Towards greater accuracy in lexicostatistic dating. International Journal of American Linguistics, 21:121–137, 1955.

[16] A. Venkataraman, J. Newman, and J.D. Patrick. A complexity measure for diachronic Chinese phonology. In J. Coleman, editor, Computational Phonology. Association for Computational Linguistics, 1997.

[17] C. Wilson and B. Hayes. A maximum entropy model of phonotactics and phonotactic learning. Linguistic Inquiry, 2007.