emnlp emnlp2013 emnlp2013-19 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jan A. Botha ; Phil Blunsom
Abstract: This paper contributes an approach for expressing non-concatenative morphological phenomena, such as stem derivation in Semitic languages, in terms of a mildly context-sensitive grammar formalism. This offers a convenient level of modelling abstraction while remaining computationally tractable. The nonparametric Bayesian framework of adaptor grammars is extended to this richer grammar formalism to propose a probabilistic model that can learn word segmentation and morpheme lexicons, including ones with discontiguous strings as elements, from unannotated data. Our experiments on Hebrew and three variants of Arabic data find that the additional expressiveness to capture roots and templates as atomic units improves the quality of concatenative segmentation and stem identification. We obtain 74% accuracy in identifying triliteral Hebrew roots, while performing morphological segmentation with an F1-score of 78.1.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract This paper contributes an approach for expressing non-concatenative morphological phenomena, such as stem derivation in Semitic languages, in terms of a mildly context-sensitive grammar formalism. [sent-4, score-0.7]
2 The nonparametric Bayesian framework of adaptor grammars is extended to this richer grammar formalism to propose a probabilistic model that can learn word segmentation and morpheme lexicons, including ones with discontiguous strings as elements, from unannotated data. [sent-6, score-1.093]
3 Our experiments on Hebrew and three variants of Arabic data find that the additional expressiveness to capture roots and templates as atomic units improves the quality of concatenative segmentation and stem identification. [sent-7, score-0.725]
4 We obtain 74% accuracy in identifying triliteral Hebrew roots, while performing morphological segmentation with an F1-score of 78.1. [sent-8, score-0.36]
5 1 Introduction Unsupervised learning of morphology is the task of acquiring, from unannotated data, the intra-word building blocks of a language and the rules by which they combine to form words. [sent-10, score-0.255]
6 This task is of interest both as a gateway for studying language acquisition in humans and as a way of producing morphological analyses that are of practical use in a variety of natural language processing tasks, including machine translation, parsing and information retrieval. [sent-11, score-0.193]
7 A particularly interesting version of the morphology learning problem comes from languages that use templatic morphology, such as Arabic, Hebrew and Amharic. [sent-12, score-0.345]
8 root morphemes into templatic structures in a nonconcatenative way. [sent-16, score-0.389]
9 The practical appeal of unsupervised learning of templatic morphology is that it can overcome these shortcomings. [sent-21, score-0.333]
10 Unsupervised learning of concatenative morphology has received extensive attention, partly driven by the MorphoChallenge (Kurimo et al. [sent-22, score-0.299]
11 , 2010) in recent years, but that is not the case for root-templatic morphology (Hammarström and Borin, 2011). [sent-23, score-0.19]
12 In this paper we present a model-based method that learns concatenative and root-templatic morphology in a unified framework. [sent-24, score-0.299]
13 We build on two disparate strands of work from the literature: Firstly, we apply simple Range Concatenating Grammars (SRCGs) (Boullier, 2000) to parse contiguous and discontiguous morphemes from an input string. [sent-25, score-0.304]
14 These grammars are mildly context-sensitive (Joshi, 1985), a superset of context-free grammars that retains polynomial parsing time-complexity. [sent-26, score-0.389]
15 Secondly, we generalise the nonparametric Bayesian learning framework of adaptor grammars (Johnson et al. [sent-27, score-0.517]
16 , 1991), all of which are weaker than (non-simple) range concatenating grammars (Boullier, 2000). [sent-31, score-0.227]
17 In addition to unannotated data, our method requires as input a minimal set of high-level grammar rules that encode basic intuitions of the morphology. [sent-38, score-0.199]
18 2 A powerful grammar for morphology Concatenative morphology lends itself well to an analysis in terms of finite-state transducers (FSTs) (Koskenniemi, 1984). [sent-42, score-0.514]
19 With some additional effort, FSTs can also encode non-concatenative morphology (Kiraz, 2000; Beesley and Karttunen, 2003; Cohen-Sygal and Wintner, 2006; Gasser, 2009). [sent-43, score-0.19]
20 We are not aware of successful attempts at inducing FST-based morphological analysers in an unsupervised way, and believe the challenge lies in the fact that FSTs do not offer a convenient way of expressing prior linguistic intuitions to guide the learning process. [sent-45, score-0.24]
21 These shortcomings are overcome for concatenative morphology by context-free adaptor grammars, which allowed diverse segmentation models to be formulated and investigated within a single framework (Johnson et al. [sent-47, score-0.617]
22 In this pursuit, an abstraction that permits discontiguous constituents is a highly useful modelling tool, but requires looking beyond context-free grammars. [sent-51, score-0.195]
23 The bold-faced “functions” combine the potentially discontiguous yields of the argument symbols into single contiguous strings, e. [sent-61, score-0.22]
24 Taken by themselves, the first two rules are simply a CFG that describes word formation as the concatenation of stems and affixes, a formulation that matches the underlying grammar of Morfessor (Creutz and Lagus, 2007), a well-studied unsupervised model. [sent-64, score-0.363]
25 The key aim of our extension is that we want the grammar to capture a discontiguous string like k·t·b as a single constituent in a parse tree. [sent-65, score-0.363]
26 3 Simple range concatenating grammars In this section we define SRCGs formally and illustrate how they can be used to model nonconcatenative morphology. [sent-70, score-0.288]
27 SRCGs define languages that are recognisable in polynomial time, yet can capture discontiguous elements of a string under a single category (Boullier, 2000). [sent-71, score-0.241]
28 Given an appropriate set of grammar rules (as we present in §5), we can parse an input string to obtain a tree as shown in Figure 1. [sent-137, score-0.282]
29 The overlapping branches of the tree demonstrate that this grammar captures something a CFG could not. [sent-138, score-0.188]
30 From the parse tree one can read off the word’s root morpheme and the template used. [sent-139, score-0.379]
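To make this concrete, the following is a schematic sketch of SRCG-style rules of the kind described, written in LaTeX notation. It is illustrative only: the paper's actual rule set, symbol names and yield-combining functions are not reproduced in this extraction.

```latex
% Schematic SRCG-style rules (illustrative; not the paper's exact grammar).
% The left-hand side interleaves the discontiguous yields of Root and
% Template into one contiguous stem.
\begin{align*}
\mathrm{Word}(x_1 x_2) &\rightarrow \mathrm{Stem}(x_1)\ \mathrm{Suffix}(x_2)\\
\mathrm{Stem}(r_1 t_1 r_2 t_2 r_3) &\rightarrow \mathrm{Root}(r_1, r_2, r_3)\ \mathrm{Template}(t_1, t_2)\\
\mathrm{Root}(k, t, b) &\rightarrow \varepsilon \qquad \text{(a lexicalised triliteral root)}\\
\mathrm{Template}(i, a) &\rightarrow \varepsilon \qquad \text{(a lexicalised vowel pattern)}
\end{align*}
% Under these rules the stem "kitab" is derived with the discontiguous
% root k.t.b and the template .i.a. each dominated by a single
% non-terminal, which is exactly the analysis a CFG cannot express.
```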
31 To capture the maximal case of a root with k − 1 characters and k discontiguous templatic characters forming a stem would require a grammar that has arity ψ = k. [sent-142, score-0.632]
32 This is a daunting proposition for parsing, but we are careful (footnote 3: the trade-off between arity and rank with respect to parsing complexity has been characterised (Gildea, 2010), and the appropriate refactoring may bring down the complexity for our grammars too) [sent-144, score-0.275]
33 to set up our application of SRCGs in such a way that this is not too big an obstacle: Firstly, our grammars are defined over the characters that make up a word, and not over words that make up a sentence. [sent-145, score-0.219]
34 Associating a probability θr with each rule r ∈ P, a tree and a string in the language of the grammar can then be obtained through a generative procedure that begins with the start symbol S and iteratively expands it until deriving a terminal string: [sent-155, score-0.224]
35 At each step, for some current symbol A, a rewrite rule r is sampled randomly from PA in accordance with the distribution over rules and used to expand A. [sent-156, score-0.199]
36 The probability P(w, t) of the resulting tree t and terminal string w is the product ∏r θr over the sequence of rewrite rules used. [sent-161, score-0.242]
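As a minimal sketch of this computation (not the released PYSRCAG code; the tree representation and the rule-to-probability map below are assumptions made for illustration), the derivation probability is simply the product of the probabilities of the rewrite rules used:

```python
import math

def derivation_log_prob(tree, theta):
    """Log-probability of a derivation: the sum of log(theta_r) over the
    rewrite rules used, i.e. the log of the product prod_r theta_r.

    `tree` is assumed to be a nested (rule, children) tuple and `theta`
    a dict mapping each rule to its probability -- hypothetical
    representations chosen for this sketch."""
    rule, children = tree
    logp = math.log(theta[rule])
    for child in children:
        if isinstance(child, tuple):   # non-terminal child: recurse
            logp += derivation_log_prob(child, theta)
        # terminal (character) children contribute no rule probability
    return logp
```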
37 Adaptor grammars weaken this independence assumption by allowing whole subtrees to be reused during expansion. [sent-166, score-0.212]
38 Informally, they act as a cache of tree fragments whose tendency to be reused during expansion is governed by the choice of adaptor function. [sent-167, score-0.469]
39 Following earlier applications of adaptor grammars (Johnson et al. [sent-168, score-0.482]
40 , 2011), we employ the Pitman-Yor process (Pitman, 1995; Pitman and Yor, 1997) as adaptor function. [sent-170, score-0.304]
41 The first case denotes the situation where a previously cached tree is reused for this n + 1-th expansion of A; to be clear, this expands A with a fully terminating tree fragment, meaning that none of the nodes descending from A in the tree being generated are subject to further expansion. [sent-185, score-0.331]
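The cache-or-expand choice can be sketched as a Pitman-Yor (Chinese-restaurant-style) draw. The data structures and the base_expand hook below are hypothetical, so this is a generic illustration rather than the paper's implementation:

```python
import random

def expand_adapted(A, cache, counts, a, b, base_expand):
    """Expand adapted non-terminal A under a Pitman-Yor adaptor.

    cache[A]       : previously generated, fully terminating subtrees of A
    counts[A]      : how often each cached subtree has been reused
    a, b           : Pitman-Yor discount and concentration for A
    base_expand(A) : draws a fresh subtree using the grammar rules
    (all hypothetical structures for this sketch)."""
    K = len(cache[A])                          # number of cached subtrees
    # Reusing cached subtree k has weight counts[A][k] - a; drawing a
    # fresh expansion from the base grammar has weight b + a * K.
    weights = [c - a for c in counts[A]] + [b + a * K]
    k = random.choices(range(K + 1), weights=weights)[0]
    if k < K:                                  # reuse a cached fragment
        counts[A][k] += 1
        return cache[A][k]
    fresh = base_expand(A)                     # grow a new subtree
    cache[A].append(fresh)
    counts[A].append(1)
    return fresh
```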
42 The invariance of SRCG trees under isomorphism would make the probabilistic model deficient, but we side-step this issue by requiring that grammar rules are specified in a canonical way that ensures a one-to-one correspondence between the order of nodes in a tree and of terminals in the yield. [sent-190, score-0.29]
43 Taking all the adapted non-terminals into account, the joint probability of a set of full trees T under the grammar G is P(T|a,b,α) = ∏A [B(αA + fA) / B(αA)] · P_PY(z(T)|a,b), (7) where fA is a vector of the usage counts of rules r ∈ PA across T, and B is the Euler beta function. [sent-199, score-0.279]
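For reference, and assuming the standard multivariate (Dirichlet-normalising) form of the Beta function usual in such Dirichlet-multinomial marginals, the per-non-terminal factor in (7) can be written as follows; the subscripted notation here is an assumption of this sketch:

```latex
% Multivariate Beta function over the rule pseudo-counts of A, and the
% resulting Dirichlet-multinomial marginal over the rule usage counts f_A
% (standard identities; illustrative notation only).
B(\alpha_A) = \frac{\prod_{r \in P_A} \Gamma(\alpha_r)}{\Gamma\!\bigl(\sum_{r \in P_A} \alpha_r\bigr)},
\qquad
\frac{B(\alpha_A + f_A)}{B(\alpha_A)}
  = \int \Bigl(\prod_{r \in P_A} \theta_r^{f_{A,r}}\Bigr)\,
    \mathrm{Dir}(\theta_A \mid \alpha_A)\, d\theta_A .
```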
44 The sampler works by visiting each string w in turn, drawing a new tree for it under a proposal grammar, and randomly accepting that as the new analysis for w according to the Metropolis-Hastings accept-reject probability. [sent-202, score-0.237]
45 The proposal grammar is constructed analogously to the PCFG approximation used for adaptor grammars over CFGs, namely by taking a static snapshot of the adaptor grammar in which additional rules rewrite adapted non-terminals as the terminal strings of their cached trees. [sent-204, score-0.677]
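A minimal sketch of the resulting Metropolis-Hastings step follows; the function names and interfaces are assumptions for illustration, and the actual PYSRCAG sampler differs in its details:

```python
import math
import random

def mh_resample_tree(w, current_tree, joint_logp, proposal_logp, propose):
    """One Metropolis-Hastings update of the analysis of string w.

    propose(w)       : draws a candidate tree for w from the static proposal grammar
    proposal_logp(t) : log-probability of tree t under that proposal grammar
    joint_logp(t)    : log-probability of tree t under the adaptor grammar
    (all hypothetical interfaces for this sketch)."""
    proposed = propose(w)
    log_accept = (joint_logp(proposed) - joint_logp(current_tree)
                  + proposal_logp(current_tree) - proposal_logp(proposed))
    if math.log(random.random()) < min(0.0, log_accept):
        return proposed        # accept the proposed analysis for w
    return current_tree        # otherwise keep the current analysis
```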
46 Lastly, the adaptor hyperparameters a and b are modelled by placing flat Beta(1, 1) and vague Gamma(10, 0. [sent-206, score-0.304]
47 , 2007) for morphological analysis of English, of which a later version also covers multiple affixes (Sirts and Goldwater, 2013). [sent-209, score-0.235]
48 In unvocalised text, the standard written form of Modern Standard Arabic (MSA), it may happen that the stem and the root of a word form are one and the same. [sent-218, score-0.499]
49 A discontiguous non-terminal An is rewritten through recursion on its arity down to 1, i. [sent-220, score-0.21]
50 Note that although we provide the model with two sets of discontiguous non-terminals R and T, we do not specify their mapping onto the actual terminal strings; no subdivision of the alphabet into vowels and consonants is hard-wired. [sent-230, score-0.18]
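One plausible reading of the arity-recursion scheme described above (reducing the arity by one at each step) is sketched below in LaTeX notation; the exact rule shapes and yield-combining functions used in the paper are not recoverable from this extraction, so this is purely illustrative:

```latex
% Schematic arity-reduction for a discontiguous non-terminal A_n
% (illustrative only; the paper's actual rules may peel characters
% differently or combine yields with other functions).
\begin{align*}
A_n(x_1, \ldots, x_{n-1}, x_n) &\rightarrow A_{n-1}(x_1, \ldots, x_{n-1})\ \mathrm{Char}(x_n)\\
A_1(x) &\rightarrow \mathrm{Char}(x)
\end{align*}
```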
51 These languages share various properties, including morphology and lexical cognates, but are sufficiently different so as to require manual intervention when transferring rule-based morphological analysers across languages. [sent-232, score-0.442]
52 Table 1: Corpus statistics, including average number of morphemes (m/w) and characters (c/w) per word, and total surface-realised roots of length 3 or 4. [sent-246, score-0.254]
53 Footnote 7: This allowed control over the word shapes, which is important to focus the evaluation, while yielding reliable segmentation and root annotations. [sent-248, score-0.261]
54 The first is the strictly context-free adaptor grammar for morphemes as sequences of characters using rules (8)-(9), which we denote as Concat and MConcat, where the latter allows multiple prefixes/suffixes in a word. [sent-257, score-0.625]
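Rules (8)-(9) themselves are not reproduced in this extraction; a schematic concatenative grammar of the kind described, with an MConcat-style extension allowing multiple prefixes and suffixes, would look roughly like the following (illustrative only, not the paper's numbered rules):

```latex
% Schematic context-free segmentation grammar (illustrative).
% Concat: at most one prefix and one suffix per word.
\begin{align*}
\mathrm{Word} &\rightarrow (\mathrm{Prefix})\ \mathrm{Stem}\ (\mathrm{Suffix})\\
\mathrm{Prefix} \mid \mathrm{Stem} \mid \mathrm{Suffix} &\rightarrow \mathrm{Chars}\\
\mathrm{Chars} &\rightarrow \mathrm{Char}\ \mathrm{Chars} \mid \mathrm{Char}
\end{align*}
% MConcat-style extension: allow iteration over affixes, e.g.
% Word -> Prefix Word | Word Suffix | Stem.
```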
55 0, LDC2004L02, and sampled word types having a single stem and at most one prefix, suffix or both, according to the following random procedure: Sample a shape (stem: 0. [sent-260, score-0.361]
56 Sample uniformly at random (with replacement) a stem from the BAMA stem lexicon, and affix(es) from the ones consistent with the chosen stem. [sent-265, score-0.722]
57 The BAMA lexicons contain affixes and their legitimate concatenations, so some of the generated words would permit a linguistic segmentation into multiple prefixes/suffixes. [sent-266, score-0.257]
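The construction of this synthetic word list can be sketched as follows; the shape probabilities and the lexicon interfaces shown are placeholders (the paper's actual values are truncated in this extraction), so everything named here is hypothetical:

```python
import random

# Hypothetical shape distribution; the probabilities actually used in the
# paper are not recoverable from this extraction.
SHAPES = {
    "stem": 0.5,
    "prefix+stem": 0.2,
    "stem+suffix": 0.2,
    "prefix+stem+suffix": 0.1,
}

def sample_word(stem_lexicon, prefixes_for, suffixes_for):
    """Sample one synthetic word type from BAMA-style lexicons.

    stem_lexicon : list of stems
    prefixes_for : dict stem -> list of compatible prefixes (hypothetical)
    suffixes_for : dict stem -> list of compatible suffixes (hypothetical)"""
    shape = random.choices(list(SHAPES), weights=list(SHAPES.values()))[0]
    stem = random.choice(stem_lexicon)            # uniform, with replacement
    prefix = random.choice(prefixes_for[stem]) if "prefix" in shape else ""
    suffix = random.choice(suffixes_for[stem]) if "suffix" in shape else ""
    return prefix + stem + suffix
```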
58 sions with stems as shown in the set of rules above, and we experiment with a variant Tpl3Ch that allows the non-terminal T1 to be rewritten as up to three Char symbols, since the data indicate there are cases where multiple characters intervene between the radicals of a root. [sent-268, score-0.272]
59 As an external baseline model we used Morfessor (Creutz and Lagus, 2007), which performs decently in morphological segmentation of a variety of languages, but only handles concatenation. [sent-271, score-0.283]
60 Collected samples, each of which is a set of parse trees of the input word types, are used in two ways: First, by averaging over the samples we can estimate the joint probability of a word type w and a parse tree t under the adaptor grammar, conditional on the data and the model’s hyperparameters. [sent-274, score-0.463]
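A sketch of that first use of the samples, with a hypothetical data layout (a list of per-sample word-to-tree assignments), could look like this:

```python
from collections import Counter

def estimate_tree_probs(samples):
    """Estimate P(w, t | data) by relative frequency over collected samples.

    `samples` is assumed to be a list of dicts mapping each word type w to
    the (hashable) parse tree drawn for it in that sample -- a hypothetical
    layout chosen for this sketch."""
    counts = {}                                   # word -> Counter over trees
    for sample in samples:
        for w, tree in sample.items():
            counts.setdefault(w, Counter())[tree] += 1
    return {w: {t: c / len(samples) for t, c in trees.items()}
            for w, trees in counts.items()}
```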
61 Likewise, we evaluate the implied lexicon of stems, affixes and roots against the corresponding reference sets. [sent-276, score-0.246]
62 The main result is that all our models capable of forming complex stems obtain a marked improvement in F-scores over the baseline concatenative adaptor grammar, and the margin of improvement grows along with the expressivity of the complex-stem models tested. [sent-283, score-0.543]
63 This applies across prefix, stem and suffix categories and across our datasets, with one exception, which we elaborate on in §6. [sent-284, score-0.361]
64 Stem lexicons of Arabic were learnt with relatively constant precision (∼70%), but modelling complex stems broadened the coverage by about 3000 stems over the concatenative model (against a reference set of 24k stems). [sent-286, score-0.347]
65 On vocalised Arabic, the improvements for stems are along both dimensions. [sent-287, score-0.191]
66 On our Hebrew data, which comprises only 5k words, the gains in lexicon quality from modelling complex stems tend to be larger than on Arabic. [sent-289, score-0.218]
67 Extracting a lexicon of roots is rendered challenging by the unsupervised nature of the model as the labelling of grammar symbols is ultimately arbitrary. [sent-291, score-0.236]
68 But adaptor grammars are probabilistic by definition and should thus also be evaluated in terms of probabilistic ability. [sent-296, score-0.482]
69 We plot the true positive rate versus the false positive rate for each prediction lexicon Lτ containing strings that have probability greater than τ under the model (for a grammar category of interest). [sent-298, score-0.224]
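A minimal sketch of that evaluation (the inputs and their layout are assumptions, not the paper's scripts): sweep the threshold τ, form the prediction lexicon Lτ, and score it against the reference set.

```python
def roc_points(pred_probs, reference, candidates):
    """ROC points for thresholded prediction lexicons.

    pred_probs : dict string -> model probability (for one grammar category)
    reference  : set of gold strings, e.g. the reference root lexicon
    candidates : set of all strings considered (positives and negatives)
    All three inputs are hypothetical for this sketch."""
    negatives = candidates - reference
    points = []
    for tau in sorted(set(pred_probs.values()), reverse=True):
        lexicon = {s for s, p in pred_probs.items() if p > tau}
        tpr = len(lexicon & reference) / len(reference)
        fpr = len(lexicon & negatives) / max(len(negatives), 1)
        points.append((fpr, tpr))
    return points
```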
70 Our models with complex stem formation improve over the baseline on the AUC metric too. [sent-348, score-0.361]
71 We include the ROC plots for Hebrew stem and root induction in Figure 2, along with the roots the model was most confident about (Table 4). [sent-349, score-0.631]
72 Two aspects of interest are the segmentation into sequential morphemes and the identification of the root. [sent-352, score-0.204]
73 Our intercalating adaptor grammars consistently obtain large gains in segmentation accuracy over the baseline concatenative model, across all our datasets (Table 3). [sent-353, score-0.714]
74 Of the two MSA datasets, the vocalised version presents a more difficult segmentation task as its words are on average longer and feature 31k unique contiguous morphemes, compared to the 24k in BW for the same number of words. [sent-355, score-0.227]
75 The best triliteral root identification accuracy (on a per-word basis) was found for HEB (74%) and BW (67%). [sent-357, score-0.215]
76 An interesting aspect of these results is that templatic rules may aid segmentation quality without necessarily giving perfect root identification. [sent-359, score-0.435]
77 Modelling stem substructure allows any regularities that give rise to a higher data likelihood to be picked up. [sent-360, score-0.361]
78 All our adaptor grammars severely oversegmented this data, although the mistakes were not uniformly distributed. [sent-362, score-0.482]
79 This remains to be confirmed in future experiments, but would be consistent with other observations on the role of hierarchical adaptation in adaptor grammars (Sirts and Goldwater, 2013). [sent-369, score-0.482]
80 The trend that intercalated rules improve segmentation (compared to the concatenative grammar) remains consistent on QU0. (Footnote 8: When excluding cases where root equals stem, root identification on BW is 55%.) [sent-370, score-0.573]
81 Footnote 9: By way of comparison, Rodrigues and Ćavar (2007) presented an unsupervised statistics-based root identification method that obtained precision ranging between 50% and 75%, the higher requiring vocalised words. [sent-372, score-0.233]
82 Figure 2: ROC curves for predicting the (a) stem and (b) triliteral root lexicons for the HEB dataset. [sent-373, score-0.69]
83 Contrary to our expectations, it performs best on the “harder” worst on the arguably simpler HEB and struggled less than the adaptor grammars on . [sent-377, score-0.482]
84 One factor here is that it learns according to a grammar with multiple consecutive affixes and stems, whereas all our experiments (except on one dataset) presupposed single affixes. [sent-378, score-0.209]
85 7 Related work The distinctive feature of our morphological model is that it jointly addresses root identification and morpheme segmentation, and our results demonstrate the mutual benefit of this. [sent-380, score-0.42]
86 , 2011) achieves F1-scores in the high eighties by incorporating sentential context and inferred syntactic categories, both of which our model forgoes, although theirs has no account of discontiguous root morphemes. [sent-383, score-0.284]
87 Hypothesised root characters are boldfaced, while accent (ˇ) marks gold root characters. [sent-416, score-0.317]
88 Previous approaches to Arabic root identification that sought to use little supervision typically constrain the search space of candidate characters within a word, leveraging pre-existing dictionaries (Darwish, 2002; Boudlal et al. [sent-417, score-0.179]
89 In contrast to these approaches, our model requires no dictionary, and while our grammar rules effect some constraints on what could be a root, they are specified in a convenient and flexible manner. [sent-420, score-0.199]
90 The templatic grammars correctly identified the triliteral and quadriliteral roots, also fixing the segmentation of (a). [sent-433, score-0.487]
91 In (b), the templatic grammar improved over the baseline by finding the correct prefix but falsely posited a suffix. [sent-434, score-0.243]
92 Recent work by Fullwood and O’Donnell (2013) goes some way toward jointly dealing with nonconcatenative and concatenative morphology in the unsupervised setting, but their focus is limited to inflected stems and does not handle multiple consecutive affixes. [sent-440, score-0.524]
93 kataba “he wrote”) into a templatic bit-string denoting root and non-root characters, with a root morpheme and a residue morpheme (e. [sent-443, score-0.67]
94 Learning root-templatic morphology is loosely related to morphological paradigm induction (Clark, 2001; Dreyer and Eisner, 2011; Durrett and DeNero, 2013). [sent-454, score-0.35]
95 Our models do not represent templatic paradigms explicitly, but it is interesting to note that preliminary experiments with German indicate that our adaptor grammars pick up on the past participle forming circumfix in ab+ge+spiel+t (played back). [sent-455, score-0.591]
96 8 Conclusion and Outlook We presented a new approach to modelling nonconcatenative phenomena in morphology using simple range concatenating grammars and extended adaptor grammars to this formalism. [sent-456, score-1.048]
97 Our experiments show that this richer model improves morphological segmentation and morpheme lexicon induction on different languages in the Semitic family. [sent-457, score-0.49]
98 Firstly, the lightly-supervised, metagrammar approach to adaptor grammars (Sirts and Goldwater, 2013) can be extended to this more powerful formalism to lessen the burden of defining the “right” grammar rules by hand, and possibly boost performance. [sent-459, score-0.681]
99 Our PYSRCAG implementation leveraged the adaptor grammar code released by Mark Johnson, whom we thank, along with the individuals who contributed to the public data sources that enabled the empirical elements of this paper. [sent-467, score-0.438]
100 Improving nonparameteric Bayesian inference: Experiments on unsupervised word segmentation with adaptor grammars. [sent-569, score-0.461]
wordName wordTfidf (topN-words)
[('stem', 0.361), ('adaptor', 0.304), ('arabic', 0.222), ('morphology', 0.19), ('srcgs', 0.184), ('grammars', 0.178), ('suf', 0.168), ('morphological', 0.16), ('discontiguous', 0.146), ('srcg', 0.138), ('root', 0.138), ('grammar', 0.134), ('bw', 0.133), ('roots', 0.132), ('stems', 0.13), ('segmentation', 0.123), ('morpheme', 0.122), ('pre', 0.122), ('hebrew', 0.121), ('semitic', 0.12), ('templatic', 0.109), ('concatenative', 0.109), ('tpl', 0.107), ('morfessor', 0.092), ('pysrcag', 0.092), ('morphemes', 0.081), ('sirts', 0.08), ('heb', 0.077), ('triliteral', 0.077), ('affixes', 0.075), ('goldwater', 0.075), ('johnson', 0.07), ('rules', 0.065), ('arity', 0.064), ('expansions', 0.064), ('boullier', 0.061), ('concat', 0.061), ('nonconcatenative', 0.061), ('vocalised', 0.061), ('lexicons', 0.059), ('tree', 0.054), ('rule', 0.053), ('char', 0.053), ('roc', 0.053), ('terminating', 0.053), ('instantiation', 0.051), ('strings', 0.051), ('fragment', 0.05), ('string', 0.049), ('concatenating', 0.049), ('modelling', 0.049), ('analysers', 0.046), ('auc', 0.046), ('bama', 0.046), ('cached', 0.046), ('fsts', 0.046), ('infix', 0.046), ('kitab', 0.046), ('quranic', 0.046), ('rodrigues', 0.046), ('seki', 0.046), ('languages', 0.046), ('derivation', 0.045), ('adapted', 0.043), ('contiguous', 0.043), ('characters', 0.041), ('symbol', 0.041), ('cache', 0.041), ('cfgs', 0.04), ('shuly', 0.04), ('rewrite', 0.04), ('phenomena', 0.039), ('instantiated', 0.039), ('lexicon', 0.039), ('affix', 0.037), ('trees', 0.037), ('bayesian', 0.037), ('kato', 0.036), ('creutz', 0.036), ('msa', 0.036), ('radicals', 0.036), ('ab', 0.036), ('secondly', 0.036), ('expansion', 0.036), ('nonparametric', 0.035), ('unsupervised', 0.034), ('reused', 0.034), ('rewriting', 0.034), ('parse', 0.034), ('sharon', 0.034), ('terminal', 0.034), ('parsing', 0.033), ('ys', 0.032), ('cfg', 0.032), ('template', 0.031), ('symbols', 0.031), ('abcde', 0.031), ('altantawy', 0.031), ('avar', 0.031), ('beesley', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000033 19 emnlp-2013-Adaptor Grammars for Learning Non-Concatenative Morphology
Author: Jan A. Botha ; Phil Blunsom
Abstract: This paper contributes an approach for expressing non-concatenative morphological phenomena, such as stem derivation in Semitic languages, in terms of a mildly context-sensitive grammar formalism. This offers a convenient level of modelling abstraction while remaining computationally tractable. The nonparametric Bayesian framework of adaptor grammars is extended to this richer grammar formalism to propose a probabilistic model that can learn word segmentation and morpheme lexicons, including ones with discontiguous strings as elements, from unannotated data. Our experiments on Hebrew and three variants of Arabic data find that the additional expressiveness to capture roots and templates as atomic units improves the quality of concatenative segmentation and stem identification. We obtain 74% accuracy in identifying triliteral Hebrew roots, while performing morphological segmentation with an F1-score of 78.1.
2 0.33682936 30 emnlp-2013-Automatic Extraction of Morphological Lexicons from Morphologically Annotated Corpora
Author: Ramy Eskander ; Nizar Habash ; Owen Rambow
Abstract: We present a method for automatically learning inflectional classes and associated lemmas from morphologically annotated corpora. The method consists of a core language-independent algorithm, which can be optimized for specific languages. The method is demonstrated on Egyptian Arabic and German, two morphologically rich languages. Our best method for Egyptian Arabic provides an error reduction of 55.6% over a simple baseline; our best method for German achieves a 66.7% error reduction.
3 0.28436476 83 emnlp-2013-Exploring the Utility of Joint Morphological and Syntactic Learning from Child-directed Speech
Author: Stella Frank ; Frank Keller ; Sharon Goldwater
Abstract: Children learn various levels of linguistic structure concurrently, yet most existing models of language acquisition deal with only a single level of structure, implicitly assuming a sequential learning process. Developing models that learn multiple levels simultaneously can provide important insights into how these levels might interact synergistically during learning. Here, we present a model that jointly induces syntactic categories and morphological segmentations by combining two well-known models for the individual tasks. We test on child-directed utterances in English and Spanish and compare to single-task baselines. In the morphologically poorer language (English), the model improves morphological segmentation, while in the morphologically richer language (Spanish), it leads to better syntactic categorization. These results provide further evidence that joint learning is useful, but also suggest that the benefits may be different for typologically different languages.
4 0.27961382 186 emnlp-2013-Translating into Morphologically Rich Languages with Synthetic Phrases
Author: Victor Chahuneau ; Eva Schlinger ; Noah A. Smith ; Chris Dyer
Abstract: Translation into morphologically rich languages is an important but recalcitrant problem in MT. We present a simple and effective approach that deals with the problem in two phases. First, a discriminative model is learned to predict inflections of target words from rich source-side annotations. Then, this model is used to create additional sentence-specific word- and phrase-level translations that are added to a standard translation model as “synthetic” phrases. Our approach relies on morphological analysis of the target language, but we show that an unsupervised Bayesian model of morphology can successfully be used in place of a supervised analyzer. We report significant improvements in translation quality when translating from English to Russian, Hebrew and Swahili.
5 0.19559953 181 emnlp-2013-The Effects of Syntactic Features in Automatic Prediction of Morphology
Author: Wolfgang Seeker ; Jonas Kuhn
Abstract: Morphology and syntax interact considerably in many languages and language processing should pay attention to these interdependencies. We analyze the effect of syntactic features when used in automatic morphology prediction on four typologically different languages. We show that predicting morphology for languages with highly ambiguous word forms profits from taking the syntactic context of words into account and results in state-of-the-art models.
6 0.10636915 70 emnlp-2013-Efficient Higher-Order CRFs for Morphological Tagging
7 0.10088623 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation
8 0.091913231 40 emnlp-2013-Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction
9 0.090342492 8 emnlp-2013-A Joint Learning Model of Word Segmentation, Lexical Acquisition, and Phonetic Variability
10 0.077242143 10 emnlp-2013-A Multi-Teraflop Constituency Parser using GPUs
11 0.073473573 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging
12 0.067595892 50 emnlp-2013-Combining PCFG-LA Models with Dual Decomposition: A Case Study with Function Labels and Binarization
13 0.063399091 164 emnlp-2013-Scaling Semantic Parsers with On-the-Fly Ontology Matching
14 0.063150741 187 emnlp-2013-Translation with Source Constituency and Dependency Trees
15 0.062937878 106 emnlp-2013-Inducing Document Plans for Concept-to-Text Generation
16 0.059557118 82 emnlp-2013-Exploring Representations from Unlabeled Data with Co-training for Chinese Word Segmentation
17 0.05806528 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation
18 0.05564522 201 emnlp-2013-What is Hidden among Translation Rules
19 0.054177672 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization
20 0.053708527 146 emnlp-2013-Optimal Incremental Parsing via Best-First Dynamic Programming
topicId topicWeight
[(0, -0.211), (1, -0.103), (2, 0.002), (3, -0.117), (4, -0.449), (5, -0.125), (6, -0.161), (7, -0.085), (8, 0.073), (9, -0.08), (10, 0.017), (11, -0.034), (12, -0.085), (13, -0.024), (14, 0.003), (15, -0.004), (16, 0.073), (17, 0.003), (18, -0.028), (19, -0.04), (20, -0.057), (21, -0.133), (22, -0.045), (23, -0.146), (24, -0.125), (25, -0.025), (26, 0.044), (27, -0.051), (28, 0.04), (29, -0.078), (30, -0.066), (31, -0.018), (32, -0.039), (33, 0.0), (34, -0.012), (35, 0.099), (36, -0.053), (37, -0.104), (38, -0.031), (39, -0.061), (40, -0.102), (41, -0.063), (42, -0.034), (43, -0.006), (44, -0.082), (45, -0.069), (46, -0.048), (47, -0.025), (48, -0.034), (49, 0.079)]
simIndex simValue paperId paperTitle
same-paper 1 0.95645863 19 emnlp-2013-Adaptor Grammars for Learning Non-Concatenative Morphology
Author: Jan A. Botha ; Phil Blunsom
Abstract: This paper contributes an approach for expressing non-concatenative morphological phenomena, such as stem derivation in Semitic languages, in terms of a mildly context-sensitive grammar formalism. This offers a convenient level of modelling abstraction while remaining computationally tractable. The nonparametric Bayesian framework of adaptor grammars is extended to this richer grammar formalism to propose a probabilistic model that can learn word segmentation and morpheme lexicons, including ones with discontiguous strings as elements, from unannotated data. Our experiments on Hebrew and three variants of Arabic data find that the additional expressiveness to capture roots and templates as atomic units improves the quality of concatenative segmentation and stem identification. We obtain 74% accuracy in identifying triliteral Hebrew roots, while performing morphological segmentation with an F1-score of 78.1.
2 0.88154221 30 emnlp-2013-Automatic Extraction of Morphological Lexicons from Morphologically Annotated Corpora
Author: Ramy Eskander ; Nizar Habash ; Owen Rambow
Abstract: We present a method for automatically learning inflectional classes and associated lemmas from morphologically annotated corpora. The method consists of a core language-independent algorithm, which can be optimized for specific languages. The method is demonstrated on Egyptian Arabic and German, two morphologically rich languages. Our best method for Egyptian Arabic provides an error reduction of 55.6% over a simple baseline; our best method for German achieves a 66.7% error reduction.
3 0.77298701 186 emnlp-2013-Translating into Morphologically Rich Languages with Synthetic Phrases
Author: Victor Chahuneau ; Eva Schlinger ; Noah A. Smith ; Chris Dyer
Abstract: Translation into morphologically rich languages is an important but recalcitrant problem in MT. We present a simple and effective approach that deals with the problem in two phases. First, a discriminative model is learned to predict inflections of target words from rich source-side annotations. Then, this model is used to create additional sentence-specific word- and phrase-level translations that are added to a standard translation model as “synthetic” phrases. Our approach relies on morphological analysis of the target language, but we show that an unsupervised Bayesian model of morphology can successfully be used in place of a supervised analyzer. We report significant improvements in translation quality when translating from English to Russian, Hebrew and Swahili.
4 0.74317223 83 emnlp-2013-Exploring the Utility of Joint Morphological and Syntactic Learning from Child-directed Speech
Author: Stella Frank ; Frank Keller ; Sharon Goldwater
Abstract: Children learn various levels of linguistic structure concurrently, yet most existing models of language acquisition deal with only a single level of structure, implicitly assuming a sequential learning process. Developing models that learn multiple levels simultaneously can provide important insights into how these levels might interact synergistically during learning. Here, we present a model that jointly induces syntactic categories and morphological segmentations by combining two well-known models for the individual tasks. We test on child-directed utterances in English and Spanish and compare to single-task baselines. In the morphologically poorer language (English), the model improves morphological segmentation, while in the morphologically richer language (Spanish), it leads to better syntactic categorization. These results provide further evidence that joint learning is useful, but also suggest that the benefits may be different for typologically different languages.
5 0.59296179 181 emnlp-2013-The Effects of Syntactic Features in Automatic Prediction of Morphology
Author: Wolfgang Seeker ; Jonas Kuhn
Abstract: Morphology and syntax interact considerably in many languages and language processing should pay attention to these interdependencies. We analyze the effect of syntactic features when used in automatic morphology prediction on four typologically different languages. We show that predicting morphology for languages with highly ambiguous word forms profits from taking the syntactic context of words into account and results in state-of-the-art models.
6 0.39146483 195 emnlp-2013-Unsupervised Spectral Learning of WCFG as Low-rank Matrix Completion
7 0.38883761 10 emnlp-2013-A Multi-Teraflop Constituency Parser using GPUs
9 0.35369706 106 emnlp-2013-Inducing Document Plans for Concept-to-Text Generation
10 0.31946141 14 emnlp-2013-A Synchronous Context Free Grammar for Time Normalization
11 0.31475472 50 emnlp-2013-Combining PCFG-LA Models with Dual Decomposition: A Case Study with Function Labels and Binarization
12 0.30689687 122 emnlp-2013-Learning to Freestyle: Hip Hop Challenge-Response Induction via Transduction Rule Segmentation
13 0.30271354 8 emnlp-2013-A Joint Learning Model of Word Segmentation, Lexical Acquisition, and Phonetic Variability
14 0.28965771 161 emnlp-2013-Rule-Based Information Extraction is Dead! Long Live Rule-Based Information Extraction Systems!
15 0.2747601 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation
16 0.27176309 138 emnlp-2013-Naive Bayes Word Sense Induction
17 0.24666999 70 emnlp-2013-Efficient Higher-Order CRFs for Morphological Tagging
18 0.24466658 58 emnlp-2013-Dependency Language Models for Sentence Completion
19 0.23726504 187 emnlp-2013-Translation with Source Constituency and Dependency Trees
20 0.23322642 72 emnlp-2013-Elephant: Sequence Labeling for Word and Sentence Segmentation
topicId topicWeight
[(3, 0.028), (9, 0.013), (18, 0.026), (22, 0.026), (30, 0.094), (47, 0.01), (50, 0.393), (51, 0.116), (52, 0.026), (66, 0.077), (71, 0.022), (75, 0.023), (77, 0.027), (96, 0.012), (97, 0.011)]
simIndex simValue paperId paperTitle
1 0.9056263 159 emnlp-2013-Regularized Minimum Error Rate Training
Author: Michel Galley ; Chris Quirk ; Colin Cherry ; Kristina Toutanova
Abstract: Minimum Error Rate Training (MERT) remains one of the preferred methods for tuning linear parameters in machine translation systems, yet it faces significant issues. First, MERT is an unregularized learner and is therefore prone to overfitting. Second, it is commonly used on a noisy, non-convex loss function that becomes more difficult to optimize as the number of parameters increases. To address these issues, we study the addition of a regularization term to the MERT objective function. Since standard regularizers such as ℓ2 are inapplicable to MERT due to the scale invariance of its objective function, we turn to two regularizers, ℓ0 and a modification of ℓ2, and present methods for efficiently integrating them during search. To improve search in large parameter spaces, we also present a new direction finding algorithm that uses the gradient of expected BLEU to orient MERT's exact line searches. Experiments with up to 3600 features show that these extensions of MERT yield results comparable to PRO, a learner often used with large feature sets.
2 0.89908129 78 emnlp-2013-Exploiting Language Models for Visual Recognition
Author: Dieu-Thu Le ; Jasper Uijlings ; Raffaella Bernardi
Abstract: The problem of learning language models from large text corpora has been widely studied within the computational linguistic community. However, little is known about the performance of these language models when applied to the computer vision domain. In this work, we compare representative models: a window-based model, a topic model, a distributional memory and a commonsense knowledge database, ConceptNet, in two visual recognition scenarios: human action recognition and object prediction. We examine whether the knowledge extracted from texts through these models are compatible to the knowledge represented in images. We determine the usefulness of different language models in aiding the two visual recognition tasks. The study shows that the language models built from general text corpora can be used instead of expensive annotated images and even outperform the image model when testing on a big general dataset.
same-paper 3 0.82681608 19 emnlp-2013-Adaptor Grammars for Learning Non-Concatenative Morphology
Author: Jan A. Botha ; Phil Blunsom
Abstract: This paper contributes an approach for expressing non-concatenative morphological phenomena, such as stem derivation in Semitic languages, in terms of a mildly context-sensitive grammar formalism. This offers a convenient level of modelling abstraction while remaining computationally tractable. The nonparametric Bayesian framework of adaptor grammars is extended to this richer grammar formalism to propose a probabilistic model that can learn word segmentation and morpheme lexicons, including ones with discontiguous strings as elements, from unannotated data. Our experiments on Hebrew and three variants of Arabic data find that the additional expressiveness to capture roots and templates as atomic units improves the quality of concatenative segmentation and stem identification. We obtain 74% accuracy in identifying triliteral Hebrew roots, while performing morphological segmentation with an F1-score of 78.1.
4 0.59789079 98 emnlp-2013-Image Description using Visual Dependency Representations
Author: Desmond Elliott ; Frank Keller
Abstract: Describing the main event of an image involves identifying the objects depicted and predicting the relationships between them. Previous approaches have represented images as unstructured bags of regions, which makes it difficult to accurately predict meaningful relationships between regions. In this paper, we introduce visual dependency representations to capture the relationships between the objects in an image, and hypothesize that this representation can improve image description. We test this hypothesis using a new data set of region-annotated images, associated with visual dependency representations and gold-standard descriptions. We describe two template-based description generation models that operate over visual dependency representations. In an image description task, we find that these models outperform approaches that rely on object proximity or corpus information to generate descriptions on both automatic measures and on human judgements.
Author: Andrew J. Anderson ; Elia Bruni ; Ulisse Bordignon ; Massimo Poesio ; Marco Baroni
Abstract: Traditional distributional semantic models extract word meaning representations from cooccurrence patterns of words in text corpora. Recently, the distributional approach has been extended to models that record the cooccurrence of words with visual features in image collections. These image-based models should be complementary to text-based ones, providing a more cognitively plausible view of meaning grounded in visual perception. In this study, we test whether image-based models capture the semantic patterns that emerge from fMRI recordings of the neural signal. Our results indicate that, indeed, there is a significant correlation between image-based and brain-based semantic similarities, and that image-based models complement text-based ones, so that the best correlations are achieved when the two modalities are combined. Despite some unsatisfactory, but explained, outcomes (in particular, failure to detect differential association of models with brain areas), the results show, on the one hand, that image-based distributional semantic models can be a precious new tool to explore semantic representation in the brain, and, on the other, that neural data can be used as the ultimate test set to validate artificial semantic models in terms of their cognitive plausibility.
6 0.49049115 105 emnlp-2013-Improving Web Search Ranking by Incorporating Structured Annotation of Queries
7 0.47814083 8 emnlp-2013-A Joint Learning Model of Word Segmentation, Lexical Acquisition, and Phonetic Variability
8 0.47158033 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging
9 0.46078849 30 emnlp-2013-Automatic Extraction of Morphological Lexicons from Morphologically Annotated Corpora
10 0.45967242 11 emnlp-2013-A Multimodal LDA Model integrating Textual, Cognitive and Visual Modalities
11 0.45673892 185 emnlp-2013-Towards Situated Dialogue: Revisiting Referring Expression Generation
12 0.45397493 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models
13 0.44911188 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation
14 0.44858432 83 emnlp-2013-Exploring the Utility of Joint Morphological and Syntactic Learning from Child-directed Speech
15 0.44691673 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction
16 0.44602805 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation
17 0.44555828 2 emnlp-2013-A Convex Alternative to IBM Model 2
18 0.44276157 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation
19 0.44245258 40 emnlp-2013-Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction
20 0.4374823 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization