acl acl2012 acl2012-174 knowledge-graph by maker-knowledge-mining

174 acl-2012-Semantic Parsing with Bayesian Tree Transducers


Source: pdf

Author: Bevan Jones ; Mark Johnson ; Sharon Goldwater

Abstract: Many semantic parsing models use tree transformations to map between natural language and meaning representation. However, while tree transformations are central to several state-of-the-art approaches, little use has been made of the rich literature on tree automata. This paper makes the connection concrete with a tree transducer based semantic parsing model and suggests that other models can be interpreted in a similar framework, increasing the generality of their contributions. In particular, this paper further introduces a variational Bayesian inference algorithm that is applicable to a wide class of tree transducers, producing state-of-the-art semantic parsing results while remaining applicable to any domain employing probabilistic tree transducers.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 (∗ School of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, UK) Abstract: Many semantic parsing models use tree transformations to map between natural language and meaning representation. [sent-12, score-0.383]

2 However, while tree transformations are central to several state-of-the-art approaches, little use has been made of the rich literature on tree automata. [sent-13, score-0.552]

3 This paper makes the connection concrete with a tree transducer based semantic parsing model and suggests that other models can be interpreted in a similar framework, increasing the generality of their contributions. [sent-14, score-0.72]

4 In particular, this paper further introduces a variational Bayesian inference algorithm that is applicable to a wide class of tree transducers, producing state-of-the-art semantic parsing results while remaining applicable to any domain employing probabilistic tree transducers. [sent-15, score-0.737]

5 Several approaches assume a tree structure to the NL, MR, or both (Ge and Mooney, 2005; Kate and Mooney, 2006; Wong and Mooney, 2006; Lu et al. [sent-20, score-0.261]

6 , 2011), and often in- († Department of Computing, Macquarie University, Sydney, NSW 2109, Australia) Figure 1: (a) An example sentence/meaning pair, (b) a tree transformation based mapping, and (c) a tree transducer that performs the mapping. [sent-22, score-0.947]

7 volve tree transformations either between two trees or a tree and a string. [sent-23, score-0.583]

8 The tree transducer, a formalism from automata theory which has seen interest in machine translation (Yamada and Knight, 2001 ; Graehl et al. [sent-24, score-0.348]

9 , 2008) and has potential applications in many other areas, is well suited to formalizing such tree transformation based models. [sent-25, score-0.319]

10 We argue for a unifying theory of tree transformation based semantic parsing by presenting a tree transducer model and drawing connections to other similar systems. [sent-27, score-1.063]

11 We make a further contribution by bringing to tree transducers the benefits of the Bayesian framework for principled handling of data sparsity. [sent-28, score-0.476]

12 (2008) present an EM training procedure for top down tree transducers, but while there are Bayesian approaches to string transducers (Chiang et al. [sent-32, score-0.534]

13 , 2010) and PCFGs (Kurihara and Sato, 2006), there has yet to be a proposal for Bayesian inference in tree transducers. [sent-33, score-0.261]

14 Our variational algorithm produces better semantic parses than EM while remaining general to a broad class of transducers appropriate for other domains. [sent-34, score-0.334]

15 In short, our contributions are three-fold: we present a new state-of-the-art semantic parsing model, propose a broader theory for tree transformation based semantic parsing, and present a general inference algorithm for the tree transducer framework. [sent-35, score-1.087]

16 2 Meaning representations and regular tree grammars In semantic parsing, an MR is typically an expression from a machine interpretable language (e. [sent-37, score-0.34]

17 1 More specifically, we assume the MR language is a regular tree language. [sent-41, score-0.261]

18 A regular tree grammar (RTG) closely resembles a context free grammar (CFG), and is a way of describing a language of trees. [sent-42, score-0.375]

19 Formally, define TΣ as the set of trees with symbols from alphabet Σ, and TΣ(A) as the set of all trees in TΣ∪A where symbols from A only occur at the leaves. [sent-43, score-0.15]

20 Q is a set of states, Σ is an alphabet, qstart ∈ Q is the initial state, and R is a set of grammar rules of the form q → t, where q is a state from Q and t is a tree from TΣ(Q). [sent-46, score-0.128]

21 A rule typically consists of a parent state (left) and its child states and output symbol (right). [sent-47, score-0.152]
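To make the RTG definition above concrete, here is a minimal sketch in Python; the Tree and RTG classes are hypothetical illustrations (not from the paper), populated with the GeoQuery-style MR rules that appear in Figure 2.

```python
from dataclasses import dataclass, field

@dataclass
class Tree:
    """A tree over alphabet symbols; leaves may also be RTG states (nonterminals)."""
    label: str
    children: list = field(default_factory=list)

@dataclass
class RTG:
    """Regular tree grammar: states Q, alphabet Sigma, initial state qstart, and
    rules q -> t, where t is a tree over Sigma whose leaves may contain states from Q."""
    states: set
    alphabet: set
    start: str
    rules: list  # list of (state, Tree) pairs

# Toy GeoQuery-style MR grammar (rules m, r, u, v from Figure 2):
geo = RTG(
    states={"NUM", "PLACE", "CITY", "STATE"},
    alphabet={"population", "cityid", "portland", "maine"},
    start="NUM",
    rules=[
        ("NUM", Tree("population", [Tree("PLACE")])),              # (m) NUM -> population(PLACE)
        ("PLACE", Tree("cityid", [Tree("CITY"), Tree("STATE")])),  # (r) PLACE -> cityid(CITY, STATE)
        ("CITY", Tree("portland")),                                 # (u) CITY -> portland
        ("STATE", Tree("maine")),                                   # (v) STATE -> maine
    ],
)
```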

22 3 Weighted root-to-frontier, linear, non-deleting tree-to-string transducers Tree transducers (Rounds, 1970; Thatcher, 1970) are generalizations of finite state machines that operate on trees. [sent-54, score-0.497]

23 Mirroring the branching nature of its input, the transducer may simultaneously transition to several successor states, assigning a separate state to each subtree. [sent-55, score-0.469]

24 There are many classes of transducer with different formal properties (Knight and Graehl, 2005; Maletti et al. [sent-56, score-0.367]

25 It is defined using rules where the left hand side identifies a state of the transducer and a fragment of the input tree, and the right hand side describes a portion of the output string. [sent-59, score-0.839]

26 Variables xi stand for entire sub-trees, and state-variable pairs qj . [sent-60, score-0.135]

27 xi stand for strings produced by applying the transducer starting at state qj to subtree xi. [sent-61, score-0.608]

28 Figure 1(b) illustrates an application of the transducer, taking the tree on the left as input and outputting the string on the right. [sent-62, score-0.356]

29 Formally, a weighted root-to-frontier, tree-tostring transducer is a 5-tuple (Q, Σ, ∆, qstart, R). [sent-63, score-0.367]

30 Q is a finite set of states, Σ and ∆ are the input and output alphabets, qstart is the start state, and R is the set of rules. [sent-64, score-0.127]

31 t is the left hand side of rule r and u its right hand side. [sent-77, score-0.338]

32 The transducer is linear iff no variable appears more than once on the right hand side. [sent-78, score-0.437]

33 It is non-deleting iff all variables on the left hand side also occur on the right hand side. [sent-79, score-0.195]

34 In this paper we assume that every tree t on the left hand side is either a single variable x0 or of the form σ(x0, . . . , xn). [sent-80, score-0.386]
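A sketch of how such rules might be represented and checked for the linearity and non-deleting conditions defined above; the XRule class and helpers are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class XRule:
    """Tree-to-string rule: a state and an input fragment sigma(x0, ..., x{n-1}) on the
    left, and on the right a sequence mixing terminal words with (state, variable) pairs."""
    state: str          # left-hand-side state q
    input_symbol: str   # sigma (or None if the left side is just a variable x0)
    n_vars: int         # number of variables under sigma
    rhs: list           # e.g. ["population", "of", ("q", 0)]
    weight: float = 1.0

def rhs_variables(rule: XRule):
    return [item[1] for item in rule.rhs if isinstance(item, tuple)]

def is_linear(rule: XRule) -> bool:
    """Linear: no variable appears more than once on the right-hand side."""
    vs = rhs_variables(rule)
    return len(vs) == len(set(vs))

def is_nondeleting(rule: XRule) -> bool:
    """Non-deleting: every left-hand-side variable also occurs on the right-hand side."""
    return set(rhs_variables(rule)) == set(range(rule.n_vars))

# Roughly the shape of rule (1) discussed below: q. population(x1) -> "population of" q.x1
rule1 = XRule(state="q", input_symbol="population", n_vars=1,
              rhs=["population", "of", ("q", 0)])
assert is_linear(rule1) and is_nondeleting(rule1)
```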

35 A weighted tree transducer may define a probability distribution, either a joint distribution over input and output pairs or a conditional distribution of the output given the input. [sent-86, score-0.764]

36 Here, we will use joint distributions, which can be defined by ensuring that the weights of all rules with the same state on the lefthand side sum to one. [sent-87, score-0.189]
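Requiring that the weights of all rules sharing a left-hand-side state sum to one is a simple per-state normalization; a minimal sketch, reusing the hypothetical XRule objects from the previous snippet:

```python
from collections import defaultdict

def normalize_per_state(rules):
    """Scale rule weights so that, for each left-hand-side state, they sum to one.
    The weighted transducer then defines a joint distribution over (MR tree, NL string) pairs."""
    totals = defaultdict(float)
    for r in rules:
        totals[r.state] += r.weight
    for r in rules:
        if totals[r.state] > 0:
            r.weight /= totals[r.state]
```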

37 In this case, it can be helpful to view the transducer as simultaneously generating both the input and output, rather than the usual view of mapping input trees into output strings. [sent-88, score-0.517]

38 4 A generative model of semantic parsing Like the hybrid tree semantic parser (Lu et al. [sent-90, score-0.454]

39 , 2008) and the synchronous grammar based WASP (Wong and Mooney, 2006), our model simultaneously generates the input MR tree and the output NL string. [sent-91, score-0.441]

40 The MR tree is built up according to the provided MR grammar, one grammar rule at a time. [sent-92, score-0.461]

41 In each step, we select an MR rule and then build the NL by first choosing a pattern with which to expand it and then filling out that pattern with words drawn from a unigram distribution. [sent-94, score-0.321]

42 This kind of coupled generative process can be naturally formalized with tree transducer rules, where the input tree fragment on the left side of each rule describes the derivation of the MR and the right describes the corresponding NL derivation. [sent-95, score-1.337]

43 For a simple example of a tree-to-string transducer rule, consider rule (1) below. [sent-96, score-0.51]

44 Rule (1) simultaneously generates the tree fragment population(x1) on the left and the sub-string “population of” followed by q.x1 on the right. [sent-98, score-0.364]

45 While this rule can serve as a single step of an MR-to-NL map such as the example transducer shown in Figure 1(c), such rules do not model the [sent-102, score-0.781]

(Figure 2, top: MR grammar rules (m) NUM → population(PLACE), (r) PLACE → cityid(CITY, STATE), (u) CITY → portland, (v) STATE → maine; bottom: transducer rules 1-7, using states such as qmMR,1 and ending with (7) W → ǫ.)

46 Figure 2: Examples of transducer rules (bottom) that generate MR and NL associated with MR rules m-v (top). [sent-135, score-0.505]

47 Transducer rule 2 selects MR rule r from the MR grammar. [sent-136, score-0.286]

48 Rule 3 simultaneously writes the MR associated with rule m and chooses an NL pattern (as does 4 for r). [sent-137, score-0.249]
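The two-stage choice described here (pick an MR grammar rule for a child slot, then pick an NL pattern for the chosen rule) can be sketched with explicit probability tables; the tables and probabilities below are made up for illustration, and only the rule names m and r come from Figure 2.

```python
import random

rng = random.Random(0)

# p(MR rule | parent MR rule, child index): mirrors states like qmMR,1 in the text.
p_mr_rule = {("m", 0): {"r": 0.7, "r_other": 0.3}}

# p(NL pattern | MR rule): a pattern fixes the word-attachment decisions and the
# permutation of the children (e.g. 16 patterns for a two-child rule like r).
p_pattern = {"r": {("WORDS", "x0", "WORDS", "x1"): 0.5,
                   ("x1", "WORDS", "x0"): 0.5}}

def sample(dist):
    items, weights = zip(*dist.items())
    return rng.choices(items, weights=weights)[0]

# One coupled step: expand child 0 of MR rule m, then lay out its NL.
chosen_rule = sample(p_mr_rule[("m", 0)])                              # like rule 2 choosing r
chosen_pattern = sample(p_pattern.get(chosen_rule, {"(flat)": 1.0}))   # like rule 4 choosing a pattern
print(chosen_rule, chosen_pattern)
```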

49 grammaticality of the MR and lack flexibility since sub-strings corresponding to a given tree fragment must be completely pre-specified. [sent-139, score-0.293]

50 See Figure 2 for example rules from our transducer and Figure 3 for a derivation. [sent-142, score-0.436]

51 To ensure that only grammatical MRs are generated, each state of our transducer encodes the identity of exactly one MR grammar rule. [sent-143, score-0.515]

52 For instance, rule 2 in Figure 2 selects MR grammar rule r to expand the ith child of the parent produced by rule m. [sent-145, score-0.596]

53 Aside from ensuring the grammaticality of the generated MR, rules of this type also model the probability of the MR, conditioning the probability of a rule both on the parent rule and the index of the child being expanded. [sent-146, score-0.436]

54 Thus, parent state qmMR,1 encodes not only the identity of rule m, but also the child index, 1 in this case. [sent-147, score-0.315]

55 Once the MR rule is selected, qNL states are applied to select among rules such as 3 and 4 to generate the MR entity and choose the NL expansion pattern. [sent-148, score-0.255]

56 The particular set of patterns that appear on the right of rules such as 3 embodies the binary word attachment decisions and the particular permutation of xi in the NL. [sent-156, score-0.182]

57 Thus, rule 4 is just one of 16 such possible patterns (3 binary decisions and 2 permutations), while rule 3 is one of 4. [sent-158, score-0.286]

58 Finally, the NL is filled out with words chosen according to a unigram distribution, implemented in a PCFG-like fashion, using a different rule for each word which recursively chooses the next word until a string termination rule is reached. [sent-160, score-0.389]

59 2 Generating word sequence “population of” entails first choosing rule 5 in Figure 2. [sent-161, score-0.168]

60 State qrW is then recursively applied to choose rule 6, generating “of” at the same time as deciding to terminate the string by transitioning to a new state qEND which deterministically concludes by writing the empty string ǫ. [sent-162, score-0.3]
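The word-generation process just described (one rule per word, recursing until the termination rule that writes ǫ) behaves like repeatedly sampling from a multinomial that includes a stop option; a minimal sketch with made-up probabilities:

```python
import random

rng = random.Random(0)

def generate_words(rule_probs, rng=rng):
    """Emit words until the string-termination rule (the qEND-like transition that
    writes the empty string) is chosen. rule_probs is one multinomial over word
    rules plus a special "<stop>" rule."""
    words = []
    while True:
        items, weights = zip(*rule_probs.items())
        choice = rng.choices(items, weights=weights)[0]
        if choice == "<stop>":
            return words
        words.append(choice)

# Hypothetical distribution for one word-generating state:
print(generate_words({"population": 0.4, "of": 0.3, "<stop>": 0.3}))
```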

61 At each step of the derivation, an MR rule is chosen to expand a node of the MR tree, and then a corresponding part of the NL is expanded. [sent-167, score-0.172]

62 1 of the example chooses MR rule m, NUM → population(PLACE). [sent-169, score-0.173]

63 Transducer rule 3 then generates population in the MR (shown in the left column) at the same time as choosing an NL expansion pattern (Step 1. [sent-170, score-0.245]

64 This coupled derivation can be represented by a tree, shown in Figure 3(c), which explicitly represents the dependency structure of the coupled MR and NL (a simplified version is shown in (d) for clarity). [sent-174, score-0.153]

65 In our transducer, which defines a joint distribution over both the MR and NL, the probability of a rule is conditioned on the parent state. [sent-175, score-0.22]

66 Since each state encodes an MR rule, MR rule specific distributions are learned for both the words and their order. [sent-176, score-0.234]

67 5 Relation to existing models The tree transducer model can be viewed either as a generative procedure for building up two separate structures or as a transformative machine that takes one as input and produces another as output. [sent-177, score-0.743]

68 WASP (Wong and Mooney, 2006) is an example of the former perspective, coupling the generation of the MR and NL with a synchronous grammar, a formalism closely related to tree transducers. [sent-179, score-0.317]

69 The most significant difference from our approach is that they use machine translation techniques for automatically extracting rules from parallel corpora; similar techniques can be applied to tree transducers (Galley et al. [sent-180, score-0.545]

70 In fact, synchronous grammars and tree transducers can be seen as instances of the same more general class of automata (Shieber, ...). (Footnote 3: The addition of W symbols is a convenience; it is easier to design transducer rules where every substring on the right side corresponds to a subtree on the left.) [sent-182, score-1.208]

71 At each step an MR grammar rule is chosen to expand the MR and the corresponding portion of the NL is then generated. [sent-184, score-0.229]

72 Symbols W stand for locations in the tree corresponding to substrings of the output and are removed in a post-processing step. [sent-185, score-0.32]

73 However, they represent the MR and NL with a single tree and apply tree walking algorithms to extract them. [sent-195, score-0.522]

74 The tree transducer, on the other hand, naturally captures the same probabilistic dependencies while maintaining the separation between MR and NL, and further allows us to build upon a larger body of theory. [sent-197, score-0.261]

75 This procedure corresponds to backward-application in tree transducers, identifying the most likely input tree given a particular output string. [sent-200, score-0.605]

76 While few semantic parsers attempt to exploit syntactic information, there are techniques from machine translation for using tree transducers to map between parsed parallel corpora, and these techniques could likely be applied to semantic parsing. [sent-202, score-0.572]

77 (2011) argue for the PCFG as an alternative model class, permitting conventional grammar induction techniques, and tree transducers are similar enough that many techniques are applicable to both. [sent-204, score-0.583]

78 The tree transducer framework, on the other hand, allows us to condition on individual MR rules. [sent-206, score-0.628]

79 6 Variational Bayes for tree transducers As seen in the example in Figure 3(c), tree transducers not only operate on trees, their derivations are themselves trees, making them amenable to dynamic programming and an EM training procedure resembling inside-outside (Graehl et al. [sent-207, score-1.005]

80 The Bayesian framework offers an elegant solution to this problem, introducing a prior over rule weights which simultaneously ensures that all rules receive non-zero probability and allows the incorporation of prior knowledge and intuitions. [sent-210, score-0.247]

81 The tree transducer defines a joint distribution over the input y, output w, and their derivation x as the product of the weights of the rules appearing in x. [sent-212, score-0.82]

82 That is, p(y, x, w | θ) = ∏_{r∈R} θ(r)^{c_r(x)}, where θ is the set of multinomial parameters, r is a transducer rule, θ(r) is its weight, and c_r(x) is the number of times r appears in x. [sent-213, score-0.367]
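In log space this product is just a weighted sum over rule counts; a small sketch (rule names and numbers are placeholders):

```python
import math

def log_prob(rule_counts, theta):
    """log p(y, x, w | theta) = sum_r c_r(x) * log theta(r), where c_r(x) is the
    number of times rule r occurs in derivation x."""
    return sum(c * math.log(theta[r]) for r, c in rule_counts.items())

print(log_prob({"rule2": 1, "rule3": 1, "rule5": 2},
               {"rule2": 0.5, "rule3": 0.25, "rule5": 0.1}))
```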

83 ln p(Y, W | α) = ln E_q[ p(Y, X, W | θ) / q(θ, X) ] ≥ E_q[ ln p(Y, X, W | θ) / q(θ, X) ] = F. Applying our independence assumption, we arrive at the following expression for F, where θt is the particular parameter vector corresponding to the rules with parent state t. [sent-224, score-0.187]
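Under the usual mean-field assumptions for Dirichlet-multinomial models, maximizing this bound gives the familiar digamma-based update for each state's rule distribution; the sketch below shows that standard step and is an assumption about the form of the update, not the paper's exact derivation.

```python
from math import exp
from scipy.special import digamma

def vb_rule_weights(expected_counts, alpha):
    """For one left-hand-side state: given expected rule counts E_q[c_r] and a symmetric
    Dirichlet hyper-parameter alpha, return the sub-normalized weights exp(E_q[ln theta_r])
    that replace ordinary probabilities in the next E-step (EM would use plain ratios)."""
    total = sum(expected_counts.values()) + alpha * len(expected_counts)
    return {r: exp(digamma(c + alpha) - digamma(total)) for r, c in expected_counts.items()}

# Hypothetical expected counts for the rules sharing one parent state:
print(vb_rule_weights({"rule3": 2.3, "rule4": 0.7}, alpha=0.1))
```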

84 We explore learning separate hyper-parameters αt for each θt, using a fixed point update described by Minka (2000), where kt is the number of rules with parent state t. [sent-232, score-0.187]
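One generic form of Minka's (2000) fixed-point iteration for a symmetric Dirichlet, applied per state to its (expected) rule counts, is sketched below; this is a standard textbook version given as an assumption, and may differ in detail from the update used in the paper.

```python
from scipy.special import digamma

def fixed_point_alpha(counts, alpha, iters=50):
    """Fixed-point update for a symmetric Dirichlet hyper-parameter alpha given one
    vector of (expected) rule counts for a state with k_t = len(counts) rules."""
    k = len(counts)
    total = sum(counts)
    for _ in range(iters):
        num = sum(digamma(c + alpha) for c in counts) - k * digamma(alpha)
        den = k * (digamma(total + k * alpha) - digamma(k * alpha))
        alpha = alpha * num / den
    return alpha

print(fixed_point_alpha([2.3, 0.7, 1.0], alpha=0.5))
```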

85 7 Training and decoding We implement our VB training algorithm inside the tree transducer package Tiburon (May and Knight, 2006), and experiment with both manually set and automatically estimated priors. [sent-238, score-0.628]

86 WASP (Wong and Mooney, 2006) and the hybrid tree (Lu et al. [sent-254, score-0.289]

87 , 2008) are chosen to represent tree transformation based approaches, and, while this comparison is our primary focus, we also report UBL-S (Kwiatkowski et al. [sent-255, score-0.319]

88 6 The hybrid tree is notable as the only other system based on a generative model, and uni-hybrid, a version that uses a unigram distribution over words, is very similar to our own model. [sent-257, score-0.382]

89 We report transducer performance under three different training conditions: tsEM using EM, tsVB-auto using VB with empirical Bayes, and tsVB-hand using hyper-parameters manually tuned on the German training data (α of 0. [sent-259, score-0.367]

90 Highest scores are in bold, while the highest among the tree based models are marked with a bullet. [sent-269, score-0.261]

91 The dotted line separates the tree based from non-tree based models. [sent-270, score-0.286]

92 Test set accuracy is consistently higher for the VB trained tree transducer than the other tree transformation based models (and often highest overall), while f-score remains competitive. [sent-272, score-0.947]

93 9 Conclusion We have argued that tree transformation based semantic parsing can benefit from the literature on formal language theory and tree automata, and have taken a step in this direction by presenting a tree transducer based semantic parser. [sent-273, score-1.348]

94 Drawing this connection facilitates a greater flow of ideas in the research community, allowing semantic parsing to leverage ideas from other work with tree automata, while making clearer how seemingly isolated efforts might relate to one another. [sent-274, score-0.353]

95 We demonstrate this by both building on previous work in training tree transducers using EM (Graehl et al. [sent-275, score-0.476]

96 Highest scores are in bold, while the highest among the tree based models are marked with a bullet. [sent-278, score-0.261]

97 The dotted line separates the tree based from non-tree based models. [sent-279, score-0.286]

98 and describing a general purpose variational inference algorithm for adapting tree transducers to the Bayesian framework. [sent-280, score-0.547]

99 The new VB algorithm results in an overall performance improvement for the transducer over EM training, and the general effectiveness of the approach is further demonstrated by the Bayesian transducer achieving highest accuracy among other tree transformation based approaches. [sent-281, score-1.053]

100 An overview of probabilistic tree transducers for natural language processing. [sent-315, score-0.476]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mr', 0.62), ('transducer', 0.367), ('tree', 0.261), ('nl', 0.258), ('transducers', 0.215), ('rule', 0.143), ('population', 0.095), ('maine', 0.089), ('xi', 0.079), ('rtg', 0.077), ('lnp', 0.077), ('graehl', 0.076), ('mooney', 0.074), ('variational', 0.071), ('qstart', 0.071), ('bayesian', 0.069), ('rules', 0.069), ('state', 0.067), ('eq', 0.066), ('em', 0.063), ('automata', 0.063), ('wong', 0.063), ('kwiatkowski', 0.062), ('transformation', 0.058), ('grammar', 0.057), ('coupled', 0.056), ('side', 0.053), ('kurihara', 0.053), ('wasp', 0.053), ('vb', 0.052), ('parent', 0.051), ('semantic', 0.048), ('cr', 0.048), ('knight', 0.046), ('symbols', 0.044), ('parsing', 0.044), ('states', 0.043), ('mrs', 0.042), ('kate', 0.042), ('unigram', 0.042), ('pattern', 0.041), ('derivation', 0.041), ('orschinger', 0.039), ('num', 0.039), ('subtree', 0.039), ('kevin', 0.037), ('hand', 0.036), ('left', 0.036), ('cityid', 0.035), ('geoquery', 0.035), ('jonathon', 0.035), ('lnq', 0.035), ('qe', 0.035), ('qend', 0.035), ('qnl', 0.035), ('qrw', 0.035), ('tiburon', 0.035), ('transformative', 0.035), ('xyi', 0.035), ('lu', 0.035), ('simultaneously', 0.035), ('right', 0.034), ('fragment', 0.032), ('synchronous', 0.032), ('stand', 0.031), ('fold', 0.031), ('trees', 0.031), ('sato', 0.031), ('maletti', 0.031), ('yq', 0.031), ('grammars', 0.031), ('string', 0.031), ('chooses', 0.03), ('transformations', 0.03), ('child', 0.03), ('cfg', 0.029), ('expand', 0.029), ('dirichlet', 0.029), ('input', 0.028), ('transitioning', 0.028), ('bevan', 0.028), ('hybrid', 0.028), ('output', 0.028), ('procedure', 0.027), ('tom', 0.026), ('amenable', 0.026), ('applicable', 0.026), ('ge', 0.026), ('distribution', 0.026), ('place', 0.026), ('choosing', 0.025), ('restrictive', 0.025), ('dotted', 0.025), ('qj', 0.025), ('generative', 0.025), ('portland', 0.024), ('answers', 0.024), ('argue', 0.024), ('encodes', 0.024), ('formalism', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000005 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers

Author: Bevan Jones ; Mark Johnson ; Sharon Goldwater

Abstract: Many semantic parsing models use tree transformations to map between natural language and meaning representation. However, while tree transformations are central to several state-of-the-art approaches, little use has been made of the rich literature on tree automata. This paper makes the connection concrete with a tree transducer based semantic parsing model and suggests that other models can be interpreted in a similar framework, increasing the generality of their contributions. In particular, this paper further introduces a variational Bayesian inference algorithm that is applicable to a wide class of tree transducers, producing state-of-the-art semantic parsing results while remaining applicable to any domain employing probabilistic tree transducers.

2 0.17299108 41 acl-2012-Bootstrapping a Unified Model of Lexical and Phonetic Acquisition

Author: Micha Elsner ; Sharon Goldwater ; Jacob Eisenstein

Abstract: (ILCC, School of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, UK; School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA, 30308, USA. (a) intended: /ju want w2n/ /want e kUki/ (b) surface: [j@ w a?P w2n] [wan @ kUki].) During early language acquisition, infants must learn both a lexicon and a model of phonetics that explains how lexical items can vary in pronunciation—for instance “the” might be realized as [Di] or [D@]. Previous models of acquisition have generally tackled these problems in isolation, yet behavioral evidence suggests infants acquire lexical and phonetic knowledge simultaneously. We present a Bayesian model that clusters together phonetic variants of the same lexical item while learning both a language model over lexical items and a log-linear model of pronunciation variability based on articulatory features. The model is trained on transcribed surface pronunciations, and learns by bootstrapping, without access to the true lexicon. We test the model using a corpus of child-directed speech with realistic phonetic variation and either gold standard or automatically induced word boundaries. In both cases modeling variability improves the accuracy of the learned lexicon over a system that assumes each lexical item has a unique pronunciation.

3 0.10766809 154 acl-2012-Native Language Detection with Tree Substitution Grammars

Author: Benjamin Swanson ; Eugene Charniak

Abstract: We investigate the potential of Tree Substitution Grammars as a source of features for native language detection, the task of inferring an author’s native language from text in a different language. We compare two state of the art methods for Tree Substitution Grammar induction and show that features from both methods outperform previous state of the art results at native language detection. Furthermore, we contrast these two induction algorithms and show that the Bayesian approach produces superior classification results with a smaller feature set.

4 0.10085613 118 acl-2012-Improving the IBM Alignment Models Using Variational Bayes

Author: Darcey Riley ; Daniel Gildea

Abstract: Bayesian approaches have been shown to reduce the amount of overfitting that occurs when running the EM algorithm, by placing prior probabilities on the model parameters. We apply one such Bayesian technique, variational Bayes, to the IBM models of word alignment for statistical machine translation. We show that using variational Bayes improves the performance of the widely used GIZA++ software, as well as improving the overall performance of the Moses machine translation system in terms of BLEU score.

5 0.095842779 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets

Author: Adam Pauls ; Dan Klein

Abstract: We propose a simple generative, syntactic language model that conditions on overlapping windows of tree context (or treelets) in the same way that n-gram language models condition on overlapping windows of linear context. We estimate the parameters of our model by collecting counts from automatically parsed text using standard n-gram language model estimation techniques, allowing us to train a model on over one billion tokens of data using a single machine in a matter of hours. We evaluate on perplexity and a range of grammaticality tasks, and find that we perform as well or better than n-gram models and other generative baselines. Our model even competes with state-of-the-art discriminative models hand-designed for the grammaticality tasks, despite training on positive data alone. We also show fluency improvements in a preliminary machine translation experiment.

6 0.094572082 196 acl-2012-The OpenGrm open-source finite-state grammar software libraries

7 0.094211973 109 acl-2012-Higher-order Constituent Parsing and Parser Combination

8 0.092376098 38 acl-2012-Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing

9 0.086973086 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?

10 0.085628264 19 acl-2012-A Ranking-based Approach to Word Reordering for Statistical Machine Translation

11 0.082495414 139 acl-2012-MIX Is Not a Tree-Adjoining Language

12 0.08217261 181 acl-2012-Spectral Learning of Latent-Variable PCFGs

13 0.079224184 57 acl-2012-Concept-to-text Generation via Discriminative Reranking

14 0.077805787 108 acl-2012-Hierarchical Chunk-to-String Translation

15 0.0776527 78 acl-2012-Efficient Search for Transformation-based Inference

16 0.076345257 32 acl-2012-Automated Essay Scoring Based on Finite State Transducer: towards ASR Transcription of Oral English Speech

17 0.07315892 83 acl-2012-Error Mining on Dependency Trees

18 0.070103951 155 acl-2012-NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation

19 0.069490761 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures

20 0.067709662 67 acl-2012-Deciphering Foreign Language by Combining Language Models and Context Vectors


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.196), (1, -0.027), (2, -0.083), (3, -0.052), (4, -0.106), (5, 0.017), (6, -0.008), (7, 0.151), (8, 0.042), (9, 0.017), (10, -0.071), (11, -0.086), (12, -0.101), (13, 0.01), (14, 0.011), (15, -0.11), (16, 0.009), (17, 0.029), (18, 0.053), (19, 0.072), (20, 0.012), (21, -0.029), (22, -0.078), (23, 0.016), (24, -0.151), (25, -0.046), (26, 0.112), (27, -0.058), (28, 0.042), (29, 0.142), (30, 0.058), (31, 0.145), (32, -0.04), (33, -0.024), (34, 0.002), (35, 0.101), (36, -0.054), (37, -0.064), (38, 0.058), (39, 0.004), (40, 0.042), (41, -0.094), (42, 0.071), (43, 0.106), (44, 0.014), (45, -0.106), (46, 0.155), (47, -0.021), (48, 0.026), (49, 0.055)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94574517 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers

Author: Bevan Jones ; Mark Johnson ; Sharon Goldwater

Abstract: Many semantic parsing models use tree transformations to map between natural language and meaning representation. However, while tree transformations are central to several state-of-the-art approaches, little use has been made of the rich literature on tree automata. This paper makes the connection concrete with a tree transducer based semantic parsing model and suggests that other models can be interpreted in a similar framework, increasing the generality of their contributions. In particular, this paper further introduces a variational Bayesian inference algorithm that is applicable to a wide class of tree transducers, producing state-of-the-art semantic parsing results while remaining applicable to any domain employing probabilistic tree transducers.

2 0.80235195 196 acl-2012-The OpenGrm open-source finite-state grammar software libraries

Author: Brian Roark ; Richard Sproat ; Cyril Allauzen ; Michael Riley ; Jeffrey Sorensen ; Terry Tai

Abstract: In this paper, we present a new collection of open-source software libraries that provides command line binary utilities and library classes and functions for compiling regular expression and context-sensitive rewrite rules into finite-state transducers, and for n-gram language modeling. The OpenGrm libraries use the OpenFst library to provide an efficient encoding of grammars and general algorithms for building, modifying and applying models.

3 0.62557858 139 acl-2012-MIX Is Not a Tree-Adjoining Language

Author: Makoto Kanazawa ; Sylvain Salvati

Abstract: The language MIX consists of all strings over the three-letter alphabet {a, b, c} that contain an equal number of occurrences of each letter. We prove Joshi’s (1985) conjecture that MIX is not a tree-adjoining language.

4 0.59232956 185 acl-2012-Strong Lexicalization of Tree Adjoining Grammars

Author: Andreas Maletti ; Joost Engelfriet

Abstract: Recently, it was shown (KUHLMANN, SATTA: Tree-adjoining grammars are not closed under strong lexicalization. Comput. Linguist., 2012) that finitely ambiguous tree adjoining grammars cannot be transformed into a normal form (preserving the generated tree language), in which each production contains a lexical symbol. A more powerful model, the simple context-free tree grammar, admits such a normal form. It can be effectively constructed and the maximal rank of the nonterminals only increases by 1. Thus, simple context-free tree grammars strongly lexicalize tree adjoining grammars and themselves.

5 0.57530588 41 acl-2012-Bootstrapping a Unified Model of Lexical and Phonetic Acquisition

Author: Micha Elsner ; Sharon Goldwater ; Jacob Eisenstein

Abstract: (ILCC, School of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, UK; School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA, 30308, USA. (a) intended: /ju want w2n/ /want e kUki/ (b) surface: [j@ w a?P w2n] [wan @ kUki].) During early language acquisition, infants must learn both a lexicon and a model of phonetics that explains how lexical items can vary in pronunciation—for instance “the” might be realized as [Di] or [D@]. Previous models of acquisition have generally tackled these problems in isolation, yet behavioral evidence suggests infants acquire lexical and phonetic knowledge simultaneously. We present a Bayesian model that clusters together phonetic variants of the same lexical item while learning both a language model over lexical items and a log-linear model of pronunciation variability based on articulatory features. The model is trained on transcribed surface pronunciations, and learns by bootstrapping, without access to the true lexicon. We test the model using a corpus of child-directed speech with realistic phonetic variation and either gold standard or automatically induced word boundaries. In both cases modeling variability improves the accuracy of the learned lexicon over a system that assumes each lexical item has a unique pronunciation.

6 0.49362689 181 acl-2012-Spectral Learning of Latent-Variable PCFGs

7 0.4569788 83 acl-2012-Error Mining on Dependency Trees

8 0.45668542 74 acl-2012-Discriminative Pronunciation Modeling: A Large-Margin, Feature-Rich Approach

9 0.45437148 165 acl-2012-Probabilistic Integration of Partial Lexical Information for Noise Robust Haptic Voice Recognition

10 0.44478658 32 acl-2012-Automated Essay Scoring Based on Finite State Transducer: towards ASR Transcription of Oral English Speech

11 0.44046789 38 acl-2012-Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing

12 0.4303059 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets

13 0.41583428 11 acl-2012-A Feature-Rich Constituent Context Model for Grammar Induction

14 0.39843971 57 acl-2012-Concept-to-text Generation via Discriminative Reranking

15 0.38822919 108 acl-2012-Hierarchical Chunk-to-String Translation

16 0.37416777 84 acl-2012-Estimating Compact Yet Rich Tree Insertion Grammars

17 0.36784813 154 acl-2012-Native Language Detection with Tree Substitution Grammars

18 0.364847 16 acl-2012-A Nonparametric Bayesian Approach to Acoustic Model Discovery

19 0.36327904 19 acl-2012-A Ranking-based Approach to Word Reordering for Statistical Machine Translation

20 0.35819575 118 acl-2012-Improving the IBM Alignment Models Using Variational Bayes


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(23, 0.012), (25, 0.026), (26, 0.056), (28, 0.05), (30, 0.037), (37, 0.04), (39, 0.054), (43, 0.216), (74, 0.027), (82, 0.033), (84, 0.045), (85, 0.032), (90, 0.096), (92, 0.117), (94, 0.016), (99, 0.041)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.82052928 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers

Author: Bevan Jones ; Mark Johnson ; Sharon Goldwater

Abstract: Many semantic parsing models use tree transformations to map between natural language and meaning representation. However, while tree transformations are central to several state-of-the-art approaches, little use has been made of the rich literature on tree automata. This paper makes the connection concrete with a tree transducer based semantic parsing model and suggests that other models can be interpreted in a similar framework, increasing the generality of their contributions. In particular, this paper further introduces a variational Bayesian inference algorithm that is applicable to a wide class of tree transducers, producing state-of-the-art semantic parsing results while remaining applicable to any domain employing probabilistic tree transducers.

2 0.63097972 28 acl-2012-Aspect Extraction through Semi-Supervised Modeling

Author: Arjun Mukherjee ; Bing Liu

Abstract: Aspect extraction is a central problem in sentiment analysis. Current methods either extract aspects without categorizing them, or extract and categorize them using unsupervised topic modeling. By categorizing, we mean the synonymous aspects should be clustered into the same category. In this paper, we solve the problem in a different setting where the user provides some seed words for a few aspect categories and the model extracts and clusters aspect terms into categories simultaneously. This setting is important because categorizing aspects is a subjective task. For different application purposes, different categorizations may be needed. Some form of user guidance is desired. In this paper, we propose two statistical models to solve this seeded problem, which aim to discover exactly what the user wants. Our experimental results show that the two proposed models are indeed able to perform the task effectively. 1

3 0.62312126 31 acl-2012-Authorship Attribution with Author-aware Topic Models

Author: Yanir Seroussi ; Fabian Bohnert ; Ingrid Zukerman

Abstract: Authorship attribution deals with identifying the authors of anonymous texts. Building on our earlier finding that the Latent Dirichlet Allocation (LDA) topic model can be used to improve authorship attribution accuracy, we show that employing a previously-suggested Author-Topic (AT) model outperforms LDA when applied to scenarios with many authors. In addition, we define a model that combines LDA and AT by representing authors and documents over two disjoint topic sets, and show that our model outperforms LDA, AT and support vector machines on datasets with many authors.

4 0.62193656 84 acl-2012-Estimating Compact Yet Rich Tree Insertion Grammars

Author: Elif Yamangil ; Stuart Shieber

Abstract: We present a Bayesian nonparametric model for estimating tree insertion grammars (TIG), building upon recent work in Bayesian inference of tree substitution grammars (TSG) via Dirichlet processes. Under our general variant of TIG, grammars are estimated via the Metropolis-Hastings algorithm that uses a context free grammar transformation as a proposal, which allows for cubic-time string parsing as well as tree-wide joint sampling of derivations in the spirit of Cohn and Blunsom (2010). We use the Penn treebank for our experiments and find that our proposal Bayesian TIG model not only has competitive parsing performance but also finds compact yet linguistically rich TIG representations of the data.

5 0.61727524 205 acl-2012-Tweet Recommendation with Graph Co-Ranking

Author: Rui Yan ; Mirella Lapata ; Xiaoming Li

Abstract: (‡Institute for Language, Cognition and Computation, University of Edinburgh, Edinburgh EH8 9AB, UK; †State Key Laboratory of Software Development Environment, Beihang University, Beijing 100083, China.) Twitter enables users to send and read text-based posts of up to 140 characters, known as tweets. As one of the most popular micro-blogging services, Twitter attracts millions of users, producing millions of tweets daily. Shared information through this service spreads faster than would have been possible with traditional sources, however the proliferation of user-generated content poses challenges to browsing and finding valuable information. In this paper we propose a graph-theoretic model for tweet recommendation that presents users with items they may have an interest in. Our model ranks tweets and their authors simultaneously using several networks: the social network connecting the users, the network connecting the tweets, and a third network that ties the two together. Tweet and author entities are ranked following a co-ranking algorithm based on the intuition that there is a mutually reinforcing relationship between tweets and their authors that could be reflected in the rankings. We show that this framework can be parametrized to take into account user preferences, the popularity of tweets and their authors, and diversity. Experimental evaluation on a large dataset shows that our model outperforms competitive approaches by a large margin.

6 0.61209321 154 acl-2012-Native Language Detection with Tree Substitution Grammars

7 0.61173695 36 acl-2012-BIUTEE: A Modular Open-Source System for Recognizing Textual Entailment

8 0.61017764 38 acl-2012-Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing

9 0.60563403 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning

10 0.59975356 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition

11 0.59810501 86 acl-2012-Exploiting Latent Information to Predict Diffusions of Novel Topics on Social Networks

12 0.59694898 167 acl-2012-QuickView: NLP-based Tweet Search

13 0.59649622 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation

14 0.59516776 139 acl-2012-MIX Is Not a Tree-Adjoining Language

15 0.59226537 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures

16 0.58875829 78 acl-2012-Efficient Search for Transformation-based Inference

17 0.58644474 22 acl-2012-A Topic Similarity Model for Hierarchical Phrase-based Translation

18 0.58159161 10 acl-2012-A Discriminative Hierarchical Model for Fast Coreference at Large Scale

19 0.58099931 88 acl-2012-Exploiting Social Information in Grounded Language Learning via Grammatical Reduction

20 0.58096516 187 acl-2012-Subgroup Detection in Ideological Discussions