acl acl2012 acl2012-185 knowledge-graph by maker-knowledge-mining

185 acl-2012-Strong Lexicalization of Tree Adjoining Grammars


Source: pdf

Author: Andreas Maletti ; Joost Engelfriet

Abstract: Recently, it was shown (KUHLMANN, SATTA: Tree-adjoining grammars are not closed under strong lexicalization. Comput. Linguist., 2012) that finitely ambiguous tree adjoining grammars cannot be transformed into a normal form (preserving the generated tree language), in which each production contains a lexical symbol. A more powerful model, the simple context-free tree grammar, admits such a normal form. It can be effectively constructed and the maximal rank of the nonterminals only increases by 1. Thus, simple context-free tree grammars strongly lexicalize tree adjoining grammars and themselves.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: Recently, it was shown (KUHLMANN, SATTA: Tree-adjoining grammars are not closed under strong lexicalization. [sent-3, score-0.147]

2 , 2012) that finitely ambiguous tree adjoining grammars cannot be transformed into a normal form (preserving the generated tree language), in which each production contains a lexical symbol. [sent-6, score-0.757]

3 A more powerful model, the simple context-free tree grammar, admits such a normal form. [sent-7, score-0.177]

4 Thus, simple context-free tree grammars strongly lexicalize tree adjoining grammars and themselves. [sent-9, score-0.72]

5 1 Introduction Tree adjoining grammars [TAG] (Joshi et al. [sent-10, score-0.213]

6 A good overview of TAG, covering their formal properties, linguistic motivation, and applications, is presented by Joshi and Schabes (1992) and Joshi and Schabes (1997), where strong lexicalization is also discussed. [sent-13, score-0.18]

7 In general, lexicalization is the process of transforming a grammar into an equivalent one (potentially expressed in another formalism) such that each production contains a lexical item (or anchor). [sent-14, score-0.453]

8 Each production can then be viewed as lexical information on its anchor. [sent-15, score-0.216]

9 alphabet, each production of a lexicalized grammar produces at least one letter of the generated string. [sent-21, score-0.313]

10 Consequently, lexicalized grammars offer significant parsing benefits (Schabes et al. [sent-22, score-0.188]

11 , 1988) as the number of applications of productions (i. [sent-23, score-0.335]

12 In addition, the lexical items in the productions guide the production selection in a derivation, which works especially well in scenarios with large alphabets. [sent-26, score-0.551]

13 , 1960) only requires that the generated string languages coincide, whereas strong equivalence (Chomsky, 1963) requires that even the generated tree languages coincide. [sent-30, score-0.172]

14 Correspondingly, we obtain weak and strong lexicalization based on the required equivalence. [sent-31, score-0.175]

15 The GREIBACH normal form shows that CFG can weakly lexicalize themselves, but they cannot strongly lexicalize themselves (Schabes, 1990). [sent-32, score-0.397]

16 It is a prominent feature of tree adjoining grammars that they can strongly lexicalize CFG (Schabes, 1990),2 and it was claimed and widely believed that they can strongly lexicalize themselves. [sent-33, score-0.668]

17 Recently, Kuhlmann and Satta (2012) proved that TAG actually cannot strongly lexicalize themselves. [sent-34, score-0.175]

18 In fact, they prove that TAG cannot even strongly lexicalize the weaker tree insertion grammars (Schabes and Waters, 1995). [sent-35, score-0.402]

19 Simple (i.e., linear and nondeleting) context-free tree grammars [CFTG] (Rounds, 1969; Rounds, 1970) are a more powerful grammar formalism than TAG (Mönnich, 1997). [sent-43, score-0.284]

20 However, the monadic variant is strongly equivalent to a slightly extended version of TAG, which is called non-strict TAG (Kepser and Rogers, 2011). [sent-44, score-0.157]

21 In particular, they also demonstrate that monadic CFTG can strongly lexicalize regular tree grammars (Gécseg and Steinby, 1984; Gécseg and Steinby, 1997). [sent-47, score-0.527]

22 CFTG are weakly equivalent to the simple macro grammars of Fischer (1968), which are a notational variant of the well-nested linear context-free rewriting systems (LCFRS) of Vijay-Shanker et al. [sent-48, score-0.222]

23 In this contribution, we show that CFTG can strongly lexicalize TAG and also themselves, thus answering the second question in the conclusion of Kuhlmann and Satta (2012). [sent-53, score-0.175]

24 This is achieved by a series of normalization steps (see Section 4) and a final lexicalization step (see Section 5), in which a lexical item is guessed for each production that does not already contain one. [sent-54, score-0.412]

25 This item is then transported in an additional argument until it is exchanged for the same item in a terminal production. [sent-55, score-0.192]

26 The lexicalization is effective and increases the maximal rank (number of arguments) of the nonterminals by at most 1. [sent-56, score-0.214]

27 In contrast to a transformation into GREIBACH normal form, our lexicalization does not radically change the structure of the derivations. [sent-57, score-0.199]

28 Moreover, t[u]p denotes the tree obtained from t by replacing the subtree at p by the tree u ∈ TΣ (X). [sent-91, score-0.21]
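The notation t[u]p can be made concrete by treating a position p as a Gorn address, i.e., a sequence of child indices. The following sketch uses an encoding of our own choosing (trees as nested tuples, leaves as plain strings); it is illustrative, not the paper's code:

```python
def subtree(t, p):
    """Return the subtree of t at position p (a tuple of child indices)."""
    if not p:
        return t
    _label, children = t  # a leaf (a plain string) has no children
    return subtree(children[p[0]], p[1:])

def replace(t, p, u):
    """Return t[u]_p: t with the subtree at position p replaced by u."""
    if not p:
        return u
    label, children = t
    new_children = tuple(
        replace(c, p[1:], u) if i == p[0] else c
        for i, c in enumerate(children)
    )
    return (label, new_children)

t = ("σ", ("α", "β"))         # the tree σ(α, β)
print(subtree(t, (1,)))       # 'β'
print(replace(t, (1,), "α"))  # ('σ', ('α', 'α')), i.e. σ(α, α)
```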

29 3 Context-free tree grammars In this section, we recall linear and nondeleting context-free tree grammars [CFTG] (Rounds, 1969; Rounds, 1970). [sent-154, score-0.454]

30 The nonterminals of regular tree grammars only occur at the leaves and are replaced using first-order substitution. [sent-156, score-0.288]

31 In the left-hand sides of productions we write A(x1, . [sent-159, score-0.361]

32 , xk) for a nonterminal A ∈ Nk to indicate the variables that hold the direct subtrees of a particular occurrence of A. [sent-162, score-0.172]

33 S ∈ N0 is the start nonterminal of rank 0, and • P is a finite set of productions of the form A(x1, . [sent-166, score-0.57]

34 The components ℓ and r are called the left- and right-hand side of the production ℓ → r in P. [sent-170, score-0.253]

35 The right-hand side is simply a tree using terminal and nonterminal symbols according to their rank. [sent-175, score-0.423]

36 We use lower-case Greek letters for terminal symbols and upper-case Latin letters for nonterminals. [sent-179, score-0.157]

37 As a running example, we consider the CFTG Gex = (N, Σ, S, P) where N = {S(0), A(2)}, Σ = {σ(2), α(0), β(0)}, and P contains the productions (see Figure 3): S → A(α, α) | A(β, β) | σ(α, β) and A(x1, x2) → A(σ(x1, S), σ(x2, S)) | σ(x1, x2). [sent-181, score-0.335]

38 A derivation to a tree of TΣ is illustrated in Figure 4. [sent-205, score-0.149]

39 It demonstrates that the final tree in that derivation is in the language ⟦Gex⟧ generated by Gex. [sent-206, score-0.149]
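A derivation of Gex can be replayed in code. The sketch below uses an encoding of our own choosing (nested tuples for trees, plain strings for variables and rank-0 symbols) to encode the productions S → A(α, α) | A(β, β) | σ(α, β) and A(x1, x2) → A(σ(x1, S), σ(x2, S)) | σ(x1, x2), and derives S ⇒ A(α, α) ⇒ σ(α, α), whose final tree is terminal:

```python
def subst(t, env):
    """First-order substitution of variables (keys of env) by trees."""
    if isinstance(t, str):
        return env.get(t, t)
    label, children = t
    return (label, tuple(subst(c, env) for c in children))

# Productions of G_ex as (lhs nonterminal, its variables, right-hand side).
P = [
    ("S", (), ("A", ("α", "α"))),
    ("S", (), ("A", ("β", "β"))),
    ("S", (), ("σ", ("α", "β"))),
    ("A", ("x1", "x2"), ("A", (("σ", ("x1", "S")), ("σ", ("x2", "S"))))),
    ("A", ("x1", "x2"), ("σ", ("x1", "x2"))),
]

def apply_at_root(t, prod):
    """Rewrite the nonterminal occurrence at the root of t using prod."""
    lhs, params, rhs = prod
    if isinstance(t, str):  # rank-0 nonterminal such as S
        assert t == lhs and not params
        return rhs
    label, children = t
    assert label == lhs and len(params) == len(children)
    return subst(rhs, dict(zip(params, children)))

# The derivation S ⇒ A(α, α) ⇒ σ(α, α), ending in a terminal tree:
step1 = apply_at_root("S", P[0])
step2 = apply_at_root(step1, P[4])
print(step2)  # ('σ', ('α', 'α'))
```

A full derivation relation would rewrite at arbitrary positions; rewriting only at the root keeps the sketch short.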

40 A major tool is a simple production elimination scheme. For all k ∈ N and ξ ⇒G ζ we note that ξ ∈ CN∪Σ(Xk) if and only if ζ ∈ CN∪Σ(Xk). [sent-218, score-0.252]

41 The CFTG G is start-separated if posS(r) = ∅ for every production ℓ → r ∈ P. [sent-222, score-0.242]

42 In such a CFTG we call each production of the form S → r initial. [sent-226, score-0.216]

43 It requires that the right-hand side of each non-initial production contains at least two terminal or nonterminal symbols. [sent-232, score-0.477]

44 In particular, it eliminates projection productions A(x1) → x1 and unit productions, in which the right-hand side has the same shape as the left-hand side (potentially with a different root symbol and a different order of the variables). [sent-233, score-0.415]

45 A production ℓ → r is growing if |posN∪Σ(r)| ≥ 2. [sent-235, score-0.256]

46 The CFTG G is growing if all of its non-initial productions are growing. [sent-236, score-0.375]

47 The tree language L ⊆ TΣ has finite ∆-ambiguity . [sent-262, score-0.19]

48 This is typically called strong lexicalization (Schabes, 1990; Joshi and Schabes, 1992; Kuhlmann and Satta, 2012) because we require strong equivalence. [sent-267, score-0.177]

49 The production ℓ → r is ∆-lexicalized if pos∆(r) ≠ ∅. [sent-270, score-0.216]

50 The CFTG G is ∆-lexicalized if all its non-initial productions are ∆-lexicalized. [sent-271, score-0.335]
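The two position-based predicates just defined (growing: |posN∪Σ(r)| ≥ 2; ∆-lexicalized: pos∆(r) ≠ ∅) can be sketched over a nested-tuple tree encoding. The encoding and function names below are ours, not the paper's:

```python
def count_positions(t, labels):
    """|pos_L(t)|: the number of positions of t labeled by a symbol in `labels`."""
    if isinstance(t, str):
        return 1 if t in labels else 0
    label, children = t
    return (1 if label in labels else 0) + sum(
        count_positions(c, labels) for c in children
    )

N = {"S", "A"}           # nonterminals of the running example
SIGMA = {"σ", "α", "β"}  # terminal alphabet

def is_growing(rhs):
    """ℓ → r is growing iff |pos_{N∪Σ}(r)| ≥ 2."""
    return count_positions(rhs, N | SIGMA) >= 2

def is_delta_lexicalized(rhs, delta):
    """ℓ → r is ∆-lexicalized iff pos_∆(r) ≠ ∅."""
    return count_positions(rhs, delta) > 0

print(is_growing("x1"))                                  # False: a projection production
print(is_growing(("σ", ("α", "x1"))))                    # True: two N∪Σ positions
print(is_delta_lexicalized(("σ", ("x1", "x2")), {"α"}))  # False: no α occurs
```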

51 First, we define our simple production elimination scheme, which we will use in the following. [sent-276, score-0.252]

52 Roughly speaking, a non-initial A-production such that A does not occur in its right-hand side can be eliminated from G by applying it in every other production. The corresponding notion for weak equivalence is called weak lexicalization (Joshi and Schabes, 1992). [sent-277, score-0.252]

53 Let ρ = A(x1, . . . , xk) → r in P be a non-initial production such that posA(r) = ∅. [sent-283, score-0.216]

54 For every other production ρ0 = ℓ0 → r0 in P. [sent-284, score-0.242]

55 In particular, ρ0∅ = ρ0 for every production ρ0, so every production besides the eliminated production ρ is preserved. [sent-288, score-0.484]

56 Conversely, a derivation of G can be simulated directly by G0ρ except for derivation steps ⇒ρG,p using the eliminated production ρ. [sent-294, score-0.304]

57 Since S ≠ A, we know that the nonterminal at position p was generated by another production ρ0. [sent-295, score-0.34]
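The elimination scheme can be sketched as a tree rewriting: every occurrence of the eliminated production's left-hand nonterminal A is replaced by its right-hand side r, with the variables of r bound to the arguments of that occurrence. A minimal illustration under our own nested-tuple encoding (not the paper's code):

```python
def subst(t, env):
    """First-order substitution of variables (keys of env) by trees."""
    if isinstance(t, str):
        return env.get(t, t)
    label, children = t
    return (label, tuple(subst(c, env) for c in children))

def eliminate(t, A, params, r):
    """Apply the production A(params) -> r at every A-occurrence in t."""
    if isinstance(t, str):
        return subst(r, {}) if t == A else t
    label, children = t
    children = tuple(eliminate(c, A, params, r) for c in children)
    if label == A:
        return subst(r, dict(zip(params, children)))
    return (label, children)

# Eliminate B(x1) -> σ(x1, β) inside the right-hand side σ(β, B(x1)):
rhs = ("σ", ("β", ("B", ("x1",))))
print(eliminate(rhs, "B", ("x1",), ("σ", ("x1", "β"))))
# ('σ', ('β', ('σ', ('x1', 'β'))))
```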

58 In the next normalization step we use our production elimination scheme. [sent-300, score-0.252]

59 The goal is to make sure that non-initial monic productions (i. [sent-301, score-0.472]

60 , productions of which the right-hand side contains at most one nonterminal) contain at least one lexical symbol. [sent-303, score-0.372]

61 A sentential form ξ ∈ SF(G) is monic if |posN(ξ)| ≤ 1. [sent-305, score-0.168]

62 The removal of epsilon-productions A → ε and unit productions A → B for context-free grammars (Hopcroft et al. [sent-309, score-0.457]

63 , xk) ⇒+G0 ξ}, where ⇒+G0 is the transitive closure of ⇒G0 and the CFTG G0 = (N, Σ, S, P0) is such that P0 contains exactly the non-∆-lexicalized productions of P. [sent-328, score-0.36]

64 The set ΞA is finite since only finitely many non-∆-lexicalized productions can be used due to the finite ∆-ambiguity of ⟦G⟧. [sent-329, score-0.551]

65 Next, we eliminate all productions of P1 from G1 using Lemma 12 to obtain an equivalent CFTG G2 with the productions P2. [sent-338, score-0.72]

66 In the final step, we drop all non-∆-lexicalized monic productions of P2 to obtain the CFTG G, in which all monic productions are ∆-lexicalized. [sent-339, score-0.944]

67 The CFTG G00ex only has {α, β}-lexicalized non-initial monic productions, so we use a new example. [sent-341, score-0.171]

68 Let the CFTG G = (N, Σ, S, P) be such that Σ = {σ(2), α(0), β(0)}. Figure 5: The relevant derivations using only productions that are not ∆-lexicalized (see Example 14). [sent-344, score-0.335]

69 P contains the productions A(x1) → σ(β, B(x1)), B(x1) → σ(α, A(x1)), B(x1) → σ(x1, β), and S → A(α). [sent-345, score-0.335]

70 Moreover, all its productions are monic, and ⟦G⟧ is finitely ∆-ambiguous for the set ∆ = {α} of lexical symbols. [sent-347, score-0.335]

71 The relevant derivations using only non-∆-lexicalized productions are shown in Figure 5. [sent-350, score-0.335]
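The example's productions can be checked mechanically for ∆-lexicalization with ∆ = {α}; the encoding below is our own sketch, not the paper's code:

```python
def contains_symbol(t, labels):
    """pos_L(t) ≠ ∅: does any position of t carry a symbol from `labels`?"""
    if isinstance(t, str):
        return t in labels
    label, children = t
    return label in labels or any(contains_symbol(c, labels) for c in children)

DELTA = {"α"}
# Right-hand sides of the example's productions, as nested tuples.
rhs_of = {
    "A(x1) -> σ(β, B(x1))": ("σ", ("β", ("B", ("x1",)))),
    "B(x1) -> σ(α, A(x1))": ("σ", ("α", ("A", ("x1",)))),
    "B(x1) -> σ(x1, β)": ("σ", ("x1", "β")),
    "S -> A(α)": ("A", ("α",)),
}
for name, rhs in rhs_of.items():
    print(name, "is ∆-lexicalized:", contains_symbol(rhs, DELTA))
```

The two productions reported as not ∆-lexicalized are exactly the ones driving the derivations of Figure 5.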

72 We call a production ℓ → r terminal if r ∈ TΣ(X); i. [sent-353, score-0.316]

73 If ⟦G2⟧ has finite ∆-ambiguity, then |pos∆(r)| ≥ 2 for all its non-initial terminal productions ℓ → r ∈ P. [sent-361, score-0.335]

74 Moreover, we assume that each nonterminal is useful and that each of its non-initial monic productions is ∆-lexicalized by Theorem 13. [sent-364, score-0.596]

75 We obtain the desired CFTG by simply eliminating each non-initial terminal production ℓ → r ∈ P such that |pos∆(r)| = 1. [sent-365, score-0.35]

76 Recall that ∆ = {α}; the non-initial terminal production (4) violates the requirement of Theorem 15. [sent-374, score-0.216]

77 We eliminate it and obtain the CFTG with the productions S → σ(β, B(α)) | σ(β, σ(α, β)) | σ(β, σ(α, σ(β, σ(α, β)))) and B(x1) → σ(α, σ(β, B(x1))) | σ(α, σ(β, σ(α, σ(β, σ(x1, β))))). [sent-375, score-0.335]

78 This is the reason why we made sure that we have two occurrences of lexical symbols in the terminal productions. [sent-381, score-0.157]

79 After we exchanged one ∆-symbol for a parameter, the resulting terminal production is still ∆-lexicalized. [sent-382, score-0.316]

80 Lexical items that are guessed for distinct (occurrences of) productions are transported to distinct (occurrences of) terminal productions [cf. [sent-383, score-0.844]

81 We let N0 = N × ∆ be a new set of nonterminals such that rk(⟨A, δ⟩) = rk(A) + 1 for every A ∈ N and δ ∈ ∆. [sent-392, score-0.162]

82 This parameter is handed to the (lexicographically) first nonterminal in the right-hand side until it is resolved in a terminal production. [sent-395, score-0.261]

83 For each nonterminal A-production ρ = ℓ → r in P let ρδ = ⟨A, δ⟩(x1, . [sent-412, score-0.174]

84 Roughly speaking, we select the lexicographically smallest occurrence of a nonterminal in the right-hand side and pass the lexical symbol δ in the extra parameter to it. [sent-417, score-0.299]

85 The extra parameter is used in terminal productions, so let ρ = ℓ → r in P. Figure 7: the original terminal production S → σ(α, α) and the production ⟨S, α⟩(x1) → σ(x1, α) derived from it (see Theorem 17). [sent-418, score-0.682]

86 With these productions we obtain the CFTG G0 = (N ∪ N0, Σ, S, P ∪ P0 ∪ P00), where P0 is the union of the sets {ρδ | δ ∈ ∆} over all productions ρ = ℓ → r ∈ P with ℓ ≠ S and posN(r) ≠ ∅, and P00 is the union of the corresponding sets over all ρ = ℓ → r ∈ P with ℓ ≠ S and posN(r) = ∅. [sent-425, score-0.335]

87 It is easy to prove that those new productions manage the desired transport of the extra parameter if it holds the value indicated in the nonterminal. [sent-426, score-0.335]

88 Finally, we replace each non-initial non-∆-lexicalized production in G0 by new productions that guess a lexical symbol and add it to the new parameter of the (lexicographically) first nonterminal of N in the right-hand side. [sent-427, score-0.718]

89 Note that each production ℓ → r ∈ Pnil contains at least one occurrence of a nonterminal of N (because all monic productions of G are ∆-lexicalized). [sent-429, score-0.688]

90 Now all non-initial non-∆-lexicalized productions from P can be removed, so we obtain the CFTG G00, which is given by (N ∪ N0, Σ, S, R) with R = (P ∪ P000) \ Pnil. [sent-430, score-0.369]
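The transport step behind the productions ρδ can be sketched as follows: the paired nonterminal ⟨A, δ⟩ carries one extra parameter, which is handed to the first nonterminal occurrence in the right-hand side. A preorder traversal visits positions in lexicographic order of their Gorn addresses, matching the "lexicographically smallest" occurrence. The encoding and names are ours, not the paper's:

```python
NONTERMINALS = {"S", "A", "B"}

def hand_down(t, delta, extra):
    """Attach the extra parameter `extra` to the first (preorder) nonterminal
    occurrence in t, renaming it to the paired nonterminal <N,delta>.
    Returns (rewritten tree, whether an occurrence was found)."""
    if isinstance(t, str):
        if t in NONTERMINALS:  # rank-0 nonterminal
            return ((f"<{t},{delta}>", (extra,)), True)
        return (t, False)
    label, children = t
    if label in NONTERMINALS:
        return ((f"<{label},{delta}>", children + (extra,)), True)
    new_children, done = [], False
    for c in children:
        if not done:
            c, done = hand_down(c, delta, extra)
        new_children.append(c)
    return ((label, tuple(new_children)), done)

# Hand the guessed symbol α down into the right-hand side σ(β, B(x1)):
rhs = ("σ", ("β", ("B", ("x1",))))
print(hand_down(rhs, "α", "x2")[0])
# ('σ', ('β', ('<B,α>', ('x1', 'x2'))))
```

The rank of the rewritten nonterminal occurrence grows by exactly one, matching the rank increase of at most 1 stated in the abstract.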

91 only transported until it is resolved in a production with at least two lexical items. [sent-435, score-0.25]

92 If we change the lexicalization construction as indicated before this example (writing Sδ for ⟨S, δ⟩), then all the productions Sδ(x1) → Aδ(δ0, δ0, x1) are replaced by the productions Sδ(x1) → A(x1, δ). [sent-440, score-0.698]

93 Moreover, the productions (5) can be replaced by the productions A(x1, x2) → A(σ(x1, Sδ(δ)), σ(x2, S)), and then the nonterminals Aδ and their productions can be removed, which leaves only 15 productions. [sent-441, score-1.005]

94 Since the normal form constructions preserve the nonterminal rank, the proof of Theorem 17 shows that CFTG(k) are strongly lexicalized by CFTG(k+1). [sent-443, score-0.312]

95 (1980) that the classes CFTG(k) with k ∈ N induce an infinite hierarchy of string languages, but it remains an open problem whether the rank increase in our lexicalization construction is necessary. [sent-447, score-0.181]

96 Epsilon-free grammars and lexicalized grammars that generate the class of the mildly context-sensitive languages. [sent-497, score-0.342]

97 The equivalence of tree adjoining grammars and monadic linear context-free tree grammars. [sent-591, score-0.522]

98 Adjunction as substitution: An algebraic formulation of regular, context-free and tree adjoining languages. [sent-626, score-0.223]

99 Parsing strategies with ‘lexicalized’ grammars: Application to tree adjoining grammars. [sent-682, score-0.196]

100 Tree insertion grammar: A cubic-time parsable formalism that lexicalizes context-free grammars without changing the trees produced. [sent-691, score-0.148]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('cftg', 0.673), ('productions', 0.335), ('production', 0.216), ('monic', 0.137), ('schabes', 0.127), ('lexicalization', 0.127), ('lexicalize', 0.125), ('nonterminal', 0.124), ('grammars', 0.122), ('xk', 0.119), ('kuhlmann', 0.114), ('tree', 0.105), ('theorem', 0.104), ('terminal', 0.1), ('jgk', 0.091), ('adjoining', 0.091), ('finite', 0.085), ('joshi', 0.081), ('rk', 0.08), ('posn', 0.08), ('normal', 0.072), ('ecseg', 0.068), ('lexicographically', 0.068), ('lexicalized', 0.066), ('nonterminals', 0.061), ('tk', 0.059), ('symbols', 0.057), ('greibach', 0.057), ('kepser', 0.057), ('monadic', 0.057), ('posa', 0.057), ('stamer', 0.057), ('strongly', 0.05), ('let', 0.05), ('gex', 0.05), ('equivalent', 0.05), ('alphabet', 0.046), ('finitely', 0.046), ('rogers', 0.046), ('steinby', 0.046), ('derivation', 0.044), ('satta', 0.044), ('symbol', 0.043), ('equivalence', 0.042), ('ha', 0.041), ('automata', 0.041), ('sf', 0.04), ('yd', 0.04), ('guessed', 0.04), ('growing', 0.04), ('pos', 0.039), ('tag', 0.039), ('side', 0.037), ('yves', 0.036), ('elimination', 0.036), ('elim', 0.034), ('engelfriet', 0.034), ('fujiyoshi', 0.034), ('hopcroft', 0.034), ('noninitial', 0.034), ('onnich', 0.034), ('pnil', 0.034), ('rozenberg', 0.034), ('transported', 0.034), ('aravind', 0.033), ('mildly', 0.032), ('grammar', 0.031), ('sentential', 0.031), ('guez', 0.03), ('kanazawa', 0.03), ('treeadjoining', 0.03), ('rn', 0.029), ('item', 0.029), ('rounds', 0.028), ('construction', 0.028), ('formal', 0.028), ('stuttgart', 0.027), ('algebraic', 0.027), ('smallest', 0.027), ('marco', 0.027), ('rank', 0.026), ('sides', 0.026), ('formalism', 0.026), ('every', 0.026), ('th', 0.025), ('cfg', 0.025), ('strong', 0.025), ('formally', 0.025), ('weakly', 0.025), ('rewriting', 0.025), ('positions', 0.024), ('categorial', 0.024), ('position', 0.023), ('weak', 0.023), ('grzegorz', 0.023), ('akio', 0.023), ('arto', 0.023), ('baader', 0.023), ('dit', 0.023), ('feosr', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 185 acl-2012-Strong Lexicalization of Tree Adjoining Grammars

Author: Andreas Maletti ; Joost Engelfriet

Abstract: Recently, it was shown (KUHLMANN, SATTA: Tree-adjoining grammars are not closed under strong lexicalization. Comput. Linguist., 2012) that finitely ambiguous tree adjoining grammars cannot be transformed into a normal form (preserving the generated tree language), in which each production contains a lexical symbol. A more powerful model, the simple context-free tree grammar, admits such a normal form. It can be effectively constructed and the maximal rank of the nonterminals only increases by 1. Thus, simple context-free tree grammars strongly lexicalize tree adjoining grammars and themselves.

2 0.12043514 139 acl-2012-MIX Is Not a Tree-Adjoining Language

Author: Makoto Kanazawa ; Sylvain Salvati

Abstract: The language MIX consists of all strings over the three-letter alphabet {a, b, c} that contain an equal number of occurrences of each letter. We prove Joshi’s (1985) conjecture that MIX is not a tree-adjoining language.

3 0.10334387 38 acl-2012-Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing

Author: Hiroyuki Shindo ; Yusuke Miyao ; Akinori Fujino ; Masaaki Nagata

Abstract: We propose Symbol-Refined Tree Substitution Grammars (SR-TSGs) for syntactic parsing. An SR-TSG is an extension of the conventional TSG model where each nonterminal symbol can be refined (subcategorized) to fit the training data. We aim to provide a unified model where TSG rules and symbol refinement are learned from training data in a fully automatic and consistent fashion. We present a novel probabilistic SR-TSG model based on the hierarchical Pitman-Yor Process to encode backoff smoothing from a fine-grained SR-TSG to simpler CFG rules, and develop an efficient training method based on Markov Chain Monte Carlo (MCMC) sampling. Our SR-TSG parser achieves an F1 score of 92.4% in the Wall Street Journal (WSJ) English Penn Treebank parsing task, which is a 7.7 point improvement over a conventional Bayesian TSG parser, and better than state-of-the-art discriminative reranking parsers.

4 0.089319855 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets

Author: Adam Pauls ; Dan Klein

Abstract: We propose a simple generative, syntactic language model that conditions on overlapping windows of tree context (or treelets) in the same way that n-gram language models condition on overlapping windows of linear context. We estimate the parameters of our model by collecting counts from automatically parsed text using standard n-gram language model estimation techniques, allowing us to train a model on over one billion tokens of data using a single machine in a matter of hours. We evaluate on perplexity and a range of grammaticality tasks, and find that we perform as well or better than n-gram models and other generative baselines. Our model even competes with state-of-the-art discriminative models hand-designed for the grammaticality tasks, despite training on positive data alone. We also show fluency improvements in a preliminary machine translation experiment.

5 0.075832501 181 acl-2012-Spectral Learning of Latent-Variable PCFGs

Author: Shay B. Cohen ; Karl Stratos ; Michael Collins ; Dean P. Foster ; Lyle Ungar

Abstract: We introduce a spectral learning algorithm for latent-variable PCFGs (Petrov et al., 2006). Under a separability (singular value) condition, we prove that the method provides consistent parameter estimates.

6 0.069837794 108 acl-2012-Hierarchical Chunk-to-String Translation

7 0.067253076 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers

8 0.064510927 84 acl-2012-Estimating Compact Yet Rich Tree Insertion Grammars

9 0.056849975 154 acl-2012-Native Language Detection with Tree Substitution Grammars

10 0.049725518 190 acl-2012-Syntactic Stylometry for Deception Detection

11 0.048405778 146 acl-2012-Modeling Topic Dependencies in Hierarchical Text Categorization

12 0.042615924 83 acl-2012-Error Mining on Dependency Trees

13 0.042208634 9 acl-2012-A Cost Sensitive Part-of-Speech Tagging: Differentiating Serious Errors from Minor Errors

14 0.037061572 19 acl-2012-A Ranking-based Approach to Word Reordering for Statistical Machine Translation

15 0.034874689 196 acl-2012-The OpenGrm open-source finite-state grammar software libraries

16 0.032404158 106 acl-2012-Head-driven Transition-based Parsing with Top-down Prediction

17 0.031995177 88 acl-2012-Exploiting Social Information in Grounded Language Learning via Grammatical Reduction

18 0.031300955 109 acl-2012-Higher-order Constituent Parsing and Parser Combination

19 0.031277388 107 acl-2012-Heuristic Cube Pruning in Linear Time

20 0.03121857 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.097), (1, -0.006), (2, -0.067), (3, -0.062), (4, -0.072), (5, -0.005), (6, -0.007), (7, 0.103), (8, 0.013), (9, 0.016), (10, -0.051), (11, -0.086), (12, -0.04), (13, 0.03), (14, 0.044), (15, -0.113), (16, 0.023), (17, -0.037), (18, 0.03), (19, 0.019), (20, 0.041), (21, 0.016), (22, -0.062), (23, -0.003), (24, -0.023), (25, 0.057), (26, 0.039), (27, -0.038), (28, 0.021), (29, 0.068), (30, 0.004), (31, 0.079), (32, -0.12), (33, 0.06), (34, 0.037), (35, -0.025), (36, -0.09), (37, -0.011), (38, -0.128), (39, -0.154), (40, 0.09), (41, -0.023), (42, 0.101), (43, 0.1), (44, 0.008), (45, -0.002), (46, -0.039), (47, 0.107), (48, 0.248), (49, -0.013)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96385664 185 acl-2012-Strong Lexicalization of Tree Adjoining Grammars

Author: Andreas Maletti ; Joost Engelfriet

Abstract: Recently, it was shown (KUHLMANN, SATTA: Tree-adjoining grammars are not closed under strong lexicalization. Comput. Linguist., 2012) that finitely ambiguous tree adjoining grammars cannot be transformed into a normal form (preserving the generated tree language), in which each production contains a lexical symbol. A more powerful model, the simple context-free tree grammar, admits such a normal form. It can be effectively constructed and the maximal rank of the nonterminals only increases by 1. Thus, simple context-free tree grammars strongly lexicalize tree adjoining grammars and themselves.

2 0.82345861 139 acl-2012-MIX Is Not a Tree-Adjoining Language

Author: Makoto Kanazawa ; Sylvain Salvati

Abstract: The language MIX consists of all strings over the three-letter alphabet {a, b, c} that contain an equal number of occurrences of each letter. We prove Joshi’s (1985) conjecture that MIX is not a tree-adjoining language.

3 0.54832214 181 acl-2012-Spectral Learning of Latent-Variable PCFGs

Author: Shay B. Cohen ; Karl Stratos ; Michael Collins ; Dean P. Foster ; Lyle Ungar

Abstract: We introduce a spectral learning algorithm for latent-variable PCFGs (Petrov et al., 2006). Under a separability (singular value) condition, we prove that the method provides consistent parameter estimates.

4 0.52121937 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers

Author: Bevan Jones ; Mark Johnson ; Sharon Goldwater

Abstract: Many semantic parsing models use tree transformations to map between natural language and meaning representation. However, while tree transformations are central to several state-of-the-art approaches, little use has been made of the rich literature on tree automata. This paper makes the connection concrete with a tree transducer based semantic parsing model and suggests that other models can be interpreted in a similar framework, increasing the generality of their contributions. In particular, this paper further introduces a variational Bayesian inference algorithm that is applicable to a wide class of tree transducers, producing state-of-the-art semantic parsing results while remaining applicable to any domain employing probabilistic tree transducers.

5 0.45551684 196 acl-2012-The OpenGrm open-source finite-state grammar software libraries

Author: Brian Roark ; Richard Sproat ; Cyril Allauzen ; Michael Riley ; Jeffrey Sorensen ; Terry Tai

Abstract: In this paper, we present a new collection of open-source software libraries that provides command line binary utilities and library classes and functions for compiling regular expression and context-sensitive rewrite rules into finite-state transducers, and for n-gram language modeling. The OpenGrm libraries use the OpenFst library to provide an efficient encoding of grammars and general algorithms for building, modifying and applying models.

6 0.42118496 107 acl-2012-Heuristic Cube Pruning in Linear Time

7 0.41954127 38 acl-2012-Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing

8 0.41077837 83 acl-2012-Error Mining on Dependency Trees

9 0.3868064 84 acl-2012-Estimating Compact Yet Rich Tree Insertion Grammars

10 0.36724436 108 acl-2012-Hierarchical Chunk-to-String Translation

11 0.35364181 137 acl-2012-Lemmatisation as a Tagging Task

12 0.3227984 154 acl-2012-Native Language Detection with Tree Substitution Grammars

13 0.30201125 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets

14 0.29988918 9 acl-2012-A Cost Sensitive Part-of-Speech Tagging: Differentiating Serious Errors from Minor Errors

15 0.28797749 190 acl-2012-Syntactic Stylometry for Deception Detection

16 0.26034781 184 acl-2012-String Re-writing Kernel

17 0.21342623 146 acl-2012-Modeling Topic Dependencies in Hierarchical Text Categorization

18 0.20060325 122 acl-2012-Joint Evaluation of Morphological Segmentation and Syntactic Parsing

19 0.19687109 105 acl-2012-Head-Driven Hierarchical Phrase-based Translation

20 0.19605578 111 acl-2012-How Are Spelling Errors Generated and Corrected? A Study of Corrected and Uncorrected Spelling Errors Using Keystroke Logs


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.4), (23, 0.019), (26, 0.026), (28, 0.025), (30, 0.029), (35, 0.018), (37, 0.018), (39, 0.034), (74, 0.021), (82, 0.018), (84, 0.039), (85, 0.034), (90, 0.071), (92, 0.073), (94, 0.016), (99, 0.043)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.81241357 185 acl-2012-Strong Lexicalization of Tree Adjoining Grammars

Author: Andreas Maletti ; Joost Engelfriet

Abstract: Recently, it was shown (KUHLMANN, SATTA: Tree-adjoining grammars are not closed under strong lexicalization. Comput. Linguist., 2012) that finitely ambiguous tree adjoining grammars cannot be transformed into a normal form (preserving the generated tree language), in which each production contains a lexical symbol. A more powerful model, the simple context-free tree grammar, admits such a normal form. It can be effectively constructed and the maximal rank of the nonterminals only increases by 1. Thus, simple context-free tree grammars strongly lexicalize tree adjoining grammars and themselves.

2 0.65595782 194 acl-2012-Text Segmentation by Language Using Minimum Description Length

Author: Hiroshi Yamaguchi ; Kumiko Tanaka-Ishii

Abstract: The problem addressed in this paper is to segment a given multilingual document into segments for each language and then identify the language of each segment. The problem was motivated by an attempt to collect a large amount of linguistic data for non-major languages from the web. The problem is formulated in terms of obtaining the minimum description length of a text, and the proposed solution finds the segments and their languages through dynamic programming. Empirical results demonstrating the potential of this approach are presented for experiments using texts taken from the Universal Declaration of Human Rights and Wikipedia, covering more than 200 languages.

3 0.6466217 161 acl-2012-Polarity Consistency Checking for Sentiment Dictionaries

Author: Eduard Dragut ; Hong Wang ; Clement Yu ; Prasad Sistla ; Weiyi Meng

Abstract: Polarity classification of words is important for applications such as Opinion Mining and Sentiment Analysis. A number of sentiment word/sense dictionaries have been manually or (semi)automatically constructed. The dictionaries have substantial inaccuracies. Besides obvious instances, where the same word appears with different polarities in different dictionaries, the dictionaries exhibit complex cases, which cannot be detected by mere manual inspection. We introduce the concept of polarity consistency of words/senses in sentiment dictionaries in this paper. We show that the consistency problem is NP-complete. We reduce the polarity consistency problem to the satisfiability problem and utilize a fast SAT solver to detect inconsistencies in a sentiment dictionary. We perform experiments on four sentiment dictionaries and WordNet.

4 0.3393392 139 acl-2012-MIX Is Not a Tree-Adjoining Language

Author: Makoto Kanazawa ; Sylvain Salvati

Abstract: The language MIX consists of all strings over the three-letter alphabet {a, b, c} that contain an equal n-luemttebrer a olpfh occurrences }o tfh heaatch c olentttaeinr. We prove Joshi’s (1985) conjecture that MIX is not a tree-adjoining language.

5 0.33728683 84 acl-2012-Estimating Compact Yet Rich Tree Insertion Grammars

Author: Elif Yamangil ; Stuart Shieber

Abstract: We present a Bayesian nonparametric model for estimating tree insertion grammars (TIG), building upon recent work in Bayesian inference of tree substitution grammars (TSG) via Dirichlet processes. Under our general variant of TIG, grammars are estimated via the Metropolis-Hastings algorithm that uses a context free grammar transformation as a proposal, which allows for cubic-time string parsing as well as tree-wide joint sampling of derivations in the spirit of Cohn and Blunsom (2010). We use the Penn treebank for our experiments and find that our proposal Bayesian TIG model not only has competitive parsing performance but also finds compact yet linguistically rich TIG representations of the data.

6 0.30238575 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers

7 0.29648262 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning

8 0.29551315 31 acl-2012-Authorship Attribution with Author-aware Topic Models

9 0.29519275 38 acl-2012-Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing

10 0.2945911 36 acl-2012-BIUTEE: A Modular Open-Source System for Recognizing Textual Entailment

11 0.29158676 205 acl-2012-Tweet Recommendation with Graph Co-Ranking

12 0.29143402 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition

13 0.28877014 167 acl-2012-QuickView: NLP-based Tweet Search

14 0.28788623 154 acl-2012-Native Language Detection with Tree Substitution Grammars

15 0.28685713 191 acl-2012-Temporally Anchored Relation Extraction

16 0.28630036 10 acl-2012-A Discriminative Hierarchical Model for Fast Coreference at Large Scale

17 0.28566331 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures

18 0.28516161 156 acl-2012-Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information

19 0.28501391 181 acl-2012-Spectral Learning of Latent-Variable PCFGs

20 0.28402433 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence