acl acl2010 acl2010-21 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Andreas Maletti
Abstract: A characterization of the expressive power of synchronous tree-adjoining grammars (STAGs) in terms of tree transducers (or equivalently, synchronous tree substitution grammars) is developed. Essentially, a STAG corresponds to an extended tree transducer that uses explicit substitution in both the input and output. This characterization allows the easy integration of STAG into toolkits for extended tree transducers. Moreover, the applicability of the characterization to several representational and algorithmic problems is demonstrated.
Reference: text
sentIndex sentText sentNum sentScore
1 A characterization of the expressive power of synchronous tree-adjoining grammars (STAGs) in terms of tree transducers (or equivalently, synchronous tree substitution grammars) is developed. [sent-4, score-1.37]
2 Essentially, a STAG corresponds to an extended tree transducer that uses explicit substitution in both the input and output. [sent-5, score-0.93]
3 This characterization allows the easy integration of STAG into toolkits for extended tree transducers. [sent-6, score-0.533]
4 Top-down tree transducers (Rounds, 1970; Thatcher, 1970) have been heavily investigated in the formal language community (Gécseg and Steinby, 1984; Gécseg and Steinby, 1997), but as argued by Shieber (2004) they are still too weak for syntax-based machine translation. [sent-12, score-0.626]
5 Instead Shieber (2004) proposes synchronous tree substitution grammars (STSGs) and develops an equivalent bimorphism (Arnold and Dauchet, 1982) characterization. [sent-13, score-0.645]
6 This characterization eventually led to the rediscovery of extended tree transducers (Graehl and Knight, 2004; Knight and Graehl, 2005; Graehl et al. [sent-14, score-0.684]
7 The bimorphisms used are rather unconventional because they consist of a regular tree language and two embedded tree transducers (instead of two tree homomorphisms). [sent-20, score-1.301]
8 Such embedded tree transducers (Shieber, 2006) are particular macro tree transducers (Courcelle and Franchi-Zannettacci, 1982; Engelfriet and Vogler, 1985). [sent-21, score-1.163]
9 We will develop a tree transducer model that can simulate STAGs. [sent-23, score-0.649]
10 It turns out that the adjunction operation of an STAG can be explained easily by explicit substitution. [sent-24, score-0.286]
11 We prove that any tree transformation computed by an STAG can also be computed by an STSG using explicit substitution. [sent-26, score-0.532]
12 Thus, a simple evaluation procedure that performs the explicit substitution is all that is needed to simulate an STAG in a toolkit for STSGs or extended tree transducers like TIBURON by May and Knight (2006). [sent-27, score-0.877]
13 Finally, we also present a complete tree transducer model that is as powerful as STAG, which is an extension of the embedded tree transducers of Shieber (2006). [sent-30, score-1.172]
14 © 2010 Association for Computational Linguistics, pages 1067–1076. 2 Notation We quickly recall some central notions about trees, tree languages, and tree transformations. [sent-33, score-0.654]
15 The tree with a root node labeled σ is written σ(t1 , . [sent-61, score-0.431]
16 A tree language is any subset of TΣ (V ) for some alphabet Σ and set V . [sent-68, score-0.367]
17 Given another alphabet ∆ and a set Y, a tree transformation is a relation τ ⊆ TΣ(V) × T∆(Y). [sent-69, score-0.441]
18 The translation of τ is the relation {(yd(t), yd(u)) | (t, u) ∈ τ} where yd(t), the yield of t, is the sequence of leaf labels in a left-to-right tree traversal of t. [sent-73, score-0.356]
19 The yield of the third tree in Figure 1 is “the N saw the N”. [sent-74, score-0.362]
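The yield described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: trees are modelled as `(label, children)` pairs, the helper names `leaf` and `yd` are chosen here for convenience, and the example tree is hypothetical (it is not copied from Figure 1, although it has the same yield as the one quoted above).

```python
# Assumed representation: a tree is a (label, children) pair,
# and a leaf is a node with an empty child list.
def leaf(label):
    return (label, [])


def yd(tree):
    """Return the leaf labels of `tree` in left-to-right order."""
    label, children = tree
    if not children:
        return [label]
    labels = []
    for child in children:
        labels.extend(yd(child))
    return labels


# Hypothetical example tree (not taken from the paper's figures).
example = ("S", [
    ("NP", [leaf("the"), leaf("N")]),
    ("VP", [leaf("saw"), ("NP", [leaf("the"), leaf("N")])]),
])

print(" ".join(yd(example)))
```

The translation of a tree transformation is then obtained by applying `yd` to both components of every pair in the relation.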
20 3 Substitution A standard operation on (labeled) trees is substitution, which replaces leaves with a specified label in one tree by another tree. [sent-76, score-0.469]
21 result of) the substitution that replaces all leaves labeled A in the tree t by the tree u. [sent-81, score-0.896]
22 Clearly, this problem is avoided if the source tree t contains only one leaf labeled A. [sent-100, score-0.401]
23 We call a tree A-proper if it contains exactly one leaf with label A. [sent-101, score-0.404]
24 For example, the tree t of Figure 1 is ‘saw’-proper, and the tree u of Figure 1 is ‘the’- and ‘N’-proper. [sent-103, score-0.327]
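A minimal sketch of substitution and of the A-proper check, assuming the same `(label, children)` tree encoding as a working convention. The function names are illustrative, and the trees `t` and `u` are hypothetical stand-ins chosen to agree with the statements about Figure 1 above; they are not guaranteed to be the trees drawn in the paper.

```python
def leaf(label):
    return (label, [])


def substitute(t, u, a):
    """Return t[u]_a: every leaf of t labelled `a` is replaced by u."""
    label, children = t
    if not children:
        return u if label == a else t
    return (label, [substitute(child, u, a) for child in children])


def count_leaves(t, a):
    """Count the leaves of t that carry the label `a`."""
    label, children = t
    if not children:
        return 1 if label == a else 0
    return sum(count_leaves(child, a) for child in children)


def is_proper(t, a):
    """A tree is a-proper if it has exactly one leaf labelled `a`."""
    return count_leaves(t, a) == 1


# Hypothetical trees consistent with the description of Figure 1.
t = ("S", [leaf("NP"), ("VP", [("V", [leaf("saw")]), leaf("NP")])])
u = ("NP", [("DT", [leaf("the")]), leaf("N")])

result = substitute(t, u, "NP")
print(is_proper(t, "saw"), is_proper(t, "NP"), is_proper(u, "N"))
```

In this sketch `t` has two leaves labelled NP, so both are replaced at once, which is exactly the situation that the A-proper restriction rules out.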
25 The tree t[u]NP in Figure 1 only shows the result of the substitution. [sent-105, score-0.327]
26 It cannot be inferred from the tree alone how it was obtained (if we do not know t and u). [sent-106, score-0.327]
27 To obtain t[u]NP (the right-most tree in Figure 1), we have to evaluate · [·]NP(t, u). [sent-117, score-0.327]
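The difference between a tree that records a substitution and the tree that results from performing it can be sketched as follows. The encoding is an assumption made for illustration: the binary substitution symbol · [·]A is represented by a node whose label is the pair `("subst", A)`, and `evaluate` is a hypothetical name for the evaluation step.

```python
def leaf(label):
    return (label, [])


def substitute(t, u, a):
    """Return t[u]_a: every leaf of t labelled `a` is replaced by u."""
    label, children = t
    if not children:
        return u if label == a else t
    return (label, [substitute(child, u, a) for child in children])


def subst_symbol(a):
    """Assumed encoding of the binary explicit-substitution symbol for label a."""
    return ("subst", a)


def evaluate(t):
    """Evaluate all explicit substitutions in t, innermost first."""
    label, children = t
    evaluated = [evaluate(child) for child in children]
    if isinstance(label, tuple) and label[0] == "subst":
        target, replacement = evaluated
        return substitute(target, replacement, label[1])
    return (label, evaluated)


# Hypothetical trees; the explicit term records t and u separately.
t = ("S", [leaf("NP"), ("VP", [("V", [leaf("saw")]), leaf("NP")])])
u = ("NP", [("DT", [leaf("the")]), leaf("N")])
explicit = (subst_symbol("NP"), [t, u])

print(evaluate(explicit) == substitute(t, u, "NP"))
```

The unevaluated term `explicit` still shows which trees were combined, whereas the evaluated tree does not.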
28 4 Extended tree transducer An extended tree transducer is a theoretical model that computes a tree transformation. [sent-135, score-1.601]
29 Their popularity in machine translation is due to Shieber (2004), in which it is shown that extended tree transducers are essentially (up to a relabeling) as expressive as synchronous tree substitution grammars (STSG). [sent-139, score-1.337]
30 (2009) our extended tree transducers are linear, nondeleting extended top-down tree transducers. [sent-145, score-1.053]
31 Recall that any tree of CΣ(Xk) contains each variable of Xk = {x1, . [sent-150, score-0.327]
32 A sentential form is a tree that contains exclusively output symbols towards the root and remaining parts of the input headed by a state as leaves. [sent-164, score-0.523]
33 , qk (tk) headed by the corresponding state, respectively, and • replacing the selected leaf in ξ by the tree constructed in the previous item. [sent-177, score-0.486]
34 Formally, a sentential form of the XTT M is a tree of SF = T∆(Q(TΣ)) where Q(TΣ) = {q(t) | q ∈ Q, t ∈ TΣ}. [sent-179, score-0.357]
35 The tree transformation computed by M is the relation = . [sent-189, score-0.431]
36 5 Synchronous tree-adjoining grammar XTT are a simple, natural model for tree transformations; however, they are not suitably expressive for all applications in machine translation (Shieber, 2007). [sent-197, score-0.381]
37 In particular, all tree transformations of XTT have a certain locality condition, which yields that the input tree and its corresponding translation cannot be separated by an unbounded distance. [sent-198, score-0.712]
38 A tree-adjoining grammar essentially is a regular tree grammar (Gécseg and Steinby, 1984; Gécseg and [sent-201, score-0.512]
39 Figure 4: Illustration of an adjunction (panels: derived tree, auxiliary tree) taken from Nesson et al. [sent-202, score-0.592]
40 Figure 5: Illustration of the adjunction of Figure 4 using explicit substitution. [sent-206, score-0.286]
41 Roughly speaking, an adjunction replaces a node (not necessarily a leaf) by an auxiliary tree, which has exactly one distinguished foot node. [sent-208, score-0.441]
42 Traditionally, the root label and the label of the foot node coincide in an auxiliary tree aside from a star index that marks the foot node. [sent-210, score-0.689]
43 For example, if the root node of an auxiliary tree is labeled A, then the foot node is traditionally labeled A⋆ [sent-211, score-0.65]
44 Formally, the adjunction of the auxiliary tree u with root label A (and foot node label A⋆ [sent-214, score-0.832]
45 The result of the adjunction of Figure 4 using explicit substitution is displayed in Figure 5. [sent-229, score-0.447]
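The idea that adjunction can be expressed with explicit substitution can be sketched as below. Everything here is an assumption for illustration: the tree encoding, the function names, the ASCII foot label `"N*"`, and the French example trees, which only approximate the example attributed to Nesson et al. The sketch also omits the marker that blocks further adjunction at an adjoined root.

```python
def leaf(label):
    return (label, [])


def substitute(t, u, a):
    """Return t[u]_a: every leaf of t labelled `a` is replaced by u."""
    label, children = t
    if not children:
        return u if label == a else t
    return (label, [substitute(child, u, a) for child in children])


def evaluate(t):
    """Evaluate nodes labelled ("subst", a), innermost first."""
    label, children = t
    evaluated = [evaluate(child) for child in children]
    if isinstance(label, tuple) and label[0] == "subst":
        target, replacement = evaluated
        return substitute(target, replacement, label[1])
    return (label, evaluated)


def adjoin(t, path, aux, foot):
    """Direct adjunction: the subtree at `path` moves to the foot of `aux`."""
    if not path:
        return substitute(aux, t, foot)
    label, children = t
    i = path[0]
    return (label, children[:i] + [adjoin(children[i], path[1:], aux, foot)] + children[i + 1:])


def adjoin_explicit(t, path, aux, foot):
    """Same adjunction, but only recorded as an explicit substitution node."""
    if not path:
        return (("subst", foot), [aux, t])
    label, children = t
    i = path[0]
    return (label, children[:i] + [adjoin_explicit(children[i], path[1:], aux, foot)] + children[i + 1:])


# Hypothetical initial and auxiliary trees for "les bonbons (rouges)".
initial = ("NP", [("DT", [leaf("les")]), ("N", [leaf("bonbons")])])
auxiliary = ("N", [leaf("N*"), ("ADJ", [leaf("rouges")])])

derived = adjoin(initial, [1], auxiliary, "N*")
recorded = adjoin_explicit(initial, [1], auxiliary, "N*")
print(evaluate(recorded) == derived)
```

Under these assumptions, evaluating the recorded tree gives the same result as the direct adjunction, which mirrors the relationship between Figure 4 and Figure 5.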
46 Figure 6: A TAG (one initial tree and three auxiliary trees) for the copy string language {wcw | w ∈ {a, b}∗} taken from Shieber (2006). [sent-235, score-1.563]
47 A derivation is a chain of trees that starts with an initial tree and each derived tree is obtained from the previous one in the chain by adjunction of an auxiliary tree. [sent-237, score-1.091]
48 , if an auxiliary tree can be adjoined, then we need to make an adjunction. [sent-240, score-0.412]
49 Thus, a derivation starting from an initial tree to a derived tree is complete if no adjunction is possible in the derived tree. [sent-241, score-0.948]
50 This is easily achieved by labeling the root of each adjoined auxiliary tree by a special marker. [sent-243, score-0.509]
51 Traditionally, the root label A of an auxiliary tree is replaced by A∅ once adjoined. [sent-244, score-0.48]
52 Since we assume that there are no auxiliary trees with such a root label, no further adjunction is possible at such nodes. [sent-245, score-0.4]
53 A pair of auxiliary trees is then adjoined to linked nodes (one in each tree of the sentential form) in the expected manner. [sent-257, score-0.555]
54 6 Main result In this section, we will present our main result. [sent-268, score-0.401]
55 Thus, for every tree transformation computed by a STAG, there is an extended tree transducer that computes a representation of the tree transformation using explicit substitution. [sent-270, score-1.6]
56 For every extended tree transducer M that uses explicit substitution, we can construct a STAG that computes the tree transformation represented by τM up to a relabeling (a mapping that consistently replaces node labels throughout the tree). [sent-272, score-1.31]
57 If we replace the extended tree transducer by a STSG, then the result holds even without the relabeling. [sent-274, score-0.669]
58 Theorem 1 For every STAG G, there exists an extended tree transducer M such that τG = {(tE, uE) | (t, u) ∈ τM} . [sent-275, score-0.669]
59 Conversely, for every extended tree transducer M, there exists a STAG G such that the above relation holds up to a relabeling. [sent-276, score-0.669]
60 1 Proof sketch The following proof sketch is intended for readers that are familiar with the literature on embedded tree transducers, macro tree transducers, and morphisms. [sent-278, score-0.84]
61 Let τ ⊆ TΣ × T∆ be a tree transformation computed by a STAG. [sent-281, score-0.431]
62 By Shieber (2006) there exists a regular tree language L ⊆ TΓ and two functions e1 : TΓ → TΣ and e2 : TΓ → T∆ such that τ = {(e1(t) , e2 (t)) | t ∈ L}. [sent-282, score-0.352]
63 embedded tree transducers (Shieber, 2006), which are particular 1-state, deterministic, total, 1-parameter, linear, and nondeleting macro tree transducers (Courcelle and Franchi-Zannettacci, 1982; Engelfriet and Vogler, 1985). [sent-284, score-1.11]
64 Using a result of Engelfriet and Vogler (1985), each embedded tree transducer can be decomposed into a top-down tree transducer (Gécseg and Steinby, 1984; Gécseg and Steinby, 1997) and a yield-mapping. [sent-288, score-1.363]
65 In our particular case, the top-down tree transducers are linear and nondeleting homomorphisms h1 and h2. [sent-289, score-0.569]
66 Linearity and nondeletion are inherited from the corresponding properties of the macro tree transducer. [sent-290, score-0.389]
67 The properties ‘1-state’, ‘deterministic’, and ‘total’ of the macro tree transducer ensure that the obtained top-down tree transducer is also 1-state, deterministic, and total, which means that it is a homomorphism. [sent-291, score-1.249]
68 Finally, the 1-parameter property yields that the used substitution symbols are binary (as our substitution symbols · [·]A). [sent-292, score-0.46]
69 Again, this decomposition actually is a characterization of embedded tree transducers. [sent-294, score-0.502]
70 by an extended tree transducer M due to results of Shieber (2004) and Maletti (2008). [sent-296, score-0.669]
71 More precisely, every extended tree transducer computes such a set, so that also this step is a characterization. [sent-297, score-0.697]
72 Thus we obtain that τ is an evaluation of a tree transformation computed by an extended tree transducer, and moreover, for each extended tree transducer, the evaluation can be computed (up to a relabeling) by a STAG. [sent-298, score-1.299]
73 2 Example Let us illustrate one direction (the construction of the extended tree transducer) on our example STAG of Figure 7. [sent-301, score-0.419]
74 Finally, we also present a tree transducer model that includes explicit substitution. [sent-317, score-0.648]
75 1 Toolkits Obviously, our characterization can be applied in a toolkit for extended tree transducers (or STSG) such as TIBURON by May and Knight (2006) to simulate STAG. [sent-320, score-0.731]
76 The existing infrastructure (input/output, derivation mechanism, etc.) for extended tree transducers can be re-used to run XTTs encoding STAGs. [sent-321, score-0.677]
77 Thus, given a STAG G and a recognizable tree language L, we want to construct a STAG G′ such that τG′ = {(t, u) | (t, u) ∈ τG, t ∈ L}. [sent-340, score-0.327]
78 In other words, we take the tree transformation τG but additionally require the input tree to be in L. [sent-341, score-0.757]
79 Let M = (Q, Σ, ∆, I, R) be an XTT (using explicit substitution) and G = (N, Σ, I′, P) be a tree substitution grammar (regular tree grammar) in normal form that recognizes L (i. [sent-346, score-0.886]
80 3 A complete tree transducer model So far, we have specified a tree transducer model that requires some additional parsing before it can be applied. [sent-364, score-1.154]
81 This parsing step has to annotate (and correspondingly restructure) the input tree by the adjunction points. [sent-365, score-0.571]
82 This is best illustrated by the left tree in the last pair of trees in Figure 8. [sent-366, score-0.385]
83 To run our constructed XTT on the trivially completed version of this input tree, it has to be transformed into the first tree of Figure 11, where the adjunctions are now visible. [sent-367, score-0.392]
84 To avoid the first additional parsing step, we will now modify our tree transducer model such that this parsing step is part of its semantics. [sent-369, score-0.577]
85 In addition, we arrive at a tree transducer model that exactly (up to a relabeling) matches the power of STAG, which can be useful for certain constructions. [sent-371, score-0.577]
86 It is known that an embedded tree transducer (Shieber, 2006) can handle the mentioned un-parsing step. [sent-372, score-0.666]
87 [Footnote 9: c′ is the same as c except that it maps A to p′.] An extended embedded tree transducer with [sent-373, score-0.758]
88 substitution M = (Q, Σ, ∆, I, R) is simply an embedded tree transducer with extended left-hand sides (i. [sent-374, score-0.919]
89 We refer to Shieber (2006) for a full description of embedded tree transducers. [sent-383, score-0.416]
90 The semantics of extended embedded tree transducers with substitution deviates slightly from the embedded tree transducer semantics. [sent-388, score-1.514]
91 Figure 12: Rule and derivation step using the rule in an extended embedded tree transducer with substitution where the context parameter (if present) is displayed as first child. [sent-395, score-0.998]
92 Note that the essential difference to the “standard” semantics of embedded tree transducers is the evaluation in the first item. [sent-407, score-0.595]
93 The tree transformation computed by M is defined as usual. [sent-408, score-0.431]
94 Theorem 2 Every STAG can be simulated by an extended embedded tree transducer with substitution. [sent-412, score-0.758]
95 Moreover, every extended embedded tree transducer computes a tree transformation that can be computed by a STAG up to a relabeling. [sent-413, score-1.217]
96 8 Conclusions We presented an alternative view on STAG using tree transducers (or equivalently, STSG). [sent-414, score-0.506]
97 Our main result shows that the syntactic characterization of STAG as STSG plus adjunction rules also carries over to the semantic side. [sent-415, score-0.301]
98 A STAG tree transformation can also be computed by an STSG using explicit substitution. [sent-416, score-0.502]
99 An overview of probabilistic tree transducers for natural language processing. [sent-501, score-0.506]
100 Unifying synchronous tree adjoining grammars and tree transducers via bimorphisms. [sent-555, score-0.99]
wordName wordTfidf (topN-words)
[('stag', 0.587), ('tree', 0.327), ('transducer', 0.25), ('adjunction', 0.215), ('xtt', 0.179), ('transducers', 0.179), ('shieber', 0.176), ('substitution', 0.161), ('stsg', 0.146), ('xk', 0.12), ('ecseg', 0.12), ('synchronous', 0.108), ('graehl', 0.101), ('tk', 0.098), ('steinby', 0.096), ('extended', 0.092), ('embedded', 0.089), ('characterization', 0.086), ('auxiliary', 0.085), ('qk', 0.082), ('derivation', 0.079), ('transformation', 0.074), ('foot', 0.072), ('relabeling', 0.072), ('explicit', 0.071), ('symbols', 0.069), ('engelfriet', 0.068), ('qhui', 0.068), ('macro', 0.062), ('trees', 0.058), ('adjoined', 0.055), ('rhsk', 0.055), ('stags', 0.055), ('leaf', 0.051), ('grammars', 0.049), ('arnold', 0.049), ('dauchet', 0.048), ('vogler', 0.048), ('simulate', 0.047), ('np', 0.046), ('knight', 0.045), ('root', 0.042), ('courcelle', 0.041), ('operable', 0.041), ('tiburon', 0.041), ('alphabet', 0.04), ('essentially', 0.04), ('node', 0.039), ('stuart', 0.038), ('asa', 0.036), ('adjunctions', 0.036), ('maletti', 0.036), ('nondeleting', 0.036), ('stsgs', 0.036), ('proof', 0.035), ('saw', 0.035), ('jonathan', 0.035), ('bijection', 0.033), ('topdown', 0.033), ('kevin', 0.032), ('derivations', 0.031), ('replaces', 0.03), ('sa', 0.03), ('computed', 0.03), ('sentential', 0.03), ('input', 0.029), ('illustration', 0.029), ('translation', 0.029), ('leaves', 0.028), ('restriction', 0.028), ('computes', 0.028), ('qs', 0.028), ('toolkits', 0.028), ('write', 0.027), ('andreas', 0.027), ('bimorphisms', 0.027), ('bonbons', 0.027), ('dtnpn', 0.027), ('homomorphisms', 0.027), ('nesson', 0.027), ('qh', 0.027), ('qhi', 0.027), ('rouges', 0.027), ('tscqh', 0.027), ('wcw', 0.027), ('let', 0.027), ('les', 0.027), ('yd', 0.027), ('union', 0.026), ('label', 0.026), ('headed', 0.026), ('expressive', 0.025), ('develop', 0.025), ('regular', 0.025), ('sf', 0.024), ('aho', 0.024), ('ferenc', 0.024), ('yehoshua', 0.024), ('finite', 0.024), ('labeled', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 21 acl-2010-A Tree Transducer Model for Synchronous Tree-Adjoining Grammars
Author: Andreas Maletti
Abstract: A characterization of the expressive power of synchronous tree-adjoining grammars (STAGs) in terms of tree transducers (or equivalently, synchronous tree substitution grammars) is developed. Essentially, a STAG corresponds to an extended tree transducer that uses explicit substitution in both the input and output. This characterization allows the easy integration of STAG into toolkits for extended tree transducers. Moreover, the applicability of the characterization to several representational and algorithmic problems is demonstrated.
2 0.2005121 75 acl-2010-Correcting Errors in a Treebank Based on Synchronous Tree Substitution Grammar
Author: Yoshihide Kato ; Shigeki Matsubara
Abstract: This paper proposes a method of correcting annotation errors in a treebank. By using a synchronous grammar, the method transforms parse trees containing annotation errors into the ones whose errors are corrected. The synchronous grammar is automatically induced from the treebank. We report an experimental result of applying our method to the Penn Treebank. The result demonstrates that our method corrects syntactic annotation errors with high precision.
3 0.19154491 95 acl-2010-Efficient Inference through Cascades of Weighted Tree Transducers
Author: Jonathan May ; Kevin Knight ; Heiko Vogler
Abstract: Weighted tree transducers have been proposed as useful formal models for representing syntactic natural language processing applications, but there has been little description of inference algorithms for these automata beyond formal foundations. We give a detailed description of algorithms for application of cascades of weighted tree transducers to weighted tree acceptors, connecting formal theory with actual practice. Additionally, we present novel on-the-fly variants of these algorithms, and compare their performance on a syntax machine translation cascade based on (Yamada and Knight, 2001). 1 Motivation Weighted finite-state transducers have found recent favor as models of natural language (Mohri, 1997). In order to make actual use of systems built with these formalisms we must first calculate the set of possible weighted outputs allowed by the transducer given some input, which we call forward application, or the set of possible weighted inputs given some output, which we call backward application. After application we can do some inference on this result, such as determining its k highest weighted elements. We may also want to divide up our problems into manageable chunks, each represented by a transducer. As noted by Woods (1980), it is easier for designers to write several small transducers where each performs a simple transformation, rather than painstakingly construct a single complicated device. We would like to know, then, the result of transformation of input or output by a cascade of transducers, one operating after the other. As we will see, there are various strategies for approaching this problem. We will consider offline composition, bucket brigade application, and on-the-fly application. Application of cascades of weighted string transducers (WSTs) has been well-studied (Mohri, 1997). [Author affiliation: Heiko Vogler, Technische Universität Dresden, Institut für Theoretische Informatik, 01062 Dresden, Germany, heiko.vogler@tu-dresden.de]
Less well-studied but of more recent interest is application of cascades of weighted tree transducers (WTTs). We tackle application of WTT cascades in this work, presenting:
• explicit algorithms for application of WTT cascades
• novel algorithms for on-the-fly application of WTT cascades, and
• experiments comparing the performance of these algorithms.
2 Strategies for the string case Before we discuss application of WTTs, it is helpful to recall the solution to this problem in the WST domain. We recall previous formal presentations of WSTs (Mohri, 1997) and note informally that they may be represented as directed graphs with designated start and end states and edges labeled with input symbols, output symbols, and weights. [1] Fortunately, the solution for WSTs is practically trivial: we achieve application through a series of embedding, composition, and projection operations. Embedding is simply the act of representing a string or regular string language as an identity WST. Composition of WSTs, that is, generating a single WST that captures the transformations of two input WSTs used in sequence, is not at all trivial, but has been well covered in, e.g., (Mohri, 2009), where directly implementable algorithms can be found. Finally, projection is another trivial operation: the domain or range language can be obtained from a WST by ignoring the output or input symbols, respectively, on its arcs, and summing weights on otherwise identical arcs.
By embedding an input, composing the result with the given WST, and projecting the result, forward application is accomplished. [2] We are then left with a weighted string acceptor (WSA), essentially a weighted, labeled graph, which can be traversed by well-known algorithms to efficiently find the k-best paths. [Footnote 1: We assume throughout this paper that weights are in R+ ∪ {+∞}, that the weight of a path is calculated as the product of the weights of its edges, and that the weight of a (not necessarily finite) set T of paths is calculated as the sum of the weights of the paths of T.] [Footnote 2: For backward applications, the roles of input and output are simply exchanged.] [Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1058–1066, Uppsala, Sweden, 11-16 July 2010. © 2010 Association for Computational Linguistics] [Figure 1: Three different approaches to application through cascades of WSTs. Panels: (a) Input string “a a” embedded in an identity WST; (b) first WST in cascade; (c) second WST in cascade; (d) Offline composition approach: compose the transducers; (e) Bucket brigade approach: apply WST (b) to WST (a); (f) Result of offline or bucket application after projection; (g) Initial on-the-fly stand-in for (f); (h) On-the-fly stand-in after exploring outgoing edges of state ADF; (i) On-the-fly stand-in after best path has been found.]
Because WSTs can be freely composed, extending application to operate on a cascade of WSTs is fairly trivial. The only question is one of composition order: whether to initially compose the cascade into a single transducer (an approach we call offline composition) or to compose the initial embedding with the first transducer, trim useless states, compose the result with the second, and so on (an approach we call bucket brigade). The appropriate strategy generally depends on the structure of the individual transducers. A third approach builds the result incrementally, as dictated by some algorithm that requests information about it. Such an approach, which we call on-the-fly, was described in (Pereira and Riley, 1997; Mohri, 2009; Mohri et al., 2000). If we can efficiently calculate the outgoing edges of a state of the result WSA on demand, without calculating all edges in the entire machine, we can maintain a stand-in for the result structure, a machine consisting at first of only the start state of the true result. As a calling algorithm (e.g., an implementation of Dijkstra’s algorithm) requests information about the result graph, such as the set of outgoing edges from a state, we replace the current stand-in with a richer version by adding the result of the request. The on-the-fly approach has a distinct advantage over the other two methods in that the entire result graph need not be built. A graphical representation of all three methods is presented in Figure 1. 3 Application of tree transducers Now let us revisit these strategies in the setting of trees and tree transducers. Imagine we have a tree or set of trees as input that can be represented as a weighted regular tree grammar [3] (WRTG) and a WTT that can transform that input with some weight.
We would like to know the k-best trees the WTT can produce as output for that input, along with their weights. We already know of several methods for acquiring k-best trees from a WRTG (Huang and Chiang, 2005; Pauls and Klein, 2009), so we then must ask if, analogously to the string case, WTTs preserve recognizability [4] and we can form an application WRTG. Before we begin, however, we must define WTTs and WRTGs. [Footnote 3: This generates the same class of weighted tree languages as weighted tree automata, the direct analogue of WSAs, and is more useful for our purposes.] [Footnote 4: A weighted tree language is recognizable iff it can be represented by a wrtg.] [Footnote 5: The following formal definitions and notations are needed for understanding and reimplementation of the presented algorithms, but can be safely skipped on first reading and consulted when encountering an unfamiliar term.] 3.1 Preliminaries [5] A ranked alphabet is a finite set Σ such that every member σ ∈ Σ has a rank rk(σ) ∈ N. We call Σ(k) ⊆ Σ, k ∈ N the set of those σ ∈ Σ such that rk(σ) = k. The set of variables is denoted X = {x1, x2, . . .} and is assumed to be disjoint from any ranked alphabet used in this paper. We use ⊥ to denote a symbol of rank 0 that is not in any ranked alphabet used in this paper. A tree t ∈ TΣ is denoted σ(t1, . . . , tk) where k ≥ 0, σ ∈ Σ(k), and t1, . . . , tk ∈ TΣ. For σ ∈ Σ(0) we write σ ∈ TΣ as shorthand for σ(). For every set S disjoint from Σ, let TΣ(S) = TΣ∪S, where, for all s ∈ S, rk(s) = 0. We define the positions of a tree t = σ(t1, . . . , tk), for k ≥ 0, σ ∈ Σ(k), t1, . . . , tk ∈ TΣ, as a set pos(t) ⊂ N∗ such that pos(t) = {ε} ∪ {iv | 1 ≤ i ≤ k, v ∈ pos(ti)}. The set of leaf positions lv(t) ⊆ pos(t) are those positions v ∈ pos(t) such that for no i ∈ N, vi ∈ pos(t). We presume standard lexicographic orderings < and ≤ on pos. Let t, s ∈ TΣ and v ∈ pos(t). The label of t at position v, denoted by t(v), the subtree of t at v, denoted by t|v, and the replacement at v by s, denoted by t[s]v, are defined as follows: 1. For every σ ∈ Σ(0), σ(ε) = σ, σ|ε = σ, and σ[s]ε = s. 2. For every t = σ(t1, . . . , tk) such that k = rk(σ) and k ≥ 1, t(ε) = σ, t|ε = t, and t[s]ε = s. For every 1 ≤ i ≤ k and v ∈ pos(ti), t(iv) = ti(v), t|iv = ti|v, and t[s]iv = σ(t1, . . . , ti−1, ti[s]v, ti+1, . . . , tk). The size of a tree t, size(t), is |pos(t)|, the cardinality of its position set. The yield set of a tree is the set of labels of its leaves: for a tree t, yd(t) = {t(v) | v ∈ lv(t)}. Let A and B be sets. Let ϕ : A → TΣ(B) be a mapping. We extend ϕ to the mapping ϕ̄ : TΣ(A) → TΣ(B) such that for a ∈ A, ϕ̄(a) = ϕ(a) and for k ≥ 0, σ ∈ Σ(k), and t1, . . . , tk ∈ TΣ(A), ϕ̄(σ(t1, . . . , tk)) = σ(ϕ̄(t1), . . . , ϕ̄(tk)). We indicate such extensions by describing ϕ as a substitution mapping and then using ϕ̄ without further comment. We use R+ to denote the set {w ∈ R | w ≥ 0} and R+∞ to denote R+ ∪ {+∞}. Definition 3.1 (cf. (Alexandrakis and Bozapalidis, 1987)) A weighted regular tree grammar (WRTG) is a 4-tuple G = (N, Σ, P, n0) where: 1. N is a finite set of nonterminals, with n0 ∈ N the start nonterminal. 2. Σ is a ranked alphabet of input symbols, where Σ ∩ N = ∅. 3. P is a tuple
(P′, π), where P′ is a finite set of productions, each production p of the form n → u, n ∈ N, u ∈ TΣ(N), and π : P′ → R+ is a weight function of the productions. We will refer to P as a finite set of weighted productions, each production p of the form n −π(p)→ u. A production p is a chain production if it is of the form ni −w→ nj, where ni, nj ∈ N. [6] [Footnote 6: In (Alexandrakis and Bozapalidis, 1987), chain productions are forbidden in order to avoid infinite summations. We explicitly allow such summations.] A WRTG G is in normal form if each production is either a chain production or is of the form n −w→ σ(n1, . . . , nk) where σ ∈ Σ(k) and n1, . . . , nk ∈ N. For WRTG G = (N, Σ, P, n0), s, t, u ∈ TΣ(N), n ∈ N, and p ∈ P of the form n −w→ u, we obtain a derivation step from s to t by replacing some leaf nonterminal in s labeled n with u. Formally, s ⇒pG t if there exists some v ∈ lv(s) such that s(v) = n and s[u]v = t. We say this derivation step is leftmost if, for all v′ ∈ lv(s) where v′ < v, s(v′) ∈ Σ. We henceforth assume all derivation steps are leftmost. If, for some m ∈ N, pi ∈ P, and ti ∈ TΣ(N) for all 1 ≤ i ≤ m, n0 ⇒p1 t1 . . . ⇒pm tm, we say the sequence d = (p1, . . . , pm) is a derivation of tm in G and that n0 ⇒∗ tm; the weight of d is wt(d) = π(p1) · . . . · π(pm). The weighted tree language recognized by G is the mapping LG : TΣ → R+∞ such that for every t ∈ TΣ, LG(t) is the sum of the weights of all (possibly infinitely many) derivations of t in G. A weighted tree language f : TΣ → R+∞ is recognizable if there is a WRTG G such that f = LG. We define a partial ordering ⪯ on WRTGs such that for WRTGs G1 = (N1, Σ, P1, n0) and G2 = (N2, Σ, P2, n0), we say G1 ⪯ G2 iff N1 ⊆ N2 and P1 ⊆ P2, where the weights are preserved. ...
Definition 3.2 (cf. Def. 1 of (Maletti, 2008)) A weighted extended top-down tree transducer (WXTT) is a 5-tuple M = (Q, Σ, ∆, R, q0) where:

1. Q is a finite set of states.
2. Σ and ∆ are the ranked alphabets of input and output symbols, respectively, where (Σ ∪ ∆) ∩ Q = ∅.
3. R is a tuple (R′, π), where R′ is a finite set of rules, each rule r of the form q.y → u for q ∈ Q, y ∈ TΣ(X), and u ∈ T∆(Q × X). We further require that no variable x ∈ X appears more than once in y, and that each variable appearing in u is also in y. Moreover, π : R′ → R+∞ is a weight function of the rules. As for WRTGs, we refer to R as a finite set of weighted rules, each rule r of the form q.y −π(r)→ u.

A WXTT is linear (respectively, nondeleting) if, for each rule r of the form q.y −w→ u, each x ∈ yd(y) ∩ X appears at most once (respectively, at least once) in u. We denote the class of all WXTTs as wxT and add the letters L and N to signify the subclasses of linear and nondeleting WTT, respectively. Additionally, if y is of the form σ(x1, …, xk), we remove the letter "x" to signify the transducer is not extended (i.e., it is a "traditional" WTT (Fülöp and Vogler, 2009)).

For WXTT M = (Q, Σ, ∆, R, q0), s, t ∈ T∆(Q × TΣ), and r ∈ R of the form q.y −w→ u, we obtain a derivation step from s to t by replacing some leaf of s labeled with q and a tree matching y by a transformation of u, where each instance of a variable has been replaced by a corresponding subtree of the y-matching tree. Formally, s ⇒r_M t if there is a position v ∈ pos(s), a substitution mapping ϕ : X → TΣ, and a rule q.y −w→ u ∈ R such that s(v) = (q, ϕ(y)) and t = s[ϕ′(u)]v, where ϕ′ is a substitution mapping Q × X → T∆(Q × TΣ) defined such that ϕ′(q′, x) = (q′, ϕ(x)) for all q′ ∈ Q and x ∈ X. We say this derivation step is leftmost if, for all v′ ∈ lv(s) where v′ < v, s(v′) ∈ ∆. We henceforth assume all derivation steps are leftmost. If, for some s ∈ TΣ, m ∈ N, ri ∈ R, and ti ∈ T∆(Q × TΣ) for all 1 ≤ i ≤ m, (q0, s) ⇒r1 t1 … ⇒rm tm, we say the sequence d = (r1, …, rm) is a derivation of (s, tm) in M; the weight of d is wt(d) = π(r1) · … · π(rm). The weighted tree transformation recognized by M is the mapping τM : TΣ × T∆ → R+∞, such that for every s ∈ TΣ and t ∈ T∆, τM(s, t) is the sum of the weights of all (possibly infinitely many) derivations of (s, t) in M.

The composition of two weighted tree transformations τ : TΣ × T∆ → R+∞ and µ : T∆ × TΓ → R+∞ is the weighted tree transformation (τ; µ) : TΣ × TΓ → R+∞ where for every s ∈ TΣ and u ∈ TΓ, (τ; µ)(s, u) = Σ_{t ∈ T∆} τ(s, t) · µ(t, u).

3.2 Applicable classes

We now consider transducer classes where recognizability is preserved under application. Table 1 presents known results for the top-down tree transducer classes described in Section 3.1. Unlike the string case, preservation of recognizability is not universal or symmetric. This is important for us, because we can only construct an application WRTG, i.e., a WRTG representing the result of application, if we can ensure that the language generated by application is in fact recognizable.
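The composition formula (τ; µ)(s, u) = Σ_t τ(s, t) · µ(t, u) from Section 3.1 can be made concrete for the special case where both transformations have finite support. The sketch below assumes each transformation is given explicitly as a dictionary from (input, output) pairs to weights, with absent pairs having weight 0; trees are stood in for by arbitrary hashable values. This is a hypothetical illustration of the definition only, not the syntactic composition of transducers performed by Algorithm 1.

```python
from collections import defaultdict

def compose(tau, mu):
    """(tau ; mu)(s, u) = sum over t of tau(s, t) * mu(t, u).

    tau and mu are finite-support weighted relations encoded as
    {(input, output): weight}; missing pairs have weight 0.
    """
    # Index mu by its input so each intermediate tree t is looked up once.
    by_input = defaultdict(list)
    for (t, u), w in mu.items():
        by_input[t].append((u, w))

    out = defaultdict(float)
    for (s, t), w1 in tau.items():
        for u, w2 in by_input.get(t, []):
            out[(s, u)] += w1 * w2
    return dict(out)

# Hypothetical toy relations; strings stand in for trees.
tau = {("a", "x"): 0.5, ("a", "y"): 0.5}
mu = {("x", "k"): 0.5, ("y", "k"): 0.25, ("y", "m"): 0.75}
composed = compose(tau, mu)
```

Here the pair ("a", "k") is reachable through both intermediate values "x" and "y", so its weight is the sum of two products, which is exactly the summation over t in the definition.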
Of the types under consideration, only wxLNT and wLNT preserve forward recognizability. The two classes marked as open questions and the other classes, which are superclasses of wNT, do not or are presumed not to. All subclasses of wxLT preserve backward recognizability.7 We do not consider cases where recognizability is not preserved in the remainder of this paper. If a transducer M of a class that preserves forward recognizability is applied to a WRTG G, we can call the forward application WRTG M(G)▷, and if M preserves backward recognizability, we can call the backward application WRTG M(G)◁.

7 Note that the introduction of weights limits recognizability preservation considerably. For example, (unweighted) xT preserves backward recognizability.

Now that we have explained the application problem in the context of weighted tree transducers and determined the classes for which application is possible, let us consider how to build forward and backward application WRTGs. Our basic approach mimics that taken for WSTs by using an embed-compose-project strategy. As in the string world, if we can embed the input in a transducer, compose with the given transducer, and project the result, we can obtain the application WRTG. Embedding a WRTG in a wLNT is a trivial operation: if the WRTG is in normal form and chain production-free,8 for every production of the form n −w→ σ(n1, …, nk), create a rule of the form n.σ(x1, …, xk) −w→ σ(n1.x1, …, nk.xk). Range projection of a wxLNT is also trivial: for every q ∈ Q and u ∈ T∆(Q × X) create a production of the form q −w→ u′ where u′ is formed from u by replacing all leaves of the form q.x with the leaf q, i.e., removing references to variables, and w is the sum of the weights of all rules of the form q.y → u in R.9 Domain projection for wxLT is best explained by way of example.
The left side of a rule is preserved, with variable leaves replaced by their associated states from the right side. So, the rule q1.σ(γ(x1), x2) −w→ δ(q2.x2, β(α, q3.x1)) would yield the production q1 −w→ σ(γ(q3), q2) in the domain projection. However, a deleting rule such as q1.σ(x1, x2) −w→ γ(q2.x2) necessitates the introduction of a new nonterminal ⊥ that can generate all of TΣ with weight 1.

The only missing piece in our embed-compose-project strategy is composition. Algorithm 1, which is based on the declarative construction of Maletti (2006), generates the syntactic composition of a wxLT and a wLNT, a generalization of the basic composition construction of Baker (1979). It calls Algorithm 2, which determines the sequences of rules in the second transducer that match the right side of a single rule in the first transducer. Since the embedded WRTG is of type wLNT, it may be either the first or second argument provided to Algorithm 1, depending on whether the application is forward or backward. We can thus use the embed-compose-project strategy for forward application of wLNT and backward application of wxLT and wxLNT. Note that we cannot use this strategy for forward application of wxLNT, even though that class preserves recognizability.

8 Without loss of generality we assume this is so, since standard algorithms exist to remove chain productions (Kuich, 1998; Ésik and Kuich, 2003; Mohri, 2009) and convert into normal form (Alexandrakis and Bozapalidis, 1987).

9 Finitely many such productions may be formed.

Algorithm 1 COMPOSE
 1: inputs
 2:   wxLT M1 = (Q1, Σ, ∆, R1, q1_0)
 3:   wLNT M2 = (Q2, ∆, Γ, R2, q2_0)
 4: outputs
 5:   wxLT M3 = ((Q1 × Q2), Σ, Γ, R3, (q1_0, q2_0)) such that τM3 = (τM1; τM2).
 6: complexity
 7:   O(|R1| max(|R2|^size(ũ), |Q2|)), where ũ is the largest right side tree in any rule in R1
 8: Let R3 be of the form (R3′, π)
 9: R3 ← (∅, ∅)
10: Ξ ← {(q1_0, q2_0)} {seen states}
11: Ψ ← {(q1_0, q2_0)} {pending states}
12: while Ψ ≠ ∅ do
13:   (q1, q2) ← any element of Ψ
14:   Ψ ← Ψ \ {(q1, q2)}
15:   for all (q1.y −w1→ u) ∈ R1 do
16:     for all (z, w2) ∈ COVER(u, M2, q2) do
17:       for all (q, x) ∈ yd(z) ∩ ((Q1 × Q2) × X) do
18:         if q ∉ Ξ then
19:           Ξ ← Ξ ∪ {q}
20:           Ψ ← Ψ ∪ {q}
21:       r ← ((q1, q2).y → z)
22:       R3′ ← R3′ ∪ {r}
23:       π(r) ← π(r) + (w1 · w2)
24: return M3

4 Application of tree transducer cascades

What about the case of an input WRTG and a cascade of tree transducers? We will revisit the three strategies for accomplishing application discussed above for the string case.

In order for offline composition to be a viable strategy, the transducers in the cascade must be closed under composition. Unfortunately, of the classes that preserve recognizability, only wLNT is closed under composition (Gécseg and Steinby, 1984; Baker, 1979; Maletti et al., 2009; Fülöp and Vogler, 2009).

However, the general lack of composability of tree transducers does not preclude us from conducting forward application of a cascade. We revisit the bucket brigade approach, which in Section 2 appeared to be little more than a choice of composition order. As discussed previously, application of a single transducer involves an embedding, a composition, and a projection. The embedded WRTG is in the class wLNT, and the projection forms another WRTG. As long as every transducer in the cascade can be composed with a wLNT to its left or right, depending on the application type, application of a cascade is possible. Note that this embed-compose-project process is somewhat more burdensome than in the string case.
For strings, application is obtained by a single embedding, a series of compositions, and a single projection, whereas application for trees is obtained by a series of (embed, compose, project) operations.

Algorithm 2 COVER
 1: inputs
 2:   u ∈ T∆(Q1 × X)
 3:   wT M2 = (Q2, ∆, Γ, R2, q2_0)
 4:   state q2 ∈ Q2
 5: outputs
 6:   set of pairs (z, w) with z ∈ TΓ((Q1 × Q2) × X) formed by one or more successful runs on u by rules in R2, starting from q2, and w ∈ R+∞ the sum of the weights of all such runs.
 7: complexity
 8:   O(|R2|^size(u))
 9: if u(ε) is of the form (q1, x) ∈ Q1 × X then
10:   zinit ← ((q1, q2), x)
11: else
12:   zinit ← ⊥
13: Πlast ← {(zinit, {((ε, ε), q2)}, 1)}
14: for all v ∈ pos(u) such that u(v) ∈ ∆(k) for some k ≥ 0 in prefix order do
15:   Πv ← ∅
16:   for all (z, θ, w) ∈ Πlast do
17:     for all v′ ∈ lv(z) such that z(v′) = ⊥ do
18:       for all (θ(v, v′).u(v)(x1, …, xk) −w′→ h) ∈ R2 do
19:         θ′ ← θ
20:         Form substitution mapping ϕ : (Q2 × X) → TΓ((Q1 × Q2 × X) ∪ {⊥}).
21:         for i = 1 to k do
22:           for all v″ ∈ pos(h) such that h(v″) = (q2′, xi) for some q2′ ∈ Q2 do
23:             θ′(vi, v′v″) ← q2′
24:             if u(vi) is of the form (q1, x) ∈ Q1 × X then
25:               ϕ(q2′, xi) ← ((q1, q2′), x)
26:             else
27:               ϕ(q2′, xi) ← ⊥
28:         Πv ← Πv ∪ {(z[ϕ(h)]v′, θ′, w · w′)}
29:   Πlast ← Πv
30: Z ← {z | (z, θ, w) ∈ Πlast}
31: return {(z, Σ_{(z,θ,w) ∈ Πlast} w) | z ∈ Z}

4.1 On-the-fly algorithms

We next consider on-the-fly algorithms for application. Similar to the string case, an on-the-fly approach is driven by a calling algorithm that periodically needs to know the productions in a WRTG with a common left side nonterminal. The embed-compose-project approach produces an entire application WRTG before any inference algorithm is run. In order to admit an on-the-fly approach we describe algorithms that only generate those productions in a WRTG that have a given left nonterminal.
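The embedding and domain-projection steps of the embed-compose-project strategy from Section 3.2 can be sketched as follows. This is a minimal illustration under assumptions that are not in the paper: trees are nested tuples, a right-side leaf of the form q.x is encoded as a node whose label is the pair (q, x), variables are the strings "x1", "x2", and so on, and the placeholder nonterminal ⊥ is written "BOT". Rules are assumed linear, so each variable occurs at most once on the right side. All function names are hypothetical.

```python
def embed(productions):
    """Embed normal-form, chain-free productions as wLNT rules.

    A production (n, w, sym, (n1..nk)) becomes the rule
    n.sym(x1..xk) -w-> sym(n1.x1, ..., nk.xk),
    encoded as (state, lhs_tree, weight, rhs_tree).
    """
    rules = []
    for n, w, sym, kids in productions:
        xs = [f"x{i}" for i in range(1, len(kids) + 1)]
        lhs = (sym,) + tuple((x,) for x in xs)
        rhs = (sym,) + tuple(((kid, x),) for kid, x in zip(kids, xs))
        rules.append((n, lhs, w, rhs))
    return rules

def states_by_variable(u):
    """Map each variable x to the state q of its leaf (q, x) in u."""
    if isinstance(u[0], tuple):          # leaf labeled with a (q, x) pair
        q, x = u[0]
        return {x: q}
    found = {}
    for child in u[1:]:
        found.update(states_by_variable(child))
    return found

def domain_project(rule, variables):
    """Keep the left side, replacing each variable by its right-side state.

    Variables deleted by the rule are replaced by "BOT", standing in for
    the extra nonterminal that generates every input tree with weight 1.
    """
    q, y, w, u = rule
    states = states_by_variable(u)

    def go(t):
        if t[0] in variables:
            return (states.get(t[0], "BOT"),)
        return (t[0],) + tuple(go(c) for c in t[1:])

    return (q, w, go(y))

# The example rule from the text, with a made-up weight of 0.5:
# q1.sigma(gamma(x1), x2) -> delta(q2.x2, beta(alpha, q3.x1))
rule = ("q1",
        ("sigma", ("gamma", ("x1",)), ("x2",)),
        0.5,
        ("delta", (("q2", "x2"),), ("beta", ("alpha",), (("q3", "x1"),))))
# A deleting rule: q1.sigma(x1, x2) -> gamma(q2.x2)
deleting = ("q1", ("sigma", ("x1",), ("x2",)), 0.5, ("gamma", (("q2", "x2"),)))
projected = domain_project(rule, {"x1", "x2"})
```

On the first rule the projection yields the production q1 → σ(γ(q3), q2), matching the worked example in the text; on the deleting rule the unused variable x1 is replaced by the placeholder nonterminal.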
In this section we extend Definition 3.1 as follows: a WRTG is a 6-tuple G = (N, Σ, P, n0, M, G) where N, Σ, P, and n0 are defined as in Definition 3.1, and either M = G = ∅,10 or M is a wxLNT and G is a normal form, chain production-free WRTG such that G ⪯ M(G)▷. In the latter case, G is a stand-in for M(G)▷, analogous to the stand-ins for WSAs and WSTs described in Section 2.

10 In which case the definition is functionally unchanged from before.

Table 1: Preservation of forward and backward recognizability for various classes of top-down tree transducers. Here and elsewhere, the following abbreviations apply: w = weighted, x = extended LHS, L = linear, N = nondeleting, OQ = open question. Square brackets include a superposition of classes. For example, w[x]T signifies both wxT and wT.
[Table 1 body, (a) preservation of forward recognizability and (b) preservation of backward recognizability, with columns "type", "preserved?", and "source", is not recoverable from the extracted text.]

Algorithm 3 PRODUCE
 1: inputs
 2:   WRTG Gin = (Nin, ∆, Pin, n0, M, G) such that M = (Q, Σ, ∆, R, q0) is a wxLNT and G = (N, Σ, P, n0′, M′, G′) is a WRTG in normal form with no chain productions
 3:   nin ∈ Nin
 4: outputs
 5:   WRTG Gout = (Nout, ∆, Pout, n0, M, G), such that Gin ⪯ Gout and (nin −w→ u) ∈ Pout ⇔ (nin −w→ u) ∈ M(G)▷
 6: complexity
 7:   O(|R| |P|^size(ỹ)), where ỹ is the largest left side tree in any rule in R
 8: if Pin contains productions of the form nin −w→ u then
 9:   return Gin
10: Nout ← Nin
11: Pout ← Pin
12: Let nin be of the form (n, q), where n ∈ N and q ∈ Q.
13: for all (q.y −w1→ u) ∈ R do
14:   for all (θ, w2) ∈ REPLACE(y, G, n) do
15:     Form substitution mapping ϕ : Q × X → T∆(N × Q) such that, for all v ∈ yd(y) and q′ ∈ Q, if there exist n′ ∈ N and x ∈ X such that θ(v) = n′ and y(v) = x, then ϕ(q′, x) = (n′, q′).
16:     p′ ← ((n, q) −w1·w2→ ϕ(u))
17:     for all p ∈ NORM(p′, Nout) do
18:       Let p be of the form n0 −w→ δ(n1, …, nk) for δ ∈ ∆(k).
19:       Nout ← Nout ∪ {n0, …, nk}
20:       Pout ← Pout ∪ {p}
21: return CHAIN-REM(Gout)

Algorithm 4 REPLACE
 1: inputs
 2:   y ∈ TΣ(X)
 3:   WRTG G = (N, Σ, P, n0, M, G) in normal form, with no chain productions
 4:   n ∈ N
 5: outputs
 6:   set Π of pairs (θ, w) where θ is a mapping pos(y) → N and w ∈ R+∞, each pair indicating a successful run on y by productions in G, starting from n, and w is the weight of the run.
 7: complexity
 8:   O(|P|^size(y))
 9: Πlast ← {({(ε, n)}, 1)}
10: for all v ∈ pos(y) such that y(v) ∉ X in prefix order do
11:   Πv ← ∅
12:   for all (θ, w) ∈ Πlast do
13:     if M ≠ ∅ and G ≠ ∅ then
14:       G ← PRODUCE(G, θ(v))
15:     for all (θ(v) −w′→ y(v)(n1, …, nk)) ∈ P do
16:       Πv ← Πv ∪ {(θ ∪ {(vi, ni), 1 ≤ i ≤ k}, w · w′)}
17:   Πlast ← Πv
18: return Πlast

Algorithm 5 MAKE-EXPLICIT
 1: inputs
 2:   WRTG G = (N, Σ, P, n0, M, G) in normal form
 3: outputs
 4:   WRTG G′ = (N′, Σ, P′, n0, M, G), in normal form, such that if M ≠ ∅ and G ≠ ∅, LG′ = L_{M(G)▷}, and otherwise G′ = G.
 5: complexity
 6:   O(|P′|)
 7: G′ ← G
 8: Ξ ← {n0} {seen nonterminals}
 9: Ψ ← {n0} {pending nonterminals}
10: while Ψ ≠ ∅ do
11:   n ← any element of Ψ
12:   Ψ ← Ψ \ {n}
13:   if M ≠ ∅ and G ≠ ∅ then
14:     G′ ← PRODUCE(G′, n)
15:   for all (n −w→ σ(n1, …, nk)) ∈ P′ do
16:     for i = 1 to k do
17:       if ni ∉ Ξ then
18:         Ξ ← Ξ ∪ {ni}
19:         Ψ ← Ψ ∪ {ni}
20: return G′

Figure 2: Forward application through a cascade of tree transducers using an on-the-fly method.
(a) Input WRTG G:
  g0 −w1→ σ(g0, g1)
  g0 −w2→ α
  g1 −w3→ α
(b) First transducer MA in the cascade:
  a0.σ(x1, x2) −w4→ σ(a0.x1, a1.x2)
  a0.σ(x1, x2) −w5→ ψ(a2.x1, a1.x2)
  a0.α −w6→ α
  a1.α −w7→ α
  a2.α −w8→ ρ
(c) Second transducer MB in the cascade:
  b0.σ(x1, x2) −w9→ σ(b0.x1, b0.x2)
  b0.α −w10→ α
(d) Productions of MA(G)▷ built as a consequence of building the complete MB(MA(G)▷)▷:
  g0a0 −w1·w4→ σ(g0a0, g1a1)
  g0a0 −w1·w5→ ψ(g0a2, g1a1)
  g0a0 −w2·w6→ α
  g1a1 −w3·w7→ α
(e) Complete MB(MA(G)▷)▷:
  g0a0b0 −w1·w4·w9→ σ(g0a0b0, g1a1b0)
  g0a0b0 −w2·w6·w10→ α
  g1a1b0 −w3·w7·w10→ α

Algorithm 3, PRODUCE, takes as input a WRTG Gin = (Nin, ∆, Pin, n0, M, G) and a desired nonterminal nin and returns another WRTG, Gout, that is different from Gin in that it has more productions, specifically those beginning with nin that are in M(G)▷. Algorithms using stand-ins should call PRODUCE to ensure the stand-in they are using has the desired productions beginning with the specific nonterminal. Note, then, that PRODUCE obtains the effect of forward application in an on-the-fly manner.11 It makes calls to REPLACE, which is presented in Algorithm 4, as well as to a NORM algorithm that ensures normal form by replacing a single production not in normal form with several normal-form productions that can be combined together (Alexandrakis and Bozapalidis, 1987) and a CHAIN-REM algorithm that replaces a WRTG containing chain productions with an equivalent WRTG that does not (Mohri, 2009).
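The on-demand behaviour of PRODUCE and MAKE-EXPLICIT can be illustrated with a much-simplified sketch. It is not the paper's algorithm: it assumes every transducer rule is non-extended and has a right side consisting of a single output symbol over state-variable leaves, as all rules in Figure 2 happen to be, so that no NORM or CHAIN-REM step is needed. The class and function names, the dictionary encodings, and the numeric values substituted for w1 through w10 are all hypothetical.

```python
class Grammar:
    """An explicit WRTG: {nonterminal: [(weight, symbol, kids), ...]}."""
    def __init__(self, start, prods):
        self.start, self._prods = start, prods

    def productions(self, nt):
        return self._prods.get(nt, [])

class LazyForward:
    """Stand-in for M(G): productions of (n, q) are built on first request."""
    def __init__(self, inner, rules, q0):
        # rules: {(state, input symbol, rank): [(weight, output symbol,
        #          ((state, variable index), ...)), ...]}
        self.inner, self.rules = inner, rules
        self.start = (inner.start, q0)
        self.built = {}                      # memo: nonterminal -> productions

    def productions(self, nt):
        if nt not in self.built:
            n, q = nt
            out = []
            for w1, sym, kids in self.inner.productions(n):
                for w2, out_sym, targets in self.rules.get((q, sym, len(kids)), []):
                    new_kids = tuple((kids[i - 1], q2) for q2, i in targets)
                    out.append((w1 * w2, out_sym, new_kids))
            self.built[nt] = out
        return self.built[nt]

def make_explicit(g):
    """Explore from the start nonterminal, requesting productions as needed."""
    seen, pending, explicit = {g.start}, [g.start], {}
    while pending:
        n = pending.pop()
        explicit[n] = g.productions(n)
        for _, _, kids in explicit[n]:
            for kid in kids:
                if kid not in seen:
                    seen.add(kid)
                    pending.append(kid)
    return explicit

# Figure 2 with made-up numeric weights standing in for w1..w10.
w1, w2, w3, w4, w5, w6, w7, w8, w9, w10 = 0.5, 0.25, 1.0, 0.5, 0.25, 1.0, 0.5, 1.0, 0.5, 0.25
G = Grammar("g0", {"g0": [(w1, "sigma", ("g0", "g1")), (w2, "alpha", ())],
                   "g1": [(w3, "alpha", ())]})
MA = {("a0", "sigma", 2): [(w4, "sigma", (("a0", 1), ("a1", 2))),
                           (w5, "psi", (("a2", 1), ("a1", 2)))],
      ("a0", "alpha", 0): [(w6, "alpha", ())],
      ("a1", "alpha", 0): [(w7, "alpha", ())],
      ("a2", "alpha", 0): [(w8, "rho", ())]}
MB = {("b0", "sigma", 2): [(w9, "sigma", (("b0", 1), ("b0", 2)))],
      ("b0", "alpha", 0): [(w10, "alpha", ())]}

inner = LazyForward(G, MA, "a0")      # stand-in for MA(G)
outer = LazyForward(inner, MB, "b0")  # stand-in for MB(MA(G))
explicit = make_explicit(outer)
```

After `make_explicit(outer)` runs, `inner.built` holds only the four productions of Figure 2d; the nonterminal ("g0", "a2") is never requested, so its production to ρ is never constructed. Keeping the memo table on each stand-in is what lets a later request for the same nonterminal return immediately, mirroring the early return at lines 8 and 9 of Algorithm 3.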
As an example of stand-in construction, consider the invocation PRODUCE(G1, g0a0), where G1 = ({g0a0}, {σ, ψ, α, ρ}, ∅, g0a0, MA, G), G is in Figure 2a,12 and MA is in Figure 2b. The stand-in WRTG that is output contains the first three of the four productions in Figure 2d.

To demonstrate the use of on-the-fly application in a cascade, we next show the effect of PRODUCE when used with the cascade G ◦ MA ◦ MB, where MB is in Figure 2c. Our driving algorithm in this case is Algorithm 5, MAKE-EXPLICIT, which simply generates the full application WRTG using calls to PRODUCE. The input to MAKE-EXPLICIT is G2 = ({g0a0b0}, {σ, α}, ∅, g0a0b0, MB, G1).13 MAKE-EXPLICIT calls PRODUCE(G2, g0a0b0). PRODUCE then seeks to cover b0.σ(x1, x2) −w9→ σ(b0.x1, b0.x2) with productions from G1, which is a stand-in for MA(G)▷. At line 14 of REPLACE, G1 is improved so that it has the appropriate productions. The productions of MA(G)▷ that must be built to form the complete MB(MA(G)▷)▷ are shown in Figure 2d. The complete MB(MA(G)▷)▷ is shown in Figure 2e.

11 Note further that it allows forward application of class wxLNT, something the embed-compose-project approach did not allow.

12 By convention the initial nonterminal and state are listed first in graphical depictions of WRTGs and WXTTs.

Figure 3: Example rules from transducers used in decoding experiment. j1 and j2 are Japanese words.
(a) Rotation rules:
  rJJ.JJ(x1, x2, x3) → JJ(rDT.x1, rJJ.x2, rVB.x3)
  rVB.VB(x1, x2, x3) → VB(rNNPS.x1, rNN.x3, rVB.x2)
  t."gentle" → "gentle"
(b) Insertion rules:
  iVB.NN(x1, x2) → NN(INS, iNN.x1, iNN.x2)
  iVB.NN(x1, x2) → NN(iNN.x1, iNN.x2)
  iVB.NN(x1, x2) → NN(iNN.x1, iNN.x2, INS)
(c) Translation rules:
  t.VB(x1, x2, x3) → X(t.x1, t.x2, t.x3)
  t."gentleman" → j1
  t."gentleman" → EPS
  t.INS → j1
  t.INS → j2
Note that because we used this on-the-fly approach, we were able to avoid building all the productions in MA(G)▷; in particular we did not build g0a2 −w2·w8→ ρ, while a bucket brigade approach would have built this production. We have also designed an analogous on-the-fly PRODUCE algorithm for backward application on linear WTT.

13 Note that G2 is the initial stand-in for MB(MA(G)▷)▷, since G1 is the initial stand-in for MA(G)▷.

We have now defined several on-the-fly and bucket brigade algorithms, and also discussed the possibility of embed-compose-project and offline composition strategies to application of cascades of tree transducers. Tables 2a and 2b summarize the available methods of forward and backward application of cascades for recognizability-preserving tree transducer classes.

Table 2: Transducer types and available methods of forward and backward application of a cascade. oc = offline composition, bb = bucket brigade, otf = on the fly.

(a) Forward application
method   WST   wxLNT   wLNT
oc       √     ×       √
bb       √     ×       √
otf      √     √       √

(b) Backward application
method   WST   wxLT   wLT   wxLNT   wLNT
oc       √     ×      ×     ×       √
bb       √     √      √     √       √
otf      √     √      √     √       √

5 Decoding Experiments

The main purpose of this paper has been to present novel algorithms for performing application. However, it is important to demonstrate these algorithms on real data. We thus demonstrate bucket-brigade and on-the-fly backward application on a typical NLP task cast as a cascade of wLNT. We adapt the Japanese-to-English translation model of Yamada and Knight (2001) by transforming it from an English-tree-to-Japanese-string model to an English-tree-to-Japanese-tree model. The Japanese trees are unlabeled, meaning they have syntactic structure but all nodes are labeled "X". We then cast this modified model as a cascade of LNT tree transducers. Space does not permit a detailed description, but some example rules are in Figure 3. The rotation transducer R, a sample of which is in Figure 3a, has 6,453 rules, the insertion transducer I, Figure 3b, has 8,122 rules, and the translation transducer T, Figure 3c, has 37,311 rules.

We add an English syntax language model L to the cascade of transducers just described to better simulate an actual machine translation decoding task. The language model is cast as an identity WTT and thus fits naturally into the experimental framework. In our experiments we try several different language models to demonstrate varying performance of the application algorithms. The most realistic language model is a PCFG. Each rule captures the probability of a particular sequence of child labels given a parent label. This model has 7,765 rules.

To demonstrate more extreme cases of the usefulness of the on-the-fly approach, we build a language model that recognizes exactly the 2,087 trees in the training corpus, each with equal weight. It has 39,455 rules. Finally, to be ultraspecific, we include a form of the "specific" language model just described, but only allow the English counterpart of the particular Japanese sentence being decoded in the language.

The goal in our experiments is to apply a single tree t backward through the cascade L ◦ R ◦ I ◦ T ◦ t and find the 1-best path in the application WRTG. We evaluate the speed of each approach: bucket brigade and on-the-fly. The algorithm we use to obtain the 1-best path is a modification of the k-best algorithm of Pauls and Klein (2009). Our algorithm finds the 1-best path in a WRTG and admits an on-the-fly approach.

The results of the experiments are shown in Table 3.
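To make the notion of a 1-best path in a WRTG concrete, the sketch below computes the highest-weight tree derivable from a nonterminal. It is deliberately not the modified Pauls and Klein (2009) algorithm used in the experiments, which is not reproduced here; it swaps in a plain fixed-point relaxation instead. It assumes a normal-form, chain-production-free grammar whose weights all lie in [0, 1], so that a best derivation never needs to repeat a nonterminal along a path and |N| relaxation rounds suffice. The function name and grammar encoding are hypothetical.

```python
def one_best(prods, start):
    """Return (weight, tree) of the highest-weight tree derivable from start.

    prods: {nonterminal: [(weight, symbol, (child nonterminals...)), ...]}
    Assumes all weights are in [0, 1]; under that assumption a best tree
    has height at most the number of nonterminals, so len(prods) rounds
    of relaxation are enough.
    """
    best = {n: (0.0, None) for n in prods}
    for _ in range(len(prods)):
        for n, rules in prods.items():
            for w, sym, kids in rules:
                weight, subtrees = w, []
                for kid in kids:
                    kid_weight, kid_tree = best.get(kid, (0.0, None))
                    weight *= kid_weight
                    subtrees.append(kid_tree)
                # A positive weight implies every child already has a tree.
                if weight > best[n][0]:
                    best[n] = (weight, (sym,) + tuple(subtrees))
    return best[start]

# Hypothetical grammars with made-up weights.
cascade_result = {
    "A": [(0.125, "sigma", ("A", "B")), (0.0625, "alpha", ())],
    "B": [(0.125, "alpha", ())],
}
branching = {
    "S": [(0.5, "f", ("A", "A"))],
    "A": [(0.5, "a", ()), (0.25, "b", ())],
}
```

Unlike this exhaustive relaxation, an agenda-based search such as the one the authors adapt only needs to inspect productions of nonterminals it actually reaches, which is what allows it to be combined with on-the-fly production generation.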
As can be seen, on-the-fly application is generally faster than the bucket brigade, about double the speed per sentence in the traditional experiment that uses an English PCFG language model. The results for the other two language models demonstrate more keenly the potential advantage that an on-the-fly approach provides: the simultaneous incorporation of information from all models allows application to be done more effectively than if each information source is considered in sequence. In the "exact" case, where a very large language model that simply recognizes each of the 2,087 trees in the training corpus is used, the final application is so large that it overwhelms the resources of a 4 GB MacBook Pro, while the on-the-fly approach does not suffer from this problem. The "1-sent" case is presented to demonstrate the ripple effect caused by using on-the-fly. In the other two cases, a very large language model generally overwhelms the timing statistics, regardless of the method being used. But a language model that represents exactly one sentence is very small, and thus the effects of simultaneous inference are readily apparent: the time to retrieve the 1-best sentence is reduced by two orders of magnitude in this experiment.

Table 3: Timing results to obtain 1-best from application through a weighted tree transducer cascade, using on-the-fly vs. bucket brigade backward application techniques. pcfg = model recognizes any tree licensed by a pcfg built from observed data, exact = model recognizes each of 2,000+ trees with equal weight, 1-sent = model recognizes exactly one tree.
[Table 3 body, with columns "LM type", "method", and "time/sentence", is not recoverable from the extracted text.]

6 Conclusion

We have presented algorithms for forward and backward application of weighted tree transducer cascades, including on-the-fly variants, and demonstrated the benefit of an on-the-fly approach to application.
We note that a more formal approach to application of WTTs is being developed, independent from these efforts, by Fülöp et al. (2010).

Acknowledgments

We are grateful for extensive discussions with Andreas Maletti. We also appreciate the insights and advice of David Chiang, Steve DeNeefe, and others at ISI in the preparation of this work. Jonathan May and Kevin Knight were supported by NSF grants IIS-0428020 and IIS-0904684. Heiko Vogler was supported by DFG VO 1011/5-1.

References

Athanasios Alexandrakis and Symeon Bozapalidis. 1987. Weighted grammars and Kleene's theorem. Information Processing Letters, 24(1):1–4.
Brenda S. Baker. 1979. Composition of top-down and bottom-up tree transductions. Information and Control, 41(2):186–213.
Zoltán Ésik and Werner Kuich. 2003. Formal tree series. Journal of Automata, Languages and Combinatorics, 8(2):219–285.
Zoltán Fülöp and Heiko Vogler. 2009. Weighted tree automata and tree transducers. In Manfred Droste, Werner Kuich, and Heiko Vogler, editors, Handbook of Weighted Automata, chapter 9, pages 313–404. Springer-Verlag.
Zoltán Fülöp, Andreas Maletti, and Heiko Vogler. 2010. Backward and forward application of weighted extended tree transducers. Unpublished manuscript.
Ferenc Gécseg and Magnus Steinby. 1984. Tree Automata. Akadémiai Kiadó, Budapest.
Liang Huang and David Chiang. 2005. Better k-best parsing. In Harry Bunt, Robert Malouf, and Alon Lavie, editors, Proceedings of the Ninth International Workshop on Parsing Technologies (IWPT), pages 53–64, Vancouver, October. Association for Computational Linguistics.
Werner Kuich. 1998. Formal power series over trees. In Symeon Bozapalidis, editor, Proceedings of the 3rd International Conference on Developments in Language Theory (DLT), pages 61–101, Thessaloniki, Greece. Aristotle University of Thessaloniki.
Werner Kuich. 1999. Tree transducers and formal tree series. Acta Cybernetica, 14:135–149.
Andreas Maletti, Jonathan Graehl, Mark Hopkins, and Kevin Knight. 2009. The power of extended top-down tree transducers. SIAM Journal on Computing, 39(2):410–430.
Andreas Maletti. 2006. Compositions of tree series transformations. Theoretical Computer Science, 366:248–271.
Andreas Maletti. 2008. Compositions of extended top-down tree transducers. Information and Computation, 206(9–10):1187–1196.
Andreas Maletti. 2009. Personal Communication.
Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. 2000. The design principles of a weighted finite-state transducer library. Theoretical Computer Science, 231:17–32.
Mehryar Mohri. 1997. Finite-state transducers in language and speech processing. Computational Linguistics, 23(2):269–312.
Mehryar Mohri. 2009. Weighted automata algorithms. In Manfred Droste, Werner Kuich, and Heiko Vogler, editors, Handbook of Weighted Automata, chapter 6, pages 213–254. Springer-Verlag.
Adam Pauls and Dan Klein. 2009. K-best A* parsing. In Keh-Yih Su, Jian Su, Janyce Wiebe, and Haizhou Li, editors, Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 958–966, Suntec, Singapore, August. Association for Computational Linguistics.
Fernando Pereira and Michael Riley. 1997. Speech recognition by composition of weighted finite automata. In Emmanuel Roche and Yves Schabes, editors, Finite-State Language Processing, chapter 15, pages 431–453. MIT Press, Cambridge, MA.
William A. Woods. 1980. Cascaded ATN grammars. American Journal of Computational Linguistics, 6(1):1–12.
Kenji Yamada and Kevin Knight. 2001. A syntax-based statistical translation model. In Proceedings of 39th Annual Meeting of the Association for Computational Linguistics, pages 523–530, Toulouse, France, July. Association for Computational Linguistics.
4 0.18027888 46 acl-2010-Bayesian Synchronous Tree-Substitution Grammar Induction and Its Application to Sentence Compression
Author: Elif Yamangil ; Stuart M. Shieber
Abstract: We describe our experiments with training algorithms for tree-to-tree synchronous tree-substitution grammar (STSG) for monolingual translation tasks such as sentence compression and paraphrasing. These translation tasks are characterized by the relative ability to commit to parallel parse trees and availability of word alignments, yet the unavailability of large-scale data, calling for a Bayesian tree-to-tree formalism. We formalize nonparametric Bayesian STSG with epsilon alignment in full generality, and provide a Gibbs sampling algorithm for posterior inference tailored to the task of extractive sentence compression. We achieve improvements against a number of baselines, including expectation maximization and variational Bayes training, illustrating the merits of nonparametric inference over the space of grammars as opposed to sparse parametric inference with a fixed grammar.
5 0.16202222 169 acl-2010-Learning to Translate with Source and Target Syntax
Author: David Chiang
Abstract: Statistical translation models that try to capture the recursive structure of language have been widely adopted over the last few years. These models make use of varying amounts of information from linguistic theory: some use none at all, some use information about the grammar of the target language, some use information about the grammar of the source language. But progress has been slower on translation models that are able to learn the relationship between the grammars of both the source and target language. We discuss the reasons why this has been a challenge, review existing attempts to meet this challenge, and show how some old and new ideas can be combined into a simple approach that uses both source and target syntax for significant improvements in translation accuracy.
6 0.12543419 53 acl-2010-Blocked Inference in Bayesian Tree Substitution Grammars
7 0.10302134 110 acl-2010-Exploring Syntactic Structural Features for Sub-Tree Alignment Using Bilingual Tree Kernels
8 0.100527 94 acl-2010-Edit Tree Distance Alignments for Semantic Role Labelling
9 0.092233099 155 acl-2010-Kernel Based Discourse Relation Recognition with Temporal Ordering Information
10 0.09056218 69 acl-2010-Constituency to Dependency Translation with Forests
11 0.08961039 97 acl-2010-Efficient Path Counting Transducers for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices
12 0.086123385 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction
13 0.085960805 71 acl-2010-Convolution Kernel over Packed Parse Forest
14 0.084044024 67 acl-2010-Computing Weakest Readings
15 0.081485823 115 acl-2010-Filtering Syntactic Constraints for Statistical Machine Translation
16 0.079930991 116 acl-2010-Finding Cognate Groups Using Phylogenies
17 0.07910829 243 acl-2010-Tree-Based and Forest-Based Translation
18 0.075842753 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
19 0.067350641 133 acl-2010-Hierarchical Search for Word Alignment
20 0.061555255 236 acl-2010-Top-Down K-Best A* Parsing
topicId topicWeight
[(0, -0.148), (1, -0.107), (2, 0.078), (3, -0.023), (4, -0.13), (5, -0.056), (6, 0.165), (7, 0.041), (8, -0.109), (9, -0.139), (10, -0.023), (11, -0.138), (12, 0.072), (13, -0.098), (14, -0.067), (15, -0.004), (16, 0.073), (17, -0.042), (18, 0.081), (19, 0.093), (20, -0.021), (21, 0.009), (22, 0.031), (23, 0.014), (24, 0.163), (25, -0.158), (26, 0.007), (27, -0.088), (28, -0.059), (29, 0.08), (30, -0.008), (31, -0.006), (32, 0.055), (33, 0.005), (34, -0.125), (35, 0.063), (36, 0.01), (37, -0.098), (38, 0.081), (39, 0.077), (40, -0.032), (41, 0.018), (42, 0.075), (43, 0.024), (44, -0.03), (45, 0.076), (46, 0.056), (47, -0.12), (48, -0.141), (49, -0.011)]
simIndex simValue paperId paperTitle
same-paper 1 0.95614392 21 acl-2010-A Tree Transducer Model for Synchronous Tree-Adjoining Grammars
Author: Andreas Maletti
Abstract: A characterization of the expressive power of synchronous tree-adjoining grammars (STAGs) in terms of tree transducers (or equivalently, synchronous tree substitution grammars) is developed. Essentially, a STAG corresponds to an extended tree transducer that uses explicit substitution in both the input and output. This characterization allows the easy integration of STAG into toolkits for extended tree transducers. Moreover, the applicability of the characterization to several representational and algorithmic problems is demonstrated.
2 0.82811469 95 acl-2010-Efficient Inference through Cascades of Weighted Tree Transducers
Author: Jonathan May ; Kevin Knight ; Heiko Vogler
Abstract: Weighted tree transducers have been proposed as useful formal models for representing syntactic natural language processing applications, but there has been little description of inference algorithms for these automata beyond formal foundations. We give a detailed description of algorithms for application of cascades of weighted tree transducers to weighted tree acceptors, connecting formal theory with actual practice. Additionally, we present novel on-the-fly variants of these algorithms, and compare their performance on a syntax machine translation cascade based on (Yamada and Knight, 2001).

Affiliation (Heiko Vogler): Technische Universität Dresden, Institut für Theoretische Informatik, 01062 Dresden, Germany, heiko.vogler@tu-dresden.de

1 Motivation

Weighted finite-state transducers have found recent favor as models of natural language (Mohri, 1997). In order to make actual use of systems built with these formalisms we must first calculate the set of possible weighted outputs allowed by the transducer given some input, which we call forward application, or the set of possible weighted inputs given some output, which we call backward application. After application we can do some inference on this result, such as determining its k highest weighted elements. We may also want to divide up our problems into manageable chunks, each represented by a transducer. As noted by Woods (1980), it is easier for designers to write several small transducers where each performs a simple transformation, rather than painstakingly construct a single complicated device. We would like to know, then, the result of transformation of input or output by a cascade of transducers, one operating after the other. As we will see, there are various strategies for approaching this problem. We will consider offline composition, bucket brigade application, and on-the-fly application.

Application of cascades of weighted string transducers (WSTs) has been well-studied (Mohri, 1997).
Less well-studied but of more recent interest is application of cascades of weighted tree transducers (WTTs). We tackle application of WTT cascades in this work, presenting:
• explicit algorithms for application of WTT cascades,
• novel algorithms for on-the-fly application of WTT cascades, and
• experiments comparing the performance of these algorithms.

2 Strategies for the string case

Before we discuss application of WTTs, it is helpful to recall the solution to this problem in the WST domain. We recall previous formal presentations of WSTs (Mohri, 1997) and note informally that they may be represented as directed graphs with designated start and end states and edges labeled with input symbols, output symbols, and weights.1 Fortunately, the solution for WSTs is practically trivial: we achieve application through a series of embedding, composition, and projection operations. Embedding is simply the act of representing a string or regular string language as an identity WST. Composition of WSTs, that is, generating a single WST that captures the transformations of two input WSTs used in sequence, is not at all trivial, but has been well covered in, e.g., (Mohri, 2009), where directly implementable algorithms can be found. Finally, projection is another trivial operation: the domain or range language can be obtained from a WST by ignoring the output or input symbols, respectively, on its arcs, and summing weights on otherwise identical arcs.
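The embedding and projection operations described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the arc representation (a 5-tuple of source state, input symbol, output symbol, weight, target state) and the function names `embed` and `project` are assumptions made for the example. Composition is deliberately left out, since the text points to Mohri (2009) for it.

```python
from collections import defaultdict

# Assumed arc format: (source_state, input_symbol, output_symbol, weight, target_state).

def embed(string):
    """Represent a string as an identity WST: one arc per symbol, each with weight 1."""
    return [(i, sym, sym, 1.0, i + 1) for i, sym in enumerate(string)]

def project(arcs, side):
    """Domain or range projection of a WST given as a list of arcs.

    For side == "domain" the output symbols are ignored; for side == "range"
    the input symbols are ignored. Weights of otherwise identical arcs are summed.
    """
    summed = defaultdict(float)
    for src, inp, out, weight, dst in arcs:
        label = inp if side == "domain" else out
        summed[(src, label, dst)] += weight
    return dict(summed)

# Usage: three arcs between the same pair of states.
arcs = [(0, "a", "b", 0.25, 1), (0, "a", "c", 0.5, 1), (0, "a", "b", 0.25, 1)]
domain = project(arcs, "domain")   # all three arcs collapse onto the input label "a"
range_ = project(arcs, "range")    # the two "b" arcs are merged, "c" stays separate
```

The result of a projection is a weighted acceptor, here stored as a dictionary from (source, label, target) to the summed weight.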
By embedding an input, composing the result with the given WST, and projecting the result, forward application is accomplished.2 We are then left with a weighted string acceptor (WSA), essentially a weighted, labeled graph, which can be traversed by well-known algorithms to efficiently find the k-best paths.

1 We assume throughout this paper that weights are in R+ ∪ {+∞}, that the weight of a path is calculated as the product of the weights of its edges, and that the weight of a (not necessarily finite) set T of paths is calculated as the sum of the weights of the paths of T.
2 For backward applications, the roles of input and output are simply exchanged.

Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1058–1066, Uppsala, Sweden, 11-16 July 2010. ©2010 Association for Computational Linguistics

[Figure 1: Three different approaches to application through cascades of WSTs. Panels: (a) Input string "a a" embedded in an identity WST; (b) first WST in cascade; (c) second WST in cascade; (d) Offline composition approach: compose the transducers; (e) Bucket brigade approach: apply WST (b) to WST (a); (f) Result of offline or bucket application after projection; (g) Initial on-the-fly stand-in for (f); (h) On-the-fly stand-in after exploring outgoing edges of state ADF; (i) On-the-fly stand-in after best path has been found.]

Because WSTs can be freely composed, extending application to operate on a cascade of WSTs is fairly trivial. The only question is one of composition order: whether to initially compose the cascade into a single transducer (an approach we call offline composition) or to compose the initial embedding with the first transducer, trim useless states, compose the result with the second, and so on (an approach we call bucket brigade). The appropriate strategy generally depends on the structure of the individual transducers.

A third approach builds the result incrementally, as dictated by some algorithm that requests information about it. Such an approach, which we call on-the-fly, was described in (Pereira and Riley, 1997; Mohri, 2009; Mohri et al., 2000). If we can efficiently calculate the outgoing edges of a state of the result WSA on demand, without calculating all edges in the entire machine, we can maintain a stand-in for the result structure, a machine consisting at first of only the start state of the true result. As a calling algorithm (e.g., an implementation of Dijkstra's algorithm) requests information about the result graph, such as the set of outgoing edges from a state, we replace the current stand-in with a richer version by adding the result of the request. The on-the-fly approach has a distinct advantage over the other two methods in that the entire result graph need not be built. A graphical representation of all three methods is presented in Figure 1.

3 Application of tree transducers

Now let us revisit these strategies in the setting of trees and tree transducers. Imagine we have a tree or set of trees as input that can be represented as a weighted regular tree grammar3 (WRTG) and a WTT that can transform that input with some weight.
We would like to know the k-best trees the WTT can produce as output for that input, along with their weights. We already know of several methods for acquiring k-best trees from a WRTG (Huang and Chiang, 2005; Pauls and Klein, 2009), so we then must ask if, analogously to the string case, WTTs preserve recognizability4 and we can form an application WRTG. Before we begin, however, we must define WTTs and WRTGs.

3 This generates the same class of weighted tree languages as weighted tree automata, the direct analogue of WSAs, and is more useful for our purposes.
4 A weighted tree language is recognizable iff it can be represented by a wrtg.
5 The following formal definitions and notations are needed for understanding and reimplementation of the presented algorithms, but can be safely skipped on first reading and consulted when encountering an unfamiliar term.

3.1 Preliminaries5

A ranked alphabet is a finite set Σ such that every member σ ∈ Σ has a rank rk(σ) ∈ N. We call Σ(k) ⊆ Σ, k ∈ N, the set of those σ ∈ Σ such that rk(σ) = k. The set of variables is denoted X = {x1, x2, . . .} and is assumed to be disjoint from any ranked alphabet used in this paper. We use ⊥ to denote a symbol of rank 0 that is not in any ranked alphabet used in this paper. A tree t ∈ TΣ is denoted σ(t1, . . . , tk) where k ≥ 0, σ ∈ Σ(k), and t1, . . . , tk ∈ TΣ. For σ ∈ Σ(0) we write σ ∈ TΣ as shorthand for σ(). For every set S disjoint from Σ, let TΣ(S) = TΣ∪S, where, for all s ∈ S, rk(s) = 0.

We define the positions of a tree t = σ(t1, . . . , tk), for k ≥ 0, σ ∈ Σ(k), t1, . . . , tk ∈ TΣ, as a set pos(t) ⊂ N* such that pos(t) = {ε} ∪ {iv | 1 ≤ i ≤ k, v ∈ pos(ti)}. The set of leaf positions lv(t) ⊆ pos(t) are those positions v ∈ pos(t) such that for no i ∈ N, vi ∈ pos(t). We presume standard lexicographic orderings < and ≤ on pos.

Let t, s ∈ TΣ and v ∈ pos(t). The label of t at position v, denoted by t(v), the subtree of t at v, denoted by t|v, and the replacement at v by s, denoted by t[s]v, are defined as follows:
1. For every σ ∈ Σ(0), σ(ε) = σ, σ|ε = σ, and σ[s]ε = s.
2. For every t = σ(t1, . . . , tk) such that k = rk(σ) and k ≥ 1, t(ε) = σ, t|ε = t, and t[s]ε = s. For every 1 ≤ i ≤ k and v ∈ pos(ti), t(iv) = ti(v), t|iv = ti|v, and t[s]iv = σ(t1, . . . , ti−1, ti[s]v, ti+1, . . . , tk).

The size of a tree t, size(t), is |pos(t)|, the cardinality of its position set. The yield set of a tree is the set of labels of its leaves: for a tree t, yd(t) = {t(v) | v ∈ lv(t)}.

Let A and B be sets. Let ϕ : A → TΣ(B) be a mapping. We extend ϕ to the mapping ϕ̄ : TΣ(A) → TΣ(B) such that for a ∈ A, ϕ̄(a) = ϕ(a) and for k ≥ 0, σ ∈ Σ(k), and t1, . . . , tk ∈ TΣ(A), ϕ̄(σ(t1, . . . , tk)) = σ(ϕ̄(t1), . . . , ϕ̄(tk)). We indicate such extensions by describing ϕ as a substitution mapping and then using ϕ without further comment.

We use R+ to denote the set {w ∈ R | w ≥ 0} and R+∞ to denote R+ ∪ {+∞}.

Definition 3.1 (cf. (Alexandrakis and Bozapalidis, 1987)) A weighted regular tree grammar (WRTG) is a 4-tuple G = (N, Σ, P, n0) where:
1. N is a finite set of nonterminals, with n0 ∈ N the start nonterminal.
2. Σ is a ranked alphabet of input symbols, where Σ ∩ N = ∅.
3. P is a tuple (P′, π), where P′ is a finite set of productions, each production p of the form n → u, n ∈ N, u ∈ TΣ(N), and π : P′ → R+ is a weight function of the productions. We will refer to P as a finite set of weighted productions, each production p of the form n --π(p)--> u.

A production p is a chain production if it is of the form ni --w--> nj, where ni, nj ∈ N.6

6 In (Alexandrakis and Bozapalidis, 1987), chain productions are forbidden in order to avoid infinite summations. We explicitly allow such summations.

A WRTG G is in normal form if each production is either a chain production or is of the form n --w--> σ(n1, . . . , nk) where σ ∈ Σ(k) and n1, . . . , nk ∈ N.

For WRTG G = (N, Σ, P, n0), s, t, u ∈ TΣ(N), n ∈ N, and p ∈ P of the form n --w--> u, we obtain a derivation step from s to t by replacing some leaf nonterminal in s labeled n with u. Formally, s ⇒^p_G t if there exists some v ∈ lv(s) such that s(v) = n and s[u]v = t. We say this derivation step is leftmost if, for all v′ ∈ lv(s) where v′ < v, s(v′) ∈ Σ. We henceforth assume all derivation steps are leftmost. If, for some m ∈ N, pi ∈ P, and ti ∈ TΣ(N) for all 1 ≤ i ≤ m, n0 ⇒^p1 t1 . . . ⇒^pm tm, we say the sequence d = (p1, . . . , pm) is a derivation of tm in G and that n0 ⇒* tm; the weight of d is wt(d) = π(p1) · . . . · π(pm). The weighted tree language recognized by G is the mapping LG : TΣ → R+∞ such that for every t ∈ TΣ, LG(t) is the sum of the weights of all (possibly infinitely many) derivations of t in G. A weighted tree language f : TΣ → R+∞ is recognizable if there is a WRTG G such that f = LG.

We define a partial ordering ⪯ on WRTGs such that for WRTGs G1 = (N1, Σ, P1, n0) and G2 = (N2, Σ, P2, n0), we say G1 ⪯ G2 iff N1 ⊆ N2 and P1 ⊆ P2, where the weights are preserved.
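The position-based tree operations defined above (pos, lv, t(v), t|v, t[s]v, size, and yd) can be illustrated with a short Python sketch. The encoding is an assumption chosen for the example, not something the paper prescribes: a tree σ(t1, . . . , tk) is a tuple whose first element is the label and whose remaining elements are the subtrees, and a position is a tuple of 1-based child indices with the empty tuple standing for ε.

```python
def pos(t):
    """All positions of t in prefix order; the root position is the empty tuple."""
    result = [()]
    for i, child in enumerate(t[1:], start=1):
        result.extend((i,) + v for v in pos(child))
    return result

def subtree(t, v):
    """t|v: the subtree of t rooted at position v."""
    for i in v:
        t = t[i]  # child i sits at tuple index i because index 0 holds the label
    return t

def label(t, v):
    """t(v): the label of t at position v."""
    return subtree(t, v)[0]

def replace(t, v, s):
    """t[s]_v: the tree obtained from t by replacing the subtree at position v with s."""
    if not v:
        return s
    i, rest = v[0], v[1:]
    return t[:i] + (replace(t[i], rest, s),) + t[i + 1:]

def leaves(t):
    """lv(t): positions that have no child positions."""
    return [v for v in pos(t) if len(subtree(t, v)) == 1]

def size(t):
    """size(t) = |pos(t)|."""
    return len(pos(t))

def yield_set(t):
    """yd(t): the set of labels found at the leaves of t."""
    return {label(t, v) for v in leaves(t)}

# Usage on the tree sigma(gamma(alpha), beta).
t = ("sigma", ("gamma", ("alpha",)), ("beta",))
t2 = replace(t, (1, 1), ("rho",))  # sigma(gamma(rho), beta)
```

Immutable tuples are used so that `replace` returns a new tree and leaves the original untouched, which mirrors the functional definition of t[s]v.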
Definition 3.2 (cf. Def. 1 of (Maletti, 2008)) A weighted extended top-down tree transducer (WXTT) is a 5-tuple M = (Q, Σ, ∆, R, q0) where:
1. Q is a finite set of states.
2. Σ and ∆ are the ranked alphabets of input and output symbols, respectively, where (Σ ∪ ∆) ∩ Q = ∅.
3. R is a tuple (R′, π), where R′ is a finite set of rules, each rule r of the form q.y → u for q ∈ Q, y ∈ TΣ(X), and u ∈ T∆(Q × X). We further require that no variable x ∈ X appears more than once in y, and that each variable appearing in u is also in y. Moreover, π : R′ → R+∞ is a weight function of the rules. As for WRTGs, we refer to R as a finite set of weighted rules, each rule r of the form q.y --π(r)--> u.

A WXTT is linear (respectively, nondeleting) if, for each rule r of the form q.y --w--> u, each x ∈ yd(y) ∩ X appears at most once (respectively, at least once) in u. We denote the class of all WXTTs as wxT and add the letters L and N to signify the subclasses of linear and nondeleting WTT, respectively. Additionally, if y is of the form σ(x1, . . . , xk), we remove the letter "x" to signify the transducer is not extended (i.e., it is a "traditional" WTT (Fülöp and Vogler, 2009)).

For WXTT M = (Q, Σ, ∆, R, q0), s, t ∈ T∆(Q × TΣ), and r ∈ R of the form q.y --w--> u, we obtain a derivation step from s to t by replacing some leaf of s labeled with q and a tree matching y by a transformation of u, where each instance of a variable has been replaced by a corresponding subtree of the y-matching tree.
Formally, s ⇒^r_M t if there is a position v ∈ pos(s), a substitution mapping ϕ : X → TΣ, and a rule q.y --w--> u ∈ R such that s(v) = (q, ϕ(y)) and t = s[ϕ′(u)]v, where ϕ′ is a substitution mapping Q × X → T∆(Q × TΣ) defined such that ϕ′(q′, x) = (q′, ϕ(x)) for all q′ ∈ Q and x ∈ X. We say this derivation step is leftmost if, for all v′ ∈ lv(s) where v′ < v, s(v′) ∈ ∆. We henceforth assume all derivation steps are leftmost. If, for some s ∈ TΣ, m ∈ N, ri ∈ R, and ti ∈ T∆(Q × TΣ) for all 1 ≤ i ≤ m, (q0, s) ⇒^r1 t1 . . . ⇒^rm tm, we say the sequence d = (r1, . . . , rm) is a derivation of (s, tm) in M; the weight of d is wt(d) = π(r1) · . . . · π(rm). The weighted tree transformation recognized by M is the mapping τM : TΣ × T∆ → R+∞, such that for every s ∈ TΣ and t ∈ T∆, τM(s, t) is the sum of the weights of all (possibly infinitely many) derivations of (s, t) in M.

The composition of two weighted tree transformations τ : TΣ × T∆ → R+∞ and µ : T∆ × TΓ → R+∞ is the weighted tree transformation (τ ; µ) : TΣ × TΓ → R+∞ where for every s ∈ TΣ and u ∈ TΓ, (τ ; µ)(s, u) = Σ_{t ∈ T∆} τ(s, t) · µ(t, u).

3.2 Applicable classes

We now consider transducer classes where recognizability is preserved under application. Table 1 presents known results for the top-down tree transducer classes described in Section 3.1. Unlike the string case, preservation of recognizability is not universal or symmetric. This is important for us, because we can only construct an application WRTG, i.e., a WRTG representing the result of application, if we can ensure that the language generated by application is in fact recognizable.
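As a small worked illustration of the composition (τ ; µ) defined in Section 3.1, the sketch below composes two transformations that have finite support. Representing a transformation as a Python dictionary from (input, output) pairs to weights is an assumption made for this example; the transformations in the paper are in general infinite and are represented by transducers, so this only shows the arithmetic of the definition.

```python
from collections import defaultdict

def compose(tau, mu):
    """(tau ; mu)(s, u) = sum over t of tau(s, t) * mu(t, u), for finite-support relations."""
    result = defaultdict(float)
    for (s, t), w1 in tau.items():
        for (t_mid, u), w2 in mu.items():
            if t == t_mid:  # the intermediate tree must match
                result[(s, u)] += w1 * w2
    return dict(result)

# Usage: two intermediate trees t1 and t2 both lead from s to u.
tau = {("s", "t1"): 0.5, ("s", "t2"): 0.5}
mu = {("t1", "u"): 0.5, ("t2", "u"): 0.25}
composed = compose(tau, mu)  # weight of (s, u) is 0.5*0.5 + 0.5*0.25
```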
Of the types under consideration, only wxLNT and wLNT preserve forward recognizability. The two classes marked as open questions and the other classes, which are superclasses of wNT, do not or are presumed not to. All subclasses of wxLT preserve backward recognizability.7 We do not consider cases where recognizability is not preserved in the remainder of this paper. If a transducer M of a class that preserves forward recognizability is applied to a WRTG G, we can call the forward application WRTG M(G)▷, and if M preserves backward recognizability, we can call the backward application WRTG M(G)◁.

7 Note that the introduction of weights limits recognizability preservation considerably. For example, (unweighted) xT preserves backward recognizability.

Now that we have explained the application problem in the context of weighted tree transducers and determined the classes for which application is possible, let us consider how to build forward and backward application WRTGs. Our basic approach mimics that taken for WSTs by using an embed-compose-project strategy. As in string world, if we can embed the input in a transducer, compose with the given transducer, and project the result, we can obtain the application WRTG. Embedding a WRTG in a wLNT is a trivial operation: if the WRTG is in normal form and chain production-free,8 for every production of the form n --w--> σ(n1, . . . , nk), create a rule of the form n.σ(x1, . . . , xk) --w--> σ(n1.x1, . . . , nk.xk). Range projection of a wxLNT is also trivial: for every q ∈ Q and u ∈ T∆(Q × X) create a production of the form q --w--> u′ where u′ is formed from u by replacing all leaves of the form q.x with the leaf q, i.e., removing references to variables, and w is the sum of the weights of all rules of the form q.y → u in R.9 Domain projection for wxLT is best explained by way of example.
The left side of a rule is preserved, with variable leaves replaced by their associated states from the right side. So, the rule q1.σ(γ(x1), x2) --w--> δ(q2.x2, β(α, q3.x1)) would yield the production q1 --w--> σ(γ(q3), q2) in the domain projection. However, a deleting rule such as q1.σ(x1, x2) --w--> γ(q2.x2) necessitates the introduction of a new nonterminal ⊥ that can generate all of TΣ with weight 1.

The only missing piece in our embed-compose-project strategy is composition. Algorithm 1, which is based on the declarative construction of Maletti (2006), generates the syntactic composition of a wxLT and a wLNT, a generalization of the basic composition construction of Baker (1979). It calls Algorithm 2, which determines the sequences of rules in the second transducer that match the right side of a single rule in the first transducer. Since the embedded WRTG is of type wLNT, it may be either the first or second argument provided to Algorithm 1, depending on whether the application is forward or backward. We can thus use the embed-compose-project strategy for forward application of wLNT and backward application of wxLT and wxLNT. Note that we cannot use this strategy for forward application of wxLNT, even though that class preserves recognizability.

8 Without loss of generality we assume this is so, since standard algorithms exist to remove chain productions (Kuich, 1998; Ésik and Kuich, 2003; Mohri, 2009) and convert into normal form (Alexandrakis and Bozapalidis, 1987).
9 Finitely many such productions may be formed.

Algorithm 1 COMPOSE
1: inputs
2:   wxLT M1 = (Q1, Σ, ∆, R1, q1_0)
3:   wLNT M2 = (Q2, ∆, Γ, R2, q2_0)
4: outputs
5:   wxLT M3 = ((Q1 × Q2), Σ, Γ, R3, (q1_0, q2_0)) such that τM3 = (τM1 ; τM2).
6: complexity
7:   O(|R1| max(|R2|^size(ũ), |Q2|)), where ũ is the largest right side tree in any rule in R1
8: Let R3 be of the form (R3′, π)
9: R3 ← (∅, ∅)
10: Ξ ← {(q1_0, q2_0)} {seen states}
11: Ψ ← {(q1_0, q2_0)} {pending states}
12: while Ψ ≠ ∅ do
13:   (q1, q2) ← any element of Ψ
14:   Ψ ← Ψ \ {(q1, q2)}
15:   for all (q1.y --w1--> u) ∈ R1 do
16:     for all (z, w2) ∈ COVER(u, M2, q2) do
17:       for all (q, x) ∈ yd(z) ∩ ((Q1 × Q2) × X) do
18:         if q ∉ Ξ then
19:           Ξ ← Ξ ∪ {q}
20:           Ψ ← Ψ ∪ {q}
21:       r ← ((q1, q2).y → z)
22:       R3′ ← R3′ ∪ {r}
23:       π(r) ← π(r) + (w1 · w2)
24: return M3

4 Application of tree transducer cascades

What about the case of an input WRTG and a cascade of tree transducers? We will revisit the three strategies for accomplishing application discussed above for the string case.

In order for offline composition to be a viable strategy, the transducers in the cascade must be closed under composition. Unfortunately, of the classes that preserve recognizability, only wLNT is closed under composition (Gécseg and Steinby, 1984; Baker, 1979; Maletti et al., 2009; Fülöp and Vogler, 2009).

However, the general lack of composability of tree transducers does not preclude us from conducting forward application of a cascade. We revisit the bucket brigade approach, which in Section 2 appeared to be little more than a choice of composition order. As discussed previously, application of a single transducer involves an embedding, a composition, and a projection. The embedded WRTG is in the class wLNT, and the projection forms another WRTG. As long as every transducer in the cascade can be composed with a wLNT to its left or right, depending on the application type, application of a cascade is possible. Note that this embed-compose-project process is somewhat more burdensome than in the string case.
For strings, application is obtained by a single embedding, a series of compositions, and a single projection, whereas application for trees is obtained by a series of (embed, compose, project) operations.

Algorithm 2 COVER
1: inputs
2:   u ∈ T∆(Q1 × X)
3:   wT M2 = (Q2, ∆, Γ, R2, q2_0)
4:   state q2 ∈ Q2
5: outputs
6:   set of pairs (z, w) with z ∈ TΓ((Q1 × Q2) × X) formed by one or more successful runs on u by rules in R2, starting from q2, and w ∈ R+∞ the sum of the weights of all such runs.
7: complexity
8:   O(|R2|^size(u))
9: if u(ε) is of the form (q1, x) ∈ Q1 × X then
10:   zinit ← ((q1, q2), x)
11: else
12:   zinit ← ⊥
13: Πlast ← {(zinit, {((ε, ε), q2)}, 1)}
14: for all v ∈ pos(u) such that u(v) ∈ ∆(k) for some k ≥ 0 in prefix order do
15:   Πv ← ∅
16:   for all (z, θ, w) ∈ Πlast do
17:     for all v′ ∈ lv(z) such that z(v′) = ⊥ do
18:       for all (θ(v, v′).u(v)(x1, . . . , xk) --w′--> h) ∈ R2 do
19:         θ′ ← θ
20:         Form substitution mapping ϕ : (Q2 × X) → TΓ((Q1 × Q2 × X) ∪ {⊥}).
21:         for i = 1 to k do
22:           for all v″ ∈ pos(h) such that h(v″) = (q2′, xi) for some q2′ ∈ Q2 do
23:             θ′(vi, v′v″) ← q2′
24:             if u(vi) is of the form (q1, x) ∈ Q1 × X then
25:               ϕ(q2′, xi) ← ((q1, q2′), x)
26:             else
27:               ϕ(q2′, xi) ← ⊥
28:         Πv ← Πv ∪ {(z[ϕ(h)]v′, θ′, w · w′)}
29:   Πlast ← Πv
30: Z ← {z | (z, θ, w) ∈ Πlast}
31: return {(z, Σ_{(z,θ,w) ∈ Πlast} w) | z ∈ Z}

4.1 On-the-fly algorithms

We next consider on-the-fly algorithms for application. Similar to the string case, an on-the-fly approach is driven by a calling algorithm that periodically needs to know the productions in a WRTG with a common left side nonterminal. The embed-compose-project approach produces an entire application WRTG before any inference algorithm is run. In order to admit an on-the-fly approach we describe algorithms that only generate those productions in a WRTG that have a given left nonterminal.
In this section we extend Definition 3.1 as follows: a WRTG is a 6-tuple G = (N, Σ, P, n0, M, G) where N, Σ, P, and n0 are defined as in Definition 3.1, and either M = G = ∅,10 or M is a wxLNT and G is a normal form, chain production-free WRTG such that G ⪯ M(G)▷. In the latter case, G is a stand-in for M(G)▷, analogous to the stand-ins for WSAs and WSTs described in Section 2.

10 In which case the definition is functionally unchanged from before.

[Table 1: Preservation of forward and backward recognizability for various classes of top-down tree transducers. Sub-tables: (a) Preservation of forward recognizability; (b) Preservation of backward recognizability. Columns: type, preserved?, source. The cell contents are not recoverable from the extraction.] Here and elsewhere, the following abbreviations apply: w = weighted, x = extended LHS, L = linear, N = nondeleting, OQ = open question. Square brackets include a superposition of classes. For example, w[x]T signifies both wxT and wT.

Algorithm 3 PRODUCE
1: inputs
2:   WRTG Gin = (Nin, ∆, Pin, n0, M, G) such that M = (Q, Σ, ∆, R, q0) is a wxLNT and G = (N, Σ, P, n0′, M′, G′) is a WRTG in normal form with no chain productions
3:   nin ∈ Nin
4: outputs
5:   WRTG Gout = (Nout, ∆, Pout, n0, M, G), such that Gin ⪯ Gout and (nin --w--> u) ∈ Pout ⇔ (nin --w--> u) ∈ M(G)▷
6: complexity
7:   O(|R| |P|^size(ỹ)), where ỹ is the largest left side tree in any rule in R
8: if Pin contains productions of the form nin --w--> u then
9:   return Gin
10: Nout ← Nin
11: Pout ← Pin
12: Let nin be of the form (n, q), where n ∈ N and q ∈ Q.
13: for all (q.y --w1--> u) ∈ R do
14:   for all (θ, w2) ∈ REPLACE(y, G, n) do
15:     Form substitution mapping ϕ : Q × X → T∆(N × Q) such that, for all v ∈ yd(y) and q′ ∈ Q, if there exist n′ ∈ N and x ∈ X such that θ(v) = n′ and y(v) = x, then ϕ(q′, x) = (n′, q′).
16:     p′ ← ((n, q) --w1·w2--> ϕ(u))
17:     for all p ∈ NORM(p′, Nout) do
18:       Let p be of the form n0 --w--> δ(n1, . . . , nk) for δ ∈ ∆(k).
19:       Nout ← Nout ∪ {n0, . . . , nk}
20:       Pout ← Pout ∪ {p}
21: return CHAIN-REM(Gout)

Algorithm 3, PRODUCE, takes as input a WRTG Gin = (Nin, ∆, Pin, n0, M, G) and a desired nonterminal nin and returns another WRTG, Gout, that is different from Gin in that it has more productions, specifically those beginning with nin that are in M(G)▷. Algorithms using stand-ins should call PRODUCE to ensure the stand-in they are using has the desired productions beginning with the specific nonterminal. Note, then, that PRODUCE obtains the effect of forward application in an on-the-fly manner.11 It makes calls to REPLACE, which is presented in Algorithm 4, as well as to a NORM algorithm that ensures normal form by replacing a single production not in normal form with several normal-form productions that can be combined together (Alexandrakis and Bozapalidis, 1987) and a CHAIN-REM algorithm that replaces a WRTG containing chain productions with an equivalent WRTG that does not (Mohri, 2009).

Algorithm 4 REPLACE
1: inputs
2:   y ∈ TΣ(X)
3:   WRTG G = (N, Σ, P, n0, M, G) in normal form, with no chain productions
4:   n ∈ N
5: outputs
6:   set Π of pairs (θ, w) where θ is a mapping pos(y) → N and w ∈ R+∞, each pair indicating a successful run on y by productions in G, starting from n, and w is the weight of the run.
7: complexity
8:   O(|P|^size(y))
9: Πlast ← {({(ε, n)}, 1)}
10: for all v ∈ pos(y) such that y(v) ∉ X in prefix order do
11:   Πv ← ∅
12:   for all (θ, w) ∈ Πlast do
13:     if M ≠ ∅ and G ≠ ∅ then
14:       G ← PRODUCE(G, θ(v))
15:     for all (θ(v) --w′--> y(v)(n1, . . . , nk)) ∈ P do
16:       Πv ← Πv ∪ {(θ ∪ {(vi, ni), 1 ≤ i ≤ k}, w · w′)}
17:   Πlast ← Πv
18: return Πlast

Algorithm 5 MAKE-EXPLICIT
1: inputs
2:   WRTG G = (N, Σ, P, n0, M, G) in normal form
3: outputs
4:   WRTG G′ = (N′, Σ, P′, n0, M, G), in normal form, such that if M ≠ ∅ and G ≠ ∅, LG′ = L_{M(G)▷}, and otherwise G′ = G.
5: complexity
6:   O(|P′|)
7: G′ ← G
8: Ξ ← {n0} {seen nonterminals}
9: Ψ ← {n0} {pending nonterminals}
10: while Ψ ≠ ∅ do
11:   n ← any element of Ψ
12:   Ψ ← Ψ \ {n}
13:   if M ≠ ∅ and G ≠ ∅ then
14:     G′ ← PRODUCE(G′, n)
15:   for all (n --w--> σ(n1, . . . , nk)) ∈ P′ do
16:     for i = 1 to k do
17:       if ni ∉ Ξ then
18:         Ξ ← Ξ ∪ {ni}
19:         Ψ ← Ψ ∪ {ni}
20: return G′

Figure 2: Forward application through a cascade of tree transducers using an on-the-fly method.
(a) Input WRTG G, initial nonterminal g0:
  g0 --w1--> σ(g0, g1)
  g0 --w2--> α
  g1 --w3--> α
(b) First transducer MA in the cascade, initial state a0:
  a0.σ(x1, x2) --w4--> σ(a0.x1, a1.x2)
  a0.σ(x1, x2) --w5--> ψ(a2.x1, a1.x2)
  a0.α --w6--> α
  a1.α --w7--> α
  a2.α --w8--> ρ
(c) Second transducer MB in the cascade, initial state b0:
  b0.σ(x1, x2) --w9--> σ(b0.x1, b0.x2)
  b0.α --w10--> α
(d) Productions of MA(G)▷ built as a consequence of building the complete MB(MA(G)▷)▷, initial nonterminal g0a0:
  g0a0 --w1·w4--> σ(g0a0, g1a1)
  g0a0 --w1·w5--> ψ(g0a2, g1a1)
  g0a0 --w2·w6--> α
  g1a1 --w3·w7--> α
(e) Complete MB(MA(G)▷)▷, initial nonterminal g0a0b0:
  g0a0b0 --w1·w4·w9--> σ(g0a0b0, g1a1b0)
  g0a0b0 --w2·w6·w10--> α
  g1a1b0 --w3·w7·w10--> α
As an example of stand-in construction, consider the invocation PRODUCE(G1, g0a0), where G1 = ({g0a0}, {σ, ψ, α, ρ}, ∅, g0a0, MA, G), G is in Figure 2a,12 and MA is in 2b. The stand-in WRTG that is output contains the first three of the four productions in Figure 2d.

11 Note further that it allows forward application of class wxLNT, something the embed-compose-project approach did not allow.
12 By convention the initial nonterminal and state are listed first in graphical depictions of WRTGs and WXTTs.

Figure 3: Example rules from transducers used in decoding experiment. j1 and j2 are Japanese words.
(a) Rotation rules:
  rJJ.JJ(x1, x2, x3) → JJ(rDT.x1, rJJ.x2, rVB.x3)
  rVB.VB(x1, x2, x3) → VB(rNNPS.x1, rNN.x3, rVB.x2)
  t."gentle" → "gentle"
(b) Insertion rules:
  iVB.NN(x1, x2) → NN(INS iNN.x1, iNN.x2)
  iVB.NN(x1, x2) → NN(iNN.x1, iNN.x2)
  iVB.NN(x1, x2) → NN(iNN.x1, iNN.x2, INS)
(c) Translation rules:
  t.VB(x1, x2, x3) → X(t.x1, t.x2, t.x3)
  t."gentleman" → j1
  t."gentleman" → EPS
  t.INS → j1
  t.INS → j2

To demonstrate the use of on-the-fly application in a cascade, we next show the effect of PRODUCE when used with the cascade G ◦ MA ◦ MB, where MB is in Figure 2c. Our driving algorithm in this case is Algorithm 5, MAKE-EXPLICIT, which simply generates the full application WRTG using calls to PRODUCE. The input to MAKE-EXPLICIT is G2 = ({g0a0b0}, {σ, α}, ∅, g0a0b0, MB, G1).13 MAKE-EXPLICIT calls PRODUCE(G2, g0a0b0). PRODUCE then seeks to cover b0.σ(x1, x2) --w9--> σ(b0.x1, b0.x2) with productions from G1, which is a stand-in for MA(G)▷. At line 14 of REPLACE, G1 is improved so that it has the appropriate productions. The productions of MA(G)▷ that must be built to form the complete MB(MA(G)▷)▷ are shown in Figure 2d. The complete MB(MA(G)▷)▷ is shown in Figure 2e.
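The Figure 2 walk-through can be reproduced with a deliberately simplified Python sketch. It is not the PRODUCE algorithm of the paper: it assumes non-extended, linear, nondeleting rules whose right side is a single output symbol over state-variable pairs (which is all that Figure 2 needs), so NORM, CHAIN-REM, and REPLACE over deeper left sides are omitted. The class names `ExplicitGrammar` and `LazyApplication`, the data layout, and the numeric values chosen for w1 through w10 are all assumptions made for illustration.

```python
class ExplicitGrammar:
    """A WRTG in normal form: nonterminal -> list of (weight, symbol, child nonterminals)."""
    def __init__(self, prods):
        self.prods = prods

    def productions(self, nonterminal):
        return self.prods.get(nonterminal, [])


class LazyApplication:
    """Stand-in for the forward application of `rules` to `grammar`.

    Productions for a nonterminal (n, q) are built only when requested, then cached.
    `rules` maps (state, input symbol) to a list of
    (weight, output symbol, [(state, 0-based variable index), ...]).
    """
    def __init__(self, grammar, rules):
        self.grammar = grammar  # may itself be a LazyApplication (a cascade)
        self.rules = rules
        self.cache = {}

    def productions(self, nonterminal):
        if nonterminal not in self.cache:
            n, q = nonterminal
            built = []
            for w1, sym, kids in self.grammar.productions(n):
                for w2, out_sym, out_kids in self.rules.get((q, sym), []):
                    children = tuple((kids[i], q2) for q2, i in out_kids)
                    built.append((w1 * w2, out_sym, children))
            self.cache[nonterminal] = built
        return self.cache[nonterminal]


def make_explicit(lazy, start):
    """Worklist traversal that requests productions only for reachable nonterminals."""
    seen, pending, result = {start}, [start], {}
    while pending:
        n = pending.pop()
        result[n] = lazy.productions(n)
        for _, _, children in result[n]:
            for child in children:
                if child not in seen:
                    seen.add(child)
                    pending.append(child)
    return result


# Figure 2 with assumed weights: w1 = w2 = w4 = w5 = 0.5, all other weights 1.0.
G = ExplicitGrammar({
    "g0": [(0.5, "sigma", ("g0", "g1")), (0.5, "alpha", ())],
    "g1": [(1.0, "alpha", ())],
})
MA = {
    ("a0", "sigma"): [(0.5, "sigma", [("a0", 0), ("a1", 1)]),
                      (0.5, "psi", [("a2", 0), ("a1", 1)])],
    ("a0", "alpha"): [(1.0, "alpha", [])],
    ("a1", "alpha"): [(1.0, "alpha", [])],
    ("a2", "alpha"): [(1.0, "rho", [])],
}
MB = {
    ("b0", "sigma"): [(1.0, "sigma", [("b0", 0), ("b0", 1)])],
    ("b0", "alpha"): [(1.0, "alpha", [])],
}

stand_in_A = LazyApplication(G, MA)            # stand-in for MA applied to G
stand_in_B = LazyApplication(stand_in_A, MB)   # stand-in for MB applied to that result
start = (("g0", "a0"), "b0")
full = make_explicit(stand_in_B, start)
```

After the call, `stand_in_A.cache` holds productions only for the nonterminals g0a0 and g1a1. The nonterminal g0a2 is mentioned as a child in the ψ production but its own productions are never requested, because MB has no rule that reads ψ.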
Note that because we used this on-the-fly approach, we were able to avoid building all the productions in MA(G).; in particular we did not build g0a2 --w2·w8--> ρ, while a bucket brigade approach would have built this production. We have also designed an analogous on-the-fly PRODUCE algorithm for backward application on linear WTT.

[13] Note that G2 is the initial stand-in for MB(MA(G).)., since G1 is the initial stand-in for MA(G)..

We have now defined several on-the-fly and bucket brigade algorithms, and also discussed the possibility of embed-compose-project and offline composition strategies to application of cascades of tree transducers. Tables 2a and 2b summarize the available methods of forward and backward application of cascades for recognizability-preserving tree transducer classes.

[Table 2a, Forward application: methods oc, bb, otf against transducer types WST, wxLNT, wLNT]
[Table 2b, Backward application: methods oc, bb, otf against transducer types WST, wxLT, wLT, wxLNT, wLNT]
Table 2: Transducer types and available methods of forward and backward application of a cascade. oc = offline composition, bb = bucket brigade, otf = on the fly.

5 Decoding Experiments

The main purpose of this paper has been to present novel algorithms for performing application. However, it is important to demonstrate these algorithms on real data. We thus demonstrate bucket-brigade and on-the-fly backward application on a typical NLP task cast as a cascade of wLNT. We adapt the Japanese-to-English translation model of Yamada and Knight (2001) by transforming it from an English-tree-to-Japanese-string model to an English-tree-to-Japanese-tree model. The Japanese trees are unlabeled, meaning they have syntactic structure but all nodes are labeled "X". We then cast this modified model as a cascade of LNT tree transducers. Space does not permit a detailed description, but some example rules are in Figure 3. The rotation transducer R, a sample of which is in Figure 3a, has 6,453 rules, the insertion transducer I, Figure 3b, has 8,122 rules, and the translation transducer T, Figure 3c, has 37,311 rules.

We add an English syntax language model L to the cascade of transducers just described to better simulate an actual machine translation decoding task. The language model is cast as an identity WTT and thus fits naturally into the experimental framework. In our experiments we try several different language models to demonstrate varying performance of the application algorithms. The most realistic language model is a PCFG. Each rule captures the probability of a particular sequence of child labels given a parent label. This model has 7,765 rules. To demonstrate more extreme cases of the usefulness of the on-the-fly approach, we build a language model that recognizes exactly the 2,087 trees in the training corpus, each with equal weight. It has 39,455 rules. Finally, to be ultraspecific, we include a form of the "specific" language model just described, but only allow the English counterpart of the particular Japanese sentence being decoded in the language.

The goal in our experiments is to apply a single tree t backward through the cascade L ◦ R ◦ I ◦ T ◦ t and find the 1-best path in the application WRTG. We evaluate the speed of each approach: bucket brigade and on-the-fly. The algorithm we use to obtain the 1-best path is a modification of the k-best algorithm of Pauls and Klein (2009). Our algorithm finds the 1-best path in a WRTG and admits an on-the-fly approach. The results of the experiments are shown in Table 3.
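To make the notion of a 1-best path in a WRTG concrete, the following Python sketch computes the highest-weight derivation tree by repeated relaxation. It is not the modified Pauls and Klein (2009) algorithm used in the experiments and it is not on-the-fly. It assumes that all weights are probabilities in (0, 1] and that the grammar is fully built. The function name `one_best` and the toy grammar are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Production = Tuple[float, str, Tuple[str, ...]]  # (weight, symbol, child nonterminals)
Tree = Tuple[str, tuple]                         # (symbol, subtrees)


def one_best(grammar: Dict[str, List[Production]], start: str) -> Optional[Tuple[float, Tree]]:
    """Return (score, tree) for the best derivation from `start`, or None.

    With weights in (0, 1], a best tree never needs to repeat a nonterminal
    along a path, so len(grammar) relaxation rounds are enough.
    """
    best: Dict[str, Tuple[float, Tree]] = {}
    for _ in range(len(grammar)):
        changed = False
        for nonterminal, productions in grammar.items():
            for weight, symbol, children in productions:
                if any(child not in best for child in children):
                    continue  # some child has no known derivation yet
                score = weight
                for child in children:
                    score *= best[child][0]
                if nonterminal not in best or score > best[nonterminal][0]:
                    subtrees = tuple(best[child][1] for child in children)
                    best[nonterminal] = (score, (symbol, subtrees))
                    changed = True
        if not changed:
            break
    return best.get(start)


# Toy grammar with invented symbols and weights.
TOY = {
    "s": [(0.9, "f", ("a", "b")), (0.1, "c", ())],
    "a": [(0.5, "x", ()), (0.4, "y", ())],
    "b": [(1.0, "z", ())],
}
result = one_best(TOY, "s")
print(result)
```

A best-first search in the style of Knuth's generalization of Dijkstra's algorithm would avoid revisiting productions, and it is the kind of search that can be combined with lazily produced grammars; the relaxation loop here is chosen only for brevity.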
As can be seen, on-the-fly application is generally faster than the bucket brigade, about double the speed per sentence in the traditional experiment that uses an English PCFG language model. The results for the other two language models demonstrate more keenly the potential advantage that an on-the-fly approach provides: the simultaneous incorporation of information from all models allows application to be done more effectively than if each information source is considered in sequence. In the "exact" case, where a very large language model that simply recognizes each of the 2,087 trees in the training corpus is used, the final application is so large that it overwhelms the resources of a 4 GB MacBook Pro, while the on-the-fly approach does not suffer from this problem. The "1-sent" case is presented to demonstrate the ripple effect caused by using on-the-fly. In the other two cases, a very large language model generally overwhelms the timing statistics, regardless of the method being used. But a language model that represents exactly one sentence is very small, and thus the effects of simultaneous inference are readily apparent: the time to retrieve the 1-best sentence is reduced by two orders of magnitude in this experiment.

[Table 3 columns: LM type (pcfg, exact, 1-sent), method (bucket, otf), time/sentence]
Table 3: Timing results to obtain 1-best from application through a weighted tree transducer cascade, using on-the-fly vs. bucket brigade backward application techniques. pcfg = model recognizes any tree licensed by a PCFG built from observed data, exact = model recognizes each of 2,000+ trees with equal weight, 1-sent = model recognizes exactly one tree.

6 Conclusion

We have presented algorithms for forward and backward application of weighted tree transducer cascades, including on-the-fly variants, and demonstrated the benefit of an on-the-fly approach to application.
We note that a more formal approach to application of WTTs is being developed, independent from these efforts, by Fülöp et al. (2010).

Acknowledgments

We are grateful for extensive discussions with Andreas Maletti. We also appreciate the insights and advice of David Chiang, Steve DeNeefe, and others at ISI in the preparation of this work. Jonathan May and Kevin Knight were supported by NSF grants IIS-0428020 and IIS-0904684. Heiko Vogler was supported by DFG VO 1011/5-1.

References

Athanasios Alexandrakis and Symeon Bozapalidis. 1987. Weighted grammars and Kleene's theorem. Information Processing Letters, 24(1):1–4.
Brenda S. Baker. 1979. Composition of top-down and bottom-up tree transductions. Information and Control, 41(2):186–213.
Zoltán Ésik and Werner Kuich. 2003. Formal tree series. Journal of Automata, Languages and Combinatorics, 8(2):219–285.
Zoltán Fülöp and Heiko Vogler. 2009. Weighted tree automata and tree transducers. In Manfred Droste, Werner Kuich, and Heiko Vogler, editors, Handbook of Weighted Automata, chapter 9, pages 313–404. Springer-Verlag.
Zoltán Fülöp, Andreas Maletti, and Heiko Vogler. 2010. Backward and forward application of weighted extended tree transducers. Unpublished manuscript.
Ferenc Gécseg and Magnus Steinby. 1984. Tree Automata. Akadémiai Kiadó, Budapest.
Liang Huang and David Chiang. 2005. Better k-best parsing. In Harry Bunt, Robert Malouf, and Alon Lavie, editors, Proceedings of the Ninth International Workshop on Parsing Technologies (IWPT), pages 53–64, Vancouver, October. Association for Computational Linguistics.
Werner Kuich. 1998. Formal power series over trees. In Symeon Bozapalidis, editor, Proceedings of the 3rd International Conference on Developments in Language Theory (DLT), pages 61–101, Thessaloniki, Greece. Aristotle University of Thessaloniki.
Werner Kuich. 1999. Tree transducers and formal tree series. Acta Cybernetica, 14:135–149.
Andreas Maletti, Jonathan Graehl, Mark Hopkins, and Kevin Knight. 2009. The power of extended top-down tree transducers. SIAM Journal on Computing, 39(2):410–430.
Andreas Maletti. 2006. Compositions of tree series transformations. Theoretical Computer Science, 366:248–271.
Andreas Maletti. 2008. Compositions of extended top-down tree transducers. Information and Computation, 206(9–10):1187–1196.
Andreas Maletti. 2009. Personal Communication.
Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. 2000. The design principles of a weighted finite-state transducer library. Theoretical Computer Science, 231:17–32.
Mehryar Mohri. 1997. Finite-state transducers in language and speech processing. Computational Linguistics, 23(2):269–312.
Mehryar Mohri. 2009. Weighted automata algorithms. In Manfred Droste, Werner Kuich, and Heiko Vogler, editors, Handbook of Weighted Automata, chapter 6, pages 213–254. Springer-Verlag.
Adam Pauls and Dan Klein. 2009. K-best A* parsing. In Keh-Yih Su, Jian Su, Janyce Wiebe, and Haizhou Li, editors, Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 958–966, Suntec, Singapore, August. Association for Computational Linguistics.
Fernando Pereira and Michael Riley. 1997. Speech recognition by composition of weighted finite automata. In Emmanuel Roche and Yves Schabes, editors, Finite-State Language Processing, chapter 15, pages 431–453. MIT Press, Cambridge, MA.
William A. Woods. 1980. Cascaded ATN grammars. American Journal of Computational Linguistics, 6(1):1–12.
Kenji Yamada and Kevin Knight. 2001. A syntax-based statistical translation model. In Proceedings of 39th Annual Meeting of the Association for Computational Linguistics, pages 523–530, Toulouse, France, July. Association for Computational Linguistics.
3 0.74325758 75 acl-2010-Correcting Errors in a Treebank Based on Synchronous Tree Substitution Grammar
Author: Yoshihide Kato ; Shigeki Matsubara
Abstract: This paper proposes a method of correcting annotation errors in a treebank. By using a synchronous grammar, the method transforms parse trees containing annotation errors into the ones whose errors are corrected. The synchronous grammar is automatically induced from the treebank. We report an experimental result of applying our method to the Penn Treebank. The result demonstrates that our method corrects syntactic annotation errors with high precision.
4 0.65021014 46 acl-2010-Bayesian Synchronous Tree-Substitution Grammar Induction and Its Application to Sentence Compression
Author: Elif Yamangil ; Stuart M. Shieber
Abstract: We describe our experiments with training algorithms for tree-to-tree synchronous tree-substitution grammar (STSG) for monolingual translation tasks such as sentence compression and paraphrasing. These translation tasks are characterized by the relative ability to commit to parallel parse trees and availability of word alignments, yet the unavailability of large-scale data, calling for a Bayesian tree-to-tree formalism. We formalize nonparametric Bayesian STSG with epsilon alignment in full generality, and provide a Gibbs sampling algorithm for posterior inference tailored to the task of extractive sentence compression. We achieve improvements against a number of baselines, including expectation maximization and variational Bayes training, illustrating the merits of nonparametric inference over the space of grammars as opposed to sparse parametric inference with a fixed grammar.
5 0.63246685 53 acl-2010-Blocked Inference in Bayesian Tree Substitution Grammars
Author: Trevor Cohn ; Phil Blunsom
Abstract: Learning a tree substitution grammar is very challenging due to derivational ambiguity. Our recent approach used a Bayesian non-parametric model to induce good derivations from treebanked input (Cohn et al., 2009), biasing towards small grammars composed of small generalisable productions. In this paper we present a novel training method for the model using a blocked Metropolis-Hastings sampler in place of the previous method's local Gibbs sampler. The blocked sampler makes considerably larger moves than the local sampler and consequently converges in less time. A core component of the algorithm is a grammar transformation which represents an infinite tree substitution grammar in a finite context free grammar. This enables efficient blocked inference for training and also improves the parsing algorithm. Both algorithms are shown to improve parsing accuracy.
6 0.55114132 67 acl-2010-Computing Weakest Readings
7 0.51022726 169 acl-2010-Learning to Translate with Source and Target Syntax
8 0.46864825 186 acl-2010-Optimal Rank Reduction for Linear Context-Free Rewriting Systems with Fan-Out Two
9 0.41545448 110 acl-2010-Exploring Syntactic Structural Features for Sub-Tree Alignment Using Bilingual Tree Kernels
10 0.41233075 116 acl-2010-Finding Cognate Groups Using Phylogenies
11 0.40418732 94 acl-2010-Edit Tree Distance Alignments for Semantic Role Labelling
12 0.38873497 115 acl-2010-Filtering Syntactic Constraints for Statistical Machine Translation
13 0.36990422 71 acl-2010-Convolution Kernel over Packed Parse Forest
14 0.35938299 155 acl-2010-Kernel Based Discourse Relation Recognition with Temporal Ordering Information
15 0.35399219 97 acl-2010-Efficient Path Counting Transducers for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices
16 0.34178132 117 acl-2010-Fine-Grained Genre Classification Using Structural Learning Algorithms
17 0.32759887 222 acl-2010-SystemT: An Algebraic Approach to Declarative Information Extraction
18 0.32647884 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction
19 0.31930703 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
20 0.3157475 84 acl-2010-Detecting Errors in Automatically-Parsed Dependency Relations
topicId topicWeight
[(4, 0.011), (14, 0.338), (25, 0.062), (33, 0.044), (39, 0.016), (41, 0.03), (42, 0.011), (59, 0.058), (65, 0.018), (73, 0.024), (78, 0.084), (83, 0.048), (84, 0.038), (98, 0.108)]
simIndex simValue paperId paperTitle
1 0.9552139 239 acl-2010-Towards Relational POMDPs for Adaptive Dialogue Management
Author: Pierre Lison
Abstract: Open-ended spoken interactions are typically characterised by both structural complexity and high levels of uncertainty, making dialogue management in such settings a particularly challenging problem. Traditional approaches have focused on providing theoretical accounts for either the uncertainty or the complexity of spoken dialogue, but rarely considered the two issues simultaneously. This paper describes ongoing work on a new approach to dialogue management which attempts to fill this gap. We represent the interaction as a Partially Observable Markov Decision Process (POMDP) over a rich state space incorporating both dialogue, user, and environment models. The tractability of the resulting POMDP can be preserved using a mechanism for dynamically constraining the action space based on prior knowledge over locally relevant dialogue structures. These constraints are encoded in a small set of general rules expressed as a Markov Logic network. The first-order expressivity of Markov Logic enables us to leverage the rich relational structure of the problem and efficiently abstract over large regions of the state and action spaces.
same-paper 2 0.84558994 21 acl-2010-A Tree Transducer Model for Synchronous Tree-Adjoining Grammars
Author: Andreas Maletti
Abstract: A characterization of the expressive power of synchronous tree-adjoining grammars (STAGs) in terms of tree transducers (or equivalently, synchronous tree substitution grammars) is developed. Essentially, a STAG corresponds to an extended tree transducer that uses explicit substitution in both the input and output. This characterization allows the easy integration of STAG into toolkits for extended tree transducers. Moreover, the applicability of the characterization to several representational and algorithmic problems is demonstrated.
3 0.77079231 99 acl-2010-Efficient Third-Order Dependency Parsers
Author: Terry Koo ; Michael Collins
Abstract: We present algorithms for higher-order dependency parsing that are “third-order” in the sense that they can evaluate substructures containing three dependencies, and “efficient” in the sense that they require only O(n4) time. Importantly, our new parsers can utilize both sibling-style and grandchild-style interactions. We evaluate our parsers on the Penn Treebank and Prague Dependency Treebank, achieving unlabeled attachment scores of 93.04% and 87.38%, respectively.
4 0.67758155 62 acl-2010-Combining Orthogonal Monolingual and Multilingual Sources of Evidence for All Words WSD
Author: Weiwei Guo ; Mona Diab
Abstract: Word Sense Disambiguation remains one of the most complex problems facing computational linguists to date. In this paper we present a system that combines evidence from a monolingual WSD system together with that from a multilingual WSD system to yield state of the art performance on standard All-Words data sets. The monolingual system is based on a modification of the graph based state of the art algorithm In-Degree. The multilingual system is an improvement over an All-Words unsupervised approach, SALAAM. SALAAM exploits multilingual evidence as a means of disambiguation. In this paper, we present modifications to both of the original approaches and then their combination. We finally report the highest results obtained to date on the SENSEVAL 2 standard data set using an unsupervised method; we achieve an overall F measure of 64.58 using a voting scheme.
5 0.54759002 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing
Author: Liang Huang ; Kenji Sagae
Abstract: Incremental parsing techniques such as shift-reduce have gained popularity thanks to their efficiency, but there remains a major problem: the search is greedy and only explores a tiny fraction of the whole space (even with beam search) as opposed to dynamic programming. We show that, surprisingly, dynamic programming is in fact possible for many shift-reduce parsers, by merging "equivalent" stacks based on feature values. Empirically, our algorithm yields up to a five-fold speedup over a state-of-the-art shift-reduce dependency parser with no loss in accuracy. Better search also leads to better learning, and our final parser outperforms all previously reported dependency parsers for English and Chinese, yet is much faster.
6 0.53227878 95 acl-2010-Efficient Inference through Cascades of Weighted Tree Transducers
7 0.52710515 214 acl-2010-Sparsity in Dependency Grammar Induction
8 0.52603185 35 acl-2010-Automated Planning for Situated Natural Language Generation
9 0.52416384 202 acl-2010-Reading between the Lines: Learning to Map High-Level Instructions to Commands
10 0.49688995 168 acl-2010-Learning to Follow Navigational Directions
11 0.49059609 67 acl-2010-Computing Weakest Readings
12 0.48718375 190 acl-2010-P10-5005 k2opt.pdf
13 0.48579374 17 acl-2010-A Structured Model for Joint Learning of Argument Roles and Predicate Senses
14 0.48529029 167 acl-2010-Learning to Adapt to Unknown Users: Referring Expression Generation in Spoken Dialogue Systems
15 0.48480868 71 acl-2010-Convolution Kernel over Packed Parse Forest
16 0.47953218 162 acl-2010-Learning Common Grammar from Multilingual Corpus
17 0.47444201 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
18 0.46834815 187 acl-2010-Optimising Information Presentation for Spoken Dialogue Systems
19 0.46606299 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
20 0.46504503 143 acl-2010-Importance of Linguistic Constraints in Statistical Dependency Parsing