emnlp emnlp2011 emnlp2011-52 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Shay B. Cohen ; Carlos Gómez-Rodríguez ; Giorgio Satta
Abstract: We describe a generative model for non-projective dependency parsing based on a simplified version of a transition system that has recently appeared in the literature. We then develop a dynamic programming parsing algorithm for our model, and derive an inside-outside algorithm that can be used for unsupervised learning of non-projective dependency trees.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We describe a generative model for non-projective dependency parsing based on a simplified version of a transition system that has recently appeared in the literature. [sent-6, score-0.661]
2 We then develop a dynamic programming parsing algorithm for our model, and derive an inside-outside algorithm that can be used for unsupervised learning of non-projective dependency trees. [sent-7, score-0.285]
3 Recent work has reduced non-projective parsing to the identification of a maximum spanning tree in a graph (McDonald et al., 2005). [sent-11, score-0.09]
4 Dynamic programming has been successfully used for projective parsing (Huang and Sagae, 2010; Kuhlmann et al., 2011). [sent-19, score-0.177]
5 Dynamic programming algorithms for parsing (also known as chart-based algorithms) allow polynomial space representations of all parse trees for a given input string, even in cases where the size of this set is exponential in the length of the string itself. [sent-21, score-0.272]
6 In this paper we follow Kuhlmann et al. (2011) and present a polynomial dynamic programming algorithm for non-projective transition-based parsing. [sent-24, score-0.2]
7 Our algorithm is coupled with a simplified version of the transition system from Attardi (2006), which has high coverage for the type of non-projective structures that appear in various treebanks. [sent-25, score-0.399]
8 Instead of an additional transition operation which permits swapping of two elements in the stack (Titov et al. [sent-26, score-0.727]
9 , 2009; Nivre, 2009), Attardi’s system allows reduction of elements at non-adjacent positions in the stack. [sent-27, score-0.108]
10 One implication of this, for example, is that one can now approach the problem of unsupervised learning of non-projective dependency structures within the transition-based framework. [sent-29, score-0.089]
11 Dynamic programming algorithms for non-projective parsing have been proposed by Kahane et al. [sent-30, score-0.184]
12 (1998), who introduce a dynamic programming algorithm for inference with non-projective structures of unbounded gap degree. [sent-35, score-0.202]
13 2 Transition-based Dependency Parsing In this section we briefly introduce the basic definitions for transition-based dependency parsing. [sent-41, score-0.089]
14 Throughout this paper we denote the input string as w = a0 · · · an−1, n ≥ 1, where a0 = $ and ai ∈ Σ \ {$} for each i with 1 ≤ i ≤ n − 1. [sent-46, score-0.086]
15 A dependency tree for w is a directed tree Gw = (Vw, Aw), where Vw = {0, . . . , n − 1} is the set of nodes. [sent-47, score-0.167]
16 Furthermore, each arc in Aw encodes a dependency relation between two tokens. [sent-55, score-0.207]
17 We write i → j to denote a directed arc (i, j) ∈ Aw, where node i is the head and node j is the dependent. [sent-56, score-0.122]
18 A configuration is defined relative to some input string w, and is a triple (σ, β, A), where σ and β are disjoint lists called stack and buffer, respectively, and A ⊆ Vw × Vw is a set of arcs. [sent-58, score-0.599]
19 Given an input string w, a parser based on S incrementally processes w from left to right, starting in the initial configuration I(w). [sent-61, score-0.277]
20 At each step, the parser nondeterministically applies one transition, or else it stops if it has reached some terminal configuration. [sent-62, score-0.102]
21 The dependency graph defined by the arc set associated with a terminal configuration is then returned as one possible analysis for w. [sent-63, score-0.397]
22 A computation is a sequence c0, . . . , cm, m ≥ 1, of configurations such that, for every i with 1 ≤ i ≤ m, ci−1 ⊢ti ci for some ti ∈ T. [sent-67, score-0.129]
23 In other words, each configuration in a computation is obtained as the value of the preceding configuration under some transition. [sent-68, score-0.31]
24 A computation is called complete whenever c0 = I(w) for some input string w, and cm ∈ Ct. [sent-69, score-0.259]
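To make these definitions concrete, here is a minimal sketch in Python of a generic transition-system runner. It is illustrative only (the class and function names are ours, not the paper's); transitions are modeled as partial functions that return None where undefined.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Configuration:
    """A configuration (sigma, beta, A): stack, buffer, and arc set."""
    stack: tuple      # topmost element is stack[-1]
    buffer: tuple     # first element is buffer[0]
    arcs: frozenset   # set of (head, dependent) pairs

def computations(initial, transitions, is_terminal):
    """Enumerate complete computations c0, ..., cm by depth-first search.

    Each element of `transitions` is a partial function on configurations,
    returning the successor configuration or None where it is undefined.
    """
    def expand(config, prefix):
        if is_terminal(config):
            yield prefix + [config]
            # a complete computation may in principle still be extendable,
            # so we keep exploring below
        for t in transitions:
            succ = t(config)
            if succ is not None:
                yield from expand(succ, prefix + [config])
    return expand(initial, [])
```

Since every transition of the system studied here either consumes a buffer element or shrinks the stack, this search terminates.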
25 We can view a transition-based dependency parser as a device mapping strings into graphs (dependency trees). [sent-70, score-0.151]
26 Without any restriction on the transition functions in T, these functions might have an infinite domain, and could thus encode even non-recursively enumerable languages. [sent-71, score-0.4]
27 However, in standard practice for natural language parsing, transitions are always specified by some finite means. [sent-72, score-0.48]
28 In particular, the definition of each transition depends on some finite window at the top of the stack and some finite window at the beginning of the buffer in each configuration. [sent-73, score-1.035]
29 In this case, we can view a transition-based dependency parser as a notational variant of a push-down transducer (Hopcroft et al. [sent-74, score-0.225]
30 , 2000), whose computations output sequences that directly encode dependency trees. [sent-75, score-0.248]
31 These transducers are nondeterministic, meaning that several transitions can be applied to some configurations. [sent-76, score-0.372]
32 The transition systems we investigate in this paper follow these principles. [sent-77, score-0.34]
33 We denote the stack with its topmost element to the right and the buffer with its first element to the left. [sent-79, score-0.691]
34 We indicate concatenation in the stack and buffer by a vertical bar. [sent-80, score-0.495]
35 For example, for k ∈ Vw, σ|k denotes some stack with topmost element k, and k|β denotes some buffer with first element k. [sent-81, score-0.537]
36 2.2 A Non-projective Transition System We now describe our transition system for non-projective parsing. [sent-87, score-0.401]
37 While a projective dependency tree satisfies the requirement that, for every arc in the tree, there is a directed path from its head to each of the words between the two endpoints of the arc, a non-projective dependency tree may violate this condition. [sent-88, score-0.455]
38 This idea is demonstrated by Attardi (2006), who proposes a transition system whose individual transitions can deal with non-projective dependencies only to a limited extent, depending on the distance in the stack of the nodes involved in the newly constructed dependency. [sent-90, score-1.05]
39 The author defines this distance as the degree of the transition, with transitions of degree one being able to handle only projective dependencies. [sent-91, score-0.753]
40 This formulation permits parsing a subset of the non-projective trees, where this subset depends on the degree of the transitions. [sent-92, score-0.267]
41 The reported coverage in Attardi (2006) is already very high when the system is restricted to transitions of degree two or three. [sent-93, score-0.596]
42 For instance, on training data for Czech containing 28,934 non-projective relations, 27,181 can be handled by degree two transitions, and 1,668 additional dependencies can be handled by degree three transitions (together 28,849 of the 28,934, or about 99.7%). [sent-94, score-0.382]
43 We now turn to describe our variant of the transition system of Attardi (2006), which is equivalent to the original system restricted to transitions of degree two. [sent-96, score-1.028]
44 It is not difficult to extend our algorithms (§4) to higher degree transitions, but this comes at the expense of higher complexity. [sent-98, score-0.19]
45 Let w = a0 · · · an−1 be an input string over Σ defined as in §2. [sent-100, score-0.131]
46 Table 1: The number of non-projective relations of various degrees for several treebanks (training sets), as reported by the parser of Attardi (2006); the parser did not detect non-projective relations of degree higher than 4. [sent-102, score-0.611]
48 Our transition system for non-projective dependency parsing is S(np) = (C, T(np), I(np), Ct(np)), where C is the same set of configurations defined in §2. [sent-106, score-0.141]
49 The initialization function I(np) maps each string w to the initial configuration ([¢], β0, ∅). [sent-108, score-0.24]
50 The set of terminal configurations contains all configurations of the form ([¢, 0], [], A), for any set of arcs A. [sent-109, score-0.096]
51 The set of transition functions is defined as T(np) = {shb | b ∈ Σ} ∪ {la1, ra1, la2, ra2}, where each transition is specified below. [sent-110, score-0.791]
52 Each of the above transitions is undefined on configurations that do not match the forms specified above. [sent-112, score-0.504]
53 As an example, transition la2 is not defined for a configuration (σ, β, A) with |σ| ≤ 2, and transition shb is not defined for a configuration (σ, k|β, A) with ak ≠ b, or for a configuration (σ, [], A). [sent-113, score-1.389]
54 Transition shb removes the first node from the buffer, in case this node represents symbol b ∈ Σ, and pushes it onto the stack. [sent-114, score-0.425]
55 The remaining four transitions are called reduce transitions. [sent-116, score-0.434]
56 Notice that in the transition system at hand all the reduce transitions decrease the size of the stack by one element. [sent-119, score-1.08]
57 Transition la1 creates a new arc with the topmost node on the stack as the head and the second-topmost node as the dependent, and removes the latter from the stack. [sent-120, score-0.698]
58 Transitions la1 and ra1 have degree one, as already explained. [sent-122, score-0.166]
59 Transition la2 and transition ra2 are very similar to la1 and ra1, respectively, but with the difference that they create a new arc between the topmost node in the stack and a node which is two positions below the topmost node. [sent-124, score-1.2]
60 Hence, these transitions have degree two, and are the key components in parsing of non-projective dependencies. [sent-125, score-0.603]
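Operationally, the five transition types can be sketched as follows, using the Configuration sketch above; `words` maps node indices to their symbols. The direction and the removed element for ra1/ra2 are inferred by symmetry with la1/la2 as described here, so treat those details as our reading rather than the paper's definition.

```python
def sh(b, words):
    """sh_b: if the first buffer node carries symbol b, push it onto the stack."""
    def t(c):
        if c.buffer and words[c.buffer[0]] == b:
            return Configuration(c.stack + (c.buffer[0],), c.buffer[1:], c.arcs)
        return None  # undefined otherwise
    return t

def la(d):
    """la_d: the topmost node becomes head of the node d positions below it,
    which is removed from the stack (the stack shrinks by one)."""
    def t(c):
        if len(c.stack) <= d:
            return None  # e.g. la2 is undefined when |sigma| <= 2
        head, dep = c.stack[-1], c.stack[-1 - d]
        new_stack = c.stack[:-1 - d] + c.stack[-d:]  # drop the dependent
        return Configuration(new_stack, c.buffer, c.arcs | {(head, dep)})
    return t

def ra(d):
    """ra_d: the node d positions below the top becomes head of the topmost
    node, which is removed (direction assumed by symmetry with la_d)."""
    def t(c):
        if len(c.stack) <= d:
            return None
        head, dep = c.stack[-1 - d], c.stack[-1]
        return Configuration(c.stack[:-1], c.buffer, c.arcs | {(head, dep)})
    return t
```

With `words` fixed, T(np) would then be `[sh(b, words) for b in sigma] + [la(1), ra(1), la(2), ra(2)]`.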
61 We turn next to describe the equivalence between our system and the system in Attardi (2006). [sent-126, score-0.12]
62 The transition-based parser presented by Attardi pushes back into the buffer elements that are in the top position of the stack. [sent-127, score-0.337]
63 However, a careful analysis shows that only the first position in the buffer can be affected by this operation, in the sense that elements that are pushed back from the stack are never found in buffer positions other than the first. [sent-128, score-0.803]
64 This means that we can consider the first element of the buffer as an additional stack element, always sitting on the top of the top-most stack symbol. [sent-129, score-0.831]
65 We extend this and define an isomorphism between computations in both systems, such that a computation c0, . . . , cm in the original parser is mapped to a computation mc(c0), . . . , mc(cm) in the variant, with both generating the same dependency graph A. [sent-131, score-0.211]
68 Figure 1: A dependency structure of arbitrary gap degree that can be parsed with Attardi's parser. [sent-138, score-0.352]
69 A relevant property of the set of dependency structures that can be processed by Attardi's parser, even when restricted to transitions of degree two, is that the number of discontinuities present in each of their subtrees, defined as the gap degree by Bodirsky et al. (2005), is unbounded. [sent-140, score-0.936]
70 As mentioned in §1, the computational complexity of the dynamic programming algorithm that will be described in later sections does not depend on the gap degree, contrary to the non-projective dependency chart parsers presented by Gómez-Rodríguez et al. [sent-143, score-0.315]
71 (2009) and by Kuhlmann and Satta (2009), whose running time is exponential in the maximum gap degree allowed by the grammar. [sent-144, score-0.269]
72 3 A Generative Probabilistic Model In this section we introduce a generative probabilistic model based on the transition system of §2.2. [sent-145, score-0.452]
73 In formal language theory, there is a standard way of giving a probabilistic interpretation to a nondeterministic parser whose computations are based on sequences of elementary operations such as transitions. [sent-147, score-0.358]
74 The idea is to define conditional probability distributions over instances of the transition functions, and to ‘combine’ these probabilities to assign probabilities to computations and strings. [sent-148, score-0.499]
75 One difficulty we have to face when dealing with transition systems is that the notion of computation, defined in §2, is relative to a given input string. [sent-149, score-0.385]
76 This is a pitfall for generative modeling, where we are interested in a system whose computations lead to the generation of any string. [sent-152, score-0.242]
77 To overcome this problem, we observe that each computation, defined as a sequence of stacks and buffers (the configurations), can equivalently be expressed as a sequence of stacks and transitions. [sent-153, score-0.155]
78 Let σi be the stack associated with ci, for each i with 0 ≤ i ≤ m. [sent-158, score-0.307]
79 Let also Cσ be the set of all stacks associated with configurations in C. [sent-160, score-0.096]
80 We can make explicit the transitions that have been used in the computation by rewriting γ in the form σ0 ⊢t1 σ1 · · · σm−1 ⊢tm σm. [sent-161, score-0.424]
81 In this way, γ generates a string that is composed of all symbols that are pushed onto the stack by transitions shb, in left-to-right order. [sent-162, score-0.839]
82 p(γ) = ∏i=1..m p(ti | σi−1)   (1). To assign probabilities to complete computations we should further multiply p(γ) by factors ps(σ0) and pe(σm), where ps and pe are start and end probability distributions, respectively, both defined over Cσ. [sent-164, score-0.231]
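As a sanity check of equation (1), the score of a complete computation can be computed directly from its stack/transition sequence. This sketch assumes p, ps, pe are given as functions; all names are illustrative.

```python
import math

def log_prob(stacks, transitions, p, ps, pe):
    """log p(gamma) for a complete computation
    gamma = sigma_0 |-t1 sigma_1 ... |-tm sigma_m, following eq. (1)
    together with the start/end factors ps(sigma_0) and pe(sigma_m)."""
    assert len(stacks) == len(transitions) + 1
    logp = math.log(ps(stacks[0])) + math.log(pe(stacks[-1]))
    for i, t in enumerate(transitions, start=1):
        logp += math.log(p(t, stacks[i - 1]))  # p(t_i | sigma_{i-1})
    return logp
```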
83 As defined in §2.2, all initial configurations are associated with stack [¢]. [sent-166, score-0.096]
84 We also assume that the model is homogeneous, i.e., the probabilities of the transition operations do not depend on the time step. [sent-171, score-0.34]
85 As a second step we observe that, according to the definition of transition system, each t ∈ T has an infinite domain. [sent-172, score-0.34]
86 A commonly adopted solution is to introduce a special function, called the history function and denoted by H, defined over the set Cσ and taking values in some finite set. [sent-173, score-0.225]
87 We then require that p(t | σ) depends on σ only through H(σ). Since H is finitely valued, and since T is a finite set, this condition guarantees that there will only be a finite number of parameters p(t | σ) in our model. [sent-176, score-0.144]
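To make the finiteness argument concrete, here is one illustrative history function: a fixed-size window of lexical forms at the top of the stack. H then has range of size at most (|Σ|+1)^k, so the table of parameters p(t | H(σ)) is finite. The construction is our example, not the paper's H.

```python
def window_history(words, k=3):
    """H(sigma): lexical forms of the k topmost stack elements (assumed to be
    node indices), padded with None; the range of H is finite, of size at
    most (|Sigma| + 1) ** k."""
    def H(stack):
        forms = tuple(words[i] for i in stack[-k:])
        return (None,) * (k - len(forms)) + forms
    return H

def make_p(theta, H):
    """p(t | sigma) = theta[(t, H(sigma))]: finitely many parameters."""
    return lambda t, stack: theta[(t, H(stack))]
```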
88 So far we have presented a general discussion of how to turn a transition-based parser into a generative probabilistic model, and have avoided further specification of the history function. [sent-177, score-0.213]
89 We now turn our attention to the non-projective transition system of §2.2. [sent-178, score-0.401]
90 We start the discussion with a naïve model using a history function defined by a fixed-size window over the topmost portion of the stack. [sent-181, score-0.28]
91 More precisely, each transition is conditioned on the lexical form of the three symbols at the top of the stack σ, indicated as b3, b2, b1 ∈ Σ below, with b1 referring to the topmost symbol. [sent-182, score-0.845]
92 Intuitively, parameter θ^{rd}_{b} denotes the probability that we perform a reduce transition instead of a shift transition, given that we have seen lexical form b at the top of the stack. [sent-195, score-0.43]
93 Similarly, parameter θ^{rd2}_{b2,b1} denotes the probability that we perform a reduce transition of degree 2 (see §2.2) instead of a reduce transition of degree 1, given that we have seen lexical forms b1 and b2 at the top of the stack. [sent-196, score-0.567]
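The naïve model's parameters can be read as a hierarchical choice. Since equations (2)-(4) are garbled in this extract, the exact factorization below is our assumption, consistent only with how θ^{rd}_{b} and θ^{rd2}_{b2,b1} are glossed in the text; the remaining dictionary keys are illustrative placeholders.

```python
def transition_prob(t, b3, b2, b1, theta):
    """An assumed decomposition, not the paper's equations (2)-(4):
    first shift vs. reduce given b1, then, for a reduce, degree 2 vs.
    degree 1 given (b2, b1), then the arc direction (and, for a shift,
    the generated symbol, since the model is generative)."""
    p_red = theta['rd'][b1]                  # theta^{rd}_{b1}
    if t.startswith('sh'):                   # sh_b generates symbol b
        b = t[2:]
        return (1.0 - p_red) * theta['gen'][(b, b1)]
    p_deg2 = theta['rd2'][(b2, b1)]          # theta^{rd2}_{b2,b1}
    if t in ('la2', 'ra2'):
        return p_red * p_deg2 * theta['dir2'][(t, b3, b1)]
    if t in ('la1', 'ra1'):
        return p_red * (1.0 - p_deg2) * theta['dir1'][(t, b2, b1)]
    raise ValueError(t)
```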
95 4 Tabular Parsing We present here a dynamic programming algorithm for simulating the computations of the system from §2–3. [sent-200, score-0.386]
96 Our algorithm constructs a compact representation of the set Γ(w), defined as the set of all possible computations of the model when processing w. [sent-202, score-0.204]
97 In combination with the appropriate semirings, this method can provide, for instance, the highest-probability computation in Γ(w), or else the probability of w, defined as the sum of the probabilities of all computations in Γ(w). [sent-203, score-0.256]
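The semiring abstraction referred to here can be captured in a few lines: the same chart computation yields the best computation under the Viterbi semiring (max, ×) and the probability of w under the inside semiring (+, ×). A generic sketch with illustrative names follows; the item structure of the actual algorithm is not reproduced here.

```python
from typing import Callable, NamedTuple

class Semiring(NamedTuple):
    plus: Callable[[float, float], float]   # combines alternative derivations
    times: Callable[[float, float], float]  # combines sub-derivations
    zero: float
    one: float

viterbi = Semiring(max, lambda a, b: a * b, 0.0, 1.0)                # best computation
inside = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)  # p(w)

def add_derivation(chart, item, parts, weight, sr):
    """Record one way of deriving `item` from already-derived `parts`
    with rule weight `weight`, accumulating in the given semiring."""
    w = weight
    for p in parts:
        w = sr.times(w, chart[p])
    chart[item] = sr.plus(chart.get(item, sr.zero), w)
```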
98 We follow a standard approach in the literature on dynamic programming simulation of stack-based automata (Lang, 1974; Tomita, 1986; Billot and Lang, 1989). [sent-204, score-0.167]
99 Figure 2: Schematic representation of the computations γ associated with item [h1, i, h2h3, j]. [sent-207, score-0.159]
100 The basic idea in this approach is to decompose computations of the parser into smaller parts, group them into equivalence classes, and recombine them to obtain larger parts of computations. [sent-210, score-0.249]