emnlp emnlp2011 emnlp2011-16 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Federico Sangati ; Willem Zuidema
Abstract: We present a novel approach to Data-Oriented Parsing (DOP). Like other DOP models, our parser utilizes syntactic fragments of arbitrary size from a treebank to analyze new sentences, but, crucially, it uses only those which are encountered at least twice. This criterion allows us to work with a relatively small but representative set of fragments, which can be employed as the symbolic backbone of several probabilistic generative models. For parsing we define a transform-backtransform approach that allows us to use standard PCFG technology, making our results easily replicable. According to standard Parseval metrics, our best model is on par with many state-of-the-art parsers, while offering some complementary benefits: a simple generative probability model, and an explicit representation of the larger units of grammar.
Reference: text
sentIndex sentText sentNum sentScore
1 Like other DOP models, our parser utilizes syntactic fragments of arbitrary size from a treebank to analyze new sentences, but, crucially, it uses only those which are encountered at least twice. [sent-4, score-0.565]
2 This criterion allows us to work with a relatively small but representative set of fragments, which can be employed as the symbolic backbone of several probabilistic generative models. [sent-5, score-0.13]
3 1 Introduction Data-oriented Parsing (DOP) is an approach to wide-coverage parsing based on assigning structures to new sentences using fragments of variable size extracted from a treebank. [sent-8, score-0.46]
4 By formalizing the idea of using large fragments of earlier language experience to analyze new sentences, DOP captures an important property of language cognition that has shaped natural language. [sent-20, score-0.417]
5 Later DOP models have used the Goodman transformation (Goodman, 1996, 2003) to obtain a compact representation of all fragments in the treebank (Bod, 2003; Bansal and Klein, 2010). [sent-28, score-0.596]
6 In this paper we present a novel DOP model (Double-DOP) in which we extract a restricted yet representative subset of fragments: those recurring at least twice in the treebank. [sent-30, score-0.105]
7 The explicit representation of the fragments allows us to derive simple [sent-31, score-0.417]
8 ways of estimating probabilistic models on top of the symbolic grammar. [sent-33, score-0.094]
9 The contributions of this paper are summarized as follows: (i) we describe an efficient tree-kernel algorithm which allows us to extract all recurring fragments, reducing the set of potential elementary units from the astronomical 10^48 to around 10^6. [sent-36, score-0.216]
10 (ii) We implement and compare different DOP estimation techniques to induce a probability model (PTSG) on top of the extracted symbolic grammar. [sent-37, score-0.124]
11 (iii) We present a simple transformation of the extracted fragments into CFG-rules that allows us to use off-the-shelf PCFG parsing and inference. [sent-38, score-0.523]
12 In section 2 we describe the symbolic backbone of the grammar formalism that we will use for parsing. [sent-42, score-0.224]
13 In section 3 we illustrate the probabilistic extension of the grammar, including our transformation of PTSGs to PCFGs that allows us to use a standard PCFG parser, and a different transform that allows us to use a standard implementation of the inside-outside algorithm. [sent-43, score-0.132]
14 2 The symbolic backbone The basic idea behind DOP is to allow arbitrarily large fragments from a treebank to be the elementary units of production of the grammar. [sent-45, score-0.807]
15 Fragments can be combined through substitution to obtain the phrase-structure tree of a new sentence. [sent-46, score-0.139]
16 Figure 1 shows an example of a complete syntactic tree obtained by combining three elementary fragments. [sent-47, score-0.16]
17 As in previous work, two fragments fi and fj can be combined (fi ◦ fj) only if the leftmost substitution site X↓ in fi has the same label as the root node of fj; in this case the resulting tree will correspond to fi with fj replacing X. [sent-48, score-0.81]
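The substitution operation just described lends itself to a compact illustration. The following is a minimal sketch, not the paper's implementation: it assumes a nested-tuple tree representation (label, children) in which a childless nonterminal leaf plays the role of a substitution site (the X↓ of the text), and the helper names substitution_sites and combine are illustrative.

```python
# Minimal sketch of fragment combination by left-most substitution, assuming a
# nested-tuple tree representation (label, children): a childless nonterminal
# leaf plays the role of a substitution site. Names are illustrative.

def substitution_sites(tree, is_nonterminal):
    """Frontier nonterminal leaves of a fragment, left to right."""
    label, children = tree
    if not children:
        return [label] if is_nonterminal(label) else []
    sites = []
    for child in children:
        sites.extend(substitution_sites(child, is_nonterminal))
    return sites

def combine(f_i, f_j, is_nonterminal):
    """f_i o f_j: replace the left-most substitution site of f_i with f_j,
    provided its label equals the root label of f_j; otherwise return None."""
    sites = substitution_sites(f_i, is_nonterminal)
    if not sites or sites[0] != f_j[0]:
        return None
    done = [False]  # fill only the first (left-most) site encountered

    def rebuild(node):
        label, children = node
        if not children and is_nonterminal(label) and not done[0]:
            done[0] = True
            return f_j
        return (label, tuple(rebuild(c) for c in children))

    return rebuild(f_i)

# Example: (VP (VBD "wore") NP) with an open NP site, combined with a full NP.
is_nt = lambda lab: lab.isupper()
vp = ("VP", (("VBD", (("wore", ()),)), ("NP", ())))
np = ("NP", (("JJ", (("black", ()),)), ("NN", (("arm", ()),)), ("NNS", (("bands", ()),))))
print(combine(vp, np, is_nt))
```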
18 1 Finding Recurring Fragments The first step to build a DOP model is to define its symbolic grammar, i. [sent-56, score-0.094]
19 In the current work we explicitly extract a subset of fragments from the training treebank. [sent-59, score-0.417]
20 To limit the fragment set size, we use a simple but heretofore unexplored constraint: we extract only those fragments that occur two or more times in the treebank.1 [sent-60, score-0.578]
21 Extracting this particular set of fragments is not trivial, though: a naive approach that filters a complete table of fragments together with their frequencies fails because that set, in a reasonably sized treebank, is astronomically large. [sent-61, score-0.868]
22 The algorithm iterates over every pair of trees in the treebank. 1More precisely, we extract only the largest shared fragments for all pairs of trees in the treebank. [sent-64, score-0.511]
23 All subtrees of these extracted fragments necessarily also occur at least twice, but they are only explicitly represented in our extracted set if they happen to form a largest shared fragment from another pair of trees. [sent-65, score-0.578]
24 Hence, if a large tree occurs twice in the treebank the algorithm will extract from this pair only the full tree as a fragment and not all its (exponentially many) subtrees. [sent-66, score-0.426]
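As a rough illustration of this pairwise criterion (largest shared fragments over all pairs of trees), the brute-force sketch below compares every pair of nodes from every pair of trees; it is not the efficient tree-kernel algorithm of the paper, and all function names are assumptions made for the example.

```python
# Brute-force illustration of extracting fragments shared by pairs of trees.
# Trees are (label, children) tuples; this is NOT the paper's efficient
# tree-kernel algorithm, only a readable sketch of the criterion.

from itertools import combinations

def shared_fragment(node_a, node_b):
    """Largest fragment rooted at this node pair that both trees share, or
    None if the root labels differ. Where the child productions diverge, the
    node becomes a substitution site (a childless nonterminal leaf)."""
    label_a, kids_a = node_a
    label_b, kids_b = node_b
    if label_a != label_b:
        return None
    if tuple(k[0] for k in kids_a) != tuple(k[0] for k in kids_b):
        return (label_a, ())                  # frontier: substitution site
    children = []
    for ka, kb in zip(kids_a, kids_b):
        children.append(ka if not ka[1] else shared_fragment(ka, kb))
    return (label_a, tuple(children))

def internal_nodes(tree):
    yield tree
    for child in tree[1]:
        if child[1]:
            yield from internal_nodes(child)

def recurring_fragments(treebank):
    """Fragments shared by at least one pair of trees (hence occurring twice)."""
    found = set()
    for t1, t2 in combinations(treebank, 2):
        for a in internal_nodes(t1):
            for b in internal_nodes(t2):
                frag = shared_fragment(a, b)
                if frag and frag[1]:          # skip bare substitution sites
                    found.add(frag)
    return found
```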
26 A cell-set containing a single index corresponds to the fragment including the node with that index together with all its children. [sent-109, score-0.193]
27 , used more than 5 million fragments (up to depth 14). [sent-122, score-0.417]
28 3This is after the treebank has been preprocessed. [sent-138, score-0.116]
29 frequent recurring constructions such as from NP to NP or whether S or not, together with infrequent overspecialized fragments like from Houston to NP, while missing large generic constructions such as everything you always wanted to know about NP but were afraid to ask. [sent-140, score-0.689]
30 These large constructions are excluded completely by models that only allow elementary trees up to a certain depth (typically 4 or 5) into the symbolic grammar (Zollmann and Sima’an, 2005; Zuidema, 2007; Borensztajn et al. [sent-141, score-0.448]
31 , 2009), or only elementary trees with exactly one lexical anchor (Sangati and Zuidema, 2009). [sent-142, score-0.149]
32 Figure 3: Distribution of the recurring fragment types according to several features: depth, number of words, and number of substitution sites. [sent-143, score-0.57]
33 Implicit grammars Goodman (1996, 2003) defined a transformation for some versions of DOP to an equivalent PCFG-based model, with the number of rules extracted from each parse tree linear in the size of the trees. [sent-145, score-0.256]
34 This transform, representing larger fragments only implicitly, is used in most recent DOP parsers (e. [sent-146, score-0.453]
35 Moreover, the transformed grammars differ from untransformed DOP grammars in that larger fragments are no longer explicitly represented. [sent-153, score-0.619]
36 Thus, the information that the idiomatic fragment (PP (IN “out”) (PP (IN “of”) (NP (NN “town”)))) occurs 3 times in WSJ sections 2-21 is distributed over 132 rules. [sent-155, score-0.161]
37 In addition, grammars that implicitly encode all fragments found in a treebank are strongly biased to over-represent big constructions: the great majority of the entire set of fragments belongs in fact to the largest tree in the treebank5. [sent-158, score-1.137]
38 In our Double-DOP approach, instead, the number of fragments extracted from each tree varies much less (it ranges between 4 and 1,759). [sent-161, score-0.475]
39 3 The probabilistic model Like CFG grammars, our symbolic model produces extremely many parse trees for a given test sentence. [sent-163, score-0.189]
40 For every nonterminal X in the treebank we have: $\sum_{f \in F_X} p(f) = 1$ (2) where $F_X$ is the set of fragments in our symbolic grammar rooted in X. [sent-165, score-0.627]
41 A derivation d = f1, f2, . . . , fn of t is a sequence of the fragments that through left-most substitution produces t. [sent-169, score-0.545]
42 The probability of a derivation is computed as the product of the probabilities of its fragments. 4Bansal and Klein (2010) address this issue for contiguous constructions by extending the Goodman transform with a ‘Packed Graph Encoding’ for fragments that “bottom out in terminals”. [sent-170, score-0.646]
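For readability, the probability model implied by the surrounding sentences can be restated compactly. This is a reconstruction from context; the equation numbers (2) and (3) follow the in-text references rather than the original layout.

```latex
% Normalization over fragments sharing a root (eq. 2) and the derivation
% probability it induces (referred to in the text as eq. 3).
\begin{align}
\sum_{f \in F_X} p(f) &= 1 && \text{for every root nonterminal } X \tag{2}\\
p(d) &= \prod_{f \in d} p(f) && \text{for a derivation } d = f_1, \dots, f_n \tag{3}
\end{align}
```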
43 5In fact, the number of extracted fragments increases exponentially with the size of the tree. [sent-172, score-0.417]
44 Figure 4: Number of fragments extracted from each tree in sections 2-21 of the WSJ treebank (x-axis: rank of tree from the training set), when considering all-fragments (dotted line) and recurring-fragments (solid line). [sent-173, score-0.533]
45 2 we describe ways of obtaining different probability distributions over the fragments in our grammar. [sent-178, score-0.447]
46 1 Parsing It is possible to define a simple transform of our probabilistic fragment grammar, such that off-the-shelf parsers can be used. [sent-181, score-0.266]
47 In order to perform the PTSG/PCFG conversion, every fragment in our grammar must be mapped to a CFG rule which will keep the same probability as the original fragment. [sent-182, score-0.32]
48 The corresponding rule will have as the left hand side the root of the fragment and as the right hand side its yield, i. [sent-183, score-0.196]
49 It might occur that several fragments are mapped to the same CFG rule6. [sent-186, score-0.417]
50 In order to resolve this problem we need to map each ambiguous fragment to two unique CFG rules chained together. 6In the binarized treebank we have 31,465 fragment types that are ambiguous in this sense. [sent-188, score-0.728]
51 To the first CFG rule in the chain we assign the probability of the fragment, while the second will receive probability 1, so the product gives back the original probability. [sent-190, score-0.095]
52 Such a transformed PCFG will generate the same derivations as the original PTSG grammar with identical probabilities. [sent-192, score-0.171]
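A hedged sketch of the PTSG-to-PCFG transform just described: each fragment becomes a rule from its root to its yield carrying the fragment's probability, and fragments that collide on the same rule are routed through a fresh intermediate node whose second rule has probability 1, so the chain reproduces the original probability. The data structures and names (yield_of, ptsg_to_pcfg, NODE@i counters) are illustrative assumptions, not the paper's code.

```python
# Sketch of the PTSG-to-PCFG conversion with chained rules for ambiguous
# fragments. Fragments are (label, children) tuples as in the earlier sketches.

from collections import defaultdict

def yield_of(frag):
    """Frontier of a fragment, left to right: words and substitution sites."""
    label, children = frag
    if not children:
        return (label,)
    out = ()
    for child in children:
        out += yield_of(child)
    return out

def ptsg_to_pcfg(fragment_probs):
    """fragment_probs: dict fragment -> probability.
    Returns (rules, back_map): PCFG rules (lhs, rhs, prob) and, per fragment,
    the rule chain needed to back-transform derivations."""
    by_signature = defaultdict(list)
    for frag, prob in fragment_probs.items():
        by_signature[(frag[0], yield_of(frag))].append((frag, prob))

    rules, back_map, fresh = [], {}, 0
    for (root, rhs), entries in by_signature.items():
        if len(entries) == 1:                 # unambiguous: one rule is enough
            frag, prob = entries[0]
            rules.append((root, rhs, prob))
            back_map[frag] = [(root, rhs)]
        else:                                 # ambiguous: chain via a fresh node
            for frag, prob in entries:
                node = "NODE@%d" % fresh
                fresh += 1
                rules.append((root, (node,), prob))   # carries the probability
                rules.append((node, rhs, 1.0))        # probability 1
                back_map[frag] = [(root, (node,)), (node, rhs)]
    return rules, back_map
```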
53 Figure 5: Above: example of 2 ambiguous fragments mapping to the same CFG rule VP → VBD DT NN “with” NP. [sent-196, score-0.452]
54 The first fragment occurs 5 times in the training treebank (e. [sent-197, score-0.161]
55 in the sentence was an executive with a manufacturing concern) while the second fragment occurs 4 times (e. [sent-199, score-0.161]
56 Below: the two pairs of CFG rules that are used to map the two fragments to separate CFG derivations. [sent-202, score-0.417]
57 2 Inducing probability distributions Relative Frequency Estimate (RFE) The simplest way to assign probabilities to fragments is to make them proportional to their counts7 in the training set. [sent-204, score-0.447]
58 When enforcing equation 2, that gives the relative frequency estimate. 7We refer to the counts of each fragment as returned by the extraction algorithm in section 2. [sent-205, score-0.161]
59 In particular, it does not yield the maximum likelihood solution, and when used as an estimator for an all-fragments grammar, it is strongly biased since it assigns the great majority of the probability mass to big fragments (Johnson, 2002). [sent-208, score-0.558]
60 As illustrated in figure 4 this bias is much weaker when restricting the set of fragments with our approach. [sent-209, score-0.417]
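The relative frequency estimate discussed above amounts to normalizing fragment counts within each root category so that equation 2 holds. A minimal sketch, with illustrative names and the assumption that fragment[0] is the root label, follows.

```python
# Minimal sketch of the Relative Frequency Estimate: each fragment's
# probability is its count divided by the total count of fragments with the
# same root, which enforces the sum-to-one constraint of equation 2.

from collections import defaultdict

def relative_frequency_estimate(fragment_counts):
    totals = defaultdict(float)
    for frag, count in fragment_counts.items():
        totals[frag[0]] += count
    return {frag: count / totals[frag[0]]
            for frag, count in fragment_counts.items()}

# Two NP-rooted fragments seen 3 and 1 times get probabilities 0.75 and 0.25.
counts = {("NP", (("DT", ()), ("NN", ()))): 3,
          ("NP", (("NNP", ()),)): 1}
print(relative_frequency_estimate(counts))
```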
61 Equal Weights Estimate (EWE) Various other ways of choosing the weights of a DOP grammar have been worked out. [sent-211, score-0.094]
62 Reestimation shifts probability mass between alternative parse trees for a sentence. [sent-217, score-0.158]
63 In contrast, our grammars consist of fragments of various size, and our training set consists of parse trees. [sent-218, score-0.552]
64 Reestimation here shifts probability mass between alternative derivations for a parse tree. [sent-219, score-0.16]
65 In step (b) the fragments in the grammar as well as the original parse trees in the treebank are “flattened” into bracket notation. [sent-221, score-0.773]
66 In step (c) each fragment is transformed into a CFG rule in the transformed meta-grammar, whose right-hand side is constituted by the bracket notation of the fragment. [sent-222, score-0.337]
67 The left-hand side of the rule is constituted by the original root symbol R of the fragment raised to a meta-nonterminal R0. [sent-225, score-0.196]
68 The resulting PCFG generates trees in bracket notation, and we can run an off-the-shelf inside-outside algorithm by presenting it parse trees from the train corpus in bracket notation. In the experiments that we report below we used the RFE from section 3 to generate the initial weights for the grammar. [sent-226, score-0.244]
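A sketch of this flattening trick, under the assumption of the same nested-tuple fragment representation as in the earlier sketches: the fragment's bracket notation becomes the right-hand side of a meta-grammar rule whose left-hand side is the root raised to a meta-nonterminal, with substitution sites left open so derivations can chain. All names are illustrative, not the paper's code.

```python
# Sketch of the flattening step: a fragment becomes a meta-grammar rule
# ROOT' -> bracket notation of the fragment, with substitution sites kept as
# meta-nonterminals (marked with a trailing apostrophe) so a standard
# inside-outside implementation can be trained on bracketed parse trees.

def flatten(tree, is_site):
    """Bracket notation of a fragment; substitution sites stay expandable."""
    label, children = tree
    if not children:
        return [label + "'"] if is_site(label) else [label]
    out = ["(", label]
    for child in children:
        out.extend(flatten(child, is_site))
    out.append(")")
    return out

def meta_rule(fragment, prob, is_site):
    """Meta-grammar rule: ROOT' -> bracketed fragment, with probability prob."""
    return (fragment[0] + "'", tuple(flatten(fragment, is_site)), prob)

# Example: (NP (DT "the") NN) with NN as an open substitution site.
is_site = lambda lab: lab.isupper()
frag = ("NP", (("DT", (("the", ()),)), ("NN", ())))
print(meta_rule(frag, 0.3, is_site))
# ("NP'", ('(', 'NP', '(', 'DT', 'the', ')', "NN'", ')'), 0.3)
```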
69 3 Maximizing Objectives MPD The easiest objective in parsing is to select the most probable derivation (MPD), obtained by maximizing equation 3. [sent-231, score-0.104]
70 MPP A DOP grammar can often generate the same parse tree t through different derivations D(t) = d1, d2, . [sent-232, score-0.249]
71 $P(t) = \sum_{d \in D(t)} p(d) = \sum_{d \in D(t)} \prod_{f \in d} p(f)$ (7) An intuitive objective for a parser is to select, for a given sentence, the parse tree with highest probability according to equation 7, i. [sent-237, score-0.168]
72 However, we can approximate the MPP by deriving a list of k-best derivations, summing up the probabilities of those resulting in the same parse tree, and select the tree with maximum probability. [sent-240, score-0.106]
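The MPP approximation described above can be summarized in a few lines. This sketch assumes the k-best derivations are available as (tree, probability) pairs, which is an assumption about the interface rather than the paper's actual machinery.

```python
# Sketch of the MPP approximation: sum the probabilities of k-best derivations
# that yield the same parse tree and return the tree with the highest
# aggregated probability.

from collections import defaultdict

def approximate_mpp(kbest_derivations):
    tree_prob = defaultdict(float)
    for tree, prob in kbest_derivations:
        tree_prob[tree] += prob      # marginalize over derivations of one tree
    return max(tree_prob.items(), key=lambda kv: kv[1])

# Three derivations, two of which produce the same tree t1: t1 wins (0.07 > 0.05).
kbest = [("t1", 0.04), ("t2", 0.05), ("t1", 0.03)]
print(approximate_mpp(kbest))
```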
73 L/NC can in fact be maximized in: $\hat{t} = \arg\max_t \sum_{lc \in t} P(lc)$ (8) where lc ranges over all labeled constituents in t and P(lc) is the marginalized probability of all the derivation trees in the grammar yielding the sentence under consideration which contain lc. [sent-244, score-0.347]
74 We store in each cell the probability of seeing every label in the grammar yielding the corresponding span, by marginalizing the probabilities of all the parse trees in the obtained k-best derivations that contain that label covering the same span. [sent-251, score-0.333]
75 We then compute the Viterbi-best parse maximizing equation 10. [sent-252, score-0.105]
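A sketch of the first step of this max-constituents computation, assuming k-best parse trees with probabilities are available as (tree, probability) pairs: it collects, for every labeled span, the marginal probability over the k-best list. The final Viterbi search maximizing equation 10 is not shown, and all names are illustrative.

```python
# Collect marginal probabilities of labeled spans from a k-best list of parse
# trees; these chart-cell scores feed the subsequent Viterbi-best search.

from collections import defaultdict

def constituents(tree, start=0):
    """Return (end, spans), where spans lists (label, start, end) for every
    internal node of a (label, children) tree; a word occupies one position."""
    label, children = tree
    if not children:
        return start + 1, []
    end, spans = start, []
    for child in children:
        end, child_spans = constituents(child, end)
        spans.extend(child_spans)
    spans.append((label, start, end))
    return end, spans

def cell_marginals(kbest_trees):
    marginals = defaultdict(float)
    for tree, prob in kbest_trees:
        _, spans = constituents(tree)
        for span in set(spans):      # each labeled span counted once per tree
            marginals[span] += prob
    return marginals
```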
76 Figure 7: The binarized version of the tree in figure 1 (The Free French wore black arm bands), with H=1 and P=1. [sent-260, score-0.16]
77 We apply a left binarization of the training treebank as in Matsuzaki et al. [sent-263, score-0.116]
78 Fragment extraction We extract the symbolic grammar and fragment frequencies from this preprocessed treebank as explained in section 2. [sent-275, score-0.465]
79 In the extracted grammar we have in total 1,029,342 recurring fragments and 17,768 unseen CFG rules. [sent-277, score-0.583]
80 We test several probability distributions over the fragments (section 3. [sent-278, score-0.447]
81 Here we compare the maximizing objectives presented in section 3. [sent-290, score-0.118]
82 Our best results with ML are obtained when removing fragments occurring less than 6 times (apart from CFG-rules) and when stopping at the second iteration. [sent-301, score-0.417]
83 This filtering is done in order to limit the number of big fragments in the grammar. [sent-302, score-0.459]
84 It is well known that IO for DOP tends to assign most of the probability mass to big fragments, quickly overfitting the training data. [sent-303, score-0.105]
85 Figure 9: Performance (on the development set) and size of Double-DOP when considering only fragments whose occurring frequency in the training treebank is above a specific threshold (x-axis). [sent-314, score-0.533]
86 For instance, at the right-hand side of the plot a grammar is evaluated which included only 6754 fragments with a frequency > 100 as well as 39227 PCFG rules. [sent-316, score-0.545]
87 We also investigate how a further restriction on the set of extracted fragments influences the performance of our model. [sent-317, score-0.417]
88 In figure 9 we illustrate the performance of Double-DOP when restricting the grammar to fragments having frequencies greater than 1, 2, . [sent-318, score-0.511]
89 We can notice a rather sharp decrease in performance as the grammar becomes more and more compact. [sent-322, score-0.094]
90 Next, we present some results on various Double-DOP grammars extracted from the same training treebank after refining it using the Berkeley state-splitting model14 (Petrov et al. [sent-323, score-0.237]
91 We observe in figure 10 that our grammar is able to benefit from the state splits for the first four levels of refinement, reaching the maximum score at cycle 4, where we improve over our base model. [sent-326, score-0.094]
92 For the last two data points, the treebank gets too refined, and using the Double-DOP model on top of it no longer improves accuracy. [sent-327, score-0.116]
93 We have also compared our best Double-DOP 14We use the Berkeley grammar labeler following the base settings for the WSJ: trees are right-binarized, H=0, and P=0. [sent-328, score-0.141]
94 5 Conclusions We have described Double-DOP, a novel DOP approach for parsing, which uses all constructions recurring at least twice in a treebank. [sent-338, score-0.116]
95 This methodology is driven by the linguistic intuition that constructions included in the grammar should prove to be reusable in a representative corpus. [sent-339, score-0.177]
96 The extracted set of fragments is significantly smaller than in previous approaches. [sent-340, score-0.417]
97 Moreover constructions are explicitly represented, which makes them potentially good candidates as semantic or translation units to be used in other applications. [sent-341, score-0.125]
98 In this paper we have addressed all three obstacles: our efficient algorithm for identifying the recurrent fragments in a treebank runs in polynomial time. [sent-347, score-0.533]
99 What is the minimal set of fragments that achieves maximal parse accuracy? [sent-404, score-0.465]
100 Statistical parsing with a context-free grammar and word statistics. [sent-422, score-0.137]
wordName wordTfidf (topN-words)
[('dop', 0.476), ('fragments', 0.417), ('bod', 0.17), ('fragment', 0.161), ('mcp', 0.151), ('sima', 0.144), ('rfe', 0.144), ('np', 0.131), ('pcfg', 0.126), ('cfg', 0.122), ('goodman', 0.118), ('mpp', 0.117), ('treebank', 0.116), ('rens', 0.115), ('elementary', 0.102), ('ptsg', 0.101), ('ewe', 0.101), ('khalil', 0.101), ('grammar', 0.094), ('symbolic', 0.094), ('bansal', 0.092), ('grammars', 0.087), ('vbd', 0.085), ('reestimation', 0.084), ('constructions', 0.083), ('substitution', 0.081), ('zuidema', 0.078), ('berkeley', 0.074), ('willem', 0.072), ('recurring', 0.072), ('vp', 0.071), ('transform', 0.069), ('sangati', 0.065), ('lc', 0.064), ('transformation', 0.063), ('objectives', 0.061), ('petrov', 0.06), ('klein', 0.058), ('tree', 0.058), ('mrs', 0.058), ('tl', 0.057), ('wsj', 0.057), ('maximizing', 0.057), ('nn', 0.056), ('nnp', 0.054), ('bracket', 0.051), ('borensztajn', 0.05), ('remko', 0.05), ('derivations', 0.049), ('parse', 0.048), ('trees', 0.047), ('derivation', 0.047), ('pcfgs', 0.045), ('sites', 0.045), ('parsing', 0.043), ('fx', 0.043), ('fj', 0.043), ('units', 0.042), ('big', 0.042), ('dt', 0.041), ('ml', 0.041), ('arm', 0.039), ('ot', 0.038), ('yielding', 0.037), ('morristown', 0.037), ('estimator', 0.036), ('backbone', 0.036), ('parsers', 0.036), ('io', 0.035), ('rule', 0.035), ('fr', 0.034), ('binarized', 0.034), ('argtmaxxp', 0.034), ('astronomically', 0.034), ('dataoriented', 0.034), ('fir', 0.034), ('mpd', 0.034), ('overspecialized', 0.034), ('righthand', 0.034), ('statesplitting', 0.034), ('twice', 0.033), ('mass', 0.033), ('node', 0.032), ('parser', 0.032), ('fi', 0.031), ('probability', 0.03), ('software', 0.03), ('kernels', 0.029), ('adaptor', 0.029), ('developments', 0.029), ('donnell', 0.029), ('lari', 0.029), ('ptsgs', 0.029), ('scha', 0.029), ('wore', 0.029), ('yamangil', 0.029), ('constituents', 0.028), ('transformed', 0.028), ('depth', 0.028), ('cell', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999875 16 emnlp-2011-Accurate Parsing with Compact Tree-Substitution Grammars: Double-DOP
Author: Federico Sangati ; Willem Zuidema
Abstract: We present a novel approach to Data-Oriented Parsing (DOP). Like other DOP models, our parser utilizes syntactic fragments of arbitrary size from a treebank to analyze new sentences, but, crucially, it uses only those which are encountered at least twice. This criterion allows us to work with a relatively small but representative set of fragments, which can be employed as the symbolic backbone of several probabilistic generative models. For parsing we define a transform-backtransform approach that allows us to use standard PCFG technology, making our results easily replicable. According to standard Parseval metrics, our best model is on par with many state-ofthe-art parsers, while offering some complementary benefits: a simple generative probability model, and an explicit representation of the larger units of grammar.
Author: Spence Green ; Marie-Catherine de Marneffe ; John Bauer ; Christopher D. Manning
Abstract: Multiword expressions (MWE), a known nuisance for both linguistics and NLP, blur the lines between syntax and semantics. Previous work on MWE identification has relied primarily on surface statistics, which perform poorly for longer MWEs and cannot model discontinuous expressions. To address these problems, we show that even the simplest parsing models can effectively identify MWEs of arbitrary length, and that Tree Substitution Grammars achieve the best results. Our experiments show a 36.4% F1 absolute improvement for French over an n-gram surface statistics baseline, currently the predominant method for MWE identification. Our models are useful for several NLP tasks in which MWE pre-grouping has improved accuracy. 1
3 0.10125594 115 emnlp-2011-Relaxed Cross-lingual Projection of Constituent Syntax
Author: Wenbin Jiang ; Qun Liu ; Yajuan Lv
Abstract: We propose a relaxed correspondence assumption for cross-lingual projection of constituent syntax, which allows a supposed constituent of the target sentence to correspond to an unrestricted treelet in the source parse. Such a relaxed assumption fundamentally tolerates the syntactic non-isomorphism between languages, and enables us to learn the target-language-specific syntactic idiosyncrasy rather than a strained grammar directly projected from the source language syntax. Based on this assumption, a novel constituency projection method is also proposed in order to induce a projected constituent treebank from the source-parsed bilingual corpus. Experiments show that, the parser trained on the projected treebank dramatically outperforms previous projected and unsupervised parsers.
4 0.098850608 15 emnlp-2011-A novel dependency-to-string model for statistical machine translation
Author: Jun Xie ; Haitao Mi ; Qun Liu
Abstract: Dependency structure, as a first step towards semantics, is believed to be helpful to improve translation quality. However, previous works on dependency structure based models typically resort to insertion operations to complete translations, which make it difficult to specify ordering information in translation rules. In our model of this paper, we handle this problem by directly specifying the ordering information in head-dependents rules which represent the source side as head-dependents relations and the target side as strings. The head-dependents rules require only substitution operation, thus our model requires no heuristics or separate ordering models of the previous works to control the word order of translations. Large-scale experiments show that our model performs well on long distance reordering, and outperforms the state- of-the-art constituency-to-string model (+1.47 BLEU on average) and hierarchical phrasebased model (+0.46 BLEU on average) on two Chinese-English NIST test sets without resort to phrases or parse forest. For the first time, a source dependency structure based model catches up with and surpasses the state-of-theart translation models.
5 0.098279692 111 emnlp-2011-Reducing Grounded Learning Tasks To Grammatical Inference
Author: Benjamin Borschinger ; Bevan K. Jones ; Mark Johnson
Abstract: It is often assumed that ‘grounded’ learning tasks are beyond the scope of grammatical inference techniques. In this paper, we show that the grounded task of learning a semantic parser from ambiguous training data as discussed in Kim and Mooney (2010) can be reduced to a Probabilistic Context-Free Grammar learning task in a way that gives state of the art results. We further show that additionally letting our model learn the language’s canonical word order improves its performance and leads to the highest semantic parsing f-scores previously reported in the literature.1
6 0.098141626 141 emnlp-2011-Unsupervised Dependency Parsing without Gold Part-of-Speech Tags
7 0.09036389 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax
8 0.088700138 31 emnlp-2011-Computation of Infix Probabilities for Probabilistic Context-Free Grammars
9 0.085957564 127 emnlp-2011-Structured Lexical Similarity via Convolution Kernels on Dependency Trees
10 0.084774606 54 emnlp-2011-Exploiting Parse Structures for Native Language Identification
11 0.082368046 10 emnlp-2011-A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions
12 0.077684574 58 emnlp-2011-Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training
13 0.077066623 146 emnlp-2011-Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance
14 0.076259673 103 emnlp-2011-Parser Evaluation over Local and Non-Local Deep Dependencies in a Large Corpus
15 0.075772002 83 emnlp-2011-Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation
16 0.075125791 50 emnlp-2011-Evaluating Dependency Parsing: Robust and Heuristics-Free Cross-Annotation Evaluation
17 0.074252792 4 emnlp-2011-A Fast, Accurate, Non-Projective, Semantically-Enriched Parser
18 0.070823282 60 emnlp-2011-Feature-Rich Language-Independent Syntax-Based Alignment for Statistical Machine Translation
19 0.069289051 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
20 0.066684827 125 emnlp-2011-Statistical Machine Translation with Local Language Models
topicId topicWeight
[(0, 0.213), (1, 0.08), (2, -0.025), (3, 0.088), (4, -0.019), (5, 0.025), (6, -0.172), (7, 0.037), (8, 0.138), (9, -0.068), (10, 0.035), (11, 0.069), (12, -0.021), (13, 0.048), (14, 0.077), (15, -0.121), (16, 0.037), (17, 0.057), (18, -0.139), (19, 0.054), (20, -0.003), (21, -0.013), (22, 0.017), (23, -0.005), (24, -0.049), (25, 0.131), (26, -0.082), (27, 0.121), (28, 0.058), (29, 0.102), (30, -0.099), (31, 0.061), (32, -0.019), (33, -0.051), (34, -0.168), (35, 0.046), (36, 0.064), (37, -0.028), (38, -0.07), (39, -0.093), (40, -0.087), (41, -0.146), (42, -0.049), (43, 0.032), (44, 0.107), (45, 0.039), (46, 0.091), (47, -0.117), (48, 0.012), (49, -0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.94340056 16 emnlp-2011-Accurate Parsing with Compact Tree-Substitution Grammars: Double-DOP
Author: Federico Sangati ; Willem Zuidema
Abstract: We present a novel approach to Data-Oriented Parsing (DOP). Like other DOP models, our parser utilizes syntactic fragments of arbitrary size from a treebank to analyze new sentences, but, crucially, it uses only those which are encountered at least twice. This criterion allows us to work with a relatively small but representative set of fragments, which can be employed as the symbolic backbone of several probabilistic generative models. For parsing we define a transform-backtransform approach that allows us to use standard PCFG technology, making our results easily replicable. According to standard Parseval metrics, our best model is on par with many state-ofthe-art parsers, while offering some complementary benefits: a simple generative probability model, and an explicit representation of the larger units of grammar.
2 0.60791922 111 emnlp-2011-Reducing Grounded Learning Tasks To Grammatical Inference
Author: Benjamin Borschinger ; Bevan K. Jones ; Mark Johnson
Abstract: It is often assumed that ‘grounded’ learning tasks are beyond the scope of grammatical inference techniques. In this paper, we show that the grounded task of learning a semantic parser from ambiguous training data as discussed in Kim and Mooney (2010) can be reduced to a Probabilistic Context-Free Grammar learning task in a way that gives state of the art results. We further show that additionally letting our model learn the language’s canonical word order improves its performance and leads to the highest semantic parsing f-scores previously reported in the literature.1
3 0.56550062 31 emnlp-2011-Computation of Infix Probabilities for Probabilistic Context-Free Grammars
Author: Mark-Jan Nederhof ; Giorgio Satta
Abstract: The notion of infix probability has been introduced in the literature as a generalization of the notion of prefix (or initial substring) probability, motivated by applications in speech recognition and word error correction. For the case where a probabilistic context-free grammar is used as language model, methods for the computation of infix probabilities have been presented in the literature, based on various simplifying assumptions. Here we present a solution that applies to the problem in its full generality.
4 0.48347571 54 emnlp-2011-Exploiting Parse Structures for Native Language Identification
Author: Sze-Meng Jojo Wong ; Mark Dras
Abstract: Attempts to profile authors according to their characteristics extracted from textual data, including native language, have drawn attention in recent years, via various machine learning approaches utilising mostly lexical features. Drawing on the idea of contrastive analysis, which postulates that syntactic errors in a text are to some extent influenced by the native language of an author, this paper explores the usefulness of syntactic features for native language identification. We take two types of parse substructure as features— horizontal slices of trees, and the more general feature schemas from discriminative parse reranking—and show that using this kind of syntactic feature results in an accuracy score in classification of seven native languages of around 80%, an error reduction of more than 30%.
5 0.46560907 115 emnlp-2011-Relaxed Cross-lingual Projection of Constituent Syntax
Author: Wenbin Jiang ; Qun Liu ; Yajuan Lv
Abstract: We propose a relaxed correspondence assumption for cross-lingual projection of constituent syntax, which allows a supposed constituent of the target sentence to correspond to an unrestricted treelet in the source parse. Such a relaxed assumption fundamentally tolerates the syntactic non-isomorphism between languages, and enables us to learn the target-language-specific syntactic idiosyncrasy rather than a strained grammar directly projected from the source language syntax. Based on this assumption, a novel constituency projection method is also proposed in order to induce a projected constituent treebank from the source-parsed bilingual corpus. Experiments show that, the parser trained on the projected treebank dramatically outperforms previous projected and unsupervised parsers.
6 0.45653331 60 emnlp-2011-Feature-Rich Language-Independent Syntax-Based Alignment for Statistical Machine Translation
8 0.41851526 8 emnlp-2011-A Model of Discourse Predictions in Human Sentence Processing
9 0.41423237 47 emnlp-2011-Efficient retrieval of tree translation examples for Syntax-Based Machine Translation
10 0.40872717 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax
11 0.38325351 10 emnlp-2011-A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions
12 0.36705461 146 emnlp-2011-Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance
13 0.36259735 103 emnlp-2011-Parser Evaluation over Local and Non-Local Deep Dependencies in a Large Corpus
14 0.35656482 131 emnlp-2011-Syntactic Decision Tree LMs: Random Selection or Intelligent Design?
15 0.34468758 141 emnlp-2011-Unsupervised Dependency Parsing without Gold Part-of-Speech Tags
16 0.34404337 127 emnlp-2011-Structured Lexical Similarity via Convolution Kernels on Dependency Trees
17 0.31461972 74 emnlp-2011-Inducing Sentence Structure from Parallel Corpora for Reordering
18 0.30492368 79 emnlp-2011-Lateen EM: Unsupervised Training with Multiple Objectives, Applied to Dependency Grammar Induction
19 0.30392244 50 emnlp-2011-Evaluating Dependency Parsing: Robust and Heuristics-Free Cross-Annotation Evaluation
20 0.2980088 58 emnlp-2011-Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training
topicId topicWeight
[(23, 0.067), (36, 0.029), (37, 0.029), (45, 0.053), (53, 0.031), (54, 0.022), (57, 0.014), (62, 0.021), (64, 0.032), (66, 0.06), (69, 0.027), (79, 0.061), (82, 0.022), (87, 0.013), (90, 0.378), (96, 0.04), (98, 0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.84840393 16 emnlp-2011-Accurate Parsing with Compact Tree-Substitution Grammars: Double-DOP
Author: Federico Sangati ; Willem Zuidema
Abstract: We present a novel approach to Data-Oriented Parsing (DOP). Like other DOP models, our parser utilizes syntactic fragments of arbitrary size from a treebank to analyze new sentences, but, crucially, it uses only those which are encountered at least twice. This criterion allows us to work with a relatively small but representative set of fragments, which can be employed as the symbolic backbone of several probabilistic generative models. For parsing we define a transform-backtransform approach that allows us to use standard PCFG technology, making our results easily replicable. According to standard Parseval metrics, our best model is on par with many state-ofthe-art parsers, while offering some complementary benefits: a simple generative probability model, and an explicit representation of the larger units of grammar.
2 0.81630635 50 emnlp-2011-Evaluating Dependency Parsing: Robust and Heuristics-Free Cross-Annotation Evaluation
Author: Reut Tsarfaty ; Joakim Nivre ; Evelina Andersson
Abstract: unkown-abstract
3 0.42310375 127 emnlp-2011-Structured Lexical Similarity via Convolution Kernels on Dependency Trees
Author: Danilo Croce ; Alessandro Moschitti ; Roberto Basili
Abstract: Alessandro Moschitti DISI University of Trento 38123 Povo (TN), Italy mo s chitt i di s i @ .unit n . it Roberto Basili DII University of Tor Vergata 00133 Roma, Italy bas i i info .uni roma2 . it l@ over semantic networks, e.g. (Cowie et al., 1992; Wu and Palmer, 1994; Resnik, 1995; Jiang and Conrath, A central topic in natural language processing is the design of lexical and syntactic fea- tures suitable for the target application. In this paper, we study convolution dependency tree kernels for automatic engineering of syntactic and semantic patterns exploiting lexical similarities. We define efficient and powerful kernels for measuring the similarity between dependency structures, whose surface forms of the lexical nodes are in part or completely different. The experiments with such kernels for question classification show an unprecedented results, e.g. 41% of error reduction of the former state-of-the-art. Additionally, semantic role classification confirms the benefit of semantic smoothing for dependency kernels.
Author: Spence Green ; Marie-Catherine de Marneffe ; John Bauer ; Christopher D. Manning
Abstract: Multiword expressions (MWE), a known nuisance for both linguistics and NLP, blur the lines between syntax and semantics. Previous work on MWE identification has relied primarily on surface statistics, which perform poorly for longer MWEs and cannot model discontinuous expressions. To address these problems, we show that even the simplest parsing models can effectively identify MWEs of arbitrary length, and that Tree Substitution Grammars achieve the best results. Our experiments show a 36.4% F1 absolute improvement for French over an n-gram surface statistics baseline, currently the predominant method for MWE identification. Our models are useful for several NLP tasks in which MWE pre-grouping has improved accuracy. 1
5 0.40634537 111 emnlp-2011-Reducing Grounded Learning Tasks To Grammatical Inference
Author: Benjamin Borschinger ; Bevan K. Jones ; Mark Johnson
Abstract: It is often assumed that ‘grounded’ learning tasks are beyond the scope of grammatical inference techniques. In this paper, we show that the grounded task of learning a semantic parser from ambiguous training data as discussed in Kim and Mooney (2010) can be reduced to a Probabilistic Context-Free Grammar learning task in a way that gives state of the art results. We further show that additionally letting our model learn the language’s canonical word order improves its performance and leads to the highest semantic parsing f-scores previously reported in the literature.1
6 0.3950378 47 emnlp-2011-Efficient retrieval of tree translation examples for Syntax-Based Machine Translation
7 0.39057344 85 emnlp-2011-Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming
8 0.39049706 15 emnlp-2011-A novel dependency-to-string model for statistical machine translation
9 0.38649756 53 emnlp-2011-Experimental Support for a Categorical Compositional Distributional Model of Meaning
10 0.37048352 54 emnlp-2011-Exploiting Parse Structures for Native Language Identification
11 0.36976004 134 emnlp-2011-Third-order Variational Reranking on Packed-Shared Dependency Forests
12 0.36556754 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
13 0.36469856 31 emnlp-2011-Computation of Infix Probabilities for Probabilistic Context-Free Grammars
14 0.36374685 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features
15 0.36313909 66 emnlp-2011-Hierarchical Phrase-based Translation Representations
16 0.36001027 107 emnlp-2011-Probabilistic models of similarity in syntactic context
17 0.35663348 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax
18 0.35631257 147 emnlp-2011-Using Syntactic and Semantic Structural Kernels for Classifying Definition Questions in Jeopardy!
19 0.35470444 83 emnlp-2011-Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation
20 0.35442856 79 emnlp-2011-Lateen EM: Unsupervised Training with Multiple Objectives, Applied to Dependency Grammar Induction