acl acl2010 acl2010-46 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Elif Yamangil ; Stuart M. Shieber
Abstract: We describe our experiments with training algorithms for tree-to-tree synchronous tree-substitution grammar (STSG) for monolingual translation tasks such as sentence compression and paraphrasing. These translation tasks are characterized by the relative ability to commit to parallel parse trees and availability of word alignments, yet the unavailability of large-scale data, calling for a Bayesian tree-to-tree formalism. We formalize nonparametric Bayesian STSG with epsilon alignment in full generality, and provide a Gibbs sampling algorithm for posterior inference tailored to the task of extractive sentence compression. We achieve improvements against a number of baselines, including expectation maximization and variational Bayes training, illustrating the merits of nonparametric inference over the space of grammars as opposed to sparse parametric inference with a fixed grammar.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We describe our experiments with training algorithms for tree-to-tree synchronous tree-substitution grammar (STSG) for monolingual translation tasks such as sentence compression and paraphrasing. [sent-4, score-0.509]
2 These translation tasks are characterized by the relative ability to commit to parallel parse trees and availability of word alignments, yet the unavailability of large-scale data, calling for a Bayesian tree-to-tree formalism. [sent-5, score-0.122]
3 We formalize nonparametric Bayesian STSG with epsilon alignment in full generality, and provide a Gibbs sampling algorithm for posterior inference tailored to the task of extractive sentence compression. [sent-6, score-0.577]
4 We achieve improvements against a number of baselines, including expectation maximization and variational Bayes training, illustrating the merits of nonparametric inference over the space of grammars as opposed to sparse parametric inference with a fixed grammar. [sent-7, score-0.564]
5 1 Introduction Given an aligned corpus of tree pairs, we might want to learn a mapping between the paired trees. [sent-8, score-0.18]
6 Such induction of tree mappings has application in a variety of natural-language-processing tasks including machine translation, paraphrase, and sentence compression. [sent-9, score-0.232]
7 The induced tree mappings can be expressed by synchronous grammars. [sent-10, score-0.32]
8 Where the tree pairs are isomorphic, synchronous context-free grammars (SCFG) may suffice, but in general, non-isomorphism can make the problem of rule extraction difficult (Galley and McKeown, 2007). [sent-11, score-0.549]
9 More expressive formalisms such as synchronous tree-substitution (Eisner, 2003) or tree-adjoining grammars may better capture the pairings. [sent-12, score-0.299]
10 In this work, we explore techniques for inducing synchronous tree-substitution grammars (STSG) using as a testbed application extractive sentence compression. [sent-13, score-0.542]
11 These elementary tree pairs serve as the rules of the extracted grammar. [sent-15, score-0.671]
12 For SCFG, segmentation is trivial: each parent with its immediate children is an elementary tree. But the formalism then restricts us to deriving isomorphic tree pairs. [sent-16, score-0.905]
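For illustration, here is a minimal sketch (Python, with a toy nested-tuple tree encoding; not from the paper) of that trivial segmentation, reading off each parent together with its immediate children as a depth-one elementary tree:

```python
def depth_one_rules(tree):
    """Read off the trivial (SCFG-style) segmentation of a parse tree:
    each parent with its immediate children forms one elementary tree."""
    label, children = tree
    if not children:                      # terminal leaf
        return []
    rules = [(label, tuple(child[0] for child in children))]
    for child in children:
        rules.extend(depth_one_rules(child))
    return rules

# depth_one_rules(("S", [("NP", [("ATM", [])]), ("VP", [("depends", [])])]))
# -> [("S", ("NP", "VP")), ("NP", ("ATM",)), ("VP", ("depends",))]
```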
13 STSG is much more expressive, especially if we allow some elementary trees on the source or target side to be unsynchronized, so that insertions and deletions can be modeled, but the segmentation and alignment problems become nontrivial. [sent-17, score-0.711]
14 Previous approaches to this problem have treated the two steps, grammar extraction and weight estimation, with a variety of methods. [sent-18, score-0.067]
15 One approach is to use word alignments (where these can be reliably estimated, as in our testbed application) to align subtrees and extract rules (Och and Ney, 2004; Galley et al. [sent-19, score-0.288]
16 Once a given set of rules is extracted, weights can be imputed using a discriminative approach to maximize the (joint or conditional) likelihood or the classification margin in the training data (taking or not taking into account the derivational ambiguity). [sent-24, score-0.116]
17 First, EM search over the space of all possible rules is computationally impractical. [sent-29, score-0.116]
18 Second, even if such a search were practical, the method is degenerate, pushing the probability mass towards larger rules in order to better approximate the empirical distribution of the data (Goldwater et al.). [sent-30, score-0.171]
19 Indeed, the optimal grammar would be one in which each tree pair in the training data is its own rule. [sent-33, score-0.213]
20 Therefore, proposals for using EM for this task start with a precomputed subset of rules, and with EM used just to assign weights within this grammar. [sent-34, score-0.035]
21 In summary, previous methods suffer from problems of narrowness of search, having to restrict the space of possible rules, and overfitting in preferring overly specific grammars. [sent-35, score-0.307]
22 We pursue the use of hierarchical probabilistic models incorporating sparse priors to simultaneously solve both the narrowness and overfitting problems. [sent-36, score-0.336]
23 Such models have been used as generative solutions to several other segmentation problems, ranging from word segmentation (Goldwater et al.). [sent-37, score-0.233]
24 Interestingly, sampling-based nonparametric inference further allows the possibility of searching over the infinite space of grammars (and, in machine translation, possible word alignments), thus side-stepping the narrowness problem outlined above as well. [sent-43, score-0.408]
25 In this work, we use an extension of the aforementioned models of generative segmentation for STSG induction, and describe an algorithm for posterior inference under this model that is tailored to the task of extractive sentence compression. [sent-44, score-0.54]
26 This task is characterized by the availability of word alignments, providing a clean testbed for investigating the effects of grammar extraction. [sent-45, score-0.176]
27 We achieve substantial improvements against a number of baselines including EM, support vector machine (SVM) based discriminative training, and variational Bayes (VB). [sent-46, score-0.047]
28 Our results are thus not only encouraging for grammar estimation using sparse priors but also illustrate the merits of nonparametric inference over the space of grammars as opposed to sparse parametric inference with a fixed grammar. [sent-48, score-0.643]
29 In the following, we define the task of extractive sentence compression and the Bayesian STSG model, and the algorithms we use for inference and prediction. [sent-49, score-0.481]
30 We then describe the experiments in extractive sentence compression and present our results in contrast with alternative algorithms. [sent-50, score-0.422]
31 We conclude by giving examples of compression patterns learned by the Bayesian method. [sent-51, score-0.217]
32 2 Sentence compression Sentence compression is the task of summarizing a sentence while retaining most of the informational content and remaining grammatical (Jing, 2000). [sent-52, score-0.485]
33 In extractive sentence compression, which we focus on in this paper, an order-preserving subset of the words in the sentence is selected to form the summary; that is, we summarize by deleting words (Knight and Marcu, 2002). [sent-53, score-0.256]
34 An example sentence pair, which we use as a running example, is the following: • Like FaceLift, much of ATM’s screen performance depends on the underlying application. [sent-54, score-0.099]
35 • Much of ATM’s screen performance depends on the underlying application. [sent-55, score-0.048]
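As a quick check of what “extractive” means here, a minimal sketch (Python; tokenization and case-normalization are simplified) that verifies the compression above is an order-preserving subsequence of the source, i.e., obtainable purely by deleting words:

```python
def is_extractive_compression(source_tokens, target_tokens):
    """True iff target_tokens can be obtained from source_tokens by deleting
    words while preserving their original order."""
    remaining = iter(source_tokens)
    return all(token in remaining for token in target_tokens)

source = "like facelift , much of atm 's screen performance depends on the underlying application .".split()
target = "much of atm 's screen performance depends on the underlying application .".split()
print(is_extractive_compression(source, target))  # True
```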
36 Figure 1: A portion of an STSG derivation of the example sentence and its extractive compression. [sent-56, score-0.306]
37 In supervised sentence compression, the goal is to generalize from a parallel training corpus of sentences (source) and their compressions (target) to unseen sentences in a test set to predict their compressions. [sent-58, score-0.051]
38 3 The STSG Model Synchronous tree-substitution grammar is a formalism for synchronously generating a pair of non-isomorphic source and target trees (Eisner, 2003). [sent-61, score-0.381]
39 We use square bracketed indices to represent the alignment γ of frontier nodes: NP[1] aligns with NP[1], VP[2] aligns with VP[2], and so on. [sent-65, score-0.395]
40 ε-aligned target nodes are used to represent insertions into the target tree. [sent-69, score-0.185]
41 can be used to continue deriving the deleted subtree. [sent-72, score-0.091]
42 See Figure 1 for an example of how an STSG with these rules would operate in synchronously generating our example sentence pair. [sent-73, score-0.284]
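To make the rule representation concrete, here is a minimal sketch (Python; the class and field names are illustrative assumptions, not the paper's notation) of an elementary tree pair with its frontier alignment γ, including the ε-aligned case used for deletions:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TreeNode:
    label: str
    children: list = field(default_factory=list)   # child TreeNodes; empty for leaves
    frontier_index: Optional[int] = None            # square-bracketed index for frontier nonterminals

@dataclass
class STSGRule:
    source: TreeNode                                # source-side elementary tree (root label c_s)
    target: Optional[TreeNode]                      # target-side elementary tree; None when the target side is epsilon
    gamma: dict = field(default_factory=dict)       # frontier alignment: source index -> target index, or "EPS"

# Illustrative deletion rule: a source PP subtree is paired with epsilon, and its
# NP frontier node is epsilon-aligned so the deleted subtree can be derived further.
pp = TreeNode("PP", [TreeNode("IN", [TreeNode("Like")]),
                     TreeNode("NP", frontier_index=1)])
delete_pp = STSGRule(source=pp, target=None, gamma={1: "EPS"})
```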
43 STSG is a convenient choice of formalism for a number of reasons. [sent-74, score-0.07]
44 Second, the ability to have rules deeper than one level provides a principled way of modeling lexicalization, whose importance has been emphasized (Galley and McKeown, 2007; Yamangil and Nelken, 2008). [sent-76, score-0.116]
45 Third, we may have our STSG operate on trees instead of sentences, which allows for efficient parsing algorithms, as well as providing syntactic analyses for our predictions, which is desirable for automatic evaluation purposes. [sent-77, score-0.126]
46 A straightforward extension of the popular EM algorithm for probabilistic context-free grammars (PCFG), the inside-outside algorithm (Lari and Young, 1990), can be used to estimate the rule weights of a given unweighted STSG based on a corpus of parallel parse trees t = t1, . . . , tN. [sent-78, score-0.297]
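For contrast with the Bayesian treatment below, a minimal sketch of the M-step such an EM procedure would perform once inside-outside has produced expected rule counts (the E-step, which requires synchronous parsing, is omitted; names are illustrative):

```python
from collections import defaultdict

def m_step(expected_counts):
    """expected_counts: dict mapping (root_label, rule) -> expected count
    accumulated over all derivations of the training tree pairs.
    Returns rule weights renormalized per root label."""
    totals = defaultdict(float)
    for (root, _rule), count in expected_counts.items():
        totals[root] += count
    return {(root, rule): count / totals[root]
            for (root, rule), count in expected_counts.items()}
```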
47 Figure 2: Gibbs sampling updates. [sent-85, score-0.058]
48 We illustrate a sampler move to align/unalign a source node with a target node (top row in blue), and split/merge a deletion rule via aligning with ε. [sent-86, score-0.228]
49 Similarly, an extension of the Viterbi algorithm is available for finding the maximum probability derivation, useful for predicting the target analysis tN+1,t for a test instance tN+1,s (Eisner, 2003). [sent-88, score-0.076]
50 However, as noted earlier, EM is subject to the narrowness and overfitting problems. [sent-89, score-0.237]
51 3.1 The Bayesian generative process Both of these issues can be addressed by taking a nonparametric Bayesian approach, namely, assuming that the elementary tree pairs are sampled from an independent collection of Dirichlet process (DP) priors. [sent-91, score-0.712]
52 We describe such a process for sampling a corpus of tree pairs t. [sent-92, score-0.257]
53 For all pairs of root labels c = cs/ct that we consider, where up to one of cs or ct can be ε, we sample a distribution Gc over elementary tree pairs rooted in c from a Dirichlet process prior, Gc ∼ DP(αc, P0(· | c)). [sent-93, score-0.224]
54 We then sample a sequence of elementary tree pairs to serve as a derivation for each observed derived tree pair. [sent-98, score-0.842]
55 For n = 1, . . . , N, we sample elementary tree pairs en = en,1, . . . , [sent-102, score-0.635]
56 en,dn in a derivation sequence (where dn is the number of rules used in the derivation), consulting Gc whenever an elementary tree pair with root c is to be sampled. [sent-105, score-0.792]
57 e ∼iid Gc for all e whose root label is c. Given the derivation sequence en, a tree pair tn is determined; that is, p(tn | en) = 1 if the sequence en,1, . . . , en,dn derives tn, and 0 otherwise. (2) [sent-106, score-0.463]
58 The hyperparameters αc can be incorporated into the generative model as random variables; however, we opt to fix these at various constants to investigate different levels of sparsity. [sent-111, score-0.238]
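To make “consulting Gc” concrete, here is a minimal sketch (not the authors' code; function and variable names are assumptions) of the collapsed, Chinese-restaurant-process view of a DP draw: an elementary tree pair rooted in c is reused in proportion to how often it has been generated before, or drawn fresh from the base distribution P0(· | c) with probability proportional to αc.

```python
import random

def draw_elementary_tree(c, counts, alpha_c, sample_from_p0):
    """Draw one elementary tree pair with root label c under a collapsed DP.

    counts:         dict mapping previously generated tree pairs rooted in c to their counts.
    alpha_c:        DP concentration parameter for root label c.
    sample_from_p0: callable returning a fresh tree pair from P0(. | c)."""
    total = sum(counts.values())
    if random.random() < alpha_c / (total + alpha_c):
        return sample_from_p0(c)          # new draw from the base distribution
    r = random.uniform(0.0, total)        # otherwise reuse an old draw, proportional to its count
    for tree_pair, n in counts.items():
        r -= n
        if r <= 0.0:
            return tree_pair
    return tree_pair                      # numerical edge case
```

The rich-get-richer reuse is what biases the model toward compact grammars of frequently reused rules.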
59 For the base distribution P0(· | c) there are a variety of choices; we used the following simple scenario. [sent-112, score-0.121]
60 Synchronous rules For the case where neither cs nor ct is the special symbol ε, [sent-114, score-0.214]
61 the base distribution first generates es and et independently, and then samples an alignment between the frontier nodes. [sent-115, score-0.419]
62 Given a nonterminal, an elementary tree is generated by first making a decision to expand the nonterminal (with probability βc) or to leave it as a frontier node (with probability 1 − βc). [sent-116, score-0.801]
63 If the decision to expand was made, we sample an appropriate rule from a PCFG which we estimate ahead of time from the training corpus. [sent-117, score-0.174]
64 We expand the nonterminal using this rule, and then repeat the same procedure for every generated child that is a nonterminal, until no nonterminal children remain. [sent-118, score-0.328]
65 Finally, we sample an alignment between the frontier nodes uniformly at random out of all possible alignments. [sent-120, score-0.305]
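A sketch of this base-distribution sampler for one side of a rule (Python; the tree encoding, the beta values, and the pretrained PCFG table are illustrative assumptions, not the paper's implementation):

```python
import random

def sample_elementary_tree(root, beta, pcfg_rules, nonterminals):
    """Grow one elementary tree top-down from the nonterminal `root`.

    beta[c]:       probability of expanding nonterminal c (vs. leaving it as a frontier node).
    pcfg_rules[c]: list of (children_labels, probability) pairs estimated ahead of time.
    Returns a nested (label, children) tree and the list of frontier nonterminals."""
    frontier = []

    def grow(label):
        if label not in nonterminals:
            return (label, [])                     # (pre)terminal: stop
        if random.random() >= beta[label]:
            frontier.append(label)                 # left unexpanded: a frontier node
            return (label, [])
        expansions, probs = zip(*pcfg_rules[label])
        children = random.choices(expansions, weights=probs, k=1)[0]
        return (label, [grow(child) for child in children])

    return grow(root), frontier

# Toy grammar; a synchronous rule grows its source and target sides independently this
# way and then aligns their frontier nodes uniformly at random.
pcfg = {"S": [(("NP", "VP"), 1.0)], "NP": [(("DT", "NN"), 1.0)], "VP": [(("VBZ", "NP"), 1.0)]}
tree, frontier = sample_elementary_tree("S", {"S": 0.9, "NP": 0.3, "VP": 0.5}, pcfg, {"S", "NP", "VP"})
```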
66 Deletion rules For the case where ct is the special symbol ε, that is, we have a deletion rule, we need to generate e = es/ε. [sent-122, score-0.063]
67 The base distribution generates es using the same process described for synchronous rules above. [sent-125, score-0.496]
68 Then with probability 1 we align all frontier nodes in es with ε. [sent-126, score-0.339]
69 In essence, this process generates TSG rules, rather than STSG rules, which are used to cover deleted (or inserted) subtrees. [sent-128, score-0.057]
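Under the same assumptions, a deletion rule can be sketched by reusing the sampler above: generate the source side only, pair it with ε, and ε-align every source frontier node so derivation of the deleted subtree can continue:

```python
def sample_deletion_rule(c_s, beta, pcfg_rules, nonterminals):
    """Deletion rule e = e_s / epsilon: grow the source side as above and
    epsilon-align all of its frontier nodes."""
    src, src_frontier = sample_elementary_tree(c_s, beta, pcfg_rules, nonterminals)
    gamma = {i: "EPS" for i, _label in enumerate(src_frontier, start=1)}
    return src, None, gamma    # None stands for the epsilon target side
```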
70 This simple base distribution does nothing to enforce an alignment between the internal nodes of es and et. [sent-129, score-0.312]
71 One may come up with more sophisticated base distributions. [sent-130, score-0.066]
72 However, the main point of the base distribution is to encode a controllable preference towards simpler rules; we therefore make the simplest possible assumption. [sent-131, score-0.156]
73 3.2 Posterior inference via Gibbs sampling Assuming fixed hyperparameters α = {αc} and β = {βc}, our inference problem is to find the posterior distribution of the derivation sequences e = e1, . . . , eN given the corpus of parse tree pairs t. [sent-133, score-0.211]
74 Applying Bayes’ rule, we have p(e | t) ∝ p(t | e) p(e) (3) where p(t | e) is a 0/1 distribution (2) which does not depend on Gc, and p(e) can be obtained by collapsing Gc for all c. [sent-140, score-0.055]
75 The conditional prior of the i-th elementary tree pair given previously generated ones e1, . . . , ei−1 is then given by the standard Dirichlet process (Chinese restaurant process) predictive probability. [sent-146, score-0.502]
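Collapsing Gc yields the standard Dirichlet-process predictive probability, which is what a collapsed Gibbs sampler scores when it re-decides one elementary tree pair given all the others; a minimal sketch (illustrative names, not the paper's exact estimator):

```python
def conditional_prior(e, c, counts_c, total_c, alpha_c, p0):
    """DP predictive probability of elementary tree pair e with root label c,
    given counts over the OTHER currently assigned tree pairs rooted in c.

    counts_c: dict mapping tree pairs to counts (excluding the pair being resampled).
    total_c:  sum of those counts.
    alpha_c:  concentration parameter; p0(e, c) is the base distribution probability."""
    return (counts_c.get(e, 0) + alpha_c * p0(e, c)) / (total_c + alpha_c)
```

Each sampler move of Figure 2 can then choose among its candidate configurations roughly in proportion to products of such terms for the rules each configuration creates.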
wordName wordTfidf (topN-words)
[('stsg', 0.424), ('elementary', 0.356), ('compression', 0.217), ('gc', 0.216), ('synchronous', 0.174), ('narrowness', 0.159), ('frontier', 0.159), ('extractive', 0.154), ('tree', 0.146), ('tn', 0.143), ('facelift', 0.119), ('rules', 0.116), ('dp', 0.105), ('nonparametric', 0.102), ('derivation', 0.101), ('nonterminal', 0.094), ('segmentation', 0.089), ('trees', 0.088), ('rule', 0.088), ('grammars', 0.088), ('em', 0.085), ('es', 0.085), ('np', 0.083), ('atm', 0.079), ('iid', 0.079), ('synchronously', 0.079), ('yamangil', 0.079), ('overfitting', 0.078), ('bayesian', 0.077), ('testbed', 0.075), ('root', 0.073), ('aligns', 0.07), ('formalism', 0.07), ('preferring', 0.07), ('grammar', 0.067), ('base', 0.066), ('isomorphic', 0.064), ('sparse', 0.063), ('deletion', 0.063), ('inference', 0.059), ('sampling', 0.058), ('ct', 0.058), ('ei', 0.058), ('deleted', 0.057), ('galley', 0.056), ('gibbs', 0.056), ('distribution', 0.055), ('generative', 0.055), ('harvard', 0.054), ('merits', 0.054), ('alignment', 0.054), ('posterior', 0.054), ('alignments', 0.054), ('pairs', 0.053), ('scfg', 0.052), ('lexicalization', 0.052), ('parametric', 0.052), ('nodes', 0.052), ('sentence', 0.051), ('eisner', 0.05), ('screen', 0.048), ('denero', 0.048), ('insertions', 0.047), ('variational', 0.047), ('expand', 0.046), ('tailored', 0.045), ('goldwater', 0.045), ('bayes', 0.045), ('vp', 0.045), ('indices', 0.044), ('align', 0.043), ('target', 0.043), ('en', 0.04), ('hyperparameters', 0.04), ('maximization', 0.04), ('generality', 0.04), ('cs', 0.04), ('sample', 0.04), ('mckeown', 0.038), ('operate', 0.038), ('expressive', 0.037), ('priors', 0.036), ('cohn', 0.036), ('simplest', 0.035), ('induction', 0.035), ('minimality', 0.035), ('pnp', 0.035), ('oev', 0.035), ('evidenced', 0.035), ('exemplar', 0.035), ('mitigating', 0.035), ('precomputed', 0.035), ('turner', 0.035), ('characterized', 0.034), ('gildea', 0.034), ('pcfg', 0.034), ('source', 0.034), ('deriving', 0.034), ('aligned', 0.034), ('extension', 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 46 acl-2010-Bayesian Synchronous Tree-Substitution Grammar Induction and Its Application to Sentence Compression
Author: Elif Yamangil ; Stuart M. Shieber
Abstract: We describe our experiments with training algorithms for tree-to-tree synchronous tree-substitution grammar (STSG) for monolingual translation tasks such as sentence compression and paraphrasing. These translation tasks are characterized by the relative ability to commit to parallel parse trees and availability of word alignments, yet the unavailability of large-scale data, calling for a Bayesian tree-to-tree formalism. We formalize nonparametric Bayesian STSG with epsilon alignment in full generality, and provide a Gibbs sampling algorithm for posterior inference tailored to the task of extractive sentence compression. We achieve improvements against a number of baselines, including expectation maximization and variational Bayes training, illustrating the merits of nonparametric inference over the space of grammars as opposed to sparse parametric inference with a fixed grammar.
2 0.41292146 75 acl-2010-Correcting Errors in a Treebank Based on Synchronous Tree Substitution Grammar
Author: Yoshihide Kato ; Shigeki Matsubara
Abstract: This paper proposes a method of correcting annotation errors in a treebank. By using a synchronous grammar, the method transforms parse trees containing annotation errors into the ones whose errors are corrected. The synchronous grammar is automatically induced from the treebank. We report an experimental result of applying our method to the Penn Treebank. The result demonstrates that our method corrects syntactic annotation errors with high precision.
3 0.27086404 53 acl-2010-Blocked Inference in Bayesian Tree Substitution Grammars
Author: Trevor Cohn ; Phil Blunsom
Abstract: Learning a tree substitution grammar is very challenging due to derivational ambiguity. Our recent approach used a Bayesian non-parametric model to induce good derivations from treebanked input (Cohn et al., 2009), biasing towards small grammars composed of small generalisable productions. In this paper we present a novel training method for the model using a blocked Metropolis-Hastings sampler in place of the previous method’s local Gibbs sampler. The blocked sampler makes considerably larger moves than the local sampler and consequently con- verges in less time. A core component of the algorithm is a grammar transformation which represents an infinite tree substitution grammar in a finite context free grammar. This enables efficient blocked inference for training and also improves the parsing algorithm. Both algorithms are shown to improve parsing accuracy.
4 0.25580779 169 acl-2010-Learning to Translate with Source and Target Syntax
Author: David Chiang
Abstract: Statistical translation models that try to capture the recursive structure of language have been widely adopted over the last few years. These models make use of varying amounts of information from linguistic theory: some use none at all, some use information about the grammar of the target language, some use information about the grammar of the source language. But progress has been slower on translation models that are able to learn the relationship between the grammars of both the source and target language. We discuss the reasons why this has been a challenge, review existing attempts to meet this challenge, and show how some old and new ideas can be combined into a sim- ple approach that uses both source and target syntax for significant improvements in translation accuracy.
5 0.18027888 21 acl-2010-A Tree Transducer Model for Synchronous Tree-Adjoining Grammars
Author: Andreas Maletti
Abstract: A characterization of the expressive power of synchronous tree-adjoining grammars (STAGs) in terms of tree transducers (or equivalently, synchronous tree substitution grammars) is developed. Essentially, a STAG corresponds to an extended tree transducer that uses explicit substitution in both the input and output. This characterization allows the easy integration of STAG into toolkits for extended tree transducers. Moreover, the applicability of the characterization to several representational and algorithmic problems is demonstrated.
6 0.14910918 115 acl-2010-Filtering Syntactic Constraints for Statistical Machine Translation
7 0.1369759 133 acl-2010-Hierarchical Search for Word Alignment
8 0.13134341 162 acl-2010-Learning Common Grammar from Multilingual Corpus
9 0.1304376 69 acl-2010-Constituency to Dependency Translation with Forests
10 0.12877254 110 acl-2010-Exploring Syntactic Structural Features for Sub-Tree Alignment Using Bilingual Tree Kernels
11 0.12239082 39 acl-2010-Automatic Generation of Story Highlights
12 0.12171952 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
13 0.11155129 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction
14 0.10668432 191 acl-2010-PCFGs, Topic Models, Adaptor Grammars and Learning Topical Collocations and the Structure of Proper Names
15 0.097710416 255 acl-2010-Viterbi Training for PCFGs: Hardness Results and Competitiveness of Uniform Initialization
16 0.093054652 236 acl-2010-Top-Down K-Best A* Parsing
17 0.092486389 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out
18 0.091228314 24 acl-2010-Active Learning-Based Elicitation for Semi-Supervised Word Alignment
19 0.090816624 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation
20 0.090399094 87 acl-2010-Discriminative Modeling of Extraction Sets for Machine Translation
topicId topicWeight
[(0, -0.247), (1, -0.196), (2, 0.067), (3, -0.027), (4, -0.145), (5, -0.058), (6, 0.197), (7, -0.011), (8, -0.026), (9, -0.233), (10, -0.004), (11, -0.223), (12, 0.071), (13, -0.108), (14, -0.011), (15, -0.104), (16, 0.067), (17, -0.12), (18, 0.078), (19, 0.057), (20, -0.101), (21, 0.091), (22, 0.142), (23, 0.107), (24, 0.086), (25, -0.175), (26, 0.073), (27, -0.068), (28, 0.007), (29, -0.03), (30, 0.037), (31, -0.019), (32, -0.022), (33, -0.007), (34, 0.02), (35, 0.085), (36, 0.029), (37, 0.053), (38, 0.006), (39, 0.147), (40, -0.14), (41, -0.049), (42, -0.032), (43, -0.003), (44, -0.006), (45, -0.063), (46, -0.048), (47, 0.085), (48, -0.059), (49, 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.95722884 46 acl-2010-Bayesian Synchronous Tree-Substitution Grammar Induction and Its Application to Sentence Compression
Author: Elif Yamangil ; Stuart M. Shieber
Abstract: We describe our experiments with training algorithms for tree-to-tree synchronous tree-substitution grammar (STSG) for monolingual translation tasks such as sentence compression and paraphrasing. These translation tasks are characterized by the relative ability to commit to parallel parse trees and availability of word alignments, yet the unavailability of large-scale data, calling for a Bayesian tree-to-tree formalism. We formalize nonparametric Bayesian STSG with epsilon alignment in full generality, and provide a Gibbs sampling algorithm for posterior inference tailored to the task of extractive sentence compression. We achieve improvements against a number of baselines, including expectation maximization and variational Bayes training, illustrating the merits of nonparametric inference over the space of grammars as opposed to sparse parametric inference with a fixed grammar.
2 0.86910498 53 acl-2010-Blocked Inference in Bayesian Tree Substitution Grammars
Author: Trevor Cohn ; Phil Blunsom
Abstract: Learning a tree substitution grammar is very challenging due to derivational ambiguity. Our recent approach used a Bayesian non-parametric model to induce good derivations from treebanked input (Cohn et al., 2009), biasing towards small grammars composed of small generalisable productions. In this paper we present a novel training method for the model using a blocked Metropolis-Hastings sampler in place of the previous method’s local Gibbs sampler. The blocked sampler makes considerably larger moves than the local sampler and consequently con- verges in less time. A core component of the algorithm is a grammar transformation which represents an infinite tree substitution grammar in a finite context free grammar. This enables efficient blocked inference for training and also improves the parsing algorithm. Both algorithms are shown to improve parsing accuracy.
3 0.85199261 75 acl-2010-Correcting Errors in a Treebank Based on Synchronous Tree Substitution Grammar
Author: Yoshihide Kato ; Shigeki Matsubara
Abstract: This paper proposes a method of correcting annotation errors in a treebank. By using a synchronous grammar, the method transforms parse trees containing annotation errors into the ones whose errors are corrected. The synchronous grammar is automatically induced from the treebank. We report an experimental result of applying our method to the Penn Treebank. The result demonstrates that our method corrects syntactic annotation errors with high precision.
4 0.70717216 21 acl-2010-A Tree Transducer Model for Synchronous Tree-Adjoining Grammars
Author: Andreas Maletti
Abstract: A characterization of the expressive power of synchronous tree-adjoining grammars (STAGs) in terms of tree transducers (or equivalently, synchronous tree substitution grammars) is developed. Essentially, a STAG corresponds to an extended tree transducer that uses explicit substitution in both the input and output. This characterization allows the easy integration of STAG into toolkits for extended tree transducers. Moreover, the applicability of the characterization to several representational and algorithmic problems is demonstrated.
5 0.67040771 169 acl-2010-Learning to Translate with Source and Target Syntax
Author: David Chiang
Abstract: Statistical translation models that try to capture the recursive structure of language have been widely adopted over the last few years. These models make use of varying amounts of information from linguistic theory: some use none at all, some use information about the grammar of the target language, some use information about the grammar of the source language. But progress has been slower on translation models that are able to learn the relationship between the grammars of both the source and target language. We discuss the reasons why this has been a challenge, review existing attempts to meet this challenge, and show how some old and new ideas can be combined into a sim- ple approach that uses both source and target syntax for significant improvements in translation accuracy.
6 0.50996524 115 acl-2010-Filtering Syntactic Constraints for Statistical Machine Translation
7 0.50454593 162 acl-2010-Learning Common Grammar from Multilingual Corpus
8 0.49052146 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
9 0.47198406 67 acl-2010-Computing Weakest Readings
10 0.46884501 255 acl-2010-Viterbi Training for PCFGs: Hardness Results and Competitiveness of Uniform Initialization
11 0.43664244 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction
12 0.42999911 95 acl-2010-Efficient Inference through Cascades of Weighted Tree Transducers
13 0.42022637 191 acl-2010-PCFGs, Topic Models, Adaptor Grammars and Learning Topical Collocations and the Structure of Proper Names
14 0.38233581 84 acl-2010-Detecting Errors in Automatically-Parsed Dependency Relations
15 0.37693608 69 acl-2010-Constituency to Dependency Translation with Forests
16 0.37021455 110 acl-2010-Exploring Syntactic Structural Features for Sub-Tree Alignment Using Bilingual Tree Kernels
17 0.36735162 133 acl-2010-Hierarchical Search for Word Alignment
18 0.35770178 186 acl-2010-Optimal Rank Reduction for Linear Context-Free Rewriting Systems with Fan-Out Two
19 0.33413598 246 acl-2010-Unsupervised Discourse Segmentation of Documents with Inherently Parallel Structure
20 0.32272807 39 acl-2010-Automatic Generation of Story Highlights
topicId topicWeight
[(4, 0.289), (14, 0.021), (25, 0.108), (33, 0.022), (42, 0.018), (44, 0.018), (59, 0.093), (73, 0.036), (78, 0.056), (83, 0.076), (84, 0.017), (98, 0.155)]
simIndex simValue paperId paperTitle
1 0.86689067 186 acl-2010-Optimal Rank Reduction for Linear Context-Free Rewriting Systems with Fan-Out Two
Author: Benoit Sagot ; Giorgio Satta
Abstract: Linear Context-Free Rewriting Systems (LCFRSs) are a grammar formalism capable of modeling discontinuous phrases. Many parsing applications use LCFRSs where the fan-out (a measure of the discontinuity of phrases) does not exceed 2. We present an efficient algorithm for optimal reduction of the length of production right-hand side in LCFRSs with fan-out at most 2. This results in asymptotical running time improvement for known parsing algorithms for this class.
same-paper 2 0.83471215 46 acl-2010-Bayesian Synchronous Tree-Substitution Grammar Induction and Its Application to Sentence Compression
Author: Elif Yamangil ; Stuart M. Shieber
Abstract: We describe our experiments with training algorithms for tree-to-tree synchronous tree-substitution grammar (STSG) for monolingual translation tasks such as sentence compression and paraphrasing. These translation tasks are characterized by the relative ability to commit to parallel parse trees and availability of word alignments, yet the unavailability of large-scale data, calling for a Bayesian tree-to-tree formalism. We formalize nonparametric Bayesian STSG with epsilon alignment in full generality, and provide a Gibbs sampling algorithm for posterior inference tailored to the task of extractive sentence compression. We achieve improvements against a number of baselines, including expectation maximization and variational Bayes training, illustrating the merits of nonparametric inference over the space of grammars as opposed to sparse parametric inference with a fixed grammar.
3 0.82967395 61 acl-2010-Combining Data and Mathematical Models of Language Change
Author: Morgan Sonderegger ; Partha Niyogi
Abstract: English noun/verb (N/V) pairs (contract, cement) have undergone complex patterns of change between 3 stress patterns for several centuries. We describe a longitudinal dataset of N/V pair pronunciations, leading to a set of properties to be accounted for by any computational model. We analyze the dynamics of 5 dynamical systems models of linguistic populations, each derived from a model of learning by individuals. We compare each model’s dynamics to a set of properties observed in the N/V data, and reason about how assumptions about individual learning affect population-level dynamics.
4 0.77019274 11 acl-2010-A New Approach to Improving Multilingual Summarization Using a Genetic Algorithm
Author: Marina Litvak ; Mark Last ; Menahem Friedman
Abstract: Automated summarization methods can be defined as “language-independent,” if they are not based on any languagespecific knowledge. Such methods can be used for multilingual summarization defined by Mani (2001) as “processing several languages, with summary in the same language as input.” In this paper, we introduce MUSE, a languageindependent approach for extractive summarization based on the linear optimization of several sentence ranking measures using a genetic algorithm. We tested our methodology on two languages—English and Hebrew—and evaluated its performance with ROUGE-1 Recall vs. state- of-the-art extractive summarization approaches. Our results show that MUSE performs better than the best known multilingual approach (TextRank1) in both languages. Moreover, our experimental results on a bilingual (English and Hebrew) document collection suggest that MUSE does not need to be retrained on each language and the same model can be used across at least two different languages.
5 0.65802622 53 acl-2010-Blocked Inference in Bayesian Tree Substitution Grammars
Author: Trevor Cohn ; Phil Blunsom
Abstract: Learning a tree substitution grammar is very challenging due to derivational ambiguity. Our recent approach used a Bayesian non-parametric model to induce good derivations from treebanked input (Cohn et al., 2009), biasing towards small grammars composed of small generalisable productions. In this paper we present a novel training method for the model using a blocked Metropolis-Hastings sampler in place of the previous method’s local Gibbs sampler. The blocked sampler makes considerably larger moves than the local sampler and consequently con- verges in less time. A core component of the algorithm is a grammar transformation which represents an infinite tree substitution grammar in a finite context free grammar. This enables efficient blocked inference for training and also improves the parsing algorithm. Both algorithms are shown to improve parsing accuracy.
6 0.6192376 71 acl-2010-Convolution Kernel over Packed Parse Forest
7 0.6165275 169 acl-2010-Learning to Translate with Source and Target Syntax
8 0.61558437 162 acl-2010-Learning Common Grammar from Multilingual Corpus
9 0.60576648 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
10 0.60510671 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
11 0.60501415 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results
12 0.60148031 23 acl-2010-Accurate Context-Free Parsing with Combinatory Categorial Grammar
13 0.60144609 172 acl-2010-Minimized Models and Grammar-Informed Initialization for Supertagging with Highly Ambiguous Lexicons
14 0.60009646 89 acl-2010-Distributional Similarity vs. PU Learning for Entity Set Expansion
15 0.59924829 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition
16 0.59785938 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese
17 0.59779382 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
18 0.59724295 248 acl-2010-Unsupervised Ontology Induction from Text
19 0.59612763 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
20 0.59553117 191 acl-2010-PCFGs, Topic Models, Adaptor Grammars and Learning Topical Collocations and the Structure of Proper Names