emnlp emnlp2010 emnlp2010-94 knowledge-graph by maker-knowledge-mining

94 emnlp-2010-SCFG Decoding Without Binarization


Source: pdf

Author: Mark Hopkins ; Greg Langmead

Abstract: Conventional wisdom dictates that synchronous context-free grammars (SCFGs) must be converted to Chomsky Normal Form (CNF) to ensure cubic time decoding. For arbitrary SCFGs, this is typically accomplished via the synchronous binarization technique of (Zhang et al., 2006). A drawback to this approach is that it inflates the constant factors associated with decoding, and thus the practical running time. (DeNero et al., 2009) tackle this problem by defining a superset of CNF called Lexical Normal Form (LNF), which also supports cubic time decoding under certain implicit assumptions. In this paper, we make these assumptions explicit, and in doing so, show that LNF can be further expanded to a broader class of grammars (called “scope3”) that also supports cubic-time decoding. By simply pruning non-scope-3 rules from a GHKM-extracted grammar, we obtain better translation performance than synchronous binarization.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Conventional wisdom dictates that synchronous context-free grammars (SCFGs) must be converted to Chomsky Normal Form (CNF) to ensure cubic time decoding. [sent-3, score-0.209]

2 For arbitrary SCFGs, this is typically accomplished via the synchronous binarization technique of (Zhang et al., 2006). [sent-4, score-0.115]

3 A drawback to this approach is that it inflates the constant factors associated with decoding, and thus the practical running time. [sent-6, score-0.069]

4 (DeNero et al., 2009) tackle this problem by defining a superset of CNF called Lexical Normal Form (LNF), which also supports cubic time decoding under certain implicit assumptions. [sent-8, score-0.254]

5 In this paper, we make these assumptions explicit, and in doing so, show that LNF can be further expanded to a broader class of grammars (called “scope3”) that also supports cubic-time decoding. [sent-9, score-0.03]

6 By simply pruning non-scope-3 rules from a GHKM-extracted grammar, we obtain better translation performance than synchronous binarization. [sent-10, score-0.138]

7 1 Introduction At the heart of bottom-up chart parsing (Younger, 1967) is the following combinatorial problem. [sent-11, score-0.153]

8 We have a context-free grammar (CFG) rule (for instance, S → NP VP PP) and an input sentence of length n (for instance, “on the fast jet ski of mr smith”). [sent-12, score-0.82]

9 During chart parsing, we need to apply the rule to all relevant subspans of the input sentence. [sent-13, score-0.204]

10 For this particular rule, there are O(n4) application contexts, i.e., ways of choosing the four span boundaries. [sent-17, score-0.057]

11 Since chart parsing is at least linear in this quantity, it will take at least O(n4) time. [sent-24, score-0.038]

12 [Figure 1: a demonstration of the application contexts for the CFG rule “S → NP VP PP”, where n is the length of the input sentence.]

15 It will take at least O(n4) time if we include this rule in our grammar. [sent-36, score-0.089]
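To make the combinatorics concrete, here is a minimal sketch in Python (my own illustration, not code from the paper; the helper name is an assumption) that counts application contexts by choosing boundary positions in the input sentence:

```python
from math import comb

def num_application_contexts(k: int, n: int) -> int:
    """Hypothetical helper: the number of ways to apply a CFG rule whose
    RHS is k nonterminals (k = 3 for S -> NP VP PP) to a length-n
    sentence, i.e. the number of ways to pick k + 1 distinct boundary
    positions out of the n + 1 gaps between words."""
    return comb(n + 1, k + 1)

for n in (10, 20, 40):
    # grows like n**4 / 24 for k = 3, matching the O(n4) bound above
    print(n, num_application_contexts(3, n))
```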

16 In CNF, all rules have the form X → Y Z or X → x, where x is a terminal and X, Y, Z are nonterminals. [sent-40, score-0.069]

17 The running time of chart parsing is thus O(n3) when applied to CNF grammars. [sent-46, score-0.069]

18 A disadvantage to CNF conversion is that it increases both the overall number of rules and the number of nonterminals. [sent-47, score-0.069]

19 This inflation of the “grammar constant” does not affect the asymptotic runtime, but can have a significant impact on the performance in practice. [sent-52, score-0.051]

20 There are O(n) application contexts for the CFG rule “S → the NPB of NNP”, and O(n2) application contexts for the CFG rule “S → the JJ NPB of NNP”, if we assume that the input sentence has length n and contains no repeated words. [sent-56, score-0.298]

21 LNF is a superclass of CNF that also allows rules whose right-hand sides have no consecutive nonterminals. [sent-59, score-0.103]

22 The intuition is that the terminals provide anchors that limit the applicability of a given rule. [sent-60, score-0.023]

23 For instance, consider the rule NP → the NPB of NNP. [sent-61, score-0.089]

24 Because the terminals constrain our choices, there are only O(n2) application contexts. [sent-63, score-0.045]

25 The implicit assumption is that input sentences will not repeat the same word more than a small constant number of times. [sent-65, score-0.077]

26 If we make the explicit assumption that all words of an input sentence are unique, then there are O(n2) application contexts for a “no consecutive nonterminals” rule. [sent-66, score-0.196]

27 Thus under this assumption, the running time of chart parsing is still O(n3) when applied to LNF grammars. [sent-67, score-0.196]
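The anchoring intuition can be checked by brute force. The following sketch is my own illustration, not the paper's code, and it assumes the convention that uppercase symbols are nonterminals and lowercase symbols are terminals:

```python
def application_contexts(rhs, sentence):
    """Enumerate all ways to apply a rule to `sentence`: terminals
    (lowercase) must line up with matching input words, nonterminals
    (uppercase) may cover any nonempty subspan."""
    n, found = len(sentence), []

    def extend(pos, syms, spans):
        if not syms:
            found.append(tuple(spans))
        elif syms[0].islower():                 # terminal anchor
            if pos < n and sentence[pos] == syms[0]:
                extend(pos + 1, syms[1:], spans + [(pos, pos + 1)])
        else:                                   # nonterminal span
            for end in range(pos + 1, n + 1):
                extend(end, syms[1:], spans + [(pos, end)])

    for start in range(n):
        extend(start, rhs, [])
    return found

sent = "on the fast jet ski of mr smith".split()
# The terminals pin down almost everything: only O(n) contexts here.
print(len(application_contexts(["the", "NPB", "of", "NNP"], sent)))
```

With no repeated words, each terminal can match in at most one position, which is exactly the explicit assumption made above.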

28 But once we make this assumption explicit, it becomes clear that we can go even further than LNF and still maintain the cubic bound on the runtime. [sent-68, score-0.182]

29 … application contexts, due to the anchoring effect of the terminals. [sent-71, score-0.057]

30 In general, for a rule of the form X → γ, there are at most O(n^p) application contexts, where p is the number of consecutive nonterminal pairs in the string X·γ·X (where X is an arbitrary nonterminal). [sent-72, score-0.089] [sent-73, score-0.186]

32 We refer to p as the scope of a rule. [sent-74, score-0.054]

33 Thus chart parsing runs in time O(n^scope(G)), where scope(G) is the maximum scope of any of the rules in CFG G. [sent-75, score-0.123]

34 Specifically, any scope-3 grammar can be decoded in cubic time. [sent-76, score-0.192]

35 As in (DeNero et al., 2009), the target of our interest is synchronous context-free grammar (SCFG) decoding with rules extracted using the GHKM algorithm (Galley et al., 2004). [sent-78, score-0.25]

36 In practice, it turns out that only a small percentage of the lexical rules in our system have scope greater than 3. [sent-80, score-0.123]

37 By simply removing these rules from the grammar, we can maintain the cubic running time of chart parsing without any kind of binarization. [sent-81, score-0.42]
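Following the definition above, scope is easy to compute. The sketch below is an assumed implementation (the uppercase-means-nonterminal convention and the helper names are mine): it pads the RHS with a nonterminal on each side and counts adjacent nonterminal pairs, then prunes any rule of scope greater than 3:

```python
def scope(rhs, is_nt=str.isupper):
    """Scope of a rule X -> rhs: the number of consecutive nonterminal
    pairs in the padded string X . rhs . X (the padding models the
    free boundaries at either end of the rule)."""
    flags = [True] + [is_nt(sym) for sym in rhs] + [True]
    return sum(a and b for a, b in zip(flags, flags[1:]))

assert scope(["NP", "VP", "PP"]) == 4           # O(n4) contexts
assert scope(["Y", "Z"]) == 3                   # CNF rule: cubic
assert scope(["the", "NPB", "of", "NNP"]) == 1  # anchored: O(n) contexts

def prune_non_scope3(rules):
    """Keep only rules of scope <= 3, preserving cubic-time chart parsing."""
    return [(lhs, rhs) for lhs, rhs in rules if scope(rhs) <= 3]
```

This filter corresponds to the pruning of non-scope-3 rules from a GHKM-extracted grammar described in the abstract.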

38 Unlike (DeNero et al., 2009), we maintain the synchronous property of the grammar, and thus can integrate language model scoring into chart parsing. [sent-85, score-0.224]

39 Finally, a system without binarized rules is considerably simpler to build and maintain. [sent-86, score-0.092]

40 We show that this approach gives us better practical performance than a mature system that binarizes using the technique of (Zhang et al., 2006). [sent-87, score-0.023]

41 2 Preliminaries Assume we have a global vocabulary of symbols, containing the reserved substitution symbol ♦. [sent-89, score-0.031]

42 We will typically use space-delimited quotations to represent example sentences, e.g. [sent-91, score-0.023]

43 “the fast jet ski” rather than ⟨the, fast, jet, ski⟩. [sent-93, score-0.54]

44 Define the rank of a sentence as the count of its ♦ symbols. [sent-98, score-0.026]

45 We write SUB(s, s1, . . . , sk) to denote the substitution of k sentences s1, . . . , sk for the ♦ symbols of s. [sent-102, score-0.031]

46 For instance, if s = “the ♦ ♦ of ♦”, then SUB (s, “fast”, “jet ski”, “mr smith”) = “the fast jet ski of mr smith”. [sent-106, score-1.263]

47 To refer to a subsentence, define a span as a pair [a, b] of nonnegative integers such that a ≤ b. [sent-107, score-0.042]

48 For a sentence s = ⟨s1, . . . , sn⟩ and a span [a, b] such that b ≤ n, define s[a,b] = ⟨sa+1, . . . , sb⟩. [sent-111, score-0.025]

49 Both derive the sentence SUB(“on ♦”, SUB( “the ♦ ♦ of ♦”, “fast”, “jet ski”, “mr smith”) ) = “on the fast jet ski of mr smith”. [sent-116, score-1.263]

50 The SCFG derivation simultaneously derives the auxiliary sentence “sur le jet ski vite de m smith”. [sent-117, score-1.351]
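These preliminaries translate directly into code. Here is a minimal sketch of rank, SUB, and span extraction under the stated definitions (the function and variable names are my own), reproducing the running example:

```python
DIAMOND = "♦"  # the reserved substitution symbol

def rank(s):
    """Rank of a sentence: the count of its ♦ symbols."""
    return s.count(DIAMOND)

def SUB(s, *fillers):
    """SUB(s, s1, ..., sk): replace the k ♦ symbols of s, left to
    right, with the sentences s1, ..., sk."""
    assert rank(s) == len(fillers)
    queue, out = list(fillers), []
    for word in s:
        out.extend(queue.pop(0) if word == DIAMOND else [word])
    return out

def span(s, a, b):
    """s[a,b] = <s_{a+1}, ..., s_b>, for nonnegative a <= b <= len(s)."""
    return s[a:b]

inner = SUB("the ♦ ♦ of ♦".split(),
            "fast".split(), "jet ski".split(), "mr smith".split())
outer = SUB("on ♦".split(), inner)
print(" ".join(outer))  # -> on the fast jet ski of mr smith
```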

51 3 Minimum Derivation Cost Chart parsing solves a problem which we will refer to as Minimum Derivation Cost. [sent-118, score-0.061]

52 Because we want our results to be applicable to both CFG decoding and SCFG decoding with an integrated language model, we will provide a somewhat more abstract formulation of chart parsing than usual. [sent-119, score-0.273]

53 A derivation is a tree of CFG rules, constructed so that the preconditions (the RHS nonterminals) of any rule match the postconditions (the LHS nonterminal) of its child rules. [sent-121, score-0.369]

54 The purpose of a derivation is to derive a sentence, which is obtained through recursive substitution. [sent-122, score-0.221]

55 In the example, we substitute “fast”, “jet ski”, and “mr smith” into the lexical pattern “the ♦ ♦ of ♦” to obtain “the fast jet ski of mr smith”. [sent-123, score-1.297]

56 Then we substitute this result into the lexical pattern “on ♦” to obtain “on the fast jet ski of mr smith”. [sent-124, score-1.297]

57 The cost of a derivation is simply the sum of the base costs of its rules. [sent-125, score-0.349]

58 Thus the cost of the CFG derivation in Figure 3 is C1 + C2 + C3 + C4 + C5, where C1 is the base cost of rule “PP → on NP”, etc. [sent-126, score-0.544]
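A small data structure makes the derivation-and-cost picture concrete. The sketch below is my own reconstruction of the example; the internal categories of the middle rule (JJ, NN) are read off the garbled Figure 4 and should be treated as assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    lhs: str          # postcondition (LHS nonterminal)
    rhs: list         # terminals and nonterminal preconditions
    cost: float       # base cost, e.g. C1 for "PP -> on NP"

@dataclass
class Derivation:
    rule: Rule
    children: list = field(default_factory=list)

    def __post_init__(self):
        # preconditions (RHS nonterminals) must match the
        # postconditions (LHS) of the child rules, in order
        nts = [s for s in self.rule.rhs if s.isupper()]
        assert nts == [c.rule.lhs for c in self.children]

    def derive(self):
        """Derive a sentence by recursive substitution."""
        kids, out = iter(self.children), []
        for s in self.rule.rhs:
            out.extend(next(kids).derive() if s.isupper() else [s])
        return out

    def cost(self):
        """Sum of the base costs of all rules in the derivation."""
        return self.rule.cost + sum(c.cost() for c in self.children)

d = Derivation(Rule("PP", ["on", "NP"], 1.0), [
        Derivation(Rule("NP", ["the", "JJ", "NN", "of", "NP"], 2.0), [
            Derivation(Rule("JJ", ["fast"], 3.0)),
            Derivation(Rule("NN", ["jet", "ski"], 4.0)),
            Derivation(Rule("NP", ["mr", "smith"], 5.0))])])
print(" ".join(d.derive()), d.cost())  # on the fast jet ski of mr smith 15.0
```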

59 Notice that this cost can be distributed locally to the nodes of the derivation. [sent-127, score-0.07]

60 An SCFG derivation is similar to a CFG derivation. [Figure 4: The cost of the CFG derivation in Figure 3 is C1 + C2 + C3 + C4 + C5, where C1 is the base cost of rule “PP → on NP”, etc.] [sent-129, score-0.765]

61 Notice that this cost can be distributed locally to the nodes. [sent-130, score-0.151]

62 For instance, the SCFG derivation in Figure 3 derives the sentence pair ⟨“on the fast jet ski of mr smith”, “sur le jet ski vite de m smith”⟩. [sent-133, score-2.089]

63 In machine translation, we often want the cost of the SCFG derivation to include a language model cost for this second sentence. [sent-134, score-0.142] [sent-136, score-0.327]

65 For example, the cost of the SCFG derivation in Figure 3 might be C1 + C2 + C3 + C4 + C5 + LM(sur le) + LM(le jet) + LM(jet ski) + LM(ski de) + LM(de m) + LM(m smith), where LM is the negative log of a 2-gram language model. [sent-137, score-0.327]
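Under a 2-gram model, this LM contribution is one negative-log term per adjacent word pair of the target sentence. A minimal sketch (the probability function p is a toy placeholder, not a real model):

```python
import math

def bigram_lm_cost(target, p):
    """Sum of LM(w_i w_{i+1}) over the target sentence, where
    LM(a b) = -log p(b | a) for a 2-gram language model."""
    return sum(-math.log(p(a, b)) for a, b in zip(target, target[1:]))

base = 1.0 + 2.0 + 3.0 + 4.0 + 5.0   # C1 + ... + C5
target = "sur le jet ski vite de m smith".split()
uniform = lambda a, b: 1e-3          # toy stand-in bigram probability
print(base + bigram_lm_cost(target, uniform))
```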

66 This new cost function can also be distributed locally to the nodes of the derivation, as shown in Figure 5. [sent-138, score-0.151]

67 Formally, define a carry as a sentence of rank 0. [sent-141, score-0.026]

68 In order to provide a chart parsing formulation that applies to both CFG decoding and SCFG decoding with an integrated language model, we need abstract definitions of rule and derivation that capture the above concepts of pattern, postcondition, preconditions, cost, and carries. [sent-142, score-0.583]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ski', 0.505), ('jet', 0.436), ('cnf', 0.238), ('derivation', 0.221), ('mr', 0.218), ('scfg', 0.212), ('cfg', 0.21), ('lnf', 0.208), ('npb', 0.119), ('chart', 0.115), ('cubic', 0.115), ('smith', 0.114), ('lm', 0.108), ('cost', 0.106), ('fast', 0.104), ('sub', 0.093), ('np', 0.09), ('rule', 0.089), ('sur', 0.089), ('synchronous', 0.069), ('rules', 0.069), ('jj', 0.068), ('nnp', 0.065), ('decoding', 0.06), ('preconditions', 0.059), ('scfgs', 0.059), ('application', 0.057), ('derives', 0.056), ('scope', 0.054), ('contexts', 0.053), ('nonterminal', 0.053), ('grammar', 0.052), ('asymptotic', 0.051), ('vite', 0.051), ('normal', 0.049), ('sk', 0.048), ('pp', 0.048), ('denero', 0.047), ('binarization', 0.046), ('chomsky', 0.046), ('sem', 0.046), ('locally', 0.045), ('running', 0.043), ('eh', 0.042), ('mn', 0.042), ('ont', 0.042), ('nonnegative', 0.042), ('maintain', 0.04), ('choice', 0.04), ('nn', 0.038), ('parsing', 0.038), ('substitute', 0.034), ('nonterminals', 0.034), ('consecutive', 0.034), ('le', 0.032), ('rt', 0.031), ('substitution', 0.031), ('supports', 0.03), ('demonstration', 0.029), ('assumption', 0.027), ('constant', 0.026), ('rank', 0.026), ('saet', 0.025), ('preliminaries', 0.025), ('wisdom', 0.025), ('sre', 0.025), ('decoded', 0.025), ('jst', 0.025), ('younger', 0.025), ('dnet', 0.025), ('greg', 0.025), ('nocfe', 0.025), ('efin', 0.025), ('enc', 0.025), ('gn', 0.025), ('hthe', 0.025), ('ib', 0.025), ('lhs', 0.025), ('rhs', 0.025), ('superset', 0.025), ('thise', 0.025), ('simultaneously', 0.025), ('explicit', 0.025), ('de', 0.025), ('implicit', 0.024), ('tence', 0.023), ('oco', 0.023), ('mthe', 0.023), ('quotations', 0.023), ('binarized', 0.023), ('anchors', 0.023), ('ghkm', 0.023), ('oinf', 0.023), ('fer', 0.023), ('xte', 0.023), ('inflate', 0.023), ('mature', 0.023), ('ec', 0.022), ('vp', 0.022), ('base', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 94 emnlp-2010-SCFG Decoding Without Binarization

Author: Mark Hopkins ; Greg Langmead

Abstract: Conventional wisdom dictates that synchronous context-free grammars (SCFGs) must be converted to Chomsky Normal Form (CNF) to ensure cubic time decoding. For arbitrary SCFGs, this is typically accomplished via the synchronous binarization technique of (Zhang et al., 2006). A drawback to this approach is that it inflates the constant factors associated with decoding, and thus the practical running time. (DeNero et al., 2009) tackle this problem by defining a superset of CNF called Lexical Normal Form (LNF), which also supports cubic time decoding under certain implicit assumptions. In this paper, we make these assumptions explicit, and in doing so, show that LNF can be further expanded to a broader class of grammars (called “scope3”) that also supports cubic-time decoding. By simply pruning non-scope-3 rules from a GHKM-extracted grammar, we obtain better translation performance than synchronous binarization.

2 0.15882677 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions

Author: Zhongqiang Huang ; Martin Cmejrek ; Bowen Zhou

Abstract: In this paper, we present a novel approach to enhance hierarchical phrase-based machine translation systems with linguistically motivated syntactic features. Rather than directly using treebank categories as in previous studies, we learn a set of linguistically-guided latent syntactic categories automatically from a source-side parsed, word-aligned parallel corpus, based on the hierarchical structure among phrase pairs as well as the syntactic structure of the source side. In our model, each X nonterminal in a SCFG rule is decorated with a real-valued feature vector computed based on its distribution of latent syntactic categories. These feature vectors are utilized at decod- ing time to measure the similarity between the syntactic analysis of the source side and the syntax of the SCFG rules that are applied to derive translations. Our approach maintains the advantages of hierarchical phrase-based translation systems while at the same time naturally incorporates soft syntactic constraints.

3 0.095489763 42 emnlp-2010-Efficient Incremental Decoding for Tree-to-String Translation

Author: Liang Huang ; Haitao Mi

Abstract: Syntax-based translation models should in principle be efficient with polynomially-sized search space, but in practice they are often embarassingly slow, partly due to the cost of language model integration. In this paper we borrow from phrase-based decoding the idea to generate a translation incrementally left-to-right, and show that for tree-to-string models, with a clever encoding of derivation history, this method runs in averagecase polynomial-time in theory, and lineartime with beam search in practice (whereas phrase-based decoding is exponential-time in theory and quadratic-time in practice). Experiments show that, with comparable translation quality, our tree-to-string system (in Python) can run more than 30 times faster than the phrase-based system Moses (in C++).

4 0.094806589 57 emnlp-2010-Hierarchical Phrase-Based Translation Grammars Extracted from Alignment Posterior Probabilities

Author: Adria de Gispert ; Juan Pino ; William Byrne

Abstract: We report on investigations into hierarchical phrase-based translation grammars based on rules extracted from posterior distributions over alignments of the parallel text. Rather than restrict rule extraction to a single alignment, such as Viterbi, we instead extract rules based on posterior distributions provided by the HMM word-to-word alignmentmodel. We define translation grammars progressively by adding classes of rules to a basic phrase-based system. We assess these grammars in terms of their expressive power, measured by their ability to align the parallel text from which their rules are extracted, and the quality of the translations they yield. In Chinese-to-English translation, we find that rule extraction from posteriors gives translation improvements. We also find that grammars with rules with only one nonterminal, when extracted from posteri- ors, can outperform more complex grammars extracted from Viterbi alignments. Finally, we show that the best way to exploit source-totarget and target-to-source alignment models is to build two separate systems and combine their output translation lattices.

5 0.091339238 121 emnlp-2010-What a Parser Can Learn from a Semantic Role Labeler and Vice Versa

Author: Stephen Boxwell ; Dennis Mehay ; Chris Brew

Abstract: In many NLP systems, there is a unidirectional flow of information in which a parser supplies input to a semantic role labeler. In this paper, we build a system that allows information to flow in both directions. We make use of semantic role predictions in choosing a single-best parse. This process relies on an averaged perceptron model to distinguish likely semantic roles from erroneous ones. Our system penalizes parses that give rise to low-scoring semantic roles. To explore the consequences of this we perform two experiments. First, we use a baseline generative model to produce n-best parses, which are then re-ordered by our semantic model. Second, we use a modified version of our semantic role labeler to predict semantic roles at parse time. The performance of this modified labeler is weaker than that of our best full SRL, because it is restricted to features that can be computed directly from the parser’s packed chart. For both experiments, the resulting semantic predictions are then used to select parses. Finally, we feed the selected parses produced by each experiment to the full version of our semantic role labeler. We find that SRL performance can be improved over this baseline by selecting parses with likely semantic roles.

6 0.087964565 113 emnlp-2010-Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing

7 0.067831248 86 emnlp-2010-Non-Isomorphic Forest Pair Translation

8 0.062237728 88 emnlp-2010-On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing

9 0.054344401 106 emnlp-2010-Top-Down Nearly-Context-Sensitive Parsing

10 0.043120265 99 emnlp-2010-Statistical Machine Translation with a Factorized Grammar

11 0.043056894 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction

12 0.040881198 39 emnlp-2010-EMNLP 044

13 0.039379571 65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification

14 0.035701238 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing

15 0.033333965 76 emnlp-2010-Maximum Entropy Based Phrase Reordering for Hierarchical Phrase-Based Translation

16 0.033018254 72 emnlp-2010-Learning First-Order Horn Clauses from Web Text

17 0.03261495 1 emnlp-2010-"Poetic" Statistical Machine Translation: Rhyme and Meter

18 0.03076851 15 emnlp-2010-A Unified Framework for Scope Learning via Simplified Shallow Semantic Parsing

19 0.030229641 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar

20 0.029607652 96 emnlp-2010-Self-Training with Products of Latent Variable Grammars


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.122), (1, -0.028), (2, 0.182), (3, -0.003), (4, 0.136), (5, 0.056), (6, 0.011), (7, -0.047), (8, -0.122), (9, -0.093), (10, 0.082), (11, -0.045), (12, -0.054), (13, -0.038), (14, -0.009), (15, -0.082), (16, -0.069), (17, -0.12), (18, -0.004), (19, 0.007), (20, -0.038), (21, -0.098), (22, 0.067), (23, -0.066), (24, -0.071), (25, -0.013), (26, 0.049), (27, 0.157), (28, 0.002), (29, -0.005), (30, 0.156), (31, 0.01), (32, -0.16), (33, 0.03), (34, -0.126), (35, 0.239), (36, 0.011), (37, -0.198), (38, -0.015), (39, -0.171), (40, 0.003), (41, 0.068), (42, 0.069), (43, 0.175), (44, -0.132), (45, 0.048), (46, -0.107), (47, 0.116), (48, 0.008), (49, 0.286)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96896541 94 emnlp-2010-SCFG Decoding Without Binarization

Author: Mark Hopkins ; Greg Langmead

Abstract: Conventional wisdom dictates that synchronous context-free grammars (SCFGs) must be converted to Chomsky Normal Form (CNF) to ensure cubic time decoding. For arbitrary SCFGs, this is typically accomplished via the synchronous binarization technique of (Zhang et al., 2006). A drawback to this approach is that it inflates the constant factors associated with decoding, and thus the practical running time. (DeNero et al., 2009) tackle this problem by defining a superset of CNF called Lexical Normal Form (LNF), which also supports cubic time decoding under certain implicit assumptions. In this paper, we make these assumptions explicit, and in doing so, show that LNF can be further expanded to a broader class of grammars (called “scope3”) that also supports cubic-time decoding. By simply pruning non-scope-3 rules from a GHKM-extracted grammar, we obtain better translation performance than synchronous binarization.

2 0.52600199 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions

Author: Zhongqiang Huang ; Martin Cmejrek ; Bowen Zhou

Abstract: In this paper, we present a novel approach to enhance hierarchical phrase-based machine translation systems with linguistically motivated syntactic features. Rather than directly using treebank categories as in previous studies, we learn a set of linguistically-guided latent syntactic categories automatically from a source-side parsed, word-aligned parallel corpus, based on the hierarchical structure among phrase pairs as well as the syntactic structure of the source side. In our model, each X nonterminal in a SCFG rule is decorated with a real-valued feature vector computed based on its distribution of latent syntactic categories. These feature vectors are utilized at decod- ing time to measure the similarity between the syntactic analysis of the source side and the syntax of the SCFG rules that are applied to derive translations. Our approach maintains the advantages of hierarchical phrase-based translation systems while at the same time naturally incorporates soft syntactic constraints.

3 0.32621861 42 emnlp-2010-Efficient Incremental Decoding for Tree-to-String Translation

Author: Liang Huang ; Haitao Mi

Abstract: Syntax-based translation models should in principle be efficient with polynomially-sized search space, but in practice they are often embarassingly slow, partly due to the cost of language model integration. In this paper we borrow from phrase-based decoding the idea to generate a translation incrementally left-to-right, and show that for tree-to-string models, with a clever encoding of derivation history, this method runs in averagecase polynomial-time in theory, and lineartime with beam search in practice (whereas phrase-based decoding is exponential-time in theory and quadratic-time in practice). Experiments show that, with comparable translation quality, our tree-to-string system (in Python) can run more than 30 times faster than the phrase-based system Moses (in C++).

4 0.29394805 113 emnlp-2010-Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing

Author: Phil Blunsom ; Trevor Cohn

Abstract: Inducing a grammar directly from text is one of the oldest and most challenging tasks in Computational Linguistics. Significant progress has been made for inducing dependency grammars, however the models employed are overly simplistic, particularly in comparison to supervised parsing models. In this paper we present an approach to dependency grammar induction using tree substitution grammar which is capable of learning large dependency fragments and thereby better modelling the text. We define a hierarchical non-parametric Pitman-Yor Process prior which biases towards a small grammar with simple productions. This approach significantly improves the state-of-the-art, when measured by head attachment accuracy.

5 0.28025198 86 emnlp-2010-Non-Isomorphic Forest Pair Translation

Author: Hui Zhang ; Min Zhang ; Haizhou Li ; Eng Siong Chng

Abstract: This paper studies two issues, non-isomorphic structure translation and target syntactic structure usage, for statistical machine translation in the context of forest-based tree to tree sequence translation. For the first issue, we propose a novel non-isomorphic translation framework to capture more non-isomorphic structure mappings than traditional tree-based and tree-sequence-based translation methods. For the second issue, we propose a parallel space searching method to generate hypothesis using tree-to-string model and evaluate its syntactic goodness using tree-to-tree/tree sequence model. This not only reduces the search complexity by merging spurious-ambiguity translation paths and solves the data sparseness issue in training, but also serves as a syntax-based target language model for better grammatical generation. Experiment results on the benchmark data show our proposed two solutions are very effective, achieving significant performance improvement over baselines when applying to different translation models.

6 0.2583589 57 emnlp-2010-Hierarchical Phrase-Based Translation Grammars Extracted from Alignment Posterior Probabilities

7 0.22216396 121 emnlp-2010-What a Parser Can Learn from a Semantic Role Labeler and Vice Versa

8 0.2212352 15 emnlp-2010-A Unified Framework for Scope Learning via Simplified Shallow Semantic Parsing

9 0.21817833 123 emnlp-2010-Word-Based Dialect Identification with Georeferenced Rules

10 0.21001597 101 emnlp-2010-Storing the Web in Memory: Space Efficient Language Models with Constant Time Retrieval

11 0.17169152 106 emnlp-2010-Top-Down Nearly-Context-Sensitive Parsing

12 0.15944941 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar

13 0.15406784 88 emnlp-2010-On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing

14 0.15061694 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction

15 0.14761104 93 emnlp-2010-Resolving Event Noun Phrases to Their Verbal Mentions

16 0.1431904 39 emnlp-2010-EMNLP 044

17 0.13736041 72 emnlp-2010-Learning First-Order Horn Clauses from Web Text

18 0.13245721 11 emnlp-2010-A Semi-Supervised Approach to Improve Classification of Infrequent Discourse Relations Using Feature Vector Extension

19 0.12841597 61 emnlp-2010-Improving Gender Classification of Blog Authors

20 0.12837379 1 emnlp-2010-"Poetic" Statistical Machine Translation: Rhyme and Meter


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(12, 0.02), (29, 0.104), (30, 0.018), (32, 0.017), (35, 0.378), (52, 0.045), (56, 0.072), (62, 0.025), (66, 0.045), (72, 0.022), (76, 0.066), (77, 0.011), (79, 0.013), (87, 0.016), (89, 0.017), (96, 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.77550966 94 emnlp-2010-SCFG Decoding Without Binarization

Author: Mark Hopkins ; Greg Langmead

Abstract: Conventional wisdom dictates that synchronous context-free grammars (SCFGs) must be converted to Chomsky Normal Form (CNF) to ensure cubic time decoding. For arbitrary SCFGs, this is typically accomplished via the synchronous binarization technique of (Zhang et al., 2006). A drawback to this approach is that it inflates the constant factors associated with decoding, and thus the practical running time. (DeNero et al., 2009) tackle this problem by defining a superset of CNF called Lexical Normal Form (LNF), which also supports cubic time decoding under certain implicit assumptions. In this paper, we make these assumptions explicit, and in doing so, show that LNF can be further expanded to a broader class of grammars (called “scope3”) that also supports cubic-time decoding. By simply pruning non-scope-3 rules from a GHKM-extracted grammar, we obtain better translation performance than synchronous binarization.

2 0.37393761 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions

Author: Zhongqiang Huang ; Martin Cmejrek ; Bowen Zhou

Abstract: In this paper, we present a novel approach to enhance hierarchical phrase-based machine translation systems with linguistically motivated syntactic features. Rather than directly using treebank categories as in previous studies, we learn a set of linguistically-guided latent syntactic categories automatically from a source-side parsed, word-aligned parallel corpus, based on the hierarchical structure among phrase pairs as well as the syntactic structure of the source side. In our model, each X nonterminal in a SCFG rule is decorated with a real-valued feature vector computed based on its distribution of latent syntactic categories. These feature vectors are utilized at decod- ing time to measure the similarity between the syntactic analysis of the source side and the syntax of the SCFG rules that are applied to derive translations. Our approach maintains the advantages of hierarchical phrase-based translation systems while at the same time naturally incorporates soft syntactic constraints.

3 0.33652931 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar

Author: Kristian Woodsend ; Yansong Feng ; Mirella Lapata

Abstract: The task of selecting information and rendering it appropriately appears in multiple contexts in summarization. In this paper we present a model that simultaneously optimizes selection and rendering preferences. The model operates over a phrase-based representation of the source document which we obtain by merging PCFG parse trees and dependency graphs. Selection preferences for individual phrases are learned discriminatively, while a quasi-synchronous grammar (Smith and Eisner, 2006) captures rendering preferences such as paraphrases and compressions. Based on an integer linear programming formulation, the model learns to generate summaries that satisfy both types of preferences, while ensuring that length, topic coverage and grammar constraints are met. Experiments on headline and image caption generation show that our method obtains state-of-the-art performance using essentially the same model for both tasks without any major modifications.

4 0.3349632 57 emnlp-2010-Hierarchical Phrase-Based Translation Grammars Extracted from Alignment Posterior Probabilities

Author: Adria de Gispert ; Juan Pino ; William Byrne

Abstract: We report on investigations into hierarchical phrase-based translation grammars based on rules extracted from posterior distributions over alignments of the parallel text. Rather than restrict rule extraction to a single alignment, such as Viterbi, we instead extract rules based on posterior distributions provided by the HMM word-to-word alignmentmodel. We define translation grammars progressively by adding classes of rules to a basic phrase-based system. We assess these grammars in terms of their expressive power, measured by their ability to align the parallel text from which their rules are extracted, and the quality of the translations they yield. In Chinese-to-English translation, we find that rule extraction from posteriors gives translation improvements. We also find that grammars with rules with only one nonterminal, when extracted from posteri- ors, can outperform more complex grammars extracted from Viterbi alignments. Finally, we show that the best way to exploit source-totarget and target-to-source alignment models is to build two separate systems and combine their output translation lattices.

5 0.33144367 65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification

Author: Tom Kwiatkowksi ; Luke Zettlemoyer ; Sharon Goldwater ; Mark Steedman

Abstract: This paper addresses the problem of learning to map sentences to logical form, given training data consisting of natural language sentences paired with logical representations of their meaning. Previous approaches have been designed for particular natural languages or specific meaning representations; here we present a more general method. The approach induces a probabilistic CCG grammar that represents the meaning of individual words and defines how these meanings can be combined to analyze complete sentences. We use higher-order unification to define a hypothesis space containing all grammars consistent with the training data, and develop an online learning algorithm that efficiently searches this space while simultaneously estimating the parameters of a log-linear parsing model. Experiments demonstrate high accuracy on benchmark data sets in four languages with two different meaning representations.

6 0.33124039 6 emnlp-2010-A Latent Variable Model for Geographic Lexical Variation

7 0.33059654 40 emnlp-2010-Effects of Empty Categories on Machine Translation

8 0.33001384 89 emnlp-2010-PEM: A Paraphrase Evaluation Metric Exploiting Parallel Texts

9 0.32866332 78 emnlp-2010-Minimum Error Rate Training by Sampling the Translation Lattice

10 0.32798696 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding

11 0.32796288 42 emnlp-2010-Efficient Incremental Decoding for Tree-to-String Translation

12 0.32628411 26 emnlp-2010-Classifying Dialogue Acts in One-on-One Live Chats

13 0.32517752 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction

14 0.32279661 87 emnlp-2010-Nouns are Vectors, Adjectives are Matrices: Representing Adjective-Noun Constructions in Semantic Space

15 0.32161167 82 emnlp-2010-Multi-Document Summarization Using A* Search and Discriminative Learning

16 0.32133391 107 emnlp-2010-Towards Conversation Entailment: An Empirical Investigation

17 0.31906459 7 emnlp-2010-A Mixture Model with Sharing for Lexical Semantics

18 0.31818917 110 emnlp-2010-Turbo Parsers: Dependency Parsing by Approximate Variational Inference

19 0.31810302 34 emnlp-2010-Crouching Dirichlet, Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation

20 0.31791094 32 emnlp-2010-Context Comparison of Bursty Events in Web Search and Online Media