emnlp emnlp2011 emnlp2011-31 knowledge-graph by maker-knowledge-mining

31 emnlp-2011-Computation of Infix Probabilities for Probabilistic Context-Free Grammars


Source: pdf

Author: Mark-Jan Nederhof ; Giorgio Satta

Abstract: The notion of infix probability has been introduced in the literature as a generalization of the notion of prefix (or initial substring) probability, motivated by applications in speech recognition and word error correction. For the case where a probabilistic context-free grammar is used as language model, methods for the computation of infix probabilities have been presented in the literature, based on various simplifying assumptions. Here we present a solution that applies to the problem in its full generality.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: The notion of infix probability has been introduced in the literature as a generalization of the notion of prefix (or initial substring) probability, motivated by applications in speech recognition and word error correction. [sent-3, score-0.732]

2 For the case where a probabilistic context-free grammar is used as language model, methods for the computation of infix probabilities have been presented in the literature, based on various simplifying assumptions. [sent-4, score-0.771]

3 Here we present a solution that applies to the problem in its full generality. [sent-5, score-0.071]

4 One such problem is the computation of prefix probabilities for PCFGs, where we are given as input a PCFG G and a string w, and we are asked to compute the probability that a sentence generated by G starts with w, that is, has w as a prefix. [sent-8, score-0.532]

5 This quantity is defined as the possibly infinite sum of the probabilities of all strings of the form wx, for any string x over the alphabet of G. [sent-9, score-0.368]

6 The problem of computation of prefix probabilities for PCFGs was first formulated by Persoon and Fu (1975). [sent-10, score-0.345]

7 Efficient algorithms for its solution have been proposed by Jelinek and Lafferty (1991) and Stolcke (1995). [sent-11, score-0.071]

8 Prefix probabilities can be used to compute probability distributions for the next word [sent-12, score-0.127]

9 or part-of-speech, when a prefix of the input has already been processed, as discussed by Jelinek and Lafferty (1991). [sent-15, score-0.174]

10 Motivated by the above applications, the problem of the computation of infix probabilities for PCFGs has been introduced in the literature as a generalization of the prefix probability problem. [sent-18, score-0.935]

11 We are now given a PCFG G and a string w, and we are asked tgoi compute FthGe probability tinhagt a ,s aenntden wcee generated by G has w as an infix. [sent-19, score-0.157]

12 This is the sum of the probabilities of all strings of the form xwy, for any pair of strings x and y over the alphabet of G. [sent-21, score-0.317]

13 (1991) have pointed out that the computation of infix probabilities is more difficult than the computation of prefix probabilities, due to the added ambiguity that several occurrences of the given infix can be found in a single string generated by the PCFG. [sent-25, score-1.682]

14 . . . the distance of the infix from the sentence boundaries, which is a simplifying assumption. [sent-28, score-0.519]

15 However, the algorithm in (Fred, 2000) does not apply to cases with multiple occurrences of the given infix within a string in the language, which is precisely the case pointed out as problematic. [sent-30, score-0.709]

16 In this paper we adopt a novel approach to the problem of computation of infix probabilities, by removing the ambiguity that would be caused by multiple occurrences of the given infix. [sent-31, score-0.663]

17 Although our result is obtained by a combination of well-known techniques from the literature on PCFG parsing and pattern matching, as far as we know this is the first algorithm for the computation of infix probabilities that works for general PCFG models without any restrictive assumption. [sent-32, score-0.76]

18 In Section 2 we explain how the sum of the probabilities of all trees generated by a PCFG can be computed as the least fixed-point solution of a non-linear system of equations. [sent-34, score-0.211]

19 In Section 4 we show how one can efficiently construct an unambiguous finite automaton that accepts all strings with a given infix. [sent-36, score-0.603]

20 The material from these three sections is combined into a new algorithm in Section 5, which allows computation of the infix probability for PCFGs. [sent-37, score-0.661]

21 The concept of left-most derivation in one step is represented by the notation α ⇒πG β, which means that the left-most occurrence of any nonterminal in α ∈ (Σ ∪ N)∗ is rewritten by means of some rule π ∈ R. [sent-42, score-0.211]

22 If the rewritten nonterminal is A, then π must be [sent-43, score-0.113]

23 of the form (A → γ), and β is the result of replacing that occurrence of A in α by γ. [sent-44, score-0.07]

24 A left-most derivation with any number of steps, using a sequence d of rules, is denoted as α ⇒dG β. [sent-45, score-0.102]

25 We also write α ⇒∗G β when the involved sequence of rules is of no relevance. [sent-47, score-0.07]

26 A complete derivation is either the empty sequence of rules, or a sequence d = π1 · · · πm, m ≥ 1, of rules such that A ⇒d w for some A ∈ N and w ∈ Σ∗. [sent-49, score-0.237]

27 In the latter case, we say the complete derivation starts with A, and in the former case, with d an empty sequence of rules, we assume the complete derivation starts and ends with a single terminal, which is left unspecified. [sent-50, score-0.292]

28 It is well-known that there exists a bijective correspondence between leftmost complete derivations starting with nonterminal A and parse trees derived by the grammar with root A and a yield composed of terminal symbols only. [sent-51, score-0.244]

29 The depth of a complete derivation d is the length of the longest path from the root to a leaf in the parse tree associated with d. [sent-52, score-0.208]

30 The probability p(d) of a complete derivation d = π1 · · · πm, m ≥ 1, is: p(d) = ∏i=1…m p(πi). [sent-55, score-0.165]

31 The probability p(w) of a string w is the sum of the probabilities of all complete derivations deriving that string from the start symbol: p(w) = Σ p(d), where the sum ranges over all d such that S ⇒d w. [sent-57, score-0.538]
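As a concrete illustration of these two definitions (the toy grammar is my own, not taken from the paper): consider the PCFG with rules S → S S with probability 1/3 and S → a with probability 2/3. The string aa has exactly one leftmost complete derivation, d = (S → S S)(S → a)(S → a), so p(d) = (1/3) · (2/3) · (2/3) = 4/27, and since no other derivation yields aa, p(aa) = 4/27. The same toy grammar is reused in the sketches further below.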

32 In other words, a PCFG is consistent if the sum of probabilities of all complete derivations starting with S is 1. [sent-59, score-0.255]

33 An equivalent definition of consistency considers the sum of probabilities of all strings: Σw p(w) = 1. [sent-60, score-0.176]

34 We define the partition function of G as the function Z that assigns to each A ∈ N the value Z(A) = Σd,w p(A ⇒d w). [sent-67, score-0.119]

35 More generally, in later sections we will need to compute the partition function for non-consistent PCFGs. [sent-69, score-0.119]

36 We can characterize the partition function of a PCFG as a solution of a specific system of equations. [sent-70, score-0.19]

37 Following the approach in (Harris, 1963; Chi, 1999), we introduce generating functions associated with the nonterminals of the grammar. [sent-71, score-0.061]

38 For A ∈ N and α ∈ (N ∪ Σ)∗, we write f(A, α) to denote the number of occurrences of symbol A within string α. [sent-72, score-0.189]

39 For each Ak ∈ N, let mk be the number of rules in R with Ak as left-hand side. [sent-77, score-0.075]

40 Intuitively, applying the relation in (4) i times starting from (0, . . . , 0) gives, for each Ak, the sum of the probabilities of all complete derivations from Ak having depth not exceeding i. [sent-104, score-0.29]

41 The above shows that the values of the partition function provide a solution to the system of the following equations, for 1 ≤ k ≤ |N|: zk = gAk(z1, z2, . . . , z|N|). (5) [sent-133, score-0.19]
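To make the system (5) concrete with the same toy grammar as above (again my own example, not from the paper): for the rules S → S S (1/3) and S → a (2/3), the generating function is gS(z) = (1/3)z² + 2/3, so the system consists of the single equation zS = (1/3)zS² + 2/3. Its solutions are zS = 1 and zS = 2; the least one, zS = 1, is the partition function value Z(S), so this toy grammar is consistent.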

42 In the case of a general PCFG, the above equations are non-linear polynomials with positive (real) coefficients. [sent-137, score-0.077]

43 These systems are called monotone systems of polynomial equations and have been investigated by Etessami and Yannakakis (2009) and Kiefer et al. [sent-139, score-0.129]

44 The sought solution, that is, the partition function, is the least fixed point solution of X = g(X). [sent-141, score-0.19]

45 For each strongly connected component, there is a separate system of equations of the form X = g(X). [sent-144, score-0.077]

46 That is, if one strongly connected component contains nonterminal A, and another contains nonterminal B, where A ⇒∗ uBα for some u, α, then the system for the latter component must be solved first. [sent-146, score-0.16]

47 The solution for a system of equations such as those described above can be irrational and nonexpressible by radicals, even if we assume that all the probabilities of the rules in the input PCFG are rational numbers, as observed by Etessami and Yannakakis (2009). [sent-147, score-0.277]

48 Nonetheless, the partition function can still be approximated to any degree of precision by iterative computation of the relation in (4), as done for instance by Stolcke (1995) and by Abney et al. [sent-148, score-0.26]

49 This corresponds to the so-called fixed-point iteration method, which is well-known in the numerical calculus literature and is frequently applied to systems of non-linear equations because it can be easily implemented. [sent-150, score-0.158]
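A minimal sketch of this fixed-point iteration in Python (the function and variable names are mine, and the rule encoding as (left-hand side, right-hand side, probability) triples is an assumption for illustration; this is not the authors' implementation):

    def partition_function(rules, nonterminals, iterations=100):
        # Fixed-point iteration z <- g(z) starting from z = 0; under the usual
        # conditions it converges to the least fixed point, i.e. the partition function.
        z = {A: 0.0 for A in nonterminals}
        for _ in range(iterations):
            new_z = {A: 0.0 for A in nonterminals}
            for lhs, rhs, p in rules:
                term = p
                for x in rhs:
                    term *= z[x] if x in z else 1.0   # symbols outside `nonterminals` are terminals
                new_z[lhs] += term
            z = new_z
        return z

    # Toy grammar S -> S S (1/3) | a (2/3): the iterates approach Z(S) = 1.
    toy_rules = [("S", ["S", "S"], 1 / 3), ("S", ["a"], 2 / 3)]
    print(partition_function(toy_rules, {"S"}))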

50 When a number of standard conditions are met, each iteration of (4) adds a fixed number of bits to the precision of the solution; see Kelley (1995, Chapter 4). [sent-151, score-0.099]

51 Since each iteration can easily be implemented to run in polynomial time, this means that we can approximate the partition function of a PCFG in polynomial time in the size of the PCFG itself and in the number of bits of the desired precision. [sent-152, score-0.322]

52 In practical applications where large PCFGs are empirically estimated from data sets, the standard conditions mentioned above for the polynomial time approximation of the partition function are usually met. [sent-153, score-0.171]

53 However, there are some degenerate cases for which these standard conditions do not hold, resulting in exponential time behaviour of the fixed-point iteration method. [sent-154, score-0.123]

54 An alternative iterative algorithm for the approximation of the partition function has been proposed by Etessami and Yannakakis (2009), based on Newton’s method for the solution of non-linear systems of equations. [sent-156, score-0.222]

55 (2007) have shown that, after a certain number of initial iterations, Newton’s method adds a fixed number of bits to the precision of the approximated solution, even in the above mentioned cases in which the fixed-point iteration method shows exponential time behaviour. [sent-158, score-0.142]

56 However, these authors also show that, in some degenerate cases, the number of iterations needed to compute the first bit of the solution can be at least exponential in the size of the system. [sent-159, score-0.151]

57 Experiments with Newton’s method for the approximation of the partition functions of PCFGs have been carried out in several application-oriented settings, by Wojtczak and Etessami (2007) and by Nederhof and Satta (2008), showing considerable improvements over the fixed-point iteration method. [sent-160, score-0.162]
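For the one-nonterminal toy grammar used above, Newton's method on f(z) = g(z) − z can be sketched as follows (scalar case only; in the general case g is a vector of polynomials and its Jacobian is used in place of g'; the code is my own illustration, not the implementations cited here):

    def newton_least_fixed_point(g, g_prime, steps=20):
        # Newton iteration z <- z - f(z)/f'(z) for f(z) = g(z) - z, starting from 0;
        # for monotone systems this converges to the least fixed point.
        z = 0.0
        for _ in range(steps):
            z = z - (g(z) - z) / (g_prime(z) - 1.0)
        return z

    # g(z) = (1/3) z^2 + 2/3 for the toy grammar; the result approaches 1.
    print(newton_least_fixed_point(lambda z: z * z / 3 + 2 / 3, lambda z: 2 * z / 3))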

58 It was shown by Bar-Hillel et al. (1964) that context-free languages are closed under intersection with regular languages. [sent-162, score-0.057]

59 Their proof relied on the construction of a new CFG out of an input CFG and an input finite automaton. [sent-163, score-0.191]

60 Here we extend that construction by letting the input grammar be a probabilistic CFG. [sent-164, score-0.097]

61 To avoid a number of technical complications, we assume the finite automaton has no epsilon transitions, and has only one final state. [sent-166, score-0.44]

62 In the context of our use of this construction in the following sections, these restrictions are without loss of generality. [sent-167, score-0.048]

63 Thus, a finite automaton (FA) M is represented by a 5-tuple (Σ, Q, q0, qf, ∆), where Σ and Q are two finite sets of terminals and states, respectively, q0 is the initial state, qf is the final state, and ∆ is a finite set of transitions, each of the form s →a t, where s, t ∈ Q and a ∈ Σ. [sent-168, score-0.833]

64 A complete computation of M accepting string w = a1 · · · an is a sequence c = τ1 · · · τn of transitions such that τi = (si−1 →ai si) for each i (1 ≤ i ≤ n), for some s0, s1, . . . , sn with s0 = q0 and sn = qf. [sent-169, score-0.538]

65 The language of all strings accepted by M is denoted L(M). [sent-173, score-0.134]

66 A FA is unambiguous if at most one complete computation exists for each accepted string. [sent-175, score-0.325]

67 A FA is deterministic if there is at most one transition s →a t for each s and a. [sent-176, score-0.085]

68 For a FA M as above and a PCFG G = (Σ, N, S, R, p) with the same set of terminals, we construct a new PCFG G0 = (Σ, N0, S0, R0, p0), where N0 = Q × (Σ ∪ N) × Q, S0 = (q0, S, qf), and R0 is the set of rules that is obtained as follows. [sent-177, score-0.058]

69 For each rule A → X1 · · · Xm in R and all s0, . . . , sm with si ∈ Q, 0 ≤ i ≤ m, and m ≥ 0, let (s0, A, sm) → (s0, X1, s1) · · · (sm−1, Xm, sm) be in R0; if m = 0, the new rule is of the form (s0, A, s0) → ε. [sent-181, score-0.098]
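A sketch of this weighted intersection in Python (my own illustrative encoding; in particular, handling terminals through rules (s, a, t) → a with probability 1, one for each transition s →a t, is an assumption on my part, since that detail is not spelled out in the excerpt above):

    from itertools import product

    def intersect_pcfg_fa(rules, start, states, q0, qf, transitions):
        # rules: list of (A, [X1, ..., Xm], p); transitions: set of (s, a, t) triples.
        # Returns the rules and start symbol of the product PCFG G0.
        new_rules = []
        for s, a, t in transitions:
            new_rules.append(((s, a, t), [a], 1.0))       # assumed terminal rules
        for A, rhs, p in rules:
            m = len(rhs)
            for seq in product(states, repeat=m + 1):     # thread states through the rhs
                lhs = (seq[0], A, seq[m])
                body = [(seq[i], rhs[i], seq[i + 1]) for i in range(m)]
                new_rules.append((lhs, body, p))          # empty body when m = 0
        return new_rules, (q0, start, qf)

The enumeration over state sequences is exponential in the rule length m, which is one reason why, as noted further below, the input grammar is transformed so that right-hand sides have at most two members.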

70 On the basis of G0, G and M, it is not difficult to see that each derivation d0 in G0 deriving string w corresponds to a unique derivation d in G deriving the same string and a unique computation c in M accepting the same string. [sent-190, score-0.745]

71 Conversely, if there is a derivation d in G deriving string w, and some computation c in M accepting the same string, then the pair of d and c corresponds to a unique derivation d0 in G0 deriving the same string w. [sent-191, score-0.859]

72 Furthermore, the probabilities of d and d0 are equal, by definition of p0. [sent-192, score-0.094]

73 Let us now assume that each string w is accepted by at most one computation, i.e., that M is unambiguous. [sent-193, score-0.185]

74 If string w is accepted by M, then there are as many derivations deriving w in G0 as there are in G. [sent-197, score-0.182]

75 If w is not accepted by M, then there are zero derivations in G0 deriving w. [sent-198, score-0.061]

76 The input grammar is transformed to let each right-hand side have at most two members. [sent-205, score-0.049]

77 4 Obtaining unambiguous FAs In the previous section, we explained that unambiguous finite automata have special properties with respect to the grammar G0 that we may construct out of a FA M and a PCFG G. [sent-208, score-0.424]

78 In this section, such automata will be obtained for the special case of finite automata accepting the language of all strings with given infix w ∈ Σ∗: Linfix(w) = {xwy | x, y ∈ Σ∗}. [sent-210, score-0.928]

79 Furthermore, there seem to be no practical algorithms that turn FAs into equivalent unambiguous FAs other than the algorithms that also determinize them. [sent-212, score-0.09]

80 Therefore, we will henceforth concentrate on deterministic rather than unambiguous automata. [sent-213, score-0.175]

81 Given a string w = a1 · · · an, a finite automaton accepting Linfix(w) can be straightforwardly constructed. [sent-214, score-0.705]

82 It has states s0, . . . , sn, transitions s0 →a s0 and sn →a sn for each a ∈ Σ, and a transition si−1 →ai si for each i (1 ≤ i ≤ n). [sent-218, score-0.246]

83 The initial state is s0 and the final state is sn. [sent-219, score-0.053]
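A minimal sketch of this (nondeterministic) construction, with the states s0, . . . , sn represented as the integers 0..n (the encoding and names are mine):

    def infix_nfa(w, alphabet):
        # Loops on state 0 and state n over the whole alphabet, plus a chain
        # of transitions spelling out w; the automaton accepts Linfix(w).
        n = len(w)
        transitions = set()
        for a in alphabet:
            transitions.add((0, a, 0))
            transitions.add((n, a, n))
        for i, a in enumerate(w, start=1):
            transitions.add((i - 1, a, i))
        return transitions, 0, n    # transitions, initial state, final state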

84 One way to make this automaton deterministic is to apply the general algorithm of determinization of finite automata; see e.g. [sent-221, score-0.525]

85 An alternative approach is to construct a deterministic finite automaton directly from w, in line with the Knuth-Morris-Pratt algorithm (Knuth et al. [sent-225, score-0.525]

86 Both approaches result in the same deterministic FA, which we denote by Iw. [sent-227, score-0.085]

87 However, the latter approach is easier to implement in such a way that the time complexity of constructing the automaton is linear in |w|. [sent-228, score-0.297]

88 The initial state is t0 and the final state is tn. [sent-235, score-0.106]

89 If the automaton is in state tn, then this means that w is an infix of b1 · · · bj. [sent-237, score-0.869]

90 In more detail, for each i (1 ≤ i ≤ n) and each a ∈ Σ, there is a transition ti−1 →a tj, where j is the length of the longest string that is both a prefix of w and a suffix of a1 · · · ai−1a. [sent-238, score-0.354]

91 To ensure that we remain in the final state once an occurrence of infix w has been found, we also add transitions tn →a tn for each a ∈ Σ. [sent-240, score-0.783]
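A sketch of the direct deterministic construction in the style of the Knuth-Morris-Pratt automaton (code and names are mine, not the authors'; state i stands for "the longest suffix of the input read so far that is a prefix of w has length i", and state n is absorbing):

    def infix_dfa(w, alphabet):
        n = len(w)
        delta = [dict() for _ in range(n + 1)]
        for a in alphabet:
            delta[0][a] = 0
        if n > 0:
            delta[0][w[0]] = 1
        x = 0                                  # state reached on w[1:i], used on mismatches
        for i in range(1, n):
            for a in alphabet:
                delta[i][a] = delta[x][a]      # mismatch: fall back as in KMP
            delta[i][w[i]] = i + 1             # match: advance
            x = delta[x][w[i]]
        for a in alphabet:
            delta[n][a] = n                    # once w has been seen, stay in the final state
        return delta                           # initial state 0, final state n

Each transition ti−1 →a tj that the table encodes goes to the state j described in the sentence above, and the whole table is filled in O(|Σ| · |w|) time.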

92 5 Infix probability With the material developed in the previous sections, the problem of computing the infix probabilities can be effectively solved. [sent-242, score-0.646]

93 Our goal is to compute, for given infix w ∈ Σ∗ and PCFG G = (Σ, N, S, R, p): σinfix(w, G) = Σz∈Linfix(w) p(z). [sent-243, score-0.519]

94 In Section 4 we have shown the construction of the finite automaton Iw accepting Linfix(w), by which we obtain: σinfix(w, G) = Σz∈L(Iw) p(z). [sent-244, score-0.629]

95 As Iw is deterministic and therefore unambiguous, the results from Section 3 apply, and if G0 = (Σ, N0, S0, R0, p0) is the PCFG constructed out of G and Iw, then: σinfix(w, G) = Σz p0(z). [sent-245, score-0.085]

96 Finally, we can compute the above sum by applying the iterative method discussed in Section 2. [sent-246, score-0.11]
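Putting the pieces together, an end-to-end sketch that reuses the illustrative helpers defined earlier (infix_dfa, intersect_pcfg_fa, partition_function); the whole pipeline, including those helper names, is my reconstruction of the procedure described in this section, not the authors' code:

    def infix_probability(rules, start, w, alphabet, iterations=500):
        # 1. Deterministic automaton Iw accepting Linfix(w).
        delta = infix_dfa(w, alphabet)
        states = list(range(len(w) + 1))
        transitions = {(s, a, delta[s][a]) for s in states for a in alphabet}
        # 2. Product PCFG G0 of G and Iw (construction of Section 3).
        new_rules, new_start = intersect_pcfg_fa(
            rules, start, states, 0, len(w), transitions)
        # 3. Partition function of G0 at its start symbol = the infix probability.
        #    Triples that never occur as a left-hand side must still count as
        #    nonterminals, so that they keep value 0 rather than 1.
        nonterminals = {lhs for lhs, _, _ in new_rules}
        nonterminals |= {x for _, rhs, _ in new_rules for x in rhs if isinstance(x, tuple)}
        z = partition_function(new_rules, nonterminals, iterations)
        return z[new_start]

    # Toy grammar S -> S S (1/3) | a (2/3): only the string "a" (probability 2/3)
    # lacks "aa" as an infix, so the result should approach 1 - 2/3 = 1/3.
    toy_rules = [("S", ["S", "S"], 1 / 3), ("S", ["a"], 2 / 3)]
    print(infix_probability(toy_rules, "S", "aa", {"a"}))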

97 First, we can replace the infix w by a sequence of infixes w1, . . . [sent-248, score-0.609]

98 . . . , wm, which have to occur in the given order, one strictly after the other, with arbitrary infixes in between, giving an island probability σisland(w1, . . . , wm, G). [sent-251, score-0.055]

99 First we construct separate automata Iwj (1 ≤ j ≤ m) as explained in Section 4. [sent-263, score-0.052]

100 These automata are then composed into a single automaton I(w1, . . . , wm). [sent-264, score-0.33]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('infix', 0.519), ('automaton', 0.297), ('pcfg', 0.261), ('gak', 0.164), ('pcfgs', 0.148), ('finite', 0.143), ('prefix', 0.142), ('accepting', 0.141), ('etessami', 0.137), ('ak', 0.134), ('string', 0.124), ('partition', 0.119), ('yannakakis', 0.109), ('computation', 0.109), ('iw', 0.105), ('deriving', 0.096), ('probabilities', 0.094), ('unambiguous', 0.09), ('deterministic', 0.085), ('fas', 0.082), ('linfix', 0.082), ('nonterminal', 0.08), ('equations', 0.077), ('ai', 0.077), ('fa', 0.074), ('strings', 0.073), ('solution', 0.071), ('aik', 0.071), ('corazza', 0.071), ('derivation', 0.067), ('complete', 0.065), ('sn', 0.065), ('nederhof', 0.064), ('newton', 0.064), ('qf', 0.064), ('transitions', 0.064), ('nonterminals', 0.061), ('accepted', 0.061), ('tn', 0.058), ('intersection', 0.057), ('bits', 0.056), ('infixes', 0.055), ('iwj', 0.055), ('kiefer', 0.055), ('limi', 0.055), ('tthrien', 0.055), ('xwy', 0.055), ('state', 0.053), ('satta', 0.053), ('si', 0.052), ('polynomial', 0.052), ('automata', 0.052), ('derivations', 0.05), ('grammar', 0.049), ('construction', 0.048), ('ga', 0.048), ('fred', 0.047), ('iann', 0.047), ('trhee', 0.047), ('ub', 0.047), ('sum', 0.046), ('sm', 0.046), ('iteration', 0.043), ('exponential', 0.043), ('terminals', 0.043), ('gn', 0.043), ('mthe', 0.043), ('longest', 0.041), ('bj', 0.04), ('mk', 0.04), ('oaf', 0.04), ('iss', 0.039), ('literature', 0.038), ('degenerate', 0.037), ('consistency', 0.036), ('chi', 0.035), ('dw', 0.035), ('xm', 0.035), ('occurrences', 0.035), ('sequence', 0.035), ('rules', 0.035), ('depth', 0.035), ('jelinek', 0.033), ('wm', 0.033), ('rewritten', 0.033), ('xl', 0.033), ('idn', 0.033), ('xw', 0.033), ('probability', 0.033), ('discussed', 0.032), ('iterative', 0.032), ('ims', 0.032), ('occurrence', 0.031), ('pointed', 0.031), ('cfg', 0.031), ('ik', 0.031), ('alphabet', 0.031), ('symbol', 0.03), ('starts', 0.03), ('nm', 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999923 31 emnlp-2011-Computation of Infix Probabilities for Probabilistic Context-Free Grammars

Author: Mark-Jan Nederhof ; Giorgio Satta

Abstract: The notion of infix probability has been introduced in the literature as a generalization of the notion of prefix (or initial substring) probability, motivated by applications in speech recognition and word error correction. For the case where a probabilistic context-free grammar is used as language model, methods for the computation of infix probabilities have been presented in the literature, based on various simplifying assumptions. Here we present a solution that applies to the problem in its full generality.

2 0.11622929 66 emnlp-2011-Hierarchical Phrase-based Translation Representations

Author: Gonzalo Iglesias ; Cyril Allauzen ; William Byrne ; Adria de Gispert ; Michael Riley

Abstract: This paper compares several translation representations for a synchronous context-free grammar parse including CFGs/hypergraphs, finite-state automata (FSA), and pushdown automata (PDA). The representation choice is shown to determine the form and complexity of target LM intersection and shortest-path algorithms that follow. Intersection, shortest path, FSA expansion and RTN replacement algorithms are presented for PDAs. Chinese-toEnglish translation experiments using HiFST and HiPDT, FSA and PDA-based decoders, are presented using admissible (or exact) search, possible for HiFST with compact SCFG rulesets and HiPDT with compact LMs. For large rulesets with large LMs, we introduce a two-pass search strategy which we then analyze in terms of search errors and translation performance.

3 0.10242131 52 emnlp-2011-Exact Inference for Generative Probabilistic Non-Projective Dependency Parsing

Author: Shay B. Cohen ; Carlos Gomez-Rodriguez ; Giorgio Satta

Abstract: We describe a generative model for nonprojective dependency parsing based on a simplified version of a transition system that has recently appeared in the literature. We then develop a dynamic programming parsing algorithm for our model, and derive an insideoutside algorithm that can be used for unsupervised learning of non-projective dependency trees.

4 0.090158887 111 emnlp-2011-Reducing Grounded Learning Tasks To Grammatical Inference

Author: Benjamin Borschinger ; Bevan K. Jones ; Mark Johnson

Abstract: It is often assumed that ‘grounded’ learning tasks are beyond the scope of grammatical inference techniques. In this paper, we show that the grounded task of learning a semantic parser from ambiguous training data as discussed in Kim and Mooney (2010) can be reduced to a Probabilistic Context-Free Grammar learning task in a way that gives state of the art results. We further show that additionally letting our model learn the language’s canonical word order improves its performance and leads to the highest semantic parsing f-scores previously reported in the literature.1

5 0.088700138 16 emnlp-2011-Accurate Parsing with Compact Tree-Substitution Grammars: Double-DOP

Author: Federico Sangati ; Willem Zuidema

Abstract: We present a novel approach to Data-Oriented Parsing (DOP). Like other DOP models, our parser utilizes syntactic fragments of arbitrary size from a treebank to analyze new sentences, but, crucially, it uses only those which are encountered at least twice. This criterion allows us to work with a relatively small but representative set of fragments, which can be employed as the symbolic backbone of several probabilistic generative models. For parsing we define a transform-backtransform approach that allows us to use standard PCFG technology, making our results easily replicable. According to standard Parseval metrics, our best model is on par with many state-ofthe-art parsers, while offering some complementary benefits: a simple generative probability model, and an explicit representation of the larger units of grammar.

6 0.085572243 58 emnlp-2011-Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training

7 0.075319186 146 emnlp-2011-Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance

8 0.073748127 115 emnlp-2011-Relaxed Cross-lingual Projection of Constituent Syntax

9 0.056238763 113 emnlp-2011-Relation Acquisition using Word Classes and Partial Patterns

10 0.053293105 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax

11 0.049296364 10 emnlp-2011-A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions

12 0.04895914 51 emnlp-2011-Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation

13 0.047825094 15 emnlp-2011-A novel dependency-to-string model for statistical machine translation

14 0.047312513 5 emnlp-2011-A Fast Re-scoring Strategy to Capture Long-Distance Dependencies

15 0.043636028 48 emnlp-2011-Enhancing Chinese Word Segmentation Using Unlabeled Data

16 0.041025441 83 emnlp-2011-Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation

17 0.040454984 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation

18 0.037389707 96 emnlp-2011-Multilayer Sequence Labeling

19 0.03627304 39 emnlp-2011-Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model

20 0.035900608 65 emnlp-2011-Heuristic Search for Non-Bottom-Up Tree Structure Prediction


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.143), (1, 0.046), (2, -0.017), (3, 0.002), (4, 0.023), (5, -0.02), (6, -0.121), (7, -0.103), (8, 0.045), (9, -0.036), (10, -0.019), (11, 0.117), (12, -0.04), (13, -0.007), (14, 0.06), (15, -0.112), (16, 0.006), (17, 0.066), (18, 0.016), (19, -0.009), (20, 0.174), (21, -0.149), (22, 0.021), (23, -0.044), (24, 0.004), (25, 0.22), (26, -0.046), (27, 0.115), (28, 0.217), (29, 0.015), (30, -0.225), (31, 0.058), (32, 0.07), (33, 0.171), (34, 0.019), (35, 0.186), (36, 0.186), (37, -0.022), (38, -0.087), (39, -0.002), (40, -0.077), (41, -0.053), (42, -0.039), (43, -0.074), (44, 0.153), (45, 0.056), (46, -0.053), (47, -0.068), (48, -0.027), (49, 0.168)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97444677 31 emnlp-2011-Computation of Infix Probabilities for Probabilistic Context-Free Grammars

Author: Mark-Jan Nederhof ; Giorgio Satta

Abstract: The notion of infix probability has been introduced in the literature as a generalization of the notion of prefix (or initial substring) probability, motivated by applications in speech recognition and word error correction. For the case where a probabilistic context-free grammar is used as language model, methods for the computation of infix probabilities have been presented in the literature, based on various simplifying assumptions. Here we present a solution that applies to the problem in its full generality.

2 0.58144915 52 emnlp-2011-Exact Inference for Generative Probabilistic Non-Projective Dependency Parsing

Author: Shay B. Cohen ; Carlos Gomez-Rodriguez ; Giorgio Satta

Abstract: We describe a generative model for nonprojective dependency parsing based on a simplified version of a transition system that has recently appeared in the literature. We then develop a dynamic programming parsing algorithm for our model, and derive an insideoutside algorithm that can be used for unsupervised learning of non-projective dependency trees.

3 0.56226218 66 emnlp-2011-Hierarchical Phrase-based Translation Representations

Author: Gonzalo Iglesias ; Cyril Allauzen ; William Byrne ; Adria de Gispert ; Michael Riley

Abstract: This paper compares several translation representations for a synchronous context-free grammar parse including CFGs/hypergraphs, finite-state automata (FSA), and pushdown automata (PDA). The representation choice is shown to determine the form and complexity of target LM intersection and shortest-path algorithms that follow. Intersection, shortest path, FSA expansion and RTN replacement algorithms are presented for PDAs. Chinese-toEnglish translation experiments using HiFST and HiPDT, FSA and PDA-based decoders, are presented using admissible (or exact) search, possible for HiFST with compact SCFG rulesets and HiPDT with compact LMs. For large rulesets with large LMs, we introduce a two-pass search strategy which we then analyze in terms of search errors and translation performance.

4 0.4777329 16 emnlp-2011-Accurate Parsing with Compact Tree-Substitution Grammars: Double-DOP

Author: Federico Sangati ; Willem Zuidema

Abstract: We present a novel approach to Data-Oriented Parsing (DOP). Like other DOP models, our parser utilizes syntactic fragments of arbitrary size from a treebank to analyze new sentences, but, crucially, it uses only those which are encountered at least twice. This criterion allows us to work with a relatively small but representative set of fragments, which can be employed as the symbolic backbone of several probabilistic generative models. For parsing we define a transform-backtransform approach that allows us to use standard PCFG technology, making our results easily replicable. According to standard Parseval metrics, our best model is on par with many state-ofthe-art parsers, while offering some complementary benefits: a simple generative probability model, and an explicit representation of the larger units of grammar.

5 0.44544739 111 emnlp-2011-Reducing Grounded Learning Tasks To Grammatical Inference

Author: Benjamin Borschinger ; Bevan K. Jones ; Mark Johnson

Abstract: It is often assumed that ‘grounded’ learning tasks are beyond the scope of grammatical inference techniques. In this paper, we show that the grounded task of learning a semantic parser from ambiguous training data as discussed in Kim and Mooney (2010) can be reduced to a Probabilistic Context-Free Grammar learning task in a way that gives state of the art results. We further show that additionally letting our model learn the language’s canonical word order improves its performance and leads to the highest semantic parsing f-scores previously reported in the literature.1

6 0.27423796 47 emnlp-2011-Efficient retrieval of tree translation examples for Syntax-Based Machine Translation

7 0.25822923 115 emnlp-2011-Relaxed Cross-lingual Projection of Constituent Syntax

8 0.23706448 96 emnlp-2011-Multilayer Sequence Labeling

9 0.22547829 10 emnlp-2011-A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions

10 0.22454989 146 emnlp-2011-Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance

11 0.20221666 58 emnlp-2011-Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training

12 0.1814557 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax

13 0.17488585 23 emnlp-2011-Bootstrapped Named Entity Recognition for Product Attribute Extraction

14 0.16888429 48 emnlp-2011-Enhancing Chinese Word Segmentation Using Unlabeled Data

15 0.15886725 51 emnlp-2011-Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation

16 0.14664963 131 emnlp-2011-Syntactic Decision Tree LMs: Random Selection or Intelligent Design?

17 0.14409241 110 emnlp-2011-Ranking Human and Machine Summarization Systems

18 0.14352711 39 emnlp-2011-Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model

19 0.13774604 18 emnlp-2011-Analyzing Methods for Improving Precision of Pivot Based Bilingual Dictionaries

20 0.13680781 113 emnlp-2011-Relation Acquisition using Word Classes and Partial Patterns


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(15, 0.011), (23, 0.084), (36, 0.022), (37, 0.034), (45, 0.046), (53, 0.014), (54, 0.024), (57, 0.021), (62, 0.014), (64, 0.022), (66, 0.035), (68, 0.189), (69, 0.2), (79, 0.077), (82, 0.052), (87, 0.024), (90, 0.029), (96, 0.019), (98, 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.84270263 31 emnlp-2011-Computation of Infix Probabilities for Probabilistic Context-Free Grammars

Author: Mark-Jan Nederhof ; Giorgio Satta

Abstract: The notion of infix probability has been introduced in the literature as a generalization of the notion of prefix (or initial substring) probability, motivated by applications in speech recognition and word error correction. For the case where a probabilistic context-free grammar is used as language model, methods for the computation of infix probabilities have been presented in the literature, based on various simplifying assumptions. Here we present a solution that applies to the problem in its full generality.

2 0.70850247 51 emnlp-2011-Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation

Author: Yin-Wen Chang ; Michael Collins

Abstract: This paper describes an algorithm for exact decoding of phrase-based translation models, based on Lagrangian relaxation. The method recovers exact solutions, with certificates of optimality, on over 99% of test examples. The method is much more efficient than approaches based on linear programming (LP) or integer linear programming (ILP) solvers: these methods are not feasible for anything other than short sentences. We compare our method to MOSES (Koehn et al., 2007), and give precise estimates of the number and magnitude of search errors that MOSES makes.

3 0.70088333 109 emnlp-2011-Random Walk Inference and Learning in A Large Scale Knowledge Base

Author: Ni Lao ; Tom Mitchell ; William W. Cohen

Abstract: t om . We consider the problem of performing learning and inference in a large scale knowledge base containing imperfect knowledge with incomplete coverage. We show that a soft inference procedure based on a combination of constrained, weighted, random walks through the knowledge base graph can be used to reliably infer new beliefs for the knowledge base. More specifically, we show that the system can learn to infer different target relations by tuning the weights associated with random walks that follow different paths through the graph, using a version of the Path Ranking Algorithm (Lao and Cohen, 2010b). We apply this approach to a knowledge base of approximately 500,000 beliefs extracted imperfectly from the web by NELL, a never-ending language learner (Carlson et al., 2010). This new system improves significantly over NELL’s earlier Horn-clause learning and inference method: it obtains nearly double the precision at rank 100, and the new learning method is also applicable to many more inference tasks.

4 0.65303242 73 emnlp-2011-Improving Bilingual Projections via Sparse Covariance Matrices

Author: Jagadeesh Jagarlamudi ; Raghavendra Udupa ; Hal Daume III ; Abhijit Bhole

Abstract: Mapping documents into an interlingual representation can help bridge the language barrier of cross-lingual corpora. Many existing approaches are based on word co-occurrences extracted from aligned training data, represented as a covariance matrix. In theory, such a covariance matrix should represent semantic equivalence, and should be highly sparse. Unfortunately, the presence of noise leads to dense covariance matrices which in turn leads to suboptimal document representations. In this paper, we explore techniques to recover the desired sparsity in covariance matrices in two ways. First, we explore word association measures and bilingual dictionaries to weigh the word pairs. Later, we explore different selection strategies to remove the noisy pairs based on the association scores. Our experimental results on the task of aligning comparable documents shows the efficacy of sparse covariance matrices on two data sets from two different language pairs.

5 0.56575924 66 emnlp-2011-Hierarchical Phrase-based Translation Representations

Author: Gonzalo Iglesias ; Cyril Allauzen ; William Byrne ; Adria de Gispert ; Michael Riley

Abstract: This paper compares several translation representations for a synchronous context-free grammar parse including CFGs/hypergraphs, finite-state automata (FSA), and pushdown automata (PDA). The representation choice is shown to determine the form and complexity of target LM intersection and shortest-path algorithms that follow. Intersection, shortest path, FSA expansion and RTN replacement algorithms are presented for PDAs. Chinese-toEnglish translation experiments using HiFST and HiPDT, FSA and PDA-based decoders, are presented using admissible (or exact) search, possible for HiFST with compact SCFG rulesets and HiPDT with compact LMs. For large rulesets with large LMs, we introduce a two-pass search strategy which we then analyze in terms of search errors and translation performance.

6 0.47642356 58 emnlp-2011-Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training

7 0.47397548 65 emnlp-2011-Heuristic Search for Non-Bottom-Up Tree Structure Prediction

8 0.47150031 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation

9 0.46919015 111 emnlp-2011-Reducing Grounded Learning Tasks To Grammatical Inference

10 0.46729401 8 emnlp-2011-A Model of Discourse Predictions in Human Sentence Processing

11 0.466369 107 emnlp-2011-Probabilistic models of similarity in syntactic context

12 0.46467489 34 emnlp-2011-Corpus-Guided Sentence Generation of Natural Images

13 0.46318349 85 emnlp-2011-Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming

14 0.46157002 87 emnlp-2011-Lexical Generalization in CCG Grammar Induction for Semantic Parsing

15 0.46059406 68 emnlp-2011-Hypotheses Selection Criteria in a Reranking Framework for Spoken Language Understanding

16 0.46015704 53 emnlp-2011-Experimental Support for a Categorical Compositional Distributional Model of Meaning

17 0.45728281 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax

18 0.45672449 59 emnlp-2011-Fast and Robust Joint Models for Biomedical Event Extraction

19 0.45440596 123 emnlp-2011-Soft Dependency Constraints for Reordering in Hierarchical Phrase-Based Translation

20 0.4544009 35 emnlp-2011-Correcting Semantic Collocation Errors with L1-induced Paraphrases