acl acl2011 acl2011-250 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Mark-Jan Nederhof ; Giorgio Satta
Abstract: We present a method for the computation of prefix probabilities for synchronous context-free grammars. Our framework is fairly general and relies on the combination of a simple, novel grammar transformation and standard techniques to bring grammars into normal forms.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present a method for the computation of prefix probabilities for synchronous context-free grammars. [sent-3, score-0.842]
2 Our framework is fairly general and relies on the combination of a simple, novel grammar transformation and standard techniques to bring grammars into normal forms. [sent-4, score-0.311]
3 Several such statistical models that have been investigated in the literature are based on synchronous rewriting or tree transduction. [sent-6, score-0.297]
4 Probabilistic synchronous context-free grammars (PSCFGs) are one among the most popular examples of such models. [sent-7, score-0.393]
5 PSCFGs subsume several syntax-based statistical translation models, as for instance the stochastic inversion transduction grammars of Wu (1997), the statistical model used by the Hiero system of Chiang (2007), and systems which extract rules from parsed text, as in Galley et al. [sent-8, score-0.244]
6 We are asked to compute the probability that a sentence generated by our model starts with a prefix string v given as input. [sent-18, score-0.505]
7 This quantity is defined as the (possibly infinite) sum of the probabilities of all strings of the form vw, for any string w over the alphabet of the model. [sent-19, score-0.333]
8 This paper investigates the problem of computing prefix probabilities for PSCFGs. [sent-25, score-0.479]
9 In this context, a pair of strings v1 and v2 is given as input, and we are asked to compute the probability that any string in the source language starting with prefix v1 is translated into any string in the target language starting with prefix v2. [sent-26, score-1.098]
10 This probability is more precisely defined as the sum of the probabilities of translation pairs of the form [v1w1 , v2w2], for any strings w1 and w2. [sent-27, score-0.356]
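To make this definition concrete, here is a small illustrative sketch (ours, not the paper's) that approximates the joint prefix probability by brute-force enumeration. The names p_joint, sigma1 and sigma2 are hypothetical placeholders for a model's pair-probability function and the two alphabets, and the bound max_extra truncates the possibly infinite sum over continuations w1, w2.

```python
from itertools import product

def joint_prefix_prob(p_joint, sigma1, sigma2, v1, v2, max_extra=3):
    # Brute-force approximation: sum the model probability of every pair
    # [v1 w1, v2 w2], truncating the (possibly infinite) sum over
    # continuations to those of length at most max_extra.
    total = 0.0
    for n1 in range(max_extra + 1):
        for n2 in range(max_extra + 1):
            for w1 in product(sigma1, repeat=n1):
                for w2 in product(sigma2, repeat=n2):
                    total += p_joint(list(v1) + list(w1), list(v2) + list(w2))
    return total
```

This is only a definition-level illustration; the point of the paper is precisely that the sum can be computed exactly, without any enumeration, via a grammar transformation.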
11 A special case of prefix probability for PSCFGs is the right prefix probability. [sent-28, score-0.776]
12 This is defined as the probability that some (complete) input string w in the source language is translated into a string in the target language starting with an input prefix v. [sent-29, score-0.583]
13 Our solution to the problem of computing prefix probabilities is formulated in quite different terms from the solutions by Jelinek and Lafferty (1991) and by Stolcke (1995) for probabilistic context-free grammars. [sent-33, score-0.546]
14 In this paper we reduce the computation of prefix probabilities for PSCFGs to the computation of inside probabilities under the same model. [sent-34, score-0.836]
15 Computation of inside probabilities for PSCFGs is a well-known problem that can be solved using off-the-shelf algorithms that extend basic parsing algorithms. [sent-35, score-0.233]
16 Our reduction is a novel grammar transformation, and the proof of correctness proceeds by fairly conventional techniques from formal language theory, relying on the correctness of standard methods for the computation of inside probabilities for PSCFG. [sent-36, score-0.443]
17 Our method for computing the prefix probabilities for PSCFGs runs in exponential time, since that is the running time of existing methods for computing the inside probabilities for PSCFGs. [sent-38, score-0.633]
18 It is unlikely this can be improved, because the recognition problem for PSCFG is NP-complete, as established by Satta and Peserico (2005), and there is a straightforward reduction from the recognition problem for PSCFGs to the problem of computing the prefix probabilities for PSCFGs. [sent-39, score-0.553]
19 2 Definitions In this section we introduce basic definitions related to synchronous context-free grammars and their probabilistic extension; our notation follows Satta and Peserico (2005). [sent-40, score-0.46]
20 In what follows we need to represent bijections between the occurrences of nonterminals in two strings over N ∪ Σ. [sent-42, score-0.275]
21 Two strings γ1 , γ2 ∈ VI∗ are synchronous if each index from N occurs at most once in γ1 and at most once in γ2, and index(γ1) = index(γ2). [sent-47, score-0.487]
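As an illustration of this definition, the following sketch (our own, with an assumed representation) checks the synchronous condition for two indexed strings, each given as a list whose items are either plain terminal symbols or (nonterminal, index) pairs.

```python
from collections import Counter

def index_counts(gamma):
    # multiset of indices carried by the indexed nonterminals in gamma
    return Counter(item[1] for item in gamma if isinstance(item, tuple))

def are_synchronous(gamma1, gamma2):
    c1, c2 = index_counts(gamma1), index_counts(gamma2)
    # each index occurs at most once in each string, and index(gamma1) = index(gamma2)
    return (all(v == 1 for v in c1.values())
            and all(v == 1 for v in c2.values())
            and set(c1) == set(c2))
```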
22 A synchronous context-free grammar (SCFG) is a tuple G = (N, Σ, P, S), where N and Σ are finite, disjoint sets of nonterminal and terminal symbols, respectively, S ∈ N is the start symbol and P is a finite set of synchronous rules. [sent-53, score-0.829]
23 Each synchronous rule has the form s : [A1 → α1, A2 → α2], where A1, A2 ∈ N and where α1, α2 ∈ VI∗ are synchronous strings. [sent-54, score-0.759]
24 We refer to A1 → α1 and A2 → α2, respectively, as the left and right components of rule s. [sent-57, score-0.265]
25 This is done in such a way that the result is once more a pair of synchronous strings. [sent-59, score-0.337]
26 Let γ1, γ2 ∈ VI∗ be synchronous strings. [sent-63, score-0.403]
27 Note that δ1 , δ2 above are guaranteed to be synchronous strings, because α1 and α2 are synchronous strings and because of (i) above. [sent-66, score-0.7]
28 Note also that, for a given pair [γ1 , γ2] of synchronous strings, an index t and a rule s, there may be infinitely many choices of reindexing f such that the above constraints are satisfied. [sent-67, score-0.673]
29 We say the pair [A1, A2] of nonterminals is linked (in G) if there is a rule of the form s : [A1 → α1 , A2 → α2] . [sent-69, score-0.38]
30 The set of linked nonterminal pairs is denoted by N[2]. A derivation is a sequence σ = s1 s2 · · · sd of synchronous rules si ∈ P with d ≥ 0 (σ = ε for d = 0) [sent-70, score-0.575]
31 such that [γ1i−1, γ2i−1] ⇒siG [γ1i, γ2i] for every i with 1 ≤ i ≤ d, and synchronous strings [γ1i, γ2i] with 0 ≤ i ≤ d. [sent-71, score-0.403]
32 When we want to focus on the specific synchronous strings being derived, we also write derivations in the form [γ10, γ20] ⇒σG [γ1d, γ2d], and we write [γ10, γ20] ⇒∗G [γ1d, γ2d] when σ is not further specified. [sent-73, score-0.495]
33 Analogously to standard terminology for context-free grammars, we call a SCFG reduced if every rule occurs in at least one derivation σ ∈ D(G, [w1, w2]), for some w1, w2 ∈ Σ∗. [sent-75, score-0.216]
34 The size of a synchronous rule s : [A1 → α1, A2 → α2] is defined as |s| = |A1α1A2α2|. [sent-79, score-0.462]
35 We say that G is proper if for each pair [A1, A2] ∈ N[2] we have: Σ_{s : [A1 → α1, A2 → α2]} pG(s) = 1. Intuitively, properness ensures that where a pair of nonterminals in two synchronous strings can be rewritten, there is a probability distribution over the applicable rules. [sent-82, score-0.724]
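The properness condition is easy to check mechanically. The sketch below assumes a rule is stored as a (probability, linked left-hand-side pair, right-hand sides) triple; this representation is ours, not the paper's.

```python
from collections import defaultdict

def is_proper(rules, tol=1e-9):
    # For each linked pair [A1, A2], the probabilities of all rules
    # s : [A1 -> alpha1, A2 -> alpha2] must sum to one.
    mass = defaultdict(float)
    for prob, lhs_pair, _rhs in rules:
        mass[lhs_pair] += prob
    return all(abs(total - 1.0) <= tol for total in mass.values())
```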
36 One of its side conditions requires a synchronous rule in P of the form: s : [A1 → u10 A11^t1 u11 · · · u1,r−1 A1r^tr u1r, A2 → u20 A2π(1)^t1 u21 · · · u2,r−1 A2π(r)^tr u2r] (2). Observe that, in the right-hand sides of the two rule components above, nonterminals A1i and A2π−1(i), 1 ≤ i ≤ r, have the same index. [sent-95, score-0.851]
37 In the inference rule in figure 1 there are 2(r + 1) variables that can be bound to positions in w1, and as many that can be bound to positions in w2. [sent-100, score-0.217]
38 The recognition algorithm above can easily be turned into a parsing algorithm by letting an implementation keep track of which items were derived from which other items, as instantiations of the consequent and the antecedents, respectively, of the inference rule in figure 1. [sent-106, score-0.239]
39 To explain the basic idea, let us first assume that each item can be inferred in finitely many ways by the inference rule in figure 1. [sent-108, score-0.213]
40 Each instantiation of the inference rule should be associated with a term that is computed by multiplying the probability of the involved rule s and the product of all probabilities previously associated with the instantiations of the antecedents. [sent-109, score-0.496]
41 The probability associated with an item is then computed as the sum of each term resulting from some instantiation of an inference rule deriving that item. [sent-110, score-0.306]
42 This is a generalization to PSCFG of the inside algorithm defined for probabilistic context-free grammars (Manning and Sch u¨tze, 1999), and we can show that the probability associated with item [0, S, |w1| ; 0, S, |w2 |] provides the desired value pG( [w1, w2] ). [sent-111, score-0.353]
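For concreteness, here is a heavily simplified sketch of such an inside computation. It is our own illustration and handles only a toy binary (ITG-like) normal form with "straight" and "inverted" rules; the function and table names (make_inside, lexical, straight, inverted) are assumptions, and general PSCFG rules of higher rank make the item space, and hence the running time, exponential, as discussed below.

```python
from functools import lru_cache

def make_inside(lexical, straight, inverted, w1, w2):
    # Toy PSCFG in an ITG-like binary normal form.
    #   lexical[(A1, A2)]  -> list of (a, b, p)    rules [A1 -> a,     A2 -> b]
    #   straight[(A1, A2)] -> list of (BB, CC, p)  rules [A1 -> B1 C1, A2 -> B2 C2]
    #   inverted[(A1, A2)] -> list of (BB, CC, p)  rules [A1 -> B1 C1, A2 -> C2 B2]
    # where BB and CC are linked nonterminal pairs.
    @lru_cache(maxsize=None)
    def inside(pair, i, j, k, l):
        # probability that `pair` derives the substring pair (w1[i:j], w2[k:l])
        total = 0.0
        if j - i == 1 and l - k == 1:
            for a, b, p in lexical.get(pair, []):
                if w1[i] == a and w2[k] == b:
                    total += p
        for table, inv in ((straight, False), (inverted, True)):
            for BB, CC, p in table.get(pair, []):
                for m in range(i + 1, j):
                    for n in range(k + 1, l):
                        if inv:
                            # inverted: BB covers w2[n:l], CC covers w2[k:n]
                            total += p * inside(BB, i, m, n, l) * inside(CC, m, j, k, n)
                        else:
                            total += p * inside(BB, i, m, k, n) * inside(CC, m, j, n, l)
        return total
    return inside
```

In this toy fragment, make_inside(...)(("S", "S"), 0, len(w1), 0, len(w2)) plays the role of pG([w1, w2]).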
43 Consider again a synchronous rule s of the form in (2). [sent-119, score-0.462]
44 We say s is an epsilon rule if r = 0 and u10 = u20 = ε. [sent-120, score-0.351]
45 We say s is a unit rule if r = 1 and u10 = u11 = u20 = u21 = ε. [sent-122, score-0.269]
46 Similarly to context-free grammars, absence of epsilon rules and unit rules guarantees that there are no cyclic dependencies between items and in this case the inside algorithm correctly computes pG([w1 , w2]). [sent-124, score-0.561]
47 Epsilon rules can be eliminated from PSCFGs by a grammar transformation that is very similar to the transformation eliminating epsilon rules from a probabilistic context-free grammar (Abney et al. [sent-125, score-0.858]
48 We first compute the set of all nullable linked pairs of nonterminals of the underlying SCFG, that is, the set of all [A1, A2] ∈ N[2] such that [A1^1, A2^1] ⇒∗G [ε, ε]. [sent-128, score-0.305]
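The set of nullable linked pairs can be obtained by a standard fixed-point iteration. The sketch below uses our own simplified rule representation, in which each rule records its linked left-hand-side pair, the terminal strings appearing in either component, and the linked nonterminal pairs occurring (with matching indices) in its right-hand sides; none of these field names come from the paper.

```python
def nullable_pairs(rules):
    # rules: list of dicts with keys
    #   'lhs'         : the linked pair (A1, A2)
    #   'terminals'   : all terminal strings in both right-hand sides
    #   'child_pairs' : linked nonterminal pairs occurring in the right-hand sides
    nullable = set()
    changed = True
    while changed:
        changed = False
        for r in rules:
            if r['lhs'] in nullable:
                continue
            if any(u != '' for u in r['terminals']):
                continue  # a rule generating terminals can never derive [eps, eps]
            if all(c in nullable for c in r['child_pairs']):
                nullable.add(r['lhs'])
                changed = True
    return nullable
```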
49 Next, we identify all occurrences of nullable pairs [A1, A2] in the right-hand side components of a rule s, such that A1 and A2 have the same index. [sent-130, score-0.408]
50 For every possible choice of a subset U of these occurrences, we add to our grammar a new rule sU constructed by omitting all of the nullable occurrences in U. [sent-131, score-0.299]
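The construction of the rules sU can be sketched as follows. The item-based encoding of rule components, ('T', u) for a terminal string and ('N', A, t) for a nonterminal carrying index t, is our own assumption, and the probability reassignment of equation (3) in the paper is deliberately left out.

```python
from itertools import chain, combinations

def epsilon_variants(comp1, comp2, nullable):
    # Same-index occurrences of a nullable linked pair are candidates for omission.
    by1 = {it[2]: it[1] for it in comp1 if it[0] == 'N'}
    by2 = {it[2]: it[1] for it in comp2 if it[0] == 'N'}
    candidates = [t for t in by1 if t in by2 and (by1[t], by2[t]) in nullable]
    subsets = chain.from_iterable(
        combinations(candidates, k) for k in range(len(candidates) + 1))
    variants = []
    for U in subsets:
        drop = set(U)
        new1 = [it for it in comp1 if it[0] == 'T' or it[2] not in drop]
        new2 = [it for it in comp2 if it[0] == 'T' or it[2] not in drop]
        variants.append((U, new1, new2))
    return variants
```

The loop over all subsets U makes explicit the exponential blow-up discussed below; binarizing the rules first keeps the number of candidate occurrences per rule small.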
51 After the new rules accounting for epsilon-generating subderivations have been added, we can safely remove all epsilon rules, with the only exception of a possible rule of the form [S → ε, S → ε]. [sent-136, score-0.324]
52 One problem with the above construction is that we have to create new synchronous rules sU for each possible choice of subset U. [sent-142, score-0.398]
53 In the case of context-free grammars, this is usually circumvented by casting the rules in binary form prior to epsilon rule elimination. [sent-144, score-0.425]
54 Then each of the values in (3) is guaranteed to be 1, and furthermore we can remove the instances of the nullable pairs in the source rule s all at the same time. [sent-153, score-0.32]
55 This means that the overall construction for the elimination of nullable rules from G can be implemented in time linear in |G|. [sent-154, score-0.294]
56 After elimination of epsilon rules, one can eliminate unit rules. [sent-157, score-0.301]
57 Consider a pair [A1, A2] ∈ N[2] and let all unit rules with left-hand sides A1 and A2 be: s1 : [A1 → A11^1, A2 → A21^1], . . . [sent-162, score-0.218]
58 The elimination of unit rules starts with adding a rule s0 : [A1 → α1, A2 → α2] for each non-unit rule s : [B1 → α1, B2 → α2] and pair [A1, A2] such that Cunit([A1, A2], [B1, B2]) > 0. [sent-169, score-0.613]
59 We assign to the new rule s0 the probability pG(s) · Cunit ([A1, A2] , [B1, B2] ). [sent-170, score-0.221]
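One standard way to compute unit-chain quantities such as Cunit, sketched here under our own assumptions rather than as the paper's construction, is to collect the unit-rule probability mass between linked pairs into a matrix U and take the closure (I − U)^−1, which is well defined when the spectral radius of U is below one.

```python
import numpy as np

def unit_closure(pairs, unit_mass):
    # pairs: list of linked nonterminal pairs [A1, A2]
    # unit_mass[(X, Y)]: total probability of unit rules rewriting pair X as pair Y
    n = len(pairs)
    idx = {p: i for i, p in enumerate(pairs)}
    U = np.zeros((n, n))
    for (x, y), p in unit_mass.items():
        U[idx[x], idx[y]] = p
    # I + U + U^2 + ... = (I - U)^{-1}: probability mass of unit-rule chains
    # of any length between two linked pairs.
    return np.linalg.inv(np.eye(n) - U)
```

Under these assumptions, entry [idx[A], idx[B]] of the result plays the role of Cunit([A1, A2], [B1, B2]) in the rule-probability assignment above.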
60 Again, in the resulting grammar the translation and the associated probability distribution will be the same as those in the source grammar. [sent-172, score-0.253]
61 4 Prefix probabilities The joint prefix probability pGprefix([v1, v2]) of a pair [v1, v2] of terminal strings is the sum of the probabilities of all pairs of strings that have v1 and v2, respectively, as their prefixes. [sent-177, score-0.976]
62 However, analogously to the case of context-free prefix probabilities (Jelinek and Lafferty, 1991), we can isolate two parts in the computation. [sent-179, score-0.45]
63 Computing pGprefix ([v1, v2] ) directly using a generic probabilistic parsing algorithm for PSCFGs is difficult, due to the presence of epsilon rules and unit rules. [sent-187, score-0.441]
64 The second part turns Gprefix into a third grammar G′prefix by eliminating epsilon rules and unit rules from the underlying SCFG, while preserving the probability distribution over pairs of strings. [sent-189, score-0.338]
65 Using G′prefix one can then effectively apply generic probabilistic parsing algorithms for PSCFGs, such as the inside algorithm discussed in section 3, in order to compute the desired prefix probabilities for the source PSCFG G. [sent-190, score-0.696]
66 The meaning of A remains unchanged, whereas A↓ is intended to generate a string that is a suffix of a known prefix v1 or v2. [sent-192, score-0.42]
67 The two left-hand sides of a synchronous rule in Gprefix can contain different combinations of nonterminals of the forms A, A↓, or Aε. [sent-194, score-0.583]
68 The structure of the rules from the source grammar is largely retained, except that some terminal symbols are omitted in order to obtain the intended interpretation of A↓ and Aε. [sent-196, score-0.322]
69 The choices for i = 1 and for i = 2 are independent, so that we can have 3 ∗ 3 = 9 kinds of synchronous rules, to be further subdivided in what follows. [sent-198, score-0.327]
70 A unique label s0 is produced for each new rule, and the probability of each new rule equals that of s. [sent-199, score-0.221]
71 In fact, there can be a number of choices for αi↓ and, for each choice, the transformed grammar contains an instance of the synchronous rule s0 : [B1 → β1, B2 → β2] as defined above. [sent-201, score-0.215]
72 The reason why different choices need to be considered is that the boundary between the known prefix vi and the unknown suffix wi can occur at different positions, either within a terminal string uij or else further down in a subderivation involving Aij. [sent-202, score-0.68]
73 In the first case, we have for some j (0 ≤ j ≤ r): αi↓ = ui0 Ai1^ti1 ui1 Ai2^ti2 · · · ui,j−1 Aij^tij u′ij Aεi,j+1^ti,j+1 Aεi,j+2^ti,j+2 · · · Aεir^tir, where u′ij is a choice of a prefix of uij. [sent-203, score-0.340]
74 In words, the known prefix ends after u′ij and, thereafter, no more terminals are generated. [sent-204, score-0.382]
75 In this second case, we have for some j (1 ≤ j ≤ r): αi↓ = ui0 Ai1^ti1 ui1 Ai2^ti2 · · · ui,j−1 A↓ij^tij Aεi,j+1^ti,j+1 Aεi,j+2^ti,j+2 · · · Aεir^tir. Here the known prefix of the input ends within a subderivation involving Aij, and further to the right no more terminals are generated. [sent-207, score-0.477]
76 Example 3 Consider the synchronous rule s : [A → a B^1 b c C^2 d, D → e f E^2 F^1]. The first component of a synchronous rule derived from this can be one of eight rules, with left-hand side A, A↓, or Aε, depending on where the boundary of the known prefix is placed. [sent-208, score-0.804]
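The two cases above can be turned into a small generator of the A↓ components. The sketch below uses the same hypothetical item encoding as before (('T', u) for a terminal string, ('N', A, t) for an indexed nonterminal) and marks flavours by suffixing the nonterminal name; both conventions are ours, not the paper's.

```python
def down_variants(component):
    def eps_tail(items):
        # everything right of the prefix boundary keeps only its nonterminals,
        # relabelled with the epsilon flavour
        out = []
        for it in items:
            if it[0] == 'N':
                _, A, t = it
                out.append(('N', A + '_eps', t))
        return out

    variants = []
    for pos, item in enumerate(component):
        if item[0] == 'T':
            # Case 1: the known prefix ends after some prefix u' of this terminal string.
            u = item[1]
            for cut in range(len(u) + 1):
                head = list(component[:pos])
                if cut > 0:
                    head.append(('T', u[:cut]))
                variants.append(head + eps_tail(component[pos + 1:]))
        else:
            # Case 2: the prefix boundary falls inside the subderivation of this
            # nonterminal, which becomes the A-down flavour.
            _, A, t = item
            variants.append(list(component[:pos]) + [('N', A + '_down', t)]
                            + eps_tail(component[pos + 1:]))
    return variants
```

This enumerates O(|s|) candidate components per rule component, in line with the size bound given next; the exact bookkeeping in the paper may differ slightly.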
77 For each synchronous rule s, the above grammar transformation produces O(|s|) left rule components and as many right rule components. [sent-210, score-0.942]
78 This means the number of new synchronous rules is O(|s|^2) and the size of each such rule is O(|s|). [sent-211, score-0.563]
79 We now investigate formal properties of our grammar transformation, in order to relate it to prefix probabilities. [sent-214, score-0.463]
80 This follows from the observation that the length of v1 in v1w1 uniquely determines how occurrences of left components of rules in P found in σ are mapped to occurrences of left components of rules in Pprefix found in σ0. [sent-222, score-0.418]
81 Lemma 2 is easy to prove as the structure of the transformation ensures that the terminals that are in rules from P but not in the corresponding rules from Pprefix occur at the end of a string v1 (and v2) to form the longer string v1w1 (and v2w2, respectively). [sent-224, score-0.496]
82 Because of the introduction of rules with left-hand sides of the form Aε in both the left and right components of synchronous rules, it is not straightforward to do effective probabilistic parsing with the grammar Gprefix. [sent-228, score-0.624]
83 We can however apply the transformations from section 3 to Gprefix, to eliminate epsilon rules and thereafter unit rules, in a way that leaves the derived string pairs and their probabilities unchanged. [sent-229, score-0.569]
84 The simplest case is when the source grammar G is reduced, proper and consistent, and has no epsilon rules. [sent-230, score-0.309]
85 The only nullable pairs of nonterminals in Gprefix will then be of the form [Aε1 , A2ε] . [sent-231, score-0.249]
86 Since each linked pair of G derives some terminal pair with total probability one (that is, Σ_σ pG(σ) = 1 over all derivations σ with [A1^1, A2^1] ⇒σG [w1, w2]), and because of the structure of the grammar transformation by which Gprefix was obtained from G, we also have Σ_σ pGprefix(σ) = 1 over all derivations σ with [Aε1^1, Aε2^1] ⇒σGprefix [ε, ε]. [sent-235, score-0.215]
87 Therefore pairs of occurrences of A1ε and A2ε with the same index in synchronous rules of Gprefix can be systematically removed without affecting the probability of the resulting rule, as outlined in section 3. [sent-237, score-0.586]
88 Thereafter, unit rules can be removed to allow parsing by the inside algorithm for PSCFGs. [sent-238, score-0.301]
89 This is as expected since, as remarked in the introduction, the recognition problem for PSCFGs is NP-complete, as established by Satta and Peserico (2005), and there is a straightforward reduction from the recognition problem for PSCFGs to the problem of computing the prefix probabilities for PSCFGs. [sent-241, score-0.553]
90 One should add that, in real world machine translation applications, it has been observed that recognition (and computation of inside probabilities) for SCFGs can typically be carried out in low-degree polynomial time, and the worst cases mentioned above are not observed with real data. [sent-242, score-0.299]
91 5 Discussion We have shown that the computation of joint prefix probabilities for PSCFGs can be reduced to the computation of inside probabilities for the same model. [sent-245, score-0.836]
92 Our reduction relies on a novel grammar transformation, followed by elimination of epsilon rules and unit rules. [sent-246, score-0.525]
93 This can be computed as a special case of the joint prefix probability. [sent-248, score-0.34]
94 We are interested in the probability that the next terminal in the target translation is a ∈ Σ, after having processed a prefix v1 of the source sentence and having produced a prefix v2 of the target translation. [sent-255, score-0.881]
95 This can be realised by adding a rule s0 : [B → b, A → cA] for each rule s : [B → b, A → a] from the source grammar, where B is a nonterminal representing a part-of-speech and cA is a (pre-)terminal specific to A. [sent-263, score-0.357]
96 Here we are interested in the probability that any string in the source language with infix v1 is translated into any string in the target language with infix v2. [sent-267, score-0.499]
97 However, just as infix probabilities are difficult to compute for probabilistic context-free grammars (Corazza et al. [sent-268, score-0.43]
98 , 1991 ; Nederhof and Satta, 2008) so (joint) infix probabilities are difficult to compute for PSCFGs. [sent-269, score-0.267]
99 The computation of infix probabilities can be reduced to that of solving non-linear systems of equations, which can be approximated using for instance Newton’s algorithm. [sent-271, score-0.367]
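As an illustration of that last step, here is a generic sketch (ours, not the paper's formulation) of approximating a system of polynomial equations x = F(x), of the kind that such probabilities give rise to, by Newton's method applied to G(x) = F(x) − x with a numerically estimated Jacobian; F is assumed to map a vector to a vector of the same length.

```python
import numpy as np

def newton_fixed_point(F, x0, iters=50, h=1e-8):
    # Approximate a solution of x = F(x) via Newton's method on G(x) = F(x) - x.
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        g = F(x) - x
        n = len(x)
        J = np.empty((n, n))
        for j in range(n):            # numerical Jacobian of G, column by column
            e = np.zeros(n)
            e[j] = h
            J[:, j] = (F(x + e) - (x + e) - g) / h
        x = x - np.linalg.solve(J, g)
    return x
```

For the monotone systems arising from nonnegative rule probabilities, starting from x0 = 0 is the usual choice.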
100 An efficient probabilistic context-free parsing algorithm that computes prefix probabilities. [sent-392, score-0.444]
wordName wordTfidf (topN-words)
[('prefix', 0.34), ('pg', 0.327), ('synchronous', 0.297), ('pscfgs', 0.29), ('gprefix', 0.22), ('ppgrefix', 0.201), ('scfg', 0.166), ('rule', 0.165), ('epsilon', 0.159), ('cunit', 0.146), ('infix', 0.128), ('nullable', 0.128), ('grammar', 0.123), ('nonterminals', 0.121), ('probabilities', 0.11), ('pgprefix', 0.11), ('strings', 0.106), ('pscfg', 0.104), ('rules', 0.101), ('grammars', 0.096), ('computation', 0.095), ('transformation', 0.092), ('satta', 0.089), ('jelinek', 0.086), ('inside', 0.086), ('index', 0.084), ('string', 0.08), ('unit', 0.077), ('vi', 0.075), ('pprefix', 0.073), ('terminal', 0.071), ('probabilistic', 0.067), ('elimination', 0.065), ('peserico', 0.065), ('lafferty', 0.06), ('uij', 0.059), ('probability', 0.056), ('corazza', 0.055), ('irtir', 0.055), ('reindexing', 0.055), ('rmax', 0.055), ('subderivation', 0.055), ('deduction', 0.053), ('derivation', 0.051), ('prg', 0.048), ('nederhof', 0.048), ('item', 0.048), ('occurrences', 0.048), ('translation', 0.047), ('ef', 0.045), ('jt', 0.045), ('terminals', 0.042), ('thereafter', 0.042), ('nonterminal', 0.041), ('pair', 0.04), ('right', 0.04), ('ab', 0.039), ('exponential', 0.039), ('equations', 0.038), ('derivations', 0.038), ('parsing', 0.037), ('ai', 0.037), ('sum', 0.037), ('recognition', 0.037), ('andrews', 0.037), ('properness', 0.037), ('refix', 0.037), ('sippu', 0.037), ('xpg', 0.037), ('cyclic', 0.037), ('side', 0.036), ('solving', 0.034), ('lemma', 0.034), ('worst', 0.034), ('abney', 0.033), ('infinitely', 0.032), ('kiefer', 0.032), ('infinite', 0.032), ('components', 0.031), ('si', 0.03), ('aho', 0.03), ('bd', 0.03), ('nnd', 0.03), ('precomputed', 0.03), ('newton', 0.03), ('proof', 0.029), ('complexity', 0.029), ('computing', 0.029), ('compute', 0.029), ('left', 0.029), ('sd', 0.028), ('tha', 0.028), ('proofs', 0.028), ('source', 0.027), ('stolcke', 0.027), ('linked', 0.027), ('say', 0.027), ('write', 0.027), ('mcallester', 0.027), ('bound', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 250 acl-2011-Prefix Probability for Probabilistic Synchronous Context-Free Grammars
Author: Mark-Jan Nederhof ; Giorgio Satta
Abstract: We present a method for the computation of prefix probabilities for synchronous contextfree grammars. Our framework is fairly general and relies on the combination of a simple, novel grammar transformation and standard techniques to bring grammars into normal forms.
2 0.20313028 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation
Author: Andreas Zollmann ; Stephan Vogel
Abstract: In this work we propose methods to label probabilistic synchronous context-free grammar (PSCFG) rules using only word tags, generated by either part-of-speech analysis or unsupervised word class induction. The proposals range from simple tag-combination schemes to a phrase clustering model that can incorporate an arbitrary number of features. Our models improve translation quality over the single generic label approach of Chiang (2005) and perform on par with the syntactically motivated approach from Zollmann and Venugopal (2006) on the NIST large Chineseto-English translation task. These results persist when using automatically learned word tags, suggesting broad applicability of our technique across diverse language pairs for which syntactic resources are not available.
3 0.20150819 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
Author: Markos Mylonakis ; Khalil Sima'an
Abstract: While it is generally accepted that many translation phenomena are correlated with linguistic structures, employing linguistic syntax for translation has proven a highly non-trivial task. The key assumption behind many approaches is that translation is guided by the source and/or target language parse, employing rules extracted from the parse tree or performing tree transformations. These approaches enforce strict constraints and might overlook important translation phenomena that cross linguistic constituents. We propose a novel flexible modelling approach to introduce linguistic information of varying granularity from the source side. Our method induces joint probability synchronous grammars and estimates their parameters, by select- ing and weighing together linguistically motivated rules according to an objective function directly targeting generalisation over future data. We obtain statistically significant improvements across 4 different language pairs with English as source, mounting up to +1.92 BLEU for Chinese as target.
4 0.16136278 296 acl-2011-Terminal-Aware Synchronous Binarization
Author: Licheng Fang ; Tagyoung Chung ; Daniel Gildea
Abstract: We present an SCFG binarization algorithm that combines the strengths of early terminal matching on the source language side and early language model integration on the target language side. We also examine how different strategies of target-side terminal attachment during binarization can significantly affect translation quality.
5 0.14241628 180 acl-2011-Issues Concerning Decoding with Synchronous Context-free Grammar
Author: Tagyoung Chung ; Licheng Fang ; Daniel Gildea
Abstract: We discuss some of the practical issues that arise from decoding with general synchronous context-free grammars. We examine problems caused by unary rules and we also examine how virtual nonterminals resulting from binarization can best be handled. We also investigate adding more flexibility to synchronous context-free grammars by adding glue rules and phrases.
6 0.14103349 268 acl-2011-Rule Markov Models for Fast Tree-to-String Translation
7 0.13701476 234 acl-2011-Optimal Head-Driven Parsing Complexity for Linear Context-Free Rewriting Systems
8 0.13662902 61 acl-2011-Binarized Forest to String Translation
9 0.13512219 30 acl-2011-Adjoining Tree-to-String Translation
10 0.12422217 232 acl-2011-Nonparametric Bayesian Machine Transliteration with Synchronous Adaptor Grammars
11 0.10982665 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
12 0.10871977 44 acl-2011-An exponential translation model for target language morphology
13 0.099977516 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation
14 0.095500186 11 acl-2011-A Fast and Accurate Method for Approximate String Search
15 0.095379092 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation
16 0.08309406 154 acl-2011-How to train your multi bottom-up tree transducer
17 0.082419485 219 acl-2011-Metagrammar engineering: Towards systematic exploration of implemented grammars
18 0.081479669 123 acl-2011-Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation
19 0.08110369 290 acl-2011-Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers
20 0.079970099 93 acl-2011-Dealing with Spurious Ambiguity in Learning ITG-based Word Alignment
topicId topicWeight
[(0, 0.182), (1, -0.143), (2, 0.051), (3, -0.073), (4, -0.0), (5, -0.004), (6, -0.187), (7, -0.047), (8, -0.089), (9, -0.049), (10, -0.069), (11, -0.01), (12, -0.006), (13, 0.131), (14, 0.056), (15, -0.045), (16, -0.002), (17, 0.044), (18, 0.057), (19, 0.012), (20, 0.011), (21, -0.019), (22, -0.039), (23, -0.091), (24, -0.053), (25, -0.027), (26, -0.028), (27, -0.048), (28, 0.038), (29, -0.006), (30, 0.032), (31, 0.061), (32, -0.024), (33, 0.03), (34, -0.061), (35, 0.057), (36, 0.115), (37, 0.099), (38, 0.027), (39, 0.101), (40, -0.081), (41, -0.013), (42, -0.04), (43, 0.002), (44, -0.052), (45, -0.037), (46, -0.018), (47, 0.01), (48, 0.035), (49, 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.96312988 250 acl-2011-Prefix Probability for Probabilistic Synchronous Context-Free Grammars
Author: Mark-Jan Nederhof ; Giorgio Satta
Abstract: We present a method for the computation of prefix probabilities for synchronous contextfree grammars. Our framework is fairly general and relies on the combination of a simple, novel grammar transformation and standard techniques to bring grammars into normal forms.
2 0.86114275 180 acl-2011-Issues Concerning Decoding with Synchronous Context-free Grammar
Author: Tagyoung Chung ; Licheng Fang ; Daniel Gildea
Abstract: We discuss some of the practical issues that arise from decoding with general synchronous context-free grammars. We examine problems caused by unary rules and we also examine how virtual nonterminals resulting from binarization can best be handled. We also investigate adding more flexibility to synchronous context-free grammars by adding glue rules and phrases.
3 0.81310582 154 acl-2011-How to train your multi bottom-up tree transducer
Author: Andreas Maletti
Abstract: The local multi bottom-up tree transducer is introduced and related to the (non-contiguous) synchronous tree sequence substitution grammar. It is then shown how to obtain a weighted local multi bottom-up tree transducer from a bilingual and biparsed corpus. Finally, the problem of non-preservation of regularity is addressed. Three properties that ensure preservation are introduced, and it is discussed how to adjust the rule extraction process such that they are automatically fulfilled.
4 0.77401632 296 acl-2011-Terminal-Aware Synchronous Binarization
Author: Licheng Fang ; Tagyoung Chung ; Daniel Gildea
Abstract: We present an SCFG binarization algorithm that combines the strengths of early terminal matching on the source language side and early language model integration on the target language side. We also examine how different strategies of target-side terminal attachment during binarization can significantly affect translation quality.
5 0.69954211 234 acl-2011-Optimal Head-Driven Parsing Complexity for Linear Context-Free Rewriting Systems
Author: Pierluigi Crescenzi ; Daniel Gildea ; Andrea Marino ; Gianluca Rossi ; Giorgio Satta
Abstract: We study the problem offinding the best headdriven parsing strategy for Linear ContextFree Rewriting System productions. A headdriven strategy must begin with a specified righthand-side nonterminal (the head) and add the remaining nonterminals one at a time in any order. We show that it is NP-hard to find the best head-driven strategy in terms of either the time or space complexity of parsing.
6 0.68951553 268 acl-2011-Rule Markov Models for Fast Tree-to-String Translation
7 0.67939401 61 acl-2011-Binarized Forest to String Translation
8 0.64304978 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation
9 0.57783717 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
10 0.57625687 232 acl-2011-Nonparametric Bayesian Machine Transliteration with Synchronous Adaptor Grammars
11 0.57371593 173 acl-2011-Insertion Operator for Bayesian Tree Substitution Grammars
12 0.57129115 30 acl-2011-Adjoining Tree-to-String Translation
13 0.56197727 219 acl-2011-Metagrammar engineering: Towards systematic exploration of implemented grammars
14 0.54518634 11 acl-2011-A Fast and Accurate Method for Approximate String Search
16 0.52507609 188 acl-2011-Judging Grammaticality with Tree Substitution Grammar Derivations
17 0.50972867 290 acl-2011-Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers
18 0.5089578 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation
19 0.49884695 44 acl-2011-An exponential translation model for target language morphology
20 0.48714259 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
topicId topicWeight
[(5, 0.019), (17, 0.086), (26, 0.019), (37, 0.498), (39, 0.043), (41, 0.065), (55, 0.016), (59, 0.025), (72, 0.015), (91, 0.028), (96, 0.093)]
simIndex simValue paperId paperTitle
1 0.97148204 179 acl-2011-Is Machine Translation Ripe for Cross-Lingual Sentiment Classification?
Author: Kevin Duh ; Akinori Fujino ; Masaaki Nagata
Abstract: Recent advances in Machine Translation (MT) have brought forth a new paradigm for building NLP applications in low-resource scenarios. To build a sentiment classifier for a language with no labeled resources, one can translate labeled data from another language, then train a classifier on the translated text. This can be viewed as a domain adaptation problem, where labeled translations and test data have some mismatch. Various prior work have achieved positive results using this approach. In this opinion piece, we take a step back and make some general statements about crosslingual adaptation problems. First, we claim that domain mismatch is not caused by MT errors, and accuracy degradation will occur even in the case of perfect MT. Second, we argue that the cross-lingual adaptation problem is qualitatively different from other (monolingual) adaptation problems in NLP; thus new adaptation algorithms ought to be considered. This paper will describe a series of carefullydesigned experiments that led us to these conclusions. 1 Summary Question 1: If MT gave perfect translations (semantically), do we still have a domain adaptation challenge in cross-lingual sentiment classification? Answer: Yes. The reason is that while many lations of a word may be valid, the MT system have a systematic bias. For example, the word some” might be prevalent in English reviews, transmight “awebut in 429 translated reviews, the word “excellent” is generated instead. From the perspective of MT, this translation is correct and preserves sentiment polarity. But from the perspective of a classifier, there is a domain mismatch due to differences in word distributions. Question 2: Can we apply standard adaptation algorithms developed for other (monolingual) adaptation problems to cross-lingual adaptation? Answer: No. It appears that the interaction between target unlabeled data and source data can be rather unexpected in the case of cross-lingual adaptation. We do not know the reason, but our experiments show that the accuracy of adaptation algorithms in cross-lingual scenarios have much higher variance than monolingual scenarios. The goal of this opinion piece is to argue the need to better understand the characteristics of domain adaptation in cross-lingual problems. We invite the reader to disagree with our conclusion (that the true barrier to good performance is not insufficient MT quality, but inappropriate domain adaptation methods). Here we present a series of experiments that led us to this conclusion. First we describe the experiment design (§2) and baselines (§3), before answering Question §12 (§4) dan bda Question 32) (§5). 2 Experiment Design The cross-lingual setup is this: we have labeled data from source domain S and wish to build a sentiment classifier for target domain T. Domain mismatch can arise from language differences (e.g. English vs. translated text) or market differences (e.g. DVD vs. Book reviews). Our experiments will involve fixing Proceedings ofP thoer t4l9atnhd A, Onrnuegaoln M,e Jeuntineg 19 o-f2 t4h,e 2 A0s1s1o.c?i ac t2io0n11 fo Ar Cssoocmiaptuiotanti foonra Clo Lminpguutiast i ocns:aslh Loirntpgaupisetrics , pages 429–433, T to a common testset and varying S. This allows us to experiment with different settings for adaptation. We use the Amazon review dataset of Prettenhofer (2010)1 , due to its wide range of languages (English [EN], Japanese [JP], French [FR], German [DE]) and markets (music, DVD, books). 
Unlike Prettenhofer (2010), we reverse the direction of cross-lingual adaptation and consider English as target. English is not a low-resource language, but this setting allows for more comparisons. Each source dataset has 2000 reviews, equally balanced between positive and negative. The target has 2000 test samples, large unlabeled data (25k, 30k, 50k samples respectively for Music, DVD, and Books), and an additional 2000 labeled data reserved for oracle experiments. Texts in JP, FR, and DE are translated word-by-word into English with Google Translate.2 We perform three sets of experiments, shown in Table 1. Table 2 lists all the results; we will interpret them in the following sections. Target (T) Source (S) 312BDMToVuasbDkil-ecE1N:ExpDMB eorVuimsDkice-JEnPtN,s eBD,MtuoVBDpuoVsk:-iFDck-iERxFN,T DB,vVoMaDruky-sSiDc.E-, 3 How much performance degradation occurs in cross-lingual adaptation? First, we need to quantify the accuracy degradation under different source data, without consideration of domain adaptation methods. So we train a SVM classifier on labeled source data3, and directly apply it on test data. The oracle setting, which has no domain-mismatch (e.g. train on Music-EN, test on Music-EN), achieves an average test accuracy of (81.6 + 80.9 + 80.0)/3 = 80.8%4. Aver1http://www.webis.de/research/corpora/webis-cls-10 2This is done by querying foreign words to build a bilingual dictionary. The words are converted to tfidf unigram features. 3For all methods we try here, 5% of the 2000 labeled source samples are held-out for parameter tuning. 4See column EN of Table 2, Supervised SVM results. 430 age cross-lingual accuracies are: 69.4% (JP), 75.6% (FR), 77.0% (DE), so degradations compared to oracle are: -11% (JP), -5% (FR), -4% (DE).5 Crossmarket degradations are around -6%6. Observation 1: Degradations due to market and language mismatch are comparable in several cases (e.g. MUSIC-DE and DVD-EN perform similarly for target MUSIC-EN). Observation 2: The ranking of source language by decreasing accuracy is DE > FR > JP. Does this mean JP-EN is a more difficult language pair for MT? The next section will show that this is not necessarily the case. Certainly, the domain mismatch for JP is larger than DE, but this could be due to phenomenon other than MT errors. 4 Where exactly is the domain mismatch? 4.1 Theory of Domain Adaptation We analyze domain adaptation by the concepts of labeling and instance mismatch (Jiang and Zhai, 2007). Let pt(x, y) = pt (y|x)pt (x) be the target distribution of samples x (e.g. unigram feature vec- tor) and labels y (positive / negative). Let ps (x, y) = ps (y|x)ps (x) be the corresponding source distributio(ny. Wx)pe assume that one (or both) of the following distributions differ between source and target: • Instance mismatch: ps (x) pt (x). • Labeling mismatch: ps (y|x) pt(y|x). Instance mismatch implies that the input feature vectors have different distribution (e.g. one dataset uses the word “excellent” often, while the other uses the word “awesome”). This degrades performance because classifiers trained on “excellent” might not know how to classify texts with the word “awesome.” The solution is to tie together these features (Blitzer et al., 2006) or re-weight the input distribution (Sugiyama et al., 2008). Under some assumptions (i.e. covariate shift), oracle accuracy can be achieved theoretically (Shimodaira, 2000). Labeling mismatch implies the same input has different labels in different domains. 
For example, the JP word meaning “excellent” may be mistranslated as “bad” in English. Then, positive JP = = 5See “Adapt by Language” columns of Table 2. Note JP+FR+DE condition has 6000 labeled samples, so is not directly comparable to other adaptation scenarios (2000 samples). Nevertheless, mixing languages seem to give good results. 6See “Adapt by Market” columns of Table 2. TargetClassifierOEraNcleJPAFdaRpt bDyE LanJgPu+agFeR+DEMUASdIaCpt D byV MDar BkeOtOK MUSIC-ENSAudpaeprtvedise TdS SVVMM8719..666783..50 7745..62 7 776..937880..36--7768..847745..16 DVD-ENSAudpaeprtveidse TdS SVVMM8801..907701..14 7765..54 7 767..347789..477754..28--7746..57 BOOK-ENSAudpaeprtveidse TdS SVVMM8801..026793..68 7775..64 7 767..747799..957735..417767..24-Table 2: Test accuracies (%) for English Music/DVD/Book reviews. Each column is an adaptation scenario using different source data. The source data may vary by language or by market. For example, the first row shows that for the target of Music-EN, the accuracy of a SVM trained on translated JP reviews (in the same market) is 68.5, while the accuracy of a SVM trained on DVD reviews (in the same language) is 76.8. “Oracle” indicates training on the same market and same language domain as the target. “JP+FR+DE” indicates the concatenation of JP, FR, DE as source data. Boldface shows the winner of Supervised vs. Adapted. reviews ps (y will be associated = +1|x = bad) co(nydit =io +na1l − |x = 1 will be high, whereas the true xdis =tr bibaudti)o wn bad) instead. labeling mismatch, with the word “bad”: lslh boeu hldi hha,v we high pt(y = There are several cases for depending on sheovwe tahle c polarity changes (Table 3). The solution is to filter out these noisy samples (Jiang and Zhai, 2007) or optimize loosely-linked objectives through shared parameters or Bayesian priors (Finkel and Manning, 2009). Which mismatch is responsible for accuracy degradations in cross-lingual adaptation? • Instance mismatch: Systematic Iantessta nwcoerd m diissmtraibtcuhti:on Ssy MT bias gener- sdtiefmferaetinct MfroTm b naturally- occurring English. (Translation may be valid.) Label mismatch: MT error mis-translates a word iLnatob something w: MithT Td eifrfreorren mti polarity. Conclusion from §4.2 and §4.3: Instance mismaCtcohn occurs often; M §4T. error appears Imnisntainmcael. • Mis-translated polarity Effect Taeb0+±.lge→ .3(:±“ 0−tgLhoae b”nd →l m− i“sg→m otbah+dce”h):mIfpoLAinse ca-ptsoriuaesncvieatl /ndioeansgbvcaewrptlimovaeshipntdvaei(+), negative (−), or neutral (0) words have different effects. Wnege athtiivnek ( −th)e, foirrs nt tuwtroa cases hoardves graceful degradation, but the third case may be catastrophic. 431 4.2 Analysis of Instance Mismatch To measure instance mismatch, we compute statistics between ps (x) and pt(x), or approximations thereof: First, we calculate a (normalized) average feature from all samples of source S, which represents the unigram distribution of MT output. Simi- larly, the average feature vector for target T approximates the unigram distribution of English reviews pt(x). Then we measure: • KL Divergence between Avg(S) and Avg(T), wKhLer De Avg() nisc eth bee average Avvegct(oSr.) • Set Coverage of Avg(T) on Avg(S): how many Sweotrd C (type) ien o Tf appears oatn le Aavsgt once ionw wS .m Both measures correlate strongly with final accuracy, as seen in Figure 1. The correlation coefficients are r = −0.78 for KL Divergence and r = 0.71 for Coverage, 0 b.7o8th statistically significant (p < 0.05). 
This implies that instance mismatch is an important reason for the degradations seen in Section 3.7 4.3 Analysis of Labeling Mismatch We measure labeling mismatch by looking at differences in the weight vectors of oracle SVM and adapted SVM. Intuitively, if a feature has positive weight in the oracle SVM, but negative weight in the adapted SVM, then it is likely a MT mis-translation 7The observant reader may notice that cross-market points exhibit higher coverage but equal accuracy (74-78%) to some cross-lingual points. This suggests that MT output may be more constrained in vocabulary than naturally-occurring English. 0.35 0.3 gnvLrDeiceKe0 0 0. 120.25 510 erts TeCovega0 0 0. .98657 68 70 72 7A4ccuracy76 78 80 82 0.4 68 70 72 7A4ccuracy76 78 80 82 Figure 1: KL Divergence and Coverage vs. accuracy. (o) are cross-lingual and (x) are cross-market data points. is causing the polarity flip. Algorithm 1 (with K=2000) shows how we compute polarity flip rate.8 We found that the polarity flip rate does not correlate well with accuracy at all (r = 0.04). Conclusion: Labeling mismatch is not a factor in performance degradation. Nevertheless, we note there is a surprising large number of flips (24% on average). A manual check of the flipped words in BOOK-JP revealed few MT mistakes. Only 3.7% of 450 random EN-JP word pairs checked can be judged as blatantly incorrect (without sentence context). The majority of flipped words do not have a clear sentiment orientation (e.g. “amazon”, “human”, “moreover”). 5 Are standard adaptation algorithms applicable to cross-lingual problems? One of the breakthroughs in cross-lingual text classification is the realization that it can be cast as domain adaptation. This makes available a host of preexisting adaptation algorithms for improving over supervised results. However, we argue that it may be 8The feature normalization in Step 1 is important that the weight magnitudes are comparable. to ensure 432 Algorithm 1 Measuring labeling mismatch Input: Weight vectors for source wsand target wt Input: Target data average sample vector avg(T) Output: Polarity flip rate f 1: Normalize: ws = avg(T) * ws ; wt = avg(T) * wt 2: Set S+ = { K most positive features in ws} 3: Set S− == {{ KK mmoosstt negative ffeeaattuurreess inn wws}} 4: Set T+ == {{ KK m moosstt npoesgiatitivvee f efeaatuturreess i inn w wt}} 5: Set T− == {{ KK mmoosstt negative ffeeaattuurreess inn wwt}} 6: for each= f{e a Ktur me io ∈t T+ adtiov 7: rif e ia c∈h S fe−a ttuhreen i if ∈ = T f + 1 8: enidf fio ∈r 9: for each feature j ∈ T− do 10: rif e j ∈h Sfe+a uthreen j f ∈ = T f + 1 11: enidf fjo r∈ 12: f = 2Kf better to “adapt” the standard adaptation algorithm to the cross-lingual setting. We arrived at this conclusion by trying the adapted counterpart of SVMs off-the-shelf. Recently, (Bergamo and Torresani, 2010) showed that Transductive SVMs (TSVM), originally developed for semi-supervised learning, are also strong adaptation methods. The idea is to train on source data like a SVM, but encourage the classification boundary to divide through low density regions in the unlabeled target data. Table 2 shows that TSVM outperforms SVM in all but one case for cross-market adaptation, but gives mixed results for cross-lingual adaptation. This is a puzzling result considering that both use the same unlabeled data. Why does TSVM exhibit such a large variance on cross-lingual problems, but not on cross-market problems? Is unlabeled target data interacting with source data in some unexpected way? 
Certainly there are several successful studies (Wan, 2009; Wei and Pal, 2010; Banea et al., 2008), but we think it is important to consider the possibility that cross-lingual adaptation has some fundamental differences. We conjecture that adapting from artificially-generated text (e.g. MT output) is a different story than adapting from naturallyoccurring text (e.g. cross-market). In short, MT is ripe for cross-lingual adaptation; what is not ripe is probably our understanding of the special characteristics of the adaptation problem. References Carmen Banea, Rada Mihalcea, Janyce Wiebe, and Samer Hassan. 2008. Multilingual subjectivity analysis using machine translation. In Proc. of Conference on Empirical Methods in Natural Language Processing (EMNLP). Alessandro Bergamo and Lorenzo Torresani. 2010. Exploiting weakly-labeled web images to improve object classification: a domain adaptation approach. In Advances in Neural Information Processing Systems (NIPS). John Blitzer, Ryan McDonald, and Fernando Pereira. 2006. Domain adaptation with structural correspondence learning. In Proc. of Conference on Empirical Methods in Natural Language Processing (EMNLP). Jenny Rose Finkel and Chris Manning. 2009. Hierarchical Bayesian domain adaptation. In Proc. of NAACL Human Language Technologies (HLT). Jing Jiang and ChengXiang Zhai. 2007. Instance weighting for domain adaptation in NLP. In Proc. of the Association for Computational Linguistics (ACL). Peter Prettenhofer and Benno Stein. 2010. Crosslanguage text classification using structural correspondence learning. In Proc. of the Association for Computational Linguistics (ACL). Hidetoshi Shimodaira. 2000. Improving predictive inference under covariate shift by weighting the loglikelihood function. Journal of Statistical Planning and Inferenc, 90. Masashi Sugiyama, Taiji Suzuki, Shinichi Nakajima, Hisashi Kashima, Paul von B ¨unau, and Motoaki Kawanabe. 2008. Direct importance estimation for covariate shift adaptation. Annals of the Institute of Statistical Mathematics, 60(4). Xiaojun Wan. 2009. Co-training for cross-lingual sentiment classification. In Proc. of the Association for Computational Linguistics (ACL). Bin Wei and Chris Pal. 2010. Cross lingual adaptation: an experiment on sentiment classification. In Proceedings of the ACL 2010 Conference Short Papers. 433
same-paper 2 0.95208883 250 acl-2011-Prefix Probability for Probabilistic Synchronous Context-Free Grammars
Author: Mark-Jan Nederhof ; Giorgio Satta
Abstract: We present a method for the computation of prefix probabilities for synchronous contextfree grammars. Our framework is fairly general and relies on the combination of a simple, novel grammar transformation and standard techniques to bring grammars into normal forms.
3 0.93698776 230 acl-2011-Neutralizing Linguistically Problematic Annotations in Unsupervised Dependency Parsing Evaluation
Author: Roy Schwartz ; Omri Abend ; Roi Reichart ; Ari Rappoport
Abstract: Dependency parsing is a central NLP task. In this paper we show that the common evaluation for unsupervised dependency parsing is highly sensitive to problematic annotations. We show that for three leading unsupervised parsers (Klein and Manning, 2004; Cohen and Smith, 2009; Spitkovsky et al., 2010a), a small set of parameters can be found whose modification yields a significant improvement in standard evaluation measures. These parameters correspond to local cases where no linguistic consensus exists as to the proper gold annotation. Therefore, the standard evaluation does not provide a true indication of algorithm quality. We present a new measure, Neutral Edge Direction (NED), and show that it greatly reduces this undesired phenomenon.
4 0.93648815 127 acl-2011-Exploiting Web-Derived Selectional Preference to Improve Statistical Dependency Parsing
Author: Guangyou Zhou ; Jun Zhao ; Kang Liu ; Li Cai
Abstract: In this paper, we present a novel approach which incorporates the web-derived selectional preferences to improve statistical dependency parsing. Conventional selectional preference learning methods have usually focused on word-to-class relations, e.g., a verb selects as its subject a given nominal class. This paper extends previous work to wordto-word selectional preferences by using webscale data. Experiments show that web-scale data improves statistical dependency parsing, particularly for long dependency relationships. There is no data like more data, performance improves log-linearly with the number of parameters (unique N-grams). More importantly, when operating on new domains, we show that using web-derived selectional preferences is essential for achieving robust performance.
5 0.92219967 100 acl-2011-Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation
Author: Bing Xiang ; Abraham Ittycheriah
Abstract: In this paper we present a novel discriminative mixture model for statistical machine translation (SMT). We model the feature space with a log-linear combination ofmultiple mixture components. Each component contains a large set of features trained in a maximumentropy framework. All features within the same mixture component are tied and share the same mixture weights, where the mixture weights are trained discriminatively to maximize the translation performance. This approach aims at bridging the gap between the maximum-likelihood training and the discriminative training for SMT. It is shown that the feature space can be partitioned in a variety of ways, such as based on feature types, word alignments, or domains, for various applications. The proposed approach improves the translation performance significantly on a large-scale Arabic-to-English MT task.
6 0.92044306 122 acl-2011-Event Extraction as Dependency Parsing
7 0.91826552 334 acl-2011-Which Noun Phrases Denote Which Concepts?
8 0.91485429 204 acl-2011-Learning Word Vectors for Sentiment Analysis
10 0.81311309 92 acl-2011-Data point selection for cross-language adaptation of dependency parsers
11 0.81294811 54 acl-2011-Automatically Extracting Polarity-Bearing Topics for Cross-Domain Sentiment Classification
12 0.81019533 256 acl-2011-Query Weighting for Ranking Model Adaptation
13 0.80976856 186 acl-2011-Joint Training of Dependency Parsing Filters through Latent Support Vector Machines
14 0.79424828 183 acl-2011-Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
15 0.79314959 85 acl-2011-Coreference Resolution with World Knowledge
16 0.78088117 103 acl-2011-Domain Adaptation by Constraining Inter-Domain Variability of Latent Feature Representation
17 0.78007674 309 acl-2011-Transition-based Dependency Parsing with Rich Non-local Features
18 0.77572078 199 acl-2011-Learning Condensed Feature Representations from Large Unsupervised Data Sets for Supervised Learning
19 0.77255344 292 acl-2011-Target-dependent Twitter Sentiment Classification
20 0.77141535 111 acl-2011-Effects of Noun Phrase Bracketing in Dependency Parsing and Machine Translation