acl acl2011 acl2011-235 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Kapil Thadani ; Kathleen McKeown
Abstract: The task of aligning corresponding phrases across two related sentences is an important component of approaches for natural language problems such as textual inference, paraphrase detection and text-to-text generation. In this work, we examine a state-of-the-art structured prediction model for the alignment task which uses a phrase-based representation and is forced to decode alignments using an approximate search approach. We propose instead a straightforward exact decoding technique based on integer linear programming that yields order-of-magnitude improvements in decoding speed. This ILP-based decoding strategy permits us to consider syntactically-informed constraints on alignments which significantly increase the precision of the model.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: The task of aligning corresponding phrases across two related sentences is an important component of approaches for natural language problems such as textual inference, paraphrase detection and text-to-text generation. [sent-3, score-0.203]
2 In this work, we examine a state-of-the-art structured prediction model for the alignment task which uses a phrase-based representation and is forced to decode alignments using an approximate search approach. [sent-4, score-0.644]
3 We propose instead a straightforward exact decoding technique based on integer linear programming that yields order-of-magnitude improvements in decoding speed. [sent-5, score-0.692]
4 This ILP-based decoding strategy permits us to consider syntactically-informed constraints on alignments which significantly increase the precision of the model. [sent-6, score-0.646]
5 1 Introduction Natural language processing problems frequently involve scenarios in which a pair or group of related sentences need to be aligned to each other, establishing links between their common words or phrases. [sent-7, score-0.029]
6 For instance, most approaches for natural language inference (NLI) rely on alignment techniques to establish the overlap between the given premise and a hypothesis before determining if the former entails the latter. [sent-8, score-0.341]
7 Such monolingual alignment techniques are also frequently employed in systems for paraphrase generation, multi-document summarization, sentence fusion and question answering. [sent-9, score-0.487]
8 MacCartney et al. (2008) presented a phrase-based monolingual aligner for NLI (MANLI) that has been shown to significantly outperform a token-based NLI aligner (Chambers et al. [sent-11, score-0.23]
9 , 2007) as well as popular alignment techniques borrowed from machine translation (Och and Ney, 2003; Liang et al., 2006). [sent-12, score-0.266]
10 However, MANLI’s use of a phrase-based alignment representation appears to pose a challenge to the decoding task, i.e. [sent-14, score-0.483]
11 , the task of recovering the highest-scoring alignment under some parameters. [sent-16, score-0.266]
12 MacCartney et al. (2008) employ a stochastic search algorithm to decode alignments approximately while remaining consistent with regard to phrase segmentation. [sent-18, score-0.3]
13 In this paper, we propose an exact decoding technique for MANLI that retrieves the globally optimal alignment for a sentence pair given some parameters. [sent-19, score-0.586]
14 Our approach is based on integer linear programming (ILP) and can leverage optimized general-purpose LP solvers to recover exact solutions. [sent-20, score-0.224]
15 This strategy boosts decoding speed by an order of magnitude over stochastic search in our experiments. [sent-21, score-0.307]
16 Additionally, we introduce hard syntactic constraints on alignments produced by the model, yielding better precision and a large increase in the number of perfect alignments produced over our evaluation corpus. [sent-22, score-0.601]
17 , 2006), but the task is often substantively different from monolingual alignment, which poses unique challenges depending on the application (MacCartney et al., 2008). [sent-25, score-0.095]
18 Outside of NLI, prior research has also explored the task of monolingual word alignment. [sent-27, score-0.06]
19 It has also been recently employed for finding phrase-based MT alignments (DeNero and Klein, 2008) in a manner similar to this work; however, we further build upon this model through syntactic constraints on the words participating in alignments. [sent-34, score-0.344]
20 3 The MANLI Aligner Our alignment system is structured identically to MANLI (MacCartney et al., 2008). [sent-35, score-0.332]
21 An alignment E between two fragments of text T1 and T2 is represented by a set of edits {e1, e2, ...}, [sent-37, score-0.509]
22 each belonging to one of the following types: INS and DEL edits covering unaligned words in T1 and T2 respectively, and SUB and EQ edits connecting a phrase in T1 to a phrase in T2. [sent-40, score-0.54]
23 EQ edits are a specific case of SUB edits that denote a word/lemma match; we refer to both types as SUB edits in this paper. [sent-41, score-0.729]
24 Every token in T1 and T2 participates in exactly one edit. [sent-42, score-0.088]
25 While alignments are one-to-one at the phrase level, a phrase-based representation effectively permits many-to-many alignments at the token level. [sent-43, score-0.419]
26 This enables the aligner to properly link paraphrases such as death penalty and capital punishment by exploiting lexical resources. [sent-44, score-0.085]
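To make the representation concrete, here is a minimal sketch of the edit types described above; the class and field names are illustrative assumptions, not MANLI's actual data structures.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Edit:
    """One alignment edit between text fragments T1 and T2."""
    kind: str               # "INS", "DEL", "SUB", or "EQ"
    span1: Tuple[int, int]  # token span [start, end) in T1; empty for DEL
    span2: Tuple[int, int]  # token span [start, end) in T2; empty for INS

# A single SUB edit can link multi-word paraphrases, e.g. "death penalty"
# in T1 with "capital punishment" in T2 (hypothetical token offsets):
sub = Edit(kind="SUB", span1=(3, 5), span2=(5, 7))
```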
27 3.1 Dataset MANLI was trained and evaluated on a corpus of human-generated alignment annotations produced by Microsoft Research (Brockett, 2007) for inference problems from the second Recognizing Textual Entailment (RTE2) challenge (Bar-Haim et al., 2006). [sent-46, score-0.368]
28 The corpus consists of a development set and test set that both feature 800 inference problems, each of which consists of a premise, a hypothesis and three independently-annotated human alignments. [sent-48, score-0.037]
29 3.2 Features A MANLI alignment is scored as a sum of weighted feature values over the edits that it contains. [sent-52, score-0.509]
30 Features encode the type of edit, the size of the phrases involved in SUB edits, whether the phrases are constituents and their similarity (determined by leveraging various lexical resources). [sent-53, score-0.117]
31 Additionally, contextual features note the similarity of neighboring words and the relative positions of phrases while a positional distortion feature accounts for the difference between the relative positions of SUB edit phrases in their respective sentences. [sent-54, score-0.205]
32 Our features follow MacCartney et al. (2008) with some minor changes: we use a shallow parser (Daumé and Marcu, 2005) for detecting constituents and employ only string similarity and WordNet for determining semantic relatedness, forgoing NomBank and the distributional similarity resources used in the original MANLI implementation. [sent-56, score-0.066]
33 3.3 Parameter Inference Feature weights are learned using the averaged structured perceptron algorithm (Collins, 2002), an intuitive structured prediction technique. [sent-58, score-0.195]
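As a concrete illustration, here is a compact sketch of the Collins (2002) averaged structured perceptron as it would apply to this task. The helper names `decode` (highest-scoring alignment under the current weights) and `feats` (feature counts summed over an alignment's edits) are assumptions, and the loop is a generic rendering rather than the authors' exact code.

```python
from collections import defaultdict

def train_averaged_perceptron(problems, feats, decode, epochs=10):
    """problems: list of ((t1, t2), gold_alignment) pairs."""
    w = defaultdict(float)       # current weight vector
    w_sum = defaultdict(float)   # running sum of weights for averaging
    n = 0
    for _ in range(epochs):
        for (t1, t2), gold in problems:
            pred = decode(t1, t2, w)            # best alignment under w
            if pred != gold:                    # update only on mistakes
                for f, v in feats(gold).items():
                    w[f] += v
                for f, v in feats(pred).items():
                    w[f] -= v
            for f, v in w.items():              # accumulate for the average
                w_sum[f] += v
            n += 1
    # Returning the averaged weights reduces overfitting (Collins, 2002).
    return {f: v / n for f, v in w_sum.items()}
```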
34 3.4 Decoding The decoding problem is that of finding the highest-scoring alignment under some parameter values for the model. [sent-64, score-0.483]
35 MANLI’s phrase-based representation makes decoding more complex because the segmentation of T1 and T2 into phrases is not known beforehand. [sent-65, score-0.259]
36 Every pair of phrases considered for inclusion in an alignment must adhere to some consistent segmentation so that overlapping edits and uncovered words are avoided. [sent-66, score-0.578]
37 Consequently, the decoding problem cannot be factored into a number of independent decisions and MANLI searches for a good alignment using a stochastic simulated annealing strategy. [sent-67, score-0.579]
38 While seemingly quite effective at avoiding local maxima, this iterative search strategy is computationally expensive and moreover is not guaranteed to return the highest-scoring alignment under the parameters. [sent-68, score-0.028] [sent-81, score-0.326]
39 Table 1: Performance of aligners in terms of precision, recall, F-measure and number of perfect alignments (E%). [sent-80, score-0.191]
41 4 Exact Decoding via ILP Instead of resorting to approximate solutions, we can simply reformulate the decoding problem as the optimization of a linear objective function with linear constraints, which can be solved by well-studied algorithms using off-the-shelf solvers.1 [sent-82, score-0.365]
42 We first define boolean indicator variables xe for every possible edit e between T1 and T2 that indicate whether e is present in the alignment or not. [sent-83, score-0.541]
43 The linear objective that maximizes the score of edits for a given parameter vector w is expressed as follows: $f(\mathbf{w}) = \max_{\mathbf{x}} \sum_{e} x_e \cdot \mathrm{score}_{\mathbf{w}}(e) = \max_{\mathbf{x}} \sum_{e} x_e \cdot \mathbf{w} \cdot \Phi(e) \quad (1)$ where Φ(e) is the feature vector over an edit. [sent-84, score-0.283]
44 This expresses the score of an alignment as the sum of scores of edits that are present in it, i.e., those edits e with $x_e = 1$. [sent-85, score-0.509]
45 In order to address the phrase segmentation issue discussed in §3.4, we merely need to add linear constraints ensuring that every token participates in exactly one edit. [sent-88, score-0.027] [sent-89, score-0.105]
46 Using e ≺ t to denote that edit e covers token t in one of its phrases, this constraint can be encoded as: $\sum_{e:\; e \prec t} x_e = 1 \quad \forall t \in T_i,\; i \in \{1,2\}$ [sent-91, score-0.696]
47 On solving this integer program, the values of the variables $x_e$ indicate which edits are present in the optimal alignment. (Footnote 1: We use LPsolve: http://lpsolve.sourceforge.net/) [sent-91, score-0.696]
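The whole program fits in a few lines with an off-the-shelf ILP library. The paper used LPsolve; the PuLP/CBC stack below, along with the helper names `score` and `covers`, is an assumption for this sketch.

```python
import pulp

def decode_exact(edits, score, tokens, covers):
    """Maximize sum_e x_e * score(e) subject to every token in T1 and T2
    being covered by exactly one edit (the segmentation constraint)."""
    prob = pulp.LpProblem("manli_decoding", pulp.LpMaximize)
    x = {e: pulp.LpVariable(f"x_{i}", cat="Binary")
         for i, e in enumerate(edits)}
    # Linear objective from Eq. (1): sum of scores of selected edits.
    prob += pulp.lpSum(score(e) * x[e] for e in edits)
    # Each token participates in exactly one edit.
    for t in tokens:
        prob += pulp.lpSum(x[e] for e in edits if covers(e, t)) == 1
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [e for e in edits if pulp.value(x[e]) > 0.5]
```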
48 Table 2: Approximate running time per decoding task in seconds for the search-based approximate decoder and the ILP-based exact decoder on various corpora (see text for details). [sent-100, score-0.354]
49 A similar approach is employed by DeNero and Klein (2008) for finding optimal phrase-based alignments for MT. [sent-102, score-0.189]
50 4.1 Alignment experiments For evaluation purposes, we compare the performance of approximate search decoding against exact ILP-based decoding on a reimplementation of MANLI as described in §3. [sent-104, score-0.658]
51 All models are trained on the Microsoft Research RTE2 alignment corpus (cf. §3.1). [sent-106, score-0.266]
52 Aligner performance is determined by counting aligned token pairs per problem and macro-averaging over all problems. [sent-111, score-0.049]
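A small sketch of this evaluation; representing each problem as a (predicted, gold) pair of sets of aligned token-index pairs is an assumption about the bookkeeping, not the authors' code.

```python
def macro_prf(problems):
    """problems: list of (predicted, gold) sets of aligned token-index pairs."""
    ps, rs, fs = [], [], []
    for pred, gold in problems:
        tp = len(pred & gold)                      # correctly aligned pairs
        p = tp / len(pred) if pred else 1.0
        r = tp / len(gold) if gold else 1.0
        f = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
        ps.append(p); rs.append(r); fs.append(f)
    k = len(problems)
    return sum(ps) / k, sum(rs) / k, sum(fs) / k   # macro-averages
```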
53 Our reimplementation slightly outperforms the original results of MacCartney et al. (2008), gaining 2% in precision, 1% in recall and 2-3% in the fraction of alignments that exactly matched human annotations. [sent-114, score-0.155]
54 We attribute at least some part of this gain to our modified parameter inference (cf. [sent-115, score-0.037]
55 §3.3), which avoids normalizing the structured perceptron weights and instead adheres closely to the algorithm of Collins (2002). [sent-117, score-0.036]
56 Although exact decoding improves alignment performance over the approximate search approach, the gain is marginal and not significant. [sent-118, score-0.648]
57 This seems to indicate that the simulated annealing search strategy is fairly effective at avoiding local maxima and finding the highest-scoring alignments. [sent-119, score-0.228]
58 4.2 Runtime experiments Table 2 contains the results from timing alignment tasks over various corpora on the same machine using the models trained as per §4.1. [sent-121, score-0.266]
59 The short hypotheses featured in the RTE2 corpus (averaging 11 words) dampen the effect of the quadratic growth in number of edits with sentence length. [sent-125, score-0.243]
60 The large difference in decoding time illustrates the scaling limitations of the search-based decoder. [sent-128, score-0.243]
61 5 Syntactically-Informed Constraints The use of an integer program for decoding provides us with a convenient mechanism to prevent common alignment errors by introducing additional constraints on edits. [sent-129, score-0.721]
62 For example, function words such as determiners and prepositions are often misaligned just because they occur frequently in many different contexts. [sent-130, score-0.061]
63 Although MANLI makes use of contextual features which consider the similarity of neighboring words around phrase pairs, out-of-context alignments of function words often appear in the output. [sent-131, score-0.242]
64 We address this issue by adding constraints to the integer program from §4 that look at the syntactic structure of T1 and T2 and prevent matching function words from appearing in an alignment unless they are syntactically linked with other words that are aligned. [sent-132, score-0.504]
65 To enforce token-based constraints, we define boolean indicator variables yt for each token t in text snippets T1 and T2 that indicate whether t is involved in a SUB edit or not. [sent-133, score-0.325]
66 The following constraint ensures that yt = 1 if and only if t is covered by a SUB edit that is present in the alignment. [sent-134, score-0.163]
67 $y_t - \sum_{e:\; e\ \mathrm{is\ SUB},\; e \prec t} x_e = 0 \quad \forall t \in T_i,\; i \in \{1,2\}$ We refer to tokens t with $y_t = 1$ as being active in the alignment. [sent-135, score-0.308]
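Continuing the PuLP sketch from §4, this linking constraint can be added as follows; the `kind` attribute and the `covers` helper are the same assumed names as before.

```python
import pulp

def add_token_activity(prob, x, edits, tokens, covers):
    """y[t] = 1 iff token t is covered by a SUB/EQ edit in the alignment,
    enforced via y_t - sum_{e is SUB, e covers t} x_e = 0."""
    y = {t: pulp.LpVariable(f"y_{t}", cat="Binary") for t in tokens}
    for t in tokens:
        prob += (y[t] - pulp.lpSum(x[e] for e in edits
                                   if e.kind in ("SUB", "EQ")
                                   and covers(e, t))) == 0
    return y
```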
68 Constraints can now be applied over any token with a specific part-of-speech (POS) tag in order to ensure that it can only be active if a different token related to it in a dependency parse of the sentence is also active.2 [sent-136, score-0.108] [sent-154, score-0.124]
69 (Footnote 2: Our Python reimplementation closely follows the original Java implementation of MANLI and was optimized for performance; MacCartney et al. (2008) report a decoding time of about 2 seconds per problem.) [sent-138, score-0.217]
70 Table 3: Performance of MANLI-Exact featuring additional modifier (M) and lineage (L) constraints. [sent-151, score-0.104]
71 Figures in boldface are statistically significant over the unconstrained MANLI reimplementation (p ≤ 0.05). [sent-152, score-0.096]
73 We consider the following classes of constraints: Modifier constraints: Tokens t that represent conjunctions, determiners, modals and cardinals can only be active if their parent tokens π(t) are active. [sent-155, score-0.077]
74 $y_t - y_{\pi(t)} \leq 0 \quad \mathrm{if}\ \mathrm{POS}(t) \in \{\mathrm{CC, CD, MD, DT, PDT, WDT}\}$ Lineage constraints: Tokens t that represent prepositions and particles (which are often confused by parsers) can only be active if one of their ancestors α(t) or descendants δ(t) is active. [sent-156, score-0.166]
75 These constraints are less restrictive than the modifier constraints in order to account for attachment errors. [sent-157, score-0.355]
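Under the same sketch, both classes of constraints translate directly into inequalities over the y variables. The modifier POS set follows the text; the preposition/particle tag set and the dependency helpers (`pos`, `parent`, `ancestors`, `descendants`) are assumptions.

```python
import pulp

MODIFIER_POS = {"CC", "CD", "MD", "DT", "PDT", "WDT"}
LINEAGE_POS = {"IN", "RP", "TO"}  # assumed tags for prepositions/particles

def add_syntactic_constraints(prob, y, tokens, pos, parent,
                              ancestors, descendants):
    for t in tokens:
        if pos(t) in MODIFIER_POS and parent(t) is not None:
            # Modifier: t may be active only if its parent is active.
            prob += y[t] - y[parent(t)] <= 0
        elif pos(t) in LINEAGE_POS:
            related = set(ancestors(t)) | set(descendants(t))
            if related:
                # Lineage: t may be active only if some ancestor or
                # descendant is active (looser, tolerating attachment errors).
                prob += y[t] <= pulp.lpSum(y[u] for u in related)
```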
76 5.1 Alignment experiments A TAG-based probabilistic dependency parser (Bangalore et al. [sent-159, score-0.038]
77 , 2009) is used to formulate the above constraints in our experiments. [sent-160, score-0.155]
78 The results are shown in Table 3 and indicate a notable increase in alignment precision, which is to be expected as the constraints specifically seek to exclude poor edits. [sent-161, score-0.478]
79 Most compellingly, the number of perfect alignments produced by the system increases significantly when compared to the unconstrained models from Table 1 (a relative increase of 35% on the test corpus). [sent-163, score-0.292]
80 6 Discussion The results of our evaluation indicate that exact decoding via ILP is a robust and efficient technique for solving alignment problems. [sent-164, score-0.615]
81 Furthermore, the incorporation of simple constraints over a dependency parse can help to shape more accurate alignments. [sent-165, score-0.193]
82 An examination of the alignments produced by our system reveals that many remaining errors can be tackled by the use of named-entity recognition and better paraphrase corpora; this was also noted by MacCartney et al. (2008). [sent-166, score-0.25]
83 In addition, stricter constraints that enforce the alignment of syntactically-related tokens (rather than just their inclusion in the solution) may also yield performance gains. [sent-168, score-0.516]
84 The interaction between the selection of soft features for structured prediction and hard constraints for decoding is an interesting avenue for further research on this task. [sent-170, score-0.465]
85 Specific features that approximate soft variants of these constraints could also be devised but this was not explored here. [sent-172, score-0.223]
86 In addition to the NLI applications considered in this work, we have also employed the MANLI alignment technique to tackle alignment problems that are not inherently asymmetric such as the sentence fusion problems from McKeown et al. (2010). [sent-173, score-0.763]
87 Although the absence of asymmetric alignment features affects performance marginally over the RTE2 dataset, all the performance gains exhibited by exact decoding with constraints appear to be preserved in symmetric settings. [sent-175, score-0.744]
88 7 Conclusion We present a simple exact decoding technique as an alternative to approximate search-based decoding in MANLI that exhibits a twenty-fold improvement in runtime performance in our experiments. [sent-176, score-0.641]
89 In addition, we propose novel syntactically-informed constraints to increase precision. [sent-177, score-0.183]
90 5% in precision and 1% in recall, with a large gain in the number of perfect alignments over the test corpus. [sent-180, score-0.191]
91 Finally, we analyze the alignments produced and suggest that further improvements are possible through careful feature/constraint design, as well as the use of named-entity recognition and additional resources. [sent-181, score-0.191]
92 Srinivas Bangalore, Pierre Boullier, Alexis Nasr, Owen Rambow, and Benoît Sagot. 2009. MICA: a probabilistic dependency parser based on tree insertion grammars. In Proceedings of NAACL-HLT (Short Papers). [sent-189, score-0.038]
93 James Clarke and Mirella Lapata. 2008. Global inference for sentence compression: an integer linear programming approach. Journal of Artificial Intelligence Research, 31:399-429. [sent-210, score-0.192]
94 Hal Daumé III and Daniel Marcu. 2005. Learning as search optimization: approximate large margin methods for structured prediction. In Proceedings of ICML. [sent-218, score-0.195]
wordName wordTfidf (topN-words)
[('manli', 0.622), ('alignment', 0.266), ('edits', 0.243), ('maccartney', 0.232), ('decoding', 0.217), ('constraints', 0.155), ('alignments', 0.155), ('nli', 0.148), ('xe', 0.129), ('sub', 0.116), ('yt', 0.102), ('aligner', 0.085), ('integer', 0.083), ('ilp', 0.072), ('exact', 0.069), ('approximate', 0.068), ('fusion', 0.068), ('structured', 0.066), ('edit', 0.061), ('monolingual', 0.06), ('lineage', 0.059), ('maxxxe', 0.059), ('systemdatap', 0.059), ('reimplementation', 0.059), ('paraphrase', 0.059), ('bill', 0.055), ('thadani', 0.052), ('woodsend', 0.052), ('mckeown', 0.05), ('token', 0.049), ('filippova', 0.048), ('kapil', 0.045), ('maxima', 0.045), ('entailment', 0.045), ('modifier', 0.045), ('denero', 0.043), ('textual', 0.043), ('phrases', 0.042), ('chambers', 0.041), ('tokens', 0.04), ('linear', 0.04), ('recognising', 0.039), ('participates', 0.039), ('premise', 0.038), ('dependency', 0.038), ('inference', 0.037), ('unconstrained', 0.037), ('asymmetric', 0.037), ('active', 0.037), ('perfect', 0.036), ('produced', 0.036), ('clarke', 0.036), ('runtime', 0.036), ('annealing', 0.036), ('eq', 0.036), ('perceptron', 0.036), ('ney', 0.035), ('pascal', 0.035), ('employed', 0.034), ('decode', 0.034), ('determiners', 0.034), ('technique', 0.034), ('brockett', 0.033), ('permits', 0.033), ('similarity', 0.033), ('bangalore', 0.032), ('programming', 0.032), ('strategy', 0.032), ('lp', 0.031), ('martins', 0.031), ('boolean', 0.03), ('simulated', 0.03), ('aligning', 0.03), ('stochastic', 0.03), ('indicate', 0.029), ('mirella', 0.029), ('problems', 0.029), ('search', 0.028), ('enforce', 0.028), ('avoiding', 0.028), ('emnlp', 0.028), ('increase', 0.028), ('liang', 0.028), ('daum', 0.027), ('prepositions', 0.027), ('vogel', 0.027), ('quirk', 0.027), ('prediction', 0.027), ('phrase', 0.027), ('neighboring', 0.027), ('inclusion', 0.027), ('variables', 0.026), ('regard', 0.026), ('searchbased', 0.026), ('kapi', 0.026), ('lilian', 0.026), ('ferro', 0.026), ('syntacticallyinformed', 0.026), ('yto', 0.026), ('thn', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment
Author: Kapil Thadani ; Kathleen McKeown
Abstract: The task of aligning corresponding phrases across two related sentences is an important component of approaches for natural language problems such as textual inference, paraphrase detection and text-to-text generation. In this work, we examine a state-of-the-art structured prediction model for the alignment task which uses a phrase-based representation and is forced to decode alignments using an approximate search approach. We propose instead a straightforward exact decoding technique based on integer linear programming that yields order-of-magnitude improvements in decoding speed. This ILP-based decoding strategy permits us to consider syntactically-informed constraints on alignments which significantly increase the precision of the model.
2 0.20832606 152 acl-2011-How Much Can We Gain from Supervised Word Alignment?
Author: Jinxi Xu ; Jinying Chen
Abstract: Word alignment is a central problem in statistical machine translation (SMT). In recent years, supervised alignment algorithms, which improve alignment accuracy by mimicking human alignment, have attracted a great deal of attention. The objective of this work is to explore the performance limit of supervised alignment under the current SMT paradigm. Our experiments used a manually aligned Chinese-English corpus with 280K words recently released by the Linguistic Data Consortium (LDC). We treated the human alignment as the oracle of supervised alignment. The result is surprising: the gain of human alignment over a state of the art unsupervised method (GIZA++) is less than 1 point in BLEU. Furthermore, we showed the benefit of improved alignment becomes smaller with more training data, implying the above limit also holds for large training conditions.
3 0.14677142 57 acl-2011-Bayesian Word Alignment for Statistical Machine Translation
Author: Coskun Mermer ; Murat Saraclar
Abstract: In this work, we compare the translation performance of word alignments obtained via Bayesian inference to those obtained via expectation-maximization (EM). We propose a Gibbs sampler for fully Bayesian inference in IBM Model 1, integrating over all possible parameter values in finding the alignment distribution. We show that Bayesian inference outperforms EM in all of the tested language pairs, domains and data set sizes, by up to 2.99 BLEU points. We also show that the proposed method effectively addresses the well-known rare word problem in EM-estimated models; and at the same time induces a much smaller dictionary of bilingual word-pairs.
4 0.14601485 221 acl-2011-Model-Based Aligner Combination Using Dual Decomposition
Author: John DeNero ; Klaus Macherey
Abstract: Unsupervised word alignment is most often modeled as a Markov process that generates a sentence f conditioned on its translation e. A similar model generating e from f will make different alignment predictions. Statistical machine translation systems combine the predictions of two directional models, typically using heuristic combination procedures like grow-diag-final. This paper presents a graphical model that embeds two directional aligners into a single model. Inference can be performed via dual decomposition, which reuses the efficient inference algorithms of the directional models. Our bidirectional model enforces a one-to-one phrase constraint while accounting for the uncertainty in the underlying directional models. The resulting alignments improve upon baseline combination heuristics in word-level and phrase-level evaluations.
5 0.14093204 141 acl-2011-Gappy Phrasal Alignment By Agreement
Author: Mohit Bansal ; Chris Quirk ; Robert Moore
Abstract: We propose a principled and efficient phrase-to-phrase alignment model, useful in machine translation as well as other related natural language processing problems. In a hidden semi-Markov model, word-to-phrase and phrase-to-word translations are modeled directly by the system. Agreement between two directional models encourages the selection of parsimonious phrasal alignments, avoiding the overfitting commonly encountered in unsupervised training with multi-word units. Expanding the state space to include “gappy phrases” (such as French ne … pas) makes the alignment space more symmetric; thus, it allows agreement between discontinuous alignments. The resulting system shows substantial improvements in both alignment quality and translation quality over word-based Hidden Markov Models, while maintaining asymptotically equivalent runtime.
6 0.14009386 155 acl-2011-Hypothesis Mixture Decoding for Statistical Machine Translation
7 0.13031662 225 acl-2011-Monolingual Alignment by Edit Rate Computation on Sentential Paraphrase Pairs
8 0.12621294 339 acl-2011-Word Alignment Combination over Multiple Word Segmentation
9 0.12006404 325 acl-2011-Unsupervised Word Alignment with Arbitrary Features
10 0.11324801 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models
11 0.10784341 43 acl-2011-An Unsupervised Model for Joint Phrase Alignment and Extraction
12 0.10613239 123 acl-2011-Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation
13 0.10565855 205 acl-2011-Learning to Grade Short Answer Questions using Semantic Similarity Measures and Dependency Graph Alignments
14 0.094466947 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment
15 0.094254464 265 acl-2011-Reordering Modeling using Weighted Alignment Matrices
16 0.093742907 340 acl-2011-Word Alignment via Submodular Maximization over Matroids
17 0.091995351 106 acl-2011-Dual Decomposition for Natural Language Processing
18 0.089807853 100 acl-2011-Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation
19 0.087039985 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation
20 0.086740837 144 acl-2011-Global Learning of Typed Entailment Rules
topicId topicWeight
[(0, 0.207), (1, -0.123), (2, 0.052), (3, 0.049), (4, 0.031), (5, 0.015), (6, 0.048), (7, 0.042), (8, -0.038), (9, -0.015), (10, 0.118), (11, 0.199), (12, 0.016), (13, 0.101), (14, -0.147), (15, 0.034), (16, 0.075), (17, -0.036), (18, -0.135), (19, 0.007), (20, -0.011), (21, -0.068), (22, -0.058), (23, 0.071), (24, -0.01), (25, -0.049), (26, 0.004), (27, 0.056), (28, -0.041), (29, 0.046), (30, -0.026), (31, 0.069), (32, -0.01), (33, -0.024), (34, -0.041), (35, 0.021), (36, -0.041), (37, 0.004), (38, -0.024), (39, -0.06), (40, 0.109), (41, 0.055), (42, -0.022), (43, -0.019), (44, 0.087), (45, 0.016), (46, 0.049), (47, -0.047), (48, -0.003), (49, 0.042)]
simIndex simValue paperId paperTitle
same-paper 1 0.95995563 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment
Author: Kapil Thadani ; Kathleen McKeown
Abstract: The task of aligning corresponding phrases across two related sentences is an important component of approaches for natural language problems such as textual inference, paraphrase detection and text-to-text generation. In this work, we examine a state-of-the-art structured prediction model for the alignment task which uses a phrase-based representation and is forced to decode alignments using an approximate search approach. We propose instead a straightforward exact decoding technique based on integer linear programming that yields order-of-magnitude improvements in decoding speed. This ILP-based decoding strategy permits us to consider syntactically-informed constraints on alignments which significantly increase the precision of the model.
2 0.79481375 221 acl-2011-Model-Based Aligner Combination Using Dual Decomposition
Author: John DeNero ; Klaus Macherey
Abstract: Unsupervised word alignment is most often modeled as a Markov process that generates a sentence f conditioned on its translation e. A similar model generating e from f will make different alignment predictions. Statistical machine translation systems combine the predictions of two directional models, typically using heuristic combination procedures like grow-diag-final. This paper presents a graphical model that embeds two directional aligners into a single model. Inference can be performed via dual decomposition, which reuses the efficient inference algorithms of the directional models. Our bidirectional model enforces a one-to-one phrase constraint while accounting for the uncertainty in the underlying directional models. The resulting alignments improve upon baseline combination heuristics in word-level and phrase-level evaluations.
3 0.70092762 152 acl-2011-How Much Can We Gain from Supervised Word Alignment?
Author: Jinxi Xu ; Jinying Chen
Abstract: Word alignment is a central problem in statistical machine translation (SMT). In recent years, supervised alignment algorithms, which improve alignment accuracy by mimicking human alignment, have attracted a great deal of attention. The objective of this work is to explore the performance limit of supervised alignment under the current SMT paradigm. Our experiments used a manually aligned Chinese-English corpus with 280K words recently released by the Linguistic Data Consortium (LDC). We treated the human alignment as the oracle of supervised alignment. The result is surprising: the gain of human alignment over a state of the art unsupervised method (GIZA++) is less than 1 point in BLEU. Furthermore, we showed the benefit of improved alignment becomes smaller with more training data, implying the above limit also holds for large training conditions.
4 0.69974387 141 acl-2011-Gappy Phrasal Alignment By Agreement
Author: Mohit Bansal ; Chris Quirk ; Robert Moore
Abstract: We propose a principled and efficient phrase-to-phrase alignment model, useful in machine translation as well as other related natural language processing problems. In a hidden semi-Markov model, word-to-phrase and phrase-to-word translations are modeled directly by the system. Agreement between two directional models encourages the selection of parsimonious phrasal alignments, avoiding the overfitting commonly encountered in unsupervised training with multi-word units. Expanding the state space to include “gappy phrases” (such as French ne … pas) makes the alignment space more symmetric; thus, it allows agreement between discontinuous alignments. The resulting system shows substantial improvements in both alignment quality and translation quality over word-based Hidden Markov Models, while maintaining asymptotically equivalent runtime.
5 0.6838811 106 acl-2011-Dual Decomposition for Natural Language Processing
Author: Alexander M. Rush and Michael Collins
Abstract: unkown-abstract
6 0.68321645 93 acl-2011-Dealing with Spurious Ambiguity in Learning ITG-based Word Alignment
7 0.68251359 339 acl-2011-Word Alignment Combination over Multiple Word Segmentation
8 0.63499427 340 acl-2011-Word Alignment via Submodular Maximization over Matroids
9 0.60933405 325 acl-2011-Unsupervised Word Alignment with Arbitrary Features
10 0.59515643 265 acl-2011-Reordering Modeling using Weighted Alignment Matrices
11 0.57809162 123 acl-2011-Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation
12 0.57647038 100 acl-2011-Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation
13 0.56519663 57 acl-2011-Bayesian Word Alignment for Statistical Machine Translation
14 0.56282628 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models
15 0.53469241 335 acl-2011-Why Initialization Matters for IBM Model 1: Multiple Optima and Non-Strict Convexity
16 0.53451186 220 acl-2011-Minimum Bayes-risk System Combination
17 0.52764666 155 acl-2011-Hypothesis Mixture Decoding for Statistical Machine Translation
18 0.52421635 205 acl-2011-Learning to Grade Short Answer Questions using Semantic Similarity Measures and Dependency Graph Alignments
19 0.45976114 217 acl-2011-Machine Translation System Combination by Confusion Forest
20 0.45537764 146 acl-2011-Goodness: A Method for Measuring Machine Translation Confidence
topicId topicWeight
[(5, 0.047), (17, 0.049), (26, 0.02), (28, 0.205), (31, 0.012), (37, 0.08), (39, 0.056), (41, 0.078), (53, 0.028), (55, 0.041), (59, 0.075), (72, 0.046), (91, 0.033), (96, 0.122), (97, 0.011), (98, 0.025)]
simIndex simValue paperId paperTitle
1 0.80644441 267 acl-2011-Reversible Stochastic Attribute-Value Grammars
Author: Daniel de Kok ; Barbara Plank ; Gertjan van Noord
Abstract: An attractive property of attribute-value grammars is their reversibility. Attribute-value grammars are usually coupled with separate statistical components for parse selection and fluency ranking. We propose reversible stochastic attribute-value grammars, in which a single statistical model is employed both for parse selection and fluency ranking.
same-paper 2 0.79832768 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment
Author: Kapil Thadani ; Kathleen McKeown
Abstract: The task of aligning corresponding phrases across two related sentences is an important component of approaches for natural language problems such as textual inference, paraphrase detection and text-to-text generation. In this work, we examine a state-of-the-art structured prediction model for the alignment task which uses a phrase-based representation and is forced to decode alignments using an approximate search approach. We propose instead a straightforward exact decoding technique based on integer linear programming that yields order-of-magnitude improvements in decoding speed. This ILP-based decoding strategy permits us to consider syntactically-informed constraints on alignments which significantly increase the precision of the model.
3 0.78810811 188 acl-2011-Judging Grammaticality with Tree Substitution Grammar Derivations
Author: Matt Post
Abstract: In this paper, we show that local features computed from the derivations of tree substitution grammars, such as the identity of particular fragments and a count of large and small fragments, are useful in binary grammatical classification tasks. Such features outperform n-gram features and various model scores by a wide margin. Although they fall short of the performance of the hand-crafted feature set of Charniak and Johnson (2005) developed for parse tree reranking, they do so with an order of magnitude fewer features. Furthermore, since the TSGs employed are learned in a Bayesian setting, the use of their derivations can be viewed as the automatic discovery of tree patterns useful for classification. On the BLLIP dataset, we achieve an accuracy of 89.9% in discriminating between grammatical text and samples from an n-gram language model.
4 0.73075366 309 acl-2011-Transition-based Dependency Parsing with Rich Non-local Features
Author: Yue Zhang ; Joakim Nivre
Abstract: Transition-based dependency parsers generally use heuristic decoding algorithms but can accommodate arbitrarily rich feature representations. In this paper, we show that we can improve the accuracy of such parsers by considering even richer feature sets than those employed in previous systems. In the standard Penn Treebank setup, our novel features improve attachment score from 91.4% to 92.9%, giving the best results so far for transition-based parsing and rivaling the best results overall. For the Chinese Treebank, they give a significant improvement of the state of the art. An open source release of our parser is freely available.
5 0.68154472 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
Author: Joel Lang ; Mirella Lapata
Abstract: In this paper we describe an unsupervised method for semantic role induction which holds promise for relieving the data acquisition bottleneck associated with supervised role labelers. We present an algorithm that iteratively splits and merges clusters representing semantic roles, thereby leading from an initial clustering to a final clustering of better quality. The method is simple, surprisingly effective, and allows to integrate linguistic knowledge transparently. By combining role induction with a rule-based component for argument identification we obtain an unsupervised end-to-end semantic role labeling system. Evaluation on the CoNLL 2008 benchmark dataset demonstrates that our method outperforms competitive unsupervised approaches by a wide margin.
6 0.6803031 300 acl-2011-The Surprising Variance in Shortest-Derivation Parsing
7 0.67702264 282 acl-2011-Shift-Reduce CCG Parsing
8 0.67336738 269 acl-2011-Scaling up Automatic Cross-Lingual Semantic Role Annotation
9 0.67025554 164 acl-2011-Improving Arabic Dependency Parsing with Form-based and Functional Morphological Features
10 0.67019248 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment
11 0.66992486 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction
12 0.66899002 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization
13 0.66867906 65 acl-2011-Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction
14 0.66762769 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing
15 0.6663537 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
16 0.66594625 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing
17 0.66536814 119 acl-2011-Evaluating the Impact of Coder Errors on Active Learning
18 0.66308367 128 acl-2011-Exploring Entity Relations for Named Entity Disambiguation
19 0.66242796 209 acl-2011-Lexically-Triggered Hidden Markov Models for Clinical Document Coding
20 0.66203701 307 acl-2011-Towards Tracking Semantic Change by Visual Analytics