acl acl2013 acl2013-9 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Xuchen Yao ; Benjamin Van Durme ; Chris Callison-Burch ; Peter Clark
Abstract: Fast alignment is essential for many natural language tasks. But in the setting of monolingual alignment, previous work has not been able to align more than one sentence pair per second. We describe a discriminatively trained monolingual word aligner that uses a Conditional Random Field to globally decode the best alignment with features drawn from source and target sentences. Using just part-of-speech tags and WordNet as external resources, our aligner gives state-of-the-art result, while being an order-of-magnitude faster than the previous best performing system.
Reference: text
sentIndex sentText sentNum sentScore
1 A Lightweight and High Performance Monolingual Word Aligner Xuchen Yao and Benjamin Van Durme Johns Hopkins University Baltimore, MD, USA Chris Callison-Burch∗ University of Pennsylvania Philadelphia, PA, USA Abstract Fast alignment is essential for many natural language tasks. [sent-1, score-0.292]
2 But in the setting of monolingual alignment, previous work has not been able to align more than one sentence pair per second. [sent-2, score-0.296]
3 We describe a discriminatively trained monolingual word aligner that uses a Conditional Random Field to globally decode the best alignment with features drawn from source and target sentences. [sent-3, score-1.018]
4 Using just part-of-speech tags and WordNet as external resources, our aligner gives state-of-the-art result, while being an order-of-magnitude faster than the previous best performing system. [sent-4, score-0.535]
5 1 Introduction In statistical machine translation, alignment is typically done as a one-off task during training. [sent-5, score-0.292]
6 However for monolingual tasks, like recognizing textual entailment or question answering, alignment happens repeatedly: once or multiple times per test item. [sent-6, score-0.596]
7 Therefore, the efficiency of the aligner is of utmost importance for monolingual alignment tasks. [sent-7, score-0.781]
8 These distinctions suggest a model design that utilizes arbitrary features (to make use of word similarity measure and lexical resources) and exploits deeper sentence structures (especially in the case of major languages where robust parsers are available). [sent-9, score-0.159]
, 2008), used roughly 5GB of lexical resources and took 2 seconds per alignment, making it hard to deploy and run at large scale. [sent-14, score-0.128]
The TED approach (§4.2) is able to align 10,000 pairs per second when the sentences are pre-parsed, but with significantly reduced performance. [sent-18, score-0.126] [sent-19, score-0.035]
12 Trying to embrace the merits of both worlds, we introduce a discriminative aligner that is able to align tens to hundreds of sentence pairs per second, and needs access only to a POS tagger and WordNet. [sent-20, score-0.585]
13 This aligner gives state-of-the-art performance on the MSR RTE2 alignment dataset (Brockett, 2007), is faster than previous work, and we release it publicly as the first open-source monolingual word aligner: Jacana. [sent-21, score-0.871]
2 Related Work The MANLI aligner (MacCartney et al. [sent-23, score-0.354]
15 , 2008) was first proposed to align premise and hypothesis sentences for the task of natural language inference. [sent-24, score-0.236]
16 It applies perceptron learning and handles phrase-based alignment of arbitrary phrase lengths. [sent-25, score-0.292]
Thadani and McKeown (2011) optimized this model by decoding via Integer Linear Programming (ILP). [sent-26, score-0.069]
18 With extra syntactic constraints added, the exact alignment match rate for whole sentence pairs was also significantly improved. [sent-28, score-0.478]
Heilman and Smith (2010) used tree kernels to search for the alignment that yields the lowest tree edit distance. [sent-32, score-0.359] [sent-36, score-0.192]
(Footnote 1: http://code.google.com/p/jacana/) [sent-34, score-0.035]
22 Other tree or graph matching work for alignment includes that of (Punyakanok et al. [sent-37, score-0.359]
Finally, feature and model design in monolingual alignment is often inspired by bilingual work, including distortion modeling, phrasal alignment, syntactic constraints, etc. (Och and Ney, 2003; DeNero and Klein, 2007; Bansal et al. [sent-40, score-0.527]
24 1 Model Design Our work is heavily influenced by the bilingual alignment literature, especially the discriminative model proposed by Blunsom and Cohn (2006). [sent-43, score-0.292]
Given a source sentence s of length M, and a target sentence t of length N, the alignment from s to t is a sequence of target word indices a, where $a_m \in [0, N]$. [sent-44, score-0.767]
We specify that when $a_m = 0$, source word $s_m$ is aligned to NULL (i.e., deleted). [sent-45, score-0.069]
27 This models a many-to-one alignment from source to target. [sent-49, score-0.361]
28 Multiple source words can be aligned to the same target word, but not vice versa. [sent-50, score-0.241]
29 One-to-many alignment can be obtained by running the aligner in the other direction. [sent-51, score-0.646]
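To make this representation concrete, here is a minimal sketch in Scala (the paper's implementation language). This is our own illustration with hypothetical names, not the released Jacana code:

```scala
// Many-to-one alignment: a(m) is the 1-based target index linked to the
// m-th source word (1-based); 0 denotes the NULL state (word deleted).
case class Alignment(a: Vector[Int], srcLen: Int, tgtLen: Int) {
  require(a.length == srcLen && a.forall(i => i >= 0 && i <= tgtLen))

  // Aligned (source, target) index pairs, skipping NULL links.
  def pairs: Set[(Int, Int)] =
    a.zipWithIndex.collect { case (t, m) if t > 0 => (m + 1, t) }.toSet
}

// Two source words may share target word 1, but each source word carries
// exactly one link; one-to-many comes from running the reverse direction.
val s2t = Alignment(Vector(1, 1, 0, 3), srcLen = 4, tgtLen = 3)
```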
The probability of alignment sequence a conditioned on both s and t is then: $p(a \mid s, t) = \frac{\exp\left(\sum_{m \in [1,M]} \sum_k \lambda_k f_k(a_{m-1}, a_m, s, t)\right)}{Z(s, t)}$. This assumes a first-order Conditional Random Field (Lafferty et al. [sent-52, score-0.292]
Instead of directly optimizing F1, we employ softmax-margin training (Gimpel and Smith, 2010) and add a cost function to the normalizing function Z(s, t) in the denominator, which becomes: $\sum_{\hat{a}} \exp\left(\sum_{m,k} \lambda_k f_k(\hat{a}_{m-1}, \hat{a}_m, s, t) + \mathrm{cost}(a_t, \hat{a})\right)$, where $a_t$ is the true alignment. [sent-55, score-0.095]
The cost is only computed during training, in the denominator, because $\mathrm{cost}(a_t, a_t) = 0$ in the numerator. [sent-57, score-0.047]
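One common instantiation, which decomposes over positions and so stays compatible with dynamic-programming training, is a Hamming-type cost. This is our illustrative assumption of a standard choice in softmax-margin CRFs, not a definition quoted from the paper:

$$\mathrm{cost}(a_t, \hat{a}) = \sum_{m=1}^{M} \mathbf{1}\left[\hat{a}_m \neq a_{t,m}\right]$$

Because it decomposes over m, it can be folded into the per-position potentials when computing the denominator.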
One distinction of this alignment model compared to other commonly defined CRFs is that the input is two-dimensional: at each position m, the model inspects both the entire sequence of source words (as the observation) and target words (whose offset indices are states). [sent-59, score-0.509]
The other distinction is that the size of its state space is not fixed (e.g., unlike POS tagging, where states are for instance 45 Penn Treebank tags), but depends on N, the length of the target sentence. [sent-60, score-0.038] [sent-62, score-0.139]
Thus we cannot "memorize" what features are mostly associated with what states. [sent-63, score-0.039]
37 For instance, in the task of tagging mail addresses, a feature of “5 consecutive digits” is highly indicative of a POSTCODE. [sent-64, score-0.056]
However, in the alignment model, it does not make sense to design features based on a hard-coded state, say, a feature of "source word lemma matching target word lemma" that fires for state index 6. [sent-65, score-0.59]
39 To avoid this data sparsity problem, all features are defined implicitly with respect to the state. [sent-66, score-0.039]
For instance: $f_k(a_{m-1}, a_m, s, t) = \begin{cases} 1 & \text{lemmas match: } s_m, t_{a_m} \\ 0 & \text{otherwise} \end{cases}$ Thus this feature fires for, e.g., any source/target word pair whose lemmas match, regardless of the state index. [sent-67, score-0.087]
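As an illustration, such a state-relative feature can be sketched as below (our own reconstruction in Scala; `lemma` is a hypothetical toy stub, not the paper's lemmatizer). The indicator inspects the source word at position m and the target word at the current state, so one shared weight covers all state indices:

```scala
// Implicitly state-dependent feature: no target index is hard-coded; only
// the relation between the source word s(m) and the target word t(aCur).
// `lemma` is a toy stub standing in for a real lemmatizer.
def lemma(w: String): String =
  w.toLowerCase.stripSuffix("ing").stripSuffix("ed").stripSuffix("s")

// m is the 1-based source position; aCur is the current state (0 = NULL).
def lemmaMatch(aPrev: Int, aCur: Int, s: IndexedSeq[String],
               t: IndexedSeq[String], m: Int): Double =
  if (aCur > 0 && lemma(s(m - 1)) == lemma(t(aCur - 1))) 1.0 else 0.0
```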
Also, two binary features are added for identical match and identical match ignoring case. [sent-72, score-0.181]
42 POS Tags Features are binary indicators of whether the POS tags of two words match. [sent-73, score-0.05]
43 Also, a “possrc2postgt” feature fires for each word pair, with respect to their POS tags. [sent-74, score-0.087]
44 , “vbz2nn”, when a verb such as arrests aligns with a noun such as custody. [sent-77, score-0.043]
Positional Feature is a real-valued feature for the positional difference of the source and target word: $\mathrm{abs}\left(\frac{m}{M} - \frac{a_m}{N}\right)$. [sent-78, score-0.23]
Distortion Features measure how far apart the aligned target words of two consecutive source words are: $\mathrm{abs}(a_{m-1} + 1 - a_m)$. [sent-81, score-0.297] This learns a general pattern of whether the two target words aligned with two consecutive source words are usually far away from each other, or very close. [sent-82, score-0.297]
We also added special features for corner cases where the current word starts or ends the source sentence, or both the previous and current words are deleted (a transition from NULL to NULL). [sent-83, score-0.108]
49 Contextual Features indicate whether the left or the right neighbor of the source word and aligned target word are identical or similar. [sent-84, score-0.241]
This helps especially when aligning functional words, which usually have multiple candidate target functional words to align to and for which string similarity features cannot help. [sent-85, score-0.347]
51 We also added features for neighboring POS tags matching. [sent-86, score-0.089]
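The positional, distortion, and contextual features above reduce to a few arithmetic and lookup operations. Below is a minimal Scala sketch under our reading of the definitions; the names and the exact neighbor windows are our assumptions:

```scala
// Positional: relative-offset difference of source position m and state aCur.
def positional(m: Int, aCur: Int, M: Int, N: Int): Double =
  math.abs(m.toDouble / M - aCur.toDouble / N)

// Distortion: deviation of the transition (aPrev -> aCur) from the
// monotone case aCur = aPrev + 1.
def distortion(aPrev: Int, aCur: Int): Double =
  math.abs(aPrev + 1 - aCur).toDouble

// Contextual: left neighbors of the source word and its aligned target word
// are identical (analogous indicators exist for right neighbors and POS tags).
def leftNeighborMatch(m: Int, aCur: Int, s: IndexedSeq[String],
                      t: IndexedSeq[String]): Double =
  if (m > 1 && aCur > 1 && s(m - 2) == t(aCur - 2)) 1.0 else 0.0
```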
3 Symmetrization To expand from many-to-one alignment to many-to-many, we ran the model in both directions and applied the following symmetrization heuristics (Koehn, 2010): INTERSECTION, UNION, GROW-DIAG-FINAL. [sent-88, score-0.372]
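In terms of link sets, the two simpler heuristics look like the sketch below (our own illustration; GROW-DIAG-FINAL additionally grows the intersection toward the union along neighboring links, see Koehn, 2010):

```scala
// Symmetrization of two directional alignments over (source, target) links.
// Links from the T2S run are stored as (target, source) and flipped first.
def intersectLinks(s2t: Set[(Int, Int)], t2s: Set[(Int, Int)]): Set[(Int, Int)] =
  s2t intersect t2s.map(_.swap)

def unionLinks(s2t: Set[(Int, Int)], t2s: Set[(Int, Int)]): Set[(Int, Int)] =
  s2t union t2s.map(_.swap)
```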
53 1 Setup Since no generic off-the-shelf CRF software is designed to handle the special case of dynamic state indices and feature functions (Blunsom and Cohn, 2006), we implemented this aligner model in the Scala programming language, which is fully interoperable with Java. [sent-90, score-0.45]
OpenNLP provided the POS tagger and JWNL interfaced with WordNet (Fellbaum, 1998). [sent-92, score-0.07]
Training and test data (Brockett, 2007) each contain 800 manually aligned premise and hypothesis pairs from RTE2. [sent-95, score-0.235]
We take the premise as the source and hypothesis as the target, and use S2T to indicate the model aligns from source to target and T2S from target to source. [sent-97, score-0.265] [sent-102, score-0.249]
(Footnote 3: We found that each word has to be POS tagged to get an accurate relation, otherwise this feature will not help.)
One was GIZA++, with the INTERSECTION trick post-applied, which worked the best among all other symmetrization heuristics. [sent-105, score-0.08]
We used uniform costs for deletion, insertion and substitution, and applied a dynamic programming algorithm (Zhang and Shasha, 1989) to decode the tree edit sequence with the minimal cost, based on the Stanford dependency tree (De Marneffe and Manning, 2008). [sent-108, score-0.393]
This non-probabilistic approach turned out to be extremely fast, processing about 10,000 sentence pairs per second with pre-parsed trees, and performing quantitatively better than the Stanford RTE aligner (Chambers et al. [sent-109, score-0.432]
(2008), and then improved by Thadani and McKeown (2011) with faster and exact decoding via ILP. [sent-113, score-0.239]
4 Results Following Thadani and McKeown (2011), performance is evaluated by macro-averaged precision, recall, F1 of aligned token pairs, and exact (perfect) match rate for a whole pair, shown in Table 1. [sent-121, score-0.233]
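Concretely, these metrics can be computed from gold and predicted link sets per sentence pair and then macro-averaged over the corpus; a minimal sketch of our own, following the standard definitions:

```scala
// Per-pair precision/recall/F1 over aligned token pairs, plus exact match;
// macro-averaging means averaging these numbers over all sentence pairs.
def score(gold: Set[(Int, Int)],
          pred: Set[(Int, Int)]): (Double, Double, Double, Boolean) = {
  val tp = (gold intersect pred).size.toDouble
  val p  = if (pred.isEmpty) 0.0 else tp / pred.size
  val r  = if (gold.isEmpty) 0.0 else tp / gold.size
  val f1 = if (p + r == 0.0) 0.0 else 2.0 * p * r / (p + r)
  (p, r, f1, gold == pred)
}
```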
63 As our baselines, GIZA++ (with alignment intersection of two directions) and TED are on par with previously reported results using the Stanford RTE aligner. [sent-122, score-0.363]
64 The MANLI-family of systems provide stronger baselines, notably MANLIconstraint, which has the best F1 and exact match rate among themselves. [sent-123, score-0.151]
65 We ran our aligner in two directions: S2T and T2S, then merged the results with INTERSECTION, UNION and GROW-DIAG-FINAL. [sent-124, score-0.354]
Systems marked with ∗ are reported by MacCartney et al.; exact (perfect) match is measured over whole pairs. [sent-175, score-0.058]
There is an imbalance in exact match rate between S2T and T2S, with a difference of 9. [sent-180, score-0.151]
68 When aligning from source (longer) to target (shorter), multiple source words can align to the same target word. [sent-182, score-0.494]
69 This is not desirable since multiple duplicate “light” words are aligned to the same “light” word in the target, which breaks perfect match. [sent-183, score-0.17]
When aligning T2S, this problem goes away: the shorter target sentence contains fewer duplicate words, and in most cases there is a one-to-one mapping. [sent-184, score-0.263]
71 5 Runtime Test Table 2 shows the runtime comparison. [sent-187, score-0.073]
Since the RTE2 corpus is imbalanced, with a premise length (in words) of 29 and a hypothesis length of 11, we also compare on the corpus of FUSION (McKeown et al. [sent-188, score-0.251]
is the slowest, with quadratic growth in the number of edits with sentence length. [sent-191, score-0.052]
This work has a precise O(MN²) decoding time, with M the source sentence length and N the target sentence length. [sent-193, score-0.364]
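The O(MN²) bound follows directly from first-order Viterbi decoding: M source positions, N+1 states (target indices plus NULL) per position, and N+1 predecessor states per transition. A schematic Scala sketch of our own, with the feature score collapsed into a placeholder `score` function and the start transition simplified:

```scala
// First-order Viterbi over target indices 0..N (0 = NULL): O(M * N^2).
// score(aPrev, aCur, m) stands in for sum_k lambda_k * f_k(aPrev, aCur, s, t).
def viterbi(M: Int, N: Int, score: (Int, Int, Int) => Double): Vector[Int] = {
  val delta = Array.fill(M + 1, N + 1)(Double.NegativeInfinity)
  val back  = Array.fill(M + 1, N + 1)(0)
  for (j <- 0 to N) delta(0)(j) = 0.0  // uniform start, simplified
  for (m <- 1 to M; cur <- 0 to N; prev <- 0 to N) {
    val v = delta(m - 1)(prev) + score(prev, cur, m)
    if (v > delta(m)(cur)) { delta(m)(cur) = v; back(m)(cur) = prev }
  }
  var cur = (0 to N).maxBy(j => delta(M)(j))  // best final state
  val a = Array.fill(M)(0)
  for (m <- M to 1 by -1) { a(m - 1) = cur; cur = back(m)(cur) }
  a.toVector
}
```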
Table 2: Alignment runtime in seconds per sentence pair on two corpora: RTE2 (Cohn et al. [sent-203, score-0.192]
The runtime for this work takes the longest timing from S2T and T2S, on a Xeon 2.2GHz with 4MB cache (the closest we can find to match their hardware). [sent-208, score-0.073] [sent-209, score-0.071]
Horizontally, in a real-world application where sentences have similar length, this work is roughly 20x faster (0. [sent-210, score-0.09]
79 Vertically, the decoding time for our work increases less dramatically when sentence length increases (0. [sent-214, score-0.153]
Our aligner is at least another twenty-fold faster than MANLI-exact when the sentences are longer and balanced. [sent-238, score-0.444]
81 We also benefit from shallower pre-processing (no parsing) and can store all resources in main memory. [sent-239, score-0.044]
6 Ablation Test Since WordNet and the POS tagger are the only external resources used, we removed them (footnote 8) from the feature sets and report performance in Table 3. [sent-241, score-0.111]
The model then falls back to relying on string similarities, distortion, positional and contextual features, which are almost language-independent. [sent-243, score-0.113]
84 A loss of less than 1% in F1 suggests that the aligner can still run reasonably well without a POS tagger and WordNet. [sent-244, score-0.424]
Footnote 7: WordNet (~30MB) has a smaller footprint than the 5GB of external resources used by MANLI. [sent-245, score-0.085]
Footnote 8: When we removed the POS tagger, we enumerated all POS tags for a word to find its hypernym/synonym/. [sent-248, score-0.05]
87 Token-based paraphrases that are not covered by WordNet, such as program and software, business and venture. [sent-254, score-0.052]
larger, noisier resources in exchange for higher precision vs. [sent-263, score-0.044]
We think this is an application-specific decision; other resources could be easily incorporated into our model, which we may explore in the future to examine the trade-off in addressing items 1 and 2. [sent-265, score-0.044]
90 5 Conclusion We presented a model for monolingual sentence alignment that gives state-of-the-art performance, and is significantly faster than prior work. [sent-266, score-0.552]
We release our implementation as the first open-source monolingual aligner, which we hope will be of benefit to other researchers in the rapidly expanding area of natural language inference. [sent-267, score-0.135]
92 html) in the supporting material that compares the gold alignment and test output; readers are encouraged to try it out. [sent-276, score-0.292]
93 Tree edit models for recognizing textual entailments, paraphrases, and answers to questions. [sent-332, score-0.206]
94 Automatic cost estimation for tree edit distance using particle swarm optimization. [sent-368, score-0.323]
95 Aligning predicates across monolingual comparable texts using graph-based clustering. [sent-381, score-0.135]
96 Probabilistic tree-edit models with structured latent variables for textual entailment and question answering. [sent-390, score-0.091]
97 Simple fast algorithms for the editing distance between trees and related problems. [sent-396, score-0.036]
wordName wordTfidf (topN-words)
[('aligner', 0.354), ('alignment', 0.292), ('thadani', 0.291), ('manli', 0.238), ('mckeown', 0.173), ('monolingual', 0.135), ('maccartney', 0.134), ('edit', 0.125), ('pos', 0.121), ('premise', 0.116), ('rte', 0.113), ('wordnet', 0.113), ('kouylekov', 0.105), ('fusion', 0.097), ('cost', 0.095), ('aligning', 0.093), ('target', 0.09), ('faster', 0.09), ('fires', 0.087), ('punyakanok', 0.087), ('align', 0.083), ('aligned', 0.082), ('exact', 0.08), ('symmetrization', 0.08), ('ilp', 0.078), ('cohn', 0.073), ('runtime', 0.073), ('heilman', 0.071), ('positional', 0.071), ('intersection', 0.071), ('match', 0.071), ('tagger', 0.07), ('decoding', 0.069), ('source', 0.069), ('tree', 0.067), ('calls', 0.065), ('abs', 0.065), ('substances', 0.065), ('vulcan', 0.065), ('blunsom', 0.064), ('hamming', 0.061), ('chambers', 0.06), ('indices', 0.058), ('het', 0.058), ('brockett', 0.058), ('kapil', 0.058), ('sorensen', 0.058), ('consecutive', 0.056), ('distortion', 0.056), ('xeon', 0.056), ('ted', 0.055), ('dice', 0.053), ('paraphrases', 0.052), ('marneffe', 0.052), ('tence', 0.052), ('tags', 0.05), ('bansal', 0.05), ('length', 0.049), ('null', 0.048), ('roth', 0.048), ('denominator', 0.047), ('conditional', 0.047), ('smith', 0.046), ('union', 0.046), ('gimpel', 0.046), ('textual', 0.046), ('light', 0.046), ('stanford', 0.046), ('magnini', 0.045), ('duplicate', 0.045), ('entailment', 0.045), ('giza', 0.044), ('johns', 0.044), ('resources', 0.044), ('design', 0.044), ('aligns', 0.043), ('denero', 0.043), ('perfect', 0.043), ('per', 0.043), ('string', 0.042), ('mccallum', 0.042), ('distinctions', 0.041), ('seconds', 0.041), ('wo', 0.041), ('external', 0.041), ('decode', 0.039), ('baselines', 0.039), ('features', 0.039), ('state', 0.038), ('hypothesis', 0.037), ('chris', 0.037), ('crfs', 0.037), ('kathleen', 0.037), ('distance', 0.036), ('hopkins', 0.036), ('sentence', 0.035), ('recognizing', 0.035), ('acana', 0.035), ('benefiting', 0.035), ('ctoes', 0.035)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999994 9 acl-2013-A Lightweight and High Performance Monolingual Word Aligner
Author: Xuchen Yao ; Benjamin Van Durme ; Chris Callison-Burch ; Peter Clark
Abstract: Fast alignment is essential for many natural language tasks. But in the setting of monolingual alignment, previous work has not been able to align more than one sentence pair per second. We describe a discriminatively trained monolingual word aligner that uses a Conditional Random Field to globally decode the best alignment with features drawn from source and target sentences. Using just part-of-speech tags and WordNet as external resources, our aligner gives state-of-the-art result, while being an order-of-magnitude faster than the previous best performing system.
2 0.28169578 259 acl-2013-Non-Monotonic Sentence Alignment via Semisupervised Learning
Author: Xiaojun Quan ; Chunyu Kit ; Yan Song
Abstract: This paper studies the problem of nonmonotonic sentence alignment, motivated by the observation that coupled sentences in real bitexts do not necessarily occur monotonically, and proposes a semisupervised learning approach based on two assumptions: (1) sentences with high affinity in one language tend to have their counterparts with similar relatedness in the other; and (2) initial alignment is readily available with existing alignment techniques. They are incorporated as two constraints into a semisupervised learning framework for optimization to produce a globally optimal solution. The evaluation with real-world legal data from a comprehensive legislation corpus shows that while existing alignment algorithms suffer severely from non-monotonicity, this approach can work effectively on both monotonic and non-monotonic data.
3 0.17322887 210 acl-2013-Joint Word Alignment and Bilingual Named Entity Recognition Using Dual Decomposition
Author: Mengqiu Wang ; Wanxiang Che ; Christopher D. Manning
Abstract: Translated bi-texts contain complementary language cues, and previous work on Named Entity Recognition (NER) has demonstrated improvements in performance over monolingual taggers by promoting agreement of tagging decisions between the two languages. However, most previous approaches to bilingual tagging assume word alignments are given as fixed input, which can cause cascading errors. We observe that NER label information can be used to correct alignment mistakes, and present a graphical model that performs bilingual NER tagging jointly with word alignment, by combining two monolingual tagging models with two unidirectional alignment models. We introduce additional cross-lingual edge factors that encourage agreements between tagging and alignment decisions. We design a dual decomposition inference algorithm to perform joint decoding over the combined alignment and NER output space. Experiments on the OntoNotes dataset demonstrate that our method yields significant improvements in both NER and word alignment over state-of-the-art monolingual baselines.
4 0.15428641 267 acl-2013-PARMA: A Predicate Argument Aligner
Author: Travis Wolfe ; Benjamin Van Durme ; Mark Dredze ; Nicholas Andrews ; Charley Beller ; Chris Callison-Burch ; Jay DeYoung ; Justin Snyder ; Jonathan Weese ; Tan Xu ; Xuchen Yao
Abstract: We introduce PARMA, a system for crossdocument, semantic predicate and argument alignment. Our system combines a number of linguistic resources familiar to researchers in areas such as recognizing textual entailment and question answering, integrating them into a simple discriminative model. PARMA achieves state of the art results on an existing and a new dataset. We suggest that previous efforts have focussed on data that is biased and too easy, and we provide a more difficult dataset based on translation data with a low baseline which we beat by 17% F1.
5 0.14201215 388 acl-2013-Word Alignment Modeling with Context Dependent Deep Neural Network
Author: Nan Yang ; Shujie Liu ; Mu Li ; Ming Zhou ; Nenghai Yu
Abstract: In this paper, we explore a novel bilingual word alignment approach based on DNN (Deep Neural Network), which has been proven to be very effective in various machine learning tasks (Collobert et al., 2011). We describe in detail how we adapt and extend the CD-DNN-HMM (Dahl et al., 2012) method introduced in speech recognition to the HMM-based word alignment model, in which bilingual word embedding is discriminatively learnt to capture lexical translation information, and surrounding words are leveraged to model context information in bilingual sentences. While being capable of modeling the rich bilingual correspondence, our method generates a very compact model with much fewer parameters. Experiments on a large-scale English-Chinese word alignment task show that the proposed method outperforms the HMM and IBM model 4 baselines by 2 points in F-score.
7 0.12934071 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models
8 0.12325139 101 acl-2013-Cut the noise: Mutually reinforcing reordering and alignments for improved machine translation
9 0.11681475 47 acl-2013-An Information Theoretic Approach to Bilingual Word Clustering
10 0.11414399 323 acl-2013-Simpler unsupervised POS tagging with bilingual projections
11 0.11307477 40 acl-2013-Advancements in Reordering Models for Statistical Machine Translation
12 0.10998521 361 acl-2013-Travatar: A Forest-to-String Machine Translation Engine based on Tree Transducers
13 0.10890546 276 acl-2013-Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation
14 0.10732938 226 acl-2013-Learning to Prune: Context-Sensitive Pruning for Syntactic MT
15 0.10683675 15 acl-2013-A Novel Graph-based Compact Representation of Word Alignment
16 0.10382136 145 acl-2013-Exploiting Qualitative Information from Automatic Word Alignment for Cross-lingual NLP Tasks
17 0.098161317 136 acl-2013-Enhanced and Portable Dependency Projection Algorithms Using Interlinear Glossed Text
18 0.096610725 354 acl-2013-Training Nondeficient Variants of IBM-3 and IBM-4 for Word Alignment
19 0.095424369 120 acl-2013-Dirt Cheap Web-Scale Parallel Text from the Common Crawl
20 0.093918785 25 acl-2013-A Tightly-coupled Unsupervised Clustering and Bilingual Alignment Model for Transliteration
topicId topicWeight
[(0, 0.264), (1, -0.102), (2, 0.055), (3, 0.008), (4, -0.016), (5, 0.001), (6, -0.107), (7, -0.036), (8, -0.01), (9, -0.062), (10, 0.035), (11, -0.147), (12, -0.024), (13, -0.087), (14, 0.055), (15, 0.045), (16, 0.133), (17, 0.035), (18, 0.031), (19, -0.137), (20, -0.076), (21, -0.012), (22, 0.024), (23, 0.011), (24, -0.011), (25, 0.116), (26, -0.151), (27, 0.026), (28, 0.04), (29, -0.011), (30, -0.136), (31, 0.118), (32, -0.064), (33, 0.013), (34, -0.074), (35, -0.029), (36, 0.021), (37, 0.046), (38, 0.042), (39, -0.002), (40, -0.032), (41, 0.043), (42, -0.005), (43, 0.051), (44, 0.064), (45, -0.012), (46, 0.027), (47, -0.17), (48, 0.089), (49, 0.09)]
simIndex simValue paperId paperTitle
same-paper 1 0.94944388 9 acl-2013-A Lightweight and High Performance Monolingual Word Aligner
Author: Xuchen Yao ; Benjamin Van Durme ; Chris Callison-Burch ; Peter Clark
Abstract: Fast alignment is essential for many natural language tasks. But in the setting of monolingual alignment, previous work has not been able to align more than one sentence pair per second. We describe a discriminatively trained monolingual word aligner that uses a Conditional Random Field to globally decode the best alignment with features drawn from source and target sentences. Using just part-of-speech tags and WordNet as external resources, our aligner gives state-of-the-art result, while being an order-of-magnitude faster than the previous best performing system.
2 0.89639777 259 acl-2013-Non-Monotonic Sentence Alignment via Semisupervised Learning
Author: Xiaojun Quan ; Chunyu Kit ; Yan Song
Abstract: This paper studies the problem of nonmonotonic sentence alignment, motivated by the observation that coupled sentences in real bitexts do not necessarily occur monotonically, and proposes a semisupervised learning approach based on two assumptions: (1) sentences with high affinity in one language tend to have their counterparts with similar relatedness in the other; and (2) initial alignment is readily available with existing alignment techniques. They are incorporated as two constraints into a semisupervised learning framework for optimization to produce a globally optimal solution. The evaluation with real-world legal data from a comprehensive legislation corpus shows that while existing alignment algorithms suffer severely from non-monotonicity, this approach can work effectively on both monotonic and non-monotonic data.
3 0.78051031 145 acl-2013-Exploiting Qualitative Information from Automatic Word Alignment for Cross-lingual NLP Tasks
Author: Jose G.C. de Souza ; Miquel Espla-Gomis ; Marco Turchi ; Matteo Negri
Abstract: The use of automatic word alignment to capture sentence-level semantic relations is common to a number of cross-lingual NLP applications. Despite its proved usefulness, however, word alignment information is typically considered from a quantitative point of view (e.g. the number of alignments), disregarding qualitative aspects (the importance of aligned terms). In this paper we demonstrate that integrating qualitative information can bring significant performance improvements with negligible impact on system complexity. Focusing on the cross-lingual textual entailment task, we contribute with a novel method that: i) significantly outperforms the state of the art, and ii) is portable, with limited loss in performance, to language pairs where training data are not available.
4 0.7674762 15 acl-2013-A Novel Graph-based Compact Representation of Word Alignment
Author: Qun Liu ; Zhaopeng Tu ; Shouxun Lin
Abstract: In this paper, we propose a novel compact representation called weighted bipartite hypergraph to exploit the fertility model, which plays a critical role in word alignment. However, estimating the probabilities of rules extracted from hypergraphs is an NP-complete problem, which is computationally infeasible. Therefore, we propose a divide-and-conquer strategy by decomposing a hypergraph into a set of independent subhypergraphs. The experiments show that our approach outperforms both 1-best and n-best alignments.
5 0.76524884 210 acl-2013-Joint Word Alignment and Bilingual Named Entity Recognition Using Dual Decomposition
Author: Mengqiu Wang ; Wanxiang Che ; Christopher D. Manning
Abstract: Translated bi-texts contain complementary language cues, and previous work on Named Entity Recognition (NER) has demonstrated improvements in performance over monolingual taggers by promoting agreement of tagging decisions between the two languages. However, most previous approaches to bilingual tagging assume word alignments are given as fixed input, which can cause cascading errors. We observe that NER label information can be used to correct alignment mistakes, and present a graphical model that performs bilingual NER tagging jointly with word alignment, by combining two monolingual tagging models with two unidirectional alignment models. We introduce additional cross-lingual edge factors that encourage agreements between tagging and alignment decisions. We design a dual decomposition inference algorithm to perform joint decoding over the combined alignment and NER output space. Experiments on the OntoNotes dataset demonstrate that our method yields significant improvements in both NER and word alignment over state-of-the-art monolingual baselines.
6 0.76213181 354 acl-2013-Training Nondeficient Variants of IBM-3 and IBM-4 for Word Alignment
7 0.66856402 25 acl-2013-A Tightly-coupled Unsupervised Clustering and Bilingual Alignment Model for Transliteration
8 0.637716 143 acl-2013-Exact Maximum Inference for the Fertility Hidden Markov Model
9 0.60308707 388 acl-2013-Word Alignment Modeling with Context Dependent Deep Neural Network
10 0.596008 47 acl-2013-An Information Theoretic Approach to Bilingual Word Clustering
11 0.56134737 48 acl-2013-An Open Source Toolkit for Quantitative Historical Linguistics
12 0.55077058 267 acl-2013-PARMA: A Predicate Argument Aligner
13 0.54604995 323 acl-2013-Simpler unsupervised POS tagging with bilingual projections
14 0.52449048 101 acl-2013-Cut the noise: Mutually reinforcing reordering and alignments for improved machine translation
15 0.49447227 40 acl-2013-Advancements in Reordering Models for Statistical Machine Translation
16 0.48860079 276 acl-2013-Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation
17 0.48074558 382 acl-2013-Variational Inference for Structured NLP Models
18 0.4773356 120 acl-2013-Dirt Cheap Web-Scale Parallel Text from the Common Crawl
19 0.46792412 203 acl-2013-Is word-to-phone mapping better than phone-phone mapping for handling English words?
20 0.46576247 240 acl-2013-Microblogs as Parallel Corpora
topicId topicWeight
[(0, 0.081), (6, 0.079), (11, 0.067), (15, 0.014), (24, 0.037), (26, 0.073), (28, 0.018), (35, 0.074), (42, 0.064), (48, 0.061), (70, 0.059), (84, 0.171), (88, 0.036), (90, 0.024), (95, 0.085)]
simIndex simValue paperId paperTitle
1 0.87553346 16 acl-2013-A Novel Translation Framework Based on Rhetorical Structure Theory
Author: Mei Tu ; Yu Zhou ; Chengqing Zong
Abstract: Rhetorical structure theory (RST) is widely used for discourse understanding, which represents a discourse as a hierarchically semantic structure. In this paper, we propose a novel translation framework with the help of RST. In our framework, the translation process mainly includes three steps: 1) Source RST-tree acquisition: a source sentence is parsed into an RST tree; 2) Rule extraction: translation rules are extracted from the source tree and the target string via bilingual word alignment; 3) RST-based translation: the source RST-tree is translated with translation rules. Experiments on Chinese-to-English show that our RST-based approach achieves improvements of 2.3/0.77/1.43 BLEU points on NIST04/NIST05/CWMT2008 respectively.
same-paper 2 0.85427469 9 acl-2013-A Lightweight and High Performance Monolingual Word Aligner
Author: Xuchen Yao ; Benjamin Van Durme ; Chris Callison-Burch ; Peter Clark
Abstract: Fast alignment is essential for many natural language tasks. But in the setting of monolingual alignment, previous work has not been able to align more than one sentence pair per second. We describe a discriminatively trained monolingual word aligner that uses a Conditional Random Field to globally decode the best alignment with features drawn from source and target sentences. Using just part-of-speech tags and WordNet as external resources, our aligner gives state-of-the-art result, while being an order-of-magnitude faster than the previous best performing system.
3 0.85282588 297 acl-2013-Recognizing Partial Textual Entailment
Author: Omer Levy ; Torsten Zesch ; Ido Dagan ; Iryna Gurevych
Abstract: Textual entailment is an asymmetric relation between two text fragments that describes whether one fragment can be inferred from the other. It thus cannot capture the notion that the target fragment is "almost entailed" by the given text. The recently suggested idea of partial textual entailment may remedy this problem. We investigate partial entailment under the faceted entailment model and the possibility of adapting existing textual entailment methods to this setting. Indeed, our results show that these methods are useful for recognizing partial entailment. We also provide a preliminary assessment of how partial entailment may be used for recognizing (complete) textual entailment.
4 0.81015927 316 acl-2013-SenseSpotting: Never let your parallel data tie you to an old domain
Author: Marine Carpuat ; Hal Daume III ; Katharine Henry ; Ann Irvine ; Jagadeesh Jagarlamudi ; Rachel Rudinger
Abstract: Words often gain new senses in new domains. Being able to automatically identify, from a corpus of monolingual text, which word tokens are being used in a previously unseen sense has applications to machine translation and other tasks sensitive to lexical semantics. We define a task, SENSESPOTTING, in which we build systems to spot tokens that have new senses in new domain text. Instead of difficult and expensive annotation, we build a gold standard by leveraging cheaply available parallel corpora, targeting our approach to the problem of domain adaptation for machine translation. Our system is able to achieve F-measures of as much as 80%, when applied to word types it has never seen before. Our approach is based on a large set of novel features that capture varied aspects of how words change when used in new domains.
5 0.76096469 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
Author: Ulle Endriss ; Raquel Fernandez
Abstract: Crowdsourcing, which offers new ways of cheaply and quickly gathering large amounts of information contributed by volunteers online, has revolutionised the collection of labelled data. Yet, to create annotated linguistic resources from this data, we face the challenge of having to combine the judgements of a potentially large group of annotators. In this paper we investigate how to aggregate individual annotations into a single collective annotation, taking inspiration from the field of social choice theory. We formulate a general formal model for collective annotation and propose several aggregation methods that go beyond the commonly used majority rule. We test some of our methods on data from a crowdsourcing experiment on textual entailment annotation.
6 0.75854123 70 acl-2013-Bilingually-Guided Monolingual Dependency Grammar Induction
7 0.75756246 333 acl-2013-Summarization Through Submodularity and Dispersion
8 0.75146449 276 acl-2013-Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation
9 0.75074941 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization
10 0.74990588 275 acl-2013-Parsing with Compositional Vector Grammars
11 0.74984533 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing
12 0.74868923 131 acl-2013-Dual Training and Dual Prediction for Polarity Classification
13 0.74642837 134 acl-2013-Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction
14 0.74596256 157 acl-2013-Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning
15 0.7449919 318 acl-2013-Sentiment Relevance
16 0.7446357 7 acl-2013-A Lattice-based Framework for Joint Chinese Word Segmentation, POS Tagging and Parsing
17 0.74393696 164 acl-2013-FudanNLP: A Toolkit for Chinese Natural Language Processing
18 0.74383229 36 acl-2013-Adapting Discriminative Reranking to Grounded Language Learning
19 0.7424944 369 acl-2013-Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
20 0.74227786 264 acl-2013-Online Relative Margin Maximization for Statistical Machine Translation