acl acl2013 acl2013-9 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Xuchen Yao ; Benjamin Van Durme ; Chris Callison-Burch ; Peter Clark
Abstract: Fast alignment is essential for many natural language tasks. But in the setting of monolingual alignment, previous work has not been able to align more than one sentence pair per second. We describe a discriminatively trained monolingual word aligner that uses a Conditional Random Field to globally decode the best alignment with features drawn from source and target sentences. Using just part-of-speech tags and WordNet as external resources, our aligner gives state-of-the-art result, while being an order-of-magnitude faster than the previous best performing system.
Reference: text
sentIndex sentText sentNum sentScore
1 A Lightweight and High Performance Monolingual Word Aligner Xuchen Yao and Benjamin Van Durme Johns Hopkins University Baltimore, MD, USA Chris Callison-Burch∗ University of Pennsylvania Philadelphia, PA, USA Abstract Fast alignment is essential for many natural language tasks. [sent-1, score-0.292]
2 But in the setting of monolingual alignment, previous work has not been able to align more than one sentence pair per second. [sent-2, score-0.296]
3 We describe a discriminatively trained monolingual word aligner that uses a Conditional Random Field to globally decode the best alignment with features drawn from source and target sentences. [sent-3, score-1.018]
4 Using just part-of-speech tags and WordNet as external resources, our aligner gives state-of-the-art result, while being an order-of-magnitude faster than the previous best performing system. [sent-4, score-0.535]
5 1 Introduction In statistical machine translation, alignment is typically done as a one-off task during training. [sent-5, score-0.292]
6 However for monolingual tasks, like recognizing textual entailment or question answering, alignment happens repeatedly: once or multiple times per test item. [sent-6, score-0.596]
7 Therefore, the efficiency of the aligner is of utmost importance for monolingual alignment tasks. [sent-7, score-0.781]
8 These distinctions suggest a model design that utilizes arbitrary features (to make use of word similarity measure and lexical resources) and exploits deeper sentence structures (especially in the case of major languages where robust parsers are available). [sent-9, score-0.159]
, 2008), used roughly 5GB of lexical resources and took 2 seconds per alignment, making it hard to deploy and run at large scale. [sent-14, score-0.128]
The TED approach (§4.2) is able to align 10,000 pairs per second when the sentences are pre-parsed, but with significantly reduced performance. [sent-18, score-0.126] [sent-19, score-0.035]
12 Trying to embrace the merits of both worlds, we introduce a discriminative aligner that is able to align tens to hundreds of sentence pairs per second, and needs access only to a POS tagger and WordNet. [sent-20, score-0.585]
13 This aligner gives state-of-the-art performance on the MSR RTE2 alignment dataset (Brockett, 2007), is faster than previous work, and we release it publicly as the first open-source monolingual word aligner: Jacana. [sent-21, score-0.871]
2 Related Work The MANLI aligner (MacCartney et al. [sent-23, score-0.354]
15 , 2008) was first proposed to align premise and hypothesis sentences for the task of natural language inference. [sent-24, score-0.236]
16 It applies perceptron learning and handles phrase-based alignment of arbitrary phrase lengths. [sent-25, score-0.292]
Thadani and McKeown (2011) optimized this model by decoding via Integer Linear Programming (ILP). [sent-26, score-0.069]
18 With extra syntactic constraints added, the exact alignment match rate for whole sentence pairs was also significantly improved. [sent-28, score-0.478]
Heilman and Smith (2010) used tree kernels to search for the alignment that yields the lowest tree edit distance. [sent-32, score-0.359] [sent-36, score-0.192]
(Footnote 1: http://code.google.com/p/jacana/) [sent-34, score-0.035]
22 Other tree or graph matching work for alignment includes that of (Punyakanok et al. [sent-37, score-0.359]
Finally, feature and model design in monolingual alignment is often inspired by bilingual work, including distortion modeling, phrasal alignment, syntactic constraints, etc. (Och and Ney, 2003; DeNero and Klein, 2007; Bansal et al. [sent-40, score-0.527]
24 1 Model Design Our work is heavily influenced by the bilingual alignment literature, especially the discriminative model proposed by Blunsom and Cohn (2006). [sent-43, score-0.292]
Given a source sentence s of length M, and a target sentence t of length N, the alignment from s to t is a sequence of target word indices a, where $a_m \in [0, N]$. [sent-44, score-0.767]
We specify that when $a_m = 0$, source word $s_m$ is aligned to NULL (i.e., deleted). [sent-45, score-0.069]
27 This models a many-to-one alignment from source to target. [sent-49, score-0.361]
28 Multiple source words can be aligned to the same target word, but not vice versa. [sent-50, score-0.241]
29 One-to-many alignment can be obtained by running the aligner in the other direction. [sent-51, score-0.646]
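To make this representation concrete, here is a minimal sketch in Scala (the paper's implementation language). This is our own illustration with hypothetical names, not the released Jacana code:

```scala
// Many-to-one alignment: a(m) is the 1-based target index linked to the
// m-th source word (1-based); 0 denotes the NULL state (word deleted).
case class Alignment(a: Vector[Int], srcLen: Int, tgtLen: Int) {
  require(a.length == srcLen && a.forall(i => i >= 0 && i <= tgtLen))

  // Aligned (source, target) index pairs, skipping NULL links.
  def pairs: Set[(Int, Int)] =
    a.zipWithIndex.collect { case (t, m) if t > 0 => (m + 1, t) }.toSet
}

// Two source words may share target word 1, but each source word carries
// exactly one link; one-to-many comes from running the reverse direction.
val s2t = Alignment(Vector(1, 1, 0, 3), srcLen = 4, tgtLen = 3)
```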
The probability of alignment sequence a conditioned on both s and t is then: $p(a \mid s, t) = \frac{\exp\left(\sum_{m \in [1,M]} \sum_k \lambda_k f_k(a_{m-1}, a_m, s, t)\right)}{Z(s, t)}$. This assumes a first-order Conditional Random Field (Lafferty et al. [sent-52, score-0.292]
Instead of directly optimizing F1, we employ softmax-margin training (Gimpel and Smith, 2010) and add a cost function to the normalizing function Z(s, t) in the denominator, which becomes: $\sum_{\hat{a}} \exp\left(\sum_{m,k} \lambda_k f_k(\hat{a}_{m-1}, \hat{a}_m, s, t) + \mathrm{cost}(a_t, \hat{a})\right)$, where $a_t$ is the true alignment. [sent-55, score-0.095]
The cost is only computed during training, in the denominator, because $\mathrm{cost}(a_t, a_t) = 0$ in the numerator. [sent-57, score-0.047]
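One common instantiation, which decomposes over positions and so stays compatible with dynamic-programming training, is a Hamming-type cost. This is our illustrative assumption of a standard choice in softmax-margin CRFs, not a definition quoted from the paper:

$$\mathrm{cost}(a_t, \hat{a}) = \sum_{m=1}^{M} \mathbf{1}\left[\hat{a}_m \neq a_{t,m}\right]$$

Because it decomposes over m, it can be folded into the per-position potentials when computing the denominator.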
One distinction of this alignment model compared to other commonly defined CRFs is that the input is two-dimensional: at each position m, the model inspects both the entire sequence of source words (as the observation) and target words (whose offset indices are states). [sent-59, score-0.509]
The other distinction is that the size of its state space is not fixed (e.g., unlike POS tagging, where states are for instance 45 Penn Treebank tags), but depends on N, the length of the target sentence. [sent-60, score-0.038] [sent-62, score-0.139]
Thus we cannot "memorize" what features are mostly associated with what states. [sent-63, score-0.039]
37 For instance, in the task of tagging mail addresses, a feature of “5 consecutive digits” is highly indicative of a POSTCODE. [sent-64, score-0.056]
However, in the alignment model, it does not make sense to design features based on a hard-coded state, say, a feature of "source word lemma matching target word lemma" that fires for state index 6. [sent-65, score-0.59]
39 To avoid this data sparsity problem, all features are defined implicitly with respect to the state. [sent-66, score-0.039]
For instance: $f_k(a_{m-1}, a_m, s, t) = \begin{cases} 1 & \text{lemmas match: } s_m, t_{a_m} \\ 0 & \text{otherwise} \end{cases}$ Thus this feature fires for, e.g., any source/target word pair whose lemmas match, regardless of the state index. [sent-67, score-0.087]
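As an illustration, such a state-relative feature can be sketched as below (our own reconstruction in Scala; `lemma` is a hypothetical toy stub, not the paper's lemmatizer). The indicator inspects the source word at position m and the target word at the current state, so one shared weight covers all state indices:

```scala
// Implicitly state-dependent feature: no target index is hard-coded; only
// the relation between the source word s(m) and the target word t(aCur).
// `lemma` is a toy stub standing in for a real lemmatizer.
def lemma(w: String): String =
  w.toLowerCase.stripSuffix("ing").stripSuffix("ed").stripSuffix("s")

// m is the 1-based source position; aCur is the current state (0 = NULL).
def lemmaMatch(aPrev: Int, aCur: Int, s: IndexedSeq[String],
               t: IndexedSeq[String], m: Int): Double =
  if (aCur > 0 && lemma(s(m - 1)) == lemma(t(aCur - 1))) 1.0 else 0.0
```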
Also, two binary features are added for identical match and identical match ignoring case. [sent-72, score-0.181]
42 POS Tags Features are binary indicators of whether the POS tags of two words match. [sent-73, score-0.05]
43 Also, a “possrc2postgt” feature fires for each word pair, with respect to their POS tags. [sent-74, score-0.087]
44 , “vbz2nn”, when a verb such as arrests aligns with a noun such as custody. [sent-77, score-0.043]
Positional Feature is a real-valued feature for the positional difference of the source and target word: $\mathrm{abs}\left(\frac{m}{M} - \frac{a_m}{N}\right)$. [sent-78, score-0.23]
Distortion Features measure how far apart the aligned target words of two consecutive source words are: $\mathrm{abs}(a_{m-1} + 1 - a_m)$. [sent-81, score-0.297] This learns a general pattern of whether the two target words aligned with two consecutive source words are usually far away from each other, or very close. [sent-82, score-0.297]
We also added special features for corner cases where the current word starts or ends the source sentence, or both the previous and current words are deleted (a transition from NULL to NULL). [sent-83, score-0.108]
49 Contextual Features indicate whether the left or the right neighbor of the source word and aligned target word are identical or similar. [sent-84, score-0.241]
This helps especially when aligning functional words, which usually have multiple candidate target functional words to align to and for which string similarity features cannot help. [sent-85, score-0.347]
51 We also added features for neighboring POS tags matching. [sent-86, score-0.089]
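The positional, distortion, and contextual features above reduce to a few arithmetic and lookup operations. Below is a minimal Scala sketch under our reading of the definitions; the names and the exact neighbor windows are our assumptions:

```scala
// Positional: relative-offset difference of source position m and state aCur.
def positional(m: Int, aCur: Int, M: Int, N: Int): Double =
  math.abs(m.toDouble / M - aCur.toDouble / N)

// Distortion: deviation of the transition (aPrev -> aCur) from the
// monotone case aCur = aPrev + 1.
def distortion(aPrev: Int, aCur: Int): Double =
  math.abs(aPrev + 1 - aCur).toDouble

// Contextual: left neighbors of the source word and its aligned target word
// are identical (analogous indicators exist for right neighbors and POS tags).
def leftNeighborMatch(m: Int, aCur: Int, s: IndexedSeq[String],
                      t: IndexedSeq[String]): Double =
  if (m > 1 && aCur > 1 && s(m - 2) == t(aCur - 2)) 1.0 else 0.0
```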
3 Symmetrization To expand from many-to-one alignment to many-to-many, we ran the model in both directions and applied the following symmetrization heuristics (Koehn, 2010): INTERSECTION, UNION, GROW-DIAG-FINAL. [sent-88, score-0.372]
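In terms of link sets, the two simpler heuristics look like the sketch below (our own illustration; GROW-DIAG-FINAL additionally grows the intersection toward the union along neighboring links, see Koehn, 2010):

```scala
// Symmetrization of two directional alignments over (source, target) links.
// Links from the T2S run are stored as (target, source) and flipped first.
def intersectLinks(s2t: Set[(Int, Int)], t2s: Set[(Int, Int)]): Set[(Int, Int)] =
  s2t intersect t2s.map(_.swap)

def unionLinks(s2t: Set[(Int, Int)], t2s: Set[(Int, Int)]): Set[(Int, Int)] =
  s2t union t2s.map(_.swap)
```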
53 1 Setup Since no generic off-the-shelf CRF software is designed to handle the special case of dynamic state indices and feature functions (Blunsom and Cohn, 2006), we implemented this aligner model in the Scala programming language, which is fully interoperable with Java. [sent-90, score-0.45]
OpenNLP provided the POS tagger and JWNL interfaced with WordNet (Fellbaum, 1998). [sent-92, score-0.07]
Training and test data (Brockett, 2007) each contain 800 manually aligned premise and hypothesis pairs from RTE2. [sent-95, score-0.235]
We take the premise as the source and hypothesis as the target, and use S2T to indicate the model aligns from source to target and T2S from target to source. [sent-97, score-0.265] [sent-102, score-0.249]
(Footnote 3: We found that each word has to be POS tagged to get an accurate relation, otherwise this feature will not help.)
One was GIZA++, with the INTERSECTION trick post-applied, which worked the best among all other symmetrization heuristics. [sent-105, score-0.08]
We used uniform costs for deletion, insertion and substitution, and applied a dynamic programming algorithm (Zhang and Shasha, 1989) to decode the tree edit sequence with the minimal cost, based on the Stanford dependency tree (De Marneffe and Manning, 2008). [sent-108, score-0.393]
This non-probabilistic approach turned out to be extremely fast, processing about 10,000 sentence pairs per second with pre-parsed trees, and performing quantitatively better than the Stanford RTE aligner (Chambers et al. [sent-109, score-0.432]
(2008), and then improved by Thadani and McKeown (2011) with faster and exact decoding via ILP. [sent-113, score-0.239]
4 Results Following Thadani and McKeown (2011), performance is evaluated by macro-averaged precision, recall, F1 of aligned token pairs, and exact (perfect) match rate for a whole pair, shown in Table 1. [sent-121, score-0.233]
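Concretely, these metrics can be computed from gold and predicted link sets per sentence pair and then macro-averaged over the corpus; a minimal sketch of our own, following the standard definitions:

```scala
// Per-pair precision/recall/F1 over aligned token pairs, plus exact match;
// macro-averaging means averaging these numbers over all sentence pairs.
def score(gold: Set[(Int, Int)],
          pred: Set[(Int, Int)]): (Double, Double, Double, Boolean) = {
  val tp = (gold intersect pred).size.toDouble
  val p  = if (pred.isEmpty) 0.0 else tp / pred.size
  val r  = if (gold.isEmpty) 0.0 else tp / gold.size
  val f1 = if (p + r == 0.0) 0.0 else 2.0 * p * r / (p + r)
  (p, r, f1, gold == pred)
}
```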
63 As our baselines, GIZA++ (with alignment intersection of two directions) and TED are on par with previously reported results using the Stanford RTE aligner. [sent-122, score-0.363]
64 The MANLI-family of systems provide stronger baselines, notably MANLIconstraint, which has the best F1 and exact match rate among themselves. [sent-123, score-0.151]
65 We ran our aligner in two directions: S2T and T2S, then merged the results with INTERSECTION, UNION and GROW-DIAG-FINAL. [sent-124, score-0.354]
Systems marked with ∗ are reported by MacCartney et al.; exact (perfect) match is measured over whole pairs. [sent-175, score-0.058]
There is an imbalance in exact match rate between S2T and T2S, with a difference of 9. [sent-180, score-0.151]
68 When aligning from source (longer) to target (shorter), multiple source words can align to the same target word. [sent-182, score-0.494]
69 This is not desirable since multiple duplicate “light” words are aligned to the same “light” word in the target, which breaks perfect match. [sent-183, score-0.17]
When aligning T2S, this problem goes away: the shorter target sentence contains fewer duplicate words, and in most cases there is a one-to-one mapping. [sent-184, score-0.263]
71 5 Runtime Test Table 2 shows the runtime comparison. [sent-187, score-0.073]
Since the RTE2 corpus is imbalanced, with a premise length (in words) of 29 and a hypothesis length of 11, we also compare on the corpus of FUSION (McKeown et al. [sent-188, score-0.251]
is the slowest, with quadratic growth in the number of edits with sentence length. [sent-191, score-0.052]
This work has a precise O(MN²) decoding time, with M the source sentence length and N the target sentence length. [sent-193, score-0.364]
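The O(MN²) bound follows directly from first-order Viterbi decoding: M source positions, N+1 states (target indices plus NULL) per position, and N+1 predecessor states per transition. A schematic Scala sketch of our own, with the feature score collapsed into a placeholder `score` function and the start transition simplified:

```scala
// First-order Viterbi over target indices 0..N (0 = NULL): O(M * N^2).
// score(aPrev, aCur, m) stands in for sum_k lambda_k * f_k(aPrev, aCur, s, t).
def viterbi(M: Int, N: Int, score: (Int, Int, Int) => Double): Vector[Int] = {
  val delta = Array.fill(M + 1, N + 1)(Double.NegativeInfinity)
  val back  = Array.fill(M + 1, N + 1)(0)
  for (j <- 0 to N) delta(0)(j) = 0.0  // uniform start, simplified
  for (m <- 1 to M; cur <- 0 to N; prev <- 0 to N) {
    val v = delta(m - 1)(prev) + score(prev, cur, m)
    if (v > delta(m)(cur)) { delta(m)(cur) = v; back(m)(cur) = prev }
  }
  var cur = (0 to N).maxBy(j => delta(M)(j))  // best final state
  val a = Array.fill(M)(0)
  for (m <- M to 1 by -1) { a(m - 1) = cur; cur = back(m)(cur) }
  a.toVector
}
```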
Table 2: Alignment runtime in seconds per sentence pair on two corpora: RTE2 (Cohn et al. [sent-203, score-0.192]
The runtime for this work takes the longest timing from S2T and T2S, on a Xeon 2.2GHz with 4MB cache (the closest we can find to match their hardware). [sent-208, score-0.073] [sent-209, score-0.071]
Horizontally, in a real-world application where sentences have similar length, this work is roughly 20x faster (0. [sent-210, score-0.09]
79 Vertically, the decoding time for our work increases less dramatically when sentence length increases (0. [sent-214, score-0.153]
Our aligner is at least another twenty-fold faster than MANLI-exact when the sentences are longer and balanced. [sent-238, score-0.444]
81 We also benefit from shallower pre-processing (no parsing) and can store all resources in main memory. [sent-239, score-0.044]
6 Ablation Test Since WordNet and the POS tagger are the only external resources used, we removed them (footnote 8) from the feature sets and report performance in Table 3. [sent-241, score-0.111]
The model then falls back to relying on string similarities, distortion, positional and contextual features, which are almost language-independent. [sent-243, score-0.113]
84 A loss of less than 1% in F1 suggests that the aligner can still run reasonably well without a POS tagger and WordNet. [sent-244, score-0.424]
Footnote 7: WordNet (~30MB) has a smaller footprint than the 5GB of external resources used by MANLI. [sent-245, score-0.085]
Footnote 8: When we removed the POS tagger, we enumerated all POS tags for a word to find its hypernym/synonym/. [sent-248, score-0.05]
87 Token-based paraphrases that are not covered by WordNet, such as program and software, business and venture. [sent-254, score-0.052]
larger, noisier resources in exchange for higher precision vs. [sent-263, score-0.044]
We think this is an application-specific decision; other resources could be easily incorporated into our model, which we may explore in the future to examine the trade-off in addressing items 1 and 2. [sent-265, score-0.044]
90 5 Conclusion We presented a model for monolingual sentence alignment that gives state-of-the-art performance, and is significantly faster than prior work. [sent-266, score-0.552]
We release our implementation as the first open-source monolingual aligner, which we hope will be of benefit to other researchers in the rapidly expanding area of natural language inference. [sent-267, score-0.135]
92 html) in the supporting material that compares the gold alignment and test output; readers are encouraged to try it out. [sent-276, score-0.292]
93 Tree edit models for recognizing textual entailments, paraphrases, and answers to questions. [sent-332, score-0.206]
94 Automatic cost estimation for tree edit distance using particle swarm optimization. [sent-368, score-0.323]
95 Aligning predicates across monolingual comparable texts using graph-based clustering. [sent-381, score-0.135]
96 Probabilistic tree-edit models with structured latent variables for textual entailment and question answering. [sent-390, score-0.091]
97 Simple fast algorithms for the editing distance between trees and related problems. [sent-396, score-0.036]
wordName wordTfidf (topN-words)
[('aligner', 0.354), ('alignment', 0.292), ('thadani', 0.291), ('manli', 0.238), ('mckeown', 0.173), ('monolingual', 0.135), ('maccartney', 0.134), ('edit', 0.125), ('pos', 0.121), ('premise', 0.116), ('rte', 0.113), ('wordnet', 0.113), ('kouylekov', 0.105), ('fusion', 0.097), ('cost', 0.095), ('aligning', 0.093), ('target', 0.09), ('faster', 0.09), ('fires', 0.087), ('punyakanok', 0.087), ('align', 0.083), ('aligned', 0.082), ('exact', 0.08), ('symmetrization', 0.08), ('ilp', 0.078), ('cohn', 0.073), ('runtime', 0.073), ('heilman', 0.071), ('positional', 0.071), ('intersection', 0.071), ('match', 0.071), ('tagger', 0.07), ('decoding', 0.069), ('source', 0.069), ('tree', 0.067), ('calls', 0.065), ('abs', 0.065), ('substances', 0.065), ('vulcan', 0.065), ('blunsom', 0.064), ('hamming', 0.061), ('chambers', 0.06), ('indices', 0.058), ('het', 0.058), ('brockett', 0.058), ('kapil', 0.058), ('sorensen', 0.058), ('consecutive', 0.056), ('distortion', 0.056), ('xeon', 0.056), ('ted', 0.055), ('dice', 0.053), ('paraphrases', 0.052), ('marneffe', 0.052), ('tence', 0.052), ('tags', 0.05), ('bansal', 0.05), ('length', 0.049), ('null', 0.048), ('roth', 0.048), ('denominator', 0.047), ('conditional', 0.047), ('smith', 0.046), ('union', 0.046), ('gimpel', 0.046), ('textual', 0.046), ('light', 0.046), ('stanford', 0.046), ('magnini', 0.045), ('duplicate', 0.045), ('entailment', 0.045), ('giza', 0.044), ('johns', 0.044), ('resources', 0.044), ('design', 0.044), ('aligns', 0.043), ('denero', 0.043), ('perfect', 0.043), ('per', 0.043), ('string', 0.042), ('mccallum', 0.042), ('distinctions', 0.041), ('seconds', 0.041), ('wo', 0.041), ('external', 0.041), ('decode', 0.039), ('baselines', 0.039), ('features', 0.039), ('state', 0.038), ('hypothesis', 0.037), ('chris', 0.037), ('crfs', 0.037), ('kathleen', 0.037), ('distance', 0.036), ('hopkins', 0.036), ('sentence', 0.035), ('recognizing', 0.035), ('acana', 0.035), ('benefiting', 0.035), ('ctoes', 0.035)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999994 9 acl-2013-A Lightweight and High Performance Monolingual Word Aligner
Author: Xuchen Yao ; Benjamin Van Durme ; Chris Callison-Burch ; Peter Clark
Abstract: Fast alignment is essential for many natural language tasks. But in the setting of monolingual alignment, previous work has not been able to align more than one sentence pair per second. We describe a discriminatively trained monolingual word aligner that uses a Conditional Random Field to globally decode the best alignment with features drawn from source and target sentences. Using just part-of-speech tags and WordNet as external resources, our aligner gives state-of-the-art result, while being an order-of-magnitude faster than the previous best performing system.
2 0.28169578 259 acl-2013-Non-Monotonic Sentence Alignment via Semisupervised Learning
Author: Xiaojun Quan ; Chunyu Kit ; Yan Song
Abstract: This paper studies the problem of nonmonotonic sentence alignment, motivated by the observation that coupled sentences in real bitexts do not necessarily occur monotonically, and proposes a semisupervised learning approach based on two assumptions: (1) sentences with high affinity in one language tend to have their counterparts with similar relatedness in the other; and (2) initial alignment is readily available with existing alignment techniques. They are incorporated as two constraints into a semisupervised learning framework for optimization to produce a globally optimal solution. The evaluation with real-world legal data from a comprehensive legislation corpus shows that while existing alignment algorithms suffer severely from non-monotonicity, this approach can work effectively on both monotonic and non-monotonic data.
3 0.17322887 210 acl-2013-Joint Word Alignment and Bilingual Named Entity Recognition Using Dual Decomposition
Author: Mengqiu Wang ; Wanxiang Che ; Christopher D. Manning
Abstract: Translated bi-texts contain complementary language cues, and previous work on Named Entity Recognition (NER) has demonstrated improvements in performance over monolingual taggers by promoting agreement of tagging decisions between the two languages. However, most previous approaches to bilingual tagging assume word alignments are given as fixed input, which can cause cascading errors. We observe that NER label information can be used to correct alignment mistakes, and present a graphical model that performs bilingual NER tagging jointly with word alignment, by combining two monolingual tagging models with two unidirectional alignment models. We introduce additional cross-lingual edge factors that encourage agreements between tagging and alignment decisions. We design a dual decomposition inference algorithm to perform joint decoding over the combined alignment and NER output space. Experiments on the OntoNotes dataset demonstrate that our method yields significant improvements in both NER and word alignment over state-of-the-art monolingual baselines.
4 0.15428641 267 acl-2013-PARMA: A Predicate Argument Aligner
Author: Travis Wolfe ; Benjamin Van Durme ; Mark Dredze ; Nicholas Andrews ; Charley Beller ; Chris Callison-Burch ; Jay DeYoung ; Justin Snyder ; Jonathan Weese ; Tan Xu ; Xuchen Yao
Abstract: We introduce PARMA, a system for crossdocument, semantic predicate and argument alignment. Our system combines a number of linguistic resources familiar to researchers in areas such as recognizing textual entailment and question answering, integrating them into a simple discriminative model. PARMA achieves state of the art results on an existing and a new dataset. We suggest that previous efforts have focussed on data that is biased and too easy, and we provide a more difficult dataset based on translation data with a low baseline which we beat by 17% F1.
5 0.14201215 388 acl-2013-Word Alignment Modeling with Context Dependent Deep Neural Network
Author: Nan Yang ; Shujie Liu ; Mu Li ; Ming Zhou ; Nenghai Yu
Abstract: In this paper, we explore a novel bilingual word alignment approach based on DNN (Deep Neural Network), which has been proven to be very effective in various machine learning tasks (Collobert et al., 2011). We describe in detail how we adapt and extend the CD-DNN-HMM (Dahl et al., 2012) method introduced in speech recognition to the HMM-based word alignment model, in which bilingual word embedding is discriminatively learnt to capture lexical translation information, and surrounding words are leveraged to model context information in bilingual sentences. While being capable of modeling the rich bilingual correspondence, our method generates a very compact model with much fewer parameters. Experiments on a large-scale English-Chinese word alignment task show that the proposed method outperforms the HMM and IBM model 4 baselines by 2 points in F-score.
7 0.12934071 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models
8 0.12325139 101 acl-2013-Cut the noise: Mutually reinforcing reordering and alignments for improved machine translation
9 0.11681475 47 acl-2013-An Information Theoretic Approach to Bilingual Word Clustering
10 0.11414399 323 acl-2013-Simpler unsupervised POS tagging with bilingual projections
11 0.11307477 40 acl-2013-Advancements in Reordering Models for Statistical Machine Translation
12 0.10998521 361 acl-2013-Travatar: A Forest-to-String Machine Translation Engine based on Tree Transducers
13 0.10890546 276 acl-2013-Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation
14 0.10732938 226 acl-2013-Learning to Prune: Context-Sensitive Pruning for Syntactic MT
15 0.10683675 15 acl-2013-A Novel Graph-based Compact Representation of Word Alignment
16 0.10382136 145 acl-2013-Exploiting Qualitative Information from Automatic Word Alignment for Cross-lingual NLP Tasks
17 0.098161317 136 acl-2013-Enhanced and Portable Dependency Projection Algorithms Using Interlinear Glossed Text
18 0.096610725 354 acl-2013-Training Nondeficient Variants of IBM-3 and IBM-4 for Word Alignment
19 0.095424369 120 acl-2013-Dirt Cheap Web-Scale Parallel Text from the Common Crawl
20 0.093918785 25 acl-2013-A Tightly-coupled Unsupervised Clustering and Bilingual Alignment Model for Transliteration
topicId topicWeight
[(0, 0.264), (1, -0.102), (2, 0.055), (3, 0.008), (4, -0.016), (5, 0.001), (6, -0.107), (7, -0.036), (8, -0.01), (9, -0.062), (10, 0.035), (11, -0.147), (12, -0.024), (13, -0.087), (14, 0.055), (15, 0.045), (16, 0.133), (17, 0.035), (18, 0.031), (19, -0.137), (20, -0.076), (21, -0.012), (22, 0.024), (23, 0.011), (24, -0.011), (25, 0.116), (26, -0.151), (27, 0.026), (28, 0.04), (29, -0.011), (30, -0.136), (31, 0.118), (32, -0.064), (33, 0.013), (34, -0.074), (35, -0.029), (36, 0.021), (37, 0.046), (38, 0.042), (39, -0.002), (40, -0.032), (41, 0.043), (42, -0.005), (43, 0.051), (44, 0.064), (45, -0.012), (46, 0.027), (47, -0.17), (48, 0.089), (49, 0.09)]
simIndex simValue paperId paperTitle
same-paper 1 0.94944388 9 acl-2013-A Lightweight and High Performance Monolingual Word Aligner
Author: Xuchen Yao ; Benjamin Van Durme ; Chris Callison-Burch ; Peter Clark
Abstract: Fast alignment is essential for many natural language tasks. But in the setting of monolingual alignment, previous work has not been able to align more than one sentence pair per second. We describe a discriminatively trained monolingual word aligner that uses a Conditional Random Field to globally decode the best alignment with features drawn from source and target sentences. Using just part-of-speech tags and WordNet as external resources, our aligner gives state-of-the-art result, while being an order-of-magnitude faster than the previous best performing system.
2 0.89639777 259 acl-2013-Non-Monotonic Sentence Alignment via Semisupervised Learning
Author: Xiaojun Quan ; Chunyu Kit ; Yan Song
Abstract: This paper studies the problem of nonmonotonic sentence alignment, motivated by the observation that coupled sentences in real bitexts do not necessarily occur monotonically, and proposes a semisupervised learning approach based on two assumptions: (1) sentences with high affinity in one language tend to have their counterparts with similar relatedness in the other; and (2) initial alignment is readily available with existing alignment techniques. They are incorporated as two constraints into a semisupervised learning framework for optimization to produce a globally optimal solution. The evaluation with real-world legal data from a comprehensive legislation corpus shows that while existing alignment algorithms suffer severely from non-monotonicity, this approach can work effectively on both monotonic and non-monotonic data.
3 0.78051031 145 acl-2013-Exploiting Qualitative Information from Automatic Word Alignment for Cross-lingual NLP Tasks
Author: Jose G.C. de Souza ; Miquel Espla-Gomis ; Marco Turchi ; Matteo Negri
Abstract: The use of automatic word alignment to capture sentence-level semantic relations is common to a number of cross-lingual NLP applications. Despite its proved usefulness, however, word alignment information is typically considered from a quantitative point of view (e.g. the number of alignments), disregarding qualitative aspects (the importance of aligned terms). In this paper we demonstrate that integrating qualitative information can bring significant performance improvements with negligible impact on system complexity. Focusing on the cross-lingual textual entailment task, we contribute with a novel method that: i) significantly outperforms the state of the art, and ii) is portable, with limited loss in performance, to language pairs where training data are not available.
4 0.7674762 15 acl-2013-A Novel Graph-based Compact Representation of Word Alignment
Author: Qun Liu ; Zhaopeng Tu ; Shouxun Lin
Abstract: In this paper, we propose a novel compact representation called weighted bipartite hypergraph to exploit the fertility model, which plays a critical role in word alignment. However, estimating the probabilities of rules extracted from hypergraphs is an NP-complete problem, which is computationally infeasible. Therefore, we propose a divide-and-conquer strategy by decomposing a hypergraph into a set of independent subhypergraphs. The experiments show that our approach outperforms both 1-best and n-best alignments.
5 0.76524884 210 acl-2013-Joint Word Alignment and Bilingual Named Entity Recognition Using Dual Decomposition
Author: Mengqiu Wang ; Wanxiang Che ; Christopher D. Manning
Abstract: Translated bi-texts contain complementary language cues, and previous work on Named Entity Recognition (NER) has demonstrated improvements in performance over monolingual taggers by promoting agreement of tagging decisions between the two languages. However, most previous approaches to bilingual tagging assume word alignments are given as fixed input, which can cause cascading errors. We observe that NER label information can be used to correct alignment mistakes, and present a graphical model that performs bilingual NER tagging jointly with word alignment, by combining two monolingual tagging models with two unidirectional alignment models. We introduce additional cross-lingual edge factors that encourage agreements between tagging and alignment decisions. We design a dual decomposition inference algorithm to perform joint decoding over the combined alignment and NER output space. Experiments on the OntoNotes dataset demonstrate that our method yields significant improvements in both NER and word alignment over state-of-the-art monolingual baselines.
6 0.76213181 354 acl-2013-Training Nondeficient Variants of IBM-3 and IBM-4 for Word Alignment
7 0.66856402 25 acl-2013-A Tightly-coupled Unsupervised Clustering and Bilingual Alignment Model for Transliteration
8 0.637716 143 acl-2013-Exact Maximum Inference for the Fertility Hidden Markov Model
9 0.60308707 388 acl-2013-Word Alignment Modeling with Context Dependent Deep Neural Network
10 0.596008 47 acl-2013-An Information Theoretic Approach to Bilingual Word Clustering
11 0.56134737 48 acl-2013-An Open Source Toolkit for Quantitative Historical Linguistics
12 0.55077058 267 acl-2013-PARMA: A Predicate Argument Aligner
13 0.54604995 323 acl-2013-Simpler unsupervised POS tagging with bilingual projections
14 0.52449048 101 acl-2013-Cut the noise: Mutually reinforcing reordering and alignments for improved machine translation
15 0.49447227 40 acl-2013-Advancements in Reordering Models for Statistical Machine Translation
16 0.48860079 276 acl-2013-Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation
17 0.48074558 382 acl-2013-Variational Inference for Structured NLP Models
18 0.4773356 120 acl-2013-Dirt Cheap Web-Scale Parallel Text from the Common Crawl
19 0.46792412 203 acl-2013-Is word-to-phone mapping better than phone-phone mapping for handling English words?
20 0.46576247 240 acl-2013-Microblogs as Parallel Corpora
topicId topicWeight
[(0, 0.081), (6, 0.079), (11, 0.067), (15, 0.014), (24, 0.037), (26, 0.073), (28, 0.018), (35, 0.074), (42, 0.064), (48, 0.061), (70, 0.059), (84, 0.171), (88, 0.036), (90, 0.024), (95, 0.085)]
simIndex simValue paperId paperTitle
1 0.87553346 16 acl-2013-A Novel Translation Framework Based on Rhetorical Structure Theory
Author: Mei Tu ; Yu Zhou ; Chengqing Zong
Abstract: Rhetorical structure theory (RST) is widely used for discourse understanding, which represents a discourse as a hierarchically semantic structure. In this paper, we propose a novel translation framework with the help of RST. In our framework, the translation process mainly includes three steps: 1) Source RST-tree acquisition: a source sentence is parsed into an RST tree; 2) Rule extraction: translation rules are extracted from the source tree and the target string via bilingual word alignment; 3) RST-based translation: the source RST-tree is translated with translation rules. Experiments on Chinese-to-English show that our RST-based approach achieves improvements of 2.3/0.77/1.43 BLEU points on NIST04/NIST05/CWMT2008 respectively.
same-paper 2 0.85427469 9 acl-2013-A Lightweight and High Performance Monolingual Word Aligner
Author: Xuchen Yao ; Benjamin Van Durme ; Chris Callison-Burch ; Peter Clark
Abstract: Fast alignment is essential for many natural language tasks. But in the setting of monolingual alignment, previous work has not been able to align more than one sentence pair per second. We describe a discriminatively trained monolingual word aligner that uses a Conditional Random Field to globally decode the best alignment with features drawn from source and target sentences. Using just part-of-speech tags and WordNet as external resources, our aligner gives state-of-the-art result, while being an order-of-magnitude faster than the previous best performing system.
3 0.85282588 297 acl-2013-Recognizing Partial Textual Entailment
Author: Omer Levy ; Torsten Zesch ; Ido Dagan ; Iryna Gurevych
Abstract: Textual entailment is an asymmetric relation between two text fragments that describes whether one fragment can be inferred from the other. It thus cannot capture the notion that the target fragment is "almost entailed" by the given text. The recently suggested idea of partial textual entailment may remedy this problem. We investigate partial entailment under the faceted entailment model and the possibility of adapting existing textual entailment methods to this setting. Indeed, our results show that these methods are useful for recognizing partial entailment. We also provide a preliminary assessment of how partial entailment may be used for recognizing (complete) textual entailment.
4 0.81015927 316 acl-2013-SenseSpotting: Never let your parallel data tie you to an old domain
Author: Marine Carpuat ; Hal Daume III ; Katharine Henry ; Ann Irvine ; Jagadeesh Jagarlamudi ; Rachel Rudinger
Abstract: Words often gain new senses in new domains. Being able to automatically identify, from a corpus of monolingual text, which word tokens are being used in a previously unseen sense has applications to machine translation and other tasks sensitive to lexical semantics. We define a task, SENSESPOTTING, in which we build systems to spot tokens that have new senses in new domain text. Instead of difficult and expensive annotation, we build a gold standard by leveraging cheaply available parallel corpora, targeting our approach to the problem of domain adaptation for machine translation. Our system is able to achieve F-measures of as much as 80%, when applied to word types it has never seen before. Our approach is based on a large set of novel features that capture varied aspects of how words change when used in new domains.
5 0.76096469 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
Author: Ulle Endriss ; Raquel Fernandez
Abstract: Crowdsourcing, which offers new ways of cheaply and quickly gathering large amounts of information contributed by volunteers online, has revolutionised the collection of labelled data. Yet, to create annotated linguistic resources from this data, we face the challenge of having to combine the judgements of a potentially large group of annotators. In this paper we investigate how to aggregate individual annotations into a single collective annotation, taking inspiration from the field of social choice theory. We formulate a general formal model for collective annotation and propose several aggregation methods that go beyond the commonly used majority rule. We test some of our methods on data from a crowdsourcing experiment on textual entailment annotation.
6 0.75854123 70 acl-2013-Bilingually-Guided Monolingual Dependency Grammar Induction
7 0.75756246 333 acl-2013-Summarization Through Submodularity and Dispersion
8 0.75146449 276 acl-2013-Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation
9 0.75074941 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization
10 0.74990588 275 acl-2013-Parsing with Compositional Vector Grammars
11 0.74984533 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing
12 0.74868923 131 acl-2013-Dual Training and Dual Prediction for Polarity Classification
13 0.74642837 134 acl-2013-Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction
14 0.74596256 157 acl-2013-Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning
15 0.7449919 318 acl-2013-Sentiment Relevance
16 0.7446357 7 acl-2013-A Lattice-based Framework for Joint Chinese Word Segmentation, POS Tagging and Parsing
17 0.74393696 164 acl-2013-FudanNLP: A Toolkit for Chinese Natural Language Processing
18 0.74383229 36 acl-2013-Adapting Discriminative Reranking to Grounded Language Learning
19 0.7424944 369 acl-2013-Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
20 0.74227786 264 acl-2013-Online Relative Margin Maximization for Statistical Machine Translation