emnlp emnlp2013 emnlp2013-167 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Xuchen Yao ; Benjamin Van Durme ; Chris Callison-Burch ; Peter Clark
Abstract: We introduce a novel discriminative model for phrase-based monolingual alignment using a semi-Markov CRF. Our model achieves state-of-the-art alignment accuracy on two phrase-based alignment datasets (RTE and paraphrase), while doing significantly better than other strong baselines in both non-identical alignment and phrase-only alignment. Additional experiments highlight the potential benefit of our alignment model to RTE, paraphrase identification and question answering, where even a naive application of our model's alignment score approaches the state of the art.
Reference: text
sentIndex sentText sentNum sentScore
1 Our model achieves state-of-the-art alignment accuracy on two phrase-based alignment datasets (RTE and paraphrase), while doing significantly better than other strong baselines in both non-identical alignment and phrase-only alignment. [sent-2, score-1.24]
2 Additional experiments highlight the potential benefit of our alignment model to RTE, paraphrase identification and question answering, where even a naive application of our model's alignment score approaches the state of the art. [sent-3, score-1.074]
3 Even though most of these tasks involve only a single language, alignment research has primarily focused on the bilingual setting (i.e., machine translation). [sent-5, score-0.402]
4 In this paper we use the term token-based alignment for one-to-one alignment, phrase-based for non one-to-one alignment, and word alignment in general for both. [sent-11, score-1.206]
5 (Peter Clark: Allen Institute for Artificial Intelligence, Seattle, WA, USA.) Most token-based alignment models can extrinsically handle phrase-based alignment to some extent. [sent-12, score-0.804]
6 The problem is more prominent when aligning phrasal paraphrases or multiword expressions, such as pass away and kick the bucket. [sent-17, score-0.524]
7 , 2001) to align tokens from the source sentence to tokens in the target sentence, by treating source tokens as “observation” and target tokens as “hidden states”. [sent-22, score-0.529]
8 We extend this model by introducing semiMarkov states for phrase-based alignment: a state can instead span multiple consecutive time steps, thus aligning phrases on the source side. [sent-26, score-0.517]
9 Also, we merge phrases on the target side to phrasal states, allowing the model to align phrases on the target side as well. [sent-27, score-0.687]
10 2 Related Work. Most work in monolingual alignment employs dependency tree/graph matching algorithms, including tree edit distance (Punyakanok et al. [sent-32, score-0.514]
11 These works inherently only support token-based alignment, with phrase-like alignment achieved by first merging tokens to phrases as a preprocessing step. [sent-36, score-0.489]
12 It applies discriminative perceptron learning with various features and handles phrase-based alignment of arbitrary phrase lengths. [sent-40, score-0.493]
13 Also, various syntactic constraints can be easily added, significantly improving exact alignment match rate for whole sentence pairs. [sent-43, score-0.402]
14 Besides the common application of textual entailment and question answering, monolingual alignment has also been applied in the field of text generation (Barzilay and Lee, 2003; Pang et al. [sent-44, score-0.62]
15 Word alignment has been more explored in machine translation. [sent-46, score-0.402]
16 Phrase-based MT historically relied on heuristics (Koehn, 2010) to merge two sets of word alignment in opposite directions to yield phrasal alignment. [sent-49, score-0.738]
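The bidirectional merging described here can be sketched in a simplified form. The snippet below shows only the two endpoints (intersection and union) that heuristics such as grow-diag-final interpolate between; the function name and toy alignments are illustrative, not from the paper.

```python
def symmetrize(src2tgt, tgt2src):
    """Merge two directed word alignments into symmetric sets.

    src2tgt holds (src, tgt) index pairs; tgt2src holds (tgt, src) pairs.
    The intersection is high-precision, the union high-recall; phrase
    extraction heuristics typically pick alignment points in between.
    """
    forward = set(src2tgt)
    backward = {(s, t) for (t, s) in tgt2src}
    return forward & backward, forward | backward

# Toy example over 2-token sentences: the two directions disagree on token 1.
inter, union = symmetrize([(0, 0), (1, 0)], [(0, 0), (1, 1)])
```

Points in the union but outside the intersection are exactly where the heuristics make their precision/recall trade-off.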
17 Deng and Byrne (2008) explored token-to-phrase alignment based on HMM models (Vogel et al. [sent-55, score-0.402]
18 However, the token-to-phrase alignment is only in one direction: each target state still only spans one source word, and thus alignment on the source side is limited to tokens. [sent-57, score-1.187]
19 (2011) unified the HSMM models with the alignment by agreement framework (Liang et al. [sent-61, score-0.402]
20 , 2006), achieving phrasal alignment that agreed in both directions. [sent-62, score-0.738]
21 Essentially monolingual alignment would benefit more from discriminative models with various feature extractions (just like those defined in MANLI) than generative models without any predefined feature (just like how they were used in bilingual alignment). [sent-64, score-0.478]
22 3 The Alignment Model. Our objective is to define a model that supports phrase-based alignment of arbitrary phrase length. [sent-67, score-0.532]
23 In this section we first describe a regular CRF model that supports one-to-one token-based alignment (Blunsom and Cohn, 2006; Yao et al. [sent-68, score-0.441]
24 , 2013a), then extend it to phrase-based alignment with the semi-Markov model. [sent-69, score-0.402]
25 3.1 Token-based Model. Given a source sentence s of length M and a target sentence t of length N, the alignment from s to t is a sequence of target word indices a, where a_i ∈ [0, N] for i ∈ [1, M]. [sent-71, score-0.678]
26 This models a many-to-one alignment from source to target: multiple source words can be aligned to the same target word, but not vice versa. [sent-75, score-0.718]
27 One-to-many alignment can be obtained by running the aligner in the other direction. [sent-76, score-0.798]
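Concretely, this many-to-one representation can be illustrated with the Figure 1 sentences; the variable names and helper below are our own sketch, not the paper's code.

```python
src = ["Shops", "are", "closed", "up"]
tgt = ["Shops", "are", "temporarily", "closed", "down"]

# a[i] is the target index (1-based, 0 = NULL) for source token i:
# several source tokens may share a target index, but not vice versa.
a = [1, 2, 4, 0]  # "up" is left unaligned under a token-only model

def aligned_pairs(a):
    """Recover 0-based (source, target) pairs, skipping NULL alignments."""
    return [(i, j - 1) for i, j in enumerate(a) if j > 0]
```

Running the aligner in the other direction produces a second such sequence over the target tokens, giving the one-to-many case.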
28 The probability of alignment sequence a conditioned on both s and t is then: p(a | s, t) = exp( Σ_{i,k} λ_k f_k(a_{i−1}, a_i, s, t) ) / Z(s, t). This assumes a first-order Conditional Random Field (Lafferty et al. [sent-77, score-0.402]
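As a sanity check on this definition, a brute-force version of the model can be written down for toy sizes. The feature functions here are simplified to depend only on (a_{i−1}, a_i), whereas the paper's features also condition on s, t, and the position; everything else (names, toy feature) is our assumption.

```python
import itertools
import math

def log_score(a, feats, lam):
    """Sum over positions i and features k of lam[k] * f_k(a[i-1], a[i]).
    The predecessor of the first state is taken to be 0 (the NULL state)."""
    return sum(lam[k] * f(prev, cur)
               for prev, cur in zip([0] + list(a), a)
               for k, f in enumerate(feats))

def prob(a, feats, lam, n_states, length):
    """p(a | s, t) = exp(score(a)) / Z, with the partition function Z
    enumerated exactly over all state sequences (toy sizes only)."""
    seqs = itertools.product(range(n_states + 1), repeat=length)
    z = sum(math.exp(log_score(seq, feats, lam)) for seq in seqs)
    return math.exp(log_score(a, feats, lam)) / z

# One feature rewarding "move one target position to the right".
feats = [lambda prev, cur: 1.0 if cur == prev + 1 else 0.0]
lam = [1.0]
```

With this single feature, the strictly monotone sequence gets the highest probability, and the probabilities of all sequences sum to one.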
29 We first extend it in the direction of ls : 1, where a target state spans ls words on the source side (ls source words align to 1 target word). [sent-88, score-0.896]
30 Then we extend it in the direction of 1 : lt, where lt is the target phrase length a source word aligns to (1 source word aligns to lt target words). [sent-89, score-0.924]
31 Shaded horizontal circles represent the source sentence (Shops are closed up for now until March) and hollow vertical circles represent the hidden states with state IDs for the target sentence (Shops are temporarily closed down). [sent-100, score-0.904]
32 Hidden states (e.g., states 3 and 15) can span multiple consecutive source words (a semi-Markov property) for aligning phrases on the source side. [sent-104, score-0.52]
33 States with an ID larger than the target sentence length indicate "phrasal states" (states 6-15 in this example), where consecutive target tokens are merged for aligning phrases on the target side. [sent-105, score-0.483]
34 Combining the semi-Markov property and phrasal states yields, for instance, a 2×2 alignment between closed up in the source and closed down in the target. [sent-106, score-0.992]
35 Throughout this section we use Figure 1 as an illustrative example, which shows phrasal alignment between the source sentence (Shops are closed up for now until March) and the target sentence (Shops are temporarily closed down). [sent-108, score-1.452]
36 1:1 alignment is a special case of ls:1 alignment where the target-side state spans ls = 1 source word, i.e. [sent-109, score-1.407]
37 , at each time step i, the source side word si aligns to one state ai and the next aligned state ai+1 only depends on the current state ai. [sent-111, score-0.631]
38 V_i(a | s, t) = max_{a', ls ∈ [1, Ls]} [ V_{i−ls}(a' | s, t) + Ψ_i(a', a, ls, s, t) ], with factor: Ψ_i(a', a, ls, s, t) = Σ_k λ_k f_k(a_{i−ls}, a_i, s, t), and the best alignment a can be obtained by backtracking from the last state a_M in V_M(a_M | s, t). [sent-139, score-0.705]
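This recursion can be sketched as a minimal, self-contained semi-Markov Viterbi in which the factor Ψ is abstracted into a user-supplied scoring function; the real model's Ψ is a weighted feature sum over s and t, and the uniform start is our simplification.

```python
def semi_markov_viterbi(M, states, Ls, psi):
    """Semi-Markov Viterbi over M source positions. Each state may span up
    to Ls consecutive positions; psi(prev, cur, span, i) scores ending
    state `cur` at position i with the given span, following `prev`.
    Returns the best score and the decoded (state, span) segments."""
    NEG = float("-inf")
    V = [dict.fromkeys(states, NEG) for _ in range(M + 1)]
    back = [{} for _ in range(M + 1)]
    V[0] = dict.fromkeys(states, 0.0)  # uniform start, for simplicity
    for i in range(1, M + 1):
        for cur in states:
            for ls in range(1, min(Ls, i) + 1):
                for prev in states:
                    s = V[i - ls][prev] + psi(prev, cur, ls, i)
                    if s > V[i][cur]:
                        V[i][cur], back[i][cur] = s, (prev, ls)
    best = max(V[M], key=V[M].get)
    segments, i, cur = [], M, best
    while i > 0:
        prev, ls = back[i][cur]
        segments.append((cur, ls))
        i, cur = i - ls, prev
    return V[M][best], segments[::-1]
```

Compared with standard Viterbi, the only change is the extra loop over the span ls, which is what lets a target state consume a source phrase.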
39 To extend from 1:1 alignment to 1:lt alignment, with one source word aligning to lt target words, we simply explode the state space by Lt times, with Lt the maximal allowed target phrase length. [sent-143, score-1.421]
40 In this paper we distinguish states by three types: NULL state (j = 0, lt = 0), token state (lt = 1) and phrasal state (lt > 1). [sent-149, score-1.088]
41 Suppose the target phrase tj of length ltj ∈ [1, Lt] holds a position ptj ∈ [1, N], and the source word si is aligned to this state (ptj, ltj), a tuple for (position, span). [sent-151, score-0.555]
42 Thus during decoding, if one output state is 15, we would know that it uniquely comes from the phrasal state (5, 2), representing the target phrase closed down. [sent-155, score-1.029]
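One consistent way to pack such (position, span) tuples into a single state ID is shown below. This is our own indexing sketch, and the exact numbering used in the paper's figure may differ; what matters is that the mapping is invertible.

```python
def encode_state(position, span, N):
    """Pack a (position, span) tuple into one state ID: token states
    (span 1) get IDs 1..N, spans of 2 get N+1..2N, and so on."""
    assert 1 <= position <= N and span >= 1
    return (span - 1) * N + position

def decode_state(state, N):
    """Invert encode_state, recovering (position, span)."""
    return (state - 1) % N + 1, (state - 1) // N + 1
```

Because the mapping round-trips, the decoder can run over flat integer state IDs and still recover which target phrase each state stands for.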
43 Now we have defined separately the ls:1 model and the 1:lt model. [sent-158, score-0.41]
44 We can simply merge them to have an ls : lt alignment model. [sent-159, score-0.812]
45 The semi-Markov property makes it possible for any target states to align phrases on the source side, while the two dimensional state mapping makes it possible for any source words to align phrases on the target side. [sent-160, score-0.732]
46 For instance, in Figure 1, the phrasal state a15 represents the two-word phrase closed down on the target side, while still spanning two words on the source side, allowing a 2×2 alignment. [sent-161, score-0.735]
47 Also, we added indicators for mappings between source phrase types and target phrase types, such as “vp2np”, meaning that a verb phrase in the source is mapped to a noun phrase in the target. [sent-169, score-0.604]
48 , 2013) include various similarity scores derived from a paraphrase database with 73 million phrasal and 8 million lexical paraphrases. [sent-171, score-0.495]
49 We found it critical to assign feature values “fairly” among tokens and phrases to make sure that semi-Markov states and phrasal states fire up often enough for phrasal alignments. [sent-207, score-0.925]
50 To illustrate this in a simplified way, take closed up↔closed down in Figure 1, and assume the only feature is the normalized number of matching tokens in the pair. [sent-213, score-0.5]
51 The desired alignment closed up↔closed down would not have survived the state competition due to its weak feature value. [sent-227, score-0.722]
52 In this case the model would simply prefer a token alignment closed↔closed, with up left unaligned. [sent-228, score-0.856]
53 Thus we upweighted feature values by the maximum source or target phrase length to encourage phrasal alignments, in this case closed up↔closed down. [sent-232, score-0.686]
54 Then this alignment would have a better chance to survive the state competition. [sent-234, score-0.402]
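Using the matching-token feature from the example, the upweighting rule can be sketched as follows; the helper names are ours, and the numbers are just the toy example's.

```python
def matching_fraction(src_phrase, tgt_phrase):
    """Toy feature: normalized number of matching tokens in the pair."""
    matches = len(set(src_phrase) & set(tgt_phrase))
    return matches / max(len(src_phrase), len(tgt_phrase))

def upweight(value, src_phrase, tgt_phrase):
    """Scale a feature value by the longer phrase length so phrasal links
    can compete with chains of single-token links."""
    return value * max(len(src_phrase), len(tgt_phrase))

raw = matching_fraction(["closed", "up"], ["closed", "down"])    # 0.5
boosted = upweight(raw, ["closed", "up"], ["closed", "down"])    # 1.0
token = matching_fraction(["closed"], ["closed"])                # 1.0
```

Without the boost, the 2×2 phrasal link (0.5) loses to the single token link closed↔closed (1.0); after boosting, the two tie and the phrasal link can survive decoding.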
55 Table 2: Percentage of various alignment sizes (undirectional, e.g., 1x2 and 2x1 are merged) after synthesizing phrasal alignment from token alignment in the training portion of two corpora. [sent-261, score-0.524]
57 Statistics show that single-token alignments account for 96% and 95% of total alignments in these two corpora, respectively. [sent-279, score-0.692]
58 With such a heavy imbalance towards only token-based alignment, a phrase-based aligner would learn feature weights that award token alignments more than phrasal alignments. [sent-280, score-0.98]
59 Thus we synthesized phrasal alignments from continuous monotonic token alignments in these two corpora. [sent-281, score-0.738]
60 Then for each phrase pair, if each token in the source phrase is aligned to a token in the target phrase in a monotonic way, and vice versa, we [sent-283, score-0.771]
61 merge these alignments to form one single phrasal alignment. [sent-292, score-0.45]
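The merge test described here can be sketched as a predicate over chunked phrase ranges and token links; the range notation and helper name are our own.

```python
def can_merge(src_range, tgt_range, links):
    """Return True if a chunked phrase pair should become one phrasal link.

    src_range / tgt_range are (start, end) token spans (end exclusive);
    links is a set of (src, tgt) token alignments. We merge when every
    token on each side aligns inside the other phrase, monotonically.
    """
    s0, s1 = src_range
    t0, t1 = tgt_range
    inside = sorted((s, t) for s, t in links if s0 <= s < s1)
    if {s for s, _ in inside} != set(range(s0, s1)):
        return False              # some source token unaligned
    if {t for _, t in inside} != set(range(t0, t1)):
        return False              # target phrase not exactly covered
    tgts = [t for _, t in inside]
    return tgts == sorted(tgts)   # alignment must be monotonic
```

Applying this predicate to every chunked phrase pair turns continuous monotonic token alignments into single phrasal links, which is how the phrasal training corpora are synthesized.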
62 6 Table 2 lists the percentage of various alignment sizes after the merge. [sent-293, score-0.402]
63 (2013a) showed that the traditional MT bilingual aligner GIZA++ (Och and Ney, 2003) presented weak results on the task of monolingual alignment. [sent-298, score-0.472]
64 It is designed for the task of monolingual alignment and supports phrasal alignment. [sent-300, score-0.853]
65 , 2012): an improved version of MANLI-constraint that not only models phrasal alignments, but also alignments between dependency arcs, with reported numbers on the original Edinburgh paraphrase corpus. [sent-305, score-0.583]
66 , 2013a): a tokenbased aligner with state-of-the-art performance on MSR06. [sent-307, score-0.457]
67 (2012), we only report the results based on token alignments (which allows a partial credit if their containing phrases are not aligned), even for the phrase-based alignment task. [sent-315, score-0.695]
68 The reasoning is that if a phrase-based aligner is already doing better than a token aligner in terms of token alignment scores, then the difference in terms of phrase alignment scores will be even larger. [sent-316, score-1.981]
69 Thus showing the superiority of token alignment scores is sufficient. [sent-317, score-0.562]
70 Implementation and Training. The elements in the phrase-based model (dynamic state indices, semi-Markov and phrasal states) are not typically found in standard CRF implementations. [sent-319, score-0.443]
71 4 Results. Table 3 gives scores (in larger fonts) of different aligners on MSR06 and Edinburgh++ and their corresponding phrasal versions. [sent-324, score-0.485]
72 Overall, the token-based aligner did the best on the original corpora, in which single-token alignments account for more than 95% of total alignments. [sent-325, score-0.993]
73 a gap that would make the phrase aligner (85. [sent-336, score-0.487]
74 On the phrasal alignment corpora (represented by MSR06P and EDB++P in Table 3), the phrase-based aligner did significantly better. [sent-339, score-1.176]
75 Note that the overall F1 and exact match rate are still much lower than those scores obtained from the original corpora, suggesting that the phrasal corpora present a much harder task. [sent-340, score-0.404]
76 Furthermore, as a more "fair" comparison between the two aligners, we synthesized phrasal alignments from the output of the token-based aligner, just as the phrase-based corpora were prepared, and then evaluated its performance again. [sent-341, score-0.593]
77 Still, on the EDB++P corpus, the token aligner was about 1. [sent-342, score-0.53]
78 In terms of identical alignment, most aligners were able to score more than 90%, but for non-identical alignment there was a noticeable decrease. [sent-349, score-0.525]
79 Still, on the phrasal alignment corpora, the phrase-based model has a much larger recall score for non-identical alignment than others. [sent-350, score-1.14]
80 We also divided scores with respect to token-only alignment and phrase-only alignment. [sent-351, score-0.428]
81 Meteor and the token aligner inherently have either very limited or no support for phrasal alignment, thus they had very low scores on phrase-only alignment. [sent-353, score-0.892]
82 [Table 3 fragment: token and phrase scores for MANLI-jnt* and Meteor.]
83 Table 3: Results on original (mostly token) and phrasal (P) alignment corpora, where (x%) indicates how much of the alignment is identical alignment, such as New↔New. [sent-495, score-1.14]
84 Subscripts i and n stand for corresponding scores for "identical" and "non-identical" alignment, respectively. [sent-498, score-0.428]
85 5 Applications. Natural language alignment can be applied to various NLP tasks. [sent-502, score-0.402]
86 Table 4: Same results on the phrasal Edinburgh++ corpus but with scores divided by token-only alignment (subscript t) and phrase-only alignment (subscript p). [sent-534, score-1.166]
87 As a thorough study is another topic, we simply show in this section the use of just alignment scores in binary prediction problems. [sent-535, score-0.428]
88 We could not report on Meteor as Meteor does not explicitly output alignment scores. [sent-549, score-0.402]
89 [Table fragment: RTE results (A%) for MacCartney and Manning (2008), Heilman and Smith (2010), and the token aligner.]
90 [Table fragment: paraphrase identification results (A%) for Das and Smith (2009), Heilman and Smith (2010), the token aligner, and our phrasal aligner.]
91 [Table fragment: QA results (MAP) for the token aligner and our phrasal aligner.]
92 state-of-the-art result since no sophisticated models were additionally used but only the alignment score. [sent-596, score-0.402]
93 It still follows the pattern from the alignment experiment that the phrasal aligner had higher recall and lower precision than the token aligner in the task of RTE and PP. [sent-598, score-1.664]
94 In the QA task, the phrasal aligner performed better than all systems except for the top one. [sent-599, score-0.732]
95 The combination of semi-Markov states and phrasal states makes phrasal alignment on both the source and target sides possible. [sent-601, score-1.394]
96 The final phrase-based aligner performed the best on two phrasal alignment corpora and showed its potential usage in three NLP tasks. [sent-602, score-1.21]
97 Future work includes aligning discontinuous (gappy) phrases and integrating alignment more closely in NLP applications. [sent-603, score-0.586]
98 HMM word and phrase alignment for statistical machine translation. [sent-661, score-0.493]
99 Syntax-based alignment of multiple translations: Extracting paraphrases and generating new sentences. [sent-739, score-0.451]
100 A joint phrasal and dependency model for paraphrase alignment. [sent-760, score-0.469]
simIndex simValue paperId paperTitle
same-paper 1 0.99999899 167 emnlp-2013-Semi-Markov Phrase-Based Monolingual Alignment
2 0.25793517 96 emnlp-2013-Identifying Phrasal Verbs Using Many Bilingual Corpora
Author: Karl Pichotta ; John DeNero
Abstract: We address the problem of identifying multiword expressions in a language, focusing on English phrasal verbs. Our polyglot ranking approach integrates frequency statistics from translated corpora in 50 different languages. Our experimental evaluation demonstrates that combining statistical evidence from many parallel corpora using a novel ranking-oriented boosting algorithm produces a comprehensive set ofEnglish phrasal verbs, achieving performance comparable to a human-curated set.
3 0.17625031 187 emnlp-2013-Translation with Source Constituency and Dependency Trees
Author: Fandong Meng ; Jun Xie ; Linfeng Song ; Yajuan Lu ; Qun Liu
Abstract: We present a novel translation model, which simultaneously exploits the constituency and dependency trees on the source side, to combine the advantages of two types of trees. We take head-dependents relations of dependency trees as backbone and incorporate phrasal nodes of constituency trees as the source side of our translation rules, and the target side as strings. Our rules hold the property of long distance reorderings and the compatibility with phrases. Large-scale experimental results show that our model achieves significantly improvements over the constituency-to-string (+2.45 BLEU on average) and dependencyto-string (+0.91 BLEU on average) models, which only employ single type of trees, and significantly outperforms the state-of-theart hierarchical phrase-based model (+1.12 BLEU on average), on three Chinese-English NIST test sets.
4 0.15201946 101 emnlp-2013-Improving Alignment of System Combination by Using Multi-objective Optimization
Author: Tian Xia ; Zongcheng Ji ; Shaodan Zhai ; Yidong Chen ; Qun Liu ; Shaojun Wang
Abstract: This paper proposes a multi-objective optimization framework which supports heterogeneous information sources to improve alignment in machine translation system combination techniques. In this area, most of techniques usually utilize confusion networks (CN) as their central data structure to compact an exponential number of an potential hypotheses, and because better hypothesis alignment may benefit constructing better quality confusion networks, it is natural to add more useful information to improve alignment results. However, these information may be heterogeneous, so the widely-used Viterbi algorithm for searching the best alignment may not apply here. In the multi-objective optimization framework, each information source is viewed as an independent objective, and a new goal of improving all objectives can be searched by mature algorithms. The solutions from this framework, termed Pareto optimal solutions, are then combined to construct confusion networks. Experiments on two Chinese-to-English translation datasets show significant improvements, 0.97 and 1.06 BLEU points over a strong Indirected Hidden Markov Model-based (IHMM) system, and 4.75 and 3.53 points over the best single machine translation systems.
5 0.14686808 151 emnlp-2013-Paraphrasing 4 Microblog Normalization
Author: Wang Ling ; Chris Dyer ; Alan W Black ; Isabel Trancoso
Abstract: Compared to the edited genres that have played a central role in NLP research, microblog texts use a more informal register with nonstandard lexical items, abbreviations, and free orthographic variation. When confronted with such input, conventional text analysis tools often perform poorly. Normalization replacing orthographically or lexically idiosyncratic forms with more standard variants can improve performance. We propose a method for learning normalization rules from machine translations of a parallel corpus of microblog messages. To validate the utility of our approach, we evaluate extrinsically, showing that normalizing English tweets and then translating improves translation quality (compared to translating unnormalized text) using three standard web translation services as well as a phrase-based translation system trained — — on parallel microblog data.
6 0.11995734 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation
8 0.11385678 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity
9 0.099062197 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation
10 0.096996278 2 emnlp-2013-A Convex Alternative to IBM Model 2
11 0.09286461 93 emnlp-2013-Harvesting Parallel News Streams to Generate Paraphrases of Event Relations
12 0.091237679 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization
13 0.08016371 84 emnlp-2013-Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation
14 0.079458274 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk
15 0.078393288 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models
16 0.074836679 166 emnlp-2013-Semantic Parsing on Freebase from Question-Answer Pairs
17 0.072142616 193 emnlp-2013-Unsupervised Induction of Cross-Lingual Semantic Relations
18 0.070072696 128 emnlp-2013-Max-Violation Perceptron and Forced Decoding for Scalable MT Training
19 0.068296537 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation
20 0.067086644 31 emnlp-2013-Automatic Feature Engineering for Answer Selection and Extraction
simIndex simValue paperId paperTitle
same-paper 1 0.96724451 167 emnlp-2013-Semi-Markov Phrase-Based Monolingual Alignment
2 0.74045277 101 emnlp-2013-Improving Alignment of System Combination by Using Multi-objective Optimization
Author: Katsuhito Sudoh ; Shinsuke Mori ; Masaaki Nagata
Abstract: This paper proposes a novel noise-aware character alignment method for bootstrapping statistical machine transliteration from automatically extracted phrase pairs. The model is an extension of a Bayesian many-to-many alignment method for distinguishing nontransliteration (noise) parts in phrase pairs. It worked effectively in the experiments of bootstrapping Japanese-to-English statistical machine transliteration in patent domain using patent bilingual corpora.
4 0.60336232 96 emnlp-2013-Identifying Phrasal Verbs Using Many Bilingual Corpora
5 0.54176795 151 emnlp-2013-Paraphrasing 4 Microblog Normalization
6 0.52579576 187 emnlp-2013-Translation with Source Constituency and Dependency Trees
7 0.48469889 2 emnlp-2013-A Convex Alternative to IBM Model 2
8 0.39200491 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk
9 0.36735761 33 emnlp-2013-Automatic Knowledge Acquisition for Case Alternation between the Passive and Active Voices in Japanese
10 0.35220334 93 emnlp-2013-Harvesting Parallel News Streams to Generate Paraphrases of Event Relations
11 0.34409842 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity
12 0.34301874 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation
13 0.33183303 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation
14 0.32383826 189 emnlp-2013-Two-Stage Method for Large-Scale Acquisition of Contradiction Pattern Pairs using Entailment
15 0.32351181 9 emnlp-2013-A Log-Linear Model for Unsupervised Text Normalization
16 0.32337031 178 emnlp-2013-Success with Style: Using Writing Style to Predict the Success of Novels
17 0.30956429 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization
18 0.30945751 32 emnlp-2013-Automatic Idiom Identification in Wiktionary
19 0.2782442 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation
20 0.27320221 156 emnlp-2013-Recurrent Continuous Translation Models
simIndex simValue paperId paperTitle
same-paper 1 0.86296177 167 emnlp-2013-Semi-Markov Phrase-Based Monolingual Alignment
Author: Xuchen Yao ; Benjamin Van Durme ; Chris Callison-Burch ; Peter Clark
Abstract: We introduce a novel discriminative model for phrase-based monolingual alignment using a semi-Markov CRF. Our model achieves state-of-the-art alignment accuracy on two phrase-based alignment datasets (RTE and paraphrase), while doing significantly better than other strong baselines in both non-identical alignment and phrase-only alignment. Additional experiments highlight the potential benefit of our alignment model to RTE, paraphrase identification and question answering, where even a naive application of our model’s alignment score approaches the state of the art.
2 0.74866235 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging
Author: Xiaoqing Zheng ; Hanyang Chen ; Tianyu Xu
Abstract: This study explores the feasibility of performing Chinese word segmentation (CWS) and POS tagging by deep learning. We try to avoid task-specific feature engineering, and use deep layers of neural networks to discover relevant features for the tasks. We leverage large-scale unlabeled data to improve internal representation of Chinese characters, and use these improved representations to enhance supervised word segmentation and POS tagging models. Our networks achieved close to state-of-the-art performance with minimal computational cost. We also describe a perceptron-style algorithm for training the neural networks, as an alternative to the maximum-likelihood method, to speed up the training process and make the learning algorithm easier to implement.
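The "perceptron-style" training mentioned above applies to the paper's neural networks; as a hedged illustration of the error-driven update itself (using a toy linear tagger with word-identity features, not the paper's model), the contrast with maximum-likelihood training can be sketched as:

```python
from collections import defaultdict

def perceptron_train(sentences, tags, labelset, epochs=5):
    """Toy multiclass perceptron tagger: predict, and on a mistake add
    the gold feature and subtract the predicted one. No gradient or
    likelihood is computed, which is what makes perceptron-style
    updates cheap and simple to implement."""
    w = defaultdict(float)  # weights keyed by (word, label)
    for _ in range(epochs):
        for sent, gold in zip(sentences, tags):
            for tok, y in zip(sent, gold):
                pred = max(labelset, key=lambda l: w[(tok, l)])
                if pred != y:  # error-driven update
                    w[(tok, y)] += 1.0
                    w[(tok, pred)] -= 1.0
    return w

def perceptron_tag(sent, w, labelset):
    """Greedy per-token decoding with the learned weights."""
    return [max(labelset, key=lambda l: w[(tok, l)]) for tok in sent]
```

The update touches only the two labels involved in each mistake, unlike a likelihood gradient that renormalizes over the full label set.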
3 0.74844462 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization
Author: Kuzman Ganchev ; Dipanjan Das
Abstract: We present a framework for cross-lingual transfer of sequence information from a resource-rich source language to a resourceimpoverished target language that incorporates soft constraints via posterior regularization. To this end, we use automatically word aligned bitext between the source and target language pair, and learn a discriminative conditional random field model on the target side. Our posterior regularization constraints are derived from simple intuitions about the task at hand and from cross-lingual alignment information. We show improvements over strong baselines for two tasks: part-of-speech tagging and namedentity segmentation.
4 0.74567705 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity
Author: Yangfeng Ji ; Jacob Eisenstein
Abstract: Matrix and tensor factorization have been applied to a number of semantic relatedness tasks, including paraphrase identification. The key idea is that similarity in the latent space implies semantic relatedness. We describe three ways in which labeled data can improve the accuracy of these approaches on paraphrase classification. First, we design a new discriminative term-weighting metric called TF-KLD, which outperforms TF-IDF. Next, we show that using the latent representation from matrix factorization as features in a classification algorithm substantially improves accuracy. Finally, we combine latent features with fine-grained n-gram overlap features, yielding performance that is 3% more accurate than the prior state-of-the-art.
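The TF-KLD metric above weights terms by how differently they behave in paraphrase versus non-paraphrase pairs. The following is a simplified sketch of that idea (the exact conditioning in Ji and Eisenstein's formulation differs): each term is scored by the KL divergence between smoothed Bernoulli distributions of "the term is shared by both sentences" under the two labels.

```python
import math
from collections import Counter

def tf_kld_weights(pairs, labels, eps=0.05):
    """Simplified TF-KLD-style term weighting: score each term by the
    KL divergence between the smoothed Bernoulli distribution of
    "term appears on both sides" under paraphrase (label 1) versus
    non-paraphrase (label 0) pairs. `pairs` holds (tokens1, tokens2)."""
    occ = {1: Counter(), 0: Counter()}     # pairs where the term occurs at all
    shared = {1: Counter(), 0: Counter()}  # pairs where both sides contain it
    for (s1, s2), y in zip(pairs, labels):
        t1, t2 = set(s1), set(s2)
        for term in t1 | t2:
            occ[y][term] += 1
            if term in t1 and term in t2:
                shared[y][term] += 1

    def kl(p, q):  # KL divergence between two Bernoulli distributions
        return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

    weights = {}
    for term in set(occ[1]) | set(occ[0]):
        p = (shared[1][term] + eps) / (occ[1][term] + 2 * eps)  # paraphrase
        q = (shared[0][term] + eps) / (occ[0][term] + 2 * eps)  # non-paraphrase
        weights[term] = kl(p, q)
    return weights
```

A discriminative word like "cat" that is shared only within paraphrase pairs gets a high weight, while a stopword shared equally often under both labels scores near zero, which is the intuition behind preferring TF-KLD over TF-IDF for this task.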
5 0.74321598 132 emnlp-2013-Mining Scientific Terms and their Definitions: A Study of the ACL Anthology
Author: Yiping Jin ; Min-Yen Kan ; Jun-Ping Ng ; Xiangnan He
Abstract: This paper presents DefMiner, a supervised sequence labeling system that identifies scientific terms and their accompanying definitions. DefMiner achieves 85% F1 on a Wikipedia benchmark corpus, significantly improving the previous state-of-the-art by 8%. We exploit DefMiner to process the ACL Anthology Reference Corpus (ARC) – a large, real-world digital library of scientific articles in computational linguistics. The resulting automatically-acquired glossary represents the terminology defined over several thousand individual research articles. We highlight several interesting observations: that more definitions are introduced for conference and workshop papers over the years, and that multiword terms account for slightly less than half of all terms. Obtaining a list of popular, defined terms in a corpus of computational linguistics papers, we find that concepts can often be grouped into one of three categories: resources, methodologies and evaluation metrics.
6 0.74271315 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction
7 0.74260116 193 emnlp-2013-Unsupervised Induction of Cross-Lingual Semantic Relations
8 0.74140477 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge
9 0.74021083 13 emnlp-2013-A Study on Bootstrapping Bilingual Vector Spaces from Non-Parallel Data (and Nothing Else)
10 0.74018478 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks
11 0.7396853 143 emnlp-2013-Open Domain Targeted Sentiment
12 0.73870432 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation
13 0.73869526 164 emnlp-2013-Scaling Semantic Parsers with On-the-Fly Ontology Matching
14 0.73836917 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction
15 0.73786771 154 emnlp-2013-Prior Disambiguation of Word Tensors for Constructing Sentence Vectors
16 0.73661965 79 emnlp-2013-Exploiting Multiple Sources for Open-Domain Hypernym Discovery
17 0.7363168 82 emnlp-2013-Exploring Representations from Unlabeled Data with Co-training for Chinese Word Segmentation
18 0.73601782 69 emnlp-2013-Efficient Collective Entity Linking with Stacking
19 0.73541409 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types
20 0.73508841 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation