acl acl2011 acl2011-265 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Wang Ling ; Tiago Luis ; Joao Graca ; Isabel Trancoso ; Luisa Coheur
Abstract: In most statistical machine translation systems, the phrase/rule extraction algorithm uses alignments in the 1-best form, which might contain spurious alignment points. The usage ofweighted alignment matrices that encode all possible alignments has been shown to generate better phrase tables for phrase-based systems. We propose two algorithms to generate the well known MSD reordering model using weighted alignment matrices. Experiments on the IWSLT 2010 evaluation datasets for two language pairs with different alignment algorithms show that our methods produce more accurate reordering models, as can be shown by an increase over the regular MSD models of 0.4 BLEU points in the BTEC French to English test set, and of 1.5 BLEU points in the DIALOG Chinese to English test set.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In most statistical machine translation systems, the phrase/rule extraction algorithm uses alignments in the 1-best form, which might contain spurious alignment points. [sent-9, score-0.432]
2 The usage ofweighted alignment matrices that encode all possible alignments has been shown to generate better phrase tables for phrase-based systems. [sent-10, score-0.549]
3 We propose two algorithms to generate the well known MSD reordering model using weighted alignment matrices. [sent-11, score-0.622]
4 Experiments on the IWSLT 2010 evaluation datasets for two language pairs with different alignment algorithms show that our methods produce more accurate reordering models, as can be shown by an increase over the regular MSD models of 0.4 BLEU points in the BTEC French to English test set, and of 1.5 BLEU points in the DIALOG Chinese to English test set. [sent-12, score-0.562] [sent-13, score-0.026] [sent-14, score-0.026]
7 The translation quality of statistical phrase-based systems (Koehn et al., 2003) is heavily dependent on the quality of the translation and reordering models generated during the phrase extraction algorithm (Ling et al. [sent-16, score-0.037] [sent-17, score-0.386]
9 The basic phrase extraction algorithm uses word alignment information to constrain the possible phrases that can be extracted. [sent-19, score-0.358]
10 It has been shown that better alignment quality generally leads to better results (Ganchev et al. [sent-20, score-0.243]
11 However, the relationship between the word alignment quality and the results is not straightforward, and it was shown in (Vilar et al., 2006) that better alignments in terms of F-measure do not always lead to better translation quality. [sent-22, score-0.243] [sent-23, score-0.12]
13 The fact that spurious word alignments might occur leads to the use of alternative representations for word alignments that allow multiple alignment hypotheses, rather than the 1-best alignment (Venugopal et al. [sent-26, score-0.693]
14 While using n-best alignments yields improvements over using the 1-best alignment, these methods are computationally expensive. [sent-30, score-0.083]
15 The approach in (Liu et al., 2009) produces improvements over the methods above, while reducing the computational cost by using weighted alignment matrices to represent the alignment distribution over each parallel sentence. [sent-32, score-0.688]
16 However, their results were limited by the fact that they had no method for extracting a reordering model from these matrices, and used a simple distance-based model. [sent-33, score-0.26]
17 In this paper, we propose two methods for generating the MSD (Mono Swap Discontinuous) reordering model from the weighted alignment matrices. [sent-34, score-0.597]
18 First, we test a simple approach by using the 1-best alignment to generate the reordering model, while using the alignment matrix to produce the translation model. [sent-35, score-0.821]
19 This reordering model is a simple adaptation of the MSD model to read from alignment matrices. [sent-36, score-0.529]
20 Secondly, we develop two algorithms to infer the reordering model from the weighted alignment matrix probabilities. [sent-37, score-0.686]
21 The first one uses the alignment information within phrase pairs, while the second uses contextual information of the phrase pairs. [sent-38, score-0.473]
22 The Moses toolkit (Koehn et al., 2007) allows many configurations for the reordering model to be used. [sent-43, score-0.28]
23 In this work, we will only refer to the default configuration (msd-bidirectional-fe), which uses the MSD model, and calculates the reordering orientation for the previous and the next word, for each phrase pair. [sent-44, score-0.648]
24 Other possible configurations are simpler than the default one. [sent-45, score-0.043]
25 For instance, the monotonicity model only considers monotone and non-monotone orientation types, whereas the MSD model also considers the monotone orientation type but splits the non-monotone orientations into swap and discontinuous. [sent-46, score-0.885]
26 The orientation is swap if only the next word in the source is aligned with the previous word in the target, or more formally, if a^{j+1}_{n−1} ∈ A ∧ a^{j−1}_{n−1} ∉ A (where A is the alignment set, superscripts j−1 and j+1 index the source words before and after the phrase pair, and subscript n−1 the target word before it). [sent-49, score-0.352]
27 The orientation is discontinuous if neither of the above is true, which means (a^{j−1}_{n−1} ∈ A ∧ a^{j+1}_{n−1} ∈ A) ∨ (a^{j−1}_{n−1} ∉ A ∧ a^{j+1}_{n−1} ∉ A). [sent-50, score-0.315]
28 The orientations with respect to the next word are given analogously. [sent-51, score-0.095]
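To make the 1-best definitions above concrete, here is a minimal sketch of the orientation test with respect to the previous word. It assumes a 1-best alignment represented as a set of (source index, target index) pairs and a phrase pair covering source positions src_start..src_end whose target side starts at tgt_start; these names and the 0-based indexing are illustrative choices, not the authors' implementation.

```python
def prev_orientation(alignment, src_start, src_end, tgt_start):
    """MSD orientation of a phrase pair with respect to the previous target word.

    alignment: set of (source_index, target_index) pairs (the 1-best alignment).
    The phrase pair covers source positions src_start..src_end and its target
    side starts at tgt_start.  Index conventions are assumptions of this sketch.
    """
    prev_link = (src_start - 1, tgt_start - 1) in alignment  # source word before the phrase
    next_link = (src_end + 1, tgt_start - 1) in alignment    # source word after the phrase
    if prev_link and not next_link:
        return "mono"
    if next_link and not prev_link:
        return "swap"
    return "disc"  # both links present, or neither, counts as discontinuous

# A toy alignment with a monotone context word on each side of a one-word phrase:
A = {(0, 0), (1, 1), (2, 2)}
assert prev_orientation(A, src_start=1, src_end=1, tgt_start=1) == "mono"
assert prev_orientation(A, src_start=1, src_end=2, tgt_start=2) == "disc"
```

The test with respect to the next word mirrors this, looking at the target word just after the phrase pair.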
29 The reordering model is generated by grouping the phrase pairs that are equal, and calculating the probability of the grouped phrase pair being associated with each orientation type and direction, based on the orientations that are extracted for each direction. [sent-52, score-0.911]
30 Case a) is classified as monotonous, case b) is classified as swap and cases c) and d) are classified as discontinuous. [sent-54, score-0.164]
31 The probability of, for instance, the monotonous orientation for a phrase pair p is given by: P(p, mono) = C(mono) / (C(mono) + C(swap) + C(disc)) (1), where C(o) is the number of times a phrase pair is extracted with the orientation o in that group of phrase pairs. [sent-55, score-0.463]
32 We use the default smoothing configuration, which adds the fixed value of 0.5. [sent-57, score-0.064]
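The sketch below shows how the grouped orientation counts of equation 1 can be turned into smoothed probabilities. The 0.5 default for the smoothing constant and the dictionary layout are assumptions for illustration, not confirmed details of the toolkit.

```python
def msd_probabilities(orientation_counts, smoothing=0.5):
    """Turn the orientation counts of one phrase-pair group into the
    probabilities of equation 1, using a fixed additive smoothing value.
    The 0.5 default mirrors the smoothing value suggested by the text and
    is otherwise an assumption of this sketch."""
    orientations = ("mono", "swap", "disc")
    smoothed = {o: orientation_counts.get(o, 0) + smoothing for o in orientations}
    total = sum(smoothed.values())
    return {o: c / total for o, c in smoothed.items()}

# A phrase pair extracted 3 times as mono and once as disc:
print(msd_probabilities({"mono": 3, "disc": 1}))
# {'mono': 0.636..., 'swap': 0.090..., 'disc': 0.272...}
```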
33 3 Weighted MSD Model When using a weighted alignment matrix, rather than working with alignment points, we use the probability of each word in the source aligning with each word in the target. [sent-59, score-0.465]
34 Thus, the regular MSD model cannot be directly applied here. [sent-60, score-0.086]
35 One obvious solution to this problem is to produce a 1-best alignment set along with the alignment matrix, and use the 1-best alignment to generate the reordering model, while using the alignment matrix to produce the translation model. [sent-61, score-1.307]
36 However, this method would not be taking advantage of the weighted alignment matrix. [sent-62, score-0.337]
37 The following subsections describe two algorithms that are proposed to make use of the alignment probabilities. [sent-63, score-0.268]
38 1 Score-based Each phrase pair that is extracted using the algorithm described in (Liu et al., 2009) is given a score. [sent-65, score-0.176]
39 This score is higher if the alignment points in the phrase pair have high probabilities, and if the alignment is consistent. [sent-67, score-0.688]
40 Thus, if an extracted phrase pair has better quality, its orientation should have more weight than phrase pairs with worse quality. [sent-68, score-0.524]
41 We implement this by changing the C(o) function in equation 1 from the number of phrase pairs with the orientation o to the sum of the scores of those phrase pairs. [sent-69, score-0.438]
42 We also need to normalize the scores for each group, due to the fixed smoothing that is applied: if the sum of the scores is much lower than the smoothing value (0.5), the latter will overshadow the weight of the phrase pairs. [sent-70, score-0.121] [sent-74, score-0.115]
44 The normalization is done by setting the phrase pair with the highest value of the sum of all MSD probabilities to 1, and readjusting other phrase pairs accordingly. [sent-75, score-0.364]
45 Thus, a group of 3 phrase pairs that have the MSD probability sums of 0. [sent-76, score-0.139]
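A sketch of the score-based variant follows: C(o) in equation 1 becomes a sum of phrase-pair extraction scores rather than a raw count. The exact normalization is not fully preserved in this extraction; the version below, which rescales each group so that its largest score becomes 1 before smoothing is added, is one plausible reading and should be treated as an assumption.

```python
def score_based_probabilities(observations, smoothing=0.5):
    """observations: list of (orientation, extraction_score) pairs collected
    for one group of identical phrase pairs.  C(o) is the sum of the scores of
    the extractions with orientation o (Section 3.1).  Rescaling by the maximum
    score is one plausible reading of the normalization step, not a confirmed
    detail of the paper."""
    max_score = max(score for _, score in observations)
    weighted = {"mono": 0.0, "swap": 0.0, "disc": 0.0}
    for orientation, score in observations:
        weighted[orientation] += score / max_score
    total = sum(weighted.values()) + 3 * smoothing
    return {o: (c + smoothing) / total for o, c in weighted.items()}

# Two high-quality extractions seen as mono, one low-quality one seen as swap:
print(score_based_probabilities([("mono", 0.9), ("mono", 0.8), ("swap", 0.1)]))
```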
46 2 Context-based We propose an alternative algorithm to calculate the reordering orientations for each phrase pair. [sent-82, score-0.421]
47 Rather than classifying each phrase pair as either monotonous (M), swap (S) or discontinuous (D), we calculate the probability of each orientation, and use these as weighted counts when creating the reordering model. [sent-83, score-0.888]
48 In the regular MSD model, the previous orientation for a phrase pair is monotonous if the previous word in the source phrase is aligned with the previous word in the target phrase and not aligned with the next word. [sent-85, score-1.091]
49 Also, the sum of the probabilities of all orientations (Pc(M), Pc(S), Pc(D)) for a given phrase pair can be trivially shown to be 1. [sent-87, score-0.249]
50 The probabilities for the next word are given analogously. [sent-88, score-0.058]
51 Following equation 1, the function C(o) is changed to be the sum of all Pc(o), from the grouped phrase pairs. [sent-89, score-0.173]
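The exact expressions for Pc(M), Pc(S) and Pc(D) are not preserved in this extraction. The sketch below uses one formulation that is consistent with the 1-best definitions above (mono needs the previous source word aligned and the next one not, swap the reverse, discontinuous covers both-or-neither) and that trivially sums to 1; the formulation itself, the matrix layout W[source][target] and the index conventions are all assumptions.

```python
def context_orientation_probs(W, src_start, src_end, tgt_start):
    """Orientation probabilities for a phrase pair with respect to the previous
    target word, read off a weighted alignment matrix W, where W[s][t] is the
    probability that source word s aligns to target word t.  This formulation
    is an assumption consistent with the hard MSD definitions, not a confirmed
    reproduction of the paper's equations."""
    def w(s, t):
        # Outside the sentence, the alignment probability is taken to be 0.
        if 0 <= s < len(W) and 0 <= t < len(W[0]):
            return W[s][t]
        return 0.0

    p_prev = w(src_start - 1, tgt_start - 1)  # source word before the phrase
    p_next = w(src_end + 1, tgt_start - 1)    # source word after the phrase
    p_mono = p_prev * (1.0 - p_next)
    p_swap = p_next * (1.0 - p_prev)
    p_disc = 1.0 - p_mono - p_swap            # = p_prev*p_next + (1-p_prev)*(1-p_next)
    return {"mono": p_mono, "swap": p_swap, "disc": p_disc}
```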
52 The development corpus for the BTEC task was the CSTAR03 test set composed of 506 sentences, and the test set was the IWSLT04 test set composed of 500 sentences and 16 references. [sent-97, score-0.048]
53 As for the DIALOG task, the development set was the IWSLT09 devset composed of 200 sentences, and the test set was the CSTAR03 test set with 506 sentences and 16 references. [sent-98, score-0.024]
54 2 Setup We use weighted alignment matrices based on Hidden Markov Models (HMMs), which are produced by the PostCAT toolkit1, based on the posterior regularization framework (V. [sent-100, score-0.445]
55 The extraction algorithm using weighted alignment matrices employs the same method described in (Liu et al., 2009), and the phrase pruning threshold was set to 0. [sent-103, score-0.445] [sent-104, score-0.115]
57 For the reordering model, we use the distance-based reordering, and compare the results with the MSD model using the 1-best alignment. [sent-106, score-0.26]
58 Then, we apply our two methods based on alignment matrices. [sent-107, score-0.243]
59 Finally, we also test a combination of both methods, where C(o) is the sum of all Pc(o), weighted by the scores of the respective phrase pairs. [sent-112, score-0.278]
60 The optimization of the translation model weights was done using MERT; each experiment was run 5 times and the final score was calculated as the average of the 5 runs, in order to stabilize the results. [sent-113, score-0.083]
61 The BLEU-4 and METEOR scores were computed using 16 references. [sent-115, score-0.031]
62 3 Reordering model comparison Tables 1 and 2 show the scores using the different reordering models. [sent-118, score-0.291]
63 Consistent improvements in the BLEU scores may be observed when changing from the MSD model to the models generated using alignment matrices. [sent-119, score-0.321]
64 The results were consistently better using our models in the DIALOG task, since the English-Chinese language pair is more dependent on the reordering model. [sent-120, score-0.295]
65 This is evident if we look at the difference in the scores between the distance-based and the MSD models. [sent-121, score-0.031]
66 Furthermore, in this task, we observe an improvement on all scores from the MSD model to our weighted MSD models, which suggests that the usage of alignment matrices helps predict the reordering probabilities more accurately. [sent-122, score-0.771]
67 We can also see that, in the BTEC task, the context-based reordering model performs better than the score-based model, which in turn does not perform significantly better than the regular MSD model. [sent-123, score-0.372]
68 We believe this is because the alignment probabilities are much more accurate in the English-French language pair, and phrase pair scores remain consistent throughout the extraction, making the score based approach and the regular MSD model behave similarly. [sent-125, score-0.571]
69 On the other hand, in the DIALOG task, the score-based model performs better than the regular MSD model, and the combination of both methods yields a significant improvement over each method alone. [sent-126, score-0.086]
70 Table 3 shows a case where the context based model is more accurate than the regular MSD model. [sent-127, score-0.086]
71 The alignment is obviously faulty, since the word “two” is aligned with both occurrences of “deux”, although it should only be aligned with the first one. [sent-128, score-0.393]
72 Furthermore, the word “twin” should be aligned with “à deux lit”, but it is aligned with “chambres”. [sent-155, score-0.249]
73 If we use the 1-best alignment to compute the reordering type of the phrase pair “Je voudrais deux” / “I’d like to reserve two”, the reordering type for the following orientation would be monotonous, since the next word “chambres” is falsely aligned with “twin”. [sent-156, score-1.125]
74 However, it should clearly be discontinuous, since the right alignment for “twin” is “à deux lit”. [sent-157, score-0.342]
75 This problem is less serious when we use the weighted MSD model, since the orientation probability mass would be divided between monotonous and discontinuous since the probability weighted matrix for the wrong alignment is 0. [sent-158, score-1.056]
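The matrix value for the spurious link is truncated in this extraction, so here is a purely illustrative calculation with an assumed value, using the formulation sketched after Section 3.2 above, to show how the weighted model splits the mass that the 1-best model would assign entirely to the monotonous orientation:

```python
# Hypothetical numbers for the "twin"/"chambres" example; only the shape of the
# computation is meant to be illustrative, not the actual matrix values.
p_wrong_link = 0.3                                # assumed probability of the spurious alignment
p_competing = 0.0                                 # no competing evidence for swap in this example
p_mono = p_wrong_link * (1 - p_competing)         # 0.3
p_swap = p_competing * (1 - p_wrong_link)         # 0.0
p_disc = 1 - p_mono - p_swap                      # 0.7
# With the 1-best alignment, the same case would be counted as mono with weight 1.
```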
76 On the BTEC task, some of the other scores are lower than those of the MSD model, and we suspect that this stems from the fact that our tuning process only attempts to maximize the BLEU score. [sent-160, score-0.031]
77 5 Conclusions In this paper we addressed the limitations of the MSD reordering models extracted from 1-best alignments, and presented two algorithms to extract these models from weighted alignment matrices. [sent-161, score-0.596]
78 Experiments show that our models perform better than the distance-based model and the regular MSD model. [sent-162, score-0.116]
79 The method based on scores showed a good performance for the Chinese-English language pair, but the performance for the English-French pair was similar to the MSD model. [sent-163, score-0.092]
80 On the other hand, the method based on context improves the results. Table 3 (caption): Weighted alignment matrix for a training sentence pair from BTEC, with spurious alignment probabilities. [sent-164, score-0.652]
81 The code used in this work is currently integrated with the Geppetto toolkit2 , and it will be made available in the next version for public use. [sent-169, score-0.023]
82 Wider pipelines: N-best alignments and parses in MT training. [sent-224, score-0.083]
wordName wordTfidf (topN-words)
[('msd', 0.692), ('alignment', 0.243), ('reordering', 0.234), ('orientation', 0.233), ('monotonous', 0.198), ('btec', 0.196), ('wjn', 0.124), ('pc', 0.117), ('phrase', 0.115), ('matrices', 0.108), ('dialog', 0.102), ('ajn', 0.099), ('deux', 0.099), ('iwslt', 0.099), ('weighted', 0.094), ('gra', 0.086), ('alignments', 0.083), ('discontinuous', 0.082), ('swap', 0.08), ('aligned', 0.075), ('tiago', 0.074), ('orientations', 0.072), ('twin', 0.065), ('matrix', 0.064), ('ling', 0.062), ('pair', 0.061), ('fct', 0.06), ('regular', 0.06), ('wn', 0.054), ('jo', 0.051), ('coheur', 0.049), ('mono', 0.049), ('omonbteinxetd', 0.049), ('tnm', 0.049), ('lu', 0.045), ('ganchev', 0.044), ('lit', 0.044), ('isabel', 0.044), ('spurious', 0.041), ('vilar', 0.04), ('sij', 0.04), ('bleu', 0.039), ('sum', 0.038), ('translation', 0.037), ('probabilities', 0.035), ('venugopal', 0.032), ('phd', 0.032), ('scores', 0.031), ('ca', 0.031), ('ter', 0.03), ('moses', 0.029), ('kuzman', 0.029), ('classified', 0.028), ('meteor', 0.028), ('sl', 0.028), ('monotone', 0.027), ('spoken', 0.027), ('koehn', 0.026), ('points', 0.026), ('dyer', 0.026), ('model', 0.026), ('algorithms', 0.025), ('tthhee', 0.025), ('ao', 0.024), ('probability', 0.024), ('composed', 0.024), ('default', 0.023), ('next', 0.023), ('sa', 0.023), ('qun', 0.023), ('liu', 0.022), ('formally', 0.022), ('wang', 0.022), ('sed', 0.022), ('thie', 0.022), ('disc', 0.022), ('voudrais', 0.022), ('aachen', 0.022), ('terp', 0.022), ('itino', 0.022), ('rwth', 0.022), ('aij', 0.022), ('trancoso', 0.022), ('faulty', 0.022), ('postcat', 0.022), ('tian', 0.022), ('xinyan', 0.022), ('smoothing', 0.021), ('marcello', 0.021), ('source', 0.021), ('changing', 0.021), ('grouped', 0.02), ('configurations', 0.02), ('configuration', 0.02), ('federico', 0.02), ('stabilize', 0.02), ('wpo', 0.02), ('tiso', 0.02), ('itnh', 0.02), ('funds', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 265 acl-2011-Reordering Modeling using Weighted Alignment Matrices
Author: Wang Ling ; Tiago Luis ; Joao Graca ; Isabel Trancoso ; Luisa Coheur
Abstract: In most statistical machine translation systems, the phrase/rule extraction algorithm uses alignments in the 1-best form, which might contain spurious alignment points. The usage ofweighted alignment matrices that encode all possible alignments has been shown to generate better phrase tables for phrase-based systems. We propose two algorithms to generate the well known MSD reordering model using weighted alignment matrices. Experiments on the IWSLT 2010 evaluation datasets for two language pairs with different alignment algorithms show that our methods produce more accurate reordering models, as can be shown by an increase over the regular MSD models of 0.4 BLEU points in the BTEC French to English test set, and of 1.5 BLEU points in the DIALOG Chinese to English test set.
2 0.18611142 152 acl-2011-How Much Can We Gain from Supervised Word Alignment?
Author: Jinxi Xu ; Jinying Chen
Abstract: Word alignment is a central problem in statistical machine translation (SMT). In recent years, supervised alignment algorithms, which improve alignment accuracy by mimicking human alignment, have attracted a great deal of attention. The objective of this work is to explore the performance limit of supervised alignment under the current SMT paradigm. Our experiments used a manually aligned ChineseEnglish corpus with 280K words recently released by the Linguistic Data Consortium (LDC). We treated the human alignment as the oracle of supervised alignment. The result is surprising: the gain of human alignment over a state of the art unsupervised method (GIZA++) is less than 1point in BLEU. Furthermore, we showed the benefit of improved alignment becomes smaller with more training data, implying the above limit also holds for large training conditions. 1
3 0.18105252 266 acl-2011-Reordering with Source Language Collocations
Author: Zhanyi Liu ; Haifeng Wang ; Hua Wu ; Ting Liu ; Sheng Li
Abstract: This paper proposes a novel reordering model for statistical machine translation (SMT) by means of modeling the translation orders of the source language collocations. The model is learned from a word-aligned bilingual corpus where the collocated words in source sentences are automatically detected. During decoding, the model is employed to softly constrain the translation orders of the source language collocations, so as to constrain the translation orders of those source phrases containing these collocated words. The experimental results show that the proposed method significantly improves the translation quality, achieving the absolute improvements of 1.1~1.4 BLEU score over the baseline methods. 1
4 0.1354809 43 acl-2011-An Unsupervised Model for Joint Phrase Alignment and Extraction
Author: Graham Neubig ; Taro Watanabe ; Eiichiro Sumita ; Shinsuke Mori ; Tatsuya Kawahara
Abstract: We present an unsupervised model for joint phrase alignment and extraction using nonparametric Bayesian methods and inversion transduction grammars (ITGs). The key contribution is that phrases of many granularities are included directly in the model through the use of a novel formulation that memorizes phrases generated not only by terminal, but also non-terminal symbols. This allows for a completely probabilistic model that is able to create a phrase table that achieves competitive accuracy on phrase-based machine translation tasks directly from unaligned sentence pairs. Experiments on several language pairs demonstrate that the proposed model matches the accuracy of traditional two-step word alignment/phrase extraction approach while reducing the phrase table to a fraction of the original size.
5 0.13034296 57 acl-2011-Bayesian Word Alignment for Statistical Machine Translation
Author: Coskun Mermer ; Murat Saraclar
Abstract: In this work, we compare the translation performance of word alignments obtained via Bayesian inference to those obtained via expectation-maximization (EM). We propose a Gibbs sampler for fully Bayesian inference in IBM Model 1, integrating over all possible parameter values in finding the alignment distribution. We show that Bayesian inference outperforms EM in all of the tested language pairs, domains and data set sizes, by up to 2.99 BLEU points. We also show that the proposed method effectively addresses the well-known rare word problem in EM-estimated models; and at the same time induces a much smaller dictionary of bilingual word-pairs.
6 0.12811875 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering
7 0.12591407 264 acl-2011-Reordering Metrics for MT
8 0.11627126 221 acl-2011-Model-Based Aligner Combination Using Dual Decomposition
9 0.11408769 141 acl-2011-Gappy Phrasal Alignment By Agreement
10 0.11345201 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
11 0.10590615 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
12 0.10492428 263 acl-2011-Reordering Constraint Based on Document-Level Context
13 0.10370554 339 acl-2011-Word Alignment Combination over Multiple Word Segmentation
14 0.10011992 87 acl-2011-Corpus Expansion for Statistical Machine Translation with Semantic Role Label Substitution Rules
15 0.095987216 325 acl-2011-Unsupervised Word Alignment with Arbitrary Features
16 0.094254464 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment
17 0.092514597 162 acl-2011-Identifying the Semantic Orientation of Foreign Words
18 0.089098848 93 acl-2011-Dealing with Spurious Ambiguity in Learning ITG-based Word Alignment
19 0.084252879 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation
20 0.081527412 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models
topicId topicWeight
[(0, 0.157), (1, -0.135), (2, 0.116), (3, 0.107), (4, 0.058), (5, 0.039), (6, 0.029), (7, 0.011), (8, -0.015), (9, 0.052), (10, 0.12), (11, 0.05), (12, 0.009), (13, 0.014), (14, -0.124), (15, 0.013), (16, 0.065), (17, 0.003), (18, -0.136), (19, -0.003), (20, -0.077), (21, 0.01), (22, -0.087), (23, -0.02), (24, -0.025), (25, 0.044), (26, 0.121), (27, 0.047), (28, -0.044), (29, 0.01), (30, 0.075), (31, 0.002), (32, 0.078), (33, 0.017), (34, 0.046), (35, -0.133), (36, -0.025), (37, 0.062), (38, 0.019), (39, 0.046), (40, 0.018), (41, -0.011), (42, -0.039), (43, 0.092), (44, -0.031), (45, -0.052), (46, -0.025), (47, 0.136), (48, 0.053), (49, -0.054)]
simIndex simValue paperId paperTitle
same-paper 1 0.93100578 265 acl-2011-Reordering Modeling using Weighted Alignment Matrices
Author: Wang Ling ; Tiago Luis ; Joao Graca ; Isabel Trancoso ; Luisa Coheur
Abstract: In most statistical machine translation systems, the phrase/rule extraction algorithm uses alignments in the 1-best form, which might contain spurious alignment points. The usage ofweighted alignment matrices that encode all possible alignments has been shown to generate better phrase tables for phrase-based systems. We propose two algorithms to generate the well known MSD reordering model using weighted alignment matrices. Experiments on the IWSLT 2010 evaluation datasets for two language pairs with different alignment algorithms show that our methods produce more accurate reordering models, as can be shown by an increase over the regular MSD models of 0.4 BLEU points in the BTEC French to English test set, and of 1.5 BLEU points in the DIALOG Chinese to English test set.
2 0.72537327 141 acl-2011-Gappy Phrasal Alignment By Agreement
Author: Mohit Bansal ; Chris Quirk ; Robert Moore
Abstract: We propose a principled and efficient phraseto-phrase alignment model, useful in machine translation as well as other related natural language processing problems. In a hidden semiMarkov model, word-to-phrase and phraseto-word translations are modeled directly by the system. Agreement between two directional models encourages the selection of parsimonious phrasal alignments, avoiding the overfitting commonly encountered in unsupervised training with multi-word units. Expanding the state space to include “gappy phrases” (such as French ne ? pas) makes the alignment space more symmetric; thus, it allows agreement between discontinuous alignments. The resulting system shows substantial improvements in both alignment quality and translation quality over word-based Hidden Markov Models, while maintaining asymptotically equivalent runtime.
3 0.69456148 266 acl-2011-Reordering with Source Language Collocations
Author: Zhanyi Liu ; Haifeng Wang ; Hua Wu ; Ting Liu ; Sheng Li
Abstract: This paper proposes a novel reordering model for statistical machine translation (SMT) by means of modeling the translation orders of the source language collocations. The model is learned from a word-aligned bilingual corpus where the collocated words in source sentences are automatically detected. During decoding, the model is employed to softly constrain the translation orders of the source language collocations, so as to constrain the translation orders of those source phrases containing these collocated words. The experimental results show that the proposed method significantly improves the translation quality, achieving the absolute improvements of 1.1~1.4 BLEU score over the baseline methods. 1
4 0.67297876 152 acl-2011-How Much Can We Gain from Supervised Word Alignment?
Author: Jinxi Xu ; Jinying Chen
Abstract: Word alignment is a central problem in statistical machine translation (SMT). In recent years, supervised alignment algorithms, which improve alignment accuracy by mimicking human alignment, have attracted a great deal of attention. The objective of this work is to explore the performance limit of supervised alignment under the current SMT paradigm. Our experiments used a manually aligned ChineseEnglish corpus with 280K words recently released by the Linguistic Data Consortium (LDC). We treated the human alignment as the oracle of supervised alignment. The result is surprising: the gain of human alignment over a state of the art unsupervised method (GIZA++) is less than 1point in BLEU. Furthermore, we showed the benefit of improved alignment becomes smaller with more training data, implying the above limit also holds for large training conditions. 1
5 0.66610229 43 acl-2011-An Unsupervised Model for Joint Phrase Alignment and Extraction
Author: Graham Neubig ; Taro Watanabe ; Eiichiro Sumita ; Shinsuke Mori ; Tatsuya Kawahara
Abstract: We present an unsupervised model for joint phrase alignment and extraction using nonparametric Bayesian methods and inversion transduction grammars (ITGs). The key contribution is that phrases of many granularities are included directly in the model through the use of a novel formulation that memorizes phrases generated not only by terminal, but also non-terminal symbols. This allows for a completely probabilistic model that is able to create a phrase table that achieves competitive accuracy on phrase-based machine translation tasks directly from unaligned sentence pairs. Experiments on several language pairs demonstrate that the proposed model matches the accuracy of traditional two-step word alignment/phrase extraction approach while reducing the phrase table to a fraction of the original size.
6 0.6537168 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering
7 0.64129281 221 acl-2011-Model-Based Aligner Combination Using Dual Decomposition
8 0.63182533 93 acl-2011-Dealing with Spurious Ambiguity in Learning ITG-based Word Alignment
9 0.62594157 263 acl-2011-Reordering Constraint Based on Document-Level Context
10 0.61388171 57 acl-2011-Bayesian Word Alignment for Statistical Machine Translation
11 0.60971648 264 acl-2011-Reordering Metrics for MT
12 0.59357405 325 acl-2011-Unsupervised Word Alignment with Arbitrary Features
13 0.56664515 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment
14 0.55258924 339 acl-2011-Word Alignment Combination over Multiple Word Segmentation
15 0.5257079 69 acl-2011-Clause Restructuring For SMT Not Absolutely Helpful
16 0.51425689 335 acl-2011-Why Initialization Matters for IBM Model 1: Multiple Optima and Non-Strict Convexity
17 0.51383501 87 acl-2011-Corpus Expansion for Statistical Machine Translation with Semantic Role Label Substitution Rules
18 0.49005976 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models
19 0.47267044 100 acl-2011-Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation
20 0.45371547 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
topicId topicWeight
[(5, 0.03), (17, 0.061), (26, 0.013), (37, 0.056), (39, 0.032), (41, 0.066), (44, 0.289), (55, 0.024), (59, 0.014), (72, 0.057), (77, 0.02), (91, 0.025), (96, 0.202)]
simIndex simValue paperId paperTitle
1 0.81397349 95 acl-2011-Detection of Agreement and Disagreement in Broadcast Conversations
Author: Wen Wang ; Sibel Yaman ; Kristin Precoda ; Colleen Richey ; Geoffrey Raymond
Abstract: We present Conditional Random Fields based approaches for detecting agreement/disagreement between speakers in English broadcast conversation shows. We develop annotation approaches for a variety of linguistic phenomena. Various lexical, structural, durational, and prosodic features are explored. We compare the performance when using features extracted from automatically generated annotations against that when using human annotations. We investigate the efficacy of adding prosodic features on top of lexical, structural, and durational features. Since the training data is highly imbalanced, we explore two sampling approaches, random downsampling and ensemble downsampling. Overall, our approach achieves 79.2% (precision), 50.5% (recall), 61.7% (F1) for agreement detection and 69.2% (precision), 46.9% (recall), and 55.9% (F1) for disagreement detection, on the English broadcast conversation data.
2 0.78909147 135 acl-2011-Faster and Smaller N-Gram Language Models
Author: Adam Pauls ; Dan Klein
Abstract: N-gram language models are a major resource bottleneck in machine translation. In this paper, we present several language model implementations that are both highly compact and fast to query. Our fastest implementation is as fast as the widely used SRILM while requiring only 25% of the storage. Our most compact representation can store all 4 billion n-grams and associated counts for the Google n-gram corpus in 23 bits per n-gram, the most compact lossless representation to date, and even more compact than recent lossy compression techniques. We also discuss techniques for improving query speed during decoding, including a simple but novel language model caching technique that improves the query speed of our language models (and SRILM) by up to 300%.
3 0.77581978 150 acl-2011-Hierarchical Text Classification with Latent Concepts
Author: Xipeng Qiu ; Xuanjing Huang ; Zhao Liu ; Jinlong Zhou
Abstract: Recently, hierarchical text classification has become an active research topic. The essential idea is that the descendant classes can share the information of the ancestor classes in a predefined taxonomy. In this paper, we claim that each class has several latent concepts and its subclasses share information with these different concepts respectively. Then, we propose a variant Passive-Aggressive (PA) algorithm for hierarchical text classification with latent concepts. Experimental results show that the performance of our algorithm is competitive with the recently proposed hierarchical classification algorithms.
same-paper 4 0.76077861 265 acl-2011-Reordering Modeling using Weighted Alignment Matrices
Author: Wang Ling ; Tiago Luis ; Joao Graca ; Isabel Trancoso ; Luisa Coheur
Abstract: In most statistical machine translation systems, the phrase/rule extraction algorithm uses alignments in the 1-best form, which might contain spurious alignment points. The usage ofweighted alignment matrices that encode all possible alignments has been shown to generate better phrase tables for phrase-based systems. We propose two algorithms to generate the well known MSD reordering model using weighted alignment matrices. Experiments on the IWSLT 2010 evaluation datasets for two language pairs with different alignment algorithms show that our methods produce more accurate reordering models, as can be shown by an increase over the regular MSD models of 0.4 BLEU points in the BTEC French to English test set, and of 1.5 BLEU points in the DIALOG Chinese to English test set.
5 0.74933374 295 acl-2011-Temporal Restricted Boltzmann Machines for Dependency Parsing
Author: Nikhil Garg ; James Henderson
Abstract: We propose a generative model based on Temporal Restricted Boltzmann Machines for transition based dependency parsing. The parse tree is built incrementally using a shiftreduce parse and an RBM is used to model each decision step. The RBM at the current time step induces latent features with the help of temporal connections to the relevant previous steps which provide context information. Our parser achieves labeled and unlabeled attachment scores of 88.72% and 91.65% respectively, which compare well with similar previous models and the state-of-the-art.
6 0.74793601 12 acl-2011-A Generative Entity-Mention Model for Linking Entities with Knowledge Base
7 0.64163929 76 acl-2011-Comparative News Summarization Using Linear Programming
8 0.62807006 46 acl-2011-Automated Whole Sentence Grammar Correction Using a Noisy Channel Model
9 0.62713814 62 acl-2011-Blast: A Tool for Error Analysis of Machine Translation Output
10 0.62706202 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation
11 0.62567556 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation
12 0.62514162 57 acl-2011-Bayesian Word Alignment for Statistical Machine Translation
13 0.62481976 71 acl-2011-Coherent Citation-Based Summarization of Scientific Papers
14 0.62388456 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering
15 0.62383282 141 acl-2011-Gappy Phrasal Alignment By Agreement
16 0.62329203 11 acl-2011-A Fast and Accurate Method for Approximate String Search
17 0.62294811 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models
18 0.62236571 72 acl-2011-Collecting Highly Parallel Data for Paraphrase Evaluation
19 0.621961 175 acl-2011-Integrating history-length interpolation and classes in language modeling
20 0.6217913 47 acl-2011-Automatic Assessment of Coverage Quality in Intelligence Reports