emnlp emnlp2010 emnlp2010-75 knowledge-graph by maker-knowledge-mining

75 emnlp-2010-Lessons Learned in Part-of-Speech Tagging of Conversational Speech


Source: pdf

Author: Vladimir Eidelman ; Zhongqiang Huang ; Mary Harper

Abstract: This paper examines tagging models for spontaneous English speech transcripts. We analyze the performance of state-of-the-art tagging models, either generative or discriminative, left-to-right or bidirectional, with or without latent annotations, together with the use of ToBI break indexes and several methods for segmenting the speech transcripts (i.e., conversation side, speaker turn, or human-annotated sentence). Based on these studies, we observe that: (1) bidirectional models tend to achieve better accuracy levels than left-to-right models, (2) generative models seem to perform somewhat better than discriminative models on this task, and (3) prosody improves tagging performance of models on conversation sides, but has much less impact on smaller segments. We conclude that, although the use of break indexes can indeed significantly improve performance over baseline models without them on conversation sides, tagging accuracy improves more by using smaller segments, for which the impact of the break indexes is marginal.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: This paper examines tagging models for spontaneous English speech transcripts. [sent-3, score-0.386]

2 We analyze the performance of state-of-the-art tagging models, either generative or discriminative, left-to-right or bidirectional, with or without latent annotations, together with the use of ToBI break indexes and several methods for segmenting the speech transcripts (i. [sent-4, score-0.86]

3 We conclude that, although the use of break indexes can indeed significantly improve performance over baseline models without them on conversation sides, tagging accuracy improves more by using smaller segments, for which the impact of the break indexes is marginal. [sent-8, score-1.267]

4 In contrast to text, conversational speech represents a significant challenge because the transcripts are not segmented into sentences. [sent-12, score-0.242]

5 One potentially beneficial type of information is prosody (Cutler et al. [sent-15, score-0.36]

6 Prosody provides cues for lexical disambiguation, sentence segmentation and classification, phrase structure and attachment, discourse structure, speaker affect, etc. [sent-17, score-0.207]

7 Additionally, prosodic features such as pause length, duration of words and phones, pitch contours, energy contours, and their normalized values have been used for speech processing tasks like sentence boundary detection (Liu et al. [sent-23, score-0.594]

8 In the ToBI scheme, aspects of prosody such as tone, prominence, and degree of juncture between words are represented symbolically. [sent-31, score-0.36]

9 For instance, Dreyer and Shafran (2007) use three classes of automatically detected ToBI break indexes, indicating major intonational breaks with a 4, hesitation with a p, and all other breaks with a 1. [sent-32, score-0.643]
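To make the three-class scheme concrete, here is a minimal Python sketch of how full ToBI break labels might be collapsed into the classes 4, p, and 1. The exact collapsing rules (treating any label with a "p" diacritic as a hesitation and any index starting with 4 as a major break) are an illustrative assumption, not taken from the paper.

```python
def collapse_tobi_break(label: str) -> str:
    """Collapse a full ToBI break index (e.g. '0'..'4', '2p', '4-') into the
    three-class scheme: '4' (major intonational break), 'p' (hesitation or
    disfluent juncture), '1' (everything else). The rules are assumptions."""
    if "p" in label:           # diacritic marking a disfluent/hesitation break
        return "p"
    if label.startswith("4"):  # major intonational phrase boundary (incl. '4-')
        return "4"
    return "1"                 # all other junctures

assert [collapse_tobi_break(b) for b in ["1", "2p", "4", "4-", "0"]] == \
       ["1", "p", "4", "4", "1"]
```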

10 Recently, Huang and Harper (2010) found that they could effectively integrate prosodic informa- [sent-33, score-0.372]

11 tion in the form of this simplified three-class ToBI encoding when parsing spontaneous speech by using a prosodically enriched PCFG model with latent annotations (PCFG-LA) (Matsuzaki et al. [sent-35, score-0.611]

12 , 2005; Petrov and Klein, 2007) to rescore n-best parses produced by a baseline PCFG-LA model without prosodic enrichment. [sent-36, score-0.402]

13 However, the prosodically enriched models by themselves did not perform significantly better than the baseline PCFG-LA model without enrichment, due to the negative effect that misalignments between automatic prosodic breaks and true phrase boundaries have on the model. [sent-37, score-1.105]

14 This paper investigates methods for using state-of-the-art taggers on conversational speech transcriptions and the effect that prosody has on tagging accuracy. [sent-38, score-0.857]

15 Improving POS tagging performance of speech transcriptions has implications for improving downstream applications that rely on accurate POS tags, including sentence boundary detection (Liu et al. [sent-39, score-0.477]

16 While there have been several attempts to integrate prosodic information to improve parse accuracy of speech transcripts, to the best of our knowledge there has been little work on using this type of information for POS tagging. [sent-42, score-0.509]

17 Furthermore, most of the parsing work has involved generative models and rescoring/reranking of hypotheses from the generative models. [sent-43, score-0.226]

18 As the first of our generative models, we used a Hidden Markov Model (HMM) trigram tagger (Thede and Harper, 1999), which serves to establish a baseline and to gauge the difficulty of the task at hand. [sent-46, score-0.23]
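For reference, a trigram HMM tagger of this kind selects the tag sequence that maximizes the standard joint factorization; this is a textbook formulation sketched for orientation, not copied from Thede and Harper (1999):

```latex
\hat{t}_{1}^{n} \;=\; \arg\max_{t_{1}^{n}} \prod_{i=1}^{n}
  P(t_i \mid t_{i-1}, t_{i-2}) \, P(w_i \mid t_i)
```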

19 (2009), which achieved state-of-the-art tagging performance by introducing latent tags to weaken the stringent Markov independence assumptions that generally hinder tagging performance in generative models. [sent-48, score-0.583]

20 Since prior work on parsing speech with prosody has relied on generative models, it was necessary to modify equations of the model in order to incorporate the prosodic information, and then perform rescoring in order to achieve gains. [sent-52, score-1.015]

21 However, it is far simpler to directly integrate prosody as features into the model by using a discriminative approach. [sent-53, score-0.408]

22 We also evaluate state-of-the-art Maximum Entropy taggers: the Stanford Left5 tagger (Toutanova and Manning, 2000) and the Stanford bidirectional tagger (Toutanova et al. [sent-60, score-0.363]

23 In order to assess the quality of our models, we evaluate them on the section 23 test set of the standard newswire WSJ tagging task after training all models on sections 0-22. [sent-63, score-0.294]

24 Clearly, all the models have high accuracy on newswire data, but the Stanford bidirectional tagger significantly outperforms the other models with the exception of the HMM-LA-Bidir model on this task. [sent-65, score-0.425]

25 Table 2: Tagging accuracy on WSJ. 3 Experimental Setup: In the rest of this paper, we evaluate the tagging models described in Section 2 on conversational speech. [sent-74, score-0.38]

26 , 2006) because they provide gold standard tags for conversational speech and we have access to corresponding automatically generated ToBI break indexes provided by Kahn et al. (2005).2 [sent-78, score-0.599]

27 (Footnote 2) A small fraction of words in the Switchboard treebank do not align with the break indexes because they were produced based on a later refinement of the transcripts used to produce the treebank. [sent-81, score-0.416]

28 For these cases, we heuristically added break *1* to words in the middle of a sentence and *4* to words that end a sentence. [sent-82, score-0.208]
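A minimal sketch of that alignment heuristic, assuming each sentence is a token list with a parallel list of break labels in which unaligned tokens are marked None; the data layout and function name are hypothetical:

```python
def fill_missing_breaks(sentence, breaks):
    """For tokens whose break index failed to align (None), insert *1*
    mid-sentence and *4* on the sentence-final token, per the heuristic
    described above."""
    filled = []
    last = len(sentence) - 1
    for i, b in enumerate(breaks):
        if b is None:
            b = "4" if i == last else "1"
        filled.append(b)
    return filled

print(fill_missing_breaks(["i", "did", "n't"], ["1", None, None]))
# ['1', '1', '4']
```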

29 4 Integration of Prosodic Information In this work, we use three classes of automatically generated ToBI break indexes to represent prosodic information (Kahn et al. [sent-90, score-0.741]

30 Consider the following speech transcription example, which is enriched with ToBI break indexes in parentheses and tags: i (1)/PRP did (1)/VBD n't (1)/RB you (1)/PRP know (4)/VBP i (1)/PRP did (1)/AUX n't (1)/RB . [sent-92, score-0.655]
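For illustration, such an enriched transcription can be read back into (word, break, tag) triples with a small parser. The normalized token format `word (break)/TAG` assumed here is an interpretation of the example above, not a format specified by the paper:

```python
import re

TOKEN = re.compile(r"(\S+)\s*\(\s*(\S+)\s*\)\s*/(\S+)")

def parse_enriched(text):
    """Extract (word, break, tag) triples from an enriched transcription."""
    return [(w, b, t) for w, b, t in TOKEN.findall(text)]

example = "i (1)/PRP did (1)/VBD n't (1)/RB you (1)/PRP know (4)/VBP"
print(parse_enriched(example))
# [('i', '1', 'PRP'), ('did', '1', 'VBD'), ("n't", '1', 'RB'),
#  ('you', '1', 'PRP'), ('know', '4', 'VBP')]
```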

31 The automatically predicted break 4 associated with know in the utterance compellingly indicates an intonational phrase boundary and could provide useful information for tagging if we can model it appropriately. [sent-96, score-0.484]

32 To integrate prosody into our generative models, we utilize the method from (Dreyer and Shafran, 2007) to add prosodic breaks. [sent-97, score-0.821]

33 As Figure 1 shows, ToBI breaks provide a secondary sequence of observations that is parallel to the sequence of words that comprise the sentence. [sent-98, score-0.215]
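One common way to formalize such a secondary observation stream, consistent with this description, is to factor the emission so that words and breaks are conditionally independent given the (possibly latent-refined) tag. This is a sketch of the general technique, with the independence assumption made explicit; it is not claimed to be the exact parameterization used in the paper:

```latex
P(w_{1}^{n}, b_{1}^{n} \mid t_{1}^{n}) \;=\; \prod_{i=1}^{n}
  P(w_i \mid t_i) \, P(b_i \mid t_i)
```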

34 The discriminative models are able to utilize prosodic features directly, enabling the use of contextual interactions with other features to further improve tagging accuracy. [sent-100, score-0.676]

35 Specifically, in addition to the standard set of features used in the tagging literature, we use the feature templates presented in Table 3, where each feature associates the break bi, word wi, or some combination of the two with the current tag ti.4 [sent-101, score-0.423]

36 Table 3: Prosodic feature templates (break and/or word values, each conjoined with the tag value):
bi=B | ti=T
bi=B & bi−1=C | ti=T
wi=W & bi=B | ti=T
wi+1=W & bi=B | ti=T
wi+2=W & bi=B | ti=T
wi−1=W & bi=B | ti=T
wi−2=W & bi=B | ti=T
wi=W & bi=B & bi−1=C | ti=T, ti−1
5 Experiments [sent-102, score-0.288]
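A sketch of how the Table 3 templates could be instantiated in a CRF/maxent feature extractor; the string encodings of the features and the padding convention are illustrative assumptions:

```python
def prosodic_features(words, breaks, i):
    """Instantiate the Table 3 prosodic feature templates at position i.
    Each returned string is later conjoined with the current tag by the
    learner. Out-of-range positions use a <pad> symbol."""
    def w(j):  # word at offset j, padded at the edges
        return words[j] if 0 <= j < len(words) else "<pad>"
    def b(j):  # break at offset j, padded at the edges
        return breaks[j] if 0 <= j < len(breaks) else "<pad>"

    return [
        f"b0={b(i)}",                        # bi=B
        f"b0={b(i)}|b-1={b(i-1)}",           # bi=B & bi-1=C
        f"w0={w(i)}|b0={b(i)}",              # wi=W & bi=B
        f"w+1={w(i+1)}|b0={b(i)}",           # wi+1=W & bi=B
        f"w+2={w(i+2)}|b0={b(i)}",           # wi+2=W & bi=B
        f"w-1={w(i-1)}|b0={b(i)}",           # wi-1=W & bi=B
        f"w-2={w(i-2)}|b0={b(i)}",           # wi-2=W & bi=B
        f"w0={w(i)}|b0={b(i)}|b-1={b(i-1)}", # wi=W & bi=B & bi-1=C
    ]

print(prosodic_features(["you", "know", "i"], ["1", "4", "1"], 1))
```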

37 5.1 Conversation side segmentation: When working with raw speech transcripts, we initially have a long stream of unpunctuated words, which is called a conversation side. [sent-103, score-0.375]

38 As the average length of conversation side segments in our data is approximately 630 words, it poses quite a challenging tagging task. [sent-104, score-0.554]

39 Thus, we hypothesize that it is on these large segments that we should achieve the most improvement from the addition of prosodic information.4 (Footnote 4: We modified the Stanford taggers to handle these prosodic features.) [sent-105, score-0.525]

40 [Figure 2: Tagging accuracy on conversation sides. Legend: Baseline, Prosody, OracleBreak, OracleBreak+Sent, OracleSent, OracleBreak-Sent, Rescoring.] [sent-106, score-0.758]

41 Note that the Stanford bidirectional and HMM-LA tagger perform very similarly, although the HMM-LA-Bidir tagger performs significantly better than both. [sent-113, score-0.363]

42 In contrast to the newswire task on which the Stanford bidirectional tagger performed the best, on this genre, it is slightly worse than the HMM-LA tagger, albeit the difference is not statistically significant. [sent-114, score-0.296]

43 With the direct integration of prosody into the generative models (see Figure 2), there is a slight but statistically insignificant shift in performance. [sent-115, score-0.497]

44 However, integrating prosody directly into the discriminative models leads to significant improvements in the CRF and Stanford Left5 taggers. [sent-116, score-0.456]

45 The gain in the Stanford bidirectional tagger is not statistically significant, however, which suggests that the left-to-right models benefit more from the addition of prosody than bidirectional models. [sent-117, score-0.819]

46 Figure 3 reports the baseline tagging accuracy on sentence segments, and we see significant improvements across all models. [sent-120, score-0.266]

47 The HMM-LA taggers once again achieve the best performance, with the Stanford bidirectional close behind. [sent-125, score-0.2]

48 Although the addition of prosody has very little impact on either the generative or discriminative models when applied to sentences, the baseline tagging models (i. [sent-126, score-0.801]

49 , not prosodically enriched) significantly outperform all of the prosodically enriched models operating on conversation sides. [sent-128, score-0.982]

50 Table 4 presents the results of using baseline models without prosodic enrichment trained on the human-annotated sentences to tag automatically segmented speech.5 [sent-130, score-0.516]

51 As can be seen, the results are quite similar to the conversation side segmentation performances, and thus significantly lower than when tagging human-annotated sentences. [sent-131, score-0.479]

52 [Figure 3: Tagging accuracy on human-annotated segments. Legend: Baseline, Prosody, OracleBreak, Rescoring.] ... other segmentation method to shorten the segments automatically, i.e. [sent-136, score-0.276]

53 Table 4: Baseline tagging accuracy on automatically detected sentence boundaries. [sent-145, score-0.298]

54 Hence, we conduct a series of experiments in which we systematically eliminate noisy phrase and disfluency breaks and show that under these improved conditions, prosodically enriched models can indeed be more effective. [sent-148, score-0.701]

55 The results from using Oracle Breaks on conversation sides can be seen in Figure 2. [sent-150, score-0.353]

56 To further analyze why prosodically enriched models achieve more improvement on conversation sides than on sentences, we conducted three more Oracle experiments on conversation sides. [sent-153, score-1.079]

57 For the first, OracleBreak-Sent, we further modified the data such that all breaks corresponding to a sentence ending in the human-annotated segments were converted to break 1, thus effectively leaving only within-sentence phrasal boundaries. [sent-154, score-0.554]

58 For the second, OracleSent, we converted all the breaks corresponding to a sentence end in the human-annotated segmentations to break 4, and all the others to break 1, thus effectively leaving only sentence boundary breaks. [sent-156, score-0.72]

59 This performed largely on par with OracleBreak, suggesting that the phrase-aligned prosodic breaks seem to be a stand-in for sentence boundaries. [sent-157, score-0.612]

60 Finally, in the last condition, OracleBreak+Sent, we modified the OracleBreak data such that all breaks corresponding to a sentence ending in the human-annotated sentences were converted to break 4. [sent-158, score-0.423]
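Read this way, the three Oracle conditions are simple rewrites of the break sequence given gold sentence-end positions. A sketch under that reading; the list-based data layout is an assumption:

```python
def oracle_sent(breaks, sent_ends):
    """OracleSent: keep only sentence-boundary information, i.e.
    break 4 at gold sentence ends and break 1 everywhere else."""
    return ["4" if i in sent_ends else "1" for i in range(len(breaks))]

def oracle_break_minus_sent(breaks, sent_ends):
    """OracleBreak-Sent: demote gold sentence-end breaks to 1,
    leaving only within-sentence phrasal breaks."""
    return ["1" if i in sent_ends else b for i, b in enumerate(breaks)]

def oracle_break_plus_sent(breaks, sent_ends):
    """OracleBreak+Sent: force break 4 at every gold sentence end,
    keeping the remaining breaks elsewhere."""
    return ["4" if i in sent_ends else b for i, b in enumerate(breaks)]

breaks = ["1", "4", "1", "p", "4"]
ends = {1, 4}  # gold sentence-final token positions
print(oracle_sent(breaks, ends))              # ['1', '4', '1', '1', '4']
print(oracle_break_minus_sent(breaks, ends))  # ['1', '1', '1', 'p', '1']
```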

61 As Figure 2 indicates, this modification results in the best tagging accuracies for all the models. [sent-160, score-0.241]

62 This suggests that when we have breaks that align with phrasal and sentence boundaries, prosodically enriched models are highly effective. [sent-162, score-0.726]

63 In this manner, the prosodically enriched model can avoid poor tag sequences produced due to the misaligned break indexes. [sent-165, score-0.653]

64 As Figure 2 shows, using the baseline conversation side model to produce an n-best list for the prosodically enriched model to rescore results in significant improvements in performance for the HMM-LA model, similar to the parsing results of (Huang and Harper, 2010). [sent-166, score-0.708]

65 The size of the n-best list directly impacts performance, as reducing to n = 1 is akin to tagging with the baseline model, and increasing n → ∞ amounts to tagging with the prosodically enriched model. [sent-167, score-0.672]
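A sketch of this rescoring scheme: the baseline model proposes an n-best list of tag sequences and the prosodically enriched model selects among the candidates. The two model interfaces here are hypothetical stand-ins, not the paper's actual API:

```python
def rescore_nbest(words, breaks, baseline_nbest, enriched_score, n=50):
    """baseline_nbest(words, n) -> list of candidate tag sequences;
    enriched_score(words, breaks, tags) -> score under the prosodically
    enriched model. With n=1 this reduces to the baseline; as n grows
    it approaches pure enriched-model tagging."""
    candidates = baseline_nbest(words, n)
    return max(candidates, key=lambda tags: enriched_score(words, breaks, tags))

# Toy usage with stub models:
nbest = lambda ws, n: [["PRP", "VBD"], ["PRP", "VBP"]]
score = lambda ws, bs, ts: 1.0 if ts[-1] == "VBP" else 0.0
print(rescore_nbest(["i", "did"], ["1", "4"], nbest, score))  # ['PRP', 'VBP']
```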

66 5.5 Speaker turn segmentation: The results presented thus far indicate that if we have access to close-to-perfect break indexes, we can use them effectively, but this is not likely to be true in practice. [sent-171, score-0.25]

67 We have also observed that tagging accuracy on shorter conversation sides is greater than on longer conversation sides, suggesting that postprocessing the conversation sides to produce shorter segments would be desirable. [sent-172, score-1.357]

68 We thus devised a scheme by which we could automatically extract shorter speaker turn segments from conversation sides. [sent-173, score-0.541]

69 For this study, speaker turns, which effectively indicate speaker alternations, were obtained by using the metadata in the treebank to split the sentences into chunks based on speaker change. [sent-174, score-0.407]

70 Every time a speaker begins talking after the other speaker was talking, we start a new segment for that speaker. [sent-175, score-0.254]
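A sketch of that turn-chunking rule, assuming each sentence carries a speaker label from the treebank metadata; the (speaker, tokens) layout is an assumption:

```python
def speaker_turns(sentences):
    """Group consecutive same-speaker sentences into turns.
    `sentences` is a list of (speaker, tokens) pairs; a new segment
    starts whenever the speaker changes."""
    turns, current, prev = [], [], None
    for speaker, tokens in sentences:
        if speaker != prev and current:
            turns.append(current)
            current = []
        current.extend(tokens)
        prev = speaker
    if current:
        turns.append(current)
    return turns

data = [("A", ["uh", "hi"]), ("A", ["how", "are", "you"]), ("B", ["fine"])]
print(speaker_turns(data))  # [['uh', 'hi', 'how', 'are', 'you'], ['fine']]
```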

71 Figure 4 presents tagging results on speaker turn segments. [sent-177, score-0.371]

72 We believe this is due to the fact that the prosodically enriched CRF model was able to directly use the break index information, and so restricting it to the baseline CRF model search space limits the performance to that of the baseline model. [sent-180, score-0.621]

73 With the addition of break indexes, we see marginal changes in most of the models; only the CRF tagger receives a significant boost. [sent-182, score-0.288]

74 Thus, models achieve performance gains from tagging shorter segments, but at the cost of limited usefulness of the prosodic breaks. [sent-183, score-0.66]

75 Overall, speaker turn segmentation is an attractive compromise between the original conversation sides and human-annotated sentences. [sent-184, score-0.516]

76 6 Discussion: Across the different models, we have found that taggers applied to shorter segments, either sentences or speaker turns, do not tend to benefit significantly from prosodic enrichment, in contrast to conversation sides. [sent-185, score-0.771]

77 To analyze this further, we broke down the results by part of speech for the two models for which break indexes improved performance the most: the CRF and HMM-LA rescoring models, which achieved an overall error reduction of 2. [sent-186, score-0.611]

78 We present those categories that obtained the greatest benefit from prosody in Figure 5 (a) and (b). [sent-189, score-0.387]

79 Table 5 lists the prosodic features that received the highest weight in the CRF model. [sent-192, score-0.372]

80 These are quite intuitive, as they seem to represent places where the prosody indicates sentence or clausal boundaries. [sent-193, score-0.385]

81 As can be seen in Table 6, the speaker turn segments are more comparable in length to sentences. [sent-206, score-0.269]

82 Table 6: Length statistics of different data segmentations. Next, we return to the large performance degradation when tagging speech rather than newswire text to examine the major differences among the models. [sent-231, score-0.376]

83 Using two of our best performing models, the Stanford bidirectional and HMM-LA, in Figure 7 we present the categories for which performance degradation was the greatest when comparing performance of a tagger trained on WSJ to a tagger trained on spoken sentences and conversation sides. [sent-232, score-0.63]

84 Unsurprisingly, both the discriminative and generative bidirectional models achieve the most impressive results. [sent-234, score-0.338]

85 Since the prosodic breaks are noisier features than the others incorporated in the discriminative models, it may be useful to set their regularization parameter separately from the rest of the features; however, we have not explored this alternative. [sent-240, score-0.635]

86 Our experiments used human transcriptions of the conversational speech; however, realistically our models would be applied to speech recognition transcripts. [sent-241, score-0.29]

87 In such a case, word error will introduce noise in addition to the prosodic breaks. [sent-242, score-0.372]

88 In future work, we will evaluate the use of break indexes for tagging when there is lexical error. [sent-243, score-0.577]

89 We would also apply the n-best rescoring method to exploit break indexes in the HMM-LA bidirectional model, as this would likely produce further improvements. [sent-244, score-0.612]

90 7 Conclusion In this work, we have evaluated factors that are important for developing accurate tagging models for speech. [sent-245, score-0.256]

91 Given that prosodic breaks were effective knowledge sources for parsing, an important goal of this work was to evaluate their impact on various tagging model configurations. [sent-246, score-0.795]

92 Specifically, we have examined the use of prosodic information for tagging conversational speech with several different discriminative and generative models across three different speech transcript segmentations. [sent-247, score-1.099]

93 If no such annotation is available, automatic sentence boundary detection does not serve as an appropriate replacement; but if automatic speaker turn segments can be obtained, this is a good alternative, despite the fact that prosodic enrichment is less effective. [sent-250, score-0.823]

94 For tagging, the most important role of the break indexes appears to be as a stand-in for sentence boundaries. [sent-252, score-0.394]

95 The oracle break experiments suggest that if the accuracy of the automatically induced break indexes can be improved, then the prosodically enriched models will perform as well, or even better, than their human-annotated sentence counterparts. [sent-253, score-1.157]

96 Prosodic models, automatic speech understanding, and speech synthesis: toward the common ground. [sent-259, score-0.208]

97 Sentence-internal prosody does not help parsing the way punctuation does. [sent-299, score-0.36]

98 Simultaneous recognition of words and prosody in the boston university radio speech corpus. [sent-309, score-0.464]

99 Impact of automatic comma prediction on POS/name tagging of speech. [sent-319, score-0.208]

100 Improving a simple bigram hmm partof-speech tagger by latent annotation and self-training. [sent-327, score-0.201]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('prosodic', 0.372), ('prosody', 0.36), ('prosodically', 0.256), ('conversation', 0.24), ('breaks', 0.215), ('tagging', 0.208), ('indexes', 0.186), ('break', 0.183), ('enriched', 0.182), ('tobi', 0.166), ('harper', 0.16), ('bidirectional', 0.153), ('bi', 0.142), ('speaker', 0.127), ('oraclebreak', 0.12), ('crf', 0.12), ('sides', 0.113), ('segments', 0.106), ('tagger', 0.105), ('speech', 0.104), ('shafran', 0.103), ('stanford', 0.103), ('conversational', 0.091), ('rescoring', 0.09), ('generative', 0.089), ('mary', 0.073), ('wi', 0.07), ('dreyer', 0.07), ('uh', 0.064), ('enrichment', 0.064), ('boundary', 0.063), ('oracle', 0.061), ('bidir', 0.06), ('switchboard', 0.06), ('hmm', 0.053), ('huang', 0.053), ('izhak', 0.052), ('mari', 0.052), ('discriminative', 0.048), ('models', 0.048), ('taggers', 0.047), ('transcripts', 0.047), ('ostendorf', 0.047), ('transcriptions', 0.047), ('zhongqiang', 0.047), ('kahn', 0.045), ('oraclesent', 0.045), ('latent', 0.043), ('newswire', 0.038), ('matthew', 0.037), ('trigram', 0.036), ('ti', 0.036), ('turn', 0.036), ('transcript', 0.035), ('tags', 0.035), ('accuracies', 0.033), ('accuracy', 0.033), ('boundaries', 0.032), ('shorter', 0.032), ('liu', 0.032), ('fisher', 0.032), ('vbn', 0.032), ('tag', 0.032), ('segmentation', 0.031), ('detection', 0.03), ('batliner', 0.03), ('contours', 0.03), ('cutler', 0.03), ('dtvbdin', 0.03), ('elmar', 0.03), ('eval', 0.03), ('gallwitz', 0.03), ('intonational', 0.03), ('rbs', 0.03), ('rescore', 0.03), ('silverman', 0.03), ('thede', 0.03), ('yoon', 0.03), ('roark', 0.03), ('lease', 0.03), ('turns', 0.028), ('petrov', 0.027), ('wsj', 0.027), ('greatest', 0.027), ('hillard', 0.026), ('spontaneous', 0.026), ('lessons', 0.026), ('filimonov', 0.026), ('hale', 0.026), ('humanannotated', 0.026), ('krasnyanskaya', 0.026), ('metadata', 0.026), ('reductions', 0.026), ('segmentations', 0.026), ('johns', 0.026), ('toutanova', 0.026), ('sentence', 0.025), ('pos', 0.025), ('dev', 0.025), ('cues', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000011 75 emnlp-2010-Lessons Learned in Part-of-Speech Tagging of Conversational Speech

Author: Vladimir Eidelman ; Zhongqiang Huang ; Mary Harper

Abstract: This paper examines tagging models for spontaneous English speech transcripts. We analyze the performance of state-of-the-art tagging models, either generative or discriminative, left-to-right or bidirectional, with or without latent annotations, together with the use of ToBI break indexes and several methods for segmenting the speech transcripts (i.e., conversation side, speaker turn, or human-annotated sentence). Based on these studies, we observe that: (1) bidirectional models tend to achieve better accuracy levels than left-to-right models, (2) generative models seem to perform somewhat better than discriminative models on this task, and (3) prosody improves tagging performance of models on conversation sides, but has much less impact on smaller segments. We conclude that, although the use of break indexes can indeed significantly improve performance over baseline models without them on conversation sides, tagging accuracy improves more by using smaller segments, for which the impact of the break indexes is marginal.

2 0.19478771 25 emnlp-2010-Better Punctuation Prediction with Dynamic Conditional Random Fields

Author: Wei Lu ; Hwee Tou Ng

Abstract: This paper focuses on the task of inserting punctuation symbols into transcribed conversational speech texts, without relying on prosodic cues. We investigate limitations associated with previous methods, and propose a novel approach based on dynamic conditional random fields. Different from previous work, our proposed approach is designed to jointly perform both sentence boundary and sentence type prediction, and punctuation prediction on speech utterances. We performed evaluations on a transcribed conversational speech domain consisting of both English and Chinese texts. Empirical results show that our method outperforms an approach based on linear-chain conditional random fields and other previous approaches.

3 0.18224306 107 emnlp-2010-Towards Conversation Entailment: An Empirical Investigation

Author: Chen Zhang ; Joyce Chai

Abstract: While a significant amount of research has been devoted to textual entailment, automated entailment from conversational scripts has received less attention. To address this limitation, this paper investigates the problem of conversation entailment: automated inference of hypotheses from conversation scripts. We examine two levels of semantic representations: a basic representation based on syntactic parsing from conversation utterances and an augmented representation taking into consideration of conversation structures. For each of these levels, we further explore two ways of capturing long distance relations between language constituents: implicit modeling based on the length of distance and explicit modeling based on actual patterns of relations. Our empirical findings have shown that the augmented representation with conversation structures is important, which achieves the best performance when combined with explicit modeling of long distance relations.

4 0.12124556 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks

Author: Xian Qian ; Qi Zhang ; Yaqian Zhou ; Xuanjing Huang ; Lide Wu

Abstract: Many sequence labeling tasks in NLP require solving a cascade of segmentation and tagging subtasks, such as Chinese POS tagging, named entity recognition, and so on. Traditional pipeline approaches usually suffer from error propagation. Joint training/decoding in the cross-product state space could cause too many parameters and high inference complexity. In this paper, we present a novel method which integrates graph structures of two subtasks into one using virtual nodes, and performs joint training and decoding in the factorized state space. Experimental evaluations on CoNLL 2000 shallow parsing data set and Fourth SIGHAN Bakeoff CTB POS tagging data set demonstrate the superiority of our method over cross-product, pipeline and candidate reranking approaches.

5 0.10724336 84 emnlp-2010-NLP on Spoken Documents Without ASR

Author: Mark Dredze ; Aren Jansen ; Glen Coppersmith ; Ken Church

Abstract: There is considerable interest in interdisciplinary combinations of automatic speech recognition (ASR), machine learning, natural language processing, text classification and information retrieval. Many of these boxes, especially ASR, are often based on considerable linguistic resources. We would like to be able to process spoken documents with few (if any) resources. Moreover, connecting black boxes in series tends to multiply errors, especially when the key terms are out-of-vocabulary (OOV). The proposed alternative applies text processing directly to the speech without a dependency on ASR. The method finds long (∼1 sec) repetitions in speech, and clusters them into pseudo-terms (roughly phrases). Document clustering and classification work surprisingly well on pseudo-terms; performance on a Switchboard task approaches a baseline using gold standard manual transcriptions.

6 0.1027797 97 emnlp-2010-Simple Type-Level Unsupervised POS Tagging

7 0.10051903 41 emnlp-2010-Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models

8 0.082618125 96 emnlp-2010-Self-Training with Products of Latent Variable Grammars

9 0.079280637 34 emnlp-2010-Crouching Dirichlet, Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation

10 0.072871193 48 emnlp-2010-Exploiting Conversation Structure in Unsupervised Topic Segmentation for Emails

11 0.068042651 71 emnlp-2010-Latent-Descriptor Clustering for Unsupervised POS Induction

12 0.067566179 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions

13 0.066629253 114 emnlp-2010-Unsupervised Parse Selection for HPSG

14 0.066121049 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing

15 0.064834498 43 emnlp-2010-Enhancing Domain Portability of Chinese Segmentation Model Using Chi-Square Statistics and Bootstrapping

16 0.063600332 26 emnlp-2010-Classifying Dialogue Acts in One-on-One Live Chats

17 0.062587261 115 emnlp-2010-Uptraining for Accurate Deterministic Question Parsing

18 0.061240412 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding

19 0.059178423 64 emnlp-2010-Incorporating Content Structure into Text Analysis Applications

20 0.058478545 88 emnlp-2010-On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.193), (1, 0.113), (2, 0.088), (3, -0.102), (4, -0.228), (5, 0.001), (6, 0.016), (7, -0.081), (8, -0.056), (9, 0.049), (10, -0.22), (11, -0.295), (12, 0.007), (13, 0.197), (14, -0.028), (15, -0.089), (16, -0.025), (17, 0.04), (18, 0.0), (19, 0.131), (20, -0.091), (21, 0.03), (22, 0.068), (23, -0.031), (24, 0.084), (25, 0.001), (26, -0.206), (27, 0.07), (28, 0.071), (29, -0.091), (30, 0.119), (31, 0.025), (32, -0.145), (33, -0.063), (34, -0.075), (35, -0.174), (36, 0.062), (37, 0.174), (38, 0.03), (39, -0.051), (40, -0.044), (41, 0.055), (42, -0.014), (43, 0.046), (44, -0.105), (45, 0.037), (46, -0.005), (47, 0.041), (48, -0.03), (49, -0.001)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95613825 75 emnlp-2010-Lessons Learned in Part-of-Speech Tagging of Conversational Speech

Author: Vladimir Eidelman ; Zhongqiang Huang ; Mary Harper

Abstract: This paper examines tagging models for spontaneous English speech transcripts. We analyze the performance of state-of-the-art tagging models, either generative or discriminative, left-to-right or bidirectional, with or without latent annotations, together with the use of ToBI break indexes and several methods for segmenting the speech transcripts (i.e., conversation side, speaker turn, or human-annotated sentence). Based on these studies, we observe that: (1) bidirectional models tend to achieve better accuracy levels than left-to-right models, (2) generative models seem to perform somewhat better than discriminative models on this task, and (3) prosody improves tagging performance of models on conversation sides, but has much less impact on smaller segments. We conclude that, although the use of break indexes can indeed significantly improve performance over baseline models without them on conversation sides, tagging accuracy improves more by using smaller segments, for which the impact of the break indexes is marginal.

2 0.60814887 107 emnlp-2010-Towards Conversation Entailment: An Empirical Investigation

Author: Chen Zhang ; Joyce Chai

Abstract: While a significant amount of research has been devoted to textual entailment, automated entailment from conversational scripts has received less attention. To address this limitation, this paper investigates the problem of conversation entailment: automated inference of hypotheses from conversation scripts. We examine two levels of semantic representations: a basic representation based on syntactic parsing from conversation utterances and an augmented representation taking into consideration of conversation structures. For each of these levels, we further explore two ways of capturing long distance relations between language constituents: implicit modeling based on the length of distance and explicit modeling based on actual patterns of relations. Our empirical findings have shown that the augmented representation with conversation structures is important, which achieves the best performance when combined with explicit modeling of long distance relations.

3 0.58069634 25 emnlp-2010-Better Punctuation Prediction with Dynamic Conditional Random Fields

Author: Wei Lu ; Hwee Tou Ng

Abstract: This paper focuses on the task of inserting punctuation symbols into transcribed conversational speech texts, without relying on prosodic cues. We investigate limitations associated with previous methods, and propose a novel approach based on dynamic conditional random fields. Different from previous work, our proposed approach is designed to jointly perform both sentence boundary and sentence type prediction, and punctuation prediction on speech utterances. We performed evaluations on a transcribed conversational speech domain consisting of both English and Chinese texts. Empirical results show that our method outperforms an approach based on linear-chain conditional random fields and other previous approaches.

4 0.40762323 84 emnlp-2010-NLP on Spoken Documents Without ASR

Author: Mark Dredze ; Aren Jansen ; Glen Coppersmith ; Ken Church

Abstract: There is considerable interest in interdisciplinary combinations of automatic speech recognition (ASR), machine learning, natural language processing, text classification and information retrieval. Many of these boxes, especially ASR, are often based on considerable linguistic resources. We would like to be able to process spoken documents with few (if any) resources. Moreover, connecting black boxes in series tends to multiply errors, especially when the key terms are out-of-vocabulary (OOV). The proposed alternative applies text processing directly to the speech without a dependency on ASR. The method finds long (∼1 sec) repetitions in speech, and clusters them into pseudo-terms (roughly phrases). Document clustering and classification work surprisingly well on pseudo-terms; performance on a Switchboard task approaches a baseline using gold standard manual transcriptions.

5 0.30996132 97 emnlp-2010-Simple Type-Level Unsupervised POS Tagging

Author: Yoong Keok Lee ; Aria Haghighi ; Regina Barzilay

Abstract: Part-of-speech (POS) tag distributions are known to exhibit sparsity: a word is likely to take a single predominant tag in a corpus. Recent research has demonstrated that incorporating this sparsity constraint improves tagging accuracy. However, in existing systems, this expansion comes with a steep increase in model complexity. This paper proposes a simple and effective tagging method that directly models tag sparsity and other distributional properties of valid POS tag assignments. In addition, this formulation results in a dramatic reduction in the number of model parameters, thereby enabling unusually rapid training. Our experiments consistently demonstrate that this model architecture yields substantial performance gains over more complex tagging counterparts. On several languages, we report performance exceeding that of more complex state-of-the-art systems.1

6 0.30115262 48 emnlp-2010-Exploiting Conversation Structure in Unsupervised Topic Segmentation for Emails

7 0.29346985 41 emnlp-2010-Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models

8 0.28813055 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks

9 0.28401449 96 emnlp-2010-Self-Training with Products of Latent Variable Grammars

10 0.27192134 34 emnlp-2010-Crouching Dirichlet, Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation

11 0.26515514 71 emnlp-2010-Latent-Descriptor Clustering for Unsupervised POS Induction

12 0.26052994 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing

13 0.24852858 26 emnlp-2010-Classifying Dialogue Acts in One-on-One Live Chats

14 0.23465836 114 emnlp-2010-Unsupervised Parse Selection for HPSG

15 0.23432393 93 emnlp-2010-Resolving Event Noun Phrases to Their Verbal Mentions

16 0.22565199 43 emnlp-2010-Enhancing Domain Portability of Chinese Segmentation Model Using Chi-Square Statistics and Bootstrapping

17 0.20588088 115 emnlp-2010-Uptraining for Accurate Deterministic Question Parsing

18 0.19765659 10 emnlp-2010-A Probabilistic Morphological Analyzer for Syriac

19 0.1974746 46 emnlp-2010-Evaluating the Impact of Alternative Dependency Graph Encodings on Solving Event Extraction Tasks

20 0.19696125 4 emnlp-2010-A Game-Theoretic Approach to Generating Spatial Descriptions


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.016), (11, 0.011), (12, 0.023), (29, 0.161), (30, 0.02), (32, 0.018), (43, 0.317), (52, 0.021), (56, 0.082), (62, 0.013), (66, 0.093), (72, 0.049), (76, 0.027), (79, 0.018), (87, 0.019), (89, 0.016), (99, 0.011)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.78328246 75 emnlp-2010-Lessons Learned in Part-of-Speech Tagging of Conversational Speech

Author: Vladimir Eidelman ; Zhongqiang Huang ; Mary Harper

Abstract: This paper examines tagging models for spontaneous English speech transcripts. We analyze the performance of state-of-the-art tagging models, either generative or discriminative, left-to-right or bidirectional, with or without latent annotations, together with the use of ToBI break indexes and several methods for segmenting the speech transcripts (i.e., conversation side, speaker turn, or human-annotated sentence). Based on these studies, we observe that: (1) bidirectional models tend to achieve better accuracy levels than left-to-right models, (2) generative models seem to perform somewhat better than discriminative models on this task, and (3) prosody improves tagging performance of models on conversation sides, but has much less impact on smaller segments. We conclude that, although the use of break indexes can indeed significantly improve performance over baseline models without them on conversation sides, tagging accuracy improves more by using smaller segments, for which the impact of the break indexes is marginal.

2 0.71411854 44 emnlp-2010-Enhancing Mention Detection Using Projection via Aligned Corpora

Author: Yassine Benajiba ; Imed Zitouni

Abstract: The research question treated in this paper is centered on the idea of exploiting rich resources of one language to enhance the performance of a mention detection system of another one. We successfully achieve this goal by projecting information from one language to another via a parallel corpus. We examine the potential improvement using various degrees of linguistic information in a statistical framework and we show that the proposed technique is effective even when the target language model has access to a significantly rich feature set. Experimental results show up to 2.4F improvement in performance when the system has access to information obtained by projecting mentions from a resource-rich-language mention detection system via a parallel corpus.

3 0.52886724 89 emnlp-2010-PEM: A Paraphrase Evaluation Metric Exploiting Parallel Texts

Author: Chang Liu ; Daniel Dahlmeier ; Hwee Tou Ng

Abstract: We present PEM, the first fully automatic metric to evaluate the quality of paraphrases, and consequently, that of paraphrase generation systems. Our metric is based on three criteria: adequacy, fluency, and lexical dissimilarity. The key component in our metric is a robust and shallow semantic similarity measure based on pivot language N-grams that allows us to approximate adequacy independently of lexical similarity. Human evaluation shows that PEM achieves high correlation with human judgments.

4 0.52853191 34 emnlp-2010-Crouching Dirichlet, Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation

Author: Taesun Moon ; Katrin Erk ; Jason Baldridge

Abstract: We define the crouching Dirichlet, hidden Markov model (CDHMM), an HMM for part-of-speech tagging which draws state prior distributions for each local document context. This simple modification of the HMM takes advantage of the dichotomy in natural language between content and function words. In contrast, a standard HMM draws all prior distributions once over all states and it is known to perform poorly in unsupervised and semi-supervised POS tagging. This modification significantly improves unsupervised POS tagging performance across several measures on five data sets for four languages. We also show that simply using different hyperparameter values for content and function word states in a standard HMM (which we call HMM+) is surprisingly effective.

5 0.52489752 57 emnlp-2010-Hierarchical Phrase-Based Translation Grammars Extracted from Alignment Posterior Probabilities

Author: Adria de Gispert ; Juan Pino ; William Byrne

Abstract: We report on investigations into hierarchical phrase-based translation grammars based on rules extracted from posterior distributions over alignments of the parallel text. Rather than restrict rule extraction to a single alignment, such as Viterbi, we instead extract rules based on posterior distributions provided by the HMM word-to-word alignment model. We define translation grammars progressively by adding classes of rules to a basic phrase-based system. We assess these grammars in terms of their expressive power, measured by their ability to align the parallel text from which their rules are extracted, and the quality of the translations they yield. In Chinese-to-English translation, we find that rule extraction from posteriors gives translation improvements. We also find that grammars with rules with only one nonterminal, when extracted from posteriors, can outperform more complex grammars extracted from Viterbi alignments. Finally, we show that the best way to exploit source-to-target and target-to-source alignment models is to build two separate systems and combine their output translation lattices.

6 0.52466017 77 emnlp-2010-Measuring Distributional Similarity in Context

7 0.52205098 78 emnlp-2010-Minimum Error Rate Training by Sampling the Translation Lattice

8 0.52186453 7 emnlp-2010-A Mixture Model with Sharing for Lexical Semantics

9 0.52056926 87 emnlp-2010-Nouns are Vectors, Adjectives are Matrices: Representing Adjective-Noun Constructions in Semantic Space

10 0.51890242 97 emnlp-2010-Simple Type-Level Unsupervised POS Tagging

11 0.5169493 96 emnlp-2010-Self-Training with Products of Latent Variable Grammars

12 0.51694459 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar

13 0.5168339 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions

14 0.51529449 99 emnlp-2010-Statistical Machine Translation with a Factorized Grammar

15 0.51430374 65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification

16 0.51310229 82 emnlp-2010-Multi-Document Summarization Using A* Search and Discriminative Learning

17 0.51231962 52 emnlp-2010-Further Meta-Evaluation of Broad-Coverage Surface Realization

18 0.51030278 6 emnlp-2010-A Latent Variable Model for Geographic Lexical Variation

19 0.50958657 103 emnlp-2010-Tense Sense Disambiguation: A New Syntactic Polysemy Task

20 0.50906527 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding