emnlp emnlp2010 emnlp2010-107 knowledge-graph by maker-knowledge-mining

107 emnlp-2010-Towards Conversation Entailment: An Empirical Investigation


Source: pdf

Author: Chen Zhang ; Joyce Chai

Abstract: While a significant amount of research has been devoted to textual entailment, automated entailment from conversational scripts has received less attention. To address this limitation, this paper investigates the problem of conversation entailment: automated inference of hypotheses from conversation scripts. We examine two levels of semantic representations: a basic representation based on syntactic parsing from conversation utterances and an augmented representation taking into consideration of conversation structures. For each of these levels, we further explore two ways of capturing long distance relations between language constituents: implicit modeling based on the length of distance and explicit modeling based on actual patterns of relations. Our empirical findings have shown that the augmented representation with conversation structures is important, which achieves the best performance when combined with explicit modeling of long distance relations.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Chai, Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA. Abstract: While a significant amount of research has been devoted to textual entailment, automated entailment from conversational scripts has received less attention. [sent-2, score-0.579]

2 To address this limitation, this paper investigates the problem of conversation entailment: automated inference of hypotheses from conversation scripts. [sent-3, score-1.472]

3 We examine two levels of semantic representations: a basic representation based on syntactic parsing from conversation utterances and an augmented representation taking into consideration of conversation structures. [sent-4, score-1.807]

4 For each of these levels, we further explore two ways of capturing long distance relations between language constituents: implicit modeling based on the length of distance and explicit modeling based on actual patterns of relations. [sent-5, score-0.762]

5 Our empirical findings have shown that the augmented representation with conversation structures is important, which achieves the best performance when combined with explicit modeling of long distance relations. [sent-6, score-1.302]

6 Given a segment from a textual document, the task of textual entailment is to automatically determine whether a given hypothesis can be entailed from the segment. [sent-13, score-0.948]

7 Textual entailment has mainly focused on inference from written text in monologue. [sent-17, score-0.392]

8 Recent years have also seen an increasing amount of conversational data, such as conversation scripts of meetings, call center records, court proceedings, and online chats. [sent-18, score-0.756]

9 Although conversation is a form of language, it is different from monologue text with several unique characteristics. [sent-19, score-0.669]

10 The key distinctive features include turn-taking between participants, grounding between participants, different linguistic phenomena of utterances, and conversation implicatures. [sent-20, score-0.669]

11 Traditional approaches dealing with textual entailment were not designed to handle these unique conversation behaviors and thus to support automated entailment from conversation scripts. [sent-21, score-2.166]

12 To address this limitation, our previous work (Zhang and Chai, 2009) has initiated an investigation on the problem of conversation entailment. [sent-22, score-0.741]

13 The problem was formulated as follows: given a conversation discourse D and a hypothesis H concerning its participant, the goal was to identify whether D entails H. [sent-23, score-0.846]

14 conversation segment while the second hypothesis cannot. [sent-26, score-0.905]

15 It is not clear whether the previous results are generally applicable, how different components in the entailment framework interact with each other, and how different representations may influence the entailment outcome. [sent-28, score-0.747]

16 To reach a better understanding of conversation entailment, we conducted a further investigation based on the larger set of test data collected in our previous work (Zhang and Chai, 2009). [sent-29, score-0.716]

17 We specifically examined two levels of representations: a basic representation based on syntactic parsing from conversation utterances and an augmented representation taking into consideration of conversation structures. [sent-30, score-1.777]

18 For each of these levels, we further explored two ways of capturing long distance relations: (1) implicit modeling based on the length of distance and (2) explicit modeling based on actual patterns of relations. [sent-31, score-0.713]

19 Our empirical findings have shown that augmented representation with conversation structures is important in conversation entailment. [sent-32, score-1.586]

20 Combining conversation structures with explicit modeling of long distance relations results in the best performance. [sent-33, score-1.143]

21 2 Related Work Our work here is related to recent advances in textual entailment, automated processing of conversation scripts, and our initial investigation on conversation entailment. [sent-34, score-1.541]

22 There is a large body of work on textual entailment initiated by the Pascal Recognizing Textual Entailment (RTE) Challenges (Dagan et al. [sent-35, score-0.491]

23 These results indicate that, while progress has been made, textual entailment remains a challenging problem. [sent-51, score-0.466]

24 As more and more conversation data becomes available, researchers have investigated automated processing of conversation data to acquire useful information, for example, related to opinions (Somasundaran et al. [sent-52, score-1.364]

25 Recent studies have also developed approaches to summarize conversations (Murray and Carenini, 2008) and to model conversation structures (dialogue acts) from online Twitter conversations (Ritter et al. [sent-58, score-0.709]

26 Here we address a different angle regarding conversation scripts, namely conversation entailment. [sent-60, score-1.338]

27 In our previous work (Zhang and Chai, 2009), we started an initial investigation on conversation entailment. [sent-61, score-0.716]

28 Each instance consists of a conversation segment and a hypothesis (as described in Section 1). [sent-63, score-0.905]

29 The hypotheses are statements about conversation participants and are further categorized into four types: about their profile information, their beliefs and opinions, their desires, and their communicative intentions. [sent-64, score-0.818]

30 (2006) developed for textual entailment: an alignment stage followed by an entailment stage. [sent-68, score-0.591]

31 Building upon our previous work, in this paper, we systematically examine different representations of the conversation segment and different modeling of long distance relations between language constituents. [sent-69, score-1.23]

32 We compare the roles of these different representations on the performance of entailment prediction using a larger testing dataset that was not previously evaluated. [sent-70, score-0.387]

33 3 Overall Framework In our previous work (Zhang and Chai, 2009), conversation entailment is formulated as the following: given a conversation segment D which is represented by a set of clauses D = d1 ∧ . [sent-72, score-1.92]

34 determined by the product of probabilities that each hypothesis clause hj is entailed from all the conversation segment clauses d1 . [sent-84, score-1.377]

35 This is based on a simple assumption that whether a clause is entailed from a conversation segment is conditionally independent from other clauses. [sent-88, score-1.069]
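
The conditional-independence assumption above can be illustrated with a minimal sketch. Only the product over per-clause probabilities reflects the stated formulation; the function names and the 0.5 decision threshold are assumptions made for illustration and do not come from the paper.

```python
from math import prod


def entailment_probability(clause_probs):
    """Combine per-clause probabilities P(D entails h_j) into P(D entails H).

    Assumes each hypothesis clause is entailed independently of the others,
    so the overall probability is the product of the per-clause values.
    """
    return prod(clause_probs)


def entails(clause_probs, threshold=0.5):
    # The threshold is a hypothetical decision rule, not taken from the paper.
    return entailment_probability(clause_probs) >= threshold


# Three hypothesis clauses with illustrative (made-up) probabilities.
example = [0.9, 0.8, 0.5]
print(entailment_probability(example), entails(example))
```

Because the probabilities are multiplied, a single weakly supported clause is enough to pull the whole hypothesis below the threshold.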

36 (2006), and predict the entailment decision in two stages of processing: (1) an alignment model aligns terms in the hypothesis to terms in the conversation segment; and (2) an inference model predicts the entailment based on the alignment between the hypothesis and the conversation segment. [sent-126, score-2.543]

37 1 Alignment Model An alignment is defined as a mapping function g between a term x in the conversation segment and a term y in the hypothesis. [sent-128, score-1.011]

38 To predict these alignments, the problem is formulated as binary classification: given any two terms x from the conversation and y from the hypothesis, decide the value of their alignment function g(x, y). [sent-131, score-0.873]

39 2 Inference Model Once an alignment between a hypothesis and a conversation segment is established, an inference model is applied to predict whether the conversation segment entails the hypothesis given such alignment. [sent-133, score-1.447]

40 More specifically, as shown in Equation 1, given a clause from the hypothesis hj, a set of clauses from the conversation segment d1, . [sent-134, score-1.084]

41 However, our focus here is on the new question of how different levels of semantic representation and different approaches to modeling long distance relationships affect the alignment and inference models as well as the overall entailment performance. [sent-148, score-1.055]

42 4 Semantic Representation Given the clause representation described earlier, an important question is what information from the conversation segment should be captured and represented. [sent-149, score-1.041]

43 The first level is a basic representation, which only captures the information from all the utterances in the conversation segment. [sent-151, score-0.869]

44 1 Basic Representation The first representation is based on the syntactic parsing from conversation utterances and we call it a basic representation. [sent-161, score-0.869]

45 2 Augmented Representation The second representation is built upon the basic representation and incorporates conversation structure across turns and utterances. [sent-173, score-0.887]

46 Figure 2(a) shows the augmented structures of the conversation segment and Figure 2(b) shows the corresponding clause representation. [sent-175, score-1.113]

47 The second edge connects an utterance to its succeeding utterance to indicate the temporal progression of the conversation (e. [sent-192, score-0.826]

48 The incorporation of speakers and dialogue acts in our augmented representations provides additional semantics of conversation discourse. [sent-207, score-0.962]

49 5 Modeling LDR A critical part in predicting entailment is to recognize the semantic relationship between two language constituents, especially when these two constituents are not directly related. [sent-208, score-0.5]

50 1 Implicit Modeling of LDR The first method characterizes the relationship simply by the distance between two constituents in the dependency structure (or augmented structure). [sent-212, score-0.392]

51 We call this method an implicit modeling of long distance relationship. [sent-214, score-0.392]
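
A minimal sketch of the implicit modeling idea, assuming the dependency (or augmented) structure is available as a list of undirected edges between term identifiers. The function name and the toy edges are hypothetical; the paper itself only states that the relationship is characterized by the distance between two constituents in the structure.

```python
from collections import deque


def graph_distance(edges, source, target):
    """Return the number of edges on the shortest path between two terms.

    `edges` is an iterable of (term_a, term_b) pairs treated as undirected.
    Returns None when the two terms are not connected.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    if source == target:
        return 0

    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


# Toy structure (hypothetical): you <- see -> movie -> the
toy_edges = [("see", "you"), ("see", "movie"), ("movie", "the")]
print(graph_distance(toy_edges, "you", "the"))
```

Breadth-first search is used here because every edge counts as one step; a weighted scheme would need a different search.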

52 2 Explicit Modeling of LDR The second way of modeling long distance relationship is called explicit modeling. [sent-218, score-0.474]

53 The difference here is our paths are extracted from dependency parses as opposed to traditional constituent parses, and our paths also incorporate the representation of conversation structures (e. [sent-235, score-0.878]

54 A major difference between noun alignment and verb alignment is that, for verb alignment the consistency of their arguments is also important. [sent-251, score-0.541]

55 Based on two different ways of modeling long distance relationship (as described in Section 5), we explored two methods for modeling argument consistency (AC) in verb alignment models. [sent-254, score-0.752]

56 1 Implicit Modeling of AC The first approach models argument consistency based on implicit modeling of the relationship between a verb and its aligned subject/object. [sent-257, score-0.463]

57 Specifically, given a pair of verb terms (x, y) where x is from the conversation segment and y is from the hypothesis, let sy be the subject of y and sx be the aligned entity of sy in the conversation (in case of multiple alignments, sx is the one closest to x). [sent-258, score-1.858]

58 In Example 2, to decide whether the conversation term see (x11 in Figure 1(a), 1(b), and 2) and the hypothesis term watch (x4 in Figure 1(c), 1(d)) should be aligned, we first identify the subject of x4 in the hypothesis, which is x2 (A). [sent-261, score-0.894]

59 We then look for x2’s alignments in the conversation segment, among which x9 (You) is the closest to x11 (see). [sent-262, score-0.705]

60 Using the implicit modeling of argument consistency, we follow the same approach as in our previous work (Zhang and Chai, 2009) and trained a logistic regression model to predict verb alignment based on the features in Table 1. [sent-264, score-0.422]
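
As a rough illustration of alignment as binary classification with a logistic regression model, the sketch below scores one (x, y) verb pair. The feature names and weights are placeholders invented for this example; they are not the features of Table 1, and no training procedure is shown.

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def alignment_probability(features, weights, bias=0.0):
    """Probability that a conversation verb x and a hypothesis verb y align.

    `features` and `weights` map feature names to numbers; features with no
    learned weight contribute nothing.
    """
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return sigmoid(z)


def g(features, weights, bias=0.0, threshold=0.5):
    """Binary alignment decision g(x, y); the threshold is an assumption."""
    return 1 if alignment_probability(features, weights, bias) >= threshold else 0


# Hypothetical weights: similar verbs help, distant subjects hurt.
weights = {"verb_similarity": 2.0, "subject_distance": -0.5}
print(g({"verb_similarity": 1.0, "subject_distance": 1.0}, weights))
print(g({"verb_similarity": 0.0, "subject_distance": 6.0}, weights))
```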

61 2 Explicit Modeling of AC The second approach captures argument consistency based on explicit modeling of the relationship between a verb and its aligned subject (or object). [sent-267, score-0.501]

62 Given a pair of verb terms (x, y), let sy be the subject of y and sx be the aligned entity of sy in the conversation closest to x; we use the string describing the path from x to sx as the feature to capture subject consistency. [sent-268, score-1.117]
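
The path-string feature can be sketched as follows, assuming the structure is given as labeled (head, label, dependent) triples. The string encoding (">label" when moving from head to dependent, "<label" when moving the other way) is an assumption made for this example; the paper's actual path notation is not reproduced here.

```python
from collections import deque


def path_string(edges, source, target):
    """Describe the shortest path from `source` to `target` as a string.

    `edges` holds (head, label, dependent) triples. Each step is written as
    ">label" (head to dependent) or "<label" (dependent to head).
    Returns None when no path exists and "" when source equals target.
    """
    adjacency = {}
    for head, label, dep in edges:
        adjacency.setdefault(head, []).append((dep, ">" + label))
        adjacency.setdefault(dep, []).append((head, "<" + label))

    seen = {source}
    queue = deque([(source, [])])
    while queue:
        node, steps = queue.popleft()
        if node == target:
            return " ".join(steps)
        for neighbour, step in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, steps + [step]))
    return None


# Toy parse (hypothetical labels); real data would use term ids, not words.
toy_edges = [("think", "subj", "I"), ("think", "obj", "see"), ("see", "subj", "you")]
print(path_string(toy_edges, "see", "you"))
print(path_string(toy_edges, "I", "you"))
```

Unlike a plain distance, the string keeps the edge labels and directions, so two paths of equal length can still be told apart by a classifier.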

63 Figure 3 shows an example of alignment between the conversation terms and hypothesis terms in Example 2. [sent-277, score-0.923]

64 Note that in this figure the alignment between x5 = suggests from the hypothesis and u4 = opinion from the conversation segment is a pseudo alignment, which directly maps a verb term in the hypothesis to an utterance term represented by its dialogue act. [sent-278, score-1.344]

65 2 Applications in Inference Model As mentioned earlier, once an alignment is established, the inference model is to predict whether each clause in the hypothesis is entailed from the conversation segment. [sent-281, score-1.187]

66 Two separate models were used to handle the inference of property clauses (hj (x)) and the inference of relational clauses (hj (x, y)). [Figure 3: The alignment result for Example 2] [sent-282, score-0.417]

67 Here we focus on relational inference model and examine how different modeling of long distance relationship may affect relation inference. [sent-284, score-0.527]

68 For a relation h between x and y to be entailed from a conversation segment, we need to find a same or similar relation in the conversation segment between x’s and y’s counterparts (i. [sent-285, score-1.669]

69 , aligned entities of x and y in the conversation segment). [sent-287, score-0.713]

70 More specifically, given a relational clause from the hypothesis, hj (x, y), we find the sets of terms X0 = {x0 | x0 ∈ D, g(x0, x) = 1} and Y0 = {y0 | y0 ∈ D, g(y0, y) = 1}, which are aligned with x and y, respectively. [sent-288, score-0.438]

71 Consequently, whether the target relational clause hj (x, y) is entailed is determined by the relationship between x∗ and y∗. [sent-294, score-0.572]

72 We predict that hj (x, y) is entailed if the distance between x∗ and y∗ is smaller than λL. [sent-299, score-0.425]
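
A minimal sketch of this thresholding rule, under the assumption that x∗ and y∗ are the closest pair drawn from the aligned sets X0 and Y0 (the selection rule is not spelled out in the sentences above). The function name, the distance table, and the threshold values are all hypothetical.

```python
from itertools import product


def implicit_relation_entailed(aligned_x, aligned_y, distance, lambda_l):
    """Predict whether a relational clause h_j(x, y) is entailed.

    `aligned_x` and `aligned_y` are the conversation terms aligned with x and
    y. `distance(a, b)` returns a number, or None if a and b are unconnected.
    The clause is predicted as entailed when the closest aligned pair is
    strictly nearer than the threshold `lambda_l`.
    """
    pair_distances = []
    for x0, y0 in product(aligned_x, aligned_y):
        d = distance(x0, y0)
        if d is not None:
            pair_distances.append(d)
    if not pair_distances:
        return False
    return min(pair_distances) < lambda_l


# Hypothetical distance table between conversation terms.
table = {("x9", "x11"): 1, ("x3", "x11"): 4}


def lookup(a, b):
    return table.get((a, b))


print(implicit_relation_entailed({"x9", "x3"}, {"x11"}, lookup, lambda_l=2))
```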

73 4 Explicit modeling of relation inference In order to capture more semantics from the relation between two terms, we use explicit modeling of the relationship between terms x∗ and y∗. [sent-302, score-0.589]

74 Given this characterization, the prediction of whether hj (x, y) is entailed from the conversation segment is formulated as a binary classification problem, using a k-nearest neighbor classification model with the following features: 1. [sent-304, score-1.176]

75 , the path from x∗ to y∗ in the dependency structure of the conversation segment; 2. [sent-307, score-0.742]

76 For each pair of terms (x, y), where x is from a conversation segment and y is from a hypothesis, we measure whether the model correctly predicts that the two terms should or should not be aligned. [sent-327, score-0.908]

77 Figures 4(a) and 4(b) show the comparison (F-measure) of two alignment models for verb alignment, based on the basic representation and the augmented representation, respectively. [Figure 5: Evaluation of inference models based on different representations] [sent-329, score-0.537]
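
For readers unfamiliar with the metric, the pairwise alignment evaluation can be sketched as a standard F-measure over term pairs. The balanced (F1) form and the dictionary-based data layout are assumptions; the source only states that an F-measure over pairwise alignment decisions is reported.

```python
def alignment_f_measure(gold, predicted):
    """Balanced F-measure over pairwise alignment decisions.

    `gold` maps each (x, y) term pair to True/False (should align or not).
    `predicted` maps pairs to the model's decision; missing pairs count as
    "not aligned".
    """
    tp = fp = fn = 0
    for pair, should_align in gold.items():
        says_align = predicted.get(pair, False)
        if should_align and says_align:
            tp += 1
        elif says_align:
            fp += 1
        elif should_align:
            fn += 1

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy decisions (made up): one hit, one false alarm, one miss.
gold = {("see", "watch"): True, ("think", "suggest"): True,
        ("see", "suggest"): False, ("think", "watch"): False}
pred = {("see", "watch"): True, ("think", "suggest"): False,
        ("see", "suggest"): True, ("think", "watch"): False}
print(alignment_f_measure(gold, pred))
```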

78 This suggests that the explicit modeling of semantic relationship between verbs and arguments works better than the implicit modeling used in previous work. [sent-332, score-0.55]

79 We evaluated two inference models, one with implicit modeling of long distance relationship and one with explicit modeling. [sent-341, score-0.622]

80 Overall, the augmented representation outperforms the basic representation for both implicit modeling and explicit modeling of long distance relationship (McNemar’s tests, p < 0. [sent-344, score-1.031]

81 The explicit model performs better than implicit model only based on augmented representation (McNemar’s test, p < 0. [sent-346, score-0.385]
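
The significance comparisons rely on McNemar's test for paired predictions. The sketch below uses the exact binomial (two-sided) form, which is an assumption: the source does not say which variant was applied. The inputs are per-instance correctness flags for two models evaluated on the same test set, and the example data are made up.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar p-value for two paired classifiers.

    `correct_a[i]` and `correct_b[i]` say whether model A and model B got
    test instance i right. Only discordant instances matter.
    """
    b = sum(1 for a, o in zip(correct_a, correct_b) if a and not o)
    c = sum(1 for a, o in zip(correct_a, correct_b) if not a and o)
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree
    k = min(b, c)
    # Probability of a split at least this uneven under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Made-up outcomes: A alone is right 8 times, B alone is right 2 times.
model_a = [True] * 8 + [False] * 2
model_b = [False] * 8 + [True] * 2
print(mcnemar_exact(model_a, model_b))
```

Instances that both models get right, or both get wrong, do not affect the p-value, which is why accuracy differences alone are not enough to judge significance.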

82 The augmented representation affects the intent type of hypothesis most significantly, as does the explicit modeling of long distance relationship. [sent-355, score-0.716]

83 3 Interaction between Clause Representations and LDR Modeling It was shown in previous sections that the augmented representation helps entailment prediction compared to the basic representation. [sent-357, score-0.586]

84 Here we want to study how they interact with other entailment components and what is their effect in the enhanced modeling of long distance relations. [sent-358, score-0.636]

85 Specifically, we test the performance of implicit and explicit modeling of long distance relations under two different representation settings: the basic representation and the augmented representation. [sent-359, score-0.864]

86 Table 2 compares the performance (accuracy) of entailment models with different relationship modeling. [sent-360, score-0.425]

87 We can see that the explicit model makes improvement over the implicit model for augmented representation (McNemar’s test, p < 0. [sent-361, score-0.385]

88 For fact, most of the benefit of incorporating explicit modeling of long distance relationship appears at the alignment stage, but not much at the inference stage. [sent-365, score-0.655]

89 However, this situation is different for intent, where the benefit of explicitly modeling long distance relationship mostly happened at the inference stage. [sent-366, score-0.445]

90 8 Discussion and Conclusion This paper presents an empirical investigation on conversation entailment. [sent-368, score-0.716]

91 We specifically examine two levels of representation of conversation segments and two different ways of modeling long distance relations between language constituents. [sent-369, score-1.137]

92 Our findings indicate that, although traditional architecture and approaches for textual entailment remain important, additional representation and processing that address conversation structures are critical. [sent-370, score-1.263]

93 The augmented representation with conversation structures, together with explicit modeling of semantic relations between language constituents, results in the best performance (58. [sent-371, score-1.168]

94 The work here only represents an initial step towards conversation entailment. [sent-373, score-0.669]

95 Besides the same challenges faced by textual entailment, it is further complicated by conversation implicature. [sent-376, score-0.799]

96 Finally, as the technology in conversation entailment is developed, its applications in NLP problems should be explored. [sent-386, score-1.005]

97 Example applications include information extraction, question answering, summarization from conversation scripts, and modeling of conversation participants. [sent-387, score-1.465]

98 These applications may provide new insights on the nature of the conversation entailment problem and its potential solutions. [sent-388, score-1.005]

99 An inference model for semantic entailment in natural language. [sent-409, score-0.422]

100 What do we know about conversation participants: Experiments on conversation entailment. [sent-496, score-1.338]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('conversation', 0.669), ('entailment', 0.336), ('hj', 0.201), ('segment', 0.161), ('chai', 0.135), ('textual', 0.13), ('modeling', 0.127), ('alignment', 0.125), ('clause', 0.123), ('augmented', 0.12), ('giampiccolo', 0.111), ('distance', 0.109), ('entailed', 0.092), ('implicit', 0.092), ('relationship', 0.089), ('representation', 0.088), ('ldr', 0.086), ('pascal', 0.086), ('explicit', 0.085), ('hypothesis', 0.075), ('participants', 0.072), ('utterances', 0.07), ('dagan', 0.067), ('utterance', 0.065), ('long', 0.064), ('dialogue', 0.063), ('danilo', 0.057), ('somasundaran', 0.057), ('inference', 0.056), ('clauses', 0.056), ('consistency', 0.056), ('recognising', 0.055), ('verb', 0.055), ('hypotheses', 0.052), ('scripts', 0.051), ('representations', 0.051), ('relations', 0.049), ('bentivogli', 0.049), ('entails', 0.049), ('sy', 0.049), ('obj', 0.049), ('bernardo', 0.049), ('intent', 0.048), ('ido', 0.048), ('investigation', 0.047), ('constituents', 0.045), ('subject', 0.045), ('sx', 0.045), ('vertices', 0.045), ('aligned', 0.044), ('path', 0.044), ('zhang', 0.043), ('relational', 0.043), ('magnini', 0.043), ('basic', 0.042), ('maccartney', 0.041), ('structures', 0.04), ('bill', 0.039), ('relation', 0.039), ('dm', 0.036), ('conversational', 0.036), ('alignments', 0.036), ('acts', 0.035), ('actions', 0.034), ('hn', 0.033), ('swapna', 0.033), ('levels', 0.031), ('visit', 0.031), ('mcnemar', 0.031), ('semantic', 0.03), ('recognizing', 0.03), ('dependency', 0.029), ('formulated', 0.029), ('biographic', 0.029), ('braz', 0.029), ('brazil', 0.029), ('garera', 0.029), ('hoa', 0.029), ('raina', 0.029), ('salvo', 0.029), ('trang', 0.029), ('versation', 0.029), ('term', 0.028), ('terms', 0.027), ('edge', 0.027), ('paths', 0.026), ('object', 0.026), ('automated', 0.026), ('property', 0.025), ('communicative', 0.025), ('minister', 0.025), ('joyce', 0.025), ('zanzotto', 0.025), ('dang', 0.025), ('watch', 0.025), ('initiated', 0.025), ('janyce', 0.024), ('speakers', 0.024), 
('whether', 0.024), ('predict', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999905 107 emnlp-2010-Towards Conversation Entailment: An Empirical Investigation

2 0.18224306 75 emnlp-2010-Lessons Learned in Part-of-Speech Tagging of Conversational Speech

Author: Vladimir Eidelman ; Zhongqiang Huang ; Mary Harper

Abstract: This paper examines tagging models for spontaneous English speech transcripts. We analyze the performance of state-of-the-art tagging models, either generative or discriminative, left-to-right or bidirectional, with or without latent annotations, together with the use of ToBI break indexes and several methods for segmenting the speech transcripts (i.e., conversation side, speaker turn, or human-annotated sentence). Based on these studies, we observe that: (1) bidirectional models tend to achieve better accuracy levels than left-to-right models, (2) generative models seem to perform somewhat better than discriminative models on this task, and (3) prosody improves tagging performance of models on conversation sides, but has much less impact on smaller segments. We conclude that, although the use of break indexes can indeed significantly improve performance over baseline models without them on conversation sides, tagging accuracy improves more by using smaller segments, for which the impact of the break indexes is marginal.

3 0.15037492 48 emnlp-2010-Exploiting Conversation Structure in Unsupervised Topic Segmentation for Emails

Author: Shafiq Joty ; Giuseppe Carenini ; Gabriel Murray ; Raymond T. Ng

Abstract: This work concerns automatic topic segmentation of email conversations. We present a corpus of email threads manually annotated with topics, and evaluate annotator reliability. To our knowledge, this is the first such email corpus. We show how the existing topic segmentation models (i.e., Lexical Chain Segmenter (LCSeg) and Latent Dirichlet Allocation (LDA)) which are solely based on lexical information, can be applied to emails. By pointing out where these methods fail and what any desired model should consider, we propose two novel extensions of the models that not only use lexical information but also exploit finer level conversation structure in a principled way. Empirical evaluation shows that LCSeg is a better model than LDA for segmenting an email thread into topical clusters and incorporating conversation structure into these models improves the performance significantly.

4 0.1127759 84 emnlp-2010-NLP on Spoken Documents Without ASR

Author: Mark Dredze ; Aren Jansen ; Glen Coppersmith ; Ken Church

Abstract: There is considerable interest in interdisciplinary combinations of automatic speech recognition (ASR), machine learning, natural language processing, text classification and information retrieval. Many of these boxes, especially ASR, are often based on considerable linguistic resources. We would like to be able to process spoken documents with few (if any) resources. Moreover, connecting black boxes in series tends to multiply errors, especially when the key terms are out-of-vocabulary (OOV). The proposed alternative applies text processing directly to the speech without a dependency on ASR. The method finds long (∼ 1 sec) repetitions in speech, and clusters them into pseudo-terms (roughly phrases). Document clustering and classification work surprisingly well on pseudo-terms; performance on a Switchboard task approaches a baseline using gold standard manual transcriptions.

5 0.10293803 26 emnlp-2010-Classifying Dialogue Acts in One-on-One Live Chats

Author: Su Nam Kim ; Lawrence Cavedon ; Timothy Baldwin

Abstract: We explore the task of automatically classifying dialogue acts in 1-on-1 online chat forums, an increasingly popular means of providing customer service. In particular, we investigate the effectiveness of various features and machine learners for this task. While a simple bag-of-words approach provides a solid baseline, we find that adding information from dialogue structure and inter-utterance dependency provides some increase in performance; learners that account for sequential dependencies (CRFs) show the best performance. We report our results from testing using a corpus of chat dialogues derived from online shopping customer-feedback data.

6 0.08222767 36 emnlp-2010-Discriminative Word Alignment with a Function Word Reordering Model

7 0.080090001 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment

8 0.066617772 72 emnlp-2010-Learning First-Order Horn Clauses from Web Text

9 0.063097432 57 emnlp-2010-Hierarchical Phrase-Based Translation Grammars Extracted from Alignment Posterior Probabilities

10 0.062020201 95 emnlp-2010-SRL-Based Verb Selection for ESL

11 0.061845984 31 emnlp-2010-Constraints Based Taxonomic Relation Classification

12 0.05921302 53 emnlp-2010-Fusing Eye Gaze with Speech Recognition Hypotheses to Resolve Exophoric References in Situated Dialogue

13 0.058395974 29 emnlp-2010-Combining Unsupervised and Supervised Alignments for MT: An Empirical Study

14 0.056722142 20 emnlp-2010-Automatic Detection and Classification of Social Events

15 0.054107554 4 emnlp-2010-A Game-Theoretic Approach to Generating Spatial Descriptions

16 0.048820172 46 emnlp-2010-Evaluating the Impact of Alternative Dependency Graph Encodings on Solving Event Extraction Tasks

17 0.04860767 25 emnlp-2010-Better Punctuation Prediction with Dynamic Conditional Random Fields

18 0.047367986 64 emnlp-2010-Incorporating Content Structure into Text Analysis Applications

19 0.047035865 68 emnlp-2010-Joint Inference for Bilingual Semantic Role Labeling

20 0.046585925 11 emnlp-2010-A Semi-Supervised Approach to Improve Classification of Infrequent Discourse Relations Using Feature Vector Extension


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.165), (1, 0.073), (2, 0.005), (3, 0.05), (4, 0.013), (5, -0.133), (6, -0.003), (7, -0.144), (8, 0.025), (9, 0.081), (10, -0.343), (11, -0.302), (12, -0.034), (13, 0.137), (14, 0.036), (15, -0.102), (16, -0.105), (17, 0.014), (18, -0.054), (19, 0.036), (20, 0.005), (21, 0.039), (22, -0.025), (23, 0.038), (24, 0.088), (25, 0.081), (26, -0.154), (27, 0.087), (28, 0.151), (29, 0.057), (30, 0.108), (31, 0.096), (32, -0.136), (33, -0.109), (34, 0.02), (35, -0.01), (36, 0.078), (37, 0.12), (38, 0.23), (39, 0.119), (40, 0.024), (41, -0.073), (42, -0.02), (43, -0.039), (44, 0.041), (45, 0.009), (46, -0.091), (47, -0.04), (48, -0.026), (49, 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97000068 107 emnlp-2010-Towards Conversation Entailment: An Empirical Investigation

2 0.58264023 75 emnlp-2010-Lessons Learned in Part-of-Speech Tagging of Conversational Speech

Author: Vladimir Eidelman ; Zhongqiang Huang ; Mary Harper

Abstract: This paper examines tagging models for spontaneous English speech transcripts. We analyze the performance of state-of-the-art tagging models, either generative or discriminative, left-to-right or bidirectional, with or without latent annotations, together with the use of ToBI break indexes and several methods for segmenting the speech transcripts (i.e., conversation side, speaker turn, or human-annotated sentence). Based on these studies, we observe that: (1) bidirectional models tend to achieve better accuracy levels than left-to-right models, (2) generative models seem to perform somewhat better than discriminative models on this task, and (3) prosody improves tagging performance of models on conversation sides, but has much less impact on smaller segments. We conclude that, although the use of break indexes can indeed significantly improve performance over baseline models without them on conversation sides, tagging accuracy improves more by using smaller segments, for which the impact of the break indexes is marginal.

3 0.53193992 48 emnlp-2010-Exploiting Conversation Structure in Unsupervised Topic Segmentation for Emails

Author: Shafiq Joty ; Giuseppe Carenini ; Gabriel Murray ; Raymond T. Ng

Abstract: This work concerns automatic topic segmentation of email conversations. We present a corpus of email threads manually annotated with topics, and evaluate annotator reliability. To our knowledge, this is the first such email corpus. We show how the existing topic segmentation models (i.e., Lexical Chain Segmenter (LCSeg) and Latent Dirichlet Allocation (LDA)) which are solely based on lexical information, can be applied to emails. By pointing out where these methods fail and what any desired model should consider, we propose two novel extensions of the models that not only use lexical information but also exploit finer level conversation structure in a principled way. Empirical evaluation shows that LCSeg is a better model than LDA for segmenting an email thread into topical clusters and incorporating conversation structure into these models improves the performance significantly.

4 0.39755583 84 emnlp-2010-NLP on Spoken Documents Without ASR

Author: Mark Dredze ; Aren Jansen ; Glen Coppersmith ; Ken Church

Abstract: There is considerable interest in interdisciplinary combinations of automatic speech recognition (ASR), machine learning, natural language processing, text classification and information retrieval. Many of these boxes, especially ASR, are often based on considerable linguistic resources. We would like to be able to process spoken documents with few (if any) resources. Moreover, connecting black boxes in series tends to multiply errors, especially when the key terms are out-of-vocabulary (OOV). The proposed alternative applies text processing directly to the speech without a dependency on ASR. The method finds long (∼ 1 sec) repetitions in speech, and clusters them into pseudo-terms (roughly phrases). Document clustering and classification work surprisingly well on pseudo-terms; performance on a Switchboard task approaches a baseline using gold standard manual transcriptions.

5 0.36119741 26 emnlp-2010-Classifying Dialogue Acts in One-on-One Live Chats

Author: Su Nam Kim ; Lawrence Cavedon ; Timothy Baldwin

Abstract: We explore the task of automatically classifying dialogue acts in 1-on-1 online chat forums, an increasingly popular means of providing customer service. In particular, we investigate the effectiveness of various features and machine learners for this task. While a simple bag-of-words approach provides a solid baseline, we find that adding information from dialogue structure and inter-utterance dependency provides some increase in performance; learners that account for sequential dependencies (CRFs) show the best performance. We report our results from testing using a corpus of chat dialogues derived from online shopping customer-feedback data.

6 0.25351417 120 emnlp-2010-What's with the Attitude? Identifying Sentences with Attitude in Online Discussions

7 0.24327649 72 emnlp-2010-Learning First-Order Horn Clauses from Web Text

8 0.23141783 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment

9 0.22178304 29 emnlp-2010-Combining Unsupervised and Supervised Alignments for MT: An Empirical Study

10 0.21934892 4 emnlp-2010-A Game-Theoretic Approach to Generating Spatial Descriptions

11 0.21848544 31 emnlp-2010-Constraints Based Taxonomic Relation Classification

12 0.2129561 36 emnlp-2010-Discriminative Word Alignment with a Function Word Reordering Model

13 0.20227104 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing

14 0.1908423 21 emnlp-2010-Automatic Discovery of Manner Relations and its Applications

15 0.1832394 28 emnlp-2010-Collective Cross-Document Relation Extraction Without Labelled Data

16 0.18015401 68 emnlp-2010-Joint Inference for Bilingual Semantic Role Labeling

17 0.17286499 103 emnlp-2010-Tense Sense Disambiguation: A New Syntactic Polysemy Task

18 0.17083625 95 emnlp-2010-SRL-Based Verb Selection for ESL

19 0.16944548 24 emnlp-2010-Automatically Producing Plot Unit Representations for Narrative Text

20 0.16421628 92 emnlp-2010-Predicting the Semantic Compositionality of Prefix Verbs


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.019), (12, 0.058), (29, 0.069), (30, 0.02), (52, 0.031), (56, 0.13), (62, 0.036), (65, 0.266), (66, 0.076), (72, 0.045), (76, 0.062), (77, 0.012), (79, 0.028), (87, 0.011), (89, 0.029)]
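
The listing does not say how simValue is derived from these topic weights. As a minimal sketch, assuming a plain cosine similarity over the sparse (topicId, topicWeight) pairs, two papers could be compared as below. The `cosine` helper and the `other_paper` vector are hypothetical and used only for illustration. The same-paper entry in the list scores below 1, so the measure actually used on this page evidently differs from this sketch.

```python
import math

# Sparse LDA topic vector for paper 107, copied from the listing above.
paper_107 = [(10, 0.019), (12, 0.058), (29, 0.069), (30, 0.02), (52, 0.031),
             (56, 0.13), (62, 0.036), (65, 0.266), (66, 0.076), (72, 0.045),
             (76, 0.062), (77, 0.012), (79, 0.028), (87, 0.011), (89, 0.029)]

# Hypothetical topic vector for some other paper (not taken from the page).
other_paper = [(56, 0.2), (65, 0.1), (99, 0.3)]


def cosine(a, b):
    """Cosine similarity of two sparse (topicId, weight) lists.

    Topics missing from a vector are treated as weight 0.
    Returns 0.0 if either vector is empty or all-zero.
    """
    da, db = dict(a), dict(b)
    dot = sum(w * db.get(t, 0.0) for t, w in da.items())
    norm_a = math.sqrt(sum(w * w for w in da.values()))
    norm_b = math.sqrt(sum(w * w for w in db.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine(paper_107, other_paper))
```

Only topics present in both vectors contribute to the dot product, so papers that share few dominant topics score near zero under this assumption.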

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.76328957 107 emnlp-2010-Towards Conversation Entailment: An Empirical Investigation

Author: Chen Zhang ; Joyce Chai

Abstract: While a significant amount of research has been devoted to textual entailment, automated entailment from conversational scripts has received less attention. To address this limitation, this paper investigates the problem of conversation entailment: automated inference of hypotheses from conversation scripts. We examine two levels of semantic representations: a basic representation based on syntactic parsing from conversation utterances and an augmented representation taking into consideration of conversation structures. For each of these levels, we further explore two ways of capturing long distance relations between language constituents: implicit modeling based on the length of distance and explicit modeling based on actual patterns of relations. Our empirical findings have shown that the augmented representation with conversation structures is important, which achieves the best performance when combined with explicit modeling of long distance relations.

2 0.53475976 102 emnlp-2010-Summarizing Contrastive Viewpoints in Opinionated Text

Author: Michael Paul ; ChengXiang Zhai ; Roxana Girju

Abstract: This paper presents a two-stage approach to summarizing multiple contrastive viewpoints in opinionated text. In the first stage, we use an unsupervised probabilistic approach to model and extract multiple viewpoints in text. We experiment with a variety of lexical and syntactic features, yielding significant performance gains over bag-of-words feature sets. In the second stage, we introduce Comparative LexRank, a novel random walk formulation to score sentences and pairs of sentences from opposite viewpoints based on both their representativeness of the collection as well as their contrastiveness with each other. Experimental results show that the proposed approach can generate informative summaries of viewpoints in opinionated text.

3 0.52770251 1 emnlp-2010-"Poetic" Statistical Machine Translation: Rhyme and Meter

Author: Dmitriy Genzel ; Jakob Uszkoreit ; Franz Och

Abstract: As a prerequisite to translation of poetry, we implement the ability to produce translations with meter and rhyme for phrase-based MT, examine whether the hypothesis space of such a system is flexible enough to accommodate such constraints, and investigate the impact of such constraints on translation quality.

4 0.52112931 64 emnlp-2010-Incorporating Content Structure into Text Analysis Applications

Author: Christina Sauper ; Aria Haghighi ; Regina Barzilay

Abstract: In this paper, we investigate how modeling content structure can benefit text analysis applications such as extractive summarization and sentiment analysis. This follows the linguistic intuition that rich contextual information should be useful in these tasks. We present a framework which combines a supervised text analysis application with the induction of latent content structure. Both of these elements are learned jointly using the EM algorithm. The induced content structure is learned from a large unannotated corpus and biased by the underlying text analysis task. We demonstrate that exploiting content structure yields significant improvements over approaches that rely only on local context.

5 0.51917076 83 emnlp-2010-Multi-Level Structured Models for Document-Level Sentiment Classification

Author: Ainur Yessenalina ; Yisong Yue ; Claire Cardie

Abstract: In this paper, we investigate structured models for document-level sentiment classification. When predicting the sentiment of a subjective document (e.g., as positive or negative), it is well known that not all sentences are equally discriminative or informative. But identifying the useful sentences automatically is itself a difficult learning problem. This paper proposes a joint two-level approach for document-level sentiment classification that simultaneously extracts useful (i.e., subjective) sentences and predicts document-level sentiment based on the extracted sentences. Unlike previous joint learning methods for the task, our approach (1) does not rely on gold standard sentence-level subjectivity annotations (which may be expensive to obtain), and (2) optimizes directly for document-level performance. Empirical evaluations on movie reviews and U.S. Congressional floor debates show improved performance over previous approaches.

6 0.51617032 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar

7 0.51116335 82 emnlp-2010-Multi-Document Summarization Using A* Search and Discriminative Learning

8 0.50253308 120 emnlp-2010-What's with the Attitude? Identifying Sentences with Attitude in Online Discussions

9 0.49278897 65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification

10 0.49253848 58 emnlp-2010-Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation

11 0.49175045 49 emnlp-2010-Extracting Opinion Targets in a Single and Cross-Domain Setting with Conditional Random Fields

12 0.48787358 48 emnlp-2010-Exploiting Conversation Structure in Unsupervised Topic Segmentation for Emails

13 0.48707706 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks

14 0.48625216 26 emnlp-2010-Classifying Dialogue Acts in One-on-One Live Chats

15 0.48549923 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding

16 0.48416644 6 emnlp-2010-A Latent Variable Model for Geographic Lexical Variation

17 0.48233032 35 emnlp-2010-Discriminative Sample Selection for Statistical Machine Translation

18 0.48159352 80 emnlp-2010-Modeling Organization in Student Essays

19 0.47963345 25 emnlp-2010-Better Punctuation Prediction with Dynamic Conditional Random Fields

20 0.47657502 24 emnlp-2010-Automatically Producing Plot Unit Representations for Narrative Text