emnlp emnlp2011 emnlp2011-27 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ashequl Qadir ; Ellen Riloff
Abstract: This research studies the text genre of message board forums, which contain a mixture of expository sentences that present factual information and conversational sentences that include communicative acts between the writer and readers. Our goal is to create sentence classifiers that can identify whether a sentence contains a speech act, and can recognize sentences containing four different speech act classes: Commissives, Directives, Expressives, and Representatives. We conduct experiments using a wide variety of features, including lexical and syntactic features, speech act word lists from external resources, and domain-specific semantic class features. We evaluate our results on a collection of message board posts in the domain of veterinary medicine.
Reference: text
sentIndex sentText sentNum sentScore
1 This research studies the text genre of message board forums, which contain a mixture of expository sentences that present factual information and conversational sentences that include communicative acts between the writer and readers. [sent-3, score-1.214]
2 Our goal is to create sentence classifiers that can identify whether a sentence contains a speech act, and can recognize sentences containing four different speech act classes: Commissives, Directives, Expressives, and Representatives. [sent-4, score-1.161]
3 We conduct experiments using a wide variety of features, including lexical and syntactic features, speech act word lists from external resources, and domain-specific semantic class features. [sent-5, score-0.793]
4 We evaluate our results on a collection of message board posts in the domain of veterinary medicine. [sent-6, score-0.794]
5 From a natural language processing perspective, message board posts are an interesting hybrid text genre because they consist of both expository text and conversational text. [sent-20, score-0.774]
6 Most message board posts contain both expository sentences as well as speech acts. [sent-24, score-1.015]
7 The person posting a message (the “writer”) often engages in speech acts with the readers. [sent-25, score-0.856]
8 Our research goals are twofold: (1) to distinguish between expository sentences and speech act sentences in message board posts, and (2) to classify [sent-31, score-1.393]
9 speech act sentences into four types: Commissives, Directives, Expressives, and Representatives, following Searle’s original taxonomy (Searle, 1976). (Page footer: Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 748–758, Edinburgh, Scotland, UK, July 27–31, 2011.) [sent-33, score-0.825]
10 Information extraction systems could benefit from filtering speech act sentences (e. [sent-35, score-0.827]
11 In this paper, we present sentence classifiers that can identify speech act sentences and classify them as Commissive, Directive, Expressive, and Representative. [sent-40, score-0.866]
12 First, we explain how each speech act class is manifested in message board posts, which can be different from how they occur in spoken dialogue. [sent-41, score-1.207]
13 Second, we train classifiers to identify speech act sentences using a variety of lexical, syntactic, and semantic features. [sent-42, score-0.881]
14 Finally, we evaluate our system on a collection of message board posts in the domain of veterinary medicine. [sent-43, score-0.794]
15 2 Related Work: There has been relatively little work on applying speech act theory to written text genres, and most of the previous work has focused on email classification. [sent-44, score-0.852]
16 Carvalho and Cohen (2006) later employed N-gram sequence features to determine which N-grams are meaningfully related to different email speech acts with a goal towards improving their earlier email classification based on the writer’s intention. [sent-51, score-0.824]
17 (2006) performed speech act classification in email messages following a verbal response modes (VRM) speech act taxonomy. [sent-53, score-1.605]
18 They also provided a comparison of VRM taxonomy with Searle’s taxonomy (Searle, 1976) of speech act classes. [sent-54, score-0.753]
19 Mildinhall and Noyes (2008) presented a stochastic speech act model based on verbal response modes (VRM) to classify email intentions. [sent-56, score-0.877]
20 Some research has considered speech act classes in other means of online conversations. [sent-57, score-0.782]
21 (2004) employed speech act profiling by plotting potential dialogue categories in a radar graph to classify conversations in instant messages and chat rooms. [sent-60, score-0.856]
22 Ravi and Kim (2007) employed speech act profiling in online threaded discussions to determine message roles and to identify threads with questions, answers, and unanswered questions. [sent-63, score-1.02]
23 They designed their own speech act categories based on their analysis of student interactions in discussion threads. [sent-64, score-0.753]
24 (2009) on semi-supervised speech act recognition in both emails and forums. [sent-66, score-0.788]
25 However, they trained their classifier on spoken telephone (SWBD-DAMSL corpus) and meeting (MRDA corpus) conversations and mapped the labelled dialog act classes of these corpora to 12 dialog act classes that they found suitable for email and forum text genres. [sent-68, score-1.246]
26 These dialog act classes (addressed as speech acts by them) are somewhat different from Searle’s original speech act classes. [sent-69, score-1.893]
27 Our goal was to try and use Searle’s original speech act definitions and categories as the basis for our work to the greatest extent possible, allowing for some interpretation as warranted by the WWW message board text genre. [sent-73, score-1.207]
28 For the purposes of defining and evaluating our work, we created detailed annotation guidelines for four of Searle’s speech act classes that commonly occur in message board posts: Commissives, Directives, Expressives, and Representatives. [sent-74, score-1.296]
29 We omitted the fifth of Searle’s original speech act classes, Declarations, because we virtually never saw declarative speech acts in our data set. [sent-75, score-1.379]
30 The data set used in our study is a collection of message board posts in the domain of veterinary medicine. [sent-76, score-0.794]
31 We designed our definitions and guidelines to reflect language use in the text genre of message board posts, trying to be as domain-independent as possible so that these definitions should also apply to message board texts representing other topics. [sent-77, score-0.938]
32 However, we give examples from the veterinary domain to illustrate how these speech act classes are manifested in our data set. [sent-78, score-0.967]
33 Commissives: A Commissive speech act occurs when the speaker commits to a future course of action. [sent-79, score-0.815]
34 However, statements indicating that an action will not occur because of circumstances beyond the writer’s control were considered to be factual statements and not speech acts (e. [sent-89, score-0.702]
35 Directives: A Directive speech act occurs when … (Footnote 1: Searle defines Declarative speech acts as statements that bring about a change in status or condition to an object by virtue of the statement itself.) [sent-93, score-1.379]
36 Directive speech acts are common in message board posts, especially in the initial post of each thread when the writer explicitly requests help or advice regarding a specific topic. [sent-97, score-1.219]
37 Furthermore, many Directive speech acts are not stated as a question but as a request for assistance. [sent-103, score-0.69]
38 Expressives: An Expressive speech act occurs in conversation when a speaker expresses his or her psychological state to the listener. [sent-107, score-0.792]
39 Expressive speech acts are common in message boards because writers often greet readers at the beginning of a post ( “Hi everyone! [sent-109, score-0.992]
40 Representatives: According to Searle, a Representative speech act commits the speaker to the truth of an expressed proposition. [sent-113, score-0.815]
41 In the veterinary domain, we considered sentences to be a Representative speech act when a doctor explicitly confirmed a diagnosis or expressed their suspicion or hypothesis about the presence (or absence) of a disease or symptom. [sent-116, score-1.13]
42 A sentence was only labelled as a Representative speech act if the writer explicitly expressed his belief. [sent-128, score-0.878]
43 2 Features for Speech Act Classification: To create speech act classifiers, we designed a variety of lexical, syntactic, and semantic features. [sent-130, score-0.793]
44 We tried to capture linguistic properties associated with speech act expressions as well as discourse properties associated with individual sentences and the message board post as a whole. [sent-131, score-1.292]
45 We also incorporated speech act word lists that were acquired from external resources, and used two types of semantic features to represent semantic entities associated with the veterinary domain. [sent-132, score-1.018]
46 Except for the semantic features, all of our features are domain-independent so should be able to recognize speech act sentences across different domains. [sent-133, score-0.869]
47 We experimented with domain-specific semantic features to test our hypothesis that Commissive speech acts can be associated with domain-specific semantic entities. [sent-134, score-0.706]
48 2 Speech Act Word Clues: We collected speech act word lists (mostly verbs) from two external sources. [sent-176, score-0.753]
49 We also collected a list of speech act verbs published in (Wierzbicka, 1987). [sent-179, score-0.778]
50 The details for these speech act clue lists are given below. [sent-180, score-0.781]
51 Searle Keywords: We created one feature for each speech act class. [sent-182, score-0.789]
52 Wierzbicka Verbs: We created one feature that included 228 speech act verbs listed in the book “English speech act verbs: a semantic dictionary” (Wierzbicka, 1987). [sent-190, score-1.607]
53 We ran Basilisk over our collection of 15,383 veterinary message board posts to create a semantic lexicon for veterinary medicine. [sent-202, score-1.053]
54 The DISEASE/SYMPTOM lexicon appeared to be of good quality, but it did not improve the performance of our speech act classifiers. [sent-206, score-0.787]
55 (Footnote 3) Representative speech acts are typically associated with disease diagnoses (Footnote 2) openlibrary. [sent-208, score-0.688]
56 The taggers were trained on 4,629 veterinary message board posts using 10 seed words for each semantic category (see (Huang and Riloff, 2010) for details). [sent-220, score-0.889]
57 Our speech act classifiers used the tags associated … confidence value ≥ … were used. [sent-223, score-0.753]
58 Because a sentence can include multiple speech acts, we created a set of binary classifiers, one for each of the four speech act classes. [sent-229, score-1.081]
59 All four classifiers were applied to each sentence, so a sentence could be assigned multiple speech act classes. [sent-230, score-0.817]
60 Among other things, VIN hosts message board forums where veterinarians and other veterinary professionals can discuss issues and pose questions to each other. [sent-235, score-0.733]
61 Since the goal of our work was to study speech acts in sentences, and not the conversational dialogue between different writers, we used only the initial post of each thread. [sent-244, score-0.708]
62 In the next section, we explain how we manually annotated each sentence in our data set to create gold standard speech act labels. [sent-247, score-0.753]
63 Identifying speech acts is not always obvious, even to people, so we gave them detailed annotation guidelines describing the four speech act classes discussed in Section 3. [sent-250, score-1.432]
64 Each annotator was told to assign one or more speech act classes to each sentence (COM, DIR, EXP, REP), or to label the sentence as having no speech acts (NONE). [sent-253, score-1.408]
65 The vast majority of sentences had either no speech acts or at most one speech act, but a small number of sentences contained multiple types of speech acts. [sent-254, score-1.258]
66 In the first scheme, we discarded the small number of sentences that had multiple speech act labels and computed kappa on the rest. [sent-258, score-0.839]
67 However, over 70% of the sentences in our data set have no speech act at all, so NONE was by far the most common label. [sent-261, score-0.801]
68 Consequently, this agreement score does not necessarily reflect how consistently the judges agreed on the four speech act classes. [sent-262, score-0.848]
69 Footnote 4: Of the 594 sentences in these 50 posts, only 22 sentences contained multiple speech act classes. [sent-263, score-0.849]
70 In the second scheme, we computed kappa for each speech act category independently. [sent-264, score-0.82]
71 Table 2 shows the distribution of speech act labels in our data set. [sent-273, score-0.753]
72 Directive and Expressive speech acts are by far the most common, with nearly 26% of all sentences containing one of these speech acts. [sent-277, score-0.942]
73 1 Speech Act Filtering: For our first experiment, we created a speech act filtering classifier to distinguish sentences that contain one or more speech acts from sentences that do not contain any speech acts. [sent-282, score-1.835]
74 having one or more speech acts were positive instances, and sentences labelled as NONE were negative instances. [sent-288, score-0.697]
75 For speech act filtering, we used the minimal lexsyn features plus the speech act clues and semantic features. [sent-298, score-1.663]
76 Table 3: Precision, Recall, F-measure for speech act filtering. [sent-302, score-0.753]
77 Table 3 shows the performance for speech act filtering with respect to Precision (P), Recall (R), and F-measure score (F). [sent-303, score-0.779]
78 The classifier performed well, recognizing 83% of the speech act sentences with 86% precision, and 95% of the expository (no … Footnote 6: This is the same feature set used to produce the results for row E of Table 4. [sent-304, score-0.961]
79 Table 4: Precision, Recall, F-measure for four speech act classes. [sent-306, score-0.777]
80 2 Speech Act Categorization / BASELINES: Our next set of experiments focused on labelling sentences with the four specific speech act classes: Commissive, Directive, Expressive, and Representative. [sent-311, score-0.849]
81 To assess the difficulty of identifying each speech act category, we created several simple baselines using our intuitions about each category. [sent-313, score-0.789]
82 For Commissives, we created a heuristic to capture the most obvious cases of future tense (because Commissive speech acts represent a writer’s commitment toward a future course of action). [sent-314, score-0.71]
83 Directive speech acts are often questions, so we created a baseline system that labels all sentences containing a question mark as a Directive. [sent-320, score-0.738]
84 3, we created one classifier for each speech act category, and all four classifiers were applied to each sentence. [sent-348, score-0.883]
85 So a sentence could receive anywhere from 0-4 speech act labels indicating how many different types of speech acts appeared in the sentence. [sent-349, score-1.379]
86 Row C shows the results of adding the speech act clue words (see Section 3. [sent-366, score-0.781]
87 The speech act clue words produced an additional recall gain of 3% for Expressives and 2% for Representatives, although performance on Commissives dropped 2% in both recall and precision. [sent-369, score-0.831]
88 The Commissive speech act class benefitted the most from the rich feature set. [sent-386, score-0.753]
89 However, there is still ample room for improvement, illustrating that speech act classification is a challenging problem. [sent-390, score-0.753]
90 5 Conclusions: Our goal was to identify speech act sentences in message board posts and to classify the sentences with respect to four categories in Searle’s (1976) speech act taxonomy. [sent-401, score-2.26]
91 We achieved good results for speech act filtering and the identification of Directive and Expressive speech act sentences. [sent-402, score-1.532]
92 We found that Representative and Commissive speech acts are much more difficult to identify, although the performance of our Commissive classifier substantially improved with the addition of lexical, syntactic, and semantic features. [sent-403, score-0.696]
93 Except for the semantic class information, our feature set is domain-independent and could be used to recognize speech act sentences in message boards for any domain. [sent-404, score-1.144]
94 Ultimately, we would like to identify the speech act expressions themselves because some sentences contain speech acts as well as factual information. [sent-407, score-1.466]
95 Extracting the speech act expressions and clauses from message boards and similar text genres could provide better tracking of questions and answers in web forums and be used for summarization. [sent-408, score-1.129]
96 Using speech acts to categorize email and identify email genres. [sent-450, score-0.824]
97 Toward a stochastic speech act model of email behavior. [sent-483, score-0.852]
98 The construction of away messages: A speech act analysis. [sent-488, score-0.753]
99 Profiling student interactions in threaded discussions with speech act classifiers. [sent-493, score-0.753]
100 Using speech act theory to model conversations for automated classification and retrieval. [sent-531, score-0.795]
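The question-mark baseline for Directives and the one-binary-classifier-per-class setup described in the sentences above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy Expressive rule, and the example sentences are all assumptions made for the sketch, and only the "label any sentence containing a question mark as a Directive" rule comes from the text. Each per-class decision is applied independently, so a sentence can receive several labels or none, and precision, recall, and F-measure are computed in the usual way.

```python
from typing import Callable, Dict, List, Set, Tuple


def directive_baseline(sentence: str) -> bool:
    """Baseline from the text: any sentence containing '?' is a Directive."""
    return "?" in sentence


def toy_expressive_rule(sentence: str) -> bool:
    """Hypothetical stand-in (not from the paper): greetings and thanks."""
    lowered = sentence.lower()
    return lowered.startswith("hi ") or "thank" in lowered


# One independent binary decision per class; COM and REP are omitted here.
CLASSIFIERS: Dict[str, Callable[[str], bool]] = {
    "DIR": directive_baseline,
    "EXP": toy_expressive_rule,
}


def label_sentence(sentence: str,
                   classifiers: Dict[str, Callable[[str], bool]]) -> Set[str]:
    """Apply every binary classifier; fall back to NONE if none fires."""
    labels = {name for name, clf in classifiers.items() if clf(sentence)}
    return labels or {"NONE"}


def precision_recall_f(gold: List[bool],
                       pred: List[bool]) -> Tuple[float, float, float]:
    """Precision, recall, and balanced F-measure for one binary class."""
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f


# Invented toy sentences with invented gold Directive labels.
SENTENCES = [
    "Hi everyone!",
    "Has anyone seen this before?",
    "The cat was diagnosed with diabetes.",
    "Thanks, can anyone suggest a dose?",
    "Please let me know what you think.",  # a request without '?'
]
GOLD_DIR = [False, True, False, True, True]
PRED_DIR = [directive_baseline(s) for s in SENTENCES]

for s in SENTENCES:
    print(sorted(label_sentence(s, CLASSIFIERS)), s)
print(precision_recall_f(GOLD_DIR, PRED_DIR))
```

On this toy data the baseline misses the request that is not phrased as a question, which mirrors the observation in the text that many Directive speech acts are stated as requests for assistance rather than as questions.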
wordName wordTfidf (topN-words)
[('act', 0.485), ('acts', 0.358), ('speech', 0.268), ('board', 0.259), ('commissive', 0.22), ('directive', 0.196), ('searle', 0.196), ('message', 0.195), ('veterinary', 0.185), ('commissives', 0.173), ('posts', 0.155), ('directives', 0.127), ('expressives', 0.127), ('writer', 0.102), ('email', 0.099), ('lexsyn', 0.092), ('representatives', 0.092), ('expository', 0.09), ('boards', 0.08), ('doctor', 0.075), ('representative', 0.065), ('expressive', 0.06), ('drug', 0.056), ('carvalho', 0.05), ('sentences', 0.048), ('tense', 0.048), ('basilisk', 0.046), ('twitchell', 0.046), ('conversational', 0.045), ('conversations', 0.042), ('riloff', 0.042), ('semantic', 0.04), ('classifiers', 0.04), ('row', 0.04), ('speaker', 0.039), ('disease', 0.039), ('factual', 0.039), ('judges', 0.038), ('kappa', 0.038), ('questions', 0.038), ('action', 0.037), ('keywords', 0.037), ('post', 0.037), ('created', 0.036), ('pronoun', 0.036), ('appreciate', 0.036), ('request', 0.036), ('profiling', 0.036), ('threads', 0.036), ('vin', 0.036), ('emails', 0.035), ('vrm', 0.035), ('wierzbicka', 0.035), ('person', 0.035), ('lexicon', 0.034), ('cat', 0.034), ('forums', 0.033), ('agreement', 0.033), ('anyone', 0.031), ('readers', 0.031), ('classifier', 0.03), ('genres', 0.03), ('hi', 0.03), ('plan', 0.03), ('everyone', 0.03), ('diagnosed', 0.03), ('suspicion', 0.03), ('thelen', 0.03), ('vitor', 0.03), ('genre', 0.03), ('classes', 0.029), ('category', 0.029), ('recognize', 0.028), ('question', 0.028), ('clue', 0.028), ('diseases', 0.028), ('begins', 0.028), ('stroudsburg', 0.028), ('cohen', 0.026), ('taggers', 0.026), ('checks', 0.026), ('ellen', 0.026), ('filtering', 0.026), ('classify', 0.025), ('clues', 0.025), ('verbs', 0.025), ('recall', 0.025), ('four', 0.024), ('unigram', 0.024), ('drugs', 0.024), ('forum', 0.024), ('labelling', 0.024), ('labelled', 0.023), ('commits', 0.023), ('diagnoses', 0.023), ('greet', 0.023), ('mildinhall', 0.023), ('nastri', 0.023), ('nunamaker', 0.023), ('professionals', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999934 27 emnlp-2011-Classifying Sentences as Speech Acts in Message Board Posts
2 0.20379986 105 emnlp-2011-Predicting Thread Discourse Structure over Technical Web Forums
Author: Li Wang ; Marco Lui ; Su Nam Kim ; Joakim Nivre ; Timothy Baldwin
Abstract: Online discussion forums are a valuable means for users to resolve specific information needs, both interactively for the participants and statically for users who search/browse over historical thread data. However, the complex structure of forum threads can make it difficult for users to extract relevant information. The discourse structure of web forum threads, in the form of labelled dependency relationships between posts, has the potential to greatly improve information access over web forum archives. In this paper, we present the task of parsing user forum threads to determine the labelled dependencies between posts. Three methods, including a dependency parsing approach, are proposed to jointly classify the links (relationships) between posts and the dialogue act (type) of each link. The proposed methods significantly surpass an informed baseline. We also experiment with “in situ” classification of evolving threads, and establish that our best methods are able to perform equivalently well over partial threads as complete threads.
3 0.093533754 24 emnlp-2011-Bootstrapping Semantic Parsers from Conversations
Author: Yoav Artzi ; Luke Zettlemoyer
Abstract: Conversations provide rich opportunities for interactive, continuous learning. When something goes wrong, a system can ask for clarification, rewording, or otherwise redirect the interaction to achieve its goals. In this paper, we present an approach for using conversational interactions of this type to induce semantic parsers. We demonstrate learning without any explicit annotation of the meanings of user utterances. Instead, we model meaning with latent variables, and introduce a loss function to measure how well potential meanings match the conversation. This loss drives the overall learning approach, which induces a weighted CCG grammar that could be used to automatically bootstrap the semantic analysis component in a complete dialog system. Experiments on DARPA Communicator conversational logs demonstrate effective learning, despite requiring no explicit meaning annotations.
4 0.063259594 38 emnlp-2011-Data-Driven Response Generation in Social Media
Author: Alan Ritter ; Colin Cherry ; William B. Dolan
Abstract: We present a data-driven approach to generating responses to Twitter status posts, based on phrase-based Statistical Machine Translation. We find that mapping conversational stimuli onto responses is more difficult than translating between languages, due to the wider range of possible responses, the larger fraction of unaligned words/phrases, and the presence of large phrase pairs whose alignment cannot be further decomposed. After addressing these challenges, we compare approaches based on SMT and Information Retrieval in a human evaluation. We show that SMT outperforms IR on this task, and its output is preferred over actual human responses in 15% of cases. As far as we are aware, this is the first work to investigate the use of phrase-based SMT to directly translate a linguistic stimulus into an appropriate response.
5 0.057660788 57 emnlp-2011-Extreme Extraction - Machine Reading in a Week
Author: Marjorie Freedman ; Lance Ramshaw ; Elizabeth Boschee ; Ryan Gabbard ; Gary Kratkiewicz ; Nicolas Ward ; Ralph Weischedel
Abstract: We report on empirical results in extreme extraction. It is extreme in that (1) from receipt of the ontology specifying the target concepts and relations, development is limited to one week and that (2) relatively little training data is assumed. We are able to surpass human recall and achieve an F1 of 0.51 on a question-answering task with less than 50 hours of effort using a hybrid approach that mixes active learning, bootstrapping, and limited (5 hours) manual rule writing. We compare the performance of three systems: extraction with handwritten rules, bootstrapped extraction, and a combination. We show that while the recall of the handwritten rules surpasses that of the learned system, the learned system is able to improve the overall recall and F1.
6 0.045414537 88 emnlp-2011-Linear Text Segmentation Using Affinity Propagation
7 0.044677731 5 emnlp-2011-A Fast Re-scoring Strategy to Capture Long-Distance Dependencies
8 0.042807028 22 emnlp-2011-Better Evaluation Metrics Lead to Better Machine Translation
9 0.038719565 147 emnlp-2011-Using Syntactic and Semantic Structural Kernels for Classifying Definition Questions in Jeopardy!
10 0.037330437 53 emnlp-2011-Experimental Support for a Categorical Compositional Distributional Model of Meaning
11 0.035119247 28 emnlp-2011-Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances
12 0.034701958 98 emnlp-2011-Named Entity Recognition in Tweets: An Experimental Study
13 0.034677219 71 emnlp-2011-Identifying and Following Expert Investors in Stock Microblogs
14 0.033519659 125 emnlp-2011-Statistical Machine Translation with Local Language Models
15 0.032406073 141 emnlp-2011-Unsupervised Dependency Parsing without Gold Part-of-Speech Tags
16 0.032071561 103 emnlp-2011-Parser Evaluation over Local and Non-Local Deep Dependencies in a Large Corpus
17 0.031291872 14 emnlp-2011-A generative model for unsupervised discovery of relations and argument classes from clinical texts
18 0.031199746 2 emnlp-2011-A Cascaded Classification Approach to Semantic Head Recognition
19 0.030608598 80 emnlp-2011-Latent Vector Weighting for Word Meaning in Context
20 0.030118043 121 emnlp-2011-Semi-supervised CCG Lexicon Extension
topicId topicWeight
[(0, 0.127), (1, -0.07), (2, 0.005), (3, 0.006), (4, -0.008), (5, -0.06), (6, -0.021), (7, 0.03), (8, 0.006), (9, -0.117), (10, -0.083), (11, -0.123), (12, -0.081), (13, -0.032), (14, -0.1), (15, 0.149), (16, 0.044), (17, -0.062), (18, 0.041), (19, -0.046), (20, 0.072), (21, 0.036), (22, -0.346), (23, -0.034), (24, 0.032), (25, 0.1), (26, -0.128), (27, -0.12), (28, -0.059), (29, -0.136), (30, 0.006), (31, -0.025), (32, -0.237), (33, 0.144), (34, -0.06), (35, 0.007), (36, 0.257), (37, 0.024), (38, 0.043), (39, -0.149), (40, -0.091), (41, -0.093), (42, -0.037), (43, 0.111), (44, -0.12), (45, 0.123), (46, -0.037), (47, -0.115), (48, -0.153), (49, 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.97068554 27 emnlp-2011-Classifying Sentences as Speech Acts in Message Board Posts
2 0.77711207 105 emnlp-2011-Predicting Thread Discourse Structure over Technical Web Forums
Author: Li Wang ; Marco Lui ; Su Nam Kim ; Joakim Nivre ; Timothy Baldwin
Abstract: Online discussion forums are a valuable means for users to resolve specific information needs, both interactively for the participants and statically for users who search/browse over historical thread data. However, the complex structure of forum threads can make it difficult for users to extract relevant information. The discourse structure of web forum threads, in the form of labelled dependency relationships between posts, has the potential to greatly improve information access over web forum archives. In this paper, we present the task of parsing user forum threads to determine the labelled dependencies between posts. Three methods, including a dependency parsing approach, are proposed to jointly classify the links (relationships) between posts and the dialogue act (type) of each link. The proposed methods significantly surpass an informed baseline. We also experiment with “in situ” classification of evolving threads, and establish that our best methods are able to perform equivalently well over partial threads as complete threads.
3 0.26799768 24 emnlp-2011-Bootstrapping Semantic Parsers from Conversations
Author: Yoav Artzi ; Luke Zettlemoyer
Abstract: Conversations provide rich opportunities for interactive, continuous learning. When something goes wrong, a system can ask for clarification, rewording, or otherwise redirect the interaction to achieve its goals. In this paper, we present an approach for using conversational interactions of this type to induce semantic parsers. We demonstrate learning without any explicit annotation of the meanings of user utterances. Instead, we model meaning with latent variables, and introduce a loss function to measure how well potential meanings match the conversation. This loss drives the overall learning approach, which induces a weighted CCG grammar that could be used to automatically bootstrap the semantic analysis component in a complete dialog system. Experiments on DARPA Communicator conversational logs demonstrate effective learning, despite requiring no explicit meaning annotations.
4 0.25235727 32 emnlp-2011-Computing Logical Form on Regulatory Texts
Author: Nikhil Dinesh ; Aravind Joshi ; Insup Lee
Abstract: The computation of logical form has been proposed as an intermediate step in the translation of sentences to logic. Logical form encodes the resolution of scope ambiguities. In this paper, we describe experiments on a modest-sized corpus of regulation annotated with a novel variant of logical form, called abstract syntax trees (ASTs). The main step in computing ASTs is to order scope-taking operators. A learning model for ranking is adapted for this ordering. We design features by studying the problem of comparing the scope of one operator to another. The scope comparisons are used to compute ASTs, with an F-score of 90.6% on the set of ordering decisions.
5 0.24468933 38 emnlp-2011-Data-Driven Response Generation in Social Media
Author: Alan Ritter ; Colin Cherry ; William B. Dolan
Abstract: We present a data-driven approach to generating responses to Twitter status posts, based on phrase-based Statistical Machine Translation. We find that mapping conversational stimuli onto responses is more difficult than translating between languages, due to the wider range of possible responses, the larger fraction of unaligned words/phrases, and the presence of large phrase pairs whose alignment cannot be further decomposed. After addressing these challenges, we compare approaches based on SMT and Information Retrieval in a human evaluation. We show that SMT outperforms IR on this task, and its output is preferred over actual human responses in 15% of cases. As far as we are aware, this is the first work to investigate the use of phrase-based SMT to directly translate a linguistic stimulus into an appropriate response.
6 0.23919049 126 emnlp-2011-Structural Opinion Mining for Graph-based Sentiment Representation
7 0.19746554 57 emnlp-2011-Extreme Extraction - Machine Reading in a Week
8 0.18785144 82 emnlp-2011-Learning Local Content Shift Detectors from Document-level Information
9 0.17140129 46 emnlp-2011-Efficient Subsampling for Training Complex Language Models
10 0.15882939 88 emnlp-2011-Linear Text Segmentation Using Affinity Propagation
11 0.15466167 147 emnlp-2011-Using Syntactic and Semantic Structural Kernels for Classifying Definition Questions in Jeopardy!
12 0.15276846 5 emnlp-2011-A Fast Re-scoring Strategy to Capture Long-Distance Dependencies
13 0.15069498 139 emnlp-2011-Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter
14 0.14583841 53 emnlp-2011-Experimental Support for a Categorical Compositional Distributional Model of Meaning
15 0.14573103 42 emnlp-2011-Divide and Conquer: Crowdsourcing the Creation of Cross-Lingual Textual Entailment Corpora
16 0.14413275 117 emnlp-2011-Rumor has it: Identifying Misinformation in Microblogs
17 0.14408208 79 emnlp-2011-Lateen EM: Unsupervised Training with Multiple Objectives, Applied to Dependency Grammar Induction
18 0.14375056 78 emnlp-2011-Large-Scale Noun Compound Interpretation Using Bootstrapping and the Web as a Corpus
19 0.14205597 34 emnlp-2011-Corpus-Guided Sentence Generation of Natural Images
20 0.14064489 54 emnlp-2011-Exploiting Parse Structures for Native Language Identification
topicId topicWeight
[(17, 0.035), (18, 0.016), (23, 0.09), (36, 0.016), (37, 0.025), (45, 0.061), (46, 0.188), (53, 0.018), (54, 0.022), (57, 0.025), (62, 0.04), (64, 0.015), (66, 0.026), (69, 0.016), (79, 0.046), (87, 0.184), (96, 0.033), (98, 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.75457209 27 emnlp-2011-Classifying Sentences as Speech Acts in Message Board Posts
Author: Ashequl Qadir ; Ellen Riloff
Abstract: This research studies the text genre of message board forums, which contain a mixture of expository sentences that present factual information and conversational sentences that include communicative acts between the writer and readers. Our goal is to create sentence classifiers that can identify whether a sentence contains a speech act, and can recognize sentences containing four different speech act classes: Commissives, Directives, Expressives, and Representatives. We conduct experiments using a wide variety of features, including lexical and syntactic features, speech act word lists from external resources, and domain-specific semantic class features. We evaluate our results on a collection of message board posts in the domain of veterinary medicine.
2 0.73544031 24 emnlp-2011-Bootstrapping Semantic Parsers from Conversations
Author: Yoav Artzi ; Luke Zettlemoyer
Abstract: Conversations provide rich opportunities for interactive, continuous learning. When something goes wrong, a system can ask for clarification, rewording, or otherwise redirect the interaction to achieve its goals. In this paper, we present an approach for using conversational interactions of this type to induce semantic parsers. We demonstrate learning without any explicit annotation of the meanings of user utterances. Instead, we model meaning with latent variables, and introduce a loss function to measure how well potential meanings match the conversation. This loss drives the overall learning approach, which induces a weighted CCG grammar that could be used to automatically bootstrap the semantic analysis component in a complete dialog system. Experiments on DARPA Communicator conversational logs demonstrate effective learning, despite requiring no explicit meaning annotations.
3 0.65425247 75 emnlp-2011-Joint Models for Chinese POS Tagging and Dependency Parsing
Author: Zhenghua Li ; Min Zhang ; Wanxiang Che ; Ting Liu ; Wenliang Chen ; Haizhou Li
Abstract: Part-of-speech (POS) is an indispensable feature in dependency parsing. Current research usually models POS tagging and dependency parsing independently. This may suffer from error propagation problem. Our experiments show that parsing accuracy drops by about 6% when using automatic POS tags instead of gold ones. To solve this issue, this paper proposes a solution by jointly optimizing POS tagging and dependency parsing in a unique model. We design several joint models and their corresponding decoding algorithms to incorporate different feature sets. We further present an effective pruning strategy to reduce the search space of candidate POS tags, leading to significant improvement of parsing speed. Experimental results on Chinese Penn Treebank 5 show that our joint models significantly improve the state-of-the-art parsing accuracy by about 1.5%. Detailed analysis shows that the joint method is able to choose such POS tags that are more helpful and discriminative from parsing viewpoint. This is the fundamental reason of parsing accuracy improvement.
4 0.52519566 58 emnlp-2011-Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training
Author: Xinyan Xiao ; Yang Liu ; Qun Liu ; Shouxun Lin
Abstract: Although discriminative training guarantees to improve statistical machine translation by incorporating a large amount of overlapping features, it is hard to scale up to large data due to decoding complexity. We propose a new algorithm to generate translation forest of training data in linear time with the help of word alignment. Our algorithm also alleviates the oracle selection problem by ensuring that a forest always contains derivations that exactly yield the reference translation. With millions of features trained on 519K sentences in 0.03 second per sentence, our system achieves significant improvement by 0.84 BLEU over the baseline system on the NIST Chinese-English test sets.
5 0.50474435 105 emnlp-2011-Predicting Thread Discourse Structure over Technical Web Forums
Author: Li Wang ; Marco Lui ; Su Nam Kim ; Joakim Nivre ; Timothy Baldwin
Abstract: Online discussion forums are a valuable means for users to resolve specific information needs, both interactively for the participants and statically for users who search/browse over historical thread data. However, the complex structure of forum threads can make it difficult for users to extract relevant information. The discourse structure of web forum threads, in the form of labelled dependency relationships between posts, has the potential to greatly improve information access over web forum archives. In this paper, we present the task of parsing user forum threads to determine the labelled dependencies between posts. Three methods, including a dependency parsing approach, are proposed to jointly classify the links (relationships) between posts and the dialogue act (type) of each link. The proposed methods significantly surpass an informed baseline. We also experiment with “in situ” classification of evolving threads, and establish that our best methods are able to perform equivalently well over partial threads as complete threads.
6 0.4789601 57 emnlp-2011-Extreme Extraction - Machine Reading in a Week
7 0.47330248 38 emnlp-2011-Data-Driven Response Generation in Social Media
8 0.47281364 87 emnlp-2011-Lexical Generalization in CCG Grammar Induction for Semantic Parsing
9 0.46722814 98 emnlp-2011-Named Entity Recognition in Tweets: An Experimental Study
10 0.46324766 94 emnlp-2011-Modelling Discourse Relations for Arabic
11 0.45777723 111 emnlp-2011-Reducing Grounded Learning Tasks To Grammatical Inference
12 0.45759022 22 emnlp-2011-Better Evaluation Metrics Lead to Better Machine Translation
13 0.45536327 65 emnlp-2011-Heuristic Search for Non-Bottom-Up Tree Structure Prediction
14 0.45416513 134 emnlp-2011-Third-order Variational Reranking on Packed-Shared Dependency Forests
15 0.45089716 128 emnlp-2011-Structured Relation Discovery using Generative Models
16 0.45056358 77 emnlp-2011-Large-Scale Cognate Recovery
17 0.45015201 66 emnlp-2011-Hierarchical Phrase-based Translation Representations
18 0.44953212 79 emnlp-2011-Lateen EM: Unsupervised Training with Multiple Objectives, Applied to Dependency Grammar Induction
19 0.44490525 61 emnlp-2011-Generating Aspect-oriented Multi-Document Summarization with Event-aspect model
20 0.44445238 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax